Console output of MR jobs fails to properly update progress #323

Open
simleo opened this issue Jul 30, 2018 · 0 comments

simleo commented Jul 30, 2018

This is a problem with our mapreduce version of the submitter. The original mapred submitter is unaffected.

The minimal setup is a map-only app that uses the Java reader and writer:

import pydoop.mapreduce.api as api
import pydoop.mapreduce.pipes as pipes

class Mapper(api.Mapper):
    def map(self, context):
        context.emit(context.key, len(context.value))

def __main__():
    pipes.run_task(pipes.Factory(mapper_class=Mapper))

Run this with only one mapper on a substantial amount of input (e.g., replicate examples/input/alice_1.txt 1000 times). Monitor the job on the console: with our mapreduce submitter, progress remains stuck at 0%, then jumps to 100% right before the end of the job. With the mapred submitter, progress is updated gradually, as expected.
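
For anyone trying to reproduce this, one way to build the large input is sketched below. It is only an illustration: the helper name `replicate_file` and the output file name are made up here, and any other way of concatenating the file 1000 times works just as well.

```python
import shutil


def replicate_file(src, dst, times):
    """Write `times` back-to-back copies of `src` into `dst`."""
    with open(dst, "wb") as out:
        for _ in range(times):
            # Reopen the source for each copy so it is streamed from the start
            # without holding the whole file in memory.
            with open(src, "rb") as inp:
                shutil.copyfileobj(inp, out)


# Example call (the output name is an assumption, pick any path you like):
# replicate_file("examples/input/alice_1.txt", "alice_x1000.txt", 1000)
```

If the source file does not end with a newline, the last line of each copy gets joined with the first line of the next one. That should not matter for reproducing the progress problem, since only the input size is relevant here.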

Note that this was NOT fixed by #322.
