
Expand regex to account for INFO: being prepended to logs #421


In, HADOOP_JOB_TIMESTAMP_RE is used to parse output from hadoop to determine the job's timestamp and step number. In my usage, I found that "INFO: " was being prepended to hadoop's log output (possibly due to my own failure to set up the logs correctly), and the regex was failing to pick the line up. This resulted in mrjob raising an error despite the job apparently running fine otherwise.

I simply added an optional check for "INFO: " in this regex. It's possible that something more generic would be desirable, but this seems to fix things for me and I don't think it would screw things up for anyone else.

(fwiw, I am using hadoop 0.20.2 and python 2.5, but the original regex seemed to fail with python 2.7 as well)
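To illustrate the change, here is a minimal sketch comparing the old and new patterns. The log lines and the `parse_job_line` helper are made up for the example, and it assumes the pattern is applied with `match()` (anchored at the start of the line), which would explain why a prefix broke the original; mrjob's actual call site may differ.

```python
import re

# Old pattern, from the "-" line of the diff (r prefix added here for clarity).
OLD_RE = re.compile(
    r'Running job: job_(?P<timestamp>\d+)_(?P<step_num>\d+)')

# New pattern, from the "+" line of the diff.
HADOOP_JOB_TIMESTAMP_RE = re.compile(
    r'(INFO: )?Running job: job_(?P<timestamp>\d+)_(?P<step_num>\d+)')


def parse_job_line(line):
    """Hypothetical helper: return (timestamp, step_num) or None."""
    m = HADOOP_JOB_TIMESTAMP_RE.match(line)
    if m is None:
        return None
    return m.group('timestamp'), m.group('step_num')


# Made-up log lines, with and without the "INFO: " prefix.
plain = 'Running job: job_201203051234_0001'
prefixed = 'INFO: Running job: job_201203051234_0001'

print(parse_job_line(plain))
print(parse_job_line(prefixed))
print(OLD_RE.match(prefixed))  # old pattern misses the prefixed line
```

Because the prefix group is optional, lines without "INFO: " are parsed the same way as before. Any other prefix (for example "WARN: ") would still not match under this sketch.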


Thanks, and you added the r so the escapes are correct! Dave: recommend I pull this into the release_033 branch.
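On the `r` prefix: a short sketch of why raw strings are the safer choice for regex literals. It uses `\b` rather than `\d` as the example, since `\b` is an escape that Python itself interprets in a normal string; this is an illustration, not something taken from the mrjob code.

```python
import re

# In a normal string, '\b' is a single backspace character.
# In a raw string, r'\b' is a backslash followed by 'b',
# which the re module reads as a word boundary.
normal = '\bjob'
raw = r'\bjob'

text = 'Running job: job_1_2'  # made-up input

normal_match = re.search(normal, text)  # looks for backspace + "job"
raw_match = re.search(raw, text)        # looks for "job" at a word boundary

print(len('\b'), len(r'\b'))
print(normal_match)
print(raw_match.group(0))
```

`\d` happens to survive in a normal string because Python does not treat it as an escape, but relying on that is fragile, so the raw-string form in the diff is the more robust spelling.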

@irskep irskep merged commit dcfb30b into Yelp:master

Merged, thanks!



Showing with 1 addition and 1 deletion.
  1. +1 −1  mrjob/
2  mrjob/
@@ -61,7 +61,7 @@
# used to extract the job timestamp from stderr
- 'Running job: job_(?P<timestamp>\d+)_(?P<step_num>\d+)')
+ r'(INFO: )?Running job: job_(?P<timestamp>\d+)_(?P<step_num>\d+)')
# find version string in "Hadoop 0.20.203" etc.
HADOOP_VERSION_RE = re.compile(r'^.*?(?P<version>(\d|\.)+).*?$')