Expand regex to account for INFO: being prepended to logs #421

Merged

2 participants

@danielhfrank

In hadoop.py, HADOOP_JOB_TIMESTAMP_RE parses Hadoop's output to determine the job's timestamp and step number. In my usage, "INFO: " was being prepended to Hadoop's log output (possibly because I failed to set up logging correctly), so the regex did not pick the line up. As a result, mrjob raised an error even though the job otherwise appeared to run fine.

I simply made "INFO: " an optional prefix in this regex. Something more generic might be desirable, but this fixes things for me and I don't think it will break anything for anyone else.

(fwiw, I am using Hadoop 0.20.2 and Python 2.5, but the original regex seemed to fail with Python 2.7 as well)
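
For readers following along, here is a minimal sketch of the behavior described above. The two patterns are copied from the diff in this PR (the old one is written as a raw string here, which compiles to the same pattern). The log lines are made up for illustration, and the `parse` helper is hypothetical. It assumes the pattern is applied with `match()`, which anchors at the start of the line; the PR does not show the call site, so that is a guess at why the prefix broke things.

```python
import re

# Patterns taken from the diff in this PR.
OLD_RE = re.compile(
    r'Running job: job_(?P<timestamp>\d+)_(?P<step_num>\d+)')
NEW_RE = re.compile(
    r'(INFO: )?Running job: job_(?P<timestamp>\d+)_(?P<step_num>\d+)')

# Hypothetical log lines, invented for this example.
plain_line = 'Running job: job_201203051234_0001'
prefixed_line = 'INFO: Running job: job_201203051234_0001'


def parse(pattern, line):
    """Return (timestamp, step_num), or None if the line does not match.

    Assumption: the real code uses match(), which only matches at the
    start of the string. With search() the old pattern would also find
    the prefixed line.
    """
    m = pattern.match(line)
    if m is None:
        return None
    return m.group('timestamp'), m.group('step_num')


old_plain = parse(OLD_RE, plain_line)        # matches
old_prefixed = parse(OLD_RE, prefixed_line)  # no match: line starts with "INFO: "
new_plain = parse(NEW_RE, plain_line)        # still matches
new_prefixed = parse(NEW_RE, prefixed_line)  # now matches too
```

Because the added group is optional, lines without the prefix parse exactly as before, which is why the change should be safe for existing setups.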

@irskep
Collaborator

Thanks, and you added the r so the escapes are correct! Dave: recommend I pull this into the release_033 branch.
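
A short sketch of why the `r` prefix matters for regex escapes in general. `\d` is not a recognized Python string escape, so the old non-raw pattern happened to reach the regex engine unchanged (newer Python 3 releases warn about such escapes). Escapes that Python does recognize, such as `\b`, are a different story. The sample text below is made up.

```python
import re

# In a normal string, '\b' is a single backspace character.
normal_len = len('\b')
# In a raw string the backslash survives, so the regex engine sees \b.
raw_len = len(r'\b')

# Doubling the backslash is the non-raw way to write the same pattern.
same_pattern = ('\\d+' == r'\d+')

sample = 'Running job: x'  # hypothetical text

# Raw string: \b is a word boundary, so "job" is found.
found_with_raw = re.search(r'\bjob\b', sample) is not None
# Normal string: the pattern looks for literal backspace characters.
found_without_raw = re.search('\bjob\b', sample) is not None
```

Using raw strings for every pattern avoids having to remember which escapes Python interprets before the regex engine sees them.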

@irskep irskep merged commit dcfb30b into Yelp:master
@irskep
Collaborator

Merged, thanks!

@danielhfrank

sweet!

Showing with 1 addition and 1 deletion.
  1. +1 −1  mrjob/hadoop.py
2  mrjob/hadoop.py
@@ -61,7 +61,7 @@
# used to extract the job timestamp from stderr
HADOOP_JOB_TIMESTAMP_RE = re.compile(
- 'Running job: job_(?P<timestamp>\d+)_(?P<step_num>\d+)')
+ r'(INFO: )?Running job: job_(?P<timestamp>\d+)_(?P<step_num>\d+)')
# find version string in "Hadoop 0.20.203" etc.
HADOOP_VERSION_RE = re.compile(r'^.*?(?P<version>(\d|\.)+).*?$')