Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIRFLOW-244] Modify hive operator to inject analysis data #1607

Conversation

paulbramsen
Copy link
Contributor

@paulbramsen paulbramsen commented Jun 17, 2016

Dear Airflow Maintainers,

Please accept this PR that addresses the following issues:

Testing Done:
Test dags were run as backfills on an Airbnb Airflow dev box.

This PR exposes task/dag id/run data through the HiveOperator for ingestion by performance analysis tools like Dr. Elephant.

As per discussion this supersedes #1594

@krishnap @artwr @aoen @mistercrunch

@paulbramsen paulbramsen force-pushed the paulbramsen/modify_hive_operator_to_inject_analysis_data branch from f759942 to 888ef8b Compare June 17, 2016 23:05
@codecov-io
Copy link

codecov-io commented Jun 17, 2016

Current coverage is 64.07%

Merging #1607 into master will decrease coverage by 0.10%

@@             master      #1607   diff @@
==========================================
  Files           120        121     +1   
  Lines          8443       8462    +19   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits           5419       5422     +3   
- Misses         3024       3040    +16   
  Partials          0          0          

Powered by Codecov. Last updated by 54f1c11...ace79c0

@paulbramsen paulbramsen force-pushed the paulbramsen/modify_hive_operator_to_inject_analysis_data branch 2 times, most recently from 3dc61f3 to 3b32e4b Compare June 20, 2016 18:37
@@ -107,6 +115,10 @@ def run_cli(self, hql, schema=None, verbose=True):
if conn.password:
cmd_extra += ['-p', conn.password]

if hive_conf:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need the if

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait why not? If hive_conf is None won't this cause issues?

Copy link
Contributor

@aoen aoen Jun 20, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can make hive_conf default to dict()/frozendict() (and pass in dict() instead of None in hive_operator.py)

@aoen
Copy link
Contributor

aoen commented Jun 20, 2016

LGTM once landscape check is green. Might get another +1 from Max just in case.

import unicodecsv as csv
import logging
import re
import subprocess
from frozendict import frozendict
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't need that, let's not bring a dep on an external lib just for this.

@paulbramsen paulbramsen force-pushed the paulbramsen/modify_hive_operator_to_inject_analysis_data branch 3 times, most recently from bfd79fd to ace79c0 Compare June 21, 2016 17:56
@aoen
Copy link
Contributor

aoen commented Jun 21, 2016

code-quality is a false-positive, merging.

@artwr
Copy link
Contributor

artwr commented Jun 21, 2016

A little unit test?

@asfgit asfgit closed this in 7f6d877 Jun 21, 2016
@paulbramsen
Copy link
Contributor Author

@artwr I'll add some tests in a new PR

@artwr
Copy link
Contributor

artwr commented Jun 21, 2016

Our code coverage thanks you :)

@paulbramsen paulbramsen deleted the paulbramsen/modify_hive_operator_to_inject_analysis_data branch June 24, 2016 21:14
aoen pushed a commit to aoen/incubator-airflow that referenced this pull request Aug 5, 2016
Testing Done:
Test dags were run as backfills on an Airbnb Airflow dev box.

This PR exposes task/dag id/run data through the HiveOperator for
ingestion by performance analysis tools like Dr. Elephant.

Closes apache#1607 from paulbramsen/paulbramsen/modify_hive_operator_to_inject_analysis_data
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants