
[AIRFLOW-2412] Fix HiveCliHook.load_file to address HIVE-10541 #3327

Closed. Wants to merge 1 commit.


sekikn (Contributor) commented on May 8, 2018

Make sure you have checked all steps below.

JIRA

  • My PR addresses the following Airflow JIRA issues and references them in the PR title. For example, "[AIRFLOW-XXX] My Airflow PR"

Description

  • Here are some details about my PR, including screenshots of any UI changes:

HiveCliHook.load_file doesn't actually execute the
LOAD DATA statement when it runs via the beeline bundled
with Hive versions earlier than 2.0, due to HIVE-10541.
This PR provides a workaround for this problem.
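The workaround amounts to terminating the generated statement with a newline. A minimal sketch of that idea, assuming the statement is built as a string before being written to the script file that beeline runs; `build_load_data_hql` and its parameters are hypothetical and are not the actual `HiveCliHook` code:

```python
def build_load_data_hql(filepath, table, partition=None, overwrite=False):
    """Build a Hive LOAD DATA statement (hypothetical helper, not Airflow's API)."""
    hql = "LOAD DATA LOCAL INPATH '{}' ".format(filepath)
    if overwrite:
        hql += "OVERWRITE "
    hql += "INTO TABLE {}".format(table)
    if partition:
        # Render {'ds': '2018-05-08'} as PARTITION (ds='2018-05-08')
        pvals = ", ".join("{}='{}'".format(k, v) for k, v in partition.items())
        hql += " PARTITION ({})".format(pvals)
    # Workaround for HIVE-10541: beeline before Hive 2.0 may skip a final
    # statement that is not followed by a newline, so always end with ";\n".
    hql += ";\n"
    return hql


print(build_load_data_hql("/tmp/data.csv", "default.events",
                          partition={"ds": "2018-05-08"}, overwrite=True))
```

Appending the newline unconditionally is harmless for newer beeline releases, which is why a version check is not needed in this sketch.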

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

This PR adds the following tests. In addition, I confirmed that the problem described above is resolved on a real Hadoop cluster running Hive 1.2.1.

  • TestHiveCliHook.test_run_cli
  • TestHiveCliHook.test_load_file
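The tests themselves are not shown on this page. A sketch of how a test in the spirit of `test_load_file` could check that the statement handed to the CLI ends with a newline, assuming `run_cli` receives the final HQL string; `SketchHiveCliHook` is a hypothetical stand-in, and `unittest.mock` replaces the CLI call so no Hive installation is needed:

```python
import unittest
from unittest import mock


class SketchHiveCliHook:
    """Hypothetical stand-in for HiveCliHook; only the HQL handling is modeled."""

    def run_cli(self, hql):
        # In the real hook this would shell out to hive/beeline.
        raise NotImplementedError("not available in this sketch")

    def load_file(self, filepath, table):
        hql = "LOAD DATA LOCAL INPATH '{}' INTO TABLE {}".format(filepath, table)
        # Trailing newline works around HIVE-10541 in beeline < 2.0.
        self.run_cli(hql + ";\n")


class TestLoadFileSketch(unittest.TestCase):
    def test_load_file_ends_with_newline(self):
        hook = SketchHiveCliHook()
        # Patch run_cli on this instance so the statement can be inspected.
        with mock.patch.object(hook, "run_cli") as run_cli:
            hook.load_file("/tmp/data.csv", "default.events")
        run_cli.assert_called_once_with(
            "LOAD DATA LOCAL INPATH '/tmp/data.csv' INTO TABLE default.events;\n")
```

Saved to a file, this runs with `python -m unittest`.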

Commits

  • My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"
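Rules 1, 2, 3 and 5 above are mechanical and can be checked automatically; rules 4 and 6 need a human reader. A minimal sketch, assuming the commit message is available as a string; `check_commit_message` is a hypothetical helper, not part of Airflow's tooling:

```python
def check_commit_message(message, subject_limit=50, body_width=72):
    """Return a list of rule violations for a git commit message."""
    lines = message.splitlines()
    if not lines or not lines[0].strip():
        return ["missing subject line"]
    problems = []
    subject = lines[0]
    # Rule 1: a blank line must separate the subject from the body.
    if len(lines) > 1 and lines[1].strip():
        problems.append("subject not separated from body by a blank line")
    # Rule 2: the subject is limited to subject_limit characters.
    if len(subject) > subject_limit:
        problems.append("subject is {} characters (limit {})".format(
            len(subject), subject_limit))
    # Rule 3: the subject does not end with a period.
    if subject.rstrip().endswith("."):
        problems.append("subject ends with a period")
    # Rule 5: every body line wraps at body_width characters.
    for number, line in enumerate(lines[1:], start=2):
        if len(line) > body_width:
            problems.append("line {} is {} characters (limit {})".format(
                number, len(line), body_width))
    return problems


print(check_commit_message(
    "[AIRFLOW-XXX] Add newline to LOAD DATA statement\n\n"
    "Work around HIVE-10541 in beeline.\n"))
```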

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.

Code Quality

  • Passes git diff upstream/master -u -- "*.py" | flake8 --diff

codecov-io commented

Codecov Report

Merging #3327 into master will decrease coverage by <.01%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master    #3327      +/-   ##
==========================================
- Coverage   75.89%   75.88%   -0.01%     
==========================================
  Files         197      197              
  Lines       14729    14730       +1     
==========================================
  Hits        11178    11178              
- Misses       3551     3552       +1
Impacted Files                 Coverage Δ
airflow/hooks/hive_hooks.py    57.92% <100%> (+0.12%) ⬆️
airflow/models.py              86.64% <0%> (-0.05%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b9eb52c...7ec74bd.

Fokko (Contributor) commented on May 8, 2018

@gglanzani Can you take a look? I'm not a Hive expert.

gglanzani (Contributor) commented

@Fokko This seems like a pretty innocent PR to me: it only adds a newline at the end of the SQL.

The beeline bug is indeed real and, although it has been addressed in newer versions of beeline, it still affects most clusters.

Since this doesn't functionally change anything, I'd +1 it.
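Why a single trailing newline matters can be shown with a toy model of the reported symptom, assuming (as the HIVE-10541 title, "Beeline requires newline at the end of each query in a file", suggests) that beeline before Hive 2.0 ignores a final line that is not newline-terminated. `lines_executed` is hypothetical and does not reproduce beeline's actual parsing:

```python
def lines_executed(script):
    """Toy model only: treat a line as complete input only if it ends in "\\n"."""
    # Text after the final "\n" (an unterminated last line) is discarded.
    complete = script.split("\n")[:-1]
    return [line for line in complete if line.strip()]


stmt = "LOAD DATA LOCAL INPATH '/tmp/data.csv' INTO TABLE default.events;"
print(len(lines_executed(stmt)))         # 0: the statement is silently skipped
print(len(lines_executed(stmt + "\n")))  # 1: the statement is executed
```

In this model, adding the newline changes nothing for scripts that already end with one, which matches the point that the PR does not change behavior functionally.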

Fokko (Contributor) left a comment

Thanks @sekikn and @gglanzani

LGTM, merging to master and v1-10

asfgit closed this in baf15e1 on May 8, 2018
asfgit pushed a commit that referenced this pull request on May 8, 2018:

Closes #3327 from sekikn/AIRFLOW-2412

(cherry picked from commit baf15e1)
Signed-off-by: Fokko Driesprong <fokkodriesprong@godatadriven.com>
sekikn (Contributor, Author) commented on May 8, 2018

@Fokko @gglanzani Thanks!

sekikn deleted the AIRFLOW-2412 branch on May 8, 2018 at 10:26
aliceabe pushed a commit to aliceabe/incubator-airflow that referenced this pull request on Jan 3, 2019:

Closes apache#3327 from sekikn/AIRFLOW-2412

4 participants