Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIRFLOW-5921] Add bulk_load_custom to MySqlHook #6575

Merged

Conversation

feluelle
Copy link
Member

Make sure you have checked all steps below.

Jira

  • My PR addresses the following Airflow Jira issues and references them in the PR title. For example, "[AIRFLOW-XXX] My Airflow PR"
    • https://issues.apache.org/jira/browse/AIRFLOW-5921
    • In case you are fixing a typo in the documentation you can prepend your commit with [AIRFLOW-XXX], code changes always need a Jira issue.
    • In case you are proposing a fundamental code change, you need to create an Airflow Improvement Proposal (AIP).
    • In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.

Description

  • Here are some details about my PR, including screenshots of any UI changes:

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
    • If you implement backwards incompatible changes, please leave a note in the Updating.md so we can assign it to a appropriate release

@codecov-io
Copy link

codecov-io commented Nov 13, 2019

Codecov Report

Merging #6575 into master will decrease coverage by 0.32%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #6575      +/-   ##
==========================================
- Coverage   83.83%   83.51%   -0.33%     
==========================================
  Files         651      651              
  Lines       37431    37437       +6     
==========================================
- Hits        31379    31264     -115     
- Misses       6052     6173     +121
Impacted Files Coverage Δ
airflow/hooks/mysql_hook.py 93.42% <100%> (+0.56%) ⬆️
airflow/kubernetes/volume_mount.py 44.44% <0%> (-55.56%) ⬇️
airflow/kubernetes/volume.py 52.94% <0%> (-47.06%) ⬇️
airflow/kubernetes/pod_launcher.py 45.25% <0%> (-46.72%) ⬇️
airflow/kubernetes/refresh_config.py 50.98% <0%> (-23.53%) ⬇️
...rflow/contrib/operators/kubernetes_pod_operator.py 77.14% <0%> (-21.43%) ⬇️
airflow/configuration.py 89.13% <0%> (-3.63%) ⬇️
airflow/utils/dag_processing.py 58.15% <0%> (-0.33%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update baae140...1805099. Read the comment docs.

Copy link
Member

@kaxil kaxil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@OmerJog
Copy link
Contributor

OmerJog commented Nov 14, 2019

@feluelle to my understanding bulk load is default to not be supported since MySQL 8 due to security reasons :
https://dev.mysql.com/doc/refman/8.0/en/load-data-local.html

We also have jira about it https://issues.apache.org/jira/browse/AIRFLOW-4051

@feluelle
Copy link
Member Author

@feluelle to my understanding bulk load is default to not be supported since MySQL 8 due to security reasons

That's true it is not enabled by default. But you can enable this by setting local_infile (for mysqlclient) or allow_local_infile (for mysql-connector-python) in your connection.

@feluelle
Copy link
Member Author

Yes, exactly what you wrote in your jira ticket.

@OmerJog
Copy link
Contributor

OmerJog commented Nov 14, 2019

@feluelle do you think it should be mentioned in the notes of the function?

@feluelle
Copy link
Member Author

I think I wouldn't do it because this can change again and again. It is really special to the mysql client library used. That's why the library is responsible for a good error message and I think it actually logs something useful :)

@feluelle
Copy link
Member Author

But I am not 100 percent sure. If you guys agree I can add something like:

.. warning:: According to the mysql docs using this function is a security risk.
        If you want to use it anyway you can do so by setting a client-side + server-side option.
        This depends on the mysql client library used. See its docs for more information.

WDYT?

@OmerJog
Copy link
Contributor

OmerJog commented Nov 14, 2019

WDYT?

You are correct.
If local_infile it will raise warning.

if not it will raise error:

ERROR 3950 (42000): Loading local data is disabled; this must be
enabled on both the client and server side

I think this PR also solves https://issues.apache.org/jira/browse/AIRFLOW-4051 but not https://issues.apache.org/jira/browse/AIRFLOW-4050

@feluelle
Copy link
Member Author

This PR does not solve 4051 - it is already solved isn't it?

@feluelle feluelle changed the title [AIRFLOW-5921] Add bulk_load_custom to MySqlHook [WIP][AIRFLOW-5921] Add bulk_load_custom to MySqlHook Nov 19, 2019
@feluelle feluelle force-pushed the feature/AIRFLOW-5921-mysql-hook-bulk_load_custom branch from 7a91fbf to 1805099 Compare November 19, 2019 11:05
@feluelle feluelle requested a review from kaxil November 19, 2019 11:06
@feluelle
Copy link
Member Author

@OmerJog PTAL

@feluelle feluelle changed the title [WIP][AIRFLOW-5921] Add bulk_load_custom to MySqlHook [AIRFLOW-5921] Add bulk_load_custom to MySqlHook Nov 19, 2019

def test_bulk_load_custom(self):
self.db_hook.bulk_load_custom(
'table',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
'table',
table='table',

What do you think about setting the params explicitly? Makes it more clear I think.

Copy link
Contributor

@OmerJog OmerJog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kaxil kaxil merged commit 4be0879 into apache:master Nov 20, 2019
kaxil pushed a commit that referenced this pull request Nov 20, 2019
eladkal pushed a commit to eladkal/airflow that referenced this pull request Dec 2, 2019
kaxil pushed a commit that referenced this pull request Dec 12, 2019
@feluelle feluelle deleted the feature/AIRFLOW-5921-mysql-hook-bulk_load_custom branch April 30, 2020 07:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants