feat: add replace option to hive csv upload #9764
superset/db_engine_specs/hive.py
Outdated
engine = cls.get_engine(database)

if if_exists == "replace":
    engine.execute(f"DROP TABLE IF EXISTS {full_table_name}")
This looks ripe for SQL injection, but honestly, the existing lines below are as well. This is gated behind enabling the CSV upload feature for a datasource, so the security might be OK for now?
There are a few ways around this, but for now I think we can leave these as-is. Perhaps add a TODO here so we can sweep them all up in one go later?
Yeah, this is scary. Let's explore some of the possible solutions so that we avoid adding another SQL injection vulnerability.
Looks like this might not be possible as per https://stackoverflow.com/a/43879809?
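Since bound parameters cannot stand in for table names, one common fallback is to validate the identifier against a strict allowlist before interpolating it. A minimal sketch of that idea follows; `safe_table_name` and `drop_table_sql` are hypothetical helpers, not part of Superset, and the sketch assumes that only plain letters, digits, and underscores need to be supported in schema and table names.

```python
import re

# Assumption: only plain identifiers (letters, digits, underscores) are allowed.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def safe_table_name(table, schema=None):
    """Return a backtick-quoted, optionally schema-qualified table name.

    Hypothetical helper: raises ValueError for anything that is not a
    plain identifier, so no quoting or statement characters get through.
    """
    parts = [schema, table] if schema else [table]
    for part in parts:
        if not _IDENTIFIER.fullmatch(part):
            raise ValueError(f"Unsafe identifier: {part!r}")
    return ".".join(f"`{part}`" for part in parts)


def drop_table_sql(table, schema=None):
    """Build the DROP statement only from validated identifiers."""
    return f"DROP TABLE IF EXISTS {safe_table_name(table, schema)}"
```

With this approach a name such as `uploads` passes through unchanged apart from quoting, while anything containing spaces, semicolons, or backticks is rejected before it reaches the engine.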
Codecov Report
@@ Coverage Diff @@
## master #9764 +/- ##
==========================================
- Coverage 70.81% 70.79% -0.02%
==========================================
Files 586 586
Lines 30445 30453 +8
Branches 3121 3121
==========================================
+ Hits 21559 21560 +1
- Misses 8772 8779 +7
Partials 114 114
Continue to review full report at Codecov.
8eba5dc to f10cc01
It would be great to add some tests for this. We see a lot of bugs crop up in db_engine_spec.
f10cc01 to a36f145
Codecov Report
@@ Coverage Diff @@
## master #9764 +/- ##
==========================================
- Coverage 70.41% 70.41% -0.01%
==========================================
Files 585 585
Lines 31056 31066 +10
Branches 3277 3277
==========================================
+ Hits 21869 21875 +6
- Misses 9076 9080 +4
Partials 111 111
Continue to review full report at Codecov.
a36f145 to 352ea06
352ea06 to b0bca50
@john-bodley @villebro @willbarrett, this is working and ready for review now. Unfortunately, it doesn't look like SQLAlchemy supports bound parameters for structural components of SQL such as table names (as noted in the Stack Overflow answer), so I don't think there's anything I can do about this. I've also added one small test to hive_tests, but since this code path is currently untested (and relies on an S3 URL and a bunch of other setup), I'm not sure how else to add tests here.
# ensure table doesn't already exist
if (
    if_exists == "fail"
    and not database.get_df(
Using Pandas seems a little heavy handed to simply check if the number of records is non-zero, though I guess it's fewer lines than having to use a cursor etc. and this code is probably rarely executed.
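For comparison, a cursor-based existence check can be sketched as below. This is only an illustration: it uses the standard-library `sqlite3` module and the `sqlite_master` catalog as a stand-in for Hive, and `table_exists` is a hypothetical helper. The actual Hive query is not visible in the truncated diff above, so nothing here reflects the PR's real implementation.

```python
import sqlite3


def table_exists(conn, table_name):
    """Return True if a table with this name exists.

    Fetches at most one row from the cursor instead of loading the
    whole result set into a DataFrame.
    """
    cursor = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ? LIMIT 1",
        (table_name,),
    )
    return cursor.fetchone() is not None


# Stand-in database for the example (SQLite, not Hive).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE uploads (id INTEGER)")
```

Here the table name is a value being compared against a catalog column, so it can be passed as a bound parameter, unlike the table name in a `DROP TABLE` statement.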
LGTM. Given that this feature is currently broken in many respects, I think this is a big improvement despite some shortcomings 👍
CATEGORY
Choose one
SUMMARY
Thanks to @villebro for helping start this PR.
Adds the replace option to Hive CSV uploads.
TEST PLAN
CI
ADDITIONAL INFORMATION
REVIEWERS
to: @villebro @john-bodley @serenajiang