Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add replace option to hive csv upload #9764

Merged
merged 1 commit into from Jun 10, 2020

Conversation

etr2460
Copy link
Member

@etr2460 etr2460 commented May 7, 2020

CATEGORY

Choose one

  • Bug Fix
  • Enhancement (new features, refinement)
  • Refactor
  • Add tests
  • Build / Development Environment
  • Documentation

SUMMARY

Thanks to @villebro for helping start this PR.

Adds the replace option to hive csv uploads

TEST PLAN

CI

  • upload to a non existent table successfully with fail or replace selected
  • upload with append selected and see correct error message
  • upload to an existing table with fail selected and see correct error message
  • upload to an existing table with replace selected and see the table be replaced

ADDITIONAL INFORMATION

REVIEWERS

to: @villebro @john-bodley @serenajiang

engine = cls.get_engine(database)

if if_exists == "replace":
engine.execute(f"DROP TABLE IF EXISTS {full_table_name}")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks ripe for sql injection, but honestly, the existing lines below are as well. This is blocked on enabling the csv upload feature to a datasource, so that security might be ok for now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a few ways around this, but for now I think we can leave these as-is. Perhaps add a TODO here so we can sweep them all up in one go later?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is scary. Let's see if we can explore some of the possible solutions for this to get around adding another instance of SQL injection vulnerability.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this might not be possible as per https://stackoverflow.com/a/43879809?

@codecov-io
Copy link

codecov-io commented May 7, 2020

Codecov Report

Merging #9764 into master will decrease coverage by 0.01%.
The diff coverage is 11.11%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9764      +/-   ##
==========================================
- Coverage   70.81%   70.79%   -0.02%     
==========================================
  Files         586      586              
  Lines       30445    30453       +8     
  Branches     3121     3121              
==========================================
+ Hits        21559    21560       +1     
- Misses       8772     8779       +7     
  Partials      114      114              
Flag Coverage Δ
#cypress 53.71% <ø> (-0.01%) ⬇️
#javascript 59.06% <ø> (ø)
#python 70.91% <11.11%> (-0.03%) ⬇️
Impacted Files Coverage Δ
superset/db_engine_specs/hive.py 45.00% <11.11%> (-1.13%) ⬇️
superset-frontend/src/SqlLab/actions/sqlLab.js 66.81% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5b430ea...269fb1b. Read the comment docs.

@etr2460 etr2460 force-pushed the erik-ritter--hive-csv-upload branch 2 times, most recently from 8eba5dc to f10cc01 Compare May 13, 2020 21:18
@willbarrett
Copy link
Member

It would be great to add some tests for this. We see a lot of bugs crop up in db_engine_spec.

@etr2460 etr2460 force-pushed the erik-ritter--hive-csv-upload branch from f10cc01 to a36f145 Compare June 9, 2020 17:26
@pull-request-size pull-request-size bot added size/M and removed size/S labels Jun 9, 2020
@codecov-commenter
Copy link

codecov-commenter commented Jun 9, 2020

Codecov Report

Merging #9764 into master will decrease coverage by 0.00%.
The diff coverage is 44.44%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9764      +/-   ##
==========================================
- Coverage   70.41%   70.41%   -0.01%     
==========================================
  Files         585      585              
  Lines       31056    31066      +10     
  Branches     3277     3277              
==========================================
+ Hits        21869    21875       +6     
- Misses       9076     9080       +4     
  Partials      111      111              
Flag Coverage Δ
#cypress 53.93% <ø> (ø)
#javascript 59.36% <ø> (ø)
#python 69.99% <44.44%> (-0.01%) ⬇️
Impacted Files Coverage Δ
superset/db_engine_specs/hive.py 48.48% <44.44%> (+0.05%) ⬆️
superset/viz.py 57.20% <0.00%> (+0.05%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 526ee3c...b0bca50. Read the comment docs.

@etr2460 etr2460 force-pushed the erik-ritter--hive-csv-upload branch from a36f145 to 352ea06 Compare June 9, 2020 19:09
@etr2460 etr2460 force-pushed the erik-ritter--hive-csv-upload branch from 352ea06 to b0bca50 Compare June 9, 2020 19:18
@etr2460
Copy link
Member Author

etr2460 commented Jun 9, 2020

@john-bodley @villebro @willbarrett, this is working and ready for review now.

Unfortunately, it doesn't look like sqlalchemy supports params for structural components of sql (as noted in the stack overflow comment) so I don't think there's anything i can do about this. I've also added one small test to hive_tests, but since this is all untested already (and relies on an s3 url and a bunch of other stuff) I'm not sure how else to add tests here

# ensure table doesn't already exist
if (
if_exists == "fail"
and not database.get_df(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using Pandas seems a little heavy handed to simply check if the number of records is non-zero, though I guess it's fewer lines than having to use a cursor etc. and this code is probably rarely executed.

Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Given that this feature is currently broken in many respects, I think this is a big improvement despite some shortcomings 👍

@etr2460 etr2460 merged commit e17da58 into apache:master Jun 10, 2020
auxten pushed a commit to auxten/incubator-superset that referenced this pull request Nov 20, 2020
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.37.0 labels Feb 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/M 🚢 0.37.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants