Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support delimiters in hive csv upload #9971

Merged

Conversation

serenajiang
Copy link
Contributor

@serenajiang serenajiang commented Jun 3, 2020

SUMMARY

Previously, hive csv upload only supported , as a delimiter. This is confusing for users because the UI allows you to specify a delimiter.

Altered to allow other delimiters.

TEST PLAN

Tested csv + tsv uploads.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Requires DB Migration.
  • Confirm DB Migration upgrade and downgrade tested.
  • Introduces new feature or API
  • Removes existing feature or API

@etr2460 @bkyryliuk @john-bodley

@serenajiang serenajiang changed the title [csv upload][hive] support other delimiters feat: support regex delimiters in hive csv upload Jun 3, 2020
@codecov-commenter
Copy link

codecov-commenter commented Jun 3, 2020

Codecov Report

Merging #9971 into master will decrease coverage by 0.00%.
The diff coverage is 25.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9971      +/-   ##
==========================================
- Coverage   68.90%   68.89%   -0.01%     
==========================================
  Files         584      584              
  Lines       31055    31056       +1     
  Branches     3180     3180              
==========================================
- Hits        21397    21396       -1     
- Misses       9549     9551       +2     
  Partials      109      109              
Flag Coverage Δ
#cypress 53.91% <ø> (-0.01%) ⬇️
#javascript 59.48% <ø> (ø)
#python 67.34% <25.00%> (-0.01%) ⬇️
Impacted Files Coverage Δ
superset/db_engine_specs/hive.py 48.27% <25.00%> (-0.21%) ⬇️
.../src/dashboard/components/gridComponents/Chart.jsx 88.76% <0.00%> (-1.13%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d1588c7...b10fb8a. Read the comment docs.

@serenajiang serenajiang changed the title feat: support regex delimiters in hive csv upload feat: support delimiters in hive csv upload Jun 5, 2020
@pull-request-size pull-request-size bot added size/S and removed size/XS labels Jun 5, 2020
@serenajiang serenajiang force-pushed the serena-hive-csv-upload-delimiters branch from 4d93b9c to 26201b4 Compare June 5, 2020 20:30
sql = f"""CREATE TABLE {str(table)} ( {schema_definition} )
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS
TEXTFILE LOCATION '{location}'
sql = text(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note text is used for language agnostic :params.

engine.execute(sql)
engine.execute(
sql,
delim=csv_to_df_kwargs['sep'].encode().decode('unicode_escape'),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Required to escape \\t as \t.

@serenajiang
Copy link
Contributor Author

@bkyryliuk Used params as suggested. Switched back to the original row format (single character delimiter) because unicode made it hard to use complex delimiters.

@serenajiang
Copy link
Contributor Author

ping @bkyryliuk

Copy link
Member

@bkyryliuk bkyryliuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@serenajiang looks good, please fix the linter

@serenajiang serenajiang force-pushed the serena-hive-csv-upload-delimiters branch from d6a0571 to b10fb8a Compare June 10, 2020 23:53
@bkyryliuk bkyryliuk merged commit 8744dad into apache:master Jun 11, 2020
auxten pushed a commit to auxten/incubator-superset that referenced this pull request Nov 20, 2020
Co-authored-by: serena-jiang <serena.jiang@airbnb.com>
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.37.0 labels Mar 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/S 🚢 0.37.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants