Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix csv upload schema_name bug #5940

Merged
merged 1 commit into from
Sep 20, 2018

Conversation

timifasubaa
Copy link
Contributor

@timifasubaa timifasubaa commented Sep 20, 2018

This PR fixes a few bugs in the upload CSV to hive flow. Most notably in the specification of the schema in the create table statement. Previously, we would accidentally reset the table_name variable back to just the table name after appending the schema. This PR fixes that.

@john-bodley
cc @graceguo-supercat

@timifasubaa timifasubaa force-pushed the fix_csv_schema_bug branch 3 times, most recently from 8f75ecc to 3a730be Compare September 20, 2018 07:14
@timifasubaa timifasubaa changed the title fix csv upload bugs fix csv upload schema_name bug Sep 20, 2018

hive_table_schema = Table(upload_path).infer()
column_name_and_type = []
for column_info in hive_table_schema['fields']:
column_name_and_type.append(
'{} {}'.format(
"'" + column_info['name'] + "'",
'`' + column_info['name'] + '`',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can these backticks be included in the format statement to improve readabily?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

upload_path = config['UPLOAD_FOLDER'] + \
secure_filename(form.csv_file.data.filename)
secure_filename(filename)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why change this line? It seems like filename is merely used as a variable alias and only appears once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's used once more and the variable name is long. This looks cleaner to me

table_name = form.name.data
schema_name = form.schema.data
full_table_name = ''
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed given it’s guaranteed to be defined in the if/else block and is scoped for the whole of the function.


filename = form.csv_file.data.filename
bucket_path = config['CSV_TO_HIVE_UPLOAD_S3_BUCKET']
full_table_name = '{}.{}'.format(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize this is outside the scope of this PR but frontend form validation to prevent a period in the table name and enforcing a schema seems like a good idea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I will follow up with adding frontend validation

filename = form.csv_file.data.filename
bucket_path = config['CSV_TO_HIVE_UPLOAD_S3_BUCKET']
full_table_name = '{}.{}'.format(
schema_name, table_name) if schema_name else table_name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the schema_name is None and the table_name doesn’t include a period? Does this simply create the table in the default schema? Again I sense improving the form validation could simplify the logic and improve the UX.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct. I will work on improving the ux with some form validation

@codecov-io
Copy link

codecov-io commented Sep 20, 2018

Codecov Report

Merging #5940 into master will increase coverage by <.01%.
The diff coverage is 0%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master   #5940      +/-   ##
=========================================
+ Coverage   48.19%   48.2%   +<.01%     
=========================================
  Files         393     393              
  Lines       23600   23597       -3     
  Branches     2630    2630              
=========================================
  Hits        11375   11375              
+ Misses      12225   12222       -3
Impacted Files Coverage Δ
superset/db_engine_specs.py 55.7% <0%> (+0.24%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 71f014e...d5ddb55. Read the comment docs.

@timifasubaa timifasubaa merged commit 00c4c7e into apache:master Sep 20, 2018
timifasubaa added a commit to airbnb/superset-fork that referenced this pull request Sep 20, 2018
(cherry picked from commit 00c4c7e)
timifasubaa added a commit to airbnb/superset-fork that referenced this pull request Sep 20, 2018
(cherry picked from commit 00c4c7e)
betodealmeida pushed a commit to lyft/incubator-superset that referenced this pull request Oct 12, 2018
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.28.0 labels Feb 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.28.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants