feat: support nulls in the csv uploads #10208
6de52fa to 3e3927c
Codecov Report
@@             Coverage Diff             @@
##           master   #10208       +/-   ##
===========================================
- Coverage   65.70%   59.44%    -6.26%
===========================================
  Files         594      404      -190
  Lines       31501    13107    -18394
  Branches     3221     3221
===========================================
- Hits        20697     7792    -12905
+ Misses      10623     5134     -5489
  Partials      181      181
4d3b949 to 0033d1a
0033d1a to d6c0a4b
{},
{"if_exists": "append"},
)
def test_0_progress():
refactored to be pytest friendly
d6c0a4b to 3201703
mostly lgtm
Are integer columns affected by this change at all? I'm not sure how we deal with null integers in csv upload.
tblproperties.append(f"'serialization.null.format'='{null_values[0]}'")
tblproperties_stmt = ""
if tblproperties:
    tblproperties_stmt = f"tblproperties ({', '.join(tblproperties)})"
Can we write this in some way that lets us use :params for the table properties?
The issue is that tblproperties is optional, which makes it quite tricky to implement; I did not figure it out. Open to ideas here.
Maybe we could make this function return Tuple[text, Dict[str, str]]. The code in the outer function would be something like:
sql, params = cls.get_create_table_stmt(...)
...
engine.execute(sql, **params)
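A minimal sketch of that suggestion, to show how an optional table property can contribute both a placeholder and a parameter. The function name `build_create_table_stmt`, its arguments, the exact DDL wording, and the sample location are hypothetical and not taken from the PR; only the `serialization.null.format` property and the `sql, params` return shape come from the discussion above. It returns a plain `str` rather than a SQLAlchemy `text` object so that it runs without extra dependencies, and whether a given Hive driver accepts bound parameters in DDL is an assumption.

```python
from typing import Dict, List, Optional, Tuple


def build_create_table_stmt(
    table_name: str,
    schema_definition: str,
    location: str,
    delim: str,
    null_values: Optional[List[str]] = None,
) -> Tuple[str, Dict[str, str]]:
    """Return a CREATE TABLE statement together with its bind parameters."""
    # Parameters that are always present.
    params: Dict[str, str] = {"delim": delim, "location": location}
    tblproperties: List[str] = []
    if null_values:
        # The optional property adds its placeholder and its value in one place,
        # so the statement and the params dict cannot drift apart.
        tblproperties.append("'serialization.null.format'=:null_value")
        params["null_value"] = null_values[0]
    tblproperties_stmt = ""
    if tblproperties:
        tblproperties_stmt = f" TBLPROPERTIES ({', '.join(tblproperties)})"
    # Identifiers and the column list cannot be bound, so they are interpolated.
    sql = (
        f"CREATE TABLE {table_name} ({schema_definition}) "
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY :delim "
        "STORED AS TEXTFILE LOCATION :location"
        f"{tblproperties_stmt}"
    )
    return sql, params


# Hypothetical table name, schema, and location, for illustration only.
sql, params = build_create_table_stmt(
    "my_table", "`a` STRING, `b` INT", "s3a://bucket/prefix/", ",", null_values=[""]
)
```

The caller can then pass `sql` and `params` on to the engine as in the snippet above, without needing to know which optional properties were included.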
@serenajiang done
def _value(self) -> str:
    return json.dumps(self.data)

def process_formdata(self, valuelist: List[str]) -> None:
Just curious - what happens if the user passes in a malformed input?
It will not allow the form to be submitted. The error message is quite cryptic, which is why the description has examples.
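To illustrate the malformed-input case, here is a rough sketch of the kind of validation a JSON-list form field needs. `parse_json_list` is a hypothetical standalone helper, not the PR's `JsonListField` and not a WTForms API; it only shows one way to turn a cryptic decode error into a message that includes an example of valid input.

```python
import json
from typing import List


def parse_json_list(raw: str) -> List[str]:
    """Parse form input that is expected to be a JSON list of strings."""
    if not raw.strip():
        # Treat an empty input as "no values configured".
        return []
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as ex:
        raise ValueError(
            f'Expected a JSON list such as ["", "None"], got {raw!r}: {ex.msg}'
        ) from ex
    if not isinstance(data, list) or not all(isinstance(item, str) for item in data):
        raise ValueError(f'Expected a JSON list of strings such as ["", "None"], got {raw!r}')
    return data
```

In a form field, a `ValueError` like this would typically be surfaced as a validation error next to the input instead of a generic failure.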
Thanks @serenajiang for the review
@@ -15,7 +15,6 @@
# limitations under the License.
#
[pytest]
addopts = -ra -q
cleaned it up to be able to use -vv flag for the verbose output
cdde17c to 08143e0
Maybe this isn't easy/possible with WTForms, but is there a way we can structure this new input so that instead of passing in a JSON encoded array, we can add individual items and they'll render as individual inputs that then get parsed correctly by the backend? Something similar to the owner fields in the crud view:
@etr2460 good suggestion, I've explored FieldList but wasn't able to get it to work. Owners are built using a many-to-many ORM mapping, which wouldn't fit this use case. Hopefully this form will be short-lived and migrated to React in the foreseeable future.
08143e0 to a225116
a225116 to 63d502a
63d502a to dba2d9b
superset/views/database/forms.py
@@ -206,3 +210,13 @@ def at_least_one_schema_is_allowed(database: Database) -> bool:
        validators=[Optional()],
        widget=BS3TextFieldWidget(),
    )
    null_values = JsonListField(
        _("Null values"),
        default=config.get("CSV_DEFAULT_NA_NAMES", []),
This should be config["CSV_DEFAULT_NA_NAMES"]
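The reviewer does not state the reason, but one plausible reading is that the key always exists in the default config, so a fallback only hides mistakes. The sketch below uses a plain dict with made-up values to show the difference between the two lookups; it is not Superset's config object.

```python
# Hypothetical config contents, for illustration only.
config = {"CSV_DEFAULT_NA_NAMES": ["", "NA", "NULL"]}

# Direct indexing returns the configured list.
default_na = config["CSV_DEFAULT_NA_NAMES"]

# With .get() and a fallback, a misspelled key silently yields an empty list.
misspelled = config.get("CSV_DEFAULT_NA_NAME", [])

# Direct indexing fails fast on the same typo.
lookup_failed = False
try:
    config["CSV_DEFAULT_NA_NAME"]
except KeyError:
    lookup_failed = True
```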
Small nit, other than that LGTM. Also needs a rebase.
Refactor
Add tests, and refactor them to be pytest friendly
Use lowercase table names
Ignore isort
dba2d9b to 3881fac
thanks for the review!
* Support more table properties for the hive upload
  Refactor
  Add tests, and refactor them to be pytest friendly
  Use lowercase table names
  Ignore isort
* Use sql params
Co-authored-by: bogdan kyryliuk <bogdankyryliuk@dropbox.com>
SUMMARY
There are use cases when users would prefer to treat empty strings in the csv as nulls in the database.
This PR makes the list of null values configurable and brings more transparency to how nulls are handled.
The default is set to match pandas. It doesn't look pretty and probably has too many values, which is why I also made it configurable.
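As a rough illustration of the behavior described above, the sketch below maps a configurable set of tokens to nulls while reading a CSV. It uses only the standard library, so it is not the pandas-based upload path in this PR; `read_csv_with_nulls` and the sample data are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, List, Optional


def read_csv_with_nulls(
    text: str, null_values: Iterable[str]
) -> List[Dict[str, Optional[str]]]:
    """Read CSV text, replacing any cell found in null_values with None."""
    null_set = set(null_values)
    reader = csv.DictReader(io.StringIO(text))
    return [
        {column: (None if value in null_set else value) for column, value in row.items()}
        for row in reader
    ]


sample = "name,city\nalice,\nbob,N/A\n"
# With empty strings and "N/A" configured as nulls, both city cells become None.
with_nulls = read_csv_with_nulls(sample, ["", "N/A"])
# With nothing configured, the cells keep their original string values.
without_nulls = read_csv_with_nulls(sample, [])
```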
Default values:
In addition to the feature, I've refactored hive_tests.py to follow pytest standards, so the tests can now be run directly via the pytest command.
TEST PLAN
Hive testing screenshots
ADDITIONAL INFORMATION