New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create using Woodwork with Dask guide #304
Conversation
thehomebrewnerd
commented
Oct 22, 2020
- Create using Woodwork with Dask guide
- Partially closes Create a guide that documents limitations or special considerations when using Dask/Koalas #272
Codecov Report
@@ Coverage Diff @@
## dask-inference #304 +/- ##
================================================
Coverage 100.00% 100.00%
================================================
Files 24 24
Lines 2682 2682
================================================
Hits 2682 2682 Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just minor things. Looks good overall and helpful.
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Will bring Dask dataframe into memory\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the comment because it kind of looks weird in the code snipped. We already mentioned it above. You could bold or quote above if you want to highlight it.
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Will bring Dask dataframe into memory\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above.
] | ||
}, | ||
{ | ||
"cell_type": "code", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Now that we have a Dask dataframe, we can use this dataframe to create a Woodwork DataTable, just as we would with a pandas DataFrame:" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this sentence, we have:
- Dask dataframe
- Woodwork DataTable
- pandas DataFrame
which doesn't seem consistent from a capitalization standpoint. I think we usually capitalize Woodwork and Dask, but not pandas--is there a reason for that?
With DataFrame/DataTable, it's probably not as clear what we want to do. I think I've been trying to stick to pascal case when actually talking about the object and not capitalizing when referring to it as a word, but I'm not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've typically adopted the conventions from the respective libraries. If you look at the pandas documentation, they do not capitalize pandas, whereas Dask does in their documentation.
Regarding DataFrame/DataTable/dataframe - I will review my usage of this as I haven't been too consistent, but my thoughts are similar to yours. If we are referring to a specific object or implementation - like Dask DataFrame, we should use pascal case, but if we are referring to a more general representation that is not tied to a specific implementation, we should follow normal capitalization rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes a lot of sense to follow each library's conventions!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Went through everything and tried to clean up the capitalization. Hopefully I caught everything.
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Using Woodwork with Dask Dataframes\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dataframes
-> DataFrames
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
* dask logical type inference * add dask to requirements.txt * lint fix * refactor inference tests * more test refactors * update release notes * update test fixtures * update conftest.py * Various updates for Dask validation and testing (#260) * updates for dask validation * update release notes * comment out unused code * update time index test for dask * parameterize test series and fix tests * lint fix * add rows to sample_df and fix head * Skip Ordinal order validation for Dask dataframes (#270) * updates for dask validation * update release notes * comment out unused code * update time index test for dask * parameterize test series and fix tests * lint fix * fix ordinal test with Dask * update release notes * fix accidental change in conftest.py * fix changelog * fix changelog * Better dask datetime support (#286) * better dask datetime support * update release notes * remove test xfail * Test numeric time index with Dask input (#288) * parameterize numeric time index tests * parameterize additional tests * update numeric time index test * update pop method to work with Dask * Update DataTable.describe to work with Dask (#296) * update describe to work with Dask * update release notes * improve describe performance with Dask * fix release notes * Update DataTable.get_mutual_information() to work with Dask (#300) * update mi to work with Dask * update release notes * update test fixture * Create using Woodwork with Dask guide (#304) * add using Woodwork with Dask docs guide * update release notes * remove comments and empty cell * update dataframe wording * remove empty cell * fix title capitalization * update release notes * fix merge issues * remove unused args from to_pandas util func