Update pandas and fix load_fraud()
#486
requirements.txt
Outdated
@@ -2,7 +2,7 @@ scipy>=1.2.1
scikit-learn>=0.21.3,!=0.22
dask[complete]>=2.1.0
If tests pass, we should think about removing Dask, since I believe load_data() requires it currently.
Yep, I'm working on that in #315
Sounds like we'll try to do that in this PR. Awesome!
labels = [label] + (drop or [])
y = feature_matrix[label]
X = feature_matrix.drop(columns=labels)
@jeremyliweishih nice, you beat me to it re #315!
Just for clarity, are these changes needed to support pandas 1.0.0? Or is this just something you wanted to do? I'm on board either way, although technically if it were the latter, it should be in a separate PR.
Supporting pandas 1.0.0 introduces a FutureWarning when importing evalml, since Dask hasn't properly silenced it yet. So I think it should be in this PR!
Ah ok, cool, then yeah I'm on board with deleting the dask dependency in requirements.txt too!
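For reference, here is a minimal sketch of what a pandas-only loader along these lines might look like. The function signature is an assumption pieced together from the names visible in the diff (path, index, label, n_rows, drop); it is not necessarily the exact code in this PR, and the CSV content is made up for illustration.

```python
import io

import pandas as pd


def load_data(path, index, label, n_rows=None, drop=None, **kwargs):
    """Hypothetical pandas-only loader: read one CSV, split features and label."""
    # pd.read_csv returns an in-memory DataFrame, so no .compute() is needed.
    feature_matrix = pd.read_csv(path, index_col=index, nrows=n_rows, **kwargs)

    # Drop the label column plus any extra columns the caller asked to exclude.
    labels = [label] + (drop or [])
    y = feature_matrix[label]
    X = feature_matrix.drop(columns=labels)
    return X, y


# Illustrative data only; a file path would work the same way as this buffer.
csv_text = "id,amount,fraud\n1,10.0,0\n2,250.0,1\n3,12.5,0\n"
X, y = load_data(io.StringIO(csv_text), index="id", label="fraud")
```

Because the result of pd.read_csv is already materialized, the label split works directly on the DataFrame without any Dask calls.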
docs/source/changelog.rst
Outdated
@@ -7,6 +7,7 @@ Changelog
* Fixes
* Changes
* Undo version cap in XGBoost placed in :pr:`402` and allowed all released of XGBoost :pr:`407`
* Remove version cap on Pandas :pr:`486`
Perhaps we could say "Support pandas 1.0.0" instead?
Codecov Report
@@ Coverage Diff @@
## master #486 +/- ##
==========================================
+ Coverage 98.22% 98.42% +0.19%
==========================================
Files 104 104
Lines 3437 3427 -10
==========================================
- Hits 3376 3373 -3
+ Misses 61 54 -7
Continue to review full report at Codecov.
labels = [label] + (drop or [])
y = feature_matrix[label].compute()
X = feature_matrix.drop(labels=labels, axis=1).compute()
feature_matrix = pd.read_csv(path, index_col=index, nrows=n_rows, **kwargs)
So, we used Dask to support loading globs. I think it's fine to remove that, but we could update the docstring above for path.
Ah got it, I'll update the docstring!
The problem is that we no longer support multiple files. It has to be a single file.
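If glob support is ever needed again without Dask, one possible workaround is to expand the pattern with the standard library and concatenate the per-file frames. This is only a sketch: read_csv_glob is a hypothetical helper, not something in this PR or in pandas, and the temporary files are made up for the demo.

```python
import glob
import os
import tempfile

import pandas as pd


def read_csv_glob(pattern, **kwargs):
    """Hypothetical helper: read every CSV matching a glob into one DataFrame."""
    # Sort so the row order is deterministic across platforms.
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"No files match pattern: {pattern}")
    frames = [pd.read_csv(p, **kwargs) for p in paths]
    return pd.concat(frames)


# Demo with two small throwaway files that share the same columns.
with tempfile.TemporaryDirectory() as tmp:
    parts = ["1,10.0,0\n2,250.0,1\n", "3,12.5,0\n4,99.0,1\n"]
    for i, rows in enumerate(parts):
        with open(os.path.join(tmp, f"part_{i}.csv"), "w") as f:
            f.write("id,amount,fraud\n" + rows)
    combined = read_csv_glob(os.path.join(tmp, "part_*.csv"), index_col="id")
```

Unlike Dask, this reads everything eagerly into memory, so it only suits datasets that fit in RAM.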
@@ -1,8 +1,7 @@
scipy>=1.2.1
scikit-learn>=0.21.3,!=0.22
dask[complete]>=2.1.0
🎉
Awesome!
Fixes #322.