Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update pandas and fix load_fraud() #486

Merged
merged 17 commits into from Mar 13, 2020
Merged

Update pandas and fix load_fraud() #486

merged 17 commits into from Mar 13, 2020

Conversation

jeremyliweishih
Copy link
Contributor

@jeremyliweishih jeremyliweishih commented Mar 12, 2020

Fixes #322.

@jeremyliweishih jeremyliweishih changed the title Update pandas and fix load_fraud() [WIP] Update pandas and fix load_fraud() Mar 12, 2020
requirements.txt Outdated
@@ -2,7 +2,7 @@ scipy>=1.2.1
scikit-learn>=0.21.3,!=0.22
dask[complete]>=2.1.0
Copy link
Contributor Author

@jeremyliweishih jeremyliweishih Mar 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If tests pass we should think about removing Dask since i believe load_data() requires it currently.

Copy link
Collaborator

@dsherry dsherry Mar 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I'm working on that in #315

Copy link
Collaborator

@dsherry dsherry Mar 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like we'll try to do that in this PR. Awesome!

@jeremyliweishih jeremyliweishih self-assigned this Mar 12, 2020

labels = [label] + (drop or [])
y = feature_matrix[label]
X = feature_matrix.drop(columns=labels)
Copy link
Collaborator

@dsherry dsherry Mar 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeremyliweishih nice, you beat me to it RE #315 !

Just for clarity, are these changes needed to support pandas 1.0.0? Or is this just something you wanted to do? I'm on board either way, although technically if it were the latter, it should be in a separate PR.

Copy link
Contributor Author

@jeremyliweishih jeremyliweishih Mar 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we support pandas 1.0.0 it introduces a future warning when importing evalml since Dask hasn't properly silenced it yet. I think it should be in this PR!

Copy link
Collaborator

@dsherry dsherry Mar 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, cool, then yeah I'm on board with deleting the dask dependency in requirements.txt too!

@@ -7,6 +7,7 @@ Changelog
* Fixes
* Changes
* Undo version cap in XGBoost placed in :pr:`402` and allowed all released of XGBoost :pr:`407`
* Remove version cap on Pandas :pr:`486`
Copy link
Collaborator

@dsherry dsherry Mar 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we could say Support pandas 1.0.0 instead?

@codecov
Copy link

codecov bot commented Mar 12, 2020

Codecov Report

Merging #486 into master will increase coverage by 0.19%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #486      +/-   ##
==========================================
+ Coverage   98.22%   98.42%   +0.19%     
==========================================
  Files         104      104              
  Lines        3437     3427      -10     
==========================================
- Hits         3376     3373       -3     
+ Misses         61       54       -7
Impacted Files Coverage Δ
...tests/guardrail_tests/test_detect_label_leakage.py 100% <ø> (ø) ⬆️
evalml/guardrails/utils.py 95.91% <100%> (-0.09%) ⬇️
evalml/preprocessing/utils.py 100% <100%> (+16.27%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4ed21d9...3f101d7. Read the comment docs.

@jeremyliweishih jeremyliweishih changed the title [WIP] Update pandas and fix load_fraud() Update pandas and fix load_fraud() Mar 12, 2020
@jeremyliweishih jeremyliweishih requested review from dsherry and kmax12 Mar 12, 2020
labels = [label] + (drop or [])
y = feature_matrix[label].compute()
X = feature_matrix.drop(labels=labels, axis=1).compute()
feature_matrix = pd.read_csv(path, index_col=index, nrows=n_rows, **kwargs)
Copy link
Contributor

@kmax12 kmax12 Mar 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, we used dask to support loading globs. i think it's fine to remove that, but we could update the doc string above for path

Copy link
Contributor Author

@jeremyliweishih jeremyliweishih Mar 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah got it, I'll update the docstring!

Copy link
Contributor

@kmax12 kmax12 Mar 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that we no long support files. It has to be a single file

labels = [label] + (drop or [])
y = feature_matrix[label].compute()
X = feature_matrix.drop(labels=labels, axis=1).compute()
feature_matrix = pd.read_csv(path, index_col=index, nrows=n_rows, **kwargs)
Copy link
Contributor

@kmax12 kmax12 Mar 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that we no long support files. It has to be a single file

@jeremyliweishih jeremyliweishih requested a review from kmax12 Mar 12, 2020
kmax12
kmax12 approved these changes Mar 12, 2020
@@ -1,8 +1,7 @@
scipy>=1.2.1
scikit-learn>=0.21.3,!=0.22
dask[complete]>=2.1.0
Copy link
Collaborator

@dsherry dsherry Mar 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

Copy link
Collaborator

@dsherry dsherry left a comment

Awesome!

@jeremyliweishih jeremyliweishih merged commit 14b48cc into master Mar 13, 2020
2 checks passed
@dsherry dsherry deleted the 322_pandas branch Oct 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unit test fails after upgrading to pandas 1.0.0
3 participants