Unpin pandas version
#1708
Codecov Report
@@ Coverage Diff @@
## main #1708 +/- ##
=======================================
Coverage 100.0% 100.0%
=======================================
Files 273 273
Lines 22356 22356
=======================================
Hits 22350 22350
Misses 6 6
bchen1116
left a comment
If I'm understanding this correctly, pandas' value_counts method now first orders values by frequency (in descending order), and then breaks ties by order of occurrence in the data?
Does that mean in your example in the description, currency_NAD, currency_HTG, and currency_PAB occur as frequently as currency_MOP, currency_MUR, currency_NIS in the new encoding, but that the first 3 occur AFTER the second 3 in the original input data?
Just wanted to check I am understanding this properly.
@bchen1116 Correct!
bchen1116
left a comment
@angela97lin got it! looks good to me then!
Closes #1618
This change will break our tests that use an older pandas version, aka break our conda package build 📦 🤯
A lot of the test changes have to do with how pandas now handles values returned in value_counts(). Now, if two values occur the same number of times, they are stored in the order in which they appeared in the original data. Ex: [5, 1, 6, 5, 1] will be [5, 1, 6] :D
RE prediction explanation tests: confirmed by checking the transformed cols from the OHE that this is the reason for the changes. The new columns are:
Notably, currency_NAD, currency_HTG, and currency_PAB exist.
Using pandas 1.1.5, we get:
This aligns with the reason why the tests failed, as the columns chosen for the OHE have changed (due to the behavior of value_counts changing).
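For anyone who wants to see the effect in isolation, here is a minimal pure-Python sketch of the tie-breaking described above. It emulates the ordering with collections.Counter rather than calling pandas, so it illustrates the behavior but is not pandas' implementation. The helper names and the currency data are made up for illustration, and the alphabetical tie-break is just an arbitrary contrasting rule, not a claim about what pandas 1.1.5 did.

```python
from collections import Counter


def top_n_first_seen(values, n):
    """Hypothetical helper: top-n values by frequency, ties kept in first-seen order."""
    # Counter preserves insertion order and most_common() sorts stably,
    # so equally frequent values stay in the order they first appeared.
    return [value for value, _ in Counter(values).most_common()[:n]]


def top_n_alphabetical(values, n):
    """Hypothetical contrast: same frequencies, but ties broken alphabetically."""
    counts = Counter(values)
    return sorted(counts, key=lambda v: (-counts[v], v))[:n]


# The example from the description: 5 and 1 tie, and 5 appears first.
order = top_n_first_seen([5, 1, 6, 5, 1], 3)

# Made-up data: three currencies tie for the second slot.
currencies = ["USD", "NAD", "USD", "MOP", "HTG"]
first_seen = top_n_first_seen(currencies, 2)      # keeps NAD (seen first)
alphabetical = top_n_alphabetical(currencies, 2)  # keeps HTG instead

print(order, first_seen, alphabetical)
```

When several categories tie at the cutoff, the tie-breaking rule alone decides which ones make the top n. That is why the one-hot encoded columns can differ even though the input data is unchanged.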