Use OrderedSet to maintain DataFrame column order through set operations #108
…on set operations This aims to maintain the original order when printing out the set of common columns as well as unique columns, both of which are based on set operations.
compare = datacompy.Compare(df1, df2, ["join"])
assert list(compare.df1_unq_columns()) == ["f", "c"]
assert list(compare.df2_unq_columns()) == ["e", "d"]
assert list(compare.intersect_columns()) == ["join", "g", "b", "h", "a"]
This example is intended to try to test edge cases (make sure the intersection is "stable" based on the left side), but doesn't necessarily reflect a realistic case. A realistic case is comparing two DataFrames where the columns are virtually identical with minimal differences, and it is in this context that maintaining the original DataFrame ordering actually becomes useful.
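To make "stable based on the left side" concrete, here is a minimal sketch in plain Python, with no external dependency. The helper names `ordered_intersection` and `ordered_difference` and the two column lists are hypothetical and chosen only to reproduce the orderings asserted in the test above; this is not the code in this PR or in datacompy.

```python
def ordered_intersection(left, right):
    """Return items of `left` that also appear in `right`, in `left`'s order."""
    right_set = set(right)  # set is used only for fast membership checks
    return [col for col in left if col in right_set]


def ordered_difference(left, right):
    """Return items of `left` that do not appear in `right`, in `left`'s order."""
    right_set = set(right)
    return [col for col in left if col not in right_set]


# Hypothetical column orders (assumed for illustration; the actual test
# DataFrames are not shown in this thread).
df1_columns = ["join", "f", "g", "b", "h", "a", "c"]
df2_columns = ["e", "h", "b", "a", "g", "d", "join"]

common = ordered_intersection(df1_columns, df2_columns)
only_df1 = ordered_difference(df1_columns, df2_columns)
only_df2 = ordered_difference(df2_columns, df1_columns)

print(common)    # follows df1's ordering, not df2's
print(only_df1)
print(only_df2)
```

Because each result is built by iterating over the left operand, the right operand's ordering never influences the output, which is the property the edge-case test is checking.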
Thanks for the PR @gandhis1. @KrishanBhasin, your previous PR #107 partially fixes this, correct? From what I am seeing, this is just a variation on that implementation? In principle I'm fine with the changes; it's more the addition of another dependency that I'd like to discuss. The library was last touched in June 2020. @jborchma @elzzhu thoughts?
@gandhis1 Would you be open to implementing this without an external dependency? Maybe a
So we can copy in the source code here, which removes the PyPI package dependency. The library is under the MIT license, which, as I understand it, means we just need a copyright and license notice included in the source file. The challenge with using
Revisiting this: is there any issue if I literally copy the original source code, which is a single file covered under the MIT license, into this repository? The MIT license does not affect the license of any other code in this repository, as long as the original license is included with the file.
@SanthiSridharan sorry for the extended delay on this. Been really busy.
LGTM
@fdosani @KrishanBhasin
Fixes #81. Iteration on #107. This aims to maintain the original order when printing out the set of common columns as well as unique columns, both of which are based on set operations.
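As a rough illustration of the idea, an order-preserving container with `&` and `-` operators can be sketched on top of a plain `dict`, which keeps insertion order in Python 3.7+. The class name `OrderedColumns` and the sample column names are hypothetical; this is neither the OrderedSet library discussed in this thread nor datacompy's implementation.

```python
class OrderedColumns:
    """Minimal insertion-ordered, duplicate-free container (illustrative only)."""

    def __init__(self, items=()):
        # dict keys preserve insertion order and drop duplicates
        self._items = dict.fromkeys(items)

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def __contains__(self, item):
        return item in self._items

    def __and__(self, other):
        # Intersection: iterate over self so the left operand fixes the order
        return OrderedColumns(x for x in self if x in other)

    def __sub__(self, other):
        # Difference: items of self missing from other, in self's order
        return OrderedColumns(x for x in self if x not in other)

    def __repr__(self):
        return f"OrderedColumns({list(self._items)!r})"


# Two nearly identical column layouts (made-up names for illustration)
left = OrderedColumns(["id", "name", "amount", "date"])
right = OrderedColumns(["date", "id", "amount", "currency"])

print(left & right)   # common columns, in left's order
print(left - right)   # columns only in left
print(right - left)   # columns only in right
```

The operators are written out explicitly rather than inherited, so that the iteration order is visibly driven by the left operand.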