-
Notifications
You must be signed in to change notification settings - Fork 373
Add isapprox support for dataframes. #2169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| function Base.isapprox(df1::AbstractDataFrame, df2::AbstractDataFrame) | ||
| size(df1, 2) == size(df2, 2) || return false | ||
| isequal(index(df1), index(df2)) || return false | ||
| Matrix(df1) ≈ Matrix(df2) || return false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the whole point of this PR is not do do conversion to Matrix as this is inefficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you suggest an alternative to matrix conversion ?
Should I align with the isequal approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. But it order to stay consistent with Base the norm should be calculated using the rules that are defined by LinearAlgebra/src/generic.jl:1588.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also probably it will be more complex than for isequal as we also have to handle missing efficiently which means that probably a barrier function would be useful there.
| Test for "approximate" equality between two DataFrames | ||
| """ | ||
|
|
||
| function Base.isapprox(df1::AbstractDataFrame, df2::AbstractDataFrame) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kwargs defined in isapprox in Base are missing here.
|
Also I think that for The reason is that this is why defining There is a corner case of columns like Also we should properly handle because the fast variant errors. In summary - if we do not cover all these corner cases in this PR I do not think it is worth to add it, because the user can then use standard Because of all these complexities (that require a very careful design) I have put "helpwanted" but not put "intro issue" in the original issue. |
Are you sure that's for performance? The fact that The question of how to handle non-numeric columns is interesting. We could use |
I was considering this but this is an unrelated thing and did not want to complicate this PR. It should be fixable in Base. In general
In Base
This is what I assumed.
This is also what I assumed, but to test these columns with As I have written above, if we do not give all these options then we do not have to add this method at all (mostly), as you can write: and use the fact that data frame fully supports broadcasting (so we hit a similar situation to (I say mostly, because then we compute elementwise norms which in some cases might be different, but in some - like for L-infinity norm - it will be the same) |
Yes, AFAICT that's the only efficient way of doing this operation anyway. I'm fine with the plan you propose. |
|
@Sov-trotter - we have settled with the design now I think, unless you have some comments on it. |
|
I think we can go with the implementation you suggest, meanwhile we should begin with a PR in |
|
Yes - these two things can probably be done in parallel. |
|
@Sov-trotter - hi, I just wanted to check with you if you still are willing to work on trying to finalize this PR? Thank you! |
|
Yeah. Extremely sorry for holding this up for too long, I am not really sure of what can be done here exactly? |
|
As discussed above:
except the comments above (i.e. non numeric columns are compared using |
|
Also this goes in |
|
Yes - I would just add |
|
Closing as #2373 is the same |
@bkamins Can you take a look at it?