Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Add 'is_' method to Index for identity checks #4909

Merged
merged 1 commit into from Sep 22, 2013

Conversation

jtratner
Copy link
Contributor

Closes #4859.

Adds a method that can quickly tell whether something is a view or not
(in the way that actually matters). Also changes checks in equals() to
use is_ rather than is.

@jreback
Copy link
Contributor

jreback commented Sep 21, 2013

worth it to push these into one of the mixin classes so that could be used on a series/frame? I guess in that case easier just to do if df1 is df2 (as the only way they will be equal if they are actually the same object with 2 diff references)

@jtratner
Copy link
Contributor Author

@jreback it's really easy to do with index, because they always go through __new__ or __array_finalize__ and they aren't mutable, so it's very simple to get working. Series/Frame are mutable and don't make sense for this...too likely to have an edge case. Could do it with blockmanager (maybe) but I don't think it's worth it for speed boost.

@jreback
Copy link
Contributor

jreback commented Sep 21, 2013

yep....as I was writing I was like...naw!

@jtratner
Copy link
Contributor Author

One upside to this: makes equals() and identical() much faster when you're comparing and it's the same object or a view.

@jreback
Copy link
Contributor

jreback commented Sep 21, 2013

so we need identical then? which is equiv to equals & is_?

@jreback
Copy link
Contributor

jreback commented Sep 21, 2013

I guess though could be identical but not a view...duh.....hmm

@cpcloud
Copy link
Member

cpcloud commented Sep 21, 2013

vbench results:

-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
groupby_series_simple_cython                 |   4.3203 |   4.3933 |   0.9834 |
groupby_multi_cython                         |  14.0613 |  14.2470 |   0.9870 |
groupby_multi_different_functions            |  11.1340 |  11.2560 |   0.9892 |
groupby_frame_singlekey_integer              |   1.5600 |   1.5720 |   0.9924 |
groupby_multi_size                           |  24.0187 |  24.1320 |   0.9953 |
groupby_multi_different_numpy_functions      |   9.9383 |   9.9711 |   0.9967 |
groupby_multi_python                         | 115.3820 | 115.5843 |   0.9983 |
groupby_simple_compress_timing               |  20.6167 |  20.5903 |   1.0013 |
groupby_indices                              |   5.1444 |   5.1357 |   1.0017 |
groupby_last_float32                         |   2.7974 |   2.7874 |   1.0036 |
groupby_frame_median                         |   6.6326 |   6.5880 |   1.0068 |
groupby_multi_series_op                      |  12.7397 |  12.6499 |   1.0071 |
groupby_first_float32                        |   2.6823 |   2.6610 |   1.0080 |
groupby_last                                 |   2.8450 |   2.8173 |   1.0098 |
groupby_sum_booleans                         |   0.8566 |   0.8457 |   1.0130 |
groupby_first                                |   2.7236 |   2.6850 |   1.0144 |
groupby_pivot_table                          |  15.4137 |  15.1213 |   1.0193 |
groupby_transform                            | 127.4860 | 123.4310 |   1.0329 |
groupby_apply_dict_return                    |  29.0103 |  27.1187 |   1.0698 |
groupbym_frame_apply                         |  55.3214 |  45.1977 |   1.2240 |
groupby_frame_apply_overhead                 |  10.6590 |   8.5543 |   1.2460 |
groupby_frame_cython_many_columns            |   3.5886 |   2.7813 |   1.2903 |
-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------

@cpcloud
Copy link
Member

cpcloud commented Sep 21, 2013

run with ./test_perf.sh -b master -t HEAD --repeats 5 --hrepeats 10 --burnin 10 -r 'groupby'

@jtratner
Copy link
Contributor Author

So mostly okay? I can't really tell. We can squeeze a bit more performance out by removing the function call, so the line would be:

if self.index._id is res.index._id

that said, could be extra overhead from view creation too. What do you think?

@jtratner
Copy link
Contributor Author

@jreback also, for most intents and purposes, it really doesn't matter if two indices have the same names or not, we just want to know if their labels are the same. And that's pretty much the entire function of is_, to allow for more views everywhere without totally sacrificing performance/equality testing.

@jreback
Copy link
Contributor

jreback commented Sep 21, 2013

@jtratner yep...this all looks fine...perf is about the same (you might need a very specific test to pick up if you actually changing something)

@cpcloud
Copy link
Member

cpcloud commented Sep 21, 2013

i'll run vbench one more time just to see if anything changes drastically ... biggest hit is 10 ms...

@cpcloud
Copy link
Member

cpcloud commented Sep 21, 2013

orm aybe not if @jreback thinks its okay....honestly i feel unsure whenever vbench is not less than about 1.15 ish...i have no idea of the variability of the timings

@cpcloud
Copy link
Member

cpcloud commented Sep 21, 2013

end of vbench:

-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
timeseries_large_lookup_value                |   0.0246 |   0.0220 |   1.1191 |
join_dataframe_index_single_key_small        |   4.8723 |   4.3330 |   1.1245 |
timeseries_timestamp_tzinfo_cons             |   0.0126 |   0.0110 |   1.1439 |
append_frame_single_homogenous               |   0.2307 |   0.2000 |   1.1538 |
mask_bools                                   |  13.8144 |  11.9720 |   1.1539 |
frame_xs_col                                 |   0.0290 |   0.0250 |   1.1624 |
dti_reset_index                              |   0.1973 |   0.1686 |   1.1701 |
reindex_multiindex                           |   1.0610 |   0.9023 |   1.1758 |
reshape_stack_simple                         |   1.7027 |   1.4450 |   1.1784 |
groupbym_frame_apply                         |  55.8174 |  44.3370 |   1.2589 |
groupby_frame_apply_overhead                 |  10.6933 |   8.4824 |   1.2607 |
frame_reindex_both_axes                      |  17.5817 |  13.7600 |   1.2777 |
frame_fillna_many_columns_pad                |  10.9213 |   6.3963 |   1.7074 |
-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------

@jtratner
Copy link
Contributor Author

that's a weird vbench...the frame_fillna_many_columns_pad.

@jtratner
Copy link
Contributor Author

this works, but if others could take a look (@jreback @cpcloud) and see whether this is useful. Without this, not sure how to do #4202 effectively.

@jreback
Copy link
Contributor

jreback commented Sep 21, 2013

looks fine to me

Adds a method that can quickly tell whether something is a view or not
(in the way that actually matters).

CLN: change 'is' check to 'is_' check in 'equals()'
jtratner added a commit that referenced this pull request Sep 22, 2013
ENH: Add 'is_' method to Index for identity checks
@jtratner jtratner merged commit ebfb4c8 into pandas-dev:master Sep 22, 2013
@jtratner jtratner deleted the add-is_-method-to-index branch September 22, 2013 22:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: Allow fast comparisons of Index views, similar to 'is' checks.
3 participants