This repository has been archived by the owner on Feb 2, 2024. It is now read-only.

Optimize getitem operations by checking for same indexes #800


@kozlov-alexey kozlov-alexey commented Apr 11, 2020

Some performance results (from laptop):

with fix:

name                               nthreads  type    size      median    min       max       compile   boxing
DataFrame.getitem_filter_by_value  1         Python  10000000  0.328003  0.318     0.357003  NaN       NaN
                                   1         SDC     10000000  0.34      0.287     0.404     0.715081  1.126521
                                   2         SDC     10000000  0.205     0.194     0.243     0.666005  0.958931
                                   4         SDC     10000000  0.154     0.128     0.176     0.616491  0.935159

without fix (on master):

name                               nthreads  type    size      median    min       max       compile   boxing
DataFrame.getitem_filter_by_value  1         Python  10000000  0.314004  0.311996  0.318     NaN       NaN
                                   1         SDC     10000000  3.748     3.618     4.133     0.731427  0.853143
                                   2         SDC     10000000  3.158     3.113     3.632     0.78739   0.859766
                                   4         SDC     10000000  3.06      3.007     3.454     0.813918  0.958313

For some reason, on nnlmlp01 (but not on ansatclx1004 or my laptop), SDC is 2 times slower than Python on a single thread in the same test. This needs further investigation.

with fix on nnlmlp01:

name                               nthreads  type    size      median    min       max       compile   boxing
DataFrame.getitem_filter_by_value  1         Python  10000000  0.268062  0.267576  0.270783  NaN       NaN
                                   1         SDC     10000000  0.361877  0.361485  0.362171  0.612621  1.002072
                                   2         SDC     10000000  0.213841  0.212518  0.215923  0.672137  1.019482
                                   4         SDC     10000000  0.120625  0.11833   0.125019  0.622494  1.026524
                                   8         SDC     10000000  0.075155  0.074595  0.075756  0.624533  1.027116
                                   16        SDC     10000000  0.059031  0.058086  0.075269  0.612236  1.051869
                                   28        SDC     10000000  0.054463  0.051837  0.056121  0.608134  1.074058
                                   56        SDC     10000000  0.060003  0.056724  0.062026  0.644352  1.094665
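
The idea in the PR title can be illustrated outside of SDC. The following is a minimal pure-Python sketch, not the code from this PR. It assumes that filtering by a boolean mask normally requires aligning the mask to the data's index by label, and that this alignment can be skipped when both sides share the same index object. The function `filter_by_mask` and its parameters are hypothetical names chosen for illustration.

```python
def filter_by_mask(values, index, mask, mask_index):
    """Return the values whose mask entry is True.

    Hypothetical helper: `index` labels `values`, `mask_index` labels `mask`.
    """
    if mask_index is index:
        # Fast path: both sides use the very same index object, so the
        # mask already lines up with the values position by position.
        aligned = mask
    else:
        # Slow path: align the mask to the data's index by label.
        # Labels missing from the mask are treated as False here
        # (an assumption made for this sketch only).
        lookup = dict(zip(mask_index, mask))
        aligned = [lookup.get(label, False) for label in index]
    return [v for v, keep in zip(values, aligned) if keep]


index = [0, 1, 2, 3]
values = [10, 20, 30, 40]

# Same index object: the identity check succeeds and alignment is skipped.
same = filter_by_mask(values, index, [True, False, True, False], index)

# Different index object with reordered labels: alignment by label is needed.
other = filter_by_mask(values, index, [False, True, False, True], [3, 2, 1, 0])
```

The identity check costs a single comparison regardless of the index size, whereas the label-based alignment touches every element, which is consistent with the direction of the timings above.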

@AlexanderKalistratov AlexanderKalistratov left a comment


Looks great



@lower_builtin(operator.is_, StringArrayType, StringArrayType)
def sdc_str_arr_operator_is(context, builder, sig, args):

Looks good.

We would need to implement such a method for each new series type (e.g. categorical, datetime).
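
To make the maintenance concern concrete, here is a rough pure-Python sketch of a per-type identity check. It does not use Numba or SDC. The classes `StringArray` and `CategoricalArray`, the function `same_storage`, and the conservative `False` fallback are all assumptions made for illustration and are not taken from this PR.

```python
from functools import singledispatch


class StringArray:
    """Hypothetical container wrapping a single data buffer."""
    def __init__(self, data):
        self.data = data


class CategoricalArray:
    """Hypothetical container wrapping codes plus a categories list."""
    def __init__(self, codes, categories):
        self.codes = codes
        self.categories = categories


@singledispatch
def same_storage(a, b):
    # Fallback for types without a dedicated check: report "not the same",
    # so callers would take the slow, always-correct path.
    return False


@same_storage.register(StringArray)
def _same_string_storage(a, b):
    # Two string arrays are "the same" only if they share one buffer.
    return isinstance(b, StringArray) and a.data is b.data


@same_storage.register(CategoricalArray)
def _same_categorical_storage(a, b):
    # A categorical array has two buffers, and both must be shared.
    return (
        isinstance(b, CategoricalArray)
        and a.codes is b.codes
        and a.categories is b.categories
    )


buf = ["a", "b", "c"]
shared = same_storage(StringArray(buf), StringArray(buf))
copied = same_storage(StringArray(buf), StringArray(list(buf)))
unknown = same_storage([1, 2], [1, 2])
```

In this sketch each container layout needs its own definition of identity, which is why a new series type would need its own check before it could benefit from the fast path.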

@AlexanderKalistratov AlexanderKalistratov merged commit bb625dd into IntelPython:master Apr 12, 2020