Skip to content
This repository was archived by the owner on Feb 2, 2024. It is now read-only.

Overload df.getitem with idx of tuple#573

Merged
AlexanderKalistratov merged 2 commits intoIntelPython:masterfrom
densmirn:feature/df_getitem
Feb 9, 2020
Merged

Overload df.getitem with idx of tuple#573
AlexanderKalistratov merged 2 commits intoIntelPython:masterfrom
densmirn:feature/df_getitem

Conversation

@densmirn
Copy link
Copy Markdown
Contributor

@densmirn densmirn commented Feb 6, 2020

No description provided.

Comment on lines +1245 to +1246
# SDC pd.df.getitem does not support idx as a list
sdc_func = self.jit(lambda df: df[('A', 'C')])
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should state this as a limitation in our documentation too.
Also it might look better if you write it this way:

        test_impl = lambda df: df[['A', 'C']]
        sdc_func = self.jit(lambda df: df[('A', 'C')])

or use do_jit parameter, e.g.

      # Pandas implementation does not support idx as a tuple
       # but SDC implementation only supports tuple, so adjust tested code
       def test_impl(df, do_jit):
           if do_jit == True:  # noqa
               return df[('A', 'C')]
           else:
               return df[['A', 'C']]
       sdc_func = self.jit(test_impl)

       pd.testing.assert_frame_equal(sdc_func(df, True), test_impl(df, False))

func_lines += df_getitem_tuple_idx_main_codelines(self, literal_idx)
else:
# raise KeyError if input DF is empty or idx is invalid
func_lines += [' raise KeyError']
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exception should really have a message, maybe something like "Column is not in the DataFrame" would fit?

return func_text, global_vars


def gen_df_getitem_impl_generator(codegen, impl_name):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks as a very generic function, that we could use everywhere else, so it might be useful to move it to sdc_typing_utils. So it's responsibility would be to call exec on provided func_code and return back created func object.


return _df_getitem_unicode_idx_impl

if isinstance(idx, types.Tuple):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be a more strict check, e.g.

Suggested change
if isinstance(idx, types.Tuple):
if (isinstance(idx, types.Tuple)
and all([isinstance(item, types.StringLiteral) for item in idx])):

Copy link
Copy Markdown
Contributor

@kozlov-alexey kozlov-alexey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good! No major comments from me, only minor ones, that might be applied later on.

@AlexanderKalistratov AlexanderKalistratov merged commit 7fc0df6 into IntelPython:master Feb 9, 2020
@densmirn densmirn deleted the feature/df_getitem branch June 9, 2020 12:12
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants