-
Notifications
You must be signed in to change notification settings - Fork 62
Dataframe.head() #363
Dataframe.head() #363
Conversation
|
Hello @1e-to! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2019-12-16 14:02:14 UTC |
| return pandas.Series(data=result_data, index=result_index) | ||
|
|
||
| return sdc_pandas_dataframe_count_impl | ||
| if not sdc.config.use_default_dataframe: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this variable is needed here.
You can just delete implementation of method count(or keep it commented if you need this).
| return sdc_pandas_dataframe_count_impl | ||
|
|
||
| else: | ||
| def sdc_pandas_dataframe_reduce_columns(df, name, param): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why you need this extra function? Could you please put this realization into @overload_method(DataFrameType, 'head')?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do this to prevent code duplication for reduce dataframe column,
the general function for reducing the columns of a data frame is described here:
| def sdc_pandas_dataframe_reduce_columns(df, name, param): |
| ty_checker.raise_exc(n, 'integer', 'n') | ||
|
|
||
| @overload_method(DataFrameType, 'head') | ||
| def median_overload(df, n=5): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| def median_overload(df, n=5): | |
| def sdc_pandas_dataframe_head(df, n=5): |
|
|
||
| return _reduce_impl | ||
|
|
||
| def check_type(name, df, axis=None, skipna=None, level=None, numeric_only=None, ddof=1, min_count=0): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it a common function for checking all parameters? How is it related to df.head()?
| return _reduce_impl | ||
|
|
||
| def check_type(name, df, axis=None, skipna=None, level=None, numeric_only=None, ddof=1, min_count=0): | ||
| def check_type(name, df, axis=None, skipna=None, level=None, numeric_only=None, ddof=1, min_count=0, n=5): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it a specific function for head method? May be it is not necessary to separate it in one case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its common func for all cases that write in it
| def test_df_fillna1(self): | ||
| def test_impl(df): | ||
| return df.fillna(5.0) | ||
| return df.fillna(0.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason to change it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its has attention to other PR, I forget delete this
| "STRING": ['a', 'dd', 'c', '12', 'ddf']}) | ||
| pd.testing.assert_frame_equal(sdc_func(df), test_impl(df)) | ||
|
|
||
| def test_dataframe_head1(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if Index is set explicitly, this solution is ok for that case? I don't see tests on it
| name = 'head' | ||
|
|
||
| check_type(name, df, n=n) | ||
| check_type(name, df) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is overhead here
| if len(params) > 0: | ||
| space.append(', ') | ||
| func_definition = 'def _reduce_impl(df{}{}):'.format("".join(space), ", ".join( | ||
| str(key) + '=' + str(value) for key, value in params)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use formatted strings or f-strings instead of concatenation?
str(key) + '=' + str(value) -> f'{key}={value}' or '{}={}'.format(key, value)
| space = [] | ||
| if len(params) > 0: | ||
| space.append(', ') | ||
| func_definition = 'def _reduce_impl(df{}{}):'.format("".join(space), ", ".join( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would propose to do so:
all_params = ['df'] + [f'{key}={value}' for key, value in params]
func_definition = ', '.join(all_params)
| if not (isinstance(n, (types.Omitted, types.Integer)) or n == 5): | ||
| ty_checker.raise_exc(n, 'int64', 'n') | ||
|
|
||
| params = [('n', 5)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could pass [('n', 5)] to the sdc_pandas_dataframe_reduce_columns_series directly without creating variable params.
| n_cols = len(saved_columns) | ||
| data_args = tuple('data{}'.format(i) for i in range(n_cols)) | ||
| all_params = ['df'] + [f'{key}={value}' for key, value in params] | ||
| func_definition = "def _reduce_impl(" + ', '.join(all_params) + "):" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use formatted strings or f-strings instead of concatenation.
func_definition = 'def _reduce_impl({}):'.format(', '.join(all_params))
| line = ' {} = sdc.hiframes.api.init_series(sdc.hiframes.pd_dataframe_ext.get_dataframe_data(df, {}))' | ||
| func_lines.append(line.format(d + '_S', i)) | ||
| func_lines.append(' {} = {}.{}({})'.format(d + '_O', d + '_S', name, ", ".join( | ||
| str(key) for key, value in params))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'{} = {}.{}({})'.format(d + '_O', d + '_S' -> '{}_O = {}_S.{}({})'.format(d, d
str(key) -> key
value -> _
| str(key) for key, value in params))) | ||
| func_lines.append(" return sdc.hiframes.pd_dataframe_ext.init_dataframe({}, None, {})\n".format( | ||
| ", ".join(d + '_O._data' for d in data_args), | ||
| ", ".join("'" + c + "'" for c in saved_columns))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use formatted strings or f-strings. E.g. f"'{c}'"
|
|
||
| return _reduce_impl | ||
|
|
||
| def check_type(name, df, axis=None, skipna=None, level=None, numeric_only=None, ddof=1, min_count=0): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is check_type used somewhere?
| func_lines.append(' {}_O = {}_S.{}({})'.format(d, d, name, ", ".join( | ||
| key for key, _ in params))) | ||
| func_lines.append(" return sdc.hiframes.pd_dataframe_ext.init_dataframe({}, None, {})\n".format( | ||
| ", ".join(d + '_O._data' for d in data_args), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You forgot to use f-string here.
| func_lines.append(line.format(d + '_S', i)) | ||
| func_lines.append(' {}_O = {}_S.{}({})'.format(d, d, name, ", ".join( | ||
| key for key, _ in params))) | ||
| func_lines.append(" return sdc.hiframes.pd_dataframe_ext.init_dataframe({}, None, {})\n".format( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you exactly need \n at the end?
| func_lines = [func_definition] | ||
| for i, d in enumerate(data_args): | ||
| line = ' {} = sdc.hiframes.api.init_series(sdc.hiframes.pd_dataframe_ext.get_dataframe_data(df, {}))' | ||
| func_lines.append(line.format(d + '_S', i)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use f-string or formatted string instead of concatenation.
No description provided.