Skip to content
This repository was archived by the owner on Feb 2, 2024. It is now read-only.

Conversation

@1e-to
Copy link
Contributor

@1e-to 1e-to commented Dec 3, 2019

No description provided.

@1e-to 1e-to requested a review from densmirn December 3, 2019 12:42
@pep8speaks
Copy link

pep8speaks commented Dec 3, 2019

Hello @1e-to! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 1632:1: E305 expected 2 blank lines after class or function definition, found 1
Line 1632:1: E402 module level import not at top of file

Line 1187:28: E127 continuation line over-indented for visual indent

Comment last updated at 2019-12-16 14:02:14 UTC

return pandas.Series(data=result_data, index=result_index)

return sdc_pandas_dataframe_count_impl
if not sdc.config.use_default_dataframe:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this variable is needed here.
You can just delete implementation of method count(or keep it commented if you need this).

return sdc_pandas_dataframe_count_impl

else:
def sdc_pandas_dataframe_reduce_columns(df, name, param):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why you need this extra function? Could you please put this realization into @overload_method(DataFrameType, 'head')?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do this to prevent code duplication for reduce dataframe column,
the general function for reducing the columns of a data frame is described here:

def sdc_pandas_dataframe_reduce_columns(df, name, param):

ty_checker.raise_exc(n, 'integer', 'n')

@overload_method(DataFrameType, 'head')
def median_overload(df, n=5):
Copy link
Contributor

@shssf shssf Dec 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def median_overload(df, n=5):
def sdc_pandas_dataframe_head(df, n=5):


return _reduce_impl

def check_type(name, df, axis=None, skipna=None, level=None, numeric_only=None, ddof=1, min_count=0):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a common function for checking all parameters? How is it related to df.head()?

@shssf shssf mentioned this pull request Dec 5, 2019
return _reduce_impl

def check_type(name, df, axis=None, skipna=None, level=None, numeric_only=None, ddof=1, min_count=0):
def check_type(name, df, axis=None, skipna=None, level=None, numeric_only=None, ddof=1, min_count=0, n=5):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a specific function for head method? May be it is not necessary to separate it in one case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its common func for all cases that write in it

def test_df_fillna1(self):
def test_impl(df):
return df.fillna(5.0)
return df.fillna(0.)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason to change it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its has attention to other PR, I forget delete this

"STRING": ['a', 'dd', 'c', '12', 'ddf']})
pd.testing.assert_frame_equal(sdc_func(df), test_impl(df))

def test_dataframe_head1(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if Index is set explicitly, this solution is ok for that case? I don't see tests on it

name = 'head'

check_type(name, df, n=n)
check_type(name, df)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is overhead here

if len(params) > 0:
space.append(', ')
func_definition = 'def _reduce_impl(df{}{}):'.format("".join(space), ", ".join(
str(key) + '=' + str(value) for key, value in params))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use formatted strings or f-strings instead of concatenation?
str(key) + '=' + str(value) -> f'{key}={value}' or '{}={}'.format(key, value)

space = []
if len(params) > 0:
space.append(', ')
func_definition = 'def _reduce_impl(df{}{}):'.format("".join(space), ", ".join(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would propose to do so:

all_params = ['df'] + [f'{key}={value}' for key, value in params]
func_definition = ', '.join(all_params)

if not (isinstance(n, (types.Omitted, types.Integer)) or n == 5):
ty_checker.raise_exc(n, 'int64', 'n')

params = [('n', 5)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could pass [('n', 5)] to the sdc_pandas_dataframe_reduce_columns_series directly without creating variable params.

n_cols = len(saved_columns)
data_args = tuple('data{}'.format(i) for i in range(n_cols))
all_params = ['df'] + [f'{key}={value}' for key, value in params]
func_definition = "def _reduce_impl(" + ', '.join(all_params) + "):"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use formatted strings or f-strings instead of concatenation.
func_definition = 'def _reduce_impl({}):'.format(', '.join(all_params))

line = ' {} = sdc.hiframes.api.init_series(sdc.hiframes.pd_dataframe_ext.get_dataframe_data(df, {}))'
func_lines.append(line.format(d + '_S', i))
func_lines.append(' {} = {}.{}({})'.format(d + '_O', d + '_S', name, ", ".join(
str(key) for key, value in params)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'{} = {}.{}({})'.format(d + '_O', d + '_S' -> '{}_O = {}_S.{}({})'.format(d, d
str(key) -> key
value -> _

str(key) for key, value in params)))
func_lines.append(" return sdc.hiframes.pd_dataframe_ext.init_dataframe({}, None, {})\n".format(
", ".join(d + '_O._data' for d in data_args),
", ".join("'" + c + "'" for c in saved_columns)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use formatted strings or f-strings. E.g. f"'{c}'"


return _reduce_impl

def check_type(name, df, axis=None, skipna=None, level=None, numeric_only=None, ddof=1, min_count=0):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is check_type used somewhere?

func_lines.append(' {}_O = {}_S.{}({})'.format(d, d, name, ", ".join(
key for key, _ in params)))
func_lines.append(" return sdc.hiframes.pd_dataframe_ext.init_dataframe({}, None, {})\n".format(
", ".join(d + '_O._data' for d in data_args),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You forgot to use f-string here.

func_lines.append(line.format(d + '_S', i))
func_lines.append(' {}_O = {}_S.{}({})'.format(d, d, name, ", ".join(
key for key, _ in params)))
func_lines.append(" return sdc.hiframes.pd_dataframe_ext.init_dataframe({}, None, {})\n".format(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you exactly need \n at the end?

func_lines = [func_definition]
for i, d in enumerate(data_args):
line = ' {} = sdc.hiframes.api.init_series(sdc.hiframes.pd_dataframe_ext.get_dataframe_data(df, {}))'
func_lines.append(line.format(d + '_S', i))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use f-string or formatted string instead of concatenation.

@1e-to 1e-to closed this Mar 2, 2020
@1e-to 1e-to deleted the dataframe branch March 2, 2020 12:15
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants