Skip to content
This repository was archived by the owner on Feb 2, 2024. It is now read-only.

Conversation

Rubtsowa
Copy link
Contributor

No description provided.

@pep8speaks
Copy link

pep8speaks commented Nov 28, 2019

Hello @Rubtsowa! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-12-31 12:51:56 UTC

sdc/config.py Outdated

use_default_dataframe = distutils_util.strtobool(os.getenv('SDC_CONFIG_USE_DEFAULT_DATAFRAME', 'True'))
'''
Default value used to select compiler pipeline in a function decorator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Description is not right - should be something like "Config variable used to select DataFrameType model (default is legacy model)"

return sdc_pandas_dataframe_count_impl

else:
def reduce(df, name):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think function name should be more specific, e.g. "sdc_pandas_dataframe_reduce_columns"

saved_columns = df.columns
n_cols = len(saved_columns)
data_args = tuple('data{}'.format(i) for i in range(n_cols))
func_text = "def _reduce_impl(df, axis=None, skipna=None, level=None, numeric_only=None):\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line defines function signature, which is now actually hardcoded to signature that 'median' has (the other function like max, min, mean also have the same signature, but not all). That won't work for other functions, e.g. Series.sum, which has additional arguments:
Series.sum(self, axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)
You have to pass additional arguments to this function to be able to generate correct function text that matches the signature used in overload.

if not isinstance(df, DataFrameType):
raise TypingError('{} The object must be a pandas.dataframe. Given: {}'.format(name, df))

if not (isinstance(axis, types.Omitted) or axis is None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably we should use ty_checker here and everywhere else, e.g.

Suggested change
if not (isinstance(axis, types.Omitted) or axis is None):
if (not isinstance(axis, (int, types.Integer, str, types.UnicodeType, types.StringLiteral, types.Omitted))
and axis not in (0, 'index')):
ty_checker.raise_exc(axis, 'integer or string', 'axis')

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All functions seems to have more or less the same checks. Shouldn't we move generic checks into generic function in order to avoid copy and paste?

def test_impl(df):
return df.min()
sdc_func = sdc.jit(test_impl)
df = pd.DataFrame({"A": [12, 4, 5, 44, 1],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we just use Series method for each column, I prefer to use an aggregated DF with many columns that actually differ between each other, that is, each testing some specific case - ints, floats, datatimes, etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Rubtsowa Why haven't you applied the above comment? You don't need two different tests for min, you can just use one with:

        df = pd.DataFrame({
                           "A": [12, 4, 5, 44, 1],
                           "B": [5.0, np.nan, 9, 2, -1],
                           "C": ['a', 'aa', 'd', 'cc', None],
                           "D": [True, True, False, True, True]
         })

if not isinstance(df, DataFrameType):
raise TypingError('{} The object must be a pandas.dataframe. Given: {}'.format(name, df))

if not (isinstance(axis, types.Omitted) or axis is None):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All functions seems to have more or less the same checks. Shouldn't we move generic checks into generic function in order to avoid copy and paste?

loc_vars = {}

print()
print(func_text)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it should be removed

Comment on lines 108 to 115
func_text += " {} = hpat.hiframes.api.init_series(hpat.hiframes.pd_dataframe_ext.get_dataframe_data(df, {}))\n".format(
d + '_S', i)
func_text += " {} = {}.{}()\n".format(d + '_O', d + '_S', name)
func_text += " data = np.array(({},))\n".format(
", ".join(d + '_O' for d in data_args))
func_text += " index = hpat.str_arr_ext.StringArray(({},))\n".format(
", ".join("'{}'".format(c) for c in saved_columns))
func_text += " return hpat.hiframes.api.init_series(data, index)\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You concatenate strings via operator +, but more effective way is to create list, fill the list via .append() (do NOT forget to cut off \n), then join the list to the string via '\n'.join(), e.g.:

func_lines = []
func_lines.append('the first line')
func_lines.append('the second line')
func_lines.append('the last line')
func_text = '\n'.join(func_lines)

@densmirn
Copy link
Contributor

Hello @Rubtsowa! Thanks for opening this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 47:5: E303 too many blank lines (2)
Line 70:121: E501 line too long (122 > 120 characters)
Line 108:121: E501 line too long (132 > 120 characters)
Line 128:5: E303 too many blank lines (2)
Line 176:5: E303 too many blank lines (2)
Line 224:5: E303 too many blank lines (2)
Line 272:5: E303 too many blank lines (2)
Line 320:5: E303 too many blank lines (2)

Line 1641:1: E402 module level import not at top of file

Please fix the code style issues.

return sdc_pandas_dataframe_count_impl

else:
def sdc_pandas_dataframe_reduce_columns(df, name, param):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe replace param with params.

@@ -1994,8 +1994,7 @@ def hpat_pandas_series_sum(
.. only:: developer

Tests:
python -m sdc.runtests sdc.tests.test_series.TestSeries.test_series_sum1
# python -m sdc.runtests sdc.tests.test_series.TestSeries.test_series_sum2
python -m sdc.runtests -k sdc.tests.test_series.TestSeries.test_series_sum
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to refer to all tests for sum method:
python -m sdc.runtests -k sdc.tests.test_series.TestSeries.test_series_sum*

Comment on lines 2665 to 2666
if skipna is None:
skipna = True
Copy link
Contributor

@kozlov-alexey kozlov-alexey Dec 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not compile, since type won't be unified for skipna variable which is None and bool at the same time. You can add a branch based on compile time value, i.e. define skipna_is_none variable at typing and refer to it with:

if skipna_is_none == True:  #noqa
    _skipna = True
else:
    _skipna = skipna

"F": [np.nan, np.nan, np.inf, np.nan]})
pd.testing.assert_series_equal(hpat_func(df), test_impl(df))

def test_prod(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these tests test default argument values, so I suggest adding _default suffix for all of them.

from numba.errors import TypingError
from sdc.hiframes.pd_dataframe_ext import DataFrameType
import sdc.datatypes.hpat_pandas_dataframe_types
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
import sdc.datatypes.hpat_pandas_dataframe_types

from numba.errors import TypingError
from sdc.hiframes.pd_dataframe_ext import DataFrameType
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
from sdc.hiframes.pd_dataframe_ext import DataFrameType
from sdc.hiframes.pd_dataframe_type import DataFrameType

@AlexanderKalistratov AlexanderKalistratov merged commit 433fa97 into IntelPython:master Dec 31, 2019
@Rubtsowa Rubtsowa deleted the add_dataframe_min branch April 7, 2020 07:04
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants