Implement series.cumsum() in new style by densmirn · Pull Request #192 · IntelPython/sdc · GitHub

This repository was archived by the owner on Feb 2, 2024. It is now read-only.

Contributor

densmirn commented Oct 3, 2019

No description provided.

densmirn requested review from shssf, akharche, fschlimb, 1e-to and kozlov-alexey

October 3, 2019 16:08

shssf reviewed


hpat/hiframes/pd_series_ext.py Outdated

                   install_array_method(fname, generic_expand_cumulative_series)
               # TODO: add itemsize, strides, etc. when removed from Pandas
               _not_series_array_attrs = ['flat', 'ctypes', 'itemset', 'reshape', 'sort', 'flatten']
+              # Array attributes which overlap Series attributes
+              _excluded_array_attrs = ['cumsum']

Contributor

shssf Oct 3, 2019 (edited)

Did you try just adding cumsum to the _not_series_array_attrs list?

Contributor Author

densmirn Oct 4, 2019 (edited)

Yes, I did. In that case the method isn't skipped, because the condition attr not in _not_series_array_attrs is True: attr is 'resolve_cumsum', not just 'cumsum'. This looks like a bug, since the condition is currently always True. Perhaps attr.replace('resolve_', '') would be useful here.
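A minimal sketch of the mismatch described above. It uses a toy class in place of Numba's real ArrayAttribute, and the method names and the `excluded` list are chosen only for illustration:

```python
class ArrayAttribute:
    """Toy stand-in for numba.typing.arraydecl.ArrayAttribute (hypothetical methods)."""

    def resolve_sum(self):
        pass

    def resolve_cumsum(self):
        pass

    def resolve_sort(self):
        pass


# Exclusion list written with bare attribute names, as in the PR
excluded = ['sort', 'cumsum']

resolve_names = [name for name in vars(ArrayAttribute) if name.startswith('resolve_')]

# Comparing the prefixed name against bare names never matches, so nothing is skipped
kept_prefixed = [name for name in resolve_names if name not in excluded]

# Stripping the 'resolve_' prefix first makes the exclusion list effective
kept_stripped = [name for name in resolve_names
                 if name[len('resolve_'):] not in excluded]

print(kept_prefixed)
print(kept_stripped)
```

Slicing off the prefix is used here instead of attr.replace('resolve_', '') so that only the leading occurrence is removed; either would work for the names shown.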

shssf reviewed


hpat/hiframes/pd_series_ext.py Outdated

               # use ArrayAttribute for attributes not defined in SeriesAttribute
               for attr, func in numba.typing.arraydecl.ArrayAttribute.__dict__.items():
                   if (attr.startswith('resolve_')
                           and attr not in SeriesAttribute.__dict__
-                          and attr not in _not_series_array_attrs):
+                          and attr not in _not_series_array_attrs
+                          and attr.replace('resolve_', '') not in _excluded_array_attrs):

Contributor

shssf Oct 3, 2019

Could you show all function names registered by setattr(SeriesAttribute, attr, func)?

Contributor Author

densmirn Oct 4, 2019

print(attr.replace('resolve_', '')) printed
dtype, itemsize, shape, strides, ndim, size, flat, ctypes, flags, T, real, imag, transpose, item, itemset, nonzero, reshape, sort, view, ravel, flatten, prod, sum, mean, var, std, argmin, argmax

densmirn force-pushed the feature/series_cumsum branch 5 times, most recently from 9d73d3d to c3baf97 Compare

October 7, 2019 07:06

densmirn requested a review from shssf

October 7, 2019 07:09

Contributor

shssf commented Oct 10, 2019

Please resolve conflicts

densmirn force-pushed the feature/series_cumsum branch 4 times, most recently from 4e8e09a to 6b74735 Compare

October 15, 2019 07:15

shssf suggested changes


hpat/datatypes/hpat_pandas_series_functions.py Outdated

+                  axis: :obj:`int`, :obj:`str`
+                      Axis along which the operation acts
+                      0/None - row-wise operation
+                      1 - column-wise operation

Contributor

shssf Oct 15, 2019

It looks like the description is not quite accurate. This parameter can be an integer or a string (as described in line 1507).
Please add the string variant. I mean something like 0 or 'index', 1 or 'columns'.

hpat/datatypes/hpat_pandas_series_functions.py Outdated

+                  Returns
+                  -------
+                  :obj:`pandas.Series`
+                       returns :obj:`pandas.Series` object

Contributor

shssf Oct 15, 2019

It might return a scalar, but I don't know of any cases where it does.

hpat/datatypes/hpat_pandas_series_functions.py Outdated

+                  _func_name = 'Method cumsum().'
+                  if not isinstance(self, SeriesType):
+                      raise TypingError('{} The object must be a pandas.series. Given self: {}'.format(_func_name, self))

Contributor

shssf Oct 15, 2019

Given self -> Given

hpat/datatypes/hpat_pandas_series_functions.py Outdated

+                      raise TypingError('{} The object must be a pandas.series. Given self: {}'.format(_func_name, self))
+                  if not isinstance(self.dtype, types.Number):
+                      raise TypingError('{} The object must be a number. Given self.dtype: {}'.format(_func_name, self.dtype))

Contributor

shssf Oct 15, 2019

see #216 (comment)
Also, the message "Given self.dtype" means nothing to the user. Please keep messages as understandable as possible.
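One way to read the two comments on the messages is sketched below. This is only an illustration: the `TypingError` class is a local stand-in for Numba's exception, `check_numeric` is a hypothetical helper, the message wording is an assumption, and the check runs on a plain Python value rather than on a Numba type as the PR does.

```python
import numbers


class TypingError(Exception):
    """Local stand-in for Numba's TypingError so the sketch runs without Numba."""


def check_numeric(func_name, value):
    """Raise a user-facing error that names what was given, not an internal attribute."""
    if not isinstance(value, numbers.Number):
        raise TypingError(
            '{} The data must be numeric. Given: {}'.format(func_name, type(value).__name__))


try:
    check_numeric('Method cumsum().', 'abc')
except TypingError as exc:
    message = str(exc)
    print(message)
```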

hpat/datatypes/hpat_pandas_series_functions.py Outdated

+                  def hpat_pandas_series_cumsum_impl(self, axis=None, skipna=True):
+                      if skipna:
+                          # numpy.nancumsum replaces NaNs with 0, series.cumsum does not, so replace 0 back with NaNs
+                          data = numpy.nancumsum(self._data)

Contributor

shssf Oct 15, 2019

Please use something like local_data instead of data to avoid clashing with class variables and types.
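For reference, the NaN handling that the snippet above aims for can be modelled in pure Python. This is a sketch of pandas-like semantics under the assumption that NaN positions stay NaN in the result; it is not the numpy-based implementation from the PR, and `cumsum_skipna` is a hypothetical name.

```python
import math


def cumsum_skipna(values, skipna=True):
    """Cumulative sum over floats with pandas-like NaN handling (pure-Python model)."""
    result = []
    running_total = 0.0
    seen_nan = False
    for value in values:
        if math.isnan(value):
            # A NaN input always yields NaN at the same position
            seen_nan = True
            result.append(math.nan)
        elif seen_nan and not skipna:
            # Without skipna, everything after the first NaN is NaN
            result.append(math.nan)
        else:
            running_total += value
            result.append(running_total)
    return result


skipped = cumsum_skipna([1.0, math.nan, 9.0, -1.0, 7.0])
propagated = cumsum_skipna([1.0, math.nan, 9.0], skipna=False)
print(skipped)
print(propagated)
```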

hpat/tests/test_series.py Outdated

+                      pyfunc = test_impl
+                      cfunc = hpat.jit(pyfunc)
+                      series = pd.Series([1.0, np.nan, 9.0, -1.0, 7.0])

Contributor

shssf Oct 15, 2019

Please see #217 (comment)

densmirn force-pushed the feature/series_cumsum branch 5 times, most recently from d0b90c9 to 49be153 Compare

October 16, 2019 12:27

shssf suggested changes


Contributor

shssf left a comment

As for the input data, it might be too early to implement this.
It would be better to implement it for all tests (at least in Series) at once.
Anyway, please follow the data type hierarchy if you would like to start this in this PR.

hpat/tests/test_series.py Outdated

+                  [1.0, np.nan, -1.0, 0.0, 5e-324],
+                  [np.nan, np.inf, np.NINF, np.NZERO]
+              ]
+              FLOAT_EXAMPLE, *_ = FLOAT_EXAMPLES

Contributor

shssf Oct 16, 2019

No need for this variable. Use FLOAT_EXAMPLES[0] instead.

hpat/tests/test_series.py Outdated

@@ -34,6 +37,28 @@
                   ),
               ]]
+              FLOAT_EXAMPLES = [

Contributor

shssf Oct 16, 2019

Suggested change

-              FLOAT_EXAMPLES = [
+              test_global_input_data_float64 = [

hpat/tests/test_series.py Outdated

+              ]
+              FLOAT_EXAMPLE, *_ = FLOAT_EXAMPLES
+              INT_EXAMPLE = [1, -1, 0, 18446744073709551615]
+              NUM_EXAMPLES = [INT_EXAMPLE] + FLOAT_EXAMPLES

Contributor

shssf Oct 16, 2019

Suggested change

-              NUM_EXAMPLES = [INT_EXAMPLE] + FLOAT_EXAMPLES
+              test_global_input_data_numeric = [INT_EXAMPLE] + FLOAT_EXAMPLES

hpat/tests/test_series.py Outdated

+                  [np.nan, np.inf, np.NINF, np.NZERO]
+              ]
+              FLOAT_EXAMPLE, *_ = FLOAT_EXAMPLES
+              INT_EXAMPLE = [1, -1, 0, 18446744073709551615]

Contributor

shssf Oct 16, 2019

Suggested change

-              INT_EXAMPLE = [1, -1, 0, 18446744073709551615]
+              test_global_input_data_integer64 = [1, -1, 0, 18446744073709551615]

hpat/tests/test_series.py Outdated

+              STR_EXAMPLES = [
+                  ['', 'a', 'aa', 'aaa', 'b', 'aab', 'ab', 'abababab'],
+                  UNICODE_EXAMPLES

Contributor

shssf Oct 16, 2019

don't include it here

hpat/tests/test_series.py Outdated

+              INT_EXAMPLE = [1, -1, 0, 18446744073709551615]
+              NUM_EXAMPLES = [INT_EXAMPLE] + FLOAT_EXAMPLES
+              UNICODE_EXAMPLES = [

Contributor

shssf Oct 16, 2019

Suggested change

-              UNICODE_EXAMPLES = [
+              test_global_input_data_unicode_kind4 = [

shssf suggested changes


hpat/tests/test_series.py Outdated

+              ]
+              min_int64 = -9223372036854775808
+              max_int64 = 9223372036854775807

Contributor

shssf Oct 20, 2019

Use sys.maxsize

Contributor

shssf Oct 22, 2019

@densmirn Use numba.targets.builtins.get_type_max_value for this

Contributor Author

densmirn Oct 23, 2019

Added.
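For comparison, the hard-coded bounds from the diff can also be derived from the bit width in plain Python. This is a sketch only; it does not use the Numba helper mentioned above, and the variable names other than min_int64 and max_int64 are illustrative.

```python
INT64_BITS = 64

# Signed 64-bit range: one bit is used for the sign
min_int64 = -(2 ** (INT64_BITS - 1))
max_int64 = 2 ** (INT64_BITS - 1) - 1

# Unsigned 64-bit maximum; this is the large literal used in INT_EXAMPLE
max_uint64 = 2 ** INT64_BITS - 1

print(min_int64, max_int64, max_uint64)
```

Note that 18446744073709551615 is the unsigned maximum, so it does not fit into a signed 64-bit integer.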

densmirn added 2 commits

October 23, 2019 18:41


          Implement series.cumsum() in new style

835f8c0


          Minor fixed for series.cumsum()

c7e9319

densmirn force-pushed the feature/series_cumsum branch from 8dfe505 to c7e9319 Compare

October 23, 2019 15:54

densmirn requested a review from shssf

October 23, 2019 15:55


          Revert multiprocessing parallelism for series.cumsum()

55ed216

densmirn requested a review from Hardcode84

October 25, 2019 07:00

Contributor Author

densmirn commented Oct 25, 2019

@shssf, @kozlov-alexey could you take the next round to review the PR?

densmirn added the Ready for Review label


          Merge branch 'master' into feature/series_cumsum

b50b29d

shssf approved these changes


shssf merged commit 7722b6e into IntelPython:master

densmirn deleted the feature/series_cumsum branch

June 9, 2020 12:12


Reviewers

akharche Awaiting requested review from akharche

fschlimb Awaiting requested review from fschlimb

1e-to Awaiting requested review from 1e-to

kozlov-alexey Awaiting requested review from kozlov-alexey

Hardcode84 Awaiting requested review from Hardcode84

shssf approved these changes

Labels

Ready for Review