Skip to content
This repository has been archived by the owner on Feb 2, 2024. It is now read-only.

Implement series.var() in new style #220

Merged
merged 5 commits into from
Oct 26, 2019

Conversation

densmirn
Copy link
Contributor

No description provided.

@densmirn densmirn requested review from shssf and akharche and removed request for akharche October 14, 2019 08:55
@densmirn densmirn force-pushed the feature/series_var branch 5 times, most recently from f8a9a03 to f323e5b Compare October 15, 2019 06:14
Returns
-------
:obj:`scalar`
returns :obj:`scalar`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can return scalar or Series

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we have to add series here, because series is returned in case of level specified, but currently we don't support parameter level. So scalar is always returned in our case. Do you still think I need to add series here?

if not isinstance(self, SeriesType):
raise TypingError('{} The object must be a pandas.series. Given: {}'.format(_func_name, self))

if not isinstance(self.dtype, types.Number):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

valuable_length = len(self._data) - numpy.sum(numpy.isnan(self._data))
return numpy.nanvar(self._data) * valuable_length / (valuable_length - ddof)

return self._data.var() * len(self._data) / (len(self._data) - ddof)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe numpy has no such functionality

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Numpy implementation has ddof=0 by default, but Pandas has ddof=1 by default. So I has to make this extra change to align the parameter between these implementations.

@@ -851,7 +851,7 @@ def parse_impl(data):

def _run_call_series(self, assign, lhs, rhs, series_var, func_name):
# single arg functions
if func_name in ('sum', 'count', 'mean', 'var', 'min', 'max', 'prod'):
if func_name in ('sum', 'count', 'mean', 'min', 'max', 'prod'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should brake parallelism. not sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, It breaks multiprocessing parallelism, but I still don't know how to save both approaches ("old" and "new") simultaneously. Please let me know if you have thoughts about it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is no single and simple recipe

return series.var(skipna=skipna, ddof=ddof)

cfunc = hpat.jit(pyfunc)
series = pd.Series([1.3, -2.7, np.nan, 0.1, 10.9])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@densmirn densmirn force-pushed the feature/series_var branch 4 times, most recently from 9f57b16 to 80cc25d Compare October 16, 2019 15:11
@@ -851,7 +851,7 @@ def parse_impl(data):

def _run_call_series(self, assign, lhs, rhs, series_var, func_name):
# single arg functions
if func_name in ('sum', 'count', 'mean', 'var', 'min', 'max', 'prod'):
if func_name in ('sum', 'count', 'mean', 'min', 'max', 'prod'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is no single and simple recipe

@densmirn densmirn requested a review from shssf October 24, 2019 06:02
@densmirn
Copy link
Contributor Author

@shssf, @kozlov-alexey could you take the next round to review the PR?

0/None/'index' - row-wise operation
1/'columns' - column-wise operation
*unsupported*
skipna: :obj:`bool`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect skipna=True

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default skipna=None in Pandas interface. So we did the same.

@@ -256,6 +256,86 @@ def hpat_pandas_series_values_impl(self):
return hpat_pandas_series_values_impl


@overload_method(SeriesType, 'var')
def hpat_pandas_series_var(self, axis=None, skipna=None, level=None, ddof=1, numeric_only=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

skipna=True?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default skipna=None in Pandas interface.

@shssf shssf merged commit d443814 into IntelPython:master Oct 26, 2019
@densmirn densmirn deleted the feature/series_var branch June 9, 2020 12:12
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants