@thehomebrewnerd thehomebrewnerd commented Jun 23, 2022

Fixes for numpy 1.23.0

Implement fixes for compatibility with numpy 1.23.0

With numpy 1.23.0, the np.divide function used in the DivideNumeric primitive raises an error when given pyspark (pandas-on-Spark) inputs:

np.divide(df["ser1"], df["ser2"])

NotImplementedError: pandas-on-Spark objects currently do not support <ufunc 'divide'>.

To resolve, the primitive was updated to divide the two series directly rather than using the np.divide function.
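To make the workaround concrete, here is a minimal sketch that runs without a Spark installation. `NoUfuncSeries` is a hypothetical stand-in, not a pyspark class: it assumes the error surfaces through NumPy's `__array_ufunc__` protocol and simply raises the message quoted above. `divide_numeric` takes its name from the diff under review, but its body here is an assumption based on the description ("divide the two series directly").

```python
import numpy as np


class NoUfuncSeries:
    """Hypothetical stand-in: rejects NumPy ufuncs but supports the `/` operator."""

    def __init__(self, values):
        self.values = list(values)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Mimic the error message quoted in the PR description.
        raise NotImplementedError(
            f"pandas-on-Spark objects currently do not support {ufunc}."
        )

    def __truediv__(self, other):
        # Element-wise division implemented by the object itself.
        return NoUfuncSeries(a / b for a, b in zip(self.values, other.values))


def divide_numeric(val1, val2):
    # Operator division dispatches to __truediv__, not to the ufunc machinery.
    return val1 / val2


ser1 = NoUfuncSeries([1.0, 9.0])
ser2 = NoUfuncSeries([2.0, 3.0])

try:
    np.divide(ser1, ser2)
    ufunc_failed = False
except NotImplementedError:
    ufunc_failed = True

result = divide_numeric(ser1, ser2)
print(ufunc_failed, result.values)  # True [0.5, 3.0]
```

The point of the sketch is only that `val1 / val2` lets the input object decide how to divide, so it keeps working for any series type that implements the operator, whether or not that type accepts NumPy ufuncs.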

Also had to update the doctest for the NaturalLogarithm primitive to round the primitive's output, since the new version of numpy produces slightly different floating-point values.
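As an illustration of the doctest change, the sketch below rounds the output before it is compared, so the expected text no longer depends on the last few digits of a float. `natural_log_rounded` and its `decimals` parameter are hypothetical names for this example, not the NaturalLogarithm primitive itself.

```python
import numpy as np


def natural_log_rounded(values, decimals=3):
    """Return the natural log of each value, rounded for stable doctest output.

    >>> natural_log_rounded([1.0, np.e])
    [0.0, 1.0]
    """
    # Rounding absorbs small differences in floating-point output between
    # library versions; tolist() yields plain Python floats for a clean repr.
    return np.round(np.log(values), decimals).tolist()


print(natural_log_rounded([1.0, np.e]))
```

If this were saved as a module, `python -m doctest` on the file would check the docstring example.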

gsheni previously approved these changes Jun 23, 2022
codecov bot commented Jun 23, 2022

Codecov Report

Merging #2137 (b0463e0) into main (1d8baf9) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #2137   +/-   ##
=======================================
  Coverage   99.21%   99.21%           
=======================================
  Files         143      143           
  Lines       16833    16835    +2     
=======================================
+ Hits        16701    16703    +2     
  Misses        132      132           
Impacted Files                                           Coverage Δ
...retools/primitives/standard/transform_primitive.py    100.00% <ø> (ø)
...aturetools/primitives/standard/binary_transform.py    100.00% <100.00%> (ø)

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@thehomebrewnerd thehomebrewnerd requested review from gsheni and rwedge June 23, 2022 14:52

 def get_function(self):
-    return np.divide
+    def divide_numeric(val1, val2):

Contributor

When primitives directly return a function, it tends to be faster. Is there no way to keep using np.divide?

Contributor Author

[image]

Not sure it makes a difference in this particular case - perhaps because we are using optimized approaches in both cases (numpy vs pandas)?
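For anyone who wants to check the performance question themselves, a rough timing sketch follows. It uses plain pandas Series (not pyspark), and the array size, repeat count, and `divide_numeric` body are assumptions chosen for illustration. Timings vary by machine and library version, so no expected numbers are given.

```python
import timeit

import numpy as np
import pandas as pd


def divide_numeric(val1, val2):
    # Assumed body of the wrapper: plain operator division.
    return val1 / val2


rng = np.random.default_rng(0)
ser1 = pd.Series(rng.random(100_000))
ser2 = pd.Series(rng.random(100_000) + 1.0)  # shift away from zero

# Time the ufunc call and the wrapper over the same inputs.
t_ufunc = timeit.timeit(lambda: np.divide(ser1, ser2), number=50)
t_direct = timeit.timeit(lambda: divide_numeric(ser1, ser2), number=50)

print(f"np.divide:      {t_ufunc:.4f}s for 50 runs")
print(f"divide_numeric: {t_direct:.4f}s for 50 runs")
```

Both paths end up in vectorized code, so the extra Python function call in the wrapper is a fixed cost per call rather than per element.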

@thehomebrewnerd thehomebrewnerd merged commit 9050335 into main Jun 23, 2022
@thehomebrewnerd thehomebrewnerd deleted the numpy-1.23-fixes branch June 23, 2022 15:29
@sbadithe sbadithe mentioned this pull request Jun 23, 2022