
Fixes for numpy 1.23.0 #2137

Merged: 6 commits, Jun 23, 2022


@thehomebrewnerd (Contributor) commented Jun 23, 2022

Fixes for numpy 1.23.0

Implement fixes for compatibility with numpy 1.23.0

With numpy 1.23.0, the np.divide function used in the DivideNumeric primitive now raises an error when given pandas-on-Spark (pyspark) inputs:

np.divide(df["ser1"], df["ser2"])

NotImplementedError: pandas-on-Spark objects currently do not support <ufunc 'divide'>.
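The message points at NumPy's ufunc override protocol (NEP 13): when an argument to a ufunc such as np.divide defines __array_ufunc__, NumPy hands the whole call to that method, and the error suggests pandas-on-Spark implements this hook and rejects this ufunc. The plain / operator never goes through that hook; it calls the object's __truediv__ directly. The sketch below shows the mechanism with a hypothetical ToyColumn class (not pyspark code) and makes no claim about exactly what changed between numpy releases.

```python
import numpy as np


class ToyColumn:
    """Hypothetical column type implementing NumPy's __array_ufunc__ hook.

    Stands in for a pandas-on-Spark Series; it is NOT pyspark code.
    """

    def __init__(self, values):
        self.values = [float(v) for v in values]

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # NumPy routes np.divide(col1, col2) here because an input defines
        # __array_ufunc__. Like the error above, this toy rejects the ufunc.
        raise NotImplementedError(
            f"ToyColumn objects currently do not support {ufunc!r}."
        )

    def __truediv__(self, other):
        # The / operator calls this method directly; no ufunc is involved.
        return ToyColumn(a / b for a, b in zip(self.values, other.values))


col1 = ToyColumn([1.0, 4.0, 9.0])
col2 = ToyColumn([2.0, 8.0, 3.0])

try:
    np.divide(col1, col2)
except NotImplementedError as exc:
    print("np.divide failed:", exc)

print("col1 / col2 ->", (col1 / col2).values)
```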

To resolve this, the primitive was updated to divide the two series directly instead of calling np.divide.
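A minimal sketch of that change, assuming the nested divide_numeric function simply uses the / operator (the diff in this thread shows only its signature). DivideNumericSketch is a hypothetical, trimmed-down stand-in for the real primitive, and plain pandas Series are used so the example runs without Spark.

```python
import numpy as np
import pandas as pd


class DivideNumericSketch:
    """Hypothetical, trimmed-down stand-in for the DivideNumeric primitive.

    Only the get_function pattern is shown; the real featuretools class
    also declares input types, a return type, and other metadata.
    """

    def __init__(self, commutative=False):
        self.commutative = commutative

    def get_function(self):
        # Before the fix this returned np.divide directly. Returning a small
        # wrapper sends the division through the Series' own "/" operator.
        def divide_numeric(val1, val2):
            return val1 / val2

        return divide_numeric


divide = DivideNumericSketch().get_function()
s1 = pd.Series([1.0, 4.0, 9.0])
s2 = pd.Series([2.0, 8.0, 3.0])
print(divide(s1, s2).tolist())
```

For ordinary pandas Series the wrapper and np.divide should give the same result; the difference only matters for types that reject the ufunc.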

The doctest for the NaturalLogarithm primitive was also updated to round the primitive's output, because the new numpy version produces slightly different floating-point results.
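Rounding inside a doctest keeps the expected output independent of the last few floating-point digits, which can vary between numpy versions. The sketch below illustrates the idea with a hypothetical natural_log function; it is not the actual NaturalLogarithm doctest, whose exact text is not shown in this PR.

```python
import numpy as np


def natural_log(values):
    """Return the natural logarithm of each value as a list of floats.

    Hypothetical stand-in for the NaturalLogarithm primitive's function.
    The doctest rounds the output so it does not depend on the exact
    floating-point digits produced by a particular numpy version:

    >>> results = natural_log([1.0, np.e, 10.0])
    >>> [round(x, 3) for x in results]
    [0.0, 1.0, 2.303]
    """
    return np.log(np.asarray(values, dtype=float)).tolist()


if __name__ == "__main__":
    import doctest

    doctest.testmod()
```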

gsheni previously approved these changes Jun 23, 2022
codecov bot commented Jun 23, 2022

Codecov Report

Merging #2137 (b0463e0) into main (1d8baf9) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #2137   +/-   ##
=======================================
  Coverage   99.21%   99.21%           
=======================================
  Files         143      143           
  Lines       16833    16835    +2     
=======================================
+ Hits        16701    16703    +2     
  Misses        132      132           
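The "0.00%" increase can be checked from the table above, assuming coverage is computed as hits divided by lines (consistent with hits + misses = lines in both columns). The misses stay at 132 while two covered lines are added, so the change is far below the reported precision, and Codecov shows 99.21% in both columns:

```latex
\text{coverage} = \frac{\text{hits}}{\text{lines}} = 1 - \frac{\text{misses}}{\text{lines}}

\text{main: } \frac{16701}{16833} \approx 0.992158, \qquad
\text{\#2137: } \frac{16703}{16835} \approx 0.992159

\Delta = \frac{132}{16833} - \frac{132}{16835}
       = \frac{132 \cdot 2}{16833 \cdot 16835}
       \approx 9.3 \times 10^{-7} \approx 0.0001 \text{ percentage points}
```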
| Impacted Files | Coverage Δ |
|---|---|
| ...retools/primitives/standard/transform_primitive.py | 100.00% <ø> (ø) |
| ...aturetools/primitives/standard/binary_transform.py | 100.00% <100.00%> (ø) |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1d8baf9...b0463e0.

@@ -881,7 +881,10 @@ def __init__(self, commutative=False):
         self.commutative = commutative
 
     def get_function(self):
-        return np.divide
+        def divide_numeric(val1, val2):
Review comment (Contributor):

When primitives return a function directly, they tend to be faster. Is there no way to keep using np.divide?

Reply (Contributor, Author):


Not sure it makes a difference in this particular case, perhaps because both approaches (numpy and pandas) are already optimized?
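The screenshot from this reply is not preserved. A comparison of this kind could be sketched as below, assuming plain pandas Series (not pandas-on-Spark); compare_timings and its parameters are hypothetical, and results depend on hardware and library versions, so no numbers are claimed here. The wrapper adds one extra Python function call per invocation, a constant cost that is likely small next to the vectorized division over many rows. For plain pandas Series, np.divide is also likely routed back into pandas through __array_ufunc__, so both paths may end up in similar code.

```python
import timeit

import numpy as np
import pandas as pd


def divide_numeric(val1, val2):
    # Wrapper approach: one extra Python-level call, then pandas' vectorized "/".
    return val1 / val2


def compare_timings(n_rows=100_000, repeat=5, number=20, seed=0):
    """Return best-of-`repeat` seconds for `number` calls of each approach."""
    rng = np.random.default_rng(seed)
    s1 = pd.Series(rng.random(n_rows))
    s2 = pd.Series(rng.random(n_rows) + 1.0)  # shift so no denominator is zero
    ufunc_time = min(
        timeit.repeat(lambda: np.divide(s1, s2), repeat=repeat, number=number)
    )
    wrapper_time = min(
        timeit.repeat(lambda: divide_numeric(s1, s2), repeat=repeat, number=number)
    )
    return ufunc_time, wrapper_time


if __name__ == "__main__":
    t_ufunc, t_wrapper = compare_timings()
    print(f"np.divide: {t_ufunc:.4f} s   wrapper: {t_wrapper:.4f} s")
```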

@thehomebrewnerd merged commit 9050335 into main on Jun 23, 2022
@thehomebrewnerd deleted the numpy-1.23-fixes branch on June 23, 2022 15:29
@sbadithe mentioned this pull request on Jun 23, 2022