Use numpy and math functions inside verbs? #80
Hmmmm - try casting X.x to another array type that numpy understands?
What type is X.x? Can you run type() on it or something similar?
…On Sat, 26 Jan 2019, 18:21 Derek Powell ***@***.*** wrote:
I'm running across errors when I try to use numpy or math functions (e.g.,
sqrt, log, etc) inside dfply verbs. Here's a minimal example:
import numpy as np
import pandas as pd
from dfply import *
df = pd.DataFrame({'x': np.linspace(1, 10, 500)})
df >> mutate(y = np.log(X.x))
This gives the error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-1-f8d61ebf2e20> in <module>()
3 df = pd.DataFrame({'x': np.linspace(1, 10, 500)})
4
----> 5 df >> mutate(y = np.log(X.x))
ValueError: invalid __array_struct__
Is this functionality not implemented? Maybe there's a workaround I'm not
seeing?
(I'm on python 3.6.3)
|
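Thanks for the quick response.
type(X.x) returns dfply.base.Intention
type(df.x) returns pandas.core.series.Series (as expected).
In the example I gave, df.assign(y = np.log(df.x)) works fine. So I'm pretty sure it's not a problem with the array in the dataframe. |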
I've run into this before. It will be a matter of experimenting with
different type conversions until you get something that numpy accepts.
I'm also not sure whether it's passing each cell into numpy, or the entire
column at once. There must be some way to verify that.
…On Sun, 27 Jan 2019, 17:23 Derek Powell ***@***.*** wrote:
Thanks for the quick response.
type(X.x) returns dfply.base.Intention
type(df.x) returns pandas.core.series.Series (as expected).
In the example I gave, df.assign(y = np.log(df.x)) works fine. So I'm
pretty sure it's not a problem with the array in the dataframe.
|
I'm not an expert but I'm very confident it's not a problem with the original dataframe. Would be curious if the example I gave reproduces? Can also try an even simpler example:
df = pd.DataFrame({"x":[.1,.2,.3,.4,.5,1,2,3]})
df >> mutate(y = np.log(X.x))
That gives the same error for me. Hopefully @kieferk can solve |
The problem here is that Python isn't R: R has delayed interpretation, which means that the call to the log function is delayed until the function receives the dataframe as a context. Python doesn't have delayed interpretation, so the interpretation order is doing the log transformation to the … The problem is when a function doesn't know about it, as in this case the np.log function. It expects an array (which is why df.x works) and gets the "recorder" object. What might work is a … |
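To make the "recorder" idea above concrete, here is a minimal pure-Python sketch. The Deferred class is a hypothetical stand-in written for illustration only; it is not dfply's Intention implementation, and math.log stands in for np.log. It shows that an attribute access on the placeholder only records a lookup, that an eager call to a function that knows nothing about the placeholder fails, and that the same function works once the placeholder has been evaluated against real data.

```python
import math


class Deferred:
    """Hypothetical recorder object: stores a function and calls it later."""

    def __init__(self, func=lambda ctx: ctx):
        self.func = func

    def __getattr__(self, name):
        # Record "look up `name` in the context" instead of doing it now.
        return Deferred(lambda ctx: self.func(ctx)[name])

    def evaluate(self, ctx):
        # Run the recorded steps against the actual data.
        return self.func(ctx)


X = Deferred()
data = {"x": [1.0, math.e]}  # a plain dict stands in for a dataframe

column = X.x  # nothing is looked up yet; this is still a Deferred

eager_error = None
try:
    math.log(column)  # eager call: math.log receives the recorder itself
except TypeError as exc:
    eager_error = exc

# Delaying the call until the data is available works as expected.
logged = [math.log(v) for v in column.evaluate(data)]
```

In this sketch the failure is a TypeError rather than the ValueError from the original report, because the stand-in function is math.log and not np.log; the cause is the same in both cases, namely a function being handed the placeholder instead of the data.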
Aha, this is what I feared. That's unfortunate, definitely limits the utility of the … I've also been playing with the plydata package, which can handle these kinds of operations. In plydata, computations are passed as strings, e.g. … |
I think a workaround could be:
|
I came across this issue because I was searching for the exact same problem. After reading the documentation I noticed this is actually addressed, and the correct way to solve this would be:
@make_symbolic
def log(series):
    return np.log(series)
df >> mutate(y_log = log(X.y))
I can verify this works without a problem! |
I've run into this issue as well and it'd be awesome if we could add @omrihar's solution into the codebase! |
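For readers wondering why wrapping the function in a decorator helps, the following is a rough pure-Python sketch of the general idea: a decorator that postpones the call whenever one of its arguments is a placeholder. The names Deferred, column and make_deferred are hypothetical and exist only for this illustration; this is not dfply's make_symbolic code, and math.log on a list stands in for np.log on a Series.

```python
import functools
import math


class Deferred:
    """Hypothetical placeholder whose value is computed once a context is given."""

    def __init__(self, func):
        self.func = func

    def evaluate(self, ctx):
        return self.func(ctx)


def column(name):
    """Hypothetical helper: a Deferred that looks up `name` in the context."""
    return Deferred(lambda ctx: ctx[name])


def make_deferred(func):
    """Wrap `func` so that a call with Deferred arguments is postponed."""

    @functools.wraps(func)
    def wrapper(*args):
        if not any(isinstance(a, Deferred) for a in args):
            return func(*args)  # ordinary arguments: call immediately

        def run(ctx):
            # Resolve every placeholder against the context, then call func.
            resolved = [a.evaluate(ctx) if isinstance(a, Deferred) else a
                        for a in args]
            return func(*resolved)

        return Deferred(run)

    return wrapper


@make_deferred
def log(values):
    return [math.log(v) for v in values]


pending = log(column("x"))  # returns a Deferred; no math has run yet
result = pending.evaluate({"x": [1.0, 100.0]})
immediate = log([1.0])  # plain list: evaluated right away
```

The point of the design is that the wrapped function never sees the placeholder: it is only invoked after the placeholder has been replaced by real values, which is presumably what lets a wrapped log work inside a verb like mutate.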