-
Notifications
You must be signed in to change notification settings - Fork 161
Manually append meta data #141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
776096a to
e2ee2c1
Compare
thunterdb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yupbank thanks a lot for doing this, this is a much requested feature. It looks like some of the tests are erroring (this is a known travis bug), but some others are failing on the test that you added.
| """Append extra metadata for a dataframe that | ||
| describes the numerical shape of the content. | ||
| This method is useful when a dataframe contains non-scalar tensors, for which the shape must be checked beforehand. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment also that indicates that the user is responsible for providing the right shape, and that a mismatch will trigger eventually an exception in Spark?
src/main/python/tensorframes/core.py
Outdated
| :param dframe: a Spark DataFrame | ||
| :param col: a Column expression | ||
| :param size: a shape corresponding to the tensor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can link to the documentation in python: https://www.tensorflow.org/programmers_guide/tensors#shape
This is important for people to understand the order of the elements.
| res = tfs.reduce_rows(x, ddf) | ||
| assert res == sum([r.x for r in data]) | ||
|
|
||
| # This test fails |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indent?
| data = [Row(x=float(x)) for x in range(5)] | ||
| df = self.sql.createDataFrame(data) | ||
| ddf = tfs.append_shape(df, col('x'), [-1, 1]) | ||
| import ipdb; ipdb.set_trace() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you don't need that anymore
| def test_append_shape(self): | ||
| data = [Row(x=float(x)) for x in range(5)] | ||
| df = self.sql.createDataFrame(data) | ||
| ddf = tfs.append_shape(df, col('x'), [-1, 1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know what happens if you replace -1 by None?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the py4j would be unhappy, so i can do a convert from None to -1 .
i miss the type annotation py3 have, so that i don't need to worry about it
|
|
||
| val meta = new MetadataBuilder | ||
| val colDtypes = df.select(col).schema.fields.head.dataType | ||
| val basicDatatype = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need the extra {}?
|
thanks for the review... which i should add a work in progress in the title. it's just i'm trying too hard to use py4j. and keep on hitting this error the scala function signature is can you help me correct this? |
| } | ||
|
|
||
| def appendShape(df: DataFrame, col:Column, shape: util.ArrayList[Int]): DataFrame = | ||
| appendShape(df, col, shape) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yupbank you want to convert shape to an array, otherwise you get into a loop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah...that's why the stack overflowed! thanks a lot! saved my day!
|
@thunterdb can i have another round of 👁 |
|
@yupbank some travis-related changes got committed. Can you please rebase your pull request so that travis can take them into account? |
3d43f4d to
068fa74
Compare
|
Merging, thank you very much! |
Add a function that append shape to a column which can save memory and time to analyze