feat: Add flatten array function
#562
Hello @andygrove, do you mind giving me a hand with this PR? I exposed
Hi @mobley-trent
# build and install package
maturin develop
Also, don't forget to activate the venv before this command.
Hey @ongchi I tested the
from datafusion import SessionContext, column
from datafusion import functions as f
import numpy as np
import pyarrow as pa
def py_flatten(arr):
    # Testing helper function
    result = []
    for elem in arr:
        if isinstance(elem, list):
            result.extend(py_flatten(elem))
        else:
            result.append(elem)
    return result
ctx = SessionContext()
data = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
batch = pa.RecordBatch.from_arrays(
    [np.array(data, dtype=object)], names=["arr"]
)
df = ctx.create_dataframe([[batch]])
col = column("arr")
stmt = f.flatten(col)
py_expr = lambda: [py_flatten(data)]
result = df.select(stmt).collect()[0].column(0).tolist()
print(f"flatten query: {result}")
print(f"py_expr: {py_expr()}")
Results: I expected the flatten query to be identical to the
Using a regular
ctx = SessionContext()
ctx.sql("select flatten([[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]);")
Result:
Hi @mobley-trent It consists of multiple rows of one-dimensional array values. For the
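The point about rows can be illustrated with a small pure-Python sketch. It is only a model, not DataFusion's implementation: it reuses the `py_flatten` helper from the test above and assumes, for illustration, that a flatten expression is evaluated independently on each row's value. The name `flatten_per_row` is hypothetical.

```python
def py_flatten(arr):
    """Recursively flatten nested Python lists (same helper as in the test above)."""
    result = []
    for elem in arr:
        if isinstance(elem, list):
            result.extend(py_flatten(elem))
        else:
            result.append(elem)
    return result


def flatten_per_row(rows):
    """Hypothetical model: apply flatten to each row's value independently."""
    return [py_flatten(row) for row in rows]


data = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]

# Three rows, each holding a one-dimensional list: there is nothing to flatten.
three_rows = flatten_per_row(data)

# One row holding the whole nested list: it collapses into a single flat list.
one_row = flatten_per_row([data])

print(three_rows)  # [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
print(one_row)     # [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]]
```

Under this assumption, a column built from `data` directly yields three unchanged rows, while wrapping `data` in one more list gives a single nested row, which is the shape the SQL literal in the earlier comment has.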
andygrove
left a comment
Thanks @mobley-trent
Fixed the merge conflicts
Which issue does this PR close?
Refer to issue #463
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?