[R] infer_type() fails for lists where the first element is NULL #32881

Closed
asfimport opened this issue Sep 7, 2022 · 4 comments

@asfimport

  • Works
    reticulate::py_run_string("
    import pandas as pd
    df = pd.DataFrame( {'col1': [[1,2], None, [3,4]]}
    )
    df.to_parquet('/tmp/test1.parquet')
    ")
    df1 <- arrow::read_parquet("/tmp/test1.parquet")
    arrow::write_parquet(df1, tempfile(fileext = ".parquet"))

  • Fails in arrow 9.0; works in arrow 5.0
    reticulate::py_run_string("
    import pandas as pd
    df = pd.DataFrame( {'col1': [None, [1,2], [3,4]]}
    )
    df.to_parquet('/tmp/test2.parquet')
    ")
    df2 <- arrow::read_parquet("/tmp/test2.parquet")
    arrow::write_parquet(df2, tempfile(fileext = ".parquet"))

Environment: Ubuntu 18.04; R 4.1.1; arrow 9.0
Reporter: David
Assignee: Nicola Crane / @thisisnic

Note: This issue was originally created as ARROW-17639. Please see the migration documentation for further details.

@asfimport

Nicola Crane / @thisisnic:
It looks like this results from the call to as_arrow_table() within write_parquet(). A simpler reprex:

library(arrow)

# a list column whose first element is NULL
df2 <- tibble::tibble(x = list(NULL, 1, 2))
as_arrow_table(df2)
# Error: Cannot infer type from vector

Under the hood, the failure comes specifically from the call to Table__from_dots() on a data frame containing a list column whose first element is NULL.

@asfimport

Nicola Crane / @thisisnic:
This is indeed a bug; thanks for reporting it, [~dmedw01]. It's due to how we infer the types of lists. I'll get a PR up to fix this soon. A temporary workaround is to reorder the list so that the first element is never NULL, though I can see that this is not ideal.
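
To illustrate the failure mode described above, here is a minimal conceptual sketch in plain Python. It is not arrow's actual implementation, and it does not claim to show what the eventual fix does; the function names `infer_from_first` and `infer_skipping_nulls` are hypothetical. It only contrasts inference that looks at the first element with inference that skips leading nulls.

```python
from typing import Any, Sequence


def infer_from_first(values: Sequence[Any]) -> str:
    """Hypothetical naive inference: decide the type from element 0 only."""
    if len(values) == 0 or values[0] is None:
        # Mirrors the symptom in this issue: a leading null carries no type info.
        raise TypeError("Cannot infer type from vector")
    return type(values[0]).__name__


def infer_skipping_nulls(values: Sequence[Any]) -> str:
    """Hypothetical alternative: use the first non-null element instead."""
    for value in values:
        if value is not None:
            return type(value).__name__
    # Every element is null (or the input is empty): fall back to a null type.
    return "null"


if __name__ == "__main__":
    column = [None, [1, 2], [3, 4]]
    try:
        infer_from_first(column)
    except TypeError as err:
        print("first-element inference failed:", err)
    print("skipping nulls gives:", infer_skipping_nulls(column))
```

With first-element inference, the same values succeed or fail depending only on their order, which is why reordering the list works around the problem.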

@asfimport

Nicola Crane / @thisisnic:
Actually, here's a better workaround (thanks @paleolimbot for this suggestion):

library(arrow)

df2 <- tibble::tibble(x = list(NULL, 1, 2))
# manually specify the schema of the list column
df_to_save <- as_arrow_table(df2, schema = schema(x = list_of(int32())))
write_parquet(df_to_save, tempfile(fileext = ".parquet"))

@asfimport

Nicola Crane / @thisisnic:
Issue resolved by pull request 14062
#14062
