Better support for Maps in Pandas #34729
wjones127 pushed a commit that referenced this issue Apr 21, 2023

…34730)

### Rationale for this change

Explained in issue #34729

### What changes are included in this PR?

- Add support for list of maps when converting Arrow to Pandas. There doesn't seem to be a strong reason to omit this. Previously it was a hard error as unsupported, due to a bool check.
- Refactor Arrow Map -> Pandas to support two paths: (1) list of tuples, or (2) pydicts
- Add another option in PandasOptions to enable (2), above
- Bugfix in nested pydicts -> Arrow maps.
- Unit tests

### Are these changes tested?

Unit tests are added in `test_pandas.py`

### Are there any user-facing changes?

- An additional option flag in PandasOptions
- Enable list of maps to Pandas, which was previously disabled

* Closes: #34729

Authored-by: Mike Lui <mikelui@meta.com>
Signed-off-by: Will Jones <willjones127@gmail.com>
Describe the enhancement requested
Today (Py)Arrow -> Pandas treats:
Treating maps as a list of tuples has various pros (preserves ordering, allows duplicates, speed of iteration/creation). However, many times users simply want a ... map! (i.e., a pydict).
Having to convert every element manually via `dict(map_elem)` is cumbersome, inefficient, and downright nasty when working with arbitrarily nested maps in Pandas.

Today, PyArrow already supports (pydicts -> arrow maps) when a schema is provided. So, it's a known use-case.
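To make the pain of the manual route concrete, here is a minimal pure-Python sketch of what converting nested list-of-tuples maps into dicts can look like. The helper name `maps_to_dicts` is hypothetical, and the rule "a non-empty list made only of 2-tuples is a map" is an assumption made for illustration; it is not how PyArrow decides, and it cannot tell an empty map from an empty list.

```python
def maps_to_dicts(value):
    """Recursively turn list-of-(key, value)-tuple maps into dicts.

    Hypothetical helper. Heuristic (an assumption): a non-empty list whose
    items are all 2-tuples is treated as a map; any other list is treated
    as a plain list and converted element by element.
    """
    if isinstance(value, dict):
        # Struct-like values: convert each field.
        return {key: maps_to_dicts(val) for key, val in value.items()}
    if isinstance(value, list):
        looks_like_map = bool(value) and all(
            isinstance(item, tuple) and len(item) == 2 for item in value
        )
        if looks_like_map:
            # Later duplicates of a key silently overwrite earlier ones.
            return {key: maps_to_dicts(val) for key, val in value}
        return [maps_to_dicts(item) for item in value]
    return value


# A map whose values are themselves maps (or an empty, ambiguous list).
nested_map = [("a", [("x", 1), ("y", 2)]), ("b", [])]
converted_map = maps_to_dicts(nested_map)

# A list of maps.
list_of_maps = [[("k", 1)], [("k", 2)]]
converted_list = maps_to_dicts(list_of_maps)
```

Every user who wants dicts ends up writing and applying something like this per column, and the empty-list ambiguity shows why doing it outside the library, without the Arrow type information, is fragile.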
I propose a simple bool switch in PandasOptions for `table.to_pandas(...)` to generate pydicts for maps. This creates a symmetrical behavior for the (pydict -> arrow maps) flow, as well.

As alluded to above, the cons are that:
I think the upsides of ergonomic flexibility outweigh these cons.
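The duplicate-key trade-off mentioned earlier can be illustrated in plain Python. This is only a sketch of the semantics a dict conversion implies, not PyArrow's implementation; `to_dict_strict` is a hypothetical helper showing one way a conversion could refuse to drop data.

```python
# A map entry list as it appears today: ordered, duplicates allowed.
entries = [("a", 1), ("b", 2), ("a", 3)]

# Plain dict() keeps the last value for a repeated key, so one entry is lost.
lossy = dict(entries)


def to_dict_strict(pairs):
    """Hypothetical helper: build a dict, raising on a duplicate key."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate map key: {key!r}")
        result[key] = value
    return result


try:
    to_dict_strict(entries)
    strict_failed = False
except ValueError:
    strict_failed = True
```

Here `lossy` ends up with two keys instead of three entries, while the strict variant raises. Which behavior is preferable depends on whether duplicate keys are expected in the data.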
Related, I think there's a bug that precludes (pydicts -> arrow maps) when the type is nested (e.g. list of maps). That should be fixed as well to provide a more featureful map experience.
Component(s)
C++, Python