ARROW-2074: [Python] Infer lists of dicts as struct arrays #1935

Closed
wants to merge 2 commits into from

@pitrou

commented Apr 23, 2018

Also refactor the type inference visitor and remove the superfluous separate SeqVisitor; improve inference visitor performance by 30%; and add a struct type inference benchmark.

ARROW-2074: [Python] Infer lists of dicts as struct arrays
Also refactor the type inference visitor, improve visitor performance by ~30%,
and add a benchmark for struct type inference.
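
To illustrate what "inferring lists of dicts as struct arrays" means, here is a minimal pure-Python sketch of the idea. It is not the visitor from this PR (which lives in Arrow's C++ code). The function names `infer_type` and `_scalar_type`, the type-name strings, and the rule that mixed kinds raise `TypeError` are all assumptions made for illustration.

```python
from typing import Any, Iterable

# Hypothetical scalar mapping; bool must come before int because
# bool is a subclass of int in Python.
_SCALARS = [
    (bool, "bool"),
    (int, "int64"),
    (float, "float64"),
    (str, "string"),
    (bytes, "binary"),
]


def _scalar_type(value: Any) -> str:
    """Map one non-container Python value to an illustrative type name."""
    for py_type, name in _SCALARS:
        if isinstance(value, py_type):
            return name
    raise TypeError(f"unsupported value: {value!r}")


def infer_type(values: Iterable[Any]) -> str:
    """Infer one element type for `values` in a single pass (sketch only)."""
    scalar_types = set()
    struct_fields = {}  # field name -> values seen, in first-seen order
    list_items = []
    saw_struct = saw_list = False

    for value in values:
        if value is None:
            continue  # nulls are treated as compatible with any type
        if isinstance(value, dict):
            saw_struct = True
            for key, field_value in value.items():
                struct_fields.setdefault(key, []).append(field_value)
        elif isinstance(value, (list, tuple)):
            saw_list = True
            list_items.extend(value)
        else:
            scalar_types.add(_scalar_type(value))

    kinds = len(scalar_types) + int(saw_struct) + int(saw_list)
    if kinds == 0:
        return "null"
    if kinds > 1:
        # Assumption of this sketch: no promotion between kinds.
        raise TypeError("cannot infer a single type from mixed values")
    if saw_struct:
        # Each dict key becomes a struct field, inferred recursively.
        fields = ", ".join(
            f"{name}: {infer_type(seen)}" for name, seen in struct_fields.items()
        )
        return f"struct<{fields}>"
    if saw_list:
        return f"list<{infer_type(list_items)}>"
    return scalar_types.pop()


rows = [{"a": 1, "b": "x"}, {"a": 2, "b": None}, None]
print(infer_type(rows))
```

In this sketch, a dict contributes its keys as struct fields, and each field's type is inferred from all values seen under that key, so a `None` in one row does not prevent inference from the other rows.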

@pitrou pitrou force-pushed the pitrou:ARROW-2074-infer-dict-lists branch from bfc1f3b to 3baa2ea Apr 23, 2018

@pitrou


commented Apr 23, 2018

Benchmark numbers here:

  • before:
[100.00%] ··· Running convert_builtins.InferPyListToArray.time_infer                                                                                     ok
[100.00%] ···· 
               ============ =============
                   type                  
               ------------ -------------
                  int64       11.0±0.1ms 
                 float64     10.3±0.07ms 
                   bool      9.37±0.04ms 
                 decimal      297±0.9ms  
                  binary      14.9±0.2ms 
                  ascii       17.3±0.3ms 
                 unicode      29.7±0.8ms 
                int64 list    96.8±0.6ms 
               ============ =============
  • after:
[100.00%] ··· Running convert_builtins.InferPyListToArray.time_infer                                                                                     ok
[100.00%] ···· 
               ============ =============
                   type                  
               ------------ -------------
                  int64       7.41±0.2ms 
                 float64     6.68±0.04ms 
                   bool      5.75±0.01ms 
                 decimal      292±0.8ms  
                  binary      11.4±0.2ms 
                  ascii       14.1±0.3ms 
                 unicode      26.3±0.7ms 
                int64 list    74.8±0.6ms 
                  struct       70.7±4ms  
               ============ =============
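
For reference, the relative change per type can be computed from the two tables above. The snippet below is only a convenience sketch: the timings are copied from the benchmark output (mean values, ignoring the ± spread), and `reduction_pct` is a hypothetical helper name. The `struct` row is omitted because it has no "before" measurement.

```python
# Mean timings in milliseconds, copied from the asv output above.
before_ms = {
    "int64": 11.0, "float64": 10.3, "bool": 9.37, "decimal": 297.0,
    "binary": 14.9, "ascii": 17.3, "unicode": 29.7, "int64 list": 96.8,
}
after_ms = {
    "int64": 7.41, "float64": 6.68, "bool": 5.75, "decimal": 292.0,
    "binary": 11.4, "ascii": 14.1, "unicode": 26.3, "int64 list": 74.8,
}


def reduction_pct(before: float, after: float) -> float:
    """Percentage of the original running time that was saved."""
    return 100.0 * (before - after) / before


reductions = {
    name: reduction_pct(before_ms[name], after_ms[name]) for name in before_ms
}

for name, pct in reductions.items():
    print(f"{name:>10}: {pct:5.1f}% less time")
```

On these numbers the reduction is a little over 30% for `int64`, `float64` and `bool`, and smaller for the string, list and decimal cases, where per-item work other than the visitor itself presumably dominates.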
@xhochy

xhochy approved these changes Apr 25, 2018

+1, LGTM

@xhochy xhochy closed this in 3d7a5a6 Apr 25, 2018

@pitrou pitrou deleted the pitrou:ARROW-2074-infer-dict-lists branch Apr 25, 2018

pitrou added a commit that referenced this pull request May 1, 2018

ARROW-2499: [C++] Factor out Python iteration routines
Speeds up list to Arrow conversions by up to 15%. Also fixes a bug where creating a list array would not check that all input items are sequences.

Based on PR #1935.

Author: Antoine Pitrou <antoine@python.org>

Closes #1940 from pitrou/ARROW-2499-python-iteration-refactor and squashes the following commits:

ac31c6c <Antoine Pitrou> Fix Ndarray1DIndexer::is_strided (unused)
91c5af1 <Antoine Pitrou> Add TODO for performance issue
00cab9a <Antoine Pitrou> ARROW-2499:  Refactor Python iteration