Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-11035: [Rust] Improved performance of casting to utf8 #9014

Closed
wants to merge 5 commits into from
Closed

ARROW-11035: [Rust] Improved performance of casting to utf8 #9014

wants to merge 5 commits into from

Conversation

jorgecarleitao
Copy link
Member

@jorgecarleitao jorgecarleitao commented Dec 26, 2020

cast i64 to string 512  time:   [92.618 us 92.839 us 93.097 us]                                   
                        change: [-14.915% -14.287% -13.743%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)

@codecov-io
Copy link

codecov-io commented Dec 26, 2020

Codecov Report

Merging #9014 (52e1ecb) into master (51672b2) will decrease coverage by 0.00%.
The diff coverage is 91.30%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9014      +/-   ##
==========================================
- Coverage   82.61%   82.61%   -0.01%     
==========================================
  Files         202      202              
  Lines       50048    50027      -21     
==========================================
- Hits        41347    41328      -19     
+ Misses       8701     8699       -2     
Impacted Files Coverage Δ
rust/arrow/src/json/reader.rs 81.39% <71.42%> (-0.11%) ⬇️
rust/arrow/src/compute/kernels/cast.rs 96.99% <100.00%> (+0.16%) ⬆️
rust/arrow/src/csv/reader.rs 94.48% <100.00%> (+0.15%) ⬆️
rust/parquet/src/encodings/encoding.rs 95.24% <0.00%> (-0.20%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 51672b2...52e1ecb. Read the comment docs.

@github-actions
Copy link


Ok(b.finish())
from.iter()
.map(|maybe_value| maybe_value.map(|value| value.to_string()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, we probably can use lexical like here later #9010

Ok(Arc::new(
array
.iter()
.map(|value| value.map(|value| if value { "1" } else { "0" }))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the speed up come here from not using the string builder or is using this iterator also faster?
It looks at least better, so if no difference this is better 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I suspect the builder, because the iterator does the same thing as before atm (i.e. same bound checks).

The builders IMO are inefficient atm. Since IMO they are less idiomatic, I do not see any issue in replacing them whenever we can ^_^

@github-actions github-actions bot added the needs-rebase A PR that needs to be rebased by the author label Dec 27, 2020
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks cleaner and it is faster. Double win. Nice work @jorgecarleitao

@alamb
Copy link
Contributor

alamb commented Dec 31, 2020

The full set of Rust CI tests did not run on this PR :(

Can you please rebase this PR against apache/master to pick up the changes in #9056 so that they do?

I apologize for the inconvenience.

@jorgecarleitao jorgecarleitao removed the needs-rebase A PR that needs to be rebased by the author label Jan 1, 2021
@alamb alamb closed this in fcc2227 Jan 1, 2021
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
```
cast i64 to string 512  time:   [92.618 us 92.839 us 93.097 us]
                        change: [-14.915% -14.287% -13.743%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
```

Closes apache#9014 from jorgecarleitao/speed_cast

Authored-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants