-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: Use Arc<str>
instead of Cow<&'a>
in the analyzer
#9824
Conversation
@alamb do you run sql planner benchmark vs baseline? |
With
|
@alamb @jayzhan211 I can see some improvements for the |
Is it possible to replace with |
I'll try but imho I dont expect much diff as those strings are mostly immutable and should not be much diff between |
I agree there is not a lot of practical difference -- but I think the for |
I think we should proceed 🚀 What I think we should do is get version 37 released so we can proceed. |
yes, its 1 more level of indirection, I'll try to go with |
Most of the time, you can make an |
oops |
with
|
Latest
@alamb @jayzhan211 which way you guys prefer? |
Arc<String>
instead of Cow<&'a>
in the analyzer
The technical difference between the two is:
So If you measure a perf difference, then this is probably because one (or more) of these points hold:
That said, since |
I prefer |
I am feeling DataFusion planning time is going to be very much improved in Version 38.0.0 🚀 -- I'll check this PR out later today. Very excited |
Modified to |
@@ -38,9 +33,9 @@ impl SchemaReference<'_> { | |||
} | |||
} | |||
|
|||
pub type OwnedSchemaReference = SchemaReference<'static>; | |||
pub type OwnedSchemaReference = SchemaReference; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gonna deprecate this type in following PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -246,24 +240,7 @@ impl<'a> TableReference<'a> { | |||
/// Converts directly into an [`OwnedTableReference`] by cloning | |||
/// the underlying data. | |||
pub fn to_owned_reference(&self) -> OwnedTableReference { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gonna deprecate this method in following PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arc<String>
instead of Cow<&'a>
in the analyzerArc<str>
instead of Cow<&'a>
in the analyzer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TLDR is I think this is awesome -- thank you @comphead . Our planning speed is going to be very sweet indeed.
Modified to Arc as discussed. The better performance benefit for Arc I think still explained by using String more often across the analyzer than str and the conversion is cheaper
Yeah, this is really strange. I think the issue is exactly as you describe, which is that using an Arc<String>
can simply take the string that was allocated by
Latest Arc consistently improves 5-7%, in very good cases even 14%
I am sorry I think I missed this comment. In would like to change my vote and say Arc<String>
though I realize you already changed the PR once so I think it would also be ok to merge this as is
I am also running the benchmarks as well and I am going to try and make a test PR that tries using Update: testing with #9916 ps. it would be sweet to see what this does for the TPC-DS benchmarks #9907 |
I'll merge it as is, and will create another one with Arc again, so we can see if there is more benefits |
Sounds good -- we can also potentially switch to using |
Update it turns out that I got significant performance improvements by using |
Yes, my very first implementation was to get the |
Which issue does this PR close?
Closes #9764
Rationale for this change
In the analyzer there are mostly immutable strings so we may want to use
Arc<str>
which performs better thanCow<&'a str>
for immutable stringsWhat changes are included in this PR?
Replace datatype
Are these changes tested?
yes
Are there any user-facing changes?