
ARROW-6038: [C++] Faster type equality#4983

Closed
pitrou wants to merge 1 commit into apache:master from pitrou:ARROW-6038-faster-type-equality

pitrou (Member) commented Aug 1, 2019

When checking for type equality, compute and cache a fingerprint of the type so as to avoid costly nested type walking and multiple comparisons.

Before:

----------------------------------------------------------------
Benchmark                         Time           CPU Iterations
----------------------------------------------------------------
TypeEqualsSimple                 13 ns         13 ns   55242976   150.558M items/s
TypeEqualsComplex               430 ns        430 ns    1637275   4.43634M items/s
TypeEqualsWithMetadata          595 ns        595 ns    1199216   3.20778M items/s
SchemaEquals                   1465 ns       1465 ns     479512   1.30226M items/s
SchemaEqualsWithMetadata        922 ns        922 ns     763752    2.0683M items/s

After:

----------------------------------------------------------------
Benchmark                         Time           CPU Iterations
----------------------------------------------------------------
TypeEqualsSimple                 11 ns         11 ns   65531752   178.723M items/s
TypeEqualsComplex                20 ns         20 ns   33939830   95.1497M items/s
TypeEqualsWithMetadata           31 ns         31 ns   22979555   62.4052M items/s
SchemaEquals                     40 ns         40 ns   17786532   48.1683M items/s
SchemaEqualsWithMetadata         46 ns         46 ns   15173158   41.3242M items/s
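The core idea can be illustrated with a minimal sketch, assuming a simplified, hypothetical type hierarchy (DataType, Int32Type and ListType below are illustrative stand-ins, not Arrow's classes, and the "i32" / "list{...}" encoding is invented for the example). Each type lazily computes a string fingerprint once, nested types embed their children's cached fingerprints, and equality becomes a single string comparison. Thread safety is deliberately ignored here.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, simplified type hierarchy; not Arrow's actual classes.
class DataType {
 public:
  virtual ~DataType() = default;

  // Returns the cached fingerprint, computing it on first use.
  // NOTE: this lazy initialization is not thread-safe.
  const std::string& fingerprint() const {
    if (!fingerprint_computed_) {
      fingerprint_ = ComputeFingerprint();
      fingerprint_computed_ = true;
    }
    return fingerprint_;
  }

  // Equality reduces to one string comparison, however deeply nested the type.
  bool Equals(const DataType& other) const {
    return fingerprint() == other.fingerprint();
  }

 protected:
  virtual std::string ComputeFingerprint() const = 0;

 private:
  mutable std::string fingerprint_;
  mutable bool fingerprint_computed_ = false;
};

class Int32Type : public DataType {
 protected:
  std::string ComputeFingerprint() const override { return "i32"; }
};

class ListType : public DataType {
 public:
  explicit ListType(std::shared_ptr<DataType> value_type)
      : value_type_(std::move(value_type)) {}

 protected:
  // Nested types reuse the child's cached fingerprint instead of re-walking it.
  std::string ComputeFingerprint() const override {
    return "list{" + value_type_->fingerprint() + "}";
  }

 private:
  std::shared_ptr<DataType> value_type_;
};
```

After the first call, comparing two deeply nested types no longer walks their children; only the cached strings are compared.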

pitrou (Member, Author) commented Aug 1, 2019

@emkornfield @fsaintjacques @wesm Looking for design feedback on this.

Faster type equality may allow us to enable more runtime checks and make Arrow more user-friendly (see e.g. ARROW-6038).

pitrou (Member, Author) commented Aug 1, 2019

(note a similar optimization can be added to Schema::Equals)

fsaintjacques (Contributor) left a comment

Do we want to keep type.h as small as possible? Maybe the ComputeFingerprint could be a separate VisitorType class?

Contributor

"I" (capital i) is not something I've seen often, the standard is often u for unsigned and i for signed. I also wonder if we should use bit-width instead of byte. This would allow you to use i1 as bool.

Member Author

Thanks, I'll consider it.
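The reviewer's suggested scheme could look like the following hypothetical helpers (IntegerFingerprint and BooleanFingerprint are not from the PR): a lowercase i or u for signedness followed by the width in bits, which leaves "i1" available to denote a boolean.

```cpp
#include <cassert>
#include <string>

// Hypothetical encoding: 'i' (signed) or 'u' (unsigned), then the width in bits.
std::string IntegerFingerprint(bool is_signed, int bit_width) {
  std::string out(1, is_signed ? 'i' : 'u');
  out += std::to_string(bit_width);
  return out;
}

// With bit widths, a boolean is naturally "i1" (as in LLVM IR). Arrow has no
// 1-bit integer type, so this cannot collide with an integer encoding.
std::string BooleanFingerprint() { return "i1"; }
```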

Member Author

Or we could simply use the type id... I think it's gonna be a long time before it goes past 128 :-)
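Using the type id directly as the fingerprint prefix could look like this sketch. The TypeId enum is a hypothetical stand-in, not Arrow's actual type enumeration; the point is that a single char per type id is enough as long as every id stays below 128.

```cpp
#include <cassert>
#include <string>

// Hypothetical subset of type ids; Arrow's real enumeration differs.
enum class TypeId : int { kNull, kBool, kInt32, kList };

// Encode the type id as exactly one character. This only holds while every
// id fits in a (possibly signed) char, i.e. stays below 128.
std::string TypeIdPrefix(TypeId id) {
  const int value = static_cast<int>(id);
  assert(value >= 0 && value < 128);
  return std::string(1, static_cast<char>(value));
}
```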

Contributor

Delete the boolean flag, make fingerprint a shared_ptr and use atomic_compare_exchange in the write path (upon failure, let the first writer win). All read paths should use atomic_load.

I'd use unique_ptr, but there's no atomic ops on them.
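The suggestion can be sketched with the C++11 atomic free functions for std::shared_ptr (std::atomic_load and std::atomic_compare_exchange_strong, deprecated in C++20 in favor of std::atomic<std::shared_ptr<T>>). LazyFingerprint is a hypothetical class, not code from the PR. Every read through std::atomic_load returns a shared_ptr copy, which means a reference-count update on each access.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <string>

// Hypothetical holder: no boolean flag, just a shared_ptr that stays null
// until the fingerprint is first computed.
class LazyFingerprint {
 public:
  template <typename ComputeFn>
  std::shared_ptr<const std::string> Get(ComputeFn compute) const {
    // Read path: atomic load.
    std::shared_ptr<const std::string> current = std::atomic_load(&fingerprint_);
    if (current) return current;

    std::shared_ptr<const std::string> computed =
        std::make_shared<std::string>(compute());
    std::shared_ptr<const std::string> expected;  // null
    // Write path: install only if still null. On failure, `expected` receives
    // the value stored by the first writer, which wins.
    if (std::atomic_compare_exchange_strong(&fingerprint_, &expected, computed)) {
      return computed;
    }
    return expected;
  }

 private:
  mutable std::shared_ptr<const std::string> fingerprint_;
};
```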

Member Author

That's the plan indeed. This PR is a draft.

Member Author

I may use a raw pointer actually, because atomic ops on shared_ptr make copies of the latter.

Member Author

Another possibility is to implement our own simplified atomic unique_ptr. What do you think?
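A simplified atomic unique_ptr might look like the hypothetical AtomicUniquePtr below: it wraps a raw std::atomic<T*>, lets the first writer win via compare_exchange_strong, frees the losing value, and deletes the stored object on destruction. Unlike atomic operations on std::shared_ptr, a read is a plain atomic pointer load with no reference-count traffic.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <string>

// Hypothetical minimal "atomic unique_ptr": set at most once, owns its value.
template <typename T>
class AtomicUniquePtr {
 public:
  AtomicUniquePtr() = default;
  AtomicUniquePtr(const AtomicUniquePtr&) = delete;
  AtomicUniquePtr& operator=(const AtomicUniquePtr&) = delete;
  ~AtomicUniquePtr() { delete ptr_.load(); }

  // Read path: acquire load pairs with the release in set_if_null().
  T* load() const { return ptr_.load(std::memory_order_acquire); }

  // Installs `value` if nothing is stored yet; otherwise discards it.
  // Returns the pointer that ends up stored (first writer wins).
  T* set_if_null(std::unique_ptr<T> value) {
    T* expected = nullptr;
    if (ptr_.compare_exchange_strong(expected, value.get(),
                                     std::memory_order_acq_rel,
                                     std::memory_order_acquire)) {
      return value.release();  // ownership moves into ptr_
    }
    return expected;  // the loser's `value` is freed when it goes out of scope
  }

 private:
  std::atomic<T*> ptr_{nullptr};
};
```

A type could then hold an AtomicUniquePtr<std::string> for its fingerprint: a reader that sees nullptr computes the fingerprint, calls set_if_null, and uses whichever pointer comes back.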

fsaintjacques (Contributor), Aug 5, 2019

Another solution: why not pay this cost in the type constructor? Types are rarely (or should rarely be) created at a frequency where this optimization (lazy evaluation) matters.

Member Author

I don't know, I'm a bit wary of piling up costs for optional functionality in the constructor.

Member

I'm also wary of adding extra work in constructors if it isn't always needed.

Contributor

str, bin are prefixes that are self-explanatory (su/sb are not), albeit one character longer.

Member Author

Good idea.

Contributor

ss << (nullable_ ? 'n' : 'N');
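The unparenthesized form ss << nullable_ ? 'n' : 'N' has an operator-precedence pitfall: << binds more tightly than ?:, so it parses as (ss << nullable_) ? 'n' : 'N', which streams the bool as 1 or 0 and discards the selected character. A hypothetical helper (NullableFlag is not from the PR) with the conditional parenthesized:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Parenthesize the conditional so the chosen character is what gets streamed.
std::string NullableFlag(bool nullable) {
  std::ostringstream ss;
  ss << (nullable ? 'n' : 'N');
  return ss.str();
}
```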

kszucs (Member) commented Aug 2, 2019

@fsaintjacques try running ursabot benchmark?

fsaintjacques (Contributor)

@kszucs why?

kszucs (Member) commented Aug 4, 2019

Simply just to start using it regularly :)

Contributor

Do stringstreams have some sort of small buffer allocated ahead of time (à la the small string optimization in standard strings)?
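Whether std::ostringstream preallocates a small buffer is implementation-specific. One way to sidestep the question is to build the fingerprint directly in a std::string and reserve() its final size up front, as in this hypothetical StructFingerprint helper (the {child;child;} layout is invented for the example).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical: build a struct fingerprint with one allocation, no stringstream.
std::string StructFingerprint(const std::vector<std::string>& child_fingerprints) {
  std::size_t total = 2;  // opening and closing braces
  for (const auto& child : child_fingerprints) total += child.size() + 1;

  std::string out;
  out.reserve(total);  // exact final size, so appends below never reallocate
  out += '{';
  for (const auto& child : child_fingerprints) {
    out += child;
    out += ';';  // separator after each child
  }
  out += '}';
  return out;
}
```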

Member Author

I have no idea. I suspect it depends on the implementation.

Contributor

This seems like it would make the check expensive again if clients made heavy use of metadata. I wonder if it makes sense to make a first-class TypeFingerprint class that can do the comparison incrementally?

Member Author

The fingerprint is cached on the field, so I'm not sure what you mean here.
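One reading of "cached on the field" is sketched below: a hypothetical Field that caches two fingerprints, one without metadata and one extended with the metadata, so the metadata cost is paid at most once per field rather than on every comparison. The encoding (nullability flag, length-prefixed strings) is an assumption, and the lazy caching is not thread-safe.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical helper: length-prefix a string so concatenations stay unambiguous.
std::string LengthPrefixed(const std::string& s) {
  return std::to_string(s.size()) + ':' + s;
}

// Hypothetical Field caching two fingerprints (not thread-safe).
class Field {
 public:
  Field(std::string name, std::string type_fingerprint, bool nullable,
        std::map<std::string, std::string> metadata = {})
      : name_(std::move(name)),
        type_fingerprint_(std::move(type_fingerprint)),
        nullable_(nullable),
        metadata_(std::move(metadata)) {}

  // Fingerprint ignoring metadata. Never empty once computed.
  const std::string& fingerprint() const {
    if (fingerprint_.empty()) {
      fingerprint_ = std::string(1, nullable_ ? 'n' : 'N') +
                     LengthPrefixed(name_) + LengthPrefixed(type_fingerprint_);
    }
    return fingerprint_;
  }

  // Fingerprint including metadata. std::map iterates in sorted key order,
  // so equal metadata always yields the same string.
  const std::string& metadata_fingerprint() const {
    if (metadata_fingerprint_.empty()) {
      std::string out = fingerprint();
      for (const auto& kv : metadata_) {
        out += LengthPrefixed(kv.first);
        out += LengthPrefixed(kv.second);
      }
      metadata_fingerprint_ = std::move(out);
    }
    return metadata_fingerprint_;
  }

  bool Equals(const Field& other, bool check_metadata) const {
    if (check_metadata) {
      return metadata_fingerprint() == other.metadata_fingerprint();
    }
    return fingerprint() == other.fingerprint();
  }

 private:
  std::string name_;
  std::string type_fingerprint_;
  bool nullable_;
  std::map<std::string, std::string> metadata_;
  mutable std::string fingerprint_;
  mutable std::string metadata_fingerprint_;
};
```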

pitrou (Member, Author) commented Aug 5, 2019

> Do we want to keep type.h as small as possible? Maybe the ComputeFingerprint could be a separate VisitorType class?

I have no strong preference, but why not.

pitrou force-pushed the ARROW-6038-faster-type-equality branch 2 times, most recently from 4a7a855 to 83a2286 (August 5, 2019 18:01)
pitrou (Member, Author) commented Aug 5, 2019

Ok, I've updated the PR to handle check_metadata correctly and also handle schema equality. Will update benchmarks in the PR description.

pitrou force-pushed the ARROW-6038-faster-type-equality branch 2 times, most recently from 01c6805 to 782cb85 (August 6, 2019 11:13)
pitrou changed the title from "[WIP] [C++] Faster type equality" to "ARROW-6038: [C++] Faster type equality" (Aug 6, 2019)
pitrou changed the title from "ARROW-6038: [C++] Faster type equality" to "[WIP] ARROW-6038: [C++] Faster type equality" (Aug 6, 2019)
pitrou (Member, Author) commented Aug 6, 2019

Note: still need to handle UnionType and ExtensionType, and I think that's all.

Edit: done. Fingerprint can be implemented by each ExtensionType subclass individually.
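A hypothetical sketch of how an opt-in fingerprint could coexist with a slower comparison for extension types: an empty fingerprint means "not fingerprintable", and Equals falls back to ExtensionEquals unless both sides provide one. The method names loosely mirror Arrow's ExtensionType (extension_name, ExtensionEquals), but this is a standalone simplification; UuidType and ParametricType are made up, and the fingerprint is recomputed rather than cached for brevity.

```cpp
#include <cassert>
#include <string>

class ExtensionType {
 public:
  virtual ~ExtensionType() = default;
  virtual std::string extension_name() const = 0;
  // Slow path, always available.
  virtual bool ExtensionEquals(const ExtensionType& other) const = 0;
  // Default: opt out of fingerprinting. Subclasses may override.
  virtual std::string ComputeFingerprint() const { return ""; }

  bool Equals(const ExtensionType& other) const {
    const std::string left = ComputeFingerprint();
    const std::string right = other.ComputeFingerprint();
    if (!left.empty() && !right.empty()) return left == right;  // fast path
    return ExtensionEquals(other);                              // fallback
  }
};

// Implements a fingerprint, so it can use the fast path.
class UuidType : public ExtensionType {
 public:
  std::string extension_name() const override { return "uuid"; }
  bool ExtensionEquals(const ExtensionType& other) const override {
    return other.extension_name() == extension_name();
  }
  std::string ComputeFingerprint() const override { return "ext:uuid"; }
};

// No fingerprint: comparisons always go through ExtensionEquals().
class ParametricType : public ExtensionType {
 public:
  explicit ParametricType(int param) : param_(param) {}
  std::string extension_name() const override { return "parametric"; }
  bool ExtensionEquals(const ExtensionType& other) const override {
    auto* o = dynamic_cast<const ParametricType*>(&other);
    return o != nullptr && o->param_ == param_;
  }

 private:
  int param_;
};
```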

pitrou force-pushed the ARROW-6038-faster-type-equality branch from 782cb85 to 8245882 (August 6, 2019 13:49)
pitrou changed the title from "[WIP] ARROW-6038: [C++] Faster type equality" to "ARROW-6038: [C++] Faster type equality" (Aug 6, 2019)
pitrou (Member, Author) commented Aug 6, 2019

The main downside here is that this adds a bit of complexity and fragility (ComputeFingerprint must be carefully implemented to match Equals). The upside of course is much faster type and (especially) schema comparison. Comparing schemas for hundreds of columns should actually have reasonable performance (under the hood it's probably a memcmp call).
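Why schema comparison becomes cheap can be sketched as follows, assuming (hypothetically) that a schema fingerprint is the concatenation of its fields' cached fingerprints: comparing two schemas then reduces to a length check plus a single memcmp over the concatenated bytes, instead of walking every field's type tree. SchemaFingerprint, FingerprintsEqual and the field fingerprint strings are illustrative only.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical: concatenate the fields' (cached) fingerprints.
std::string SchemaFingerprint(const std::vector<std::string>& field_fingerprints) {
  std::string out = "S{";
  for (const auto& f : field_fingerprints) {
    out += f;
    out += ';';
  }
  out += '}';
  return out;
}

// Equivalent to a == b, spelled out to show the underlying work.
bool FingerprintsEqual(const std::string& a, const std::string& b) {
  return a.size() == b.size() && std::memcmp(a.data(), b.data(), a.size()) == 0;
}
```

The sketch also shows where the fragility comes from: if two fields that Equals would consider different ever produced the same fingerprint string, the fast path would silently report them as equal.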

pitrou force-pushed the ARROW-6038-faster-type-equality branch 2 times, most recently from 6f21730 to b58f8d8 (August 7, 2019 08:44)
pitrou force-pushed the ARROW-6038-faster-type-equality branch from b58f8d8 to 2fdaf4a (August 7, 2019 13:12)
wesm (Member) left a comment

+1. Not able to scrutinize the details closely but this is a big performance improvement, so thank you for taking great care with it =)
