Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PARQUET-1523: [C++] Vectorize comparator interface #3752

Closed
wants to merge 3 commits into from

Conversation

majetideepak
Copy link
Contributor

No description provided.

@wesm
Copy link
Member

wesm commented Feb 26, 2019

This isn't what I was thinking when I wrote PARQUET-1523. I thought the proposal was to remove the scalar virtual calls altogether -- this still has virtual calls on the hot path AFAICT. I would be willing to help with refactoring this further

@majetideepak
Copy link
Contributor Author

Ok. I will make another pass.

@wesm
Copy link
Member

wesm commented Apr 24, 2019

@majetideepak do you have some time to look at this? I can also take a pass on it possibly end of this week or next week

@majetideepak
Copy link
Contributor Author

@wesm I got side-tracked on fixing the build scripts for Vertica's build environment. I will likely finish that this week. I can only work on this next week. Feel free to take a pass on this if you get time before that.

@majetideepak
Copy link
Contributor Author

@wesm I will work on this today and get back to you.

@wesm
Copy link
Member

wesm commented Apr 26, 2019

Thanks. I'll take a look at the build system improvements

@majetideepak
Copy link
Contributor Author

@wesm I spent some more time with this but couldn't figure out a solution without virtual functions.
The challenge is that the comparator class depends on both the Physical Type and the SortOrder (SIGNED, UNSIGNED). The TypedRowGroupStatistics class that requires this comparator is only templated on the Physical Type. How can we resolve this at compile-time?

@wesm
Copy link
Member

wesm commented Apr 29, 2019

Let me have a look at this with fresh eyes and see what I can come up with.

@wesm
Copy link
Member

wesm commented Apr 30, 2019

Making some progress on this. I will try to get a patch up by mid-week if not sooner

wesm added a commit that referenced this pull request May 2, 2019
…lls on inner loop. Refactor Statistics to not require PARQUET_EXTERN_TEMPLATE

This patch supersedes #3752

I took the liberty of consolidating the comparator code with the statistics code since the two things are effectively inseparable. I also renamed the statistics classes for clarity, since "Statistics" is clearer than "RowGroupStatistics" -- the "scope" of the statistics need not be limited to a row group.

I apologize for the size of the diff; it is largely the result of moving code around and shuffling code from header files into parquet/statistics.cc

Author: Wes McKinney <wesm+git@apache.org>

Closes #4233 from wesm/PARQUET-1523-vectorize-comparator and squashes the following commits:

a1f2f38 <Wes McKinney> Code review feedback, fix doxygen warning
5493b59 <Wes McKinney> Add basic doxygen comments, fix clang warnings
e6efdc0 <Wes McKinney> Restore UNKNOWN default sort order for INT96
3a58b1f <Wes McKinney> Restore PARQUET_TEMPLATE_CLASS_EXPORT to TypedScanner
ab5d01f <Wes McKinney> Make benchmark data for int64 write path larger and more diverse
335607d <Wes McKinney> Fix parquet-column_writer-test
6273789 <Wes McKinney> Statistics tests passing again
d55bf7d <Wes McKinney> Compiling again, not quite there
9201261 <Wes McKinney> Refactoring progress
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants