Skip to content

ARROW-7377: [C++][Dataset] Add ScanOptions::MaterializedFields#6018

Closed
fsaintjacques wants to merge 1 commit intoapache:masterfrom
fsaintjacques:ARROW-7377-simplify-parquet-projection
Closed

ARROW-7377: [C++][Dataset] Add ScanOptions::MaterializedFields#6018
fsaintjacques wants to merge 1 commit intoapache:masterfrom
fsaintjacques:ARROW-7377-simplify-parquet-projection

Conversation

@fsaintjacques
Copy link
Copy Markdown
Contributor

A small utility used to simplify parquet's implementation. Will be useful for other file formats.

@github-actions
Copy link
Copy Markdown

@fsaintjacques fsaintjacques requested a review from pitrou December 12, 2019 12:54
@fsaintjacques fsaintjacques force-pushed the ARROW-7377-simplify-parquet-projection branch 2 times, most recently from 7c56e03 to 712a1f4 Compare December 12, 2019 13:27
Copy link
Copy Markdown
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

As a matter of taste I'd prefer that a helper like this was a non member function (in scanner_internal.h, probably).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non blocking: Maybe eventually this should be an accessor for a (maybe lazily initialized) shared_ptr<unordered_set<string>>

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that's not compatible with being a non-member though

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what this method does. Can you add a docstring?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, what's the point of having this return an unordered_set? Do you plan to implement unions or intersections of those containers?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it's used as a lookup structure. The ordering and the number of repeats is not important (and should not be exposed). I can still convert it to std::vector.

Copy link
Copy Markdown
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/me selects Approve

@fsaintjacques fsaintjacques force-pushed the ARROW-7377-simplify-parquet-projection branch from 712a1f4 to ee979d1 Compare December 12, 2019 14:23
@bkietz
Copy link
Copy Markdown
Member

bkietz commented Dec 13, 2019

@fsaintjacques fsaintjacques force-pushed the ARROW-7377-simplify-parquet-projection branch from ee979d1 to 5424b46 Compare December 13, 2019 19:33
Copy link
Copy Markdown
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two questions.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what this method does. Can you add a docstring?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, what's the point of having this return an unordered_set? Do you plan to implement unions or intersections of those containers?

A small utility used to simplify parquet's implementation. Will be
useful for other file formats.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants