Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PARQUET-7: [thrift] avoid thrift amender if all fields are optional. #7

Closed

Conversation

dvryaboy
Copy link
Contributor

No perf tests... I did some rudimentary testing which seems to bear out the idea but nothing scientific yet.

@dvryaboy
Copy link
Contributor Author

Hi @yan-qi , I created a patch that automatically disables the amender when the schema has no required fields. That's not as good as we could do, but it's a start, and should affect your schema in particular. Could you test using this pull request?

@yan-qi
Copy link

yan-qi commented Jul 7, 2014

Sorry for the late reply! :)

I did some experiment, and it proves working. Thanks again for the help!

When will this be released?

@dvryaboy
Copy link
Contributor Author

dvryaboy commented Jul 7, 2014

@isnotinvain @julienledem @tsdeng suggestions for improving this?

@yan-qi we need to get this into mergeable state -- as is it's probably a bit hacky and only applies to schemas that have no required fields, which is worse than we could potentially do; although perhaps a sufficiently common use case that we should just push that in for now.

Thanks for testing that this really solves your problem, I'm glad :-).

You could of course just build a jar off this branch, in the meantime, if you need it immediately.

case SET:
return somethingIsRequiredInSet((ThriftType.SetType) elementType);
default:
return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be an exception? Is it valid to land here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is correct. since before it's saying "if (isRequired) return true;"
When it hits the default case, meaning it's a primitive type and also isRequired is false, so here it should just return false.
To make the intention of the code more clear, maybe @dvryaboy can change it to "return isRequired"

@tsdeng
Copy link
Contributor

tsdeng commented Aug 2, 2014

I reimplemented the idea here https://github.com/apache/incubator-parquet-mr/pull/28
We can close this PR if that one works

asfgit referenced this pull request Sep 3, 2014
…n the requested schema

https://issues.apache.org/jira/browse/PARQUET-61
This PR is trying to redo the https://github.com/apache/incubator-parquet-mr/pull/7

In this PR, it fixes the protocol event in a more precise condition:
Only when the requested schema missing some required fields that are present in the full schema

So even if there a projection, as long as the projection is not getting rid of the required field, the protocol events amender will not be called.

Could you take a look at this ? @dvryaboy @yan-qi

Author: Tianshuo Deng <tdeng@twitter.com>

Closes #28 from tsdeng/fix_protocol_when_required_field_missing and squashes the following commits:

ba778b9 [Tianshuo Deng] add continue for readability
d5639df [Tianshuo Deng] fix unused import
090e894 [Tianshuo Deng] format
13a609d [Tianshuo Deng] comment format
ef1fe58 [Tianshuo Deng] little refactor, remove the hasMissingRequiredFieldFromProjection method
7c2c158 [Tianshuo Deng] format
83a5655 [Tianshuo Deng] do ProtocolEvents fixing only when there is required fields missing in the requested schema
tongjiechen referenced this pull request in tongjiechen/incubator-parquet-mr Oct 8, 2014
…n the requested schema

https://issues.apache.org/jira/browse/PARQUET-61
This PR is trying to redo the https://github.com/apache/incubator-parquet-mr/pull/7

In this PR, it fixes the protocol event in a more precise condition:
Only when the requested schema missing some required fields that are present in the full schema

So even if there a projection, as long as the projection is not getting rid of the required field, the protocol events amender will not be called.

Could you take a look at this ? @dvryaboy @yan-qi

Author: Tianshuo Deng <tdeng@twitter.com>

Closes apache#28 from tsdeng/fix_protocol_when_required_field_missing and squashes the following commits:

ba778b9 [Tianshuo Deng] add continue for readability
d5639df [Tianshuo Deng] fix unused import
090e894 [Tianshuo Deng] format
13a609d [Tianshuo Deng] comment format
ef1fe58 [Tianshuo Deng] little refactor, remove the hasMissingRequiredFieldFromProjection method
7c2c158 [Tianshuo Deng] format
83a5655 [Tianshuo Deng] do ProtocolEvents fixing only when there is required fields missing in the requested schema
abayer referenced this pull request in cloudera/parquet-mr Oct 13, 2014
…n the requested schema

https://issues.apache.org/jira/browse/PARQUET-61
This PR is trying to redo the https://github.com/apache/incubator-parquet-mr/pull/7

In this PR, it fixes the protocol event in a more precise condition:
Only when the requested schema missing some required fields that are present in the full schema

So even if there a projection, as long as the projection is not getting rid of the required field, the protocol events amender will not be called.

Could you take a look at this ? @dvryaboy @yan-qi

Author: Tianshuo Deng <tdeng@twitter.com>

Closes #28 from tsdeng/fix_protocol_when_required_field_missing and squashes the following commits:

ba778b9 [Tianshuo Deng] add continue for readability
d5639df [Tianshuo Deng] fix unused import
090e894 [Tianshuo Deng] format
13a609d [Tianshuo Deng] comment format
ef1fe58 [Tianshuo Deng] little refactor, remove the hasMissingRequiredFieldFromProjection method
7c2c158 [Tianshuo Deng] format
83a5655 [Tianshuo Deng] do ProtocolEvents fixing only when there is required fields missing in the requested schema
@julienledem
Copy link
Member

this was merged as part of #28

chenjunjiedada pushed a commit to chenjunjiedada/parquet-mr that referenced this pull request Aug 3, 2019
Fix bug: Optional field in RowGroup is treated as always showing up
LantaoJin pushed a commit to LantaoJin/parquet-mr that referenced this pull request Jun 15, 2021
…ncement (apache#7)

* [CARMEL-1260][FOLLOWUP] cannot read non-encrypted columns without decryptProperties if the first column is encrypted

* [CARMEL-1858] fix ParquetFileReader for crypto after performance enhancement
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants