Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++][Parquet] Thrift: Generate movable_types #37379

Closed
mapleFU opened this issue Aug 25, 2023 · 5 comments · Fixed by #37461
Closed

[C++][Parquet] Thrift: Generate movable_types #37379

mapleFU opened this issue Aug 25, 2023 · 5 comments · Fixed by #37461

Comments

@mapleFU
Copy link
Member

mapleFU commented Aug 25, 2023

Describe the enhancement requested

By default, we're using

thrift --gen cpp -out src/generated src/parquet/parquet.thrift

When generating thrift. When I gothrough the code in PageIndex, I found ColumnIndex is copied here:

std::unique_ptr<ColumnIndex> ColumnIndex::Make(const ColumnDescriptor& descr,
                                               const void* serialized_index,
                                               uint32_t index_len,
                                               const ReaderProperties& properties) {
  format::ColumnIndex column_index;
  ThriftDeserializer deserializer(properties);
  deserializer.DeserializeMessage(reinterpret_cast<const uint8_t*>(serialized_index),
                                  &index_len, &column_index);
  switch (descr.physical_type()) {
    // ...
    case Type::DOUBLE:
      return std::make_unique<TypedColumnIndexImpl<DoubleType>>(descr,  column_index);
    case Type::BYTE_ARRAY:
      return std::make_unique<TypedColumnIndexImpl<ByteArrayType>>(descr,  column_index);
    case Type::FIXED_LEN_BYTE_ARRAY:
      return std::make_unique<TypedColumnIndexImpl<FLBAType>>(descr,  column_index);
    case Type::UNDEFINED:
      return nullptr;
  }
  ::arrow::Unreachable("Cannot make ColumnIndex of an unknown type");
  return nullptr;
}

And our default generated data would not have "move ctor". After changed to movable, the performance has 30% optimizations.

Component(s)

C++, Parquet

@mapleFU
Copy link
Member Author

mapleFU commented Aug 25, 2023

A POC can be seen here main...mapleFU:arrow:parquet/thrift-gen-movable

@wgtmac @emkornfield @pitrou would you think this is ok?

@wgtmac
Copy link
Member

wgtmac commented Aug 25, 2023

Is there any dependency on the tool chain? If not, we should probably enable it by default.

@mapleFU
Copy link
Member Author

mapleFU commented Aug 25, 2023

I think it's introduced in apache/thrift@3c5a788 . It's introduced so many years ago in 0.9.3

@pitrou
Copy link
Member

pitrou commented Aug 30, 2023

+1

@mapleFU
Copy link
Member Author

mapleFU commented Aug 30, 2023

I've draft a patch here: #37461

pitrou pushed a commit that referenced this issue Aug 30, 2023
### Rationale for this change

Our generated Thrift bindings do not define move constructors, implying that copies are made when passing values around.

### What changes are included in this PR?

Enable the `moveable_types` option when generating Parquet Thrift.   It's introduced in Thrift 0.9.3 ( See apache/thrift@3c5a788 )

Also, exploit move construction and assignment to improve `TypedColumnIndex` construction performance.

### Are these changes tested?

By existing tests.

### Are there any user-facing changes?

1. Binary might grow larger
2. Some overhead might be reduced

* Closes: #37379

Authored-by: mwish <maplewish117@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
@pitrou pitrou added this to the 14.0.0 milestone Aug 30, 2023
loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue Nov 13, 2023
…#37461)

### Rationale for this change

Our generated Thrift bindings do not define move constructors, implying that copies are made when passing values around.

### What changes are included in this PR?

Enable the `moveable_types` option when generating Parquet Thrift.   It's introduced in Thrift 0.9.3 ( See apache/thrift@3c5a788 )

Also, exploit move construction and assignment to improve `TypedColumnIndex` construction performance.

### Are these changes tested?

By existing tests.

### Are there any user-facing changes?

1. Binary might grow larger
2. Some overhead might be reduced

* Closes: apache#37379

Authored-by: mwish <maplewish117@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024
…#37461)

### Rationale for this change

Our generated Thrift bindings do not define move constructors, implying that copies are made when passing values around.

### What changes are included in this PR?

Enable the `moveable_types` option when generating Parquet Thrift.   It's introduced in Thrift 0.9.3 ( See apache/thrift@3c5a788 )

Also, exploit move construction and assignment to improve `TypedColumnIndex` construction performance.

### Are these changes tested?

By existing tests.

### Are there any user-facing changes?

1. Binary might grow larger
2. Some overhead might be reduced

* Closes: apache#37379

Authored-by: mwish <maplewish117@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants