Skip to content

ARROW-898: [C++/Python] Use shared_ptr to avoid copying KeyValueMetadata, add to Field type also#605

Closed
wesm wants to merge 6 commits intoapache:masterfrom
wesm:ARROW-898
Closed

ARROW-898: [C++/Python] Use shared_ptr to avoid copying KeyValueMetadata, add to Field type also#605
wesm wants to merge 6 commits intoapache:masterfrom
wesm:ARROW-898

Conversation

@wesm
Copy link
Copy Markdown
Member

@wesm wesm commented Apr 26, 2017

This also adds support for adding and removing schema-level metadata to the Python Schema wrapper object. Need to do the same for Field but putting this up for @cpcloud to review since he's working on using this in parquet-cpp

wesm added 4 commits April 26, 2017 13:48
…d / not copied

Change-Id: Ia70e00c9709483899993024ae5a33405114fc84b
Change-Id: I860102ffb34934240ff482d0182d68f5db4217b7
Change-Id: I42532f474b858c401e395a3fc0f476b41d9951d6
Change-Id: Ic27aec0fc84f70c55fe935f4498206a26a951570
flatbuf::CreateKeyValue(fbb, fbb.CreateString(key), fbb.CreateString(value)));
}
*out = flatbuf::CreateSchema(
fbb, endianness(), fb_offsets, fbb.CreateVector(key_value_offsets));
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is no metadata (and not a length-0 object), we will write null to flatbuffers

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes more sense than what I did originally.

Change-Id: Ief9409543b7c7f0c37c473ec0a145a5a6b733430
@wesm
Copy link
Copy Markdown
Member Author

wesm commented Apr 26, 2017

This is ready to go. I created https://issues.apache.org/jira/browse/ARROW-906 as a follow up to serialize the field-level metadata to Flatbuffers.

auto f0 = field("f0", int32());
auto f1 = field("f0", int32(), true, metadata);
std::shared_ptr<Field> f2;
ASSERT_OK(f1->RemoveMetadata(&f2));
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this and Schema::RemoveMetadata can only really fail for esoteric reasons (OOM in std::vector copy operations), we should maybe make them return the result rather than Status

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Change-Id: I80f9e8219920598e12edad763062e6f8aa16709c
const auto& value = custom_metadata_.value(i);
key_value_offsets.push_back(
flatbuf::CreateKeyValue(fbb, fbb.CreateString(key), fbb.CreateString(value)));
const KeyValueMetadata* metadata = schema.metadata().get();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this because there's one fewer function call per iteration in the loop just below?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The get() is not strictly needed, I think it was mostly making the type on the LHS easier to write down =)

@wesm
Copy link
Copy Markdown
Member Author

wesm commented Apr 27, 2017

+1

@asfgit asfgit closed this in e876abb Apr 27, 2017
@wesm wesm deleted the ARROW-898 branch April 27, 2017 13:29
@cpcloud
Copy link
Copy Markdown
Contributor

cpcloud commented Apr 27, 2017

Sweet, thanks for doing this

jeffknupp pushed a commit to jeffknupp/arrow that referenced this pull request Jun 3, 2017
…ata, add to Field type also

This also adds support for adding and removing schema-level metadata to the Python Schema wrapper object. Need to do the same for Field but putting this up for @cpcloud to review since he's working on using this in parquet-cpp

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#605 from wesm/ARROW-898 and squashes the following commits:

03873f7 [Wes McKinney] RemoveMetadata not return Status
c621c2c [Wes McKinney] Add metadata methods to Field, some code cleaning
581b9fa [Wes McKinney] Add unit tests for passing metadata to Field constructor
51fae29 [Wes McKinney] Add metadata to Field
2ce4003 [Wes McKinney] Test sharing metadata
48aa3ca [Wes McKinney] Use shared_ptr<const T> for KeyValueMetadata so metadata can be shared / not copied
pribor pushed a commit to GlobalWebIndex/arrow that referenced this pull request Oct 24, 2025
Fixes apacheGH-605.

If `dev/release/release.sh` shows what should we do at
repository.apache.org, it's helpful.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants