Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Schema::fields and Schema::metadata pub (public) #2239

Merged
merged 2 commits into from
Jul 31, 2022

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Jul 30, 2022

Which issue does this PR close?

Closes #1091

(I will file a follow on ticket to make the metadata type handling uniform between Schema and Field)

Rationale for this change

Whenever working with Schema (e.g. project, etc) I often find myself having to clone() fields and metadat which is:

  1. Inefficient (often we are copying when we already have an owned Schema)
  2. Akward (and easy to forget to copy metadata)

This most recently happened to me on this PR like apache/datafusion#2985

@jorgecarleitao and I discussed various options on #1091 and he suggested that that making members of Schema pub would be good as he astutely observed that

there is no invariant betweem them [members of Schema] , so there is nothing that users can break by accessing individual members.

What changes are included in this PR?

  1. Make Schema fields pub
  2. Add a test outside the crate to use that

Are there any user-facing changes?

Fields are pub 🎉

Alternates considered

I also considered making a function like the following to allow destructuring the Schema

impl Schema {
  // Consume self and return the consitutuent parts
  pub fn into_parts(self) -> (Vec<Field>, HashMap<String, String>)) {
    (self.fields, self.metadata
  }
}

And this would satisfy my usecase and I am happy to do this instead if reviewers prefer.

After I reread #1091 I decided I would propose the 'make things pub' first and see what I reaction I got.

cc @nevi-me @seddonm1 @tustvold @HaoYang670 @andygrove @viirya

@@ -33,11 +33,11 @@ use super::Field;
/// memory layout.
#[derive(Serialize, Deserialize, Debug, Clone, PartialEq, Eq)]
pub struct Schema {
pub(crate) fields: Vec<Field>,
pub fields: Vec<Field>,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the entire change -- make these two fields pub

@codecov-commenter
Copy link

codecov-commenter commented Jul 30, 2022

Codecov Report

Merging #2239 (c86bed7) into master (f41fb1c) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master    #2239   +/-   ##
=======================================
  Coverage   82.30%   82.31%           
=======================================
  Files         241      242    +1     
  Lines       62437    62446    +9     
=======================================
+ Hits        51389    51400   +11     
+ Misses      11048    11046    -2     
Impacted Files Coverage Δ
arrow/src/datatypes/schema.rs 70.41% <ø> (ø)
arrow/tests/schema.rs 100.00% <100.00%> (ø)
...row/src/array/builder/string_dictionary_builder.rs 90.64% <0.00%> (-0.72%) ⬇️
parquet/src/arrow/schema.rs 96.76% <0.00%> (-0.18%) ⬇️
...rquet/src/arrow/record_reader/definition_levels.rs 89.02% <0.00%> (+1.68%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us.

/// outside of the arrow crate

#[test]
fn schema_destructure() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Add a test outside the crate to use that

Do we need to move outside the tests of other public members to make sure they can be used outside the arrow crate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to move outside the tests of other public members to make sure they can be used outside the arrow crate?

If we have such tests then yes I think they need to be outside the crate. In general, actually, moving some of the arrow testing into separate integration tests might be good for the compile/test/change cycle.

However, I don't think we should go overboard and try to write tests for all pub fields -- I felt this was in particular special (mostly because I don't want someone in the future to remove the pub thinking it won't affect anyone)

@nevi-me nevi-me merged commit 3032a52 into apache:master Jul 31, 2022
@ursabot
Copy link

ursabot commented Jul 31, 2022

Benchmark runs are scheduled for baseline = 99ad915 and contender = 3032a52. 3032a52 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@alamb
Copy link
Contributor Author

alamb commented Aug 1, 2022

🎉

@alamb alamb deleted the alamb/schema_destructure branch August 1, 2022 12:46
@alamb
Copy link
Contributor Author

alamb commented Aug 1, 2022

Filed #2262 to track 'one metadata type'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ergonomic field and schema creation with Metadata
8 participants