
ARROW-16431: [C++][Python] Improve AppendRowGroups error when schemas differ #14029

Merged (3 commits, Sep 28, 2022)


@milesgranger (Contributor) commented Sep 2, 2022

Fix ARROW-16431 (https://issues.apache.org/jira/browse/ARROW-16431).

Feel free to opine on specific error messages or the implementation as a whole. 👌

Examples

```python
# meta1 and meta2 differ in their first column (col1 vs. col2)
meta1.append_row_groups(meta2)
*** RuntimeError: AppendRowGroups requires equal schemas.
The two columns with index 0 differ.
column descriptor = {
  name: col1,
  path: col1,
  physical_type: INT64,
  converted_type: NONE,
  logical_type: None,
  max_definition_level: 1,
  max_repetition_level: 0,
}
column descriptor = {
  name: col2,
  path: col2,
  physical_type: INT64,
  converted_type: NONE,
  logical_type: None,
  max_definition_level: 1,
  max_repetition_level: 0,
}

# meta1 and meta2 differ in the number of columns
meta1.append_row_groups(meta2)
*** RuntimeError: This schema has 2 columns, other has 1
```

github-actions bot commented Sep 2, 2022

Resolved review threads: cpp/src/parquet/schema.cc (2 threads, outdated) and cpp/src/parquet/metadata.cc.
@jorisvandenbossche (Member) commented:

Can you rebase once more to see if that makes the failures go away?

Resolved review threads: cpp/src/parquet/schema.h (outdated) and python/pyarrow/tests/parquet/test_metadata.py (3 threads, outdated).

- Refactor diff output to use ostream*
- Update diff message
- Fix import ordering
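
Judging from the examples in the description, the diff output follows a simple order: compare the column counts first, then report the first index whose column descriptors differ. The following is a hypothetical pure-Python model of that order; `ColumnDescriptor` and `schema_diff` are illustrative names, not Parquet or pyarrow APIs, and the actual C++ code in cpp/src/parquet/schema.cc may differ in detail.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ColumnDescriptor:
    """Hypothetical stand-in for a Parquet column descriptor.

    Only a subset of the fields printed in the examples is modelled.
    """

    name: str
    path: str
    physical_type: str
    max_definition_level: int = 1
    max_repetition_level: int = 0


def schema_diff(this: List[ColumnDescriptor],
                other: List[ColumnDescriptor]) -> Optional[str]:
    """Return None if both column lists are equal, else describe the first difference."""
    # Check the column count first: an index-by-index comparison only makes
    # sense when both schemas have the same number of columns.
    if len(this) != len(other):
        return f"This schema has {len(this)} columns, other has {len(other)}"
    # Otherwise report the first index at which the descriptors differ,
    # followed by both descriptors so the user can see what changed.
    for index, (mine, theirs) in enumerate(zip(this, other)):
        if mine != theirs:
            return (f"The two columns with index {index} differ.\n"
                    f"{mine}\n{theirs}")
    return None


col1 = ColumnDescriptor("col1", "col1", "INT64")
col2 = ColumnDescriptor("col2", "col2", "INT64")
print(schema_diff([col1, col2], [col1]))  # This schema has 2 columns, other has 1
print(schema_diff([col1], [col2]))
```

Reporting only the first differing index keeps the message short, while printing both full descriptors shows which attribute (name, type, or levels) caused the mismatch.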

@pitrou (Member) left a comment:

+1. Thanks for this @milesgranger!

@pitrou pitrou merged commit 6cccec5 into apache:master Sep 28, 2022
@milesgranger milesgranger deleted the ARROW-16431_better-err-msg-differing-schemas branch September 28, 2022 12:32
ursabot commented Sep 28, 2022

Benchmark runs are scheduled for baseline = 35bfeb4 and contender = 6cccec5. 6cccec5 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.2% ⬆️0.0%] test-mac-arm
[Failed ⬇️1.1% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.07% ⬆️0.0%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 6cccec5f ec2-t3-xlarge-us-east-2
[Failed] 6cccec5f test-mac-arm
[Failed] 6cccec5f ursa-i9-9960x
[Finished] 6cccec5f ursa-thinkcentre-m75q
[Finished] 35bfeb41 ec2-t3-xlarge-us-east-2
[Failed] 35bfeb41 test-mac-arm
[Failed] 35bfeb41 ursa-i9-9960x
[Finished] 35bfeb41 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

ursabot commented Sep 28, 2022

['Python', 'R'] benchmarks have high level of regressions.
ursa-i9-9960x

fatemehp pushed a commit to fatemehp/arrow that referenced this pull request Oct 17, 2022
… differ (apache#14029)

Fix [ARROW-16431](https://issues.apache.org/jira/browse/ARROW-16431)

Authored-by: Miles Granger <miles59923@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>

5 participants