NIFI-7843 Recursive avro schemas fail to write with RecordWriter #4550

Closed

@adenes adenes commented Sep 24, 2020

Thank you for submitting a contribution to Apache NiFi.

Please provide a short description of the PR here:

Description of PR

Fixes a StackOverflowError thrown when writing Avro data with a recursive schema using a RecordWriter.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

  • Is there a JIRA ticket associated with this PR? Is it referenced
    in the commit message?

  • Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

  • Has your PR been rebased against the latest commit within the target branch (typically main)?

  • Is your initial contribution a single, squashed commit? Additional commits in response to PR reviewer feedback should be made on this branch and pushed to allow change tracking. Do not squash or use --force when pushing to allow for clean monitoring of changes.

For code changes:

  • Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
  • Have you written or updated unit tests to verify your changes?
  • Have you verified that the full build is successful on JDK 8?
  • Have you verified that the full build is successful on JDK 11?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
  • If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
  • If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered?

Note:

Please ensure that once the PR is submitted, you check GitHub Actions CI for build issues and submit an update to your PR as soon as possible.

@simonbence simonbence left a comment

LGTM

getSchemaName().isPresent() && getSchemaName().equals(other.getSchemaName())) {
return true;
}

I think it would be worth extending TestSimpleRecordSchema#testHashCodeAndEqualsWithSelfReferencingSchema with this case.

Thanks @simonbence for the comment. I added a new test case to TestSimpleRecordSchema.

Thanks for taking care of it!
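
For readers following this thread, here is a minimal sketch of how a name-based short-circuit like the condition quoted above can keep `equals` from recursing on a self-referencing schema. `SketchSchema`, `SketchField`, and all their members are hypothetical names invented for illustration; this is not NiFi's `SimpleRecordSchema`, and the actual change in this PR may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.Optional;

// Hypothetical stand-in for a record schema; NOT NiFi's SimpleRecordSchema.
final class SketchSchema {
    private final Optional<String> schemaName;
    private final List<SketchField> fields = new ArrayList<>();

    SketchSchema(String schemaName) {
        this.schemaName = Optional.ofNullable(schemaName);
    }

    Optional<String> getSchemaName() {
        return schemaName;
    }

    // nested may be this schema itself, which makes the schema self-referencing.
    void addField(String fieldName, SketchSchema nested) {
        fields.add(new SketchField(fieldName, nested));
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof SketchSchema)) {
            return false;
        }
        SketchSchema other = (SketchSchema) obj;
        // Same shape as the condition quoted in the review: equally named
        // schemas are considered equal without descending into their fields.
        if (getSchemaName().isPresent() && getSchemaName().equals(other.getSchemaName())) {
            return true;
        }
        // Simplification made for this sketch: a named schema never equals a
        // differently named or anonymous one, which keeps hashCode consistent.
        if (getSchemaName().isPresent() || other.getSchemaName().isPresent()) {
            return false;
        }
        // Anonymous schemas fall back to a field-by-field comparison. For a
        // self-referencing anonymous schema this would recurse without end.
        return fields.equals(other.fields);
    }

    @Override
    public int hashCode() {
        // Never follows nested schemas, so it cannot loop on a cycle.
        return schemaName.isPresent() ? schemaName.get().hashCode() : fields.size();
    }
}

// Hypothetical field: a name plus an optional nested schema.
final class SketchField {
    private final String name;
    private final SketchSchema nested; // null for a primitive field

    SketchField(String name, SketchSchema nested) {
        this.name = Objects.requireNonNull(name);
        this.nested = nested;
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof SketchField)) {
            return false;
        }
        SketchField other = (SketchField) obj;
        // Comparing nested schemas is where a cycle would be followed.
        return name.equals(other.name) && Objects.equals(nested, other.nested);
    }

    @Override
    public int hashCode() {
        return name.hashCode();
    }
}
```

With this sketch, two separately built schemas that are both named `node` and each contain a `next` field pointing back at themselves compare as equal, because the name check returns before the field lists are visited.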

@turcsanyip

Testing with different RecordSetWriters, configurations and schemas.
It seems the ParquetRecordSetWriter still has the issue.

tpalfy commented Sep 24, 2020

> It seems the ParquetRecordSetWriter still has the issue.

It seems Parquet doesn't support recursive schemas.

They have an open Jira and even a pull request, but the last meaningful updates on those are 2 years old:
https://issues.apache.org/jira/browse/PARQUET-129
apache/parquet-java#445

adenes commented Sep 24, 2020

Thanks for the reply @turcsanyip. It seems the issue is in the Parquet library, as @tpalfy pointed out, and the stack trace supports this:

2020-09-24 21:04:05,504 ERROR [Timer-Driven Process Thread-1] o.a.n.processors.standard.ConvertRecord ConvertRecord[id=bee7ac17-0174-1000-6ace-013d1b0a0b80] ConvertRecord[id=bee7ac17-0174-1000-6ace-013d1b0a0b80] failed to process session due to java.lang.StackOverflowError; Processor Administratively Yielded for 1 sec: java.lang.StackOverflowError
java.lang.StackOverflowError: null
        at org.apache.parquet.schema.Types$BasePrimitiveBuilder.<init>(Types.java:329)
        at org.apache.parquet.schema.Types$BasePrimitiveBuilder.<init>(Types.java:309)
        at org.apache.parquet.schema.Types$PrimitiveBuilder.<init>(Types.java:518)
        at org.apache.parquet.schema.Types$PrimitiveBuilder.<init>(Types.java:511)
        at org.apache.parquet.schema.Types.primitive(Types.java:1314)
        at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:150)
        at org.apache.parquet.avro.AvroSchemaConverter.convertUnion(AvroSchemaConverter.java:214)
        at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:171)
        at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:130)
        at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:227)
        at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:124)
        at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:152)
        at org.apache.parquet.avro.AvroSchemaConverter.convertUnion(AvroSchemaConverter.java:214)
        at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:171)
        ...

@turcsanyip

@tpalfy, @adenes Thanks for the clarification regarding Parquet.

I tested the following cases and they work properly:

  • Avro and Json readers with explicit schema and schema inference
  • Avro, Json, XML, CSV and FreeFormText writers with explicit schema and schema inheritance
  • schemas: simple recursion, recursion in array, indirect recursion
  • processors: ConvertRecord, UpdateRecord
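
To make the three schema shapes in the list above concrete, here is a small sketch that builds each of them in a miniature schema model and checks whether a type can reach itself. `TypeNode` and `RecursionCheck` are hypothetical names made up for this illustration; they are not part of the Avro, Parquet, or NiFi APIs, and real Avro schemas would typically wrap the self-reference in a nullable union, which is left out here for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical miniature schema model, for illustration only.
final class TypeNode {
    enum Kind { PRIMITIVE, RECORD, ARRAY }

    final Kind kind;
    final String name;
    // Record field types, or the single element type of an array.
    final List<TypeNode> children = new ArrayList<>();

    private TypeNode(Kind kind, String name) {
        this.kind = kind;
        this.name = name;
    }

    static TypeNode primitive(String name) {
        return new TypeNode(Kind.PRIMITIVE, name);
    }

    static TypeNode record(String name) {
        return new TypeNode(Kind.RECORD, name);
    }

    static TypeNode arrayOf(TypeNode element) {
        TypeNode array = new TypeNode(Kind.ARRAY, "array");
        array.children.add(element);
        return array;
    }

    // Adds a field type to a record and returns the record for chaining.
    TypeNode field(TypeNode type) {
        children.add(type);
        return this;
    }
}

final class RecursionCheck {
    // Returns true if any type reachable from root can reach itself again.
    static boolean isRecursive(TypeNode root) {
        Set<TypeNode> onPath = Collections.newSetFromMap(new IdentityHashMap<>());
        Set<TypeNode> done = Collections.newSetFromMap(new IdentityHashMap<>());
        return visit(root, onPath, done);
    }

    private static boolean visit(TypeNode node, Set<TypeNode> onPath, Set<TypeNode> done) {
        if (onPath.contains(node)) {
            return true; // reached a type that is still being visited: a cycle
        }
        if (done.contains(node)) {
            return false; // already fully explored, no cycle through it
        }
        onPath.add(node);
        for (TypeNode child : node.children) {
            if (visit(child, onPath, done)) {
                return true;
            }
        }
        onPath.remove(node);
        done.add(node);
        return false;
    }
}

final class RecursiveShapesDemo {
    public static void main(String[] args) {
        // Simple recursion: a record with a field of its own type.
        TypeNode list = TypeNode.record("LinkedList");
        list.field(TypeNode.primitive("int")).field(list);

        // Recursion in array: a record holding an array of itself.
        TypeNode tree = TypeNode.record("Tree");
        tree.field(TypeNode.arrayOf(tree));

        // Indirect recursion: A refers to B, and B refers back to A.
        TypeNode a = TypeNode.record("A");
        TypeNode b = TypeNode.record("B");
        a.field(b);
        b.field(a);

        System.out.println(RecursionCheck.isRecursive(list));
        System.out.println(RecursionCheck.isRecursive(tree));
        System.out.println(RecursionCheck.isRecursive(a));
    }
}
```

A check along these lines could, as one option, be run before handing a schema to a converter that walks the schema without tracking visited types. Whether that is appropriate for any particular writer is not something this thread settles.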

I found only one case that still throws a StackOverflowError: the reader and writer schemas are both defined explicitly and use the same field names, but have different schema names.
As this is quite an unusual setup, I believe we can ignore it.

+1 LGTM

Add new test case to TestSimpleRecordSchema to test the scenario
when schema name and schema namespace match.

@tpalfy tpalfy left a comment

lgtm+1

@turcsanyip

@adenes Thanks for the fix.
@simonbence, @tpalfy Thanks for the reviews.

Merging to main.

@asfgit asfgit closed this in f73a019 Sep 28, 2020
driesva pushed a commit to driesva/nifi that referenced this pull request Mar 19, 2021
NIFI-7843 Recursive avro schemas fail to write with RecordWriter
Add new test case to TestSimpleRecordSchema to test the scenario
when schema name and schema namespace match.

This closes apache#4550.

Signed-off-by: Peter Turcsanyi <turcsanyi@apache.org>
adenes added a commit to adenes/nifi that referenced this pull request Jul 5, 2021
krisztina-zsihovszki pushed a commit to krisztina-zsihovszki/nifi that referenced this pull request Jun 28, 2022