
NIFI-751 Add Processor To Convert Avro Formats#70

Closed
jackowaya wants to merge 1 commit into apache:develop from jackowaya:avroconvert


@jackowaya

Implemented a new NiFi processor that allows Avro records to be converted from one Avro schema
to another. This supports:

  • Flattening records using . notation like "parent.id"
  • Simple type conversions to String or base primitive types.
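
To illustrate the dot-notation idea, here is a minimal sketch of resolving a path like "parent.id" against a nested record. It uses plain `Map` objects as a stand-in for Avro records, and `DotPathLookup` and `resolve` are hypothetical names for illustration; this is not the code in the pull request.

```java
import java.util.Map;

final class DotPathLookup {
    /**
     * Follows a dotted path such as "parent.id" through nested maps.
     * Returns null when any step of the path is missing or is not a map.
     * Assumption: nested records are modelled as Map<String, Object>.
     */
    static Object resolve(Map<String, Object> record, String path) {
        Object current = record;
        for (String part : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return null; // cannot step into a non-record value
            }
            current = ((Map<?, ?>) current).get(part);
        }
        return current;
    }
}
```

With a record shaped like `{parent: {id: 42}}`, `resolve(record, "parent.id")` yields the inner `id` value, which can then be written to a top-level field of the flattened output.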

@joewitt
Contributor

joewitt commented Jul 8, 2015

Alan,

Thanks for contributing! We'll work with you to get this promptly merged. Here are some findings of an initial review:

  • Please run the build with 'mvn clean install -Pcontrib-check'. You'll find there are many formatting issues. It appears that many of the lines have had extraneous newlines and tabs added to them. Please take a look and tweak until the contrib check runs cleanly and you still think the code looks good.
  • There do not appear to be any unit tests for the processor itself. I do see a couple of unit tests for the converter class/logic, which is good, but it is best to also have a test or two for the processor, particularly for something that seems as easily tested as this one is. You can find lots of examples throughout the codebase, including in this kite bundle.
  • For the conversion routines for Long, Double, Float, etc., you really should consider adding regex checks before calling the Java parsing methods. Since those methods use exception handling for flow control, the performance penalty can be extremely severe compared to simply doing a regex check beforehand. When the data is clean you'll be good to go, but when it isn't, the impact on the system as a whole can be dramatic. Given the nature of this sort of processor, that is probably something to tackle right away.
  • This processor is probably a great candidate to use the 'Advanced Documentation' feature. Users will need this to understand the schema/syntax of the conversion configuration and examples would go a long way for that. You can see more about this here http://nifi.incubator.apache.org/docs/nifi-docs/html/developer-guide.html#advanced-documentation and there are some examples in the existing standard processors to consider.
  • There are a couple of copy/paste errors in the processor from the CSV/Avro converter. Look for these "Failed to convert {}/{} records from CSV to Avro" and "Failed to convert {}/{} records from CSV to Avro"

I realize this looks like a lot of stuff but it should be pretty easy to address and is a good first step. If you have any questions on it just let us know.

Thanks
Joe
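
The regex pre-check suggested in the review above could look roughly like the following. This is a sketch under stated assumptions, not the code that ended up in the pull request: `NumericPrecheck`, `toLongOrNull`, and `toDoubleOrNull` are hypothetical names, and returning `null` for unparseable input is an assumed convention.

```java
import java.util.regex.Pattern;

final class NumericPrecheck {
    // Optional sign plus 1 to 18 ASCII digits: always within long range,
    // so Long.parseLong never throws for text that matches.
    private static final Pattern LONG_PATTERN =
            Pattern.compile("[+-]?[0-9]{1,18}");

    // Plain decimal or scientific notation, e.g. "12", "-1.5", ".5", "1.5e3".
    private static final Pattern DOUBLE_PATTERN =
            Pattern.compile("[+-]?([0-9]+(\\.[0-9]*)?|\\.[0-9]+)([eE][+-]?[0-9]+)?");

    /** Returns the parsed long, or null if the text is not a plain integer. */
    static Long toLongOrNull(String text) {
        if (text == null || !LONG_PATTERN.matcher(text).matches()) {
            return null; // rejected cheaply, no exception raised
        }
        return Long.parseLong(text);
    }

    /** Returns the parsed double, or null if the text is not a plain number. */
    static Double toDoubleOrNull(String text) {
        if (text == null || !DOUBLE_PATTERN.matcher(text).matches()) {
            return null;
        }
        return Double.parseDouble(text);
    }
}
```

The 18-digit cap is a deliberate simplification: it rejects a few valid 19-digit longs in exchange for never hitting `NumberFormatException` on the dirty-data path.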

Contributor

Should this check to see whether the outputSchema allows null?

@jackowaya
Author

Made many changes here, including I believe all of Joe's suggestions and the majority of Ryan's. A few little notes:

  • I wound up using Scanner to do the text parsing. I figured Scanner probably knows better how to parse numbers than I do. However, that does mean that text like "123 Fake Street" will convert to int which is a little questionable. Should be easy to write up a full-regex version if we decide we want that.
  • The code for sending just the failed records down the error relationship got a little nasty. It calls session.write on a copy of the incoming flowfile, then completely ignores that flowfile and uses the failed records it collected during the first pass through the input data.
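
The Scanner behavior described in the first note can be reproduced with a short sketch. How the pull request actually calls `Scanner` is not shown in this thread, so the methods below are assumptions for illustration (`ScannerParsing`, `lenientInt`, and `strictInt` are hypothetical names). The strict variant shows one way to reject trailing text without writing a full regex.

```java
import java.util.Scanner;

final class ScannerParsing {
    /** Reads the first whitespace-delimited token as an int; ignores the rest. */
    static Integer lenientInt(String text) {
        try (Scanner scanner = new Scanner(text)) {
            return scanner.hasNextInt() ? scanner.nextInt() : null;
        }
    }

    /** Like lenientInt, but returns null if anything follows the number. */
    static Integer strictInt(String text) {
        try (Scanner scanner = new Scanner(text)) {
            if (!scanner.hasNextInt()) {
                return null;
            }
            int value = scanner.nextInt();
            return scanner.hasNext() ? null : value; // trailing tokens mean "not an int"
        }
    }
}
```

Here `lenientInt("123 Fake Street")` yields 123, matching the questionable conversion mentioned above, while `strictInt("123 Fake Street")` yields `null`.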

@joewitt
Contributor

joewitt commented Jul 13, 2015

Alan - excellent and thanks! Do you mind providing a rebased/squashed commit?

Implemented a new NiFi processor that allows Avro records to be converted from one Avro schema
to another. This supports:
* Flattening records using . notation like "parent.id"
* Simple type conversions to String or base primitive types.
* Specifying field renames using dynamic properties.
@jackowaya
Author

rebased to upstream/develop and squashed into one commit.

@joewitt
Contributor

joewitt commented Jul 14, 2015

@rdblue You might want to double check the CSV and JSON to Avro processors and add documentation to them like Alan has done here (additional documentation). Providing an example of usage could be a big help to folks. Also please verify whether your processor needs to accept the hadoop configuration. If not you can simply remove it from the properties being added.

@jackowaya Great job! I've made a small change to remove the pulling in of the hadoop configuration property.

@joewitt
Contributor

joewitt commented Jul 14, 2015

I just want to add how easy this was to test. We're definitely going to build an Avro viewer like we have for JSON, XML, etc. because of this cool use case.

I set up a 'generate flow file' and a replace text to make some CSV lines, then ran that into "Convert CSV to Avro" and a "ConvertAvroSchema", all using Alan's example schemas. I didn't have it right at first because my id wasn't a long and my revenue wasn't a double, but that was obvious as it told me right in context. So I could just easily iterate and go.

It is a good example of taking a cool project, Kite, and combining it with the power NiFi has to make these things interactive, so you can quickly iterate and build something useful.

Nice job Alan and Ryan!

joewitt added a commit that referenced this pull request Jul 14, 2015
@asfgit asfgit closed this in ec6be9e Jul 14, 2015
@avishsaha

avishsaha commented Dec 14, 2016

Hey @joewitt @jackowaya, somehow I don't see AvroRecordConverter in NiFi's available processors. I need to be able to convert an incoming source file (CSV/XML/JSON) to a generic schema. However, the problem is that I am not sure how to: 1. convert one Avro 'record' type to another, and 2. when using ConvertAvroSchema, specify the dynamic properties described in the NiFi documentation here - https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.kite.ConvertAvroSchema/index.html

Please advise. Thank you.

UPDATE - I see now that AvroRecordConverter is instead used in ConvertAvroSchema.java, but the question remains: how do we specify the dynamic properties via NiFi?

tequalsme added a commit to tequalsme/nifi that referenced this pull request May 9, 2017
JPercivall pushed a commit to JPercivall/nifi that referenced this pull request Apr 23, 2018
This closes apache#70

Signed-off-by: Joseph Percivall <JPercivall@apache.org>
iadamcsik pushed a commit to iadamcsik/nifi that referenced this pull request Oct 22, 2025
)

(cherry picked from commit a0b2830d487153b971e13da4ac16ca2f9fac4e96)
(cherry picked from commit e46c8a3d0dd7cc74b7ef9e9da20b498e504f45f9)