NIFI-5960 add compatibility rating to RECORD schemas #3269
SavtechSolutions wants to merge 1 commit into apache:master
1d5aa61 to 6dedd43
I think it will be important, in the JIRA, to explain the plan/strategy for what a compatibility score would mean and how it would be calculated in general. It is probably best to discuss whether that score would be sufficiently meaningful before diving too far into the PR itself. The idea is certainly interesting, but we probably need to discuss what it means, how it is calculated, and how it is used.
I intended for the JIRA to describe only the issue, not the solution (I'd prefer not to steer anyone trying to solve that issue toward a particular solution, as there might be alternate approaches). The issue at hand is that, when determining whether a record is compatible with a given schema, we consider nullable fields to be compatible with null values. Generally, this makes sense, but not for union types. If one of the union choices is a record with only nullable fields, this choice suddenly becomes compatible with pretty much everything: you can feed it a record with completely unrelated field names, and it would still be considered compatible (see the JIRA for an example).
This PR aims to address that issue by keeping track of how many non-nullable fields were found compatible, as a ratio to the total number of fields checked. The rating goes from -1 (not compatible) to 100 (fully compatible). So the above example would still be considered "sorta-compatible", with a rating of 0, but the compatibility check would continue, and if a schema choice with a higher rating is found (all it takes is one non-nullable field being found compatible), that schema would be used instead.
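To make the rating described above concrete, here is a minimal sketch, not the code from this PR. The class name, the method name, the representation of a schema as a map from field name to a nullable flag, and the exact formula (100 times the matched non-nullable fields divided by the total fields, in integer arithmetic) are all assumptions made for illustration. Value type checking is omitted entirely.

```java
import java.util.Map;

// Hypothetical sketch; names and formula are assumptions, not NiFi's API.
final class CompatibilityRatingSketch {
    // Sentinel kept outside the 0..100 range so that 0 stays a valid rating.
    static final int INCOMPATIBLE = -1;

    /**
     * Rates how well a record fits one schema choice.
     *
     * @param schemaFields field name mapped to true if the field is nullable
     * @param record       field name mapped to its value in the record
     * @return -1 if a non-nullable field is missing or null, else 0..100
     */
    static int rate(Map<String, Boolean> schemaFields, Map<String, ?> record) {
        if (schemaFields.isEmpty()) {
            return 0; // nothing to check; avoids dividing by zero
        }
        int matchedNonNullable = 0;
        for (Map.Entry<String, Boolean> field : schemaFields.entrySet()) {
            boolean nullable = field.getValue();
            Object value = record.get(field.getKey());
            if (value == null) {
                if (!nullable) {
                    return INCOMPATIBLE; // required field has no value
                }
            } else if (!nullable) {
                matchedNonNullable++; // only non-nullable matches add to the rating
            }
        }
        // Integer ratio of matched non-nullable fields to all fields checked.
        return 100 * matchedNonNullable / schemaFields.size();
    }
}
```

Under these assumptions, a choice with only nullable fields rates 0 against a record with unrelated field names, a choice with one non-nullable and one nullable field rates 50 when the non-nullable field is present, and a choice whose only non-nullable field is absent rates -1.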
Ok, I now think I understand what you are proposing. The JIRA states the problem is that we can choose a suboptimal type in certain situations. Your PR aims to address this by introducing a compatibility scoring approach, iterating through all possible compatible options, and choosing the most compatible. Is the choice of 0...100 as ints an optimization over using a double between 0...1? Thanks
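The selection step summarized above can be sketched as follows. This is an illustration only: the class and method names are hypothetical, the ratings are assumed to have been computed already for each union choice, and breaking ties in favor of the earliest choice is an assumption rather than something stated in this thread.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; not the code from this PR.
final class BestChoiceSketch {
    static final int INCOMPATIBLE = -1;

    /**
     * Picks the union choice with the highest rating.
     *
     * @param ratingsByChoice choice name mapped to its rating (-1..100),
     *                        iterated in the map's own order
     * @return the best choice, or empty if every choice is incompatible
     */
    static Optional<String> chooseBest(Map<String, Integer> ratingsByChoice) {
        String best = null;
        int bestRating = INCOMPATIBLE;
        for (Map.Entry<String, Integer> entry : ratingsByChoice.entrySet()) {
            int rating = entry.getValue();
            // Strictly greater: an earlier choice wins a tie, and a rating
            // of -1 can never replace the initial "nothing found" state.
            if (rating > bestRating) {
                best = entry.getKey();
                bestRating = rating;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

With this shape, a choice rated 0 is still returned when nothing better exists, but it loses to any choice that has at least one non-nullable field matched.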
The range goes -1..100, not 0..100. The reason is that a schema with rating 0 (all nullable fields) is still a valid backup choice if we can't find anything better. So I either had to make the formula more complex or have the "incompatible" value sit outside the range (I chose the latter). As for why I used integers: I prefer not to deal with floating-point arithmetic unless it's genuinely necessary.
6dedd43 to 80b28a9
80b28a9 to dbfd23f
Merged latest master and resolved conflicts (the record compatibility check was refactored into a separate method, which made my job a little easier).
dbfd23f to 181b9c7
181b9c7 to 6b645df
Fixed the Javadoc for the new method.
Closing due to inactivity.
Thank you for submitting a contribution to Apache NiFi.
In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:
For all changes:
Is there a JIRA ticket associated with this PR? Is it referenced
in the commit message?
Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
Has your PR been rebased against the latest commit within the target branch (typically master)?
Is your initial contribution a single, squashed commit?
For code changes:
If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?
For documentation related changes:
Have you ensured that format looks appropriate for the output in which it is rendered?
Note:
Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.