
NIFI-5059 Updated MongoDBLookupService to be able to detect record sc… #2619

Closed
wants to merge 14 commits

Conversation

MikeThomsen
Contributor

@MikeThomsen MikeThomsen commented Apr 9, 2018

Full title: NIFI-5059 Updated MongoDBLookupService to be able to detect record schemas or take one provided by the user.

Thank you for submitting a contribution to Apache NiFi.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

  • Is there a JIRA ticket associated with this PR? Is it referenced
    in the commit message?

  • Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

  • Has your PR been rebased against the latest commit within the target branch (typically master)?

  • Is your initial contribution a single, squashed commit?

For code changes:

  • Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
  • Have you written or updated unit tests to verify your changes?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
  • If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
  • If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered?

Note:

Please ensure that once the PR is submitted, you check Travis CI for build issues and submit an update to your PR as soon as possible.

@MikeThomsen
Contributor Author

@mattyb149 Had to change the schema handling in MongoDBLookupService. Can you take a look?

@MikeThomsen
Contributor Author

@mattyb149 Any chance you can take a look?

@MikeThomsen
Contributor Author

@pvillard31 @mattyb149 I updated this to have a clean separation between the controller and lookup service code and subclassed the lookup service from SchemaRegistryService. Can one of you do a review sometime soon?

for (Map.Entry<String, Object> entry : result.entrySet()) {
    RecordField field;
    if (entry.getValue() instanceof Integer) {
Contributor Author

At some point once this and the ES Lookup Service are merged, I'll refactor this into a helper method.
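The loop quoted above maps each value in the Mongo result document to a record field by inspecting its Java runtime type. A minimal, self-contained sketch of what such a helper might look like, using plain strings as stand-ins for NiFi's RecordField/RecordFieldType types (which are not referenced here):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: maps a document's Java value types to record
// field type names, mirroring the instanceof chain in the PR's loop.
public class SchemaInferenceSketch {

    // Map one value's runtime type to a field type name.
    static String inferFieldType(Object value) {
        if (value instanceof Integer || value instanceof Long) return "LONG";
        if (value instanceof Double || value instanceof Float) return "DOUBLE";
        if (value instanceof Boolean) return "BOOLEAN";
        if (value instanceof Map) return "RECORD";
        return "STRING"; // fallback for anything unrecognized
    }

    // Walk the document and build field-name -> field-type entries.
    static Map<String, String> inferSchema(Map<String, Object> document) {
        Map<String, String> schema = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : document.entrySet()) {
            schema.put(entry.getKey(), inferFieldType(entry.getValue()));
        }
        return schema;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("name", "apache");
        doc.put("count", 42);
        doc.put("active", true);
        System.out.println(inferSchema(doc)); // prints {name=STRING, count=LONG, active=BOOLEAN}
    }
}
```

Extracting exactly this kind of type-dispatch into a shared utility is what the comment above proposes for the Mongo and ES lookup services.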

@MikeThomsen
Contributor Author

@mattyb149 can you review this?

@mattyb149
Contributor

Reviewing...

@@ -52,68 +54,125 @@
"The query is limited to the first result (findOne in the Mongo documentation). If no \"Lookup Value Field\" is specified " +
"then the entire MongoDB result document minus the _id field will be returned as a record."
)
- public class MongoDBLookupService extends MongoDBControllerService implements LookupService<Object> {
+ public class MongoDBLookupService extends SchemaRegistryService implements LookupService<Object> {
public static final PropertyDescriptor CONTROLLER_SERVICE = new PropertyDescriptor.Builder()
Contributor

AFAICT this property is never added to the list of supported property descriptors, so I couldn't set it in the UI, which causes an NPE when lookup() is called. It seems odd that setting a required-but-unsupported property (in tests) doesn't complain. I haven't run the integration tests yet; I just put the NARs into a live NiFi to try it out.

Contributor Author

I added it to the property list.

    .displayName("Client Service")
    .description("A MongoDB controller service to use with this lookup service.")
    .required(true)
    .identifiesControllerService(MongoDBControllerService.class)
Contributor

I believe this is supposed to be an interface not the impl class (see my other comment below), so I think you want MongoDBClientService here.

this.lookupValueField = context.getProperty(LOOKUP_VALUE_FIELD).getValue();
super.onEnabled(context);
this.controllerService = context.getProperty(CONTROLLER_SERVICE).asControllerService(MongoDBControllerService.class);
Contributor

I get a runtime error here as it is expecting an interface not an impl class, I had to change this to MongoDBClientService to get past the runtime error.
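The runtime error described above is plausibly because NiFi resolves controller services through dynamic proxies, which can only implement interface types. A hypothetical, self-contained illustration of that constraint (the nested types here are stand-ins, not NiFi's actual classes):

```java
// Illustrative sketch: why asking for a concrete implementation class
// instead of the service interface can fail at service-resolution time.
public class ServiceTypeCheck {
    // Stand-in for the service interface callers should reference.
    interface MongoDBClientService { }
    // Stand-in for the concrete implementation class.
    static class MongoDBControllerService implements MongoDBClientService { }

    // Mimics a runtime guard: only interface types are accepted,
    // as java.lang.reflect.Proxy can only proxy interfaces.
    static <T> void requireInterface(Class<T> type) {
        if (!type.isInterface()) {
            throw new IllegalArgumentException(type.getName() + " is not an interface");
        }
    }

    public static void main(String[] args) {
        requireInterface(MongoDBClientService.class); // passes
        try {
            requireInterface(MongoDBControllerService.class);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected impl class"); // this path is taken
        }
    }
}
```

This is why the review asks for MongoDBClientService (the interface) rather than MongoDBControllerService (the implementation) in both the descriptor and the asControllerService() call.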

Contributor Author

Fixed.

Contributor

Looks like this one is still there; still getting the runtime errors.

@MikeThomsen
Contributor Author

Accidentally rebased it a while ago, so I had to force push. Sorry about that.

@MikeThomsen
Contributor Author

@mattyb149 once this and the ES one are merged, it would probably be a good time to discuss extracting the schema builder code into a utility class.

@mattyb149
Contributor

Agreed, I'll try to get this one in today then take a look at the ES one.

@MikeThomsen
Contributor Author

@mattyb149 updated, but looks like Travis is busted at the moment (saying it can't find our repo)

@MikeThomsen
Contributor Author

@mattyb149 can we close this one out? It's a good starting point for this cleanup task.

@mattyb149
Contributor

I may have shot myself in the foot here by asking that this extend SchemaRegistryService, as that requires you to supply some way to get to the schema. In its current form, how would I get to the code path where the Mongo document's schema is gleaned vs. being provided from somewhere else?


- final RecordSchema schema = new SimpleRecordSchema(fields);
- return Optional.ofNullable(new MapRecord(schema, result));
+ RecordSchema toUse = schema != null ? schema : convertSchema(result);
Contributor Author

@mattyb149 I think the answer to your last question is here. If you specify schema.name in the coordinates, it'll get the schema from loadSchema. If not, it calls convertSchema. The rest of the lookup strategies don't make much sense in this case, so I can back out the change to extend SchemaRegistryService if that makes sense.

Contributor

One thing you could do is to override getSupportedPropertyDescriptors() and add your own property for Schema Access Strategy that only has the relevant ones, including your own strategy of "Infer Schema From Document" or something.
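The suggestion above — keep only the relevant inherited strategies and append a service-specific one — could be sketched like this. The strategy names and method shapes are illustrative stand-ins, not NiFi's actual AllowableValue constants or the real getSupportedPropertyDescriptors() signature:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: a subclass filters the parent's schema access
// strategies down to the ones that apply and adds its own inference option.
public class StrategyListSketch {
    static final String SCHEMA_NAME = "Use 'Schema Name' Property";
    static final String SCHEMA_TEXT = "Use 'Schema Text' Property";
    static final String HWX_SCHEMA_REF = "HWX Schema Reference Attributes";
    static final String INFER_FROM_DOCUMENT = "Infer Schema From Document";

    // Stand-in for the parent service's full strategy list.
    static List<String> parentStrategies() {
        return List.of(SCHEMA_NAME, SCHEMA_TEXT, HWX_SCHEMA_REF);
    }

    // The "override": keep only the strategies that make sense for a
    // lookup service, then append the service-specific inference option.
    static List<String> supportedStrategies() {
        List<String> supported = new ArrayList<>();
        for (String s : parentStrategies()) {
            if (s.equals(SCHEMA_NAME) || s.equals(SCHEMA_TEXT)) {
                supported.add(s);
            }
        }
        supported.add(INFER_FROM_DOCUMENT);
        return supported;
    }
}
```

Failing fast when a named schema can't be found, instead of silently falling back to inference, is exactly the behavior the reply below argues for.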

Contributor Author

That's probably the right way to do it because we should have it blow up if it can't get the schema on the first pass instead of silently falling back onto the inference option. Once I get that worked out, I'll copy pasta it over the ES one as well.

@MikeThomsen
Contributor Author

@mattyb149 @ijokarumawak @bbende I built on the schema registry service to add a new option for NoSQL options like Mongo, ES, Solr, etc. to just throw JSON in Map form and say "you figure it out." Please take a look at the new schema code when you get a chance.

@@ -176,6 +176,8 @@ public static SchemaAccessStrategy getSchemaAccessStrategy(final String allowabl
return new HortonworksAttributeSchemaReferenceStrategy(schemaRegistry);
} else if (allowableValue.equalsIgnoreCase(CONFLUENT_ENCODED_SCHEMA.getValue())) {
return new ConfluentSchemaRegistryStrategy(schemaRegistry);
} else if (allowableValue.equalsIgnoreCase(INFER_SCHEMA.getValue())) {
Contributor

Since this inference only works when the content is JSON, I think this option should only be available when using a JSON related record reader, and not available in the default case.

This would be similar to how the AvroReader makes available the option for "Embedded Avro Schema" - https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/avro/AvroReader.java#L63

Contributor Author

Ok. I'll work on that.

import java.io.IOException;
import java.util.Map;

public interface JsonSchemaAccessStrategy extends SchemaAccessStrategy {
Contributor

Can this be done without introducing a new method to the interface?

The original interface has:
getSchema(Map<String, String> variables, InputStream contentStream, RecordSchema readSchema)

Since we know the content has to be json in this case, can't we read contentStream into the Map<String,Object> in the implementation of the access strategy, rather than requiring callers to do that first?
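The approach bbende describes — keep the original getSchema(variables, contentStream, readSchema) signature and do the parsing inside the strategy, so callers hand over a raw stream rather than a pre-built Map — might look like the sketch below. A real implementation would parse JSON (e.g. with Jackson's ObjectMapper); a simplified "key=value per line" format stands in for that step here, and all names are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch: the strategy consumes the content stream itself,
// so callers do not need to deserialize into a Map<String, Object> first.
public class StreamParsingStrategySketch {

    // Parse the stream into a map inside the strategy, not at the call site.
    // (Stand-in for JSON parsing of the content stream.)
    static Map<String, Object> readContent(InputStream contentStream) {
        try {
            Map<String, Object> fields = new LinkedHashMap<>();
            String text = new String(contentStream.readAllBytes(), StandardCharsets.UTF_8);
            for (String line : text.split("\n")) {
                int eq = line.indexOf('=');
                if (eq > 0) {
                    fields.put(line.substring(0, eq), line.substring(eq + 1));
                }
            }
            return fields;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream(
            "name=apache\ncount=42\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(readContent(in)); // prints {name=apache, count=42}
    }
}
```

The trade-off raised in the reply below is that lookup-service clients already hold a Map in memory, so routing through a stream forces a serialize/deserialize round trip.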

Contributor Author

The client APIs for the third party systems usually return a Map, not a String that we can just pass on. I didn't want to serialize the client's output and then deserialize it later.

Contributor

@bbende bbende Jun 7, 2018

Ok but I'm confused because I'm not seeing an actual call that uses the new method...

The MongoLookupService does this:


private RecordSchema loadSchema(Map<String, Object> coordinates, Document doc) {
    Map<String, String> variables = coordinates.entrySet().stream()
        .collect(Collectors.toMap(
            e -> e.getKey(),
            e -> e.getValue().toString()
        ));
    ObjectMapper mapper = new ObjectMapper();
    try {
        byte[] bytes = mapper.writeValueAsBytes(doc);
        return getSchema(variables, new ByteArrayInputStream(bytes), null);
    } catch (Exception ex) {
        return null;
    }
}

So since we are reserializing the Doc here and putting the coordinates as variables, I'm not seeing where we call the new method, but I may be missing it.

Contributor Author

You're not missing anything...

import static org.apache.nifi.schema.access.SchemaAccessUtils.SCHEMA_TEXT_PROPERTY;
import static org.apache.nifi.schema.access.SchemaAccessUtils.SCHEMA_VERSION;

public class JsonInferenceSchemaRegistryService extends SchemaRegistryService {
Contributor

I'm not totally sure about this, but I think if we take the approach mentioned in my other comments, we probably wouldn't need this class since the JSON readers would handle the logic for when schemaAccess is set to "JSON Inference", similar to how AvroReader handles when embedded schema is selected - https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/avro/AvroReader.java#L78

Contributor Author

Yeah, if this gets expanded into the readers I could definitely see that being the case. For now, this is limited to being used by LookupServices that need schema access + JSON help like this one, the ES one and possibly later something like a RethinkDBLookupService.

@bbende
Contributor

bbende commented Jun 7, 2018

I haven't gone too deep looking at this, but if the goal is to have a re-usable way to infer a schema from JSON across various NoSQL components, have we considered just putting some utility code in a JAR somewhere under nifi-nar-bundles/nifi-extension-utils rather than trying to hook into the SchemaAccessStrategy/SchemaRegistryService?

I'm just on the fence about whether the schema access stuff makes sense here since that was designed for the readers/writers, and this is really coming from a different angle of already having some Map object in memory.

@MikeThomsen
Contributor Author

That was the original approach. I'm now leaning toward going back to that because it's feeling like "less is more" here.

@MikeThomsen
Contributor Author

@mattyb149 @ijokarumawak do either of you have time to get this reviewed before 1.7.0 release vote starts?

@MikeThomsen
Contributor Author

@mattyb149 can we close the loop on this?

@MikeThomsen
Contributor Author

@zenfenan can you review? I think we're almost at close out point.

@zenfenan
Contributor

zenfenan commented Jul 2, 2018

@MikeThomsen I'm actually traveling with limited access to mail and internet. I'll try to take a look as soon as I can, if someone doesn't get to it already.

@mattyb149
Contributor

Reviewing...

@@ -50,7 +50,7 @@
"The content of the FlowFile contains a reference to a schema in the Schema Registry service. The reference is encoded as a single "
+ "'Magic Byte' followed by 4 bytes representing the identifier of the schema, as outlined at http://docs.confluent.io/current/schema-registry/docs/serializer-formatter.html. "
+ "This is based on version 3.2.x of the Confluent Schema Registry.");

public static final AllowableValue INFER_SCHEMA = new AllowableValue("infer", "Infer from JSON");
Contributor

Shouldn't this be "Infer from Result" or something? It could be used by other processors to infer the schema from whatever object is returned.

Contributor Author

Probably. I'll go ahead and make that change.

}

@Override
protected SchemaAccessStrategy getSchemaAccessStrategy(final String strategy, final SchemaRegistry schemaRegistry, final ConfigurationContext context) {
Contributor

Since this impl is specifically for JSON inference, perhaps it should override getDefaultSchemaAccessStrategy() to return the Infer one?

Contributor Author

Done. I think that probably is the right thing to do here.

import java.util.List;
import java.util.Set;

public class TestSchemaRegistry extends AbstractControllerService implements SchemaRegistry {
Contributor

@mattyb149 mattyb149 Jul 2, 2018

Should this be called StubSchemaRegistry or MockSchemaRegistry? With Test at the front, I imagine it gets picked up by JUnit (although there are no @Test methods, but still).

Contributor Author

Yeah. Changed it to StubSchemaRegistry.

@MikeThomsen
Contributor Author

@mattyb149 made the changes you requested.

@mattyb149
Contributor

+1 LGTM. One of the unit tests in Travis is failing, but it's not the fault of this code. I ran the unit tests and some tests on a live NiFi instance with the "Infer" and "Schema Text" strategies; all looked well. Thanks for the addition! Merging to master.

@asfgit asfgit closed this in 22ec069 Jul 3, 2018
@MikeThomsen MikeThomsen deleted the NIFI-5059 branch August 14, 2024 21:14