NIFI-4516 FetchSolr Processor #2517
Conversation
@ijokarumawak
You should consider making the output format configurable. The Solr projects I've worked on in the past have used JSON instead of XML.
Haven't had a chance to try running it, but looks very promising.
@@ -66,6 +67,15 @@
public static final AllowableValue SOLR_TYPE_STANDARD = new AllowableValue(
        "Standard", "Standard", "A stand-alone Solr instance.");

public static final PropertyDescriptor RECORD_WRITER = new PropertyDescriptor
I think this will work for now, but it would be great to have the Solr CRUD functions moved over to a controller service.
This is a really nice idea. However, I would prefer to do this in a separate ticket.
Ok. I might be able to help with some of that.
public static final String EXCEPTION = "fetchsolr.exeption";
public static final String EXCEPTION_MESSAGE = "fetchsolr.exeption.message";

public static final PropertyDescriptor SOLR_QUERY_STRING = new PropertyDescriptor
Are you going to support parameters like fq? If you intend to put them here, you might want to consider breaking it out into some additional fields for the sake of readability and ease of use.
For parameters such as fq, a single designated property will not be sufficient, as it is possible (and sometimes necessary) to define multiple filter queries. Furthermore, depending on the query parser, there are various parameters that can be used for queries. I considered making the most common query parameters configurable via designated properties and providing the option of additional parameters via dynamic properties. However, I personally expect this single field to be more straightforward. I expect the majority of users to test queries in Solr directly and to paste the query string afterwards into the query property. Which option do you prefer?
I think that's fine. I did something similar in a PR I have open for ElasticSearch + Kibana. If that's how you see most users doing it, seems fine.
I personally prefer "Solr Query String" to mean a Lucene-style query like "(foo AND bar)" and then have properties for the common parameters like start/end row, fq, etc., and then use dynamic props for others.
However, if we want to stick with the current approach, I think the description of this property should mention that the value is a URL-style query string and not just a Lucene query. You may also want to implement a custom validator that ensures this style of query string can be parsed (when EL is not used).
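A validator along those lines could lean on the JDK alone to check that the property value parses as URL-style `key=value` pairs. This is only a sketch of the idea, assuming the property holds a string such as `q=*:*&fq=field:value`; the class and method names are hypothetical and not part of the PR:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class QueryStringParser {

    // Parses a URL-style query string (e.g. "q=*:*&fq=a:b&fq=c:d") into a
    // multimap, preserving repeated parameters such as fq.
    public static Map<String, List<String>> parseQueryString(final String queryString)
            throws UnsupportedEncodingException {
        final Map<String, List<String>> params = new LinkedHashMap<>();
        for (final String pair : queryString.split("&")) {
            final int idx = pair.indexOf('=');
            if (idx <= 0) {
                // a validator would return an invalid ValidationResult here instead
                throw new IllegalArgumentException("Not a key=value pair: " + pair);
            }
            final String key = URLDecoder.decode(pair.substring(0, idx), "UTF-8");
            final String value = URLDecoder.decode(pair.substring(idx + 1), "UTF-8");
            params.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        return params;
    }

    public static void main(String[] args) throws Exception {
        final Map<String, List<String>> params =
                parseQueryString("q=*:*&fq=field1:value1&fq=field2:value2");
        System.out.println(params.get("fq").size()); // both filter queries survive: prints 2
    }
}
```

The multimap shape matters because, as discussed above, parameters like fq may legitimately appear more than once.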
FlowFile flowFileOriginal = session.get();

if (flowFileOriginal == null) {
    if (context.hasNonLoopConnection())
Needs to have curly brackets around it.
final RecordSchema schema = writerFactory.getSchema(flowFileOriginal.getAttributes(), null);
final RecordSet recordSet = SolrUtils.solrDocumentsToRecordSet(response.getResults(), schema);
final StringBuffer mimeType = new StringBuffer();
flowFileResponse = session.write(flowFileResponse, new OutputStreamCallback() {
Compressing this down to a lambda would save a few lines. Not necessary, but your IDE should be able to do it automatically.
I will change this and add the curly brackets :)
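For illustration, the anonymous-class-to-lambda change being suggested looks like this. The `OutputStreamCallback` below is a stand-in mirroring the single-method shape of NiFi's interface, so the snippet runs without NiFi on the classpath; the surrounding class and `write` helper are hypothetical:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class LambdaCallbackDemo {

    // Stand-in for NiFi's OutputStreamCallback: one abstract method
    // makes it a functional interface, so lambdas can implement it.
    @FunctionalInterface
    interface OutputStreamCallback {
        void process(OutputStream out) throws IOException;
    }

    static void write(final OutputStreamCallback callback, final OutputStream out) throws IOException {
        callback.process(out);
    }

    public static void main(String[] args) throws IOException {
        final ByteArrayOutputStream target = new ByteArrayOutputStream();

        // Anonymous class, as in the PR:
        write(new OutputStreamCallback() {
            @Override
            public void process(final OutputStream out) throws IOException {
                out.write("records".getBytes(StandardCharsets.UTF_8));
            }
        }, target);

        // Equivalent lambda, which the IDE can produce automatically:
        write(out -> out.write("records".getBytes(StandardCharsets.UTF_8)), target);

        System.out.println(target.toString("UTF-8")); // prints "recordsrecords"
    }
}
```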
logger.error("Failed to execute query {} due to {}. FlowFile will be routed to relationship failure", new Object[]{solrQuery.toString(), e}, e);
}

if (!flowFileOriginal.isPenalized())
Needs curly brackets.
return Collections.unmodifiableSet(searchComponentsTemp);
}

private static void addStatsFromSolrResponseToJsonWriter(final QueryResponse response, final JsonWriter writer) throws IOException {
It might be simpler to tie this to adding wt=json and just pull out the entire stats/facets branch and put that into the flowfile.
See this for an example if you haven't done it before.
The parameter wt only works for communication with Solr via HTTP. This processor uses SolrJ (which is highly recommended for SolrCloud mode). SolrJ is only capable of handling binary responses and XML. However, the processor makes use of the record functions, so the user can easily define a custom format (JSON, CSV, ...).
Ok. I could have sworn that I used wt=json that way, but it's been a while.
In the StackOverflow issue you mentioned above, this is also discussed.
For more detailed information, see here: https://lucene.apache.org/solr/guide/6_6/using-solrj.html
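For reference, the SolrJ path being discussed looks roughly like the sketch below. The response always arrives as SolrJ objects (binary or XML on the wire), and any output format is then produced by the configured record writer. The URL and collection name are placeholders, and the class is illustrative rather than the PR's code:

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocumentList;

public class SolrJQuerySketch {
    public static void main(String[] args) throws Exception {
        try (SolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/myCollection").build()) {
            // wt is irrelevant here: SolrJ negotiates its own response format
            SolrQuery query = new SolrQuery("*:*");
            query.setRows(10);
            QueryResponse response = client.query(query);
            SolrDocumentList documents = response.getResults();
            // documents would then be handed to the configured RecordSetWriter,
            // which serializes them as JSON, CSV, Avro, ... as the user chooses
            System.out.println(documents.getNumFound());
        }
    }
}
```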
runner.setProperty(SolrUtils.SOLR_TYPE, SolrUtils.SOLR_TYPE_CLOUD.getValue());
runner.setProperty(SolrUtils.SOLR_LOCATION, "http://localhost:8443/solr");
runner.setProperty(SolrUtils.COLLECTION, "testCollection");
runner.setProperty(FetchSolr.SOLR_QUERY_STRING, "q=*:*" +
Again, might want to think about multiple query-related fields so that this can be spread out and tested piece by piece by the user.
runner.setNonLoopConnection(false);
runner.run();

runner.assertTransferCount(FetchSolr.FAILURE, 1);
There's an assertAllTransferred that can be used here to simplify things.
In this case, I wanted to validate that the flowfile is actually sent to FAILURE.
runner.assertQueueEmpty();
runner.assertTransferCount(FetchSolr.RESULTS, 1);
runner.assertTransferCount(FetchSolr.ORIGINAL, 1);
for (MockFlowFile flowFile : runner.getFlowFilesForRelationship(FetchSolr.RESULTS))
Curly brackets.
while (reader.hasNext()) {
    reader.beginObject();
    while (reader.hasNext()) {
        if (reader.nextName().equals("integer_single"))
Curly brackets.
Tested the processor in a local build where it worked as expected.
Refactored the treatment of flowfiles routed to relationship ORIGINAL. More attributes are added to flowfiles for better descriptions of requests / responses. Additionally, I adjusted some tests.
@MikeThomsen Any news?
Sorry, haven't had time. There's a merge conflict now. Can you fix that?
Oh, sorry. Done.
@MikeThomsen any news?
Sorry, other stuff got in the way. I'll try to find some time soon.
@MikeThomsen ok, sorry, I am too impatient :D
Believe me, I understand. I just checked out your branch and ran into this when doing a full build with
BTW, that's the reason why your Travis CI build failed last time. Keep an eye on that because it'll slow down your code reviews.
I have Solr Cloud in a docker session on the usual ports, so if you haven't done them yet some integration tests (ex:
@MikeThomsen I will keep an eye on the CI tests in the future, thanks for the advice. Actually, I did not take them into account as they frequently appear to fail for no reason...
You would create a JUnit test called
Hi @MikeThomsen I added an IT and added a version to the dependency.
</dependency>
<dependency>
    <groupId>org.apache.solr</groupId>
    <artifactId>solr-core</artifactId>
Haven't made it through the rest of the PR yet, but what is the reason for making solr-core a non-test dependency? It would be better if we could only depend on Solr client.
If we do need to depend on solr-core we need to double check what additional dependencies this brings into the Solr NAR and possibly update the LICENSE/NOTICE of the NAR and the overall assembly. I would imagine solr-core has a lot more dependencies than solr-client.
@WritesAttribute(attribute = "fetchsolr.exeption.class", description = "The Java exception class raised when the processor fails"),
@WritesAttribute(attribute = "fetchsolr.exeption.message", description = "The Java exception message raised when the processor fails")
})
public class FetchSolr extends SolrProcessor {
Was thinking about this and wondering if this processor should actually be called QuerySolr?
I feel "fetch" is typically used in NiFi when a specific object is being retrieved by id, like FetchFile takes a specific file path and retrieves that file. Here we aren't fetching a specific Solr document, we are performing a general query.
responseAttributes.put(ATTRIBUTE_QUERY_TIME, String.valueOf(response.getQTime()));
flowFileResponse = session.putAllAttributes(flowFileResponse, responseAttributes);

if (response.getResults().size() > 0) {
What is the expected behavior when there are more results than were asked for in the initial search?
Say there are 1k total results, but the query asked for rows 0 to 10.
Are we going to page through the results and send out multiple flow files?
Or do we just not handle this case, leaving it up to the user to make the rows large enough (which also means they could blow up memory if they ask for too many results in a single query)?
@JohannesDaniel Build failed because of missing imports:
I can get back into testing this once you update.
@MikeThomsen Yeah, this time I looked at it, but haven't had time to fix it. I think I have to merge master into the branch due to updates. The local build worked. But thanks for the ping :)
@JohannesDaniel just do a rebase against master. You shouldn't use
@MikeThomsen Rebased everything; the build for the Solr processors works without any problems. However, when I try to build the whole application, I receive the following error:
Do this:
I just built master and it didn't have that problem. Try a full rebuild with
If you are building and skipping the tests, use -DskipTests=true. You should also do at least one mvn install run so that your local repo has everything.
@JohannesDaniel go ahead and check in the change even if it's broken. I'll help take a look.
@MikeThomsen Omg, now there are commits from others in the branch?? Maybe I should simply close this PR and put the code into a new branch.
@JohannesDaniel see my earlier comment about rebasing. You cannot just do a merge. Give that a shot. No need to close out the PR over some minor learning pains with Git. Post a message when you've force pushed and I'll take a look.
@MikeThomsen or is it a normal thing that the commits of others are shown here after the rebase?
@JohannesDaniel nope. If you did a rebase, those new commits would be behind your commits, not merged ahead of them. What a rebase does is set aside your commits, bring the base pointer for the branch up to the current pointer of the source branch (master in this case), and replay your commits on top of that.
@JohannesDaniel also, you don't need to wait for a new build to push the changes. Overwriting everything will schedule the running job to be killed and a new one started.
Force-pushed f25cb01 to b82707a.
@MikeThomsen done :)
Ok. I'll take a look and let you know sometime in a little while. Got some other things on my plate at the moment.
Builds just fine for me locally. If you're still having build problems, try deleting everything under org/apache/nifi from .m2/repository in your home folder.
@MikeThomsen Thank you for your help with Git!!
Are there best practices for handling paging? I think the ES processors handle this differently. Does it matter?
@ottobackwards Deep paging should be done with scrolling using cursor marks (as is done in GetSolr). Simple paging can be done by successively increasing the offset (start parameter). However, using cursor marks requires sorting by id. More information can be found here: https://lucene.apache.org/solr/guide/6_6/pagination-of-results.html
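The cursor-mark loop from the linked guide looks like this in SolrJ. This is a sketch, not the PR's code: it assumes an already configured SolrClient and a collection whose uniqueKey field is named `id`:

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrQuery.SortClause;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.CursorMarkParams;

public class CursorPagingSketch {

    // Pages through all results of a query using cursor marks, as described in
    // the Solr pagination guide. Cursors require a sort on the uniqueKey field.
    static void fetchAll(final SolrClient client, final String queryString) throws Exception {
        final SolrQuery query = new SolrQuery(queryString);
        query.setRows(100);
        query.setSort(SortClause.asc("id")); // "id" assumed to be the uniqueKey

        String cursorMark = CursorMarkParams.CURSOR_MARK_START; // "*"
        boolean done = false;
        while (!done) {
            query.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
            final QueryResponse response = client.query(query);
            // process response.getResults() here (e.g. emit one flowfile per page)
            final String nextCursorMark = response.getNextCursorMark();
            if (cursorMark.equals(nextCursorMark)) {
                done = true; // cursor did not advance: all results consumed
            }
            cursorMark = nextCursorMark;
        }
    }
}
```

Unlike start/rows paging, this keeps each request cheap for Solr no matter how deep the iteration goes.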
My comment is more about the way you work with the ES vs. the Solr processors, and whether there is or should be an expectation of some commonality between them, like how this relates to QueryElasticSearchHttp etc. I'm sorry, I don't mean to start a tangent on your PR. Please excuse me.
Solr and ES are increasingly used for different use cases these days. Solr tends to be more commonly used for pure search, whereas ES is frequently used for data analysis because it has really powerful and fast aggregation queries that Solr lacks.
Ok. Verified with a test run that it appears to run fine against a vanilla Solr installation. Docker:
I think we're pretty close to being ready to merge.
<p>
This definition will be appended to the Solr URL as follows:
fq=field1:value1&fq=field2:value2&fq=field3:value3
</p>
This needs detailed documentation for how to add the facet and stats support. I had to dig through the code to find that out. I would recommend taking the Docker solr demo that I referenced in the comments and using that as the basis so that new users can have something pretty painless as a reference point for starting out.
if (getEntireResults) {
    final Integer totalDocumentsReturned = solrQuery.getStart() + solrQuery.getRows();
    if (totalDocumentsReturned < totalNumberOfResults) {
        solrQuery.setStart(totalDocumentsReturned);
I think there should be a sane default limit to how far this goes since it's not using a cursor to get there. As-is, I think it would enable a user to do something like page out tens of thousands of results which AFAIK without a cursor could be pretty bad for Solr's performance.
@bbende @ijokarumawak thoughts on that?
@MikeThomsen I could add a property limiting the total amount of results the processor requests. This property could have a default of, let's say, 10000. If it is set to 0, there is no limit. The property's description could include a warning about Solr performance issues in the case of deep paging.
I think we should set an upper limit of absolutely no more than 10000 because this doesn't use the Solr-approved method of deep pagination. In fact I would say that 1,000 to 5,000 should be where the range is with explicit documentation that this is not the processor you're looking for if your goal is to hoover up the collection and process it :)
FYI, ES has the same issue and the solution in place now is a separate "scroll processor" which would be the equivalent of a "SolrCursorQuery" processor. In general, the use cases here are radically different so for the sake of our sanity and code quality, don't worry about people who want to process everything for the moment.
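The effect of the proposed cap can be reasoned about with plain arithmetic: the processor advances `start` by `rows` until either the result set or the configured limit is exhausted. A self-contained sketch of that logic (the class and method names are hypothetical, not the PR's code):

```java
public class PagingLimitDemo {

    // Counts the page requests the processor would issue when paging through
    // totalResults with the given page size, never going past maxResults.
    // A maxResults of 0 means "no limit", as proposed in the discussion.
    static int countPages(final int totalResults, final int rows, final int maxResults) {
        final int cap = (maxResults == 0) ? totalResults : Math.min(totalResults, maxResults);
        int pages = 0;
        for (int start = 0; start < cap; start += rows) {
            pages++;
        }
        return pages;
    }

    public static void main(String[] args) {
        // 1000 matching documents, 10 per page, capped at 100 documents:
        System.out.println(countPages(1000, 10, 100)); // prints 10
        // uncapped paging would issue 100 requests against Solr instead:
        System.out.println(countPages(1000, 10, 0));   // prints 100
    }
}
```

This makes the trade-off concrete: without a cap, a broad query quietly turns into very many start/rows requests, which is exactly the deep-pagination pattern the cursor-mark API exists to avoid.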
final Integer totalDocumentsReturned = solrQuery.getStart() + solrQuery.getRows();
if (totalDocumentsReturned < totalNumberOfResults) {
    solrQuery.setStart(totalDocumentsReturned);
    session.transfer(flowFileResponse, RESULTS);
Needs a call to the provenance manager to track the provenance. I think the RECEIVE event is the right one here.
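For reference, a RECEIVE event is recorded through the session's provenance reporter. A one-line sketch of what the requested change might look like; the `transitUrl` variable is hypothetical and would hold the Solr request URL:

```java
// record where the data came from before transferring the flowfile
session.getProvenanceReporter().receive(flowFileResponse, transitUrl);
session.transfer(flowFileResponse, RESULTS);
```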
}

if (!flowFileResponse.isPenalized()) {
    session.transfer(flowFileResponse, RESULTS);
Needs provenance tracking.
}
});
flowFileFacets = session.putAttribute(flowFileFacets, CoreAttributes.MIME_TYPE.key(), MIME_TYPE_JSON);
session.transfer(flowFileFacets, FACETS);
Needs provenance tracking.
}
});
flowFileStats = session.putAttribute(flowFileStats, CoreAttributes.MIME_TYPE.key(), MIME_TYPE_JSON);
session.transfer(flowFileStats, STATS);
Needs provenance tracking.
@JohannesDaniel I didn't see a test that tests deep pagination. If you don't have one, please add one. You can just throw garbage data in there like random strings. Also, you should add at least one test that proves all of the declared attributes are written when they're expected to be available on output flowfiles.
@JohannesDaniel If there are no results from a page request, does that mean there is no output? I ask because I have a PR for that use case with QueryElasticsearchHttp.
@MikeThomsen I'm still impeded, because my Maven does not download the NIFI-SNAPSHOTS. It downloads all dependencies except the SNAPSHOTS. When I follow the links (e.g. https://repository.apache.org/snapshots/org/apache/nifi/nifi-processor-utils/1.7.0-SNAPSHOT/nifi-processor-utils-1.7.0-SNAPSHOT.pom) I get a 404. Shouldn't I be able to see any dependencies at https://repository.apache.org/content/groups/snapshots/org/apache/nifi/?
If you are building NiFi and need to get things working after merging in the new version poms, you can do a mvn clean install -DskipTests, and that should populate your local repo with the 1.7.0-SNAPSHOT versions, shouldn't it?
@ottobackwards When I try this I always run into the same problem: [ERROR] Failed to execute goal on project nifi-livy-processors: Could not resolve dependencies for project org.apache.nifi:nifi-livy-processors:jar:1.7.0-SNAPSHOT: Could not find artifact org.apache.nifi:nifi-standard-processors:jar:tests:1.7.0-SNAPSHOT in apache.snapshots (https://repository.apache.org/snapshots) -> [Help 1] I already deleted everything in .m2/repository/org/apache/nifi
How are you building? As in, what command line?
There's a clause at the very bottom that will ensure they get sent. See:
@ottobackwards you made my day!
@MikeThomsen There are two tests that test Solr result paging:
What do you mean exactly? A test that retrieves more results from Solr? What exactly should be the purpose of this test?
@JohannesDaniel I missed those tests for some reason. Basically I was just looking for something that goes into deep pagination behavior. Disregard for now. Thanks.
Have you had a chance to add the missing provenance tracking?
Hi @MikeThomsen, I will upload the changes this evening.
(meaning in 5-6 hours)
@JohannesDaniel Sounds good. Make sure it includes the requested documentation as well. That's going to be a big deal in making this easy for new users to use. Thanks.
+1 LGTM. It looks like everything I requested in the last round is there. Integration tests still pass against a live SolrCloud instance.
Thank you for submitting a contribution to Apache NiFi.
In order to streamline the review of the contribution we ask you to ensure the following steps have been taken:
For all changes:
Is there a JIRA ticket associated with this PR? Is it referenced
in the commit message?
Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
Has your PR been rebased against the latest commit within the target branch (typically master)?
Is your initial contribution a single, squashed commit?
For code changes:
For documentation related changes:
Note:
Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.