-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NIFI-5642: QueryCassandra processor : output FlowFiles as soon fetch_size is reached #3051
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, you'd like to have the capability to start sending flow files downstream as soon as some threshold is reached, rather than waiting until a result set is exhausted before sending all the flow files at once. This pattern has showed up a few times already, I refer to it as "incremental commit" because in order to actually send flow files downstream, you have you commit the session. However at that point you lose the ability to keep session-level metrics such as result set count (some flow files will already be "gone" by the time you know how many were in the result set.
I agree with adding "incremental commit" behavior to QueryCassandra, but I think we need a different approach. The Fetch Size property is a setting at the "driver level" vs the user level, and refers to the number of results to fetch from the remote result set (aka "cursor") at once. For example, you may want to fetch 100 rows at a time even though you process them one at a time. This reduces the amount of network overhead from having to fetch each row individually. However at the user level, they may want to send out a flow file with 10 or 1000 results. So IMO we don't want to reuse the Fetch Size property for this.
Perhaps take a look at QueryDatabaseTable, it has a separate "Output Batch Size" property that I believe is more in line with what you'd like to do. Its documentation mentions the tradeoff between which attributes are populated in outgoing flow files (or not) depending on whether this property is set (i.e. if using "incremental commits").
One thing QueryDatabaseTable doesn't have is incoming connections, so it won't have an example of preserving lineage. But I believe if you keep the existing code (with "fileToProcess") intact, it will continue to work. If you need to generate more flow files, just use session.create(fileToProcess) as is done currently, and the session will handle the FORK provenance events, etc. for you.
...-cassandra-processors/src/main/java/org/apache/nifi/processors/cassandra/QueryCassandra.java
Outdated
Show resolved
Hide resolved
...-cassandra-processors/src/main/java/org/apache/nifi/processors/cassandra/QueryCassandra.java
Outdated
Show resolved
Hide resolved
Thanks for the comments @mattyb149! |
…size is reached NIFI-5642: QueryCassandra processor : output FlowFiles as soon fetch_size is reached Fixed checkstyle error Delete build.sh Delete local build file NIFI-5642 : letting fetch_size to control the Cassandra data flow creating a new MAX_ROWS_PER_FLOW_FILE parameter Fixed checkstyle error: no more import java.util.* Fixed missing imports NIFI-5642: added REL_ORIGINAL relationship in order to allow incremental commit
Hi @mattyb149! |
...rocessors/src/main/java/org/apache/nifi/processors/cassandra/AbstractCassandraProcessor.java
Outdated
Show resolved
Hide resolved
...-cassandra-processors/src/main/java/org/apache/nifi/processors/cassandra/QueryCassandra.java
Outdated
Show resolved
Hide resolved
...-cassandra-processors/src/main/java/org/apache/nifi/processors/cassandra/QueryCassandra.java
Outdated
Show resolved
Hide resolved
...-cassandra-processors/src/main/java/org/apache/nifi/processors/cassandra/QueryCassandra.java
Outdated
Show resolved
Hide resolved
...-cassandra-processors/src/main/java/org/apache/nifi/processors/cassandra/QueryCassandra.java
Outdated
Show resolved
Hide resolved
...-cassandra-processors/src/main/java/org/apache/nifi/processors/cassandra/QueryCassandra.java
Outdated
Show resolved
Hide resolved
...rocessors/src/main/java/org/apache/nifi/processors/cassandra/AbstractCassandraProcessor.java
Outdated
Show resolved
Hide resolved
...rocessors/src/main/java/org/apache/nifi/processors/cassandra/AbstractCassandraProcessor.java
Outdated
Show resolved
Hide resolved
...rocessors/src/main/java/org/apache/nifi/processors/cassandra/AbstractCassandraProcessor.java
Outdated
Show resolved
Hide resolved
...-cassandra-processors/src/main/java/org/apache/nifi/processors/cassandra/QueryCassandra.java
Outdated
Show resolved
Hide resolved
...-cassandra-processors/src/main/java/org/apache/nifi/processors/cassandra/QueryCassandra.java
Outdated
Show resolved
Hide resolved
...-cassandra-processors/src/main/java/org/apache/nifi/processors/cassandra/QueryCassandra.java
Outdated
Show resolved
Hide resolved
Hi @mattyb149! |
@aglotero what happened was is you did a merge of master instead of a rebase. It's a common mistake, but one that can cause a lot of problems for you in cases like this. The proper work flow in cases like this is:
|
We're marking this PR as stale due to lack of updates in the past few months. If after another couple of weeks the stale label has not been removed this PR will be closed. This stale marker and eventual auto close does not indicate a judgement of the PR just lack of reviewer bandwidth and helps us keep the PR queue more manageable. If you would like this PR re-opened you can do so and a committer can remove the stale tag. Or you can open a new PR. Try to help review other PRs to increase PR review bandwidth which in turn helps yours. |
…size is reached
Thank you for submitting a contribution to Apache NiFi.
In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:
For all changes:
Is there a JIRA ticket associated with this PR? Is it referenced
in the commit message?
Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
Has your PR been rebased against the latest commit within the target branch (typically master)?
Is your initial contribution a single, squashed commit?
For code changes:
For documentation related changes:
Note:
Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.