-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NIFI-4371 - add support for query timeout in Hive processors #2138
Conversation
@@ -290,6 +292,9 @@ public void process(final OutputStream out) throws IOException { | |||
} | |||
} | |||
|
|||
// set query timeout | |||
st.setQueryTimeout(queryTimeout); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing to be done here necessarily, but just wanted to mention that the success of this method is dependent on the customValidate() preventing the processor from running if the Hive JDBC driver does not support setQueryTimeout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get the point of the customValidate but I'd keep its check but mark it as valid whether the timeout method is supported or not. YOu can set some processor instance boolean to advise whether it is supported then in this call/check set the timeout if it is and ignore it if not. Otherwise we're requiring them to use a version of hive which supports it and that seems a little heavy handed. It is probably a good idea to do the current validation logic in some other lifecycle call like 'onAdded' or something, warn if timeouts not supported and point out hanging threads are possible, and keep going.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
disregard what i said. matt pointed out that the code checks IF a timeout was set and is non-zero. Based on that I think the way it was implemented is awesome.
.name("hive-query-timeout") | ||
.displayName("Query timeout") | ||
.description("Sets the number of seconds the driver will wait for a query to execute. " | ||
+ "A value of 0 means no timeout. This feature is available starting with Hive 2.1.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd remove the 'This feature is available starting with Hive 2.1". Otherwise this all lgtm
.name("hive-query-timeout") | ||
.displayName("Query timeout") | ||
.description("Sets the number of seconds the driver will wait for a query to execute. " | ||
+ "A value of 0 means no timeout. This feature is available starting with Hive 2.1.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The part about the feature availability is nice if/when there's a choice, but for now it's not technically germane, since the Hive NAR ships with a particular version of Hive. I will remove that part and merge, the rest LGTM :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we replace the "This feature is available..." sentence with "NOTE: Non-zero values may not be supported by the driver"
The "unit tests" for TestSelectHiveQL use Derby as the database, only to test the functionality of getting the "HiveQL" statement to the database and parsing its results. In that vein, Derby supports setQueryTimeout (DERBY-31), so can we add a unit test that sets the value, to exercise that part of the code? |
Thanks for the review @mattyb149 and @joewitt. I updated the property description based on your comments. Regarding the unit test, since I'm using a |
I'm getting NPEs in the unit tests, something weird with MockPropertyValue getting created without "expectExpressions" being set to anything, causing isExpressionLanguagePresent() to throw the NPE |
I already noticed this error while working on others PRs (I'm a bit surprised I didn't notice the NPE on this PR...). It's because we're checking if the processor is valid before enabling expression validation (https://github.com/apache/nifi/blob/master/nifi-mock/src/main/java/org/apache/nifi/util/StandardProcessorTestRunner.java#L169). We can't really do it the other way around without changing a lot of things. I updated the PR to just check if |
I talked to @markap14 about it, perhaps this fix is fine or we can just change it to a boolean, but I'll let him take a look too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like possible merge conflicts, do you mind rebasing against the latest master? Please and thanks!
@@ -225,7 +225,7 @@ public String toString() { | |||
|
|||
@Override | |||
public boolean isExpressionLanguagePresent() { | |||
if (!expectExpressions) { | |||
if (expectExpressions == null || !expectExpressions) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fix (in whatever form it takes) should be as part of NIFI-4590. As a temporary fix in my own PR #2260 , I put a try/catch around isExpressionLanguagePresent() with a comment about NIFI-4590. You don't have to do that specifically, but if you can get around changing the mock stuff in this PR, I can merge and add a reference to your workaround in NIFI-4590, to ask that your workaround be removed as part of that Jira.
Hey @mattyb149, I rebased against master to account for the changes made by Mark in the test framework. Let me know if there is something else. |
@@ -233,6 +234,7 @@ private FunctionContext(boolean rollbackOnFailure, Charset charset, String state | |||
} | |||
|
|||
// Execute the statement | |||
stmt.setQueryTimeout(context.getProperty(QUERY_TIMEOUT).evaluateAttributeExpressions(flowFile).asInteger()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the driver doesn't support this (at least for Apache Hive 1.2.1), it will throw an exception, perhaps wrap these calls in a try/catch and ignore any errors (as they would have been presented to the user via doc and validation)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, just pushed a commit for PutHiveQL and SelectHiveQL. Thanks!
problems.add(new ValidationResult.Builder() | ||
.subject("Query Timeout") | ||
.valid(false) | ||
.explanation(e.getLocalizedMessage()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This message is "Method not supported", I can add more text around before merge, such as "setQueryTimeout caused the driver to report "+e.getLocalizedMessage() or something like that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes sure @mattyb149 !
@@ -232,6 +233,13 @@ private FunctionContext(boolean rollbackOnFailure, Charset charset, String state | |||
getLogger().warn("Failed to parse hiveQL: {} due to {}", new Object[]{hiveQL, e}, e); | |||
} | |||
|
|||
try { | |||
// set query timeout | |||
stmt.setQueryTimeout(context.getProperty(QUERY_TIMEOUT).evaluateAttributeExpressions(flowFile).asInteger()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately asInteger() will throw a NumberFormatException if the EL evaluates to a blank string/null. Should we warn that we're defaulting to zero, or keep the current behavior which is an exception and rollback, perhaps adding another logging statement to clarify the one that will follow (NFE for input string "") ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That shouldn't happen with the defined validator on the property, no?
.addValidator(StandardValidators.INTEGER_VALIDATOR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, if Expression Language is present, INTEGER_VALIDATOR will return Valid as it may not be able to evaluate the EL at validation time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right, forgot about EL! I think best is to keep current behavior with an exception and rollback but with another logging statement. I'll push a commit for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 LGTM. I don't have Hive running, but the unit tests run just fine and the change is straight-forward.
&& !validationContext.getProperty(QUERY_TIMEOUT).isExpressionLanguagePresent() | ||
&& validationContext.getProperty(QUERY_TIMEOUT).asInteger() != 0) { | ||
try(HiveStatement stmt = new HiveStatement(null, null, null)) { | ||
stmt.setQueryTimeout(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You say in the description:
A value of 0 means no timeout. NOTE: Non-zero values may not be supported by the driver.
Shouldn't the test be on a non-zero value here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, in versions of the driver that does not implement this method, this call will throw an exception no matter what is the value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True (at least for older Apache Hive drivers), but it is a tad confusing to see the if statement check for non-zero then test with zero.
Also, I can't find the discussion but I thought we were going to do similar error handling in the setTimeout() method below as we do in the customValidate, for when Expression Language is present but the driver doesn't support non-zero values. IIRC it would allow a query timeout of zero if the driver didn't support it, but if the user set it to a positive value and the driver didn't support it, it would throw an error (akin to being invalid if found in customValidate())?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean this one: #2138 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True (at least for older Apache Hive drivers), but it is a tad confusing to see the if statement check for non-zero then test with zero.
Do you suggest an arbitrary value like
stmt.setQueryTimeout(1); // just checking driver supports query timeout
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah anything but zero, just so it matches the logic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I think yes, the idea is to catch the "not supported" thing during customValidate() where we can, but we also have to add it to the runtime checking because of EL
@@ -310,6 +311,15 @@ private void onTrigger(final ProcessContext context, final ProcessSession sessio | |||
try (final Connection con = dbcpService.getConnection(); | |||
final Statement st = (flowbased ? con.prepareStatement(selectQuery) : con.createStatement()) | |||
) { | |||
try { | |||
final int queryTimeout = context.getProperty(QUERY_TIMEOUT).evaluateAttributeExpressions(fileToProcess).asInteger(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks fungible with the logic from PutHive
. Why not move it to the abstract base class as a method both can reuse?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I just pushed a commit to address it.
04dfde7
to
aa004fe
Compare
@pvillard31 Mind doing a rebase here, and updating the QUERY_TIMEOUT property to use FlowFile Attribute scope? I pushed up a rebased branch with the additional commit (mattyb149@40b9d1d) but I don't know if you can cherry-pick from there or if you have to do your own rebase, then cherry-pick my additional commit (if you want to use it of course). |
e7d9451
to
f9e5e93
Compare
Done @mattyb149 - thanks! |
Hey @mattyb149 - I believe we added this one for Hive 3 processors but forgot this PR. I know you're not available at the moment, but just a reminder for when you're back ;) (or if someone else wants to merge it in) |
+ "A value of 0 means no timeout. NOTE: Non-zero values may not be supported by the driver.") | ||
.defaultValue("0") | ||
.required(true) | ||
.addValidator(StandardValidators.INTEGER_VALIDATOR) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I missed this in the Hive 3 processors too, but shouldn't this be a NONNEGATIVE_INTEGER_VALIDATOR? Are there any situations in which a negative number has any semantics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True - will fix in AbstractHiveQLProcessor and AbstractHive3QLProcessor
finally got time to get back on this one... if you want to have another look @mattyb149 |
* @param flowFile flow file to evaluate expression language | ||
* @throws ProcessException exception in case configured value cannot be converted to an integer | ||
*/ | ||
protected void setTimeout(Statement stmt, ProcessContext context, FlowFile flowFile) throws ProcessException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have a customValidate() for this like you added for AbstractHiveQLProcessor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah... decided not to because I figured that all the drivers on Hive3 and beyond would implement the method now. I thought it would make less validation work when the framework is validating the processor. But I guess we could choose to be on the safest side and add it anyway. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mentioned it only for consistency between the two sets of processors, actually I'd be fine with not even putting the setQueryTimeout in a try/catch for the Hive 3 processors, now that it's supported, I don't imagine there are any drivers that wouldn't support it at this point.
Either way you want to go is fine with me, let me know what you end up with (even if it is what you already have :) and I'll finish the review and merge, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The try/catch is still valid in case of error during the EL evaluation (when EL is used to set the timeout value). I could remove the setTimeout in the abstract class and just use setQueryTimeout directly in the Select/Put classes but I'd still have to define how I deal with SQLException and NumberFormatException. Honestly I think it's better to keep it as-is to have some sort of consistency between the two bundles.
We're marking this PR as stale due to lack of updates in the past few months. If after another couple of weeks the stale label has not been removed this PR will be closed. This stale marker and eventual auto close does not indicate a judgement of the PR just lack of reviewer bandwidth and helps us keep the PR queue more manageable. If you would like this PR re-opened you can do so and a committer can remove the stale tag. Or you can open a new PR. Try to help review other PRs to increase PR review bandwidth which in turn helps yours. |
Thank you for submitting a contribution to Apache NiFi.
In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:
For all changes:
Is there a JIRA ticket associated with this PR? Is it referenced
in the commit message?
Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
Has your PR been rebased against the latest commit within the target branch (typically master)?
Is your initial contribution a single, squashed commit?
For code changes:
For documentation related changes:
Note:
Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.