Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NIFI-4371 - add support for query timeout in Hive processors #2138

Closed
wants to merge 4 commits into from

Conversation

pvillard31
Copy link
Contributor

@pvillard31 pvillard31 commented Sep 9, 2017

Thank you for submitting a contribution to Apache NiFi.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

  • Is there a JIRA ticket associated with this PR? Is it referenced
    in the commit message?

  • Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

  • Has your PR been rebased against the latest commit within the target branch (typically master)?

  • Is your initial contribution a single, squashed commit?

For code changes:

  • Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
  • Have you written or updated unit tests to verify your changes?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
  • If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
  • If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered?

Note:

Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.

@@ -290,6 +292,9 @@ public void process(final OutputStream out) throws IOException {
}
}

// set query timeout
st.setQueryTimeout(queryTimeout);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing to be done here necessarily, but just wanted to mention that the success of this method is dependent on the customValidate() preventing the processor from running if the Hive JDBC driver does not support setQueryTimeout.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get the point of the customValidate but I'd keep its check but mark it as valid whether the timeout method is supported or not. YOu can set some processor instance boolean to advise whether it is supported then in this call/check set the timeout if it is and ignore it if not. Otherwise we're requiring them to use a version of hive which supports it and that seems a little heavy handed. It is probably a good idea to do the current validation logic in some other lifecycle call like 'onAdded' or something, warn if timeouts not supported and point out hanging threads are possible, and keep going.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

disregard what i said. matt pointed out that the code checks IF a timeout was set and is non-zero. Based on that I think the way it was implemented is awesome.

.name("hive-query-timeout")
.displayName("Query timeout")
.description("Sets the number of seconds the driver will wait for a query to execute. "
+ "A value of 0 means no timeout. This feature is available starting with Hive 2.1.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd remove the 'This feature is available starting with Hive 2.1". Otherwise this all lgtm

.name("hive-query-timeout")
.displayName("Query timeout")
.description("Sets the number of seconds the driver will wait for a query to execute. "
+ "A value of 0 means no timeout. This feature is available starting with Hive 2.1.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The part about the feature availability is nice if/when there's a choice, but for now it's not technically germane, since the Hive NAR ships with a particular version of Hive. I will remove that part and merge, the rest LGTM :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we replace the "This feature is available..." sentence with "NOTE: Non-zero values may not be supported by the driver"

@mattyb149
Copy link
Contributor

The "unit tests" for TestSelectHiveQL use Derby as the database, only to test the functionality of getting the "HiveQL" statement to the database and parsing its results. In that vein, Derby supports setQueryTimeout (DERBY-31), so can we add a unit test that sets the value, to exercise that part of the code?

@pvillard31
Copy link
Contributor Author

Thanks for the review @mattyb149 and @joewitt. I updated the property description based on your comments. Regarding the unit test, since I'm using a HiveStatement object in the custom validate method, I'm not sure I can easily test the property (it'll always fail in a default build even with the Derby backend). And, if adding a unit test where I expect the validation to raise an error, this test may not have the expected result if using different profiles in the maven build.

@mattyb149
Copy link
Contributor

I'm getting NPEs in the unit tests, something weird with MockPropertyValue getting created without "expectExpressions" being set to anything, causing isExpressionLanguagePresent() to throw the NPE

@pvillard31
Copy link
Contributor Author

I already noticed this error while working on others PRs (I'm a bit surprised I didn't notice the NPE on this PR...). It's because we're checking if the processor is valid before enabling expression validation (https://github.com/apache/nifi/blob/master/nifi-mock/src/main/java/org/apache/nifi/util/StandardProcessorTestRunner.java#L169). We can't really do it the other way around without changing a lot of things.

I updated the PR to just check if expectExpressions is null and, if yes, return false. This way we can use isExpressionLanguagePresent() in a custom validate method.

@mattyb149
Copy link
Contributor

I talked to @markap14 about it, perhaps this fix is fine or we can just change it to a boolean, but I'll let him take a look too.

Copy link
Contributor

@mattyb149 mattyb149 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like possible merge conflicts, do you mind rebasing against the latest master? Please and thanks!

@@ -225,7 +225,7 @@ public String toString() {

@Override
public boolean isExpressionLanguagePresent() {
if (!expectExpressions) {
if (expectExpressions == null || !expectExpressions) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fix (in whatever form it takes) should be as part of NIFI-4590. As a temporary fix in my own PR #2260 , I put a try/catch around isExpressionLanguagePresent() with a comment about NIFI-4590. You don't have to do that specifically, but if you can get around changing the mock stuff in this PR, I can merge and add a reference to your workaround in NIFI-4590, to ask that your workaround be removed as part of that Jira.

@pvillard31
Copy link
Contributor Author

Hey @mattyb149, I rebased against master to account for the changes made by Mark in the test framework. Let me know if there is something else.

@@ -233,6 +234,7 @@ private FunctionContext(boolean rollbackOnFailure, Charset charset, String state
}

// Execute the statement
stmt.setQueryTimeout(context.getProperty(QUERY_TIMEOUT).evaluateAttributeExpressions(flowFile).asInteger());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the driver doesn't support this (at least for Apache Hive 1.2.1), it will throw an exception, perhaps wrap these calls in a try/catch and ignore any errors (as they would have been presented to the user via doc and validation)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, just pushed a commit for PutHiveQL and SelectHiveQL. Thanks!

problems.add(new ValidationResult.Builder()
.subject("Query Timeout")
.valid(false)
.explanation(e.getLocalizedMessage())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This message is "Method not supported", I can add more text around before merge, such as "setQueryTimeout caused the driver to report "+e.getLocalizedMessage() or something like that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes sure @mattyb149 !

@@ -232,6 +233,13 @@ private FunctionContext(boolean rollbackOnFailure, Charset charset, String state
getLogger().warn("Failed to parse hiveQL: {} due to {}", new Object[]{hiveQL, e}, e);
}

try {
// set query timeout
stmt.setQueryTimeout(context.getProperty(QUERY_TIMEOUT).evaluateAttributeExpressions(flowFile).asInteger());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately asInteger() will throw a NumberFormatException if the EL evaluates to a blank string/null. Should we warn that we're defaulting to zero, or keep the current behavior which is an exception and rollback, perhaps adding another logging statement to clarify the one that will follow (NFE for input string "") ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That shouldn't happen with the defined validator on the property, no?

.addValidator(StandardValidators.INTEGER_VALIDATOR)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, if Expression Language is present, INTEGER_VALIDATOR will return Valid as it may not be able to evaluate the EL at validation time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, forgot about EL! I think best is to keep current behavior with an exception and rollback but with another logging statement. I'll push a commit for that.

Copy link
Contributor

@MikeThomsen MikeThomsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 LGTM. I don't have Hive running, but the unit tests run just fine and the change is straight-forward.

&& !validationContext.getProperty(QUERY_TIMEOUT).isExpressionLanguagePresent()
&& validationContext.getProperty(QUERY_TIMEOUT).asInteger() != 0) {
try(HiveStatement stmt = new HiveStatement(null, null, null)) {
stmt.setQueryTimeout(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You say in the description:

A value of 0 means no timeout. NOTE: Non-zero values may not be supported by the driver.

Shouldn't the test be on a non-zero value here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, in versions of the driver that does not implement this method, this call will throw an exception no matter what is the value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True (at least for older Apache Hive drivers), but it is a tad confusing to see the if statement check for non-zero then test with zero.

Also, I can't find the discussion but I thought we were going to do similar error handling in the setTimeout() method below as we do in the customValidate, for when Expression Language is present but the driver doesn't support non-zero values. IIRC it would allow a query timeout of zero if the driver didn't support it, but if the user set it to a positive value and the driver didn't support it, it would throw an error (akin to being invalid if found in customValidate())?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean this one: #2138 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True (at least for older Apache Hive drivers), but it is a tad confusing to see the if statement check for non-zero then test with zero.

Do you suggest an arbitrary value like

stmt.setQueryTimeout(1); // just checking driver supports query timeout

?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah anything but zero, just so it matches the logic

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I think yes, the idea is to catch the "not supported" thing during customValidate() where we can, but we also have to add it to the runtime checking because of EL

@@ -310,6 +311,15 @@ private void onTrigger(final ProcessContext context, final ProcessSession sessio
try (final Connection con = dbcpService.getConnection();
final Statement st = (flowbased ? con.prepareStatement(selectQuery) : con.createStatement())
) {
try {
final int queryTimeout = context.getProperty(QUERY_TIMEOUT).evaluateAttributeExpressions(fileToProcess).asInteger();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fungible with the logic from PutHive. Why not move it to the abstract base class as a method both can reuse?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I just pushed a commit to address it.

@mattyb149
Copy link
Contributor

@pvillard31 Mind doing a rebase here, and updating the QUERY_TIMEOUT property to use FlowFile Attribute scope? I pushed up a rebased branch with the additional commit (mattyb149@40b9d1d) but I don't know if you can cherry-pick from there or if you have to do your own rebase, then cherry-pick my additional commit (if you want to use it of course).

@pvillard31 pvillard31 force-pushed the NIFI-4371 branch 2 times, most recently from e7d9451 to f9e5e93 Compare June 18, 2018 17:46
@pvillard31
Copy link
Contributor Author

Done @mattyb149 - thanks!

@pvillard31
Copy link
Contributor Author

Hey @mattyb149 - I believe we added this one for Hive 3 processors but forgot this PR. I know you're not available at the moment, but just a reminder for when you're back ;) (or if someone else wants to merge it in)

+ "A value of 0 means no timeout. NOTE: Non-zero values may not be supported by the driver.")
.defaultValue("0")
.required(true)
.addValidator(StandardValidators.INTEGER_VALIDATOR)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed this in the Hive 3 processors too, but shouldn't this be a NONNEGATIVE_INTEGER_VALIDATOR? Are there any situations in which a negative number has any semantics?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True - will fix in AbstractHiveQLProcessor and AbstractHive3QLProcessor

@pvillard31
Copy link
Contributor Author

finally got time to get back on this one... if you want to have another look @mattyb149

* @param flowFile flow file to evaluate expression language
* @throws ProcessException exception in case configured value cannot be converted to an integer
*/
protected void setTimeout(Statement stmt, ProcessContext context, FlowFile flowFile) throws ProcessException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have a customValidate() for this like you added for AbstractHiveQLProcessor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... decided not to because I figured that all the drivers on Hive3 and beyond would implement the method now. I thought it would make less validation work when the framework is validating the processor. But I guess we could choose to be on the safest side and add it anyway. Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mentioned it only for consistency between the two sets of processors, actually I'd be fine with not even putting the setQueryTimeout in a try/catch for the Hive 3 processors, now that it's supported, I don't imagine there are any drivers that wouldn't support it at this point.
Either way you want to go is fine with me, let me know what you end up with (even if it is what you already have :) and I'll finish the review and merge, thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The try/catch is still valid in case of error during the EL evaluation (when EL is used to set the timeout value). I could remove the setTimeout in the abstract class and just use setQueryTimeout directly in the Select/Put classes but I'd still have to define how I deal with SQLException and NumberFormatException. Honestly I think it's better to keep it as-is to have some sort of consistency between the two bundles.

@github-actions
Copy link

We're marking this PR as stale due to lack of updates in the past few months. If after another couple of weeks the stale label has not been removed this PR will be closed. This stale marker and eventual auto close does not indicate a judgement of the PR just lack of reviewer bandwidth and helps us keep the PR queue more manageable. If you would like this PR re-opened you can do so and a committer can remove the stale tag. Or you can open a new PR. Try to help review other PRs to increase PR review bandwidth which in turn helps yours.

@github-actions github-actions bot added the Stale label May 12, 2021
@github-actions github-actions bot closed this May 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants