Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NIFI-8123: Added support for --wait-for-init when NiFi started with t… #4748

Closed
wants to merge 3 commits into from

Conversation

pgyori
Copy link
Contributor

@pgyori pgyori commented Jan 7, 2021

…he 'start' command

https://issues.apache.org/jira/browse/NIFI-8123

Description of PR

Enables starting NiFi with "bin/nifi.sh start --wait-for-init N" (where N is a positive integer) in which case the startup script will wait (up to N seconds) for NiFi to finish loading all components before returning.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

  • Is there a JIRA ticket associated with this PR? Is it referenced
    in the commit message?

  • Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

  • Has your PR been rebased against the latest commit within the target branch (typically main)?

  • Is your initial contribution a single, squashed commit? Additional commits in response to PR reviewer feedback should be made on this branch and pushed to allow change tracking. Do not squash or use --force when pushing to allow for clean monitoring of changes.

For code changes:

  • Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
  • Have you written or updated unit tests to verify your changes?
  • Have you verified that the full build is successful on JDK 8?
  • Have you verified that the full build is successful on JDK 11?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
  • If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
  • If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered?

Note:

Please ensure that once the PR is submitted, you check GitHub Actions CI for build issues and submit an update to your PR as soon as possible.

Copy link
Contributor

@markap14 markap14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the improvement @pgyori! I think this is helpful, but there are a few rough corners that I think we should try to round out. Added some comments inline. A couple of the comments are very minor. I do think it's important, though, that if NiFi fails to startup that we stop polling/waiting for the initialization to complete.

@@ -215,6 +220,12 @@ public void run() {

writeDiagnostics(socket.getOutputStream(), verbose);
break;
case IS_LOADED:
logger.info("Received IS_LOADED request from Bootstrap");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably make this a DEBUG log. Since this is happening every 2 seconds, we don't really want to see it constantly in the logs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right, thanks! Modified it in the next commit.

echo "Exited the script due to --wait-for-init timeout"
break;
fi
is_nifi_loaded=$( eval "cd ${NIFI_HOME} && ${run_bootstrap_cmd} is_loaded" )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If NiFi takes a while to load, the user experience can leave the use wondering if anything is happening, or if something is 'stuck', etc. Probably best to provide some feedback here. Perhaps just an echo NiFi has not fully initialized yet.... type of thing. Perhaps not every 2 seconds though. Maybe every 5 iterations (10 seconds) or so would make sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea. The $( eval "... is_loaded" ) call also takes a little time (besides the 2-second sleep), so I introduced a time-based feedback "mechanism" that will write on the standard output approximately every 10 seconds.

final Logger logger = defaultLogger;
final Integer port = getCurrentPort(logger);
if (port == null) {
logger.info("Apache NiFi is not currently running");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If NiFi fails to start, we see this in the logs. However, the --wait-for-init results in this constantly running, logging this over and over. If we encounter a case where NiFi is not running, it should not continue waiting for it to initialize, since we know that it will never complete.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified the code so that when the isNiFiFullyLoaded() method runs into this if branch, the nifi.sh script will detect that NiFi hasn't been started so there is no point in waiting. This logic is built on the premise that (port == null) means that NiFi hasn't been started. However, I found that when calling 'bin/nifi.sh start --wait-for-init', the first time isNiFiFullyLoaded() is executed happens before NiFi manages to reserve itself a port. So in my current solution (next commit) in the nifi.sh script we only assume that NiFi hasn't even been started if isNiFiFullyLoaded() runs into (port == null) 3 times (2 seconds pass between each call). Of course, it can result in exiting the nifi.sh script if it takes more than 4 seconds for NiFi to reserve a port.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you consider to extract the log message into a constant as it is duplicated at 746. and 815. rows?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this particular case, I would prefer to leave it like that. There are many log messages in this class that would need to be extracted as constants, and the part of the class where the constants are, would be 'littered' with the crowd of log messages. In this case, I think extracting the messages would decrease readability.

WAIT_FOR_INIT_DEFAULT_TIMEOUT increased to 15 minutes.
IS_LOADED only logged on DEBUG level.
The scripts sends feedback to the user about the initialization status (every 10 seconds).
If NiFi fails to start, the script exits.
logger.debug("Established connection to NiFi instance.");
socket.setSoTimeout(60000);

logger.debug("Sending " + request + " Command to port {}", port);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use built-in formatting instead of concatenation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right. The fix will be in my next commit.

final OutputStream socketOut = socket.getOutputStream();

final Properties nifiProps = loadProperties(logger);
final String secretKey = nifiProps.getProperty("secret.key");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

secretKey hides the variable at row 124, can it cause any problem?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Each running NiFi instance has 1 secret key. We should not rely on the instance variable, as the bootstrap process can (and in this wait-for-init feature it IS) started over and over again 'independently' from the running NiFi instance, so the value of the secret key needs to be loaded from the property file.


private static class NiFiNotRunningException extends Exception {
@Override
public Throwable fillInStackTrace() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you make this synchronized to match parent class implementation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thanks!

@@ -46,6 +46,7 @@

private volatile Listener listener;
private volatile ServerSocket serverSocket;
Copy link
Contributor

@Lehel44 Lehel44 Jan 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is not your modification but this one is interesting, so just pointing out: objects shouldn't be marked volatile. "Marking a mutable object field volatile means the object reference is volatile but the object itself is not, and other threads may not see updates to the object state... For mutable objects, the volatile should be removed, and some other method should be used to ensure thread-safety, such as synchronization, or ThreadLocal storage."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know, in the cases above, it actually is the reference that needs to be volatile since the value of the reference is changed from multiple threads and we need to make sure that no matter which thread reads the object behind this reference, it gets to the current object and not an outdated previous instance.

Built-in formatting used for a log message instead of concatenation.
A custom exception class' method made synchronized.
@markap14 markap14 closed this in 5a8b18b Feb 3, 2021
@markap14
Copy link
Contributor

markap14 commented Feb 3, 2021

Great work @pgyori, this all looks good to me! Did some verification of corner cases such as causing nifi to die during startup, etc. All appears to work as expected. +1 merged to main. Thanks!

@pgyori
Copy link
Contributor Author

pgyori commented Feb 3, 2021

Thank you, @markap14 !

driesva pushed a commit to driesva/nifi that referenced this pull request Mar 19, 2021
…he 'start' command

This closes apache#4748.

Signed-off-by: Mark Payne <markap14@hotmail.com>
krisztina-zsihovszki pushed a commit to krisztina-zsihovszki/nifi that referenced this pull request Jun 28, 2022
…he 'start' command

This closes apache#4748.

Signed-off-by: Mark Payne <markap14@hotmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants