Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NIFI-8113 Adding persistent status history repository backed by embed… #4821

Closed
wants to merge 1 commit into from

Conversation

simonbence
Copy link
Contributor

@simonbence simonbence commented Feb 11, 2021

This would be a proposal for having persistent status history for both component and node level metrics. During the implementation I tried to balance between introducing some flexibility in usage and supporting the existing behaviour. As an end result, I did split the component status repository (now it is StatusRepository) into one which is responsible for component metrics and an other responsible for node level metrics. They might be configured independently on some level (like it is possible to store data into a different storage or for a different amount of time window), but in order to support the previous configuration, there is a facade for them which provides the same composite service as it did before (component and node).

As for code organisational part I worked with three concepts: repository is the top level entity, provides the service for the clients (FlowController, etc.). In general, this is the same as before, only split into two parts: node and component. The storage classes are part of the repositories and are responsible for manage the details of a given type, like processor status, node status, etc. Finally the WriterTemplates and ReaderTemplates are merely helpers exist to deal with QuestDB API calls.

Some remarks on design decisions:

  • I kept the name for VolatileComponentStatusRepository, but the actual repositories are using the prefix InMemory to show more contrast with QuestDB
  • The PR contains a small fix on ProcessGroupStatusDescriptor#calculateTaskMillis(): previously every subsequent level makes a nano->milli conversion, which accumulates, reducing the task time in children groups into 0. Now this should be fixed (The QuestDB tests are implicitly proves this as well)
  • Configuration contains the new parameters but they are commented out. At this point I think, the VolatileComponentStatusRepository should be kept as default
  • The implementation depends on the latest of the 4.X version of QuestDB. Currently QuestDB is at 5.0.5, but the 5.X line depends on Java 11. There are some non vital features in 5.X (all_tables request, dropping partition using WHERE closure), but these are not unavoidable for us.

Thank you for investing your time to my PR!
In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

  • Is there a JIRA ticket associated with this PR? Is it referenced
    in the commit message?

  • Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

  • Has your PR been rebased against the latest commit within the target branch (typically main)?

  • Is your initial contribution a single, squashed commit? Additional commits in response to PR reviewer feedback should be made on this branch and pushed to allow change tracking. Do not squash or use --force when pushing to allow for clean monitoring of changes.

For code changes:

  • Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
  • Have you written or updated unit tests to verify your changes?
  • Have you verified that the full build is successful on JDK 8?
  • Have you verified that the full build is successful on JDK 11?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
  • If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
  • If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered?

Note:

Please ensure that once the PR is submitted, you check GitHub Actions CI for build issues and submit an update to your PR as soon as possible.

Copy link
Contributor

@turcsanyip turcsanyip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simonbence I ran through the code and it looks good overall.
Added some comments so far. Will continue with testing/checking it in more detail.

private final int numDataPoints;
private volatile long lastCaptureTime = 0L;
private final NodeStatusRepository nodeStatusRepository;
private final ComponentStatusRepository componentStatusRepository;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As VolatileComponentStatusRepository provides backward compatibility for the in memory storage and in the background it just delegates requests to the new InMemory*StatusRepositories, using those classes instead of the interface types would fit to the role of this class better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right with that, however the instance creation is managed by the InMemoryStatusRepositoryBuilder, which is an implementation for StatusRepositoryBuilder. This comes with that it will not expose the implementation class it returns with and I wish not to expose that. (Also: adding an other way to create the instances would bring in code duplication and unnecessary complexity in my opinion)

# nifi.status.repository.builder.inmemory=org.apache.nifi.controller.status.history.InMemoryStatusRepositoryBuilder
# nifi.status.repository.builder.persistent=org.apache.nifi.controller.status.history.EmbeddedQuestDbStatusRepositoryBuilder
# nifi.status.repository.roles.component=persistent
# nifi.status.repository.roles.node=inmemory
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the new InMemory implementation should be the default instead of adding these lines commented out.
As far as I understand, the backward compatibility point here is to support old nifi.properties files where only nifi.components.status.repository.implementation property exists. And it is provided / works even if the default is not the old property.
New installations could (should) go with the new configuration / implementation classes.

@simonbence simonbence force-pushed the NIFI-8113 branch 2 times, most recently from 0af5bd7 to a31493b Compare February 17, 2021 09:50
Copy link
Contributor

@markap14 markap14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @simonbence! This is a VERY powerful addition! Code looks good to me. Did some minimal testing, but I will want to do some more testing before fully giving it a +1. The only real concern that I had with the PR is the update to the nifi.properties. There's a LOT going on there. I think there are 2 reasons for that. Firstly, there are some properties that I think really can be removed - they add flexibility but IMO aren't really necessary and add to the complexity of configuring. Secondly, the properties tend to have a lot of comments associated with them. We need to instead keep comments fairly minimal and update the Admin Guide to fully document each of these properties. What effect will they have, etc.

Otherwise, all looks good, but I'll continue doing some testing!

) {
while (cursor.hasNext()) {
final Record record = cursor.getRecord();
result.add(new StringBuilder(record.getStr(0)).toString());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we creating a StringBuilder here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Record#getStr returns with a CharSequence of some kind. I found StringBuilder the most elaborate way to ensure the correct content will be extracted as with most other ways only a #toString would be called.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason not to just use record.getStr(0).toString() - to call the toString method of CharSequence directly? With that, if the object that is returned happens to be a String object (which implements CharSequence) then the toString() method simply returns this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The record return with an undetermined implementation of CharSequence (actually it's CharSequenceView, which is an internal implementation of the QuestDB) where it is not guaranteed that the toString will be implemented, or implemented properly. I was striving to keep in the safe side

@@ -113,16 +113,16 @@ public String getField() {


private static long calculateTaskMillis(final ProcessGroupStatus status) {
long nanos = 0L;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these nanos being changed to millis? This leads to a lot of rounding errors, resulting in the data being both less precise and less accurate. By holding onto nanos and converting once at the end, it's also more efficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I mentioned this at the PR description, but the point would be to avoid loss of information: with the original code, at every level of the recursion we did a nano > millis conversion, but the caller (one level up in the recursion) would consider the result as nano. Thus, the deeper we are in the group structure, the more times we make a conversion, which looks to be incorrect.

If you still think that this comes with rounding errors, what I would suggest is to introduce a calculateTaskNanos, which would handle the recursion and work without converting, ant the calculateTaskMillis would call this and converting the end result only once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this would solve both issues:

private static long calculateTaskMillis(final ProcessGroupStatus status) {
    return TimeUnit.MILLISECONDS.convert(calculateTaskNanos(status), TimeUnit.NANOSECONDS);
}

private static long calculateTaskNanos(final ProcessGroupStatus status) {
   long nanos = 0L;

   for (final ProcessorStatus procStatus : status.getProcessorStatus()) {
       nanos += procStatus.getProcessingNanos();
   }

   for (final ProcessGroupStatus childStatus : status.getProcessGroupStatus()) {
       nanos += calculateTaskNanos(childStatus);
   }

   return nanos;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. Yes, I think this is a good approach, to calculate recursively using nanos and then converting to millis only after the recursive call.

nifi.components.status.repository.implementation=${nifi.components.status.repository.implementation}

# Builder based specification. Gives the possibility to store Node and Component Status History information in different storage solutions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All properties that get added to this file need to be fully documented in the administration-guide.adoc in nifi-docs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The guide update is underway, I will add it soonish

# nifi.status.repository.questdb.component.id.distinctvalues=${nifi.status.repository.questdb.component.id.distinctvalues}
# If true, it turns on Java heap based caching for quicker lookup. This increases selection speed but consumes heap memory.
# nifi.status.repository.questdb.component.id.cached=${nifi.status.repository.questdb.component.id.cached}
# Turns on indexing of the component id field. For further details please see https://questdb.io/docs/concept/indexes/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While referencing the QuestDB docs may provide some additional insights, we should not expect users to understand how QuestDB works. That is simply an implementation detail. We need to ensure that we fully document exactly how this property will affect the user, given the context of NiFi. We should do this in the administration guide, though, rather than add too much to the nifi.properties.

Comment on lines 131 to 134
# nifi.status.repository.builder.inmemory=org.apache.nifi.controller.status.history.InMemoryStatusRepositoryBuilder
# nifi.status.repository.builder.persistent=org.apache.nifi.controller.status.history.EmbeddedQuestDbStatusRepositoryBuilder
# nifi.status.repository.roles.component=persistent
# nifi.status.repository.roles.node=inmemory
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that I see the benefit to adding these properties at all. If the user wants to persist the data, it should be persisted. If they want to keep it in memory, it should be kept in memory. These properties become confusing and add dubious value. We should lean more toward simple configuration vs. more raw power when we're able to.

Recommend removing all 4 of these properties. Instead, just allow the QuestDB Status Repository to be configured via the nifi.components.status.repository.implementation property, in which case all stats are persistent. If Volatile repo is used, store everything in memory.

Comment on lines 141 to 143
# nifi.status.repository.questdb.persist.frequency=${nifi.status.repository.questdb.persist.frequency}
# nifi.status.repository.questdb.persist.roll.frequency=${nifi.status.repository.questdb.persist.roll.frequency}
# nifi.status.repository.questdb.persist.batch.size=${nifi.status.repository.questdb.persist.batch.size}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These properties also seem too complex to me. Admins shouldn't need to guess what an appropriate "batch size" is for storing metrics. We should try to keep this as simple as possible and just configure how frequently we capture a snapshot. Can always add in additional properties later, if necessary, for tuning. Just don't want to overwhelm users with 15 additional properties when all the user really cares about is "I want this persisted for longer and across restarts."

// Creating status repository based on implementation class takes precedence over creation based on builder
final String implementationClassName = nifiProperties.getProperty(NiFiProperties.COMPONENT_STATUS_REPOSITORY_IMPLEMENTATION);

if (implementationClassName != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any time that we fetch a property value from nifi properties, we need to treat null the same as empty strings or strings with only white space. If the property name exists but with no value, you'll get back an empty string here instead of null.

@simonbence
Copy link
Contributor Author

I do abandon this review. Based on @markap14 's great comments, I simplified the configuration which resulted a more clean codebase as well. This comes with some changes I reverted and in order to keep the review readable I decided to create a new. All comments in this review are answered or aimed here.

@simonbence
Copy link
Contributor Author

Please find the predecessor PR here: PR 4839

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants