Fixes #12719: Some reports are duplicated between agent and postgres leading to "unexpected" compliance #1969

@fanf fanf commented Jun 14, 2018

https://www.rudder-project.org/redmine/issues/12719

Not yet finished (the selection part in Settings remains to be done), but it's a fairly big change.
I'm quite confident in it, because we have very good test coverage on that part.

So, basically, I completely changed the way we pair report <-> expected component value. This was needed to be more lenient in case of duplicate keys and unbounded variable size (from iterators).

The main modifications are:

  • we have a set of parameters that holds the chosen interpretation of unexpected reports: UnexpectedReportInterpretation
  • we no longer sort component values into 3 kinds (None, simple value, var pattern). Instead, we create for each component value a Value utility class with the following info:
    • value / unexpanded value: the two component values
    • cardinality: number of expected reports, most of the time '1', but it can be increased for out-of-bound vars (with iterators)
    • numberDuplicates: the number of dropped duplicated messages for that pairing, so that we can know how bad syslog is. Should be 0. This is used to warn the user louder and louder as we get more duplicates
    • isVar: a boolean telling us if the expected report contains a variable pattern.
    • pattern: the value as a pattern. This is defined in all cases, even for None and simple value - they just have a very specific pattern that only matches the corresponding exact value. This is the main difference compared to before, and it allows us to process ALL values with the same algo.
    • specificity: how specific the pattern is. That allows us to start with more specific patterns and avoid bug https://www.rudder-project.org/redmine/issues/7758. For example, .* is 0 (not specific at all), foobarbaz is 9.
    • matchingReports: the list of reports whose keyValue matches pattern. Most of the time, we only have one. More leads to unexpected (modulo the special interpretations of unexpected reports).
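
To make the fields above more concrete, here is a minimal sketch of how such a Value could be built. It is not the code of this PR: the names `ValueSketch`, `toPattern` and `mkValue` are hypothetical, the variable syntax is assumed to be only `${...}`, and specificity is assumed to be the number of literal characters (which is consistent with the `.*` = 0 and `foobarbaz` = 9 examples).

```scala
import java.util.regex.Pattern

object ValueSketch {
  // Hypothetical stand-in for the Value utility class described above.
  final case class Value(
      value: String,                       // expected component value
      isVar: Boolean,                      // true if the value contains a variable
      pattern: Pattern,                    // always defined, even for plain values
      specificity: Int,                    // assumed: number of literal characters
      cardinality: Int = 1,
      numberDuplicates: Int = 0,
      matchingReports: List[String] = Nil
  )

  // Assumption: a variable is written ${...}; the real syntax may be richer.
  private val varRegex = """\$\{[^}]*\}""".r

  // Literal parts are quoted, each variable becomes ".*".
  def toPattern(v: String): Pattern = {
    val sb = new StringBuilder
    var last = 0
    for (m <- varRegex.findAllMatchIn(v)) {
      if (m.start > last) sb.append(Pattern.quote(v.substring(last, m.start)))
      sb.append(".*")
      last = m.end
    }
    if (last < v.length) sb.append(Pattern.quote(v.substring(last)))
    Pattern.compile(sb.toString, Pattern.DOTALL)
  }

  def mkValue(v: String): Value = Value(
    value = v,
    isVar = varRegex.findFirstIn(v).isDefined,
    pattern = toPattern(v),
    specificity = varRegex.replaceAllIn(v, "").length
  )
}
```

With this sketch, a plain value gets a static pattern that accepts only itself, while a value made of a single variable gets `.*` and a specificity of 0.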

With that, and as introduced in the "pattern" description, we are able to process all reports of a component in only one pass with the recPairReports algorithm.
The main ideas of the algo are:

  • we sort expected values and reports by specificity to avoid bug 7758,
  • we have two stacks of values: the ones not yet paired (all values at the beginning), and the ones already paired (empty at the beginning),
  • then, for each report, we look for the first not-yet-paired value whose pattern accepts the report keyValue,
  • if no not-yet-paired value matches, we try the same thing with the already paired values (but that will lead to an unexpected, modulo interpretation settings),
  • if nothing matches, it's a real unexpected report.

Once all reports are processed, we merge the two stacks of values, and we transform them into component value statuses (the still free values lead to missing, the ones with one report get the report status, etc). And THAT'S ALL.
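
As an illustration only, the pairing pass could be sketched as below. This is a simplified model, not the actual recPairReports: the names `PairingSketch`, `pairReports` and the three statuses are hypothetical, reports are reduced to their keyValue string, only values (not reports) are sorted, and the interpretation settings are left out.

```scala
import java.util.regex.Pattern

object PairingSketch {
  final case class Value(pattern: Pattern, specificity: Int, reports: List[String] = Nil)

  sealed trait Status
  case object Missing extends Status
  case object Paired extends Status
  case object Unexpected extends Status

  // Returns the status of each expected value, plus the reports that matched nothing.
  def pairReports(values: List[Value], reports: List[String]): (List[(Value, Status)], List[String]) = {
    def accepts(v: Value, key: String): Boolean = v.pattern.matcher(key).matches()

    // Most specific patterns first, so that ".*" cannot steal a report meant for "foo".
    val sorted = values.sortBy(v => -v.specificity)

    val (free, paired, orphans) =
      reports.foldLeft((sorted, List.empty[Value], List.empty[String])) {
        case ((f, p, o), key) =>
          f.indexWhere(accepts(_, key)) match {
            case i if i >= 0 =>
              // first not-yet-paired value accepting the report: move it to the paired stack
              val v = f(i)
              (f.patch(i, Nil, 1), p :+ v.copy(reports = key :: v.reports), o)
            case _ =>
              p.indexWhere(accepts(_, key)) match {
                case j if j >= 0 =>
                  // already paired value: it now holds more reports than expected
                  val v = p(j)
                  (f, p.updated(j, v.copy(reports = key :: v.reports)), o)
                case _ =>
                  // nothing matches: a real unexpected report
                  (f, p, o :+ key)
              }
          }
      }

    val statuses = (paired ++ free).map { v =>
      val status = v.reports.size match {
        case 0 => Missing
        case 1 => Paired
        case _ => Unexpected
      }
      (v, status)
    }
    (statuses, orphans)
  }
}
```

In this sketch, an expected `foo` and an expected `.*` receiving reports `foo` and `bar` both end up paired, because `foo` is tried first.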

There's a subpart of the algo which is in charge of finding the matching value for the current report: findMatchingValue. It's almost just a loop on all values to consider, with some special cases about what to do in case of a duplicate key or an out-of-bound var, depending on Rudder settings.
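
A rough sketch of that value-finding step, applied to already paired values, is given below. Everything here is an assumption made for illustration: the signature only mirrors the call visible in the diff (report, values, duplicate flag, unbounded-var flag, returning the updated values and whether something matched), a duplicate is taken to be a report equal to one already attached to the value, and the thresholds follow the settings page text (one or two duplicates are ignored, more lead to unexpected).

```scala
import java.util.regex.Pattern

object MatchingSketch {
  final case class Report(keyValue: String, timestamp: Long, message: String)
  final case class Value(
      pattern: Pattern,
      isVar: Boolean,
      cardinality: Int = 1,
      numberDuplicates: Int = 0,
      matchingReports: List[Report] = Nil
  )

  // Assumed escalation: first duplicate is info, second is warn, more is error.
  def duplicateLogLevel(n: Int): String =
    if (n <= 0) "none" else if (n == 1) "info" else if (n == 2) "warn" else "error"

  // Attach `report` to the first value whose pattern accepts it, applying the two settings.
  def findMatchingValue(
      report: Report,
      values: List[Value],
      allowsDuplicate: Boolean,
      unboundedVar: Boolean
  ): (List[Value], Boolean) = {
    val i = values.indexWhere(_.pattern.matcher(report.keyValue).matches())
    if (i < 0) (values, false)
    else {
      val v = values(i)
      val isDup = v.matchingReports.contains(report)
      val updated =
        if (isDup && allowsDuplicate) {
          val n = v.numberDuplicates + 1
          if (n <= 2) v.copy(numberDuplicates = n) // dropped, only counted
          else v.copy(numberDuplicates = n, matchingReports = report :: v.matchingReports)
        } else if (!isDup && unboundedVar && v.isVar) {
          // accept one more report than originally expected and trace that decision
          v.copy(cardinality = v.cardinality + 1, matchingReports = report :: v.matchingReports)
        } else {
          v.copy(matchingReports = report :: v.matchingReports)
        }
      (values.updated(i, updated), true)
    }
  }

  // More reports than the cardinality means "unexpected".
  def isUnexpected(v: Value): Boolean = v.matchingReports.size > v.cardinality
}
```

Note that even when both settings are off, the lookup still runs in this sketch, since it is what identifies which value becomes unexpected.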


fanf commented Jun 16, 2018

PR rebased

@fanf fanf force-pushed the bug_12719/some_reports_are_duplicated_between_agent_and_postgres_leading_to_unexpected_compliance branch from 1804e5f to e746f20 Compare June 16, 2018 21:29

fanf commented Jun 16, 2018

PR rebased

@fanf fanf force-pushed the bug_12719/some_reports_are_duplicated_between_agent_and_postgres_leading_to_unexpected_compliance branch from e746f20 to f5590ba Compare June 16, 2018 21:55
@ncharles ncharles left a comment

This is a great change, thank you.
I have some minor remarks:

  • can you correct the typos?
  • the log messages are not that meaningful - especially for the unexpected reports: we used to have a super clear unexpected logger, and it seems that's no longer the case
  • there is a full list traversal when there's no match and both options on unexpected are disabled - we should skip that.

Thank you !


// Find what reports matche what cfengine variables
val (newPairedValues, pairedAgain) = findMachingValue(report, pairedValues, duplicate, unboundedVar)
Member

if both duplicate & unboundedVar are false, shouldn't we skip this step? we know it won't match again

Member Author

No, we need that to know what is the duplicated value and set the "unexpected" accordingly.
(it may also happen that none match again, but that is another case).

// design). The log level should be "info" and not more because it was chosen by configuration to ignore them.
// - in some case, we want to accept more reports than originally expected. Then, we must update cardinality to
// trace that decision. It's typically what we want to do for
def findMachingValue(
Member

typo: findMatchingValue

Member Author

ok

S.appendJs(JsRaw(s"""$$("input, select").prop("disabled",${disable})"""))
// else nothing is done because it enables what should not be.
if(!CurrentUser.checkRights(Edit("administration"))) {
S.appendJs(JsRaw(s"""$$("input, select").prop("disabled","disabled")"""))
Member

you are changing the semantics there, from prop("disabled", true) to prop("disabled", "disabled")
I'm not sure this will work on all platforms, and you should probably switch back to prop("disabled", true)

Member Author

Well, "disabled" is the normalized term to use. jQuery does whatever is needed internally to make it consistent.

Member

Apparently, it's not: https://learn.jquery.com/using-jquery-core/faq/how-do-i-disable-enable-a-form-element/
so maybe internally jquery does what is expected, but it's not the canonical form

Member Author

ok

@@ -199,6 +199,7 @@ <h3>Security</h3>
<div class="deca">
<h3>Protocol</h3>
<div class="lift:administration.PropertiesManagement.networkProtocolSection" id="networkProtocolForm">
<div class="deca">
Member

why the change here ?

Member Author

no idea :)

Member Author

removed

@@ -234,6 +236,91 @@ <h3>Protocol</h3>
<div id="agentPolicyMode" class="lift:administration.PropertiesManagement.agentPolicyMode"></div>
<div id="complianceMode" class="lift:administration.PropertiesManagement.complianceMode"></div>

<div class="inner-portlet">
<div class="page-title">Behaviour regarding Unexpected reports sent by node</div>
Member

What do you think of "Unexpected reports interpretation" or "Unexpected reports semantic" ?
ping @peckpeck

Member Author

I use "interpretation" internally, it's good.

<div class="marker">
<span class="glyphicon glyphicon-info-sign"></span>
</div>
The two following settings affect the interpretation given to some kind of unexpected reports when calculating compliance.
Member

The two following settings affect the interpretation given to some type of unexpected reports when computing compliance.

Member Author

ok

<span class="glyphicon glyphicon-info-sign"></span>
</div>
The two following settings affect the interpretation given to some kind of unexpected reports when calculating compliance.
These option will take effects when the next reports are received on a node or if you "clear caches" below.
Member

are received from a node or if you "clear caches" below.

Due to the underlying protocol used to send compliance reports back to the policy server (syslog),
it may happen that some reports, for an unitary control point, are duplicated. In that case, compliance for
the corresponding element will be "unexpected": Rudder was awaiting one report, but it got two.
The chance to have a real error reported as a ducplicated message are very low because you should
Member

typo: duplicated

Member Author

ok

packages and you use the resulting variable in "Package present" generic method for the "name" parameter,
it is normal and expected to get several compliance reports, one for each configuration value.
<br>
That option allows to rise the number of expected reports to the number of configuration values and so
Member

allows to increase

Member Author

ok

That option allows to rise the number of expected reports to the number of configuration values and so
it avoids to get and "unexpected" compliance in that case.
<br>Unless it is more important for you to get "unexpected" compliance than the actual compliance of each
configuration value, you should check that option.
Member

i'm not sure I understand this sentence

Member Author

I was trying to find a good reason to not use that option, and beside the love of unexpected, I don't see any.

@fanf fanf force-pushed the bug_12719/some_reports_are_duplicated_between_agent_and_postgres_leading_to_unexpected_compliance branch from f5590ba to c65b855 Compare June 19, 2018 07:15

fanf commented Jun 19, 2018

PR rebased

@@ -329,6 +339,8 @@ class LDAPBasedConfigService(configFile: Config, repos: ConfigRepository, workfl
rudder.policy.mode.name=${Enforce.name}
rudder.policy.mode.overridable=true
rudder.featureSwitch.directiveScriptEngine=enabled
rudder.compliance.unexpectedReportAllowsDuplicate=true
rudder.compliance.unexpectedReportUnboundedVarValues=true
Member

doesn't that mean that for upgrading users, the behaviour will change? Or will there be a migration script as well?

Member Author

It does mean that. I believe it is the correct behavior, as the previous behavior was, in the cases we are aware of, considered a bug. It will need a special highlight in the changelog, but I really see no reason not to make that behavior the default one, even for migrating users.

Member

ok

Due to the underlying protocol used to send compliance reports back to the policy server (syslog),
it may happen that some reports, for an unitary control point, are duplicated. In that case, compliance for
the corresponding element will be "unexpected": Rudder was awaiting one report, but it got two.
The chance to have a real error reported as a duplicated message are very low because you should
Member

The risk of having a real error reported as a duplicated message is very low, as messages need to have the same timestamp and information message to be considered duplicates on a given node

The chance to have a real error reported as a duplicated message are very low because you should
always have at least the report timestamp or its information message unique for a given directive in a given rule
on a given node.
<br>That option will ignore the duplicated message in compliance calcul and log an information level
Member

in compliance computation

<br>That option will ignore the duplicated message in compliance calcul and log an information level
message about that duplication. A double duplication will still be ignored but with a warning log, and more
duplicates will lead to an error log and an unexpected compliance.
<br/>It is safe to ignore duplicated message and you should check that option.
Member

duplicated reports

// design). The log level should be "info" and not more because it was chosen by configuration to ignore them.
// - in some case, we want to accept more reports than originally expected. Then, we must update cardinality to
// trace that decision. It's typically what we want to do for
def findMatchinValue(
Member

typo: findMatchingValue

//
val values = expectedComponent.groupedComponentValues.toList.map { case(v, u) =>
val isVar = matchCFEngineVars.pattern.matcher(v).matches()
val pattern = replaceCFEngineVars(v)
Member

shouldn't we evaluate the pattern only if "isVar" is true ?

Member Author

no, we need a pattern for all component values, to be able to work only on pattern matching in findMatchingValue()

Member

isn't it super expensive? Plus, from the doc of replaceCFEngineVars, it should be called only for strings containing a cfengine var

Member Author

The pattern is compiled. Matching a \Qstring\E versus string == is almost the same (there is pattern init / compilation overhead). But we were already doing that pattern test before, and in fact many more times:

  • one time for each component value to check if it was a cfengine var,
  • then in getUnexpectedReports we were doing, for each report, potentially for each value (if the report was not expected or matching the last component value), at least one regex eval to know if the component value is a cfengine var, and then a pattern match.
    And then, we were doing (number of cfe var)^(number of reports) pattern matching.

Now, we are doing exactly 2 regex compiles for each component value, plus O(nb cpt value x reports) pattern matches (inexpensive in the case of values). Plus we sort patterns by specificity, which helps. I don't think we can do better if we want to keep our hypothesis. In the happy path (each report is expected), reports are pattern matched one time only.

And for the "should be a cfengine var", it only means that you will get a static pattern if it's not the case. Which is what we want.
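
To illustrate the point about static patterns, here is a small sketch using only java.util.regex. The helper names `staticPattern` and `sameAsEquality` are made up for the example; it is assumed that a value without variable ends up fully quoted, as described above.

```scala
import java.util.regex.Pattern

object StaticPatternSketch {
  // A value without any variable becomes a fully quoted, literal pattern: \Qvalue\E
  def staticPattern(value: String): Pattern = Pattern.compile(Pattern.quote(value))

  // For a static pattern, matches() gives the same answer as string equality.
  def sameAsEquality(value: String, candidate: String): Boolean =
    staticPattern(value).matcher(candidate).matches() == (value == candidate)
}
```

Regex metacharacters in the value are neutralised by the quoting, so `a.b*` as a static pattern accepts the string `a.b*` and nothing else.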

Member

ok -thank you for the explanation.

found match {
case Some(x) => (value :: stack, Some(x))
case None =>
if(value.pattern.matcher(report.keyValue).matches()) {
Member

I'm probably missing something there - we are only evaluating values if they match a pattern, but most values don't have a pattern - evaluating a pattern twice (once when we detect if there is a variable, then to see if it matches) may be pretty costly - even more so as we have quadratic complexity

Member Author

@fanf fanf Jun 20, 2018

They are static patterns, like \Qfoo\E. Matching such a pattern has a cost of the same order as string equality (there is overhead with the matcher class creation, but the actual matching is the same while loop on chars).

Member

ok

@ncharles ncharles left a comment

This is a great change - can you correct some typos/wording?


fanf commented Jun 21, 2018

PR rebased

@fanf fanf force-pushed the bug_12719/some_reports_are_duplicated_between_agent_and_postgres_leading_to_unexpected_compliance branch from c65b855 to 953b8fc Compare June 21, 2018 15:39
@Normation-Quality-Assistant
Contributor

This PR is not mergeable to upper versions.
Since it is "Ready for merge" you must merge it by yourself using the following command:
rudder-dev merge https://github.com/Normation/rudder/pull/1969
-- Your faithful QA


fanf commented Jun 24, 2018

OK, merging this PR

@fanf fanf merged commit 953b8fc into Normation:branches/rudder/4.1 Jun 24, 2018
@fanf fanf deleted the bug_12719/some_reports_are_duplicated_between_agent_and_postgres_leading_to_unexpected_compliance branch March 15, 2024 10:11