New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes #12719: Some reports are duplicated between agent and postgres leading to "unexpected" compliance #1969
Conversation
PR rebased |
1804e5f
to
e746f20
Compare
PR rebased |
e746f20
to
f5590ba
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great change, thank you.
I have some minor remarks:
- can you correct the typos ?
- the log message are not that meaning ful - especially for the unexpected report, we used to have super clear unexpected logger, and it seems it"s not the case
- there is a full list traversal when there's a not match, and the both option on unexpected are disabled - we should skip that.
Thank you !
|
||
// Find what reports matche what cfengine variables | ||
val (newPairedValues, pairedAgain) = findMachingValue(report, pairedValues, duplicate, unboundedVar) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if both duplicate & unboundedVar are false, shouldn't we skip this step? we know it won't match again
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we need that to know what is the duplicated value and set the "unexpected" accordingly.
(it may also happen that none match again, but it is an other case).
// design). The log level should be "info" and not more because it was chosen by configuration to ignore them. | ||
// - in some case, we want to accept more reports than originally expected. Then, we must update cardinality to | ||
// trace that decision. It's typically what we want to do for | ||
def findMachingValue( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: findMatchingValue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
S.appendJs(JsRaw(s"""$$("input, select").prop("disabled",${disable})""")) | ||
// else nothing is done because it enables what should not be. | ||
if(!CurrentUser.checkRights(Edit("administration"))) { | ||
S.appendJs(JsRaw(s"""$$("input, select").prop("disabled","disabled")""")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are changing the semantic there, from prop("disabled", true) to prop("disabled", "disabled")
I'm not sure this will work on all platform, and you should probably switch back to prop("disabled", true)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, "disabled" is the normalized term to use. JQuery does whatever is needed internally to make it consistant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently, it's not: https://learn.jquery.com/using-jquery-core/faq/how-do-i-disable-enable-a-form-element/
so maybe internally jquery does what is expected, but it's not the canonical form
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
@@ -199,6 +199,7 @@ <h3>Security</h3> | |||
<div class="deca"> | |||
<h3>Protocol</h3> | |||
<div class="lift:administration.PropertiesManagement.networkProtocolSection" id="networkProtocolForm"> | |||
<div class="deca"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the change here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no iea :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
@@ -234,6 +236,91 @@ <h3>Protocol</h3> | |||
<div id="agentPolicyMode" class="lift:administration.PropertiesManagement.agentPolicyMode"></div> | |||
<div id="complianceMode" class="lift:administration.PropertiesManagement.complianceMode"></div> | |||
|
|||
<div class="inner-portlet"> | |||
<div class="page-title">Behaviour regarding Unexpected reports sent by node</div> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think of "Unexpected reports interpretation" or "Unexpected reports semantic" ?
ping @peckpeck
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I use "interpretation" internaly, it's good.
<div class="marker"> | ||
<span class="glyphicon glyphicon-info-sign"></span> | ||
</div> | ||
The two following settings affect the interpretation given to some kind of unexpected reports when calculating compliance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
he two following settings affect the interpretation given to some type of unexpected reports when computing compliance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
<span class="glyphicon glyphicon-info-sign"></span> | ||
</div> | ||
The two following settings affect the interpretation given to some kind of unexpected reports when calculating compliance. | ||
These option will take effects when the next reports are received on a node or if you "clear caches" below. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e received from a node or if you "clear caches" below.
Due to the underlying protocol used to send compliance reports back to the policy server (syslog), | ||
it may happen that some reports, for an unitary control point, are duplicated. In that case, compliance for | ||
the corresponding element will be "unexpected": Rudder was awaiting one report, but it got two. | ||
The chance to have a real error reported as a ducplicated message are very low because you should |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: duplicated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
packages and you use the resulting variable in "Package present" generic method for the "name" parameter, | ||
it is normal and expected to get several compliance reports, one for each configuration value. | ||
<br> | ||
That option allows to rise the number of expected reports to the number of configuration values and so |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
allows to increase
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
That option allows to rise the number of expected reports to the number of configuration values and so | ||
it avoids to get and "unexpected" compliance in that case. | ||
<br>Unless it is more important for you to get "unexpected" compliance than the actual compliance of each | ||
configuration value, you should check that option. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm not sure I understand this sentence
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to find a good reason to not use that option, and beside the love of unexpected, I don't see any.
f5590ba
to
c65b855
Compare
PR rebased |
@@ -329,6 +339,8 @@ class LDAPBasedConfigService(configFile: Config, repos: ConfigRepository, workfl | |||
rudder.policy.mode.name=${Enforce.name} | |||
rudder.policy.mode.overridable=true | |||
rudder.featureSwitch.directiveScriptEngine=enabled | |||
rudder.compliance.unexpectedReportAllowsDuplicate=true | |||
rudder.compliance.unexpectedReportUnboundedVarValues=true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't that mean that for upgrading user, the behavious will change ? Or will there be a migration script as well ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does mean that. I believe it is the correct behavior as the previous behavior was in case that we are aware of considerated as a bug. I will need a special highligh in the changelog, but I really see no reason to not make that behavior the default one, even for migrating users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
Due to the underlying protocol used to send compliance reports back to the policy server (syslog), | ||
it may happen that some reports, for an unitary control point, are duplicated. In that case, compliance for | ||
the corresponding element will be "unexpected": Rudder was awaiting one report, but it got two. | ||
The chance to have a real error reported as a duplicated message are very low because you should |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The risk to have a real error reported as a duplicated messages is very low, as messages need to have the same timestamp and information message to be considered duplicated on a given node
The chance to have a real error reported as a duplicated message are very low because you should | ||
always have at least the report timestamp or its information message unique for a given directive in a given rule | ||
on a given node. | ||
<br>That option will ignore the duplicated message in compliance calcul and log an information level |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in comlpiance computation
<br>That option will ignore the duplicated message in compliance calcul and log an information level | ||
message about that duplication. A double duplication will still be ignored but with a warning log, and more | ||
duplicates will lead to an error log and an unexpected compliance. | ||
<br/>It is safe to ignore duplicated message and you should check that option. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
duplicated reports
// design). The log level should be "info" and not more because it was chosen by configuration to ignore them. | ||
// - in some case, we want to accept more reports than originally expected. Then, we must update cardinality to | ||
// trace that decision. It's typically what we want to do for | ||
def findMatchinValue( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: findMatchingValue
// | ||
val values = expectedComponent.groupedComponentValues.toList.map { case(v, u) => | ||
val isVar = matchCFEngineVars.pattern.matcher(v).matches() | ||
val pattern = replaceCFEngineVars(v) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we evaluate the pattern only if "isVar" is true ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, we need a pattern for all component values, to be able to only works on pattern matching in findMatchingValue()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't it super expensive ? Plus from the doc of replaceCFEngineVars, it should be called only for string containing a cfengine var
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pattern is compiled. Maching a \Qstring\E
versus string ==
is almost the same (there is pattern init / compilation overhead. But we were already doing that pattern test before, and in fact many more times:
- one time for each component values to check if they were a cfengine var,
- then in getUnexpectedReports we were doing for each reports, potentially for each value (if the report was not expected or matching the last component value) at least one regex eval to know if the component val is a cfengine var, and then a pattern matches.
And then, we were doing (number of cfe var)^(number of reports) pattern matching.
Now, we are doing exactly 2 regex compile for each component value, plus O(nb cpt value x reports) pattern matches (inexpensive in the case of values). Plus we sort pattern by specificity, which help. I don't think we can do better if we want to keep our hypothesis. In the happy path (each report is expected), reports are pattern matched one time only.
And for the "should be a cfengine var", it only means that you will get a static pattern if it's not the case. Which is what we want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok -thank you for the explanation.
found match { | ||
case Some(x) => (value :: stack, Some(x)) | ||
case None => | ||
if(value.pattern.matcher(report.keyValue).matches()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm probably missing something something there - we are only only evaluating value if they match a pattern, but most values don't have a pattern - reevaluating twice a pattern (once when we detect if there is a variable, then if it match) may be pretty costly - even more as we have a quadratic complextity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are static patterns, like \Qfoo\E
. The matches of such a pattern is in the same order of cost as string equality (there is overhead with the matcher class creation, but the actual matching is the same while loop on chars).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This a great change - can you correct some typos/wording ?
…leading to \"unexpected\" compliance
PR rebased |
c65b855
to
953b8fc
Compare
This PR is not mergeable to upper versions. |
OK, merging this PR |
https://www.rudder-project.org/redmine/issues/12719
Not yet finished (remains the we selection part in Setting). But it's a fairly big change.
I'm quite confident on it, because we have a very good test coverage on that part.
So, basically, I completly changed the way we pair report <-> expected component value. This was needed to allows to be more lenient in case of ducplicate keys and unbounded variable size (from iterators).
The main modification are:
.*
is 0 (not specific at all),foobarbaz
is 9.With that, and as introduced in "pattern" description, we are able to process all reports of a component in only one pass with the
recPairReports
algorithm.The mains ideas of the algo are:
Once all reports are processed, we merge the two stack of values, and we transform them in components value status (the still free value lead to missing, the one with one report get the report status, etc). And THAT'S ALL.
There's a subpart of the algo which is in charge of finding the matching value for the current report:
recPairReports
. It's almost just a loop on all values to consider, with some special cases about what to do in case of a duplicate key or an out of bound var and Rudder settings.