Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[APPSEC-8287] Add AppSec::Processor::RuleMerger #2686

Merged
merged 3 commits into from
Mar 21, 2023
Merged

Conversation

GustavoCaso
Copy link
Member

@GustavoCaso GustavoCaso commented Mar 13, 2023

In Appsec, there has to be some transformation when the remote configuration client is going to merge existing information with the provided by the agent.

The agent can provide multiple pieces of information regarding new rules data, rules override, or entirely new rules set.

Before being able to instantiate a new AppSec::Processor we need to make sure the rules payload has the correct format.

The correct format is:

{
    version: ...,
    metadata: ...,
    rules: [...],
    exclusions: [...],
    rules_override: [...],
    rules_data: [...]
}

There is also some merging logic we need to account for when merging rules_data

When combining rules data with the same id, type and value, we need to keep the one with the highest expiration or if no expiration is provided.

@github-actions github-actions bot added the appsec Application Security monitoring product label Mar 13, 2023
@codecov-commenter
Copy link

Codecov Report

Merging #2686 (f1d5a97) into master (22541af) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff            @@
##           master    #2686    +/-   ##
========================================
  Coverage   98.09%   98.09%            
========================================
  Files        1168     1170     +2     
  Lines       64221    64332   +111     
  Branches     2857     2875    +18     
========================================
+ Hits        62998    63108   +110     
- Misses       1223     1224     +1     
Impacted Files Coverage Δ
lib/datadog/appsec/processor/rule_merger.rb 100.00% <100.00%> (ø)
spec/datadog/appsec/processor/rule_merger_spec.rb 100.00% <100.00%> (ø)

... and 4 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@GustavoCaso GustavoCaso changed the title Add AppSec::Processor::RuleMerger [APPSEC-8287] Add AppSec::Processor::RuleMerger Mar 13, 2023
@GustavoCaso GustavoCaso marked this pull request as ready for review March 13, 2023 09:23
@GustavoCaso GustavoCaso requested review from a team and lloeki March 13, 2023 09:23
Copy link
Contributor

@lloeki lloeki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I think it needs a few adjustments.

Also, I feel like it would make sense to rebase this PR onto the chain of PRs, the latest add-remote-config-component one would be a good candidate as a rebase target + base branch for the PR. This would make integration of the merger usable immediately.

Comment on lines 85 to 93
if override['rules_override']
override['rules_override'].each do |rule_override|
rules_override << rule_override
end
elsif override['exclusions']
override['exclusions'].each do |exclusion|
exclusions << exclusion
end
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I seem to recall you mentioned it's probably best if rules_override and exclusions processing is split between combine_overrides and combine_exclusions.

Do we expect these to be ever realistically mixed?

rules_data = combine_data(data) if data
rules_overrides_and_exclusions = combine_overrides(overrides) if overrides

rules_dup.merge!(rules_data) if rules_data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm not mistaken combine_data only ever returns a hash with the 'rules_data' key?

Maybe if combine_data returned result, then result could be assigned directly to rules_dup['rules_data'].

This would also save us a hash allocation.

rules_overrides_and_exclusions = combine_overrides(overrides) if overrides

rules_dup.merge!(rules_data) if rules_data
rules_dup.merge!(rules_overrides_and_exclusions) if rules_overrides_and_exclusions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm not mistaken combine_overrides only ever returns a hash with the 'rules_override' and 'exclusions' key?

Maybe if combine_overrides was split and returned result, then result could be assigned directly to rules_dup['rules_override'] and rules_dup['exclusions']

This would also save us a hash allocation.

Comment on lines 28 to 33
data_exists = result.select { |x| x['id'] == value['id'] }

if data_exists.any?
existing_data = data_exists.first

if existing_data['type'] == value['type']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be equivalent to:

if (existing_data = result.find { |x| x['id'] == value['id'] }) && 
  if existing_data['type'] == value['type']
    # ...

So:

if (existing_data = result.find { |x| x['id'] == value['id'] }) && existing_data['type'] == value['type']
  # ...

And a single else clause

Comment on lines 73 to 74
# There could be cases that there is no experitaion value.
# To singal that there is no expiration we use the default value 0.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it was omitted at the source, should we omit it instead of using a default?

Comment on lines 55 to 66
result = data1.each_with_object({}) do |value, acc|
acc[value['value']] = value['expiration']
end

data2.each do |data|
if result.key?(data['value'])
# The value is duplicated so we need to keep
# the one with the highest expiration value
# We replace it if the expiration is higher than the current one
# or if no experiration
expiration = result[data['value']]
result[data['value']] = data['expiration'] if data['expiration'].nil? || data['expiration'] > expiration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could that last expiration be nil?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point 😄

module RuleMerger
class << self
def merge(rules:, data: nil, overrides: nil)
rules_dup = rules.dup
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the logic below it's a bit hard to track whether no nested mutation happens.

WDYT of adding some deep freeze to the lets in specs so that we ensure we never mutate the originals?

end
end

return unless result.any?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is because empty rulesets are disallowed by libddwaf

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will not be any empty ruleset but an empty rules_data key. We will not add it to the final rules hash.

class Processor
# RuleMerger merge different sources of information
# into the rules payload
module RuleMerger
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not merging rules but collections of rules, expirations, and overrides. I'm not happy with the name but I can't figure another one right now so let's go with that.

@ivoanjo
Copy link
Member

ivoanjo commented Mar 14, 2023

Also, I feel like it would make sense to rebase this PR onto the chain of PRs, the latest add-remote-config-component one would be a good candidate as a rebase target + base branch for the PR. This would make integration of the merger usable immediately.

Consider maybe doing it the other way -- merging this to master (as long as it has no impact on existing users) and merging master into the working branch. This way allows incremental changes to land in master, rather than a big "land-everything" PR that is very hard to review.

@lloeki
Copy link
Contributor

lloeki commented Mar 15, 2023

Consider maybe doing it the other way -- merging this to master (as long as it has no impact on existing users) and merging master into the working branch. This way allows incremental changes to land in master, rather than a big "land-everything" PR that is very hard to review.

I'm confused, there is no "big PR", there is a chain of (currently) seven PRs, each with a specific and progressive subset of change that should be easier to review in order and in isolation than a big PR, and each allowing incremental changes to land. Whether this PR here is incrementally merged now or downstream of the sequence of PRs is immaterial.

Each PR having their own branch based on the previous one, merging this to master and then merging master to the (currently) last branch would effectively freeze all the branches since they would not be rebase-able anymore†. So any change on a specific "top" branch would mean merging that branch down, once in each downstream branch. This would make history a terrible mess. The alternative would be to merge only on the last one, effectively turning the chained but independent branches into one big PR behind the scenes.

† I can see some ways to work around that, like merge-preserving rebasing only the last one, then branch -f on each intermediate branches to the newly rebased commits.

@lloeki lloeki merged commit 7289395 into master Mar 21, 2023
@lloeki lloeki deleted the appsec-rule-merging branch March 21, 2023 09:50
@github-actions github-actions bot added this to the 1.11.0 milestone Mar 21, 2023
@lloeki lloeki modified the milestones: 1.11.0, 1.11.0.beta1 Apr 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
appsec Application Security monitoring product
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants