Skip to content

Conversation

ttstarck
Copy link
Contributor

@ttstarck ttstarck commented Mar 4, 2022

Add Regexp matching to Targeting

The goal of these changes is to allow being able to dynamically target similar services that are only unique by a single context value. For example, if I had the following frontend services with names:

frontend-primary-1
frontend-primary-2
frontend-secondary-1
frontend-secondary-2

If I wanted to only target services with names that only match frontend-primary, that would not be possible with current targeting features. Of course, one might argue there should be some other context like region or deployGroup to designate these but there are cases where this might not be possible, extremely difficult to fully setup, or not worth the time for a setting that may only be temporary.

Summary of Changes

  • Target.target_key_matches? will now check if the target_value is a Regexp object and run .matches on the context hash.

Example targeting:

target:
  service_name: !ruby/regexp /frontend-primary/

This target should only match frontend-primary-1 and frontend-primary-2 and not match frontend-secondary-1 and frontend-secondary-2 from the above example.

@ttstarck ttstarck force-pushed the PASSION_add_regex_matching_for_targeting branch from b607ce0 to 402ccfc Compare March 4, 2022 19:56
Copy link
Collaborator

@ColinDKelley ColinDKelley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great passion project!

Check out the power of let in my test suggestions. It's the most amazing part of rspec. It's so frustrating to switch over to the web repo and not have access to let in minitest!

README.md Outdated
This will be applied in any process that has (`service_name == "frontend"` OR `service_name == "auth"`) AND `datacenter == "AWS-US-EAST-1"`.

### Matching Values
By adding a backslash `/` at the front and end of a value, context values will be applied if the string between the backslashes is a substring of the value.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⛏️ That's actually a "slash". \ is a backslash.

Copy link
Collaborator

@ColinDKelley ColinDKelley Mar 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I was glad to see in the tests and implementation that slashes are required at both the front and back of the string, so a path prefix like /tmp won't trigger it. At first I missed it in your documentation here, because it says "a backslash" (singular). How about:

By adding `/` at the front and end of a value, the pattern given between the slashes will be treated as a regular expression to match with the context.

^ Note it's more than just a substring--it's a regexp! Can you change the example to use some regexp special values, like maybe /frontend-\d{3}/ for 3 digits after the hyphen?

README.md Outdated
target:
service_name: /frontend/
```
This will be applied in any process that has `service_name =~ "frontend"`. As an example this will match `"frontend-1"`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use slashes here in the example since that's how Ruby would do it after =~:

that has `service_name =~ /frontend/

when true, false
target_value
when String
if target_value =~ /\/.+\// # Any string that starts and ends with backslashes.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, you need to anchor that regexp! As you have it written, a string like "network/advertiser/affiliate" would match! Can you add a test that fails on that?

Also, sadly, =~ is slightly non-preferred because it's slow. .match? is preferred when you don't need the MatchData back.

if target_value.match?(/\A\/.+\/\z/)

An alternative would be:

if target_value.start_with?('/') && target_value.end_with?('/') && target_value.size > 2

But...there's also this cool Regexp extraction notation (parens go around the group(s) you want to extract and then , 1 means "get me the first match group"). Note that I used /x in the Regexp so that whitespace can be added for readability.

if (pattern = target_value[/\A \/ (.+) \/ \z/x, 1])
  context_hash.match?(pattern)

process_target = described_class.new(target_hash)
expect(process_target.target_key_matches?(context_hash)).to be_falsey
expect(process_target.target_key_matches?({})).to be_falsey
expect(process_target.target_key_matches?({ 'service' => '/telecom' })).to be_truthy
Copy link
Collaborator

@ColinDKelley ColinDKelley Mar 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

subject and is_expected.to can make this sort of test read great. It's a bit more verbose, so feel free to skip it. But I really like how it reads. (I learned this from @jebentier BTW. It's now pretty common to see in Hydra tests.)

context 'when target value has a leading slash only' do
  let(:target_hash) { { 'service' => '/telecom' } }
  let(:process_target) { described_class.new(target_hash) }
  subject { process_target.target_key_matches?(context_hash) }

  describe 'when context_hash has the string but not the slash' do
    let(:context_hash) {
      {
          'service' => 'telecom',
          'region'  => 'east',
          'cdr'     => { 'caller' => '+18056807000' }
      }
    }
    it { is_expected.to be_falsey }
  end

  describe 'when context_hash is empty' do
    let(:context_hash) { {} }
    it { is_expected.to be_falsey }
  end

  describe 'when context_hash has an exact string match' do
    let(:context_hash) { { 'service' => '/telecom' } }
    it { is_expected.to be_truthy }
  end
end

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the failing test I mentioned above:

context 'when target value has embedded slashes (not at the front and back)' do
  let(:service) { 'network/advertiser|affiliate/path' } # | won't match if this is treated as a Regexp
  let(:target_hash) { { 'service' => service } 
  it { is_expected.to be_truthy }
end

end

context "for substring matching" do
it "should match on values when using backslash delimiters in target hash" do
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can DRY up the tests below by putting some common lets up here:

let(:context_hash) do
  {
    'service' => service,
    'region' => 'east',
    'cdr' => { 'caller' => '+18056807000' }
  }
end

The individual contexts just need to do let(:service) { ... }.

README.md Outdated

### Matching Values
By adding a backslash `/` at the front and end of a value, context values will be applied if the string between the backslashes is a substring of the value.
By adding a slash `/` at the front and end of a value, context values will be applied if the string between the slashes is a substring of the value.
Copy link
Collaborator

@ColinDKelley ColinDKelley Mar 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The singular is still a bit confusing. And the substring part isn't right--it'll be a regexp match!
How about this wording:

To provide a regular expression for matching, use a leading and trailing slash `/` around your expression. For example:

And then I'd make sure that the example is more than just a substring. That's why I suggested

  service_name: /frontend-\d/

and then maybe you can give an example that matches because of a digit and also one that fails because it's a non-digit there.

@ttstarck ttstarck requested a review from ColinDKelley March 7, 2022 19:21
@ColinDKelley ColinDKelley changed the title Add Substring Targeting Add Regexp Targeting Mar 7, 2022
CHANGELOG.md Outdated

## [0.20.0] - 2022-03-04
### Added
- Added substring matching for targeting. See [README.md](README.md#Matching-Values) for usage examples.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the PR title to "Add Regexp Targeting". Can you change it here in the CHANGELOG and anywhere else in the PR?

Copy link
Contributor Author

@ttstarck ttstarck Mar 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed c2e5de2

Copy link
Contributor

@jebentier jebentier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a general question about the implementation. I found out today through a little research that Ruby's YAML parser has built in support for declaring a Regexp. How would we feel about allowing the use of this built in so that we appropraitely load a Regex into the target_value and don't need to do these string comparisons and extractions? The YAML syntax isn't the prettiest, but it's declarative

key: !ruby/regexp /frontend-\d/

This would reduce the necessary changes in the gem as well as speed up the comparison by not having to check every string value of the target every time.

CHANGELOG.md Outdated

Note: this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.20.0] - 2022-03-04
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we put Unreleased for the release date until it's released?

Copy link
Collaborator

@ColinDKelley ColinDKelley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice twist there to use actual regexps.

```
target:
service_name: /frontend-\d/
service_name: !ruby/regexp /frontend-\d/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoa! This is really cool to not preclude slashes around strings. In the past we've been nervous about using this YAML ruby class support because it has a much wider surface area for security vulnerabilities (since this Ruby class will be invoked). That doesn't seem like a big deal here, but we might want to get an opinion from someone with a security background.

Copy link
Contributor Author

@ttstarck ttstarck Mar 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the source for how Pysch parses it this value:
https://github.com/ruby/psych/blob/master/lib/psych/visitors/to_ruby.rb#L96-L110

Copy link
Contributor Author

@ttstarck ttstarck Mar 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ColinDKelley @jebentier It looks like we technically are already open to this potential vulnerability! I can switch how we are parsing yaml to use Pysch.safe_load_file and specify allowing only Regexp as a permitted class besides the default permitted classes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See

json_doc = Psych.load_file(file_path)

Copy link
Collaborator

@ColinDKelley ColinDKelley Mar 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point--we are already vulnerable! And because the process_settings are set by developers, I think we'll be fine? Worth mentioning that to in the request for a security sign-off. (It's the cases where raw YAML is parsed from the outside world that have caused serious security problems. And that's why safe_load_file was invented.)

And if it doesn't add much scope, I really like your suggestion of switching over to Pysch.safe_load_file here with only Regexp permitted. But if that takes more than a couple minutes, feel free to create a ticket for Octothorpe instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added: c6d10fe

Copy link
Contributor

@jebentier jebentier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes in general look good to me, but Colin brings up a good point about pulling in someone from Security to make sure we're not exposing ourselves to a regex vulnerability.

@ttstarck ttstarck force-pushed the PASSION_add_regex_matching_for_targeting branch from 4e56522 to c6d10fe Compare March 25, 2022 20:44
Copy link
Collaborator

@ColinDKelley ColinDKelley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work!

@ColinDKelley
Copy link
Collaborator

@ttstarck @jbuehring was hoping to use this feature. Anything blocking it from getting merged?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants