Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ [SDS-194] Add path_regex to ProximityKeywordsRegex #52

Merged
merged 13 commits into from
May 28, 2024

Conversation

artslidd
Copy link
Contributor

@artslidd artslidd commented May 23, 2024

Description

First PR of three that aim to match included keywords against the path of the value that matched.

This PR adds a field called path_regex to the ProximityKeywordsRegex and a function called calculate_keyword_path_pattern that computes the path_pattern using pattern matching.

Some tests verify that the regex generated is correct

@artslidd artslidd force-pushed the artslidd/sds-194-generate-path-regex branch 2 times, most recently from 01b42f1 to f8d5036 Compare May 23, 2024 09:39
@artslidd artslidd changed the title Artslidd/sds 194 generate path regex ✨ [SDS-194] Add path_regex to ProximityKeywordsRegex May 23, 2024
@artslidd artslidd force-pushed the artslidd/sds-194-generate-path-regex branch from f8d5036 to 29ac559 Compare May 23, 2024 09:42
@artslidd artslidd marked this pull request as ready for review May 23, 2024 13:57
@@ -226,11 +241,33 @@ impl Default for Metrics {
}
}

fn compile_keywords_pattern(keyword_patterns: Vec<(Ast, Ast)>) -> (String, String) {
let content_pattern = Ast::Alternation(Alternation {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect the Ast::Alternation to be produced by the compile_keywords_to_ast function, since that's part of the Ast, and it doesn't look like that function is used without it anywhere else?

This function would just be calling .to_string(), and could probably just be inlined where it's used

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@artslidd artslidd force-pushed the artslidd/sds-194-generate-path-regex branch from de96e32 to 94bdea5 Compare May 24, 2024 08:08
@artslidd artslidd requested a review from fuchsnj May 24, 2024 11:16
Copy link
Contributor

@vinckama vinckama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a few general suggestions in the PR. I think it can be simplified a bit

Comment on lines 271 to 287
let content_pattern = Ast::Alternation(Alternation {
span: span(),
asts: keyword_patterns
.iter()
.map(|(content_ast, _)| content_ast.clone())
.collect(),
});

fn compile_keywords_pattern(keyword_patterns: Vec<Ast>) -> String {
Ast::Alternation(Alternation {
let path_pattern = Ast::Alternation(Alternation {
span: span(),
asts: keyword_patterns,
})
.to_string()
asts: keyword_patterns
.iter()
.map(|(_, path_ast)| path_ast.clone())
.collect(),
});

Ok(Some((content_pattern, path_pattern)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be simplified with unzip:

let (content_patterns, path_patterns) = keywords...

[...]

...
        .collect::<Result<Vec<_>, _>>()?
        .iter()
        .cloned()
        .unzip();

    Ok(Some((
        compile_keywords_pattern(content_patterns),
        compile_keywords_pattern(path_patterns),
    )))

and where compile_keywords_pattern is (this method can still be inlined if you prefer ):

fn compile_keywords_pattern(keyword_patterns: Vec<Ast>) -> Ast {
    Ast::Alternation(Alternation {
        span: span(),
        asts: keyword_patterns,
    })
}

Here there is still a cloned, I'm not sure if it's possible to remove it but I think there is less chance to do a mistake by inverting content_patterns and path_pattern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm getting a compilation error saying: "the trait Default is not implemented for regex_syntax::ast::Ast"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say that I will come back to this later if that does not bother you (I think it can be left as a todo)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I checked and it works :), thanks for that 👍

Add unzip to simplify the code

sds/src/proximity_keywords.rs Outdated Show resolved Hide resolved
sds/src/proximity_keywords.rs Outdated Show resolved Hide resolved
for chars in chars.windows(3) {
let current = &chars[1];
let (prev_char, current_char, next_char) = get_chars_type(&chars[0], &chars[1], &chars[2]);
match (prev_char, current_char, next_char) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that next_char is not used, it could probably be dropped.

I did not try myself, but maybe you can use a simple for loop on keyword.chars() and keep track on the previous character type (if it exist). This way, the char_list won't need to be created in calculate_keyword_path_pattern, and possibly kw_length >= 2 at the end can be dropped

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DataDog DataDog deleted a comment from vinckama May 27, 2024
@artslidd
Copy link
Contributor Author

@artslidd artslidd deleted a comment from vinckama 17 minutes ago

Sorry for this, I did something wrong in Intellij

Your comment was related to this one, and I fixed them two in this commit: Improve algorithm and reduce window to 2

@artslidd artslidd requested a review from vinckama May 28, 2024 06:27
Copy link
Contributor

@vinckama vinckama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few more comments


push_character(&chars[0]);

for (i, chars) in chars.windows(2).enumerate() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you can skip the last one and add it ad the end
for chars in chars[..chars.len() - 1].windows(2) {

and then add it later at the end..

The code look like

if chars.is_empty() {
return;
}
push_character(&chars[0]);
if  chars.len() == 1 {
return;
}
for chars in chars[..chars.len() - 1].windows(2) {
   .... some logic
}
push_character(chars.last().unwrap());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it really simplifies the current code, I'm quite happy with the pattern matching that is currently here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was mostly a suggestion of style to avoid having 3 variables in the match. Both works, I don't have a strong opinion here 👍

Comment on lines 271 to 287
let content_pattern = Ast::Alternation(Alternation {
span: span(),
asts: keyword_patterns
.iter()
.map(|(content_ast, _)| content_ast.clone())
.collect(),
});

fn compile_keywords_pattern(keyword_patterns: Vec<Ast>) -> String {
Ast::Alternation(Alternation {
let path_pattern = Ast::Alternation(Alternation {
span: span(),
asts: keyword_patterns,
})
.to_string()
asts: keyword_patterns
.iter()
.map(|(_, path_ast)| path_ast.clone())
.collect(),
});

Ok(Some((content_pattern, path_pattern)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vinckama

This comment was marked as duplicate.

Some(regex_automata::Match::must(0, 14..19))
);

assert_eq!(path_regex.is_match("awsAccess"), false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should it be a match as aws access in pascalCase is awsAccess?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually no because I'm expecting to have the path being transformed in aws.access so it should not match the camel case: awsAccess

@artslidd artslidd merged commit c425920 into main May 28, 2024
4 checks passed
@artslidd artslidd deleted the artslidd/sds-194-generate-path-regex branch May 28, 2024 16:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants