-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
✨ [SDS-194] Add path_regex to ProximityKeywordsRegex #52
Conversation
01b42f1
to
f8d5036
Compare
f8d5036
to
29ac559
Compare
sds/src/proximity_keywords.rs
Outdated
@@ -226,11 +241,33 @@ impl Default for Metrics { | |||
} | |||
} | |||
|
|||
fn compile_keywords_pattern(keyword_patterns: Vec<(Ast, Ast)>) -> (String, String) { | |||
let content_pattern = Ast::Alternation(Alternation { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect the Ast::Alternation
to be produced by the compile_keywords_to_ast
function, since that's part of the Ast, and it doesn't look like that function is used without it anywhere else?
This function would just be calling .to_string()
, and could probably just be inlined where it's used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
de96e32
to
94bdea5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a few general suggestions in the PR. I think it can be simplified a bit
sds/src/proximity_keywords.rs
Outdated
let content_pattern = Ast::Alternation(Alternation { | ||
span: span(), | ||
asts: keyword_patterns | ||
.iter() | ||
.map(|(content_ast, _)| content_ast.clone()) | ||
.collect(), | ||
}); | ||
|
||
fn compile_keywords_pattern(keyword_patterns: Vec<Ast>) -> String { | ||
Ast::Alternation(Alternation { | ||
let path_pattern = Ast::Alternation(Alternation { | ||
span: span(), | ||
asts: keyword_patterns, | ||
}) | ||
.to_string() | ||
asts: keyword_patterns | ||
.iter() | ||
.map(|(_, path_ast)| path_ast.clone()) | ||
.collect(), | ||
}); | ||
|
||
Ok(Some((content_pattern, path_pattern))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be simplified with unzip
:
let (content_patterns, path_patterns) = keywords...
[...]
...
.collect::<Result<Vec<_>, _>>()?
.iter()
.cloned()
.unzip();
Ok(Some((
compile_keywords_pattern(content_patterns),
compile_keywords_pattern(path_patterns),
)))
and where compile_keywords_pattern
is (this method can still be inlined if you prefer ):
fn compile_keywords_pattern(keyword_patterns: Vec<Ast>) -> Ast {
Ast::Alternation(Alternation {
span: span(),
asts: keyword_patterns,
})
}
Here there is still a cloned, I'm not sure if it's possible to remove it but I think there is less chance to do a mistake by inverting content_patterns and path_pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm getting a compilation error saying: "the trait Default
is not implemented for regex_syntax::ast::Ast
"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say that I will come back to this later if that does not bother you (I think it can be left as a todo)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added an example in this branch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weird 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I checked and it works :), thanks for that 👍
sds/src/proximity_keywords.rs
Outdated
for chars in chars.windows(3) { | ||
let current = &chars[1]; | ||
let (prev_char, current_char, next_char) = get_chars_type(&chars[0], &chars[1], &chars[2]); | ||
match (prev_char, current_char, next_char) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me that next_char
is not used, it could probably be dropped.
I did not try myself, but maybe you can use a simple for loop on keyword.chars()
and keep track on the previous character type (if it exist). This way, the char_list
won't need to be created in calculate_keyword_path_pattern, and possibly kw_length >= 2
at the end can be dropped
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@artslidd artslidd deleted a comment from vinckama 17 minutes ago Sorry for this, I did something wrong in Intellij Your comment was related to this one, and I fixed them two in this commit: Improve algorithm and reduce window to 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few more comments
|
||
push_character(&chars[0]); | ||
|
||
for (i, chars) in chars.windows(2).enumerate() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you can skip the last one and add it ad the end
for chars in chars[..chars.len() - 1].windows(2) {
and then add it later at the end..
The code look like
if chars.is_empty() {
return;
}
push_character(&chars[0]);
if chars.len() == 1 {
return;
}
for chars in chars[..chars.len() - 1].windows(2) {
.... some logic
}
push_character(chars.last().unwrap());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure it really simplifies the current code, I'm quite happy with the pattern matching that is currently here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was mostly a suggestion of style to avoid having 3 variables in the match. Both works, I don't have a strong opinion here 👍
sds/src/proximity_keywords.rs
Outdated
let content_pattern = Ast::Alternation(Alternation { | ||
span: span(), | ||
asts: keyword_patterns | ||
.iter() | ||
.map(|(content_ast, _)| content_ast.clone()) | ||
.collect(), | ||
}); | ||
|
||
fn compile_keywords_pattern(keyword_patterns: Vec<Ast>) -> String { | ||
Ast::Alternation(Alternation { | ||
let path_pattern = Ast::Alternation(Alternation { | ||
span: span(), | ||
asts: keyword_patterns, | ||
}) | ||
.to_string() | ||
asts: keyword_patterns | ||
.iter() | ||
.map(|(_, path_ast)| path_ast.clone()) | ||
.collect(), | ||
}); | ||
|
||
Ok(Some((content_pattern, path_pattern))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added an example in this branch
Some(regex_automata::Match::must(0, 14..19)) | ||
); | ||
|
||
assert_eq!(path_regex.is_match("awsAccess"), false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should it be a match as aws access in pascalCase is awsAccess
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually no because I'm expecting to have the path being transformed in aws.access
so it should not match the camel case: awsAccess
Description
First PR of three that aim to match included keywords against the path of the value that matched.
This PR adds a field called
path_regex
to theProximityKeywordsRegex
and a function calledcalculate_keyword_path_pattern
that computes the path_pattern using pattern matching.Some tests verify that the regex generated is correct