Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow optional rewrite rules before computing similarity scores during replay #499

Merged
merged 9 commits into from
Sep 26, 2023

Conversation

timdawborn
Copy link
Contributor

@timdawborn timdawborn commented Sep 25, 2023

In order to be able to replay the correct tape more accurately, we want to be able to apply some rewrite rules before the similarity score is computed between a request and a recorded tape. The intent of these rewrite rules is to remove random values in both the request and the recorded tapes, so that the tapes match exactly (once the random data has been removed).

This PR adds a new varadic CLI argument to allow users to provide zero to many regex-based rewrite rules using sed-style replacement syntax. These regex replacements are performed over all strings that go through similarity scoring.

@timdawborn timdawborn force-pushed the no-ticket-rewrite-rules-before-similarity branch from 2a9b45a to 4926e3b Compare September 25, 2023 11:08
Comment on lines +348 to +380
// The following payloads are identical after rewrite rules have been applied.
expect(
computeSimilarity(
"POST",
"/test",
{},
compactJsonBuffer({
name: "Jane Doe",
email:
"jane.doe-some-test-6f82fbbe-d36a-4c5c-b47b-84100122fbbc@example.com",
age: 42,
}),
{
request: {
method: "POST",
path: "/test",
headers: {},
body: wellFormattedJsonBuffer({
name: "Jane Doe",
email: "jane.doe-some-test@example.com",
age: 42,
}),
},
response: DUMMY_RESPONSE,
},
new RewriteRules().appendRule(
new RewriteRule(
/-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}(@example.com)$/gi,
"$1"
)
)
)
).toBe(0);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a new test that tests similarity with rewrite rules. The rest of the changes in this file are just formatting changes applied by the linter.

@timdawborn timdawborn marked this pull request as ready for review September 26, 2023 05:21
@timdawborn timdawborn changed the title Allow optional rewrite rules before computing similarity diff during replay Allow optional rewrite rules before computing similarity scores during replay Sep 26, 2023
Copy link

@njday njday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good 👍 few nits/suggestions primarily around documentation

src/cli.ts Outdated
import { RecordReplayServer } from "./server";

function commaSeparatedList(value: string) {
return value ? value.split(",") : [];
}

const RE_REWRITE_RULE = /s\/(.+(?<!\\))\/(.+(?<!\\))\/([gims]*)/;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: We could use some more descriptive names here e.g. RE_SED_INPUT_VALIDATION & RE_REPLACE_SED_CAPTURE_GROUPS

describe("RewriteRule", () => {
describe("without capture groups", () => {
it("applies the expected changes", () => {
const rule = new RewriteRule(/-[a-z0-9]+-?/gi, "");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Could we add a comment explaining the regex to the expressions in these tests? Will help identify the intention and to verify the output is expected

src/rewrite.spec.ts Show resolved Hide resolved
src/rewrite.ts Show resolved Hide resolved
src/rewrite.ts Show resolved Hide resolved
src/cli.ts Outdated
@@ -40,6 +78,12 @@ async function main(argv: string[]) {
"--https-ca <filename.pem>",
"Enable HTTPS server where the certificate was generated by this CA. Useful if you are using a self-signed certificate. Also requires --https-key and --https-cert."
)
.option<RewriteRules>(
"--rewrite-before-diff [s/find/replace/g...]",
"Provide regex-based rewrite rules over strings before passing them to the diffing algorithm. The regex rules use sed-style syntax. s/find/replace/ with an optional regex modifier suffixes. Capture groups can be used using sed-style \\N syntax. This only is only used during replaying existing tapes.",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Provide regex-based rewrite rules over strings before passing them to the diffing algorithm. The regex rules use sed-style syntax. s/find/replace/ with an optional regex modifier suffixes. Capture groups can be used using sed-style \\N syntax. This only is only used during replaying existing tapes.",
"Provide regex-based rewrite rules over strings before passing them to the diffing algorithm. The regex rules use sed-style syntax. s/find/replace/ with an optional regex modifier suffixes. Capture groups can be used using sed-style \\N syntax. This is only used during replaying existing tapes.",

src/cli.ts Outdated
import { RecordReplayServer } from "./server";

function commaSeparatedList(value: string) {
return value ? value.split(",") : [];
}

const RE_REWRITE_RULE = /s\/(.+(?<!\\))\/(.+(?<!\\))\/([gims]*)/;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question Should we have a comment here describing that this is sed-style regex? I know it's mentioned on line 17, but feels more relevant to have with the definition of the rule.

Do we want to link to a documentation of the style somewhere?

src/cli.ts Show resolved Hide resolved
src/cli.ts Outdated
}
const [rawFind, rawReplace, rawFlags] = match.slice(1, 4);

// Parse the find regex with the given regex flags.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Parse the find regex with the given regex flags.
// Parse the found regex with the given regex flags.

src/rewrite.spec.ts Show resolved Hide resolved
catgie: "wuff",
xyz: "I hate cats",
});
expect(o1).toEqual({ dog: "woof", doggie: "wuff", xyz: "I hate dogs" });

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"I hate dogs", blasphemy! 😂

for (const rule of this.rules) {
s = rule.apply(value);
}
return (s as any) as T;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question Why is this first converted to any to then immediately be casted to T?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type of s was resolved as a string with our manual check typeof value === "string" and subsequent cast let s = value as string. However the top level recursed function has a generic type T which it needs to return. string and T don't overlap and the cast to any is required to allow us to cast it to T afterwards

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a limitation of TypeScript. Line 54 casts the generic type T to string. We need to cast string back to T in order to return it. The compiler does not allow casting string directly to T as there's no overlap of the types. Going via any is the way you do a "cast to anything" cast in TypeScript.

Copy link

@HGyllensvard HGyllensvard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My Typescript is nothing to brag about, but it looks okay.

Copy link

@njday njday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remaining comments are non-blocking. LGTM 👍

@timdawborn
Copy link
Contributor Author

timdawborn commented Sep 26, 2023

Ah crap. I can't merge this due to what looks to be a broken license/cli branch protection rule. Time to go down that rabbit hole.

@timdawborn
Copy link
Contributor Author

Ah crap. I can't merge this due to what looks to be a broken license/cli branch protection rule. Time to go down that rabbit hole.

Right. I needed to sign the CLA. Makes sense, I guess.

@timdawborn timdawborn merged commit f273960 into master Sep 26, 2023
5 checks passed
@timdawborn timdawborn deleted the no-ticket-rewrite-rules-before-similarity branch September 26, 2023 09:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants