Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix regex pattern that parses extractedValues #7

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

sseifert-akamai
Copy link
Contributor

This change fixes an issue with extractedValues when the value contains semicolons.

Copy link
Collaborator

@ynohat ynohat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @sseifert-akamai, thanks a lot again for this contribution. While reviewing the change I just realized it is also implementing some auto-formatting from your IDE probably. This is noisy and prevents accurate identification of the fix, could you please reduce the change to just what is relevant?

let pat = /^name=([^;]*); value=([^;]*).*$/;
return getResponseHeaderValues(response, "x-akamai-session-info")
.reduce(function (vars, value) {
let pat = /^name=(.*); value=([^\s]*)(;.*)?$/;
Copy link
Contributor

@bacalec bacalec Jun 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regex in this line is assuming that there is at most 1 ";" followed by something that needs to be ignored after the value but

  • there can be 2 when variable value is extracted from qsp or from cookie. In this case variable value is not only followed by full_location_id but also by separator. In that case, the part that includes full_location_id is included in the value against which assert function is matching
  • there can be 0 when value is not extracted. In this case, if variable value includes some ";", the part of this value after the last ";" matches unepectedly with the last group.

Looking closer at groups in this regex:

  1. The 1st one (.*) is capturing but it may not need to. More importantly, it can match with any caracter including those that don't belong to a variable name. It could be more specific.
  2. The 2nd one ([^\s]*) is capturing and this is probably needed. More importantly, it excludes spaces but it shouldn't.
  3. The 3rd one (;.*)? is capturing but it may not need to. More importantly, it could be more specific.

With this regex proposed instead ^name=([^\s]*); value=(.*)(; full_location_id=[^;]*(; separator=[^;]*)?)?$:

  • it is leaving asside capturing or not
  • name and value are captured in same group 1 and 2 as before
  • group 3 that captured one of full_location_id or separator before is now capturing both if present
  • capture group 4 is added and it captures separator only

However, it requires the match to be ungreedy.

Attached screenshots show test results with U flag with PCRE2 for ungridiness, with sample values below:
proposedRegex1_ungreedy_NotExtracted

  1. name=NOT_EXTRACTED; value= key1=value1; key2=value2
  2. name=NOT_EXTRACTED; value= key3=value3

proposedRegex1_ungreedy_ExtractedFromCookieOrQsp
3. name=EXTRACTED_FROM_COOKIE_OR_QSP; value=key1=value1; key2=value2; full_location_id=cookieName3; separator=%3d
4. name=EXTRACTED_FROM_COOKIE_OR_QSP; value=key3=value3; full_location_id=cookieName3; separator=%3d

proposedRegex1_ungreedy_ExtractedFromHeader
5. name=EXTRACTED_FROM_HEADER; value=key1=value1; key2=value2; full_location_id=cookieName3
6. name=EXTRACTED_FROM_HEADER; value=key3=value3; full_location_id=cookieName3

These sample values are meant to cover:

  • variable value contains ";" or not
  • variable is not extracted
  • variable is extracted from cookie (or qsp) or is extracted from header

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With a greedy match, the 1st new proposed regex capture full_location_id and separator in the group that captures value, which is not wanted.
proposedRegex1_greedy_ExtractedFromCookieOrQsp

Copy link
Contributor

@bacalec bacalec Jun 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If greediness can't be disabled, another regex similar to the one below may be used.

It it's built with an alternative where 1st option captures full_location (and eventually separator) when it's present. 2nd option should match only when full_location and separator are absent.
^name=([^\s]*); value=(.*)(; full_location_id=.*)$|^name=([^\s]*); value=(.*)$

A problem with this approach is that name and value are captured in group 1 and 2 when full_location_id is present
and in groups 4 and 5 otherwise. So this needs to be dealt with if captured groups are actually used.

proposedRegex2_greedy_NotExtracted

proposedRegex2_greedy_ExtractedFromHeader

proposedRegex2_greedy_ExtractedFromCookieOrQsp

@bacalec
Copy link
Contributor

bacalec commented Jun 10, 2024

It could be nice to add also tests that cover the different kind of values that may be found in the header:

  • variable value contains ";" or not
  • variable is not extracted
  • variable is extracted from cookie (or qsp) or is extracted from header

for example

name=NOT_EXTRACTED; value= key1=value1; key2=value2
name=NOT_EXTRACTED; value= key3=value3
name=EXTRACTED_FROM_COOKIE_OR_QSP; value=key1=value1; key2=value2; full_location_id=cookieName3; separator=%3d
name=EXTRACTED_FROM_COOKIE_OR_QSP; value=key3=value3; full_location_id=cookieName3; separator=%3d
name=EXTRACTED_FROM_HEADER; value=key1=value1; key2=value2; full_location_id=cookieName3
name=EXTRACTED_FROM_HEADER; value=key3=value3; full_location_id=cookieName3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants