
Commit

chg: Attempt to improve regex more
Rafiot committed May 16, 2024
1 parent cc9527c commit 5decaf3
Showing 1 changed file with 1 addition and 1 deletion.
har2tree/helper.py: 2 changes (1 addition, 1 deletion)
@@ -280,7 +280,7 @@ def find_identifiers(html_doc: bytes) -> dict[str, list[str]] | None:
 # This is beta and kinda fragile, but it's going to find (most) of the google tag IDs
 # https://support.google.com/google-ads/answer/12326985?hl=en_us_us
 # NOTE: the doc says 9 X, but all the examples I found have 10 X so we cannot trust it
-if google_tag_ids := set(re.findall(rb"(?:G-|AW-|GA-|UA-)\w{9}+", html_doc)):
+if google_tag_ids := set(re.findall(rb"(?:G-|AW-|GA-|UA-)\w{9,13}", html_doc)):
     blocklist = {b'UA-Compatible'}
     google_tag_ids -= blocklist
     to_return['google_tag_ids'] = list(google_tag_ids)
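For illustration, here is a minimal sketch comparing the two patterns on a made-up HTML snippet. The IDs and markup are hypothetical and do not come from har2tree. The old `\w{9}+` only compiles on Python 3.11+, where `{9}+` is a possessive quantifier that still matches exactly nine characters; older versions reject it with a "multiple repeat" error. The sketch therefore uses the equivalent `\w{9}` so it runs on any Python 3.9+. `extract` is a hypothetical helper that mirrors the blocklist step in the diff.

```python
import re

# OLD_EQUIV matches what the old \w{9}+ matched on Python 3.11+
# (a possessive "exactly nine"); NEW is the pattern from this commit.
OLD_EQUIV = rb"(?:G-|AW-|GA-|UA-)\w{9}"
NEW = rb"(?:G-|AW-|GA-|UA-)\w{9,13}"

# Hypothetical HTML snippet; every ID below is invented for illustration.
html_doc = (
    b'<meta http-equiv="X-UA-Compatible" content="IE=edge">'
    b"<script>gtag('config', 'G-ABCDE12345');</script>"
    b"<script>gtag('config', 'AW-123456789');</script>"
)


def extract(pattern: bytes, doc: bytes) -> set[bytes]:
    """Collect unique matches and drop the known false positive (hypothetical helper)."""
    ids = set(re.findall(pattern, doc))
    ids -= {b'UA-Compatible'}
    return ids


old_ids = extract(OLD_EQUIV, html_doc)
new_ids = extract(NEW, html_doc)
print(sorted(old_ids))
print(sorted(new_ids))
```

With these inputs, the nine-character pattern truncates `G-ABCDE12345` to `G-ABCDE1234` and cuts `X-UA-Compatible` down to `UA-Compatibl`, which no longer equals the `UA-Compatible` blocklist entry and so survives the filter. The `{9,13}` range keeps both tokens whole, so the 10-character tag ID is captured and the blocklist removes the meta-tag false positive as intended.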