
feat: add bot traffic filtering and engaged sessions metric to static analytics #4837

@MillenniumFalconMechanic

Description

Summary

Add two improvements to the static analytics site generator to reduce bot noise in analytics reports:

  1. Suspicious page path filter: a regex-based filter in fetch.py that removes malformed and bot-generated page paths (broken markdown links, CMS probes, asset requests, etc.) from the pageviews detail table.

  2. Engaged sessions metric: queries GA4's engagedSessions metric alongside sessions and shows it in the stats card as "Engaged Sessions" in place of "User Sessions".

Details

Suspicious page path filter

Removes paths like:

  • /](https://...) — broken markdown links
  • //checkout/ — e-commerce probes
  • /help@lists... — email-as-path
  • /robots.txt, /favicon-32x32.png — asset requests
  • /docs/, /docs-EN/ — CMS probes
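A minimal sketch of such a filter, assuming the detail table arrives as a list of dicts keyed by a hypothetical `page_path` field. The pattern is illustrative and written only to cover the examples listed above. It is not the `SUSPICIOUS_PAGE_PATH_RE` that ships in fetch.py.

```python
import re

# Hypothetical pattern covering the examples above; the real
# SUSPICIOUS_PAGE_PATH_RE in fetch.py may be broader or stricter.
SUSPICIOUS_PAGE_PATH_RE = re.compile(
    r"""
    ^/\]                              # broken markdown link: /](https://...)
    | ^//                             # double-slash probe: //checkout/
    | @                               # email address used as a path
    | ^/robots\.txt$                  # crawler asset request
    | ^/favicon[^/]*\.(?:png|ico)$    # favicon variants: /favicon-32x32.png
    | ^/docs(?:-[A-Za-z]{2})?/$       # bare CMS probes: /docs/, /docs-EN/
    """,
    re.VERBOSE,
)


def filter_suspicious_paths(rows):
    """Return only the rows whose page_path does not look like bot noise."""
    return [
        row for row in rows
        if not SUSPICIOUS_PAGE_PATH_RE.search(row["page_path"])
    ]


rows = [
    {"page_path": "/](https://example.org)", "views": 3},
    {"page_path": "/datasets/", "views": 120},
]
print([r["page_path"] for r in filter_suspicious_paths(rows)])  # ['/datasets/']
```

Anchoring the asset and `/docs` alternatives with `^...$` keeps legitimate pages such as `/docs/getting-started` in the table; only the bare probe paths match.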

Engaged sessions

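GA4's engagedSessions counts only sessions that lasted longer than 10 seconds (the default, adjustable per property), included 2 or more page or screen views, or triggered a key event (formerly called a conversion). A typical bot drive-by loads a single page and leaves at once, failing all three conditions, so this count is a more honest measure of real visits than raw sessions.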
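GA4 computes this flag server-side, so fetch.py only needs to read the metric. The sketch below just restates the rule in Python to show why a one-hit bot visit drops out. The `Session` fields and the `is_engaged` and `count_engaged` helpers are hypothetical, and the threshold assumes the property keeps GA4's default of 10 seconds.

```python
from dataclasses import dataclass

# GA4's default engagement timer; a property admin can raise it.
ENGAGEMENT_SECONDS_THRESHOLD = 10


@dataclass
class Session:
    duration_seconds: float  # session length as GA4 measures it
    page_views: int          # page or screen views in the session
    key_events: int          # key events (formerly "conversions")


def is_engaged(session: Session, threshold: float = ENGAGEMENT_SECONDS_THRESHOLD) -> bool:
    """Restatement of GA4's engaged-session rule: any one condition suffices."""
    return (
        session.duration_seconds > threshold
        or session.page_views >= 2
        or session.key_events > 0
    )


def count_engaged(sessions) -> int:
    """Number of sessions that satisfy the engaged-session rule."""
    return sum(1 for s in sessions if is_engaged(s))


sessions = [
    Session(duration_seconds=0.4, page_views=1, key_events=0),   # bot drive-by
    Session(duration_seconds=45.0, page_views=1, key_events=0),  # long single-page read
    Session(duration_seconds=6.0, page_views=3, key_events=0),   # quick multi-page visit
]
print(count_engaged(sessions), "of", len(sessions))  # 2 of 3
```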

Files changed

  • analytics/static_site/fetch.py — add SUSPICIOUS_PAGE_PATH_RE, METRIC_ENGAGED_SESSIONS, filter logic
  • analytics/static_site/export.py — export engaged_sessions in meta.json
  • analytics/static_site/template/index.html — display engaged sessions in stats card
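For reference, a sketch of how the two metrics might flow from a GA4 Data API `runReport` call into meta.json. `sessions` and `engagedSessions` are the real Data API metric names. The constant values, the helper functions, and every meta.json key other than `engaged_sessions` are assumptions, and the HTTP call itself is left out so the sketch stays self-contained.

```python
import json

# Real GA4 Data API metric names; how fetch.py defines its constants is assumed.
METRIC_SESSIONS = "sessions"
METRIC_ENGAGED_SESSIONS = "engagedSessions"


def build_report_request(start_date: str, end_date: str) -> dict:
    """runReport request body asking for both metrics over one date range."""
    return {
        "dateRanges": [{"startDate": start_date, "endDate": end_date}],
        "metrics": [{"name": METRIC_SESSIONS}, {"name": METRIC_ENGAGED_SESSIONS}],
    }


def totals_from_response(response: dict) -> dict:
    """Map metric names to ints; with no dimensions the report has at most one row."""
    headers = [h["name"] for h in response["metricHeaders"]]
    rows = response.get("rows")
    if not rows:  # GA4 omits "rows" when there is no data for the range
        return {name: 0 for name in headers}
    values = [int(v["value"]) for v in rows[0]["metricValues"]]  # values are strings
    return dict(zip(headers, values))


def build_meta(totals: dict) -> str:
    """Serialize the totals the stats card needs (hypothetical meta.json shape)."""
    meta = {
        "sessions": totals[METRIC_SESSIONS],
        "engaged_sessions": totals[METRIC_ENGAGED_SESSIONS],
    }
    return json.dumps(meta, indent=2)


response = {
    "metricHeaders": [
        {"name": "sessions", "type": "TYPE_INTEGER"},
        {"name": "engagedSessions", "type": "TYPE_INTEGER"},
    ],
    "rows": [{"metricValues": [{"value": "1200"}, {"value": "450"}]}],
}
print(build_meta(totals_from_response(response)))
```

Requesting both metrics in a single report keeps the two totals on the same date range, so the stats card never mixes numbers from different queries.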

Note

These changes affect all sites using the shared analytics package (AnVIL Portal, LungMAP, HCA Explorer, etc.).
