Correct mis-assigned Segment userIds #10

@morrismatts

Description

Problem

Starting in January 2024, the www site has been incorrectly setting the Segment userId to the Segment anonymousId by calling analytics.identify(analytics.user.anonymousId()). Per the Segment documentation, the identify method is not meant to be used this way, and as a result our analytics engineers can't properly track the acquisition funnel for these users.

Background

We started working to fix this issue in February 2025 by removing the problematic code from the www site. A few stray bits of in-page JS that aren't tracked in this repo were cleaned up in early March 2025.

Solution

Now that we believe we've removed all of the places where we were creating bad data, we need to write a script that removes the bad userIds from returning visitors who have already had their userId mis-assigned. Unfortunately, Segment doesn't officially support removing the userId from a user, so we'll have to take a somewhat hacky approach.

Prerequisite: The main site-wide header and footer scripts for the www site are not currently tracked in this repo, so we should correct that first. The script outlined below is not large, but it could have a substantial negative impact (i.e., erasing correct userIds) if written incorrectly, so it's particularly important that it gets a proper code review.

Write a script that:

  1. Checks whether analytics.user().id() is non-null and doesn’t start with user_. Bail out if the userId is blank or looks okay.
  2. For the remaining "bad" userIds, capture analytics.user().anonymousId().
  3. Call analytics.reset() to reset both the userId and the anonymousId.
  4. Call analytics.setAnonymousId(oldAnonymousId) to set their anonymous ID back to what it was.

This should result in a Segment user with no userId and the correct anonymousId.
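The steps above could be sketched roughly as follows. This is a starting point for review, not a finished implementation. The function names and the GOOD_USER_ID_PREFIX constant are hypothetical, and the sketch assumes the Analytics.js accessors analytics.user().id() and analytics.user().anonymousId(), plus analytics.reset() and analytics.setAnonymousId(). The analytics object is passed in as a parameter so the logic can be exercised against a stub before it touches real visitors.

```javascript
// Assumption: correctly assigned userIds always start with this prefix.
const GOOD_USER_ID_PREFIX = "user_";

// Returns true only for a non-empty userId that lacks the expected prefix.
function isMisassignedUserId(userId) {
  return (
    typeof userId === "string" &&
    userId.length > 0 &&
    !userId.startsWith(GOOD_USER_ID_PREFIX)
  );
}

// Clears a mis-assigned userId while keeping the visitor's anonymousId.
// Returns true if a reset was performed, false if it bailed out.
function clearMisassignedUserId(analytics) {
  const user = analytics.user();
  const userId = user.id();

  // Step 1: bail out if the userId is blank or looks okay.
  if (!isMisassignedUserId(userId)) {
    return false;
  }

  // Step 2: capture the anonymousId before it gets wiped.
  const oldAnonymousId = user.anonymousId();

  // Step 3: reset clears both the userId and the anonymousId.
  analytics.reset();

  // Step 4: put the original anonymousId back.
  analytics.setAnonymousId(oldAnonymousId);
  return true;
}
```

On the site this would presumably run once Segment has loaded, e.g. analytics.ready(function () { clearMisassignedUserId(window.analytics); }). Keeping the prefix check in its own function makes the "don't erase correct userIds" condition easy to review and unit test on its own.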

We should confirm that Segment doesn't cache the userId server side and restore it when it sees the anonymousId again. If it does, we could instead reset both IDs by calling analytics.reset() whenever the userId is non-null but doesn’t start with user_, without attempting to restore the anonymousId. Unfortunately, this would mean we lose those users' previous analytics data, so hopefully it's not necessary.
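If we do end up needing that fallback, a minimal sketch might look like the following. The function name is hypothetical, and it assumes the same Analytics.js calls (analytics.user().id() and analytics.reset()). The only difference from the main approach is that the old anonymousId is deliberately not restored, so the visitor starts over with fresh IDs.

```javascript
// Fallback: wipe both IDs for visitors with a mis-assigned userId.
// Returns true if a reset was performed, false otherwise.
function resetMisassignedUser(analytics) {
  const userId = analytics.user().id();

  // Same rule as before: non-empty and missing the "user_" prefix.
  const looksBad =
    typeof userId === "string" &&
    userId !== "" &&
    !userId.startsWith("user_");

  if (looksBad) {
    // Clears the userId and the anonymousId; a new anonymousId is
    // generated by the library and prior history is no longer linked.
    analytics.reset();
  }
  return looksBad;
}
```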

Metadata

Labels

bug: Something isn't working