Skip to content
This repository has been archived by the owner on May 16, 2024. It is now read-only.

Sync: Fix page by page query in sync, leading to duplicated entries #667

Open
wants to merge 11 commits into
base: release/v0.15
Choose a base branch
from

Conversation

keymon
Copy link
Contributor

@keymon keymon commented Jan 31, 2024

Comes from #665

In this PR we:

  • Modify the DynamodDB queries in the sync lambda to iterate
    the query page by page.

  • Provide commands in dev-cli tool to allow get users and groups

  • Provide a command line tool to delete duplicated users in dynamodb

Why?

Currently sync process does not iterate the paginated responses from
dynamodb, reading around 2700 users maximum, Active and Archived.

This can lead to a bug in the sync proces, where it will create
duplicated entries in DB in each sync, with new IDs, and updating
the groups to point to these users. In the meantime, the users
might still have access to Glide, but not be members of the groups.

In detail:

  1. Sync reads the first 2700 user entries in dynamodb, missing any other
  2. Sync reads all IDP users
  3. Sync tries to find if the user is already in DB. As the query
    was partial, it will be considered a new user.
  4. A new user ID is created, with a new PK/SK
  5. Groups are updated to point to the new ID

Observed behaviour:

  • A random set of users would not have access to any rule based
    on groups. But can be added individually.
  • DynamoDB table grows infinitely in each sync

How to trigger the issue:

  • Try to get more than >2700 users in the dynamodb, for instance
    by syncing >2700 users. Users can be later deactivated.

Workaround and remediation

In our case our DB grew to >650k users. We were forced to create
a cli tool to cleanup dynamodb that is included.

This tool can run workers in parallel, but does not handle well failure/retries. It is safe to rerun and might require multiple reruns.

How did you test it?

Tested manually. Pending implement some unit testing mocking
the ddb library.

Cli tool tested manually.

Potential risks

If somebody was experiencing this bug, they might have 100s of 1000s of users. that means the page by page query will take long to run and it might cause sync to excess the time to run.

One workaround is just run the tool to delete duplicates, but you might want to extend the lambda timeout.

Is patch release candidate?

A simple helper to automatically run paginated queries with ddb
This is useful to troubleshoot the internal DB state.

Uses paginated queries
Due a bug in the sync process, that does not consume paginated
results from dynamodb, we are getting duplicated users
in dynamodb.

This command searches all duplicated users based on email, keeping
only the first one (most recent CreatedAt). It reports
back the duplicated users and allow to delete them in dynamodb.

Supports options to limit the users to list, dryrun mode,
maximum duplicates to delete, etc.
We must only pass the unique keys to the deletion task, or
dynamodb complains by duplicates (even on a delete!)
To speed up the deletion of >600k entries
We might have 1000s of users, and the ddb.Query() will return
a paginated response.

We must read all of the pages in order to work. Otherwise, the
sync process will consider that the users retuned by the IDP
do not exist, and create new entries in each iteration.
detect and report duplicated users when syncing, returning
the oldest.

This is important if there are duplicates in DB to be sure
we use the oldest and provide a consistent state.

Logging is Good to find out about the existence of the bug
Also do not log all duplicated emails, can be many
@keymon
Copy link
Contributor Author

keymon commented Feb 1, 2024

We had to delete 100s of 1000s using the cli. But this in one example with one:

image

Fixed with:

./bin/commonfate ddb delete-duplicated-users --workers 20  --name Granted    | tee ~/tmp/delete-duplicated-users.report.$(date +%Y%m%d-%H%M%S).json > /dev/null 
Got 3041 users
Calculating duplicates...
Found 1 duplicated users
- user@company.com: 1
WARNING: Dry-run mode, skipping deletion...
./bin/commonfate ddb delete-duplicated-users --workers 20  --name Granted  --dry-run=false  | tee ~/tmp/delete-duplicated-users.report.$(date +%Y%m%d-%H%M%S).json > /dev/null 
Got 3041 users
Calculating duplicates...
Found 1 duplicated users
 - user@company.com: 1
Deleting duplicates at 2024-02-01T10:50:34Z using 20 workers...
Deleting 1 dups for email=shkumar@confluent.io at 2024-02-01T10:50:34Z...
Done deleting dups for email=shkumar@confluent.io Duration=173.408125ms...
Deletion complete at 2024-02-01T10:50:35Z. Duration 173.922ms...

@keymon keymon changed the title Sync: Fix page by page query in sync, leading to duplicated entries ### What changed? Sync: Fix page by page query in sync, leading to duplicated entries Feb 29, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant