-
Notifications
You must be signed in to change notification settings - Fork 1.5k
[PM-28555] Add DefaultCollectionSemaphore table #6791
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ed by bulk insert
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #6791 +/- ##
==========================================
+ Coverage 54.93% 58.87% +3.93%
==========================================
Files 1927 1928 +1
Lines 85457 85454 -3
Branches 7648 7649 +1
==========================================
+ Hits 46949 50312 +3363
+ Misses 36723 33274 -3449
- Partials 1785 1868 +83 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
New Issues (3)Checkmarx found the following issues in this Pull Request
|
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
r-tome
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work!
|
I reached out to @bitwarden/dept-dbops this morning and haven't heard back yet but I am not initially understanding this pattern and usage of the database. Logistically-speaking I think it's better to name such a thing regarding its association and to drop the "default" language (we always lead with domain too) but fundamentally why is the database used for this vs. a persistent cache? I'd like to hear about rationale and what was assessed to address this problem; we don't do this anywhere else that I know of and personally I have never used a relational database table for this kind of lock. We should strive to keep logic out of the data tier. |
|
@withinfocus The naming can be adjusted no problem. In terms of the fundamental approach, the thinking was that a database can enforce uniqueness constraints as a matter of data integrity. Same as enforcing a uniqueness constraint on the Let me know how you go with dbops - if you want to revisit this approach we can, but probably best that we have a meeting about it to get alignment before spending more time on it. |
| public async Task CreateDefaultCollectionsAsync(Guid organizationId, IEnumerable<Guid> organizationUserIds, string defaultCollectionName) | ||
| { | ||
| organizationUserIds = organizationUserIds.ToList(); | ||
| if (!organizationUserIds.Any()) | ||
| { | ||
| return; | ||
| } | ||
|
|
||
| using var scope = ServiceScopeFactory.CreateScope(); | ||
| var dbContext = GetDatabaseContext(scope); | ||
|
|
||
| var orgUserIdWithDefaultCollection = await GetOrgUserIdsWithDefaultCollectionAsync(dbContext, organizationId); | ||
| var missingDefaultCollectionUserIds = organizationUserIds.Except(orgUserIdWithDefaultCollection); | ||
|
|
||
| var (collectionUsers, collections) = BuildDefaultCollectionForUsers(organizationId, missingDefaultCollectionUserIds, defaultCollectionName); | ||
| var (collections, collectionUsers) = CollectionUtils.BuildDefaultUserCollections(organizationId, missingDefaultCollectionUserIds, defaultCollectionName); | ||
|
|
||
| if (!collectionUsers.Any() || !collections.Any()) | ||
| if (!collections.Any() || !collectionUsers.Any()) | ||
| { | ||
| return; | ||
| } | ||
|
|
||
| await dbContext.BulkCopyAsync(collections); | ||
| await dbContext.BulkCopyAsync(collectionUsers); | ||
| await dbContext.BulkCopyAsync(Mapper.Map<IEnumerable<Collection>>(collections)); | ||
| await dbContext.BulkCopyAsync(Mapper.Map<IEnumerable<CollectionUser>>(collectionUsers)); | ||
|
|
||
| await dbContext.SaveChangesAsync(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌ CRITICAL: Missing transaction wrapper for atomic operations
This method performs multiple database operations (BulkCopyAsync for Collections, BulkCopyAsync for CollectionUsers) without wrapping them in an explicit transaction. This violates the atomicity requirement stated in the stored procedure comment: "this MUST be executed in a single transaction to ensure consistency."
Risk: Partial failures could leave the database in an inconsistent state where collections are created without their corresponding collection users, or vice versa.
Evidence: The Dapper implementation correctly wraps these operations in an explicit transaction (lines 410-436 in Dapper CollectionRepository.cs):
await using var transaction = connection.BeginTransaction();
try {
// operations
await transaction.CommitAsync();
} catch {
await transaction.RollbackAsync();
throw;
}Fix: Wrap the entire method body in a transaction similar to the Dapper implementation, or document why EF's implicit transaction handling is sufficient here (though BulkCopyAsync typically bypasses EF's change tracker).
| public async Task CreateDefaultCollectionsBulkAsync(Guid organizationId, IEnumerable<Guid> organizationUserIds, string defaultCollectionName) | ||
| { | ||
| organizationUserIds = organizationUserIds.ToList(); | ||
| if (!organizationUserIds.Any()) | ||
| { | ||
| return; | ||
| } | ||
|
|
||
| await using var connection = new SqlConnection(ConnectionString); | ||
| connection.Open(); | ||
| await using var transaction = connection.BeginTransaction(); | ||
| try | ||
| { | ||
| var orgUserIdWithDefaultCollection = await GetOrgUserIdsWithDefaultCollectionAsync(connection, transaction, organizationId); | ||
|
|
||
| var missingDefaultCollectionUserIds = organizationUserIds.Except(orgUserIdWithDefaultCollection); | ||
|
|
||
| var (collectionUsers, collections) = BuildDefaultCollectionForUsers(organizationId, missingDefaultCollectionUserIds, defaultCollectionName); | ||
| var (collections, collectionUsers) = | ||
| CollectionUtils.BuildDefaultUserCollections(organizationId, missingDefaultCollectionUserIds, defaultCollectionName); | ||
|
|
||
| if (!collectionUsers.Any() || !collections.Any()) | ||
| { | ||
| return; | ||
| } | ||
|
|
||
| await BulkResourceCreationService.CreateCollectionsAsync(connection, transaction, collections); | ||
| await BulkResourceCreationService.CreateCollectionsUsersAsync(connection, transaction, collectionUsers); | ||
|
|
||
| transaction.Commit(); | ||
| await transaction.CommitAsync(); | ||
| } | ||
| catch | ||
| { | ||
| transaction.Rollback(); | ||
| await transaction.RollbackAsync(); | ||
| throw; | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method still has a TOCTOU race condition:
- Line 415: Read existing collections via
GetOrgUserIdsWithDefaultCollectionAsync(inside transaction) - Line 417: Filter in C# to find missing collections
- Lines 427-428: Insert collections and collection users (same transaction)
Race condition scenario:
If two concurrent calls both execute step 1 at approximately the same time for overlapping user sets, both could determine the same user is "missing" a collection, then both would attempt to insert, causing duplicate key violations or duplicate collections.
Why the transaction doesn't prevent this: While the transaction ensures atomicity of the inserts within a single call, it doesn't provide isolation between concurrent calls at the SERIALIZABLE level needed to prevent this race.
Comparison with non-bulk method: The CreateDefaultCollectionsAsync method (lines 364-400) delegates filtering to the stored procedure, which performs the NOT EXISTS check and INSERT in one operation. While this is better, it still lacks a unique constraint to fully prevent the race at the database level.
Recommendation:
- Add a unique constraint on
CollectionUser(OrganizationUserId)whereCollection.Type = DefaultUserCollection - Or use SERIALIZABLE isolation level
- Or accept potential duplicate key errors and handle them gracefully
| -- Filter to only users who don't have default collections | ||
| SELECT ids.Id1, ids.Id2 | ||
| INTO #FilteredIds | ||
| FROM @OrganizationUserCollectionIds ids | ||
| WHERE NOT EXISTS ( | ||
| SELECT 1 | ||
| FROM [dbo].[CollectionUser] cu | ||
| INNER JOIN [dbo].[Collection] c ON c.Id = cu.CollectionId | ||
| WHERE c.OrganizationId = @OrganizationId | ||
| AND c.[Type] = 1 -- CollectionType.DefaultUserCollection | ||
| AND cu.OrganizationUserId = ids.Id1 | ||
| ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The NOT EXISTS check followed by INSERT operations creates a potential race condition that transactions alone cannot prevent under default READ COMMITTED isolation level.
Race scenario:
- Call A executes NOT EXISTS for User1 → returns true (no collection exists)
- Call B executes NOT EXISTS for User1 → returns true (no collection exists)
- Call A inserts Collection and CollectionUser for User1
- Call B inserts Collection and CollectionUser for User1 → Duplicate or error
Why transaction wrapper isn't sufficient: The caller wraps this in a transaction (line 378 in Dapper CollectionRepository.cs), but SQL Server's default READ COMMITTED isolation doesn't prevent phantom reads between the NOT EXISTS check and the INSERT.
Current safeguards:
- Collection table has PK on Id (prevents duplicate collections since IDs are pre-generated)
- But CollectionUser table may allow duplicates if no unique constraint exists on (CollectionId, OrganizationUserId)
Recommendations (choose one):
- Preferred: Add unique constraint
(OrganizationUserId)filtered whereCollection.Type = 1- this makes the database enforce the business rule - Use SERIALIZABLE isolation level in the transaction wrapper
- Handle unique constraint violations gracefully and return success (idempotent behavior)
Question for maintainers: Does the CollectionUser table have a unique constraint on (CollectionId, OrganizationUserId)? If yes, this would cause violations under concurrent load rather than silent duplicates.
|
After some discussions, we are going to simplify this approach and not seek a database-level guarantee to avoid duplicates. We are going to rely on querying for existing default collections first. If we experience TOCTOU issues in practice then we can look at increasing the isolation level of the transaction, but in any case, we are not going with the semaphore table. I will open a new PR so there is a tidier commit history and better context for everyone. |


🎟️ Tracking
https://bitwarden.atlassian.net/browse/PM-28555
📔 Objective
Users are only permitted 1 default user collection ("My Items") for an organization. Today, this is enforced by querying the database and skipping users who already have such a collection, however this is susceptible to time-of-check time-of-use bugs. There is no atomic check or constraint that prevents duplicate collections from being created. This PR adds this guarantee.
A significant challenge is that the unique columns are spread across 2 tables:
Collection.Type(to identify default collections only) andCollectionUser.OrganizationUserId. Additionally, we have many complex sprocs for theCollectiontable, which makes it hard to modify without making a large impact throughout the codebase.I went with a solution suggested by @mkincaid-bw, which was a semaphore table. Advantages:
Changes in this PR:
ICollectionRepository.CreateAsyncare migrated to thisSqlBulkCopymethod to use semaphoresDesign decisions worth pointing out:
OrganizationUserId, and a creation date for debugging purposes. We could have a FK to the Collection as well, but it's not strictly required. I haven't included it for now.📸 Screenshots
⏰ Reminders before review
🦮 Reviewer guidelines
:+1:) or similar for great changes:memo:) or ℹ️ (:information_source:) for notes or general info:question:) for questions:thinking:) or 💭 (:thought_balloon:) for more open inquiry that's not quite a confirmed issue and could potentially benefit from discussion:art:) for suggestions / improvements:x:) or:warning:) for more significant problems or concerns needing attention:seedling:) or ♻️ (:recycle:) for future improvements or indications of technical debt:pick:) for minor or nitpick changes