New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parallel PATCH calls result in optimistic locking exception #107
Comments
the reason that the result can still be found in the database is a built in retry-strategy. Does this answer the problem? |
Thanks for the answer but how is entra handling this with a 500 response? And how does this effect the entra state (watermark)? |
Hmm... I am not able to reproduce this behaviour with a 500 response. Could you give me some more details or maybe provide your mock-implementation? This might help to pinpoint your problem. |
Sure! I have also tested it with KC18 and received a 500, but as you predicted, the user is not added to the group in KC18. Code:
|
I checked the first comment again and I am a little bit confused. How are you testing with kc-18? The plugin from https://scim-for-keycloak.de does not have support for Keycloak version before 20. The kc-18 plugin is still based on the OpenSource version from github that you can download here and this version does not have handling for such errors. But how does the kc-24.2.0.0 version into this? Do you have the exactly same problem with this version as with the OpenSource version? I cannot reproduce this with version kc-24-2.0.0 |
Sorry for the confusion. I have two separate KC instances running for testing, KC18 with the open source plugin and KC24 with the kc-24-2.0.0-enterprise plugin. I get the same 500 responses from both KC instances when testing the code above, but in KC18, the user is not added to the group. In KC24, on the other hand, the user is added to the group, but it still results in a 500 for me which might confuse leading IDPs.
I totally agree, but unfortunately, we might not be able to pressurize Microsoft into working with the SCIM standard. |
I see. |
I extended my integration-test but I am not able to reproduce it. This is my code: (This test has 1000 users created in the database that are all getting added to a single group in parallel patch-requests) version kc-23-2.0.0 dynamicNodes.add(DynamicTest.dynamicTest("Provoke OptimisticLockException on Groups-endpoint", () -> {
try
{
new ForkJoinPool(20).submit(() -> {
IntStream.range(0, users.size()).parallel().mapToObj(index -> {
final int userBulkId = 2 + index;
final User randomUser = users.get(String.valueOf(userBulkId));
Assertions.assertNotNull(randomUser, String.valueOf(userBulkId));
final String patchRequest;
patchRequest = """
{
"schemas": [
"urn:ietf:params:scim:api:messages:2.0:PatchOp"
],
"Operations": [
{
"op": "add",
"path": "members",
"value": {
"value": "%s",
"type": "%s"
}
}
]
}
""".formatted(randomUser.getId().get(), ResourceTypeNames.USER);
groupMembers.add(Member.builder()
.value(randomUser.getId().get())
.display(randomUser.getUserName().get())
.type(ResourceTypeNames.USER)
.build());
return Triple.of(index, patchRequest, randomUser.getId().get());
}).forEach(triple -> {
final String patchRequest = triple.getMiddle();
final String groupId = groups.get(String.valueOf(groupBulkId)).getId().get();
ServerResponse<Group> response = scimRequestBuilder.patch(Group.class, EndpointPaths.GROUPS, groupId)
.setPatchResource(patchRequest)
.sendRequest();
if (HttpStatus.NO_CONTENT == response.getHttpStatus() || HttpStatus.OK == response.getHttpStatus())
{
indexes.add(triple.getLeft());
}
else
{
ServerResponse<Group> groupResponse = scimRequestBuilder.get(Group.class, EndpointPaths.GROUPS, groupId)
.sendRequest();
Assertions.assertEquals(HttpStatus.OK, groupResponse.getHttpStatus());
Group group = groupResponse.getResource();
final String userMemberId = triple.getRight();
boolean wasNotAdded = group.getMembers()
.stream()
.noneMatch(member -> member.getValue().get().equals(userMemberId));
Assertions.assertTrue(wasNotAdded);
membersNotAddedDueToError.add(userMemberId);
}
});
}).get();
}
catch (Exception ex)
{
throw ex;
}
}));
/*
* this test will verify that the members that were added to the group in the previous were only added if no error was returned in the previous test
*/
dynamicNodes.add(DynamicTest.dynamicTest("Verify members successfully added to group", () -> {
{
ServerResponse<ListResponse<User>> response = scimRequestBuilder.list(User.class, EndpointPaths.USERS)
.filter("not(groups pr)")
.get()
.sendRequest();
Assertions.assertEquals(HttpStatus.OK, response.getHttpStatus());
log.warn("\n{}",
response.getResource()
.getListedResources()
.stream()
.map(user -> user.getUserName().get())
.collect(Collectors.joining("\n")));
}
{
Collections.sort(indexes);
ServerResponse<Group> response = scimRequestBuilder.get(Group.class,
EndpointPaths.GROUPS,
groups.get(String.valueOf(groupBulkId)).getId().get())
.sendRequest();
Assertions.assertEquals(HttpStatus.OK, response.getHttpStatus());
Group retrievedGroup = response.getResource();
Assertions.assertEquals(users.size() - membersNotAddedDueToError.size(), retrievedGroup.getMembers().size());
}
})); I think this code is a pretty good example and it provokes a lot of OptimisticLockExceptions in the process. Approximately 1% of the requests end up in status code 500 and the test does then verify that the member was not added to the group which is what we expect. |
Hey, I think I have found out the cause. I created 1000 users parallel in KC24 and added them to the same group. But I also checked the occurrences:
I received 131–500 responses on the PATCH call without any of the users being added to the group.
I received 177 - 500 responses to the PATCH call, with 169 users still added and 8 not added to the group. |
My tests actually never had ETags activated to begin with but this would make the whole thing even more weird. ETags are calculated on the JSON response-body just after the response-body was generated. The ETag is a SHA-1 digest on the minified JSON-string. I have no idea how this should affect the whole thing. But just to make sure, I tried both variations, with ETag and without and I get the same result in both cases... I still cannot reproduce this... Which database are you using? Maybe we can find the problem there. I am testing with postgres |
I totally get your concerns, and I am confused too. |
Hey, I have some interesting news about the behavior of Entra for you. We started two initial and one incremental cycle due to a misconfiguration of a custom attribute mapping. In the first initial cycle, we tracked a few 500 responses when PATCHING group memberships. In the second initial cycle, each PATCH call received a 200-word response. This is because no user or group has been deleted in Keycloak, but the state is still reset. Here you can find our access_log: censored_access.log |
Okay, I have a little bit of trouble following here, because I am not sure anymore if the error still seems to be on my side or if the error was found on Entra side:
|
Hey, as I have encountered the same problem as mentioned in #90, my company has acquired the SCIM for Keycloak enterprise edition to see if we could fix this issue with the use of KC24.
Therefore, I created a test setup with the kc-24-2.0.0-enterprise plugin installed on KC24 and a small Java application that mocks Entra.
The Java application tries to parallel create multiple users and add them to the same group.
The sequence looks like this:
The result is the following:
All threads die due to an optimistic locking exception in Keycloak except a single one.
The weird thing is that the thread which gets the 500 internal server error still patched the user into the group in the same request.
Internal server error:
2024-04-04 10:25:27,775 WARN [io.agroal.pool] (executor-thread-22) Datasource '': JDBC resources leaked: 2 ResultSet(s) and 2 Statement(s)
2024-04-04 10:25:27,776 ERROR [de.captaingoldfish.scim.sdk.keycloak.scim.AbstractEndpoint] (executor-thread-22) Failed to commit or rollback the transaction: org.hibernate.StaleObjectStateException: Row was updated or deleted by another transaction (or unsaved-value mapping was incorrect) : [org.keycloak.models.jpa.entities.GroupAttributeEntity#1a3a3e46-69a2-4b3a-b01a-a906484c8f0a]
2024-04-04 10:25:27,776 ERROR [de.captaingoldfish.scim.sdk.keycloak.scim.AbstractEndpoint] (executor-thread-22) org.hibernate.StaleObjectStateException: Row was updated or deleted by another transaction (or unsaved-value mapping was incorrect) : [org.keycloak.models.jpa.entities.GroupAttributeEntity#1a3a3e46-69a2-4b3a-b01a-a906484c8f0a]: org.keycloak.models.ModelException: org.hibernate.StaleObjectStateException: Row was updated or deleted by another transaction (or unsaved-value mapping was incorrect) : [org.keycloak.models.jpa.entities.GroupAttributeEntity#1a3a3e46-69a2-4b3a-b01a-a906484c8f0a]
at org.keycloak.connections.jpa.PersistenceExceptionConverter.convert(PersistenceExceptionConverter.java:99)
at org.keycloak.connections.jpa.JpaExceptionConverter.convert(JpaExceptionConverter.java:31)
at org.keycloak.transaction.JtaTransactionWrapper.lambda$handleException$0(JtaTransactionWrapper.java:65)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1779)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
at org.keycloak.transaction.JtaTransactionWrapper.handleException(JtaTransactionWrapper.java:67)
at org.keycloak.transaction.JtaTransactionWrapper.commit(JtaTransactionWrapper.java:92)
at org.keycloak.services.DefaultKeycloakTransactionManager.commit(DefaultKeycloakTransactionManager.java:136)
at de.captaingoldfish.scim.sdk.keycloak.scim.AbstractEndpoint.commitOrRollbackTransaction(AbstractEndpoint.java:159)
at de.captaingoldfish.scim.sdk.keycloak.scim.AbstractEndpoint.lambda$doAfterExecution$6(AbstractEndpoint.java:132)
at de.captaingoldfish.scim.sdk.server.endpoints.ResourceEndpoint.handleRequest(ResourceEndpoint.java:201)
at de.captaingoldfish.scim.sdk.keycloak.scim.AbstractEndpoint.lambda$handleScimRequest$2(AbstractEndpoint.java:75)
at de.captaingoldfish.scim.sdk.keycloak.scim.RetryStrategy.doWithRetries(RetryStrategy.java:71)
at de.captaingoldfish.scim.sdk.keycloak.scim.AbstractEndpoint.handleScimRequest(AbstractEndpoint.java:72)
at de.captaingoldfish.scim.sdk.keycloak.scim.ScimResourceServerEndpoint.handleScimRequest(ScimResourceServerEndpoint.java:124)
at de.captaingoldfish.scim.sdk.keycloak.scim.ScimResourceServerEndpoint.handleScimPatchRequest(ScimResourceServerEndpoint.java:89)
at de.captaingoldfish.scim.sdk.keycloak.scim.ScimResourceServerEndpoint$quarkusrestinvoker$handleScimPatchRequest_e58d4f8a5c56e81acada2ec0ef3ac16143e93153.invoke(Unknown Source)
at org.jboss.resteasy.reactive.server.handlers.InvocationHandler.handle(InvocationHandler.java:29)
at io.quarkus.resteasy.reactive.server.runtime.QuarkusResteasyReactiveRequestContext.invokeHandler(QuarkusResteasyReactiveRequestContext.java:141)
at org.jboss.resteasy.reactive.common.core.AbstractResteasyReactiveContext.run(AbstractResteasyReactiveContext.java:147)
at io.quarkus.vertx.core.runtime.VertxCoreRecorder$14.runWith(VertxCoreRecorder.java:582)
at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2513)
at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1538)
at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:29)
at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:29)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: org.hibernate.StaleObjectStateException: Row was updated or deleted by another transaction (or unsaved-value mapping was incorrect) : [org.keycloak.models.jpa.entities.GroupAttributeEntity#1a3a3e46-69a2-4b3a-b01a-a906484c8f0a]
at org.hibernate.engine.jdbc.mutation.internal.ModelMutationHelper.identifiedResultsCheck(ModelMutationHelper.java:75)
at org.hibernate.persister.entity.mutation.AbstractDeleteCoordinator.lambda$doStaticDelete$2(AbstractDeleteCoordinator.java:283)
at org.hibernate.engine.jdbc.mutation.internal.ModelMutationHelper.checkResults(ModelMutationHelper.java:50)
at org.hibernate.engine.jdbc.mutation.internal.AbstractMutationExecutor.performNonBatchedMutation(AbstractMutationExecutor.java:115)
at org.hibernate.engine.jdbc.mutation.internal.MutationExecutorSingleNonBatched.performNonBatchedOperations(MutationExecutorSingleNonBatched.java:40)
at org.hibernate.engine.jdbc.mutation.internal.AbstractMutationExecutor.execute(AbstractMutationExecutor.java:52)
at org.hibernate.persister.entity.mutation.AbstractDeleteCoordinator.doStaticDelete(AbstractDeleteCoordinator.java:279)
at org.hibernate.persister.entity.mutation.AbstractDeleteCoordinator.coordinateDelete(AbstractDeleteCoordinator.java:87)
at org.hibernate.persister.entity.AbstractEntityPersister.delete(AbstractEntityPersister.java:2979)
at org.hibernate.action.internal.EntityDeleteAction.execute(EntityDeleteAction.java:131)
at org.hibernate.engine.spi.ActionQueue.executeActions(ActionQueue.java:632)
at org.hibernate.engine.spi.ActionQueue.executeActions(ActionQueue.java:499)
at org.hibernate.event.internal.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:363)
at org.hibernate.event.internal.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:41)
at org.hibernate.event.service.internal.EventListenerGroupImpl.fireEventOnEachListener(EventListenerGroupImpl.java:127)
at org.hibernate.internal.SessionImpl.doFlush(SessionImpl.java:1403)
at org.hibernate.internal.SessionImpl.managedFlush(SessionImpl.java:484)
at org.hibernate.internal.SessionImpl.flushBeforeTransactionCompletion(SessionImpl.java:2319)
at org.hibernate.internal.SessionImpl.beforeTransactionCompletion(SessionImpl.java:1976)
at org.hibernate.engine.jdbc.internal.JdbcCoordinatorImpl.beforeTransactionCompletion(JdbcCoordinatorImpl.java:439)
at org.hibernate.resource.transaction.backend.jta.internal.JtaTransactionCoordinatorImpl.beforeCompletion(JtaTransactionCoordinatorImpl.java:336)
at org.hibernate.resource.transaction.backend.jta.internal.synchronization.SynchronizationCallbackCoordinatorNonTrackingImpl.beforeCompletion(SynchronizationCallbackCoordinatorNonTrackingImpl.java:47)
at org.hibernate.resource.transaction.backend.jta.internal.synchronization.RegisteredSynchronization.beforeCompletion(RegisteredSynchronization.java:37)
at com.arjuna.ats.internal.jta.resources.arjunacore.SynchronizationImple.beforeCompletion(SynchronizationImple.java:52)
at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.beforeCompletion(TwoPhaseCoordinator.java:351)
at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.end(TwoPhaseCoordinator.java:69)
at com.arjuna.ats.arjuna.AtomicAction.commit(AtomicAction.java:138)
at com.arjuna.ats.internal.jta.transaction.arjunacore.TransactionImple.commitAndDisassociate(TransactionImple.java:1271)
at com.arjuna.ats.internal.jta.transaction.arjunacore.BaseTransaction.commit(BaseTransaction.java:104)
at io.quarkus.narayana.jta.runtime.NotifyingTransactionManager.commit(NotifyingTransactionManager.java:70)
at org.keycloak.transaction.JtaTransactionWrapper.commit(JtaTransactionWrapper.java:90)
... 20 more
2024-04-04 10:25:27,880 ERROR [de.captaingoldfish.scim.sdk.common.response.ErrorResponse] (executor-thread-22) Cannot access delegate without a transaction: de.captaingoldfish.scim.sdk.common.exceptions.InternalServerException: Cannot access delegate without a transaction
at de.captaingoldfish.scim.sdk.server.endpoints.ResourceEndpointHandler.patchResource(ResourceEndpointHandler.java:1070)
at de.captaingoldfish.scim.sdk.server.endpoints.ResourceEndpoint.resolveRequest(ResourceEndpoint.java:290)
at de.captaingoldfish.scim.sdk.server.endpoints.ResourceEndpoint.handleRequest(ResourceEndpoint.java:189)
at de.captaingoldfish.scim.sdk.keycloak.scim.AbstractEndpoint.lambda$handleScimRequest$1(AbstractEndpoint.java:95)
at org.keycloak.models.utils.KeycloakModelUtils.lambda$runJobInTransaction$1(KeycloakModelUtils.java:257)
at org.keycloak.models.utils.KeycloakModelUtils.runJobInTransactionWithResult(KeycloakModelUtils.java:379)
at org.keycloak.models.utils.KeycloakModelUtils.runJobInTransaction(KeycloakModelUtils.java:256)
at de.captaingoldfish.scim.sdk.keycloak.scim.AbstractEndpoint.lambda$handleScimRequest$2(AbstractEndpoint.java:86)
at de.captaingoldfish.scim.sdk.keycloak.scim.RetryStrategy.doWithRetries(RetryStrategy.java:71)
at de.captaingoldfish.scim.sdk.keycloak.scim.AbstractEndpoint.handleScimRequest(AbstractEndpoint.java:72)
at de.captaingoldfish.scim.sdk.keycloak.scim.ScimResourceServerEndpoint.handleScimRequest(ScimResourceServerEndpoint.java:124)
at de.captaingoldfish.scim.sdk.keycloak.scim.ScimResourceServerEndpoint.handleScimPatchRequest(ScimResourceServerEndpoint.java:89)
at de.captaingoldfish.scim.sdk.keycloak.scim.ScimResourceServerEndpoint$quarkusrestinvoker$handleScimPatchRequest_e58d4f8a5c56e81acada2ec0ef3ac16143e93153.invoke(Unknown Source)
at org.jboss.resteasy.reactive.server.handlers.InvocationHandler.handle(InvocationHandler.java:29)
at io.quarkus.resteasy.reactive.server.runtime.QuarkusResteasyReactiveRequestContext.invokeHandler(QuarkusResteasyReactiveRequestContext.java:141)
at org.jboss.resteasy.reactive.common.core.AbstractResteasyReactiveContext.run(AbstractResteasyReactiveContext.java:147)
at io.quarkus.vertx.core.runtime.VertxCoreRecorder$14.runWith(VertxCoreRecorder.java:582)
at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2513)
at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1538)
at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:29)
at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:29)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.lang.IllegalStateException: Cannot access delegate without a transaction
at org.keycloak.models.cache.infinispan.RealmCacheSession.getGroupDelegate(RealmCacheSession.java:176)
at org.keycloak.models.cache.infinispan.RealmCacheSession.getGroupById(RealmCacheSession.java:907)
at org.keycloak.models.cache.infinispan.RealmAdapter.getGroupById(RealmAdapter.java:1450)
at de.captaingoldfish.scim.sdk.keycloak.scim.groups.GroupHandler.getResource(GroupHandler.java:216)
at de.captaingoldfish.scim.sdk.keycloak.scim.groups.GroupHandler.getResource(GroupHandler.java:61)
at de.captaingoldfish.scim.sdk.keycloak.scim.patch.AbstractPatchOperationHandler.getUpdatedResource(AbstractPatchOperationHandler.java:414)
at de.captaingoldfish.scim.sdk.server.patch.PatchRequestHandler.getUpdatedResource(PatchRequestHandler.java:197)
at de.captaingoldfish.scim.sdk.server.endpoints.ResourceEndpointHandler.lambda$patchResource$24(ResourceEndpointHandler.java:1032)
at de.captaingoldfish.scim.sdk.server.interceptor.NoopInterceptor.doAround(NoopInterceptor.java:21)
at de.captaingoldfish.scim.sdk.server.endpoints.ResourceEndpointHandler.patchResource(ResourceEndpointHandler.java:1011)
... 22 more
I have also testet this with KC18 and got the exact same result.
Should the fix already be in the kc-24-2.0.0-enterprise plugin?
Thanks in advance!
The text was updated successfully, but these errors were encountered: