"Could not acquire a lock from redis" error when creating an observe request #832

Closed
madhushreegc opened this issue Apr 17, 2020 · 10 comments
Labels: question (Any question about leshan)

madhushreegc commented Apr 17, 2020

Hi,

Leshan version: M10.

I am triggering more than 20 observe requests towards the same device from the Leshan server. Sometimes I see the error below. If I change the lock key currently used by the add, addObservation and removeObservation methods in RedisRegistrationStore to something like LOCK:EP:deviceName:lwm2mpath (example: LOCK:EP:urn:imei:deviceName:3:0:3), will I face any issues anywhere for any operations?

04:36:39,985 ERROR [StripedExchangeJob] Exception in striped thread: Could not acquire a lock from redis
java.lang.IllegalStateException: Could not acquire a lock from redis
        at org.eclipse.leshan.server.cluster.RedisLock.acquire(RedisLock.java:52) ~[leshan-server-cluster-1.0.0-M10.jar:?]
        at org.eclipse.leshan.server.cluster.RedisRegistrationStore.add(RedisRegistrationStore.java:551) ~[leshan-server-cluster-1.0.0-M10.jar:?]
        at org.eclipse.leshan.server.cluster.RedisRegistrationStore.putIfAbsent(RedisRegistrationStore.java:533) ~[leshan-server-cluster-1.0.0-M10.jar:?]
        at org.eclipse.californium.core.network.BaseMatcher.registerObserve(BaseMatcher.java:187) ~[californium-core-2.0.0-M12.jar:?]
        at org.eclipse.californium.core.network.UdpMatcher.sendRequest(UdpMatcher.java:115) ~[californium-core-2.0.0-M12.jar:?]
        at org.eclipse.californium.core.network.CoapEndpoint$OutboxImpl.sendRequest(CoapEndpoint.java:713) ~[californium-core-2.0.0-M12.jar:?]
        at org.eclipse.californium.core.network.stack.BaseCoapStack$StackBottomAdapter.sendRequest(BaseCoapStack.java:187) ~[californium-core-2.0.0-M12.jar:?]
        at org.eclipse.californium.core.network.stack.ReliabilityLayer.sendRequest(ReliabilityLayer.java:107) ~[californium-core-2.0.0-M12.jar:?]
        at org.eclipse.californium.core.network.stack.BlockwiseLayer.sendRequest(BlockwiseLayer.java:296) ~[californium-core-2.0.0-M12.jar:?]
        at org.eclipse.californium.core.network.stack.AbstractLayer.sendRequest(AbstractLayer.java:66) ~[californium-core-2.0.0-M12.jar:?]
        at org.eclipse.californium.core.network.stack.AbstractLayer.sendRequest(AbstractLayer.java:66) ~[californium-core-2.0.0-M12.jar:?]
        at org.eclipse.californium.core.network.stack.ExchangeCleanupLayer.sendRequest(ExchangeCleanupLayer.java:45) ~[californium-core-2.0.0-M12.jar:?]
        at org.eclipse.californium.core.network.stack.BaseCoapStack$StackTopAdapter.sendRequest(BaseCoapStack.java:142) ~[californium-core-2.0.0-M12.jar:?]
        at org.eclipse.californium.core.network.stack.BaseCoapStack.sendRequest(BaseCoapStack.java:80) ~[californium-core-2.0.0-M12.jar:?]
        at org.eclipse.californium.core.network.CoapEndpoint$4.runStriped(CoapEndpoint.java:600) ~[californium-core-2.0.0-M12.jar:?]
        at org.eclipse.californium.core.network.StripedExchangeJob.run(StripedExchangeJob.java:65) [californium-core-2.0.0-M12.jar:?]
        at eu.javaspecialists.tjsn.concurrency.stripedexecutor.StripedExecutorService$SerialJob.run(StripedExecutorService.java:548) [element-connector-2.0.0-M12.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_131]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_131]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_131]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_131]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
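
The error above comes from the per-endpoint lock that RedisRegistrationStore takes before touching a registration and its observations. Based only on what is described in this thread (a LOCK:EP:<endpointName> key, a roughly 5 s acquire budget, a 10 ms retry sleep), the general shape of such a lock is the classic Redis SET NX PX pattern; the sketch below is illustrative Jedis 3.x code, not the actual Leshan RedisLock implementation:

    import java.util.Arrays;
    import java.util.Random;

    import redis.clients.jedis.Jedis;
    import redis.clients.jedis.params.SetParams;

    // Illustrative sketch only, NOT the actual Leshan RedisLock code (Jedis 3.x API assumed).
    // Key format and timings follow what is described in this thread:
    //   key:     LOCK:EP:<endpointName>
    //   acquire: retried for up to ~5 s, sleeping 10 ms between attempts
    public class EndpointLockSketch {

        private static final Random RANDOM = new Random();

        /** Acquires the lock or throws, mirroring the IllegalStateException in the stack trace above. */
        public static byte[] acquire(Jedis jedis, byte[] lockKey) throws InterruptedException {
            byte[] lockValue = new byte[10];
            RANDOM.nextBytes(lockValue);
            long deadline = System.currentTimeMillis() + 5_000; // ~5 s acquire budget
            while (System.currentTimeMillis() < deadline) {
                // SET key value NX PX <ttl>: set only if absent, with an expiration so a
                // crashed node cannot hold the lock forever (the TTL value here is illustrative).
                if ("OK".equals(jedis.set(lockKey, lockValue, SetParams.setParams().nx().px(500)))) {
                    return lockValue;
                }
                Thread.sleep(10); // back off before retrying
            }
            throw new IllegalStateException("Could not acquire a lock from redis");
        }

        /** Releases the lock only if we still own it (a production lock would do this compare-and-delete atomically, e.g. with a Lua script). */
        public static void release(Jedis jedis, byte[] lockKey, byte[] lockValue) {
            if (Arrays.equals(jedis.get(lockKey), lockValue)) {
                jedis.del(lockKey);
            }
        }
    }

Because the key covers only the endpoint name, all concurrent observe requests to the same device serialize on this one lock, which is the per-endpoint atomicity the maintainer refers to below.
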
sbernard31 (Contributor)

Will I face any issues anywhere for any operations?

Yes. We lock on the endpoint name because we want atomic operations for a given endpoint name.

The question is why you get the "could not acquire" error... We try to acquire the lock for 5 s, and Redis access should be really fast...

sbernard31 added the question label Apr 17, 2020
madhushreegc (Author)

I am facing this issue very often when there are many concurrent observe requests towards the device. Redis has a huge number of keys.

So your advice is not to change the key used by the locking mechanism for observe requests, but to increase the time allowed to acquire the Redis lock, correct?

yemkay commented Apr 19, 2020

Under a race condition, RedisLock could block a Jedis resource for a long duration while it waits to acquire the lock. Just wondering if it makes sense to release the Jedis resource back to the pool when RedisLock.acquire goes to sleep for 10 ms.
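
A minimal sketch of the pattern being suggested, assuming a JedisPool-backed lock (hypothetical class and method names, not Leshan's API): each attempt borrows a connection and returns it to the pool before the 10 ms sleep, so a thread that is merely waiting does not pin a pooled connection.

    import redis.clients.jedis.Jedis;
    import redis.clients.jedis.JedisPool;
    import redis.clients.jedis.params.SetParams;

    // Illustrative only (Jedis 3.x API assumed): borrow a connection per attempt so the
    // pool is not exhausted by threads that are merely sleeping between lock attempts.
    public class PoolFriendlyLockSketch {

        public static byte[] acquire(JedisPool pool, byte[] lockKey, byte[] lockValue)
                throws InterruptedException {
            long deadline = System.currentTimeMillis() + 5_000; // ~5 s acquire budget
            while (System.currentTimeMillis() < deadline) {
                try (Jedis jedis = pool.getResource()) {
                    if ("OK".equals(jedis.set(lockKey, lockValue, SetParams.setParams().nx().px(500)))) {
                        return lockValue;
                    }
                } // the connection goes back to the pool here, before we sleep
                Thread.sleep(10);
            }
            throw new IllegalStateException("Could not acquire a lock from redis");
        }
    }

The trade-off is one pool checkout per attempt instead of one per acquire call.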

sbernard31 (Contributor)

So your advice is not to change the key used by the locking mechanism for observe requests, but to increase the time allowed to acquire the Redis lock, correct?

Not really.
You face this after only 20 observe requests on the same endpoint?
I'm still surprised that the lock could not be acquired after 5 s... My guess is that this should not happen, and you should rather try to understand the issue more deeply, maybe by monitoring your Redis server or instrumenting the Leshan code to find the time-consuming part.

If I find time, I will try to reproduce this on my side.

Under a race condition, RedisLock could block a Jedis resource for a long duration while it waits to acquire the lock. Just wondering if it makes sense to release the Jedis resource back to the pool when RedisLock.acquire goes to sleep for 10 ms.

Maybe it could 🤔
This would be useful if you face a JedisExhaustedPoolException("Could not get a resource since the pool is exhausted", ...) because of RedisLock.
But this means we would need to rewrite the code.

Not directly linked, but I think we should:

  • Log a WARN if the lock expires before we try to unlock.
  • Extract a RedisLock interface to allow several implementations (a rough sketch follows below).
  • Make the timing of the default implementation configurable.

I also wonder whether a log is enough when we fail to acquire the lock. 🤔
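
A rough sketch of what such an interface could look like (hypothetical names, written here only to illustrate the idea; the actual design is up to the project):

    import redis.clients.jedis.Jedis;

    // Hypothetical interface sketch, not Leshan's final API: it separates the locking
    // strategy from RedisRegistrationStore so timings (or the whole algorithm) can vary.
    public interface RedisLockSketch {

        /**
         * Acquires the lock identified by lockKey, blocking up to an
         * implementation-defined timeout.
         *
         * @return an opaque lock value that must be passed back to release()
         * @throws IllegalStateException if the lock could not be acquired in time
         */
        byte[] acquire(Jedis jedis, byte[] lockKey);

        /** Releases the lock, only if lockValue still matches the stored value. */
        void release(Jedis jedis, byte[] lockKey, byte[] lockValue);
    }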

yemkay commented Apr 22, 2020

Thanks Simon. We narrowed the issue down to another application that was running a "KEYS *" command on Redis, which eventually slowed down the other Redis clients. We don't see a problem in Leshan per se, but this extension would help someone facing a similar issue:

Extract a RedisLock interface to allow several implementations.
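
For anyone who hits the same symptom: KEYS * is O(N) over the whole keyspace and blocks the single-threaded Redis server while it runs, which stalls every other client, including the lock acquisition above. The incremental SCAN command is the usual replacement; a small Jedis 3.x example:

    import redis.clients.jedis.Jedis;
    import redis.clients.jedis.ScanParams;
    import redis.clients.jedis.ScanResult;

    // Iterate over keys with SCAN instead of the blocking "KEYS *".
    public class ScanInsteadOfKeys {
        public static void main(String[] args) {
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                ScanParams params = new ScanParams().match("*").count(100);
                String cursor = ScanParams.SCAN_POINTER_START; // "0"
                do {
                    ScanResult<String> page = jedis.scan(cursor, params);
                    for (String key : page.getResult()) {
                        System.out.println(key);
                    }
                    cursor = page.getCursor(); // "0" again means the iteration is complete
                } while (!ScanParams.SCAN_POINTER_START.equals(cursor));
            }
        }
    }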

sbernard31 (Contributor)

Glad to see you finally found the issue.
Using the KEYS * command in production is rarely a good idea 🙂

I will create an issue as a reminder to improve the RedisLock code.

Could we close this issue?

sbernard31 (Contributor)

By the way, as you are using Leshan, could you possibly take some time to give us some information about that in #830? 🙏

sbernard31 (Contributor) commented Apr 22, 2020

I created an issue about improving RedisLock flexibility: #836
And a fix for a possible race condition: #837

sbernard31 (Contributor)

I hope I didn't scare you with my "Leshan user research" stuff 😅

I'm closing the issue; feel free to reopen if needed.

yemkay commented Apr 30, 2020 via email
