Restcomm1147 rms resources leak on conf#2564
Conversation
This refer to RESTCOMM-1147
This refer to RESTCOMM-1155
…resources_leak_on_conf
This refer to RESTCOMM-1155
This refer to RESTCOMM-1155
This close RESTCOMM-1155
This refer to RESTCOMM-1158
…restcomm1147_rms_resources_leak_on_conf
This close RESTCOMM-1158 This refer to RESTCOMM-1155
…test This refer to RESTCOMM-1155
This close RESTCOMM-1163
…rt proper MGCP resources at the end of each test This refer to RESTCOMM-1155 This refer to RESTCOMM-1147
| @@ -0,0 +1,26 @@ | |||
| <?xml version="1.0" encoding="UTF-8" ?> | |||
There was a problem hiding this comment.
i guess we need to find a way to configure how this is used in JenkinsCI env...
There was a problem hiding this comment.
Agree @jaimecasero . I did this to quickly enable logs in the testsuite since they are essential when working locally with the testsuite
| public static Map<MediaSession, ActorRef> getLinks() { | ||
| return links; | ||
| connectionToEndpointMap = new ConcurrentHashMap<ActorRef, ActorRef>(); | ||
| connEndpointMap = new ConcurrentHashMap<String, String>(); |
There was a problem hiding this comment.
considering this is Actor state, and concurrency should be already solved by Actor model, is there any particular intention on using ConcurrentHashMap??
There was a problem hiding this comment.
@jaimecasero no particular reason, just a habit
|
|
||
| public static Map<MediaSession, ActorRef> getLinks() { | ||
| return links; | ||
| connectionToEndpointMap = new ConcurrentHashMap<ActorRef, ActorRef>(); |
There was a problem hiding this comment.
i dont seem to find where connectionToEndpointMap is used? could it be this slid into final code after different design/refactorings?
There was a problem hiding this comment.
@jaimecasero I forgot to clean up this one. I will clean this now
| final ActorRef self = self(); | ||
| final DeleteConnection dlcx = (DeleteConnection) message; | ||
| if (dlcx.getConnectionIdentifier() == null) { | ||
| connEndpointMap.values().removeAll(Collections.singleton(dlcx.getEndpointIdentifier().getLocalEndpointName())); |
There was a problem hiding this comment.
can we log the result of this operation, i think it could be worth to print it, whether we deleted something or not...
There was a problem hiding this comment.
@jaimecasero you mean the operation to remove from the Map? Yes I can add this at the log statement later
| logger.info(msg); | ||
| } | ||
| } else { | ||
| connEndpointMap.remove(dlcx.getConnectionIdentifier().toString()); |
There was a problem hiding this comment.
same here, can we save the result of this operation for logging later?
| } else { | ||
| connEndpointMap.remove(dlcx.getConnectionIdentifier().toString()); | ||
| monitoringService.tell(new MgcpConnectionDeleted(dlcx.getConnectionIdentifier().toString(), null), self()); | ||
| if (!connEndpointMap.values().contains(dlcx.getEndpointIdentifier().getLocalEndpointName())) { |
There was a problem hiding this comment.
could you add some comment to this logic? im not sure im following the intention. It seems we already removed the map entry using the connId, which is the key to the map. Then we check if the endpointId is still in the map as value, if not we tell the service that is was removed. Is there an scenario where connId was removed, but endpointId is still present, what would be the use case there? Are the endpointId shared among different connIds?
There was a problem hiding this comment.
@jaimecasero RC only deletes connections, on the RMS side when there is no connection for a given endpoint, the endpoint is released.
This is what the method does here.
First remove endpoint-connection, then check if there are more connections for the give endpoint, IF NOT, consider the endpoint released and tell the MonitoringService.
I added comments for that.
| mgcpEndpointsConference = new AtomicInteger(0); | ||
| mgcpEndpointsPacketRelay = new AtomicInteger(0); | ||
|
|
||
| mgcpEndpointMap = new ConcurrentHashMap<String, String>(); |
There was a problem hiding this comment.
again, do we need concurrent collections as Actor state?
| mgcpEndpointsConference = new AtomicInteger(0); | ||
| mgcpEndpointsPacketRelay = new AtomicInteger(0); | ||
|
|
||
| mgcpEndpointMap = new ConcurrentHashMap<String, String>(); |
There was a problem hiding this comment.
seems we only need the size of these maps to return as part of onStatistics logic... Do we really need to save all this info into maps? it seems a potential place for unnecesary leaks, when the size of these maps can be already calculated by using other counters already present... I would try to reduce state to simple counters, that maybe be wrong but do not risk to introduce more leaks...
There was a problem hiding this comment.
@jaimecasero I am thinking that in the future we can provide MGCP resource details by call, similar to the live call details, so in that case the Map is needed.
Agree with your point but I don't see a risk of leaks here.
| fsm.transition(message, completed); | ||
| } | ||
| //Don't wait for ever for the CallController response | ||
| context().setReceiveTimeout(Duration.create(2000, TimeUnit.SECONDS)); |
There was a problem hiding this comment.
this kind of timeouts should come from configuration, so we can tweak then (or customers) depending on envirnoment... where is 2 secs coming from (for me is regular call timeout, just wondering if this is iline with some other system in the cloud or something)=
There was a problem hiding this comment.
@jaimecasero this is a safety net for Conference in a cluster configuration where notification from media server might never arrive. It doesn't need to be on configuration since this is an internal, very specific timer to be able to cleanup call no matter what error condition might occur. No particular reason for for the 2 sec duration, I can reduce it if you think its too much
| assertTrue(maxConcurrentIncomingCalls==0); | ||
| assertTrue(maxConcurrentOutgoingCalls==0); | ||
|
|
||
| Map<String, Integer> mgcpResources = MonitoringServiceTool.getInstance().getMgcpResources(metrics); |
There was a problem hiding this comment.
this will be a fairly common construct, could we isolate this in a Assert util class, where access to monitoring services, and literal to specific counters are protected
There was a problem hiding this comment.
@jaimecasero I thought about that but let this task for later as it will take significant time. I did half of the work (MonitoringServiceTool.getInstance().getMgcpResources(metrics)) but the rest of the enhancement can be done in another sprint with proper estimation
This refer to RESTCOMM-1147
PR also includes MGCP resources metrics and updated tests to assert MGCP resources are properly cleaned up