New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixed race condition in recording, Recording file is now moved, inste… #200

Merged
merged 8 commits into from Jul 12, 2016

Conversation

Projects
None yet
3 participants
@pchlupacek
Contributor

pchlupacek commented Jun 25, 2016

as discussed in #128

@hrosa hrosa self-assigned this Jun 27, 2016

@hrosa hrosa added this to the 4.2.0 milestone Jun 27, 2016

@hrosa

This comment has been minimized.

Collaborator

hrosa commented Jun 27, 2016

During my tests I got the following exception:

15:13:01,132 ERROR [AudioRecorderImpl] Error writing to file
java.nio.file.NoSuchFileException: /Users/hrosa/Telestax/mobicents/restcomm/binaries/Restcomm-JBoss-AS7-7.6.0.873/standalone/deployments/restcomm.war/recordings/RE8abfb69ded5a4000b4550ae557a784c6.wav~ -> /Users/hrosa/Telestax/mobicents/restcomm/binaries/Restcomm-JBoss-AS7-7.6.0.873/standalone/deployments/restcomm.war/recordings/RE8abfb69ded5a4000b4550ae557a784c6.wav
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:451)
at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
at java.nio.file.Files.move(Files.java:1347)
at org.mobicents.media.server.impl.resource.audio.AudioRecorderImpl.commitRecording(AudioRecorderImpl.java:413)
at org.mobicents.media.server.impl.resource.audio.AudioRecorderImpl.deactivate(AudioRecorderImpl.java:181)
at org.mobicents.media.server.mgcp.pkg.au.PlayRecord.terminateRecordPhase(PlayRecord.java:308)
at org.mobicents.media.server.mgcp.pkg.au.PlayRecord.terminate(PlayRecord.java:380)
at org.mobicents.media.server.mgcp.pkg.au.PlayRecord.cancel(PlayRecord.java:217)
at org.mobicents.media.server.mgcp.controller.Request.cancel(Request.java:132)
at org.mobicents.media.server.mgcp.tx.cmd.NotificationRequestCmd$Request.perform(NotificationRequestCmd.java:121)
at org.mobicents.media.server.scheduler.Task.run(Task.java:122)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

@hrosa hrosa modified the milestones: 4.3.0, 4.2.0 Jun 27, 2016

@hrosa

This comment has been minimized.

Collaborator

hrosa commented Jun 30, 2016

Another issue appeared during load testing:

19:39:54,280 ERROR [AudioRecorderImpl] Error writing to file
java.nio.channels.ClosedChannelException
at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:99)
at sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:273)
at org.mobicents.media.server.impl.resource.audio.AudioRecorderImpl.commitRecording(AudioRecorderImpl.java:405)
at org.mobicents.media.server.impl.resource.audio.AudioRecorderImpl.deactivate(AudioRecorderImpl.java:181)
at org.mobicents.media.server.mgcp.pkg.au.PlayRecord.terminateRecordPhase(PlayRecord.java:308)
at org.mobicents.media.server.mgcp.pkg.au.PlayRecord.terminate(PlayRecord.java:380)
at org.mobicents.media.server.mgcp.pkg.au.PlayRecord.cancel(PlayRecord.java:217)
at org.mobicents.media.server.mgcp.controller.Request.cancel(Request.java:132)
at org.mobicents.media.server.mgcp.tx.cmd.NotificationRequestCmd$Request.perform(NotificationRequestCmd.java:121)
at org.mobicents.media.server.scheduler.Task.run(Task.java:122)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Both issues seem to be extremely rare, still they point to concurrency problems in the recorder.

@pchlupacek

This comment has been minimized.

Contributor

pchlupacek commented Jun 30, 2016

@hrosa right. do you have any ideas when the different methods on recorder are called? I mean would be really good to know if the fix shall be proper.

@hrosa

This comment has been minimized.

Collaborator

hrosa commented Jun 30, 2016

I think this is because you still have cleanup logic in the checkIn() method.
I will advise again to try commenting that method until we get the lifecycle of these components done right.

without much analysis, I can tell you

  • recorder is taken from pool (checkout())
  • recorder is activated (activate())
  • then multiple invocations of onMediaTransfer() during recording operation
  • recording terminates and deactivate() is invoked
  • recorder is put back into the pool (checkIn)

I was naive enough to believe it was safe to implement checkIn method and do cleanup there, believing that since object is being put back into the pool then it was not being used anymore.
It seems that is not the case: maybe the last few bytes are still being written in a different thread or some other concurrent operation is still ongoing.

Another problem i noticed during my load tests (wrt recording use case) is that the resulting files tend to decrease in size sometimes. Recording quality is good though.
Maybe its a symptom of the problem I described: a new call picks up a pooled recorder that was still finishing writing operation and stops.

screen shot 2016-06-30 at 11 23 53

@hrosa

This comment has been minimized.

Collaborator

hrosa commented Jun 30, 2016

Couple of notes:

  • we must somehow guarantee that recorder is not placed back into the pool until the recording operation is absolutely completed.
  • Notice that a recording operation may terminate prematurely because of many different factors (DTMF key pressed, no voice detected in x seconds, user hangs up, etc).

We need to solidify concurrent operations in recorder taking into account these factors.

@hrosa

This comment has been minimized.

Collaborator

hrosa commented Jun 30, 2016

@pchlupacek I think we need to consider this behaviour:

Heartbeat task, which runs in different thread, is responsible for the timeouts I mentioned before (no voice, max recording duration, etc) and submits a KillRecording task to the scheduler, which will be executed in yet another thread:

https://github.com/RestComm/mediaserver/blob/master/resources/recorder/src/main/java/org/mobicents/media/server/impl/resource/audio/AudioRecorderImpl.java#L567
https://github.com/RestComm/mediaserver/blob/master/resources/recorder/src/main/java/org/mobicents/media/server/impl/resource/audio/AudioRecorderImpl.java#L573
https://github.com/RestComm/mediaserver/blob/master/resources/recorder/src/main/java/org/mobicents/media/server/impl/resource/audio/AudioRecorderImpl.java#L581

Then the kill recording executes deactivate() and completes the File writing operation:
https://github.com/RestComm/mediaserver/blob/master/resources/recorder/src/main/java/org/mobicents/media/server/impl/resource/audio/AudioRecorderImpl.java#L521

Keep also in mind that the Media Group may also deactivate() the recorder upon release:
https://github.com/RestComm/mediaserver/blob/master/controls/mgcp/src/main/java/org/mobicents/media/server/mgcp/endpoint/MediaGroup.java#L210
The media group will then release it back to the pool, where checkIn() is invoked.

I think this is where concurrency issue between deactivate() and checkIn() happens.

Any thoughts on how to improve this?

@pchlupacek

This comment has been minimized.

Contributor

pchlupacek commented Jun 30, 2016

@hrosa will review today, and see how this can be done.

@pchlupacek

This comment has been minimized.

Contributor

pchlupacek commented Jul 1, 2016

@hrosa My first thought would be to just toss away any Recorder once used. That way we can prevent anything to happen after Recorder is deactivated. If that isn't the solution, then I think only issue we have there is that we have to guarantee to null any references to file before the file move operation actually starts.

As you suggest, we will have the cleanup code only on one place (I remove that checkin check). Furthermore I am thinking to introduce an AtomicRef to guard about perhaps multiple concurrent invocations of the deactivate code.

Any thoughts? Suggestions?

@pchlupacek

This comment has been minimized.

Contributor

pchlupacek commented Jul 6, 2016

@hrosa just pushed the update. This is sort of re-work, and I believe shall be correct and safe for now. (At least cannot think of scenarios where concurrent invocations of deactivate or checkin will mess.

I have introduced the Sink, that is tossed away with each deactivate call and created when the file is set. Also this guarantees, that there cannot be write operations while the sink is making its commit, in case there were some leftover references writing to the recorder.

I run 10k call through it and no issue.

lmk what do you think

pchlupacek added some commits Jul 6, 2016

Merge branch 'master' of github.com:RestComm/mediaserver into recordi…
…ng-fix-2

# Conflicts:
#	resources/recorder/src/main/java/org/mobicents/media/server/impl/resource/audio/AudioRecorderImpl.java

@hrosa hrosa merged commit 6cb0a49 into RestComm:master Jul 12, 2016

@hrosa hrosa removed the Peer Review label Jul 12, 2016

@hrosa

This comment has been minimized.

Collaborator

hrosa commented Jul 12, 2016

nicely done @pchlupacek thank you!

@hrosa hrosa modified the milestones: 5.1.0, 6.0.0 Feb 25, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment