
Race between closing the file and copying it #128

Closed
wants to merge 1 commit into base: master

Conversation

@pchlupacek
Contributor

pchlupacek commented Apr 21, 2016

The output stream closes on checkIn(), so the file copy operation may be interrupted in the middle.

@deruelle deruelle added this to the 4.2.0 milestone Apr 21, 2016

@hrosa


Collaborator

hrosa commented Apr 28, 2016

Thank you for reporting this issue @pchlupacek

Unfortunately, the proper fix for this problem won't be easy. Could you please revert the changes made to deactivate(), empty the checkIn() method instead, and add a TODO?

Then open an issue to review the lifecycle of this component. This will have to be carefully analysed.

@pchlupacek


Contributor

pchlupacek commented Apr 28, 2016

@hrosa, sure. One idea I had in mind for a proper fix is the following:

  • instead of copying the file, pre-allocate the header before writing to it;
  • then, once the recording finishes, populate the header in the file;
  • then rename the file.

This can also be forked to another thread (or a task with the scheduler), as the file names are, or shall be, unique anyway.
Any ideas on this?
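
For illustration, here is a rough, hypothetical sketch of that flow. Class, field and method names are made up (this is not the actual recorder code), and it assumes a plain 44-byte PCM WAV header:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class PreallocatedWavRecorder {

    private static final int HEADER_SIZE = 44; // canonical PCM WAV header

    private final Path finalFile;
    private final Path tmpFile;
    private final RandomAccessFile out;
    private long dataBytes;

    public PreallocatedWavRecorder(Path finalFile) throws IOException {
        this.finalFile = finalFile;
        this.tmpFile = finalFile.resolveSibling(finalFile.getFileName() + ".part");
        this.out = new RandomAccessFile(tmpFile.toFile(), "rw");
        out.write(new byte[HEADER_SIZE]); // reserve header space up front instead of copying later
    }

    public void write(byte[] samples, int off, int len) throws IOException {
        out.write(samples, off, len);
        dataBytes += len;
    }

    /** Patch the header in place, close the stream, then atomically rename to the final name. */
    public void finish(int sampleRate, short channels, short bitsPerSample) throws IOException {
        out.seek(0);
        writeHeader(sampleRate, channels, bitsPerSample);
        out.close();
        Files.move(tmpFile, finalFile, StandardCopyOption.ATOMIC_MOVE);
    }

    private void writeHeader(int sampleRate, short channels, short bits) throws IOException {
        int byteRate = sampleRate * channels * bits / 8;
        out.writeBytes("RIFF");
        out.writeInt(Integer.reverseBytes((int) (36 + dataBytes))); // RIFF chunk size, little-endian
        out.writeBytes("WAVE");
        out.writeBytes("fmt ");
        out.writeInt(Integer.reverseBytes(16));                     // PCM fmt chunk size
        out.writeShort(Short.reverseBytes((short) 1));              // audio format = PCM
        out.writeShort(Short.reverseBytes(channels));
        out.writeInt(Integer.reverseBytes(sampleRate));
        out.writeInt(Integer.reverseBytes(byteRate));
        out.writeShort(Short.reverseBytes((short) (channels * bits / 8))); // block align
        out.writeShort(Short.reverseBytes(bits));
        out.writeBytes("data");
        out.writeInt(Integer.reverseBytes((int) dataBytes));
    }
}
```

Patching the header and renaming are the only work left when the recording stops, which is cheap enough to hand off to another thread or a scheduler task.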

@hrosa


Collaborator

hrosa commented Apr 28, 2016

@pchlupacek that sounds good, feel free to open an issue and take ownership if you're up to it :)

In fact, we have some issues on RestComm with long recorded calls because MS always takes quite some time to make the file available. So any strategy that helps speed up this process is very much welcome.

Overall, I'm currently working on improving and simplifying the class model of the Media Server.
As you might have noticed, the Media Server pools many different resource types (recorder, player, etc) but the Pool implementation was poor... basically the client took complete ownership of the resource so the pool could not guarantee the integrity of pooled resources. We ended up with cesspools instead.

I recently added a new ResourcePool implementation that calls out to checkIn/checkOut methods to reset and initialise the state of these resources. But as we just witnessed (thanks to you), we cannot implement these methods carefree because of race conditions like this one. That's why we need to review the lifecycle of each pooled resource. This will probably be taken care of in milestones 4.3.0 or 4.4.0.
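
For context, the checkIn/checkOut idea looks roughly like this. It is a simplified, hypothetical sketch (not the actual ResourcePool code), just to show where the race sits:

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Supplier;

interface PooledResource {
    void checkIn();   // reset internal state when returned to the pool
    void checkOut();  // (re)initialise state when borrowed from the pool
}

class SimpleResourcePool<T extends PooledResource> {

    private final ConcurrentLinkedQueue<T> idle = new ConcurrentLinkedQueue<>();
    private final Supplier<T> factory;

    SimpleResourcePool(Supplier<T> factory) {
        this.factory = factory;
    }

    T borrow() {
        T resource = idle.poll();
        if (resource == null) {
            resource = factory.get();
        }
        resource.checkOut();
        return resource;
    }

    void release(T resource) {
        // This is where the race bites: checkIn() must not tear down work the
        // resource has not finished yet (e.g. the recorder's in-flight file copy).
        resource.checkIn();
        idle.offer(resource);
    }
}
```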

@pchlupacek


Contributor

pchlupacek commented Apr 29, 2016

@hrosa we found quite a lot of oddities in the MS concurrency model. I am not sure what your plans / restrictions here are, but perhaps a quick conversation would help us understand it better.

I am coming from Scala (functional), so a better understanding of what your plans are could make development more effective.

What do you think ?

@hrosa


Collaborator

hrosa commented Apr 29, 2016

@pchlupacek Yes that is true, there are some oddities at the moment.

So originally the Media Server used a Scheduler to execute all tasks in intervals of 20ms. This interval was imposed to ensure media synchronization. As you can imagine, this is pretty bad because non-media tasks are delayed 20ms without any real necessity, and this slows things down in the MS core.

When I implemented support for WebRTC this was especially bad because, in a single channel, we had several protocols multiplexed (RTP, RTCP, STUN, DTLS). Although the 20ms cadence works fine for RTP, the same is not true for the other protocols. What happened in the end was that we experienced increasing delays in calls because the MS could not read from the sockets in a timely fashion to serve all the multiplexed protocols.
The solution was to stop using the scheduler for IO operations and use a JDK Executor instead.

Now, the problem is that we have two different schedulers co-existing in the Media Server and this may cause some oddities. And this is one issue that we will try to address in the 4.x milestones, together with improving the class model of MS.
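
To make the split concrete, here is a minimal, purely illustrative sketch (none of these are actual Media Server classes): media tasks stay on a fixed 20ms scheduler while IO runs on a plain JDK executor:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SchedulerSplitExample {

    public static void main(String[] args) throws InterruptedException {
        // Media tasks stay on a fixed 20ms cadence to preserve media synchronization.
        ScheduledExecutorService mediaScheduler = Executors.newSingleThreadScheduledExecutor();
        mediaScheduler.scheduleAtFixedRate(
                () -> System.out.println("send next RTP frame"),
                0, 20, TimeUnit.MILLISECONDS);

        // IO tasks run as soon as work arrives, independent of the media clock.
        ExecutorService ioExecutor = Executors.newCachedThreadPool();
        ioExecutor.execute(() -> System.out.println("read channel: RTP/RTCP/STUN/DTLS"));

        Thread.sleep(100); // let a few media ticks fire for the demo
        mediaScheduler.shutdownNow();
        ioExecutor.shutdown();
    }
}
```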

Regarding class model, I already changed the way the Media Server bootstraps and introduced Guice to provide Dependency Injection and AOP capabilities. DI will help simplify the complicated class model and AOP will hide access to aspects like resource pooling.
Another important task that I'm currently working on is the MGCP stack refactoring.
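
Coming back to the DI point, here is a minimal Guice sketch under made-up names (Recorder, AudioRecorder, RecorderProvider); the real bootstrap is more involved, this only shows how a provider can hide pooling from clients:

```java
import com.google.inject.AbstractModule;
import com.google.inject.Guice;
import com.google.inject.Injector;
import com.google.inject.Provider;

interface Recorder {
    void record(String file);
}

class AudioRecorder implements Recorder {
    @Override
    public void record(String file) {
        System.out.println("recording to " + file);
    }
}

// A provider is a natural seam for hiding pooling: get() could borrow from a pool.
class RecorderProvider implements Provider<Recorder> {
    @Override
    public Recorder get() {
        return new AudioRecorder();
    }
}

class MediaModule extends AbstractModule {
    @Override
    protected void configure() {
        bind(Recorder.class).toProvider(RecorderProvider.class);
    }
}

public class BootstrapExample {
    public static void main(String[] args) {
        Injector injector = Guice.createInjector(new MediaModule());
        Recorder recorder = injector.getInstance(Recorder.class);
        recorder.record("call-123.wav"); // caller never touches the pool directly
    }
}
```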

We are also going to integrate Netty in the project to replace that dreadful UDPManager class, which will also help improve the thread model as far as IO is concerned.
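
For reference, a bare-bones Netty UDP bootstrap, sketched only to show the direction; it is not the actual integration, and the port and handler are placeholders:

```java
import io.netty.bootstrap.Bootstrap;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.SimpleChannelInboundHandler;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.DatagramPacket;
import io.netty.channel.socket.nio.NioDatagramChannel;

public class UdpServerSketch {

    public static void main(String[] args) throws InterruptedException {
        EventLoopGroup group = new NioEventLoopGroup(); // Netty owns the IO threads
        try {
            Bootstrap bootstrap = new Bootstrap()
                    .group(group)
                    .channel(NioDatagramChannel.class)
                    .handler(new SimpleChannelInboundHandler<DatagramPacket>() {
                        @Override
                        protected void channelRead0(ChannelHandlerContext ctx, DatagramPacket packet) {
                            // Demultiplexing (RTP/RTCP/STUN/DTLS) would happen here
                            // instead of in a hand-rolled UDP manager.
                            System.out.println("received " + packet.content().readableBytes() + " bytes");
                        }
                    });
            bootstrap.bind(6000).sync()          // arbitrary example port
                     .channel().closeFuture().sync();
        } finally {
            group.shutdownGracefully();
        }
    }
}
```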

When MGCP refactoring and Netty tasks are completed, we need to start looking at the Media Scheduler and all media components (player, recorder, etc). I think by the end of this task we can get the threading model right in MS.

This is a lot of work and may take some time to achieve, so I ask for some patience from users.
If you wish to join this effort you're more than welcome :)

If you're experiencing issues please open tickets here on github. If you want to debate the MS design flaws please open a thread in our public forum https://groups.google.com/forum/#!forum/restcomm

Let me know if you have any questions.

@hrosa


Collaborator

hrosa commented Apr 29, 2016

Forgot to mention, feel free to join the community here

https://gitter.im/RestComm/Restcomm-discuss

The team and community are always present there, and we hold weekly meetings every Wednesday at 4pm (GMT).

@pchlupacek


Contributor

pchlupacek commented Apr 29, 2016

@hrosa thanks, I will follow up on Gitter.

@deruelle


Member

deruelle commented Jun 27, 2016

Thanks @pchlupacek ! Acknowledged contribution at https://telestax.com/acknowledgements/
