Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quicken FastMovieWriter? #1

Open
adehad opened this issue Jun 1, 2019 · 19 comments
Open

Quicken FastMovieWriter? #1

adehad opened this issue Jun 1, 2019 · 19 comments
Labels
Performance Performance-related Workarounds Workarounds for bugs or issues

Comments

@adehad
Copy link

adehad commented Jun 1, 2019

Just inquiring if there are any plans to improve the speed of FastMovieWriter?
At the moment we are able to add a frame every 5 seconds, which results in tediously long .fmv creation time for our high frame rate system.

I had a look at some of the source code and it seems that speed ups could be done in:

  1. Having a constructor that takes the (maximum) number of frames as an argument, to then pre-allocate space for all the frames.
  2. Utilising asynchronous running of the file writing commands compared to the MATLAB script. As demonstrated here for example: https://uk.mathworks.com/help/matlab/matlab_external/call-matlab-from-user-threads-in-mex-function.html

I would also be excited to try these out myself, but cannot seem to find current build instructions?

Cheers

@cculianu
Copy link
Owner

cculianu commented Jun 1, 2019

Hi there!

Hmm.. I'll take a look at that stuff. Yes, I probably can be improved dramatically, for sure. I think when I developed this there was no threading in matlab...

5 seconds per frame is abysmal. Wow. Is it spending most of its time in FastMovieWriter ??

@cculianu
Copy link
Owner

cculianu commented Jun 1, 2019

Wow that link you pasted is very sexy C++ code. I hadn't realized matlab updated their api to be so C++-friendly.

As for build instructions -- I forgot how to build this thing! I seem to remember mostly the matlab bits required just "mex file.c" or whatever -- is that not working?

@adehad
Copy link
Author

adehad commented Jun 1, 2019

Indeed it appears to spend a majority of its time in AddFrame:
image

However, I will be doing more Profiling in the coming days to see if there is any 'badness' on our side.

Ahaha, yes MATLAB does have quite a few useful features that are terribly documented. If you are interested, there is an article by the godly Yair Altman, which you might find interesting:
https://undocumentedmatlab.com/blog/multi-threaded-mex

I did try the mex file.cpp command, but it seems that with the MS Build tools I was using, it didn't play nice.

@adehad
Copy link
Author

adehad commented Jun 2, 2019

I believe I found the cause, the uint8 conversion done inside the MEX file!

if (clsid == mxUINT8_CLASS) {
// optimized processing of pixels if the format is correct -- just do a memcpy!
memcpy(imgbuf, m, sx*sy);
} else {
for (int x = 0; x < sx; ++x) {
for (int y = 0; y < sy; ++y) {
int color = 0;
switch(clsid) {
case mxCHAR_CLASS:
case mxINT8_CLASS: color = static_cast<int>(((signed char *)m)[y*sx + x]) + 128; break;
case mxUINT8_CLASS: color = ((unsigned char *)m)[y*sx + x]; break;
case mxINT16_CLASS: color = ((short *)m)[y*sx + x]; break;
case mxUINT16_CLASS: color = ((unsigned short *)m)[y*sx + x]; break;
case mxUINT32_CLASS: color = ((unsigned int *)m)[y*sx + x]; break;
case mxINT32_CLASS: color = ((int *)m)[y*sx + x]; break;
case mxDOUBLE_CLASS: color = ((double *)m)[y*sx + x] * 255.; break;
case mxSINGLE_CLASS: color = ((float *)m)[y*sx + x] * 255.; break;
default:
delete [] imgbuf;
mexErrMsgTxt("Argument 2 must be a matrix of numeric type.");
}
if (color < 0) color = 0;
if (color > 255) color = 255;
(imgbuf + (y*sx))[x] = (uint8_t)color;
}
}
}
if (!FM_AddFrame(c->ctx, imgbuf, sx, sy, clevel)) {
delete [] imgbuf;
mexErrMsgTxt("Failed in call to FM_AddFrame().");
}
delete [] imgbuf;
++c->frameCt;
RETURN(1);
}

Might be better to do it in the MATLAB call itself.

FastMovieWriterMex('addFrame', g.handle, frame, clevel);

e.g.
FastMovieWriterMex('addFrame', g.handle, uint8(frame), clevel);

Or throw a warning of sorts.

The Profiler now says AddFrame takes 0.18s for our entire stimulus (!)

I found some other inefficiencies in our main script, so I'll look into those to further speed up our rendering.

@cculianu
Copy link
Owner

cculianu commented Jun 2, 2019

Nice job!! Oh wow. That should be way better documented. You are right -- it should throw an exception. It's useless to accept those values if the slowdown is so large.

Thanks for catching this. I honestly forgot how this code works -- I'm super glad you managed to get it working!!

@cculianu cculianu pinned this issue Jun 2, 2019
@cculianu cculianu added Performance Performance-related Workarounds Workarounds for bugs or issues labels Jun 2, 2019
@adehad
Copy link
Author

adehad commented Jun 2, 2019

So I just found out that AddFrame does not take 0.18s for the same stimulus used in the screenshot earlier, my bad.
The profile does show we saved 300s or so of run time though (using MATLAB to typecast to uint8):
image

So it still might be worth looking into further optimisations.

@cculianu
Copy link
Owner

cculianu commented Jun 2, 2019

Hmm. It might be spending most of its time compressing. Try passing a third argument to AddFrame, 0. This is the compression level. Try turning off compression altogether. See this comment:

% frame to the movie file. The optional third argument

I think if compression is set to 0, it should be very fast.

@adehad
Copy link
Author

adehad commented Jun 2, 2019

It is indeed faster (around 400s). However the filesize went from ~30MB to ~30GB (!!!).
image

@cculianu
Copy link
Owner

cculianu commented Jun 3, 2019

So try compression 1, 2, 3, etc. 9 is the maximum (and the default!). I bet even with 1 or 2 it will be much better.

Sadly I should have used a real movie format to do this -- a lossless one such as FFv1. But at the time integrating a real movie encoder like ffmpeg was beyond the scope of this project.

I suspect hope with compression 1 or 2 or 3 you may get a happy balance between speed and size? :/

@adehad
Copy link
Author

adehad commented Jun 3, 2019

So try compression 1, 2, 3, etc. 9 is the maximum (and the default!). I bet even with 1 or 2 it will be much better.
I suspect hope with compression 1 or 2 or 3 you may get a happy balance between speed and size? :/

Compression of 3 seems to be okay for this stimulus, it takes around 50s longer, and the output filesize is now 130MB. Hopefully it will translate well for our longer stimuli as well.
image

Sadly I should have used a real movie format to do this -- a lossless one such as FFv1. But at the time integrating a real movie encoder like ffmpeg was beyond the scope of this project.

I understand! We appreciate your work nonetheless, and I would be happy to look into further development/optimisations with you.

@cculianu
Copy link
Owner

cculianu commented Jun 3, 2019

Ok, cool. Yeah if you come up with changes, do let me know. For your information: you can probably compile the mex file if you specify all the .cpp files it depends-on on the command-line .. I know it pulls in dependencies from ../../FastMovieFormat.cpp for the actual fast movie writing... it should work if you pay attention to the error messages (usually it will complain about missing symbols), and add 1 by 1 the .cpp files from the parent dir that are missing. It also depends on zlib and I think you need to specify those libs too on the command-line...

@adehad
Copy link
Author

adehad commented Jun 3, 2019

I actually tried running the .fmv on StimGL now, using the MATLAB uint8( ) type casting, the stimulus does not display properly. The entire screen flashes between black and white instead. Allowing the MEX file to handle the conversion works as expected, but at the cost of speed.

@cculianu
Copy link
Owner

cculianu commented Jun 3, 2019

Gaah! Really. That.. shouldn't be the case. Data is data. Very strange that it would do that. Are you sure uint8 casting does what it claims to? Hmm... I honestly forgot my maltlab details..

I'm sorry I don't really have time to work on this now .. I would perhaps fix up the code. It's strange to me that the data would produce bad movie files in that case.. it makes little sense...

@cculianu
Copy link
Owner

cculianu commented Jun 3, 2019

Just double check that the dimensions of the casted uint8 matrix weren't changed after the cast?

@adehad
Copy link
Author

adehad commented Jun 4, 2019

It is very strange:
image

Here we can see our black and white image matrix bw. After typecasting to uint8 (aaa) from double (bbb) we confirm that the two arrays' (i.e. frames) data are equal isequal(aaa,bbb) .
The whos command confirms the dimensions are equal.
Using the imagesc() command (e.g. imagesc(aaa)) we were able to confirm that in MATLAB the stimulus is visible in the black and white image, on both the original and typecasted frames.

I'm sorry I don't really have time to work on this now .. I would perhaps fix up the code. It's strange to me that the data would produce bad movie files in that case.. it makes little sense...

No worries, I am just posting here in case anyone else finds it useful too. If we find a solution I will post that on here too.

@adehad
Copy link
Author

adehad commented Jun 4, 2019

Just found out it was a big yikes on our side.
The frame we passed was 1 for white and 0 for black, a 'binary image'. But AddFrame expected a 255 to 0 range instead for the uint8 type. For the double and single type it actually multiplies by 255:

case mxDOUBLE_CLASS: color = ((double *)m)[y*sx + x] * 255.; break;
case mxSINGLE_CLASS: color = ((float *)m)[y*sx + x] * 255.; break;

Which is in line with the documentation for the AddFrame function, where it expects intensity values:
% Takes 2 arguments, the fasmoviewriter class and a frame, an M x
% N matrix of (0,1) intensity (color) values. Adds the

So if you are going to make a backwards compatible fix, it might be something like:

FastMovieWriterMex('addFrame', g.handle, uint8(255*frame), clevel);

instead of:

FastMovieWriterMex('addFrame', g.handle, frame, clevel);

@cculianu
Copy link
Owner

cculianu commented Jun 4, 2019

Oh wow. That should have at least been documented. GRRR. Yeah to my developer mind "of course it would be scaled to 255" -- but that fact is not obvious -- esp since the documentation is WAYY misleading. It also doesn't talk about the performance impact and how uint8 is the fastest.

I agree this needs to be updated. I'm leaving this issue open and may get to it soon.

So .. I take it now it works, right?

@adehad
Copy link
Author

adehad commented Jun 4, 2019

So .. I take it now it works, right?

Yep, it now correctly displays the stimulus after typecasting with uint8, sorry I didn't make that clear !

@cculianu
Copy link
Owner

cculianu commented Jun 4, 2019

Ok, cool. I'm leaving this issue open. I may find time soon to go in and better document the behavior and also give this code a little love.. Thanks for posting and do use this issue tracker. It's probably the best way to get a hold of me and to document things for others.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Performance Performance-related Workarounds Workarounds for bugs or issues
Projects
None yet
Development

No branches or pull requests

2 participants