Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Progress regarding refactor #57

Closed
Hellowlol opened this issue Jun 29, 2018 · 19 comments
Closed

Progress regarding refactor #57

Hellowlol opened this issue Jun 29, 2018 · 19 comments

Comments

@Hellowlol
Copy link

Hellowlol commented Jun 29, 2018

I couldnt notice there was done anything refactoring lately, it that still underway? Or has it stopped? I have some ideas on how we can make this more modular. Atm it seems like the library is written to support the cli.

@Hellowlol Hellowlol changed the title Progress to refactor Progress regarding refactor Jun 29, 2018
@Breakthrough
Copy link
Owner

Breakthrough commented Jun 29, 2018

Hi @Hellowlol;

Thanks for the comment. Indeed I realized the direction I was taking the project in v0.4 was primarily focused on the CLI as I was working on more high-level features at the time. Indeed I am planning on reworking the structure of the entire project to make it more appropriate for extension.

My plan is to release v0.5 with an updated Python API by the end of July (I have not had time to work on the project all year until this week, coincidentally). That being said, now would be a great time for any suggestions, and I am most open to any ideas you might have that would improve the modularity of the project.

My first goal is to separate all CLI logic outside of the SceneManager, create a separate StatisticsManager, and generate thumbnails in a separate pass (greatly simplifying the required logic for certain scene detection algorithms). Another class I realized is missing for the Python API is some kind of video I/O wrapper (e.g. VideoAnalyzer class) that applies detection algorithms on the frames of a video, storing the results in either one or both of a SceneManager or StatisticsManager.

Thank you.

@Hellowlol
Copy link
Author

Hellowlol commented Jun 29, 2018

Im on holly day without a computer so this might be a bit brife, I’ll fix up it later.

For the cli I would add subcommands with options

Fx: scenedetect -d (the rest of manager params and output options) detector1 -t 10 (rest of params for that detector) detector2 -options

Most should be moved inside manager. Cap should be a be a kw so can use our own (reimplemented using threads) or just default to cap. See https://www.pyimagesearch.com/2017/02/06/faster-video-file-fps-with-cv2-videocapture-and-opencv/ for a example.

Add a method that adds a a detector to the detector list

Move all args parsing outside the classes. This should be converted to keyword arguments

I’ll add some code examples when I get home

@Breakthrough
Copy link
Owner

Breakthrough commented Jul 2, 2018

Made some major progress to implementing a multiprocessing-based video decoder. One thing I'm having trouble with here is using the multiprocessing module to pass images via a queue to the main process. I have successfully put several images into the queue, but on the decoding process side, it raises a queue.Empty error infrequently, even if there are other objects in the queue. Other times, it blocks indefinitely, not sure why this occurring. I thought it was due to the decoder process quitting early, but that doesn't seem to have any effect. Any thoughts?

It seems that the frames get put into the queue properly, just the main process can't seem to get them out of the queue... It only seems like this issue happens when the frame queue is a certain size. Under that amount, it works fine (i.e. only queuing 1 frame at a time works fine, any more than that and I start getting queue.Empty errors when calling get_nowait(), queue.empty() returns true, but queue.qsize() is always > 0). Copying each frame explicitly also has no effect, so not sure why this issue is happening.

For now I'll just stick with a max queue/buffer of 1 frame, but would still like to understand why this is happening. Definitely running into some throughput issues as well, this method seems much slower for some reason (unit testing all methods from scratch).

Edit: Seem to have solved the queuing issue by using an end-of-stream sentinel and blocking waits for get. Still noticeably slower than the one-process approach for some reason though, maybe will be quicker for larger numbers of frames...

@Breakthrough
Copy link
Owner

Breakthrough commented Jul 3, 2018

One other strange issue is with process termination. Everything works fine if I terminate the main loop while it's working, but if I put in a short sleep statement, then the join() method never works (even if I verify that the return statement is executed with a print statement). Will try to upload it later, there's a test case that demonstrates the issue. May also be due to me using a super old version of Python, will try updating that as well to see if that might be causing the issue.

TL,DR: It seems like join() breaks for a Process object if the process function target returns a few seconds before calling join(), but not immediately after it exits?

Edit: Alright, have everything working as expected, just not getting the speed gains as expected. Single-threaded version always beats the multiprocessing version by about 1 second or so (so not really slower, but not faster either...).

@Hellowlol
Copy link
Author

There was some comments on the blog about py3 being slow and the need for a lock. Have you posted your code anywhere?

@Breakthrough
Copy link
Owner

Breakthrough commented Jul 3, 2018

Hey @Hellowlol;

I've attached to this comment a copy of the development version I'm working on (sorry I still need to push it to Git in a separate branch, as it breaks everything). Started from scratch on a lot of stuff. As per your suggestion, that I've made it now so in the scene manager, you can pass an OpenCV VideoCapture object, a PySceneDetect VideoManager, or VideoManagerAsync object.

In the test_scene_manger.py file is where the issues lie, the VideoManager is fine, however VideoManagerAsync seems to have some issues with throughput (I thought there might have been some bottlenecking, but decreasing the queue size increases runtime... With a queue size of 512 frames / max 4GB of RAM, it runs as fast as the synchronous version, maybe 1 second slower due to the decoder process startup time). I added in ContentDetectorNew temporarily just to experiment with increasing the workload to see if that helped, but that doesn't seem to affect anything. Interestingly enough, I tried switching back to the threading module to see if that helped, but if anything, that version was slower than the multiprocessing version when I added in a workload.

There's only unit tests, run with python -m unittest discover -v. Specifically, check out the test_content_detect_asynchronous test case in test_scene_manager.py. All the async stuff is in video_manager.py. For some reason, whenever I add a delay long enough for the video manager process to queue all frames in the video (2 seconds is usually long enough since I'm only queuing the first 10 seconds for the test case), the video manager process becomes unjoinable from the main process (see the VideoManagerAsync's stop() method), and I'm unable to obtain the return code (even though I know the process thread_func has returned a value).

Interestingly enough, without the delay, this issue doesn't occur, and the process is joinable, and I see the return code in the main process. I've gone through the code many times and can't seem to find any errors in logic, even forcing the process to loop on the event flag before returning doesn't seem to help solve the terminate() issue, so not sure what is causing this. Perhaps a fresh set of eyes will help.

Didn't know about the py3 thing being slow, maybe I should test this on Python 2, but not sure if it's worth supporting v2.x moving forwards (the v0.5-rewrite attached to this post has no real provisions for Python 2 yet outside of the print statement, need to add something for handling the Queue->queue rename at runtime in the scenedetect.platform module).


Attachments:
PySceneDetect-v0.5-dev.zip

@Hellowlol
Copy link
Author

Hellowlol commented Jul 3, 2018

I added a lock and tested test_content_detect_asynchronous and it worked. But the memory usage was 15 times higher. I didnt notice any speed increase but the laptop im using now so dog slow so i didnt except anything to be faster.

@Breakthrough
Copy link
Owner

Breakthrough commented Jul 3, 2018

Wow that is a significant increase indeed, thank you for looking into the matter. Where did you add the lock, and around what object, the queue? If you have a link to the comments you mentioned before, wouldn't mind having a look (or feel free to upload the modified code).

I'm starting to wonder if there's a logic error with the way I'm handling this, since this should theoretically be faster using multiprocessing, although perhaps the overhead of doing I/O in a separate process outweighs the performance gains?

Regardless, I'm going to just go forwards in the meantime with cleaning up the VideoManager class, maybe moving the Async version into a separate module, and continue going forwards with reworking the SceneManager (taking into account the changes you mentioned), the SceneDetectors, and add a StatsticsManager key/value store for frame metrics rather than relying on a passed frame_metrics dict.

Other than that, any comments in regards to how the v0.5 API is shaping up? I'm hoping focusing on test-driven development for this release will help create an environment where the API is much more stable and focused on Python-first rather than CLI-first, and the unit tests will also provide some implicit examples on how to use the API and handle errors gracefully.

@Hellowlol
Copy link
Author

Hellowlol commented Jul 3, 2018

I created a new file called lock.py with a single variable called LOCK = RLock()

i imported that in video_manager and added that to line 523
and added the lock to scene_manager to line 92.

Ill do some more testing later. The comments are in the blog post i linked to you. Just scoll down ;)

Consider using pytest as a testing framework. Its 10 times easier to work/reason with.

@Hellowlol
Copy link
Author

Also the results from the scene_manager should be a class or a duct so it’s possible to know what detector found what

@Breakthrough
Copy link
Owner

Breakthrough commented Jul 3, 2018

@Hellowlol After calling detect_scenes(...) my plan was for it to return the # of scenes detected, and, have you obtain the results by calling get_scene_list() - are you thinking that the actual detect_scenes(...) function should just return the scene list instead?

Since the scene numbers will always be contiguous, was thinking it should just return a list of either FrameTimecodes of the scene start frames, or tuples of (start_frame, end_frame). Can have it return a dict as well though, where the keys are the integer scene numbers (might work better since then can call the first scene Scene 1?).

I'll also look into the idea of returning a dict of SceneResult objects as you suggested, which the more I think about it, the more it makes sense, since then the SceneResult object can contain the scene's start timecode, end timecode, duration, and optionally, the actual first/last frames of the scene for thumbnail generation purposes.

Also after going through this page on pytest, will definitely be looking into that, looks super useful for testing the CLI and what-not.

@Breakthrough
Copy link
Owner

Also, what is your opinion on supporting Python 2 versus 3? What do you use, and what do you think the project should support officially?

One thing I'd like to use is type hints, but that's thrown a wrench into my testing of the code using Py2.7 (was going to see if it helped the performance). One way around it is to use comment-like type hints, but was wondering if you had dealt with this issue before.

It's basically between just using commented type hints, which is extra maintenance cost IMHO, versus official PEP 484 type hints, and losing compatibility with Python 2.x...

@Hellowlol
Copy link
Author

I don’t think the type hits make the code more readable so I don’t use it. I support py2 and py3 for all my project but I gonna rip it out when eol for py 2 is reached. So that’s up to you.

I think it should return the class and use get_scene_list() but return dict or a scene result or some kind. The user or the cli tool can concat all the scenes.

@Breakthrough
Copy link
Owner

Breakthrough commented Jul 4, 2018

Gotcha, thanks for your input. Decided to go with just using comments for type hints to get it working with Python 2. Did get it up and running again on Py2.7, but no change in runtime behaviour - synchronous version still seems to be faster on both Python 2 and 3.

Results for 20 second run of 720p video using multiprocessing:

  • Python 2.7: Synchronous = 17.4 sec, Asynchronous = 18.1 sec
  • Python 3.6: Synchronous = 14.6 sec, Asynchronous = 15.5 sec

And using threading and Thread instead of multiprocessing and Process:

  • Python 2.7: Synchronous = 17.4 sec, Asynchronous = 16.5 sec
  • Python 3.6: Synchronous = 14.6 sec, Asynchronous = 17.2 sec

Very interesting that only using threading and Python 2 is the threaded version faster, and not by very much either... Guess I'll stick to multiprocessing, interesting how much faster Python 3 runs than Python 2 for a lot of stuff though.

To be honest this almost makes me want to rewrite the VideoManager part as a C++11/14 lib that exports some C functions to let it spawn a thread that does the decoding, and lets you pass a numpy array to be filled with a frame in a C function called with ctypes... That way there's no way the GIL would get in the way as the video decoding stuff would all be C++ and in a separate thread. Might just be idle speculation though, if I get some time I'll write up a prototype to benchmark and compare (will probably use std::thread to launch the threads and OpenCV to do the Video I/O, and pass the frame size to Python so you can alloc a Numpy array buffer of the frame size to pass to a ctypes function that just needs to memcpy the frame into it).

@Breakthrough
Copy link
Owner

Breakthrough commented Jul 5, 2018

Also, preview of the direction I'm going with the new sub-commands:

scenedetect input -i video.mp4 -s video.stats.csv time --duration 10s detect-content -t 32 detect-threshold -t 20 split --precise output --directory output_dir/

So there's four basic subcommands input, time, split, and output for setting options in regards to those things. Then there will be a detect-* for all the detectors and their options, so for now, detect-threshold and detect-content.

A list of all the subcommands is given by scenedetect -h, and the args for each subcommand can be viewed by doing scenedetect [subcommand] -h.

@Breakthrough
Copy link
Owner

Breakthrough commented Jul 7, 2018

Progress on the v0.5 release has now been pushed to a separate branch now, v0.5-dev ( https://github.com/Breakthrough/PySceneDetect/tree/v0.5-dev ) for those that wish to monitor progress and provide any feedback on the direction things are going. I will keep this issue open until v0.5 is complete, at which time it will be closed to signal the release.

Also I realized the issue with the multiprocessing thing after doing some benchmarking - it seems like the decoding time is just so negligible compared to the actual processing of the frame that it doesn't make sense to do that in parallel. I think instead I should focus on parallelization of the detection algorithms themselves, or perhaps allow a way for just the detectors to be written in C++, with Python doing all the video I/O and passing pointers to frames to the C++ detectors via ctypes.

@Breakthrough
Copy link
Owner

Breakthrough commented Jul 7, 2018

Closing issue as duplicate of #27 - Python interface broken in v0.4. Discussion may be continued there.

@Breakthrough
Copy link
Owner

Breakthrough commented Jul 22, 2018

Hey @Hellowlol;

Just an update for you, the v0.5 branch should now be stable enough for people to start testing with. The only features remaining to be implemented is the re-addition of video splitting support, and generation of start/end frame images for each detected scene. Also added a progress bar if you have the tqdm library installed.

Other than that, all other features (stats file, scene list, content/threshold algos) are all up and running with the newer and much cleaner CLI interface and, finally, a proper and much-improved API. Run scenedetect.py help to get a sample of how the new CLI functions, and check out api_test.py for a sample of the new Python API.

Thanks again for your feedback and input, am looking forwards to any other comments you might have before preparing the v0.5 release.

@Hellowlol
Copy link
Author

I havnt had time to play with it, but i think it would be nice to have some examples on how to use the api on the readme. Currently any users would need to check the source code to figure out how to use it.

Ill do some testing and see if can see any other thing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants