Description
Describe the solution you'd like
I was running a bulk cleanup operation that mostly consists of checking whether MKVs are uniformly formatted (1 video track, at most 2 audio tracks, all subtitles in English, etc.), so the workflow was straightforward: iterate through everything, call `MKVFile(path)`, and check the tracks. But I noticed my HDD usage consistently stayed low, which suggested the process was spending its time somewhere else when it should have been IO bound.
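The uniformity check itself can be sketched as a pure function over the output of a single `mkvmerge -J <file>` call, which is one way to guarantee only one `mkvmerge` process per file. This is a minimal sketch, not pymkv's API: `is_uniform` is a hypothetical name, and it assumes mkvmerge's JSON reports each track's `type` as `"video"`, `"audio"` or `"subtitles"` and its ISO 639-2 language code under `properties["language"]` (newer mkvmerge versions may also report an IETF tag, which this sketch ignores).

```python
from typing import Any


def is_uniform(info: dict[str, Any]) -> bool:
    """Check one file's parsed `mkvmerge -J` output against the cleanup rules.

    Rules from the issue: exactly 1 video track, at most 2 audio tracks,
    and every subtitle track tagged as English.
    """
    tracks = info.get("tracks", [])
    video = [t for t in tracks if t.get("type") == "video"]
    audio = [t for t in tracks if t.get("type") == "audio"]
    subs = [t for t in tracks if t.get("type") == "subtitles"]
    # Assumption: the language is the ISO 639-2 code stored in properties["language"].
    subs_english = all(t.get("properties", {}).get("language") == "eng" for t in subs)
    return len(video) == 1 and len(audio) <= 2 and subs_english
```

Keeping the rules separate from the I/O makes it easy to feed the function from a cached identification step instead of from per-track lookups.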
So I did a bit of profiling and found that a lot of redundant shell calls with the same paths were being sent to `mkvmerge`. For example, each track would call `pymkv.MKVTrack.MKVTrack.track_id` once, which in turn would invoke `mkvmerge` and then parse some JSON (I'm not sure whether JSON parsing could be another point to improve, since AFAIK JSON parsing has pretty crappy performance, especially when you don't need the entire object). All of these `mkvmerge` calls run with the same `mkvmerge_path` and `file_path`, when only the first one should be needed and its result should be cached.
Here's a very crude comparison using my bandage patch:
Before: loaded 10 files in 9.9s

After: loaded 10 files in 1.4s
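The issue does not show how these timings were taken. A crude wall-clock harness along these lines would produce comparable numbers; `time_loads` is a hypothetical helper, and `load` would be something like `MKVFile` in practice.

```python
import time
from typing import Callable, Iterable


def time_loads(paths: Iterable[str], load: Callable[[str], object]) -> float:
    # Call `load` once per path and return the total wall-clock time in seconds.
    start = time.perf_counter()
    for path in paths:
        load(path)
    return time.perf_counter() - start
```

For example, `elapsed = time_loads(paths, MKVFile)` followed by `print(f"loaded {len(paths)} files in {elapsed:.1f}s")`, run once before and once after the patch, assuming `paths` is a list.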