Low performance when loading existing MKVs with MKVFile #16

@kokonguyen191

Describe the solution you'd like

I was running a bulk cleanup operation that mostly consists of checking whether MKVs are uniformly formatted (1 video track, at most 2 audio tracks, all subtitles in English, etc.), so the workflow was quite straightforward: iterate through everything, call MKVFile(path), check the tracks. But I noticed my HDD usage consistently stayed low, which suggests the process was spending its time somewhere else when it should have been IO bound.
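For context, the per-file check described above could look roughly like the sketch below. It is duck-typed: it only assumes each track object exposes `track_type` and `language` attributes (named after pymkv's MKVTrack properties), that track types are the strings mkvmerge reports ("video", "audio", "subtitles"), and that English is coded as "eng". The function name `is_uniform` and its parameters are hypothetical.

```python
from collections import Counter
from types import SimpleNamespace


def is_uniform(tracks, max_audio=2, subtitle_language="eng"):
    """Return True if the tracks match the layout from the description.

    Assumption: each track has `track_type` and `language` attributes.
    """
    counts = Counter(t.track_type for t in tracks)
    # Exactly one video track and at most `max_audio` audio tracks.
    if counts["video"] != 1 or counts["audio"] > max_audio:
        return False
    # Every subtitle track must be in the expected language.
    return all(
        t.language == subtitle_language
        for t in tracks
        if t.track_type == "subtitles"
    )


# Stand-in track objects so the sketch runs without pymkv or mkvmerge.
example_tracks = [
    SimpleNamespace(track_type="video", language="und"),
    SimpleNamespace(track_type="audio", language="jpn"),
    SimpleNamespace(track_type="subtitles", language="eng"),
]
uniform = is_uniform(example_tracks)
```

With pymkv installed, the same function would presumably be called as `is_uniform(MKVFile(path).tracks)` for each file in the loop.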

So I did a bit of profiling and found that a lot of redundant shell calls with the same paths were being sent to mkvmerge. For example, each track calls pymkv.MKVTrack.MKVTrack.track_id once, which in turn calls mkvmerge and then parses some JSON (I'm not sure whether the JSON parsing is another point that could be improved, since AFAIK JSON parsing is fairly slow, especially when you don't need the entire object). All of these mkvmerge calls run with the same mkvmerge_path and file_path, when the first call alone would have been enough if its result were cached.
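To illustrate the kind of caching meant here, below is a minimal sketch, not pymkv's actual code and not my exact patch. It memoizes the parsed output of `mkvmerge -J <file>` per (mkvmerge_path, file_path) pair, so repeated lookups for the same file cost one subprocess call. The names `run_mkvmerge_identify` and `IdentifyCache` are hypothetical; the runner is injectable so the sketch can be tried without mkvmerge installed.

```python
import json
import subprocess


def run_mkvmerge_identify(mkvmerge_path, file_path):
    """Run `mkvmerge -J <file>` and return its raw JSON output (bytes)."""
    return subprocess.check_output([mkvmerge_path, "-J", file_path])


class IdentifyCache:
    """Cache parsed mkvmerge identification results per (binary, file)."""

    def __init__(self, runner=run_mkvmerge_identify):
        self._runner = runner  # hypothetical hook; defaults to the real call
        self._results = {}

    def get(self, mkvmerge_path, file_path):
        key = (mkvmerge_path, file_path)
        if key not in self._results:
            # Only the first request for a given file spawns mkvmerge.
            raw = self._runner(mkvmerge_path, file_path)
            self._results[key] = json.loads(raw)
        return self._results[key]


# Demo with a fake runner that counts how often it is invoked.
calls = []


def fake_runner(mkvmerge_path, file_path):
    calls.append(file_path)
    return b'{"tracks": [{"id": 0, "type": "video"}]}'


cache = IdentifyCache(runner=fake_runner)
first = cache.get("mkvmerge", "a.mkv")
second = cache.get("mkvmerge", "a.mkv")  # served from the cache
```

Two caveats with this approach: a cached entry goes stale if the file is modified after the first lookup, and all callers share the same dict, so it should be treated as read-only.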

Here's a very crude comparison using my band-aid patch:

Before
loaded 10 files in 9.9s
After
loaded 10 files in 1.4s
