Automated normalization and curating of media collections

AlexAltea/curator

Curator

Automated normalization and curating of media collections. Written in Python 3.x.

Curator is a collection of stateless CLI tools, following the Unix philosophy, to organize large collections of heterogeneous media. Each tool creates a plan made of tasks with clearly defined input and output files, which the user can optionally review before applying.
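The plan-then-apply workflow described above can be sketched as follows. This is a minimal illustration, not Curator's actual code: the class names `RenameTask` and `Plan` and their methods are hypothetical and chosen only to show how a plan can be reviewed before any file is touched.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class RenameTask:
    """Hypothetical task with one clearly defined input and output file."""
    source: Path
    target: Path

    def describe(self) -> str:
        return f"{self.source} -> {self.target}"

    def apply(self) -> None:
        self.source.rename(self.target)


@dataclass
class Plan:
    """Hypothetical plan: an ordered list of tasks, applied only on request."""
    tasks: list = field(default_factory=list)

    def describe(self) -> list:
        # Reviewing the plan has no side effects on the file system.
        return [task.describe() for task in self.tasks]

    def apply(self) -> None:
        for task in self.tasks:
            task.apply()
```

Because `describe()` only reads the task definitions, a user can inspect every input and output path and call `apply()` only once the plan looks right.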

Install the package via:

pip install git+https://github.com/AlexAltea/curator.git

Credits

Acknowledgements to people who contributed code/ideas to the project:

Features

Curator can automatically rename and link media files, edit container metadata, and remux and merge streams. To reduce manual labor and achieve reliable results across media from potentially different sources, some tools rely on signal processing and machine learning (e.g. Whisper, LangID).

Highlighted use cases (current and planned):

Below you can find a description and examples of all tools provided by Curator:

Auto

flowchart LR
    Convert --> Merge --> Sync --> Tag --> Rename

Merge

Merges all streams with identical names into a single container, except for:

  • Video streams, if one already exists.
  • Audio streams, if one with the same language tag already exists.

Requires all video containers to be MKV.
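The two exceptions above can be sketched as a simple filter. This is an illustration under stated assumptions, not Curator's implementation: streams are modeled as hypothetical `(kind, language)` tuples, and `select_streams` is a made-up helper name.

```python
def select_streams(existing, candidates):
    """Return the stream list of the merged container.

    Both arguments are lists of (kind, language) tuples, a simplified
    stand-in for real stream metadata.
    """
    merged = list(existing)
    for kind, language in candidates:
        # Skip video streams if the container already has one.
        if kind == "video" and any(k == "video" for k, _ in merged):
            continue
        # Skip audio streams whose language tag is already present.
        if kind == "audio" and any(
            k == "audio" and lang == language for k, lang in merged
        ):
            continue
        merged.append((kind, language))
    return merged
```

In this sketch a candidate is also checked against streams accepted earlier in the same merge, so two incoming audio tracks with the same language would not both be added.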

example-curator-merge

Rename

Update filenames according to a pattern made of the following variables:

| Key | Description |
|---|---|
| @ext | File extension of the input media. |
| @dbid | When using a database, the ID of the match, e.g. imdbid-tt12345678. |
| @name | Localized name of the media. |
| @oname | Original name of the media (needs database). |
| @tags | Tags present in the input media filename enclosed by square brackets, if any. |
| @year | Year the media was released. |
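As a rough illustration of how such a pattern could be expanded, the sketch below substitutes each `@key` with a value from a dictionary. The function name `render_pattern`, the example pattern, and the example values are assumptions made for illustration; Curator's real pattern handling may differ.

```python
import re


def render_pattern(pattern, values):
    """Replace each known @key in `pattern` with its value.

    Unknown @keys are left untouched. Longer keys are tried first so that
    a key can never shadow a longer key sharing the same prefix.
    """
    keys = sorted(values, key=len, reverse=True)
    if not keys:
        return pattern
    regex = re.compile("@(" + "|".join(re.escape(k) for k in keys) + ")")
    return regex.sub(lambda match: str(values[match.group(1)]), pattern)


# Hypothetical values for a single media file.
values = {
    "name": "Example Movie",
    "year": 2000,
    "dbid": "imdbid-tt12345678",
    "ext": "mkv",
}
print(render_pattern("@name (@year) [@dbid].@ext", values))
# Example Movie (2000) [imdbid-tt12345678].mkv
```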

example-curator-rename

Sync

Synchronize streams via data cross-correlation.

Every synchronization task involves (A) a reference stream and (B) the stream to be synchronized. We write this relationship as A ← B. Curator can only handle the following types of synchronization tasks:

  • Video ← Audio:
    Comparing lip movement timestamps with ASR timestamps.
  • Audio ← Audio:
    Comparing sound data.
  • Audio ← Subtitle:
    Comparing ASR timestamps with uniquely matching text timestamps.
  • Subtitle ← Subtitle:
    Comparing text timestamps.
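To make the idea of cross-correlation concrete, here is a minimal brute-force sketch using only the standard library. It is not Curator's implementation: the function name `estimate_lag` and the `max_lag` parameter are hypothetical, and a practical tool would more likely use an FFT-based correlation from a numerical library on real audio samples.

```python
def estimate_lag(reference, target, max_lag):
    """Return the lag (in samples) at which `target` best matches `reference`.

    A positive result means `target` is delayed relative to `reference`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Correlation score: sum of products of overlapping samples.
        score = 0.0
        for i, ref_value in enumerate(reference):
            j = i + lag
            if 0 <= j < len(target):
                score += ref_value * target[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


reference = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
target = [0, 0, 0, 0, 0, 1, 3, 1, 0, 0]  # same pulse, three samples later
print(estimate_lag(reference, target, max_lag=5))  # 3
```

Shifting the target stream by the negative of the estimated lag would then align it with the reference.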

The synchronization plan (SyncPlan) will create a tree of synchronization tasks (SyncTask) for every media file it processes. For example, with an input Media("movie.mkv") with streams: #0 (video), #1 (audio:eng), #2 (audio:spa), #3 (subtitle:eng), #4 (subtitle:spa), it will generate the following sync proposals:

  1. #0 ← #1
  2. #1 ← #2
  3. #1 ← #3
  4. #3 ← #4
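One rule that reproduces the proposals listed above is: link the first stream of each type to the first stream of the previous type (video, then audio, then subtitle), and link every remaining stream to the first stream of its own type. The sketch below implements that rule as an assumption; the source does not state how SyncPlan actually builds its tree, and `propose_sync_tasks` is a hypothetical name.

```python
# Supported (reference kind, target kind) pairs, taken from the list of
# synchronization task types in this section.
SUPPORTED = {
    ("video", "audio"),
    ("audio", "audio"),
    ("audio", "subtitle"),
    ("subtitle", "subtitle"),
}


def propose_sync_tasks(streams):
    """Return (reference, target) index pairs for a list of (index, kind)."""
    first = {}
    for index, kind in streams:
        first.setdefault(kind, index)

    tasks = []
    previous = None
    for kind in ("video", "audio", "subtitle"):
        if kind not in first:
            continue
        # Link the first stream of this kind to the previous kind's first.
        if previous is not None and (previous, kind) in SUPPORTED:
            tasks.append((first[previous], first[kind]))
        # Link the remaining streams of this kind to the first of its kind.
        if (kind, kind) in SUPPORTED:
            tasks.extend(
                (first[kind], index)
                for index, k in streams
                if k == kind and index != first[kind]
            )
        previous = kind
    return tasks


streams = [(0, "video"), (1, "audio"), (2, "audio"), (3, "subtitle"), (4, "subtitle")]
print(propose_sync_tasks(streams))  # [(0, 1), (1, 2), (1, 3), (3, 4)]
```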

Tag

example-curator-tag
