Script to crawl a musical library folder and spot the audio files that do not match the naming convention, according to their metadata among other tests.

ArthurBeaulieu/OstrichRemover

OstrichRemover

Like your audio files to be correctly tagged? OstrichRemover might help you!

This script is specially made to test a musical folder against the ManaZeak naming convention. It is highly recommended to read the naming convention before going any further. Based on this convention, OstrichRemover was designed to work in four fields: verifying, pre-tagging, analyzing and generating stats on your musical library.

For any concerns, you can check the wiki. If you encounter a problem that is undocumented, please contact support@manazeak.org

Get started

This script uses Python 3 and requires mutagen, Pillow (which replaced the Python Imaging Library) and PyICU. Please make sure these are installed on your system; otherwise, with pip installed, run the command:

# pip install -r requierements.txt

When all requirements are installed, you can launch OstrichRemover in five main modes, plus one additional command:

Scan mode (-s or --scan)

Available options:

  • -d or --dump to dump a JSON report in the ./dump folder;
  • -p or --path to specify the output path to dump the JSON report in;
  • -m or --minify to minify the JSON output;
  • -v or --verbose for a verbose output.

The script crawls the folder you gave as an argument and reports any error it finds in your file naming or tagging. With the -d or --dump flag, errors are written to a JSON file that can be reviewed further in the web-report/index.html file (just drag and drop the JSON file into the input area). OstrichRemover can detect 42 errors per file (so far). Those errors are grouped into five categories, detailed in the wiki:

  • Category 1 – File system naming inconsistencies;
  • Category 2 – File system naming against ID3 tags;
  • Category 3 – ID3 tags inconsistencies;
  • Category 4 – Tag coherence against album analysis;
  • Category 5 – Miscellaneous errors.

Before running the script in scan mode, make sure that the folder you are about to test matches the ManaZeak tree structure and naming convention, since any other folder structure may produce biased results. Then, in your cloned repository, run:

$ python ./OstrichRemover.py -s ./path/to/library/folder/

The script computes a purity grade that takes into account the total number of possible errors per track and the actual number of errors found.
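The exact formula is not given here. As a minimal sketch, assuming the grade is simply the share of checks that passed, expressed as a percentage, the computation could look like the following. The function name `purity_grade` and its parameters are hypothetical, and the default of 42 checks per track is taken from the error count quoted earlier; this is not the script's actual code.

```python
def purity_grade(track_count, error_count, errors_per_track=42):
    """Return a purity percentage in [0, 100] (assumed formula, for illustration)."""
    if track_count <= 0:
        return 100.0  # Nothing was tested, so nothing is wrong
    possible = track_count * errors_per_track
    # Clamp at zero in case more errors are reported than checks exist
    passed = max(possible - error_count, 0)
    return round(100.0 * passed / possible, 2)


if __name__ == "__main__":
    # 10 tracks, 21 errors found out of 420 possible ones
    print(purity_grade(track_count=10, error_count=21))
```

Under this assumption, a library with no error grades at 100 and a library failing every check grades at 0.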

Fill mode (-f or --fill)

Available options:

  • -v or --verbose for a verbose output;
  • -e or --errors to only display errors that occurred.

The script also crawls the folder you gave as an argument, but this time it fills the file tags using the filename. This mode assumes that you have already properly named the files in the tested folder. According to the ManaZeak naming convention, it automatically fills the following tags:

  • title;
  • artist;
  • album title;
  • album artist;
  • year;
  • performer;
  • track number;
  • total track;
  • disc number;
  • total disc;
  • cover.

OstrichRemover may also be able to fill the following tag if a given condition is met:

  • Label, if the publisher tag was previously set.

In any case, a track whose file name doesn't fit the convention will be skipped. To fill the tags of every track in a given folder, run:

$ python ./OstrichRemover.py -f ./path/to/library/folder/
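To illustrate the idea of deriving tags from a file name, here is a minimal sketch. It deliberately uses a made-up, simplified pattern (`<track number> - <artist> - <title>.<ext>`), not the real ManaZeak convention, which is richer and should be read first. The names `FILENAME_PATTERN` and `tags_from_filename` are hypothetical, and the sketch only parses the name; it does not write anything to the file.

```python
import re
from pathlib import Path

# Hypothetical, simplified pattern: "<track number> - <artist> - <title>.<ext>".
# This is NOT the ManaZeak naming convention, only an illustration.
FILENAME_PATTERN = re.compile(
    r"^(?P<track_number>\d+) - (?P<artist>.+?) - (?P<title>.+)$"
)


def tags_from_filename(path):
    """Return a dict of tags parsed from the file name, or None if it does not match."""
    match = FILENAME_PATTERN.match(Path(path).stem)
    if match is None:
        return None  # The name does not fit the pattern: skip the file
    tags = match.groupdict()
    tags["track_number"] = int(tags["track_number"])
    return tags


if __name__ == "__main__":
    print(tags_from_filename("03 - Some Artist - Some Title.flac"))
    print(tags_from_filename("badly named file.mp3"))
```

Returning `None` for a non-matching name mirrors the behaviour described above, where a track that doesn't fit the convention is left alone rather than filled with guessed values.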

Analyze mode (-a or --analyze)

Available options:

  • -d or --dump to dump a JSON report in the ./dump folder;
  • -p or --path to specify the output path to dump the JSON report in;
  • -m or --minify to minify the JSON output.

The analysis mode is made to use any JSON dumps generated with the -sd or --scan --dump command. Just provide the path where all your dumps reside and let the script generate a meta analysis of them. The main goal of this command is to prepare data to be displayed in a graph (hello d3js). To do so, run (add -d or --dump to generate the JSON report):

$ python ./OstrichRemover.py -a ./path/to/json/dumps/
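The structure of the dumps is not described here, so the following is only a sketch of the general idea: read every JSON file in a folder and collect one value per dump, ready to be plotted. The function name `collect_series` is hypothetical, and the key to extract is passed in by the caller because the real field names are not known from this document.

```python
import json
from pathlib import Path


def collect_series(dump_folder, key):
    """Return [(file name, value)] for every dump that has `key` at its top level."""
    series = []
    # Sort by file name so the series has a stable order
    for dump_path in sorted(Path(dump_folder).glob("*.json")):
        with dump_path.open(encoding="utf-8") as handle:
            report = json.load(handle)
        # Ignore dumps that are not objects or that lack the requested key
        if isinstance(report, dict) and key in report:
            series.append((dump_path.name, report[key]))
    return series
```

For example, `collect_series("./path/to/json/dumps/", "purity")` would return one point per dump, assuming the dumps expose a top-level `purity` field, which is an assumption made for this example.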

Stat scan mode (-t or --stat)

Available options:

  • -d or --dump to dump a JSON report in the ./dump folder;
  • -p or --path to specify the output path to dump the JSON report in;
  • -m or --minify to minify the JSON output.

This mode crawls the audio library and collects stats about the number of unique artists (across all artist, performer, composer and producer fields), as well as genres and labels. It then offers a sorted list of them so you can identify typos and such. To do so, run (add -d or --dump to generate the JSON report):

$ python ./OstrichRemover.py -t ./path/to/library/folder/
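Once you have a list of names, spotting typos by eye can be tedious. The sketch below shows one possible way to flag near-duplicate spellings using only the Python standard library. It is not part of OstrichRemover; the function name `suspicious_pairs` and the similarity cutoff of 0.85 are assumptions chosen for illustration.

```python
from collections import Counter
from difflib import get_close_matches


def suspicious_pairs(names, cutoff=0.85):
    """Pair each rarer spelling with a similar, more frequent one."""
    counts = Counter(names)
    unique = sorted(counts)
    pairs = []
    for name in unique:
        others = [other for other in unique if other != name]
        for match in get_close_matches(name, others, n=3, cutoff=cutoff):
            # Report each pair once, with the rarer spelling first
            if (counts[name], name) < (counts[match], match):
                pairs.append((name, match))
    return pairs


if __name__ == "__main__":
    artists = ["Daft Punk", "Daft Punk", "Daft Pnuk", "Justice"]
    for rare, common in suspicious_pairs(artists):
        print(f"'{rare}' looks like '{common}'")
```

The rarer spelling is listed first on the assumption that the most frequent form is the correct one, which will not always hold.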

JSON generator mode (-g or --gen)

Available options:

  • -d or --dump to dump a JSON report in the ./dump folder;
  • -p or --path to specify the output path to dump the JSON report in.

This mode crawls the audio library and generates a JSON file for the artists, genres and labels it came across (in all artist, performer, composer and producer fields). The generated JSON matches the advanced naming convention of ManaZeak and can therefore be provided as-is to the ManaZeak web application. To do so, run (add -d or --dump to generate the JSON report):

$ python ./OstrichRemover.py -gd ./path/to/library/folder/ -p ./path/to/output/

Additional commands

Clean mode (-c or --clean)

The script crawls the folder you gave as an argument and cleans all existing track metadata. It is mainly meant to prepare tracks to be filled later on, so that no ambiguous tags remain (for example TOTALTRACK, TOTALTRACKS, TRACKTOTAL...). Warning: this command removes all tags from the files with no remorse at all; use it with caution and self-awareness. To do so, run:

$ python ./OstrichRemover.py -c ./path/to/library/folder/
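Since the clean command is destructive, you may want to check first whether a track actually carries ambiguous tags. The sketch below works on a plain dictionary of tag names and values and only reports conflicts; it does not read or modify any file. The three spellings are the ones quoted above, while the names `TOTAL_TRACK_ALIASES` and `ambiguous_total_track_tags` are hypothetical.

```python
# Spellings quoted in the text above, treated here as aliases of one field
TOTAL_TRACK_ALIASES = {"TOTALTRACK", "TOTALTRACKS", "TRACKTOTAL"}


def ambiguous_total_track_tags(tags):
    """Return the sorted alias keys present in `tags` when more than one is set."""
    present = sorted(key for key in tags if key.upper() in TOTAL_TRACK_ALIASES)
    return present if len(present) > 1 else []


if __name__ == "__main__":
    track_tags = {"TITLE": "Some Title", "TOTALTRACKS": "12", "tracktotal": "10"}
    print(ambiguous_total_track_tags(track_tags))
```

An empty list means at most one of the spellings is set, so there is no conflict for this field.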


Features

v1.0

  • Complete scanning process and error check
  • Verbose output (display track errors as a tree after scan)
  • JSON dump (as a -d option)
  • Basic web view for JSON dumps

v2.0

  • Fill tag from filename mode
  • Clean tags of given folder
  • Specific errors (cover size, album artist field, genre, producer, bpm)
  • Meta analyzer for scan dumps
  • Update web report to support meta analysis dumps
  • Web improvement (local storage, error handling, full responsive)
  • Library statistic mode
  • JSON generator for extended naming convention
  • Quality assurance

v3.0

  • Service usage to make ostrich call an output point when done
  • Packaging in pip (PyPi)
  • ManaZeak integration as a plugin