Merge pull request #11 from adamjakab/devel
Merging low and high level extraction
adamjakab committed Apr 4, 2021
2 parents 2f67dda + 0cc493d commit 06bc019
Showing 32 changed files with 121 additions and 125 deletions.
Empty file modified .github/ISSUE_TEMPLATE/bug-report.md
100644 → 100755
Empty file.
Empty file modified .github/ISSUE_TEMPLATE/feature-request.md
100644 → 100755
Empty file.
Empty file modified .github/workflows/pythonpublish.yml
100644 → 100755
Empty file.
1 change: 1 addition & 0 deletions .gitignore
100644 → 100755
@@ -2,6 +2,7 @@
tmp/
coverage/
BEETSDIR/*
Data/*


### Python template
Empty file modified .idea/.gitignore
100644 → 100755
Empty file.
4 changes: 3 additions & 1 deletion .idea/BeetsPluginXtractor.iml
100644 → 100755

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Empty file modified .idea/codeStyles/codeStyleConfig.xml
100644 → 100755
Empty file.
Empty file modified .idea/copyright/JACK.xml
100644 → 100755
Empty file.
Empty file modified .idea/copyright/profiles_settings.xml
100644 → 100755
Empty file.
Empty file modified .idea/inspectionProfiles/profiles_settings.xml
100644 → 100755
Empty file.
Empty file modified .idea/misc.xml
100644 → 100755
Empty file.
Empty file modified .idea/modules.xml
100644 → 100755
Empty file.
Empty file modified .idea/other.xml
100644 → 100755
Empty file.
Empty file modified .idea/rSettings.xml
100644 → 100755
Empty file.
Empty file modified .idea/vcs.xml
100644 → 100755
Empty file.
Empty file modified .travis.yml
100644 → 100755
Empty file.
Empty file modified LICENSE.txt
100644 → 100755
Empty file.
Empty file modified MANIFEST.in
100644 → 100755
Empty file.
52 changes: 34 additions & 18 deletions README.md
100644 → 100755
@@ -6,10 +6,14 @@

# Xtractor (Beets Plugin)

The *beets-xtractor* plugin lets you, through the use of the [Essentia](https://essentia.upf.edu/index.html) extractors, to obtain low and high level musical information about your songs.

Currently, the following attributes are extracted for each library item: `average_loudness`, `bpm`, `danceable`, `gender`, `genre_rosamerica`, `voice_instrumental`, `mood_acoustic`, `mood_aggressive`, `mood_electronic`, `mood_happy`, `mood_party`, `mood_relaxed`, `mood_sad` (some more to come soon)
The *beets-xtractor* plugin uses the [Essentia](https://essentia.upf.edu/index.html) extractors
to obtain low and high level musical information from your songs.

Currently, the following attributes are extracted for each library item:
`bpm`, `danceability`, `beats_count`, `average_loudness`, `danceable`, `gender`, `is_male`, `is_female`,
`genre_rosamerica`, `voice_instrumental`, `is_voice`, `is_instrumental`, `mood_acoustic`,
`mood_aggressive`, `mood_electronic`, `mood_happy`, `mood_sad`, `mood_party`, `mood_relaxed`, `mood_mirex`,
`mood_mirex_cluster_1`, `mood_mirex_cluster_2`, `mood_mirex_cluster_3`, `mood_mirex_cluster_4`, `mood_mirex_cluster_5`

## Installation
The plugin can be installed via:
@@ -25,15 +29,21 @@ plugins:
```

### Install the Essentia extractors
You will also need the two binary extractors from the [Essentia project](#credits). They are called:

- streaming_extractor_music
- streaming_extractor_music_svm
You will also need the `streaming_extractor_music` binary extractor from the [Essentia project](#credits). You will need
to compile this extractor yourself.
The [official installation documentation](https://essentia.upf.edu/installing.html#compiling-essentia-from-source)
is somewhat complex, but with some searching on the internet you should manage. If you are stuck you can use
the [Issue tracker](https://github.com/adamjakab/BeetsPluginXtractor/issues). Make sure you compile it with Gaia
support (`--with-gaia`), otherwise you will not be able to use the high level models.

Unfortunately, only the first extractor is readily available for download whilst to have the second one you will need to compile it yourself. The [official installation documentation](https://essentia.upf.edu/installing.html) is somewhat complex but with some cross searching on internet you will make it. If you are stuck you can use the [Issue tracker](https://github.com/adamjakab/BeetsPluginXtractor/issues). Make sure you compile with Gaia support (`--with-gaia`) otherwise your second `streaming_extractor_music_svm` will not be built.

### Download the SVM models
The second extractor uses prebuilt trained models for prediction. You need to download these from here: [SVM Models](https://essentia.upf.edu/svm_models/) I suggest that you download the more recent beta5 version. This means that your binaries must match this version. Put the downloaded models in any folder from which they can be accessed.

The extractor uses prebuilt trained models for prediction. You need to download these from
here: [SVM Models](https://essentia.upf.edu/svm_models/). I suggest that you download the more recent beta5 version.
This means that your extractor binary must match this version. Put the downloaded models in any folder from which they
can be accessed.


## Configuration
@@ -47,13 +57,11 @@ xtractor:
threads: 1
force: no
quiet: no
items_per_run: 0
keep_output: yes
keep_profiles: no
output_path: /mnt/data/xtraction_data
low_level_extractor: /mnt/data/extractors/beta5/streaming_extractor_music
high_level_extractor: /mnt/data/extractors/beta5/streaming_extractor_music_svm
high_level_profile:
essentia_extractor: /mnt/data/extractors/beta5/streaming_extractor_music
extractor_profile:
highlevel:
svm_models:
- /mnt/data/extractors/beta5/svm_models/danceability.history
@@ -63,17 +71,25 @@ xtractor:
- /mnt/data/extractors/beta5/svm_models/mood_aggressive.history
- /mnt/data/extractors/beta5/svm_models/mood_electronic.history
- /mnt/data/extractors/beta5/svm_models/mood_happy.history
- /mnt/data/extractors/beta5/svm_models/mood_sad.history
- /mnt/data/extractors/beta5/svm_models/mood_party.history
- /mnt/data/extractors/beta5/svm_models/mood_relaxed.history
- /mnt/data/extractors/beta5/svm_models/mood_sad.history
- /mnt/data/extractors/beta5/svm_models/voice_instrumental.history
- /mnt/data/extractors/beta5/svm_models/moods_mirex.history
```

First of all, you will need adjust all paths. Put the paths of the extractor binaries in `low_level_extractor`and `high_level_extractor`, substitute the location of the SVM models with your local path under the `svm_models` desction. And finally, set the `output_path` to indicate where the extracted data files will be stored. I you do not set this, a temporary path will be used.

By default both `keep_output` and `keep_profile` options are set to `no`. This means that after extraction (and the storage of the important information) the profile files used to pass to the extractors and the json files created by the extractors will be deleted. There are various reasons you might want to keep these files. One is for debugging purposes. Another is to see what else is in these files (there is a lot) and maybe to use them with some other projects of yours. Lastly, you might want to keep these because the plugin only extracts data if these files are not present. If you store them, on a successive extraction, the plugin will skip the extraction and use these files (they are named by `mb_trackid`) - speeding up the process a lot.

The `items_per_run` set to 0 will execute on all items. If you want to limit the number of items per execution (maybe because you want to run a nightly cron job in a limited timeframe) you can use this.
First of all, you will need to adjust all paths. Put the path of the extractor binary in `essentia_extractor` and
substitute the location of the SVM models with your local path under the `svm_models` section. Finally, set
the `output_path` to indicate where the extracted data files will be stored. If you do not set this, a temporary path
will be used.
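
A quick way to check the paths mentioned above before running the plugin is sketched below. It is only an illustration: `missing_paths` is a hypothetical helper (not part of the plugin), and it assumes the `xtractor` section has already been loaded into a plain dictionary with the same keys as the YAML sample.

```python
import os


def missing_paths(xtractor_cfg):
    """Return the configured paths that are unset or do not exist on disk."""
    # Hypothetical helper: key names follow the YAML sample in this README.
    candidates = [xtractor_cfg.get("essentia_extractor")]
    profile = xtractor_cfg.get("extractor_profile", {})
    candidates += profile.get("highlevel", {}).get("svm_models", [])
    return [p for p in candidates if not p or not os.path.exists(p)]
```

An empty list means the extractor binary and every listed SVM model were found.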

By default both the `keep_output` and `keep_profiles` options are set to `no`. This means that after extraction (and the
storage of the important information) the profile file passed to the extractor and the JSON files created by
the extractor will be deleted. There are various reasons you might want to keep these files. One is for debugging
purposes. Another is to see what else is in these files (there is a lot) and maybe to use them with some other projects
of yours. Lastly, you might want to keep these because the plugin only extracts data if these files are not present. If
you store them, on a subsequent run the plugin will skip the extraction and use these files (they are named
by `mb_trackid`), speeding up the process a lot.

The `force` option instructs the plugin to execute on items which already have the required properties.
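
The caching rule described above (output files named by `mb_trackid`, extraction skipped when the file is already there) can be sketched roughly as follows. This is an illustration rather than the plugin's code: `output_path_for` and `needs_extraction` are hypothetical names, and the fallback to an MD5 hash of the file path mirrors what `_get_output_path_for_item` does in this commit.

```python
import hashlib
import os


def output_path_for(track_id, input_path, output_dir):
    """Return the JSON path used to cache one item's extraction result."""
    # Prefer the MusicBrainz track id; fall back to a hash of the file path.
    identifier = track_id or hashlib.md5(input_path.encode("utf-8")).hexdigest()
    return os.path.join(output_dir, "{}.json".format(identifier))


def needs_extraction(track_id, input_path, output_dir):
    """Extraction can be skipped when a cached output file already exists."""
    return not os.path.isfile(output_path_for(track_id, input_path, output_dir))
```

With `keep_output: no` the cached file is removed after each item, so a check like `needs_extraction` would always report that extraction is required.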

Empty file modified beetsplug/__init__.py
100644 → 100755
Empty file.
12 changes: 11 additions & 1 deletion beetsplug/xtractor/__init__.py
100644 → 100755
@@ -8,7 +8,6 @@

from beets.plugins import BeetsPlugin
from beets.util.confit import ConfigSource, load_yaml

from beetsplug.xtractor.command import XtractorCommand


@@ -21,5 +20,16 @@ def __init__(self):
source = ConfigSource(load_yaml(config_file_path) or {}, config_file_path)
self.config.add(source)

# @todo: activate this to store the attributes in media files
# field = mediafile.MediaField(
# mediafile.MP3DescStorageStyle(u'danceability'), mediafile.StorageStyle(u'danceability')
# )
# self.add_media_field('danceability', field)
#
# field = mediafile.MediaField(
# mediafile.MP3DescStorageStyle(u'beats_count'), mediafile.StorageStyle(u'beats_count')
# )
# self.add_media_field('beats_count', field)

def commands(self):
return [XtractorCommand(self.config)]
4 changes: 2 additions & 2 deletions beetsplug/xtractor/about.py
100644 → 100755
@@ -6,8 +6,8 @@
__email__ = u'adam@jakab.pro'
__copyright__ = u'Copyright (c) 2020, {} <{}>'.format(__author__, __email__)
__license__ = u'License :: OSI Approved :: MIT License'
__version__ = u'0.2.3'
__status__ = u'Kickstarted'
__version__ = u'0.3.0'
__status__ = u'Building'

__PACKAGE_TITLE__ = u'Xtractor'
__PACKAGE_NAME__ = u'beets-xtractor'
105 changes: 32 additions & 73 deletions beetsplug/xtractor/command.py
100644 → 100755
@@ -33,7 +33,6 @@ class XtractorCommand(Subcommand):
cfg_threads = 1
cfg_force = False
cfg_quiet = False
cfg_items_per_run = 0

def __init__(self, config):
self.config = config
@@ -47,7 +46,6 @@ def __init__(self, config):
self.cfg_version = False
self.cfg_count_only = False
self.cfg_quiet = cfg.get("quiet")
self.cfg_items_per_run = cfg.get("items_per_run")

self.parser = OptionParser(
usage='beet {plg} [options] [QUERY...]'.format(
@@ -131,24 +129,18 @@ def func(self, lib: Library, options, arguments):

def xtract(self):
self.find_items_to_analyse()
self._say("Number of items to be processed: {}".format(len(self.items_to_analyse)))
self._say("Number of items to be processed: {}".format(len(self.items_to_analyse)), False)

# Count only and exit
if self.cfg_count_only:
return

# Limit the number of items per run (0 means no limit)
if self.cfg_items_per_run != 0:
self.items_to_analyse = self.items_to_analyse[:self.cfg_items_per_run]
self._say("Number of items selected: {}".format(len(self.items_to_analyse)))

# Run tasks on selected items
self._execute_on_each_items(self.items_to_analyse, self.run_full_analysis)

# Delete profiles (if config wants)
if self.config["keep_profiles"].exists() and not self.config["keep_profiles"].get():
os.unlink(self._get_extractor_profile_path("low"))
os.unlink(self._get_extractor_profile_path("high"))
os.unlink(self._get_extractor_profile_path())

def find_items_to_analyse(self):
# Parse the incoming query
@@ -180,16 +172,12 @@ def find_items_to_analyse(self):
return

def run_full_analysis(self, item):
self._run_analysis_low_level(item)
self._run_analysis_high_level(item)
self._run_write_to_item(item)
self._run_analysis(item)
# self._run_write_to_item(item)

# Delete output files (if config wants)
if self.config["keep_output"].exists() and not self.config["keep_output"].get():
output_path = self._get_output_path_for_item(item, suffix="low")
if os.path.isfile(output_path):
os.unlink(output_path)
output_path = self._get_output_path_for_item(item, suffix="high")
output_path = self._get_output_path_for_item(item)
if os.path.isfile(output_path):
os.unlink(output_path)

@@ -198,12 +186,12 @@ def _run_write_to_item(self, item):
if self.cfg_write:
item.try_write()

def _run_analysis_high_level(self, item):
def _run_analysis(self, item):
try:
extractor_path = self._get_extractor_path(level="high")
input_path = self._get_output_path_for_item(item, suffix="low")
output_path = self._get_output_path_for_item(item, suffix="high")
profile_path = self._get_extractor_profile_path(level="high")
extractor_path = self._get_extractor_path()
input_path = self._get_input_path_for_item(item)
output_path = self._get_output_path_for_item(item)
profile_path = self._get_extractor_profile_path()
except ValueError as e:
self._say("Value error: {0}".format(e))
return
@@ -214,50 +202,28 @@ def _run_analysis_high_level(self, item):
self._say("File not found error: {0}".format(e))
return

self._say("Running high-level analysis: {0}".format(input_path))
self._say("Running analysis for: {0}".format(input_path))
self._run_essentia_extractor(extractor_path, input_path, output_path, profile_path)

# Extract low level targets
try:
target_map = self.config["high_level_targets"]
audiodata = helper.extract_from_output(output_path, target_map)
self._say("Audiodata(High): {}".format(audiodata))
audiodata_low = helper.extract_from_output(output_path, self.config["low_level_targets"])
except FileNotFoundError as e:
self._say("File not found: {0}".format(e))
return

if not self.cfg_dry_run:
for attr in audiodata.keys():
if audiodata.get(attr):
setattr(item, attr, audiodata.get(attr))
item.store()

def _run_analysis_low_level(self, item):
# Extract high level targets
try:
extractor_path = self._get_extractor_path(level="low")
input_path = self._get_input_path_for_item(item)
output_path = self._get_output_path_for_item(item, suffix="low")
profile_path = self._get_extractor_profile_path(level="low")
except ValueError as e:
self._say("Value error: {0}".format(e))
return
except KeyError as e:
self._say("Configuration error: {0}".format(e))
return
except FileNotFoundError as e:
self._say("File not found error: {0}".format(e))
return

self._say("Running low-level analysis: {0}".format(input_path))
self._run_essentia_extractor(extractor_path, input_path, output_path, profile_path)

try:
target_map = self.config["low_level_targets"]
audiodata = helper.extract_from_output(output_path, target_map)
self._say("Audiodata(Low): {}".format(audiodata))
audiodata_high = helper.extract_from_output(output_path, self.config["high_level_targets"])
except FileNotFoundError as e:
self._say("File not found: {0}".format(e))
return

# Merge audio data
audiodata = {**audiodata_low, **audiodata_high}
self._say("Audiodata: {}".format(audiodata))

# Update and Store Item
if not self.cfg_dry_run:
for attr in audiodata.keys():
if audiodata.get(attr):
@@ -280,10 +246,10 @@ def _run_essentia_extractor(self, extractor_path, input_path, output_path, profi

self._say("The process exited with code: {0}".format(proc.returncode))
self._say("Process stdout: {0}".format(stdout.decode()))
self._say("Process stderr: {0}".format(stderr.decode()))
self._say("Process stderr: {0}\n".format(stderr.decode()))

# Make sure file is encoded correctly (sometimes media files have
# funky tags)
# Make sure file is encoded correctly
# Sometimes media files have funky tags
helper.asciify_file_content(output_path)

def _execute_on_each_items(self, items, func):
@@ -294,16 +260,15 @@ def _execute_on_each_items(self, items, func):
finished += 1
# todo: show a progress bar (--progress-only option)

def _get_output_path_for_item(self, item: Item, suffix=""):
def _get_output_path_for_item(self, item: Item):
identifier = item.get("mb_trackid")
if not identifier:
input_path = self._get_input_path_for_item(item)
identifier = hashlib.md5(input_path.encode('utf-8')).hexdigest()

output_file = "{id}{sfx}{ext}".format(
output_file = "{id}.{ext}".format(
id=identifier,
sfx=".{}".format(suffix) if suffix else "",
ext=".json"
ext="json"
)

return os.path.join(self._get_extraction_output_path(), output_file)
@@ -331,12 +296,9 @@ def _get_extraction_output_path(self):

return output_path

def _get_extractor_profile_path(self, level):
if level not in ("low", "high"):
raise ValueError("Profile level must be either 'low' or 'high'. Given: {}".format(level))

profile_key = "{}_level_profile".format(level)
profile_filename = "{}.yml".format(profile_key)
def _get_extractor_profile_path(self):
profile_key = "extractor_profile"
profile_filename = "profile.yml"
profile_path = os.path.join(self._get_extraction_output_path(), profile_filename)

if not os.path.isfile(profile_path):
@@ -346,7 +308,7 @@ def _get_extractor_profile_path(self, level):

profile_content = self.config[profile_key].flatten()
profile_content = json.loads(json.dumps(profile_content))
# Override outputFormat (we only hande json for now)
# Override outputFormat (we only handle json for now)
profile_content["outputFormat"] = "json"

f = open(profile_path, 'w+')
@@ -357,11 +319,8 @@ def _get_extractor_profile_path(self, level):

return profile_path

def _get_extractor_path(self, level):
if level not in ("low", "high"):
raise ValueError("Extractor level must be either 'low' or 'high'. Given: {}".format(level))

extractor_key = "{}_level_extractor".format(level)
def _get_extractor_path(self):
extractor_key = "essentia_extractor"
if not self.config[extractor_key].exists():
raise KeyError("Key '{}' is not defined".format(extractor_key))
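
The merged `_run_analysis` in this file reads both target groups from a single output file, combines them with `{**audiodata_low, **audiodata_high}`, and then copies only truthy values onto the item. The sketch below isolates that behaviour; `merge_audiodata` and `apply_to_item` are hypothetical names used for illustration, and any object that accepts attributes stands in for a beets `Item`.

```python
def merge_audiodata(low, high):
    """Combine both target groups; on a key clash the high-level value wins."""
    return {**low, **high}


def apply_to_item(item, audiodata):
    """Copy truthy values onto the item, mirroring the loop in the diff."""
    for attr, value in audiodata.items():
        if value:  # note: 0, 0.0, "" and None are all skipped
            setattr(item, attr, value)
    return item
```

One consequence of the truthiness check is that a legitimate value of `0` is never written, which may matter for numeric attributes.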

