Merge pull request #11 from adamjakab/devel

Merging low and high level extraction
adamjakab · Apr 4, 2021 · 06bc019
2 parents 2f67dda + 0cc493d
commit 06bc019
Showing 32 changed files with 121 additions and 125 deletions.
diff --git a/.github/ISSUE_TEMPLATE/bug-report.md b/.github/ISSUE_TEMPLATE/bug-report.md
diff --git a/.github/ISSUE_TEMPLATE/feature-request.md b/.github/ISSUE_TEMPLATE/feature-request.md
diff --git a/.github/workflows/pythonpublish.yml b/.github/workflows/pythonpublish.yml
diff --git a/.gitignore b/.gitignore
@@ -2,6 +2,7 @@
 tmp/
 coverage/
 BEETSDIR/*
+Data/*
 
 
 ### Python template

diff --git a/.idea/.gitignore b/.idea/.gitignore
diff --git a/.idea/BeetsPluginXtractor.iml b/.idea/BeetsPluginXtractor.iml
diff --git a/.idea/codeStyles/codeStyleConfig.xml b/.idea/codeStyles/codeStyleConfig.xml
diff --git a/.idea/copyright/JACK.xml b/.idea/copyright/JACK.xml
diff --git a/.idea/copyright/profiles_settings.xml b/.idea/copyright/profiles_settings.xml
diff --git a/.idea/inspectionProfiles/profiles_settings.xml b/.idea/inspectionProfiles/profiles_settings.xml
diff --git a/.idea/misc.xml b/.idea/misc.xml
diff --git a/.idea/modules.xml b/.idea/modules.xml
diff --git a/.idea/other.xml b/.idea/other.xml
diff --git a/.idea/rSettings.xml b/.idea/rSettings.xml
diff --git a/.idea/vcs.xml b/.idea/vcs.xml
diff --git a/.travis.yml b/.travis.yml
diff --git a/LICENSE.txt b/LICENSE.txt
diff --git a/MANIFEST.in b/MANIFEST.in
diff --git a/README.md b/README.md
@@ -6,10 +6,14 @@
 
 # Xtractor (Beets Plugin)
 
-The *beets-xtractor* plugin lets you, through the use of the [Essentia](https://essentia.upf.edu/index.html) extractors, to obtain low and high level musical information about your songs.
-
-Currently, the following attributes are extracted for each library item: `average_loudness`, `bpm`, `danceable`, `gender`, `genre_rosamerica`, `voice_instrumental`, `mood_acoustic`, `mood_aggressive`, `mood_electronic`, `mood_happy`, `mood_party`, `mood_relaxed`, `mood_sad` (some more to come soon)
+The *beets-xtractor* plugin lets you, through the use of the [Essentia](https://essentia.upf.edu/index.html) extractors,
+obtain low- and high-level musical information from your songs.
 
+Currently, the following attributes are extracted for each library item:
+`bpm`, `danceability`, `beats_count`, `average_loudness`, `danceable`, `gender`, `is_male`, `is_female`,
+`genre_rosamerica`, `voice_instrumental`, `is_voice`, `is_instrumental`, `mood_acoustic`,
+`mood_aggressive`, `mood_electronic`, `mood_happy`, `mood_sad`, `mood_party`, `mood_relaxed`, `mood_mirex`,
+`mood_mirex_cluster_1`, `mood_mirex_cluster_2`, `mood_mirex_cluster_3`, `mood_mirex_cluster_4`, `mood_mirex_cluster_5`
 
 ## Installation
 The plugin can be installed via:
@@ -25,15 +29,21 @@ plugins:
 ```
 
 ### Install the Essentia extractors
-You will also need the two binary extractors from the [Essentia project](#credits). They are called:
 
-- streaming_extractor_music
-- streaming_extractor_music_svm
+You will also need the `streaming_extractor_music` binary extractor from the [Essentia project](#credits), which you
+will need to compile yourself.
+The [official installation documentation](https://essentia.upf.edu/installing.html#compiling-essentia-from-source)
+is somewhat complex, but with some searching on the internet you should manage. If you get stuck, you can use
+the [Issue tracker](https://github.com/adamjakab/BeetsPluginXtractor/issues). Make sure you compile it with Gaia
+support (`--with-gaia`); otherwise you will not be able to use the high-level models.
 
-Unfortunately, only the first extractor is readily available for download whilst to have the second one you will need to compile it yourself. The [official installation documentation](https://essentia.upf.edu/installing.html) is somewhat complex but with some cross searching on internet you will make it. If you are stuck you can use the [Issue tracker](https://github.com/adamjakab/BeetsPluginXtractor/issues). Make sure you compile with Gaia support (`--with-gaia`) otherwise your second `streaming_extractor_music_svm` will not be built.
 
 ### Download the SVM models
-The second extractor uses prebuilt trained models for prediction. You need to download these from here: [SVM Models](https://essentia.upf.edu/svm_models/) I suggest that you download the more recent beta5 version. This means that your binaries must match this version. Put the downloaded models in any folder from which they can be accessed.
+
+The extractor uses prebuilt, trained SVM models for its high-level predictions. You need to download these from
+here: [SVM Models](https://essentia.upf.edu/svm_models/). I suggest that you download the more recent beta5 version.
+This means that your extractor binary must match this version. Put the downloaded models in any folder the extractor
+can read from.
 
 
 ## Configuration
@@ -47,13 +57,11 @@ xtractor:
     threads: 1
     force: no
     quiet: no
-    items_per_run: 0
     keep_output: yes
     keep_profiles: no
     output_path: /mnt/data/xtraction_data
-    low_level_extractor: /mnt/data/extractors/beta5/streaming_extractor_music
-    high_level_extractor: /mnt/data/extractors/beta5/streaming_extractor_music_svm
-    high_level_profile:
+    essentia_extractor: /mnt/data/extractors/beta5/streaming_extractor_music
+    extractor_profile:
         highlevel:
             svm_models:
                 - /mnt/data/extractors/beta5/svm_models/danceability.history
@@ -63,17 +71,25 @@ xtractor:
                 - /mnt/data/extractors/beta5/svm_models/mood_aggressive.history
                 - /mnt/data/extractors/beta5/svm_models/mood_electronic.history
                 - /mnt/data/extractors/beta5/svm_models/mood_happy.history
+                - /mnt/data/extractors/beta5/svm_models/mood_sad.history
                 - /mnt/data/extractors/beta5/svm_models/mood_party.history
                 - /mnt/data/extractors/beta5/svm_models/mood_relaxed.history
-                - /mnt/data/extractors/beta5/svm_models/mood_sad.history
                 - /mnt/data/extractors/beta5/svm_models/voice_instrumental.history
+                - /mnt/data/extractors/beta5/svm_models/moods_mirex.history
 ```
 
-First of all, you will need adjust all paths. Put the paths of the extractor binaries in `low_level_extractor`and `high_level_extractor`, substitute the location of the SVM models with your local path under the `svm_models` desction. And finally, set the `output_path` to indicate where the extracted data files will be stored. I you do not set this, a temporary path will be used.
-
-By default both `keep_output` and `keep_profile` options are set to `no`. This means that after extraction (and the storage of the important information) the profile files used to pass to the extractors and the json files created by the extractors will be deleted. There are various reasons you might want to keep these files. One is for debugging purposes.  Another is to see what else is in these files (there is a lot) and maybe to use them with some other projects of yours. Lastly, you might want to keep these because the plugin only extracts data if these files are not present. If you store them, on a successive extraction, the plugin will skip the extraction and use these files (they are named by `mb_trackid`) - speeding up the process a lot.
-
-The `items_per_run` set to 0 will execute on all items. If you want to limit the number of items per execution (maybe because you want to run a nightly cron job in a limited timeframe) you can use this.
+First of all, you will need to adjust all paths. Put the path of the extractor binary in `essentia_extractor` and
+substitute the location of the SVM models with your local path under the `svm_models` section. Finally, set
+the `output_path` to indicate where the extracted data files will be stored. If you do not set this, a temporary path
+will be used.
+
+By default, both the `keep_output` and `keep_profiles` options are set to `no`. This means that after extraction (and the
+storage of the important information) the profile file passed to the extractor and the JSON files it creates
+will be deleted. There are various reasons you might want to keep these files. One is for debugging
+purposes. Another is to see what else is in these files (there is a lot) and maybe to use them in other projects
+of yours. Lastly, you might want to keep them because the plugin only extracts data if these files are not present. If
+you keep them, on a subsequent extraction the plugin will skip the extraction and use these files (they are named
+by `mb_trackid`), speeding up the process considerably.
 
 The `force` option instructs the plugin to execute on items which already have the required properties.
 

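A wrong path in the configuration typically only surfaces once the extractor actually runs, so a quick pre-flight check can save a failed batch. The following is a minimal sketch, not part of the plugin: it assumes the `xtractor` section has already been loaded into a plain dict with the keys used in the example configuration above (`essentia_extractor` and `extractor_profile` → `highlevel` → `svm_models`), and the helper name `find_missing_paths` is hypothetical.

```python
import os
from typing import Any, Dict, List


def find_missing_paths(xtractor_cfg: Dict[str, Any]) -> List[str]:
    """Return configured paths that do not point to an existing file.

    Hypothetical helper: assumes `xtractor_cfg` mirrors the README example,
    i.e. an `essentia_extractor` binary path plus a list of SVM model files
    under `extractor_profile -> highlevel -> svm_models`.
    """
    missing = []

    # The extractor binary itself must exist
    extractor = xtractor_cfg.get("essentia_extractor")
    if not extractor or not os.path.isfile(extractor):
        missing.append(extractor or "<essentia_extractor not set>")

    # Every listed SVM model (*.history) must exist as well
    highlevel = xtractor_cfg.get("extractor_profile", {}).get("highlevel", {})
    for model in highlevel.get("svm_models", []):
        if not os.path.isfile(model):
            missing.append(model)

    return missing


if __name__ == "__main__":
    cfg = {
        "essentia_extractor": "/mnt/data/extractors/beta5/streaming_extractor_music",
        "extractor_profile": {"highlevel": {"svm_models": [
            "/mnt/data/extractors/beta5/svm_models/mood_sad.history",
        ]}},
    }
    for path in find_missing_paths(cfg):
        print("Missing:", path)
```

An empty result means every configured file was found; it does not verify that the binary was built with Gaia support.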
diff --git a/beetsplug/__init__.py b/beetsplug/__init__.py
diff --git a/beetsplug/xtractor/__init__.py b/beetsplug/xtractor/__init__.py
@@ -8,7 +8,6 @@
 
 from beets.plugins import BeetsPlugin
 from beets.util.confit import ConfigSource, load_yaml
-
 from beetsplug.xtractor.command import XtractorCommand
 
 
@@ -21,5 +20,16 @@ def __init__(self):
         source = ConfigSource(load_yaml(config_file_path) or {}, config_file_path)
         self.config.add(source)
 
+        # @todo: activate this to store the attributes in media files
+        # field = mediafile.MediaField(
+        #     mediafile.MP3DescStorageStyle(u'danceability'), mediafile.StorageStyle(u'danceability')
+        # )
+        # self.add_media_field('danceability', field)
+        #
+        # field = mediafile.MediaField(
+        #     mediafile.MP3DescStorageStyle(u'beats_count'), mediafile.StorageStyle(u'beats_count')
+        # )
+        # self.add_media_field('beats_count', field)
+
     def commands(self):
         return [XtractorCommand(self.config)]
diff --git a/beetsplug/xtractor/about.py b/beetsplug/xtractor/about.py
@@ -6,8 +6,8 @@
 __email__ = u'adam@jakab.pro'
 __copyright__ = u'Copyright (c) 2020, {} <{}>'.format(__author__, __email__)
 __license__ = u'License :: OSI Approved :: MIT License'
-__version__ = u'0.2.3'
-__status__ = u'Kickstarted'
+__version__ = u'0.3.0'
+__status__ = u'Building'
 
 __PACKAGE_TITLE__ = u'Xtractor'
 __PACKAGE_NAME__ = u'beets-xtractor'

diff --git a/beetsplug/xtractor/command.py b/beetsplug/xtractor/command.py
@@ -33,7 +33,6 @@ class XtractorCommand(Subcommand):
     cfg_threads = 1
     cfg_force = False
     cfg_quiet = False
-    cfg_items_per_run = 0
 
     def __init__(self, config):
         self.config = config
@@ -47,7 +46,6 @@ def __init__(self, config):
         self.cfg_version = False
         self.cfg_count_only = False
         self.cfg_quiet = cfg.get("quiet")
-        self.cfg_items_per_run = cfg.get("items_per_run")
 
         self.parser = OptionParser(
             usage='beet {plg} [options] [QUERY...]'.format(
@@ -131,24 +129,18 @@ def func(self, lib: Library, options, arguments):
 
     def xtract(self):
         self.find_items_to_analyse()
-        self._say("Number of items to be processed: {}".format(len(self.items_to_analyse)))
+        self._say("Number of items to be processed: {}".format(len(self.items_to_analyse)), False)
 
         # Count only and exit
         if self.cfg_count_only:
             return
 
-        # Limit the number of items per run (0 means no limit)
-        if self.cfg_items_per_run != 0:
-            self.items_to_analyse = self.items_to_analyse[:self.cfg_items_per_run]
-        self._say("Number of items selected: {}".format(len(self.items_to_analyse)))
-
         # Run tasks on selected items
         self._execute_on_each_items(self.items_to_analyse, self.run_full_analysis)
 
         # Delete profiles (if config wants)
         if self.config["keep_profiles"].exists() and not self.config["keep_profiles"].get():
-            os.unlink(self._get_extractor_profile_path("low"))
-            os.unlink(self._get_extractor_profile_path("high"))
+            os.unlink(self._get_extractor_profile_path())
 
     def find_items_to_analyse(self):
         # Parse the incoming query
@@ -180,16 +172,12 @@ def find_items_to_analyse(self):
             return
 
     def run_full_analysis(self, item):
-        self._run_analysis_low_level(item)
-        self._run_analysis_high_level(item)
-        self._run_write_to_item(item)
+        self._run_analysis(item)
+        # self._run_write_to_item(item)
 
         # Delete output files (if config wants)
         if self.config["keep_output"].exists() and not self.config["keep_output"].get():
-            output_path = self._get_output_path_for_item(item, suffix="low")
-            if os.path.isfile(output_path):
-                os.unlink(output_path)
-            output_path = self._get_output_path_for_item(item, suffix="high")
+            output_path = self._get_output_path_for_item(item)
             if os.path.isfile(output_path):
                 os.unlink(output_path)
 
@@ -198,12 +186,12 @@ def _run_write_to_item(self, item):
             if self.cfg_write:
                 item.try_write()
 
-    def _run_analysis_high_level(self, item):
+    def _run_analysis(self, item):
         try:
-            extractor_path = self._get_extractor_path(level="high")
-            input_path = self._get_output_path_for_item(item, suffix="low")
-            output_path = self._get_output_path_for_item(item, suffix="high")
-            profile_path = self._get_extractor_profile_path(level="high")
+            extractor_path = self._get_extractor_path()
+            input_path = self._get_input_path_for_item(item)
+            output_path = self._get_output_path_for_item(item)
+            profile_path = self._get_extractor_profile_path()
         except ValueError as e:
             self._say("Value error: {0}".format(e))
             return
@@ -214,50 +202,28 @@ def _run_analysis_high_level(self, item):
             self._say("File not found error: {0}".format(e))
             return
 
-        self._say("Running high-level analysis: {0}".format(input_path))
+        self._say("Running analysis for: {0}".format(input_path))
         self._run_essentia_extractor(extractor_path, input_path, output_path, profile_path)
 
+        # Extract low level targets
         try:
-            target_map = self.config["high_level_targets"]
-            audiodata = helper.extract_from_output(output_path, target_map)
-            self._say("Audiodata(High): {}".format(audiodata))
+            audiodata_low = helper.extract_from_output(output_path, self.config["low_level_targets"])
         except FileNotFoundError as e:
             self._say("File not found: {0}".format(e))
             return
 
-        if not self.cfg_dry_run:
-            for attr in audiodata.keys():
-                if audiodata.get(attr):
-                    setattr(item, attr, audiodata.get(attr))
-            item.store()
-
-    def _run_analysis_low_level(self, item):
+        # Extract high level targets
         try:
-            extractor_path = self._get_extractor_path(level="low")
-            input_path = self._get_input_path_for_item(item)
-            output_path = self._get_output_path_for_item(item, suffix="low")
-            profile_path = self._get_extractor_profile_path(level="low")
-        except ValueError as e:
-            self._say("Value error: {0}".format(e))
-            return
-        except KeyError as e:
-            self._say("Configuration error: {0}".format(e))
-            return
-        except FileNotFoundError as e:
-            self._say("File not found error: {0}".format(e))
-            return
-
-        self._say("Running low-level analysis: {0}".format(input_path))
-        self._run_essentia_extractor(extractor_path, input_path, output_path, profile_path)
-
-        try:
-            target_map = self.config["low_level_targets"]
-            audiodata = helper.extract_from_output(output_path, target_map)
-            self._say("Audiodata(Low): {}".format(audiodata))
+            audiodata_high = helper.extract_from_output(output_path, self.config["high_level_targets"])
         except FileNotFoundError as e:
             self._say("File not found: {0}".format(e))
             return
 
+        # Merge audio data
+        audiodata = {**audiodata_low, **audiodata_high}
+        self._say("Audiodata: {}".format(audiodata))
+
+        # Update and Store Item
         if not self.cfg_dry_run:
             for attr in audiodata.keys():
                 if audiodata.get(attr):
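The merge `{**audiodata_low, **audiodata_high}` gives the high-level value precedence whenever both target maps produce the same key, and the loop that follows only copies values that are truthy onto the item. The sketch below reproduces both behaviours on plain dicts; the function names and sample keys are illustrative only. One consequence worth noting: with the truthiness check, legitimate values such as `0`, `0.0` or `False` are skipped rather than stored. If that is not intended, testing `is not None` instead would keep them, as the `skip_falsy=False` branch shows.

```python
from typing import Any, Dict


def merge_audiodata(low: Dict[str, Any], high: Dict[str, Any]) -> Dict[str, Any]:
    """Merge low- and high-level results; high-level values win on key clashes."""
    return {**low, **high}


def attributes_to_store(audiodata: Dict[str, Any], skip_falsy: bool = True) -> Dict[str, Any]:
    """Select the attributes that would be set on the item.

    skip_falsy=True mirrors `if audiodata.get(attr):` in the diff, which drops
    0, 0.0, False and empty strings as well as None.
    skip_falsy=False drops only None values.
    """
    if skip_falsy:
        return {k: v for k, v in audiodata.items() if v}
    return {k: v for k, v in audiodata.items() if v is not None}


if __name__ == "__main__":
    low = {"bpm": 120, "beats_count": 0}
    high = {"danceable": 0.0, "genre_rosamerica": "roc"}
    merged = merge_audiodata(low, high)
    print(attributes_to_store(merged))                    # {'bpm': 120, 'genre_rosamerica': 'roc'}
    print(attributes_to_store(merged, skip_falsy=False))  # all four keys are kept
```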
@@ -280,10 +246,10 @@ def _run_essentia_extractor(self, extractor_path, input_path, output_path, profi
 
         self._say("The process exited with code: {0}".format(proc.returncode))
         self._say("Process stdout: {0}".format(stdout.decode()))
-        self._say("Process stderr: {0}".format(stderr.decode()))
+        self._say("Process stderr: {0}\n".format(stderr.decode()))
 
-        # Make sure file is encoded correctly (sometimes media files have
-        # funky tags)
+        # Make sure file is encoded correctly
+        # Sometimes media files have funky tags
         helper.asciify_file_content(output_path)
 
     def _execute_on_each_items(self, items, func):
@@ -294,16 +260,15 @@ def _execute_on_each_items(self, items, func):
                 finished += 1
                 # todo: show a progress bar (--progress-only option)
 
-    def _get_output_path_for_item(self, item: Item, suffix=""):
+    def _get_output_path_for_item(self, item: Item):
         identifier = item.get("mb_trackid")
         if not identifier:
             input_path = self._get_input_path_for_item(item)
             identifier = hashlib.md5(input_path.encode('utf-8')).hexdigest()
 
-        output_file = "{id}{sfx}{ext}".format(
+        output_file = "{id}.{ext}".format(
             id=identifier,
-            sfx=".{}".format(suffix) if suffix else "",
-            ext=".json"
+            ext="json"
         )
 
         return os.path.join(self._get_extraction_output_path(), output_file)
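With a single extraction per item, the `.low`/`.high` suffixes disappear and each item maps to exactly one `<identifier>.json` file. The standalone sketch below mirrors that naming logic; the function name and its parameters are illustrative, not the plugin's API, and it takes the input path as an already-decoded string. Because the name depends only on the MusicBrainz track ID (or, when that is missing, on an MD5 hash of the file path), re-running on the same item resolves to the same file, which is what allows kept output to be reused.

```python
import hashlib
import os
from typing import Optional


def output_path_for_item(output_dir: str, mb_trackid: Optional[str], input_path: str) -> str:
    """Build '<identifier>.json' inside output_dir.

    The identifier is the MusicBrainz track ID when available, otherwise the
    MD5 hex digest of the UTF-8 encoded input path, as in the diff.
    """
    identifier = mb_trackid
    if not identifier:
        identifier = hashlib.md5(input_path.encode("utf-8")).hexdigest()
    return os.path.join(output_dir, "{id}.{ext}".format(id=identifier, ext="json"))


if __name__ == "__main__":
    print(output_path_for_item("/mnt/data/xtraction_data", None, "/music/track.flac"))
```

Note that items without an `mb_trackid` get a new file name if they are moved, since the hash is taken over the path.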
@@ -331,12 +296,9 @@ def _get_extraction_output_path(self):
 
         return output_path
 
-    def _get_extractor_profile_path(self, level):
-        if level not in ("low", "high"):
-            raise ValueError("Profile level must be either 'low' or 'high'. Given: {}".format(level))
-
-        profile_key = "{}_level_profile".format(level)
-        profile_filename = "{}.yml".format(profile_key)
+    def _get_extractor_profile_path(self):
+        profile_key = "extractor_profile"
+        profile_filename = "profile.yml"
         profile_path = os.path.join(self._get_extraction_output_path(), profile_filename)
 
         if not os.path.isfile(profile_path):
@@ -346,7 +308,7 @@ def _get_extractor_profile_path(self, level):
 
             profile_content = self.config[profile_key].flatten()
             profile_content = json.loads(json.dumps(profile_content))
-            # Override outputFormat (we only hande json for now)
+            # Override outputFormat (we only handle json for now)
             profile_content["outputFormat"] = "json"
 
             f = open(profile_path, 'w+')
@@ -357,11 +319,8 @@ def _get_extractor_profile_path(self, level):
 
         return profile_path
 
-    def _get_extractor_path(self, level):
-        if level not in ("low", "high"):
-            raise ValueError("Extractor level must be either 'low' or 'high'. Given: {}".format(level))
-
-        extractor_key = "{}_level_extractor".format(level)
+    def _get_extractor_path(self):
+        extractor_key = "essentia_extractor"
         if not self.config[extractor_key].exists():
             raise KeyError("Key '{}' is not defined".format(extractor_key))
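Since one extractor now produces both the low- and high-level descriptors in a single pass, each item needs just one profile and one process run. The sketch below illustrates those two steps outside the plugin. `build_profile` mirrors the flatten-and-override logic in `_get_extractor_profile_path` (the plugin then writes the result to `profile.yml`; that writer is not shown in this diff). `extractor_command` and `run_extractor` are hypothetical helpers that assume the extractor takes its arguments in the order input file, output file, profile, as the parameter order of `_run_essentia_extractor` suggests.

```python
import json
import subprocess
from typing import Any, Dict, List, Tuple


def build_profile(extractor_profile: Dict[str, Any]) -> Dict[str, Any]:
    """Copy the configured profile and force JSON output, as the diff does."""
    # A JSON round-trip yields a deep copy made of plain dicts, lists and scalars
    profile = json.loads(json.dumps(extractor_profile))
    profile["outputFormat"] = "json"
    return profile


def extractor_command(extractor_path: str, input_path: str,
                      output_path: str, profile_path: str) -> List[str]:
    """Hypothetical argument list: <extractor> <input> <output> <profile>."""
    return [extractor_path, input_path, output_path, profile_path]


def run_extractor(cmd: List[str]) -> Tuple[int, str, str]:
    """Run a command and return (returncode, decoded stdout, decoded stderr)."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, stderr = proc.communicate()
    return proc.returncode, stdout.decode(), stderr.decode()


if __name__ == "__main__":
    profile = build_profile({"highlevel": {"svm_models": ["/path/to/mood_sad.history"]}})
    print(profile)
    print(extractor_command("/path/to/streaming_extractor_music",
                            "song.mp3", "song.json", "profile.yml"))
```

Building the argument list as a Python list (rather than a shell string) avoids quoting problems with file names that contain spaces or quotes.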