Classification models update #86

palonso · 2019-02-27T15:49:04Z

This PR includes several improvements to the creation of classification models:

- Fixes classification_project_template.yaml so that it doesn't break with the features generated by Essentia's updated music_extractor.
  1. Removes the descriptors melbands128 and bpm_histogram.
  2. Adds the new key and scale profiles.
- Adds a ranking of the 10 best parameter configurations for each model.
- Improves the behavior of train_model_from_sigs.py:
  1. It can create a project from .sig files (not only from .json).
  2. It can store the generated feature files in a separate folder.
  3. It checks that every .json file has been converted into a .sig (yaml) file; previously it only compared the number of files of each type.
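
To illustrate why comparing file counts is not enough, here is a minimal sketch of a per-file check. It is not the code from train_model_from_sigs.py; the function name `unconverted` and the example file names are made up for illustration.

```python
import os


def unconverted(json_files, sig_files):
    """Return the base names of .json files that have no matching .sig file."""
    json_names = {
        os.path.splitext(os.path.basename(f))[0]
        for f in json_files
        if f.endswith(".json")
    }
    sig_names = {
        os.path.splitext(os.path.basename(f))[0]
        for f in sig_files
        if f.endswith(".sig")
    }
    return sorted(json_names - sig_names)


# Hypothetical file lists: the counts match (3 and 3), but b.json was never
# converted, which a count-only check would miss.
json_files = ["a.json", "b.json", "c.json"]
sig_files = ["a.sig", "c.sig", "d.sig"]
missing = unconverted(json_files, sig_files)
print(missing)
```

Matching on base names catches the case where a stale .sig file from an earlier run hides a failed conversion.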

- Be able to create a project from sig files
- Be able to store generated files in a different folder
- Check that every json file had been converted (not only amount of files)

src/bindings/pygaia/scripts/classification/train_model_from_sigs.py

alastair · 2019-03-12T08:57:41Z

We discussed that we'd make a release before merging this so that we have a tag to check out in acousticbrainz - however I see that the most recent commit in master is release 2.4.5 (95f4851), so it looks like we're ok!

dbogdanov · 2019-03-18T16:47:54Z

Other TODOs we discussed:

- report standard deviation for cross-fold validation results
- add an MFCC baseline (using only MFCC features)

This change requires storing an additional dict inside ConfusionMatrix that maps each track id to the fold in which it was evaluated. This option was preferred because storing one sub-confusion matrix per fold would increase the size of the results unnecessarily. This commit also adds some methods for fold-wise operations.
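
A minimal sketch of the idea, assuming the results are available as (track id, expected class, predicted class) tuples: with a track-to-fold dict, per-fold accuracies and their standard deviation can be recovered without storing a confusion matrix per fold. The names `accuracy_per_fold` and `fold_of` are hypothetical and do not come from confusionmatrix.py, and the use of the population standard deviation is an assumption.

```python
from collections import defaultdict
from statistics import mean, pstdev


def accuracy_per_fold(results, fold_of):
    """Compute the accuracy of each fold.

    results: list of (track_id, expected, predicted) tuples.
    fold_of: dict mapping each track_id to the fold it was evaluated in.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for track_id, expected, predicted in results:
        fold = fold_of[track_id]
        total[fold] += 1
        if expected == predicted:
            correct[fold] += 1
    return {fold: correct[fold] / total[fold] for fold in total}


# Toy data: four tracks split over two folds.
results = [
    ("t1", "rock", "rock"),
    ("t2", "jazz", "rock"),
    ("t3", "rock", "rock"),
    ("t4", "jazz", "jazz"),
]
fold_of = {"t1": 0, "t2": 0, "t3": 1, "t4": 1}

per_fold = accuracy_per_fold(results, fold_of)
fold_mean = mean(list(per_fold.values()))
fold_std = pstdev(list(per_fold.values()))  # population std, an assumption
print(per_fold, fold_mean, fold_std)
```

The extra storage is one dict entry per track, independent of the number of classes.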

src/bindings/pygaia/classification/confusionmatrix.py

The first point merged into a dataset defines the layout, and points with a different layout are discarded. Because tags may be inconsistent across tracks and are not used for classification, we can safely discard them before merging the points.
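
The effect can be sketched with plain dicts standing in for points. This does not use the Gaia API: `merge_points` only mimics the "first point defines the layout" rule described above, and the `metadata.tags` prefix and descriptor names are assumptions for illustration.

```python
def strip_tags(point, prefix="metadata.tags"):
    """Drop every descriptor whose name starts with the tag prefix."""
    return {name: value for name, value in point.items() if not name.startswith(prefix)}


def merge_points(points):
    """Keep only the points whose layout (set of descriptor names) matches the first one."""
    merged = []
    layout = None
    for point in points:
        names = frozenset(point)
        if layout is None:
            layout = names  # the first point defines the layout
        if names == layout:
            merged.append(point)
    return merged


# Hypothetical points: the audio descriptors agree, the tags do not.
points = [
    {"lowlevel.mfcc.mean": 1.0, "metadata.tags.artist": "A"},
    {"lowlevel.mfcc.mean": 2.0, "metadata.tags.artist": "B", "metadata.tags.album": "X"},
    {"lowlevel.mfcc.mean": 3.0},
]

kept_raw = merge_points(points)
kept_clean = merge_points([strip_tags(p) for p in points])
print(len(kept_raw), len(kept_clean))
```

Without stripping, two of the three points are lost only because their tags differ; after stripping, all three share the same layout.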

src/bindings/pygaia/classification/classificationtaskmanager.py

src/bindings/pygaia/scripts/classification/classification_project_template.yaml

When the maximum number of iterations is reached, the stop tolerance is increased by a factor of 10. The maximum number of allowed increases is also controlled by a parameter. This guarantees that the program does not get stuck in the optimization phase.
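
A toy sketch of this stopping rule, assuming an iterative optimizer that lowers an error value at each step. It is not the libsvm patch itself: the function `optimize`, its parameter names, and the default values are hypothetical.

```python
def optimize(step, error, tolerance=1e-3, max_iterations=100,
             max_relaxations=2, factor=10.0):
    """Iterate until error <= tolerance, relaxing the tolerance when stuck.

    step: function that takes the current error and returns the next one.
    Returns (final error, final tolerance, number of relaxations applied).
    """
    relaxations = 0
    iterations = 0
    while error > tolerance:
        if iterations == max_iterations:
            if relaxations == max_relaxations:
                break  # give up instead of looping forever
            tolerance *= factor  # relax the stop tolerance by a factor of 10
            relaxations += 1
            iterations = 0
            continue
        error = step(error)
        iterations += 1
    return error, tolerance, relaxations


# Toy problem whose error plateaus at 0.05 and can never reach 1e-3.
def plateau(error):
    return max(error * 0.5, 0.05)


final_error, final_tolerance, used = optimize(plateau, error=1.0)
print(final_error, final_tolerance, used)
```

The loop is bounded by max_iterations * (max_relaxations + 1) steps, so it always terminates, even when the original tolerance is unreachable.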

dbogdanov · 2019-06-06T13:30:28Z

Not related to this repository, but we should add information to Essentia's FAQ.md on how to retrain a model with the new script and where to find the csv output with the results for all parameter combinations.

src/bindings/pygaia/scripts/classification/generate_classification_project.py

src/bindings/pygaia/fusion.py

Allow files to fail when retrieving the version. Also fixed some typos.

alastair · 2020-03-11T10:23:44Z

test/unittest/test_generateclasificationproject.py

+
+class TestGenerateClassificationProject(unittest.TestCase):
+    def check_project(self, groundtruth_file, filelist_file, expected):
+        tmp_dir = 'tmp/'


you can use tempfile.TemporaryDirectory in a with statement, and it'll clean up the directory once the with block exits
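
A minimal sketch of that suggestion; the file name and contents are made up. Note that tempfile.TemporaryDirectory is only available on Python 3 (3.2 and later); on Python 2 the usual alternative is tempfile.mkdtemp followed by an explicit shutil.rmtree.

```python
import os
import tempfile

# The directory and everything inside it is deleted when the with block exits,
# even if the body raises an exception.
with tempfile.TemporaryDirectory() as tmp_dir:
    project_file = os.path.join(tmp_dir, "test.project")  # hypothetical file name
    with open(project_file, "w") as f:
        f.write("className: genre\n")
    existed_inside = os.path.exists(project_file)

removed_after = not os.path.exists(tmp_dir)
print(existed_inside, removed_after)
```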

As this class is being used by many scripts, move it out of the get_classification_results file.
Update some scripts to use argparse.
Update select_best_model to support more return values from ClassificationResults.best.

alastair

I'm happy with this now! We can go ahead and merge it if @dbogdanov and @pabloEntropia give the OK

palonso · 2020-03-23T14:11:13Z

This last group of commits adds a --force-consistency flag that can be enabled to check that all the descriptor files were computed with the same Essentia version. This feature is disabled by default and should be the last change in this PR.
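
A rough sketch of what such a check could look like, not the code in this PR. It assumes each descriptor file is a dict that stores the extractor version under metadata.version.essentia; the function names, the help text, and the choice to raise an error when versions differ are illustrative assumptions.

```python
import argparse


def essentia_versions(descriptors):
    """Collect the distinct Essentia versions found in the descriptor dicts."""
    return {d["metadata"]["version"]["essentia"] for d in descriptors}


def check_consistency(descriptors, force_consistency=False):
    """Return the set of versions; fail on a mix only if the flag is enabled."""
    versions = essentia_versions(descriptors)
    if force_consistency and len(versions) > 1:
        raise ValueError("Descriptor files come from different Essentia versions: %s"
                         % ", ".join(sorted(versions)))
    return versions


parser = argparse.ArgumentParser()
parser.add_argument("--force-consistency", action="store_true",
                    help="require all descriptor files to share one Essentia version")
args = parser.parse_args(["--force-consistency"])

# Hypothetical descriptor contents with two different versions.
mixed = [
    {"metadata": {"version": {"essentia": "2.1-beta2"}}},
    {"metadata": {"version": {"essentia": "2.1-beta5"}}},
]

try:
    check_consistency(mixed, force_consistency=args.force_consistency)
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

With action="store_true" the flag defaults to False, which matches the "disabled by default" behavior described above.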

alastair · 2020-03-27T12:31:05Z

great, let's merge it!

palonso added 3 commits February 19, 2019 13:43

Adapt classification_project_template for the new music_extractor

50284d6

Added best models ranking

2ba7bd9

Improved train_model_from_sigs.py behavior

98333d6

dbogdanov requested changes Mar 4, 2019

src/bindings/pygaia/scripts/classification/train_model_from_sigs.py

palonso added 4 commits March 6, 2019 12:02

Fix typo

a0c4552

Update prints to Python 3 format

83b17a8

Improve code style

454a58c

Make project name parameter mandatory

d89df4e

dbogdanov requested changes Mar 6, 2019

src/bindings/pygaia/scripts/classification/train_model_from_sigs.py

src/bindings/pygaia/scripts/classification/train_model_from_sigs.py

Improve doc and implement force option

84dc3c1

palonso added 4 commits March 27, 2019 13:50

Add MFCC only model to the project template

284e846

Include standard deviation info in results

49517aa

Add seed parameter to be able to replicate folds

03db2d8

dbogdanov requested changes Mar 29, 2019

src/bindings/pygaia/classification/confusionmatrix.py

Add comment to stdNfold

3357b58

dbogdanov mentioned this pull request Apr 1, 2019

Report mean and stdev for accuracies of trained classifiers #34

Closed

palonso added 2 commits April 12, 2019 17:01

Remove incorrect output from '.results.ranking'

84a9270

Estimate class weights to compensate unbalanced datasets

3c2ca55

palonso mentioned this pull request Apr 15, 2019

Add weight parameter for training C-SVC SVMs on unbalanced data #18

Open

palonso added 2 commits April 16, 2019 17:27

Add normalized accuracies and std as requested in MTG#21

6881c88

Exclude metadata tags for classification tasks

6d6b913

dbogdanov requested changes Apr 18, 2019

src/bindings/pygaia/classification/classificationtaskmanager.py

src/bindings/pygaia/scripts/classification/classification_project_template.yaml

palonso added 4 commits June 5, 2019 16:20

Generate csv with statistics about the tried parameters

62e3ab5

Remove unnecessary results.ranking file

8bcecd4

Add script to retrain a model for a given param set

483c334

alastair reviewed Mar 10, 2020

src/bindings/pygaia/scripts/classification/generate_classification_project.py

alastair reviewed Mar 10, 2020

src/bindings/pygaia/scripts/classification/generate_classification_project.py

alastair reviewed Mar 10, 2020

src/bindings/pygaia/scripts/classification/generate_classification_project.py

alastair reviewed Mar 10, 2020

src/bindings/pygaia/fusion.py

alastair and others added 9 commits March 10, 2020 12:25

Update classification README

e9c59c8

Update help text for train model scripts

b920685

Improve generate_classification_project.py doc

53ea3bf

generateProject -> generate_project

d2a8227

Rename template default -> 2.1-beta2

6481efc

Improve essentia version detection loop

1db80e7

Add templateVersion field

e632a25

Add unit tests

479a19b

Rename passUnmatched -> dontFailOnUnmatched

693005c

alastair reviewed Mar 11, 2020

alastair added 3 commits March 12, 2020 16:31

Clean up generate_params_report to make it easier to follow

d1460c8

Move ClassificationResults class into separate file

e0d8274

Add details about our custom stop tolerance patch to libsvm

a5b6212

alastair approved these changes Mar 13, 2020

use mkdtemp to make a temporary directory

a39b500

alastair linked an issue Mar 19, 2020 that may be closed by this pull request

train_model.py is hanging somewhere and wait infinitely #46

Closed

This was referenced Mar 19, 2020

Report standard deviation for raw and normalized accuracies when training classifier models #21

Open

Gaia Python 2/3 issues. #72

Closed

json_to_sig NOT eliminating metadata attributes #79

Closed

palonso added 4 commits March 23, 2020 14:19

Add --force-consistency flag

6ccb519

Add --force-consistency flag and reorder arguments

d9ad308

Fix test filename and add ForceConsistency test

d371421

Fix --dontFailOnUnmatched flag name

6546f88

dbogdanov merged commit 875a8c1 into MTG:master Mar 27, 2020
