Classification models update #86

Merged: 63 commits merged into MTG:master on Mar 27, 2020

Conversation

palonso
Contributor

palonso commented Feb 27, 2019

This PR includes some improvements to the creation of classification models:

  1. Fixes the classification_project_template.yaml so it doesn't break with the features generated by the updated Essentia's music_extractor.
    1. Removes the descriptors melbands128 and bpm_histogram.
    2. Adds new key and scale profiles.
  2. Adds a ranking of the 10 best configuration parameters for each model.
  3. Improves train_model_from_sigs.py behavior:
    1. Makes it able to create a project from .sig files (not only from .json).
    2. Makes it able to store generated feature files in a separate folder.
    3. Checks that every .json file has been converted into .sig (YAML). Previously it only compared the number of files of each type.

- Be able to create a project from sig files
- Be able to store generated files in a different folder
- Check that every json file has been converted (not only the number of files)
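
The per-file check described above can be sketched as follows. This is only an illustration, not the code in train_model_from_sigs.py: the function name `find_unconverted` and the directory layout are assumptions, and matching is done on base names.

```python
import os


def find_unconverted(json_dir, sig_dir):
    """Return base names of .json files with no matching .sig file.

    Hypothetical helper: comparing names instead of counts catches the case
    where both folders hold the same number of files but not the same tracks.
    """
    json_names = {os.path.splitext(f)[0]
                  for f in os.listdir(json_dir) if f.endswith('.json')}
    sig_names = {os.path.splitext(f)[0]
                 for f in os.listdir(sig_dir) if f.endswith('.sig')}
    return sorted(json_names - sig_names)
```

An empty result means every .json file has a .sig counterpart. A count-only comparison would pass whenever the two folders contain the same number of files, even if some tracks were never converted.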
@alastair
Member

We discussed that we'd make a release before merging this so that we have a tag to check out in acousticbrainz - however I see that the most recent commit in master is release 2.4.5 (95f4851), so it looks like we're ok!

@dbogdanov
Member

Other TODOs we discussed:

  • report standard deviation for cross-fold validation results
  • add a MFCC baseline (using only MFCC features)

This change requires storing an additional dict inside ConfusionMatrix that maps each track id to the fold in which it was computed.
This option was preferred because storing sub-confusion matrices would increase the size of the results unnecessarily.
This commit also contains some methods to perform fold-wise operations.
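
A minimal sketch of how a track-to-fold mapping makes fold-wise statistics possible, assuming plain dicts rather than the actual ConfusionMatrix class. The names `foldwise_accuracy` and `summarize` and the input layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def foldwise_accuracy(predictions, fold_of):
    """Compute accuracy per fold.

    predictions: {track_id: (expected_label, predicted_label)}
    fold_of:     {track_id: fold_index}  (the extra dict described above)
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for track_id, (expected, predicted) in predictions.items():
        fold = fold_of[track_id]
        total[fold] += 1
        correct[fold] += int(expected == predicted)
    return {fold: correct[fold] / total[fold] for fold in total}


def summarize(accuracies):
    """Return (mean, population standard deviation) across folds."""
    values = list(accuracies.values())
    return mean(values), pstdev(values)
```

With only the mapping stored, per-fold results can be recomputed on demand, so the standard deviation across folds can be reported without keeping one confusion matrix per fold.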
The first point merged to a dataset defines the layout,
and points with a different one are discarded.
As tags may be inconsistent among tracks and they are
not used for classification we can safely discard
them before merging the points.
When the maximum number of iterations is reached, the stop tolerance is increased by
a factor of 10. The maximum number of allowed increases is also controlled by a parameter.
This guarantees that the program does not get stuck on the optimization phase.
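
The relaxation scheme in this commit message could look roughly like the sketch below. It is not the trainer's actual code: `step` is a hypothetical callable that runs one optimization iteration and returns the current error, the parameter names are invented, and raising an exception once the relaxation budget is used up is an assumption.

```python
def optimize_with_relaxation(step, tol=1e-3, max_iter=1000, max_relaxations=3):
    """Iterate until step() drops below tol, relaxing tol when stuck.

    Returns (final_tolerance, relaxations_used). The loop runs at most
    (max_relaxations + 1) * max_iter iterations, so it always terminates.
    """
    relaxations = 0
    while True:
        for _ in range(max_iter):
            if step() < tol:
                return tol, relaxations
        if relaxations == max_relaxations:
            raise RuntimeError(
                'no convergence after %d tolerance relaxations' % relaxations)
        tol *= 10  # loosen the stop criterion by a factor of 10
        relaxations += 1
```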
@dbogdanov
Member

Not related to this repository, but we should add the information to Essentia's FAQ.md about how to retrain a model with the new script and where to see the csv output with results for all parameter combinations.


class TestGenerateClassificationProject(unittest.TestCase):
    def check_project(self, groundtruth_file, filelist_file, expected):
        tmp_dir = 'tmp/'
Member

you can use with tempfile.TemporaryDirectory, and it'll clean up once it exits the with block
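
For reference, a small sketch of that suggestion using the standard library's `tempfile.TemporaryDirectory` (Python 3.2+). The helper name `write_and_list` and the file names are made up for illustration and are not part of the test under review.

```python
import os
import tempfile


def write_and_list(filenames):
    """Create files in a temporary directory and return (listing, still_exists)."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        for name in filenames:
            with open(os.path.join(tmp_dir, name), 'w') as f:
                f.write('placeholder')
        listing = sorted(os.listdir(tmp_dir))
    # The directory and its contents are removed when the with block exits.
    return listing, os.path.exists(tmp_dir)
```

Compared with a hard-coded 'tmp/' path, this avoids leftover files when a test fails and avoids collisions between parallel test runs.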

As this class is being used by many scripts, move it out of the
get_classification_results file.
Update some scripts to use argparse
Update select_best_model to support more return values from
ClassificationResults.best
Member

@alastair alastair left a comment

I'm happy with this now! We can go ahead and merge it if @dbogdanov and @pabloEntropia give the OK

@palonso
Contributor Author

palonso commented Mar 23, 2020

This last group of commits adds a --force-consistency flag that can be enabled to check that all the descriptor files were computed with the same Essentia version. This feature is disabled by default and should be the last change in this PR.
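
A rough sketch of what such an opt-in check might look like, not the code merged in this PR. It assumes that each descriptor file is JSON and stores the version under `metadata.version.essentia`; the helper names and the positional `files` argument are hypothetical.

```python
import argparse
import json


def essentia_version(path):
    """Read the Essentia version from a descriptor file (assumed key path)."""
    with open(path) as f:
        return json.load(f)['metadata']['version']['essentia']


def check_consistency(paths):
    """Raise ValueError if the files come from more than one Essentia version."""
    versions = {}
    for path in paths:
        versions.setdefault(essentia_version(path), []).append(path)
    if len(versions) > 1:
        raise ValueError('descriptor files come from different Essentia '
                         'versions: %s' % sorted(versions))
    return next(iter(versions), None)


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Disabled by default, as described in the comment above.
    parser.add_argument('--force-consistency', action='store_true',
                        help='require a single Essentia version across files')
    parser.add_argument('files', nargs='*')  # hypothetical positional input
    return parser.parse_args(argv)
```

A caller would run `check_consistency(args.files)` only when `args.force_consistency` is set, so existing workflows keep their behaviour unless the flag is passed.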

@alastair
Copy link
Member

great, let's merge it!

@dbogdanov dbogdanov merged commit 875a8c1 into MTG:master Mar 27, 2020
Development

Successfully merging this pull request may close these issues.

train_model.py is hanging somewhere and wait infinitely
3 participants