Classification models update #86
- Be able to create a project from sig files
- Be able to store generated files in a different folder
- Check that every json file has been converted (not only the number of files)
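To illustrate why comparing only the number of files is not enough, here is a minimal sketch of a per-file check. The function name `find_unconverted` and the matching by base name are assumptions for illustration, not the code in this PR.

```python
import os


def find_unconverted(json_paths, sig_paths):
    """Return the json files that have no sig file with the same base name.

    Hypothetical helper: matching by base name is an assumption.
    """
    def stem(path):
        # 'some/dir/track.json' -> 'track'
        return os.path.splitext(os.path.basename(path))[0]

    converted = {stem(p) for p in sig_paths}
    return sorted(p for p in json_paths if stem(p) not in converted)


# Both lists have two entries, so a count-based check would pass,
# yet y.json was never converted.
missing = find_unconverted(["in/x.json", "in/y.json"],
                           ["out/x.sig", "out/z.sig"])
```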
We discussed that we'd make a release before merging this so that we have a tag to check out in acousticbrainz - however I see that the most recent commit in master is release 2.4.5 (95f4851), so it looks like we're ok!
Other TODOs we discussed:
This change requires storing an additional dict inside ConfusionMatrix that maps each track id to the fold in which it was computed. This option was preferred because storing sub-confusion matrices would add unnecessary weight. This commit also contains some methods to perform fold-wise operations.
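A minimal sketch of the idea, assuming a matrix that stores track ids per (expected, predicted) cell. The class `FoldAwareConfusion` and its method names are hypothetical stand-ins, not the actual ConfusionMatrix API.

```python
from collections import defaultdict


class FoldAwareConfusion:
    """Hypothetical stand-in for ConfusionMatrix, for illustration only."""

    def __init__(self):
        # matrix[expected][predicted] -> list of track ids
        self.matrix = defaultdict(lambda: defaultdict(list))
        # track id -> index of the fold in which the track was evaluated
        self.folds = {}

    def add(self, expected, predicted, track_id, fold):
        self.matrix[expected][predicted].append(track_id)
        self.folds[track_id] = fold

    def fold_accuracy(self, fold):
        """Accuracy restricted to the tracks evaluated in one fold."""
        correct = total = 0
        for expected, row in self.matrix.items():
            for predicted, track_ids in row.items():
                n = sum(1 for t in track_ids if self.folds[t] == fold)
                total += n
                if expected == predicted:
                    correct += n
        return correct / total if total else 0.0


cm = FoldAwareConfusion()
cm.add("rock", "rock", "t1", fold=0)
cm.add("rock", "jazz", "t2", fold=0)
cm.add("jazz", "jazz", "t3", fold=1)
cm.add("rock", "rock", "t4", fold=1)
```

A single flat dict keeps the extra storage proportional to the number of tracks, while per-fold results can still be recovered by filtering.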
The first point merged into a dataset defines the layout, and points with a different layout are discarded. As tags may be inconsistent among tracks and are not used for classification, we can safely discard them before merging the points.
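The effect can be sketched with plain dicts standing in for points. This does not use the Gaia API; the `metadata.tags` prefix and both function names are assumptions made for the example.

```python
def strip_descriptors(point, prefix="metadata.tags"):
    """Drop every descriptor whose name starts with the given prefix.

    The prefix is an assumed name for the tag descriptors.
    """
    return {k: v for k, v in point.items() if not k.startswith(prefix)}


def merge_points(points):
    """Keep only the points whose layout (set of keys) matches the first one."""
    merged, layout = [], None
    for point in points:
        keys = frozenset(point)
        if layout is None:
            layout = keys  # the first point defines the layout
        if keys == layout:
            merged.append(point)
    return merged


points = [
    {"lowlevel.mfcc": [1.0], "metadata.tags.artist": "a"},
    {"lowlevel.mfcc": [2.0], "metadata.tags.album": "b"},
]

# Without stripping, the second point has a different layout and is lost.
kept_raw = merge_points(points)
# After stripping the tags, both points share the same layout.
kept_clean = merge_points([strip_descriptors(p) for p in points])
```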
When the maximum number of iterations is reached, the stop tolerance is increased by a factor of 10. The maximum number of allowed increases is also controlled by a parameter. This guarantees that the program does not get stuck in the optimization phase.
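The retry logic described above could look roughly like the following sketch. The function and parameter names (`train_with_escalating_tolerance`, `max_increases`) and the `optimize` callback are hypothetical, and the real code may structure this differently.

```python
def train_with_escalating_tolerance(optimize, tolerance=1e-3,
                                    max_iterations=1000, max_increases=3):
    """Retry an optimizer, relaxing the stop tolerance tenfold on each failure.

    `optimize(tolerance, max_iterations)` is an assumed callback that returns
    True if it converged before hitting the iteration limit.
    Returns the tolerance that worked and how many increases were needed.
    """
    increases = 0
    while True:
        if optimize(tolerance, max_iterations):
            return tolerance, increases
        if increases == max_increases:
            # Bounded number of retries, so the program cannot loop forever.
            raise RuntimeError("optimization did not converge")
        tolerance *= 10
        increases += 1


# Fake optimizer for the example: it only converges on its third call.
calls = []

def fake_optimize(tolerance, max_iterations):
    calls.append(tolerance)
    return len(calls) >= 3

final_tolerance, used_increases = train_with_escalating_tolerance(fake_optimize)
```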
Not related to this repository, but we should add the information to Essentia's
Allow files to fail when retrieving the version. Also fixed some typos.
class TestGenerateClassificationProject(unittest.TestCase):
    def check_project(self, groundtruth_file, filelist_file, expected):
        tmp_dir = 'tmp/'
you can use with tempfile.TemporaryDirectory, and it'll clean up once it exits the with block
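For reference, a small self-contained sketch of the suggested pattern (Python 3 only, since `tempfile.TemporaryDirectory` is not available in Python 2). The helper `write_and_list` and the file names are made up for the example and are not part of the test in this PR.

```python
import os
import tempfile


def write_and_list(filenames):
    """Create files in a temporary directory and return (path, listing).

    The directory and everything in it are deleted when the with block exits.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        for name in filenames:
            with open(os.path.join(tmp_dir, name), "w") as f:
                f.write("placeholder")
        listing = sorted(os.listdir(tmp_dir))
    return tmp_dir, listing


tmp_dir, listing = write_and_list(["b.yaml", "a.yaml"])
```

Compared with a hardcoded 'tmp/' path, this avoids leftover files when a test fails midway and avoids collisions between concurrent test runs.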
As this class is being used by many scripts, move it out of the get_classification_results file. Update some scripts to use argparse. Update select_best_model to support more return values from ClassificationResults.best.
I'm happy with this now! We can go ahead and merge it if @dbogdanov and @pabloEntropia give the OK
This last group of commits addresses the creation of a flag
great, let's merge it!
This PR includes some improvements related to the creation of classification models:
- classification_project_template.yaml so it doesn't break with the features generated by the updated Essentia's music_extractor.
- melbands128 and bpm_histogram.
- key and scale profiles.
- train_model_from_sigs.py behavior: