Transcription [wip] #170
Conversation
-------
intervals : np.ndarray, shape=(n_events, 2)
    array of event start and end times
values : list of float
We should return `values` as an `np.ndarray` (just involves `values = np.array(values)` somewhere below).
I was returning a list because that's what `load_labeled_intervals` does, but I'm fine with changing it to an `np.ndarray`.
Right, that's because it's a list of strings, not a list of numeric data.
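As a sketch of the suggested change, a loader for numeric values could convert to an array just before returning (the loader below is hypothetical and simplified, not mir_eval's actual `load_valued_intervals`):

```python
import numpy as np

def load_valued_intervals_sketch(lines):
    """Hypothetical minimal loader: parse 'start end value' rows."""
    intervals, values = [], []
    for line in lines:
        start, end, value = line.split()
        intervals.append([float(start), float(end)])
        values.append(float(value))
    # Convert to ndarray since the values are numeric (unlike
    # load_labeled_intervals, whose labels are strings).
    return np.array(intervals), np.array(values)
```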
changed in 7802919
Is this ready for re-review?
Note: I re-ran the PR test build to trigger coverage reporting after merging #179.
It's missing the regression tests still, but otherwise I think I've addressed every comment from the previous CR, so yes I think it's ready for re-review (or you could wait till I have the regression tests in too, as you prefer).
# pycharm
.idea/*
docs/_build/doctrees/environment.pickle
I think you can just ignore `docs/_build/*`, if you want to do this.
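The suggested broader pattern would look like this in `.gitignore` (a sketch; the comment is illustrative):

```
# pycharm
.idea/*
# Sphinx build artifacts (covers doctrees/environment.pickle too)
docs/_build/*
```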
Ok, made a couple more minor comments for you to chew on while we wait for real-world data. As a side note, once we have said data, we will eventually need to add the regression-score-generating step to
Roger that (I've addressed all the comments except for one I wasn't sure about). Looks like adding the regression-score-generating step should be fairly straightforward. I've now asked Chris Cannam and Emmanouil Benetos for their algorithms' output too, so whoever gets back to me first "wins" :)
Force-pushed from b084e51 to a738ae1
prf() implements the computation of precision, recall, and f1. match_notes() finds the optimal matching between the ref and est notes.
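The precision/recall/F-measure computation described here is the standard one; a minimal sketch follows (the function name and count-based signature are illustrative, not the actual mir_eval API, which operates on note intervals and pitches directly):

```python
def precision_recall_f1_sketch(n_matched, n_est, n_ref):
    """Standard P/R/F1 from a count of matched note pairs.

    n_matched: number of est notes matched to a ref note
    n_est, n_ref: total estimated / reference note counts
    """
    precision = n_matched / n_est if n_est > 0 else 0.0
    recall = n_matched / n_ref if n_ref > 0 else 0.0
    if precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0  # degenerate case: nothing matched or empty inputs
    return precision, recall, f1
```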
Implement the evaluate function. Rename prf() to precision_recall_f1. Flip the bi-partite graph construction such that the first index of every returned tuple is the ref index, and the second is the est index. Compute the evaluation metrics with and without offsets. Change load_valued_intervals to return the pitch sequence as an np.ndarray.
Add transcription to sphinx doc generation. Modify tests to raise errors instead of warnings where relevant, and remove redundant tests. Use offset_min_tolerance, and make it an optional parameter.
Move match_notes into transcription.py. Remove with_offset and make the use of offsets implicit in the offset_ratio argument (if None then offsets are ignored). Also, change the behavior of precision_recall_f1 so that by default it DOES consider offsets. Finally, the output metrics have been renamed.
In addition to being moved, test_match_notes has also been updated.
Add strict option for using <. The default is strict=False, which means using <=. Use a more efficient method to compute the offset_hit_matrix. Rename cmp to cmp_func.
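The strict/cmp_func idea can be sketched with `operator` (illustrative only; the real code applies this comparison when building the hit matrices):

```python
import operator

def within_tolerance_sketch(diff, tolerance, strict=False):
    """strict=True uses '<'; strict=False (the default) uses '<='."""
    cmp_func = operator.lt if strict else operator.le
    return cmp_func(diff, tolerance)
```

With `strict=False`, a difference exactly equal to the tolerance still counts as a hit; `strict=True` excludes the boundary case.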
Generate est, ref, and output files for regression testing. Fix the unit test data (file-level constants) to match the expected results: est note 1 should not match ref note 1 when offsets are considered. Also update the expected unit test results, since est note 3 can match ref note 3 when strict=False.
Test Duan's matching code (which uses greedy note matching) to compare it to the graph based note matching. This also requires rounding down note onset and offset times to 2 decimal places.
Major update of the top-level docstring to explain all the differences compared to the MIREX 2015 matlab evaluation code. Remove the multiplication by the .5 factor when computing the offset_tolerance, since it turns out this is not done in the MIREX evaluation. Also remove the rounding down of onset/offset times.
Make N_DECIMALS a file-level constant. Improve the numerical stability of the pitch comparison, and use a simpler and optimized offset tolerance computation.
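One common way to make a pitch comparison numerically stable is to compare on a log (semitone) scale rather than on raw Hz differences; this sketch shows that general approach under stated assumptions (function name and half-semitone tolerance are illustrative, not the actual mir_eval implementation):

```python
import numpy as np

def pitch_match_sketch(ref_pitch_hz, est_pitch_hz, tol_semitones=0.5):
    """Compare two pitches in Hz on a semitone (log2) scale.

    Working in log space makes the tolerance musically uniform
    across registers: a quarter tone at 110 Hz and at 880 Hz
    yields the same semitone difference.
    """
    diff = 12.0 * abs(np.log2(est_pitch_hz / ref_pitch_hz))
    return diff < tol_semitones
```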
Add the following tests: test_match_notes_strict, test_invalid_pitch, test_inconsistent_int_pitch (for when intervals and pitches are not the same length), test_empty_ref, test_empty_est, test_precision_recall_f1_empty (for when either the ref or est is empty).
Replace "\" with "( .. )" for all line continuations. Fix the top-level docstring to describe the relative difference from the MIREX metrics in percentages rather than absolute values.
Force-pushed from a738ae1 to 1077366
Alright @craffel, I think we're where we want to be. If you think the commit messages need further editing feel free to edit them, though I've followed the recommended guidelines to the letter.
Looks good to me, feel free to merge!
Huzzah!
You can close #167 too :)
This will (eventually) contain all the code for transcription evaluation (and tests)