
Examples for training phase of validation task #30

Closed
edufonseca opened this issue May 18, 2017 · 13 comments

@edufonseca
Contributor

As mentioned in #27, the annotation protocol will ideally consist of a training phase followed by a validation phase. In the former, some representative audio examples should be presented to the rater. How should these examples be chosen for each category? Several options:

  1. Clips whose validations were rated PP and that have the highest Freesound ranking
  2. Randomly chosen clips among those validated as PP. These will vary from rater to rater, thus mitigating bias; however, some of them might not be very representative.
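The two selection strategies could be sketched as follows (the clip fields `fs_id`, `vote` and `fs_rank`, and the function itself, are hypothetical illustrations, not the project's actual schema):

```python
import random

def pick_examples(clips, n=5, strategy="top_ranked", seed=None):
    """Pick training examples for one category.

    clips: list of dicts with hypothetical keys 'fs_id', 'vote' and
    'fs_rank' (Freesound ranking, higher = better).
    """
    # Keep only clips validated as Present and Predominant (PP)
    pp = [c for c in clips if c["vote"] == "PP"]
    if strategy == "top_ranked":
        # Option 1: deterministic, fully controlled training set
        pp.sort(key=lambda c: c["fs_rank"], reverse=True)
        return pp[:n]
    # Option 2: random sample, varies per rater to mitigate bias
    rng = random.Random(seed)
    return rng.sample(pp, min(n, len(pp)))
```

Option 1 gives every rater the same, curated set; option 2 trades that control for reduced bias.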
@xavierfav
Contributor

For case 1, no implementation is needed; we just have to add the examples to the ontology.json file.
For case 2, very little implementation is needed.

I think case 1 makes more sense, since we want to control the training phase as much as possible. Providing good examples might be essential.

@ffont
Member

ffont commented May 18, 2017

We can also default to case 2 when there are no examples chosen for case 1.

@jordipons
Contributor

jordipons commented May 22, 2017

I like @ffont's idea as well! It is nice for the "cold start" problem we face.

For case 1, I propose computing a ranking of sounds based on a score that combines PP validations and Freesound ratings.
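A minimal sketch of such a combined score (the weights and the field names `n_pp_votes` and `avg_rating` are arbitrary assumptions, not anything the project has defined):

```python
def score(clip, w_pp=1.0, w_rating=0.5):
    """Hypothetical combined score: number of PP validations plus
    a weighted Freesound average rating (0-5 scale)."""
    return w_pp * clip["n_pp_votes"] + w_rating * clip["avg_rating"]

def rank(clips, **kw):
    """Sort clips by the combined score, best first."""
    return sorted(clips, key=lambda c: score(c, **kw), reverse=True)
```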

@xavierfav
Contributor

I guess you mean Freesound ratings?

@jordipons
Contributor

Yes, thanks. I amended my comment! :)

@edufonseca
Contributor Author

For the crowdsourcing launch (i.e., annotations have been validated by a single rater), it has been suggested that:

  • We could define a number of good sound examples for every category, e.g., from 5 to 10
  • Ideally, all the categories should have examples

The examples could be used as (see #27 to locate these steps in the annotation protocol):

  1. Representative examples to be shown in the explicit part of the training phase
  2. Good examples for the hidden part of the training phase
  3. Good examples to be used for Quality Control in the Validation Phase

How to choose these examples for every category? Currently, it is proposed to:

  1. Consider clips whose annotations are currently rated as PP
  2. Rank them with the Freesound ranking
  3. Since we can't be sure that the Freesound ranking always yields the most representative audio clips for every category, a manual inspection of the resulting list should be done.

@xavierfav
Contributor

Here is a dictionary {<aso_id> : [<fs_id>, <fs_id>, ...] , ...}

which provides PP examples for each AudioSet Ontology category ID.

I have combined Freesound ratings and downloads to sort them.
I manually checked a few categories; it seems to provide good examples.
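Building such a dictionary could look roughly like this (the record fields `aso_id`, `fs_id`, `downloads` and `avg_rating` are assumed names for illustration, not the actual Freesound or FSD schema):

```python
from collections import defaultdict

def build_examples_dict(records):
    """Group candidate clips per AudioSet Ontology category and sort
    each list by a combination of Freesound downloads and average
    rating, best candidates first."""
    by_cat = defaultdict(list)
    for r in records:
        by_cat[r["aso_id"]].append(r)
    return {
        cat: [r["fs_id"] for r in sorted(
            clips,
            key=lambda r: (r["downloads"], r["avg_rating"]),
            reverse=True)]
        for cat, clips in by_cat.items()
    }
```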

As you said, we now need to manually validate some of these examples and put them in the ontology_preCrowd.json file, in the "positive_examples" field for each category.
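Filling the "positive_examples" field could be done with a small script along these lines (it assumes the ontology file is a JSON list of category dicts keyed by "id", which is an assumption about its layout):

```python
import json

def set_positive_examples(path, examples_by_id):
    """Write validated example IDs into the 'positive_examples' field
    of each matching category in the ontology JSON file."""
    with open(path) as f:
        ontology = json.load(f)
    for cat in ontology:
        if cat["id"] in examples_by_id:
            cat["positive_examples"] = examples_by_id[cat["id"]]
    with open(path, "w") as f:
        json.dump(ontology, f, indent=2)
```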

@xavierfav
Contributor

Definitely, the best way would be to implement this functionality on the web platform.
Since we are quite in a rush, I prefer to put the priority on other things (the annotation protocol).

So I will leave the option of filling the JSON file directly with the Freesound examples.

Please start adding and checking examples!

@xavierfav
Contributor

Tool is ready for adding the examples:

  • clone the repo
  • open the Google sheet
  • use the script_examples_for_fsd.py script and fill in the Google sheet

@edufonseca
Contributor Author

After inspecting the tool, it seems good for the task. I have a few suggestions:

  1. The outcome will be a set of (ideally 10) examples for every sound category. They will be used to show something representative to the rater and as verification clips. For both cases, short examples are best, for clarity and simplicity. However, in the proposed list of candidate examples, some clips are even longer than 1 min. Options:
  • Maybe we could add duration to the criteria for creating this list.
  • Or we could tell the subject to focus first on clips shorter than 10 s, for instance.
    But IMO we should not have examples longer than 10 s (ideally).
  2. What is our initial target?
  • Get examples for the current 398 FSD categories?
  • Get examples for the 632 AudioSet categories? This would be great, but it may take a significant amount of additional effort: there may not be enough candidates provided (due to scarce validation), so the user would have to go to Freesound and find them...
    Maybe we can decide this depending on the manpower we have.

@xavierfav
Contributor

  1. Candidate examples updated:
    The same order is kept (using Freesound downloads and ratings), but the results are then organized by duration: sounds shorter than 10 s are presented first, then those between 10 and 20 s, and finally those longer than 20 s.
    This way it will be easier to find short, relevant examples.

  2. Spreadsheet updated:

  • it presents only the 398 categories that we considered during TTs
  • added MULTIPLE PARENTS after the path for the ambiguous cases
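The duration-based reordering described in point 1 could be sketched as follows (the `duration` field name is an assumption; the stable sort is what preserves the downloads/ratings order inside each bucket):

```python
def duration_bucket(seconds):
    """Bucket index used to reorder candidates:
    <10 s first, 10-20 s next, then longer clips."""
    if seconds < 10:
        return 0
    if seconds < 20:
        return 1
    return 2

def reorder_by_duration(clips):
    """Stable sort: keeps the original (downloads/ratings) order
    within each duration bucket."""
    return sorted(clips, key=lambda c: duration_bucket(c["duration"]))
```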

@xavierfav
Contributor

Providing examples for all 398 categories seems to be complicated.
Some categories are hard to distinguish, or just hard to recognize.

I think we don't have time to gather enough examples to have this ready for the platform launch.

@xavierfav
Contributor

  • A lot of examples have been provided (thanks @jordipons and all the contributors).

  • However, it seems that they do not always correspond to a Present and Predominant source in the clips.

  • Moreover, we still need to select the examples that are shown to users and the ones used for Quality Control. In some cases (multiple parents), it would be nice to select examples for all possible parents.

  • A few Freesound IDs did not correspond to any sound on our platform (detailed at the end of this file).

  • From the Admin page it is now possible to edit these examples (Add access to admin page for editing TaxonomyNode fields #79).

@xavierfav xavierfav reopened this Oct 26, 2017
@xavierfav xavierfav added this to the Platform Launch milestone Nov 7, 2017