Add strategy for prioritizing annotations to be voted #23

Closed
xavierfav opened this issue May 16, 2017 · 5 comments

@xavierfav
Contributor

In order to have "the best dataset we can at a time t", we have chosen some constraints [TO BE DISCUSSED]:

  • vote on all annotation candidates for a sound (in order to get closer to a "complete" annotation for that sound)
  • annotation candidates need 2 identical votes to be considered valid
  • prioritize sounds that have a length < 30 sec
  • prioritize sounds with "good quality" (use Freesound downloads and ratings? a descriptor for quality?)
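A minimal sketch of the "2 identical votes" rule and of the "complete annotation" check, assuming votes are stored as plain labels per candidate. The function names and the vote labels used in the example are hypothetical, not part of the platform.

```python
from collections import Counter

REQUIRED_AGREEMENT = 2  # from the constraint above: two identical votes


def agreed_vote(votes, required=REQUIRED_AGREEMENT):
    """Return the vote value that reached `required` identical votes, else None."""
    if not votes:
        return None
    value, count = Counter(votes).most_common(1)[0]
    return value if count >= required else None


def sound_is_complete(votes_by_candidate):
    """True when every annotation candidate of a sound has an agreed vote."""
    return all(agreed_vote(v) is not None for v in votes_by_candidate.values())
```

For example, `agreed_vote(["present", "absent"])` gives `None`, so that candidate still needs votes.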

We need to implement a "manager" that selects the annotations and the sounds to be voted on.
Ideally, a priority ranking should be derived from the constraints, and annotations should be proposed to crowd-workers following this ranking.

@ffont
Member

ffont commented May 18, 2017

I agree with what you propose.
You probably want to have these scores precomputed in a property of the Annotation model, because otherwise it's probably complicated to compute all the scores in real time (especially if the score is complex to compute).

I suggest you start by implementing a function which, given an annotation, returns a "priority score". This could be a method of the Annotation class.
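Something along these lines, as a rough sketch only. The dataclass stands in for the real Django model, and the field names, method names and weights are all hypothetical; the two score components are just illustrations taken from the constraints listed in the first comment.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    # Stand-in for the Django model; field names are hypothetical.
    num_votes: int
    sound_duration: float  # seconds
    priority_score: float = 0.0  # cached value, refreshed on demand

    def compute_priority_score(self):
        """Higher score means the annotation should be proposed sooner."""
        score = 0.0
        if self.num_votes == 1:
            score += 2.0  # illustrative weight: one vote away from 2 identical votes
        if self.sound_duration < 30:
            score += 1.0  # illustrative weight: short sounds first
        return score

    def update_priority_score(self):
        """Recompute the score and cache it so ranking does not need it in real time."""
        self.priority_score = self.compute_priority_score()
        return self.priority_score
```

Candidates could then be ranked with `sorted(annotations, key=lambda a: a.priority_score, reverse=True)`, reading only the cached value.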

@xavierfav
Contributor Author

For now, the prioritization is based on votes:
Annotations that have at least one vote are prioritized.

In order to include the other constraints listed in this post, we need some Freesound metadata that we don't have in the current platform (ratings, number of downloads).

@ffont
Should we use the API to get this data, or should I load it into our model so we have it in the FSD platform?

Moreover, about the first point: vote on all annotation candidates for a sound (in order to get closer to a "complete" annotation for that sound).
I would say that, as it is now, it is not worth doing: because we have not yet worked on population and on prioritizing leaf nodes, we would prioritize annotations that are not worth voting on (e.g. voting on "dog bark", "dog" and "animal"). We should first work on how to populate whenever an annotation is considered ground truth.
We have been inspecting ambiguous cases with edufonseca (categories with more than one parent) to see whether it makes sense to distinguish two categories, and whether it makes sense to populate to the different parents.
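To illustrate the "dog bark" / "dog" / "animal" case, here is a small sketch that finds candidates already implied by a validated, more specific annotation. The `parents` mapping and both function names are hypothetical, and propagating to every parent is only one possible answer to the multi-parent question, not a decision.

```python
def ancestors(category, parents):
    """Collect all ancestors of `category`; a category may have several parents."""
    seen = set()
    stack = list(parents.get(category, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def redundant_candidates(validated, candidates, parents):
    """Candidates that are ancestors of a validated category, so not worth voting on."""
    implied = set()
    for category in validated:
        implied |= ancestors(category, parents)
    return [c for c in candidates if c in implied]
```

With `parents = {"dog bark": ["dog"], "dog": ["animal"]}`, validating "dog bark" marks "dog" and "animal" as redundant candidates.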

@ffont
Member

ffont commented Jul 12, 2017

@xavierfav We should use the API to load the data in the FSD platform ;)
The idea has always been to write a management command that iterates over all sounds and gets data from Freesound to store in the JSON field of each sound. I'm not sure whether something similar was ever implemented (I guess not). I think this is the way to go: have a command that you can run from time to time to re-sync with Freesound.

When implementing the command, I'd iterate over all sounds in groups of N, and then use the API to make a search restricting the results to the IDs of these sounds (you can "OR" sound IDs in the search filter). Then, using the fields param, you decide which information you want returned and stored in the FSD platform. This way, the number of requests needed is n_sounds/N instead of n_sounds. N could theoretically be set to 150 (max number of search results per page), but the limitation here is the length of the URL (as all filtered sound IDs will be in the URL). I think N=50 should be fine. Otherwise, try lower or higher values.
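A sketch of the batching part only, with no actual request made. The helper names are hypothetical, and the exact filter syntax (`id:(1 OR 2)`) and the field names (`num_downloads`, `avg_rating`) are assumptions that should be checked against the Freesound API documentation.

```python
def chunked(ids, size):
    """Yield consecutive groups of at most `size` sound IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def build_search_params(sound_ids, fields=("id", "num_downloads", "avg_rating")):
    """Build query parameters for one search request covering `sound_ids`."""
    # Assumed filter syntax: sound IDs joined with OR inside a single id filter.
    id_filter = "id:(" + " OR ".join(str(i) for i in sound_ids) + ")"
    return {
        "filter": id_filter,
        "fields": ",".join(fields),
        "page_size": len(sound_ids),  # one page should hold the whole group
    }


N = 50  # group size suggested above; tune if the URL gets too long
```

The command would then loop with `for group in chunked(all_ids, N)`, send one search per group using `build_search_params(group)`, and store the returned fields in each sound's JSON field.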

@edufonseca
Contributor

In the constraints listed in the first comment, it was suggested to prioritize sounds with length < 30 sec. I think we should be more specific here. How about prioritizing (apart from the other aforementioned constraints):

  1. sounds with length < 10s (just as in AudioSet). This will presumably imply having more PP and also shorter sounds that, at this point, may be more useful.
  2. once those are exhausted, sounds with length < 20s
  3. once those are exhausted, sounds with length < 30s
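The three tiers above could be sketched as a sort key. The function names and the `"duration"` key are hypothetical; sounds of 30s or more simply end up in a last tier.

```python
DURATION_TIERS = (10.0, 20.0, 30.0)  # seconds, from the list above


def duration_tier(duration):
    """Return 0 for < 10s, 1 for < 20s, 2 for < 30s, 3 for anything longer."""
    for tier, limit in enumerate(DURATION_TIERS):
        if duration < limit:
            return tier
    return len(DURATION_TIERS)


def order_by_duration_tier(sounds):
    """Sort sounds by tier; sorted() is stable, so order within a tier is kept."""
    return sorted(sounds, key=lambda s: duration_tier(s["duration"]))
```

Because the sort is stable, any earlier ordering (for example by number of votes) is preserved inside each tier.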

@xavierfav
Contributor Author

Sounds with length < 10 sec are prioritized #70
