Mechanical Turk Labeling pipeline? #9

Open
TylerBalsam opened this issue Apr 12, 2017 · 1 comment
Comments

@TylerBalsam

Hey there!

Been quietly interested in this project for a while. While I can't give time, I'd happily sponsor Mechanical Turk labeling of movie content, and I'm sure others interested would too.

What are your thoughts on a pipeline/process for this? I think the main hurdle would be actually renting the movie for the labellers, which I'm sure could be done with some sort of micro gift card code covering the cost of the movie on Amazon, etc.

It may cost a bit, but it would really broaden the sponsor base and content base, and it may be the catalyst to get this project going.

@ocram
Contributor

ocram commented Apr 12, 2017

Thank you so much, Tyler!

This is a really interesting idea which we should definitely discuss in detail.

Unfortunately, I have never used Mechanical Turk (or similar services) myself. So I don't know whether this is actually feasible or what quality we could expect. Nor can I comment on the practicability of that rental process for the workers or on the idea of using gift cards. But that all seems worth exploring.

It's important to note that the labeling process that is required here is not a simple binary classification. It requires accurate work and a little bit of judgement. If all annotations are off by one or two seconds, that won't be of much help to us. But it could be done, I guess. Most importantly, you could make tagging violence, for example, a priority and ignore all the other categories in a first step. It's obviously much easier to watch for occurrences of a single topic than for dozens at the same time.

Again, this really seems worth exploring, and trying this on a super small scale doesn't seem like a risk. I would ensure that the required technology is in place, whatever we need here, if we need anything at all.
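To make the accuracy concern a bit more concrete, here is a rough sketch of how a small-scale trial could be checked. It is only an illustration under assumptions: the `Annotation` type, the `check_worker` helper, the idea of a small hand-checked reference set, and the 0.5 second tolerance are all hypothetical and not part of any existing tooling in this project. It looks at a single category (violence) and measures how many reference annotations a worker hit within the tolerance.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record: one tagged scene, times in seconds."""
    category: str
    start: float
    end: float


def within_tolerance(a: Annotation, b: Annotation, tol: float) -> bool:
    """True if both annotations share a category and their boundaries differ by at most tol."""
    return (
        a.category == b.category
        and abs(a.start - b.start) <= tol
        and abs(a.end - b.end) <= tol
    )


def check_worker(
    worker: List[Annotation],
    reference: List[Annotation],
    category: str = "violence",
    tol: float = 0.5,  # assumed tolerance, chosen only for illustration
) -> Optional[float]:
    """Fraction of reference annotations in `category` that the worker matched.

    Returns None if the reference set has no annotations in that category.
    """
    refs = [r for r in reference if r.category == category]
    subs = [w for w in worker if w.category == category]
    if not refs:
        return None
    matched = sum(any(within_tolerance(r, w, tol) for w in subs) for r in refs)
    return matched / len(refs)


# Made-up example data: the second worker annotation is two seconds late.
reference = [
    Annotation("violence", 10.0, 15.0),
    Annotation("violence", 100.0, 104.0),
    Annotation("language", 50.0, 51.0),
]
worker = [
    Annotation("violence", 10.2, 15.3),
    Annotation("violence", 102.0, 106.0),
]
score = check_worker(worker, reference)
```

With this made-up data, only the first scene counts as a match, since the second one is off by two seconds. Whether a check like this is practical on Mechanical Turk is exactly the kind of thing a super small trial could tell us.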


2 participants