Add 'Both Bad' button to judging #13

obo · 2014-09-02T13:48:17Z

Next to Skip, we should also have a 'Both Bad' button. This absolute scoring is very important (and different from plain 'pick the better one') because MT will almost always be bad, but humans can make killer errors. If the judge understands that he has to pick the better one, he will pick the better (but still bad) human translation, and we would learn nothing about the fact that it is suspicious.

cifkao · 2014-09-02T13:55:09Z

The question is how we should alter the score when someone clicks the
button.

obo · 2014-09-02T14:00:09Z

We should reduce the score for both and log separately, e.g. in one more column in the database, that each of the candidates got one more 'bad mark'. In the end, we should know how many such disqualifying marks each sentence got.

cifkao · 2014-09-04T07:58:05Z

We have a table for individual scorings with a result column that is either 'a' or 'b'. We could just make it 'x' when both translations are unacceptable.

Or do you mean to create a bad mark counter in the translations table? Do we also want to increment it when the other translation gets selected?

edasubert · 2014-09-04T08:11:39Z

This is a bit unfortunate since, as already mentioned, Elo scoring is
strictly one win, one loss.
A simple solution: if 'Both Bad' is selected, we let both translations
lose to a virtual one with the base score of 1400.
That way we can keep the current tables and everything; we just add a
'special' entry for this virtual translation and make sure its score stays
the same.
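
A minimal sketch of what this proposal could look like, assuming the standard Elo update rule. The K-factor of 32 and all function names are assumptions made for illustration; only the base score of 1400 comes from the thread.

```python
# Standard Elo update; K = 32 is an assumed value, not taken from the project.
K = 32
VIRTUAL_SCORE = 1400  # base score mentioned in the thread


def expected(rating: float, opponent: float) -> float:
    """Expected score of `rating` against `opponent` (standard Elo formula)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400))


def lose_to_virtual(rating: float, k: float = K) -> float:
    """New rating after a loss (actual score 0) to the fixed virtual opponent."""
    return rating + k * (0.0 - expected(rating, VIRTUAL_SCORE))


def both_bad(rating_a: float, rating_b: float) -> tuple:
    """Apply 'Both Bad': each candidate loses independently. The virtual
    opponent is never updated, so its score stays at VIRTUAL_SCORE."""
    return lose_to_virtual(rating_a), lose_to_virtual(rating_b)
```

Under these assumptions, a candidate rated above 1400 loses more points than one rated below it, because a loss to the virtual opponent is more surprising for a highly rated candidate.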

obo · 2014-09-04T08:23:27Z

Yes, adding 'x' for 'Both Bad' would be great. (And actually, I would consider one more button, 'Both Acceptable', e.g. 'o' for OK in the database.)

As for the counters, I'd actually want to know exactly: how many wins (a), losses (b), bads (x) and possibly OKs (o) did each candidate get.

If we don't add Both OK, I hope the scorers will tell us both relative knowledge (this one is better) as well as absolute knowledge (this candidate is bad, full stop).
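
The per-candidate counters described above could be derived from the scorings table roughly as follows. This is only a sketch: the row shape `(translation_a_id, translation_b_id, result)` and the function name are hypothetical, and it assumes that a result of 'a' means candidate A was picked.

```python
from collections import Counter, defaultdict


def tally(scorings):
    """Count wins, losses, bads and OKs per translation id.

    `scorings` is an iterable of (a_id, b_id, result) tuples (assumed shape),
    where result is 'a', 'b', 'x' (both bad) or 'o' (both OK).
    """
    counts = defaultdict(Counter)
    for a_id, b_id, result in scorings:
        if result == "a":
            counts[a_id]["wins"] += 1
            counts[b_id]["losses"] += 1
        elif result == "b":
            counts[b_id]["wins"] += 1
            counts[a_id]["losses"] += 1
        elif result == "x":
            counts[a_id]["bads"] += 1
            counts[b_id]["bads"] += 1
        elif result == "o":
            counts[a_id]["oks"] += 1
            counts[b_id]["oks"] += 1
        # Any other value (e.g. a skipped or still-open scoring) is ignored.
    return counts


rows = [(1, 2, "a"), (1, 3, "x"), (2, 3, "b")]
summary = tally(rows)
```

Keeping the raw rows and computing the counters on demand means the answers can be reinterpreted later without changing what was stored.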

obo · 2014-09-04T08:26:43Z

I don't know anything about Elo. The main reason for adding Both Bad is that when scoring, I really did not want to pick either of two bad outputs.

The second reason is that (as written in another comment here) I believe this would also give us some absolute information.

I suggest adding Both Bad and -- if there is no better way -- completely ignoring it in Elo scoring and just using it elsewhere. E.g. I'd make use of it in the manual shutter page.

edasubert · 2014-09-04T08:41:48Z

I am not arguing against the Both Bad button. The question is how to handle
it internally. We cannot abandon Elo, since the entire scoring system is
built around it. We need a way to adjust the score of both bad translations
that does not make them impossible to score with Elo in the future.
One possible way is to score them against the virtual translation. Then,
whenever you want to find all 'both bad' translations, you search for
records of comparisons against this specific one.

I will, however, argue against a Both OK button. We are not looking for the
most OK translation; we are looking for the best translation. If you do not
feel you can judge which one is better, use the Skip button.
If we raised the score for OK translations, it could harm translations
that are better but were not selected for judging at that moment.

obo · 2014-09-04T10:04:01Z

Agreed.
Let's add just the Both Bad.
At last I understand what you meant by the virtual translation, and your handling seems reasonable: for Both Bad, each of the segments independently should get a 'worse than' mark in a virtual comparison against something terribly bad.

On the other hand, I find it essential that we still somewhere record the actual comparisons and answers we got, in the most detailed and original form, so that we can reinterpret them if we find something better.

cifkao · 2014-09-04T14:23:35Z

I'm not very much in favor of having a 'virtual translation' entry in the database because I would have to change the way scoring works. :-) An entry in the scorings table, identified by a hash, is created every time a user accesses the Scorings page, and updated when the user makes a choice. This is to prevent users from resubmitting their choice. If we used the virtual translation, we would actually have to remove the entry from the database and create two new entries (with different hashes etc.)... Not a very clean solution.

So I think the best solution is not to actually make the translations lose against a virtual translation, just change their score as if they did.

We could also make the two translations lose against each other. That is, just calculate the 'losing' part of the Elo rating and subtract that from either translation's score. (I'm not sure how this would behave.)
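
To get a feel for how the 'lose against each other' variant might behave, here is a small sketch assuming the standard Elo formula. The K-factor of 32 and the function names are assumptions, not values from the project.

```python
K = 32  # assumed K-factor


def expected(rating: float, opponent: float) -> float:
    """Expected score of `rating` against `opponent` (standard Elo formula)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400))


def both_lose_to_each_other(rating_a: float, rating_b: float, k: float = K):
    """Treat each candidate as the loser of the A-vs-B comparison and
    subtract only the 'losing' part of the update from its score."""
    new_a = rating_a - k * expected(rating_a, rating_b)
    new_b = rating_b - k * expected(rating_b, rating_a)
    return new_a, new_b


new_a, new_b = both_lose_to_each_other(1600, 1200)
```

Because the two expected scores always sum to 1, the two penalties in this variant always sum to K, and the higher-rated candidate takes the larger share. The penalty therefore depends on which pair happened to be shown together, whereas losing to a fixed notional score depends only on the candidate's own rating.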

edasubert · 2014-09-04T14:30:39Z

OK, let's do it your way, but I am in favor of pretending they lost to
the default value.

cifkao · 2014-09-05T08:41:23Z

Added a 'Both wrong' button (sounds better I think). When clicked, we make both candidates lose to a notional translation with score 1400 (this is stored in the settings as Scoring.both_bad_winner_score).

edasubert · 2014-09-05T08:51:13Z

I do not think 'both wrong' is better. I would suggest something along
the lines of 'neither acceptable', since I think this button should be used
as little as possible.
It is just for that (hopefully) rare occasion when neither translation has
meaning.
Otherwise I should pick the slightly better one out of two bad ones.
At least that is how I see it.

cifkao · 2014-09-05T08:52:01Z

How about 'both junk'?

edasubert · 2014-09-05T08:54:17Z

It does have the proper meaning, but I would wish for something a bit more
classy

obo · 2014-09-05T08:57:59Z

Both Junk sounds best.

obo · 2014-09-05T09:14:06Z

Both Inacceptable
(or Unacceptable??)
...but these are Twitter users, they surely understand Junk better than Inacceptable.

edasubert · 2014-09-05T09:27:04Z

I would rather use ridiculous or something
Junk has some other meanings...

obo added the enhancement label Sep 2, 2014

obo mentioned this issue Sep 2, 2014

Explain the blue-green-red bars under tweets #15

Open

cifkao closed this as completed Sep 5, 2014
