user scoring and question selection #57

MattAlexMiracle · 2022-12-25T13:06:53Z

This implements an early version of #37 using the following systems:

We use "information gain" to model how much a prompt's rating could benefit from new votes.
We score votes based on how close they were to the optimal vote. For example, if the consensus produces [100,300,200] and you vote for index 2 (counting from 0), you get 1 point: the ascending ranks are [0,2,1], since 300 is the largest value, 200 the second largest, and 100 the smallest.
We score prompts using "positive edge": if your votes are [200,300,100,500], you multiply them element-wise by the (signed) distance to the middle, i.e. [-1,0,1,2], and get $-1\cdot 200+0\cdot 300+1\cdot 100+2\cdot 500 = 900$ points.
We then score rankings using the Kendall tau correlation (or "normalized bubble sort distance"): this compares the combined ranking from all rankers to the ranking an individual user produced, and gives points based on how similar they are.
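
The vote, prompt, and ranking scores described above can be sketched roughly as follows. This is a minimal illustration and not the code in `scripts/postprocessing/infogain_selector.py`: the function names `vote_points`, `prompt_points`, and `kendall_tau` are hypothetical, the "middle" index for even-length inputs is inferred from the [-1,0,1,2] example, and the Kendall tau variant assumes two rankings of the same items without ties.

```python
from itertools import combinations


def vote_points(consensus, voted_index):
    """Points for one vote: the ascending rank of the chosen option.

    Assumption: the smallest consensus value has rank 0, the largest n-1.
    """
    # Indices sorted by consensus value; the position in this order is the rank.
    order = sorted(range(len(consensus)), key=lambda i: consensus[i])
    ranks = [0] * len(consensus)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks[voted_index]


def prompt_points(votes):
    """'Positive edge' score: weight each entry by its signed offset to the middle.

    Assumption: for 4 entries the offsets are [-1, 0, 1, 2], as in the example.
    """
    middle = (len(votes) - 1) // 2
    return sum((i - middle) * v for i, v in enumerate(votes))


def kendall_tau(consensus_ranking, user_ranking):
    """Kendall tau between two orderings of the same items (no ties assumed).

    Returns 1.0 for identical orderings and -1.0 for fully reversed ones.
    """
    position = {item: i for i, item in enumerate(user_ranking)}
    concordant = discordant = 0
    # Every pair (a, b) here has a before b in the consensus ranking.
    for a, b in combinations(consensus_ranking, 2):
        if position[a] < position[b]:
            concordant += 1
        else:
            discordant += 1
    n_pairs = concordant + discordant
    return (concordant - discordant) / n_pairs if n_pairs else 1.0
```

With the numbers from the description, `vote_points([100, 300, 200], 2)` gives 1 and `prompt_points([200, 300, 100, 500])` gives 900. A user ranking identical to the consensus gets a Kendall tau of 1.0, and a fully reversed one gets -1.0.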

All three point scoring methods (voting, ranking, and prompting) have their own point counts, since the scales are very different. How we weight them against each other will have to be determined experimentally, as it also depends on many external factors, such as how many votes per question we target.

scripts/postprocessing/infogain_selector.py

yk

LGTM, thank you

If I understand correctly, a "vote" is a best-one-of-N judgement. Is there also a variant where each user provides a complete ranking of all N choices? I'm thinking that if we already make the user read all of the options, they might as well rank all of them.

Alexander Mattick added 7 commits December 25, 2022 11:45

infogain analytic solution

be424c9

Merge remote-tracking branch 'refs/remotes/origin/main'

81666fc

simple scoring system for scoreboard

76f7ed8

simple scoring system for prompts and ranks

58885f1

added utility functions to dataclass

a1c2580

added fixed 'good prompt' definition for ranking

d198eaf

added fixed point definition for prompt

a62db13

MattAlexMiracle requested review from yk and andreaskoepf as code owners December 25, 2022 13:06

andreaskoepf reviewed Dec 25, 2022

scripts/postprocessing/infogain_selector.py

yk approved these changes Dec 25, 2022

andreaskoepf merged commit f8c3008 into LAION-AI:main Dec 26, 2022
