Grand Challenge #38

k0105 opened this Issue Mar 2, 2016 · 11 comments


k0105 commented Mar 2, 2016


I just wanted to let you know that I plan to hold a competition between human contestants and "my" Yoda version in mid to late April as a grand finale of my contributions so far. Thanks to some tricks, my version currently gets about twice as many correct (top-1) answers as the default configuration in one third of the time, so I'm cautiously optimistic that the system could win this challenge. Hence, unless some higher power prevents it, this will take place.

It might not be against Jeopardy grand champions or covered on live TV, but if Yoda with some additions is able to win against people running around in a university, that would be a great milestone imho. I'm currently somewhere between fear and excitement and will keep you posted about the results.

Best wishes,

pasky commented Mar 3, 2016

Hi! That sounds really awesome! Please do keep us posted about this. :-) I'm also looking forward to learning more about your tricks as well as the details of your configuration. How many human contestants will you have? I assume they will be amateurs in the domain?

k0105 commented Mar 10, 2016

Btw.: Are any new releases planned for the near future (~6 weeks)? I currently use 1.4 and am wondering whether I should upgrade to 1.5 or wait for the next version.

pasky commented Mar 10, 2016
k0105 commented Mar 10, 2016

Neural models sound great, looking forward to that.

k0105 commented Mar 14, 2016

The Grand Challenge is done. The questions were written by a colleague of mine (who is not involved in my team or project), so I couldn't influence them in any way.

Most people could only answer around 15 questions correctly, but one particularly strong candidate got 24 right. Afterwards, I ran my system on the same questions. Yoda by itself answered 13 correctly; the complete system got 24, just like the best human contender.

So the result of my Grand Challenge of man vs. machine is: We are currently in a draw. We can win against average people, but we are "only" on par with the best.

@k0105 k0105 closed this Mar 14, 2016

@jbauer180266 - I'm wondering if you have results published somewhere? I'd love to take a look!

k0105 commented Mar 14, 2016

Not yet, soon. I'll let you know.

@k0105 k0105 reopened this Mar 18, 2016
k0105 commented Mar 18, 2016

We have superhuman performance after all: I hadn't activated the Bing backend when I ran my tests. With it, exactly one additional question is answered correctly, which puts the system one ahead of the best human: 25 out of 30. Very nice.

@k0105 k0105 closed this Mar 18, 2016
k0105 commented Mar 21, 2016

So far 16 people have taken the test and the result is the same: best human 24, system 25. Bad news: this time I don't have any kids around for the evaluation, so I can't say anything about humans under 18. Good news: all other age groups are covered, there are fairly many women (slightly under 50%), and almost all educational statuses and all English proficiency levels except "none" are covered, so external validity is decent. Under the assumption that older people know more and that PhD-level subjects are more "dangerous" to the system, this works in our favor - you could say internal validity is increased, I guess.

pasky commented Mar 22, 2016

Congratulations! To test pure stock YodaQA on the challenge, I have created a small JSON dataset and dusted off data/eval/

{"qId": "gch000000", "qText": "What is the capital of Zimbabwe?", "answers": ["Harare"]},
{"qId": "gch000001", "qText": "Who invented the Otto engine?", "answers": ["Nikolaus Otto"]},
{"qId": "gch000002", "qText": "When was Pablo Picasso born?", "answers": ["1881"]},
{"qId": "gch000003", "qText": "What is 7*158 + 72 - 72 + 9?", "answers": ["1115"]},
{"qId": "gch000004", "qText": "Who wrote the novel The Light Fantastic?", "answers": ["Terry Pratchett"]},
{"qId": "gch000005", "qText": "In which city was Woody Allen born?", "answers": ["New York"]},
{"qId": "gch000006", "qText": "Who is the current prime minister of Italy?", "answers": ["Matteo Renzi"]},
{"qId": "gch000007", "qText": "What is the equatorial radius of Earth's moon?", "answers": ["1738"]},
{"qId": "gch000008", "qText": "When did the Soviet Union dissolve?", "answers": ["1991"]},
{"qId": "gch000009", "qText": "What is the core body temperature of a human?", "answers": ["37", "98.6"]},
{"qId": "gch000010", "qText": "Who is the current Dalai Lama?", "answers": ["Tenzin Gyatso"]},
{"qId": "gch000011", "qText": "What is 2^23?", "answers": ["8388608"]},
{"qId": "gch000012", "qText": "Who is the creator of Star Trek?", "answers": ["Gene Roddenberry"]},
{"qId": "gch000013", "qText": "In which city is the Eiffel Tower?", "answers": ["Paris"]},
{"qId": "gch000014", "qText": "12 metric tonnes in kilograms?", "answers": ["12 *000"]},
{"qId": "gch000015", "qText": "Where is the mouth of the river Rhine?", "answers": ["the Netherlands"]},
{"qId": "gch000016", "qText": "Where is Buckingham Palace located?", "answers": ["London"]},
{"qId": "gch000017", "qText": "Who directed the movie The Green Mile?", "answers": ["Frank Darabont"]},
{"qId": "gch000018", "qText": "When did Franklin D. Roosevelt die?", "answers": ["1945"]},
{"qId": "gch000019", "qText": "Who was the first man in space?", "answers": ["Yuri Gagarin"]},
{"qId": "gch000020", "qText": "Where was the Peace of Westphalia signed?", "answers": ["Osnabrück", "Münster", "Westphalia"]},
{"qId": "gch000021", "qText": "Who was the first woman to be awarded a Nobel Prize?", "answers": ["Marie Curie"]},
{"qId": "gch000022", "qText": "12.1147 inches to yards?", "answers": ["0.3365194444"]},
{"qId": "gch000023", "qText": "What is the atomic number of potassium?", "answers": ["19"]},
{"qId": "gch000024", "qText": "Where is the Tiananmen Square?", "answers": ["China"]},
{"qId": "gch000025", "qText": "What is the binomial name of horseradish?", "answers": ["Armoracia Rusticana"]},
{"qId": "gch000026", "qText": "How long did Albert Einstein live?", "answers": ["76"]},
{"qId": "gch000027", "qText": "Who earned the most Academy Awards?", "answers": ["Walt Disney", "Katharine Hepburn"]},
{"qId": "gch000028", "qText": "How many lines does the London Underground have?", "answers": ["11"]},
{"qId": "gch000029", "qText": "When is the next planned German Federal Convention?", "answers": []}
$ data/eval/ data/eval/gch.json
ID              Question Text                                           indicator       correct answer  found           URL
gch000000       What is the capital of Zimbabwe?                        correct         Harare          Harare
gch000001       Who invented the Otto engine?                           correct         Nikolaus Otto   Nikolaus Otto
gch000002       When was Pablo Picasso born?                            correct         1881            1881  
gch000003       What is 7*158 + 72 - 72 + 9?                            incorrect       1115            78
gch000004       Who wrote the novel The Light Fantastic?                correct         Terry Pratchett Terry Pratchett
gch000005       In which city was Woody Allen born?                     correct         New York        New York
gch000006       Who is the current prime minister of Italy?             correct         Matteo Renzi    Matteo Renzi
gch000007       What is the equatorial radius of Earth's moon?          incorrect       1738            the Moon and Su
gch000008       When did the Soviet Union dissolve?                     correct         1991            1991  
gch000009       What is the core body temperature of a human?           incorrect       37              Bio 42 and cour
gch000010       Who is the current Dalai Lama?                          correct         Tenzin Gyatso   Tenzin Gyatso
gch000011       What is 2^23?                                           incorrect       8388608         the Gregorian c
gch000012       Who is the creator of Star Trek?                        correct         Gene Roddenberr Gene Roddenberr
gch000013       In which city is the Eiffel Tower?                      correct         Paris           Paris 
gch000014       12 metric tonnes in kilograms?                          incorrect       12 *000         SI    
gch000015       Where is the mouth of the river Rhine?                  correct         the Netherlands the Netherlands
gch000016       Where is Buckingham Palace located?                     correct         London          London
gch000017       Who directed the movie The Green Mile?                  correct         Frank Darabont  Frank Darabont
gch000018       When did Franklin D. Roosevelt die?                     correct         1945            1945  
gch000019       Who was the first man in space?                         correct         Yuri Gagarin    Yuri Gagarin
gch000020       Where was the Peace of Westphalia signed?               incorrect       Osnabrück       France
gch000021       Who was the first woman to be awarded a Nobel Priz      incorrect       Marie Curie     Elinor Ostrom
gch000022       12.1147 inches to yards?                                incorrect       0.3365194444    CUX 570 17 577
gch000023       What is the atomic number of potassium?                 correct         19              19    
gch000024       Where is the Tiananmen Square?                          correct         China           China 
gch000025       What is the binomial name of horseradish?               correct         Armoracia Rusti Armoracia Rusti
gch000026       How long did Albert Einstein live?                      incorrect       76              Germany
gch000027       Who earned the most Academy Awards?                     recall          Walt Disney     Jimmy Stewart
gch000028       How many lines does the London Underground have?        incorrect       11              Soho Revue Bar
gch000029       When is the next planned German Federal Convention      incorrect                       1850  
correctly answered: 18
recall: 1
incorrect: 11
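
The three summary counts above can be re-derived from the per-question rows. The following is a minimal sketch, assuming the evaluation output is tab-separated with the indicator in the third column; `tally_indicators` is a hypothetical helper name and not part of YodaQA.

```python
from collections import Counter


def tally_indicators(tsv_text):
    """Count the values of the 'indicator' column in tab-separated eval output.

    Assumption: one question per line, fields separated by tabs, with the
    indicator (correct / recall / incorrect) in the third column.
    """
    counts = Counter()
    for line in tsv_text.splitlines():
        fields = line.split("\t")
        # Skip the header row and any line without enough columns
        # (for example the summary lines at the end).
        if len(fields) < 3 or fields[0] == "ID":
            continue
        counts[fields[2].strip()] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "ID\tQuestion Text\tindicator\tcorrect answer\tfound\tURL\n"
        "gch000000\tWhat is the capital of Zimbabwe?\tcorrect\tHarare\tHarare\n"
        "gch000003\tWhat is 7*158 + 72 - 72 + 9?\tincorrect\t1115\t78\n"
        "gch000027\tWho earned the most Academy Awards?\trecall\tWalt Disney\tJimmy Stewart\n"
    )
    for indicator, count in tally_indicators(sample).items():
        print(f"{indicator}: {count}")
```

Run over all 30 rows, the three counts should add up to the size of the question set.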

Seems like combined with Wolfram Alpha, the system could answer at least another 4 non-factoid questions and probably get help with at least 2 factoids, which would bring it to 24; it's pretty likely, though, that Wolfram Alpha knows some of the other incorrect factoids too.

For your thesis, I'd also recommend comparing this to plain Wolfram Alpha and Google QA. For the latter, we have a script you can easily use in (though it may not correctly extract the answer from all non-movie-related results; there is some variability in the HTML code).
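
Spelled out, the estimate is simple addition on the numbers reported in this thread. This is only a sketch of the arithmetic; the variable names are illustrative, and the Wolfram Alpha gains are the lower-bound guesses from the paragraph above, not measured results.

```python
# Numbers taken from this thread.
TOTAL_QUESTIONS = 30
BEST_HUMAN = 24

stock_correct = 18         # stock YodaQA, from the evaluation run
wolfram_non_factoid = 4    # estimated lower bound (arithmetic and unit conversions)
wolfram_factoid = 2        # estimated lower bound

projected = stock_correct + wolfram_non_factoid + wolfram_factoid
print(f"projected: {projected} of {TOTAL_QUESTIONS}")
print(f"ties best human: {projected == BEST_HUMAN}")
```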

Great work!

k0105 commented Mar 22, 2016

Thank you very much for the hints - already done from the start, though. I compared my results to both and also confirmed the great synergies between Yoda and Wolfram that I reported to you after the pilot study (as you probably remember). The questions are a bit treacherous, btw.: for this particular set, Yoda and Wolfram together answer almost every question that my full system answers correctly, but larger test sets made it apparent that a combination of only Wolfram and Yoda still leaves some holes, which I've been able to plug at least to some degree.

Btw: The evaluation is finished after 26 (13 male, 13 female) subjects. The system still takes the top spot.

[Just to document a minor discrepancy: I had 27 subjects, but couldn't find a 14th female subject, so I dropped one male candidate to get equal numbers by gender, which should slightly increase external validity. The one I dropped was neither the best nor the worst and was randomly chosen from all males. Btw: The two best human contenders are male, the third best is female.]
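
For anyone repeating such an evaluation, balancing the groups by randomly removing subjects from the larger one could be sketched as below. This is an illustration under assumptions, not the procedure actually used here: the `gender` field, the record layout, and the function name `balance_by_gender` are all hypothetical.

```python
import random


def balance_by_gender(subjects, seed=None):
    """Return a subset with equally many male and female subjects.

    Subjects from the larger group are removed uniformly at random.
    Assumption: each subject is a dict with a 'gender' key whose value
    is either 'male' or 'female'.
    """
    rng = random.Random(seed)
    males = [s for s in subjects if s["gender"] == "male"]
    females = [s for s in subjects if s["gender"] == "female"]
    target = min(len(males), len(females))
    # random.sample draws without replacement, so nobody is kept twice.
    return rng.sample(males, target) + rng.sample(females, target)


if __name__ == "__main__":
    subjects = [{"id": i, "gender": "male"} for i in range(14)]
    subjects += [{"id": 14 + i, "gender": "female"} for i in range(13)]
    kept = balance_by_gender(subjects, seed=0)
    print(len(kept), "subjects kept")
```

Passing a fixed `seed` makes the selection reproducible, which helps when the exclusion has to be documented later.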

@k0105 k0105 referenced this issue in claritylab/lucida Mar 23, 2016

Status of openephyra #89
