# Winning Jeopardy

Jeopardy is a popular TV show in the US where participants answer questions to win money. Let's say we want to compete on Jeopardy, and we're looking for any edge we can get to win. In this project we will figure out some patterns in the questions that could help us win.

First we import the dataset from a CSV file. It contains around 200,000 rows of Jeopardy questions dating back to the show's beginning.

In [2]:
import pandas as pd

# import dataset from file
jeopardy = pd.read_csv('JEOPARDY_CSV.csv')

# show first 5 rows of dataset
jeopardy.head()

Unnamed: 0,Show Number,Air Date,Round,Category,Value,Question,Answer
0,4680,2004-12-31,Jeopardy!,HISTORY,$200,"For the last 8 years of his life, Galileo was ...",Copernicus
1,4680,2004-12-31,Jeopardy!,ESPN's TOP 10 ALL-TIME ATHLETES,$200,No. 2: 1912 Olympian; football star at Carlisl...,Jim Thorpe
2,4680,2004-12-31,Jeopardy!,EVERYBODY TALKS ABOUT IT...,$200,The city of Yuma in this state has a record av...,Arizona
3,4680,2004-12-31,Jeopardy!,THE COMPANY LINE,$200,"In 1963, live on ""The Art Linkletter Show"", th...",McDonald's
4,4680,2004-12-31,Jeopardy!,EPITAPHS & TRIBUTES,$200,"Signer of the Dec. of Indep., framer of the Co...",John Adams


Some of the column names have leading spaces. Let's remove them. Note that the replacement below deletes every space, so `Show Number` also becomes `ShowNumber`.

In [3]:
# assign column names with deleted spaces to variable
jeopardy_columns = jeopardy.columns.str.replace(' ', '')
# assign variable's values to column names back
jeopardy.columns = jeopardy_columns
jeopardy.columns

Index(['ShowNumber', 'AirDate', 'Round', 'Category', 'Value', 'Question',
       'Answer'],
      dtype='object')
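
If the goal is only to drop leading or trailing spaces while keeping the ones inside a name, stripping is a gentler alternative. The sketch below compares both approaches on a hypothetical list of raw header names; the exact raw names are an assumption for illustration, not read from the file.

```python
# Hypothetical raw header names; the leading spaces are assumed for illustration.
raw_columns = ['Show Number', ' Air Date', ' Round', ' Category']

# Approach used above: remove every space, including the ones inside a name.
no_spaces = [name.replace(' ', '') for name in raw_columns]

# Alternative: strip only leading and trailing whitespace.
stripped = [name.strip() for name in raw_columns]

print(no_spaces)
print(stripped)
```

In pandas the stripping variant would be `jeopardy.columns.str.strip()`. The rest of this notebook keeps the space-free names such as `AirDate`.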

We also convert the *Value* column to integers.

In [4]:
# show all unique values for Value column
print(jeopardy['Value'].unique())

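# delete '$' and ',' symbols and replace the string 'None' with '0' in the Value column
# regex=False treats '$' as a literal character rather than a regex end-of-string anchor
jeopardy['Value'] = jeopardy['Value'].str.replace('$', '', regex=False)
jeopardy['Value'] = jeopardy['Value'].str.replace(',', '', regex=False)
jeopardy['Value'] = jeopardy['Value'].str.replace('None', '0', regex=False)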

# convert Value column type to int
jeopardy['Value'] = jeopardy['Value'].astype(int)

jeopardy.rename({'Value': 'Value, $'}, axis=1, inplace=True)

['$200' '$400' '$600' '$800' '$2,000' '$1000' '$1200' '$1600' '$2000'
 '$3,200' 'None' '$5,000' '$100' '$300' '$500' '$1,000' '$1,500' '$1,200'
 '$4,800' '$1,800' '$1,100' '$2,200' '$3,400' '$3,000' '$4,000' '$1,600'
 '$6,800' '$1,900' '$3,100' '$700' '$1,400' '$2,800' '$8,000' '$6,000'
 '$2,400' '$12,000' '$3,800' '$2,500' '$6,200' '$10,000' '$7,000' '$1,492'
 '$7,400' '$1,300' '$7,200' '$2,600' '$3,300' '$5,400' '$4,500' '$2,100'
 '$900' '$3,600' '$2,127' '$367' '$4,400' '$3,500' '$2,900' '$3,900'
 '$4,100' '$4,600' '$10,800' '$2,300' '$5,600' '$1,111' '$8,200' '$5,800'
 '$750' '$7,500' '$1,700' '$9,000' '$6,100' '$1,020' '$4,700' '$2,021'
 '$5,200' '$3,389' '$4,200' '$5' '$2,001' '$1,263' '$4,637' '$3,201'
 '$6,600' '$3,700' '$2,990' '$5,500' '$14,000' '$2,700' '$6,400' '$350'
 '$8,600' '$6,300' '$250' '$3,989' '$8,917' '$9,500' '$1,246' '$6,435'
 '$8,800' '$2,222' '$2,746' '$10,400' '$7,600' '$6,700' '$5,100' '$13,200'
 '$4,300' '$1,407' '$12,400' '$5,401' '$7,800' '$1,183' '$1,203
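
To make the cleanup rule explicit, here is a minimal standalone sketch of the same conversion for a single string. The helper name `parse_value` is hypothetical and not part of the notebook; it could be applied to the column with `Series.apply` instead of the chained replacements.

```python
def parse_value(raw):
    """Convert a raw value such as '$2,000' or 'None' to an integer."""
    # Treat missing values and the literal string 'None' as 0, as the cell above does.
    if raw is None or raw == 'None':
        return 0
    # Plain str.replace is always literal, so '$' needs no escaping here.
    return int(raw.replace('$', '').replace(',', ''))

# Sample strings taken from the unique values printed above.
samples = ['$200', '$2,000', 'None', '$1,492']
parsed = [parse_value(s) for s in samples]
print(parsed)
```

Keeping the rule in one small function makes it easy to check against unusual values before converting the whole column.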

We also convert the *AirDate* column from strings to datetime values.

In [5]:
# convert to datetime
jeopardy['AirDate'] = pd.to_datetime(jeopardy['AirDate'])

# show information about Jeopardy dataframe columns and their types 
jeopardy.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 216930 entries, 0 to 216929
Data columns (total 7 columns):
ShowNumber    216930 non-null int64
AirDate       216930 non-null datetime64[ns]
Round         216930 non-null object
Category      216930 non-null object
Value, $      216930 non-null int64
Question      216930 non-null object
Answer        216928 non-null object
dtypes: datetime64[ns](1), int64(2), object(4)
memory usage: 11.6+ MB


Now we normalize the text in the *Question* and *Answer* columns. To do that we write a function that converts a string to lowercase and deletes punctuation (hyphens are kept). Then we apply that function to the *Question* and *Answer* columns.

In [6]:
import re

# define a function that lowercases a string and removes punctuation (hyphens are kept)
def normalize(value):
    value = value.lower()
    return re.sub(r'[^a-z0-9\s\-]', '', value)

# apply the function to the Question column and assign the result to a new column, Clean Question
jeopardy['Clean Question'] = jeopardy['Question'].apply(normalize)

# apply the function to the Answer column and assign the result to a new column, Clean Answer
jeopardy['Clean Answer'] = jeopardy['Answer'].astype(str).apply(normalize)

# show first five rows of dataset
jeopardy.head()

Unnamed: 0,ShowNumber,AirDate,Round,Category,"Value, $",Question,Answer,Clean Question,Clean Answer
0,4680,2004-12-31,Jeopardy!,HISTORY,200,"For the last 8 years of his life, Galileo was ...",Copernicus,for the last 8 years of his life galileo was u...,copernicus
1,4680,2004-12-31,Jeopardy!,ESPN's TOP 10 ALL-TIME ATHLETES,200,No. 2: 1912 Olympian; football star at Carlisl...,Jim Thorpe,no 2 1912 olympian football star at carlisle i...,jim thorpe
2,4680,2004-12-31,Jeopardy!,EVERYBODY TALKS ABOUT IT...,200,The city of Yuma in this state has a record av...,Arizona,the city of yuma in this state has a record av...,arizona
3,4680,2004-12-31,Jeopardy!,THE COMPANY LINE,200,"In 1963, live on ""The Art Linkletter Show"", th...",McDonald's,in 1963 live on the art linkletter show this c...,mcdonalds
4,4680,2004-12-31,Jeopardy!,EPITAPHS & TRIBUTES,200,"Signer of the Dec. of Indep., framer of the Co...",John Adams,signer of the dec of indep framer of the const...,john adams



Now we write functions that help us figure out:

* How often the answer is deducible from the question.
* How often new questions are repeats of older questions.

# Answers in questions

In [7]:
def count_matches(row):
    # variable which counts word matches in answer and question
    match_count = 0
    # split answer and question into word lists for more convenient counting
    split_question = row['Clean Question'].split(' ')
    split_answer = row['Clean Answer'].split(' ')
    # remove the first 'the' from the answer, since it is the most common word
    if 'the' in split_answer:
        split_answer.remove('the')
    # if the answer is empty, return 0 to prevent division by zero
    if len(split_answer) == 0:
        return 0
    # else count word matches
    for word in split_answer:
        if word in split_question:
            match_count += 1
    # divide number of matches by len of answer, return the result
    return match_count/len(split_answer)

# apply the function to new column Answer in Question
jeopardy['Answer in Question'] = jeopardy.apply(count_matches, axis=1)

# find a mean of Answer in Question column
jeopardy['Answer in Question'].mean()

0.05931971014221584
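
To make the ratio concrete, here is a small standalone sketch of the same idea on a made-up question and answer. The function name `answer_overlap` and the sample strings are hypothetical. Unlike `count_matches`, this variant drops every occurrence of 'the' from the answer and splits on any whitespace.

```python
def answer_overlap(clean_question, clean_answer):
    """Share of answer words (ignoring 'the') that also appear in the question."""
    question_words = set(clean_question.split())
    answer_words = [w for w in clean_answer.split() if w != 'the']
    # An empty answer has no words to match, so report 0 instead of dividing by zero.
    if not answer_words:
        return 0
    matches = sum(1 for w in answer_words if w in question_words)
    return matches / len(answer_words)

# Made-up example: 'city' and 'of' appear in the question, 'paris' does not.
ratio = answer_overlap('this city is the capital of france', 'the city of paris')
print(ratio)
```

Two of the three remaining answer words match, so the ratio is 2/3. Short function words such as 'of' inflate the score, which is worth keeping in mind when reading the mean.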

The mean of the *Answer in Question* column is around 0.06, so on average only about 6% of an answer's words also appear in its question. The vast majority of questions don't contain their own answer, which means we can't win Jeopardy by looking for the answer in the question. We have to study.

Now we want to find out how often new questions repeat older ones. To do that we sort the dataset by the *AirDate* column from oldest to newest, then write a function that counts how often the words of a question have already appeared in earlier questions and apply it to the entire dataset.

# Questions overlap

In [8]:
# sort dataset by date from earliest to latest
jeopardy = jeopardy.sort_values('AirDate')
# set for storing words already seen in earlier questions
terms_used = set()

def count_repeats(row):
    # counts how many words in this question were already used
    repeat_count = 0
    # split the question into a word list for more convenient counting
    split_question = row['Clean Question'].split(' ')
    # keep only words with 6 or more characters
    split_question = [word for word in split_question if len(word) > 5]
    # count repeated words and record every word as seen
    for word in split_question:
        if word in terms_used:
            repeat_count += 1
        terms_used.add(word)
    # divide the number of repeats by the number of words kept
    if len(split_question) > 0:
        repeat_count /= len(split_question)
    return repeat_count

jeopardy['Questions Overlap'] = jeopardy.apply(count_repeats, axis=1)
jeopardy['Questions Overlap'].mean()

0.8713278428096588
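
The running-set idea can be sketched on a few made-up questions, processed in chronological order. The function name `overlap_by_question` and the sample questions are hypothetical; the logic mirrors `count_repeats`, including the six-character minimum.

```python
def overlap_by_question(questions, min_length=6):
    """For each question, share of its long words already seen in earlier text."""
    seen = set()
    ratios = []
    for question in questions:
        words = [w for w in question.split(' ') if len(w) >= min_length]
        repeats = 0
        for word in words:
            if word in seen:
                repeats += 1
            # Record the word so later questions (and later positions) count it as seen.
            seen.add(word)
        ratios.append(repeats / len(words) if words else 0)
    return ratios

ratios = overlap_by_question([
    'famous french author wrote novels',
    'another famous author',
    'island nation',
])
print(ratios)
```

The first question sees nothing before it, the second repeats two of its three long words, and the third is entirely new. Because the set only grows, later questions in a large dataset naturally score higher, which partly explains the high mean.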

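As we can see, on average about 87% of the words (of six or more characters) in a question had already been used in an earlier question. Because this looks at single words rather than phrases, it is relatively weak evidence, but it does mean that it's worth looking more into the recycling of questions.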

We also find that the *terms_used* set contains some strange words such as *hrefhttpwwwj-archivecommedia*. Let's delete these words from the set.

In [9]:
for term in list(terms_used):
    if 'hrefhttpwwwj-archivecommedia' in term:
        terms_used.remove(term)
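
These tokens look like remnants of HTML links in the question text: once `normalize` deletes characters such as `<`, `>`, `=`, `"` and `/`, a whole tag collapses into one long "word". The sketch below illustrates this on a made-up question and shows one possible fix, stripping tags before normalizing. The `strip_tags` helper and the sample link are hypothetical, and a regex this simple is only an assumption about how the markup looks.

```python
import re

def normalize(value):
    # Same normalization as earlier in this notebook.
    value = value.lower()
    return re.sub(r'[^a-z0-9\s\-]', '', value)

def strip_tags(value):
    """Hypothetical helper: drop anything that looks like an HTML tag."""
    return re.sub(r'<[^>]+>', '', value)

# Made-up question text containing a link; the file name is invented.
raw = 'Seen <a href="http://www.j-archive.com/media/example.jpg" target="_blank">here</a>, this bird'

with_residue = normalize(raw)
without_residue = normalize(strip_tags(raw))
print(with_residue)
print(without_residue)
```

Stripping tags at normalization time would keep such residue out of every later word count, instead of removing it from `terms_used` after the fact.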

Now we want to know whether some words appear more often in high-value questions than in low-value ones. Let's figure this out. First we write a function that flags a question as high value when its value is more than $800.

In [10]:
def high_value(value):
    if value > 800:
        return 1
    else:
        return 0
    
jeopardy['High Value'] = jeopardy['Value, $'].apply(high_value)
jeopardy.head()

Unnamed: 0,ShowNumber,AirDate,Round,Category,"Value, $",Question,Answer,Clean Question,Clean Answer,Answer in Question,Questions Overlap,High Value
84523,1,1984-09-10,Jeopardy!,LAKES & RIVERS,100,River mentioned most often in the Bible,the Jordan,river mentioned most often in the bible,the jordan,0.0,0.0,0
84565,1,1984-09-10,Double Jeopardy!,THE BIBLE,1000,"According to 1st Timothy, it is the ""root of a...",the love of money,according to 1st timothy it is the root of all...,the love of money,0.333333,0.0,1
84566,1,1984-09-10,Double Jeopardy!,'50'S TV,1000,Name under which experimenter Don Herbert taug...,Mr. Wizard,name under which experimenter don herbert taug...,mr wizard,0.0,0.0,1
84567,1,1984-09-10,Double Jeopardy!,NATIONAL LANDMARKS,1000,D.C. building shaken by November '83 bomb blast,the Capitol,dc building shaken by november 83 bomb blast,the capitol,0.0,0.0,1
84568,1,1984-09-10,Double Jeopardy!,NOTORIOUS,1000,"After the deed, he leaped to the stage shoutin...",John Wilkes Booth,after the deed he leaped to the stage shouting...,john wilkes booth,0.0,0.0,1


Then we write a function that counts, for each word, the number of high-value and low-value questions in which it appears. Counting frequencies for every word would take a very long time, so we start with 20 terms taken from *terms_used* (a set is unordered, so this is an arbitrary sample).

In [17]:
comparsion_dict = {k:[0, 0] for k in list(terms_used)[:20]}

def count_high_low_value(row):
    # split the question into a word list for more convenient counting
    split_question = row['Clean Question'].split(' ')
    # for each tracked word, count the high- and low-value questions containing it
    for word in comparsion_dict.keys():
        if word in split_question:
            if row['High Value'] == 1:
                comparsion_dict[word][0] += 1
            else:
                comparsion_dict[word][1] += 1

jeopardy.apply(count_high_low_value, axis=1)
comparsion_dict

{'rabelais': [3, 8],
 'nabopolassar': [1, 0],
 'quarries': [1, 4],
 'grumble': [2, 4],
 'rasingingin': [0, 1],
 'stretcht': [0, 1],
 'secretaries-general': [2, 0],
 'mazzello': [0, 1],
 'targetblankmassacrea': [1, 0],
 'roosts': [0, 1],
 'apapane': [0, 1],
 'targetblank100': [0, 1],
 'elment': [0, 1],
 'crystal-induced': [0, 1],
 'keratin': [1, 6],
 'openings': [4, 6],
 'evinced': [0, 1],
 'name--yep': [0, 1],
 'panning': [0, 1],
 'bonsal': [1, 0]}

Now we count the number of high-value and low-value questions in the dataset. We will use these counts in the chi-squared test.

In [19]:
high_value_count = jeopardy.loc[jeopardy['High Value'] == 1].shape[0]
low_value_count = jeopardy[jeopardy['High Value'] == 0].shape[0]
print(high_value_count)
print(low_value_count)

61422
155508


It's time for chi-squared testing. For each word in our dictionary, we use its high/low counts as the observed frequencies, derive the expected frequencies from the overall numbers of high-value and low-value questions in the dataset, and then compute the chi-squared statistic and p-value.

In [21]:
from scipy.stats import chisquare
import numpy as np

chi_squared = list()

# for every key in dict
for key in comparsion_dict.keys():
    # count sum of each dict element
    total = sum(comparsion_dict[key])
    total_prop = total / jeopardy.shape[0]
    # count expected high and low values for each word
    high_value_exp = total_prop * high_value_count
    low_value_exp = total_prop * low_value_count
    # count observed high and low values for each word
    observed = np.array([comparsion_dict[key][0], comparsion_dict[key][1]])
    expected = np.array([high_value_exp, low_value_exp])
    chi_squared.append(chisquare(observed, expected))

chi_squared    

[Power_divergenceResult(statistic=0.005878321230796754, pvalue=0.9388859030670194),
 Power_divergenceResult(statistic=2.5317964247338085, pvalue=0.11157312838169751),
 Power_divergenceResult(statistic=0.1702839704934861, pvalue=0.6798595573662745),
 Power_divergenceResult(statistic=0.07446818777814278, pvalue=0.7849388502668134),
 Power_divergenceResult(statistic=0.3949764642333513, pvalue=0.5296950912486695),
 Power_divergenceResult(statistic=0.3949764642333513, pvalue=0.5296950912486695),
 Power_divergenceResult(statistic=5.063592849467617, pvalue=0.02443353405878706),
 Power_divergenceResult(statistic=0.3949764642333513, pvalue=0.5296950912486695),
 Power_divergenceResult(statistic=2.5317964247338085, pvalue=0.11157312838169751),
 Power_divergenceResult(statistic=0.3949764642333513, pvalue=0.5296950912486695),
 Power_divergenceResult(statistic=0.3949764642333513, pvalue=0.5296950912486695),
 Power_divergenceResult(statistic=0.3949764642333513, pvalue=0.5296950912486695),
 Power_dive
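
To show what `chisquare` computes for a single term, here is a minimal hand calculation using only the standard library. It uses the counts for *secretaries-general* (`[2, 0]`) and the row totals printed earlier. The function name is hypothetical; the p-value formula relies on the fact that with two cells there is one degree of freedom, for which the chi-squared survival function equals `erfc(sqrt(x / 2))`.

```python
import math

def chi_square_two_cells(observed, expected):
    """Pearson chi-squared statistic and p-value for two cells (1 degree of freedom)."""
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For df = 1 the survival function of the chi-squared distribution is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

total_rows = 216930   # rows in the dataset
high_rows = 61422     # high-value rows
low_rows = 155508     # low-value rows

observed = [2, 0]     # 'secretaries-general': [high, low] counts
total = sum(observed)
expected = [total * high_rows / total_rows, total * low_rows / total_rows]

statistic, p_value = chi_square_two_cells(observed, expected)
print(round(statistic, 4), round(p_value, 4))
```

The result should agree with the corresponding entry in the output above (statistic about 5.06, p-value about 0.024).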

# Chi-squared results

Only one of these 20 terms, *secretaries-general*, has a p-value below 0.05. More importantly, every term occurs so rarely (at most 11 times) that its expected high-value count is below 5, so the chi-squared test is not reliable here. It would be better to run this test only on terms that have higher frequencies.
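
A common rule of thumb asks for an expected count of at least 5 in every cell. The sketch below, which assumes that rule and uses the row totals printed earlier, estimates how often a term must occur before its expected high-value count reaches 5.

```python
import math

total_rows = 216930   # rows in the dataset
high_rows = 61422     # high-value rows

# Share of all questions that are high value.
high_share = high_rows / total_rows

# Smallest number of occurrences at which the expected high-value count reaches 5.
min_total = math.ceil(5 / high_share)

# The most frequent term in the 20-term sample is 'rabelais' with 3 + 8 = 11 occurrences.
expected_high_rabelais = 11 * high_share

print(round(high_share, 3), min_total, round(expected_high_rabelais, 2))
```

Under this assumption a term needs at least 18 occurrences, which none of the sampled terms has.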

Next we build a dictionary of popular words (six or more characters) that occur more than 1000 times. The key is the word, the first value is the total frequency of that word in the dataset, the second value is its frequency in high-value questions, and the third value is its frequency in low-value questions.

In [110]:
freq_dict = dict()

def count_terms_freq(row):
    # split the question into a word list for more convenient counting
    split_question = row['Clean Question'].split(' ')
    # keep only words with 6 or more characters
    split_question = [word for word in split_question if len(word) > 5]
    # update the counts for every remaining word
    for word in split_question:
        # count the word's total frequency and its frequency in high- and low-value questions
        if word in freq_dict:
            freq_dict[word][0] += 1
        else:
            freq_dict[word] = [1, 0, 0]
        if row['High Value'] == 1:
            freq_dict[word][1] += 1
        else:
            freq_dict[word][2] += 1

jeopardy.apply(count_terms_freq, axis=1)

popular_terms = {k: v for k, v in sorted(freq_dict.items(), key=lambda item: item[1], reverse=True) if v[0] > 1000}
popular_terms

{'called': [5461, 1712, 3749],
 'country': [4868, 1391, 3477],
 'became': [3162, 915, 2247],
 'played': [3011, 773, 2238],
 'president': [3010, 845, 2165],
 'before': [2909, 787, 2122],
 'american': [2837, 925, 1912],
 'capital': [2772, 803, 1969],
 'famous': [2497, 721, 1776],
 'french': [2488, 943, 1545],
 'targetblankherea': [2469, 1065, 1404],
 'island': [2445, 771, 1674],
 'people': [2224, 603, 1621],
 'during': [2000, 563, 1437],
 'national': [1976, 571, 1405],
 'british': [1947, 657, 1290],
 'largest': [1943, 481, 1462],
 'century': [1820, 642, 1178],
 'little': [1812, 480, 1332],
 'company': [1798, 498, 1300],
 'around': [1797, 530, 1267],
 'character': [1755, 510, 1245],
 'author': [1692, 644, 1048],
 'between': [1668, 540, 1128],
 'targetblankthisa': [1623, 790, 833],
 'series': [1619, 399, 1220],
 'meaning': [1570, 580, 990],
 'family': [1537, 482, 1055],
 'founded': [1486, 482, 1004],
 'include': [1409, 360, 1049],
 'states': [1375, 366, 1009],
 'number': [1316, 330, 986],
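
The same tallies can also be sketched with `collections.Counter`, which avoids the manual "is the key already there" branch. This is an alternative formulation on made-up rows, not the code that produced the output above; the sample questions and flags are hypothetical.

```python
from collections import Counter

# Made-up (clean question, high-value flag) pairs.
rows = [
    ('famous french author', 1),
    ('famous island nation', 0),
    ('french capital city', 1),
]

total_counts = Counter()
high_counts = Counter()

for question, is_high in rows:
    # Keep only words with 6 or more characters, as in count_terms_freq.
    words = [w for w in question.split(' ') if len(w) > 5]
    total_counts.update(words)
    if is_high == 1:
        high_counts.update(words)

# Same layout as freq_dict: [total, high-value, low-value].
freq = {w: [n, high_counts[w], n - high_counts[w]] for w, n in total_counts.items()}
print(freq)
```

A `Counter` returns 0 for missing keys, so words that never appear in a high-value question need no special handling.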


It's time to repeat our chi-squared calculations for the list of popular words.

In [112]:
popular_terms_chisq = popular_terms.copy()

# for every key in dict
for key in popular_terms_chisq.keys():
    # count sum of each dict element
    total = sum(popular_terms_chisq[key][1:])
    total_prop = total / jeopardy.shape[0]
    # count expected high and low values for each word
    high_value_exp = total_prop * high_value_count
    low_value_exp = total_prop * low_value_count
    # count observed high and low values for each word
    observed = np.array([popular_terms_chisq[key][1], popular_terms_chisq[key][2]])
    expected = np.array([high_value_exp, low_value_exp])
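    # run the test once and store the rounded statistic and the p-value
    statistic, p_value = chisquare(observed, expected)
    popular_terms_chisq[key].append(round(statistic, 3))
    popular_terms_chisq[key].append(p_value)

# .copy() is shallow, so popular_terms shares these lists and shows the same values
popular_terms_chisq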

{'called': [5461, 1712, 3749, 24.789, 6.396502041100799e-07],
 'country': [4868, 1391, 3477, 0.162, 0.6870214064786846],
 'became': [3162, 915, 2247, 0.605, 0.4366797022481044],
 'played': [3011, 773, 2238, 10.352, 0.0012932490489544497],
 'president': [3010, 845, 2165, 0.086, 0.7690485090028965],
 'before': [2909, 787, 2122, 2.276, 0.13137471169049217],
 'american': [2837, 925, 1912, 25.732, 3.9230643888072246e-07],
 'capital': [2772, 803, 1969, 0.584, 0.44466144233485827],
 'famous': [2497, 721, 1776, 0.386, 0.5341917767705773],
 'french': [2488, 943, 1545, 112.679, 2.536550638160169e-26],
 'targetblankherea': [2469, 1065, 1404, 267.189, 4.650186459218781e-60],
 'island': [2445, 771, 1674, 12.486, 0.00040997773350746024],
 'people': [2224, 603, 1621, 1.58, 0.2087349403944424],
 'during': [2000, 563, 1437, 0.027, 0.8705216675424382],
 'national': [1976, 571, 1405, 0.33, 0.5654288517992552],
 'british': [1947, 657, 1290, 28.283, 1.047921947671791e-07],
 'largest': [1943, 481, 1462, 12.

Let's sort the dictionary entries by their chi-squared value, from highest to lowest.

In [117]:
# sort word according to their chi-square value
popular_terms_chisq = {k: v for k, v in sorted(popular_terms_chisq.items(), key=lambda item: item[1][3], reverse=True)}
# delete terms that are HTML residue rather than real words
del popular_terms_chisq['targetblankthisa']
del popular_terms_chisq['targetblankherea']
popular_terms_chisq = p

{'reports': [1235, 539, 696, 142.984, 5.9262631355130054e-33],
 'french': [2488, 943, 1545, 112.679, 2.536550638160169e-26],
 'italian': [1004, 423, 581, 94.437, 2.530559476941658e-22],
 'author': [1692, 644, 1048, 79.201, 5.611408626881465e-19],
 'german': [1072, 427, 645, 70.065, 5.737360709788552e-17],
 'meaning': [1570, 580, 990, 57.588, 3.2323425443197446e-14],
 'century': [1820, 642, 1178, 43.443, 4.365391335034222e-11],
 'british': [1947, 657, 1290, 28.283, 1.047921947671791e-07],
 'american': [2837, 925, 1912, 25.732, 3.9230643888072246e-07],
 'called': [5461, 1712, 3749, 24.789, 6.396502041100799e-07],
 'countrys': [1135, 259, 876, 16.884, 3.9743321508823615e-05],
 'english': [1279, 424, 855, 14.741, 0.00012332642188547744],
 'between': [1668, 540, 1128, 13.545, 0.000232873067591866],
 'island': [2445, 771, 1674, 12.486, 0.00040997773350746024],
 'founded': [1486, 482, 1004, 12.439, 0.00042056476395803166],
 'largest': [1943, 481, 1462, 12.123, 0.0004980320506931184],
 'series

Let's convert this dictionary to a dataframe.

In [136]:
pd.options.display.max_columns=1000

# make a dataframe from the dict and transpose it
popular_terms_df = pd.DataFrame(popular_terms_chisq, index=['Total Frequency', 'High Value Freq', 'Low Value Freq', 'Chi-Square', 'P-value'])
popular_terms_df = popular_terms_df.transpose()

# function for rounding values in dataframe
def round_value(value):
    return round(value, 5)

popular_terms_df['P-value'] = popular_terms_df['P-value'].apply(round_value)
popular_terms_df

Unnamed: 0,Total Frequency,High Value Freq,Low Value Freq,Chi-Square,P-value
reports,1235.0,539.0,696.0,142.984,0.0
french,2488.0,943.0,1545.0,112.679,0.0
italian,1004.0,423.0,581.0,94.437,0.0
author,1692.0,644.0,1048.0,79.201,0.0
german,1072.0,427.0,645.0,70.065,0.0
meaning,1570.0,580.0,990.0,57.588,0.0
century,1820.0,642.0,1178.0,43.443,0.0
british,1947.0,657.0,1290.0,28.283,0.0
american,2837.0,925.0,1912.0,25.732,0.0
called,5461.0,1712.0,3749.0,24.789,0.0


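As we can see, some words such as *reports*, *french*, *italian* and *author* appear in high-value questions more often than expected.
Let's filter these words so that only those whose chi-squared value is well above their p-value (here, more than twice the p-value) remain. A more conventional filter would keep the rows whose p-value is below a chosen significance level such as 0.05.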

In [138]:
popular_terms_df.loc[popular_terms_df['Chi-Square'] > 2*popular_terms_df['P-value']]

Unnamed: 0,Total Frequency,High Value Freq,Low Value Freq,Chi-Square,P-value
reports,1235.0,539.0,696.0,142.984,0.0
french,2488.0,943.0,1545.0,112.679,0.0
italian,1004.0,423.0,581.0,94.437,0.0
author,1692.0,644.0,1048.0,79.201,0.0
german,1072.0,427.0,645.0,70.065,0.0
meaning,1570.0,580.0,990.0,57.588,0.0
century,1820.0,642.0,1178.0,43.443,0.0
british,1947.0,657.0,1290.0,28.283,0.0
american,2837.0,925.0,1912.0,25.732,0.0
called,5461.0,1712.0,3749.0,24.789,0.0


As we can see, many of the popular words that deviate most from the expected frequencies relate to countries such as America, England, Germany and France, and to geography, history, education and the arts. Let's also take a closer look at the *Category* column of the dataset.

In [145]:
jeopardy['Category'].value_counts()[:30]

BEFORE & AFTER             547
SCIENCE                    519
LITERATURE                 496
AMERICAN HISTORY           418
POTPOURRI                  401
WORLD HISTORY              377
WORD ORIGINS               371
COLLEGES & UNIVERSITIES    351
HISTORY                    349
SPORTS                     342
U.S. CITIES                339
WORLD GEOGRAPHY            338
BODIES OF WATER            327
ANIMALS                    324
STATE CAPITALS             314
BUSINESS & INDUSTRY        311
ISLANDS                    301
WORLD CAPITALS             300
U.S. GEOGRAPHY             299
RELIGION                   297
SHAKESPEARE                294
OPERA                      294
LANGUAGES                  284
BALLET                     282
TELEVISION                 281
FICTIONAL CHARACTERS       280
TRANSPORTATION             279
PEOPLE                     279
RHYME TIME                 279
STUPID ANSWERS             270
Name: Category, dtype: int64

This confirms our word analysis. Geography, history, education and the arts are popular themes in Jeopardy questions, and questions related to America are especially popular. That's no surprise.

In [149]:
jeopardy[jeopardy['Questions Overlap'] == 1]

Unnamed: 0,ShowNumber,AirDate,Round,Category,"Value, $",Question,Answer,Clean Question,Clean Answer,Answer in Question,Questions Overlap,High Value
137409,2,1984-09-11,Jeopardy!,STATE CAPITALS,400,"It actually <u>is</u> 5,280 feet above sea level",Denver,it actually uisu 5280 feet above sea level,denver,0.000000,1.0,0
164716,3,1984-09-12,Double Jeopardy!,WORLD OF FOOD,800,"French for ""sour wine"", one variety is literal...",vinegar,french for sour wine one variety is literally ...,vinegar,0.000000,1.0,0
70463,4,1984-09-13,Double Jeopardy!,STATE CAPITALS,600,"1 of 4 state capitals that end in word ""City""","(1 of) Salt Lake City, Oklahoma City, Jefferso...",1 of 4 state capitals that end in word city,1 of salt lake city oklahoma city jefferson ci...,0.545455,1.0,0
127871,5,1984-09-14,Jeopardy!,AMERICAN LITERATURE,200,"Lincoln called it ""the book that caused the bi...",Uncle Tom's Cabin,lincoln called it the book that caused the big...,uncle toms cabin,0.000000,1.0,0
127905,5,1984-09-14,Double Jeopardy!,SHAKESPEARE,600,"Battle of the sexes on which musical ""Kiss Me ...",The Taming of the Shrew,battle of the sexes on which musical kiss me k...,the taming of the shrew,0.500000,1.0,0
...,...,...,...,...,...,...,...,...,...,...,...,...
105934,6300,2012-01-27,Jeopardy!,THE TRUTH LIES THEREIN,200,Symbol for the second-lightest element,He,symbol for the second-lightest element,he,0.000000,1.0,0
105944,6300,2012-01-27,Jeopardy!,STUPID ANSWERS,600,This was first imprinted in black on individua...,M,this was first imprinted in black on individua...,m,0.000000,1.0,0
105946,6300,2012-01-27,Jeopardy!,THE TRUTH LIES THEREIN,600,"Old school CBS history show: ""You Are"" this",There,old school cbs history show you are this,there,0.000000,1.0,0
105947,6300,2012-01-27,Jeopardy!,VISITING THE CITY,800,There's a great opera house on Bennelong Point...,Sydney,theres a great opera house on bennelong point ...,sydney,0.000000,1.0,0
