# This is Jeopardy!

In this project I write several functions that investigate a dataset of _Jeopardy!_ questions and answers. Filter the dataset for topics that you're interested in, compute the average difficulty of those questions, and train to become the next Jeopardy champion!

In [1]:
import pandas as pd
pd.set_option('display.max_colwidth', None)
jeopardy_data = pd.read_csv('jeopardy.csv')
jeopardy_data.rename(columns = {'Show Number':'show_number', ' Air Date':'air_date', ' Round':'round', ' Category':'category', ' Value':'value', ' Question':'question', ' Answer':'answer'}, inplace = True)
jeopardy_data

Unnamed: 0,show_number,air_date,round,category,value,question,answer
0,4680,2004-12-31,Jeopardy!,HISTORY,$200,"For the last 8 years of his life, Galileo was under house arrest for espousing this man's theory",Copernicus
1,4680,2004-12-31,Jeopardy!,ESPN's TOP 10 ALL-TIME ATHLETES,$200,"No. 2: 1912 Olympian; football star at Carlisle Indian School; 6 MLB seasons with the Reds, Giants & Braves",Jim Thorpe
2,4680,2004-12-31,Jeopardy!,EVERYBODY TALKS ABOUT IT...,$200,"The city of Yuma in this state has a record average of 4,055 hours of sunshine each year",Arizona
3,4680,2004-12-31,Jeopardy!,THE COMPANY LINE,$200,"In 1963, live on ""The Art Linkletter Show"", this company served its billionth burger",McDonald's
4,4680,2004-12-31,Jeopardy!,EPITAPHS & TRIBUTES,$200,"Signer of the Dec. of Indep., framer of the Constitution of Mass., second President of the United States",John Adams
...,...,...,...,...,...,...,...
216925,4999,2006-05-11,Double Jeopardy!,RIDDLE ME THIS,$2000,This Puccini opera turns on the solution to 3 riddles posed by the heroine,Turandot
216926,4999,2006-05-11,Double Jeopardy!,"""T"" BIRDS",$2000,"In North America this term is properly applied to only 4 species that are crested, including the tufted",a titmouse
216927,4999,2006-05-11,Double Jeopardy!,AUTHORS IN THEIR YOUTH,$2000,"In Penny Lane, where this ""Hellraiser"" grew up, the barber shaves another customer--then flays him alive!",Clive Barker
216928,4999,2006-05-11,Double Jeopardy!,QUOTATIONS,$2000,"From Ft. Sill, Okla. he made the plea, Arizona is my land, my home, my father's land, to which I now ask to... return""",Geronimo


This function filters the dataset for questions that contain all of the words in a list. For example, when the list `["King", "England"]` was passed to the function, it returned a DataFrame of 49 rows, each with the strings `"King"` and `"England"` somewhere in its `question` column.

In [13]:
def question_words(data, list_of_words):
    # Case-sensitive substring test: every word must appear somewhere in the question
    contains_all = lambda x: all(word in x for word in list_of_words)
    return data.loc[data.question.apply(contains_all)]

filtered = question_words(jeopardy_data, ['For', 'the', 'life', 'arrest'])
filtered

Unnamed: 0,show_number,air_date,round,category,value,question,answer
0,4680,2004-12-31,Jeopardy!,HISTORY,200.0,"For the last 8 years of his life, Galileo was under house arrest for espousing this man's theory",Copernicus


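A more robust version of the function above: it lowercases both the search words and the question before comparing, so matches no longer depend on capitalization. Because it still tests for substrings, `"King"` also matches words such as "kingdom" and "speaking", as the output below shows.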

In [15]:
def question_words2(data, list_of_words):
    # Case-insensitive substring test: both sides are lowercased before comparing
    contains_all = lambda x: all(word.lower() in x.lower() for word in list_of_words)
    return data.loc[data.question.apply(contains_all)]

filtered = question_words2(jeopardy_data, ['King'])
filtered

Unnamed: 0,show_number,air_date,round,category,value,question,answer
34,4680,2004-12-31,Double Jeopardy!,"""X""s & ""O""s",400.0,Around 100 A.D. Tacitus wrote a book on how this art of persuasive speaking had declined since Cicero,oratory
40,4680,2004-12-31,Double Jeopardy!,DR. SEUSS AT THE MULTIPLEX,1200.0,"<a href=""http://www.j-archive.com/media/2004-12-31_DJ_26.mp3"">Ripped from today's headlines, he was a turtle king gone mad; Mack was the one good turtle who'd bring him down</a>",Yertle
50,4680,2004-12-31,Double Jeopardy!,DR. SEUSS AT THE MULTIPLEX,2000.0,"<a href=""http://www.j-archive.com/media/2004-12-31_DJ_24.mp3"">""500 Hats""... 500 ways to die. On July 4th, this young boy will defy a king... & become a legend</a>",Bartholomew Cubbins
56,5957,2010-07-06,Jeopardy!,"GEOGRAPHY ""E""",200.0,It's the largest kingdom in the United Kingdom,England
72,5957,2010-07-06,Jeopardy!,LET'S BOUNCE,600.0,"In this kid's game, you bounce a small rubber ball while picking up 6-pronged metal objects",jacks
...,...,...,...,...,...,...,...
216777,5070,2006-09-29,Double Jeopardy!,ANCIENT HISTORY,400.0,The first one of these tombs was built about 2650 B.C. by Imhotep for King Zoser & rose about 200 feet using steps,a pyramid (the pyramids accepted)
216787,5070,2006-09-29,Double Jeopardy!,TALES OF E.T.A. HOFFMANN,2000.0,"A Hoffmann tale title lost the words ""And The Mouse King"" when it became this Tchaikovsky ballet",The Nutcracker
216789,5070,2006-09-29,Double Jeopardy!,ANCIENT HISTORY,1200.0,"This kingdom of England grew from 2 settlements, one founded around 495 by Cerdic & his son Cynric",Wessex
216856,5195,2007-03-23,Double Jeopardy!,HAIL TO THE CHEF,1600.0,"You can cook like <a href=""http://www.j-archive.com/media/2007-03-23_DJ_24.jpg"" target=""_blank"">this</a> man who wrote ""The Joy of Wokking""",(Martin) Yan
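
The substring matches above ("speaking", "picking", "Wokking") could be avoided by matching whole words only. The following is a minimal sketch of that idea, not part of the original project: the function name `question_whole_words` and the three-row `sample` frame are made up for illustration, and the approach swaps the lambda for pandas' `str.contains` with a `\b` word-boundary regex.

```python
import re
import pandas as pd

def question_whole_words(data, list_of_words):
    """Keep rows whose question contains every word as a whole word, ignoring case."""
    mask = pd.Series(True, index=data.index)
    for word in list_of_words:
        # \b marks a word boundary; re.escape guards against regex metacharacters
        pattern = r"\b" + re.escape(word) + r"\b"
        mask &= data["question"].str.contains(pattern, case=False, regex=True, na=False)
    return data.loc[mask]

# Hypothetical sample built from questions shown in the output above
sample = pd.DataFrame({
    "question": [
        "It's the largest kingdom in the United Kingdom",
        "On July 4th, this young boy will defy a king",
        "Tacitus wrote a book on how this art of persuasive speaking had declined",
    ],
    "answer": ["England", "Bartholomew Cubbins", "oratory"],
})
whole_word_hits = question_whole_words(sample, ["King"])
print(whole_word_hits)
```

Only the second row survives, because "kingdom" and "speaking" contain "king" but not as a standalone word.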


Next I cleaned the `value` column. Its entries were strings such as `$200`, and I converted them to floats so that statistics like the mean, max and min can be computed on the column.

In [4]:
value_to_float_lambda = lambda row: float(row.value.strip('$').replace(',', '')) if row.value != 'None' else 0
jeopardy_data['value'] = jeopardy_data.apply(value_to_float_lambda, axis = 1)
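
The same cleaning can also be written with pandas string methods instead of a row-wise `apply`. This is only a sketch on a made-up three-element Series (`raw_values` is hypothetical, not the real column), and it assumes, like the cell above, that missing values are stored as the string `'None'` and should count as 0.

```python
import pandas as pd

# Hypothetical stand-in for the raw 'value' column
raw_values = pd.Series(["$200", "$2,000", "None"])

cleaned = (
    raw_values
    .str.replace(r"[$,]", "", regex=True)  # drop dollar signs and thousands separators
    .replace("None", "0")                  # same choice as above: a missing value counts as 0
    .astype(float)
)
print(cleaned.tolist())
```

Counting missing values as 0 pulls the mean down. If that is not wanted, `pd.to_numeric(..., errors="coerce")` would turn them into `NaN` instead, which `mean()` skips by default.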

In [16]:
#Now I can use my filter function from earlier in order to see how the mean behaves with different words
print(jeopardy_data.value.mean())
question_w_king = question_words2(jeopardy_data, ['King'])
print(question_w_king.value.mean())

739.9884755451067
771.8833850722094


The next functions count how often each unique answer appears among the questions in a dataset. For example, after filtering the entire dataset to only questions containing the word `"King"`, we can list every unique answer to those questions together with its count.

In [20]:
def unique_answers_for_filtered_questions(data, filter_words):
    filtered_data = question_words(data, filter_words)
    all_answers = filtered_data.answer.unique()
    for answer in all_answers:
        temp_fram = filtered_data[filtered_data.answer == answer]
        print(answer, len(temp_fram))
        
        
def short(data):
    return data.answer.value_counts()

#unique_answers_for_filtered_questions(jeopardy_data, ['King'])
short(filtered)

#The answer "Henry VIII" appeared 55 times and was the most common answer.

Henry VIII                   55
Solomon                      35
Richard III                  33
Louis XIV                    31
David                        30
                             ..
cardiac (in card I acted)     1
Henderson                     1
Computer                      1
Indians                       1
work                          1
Name: answer, Length: 5268, dtype: int64

Explore from here! This is an incredibly rich dataset, and there are so many interesting things to discover. There are a few columns that we haven't even started looking at yet. Here are some ideas on ways to continue working with this data:

 * Investigate the ways in which questions change over time by filtering by the date. How many questions from the 90s use the word `"Computer"` compared to questions from the 2000s?
 * Is there a connection between the round and the category? Are you more likely to find certain categories, like `"Literature"`, in Single Jeopardy or Double Jeopardy?
 * Build a system to quiz yourself. Grab random questions, and use the <a href="https://docs.python.org/3/library/functions.html#input">input</a> function to get a response from the user. Check to see if that response was right or wrong.
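
As a starting point for the round-versus-category idea, a cross-tabulation counts how many questions fall into each (category, round) pair. The sketch below uses a tiny made-up frame (`sample` and its rows are hypothetical) rather than the real dataset; on the real data the same call would take `jeopardy_data['category']` and `jeopardy_data['round']`.

```python
import pandas as pd

# Hypothetical toy rows; only the column names match the dataset
sample = pd.DataFrame({
    "round": ["Jeopardy!", "Jeopardy!", "Double Jeopardy!", "Double Jeopardy!", "Double Jeopardy!"],
    "category": ["HISTORY", "LITERATURE", "LITERATURE", "LITERATURE", "HISTORY"],
})

# Rows are categories, columns are rounds, cells are question counts
round_by_category = pd.crosstab(sample["category"], sample["round"])
print(round_by_category)
```

With one row per category, comparing the two round columns shows at a glance whether a category leans toward one round.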

In [7]:
jeopardy_data

air_date1 = jeopardy_data.air_date[0][0:4]
print(air_date1)

def words_in_time(data, words, years):
    years_str = [str(year) for year in years]
    print(years_str)
    range_frame_columns = {'show_number': [], 'air_date': [], 'round': [], 'category': [], 'value': [], 'question': [], 'answer': []}
    range_frame = pd.DataFrame(range_frame_columns)
    for year in years_str:
        year_lambda = lambda row: row['air_date'][0:4] == year
        filtered_year = data[data.apply(year_lambda, axis = 1)]
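        # Case-insensitive filter, so ' Computer ' also matches the lowercase ' computer ' seen in the output
        filtered_both = question_words2(filtered_year, words)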
        range_frame = pd.concat([range_frame, filtered_both])
    return range_frame

words_in_time(jeopardy_data, [' Computer '], range(1991, 2000))

2004
['1991', '1992', '1993', '1994', '1995', '1996', '1997', '1998', '1999']


Unnamed: 0,show_number,air_date,round,category,value,question,answer
149993,1525.0,1991-03-29,Jeopardy!,BUSINESS & INDUSTRY,100.0,This biggest computer company is biggest also among companies in philanthropic contributions,IBM
143917,2038.0,1993-06-16,Double Jeopardy!,U.S. GOVERNMENT,800.0,"This agency, the GSA, operates the Federal Communications & Computer Systems",General Services Administration
143069,2578.0,1995-11-15,Double Jeopardy!,1995,400.0,"In June, Lotus Development Corporation agreed to be taken over by this computer giant",IBM
186728,2543.0,1995-09-27,Double Jeopardy!,WOMEN,600.0,"Grace Murray Hopper, who helped develop COBOL, also coined this term for a computer glitch",a bug
191881,2840.0,1996-12-27,Jeopardy!,CORPORATE AMERICA,500.0,Hoover's Handbook says this South Dakota mail-order computer firm is the USA's fastest-growing company,Gateway 2000
191929,2815.0,1996-11-22,Jeopardy!,BUSINESS & INDUSTRY,300.0,In 1993 Louis V. Gerstner became the first outsider to head this computer giant,IBM
209897,2676.0,1996-04-01,Jeopardy!,APRIL FOOLS' DAY,200.0,"The ""core"" of this computer company goes back to its founding, April 1, 1976",Apple Computer
22502,2874.0,1997-02-13,Jeopardy!,BUSINESS & INDUSTRY,200.0,This company's new computer products include the Aptiva S & the Thinkpad 560,IBM
51059,2990.0,1997-09-05,Jeopardy!,BORN IN THE '60S,300.0,"In 1996, he beat a chess-playing IBM computer called Deep Blue; in 1997 the tables were turned",Garry Kasparov
52516,2912.0,1997-04-08,Double Jeopardy!,CONTEMPORARIES,600.0,"While Charles Babbage was trying to build a computer in the 1820s, this Scot invented what would be a ""Mac""",Charles McIntosh


This function filters the data based on the years provided and returns a pandas DataFrame containing the filtered data.

In [8]:
def filter_in_time(data, years):
    years_str = [str(year) for year in years]
    range_frame_columns = {'show_number': [], 'air_date': [], 'round': [], 'category': [], 'value': [], 'question': [], 'answer': []}
    range_frame = pd.DataFrame(range_frame_columns)
    for year in years_str:
        year_lambda = lambda row: row['air_date'][0:4] == year
        filtered_year = data[data.apply(year_lambda, axis = 1)]
        range_frame = pd.concat([range_frame, filtered_year])
    return range_frame
        
recent_jp = filter_in_time(jeopardy_data, range(2000, 2014)).reset_index()
recent_jp

Unnamed: 0,index,show_number,air_date,round,category,value,question,answer
0,117,3751.0,2000-12-18,Jeopardy!,ROYAL FEMALE NICKNAMES,100.0,"Prime Minister Tony Blair dubbed her ""The People's Princess""",Princess Diana
1,118,3751.0,2000-12-18,Jeopardy!,TV ACTORS & ROLES,100.0,"Once Tommy Mullaney on ""L.A. Law"", John Spencer now plays White House chief of staff Leo McGarry on this series",The West Wing
2,119,3751.0,2000-12-18,Jeopardy!,TRAVEL & TOURISM,100.0,The Cinderella Castle Mystery Tour is a highlight of this Asian city's Disneyland,Tokyo
3,120,3751.0,2000-12-18,Jeopardy!,"""I"" LADS",100.0,This punk rock hitmaker heard here has had numerous hits on both sides of the Atlantic,Billy Idol
4,121,3751.0,2000-12-18,Jeopardy!,FOREWORDS,100.0,"""Conrad begins (and ends) Marlow's journey... on the Thames, on the yawl, Nellie"", says the foreword to this novel",Heart of Darkness
...,...,...,...,...,...,...,...,...
152072,192273,6292.0,2012-01-17,Double Jeopardy!,I PLAYED A DOCTOR & SOME OTHER GUY ON TV,2000.0,"Marcus Welby & Jim Anderson (the ""Father"" who ""Knows Best"")",Robert Young
152073,192274,6292.0,2012-01-17,Double Jeopardy!,CLASSICAL MUSIC,2000.0,"This composer of Armenian heritage is best known for the ""Sabre Dance"" from his ballet ""Gayane""",(Aram) Khachaturian
152074,192275,6292.0,2012-01-17,Double Jeopardy!,POSSESSIVE BOOK TITLES,2000.0,"Kim Edwards: ""The ____ ____'s Daughter"" (2005)",The Memory Keeper's Daughter
152075,192276,6292.0,2012-01-17,Double Jeopardy!,"""TRI"" TIPS",2000.0,This was an ancient Roman warship with multiple tiers of oars,a trireme
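
A vectorized alternative to `filter_in_time` is sketched below. It is an assumption about how the same filter could be written, not the code that produced the output above: `filter_years_vectorized` and the three-row `sample` are hypothetical, and `air_date` is assumed to be a `'YYYY-MM-DD'` string as in the dataset.

```python
import pandas as pd

def filter_years_vectorized(data, years):
    """Return the rows whose air_date (a 'YYYY-MM-DD' string) falls in one of the given years."""
    year_of_row = data["air_date"].str[:4].astype(int)
    return data.loc[year_of_row.isin(list(years))]

# Hypothetical sample using dates and answers that appear in the outputs above
sample = pd.DataFrame({
    "air_date": ["2004-12-31", "1991-03-29", "2012-01-17"],
    "answer": ["Copernicus", "IBM", "a trireme"],
})
recent_sample = filter_years_vectorized(sample, range(2000, 2014))
print(recent_sample)
```

Unlike the loop with `pd.concat`, this keeps the rows in their original order instead of grouping them by year, and it leaves the column dtypes untouched because nothing is concatenated onto an empty frame.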


In [9]:
money_counter = 2000 

In [10]:
# This is an attempt to calculate a ratio of correctness, a metric that could be used to determine if the 
# user meant the right thing, but had a typo or forgot an article or something of the sort.
# Currently the jeopardy game only works if the entire string is correct. If the answer is "a flush" and the user
# put "flush" the answer will be wrong. The below function was meant to get rid of this issue, however 
# it never really got there :(

def calculate_correctness_ratio(answer, real_answer):
    correct_counter = 0
    total_counter = min(len(answer), len(real_answer))
    
    # Iterate through each character in the shortest answer string
    for i in range(total_counter):
        # Check if the current character is correct and has correct neighbors
        if answer[i] == real_answer[i] and i > 0 and i < total_counter - 1:
            if answer[i - 1] == real_answer[i - 1] and answer[i + 1] == real_answer[i + 1]:
                correct_counter += 1
    
    # Calculate the ratio of correctness
    if total_counter > 0:
        correctness_ratio = correct_counter / len(real_answer)
    else:
        correctness_ratio = 0.0  # To handle the case where either answer is empty
    
    return correctness_ratio

# Example usage
real_answer = "prologue"
user_answer = "prologue"
ratio = calculate_correctness_ratio(user_answer, real_answer)
print(f"Ratio of correctness: {ratio}")

cans = recent_jp[recent_jp.question == "An epilogue is found at the end of a book; this opposite term is a short section at the beginning of a book"]
cans

Ratio of correctness: 0.75


Unnamed: 0,index,show_number,air_date,round,category,value,question,answer
58310,51448,4706.0,2005-02-07,Double Jeopardy!,OPPOSITES ATTRACT,800.0,An epilogue is found at the end of a book; this opposite term is a short section at the beginning of a book,a prologue
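
One possible way to get closer to the goal of the function above is the standard library's `difflib.SequenceMatcher`, combined with stripping a leading article. This is a sketch of a different technique, not a fix of `calculate_correctness_ratio`. The helper names `normalize_answer` and `similarity_ratio` and the list of articles are assumptions made for the example.

```python
from difflib import SequenceMatcher

# Assumed list of leading articles to ignore
LEADING_ARTICLES = ("a ", "an ", "the ")

def normalize_answer(text):
    """Lowercase, trim, and drop one leading article."""
    text = text.strip().lower()
    for article in LEADING_ARTICLES:
        if text.startswith(article):
            return text[len(article):]
    return text

def similarity_ratio(user_answer, real_answer):
    """Similarity in [0, 1] between the two normalized answers."""
    matcher = SequenceMatcher(None, normalize_answer(user_answer), normalize_answer(real_answer))
    return matcher.ratio()

exact = similarity_ratio("prologue", "a prologue")   # article is ignored
typo = similarity_ratio("prolouge", "a prologue")    # two letters swapped
wrong = similarity_ratio("epilogue", "a prologue")   # different word, similar spelling
print(exact, typo, wrong)
```

An identical answer scores 1.0 once the article is removed, and the typo scores higher than the wrong word. A similar-looking wrong answer such as "epilogue" still scores well above zero, so any acceptance threshold would need tuning against real answers.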


The function below is a _Jeopardy!_ game that makes use of the dataset. The rules are listed after the code.

In [11]:
import random as rdm



def play_jeopardy(data):
    global money_counter
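    # randrange picks a valid 0-based position; randint(1, len(data)) never picks row 0 and can overshoot
    rand_jeop = data.iloc[rdm.randrange(len(data))]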
    print(rand_jeop.question + '\n')
    user_answer = input("Enter your answer: ")
    if(user_answer.lower() == 'skip'):
        print('Question skipped')
        print("The correct answer was: " + rand_jeop.answer)
        money_counter -= 20
    elif(user_answer.lower() == "give context"):
        print("category: " + rand_jeop.category)
        print("Air date: " + rand_jeop.air_date)
        print("value: " + str(rand_jeop.value))
        user_answer_after_context = input("Enter your answer using the context: ")
        if(user_answer_after_context.lower() in rand_jeop.answer.lower()):
            print("OOOOOHHH yeah baby that is the correct answer! The correct answer was: " + rand_jeop.answer)
            print("you won: $"+str(rand_jeop.value/2) + "!")
            money_counter += rand_jeop.value/2
        elif(user_answer_after_context.lower() == 'skip'):
            print('Question skipped')
            print("The correct answer was: " + rand_jeop.answer)
            money_counter -= 20
        else:
            print("Tough luck, you lose all your money! The correct answer was: " + rand_jeop.answer)
            money_counter -= 150
            
    elif(user_answer.lower() in rand_jeop.answer.lower()):
        print("OOOOOHHH yeah baby that is the correct answer! The correct answer was: " + rand_jeop.answer)
        print("you won: $"+str(rand_jeop.value) + "!")
        money_counter += rand_jeop.value
    else:
        print("Tough luck, you lose all your money! The correct answer was: " + rand_jeop.answer)
        money_counter -= 100
        

#An epilogue is found at the end of a book; this opposite term is a short section at the beginning of a book

PLAY HERE:

Rules:
This game is about keeping your starting money ($2000) and adding as much to it as you can. The game ends when the player decides to stop playing or the money drops to $0. The player can use two special commands:

- 'skip': skips the question (-$20)
- 'give context': reveals the category, air date and value of the question (halves the value gained for a right answer, -$150 for a wrong answer)

Without a command, the answer is scored as follows:

- wrong answer: -$100
- right answer: + the value of the question
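
The rules above can be summarized as a small pure function, which makes the scoring easy to check without typing answers. This is a sketch only: `money_change` and its outcome labels `'skip'`, `'right'` and `'wrong'` are names invented for the example and are not used by `play_jeopardy`.

```python
def money_change(outcome, value, used_context=False):
    """Return the change in money for one question under the rules above.

    outcome is one of 'skip', 'right', 'wrong' (labels chosen for this sketch).
    """
    if outcome == "skip":
        return -20
    if outcome == "right":
        # Asking for context halves the winnings
        return value / 2 if used_context else value
    if outcome == "wrong":
        # A wrong answer costs more after asking for context
        return -150 if used_context else -100
    raise ValueError(f"unknown outcome: {outcome!r}")

# Play three hypothetical questions starting from $2000
money = 2000
for outcome, value, used_context in [("right", 400.0, False), ("wrong", 800.0, True), ("skip", 200.0, False)]:
    money += money_change(outcome, value, used_context)
print(money)
```

Starting from $2000, a right answer worth $400, a wrong answer after context and a skip leave $2230.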

In [12]:
play_jeopardy(recent_jp)
print("current money: " + str(money_counter))

In poker, 3 & 5 of hearts, 10 of clubs, jack of hearts, king of hearts is called a busted this

Enter your answer: give context
category: HODGEPODGE
Air date: 2009-02-09
value: 400.0
Enter your answer using the context: skip
Question skipped
The correct answer was: a flush
current money: 2000
