# This Is Jeopardy!

## Project Goals

You will work to write several functions that investigate a dataset of Jeopardy! questions and answers. Filter the dataset for topics that you’re interested in, compute the average difficulty of those questions, and train to become the next Jeopardy champion!

In [2]:
import pandas as pd

## Project Requirements

We’ve provided a csv file containing data about the game show *Jeopardy!* in a file named `jeopardy.csv`. Load the data into a DataFrame and investigate its contents. Try to print out specific columns.

Note that in order to make this project as “real-world” as possible, we haven’t modified the data at all; we’re giving it to you exactly how we found it. As a result, this data isn’t as “clean” as the datasets you normally find on Codecademy. More specifically, there’s something odd about the column names. After you figure out the problem with the column names, you may want to rename them to make your life easier for the rest of the project.

In order to display the full contents of a column, we’ve added this line of code to the top of your file:

In [3]:
pd.set_option('display.max_colwidth', None)

In [4]:
jeopardy_data = pd.read_csv('jeopardy.csv')
jeopardy_data.head()

Unnamed: 0,Show Number,Air Date,Round,Category,Value,Question,Answer
0,4680,2004-12-31,Jeopardy!,HISTORY,$200,"For the last 8 years of his life, Galileo was under house arrest for espousing this man's theory",Copernicus
1,4680,2004-12-31,Jeopardy!,ESPN's TOP 10 ALL-TIME ATHLETES,$200,"No. 2: 1912 Olympian; football star at Carlisle Indian School; 6 MLB seasons with the Reds, Giants & Braves",Jim Thorpe
2,4680,2004-12-31,Jeopardy!,EVERYBODY TALKS ABOUT IT...,$200,"The city of Yuma in this state has a record average of 4,055 hours of sunshine each year",Arizona
3,4680,2004-12-31,Jeopardy!,THE COMPANY LINE,$200,"In 1963, live on ""The Art Linkletter Show"", this company served its billionth burger",McDonald's
4,4680,2004-12-31,Jeopardy!,EPITAPHS & TRIBUTES,$200,"Signer of the Dec. of Indep., framer of the Constitution of Mass., second President of the United States",John Adams


In [5]:
jeopardy_data.columns

Index(['Show Number', ' Air Date', ' Round', ' Category', ' Value',
       ' Question', ' Answer'],
      dtype='object')

In [6]:
# rename columns to remove starting space
jeopardy_data.columns = jeopardy_data.columns.str.lstrip()
jeopardy_data.columns

Index(['Show Number', 'Air Date', 'Round', 'Category', 'Value', 'Question',
       'Answer'],
      dtype='object')

In [7]:
jeopardy_data[['Category', 'Answer']].head(10)

Unnamed: 0,Category,Answer
0,HISTORY,Copernicus
1,ESPN's TOP 10 ALL-TIME ATHLETES,Jim Thorpe
2,EVERYBODY TALKS ABOUT IT...,Arizona
3,THE COMPANY LINE,McDonald's
4,EPITAPHS & TRIBUTES,John Adams
5,3-LETTER WORDS,the ant
6,HISTORY,the Appian Way
7,ESPN's TOP 10 ALL-TIME ATHLETES,Michael Jordan
8,EVERYBODY TALKS ABOUT IT...,Washington
9,THE COMPANY LINE,Crate & Barrel


In [8]:
jeopardy_data[['Category', 'Round']].tail(10)

Unnamed: 0,Category,Round
216920,"""T"" BIRDS",Double Jeopardy!
216921,AUTHORS IN THEIR YOUTH,Double Jeopardy!
216922,QUOTATIONS,Double Jeopardy!
216923,WORLD CAPITALS,Double Jeopardy!
216924,OFF-BROADWAY,Double Jeopardy!
216925,RIDDLE ME THIS,Double Jeopardy!
216926,"""T"" BIRDS",Double Jeopardy!
216927,AUTHORS IN THEIR YOUTH,Double Jeopardy!
216928,QUOTATIONS,Double Jeopardy!
216929,HISTORIC NAMES,Final Jeopardy!


**Variable**: each variable is stored in its own column.

**Observation**: each observation (one question) is stored in its own row.

In [9]:
len(jeopardy_data)

216930

In [10]:
duplicates = jeopardy_data.duplicated()
duplicates.value_counts()

False    216930
dtype: int64

In [11]:
jeopardy_data['Round'].unique()

array(['Jeopardy!', 'Double Jeopardy!', 'Final Jeopardy!', 'Tiebreaker'],
      dtype=object)

In [12]:
# jeopardy_data['Category'].unique()
jeopardy_data['Category'].value_counts().head(20)

BEFORE & AFTER             547
SCIENCE                    519
LITERATURE                 496
AMERICAN HISTORY           418
POTPOURRI                  401
WORLD HISTORY              377
WORD ORIGINS               371
COLLEGES & UNIVERSITIES    351
HISTORY                    349
SPORTS                     342
U.S. CITIES                339
WORLD GEOGRAPHY            338
BODIES OF WATER            327
ANIMALS                    324
STATE CAPITALS             314
BUSINESS & INDUSTRY        311
ISLANDS                    301
WORLD CAPITALS             300
U.S. GEOGRAPHY             299
RELIGION                   297
Name: Category, dtype: int64

Write a function that filters the dataset for questions that contain all of the words in a list of words. For example, when the list `["King", "England"]` was passed to our function, the function returned a DataFrame of 152 rows. Every row had the strings `"King"` and `"England"` somewhere in its `" Question"`.

Note that in this example, we found 152 rows by filtering the *entire* dataset. You can download the entire dataset at the start or end of this project. The dataset used on Codecademy is only a fraction of the dataset so you won’t find as many rows.

Test your function by printing out the column containing the question of each row of the dataset.

In [13]:
test_find = jeopardy_data[jeopardy_data['Question'].str.contains('last')]
# test_find.head()
test_find

Unnamed: 0,Show Number,Air Date,Round,Category,Value,Question,Answer
0,4680,2004-12-31,Jeopardy!,HISTORY,$200,"For the last 8 years of his life, Galileo was under house arrest for espousing this man's theory",Copernicus
280,4931,2006-02-06,Double Jeopardy!,SCIENCE,$1200,"(<a href=""http://www.j-archive.com/media/2006-02-06_DJ_13.jpg"" target=""_blank"">Sarah of the Clue Crew reads from the pole vault at Duke University's track in Durham, NC.</a>) In bending an elastic solid, stress is the force causing deformation & this is the 6-letter term for <a href=""http://www.j-archive.com/media/2006-02-06_DJ_13a.jpg"" target=""_blank"">the deformation</a>",strain
439,6037,2010-12-07,Jeopardy!,LET'S HIT IT,$800,Everlast makes these that come in speed and heavy varieties,punching bags
679,2735,1996-06-21,Jeopardy!,SHAKESPEAREAN LAST SCENES,$400,"Puck's last speech in this play begins, ""If we shadows have offended, think but this--and all is mended""",A Midsummer Night's Dream
685,2735,1996-06-21,Jeopardy!,SHAKESPEAREAN LAST SCENES,$500,"In the last scene of this comedy, Petruchio wins a bet that he has the most obedient wife",The Taming of the Shrew
...,...,...,...,...,...,...,...
216652,3940,2001-10-19,Jeopardy!,LAST BUT NOT LEAST,$400,Vadim Bakatin became the last head of this intelligence service in 1991,KGB
216658,3940,2001-10-19,Double Jeopardy!,AMERICAN HISTORIC EVENTS,$200,"On May 15, 1963 Gordon Cooper became the last American to do this alone",last solo American to go into space
216807,5070,2006-09-29,Final Jeopardy!,NATIONAL CAPITALS,,"This city's website calls it ""the last divided capital in Europe""",Nicosia
216833,5195,2007-03-23,Jeopardy!,THE DIRECTOR'S CHAIR,$1000,"""La Voce della Luna"" was the last movie by this great Italian director",(Federico) Fellini


In [14]:
# cc_test = jeopardy_data[jeopardy_data['Question'].str.contains('King','England')]
cc_test = jeopardy_data[(jeopardy_data['Question'].str.contains('King')) & (jeopardy_data['Question'].str.contains('England'))]
# cc_test['Question']
cc_test.head()

Unnamed: 0,Show Number,Air Date,Round,Category,Value,Question,Answer
4953,3003,1997-09-24,Double Jeopardy!,"""PH""UN WORDS",$200,"Both England's King George V & FDR put their stamp of approval on this ""King of Hobbies""",Philately (stamp collecting)
14912,2832,1996-12-17,Jeopardy!,WORLD HISTORY,$100,"This country's King Louis IV was nicknamed ""Louis From Overseas"" because he was raised in England",France
21511,4650,2004-11-19,Jeopardy!,"THE ""O.C.""",$1000,this man and his son ruled England following the execution of King Charles I,Oliver Cromwell
23810,4862,2005-11-01,Jeopardy!,NAME THE YEAR,$400,William the Conqueror was crowned King of England in Westminster Abbey on Christmas Day in this year,1066
27555,1799,1992-05-28,Double Jeopardy!,HISTORIC IN-LAWS,$600,This member of the Medici family was the mother-in-law of England's King Charles I,Marie de Medici


In [15]:
len(cc_test)

49

In [16]:
cc_test[['Question']].head()

Unnamed: 0,Question
4953,"Both England's King George V & FDR put their stamp of approval on this ""King of Hobbies"""
14912,"This country's King Louis IV was nicknamed ""Louis From Overseas"" because he was raised in England"
21511,this man and his son ruled England following the execution of King Charles I
23810,William the Conqueror was crowned King of England in Westminster Abbey on Christmas Day in this year
27555,This member of the Medici family was the mother-in-law of England's King Charles I


Hint: You could use a lambda function to do this. Our lambda function uses the `all` function (https://docs.python.org/3/library/functions.html#all) to check if all elements of the input list are in the question.

In [17]:
def jeopardy_filter(dataset, words):
    # all() is a Python built-in, so there is no need to define it here.
    # The lambda is named contains_all rather than filter to avoid shadowing the built-in filter().
    contains_all = lambda question_str: all(word in question_str for word in words)
    # Keep only the rows whose question contains every word (case-sensitive substring match).
    return dataset.loc[dataset['Question'].apply(contains_all)]

# how do you make an argument for the 'Question' label?

f_test = jeopardy_filter(jeopardy_data, ['King', 'England'])
f_test.head()

Unnamed: 0,Show Number,Air Date,Round,Category,Value,Question,Answer
4953,3003,1997-09-24,Double Jeopardy!,"""PH""UN WORDS",$200,"Both England's King George V & FDR put their stamp of approval on this ""King of Hobbies""",Philately (stamp collecting)
14912,2832,1996-12-17,Jeopardy!,WORLD HISTORY,$100,"This country's King Louis IV was nicknamed ""Louis From Overseas"" because he was raised in England",France
21511,4650,2004-11-19,Jeopardy!,"THE ""O.C.""",$1000,this man and his son ruled England following the execution of King Charles I,Oliver Cromwell
23810,4862,2005-11-01,Jeopardy!,NAME THE YEAR,$400,William the Conqueror was crowned King of England in Westminster Abbey on Christmas Day in this year,1066
27555,1799,1992-05-28,Double Jeopardy!,HISTORIC IN-LAWS,$600,This member of the Medici family was the mother-in-law of England's King Charles I,Marie de Medici


In [18]:
print(len(f_test))

49
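One possible answer to the open question in the cell above, about passing the `'Question'` label as an argument, is a keyword parameter with a default value. The sketch below is only an illustration: the name `filter_by_words`, the `column` parameter, and the three-row sample frame are made up for this example and are not part of the project.

```python
import pandas as pd

def filter_by_words(dataset, words, column="Question"):
    """Return rows whose text in `column` contains every string in `words`."""
    contains_all = lambda text: all(word in text for word in words)
    return dataset.loc[dataset[column].apply(contains_all)]

# Tiny made-up frame, only to exercise the function.
sample = pd.DataFrame({
    "Question": ["King of England", "Viking raids", "England's queen"],
    "Answer": ["King Arthur", "Norway", "Elizabeth"],
})

by_question = filter_by_words(sample, ["King", "England"])     # default column
by_answer = filter_by_words(sample, ["King"], column="Answer")  # search another column
```

Because `column` defaults to `"Question"`, existing calls that pass only the dataset and the word list would keep working unchanged.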


Test your original function with a few different sets of words to try to find some ways your function breaks. Edit your function so it is more robust.

For example, think about capitalization. We probably want to find questions that contain the word `"King"` or `"king"`.

You may also want to check to make sure you don’t find rows that contain substrings of your given words. For example, our function found a question that didn’t contain the word `"king"` but did contain the word `"viking"`: it found the `"king"` inside `"viking"`. Note that this also comes with some drawbacks: you would no longer find questions that contained words like `"England's"`.


In [19]:
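def jeopardy_filter(dataset, words):
    # Lowercase both sides for a case-insensitive match, and pad each word with spaces
    # so that "king" no longer matches inside "viking". Words at the very start or end
    # of a question, or next to punctuation, are missed by this approach.
    contains_all = lambda question_str: all(' ' + word.lower() + ' ' in question_str.lower() for word in words)
    return dataset.loc[dataset['Question'].apply(contains_all)]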

test_f2 = jeopardy_filter(jeopardy_data, ['King', 'England'])
test_f2.head()

Unnamed: 0,Show Number,Air Date,Round,Category,Value,Question,Answer
6337,3517,1999-12-14,Double Jeopardy!,Y1K,$800,"In retaliation for Viking raids, this ""Unready"" king of England attacks Norse areas of the Isle of Man",Ethelred
9191,3907,2001-09-04,Double Jeopardy!,WON THE BATTLE,$800,This king of England beat the odds to trounce the French in the 1415 Battle of Agincourt,Henry V
13454,4726,2005-03-07,Jeopardy!,A NUMBER FROM 1 TO 10,$1000,It's the number that followed the last king of England named William,4
18076,3227,1998-09-22,Double Jeopardy!,WORLD HISTORY,$1000,In 1199 this crusader king of England was mortally wounded while besieging the castle of Chalus,Richard the Lionhearted
19168,3109,1998-02-19,Jeopardy!,HISTORIC WORLD LEADERS,$300,"He was the only king of England to have ""The Great"" tacked on to his name",Alfred
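Padding each word with spaces, as in the cell above, skips words that sit at the start or end of a question or next to punctuation. One alternative, sketched here under the assumption that regular-expression word boundaries are acceptable, is to match `\b` on both sides of each word. The function name `filter_whole_words` and the sample rows are hypothetical. Note that `\b` treats the apostrophe as a boundary, so `"England's"` matches `"england"` with this approach, which may or may not be what you want.

```python
import re
import pandas as pd

def filter_whole_words(dataset, words, column="Question"):
    """Keep rows where every word appears as a whole word, ignoring case."""
    # re.escape guards against words that contain regex metacharacters.
    patterns = [
        re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
        for word in words
    ]
    has_all = lambda text: all(p.search(text) for p in patterns)
    return dataset.loc[dataset[column].apply(has_all)]

# Made-up rows covering the tricky cases discussed above.
sample = pd.DataFrame({"Question": [
    "This king of England beat the odds",  # plain match
    "Viking raids on England",             # "king" only inside "Viking"
    "England's King George V",             # possessive form
    "King, England",                       # punctuation and end of string
]})

matches = filter_whole_words(sample, ["king", "england"])
```

In this sample the Viking row is the only one excluded.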


We may want to eventually compute aggregate statistics, like `.mean()` on the `" Value"` column. But right now, the values in that column are strings. Convert the `" Value"` column to floats. If you’d like to, you can create a new column with the float values.

Now that you can filter the dataset of questions, use your new column that contains the float values of each question to find the “difficulty” of certain topics. For example, what is the average value of questions that contain the word `"King"`?

Make sure to use the dataset that contains the float values as the dataset you use in your filtering function.

In [20]:
jeopardy_data["Float Value"] = jeopardy_data["Value"].apply(lambda x: float(x[1:].replace(',','')) if x != "None" else 0)

f_difficulty = jeopardy_filter(jeopardy_data, ["King"])
f_difficulty["Float Value"].mean()

806.9708846584547
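The cell above maps questions without a dollar value to `0`, which pulls the average down. A possible variation, shown here on a small made-up frame rather than the real dataset, strips the `$` and `,` characters with vectorized string methods and lets `pd.to_numeric` turn anything unparseable into `NaN`, which `.mean()` skips by default. Whether missing values should count as `0` or be left out is a judgment call; this sketch only illustrates the second option.

```python
import pandas as pd

# Made-up values covering the formats seen in the "Value" column.
sample = pd.DataFrame({"Value": ["$200", "$1,200", "None", None, "$800"]})

# Remove the dollar sign and thousands separator from every string.
cleaned = sample["Value"].str.replace(r"[$,]", "", regex=True)

# errors="coerce" converts anything that is not a number into NaN.
sample["Float Value"] = pd.to_numeric(cleaned, errors="coerce")

# NaN entries are ignored, so only the three real values are averaged.
mean_value = sample["Float Value"].mean()
```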

Write a function that returns the count of the unique answers to all of the questions in a dataset. For example, after filtering the entire dataset to only questions containing the word `"King"`, we could then find all of the unique answers to those questions. The answer “Henry VIII” appeared 31 times and was the most common answer.

In [21]:
def unique_answers(dataset, words):
    filtered = jeopardy_filter(dataset, words)
    answer = filtered['Answer'].value_counts()
    return answer
    
unique_answers(jeopardy_data, ['King'])

Henry VIII                           31
Richard III                          20
Sweden                               18
Norway                               18
Solomon                              17
                                     ..
Bad, Bad Leroy Brown                  1
Elephants                             1
Deborah Kerr                          1
Ruritania                             1
a pyramid (the pyramids accepted)     1
Name: Answer, Length: 1040, dtype: int64

In [22]:
unique_answers(jeopardy_data, ['king', 'england'])

William the Conqueror                        3
Richard the Lionhearted                      3
Alfred                                       2
King Edward VIII                             2
Richard the Lionheart                        2
Henry VIII                                   2
William III                                  1
Georgia                                      1
IV                                           1
Richard I                                    1
James II                                     1
Richard Branson                              1
Old King Cole                                1
Ethelred                                     1
James (I)                                    1
Battle of Hastings (which Harold II lost)    1
William                                      1
Westminster Abbey                            1
William II                                   1
George I                                     1
Wales                                        1
William of Or

In [23]:
unique_answers(jeopardy_data, ['last'])

Nicholas II                        6
Hawaii                             6
13                                 5
Portugal                           5
Yorktown                           5
                                  ..
Gale Sayers                        1
Van Gogh                           1
Walter Reed Army Medical Center    1
Vernon & Irene Castle              1
Grigori Alexandrovich Potemkin     1
Name: Answer, Length: 1999, dtype: int64

Explore from here! This is an incredibly rich dataset, and there are so many interesting things to discover. There are a few columns that we haven’t even started looking at yet. Here are some ideas on ways to continue working with this data:

-    Investigate the ways in which questions change over time by filtering by the date. How many questions from the 90s use the word `"Computer"` compared to questions from the 2000s?
-    Is there a connection between the round and the category? Are you more likely to find certain categories, like `"Literature"`, in Single Jeopardy or Double Jeopardy?
-    Build a system to quiz yourself. Grab random questions, and use the `input` (https://docs.python.org/3/library/functions.html#input) function to get a response from the user. Check to see if that response was right or wrong. Note that you can’t do this on the Codecademy platform — to do this, download the data, and write and run the code on your own computer!
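
As a starting point for the first idea above, the following sketch counts how many questions mention a word in each decade. It assumes the column names have already been stripped of their leading spaces, and it runs on a handful of invented rows, so the counts say nothing about the real dataset.

```python
import pandas as pd

# Invented rows standing in for the real data.
sample = pd.DataFrame({
    "Air Date": ["1994-03-01", "1998-11-20", "2003-05-05", "2007-01-15", "2009-09-09"],
    "Question": [
        "This computer language was named for a snake",
        "A river in Egypt",
        "The first personal computer from this company",
        "A Computer chess champion",
        "This planet has rings",
    ],
})

# Parse the dates, then floor each year to its decade (1994 -> 1990).
sample["Air Date"] = pd.to_datetime(sample["Air Date"])
decade = (sample["Air Date"].dt.year // 10) * 10

# Boolean mask of questions mentioning the word, ignoring case.
mentions = sample["Question"].str.contains("computer", case=False)

# Summing booleans per decade gives the number of matching questions.
counts = mentions.groupby(decade).sum()
```

Dividing `counts` by the number of questions per decade would give a rate instead of a raw count, which is a fairer comparison if the decades are unevenly represented.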

In [62]:
# what are the typical categories and/or answers for final jeopardy? or double jeopardy?

# only look at the data for jeopardy_data['Final Jeopardy!'] or jeopardy_data['Double Jeopardy!']
# groupby
# then value_counts() the categories? And/or answers?
# Need a loop to check if value in Round is 'Final Jeopardy!', then add that value to a new dataframe?
# Then work with that dataframe?

# def jeopardy_rounds(dataset, rounds):
#     answer = dataset[rounds].value_counts()
#     return answer
    
# jeopardy_rounds(jeopardy_data, 'Final Jeopardy!')


# testing
# jeopardy_data.loc[jeopardy_data['Round'] == 'Final Jeopardy!', 'Category'].unique()
# jeopardy_data.loc[jeopardy_data['Round'] == 'Final Jeopardy!', 'Category'].unique().tolist()
# jeopardy_data.loc[jeopardy_data['Round'] == 'Final Jeopardy!', 'Category'].values[0:50]
# jeopardy_data[jeopardy_data['Round'] == 'Final Jeopardy!']['Category'].item()
# jeopardy_data[['Category', 'Round']]
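
The notes above ask whether a loop is needed to pull out one round. A boolean mask is usually enough: compare the `Round` column to the round name, select the `Category` column for the matching rows, and call `value_counts()`. The helper name `top_categories` and the sample rows below are hypothetical.

```python
import pandas as pd

def top_categories(dataset, round_name, n=10):
    """Return the n most frequent categories within a single round."""
    in_round = dataset["Round"] == round_name  # boolean mask, no loop needed
    return dataset.loc[in_round, "Category"].value_counts().head(n)

# Made-up rows for illustration.
sample = pd.DataFrame({
    "Round": ["Jeopardy!", "Final Jeopardy!", "Final Jeopardy!",
              "Double Jeopardy!", "Final Jeopardy!"],
    "Category": ["HISTORY", "WORLD CAPITALS", "HISTORIC NAMES",
                 "SCIENCE", "WORLD CAPITALS"],
})

final_counts = top_categories(sample, "Final Jeopardy!")
```

The same mask works for answers: replace `"Category"` with `"Answer"` in the `.loc` call.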