# This Is Jeopardy!

We will write several functions to investigate a dataset of Jeopardy! questions and answers: filter the dataset for topics that we’re interested in, compute the average difficulty of those questions, and train to become the next Jeopardy! champion!

In [1]:
import pandas as pd


In [2]:
# Loading the data and investigating it
df = pd.read_csv('jeopardy.csv')

In [6]:
df.head()

Unnamed: 0,Show Number,Air Date,Round,Category,Value,Question,Answer
0,4680,2004-12-31,Jeopardy!,HISTORY,$200,"For the last 8 years of his life, Galileo was ...",Copernicus
1,4680,2004-12-31,Jeopardy!,ESPN's TOP 10 ALL-TIME ATHLETES,$200,No. 2: 1912 Olympian; football star at Carlisl...,Jim Thorpe
2,4680,2004-12-31,Jeopardy!,EVERYBODY TALKS ABOUT IT...,$200,The city of Yuma in this state has a record av...,Arizona
3,4680,2004-12-31,Jeopardy!,THE COMPANY LINE,$200,"In 1963, live on ""The Art Linkletter Show"", th...",McDonald's
4,4680,2004-12-31,Jeopardy!,EPITAPHS & TRIBUTES,$200,"Signer of the Dec. of Indep., framer of the Co...",John Adams


In [4]:
print(df.dtypes)

Show Number     int64
 Air Date      object
 Round         object
 Category      object
 Value         object
 Question      object
 Answer        object
dtype: object


In [5]:
# Renaming misformatted columns
df.columns = ['Show Number','Air Date','Round','Category','Value','Question','Answer']
print(df.dtypes)

Show Number     int64
Air Date       object
Round          object
Category       object
Value          object
Question       object
Answer         object
dtype: object
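
Retyping the column list works, but it has to match the file's column order exactly. As a hedged alternative, the stray leading spaces seen in the first `dtypes` output can be stripped programmatically. The sketch below uses a plain Python list standing in for `df.columns`; `raw_columns` and `clean_columns` are illustrative names, not part of the notebook.

```python
# Column names as they appear in the first dtypes output (note the leading spaces).
raw_columns = ["Show Number", " Air Date", " Round", " Category",
               " Value", " Question", " Answer"]

# Strip surrounding whitespace from each name instead of retyping the whole list.
clean_columns = [name.strip() for name in raw_columns]
print(clean_columns)
```

On the real DataFrame, the same idea would be `df.columns = df.columns.str.strip()`.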


In [8]:
# Filling missing values in the Answer column
df = df.fillna(value={'Answer': "Null"})
print(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 216930 entries, 0 to 216929
Data columns (total 7 columns):
 #   Column       Non-Null Count   Dtype 
---  ------       --------------   ----- 
 0   Show Number  216930 non-null  int64 
 1   Air Date     216930 non-null  object
 2   Round        216930 non-null  object
 3   Category     216930 non-null  object
 4   Value        216930 non-null  object
 5   Question     216930 non-null  object
 6   Answer       216930 non-null  object
dtypes: int64(1), object(6)
memory usage: 11.6+ MB
None


In [9]:
# Converting Value column from string to int
df.Value = df.Value.replace('None', 0)
df.Value = df.Value.replace(r'[\$,]', '', regex=True)
df.Value = pd.to_numeric(df.Value)
print(df.Value.head())


0    200
1    200
2    200
3    200
4    200
Name: Value, dtype: int64
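
The conversion above can also be read as a single per-value rule: treat `'None'` as 0, drop `$` and `,`, then parse an integer. Below is a minimal sketch of that rule as a plain function. The name `parse_value` and the sample strings are hypothetical and only illustrate the formats the cell above handles.

```python
def parse_value(raw):
    """Convert a dollar string such as '$2,000' to an int; 'None' becomes 0."""
    if raw is None or raw == "None":
        return 0
    # Remove the dollar sign and thousands separators before parsing.
    return int(str(raw).replace("$", "").replace(",", ""))

# Made-up sample values, not rows from the dataset.
samples = ["$200", "$2,000", "None"]
parsed = [parse_value(s) for s in samples]
print(parsed)
```

Assuming the column still holds the original strings, `df.Value.map(parse_value)` would apply the rule row by row.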


In [7]:
# Converting Air Date column from string to datetime
df['Air Date'] = pd.to_datetime(df['Air Date'])
df['Air Date'].head()


0   2004-12-31
1   2004-12-31
2   2004-12-31
3   2004-12-31
4   2004-12-31
Name: Air Date, dtype: datetime64[ns]

In [11]:
# Removing hyperlinks from the Question column
df.Question = df.Question.replace(r'<a.*</a>', '', regex=True)
print(df.Question.head())

0               For the last 8 years of his life, Galileo was under house arrest for espousing this man's theory
1    No. 2: 1912 Olympian; football star at Carlisle Indian School; 6 MLB seasons with the Reds, Giants & Braves
2                       The city of Yuma in this state has a record average of 4,055 hours of sunshine each year
3                           In 1963, live on "The Art Linkletter Show", this company served its billionth burger
4       Signer of the Dec. of Indep., framer of the Constitution of Mass., second President of the United States
Name: Question, dtype: object
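
Note that `<a.*</a>` is greedy: it deletes everything from the first `<a` to the last `</a>`, including the link text and any words between two links. If the link text should be kept, a non-greedy pattern with a capture group is one option. This is a sketch on a made-up string, not the notebook's data.

```python
import re

# Made-up clue containing two links; not taken from the dataset.
text = '<a href="x">one</a> and <a href="y">two</a> remain'

# Greedy pattern from the cell above: removes from the first <a to the last </a>.
greedy = re.sub(r'<a.*</a>', '', text)

# Non-greedy alternative: drops only the tags and keeps the link text.
kept = re.sub(r'<a\b[^>]*>(.*?)</a>', r'\1', text)

print(repr(greedy))
print(repr(kept))
```

On the DataFrame, `df.Question.str.replace(r'<a\b[^>]*>(.*?)</a>', r'\1', regex=True)` should behave the same way as the second substitution.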


In [12]:
print(df.Round.unique())

['Jeopardy!' 'Double Jeopardy!' 'Final Jeopardy!' 'Tiebreaker']


In [9]:
# Filtering a dataset by a list of words
def filter_question(data, words):
    # Lowercases every word in the list as well as each question,
    # and returns True only if all of the words appear in the question.
    contains_all = lambda x: all(word.lower() in x.lower() for word in words)
    # Applies the lambda function to the Question column and returns the rows where it returned True
    return data.loc[data.Question.apply(contains_all)]


filtered = filter_question(df,['king',"england's"])
print(filtered.Question.nunique)

<bound method IndexOpsMixin.nunique of 4953      Both England's King George V & FDR put their s...
27555     This member of the Medici family was the mothe...
28570     The IV king of this name (following the Norman...
41357     England's King Henry VIII had 3 wives named Ca...
43122                The father of England's King Edward VI
51565     He wrote several anthems, including "The King ...
52184     (<a href="http://www.j-archive.com/media/2010-...
54982     When England's Queen Anne died, this German be...
56600     This city known for its 24-hour auto race was ...
57516     Famous (& rather insulting) adjective for Engl...
59780     In literature, restoration refers to the perio...
69991     Chronological lists of England's kings are mis...
71808     Number of the William who was England's "Sailo...
74235     Barons & churchmen drew up this 1215 document ...
79269     He called himself "king of Great Britain" afte...
80113     England's King Charles II was known by this "j...
8
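
Two caveats about the cell above. First, `filter_question` matches substrings, so `'king'` also matches words such as "Viking". Second, `nunique` is a method: without parentheses, `print` shows the bound method rather than a count. The sketch below illustrates a whole-word variant and an explicit distinct count on a few made-up questions. `contains_all_words` is a hypothetical helper, not part of the notebook.

```python
import re

def contains_all_words(question, words):
    """Return True if every word occurs in the question as a whole word, ignoring case."""
    return all(
        re.search(r"\b" + re.escape(word) + r"\b", question, flags=re.IGNORECASE)
        for word in words
    )

# Made-up questions for illustration; the second contains "king" only inside "Viking".
questions = [
    "This king ruled England in 1066",
    "A Viking raid reached this English town",
    "This king ruled England in 1066",
]

matches = [q for q in questions if contains_all_words(q, ["king"])]
distinct = len(set(matches))  # plays the role of nunique() on the filtered column
print(matches)
print(distinct)
```

With the DataFrame, the helper could be plugged in as `data.loc[data.Question.apply(lambda q: contains_all_words(q, words))]`, and the count would be `filtered.Question.nunique()` with parentheses.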

In [33]:
new_filtered = filter_question(df, ['king'])
print(new_filtered)
print(new_filtered.Value.mean())
print(new_filtered.Question.value_counts())
print(new_filtered.Answer.unique)

        Show Number   Air Date             Round                  Category  \
34             4680 2004-12-31  Double Jeopardy!               "X"s & "O"s   
56             5957 2010-07-06         Jeopardy!             GEOGRAPHY "E"   
72             5957 2010-07-06         Jeopardy!              LET'S BOUNCE   
113            5957 2010-07-06  Double Jeopardy!                 SEE & SAY   
177            3673 2000-07-19         Jeopardy!        FLAGS OF THE WORLD   
...             ...        ...               ...                       ...   
216777         5070 2006-09-29  Double Jeopardy!           ANCIENT HISTORY   
216787         5070 2006-09-29  Double Jeopardy!  TALES OF E.T.A. HOFFMANN   
216789         5070 2006-09-29  Double Jeopardy!           ANCIENT HISTORY   
216856         5195 2007-03-23  Double Jeopardy!          HAIL TO THE CHEF   
216916         4999 2006-05-11  Double Jeopardy!                QUOTATIONS   

        Value  \
34        400   
56        200   
72        60