# Jeopardy Game Analysis

#### *Kim Kirk* <br> *July 16, 2020*

## Synopsis

A descriptive data analysis was performed on Jeopardy game data: 216,930 rows were imported, cleaned, and analyzed. The analysis estimates the difficulty of specific topics in Jeopardy by computing the average value of questions that contain a given keyword. The premise is that identifying these high-value questions can support a winning strategy.

### Data Processing

Import the required libraries and the data set, then explore the data.

In [1]:
import pandas as pd
pd.set_option('display.max_colwidth', None)

jeopardy = pd.read_csv('jeopardy.csv', skipinitialspace=True)

print(jeopardy.head(30))
# info() prints its own summary and returns None, so it is called directly rather than inside print()
jeopardy.info()

print(jeopardy['Question'].head(15))
# rename columns to lowercase, underscore-separated names for ease of use moving forward
jeopardy.rename(columns={'Show Number': 'show_number', 'Air Date': 'air_date', 'Round': 'round', 'Category': 'category', 'Value': 'value', 'Question': 'question', 'Answer': 'answer'}, inplace=True)

    Show Number    Air Date             Round  \
0          4680  2004-12-31         Jeopardy!   
1          4680  2004-12-31         Jeopardy!   
2          4680  2004-12-31         Jeopardy!   
3          4680  2004-12-31         Jeopardy!   
4          4680  2004-12-31         Jeopardy!   
5          4680  2004-12-31         Jeopardy!   
6          4680  2004-12-31         Jeopardy!   
7          4680  2004-12-31         Jeopardy!   
8          4680  2004-12-31         Jeopardy!   
9          4680  2004-12-31         Jeopardy!   
10         4680  2004-12-31         Jeopardy!   
11         4680  2004-12-31         Jeopardy!   
12         4680  2004-12-31         Jeopardy!   
13         4680  2004-12-31         Jeopardy!   
14         4680  2004-12-31         Jeopardy!   
15         4680  2004-12-31         Jeopardy!   
16         4680  2004-12-31         Jeopardy!   
17         4680  2004-12-31         Jeopardy!   
18         4680  2004-12-31         Jeopardy!   
19         4680  200
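The rename dictionary above could also be built from the column headers themselves. The minimal sketch below uses a small hypothetical DataFrame in place of `jeopardy.csv`. It trims stray whitespace from each header, lowercases it, and replaces spaces with underscores:

```python
import pandas as pd

# Hypothetical stand-in for the imported data; only the headers matter here.
# The leading space in ' Air Date' mimics a header read without skipinitialspace=True.
df = pd.DataFrame(columns=['Show Number', ' Air Date', 'Round', 'Category', 'Value', 'Question', 'Answer'])

# Normalize each header: trim surrounding whitespace, lowercase, spaces -> underscores
df.columns = df.columns.str.strip().str.lower().str.replace(' ', '_')

print(list(df.columns))
# ['show_number', 'air_date', 'round', 'category', 'value', 'question', 'answer']
```

This produces the same names as the explicit dictionary, and it keeps working if the source file gains columns. The explicit dictionary has a different advantage: it documents every intended name in one place.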

Define a function, `find_these_words`, that returns every row whose question contains any of the given words (a case-insensitive substring match). A check confirms that the function works as expected.

In [2]:
   
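def find_these_words(a_list):
    """Return the rows of jeopardy whose question contains any word in a_list.

    Matching is a case-insensitive substring test, so 'king' also matches 'making'.
    A row that matches several words appears once per matching word.
    """
    matches = []
    for word in a_list:
        # regex=False treats the word as literal text; na=False skips missing questions
        mask = jeopardy['question'].str.contains(word, case=False, regex=False, na=False)
        matches.append(jeopardy[mask])
    if not matches:
        # no words supplied: return an empty frame with the same columns
        return jeopardy.iloc[0:0]
    return pd.concat(matches)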

#check  
data_frame_results = find_these_words(['Galileo', 'yuma'])
print(data_frame_results.iloc[0:3])


      show_number    air_date             round           category value  \
0            4680  2004-12-31         Jeopardy!            HISTORY  $200   
4419         1276  1990-03-05  Double Jeopardy!        THE PLANETS  $600   
6239         5307  2007-10-09         Jeopardy!  OUT OF THIS WORLD  $400   

                                                                                                   question  \
0          For the last 8 years of his life, Galileo was under house arrest for espousing this man's theory   
4419  The 4 largest moons of this planet are called Galilean satellites after Galileo, who saw them in 1610   
6239                                       Galileo was the first person to see the rings around this planet   

          answer  
0     Copernicus  
4419     Jupiter  
6239      Saturn  
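`find_these_words` tests whether the keyword appears anywhere in the question text. As a result, a search for "King" also matches words such as "making" or "kingdom". If only whole-word matches are wanted, one option (not used in this analysis) is a regular expression with word boundaries. A minimal sketch on hypothetical sample questions:

```python
import re
import pandas as pd

# Hypothetical sample questions; not taken from the data set
questions = pd.Series([
    'This king of England signed the Magna Carta',
    'Making a souffle requires beating these until stiff',
    'The Kingdom of Bhutan lies in this mountain range',
])

keyword = 'King'

# Substring match (what find_these_words does): case-insensitive, anywhere in the text
substring_hits = questions.str.contains(keyword, case=False, regex=False)

# Whole-word match: \b marks a word boundary; re.escape guards against special characters
pattern = r'\b' + re.escape(keyword) + r'\b'
whole_word_hits = questions.str.contains(pattern, case=False, regex=True)

print(substring_hits.tolist())   # [True, True, True]
print(whole_word_hits.tolist())  # [True, False, False]
```

Word boundaries also stop plural forms such as "kings" from matching. The choice of matching rule therefore changes which rows feed into the averages.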


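Add a new column, `value_as_float`, that converts the string "value" column (for example "$2,000") to a float so that aggregates such as the mean can be computed. Rows whose value is the string 'None' are set to 0, which means they count as $0 in the averages below. A check confirms that the new column is populated correctly.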

In [3]:
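import re

# Strip the leading '$' and any thousands separators ('$2,000' -> 2000.0).
# Rows without a dollar value hold the string 'None' (newer pandas versions may read it as NaN); both become 0.0.
jeopardy['value_as_float'] = jeopardy['value'].apply(
    lambda x: 0.0 if pd.isna(x) or x == 'None' else float(re.sub(r'^\$|,', '', x)))

#check
print('New column is populated')
jeopardy['value_as_float'][0:5]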


New column is populated


0    200.0
1    200.0
2    200.0
3    200.0
4    200.0
Name: value_as_float, dtype: float64
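The conversion can be checked on a few sample strings. The minimal sketch below wraps the same regular expression in a standalone helper. `parse_value` is a hypothetical name, not part of the original notebook. The helper also treats a missing value (NaN) as 0.0, in case the 'None' strings are read as missing:

```python
import math
import re

def parse_value(x):
    """Convert a Jeopardy value string such as '$2,000' to a float.

    The string 'None' or a missing value (NaN) is mapped to 0.0, matching the notebook's choice.
    """
    if x is None or x == 'None' or (isinstance(x, float) and math.isnan(x)):
        return 0.0
    # remove a leading '$' and every thousands separator, then parse
    return float(re.sub(r'^\$|,', '', x))

samples = ['$200', '$2,000', 'None', float('nan')]
print([parse_value(s) for s in samples])  # [200.0, 2000.0, 0.0, 0.0]
```

Mapping 'None' to 0.0 keeps every row but pulls the averages down. Converting it to NaN instead would let `mean()` skip those rows, because pandas excludes NaN from the mean by default.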

### Exploratory Data Analysis

Estimate the difficulty of specific topics, using the average value of questions that contain a given keyword as the measure of difficulty.

In [4]:

rows_with_king = find_these_words(['King'])
print('Average value of questions with keyword "King" in them: ')
print(round(rows_with_king.value_as_float.mean(), 2))

rows_with_yuma = find_these_words(['Yuma'])
print('Average value of questions with keyword "Yuma" in them: ')
print(round(rows_with_yuma.value_as_float.mean(), 2))


Average value of questions with keyword "King" in them: 
771.88
Average value of questions with keyword "Yuma" in them: 
860.0


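In this data, questions containing "Yuma" have a higher average value than questions containing "King" (860.0 versus 771.88). However, these averages do not show how many questions each one is based on. A keyword that appears in only a few questions can have an unstable average, so check the number of matching rows before treating the difference as meaningful.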
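An average is only as informative as the number of questions behind it, so it can help to report the match count next to the mean. The minimal sketch below uses pandas `agg` on a hypothetical miniature of the data. The rows and values are invented for illustration and are not drawn from `jeopardy.csv`. `keyword_stats` is a hypothetical helper that applies the same case-insensitive substring match as `find_these_words`:

```python
import pandas as pd

# Hypothetical sample rows; values are illustrative only
sample = pd.DataFrame({
    'question': [
        'This king signed the Magna Carta',
        'Yuma is a city in this state',
        'The king cobra is native to this continent',
        'A king-size bed is this many inches wide',
    ],
    'value_as_float': [200.0, 800.0, 400.0, 600.0],
})

def keyword_stats(df, keyword):
    """Count and mean value of questions containing keyword (case-insensitive substring match)."""
    mask = df['question'].str.contains(keyword, case=False, regex=False)
    return df.loc[mask, 'value_as_float'].agg(['count', 'mean'])

for kw in ['King', 'Yuma']:
    stats = keyword_stats(sample, kw)
    print(kw, int(stats['count']), round(stats['mean'], 2))
# King 3 400.0
# Yuma 1 800.0
```

On the full data, printing `len(rows_with_yuma)` and `len(rows_with_king)` next to the means gives the same information.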

Count the unique answers to questions containing a given keyword, using the number of distinct values in the "answer" column. In Jeopardy, a response must be complete to be judged correct. For example, if the full answer is 'Olympia, Washington', then 'Olympia' alone is a partial answer and is not accepted. Unique answers are therefore counted by exact match rather than partial match, so 'Olympia' and 'Olympia, Washington' are two distinct values. As a check, the number of rows containing the keyword can never be smaller than the number of unique answers. If it is strictly greater, some answers repeat.

In [5]:
def unique_answers(a_dataframe):
    # nunique() counts distinct answer strings by exact match (missing values are not counted)
    return a_dataframe['answer'].nunique()

unique_king_answers = unique_answers(rows_with_king)

#check
print('Number of rows with keyword in "question" column '
      'is more than number of unique values found for the "answer" column for those questions')
print("\n")
print(len(rows_with_king) > unique_king_answers)


Number of rows with keyword in "question" column is more than number of unique values found for the "answer" column for those questions


True
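The exact-match rule can be illustrated directly. `nunique()` compares answer strings exactly, so a partial answer and its full form count as different values. A minimal sketch with hypothetical answers:

```python
import pandas as pd

# Hypothetical answers for questions sharing a keyword
answers = pd.DataFrame({'answer': ['Olympia, Washington', 'Olympia', 'Olympia, Washington', 'Salem']})

# Exact string comparison: 'Olympia' and 'Olympia, Washington' are distinct values
print(answers['answer'].nunique())  # 3

# value_counts() shows how often each exact answer appears
print(answers['answer'].value_counts())
```

Because the comparison is exact, differences in capitalization or spacing (for example 'olympia' versus 'Olympia') also produce separate values. Normalizing case before counting would merge those, but it would not merge a partial answer with its full form.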


### Conclusion

Based on the analysis, the average value of questions varies with the keywords they contain. For example, questions containing the word "Yuma" have, on average, a greater value than questions containing the word "King". Additional analysis should be performed on other keywords to identify the set of high-value questions. These keywords can then be used as a focal point for devising a strategy to win the game.
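One way to carry out the suggested follow-up is to score every word that appears in the questions instead of hand-picking keywords. The minimal sketch below assumes a DataFrame with the `question` and `value_as_float` columns built above. The tokenization (lowercase alphabetic runs), the `min_count` threshold, and the `rank_keywords` name are illustrative choices, not part of the original analysis:

```python
import pandas as pd

def rank_keywords(df, min_count=2):
    """Rank words by the average value of the questions that contain them.

    Each question contributes a word at most once; words seen in fewer than
    min_count questions are dropped so that a single high-value clue cannot dominate.
    """
    words = (
        df['question']
        .fillna('')
        .str.lower()
        .str.findall(r'[a-z]+')   # split into lowercase alphabetic tokens
        .apply(set)               # count each word once per question
    )
    exploded = (
        pd.DataFrame({'word': words, 'value': df['value_as_float']})
        .explode('word')          # one row per (question, word) pair
    )
    stats = exploded.groupby('word')['value'].agg(['count', 'mean'])
    return stats[stats['count'] >= min_count].sort_values('mean', ascending=False)

# Hypothetical sample rows; values are illustrative only
sample = pd.DataFrame({
    'question': [
        'This river flows past Yuma',
        'Yuma lies near this river',
        'This king built a river palace',
        'The king of this land',
    ],
    'value_as_float': [800.0, 1000.0, 200.0, 400.0],
})

print(rank_keywords(sample))
```

Common words such as "this" can rank highly simply because they appear in almost every clue. Filtering out a stop-word list before ranking would make the results more useful as topic keywords.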