# Jeopardy Playground
## By: Kevin

Welcome! This is a fun little project using both Python and Pandas to explore a data set called *'jeopardy.csv'*. This data set spans roughly three decades (1984-2012) and contains show numbers, air dates, rounds, categories, questions, answers, and their associated values for the popular quiz show *Jeopardy!* I decided to explore it further because it's full of interesting things to discover and play around with!


Let's start by first setting up our environment, importing the *.csv* file, and inspecting it.

In [1]:
import random as rndm # We'll use this later for our little Jeopardy game.
import pandas as pd
# The statement below keeps the output from being cut off, which is essential for seeing the entire question.
pd.set_option("display.max_colwidth", None)

jeopardy = pd.read_csv("jeopardy.csv")

print(jeopardy.columns)
print(jeopardy.head(5))

Index(['Show Number', ' Air Date', ' Round', ' Category', ' Value',
       ' Question', ' Answer'],
      dtype='object')
   Show Number    Air Date      Round                         Category  Value  \
0         4680  2004-12-31  Jeopardy!                          HISTORY   $200   
1         4680  2004-12-31  Jeopardy!  ESPN's TOP 10 ALL-TIME ATHLETES   $200   
2         4680  2004-12-31  Jeopardy!      EVERYBODY TALKS ABOUT IT...   $200   
3         4680  2004-12-31  Jeopardy!                 THE COMPANY LINE   $200   
4         4680  2004-12-31  Jeopardy!              EPITAPHS & TRIBUTES   $200   

                                                                                                      Question  \
0             For the last 8 years of his life, Galileo was under house arrest for espousing this man's theory   
1  No. 2: 1912 Olympian; football star at Carlisle Indian School; 6 MLB seasons with the Reds, Giants & Braves   
2                     The city of Yuma in this st

Most of the column names have a leading space, so we'll rename them to make things easier on ourselves. Assigning a list to `jeopardy.columns` renames the columns in place without needing a new variable, so long as the list is in the same order as the data set's columns.

In [2]:
jeopardy.columns = ["show_number", "air_date", "round", "category", "value", "question", "answer"]
print(jeopardy.columns)

Index(['show_number', 'air_date', 'round', 'category', 'value', 'question',
       'answer'],
      dtype='object')
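
If you'd rather not type every name by hand, here's a minimal sketch that cleans the headers programmatically instead. The `sample` frame is just a stand-in I made up for illustration (it is not the real data set), and it assumes the only problems are stray whitespace, capital letters, and inner spaces.

```python
import pandas as pd

# Tiny stand-in frame whose headers mimic the leading spaces seen in jeopardy.csv
sample = pd.DataFrame(columns=["Show Number", " Air Date", " Round", " Category"])

# Strip surrounding whitespace, lowercase, and swap inner spaces for underscores
sample.columns = sample.columns.str.strip().str.lower().str.replace(" ", "_")

print(list(sample.columns))
```

The upside is that it doesn't depend on column order; the downside is that you get whatever names the file implies rather than ones you picked.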


Great! Let's take a look at how much data we're working with.

In [3]:
print(len(jeopardy))

216930


This data set contains 216,930 rows of data! That's a considerable size. In Jeopardy, questions are organized into categories, and every row here records the category its question belongs to. Let's create a variable to store each unique category! We'll use this later.

In [4]:
jeopardy_categories = jeopardy.category.unique()

print(len(jeopardy_categories))

27995
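
To get a feel for how questions are distributed across categories, something like the sketch below could help. The `mini` frame is a made-up miniature of the category column, so the numbers it prints say nothing about the real data.

```python
import pandas as pd

# Hypothetical miniature of the category column
mini = pd.DataFrame(
    {"category": ["HISTORY", "SCIENCE", "HISTORY", "LITERATURE", "HISTORY", "SCIENCE"]}
)

# value_counts tallies rows per category, sorted from most to least frequent
counts = mini["category"].value_counts()
print(counts.head(3))

# Average number of questions per category
questions_per_category = len(mini) / mini["category"].nunique()
print(questions_per_category)
```

On the real DataFrame the same two calls would be `jeopardy.category.value_counts()` and `len(jeopardy) / jeopardy.category.nunique()`.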


There are 27,995 categories with 216,930 questions spread across them! That's nearly 8 times as many questions as there are categories! With a data set as large as this, it would be useful to filter the questions by more than just the category. We could implement basic search functionality to look for questions that contain all of our specified words. We'll create a function, so we can reuse it as much as we'd like.

In [5]:
def search_questions(terms):
    query = lambda x: all(word.lower() in x.lower() for word in terms)
    return jeopardy.loc[jeopardy.question.apply(query)]

query1 = search_questions(["Massachusetts", "founded"])
print(query1)

        show_number    air_date             round                 category  \
8457           6135  2011-04-22  Double Jeopardy!          FATHER'S IN LAW   
22796          4465  2004-01-23         Jeopardy!              U.S. CITIES   
31491          4546  2004-05-17  Double Jeopardy!            EARLY AMERICA   
50558          5089  2006-10-26  Double Jeopardy!          COLLEGE COLLAGE   
84717          5143  2007-01-10         Jeopardy!           MARINE BIOLOGY   
142558         2669  1996-03-21  Double Jeopardy!                 MEDICINE   
152072         1651  1991-11-04  Double Jeopardy!        RELIGIOUS LEADERS   
172053         2826  1996-12-09  Double Jeopardy!  COLLEGES & UNIVERSITIES   
196404         2958  1997-06-11  Double Jeopardy!  COLLEGES & UNIVERSITIES   
214691         5619  2009-01-29         Jeopardy!       AMERICAN EDUCATION   

        value  \
8457     $400   
22796    $400   
31491    $400   
50558   $2000   
84717   $1000   
142558   $800   
152072   $400   
17205
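
One thing to keep in mind: `search_questions` matches substrings, so a term like "king" would also hit "Viking". Here's a rough sketch of a whole-word variant using regular expressions. The function name `search_whole_words` and the `mini` frame are my own inventions for illustration, not part of the notebook above.

```python
import re
import pandas as pd

def search_whole_words(df, terms):
    """Return rows of df whose question contains every term as a whole word."""
    mask = pd.Series(True, index=df.index)
    for term in terms:
        # \b marks a word boundary; re.escape guards against regex metacharacters
        pattern = r"\b" + re.escape(term) + r"\b"
        mask &= df["question"].str.contains(pattern, case=False, regex=True, na=False)
    return df.loc[mask]

# Made-up questions for demonstration only
mini = pd.DataFrame({"question": [
    "This king founded a dynasty",
    "A Viking explorer founded this settlement",
    "This English king lost his head",
]})

result = search_whole_words(mini, ["king", "founded"])
print(result)
```

Only the first row should come back: the second has "king" only inside "Viking", and the third doesn't mention "founded".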

Each question has a value associated with it. Value roughly tracks difficulty, with higher values meaning harder questions. It would be useful to see the average value of a search query to get a sense of how difficult and rewarding it may be. Since we'll be working with numbers to accomplish this, we should first check if the data is in the format we need.

In [6]:
print(type(jeopardy.value[0]))

<class 'str'>


The code above checked the type of the first value within our DataFrame and it came back as a string. The values are formatted with '$' signs as well as commas, so we need to remove these before we cast them to another type. There are also some values with an entry of 'None' because they belong to the 'Final Jeopardy' round, and we'll need to account for this as well.

In [7]:
# The lambda function below skips the leading '$' sign, removes any commas, and converts
# what remains into a floating-point number.
# If the value is "None" then it is replaced with 0.
formatter = lambda num: float(num[1:].replace(',', '')) if num != "None" else 0
jeopardy["float_value"] = jeopardy.value.apply(formatter)

print(jeopardy.head(5))

   show_number    air_date      round                         category value  \
0         4680  2004-12-31  Jeopardy!                          HISTORY  $200   
1         4680  2004-12-31  Jeopardy!  ESPN's TOP 10 ALL-TIME ATHLETES  $200   
2         4680  2004-12-31  Jeopardy!      EVERYBODY TALKS ABOUT IT...  $200   
3         4680  2004-12-31  Jeopardy!                 THE COMPANY LINE  $200   
4         4680  2004-12-31  Jeopardy!              EPITAPHS & TRIBUTES  $200   

                                                                                                      question  \
0             For the last 8 years of his life, Galileo was under house arrest for espousing this man's theory   
1  No. 2: 1912 Olympian; football star at Carlisle Indian School; 6 MLB seasons with the Reds, Giants & Braves   
2                     The city of Yuma in this state has a record average of 4,055 hours of sunshine each year   
3                         In 1963, live on "The Art Linkletter 
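
As a side note, here's a sketch of a vectorized way to do the same conversion with `pd.to_numeric`. It's a different technique from the lambda above, and it makes a different choice: "None" becomes `NaN` instead of 0, which keeps those rows from dragging an average down. The `values` Series is made up for the example.

```python
import pandas as pd

# Made-up values in the same format as the 'value' column
values = pd.Series(["$200", "$1,000", "None", "$2,000"])

# Remove '$' and ',' then convert; errors="coerce" turns "None" into NaN
numeric = pd.to_numeric(values.str.replace(r"[$,]", "", regex=True), errors="coerce")

print(numeric.tolist())

# mean() skips NaN by default, so the two averages differ
print(numeric.mean())            # ignores the "None" entry
print(numeric.fillna(0).mean())  # counts the "None" entry as 0
```

Which behavior you want depends on whether a Final Jeopardy question should count as "worth nothing" or simply as "no fixed value".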

We created a new column named 'float_value' to insert our converted values into. It's best if we try not to directly modify the data itself, but rather transform it in a way that suits our needs. Let's calculate the average value of our previous query!

In [8]:
query1 = search_questions(["Massachusetts", "founded"])
print(query1.float_value.mean())

770.0
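
A single mean can hide a lot, so it may be worth looking at a few more statistics at once. Below is a small sketch on a made-up `mini` frame (not real query results) that reports the count, mean, and median, and then the mean per round.

```python
import pandas as pd

# Made-up stand-in for the result of a search query
mini = pd.DataFrame({
    "round": ["Jeopardy!", "Double Jeopardy!", "Double Jeopardy!", "Jeopardy!"],
    "float_value": [400.0, 2000.0, 800.0, 400.0],
})

# Several summary statistics in one call
summary = mini["float_value"].agg(["count", "mean", "median"])
print(summary)

# Average value split by round
by_round = mini.groupby("round")["float_value"].mean()
print(by_round)
```

On a real query you could swap `mini` for something like `query1`.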


We've made some pretty cool things so far. Now, let's create a way for us to play *Jeopardy* ourselves!

In [9]:
class PlayJeopardy:
    """Simulates three rounds of a quiz game in which there are four total answer choices
    and the player must guess the correct answer. The incorrect answers are pulled randomly
    from the data, so difficulty can vary due to contextual information or relationships between the question and answers.
    
    TODO: I would like to further expand the class and implement a method where the player can choose a category
    that they would like their question to be based on."""
    
    # Three variables that contain questions, answers, and values based on the Jeopardy round.
    round1 = jeopardy.loc[jeopardy["round"] == "Jeopardy!", ["question", "answer", "float_value"]]
    round2 = jeopardy.loc[jeopardy["round"] == "Double Jeopardy!", ["question", "answer", "float_value"]]
    round3 = jeopardy.loc[jeopardy["round"] == "Final Jeopardy!", ["question", "answer", "float_value"]]
    
    def generate_answers(self, current_round, correct_answer):
        # Index (0-3) at which the correct answer will appear: choice A, B, C, or D
        abcd = rndm.randint(0, 3)
        pool = current_round["answer"]

        # Draw three distinct incorrect answers.
        # rndm.randint includes both endpoints, so the upper bound must be
        # len(pool) - 1 to avoid indexing one past the end.
        incorrect_answers = []
        while len(incorrect_answers) < 3:
            candidate = pool.iloc[rndm.randint(0, len(pool) - 1)]
            # Skip the correct answer and any incorrect answer already drawn
            if candidate != correct_answer and candidate not in incorrect_answers:
                incorrect_answers.append(candidate)

        # Insert the correct answer at the chosen position
        answers = incorrect_answers[:abcd] + [correct_answer] + incorrect_answers[abcd:]

        print("A: {}\nB: {}\nC: {}\nD: {}".format(*answers))
        return str(abcd)
        
    
    def start_game(self):
        current_round = 1
        current_earnings = 0.0
        in_game = True
        
        while in_game:
            if current_round == 1:
                print("Welcome to round 1 of Jeopardy!")
                # Randomly select the first question and its associated data
                round_details = self.round1.iloc[rndm.randint(0, len(self.round1)-1)]
                question = round_details.question
                answer = round_details.answer
                value = round_details.float_value
                print(question + " (${})".format(str(value)))

                # Generate answer choices, output to user, and save correct choice
                correct_choice = self.generate_answers(self.round1, answer)
                
            elif current_round == 2:
                print("Welcome to round 2 of Jeopardy!\nAll values are doubled!")
                round_details = self.round2.iloc[rndm.randint(0, len(self.round2)-1)]
                question = round_details.question
                answer = round_details.answer
                value = round_details.float_value
                print(question + " (${})".format(str(value)))
                
                correct_choice = self.generate_answers(self.round2, answer)
            
            else:
                print("Welcome to round 3 of Jeopardy!")
                print("This will be your final question.")
                print("If answered correctly, its value will be triple your current earnings!")
                round_details = self.round3.iloc[rndm.randint(0, len(self.round3)-1)]
                question = round_details.question
                answer = round_details.answer
                value = current_earnings
                print(question + " (${})".format(str(value * 3)))
                
                correct_choice = self.generate_answers(self.round3, answer)
                
        
            # Get player answer
            prompt = "What's the correct answer? Type 'A', 'B', 'C', or 'D': "
            player_answer = input(prompt).strip().upper()
            
            # Check for invalid input and prompt to try again.
            # Comparing against a tuple rejects empty or multi-letter input such as "" or "AB",
            # which a substring test against "ABCD" would accept.
            valid_choices = ("A", "B", "C", "D")
            while player_answer not in valid_choices:
                print("Invalid choice. Please try again.")
                player_answer = input(prompt).strip().upper()

            # Convert the player's letter to its index ("A" -> "0", ..., "D" -> "3")
            # so it can be compared with the correct choice
            player_answer = str(valid_choices.index(player_answer))

            # Check if player answered correctly
            if player_answer == correct_choice:
                match current_round:
                    case 1:
                        current_earnings += value
                        current_round += 1
                        print("Correct! You earned $" + str(value) + "\n")
                    case 2:
                        current_earnings += value * 2
                        current_round += 1
                        print("Correct! You earned $" + str(value*2) + "\n")
                    case 3:
                        # Multiply rather than add, so earnings are tripled instead of quadrupled
                        current_earnings *= 3
                        print("Correct!\nCongratulations! You've tripled your earnings!")
                        print("You walked away with ${}".format(str(current_earnings)))
                        in_game = False
                
            else:
                print("Incorrect. You walk away with $" + str(current_earnings))
                in_game = False
        
play_jeopardy = PlayJeopardy()
play_jeopardy.start_game()

Welcome to round 1 of Jeopardy!
(<a href="http://www.j-archive.com/media/2007-05-09_J_13.jpg" target="_blank">I'm Pete Carroll, head football coach at USC.</a>) Our 2 national titles at USC are still 4 short of this "animal" of an Alabama head coach, but we're working on that ($600.0)
A: McLean Stevenson
B: Arabian
C: Annette Bening
D: Bear Bryant
What's the correct answer? Type 'A', 'B', 'C', or 'D': D
Correct! You earned $600.0

Welcome to round 2 of Jeopardy!
All values are doubled!
The remains of a 3.2 million-year-old hominid, discovered in 1974, were nicknamed this, after a Beatles song ($1600.0)
A: Excalibur
B: Chris Farley
C: "O Captain!  My Captain! America"
D: Lucy
What's the correct answer? Type 'A', 'B', 'C', or 'D': A
Incorrect. You walk away with $600.0
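
For the TODO in the class docstring (letting the player choose a category), here's one rough way the question lookup might work. This is only a sketch: `pick_question` is a hypothetical helper, the `mini` frame is made up, and nothing here is wired into `PlayJeopardy` yet.

```python
import random
import pandas as pd

def pick_question(df, category, rng=random):
    """Return one random row from df in the given category (case-insensitive), or None."""
    matches = df[df["category"].str.upper() == category.upper()]
    if matches.empty:
        return None
    # randrange excludes its upper bound, so every position is a valid index
    return matches.iloc[rng.randrange(len(matches))]

# Made-up rows for demonstration only
mini = pd.DataFrame({
    "category": ["HISTORY", "SCIENCE", "HISTORY"],
    "question": ["Q1", "Q2", "Q3"],
    "answer": ["A1", "A2", "A3"],
})

row = pick_question(mini, "history")
print(row.question, "->", row.answer)
```

Returning `None` for an unknown category leaves it up to the caller to decide whether to re-prompt the player or fall back to a random question.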


# Conclusion

As you can see, this data set is a lot of fun to work with. It's very rich, and the possibilities for what you can build with it are seemingly endless. I hope you enjoyed going through this notebook as much as I did. I'll be coming back and updating it as I find the time. Please, if you have any suggestions, criticisms, or feedback of any kind, let me hear it! I'm always looking to improve and I would love to hear what you think! Thanks for viewing my notebook!