# This is Jeopardy!

#### Overview

This project is slightly different from others you have encountered thus far. Instead of a step-by-step tutorial, this project contains a series of open-ended requirements which describe the project you'll be building. There are many possible ways to correctly fulfill all of these requirements, and you should expect to use the internet, Codecademy, and/or other resources when you encounter a problem that you cannot easily solve.

#### Project Goals

You will work to write several functions that investigate a dataset of _Jeopardy!_ questions and answers. Filter the dataset for topics that you're interested in, compute the average difficulty of those questions, and train to become the next Jeopardy champion!

## Prerequisites

In order to complete this project, you should have completed the Pandas lessons in the <a href="https://www.codecademy.com/learn/paths/analyze-data-with-python">Analyze Data with Python Skill Path</a>. You can also find those lessons in the <a href="https://www.codecademy.com/learn/data-processing-pandas">Data Analysis with Pandas course</a> or the <a href="https://www.codecademy.com/learn/paths/data-science/">Data Scientist Career Path</a>.

Finally, the <a href="https://www.codecademy.com/learn/practical-data-cleaning">Practical Data Cleaning</a> course may also be helpful.

## Project Requirements

**1.** We've provided a CSV file containing data about the game show _Jeopardy!_ in a file named `jeopardy.csv`. Load the data into a DataFrame and investigate its contents. Try to print out specific columns.

   Note that in order to make this project as "real-world" as possible, we haven't modified the data at all; we're giving it to you exactly how we found it. As a result, this data isn't as "clean" as the datasets you normally find on Codecademy. More specifically, there's something odd about the column names. After you figure out the problem with the column names, you may want to rename them to make your life easier for the rest of the project.
   
   In order to display the full contents of a column, we've added this line of code for you:
   
   ```py
   pd.set_option('display.max_colwidth', None)
   ```

In [1]:
# Setting parameters
import pandas as pd
pd.set_option('display.max_colwidth', None)

# Import the dataset into a DataFrame
jeopardy = pd.read_csv("jeopardy.csv")

# Preview the headers and data
print(jeopardy.head(10))

   Show Number    Air Date      Round                         Category  Value  \
0         4680  2004-12-31  Jeopardy!                          HISTORY   $200   
1         4680  2004-12-31  Jeopardy!  ESPN's TOP 10 ALL-TIME ATHLETES   $200   
2         4680  2004-12-31  Jeopardy!      EVERYBODY TALKS ABOUT IT...   $200   
3         4680  2004-12-31  Jeopardy!                 THE COMPANY LINE   $200   
4         4680  2004-12-31  Jeopardy!              EPITAPHS & TRIBUTES   $200   
5         4680  2004-12-31  Jeopardy!                   3-LETTER WORDS   $200   
6         4680  2004-12-31  Jeopardy!                          HISTORY   $400   
7         4680  2004-12-31  Jeopardy!  ESPN's TOP 10 ALL-TIME ATHLETES   $400   
8         4680  2004-12-31  Jeopardy!      EVERYBODY TALKS ABOUT IT...   $400   
9         4680  2004-12-31  Jeopardy!                 THE COMPANY LINE   $400   


In [2]:
# Several of the raw column names begin with a space (e.g. " Question" and " Value"), so let's rename them all
jeopardy.columns = ["show_number", "air_date", "round", "category", "value", "question", "answer"]

# Let's take a look at the DataFrame with its new column headings
print(jeopardy.head(10))

   show_number    air_date      round                         category value  \
0         4680  2004-12-31  Jeopardy!                          HISTORY  $200   
1         4680  2004-12-31  Jeopardy!  ESPN's TOP 10 ALL-TIME ATHLETES  $200   
2         4680  2004-12-31  Jeopardy!      EVERYBODY TALKS ABOUT IT...  $200   
3         4680  2004-12-31  Jeopardy!                 THE COMPANY LINE  $200   
4         4680  2004-12-31  Jeopardy!              EPITAPHS & TRIBUTES  $200   
5         4680  2004-12-31  Jeopardy!                   3-LETTER WORDS  $200   
6         4680  2004-12-31  Jeopardy!                          HISTORY  $400   
7         4680  2004-12-31  Jeopardy!  ESPN's TOP 10 ALL-TIME ATHLETES  $400   
8         4680  2004-12-31  Jeopardy!      EVERYBODY TALKS ABOUT IT...  $400   
9         4680  2004-12-31  Jeopardy!                 THE COMPANY LINE  $400   

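The cell above renames the columns by typing out every new name by hand. As a sketch of an alternative, the same names can be derived from the raw headers by stripping whitespace, lowercasing, and replacing inner spaces with underscores. The toy DataFrame is an assumption standing in for `jeopardy.csv`; its headers mimic the leading-space names the requirements refer to:

```python
import pandas as pd

# Toy stand-in for the raw CSV headers, several of which start with a stray space
raw = pd.DataFrame(columns=["Show Number", " Air Date", " Round", " Category",
                            " Value", " Question", " Answer"])

# Strip surrounding whitespace, lowercase, and turn inner spaces into underscores
raw.columns = raw.columns.str.strip().str.lower().str.replace(" ", "_", regex=False)
print(list(raw.columns))
# ['show_number', 'air_date', 'round', 'category', 'value', 'question', 'answer']
```

This yields the same names as the hand-written list, but would keep working if the column order changed.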

In [45]:
# Let's look at the length of our columns and some other basic information
row_count = len(jeopardy["show_number"])
print("The DataFrame has {:,} rows.".format(row_count))

show_count = jeopardy.show_number.nunique()
print("The DataFrame comprises questions from {:,} shows.".format(show_count))

oldest_show = jeopardy.air_date.min()
print("The oldest show is from {}".format(oldest_show))

newest_show = jeopardy.air_date.max()
print("The most recent show is from {}.".format(newest_show))

value_type = jeopardy.value.dtype
print(value_type)
values_observed = jeopardy.value.unique()
print(values_observed)

rounds_observed = jeopardy["round"].unique()
print(rounds_observed)

The DataFrame has 216,930 rows.
The DataFrame comprises questions from 3,640 shows.
The oldest show is from 1984-09-10
The most recent show is from 2012-01-27.
object
['$200' '$400' '$600' '$800' '$2,000' '$1000' '$1200' '$1600' '$2000'
 '$3,200' 'no value' '$5,000' '$100' '$300' '$500' '$1,000' '$1,500'
 '$1,200' '$4,800' '$1,800' '$1,100' '$2,200' '$3,400' '$3,000' '$4,000'
 '$1,600' '$6,800' '$1,900' '$3,100' '$700' '$1,400' '$2,800' '$8,000'
 '$6,000' '$2,400' '$12,000' '$3,800' '$2,500' '$6,200' '$10,000' '$7,000'
 '$1,492' '$7,400' '$1,300' '$7,200' '$2,600' '$3,300' '$5,400' '$4,500'
 '$2,100' '$900' '$3,600' '$2,127' '$367' '$4,400' '$3,500' '$2,900'
 '$3,900' '$4,100' '$4,600' '$10,800' '$2,300' '$5,600' '$1,111' '$8,200'
 '$5,800' '$750' '$7,500' '$1,700' '$9,000' '$6,100' '$1,020' '$4,700'
 '$2,021' '$5,200' '$3,389' '$4,200' '$5' '$2,001' '$1,263' '$4,637'
 '$3,201' '$6,600' '$3,700' '$2,990' '$5,500' '$14,000' '$2,700' '$6,400'
 '$350' '$8,600' '$6,300' '$250' '$3,989' '$8

**2.** Write a function that filters the dataset for questions that contain all of the words in a list of words. For example, when the list `["King", "England"]` was passed to our function, the function returned a DataFrame of 49 rows. Every row had the strings `"King"` and `"England"` somewhere in its `" Question"`.

   Test your function by printing out the column containing the question of each row of the dataset.

In [4]:
# Let's start by making a list of possible question topics
question_words = ["guitar", "chess", "Yucatan"]

# Now let's build a filter that finds questions containing any of these terms
# First, we join question_words with "|" (regex "or") to form a regular expression to use in our search
schema = "|".join(question_words)

# Next, let's use the schema to search the questions column
filtered_set = jeopardy[jeopardy.question.str.contains(schema)]

# Let's check and see if we got anything
print(filtered_set.head(10))

      show_number    air_date             round  \
224          3673  2000-07-19  Double Jeopardy!   
1421         3362  1999-03-30  Double Jeopardy!   
2107         5981  2010-09-20         Jeopardy!   
2264         5630  2009-02-13         Jeopardy!   
2410         3214  1998-07-16         Jeopardy!   
2644         6125  2011-04-08         Jeopardy!   
3097         4487  2004-02-24  Double Jeopardy!   
3454         5347  2007-12-04  Double Jeopardy!   
4062         2818  1996-11-27  Double Jeopardy!   
5869         4506  2004-03-22         Jeopardy!   

                                     category  value  \
224                   INTERNATIONAL SPORTSMEN   $800   
1421                           HEADS OF STATE  $1000   
2107                            COUNTRY MUSIC   $800   
2264                           CABLE CHANNELS   $200   
2410                   PRE-COLUMBIAN CULTURES   $500   
2644                      YES, THAT'S "WHITE"   $800   
3097                                   ROCK-Y 

In [5]:
# Let's see how long our set is
filtered_length = len(filtered_set["show_number"])
print("The filtered DataFrame contains {} questions.".format(filtered_length))

The filtered DataFrame contains 453 questions.


In [6]:
# Let's take a look at just 40 of the 453 questions
print(filtered_set["question"].head(40))


224                                                                                                                                                                                                                                                                  The Times of London estimates this chess player is taking home $20 mil. a year; that's some check, mate!
1421                                                                                                                                                                                                                                                                                               In 1964 Luxembourg's grand duchess abdicated in favor of this man, her son
2107                                                                                                                                                                                                                                                                        
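The cells above match questions containing *any* of the words (the `|` pattern), while requirement 2 asks for questions containing *all* of them. A minimal sketch of such a function, assuming a plain case-sensitive substring match as in the `["King", "England"]` example; the name `filter_all_words` and the toy rows are hypothetical:

```python
import pandas as pd

def filter_all_words(df, words, column="question"):
    """Return the rows of df whose `column` contains every string in `words`."""
    # Start with every row selected, then AND in one condition per word
    mask = pd.Series(True, index=df.index)
    for word in words:
        # regex=False treats the word as a literal substring; na=False skips missing text
        mask &= df[column].str.contains(word, regex=False, na=False)
    return df[mask]

# Toy data standing in for the jeopardy DataFrame
toy = pd.DataFrame({"question": [
    "This King of England had six wives",
    "This king ruled Norway",
    "England's oldest university",
]})
print(filter_all_words(toy, ["King", "England"])["question"].tolist())
# ['This King of England had six wives']
```

On the real data this would be called as `filter_all_words(jeopardy, ["King", "England"])`.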

**3.** Test your original function with a few different sets of words to try to find some ways your function breaks. Edit your function so it is more robust.

   For example, think about capitalization. We probably want to find questions that contain the word `"King"` or `"king"`.
   
   You may also want to check to make sure you don't find rows that contain substrings of your given words. For example, our function found a question that didn't contain the word `"king"` but did contain the word `"viking"`: it found the `"king"` inside `"viking"`. Note that this also comes with some drawbacks: you would no longer find questions that contained words like `"England's"`.

In [7]:
# Many results are about royalty because "duchess" contains "chess", so it slips through the filter
# We need the schema to match "chess" but not "duchess"
# In a regular expression, \b marks a word boundary; putting it on both sides of each word requires a whole-word match
# However, that would also drop "guitars" and "guitarist", so let's put \b only at the front
schema = "|".join(r'\b{}'.format(word) for word in question_words)

# We can redefine filtered_set to not care about letter case, and to treat the schema as a regular expression
filtered_set = jeopardy[jeopardy['question'].str.contains(schema, case=False, regex=True)]

# Let's test that everything still works
print(filtered_set.head(10))

      show_number    air_date             round  \
224          3673  2000-07-19  Double Jeopardy!   
2107         5981  2010-09-20         Jeopardy!   
2264         5630  2009-02-13         Jeopardy!   
2410         3214  1998-07-16         Jeopardy!   
2644         6125  2011-04-08         Jeopardy!   
3097         4487  2004-02-24  Double Jeopardy!   
5869         4506  2004-03-22         Jeopardy!   
5875         4506  2004-03-22         Jeopardy!   
5881         4506  2004-03-22         Jeopardy!   
5887         4506  2004-03-22         Jeopardy!   

                                     category  value  \
224                   INTERNATIONAL SPORTSMEN   $800   
2107                            COUNTRY MUSIC   $800   
2264                           CABLE CHANNELS   $200   
2410                   PRE-COLUMBIAN CULTURES   $500   
2644                      YES, THAT'S "WHITE"   $800   
3097                                   ROCK-Y  $2000   
5869  ROLLING STONE'S 100 GREATEST GUITARISTS 

In [8]:
# Looks good.  Now, let's take a look at the first 40 questions again and make sure there is no royalty.
print(filtered_set["question"].head(40))

224                                                                                                                                                                                                                                                                  The Times of London estimates this chess player is taking home $20 mil. a year; that's some check, mate!
2107                                                                                                                                                                                                                                                                           This sound, like that made by plucking a guitar, is the title of a 2009 album by George Strait
2264                                                                                                                                                                                                                                                                        
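Combining the fixes from requirement 3 with the "all words" rule gives a more robust filter. A sketch under the same assumptions as before (hypothetical name, toy rows): every word is required, matching ignores case, and `\b` on both sides enforces whole words so `"king"` no longer matches inside `"viking"`. `re.escape` guards against words containing regex metacharacters. Because `\b` treats the apostrophe as a boundary, `"England's"` still matches `"England"`, while a plural such as `"Kings"` does not match `"King"`:

```python
import re
import pandas as pd

def filter_all_words_robust(df, words, column="question"):
    """Rows whose `column` contains every word as a whole word, ignoring case."""
    mask = pd.Series(True, index=df.index)
    for word in words:
        # \b on both sides = whole-word match; re.escape neutralizes characters like "." or "+"
        pattern = r"\b{}\b".format(re.escape(word))
        mask &= df[column].str.contains(pattern, case=False, regex=True, na=False)
    return df[mask]

toy = pd.DataFrame({"question": [
    "This viking sailed to England",
    "This king of England had six wives",
    "Kings of England's past",
]})
print(filter_all_words_robust(toy, ["King", "England"]).index.tolist())
# [1]
```

Whether plurals should count is a judgment call; dropping the trailing `\b` (as the notebook does) trades some false positives for catching `"guitarist"`-style variants.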

**4.** We may want to eventually compute aggregate statistics, like `.mean()` on the `" Value"` column. But right now, the values in that column are strings. Convert the `" Value"` column to floats. If you'd like to, you can create a new column with float values.

   While most of the values in the `" Value"` column represent a dollar amount as a string, note that some do not &mdash; these values will need to be handled differently!

   Now that you can filter the dataset of questions, use your new column that contains the float values of each question to find the "difficulty" of certain topics. For example, what is the average value of questions that contain the word `"King"`?
   
   Make sure to use the dataset that contains the float values as the dataset you use in your filtering function.

In [9]:
# We need the value column to be integers or floats
# First we need to eliminate the "$" and "," characters, as well as convert "no value" to 0
# The easiest way is to make a dictionary of replacements
replacement_dict = {"$" : "", "," : "", "no value" : 0}

# We will create a new column in jeopardy that applies our replacement dictionary and converts the values to floats
jeopardy["value_float"] = jeopardy["value"].replace(replacement_dict, regex=True).astype(float)

# Verify it worked
print(jeopardy.head(10))


ValueError: could not convert string to float: '$200'

In [10]:
# It didn't work.  Let's try to apply the replacements and then see unique values.  
jeopardy["value_float"] = jeopardy["value"].replace(replacement_dict, regex=True)
print(jeopardy.value_float.unique())

['$200' '$400' '$600' '$800' '$2000' '$1000' '$1200' '$1600' '$3200' 0
 '$5000' '$100' '$300' '$500' '$1500' '$4800' '$1800' '$1100' '$2200'
 '$3400' '$3000' '$4000' '$6800' '$1900' '$3100' '$700' '$1400' '$2800'
 '$8000' '$6000' '$2400' '$12000' '$3800' '$2500' '$6200' '$10000' '$7000'
 '$1492' '$7400' '$1300' '$7200' '$2600' '$3300' '$5400' '$4500' '$2100'
 '$900' '$3600' '$2127' '$367' '$4400' '$3500' '$2900' '$3900' '$4100'
 '$4600' '$10800' '$2300' '$5600' '$1111' '$8200' '$5800' '$750' '$7500'
 '$1700' '$9000' '$6100' '$1020' '$4700' '$2021' '$5200' '$3389' '$4200'
 '$5' '$2001' '$1263' '$4637' '$3201' '$6600' '$3700' '$2990' '$5500'
 '$14000' '$2700' '$6400' '$350' '$8600' '$6300' '$250' '$3989' '$8917'
 '$9500' '$1246' '$6435' '$8800' '$2222' '$2746' '$10400' '$7600' '$6700'
 '$5100' '$13200' '$4300' '$1407' '$12400' '$5401' '$7800' '$1183' '$1203'
 '$13000' '$11600' '$14200' '$1809' '$8400' '$8700' '$11000' '$5201'
 '$1801' '$3499' '$5700' '$601' '$4008' '$50' '$2344' '$2811' 

In [11]:
# Interesting.  The dictionary worked for "," and "no value", but not for "$".
# In regular expressions "$" is a special character (it anchors the end of the string), so it must be escaped with a backslash
# A raw string (r"\$") keeps Python from treating the backslash as a string escape
# Let's try that again
replacement_dict = {r"\$" : "", "," : "", "no value" : 0}
jeopardy["value_float"] = jeopardy["value"].replace(replacement_dict, regex=True)
print(jeopardy.value_float.unique())

['200' '400' '600' '800' '2000' '1000' '1200' '1600' '3200' 0 '5000' '100'
 '300' '500' '1500' '4800' '1800' '1100' '2200' '3400' '3000' '4000'
 '6800' '1900' '3100' '700' '1400' '2800' '8000' '6000' '2400' '12000'
 '3800' '2500' '6200' '10000' '7000' '1492' '7400' '1300' '7200' '2600'
 '3300' '5400' '4500' '2100' '900' '3600' '2127' '367' '4400' '3500'
 '2900' '3900' '4100' '4600' '10800' '2300' '5600' '1111' '8200' '5800'
 '750' '7500' '1700' '9000' '6100' '1020' '4700' '2021' '5200' '3389'
 '4200' '5' '2001' '1263' '4637' '3201' '6600' '3700' '2990' '5500'
 '14000' '2700' '6400' '350' '8600' '6300' '250' '3989' '8917' '9500'
 '1246' '6435' '8800' '2222' '2746' '10400' '7600' '6700' '5100' '13200'
 '4300' '1407' '12400' '5401' '7800' '1183' '1203' '13000' '11600' '14200'
 '1809' '8400' '8700' '11000' '5201' '1801' '3499' '5700' '601' '4008'
 '50' '2344' '2811' '18000' '1777' '3599' '9800' '796' '3150' '20' '1810'
 '22' '9200' '1512' '8500' '585' '1534' '13800' '5001' '4238' '16400'
 

In [12]:
# Now that the characters have been removed, we can convert to float
jeopardy["value_float"] = jeopardy["value_float"].astype(float)

# Now make sure the rest of the DataFrame is unharmed by the change
print(jeopardy.head(10))

   show_number    air_date      round                         category value  \
0         4680  2004-12-31  Jeopardy!                          HISTORY  $200   
1         4680  2004-12-31  Jeopardy!  ESPN's TOP 10 ALL-TIME ATHLETES  $200   
2         4680  2004-12-31  Jeopardy!      EVERYBODY TALKS ABOUT IT...  $200   
3         4680  2004-12-31  Jeopardy!                 THE COMPANY LINE  $200   
4         4680  2004-12-31  Jeopardy!              EPITAPHS & TRIBUTES  $200   
5         4680  2004-12-31  Jeopardy!                   3-LETTER WORDS  $200   
6         4680  2004-12-31  Jeopardy!                          HISTORY  $400   
7         4680  2004-12-31  Jeopardy!  ESPN's TOP 10 ALL-TIME ATHLETES  $400   
8         4680  2004-12-31  Jeopardy!      EVERYBODY TALKS ABOUT IT...  $400   
9         4680  2004-12-31  Jeopardy!                 THE COMPANY LINE  $400   

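An alternative to the replacement dictionary, as a sketch: strip `$` and `,` with a single character class (inside `[...]`, `$` is a literal character, so no escaping is needed), then let `pd.to_numeric(..., errors="coerce")` turn anything left over, such as `"no value"`, into `NaN`. Keeping `NaN` rather than `0` is a design choice worth weighing: `.mean()` skips `NaN` by default, whereas zeros would pull topic averages down. The toy Series is an assumption standing in for the `value` column:

```python
import pandas as pd

values = pd.Series(["$200", "$2,000", "$2000", "no value"])

# Remove "$" and "," in one pass; inside [...] "$" is a literal character
cleaned = values.str.replace(r"[$,]", "", regex=True)

# Non-numeric leftovers ("no value") become NaN instead of raising ValueError
value_float = pd.to_numeric(cleaned, errors="coerce")
print(value_float.tolist())
# [200.0, 2000.0, 2000.0, nan]
print(value_float.mean())
# 1400.0
```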

In [13]:
# Let's filter the jeopardy DataFrame again so that filtered_set includes the new value_float column
schema = "|".join(r'\b{}'.format(word) for word in question_words)

filtered_set = jeopardy[jeopardy['question'].str.contains(schema, case=False, regex=True)]

# Let's test that everything still works
print(filtered_set.head(10))

      show_number    air_date             round  \
224          3673  2000-07-19  Double Jeopardy!   
2107         5981  2010-09-20         Jeopardy!   
2264         5630  2009-02-13         Jeopardy!   
2410         3214  1998-07-16         Jeopardy!   
2644         6125  2011-04-08         Jeopardy!   
3097         4487  2004-02-24  Double Jeopardy!   
5869         4506  2004-03-22         Jeopardy!   
5875         4506  2004-03-22         Jeopardy!   
5881         4506  2004-03-22         Jeopardy!   
5887         4506  2004-03-22         Jeopardy!   

                                     category  value  \
224                   INTERNATIONAL SPORTSMEN   $800   
2107                            COUNTRY MUSIC   $800   
2264                           CABLE CHANNELS   $200   
2410                   PRE-COLUMBIAN CULTURES   $500   
2644                      YES, THAT'S "WHITE"   $800   
3097                                   ROCK-Y  $2000   
5869  ROLLING STONE'S 100 GREATEST GUITARISTS 

In [14]:
# Now let's use filtered_set (which now has float values) to count the questions for each schema word and average their values
chess_values = filtered_set.loc[filtered_set["question"].str.contains("chess", case=False), "value_float"]
guitar_values = filtered_set.loc[filtered_set["question"].str.contains("guitar", case=False), "value_float"]
yucatan_values = filtered_set.loc[filtered_set["question"].str.contains("Yucatan", case=False), "value_float"]

print(len(chess_values))
print(len(guitar_values))
print(len(yucatan_values))

print(chess_values.mean())
print(guitar_values.mean())
print(yucatan_values.mean())

132
245
30
715.1515151515151
737.1428571428571
810.0
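Requirement 4 asks for the average value of questions containing a word such as `"King"`. One way to package that is a function that filters and averages in one step. A sketch, assuming a `value_float` column like the one built above; the name `average_value` and the toy rows are hypothetical:

```python
import re
import pandas as pd

def average_value(df, word, column="question"):
    """Mean of value_float over rows whose `column` contains `word` as a whole word (any case)."""
    pattern = r"\b{}\b".format(re.escape(word))
    matches = df[df[column].str.contains(pattern, case=False, regex=True, na=False)]
    return matches["value_float"].mean()

toy = pd.DataFrame({
    "question": ["This king ruled", "The King of Pop", "A viking ship", "Kingdom of fungi"],
    "value_float": [200.0, 600.0, 1000.0, 400.0],
})
print(average_value(toy, "King"))
# 400.0
```

Because it filters the DataFrame it is given directly, there is no need to build an intermediate `filtered_set` per topic.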


**5.** Write a function that returns the count of unique answers to all of the questions in a dataset. For example, after filtering the entire dataset to only questions containing the word `"King"`, we could then find all of the unique answers to those questions. The answer "Henry VIII" appeared 55 times and was the most common answer.

In [15]:
# I can modify what I just did to create a series of answers instead of values
chess_answers = filtered_set.loc[filtered_set["question"].str.contains("chess", case=False), "answer"]
guitar_answers = filtered_set.loc[filtered_set["question"].str.contains("guitar", case=False), "answer"]
yucatan_answers = filtered_set.loc[filtered_set["question"].str.contains("Yucatan", case=False), "answer"]

# Let's see the unique answers for chess
print(chess_answers.unique())

# Now let's see the answer frequency
chess_answer_freq = chess_answers.value_counts()
print(chess_answer_freq.head(5))

['Garry Kasparov' 'a white knight' 'Searching for Bobby Fischer'
 'The Seventh Seal' 'Rank and file' '64' 'Bobby Fischer' 'check' 'fork'
 'kingside' 'Ruy Lopez' 'the knight' 'the bishop' 'a stalemate' 'pinned'
 'the pawns' 'the king & queen' 'a gambit' 'Board games' 'Rook' 'pawn'
 'Lewis Carroll' 'checkmate & stalemate' 'an opening' 'castling'
 'a pawn shop' 'a computer (a supercomputer accepted)' 'Iran'
 'Phantom Of The Opera' 'Harry Potter' 'M*A*S*H' 'Pies' 'N' 'Check'
 'checkmate' 'pawns' 'grandmaster' 'white' 'Deep Blue' 'Castling'
 'Queen side rook' 'Boris Spassky' 'James Kraft' 'Grandmaster' 'a knight'
 'B (for Bishop)' 'Marcus Aurelius' '(Bobby) Fischer' 'a queen'
 'rooks, bishops & knights' 'a bishop' 'Ivan The Terrible' 'rook' 'black'
 'knight' 'diagonal' '16' 'Q' 'Humphrey Bogart' 'Camel' 'Computer'
 'Anatoly Karpov' 'Lauren Bacall' 'the Exchequer' 'Kasparov'
 'Fischer & Spassky' 'just plain B' 'Court stenographer/reporter'
 'King & pawn' 'a rook' 'Ivan the Terrible' 'things 

In [16]:
# It treats Grandmaster and grandmaster as different answers.
# Let's make the answer lists all lowercase and then try again.
chess_answers_lower = chess_answers.str.lower()
guitar_answers_lower = guitar_answers.str.lower()
yucatan_answers_lower = yucatan_answers.str.lower()

# Now let's see top five answer frequency for each list
chess_answer_freq = chess_answers_lower.value_counts()
guitar_answer_freq = guitar_answers_lower.value_counts()
yucatan_answer_freq = yucatan_answers_lower.value_counts()
print("The most common chess answers are:\n", chess_answer_freq.head(5))
print("The most common guitar answers are:\n", guitar_answer_freq.head(5))
print("The most common Yucatan answers are:\n", yucatan_answer_freq.head(5))

The most common chess answers are:
 answer
bobby fischer     11
garry kasparov     8
rook               6
grandmaster        6
check              5
Name: count, dtype: int64
The most common guitar answers are:
 answer
jimi hendrix          12
segovia                6
les paul               6
eric clapton           6
stevie ray vaughan     5
Name: count, dtype: int64
The most common Yucatan answers are:
 answer
mayans           3
caribbean sea    3
chichen itza     3
peninsulas       2
the mayans       2
Name: count, dtype: int64
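Requirement 5 asks for a function. A sketch that wraps the lowercase-and-count approach above and, as an extra assumption beyond the requirement, also strips a leading "the", "a", or "an" so that `"mayans"` and `"the mayans"` are counted as one answer. The name `count_unique_answers` and the toy rows are hypothetical:

```python
import pandas as pd

def count_unique_answers(df, column="answer"):
    """Frequency of each answer after lowercasing and dropping one leading article."""
    normalized = (
        df[column]
        .str.lower()
        .str.strip()
        # Remove a leading "the ", "a " or "an " so "the mayans" == "mayans"
        .str.replace(r"^(?:the|a|an)\s+", "", regex=True)
    )
    return normalized.value_counts()

toy = pd.DataFrame({"answer": ["Mayans", "the Mayans", "Chichen Itza", "mayans"]})
counts = count_unique_answers(toy)
print(counts["mayans"], counts["chichen itza"])
# 3 1
```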


**6.** Explore from here! This is an incredibly rich dataset, and there are so many interesting things to discover. There are a few columns that we haven't even started looking at yet. Here are some ideas on ways to continue working with this data:

 * Investigate the ways in which questions change over time by filtering by the date. How many questions from the 90s use the word `"Computer"` compared to questions from the 2000s?
 * Is there a connection between the round and the category? Are you more likely to find certain categories, like `"Literature"`, in Single Jeopardy or Double Jeopardy?
 * Build a system to quiz yourself. Grab random questions, and use the <a href="https://docs.python.org/3/library/functions.html#input">input</a> function to get a response from the user. Check to see if that response was right or wrong.
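For the first idea, one possible sketch: parse `air_date` with `pd.to_datetime`, bucket the years by decade, and count questions that contain `"computer"` as a whole word. The toy rows are invented for illustration; on the real data the same lines would run against `jeopardy`:

```python
import pandas as pd

toy = pd.DataFrame({
    "air_date": ["1994-05-02", "1998-11-20", "2003-03-14", "2007-06-01", "2009-09-09"],
    "question": ["This computer company", "A chess move", "Computer pioneer Alan Turing",
                 "His computer beat Kasparov", "Computers in space"],
})

# Decade = year rounded down to the nearest multiple of 10
decade = pd.to_datetime(toy["air_date"]).dt.year // 10 * 10

# Whole-word, case-insensitive match, so "Computers" is not counted
has_computer = toy["question"].str.contains(r"\bcomputer\b", case=False, regex=True)

print(has_computer.groupby(decade).sum().to_dict())
# {1990: 1, 2000: 2}
```

Since the number of shows per decade differs, `has_computer.groupby(decade).mean()` (the share of questions) may be a fairer comparison than raw counts.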

In [17]:
# Let's see how the round column is divided, and how many questions are from each round
rounds = jeopardy["round"].unique()
print(rounds)
rounds_freq = jeopardy["round"].value_counts()
print(rounds_freq)

# Let's see what the 10 most frequently occurring question categories are
top_categories = jeopardy["category"].value_counts().head(10)
print(top_categories)

# Answering questions by round is easier if we split the DataFrame into a separate DataFrame for each round
jeopardy_round_df = jeopardy[jeopardy["round"] == "Jeopardy!"]
double_jeopardy_df = jeopardy[jeopardy["round"] == "Double Jeopardy!"]
final_jeopardy_df = jeopardy[jeopardy["round"] == "Final Jeopardy!"]
tiebreak_round_df = jeopardy[jeopardy["round"] == "Tiebreaker"]

# Let's see what the 10 most frequent question categories are by round (excluding tiebreak)
jeopardy_top_categories = jeopardy_round_df["category"].value_counts().head(10)
dbl_jeopardy_top_categories = double_jeopardy_df["category"].value_counts().head(10)
fin_jeopardy_top_categories = final_jeopardy_df["category"].value_counts().head(10)
print("The Top 10 most frequent question categories in the Jeopardy! round are:\n", jeopardy_top_categories)
print("The Top 10 most frequent question categories in the Double Jeopardy! round are:\n", dbl_jeopardy_top_categories)
print("The Top 10 most frequent question categories in the Final Jeopardy! round are:\n", fin_jeopardy_top_categories)

# Let's take a look at those 3 instances of tiebreak
tiebreak_details = tiebreak_round_df[["air_date", "category", "question", "answer", "value_float"]]
print(tiebreak_details)

# What is the most common Final Jeopardy! answer?
fin_jeopardy_top_answers = final_jeopardy_df["answer"].value_counts().head(10)
print("The top 10 most frequent Final Jeopardy! answers are:\n", fin_jeopardy_top_answers)

# What are the highest and lowest Final Jeopardy! question values?
fin_jeopardy_max = final_jeopardy_df["value_float"].max()
fin_jeopardy_min = final_jeopardy_df["value_float"].min()
print("The highest Final Jeopardy! question value was ${:.0f}, and the lowest was ${:.0f}".format(fin_jeopardy_max, fin_jeopardy_min))

['Jeopardy!' 'Double Jeopardy!' 'Final Jeopardy!' 'Tiebreaker']
round
Jeopardy!           107384
Double Jeopardy!    105912
Final Jeopardy!       3631
Tiebreaker               3
Name: count, dtype: int64
category
BEFORE & AFTER             547
SCIENCE                    519
LITERATURE                 496
AMERICAN HISTORY           418
POTPOURRI                  401
WORLD HISTORY              377
WORD ORIGINS               371
COLLEGES & UNIVERSITIES    351
HISTORY                    349
SPORTS                     342
Name: count, dtype: int64
The Top 10 most frequent question categories in the Jeopardy! round are:
 category
STUPID ANSWERS         255
POTPOURRI              255
SPORTS                 253
ANIMALS                233
AMERICAN HISTORY       227
SCIENCE                217
STATE CAPITALS         210
TELEVISION             200
U.S. CITIES            195
BUSINESS & INDUSTRY    185
Name: count, dtype: int64
The Top 10 most frequent question categories in the Double Jeopardy! rou
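
For the round-versus-category idea, `pd.crosstab` builds a category-by-round count table in one call, so a comparison such as "LITERATURE in Jeopardy! vs. Double Jeopardy!" becomes a single lookup. A sketch on invented toy rows:

```python
import pandas as pd

toy = pd.DataFrame({
    "round": ["Jeopardy!", "Double Jeopardy!", "Double Jeopardy!", "Jeopardy!", "Double Jeopardy!"],
    "category": ["LITERATURE", "LITERATURE", "LITERATURE", "SPORTS", "SPORTS"],
})

# Rows = categories, columns = rounds, cells = number of clues
table = pd.crosstab(toy["category"], toy["round"])
print(table.loc["LITERATURE"].to_dict())
# {'Double Jeopardy!': 2, 'Jeopardy!': 1}
```

Passing `normalize="columns"` to `pd.crosstab` would turn the counts into each round's share, which compensates for rounds with different numbers of clues.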

## Solution

**7.** Compare your program to our <a href="https://content.codecademy.com/PRO/independent-practice-projects/jeopardy/jeopardy_solution.zip">sample solution code</a>. Remember that your program might look different from ours (and probably will), and that's okay!

**8.** Great work! Visit <a href="https://discuss.codecademy.com/t/this-is-jeopardy-challenge-project-python-pandas/462365">our forums</a> to compare your project to our sample solution code. You can also learn how to host your own solution on GitHub so you can share it with other learners! Your solution might look different from ours, and that's okay! There are multiple ways to solve these projects, and you'll learn more by seeing others' code.

### Working Game Model (first completed version)

In [36]:
# I will try to make an interactive program for asking and answering Jeopardy questions here
balance = 0
question_counter = 0
end_game = False
while not end_game:
    start = input("Would you like to try to answer a question?  Please type 'yes' or 'no':  ")
    if start.lower() == "no":
        break
    elif start.lower() == "yes":
        while question_counter < 7:
            question_counter += 1
            question = jeopardy_round_df.sample()
            print("The Category is:  ", question["category"].iloc[0])
            print("For ", question["value"].iloc[0])
            print("The question is:  ", question["question"].iloc[0])
            response = input("Please type your answer here.  Good luck!  ")
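            # Compare case-insensitively; reject blank input, since "" is "in" every answer
            response = response.strip().lower()
            answer = question["answer"].iloc[0].lower()
            if response and (response in answer or answer in response):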
                balance += question["value_float"].iloc[0]
                print(f"Correct! Here is your updated balance:  ${balance:.0f}")
            else:
                balance -= question["value_float"].iloc[0]
                print(f"I'm sorry.  That is not correct.  The correct answer is {question['answer'].iloc[0]}")
                print(f"Your updated balance is:  ${balance:.0f}")
            proceed = input("Would you like another question?  Please type 'yes' or 'no':  ")
            if proceed.lower() == "no":
                end_game = True
                break
            else:
                print("Here is your next question!")
                continue
        # Only advance to the next round if the player did not choose to stop
        if question_counter == 7 and not end_game:
            if balance <= 0:
                print(f"Thank you for playing Jeopardy!  Your game is over with a final balance of ${balance:.0f}")
                print("Better luck next time!")
                end_game = True
                break
            else:
                print("\n\n\n")
                print("CONGRATULATIONS!  YOU HAVE ADVANCED TO DOUBLE JEOPARDY!")
                print("\n\n\n")
        while question_counter < 14 and not end_game:
            question_counter +=1
            question = double_jeopardy_df.sample()
            print("The Category is:  ", question["category"].iloc[0])
            print("For ", question["value"].iloc[0])
            print("The question is:  ", question["question"].iloc[0])
            response = input("Please type your answer here.  Good luck!  ")
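            # Compare case-insensitively; reject blank input, since "" is "in" every answer
            response = response.strip().lower()
            answer = question["answer"].iloc[0].lower()
            if response and (response in answer or answer in response):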
                balance += question["value_float"].iloc[0]
                print(f"Correct! Here is your updated balance:  ${balance:.0f}")
            else:
                balance -= question["value_float"].iloc[0]
                print(f"I'm sorry.  That is not correct.  The correct answer is {question['answer'].iloc[0]}")
                print(f"Your updated balance is:  ${balance:.0f}")
            proceed = input("Would you like another question?  Please type 'yes' or 'no':  ")
            if proceed.lower() == "no":
                end_game = True
                break
            else:
                print("Here is your next question!")
                continue
        # Only advance to Final Jeopardy! if the player did not choose to stop
        if question_counter == 14 and not end_game:
            if balance <= 0:
                print(f"Thank you for playing Jeopardy!  Your game is over with a final balance of ${balance:.0f}")
                print("Better luck next time!")
                end_game = True
                break
            else:
                print("\n\n\n")
                print("CONGRATULATIONS!  YOU HAVE ADVANCED TO FINAL JEOPARDY!")
                print("\n\n\n")
                print("This will be the final question.")
                question = final_jeopardy_df.sample()
                print("The Category is:  ", question["category"].iloc[0])
                while True:
                    wager = input(f"Please type your wager for this category.  It cannot exceed your balance of ${balance:.0f}.  Type wager here:  ")
                    if not wager.isdigit():
                        print("Please enter a valid number.")
                        continue
                    wager = int(wager)
                    if wager > balance:
                        print(f"Your wager must be no more than your balance of ${balance:.0f}")
                        continue
                    break
                print(f"Your wager of ${wager} has been accepted.")
                print("Here is the Final Jeopardy! question: ", question["question"].iloc[0])
                response = input("Please type your answer here.  Good luck!  ")
                answer = question["answer"].iloc[0].lower()
                guess = response.strip().lower()
                # Require a non-empty guess: "" is a substring of every answer
                if guess and (guess in answer or answer in guess):
                    balance += wager
                    print("Correct!  You are our Final Jeopardy winner!")
                    print(f"Congratulations!  Your final winning balance is ${balance:.0f}")
                    print("We will see you again tomorrow!")
                    end_game = True
                    break
                else:
                    balance -= wager
                    print(f"I'm sorry.  The correct answer is {question['answer'].iloc[0]}")
                    print(f"Your final balance is ${balance:.0f}")
                    end_game = True
                    break
    else:
        print("You must enter 'yes' or 'no'.  No other input can be accepted")
# This code works! Tested multiple times. Next, I will try a DRYer version that uses a function.

Would you like to try to answer a question?  Please type 'yes' or 'no':  yes
The Category is:   SAY IT IN...
For  $600
The question is:   (Sarah of the Clue Crew in Amsterdam)  On the canal cruise, one Dutch word will be "zwemvest", which is this item
Please type your answer here.  Good luck!  lifevest
I'm sorry.  That is not correct.  The correct answer is life vest
Your updated balance is:  $-600
Would you like another question?  Please type 'yes' or 'no':  no
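
The wager loop in the cell above accepts only whole, non-negative numbers that do not exceed the current balance. One way to check that rule without typing into `input()` is to move it into a pure function. The sketch below uses a hypothetical helper, `parse_wager`, which is not part of the original notebook. It returns `(wager, None)` on success or `(None, message)` on failure, and it assumes `balance` may be a float because it accumulates `value_float` amounts.

```python
def parse_wager(text, balance):
    """Validate a Final Jeopardy! wager typed by the player (hypothetical helper).

    Same rules as the notebook's wager loop: a whole, non-negative number
    that does not exceed the current balance. Unlike the loop, it also
    strips surrounding whitespace first.
    Returns (wager, None) on success or (None, error_message) on failure.
    """
    text = text.strip()
    # isdecimal() rather than isdigit(): characters such as "²" pass isdigit()
    # but make int() raise ValueError. Both reject "", "-5" and "12.5".
    if not text.isdecimal():
        return None, "Please enter a valid number."
    wager = int(text)
    if wager > balance:
        return None, f"Your wager must be no more than your balance of ${balance:.0f}"
    return wager, None


print(parse_wager("500", 1200.0))   # (500, None)
print(parse_wager("1500", 1200.0))  # (None, 'Your wager must be no more than your balance of $1200')
print(parse_wager("ten", 1200.0))   # (None, 'Please enter a valid number.')
```

The interactive loop could then call `parse_wager(input(...), balance)` repeatedly until the first element is not `None`.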


### A function to reduce repetition in the next code block

In [43]:
# Helper function that replaces the question-asking code repeated in every round
def play_jeopardy(dataframe):
    global question_counter, balance
    question_counter += 1
    question = dataframe.sample()
    print("The Category is:  ", question["category"].iloc[0])
    print("For ", question["value"].iloc[0])
    print("The question is:  ", question["question"].iloc[0])
    response = input("Please type your answer here.  Good luck!  ")
    # Accept the response if it appears in the answer OR the answer appears in the response,
    # so a last name alone and a full name both get credit. An empty response is rejected,
    # because "" is a substring of every string.
    answer = question["answer"].iloc[0].lower()
    guess = response.strip().lower()
    if guess and (guess in answer or answer in guess):
        balance += question["value_float"].iloc[0]
        print(f"Correct! Here is your updated balance:  ${balance:.0f}")
    else:
        balance -= question["value_float"].iloc[0]
        print(f"I'm sorry.  That is not correct.  The correct answer is {question['answer'].iloc[0]}")
        print(f"Your updated balance is:  ${balance:.0f}")
    proceed = input("Would you like another question?  Please type 'yes' or 'no':  ")
    return proceed
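
The comments in `play_jeopardy` describe a two-way substring test: a response counts as correct if it appears inside the official answer, or the official answer appears inside the response. Pulling that test into its own function makes it easy to check against a few cases. The sketch below uses a hypothetical helper name, `is_correct`. It also trims whitespace and rejects an empty response: `"" in answer` is always `True` in Python, so without that check an empty guess would earn credit.

```python
def is_correct(response, answer):
    """Two-way, case-insensitive substring match (hypothetical helper).

    A guess counts as correct when it appears inside the official answer
    (e.g. a last name only) or the official answer appears inside the guess.
    """
    guess = response.strip().lower()
    official = answer.strip().lower()
    # An empty string is a substring of everything, so reject it explicitly
    if not guess:
        return False
    return guess in official or official in guess


print(is_correct("Copenhagen", "Copenhagen"))   # True
print(is_correct("Lincoln", "Abraham Lincoln"))  # True
print(is_correct("lifevest", "life vest"))      # False
print(is_correct("", "Copenhagen"))             # False
```

The test is deliberately lenient, which also means a very short guess such as a single letter can match many answers. A stricter variant could compare whole words instead, at the cost of rejecting some reasonable partial answers.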


### A working game loop with more concise code
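
The next cell relies on a flag variable, `end_game`, because `break` only leaves the innermost loop. The sketch below models just that control flow without `input()`: the hypothetical `play_rounds` function takes the player's replies to "Would you like another question?" as a list and reports how many questions were asked and whether the player quit. Round sizes of 7 and 7 mirror the `question_counter` thresholds of 7 and 14 used in the game; everything else is an assumption for illustration.

```python
def play_rounds(replies, round_sizes=(7, 7)):
    """Non-interactive model of the game's round loop (hypothetical helper).

    replies: the player's answers to "Would you like another question?",
    consumed in order; a missing reply is treated as "no".
    Returns (questions_asked, quit_early).
    """
    reply_iter = iter(replies)
    question_counter = 0
    end_game = False  # flag: lets the inner loop stop the outer loop
    limit = 0
    for size in round_sizes:  # one pass per round (Jeopardy!, Double Jeopardy!)
        limit += size
        while question_counter < limit:
            question_counter += 1  # a question would be asked here
            if next(reply_iter, "no").lower() == "no":
                end_game = True  # remember that the player quit...
                break            # ...but break only exits this inner loop
        if end_game:             # so the outer loop checks the flag itself
            break
    return question_counter, end_game


print(play_rounds(["yes", "no"]))         # (2, True)
print(play_rounds(["yes"] * 6 + ["no"]))  # (7, True): quitting on question 7 skips Double Jeopardy!
print(play_rounds(["yes"] * 14))          # (14, False): ready for Final Jeopardy!
```

A common alternative is to wrap the whole game in a function and use `return` to leave every loop at once, which removes the need for a flag.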

In [44]:
# Integrate play_jeopardy() into the game loop for DRYer code
# Initialize the starting values
balance = 0
question_counter = 0

# end_game is a flag variable. break only exits the innermost loop, so the inner
# loops set end_game = True to tell the outer loop to stop as well.
end_game = False
while not end_game:
    start = input("Would you like to try to answer a question?  Please type 'yes' or 'no':  ")
    if start.lower() == "no":
        break
    elif start.lower() == "yes":
          
        while question_counter < 7:
            proceed = play_jeopardy(jeopardy_round_df)
            if proceed.lower() == "no":
                # Setting end_game = True tells the outer loop to stop
                end_game = True
                # break exits this inner loop; once the rest of the outer loop body runs,
                # the outer while sees that end_game is True and stops
                break
            else:
                print("Here is your next question!")
                continue
        # Only move on to Double Jeopardy! if the player has not already quit
        if question_counter == 7 and not end_game:
            if balance <= 0:
                print(f"Thank you for playing Jeopardy!  Your game is over with a final balance of ${balance:.0f}")
                print("Better luck next time!")
                end_game = True
                break
            else:
                print("\n\n\n")
                print("CONGRATULATIONS!  YOU HAVE ADVANCED TO DOUBLE JEOPARDY!")
                print("\n\n\n")
        
        while question_counter < 14 and not end_game:
            proceed = play_jeopardy(double_jeopardy_df)
            if proceed.lower() == "no":
                end_game = True
                break
            else:
                print("Here is your next question!")
                continue
        # Only move on to Final Jeopardy! if the player has not already quit
        if question_counter == 14 and not end_game:
            if balance <= 0:
                print(f"Thank you for playing Jeopardy!  Your game is over with a final balance of ${balance:.0f}")
                print("Better luck next time!")
                end_game = True
                break
            else:
                print("\n\n\n")
                print("CONGRATULATIONS!  YOU HAVE ADVANCED TO FINAL JEOPARDY!")
                print("\n\n\n")
                print("This will be the final question.")
                question = final_jeopardy_df.sample()
                print("The Category is:  ", question["category"].iloc[0])
                
                # The while loop below collects a valid wager from the player
                while True:
                    wager = input(f"Please type your wager for this category.  It cannot exceed your balance of ${balance:.0f}.  Type wager here:  ")
                    if not wager.isdigit():
                        print("Please enter a valid number.")
                        continue
                    wager = int(wager)
                    if wager > balance:
                        print(f"Your wager must be no more than your balance of ${balance:.0f}")
                        continue
                    break
                print(f"Your wager of ${wager} has been accepted.")
                
                print("Here is the Final Jeopardy! question: ", question["question"].iloc[0])
                response = input("Please type your answer here.  Good luck!  ")
                answer = question["answer"].iloc[0].lower()
                guess = response.strip().lower()
                # Require a non-empty guess: "" is a substring of every answer
                if guess and (guess in answer or answer in guess):
                    balance += wager
                    print("Correct!  You are our Final Jeopardy winner!")
                    print(f"Congratulations!  Your final winning balance is ${balance:.0f}")
                    print("We will see you again tomorrow!")
                    end_game = True
                    break
                else:
                    balance -= wager
                    print(f"I'm sorry.  The correct answer is {question['answer'].iloc[0]}")
                    print(f"Your final balance is ${balance:.0f}")
                    end_game = True
                    break
    else:
        print("You must enter 'yes' or 'no'.  No other input can be accepted")
  

Would you like to try to answer a question?  Please type 'yes' or 'no':  yes
The Category is:   AIRPORTS
For  $1000
The question is:   At this European city's airport, you can have a sandwich & of course a coffee at the Cafe Karen Blixen
Please type your answer here.  Good luck!  Hamburg
I'm sorry.  That is not correct.  The correct answer is Copenhagen
Your updated balance is:  $-1000
Would you like another question?  Please type 'yes' or 'no':  yes
Here is your next question!
The Category is:   INTELLIGENT FILMS
For  $100
The question is:   A 7-year-old chess prodigy is at the center of this 1993 film that mentions an American chess prodigy in its title
Please type your answer here.  Good luck!  chasing Bobby Fischer
I'm sorry.  That is not correct.  The correct answer is Searching for Bobby Fischer
Your updated balance is:  $-1100
Would you like another question?  Please type 'yes' or 'no':  yes
Here is your next question!
The Category is:   WEATHER
For  $400
The question is:   "Sea