# This is Jeopardy!

#### Overview

Unlike a typical tutorial, this project presents a series of flexible requirements rather than a step-by-step guide. It encourages creative problem-solving and resourcefulness, drawing on online resources such as Codecademy and GitHub to work through challenges encountered during development.

#### Project Goals

The goal of this project is to write several functions that investigate a dataset of _Jeopardy!_ questions and answers: filter the dataset for topics you're interested in, compute the average difficulty of those questions, and train to become the next _Jeopardy!_ champion!

1. We've provided a CSV file containing data about the game show _Jeopardy!_ in a file named `jeopardy.csv`. Load the data into a DataFrame and investigate its contents. Try printing out specific columns.

   Note that in order to make this project as "real-world" as possible, we haven't modified the data at all: we're giving it to you exactly as we found it. As a result, this data isn't as "clean" as the datasets you normally find on Codecademy. More specifically, there's something odd about the column names. Once you figure out the problem with the column names, you may want to rename them to make your life easier for the rest of the project.
   
   In order to display the full contents of a column, we've added this line of code for you:
   
   ```py
   pd.set_option('display.max_colwidth', None)
   ```

In [1]:
import pandas as pd
import random

pd.set_option('display.max_colwidth', None)
jeopardy = pd.read_csv('jeopardy.csv')
print(jeopardy.head())
# Most raw column names start with a space (e.g. ' Question'); strip it from every name.
jeopardy.columns = jeopardy.columns.str.strip()



   Show Number    Air Date      Round                         Category  Value  \
0         4680  2004-12-31  Jeopardy!                          HISTORY   $200   
1         4680  2004-12-31  Jeopardy!  ESPN's TOP 10 ALL-TIME ATHLETES   $200   
2         4680  2004-12-31  Jeopardy!      EVERYBODY TALKS ABOUT IT...   $200   
3         4680  2004-12-31  Jeopardy!                 THE COMPANY LINE   $200   
4         4680  2004-12-31  Jeopardy!              EPITAPHS & TRIBUTES   $200   

                                                                                                      Question  \
0             For the last 8 years of his life, Galileo was under house arrest for espousing this man's theory   
1  No. 2: 1912 Olympian; football star at Carlisle Indian School; 6 MLB seasons with the Reds, Giants & Braves   
2                     The city of Yuma in this state has a record average of 4,055 hours of sunshine each year   
3                         In 1963, live on "The Art Linkl
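
To see the column-name problem in isolation, here is a minimal sketch that loads a one-row CSV string (a hypothetical sample modeled on the first row shown above) and strips the stray spaces with `Index.str.strip()`. It assumes the raw header has a space after each comma, as the `" Question"` column name used in the task text suggests.

```python
import io

import pandas as pd

# Hypothetical one-row stand-in for jeopardy.csv: note the space after each comma in the header.
SAMPLE_CSV = """Show Number, Air Date, Round, Category, Value, Question, Answer
4680,2004-12-31,Jeopardy!,HISTORY,$200,"For the last 8 years of his life, Galileo was under house arrest for espousing this man's theory",Copernicus
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(list(df.columns))  # the leading spaces reveal the problem

# Strip surrounding whitespace from every column name at once.
df.columns = df.columns.str.strip()
print(list(df.columns))
```

Stripping all names in one step avoids listing each column by hand, and it also leaves names that were already clean unchanged.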

2. Write a function that filters the dataset for questions that contain all of the words in a list of words. For example, when the list `["King", "England"]` was passed to our function, the function returned a DataFrame of 49 rows. Every row had the strings `"King"` and `"England"` somewhere in its `" Question"`.

   Test your function by printing out the column containing the question of each row of the dataset.

In [2]:
# Return the rows whose question contains every word in `words` (case-insensitive substring match).
def data_filter(data, words):
    contains_all = lambda question: all(word.lower() in question.lower() for word in words)
    return data.loc[data['Question'].apply(contains_all)]

print(data_filter(jeopardy, ['trousers']))

        Show Number    Air Date             Round  \
3248           5084  2006-10-19         Jeopardy!   
3710           4398  2003-10-22         Jeopardy!   
8124           4724  2005-03-03         Jeopardy!   
14091          5110  2006-11-24         Jeopardy!   
16369          4344  2003-06-19         Jeopardy!   
27304          5887  2010-03-30  Double Jeopardy!   
29997          3724  2000-11-09  Double Jeopardy!   
32174          5568  2008-11-19         Jeopardy!   
41203          1886  1992-11-16  Double Jeopardy!   
50224          1285  1990-03-16         Jeopardy!   
55863          3286  1998-12-14         Jeopardy!   
58561           446  1986-05-26  Double Jeopardy!   
68279          4585  2004-07-09         Jeopardy!   
74954          6283  2012-01-04  Double Jeopardy!   
96889          3667  2000-07-11  Double Jeopardy!   
100566         5414  2008-03-06  Double Jeopardy!   
108327         6083  2011-02-09         Jeopardy!   
112736         3638  2000-05-31   Final Jeopar
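
The `data_filter` above uses a substring test, so `"King"` also matches words such as "Viking" or "making". A minimal sketch of a whole-word variant, assuming regular-expression word boundaries are an acceptable definition of "word"; `data_filter_words` is a hypothetical name and the sample questions are toy data:

```python
import re

import pandas as pd


def data_filter_words(data, words):
    """Hypothetical variant of data_filter: match whole words, ignoring case."""
    # \b marks a word boundary, so 'king' matches "King" but not "Viking" or "Making".
    patterns = [re.compile(r'\b' + re.escape(word) + r'\b', re.IGNORECASE) for word in words]
    mask = data['Question'].apply(lambda q: all(p.search(q) for p in patterns))
    return data.loc[mask]


# Toy questions (hypothetical), not rows from jeopardy.csv.
sample = pd.DataFrame({'Question': [
    'This King of England had six wives',
    'A Viking ship sailed from England',
    'Making a king cake',
]})

print(data_filter_words(sample, ['King', 'England'])['Question'].tolist())
```

With the substring version, the second toy question would also match because "Viking" contains "king"; the word-boundary version keeps only the first.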

4. We may eventually want to compute aggregate statistics, like `.mean()` on the `" Value"` column. But right now, the values in that column are strings. Convert the `" Value"` column to floats. If you'd like to, you can create a new column with float values.

   While most of the values in the `" Value"` column represent a dollar amount as a string, note that some do not; these values will need to be handled differently!

   Now that you can filter the dataset of questions, use your new column that contains the float values of each question to find the "difficulty" of certain topics. For example, what is the average value of questions that contain the word `"King"`?
   
   Make sure to use the dataset that contains the float values as the dataset you use in your filtering function.

In [3]:
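# Create a new column 'float_value': strip '$' and ',' (e.g. '$2,000' -> '2000') and convert to float.
# Entries without a dollar amount (such as 'None', which read_csv loads as NaN) become NaN,
# so .mean() skips them instead of counting them as 0.
jeopardy['float_value'] = pd.to_numeric(jeopardy['Value'].str.replace(r'[$,]', '', regex=True), errors='coerce')

# Average value across the whole dataset
print(jeopardy['float_value'].mean())

# Average value of questions containing 'king' (about 771, above the overall average)
filtered = data_filter(jeopardy, ['king'])
print(filtered['float_value'].mean())
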
739.9884755451067
771.8833850722094
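
Because some `Value` entries are not dollar amounts, whether they are skipped or counted as 0 changes the average. A small sketch on a hypothetical toy Series, using `pd.to_numeric(..., errors='coerce')` so that anything unparseable becomes NaN:

```python
import pandas as pd

# Toy 'Value' column (hypothetical): dollar strings plus one missing entry, the way
# read_csv can load a 'None' cell as NaN.
values = pd.Series(['$200', '$1,000', None, '$400'])

# Remove '$' and ',' then convert; anything that cannot be parsed becomes NaN.
float_values = pd.to_numeric(values.str.replace(r'[$,]', '', regex=True), errors='coerce')

print(float_values.mean())            # NaN skipped: (200 + 1000 + 400) / 3
print(float_values.fillna(0).mean())  # NaN counted as 0: (200 + 1000 + 0 + 400) / 4
```

Skipping missing values measures the average of clues that actually carry a dollar amount, which is closer to the "difficulty" idea in the task.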


5. Write a function that returns the count of unique answers to all of the questions in a dataset. For example, after filtering the entire dataset to only questions containing the word `"King"`, we could then find all of the unique answers to those questions. The answer "Henry VIII" appeared 55 times and was the most common answer.

In [8]:
# Count how often each unique answer appears among the questions that contain all of `words`.
def unique_answers(data, words):
    filtered = data_filter(data, words)
    return filtered['Answer'].value_counts()

print(unique_answers(jeopardy, ['King']))



Answer
Henry VIII                   55
Solomon                      35
Richard III                  33
Louis XIV                    31
David                        30
                             ..
cardiac (in card I acted)     1
Henderson                     1
Computer                      1
Indians                       1
work                          1
Name: count, Length: 5268, dtype: int64
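
The `Length: 5268` line at the bottom of the output above is the number of distinct (non-missing) answers. A minimal sketch on hypothetical toy answers contrasting `value_counts()`, which counts each distinct answer, with `nunique()`, which counts how many distinct answers there are:

```python
import pandas as pd

# Toy answers (hypothetical) standing in for a filtered 'Answer' column.
answers = pd.Series(['Henry VIII', 'Solomon', 'Henry VIII', 'David', 'Solomon', 'Henry VIII'])

counts = answers.value_counts()  # occurrences of each distinct answer, most common first
print(counts.head(1))            # the most common answer and its count
print(answers.nunique())         # number of distinct answers
```

Both methods drop missing values by default, so `len(counts)` equals `answers.nunique()`.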


6. Explore from here! This is an incredibly rich dataset, and there are so many interesting things to discover. There are a few columns that we haven't even started looking at yet. Here are some ideas on ways to continue working with this data:

 * Investigate the ways in which questions change over time by filtering by the date. How many questions from the 90s use the word `"Computer"` compared to questions from the 2000s?
 * Is there a connection between the round and the category? Are you more likely to find certain categories, like `"Literature"` in Single Jeopardy or Double Jeopardy?
 * Build a system to quiz yourself. Grab random questions, and use the <a href="https://docs.python.org/3/library/functions.html#input">input</a> function to get a response from the user. Check to see if that response was right or wrong.

1. Investigate the ways in which questions change over time by filtering by the date. How many questions from the 90s use the word "Computer" compared to questions from the 2000s?

In [24]:
# Extract the year (the first four characters of 'Air Date', e.g. '2004-12-31') as an integer.
jeopardy['Year'] = jeopardy['Air Date'].str[:4].astype(int)

# Count the questions containing all of `words` in two time windows.
# Note: as written, the "1990s" window is 1991-2000 and the "2000s" window is 2001 onward.
def investigate_word(data, words):
    filtered = data_filter(data, words)
    questions_90s = filtered['Year'].between(1991, 2000).sum()
    questions_2000s = (filtered['Year'] > 2000).sum()
    return f'Questions with the word {words} in 1990s: {questions_90s}; Questions with the word {words} in 2000s: {questions_2000s}'

print(investigate_word(jeopardy, ['Jeopardy']))


Questions with the word ['Jeopardy'] in 1990s: 58; Questions with the word ['Jeopardy'] in 2000s: 139
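
The cell above counts 1991-2000 as the 1990s and every year after 2000 as the 2000s, and the air dates shown earlier run into 2010-2012. A sketch that counts calendar decades (1990-1999, 2000-2009) instead; `count_by_decade` is a hypothetical helper that assumes an integer `Year` column like the one built above, and the rows below are toy data. On the full dataset its counts may differ from the 58 and 139 shown.

```python
import pandas as pd


def count_by_decade(data, words, decades=(1990, 2000)):
    """Hypothetical helper: count matching questions per calendar decade (e.g. 1990-1999)."""
    # Case-insensitive substring test, mirroring data_filter above.
    mask = data['Question'].apply(lambda q: all(w.lower() in q.lower() for w in words))
    years = data.loc[mask, 'Year']
    # between() is inclusive on both ends, so d..d+9 covers exactly one decade.
    return {f'{d}s': int(years.between(d, d + 9).sum()) for d in decades}


# Toy rows (hypothetical), not taken from jeopardy.csv.
sample = pd.DataFrame({
    'Question': ['A computer game', 'Computer chess', 'An old computer', 'A horse'],
    'Year': [1990, 1999, 2000, 2000],
})
print(count_by_decade(sample, ['Computer']))
```

Here 1990 and 1999 land in the 1990s and 2000 lands in the 2000s, whereas the 1991-2000 window would have counted 1999 and 2000 together.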


2. Is there a connection between the round and the category? Are you more likely to find certain categories, like "Literature" in Single Jeopardy or Double Jeopardy?

In [25]:
# Compare how often questions containing the word 'Literature' appear in each round.
# Note: data_filter searches the question text, not the Category column.
filtered_df = data_filter(jeopardy, ['Literature'])
double_jeopardy = (filtered_df['Round'] == 'Double Jeopardy!').sum()
single_jeopardy = (filtered_df['Round'] == 'Jeopardy!').sum()
final_jeopardy = (filtered_df['Round'] == 'Final Jeopardy!').sum()
tiebreaker = (filtered_df['Round'] == 'Tiebreaker').sum()
# Relative difference between Double Jeopardy! and Jeopardy!, in percent
pct_more = (double_jeopardy - single_jeopardy) / single_jeopardy * 100
print(f'Questions with the word "Literature" appear {pct_more:.1f}% more often in "Double Jeopardy!" than in "Jeopardy!".'
      f'\nThe ratio of Jeopardy! to Double Jeopardy! to Final Jeopardy! to Tiebreaker is {single_jeopardy} : {double_jeopardy} : {final_jeopardy} : {tiebreaker} for the word "Literature".')


Questions with the word "Literature" appear 60.0% more often in "Double Jeopardy!" than in "Jeopardy!".
The ratio of Jeopardy! to Double Jeopardy! to Final Jeopardy! to Tiebreaker is 55 : 88 : 8 : 0 for the word "Literature".
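
The cell above searches the question text for "literature", while the task asks about the category. A sketch that matches on the `Category` column instead and uses `pd.crosstab` with `normalize='columns'` to compare the share of literature-related categories within each round; the rows are hypothetical toy data with the same column names as the dataset:

```python
import pandas as pd

# Toy rows (hypothetical) with the same Round/Category columns as the dataset.
sample = pd.DataFrame({
    'Round': ['Jeopardy!', 'Double Jeopardy!', 'Double Jeopardy!', 'Jeopardy!', 'Final Jeopardy!'],
    'Category': ['LITERATURE', 'LITERATURE', 'AMERICAN LITERATURE', 'HISTORY', 'LITERATURE'],
})

# Match on the Category column rather than the question text.
is_literature = sample['Category'].str.contains('LITERATURE', case=False)

# Rows: literature-related category or not. Columns: round.
# normalize='columns' turns each round's column into shares that sum to 1.
table = pd.crosstab(is_literature, sample['Round'], normalize='columns')
print(table)
```

Comparing shares rather than raw counts keeps rounds with very different numbers of clues, such as Final Jeopardy!, on the same scale.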


3. Build a system to quiz yourself. Grab random questions, and use the input function to get a response from the user. Check to see if that response was right or wrong.

In [14]:
# A small quiz game: ask random questions, check the responses, and keep score.
def quiz(data):
    # randrange(n) returns 0..n-1, so it always picks a valid row position.
    number = random.randrange(len(data))
    row = data.iloc[number]
    print(row['Question'])
    answer = input('Your response: ')
    # Ignore differences in capitalization and surrounding whitespace.
    if answer.strip().lower() == str(row['Answer']).strip().lower():
        # Clues without a dollar value (NaN in float_value) are worth 0 points.
        point = 0 if pd.isna(row['float_value']) else row['float_value']
        print("You are right!\nBravo! :)")
        print(f'You received: {point}')
    else:
        print("Wrong... :(")
        print(f'The right answer is: {row["Answer"]}')
        point = 0
    print('|')
    return point

def game(data, rounds=5):
    score = 0
    for _ in range(rounds):
        score += quiz(data)
    print(f'Thanks for playing! Your score is {score}')

game(jeopardy)

The 2 states' largest cities are Fargo & Sioux Falls, not these, their capitals
Your response: fa
Wrong... :(
The right answer is: Bismarck & Pierre
|
Spain's National Organization for the Blind maintains the touchy-feely Museo Tiflologico in this city
Your response: Madrid
You are right!
Bravo! :)
You received: 400.0
|
"Write if you get work" was Ray Goulding's catchphrase as half of this duo
Your response: Slade
Wrong... :(
The right answer is: Bob and Ray
|
A keratotomy is an incision of this eye part; a keratectomy removes part of it
Your response: Cornea
Wrong... :(
The right answer is: the cornea
|
In 1889 this Elizabethport, New Jersey company electrified the sewing machine
Your response: Car
Wrong... :(
The right answer is: Singer
|
Thanks for playing! Your score is 400.0
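
In the session above, "Cornea" is marked wrong against "the cornea". A minimal sketch of a more forgiving check; `normalize_answer` and `is_correct` are hypothetical helpers that lowercase the text, drop punctuation (keeping `&`), and remove a single leading article:

```python
import re


def normalize_answer(text):
    """Hypothetical helper: lowercase, drop punctuation except '&', trim whitespace."""
    text = re.sub(r'[^\w\s&]', '', str(text).lower()).strip()
    # Strip one leading "a", "an" or "the" so "the cornea" matches "Cornea".
    return re.sub(r'^(a|an|the)\s+', '', text)


def is_correct(response, answer):
    """Hypothetical helper: compare a response and an answer after normalization."""
    return normalize_answer(response) == normalize_answer(answer)


print(is_correct('Cornea', 'the cornea'))
print(is_correct('Car', 'Singer'))
```

This is deliberately simple: it still rejects equivalent phrasings such as "Bob & Ray" for "Bob and Ray", so a stricter or fuzzier rule may suit some answers better.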
