# This is Jeopardy!

#### Project Goals

In this project, we write several functions that investigate a dataset of _Jeopardy!_ questions and answers. We filter the dataset for interesting topics, compute the average difficulty of those questions, and train to become the next _Jeopardy!_ champion!

## Project Requirements


To display the full contents of a column, we've added this line of code:

```py
pd.set_option('display.max_colwidth', None)
```

In [1]:
import pandas as pd
pd.set_option('display.max_colwidth', None)

## loading the csv file

jeopardy = pd.read_csv('jeopardy.csv')

## renaming columns for easier usage

jeopardy.rename(columns={
    'Show Number': 'show_number',
    ' Air Date': 'air_date',
    ' Round': 'round',
    ' Category': 'category',
    ' Value': 'value',
    ' Question': 'question',
    ' Answer':'answer'
}, inplace=True)

## displaying the dataset model

print(jeopardy.head())

   show_number    air_date      round                         category value  \
0         4680  2004-12-31  Jeopardy!                          HISTORY  $200   
1         4680  2004-12-31  Jeopardy!  ESPN's TOP 10 ALL-TIME ATHLETES  $200   
2         4680  2004-12-31  Jeopardy!      EVERYBODY TALKS ABOUT IT...  $200   
3         4680  2004-12-31  Jeopardy!                 THE COMPANY LINE  $200   
4         4680  2004-12-31  Jeopardy!              EPITAPHS & TRIBUTES  $200   

                                                                                                      question  \
0             For the last 8 years of his life, Galileo was under house arrest for espousing this man's theory   
1  No. 2: 1912 Olympian; football star at Carlisle Indian School; 6 MLB seasons with the Reds, Giants & Braves   
2                     The city of Yuma in this state has a record average of 4,055 hours of sunshine each year   
3                         In 1963, live on "The Art Linkletter 
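
The renaming step above lists every column by hand because most of the CSV headers carry a leading space. As a minimal sketch of an alternative, the helper below (the name `normalize_columns` and the sample CSV are hypothetical, not part of the project) strips whitespace, lowercases, and replaces spaces with underscores for every header in one pass.

```python
import io

import pandas as pd


def normalize_columns(df):
    """Return a copy of df whose column names are stripped, lowercased, and snake_case."""
    cleaned = (
        df.columns.str.strip()                      # ' Air Date' -> 'Air Date'
        .str.lower()                                # 'Air Date'  -> 'air date'
        .str.replace(" ", "_", regex=False)         # 'air date'  -> 'air_date'
    )
    out = df.copy()
    out.columns = cleaned
    return out


# Hypothetical two-line CSV that mimics the header style of jeopardy.csv.
sample_csv = "Show Number, Air Date, Round\n4680,2004-12-31,Jeopardy!\n"
sample = normalize_columns(pd.read_csv(io.StringIO(sample_csv)))
print(list(sample.columns))  # ['show_number', 'air_date', 'round']
```

The explicit `rename` mapping in the cell above is easier to audit, while this approach keeps working if columns are added later.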

2. Write a function that filters the dataset for questions that contain all of the words in a list of words. For example, when the list `["King", "England"]` was passed to our function, the function returned a DataFrame of 49 rows. Every row had the strings `"King"` and `"England"` somewhere in its `" Question"` column (renamed to `question` above).

   Test your function by printing out the column containing the question of each row of the dataset.

In [2]:
## filter function

def filter_questions(dataset, words):
    return dataset.loc[dataset['question'].apply(lambda question: all(word.lower() in question.lower() for word in words))]

## testing the filter function

print(filter_questions(jeopardy, ["King", "England"]).head())

       show_number    air_date             round               category  \
4953          3003  1997-09-24  Double Jeopardy!           "PH"UN WORDS   
6337          3517  1999-12-14  Double Jeopardy!                    Y1K   
9191          3907  2001-09-04  Double Jeopardy!         WON THE BATTLE   
11710         2903  1997-03-26  Double Jeopardy!       BRITISH MONARCHS   
13454         4726  2005-03-07         Jeopardy!  A NUMBER FROM 1 TO 10   

       value  \
4953    $200   
6337    $800   
9191    $800   
11710   $600   
13454  $1000   

                                                                                                     question  \
4953                 Both England's King George V & FDR put their stamp of approval on this "King of Hobbies"   
6337   In retaliation for Viking raids, this "Unready" king of England attacks Norse areas of the Isle of Man   
9191                 This king of England beat the odds to trounce the French in the 1415 Battle of Agincourt   
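
Because `filter_questions` checks for lowercase substrings, a search word such as `"King"` also matches inside longer words like "Viking". The sketch below shows one possible stricter variant that matches whole words only, using regular-expression word boundaries. The function name `filter_whole_words` and the two demo rows are assumptions made for illustration, and the code assumes the `question` column has no missing values.

```python
import re

import pandas as pd


def filter_whole_words(dataset, words, column="question"):
    """Keep rows whose text contains every word as a whole word, ignoring case."""
    # \b anchors each word at word boundaries, so "king" no longer matches "Viking".
    patterns = [
        re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
        for word in words
    ]
    mask = dataset[column].apply(
        lambda text: all(pattern.search(text) for pattern in patterns)
    )
    return dataset.loc[mask]


# Hypothetical rows: only the first contains "king" as a standalone word.
demo = pd.DataFrame({"question": [
    "This king of England won at Agincourt",
    "Viking raids reached England in 793",
]})
result = filter_whole_words(demo, ["King", "England"])
print(result["question"])
```

Printing only the `question` column, as in the last line, also matches the testing instruction in step 2 more closely than printing the whole DataFrame.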


In [3]:
# adding a value column that holds float values instead of strings
jeopardy['float_value'] = jeopardy.value.apply(lambda value: float(value[1:].replace(',', '')) if value != 'None' else 0)

# filtering the dataset for the word "King"
filtered_jeopardy = filter_questions(jeopardy, ['King'])

# printing the average value for the filtered dataset
print(filtered_jeopardy.float_value.mean())

771.8833850722094
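
The conversion above parses each value with a Python lambda. A vectorized version is sketched below under the same convention that unparseable entries count as 0; the helper name `to_float_value` and the sample values are hypothetical.

```python
import pandas as pd


def to_float_value(values):
    """Convert strings such as '$1,000' to floats; unparseable entries become 0.0."""
    # Drop the dollar sign and thousands separators in one vectorized pass.
    stripped = values.str.replace(r"[$,]", "", regex=True)
    # errors="coerce" turns leftovers such as 'None' or missing values into NaN.
    return pd.to_numeric(stripped, errors="coerce").fillna(0.0).astype(float)


sample_values = pd.Series(["$200", "$1,000", "None", None])
print(to_float_value(sample_values).tolist())  # [200.0, 1000.0, 0.0, 0.0]
```

Counting missing values as 0 pulls the mean down. Leaving them as `NaN` (by omitting `fillna`) would exclude them from `mean()` instead, which may be the better choice depending on what the average is meant to show.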


Next, we write a function that counts how many times each unique answer appears among the questions in a dataset.

In [4]:
def unique_answer(filtered_dataset):
    # count how many times each answer appears, most frequent first
    counts = filtered_dataset.groupby('answer').show_number.count().reset_index()
    counts = counts.sort_values(by='show_number', ascending=False).reset_index(drop=True)
    return counts.rename(columns={'show_number': 'number of answers'})

print(unique_answer(filtered_jeopardy))



                answer  number of answers
0           Henry VIII                 55
1              Solomon                 35
2          Richard III                 33
3            Louis XIV                 31
4                David                 30
...                ...                ...
5263     L. Frank Baum                  1
5264           L'chaim                  1
5265        Königsberg                  1
5266  Kung Pao Chicken                  1
5267           zombies                  1

[5268 rows x 2 columns]
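
The same per-answer tally can also be produced with `Series.value_counts`, which counts and sorts in one call. The sketch below uses a hypothetical helper name, `answer_counts`, and a three-row demo frame. The order of answers with equal counts may differ from the `unique_answer` output above.

```python
import pandas as pd


def answer_counts(dataset, column="answer"):
    """Return one row per unique answer with its count, most frequent first."""
    counts = dataset[column].value_counts()  # sorted by count, descending
    # rename_axis and reset_index(name=...) give stable column names across pandas versions.
    return counts.rename_axis(column).reset_index(name="number of answers")


# Hypothetical mini dataset for illustration.
demo_answers = pd.DataFrame({"answer": ["Henry VIII", "Solomon", "Henry VIII"]})
tally = answer_counts(demo_answers)
print(tally)
```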
