# Jeopardy Project
### Free-form analysis of a provided data set
[Link to project page](https://www.codecademy.com/paths/data-science/tracks/dscp-data-manipulation-with-pandas/modules/dacp-data-manipulation-challenge-projects/projects/this-is-jeopardy)

Set some defaults

In [1]:
import pandas as pd
pd.set_option('display.max_colwidth', None)

Load the data, renaming the columns and parsing the air dates

In [2]:
jeopardy_df = pd.read_csv('jeopardy.csv', header=0, names=['show_number', 'air_date', 'round', 'category', 'cash_value', 'question', 'answer'], parse_dates=['air_date'])

Clean up the columns to make them easier to work with. In this case, convert the questions to lower case and convert the cash values to integers, defaulting to 0 where a value cannot be converted.

In [3]:
# Lower-case every question with the vectorized string accessor
jeopardy_df['question'] = jeopardy_df['question'].str.lower()

In [4]:
def convert_value_to_int(string):
    # Strip the dollar sign and thousands separators, e.g. "$1,200" -> "1200"
    stripped = string.replace("$", "").replace(",", "")
    try:
        return int(stripped)
    except ValueError:
        # Non-numeric values fall back to 0
        return 0

jeopardy_df['cash_value'] = jeopardy_df['cash_value'].apply(convert_value_to_int)
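
As an aside, the same clean-up could probably be done without a row-by-row function. The sketch below is only an illustration on made-up sample values (the `raw` series is an assumption, not data from `jeopardy.csv`): it strips the `$` and `,` characters and lets `pd.to_numeric` turn anything non-numeric into `NaN`, which is then replaced by 0.

```python
import pandas as pd

# Made-up sample values standing in for the raw cash_value column (assumption)
raw = pd.Series(["$200", "$1,200", "None", "$2,000"])

# Remove "$" and "," as literal characters rather than regex patterns
cleaned = raw.str.replace("$", "", regex=False).str.replace(",", "", regex=False)

# Coerce non-numeric strings to NaN, then default those to 0 and cast to int
cash_value = pd.to_numeric(cleaned, errors="coerce").fillna(0).astype(int)

print(cash_value.tolist())
```

On a large table this tends to be faster than `apply`, though for this project either approach should give the same column.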

In [5]:
print(jeopardy_df.head())

   show_number   air_date      round                         category  \
0         4680 2004-12-31  Jeopardy!                          HISTORY   
1         4680 2004-12-31  Jeopardy!  ESPN's TOP 10 ALL-TIME ATHLETES   
2         4680 2004-12-31  Jeopardy!      EVERYBODY TALKS ABOUT IT...   
3         4680 2004-12-31  Jeopardy!                 THE COMPANY LINE   
4         4680 2004-12-31  Jeopardy!              EPITAPHS & TRIBUTES   

   cash_value  \
0         200   
1         200   
2         200   
3         200   
4         200   

                                                                                                      question  \
0             for the last 8 years of his life, galileo was under house arrest for espousing this man's theory   
1  no. 2: 1912 olympian; football star at carlisle indian school; 6 mlb seasons with the reds, giants & braves   
2                     the city of yuma in this state has a record average of 4,055 hours of sunshine each year   
3 

Write a function to filter the data. A previous version I attempted returned only the question, but this version is much better: it returns all the other columns as well, which can be used for further analysis.

In [6]:
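def filtered_data(df, lst):
    # True when every word in lst appears in the question (case-insensitive substring match)
    contains_all = lambda x: all(word.lower() in x.lower() for word in lst)
    return df.loc[df['question'].apply(contains_all)]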

filtered = filtered_data(jeopardy_df, ['king'])
print(filtered.head())

    show_number   air_date             round                    category  \
34         4680 2004-12-31  Double Jeopardy!                 "X"s & "O"s   
40         4680 2004-12-31  Double Jeopardy!  DR. SEUSS AT THE MULTIPLEX   
50         4680 2004-12-31  Double Jeopardy!  DR. SEUSS AT THE MULTIPLEX   
56         5957 2010-07-06         Jeopardy!               GEOGRAPHY "E"   
72         5957 2010-07-06         Jeopardy!                LET'S BOUNCE   

    cash_value  \
34         400   
40        1200   
50        2000   
56         200   
72         600   

                                                                                                                                                                             question  \
34                                                                              around 100 a.d. tacitus wrote a book on how this art of persuasive speaking had declined since cicero   
40  <a href="http://www.j-archive.com/media/2004-12-31_dj_26.mp3
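
One thing the output above shows is that a substring match on "king" also catches words such as "speaking". If whole-word matches are wanted, a possible variation is sketched below. The function name `filter_whole_words`, its `column` parameter, and the toy rows are all hypothetical and only there for illustration; it relies on the regex word boundary `\b` together with `Series.str.contains`.

```python
import re
import pandas as pd

def filter_whole_words(df, words, column="question"):
    """Keep rows whose `column` contains every word in `words` as a whole word."""
    mask = pd.Series(True, index=df.index)
    for word in words:
        # \b marks a word boundary, so "king" no longer matches "speaking"
        pattern = r"\b" + re.escape(word) + r"\b"
        mask = mask & df[column].str.contains(pattern, case=False, regex=True, na=False)
    return df.loc[mask]

# Invented toy rows, not taken from jeopardy.csv
toy = pd.DataFrame({
    "question": [
        "this king of england had six wives",
        "the art of persuasive speaking",
        "viking explorers reached this island",
    ],
    "answer": ["Henry VIII", "toy answer B", "toy answer C"],
})

print(filter_whole_words(toy, ["king"])["answer"].tolist())
```

Like `filtered_data`, this returns the full rows, so the result can be passed to the same follow-up analysis.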

With the filtered table, we can apply other methods to extract information. Here I want the count of each unique answer.

In [7]:
def count_answers(data):
    return data['answer'].value_counts()

count_answers(filtered)

Henry VIII            55
Solomon               35
Richard III           33
Louis XIV             31
David                 30
                      ..
AMC                    1
Montgomery & Selma     1
Mike Wallace           1
All thumbs             1
Heidi                  1
Name: answer, Length: 5268, dtype: int64
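
Since the full count runs to thousands of rows, it may be handy to look only at the most frequent answers, or at each answer's share of the total. A small sketch on invented toy data (the `toy` frame is an assumption, not the real table): `value_counts` already sorts in descending order, so `head` gives the top entries, and `normalize=True` returns proportions instead of counts.

```python
import pandas as pd

# Invented toy answers, not taken from jeopardy.csv
toy = pd.DataFrame({
    "answer": ["Henry VIII", "Solomon", "Henry VIII", "David", "Henry VIII", "Solomon"]
})

# Counts are sorted from most to least frequent, so head(2) is the top two
top_two = toy["answer"].value_counts().head(2)

# normalize=True gives each answer's share of all rows
shares = toy["answer"].value_counts(normalize=True)

print(top_two.to_dict())
print(shares["Henry VIII"])
```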

Finally, I wanted to calculate the mean cash value across all questions in the data set.

In [8]:
print(jeopardy_df.cash_value.mean())

739.9884755451067
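
Because values that could not be converted were defaulted to 0, they pull this mean down. A quick way to check how much is to compare it against the mean of the non-zero rows only. The sketch below uses made-up numbers (the `toy` frame is an assumption) just to show the idea.

```python
import pandas as pd

# Made-up cash values; the 0 stands in for a value that failed conversion
toy = pd.DataFrame({"cash_value": [200, 400, 0, 1000]})

# Mean over every row, zeros included
mean_all = toy["cash_value"].mean()

# Mean over rows with a real (non-zero) cash value only
mean_nonzero = toy.loc[toy["cash_value"] > 0, "cash_value"].mean()

print(mean_all, mean_nonzero)
```

Which figure is more meaningful depends on whether the zero rows should count as questions with no cash value or as missing data.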
