# This is Jeopardy!

In [1]:
import pandas as pd
pd.set_option('display.max_colwidth', None)

In [2]:
jeopardy = pd.read_csv('jeopardy.csv')
jeopardy.head()

Unnamed: 0,Show Number,Air Date,Round,Category,Value,Question,Answer
0,4680,2004-12-31,Jeopardy!,HISTORY,$200,"For the last 8 years of his life, Galileo was under house arrest for espousing this man's theory",Copernicus
1,4680,2004-12-31,Jeopardy!,ESPN's TOP 10 ALL-TIME ATHLETES,$200,"No. 2: 1912 Olympian; football star at Carlisle Indian School; 6 MLB seasons with the Reds, Giants & Braves",Jim Thorpe
2,4680,2004-12-31,Jeopardy!,EVERYBODY TALKS ABOUT IT...,$200,"The city of Yuma in this state has a record average of 4,055 hours of sunshine each year",Arizona
3,4680,2004-12-31,Jeopardy!,THE COMPANY LINE,$200,"In 1963, live on ""The Art Linkletter Show"", this company served its billionth burger",McDonald's
4,4680,2004-12-31,Jeopardy!,EPITAPHS & TRIBUTES,$200,"Signer of the Dec. of Indep., framer of the Constitution of Mass., second President of the United States",John Adams


In [3]:
print(jeopardy.columns)

Index(['Show Number', ' Air Date', ' Round', ' Category', ' Value',
       ' Question', ' Answer'],
      dtype='object')


In [4]:
jeopardy.rename(columns={' Air Date': 'Air Date', ' Round':'Round', ' Category':'Category', ' Value':'Value',
       ' Question':'Question', ' Answer':'Answer'}, inplace=True)
print(jeopardy.columns)

Index(['Show Number', 'Air Date', 'Round', 'Category', 'Value', 'Question',
       'Answer'],
      dtype='object')
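
The leading spaces can also be removed without listing every column by hand. The following is a minimal sketch, assuming the only problem with the header names is surrounding whitespace. The `sample` frame is a made-up stand-in for `jeopardy`, not the real data.

```python
import pandas as pd

# Hypothetical stand-in for the raw CSV header, which has leading spaces
sample = pd.DataFrame(columns=['Show Number', ' Air Date', ' Round', ' Value'])

# Index.str.strip() removes leading/trailing whitespace from every column name
sample.columns = sample.columns.str.strip()
print(list(sample.columns))
```

The same line should work on `jeopardy.columns` directly, and it keeps working if more columns are added later.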


2. Write a function that filters the dataset for questions that contain all of the words in a list of words. For example, when the list `["King", "England"]` was passed to our function, the function returned a DataFrame of 49 rows. Every row had the strings `"King"` and `"England"` somewhere in its `" Question"`.

   Test your function by printing out the column containing the question of each row of the dataset.

In [5]:
def filter_dataset(ds, words):
    filter_ds = lambda x: all(word in x for word in words)
    return ds.loc[ds["Question"].apply(filter_ds)]

filtered1 = filter_dataset(jeopardy, ["King", "England"])
print(filtered1["Question"])

4953                                                                                                                                                                                                                                                                      Both England's King George V & FDR put their stamp of approval on this "King of Hobbies"
14912                                                                                                                                                                                                                                                            This country's King Louis IV was nicknamed "Louis From Overseas" because he was raised in England
21511                                                                                                                                                                                                                                                                                 this man and

3. Test your original function with a few different sets of words to try to find some ways your function breaks. Edit your function so it is more robust.

   For example, think about capitalization. We probably want to find questions that contain the word `"King"` or `"king"`.
   
   You may also want to check to make sure you don't find rows that contain substrings of your given words. For example, our function found a question that didn't contain the word `"king"` but did contain the word `"viking"`: it found the `"king"` inside `"viking"`. Note that this also comes with some drawbacks: you would no longer find questions that contained words like `"England's"`.

In [6]:
filtered2 = filter_dataset(jeopardy, ["king", "england"])
print(filtered2["Question"])

Series([], Name: Question, dtype: object)


In [7]:
filtered3 = filter_dataset(jeopardy, ["Italy"])
print(filtered3["Question"])

6                                             Built in 312 B.C. to link Rome & the South of Italy, it's still in use today
235                      Bordering Italy, Austria, Hungary & Croatia, it's one of the world's newest independent countries
809                                                 Japan had an emperor, Russia, a czar & Italy was ruled by one of these
949                                                            Descriptive term for the flag of Italy & the flag of France
1527                                                             This astronomer was born in Pisa, Italy February 15, 1564
                                                                ...                                                       
213596                 The Laurentian Library in Florence, Italy was founded in the 15th century by members of this family
213735    While Marco Polo was meeting the Khan, he missed the 10th birthday party of this "Divine Comedy" author in Italy
213775          

In [8]:
def filter_dataset(ds, words):
    filter_ds = lambda x: all(word.lower() in x.lower() for word in words)
    return ds.loc[ds["Question"].apply(filter_ds)]

In [9]:
filtered4 = filter_dataset(jeopardy, ["king", "england"])
print(filtered4["Question"])

4953                    Both England's King George V & FDR put their stamp of approval on this "King of Hobbies"
6337      In retaliation for Viking raids, this "Unready" king of England attacks Norse areas of the Isle of Man
9191                    This king of England beat the odds to trounce the French in the 1415 Battle of Agincourt
11710               This Scotsman, the first Stuart king of England, was called "The Wisest Fool in Christendom"
13454                                       It's the number that followed the last king of England named William
                                                           ...                                                  
208295        In 1066 this great-great grandson of Rollo made what some call the last Viking invasion of England
208742                      Dutch-born king who ruled England jointly with Mary II & is a tasty New Zealand fish
213870                In 1781 William Herschel discovered Uranus & initially named it after this
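
The case-insensitive version still matches `"king"` inside `"Viking"`, as the output above shows. One possible way to handle the substring concern from step 3 is a regular expression with word boundaries. This is a sketch under assumptions, not the notebook's own method: `filter_whole_words` and the `sample` rows are hypothetical names and data introduced for illustration.

```python
import re
import pandas as pd

def filter_whole_words(ds, words, column="Question"):
    """Keep rows whose `column` contains every word as a whole word, ignoring case."""
    mask = pd.Series(True, index=ds.index)
    for word in words:
        # \b marks a word boundary, so "king" no longer matches inside "Viking"
        pattern = r"\b" + re.escape(word) + r"\b"
        mask &= ds[column].str.contains(pattern, case=False, regex=True, na=False)
    return ds.loc[mask]

# Hypothetical sample rows modelled on the questions printed above
sample = pd.DataFrame({"Question": [
    "This king of England beat the odds",
    "The last Viking invasion of England",
    "Both England's King George V & FDR",
]})
result = filter_whole_words(sample, ["king", "england"])
print(result["Question"].tolist())
```

Because an apostrophe is a non-word character, `"England's"` still matches `"england"` with this pattern, which avoids the drawback of splitting on whitespace and comparing whole tokens.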

4. We may want to eventually compute aggregate statistics, like `.mean()` on the `" Value"` column. But right now, the values in that column are strings. Convert the `" Value"` column to floats. If you'd like to, you can create a new column with float values.

   While most of the values in the `" Value"` column represent a dollar amount as a string, note that some do not &mdash; these values will need to be handled differently!

   Now that you can filter the dataset of questions, use your new column that contains the float values of each question to find the "difficulty" of certain topics. For example, what is the average value of questions that contain the word `"King"`?
   
   Make sure to use the dataset that contains the float values as the dataset you use in your filtering function.

In [10]:
jeopardy.Value

0             $200
1             $200
2             $200
3             $200
4             $200
            ...   
216925       $2000
216926       $2000
216927       $2000
216928       $2000
216929    no value
Name: Value, Length: 216930, dtype: object

In [11]:
jeopardy['Value'] = jeopardy['Value'].apply(lambda x: float(x[1:].replace(',','')) if x!='no value' else 0)


In [12]:
jeopardy.Value

0          200.0
1          200.0
2          200.0
3          200.0
4          200.0
           ...  
216925    2000.0
216926    2000.0
216927    2000.0
216928    2000.0
216929       0.0
Name: Value, Length: 216930, dtype: float64

In [13]:
print(jeopardy['Value'].mean())

739.9884755451067
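
Mapping `'no value'` to 0 counts those rows as zero-dollar questions, which pulls the mean down. An alternative sketch is shown here, assuming missing values should be left out of the average instead. The `sample` Series is made-up data, not taken from the CSV.

```python
import pandas as pd

# Hypothetical sample mirroring the string formats in the Value column
sample = pd.Series(['$200', '$2,000', 'no value'], name='Value')

# Strip the dollar sign and thousands separators, then let to_numeric
# turn anything that is not a number (here 'no value') into NaN
cleaned = sample.str.replace(r'[$,]', '', regex=True)
as_float = pd.to_numeric(cleaned, errors='coerce')

print(as_float.mean())            # NaN entries are skipped by default
print(as_float.fillna(0).mean())  # same data with missing values counted as 0
```

Which treatment is right depends on what the average is meant to describe, so the choice between NaN and 0 is a judgment call.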


In [14]:
filtered5 = filter_dataset(jeopardy, ["King"])
print(filtered5['Value'].mean())

771.8833850722094


5. Write a function that returns the count of unique answers to all of the questions in a dataset. For example, after filtering the entire dataset to only questions containing the word `"King"`, we could then find all of the unique answers to those questions. The answer "Henry VIII" appeared 55 times and was the most common answer.

In [15]:
def unique_answers(ds):
    return ds.Answer.value_counts()

# .value_counts() counts the values in the "Answer" Series.
# This results in a new Series, where the index is the answer and the values are how often each occurred.
# The Series is sorted in descending order, starting with the most frequently occurring answer.

print(unique_answers(filtered5))

Answer
Henry VIII                   55
Solomon                      35
Richard III                  33
Louis XIV                    31
David                        30
                             ..
cardiac (in card I acted)     1
Henderson                     1
Computer                      1
Indians                       1
work                          1
Name: count, Length: 5268, dtype: int64


In [16]:
print(unique_answers(jeopardy))

Answer
China                             216
Australia                         215
Japan                             196
Chicago                           194
France                            193
                                 ... 
a triathalon                        1
The King of Comedy                  1
The Admirable Chrichton             1
a tribunal                          1
Grigori Alexandrovich Potemkin      1
Name: count, Length: 88267, dtype: int64


Investigate the ways in which questions change over time by filtering by the date. How many questions from the 90s use the word `"Computer"` compared to questions from the 2000s?

In [17]:
computer_filter = filter_dataset(jeopardy, ["Computer"])

In [18]:
# Take an explicit copy first so the column assignment below does not raise SettingWithCopyWarning
computer_filter = computer_filter.copy()
computer_filter['Air Date'] = pd.to_datetime(computer_filter['Air Date'])


In [19]:
# Note: the 90s window opens in 1991 here, so questions aired during 1990 are not counted
start_date_90s = pd.to_datetime('1991-01-01')
end_date_90s = pd.to_datetime('1999-12-31')
start_date_2000s = pd.to_datetime('2000-01-01')
end_date_2000s = pd.to_datetime('2009-12-31')

In [20]:
# Filter data for the nineties (the strict > excludes the start date itself)
nineties_computer = computer_filter[(computer_filter['Air Date'] > start_date_90s) & (computer_filter['Air Date'] <= end_date_90s)]

# Filter data for the 2000s; the variable name says "twenties" but the window ends on 2009-12-31
twenties_computer = computer_filter[(computer_filter['Air Date'] > start_date_2000s) & (computer_filter['Air Date'] <= end_date_2000s)]

In [21]:
print(nineties_computer.count())

Show Number    93
Air Date       93
Round          93
Category       93
Value          93
Question       93
Answer         93
dtype: int64


In [22]:
print(twenties_computer.count())

Show Number    268
Air Date       268
Round          268
Category       268
Value          268
Question       268
Answer         268
dtype: int64
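
The same comparison can be made without hand-written date windows by grouping on the decade of each air date. This is a rough sketch on made-up rows, where `sample` and the `Decade` label are assumptions for illustration. Unlike the windows above, it counts every date in a decade, including all of 1990 and January 1 of each start year.

```python
import pandas as pd

# Hypothetical rows standing in for the filtered "Computer" questions
sample = pd.DataFrame({
    'Air Date': ['1990-01-01', '1995-06-15', '2000-01-01', '2004-12-31', '2009-12-31'],
    'Question': ['q1', 'q2', 'q3', 'q4', 'q5'],
})

air_date = pd.to_datetime(sample['Air Date'])
# Integer-divide the year by 10 to bucket rows into decades (1990, 2000, ...)
decade = (air_date.dt.year // 10 * 10).rename('Decade')
counts = sample.groupby(decade).size()
print(counts)
```

Applied to `computer_filter`, this would give one count per decade in a single Series, so the totals may differ slightly from the 93 and 268 printed above because of the boundary dates.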
