# Jeopardy! 

This project investigates a dataset of questions and answers from Jeopardy! I will write functions that filter the dataset, compute the average difficulty of the filtered questions, and count the number of unique answers to those questions.

### Importing the Data
The first step is to import pandas and set the column width so we can see everything within the DataFrame.

In [1]:
import pandas as pd
pd.set_option('display.max_colwidth', None)

In [2]:
df = pd.read_csv('jeopardy.csv')
df.head()

Unnamed: 0,Show Number,Air Date,Round,Category,Value,Question,Answer
0,4680,2004-12-31,Jeopardy!,HISTORY,$200,"For the last 8 years of his life, Galileo was under house arrest for espousing this man's theory",Copernicus
1,4680,2004-12-31,Jeopardy!,ESPN's TOP 10 ALL-TIME ATHLETES,$200,"No. 2: 1912 Olympian; football star at Carlisle Indian School; 6 MLB seasons with the Reds, Giants & Braves",Jim Thorpe
2,4680,2004-12-31,Jeopardy!,EVERYBODY TALKS ABOUT IT...,$200,"The city of Yuma in this state has a record average of 4,055 hours of sunshine each year",Arizona
3,4680,2004-12-31,Jeopardy!,THE COMPANY LINE,$200,"In 1963, live on ""The Art Linkletter Show"", this company served its billionth burger",McDonald's
4,4680,2004-12-31,Jeopardy!,EPITAPHS & TRIBUTES,$200,"Signer of the Dec. of Indep., framer of the Constitution of Mass., second President of the United States",John Adams


### Cleaning Column Names
The column names in the dataset aren't clean, which makes them awkward to reference in our functions. E.g. the column 'Category' is actually '    Category', with leading spaces. Therefore, before I write any functions, I need to rename the columns.

In [3]:
df.columns = ['Show Number', 'Air Date', 'Round', 'Category', 'Value', 'Question', 'Answer']

In [4]:
df.head()

Unnamed: 0,Show Number,Air Date,Round,Category,Value,Question,Answer
0,4680,2004-12-31,Jeopardy!,HISTORY,$200,"For the last 8 years of his life, Galileo was under house arrest for espousing this man's theory",Copernicus
1,4680,2004-12-31,Jeopardy!,ESPN's TOP 10 ALL-TIME ATHLETES,$200,"No. 2: 1912 Olympian; football star at Carlisle Indian School; 6 MLB seasons with the Reds, Giants & Braves",Jim Thorpe
2,4680,2004-12-31,Jeopardy!,EVERYBODY TALKS ABOUT IT...,$200,"The city of Yuma in this state has a record average of 4,055 hours of sunshine each year",Arizona
3,4680,2004-12-31,Jeopardy!,THE COMPANY LINE,$200,"In 1963, live on ""The Art Linkletter Show"", this company served its billionth burger",McDonald's
4,4680,2004-12-31,Jeopardy!,EPITAPHS & TRIBUTES,$200,"Signer of the Dec. of Indep., framer of the Constitution of Mass., second President of the United States",John Adams
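
As an alternative to typing every name by hand, the padding could also be stripped programmatically. The snippet below is a minimal sketch on a toy DataFrame; the padded headers in `raw` are an assumption based on the '    Category' example above, not the exact headers of jeopardy.csv.

```python
import pandas as pd

# Toy frame with padded headers (assumed layout, for illustration only).
raw = pd.DataFrame(columns=['Show Number', ' Air Date', ' Round', '    Category'])

# rename() applies str.strip to every column label and returns a new DataFrame.
cleaned = raw.rename(columns=str.strip)
print(list(cleaned.columns))
```

This keeps working if columns are added or reordered, whereas assigning a fixed list requires the list length to match the number of columns exactly.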


### Filtering Questions
The first function filters the questions in the dataset and returns those that contain every word in a given list. E.g. given the list ['life', 'leave', 'puppy'], the function returns only the questions that contain all three words. Matching is case-insensitive and based on substrings, so 'under' also matches 'underwater'.

In [5]:
def filter_questions(df, words):
    """ Filters the DataFrame for questions that contain all the words in the list words. """
    filters = lambda x: all(word.lower() in x.lower() for word in words)
    return df.loc[df['Question'].apply(filters)]

In [6]:
filter_questions(df, ['under', 'for'])

Unnamed: 0,Show Number,Air Date,Round,Category,Value,Question,Answer
0,4680,2004-12-31,Jeopardy!,HISTORY,$200,"For the last 8 years of his life, Galileo was under house arrest for espousing this man's theory",Copernicus
31,4680,2004-12-31,Double Jeopardy!,AIRLINE TRAVEL,$400,"It can be a place to leave your puppy when you take a trip, or a carrier for him that fits under an airplane seat",a kennel
66,5957,2010-07-06,Jeopardy!,LET'S BOUNCE,$400,Sound navigation& ranging is the full name for this device that bounces radio waves underwater,sonar
110,5957,2010-07-06,Double Jeopardy!,SCIENCE CLASS,$2000,Lava & igneous rock are formed from this hot liquid rock material found under the earth's crust,magma
111,5957,2010-07-06,Double Jeopardy!,KIDS IN SPORTS,$2000,This sport has an under-17 World Cup every 2 years; Haris Seferovic starred for the 2009 champion Switzerland,soccer
...,...,...,...,...,...,...,...
214951,2833,1996-12-18,Double Jeopardy!,FICTIONAL CHARACTERS,$800,"When Count Vronsky's love for her seems to fade, she throws herself under a train",Anna Karenina
215508,4216,2002-12-23,Jeopardy!,BUSINESS & INDUSTRY,$800,"The ""K"" in K-Mart's name stands for this founder",S.S. Kresge
215575,3589,2000-03-23,Jeopardy!,MAGAZINES,$500,"Its first issue, in 1845, included articles on ""A Smoke Filter for Locomotives"" & ""Cause of Sound and Thunder""",Scientific American
216289,5236,2007-05-21,Jeopardy!,TERMS OF ENGINEERMENT,$1000,"In this type of well named for a region of France, groundwater rises to the surface under pressure from an aquifer",an Artesian well
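
Because `filter_questions` checks for substrings, a search for 'under' also returns questions containing 'underwater'. If whole-word matches are wanted instead, one possible variant uses regular-expression word boundaries. This is only a sketch: `filter_questions_whole_words` is a hypothetical name, and the sample questions are made up for the demonstration.

```python
import re
import pandas as pd

def filter_questions_whole_words(df, words):
    """Keep rows whose Question contains every word as a whole word (case-insensitive)."""
    # \b marks a word boundary; re.escape protects any special characters in the word.
    patterns = [re.compile(r'\b' + re.escape(word) + r'\b', re.IGNORECASE) for word in words]
    mask = df['Question'].apply(lambda q: all(p.search(q) for p in patterns)).astype(bool)
    return df.loc[mask]

# Made-up sample questions, for illustration only.
sample = pd.DataFrame({'Question': [
    'This king signed the Magna Carta',
    'The art of persuasive speaking',
    "It's the largest kingdom in the United Kingdom",
]})
result = filter_questions_whole_words(sample, ['king'])
print(result)  # only the first row: 'speaking' and 'kingdom' are not whole-word matches
```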


### Average Difficulty of Questions
The difficulty of a question is measured by its value. As seen below, the Value column contains only strings, so before we can analyse the difficulty of the questions we need to convert the strings to floats.

In [7]:
df.Value.unique()

array(['$200', '$400', '$600', '$800', '$2,000', '$1000', '$1200',
       '$1600', '$2000', '$3,200', 'None', '$5,000', '$100', '$300',
       '$500', '$1,000', '$1,500', '$1,200', '$4,800', '$1,800', '$1,100',
       '$2,200', '$3,400', '$3,000', '$4,000', '$1,600', '$6,800',
       '$1,900', '$3,100', '$700', '$1,400', '$2,800', '$8,000', '$6,000',
       '$2,400', '$12,000', '$3,800', '$2,500', '$6,200', '$10,000',
       '$7,000', '$1,492', '$7,400', '$1,300', '$7,200', '$2,600',
       '$3,300', '$5,400', '$4,500', '$2,100', '$900', '$3,600', '$2,127',
       '$367', '$4,400', '$3,500', '$2,900', '$3,900', '$4,100', '$4,600',
       '$10,800', '$2,300', '$5,600', '$1,111', '$8,200', '$5,800',
       '$750', '$7,500', '$1,700', '$9,000', '$6,100', '$1,020', '$4,700',
       '$2,021', '$5,200', '$3,389', '$4,200', '$5', '$2,001', '$1,263',
       '$4,637', '$3,201', '$6,600', '$3,700', '$2,990', '$5,500',
       '$14,000', '$2,700', '$6,400', '$350', '$8,600', '$6,300', '$250',
    

In [8]:
df['Float Value'] = df.Value.apply(lambda x: float(0) if x == 'None' else float(x.strip('$').replace(',', '')))

In [9]:
df

Unnamed: 0,Show Number,Air Date,Round,Category,Value,Question,Answer,Float Value
0,4680,2004-12-31,Jeopardy!,HISTORY,$200,"For the last 8 years of his life, Galileo was under house arrest for espousing this man's theory",Copernicus,200.0
1,4680,2004-12-31,Jeopardy!,ESPN's TOP 10 ALL-TIME ATHLETES,$200,"No. 2: 1912 Olympian; football star at Carlisle Indian School; 6 MLB seasons with the Reds, Giants & Braves",Jim Thorpe,200.0
2,4680,2004-12-31,Jeopardy!,EVERYBODY TALKS ABOUT IT...,$200,"The city of Yuma in this state has a record average of 4,055 hours of sunshine each year",Arizona,200.0
3,4680,2004-12-31,Jeopardy!,THE COMPANY LINE,$200,"In 1963, live on ""The Art Linkletter Show"", this company served its billionth burger",McDonald's,200.0
4,4680,2004-12-31,Jeopardy!,EPITAPHS & TRIBUTES,$200,"Signer of the Dec. of Indep., framer of the Constitution of Mass., second President of the United States",John Adams,200.0
...,...,...,...,...,...,...,...,...
216925,4999,2006-05-11,Double Jeopardy!,RIDDLE ME THIS,$2000,This Puccini opera turns on the solution to 3 riddles posed by the heroine,Turandot,2000.0
216926,4999,2006-05-11,Double Jeopardy!,"""T"" BIRDS",$2000,"In North America this term is properly applied to only 4 species that are crested, including the tufted",a titmouse,2000.0
216927,4999,2006-05-11,Double Jeopardy!,AUTHORS IN THEIR YOUTH,$2000,"In Penny Lane, where this ""Hellraiser"" grew up, the barber shaves another customer--then flays him alive!",Clive Barker,2000.0
216928,4999,2006-05-11,Double Jeopardy!,QUOTATIONS,$2000,"From Ft. Sill, Okla. he made the plea, Arizona is my land, my home, my father's land, to which I now ask to... return""",Geronimo,2000.0
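
Note that the conversion above maps 'None' to 0.0, so questions without a dollar value pull any average downwards. A hedged alternative is to treat them as missing instead. The sketch below uses a hypothetical helper, `parse_value`, on a three-item sample rather than the full dataset.

```python
import pandas as pd

def parse_value(series):
    """Convert strings like '$1,000' to floats; anything unparseable (e.g. 'None') becomes NaN."""
    cleaned = series.str.replace(r'[$,]', '', regex=True)
    return pd.to_numeric(cleaned, errors='coerce')

sample = pd.Series(['$200', '$1,000', 'None'])
parsed = parse_value(sample)

print(parsed.mean())            # 600.0 -> NaN is skipped by mean()
print(parsed.fillna(0).mean())  # 400.0 -> same data with 'None' counted as 0
```

Which treatment is appropriate depends on the question being asked; the figures reported in this notebook use the 0.0 convention.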


We can now perform analysis on the difficulty of the questions. E.g. below we have found that the average difficulty for questions containing the word 'king' is about $772.

In [10]:
king_qs = filter_questions(df, ['king'])
king_qs['Float Value'].mean()

771.8833850722094
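
The two steps above could also be wrapped into a single helper. The function below is a sketch under that assumption; `average_difficulty` is a hypothetical name, and the filtering logic is repeated inside it so the example runs on its own.

```python
import pandas as pd

def average_difficulty(df, words, value_col='Float Value'):
    """Mean value of the questions that contain every word in words (substring match)."""
    contains_all = lambda q: all(word.lower() in q.lower() for word in words)
    matches = df.loc[df['Question'].apply(contains_all)]
    return matches[value_col].mean()

# Made-up sample data, for illustration only.
sample = pd.DataFrame({
    'Question': ['A king of England', 'A queen of Spain', 'The King of Pop'],
    'Float Value': [200.0, 400.0, 1000.0],
})
avg = average_difficulty(sample, ['king'])
print(avg)  # 600.0, the mean of 200.0 and 1000.0
```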

### Counting Unique Answers
The function below can be used to count the number of unique answers in the dataset. E.g. of the 7409 questions that contain the word 'king', there are 5268 unique answers.

In [11]:
def count_unique_ans(df):
    """ Counts the unique answers of the given DataFrame. """
    unique_ans = df.Answer.unique()
    return len(unique_ans)    

In [12]:
count_unique_ans(king_qs)

5268
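
pandas also offers `Series.nunique()` for the same count, with one difference worth knowing: `nunique()` ignores missing values by default, while `unique()` includes them. The sketch below, on made-up answers, shows both, plus an optional normalisation step (lower-casing and stripping) that is my own assumption about how near-duplicate answers might be merged.

```python
import pandas as pd

# Made-up answers, including a case variant and a missing value.
answers = pd.DataFrame({'Answer': ['Henry VIII', 'henry viii', 'Elvis', 'Elvis', None]})

n_with_missing = len(answers['Answer'].unique())                     # 4: missing value counted
n_unique = answers['Answer'].nunique()                               # 3: missing value ignored
n_normalized = answers['Answer'].str.lower().str.strip().nunique()   # 2: case variants merged

# value_counts() shows which answers occur most often.
top = answers['Answer'].value_counts().head(3)
print(n_with_missing, n_unique, n_normalized)
print(top)
```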

In [13]:
king_qs

Unnamed: 0,Show Number,Air Date,Round,Category,Value,Question,Answer,Float Value
34,4680,2004-12-31,Double Jeopardy!,"""X""s & ""O""s",$400,Around 100 A.D. Tacitus wrote a book on how this art of persuasive speaking had declined since Cicero,oratory,400.0
40,4680,2004-12-31,Double Jeopardy!,DR. SEUSS AT THE MULTIPLEX,$1200,"<a href=""http://www.j-archive.com/media/2004-12-31_DJ_26.mp3"">Ripped from today's headlines, he was a turtle king gone mad; Mack was the one good turtle who'd bring him down</a>",Yertle,1200.0
50,4680,2004-12-31,Double Jeopardy!,DR. SEUSS AT THE MULTIPLEX,$2000,"<a href=""http://www.j-archive.com/media/2004-12-31_DJ_24.mp3"">""500 Hats""... 500 ways to die. On July 4th, this young boy will defy a king... & become a legend</a>",Bartholomew Cubbins,2000.0
56,5957,2010-07-06,Jeopardy!,"GEOGRAPHY ""E""",$200,It's the largest kingdom in the United Kingdom,England,200.0
72,5957,2010-07-06,Jeopardy!,LET'S BOUNCE,$600,"In this kid's game, you bounce a small rubber ball while picking up 6-pronged metal objects",jacks,600.0
...,...,...,...,...,...,...,...,...
216777,5070,2006-09-29,Double Jeopardy!,ANCIENT HISTORY,$400,The first one of these tombs was built about 2650 B.C. by Imhotep for King Zoser & rose about 200 feet using steps,a pyramid (the pyramids accepted),400.0
216787,5070,2006-09-29,Double Jeopardy!,TALES OF E.T.A. HOFFMANN,"$2,000","A Hoffmann tale title lost the words ""And The Mouse King"" when it became this Tchaikovsky ballet",The Nutcracker,2000.0
216789,5070,2006-09-29,Double Jeopardy!,ANCIENT HISTORY,$1200,"This kingdom of England grew from 2 settlements, one founded around 495 by Cerdic & his son Cynric",Wessex,1200.0
216856,5195,2007-03-23,Double Jeopardy!,HAIL TO THE CHEF,$1600,"You can cook like <a href=""http://www.j-archive.com/media/2007-03-23_DJ_24.jpg"" target=""_blank"">this</a> man who wrote ""The Joy of Wokking""",(Martin) Yan,1600.0


### Filtering by Category
We may also want to filter the data by question category. The helper function below does that: it returns a DataFrame of all the questions in a given category.

In [14]:
def filter_category(df, category):
    """ Returns the rows of the DataFrame whose category matches the one given. """
    # An unknown category yields an empty DataFrame rather than an implicit None.
    return df[df['Category'] == category]

In [15]:
filter_category(df, 'QUOTATIONS')

Unnamed: 0,Show Number,Air Date,Round,Category,Value,Question,Answer,Float Value
2953,3697,2000-10-03,Double Jeopardy!,QUOTATIONS,$200,"In a 1961 speech he said, ""...ask not what America will do for you, but what together we can do for the freedom of man""",JFK (John F. Kennedy),200.0
2959,3697,2000-10-03,Double Jeopardy!,QUOTATIONS,$400,"This ""Huck Finn"" author wrote ""Few things are harder to put up with than the annoyance of a good example""",Mark Twain,400.0
2965,3697,2000-10-03,Double Jeopardy!,QUOTATIONS,$600,"This talk show host said, ""I admire, respect & adore authors"" when she was honored for her book club",Oprah,600.0
2971,3697,2000-10-03,Double Jeopardy!,QUOTATIONS,"$1,100","In 1944 she wrote in her diary, ""In spite of everything I still believe that people are really good at heart""",Anne Frank,1100.0
2977,3697,2000-10-03,Double Jeopardy!,QUOTATIONS,$1000,The preamble to the U.S. Constitution begins with these 3 words,"""We the People""",1000.0
...,...,...,...,...,...,...,...,...
216904,4999,2006-05-11,Double Jeopardy!,QUOTATIONS,$400,"In 1981 he quipped, ""You can tell a lot about a fellow's character by his way of eating jellybeans""",Ronald Reagan,400.0
216910,4999,2006-05-11,Double Jeopardy!,QUOTATIONS,$800,"In his prime this athlete said, It's hard to be humble ""when you're as great as I am""",Muhammad Ali,800.0
216916,4999,2006-05-11,Double Jeopardy!,QUOTATIONS,"$2,200","Oscar Wilde called this 4-letter word ""the curse of the drinking classes""",work,2200.0
216922,4999,2006-05-11,Double Jeopardy!,QUOTATIONS,$1600,"A motto of hers was ""in politics, if you want anything said, ask a man; if you want anything done, ask a woman""",(Margaret) Thatcher,1600.0
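
`filter_category` requires an exact match, so 'HISTORY' would not pick up a category such as 'AMERICAN HISTORY'. If a looser match is useful, a sketch along the following lines could work. `filter_category_contains` is a hypothetical name, and the category names in the sample are chosen purely for illustration.

```python
import pandas as pd

def filter_category_contains(df, text):
    """Keep rows whose Category contains text, ignoring case (plain substring, not a regex)."""
    mask = df['Category'].str.contains(text, case=False, regex=False, na=False)
    return df[mask]

# Illustrative category names only.
sample = pd.DataFrame({'Category': ['HISTORY', 'AMERICAN HISTORY', 'SCIENCE CLASS']})
matches = filter_category_contains(sample, 'history')
print(matches)  # rows 0 and 1
```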


### Final Analysis
We're now able to analyse the data and gain some insights into the questions asked on Jeopardy! For this analysis we'll look at the questions given in the history category.

In [18]:
history_qs = filter_category(df, 'HISTORY')
history_qs_difficulty = history_qs['Float Value'].mean()
history_qs_unique_ans = count_unique_ans(history_qs)

In [19]:
history_qs_difficulty

603.7249283667621

In [20]:
history_qs_unique_ans

302

In [21]:
history_qs

Unnamed: 0,Show Number,Air Date,Round,Category,Value,Question,Answer,Float Value
0,4680,2004-12-31,Jeopardy!,HISTORY,$200,"For the last 8 years of his life, Galileo was under house arrest for espousing this man's theory",Copernicus,200.0
6,4680,2004-12-31,Jeopardy!,HISTORY,$400,"Built in 312 B.C. to link Rome & the South of Italy, it's still in use today",the Appian Way,400.0
12,4680,2004-12-31,Jeopardy!,HISTORY,$600,In 1000 Rajaraja I of the Cholas battled to take this Indian Ocean island now known for its tea,Ceylon (or Sri Lanka),600.0
18,4680,2004-12-31,Jeopardy!,HISTORY,$800,Karl led the first of these Marxist organizational efforts; the second one began in 1889,the International,800.0
24,4680,2004-12-31,Jeopardy!,HISTORY,$1000,"This Asian political party was founded in 1885 with ""Indian National"" as part of its name",the Congress Party,1000.0
...,...,...,...,...,...,...,...,...
216417,3644,2000-06-08,Double Jeopardy!,HISTORY,$200,These ancient people referred to themselves as Hellenes,Greeks,200.0
216423,3644,2000-06-08,Double Jeopardy!,HISTORY,$400,"During the third of these military expeditions, the Palestinian ports of Acre & Jaffa were captured, but not Jerusalem",Crusades,400.0
216429,3644,2000-06-08,Double Jeopardy!,HISTORY,$600,In 1840 regular transatlantic steamship service was inaugurated between Great Britain & this Nova Scotia capital,Halifax,600.0
216435,3644,2000-06-08,Double Jeopardy!,HISTORY,$800,In 1954 the French were defeated in the battle of this Vietnamese village; it was the decisive battle in the Indochina War,Dien Bien Phu,800.0


From this analysis we can see that the 349 questions in the history category have 302 distinct answers between them. The average difficulty of the 349 questions was about $604.
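
To compare several categories at once, the same three measures (number of questions, average value, number of distinct answers) could be computed with a single `groupby`. The snippet below is a sketch on made-up rows, not a result from the dataset; the output column names `questions`, `avg_value` and `unique_answers` are my own.

```python
import pandas as pd

# Made-up rows, for illustration only.
sample = pd.DataFrame({
    'Category': ['HISTORY', 'HISTORY', 'HISTORY', 'QUOTATIONS'],
    'Question': ['q1', 'q2', 'q3', 'q4'],
    'Answer': ['Greeks', 'Halifax', 'Greeks', 'Oprah'],
    'Float Value': [200.0, 600.0, 400.0, 600.0],
})

# Named aggregation: one output column per (input column, aggregation) pair.
summary = sample.groupby('Category').agg(
    questions=('Question', 'count'),
    avg_value=('Float Value', 'mean'),
    unique_answers=('Answer', 'nunique'),
)
print(summary)
```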