## Jeopardy Project

The following is a demonstration of fundamental knowledge and navigation skills within the pandas library, using data from the popular quiz show "Jeopardy!". The project idea and the original CSV file were provided by Codecademy.com, along with a list of problems for the student (myself) to solve using DataFrame manipulation and function-writing skills.

In [1]:
```python
import pandas as pd

# Show the full text of each column instead of truncating long questions
pd.set_option('display.max_colwidth', None)

jeopardy = pd.read_csv('jeopardy.csv')
```

The first step was to import the necessary library for the project (in this case, only pandas) and read the CSV file into a DataFrame named `jeopardy`. The `set_option()` call sets `'display.max_colwidth'` to `None`, so the full contents of each column are displayed instead of being truncated.
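The `set_option()` call changes the display setting for the whole session. If full-width output is only wanted for some cells, pandas also provides `pd.option_context`, which applies a setting temporarily and restores the previous value on exit. A minimal sketch (the variable names `inside` and `outside` are just for illustration):

```python
import pandas as pd

# Apply the full-width setting only inside this block
with pd.option_context('display.max_colwidth', None):
    inside = pd.get_option('display.max_colwidth')

# Outside the block the previous value (the default, in a fresh session) is back
outside = pd.get_option('display.max_colwidth')
print(inside, outside)
```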

In [2]:
```python
# Remove the leading space from every column header
jeopardy = jeopardy.rename(columns={
    ' Show Number': 'Show Number',
    ' Air Date': 'Air Date',
    ' Round': 'Round',
    ' Category': 'Category',
    ' Value': 'Value',
    ' Question': 'Question',
    ' Answer': 'Answer'
})
display(jeopardy.head())
```

| | Show Number | Air Date | Round | Category | Value | Question | Answer |
|---|---|---|---|---|---|---|---|
| 0 | 4680 | 2004-12-31 | Jeopardy! | HISTORY | $200 | For the last 8 years of his life, Galileo was under house arrest for espousing this man's theory | Copernicus |
| 1 | 4680 | 2004-12-31 | Jeopardy! | ESPN's TOP 10 ALL-TIME ATHLETES | $200 | No. 2: 1912 Olympian; football star at Carlisle Indian School; 6 MLB seasons with the Reds, Giants & Braves | Jim Thorpe |
| 2 | 4680 | 2004-12-31 | Jeopardy! | EVERYBODY TALKS ABOUT IT... | $200 | The city of Yuma in this state has a record average of 4,055 hours of sunshine each year | Arizona |
| 3 | 4680 | 2004-12-31 | Jeopardy! | THE COMPANY LINE | $200 | In 1963, live on "The Art Linkletter Show", this company served its billionth burger | McDonald's |
| 4 | 4680 | 2004-12-31 | Jeopardy! | EPITAPHS & TRIBUTES | $200 | Signer of the Dec. of Indep., framer of the Constitution of Mass., second President of the United States | John Adams |


The original CSV file contained a sneaky leading space in every column header (e.g. `' Question'` instead of `'Question'`). The code above uses the `DataFrame.rename()` method to remove that space, leaving clean column names to work with for the rest of the project.

Note: I imagine there is an easier way to write this code. Typing each column name twice doesn't seem practical for larger DataFrames with the same or similar issues.
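One shorter option is to strip whitespace from every header at once with the vectorized `str.strip()` on the column index, so no column name has to be typed. A minimal sketch on a small stand-in DataFrame built from the first row above (not the full `jeopardy.csv`):

```python
import pandas as pd

# Stand-in data whose headers carry a leading space, like the original CSV
df = pd.DataFrame({
    ' Show Number': [4680],
    ' Air Date': ['2004-12-31'],
    ' Answer': ['Copernicus'],
})

# Strip surrounding whitespace from all headers in one step
df.columns = df.columns.str.strip()

print(list(df.columns))
```

`df.rename(columns=str.strip)` does the same job, since `rename` also accepts a function that is applied to each label.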

In [3]:
```python
def filtered_df(data, words):
    # True when every word appears somewhere in the question, ignoring case
    contains_all = lambda x: all(word.lower() in x.lower() for word in words)
    return data.loc[data['Question'].apply(contains_all)]

filtered_questions = filtered_df(jeopardy, ['sports', 'science'])
display(filtered_questions)
```

| | Show Number | Air Date | Round | Category | Value | Question | Answer |
|---|---|---|---|---|---|---|---|
| 160141 | 6139 | 2011-04-28 | Double Jeopardy! | SPORTS SCIENCE | $400 | (Sarah of the clue crew gives the clue from the Gatorade Sports Science Institute in Barrington, IL) Strength training breaks down muscle fibers but the rebuilding process requires this, made of amino acids, it's a key part of Gatorade's recovery drinks for after your workout | protein |
| 160147 | 6139 | 2011-04-28 | Double Jeopardy! | SPORTS SCIENCE | $800 | (Jimmy of the clue crew gives the clue from the Gatorade Sports Science Institute in Barrington, IL) High performance liquid chromatography can be used to separate sucrose into its monosaccharide components of fructose and this sugar that's a primary fuel for muscle cells | glucose |
| 160153 | 6139 | 2011-04-28 | Double Jeopardy! | SPORTS SCIENCE | $1200 | (Sarah of the Clue Crew gives the clue from the Gatorade Sports Science Institute in Barrington, IL.) The concentration of these two expired gases determined by a metabolic chart tells athletes how many carbs as opposed to fat they're burning; burning a higher percentage of fat increases endurance | oxygen & carbon dioxide |
| 160159 | 6139 | 2011-04-28 | Double Jeopardy! | SPORTS SCIENCE | $1600 | (Sarah of the clue crew gives the clue from the Gatorade Sports Science Institute in Barrington, IL) The resistance changes along with the terrain on the screen as cyclists use this type of electronic device that evaluates their performance literally a "work measure" | an ergometer |
| 160165 | 6139 | 2011-04-28 | Double Jeopardy! | SPORTS SCIENCE | $2000 | (Jimmy of the clue crew gives the clue from the Gatorade Sports Science Institute in Barrington, IL) The environmental chamber controls humidity and temperature to examine an athlete's sweating response and excretion of these which are basically minerals that help your cells carry impulses | electrolytes |


I defined a function called `filtered_df` that takes a DataFrame and a list of words, and returns the rows whose 'Question' column contains every one of those words. The words must be passed as a list even when there is only one: a bare string would be iterated character by character. I then called the function with the words 'sports' and 'science' as an example.


Note: This function has lots of room for improvement. I handled capitalization by lower-casing both sides inside a simple lambda function, but there are still many holes; for example, substrings of longer words (e.g. 'sports' also matches 'sportsmanship') and punctuation aren't accounted for.
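One way to close the substring gap is to match whole words with a regular-expression word boundary (`\b`), which treats punctuation and spaces as word edges. The sketch below swaps the lambda for pandas' vectorized `Series.str.contains`; the name `filter_by_words`, its `column` parameter, and the sample questions are hypothetical and chosen only for illustration.

```python
import re
import pandas as pd

def filter_by_words(data, words, column='Question'):
    """Return rows whose `column` contains every word in `words` as a whole word, ignoring case."""
    # Accept a single word as well as a list, so 'sports' is not split into characters
    if isinstance(words, str):
        words = [words]
    mask = pd.Series(True, index=data.index)
    for word in words:
        # \b anchors the match at word edges; re.escape protects characters such as '.' or '?'
        pattern = r'\b' + re.escape(word) + r'\b'
        mask &= data[column].str.contains(pattern, case=False, regex=True, na=False)
    return data.loc[mask]

sample = pd.DataFrame({'Question': [
    'This King of Sports was a science teacher.',
    'Sportsmanship matters in science fairs',
    'SCIENCE and sports, together at last!',
]})
print(filter_by_words(sample, ['sports', 'science']))
```

Here the second question is excluded because 'sports' only appears inside 'Sportsmanship', while the trailing comma in the third question does not prevent a match.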

In [4]:
```python
# Convert '$200'-style strings to floats; 'None' (no dollar value) becomes 0
jeopardy['Float_Values'] = jeopardy['Value'].apply(
    lambda x: float(x[1:].replace(',', '')) if x != 'None' else 0.0
)
sports_questions = filtered_df(jeopardy, ['sports'])
print(sports_questions['Float_Values'].mean())
```

655.193482688391


Next, I wanted to perform some statistics on the DataFrame. The 'Value' column of the original DataFrame stores dollar amounts as strings (e.g. '$200'), so it wasn't much use for calculations. I applied a lambda function to each value to create a new 'Float_Values' column holding the dollar amount as a float, with 'None' (questions that have no dollar value) mapped to 0. I then used a simple `.mean()` to get the average value of the 'sports' questions from the filtered DataFrame created earlier, as a rough proxy for their difficulty.
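Because 'None' is mapped to 0, those questions are counted in the mean and pull it down. If they should be left out instead, one alternative is to strip the '$' and ',' characters and let `pd.to_numeric(..., errors='coerce')` turn anything non-numeric into `NaN`, which `.mean()` skips by default. A minimal sketch on made-up sample values (not taken from the dataset):

```python
import pandas as pd

values = pd.Series(['$200', '$1,200', 'None', '$2,000'])

# Approach used above: 'None' becomes 0 and still counts toward the mean
as_zero = values.apply(lambda x: float(x[1:].replace(',', '')) if x != 'None' else 0.0)

# Alternative: remove '$' and ',' then coerce; 'None' becomes NaN and is skipped by .mean()
as_nan = pd.to_numeric(values.str.replace(r'[$,]', '', regex=True), errors='coerce')

print(as_zero.mean(), as_nan.mean())
```

The coerced version also tolerates values that are already `NaN`, since `str.replace` and `to_numeric` pass missing values through unchanged.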