# Winning Jeopardy

Jeopardy is a popular TV show in the US where participants answer questions to win money. Let's say we want to compete on Jeopardy, and we're looking for any edge we can get to win. In this project we will figure out some patterns in the questions that could help us win.

First, we load the dataset from a CSV file. It contains around 200,000 rows of Jeopardy questions going back to the show's beginning.

In [33]:
import pandas as pd

jeopardy = pd.read_csv('JEOPARDY_CSV.csv')
jeopardy.head()

Unnamed: 0,Show Number,Air Date,Round,Category,Value,Question,Answer
0,4680,2004-12-31,Jeopardy!,HISTORY,$200,"For the last 8 years of his life, Galileo was ...",Copernicus
1,4680,2004-12-31,Jeopardy!,ESPN's TOP 10 ALL-TIME ATHLETES,$200,No. 2: 1912 Olympian; football star at Carlisl...,Jim Thorpe
2,4680,2004-12-31,Jeopardy!,EVERYBODY TALKS ABOUT IT...,$200,The city of Yuma in this state has a record av...,Arizona
3,4680,2004-12-31,Jeopardy!,THE COMPANY LINE,$200,"In 1963, live on ""The Art Linkletter Show"", th...",McDonald's
4,4680,2004-12-31,Jeopardy!,EPITAPHS & TRIBUTES,$200,"Signer of the Dec. of Indep., framer of the Co...",John Adams


Some of the column names contain spaces, including leading ones. Let's remove them.

In [34]:
# remove every space from the column names
jeopardy_columns = jeopardy.columns.str.replace(' ', '')
# assign the cleaned names back to the dataframe
jeopardy.columns = jeopardy_columns
jeopardy.columns

Index(['ShowNumber', 'AirDate', 'Round', 'Category', 'Value', 'Question',
       'Answer'],
      dtype='object')
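
Note that the call above removes every space, including the ones inside a name (`Show Number` becomes `ShowNumber`). If only the leading and trailing whitespace should go, `str.strip()` is an alternative. Here is a minimal sketch on a toy frame; the `sample` frame and its headers are assumptions made for illustration, not the real file:

```python
import pandas as pd

# Toy frame with hypothetical headers that mimic leading spaces in a CSV
sample = pd.DataFrame(columns=['Show Number', ' Air Date', ' Round'])

# strip() trims only leading/trailing whitespace and keeps inner spaces
stripped = sample.columns.str.strip()

# replace() removes every space, matching the approach used above
compact = sample.columns.str.replace(' ', '', regex=False)

print(list(stripped))
print(list(compact))
```

The space-free names have the advantage of working with attribute access (`jeopardy.AirDate`), which is one reason to prefer the approach used in this project.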

We also convert the Value column to integers.

In [35]:
# show all unique values for Value column
print(jeopardy['Value'].unique())

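# remove the '$' and ',' symbols and replace the 'None' string with '0'
# regex=False treats '$' as a literal character rather than a regex anchor
jeopardy['Value'] = jeopardy['Value'].str.replace('$', '', regex=False)
jeopardy['Value'] = jeopardy['Value'].str.replace(',', '', regex=False)
jeopardy['Value'] = jeopardy['Value'].str.replace('None', '0', regex=False)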

# convert Value column type to int
jeopardy['Value'] = jeopardy['Value'].astype(int)

# record the unit in the column name, since the '$' sign is gone from the values
jeopardy.rename({'Value': 'Value, $'}, axis=1, inplace=True)

['$200' '$400' '$600' '$800' '$2,000' '$1000' '$1200' '$1600' '$2000'
 '$3,200' 'None' '$5,000' '$100' '$300' '$500' '$1,000' '$1,500' '$1,200'
 '$4,800' '$1,800' '$1,100' '$2,200' '$3,400' '$3,000' '$4,000' '$1,600'
 '$6,800' '$1,900' '$3,100' '$700' '$1,400' '$2,800' '$8,000' '$6,000'
 '$2,400' '$12,000' '$3,800' '$2,500' '$6,200' '$10,000' '$7,000' '$1,492'
 '$7,400' '$1,300' '$7,200' '$2,600' '$3,300' '$5,400' '$4,500' '$2,100'
 '$900' '$3,600' '$2,127' '$367' '$4,400' '$3,500' '$2,900' '$3,900'
 '$4,100' '$4,600' '$10,800' '$2,300' '$5,600' '$1,111' '$8,200' '$5,800'
 '$750' '$7,500' '$1,700' '$9,000' '$6,100' '$1,020' '$4,700' '$2,021'
 '$5,200' '$3,389' '$4,200' '$5' '$2,001' '$1,263' '$4,637' '$3,201'
 '$6,600' '$3,700' '$2,990' '$5,500' '$14,000' '$2,700' '$6,400' '$350'
 '$8,600' '$6,300' '$250' '$3,989' '$8,917' '$9,500' '$1,246' '$6,435'
 '$8,800' '$2,222' '$2,746' '$10,400' '$7,600' '$6,700' '$5,100' '$13,200'
 '$4,300' '$1,407' '$12,400' '$5,401' '$7,800' '$1,183' '$1,203
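
The three chained replacements can also be collapsed into a single pass. The sketch below is an alternative, not the method used in this project: it runs on a small hypothetical `raw` series built from values seen in the output above, strips both symbols with one character-class regex, and lets `pd.to_numeric` handle the `'None'` entries.

```python
import pandas as pd

# Small sample of raw values copied from the unique() output above
raw = pd.Series(['$200', '$2,000', 'None', '$1,492'])

# Inside a character class, '$' is a literal, so this strips '$' and ','
cleaned = raw.str.replace(r'[$,]', '', regex=True)

# errors='coerce' turns non-numeric strings such as 'None' into NaN,
# which is then mapped to 0 to match the choice made above
values = pd.to_numeric(cleaned, errors='coerce').fillna(0).astype(int)

print(values.tolist())
```

Mapping missing values to 0 follows the choice made in this project. Leaving them as NaN would instead keep those rows out of any later averages.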

We also convert the AirDate column from strings to datetime values.

In [37]:
# convert to datetime
jeopardy['AirDate'] = pd.to_datetime(jeopardy['AirDate'])

# show the dataframe's columns and their types
jeopardy.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 216930 entries, 0 to 216929
Data columns (total 7 columns):
ShowNumber    216930 non-null int64
AirDate       216930 non-null datetime64[ns]
Round         216930 non-null object
Category      216930 non-null object
Value, $      216930 non-null int32
Question      216930 non-null object
Answer        216928 non-null object
dtypes: datetime64[ns](1), int32(1), int64(1), object(4)
memory usage: 7.4+ MB
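
Once the column has the `datetime64[ns]` dtype, date components are available through the `.dt` accessor. Here is a minimal sketch, assuming the `YYYY-MM-DD` layout shown in the preview; the `dates` series is a stand-in for the real column, and its second date is a made-up value for illustration:

```python
import pandas as pd

# First date comes from the preview above; the second is hypothetical
dates = pd.Series(['2004-12-31', '2010-07-06'])

# An explicit format skips per-value inference and rejects other layouts
parsed = pd.to_datetime(dates, format='%Y-%m-%d')

# The .dt accessor exposes components such as the year
print(parsed.dt.year.tolist())
```

Passing `format` is optional here, since `pd.to_datetime` can infer this layout on its own, but it makes the expected input explicit.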
