In [43]:
import pandas as pd
pd.set_option('display.max_colwidth', None)

data = pd.read_csv('jeopardy.csv')

data.head()

Unnamed: 0,Show Number,Air Date,Round,Category,Value,Question,Answer
0,4680,2004-12-31,Jeopardy!,HISTORY,$200,"For the last 8 years of his life, Galileo was under house arrest for espousing this man's theory",Copernicus
1,4680,2004-12-31,Jeopardy!,ESPN's TOP 10 ALL-TIME ATHLETES,$200,"No. 2: 1912 Olympian; football star at Carlisle Indian School; 6 MLB seasons with the Reds, Giants & Braves",Jim Thorpe
2,4680,2004-12-31,Jeopardy!,EVERYBODY TALKS ABOUT IT...,$200,"The city of Yuma in this state has a record average of 4,055 hours of sunshine each year",Arizona
3,4680,2004-12-31,Jeopardy!,THE COMPANY LINE,$200,"In 1963, live on ""The Art Linkletter Show"", this company served its billionth burger",McDonald's
4,4680,2004-12-31,Jeopardy!,EPITAPHS & TRIBUTES,$200,"Signer of the Dec. of Indep., framer of the Constitution of Mass., second President of the United States",John Adams


In [44]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 216930 entries, 0 to 216929
Data columns (total 7 columns):
 #   Column       Non-Null Count   Dtype 
---  ------       --------------   ----- 
 0   Show Number  216930 non-null  int64 
 1    Air Date    216930 non-null  object
 2    Round       216930 non-null  object
 3    Category    216930 non-null  object
 4    Value       216930 non-null  object
 5    Question    216930 non-null  object
 6    Answer      216928 non-null  object
dtypes: int64(1), object(6)
memory usage: 11.6+ MB


The `info()` output shows a leading space in every column name except `Show Number`, so I'm going to rename the columns to remove the whitespace.

In [45]:
data.columns = ['Show Number', 'Air Date', 'Round', 'Category', 'Value', 'Question', 'Answer']
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 216930 entries, 0 to 216929
Data columns (total 7 columns):
 #   Column       Non-Null Count   Dtype 
---  ------       --------------   ----- 
 0   Show Number  216930 non-null  int64 
 1   Air Date     216930 non-null  object
 2   Round        216930 non-null  object
 3   Category     216930 non-null  object
 4   Value        216930 non-null  object
 5   Question     216930 non-null  object
 6   Answer       216928 non-null  object
dtypes: int64(1), object(6)
memory usage: 11.6+ MB
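
Typing the names out by hand works fine for seven columns. As an alternative sketch, the whitespace could also be stripped programmatically with `Index.str.strip()`. The small `demo` frame below is a made-up stand-in for the real data, not part of the notebook.

```python
import pandas as pd

# Hypothetical stand-in with the same leading-space problem as the real columns
demo = pd.DataFrame(columns=['Show Number', ' Air Date', ' Round', ' Value'])

# Strip leading and trailing whitespace from every column name at once
demo.columns = demo.columns.str.strip()

print(list(demo.columns))
```

This avoids retyping the names and keeps working if the file gains or loses columns.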


Now I'm going to proceed with converting the `Air Date` column to a datetime dtype.

In [47]:
# convert Air Date from strings to datetime64
data['Air Date'] = pd.to_datetime(data['Air Date'], format='%Y-%m-%d')

# view data info to see if the Dtype changed
data.info() 

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 216930 entries, 0 to 216929
Data columns (total 7 columns):
 #   Column       Non-Null Count   Dtype         
---  ------       --------------   -----         
 0   Show Number  216930 non-null  int64         
 1   Air Date     216930 non-null  datetime64[ns]
 2   Round        216930 non-null  object        
 3   Category     216930 non-null  object        
 4   Value        216930 non-null  object        
 5   Question     216930 non-null  object        
 6   Answer       216928 non-null  object        
dtypes: datetime64[ns](1), int64(1), object(5)
memory usage: 11.6+ MB
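
With a real datetime dtype, the `.dt` accessor and date comparisons become available. Here is a minimal sketch of what that allows, using a hypothetical `demo` frame whose dates (apart from the first) are made up for illustration.

```python
import pandas as pd

# Hypothetical sample dates in the same YYYY-MM-DD format as the CSV
demo = pd.DataFrame({'Air Date': ['2004-12-31', '1996-03-15', '2010-07-06']})
demo['Air Date'] = pd.to_datetime(demo['Air Date'], format='%Y-%m-%d')

# Pull out the year of each air date
years = demo['Air Date'].dt.year.tolist()

# Keep only the rows that aired on or after 1 January 2000
recent = demo[demo['Air Date'] >= pd.Timestamp('2000-01-01')]

print(years)
print(len(recent))
```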


The `Value` column holds the amount each `Question` is worth in USD, stored as text. To make it easier to work with, let's rename the column to `Value ($)` and convert it to `float64`.

There are 2 things we'll need to remove first:
* the `$` sign
* the commas that separate thousands (e.g. 2,000)

In [66]:
# rename column 
data.rename(columns={'Value': 'Value ($)'}, inplace=True)

# remove $ and thousands separators; .strip() only trims the ends of a string,
# so .replace() is needed for the comma inside values like '2,000'
data['Value ($)'] = data['Value ($)'].apply(lambda x: x.replace('$', '').replace(',', ''))

# check to see if $ and the commas were removed
data.iloc[22]

# convert values to float
# data['Value ($)'] = data['Value ($)'].astype('float64')

# data[data['Value ($)'] == '2,000']

Show Number                                                                                    4680
Air Date                                                                        2004-12-31 00:00:00
Round                                                                                     Jeopardy!
Category                                                                        EPITAPHS & TRIBUTES
Value ($)                                                                                      2000
Question       1939 Oscar winner: "...you are a credit to your craft, your race and to your family"
Answer                                         Hattie McDaniel (for her role in Gone with the Wind)
Name: 22, dtype: object
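
The float conversion is still commented out in the cell above. One possible way to finish it is sketched below on a few made-up values. `str.replace` removes a character wherever it occurs, whereas `str.strip` only trims the ends of a string. The `'None'` entry is an assumption about what a non-numeric value might look like. `pd.to_numeric(..., errors='coerce')` turns anything it cannot parse into `NaN` instead of raising, which a plain `.astype('float64')` would do.

```python
import pandas as pd

# Hypothetical sample values; 'None' stands in for a possible non-numeric entry
demo = pd.DataFrame({'Value ($)': ['$200', '$2,000', 'None']})

# Remove the dollar sign and thousands separators anywhere in the string
cleaned = (
    demo['Value ($)']
    .str.replace('$', '', regex=False)
    .str.replace(',', '', regex=False)
)

# Convert to float; entries that cannot be parsed become NaN
demo['Value ($)'] = pd.to_numeric(cleaned, errors='coerce')

print(demo['Value ($)'].dtype)
```

Counting the `NaN` values afterwards with `demo['Value ($)'].isna().sum()` is a quick way to see how many entries did not parse.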