# Jeopardy! Game
The main goal of this project is to practice writing several functions to:
1. investigate and clean a dataset of *Jeopardy! Game* questions and answers
2. filter the dataset for topics that the players are interested in
3. compute the average difficulty of the questions

```python
# Import the needed libraries
import pandas as pd
import datetime
import random
from time import sleep
```

```python
# Load the data
df = pd.read_csv('jeopardy.csv', parse_dates=[1])  # parse_dates converts the date column (position 1) to datetime type
pd.set_option('display.max_colwidth', None)  # display the full contents of the columns
```
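
To illustrate what `parse_dates=[1]` does, here is a small sketch on an in-memory CSV that copies the header layout of `jeopardy.csv` (the two rows are hypothetical stand-ins). The integer `1` refers to the column *position*, so the second column, `' Air Date'` with its leading space, is parsed into a datetime column while the others are left alone.

```python
import io
import pandas as pd

# Hypothetical two-row CSV with the same header layout as jeopardy.csv
csv_text = (
    "Show Number, Air Date, Round\n"
    "4680,2004-12-31,Jeopardy!\n"
    "4680,2004-12-31,Double Jeopardy!\n"
)

# parse_dates=[1] parses the column at position 1 (" Air Date") as dates
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=[1])

print(list(sample.columns))                    # ['Show Number', ' Air Date', ' Round']
print(sample[" Air Date"].dt.year.tolist())    # [2004, 2004]
```

Passing a position rather than a name avoids having to spell the leading space in `' Air Date'` before the columns are renamed.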

## Data Wrangling
In this section, I will check the data for quality issues (data types, column names, missing values), then clean the dataset to prepare it for the analysis.

```python
# Check the first few rows of the dataset
df.head()
```

|   | Show Number | Air Date | Round | Category | Value | Question | Answer |
|---|---|---|---|---|---|---|---|
| 0 | 4680 | 2004-12-31 | Jeopardy! | HISTORY | $200 | For the last 8 years of his life, Galileo was under house arrest for espousing this man's theory | Copernicus |
| 1 | 4680 | 2004-12-31 | Jeopardy! | ESPN's TOP 10 ALL-TIME ATHLETES | $200 | No. 2: 1912 Olympian; football star at Carlisle Indian School; 6 MLB seasons with the Reds, Giants & Braves | Jim Thorpe |
| 2 | 4680 | 2004-12-31 | Jeopardy! | EVERYBODY TALKS ABOUT IT... | $200 | The city of Yuma in this state has a record average of 4,055 hours of sunshine each year | Arizona |
| 3 | 4680 | 2004-12-31 | Jeopardy! | THE COMPANY LINE | $200 | In 1963, live on "The Art Linkletter Show", this company served its billionth burger | McDonald's |
| 4 | 4680 | 2004-12-31 | Jeopardy! | EPITAPHS & TRIBUTES | $200 | Signer of the Dec. of Indep., framer of the Constitution of Mass., second President of the United States | John Adams |


```python
# Check the data types
df.dtypes
```

```
Show Number             int64
 Air Date      datetime64[ns]
 Round                 object
 Category              object
 Value                 object
 Question              object
 Answer                object
dtype: object
```

> The **Value** column has the data type *object* (its values are strings such as `$200`). To do calculations with this column, it is necessary to convert it to the *float* type.
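
A minimal sketch of that conversion, assuming the values are strings such as `$200` or `$1,000`: strip the dollar sign and the thousands separator, then let `pd.to_numeric` convert the rest. The helper name `value_to_float` and the sample values are hypothetical, and `errors="coerce"` is a design choice that turns any entry that is not a plain number after cleaning into `NaN` instead of raising an error.

```python
import pandas as pd

def value_to_float(values: pd.Series) -> pd.Series:
    """Hypothetical helper: convert strings like '$1,000' to floats."""
    # regex=False so "$" is treated as a literal character, not an end-of-string anchor
    cleaned = values.str.replace("$", "", regex=False).str.replace(",", "", regex=False)
    # Entries that still are not numeric after cleaning become NaN
    return pd.to_numeric(cleaned, errors="coerce").astype(float)

sample = pd.Series(["$200", "$1,000", "unknown"])  # "unknown" is a made-up bad entry
print(value_to_float(sample).tolist())  # [200.0, 1000.0, nan]
```

After the columns are renamed, the same helper could be applied as `df['value'] = value_to_float(df['value'])`.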

```python
# Check the column names
df.columns
```

```
Index(['Show Number', ' Air Date', ' Round', ' Category', ' Value',
       ' Question', ' Answer'],
      dtype='object')
```

> There are leading spaces in front of most column names (e.g. `' Air Date'`), so I will rename all the columns to remove the spaces and give them a consistent lower-case, snake_case format.
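
As a more compact alternative to listing every name by hand, the pandas string methods on `df.columns` can strip, lower-case, and snake-case all names at once. This sketch uses an empty stand-in DataFrame with the raw column names shown above; it yields the same names as the explicit `rename` mapping used in this notebook.

```python
import pandas as pd

# Empty stand-in frame with the raw column names (leading spaces included)
raw = pd.DataFrame(columns=["Show Number", " Air Date", " Round", " Category",
                            " Value", " Question", " Answer"])

# Strip surrounding whitespace, lower-case, then replace inner spaces with underscores
raw.columns = raw.columns.str.strip().str.lower().str.replace(" ", "_", regex=False)

print(list(raw.columns))
# ['show_number', 'air_date', 'round', 'category', 'value', 'question', 'answer']
```

The explicit mapping is easier to audit, while the string-method chain also handles any column added later without editing a dictionary.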

```python
# Rename the columns
df.rename(columns={'Show Number': 'show_number',
                   ' Air Date': 'air_date',
                   ' Round': 'round',
                   ' Category': 'category',
                   ' Value': 'value',
                   ' Question': 'question',
                   ' Answer': 'answer'},
          inplace=True)
```

```python
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 216930 entries, 0 to 216929
Data columns (total 7 columns):
 #   Column       Non-Null Count   Dtype         
---  ------       --------------   -----         
 0   show_number  216930 non-null  int64         
 1   air_date     216930 non-null  datetime64[ns]
 2   round        216930 non-null  object        
 3   category     216930 non-null  object        
 4   value        216930 non-null  object        
 5   question     216930 non-null  object        
 6   answer       216928 non-null  object        
dtypes: datetime64[ns](1), int64(1), object(5)
memory usage: 11.6+ MB
```


> The dataset has **216 930 rows** and **7 columns**.

```python
# Check for null values
df.isnull().sum()
```

```
show_number    0
air_date       0
round          0
category       0
value          0
question       0
answer         2
dtype: int64
```

> The **answer** column has **2** null values.

```python
# Show the rows that have a null value in the answer column
answer_null_data = df[df['answer'].isnull()]
answer_null_data
```

|   | show_number | air_date | round | category | value | question | answer |
|---|---|---|---|---|---|---|---|
| 94817 | 4346 | 2003-06-23 | Jeopardy! | GOING "N"SANE | $200 | It often precedes "and void" | NaN |
| 143297 | 6177 | 2011-06-21 | Double Jeopardy! | NOTHING | $400 | This word for "nothing" precedes "and void" to mean "not valid" | NaN |


> The correct answer to both of these questions is the word **"Null"**, so instead of leaving these fields empty, I will fill them with the string **Null**.
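
A minimal sketch of that fill with `Series.fillna`, run on a hypothetical three-row stand-in for `df` (two missing answers plus one filled answer, so it is visible that existing answers are left untouched). On the real data the equivalent line would be `df['answer'] = df['answer'].fillna('Null')`.

```python
import pandas as pd

# Hypothetical stand-in: two rows with a missing answer, one with a real answer
sample = pd.DataFrame({
    "category": ['GOING "N"SANE', "NOTHING", "HISTORY"],
    "answer": [None, None, "Copernicus"],
})

# Replace only the missing answers with the literal string "Null"
sample["answer"] = sample["answer"].fillna("Null")

print(sample["answer"].tolist())       # ['Null', 'Null', 'Copernicus']
print(sample["answer"].isnull().sum()) # 0
```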