# Handling Missing Data Values
- As a data analyst or data scientist, you will deal with messy data when working on real-world business use cases.
- Knowing how to handle invalid or missing data is therefore an important skill.
- This notebook covers handling null (missing) values.
- It covers the following pandas methods:
    1. fillna
    2. dropna
    3. interpolate
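As a quick preview of the three methods listed above, here is a minimal sketch on a small made-up Series. The values are hypothetical and are not taken from `weatherHistory.csv`.

```python
import numpy as np
import pandas as pd

# A tiny, made-up temperature series with two gaps (NaN marks a missing value)
s = pd.Series([10.0, np.nan, 14.0, np.nan, 18.0])

filled = s.fillna(0)            # replace every NaN with a constant
dropped = s.dropna()            # drop the entries whose value is NaN
interpolated = s.interpolate()  # default method='linear': fill each gap from its neighbours

print(filled.tolist())        # [10.0, 0.0, 14.0, 0.0, 18.0]
print(dropped.tolist())       # [10.0, 14.0, 18.0]
print(interpolated.tolist())  # [10.0, 12.0, 14.0, 16.0, 18.0]
```

`fillna` keeps every row but substitutes a value you choose, `dropna` keeps only real observations at the cost of losing rows, and `interpolate` estimates each gap from the surrounding values, which usually suits ordered measurements such as hourly temperatures.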

In [11]:
import pandas as pd

In [12]:
df = pd.read_csv("weatherHistory.csv")
df

| | Formatted_Date | Summary | Precip_Type | Temperature_C | Apparent_Temperature_C | Humidity | Wind_Speed_kmh | Wind_Bearing_degrees | Visibility_km | Loud_Cover | Pressure _millibars | Daily_Summary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2006-04-01 00:00:00.000 +0200 | Partly Cloudy | rain | 9.472222 | 7.388889 | 0.89 | 14.1197 | 251 | 15.8263 | 0 | 1015.13 | Partly cloudy throughout the day. |
| 1 | 2006-04-01 01:00:00.000 +0200 | Partly Cloudy | rain | 9.355556 | 7.227778 | 0.86 | 14.2646 | 259 | 15.8263 | 0 | 1015.63 | Partly cloudy throughout the day. |
| 2 | 2006-04-01 02:00:00.000 +0200 | Mostly Cloudy | rain | 9.377778 | 9.377778 | 0.89 | 3.9284 | 204 | 14.9569 | 0 | 1015.94 | Partly cloudy throughout the day. |
| 3 | 2006-04-01 03:00:00.000 +0200 | Partly Cloudy | rain | 8.288889 | 5.944444 | 0.83 | 14.1036 | 269 | 15.8263 | 0 | 1016.41 | Partly cloudy throughout the day. |
| 4 | 2006-04-01 04:00:00.000 +0200 | Mostly Cloudy | rain | 8.755556 | 6.977778 | 0.83 | 11.0446 | 259 | 15.8263 | 0 | 1016.51 | Partly cloudy throughout the day. |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 96448 | 2016-09-09 19:00:00.000 +0200 | Partly Cloudy | rain | 26.016667 | 26.016667 | 0.43 | 10.9963 | 31 | 16.1000 | 0 | 1014.36 | Partly cloudy starting in the morning. |
| 96449 | 2016-09-09 20:00:00.000 +0200 | Partly Cloudy | rain | 24.583333 | 24.583333 | 0.48 | 10.0947 | 20 | 15.5526 | 0 | 1015.16 | Partly cloudy starting in the morning. |
| 96450 | 2016-09-09 21:00:00.000 +0200 | Partly Cloudy | rain | 22.038889 | 22.038889 | 0.56 | 8.9838 | 30 | 16.1000 | 0 | 1015.66 | Partly cloudy starting in the morning. |
| 96451 | 2016-09-09 22:00:00.000 +0200 | Partly Cloudy | rain | 21.522222 | 21.522222 | 0.60 | 10.5294 | 20 | 16.1000 | 0 | 1015.95 | Partly cloudy starting in the morning. |


In [13]:
type(df.Formatted_Date) #Data Type of Formatted_Date is Series

pandas.core.series.Series

In [14]:
type(df.Formatted_Date[0])    # [0] selects the first value, so this checks the type of the individual elements

str

## parse_dates=['name of column']
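- This option converts the listed date columns into datetime values (a `datetime64` dtype such as `datetime64[ns]`) while the data is being read.
- It saves a separate conversion step, such as calling `pd.to_datetime`, after loading.
- Each element of a parsed column is a pandas `Timestamp`, i.e. a single date and time. When the strings carry UTC offsets, as `Formatted_Date` does, the parsed values are timezone-aware.
- It can parse a single column (pass the column name or position in a list) or several columns (pass a list of column names).
- Syntax:
    1. `df = pd.read_csv(".csv", parse_dates=['Date_Column_Name'])`
    2. `df = pd.read_excel(".xlsx", parse_dates=['Date_Column_Name'])`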
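The sketch below exercises both forms, a single column and several columns, on an in-memory CSV (via `io.StringIO`) so it runs without `weatherHistory.csv`. The column names `start`, `end` and `reading` and their values are hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a real file
csv_text = """start,end,reading
2006-04-01 00:00:00,2006-04-01 01:00:00,9.47
2006-04-01 01:00:00,2006-04-01 02:00:00,9.36
"""

# Parse a single column: pass its name inside a list
one = pd.read_csv(io.StringIO(csv_text), parse_dates=["start"])
print(one.dtypes)  # 'start' gets a datetime64 dtype, 'end' stays as text

# Parse several columns: list every column name
both = pd.read_csv(io.StringIO(csv_text), parse_dates=["start", "end"])
print(both.dtypes)
print(type(both["start"][0]))  # a pandas Timestamp
```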

In [15]:
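df_dates = pd.read_csv("weatherHistory.csv", parse_dates=['Formatted_Date'])
type(df.Formatted_Date[0])  # note: this still inspects the original `df`, which was read without parse_dates

str

Because this cell checks `df` rather than `df_dates`, the result is still `str`. Running `type(df_dates.Formatted_Date[0])` instead inspects the parsed column; in the run shown here it holds datetime values rather than strings, as the index in the next output (`2006-04-01 00:00:00+02:00`) shows.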
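If `parse_dates` was not used, or the UTC offsets in a column differ between rows (for example across a daylight-saving change, where pandas may not produce a single datetime dtype depending on the version), the strings can be converted after loading with `pd.to_datetime`. A minimal sketch, assuming strings in the same layout as `Formatted_Date`; the second value with `+0100` is made up for illustration.

```python
import pandas as pd

# Strings in the Formatted_Date layout; the +0100 entry is hypothetical
raw = pd.Series(["2006-04-01 00:00:00.000 +0200",
                 "2006-10-29 03:00:00.000 +0100"])

# utc=True converts every value to UTC, so mixed offsets end up in one timezone-aware dtype
parsed = pd.to_datetime(raw, utc=True)
print(parsed.dtype)  # a timezone-aware datetime64 dtype in UTC
print(parsed[0])     # 2006-03-31 22:00:00+00:00
```

Applied to the notebook's data this would be `df["Formatted_Date"] = pd.to_datetime(df["Formatted_Date"], utc=True)`. Normalizing to UTC is one choice among several; `.dt.tz_convert(...)` can move the values back to a local time zone afterwards.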

#### Using `.set_index` to make the date the index instead of 0, 1, 2, 3, ...

In [16]:
df_dates = df_dates.set_index('Formatted_Date')  # reassigning is equivalent to inplace=True
df_dates

| Formatted_Date | Summary | Precip_Type | Temperature_C | Apparent_Temperature_C | Humidity | Wind_Speed_kmh | Wind_Bearing_degrees | Visibility_km | Loud_Cover | Pressure _millibars | Daily_Summary |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2006-04-01 00:00:00+02:00 | Partly Cloudy | rain | 9.472222 | 7.388889 | 0.89 | 14.1197 | 251 | 15.8263 | 0 | 1015.13 | Partly cloudy throughout the day. |
| 2006-04-01 01:00:00+02:00 | Partly Cloudy | rain | 9.355556 | 7.227778 | 0.86 | 14.2646 | 259 | 15.8263 | 0 | 1015.63 | Partly cloudy throughout the day. |
| 2006-04-01 02:00:00+02:00 | Mostly Cloudy | rain | 9.377778 | 9.377778 | 0.89 | 3.9284 | 204 | 14.9569 | 0 | 1015.94 | Partly cloudy throughout the day. |
| 2006-04-01 03:00:00+02:00 | Partly Cloudy | rain | 8.288889 | 5.944444 | 0.83 | 14.1036 | 269 | 15.8263 | 0 | 1016.41 | Partly cloudy throughout the day. |
| 2006-04-01 04:00:00+02:00 | Mostly Cloudy | rain | 8.755556 | 6.977778 | 0.83 | 11.0446 | 259 | 15.8263 | 0 | 1016.51 | Partly cloudy throughout the day. |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2016-09-09 19:00:00+02:00 | Partly Cloudy | rain | 26.016667 | 26.016667 | 0.43 | 10.9963 | 31 | 16.1000 | 0 | 1014.36 | Partly cloudy starting in the morning. |
| 2016-09-09 20:00:00+02:00 | Partly Cloudy | rain | 24.583333 | 24.583333 | 0.48 | 10.0947 | 20 | 15.5526 | 0 | 1015.16 | Partly cloudy starting in the morning. |
| 2016-09-09 21:00:00+02:00 | Partly Cloudy | rain | 22.038889 | 22.038889 | 0.56 | 8.9838 | 30 | 16.1000 | 0 | 1015.66 | Partly cloudy starting in the morning. |
| 2016-09-09 22:00:00+02:00 | Partly Cloudy | rain | 21.522222 | 21.522222 | 0.60 | 10.5294 | 20 | 16.1000 | 0 | 1015.95 | Partly cloudy starting in the morning. |
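Having timestamps in the index also matters for missing data: `Series.interpolate(method="time")` weights each fill by the elapsed time between observations, and it only works with a `DatetimeIndex`. A minimal sketch on made-up readings with an uneven gap (the values are hypothetical), assuming the index has been parsed into a proper `DatetimeIndex`.

```python
import numpy as np
import pandas as pd

# Made-up readings at 00:00, 01:00, 02:00 and 05:00; the 02:00 value is missing
idx = pd.to_datetime(["2006-04-01 00:00", "2006-04-01 01:00",
                      "2006-04-01 02:00", "2006-04-01 05:00"])
temps = pd.Series([9.0, 10.0, np.nan, 14.0], index=idx)

# With a DatetimeIndex, rows can be selected by date/time labels
print(temps.loc["2006-04-01 01:00":"2006-04-01 02:00"])

# Default linear interpolation treats rows as equally spaced, so the gap becomes 12.0
print(temps.interpolate().tolist())

# method="time" uses the real gaps: 02:00 is 1 h after 10.0 and 3 h before 14.0, giving 11.0
print(temps.interpolate(method="time").round(6).tolist())
```

In this notebook the same call would look like `df_dates['Temperature_C'].interpolate(method='time')`, provided `df_dates` has a `DatetimeIndex`; if the offsets left the index as plain objects, convert it first (for example with `pd.to_datetime(..., utc=True)`).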
