### MISSING DATA AND HOW TO FIX IT

+ Missing data is represented by NumPy `NaN` values.
+ Pandas treats `NaN` values as floats, which allows them to be used in vectorized operations.

+ Pandas also provides its own missing data type, `pd.NA`.
+ This allows missing values to be stored in nullable integer columns (for example `Int16`) instead of forcing a conversion to float.

In [1]:
import pandas as pd
import numpy as np

In [4]:
## with NaN values
sales = [0,5,155,np.nan, 518]
sales_series = pd.Series(sales, name = "Sales Data")
sales_series

0      0.0
1      5.0
2    155.0
3      NaN
4    518.0
Name: Sales Data, dtype: float64
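
Because `NaN` is a float, the series above still supports vectorized arithmetic. The following is a minimal sketch, reusing the same sales figures, of how `NaN` behaves in an element-wise operation and in a reduction. The variable names `doubled` and `total` are illustrative only.

```python
import numpy as np
import pandas as pd

# Same data as the cell above
sales_series = pd.Series([0, 5, 155, np.nan, 518], name="Sales Data")

# np.nan is an ordinary Python float, so the series is stored as float64
nan_type = type(np.nan)

# Element-wise arithmetic works; the missing entry stays NaN
doubled = sales_series * 2

# Reductions such as sum() skip NaN by default (skipna=True)
total = sales_series.sum()

print(nan_type)
print(doubled)
print(total)
```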

In [7]:
##  with Pandas NA values
sales = [0,5,155,pd.NA, 518]
sales_series = pd.Series(sales, name = "Sales Data", dtype = "Int16")
sales_series

0       0
1       5
2     155
3    <NA>
4     518
Name: Sales Data, dtype: Int16
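
As a rough illustration of why the nullable integer dtype is useful, the sketch below repeats the series above and applies an arithmetic operation to it. The names `plus_one` and `total` are illustrative; the point is that the missing entry propagates as `<NA>` while the dtype stays `Int16`.

```python
import pandas as pd

# Same data as the cell above, stored with the nullable Int16 dtype
sales_series = pd.Series([0, 5, 155, pd.NA, 518], name="Sales Data", dtype="Int16")

# Arithmetic keeps the integer dtype; the missing entry remains <NA>
plus_one = sales_series + 1

# Reductions skip <NA> by default, just as they skip NaN
total = sales_series.sum()

print(plus_one)
print(total)
```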

### IDENTIFYING THE MISSING VALUES

+ There are two methods to identify missing values in a pandas Series:
    + The `.isna()` method, which returns a boolean mask.
    + The `.value_counts(dropna=False)` method, which includes missing values in the counts.

In [13]:
## check each element for a missing value
print(sales_series.isna())
## print the total number of missing values
print("Total Number of Missing Values :--",sales_series.isna().sum())

0    False
1    False
2    False
3     True
4    False
Name: Sales Data, dtype: bool
Total Number of Missing Values :-- 1


In [19]:
## value_counts() excludes missing values by default
print(sales_series.value_counts())
print("*"*80)
## pass dropna=False to count the missing values as well
print(sales_series.value_counts(dropna=False))

0      1
155    1
5      1
518    1
Name: Sales Data, dtype: Int64
********************************************************************************
NaN    1
0      1
155    1
5      1
518    1
Name: Sales Data, dtype: Int64
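
The boolean mask returned by `.isna()` can also be used to find where the missing values are, not just how many there are. A minimal sketch, assuming the same `Int16` series as above (the names `missing_mask`, `missing_positions` and `present_count` are illustrative):

```python
import pandas as pd

sales_series = pd.Series([0, 5, 155, pd.NA, 518], name="Sales Data", dtype="Int16")

# Boolean mask: True where the value is missing
missing_mask = sales_series.isna()

# Index labels of the missing entries
missing_positions = sales_series[missing_mask].index.tolist()

# notna() is the inverse of isna(); its sum matches Series.count()
present_count = sales_series.notna().sum()

print(missing_positions)
print(present_count)
```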


### HANDLING THE MISSING DATA

There are two ways to deal with missing values:
+ Drop the missing values with the `.dropna()` method.
+ Fill the missing values with a replacement such as 0 or the mean, using `.fillna(value)`.

In [25]:
## with NaN values
sales = [0,5,155,np.nan, 518]
sales_series = pd.Series(sales, name = "Sales Data")
print(sales_series)
print("DROP THE MISSING VALUE BY DROPNA METHOD")
print(sales_series.dropna())
print("IMPUTE THE MISSING VALUE WITH THE MEAN VALUE")
print(sales_series.fillna(sales_series.mean()))

0      0.0
1      5.0
2    155.0
3      NaN
4    518.0
Name: Sales Data, dtype: float64
DROP THE MISSING VALUE BY DROPNA METHOD
0      0.0
1      5.0
2    155.0
4    518.0
Name: Sales Data, dtype: float64
IMPUTE THE MISSING VALUE WITH THE MEAN VALUE
0      0.0
1      5.0
2    155.0
3    169.5
4    518.0
Name: Sales Data, dtype: float64
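
The mean is only one possible fill value. The sketch below, using the same data, fills with 0 and with the median instead, and checks that `.fillna()` returns a new series rather than changing the original. Which fill value is appropriate depends on the data; the names `filled_zero` and `filled_median` are illustrative.

```python
import numpy as np
import pandas as pd

sales_series = pd.Series([0, 5, 155, np.nan, 518], name="Sales Data")

# Fill with a constant
filled_zero = sales_series.fillna(0)

# Fill with the median, which is less sensitive to outliers than the mean
filled_median = sales_series.fillna(sales_series.median())

# fillna() returns a new Series; the original still contains its NaN
remaining_missing = sales_series.isna().sum()

print(filled_zero)
print(filled_median)
print(remaining_missing)
```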
