# Intro to NaN values
NaN ("Not a Number") is a special floating-point value that NumPy and pandas use to represent missing values.
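To make this concrete, here is a minimal sketch of how NaN behaves as a plain value. The variable names are illustrative only and are not part of the notebook.

```python
import math

import numpy as np
import pandas as pd

nan_value = np.nan

# NaN is an ordinary Python float, not a separate type.
is_float = isinstance(nan_value, float)

# NaN never compares equal to anything, including itself,
# so `x == np.nan` cannot be used to find missing values.
equals_itself = nan_value == nan_value

# Use a dedicated check instead.
detected_by_math = math.isnan(nan_value)
detected_by_pandas = pd.isna(nan_value)

print(is_float, equals_itself, detected_by_math, detected_by_pandas)
# prints: True False True True
```

This is why the rest of the notebook relies on isna() rather than on equality comparisons.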

In [1]:
import numpy as np
import pandas as pd

In [2]:
sales = pd.read_csv("sales.csv", index_col=0)

In [3]:
sales

Unnamed: 0,Mon,Tue,Wed,Thu,Fri
Steven,34,27.0,15,,33
Mike,45,9.0,74,87.0,12
Andi,17,,54,8.0,29
Paul,87,67.0,no data,45.0,7


Thu has only 3 non-null values because its empty cell was read as NaN.
The blank entry in Tue is a whitespace string, which is not recognized as missing, so Tue still reports 4 values.
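The difference between a real NaN and a blank string can be checked on a small example. The two Series below are hypothetical stand-ins for the Tue and Thu columns, built by hand rather than read from sales.csv.

```python
import numpy as np
import pandas as pd

names = ["Steven", "Mike", "Andi", "Paul"]

# Hypothetical stand-ins for the Tue and Thu columns.
tue = pd.Series(["27", "9", " ", "67"], index=names)
thu = pd.Series([np.nan, 87.0, 8.0, 45.0], index=names)

# count() returns the number of non-missing entries.
tue_count = tue.count()  # the " " string still counts as a value
thu_count = thu.count()  # the NaN does not

# A whitespace-only string has to be detected explicitly.
blank_mask = tue.str.strip() == ""

print(tue_count, thu_count)
print(blank_mask[blank_mask].index.tolist())
```

The mask picks out the blank cell, which isna() would report as a normal value.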

## The info() method shows the non-null count of each column

In [4]:
sales.info()

<class 'pandas.core.frame.DataFrame'>
Index: 4 entries, Steven to Paul
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Mon     4 non-null      int64  
 1   Tue     4 non-null      object 
 2   Wed     4 non-null      object 
 3   Thu     3 non-null      float64
 4   Fri     4 non-null      int64  
dtypes: float64(1), int64(2), object(2)
memory usage: 192.0+ bytes


We can also set NaN values manually

In [5]:
sales.iloc[1,1] = np.nan

In [6]:
sales

Unnamed: 0,Mon,Tue,Wed,Thu,Fri
Steven,34,27.0,15,,33
Mike,45,,74,87.0,12
Andi,17,,54,8.0,29
Paul,87,67.0,no data,45.0,7


Or with None, which pandas also treats as missing:

In [7]:
sales.iloc[1,1] = None

In [8]:
sales

Unnamed: 0,Mon,Tue,Wed,Thu,Fri
Steven,34,27.0,15,,33
Mike,45,,74,87.0,12
Andi,17,,54,8.0,29
Paul,87,67.0,no data,45.0,7
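How None is stored depends on the column's dtype. The sketch below uses two small hypothetical Series (not the sales data) to show the difference between a float column and an object column.

```python
import numpy as np
import pandas as pd

# In a float column, None is converted to NaN on assignment.
float_col = pd.Series([27.0, 9.0, 67.0])
float_col.iloc[1] = None

# In an object column, None is stored as-is.
object_col = pd.Series(["27", "9", "67"], dtype=object)
object_col.iloc[1] = None

# isna() reports both as missing.
float_missing = float_col.isna().tolist()
object_missing = object_col.isna().tolist()

print(float_col.dtype, float_missing)
print(object_col.dtype, object_missing)
```

Either way the cell counts as missing, so np.nan and None can be used interchangeably for marking gaps.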


# We can chain isna() and sum() to count the missing values in each column

In [9]:
sales.isna().sum(axis=0)

Mon    0
Tue    1
Wed    0
Thu    1
Fri    0
dtype: int64
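The same pattern extends to rows, to a grand total, and to fractions. The frame below is a hypothetical, hand-built example with made-up values, not the contents of sales.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame with two missing cells.
demo = pd.DataFrame(
    {
        "Mon": [34, 45, 17, 87],
        "Tue": [27.0, np.nan, 54.0, 67.0],
        "Thu": [np.nan, 87.0, 8.0, 45.0],
    },
    index=["Steven", "Mike", "Andi", "Paul"],
)

# isna() gives a Boolean frame; sum() counts the True values.
missing_per_column = demo.isna().sum()        # axis=0 is the default
missing_per_row = demo.isna().sum(axis=1)     # count across each row
total_missing = int(demo.isna().sum().sum())  # one number for the whole frame
share_missing = demo.isna().mean()            # fraction missing per column

print(missing_per_column)
print(total_missing)
```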

# To select rows with missing values, we can use loc with a Boolean mask

In [ ]:
sales.loc[sales.isna().any(axis=1)]

In [11]:
sales.Tue.replace(to_replace="Missing Data", value=np.nan)

Steven      27
Mike      None
Andi          
Paul        67
Name: Tue, dtype: object
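The call above leaves the column unchanged because no cell is exactly equal to "Missing Data". A possible way to turn the placeholders that do occur (a whitespace-only cell and the text "no data") into NaN is sketched below, on a hypothetical hand-built frame rather than on sales itself.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Tue and Wed columns.
demo = pd.DataFrame(
    {"Tue": ["27", "9", " ", "67"], "Wed": ["15", "74", "54", "no data"]},
    index=["Steven", "Mike", "Andi", "Paul"],
)

# Turn whitespace-only cells and the "no data" marker into NaN.
cleaned = demo.replace(r"^\s*$", np.nan, regex=True).replace("no data", np.nan)

# With the placeholders gone, the columns can be parsed as numbers.
numeric = cleaned.apply(pd.to_numeric)

print(numeric.isna().sum())
```

replace() returns a new object, so the result has to be assigned to a name for the change to persist.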

The shape attribute gives the number of rows and columns as a tuple: rows first, then columns.

In [13]:
sales.shape

(4, 5)

## The dropna() method drops missing values; on a DataFrame it deletes the rows that contain them

In [16]:
sales.Tue.dropna().shape

(3,)
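On a whole DataFrame, dropna() can work on rows, on columns, or on a subset of columns. The following sketch uses a hypothetical hand-built frame to compare the resulting shapes.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame with two missing cells.
demo = pd.DataFrame(
    {
        "Mon": [34, 45, 17, 87],
        "Tue": [27.0, np.nan, 54.0, 67.0],
        "Thu": [np.nan, 87.0, 8.0, 45.0],
    },
    index=["Steven", "Mike", "Andi", "Paul"],
)

rows_dropped = demo.dropna()                   # drop rows with any NaN
cols_dropped = demo.dropna(axis=1)             # drop columns with any NaN
subset_dropped = demo.dropna(subset=["Tue"])   # only consider the Tue column

print(demo.shape, rows_dropped.shape, cols_dropped.shape, subset_dropped.shape)
# prints: (4, 3) (2, 3) (4, 1) (3, 3)
```

dropna() returns a new object and leaves the original frame untouched, which is why demo.shape is still (4, 3).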

# Selecting the rows that contain missing values

In [19]:
sales[sales.isna().any(axis=1)]

Unnamed: 0,Mon,Tue,Wed,Thu,Fri
Steven,34,27.0,15,,33
Mike,45,,74,87.0,12
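The one-line selection can be broken into steps to see what each part does. This sketch again uses a hypothetical hand-built frame; the names mask, rows_with_missing and complete_rows are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame with two missing cells.
demo = pd.DataFrame(
    {
        "Mon": [34, 45, 17, 87],
        "Tue": [27.0, np.nan, 54.0, 67.0],
        "Thu": [np.nan, 87.0, 8.0, 45.0],
    },
    index=["Steven", "Mike", "Andi", "Paul"],
)

# Step 1: Boolean frame, True where a cell is missing.
# Step 2: any(axis=1) collapses each row to a single True/False.
mask = demo.isna().any(axis=1)

# Step 3: use the mask to select rows; ~mask inverts it.
rows_with_missing = demo.loc[mask]
complete_rows = demo.loc[~mask]

print(mask.tolist())
# prints: [True, True, False, False]
```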


# The fillna() method fills missing values

In [20]:
sales.fillna(0)

Unnamed: 0,Mon,Tue,Wed,Thu,Fri
Steven,34,27.0,15,0.0,33
Mike,45,0.0,74,87.0,12
Andi,17,,54,8.0,29
Paul,87,67.0,no data,45.0,7
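fillna() only touches cells that isna() reports as missing, which is why the blank Tue entry for Andi and the "no data" text are unchanged in the output above. Beyond a single constant, a different fill value can be given per column. The sketch below assumes a hypothetical hand-built frame and uses the column median as one possible choice.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame with two missing cells.
demo = pd.DataFrame(
    {
        "Mon": [34, 45, 17, 87],
        "Tue": [27.0, np.nan, 54.0, 67.0],
        "Thu": [np.nan, 87.0, 8.0, 45.0],
    },
    index=["Steven", "Mike", "Andi", "Paul"],
)

# One constant for every missing cell.
filled_zero = demo.fillna(0)

# A dict maps column names to their own fill values.
filled_per_column = demo.fillna({"Tue": demo["Tue"].median(), "Thu": 0})

# Forward fill: copy the last valid value down each column.
filled_forward = demo.ffill()

print(filled_per_column)
```

Forward fill cannot fill a gap in the first row, since there is no earlier value to copy. Like dropna(), fillna() returns a new object unless the result is assigned back.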


# We can convert the dtype of each column to a more specific type
Sometimes pandas cannot infer a specific type for a column and falls back to the generic object dtype.

In [6]:
sales.info()

<class 'pandas.core.frame.DataFrame'>
Index: 4 entries, Steven to Paul
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Mon     4 non-null      int64  
 1   Tue     4 non-null      object 
 2   Wed     4 non-null      object 
 3   Thu     3 non-null      float64
 4   Fri     4 non-null      int64  
dtypes: float64(1), int64(2), object(2)
memory usage: 192.0+ bytes


We can use the convert_dtypes() method to convert each column to the best available dtype that supports pd.NA, such as Int64 or string

In [7]:
sales = sales.convert_dtypes()

In [8]:
sales.info()

<class 'pandas.core.frame.DataFrame'>
Index: 4 entries, Steven to Paul
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Mon     4 non-null      Int64 
 1   Tue     4 non-null      string
 2   Wed     4 non-null      string
 3   Thu     3 non-null      Int64 
 4   Fri     4 non-null      Int64 
dtypes: Int64(3), string(2)
memory usage: 204.0+ bytes
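In the output above, Thu changed from float64 to Int64 even though it still has a missing value. A small sketch on a hypothetical hand-built frame shows how the nullable Int64 dtype keeps the gap as pd.NA instead of forcing the column to float.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame; Thu is float64 only because of the NaN.
demo = pd.DataFrame(
    {"Mon": [34, 45, 17, 87], "Thu": [np.nan, 87.0, 8.0, 45.0]},
    index=["Steven", "Mike", "Andi", "Paul"],
)

converted = demo.convert_dtypes()

# Dtype names before and after the conversion.
before = demo.dtypes.astype(str).to_dict()
after = converted.dtypes.astype(str).to_dict()

# The missing cell is now pd.NA rather than a float NaN.
missing_marker = converted.loc["Steven", "Thu"]

print(before)
print(after)
print(missing_marker is pd.NA)
```

The missing cell is still counted by isna(), so the methods shown earlier keep working on the converted frame.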
