
# **Identifying missing values in tabular data**

In [None]:
import pandas as pd
from io import StringIO

csv_data = \
'''A,B,C,D
1.0,2.0,3.0,4.0
5.0,6.0,,8.0
0.0,11.0,12.0,'''

csv_data


'A,B,C,D\n1.0,2.0,3.0,4.0\n5.0,6.0,,8.0\n0.0,11.0,12.0,'

In [None]:
dt = pd.read_csv(StringIO(csv_data))
dt

,A,B,C,D
0,1.0,2.0,3.0,4.0
1,5.0,6.0,,8.0
2,0.0,11.0,12.0,


Let's count the missing values in each column:

In [None]:
dt.isnull().sum()

A,0
B,0
C,1
D,1


We can also access the DataFrame's underlying NumPy array through the `.values` attribute.

In [None]:
dt.values

array([[ 1.,  2.,  3.,  4.],
       [ 5.,  6., nan,  8.],
       [ 0., 11., 12., nan]])
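
The pandas documentation recommends `DataFrame.to_numpy()` over `.values` for this conversion. A minimal sketch, rebuilding the same frame so the cell runs on its own; the names `arr` and `nan_per_column` are introduced here only for illustration:

```python
import numpy as np
import pandas as pd
from io import StringIO

csv_data = '''A,B,C,D
1.0,2.0,3.0,4.0
5.0,6.0,,8.0
0.0,11.0,12.0,'''
dt = pd.read_csv(StringIO(csv_data))

# For this all-float frame, to_numpy() yields the same array as .values
arr = dt.to_numpy()

# np.isnan flags the missing entries; summing over axis 0 counts them per column
nan_per_column = np.isnan(arr).sum(axis=0)
print(nan_per_column)
```

The per-column counts should agree with the result of `dt.isnull().sum()`.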

In [None]:
dt.dropna(axis=0)

,A,B,C,D
0,1.0,2.0,3.0,4.0
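
`dropna(axis=0)` removes every row that contains at least one missing value, which is why only row 0 survives. A minimal sketch of the other common `dropna` options, rebuilding the same frame so the cell runs on its own; the result variable names are illustrative only:

```python
import pandas as pd
from io import StringIO

csv_data = '''A,B,C,D
1.0,2.0,3.0,4.0
5.0,6.0,,8.0
0.0,11.0,12.0,'''
dt = pd.read_csv(StringIO(csv_data))

# Drop columns (instead of rows) that contain at least one NaN
cols_dropped = dt.dropna(axis=1)

# Drop only rows in which every value is NaN (none here, so all rows stay)
all_nan_dropped = dt.dropna(how='all')

# Keep only rows that have at least 4 non-NaN values
thresh_dropped = dt.dropna(thresh=4)

# Drop rows only when NaN appears in column 'C'
subset_dropped = dt.dropna(subset=['C'])

print(cols_dropped)
print(subset_dropped)
```

Dropping is simple, but it can discard a lot of data when missing values are spread across many rows or columns, so it is worth checking how much remains afterwards.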


In [None]:
import pandas as pd
df = pd.DataFrame([
                    ['green','M',10.1,'class2'],
                    ['red','L', 13.5,'class1'],
                    ['blue', 'XL', 15.3, 'class2']
])
df

,0,1,2,3
0,green,M,10.1,class2
1,red,L,13.5,class1
2,blue,XL,15.3,class2


We can label the columns by assigning a list of names to the `.columns` attribute.

---



In [None]:
df.columns =['color','size','price','classlabel']
df

,color,size,price,classlabel
0,green,M,10.1,class2
1,red,L,13.5,class1
2,blue,XL,15.3,class2


# Mapping ordinal features

In [None]:
size_mapping = {'XL': 3,
                'L' : 2,
                'M' : 1}
size_mapping

{'XL': 3, 'L': 2, 'M': 1}

In [None]:
df['size'] = df['size'].map(size_mapping)
df

,color,size,price,classlabel
0,green,1,10.1,class2
1,red,2,13.5,class1
2,blue,3,15.3,class2
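
If the integer codes later need to be turned back into the original size strings, the dictionary can be inverted. A minimal sketch, assuming the same `size_mapping`; `inv_size_mapping`, `sizes`, and `decoded` are names introduced here for illustration:

```python
import pandas as pd

size_mapping = {'XL': 3, 'L': 2, 'M': 1}

# Stand-in for the encoded 'size' column produced by the mapping step
sizes = pd.Series([1, 2, 3], name='size')

# Swap keys and values to build the reverse lookup (hypothetical helper name)
inv_size_mapping = {v: k for k, v in size_mapping.items()}

decoded = sizes.map(inv_size_mapping)
print(decoded.tolist())
```

This only works because every size maps to a distinct integer; duplicate values in `size_mapping` would make the inversion ambiguous.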


# Encoding class labels

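Here we map each categorical class label to an integer. Class labels are not ordinal, so it does not matter which integer is assigned to which label.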

In [None]:
import numpy as np

class_mapping = {label: idx for idx, label in enumerate(np.unique(df['classlabel']))}

class_mapping



{'class1': 0, 'class2': 1}

In [None]:
df['classlabel'] = df['classlabel'].map(class_mapping)
df

,color,size,price,classlabel
0,green,1,10.1,1
1,red,2,13.5,0
2,blue,3,15.3,1
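
One caveat of dictionary-based `.map`: a value with no entry in the dictionary becomes `NaN` instead of raising an error. A minimal sketch, where the unseen label `class3` and the names `labels` and `encoded` are hypothetical additions for illustration:

```python
import pandas as pd

class_mapping = {'class1': 0, 'class2': 1}

# 'class3' is a hypothetical label that is absent from class_mapping
labels = pd.Series(['class2', 'class1', 'class3'])

# Labels missing from the dictionary are silently mapped to NaN
encoded = labels.map(class_mapping)
print(encoded.isnull().sum())
```

Checking `isnull().sum()` after a mapping step is a cheap way to catch labels that the dictionary does not cover.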


Here we reverse the mapping to convert the integer values back to their original class labels.

In [None]:
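# Swap keys and values so that each integer maps back to its label string
inv_class_mapping = {v: k for k, v in class_mapping.items()}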