# Data Preprocessing Example (Basic)

**Author**: Muhammed Ashrah  



**Source Adapted From**: [Dive into Deep Learning (d2l.ai)](https://d2l.ai/)


**Note**: This notebook is adapted from the *"Data Preprocessing"* section of the Dive into Deep Learning book.

This notebook provides a beginner-friendly walkthrough of **basic data preprocessing techniques** using pandas and PyTorch.

You'll learn how to:
- Load tabular data from CSV  
- Handle missing values  
- Encode categorical features  

> 💡 The goal is not just to run the code, but to **understand what’s happening behind the scenes**. Feel free to tweak the examples and experiment!


# Creating the CSV file and its directory

In [None]:
import os
m=os.getcwd()
m

'/content'

##### Creating the `data` directory

In [None]:
new_dir="data"
path=os.path.join(m,new_dir)
# exist_ok=True lets the cell be re-run without raising FileExistsError
os.makedirs(path,exist_ok=True)

Creating the file inside the `data` directory

In [None]:
file1="house_tiny.csv"
path_file1=os.path.join(path,file1)

print(path_file1)

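# A triple-quoted string spans several lines, so no explicit \n is needed after each row.
# Comments must stay outside the string, otherwise they are written into the CSV.
with open(path_file1,'w') as f:
  f.write('''Num_Room,Rooftype,Price
NA,NA,127500
2,NA,106000
4,Slate,178100
NA,NA,140000
''')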


/content/data/house_tiny.csv


Reading the file we just created (pandas parses the `NA` entries as missing values, shown as empty cells below)

In [None]:
import pandas as pd
df=pd.read_csv(path_file1)
df.head()

Unnamed: 0,Num_Room,Rooftype,Price
0,,,127500
1,2.0,,106000
2,4.0,Slate,178100
3,,,140000
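
Before handling missing values, it can help to count them per column. The following is a minimal sketch, assuming the same four rows as `house_tiny.csv`; it reads them from an in-memory string (via `io.StringIO`) instead of the file so that it runs on its own. The names `csv_text` and `missing_per_column` are illustrative and not part of the original notebook.

```python
import io
import pandas as pd

# Same content as house_tiny.csv, kept in memory so the sketch is self-contained
csv_text = """Num_Room,Rooftype,Price
NA,NA,127500
2,NA,106000
4,Slate,178100
NA,NA,140000
"""
df = pd.read_csv(io.StringIO(csv_text))

# isna() marks missing cells; sum() counts them column by column
missing_per_column = df.isna().sum()
print(missing_per_column)
```

Here `Num_Room` has two missing entries, `Rooftype` has three, and `Price` has none, which matches the table above.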


# Separating the features (inputs) and the target (label)



In [None]:
inputs=df.iloc[:,0:2]
inputs

Unnamed: 0,Num_Room,Rooftype
0,,
1,2.0,
2,4.0,Slate
3,,


In [None]:
target=df.iloc[:,2]
target

Unnamed: 0,Price
0,127500
1,106000
2,178100
3,140000
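
Positional slicing with `iloc` works here because `Price` is the last column. As an alternative sketch, assuming the same column names, the split can also be written by name, which keeps working if the column order changes. The DataFrame is rebuilt inline only to keep the example self-contained.

```python
import pandas as pd

nan = float("nan")
df = pd.DataFrame({
    "Num_Room": [nan, 2.0, 4.0, nan],
    "Rooftype": [None, None, "Slate", None],
    "Price": [127500, 106000, 178100, 140000],
})

# Everything except the label column becomes the inputs
inputs = df.drop(columns=["Price"])
# The label column on its own becomes the target
target = df["Price"]

print(list(inputs.columns))
print(target.tolist())
```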


Handling missing values in the categorical column with one-hot encoding (`dummy_na=True` treats a missing value as its own category)

In [None]:
inputs=pd.get_dummies(inputs,dummy_na=True)
inputs

Unnamed: 0,Num_Room,Rooftype_Slate,Rooftype_nan
0,,False,True
1,2.0,False,True
2,4.0,True,False
3,,False,True
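
To see what `dummy_na=True` contributes, the sketch below encodes only the `Rooftype` column with and without the flag. The variable names `roof`, `without_na`, and `with_na` are illustrative assumptions, not names from the original notebook.

```python
import pandas as pd

roof = pd.DataFrame({"Rooftype": [None, None, "Slate", None]})

# Default behaviour: missing values get no indicator column of their own
without_na = pd.get_dummies(roof)
# dummy_na=True adds an extra indicator column for missing values
with_na = pd.get_dummies(roof, dummy_na=True)

print(list(without_na.columns))
print(list(with_na.columns))
```

Without the flag, rows with a missing roof type are encoded as all zeros; with it, they are marked explicitly in `Rooftype_nan`.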


Handling missing numeric values by imputing them with the column mean (for `Num_Room`, the mean of the observed values 2 and 4 is 3)

In [None]:
inputs=inputs.fillna(inputs.mean())
inputs

Unnamed: 0,Num_Room,Rooftype_Slate,Rooftype_nan
0,3.0,False,True
1,2.0,False,True
2,4.0,True,False
3,3.0,False,True
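
The imputation arithmetic can be checked by hand. The sketch below, assuming the same encoded table as above, computes the mean of `Num_Room` explicitly and fills only that column; `room_mean` and `filled` are illustrative names.

```python
import pandas as pd

nan = float("nan")
inputs = pd.DataFrame({
    "Num_Room": [nan, 2.0, 4.0, nan],
    "Rooftype_Slate": [False, False, True, False],
    "Rooftype_nan": [True, True, False, True],
})

# mean() skips NaN by default, so this is (2 + 4) / 2
room_mean = inputs["Num_Room"].mean()
# A dict passed to fillna targets only the named column
filled = inputs.fillna({"Num_Room": room_mean})

print(room_mean)
print(filled["Num_Room"].tolist())
```

Filling by column name makes it explicit which column is imputed, which can be useful once a table has several numeric columns that need different strategies.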


Converting to tensors

In [None]:
import torch
X=torch.tensor(inputs.to_numpy(dtype=float))
y=torch.tensor(target.to_numpy(dtype=float))

X,y

(tensor([[3., 0., 1.],
         [2., 0., 1.],
         [4., 1., 0.],
         [3., 0., 1.]], dtype=torch.float64),
 tensor([127500., 106000., 178100., 140000.], dtype=torch.float64))
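
The tensors above are `float64` because the NumPy arrays they were built from are `float64`. PyTorch layers use `float32` parameters by default, so a common follow-up step is to request `float32` at conversion time. The sketch below assumes that is the desired dtype; the array `features` simply repeats the values shown above.

```python
import numpy as np
import torch

# Same values as the preprocessed inputs above
features = np.array([[3., 0., 1.],
                     [2., 0., 1.],
                     [4., 1., 0.],
                     [3., 0., 1.]])

# torch.tensor keeps the NumPy dtype unless told otherwise
X64 = torch.tensor(features)
# Passing dtype converts while copying
X32 = torch.tensor(features, dtype=torch.float32)

print(X64.dtype, X32.dtype, tuple(X32.shape))
```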