Data Preprocessing with Python (Manual Encoding & Imputation)

Welcome to my Data Preprocessing Project

This repository demonstrates core data preprocessing techniques in Python using Pandas, NumPy, and basic Python. Every encoding and imputation step is written by hand, without relying on pre-built helpers such as get_dummies or the encoders and imputers in sklearn.

It's perfect for beginners who want to understand how encoding and imputation work under the hood.

Project Overview

Data preprocessing is a crucial step in any Machine Learning pipeline. It ensures that the dataset is clean, consistent, and ready for modelling.

In this project, we focus on three main techniques:

1. Ordinal Encoding
   - Converting categorical data with a natural order (like education levels) into numbers that preserve that order.
   - Example: 12th Pass = 1, Graduate = 2, Post-Graduate = 3.
2. One-Hot Encoding
   - Converting categorical data with no specific order into separate binary columns (0/1), one column per category.
   - Example: Cities like Delhi, Mumbai, Bangalore → City_Delhi, City_Mumbai, City_Bangalore.
3. Imputation (Mean)
   - Handling missing values (NaN) by replacing them with the mean of the non-missing values in the same column.
   - Ensures there are no gaps in the dataset for numerical analysis.
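
The three techniques above can be sketched in plain Python on small lists. This is a minimal illustration, not the code from this repository's scripts: the function names (`ordinal_encode`, `one_hot_encode`, `mean_impute`, `is_missing`) are hypothetical, and the sample values are taken from the examples in the list.

```python
import math


def ordinal_encode(values, order):
    """Map each category to its 1-based position in `order`.

    Raises KeyError if a value is not listed in `order`.
    """
    rank = {}
    for position, category in enumerate(order, start=1):
        rank[category] = position
    return [rank[value] for value in values]


def one_hot_encode(values, prefix):
    """Return {column_name: [0/1, ...]}, one column per distinct category."""
    categories = []
    for value in values:  # collect categories in order of first appearance
        if value not in categories:
            categories.append(value)
    columns = {}
    for category in categories:
        columns[f"{prefix}_{category}"] = [
            1 if value == category else 0 for value in values
        ]
    return columns


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def mean_impute(values):
    """Replace missing entries with the mean of the non-missing ones.

    Assumes at least one value is present; otherwise division by zero occurs.
    """
    present = [value for value in values if not is_missing(value)]
    mean = sum(present) / len(present)
    return [mean if is_missing(value) else value for value in values]


encoded_education = ordinal_encode(
    ["Graduate", "12th Pass", "Post-Graduate"],
    order=["12th Pass", "Graduate", "Post-Graduate"],
)
print(encoded_education)  # [2, 1, 3]

encoded_cities = one_hot_encode(["Delhi", "Mumbai", "Bangalore", "Delhi"], "City")
print(encoded_cities["City_Delhi"])  # [1, 0, 0, 1]

imputed_experience = mean_impute([2.0, float("nan"), 4.0])
print(imputed_experience)  # [2.0, 3.0, 4.0]
```

Passing the category order explicitly keeps the ordinal mapping deliberate: the numbers carry meaning only because the caller states which level ranks above which.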

Dataset

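| ID | Name  | City      | Education     | Experience (Years) | Salary (₹) |
|----|-------|-----------|---------------|--------------------|------------|
| 1  | Amit  | Delhi     | Graduate      | 2                  | 32000      |
| 2  | Riya  | Mumbai    | Post-Graduate | 5                  | 54000      |
| 3  | Sam   | Delhi     | 12th Pass     | NaN                | 25000      |
| 4  | John  | Bangalore | Graduate      | 3                  | NaN        |
| 5  | Neha  | Mumbai    | Post-Graduate | 4                  | 58000      |
| 6  | Arjun | Delhi     | 12th Pass     | 1                  | NaN        |
| 7  | Priya | Bangalore | Graduate      | NaN                | 41000      |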

Techniques Implemented

Ordinal Encoding → Mapping education levels to numeric values using basic Python.
One-Hot Encoding → Creating one 0/1 dummy column per city using loops.
Mean Imputation → Filling missing values using manually calculated means.
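
Applied to the dataset shown in this README, the three steps might look like the sketch below. It is an illustration under stated assumptions rather than a copy of the repository's scripts: the DataFrame is typed in by hand from the table instead of being read from employee_data.csv, and the `Education_Encoded` column name is hypothetical. Pandas is used only to hold the data and to test for NaN with `pd.isna`; the encoding and the mean are computed with loops.

```python
import pandas as pd

nan = float("nan")

# Table from the Dataset section, entered by hand (assumption: same columns as the CSV).
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7],
    "Name": ["Amit", "Riya", "Sam", "John", "Neha", "Arjun", "Priya"],
    "City": ["Delhi", "Mumbai", "Delhi", "Bangalore", "Mumbai", "Delhi", "Bangalore"],
    "Education": ["Graduate", "Post-Graduate", "12th Pass", "Graduate",
                  "Post-Graduate", "12th Pass", "Graduate"],
    "Experience (Years)": [2, 5, nan, 3, 4, 1, nan],
    "Salary (₹)": [32000, 54000, 25000, nan, 58000, nan, 41000],
})

# 1. Ordinal encoding: look up each education level in a hand-written mapping.
education_rank = {"12th Pass": 1, "Graduate": 2, "Post-Graduate": 3}
encoded = []
for level in df["Education"]:
    encoded.append(education_rank[level])
df["Education_Encoded"] = encoded  # hypothetical column name

# 2. One-hot encoding: find the distinct cities, then build one 0/1 column each.
cities = []
for city in df["City"]:
    if city not in cities:
        cities.append(city)
for city in cities:
    df[f"City_{city}"] = [1 if value == city else 0 for value in df["City"]]

# 3. Mean imputation: average the non-missing values, then fill the gaps.
for column in ["Experience (Years)", "Salary (₹)"]:
    total, count = 0.0, 0
    for value in df[column]:
        if not pd.isna(value):
            total += value
            count += 1
    mean = total / count
    df[column] = [mean if pd.isna(value) else value for value in df[column]]

print(df["Education_Encoded"].tolist())   # [2, 3, 1, 2, 3, 1, 2]
print(df["City_Delhi"].tolist())          # [1, 0, 1, 0, 0, 1, 0]
print(df["Experience (Years)"].tolist())  # [2.0, 5.0, 3.0, 3.0, 4.0, 1.0, 3.0]
print(df["Salary (₹)"].tolist()[3])       # 42000.0
```

Each mean is taken over the five non-missing entries of its column: 15 / 5 = 3.0 years of experience and 210000 / 5 = 42000 for salary.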

Repository Files

- Imputation script.py
- README.md
- employee_data.csv
- one hot encoding.py
- ordinal_encoding.py

adityafilesx/Data-Preprocessing-in-Python
