
This project demonstrates manual data preprocessing techniques in Python using Pandas and NumPy. It focuses on three key tasks: Ordinal Encoding, One Hot Encoding, and Mean Imputation.


adityafilesx/Data-Preprocessing-in-Python


Data Preprocessing with Python (Manual Encoding & Imputation)

Welcome to my Data Preprocessing Project

This repository demonstrates core data preprocessing techniques in Python using Pandas, NumPy, and basic Python. Every step is written by hand, without ready-made helpers such as Pandas' get_dummies or the encoders and imputers in scikit-learn (sklearn).

It's perfect for beginners who want to understand how encoding and imputation work under the hood.


Project Overview

Data preprocessing is a crucial step in any Machine Learning pipeline. It ensures that the dataset is clean, consistent, and ready for modelling.

In this project, we focus on three main techniques:

  1. Ordinal Encoding

    • Converting categorical data with natural order (like education levels) into numbers.
    • Example: 12th Pass = 1, Graduate = 2, Post-Graduate = 3.
  2. One Hot Encoding

    • Converting categorical data with no specific order into separate binary columns (0/1).
    • Example: Cities like Delhi, Mumbai, Bangalore → City_Delhi, City_Mumbai, City_Bangalore.
  3. Imputation (Mean)

    • Handling missing values (NaN) by replacing them with the mean of the column.
    • Ensures there are no gaps in the dataset for numerical analysis.
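To make the three definitions concrete, here is a minimal plain-Python sketch that applies them to the example values listed above. The variable names and the short sample lists are illustrative assumptions, not code taken from the repository.

```python
# Illustrative sample values (assumed), based on the examples above.
education = ["Graduate", "12th Pass", "Post-Graduate"]
cities = ["Delhi", "Mumbai", "Bangalore", "Delhi"]
experience = [2, None, 4]  # None stands in for a missing value

# 1. Ordinal encoding: an explicit mapping keeps the natural order of levels.
edu_order = {"12th Pass": 1, "Graduate": 2, "Post-Graduate": 3}
edu_encoded = [edu_order[level] for level in education]

# 2. One hot encoding: one 0/1 column per distinct city, in order of first appearance.
categories = list(dict.fromkeys(cities))
one_hot = {
    f"City_{c}": [1 if city == c else 0 for city in cities] for c in categories
}

# 3. Mean imputation: the mean of the observed values replaces each gap.
observed = [x for x in experience if x is not None]
mean_exp = sum(observed) / len(observed)
exp_filled = [mean_exp if x is None else x for x in experience]

print(edu_encoded)  # [2, 1, 3]
print(one_hot)      # {'City_Delhi': [1, 0, 0, 1], 'City_Mumbai': [0, 1, 0, 0], 'City_Bangalore': [0, 0, 1, 0]}
print(exp_filled)   # [2, 3.0, 4]
```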

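Dataset:

| ID | Name  | City      | Education     | Experience (Years) | Salary (₹) |
|----|-------|-----------|---------------|--------------------|------------|
| 1  | Amit  | Delhi     | Graduate      | 2                  | 32000      |
| 2  | Riya  | Mumbai    | Post-Graduate | 5                  | 54000      |
| 3  | Sam   | Delhi     | 12th Pass     | NaN                | 25000      |
| 4  | John  | Bangalore | Graduate      | 3                  | NaN        |
| 5  | Neha  | Mumbai    | Post-Graduate | 4                  | 58000      |
| 6  | Arjun | Delhi     | 12th Pass     | 1                  | NaN        |
| 7  | Priya | Bangalore | Graduate      | NaN                | 41000      |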
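A minimal sketch of loading this table into a Pandas DataFrame, with `np.nan` marking the missing entries. The column names are copied from the table headers as an assumption; the repository's own code may name them differently.

```python
import numpy as np
import pandas as pd

# The dataset from the table above; np.nan marks missing values.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7],
    "Name": ["Amit", "Riya", "Sam", "John", "Neha", "Arjun", "Priya"],
    "City": ["Delhi", "Mumbai", "Delhi", "Bangalore", "Mumbai", "Delhi", "Bangalore"],
    "Education": ["Graduate", "Post-Graduate", "12th Pass", "Graduate",
                  "Post-Graduate", "12th Pass", "Graduate"],
    "Experience (Years)": [2, 5, np.nan, 3, 4, 1, np.nan],
    "Salary (₹)": [32000, 54000, 25000, np.nan, 58000, np.nan, 41000],
})

# Count missing values per column before any preprocessing:
# Experience (Years) and Salary (₹) each have 2 missing entries.
print(df.isna().sum())
```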

Techniques Implemented

  1. Ordinal Encoding → Mapping education levels to numeric values using basic Python.
  2. One Hot Encoding → Creating city dummy columns using loops.
  3. Mean Imputation → Filling missing values using manually calculated means.
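The three steps can be sketched on the dataset as follows. This is a minimal illustration under stated assumptions (column names from the table above and a hypothetical `Education_Encoded` column), not the repository's exact code: the mapping, the loops, and the means are written by hand, and Pandas serves only as a container.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7],
    "Name": ["Amit", "Riya", "Sam", "John", "Neha", "Arjun", "Priya"],
    "City": ["Delhi", "Mumbai", "Delhi", "Bangalore", "Mumbai", "Delhi", "Bangalore"],
    "Education": ["Graduate", "Post-Graduate", "12th Pass", "Graduate",
                  "Post-Graduate", "12th Pass", "Graduate"],
    "Experience (Years)": [2, 5, np.nan, 3, 4, 1, np.nan],
    "Salary (₹)": [32000, 54000, 25000, np.nan, 58000, np.nan, 41000],
})

# 1. Ordinal encoding with a hand-written mapping (no encoder classes).
edu_order = {"12th Pass": 1, "Graduate": 2, "Post-Graduate": 3}
df["Education_Encoded"] = [edu_order[level] for level in df["Education"]]

# 2. One hot encoding with a loop over the distinct cities (no get_dummies).
#    unique() returns cities in order of first appearance.
for city in df["City"].unique():
    df[f"City_{city}"] = [1 if value == city else 0 for value in df["City"]]

# 3. Mean imputation: compute each column mean by hand from non-missing values.
for col in ["Experience (Years)", "Salary (₹)"]:
    observed = [v for v in df[col] if not np.isnan(v)]
    col_mean = sum(observed) / len(observed)
    df[col] = [col_mean if np.isnan(v) else v for v in df[col]]

print(df)
```

Here the gaps in Experience (Years) become 3.0 and the gaps in Salary (₹) become 42000.0, the means of the five observed values in each column.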

