Data Cleaning with Pandas

This repository contains practical examples and exercises for data cleaning using Python's Pandas library.

Overview

Data cleaning is a crucial step in the data analysis pipeline. This project demonstrates common data cleaning techniques including handling missing values, removing duplicates, standardizing formats, and detecting outliers.
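Of the techniques listed, duplicate removal is not demonstrated elsewhere in this README. The following is a minimal sketch, assuming exact row-level duplicates and using invented sample rows; the repository's scripts may handle duplicates differently.

```python
import pandas as pd

# Hypothetical rows; the third row repeats the first exactly
df = pd.DataFrame({
    "Name": ["Alice Smith", "Bob Jones", "Alice Smith"],
    "Age": [29, 41, 29],
})

# duplicated() flags rows that are identical to an earlier row
num_duplicates = df.duplicated().sum()

# drop_duplicates() keeps the first occurrence of each row by default
deduped = df.drop_duplicates().reset_index(drop=True)
print(num_duplicates, len(deduped))
```

Whitespace or case differences hide duplicates from this check, so it is usually worth running after string cleaning.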

Example Dataset

The main dataset includes employee information with the following fields:

Name - Employee names (with various formatting issues)
Join_Date - Joining dates (in multiple formats)
Salary - Salary information (with currency symbols and missing values)
Age - Employee ages (including outliers for demonstration)
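To experiment without the original file, a small DataFrame with the same four columns can be built by hand. Every value below is invented for illustration and is not taken from the repository's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical rows that mimic the issues described above
df = pd.DataFrame({
    "Name": ["  Alice Smith", "bob jones  ", "Carol White"],   # stray whitespace
    "Join_Date": ["2021-03-05", "March 5, 2021", "05/03/2021"],  # mixed formats
    "Salary": ["$50,000", np.nan, "62000"],                    # symbols and a gap
    "Age": [29, 41, 230],                                      # 230 is an outlier
})

# Quick inspection: column types and missing-value counts per column
print(df.dtypes)
print(df.isna().sum())
```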

Data Cleaning Techniques Demonstrated

1. String Cleaning

df['Name'] = df['Name'].str.strip()

Removes leading and trailing whitespace from names.
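A minimal sketch of the effect on a few made-up names (the sample values are assumptions). Note that str.strip() only touches the ends of each string; spacing inside the name and letter case are left unchanged.

```python
import pandas as pd

# Hypothetical names padded with spaces, a tab, and a newline
names = pd.Series(["  Alice Smith", "bob jones  ", "\tCarol White\n"])

cleaned = names.str.strip()
print(cleaned.tolist())  # ['Alice Smith', 'bob jones', 'Carol White']
```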

2. Currency Format Standardization

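import pandas as pd  # df is assumed to be an already loaded DataFrame
df['Salary'] = df['Salary'].astype(str).str.replace(r'[$,]', '', regex=True)
df['Salary'] = pd.to_numeric(df['Salary'], errors='coerce')  # unparseable values become NaN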

Removes dollar signs and thousands separators, then converts the column to a numeric type. With errors='coerce', any value that still cannot be parsed becomes NaN instead of raising an error.
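The same two steps can be tried on a standalone Series to see what happens to each kind of input. The sample values, including the "N/A" placeholder, are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical raw values: formatted, plain, a text placeholder, and a real gap
raw = pd.Series(["$50,000", "62000", "N/A", np.nan])

# Remove '$' and ',' characters, then convert to numbers
stripped = raw.astype(str).str.replace(r"[$,]", "", regex=True)
salary = pd.to_numeric(stripped, errors="coerce")

# The first two values parse; "N/A" and the original gap both end up as NaN
print(salary.tolist())
print(salary.isna().sum())
```

Because coerced values become NaN silently, counting missing values before and after the conversion is a simple way to spot entries that failed to parse.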

3. Missing Value Imputation

median_val = df['Salary'].median()
df['Salary'] = df['Salary'].fillna(median_val)

Fills missing salary values with the column median, which is less affected by extreme values than the mean.
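A small worked example, using invented salaries with one extreme value, shows why the median is often preferred over the mean for skewed columns. This is a sketch under those assumptions, not a recommendation for every dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical salaries: one missing value and one very large value
salary = pd.Series([40000.0, 50000.0, np.nan, 60000.0, 1000000.0])

median_val = salary.median()  # NaN is skipped: 55000.0
mean_val = salary.mean()      # pulled up by the large value: 287500.0

filled = salary.fillna(median_val)
print(median_val, mean_val)
print(filled.tolist())
```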

Requirements

Python 3.x
pandas
numpy

Installation

pip install pandas numpy

Usage

Run the practical examples:

python practicals.py

Key Takeaways

Always inspect data for inconsistencies before analysis
Use appropriate imputation methods for missing values
Handle data type conversions carefully with error handling
Strip whitespace and standardize formats for consistency
Be aware of outliers in your dataset
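This README does not show how outliers are detected. One common approach is the interquartile range (IQR) rule, sketched below on invented ages; the 1.5 multiplier is a convention, and the repository's scripts may use a different method.

```python
import pandas as pd

# Hypothetical ages; 230 is an obvious data-entry error
ages = pd.Series([25, 29, 31, 35, 38, 41, 44, 230])

q1 = ages.quantile(0.25)
q3 = ages.quantile(0.75)
iqr = q3 - q1

# Values more than 1.5 * IQR outside the quartiles are flagged
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = ages[(ages < lower) | (ages > upper)]
print(outliers.tolist())  # [230]
```

Flagged values still need a judgment call: they can be corrected, removed, or kept, depending on whether they are errors or genuine extremes.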

License

This project is for educational purposes.

