
Cutie-Ice/Data-Cleaning


Data Cleaning with Pandas

This repository contains practical examples and exercises for data cleaning using Python's Pandas library.

Overview

Data cleaning is a crucial step in the data analysis pipeline. This project demonstrates common data cleaning techniques including handling missing values, removing duplicates, standardizing formats, and detecting outliers.
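Removing duplicates is mentioned here but does not appear in the list of techniques covered by practicals.py below, so the following is only a minimal sketch on made-up rows, not the project's code. It uses pandas' `drop_duplicates`, and it strips names first because `" Bob "` and `"Bob"` would otherwise not count as duplicates.

```python
import pandas as pd

# Hypothetical rows: " Bob " and "Bob" differ only by whitespace,
# and the last row repeats the first exactly
df = pd.DataFrame({
    "Name": ["Alice", " Bob ", "Bob", "Alice"],
    "Salary": [50000, 60000, 60000, 50000],
})

# Normalize first so near-duplicates compare equal
df["Name"] = df["Name"].str.strip()

# Drop rows that repeat every column, keeping the first occurrence
deduped = df.drop_duplicates(keep="first").reset_index(drop=True)
print(deduped)
```

To deduplicate on a key rather than on whole rows, `drop_duplicates` also accepts a `subset` list of column names.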

Contents

  • practicals.py - Hands-on data cleaning examples covering:
    • String cleaning (trimming whitespace)
    • Currency format standardization
    • Missing value imputation using median
    • Data type conversions
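As a rough illustration of how the listed steps can be chained, here is a sketch built only from the snippets shown later in this README. The function name `clean_employee_data` and the sample data are hypothetical, and practicals.py may organize the steps differently. The order matters: the median can only be computed after the salary strings have been converted to numbers.

```python
import numpy as np
import pandas as pd


def clean_employee_data(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper chaining the cleaning steps listed above."""
    out = df.copy()  # leave the caller's DataFrame untouched

    # String cleaning: trim surrounding whitespace from names
    out["Name"] = out["Name"].str.strip()

    # Currency standardization + type conversion: drop '$' and ',' then parse
    out["Salary"] = pd.to_numeric(
        out["Salary"].astype(str).str.replace(r"[$,]", "", regex=True),
        errors="coerce",
    )

    # Median imputation: fill gaps left by missing or unparseable salaries
    out["Salary"] = out["Salary"].fillna(out["Salary"].median())
    return out


raw = pd.DataFrame({
    "Name": ["  Alice", "Bob  ", " Carol "],
    "Salary": ["$50,000", np.nan, "$70,000"],
})
cleaned = clean_employee_data(raw)
print(cleaned)
```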

Example Dataset

The main dataset includes employee information with the following fields:

  • Name - Employee names (with various formatting issues)
  • Join_Date - Joining dates (in multiple formats)
  • Salary - Salary information (with currency symbols and missing values)
  • Age - Employee ages (includes outliers for demonstration)
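The Join_Date field is described as arriving in multiple formats, but no date-parsing snippet is shown in this README. Here is a minimal sketch on hypothetical values. It parses each entry individually with `pd.to_datetime` so every value gets its own format inference, and uses `errors="coerce"` so unparseable entries become `NaT`. On pandas 2.0 or later, `pd.to_datetime(dates, format="mixed", errors="coerce")` is a vectorized alternative.

```python
import pandas as pd

# Hypothetical Join_Date values in several formats, plus one unparseable entry
dates = pd.Series(["2021-01-15", "March 3, 2020", "15 Feb 2019", "not a date"])

# Parse each value on its own so each one gets its own format inference;
# errors="coerce" turns anything unparseable into NaT instead of raising.
parsed = pd.to_datetime(dates.apply(lambda s: pd.to_datetime(s, errors="coerce")))
print(parsed)
```

Purely numeric dates such as `03/04/2021` are ambiguous. `pd.to_datetime` reads them month-first unless `dayfirst=True` is passed, so check which convention the source data uses.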

Data Cleaning Techniques Demonstrated

1. String Cleaning

df['Name'] = df['Name'].str.strip()

Removes leading and trailing whitespace from names.
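A small demonstration on hypothetical names shows what `str.strip()` does and does not fix. It removes spaces, tabs and newlines only at the ends. Runs of spaces inside a name survive, so collapsing them with a regex is shown as an optional extra step. Whether the project's names need it is an assumption.

```python
import pandas as pd

names = pd.Series(["  Alice Smith", "Bob   Jones  ", "\tcarol lee\n"])

# str.strip() removes only leading/trailing whitespace (spaces, tabs, newlines)
stripped = names.str.strip()

# Optional extra step (assumption): collapse internal runs of whitespace
collapsed = stripped.str.replace(r"\s+", " ", regex=True)
print(collapsed.tolist())
```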

2. Currency Format Standardization

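import pandas as pd

# Remove '$' and ',' (astype(str) first so numeric entries are kept as text, not lost)
df['Salary'] = df['Salary'].astype(str).str.replace(r'[$,]', '', regex=True)
# Parse to numbers; anything still unparseable becomes NaN
df['Salary'] = pd.to_numeric(df['Salary'], errors='coerce')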

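Strips dollar signs and thousands separators, then converts the column to a numeric dtype. With errors='coerce', any value that still cannot be parsed (for example, a placeholder such as "N/A") becomes NaN instead of raising an error.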
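Here is the same two-line conversion applied to hypothetical mixed input. The `astype(str)` call matters because pandas' `.str` methods return NaN for elements that are not strings, so a salary already stored as the number 48500 would otherwise be silently turned into a missing value. Placeholders such as "N/A" end up as NaN and are then handled by the imputation step.

```python
import numpy as np
import pandas as pd

# Hypothetical raw salary entries: formatted strings, a plain number,
# a missing value, and a placeholder string
salary = pd.Series(["$50,000", "62000", 48500, np.nan, "N/A"])

# Convert everything to text, then drop '$' and ','
cleaned = salary.astype(str).str.replace(r"[$,]", "", regex=True)

# Parse to float; "nan" and "N/A" become NaN rather than raising
numeric = pd.to_numeric(cleaned, errors="coerce")
print(numeric.tolist())
```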

3. Missing Value Imputation

median_val = df['Salary'].median()
df['Salary'] = df['Salary'].fillna(median_val)

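Fills missing salary values with the column median, which, unlike the mean, is not pulled toward extreme salaries. Filling every gap with one constant still narrows the column's spread, so the original distribution is only approximately preserved. This step must run after the numeric conversion above, because the median cannot be computed on raw currency strings.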
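To see why the median is the safer fill value for salaries, compare it with the mean on a small hypothetical series containing one extreme value. The mean lands far above every typical salary, while the median stays in the middle of the normal range.

```python
import numpy as np
import pandas as pd

# Hypothetical salaries with one extreme value and two gaps
salary = pd.Series([48000, 52000, 55000, 61000, 900000, np.nan, np.nan])

print("mean:  ", salary.mean())    # 223200.0, dragged upward by 900000
print("median:", salary.median())  # 55000.0, stays near the typical salary

# NaNs are skipped when computing the median, then filled with it
filled = salary.fillna(salary.median())
print(filled.tolist())
```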

Requirements

  • Python 3.x
  • pandas
  • numpy

Installation

pip install pandas numpy

Usage

Run the practical examples:

python practicals.py

Key Takeaways

  • Always inspect data for inconsistencies before analysis
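  • Choose an imputation method that suits the data (for example, the median for skewed columns such as salary)
  • Convert data types with explicit error handling, such as errors='coerce', so bad values become NaN instead of stopping the script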
  • Strip whitespace and standardize formats for consistency
  • Be aware of outliers in your dataset
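The README does not show how the Age outliers are detected. One common rule is Tukey's fences, which flag values more than 1.5 × IQR beyond the first or third quartile. Below is a sketch of that rule on hypothetical ages. Whether practicals.py uses this rule, a fixed plausible range (for example 18 to 70), or something else is not stated.

```python
import pandas as pd

# Hypothetical ages, including two implausible entries
age = pd.Series([25, 31, 28, 45, 39, 33, 150, -2])

# Quartiles (linear interpolation, pandas' default) and interquartile range
q1, q3 = age.quantile(0.25), age.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Flag rather than drop, so each outlier can be reviewed
is_outlier = (age < lower) | (age > upper)
print(age[is_outlier].tolist())
```

Flagging first and deciding afterwards (correct, cap, or drop) avoids discarding legitimate but unusual records.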

License

This project is for educational purposes.
