abhisar9149/Excel-Data-Cleaning-Automation-in-Python

🧹 Excel Data Cleaning Automation in Python

This project automates the cleaning and validation of an Excel dataset using Python and pandas. It was designed for a real-world sales dataset and validates the data column by column: ID format checks, categorical standardization, numeric conversion, date parsing, and derived column verification.


๐Ÿ“ Dataset

  • Input File: sales.xlsx (expected to be placed in the data/ directory)
  • Output File: cleaned_sales.xlsx (optionally with a timestamp in the file name)
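
The file handling described above could be sketched as follows. This is a minimal illustration, not the project's actual code: the helper names (`build_output_path`, `load_sales`, `save_cleaned`), the timestamp format, and the choice to write the output into data/ are all assumptions, since the README only names the two files.

```python
from datetime import datetime
from pathlib import Path

import pandas as pd

DATA_DIR = Path("data")  # input directory named in this README


def build_output_path(timestamped=False, now=None):
    """Return the output path, optionally with a timestamp suffix.

    Assumption: output goes to data/ and the timestamp looks like
    YYYYMMDD_HHMMSS. The README does not specify either detail.
    """
    if not timestamped:
        return DATA_DIR / "cleaned_sales.xlsx"
    now = now or datetime.now()
    return DATA_DIR / f"cleaned_sales_{now:%Y%m%d_%H%M%S}.xlsx"


def load_sales(path=DATA_DIR / "sales.xlsx"):
    """Read the raw workbook (.xlsx files require the openpyxl package)."""
    return pd.read_excel(path)


def save_cleaned(df, timestamped=False):
    """Write the cleaned frame without the index column; return the path."""
    out_path = build_output_path(timestamped)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    df.to_excel(out_path, index=False)
    return out_path
```

A typical run would call `load_sales()`, apply the cleaning steps, and finish with `save_cleaned(df, timestamped=True)` so that earlier outputs are not overwritten.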

🚀 Features

  • ✅ Drop Duplicates from the dataset
  • 🔍 Validate IDs (e.g., must match TXN_1234567 format)
  • 🏷️ Categorical Text Cleaning (standardize case, validate against allowed values)
  • 🔢 Numeric Column Validation (convert invalid entries to NaN)
  • 📅 Datetime Parsing (invalid dates converted to NaT)
  • 🧮 Derived Column Check: Validates if Total Spent = Quantity × Price Per Unit and fixes incorrect values
  • 🪵 Column-wise Summary Logs printed for transparency
  • 📝 Well-documented and modular code using functions and docstrings

🧪 Validation Summary Example

--- Validation Report: Total Spent ---
Invalid rows found: 123
Sample invalid values:
   Quantity  Price Per Unit  Total Spent
0         3             5.0           14
1         2             7.0           15
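
A report like the one above could be produced by a check along these lines. This is a sketch under assumptions: the function name `check_total_spent` and the `tolerance` parameter are hypothetical, and the three numeric columns are assumed to have already been converted to numbers.

```python
import numpy as np
import pandas as pd


def check_total_spent(df, tolerance=0.01):
    """Report rows where Total Spent != Quantity x Price Per Unit, then fix them."""
    expected = df["Quantity"] * df["Price Per Unit"]
    # Flag a row only when the product can be computed and disagrees with
    # the stored total (a missing total also counts as a mismatch).
    matches = np.isclose(df["Total Spent"], expected, atol=tolerance)
    invalid = expected.notna() & ~matches

    print("--- Validation Report: Total Spent ---")
    print(f"Invalid rows found: {int(invalid.sum())}")
    if invalid.any():
        print("Sample invalid values:")
        print(df.loc[invalid, ["Quantity", "Price Per Unit", "Total Spent"]].head())

    fixed = df.copy()
    # Cast to float first so corrected values never clash with an int column.
    fixed["Total Spent"] = fixed["Total Spent"].astype(float)
    fixed.loc[invalid, "Total Spent"] = expected[invalid]
    return fixed
```

Comparing with a small tolerance instead of `==` avoids flagging rows that differ only by floating-point rounding.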

📊 Before vs After Example

🟥 Before Cleaning:

[Image: Before Cleaning]

🟩 After Cleaning:

[Image: After Cleaning]
