Skip to content

Ashcoder666/Data-Automation-Toolkit

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

📊 Data Automation Toolkit

A beginner-friendly data cleaning and automation toolkit built with Python.
This project uses real-world datasets (Titanic, Netflix, and Superstore) to demonstrate how to clean, transform, and export data in multiple formats (CSV, JSON, Excel).


🚀 Features

  • Handle missing/null values (drop, fill, interpolate).
  • Normalize text (case formatting, remove duplicates).
  • Convert column types (dates, numbers, categorical).
  • Perform filtering, sorting, and aggregations.
  • Create derived columns (e.g., profit margin, categories).
  • Import/export datasets to CSV, JSON, Excel.
  • Command-line utility to clean datasets with a single command.

📊 Datasets Used


⚡ Installation

Clone this repo and install dependencies:

git clone https://github.com/your-username/data-automation-toolkit.git
cd data-automation-toolkit
pip install -r requirements.txt
🛠 Usage
1️⃣ Run from CLI

python clean_data.py data/titanic.csv --export json
Options:

--export csv → save as CSV

--export json → save as JSON

--export excel → save as Excel

2️⃣ Use in Jupyter Notebook

import pandas as pd
from utils.clean_utils import clean_dataframe
from utils.file_utils import export_data

df = pd.read_csv("data/netflix.csv")
df_clean = clean_dataframe(df)
export_data(df_clean, "output/netflix_clean.json", "json")
📦 Requirements
Python 3.9+

Pandas

OpenPyXL

Jupyter Notebook

Install via:


pip install -r requirements.txt
📸 Example Output
Name	Age	Survived
Jack Dawson	20	0
Rose DeWitt	19	1
Thomas Andrews	39	0

Cleaned Titanic dataset sample (missing ages handled, casing normalized).

🔮 Roadmap
Add automated data profiling report (using pandas-profiling).

Extend CLI to support multiple datasets at once.

Add visualization utilities for summary statistics.

🤝 Contributing
Contributions are welcome!

Fork the repo

Create a feature branch (git checkout -b feature-new)

Commit changes (git commit -m 'Add new feature')

Push to branch & open a Pull Request

📜 License
This project is licensed under the MIT License.

About

Toolkit to clean, transform, and export datasets (CSV, JSON, Excel).

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published