🚀 Data-Prepez

A fast, scalable, and intelligent data preprocessing library for machine learning workflows.

Data-Prepez is an open-source Python library designed to simplify and accelerate the data preprocessing stage of machine learning. It automatically handles missing values, encoding, scaling, and type detection, and it supports tabular, text (NLP), and time series data, all while scaling efficiently to millions of rows.

📌 Features

✅ Automatic detection of numerical, categorical, and text features
✅ Missing value imputation (mean, median, mode, forward fill, etc.)
✅ One-hot and label encoding
✅ Standard, MinMax, and Robust scaling
✅ NLP preprocessing (cleaning, tokenization, lemmatization)
✅ Time series resampling, smoothing, windowing
✅ AutoPipeline: one-liner fit_transform() interface
✅ Modular and production-ready design
✅ Large dataset support (planned: Dask, Modin integration)
✅ Compatible with Pandas, NumPy, and Scikit-learn
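
To make the time series features above more concrete, here is a minimal sketch of resampling, smoothing, and windowing written directly against Pandas and NumPy. It does not use Data-Prepez itself, and it is not necessarily how the library implements these steps; the data, the `make_windows` helper, and the window sizes are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly readings over two days (made-up data).
index = pd.date_range("2024-01-01", periods=48, freq=pd.Timedelta(hours=1))
series = pd.Series(np.arange(48, dtype=float), index=index, name="reading")

# Resampling: aggregate the hourly values into daily means.
daily = series.resample("D").mean()

# Smoothing: trailing 3-hour rolling mean; min_periods=1 keeps the first rows.
smoothed = series.rolling(window=3, min_periods=1).mean()


def make_windows(values, width):
    """Hypothetical helper: return overlapping windows of length `width`."""
    return np.lib.stride_tricks.sliding_window_view(np.asarray(values), width)


# Windowing: each row holds `width` consecutive observations.
windows = make_windows(series.to_numpy(), width=4)
```

Each row of `windows` can serve as a lagged feature vector for a forecasting model; the window width is a modeling choice rather than something the library prescribes.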

🔧 Installation

📦 PyPI release coming soon!

For now, clone the repository:

git clone https://github.com/Data-PrepeZ/Data_Prepez.git
cd Data_Prepez
pip install -r requirements.txt

⚡ Quick Start

from dataprepez import AutoPreprocessor
import pandas as pd

# Load dataset
df = pd.read_csv("your_data.csv")

# Initialize the preprocessor
prep = AutoPreprocessor(target='target_column', type='tabular')

# Run preprocessing
X_clean, y = prep.fit_transform(df)
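
For readers who want a feel for what a one-line `fit_transform()` call like the one above typically bundles, the following is a rough Pandas-only sketch. The function `sketch_fit_transform` is a hypothetical stand-in, not Data-Prepez code, and the specific choices (median and mode imputation, standard scaling, one-hot encoding) are assumptions drawn from the feature list rather than confirmed defaults.

```python
import pandas as pd


def sketch_fit_transform(df, target):
    """Illustrative stand-in (not the library's code): return (X_clean, y)."""
    y = df[target]
    X = df.drop(columns=[target]).copy()

    # Type detection: numeric columns versus everything else.
    numeric_cols = X.select_dtypes(include="number").columns
    categorical_cols = X.columns.difference(numeric_cols)

    # Imputation: median for numeric columns, most frequent value otherwise.
    # Assumes every categorical column has at least one non-missing value.
    X[numeric_cols] = X[numeric_cols].fillna(X[numeric_cols].median())
    for col in categorical_cols:
        X[col] = X[col].fillna(X[col].mode().iloc[0])

    # Standard scaling: zero mean, unit variance (constant columns left at 0).
    std = X[numeric_cols].std(ddof=0).replace(0, 1.0)
    X[numeric_cols] = (X[numeric_cols] - X[numeric_cols].mean()) / std

    # One-hot encoding of the categorical columns.
    X = pd.get_dummies(X, columns=list(categorical_cols))
    return X, y


# Made-up data with one missing value per feature column.
df = pd.DataFrame({
    "age": [20.0, None, 40.0],
    "city": ["Paris", None, "Paris"],
    "target_column": [0, 1, 0],
})
X_clean, y = sketch_fit_transform(df, target="target_column")
```

This sketch computes its statistics and applies them to the same data in one pass. A production pipeline would normally store the fitted medians, means, and categories so that the identical transformation can be reapplied to unseen data.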

🧪 Example Notebooks

Check out the examples/ folder:

🧹 tabular_demo.ipynb – Basic tabular preprocessing
📝 nlp_cleaning.ipynb – NLP cleaning pipeline (coming soon)
📈 timeseries_demo.ipynb – Time series preprocessing (coming soon)

🗂️ Project Structure


Data-Prepez/
├── dataprepez/
│   ├── tabular/
│   │   ├── __init__.py
│   ├── nlp/
│   │   ├── __init__.py
│   ├── timeseries/
│   │   ├── __init__.py
│   ├── core/
│   │   ├── __init__.py
│   │   └── preprocessor.py
│   └── __init__.py
├── tests/
│   └── test_preprocessor.py
├── examples/
│   └── tabular_demo.ipynb
├── README.md
├── setup.py
├── requirements.txt
└── LICENSE

🤝 Contributing

We welcome contributions from the community!
To contribute:

1. Fork this repository
2. Create your feature branch: git checkout -b feature/YourFeature
3. Commit your changes: git commit -m "Add your feature"
4. Push to the branch: git push origin feature/YourFeature
5. Open a pull request 🎉

👥 Team

Bala Mosay J – Team Lead, Core Developer
Allwyn Jeffo Raj A – NLP Module Developer
Rasik S – Time Series & Testing

📄 License

This project is licensed under the MIT License.

🌟 Show Your Support

If you like this project, please ⭐ star this repository and share it with others!

Clean your data, prep it like a pro — with Data-Prepez
