# 🧼 Data Preprocessing Toolbox for Advanced Data Scientists

## 📦 1. Missing Value Handling
- `df.isna()`, `df.dropna()`, `df.fillna()` – Detect, drop, or fill missing values
- `SimpleImputer` – Impute with the mean, median, most frequent value, or a constant
- `KNNImputer` – Impute each gap from the values of the k nearest rows
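
The imputation tools above can be sketched on a toy frame. The column names and values here are hypothetical, and the choice of `strategy="median"` and `n_neighbors=2` is only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer, SimpleImputer

# Toy frame with hypothetical column names and a few gaps.
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "income": [50.0, 60.0, np.nan, 80.0],
})

# Count missing values per column.
missing_counts = df.isna().sum()

# Replace each gap with the column median.
median_imputer = SimpleImputer(strategy="median")
df_median = pd.DataFrame(median_imputer.fit_transform(df), columns=df.columns)

# Replace each gap with the average of the 2 nearest rows,
# where distance is computed on the features that are observed.
knn_imputer = KNNImputer(n_neighbors=2)
df_knn = pd.DataFrame(knn_imputer.fit_transform(df), columns=df.columns)
```

In a real workflow the imputer would be fit on the training split only and then applied to the test split.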

## 📏 2. Feature Scaling & Transformation
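- `StandardScaler` – Standardize each feature to mean 0 and standard deviation 1
- `MinMaxScaler` – Rescale each feature to a fixed range, [0, 1] by default
- `RobustScaler` – Center on the median and scale by the IQR (robust to outliers)
- `Normalizer` – Rescale each sample (row) to unit norm
- `QuantileTransformer` – Map each feature to a uniform or normal distribution
- `PowerTransformer` – Box-Cox (strictly positive data only) or Yeo-Johnson
- Manual: `np.log`, `np.sqrt` – Compress right-skewed features (`np.log` needs strictly positive values, `np.sqrt` non-negative ones)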
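
A minimal sketch comparing three of the scalers above on a single feature with one outlier. The data is made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

# One feature; the last value is a deliberate outlier.
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

X_std = StandardScaler().fit_transform(X)    # (x - mean) / std
X_minmax = MinMaxScaler().fit_transform(X)   # (x - min) / (max - min)
X_robust = RobustScaler().fit_transform(X)   # (x - median) / IQR
```

With the outlier present, `MinMaxScaler` squeezes the four ordinary values close to 0, while `RobustScaler` keeps them spread around 0 because the median and IQR barely move.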

## 🏷️ 3. Encoding Categorical Variables
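- `OneHotEncoder` – One binary column per category (nominal features)
- `OrdinalEncoder` – Integer codes for features, typically those with a meaningful order
- `LabelEncoder` – Integer codes for the target `y`, not for input features
- `pd.get_dummies()` – One-hot encoding directly in pandas
- Target / Mean Encoding – Replace each category with the mean of the target; compute on training data only to avoid leakage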
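
The encoders above might be used as follows. The frame, the `S < M < L` ordering, and the target `y` are hypothetical; the target-encoding step is a bare-bones version without the smoothing or cross-fitting a production setup would usually add.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],
    "size": ["S", "M", "L", "M"],
})
y = pd.Series([1, 0, 1, 1])  # hypothetical binary target

# One-hot: one column per category; unseen categories become all zeros.
onehot = OneHotEncoder(handle_unknown="ignore")
color_onehot = onehot.fit_transform(df[["color"]]).toarray()

# Ordinal: pass the order explicitly rather than relying on alphabetical sort.
ordinal = OrdinalEncoder(categories=[["S", "M", "L"]])
size_codes = ordinal.fit_transform(df[["size"]])

# pandas shortcut for one-hot encoding.
dummies = pd.get_dummies(df["color"], prefix="color")

# Naive target (mean) encoding: category -> mean of y for that category.
category_means = y.groupby(df["color"]).mean()
color_target_encoded = df["color"].map(category_means)
```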

## 🔍 4. Outlier Detection & Removal
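- Z-score – Flag points far from the mean in standard-deviation units (a threshold of 3 is common)
- IQR (1.5×IQR rule) – Flag points below Q1 - 1.5×IQR or above Q3 + 1.5×IQR
- Winsorization – Cap extreme values at chosen percentiles instead of dropping them
- `RobustScaler` – Does not remove outliers; limits their influence on scaling
- Boxplots, Histograms – Visual inspection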
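
A small sketch of the z-score and IQR rules, plus capping at the IQR fences as a winsorization-style alternative to dropping rows. The series is invented, and the z-score threshold of 2 is an assumption chosen because the sample is tiny (with only 7 points, no z-score can reach 3).

```python
import pandas as pd

s = pd.Series([10, 12, 11, 13, 12, 11, 95], dtype=float)

# Z-score rule (population standard deviation).
z = (s - s.mean()) / s.std(ddof=0)
z_outliers = s[z.abs() > 2]

# IQR rule: anything outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] is flagged.
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
iqr_outliers = s[(s < lower) | (s > upper)]

# Cap instead of remove: values beyond the fences are pulled back to them.
s_capped = s.clip(lower=lower, upper=upper)
```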

## 📐 5. Dimensionality Reduction & Feature Engineering
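- `PCA` – Project onto the directions of maximum variance
- `TruncatedSVD`, `NMF` – Alternatives for sparse and non-negative data, respectively
- `PolynomialFeatures` – Generate polynomial and interaction terms
- `FeatureHasher` – Hash high-cardinality features into a fixed number of columns
- Date/time extraction – Year, month, weekday, and similar parts of a timestamp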
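
As a rough illustration of three items above: reducing random data to two principal components, expanding two columns into degree-2 polynomial terms, and pulling calendar parts out of a timestamp. The data and the `ts` column name are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

# Keep the 2 directions that explain the most variance.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# For two inputs x0, x1 this yields x0, x1, x0^2, x0*x1, x1^2.
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X[:, :2])

# Calendar features from a datetime column.
dates = pd.DataFrame({"ts": pd.to_datetime(["2024-01-15", "2024-06-30"])})
dates["month"] = dates["ts"].dt.month
dates["dayofweek"] = dates["ts"].dt.dayofweek  # Monday=0 ... Sunday=6
```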

## 🎯 6. Feature Selection
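- `SelectKBest`, `SelectPercentile` – Univariate filters driven by a scoring function
- `RFE` (Recursive Feature Elimination) – Repeatedly fit a model and drop the weakest features
- `Lasso` – L1 penalty shrinks some coefficients to exactly zero (pair with `SelectFromModel`)
- Tree model importances – `feature_importances_` from fitted tree ensembles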
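
A minimal sketch of a univariate filter and of RFE on synthetic data. The dataset parameters, `k=3`, and the use of `LogisticRegression` as the RFE estimator are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

# Synthetic data: 10 features, of which 3 carry signal.
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=3, n_redundant=0, random_state=0
)

# Filter method: keep the 3 features with the highest ANOVA F-score.
selector = SelectKBest(score_func=f_classif, k=3)
X_kbest = selector.fit_transform(X, y)

# Wrapper method: recursively drop the feature with the smallest coefficient.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=3)
rfe.fit(X, y)
rfe_mask = rfe.support_  # boolean mask of the kept features
```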

## 🔬 7. Multicollinearity Detection
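- VIF (Variance Inflation Factor) – e.g. `variance_inflation_factor` in `statsmodels`; values above roughly 5–10 are a common warning sign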
- Correlation heatmap
- Condition number
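
VIF for feature j is 1 / (1 - R²), where R² comes from regressing feature j on all the others. The sketch below computes it with scikit-learn rather than `statsmodels`; the helper name `vif` and the synthetic columns are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.1 * rng.normal(size=200)  # nearly a copy of x1
X = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

def vif(frame: pd.DataFrame) -> pd.Series:
    """Regress each column on the rest and return 1 / (1 - R^2)."""
    scores = {}
    for col in frame.columns:
        others = frame.drop(columns=col)
        r2 = LinearRegression().fit(others, frame[col]).score(others, frame[col])
        scores[col] = 1.0 / (1.0 - r2)
    return pd.Series(scores)

vif_scores = vif(X)

# Condition number of the standardized design matrix; large values
# also point to near-linear dependence between columns.
X_standardized = (X - X.mean()) / X.std()
cond_number = np.linalg.cond(X_standardized.to_numpy())
```

Here `x1` and `x3` should show large VIF values while `x2` stays near 1.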

## 🔁 8. Pipelines & Automation
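- `Pipeline` – Chain transformers and a final estimator so preprocessing is fit on training data only
- `ColumnTransformer` – Apply different transformers to different columns
- `FunctionTransformer` – Wrap an arbitrary function as a transformer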
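
The three building blocks above can be combined roughly like this. The columns `age` and `city`, the labels, and the choice of `LogisticRegression` are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0, 52.0, 46.0],
    "city": ["A", "B", "A", "C", "B", "A"],
})
y = [0, 1, 0, 1, 1, 0]

# Numeric branch: fill gaps, then standardize.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Route each column to the branch that fits its type.
preprocess = ColumnTransformer([
    ("num", numeric, ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

# Preprocessing and model travel together as one estimator.
model = Pipeline([("preprocess", preprocess), ("clf", LogisticRegression())])
model.fit(df, y)
preds = model.predict(df)
features = model.named_steps["preprocess"].transform(df)

# FunctionTransformer turns a plain function into a pipeline step.
log_step = FunctionTransformer(np.log1p)
log_values = log_step.fit_transform(np.array([[0.0], [9.0]]))
```

Because the whole pipeline is one object, passing it to cross-validation refits the imputer, scaler, and encoder inside each fold.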

## 📊 9. Data Splitting & Cross-validation
- `train_test_split()`
- `KFold`, `StratifiedKFold`, `GroupKFold`, `TimeSeriesSplit` – K-fold splitters for plain, class-imbalanced, grouped, and time-ordered data
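
A short sketch of a stratified hold-out split, stratified folds, and time-ordered folds. The array sizes and the 15/5 class balance are made up.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, TimeSeriesSplit, train_test_split

X = np.arange(40).reshape(20, 2)
y = np.array([0] * 15 + [1] * 5)

# Hold-out split that preserves the class ratio in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Each of the 5 test folds receives the same share of positives.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_positives = [int(y[test_idx].sum()) for _, test_idx in skf.split(X, y)]

# Time-ordered folds: training indices always precede test indices.
tss = TimeSeriesSplit(n_splits=3)
time_folds = [
    (int(train_idx.max()), int(test_idx.min())) for train_idx, test_idx in tss.split(X)
]
```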

## 📚 10. Text & NLP Preprocessing
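- `TfidfVectorizer`, `CountVectorizer` – TF-IDF and bag-of-words features
- `nltk`, `spaCy` – Tokenization, stop-word removal, lemmatization
- `TextBlob`, `VADER` – Sentiment scores (usable as extra features)
- Text cleaning functions – Lowercasing, stripping punctuation, collapsing whitespace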
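
One possible cleaning function followed by the two scikit-learn vectorizers. The helper `clean_text`, its regexes, and the sample sentences are hypothetical; real pipelines often need language-specific rules.

```python
import re
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = [
    "The cat sat on the mat!",
    "The dog chased the cat.",
    "Dogs and cats are pets.",
]

def clean_text(text: str) -> str:
    """Lowercase, replace non-letters with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

cleaned = [clean_text(doc) for doc in docs]

# Bag-of-words vocabulary learned from the cleaned documents.
counts = CountVectorizer().fit(cleaned)

# TF-IDF matrix with scikit-learn's built-in English stop-word list.
tfidf = TfidfVectorizer(stop_words="english")
X_tfidf = tfidf.fit_transform(cleaned)
```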

## 📆 11. Time Series Preprocessing
- `pd.to_datetime()`
- `.dt.year`, `.dt.month`, etc.
- `resample()`, rolling stats, lag features
- Differencing
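
The operations above, sketched on a six-day toy series. The values and the `value` column name are invented.

```python
import pandas as pd

idx = pd.date_range("2024-01-01", periods=6, freq="D")
ts = pd.DataFrame({"value": [10.0, 12.0, 11.0, 15.0, 14.0, 18.0]}, index=idx)

# Lag feature: yesterday's value on today's row.
ts["lag_1"] = ts["value"].shift(1)

# Rolling mean over the current and two previous days.
ts["roll_mean_3"] = ts["value"].rolling(window=3).mean()

# First difference: today minus yesterday.
ts["diff_1"] = ts["value"].diff()

# Downsample to 2-day bins and sum within each bin.
two_day = ts["value"].resample("2D").sum()
```

The first rows of the lag, rolling, and difference columns are `NaN` by construction; they are usually dropped before modeling. For forecasting, shifting the rolling statistic by one step avoids using the current value as its own predictor.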

## ⚖️ 12. Imbalanced Data Handling
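- `SMOTE`, `ADASYN` – Synthetic minority oversampling (from `imbalanced-learn`)
- `RandomUnderSampler`, `RandomOverSampler` – Random resampling (from `imbalanced-learn`)
- Class weights – e.g. `class_weight="balanced"` in many scikit-learn estimators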
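
A sketch of class weights and of random oversampling using scikit-learn utilities only. It does not use `imbalanced-learn`; `sklearn.utils.resample` stands in for `RandomOverSampler` here, and the 90/10 split is made up.

```python
import numpy as np
from sklearn.utils import resample
from sklearn.utils.class_weight import compute_class_weight

X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 90 + [1] * 10)

# "balanced" weights: n_samples / (n_classes * class_count).
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y)

# Random oversampling: draw minority rows with replacement
# until both classes have the same number of rows.
X_majority, X_minority = X[y == 0], X[y == 1]
X_minority_up = resample(
    X_minority, replace=True, n_samples=len(X_majority), random_state=0
)
X_balanced = np.vstack([X_majority, X_minority_up])
y_balanced = np.array([0] * len(X_majority) + [1] * len(X_minority_up))
```

Resampling should be applied to the training split only, so that the test set keeps the original class distribution.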

## 🧠 13. EDA & Visualization
- `pandas_profiling` (since renamed `ydata-profiling`), `Sweetviz`
- `dtale`, `autoviz`
- `seaborn`, `matplotlib`, `plotly`
