# Daily Blog #13 – Scikit-learn Cheatsheet
### May 13, 2025

---

## 1. Import Essentials

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
```

---

## 2. Train/Test Split

```python
# Assumes `df` is a pandas DataFrame with a 'target' column
X = df.drop('target', axis=1)
y = df['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
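For a version that runs without your own data, here is a minimal sketch. It assumes the bundled iris dataset stands in for `df` (not part of the original snippet) and adds `stratify=y`, which keeps the class proportions the same in both splits.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: iris stands in for `df`; its frame already has a 'target' column
df = load_iris(as_frame=True).frame

X = df.drop('target', axis=1)
y = df['target']

# stratify=y preserves the class balance in the train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
print(y_test.value_counts().to_dict())
```

Stratifying matters most when classes are imbalanced; on a balanced dataset like iris it mainly makes the split reproducible class by class.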

---

## 3. Preprocessing

### Standardization (feature scaling)

```python
scaler = StandardScaler()
# Fit on the training set only, then reuse the same statistics on the test set
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```

### Label encoding (for categorical targets)

```python
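le = LabelEncoder()
# Fit on the training labels, then apply the same mapping to the test labels
y_train = le.fit_transform(y_train)
y_test = le.transform(y_test)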
```
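To see what the encoder actually does, here is a small round-trip sketch. The string labels are hypothetical and only there for illustration.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical string labels, for illustration only
y_raw = np.array(['cat', 'dog', 'cat', 'bird'])

le = LabelEncoder()
y_encoded = le.fit_transform(y_raw)

print(le.classes_)                       # classes are stored in sorted order
print(y_encoded)                         # integer codes index into le.classes_
print(le.inverse_transform(y_encoded))   # back to the original strings
```

Keep the fitted `le` around: `inverse_transform` is how you turn model predictions back into readable labels.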

---

## 4. Model Training (Common Algorithms)

```python
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

model = RandomForestClassifier()
model.fit(X_train, y_train)
```

---

## 5. Predictions & Evaluation

```python
y_pred = model.predict(X_test)

# Outside a notebook these return values are discarded unless printed
print(accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```

---

## 6. Regression Models

```python
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score

reg = LinearRegression()
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)

# Regression needs a continuous target; print the scores to see them in a script
print(mean_squared_error(y_test, y_pred))
print(r2_score(y_test, y_pred))
```
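Here is a self-contained sketch of the same flow. It assumes the bundled diabetes dataset as the regression data (my choice, not from the snippet above) and also derives RMSE, which is in the same units as the target and often easier to read than MSE.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Assumption: the bundled diabetes dataset stands in for your own data
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

reg = LinearRegression().fit(X_train, y_train)
y_pred = reg.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)               # same units as the target
r2 = r2_score(y_test, y_pred)     # 1.0 would be a perfect fit

print(f"MSE={mse:.1f}  RMSE={rmse:.1f}  R2={r2:.3f}")
```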

---

## 7. Cross-Validation

```python
from sklearn.model_selection import cross_val_score

scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())
```
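If the model needs scaling, one option is to cross-validate a pipeline so the scaler is re-fit inside each fold instead of seeing the whole dataset up front. A minimal sketch, assuming the iris dataset and a logistic regression as placeholders:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: iris stands in for your own X and y
X, y = load_iris(return_X_y=True)

# The scaler is fit on the training part of each fold only
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

scores = cross_val_score(pipe, X, y, cv=5, scoring='accuracy')
print(f"{scores.mean():.3f} +/- {scores.std():.3f}")
```

Reporting the standard deviation next to the mean gives a rough sense of how much the score moves between folds.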

---

## 8. Model Tuning: Grid Search

```python
from sklearn.model_selection import GridSearchCV

params = {'n_estimators': [100, 200], 'max_depth': [None, 10]}
grid = GridSearchCV(RandomForestClassifier(), params, cv=5)
grid.fit(X_train, y_train)
best_model = grid.best_estimator_
```
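After fitting, the search object holds more than the best estimator. The sketch below uses iris and a smaller grid than the one above (both are assumptions to keep it quick) and shows where to find the winning parameters and scores.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Assumption: iris stands in for your own data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Smaller grid than in the snippet above, to keep the run short
params = {'n_estimators': [50, 100], 'max_depth': [None, 3]}
grid = GridSearchCV(RandomForestClassifier(random_state=42), params, cv=5)
grid.fit(X_train, y_train)

print(grid.best_params_)            # winning parameter combination
print(grid.best_score_)             # mean cross-validated accuracy for it
print(grid.score(X_test, y_test))   # refit best model, scored on held-out data
```

The search runs on the training split only, so the final `grid.score` call is an honest check on data the tuning never saw.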

---

## 9. Pipelines (Clean Workflow)

```python
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', LogisticRegression())
])

# Pass unscaled data; the pipeline applies the scaler itself
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)
```

---

## 10. Feature Selection

```python
from sklearn.feature_selection import SelectKBest, f_classif

selector = SelectKBest(score_func=f_classif, k=5)
X_new = selector.fit_transform(X, y)
```
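`fit_transform` returns a plain array by default, so the column names are lost. A short sketch, assuming iris as the data and `k=2` for illustration, recovers which features were kept via `get_support()`:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

# Assumption: iris stands in for your own data; k=2 is an arbitrary choice
data = load_iris(as_frame=True)
X, y = data.data, data.target

selector = SelectKBest(score_func=f_classif, k=2)
X_new = selector.fit_transform(X, y)

# get_support() returns a boolean mask over the original columns
selected = X.columns[selector.get_support()]
print(list(selected))
print(X_new.shape)
```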

---

## 11. Common Classification Models

| Algorithm              | Import                                                | Notes                                             |
| ---------------------- | ----------------------------------------------------- | ------------------------------------------------- |
| Logistic Regression    | `from sklearn.linear_model import LogisticRegression` | Good baseline, fast                               |
| Decision Tree          | `from sklearn.tree import DecisionTreeClassifier`     | Easy to interpret                                 |
| Random Forest          | `from sklearn.ensemble import RandomForestClassifier` | Strong accuracy; overfits less than a single tree |
| Support Vector Machine | `from sklearn.svm import SVC`                         | Good for high-dimensional data                    |
| K-Nearest Neighbors    | `from sklearn.neighbors import KNeighborsClassifier`  | Lazy learner                                      |
| Naive Bayes            | `from sklearn.naive_bayes import GaussianNB`          | Continuous features; use `MultinomialNB` for text |

---

## 12. Common Regression Models

| Algorithm               | Import                                               |
| ----------------------- | ---------------------------------------------------- |
| Linear Regression       | `from sklearn.linear_model import LinearRegression`  |
| Ridge/Lasso             | `from sklearn.linear_model import Ridge, Lasso`      |
| Random Forest Regressor | `from sklearn.ensemble import RandomForestRegressor` |
| SVR                     | `from sklearn.svm import SVR`                        |

---

## Common Errors to Avoid

| Problem                 | Fix                                                                                |
| ----------------------- | ---------------------------------------------------------------------------------- |
| Data not scaled         | Use `StandardScaler` or `MinMaxScaler`                                             |
| Categorical not encoded | Use `OneHotEncoder` or `pd.get_dummies()` for features, `LabelEncoder` for targets |
| Target data mismatch    | Make sure `y_test.shape == y_pred.shape`                                           |
| Overfitting             | Detect with `cross_val_score`; reduce with regularization or a simpler model       |
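
The first two fixes can be combined in one place with a `ColumnTransformer`. This is a sketch on a hypothetical toy DataFrame (the `age` and `city` columns and their values are made up for illustration), not a recommended setup for real data:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data, for illustration only
df = pd.DataFrame({
    'age': [25, 32, 47, 51, 38, 29],
    'city': ['NY', 'LA', 'NY', 'SF', 'LA', 'SF'],
    'target': [0, 1, 0, 1, 1, 0],
})
X, y = df.drop('target', axis=1), df['target']

# Scale the numeric column, one-hot encode the categorical one
preprocess = ColumnTransformer([
    ('num', StandardScaler(), ['age']),
    ('cat', OneHotEncoder(handle_unknown='ignore'), ['city']),
])

clf = Pipeline([('prep', preprocess), ('clf', LogisticRegression())])
clf.fit(X, y)

y_pred = clf.predict(X)
print(y_pred.shape == y.shape)   # predictions line up with the targets
```

`handle_unknown='ignore'` keeps prediction from failing when a category shows up that was not in the training data.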

---

## Bonus: Model Evaluation Metrics

| Task           | Metric                                                            |
| -------------- | ----------------------------------------------------------------- |
| Classification | `accuracy_score`, `f1_score`, `confusion_matrix`, `roc_auc_score` |
| Regression     | `mean_squared_error`, `mean_absolute_error`, `r2_score`           |
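
A quick sketch of a few of these metrics on tiny hand-made arrays (the values are hypothetical). Note that `roc_auc_score` expects scores or probabilities, not hard class predictions:

```python
from sklearn.metrics import f1_score, mean_absolute_error, roc_auc_score

# Hypothetical classification results, for illustration only
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]             # hard class predictions
y_prob = [0.1, 0.9, 0.4, 0.3, 0.8]   # predicted probability of class 1

f1 = f1_score(y_true, y_pred)        # precision 1.0, recall 2/3
auc = roc_auc_score(y_true, y_prob)  # ranks positives against negatives

# Hypothetical regression results
y_true_reg = [3.0, 5.0, 2.0]
y_pred_reg = [2.5, 5.0, 3.0]
mae = mean_absolute_error(y_true_reg, y_pred_reg)

print(f1, auc, mae)
```

For a fitted classifier, the probabilities would typically come from `model.predict_proba(X_test)[:, 1]`.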