
# 🤖 Training & Evaluating Machine Learning Models

This notebook provides **code templates and checklists** for **training, evaluating, and tuning ML models**.

### 🔹 What’s Covered:
- Splitting data and choosing the right model
- Training & evaluating regression and classification models
- Using cross-validation for better evaluation
- Hyperparameter tuning techniques


In [None]:

# Ensure required libraries are installed (uncomment if necessary)
# The PyPI package is named "scikit-learn"; it is imported as "sklearn"
# !pip install pandas numpy scikit-learn



## 📂 Splitting Data for Model Training

✅ Always split data into **training and testing sets**.  
✅ Consider **stratified sampling** for imbalanced classification problems.  
✅ Avoid **data leakage** when preparing datasets.  


In [None]:

import pandas as pd
from sklearn.model_selection import train_test_split

# Load dataset (Replace with actual data)
df = pd.read_csv("your_dataset.csv")

# Define features (X) and target (y)
X = df.drop(columns=['target'])
y = df['target']

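# Split dataset into 80% training and 20% testing.
# stratify=y keeps class proportions the same in both splits. It applies to
# classification targets only, so remove it when y is continuous (regression).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)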

print(f"Training data shape: {X_train.shape}, Testing data shape: {X_test.shape}")
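
One common source of data leakage is fitting a preprocessing step on the full dataset before splitting. The following is a minimal sketch of one way to avoid that, assuming synthetic stand-in data and a `StandardScaler` step, neither of which is part of the original notebook. Wrapping the scaler and model in a pipeline means the scaler is fitted on the training rows only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 200 rows, 4 numeric features, binary target
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=5.0, scale=2.0, size=(200, 4))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 10.0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# The pipeline fits the scaler inside fit(), so only training rows are used
leak_free = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
leak_free.fit(X_tr, y_tr)

# The learned scaling statistics come from the training split alone
scaler = leak_free.named_steps["standardscaler"]
test_accuracy = leak_free.score(X_te, y_te)
print(f"Scaler means (training rows only): {scaler.mean_}")
print(f"Test accuracy: {test_accuracy:.2f}")
```

The same pipeline object can be passed to `cross_val_score` or `GridSearchCV`, in which case the scaler is refitted within each fold.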



## 📈 Training a Regression Model

✅ Use **Linear Regression** as a baseline model.  
✅ Check **Mean Squared Error (MSE)** and **R² Score** for evaluation.  


In [None]:

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Train a linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse}, R² Score: {r2}")
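
To make the two metrics concrete, here is a small sketch that computes MSE and R² by hand and compares them with scikit-learn's functions. The four target and prediction values are made up for illustration and are not from the notebook's dataset.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up values for illustration only
y_true_demo = np.array([3.0, 5.0, 7.0, 9.0])
y_pred_demo = np.array([2.5, 5.0, 7.5, 10.0])

# MSE: the mean of the squared residuals
residuals = y_true_demo - y_pred_demo
mse_manual = np.mean(residuals ** 2)

# R²: 1 minus (residual sum of squares / total sum of squares around the mean)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true_demo - y_true_demo.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print(f"MSE: {mse_manual}, R²: {r2_manual}")  # MSE: 0.375, R²: 0.925
```

An R² of 1 means perfect predictions, 0 means the model does no better than always predicting the mean of `y`, and negative values mean it does worse.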



## 🔍 Training a Classification Model

✅ Try a **Logistic Regression** or **Decision Tree** as a baseline.  
✅ Evaluate with **Accuracy, Precision, Recall, and F1-score**.  


In [None]:

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

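# Train a logistic regression model
# max_iter is raised from the default of 100 to reduce convergence warnings on unscaled features
model = LogisticRegression(max_iter=1000)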
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")
print(classification_report(y_test, y_pred))
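
The numbers in the classification report all derive from the confusion matrix. The sketch below works through them for a made-up set of ten binary labels (illustrative values, not from the notebook's data).

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration: 4 positives, 6 negatives
y_true_demo = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred_demo = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# For binary labels, ravel() returns counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

accuracy_demo = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that are correct
precision_demo = tp / (tp + fp)                  # share of predicted positives that are correct
recall_demo = tp / (tp + fn)                     # share of actual positives that were found
f1_demo = 2 * precision_demo * recall_demo / (precision_demo + recall_demo)

print(f"Accuracy: {accuracy_demo}, Precision: {precision_demo}, "
      f"Recall: {recall_demo}, F1: {f1_demo}")
# Accuracy: 0.8, Precision: 0.75, Recall: 0.75, F1: 0.75
```

On imbalanced data, accuracy can look high while recall on the minority class is poor, which is why the per-class metrics are worth checking.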



## 🔄 Using Cross-Validation

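✅ Use **cross-validation** to assess model performance across several train/validation splits instead of a single one.  
✅ Helps **detect overfitting**: a large gap between training and validation scores is a warning sign. Cross-validation does not prevent overfitting by itself.  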


In [None]:

from sklearn.model_selection import cross_val_score

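# Perform 5-fold cross-validation on the training data only,
# so the held-out test set stays untouched for the final evaluation
scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")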
print(f"Cross-Validation Accuracy: {scores.mean():.2f} ± {scores.std():.2f}")
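
One way to use cross-validation as an overfitting check is to compare training and validation scores per fold. The sketch below assumes synthetic data from `make_classification` and an unpruned decision tree, both chosen here for illustration because such a tree tends to memorize its training folds.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption, not the notebook's dataset)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=10, n_informative=4, random_state=0
)

# return_train_score=True also reports the score on each fold's training portion
cv_results = cross_validate(
    DecisionTreeClassifier(random_state=0),
    X_demo,
    y_demo,
    cv=5,
    scoring="accuracy",
    return_train_score=True,
)

train_mean = cv_results["train_score"].mean()
val_mean = cv_results["test_score"].mean()
print(f"Mean training accuracy:   {train_mean:.2f}")
print(f"Mean validation accuracy: {val_mean:.2f}")
```

A training score well above the validation score suggests the model is fitting noise; limiting tree depth or adding regularization usually narrows the gap.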



## ⚙️ Hyperparameter Tuning

✅ Use **Grid Search** for exhaustive search over hyperparameters.  
✅ Use **Randomized Search** for faster tuning.  


In [None]:

from sklearn.model_selection import GridSearchCV

# Define hyperparameter grid
param_grid = {
    "C": [0.1, 1, 10],
    "solver": ["lbfgs", "liblinear"]
}

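# Perform grid search (every combination in param_grid is evaluated with 5-fold CV)
# max_iter is raised from the default of 100 to reduce convergence warnings
grid_search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5, scoring="accuracy")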
grid_search.fit(X_train, y_train)

print(f"Best Parameters: {grid_search.best_params_}, Best Score: {grid_search.best_score_}")
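
Randomized Search, mentioned in the checklist above, samples a fixed number of candidates instead of trying every combination. The following is a minimal sketch, assuming synthetic data and a log-uniform range for `C` of 0.001 to 100; the range and `n_iter=10` are illustrative choices, not values from the original notebook.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data (assumption)
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)

# Sample C on a log scale rather than from a fixed list (illustrative range)
param_distributions = {"C": loguniform(1e-3, 1e2)}

random_search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions,
    n_iter=10,        # number of sampled candidates
    cv=5,
    scoring="accuracy",
    random_state=42,  # makes the sampled candidates reproducible
)
random_search.fit(X_demo, y_demo)

print(f"Best Parameters: {random_search.best_params_}")
print(f"Best CV Score: {random_search.best_score_:.3f}")
```

The cost is controlled by `n_iter` rather than by the size of the grid, which matters most when several hyperparameters are tuned at once.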



## ✅ Best Practices & Common Pitfalls

- **Baseline first**: Always start with a simple model.  
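- **Use cross-validation**: Reduces the risk of misleading results from a single train/test split.  
- **Check assumptions**: Ensure data distributions align with model expectations (for example, linear regression assumes a roughly linear relationship between features and target).  
- **Avoid overfitting**: Use regularization, compare training and validation scores, and keep the test set out of hyperparameter tuning.  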
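
As an illustration of the "baseline first" point, the sketch below compares a majority-class `DummyClassifier` with logistic regression. The imbalanced synthetic data (roughly 80/20) is an assumption made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (assumption): about 80% class 0
X_demo, y_demo = make_classification(
    n_samples=500, n_features=6, weights=[0.8, 0.2], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Trivial baseline: always predict the most frequent training class
dummy = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)
dummy_acc = dummy.score(X_te, y_te)

# Simple model to compare against the trivial baseline
model_acc = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)

print(f"Majority-class baseline accuracy: {dummy_acc:.2f}")
print(f"Logistic regression accuracy:     {model_acc:.2f}")
```

A model is only useful if it clearly beats this kind of baseline; on imbalanced data the baseline accuracy alone can already look high.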
