# Week 12: Comprehensive Applications - Final Synthesis

---

**Date**: 2025-11-22  
**Course**: BSMA1001 - Mathematics for Data Science I  
**Level**: Foundation  
**Week**: 12 of 12 (FINAL)  
**Topic Area**: Applied Mathematics - End-to-End ML Applications

---

## 🎓 Course Completion Celebration!

**Congratulations!** You've completed all 12 weeks of BSMA1001 Mathematics I!

This final notebook synthesizes **everything** from Weeks 4-11 into practical machine learning applications:

- Linear regression from scratch
- Logistic regression with regularization
- Gradient descent optimization
- Feature engineering using mathematical functions
- Complete ML pipeline implementation

## Learning Path Summary

**Weeks 4-11 Covered:**

1. Polynomials & algebra
2. Functions (composition, inverses)
3. Exponential & logarithmic functions
4. Trigonometric functions
5. Sequences & series
6. Limits & continuity
7. **Derivatives** (optimization!)
8. **Integration** (probability, expected values)

**This Week:** Put it all together in real ML applications!

## Prerequisites

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_regression, make_classification, load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, accuracy_score, confusion_matrix
```

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_regression, make_classification, load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, accuracy_score, confusion_matrix
import warnings

warnings.filterwarnings('ignore')

# Set random seed
np.random.seed(42)

# Configure plotting
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

print("✓ Libraries imported successfully")
print(f"NumPy version: {np.__version__}")
print()
print("🚀 Ready to build ML models from scratch using Weeks 4-11 concepts!")

## 1. Linear Regression from Scratch

**Model:** $h_\theta(x) = \theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n = \theta^T x$

**Loss (MSE):** $L(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$

**Gradient:** $\nabla L = \frac{1}{m} X^T (X\theta - y)$

**Update:** $\theta := \theta - \alpha \nabla L$
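
Before trusting a hand-derived gradient, it can help to compare it with a finite-difference approximation. The sketch below does this for the loss and gradient formulas above on a small random problem. The helper names (`mse_loss`, `mse_gradient`, `numerical_gradient`) and the random data are illustrative assumptions, not part of the course material.

```python
import numpy as np

def mse_loss(theta, X_b, y):
    """Loss from the formula above: (1 / 2m) * sum((X_b @ theta - y)^2)."""
    m = len(y)
    residual = X_b @ theta - y
    return (residual @ residual) / (2 * m)

def mse_gradient(theta, X_b, y):
    """Analytic gradient: (1 / m) * X_b^T (X_b @ theta - y)."""
    m = len(y)
    return X_b.T @ (X_b @ theta - y) / m

def numerical_gradient(f, theta, eps=1e-6):
    """Central-difference approximation of the gradient of f at theta."""
    grad = np.zeros_like(theta)
    for j in range(len(theta)):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad

# Small random problem (hypothetical data, for checking only)
rng = np.random.default_rng(0)
X_b = np.c_[np.ones(20), rng.normal(size=(20, 2))]  # bias column + 2 features
y = rng.normal(size=20)
theta = rng.normal(size=3)

analytic = mse_gradient(theta, X_b, y)
numeric = numerical_gradient(lambda t: mse_loss(t, X_b, y), theta)
max_diff = np.max(np.abs(analytic - numeric))
print(f"Max difference between analytic and numeric gradient: {max_diff:.2e}")
```

If the two gradients disagree by more than rounding error, the derivation or the implementation has a mistake.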

In [None]:
class LinearRegressionGD:
    """Linear Regression using Gradient Descent."""

    def __init__(self, learning_rate=0.01, n_iterations=1000, tol=1e-6):
        self.lr = learning_rate
        self.n_iters = n_iterations
        self.tol = tol
        self.theta = None
        self.loss_history = []

    def fit(self, X, y):
        """Train model using gradient descent."""
        # Add bias term (column of ones)
        m, n = X.shape
        X_b = np.c_[np.ones((m, 1)), X]  # Add x0=1

        # Initialize parameters
        self.theta = np.zeros(n + 1)

        # Gradient descent
        for iteration in range(self.n_iters):
            # Predictions
            predictions = X_b @ self.theta

            # Loss
            loss = (1 / (2 * m)) * np.sum((predictions - y) ** 2)
            self.loss_history.append(loss)

            # Gradient
            gradient = (1 / m) * X_b.T @ (predictions - y)

            # Update parameters
            self.theta -= self.lr * gradient

            # Check convergence
            if iteration > 0 and abs(self.loss_history[-2] - self.loss_history[-1]) < self.tol:
                print(f"Converged at iteration {iteration}")
                break

        return self

    def predict(self, X):
        """Make predictions."""
        X_b = np.c_[np.ones((X.shape[0], 1)), X]
        return X_b @ self.theta

# Generate synthetic data
X, y = make_regression(n_samples=100, n_features=1, noise=15, random_state=42)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = LinearRegressionGD(learning_rate=0.01, n_iterations=1000)
model.fit(X_train, y_train)

# Predictions
y_pred_train = model.predict(X_train)
y_pred_test = model.predict(X_test)

# Evaluate
train_rmse = np.sqrt(mean_squared_error(y_train, y_pred_train))
test_rmse = np.sqrt(mean_squared_error(y_test, y_pred_test))

print(f"\nTrain RMSE: {train_rmse:.4f}")
print(f"Test RMSE:  {test_rmse:.4f}")
print(f"\nLearned parameters: θ₀={model.theta[0]:.4f}, θ₁={model.theta[1]:.4f}")

# Visualize
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1: Regression line
ax1.scatter(X_train, y_train, alpha=0.6, label='Training data')
ax1.scatter(X_test, y_test, alpha=0.6, color='red', label='Test data')
X_plot = np.linspace(X.min(), X.max(), 100).reshape(-1, 1)
y_plot = model.predict(X_plot)
ax1.plot(X_plot, y_plot, 'b-', linewidth=2.5, label='Fitted line')
ax1.set_xlabel('X', fontsize=12)
ax1.set_ylabel('y', fontsize=12)
ax1.set_title('Linear Regression: Data + Fitted Line', fontsize=13, fontweight='bold')
ax1.legend(fontsize=11)
ax1.grid(True, alpha=0.3)

# Plot 2: Loss convergence
ax2.plot(model.loss_history, 'purple', linewidth=2)
ax2.set_xlabel('Iteration', fontsize=12)
ax2.set_ylabel('Loss (MSE)', fontsize=12)
ax2.set_title('Gradient Descent Convergence', fontsize=13, fontweight='bold')
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n✓ Linear regression from scratch complete!")

## 2. Feature Engineering with Mathematical Functions

**Idea:** Transform features using functions from Weeks 4-7 to capture nonlinear relationships.

- **Polynomial features** (Week 4): $x, x^2, x^3, \ldots$
- **Logarithmic features** (Week 6): $\log(x)$
- **Exponential features** (Week 6): $e^x$
- **Trigonometric features** (Week 7): $\sin(x), \cos(x)$
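
As a small illustration of the transformations listed above, the sketch below stacks them into one feature matrix. The function name `build_features` and the sample values are assumptions made for this example. Note that $\log(x)$ is only defined for strictly positive inputs, so the sketch rejects anything else.

```python
import numpy as np

def build_features(x):
    """Stack x, x^2, x^3, log(x), exp(x), sin(x), cos(x) as columns.

    Assumes every entry of x is strictly positive so that log(x) is defined.
    """
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("log(x) needs strictly positive inputs")
    return np.column_stack([
        x, x**2, x**3,        # polynomial features
        np.log(x),            # logarithmic feature
        np.exp(x),            # exponential feature
        np.sin(x), np.cos(x)  # trigonometric features
    ])

x = np.array([0.5, 1.0, 2.0])
features = build_features(x)
print(features.shape)  # one row per sample, one column per transformation
```

A linear model trained on these columns is still linear in its parameters, so the same gradient descent code applies unchanged.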

In [None]:
def create_polynomial_features(X, degree):
    """Create polynomial features up to given degree."""
    X_poly = X.copy()
    for d in range(2, degree + 1):
        X_poly = np.column_stack([X_poly, X ** d])
    return X_poly

# Generate nonlinear data
np.random.seed(42)
X_nonlin = np.linspace(-3, 3, 100).reshape(-1, 1)
y_nonlin = 0.5 * X_nonlin**3 - 2 * X_nonlin + np.random.randn(100, 1) * 2

# Split
X_train_nl, X_test_nl, y_train_nl, y_test_nl = train_test_split(
    X_nonlin, y_nonlin, test_size=0.2, random_state=42
)

# Compare linear vs polynomial regression
degrees = [1, 2, 3, 5]
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
axes = axes.flatten()

for idx, degree in enumerate(degrees):
    ax = axes[idx]

    # Create polynomial features
    X_train_poly = create_polynomial_features(X_train_nl, degree)
    X_test_poly = create_polynomial_features(X_test_nl, degree)

    # Standardize the columns: on [-3, 3] the x^5 column reaches 243, which
    # makes gradient descent on the raw features unstable (loss becomes NaN)
    poly_scaler = StandardScaler()
    X_train_poly = poly_scaler.fit_transform(X_train_poly)
    X_test_poly = poly_scaler.transform(X_test_poly)

    # Train model (a larger step size is safe once features are standardized)
    model_poly = LinearRegressionGD(learning_rate=0.1, n_iterations=2000)
    model_poly.fit(X_train_poly, y_train_nl.ravel())

    # Predictions
    y_pred_train = model_poly.predict(X_train_poly)
    y_pred_test = model_poly.predict(X_test_poly)

    # Evaluate
    train_rmse = np.sqrt(mean_squared_error(y_train_nl, y_pred_train))
    test_rmse = np.sqrt(mean_squared_error(y_test_nl, y_pred_test))

    # Plot
    ax.scatter(X_train_nl, y_train_nl, alpha=0.6, label='Train data')
    ax.scatter(X_test_nl, y_test_nl, alpha=0.6, color='red', label='Test data')

    # Fitted curve (apply the same scaling as for the training features)
    X_plot = np.linspace(-3, 3, 300).reshape(-1, 1)
    X_plot_poly = poly_scaler.transform(create_polynomial_features(X_plot, degree))
    y_plot = model_poly.predict(X_plot_poly)
    ax.plot(X_plot, y_plot, 'b-', linewidth=2.5, label=f'Degree {degree}')

    ax.set_title(f'Degree {degree}: Train RMSE={train_rmse:.2f}, Test RMSE={test_rmse:.2f}',
                fontsize=11, fontweight='bold')
    ax.set_xlabel('X', fontsize=11)
    ax.set_ylabel('y', fontsize=11)
    ax.legend(fontsize=9)
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n📊 Observation: Degree 3 fits well (true underlying function is cubic)")
print("   Degree 1 underfits; degree 5 adds flexibility the data does not need")
print("   and can overfit, especially when there are few data points!")

## 3. Logistic Regression for Classification

**Sigmoid activation** (Week 10): $\sigma(z) = \frac{1}{1+e^{-z}}$

**Model:** $h_\theta(x) = \sigma(\theta^T x)$

**Loss (Cross-Entropy):** $L(\theta) = -\frac{1}{m} \sum_{i=1}^{m} [y^{(i)} \log(h_\theta(x^{(i)})) + (1-y^{(i)})\log(1-h_\theta(x^{(i)}))]$

**Gradient:** $\nabla L = \frac{1}{m} X^T (\sigma(X\theta) - y)$

Note: The gradient has the same form as in linear regression, with $\sigma(X\theta)$ in place of $X\theta$, even though the loss function is different.
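
The gradient formula above follows from the chain rule together with the identity $\sigma'(z) = \sigma(z)(1 - \sigma(z))$. A short derivation, writing $z^{(i)} = \theta^T x^{(i)}$ and $h^{(i)} = \sigma(z^{(i)})$ as shorthand (notation introduced here for convenience):

$$
\ell^{(i)} = -\left[ y^{(i)} \log h^{(i)} + (1 - y^{(i)}) \log(1 - h^{(i)}) \right]
$$

$$
\frac{\partial \ell^{(i)}}{\partial h^{(i)}} = -\frac{y^{(i)}}{h^{(i)}} + \frac{1 - y^{(i)}}{1 - h^{(i)}} = \frac{h^{(i)} - y^{(i)}}{h^{(i)}(1 - h^{(i)})}
$$

$$
\frac{\partial \ell^{(i)}}{\partial \theta_j}
= \frac{\partial \ell^{(i)}}{\partial h^{(i)}} \cdot \frac{\partial h^{(i)}}{\partial z^{(i)}} \cdot \frac{\partial z^{(i)}}{\partial \theta_j}
= \frac{h^{(i)} - y^{(i)}}{h^{(i)}(1 - h^{(i)})} \cdot h^{(i)}(1 - h^{(i)}) \cdot x_j^{(i)}
= (h^{(i)} - y^{(i)})\, x_j^{(i)}
$$

$$
\nabla L = \frac{1}{m} \sum_{i=1}^{m} (h^{(i)} - y^{(i)})\, x^{(i)} = \frac{1}{m} X^T (\sigma(X\theta) - y)
$$

The factor $h^{(i)}(1 - h^{(i)})$ from the sigmoid derivative cancels the denominator from the log terms, which is why the result looks like the linear regression gradient.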

In [None]:
class LogisticRegressionGD:
    """Logistic Regression using Gradient Descent."""

    def __init__(self, learning_rate=0.01, n_iterations=1000, tol=1e-6):
        self.lr = learning_rate
        self.n_iters = n_iterations
        self.tol = tol
        self.theta = None
        self.loss_history = []

    def sigmoid(self, z):
        """Sigmoid activation function."""
        return 1 / (1 + np.exp(-z))

    def compute_loss(self, X_b, y):
        """Compute binary cross-entropy loss."""
        m = len(y)
        h = self.sigmoid(X_b @ self.theta)
        # Avoid log(0)
        epsilon = 1e-8
        loss = -(1/m) * (y @ np.log(h + epsilon) + (1 - y) @ np.log(1 - h + epsilon))
        return loss

    def fit(self, X, y):
        """Train model using gradient descent."""
        m, n = X.shape
        X_b = np.c_[np.ones((m, 1)), X]

        # Initialize parameters
        self.theta = np.zeros(n + 1)

        # Gradient descent
        for iteration in range(self.n_iters):
            # Predictions (probabilities)
            h = self.sigmoid(X_b @ self.theta)

            # Loss
            loss = self.compute_loss(X_b, y)
            self.loss_history.append(loss)

            # Gradient
            gradient = (1 / m) * X_b.T @ (h - y)

            # Update parameters
            self.theta -= self.lr * gradient

            # Check convergence
            if iteration > 0 and abs(self.loss_history[-2] - self.loss_history[-1]) < self.tol:
                print(f"Converged at iteration {iteration}")
                break

        return self

    def predict_proba(self, X):
        """Predict probabilities."""
        X_b = np.c_[np.ones((X.shape[0], 1)), X]
        return self.sigmoid(X_b @ self.theta)

    def predict(self, X, threshold=0.5):
        """Predict class labels."""
        return (self.predict_proba(X) >= threshold).astype(int)

# Generate classification data
X_clf, y_clf = make_classification(
    n_samples=200, n_features=2, n_redundant=0, n_informative=2,
    n_clusters_per_class=1, random_state=42
)

# Split
X_train_clf, X_test_clf, y_train_clf, y_test_clf = train_test_split(
    X_clf, y_clf, test_size=0.2, random_state=42
)

# Feature scaling (important for gradient descent!)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train_clf)
X_test_scaled = scaler.transform(X_test_clf)

# Train model
log_model = LogisticRegressionGD(learning_rate=0.1, n_iterations=1000)
log_model.fit(X_train_scaled, y_train_clf)

# Predictions
y_pred_train = log_model.predict(X_train_scaled)
y_pred_test = log_model.predict(X_test_scaled)

# Evaluate
train_acc = accuracy_score(y_train_clf, y_pred_train)
test_acc = accuracy_score(y_test_clf, y_pred_test)

print(f"\nTrain Accuracy: {train_acc:.4f}")
print(f"Test Accuracy:  {test_acc:.4f}")

# Visualize decision boundary
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))

# Plot 1: Decision boundary
x_min, x_max = X_train_scaled[:, 0].min() - 0.5, X_train_scaled[:, 0].max() + 0.5
y_min, y_max = X_train_scaled[:, 1].min() - 0.5, X_train_scaled[:, 1].max() + 0.5
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200),
                     np.linspace(y_min, y_max, 200))
Z = log_model.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

ax1.contourf(xx, yy, Z, alpha=0.3, levels=1, colors=['red', 'blue'])
ax1.scatter(X_train_scaled[y_train_clf == 0, 0], X_train_scaled[y_train_clf == 0, 1],
           c='red', marker='o', s=50, edgecolors='k', label='Class 0')
ax1.scatter(X_train_scaled[y_train_clf == 1, 0], X_train_scaled[y_train_clf == 1, 1],
           c='blue', marker='s', s=50, edgecolors='k', label='Class 1')
ax1.set_xlabel('Feature 1 (scaled)', fontsize=12)
ax1.set_ylabel('Feature 2 (scaled)', fontsize=12)
ax1.set_title(f'Logistic Regression Decision Boundary (Acc={test_acc:.3f})',
              fontsize=13, fontweight='bold')
ax1.legend(fontsize=11)
ax1.grid(True, alpha=0.3)

# Plot 2: Loss convergence
ax2.plot(log_model.loss_history, 'purple', linewidth=2)
ax2.set_xlabel('Iteration', fontsize=12)
ax2.set_ylabel('Loss (Cross-Entropy)', fontsize=12)
ax2.set_title('Training Loss Convergence', fontsize=13, fontweight='bold')
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n✓ Logistic regression with gradient descent complete!")

## 4. Regularization to Prevent Overfitting

**L2 Regularization (Ridge):** Add penalty $\frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$

**Modified loss:** $L_{\text{reg}}(\theta) = L(\theta) + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$

**Gradient:** $\nabla L_{\text{reg}} = \nabla L + \frac{\lambda}{m} \theta$, where the added term applies to $\theta_1, \ldots, \theta_n$ only; the bias $\theta_0$ is not regularized.

**Effect:** Shrinks weights, reduces model complexity, and can improve generalization.
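
To see the shrinking effect in isolation, the sketch below solves the regularized problem in closed form rather than by gradient descent (a swapped-in technique, used here only because it gives the exact minimizer). Setting the regularized gradient to zero gives $(X^T X + \lambda I')\theta = X^T y$, where $I'$ is the identity matrix with a zero in the bias position. The function name `ridge_closed_form` and the synthetic data are assumptions for this example.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Solve (X_b^T X_b + lam * I') theta = X_b^T y, leaving the bias unpenalized."""
    m = X.shape[0]
    X_b = np.c_[np.ones(m), X]               # add bias column
    penalty = lam * np.eye(X_b.shape[1])
    penalty[0, 0] = 0.0                      # do not regularize the bias term
    return np.linalg.solve(X_b.T @ X_b + penalty, X_b.T @ y)

# Synthetic data (hypothetical): 5 features, known weights plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
true_weights = np.array([3.0, -2.0, 0.0, 1.0, 0.5])
y = X @ true_weights + rng.normal(scale=0.5, size=40)

lambdas = [0.0, 1.0, 10.0, 100.0]
norms = []
for lam in lambdas:
    theta = ridge_closed_form(X, y, lam)
    norms.append(np.linalg.norm(theta[1:]))  # norm of the weights, bias excluded
    print(f"lambda={lam:6.1f}  ||weights|| = {norms[-1]:.4f}")
```

The weight norm decreases as $\lambda$ grows, which is the shrinkage the penalty term is designed to produce.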

In [None]:
class RidgeRegressionGD(LinearRegressionGD):
    """Linear Regression with L2 Regularization (Ridge)."""

    def __init__(self, learning_rate=0.01, n_iterations=1000, tol=1e-6, lambda_reg=0.1):
        super().__init__(learning_rate, n_iterations, tol)
        self.lambda_reg = lambda_reg

    def fit(self, X, y):
        """Train with L2 regularization."""
        m, n = X.shape
        X_b = np.c_[np.ones((m, 1)), X]

        self.theta = np.zeros(n + 1)

        for iteration in range(self.n_iters):
            predictions = X_b @ self.theta

            # Loss with L2 penalty
            mse_loss = (1 / (2 * m)) * np.sum((predictions - y) ** 2)
            reg_penalty = (self.lambda_reg / (2 * m)) * np.sum(self.theta[1:]**2)  # Don't regularize bias
            loss = mse_loss + reg_penalty
            self.loss_history.append(loss)

            # Gradient with regularization
            gradient = (1 / m) * X_b.T @ (predictions - y)
            gradient[1:] += (self.lambda_reg / m) * self.theta[1:]  # Regularize only weights

            self.theta -= self.lr * gradient

            if iteration > 0 and abs(self.loss_history[-2] - self.loss_history[-1]) < self.tol:
                break

        return self

# Generate high-degree polynomial features (prone to overfitting)
X_over = np.linspace(-1, 1, 30).reshape(-1, 1)
y_over = 2 * X_over + np.random.randn(30, 1) * 0.3

X_train_over, X_test_over, y_train_over, y_test_over = train_test_split(
    X_over, y_over, test_size=0.3, random_state=42
)

degree = 15  # Very high degree!
X_train_poly_over = create_polynomial_features(X_train_over, degree)
X_test_poly_over = create_polynomial_features(X_test_over, degree)

# Compare unregularized vs regularized
# Note: the small step size and limited iteration count also act as implicit
# regularization, so the differences between panels may be modest
lambdas = [0, 0.01, 0.1, 1.0]
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
axes = axes.flatten()

for idx, lam in enumerate(lambdas):
    ax = axes[idx]

    if lam == 0:
        model = LinearRegressionGD(learning_rate=0.0001, n_iterations=5000)
    else:
        model = RidgeRegressionGD(learning_rate=0.0001, n_iterations=5000, lambda_reg=lam)

    model.fit(X_train_poly_over, y_train_over.ravel())

    y_pred_train = model.predict(X_train_poly_over)
    y_pred_test = model.predict(X_test_poly_over)

    train_rmse = np.sqrt(mean_squared_error(y_train_over, y_pred_train))
    test_rmse = np.sqrt(mean_squared_error(y_test_over, y_pred_test))

    # Plot
    ax.scatter(X_train_over, y_train_over, alpha=0.7, label='Train')
    ax.scatter(X_test_over, y_test_over, alpha=0.7, color='red', label='Test')

    X_plot = np.linspace(-1, 1, 200).reshape(-1, 1)
    X_plot_poly = create_polynomial_features(X_plot, degree)
    y_plot = model.predict(X_plot_poly)
    ax.plot(X_plot, y_plot, 'b-', linewidth=2.5, label=f'λ={lam}')

    title = f'λ={lam}: Train RMSE={train_rmse:.3f}, Test RMSE={test_rmse:.3f}'
    if lam == 0:
        title += ' (no regularization)'
    ax.set_title(title, fontsize=11, fontweight='bold')
    ax.set_xlabel('X', fontsize=11)
    ax.set_ylabel('y', fontsize=11)
    ax.legend(fontsize=9)
    ax.grid(True, alpha=0.3)
    ax.set_ylim(-2, 2)

plt.tight_layout()
plt.show()

print("\n📊 Observation: λ=0 applies no penalty, so the degree-15 fit is free to chase noise.")
print("   Larger λ shrinks the weights and smooths the curve; compare test RMSE across panels.")

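## 5. Complete ML Pipeline: Iris Classification

**Real dataset:** Iris flowers (3 classes, 4 features)

**Pipeline steps:**

1. Load and explore data
2. Train/test split
3. Feature scaling
4. Train model
5. Evaluate with confusion matrix
6. Visualize results

To keep the decision boundary easy to plot, the cell below restricts the data to two classes (Setosa vs. Versicolor) and the first two features (sepal length and sepal width), then trains the binary `LogisticRegressionGD` from Section 3. Covering all three classes would require a multiclass scheme such as one-vs-all logistic regression.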
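
For problems with more than two classes, a one-vs-all scheme trains one binary classifier per class (that class against all others) and predicts the class whose classifier reports the highest probability. The sketch below is a minimal, self-contained version on synthetic blobs rather than Iris. The function names (`fit_binary`, `fit_one_vs_all`, `predict_one_vs_all`), the data, and the hyperparameters are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_binary(X_b, y, lr=0.1, n_iters=500):
    """Binary logistic regression by gradient descent; X_b already has a bias column."""
    theta = np.zeros(X_b.shape[1])
    for _ in range(n_iters):
        theta -= lr * X_b.T @ (sigmoid(X_b @ theta) - y) / len(y)
    return theta

def fit_one_vs_all(X, y, n_classes, lr=0.1, n_iters=500):
    """Train one 'class k vs. rest' classifier per class; returns shape (n_classes, n_features + 1)."""
    X_b = np.c_[np.ones(len(X)), X]
    return np.array([
        fit_binary(X_b, (y == k).astype(float), lr, n_iters)
        for k in range(n_classes)
    ])

def predict_one_vs_all(thetas, X):
    """Pick the class whose classifier gives the highest probability."""
    X_b = np.c_[np.ones(len(X)), X]
    return np.argmax(sigmoid(X_b @ thetas.T), axis=1)

# Three well-separated synthetic blobs (hypothetical data)
rng = np.random.default_rng(0)
centers = np.array([[0.0, 4.0], [-4.0, -2.0], [4.0, -2.0]])
X = np.vstack([c + rng.normal(scale=0.7, size=(30, 2)) for c in centers])
y = np.repeat(np.arange(3), 30)

thetas = fit_one_vs_all(X, y, n_classes=3)
accuracy = np.mean(predict_one_vs_all(thetas, X) == y)
print(f"Training accuracy: {accuracy:.3f}")
```

The same structure would apply to the three Iris classes after feature scaling, with a held-out test set used for evaluation instead of the training accuracy shown here.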

In [None]:
# Load Iris dataset
iris = load_iris()
X_iris = iris.data
y_iris = iris.target

# Use only 2 classes for binary classification
binary_mask = y_iris != 2
X_binary = X_iris[binary_mask]
y_binary = y_iris[binary_mask]

# Use only 2 features for visualization
X_vis = X_binary[:, :2]

# Split
X_train_iris, X_test_iris, y_train_iris, y_test_iris = train_test_split(
    X_vis, y_binary, test_size=0.3, random_state=42
)

# Scale features
scaler_iris = StandardScaler()
X_train_iris_scaled = scaler_iris.fit_transform(X_train_iris)
X_test_iris_scaled = scaler_iris.transform(X_test_iris)

# Train logistic regression
iris_model = LogisticRegressionGD(learning_rate=0.1, n_iterations=1000)
iris_model.fit(X_train_iris_scaled, y_train_iris)

# Predictions
y_pred_iris = iris_model.predict(X_test_iris_scaled)

# Evaluate
test_acc_iris = accuracy_score(y_test_iris, y_pred_iris)
cm = confusion_matrix(y_test_iris, y_pred_iris)

print("\n" + "="*70)
print("IRIS CLASSIFICATION RESULTS")
print("="*70)
print(f"\nTest Accuracy: {test_acc_iris:.4f}")
print(f"\nConfusion Matrix:")
print(cm)
print(f"\nTrue Negatives:  {cm[0,0]}")
print(f"False Positives: {cm[0,1]}")
print(f"False Negatives: {cm[1,0]}")
print(f"True Positives:  {cm[1,1]}")

# Visualize
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

# Plot 1: Training data
ax = axes[0]
ax.scatter(X_train_iris_scaled[y_train_iris == 0, 0],
           X_train_iris_scaled[y_train_iris == 0, 1],
          c='red', marker='o', s=60, edgecolors='k', label='Setosa')
ax.scatter(X_train_iris_scaled[y_train_iris == 1, 0],
           X_train_iris_scaled[y_train_iris == 1, 1],
          c='blue', marker='s', s=60, edgecolors='k', label='Versicolor')
ax.set_xlabel('Sepal Length (scaled)', fontsize=12)
ax.set_ylabel('Sepal Width (scaled)', fontsize=12)
ax.set_title('Training Data', fontsize=13, fontweight='bold')
ax.legend(fontsize=11)
ax.grid(True, alpha=0.3)

# Plot 2: Decision boundary
ax = axes[1]
x_min, x_max = X_train_iris_scaled[:, 0].min() - 0.5, X_train_iris_scaled[:, 0].max() + 0.5
y_min, y_max = X_train_iris_scaled[:, 1].min() - 0.5, X_train_iris_scaled[:, 1].max() + 0.5
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200),
                     np.linspace(y_min, y_max, 200))
Z = iris_model.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

ax.contourf(xx, yy, Z, alpha=0.3, levels=1, colors=['red', 'blue'])
ax.scatter(X_test_iris_scaled[y_test_iris == 0, 0],
           X_test_iris_scaled[y_test_iris == 0, 1],
          c='red', marker='o', s=60, edgecolors='k', label='Setosa')
ax.scatter(X_test_iris_scaled[y_test_iris == 1, 0],
           X_test_iris_scaled[y_test_iris == 1, 1],
          c='blue', marker='s', s=60, edgecolors='k', label='Versicolor')
ax.set_xlabel('Sepal Length (scaled)', fontsize=12)
ax.set_ylabel('Sepal Width (scaled)', fontsize=12)
ax.set_title(f'Decision Boundary (Acc={test_acc_iris:.3f})',
             fontsize=13, fontweight='bold')
ax.legend(fontsize=11)
ax.grid(True, alpha=0.3)

# Plot 3: Confusion matrix heatmap
ax = axes[2]
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', cbar=False, ax=ax,
           xticklabels=['Setosa', 'Versicolor'],
           yticklabels=['Setosa', 'Versicolor'])
ax.set_xlabel('Predicted', fontsize=12)
ax.set_ylabel('Actual', fontsize=12)
ax.set_title('Confusion Matrix', fontsize=13, fontweight='bold')

plt.tight_layout()
plt.show()

print("\n✓ Complete ML pipeline on real data successful!")

## 6. 🎓 Final Course Summary

### Mathematical Concepts → ML Applications

| Week | Concept | ML Application |
|------|---------|----------------|
| 4 | Polynomials | Polynomial regression, feature engineering |
| 5 | Functions | Model as function approximation $h: X \rightarrow Y$ |
| 6 | Exponential/Log | Sigmoid ($\sigma$), softmax, log-loss |
| 7 | Trigonometry | Fourier features, cosine similarity |
| 8 | Series | Taylor approximations, weight initialization |
| 9 | Limits/Continuity | Loss surface analysis, gradient existence |
| 10 | **Derivatives** | Gradient descent, backpropagation |
| 11 | **Integration** | Expected values, probability distributions |

### What You've Accomplished

✅ **Built ML models from scratch:**

- Linear regression with gradient descent
- Logistic regression for classification
- Ridge regression with L2 regularization

✅ **Applied mathematical concepts:**

- Used derivatives for optimization (gradient descent)
- Employed polynomial features for nonlinear modeling
- Implemented sigmoid activation function
- Computed cross-entropy loss with logarithms
- Scaled features for better convergence

✅ **Understood the theory:**

- Why gradient descent minimizes loss
- How learning rate affects convergence
- Why regularization prevents overfitting
- Connection between calculus and ML

### Next Steps in Your Journey

**Foundation Level (Continue):**

- Statistics I & II (probability, inference)
- Python Programming (data structures, algorithms)
- Computational Thinking (problem-solving)

**Diploma Level (After Foundation):**

- Linear Algebra (matrices, eigenvectors - CRITICAL for ML!)
- Probability Theory (Bayesian methods)
- Machine Learning (advanced algorithms)

**Keep Practicing:**

- Implement more algorithms (decision trees, SVM, neural networks)
- Work on Kaggle competitions
- Build end-to-end projects
- Read research papers

---

## 🚀 Congratulations on Completing BSMA1001!

You've built a **solid mathematical foundation** for data science. The calculus, functions, and series you mastered are the **engines of modern AI**. Every neural network, every optimization algorithm, every probabilistic model relies on these principles.

**You're now ready to:**

- Understand ML research papers
- Implement algorithms from mathematical descriptions
- Debug models using mathematical intuition
- Contribute to data science projects

**Keep learning, keep coding, and enjoy the journey!** 🎉

---

*Thank you for your dedication to learning mathematics for data science. The skills you've developed will serve you throughout your career in AI/ML!*