{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Modeling and Evaluation\n",
        "\n",
        "This notebook trains a linear regression model to predict `weeks_to_goal` and evaluates it on a held-out test set using R², RMSE, and MAE."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Import Libraries"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import pandas as pd\n",
        "from sklearn.linear_model import LinearRegression\n",
        "from sklearn.model_selection import train_test_split\n",
        "from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error\n",
        "import matplotlib.pyplot as plt\n",
        "import seaborn as sns\n",
        "import pickle\n",
        "\n",
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Load Dataset\n",
        "\n",
        "Load the engineered dataset from the previous notebook."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "df = pd.read_csv('engineered_fitness_data.csv')\n",
        "df"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Prepare Data\n",
        "\n",
        "Separate the four feature columns (`X`), including `weight_difference`, from the target `weeks_to_goal` (`y`), then hold out 20% of the rows for testing."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "X = df[['current_weight', 'exercise_hours', 'calorie_intake', 'weight_difference']]\n",
        "y = df['weeks_to_goal']\n",
        "\n",
        "# Split into training and testing sets (80/20)\n",
        "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Train Model\n",
        "\n",
        "Train a linear regression model."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "model = LinearRegression()\n",
        "model.fit(X_train, y_train)"
      ]
    },
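    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "A fitted linear regression can be read through its coefficients: each one is the change in the prediction per unit increase in that feature, with the other features held fixed. The cell below is a minimal sketch of how to pair coefficients with feature names. It runs on made-up, noise-free data (`X_demo`, `y_demo`, `true_coefs`, and the intercept of 3.0 are all assumptions for illustration), so it says nothing about the real fitness dataset. With this notebook's own objects, the equivalent call would be `pd.Series(model.coef_, index=X.columns)`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import numpy as np\n",
        "import pandas as pd\n",
        "from sklearn.linear_model import LinearRegression\n",
        "\n",
        "# Hypothetical, noise-free toy data that reuses the notebook's feature names\n",
        "rng = np.random.default_rng(0)\n",
        "features = ['current_weight', 'exercise_hours', 'calorie_intake', 'weight_difference']\n",
        "X_demo = pd.DataFrame(rng.normal(size=(50, 4)), columns=features)\n",
        "true_coefs = np.array([0.5, -1.0, 0.2, 2.0])  # made-up coefficients\n",
        "y_demo = X_demo.to_numpy() @ true_coefs + 3.0  # made-up intercept of 3.0\n",
        "\n",
        "demo_model = LinearRegression().fit(X_demo, y_demo)\n",
        "\n",
        "# Pair each fitted coefficient with its feature name\n",
        "coef_table = pd.Series(demo_model.coef_, index=features, name='coefficient')\n",
        "print(coef_table)\n",
        "print(f'Intercept: {demo_model.intercept_:.3f}')"
      ]
    },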
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Evaluate Model\n",
        "\n",
        "Calculate R², RMSE, and MAE on both the training and testing sets. A large gap between the two sets of scores would suggest overfitting."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Predictions\n",
        "y_train_pred = model.predict(X_train)\n",
        "y_test_pred = model.predict(X_test)\n",
        "\n",
        "# Metrics\n",
        "train_r2 = r2_score(y_train, y_train_pred)\n",
        "test_r2 = r2_score(y_test, y_test_pred)\n",
        "# RMSE as the square root of MSE; the `squared=False` argument was deprecated in scikit-learn 1.4\n",
        "train_rmse = mean_squared_error(y_train, y_train_pred) ** 0.5\n",
        "test_rmse = mean_squared_error(y_test, y_test_pred) ** 0.5\n",
        "train_mae = mean_absolute_error(y_train, y_train_pred)\n",
        "test_mae = mean_absolute_error(y_test, y_test_pred)\n",
        "\n",
        "print(f'Training R²: {train_r2:.3f}')\n",
        "print(f'Testing R²: {test_r2:.3f}')\n",
        "print(f'Training RMSE: {train_rmse:.3f}')\n",
        "print(f'Testing RMSE: {test_rmse:.3f}')\n",
        "print(f'Training MAE: {train_mae:.3f}')\n",
        "print(f'Testing MAE: {test_mae:.3f}')"
      ]
    },
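    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To make the three metrics concrete, the sketch below recomputes them by hand with NumPy and checks the results against scikit-learn. The arrays `y_true` and `y_pred` are made-up values for illustration, not rows from the fitness dataset. MAE is the mean absolute residual, RMSE is the square root of the mean squared residual, and R² is one minus the ratio of the residual sum of squares to the total sum of squares."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import numpy as np\n",
        "from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error\n",
        "\n",
        "# Hypothetical toy values, not taken from the fitness dataset\n",
        "y_true = np.array([10.0, 12.0, 8.0, 15.0])\n",
        "y_pred = np.array([11.0, 11.0, 9.0, 14.0])\n",
        "\n",
        "residuals = y_true - y_pred\n",
        "mae_manual = np.mean(np.abs(residuals))            # mean absolute error\n",
        "rmse_manual = np.sqrt(np.mean(residuals ** 2))     # root mean squared error\n",
        "ss_res = np.sum(residuals ** 2)                    # residual sum of squares\n",
        "ss_tot = np.sum((y_true - y_true.mean()) ** 2)     # total sum of squares\n",
        "r2_manual = 1 - ss_res / ss_tot                    # coefficient of determination\n",
        "\n",
        "# The hand-computed values should agree with scikit-learn\n",
        "print(np.isclose(mae_manual, mean_absolute_error(y_true, y_pred)))\n",
        "print(np.isclose(rmse_manual, mean_squared_error(y_true, y_pred) ** 0.5))\n",
        "print(np.isclose(r2_manual, r2_score(y_true, y_pred)))"
      ]
    },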
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Visualize Predictions\n",
        "\n",
        "Plot predicted vs. actual values for the test set."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Set up the plotting style\n",
        "sns.set(style='whitegrid')\n",
        "\n",
        "# Scatter plot of predicted vs. actual\n",
        "plt.figure(figsize=(6, 4))\n",
        "plt.scatter(y_test, y_test_pred, alpha=0.6)\n",
        "# Dashed y = x reference line: perfect predictions fall on it\n",
        "plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--')\n",
        "plt.xlabel('Actual Weeks to Goal')\n",
        "plt.ylabel('Predicted Weeks to Goal')\n",
        "plt.title('Predicted vs. Actual Weeks to Goal')\n",
        "# Tight bounding box keeps the axis labels from being clipped in the saved file\n",
        "plt.savefig('predicted_vs_actual.png', dpi=150, bbox_inches='tight')\n",
        "plt.show()"
      ]
    },
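    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Save Model\n",
        "\n",
        "Save the trained model with `pickle` so the Flask app can load it. The app should run the same scikit-learn version used here, because pickled models are not guaranteed to load correctly across versions."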
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "with open('model.pkl', 'wb') as f:\n",
        "    pickle.dump(model, f)\n",
        "print('Model saved as model.pkl')"
      ]
    },
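    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Before relying on a pickled model elsewhere, it can help to confirm that a save-and-load round trip leaves the predictions unchanged. The cell below is a self-contained sketch of that check. It uses a toy model fitted on random data and a temporary file (`X_toy`, `y_toy`, `toy_model`, and `toy_model.pkl` are hypothetical names), so it does not touch `model.pkl`. The same pattern, `pickle.load` on a file opened in `'rb'` mode, is presumably what the Flask app would use."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import os\n",
        "import pickle\n",
        "import tempfile\n",
        "\n",
        "import numpy as np\n",
        "from sklearn.linear_model import LinearRegression\n",
        "\n",
        "# Hypothetical stand-in data with four features, like the notebook's X\n",
        "rng = np.random.default_rng(0)\n",
        "X_toy = rng.normal(size=(20, 4))\n",
        "y_toy = X_toy @ np.array([1.0, -2.0, 0.5, 3.0]) + 4.0\n",
        "\n",
        "toy_model = LinearRegression().fit(X_toy, y_toy)\n",
        "\n",
        "# Write the model to a temporary file, then read it back\n",
        "with tempfile.TemporaryDirectory() as tmp_dir:\n",
        "    path = os.path.join(tmp_dir, 'toy_model.pkl')\n",
        "    with open(path, 'wb') as f:\n",
        "        pickle.dump(toy_model, f)\n",
        "    with open(path, 'rb') as f:\n",
        "        reloaded = pickle.load(f)\n",
        "\n",
        "# The reloaded model should reproduce the original predictions\n",
        "same_predictions = bool(np.allclose(toy_model.predict(X_toy), reloaded.predict(X_toy)))\n",
        "print('Round trip preserved predictions:', same_predictions)"
      ]
    },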
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Summary\n",
        "\n",
        "The linear regression model was trained on the engineered dataset, including `weight_difference`, and evaluated on a 20% hold-out set. R² gives the share of variance in `weeks_to_goal` that the model explains, while RMSE and MAE give the typical prediction error in weeks. Because the dataset is small, these estimates may not generalize. In the predicted vs. actual plot, points closer to the dashed diagonal indicate more accurate test predictions."
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.10"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 4
}