{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Data Cleaning\n",
        "\n",
        "This notebook checks the synthetic dataset for issues such as missing values, outliers, and incorrect data types, preparing it for further analysis."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Import Libraries"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import pandas as pd\n",
        "import matplotlib.pyplot as plt\n",
        "import seaborn as sns\n",
        "\n",
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Load Dataset\n",
        "\n",
        "Load the dataset saved from the data collection notebook."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "df = pd.read_csv('fitness_data.csv')\n",
        "df"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Check for Missing Values\n",
        "\n",
        "Count the missing values in each column; every count should be zero for a complete dataset."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "df.isnull().sum()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Check for Outliers\n",
        "\n",
        "Use box plots to inspect each feature visually for outliers. Because the dataset is small and synthetic, we expect few outliers."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Set up the plotting style\n",
        "sns.set(style='whitegrid')\n",
        "\n",
        "# Plot box plots for each feature\n",
        "fig, axes = plt.subplots(2, 2, figsize=(12, 8))\n",
        "fig.suptitle('Box Plots for Outlier Detection')\n",
        "\n",
        "sns.boxplot(y=df['current_weight'], ax=axes[0, 0])\n",
        "axes[0, 0].set_title('Current Weight (kg)')\n",
        "\n",
        "sns.boxplot(y=df['exercise_hours'], ax=axes[0, 1])\n",
        "axes[0, 1].set_title('Weekly Exercise Hours')\n",
        "\n",
        "sns.boxplot(y=df['calorie_intake'], ax=axes[1, 0])\n",
        "axes[1, 0].set_title('Daily Calorie Intake')\n",
        "\n",
        "sns.boxplot(y=df['weeks_to_goal'], ax=axes[1, 1])\n",
        "axes[1, 1].set_title('Weeks to Goal')\n",
        "\n",
        "plt.tight_layout(rect=[0, 0, 1, 0.95])\n",
        "plt.savefig('box_plots.png')\n",
        "plt.show()"
      ]
    },
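    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As a numeric complement to the box plots, the sketch below counts the values in each numeric column that fall outside 1.5 times the interquartile range (IQR), the rule seaborn's box plot whiskers use by default. The helper name `iqr_outlier_counts` and its `k` parameter are illustrative assumptions, not part of the original workflow. A nonzero count flags values worth a closer look; it does not by itself prove an error."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import pandas as pd\n",
        "\n",
        "\n",
        "def iqr_outlier_counts(frame, k=1.5):\n",
        "    # Hypothetical helper: count values per numeric column that lie\n",
        "    # outside the interval [Q1 - k*IQR, Q3 + k*IQR].\n",
        "    numeric = frame.select_dtypes(include='number')\n",
        "    q1 = numeric.quantile(0.25)\n",
        "    q3 = numeric.quantile(0.75)\n",
        "    iqr = q3 - q1\n",
        "    # Comparing a DataFrame with a Series aligns the Series index on the columns\n",
        "    outside = (numeric < q1 - k * iqr) | (numeric > q3 + k * iqr)\n",
        "    return outside.sum()\n",
        "\n",
        "\n",
        "iqr_outlier_counts(df)"
      ]
    },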
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Check Data Types\n",
        "\n",
        "Confirm that every column has a numeric data type."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "df.dtypes"
      ]
    },
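    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Reading `df.dtypes` by eye works for a handful of columns. As a minimal sketch, the same check can be made explicit with `pandas.api.types.is_numeric_dtype`; the variable name `non_numeric` is an illustrative assumption. Note that this function also treats boolean columns as numeric."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from pandas.api.types import is_numeric_dtype\n",
        "\n",
        "# Collect the names of any columns that pandas does not consider numeric\n",
        "non_numeric = [col for col in df.columns if not is_numeric_dtype(df[col])]\n",
        "print('Non-numeric columns:', non_numeric if non_numeric else 'none')"
      ]
    },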
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Summary\n",
        "\n",
        "Because the synthetic dataset was created under controlled conditions, it has no missing values and no significant outliers. All columns are numeric (float64 or int64) and therefore suitable for regression. The dataset is clean and ready for further analysis."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Save Cleaned Dataset\n",
        "\n",
        "No cleaning was needed, so the dataset is saved unchanged under a new file name to keep the workflow consistent."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "df.to_csv('cleaned_fitness_data.csv', index=False)\n",
        "print('Cleaned dataset saved as cleaned_fitness_data.csv')"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.10"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 4
}