{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Data Collection\n",
        "\n",
        "This notebook covers the data collection process for the Fitness Goal Tracker project. The dataset is synthetic and embedded in `app.py`; it contains user attributes used to predict the time (in weeks) needed to reach a target weight."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Import Libraries"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import pandas as pd\n",
        "import matplotlib.pyplot as plt\n",
        "import seaborn as sns\n",
        "\n",
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Load Dataset\n",
        "\n",
        "The synthetic dataset has 10 records with three features (`current_weight`, `exercise_hours`, `calorie_intake`) and one target (`weeks_to_goal`)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Synthetic dataset from app.py\n",
        "data = {\n",
        "    'current_weight': [80, 90, 75, 85, 70, 95, 82, 78, 88, 92],\n",
        "    'exercise_hours': [5, 3, 6, 4, 7, 2, 5, 6, 3, 4],\n",
        "    'calorie_intake': [2000, 2200, 1800, 2100, 1900, 2300, 2000, 1950, 2150, 2250],\n",
        "    'weeks_to_goal': [12, 15, 10, 13, 8, 18, 11, 9, 14, 16]\n",
        "}\n",
        "df = pd.DataFrame(data)\n",
        "\n",
        "# Display the dataset\n",
        "df"
      ]
    },
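    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "A minimal sketch of sanity checks on the loaded DataFrame. The expected shape `(10, 4)` and the positivity check are assumptions based on the current synthetic data in `app.py`; update them if that data changes. The name `expected_columns` is illustrative."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Sanity checks on the loaded data (illustrative; expected values assume the current synthetic data)\n",
        "expected_columns = ['current_weight', 'exercise_hours', 'calorie_intake', 'weeks_to_goal']\n",
        "assert list(df.columns) == expected_columns, 'unexpected column names or order'\n",
        "assert df.shape == (10, 4), f'expected 10 rows x 4 columns, got {df.shape}'\n",
        "assert df.notna().all().all(), 'dataset contains missing values'\n",
        "assert all(pd.api.types.is_numeric_dtype(df[col]) for col in df.columns), 'non-numeric column found'\n",
        "# Weights, hours, calories, and weeks should all be strictly positive\n",
        "assert (df > 0).all().all(), 'found a non-positive value'\n",
        "print('All sanity checks passed')"
      ]
    },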
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Dataset Overview\n",
        "\n",
        "Inspect the column types, non-null counts, and summary statistics."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Display dataset info\n",
        "df.info()\n",
        "\n",
        "# Display basic statistics\n",
        "df.describe()"
      ]
    },
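    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Because every column is numeric, one quick way to see how each feature relates to the target is its Pearson correlation with `weeks_to_goal`. This is a sketch: with only 10 synthetic records the coefficients are descriptive and should not be read as real-world effects. The variable name `correlations` is illustrative."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Pearson correlation of each feature with the target, sorted ascending\n",
        "correlations = df.corr()['weeks_to_goal'].drop('weeks_to_goal').sort_values()\n",
        "correlations"
      ]
    },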
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Save Dataset\n",
        "\n",
        "Save the dataset as a CSV file for use in other notebooks."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "df.to_csv('fitness_data.csv', index=False)\n",
        "print('Dataset saved as fitness_data.csv')"
      ]
    },
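    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To confirm that the saved file loads correctly in other notebooks, a minimal round-trip check reads the CSV back and compares it with the in-memory DataFrame using `pandas.testing.assert_frame_equal`, which raises an `AssertionError` on any mismatch in values, columns, index, or dtypes. The name `reloaded` is illustrative."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Reload the saved CSV and compare it with the original DataFrame\n",
        "reloaded = pd.read_csv('fitness_data.csv')\n",
        "pd.testing.assert_frame_equal(df, reloaded)\n",
        "print('Round-trip check passed:', reloaded.shape)"
      ]
    },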
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Visualizations\n",
        "\n",
        "Plot the distribution of each feature and of the target to see their ranges and shapes."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Set up the plotting style\n",
        "sns.set_theme(style='whitegrid')\n",
        "\n",
        "# Plot histograms for each feature\n",
        "fig, axes = plt.subplots(2, 2, figsize=(12, 8))\n",
        "fig.suptitle('Feature Distributions')\n",
        "\n",
        "sns.histplot(df['current_weight'], ax=axes[0, 0], kde=True)\n",
        "axes[0, 0].set_title('Current Weight (kg)')\n",
        "\n",
        "sns.histplot(df['exercise_hours'], ax=axes[0, 1], kde=True)\n",
        "axes[0, 1].set_title('Weekly Exercise Hours')\n",
        "\n",
        "sns.histplot(df['calorie_intake'], ax=axes[1, 0], kde=True)\n",
        "axes[1, 0].set_title('Daily Calorie Intake')\n",
        "\n",
        "sns.histplot(df['weeks_to_goal'], ax=axes[1, 1], kde=True)\n",
        "axes[1, 1].set_title('Weeks to Goal')\n",
        "\n",
        "plt.tight_layout(rect=[0, 0, 1, 0.95])\n",
        "plt.savefig('feature_distributions.png')\n",
        "plt.show()"
      ]
    },
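    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "For the regression task described at the top of this notebook, the three features can be separated from the target as sketched below. The names `feature_columns`, `target_column`, `X`, and `y` are illustrative assumptions and are not taken from `app.py`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Separate the three input features from the regression target\n",
        "feature_columns = ['current_weight', 'exercise_hours', 'calorie_intake']\n",
        "target_column = 'weeks_to_goal'\n",
        "\n",
        "X = df[feature_columns]\n",
        "y = df[target_column]\n",
        "\n",
        "print('Features shape:', X.shape)\n",
        "print('Target shape:', y.shape)"
      ]
    },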
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Summary\n",
        "\n",
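        "The dataset contains 10 records with no missing values. All three features and the target are numeric, so the data is suitable for regression. Values fall in plausible ranges: current weight 70–95 kg, weekly exercise 2–7 hours, daily calorie intake 1800–2300, and 8–18 weeks to goal. With only 10 synthetic records, the data is suitable for demonstrating the workflow but too small to support reliable conclusions."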
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.10"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 4
}