# Fitbit Data Analysis

## Introduction
This notebook explores and analyzes the Fitbit dataset. The dataset contains various CSV files with information on daily activities, calories burned, heart rate, sleep patterns, and more. We start by listing the directory structure of the dataset, then load and inspect some key CSV files, and finally generate basic statistics and visualizations to understand the data better.

## Table of Contents
1. [Import Libraries](#Import-Libraries)
2. [List Directory Structure](#List-Directory-Structure)
3. [Read CSV File](#Read-CSV-File)
4. [Basic Statistics](#Basic-Statistics)
5. [Check Missing Values](#Check-Missing-Values)
6. [Visualize Data](#Visualize-Data)
7. [Main Execution](#Main-Execution)


## Accessing the Kaggle Dataset

This repository includes a GitHub Actions workflow that continuously tests the connection to the Kaggle API. The workflow uses GitHub secrets to pass the username and key securely as environment variables, then checks that the files are correctly downloaded, unzipped, and imported into Python using pandas.

I tried to replicate this behavior by saving my credentials in a `/.env` file, but kept getting a 502 Bad Gateway error.

As a workaround, we will follow an analogous process, using the `requests` library to handle the HTTP request and `zipfile` to extract the files within the main folder of the dataset.
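The `requests` plus `zipfile` approach described above could look roughly like the sketch below. The helper names `download_zip` and `extract_zip_bytes` are hypothetical, and the download URL is deliberately left as a parameter, since the exact direct-download link for the dataset is not given here.

```python
import io
import os
import zipfile


def download_zip(url, timeout=30):
    """Fetch `url` and return the raw response body as bytes."""
    # Imported here so the extraction helper below works without requests.
    import requests

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raises on HTTP errors such as 502 Bad Gateway
    return response.content


def extract_zip_bytes(payload, extract_to):
    """Extract an in-memory ZIP archive into `extract_to` and return its member names."""
    os.makedirs(extract_to, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        archive.extractall(extract_to)
        return archive.namelist()
```

With a valid link in hand, the two helpers chain as `extract_zip_bytes(download_zip(url), '../data/')`. Keeping the download and the extraction separate means the extraction step can be checked without any network access.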

### Importing the libraries required to download and unzip the data from the Kaggle dataset website

In [6]:
import requests
import zipfile
import os
import pandas

### Tell Python where the file is and download it

The dataset is available at the [Kaggle website](https://www.kaggle.com/datasets/arashnic/fitbit).

We create the data directory that will hold the dataset, `../data/`, using the `makedirs` function from the `os` module.

Then, since our Kaggle credentials are saved in the `~/.kaggle/` folder (mine are not committed to this repository), the `kaggle.api.dataset_download_files` function uses them automatically to authenticate with the API once we define the dataset identifier as `dataset = 'arashnic/fitbit'`:

In [25]:
import os
import kaggle
import zipfile

# Step 1: Ensure the data directory exists
data_dir = '../data/'
os.makedirs(data_dir, exist_ok=True)

# Step 2: Use Kaggle API to download the dataset
dataset = 'arashnic/fitbit'  # The dataset identifier on Kaggle
kaggle.api.dataset_download_files(dataset, path=data_dir, unzip=False)

print("Dataset downloaded successfully.")


Dataset URL: https://www.kaggle.com/datasets/arashnic/fitbit
Dataset downloaded successfully.
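The download step above only works if the Kaggle client can find credentials. As a minimal sketch, the check below looks in the two places I understand the client to read: a `kaggle.json` file inside `~/.kaggle/`, and the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables (the route the GitHub Actions workflow takes). The function name `kaggle_credentials_available` is hypothetical, and other configuration options the client may support are not covered.

```python
import os


def kaggle_credentials_available(home=None, environ=None):
    """Return True if Kaggle credentials are found in a token file or in the environment."""
    home = os.path.expanduser('~') if home is None else home
    environ = os.environ if environ is None else environ

    # Option 1: the kaggle.json token file inside ~/.kaggle/
    token_path = os.path.join(home, '.kaggle', 'kaggle.json')
    has_token_file = os.path.isfile(token_path)

    # Option 2: environment variables, e.g. populated from GitHub secrets
    has_env_vars = bool(environ.get('KAGGLE_USERNAME')) and bool(environ.get('KAGGLE_KEY'))

    return has_token_file or has_env_vars
```

Calling `kaggle_credentials_available()` with no arguments inspects the current user's home directory and environment. The optional parameters exist only so the check can be exercised against a temporary folder.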


Now we can unzip the files using the `zipfile` module: 

In [26]:
import zipfile

zip_file_path = '../data/fitbit.zip'
extract_to_path = '../data/'

# check if the file is indeed a zip file

if zipfile.is_zipfile(zip_file_path):
    with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
        zip_ref.extractall(extract_to_path)
    print("File extracted successfully.")
else:
    print("The file is not a valid ZIP file.")

File extracted successfully.
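To see what the archive holds without extracting it, `zipfile` can also list its members. The sketch below is one way to do that; the helper name `list_csv_members` is hypothetical.

```python
import zipfile


def list_csv_members(zip_file_path):
    """Return the sorted names of all CSV files inside a ZIP archive."""
    with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
        return sorted(
            name for name in zip_ref.namelist()
            if name.lower().endswith('.csv')
        )
```

For example, `list_csv_members('../data/fitbit.zip')` would return the archive-relative paths of the CSV files, assuming the archive was downloaded to that location as in the cell above.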


Now that we have successfully unzipped the files, we can take a look at what we are dealing with by printing the structure of the files within the `../data/` folder:

In [28]:
# print the structure of the ../data/ directory

def print_directory_structure(root_dir, indent=''):
    for item in os.listdir(root_dir):
        item_path = os.path.join(root_dir, item)
        if os.path.isdir(item_path):
            print(f"{indent}📁 {item}/")
            print_directory_structure(item_path, indent + '    ')
        else:
            print(f"{indent}📄 {item}")

# Define the root directory
root_directory = '../data'

# Print the directory structure
print(f"Directory structure of {root_directory}:")
print_directory_structure(root_directory)


Directory structure of ../data:
📄 fitbit.zip
📁 mturkfitbit_export_4.12.16-5.12.16/
    📁 Fitabase Data 4.12.16-5.12.16/
        📄 minuteIntensitiesNarrow_merged.csv
        📄 minuteStepsWide_merged.csv
        📄 dailyActivity_merged.csv
        📄 hourlySteps_merged.csv
        📄 dailyIntensities_merged.csv
        📄 minuteCaloriesWide_merged.csv
        📄 hourlyCalories_merged.csv
        📄 minuteStepsNarrow_merged.csv
        📄 dailyCalories_merged.csv
        📄 minuteCaloriesNarrow_merged.csv
        📄 weightLogInfo_merged.csv
        📄 hourlyIntensities_merged.csv
        📄 dailySteps_merged.csv
        📄 minuteMETsNarrow_merged.csv
        📄 heartrate_seconds_merged.csv
        📄 minuteSleep_merged.csv
        📄 minuteIntensitiesWide_merged.csv
        📄 sleepDay_merged.csv
📁 mturkfitbit_export_3.12.16-4.11.16/
    📁 Fitabase Data 3.12.16-4.11.16/
        📄 minuteIntensitiesNarrow_merged.csv
        📄 dailyActivity_merged.csv
        📄 hourlySteps_merged.csv
        📄 hourlyCalorie