### **Introduction**

In this small project, we explore and analyze the 'Exercise and Fitness Metrics Dataset' (https://www.kaggle.com/datasets/aakashjoshi123/exercise-and-fitness-metrics-dataset/data?select=exercise_dataset.csv), which contains metrics related to exercise, including exercise type, heart rate, age, and exercise intensity. The primary goal is to uncover relationships between these variables to better understand the data and its potential utility. Using SQL queries within a Jupyter Notebook environment, we load the dataset, analyze it, and investigate its integrity and validity. Specifically, we evaluate the average exercise intensity per exercise category, the average heart rate per exercise category, and the relationship between age and heart rate during exercise.

#### **Loading CSV Data into a Pandas DataFrame**

This code loads a CSV file into a Pandas DataFrame using Python. First, it imports the Pandas library, which is essential for data manipulation and analysis. The CSV file, located at '~/Desktop/Exercise_Dataset/exercise_dataset.csv', is read into a DataFrame using the pd.read_csv() function. Finally, the df.head() function displays the first few rows of the DataFrame to verify that the data has been loaded correctly. This step ensures the dataset is ready for further analysis.
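
The loading cell itself is not reproduced in this write-up. A minimal sketch of what the paragraph describes might look like the following. The helper name `load_exercise_data` is hypothetical and not part of the original notebook; the path in the comment is the one quoted in the text.

```python
from pathlib import Path

import pandas as pd


def load_exercise_data(csv_path):
    """Read the exercise CSV into a DataFrame (hypothetical helper)."""
    # expanduser() resolves a leading '~' to the user's home directory
    # before the path is handed to pandas.
    return pd.read_csv(Path(csv_path).expanduser())


# In the notebook, assuming the file sits at the path quoted in the text:
# df = load_exercise_data("~/Desktop/Exercise_Dataset/exercise_dataset.csv")
# df.head()   # first five rows, to confirm the load worked
```

Calling `df.shape` or `df.dtypes` after loading is a quick additional check that the row count and column types match expectations.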

#### **Verifying Table Creation in SQLite Database**

This code snippet creates an in-memory SQLite database and loads the previously read Pandas DataFrame into it as a table named exercise_dataset. By running the SQL query "SELECT name FROM sqlite_master WHERE type='table';", it confirms the successful creation of the table within the database. The result, displaying exercise_dataset under the name column, indicates that the table has been successfully created and is now available for executing SQL queries, enabling further data analysis and manipulation within the Jupyter Notebook environment.

In [18]:
import sqlite3

# Create an in-memory SQLite database
conn = sqlite3.connect(':memory:')

# Load the DataFrame into the SQLite database
df.to_sql('exercise_dataset', conn, index=False, if_exists='replace')

# Verify that the table was created
query = "SELECT name FROM sqlite_master WHERE type='table';"
tables = pd.read_sql(query, conn)
tables


Unnamed: 0,name
0,exercise_dataset


#### **Analyzing Average Exercise Intensity by Exercise Category**

This code block executes an SQL query to calculate the average exercise intensity for each exercise category in the exercise_dataset table of the in-memory SQLite database. The query selects the Exercise column and computes the average of the Exercise Intensity column, grouping the results by Exercise. The results are then loaded into a Pandas DataFrame for easy viewing and analysis. The output shows that the average exercise intensity varies very little across the 10 exercise categories, with all values falling between roughly 5.25 and 5.68. This lack of variation is suspicious, as one would typically expect exercise intensity to differ more across categories. The finding prompts further investigation into how the dataset was created and into the relationships within the data, especially since no details are provided about how the dataset was constructed.



In [19]:
# SQL query to determine the average Exercise Intensity for the different Exercise categories
query = """
SELECT Exercise, AVG("Exercise Intensity") AS avg_exercise_intensity
FROM exercise_dataset
GROUP BY Exercise;
"""

# Execute the query and load the results into a DataFrame
result = pd.read_sql(query, conn)
result


Unnamed: 0,Exercise,avg_exercise_intensity
0,Exercise 1,5.25062
1,Exercise 10,5.432161
2,Exercise 2,5.35942
3,Exercise 3,5.579221
4,Exercise 4,5.67655
5,Exercise 5,5.405941
6,Exercise 6,5.39895
7,Exercise 7,5.572539
8,Exercise 8,5.476071
9,Exercise 9,5.431472
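
Whether category means that lie this close together are really "too similar" depends on how widely intensity varies within each category. One possible check, sketched here under assumptions, reports the count, mean, standard deviation, minimum and maximum per category. The function name `intensity_spread_by_category` is hypothetical, and the standard deviation is derived in SQL as the square root of E[x^2] - (E[x])^2 because SQLite has no built-in standard deviation aggregate.

```python
import sqlite3


def intensity_spread_by_category(conn):
    """Return (exercise, n, mean, std_dev, min, max) per Exercise category."""
    query = """
    SELECT Exercise,
           COUNT(*)                  AS n,
           AVG("Exercise Intensity") AS mean_intensity,
           -- population variance = E[x^2] - (E[x])^2
           AVG("Exercise Intensity" * "Exercise Intensity")
             - AVG("Exercise Intensity") * AVG("Exercise Intensity") AS var_intensity,
           MIN("Exercise Intensity") AS min_intensity,
           MAX("Exercise Intensity") AS max_intensity
    FROM exercise_dataset
    GROUP BY Exercise
    ORDER BY Exercise;
    """
    rows = conn.execute(query).fetchall()
    # Clamp tiny negative variances caused by floating-point rounding to zero
    # before taking the square root.
    return [
        (ex, n, mean, max(var, 0.0) ** 0.5, lo, hi)
        for ex, n, mean, var, lo, hi in rows
    ]
```

If every category spans nearly the full range of intensities with a similar standard deviation, the near-identical means would be consistent with values assigned independently of the exercise type.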


#### **Analyzing Average Heart Rate by Exercise Category**


The next code block executes an SQL query to calculate the average heart rate for each exercise category in the exercise_dataset table of the in-memory SQLite database. The query selects the Exercise column and computes the average of the Heart Rate column, grouping the results by Exercise. The results are then loaded into a Pandas DataFrame for easy viewing and analysis. The output shows that the average heart rate varies very little across the 10 exercise categories, with values clustering in the high 130s to low 140s. This lack of variation is unusual for a comparison of different exercise categories. It is consistent with the minimal variation observed in average exercise intensity, which further indicates potential issues with the dataset that require additional examination.










In [20]:
# SQL query to determine the average Heart Rate for the different Exercise categories
query = """
SELECT Exercise, AVG("Heart Rate") AS avg_heart_rate
FROM exercise_dataset
GROUP BY Exercise;
"""

# Execute the query and load the results into a DataFrame
result = pd.read_sql(query, conn)
result


Unnamed: 0,Exercise,avg_heart_rate
0,Exercise 1,139.734491
1,Exercise 10,138.721106
2,Exercise 2,138.6
3,Exercise 3,140.761039
4,Exercise 4,139.743935
5,Exercise 5,141.581683
6,Exercise 6,137.71916
7,Exercise 7,141.194301
8,Exercise 8,140.687657
9,Exercise 9,138.829949
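
In both result tables, 'Exercise 10' sorts directly after 'Exercise 1' because the labels are compared as text. The two per-category averages can also be computed in one pass and ordered by the numeric part of the label. The sketch below assumes every label has the form 'Exercise N', as in the outputs above; the function name `category_summary` is hypothetical.

```python
import sqlite3


def category_summary(conn):
    """Average intensity and heart rate per exercise, ordered Exercise 1..N."""
    query = """
    SELECT Exercise,
           AVG("Exercise Intensity") AS avg_exercise_intensity,
           AVG("Heart Rate")         AS avg_heart_rate
    FROM exercise_dataset
    GROUP BY Exercise
    -- labels look like 'Exercise 7'; sort on the number after the prefix
    ORDER BY CAST(SUBSTR(Exercise, LENGTH('Exercise ') + 1) AS INTEGER);
    """
    return conn.execute(query).fetchall()
```

In the notebook, the same query string can be passed to `pd.read_sql(query, conn)` to get a DataFrame instead of a list of tuples.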


#### **Analyzing Average Heart Rate by Age**

The next code block executes an SQL query to calculate the average heart rate for each age in the exercise_dataset table of the in-memory SQLite database. The query selects the Age column and computes the average of the Heart Rate column, grouping the results by Age and ordering them by Age so that any trend is easy to see. The results are then loaded into a Pandas DataFrame for easy viewing and analysis. The output shows no clear decrease in average heart rate as age increases. This finding is not credible, as it contradicts the well-established physiological expectation that heart rate during exercise typically decreases with age.



In [21]:
# SQL query to determine the average Heart Rate for different age groups
query = """
SELECT Age, AVG("Heart Rate") AS avg_heart_rate
FROM exercise_dataset
GROUP BY Age
ORDER BY Age;
"""

# Execute the query and load the results into a DataFrame
result = pd.read_sql(query, conn)
result


Unnamed: 0,Age,avg_heart_rate
0,18,135.988372
1,19,143.223404
2,20,140.333333
3,21,139.860215
4,22,142.233333
5,23,140.84
6,24,138.786517
7,25,140.964706
8,26,139.71028
9,27,140.273973
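
One way to put a number on the age trend, rather than reading it off the table, is to compute the Pearson correlation and the least-squares slope of heart rate against age over the individual rows. The following is a sketch under assumptions, not part of the original notebook: the function name `age_heart_rate_trend` is hypothetical, and it assumes both columns are numeric, contain no NULLs, and are not constant.

```python
import sqlite3


def age_heart_rate_trend(conn):
    """Return (pearson_r, slope) for "Heart Rate" regressed on Age."""
    rows = conn.execute('SELECT Age, "Heart Rate" FROM exercise_dataset').fetchall()
    n = len(rows)
    mean_age = sum(age for age, _ in rows) / n
    mean_hr = sum(hr for _, hr in rows) / n

    # Sums of squares and cross-products around the means
    sxy = sum((age - mean_age) * (hr - mean_hr) for age, hr in rows)
    sxx = sum((age - mean_age) ** 2 for age, _ in rows)
    syy = sum((hr - mean_hr) ** 2 for _, hr in rows)

    pearson_r = sxy / (sxx * syy) ** 0.5  # strength and direction of the linear trend
    slope = sxy / sxx                     # change in heart rate per year of age
    return pearson_r, slope
```

A correlation near zero and a slope near zero would support the reading that heart rate in this dataset does not depend on age.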



### **Conclusion**

The analysis quickly revealed a dataset of little utility. Average exercise intensity showed very little variation across exercise categories, which is highly unusual: ten different exercises should produce much greater variation, consistent with established exercise science. Average heart rate per exercise category also showed very little variation, again contrary to well-established expectations. Finally, average heart rate showed no clear decrease as a function of age, contradicting the well-established physiological expectation that heart rate during exercise typically decreases with age. Taken together, these findings indicate that the dataset is unreliable and not worth the time investment of further analysis.






