# Assignment 2 - Individual Checkpoint 1. Personal data exploration literate Jupyter Notebook

This notebook explores the provided dataset of step counts for three students. The file holds one reading per student per hour, so we analyze the daily statistics and the finest-grained (hourly) step counts for each student and record observations about the data.

## Goals of the Analysis
The goal of this data exploration is to analyze step count data for three students (student_1, student_2, and student_3) over a period of several days. Specifically, we aim to perform the following tasks:

1. Determine the number of days of data available for each student.
2. Examine daily step count information, including average, maximum, and minimum step counts.
3. Explore the finest-grained step count data (the file is recorded hourly rather than per minute), looking at the number of non-zero readings, missing data, average steps per reading, and the maximum and minimum steps in a single reading.
4. Make observations about the data for each student, including potential patterns of activity or inactivity.

## Data Description and Assumptions
The dataset provided contains step count data for three students (`student_1`, `student_2`, and `student_3`) over multiple days. Each row in the dataset represents step count information for a specific student at a particular date and time.

Assumptions:
- The step count data is collected using a wearable device or application.
- Each student's step count data is recorded at one-hour intervals.
- Zero-step counts may indicate inactivity during that hour.
- Missing data points may occur due to technical issues or periods of inactivity.
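
The one-hour interval assumption can be checked directly rather than taken on trust. The sketch below is a minimal illustration on made-up readings (the `sample` frame and its values are hypothetical, not taken from `student_step.csv`): it measures the gap between consecutive readings for each student and counts how often each gap length occurs.

```python
import pandas as pd

# Hypothetical readings shaped like the dataset (student, time, step); values are made up
hour = pd.Timedelta(hours=1)
times = pd.date_range("2023-09-13 00:00", periods=5, freq=hour)
sample = pd.DataFrame({
    "student": ["student_1"] * 4 + ["student_2"] * 3,
    # student_1 skips the 03:00 slot on purpose to simulate a missing reading
    "time": list(times[[0, 1, 2, 4]]) + list(times[[0, 1, 2]]),
    "step": [0, 12, 340, 95, 0, 0, 57],
})

# Gap between consecutive readings, computed separately for each student
sample = sample.sort_values(["student", "time"])
gaps = sample.groupby("student")["time"].diff().dropna()

# How often each gap length occurs across all students
interval_counts = gaps.value_counts()
print(interval_counts)
```

On the real data, a single gap length of one hour would support the assumption, while longer gaps would point to missing readings.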

## Hypothesis
Based on the goals of this analysis, we predict that daily step counts will vary among the three students. We also expect to find patterns of activity and inactivity at the hourly level, with some students having more consistently recorded data than others.

In [1]:
# Import necessary libraries
import pandas as pd

# Load the dataset
data = pd.read_csv('student_step.csv')

data.sample(5)

       student             time  step
56   student_1   2023/9/15 8:00     0
209  student_3  2023/9/15 15:00  1003
205  student_3  2023/9/15 11:00  1064
169  student_3  2023/9/13 23:00     0
114  student_2  2023/9/14 17:00    13


In [2]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 219 entries, 0 to 218
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   student  219 non-null    object
 1   time     219 non-null    object
 2   step     219 non-null    int64 
dtypes: int64(1), object(2)
memory usage: 5.3+ KB


In [3]:
data.describe()

             step
count       219.0
mean   399.415525
std     716.21038
min           0.0
25%           0.0
50%         135.0
75%         455.0
max        4414.0


## Daily & Minute Step Count Information of Student 1

In [4]:
# Let's start by exploring data for student_1

# ## Student 1
# ### Number of Days of Data
# To determine the number of days of data for student_1, we need to count unique dates in the dataset.

# Parse the 'time' column as datetimes and keep only the calendar date
data['date'] = pd.to_datetime(data['time']).dt.date

# Calculate the number of unique days
num_days_student_1 = data[data['student'] == 'student_1']['date'].nunique()

# Display the result
print(f"Number of days of data for student_1: {num_days_student_1}")

# ### Daily Step Count Information
# Now, let's calculate daily step count information for student_1.

# Group the data by date and calculate daily statistics
daily_stats_student_1 = data[data['student'] == 'student_1'].groupby('date')['step'].agg(['mean', 'max', 'min'])

# Display the daily step count information
print("\nDaily Step Count Information for student_1:")
print(daily_stats_student_1)

# One observation about the data for student_1:
# The only date with zero steps is 2023-09-16, which holds a single reading, so it reflects a partial day of data rather than a full day of inactivity.

# ### Minute Step Count Information
# Next, let's calculate minute step count information for student_1.

# Filter data for student_1
student_1_data = data[data['student'] == 'student_1']

# The file holds one reading per hour, so the "minute-level" statistics
# below are computed per hourly reading.

# Calculate the number of non-zero hourly readings
non_zero_hours_student_1 = student_1_data[student_1_data['step'] > 0]['date'].count()

# Calculate missing data by subtracting the number of recorded readings
# from the number of hourly slots expected (24 per calendar date)
hours_per_day = 24
missing_hours_student_1 = hours_per_day * num_days_student_1 - student_1_data['date'].count()

# Calculate average steps per hourly reading
average_steps_per_hour_student_1 = student_1_data['step'].mean()

# Calculate maximum and minimum steps in a single hourly reading
max_steps_student_1 = student_1_data['step'].max()
min_steps_student_1 = student_1_data['step'].min()

# Display hourly step count information
print("\nHourly Step Count Information for student_1:")
print(f"Number of non-zero hours: {non_zero_hours_student_1}")
print(f"Missing data (hours): {missing_hours_student_1}")
print(f"Average steps per hour: {average_steps_per_hour_student_1:.2f}")
print(f"Maximum steps in an hour: {max_steps_student_1}")
print(f"Minimum steps in an hour: {min_steps_student_1}")

# There are periods of inactivity with consecutive zero steps, possibly indicating moments of rest.

Number of days of data for student_1: 4

Daily Step Count Information for student_1:
                  mean   max  min
date                             
2023-09-13  395.750000  1943    0
2023-09-14  380.291667  1728    0
2023-09-15   25.833333   214    0
2023-09-16    0.000000     0    0

Hourly Step Count Information for student_1:
Number of non-zero hours: 39
Missing data (hours): 23
Average steps per hour: 263.63
Maximum steps in an hour: 1943
Minimum steps in an hour: 0


## Daily & Minute Step Count Information of Student 2

In [5]:
# Now, let's repeat the same exploration for student_2 and student_3.

# ## Student 2
# ### Number of Days of Data
num_days_student_2 = data[data['student'] == 'student_2']['date'].nunique()

# Display the result
print(f"\nNumber of days of data for student_2: {num_days_student_2}")

# ### Daily Step Count Information
daily_stats_student_2 = data[data['student'] == 'student_2'].groupby('date')['step'].agg(['mean', 'max', 'min'])

# Display the daily step count information
print("\nDaily Step Count Information for student_2:")
print(daily_stats_student_2)

# One observation about the data for student_2:
# There are some days with very low step counts, possibly indicating days of inactivity.

# ### Minute Step Count Information
student_2_data = data[data['student'] == 'student_2']

# The file holds one reading per hour, so these statistics are per hourly reading.
non_zero_hours_student_2 = student_2_data[student_2_data['step'] > 0]['date'].count()

# Expected hourly slots (24 per calendar date) minus recorded readings
hours_per_day = 24
missing_hours_student_2 = hours_per_day * num_days_student_2 - student_2_data['date'].count()

average_steps_per_hour_student_2 = student_2_data['step'].mean()
max_steps_student_2 = student_2_data['step'].max()
min_steps_student_2 = student_2_data['step'].min()

# Display hourly step count information
print("\nHourly Step Count Information for student_2:")
print(f"Number of non-zero hours: {non_zero_hours_student_2}")
print(f"Missing data (hours): {missing_hours_student_2}")
print(f"Average steps per hour: {average_steps_per_hour_student_2:.2f}")
print(f"Maximum steps in an hour: {max_steps_student_2}")
print(f"Minimum steps in an hour: {min_steps_student_2}")

# There are several hourly readings with zero steps, indicating hours without recorded activity.


Number of days of data for student_2: 4

Daily Step Count Information for student_2:
                  mean   max  min
date                             
2023-09-13  652.583333  2756    0
2023-09-14  119.833333   647    0
2023-09-15  505.583333  3161    0
2023-09-16    0.000000     0    0

Hourly Step Count Information for student_2:
Number of non-zero hours: 43
Missing data (hours): 23
Average steps per hour: 420.16
Maximum steps in an hour: 3161
Minimum steps in an hour: 0


## Daily & Minute Step Count Information of Student 3

In [6]:
# ## Student 3
# ### Number of Days of Data
num_days_student_3 = data[data['student'] == 'student_3']['date'].nunique()

# Display the result
print(f"\nNumber of days of data for student_3: {num_days_student_3}")

# ### Daily Step Count Information
daily_stats_student_3 = data[data['student'] == 'student_3'].groupby('date')['step'].agg(['mean', 'max', 'min'])

# Display the daily step count information
print("\nDaily Step Count Information for student_3:")
print(daily_stats_student_3)

# One observation about the data for student_3:
# There are days with very high step counts, indicating potentially active days.

# ### Minute Step Count Information
student_3_data = data[data['student'] == 'student_3']

# The file holds one reading per hour, so these statistics are per hourly reading.
non_zero_hours_student_3 = student_3_data[student_3_data['step'] > 0]['date'].count()

# Expected hourly slots (24 per calendar date) minus recorded readings
hours_per_day = 24
missing_hours_student_3 = hours_per_day * num_days_student_3 - student_3_data['date'].count()

average_steps_per_hour_student_3 = student_3_data['step'].mean()
max_steps_student_3 = student_3_data['step'].max()
min_steps_student_3 = student_3_data['step'].min()

# Display hourly step count information
print("\nHourly Step Count Information for student_3:")
print(f"Number of non-zero hours: {non_zero_hours_student_3}")
print(f"Missing data (hours): {missing_hours_student_3}")
print(f"Average steps per hour: {average_steps_per_hour_student_3:.2f}")
print(f"Maximum steps in an hour: {max_steps_student_3}")
print(f"Minimum steps in an hour: {min_steps_student_3}")

# 2023-09-16 has identical mean, max, and min (250), which indicates a single reading, so most of that day is unrecorded.


Number of days of data for student_3: 4

Daily Step Count Information for student_3:
                  mean   max  min
date                             
2023-09-13  547.833333  3839    0
2023-09-14  389.833333  1755    0
2023-09-15  616.708333  4414    0
2023-09-16  250.000000   250  250

Hourly Step Count Information for student_3:
Number of non-zero hours: 53
Missing data (hours): 23
Average steps per hour: 514.45
Maximum steps in an hour: 4414
Minimum steps in an hour: 0


## Summary of Data Exploration

From the data exploration, we have learned the number of days with data, the daily step count statistics, and the finest-grained (hourly) step count information for each student. We have also made observations about potential missing data and patterns of activity or inactivity for each student.

**Number of Days of Data (Period of the data)**

All three students (`Student_1`, `Student_2`, and `Student_3`) have readings on four calendar dates (2023/9/13 to 2023/9/16). The last date holds only a single reading per student, so each student effectively has three full days of data plus the start of a fourth.

**Daily Step Count Information**

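Activity varies from day to day for all three students. The statistics in the daily tables are computed over the hourly readings within each day, so each maximum is the busiest hour of that day, not a daily total.

- `Student_1`: the mean hourly count falls from 395.75 on 2023/9/13 to 25.83 on 2023/9/15, and the busiest hour reaches 1943 steps.
- `Student_2`: the mean hourly count is 652.58 on 2023/9/13, drops to 119.83 on 2023/9/14, and recovers to 505.58 on 2023/9/15, with a busiest hour of 3161 steps.
- `Student_3`: the busiest hours are the highest of the three students (3839 and 4414 steps), and the mean hourly count stays between 389.83 and 616.71 on the three full days.

The zero values on 2023/9/16 for `Student_1` and `Student_2` come from a date with a single reading, not from a full inactive day.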
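
Because the daily tables summarize hourly readings, a daily total has to be obtained by summing the readings of each date. The following is a minimal sketch of that step on made-up values (the `demo` frame is hypothetical and stands in for the loaded dataset); it reports the daily total next to the hourly mean and maximum.

```python
import pandas as pd

# Hypothetical hourly readings for one student over two full days; values are made up
hour = pd.Timedelta(hours=1)
times = pd.date_range("2023-09-13 00:00", periods=48, freq=hour)
demo = pd.DataFrame({
    "student": "student_1",
    "time": times,
    "step": [10] * 24 + [0] * 12 + [200] * 12,
})

# Sum the hourly readings per student and date, and keep the hourly mean and max alongside
daily = (
    demo.assign(date=demo["time"].dt.date)
        .groupby(["student", "date"])["step"]
        .agg(total="sum", hourly_mean="mean", hourly_max="max")
)
print(daily)
```

Reading `total` rather than `hourly_max` avoids mistaking a single busy hour for a whole day's step count.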

**Minute Step Count Information**

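Because the file is recorded hourly, the finest-grained statistics are per hourly reading rather than per minute.

- All students have non-zero hourly readings (39, 43, and 53 for `Student_1`, `Student_2`, and `Student_3` respectively), indicating activity during those hours.
- The 219 rows split evenly into 73 readings per student. On an hourly grid of 24 readings per date over four dates, 96 readings would be expected, so 23 hourly slots are absent for each student, all on 2023/9/16.
- Average steps per hourly reading differ among students (263.63, 420.16, and 514.45), reflecting differences in activity levels.
- The largest single-hour counts are 1943, 3161, and 4414 steps, while the minimum is 0 for every student.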
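
The per-student cells repeat the same steps three times; a single grouped aggregation is one way to collect the same kind of figures in one table. This is a sketch under the assumption of 24 expected readings per calendar date, run on a made-up `demo` frame (hypothetical values, not the contents of `student_step.csv`).

```python
import pandas as pd

# Hypothetical hourly readings: student_1 has two full days,
# student_2 has one full day plus a single reading on the next date
hour = pd.Timedelta(hours=1)
times = pd.date_range("2023-09-13 00:00", periods=48, freq=hour)
demo = pd.DataFrame({
    "student": ["student_1"] * 48 + ["student_2"] * 25,
    "time": list(times) + list(times[:25]),
    "step": [0, 100] * 24 + [50] * 24 + [0],
})
demo["date"] = demo["time"].dt.date

# One row per student: dates covered, readings recorded, active hours, mean and max
summary = demo.groupby("student").agg(
    days=("date", "nunique"),
    readings=("step", "size"),
    active_hours=("step", lambda s: (s > 0).sum()),
    mean_steps=("step", "mean"),
    max_steps=("step", "max"),
)

# Assumption: a complete date contains 24 hourly readings
summary["expected_readings"] = summary["days"] * 24
summary["missing_readings"] = summary["expected_readings"] - summary["readings"]
print(summary)
```

In this toy example `student_2` is short by 23 readings because the second date contains only one row, which is the same situation as a date with a single reading in the real file.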

**Observations**

`Student_1` has a mix of active and inactive periods throughout the recorded days.
`Student_2` appears to have more sporadic activity patterns with some hours of high activity and long periods of inactivity.
`Student_3` shows consistent high activity on some days but also has gaps in data collection and periods of inactivity.

## Statement about what I learnt from my data exploration and how that relates to my driving problem

**Variability in Daily Step Counts:** The daily step counts of the individuals vary significantly. Some days show high activity levels, while others have very low or even zero step counts. This variability suggests that daily step count is influenced by factors beyond just morning activity.

**Morning Activity Patterns:** While there is evidence of morning activity for all three students, it is not consistently high for all of them. Student_2, for example, has periods of high activity in the morning, while Student_1 and Student_3 do not consistently exhibit this pattern.

**Data Gaps and Missing Values:** The data contains gaps and missing values, which could impact the analysis. These gaps make it challenging to precisely determine the relationship between morning activity and daily step count.
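
A zero reading and an absent reading mean different things, so it helps to separate them before interpreting gaps. One possible approach, sketched here on made-up readings for a single student (the values and timestamps are hypothetical), is to reindex the readings onto a complete hourly grid: absent slots become `NaN`, while recorded zeros stay zero.

```python
import pandas as pd

# Hypothetical readings for one student; the 02:00 and 03:00 slots are absent
times = pd.to_datetime([
    "2023-09-13 00:00", "2023-09-13 01:00",
    "2023-09-13 04:00", "2023-09-13 05:00",
])
readings = pd.Series([0, 120, 0, 35], index=times, name="step")

# Build the complete hourly grid between the first and last reading
hour = pd.Timedelta(hours=1)
full_grid = pd.date_range(readings.index.min(), readings.index.max(), freq=hour)

# Reindexing inserts NaN wherever no reading was recorded
on_grid = readings.reindex(full_grid)

missing_slots = on_grid[on_grid.isna()].index  # no row in the data at all
zero_slots = on_grid[on_grid == 0].index       # a row exists and reports 0 steps
print("Missing:", list(missing_slots))
print("Zero:", list(zero_slots))
```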

**Variability Within the Day:** The hourly readings fluctuate throughout the day, indicating that activity levels vary between time intervals, including the morning.

In conclusion, while morning activity may be a factor influencing daily step count, it is not the sole determinant. Daily step counts are influenced by various factors, including activity patterns throughout the entire day. To draw more definitive conclusions about the relationship between morning activity and daily step count, a more detailed and complete dataset, as well as statistical analysis, would be necessary.
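
As a starting point for the statistical analysis mentioned above, the sketch below compares morning steps with the daily total for each date. It runs on made-up readings, and the morning window (06:00 to 11:59) is an assumption chosen for illustration, not a definition taken from the assignment. With only a few days of data, the resulting correlation is descriptive at best.

```python
import pandas as pd

# Hypothetical hourly readings for one student over three full days; values are made up
hour = pd.Timedelta(hours=1)
times = pd.date_range("2023-09-13 00:00", periods=72, freq=hour)
steps = (
    [0] * 6 + [100] * 6 + [50] * 12    # day 1
    + [0] * 6 + [300] * 6 + [80] * 12  # day 2
    + [0] * 6 + [10] * 6 + [20] * 12   # day 3
)
demo = pd.DataFrame({"student": "student_1", "time": times, "step": steps})
demo["date"] = demo["time"].dt.date

# Assumption: "morning" means readings stamped 06:00 through 11:00
demo["is_morning"] = demo["time"].dt.hour.between(6, 11)

# Daily total and morning total per student and date
per_day = demo.groupby(["student", "date"]).agg(daily_total=("step", "sum"))
per_day["morning_steps"] = (
    demo[demo["is_morning"]].groupby(["student", "date"])["step"].sum()
)
per_day["morning_share"] = per_day["morning_steps"] / per_day["daily_total"]

# Pearson correlation between morning steps and the daily total
corr = per_day["morning_steps"].corr(per_day["daily_total"])
print(per_day)
print(f"Correlation: {corr:.3f}")
```

Note that morning steps are part of the daily total, so some positive correlation is expected by construction; comparing morning steps with the steps taken during the rest of the day would be a stricter test.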