# Neighborhood Gym Check-in Analytics
You’ve been hired by a small neighborhood gym to make sense of their check-in logs.
They manually recorded when each member visited, what they did, and roughly how long they stayed.
The manager wants quick insights about which members are most engaged, how different membership types behave, and what a simple weekly report would look like.
You’ll use pandas to turn this tiny log into something that actually answers those questions.

## Generate data

In [1]:
import pandas as pd

data = [
    {"member_id": "M001", "join_date": "2025-01-05", "checkin_date": "2025-02-01", "duration_min": 45, "activity_type": "Cardio",      "calories_est": 350, "membership_type": "Standard"},
    {"member_id": "M002", "join_date": "2025-01-12", "checkin_date": "2025-02-01", "duration_min": 60, "activity_type": "Weights",     "calories_est": 420, "membership_type": "Premium"},
    {"member_id": "M001", "join_date": "2025-01-05", "checkin_date": "2025-02-03", "duration_min": 30, "activity_type": "Yoga",        "calories_est": 180, "membership_type": "Standard"},
    {"member_id": "M003", "join_date": "2025-01-20", "checkin_date": "2025-02-03", "duration_min": 50, "activity_type": "Cardio",      "calories_est": 380, "membership_type": "Standard"},
    {"member_id": "M002", "join_date": "2025-01-12", "checkin_date": "2025-02-05", "duration_min": 40, "activity_type": "Cardio",      "calories_est": 300, "membership_type": "Premium"},
    {"member_id": "M004", "join_date": "2025-01-25", "checkin_date": "2025-02-05", "duration_min": 70, "activity_type": "Weights",     "calories_est": 500, "membership_type": "Premium"},
    {"member_id": "M003", "join_date": "2025-01-20", "checkin_date": "2025-02-08", "duration_min": 35, "activity_type": "Yoga",        "calories_est": 200, "membership_type": "Standard"},
    {"member_id": "M001", "join_date": "2025-01-05", "checkin_date": "2025-02-08", "duration_min": 55, "activity_type": "Weights",     "calories_est": 430, "membership_type": "Standard"},
    {"member_id": "M002", "join_date": "2025-01-12", "checkin_date": "2025-02-10", "duration_min": 45, "activity_type": "Cardio",      "calories_est": 340, "membership_type": "Premium"},
    {"member_id": "M004", "join_date": "2025-01-25", "checkin_date": "2025-02-11", "duration_min": 30, "activity_type": "Yoga",        "calories_est": 150, "membership_type": "Premium"},
    {"member_id": "M003", "join_date": "2025-01-20", "checkin_date": "2025-02-11", "duration_min": 60, "activity_type": "Weights",     "calories_est": 450, "membership_type": "Standard"},
    {"member_id": "M005", "join_date": "2025-02-01", "checkin_date": "2025-02-11", "duration_min": 25, "activity_type": "Cardio",      "calories_est": 200, "membership_type": "Standard"},
]

df = pd.DataFrame(data)


## Task 1 – Basic cleaning & inspection

1. Convert the join_date and checkin_date columns to proper datetime type.

2. Print the DataFrame’s shape and data types.

3. Add a new column called visit_weekday with the weekday name of each checkin_date.

In [None]:
df['join_date'] = pd.to_datetime(df['join_date'])
df['checkin_date'] = pd.to_datetime(df['checkin_date'])

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12 entries, 0 to 11
Data columns (total 7 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   member_id        12 non-null     object        
 1   join_date        12 non-null     datetime64[ns]
 2   checkin_date     12 non-null     datetime64[ns]
 3   duration_min     12 non-null     int64         
 4   activity_type    12 non-null     object        
 5   calories_est     12 non-null     int64         
 6   membership_type  12 non-null     object        
dtypes: datetime64[ns](2), int64(2), object(3)
memory usage: 804.0+ bytes
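
Step 2 asks for the shape and data types specifically, which `df.info()` reports only indirectly. A minimal sketch of the more direct calls is below; `sample` is a hypothetical two-row stand-in for `df`, built from the first two rows of the log so the snippet runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for df: the first two rows of the check-in log.
sample = pd.DataFrame({
    "member_id": ["M001", "M002"],
    "join_date": ["2025-01-05", "2025-01-12"],
    "checkin_date": ["2025-02-01", "2025-02-01"],
    "duration_min": [45, 60],
})

# Convert both date columns in one pass.
date_cols = ["join_date", "checkin_date"]
sample[date_cols] = sample[date_cols].apply(pd.to_datetime)

print(sample.shape)   # (number of rows, number of columns)
print(sample.dtypes)  # one dtype per column
```

On the full log, the same two calls would be `df.shape` and `df.dtypes`.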


In [15]:
df['weekday'] = df['checkin_date'].dt.weekday

In [16]:
df.head()

| | member_id | join_date | checkin_date | duration_min | activity_type | calories_est | membership_type | weekday |
|---|---|---|---|---|---|---|---|---|
| 0 | M001 | 2025-01-05 | 2025-02-01 | 45 | Cardio | 350 | Standard | 5 |
| 1 | M002 | 2025-01-12 | 2025-02-01 | 60 | Weights | 420 | Premium | 5 |
| 2 | M001 | 2025-01-05 | 2025-02-03 | 30 | Yoga | 180 | Standard | 0 |
| 3 | M003 | 2025-01-20 | 2025-02-03 | 50 | Cardio | 380 | Standard | 0 |
| 4 | M002 | 2025-01-12 | 2025-02-05 | 40 | Cardio | 300 | Premium | 2 |


In [10]:
# Map pandas weekday numbers (0 = Monday ... 6 = Sunday) to names
dow_dict = {
    0: 'Monday',
    1: 'Tuesday',
    2: 'Wednesday',
    3: 'Thursday',
    4: 'Friday',
    5: 'Saturday',
    6: 'Sunday',
}

In [11]:
df['dow'] = df['weekday'].map(dow_dict)
df.head()

| | member_id | join_date | checkin_date | duration_min | activity_type | calories_est | membership_type | weekday | dow |
|---|---|---|---|---|---|---|---|---|---|
| 0 | M001 | 2025-01-05 | 2025-02-01 | 45 | Cardio | 350 | Standard | 5 | Saturday |
| 1 | M002 | 2025-01-12 | 2025-02-01 | 60 | Weights | 420 | Premium | 5 | Saturday |
| 2 | M001 | 2025-01-05 | 2025-02-03 | 30 | Yoga | 180 | Standard | 0 | Monday |
| 3 | M003 | 2025-01-20 | 2025-02-03 | 50 | Cardio | 380 | Standard | 0 | Monday |
| 4 | M002 | 2025-01-12 | 2025-02-05 | 40 | Cardio | 300 | Premium | 2 | Wednesday |
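
The task asks for a column named `visit_weekday`, and pandas can produce weekday names directly without a lookup dictionary. The sketch below shows that route with `Series.dt.day_name()`; `checkins` is a hypothetical three-row stand-in for `df`, using dates from the log.

```python
import pandas as pd

# Hypothetical stand-in for df: three check-in dates from the log.
checkins = pd.DataFrame(
    {"checkin_date": pd.to_datetime(["2025-02-01", "2025-02-03", "2025-02-05"])}
)

# day_name() returns the weekday name for each timestamp.
checkins["visit_weekday"] = checkins["checkin_date"].dt.day_name()

print(checkins)
```

This gives the same names as the `weekday` plus `dow_dict` approach, in a single step and under the column name the task specifies.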


## Task 2 – Member engagement summary
Create a member-level summary DataFrame with one row per member_id containing:

* total_minutes: total minutes spent in the gym (sum of duration_min).

* num_visits: number of check-ins.

* first_visit: the earliest checkin_date for that member.

Sort this summary by total_minutes in descending order.

In [29]:
member_summary = (
    df
    .groupby("member_id")
    .agg(
        total_minutes=("duration_min", "sum"),
        num_visits=("duration_min", "size"),
        first_visit=("checkin_date", "min"),
    )
    .sort_values("total_minutes", ascending=False)
)

print("\nMember engagement summary:")
print(member_summary)



Member engagement summary:
           total_minutes  num_visits first_visit
member_id                                       
M002                 145           3  2025-02-01
M003                 145           3  2025-02-03
M001                 130           3  2025-02-01
M004                 100           2  2025-02-05
M005                  25           1  2025-02-11
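
M002 and M003 tie at 145 minutes, so their relative order in the summary depends on how the sort handles ties. One possible way to make the ranking deterministic is to add `member_id` as a secondary sort key. The sketch below assumes that tie-break rule, which the task does not specify; `summary` is a hypothetical copy of the totals shown above, deliberately listed out of order.

```python
import pandas as pd

# Hypothetical copy of the member totals, deliberately out of order.
summary = pd.DataFrame(
    {"total_minutes": [130, 145, 145, 100, 25], "num_visits": [3, 3, 3, 2, 1]},
    index=pd.Index(["M001", "M003", "M002", "M004", "M005"], name="member_id"),
)

# Sort by minutes (descending), then by member_id (ascending) to break ties.
ranked = (
    summary
    .reset_index()
    .sort_values(["total_minutes", "member_id"], ascending=[False, True])
    .set_index("member_id")
)

print(ranked)
```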


## Task 3 – Behavior by membership type & activity
Using the original df:

* For each membership_type, compute the average visit duration.

* For each combination of membership_type and activity_type, compute the percentage of visits that fall into that activity type within that membership type.

Output a tidy DataFrame showing membership_type, activity_type, and activity_share as a percentage (0–100).

In [None]:
df.groupby(by = 'membership_type')['duration_min'].mean().round(2)

membership_type
Premium     49.00
Standard    42.86
Name: duration_min, dtype: float64

In [43]:
# Visits per (membership_type, activity_type) pair
two_group_count = df.groupby(['membership_type', 'activity_type']).size().reset_index(name='count')

# Total visits per membership_type
membership_total = df.groupby('membership_type').size().reset_index(name='total')

behaviour = pd.merge(two_group_count, membership_total, how='left', on='membership_type')

# Share of each activity within its membership type, as a percentage (0–100)
behaviour['activity_share'] = (behaviour['count'] / behaviour['total'] * 100).round(2)

behaviour

| | membership_type | activity_type | count | total | activity_share |
|---|---|---|---|---|---|
| 0 | Premium | Cardio | 2 | 5 | 40.0 |
| 1 | Premium | Weights | 2 | 5 | 40.0 |
| 2 | Premium | Yoga | 1 | 5 | 20.0 |
| 3 | Standard | Cardio | 3 | 7 | 42.86 |
| 4 | Standard | Weights | 2 | 7 | 28.57 |
| 5 | Standard | Yoga | 2 | 7 | 28.57 |
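
The within-group percentages can also be computed without a merge. The sketch below is an alternative, not the approach used above: it relies on `value_counts(normalize=True)` applied per group. `visits` is a hypothetical frame that reproduces only the membership and activity columns of the log.

```python
import pandas as pd

# Hypothetical stand-in for df: membership and activity of the 12 visits.
visits = pd.DataFrame({
    "membership_type": ["Premium"] * 5 + ["Standard"] * 7,
    "activity_type": [
        "Cardio", "Cardio", "Weights", "Weights", "Yoga",
        "Cardio", "Cardio", "Cardio", "Weights", "Weights", "Yoga", "Yoga",
    ],
})

activity_share = (
    visits
    .groupby("membership_type")["activity_type"]
    .value_counts(normalize=True)   # fraction of visits within each membership type
    .mul(100)                       # convert to a 0-100 percentage
    .round(2)
    .rename("activity_share")       # name the values before flattening the index
    .reset_index()
)

print(activity_share)
```

The result has exactly the three columns the task asks for: `membership_type`, `activity_type`, and `activity_share`.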


## Task 4 – Weekly report
Treating checkin_date as the time axis:

Build a weekly summary DataFrame where the index is the week (e.g., Monday-anchored weeks).

For each week, compute:

* total_checkins

* unique_members

* avg_duration (mean of duration_min)

Return this as a DataFrame with those three columns.

In [64]:
# ISO weeks start on Monday, so the ISO week number gives Monday-anchored weeks
df['week'] = df['checkin_date'].dt.isocalendar().week
df.groupby('week').agg(
    total_checkins=('checkin_date', 'size'),
    unique_members=('member_id', 'nunique'),
    avg_duration=('duration_min', 'mean'),
)

| week | total_checkins | unique_members | avg_duration |
|---|---|---|---|
| 5 | 2 | 2 | 52.5 |
| 6 | 6 | 4 | 46.666667 |
| 7 | 4 | 4 | 40.0 |
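
ISO week numbers repeat every year, so a log spanning more than one year would merge unrelated weeks under the same number. One possible alternative is to index each week by the date of its Monday. The sketch below assumes that convention and uses weekly periods ending on Sunday; `log` and `week_start` are hypothetical names, and the rows are a five-visit subset of the data.

```python
import pandas as pd

# Hypothetical stand-in for df: five check-ins from the log.
log = pd.DataFrame({
    "member_id": ["M001", "M002", "M001", "M003", "M002"],
    "checkin_date": pd.to_datetime(
        ["2025-02-01", "2025-02-01", "2025-02-03", "2025-02-03", "2025-02-10"]
    ),
    "duration_min": [45, 60, 30, 50, 45],
})

# "W-SUN" periods end on Sunday, so each one spans Monday through Sunday;
# start_time is therefore the Monday that opens the week.
log["week_start"] = log["checkin_date"].dt.to_period("W-SUN").dt.start_time

weekly = log.groupby("week_start").agg(
    total_checkins=("member_id", "size"),
    unique_members=("member_id", "nunique"),
    avg_duration=("duration_min", "mean"),
)

print(weekly)
```

Here the index holds actual dates, which stay unambiguous across year boundaries and sort chronologically.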


# === End Of Challenge ===