# Cyclistic Bike-Share Analysis
## Exploratory Analysis & Rider Behavior Insights

This notebook explores riding patterns and behavioral differences between casual riders and annual members to support data-driven business recommendations.

In [1]:
import pandas as pd
import numpy as np
from pathlib import Path

DATA_PATH = Path("../data/raw")

files = sorted(DATA_PATH.glob("*.csv"))

df = pd.concat(
    [pd.read_csv(file) for file in files],
    ignore_index=True
)

# Convert timestamps
df["started_at"] = pd.to_datetime(df["started_at"], errors="coerce")
df["ended_at"] = pd.to_datetime(df["ended_at"], errors="coerce")

# Create ride duration
df["ride_duration_minutes"] = (
    (df["ended_at"] - df["started_at"])
    .dt.total_seconds() / 60
)

# Clean invalid rides
df = df[
    (df["ride_duration_minutes"] > 0) &
    (df["ride_duration_minutes"] <= 1440)
].copy()

# Time features
df["ride_day_of_week"] = df["started_at"].dt.day_name()
df["ride_hour"] = df["started_at"].dt.hour
df["is_weekend"] = df["ride_day_of_week"].isin(["Saturday", "Sunday"])

df.shape

(5547380, 17)
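To make the cleaning rule in the cell above concrete, here is a minimal sketch on a few hypothetical rows. The timestamps and the `sample` and `valid` names are made up for illustration and are not taken from the dataset. It suggests that an unparseable timestamp becomes `NaT` under `errors="coerce"`, yields a `NaN` duration, and is then dropped by the same filter that removes non-positive rides and rides longer than 24 hours.

```python
import pandas as pd

# Hypothetical rows: one valid ride, one that ends before it starts,
# one with an unparseable start time, and one lasting two full days.
sample = pd.DataFrame(
    {
        "started_at": [
            "2023-06-01 08:00:00",
            "2023-06-01 09:00:00",
            "not a date",
            "2023-06-01 10:00:00",
        ],
        "ended_at": [
            "2023-06-01 08:15:00",
            "2023-06-01 08:59:00",
            "2023-06-01 09:30:00",
            "2023-06-03 10:00:00",
        ],
    }
)

# Same conversion as in the notebook: bad values become NaT instead of raising.
sample["started_at"] = pd.to_datetime(sample["started_at"], errors="coerce")
sample["ended_at"] = pd.to_datetime(sample["ended_at"], errors="coerce")

sample["ride_duration_minutes"] = (
    (sample["ended_at"] - sample["started_at"]).dt.total_seconds() / 60
)

# NaN compares as False on both sides, so rows with NaT timestamps drop out too.
valid = sample[
    (sample["ride_duration_minutes"] > 0)
    & (sample["ride_duration_minutes"] <= 1440)
].copy()

print(valid)
```

Only the 15-minute ride should remain in `valid`, so no separate `dropna()` step is needed for timestamps that failed to parse.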

In [2]:
df["member_casual"].value_counts(normalize=True).round(3)

member_casual
member    0.64
casual    0.36
Name: proportion, dtype: float64

## Rider Type Overview

- Annual members account for the majority of rides (about 64%).
- Casual riders still represent a substantial portion of total usage (about 36%).
- Behavioral differences between these two groups are critical for identifying
  opportunities to convert casual riders into annual members.

In [3]:
df.groupby("member_casual")["ride_duration_minutes"].describe()

member_casual      count       mean        std       min       25%        50%      75%          max
casual         1994811.0  19.133336  38.979432  0.000767  6.295667  11.377383  21.0511   1439.97595
member         3552569.0  11.952484  20.355706    0.0013   5.02975   8.575667  14.4987  1439.901683


## Ride Duration by Rider Type

- Casual riders take noticeably longer rides than annual members: the mean is 19.1 minutes versus 12.0, and the median is 11.4 minutes versus 8.6.
- The median casual ride is approximately 33% longer than the median member ride.
- Members exhibit shorter and more consistent ride durations (standard deviation of 20.4 minutes versus 39.0 for casual riders), suggesting routine or commute-oriented usage.
- Casual riders display higher variability and longer upper-range durations (75th percentile of 21.1 minutes versus 14.5 for members), indicating recreational or exploratory behavior.

This difference highlights an opportunity to convert frequent casual riders into annual members by targeting leisure-focused use cases.
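The percentage quoted above can be reproduced from the `describe()` output. The sketch below hard-codes the quartiles from that table (the variable names are illustrative). It also compares interquartile ranges, which is one assumed way to gauge how consistent each group's rides are, not a measure the notebook itself uses.

```python
import pandas as pd

# Quartiles copied from the describe() output above, in minutes.
quartiles = pd.DataFrame(
    {
        "25%": [6.295667, 5.029750],
        "50%": [11.377383, 8.575667],
        "75%": [21.051100, 14.498700],
    },
    index=pd.Index(["casual", "member"], name="member_casual"),
)

# How much longer the casual median is, relative to the member median.
median_gap_pct = (
    quartiles.loc["casual", "50%"] / quartiles.loc["member", "50%"] - 1
) * 100

# Interquartile range (75th minus 25th percentile) per rider type.
iqr = quartiles["75%"] - quartiles["25%"]

print(f"Casual median is {median_gap_pct:.1f}% longer than the member median")
print(iqr.round(2))
```

The interquartile range is less sensitive than the standard deviation to the handful of rides approaching the 24-hour cap, which is why it is used here for the spread comparison.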

In [4]:
df.groupby(["member_casual", "is_weekend"])["ride_duration_minutes"].median()

member_casual  is_weekend
casual         False         10.486800
               True          13.152733
member         False          8.418792
               True           9.141983
Name: ride_duration_minutes, dtype: float64

## Ride Duration by Rider Type and Day Type

- Casual riders take longer rides on weekends than on weekdays: the median rises from 10.5 to 13.2 minutes, about 25% longer.
- This pattern suggests leisure-oriented usage, with weekends amplifying recreational behavior.
- Annual members show relatively consistent ride durations across weekdays and weekends (8.4 versus 9.1 minutes, about 9% longer), indicating routine, utility-driven usage.

These patterns highlight an opportunity to design weekend-focused promotions and messaging to convert casual riders into annual members.
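As a quick check on the weekday and weekend comparison, the relative weekend increase can be computed from the four medians printed above. This is a small sketch with the values hard-coded; the `weekday` and `weekend` column names are labels chosen here to stand in for `is_weekend` being `False` and `True`.

```python
import pandas as pd

# Median ride durations (minutes) copied from the grouped output above.
medians = pd.DataFrame(
    {
        "weekday": [10.486800, 8.418792],
        "weekend": [13.152733, 9.141983],
    },
    index=pd.Index(["casual", "member"], name="member_casual"),
)

# Percentage by which the weekend median exceeds the weekday median.
weekend_uplift_pct = (medians["weekend"] / medians["weekday"] - 1) * 100

print(weekend_uplift_pct.round(1))
```

The casual uplift comes out roughly three times the member uplift, which is the contrast the bullets above describe.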

In [5]:
df.groupby(["member_casual", "ride_hour"])["ride_duration_minutes"].median().head(10)

member_casual  ride_hour
casual         0             8.626733
               1             8.652825
               2             8.495767
               3             8.153000
               4             6.894950
               5             7.567883
               6             7.170400
               7             7.811800
               8             8.559750
               9            10.091817
Name: ride_duration_minutes, dtype: float64

In [6]:
hourly_duration = (
    df
    .groupby(["member_casual", "ride_hour"])["ride_duration_minutes"]
    .median()
    .reset_index()
)

hourly_duration.head(12)

   member_casual  ride_hour  ride_duration_minutes
0         casual          0               8.626733
1         casual          1               8.652825
2         casual          2               8.495767
3         casual          3                  8.153
4         casual          4                6.89495
5         casual          5               7.567883
6         casual          6                 7.1704
7         casual          7                 7.8118
8         casual          8                8.55975
9         casual          9              10.091817

(output truncated: 10 of 12 rows shown)


In [7]:
hourly_duration.tail(12)

    member_casual  ride_hour  ride_duration_minutes
36         member         12               7.947283
37         member         13               8.206283
38         member         14               8.625717
39         member         15               8.882925
40         member         16                9.25255
41         member         17                9.57845
42         member         18               9.142492
43         member         19               8.721608
44         member         20                8.65445
45         member         21                8.70765

(output truncated: 10 of 12 rows shown)


In [8]:
hourly_pivot = hourly_duration.pivot(
    index="ride_hour",
    columns="member_casual",
    values="ride_duration_minutes"
)

hourly_pivot.head(12)

ride_hour     casual    member
0           8.626733   7.66515
1           8.652825  7.309067
2           8.495767  7.235033
3              8.153    7.6357
4            6.89495   7.04725
5           7.567883  7.051717
6             7.1704  7.625858
7             7.8118  8.269708
8            8.55975  8.370833
9          10.091817    7.6956

(output truncated: 10 of 12 rows shown)


In [9]:
hourly_pivot.tail(12)

ride_hour     casual    member
12         13.216242  7.947283
13          13.44375  8.206283
14         13.474517  8.625717
15         12.857917  8.882925
16         12.153633   9.25255
17         11.967533   9.57845
18         11.572992  9.142492
19           11.1511  8.721608
20         10.844233   8.65445
21         10.515067   8.70765

(output truncated: 10 of 12 rows shown)


## Ride Duration by Hour of Day

- Casual riders' median ride durations are longest from late morning through early afternoon (10:00 to 14:00), at roughly 13.2 to 13.5 minutes for the 12:00 to 14:00 hours shown above, compared with under 9 minutes in the early morning.
- This pattern suggests leisure-driven usage, such as tourism and recreational rides.
- Annual members maintain relatively consistent ride durations throughout the day (about 7.0 to 9.6 minutes across the hours shown), with a modest increase during late afternoon commute hours that reaches 9.6 minutes at 17:00.
- The contrast highlights distinct use cases: casual riders prioritize experience, while members prioritize efficiency and routine.

These insights suggest that targeted midday and weekend campaigns could be effective in converting casual riders into annual members.
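One way to quantify the contrast is the hour-by-hour gap between the two medians. The sketch below rebuilds a slice of the pivot from the 12:00 to 21:00 rows printed above. `pivot_sample` and `gap` are illustrative names, and hours outside that range are not covered because their values are not shown.

```python
import pandas as pd

# Median durations (minutes) copied from the hourly_pivot rows for hours 12-21.
pivot_sample = pd.DataFrame(
    {
        "casual": [13.216242, 13.443750, 13.474517, 12.857917, 12.153633,
                   11.967533, 11.572992, 11.151100, 10.844233, 10.515067],
        "member": [7.947283, 8.206283, 8.625717, 8.882925, 9.252550,
                   9.578450, 9.142492, 8.721608, 8.654450, 8.707650],
    },
    index=pd.RangeIndex(12, 22, name="ride_hour"),
)

# Casual minus member median, per hour.
gap = pivot_sample["casual"] - pivot_sample["member"]

print(gap.round(2))
print("Hour with the largest gap:", gap.idxmax())
print("Hour with the longest member median:", pivot_sample["member"].idxmax())
```

Within these rows the gap narrows steadily from midday toward the evening as casual rides shorten and member rides lengthen slightly around the commute home.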

In [10]:
df.groupby(["member_casual", "ride_hour"]).size().unstack().T

ride_hour  casual  member
0           38847   32451
1           24892   19975
2           16634   12007
3            9183    7941
4            7271    8878
5           11408   34201
6           26523  100906
7           49547  199688
8           70313  256538
9           70689  167126

(output truncated: first 10 of 24 hourly rows shown)


## Ride Volume by Hour of Day

- Annual members show strong ride volume peaks during traditional commute hours (7–9 AM and 4–6 PM), with the highest activity at 5 PM.
- This pattern reflects routine, work-related usage and time efficiency.
- Casual riders display a gradual increase in ride volume throughout the day, peaking in the late afternoon.
- The absence of sharp morning peaks among casual riders indicates discretionary, leisure-oriented riding behavior.

Together with duration analysis, these patterns highlight a clear behavioral distinction between rider types and reinforce the opportunity to convert casual riders by targeting leisure-focused time windows.
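The morning commute effect is visible even in the ten hourly rows shown above. As a rough illustration, the sketch below hard-codes those counts and computes the number of member rides per casual ride for each hour. `counts_sample` and `member_per_casual` are illustrative names, and the afternoon hours are omitted because their counts are not shown.

```python
import pandas as pd

# Ride counts copied from the hourly volume table above (hours 0-9 only).
counts_sample = pd.DataFrame(
    {
        "casual": [38847, 24892, 16634, 9183, 7271,
                   11408, 26523, 49547, 70313, 70689],
        "member": [32451, 19975, 12007, 7941, 8878,
                   34201, 100906, 199688, 256538, 167126],
    },
    index=pd.RangeIndex(0, 10, name="ride_hour"),
)

# Member rides per casual ride, per hour.
member_per_casual = counts_sample["member"] / counts_sample["casual"]

print(member_per_casual.round(2))
print("Hour with the highest member-to-casual ratio:", member_per_casual.idxmax())
print("Busiest member hour in this slice:", counts_sample["member"].idxmax())
```

A ratio below 1 in the overnight hours and well above 1 from 05:00 onward is consistent with members driving the morning peak while casual volume builds more gradually.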

## Day 3 Summary: Rider Behavior Differences

- Casual riders consistently take longer trips than annual members, indicating leisure-oriented usage rather than point-to-point transportation.
- Annual members exhibit shorter ride durations and highly concentrated ride volumes during weekday commute hours (7–9 AM and 4–6 PM).
- Casual riders show increased activity during afternoons and weekends, with longer median ride durations during these periods.
- Ride volume analysis reveals that members use the service as a reliable mode of daily transportation, while casual riders use it more flexibly based on personal time and preference.
- The combination of higher duration and discretionary timing among casual riders suggests strong potential for conversion through weekend plans, day passes, or leisure-focused membership offerings.