# Cyclistic Bike-Share Analysis
## Data Loading & Initial Overview

### Business Context
Cyclistic is a bike-share company that wants to understand how
**casual riders** and **annual members** use the service differently.

The goal of this analysis is to explore historical ride data and
identify behavioral patterns that can help convert casual riders
into annual members.

---

### Scope of this notebook
- Load the 12 monthly trip-data CSV files and combine them into one DataFrame
- Inspect data structure and quality
- Identify key variables for analysis

In [1]:
import pandas as pd
import numpy as np
import os

pd.set_option("display.max_columns", None)
pd.set_option("display.float_format", "{:.2f}".format)

In [2]:
DATA_PATH = "../data/raw"

files = sorted([f for f in os.listdir(DATA_PATH) if f.endswith(".csv")])
len(files), files[:5]

(12,
 ['202501-divvy-tripdata.csv',
  '202502-divvy-tripdata.csv',
  '202503-divvy-tripdata.csv',
  '202504-divvy-tripdata.csv',
  '202505-divvy-tripdata.csv'])
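
Only the first five file names are printed above, so a gap in the monthly sequence would be easy to miss. The sketch below is one way to check for it, assuming every file follows the `YYYYMM-divvy-tripdata.csv` pattern seen in the listing. The `missing_months` helper and the `example_files` list are hypothetical and not part of the original notebook.

```python
import re

import pandas as pd

# Hypothetical file list that mirrors the naming pattern shown above.
# In the notebook, the real `files` list would be passed instead.
example_files = [
    "202501-divvy-tripdata.csv",
    "202502-divvy-tripdata.csv",
    "202504-divvy-tripdata.csv",  # March deliberately left out
]

PATTERN = re.compile(r"(\d{6})-divvy-tripdata\.csv$")


def missing_months(file_names):
    """Return the months (as 'YYYY-MM') absent between the first and last file."""
    stamps = []
    for name in file_names:
        match = PATTERN.match(name)
        if match:
            stamps.append(match.group(1))  # leading YYYYMM digits
    found = pd.to_datetime(stamps, format="%Y%m").to_period("M")
    expected = pd.period_range(found.min(), found.max(), freq="M")
    return [str(period) for period in expected.difference(found)]


print(missing_months(example_files))
```

An empty list means the files cover every month between the earliest and latest file name.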

In [3]:
# Load all monthly CSV files
df_list = []

for file in files:
    file_path = os.path.join(DATA_PATH, file)
    temp_df = pd.read_csv(file_path)
    df_list.append(temp_df)

df = pd.concat(df_list, ignore_index=True)

df.shape

(5552994, 13)

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5552994 entries, 0 to 5552993
Data columns (total 13 columns):
 #   Column              Dtype  
---  ------              -----  
 0   ride_id             object 
 1   rideable_type       object 
 2   started_at          object 
 3   ended_at            object 
 4   start_station_name  object 
 5   start_station_id    object 
 6   end_station_name    object 
 7   end_station_id      object 
 8   start_lat           float64
 9   start_lng           float64
 10  end_lat             float64
 11  end_lng             float64
 12  member_casual       object 
dtypes: float64(4), object(9)
memory usage: 550.8+ MB
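
The summary above shows `started_at` and `ended_at` stored as `object` (plain strings), along with the two low-cardinality label columns. A minimal sketch of one possible conversion step follows, assuming every timestamp uses the millisecond format seen in the preview rows. The `tidy_dtypes` helper and the two-row `sample` frame are hypothetical. In the notebook the same function would be applied to `df`.

```python
import pandas as pd

# Tiny hypothetical frame using the dataset's column names.
# The values are copied from the first two rows of the df.head() preview.
sample = pd.DataFrame({
    "rideable_type": ["classic_bike", "electric_bike"],
    "started_at": ["2025-01-21 17:23:54.538", "2025-01-11 15:44:06.795"],
    "ended_at": ["2025-01-21 17:37:52.015", "2025-01-11 15:49:11.139"],
    "member_casual": ["member", "member"],
})


def tidy_dtypes(frame):
    """Return a copy with parsed timestamps and categorical label columns."""
    out = frame.copy()
    # Assumption: all timestamps include fractional seconds, as in the preview.
    for col in ("started_at", "ended_at"):
        out[col] = pd.to_datetime(out[col], format="%Y-%m-%d %H:%M:%S.%f")
    # Columns with only a few distinct values are cheaper to store as categories.
    for col in ("rideable_type", "member_casual"):
        out[col] = out[col].astype("category")
    return out


typed = tidy_dtypes(sample)
print(typed.dtypes)
```

Parsed timestamps make later steps such as ride duration or day-of-week straightforward, and the categorical columns usually reduce memory use. The actual saving on the full dataset is not measured here.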


In [5]:
df.head()

| | ride_id | rideable_type | started_at | ended_at | start_station_name | start_station_id | end_station_name | end_station_id | start_lat | start_lng | end_lat | end_lng | member_casual |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7569BC890583FCD7 | classic_bike | 2025-01-21 17:23:54.538 | 2025-01-21 17:37:52.015 | Wacker Dr & Washington St | KA1503000072 | McClurg Ct & Ohio St | TA1306000029 | 41.88 | -87.64 | 41.89 | -87.62 | member |
| 1 | 013609308856B7FC | electric_bike | 2025-01-11 15:44:06.795 | 2025-01-11 15:49:11.139 | Halsted St & Wrightwood Ave | TA1309000061 | Racine Ave & Belmont Ave | TA1308000019 | 41.93 | -87.65 | 41.94 | -87.66 | member |
| 2 | EACACD3CE0607C0D | classic_bike | 2025-01-02 15:16:27.730 | 2025-01-02 15:28:03.230 | Southport Ave & Waveland Ave | 13235 | Broadway & Cornelia Ave | 13278 | 41.95 | -87.66 | 41.95 | -87.65 | member |
| 3 | EAA2485BA64710D3 | classic_bike | 2025-01-23 08:49:05.814 | 2025-01-23 08:52:40.047 | Southport Ave & Waveland Ave | 13235 | Southport Ave & Roscoe St | 13071 | 41.95 | -87.66 | 41.94 | -87.66 | member |
| 4 | 7F8BE2471C7F746B | electric_bike | 2025-01-16 08:38:32.338 | 2025-01-16 08:41:06.767 | Southport Ave & Waveland Ave | 13235 | Southport Ave & Roscoe St | 13071 | 41.95 | -87.66 | 41.94 | -87.66 | member |


In [6]:
df.tail()

| | ride_id | rideable_type | started_at | ended_at | start_station_name | start_station_id | end_station_name | end_station_id | start_lat | start_lng | end_lat | end_lng | member_casual |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5552989 | 53202484E8371237 | electric_bike | 2025-12-19 13:43:41.793 | 2025-12-19 13:54:07.779 | Sheffield Ave & Kingsbury St | CHI02023 | Halsted St & Melrose St | CHI02162 | 41.91 | -87.65 | 41.94 | -87.65 | member |
| 5552990 | 3582BF76707AB2ED | classic_bike | 2025-12-27 12:43:48.446 | 2025-12-27 12:52:45.141 | Larrabee St & Webster Ave | CHI00454 | Halsted St & Melrose St | CHI02162 | 41.92 | -87.64 | 41.94 | -87.65 | member |
| 5552991 | B12D92DF229EFA41 | electric_bike | 2025-12-20 01:12:39.053 | 2025-12-20 01:18:55.928 | Clarendon Ave & Junior Ter | CHI00395 | Halsted St & Melrose St | CHI02162 | 41.96 | -87.65 | 41.94 | -87.65 | casual |
| 5552992 | 04D5C6C11909ECAE | electric_bike | 2025-12-08 16:46:37.496 | 2025-12-08 16:58:32.762 | Lincoln Ave & Waveland Ave | CHI00446 | Halsted St & Wrightwood Ave | CHI00504 | 41.95 | -87.68 | 41.93 | -87.65 | member |
| 5552993 | D2CF23B6D79232B6 | classic_bike | 2025-12-20 13:08:27.589 | 2025-12-20 13:13:54.815 | Larrabee St & Webster Ave | CHI00454 | Halsted St & Wrightwood Ave | CHI00504 | 41.92 | -87.64 | 41.93 | -87.65 | member |


In [7]:
df.isnull().sum()

ride_id                     0
rideable_type               0
started_at                  0
ended_at                    0
start_station_name    1184673
start_station_id      1184673
end_station_name      1243305
end_station_id        1243305
start_lat                   0
start_lng                   0
end_lat                  5535
end_lng                  5535
member_casual               0
dtype: int64
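
Raw counts are hard to judge against 5,552,994 rows. As shares, start-station fields are missing on roughly 21% of rides, end-station fields on roughly 22%, and end coordinates on about 0.1%. The name and ID columns also show identical counts, which suggests they go missing together, although equal counts alone do not prove it. The sketch below shows one way to compute the shares and test row-level agreement. It uses a hypothetical four-row `sample` frame so that it runs on its own. Only the column names come from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical four-ride sample; only the column names come from the dataset.
sample = pd.DataFrame({
    "start_station_name": ["Wacker Dr & Washington St", np.nan,
                           "Larrabee St & Webster Ave", np.nan],
    "start_station_id": ["KA1503000072", np.nan, "CHI00454", np.nan],
    "end_lat": [41.89, 41.94, np.nan, 41.93],
})

# Share of missing values per column, in percent.
missing_pct = sample.isna().mean().mul(100).round(2)
print(missing_pct)

# True only when name and ID are missing on exactly the same rows.
same_rows = bool(
    (sample["start_station_name"].isna() == sample["start_station_id"].isna()).all()
)
print(same_rows)
```

If the row-level check holds on the full data, the station name and ID can be treated as a single missing unit when deciding how to handle these rides.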

In [8]:
df["member_casual"].value_counts(normalize=True).round(3)

member_casual
member   0.64
casual   0.36
Name: proportion, dtype: float64

## Rider Type Distribution

- Annual members account for approximately 64% of rides.
- Casual riders account for the remaining 36%.
- These shares count rides, not unique riders: the dataset has no rider identifier, so one person can contribute many rows.
- Members dominate overall usage, but casual riders still generate more than a third of all rides, which makes them a substantial segment.
- Understanding how the two groups behave differently is critical for identifying conversion opportunities.