# Data Analysis - Summary Tables Created from Master Table

**Title:** Cyclistic Data Analysis Case Study

**Author:** Sibeso Like

**Date:** July 21, 2025

As part of the Exploratory Data Analysis (EDA) phase, a series of summary tables was created from the master dataset to uncover patterns, trends, and relationships in the data. These tables cover ride counts and ride durations by user type, rides per day of the week, hourly ride distribution, bike type usage, and the most common start and end stations. Aggregating and segmenting the data this way gives a baseline picture of user behavior and surfaces the insights that inform the final analysis and business recommendations.

## Step 1: Trip Volume by User Type
This step helps us understand the overall usage breakdown between casual riders and annual members.

In [None]:
SELECT 
    member_casual,
    COUNT(*) AS total_rides
FROM master_dataset
GROUP BY member_casual;

In [None]:
-- Create a summary table for trip volume by user type.
SELECT 
    member_casual,
    COUNT(*) AS total_rides
INTO eda_rides_by_user_type
FROM master_dataset
GROUP BY member_casual;
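
To put the two counts on a common scale, the same aggregation can also report each user type's share of all rides. The following is a minimal sketch rather than part of the original workflow; the `pct_of_rides` column name is an assumption introduced here for illustration.

```sql
-- Sketch: ride counts plus each user type's share of all rides.
-- pct_of_rides is a hypothetical column name, not one of the original summary columns.
SELECT
    member_casual,
    COUNT(*) AS total_rides,
    -- SUM(COUNT(*)) OVER () adds up the grouped counts to get the overall total
    CAST(100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS DECIMAL(5, 2)) AS pct_of_rides
FROM master_dataset
GROUP BY member_casual;
```

Using a window function over the grouped counts avoids a separate subquery for the overall ride total.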

## Step 2: Ride Duration Comparison
This step examines the average and range of ride durations to identify behavioral differences in usage between user types.

In [None]:
SELECT 
    member_casual,
    COUNT(*) AS total_rides,
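    -- Cast before averaging: AVG over an INT expression returns a truncated INT in T-SQL
    AVG(CAST(DATEDIFF(MINUTE, started_at, ended_at) AS DECIMAL(10, 2))) AS avg_duration_mins,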
    MIN(DATEDIFF(MINUTE, started_at, ended_at)) AS shortest_ride,
    MAX(DATEDIFF(MINUTE, started_at, ended_at)) AS longest_ride
INTO eda_duration_by_user_type
FROM master_dataset
GROUP BY member_casual;
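
Minimum and maximum durations are sensitive to bad records, so it can help to check how many rides end at or before their start time, or run unusually long. The sketch below assumes a 24-hour (1,440-minute) cutoff purely for illustration; that threshold and the column names `non_positive_rides` and `rides_over_24h` are not from the original analysis.

```sql
-- Sketch: count suspicious durations per user type.
-- The 1440-minute cutoff and both output column names are assumptions.
SELECT
    member_casual,
    SUM(CASE WHEN ended_at <= started_at THEN 1 ELSE 0 END) AS non_positive_rides,
    SUM(CASE WHEN DATEDIFF(MINUTE, started_at, ended_at) > 1440 THEN 1 ELSE 0 END) AS rides_over_24h
FROM master_dataset
GROUP BY member_casual;
```

If either count is non-zero, the shortest and longest ride values in the summary table are likely driven by those records rather than by typical usage.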

## Step 3: Rides by Day of the Week (Per User Type)
This step addresses:
- Do casual riders ride more on weekends?
- Do members ride more on weekdays (e.g., for commuting)?

In [None]:
SELECT 
    member_casual,
    DATENAME(WEEKDAY, started_at) AS day_of_week,
    COUNT(*) AS total_rides
INTO eda_rides_by_day_of_week
FROM master_dataset
GROUP BY member_casual, DATENAME(WEEKDAY, started_at);
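
The two questions above can be answered more directly by rolling the seven days up into weekday and weekend groups. This is a sketch under two assumptions: the day names are stored in English (which depends on the session language when `DATENAME` ran), and every day of the week appears for each user type. The `day_type` and `avg_rides_per_day_name` names are hypothetical.

```sql
-- Sketch: weekday vs. weekend roll-up from the day-of-week summary table.
-- Assumes English day names; day_type and avg_rides_per_day_name are illustrative names.
SELECT
    member_casual,
    CASE WHEN day_of_week IN ('Saturday', 'Sunday') THEN 'Weekend' ELSE 'Weekday' END AS day_type,
    SUM(total_rides) AS total_rides,
    -- COUNT(*) is the number of day rows in the group (5 or 2 when all days are present),
    -- so this is an integer average per day name, rounded down
    SUM(total_rides) / COUNT(*) AS avg_rides_per_day_name
FROM eda_rides_by_day_of_week
GROUP BY
    member_casual,
    CASE WHEN day_of_week IN ('Saturday', 'Sunday') THEN 'Weekend' ELSE 'Weekday' END
ORDER BY member_casual, day_type;
```

The per-day average matters because the weekday group spans five days and the weekend group only two, so raw totals alone would overstate weekday usage.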

## Step 4: Rides by Hour of the Day (Per User Type)
This step investigates:
- Do members ride more during rush hours (commutes)?
- Do casual riders ride more during leisure hours (midday or evening)?

In [None]:
SELECT 
    member_casual,
    DATEPART(HOUR, started_at) AS ride_hour,
    COUNT(*) AS total_rides
INTO eda_rides_by_hour
FROM master_dataset
GROUP BY member_casual, DATEPART(HOUR, started_at);
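
One way to test the rush-hour question is to bucket the hourly counts into time windows. The sketch below assumes morning rush is 07:00 to 09:59 and evening rush is 16:00 to 18:59; those boundaries, and the `time_window` labels, are illustrative choices rather than definitions from the original analysis.

```sql
-- Sketch: group hourly counts into rush-hour and off-peak windows.
-- The hour boundaries and the time_window labels are assumptions.
SELECT
    member_casual,
    time_window,
    SUM(total_rides) AS total_rides
FROM (
    SELECT
        member_casual,
        total_rides,
        CASE
            WHEN ride_hour BETWEEN 7 AND 9   THEN 'Morning rush'
            WHEN ride_hour BETWEEN 16 AND 18 THEN 'Evening rush'
            ELSE 'Off-peak'
        END AS time_window
    FROM eda_rides_by_hour
) AS bucketed
GROUP BY member_casual, time_window
ORDER BY member_casual, time_window;
```

The derived table computes the label once, so the `CASE` expression does not have to be repeated in the `GROUP BY` clause.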

## Step 5: Rideable Type Usage per User Type
This step determines whether casual or member riders prefer a specific type of bike (e.g., electric vs. classic).

In [None]:
SELECT 
    member_casual,
    rideable_type,
    COUNT(*) AS total_rides
INTO eda_rides_by_rideable_type
FROM master_dataset
GROUP BY member_casual, rideable_type;
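
Because the two user types may differ widely in total rides, bike type preference is easier to compare as a share within each user type. A minimal sketch, assuming the summary table above has been created; `pct_within_user_type` is a hypothetical column name.

```sql
-- Sketch: each bike type's share of rides within its user type.
-- pct_within_user_type is an illustrative column name.
SELECT
    member_casual,
    rideable_type,
    total_rides,
    CAST(100.0 * total_rides
         / SUM(total_rides) OVER (PARTITION BY member_casual) AS DECIMAL(5, 2)) AS pct_within_user_type
FROM eda_rides_by_rideable_type
ORDER BY member_casual, total_rides DESC;
```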

## Step 6: Most Common Start and End Stations by User Type
This step identifies:
- Which locations are most used by casual vs. member riders.
- Whether casual users tend to start/stop in different places than members.

In [None]:
-- Create summary table for start_station_name
SELECT 
    member_casual,
    start_station_name,
    COUNT(*) AS start_count
INTO eda_top_start_stations
FROM master_dataset
GROUP BY member_casual, start_station_name;

In [None]:
-- Create summary table for end_station_name
SELECT 
    member_casual,
    end_station_name,
    COUNT(*) AS end_count
INTO eda_top_end_stations
FROM master_dataset
GROUP BY member_casual, end_station_name;
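
Station rankings are only meaningful if station names are populated. Whether `master_dataset` still contains missing names depends on earlier cleaning steps that are not shown here, so the following sketch simply counts NULL or blank names per user type; the output column names are assumptions.

```sql
-- Sketch: count rides with missing start or end station names.
-- missing_start_names and missing_end_names are illustrative column names.
SELECT
    member_casual,
    SUM(CASE WHEN start_station_name IS NULL
              OR LTRIM(RTRIM(start_station_name)) = '' THEN 1 ELSE 0 END) AS missing_start_names,
    SUM(CASE WHEN end_station_name IS NULL
              OR LTRIM(RTRIM(end_station_name)) = '' THEN 1 ELSE 0 END) AS missing_end_names
FROM master_dataset
GROUP BY member_casual;
```

If these counts are large, a NULL or blank group could appear among the top stations in the ranking queries and may need to be filtered out.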

## Additional Queries for Insights
The following queries retrieve results from the summary tables to gain deeper insights into user behavior.

In [None]:
-- 1. Rides by User Type – Ride totals per user type
SELECT TOP (1000) [member_casual]
      ,[total_rides]
  FROM [Cyclistic_Case_Study].[dbo].[eda_rides_by_user_type];

In [None]:
-- 2. Ride Duration by User Type – Avg, min, max ride durations
SELECT TOP (1000) [member_casual]
      ,[total_rides]
      ,[avg_duration_mins]
      ,[shortest_ride]
      ,[longest_ride]
  FROM [Cyclistic_Case_Study].[dbo].[eda_duration_by_user_type];

In [None]:
-- 3. Rides by Day of the Week – Ride volume by day for each user type
SELECT 
    member_casual,
    day_of_week,
    total_rides
FROM eda_rides_by_day_of_week
ORDER BY 
    member_casual,
    CASE 
        WHEN day_of_week = 'Monday' THEN 1
        WHEN day_of_week = 'Tuesday' THEN 2
        WHEN day_of_week = 'Wednesday' THEN 3
        WHEN day_of_week = 'Thursday' THEN 4
        WHEN day_of_week = 'Friday' THEN 5
        WHEN day_of_week = 'Saturday' THEN 6
        WHEN day_of_week = 'Sunday' THEN 7
        ELSE 8
    END;

In [None]:
-- 4. Rides by Hour – Ride counts by hour of day per user type
SELECT TOP (48) [member_casual]
      ,[ride_hour]
      ,[total_rides]
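  FROM [Cyclistic_Case_Study].[dbo].[eda_rides_by_hour]
  -- Explicit ordering: TOP without ORDER BY returns rows in no guaranteed order
  ORDER BY [member_casual], [ride_hour];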

In [None]:
-- 5. Rideable Type Usage – Ride type usage by user type
SELECT TOP (6) [member_casual]
      ,[rideable_type]
      ,[total_rides]
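  FROM [Cyclistic_Case_Study].[dbo].[eda_rides_by_rideable_type]
  -- Explicit ordering: TOP without ORDER BY returns rows in no guaranteed order
  ORDER BY [member_casual], [total_rides] DESC;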

In [None]:
-- 6. Top Start Stations – Top 10 per user type (RANK() keeps ties, so more than 10 rows are possible)
WITH RankedStartStations AS (
    SELECT 
        member_casual,
        start_station_name,
        start_count,
        RANK() OVER (PARTITION BY member_casual ORDER BY start_count DESC) AS station_rank
    FROM eda_top_start_stations
)
SELECT 
    member_casual,
    start_station_name,
    start_count
FROM RankedStartStations
WHERE station_rank <= 10
ORDER BY member_casual, station_rank;

In [None]:
-- 7. Top End Stations – Top 10 per user type (RANK() keeps ties, so more than 10 rows are possible)
WITH RankedEndStations AS (
    SELECT 
        member_casual,
        end_station_name,
        end_count,
        RANK() OVER (PARTITION BY member_casual ORDER BY end_count DESC) AS station_rank
    FROM eda_top_end_stations
)
SELECT 
    member_casual,
    end_station_name,
    end_count
FROM RankedEndStations
WHERE station_rank <= 10
ORDER BY member_casual, station_rank;

## Summary of Data Analysis Procedures

The table below summarizes the data analysis procedures conducted in this EDA process.

| Step | Analysis Procedure | Description | Summary Table Created |
| --- | --- | --- | --- |
| 1 | Trip Volume by User Type | Analyzed the total number of rides per user type (casual vs. member). | eda\_rides\_by\_user\_type |
| 2 | Ride Duration Comparison | Compared average, minimum, and maximum ride durations between user types. | eda\_duration\_by\_user\_type |
| 3 | Rides by Day of the Week | Examined ride volume by day of the week for each user type to identify commuting vs. leisure patterns. | eda\_rides\_by\_day\_of\_week |
| 4 | Rides by Hour of the Day | Analyzed ride counts by hour to identify peak usage times for each user type. | eda\_rides\_by\_hour |
| 5 | Rideable Type Usage | Investigated preferences for bike types (e.g., electric vs. classic) by user type. | eda\_rides\_by\_rideable\_type |
| 6 | Most Common Start and End Stations | Identified the most frequently used start and end stations for each user type. | eda\_top\_start\_stations, eda\_top\_end\_stations |