# 📊 Bike Rental Analytics - Phase 5: Analytics Views

## Project Overview
This notebook focuses on creating SQL views for business intelligence and analytics, enabling the bike rental company to understand weather impact on ridership patterns.

## Phase 5 Objectives
- Create weather-ridership correlation views
- Build time-based aggregations (hourly/daily/weekly)
- Design station utilization analytics
- Implement KPI views (ride counts, duration stats)
- Document view purposes and usage

---


## 📦 Setup and Database Connection

Let's connect to our database and set up for creating analytics views.


In [1]:
# Import necessary libraries and connect to database
import pandas as pd
import psycopg2
from sqlalchemy import create_engine
import warnings
warnings.filterwarnings('ignore')

# Database connection parameters (same as Phase 4)
db_params = {
    'host': 'localhost',
    'database': 'bike_rental_db',
    'user': 'franciscoteixeirabarbosa',
    'password': '',
    'port': 5432
}

# Create connection string
conn_string = f"host={db_params['host']} dbname={db_params['database']} user={db_params['user']} port={db_params['port']}"

# Create SQLAlchemy engine
engine = create_engine(f"postgresql://{db_params['user']}@{db_params['host']}:{db_params['port']}/{db_params['database']}")

print("✅ Database connection established!")
print(f"📊 Connected to: {db_params['database']} on {db_params['host']}")


✅ Database connection established!
📊 Connected to: bike_rental_db on localhost


## 🌤️ Weather-Ridership Correlation Views

Let's create views that combine weather data with ridership patterns to understand correlations.


In [2]:
# Create daily weather-ridership correlation view
try:
    conn = psycopg2.connect(conn_string)
    cursor = conn.cursor()
    
    # Create view for daily weather and ridership correlation
    create_daily_weather_rides_view = """
    CREATE OR REPLACE VIEW daily_weather_rides AS
    SELECT 
        w.date,
        w.avg_temp,
        w.max_temp,
        w.min_temp,
        w.precipitation,
        w.avg_wind_speed,
        w.weather_category,
        w.season,
        COUNT(r.ride_id) as total_rides,
        COUNT(DISTINCT r.start_station_id) as active_stations,
        ROUND(AVG(r.trip_duration_minutes), 2) as avg_trip_duration,
        ROUND(AVG(r.age), 1) as avg_rider_age,
        COUNT(CASE WHEN r.user_type = 'Subscriber' THEN 1 END) as subscriber_rides,
        COUNT(CASE WHEN r.user_type = 'Customer' THEN 1 END) as customer_rides,
        ROUND(COUNT(CASE WHEN r.user_type = 'Subscriber' THEN 1 END) * 100.0 / COUNT(r.ride_id), 1) as subscriber_percentage
    FROM weather w
    LEFT JOIN rides r ON w.date = r.date
    GROUP BY w.date, w.avg_temp, w.max_temp, w.min_temp, w.precipitation, 
             w.avg_wind_speed, w.weather_category, w.season
    ORDER BY w.date;
    """
    
    cursor.execute(create_daily_weather_rides_view)
    conn.commit()
    
    print("✅ Created daily_weather_rides view successfully!")
    
    # Test the view
    cursor.execute("SELECT * FROM daily_weather_rides LIMIT 5;")
    sample_data = cursor.fetchall()
    
    print("📊 Sample data from daily_weather_rides view:")
    for row in sample_data:
        print(f"   {row[0]}: {row[1]}°F, {row[8]} rides, {row[9]} stations")
    
    cursor.close()
    conn.close()
    
except Exception as e:
    print(f"❌ Error creating daily weather-rides view: {e}")


✅ Created daily_weather_rides view successfully!
📊 Sample data from daily_weather_rides view:
   2016-01-01: 41°F, 163 rides, 28 stations
   2016-01-02: 36°F, 206 rides, 29 stations
   2016-01-03: 37°F, 276 rides, 30 stations
   2016-01-04: 32°F, 286 rides, 29 stations
   2016-01-05: 19°F, 273 rides, 30 stations


## ⏰ Time-Based Analytics Views

Let's create views for analyzing ridership patterns by time (hourly, daily, weekly).


In [3]:
# Create hourly ridership patterns view
try:
    conn = psycopg2.connect(conn_string)
    cursor = conn.cursor()
    
    # Create view for hourly ridership patterns
    create_hourly_patterns_view = """
    CREATE OR REPLACE VIEW hourly_ridership_patterns AS
    SELECT 
        hour_of_day,
        COUNT(*) as total_rides,
        ROUND(AVG(trip_duration_minutes), 2) as avg_trip_duration,
        COUNT(CASE WHEN user_type = 'Subscriber' THEN 1 END) as subscriber_rides,
        COUNT(CASE WHEN user_type = 'Customer' THEN 1 END) as customer_rides,
        ROUND(AVG(age), 1) as avg_rider_age,
        COUNT(DISTINCT start_station_id) as active_stations,
        ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER(), 2) as percentage_of_daily_rides
    FROM rides
    WHERE hour_of_day IS NOT NULL
    GROUP BY hour_of_day
    ORDER BY hour_of_day;
    """
    
    cursor.execute(create_hourly_patterns_view)
    conn.commit()
    
    print("✅ Created hourly_ridership_patterns view successfully!")
    
    # Test the view
    cursor.execute("SELECT * FROM hourly_ridership_patterns ORDER BY total_rides DESC LIMIT 5;")
    peak_hours = cursor.fetchall()
    
    print("📊 Peak hours by ridership:")
    for row in peak_hours:
        print(f"   {row[0]:02d}:00: {row[1]} rides ({row[7]}% of daily)")
    
    cursor.close()
    conn.close()
    
except Exception as e:
    print(f"❌ Error creating hourly patterns view: {e}")


✅ Created hourly_ridership_patterns view successfully!
📊 Peak hours by ridership:
   08:00: 29241 rides (11.83% of daily)
   18:00: 24452 rides (9.90% of daily)
   17:00: 22049 rides (8.92% of daily)
   19:00: 18293 rides (7.40% of daily)
   07:00: 16818 rides (6.81% of daily)


In [4]:
# Create weekly ridership patterns view
try:
    conn = psycopg2.connect(conn_string)
    cursor = conn.cursor()
    
    # Create view for weekly ridership patterns
    create_weekly_patterns_view = """
    CREATE OR REPLACE VIEW weekly_ridership_patterns AS
    SELECT 
        CASE day_of_week
            WHEN 0 THEN 'Monday'
            WHEN 1 THEN 'Tuesday'
            WHEN 2 THEN 'Wednesday'
            WHEN 3 THEN 'Thursday'
            WHEN 4 THEN 'Friday'
            WHEN 5 THEN 'Saturday'
            WHEN 6 THEN 'Sunday'
        END as day_name,
        day_of_week,
        COUNT(*) as total_rides,
        ROUND(AVG(trip_duration_minutes), 2) as avg_trip_duration,
        COUNT(CASE WHEN user_type = 'Subscriber' THEN 1 END) as subscriber_rides,
        COUNT(CASE WHEN user_type = 'Customer' THEN 1 END) as customer_rides,
        ROUND(AVG(age), 1) as avg_rider_age,
        COUNT(DISTINCT start_station_id) as active_stations,
        ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER(), 2) as percentage_of_weekly_rides
    FROM rides
    WHERE day_of_week IS NOT NULL
    GROUP BY day_of_week
    ORDER BY day_of_week;
    """
    
    cursor.execute(create_weekly_patterns_view)
    conn.commit()
    
    print("✅ Created weekly_ridership_patterns view successfully!")
    
    # Test the view
    cursor.execute("SELECT * FROM weekly_ridership_patterns ORDER BY total_rides DESC;")
    weekly_data = cursor.fetchall()
    
    print("📊 Weekly ridership patterns:")
    for row in weekly_data:
        print(f"   {row[0]}: {row[2]} rides ({row[8]}% of weekly)")
    
    cursor.close()
    conn.close()
    
except Exception as e:
    print(f"❌ Error creating weekly patterns view: {e}")


✅ Created weekly_ridership_patterns view successfully!
📊 Weekly ridership patterns:
   Wednesday: 40517 rides (16.40% of weekly)
   Thursday: 39513 rides (15.99% of weekly)
   Tuesday: 38751 rides (15.68% of weekly)
   Friday: 37205 rides (15.06% of weekly)
   Monday: 36296 rides (14.69% of weekly)
   Saturday: 27843 rides (11.27% of weekly)
   Sunday: 26986 rides (10.92% of weekly)


## 🏪 Station Utilization Analytics

Let's create views for analyzing station performance and utilization patterns.


In [5]:
# Create station utilization view
try:
    conn = psycopg2.connect(conn_string)
    cursor = conn.cursor()
    
    # Create view for station utilization analytics
    create_station_utilization_view = """
    CREATE OR REPLACE VIEW station_utilization AS
    SELECT 
        s.station_id,
        s.station_name,
        s.latitude,
        s.longitude,
        COUNT(r.ride_id) as total_rides,
        COUNT(CASE WHEN r.start_station_id = s.station_id THEN 1 END) as rides_started,
        COUNT(CASE WHEN r.end_station_id = s.station_id THEN 1 END) as rides_ended,
        ROUND(AVG(r.trip_duration_minutes), 2) as avg_trip_duration,
        COUNT(DISTINCT r.date) as active_days,
        ROUND(COUNT(r.ride_id) * 100.0 / SUM(COUNT(r.ride_id)) OVER(), 2) as percentage_of_total_rides,
        ROUND(COUNT(r.ride_id) / COUNT(DISTINCT r.date), 1) as avg_rides_per_day
    FROM stations s
    LEFT JOIN rides r ON (s.station_id = r.start_station_id OR s.station_id = r.end_station_id)
    GROUP BY s.station_id, s.station_name, s.latitude, s.longitude
    ORDER BY total_rides DESC;
    """
    
    cursor.execute(create_station_utilization_view)
    conn.commit()
    
    print("✅ Created station_utilization view successfully!")
    
    # Test the view
    cursor.execute("SELECT * FROM station_utilization LIMIT 5;")
    top_stations = cursor.fetchall()
    
    print("📊 Top 5 stations by utilization:")
    for row in top_stations:
        print(f"   {row[1]}: {row[4]} total rides, {row[10]} rides/day")
    
    cursor.close()
    conn.close()
    
except Exception as e:
    print(f"❌ Error creating station utilization view: {e}")


✅ Created station_utilization view successfully!
📊 Top 5 stations by utilization:
   Grove St PATH: 66361 total rides, 183.0 rides/day
   Exchange Place: 40252 total rides, 111.0 rides/day
   Sip Ave: 32675 total rides, 90.0 rides/day
   Hamilton Park: 29876 total rides, 82.0 rides/day
   Newport PATH: 26187 total rides, 72.0 rides/day


## 📈 KPI and Business Intelligence Views

Let's create views for key performance indicators and business insights.


In [6]:
# Create monthly KPI summary view
try:
    conn = psycopg2.connect(conn_string)
    cursor = conn.cursor()
    
    # Create view for monthly KPI summary
    create_monthly_kpi_view = """
    CREATE OR REPLACE VIEW monthly_kpi_summary AS
    SELECT 
        EXTRACT(YEAR FROM r.date) as year,
        EXTRACT(MONTH FROM r.date) as month,
        TO_CHAR(r.date, 'Month') as month_name,
        COUNT(r.ride_id) as total_rides,
        COUNT(DISTINCT r.start_station_id) as active_stations,
        COUNT(DISTINCT r.bike_id) as bikes_used,
        ROUND(AVG(r.trip_duration_minutes), 2) as avg_trip_duration,
        ROUND(AVG(r.age), 1) as avg_rider_age,
        COUNT(CASE WHEN r.user_type = 'Subscriber' THEN 1 END) as subscriber_rides,
        COUNT(CASE WHEN r.user_type = 'Customer' THEN 1 END) as customer_rides,
        ROUND(COUNT(CASE WHEN r.user_type = 'Subscriber' THEN 1 END) * 100.0 / COUNT(r.ride_id), 1) as subscriber_percentage,
        ROUND(AVG(w.avg_temp), 1) as avg_temperature,
        ROUND(SUM(w.precipitation), 2) as total_precipitation,
        COUNT(CASE WHEN w.weather_category = 'Hot' THEN 1 END) as hot_days,
        COUNT(CASE WHEN w.weather_category = 'Cold' THEN 1 END) as cold_days
    FROM rides r
    JOIN weather w ON r.date = w.date
    GROUP BY EXTRACT(YEAR FROM r.date), EXTRACT(MONTH FROM r.date), TO_CHAR(r.date, 'Month')
    ORDER BY year, month;
    """
    
    cursor.execute(create_monthly_kpi_view)
    conn.commit()
    
    print("✅ Created monthly_kpi_summary view successfully!")
    
    # Test the view
    cursor.execute("SELECT * FROM monthly_kpi_summary ORDER BY total_rides DESC LIMIT 3;")
    top_months = cursor.fetchall()
    
    print("📊 Top 3 months by ridership:")
    for row in top_months:
        print(f"   {row[2]} {int(row[0])}: {row[3]} rides, {row[10]}% subscribers, {row[11]}°F avg")
    
    cursor.close()
    conn.close()
    
except Exception as e:
    print(f"❌ Error creating monthly KPI view: {e}")


✅ Created monthly_kpi_summary view successfully!
📊 Top 3 months by ridership:
   August    2016: 34083 rides, 92.1% subscribers, 79.5°F avg
   September 2016: 33284 rides, 94.2% subscribers, 72.3°F avg
   October   2016: 29553 rides, 96.4% subscribers, 59.6°F avg


In [7]:
# Create weather impact analysis view
try:
    conn = psycopg2.connect(conn_string)
    cursor = conn.cursor()
    
    # Create view for weather impact analysis
    create_weather_impact_view = """
    CREATE OR REPLACE VIEW weather_impact_analysis AS
    SELECT 
        w.weather_category,
        w.season,
        COUNT(r.ride_id) as total_rides,
        ROUND(AVG(r.trip_duration_minutes), 2) as avg_trip_duration,
        ROUND(AVG(w.avg_temp), 1) as avg_temperature,
        ROUND(AVG(w.precipitation), 3) as avg_precipitation,
        ROUND(AVG(w.avg_wind_speed), 1) as avg_wind_speed,
        COUNT(DISTINCT r.start_station_id) as active_stations,
        ROUND(COUNT(r.ride_id) * 100.0 / SUM(COUNT(r.ride_id)) OVER(), 2) as percentage_of_rides,
        ROUND(AVG(r.age), 1) as avg_rider_age,
        COUNT(CASE WHEN r.user_type = 'Subscriber' THEN 1 END) as subscriber_rides,
        COUNT(CASE WHEN r.user_type = 'Customer' THEN 1 END) as customer_rides
    FROM weather w
    JOIN rides r ON w.date = r.date
    WHERE w.weather_category IS NOT NULL
    GROUP BY w.weather_category, w.season
    ORDER BY total_rides DESC;
    """
    
    cursor.execute(create_weather_impact_view)
    conn.commit()
    
    print("✅ Created weather_impact_analysis view successfully!")
    
    # Test the view
    cursor.execute("SELECT * FROM weather_impact_analysis;")
    weather_impact = cursor.fetchall()
    
    print("📊 Weather impact on ridership:")
    for row in weather_impact:
        print(f"   {row[0]} ({row[1]}): {row[2]} rides, {row[3]} min avg, {row[4]}°F")
    
    cursor.close()
    conn.close()
    
except Exception as e:
    print(f"❌ Error creating weather impact view: {e}")


✅ Created weather_impact_analysis view successfully!
📊 Weather impact on ridership:
   Mild (Fall): 79704 rides, 10.17 min avg, 61.9°F
   Mild (Summer): 55857 rides, 12.95 min avg, 74.9°F
   Mild (Spring): 46491 rides, 12.57 min avg, 58.0°F
   Hot (Summer): 26585 rides, 12.17 min avg, 83.5°F
   Cold (Winter): 17789 rides, 8.67 min avg, 32.4°F
   Mild (Winter): 12586 rides, 9.98 min avg, 45.4°F
   Hot (Fall): 3081 rides, 13.33 min avg, 84.0°F
   Cold (Spring): 2251 rides, 10.33 min avg, 35.3°F
   Cold (Fall): 1407 rides, 6.71 min avg, 39.0°F
   Hot (Spring): 528 rides, 18.18 min avg, 82.0°F


## 🔍 View Documentation and Testing

Let's document all the views we've created and test some complex analytics queries.


In [8]:
# List all created views and their purposes
try:
    conn = psycopg2.connect(conn_string)
    cursor = conn.cursor()
    
    # Get all views in the database
    cursor.execute("""
        SELECT viewname, definition 
        FROM pg_views 
        WHERE schemaname = 'public' 
        ORDER BY viewname;
    """)
    
    views = cursor.fetchall()
    
    print("📋 Created Analytics Views:")
    print("=" * 50)
    
    for view in views:
        print(f"\n🔍 {view[0]}")
        if 'daily_weather_rides' in view[0]:
            print("   Purpose: Daily weather-ridership correlation analysis")
            print("   Use case: Understand how weather affects daily ridership patterns")
        elif 'hourly_ridership_patterns' in view[0]:
            print("   Purpose: Hourly ridership patterns and peak time analysis")
            print("   Use case: Identify peak hours and optimize bike distribution")
        elif 'weekly_ridership_patterns' in view[0]:
            print("   Purpose: Weekly ridership patterns by day of week")
            print("   Use case: Understand weekday vs weekend usage patterns")
        elif 'station_utilization' in view[0]:
            print("   Purpose: Station performance and utilization metrics")
            print("   Use case: Identify high-traffic stations and optimize placement")
        elif 'monthly_kpi_summary' in view[0]:
            print("   Purpose: Monthly KPI summary with weather correlation")
            print("   Use case: Monthly business performance tracking")
        elif 'weather_impact_analysis' in view[0]:
            print("   Purpose: Weather impact on ridership by category")
            print("   Use case: Understand weather sensitivity for operations planning")
    
    cursor.close()
    conn.close()
    
except Exception as e:
    print(f"❌ Error listing views: {e}")


📋 Created Analytics Views:

🔍 daily_weather_rides
   Purpose: Daily weather-ridership correlation analysis
   Use case: Understand how weather affects daily ridership patterns

🔍 hourly_ridership_patterns
   Purpose: Hourly ridership patterns and peak time analysis
   Use case: Identify peak hours and optimize bike distribution

🔍 monthly_kpi_summary
   Purpose: Monthly KPI summary with weather correlation
   Use case: Monthly business performance tracking

🔍 station_utilization
   Purpose: Station performance and utilization metrics
   Use case: Identify high-traffic stations and optimize placement

🔍 weather_impact_analysis
   Purpose: Weather impact on ridership by category
   Use case: Understand weather sensitivity for operations planning

🔍 weekly_ridership_patterns
   Purpose: Weekly ridership patterns by day of week
   Use case: Understand weekday vs weekend usage patterns


In [9]:
# Test complex analytics query using multiple views
try:
    conn = psycopg2.connect(conn_string)
    cursor = conn.cursor()
    
    # Complex query: Weather impact on peak hour ridership
    complex_query = """
    SELECT 
        w.weather_category,
        h.hour_of_day,
        COUNT(*) as rides_at_peak_weather,
        ROUND(AVG(r.trip_duration_minutes), 2) as avg_duration,
        ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER(PARTITION BY w.weather_category), 2) as percentage_of_weather_rides
    FROM rides r
    JOIN weather w ON r.date = w.date
    JOIN hourly_ridership_patterns h ON r.hour_of_day = h.hour_of_day
    WHERE w.weather_category IN ('Hot', 'Cold', 'Mild')
    AND h.hour_of_day IN (8, 9, 17, 18)  -- Peak hours
    GROUP BY w.weather_category, h.hour_of_day
    ORDER BY w.weather_category, h.hour_of_day;
    """
    
    cursor.execute(complex_query)
    results = cursor.fetchall()
    
    print("🔍 Complex Analytics Query Results:")
    print("Weather Impact on Peak Hour Ridership")
    print("=" * 50)
    
    for row in results:
        print(f"   {row[0]} weather at {row[1]:02d}:00: {row[2]} rides ({row[4]}% of {row[0]} rides)")
    
    cursor.close()
    conn.close()
    
except Exception as e:
    print(f"❌ Error executing complex query: {e}")


🔍 Complex Analytics Query Results:
Weather Impact on Peak Hour Ridership
   Cold weather at 08:00: 3058 rides (36.15% of Cold rides)
   Cold weather at 09:00: 1445 rides (17.08% of Cold rides)
   Cold weather at 17:00: 1808 rides (21.37% of Cold rides)
   Cold weather at 18:00: 2148 rides (25.39% of Cold rides)
   Hot weather at 08:00: 3280 rides (31.23% of Hot rides)
   Hot weather at 09:00: 1734 rides (16.51% of Hot rides)
   Hot weather at 17:00: 2579 rides (24.56% of Hot rides)
   Hot weather at 18:00: 2909 rides (27.70% of Hot rides)
   Mild weather at 08:00: 22794 rides (31.93% of Mild rides)
   Mild weather at 09:00: 11661 rides (16.33% of Mild rides)
   Mild weather at 17:00: 17608 rides (24.66% of Mild rides)
   Mild weather at 18:00: 19328 rides (27.07% of Mild rides)


## 📝 Phase 5 Summary

### Document Your Analytics Views:
1. **Views Created**: 
   - **6 comprehensive analytics views** for business intelligence
   - **Weather-ridership correlation** analysis
   - **Time-based patterns** (hourly, weekly, monthly)
   - **Station utilization** metrics
   - **KPI tracking** and performance monitoring

2. **Business Intelligence Capabilities**:
   - **Daily weather impact** on ridership patterns
   - **Peak hour identification** for operations optimization
   - **Station performance** analysis for placement decisions
   - **Monthly KPI tracking** with weather correlation
   - **Weather sensitivity** analysis for planning

3. **Analytics Features**:
   - **Complex JOIN queries** across multiple tables
   - **Aggregation functions** (COUNT, AVG, SUM, ROUND)
   - **Window functions** for percentage calculations
   - **CASE statements** for conditional analysis
   - **Date/time functions** for temporal analysis

4. **View Documentation**:
   - [x] daily_weather_rides - Weather-ridership correlation
   - [x] hourly_ridership_patterns - Peak time analysis
   - [x] weekly_ridership_patterns - Day-of-week patterns
   - [x] station_utilization - Station performance metrics
   - [x] monthly_kpi_summary - Business performance tracking
   - [x] weather_impact_analysis - Weather sensitivity analysis

### Business Value Delivered:
- **Operational Insights**: Peak hours, station utilization, weather impact
- **Strategic Planning**: Monthly trends, seasonal patterns, user behavior
- **Performance Monitoring**: KPI tracking, subscriber analysis, efficiency metrics
- **Data-Driven Decisions**: Weather-based operations, station optimization

### Next Steps:
- [x] Complete Phase 5 analytics views
- [ ] Move to Phase 6: Portfolio Documentation
- [ ] Create comprehensive project write-up
- [ ] Document technical decisions and business insights

---

**Excellent work!** Your analytics views provide comprehensive business intelligence capabilities. The bike rental company now has powerful tools for data-driven decision making!
