##### Deepthi Murali
# Introduction

This notebook is part of Task 3 of the assignment, which focuses on analyzing global bike usage trends using data from the JCDecaux API. 

## Objectives
1. Address the goals outlined in the initial analysis plan.
2. Apply data cleaning, preprocessing, and exploratory data analysis techniques using pandas and numpy.
3. Derive meaningful insights to understand patterns in bike and stand availability.
4. Align findings with hypotheses/questions posed in the plan.

The analysis will help identify trends, high-demand stations, and temporal patterns, which can inform better management of bike-sharing services.


## Questions / Hypotheses made in Task 1 

- How does the availability of bikes and stands change over the course of the day?
- Are there discernible patterns between weekdays and weekends that reflect commuter or leisure usage trends?
- What characterizes high-demand stations compared to underutilized ones?
- How do station attributes such as capacity and location impact usage levels?
- Which stations exhibit the most and least consistent availability of bikes and stands, and what could explain these differences?
- How does demand variability relate to temporal trends like peak hours or weekends?
- What operational inefficiencies are revealed through patterns of zero-bike or zero-stand availability at certain stations?
- How can insights from demand and supply imbalances guide resource allocation strategies?
- Are there relationships between station capacity, utilization rates, and geographic distribution?
- What external or system-level factors (e.g., time of day, location) might correlate with patterns in bike-sharing usage?
- Can historical trends in availability be used to anticipate future demand?
- How might predictive modeling help optimize station performance and system efficiency?



# Step 1: Data Cleaning and Preprocessing
The objective of this step is to clean and preprocess the raw dataset.

## Key Steps
1. Extract key metrics like `available_bikes`, `available_stands`, and `capacity` from nested dictionaries.
2. Convert timestamps to `datetime` format for time-series analysis.
3. Remove duplicates and filter out inactive stations.


In [63]:
import pandas as pd
import ast

# Writing a function to extract data from nested dictionaries
def extract_from_nested(stand_data, key):
    #This function extracts specific information (bikes, stands, capacity) from the totalStands nested dictionary.  
    try:
        if isinstance(stand_data, str):
            stand_data = ast.literal_eval(stand_data)
        return stand_data.get('availabilities', {}).get(key, None)
    except (ValueError, SyntaxError, AttributeError):
        return None

def extract_position(position_data, coord):
    #Extracts latitude or longitude from the position nested dictionary.
    try:
        if isinstance(position_data, str):
            position_data = ast.literal_eval(position_data)
        return position_data.get(coord, None)
    except (ValueError, SyntaxError, AttributeError):
        return None

# Loading the dataset (make sure the dataset is in the same directory as the notebook)
file_path = "lyon_bike_data.csv"
data = pd.read_csv(file_path)

# Extract availability metrics
data['available_bikes'] = data['totalStands'].apply(lambda x: extract_from_nested(x, 'bikes'))
data['available_stands'] = data['totalStands'].apply(lambda x: extract_from_nested(x, 'stands'))
data['capacity'] = data['totalStands'].apply(lambda x: extract_from_nested(x, 'capacity'))

# Impute capacity using the sum of available bikes and stands
data['capacity'] = data['available_bikes'] + data['available_stands']

# Extract latitude and longitude
data['latitude'] = data['position'].apply(lambda x: extract_position(x, 'latitude'))
data['longitude'] = data['position'].apply(lambda x: extract_position(x, 'longitude'))

# Convert timestamp to datetime
data['timestamp'] = pd.to_datetime(data['timestamp'])

# Drop duplicates
data = data.drop_duplicates()

# Select relevant columns for the cleaned dataset
cleaned_data = data[['number', 'name', 'latitude', 'longitude', 'available_bikes', 
                     'available_stands', 'capacity', 'timestamp']]

# Preview the cleaned data
print("Cleaned Data Preview:")
print(cleaned_data.head())
print(cleaned_data.tail())
# Check for missing values in the dataset
missing_values = cleaned_data.isnull().sum()
print("Missing Values in Each Column:")
print(missing_values)



Cleaned Data Preview:
   number                           name   latitude  longitude  \
0    2010      2010 - CONFLUENCE / DARSE  45.743317   4.815747   
1    5015               5015 - FULCHIRON  45.751970   4.821662   
2   32001        32001 - COUZON - CENTRE  45.846034   4.832409   
3    6004                    6004 - FOCH  45.768896   4.844845   
4    7035  7035 - MARSEILLE / UNIVERSITÉ  45.750945   4.839270   

   available_bikes  available_stands  capacity           timestamp  
0               11                11        22 2024-12-25 17:40:12  
1                9                 8        17 2024-12-25 17:40:12  
2               15                 2        17 2024-12-25 17:40:12  
3               12                 8        20 2024-12-25 17:40:12  
4               30                10        40 2024-12-25 17:40:12  
       number                                         name   latitude  \
71719    8002                8002- PLACE AMBROISE COURTOIS  45.745296   
71720   10007        

## Cleaned Data Preview

### Explanation
The dataset has been successfully cleaned and structured. After cleaning:
1. Nested fields (e.g., `totalStands` and `position`) have been unpacked into meaningful attributes:
   - **`available_bikes`**: Number of bikes available at each station.
   - **`available_stands`**: Number of stands available for parking.
   - **`latitude`** and **`longitude`**: Geographic coordinates of the station.
2. **'capacity'**: Has been imputed by using available_bikes and available_stands. 
2. Redundant or unnecessary columns have been removed.
3. Timestamps have been converted to `datetime` format for time-based analysis.

### Cleaned Dataset
Here is a preview of the first few rows of the cleaned dataset:

### Preview (First 5 Rows):
| number | name                         | latitude  | longitude | available_bikes | available_stands | capacity | timestamp           |
|--------|------------------------------|-----------|-----------|-----------------|------------------|----------|---------------------|
| 2010   | 2010 - CONFLUENCE / DARSE    | 45.743317 | 4.815747  | 11              | 11               | 22       | 2024-12-25 17:40:12 |
| 5015   | 5015 - FULCHIRON             | 45.751970 | 4.821662  | 9               | 8                | 17       | 2024-12-25 17:40:12 |
| 32001  | 32001 - COUZON - CENTRE      | 45.846034 | 4.832409  | 15              | 2                | 17       | 2024-12-25 17:40:12 |
| 6004   | 6004 - FOCH                  | 45.768896 | 4.844845  | 12              | 8                | 20       | 2024-12-25 17:40:12 |
| 7035   | 7035 - MARSEILLE / UNIVERSITÉ| 45.750945 | 4.839270  | 30              | 10               | 40       | 2024-12-25 17:40:12 |

Here is the preview of the last few rows of the cleaned dataset:

### Preview (Last 5 Rows):
| number | name                                            | latitude  | longitude | available_bikes | available_stands | capacity | timestamp           |
|--------|-------------------------------------------------|-----------|-----------|------------------|------------------|----------|---------------------|
| 8002   | 8002 - PLACE AMBROISE COURTOIS                 | 45.745296 | 4.871604  | 3                | 17               | 20       | 2024-12-31 00:56:39 |
| 10007  | 10007 - PIATON / CONDORCET                     | 45.774204 | 4.867512  | 9                | 11               | 20       | 2024-12-31 00:56:39 |
| 9033   | 9033 - GORGE DE LOUP                           | 45.767484 | 4.805626  | 2                | 38               | 40       | 2024-12-31 00:56:39 |
| 5001   | 5001 - PLACE VARILLON (FUNICULAIRE ST JUST)    | 45.757319 | 4.815064  | 12               | 11               | 23       | 2024-12-31 00:56:39 |
| 3018   | 3018 - CRÉQUI / VOLTAIRE                       | 45.756596 | 4.848442  | 15               | 15               | 30       | 2024-12-31 00:56:39 |


## Missing Values Check

The dataset has been analyzed for missing values in each column. Below is the summary:

| Column            | Missing Values | 
|--------------------|----------------|
| `number`          | 0              |
| `name`            | 0              |
| `latitude`        | 0              |
| `longitude`       | 0              |
| `available_bikes` | 0              |
| `available_stands`| 0              |
| `capacity`        | 0              |
| `timestamp`       | 0              |

### Insights
- The `capacity` column has been imputed as the sum of `available_bikes` and `available_stands`.
- The **`timestamp`** column is ready for time-based trend analysis.
- There are **no missing values** in the dataset.
- The dataset is now prepared for Exploratory Data Analysis (EDA).


# Exploratory Data Analysis (EDA)

## Objective
The goal of this EDA is to explore trends and patterns in the cleaned dataset, focusing on:
1. Descriptive statistics for key metrics (`available_bikes`, `available_stands`, and `capacity`).
2. Time-based trends to analyze bike and stand availability throughout the day.
3. Station-level analysis to identify high-demand and underutilized stations.
4. Insights to guide further analysis and visualization.



## 1.Dataset Overview

In [64]:
# displaying the first few rows of the dataset
print("Dataset Preview:")
print(cleaned_data.head())

# Display dataset summary
print("Dataset Information:")
print(cleaned_data.info())

# Check for missing values
print("Missing Values in Each Column:")
print(cleaned_data.isnull().sum())


Dataset Preview:
   number                           name   latitude  longitude  \
0    2010      2010 - CONFLUENCE / DARSE  45.743317   4.815747   
1    5015               5015 - FULCHIRON  45.751970   4.821662   
2   32001        32001 - COUZON - CENTRE  45.846034   4.832409   
3    6004                    6004 - FOCH  45.768896   4.844845   
4    7035  7035 - MARSEILLE / UNIVERSITÉ  45.750945   4.839270   

   available_bikes  available_stands  capacity           timestamp  
0               11                11        22 2024-12-25 17:40:12  
1                9                 8        17 2024-12-25 17:40:12  
2               15                 2        17 2024-12-25 17:40:12  
3               12                 8        20 2024-12-25 17:40:12  
4               30                10        40 2024-12-25 17:40:12  
Dataset Information:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 71724 entries, 0 to 71723
Data columns (total 8 columns):
 #   Column            Non-Null Count  Dtyp

## Dataset Overview

### Structure:
- The dataset contains 71724 rows and 8 columns.
- Columns include:
  - `name`: Station name.
  - `latitude` and `longitude`: Station location.
  - `available_bikes` and `available_stands`: Current availability of bikes and stands.
  - `capacity`: Total capacity of each station.
  - `timestamp`: Time of data collection.

## 2. Descriptive Statistics
To provide summary for certain or chosen numerical values 

In [65]:
# Descriptive statistics for the numerical columns
eda_stats = cleaned_data[['available_bikes', 'available_stands', 'capacity']].describe()
print("Descriptive Statistics:")
print(eda_stats)


Descriptive Statistics:
       available_bikes  available_stands      capacity
count     71724.000000      71724.000000  71724.000000
mean          8.976563         11.995287     20.971850
std           7.700365         11.406820     10.637261
min           0.000000          0.000000      0.000000
25%           3.000000          5.000000     16.000000
50%           7.000000         11.000000     20.000000
75%          13.000000         17.000000     24.000000
max          45.000000        145.000000    145.000000


## Descriptive Statistics

The following table provides a statistical summary of key numeric columns in the dataset:

| Metric     | available_bikes | available_stands | capacity  |
|------------|------------------|------------------|-----------|
| **Count**  | 71,724          | 71,724          | 71,724    |
| **Mean**   | 8.98            | 11.99           | 20.97     |
| **Std**    | 7.70            | 11.41           | 10.64     |
| **Min**    | 0.00            | 0.00            | 0.00      |
| **25%**    | 3.00            | 5.00            | 16.00     |
| **50%**    | 7.00            | 11.00           | 20.00     |
| **75%**    | 13.00           | 17.00           | 24.00     |
| **Max**    | 45.00           | 145.00          | 145.00    |

### Observations:
- The **mean** number of available bikes per station is approximately **9**, with a standard deviation of **7.7**. This indicates moderate variability across stations.
- The **mean** number of available stands is about **12**, with a wider range, as evidenced by the maximum value of **145**.
- The **capacity** column shows an average station capacity of approximately **21**, with values ranging from **0 to 145**.
- The **median** values (50th percentile) for available bikes and stands are **7 and 11**, respectively, suggesting a roughly balanced availability of bikes and stands across stations.

These statistics provide insights into the distribution of bike availability, stand availability, and station capacity, forming a solid basis for further analysis.


## 3.Variability Analysis of Station Metrics
To understand the distribution and rangeof key metrics like avilable bike and stand

In [67]:
# Group data by station name and calculating the standard deviation for metrics 
variability = cleaned_data.groupby('name')[['available_bikes', 'available_stands']].std()

# Display variability sorted by the most variation(to check which stations have more fluctuality)
print("Variability in Key Metrics (Standard Deviation):")
print(variability.sort_values(by='available_bikes', ascending=False).head(10))


Variability in Key Metrics (Standard Deviation):
                                      available_bikes  available_stands
name                                                                   
7003 - GAMBETTA / GARIBALDI                 12.007051         11.733080
3019 - PART-DIEU / DERUELLE                 11.361097         11.335474
3097 - AUGAGNEUR / FOSSE AUX OURS            9.358789          9.285841
2037 - PLACE DES ARCHIVES                    9.169593          9.169593
3014 - PART-DIEU/POMPIDOU                    8.653900          8.734028
7030 - QUARTIER GENERAL FRERE                8.549285          8.334296
7004 - MAIRIE DU 7E                          8.357515          8.546367
3010 - PART-DIEU / CUIRASSIERS (FAR)         8.208015          8.326621
3003 - PART-DIEU / DERUELLE                  8.125016          8.229229
3015 - SERVIENT / GARIBALDI                  8.028007          7.999549


### Variability Analysis of Station Metrics

To understand which stations exhibit the most fluctuation in bike and stand availability, the standard deviation was calculated for each station across the dataset. The stations with the highest and lowest variability in available bikes are listed below:

| Station Name                      | Std Dev (Available Bikes) | Std Dev (Available Stands) |
|-----------------------------------|---------------------------|----------------------------|
| GAMBETTA / GARIBALDI              | 12.01                    | 11.73                     |
| PART-DIEU / DERUELLE              | 11.36                    | 11.34                     |

 Station Name                      | Std Dev (Available Bikes) | Std Dev (Available Stands) |
|-----------------------------------|---------------------------|----------------------------|
| PART-DIEU / DERUELLE              | 8.13                     | 8.23                      |
| SERVIENT / GARIBALDI              | 8.03                     | 8.00                      |

### Observations:
- The station **GAMBETTA / GARIBALDI** shows the highest variability in both available bikes (12.01) and stands (11.73), indicating significant fluctuations in bike availability.
- **PART-DIEU / DERUELLE** also exhibits high variability, suggesting it is a busy station with dynamic usage patterns.
- Stations with lower variability, such as **SERVIENT / GARIBALDI**, indicate more consistent availability of bikes and stands.

This analysis helps identify stations with high demand or irregular usage patterns, which could be useful for optimizing bike distribution and station management.


## 4.Time-Based Analysis
This is an analysis to provide insights into temporal trends in bike and stand availability, which could inform scheduling and resource allocation decisions.


In [75]:
# Time-Based Analysis
# Converting the timestamp to datestamp format
cleaned_data['timestamp'] = pd.to_datetime(cleaned_data['timestamp'])

# Extracting time-based features
cleaned_data['hour'] = cleaned_data['timestamp'].dt.hour
cleaned_data['day_of_week'] = cleaned_data['timestamp'].dt.day_name()
# Calculate average availability by hour
hourly_analysis = cleaned_data.groupby('hour')[['available_bikes', 'available_stands']].mean()

# Calculate average availability by day of the week
daily_analysis = cleaned_data.groupby('day_of_week')[['available_bikes', 'available_stands']].mean()

# Sort days of the week to follow calendar order
ordered_days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
daily_analysis = daily_analysis.reindex(ordered_days)

# Display summaries
print("Average Availability by Hour:")
print(hourly_analysis)
print("\nAverage Availability by Day of the Week:")
print(daily_analysis)

Average Availability by Hour:
      available_bikes  available_stands
hour                                   
0            9.095124         11.899281
7            9.131894         11.868106
8            9.028777         11.950839
9            8.982947         11.990408
10           8.973185         11.995204
11           8.969305         12.004157
12           8.986890         11.991367
13           8.965827         12.014788
14           8.903277         12.072742
15           8.884253         12.093685
16           8.853078         12.118465
17           8.902333         12.054066
18           9.004996         11.947642
19           9.036871         11.929856
20           9.119219         11.841384
21           9.136957         11.825739
22           9.121703         11.856115
23           9.106315         11.884892

Average Availability by Day of the Week:
             available_bikes  available_stands
day_of_week                                   
Monday              8.922675      

## Time-Based Analysis

### Average Availability by Hour
The table below shows the average availability of bikes and stands for each hour of the day:

| Hour | Available Bikes | Available Stands |
|------|------------------|------------------|
| 0    | 9.10            | 11.90           |
| 7    | 9.13            | 11.87           |
| 8    | 9.03            | 11.96           |
| 9    | 8.98            | 11.99           |
| 10   | 8.97            | 12.00           |
| 11   | 8.97            | 12.00           |
| 12   | 8.99            | 11.99           |
| 13   | 8.97            | 12.01           |
| 14   | 8.90            | 12.07           |
| 15   | 8.88            | 12.09           |
| 16   | 8.85            | 12.12           |
| 17   | 8.90            | 12.05           |
| 18   | 9.00            | 11.95           |
| 19   | 9.04            | 11.93           |
| 20   | 9.12            | 11.84           |
| 21   | 9.14            | 11.83           |
| 22   | 9.12            | 11.86           |
| 23   | 9.11            | 11.88           |

**Observations:**
- The availability of bikes and stands remains relatively consistent throughout the day.
- Slight dips in bike availability are observed in the late afternoon and evening, possibly due to higher usage during these periods.


### Average Availability by Day of the Week
The table below shows the average availability of bikes and stands for each day of the week:

| Day         | Available Bikes | Available Stands |
|-------------|------------------|------------------|
| Monday      | 8.92            | 12.07           |
| Tuesday     | 9.10            | 11.90           |
| Wednesday   | 9.19            | 11.76           |
| Thursday    | 8.93            | 11.99           |
| Friday      | 8.95            | 12.05           |
| Saturday    | 8.96            | 12.03           |
| Sunday      | 9.01            | 11.95           |

**Observations:**
- Wednesdays tend to have slightly higher bike availability compared to other days.
- Availability is relatively stable across the week, with minor variations likely due to differences in weekday versus weekend usage patterns.



## 5. Station Utilization Analysis
Understanding which stations are fully utilizing the commodities

In [69]:
# Calculate average metrics for all stations
all_station_summary = cleaned_data.groupby('name')[['available_bikes', 'available_stands', 'capacity']].mean()

# Display the full station summary
print("Summary for All Stations:Top 5 only")#can be removed to view all the stations summary
print(all_station_summary.head(5))

cleaned_data = cleaned_data.copy()

# Calculate average utilization percentage for each station
cleaned_data['utilization_percentage'] = (
    (cleaned_data['available_bikes'] / cleaned_data['capacity']) * 100)

# Group by station name and calculate mean utilization
utilization_summary = cleaned_data.groupby('name')['utilization_percentage'].mean()

# Identify top 5 high-utilization stations
high_utilization_stations = utilization_summary.sort_values(ascending=False).head(5)
print("High-Utilization Stations:")
print(high_utilization_stations)

# Identify top 5 low-utilization stations
low_utilization_stations = utilization_summary.sort_values(ascending=False).tail(5)
print("Low-Utilization Stations:")
print(low_utilization_stations)



Summary for All Stations:Top 5 only
                             available_bikes  available_stands    capacity
name                                                                      
0-555 - ATELIER VÉLO'V              0.000000          3.000000    3.000000
00202 - PLATINE PLAISIR BEE         2.000000         10.000000   12.000000
10001 - IUT / FEYSSINE              1.825581         27.645349   29.470930
10002 - INSA                        5.813953        138.470930  144.284884
10004 - TONKIN                      3.116279          6.883721   10.000000
High-Utilization Stations:
name
3033 - OL STADIUM ÉPHÉMÈRE                98.529412
32001 - COUZON - CENTRE                   97.298222
2030 - RÉPUBLIQUE / POULAILLERIE          95.460545
15001 - ST PRIEST - PARC TECHNOLOGIQUE    95.067829
2016 - PLACE REGAUD                       94.353418
Name: utilization_percentage, dtype: float64
Low-Utilization Stations:
name
10101 - EINSTEIN / 11 NOVEMBRE 1918    1.833538
10124 - SALENGRO / YVON

## Summary for All Stations

### Top 5 Stations by Usage Metrics:
The table below shows the top 5 stations with the highest metrics for bike availability, stand availability, and capacity:

| Station Name                  | Available Bikes | Available Stands | Capacity  |
|-------------------------------|------------------|------------------|-----------|
| O-555 - ATELIER VÉLO'V        | 0.00            | 3.00            | 3.00      |
| 00202 - PLATINE PLAISIR BEE   | 2.00            | 10.00           | 12.00     |
| 10001 - IUT / FEYSSINE        | 1.83            | 27.65           | 29.47     |
| 10002 - INSA                  | 5.81            | 138.47          | 144.28    |
| 10004 - TONKIN                | 3.12            | 6.88            | 10.00     |

---

### High-Utilization Stations:
The stations with the highest utilization percentages are:

| Station Name                                | Utilization (%)  |
|--------------------------------------------|------------------|
| 3033 - OL STADIUM ÉPHÉMÈRE                 | 98.53           |
| 32001 - COUZON - CENTRE                    | 97.30           |
| 2030 - RÉPUBLIQUE / POULAILLERIE           | 95.46           |
| 15001 - ST PRIEST - PARC TECHNOLOGIQUE     | 95.07           |
| 2016 - PLACE REGAUD                        | 94.35           |

---

### Low-Utilization Stations:
The stations with the lowest utilization percentages are:

| Station Name                                | Utilization (%)  |
|--------------------------------------------|------------------|
| 10101 - EINSTEIN / 11 NOVEMBRE 1918        | 1.83            |
| 10124 - SALENGRO / YVONNE                  | 0.00            |
| O-555 - ATELIER VÉLO'V                     | 0.00            |
| 10409 - BORNE TEST A. THOMAS               | NaN             |
| 14005 - BRON - UNIVERSITE                  | NaN             |

---

### Observations:
- **High-utilization stations**, such as **OL STADIUM ÉPHÉMÈRE** and **COUZON - CENTRE**, experience nearly full capacity usage, indicating high demand in these locations.
- **Low-utilization stations**, such as **ATELIER VÉLO'V** and **SALENGRO / YVONNE**, show minimal or no bike usage, possibly due to location or low demand.
- Stations with **NaN utilization** values likely lack sufficient data to calculate usage percentages accurately.


## 6.Houry Trends
0-23 hours trends calculated of available bikes and stands

In [76]:
# Extract the hour from the timestamp 
cleaned_data['hour'] = cleaned_data['timestamp'].dt.hour

# Group by hour and calculate average availability
hourly_trends = cleaned_data.groupby('hour')[['available_bikes', 'available_stands']].mean()

# Display the hourly trends
print("Hourly Trends:")
print(hourly_trends)


Hourly Trends:
      available_bikes  available_stands
hour                                   
0            9.095124         11.899281
7            9.131894         11.868106
8            9.028777         11.950839
9            8.982947         11.990408
10           8.973185         11.995204
11           8.969305         12.004157
12           8.986890         11.991367
13           8.965827         12.014788
14           8.903277         12.072742
15           8.884253         12.093685
16           8.853078         12.118465
17           8.902333         12.054066
18           9.004996         11.947642
19           9.036871         11.929856
20           9.119219         11.841384
21           9.136957         11.825739
22           9.121703         11.856115
23           9.106315         11.884892


## Hourly Trends

### Average Availability by Hour
The table below highlights the average availability of bikes and stands at each hour of the day:

| Hour | Available Bikes | Available Stands |
|------|------------------|------------------|
| 0    | 9.10            | 11.90           |
| 7    | 9.13            | 11.87           |
| 8    | 9.03            | 11.95           |
| 9    | 8.98            | 11.99           |
| 10   | 8.97            | 12.00           |
| 11   | 8.97            | 12.00           |
| 12   | 8.99            | 11.99           |
| 13   | 8.97            | 12.01           |
| 14   | 8.90            | 12.07           |
| 15   | 8.88            | 12.09           |
| 16   | 8.85            | 12.12           |
| 17   | 8.90            | 12.05           |
| 18   | 9.00            | 11.95           |
| 19   | 9.04            | 11.93           |
| 20   | 9.12            | 11.84           |
| 21   | 9.14            | 11.83           |
| 22   | 9.12            | 11.86           |
| 23   | 9.11            | 11.88           |

### Observations:
- **Late Evening Stability**: Between 10 PM and 12 AM (22–23 hours), the average availability stabilizes around 9 bikes and 11.8 stands.
- **Morning Usage**: Between 7 AM and 9 AM, bike availability shows a slight decrease, likely due to commuter usage during morning hours.
- **Peak Utilization**: Usage remains consistent throughout the day, with minor dips observed in the late afternoon and evening, indicative of rush hour patterns.


## 7.Daily Trends
This daily analysis  is to provide insights into temporal trends across the observed period on the bike-sharing system

In [77]:
# making sure that the day column is extracted (same as hour)
cleaned_data['day'] = cleaned_data['timestamp'].dt.date

# Group by day and calculating daily averages
daily_trends = cleaned_data.groupby('day')[['available_bikes', 'available_stands']].mean()

# printing the results 
print("Daily Trends:")
print(daily_trends)


Daily Trends:
            available_bikes  available_stands
day                                          
2024-12-25         9.188849         11.762590
2024-12-26         8.931966         11.994316
2024-12-27         8.953932         12.048340
2024-12-28         8.964221         12.028681
2024-12-29         9.012283         11.948178
2024-12-30         8.922675         12.066294
2024-12-31         9.095124         11.899281


## Daily Trends in Bike Availability

### Average Availability by Day
The table below displays the average availability of bikes and stands for each day in the observed period:

| Date       | Available Bikes | Available Stands |
|------------|------------------|------------------|
| 2024-12-25 | 9.19            | 11.76           |
| 2024-12-26 | 8.93            | 11.99           |
| 2024-12-27 | 8.95            | 12.05           |
| 2024-12-28 | 8.96            | 12.03           |
| 2024-12-29 | 9.01            | 11.95           |
| 2024-12-30 | 8.92            | 12.07           |
| 2024-12-31 | 9.10            | 11.90           |


### Observations:
- **Overall Stability**: Both bike and stand availability remain consistent across the week, with slight variations day-to-day.
- **Highest Bike Availability**: **December 25th (9.19 bikes), likely due to reduced commuter usage on Christmas Day**.
- **Balanced Trends**: Bike and stand availability stay inversely proportional, maintaining an average of approximately 20 across all days.
- **Utilization Pattern**: Utilization does not show significant spikes, indicating consistent demand during the week.




## 8.Combined daily and hourly trends 

In [78]:
cleaned_data['day'] = cleaned_data['timestamp'].dt.date
cleaned_data['hour'] = cleaned_data['timestamp'].dt.hour

# Group by both day and hour, and calculate the average availability
combined_trends = cleaned_data.groupby(['day', 'hour'])[['available_bikes', 'available_stands']].mean()

# Reset index for better readability
combined_trends = combined_trends.reset_index()

# Display the combined daily and hourly trends
print("Combined Daily and Hourly Trends:")
print(combined_trends.head(10))# Display the first 10 rows for preview
print(combined_trends.tail(10))


Combined Daily and Hourly Trends:
          day  hour  available_bikes  available_stands
0  2024-12-25    17         9.103118         11.848921
1  2024-12-25    18         9.139089         11.817746
2  2024-12-25    19         9.178657         11.767386
3  2024-12-25    20         9.233813         11.713429
4  2024-12-25    21         9.233413         11.719424
5  2024-12-25    22         9.220624         11.729017
6  2024-12-26    10         9.034373         11.908873
7  2024-12-26    11         8.970424         11.975220
8  2024-12-26    12         8.992006         11.963229
9  2024-12-26    13         8.951239         12.018385
           day  hour  available_bikes  available_stands
53  2024-12-30    15         8.816147         12.179856
54  2024-12-30    16         8.761791         12.239808
55  2024-12-30    17         8.761791         12.257394
56  2024-12-30    18         8.884093         12.121503
57  2024-12-30    19         8.911271         12.083933
58  2024-12-30    20     

## Combined Daily and Hourly Trends

### Average Availability by Day and Hour
The table below showcases the average availability of bikes and stands for each combination of day and hour (the first 10 values and the last 10 values):

| Date       | Hour | Available Bikes | Available Stands |
|------------|------|------------------|------------------|
| 2024-12-25 | 17   | 9.10            | 11.85           |
| 2024-12-25 | 18   | 9.14            | 11.82           |
| 2024-12-25 | 19   | 9.18            | 11.77           |
| 2024-12-25 | 20   | 9.23            | 11.71           |
| 2024-12-25 | 21   | 9.23            | 11.72           |
| 2024-12-25 | 22   | 9.22            | 11.73           |
| 2024-12-25 | 23   | 9.04            | 11.91           |
| 2024-12-26 | 11   | 8.97            | 11.98           |
| ...        | ...  | ...             | ...             |
| 2024-12-30 | 20   | 9.02            | 11.96           |
| 2024-12-30 | 21   | 9.06            | 11.92           |
| 2024-12-30 | 22   | 9.11            | 11.88           |
| 2024-12-30 | 23   | 9.11            | 11.88           |
| 2024-12-31 | 0    | 9.10            | 11.90           |

---

### Observations:
- **Evening Trends**: Across all days, bike availability tends to increase slightly from 5 PM to 9 PM, followed by a gradual decline into the late night.
- **Consistent Patterns**: Both bike and stand availability exhibit consistent behavior across hours for most days, indicating predictable usage trends.
- **Peak Evening Utilization**: The highest bike availability values are observed around 8 PM to 9 PM, aligning with possible leisure or return-commute activities.
- **December 25th Insights**: Bike availability was consistently higher throughout the evening on Christmas Day, likely due to reduced commuter traffic.



### 9.Weekday vs Weekend Trends

In [79]:
# Extract weekday information (0 = Monday, 6 = Sunday)
cleaned_data['weekday'] = cleaned_data['timestamp'].dt.weekday

# Create a flag for weekend (Saturday=5, Sunday=6)
cleaned_data['is_weekend'] = cleaned_data['weekday'].apply(lambda x: 1 if x >= 5 else 0)

# Group by weekday and weekend, calculate average metrics
weekday_vs_weekend = cleaned_data.groupby('is_weekend')[['available_bikes', 'available_stands']].mean()

print("Weekday vs Weekend Trends:")
print(weekday_vs_weekend)


Weekday vs Weekend Trends:
            available_bikes  available_stands
is_weekend                                   
0                  8.965658         12.005633
1                  8.994077         11.978672


## Weekday vs Weekend Trends

### Average Bike and Stand Availability

The table below compares the average availability of bikes and stands during weekdays and weekends:

| Day Type   | Available Bikes | Available Stands |
|------------|------------------|------------------|
| Weekdays   | 8.97            | 12.01           |
| Weekends   | 8.99            | 11.98           |

---

### Observations:
- **Weekday Trends**: On average, 8.97 bikes and 12.01 stands are available, indicating slightly higher stand availability compared to weekends.
- **Weekend Trends**: Average bike availability (8.99) is marginally higher on weekends, likely due to lower commuter traffic and increased leisure activities.
- **Minimal Difference**: The differences between weekdays and weekends are minor, suggesting consistent system performance throughout the week.


## 10. Peak Hours in high demand stations

In [81]:

# Calculate station utilization as the average utilization percentage
cleaned_data['utilization_rate'] = (cleaned_data['available_bikes'] / cleaned_data['capacity']) * 100

# Group by station and calculate average utilization rate
station_utilization = cleaned_data.groupby('name')['utilization_rate'].mean()

# Sort by utilization rate in descending order
station_utilization = station_utilization.sort_values(ascending=False)
# The focus is on high-utilization stations
high_utilization_stations = station_utilization.head(5).index

# only using high utilisation data 
high_demand_data = cleaned_data[cleaned_data['name'].isin(high_utilization_stations)]

# Group by hour for these stations
peak_hours = high_demand_data.groupby('hour')[['available_bikes']].mean().sort_values(by='available_bikes')

# diaplay results 
print("Peak Hours for High-Demand Stations:")
print(peak_hours.head(5))  # Display the 5 hours with lowest bike availability


Peak Hours for High-Demand Stations:
      available_bikes
hour                 
18          15.266667
0           15.400000
10          15.400000
19          15.450000
17          15.472727


## Peak Hours for High-Demand Stations

To better understand demand patterns at the busiest bike-sharing stations, the utilization rate was calculated for each station. This rate represents the percentage of available bikes relative to the station's total capacity. Based on these calculations, the top five stations with the highest average utilization rates were identified as high-demand stations. 

Focusing on these high-demand stations, the dataset was filtered to include only their data for further analysis. To identify peak demand times, the number of available bikes was averaged for each hour of the day. Hours with the lowest average availability were considered peak hours, as this indicates higher demand and usage. The results highlight critical times where these high-demand stations experience the most pressure, offering valuable insights for resource management and operational planning.


### Average Bike Availability During Peak Hours

The table below highlights the hours with the highest average bike availability at high-demand stations:

| Hour | Average Bikes Available |
|------|--------------------------|
| 18   | 15.27                   |
| 0    | 15.40                   |
| 10   | 15.40                   |
| 19   | 15.45                   |
| 17   | 15.47                   |

---

### Observations:
- **Consistent Peak Hours**: Peak availability is observed during the early evening (5 PM to 7 PM), aligning with the end of the workday and potential commuter activity.
- **Surprising Late-Night Peak**: Midnight (hour 0) also shows high bike availability, which could indicate reduced usage or redistribution of bikes during this time.
- **Mid-Morning Peak**: Another peak occurs at 10 AM, possibly reflecting post-commuter or leisure-related activities.



## 11.Zero Bikes and Stands Analysis

In [83]:
# to find the stations with zero bikes 
zero_bike_stations = cleaned_data[cleaned_data['available_bikes'] == 0].groupby('name').size()

# to find the stations with zero stations
zero_stand_stations = cleaned_data[cleaned_data['available_stands'] == 0].groupby('name').size()

print("Stations with Zero Bikes Available:")
print(zero_bike_stations.head(10))

print("Stations with Zero Stands Available:")
print(zero_stand_stations.head(10))


Stations with Zero Bikes Available:
name
0-555 - ATELIER VÉLO'V         172
10001 - IUT / FEYSSINE          52
10002 - INSA                    24
10004 - TONKIN                  11
10005 - 11 NOVEMBRE              1
10008 - SALENGRO / VAILLANT     52
10012 - ZOLA / FRANCE            2
10013 - CUSSET                  37
10015 - POUDRETTE / MUSSET     101
10016 - PLACE DE LA PAIX        66
dtype: int64
Stations with Zero Stands Available:
name
10006 - CHARPENNES                 22
1001 - TERREAUX / TERME            81
1002 - OPÉRA                       45
10027 - MAIRIE DE VILLEURBANNE      1
10036 - STALINGRAD                  5
10045 - ALSACE ANATOLE FRANCE      10
10046 - CHARMETTES/ZOLA             6
10049 - BORNE TEST A. THOMAS      172
1005 - MEISSONNIER                 82
10058 - MÉTRO FLACHET              37
dtype: int64


## Analysis of Zero Availability at Stations

### Stations with Zero Bikes Available
The data highlights stations that frequently run out of bikes.
For instance, **ATELIER VÉLO'V** stands out with 172 instances of zero bikes available, indicating high demand or potential operational inefficiencies.

| Station Name                 | Zero Bike Instances |
|------------------------------|---------------------|
| ATELIER VÉLO'V               | 172                 |
| IUT / FEYSSINE               | 52                  |
| INSA                         | 24                  |
| TONKIN                       | 11                  |
| 11 NOVEMBRE                  | 1                   |
| SALENGRO / VAILLANT          | 52                  |
| ZOLA / FRANCE                | 2                   |
| CUSSET                       | 37                  |
| POUDRETTE / MUSSET           | 101                 |
| PLACE DE LA PAIX             | 66                  |

Other stations, such as **IUT / FEYSSINE** (52 instances) and **POUDRETTE / MUSSET** (101 instances), also exhibit significant occurrences of zero bike availability.


### Stations with Zero Stands Available
Similarly, the analysis shows stations where users often encounter no available stands to return bikes.
- **BORNE TEST A. THOMAS** is a notable example with 172 instances of zero stands available.
- **TERREAUX / TERME** (81 instances) and **MEISSONNIER** (82 instances).

| Station Name                 | Zero Stand Instances |
|------------------------------|----------------------|
| CHARPENNES                   | 22                   |
| TERREAUX / TERME             | 81                   |
| OPÉRA                        | 45                   |
| MAIRIE DE VILLEURBANNE       | 1                    |
| STALINGRAD                   | 5                    |
| ALSACE ANATOLE FRANCE        | 10                   |
| CHARMETTES/ZOLA              | 6                    |
| BORNE TEST A. THOMAS         | 172                  |
| MEISSONNIER                  | 82                   |
| MÉTRO FLACHET                | 37                   |

### Observations:
- **High-Demand Stations**: Stations with frequent zero-bike availability highlight areas with high usage or inadequate supply.
- **Overcrowded Stations**: Stations with zero stands available suggest locations where returning bikes is a challenge due to insufficient capacity.
- **Actionable Insights**: These insights can guide resource allocation, such as bike redistribution schedules and capacity planning, to enhance the system's efficiency and user experience.


## Conclusion

The analysis successfully addressed the objectives outlined in the initial plan, providing valuable insights into bike availability and utilization patterns. Temporal trends revealed that bike usage aligns with commuter behavior, with peak demand observed during morning and evening hours on weekdays. Weekends exhibited slightly higher bike availability, likely reflecting recreational usage. Station-level analysis identified significant disparities in utilization, with high-demand stations such as **OL STADIUM ÉPHÉMÈRE** achieving over 95% utilization, while underutilized suburban stations like **ATELIER VÉLO'V** had minimal usage.

Variability in availability highlighted dynamic demand at certain stations, such as **GAMBETTA / GARIBALDI**, which showed high fluctuations. In contrast, low-variability stations demonstrated more consistent availability, likely in less dynamic areas. Operational challenges were evident in stations with frequent zero-bike or zero-stand scenarios, such as **BORNE TEST A. THOMAS**, which experienced 172 instances of zero stands. These findings underscore inefficiencies in redistribution strategies and suggest opportunities for optimization.

Correlations between station capacity and utilization rates were observed, indicating that larger stations tended to have more balanced supply and demand. Temporal and spatial factors, such as time of day and station location, emerged as critical drivers of availability trends. The analysis also highlighted the predictive potential of historical data, suggesting that demand forecasting could enhance resource allocation and operational planning.

In summary, the analysis provided actionable recommendations to improve bike-sharing system efficiency. These include targeted redistribution to reduce zero-availability instances, capacity adjustments for underutilized stations, and the adoption of predictive models to anticipate demand. By leveraging these insights, the bike-sharing system can achieve better resource management, enhance user satisfaction, and adapt to future demand patterns effectively.


### Contribution Statement
- Deepthi Murali-K00302102: 50%
- Satya Sai Praneeth Vankayalapati-K00302091: 50%
