# Bergen Bysykkel 2024: Weather and Usage Analysis in Python
**Author**: Syed Amjad Ali

---


## Table of Contents
1. [Introduction](#introduction)
2. [Part 1: Exploratory Analysis](#part-1-exploratory-analysis)
    - [Data Collection](#data-collection)
    - [Data Cleaning](#data-cleaning)
3. [Part 2: Predictive Analytics](#part-2-predictive-analytics)
    - [Regression Models](#regression-models)
    - [Weather Data Integration](#weather-data-integration)
4. [Conclusion](#conclusion)



## Introduction
<a id="introduction"></a>

This notebook analyzes Bergen Bysykkel rental patterns in the 2024 trip data and merges in weather data to support predictive modeling. It aims to:
- Understand bike usage trends by station and time of day.
- Predict hourly ride counts to optimize station capacity.
- Explore weather's impact on ride patterns.


## Part 1: Exploratory Analysis
<a id="part-1-exploratory-analysis"></a>



In [None]:
# Importing required libraries for data manipulation and visualization
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set global plotting style (set_theme is the current name for sns.set)
sns.set_theme(style="whitegrid")




### Data Collection
<a id="data-collection"></a>


In [None]:
# Load CSV files (placeholder for combining monthly data)
# Replace 'path/to/files' with the actual path to your datasets
file_list = [f"path/to/files/month_{i}.csv" for i in range(1, 13)]
bike_data = pd.concat([pd.read_csv(file) for file in file_list], ignore_index=True)

print("Bike data successfully loaded and combined.")
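The loading step above assumes all twelve monthly files exist and share the same columns. A minimal sketch of a more defensive loader follows; the helper name `load_monthly_files`, the `month_*.csv` pattern, and the synthetic demo files are illustrative assumptions, not part of the original dataset layout.

```python
import glob
import os
import tempfile

import pandas as pd


def load_monthly_files(pattern, required_columns):
    """Read every CSV matching `pattern` and stack them into one DataFrame."""
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"No files match {pattern!r}")
    frames = []
    for path in paths:
        frame = pd.read_csv(path)
        # Fail early if a monthly export lacks a column the analysis relies on
        missing = set(required_columns) - set(frame.columns)
        if missing:
            raise ValueError(f"{path} is missing columns: {sorted(missing)}")
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


# Demonstration with two small synthetic monthly files (not real Bysykkel data)
with tempfile.TemporaryDirectory() as tmp_dir:
    for month, n_rows in [(1, 2), (2, 3)]:
        demo = pd.DataFrame({
            "started_at": [f"2024-{month:02d}-01 0{h}:00:00" for h in range(n_rows)],
            "start_station_id": [100 + month] * n_rows,
        })
        demo.to_csv(os.path.join(tmp_dir, f"month_{month}.csv"), index=False)

    combined = load_monthly_files(
        os.path.join(tmp_dir, "month_*.csv"),
        required_columns=["started_at", "start_station_id"],
    )

print(combined.shape)
```

Checking columns per file means a renamed or missing field is reported together with the file it came from, rather than surfacing later as a `KeyError` during cleaning.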

### Data Cleaning
<a id="data-cleaning"></a>



In [None]:
# Convert timestamps and add derived features
bike_data['started_at'] = pd.to_datetime(bike_data['started_at'])
bike_data['hour'] = bike_data['started_at'].dt.hour
bike_data['weekday'] = bike_data['started_at'].dt.dayofweek  # Monday = 0, Sunday = 6

# Aggregate ride counts by station, weekday, and hour (totals across all dates)
ride_counts = bike_data.groupby(['start_station_id', 'weekday', 'hour']).size().reset_index(name='ride_count')

print(ride_counts.head())
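The `groupby(...).size()` call above only returns rows for station, weekday, and hour combinations that had at least one ride, so quiet hours are absent rather than zero. If zero counts matter for capacity planning, one option is to reindex against the full grid. The sketch below uses a small synthetic `rides` frame as a stand-in for `bike_data`.

```python
import pandas as pd

# Synthetic rides standing in for bike_data (two stations, a handful of trips)
rides = pd.DataFrame({
    "start_station_id": [1, 1, 1, 2],
    "weekday": [0, 0, 1, 0],
    "hour": [8, 8, 17, 8],
})

counts = (
    rides.groupby(["start_station_id", "weekday", "hour"])
    .size()
    .rename("ride_count")
)

# Full grid: every observed station x all 7 weekdays x all 24 hours
full_index = pd.MultiIndex.from_product(
    [sorted(rides["start_station_id"].unique()), range(7), range(24)],
    names=["start_station_id", "weekday", "hour"],
)

# Combinations without rides get an explicit count of 0
counts_filled = counts.reindex(full_index, fill_value=0).reset_index()
print(counts_filled[counts_filled["ride_count"] > 0])
```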


### Exploratory Data Analysis (EDA)
- Visualize hourly ride patterns.
- Identify station-level trends.
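
For the station-level bullet above, one possible starting point is to rank stations by total rides and find each station's busiest slot. This is a sketch on a synthetic `ride_counts_demo` table that mirrors the columns of `ride_counts`; the numbers are made up.

```python
import pandas as pd

# Synthetic table with the same columns as ride_counts (values are invented)
ride_counts_demo = pd.DataFrame({
    "start_station_id": [1, 1, 1, 2, 2, 3],
    "weekday": [0, 0, 5, 0, 1, 6],
    "hour": [8, 17, 12, 8, 8, 14],
    "ride_count": [40, 55, 20, 15, 10, 5],
})

# Total rides per station, busiest first
station_totals = (
    ride_counts_demo.groupby("start_station_id")["ride_count"]
    .sum()
    .sort_values(ascending=False)
)

# Busiest slot per station: the row with the highest ride_count in each group
peak_rows = ride_counts_demo.loc[
    ride_counts_demo.groupby("start_station_id")["ride_count"].idxmax(),
    ["start_station_id", "weekday", "hour", "ride_count"],
].reset_index(drop=True)

print(station_totals)
print(peak_rows)
```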


In [None]:
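# Plot mean hourly ride counts per weekday (lineplot averages over stations)
weekday_names = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
plot_data = ride_counts.assign(weekday_name=ride_counts['weekday'].map(dict(enumerate(weekday_names))))

plt.figure(figsize=(12, 6))
# Map weekday numbers to names first so seaborn builds a correctly labelled legend
ax = sns.lineplot(data=plot_data, x='hour', y='ride_count', hue='weekday_name',
                  hue_order=weekday_names, palette='viridis')
plt.title('Hourly Ride Counts by Weekday')
plt.xlabel('Hour of the Day')
plt.ylabel('Ride Count')
ax.get_legend().set_title('Weekday')
plt.show()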


## Part 2: Predictive Analytics
<a id="part-2-predictive-analytics"></a>



In [None]:
# Importing required libraries for machine learning
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Prepare data for modeling
X = ride_counts[['weekday', 'hour']]  # Features
y = ride_counts['ride_count']  # Target variable

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Fit a simple linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
print(f"Mean Squared Error: {mean_squared_error(y_test, y_pred):.2f}")
print(f"R² Score: {r2_score(y_test, y_pred):.2f}")


### Regression Models
<a id="regression-models"></a>
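
The baseline at the start of this part feeds `weekday` and `hour` to `LinearRegression` as plain numbers, which forces a straight-line relationship between hour and ride count. A hedged sketch of one alternative is to one-hot encode both features so each hour and weekday gets its own coefficient. The data below is synthetic, with invented commuter peaks at 08:00 and 17:00, and the helper `fit_and_score` is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic data with peaks at 08:00 and 17:00 (not real Bysykkel data)
demo = pd.DataFrame({
    "weekday": rng.integers(0, 7, size=2000),
    "hour": rng.integers(0, 24, size=2000),
})
peak = np.exp(-((demo["hour"] - 8) ** 2) / 8) + np.exp(-((demo["hour"] - 17) ** 2) / 8)
demo["ride_count"] = 5 + 30 * peak + rng.normal(0, 2, size=len(demo))


def fit_and_score(features, target):
    """Fit a linear regression on a 75/25 split and return the test R²."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        features, target, test_size=0.25, random_state=42
    )
    model = LinearRegression().fit(X_tr, y_tr)
    return r2_score(y_te, model.predict(X_te))


# Numeric encoding: one slope for weekday, one slope for hour
raw_r2 = fit_and_score(demo[["weekday", "hour"]], demo["ride_count"])

# One-hot encoding: a separate coefficient per weekday and per hour
onehot = pd.get_dummies(
    demo[["weekday", "hour"]].astype("category"), drop_first=True, dtype=float
)
onehot_r2 = fit_and_score(onehot, demo["ride_count"])

print(f"numeric features R²: {raw_r2:.2f}")
print(f"one-hot features R²: {onehot_r2:.2f}")
```

On this synthetic data the one-hot model fits the two-peak shape that a single slope cannot; whether the same holds for the real ride counts would need to be checked on the actual data.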


### Weather Data Integration
<a id="weather-data-integration"></a>


In [None]:
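# Load weather data (assumes hourly rows with 'date' and 'hour' columns)
# Replace 'path/to/weather_data.csv' with the actual path
weather_data = pd.read_csv("path/to/weather_data.csv", parse_dates=['date'])
weather_data['date'] = weather_data['date'].dt.date

# ride_counts has no 'date' column, so count rides per station, date, and hour first
rides_by_date = (
    bike_data.assign(date=bike_data['started_at'].dt.date)
    .groupby(['start_station_id', 'date', 'hour'])
    .size()
    .reset_index(name='ride_count')
)

# Merge weather data with ride counts
ride_weather_data = pd.merge(rides_by_date, weather_data, on=['date', 'hour'], how='inner')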

print("Combined data preview:")
print(ride_weather_data.head())
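
An inner merge silently drops ride rows that have no matching weather observation. A small sketch of how the match rate could be checked with the `indicator` option of `DataFrame.merge` follows; the frames and the `temperature_c` column name are hypothetical.

```python
import pandas as pd

# Synthetic ride counts and weather rows (column names are assumptions)
rides_demo = pd.DataFrame({
    "date": pd.to_datetime(["2024-06-01", "2024-06-01", "2024-06-02"]).date,
    "hour": [8, 9, 8],
    "ride_count": [12, 7, 9],
})
weather_demo = pd.DataFrame({
    "date": pd.to_datetime(["2024-06-01", "2024-06-02"]).date,
    "hour": [8, 8],
    "temperature_c": [14.5, 11.0],
})

# A left merge keeps every ride row; _merge records whether weather was found
checked = rides_demo.merge(
    weather_demo, on=["date", "hour"], how="left", indicator=True
)
unmatched = checked[checked["_merge"] == "left_only"]
match_rate = (checked["_merge"] == "both").mean()

print(f"Share of ride rows with weather: {match_rate:.0%}")
print(unmatched[["date", "hour", "ride_count"]])
```

A low match rate usually points to a key mismatch, such as differing date types or time zones, rather than genuinely missing weather data.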


### Results and Insights
- Evaluate the impact of weather on ride patterns.
- Visualize model predictions and compare with actual data.


In [None]:
# Visualize predictions vs actual values
plt.figure(figsize=(12, 6))
plt.scatter(y_test, y_pred, alpha=0.6)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], '--', color='red')
plt.title('Predicted vs Actual Ride Counts')
plt.xlabel('Actual Ride Counts')
plt.ylabel('Predicted Ride Counts')
plt.show()
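
To evaluate the weather bullet listed under Results and Insights, a simple first pass is to correlate weather variables with ride counts and compare dry and wet hours. The sketch below runs on synthetic data; the column names `temperature_c` and `precipitation_mm` and the effect sizes are assumptions, not findings from the Bergen data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Synthetic hourly observations (invented relationship, for illustration only)
weather_rides = pd.DataFrame({
    "temperature_c": rng.uniform(0, 25, size=n),
    "precipitation_mm": rng.choice([0.0, 0.0, 0.0, 1.5, 6.0], size=n),
})
weather_rides["ride_count"] = (
    10
    + 0.8 * weather_rides["temperature_c"]
    - 1.2 * weather_rides["precipitation_mm"]
    + rng.normal(0, 2, size=n)
)

# Pearson correlation of each weather variable with ride_count
correlations = weather_rides.corr()["ride_count"].drop("ride_count")

# Mean ride count in dry versus wet hours
weather_rides["is_wet"] = weather_rides["precipitation_mm"] > 0
mean_by_rain = weather_rides.groupby("is_wet")["ride_count"].mean()

print(correlations)
print(mean_by_rain)
```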


## Conclusion
<a id="conclusion"></a>

