# Exploratory Data Analysis (EDA)

In this notebook, we perform exploratory data analysis on the bike-sharing rental demand dataset. We examine the structure of the dataset, check for missing values, detect outliers, look at the distribution of the target variable, and analyze correlations between features.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set visualisation style
sns.set_theme(style='whitegrid')

In [2]:
# Load the dataset
data_path = '../data/processed/bike_sharing_data.csv'
df = pd.read_csv(data_path)

# Display the first few rows of the dataset
df.head()

In [3]:
# Check the structure of the dataset
df.info()

In [4]:
# Check for missing values
missing_values = df.isnull().sum()
missing_values[missing_values > 0]

In [5]:
# Handle missing values (if any)
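# For example, we can fill numeric columns with their column means or drop incomplete rows
# (numeric_only=True keeps df.mean() from failing on non-numeric columns)
# df.fillna(df.mean(numeric_only=True), inplace=True)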
# or
# df.dropna(inplace=True)

# Display summary statistics
df.describe()
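
The commented-out options above can be made concrete. The following is a minimal sketch, not the method this project necessarily uses: it assumes median imputation for numeric columns and mode imputation for everything else. The helper name `impute_missing` and the toy frame (including its column names) are hypothetical and only stand in for `df`.

```python
import numpy as np
import pandas as pd


def impute_missing(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with numeric NaNs filled by the column median
    and non-numeric NaNs filled by the most frequent value."""
    result = frame.copy()
    numeric_cols = result.select_dtypes(include="number").columns
    # fillna with a Series aligns the medians to the matching columns
    result[numeric_cols] = result[numeric_cols].fillna(result[numeric_cols].median())
    for col in result.columns.difference(numeric_cols):
        mode = result[col].mode(dropna=True)
        if not mode.empty:
            result[col] = result[col].fillna(mode.iloc[0])
    return result


# Hypothetical toy data standing in for the real dataset
toy = pd.DataFrame({
    "temp": [10.0, np.nan, 14.0, 30.0],
    "humidity": [80.0, 60.0, np.nan, 40.0],
    "season": ["spring", None, "spring", "summer"],
})
filled = impute_missing(toy)
print(filled)
```

The median is used here rather than the mean because it is less sensitive to extreme values. Whether that is the better choice depends on the column.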

In [6]:
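# Detect outliers using boxplots of the numeric features
# Note: all features share one axis, so columns on larger scales dominate the plot
plt.figure(figsize=(12, 6))
sns.boxplot(data=df.select_dtypes(include='number'))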
plt.title('Boxplot of Features to Detect Outliers')
plt.xticks(rotation=45)
plt.show()
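
Boxplots flag points lying beyond the whiskers, which by default extend 1.5 times the interquartile range (IQR) past the quartiles. The same rule can be applied numerically to count outliers per column. This is a minimal sketch under that assumption. The helper name `iqr_outlier_counts` and the toy data are hypothetical and not part of the original analysis.

```python
import pandas as pd


def iqr_outlier_counts(frame: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Count values per numeric column outside [Q1 - k*IQR, Q3 + k*IQR]."""
    numeric = frame.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # Compare each column against its own bounds
    mask = numeric.lt(lower, axis="columns") | numeric.gt(upper, axis="columns")
    return mask.sum()


# Hypothetical toy data standing in for the real dataset
toy = pd.DataFrame({
    "count": [10, 12, 11, 13, 12, 11, 500],
    "windspeed": [5.0, 6.0, 5.5, 6.5, 6.0, 5.5, 6.0],
})
outliers = iqr_outlier_counts(toy)
print(outliers)
```

Running `iqr_outlier_counts(df)` on the real data would give a per-feature count to read alongside the plot. A flagged value is not necessarily an error, since rental counts can legitimately spike.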

In [7]:
# Visualize the distribution of the target variable
plt.figure(figsize=(8, 4))
sns.histplot(df['count'], bins=30, kde=True)
plt.title('Distribution of Bike Rental Counts')
plt.xlabel('Count')
plt.ylabel('Frequency')
plt.show()

## Correlation Analysis
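We will analyze the pairwise correlations between features and the target variable to see which features are most strongly linearly associated with rental demand. Correlation alone does not establish influence, so these values serve as a guide for later feature selection rather than as proof of importance.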

In [8]:
# Correlation matrix
plt.figure(figsize=(12, 8))
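# Restrict to numeric columns; in pandas >= 2.0, df.corr() raises an error on string columns
correlation_matrix = df.corr(numeric_only=True)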
sns.heatmap(correlation_matrix, annot=True, fmt='.2f', cmap='coolwarm', square=True)
plt.title('Correlation Matrix')
plt.show()
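
The heatmap shows every pairwise correlation, but for modelling the column belonging to the target is usually the most relevant. A minimal sketch of ranking features by the absolute value of their correlation with the target follows. It assumes the target column is named `count`, as in the histogram above. The helper name `target_correlations` and the toy data are hypothetical.

```python
import pandas as pd


def target_correlations(frame: pd.DataFrame, target: str = "count") -> pd.Series:
    """Pearson correlation of each numeric feature with the target,
    sorted by absolute value in descending order."""
    corr = frame.corr(numeric_only=True)[target].drop(target)
    return corr.sort_values(key=lambda s: s.abs(), ascending=False)


# Hypothetical toy data standing in for the real dataset
toy = pd.DataFrame({
    "temp": [10, 15, 20, 25, 30],
    "humidity": [90, 70, 80, 50, 60],
    "season": ["spring", "spring", "summer", "summer", "fall"],
    "count": [20, 30, 40, 50, 60],
})
ranked = target_correlations(toy)
print(ranked)
```

Sorting by absolute value keeps strong negative correlations near the top instead of at the bottom of the list.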

## Conclusion
In this EDA, we have examined the dataset's structure, checked for missing values, inspected the features for outliers, looked at the distribution of rental counts, and analyzed correlations. This analysis will guide feature engineering and model building in subsequent notebooks.