# Exploratory Data Analysis (EDA) of Ride-Hailing Dataset

This notebook presents an initial exploratory data analysis of the raw ride-hailing dataset used in the early stages of this research.
The goal is to understand the underlying patterns, distributions, and potential data quality issues before applying any preprocessing or modeling techniques.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme(style='whitegrid')

# Show every column when displaying wide dataframes
pd.set_option('display.max_columns', None)

## Load Dataset

Ensure that the dataset file (`rides.csv`) is placed in the same directory as this notebook.

In [None]:
# Load dataset (replace 'rides.csv' with your actual file if different)
df = pd.read_csv('rides.csv')
df.shape
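
Because the load step depends on `rides.csv` sitting next to the notebook, a small guard can make a misplaced file easier to diagnose. The sketch below is only illustrative: `load_rides` is a hypothetical helper name that does not appear elsewhere in this notebook, and it assumes the file is a plain comma-separated CSV.

```python
from pathlib import Path

import pandas as pd


def load_rides(path: str = 'rides.csv') -> pd.DataFrame:
    """Read the rides CSV, failing early with a readable message if it is missing.

    `load_rides` is a hypothetical helper, not part of the original notebook.
    """
    csv_path = Path(path)
    if not csv_path.is_file():
        raise FileNotFoundError(
            f"Dataset not found at {csv_path.resolve()}. "
            "Place the file next to the notebook or pass its path explicitly."
        )
    return pd.read_csv(csv_path)
```

Calling `df = load_rides()` would then stand in for the direct `pd.read_csv` call above.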

## Dataset Overview

In [None]:
df.head()

In [None]:
df.info()

In [None]:
df.describe()

## Missing Value Analysis

In [None]:
# Hide row labels; with many rows they overlap and add no information
sns.heatmap(df.isnull(), cbar=False, cmap='viridis', yticklabels=False)
plt.title('Missing Values Heatmap')
plt.show()
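
The heatmap shows where gaps cluster, but exact figures are easier to read from a table. As a minimal sketch, the hypothetical helper `missing_summary` below reports the count and percentage of missing values per column. The column names in the example frame are made up for illustration and are not taken from the actual dataset.

```python
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing counts and percentages, largest first."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        'missing_count': counts,
        'missing_pct': (counts / len(frame) * 100).round(2),
    })
    # Keep only the columns that actually contain missing values
    summary = summary[summary['missing_count'] > 0]
    return summary.sort_values('missing_count', ascending=False)


# Illustrative frame with hypothetical column names
example = pd.DataFrame({
    'fare': [10.0, None, 12.5, None],
    'distance_km': [2.1, 3.4, None, 5.0],
    'driver_id': [1, 2, 3, 4],
})
print(missing_summary(example))
```

On the real data, `missing_summary(df)` would produce the same table for the loaded dataset.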

## Feature Distributions

In [None]:
# Select all numeric columns automatically; edit this list to plot a subset
numeric_cols = df.select_dtypes(include=['number']).columns.tolist()
for col in numeric_cols:
    plt.figure(figsize=(6, 4))
    sns.histplot(df[col], bins=30, kde=True)
    plt.title(f'Distribution of {col}')
    plt.xlabel(col)
    plt.ylabel('Frequency')
    plt.tight_layout()
    plt.show()
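
With many numeric columns, it can help to rank them by skewness before scanning every histogram. The following is a rough sketch rather than part of the original analysis: `skew_summary` is a hypothetical helper, and the example columns are invented purely to show the output shape.

```python
import pandas as pd


def skew_summary(frame: pd.DataFrame) -> pd.Series:
    """Return sample skewness of each numeric column, most skewed first."""
    numeric = frame.select_dtypes(include='number')
    # Sort by magnitude so strong left and right skew both rise to the top
    return numeric.skew().sort_values(key=lambda s: s.abs(), ascending=False)


# Illustrative frame with hypothetical column names
example = pd.DataFrame({
    'symmetric': [1, 2, 3, 4, 5],
    'right_skewed': [1, 1, 1, 1, 10],
})
print(skew_summary(example))
```

Columns near the top of `skew_summary(df)` are the ones whose histograms are most likely to show long tails.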

## Correlation Analysis

In [None]:
plt.figure(figsize=(10, 8))
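# Restrict to numeric columns; df.corr() raises on non-numeric data in pandas >= 2.0
corr = df.select_dtypes(include=['number']).corr()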
sns.heatmap(corr, annot=True, fmt='.2f', cmap='coolwarm', square=True)
plt.title('Correlation Matrix')
plt.show()
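
An annotated heatmap becomes hard to read as the number of columns grows, so listing the strongest pairs directly can be a useful complement. The sketch below assumes Pearson correlation (the pandas default). `top_correlations` is a hypothetical helper, and the example columns are illustrative only.

```python
import numpy as np
import pandas as pd


def top_correlations(frame: pd.DataFrame, n: int = 5) -> pd.Series:
    """Return the n column pairs with the largest absolute Pearson correlation."""
    corr = frame.select_dtypes(include='number').corr()
    # Keep the strict upper triangle so each pair appears once, without the diagonal
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    # Rank by magnitude while keeping the sign in the returned values
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(n)


# Illustrative frame with hypothetical column names
example = pd.DataFrame({
    'a': [1, 2, 3, 4],
    'b': [2, 4, 6, 8],
    'c': [1, 0, 1, 0],
    'label': ['w', 'x', 'y', 'z'],
})
print(top_correlations(example, n=3))
```

Applied to the loaded data, `top_correlations(df)` would list the pairs most worth inspecting for redundancy. High correlation alone does not indicate a causal link.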