# Bike Prices EDA Project

This notebook performs exploratory data analysis (EDA) on bike listings scraped from BikesWale.com.

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import warnings
warnings.filterwarnings('ignore')

# Load dataset
df = pd.read_csv("Bikes_Data.csv")
df.head()

## 1. Dataset Overview

In [None]:
df.info()

## 2. Missing Values

In [None]:
df.isnull().sum()
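
Raw counts are easier to judge next to the share of rows they represent. The sketch below is one way to report both; `missing_summary` and the `sample` frame are hypothetical stand-ins and are not part of the original notebook.

```python
import pandas as pd

def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return missing counts and percentages per column, worst first."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(2),
    })
    # Keep only columns that actually have gaps
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)

# Hypothetical sample standing in for df (values are made up)
sample = pd.DataFrame({
    "Brand": ["Honda", "Yamaha", None, "Bajaj"],
    "Mileage": [45.0, None, None, 50.0],
    "Price": [90000.0, 150000.0, 120000.0, 110000.0],
})
report = missing_summary(sample)
print(report)
```

In the notebook, `missing_summary(df)` would be called on the loaded data instead of `sample`.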

## 3. Duplicate Rows

In [None]:
df.duplicated().sum()
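
If the count above is non-zero, the duplicates would typically be dropped before plotting. A minimal sketch, assuming exact full-row duplicates are safe to remove; the `sample` frame is hypothetical.

```python
import pandas as pd

# Hypothetical sample standing in for df (values are made up)
sample = pd.DataFrame({
    "Model": ["A", "B", "A", "C"],
    "Price": [1.0, 2.0, 1.0, 3.0],
})

n_before = len(sample)
# Keep the first occurrence of each fully identical row
deduped = sample.drop_duplicates(keep="first").reset_index(drop=True)
n_removed = n_before - len(deduped)
print(f"Removed {n_removed} duplicate row(s)")
```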

## 4. Data Cleaning - Price Column

In [None]:
# Remove currency symbol and commas from Price
df['Price'] = df['Price'].replace({'₹': '', ',': ''}, regex=True).astype(float)
df.head()
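
The `astype(float)` call above raises an error if any price contains text other than the currency symbol and commas. Whether this dataset has such entries is not known; the sketch below assumes it might and converts them to `NaN` instead. `clean_price` and the sample values are hypothetical.

```python
import pandas as pd

def clean_price(series: pd.Series) -> pd.Series:
    """Strip the rupee symbol, commas, and whitespace, then convert to float.

    Entries that still are not numeric become NaN rather than raising.
    """
    stripped = series.astype(str).str.replace(r"[₹,\s]", "", regex=True)
    return pd.to_numeric(stripped, errors="coerce").astype(float)

# Hypothetical raw values (assumed formats, not taken from the dataset)
raw = pd.Series(["₹ 1,25,000", "₹85,500", "Price on request", None])
prices = clean_price(raw)
print(prices)
```

Checking `prices.isna().sum()` afterwards shows how many entries could not be parsed.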

## 5. Univariate Analysis

In [None]:
numerical_cols = ['CC', 'BHP', 'Mileage', 'Price', 'Weight', 'Rating']
for col in numerical_cols:
    plt.figure(figsize=(6, 4))
    sns.histplot(df[col], kde=True)
    plt.title(f'Distribution of {col}')
    plt.show()

## 6. Bivariate Analysis

In [None]:
# Price vs CC
sns.scatterplot(data=df, x='CC', y='Price')
plt.title('Price vs CC')
plt.show()

# Boxplot: Price by Brand
plt.figure(figsize=(10, 6))
sns.boxplot(data=df, x='Brand', y='Price')
plt.xticks(rotation=45)
plt.title('Price by Brand')
plt.show()
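
With many brands on the x-axis, the boxplot is easier to read when brands are sorted by median price. A minimal sketch of computing that order, using a hypothetical `sample` frame with made-up values:

```python
import pandas as pd

# Hypothetical sample standing in for df (values are made up)
sample = pd.DataFrame({
    "Brand": ["Honda", "Honda", "KTM", "KTM", "Bajaj", "Bajaj"],
    "Price": [90000.0, 110000.0, 250000.0, 310000.0, 80000.0, 140000.0],
})

# Brands from highest to lowest median price
brand_order = (
    sample.groupby("Brand")["Price"]
    .median()
    .sort_values(ascending=False)
    .index.tolist()
)
print(brand_order)
```

The resulting list can be passed to `sns.boxplot` through its `order` parameter.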

## 7. Multivariate Analysis

In [None]:
# Pairplot of numerical columns
sns.pairplot(df[numerical_cols])
plt.show()

# Heatmap of correlations
plt.figure(figsize=(10, 6))
sns.heatmap(df[numerical_cols].corr(), annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()
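
Since price is the variable of interest, it can help to pull the `Price` column out of the correlation matrix and rank the other features by strength of association. A minimal sketch on a hypothetical `sample` frame (values are made up, so the numbers say nothing about the real data):

```python
import pandas as pd

# Hypothetical sample standing in for df[numerical_cols]
sample = pd.DataFrame({
    "CC": [100, 150, 200, 350, 650],
    "Mileage": [70, 55, 45, 35, 25],
    "Price": [70000, 110000, 150000, 220000, 350000],
})

# Pearson correlation of each feature with Price, strongest first
price_corr = (
    sample.corr()["Price"]
    .drop("Price")
    .sort_values(key=abs, ascending=False)
)
print(price_corr)
```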

## 8. Conclusion

This EDA examined how bike prices are distributed and how they relate to engine capacity (CC), the other numerical specifications, and brand. These insights can guide decision-making for buyers and sellers in the second-hand bike market.