# EDA on Bike Details Dataset



This notebook explores a dataset of used-bike listings to analyze the second-hand bike market.
Its columns include `name`, `selling_price`, `year`, `seller_type`, `owner`, `km_driven`, and `ex_showroom_price`.

**Source:** Simulated dataset modeled on listings commonly seen on real-world online bike resale platforms.
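
The cells below assume a pandas DataFrame named `df` already holds the data, but the notebook does not show how it is loaded. The following is only a sketch of that setup, assuming the data is available as CSV: the helper `load_bikes`, the `EXPECTED_COLUMNS` list, and the three sample rows are hypothetical stand-ins, not part of the original dataset.

```python
import io

import pandas as pd

# Columns named in the dataset description above.
EXPECTED_COLUMNS = [
    "name", "selling_price", "year", "seller_type",
    "owner", "km_driven", "ex_showroom_price",
]


def load_bikes(source):
    """Read bike listings from a CSV path or file-like object (hypothetical helper)."""
    bikes = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in bikes.columns]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    return bikes


# Made-up rows standing in for the real file; replace with the actual CSV path.
sample_csv = io.StringIO(
    "name,selling_price,year,seller_type,owner,km_driven,ex_showroom_price\n"
    "Bike A,50000,2018,Individual,1st owner,12000,70000\n"
    "Bike B,30000,2014,Individual,2nd owner,45000,\n"
    "Bike C,90000,2020,Dealer,1st owner,5000,110000\n"
)
df = load_bikes(sample_csv)
```

Checking the column names at load time means a renamed or missing column fails immediately rather than in a later cell.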


### 1. What is the range of selling prices in the dataset?

In [None]:
df['selling_price'].min(), df['selling_price'].max()

### 2. What is the median selling price for bikes in the dataset?

In [None]:
df['selling_price'].median()

### 3. What is the most common seller type?

In [None]:
df['seller_type'].value_counts().idxmax()

### 4. How many bikes have been driven more than 50,000 kilometers?

In [None]:
(df['km_driven'] > 50000).sum()

### 5. What is the average km_driven value for each ownership type?

In [None]:
df.groupby('owner')['km_driven'].mean()

### 6. What proportion of bikes were manufactured in 2015 or earlier?

In [None]:
(df['year'] <= 2015).mean()

### 7. How are missing values distributed across the dataset's columns?

In [None]:
df.isnull().sum()
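
Raw counts are easier to interpret next to the share of rows they represent. A small sketch of that idea, using a made-up `toy` frame in place of `df` (the values are illustrative only):

```python
import numpy as np
import pandas as pd

# Made-up frame standing in for df.
toy = pd.DataFrame({
    "selling_price": [50000, 30000, 90000, 45000],
    "ex_showroom_price": [70000, np.nan, 110000, np.nan],
})

# isnull().mean() is the fraction of missing rows per column.
missing_summary = pd.DataFrame({
    "missing_count": toy.isnull().sum(),
    "missing_pct": toy.isnull().mean() * 100,
}).sort_values("missing_pct", ascending=False)

print(missing_summary)
```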

### 8. What is the highest ex_showroom_price recorded, and for which bike?

In [None]:
# idxmax skips NaN; select the row and both columns in a single .loc call
df.loc[df['ex_showroom_price'].idxmax(), ['name', 'ex_showroom_price']]

### 9. What is the total number of bikes listed by each seller type?

In [None]:
df['seller_type'].value_counts()

### 10. Relationship between selling_price and km_driven for first-owner bikes

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
first_owner_bikes = df[df['owner'] == '1st owner']
sns.scatterplot(data=first_owner_bikes, x='km_driven', y='selling_price')
plt.title('Selling Price vs KM Driven (1st Owner Bikes)')
plt.show()
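
A scatter plot shows the shape of the relationship; a correlation coefficient puts a number on its linear strength. A minimal sketch, assuming the same `'1st owner'` label used above and a made-up `toy` frame in place of `df`:

```python
import pandas as pd

# Made-up rows; the three first-owner bikes lose price linearly with distance.
toy = pd.DataFrame({
    "owner": ["1st owner", "1st owner", "1st owner", "2nd owner"],
    "km_driven": [1000, 2000, 3000, 50000],
    "selling_price": [90000, 80000, 70000, 95000],
})

first_owner_toy = toy[toy["owner"] == "1st owner"]

# Series.corr defaults to the Pearson coefficient.
pearson_r = first_owner_toy["km_driven"].corr(first_owner_toy["selling_price"])
print(round(pearson_r, 3))
```

Pearson's coefficient only captures linear association, so it complements the plot rather than replacing it.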

### 11. Identify and remove outliers in the km_driven column using the IQR method

In [None]:
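# Tukey's rule: keep rows within 1.5 * IQR of the first and third quartiles
Q1 = df['km_driven'].quantile(0.25)
Q3 = df['km_driven'].quantile(0.75)
IQR = Q3 - Q1
filtered_df = df[(df['km_driven'] >= (Q1 - 1.5 * IQR)) & (df['km_driven'] <= (Q3 + 1.5 * IQR))]
# Compare with df.shape to see how many rows were dropped
filtered_df.shape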
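
The same rule can be wrapped in a small function so the bounds and the number of removed rows are visible. This is a sketch under assumptions: `iqr_bounds` is a hypothetical helper and the `toy` values are made up.

```python
import pandas as pd


def iqr_bounds(series, k=1.5):
    """Return the (lower, upper) limits of the IQR rule for a numeric Series."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Made-up readings with one extreme value.
toy = pd.DataFrame({"km_driven": [10000, 12000, 15000, 18000, 20000, 500000]})

lower, upper = iqr_bounds(toy["km_driven"])
# between() is inclusive on both ends, matching the >= / <= filter above.
is_outlier = ~toy["km_driven"].between(lower, upper)
filtered_toy = toy[~is_outlier]
n_removed = int(is_outlier.sum())

print(lower, upper, n_removed)
```

Keeping the boolean mask separate makes it possible to inspect the flagged rows before deciding to drop them.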

### 12. Visualize the relationship between year and selling_price

In [None]:
sns.boxplot(data=df, x='year', y='selling_price')
plt.xticks(rotation=90)
plt.title('Selling Price by Year')
plt.show()

### 13. What is the average depreciation in selling price based on the bike's age?

In [None]:
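# Age is measured against a hard-coded reference year of 2025
df['bike_age'] = 2025 - df['year']
# Depreciation is NaN where ex_showroom_price is missing; mean() skips those rows
df['depreciation'] = df['ex_showroom_price'] - df['selling_price']
df.groupby('bike_age')['depreciation'].mean()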
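
Absolute depreciation favors expensive bikes, so a percentage of the ex-showroom price can be easier to compare across models. A sketch of that variant, with the assumptions that `REFERENCE_YEAR` mirrors the hard-coded 2025 above and that the `toy` rows are made up:

```python
import numpy as np
import pandas as pd

REFERENCE_YEAR = 2025  # same reference year as the cell above

# Made-up rows; one bike has no ex-showroom price.
toy = pd.DataFrame({
    "year": [2020, 2020, 2015, 2015],
    "selling_price": [80000, 60000, 30000, 20000],
    "ex_showroom_price": [100000, 80000, 60000, np.nan],
})

toy["bike_age"] = REFERENCE_YEAR - toy["year"]
toy["depreciation_pct"] = (
    (toy["ex_showroom_price"] - toy["selling_price"])
    / toy["ex_showroom_price"] * 100
)

# count reports how many non-missing values back each mean.
summary = toy.groupby("bike_age")["depreciation_pct"].agg(["mean", "count"])
print(summary)
```

Reporting the count next to the mean shows which age groups rest on only a few bikes with a known ex-showroom price.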

### 14. Bikes priced significantly above the average price for their manufacturing year

In [None]:
avg_price_by_year = df.groupby('year')['selling_price'].mean()
df[df['selling_price'] > df['year'].map(avg_price_by_year) * 1.5][['name', 'year', 'selling_price']]
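
An equivalent way to express the comparison is `groupby(...).transform('mean')`, which returns the per-year mean aligned to each row. The sketch below keeps the 1.5x threshold from the cell above; the `price_to_year_mean` column name and the `toy` rows are hypothetical.

```python
import pandas as pd

# Made-up rows: Bike C is priced at twice its year's average.
toy = pd.DataFrame({
    "name": ["Bike A", "Bike B", "Bike C", "Bike D", "Bike E"],
    "year": [2018, 2018, 2018, 2019, 2019],
    "selling_price": [10000, 10000, 40000, 30000, 30000],
})

# transform('mean') broadcasts each year's mean back onto its rows.
year_mean = toy.groupby("year")["selling_price"].transform("mean")
toy["price_to_year_mean"] = toy["selling_price"] / year_mean

overpriced = toy[toy["price_to_year_mean"] > 1.5]
print(overpriced[["name", "year", "selling_price", "price_to_year_mean"]])
```

Storing the ratio as a column also shows how far above the yearly average each flagged bike sits.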

### 15. Correlation matrix for numeric columns visualized using a heatmap

In [None]:
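# numeric_only=True skips text columns; derived columns such as bike_age and depreciation are included if already added to df
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')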
plt.title('Correlation Heatmap')
plt.show()