# Car Sales Analysis

This project analyzes car sales data to identify patterns in the automobile market.  
The analysis covers data cleaning, exploratory data analysis (EDA), and visualizations of key trends such as popular brands, price distributions, and the effect of factors such as fuel type and transmission.


In [8]:
import pandas as pd

# Load dataset
df = pd.read_csv("car_dekho_cars.csv")

# check first 5 rows
df.head()


                       name  year  selling_price  km_driven    fuel  seller_type  transmission         owner
0             Maruti 800 AC  2007          60000      70000  Petrol   Individual        Manual   First Owner
1  Maruti Wagon R LXI Minor  2007         135000      50000  Petrol   Individual        Manual   First Owner
2      Hyundai Verna 1.6 SX  2012         600000     100000  Diesel   Individual        Manual   First Owner
3    Datsun RediGO T Option  2017         250000      46000  Petrol   Individual        Manual   First Owner
4     Honda Amaze VX i-DTEC  2014         450000     141000  Diesel   Individual        Manual  Second Owner


In [25]:
# Checking info & size & types
df.info()

# Count fully duplicated rows
print("\nDuplicated rows:", df.duplicated().sum())

# Count missing (NaN) values per column
print("\nMissing values per column:")
df.isnull().sum()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4340 entries, 0 to 4339
Data columns (total 8 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   name           4340 non-null   object
 1   year           4340 non-null   int64 
 2   selling_price  4340 non-null   int64 
 3   km_driven      4340 non-null   int64 
 4   fuel           4340 non-null   object
 5   seller_type    4340 non-null   object
 6   transmission   4340 non-null   object
 7   owner          4340 non-null   object
dtypes: int64(3), object(5)
memory usage: 271.4+ KB

Duplicated rows: 763

Missing values per column:


name             0
year             0
selling_price    0
km_driven        0
fuel             0
seller_type      0
transmission     0
owner            0
dtype: int64

In [28]:
# removing duplicates
df = df.drop_duplicates()

print("Duplicates after cleaning:", df.duplicated().sum())
print("New shape:", df.shape)


Duplicates after cleaning: 0
New shape: (3577, 8)
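
With default arguments, `drop_duplicates()` keeps the first occurrence of each fully identical row and drops the later repeats, which is consistent with the counts above: 4340 - 763 = 3577 rows. A minimal sketch on a small made-up DataFrame (the `toy` data is illustrative and not taken from the dataset) shows how the two calls relate:

```python
import pandas as pd

# Toy data for illustration only; these values are made up.
toy = pd.DataFrame({
    "name": ["A", "A", "B", "A"],
    "selling_price": [100, 100, 200, 100],
})

# duplicated() flags every repeat after the first occurrence (keep="first" is the default)
n_dupes = int(toy.duplicated().sum())

# drop_duplicates() keeps the first occurrence of each identical row
deduped = toy.drop_duplicates()

print("Duplicated rows:", n_dupes)
print("Rows before:", len(toy), "after:", len(deduped))
```

Rows are compared across all columns here. Passing `subset=[...]` would restrict the comparison to selected columns, which may or may not be appropriate for listings that differ only in minor details.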


In [43]:
# Number of unique car names
print("Unique car names:", df['name'].nunique())

# Ten most common car names
print("\n", df['name'].value_counts().head(10))

# Create a new column "brand" by taking the first word of 'name'
df['brand'] = df['name'].str.split().str[0]

# Ten most common brands
print("\n", df['brand'].value_counts().head(10))


Unique car names: 1491

 name
Maruti Swift Dzire VDI      54
Maruti Alto 800 LXI         48
Maruti Alto LXi             42
Maruti Alto LX              30
Maruti Swift VDI BSIV       28
Hyundai EON Era Plus        28
Maruti Wagon R VXI BS IV    26
Maruti Swift VDI            23
Maruti Wagon R LXI Minor    21
Hyundai Santro Xing GLS     20
Name: count, dtype: int64

 brand
Maruti        1072
Hyundai        637
Mahindra       328
Tata           308
Ford           220
Honda          216
Toyota         170
Chevrolet      151
Renault        110
Volkswagen      93
Name: count, dtype: int64
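
Taking the first word of `name` works for single-word brands such as Maruti or Hyundai, but it would truncate any multi-word brand (for example, "Land Rover" would become "Land"). Whether such names occur in this dataset has not been checked here. The sketch below shows one possible way to handle them; the `MULTI_WORD_BRANDS` tuple, the `extract_brand` helper, and the "Land Rover Discovery Sport" sample name are all hypothetical and would need adjusting after inspecting `df['brand'].unique()`.

```python
import pandas as pd

# Hypothetical list of multi-word brands; extend it after inspecting the data.
MULTI_WORD_BRANDS = ("Land Rover", "Aston Martin")

def extract_brand(name: str) -> str:
    """Return a known multi-word brand if the name starts with one, else the first word."""
    for brand in MULTI_WORD_BRANDS:
        if name == brand or name.startswith(brand + " "):
            return brand
    # Assumes the name is non-empty, as in a column with no missing values
    return name.split()[0]

# Sample names: two from the head() output, one made up for illustration
names = pd.Series([
    "Maruti 800 AC",
    "Land Rover Discovery Sport",
    "Hyundai Verna 1.6 SX",
])
brands = names.map(extract_brand)
print(brands.tolist())
```

In the notebook this could replace the split-based assignment with `df['brand'] = df['name'].map(extract_brand)`, assuming `name` contains no empty strings.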


In [53]:
import numpy as np

# Display floats without scientific notation
# (note: this also rounds every displayed float to 0 decimal places)
pd.set_option('display.float_format', '{:.0f}'.format)

# summary stats
print(df['selling_price'].describe())


count      3577
mean     473913
std      509302
min       20000
25%      200000
50%      350000
75%      600000
max     8900000
Name: selling_price, dtype: float64
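
The mean (473913) sits well above the median (350000), and the maximum (8900000) is far above the 75th percentile (600000), which suggests a right-skewed price distribution with a few very expensive cars. One common way to quantify "unusually expensive" is Tukey's 1.5 × IQR rule. This is an illustrative technique rather than a step taken in the notebook; the `iqr_fences` helper is hypothetical and uses the quartiles printed above:

```python
def iqr_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) outlier fences using Tukey's k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles taken from the describe() output above
lower, upper = iqr_fences(200_000, 600_000)
print(f"lower fence: {lower:.0f}, upper fence: {upper:.0f}")
# prints: lower fence: -400000, upper fence: 1200000
```

Under this rule, any car priced above 1,200,000 would be flagged as a potential outlier. The lower fence is negative, so it flags nothing for a price column.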
