# Cars Dataset Analysis

This notebook analyzes vehicle specification data, including:
- Make and model information
- Engine specifications
- Performance metrics
- Origin and weight characteristics

In [16]:
import pandas as pd

# Load the dataset; report the problem and re-raise so later cells never run without df
try:
    df = pd.read_csv("Cars_Dataset.csv")
    print(f"Dataset loaded successfully with {df.shape[0]} rows and {df.shape[1]} columns")
except FileNotFoundError:
    print("Error: File not found. Please check the file path.")
    raise
except Exception as e:
    print(f"An error occurred while loading the dataset: {e}")
    raise

Dataset loaded successfully with 432 rows and 15 columns


## Initial Data Exploration

In [17]:
print("\n=== First 5 Rows ===")
display(df.head())


=== First 5 Rows ===


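|   | Make | Model | Type | Origin | DriveTrain | MSRP | Invoice | EngineSize | Cylinders | Horsepower | MPG_City | MPG_Highway | Weight | Wheelbase | Length |
|---|------|-------|------|--------|------------|------|---------|------------|-----------|------------|----------|-------------|--------|-----------|--------|
| 0 | Acura | MDX | SUV | Asia | All | $36,945 | $33,337 | 3.5 | 6.0 | 265.0 | 17.0 | 23.0 | 4451.0 | 106.0 | 189.0 |
| 1 | Acura | RSX Type S 2dr | Sedan | Asia | Front | $23,820 | $21,761 | 2.0 | 4.0 | 200.0 | 24.0 | 31.0 | 2778.0 | 101.0 | 172.0 |
| 2 | Acura | TSX 4dr | Sedan | Asia | Front | $26,990 | $24,647 | 2.4 | 4.0 | 200.0 | 22.0 | 29.0 | 3230.0 | 105.0 | 183.0 |
| 3 | Acura | TL 4dr | Sedan | Asia | Front | $33,195 | $30,299 | 3.2 | 6.0 | 270.0 | 20.0 | 28.0 | 3575.0 | 108.0 | 186.0 |
| 4 | Acura | 3.5 RL 4dr | Sedan | Asia | Front | $43,755 | $39,014 | 3.5 | 6.0 | 225.0 | 18.0 | 24.0 | 3880.0 | 115.0 | 197.0 |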
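
The preview shows `MSRP` and `Invoice` stored as text such as `$36,945`, so they would not take part in numeric operations as they stand. A minimal sketch of one way to convert them, using a small hand-typed sample of the rows above; the helper name `currency_to_number` and the `_num` column suffix are assumptions made for illustration:

```python
import pandas as pd

# Small sample copied from the preview rows above
sample = pd.DataFrame({
    "Make": ["Acura", "Acura"],
    "MSRP": ["$36,945", "$23,820"],
    "Invoice": ["$33,337", "$21,761"],
})

def currency_to_number(series: pd.Series) -> pd.Series:
    """Strip '$' and ',' and convert to a numeric dtype (hypothetical helper)."""
    cleaned = series.str.replace(r"[$,]", "", regex=True)
    # errors="coerce" turns anything unparseable into NaN instead of raising
    return pd.to_numeric(cleaned, errors="coerce")

for col in ["MSRP", "Invoice"]:
    sample[col + "_num"] = currency_to_number(sample[col])

print(sample[["MSRP", "MSRP_num", "Invoice", "Invoice_num"]])
```

Writing to new columns keeps the original strings available for comparison.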


In [18]:
print("\n=== Dataset Shape ===")
print(f"Total vehicles: {df.shape[0]}")
print(f"Features per vehicle: {df.shape[1]}")


=== Dataset Shape ===
Total vehicles: 432
Features per vehicle: 15


In [19]:
print("\n=== Column Information ===")
print(df.columns.tolist())


=== Column Information ===
['Make', 'Model', 'Type', 'Origin', 'DriveTrain', 'MSRP', 'Invoice', 'EngineSize', 'Cylinders', 'Horsepower', 'MPG_City', 'MPG_Highway', 'Weight', 'Wheelbase', 'Length']


In [20]:
print("\n=== Missing Values Summary ===")
print(df.isnull().sum())


=== Missing Values Summary ===
Make           4
Model          4
Type           4
Origin         4
DriveTrain     4
MSRP           4
Invoice        4
EngineSize     4
Cylinders      6
Horsepower     4
MPG_City       4
MPG_Highway    4
Weight         4
Wheelbase      4
Length         4
dtype: int64
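
Every column reports at least 4 missing values, which would be consistent with 4 rows that are empty across the board, although the summary alone cannot confirm that. A rough sketch of how this could be checked and handled, on a hypothetical three-row sample rather than the real data:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the pattern above: one fully empty row,
# plus one row that is missing only Cylinders
sample = pd.DataFrame({
    "Make": ["Acura", None, "Mazda"],
    "Cylinders": [6.0, np.nan, np.nan],
    "Horsepower": [265.0, np.nan, 238.0],
})

# Boolean mask of rows in which every column is missing
empty_rows = sample.isnull().all(axis=1)
print(f"Fully empty rows: {empty_rows.sum()}")

# Drop only those rows; partially filled rows are kept
trimmed = sample.dropna(how="all")
print(trimmed.isnull().sum())
```

If the real dataset does contain fully empty rows, dropping them before imputation avoids creating vehicles that consist only of column means.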


## Data Cleaning

In [21]:
# Work on a copy so the original df stays unchanged during cleaning
df_clean = df.copy()

In [22]:
print("\n=== Before Cleaning ===")
print("Missing values per column:")
print(df_clean.isnull().sum())


=== Before Cleaning ===
Missing values per column:
Make           4
Model          4
Type           4
Origin         4
DriveTrain     4
MSRP           4
Invoice        4
EngineSize     4
Cylinders      6
Horsepower     4
MPG_City       4
MPG_Highway    4
Weight         4
Wheelbase      4
Length         4
dtype: int64


In [23]:
# Fill numeric columns with their mean
numeric_cols = df_clean.select_dtypes(include=['number']).columns
for col in numeric_cols:
    n_missing = df_clean[col].isnull().sum()  # count the gaps before filling them
    if n_missing > 0:
        col_mean = df_clean[col].mean()
        # Assign the result back rather than calling fillna(inplace=True) on a column selection
        df_clean[col] = df_clean[col].fillna(col_mean)
        print(f"Filled {n_missing} missing values in {col} with mean {col_mean:.2f}")

Filled 4 missing values in EngineSize with mean 3.20
Filled 6 missing values in Cylinders with mean 5.81
Filled 4 missing values in Horsepower with mean 215.89
Filled 4 missing values in MPG_City with mean 20.06
Filled 4 missing values in MPG_Highway with mean 26.84
Filled 4 missing values in Weight with mean 3577.95
Filled 4 missing values in Wheelbase with mean 108.15
Filled 4 missing values in Length with mean 186.36


In [24]:
print("\n=== After Cleaning ===")
print("Missing values per column:")
print(df_clean.isnull().sum())


=== After Cleaning ===
Missing values per column:
Make           4
Model          4
Type           4
Origin         4
DriveTrain     4
MSRP           4
Invoice        4
EngineSize     0
Cylinders      0
Horsepower     0
MPG_City       0
MPG_Highway    0
Weight         0
Wheelbase      0
Length         0
dtype: int64
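
Mean imputation filled `Cylinders` with 5.81, which is not a cylinder count any real engine has. For discrete columns like this, the median is one possible alternative. The comparison below is a sketch on made-up values, not a statement about what suits this dataset best:

```python
import numpy as np
import pandas as pd

# Hypothetical cylinder counts with two gaps at the end
cyl = pd.Series([4.0, 4.0, 6.0, 6.0, 6.0, 8.0, np.nan, np.nan], name="Cylinders")

# Both statistics ignore NaN by default
mean_fill = cyl.fillna(cyl.mean())
median_fill = cyl.fillna(cyl.median())

print(f"Value used by mean fill:   {cyl.mean():.2f}")
print(f"Value used by median fill: {cyl.median():.1f}")
print(median_fill.tolist())
```

In this sample the median fill lands on a value that already occurs in the column, while the mean fill introduces a fractional count.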


## Data Analysis

In [25]:
# Manufacturer analysis
print("\n=== Vehicle Manufacturers ===")
make_counts = df['Make'].value_counts()
print(f"Number of unique manufacturers: {make_counts.shape[0]}")
print("\nTop manufacturers:")
display(make_counts.head(10))


=== Vehicle Manufacturers ===
Number of unique manufacturers: 38

Top manufacturers:


Make
Toyota           28
Chevrolet        27
Mercedes-Benz    26
Ford             23
BMW              20
Audi             19
Honda            17
Nissan           17
Volkswagen       15
Chrysler         15
Name: count, dtype: int64

In [26]:
# Regional analysis
print("\n=== Vehicles by Region ===")
origin_counts = df['Origin'].value_counts()
print("Vehicles per region:")
display(origin_counts)


=== Vehicles by Region ===
Vehicles per region:


Origin
Asia      158
USA       147
Europe    123
Name: count, dtype: int64
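
The three counts add up to 158 + 147 + 123 = 428, four fewer than the 432 rows, which matches the 4 missing `Origin` values: `value_counts()` leaves out missing entries by default. A short sketch on a hypothetical sample showing how to include them and how to get shares instead of counts:

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing origin
origin = pd.Series(["Asia", "Asia", "USA", "Europe", np.nan], name="Origin")

# dropna=False adds a row for the missing entries
with_missing = origin.value_counts(dropna=False)
print(with_missing)

# normalize=True returns fractions of the non-missing entries
shares = origin.value_counts(normalize=True)
print((shares * 100).round(1))
```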

In [27]:
# Filtering examples
print("\n=== Asian and European Vehicles ===")
asia_europe = df[df['Origin'].isin(['Asia', 'Europe'])]
print(f"Found {asia_europe.shape[0]} vehicles from Asia or Europe")
display(asia_europe.head())


=== Asian and European Vehicles ===
Found 281 vehicles from Asia or Europe


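|   | Make | Model | Type | Origin | DriveTrain | MSRP | Invoice | EngineSize | Cylinders | Horsepower | MPG_City | MPG_Highway | Weight | Wheelbase | Length |
|---|------|-------|------|--------|------------|------|---------|------------|-----------|------------|----------|-------------|--------|-----------|--------|
| 0 | Acura | MDX | SUV | Asia | All | $36,945 | $33,337 | 3.5 | 6.0 | 265.0 | 17.0 | 23.0 | 4451.0 | 106.0 | 189.0 |
| 1 | Acura | RSX Type S 2dr | Sedan | Asia | Front | $23,820 | $21,761 | 2.0 | 4.0 | 200.0 | 24.0 | 31.0 | 2778.0 | 101.0 | 172.0 |
| 2 | Acura | TSX 4dr | Sedan | Asia | Front | $26,990 | $24,647 | 2.4 | 4.0 | 200.0 | 22.0 | 29.0 | 3230.0 | 105.0 | 183.0 |
| 3 | Acura | TL 4dr | Sedan | Asia | Front | $33,195 | $30,299 | 3.2 | 6.0 | 270.0 | 20.0 | 28.0 | 3575.0 | 108.0 | 186.0 |
| 4 | Acura | 3.5 RL 4dr | Sedan | Asia | Front | $43,755 | $39,014 | 3.5 | 6.0 | 225.0 | 18.0 | 24.0 | 3880.0 | 115.0 | 197.0 |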
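
The `isin` filter can be combined with further conditions. As an illustrative sketch on a few of the rows shown above (the 250 horsepower threshold is an arbitrary assumption), the same selection can be written with a boolean mask or with `DataFrame.query`:

```python
import pandas as pd

# Three rows copied from the preview above, reduced to a few columns
sample = pd.DataFrame({
    "Make": ["Acura", "Acura", "Acura"],
    "Model": ["MDX", "TSX 4dr", "TL 4dr"],
    "Origin": ["Asia", "Asia", "Asia"],
    "Horsepower": [265.0, 200.0, 270.0],
})

# Each condition needs its own parentheses when combined with & or |
mask = sample["Origin"].isin(["Asia", "Europe"]) & (sample["Horsepower"] > 250)
strong = sample[mask]

# Equivalent selection expressed as a query string
same = sample.query("Origin in ['Asia', 'Europe'] and Horsepower > 250")

print(strong)
```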


In [28]:
# Weight analysis
print("\n=== Weight Distribution ===")
heavy_cars = df[df['Weight'] > 4000]
print(f"Found {heavy_cars.shape[0]} vehicles weighing >4000 units")
print("Removing these heavy vehicles...")
light_cars = df[df['Weight'] <= 4000]
print(f"New dataset has {light_cars.shape[0]} vehicles")


=== Weight Distribution ===
Found 103 vehicles weighing >4000 units
Removing these heavy vehicles...
New dataset has 325 vehicles
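
The two groups sum to 103 + 325 = 428 rather than 432. The filters run on `df`, where `Weight` still has 4 missing values, and a comparison against `NaN` is `False` in both directions, so those rows fall into neither group. A small sketch with hypothetical values makes this visible:

```python
import numpy as np
import pandas as pd

# Three weights from the preview plus one missing value
weights = pd.DataFrame({"Weight": [4451.0, 2778.0, 3230.0, np.nan]})

heavy = weights[weights["Weight"] > 4000]
light = weights[weights["Weight"] <= 4000]
unknown = weights[weights["Weight"].isnull()]  # rows matched by neither filter

print(f"heavy={len(heavy)}, light={len(light)}, unknown={len(unknown)}")
```

Counting the missing rows separately accounts for every row in the frame.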


In [29]:
# MPG adjustment
print("\n=== Adjusting City MPG ===")
print("Increasing all MPG_City values by 3...")
df['MPG_City_Adjusted'] = df['MPG_City'] + 3
print("\nBefore adjustment:")
print(f"Average MPG City: {df['MPG_City'].mean():.2f}")
print("\nAfter adjustment:")
print(f"Average MPG City: {df['MPG_City_Adjusted'].mean():.2f}")


=== Adjusting City MPG ===
Increasing all MPG_City values by 3...

Before adjustment:
Average MPG City: 20.06

After adjustment:
Average MPG City: 23.06
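
Adding a constant to every value shifts the mean by that same constant, which is why the average moves from 20.06 to exactly 23.06. The sketch below reproduces this on the five `MPG_City` values from the preview and uses `DataFrame.assign`, one possible way to add the column without modifying the original frame:

```python
import pandas as pd

# MPG_City values of the first five rows shown earlier
sample = pd.DataFrame({"MPG_City": [17.0, 24.0, 22.0, 20.0, 18.0]})

# assign returns a new DataFrame; sample itself is left untouched
adjusted = sample.assign(MPG_City_Adjusted=sample["MPG_City"] + 3)

shift = adjusted["MPG_City_Adjusted"].mean() - adjusted["MPG_City"].mean()
print(f"Mean shift after adjustment: {shift:.2f}")
```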
