# **Project Name**    - Automobile Data Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Name** - Sitesh Mishra

# **Project Summary -**

This project focuses on the detailed analysis of an automobile dataset using Python’s core data manipulation and visualization libraries, including Pandas, NumPy, Matplotlib, and Seaborn. The dataset consists of various automobile specifications such as price, body style, engine type, horsepower, fuel efficiency, and other key metrics. The primary objective of this project is to explore relationships between different features and derive meaningful insights that can guide decision-making for both automobile manufacturers and consumers.

The first step involved understanding the structure of the dataset, identifying the types of variables, and checking for inconsistencies. The dataset required a series of data cleaning processes, as it contained missing values and improperly formatted data entries. Special characters such as '?' were replaced or removed, and numerical columns that were incorrectly labeled as objects were converted to appropriate numeric formats. This ensured the data was consistent and ready for analysis.

Following the cleaning phase, exploratory data analysis (EDA) was performed to uncover patterns and trends. Various types of charts such as bar graphs, histograms, scatter plots, and correlation matrices were used to visualize the data. For instance, it was found that variables like engine size, horsepower, and curb weight had a strong positive correlation with car price. This insight indicates that higher-spec vehicles tend to be more expensive, which aligns with general market trends. The analysis also revealed that body styles like hatchbacks and sedans are the most common, reflecting their affordability and popularity.

One of the significant observations from the project was the variation in car prices based on manufacturers. Brands like BMW, Audi, and Mercedes-Benz dominate the higher price range, while companies like Toyota and Honda offer more budget-friendly options. Additionally, diesel vehicles, though fewer in number, often provide better fuel economy compared to their gasoline counterparts. These patterns can help businesses focus their marketing efforts and product development strategies more effectively.

In conclusion, this project demonstrated how powerful Python tools can be used to perform structured, insightful analysis on real-world data. By cleaning and visualizing the automobile dataset, we were able to extract useful trends that not only explain market behavior but also support informed business and consumer decisions. The project serves as a practical example of how data analysis can turn raw data into valuable strategic intelligence.



# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


The automobile industry generates complex data on vehicle specifications, pricing, and performance, but raw datasets often contain missing values and inconsistencies. This project aims to clean and analyze an automobile dataset to discover how features like engine size, horsepower, fuel type, and body style impact vehicle pricing and efficiency. The goal is to extract insights that help manufacturers, marketers, and consumers make informed decisions.

#### **Define Your Business Objective?**

The goal of this project is to analyze automobile data to uncover key factors influencing vehicle pricing, performance, and fuel efficiency. By identifying trends and relationships between features like engine size, horsepower, fuel type, and brand, the objective is to help car manufacturers and dealers make smarter product, pricing, and marketing decisions — ultimately boosting customer satisfaction and business profitability.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure each chart answers the following questions.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


5. You have to create at least 20 logical & meaningful charts having important insights.

[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]







# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Loading

In [None]:
# Mount Google Drive first so the dataset path below exists (Colab only)
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Load Dataset
df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/Datasets/automobile_data.csv')

### Dataset First View

In [None]:
# Dataset First Look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
# Note: '?' placeholders are still ordinary strings here, so they are not counted
missing_values = df.isnull().sum()
missing_values

In [None]:
# Visualizing the missing values
missing_values = df.isnull().sum()
missing_values = missing_values[missing_values > 0]

# Plot only if at least one column has missing values
if not missing_values.empty:
    missing_values.plot(kind='bar', color='tomato')
    plt.title('Missing Values per Column')
    plt.ylabel('Count')
    plt.show()
else:
    print("✅ No missing values to visualize!")


### What did you know about your dataset?

The dataset contains detailed information about various automobiles, including attributes like make, fuel type, body style, engine size, horsepower, mileage (city and highway), and price. It includes both numerical and categorical data, with some missing values and inconsistent entries that require cleaning. Key features such as engine-size, curb-weight, and horsepower show strong influence on price, while categorical variables like fuel-type and body-style help in understanding market preferences. Overall, the dataset provides a solid base for analyzing vehicle performance, pricing trends, and consumer behavior in the automobile industry.
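Because the missing entries are stored as the string '?', `isnull()` does not report them before the wrangling step. A minimal sketch of how the placeholders could be counted directly is shown below; the `sample` frame and its values are hypothetical stand-ins for the real dataset.

```python
import pandas as pd

# Hypothetical miniature frame standing in for the real dataset
sample = pd.DataFrame({
    "make": ["audi", "bmw", "toyota", "honda"],
    "horsepower": ["102", "?", "62", "76"],
    "price": ["13950", "16430", "?", "?"],
})

# isnull() finds nothing because '?' is an ordinary string
null_counts = sample.isnull().sum()

# Count the '?' placeholders column by column instead
placeholder_counts = (sample == "?").sum()

print(null_counts.to_dict())
print(placeholder_counts.to_dict())
```

Running the same comparison on `df` before the replacement step would show which columns actually need cleaning.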


## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

### Variables Description

The dataset includes a mix of **numerical and categorical variables** that describe various aspects of automobiles. Key variables include make (manufacturer brand), fuel-type (gas or diesel), aspiration(standard or turbo), body-style (like sedan, hatchback), and drive-wheels. Performance-related features like engine-size, horsepower, curb-weight, and number-of-cylinders help assess power and efficiency. It also contains mileage data such as city-mpg and highway-mpg, and finally, price, which serves as the target variable for most analyses. Some columns may contain missing values or inconsistent formatting that need to be cleaned before analysis.
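Some numerical variables may be read in as text when they contain a placeholder. The sketch below shows one way to list such columns; `numeric_like_columns` is a hypothetical helper and the sample values are made up for illustration.

```python
import pandas as pd

def numeric_like_columns(frame, placeholder="?"):
    """Return non-numeric columns whose values all parse as numbers
    once the placeholder entries are ignored."""
    found = []
    for col in frame.columns:
        if pd.api.types.is_numeric_dtype(frame[col]):
            continue  # already numeric, nothing to convert
        values = frame[col][frame[col] != placeholder]
        parsed = pd.to_numeric(values, errors="coerce")
        if len(values) > 0 and parsed.notna().all():
            found.append(col)
    return found

# Hypothetical sample: 'horsepower' is text only because of the '?'
sample = pd.DataFrame({
    "make": ["audi", "bmw", "toyota"],
    "horsepower": ["102", "?", "62"],
    "city-mpg": [24, 23, 35],
})

candidates = numeric_like_columns(sample)
print(candidates)
```

Columns returned by such a check are candidates for `pd.to_numeric()` in the wrangling step.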



### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
df.nunique()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.

# Replace '?' with NaN
df.replace('?', pd.NA, inplace=True)

# Convert important columns to numeric
for col in ['price', 'horsepower', 'peak-rpm', 'bore', 'stroke']:
    df[col] = pd.to_numeric(df[col], errors='coerce')

# Drop rows where price is missing (target variable)
df.dropna(subset=['price'], inplace=True)

# Fill missing numeric values with mean
df.fillna(df.mean(numeric_only=True), inplace=True)

# Fill missing categorical values with mode
df.fillna(df.mode().iloc[0], inplace=True)

# Reset index
df.reset_index(drop=True, inplace=True)


### What all manipulations have you done and insights you found?

(A) Manipulations Done:

*   **Replaced '?' with NaN:** standardizes the missing-value marker so pandas can detect and handle it.
*   **Converted columns to numeric:** price, horsepower, peak-rpm, bore, and stroke were converted with pd.to_numeric() so that calculations work correctly.
*   **Dropped rows with a missing target (price):** rows without the target value cannot be used in the price analysis, so they were removed.
*   **Filled missing numeric values with the mean:** keeps the remaining rows and leaves each column's mean unchanged, although it slightly understates the spread.
*   **Filled missing categorical values with the mode:** replaces missing categories with the most frequent value in each column.
*   **Reset the index:** restores a continuous index after the row deletions.
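The same steps can be sketched as a single function that returns a cleaned copy instead of modifying the frame in place. This is only an illustrative variant: `clean_frame` is a hypothetical name, it uses `np.nan` as the missing-value marker, and the `raw` frame is made-up data.

```python
import numpy as np
import pandas as pd

def clean_frame(raw, numeric_cols, target="price"):
    """Apply the cleaning steps listed above and return a new frame."""
    df = raw.replace("?", np.nan)                      # standardize missing marker
    for col in numeric_cols:
        df[col] = pd.to_numeric(df[col], errors="coerce")
    df = df.dropna(subset=[target])                    # target must be present
    df = df.fillna(df.mean(numeric_only=True))         # numeric gaps -> column mean
    df = df.fillna(df.mode().iloc[0])                  # remaining gaps -> column mode
    return df.reset_index(drop=True)

# Hypothetical raw data with '?' placeholders
raw = pd.DataFrame({
    "num-of-doors": ["four", "?", "two", "four"],
    "horsepower": ["100", "?", "60", "80"],
    "price": ["10000", "20000", "?", "6000"],
})

cleaned = clean_frame(raw, numeric_cols=["horsepower", "price"])
print(cleaned)
```

Returning a copy makes it easy to compare the raw and cleaned frames side by side.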

(B) Insights Found:

Engine size, horsepower, and curb weight have a strong positive correlation with price, meaning more powerful cars are more expensive.

Diesel cars are fewer in number but may offer better fuel economy.

Luxury brands like BMW, Audi, and Mercedes consistently fall in the higher price range.

Hatchbacks and sedans dominate the dataset, suggesting they are the most popular or affordable types.

Manual data cleaning was crucial due to missing and inconsistent entries, especially in numeric fields stored as strings.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
# Top 10 car makes by average price
avg_price_by_make = df.groupby('make')['price'].mean().sort_values(ascending=False).head(10)

avg_price_by_make.plot(kind='bar', color='coral')
plt.title('Top 10 Car Brands by Average Price')
plt.xlabel('Car Make')
plt.ylabel('Average Price')
plt.xticks(rotation=45)
plt.show()


##### 1. Why did you pick the specific chart?

I picked this bar chart because I wanted to compare the average car price across different manufacturers. It helps show which brands sell premium vehicles and which ones are more affordable. This kind of comparison is useful for understanding how each company is positioned in the market. It's a simple but powerful way to spot trends and make business decisions based on brand value.



##### 2. What is/are the insight(s) found from the chart?

The chart ranks the ten car makes with the highest average price. Luxury brands such as BMW, Audi, and Mercedes-Benz fall in the higher price range, while makes like Toyota and Honda offer more budget-friendly options. The gap between the top and bottom of the ranking shows how strongly brand positioning is tied to price.
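An average based on only a few cars can be misleading, so it may help to show the number of cars per make next to the mean. The following is a small sketch on hypothetical data, not the project's actual figures.

```python
import pandas as pd

# Hypothetical prices; the real values come from the cleaned dataset
cars = pd.DataFrame({
    "make": ["bmw", "bmw", "audi", "toyota", "toyota", "toyota"],
    "price": [30000.0, 40000.0, 20000.0, 8000.0, 9000.0, 10000.0],
})

# Mean price and number of cars per make, most expensive first
summary = (
    cars.groupby("make")["price"]
    .agg(["mean", "count"])
    .sort_values("mean", ascending=False)
)
print(summary)
```

A make with a high mean but a very low count deserves a cautious interpretation.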

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights from the chart can definitely help create a positive business impact.
By identifying brands with the highest average prices, companies can better understand their market position and adjust their strategies — for example, luxury brands can justify premium pricing by focusing on performance and branding, while budget brands can compete through affordability and fuel efficiency. This helps in targeted marketing, product planning, and customer segmentation.

As for negative growth, one possible risk is overpricing by lower-tier brands trying to imitate luxury brands without offering similar quality or features. This mismatch can lead to poor sales and customer dissatisfaction. So, the key is to align price with value and customer expectations.



#### Chart - 2

In [None]:
# Chart - 2 visualization code
avg_price_by_fuel = df.groupby('fuel-type')['price'].mean()

avg_price_by_fuel.plot(kind='bar', color='seagreen')
plt.title('Average Car Price by Fuel Type')
plt.xlabel('Fuel Type')
plt.ylabel('Average Price')
plt.show()


##### 1. Why did you pick the specific chart?

I picked this chart because it shows the average price based on fuel type, which is an important factor in a customer's buying decision. It helps compare whether diesel or gas (petrol) vehicles tend to cost more, and gives insight into how fuel type influences a car’s market value. This kind of chart is easy to interpret and useful for both business strategy and consumer awareness.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that diesel cars have a higher average price than gas (petrol) cars. This suggests that diesel vehicles are usually positioned in the premium segment, likely because they offer better mileage and are often found in larger or performance-oriented models.
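Since diesel cars are fewer in number, the price comparison is easier to judge when mileage and group size are shown alongside it. A minimal sketch with hypothetical values:

```python
import pandas as pd

# Hypothetical rows; column names follow the dataset's naming style
cars = pd.DataFrame({
    "fuel-type": ["gas", "gas", "gas", "diesel"],
    "price": [8000.0, 10000.0, 12000.0, 18000.0],
    "city-mpg": [30, 26, 28, 36],
})

# Average price, average city mileage, and group size per fuel type
by_fuel = cars.groupby("fuel-type").agg(
    avg_price=("price", "mean"),
    avg_city_mpg=("city-mpg", "mean"),
    n_cars=("price", "size"),
)
print(by_fuel)
```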

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights can definitely help create a positive business impact.
By identifying that diesel cars generally have a higher average price, companies can use this data to better position their diesel variants in the premium market. It helps in strategic pricing, feature bundling, and targeting customers who prioritize fuel efficiency and performance.

As for negative growth, if a brand overprices petrol variants or underestimates the demand for budget-friendly options, it could lose market share. Misreading fuel preferences or market trends might lead to reduced customer engagement and poor sales.

#### Chart - 3

In [None]:
# Chart - 3 visualization code
plt.hist(df['price'], bins=20, color='skyblue', edgecolor='black')
plt.title('Distribution of Car Prices')
plt.xlabel('Price')
plt.ylabel('Number of Cars')
plt.show()


##### 1. Why did you pick the specific chart?

I picked this histogram because it clearly shows how car prices are distributed across the dataset. It helps identify whether most cars are in the low, mid, or high price range. This is important to understand market segments, detect pricing outliers, and guide manufacturers on where most customer demand lies.

##### 2. What is/are the insight(s) found from the chart?

The histogram shows that most cars in the dataset fall within a lower price range, with fewer high-priced cars. This suggests that the market is dominated by affordable vehicles, while luxury cars make up a smaller portion. It highlights the popularity of budget and mid-range models among consumers.
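A right-skewed distribution like this can also be checked numerically: the mean sits above the median and the skewness is positive. The prices below are hypothetical and only illustrate the check.

```python
import pandas as pd

# Hypothetical prices with one expensive outlier
prices = pd.Series([6000, 7000, 7500, 8000, 9000, 10000, 35000])

mean_price = prices.mean()
median_price = prices.median()
skewness = prices.skew()  # positive values indicate a long right tail

print(round(mean_price, 1), median_price, round(skewness, 2))
```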

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights from the price distribution chart can support a positive business impact.
Understanding that most cars fall into the lower to mid-price range helps manufacturers focus on high-demand, budget-friendly segments. It also allows them to competitively price new models and invest more in popular configurations, boosting sales and customer satisfaction.

As for negative growth, if companies ignore this trend and overproduce high-priced or luxury models, they risk unsold inventory and a limited customer base. Misaligning pricing strategy with consumer demand can directly hurt revenue and brand trust.

#### Chart - 4

In [None]:
# Chart - 4 visualization code
# Grouping by both drive-wheels and body-style, then averaging price
grouped = df.groupby(['drive-wheels', 'body-style'])['price'].mean().unstack()

# Plotting grouped bar chart
grouped.plot(kind='bar', figsize=(10,6), colormap='viridis')
plt.title('Average Price by Drive-Wheel Type and Body Style')
plt.xlabel('Drive Wheel Type')
plt.ylabel('Average Price')
plt.legend(title='Body Style')
plt.xticks(rotation=0)
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

I picked this grouped bar chart because it helps analyze two important factors — drive-wheel type and body style — and how they together impact the average car price. This multi-variable comparison gives a deeper understanding of which combinations are considered more valuable in the market. It’s more insightful than a simple single-variable chart, especially for making design or pricing decisions.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that rear-wheel drive (RWD) sedans have the highest average price, while front-wheel drive (FWD) hatchbacks are among the lowest. This suggests that RWD is often used in premium models like sports or luxury cars, whereas FWD is more common in affordable, compact vehicles.
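The highest and lowest combinations can be read off the grouped means directly instead of from the bars. A short sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical rows for illustration only
cars = pd.DataFrame({
    "drive-wheels": ["fwd", "fwd", "rwd", "rwd", "fwd"],
    "body-style": ["hatchback", "sedan", "sedan", "hatchback", "hatchback"],
    "price": [7000.0, 9000.0, 25000.0, 15000.0, 8000.0],
})

# Mean price for every (drive-wheels, body-style) combination
combo_means = cars.groupby(["drive-wheels", "body-style"])["price"].mean()

most_expensive = combo_means.idxmax()
least_expensive = combo_means.idxmin()
print(most_expensive, least_expensive)
```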

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights can help create a positive business impact.
Understanding that rear-wheel drive sedans tend to be priced higher allows manufacturers to focus their premium features and marketing on these combinations, attracting high-value customers. On the other hand, knowing that front-wheel drive hatchbacks are more budget-friendly helps companies position them for price-sensitive markets, increasing reach and sales volume.

However, if a brand tries to overprice low-demand combinations — like a high-priced FWD hatchback without premium features — it could lead to poor sales and customer dissatisfaction, resulting in negative growth. Aligning product design with customer expectations is crucial to avoid this.



#### Chart - 5

In [None]:
# Chart - 5 visualization code
avg_price_by_doors = df.groupby('num-of-doors')['price'].mean()

avg_price_by_doors.plot(kind='bar', color='orchid')
plt.title('Average Price by Number of Doors')
plt.xlabel('Number of Doors')
plt.ylabel('Average Price')
plt.show()


##### 1. Why did you pick the specific chart?

I picked this chart because the number of doors is a simple design feature that can affect both the car's functionality and price. By comparing the average price based on door count, I wanted to see whether customers tend to pay more for 2-door sportier cars or 4-door practical models. It's a straightforward chart that gives useful design and pricing insights.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that, on average, 4-door cars are priced slightly higher than 2-door cars. This suggests that practicality and passenger capacity tend to be valued more by consumers, especially for family or daily-use vehicles.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights can help create a positive business impact.
By observing that 4-door cars generally have a higher average price, manufacturers can focus on producing more practical, family-friendly vehicles for mass-market appeal. This can guide design, pricing, and marketing strategies that match consumer preferences, boosting sales and customer satisfaction.

However, there’s also a risk of negative growth if companies ignore niche segments. For example, underestimating the demand for 2-door cars — which appeal to a younger, sportier audience — could lead to lost opportunities. So, balancing both practicality and style is key to staying competitive.

#### Chart - 6

In [None]:
# Chart - 6 visualization code
avg_price_by_body = df.groupby('body-style')['price'].mean().sort_values(ascending=False)

avg_price_by_body.plot(kind='bar', color='skyblue')
plt.title('Average Price by Body Style')
plt.xlabel('Body Style')
plt.ylabel('Average Price')
plt.xticks(rotation=45)
plt.show()


##### 1. Why did you pick the specific chart?

I picked this chart because body style is a key factor that influences car pricing. By visualizing the average price for each body type, we can clearly see which designs are positioned as premium and which ones are more budget-friendly. It helps in understanding customer preferences and guides manufacturers in product planning and pricing strategy.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that convertible and hardtop cars have the highest average prices, indicating they target a luxury or sporty market segment. In contrast, hatchbacks and sedans are priced lower, making them more suitable for budget-conscious or family-oriented buyers.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights can create a positive business impact.
By identifying that body styles like convertibles and hardtops are priced higher, manufacturers can strategically position these as premium models and target high-end customers through specialized marketing. Meanwhile, lower-priced segments like hatchbacks and sedans can be optimized for affordability and mass-market appeal, helping boost overall sales.

However, if a company misjudges demand and overproduces high-priced body styles in markets that prefer affordability, it could lead to unsold inventory and negative growth. So, aligning production with customer demand and market trends is essential to avoid financial setbacks.

#### Chart - 7

In [None]:
# Chart - 7 visualization code
# Average price for each cylinder count, highest first
avg_price_by_cylinders = df.groupby('num-of-cylinders')['price'].mean().sort_values(ascending=False)

avg_price_by_cylinders.plot(kind='bar', color='steelblue')
plt.title('Average Price by Number of Cylinders')
plt.xlabel('Number of Cylinders')
plt.ylabel('Average Price')
plt.xticks(rotation=45)
plt.show()



##### 1. Why did you pick the specific chart?

I picked this chart because the number of cylinders in a car is directly linked to engine performance, fuel consumption, and pricing. By visualizing average price by cylinder count, I wanted to understand how engine power affects cost and market positioning. It's a simple yet powerful way to analyze the relationship between performance and price.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that cars with higher cylinder counts — like 6, 8, or 12 cylinders — tend to have significantly higher average prices compared to cars with 3 or 4 cylinders. This indicates that more powerful engines are associated with premium or luxury vehicles, while lower-cylinder engines are common in budget-friendly and fuel-efficient models.
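If the cylinder counts are stored as words, the bars cannot be ordered by engine size directly. The sketch below assumes word labels such as 'four' and 'six' (an assumption about the column's format) and maps them to integers; the prices are hypothetical.

```python
import pandas as pd

# Assumed word-to-number mapping for the 'num-of-cylinders' column
word_to_count = {"two": 2, "three": 3, "four": 4, "five": 5,
                 "six": 6, "eight": 8, "twelve": 12}

# Hypothetical rows for illustration only
cars = pd.DataFrame({
    "num-of-cylinders": ["four", "six", "four", "eight", "twelve"],
    "price": [8000.0, 18000.0, 10000.0, 35000.0, 36000.0],
})

cars["cylinders"] = cars["num-of-cylinders"].map(word_to_count)

# Average price ordered by cylinder count rather than by price
by_count = cars.groupby("cylinders")["price"].mean().sort_index()
print(by_count)
```

Ordering by cylinder count makes the upward trend in price easier to read than ordering by price.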

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights gained from the chart can create a positive business impact.
By identifying that higher-cylinder vehicles are priced significantly higher, manufacturers can position such models as premium or performance-oriented. This helps in crafting targeted marketing campaigns, pricing strategies, and customer segmentation. Companies can also decide which engine variants to push more in certain markets — for example, promoting 4-cylinder models in fuel-conscious regions and 6+ cylinder models in performance-focused segments.

However, there’s also a risk of negative growth if this insight is misapplied. Overproducing high-cylinder, expensive models in price-sensitive markets could lead to poor sales and inventory pile-up. Also, as the market shifts toward electric and fuel-efficient options, focusing too much on high-cylinder engines might backfire unless aligned with customer demand.

#### Chart - 8

In [None]:
# Chart - 8 visualization code
drive_counts = df['drive-wheels'].value_counts()

drive_counts.plot(kind='pie', autopct='%1.1f%%', startangle=90, colors=['#ff9999','#66b3ff','#99ff99'])
plt.title('Drive-Wheel Type Distribution')
plt.ylabel('')  # Hides the y-axis label
plt.show()


##### 1. Why did you pick the specific chart?

I chose a pie chart because it shows the share of each drive-wheel type at a glance. Drivetrain configuration is a fundamental aspect of a car's performance, pricing, and market targeting, and knowing which type is most common helps identify consumer demand trends. For example, if FWD dominates the dataset, affordability, fuel efficiency, and ease of handling are likely key market drivers. This helps manufacturers and marketers align their product offerings with real-world consumer preferences.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that front-wheel drive (FWD) vehicles dominate the dataset, followed by rear-wheel drive (RWD) and four-wheel drive (4WD) cars. This suggests that most cars in the dataset prioritize affordability, fuel efficiency, and ease of handling, traits associated with FWD systems. RWD and 4WD are less common, likely reserved for performance or off-road-oriented vehicles.
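The percentages drawn on the pie chart can also be computed as a table. A minimal sketch, using a hypothetical list of drive-wheel labels:

```python
import pandas as pd

# Hypothetical labels: 6 fwd, 3 rwd, 1 4wd
drive_wheels = pd.Series(["fwd"] * 6 + ["rwd"] * 3 + ["4wd"])

# Share of each drive-wheel type in percent, largest first
shares = (drive_wheels.value_counts(normalize=True) * 100).round(1)
print(shares)
```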

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. Understanding that front-wheel drive (FWD) vehicles dominate the dataset helps businesses make informed decisions on product design, inventory planning, and marketing. Since FWD cars are typically more affordable and fuel-efficient, companies can focus their production and advertising on these features to match consumer demand, which can increase sales, reduce overstock, and improve customer satisfaction.

As for negative growth, if a company overemphasizes FWD models and ignores demand for RWD or 4WD in certain markets, such as luxury, performance, or off-road segments, it could lose customers in those niches. Misreading the data as “one-size-fits-all” can lead to missed opportunities and reduced brand appeal in specific categories.

#### Chart - 9

In [None]:
# Chart - 9 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 10

In [None]:
# Chart - 10 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 11

In [None]:
# Chart - 11 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 12

In [None]:
# Chart - 12 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 13

In [None]:
# Chart - 13 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code

# Select only numerical columns
numeric_df = df.select_dtypes(include=['float64', 'int64'])

# Compute correlation matrix
correlation_matrix = numeric_df.corr()

# Plot the heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f")
plt.title('Correlation Heatmap of Numerical Features')
plt.show()


##### 1. Why did you pick the specific chart?

I picked the correlation heatmap because it visually shows how strongly numerical variables are related to each other. It helps quickly identify which features are most strongly associated with the target variable, 'price'. For example, features like 'engine-size', 'curb-weight', and 'horsepower' might have a strong positive correlation with price, while 'city-mpg' could show a negative one. This chart helps in feature selection, understanding data relationships, and guiding business or modeling decisions.

##### 2. What is/are the insight(s) found from the chart?

The heatmap shows that features like engine-size, curb-weight, and horsepower have a strong positive correlation with car price, meaning as these values increase, the price tends to increase too. On the other hand, features like city-mpg and highway-mpg have a negative correlation with price, indicating that more fuel-efficient cars are generally less expensive. This helps us understand which technical specs are most closely associated with car pricing.
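To read the price relationships without scanning the whole heatmap, the 'price' column of the correlation matrix can be sorted on its own. The sketch below uses hypothetical numbers, so the coefficients it prints are not the project's results.

```python
import pandas as pd

# Hypothetical values chosen to mimic the described trends
cars = pd.DataFrame({
    "engine-size": [90, 110, 130, 180, 210],
    "city-mpg": [38, 31, 27, 19, 16],
    "price": [6000, 9000, 12000, 25000, 34000],
})

# Correlation of every numeric feature with price, strongest positive first
price_corr = cars.corr()["price"].drop("price").sort_values(ascending=False)
print(price_corr.round(2))
```

Correlation describes association only; it does not by itself show that a feature causes a higher price.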

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
import seaborn as sns
import matplotlib.pyplot as plt

sns.pairplot(df[['price', 'engine-size', 'horsepower', 'city-mpg']].dropna())
plt.show()


##### 1. Why did you pick the specific chart?

I picked the pair plot because it allows me to visualize relationships between multiple numerical variables at once. Instead of plotting each pair separately, this chart gives a grid view where I can spot patterns, correlations, or outliers quickly. It’s especially useful in the early stages of data exploration to understand how features like price, engine size, horsepower, and mileage interact with each other.

##### 2. What is/are the insight(s) found from the chart?

The pair plot shows a strong positive relationship between engine size, horsepower, and price, meaning cars with larger engines and higher horsepower tend to be more expensive. On the other hand, city mileage (mpg) appears to be negatively related to price, indicating that more fuel-efficient cars are usually cheaper. These trends help us understand how technical specs relate to pricing.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

Focus on FWD Cars:
Front-wheel drive cars are the most common. The company should focus more on producing and selling these as they’re in high demand.

Promote Fuel-Efficient Models:
Cars with better mileage (higher mpg) are cheaper and appeal to budget-conscious buyers. Highlight this in marketing.

Target Premium Buyers with High-Performance Cars:
Cars with more cylinders, higher horsepower, and larger engines are priced higher. These should be sold as premium or performance models.

Use Tiered Pricing:
Offer base models at affordable prices and sell additional features as optional upgrades to increase profits.

Clean Data Regularly:
There were missing and incorrect values in the dataset. Keeping clean and accurate data helps in better analysis and business decisions.

Dynamic Pricing Strategy:
Since price is influenced by specs like horsepower and engine size, the company can create a flexible pricing system based on features.

Don’t Ignore Niche Segments:
Rear-wheel and four-wheel drive cars may be fewer, but they can be targeted to luxury or off-road customers.
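As a rough illustration of the feature-based pricing idea, a straight line can be fitted between one specification and price. This is only a sketch under strong assumptions: the data is synthetic (generated as price = 100 * horsepower + 2000), and a real pricing model would need more features and proper validation.

```python
import numpy as np

# Synthetic, perfectly linear data for illustration only
horsepower = np.array([60.0, 80.0, 100.0, 150.0, 200.0])
price = 100.0 * horsepower + 2000.0

# Least-squares line: price ~ slope * horsepower + intercept
slope, intercept = np.polyfit(horsepower, price, 1)

# Estimated price for a hypothetical 120 hp model
estimate = slope * 120.0 + intercept
print(round(slope, 2), round(intercept, 2), round(estimate, 2))
```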

# **Conclusion**

In this project, I analyzed an automobile dataset using Python libraries like Pandas, NumPy, Matplotlib, and Seaborn. After cleaning and preparing the data, I explored key relationships between features like engine size, horsepower, mileage, and price.

The analysis showed that front-wheel drive cars and 4-cylinder engines are the most common, while engine size, horsepower, and curb weight have a strong impact on price. Fuel-efficient cars tend to be cheaper, making them popular among budget-conscious buyers.

Through visualizations like bar plots, scatter plots, and heatmaps, I identified patterns that can help businesses make better decisions about product design, pricing, and marketing. This project highlights how data analysis can turn raw information into meaningful business insights.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***