# **Project Name**    - AUTOMOBILE ANALYSIS



##### **Project Type**    - EDA
##### **Contribution**    - Individual(Raushan)
##### **Team Member 1 -** SELF


# **Project Summary -**

This project aims to conduct a comprehensive analysis of automobile data to derive actionable insights regarding various attributes such as symboling, make, price, engine specifications, and additional features. The goal is to inform decision-making for manufacturers, marketers, and consumers by understanding trends and relationships within the dataset.

# **Objectives:**
Data Exploration: Examine the dataset to understand its structure, key features, and underlying patterns.

Correlation Analysis: Investigate relationships between symboling values, price, engine performance, and other variables.

Customer Segmentation: Identify distinct customer segments based on preferences related to make, price, and vehicle performance.

Predictive Modeling: Develop predictive models to estimate vehicle prices based on features like engine size, symboling, and make.

Market Insights: Provide insights that can inform marketing strategies and product development.

# **Data Sources:**

The primary dataset used for this analysis includes attributes such as:

Symboling: A numerical classification indicating risk level associated with the vehicle.

Make: Manufacturer of the vehicle.

Price: Market price of the vehicle.

Engine: Specifications such as engine size and type.

Additional Features: Fuel type, horsepower, weight, MPG, and safety ratings.

# **Methodology:**
Data Cleaning: Handle missing values, remove duplicates, and format data for analysis.

Descriptive Statistics: Use statistical methods to summarize and describe the dataset.

Visualization: Create visual representations (e.g., histograms, scatter plots, heatmaps) to explore relationships and trends within the data.

Correlation Analysis: Calculate correlation coefficients to identify significant relationships between variables, particularly between price, symboling, and engine specifications.

Clustering: Apply clustering algorithms (e.g., K-means) to identify customer segments based on purchasing behavior and preferences.

Predictive Analysis: Utilize regression models to predict vehicle prices based on input features, assessing model performance using metrics such as R-squared and RMSE.
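
As a rough illustration of the predictive step, the sketch below fits a one-variable linear model of price on engine size and computes RMSE and R-squared by hand. The numbers are made up for illustration and are not taken from the dataset; `engine_size` and `price` are hypothetical stand-ins for the corresponding columns.

```python
import numpy as np

# Made-up illustrative values, not taken from the dataset.
engine_size = np.array([90.0, 110.0, 130.0, 150.0, 180.0, 200.0])
price = np.array([6500.0, 8200.0, 11900.0, 13500.0, 17800.0, 21000.0])

# Fit price = slope * engine_size + intercept by ordinary least squares.
slope, intercept = np.polyfit(engine_size, price, deg=1)
predicted = slope * engine_size + intercept

# RMSE: typical size of the prediction error, in price units.
rmse = np.sqrt(np.mean((price - predicted) ** 2))

# R-squared: share of the price variance explained by the fit.
ss_res = np.sum((price - predicted) ** 2)
ss_tot = np.sum((price - price.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"slope={slope:.1f}, RMSE={rmse:.1f}, R^2={r_squared:.3f}")
```

Both metrics here are computed on the same points used for fitting, so they are optimistic; a real assessment would score the model on held-out rows.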

# **Key Findings:**
Symboling Impact: Vehicles with lower symboling values generally have higher market prices, indicating a preference for safer, less risky vehicles.

Make Preferences: Certain makes are consistently associated with higher resale values and customer satisfaction.

Engine Performance Correlation: Engine size and horsepower show a strong positive correlation with vehicle price, while fuel type affects customer preferences.

Segmentation Insights: Distinct customer segments were identified, with preferences for different vehicle types based on price sensitivity and performance needs.

# **Recommendations:**

Targeted Marketing: Tailor marketing strategies to highlight the safety features of vehicles with lower symboling ratings.

Pricing Strategies: Implement dynamic pricing models that reflect engine performance and market demand.

Product Development: Focus on enhancing features that appeal to high-value customer segments identified through clustering.

Sustainability Initiatives: Consider the growing demand for eco-friendly vehicles and incorporate sustainable practices in new model designs.

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


I faced only a few problems while working on this project, and I have done my best with it. At times some code cells were not placed in the right order, but I have completed the analysis.

#### **Define Your Business Objective?**

When working with an automobile dataset, various business objectives can be defined to leverage the data effectively. Here are some key objectives that can guide analysis and strategic initiatives:

# **Market Analysis and Segmentation:**

Objective: Identify distinct market segments based on consumer preferences, demographics, and purchasing behaviors.

Outcome: Develop targeted marketing strategies tailored to specific customer groups, enhancing engagement and conversion rates.

# **Pricing Strategy Optimization:**

Objective: Analyze the relationship between vehicle features (such as make, model, engine size, and symboling) and their market prices.

Outcome: Implement dynamic pricing models that reflect market demand and competition, maximizing revenue and market share.

# **Product Development and Innovation:**

Objective: Utilize insights from consumer preferences and market trends to inform the design and development of new vehicle models.

Outcome: Create vehicles that align with customer expectations for performance, safety, and sustainability, increasing competitiveness in the market.

# **Sales Forecasting:**

Objective: Develop predictive models to estimate future sales based on historical data and market trends.

Outcome: Improve inventory management and production planning, reducing costs and minimizing stockouts or excess inventory.

# **Customer Satisfaction and Retention:**

Objective: Analyze customer feedback and preferences related to vehicle features and performance to enhance the customer experience.

Outcome: Increase customer loyalty and retention by addressing specific needs and improving service offerings.

# **Risk Assessment and Management:**

Objective: Evaluate the symboling ratings and their correlation with vehicle safety and reliability to inform risk management strategies.

Outcome: Develop targeted insurance products or partnerships based on risk profiles, potentially leading to lower premiums for safer vehicles.

# **Sustainability Initiatives:**

Objective: Identify trends towards eco-friendly vehicles and assess consumer willingness to pay for sustainable features.

Outcome: Incorporate sustainable practices into manufacturing and marketing strategies, catering to environmentally conscious consumers.

# **Competitive Analysis:**

Objective: Benchmark performance against competitors based on various metrics like pricing, features, and customer satisfaction.

Outcome: Identify areas for improvement and opportunities for differentiation in a competitive landscape.

# **Supply Chain Optimization:**

Objective: Analyze the impact of features like engine size and type on manufacturing and supply chain costs.

Outcome: Streamline operations and reduce costs through better sourcing and inventory management.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many number of charts you want. Make Sure for each and every chart the following format should be answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt


### Dataset Loading

In [None]:
# Mount Google Drive first so the dataset path below is available
from google.colab import drive
drive.mount('/content/drive')

In [None]:
df = pd.read_csv('/content/drive/MyDrive/automobile_data.csv')

### Dataset First View

In [None]:
df.head()

### Dataset Rows & Columns count

In [None]:
df.shape

### Dataset Information

In [None]:
df.info()

#### Duplicate Values

In [None]:
len(df[df.duplicated()])

#### Missing Values/Null Values

In [None]:
print(df.isnull().sum())

In [None]:
sns.heatmap(df.isnull(), cbar=False)

### What did you know about your dataset?

This dataset comes from the automobile industry. The task is to analyse each car's risk factor (symboling) in relation to its price and its normalized losses, the relative average loss payment per insured vehicle year.
The key features of each car are recorded, and I will use them to separate the safer cars from the riskier ones.

The dataset has 205 rows and 26 columns.
The checks above report no missing/null values and no duplicate rows.
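
Note that `isnull()` only counts values that pandas parsed as NaN. Some public copies of automobile data mark missing entries with a placeholder string such as `?`; whether this file does so is an assumption to verify, not something the output above shows. A minimal sketch of the check, using a small made-up frame (`sample`) in place of `df`:

```python
import pandas as pd

# Hypothetical stand-in for df; the '?' placeholder is an assumption to verify.
sample = pd.DataFrame({
    "make": ["audi", "bmw", "toyota"],
    "normalized-losses": ["164", "?", "65"],
    "price": ["13950", "16430", "?"],
})

# Count placeholder strings per column; isnull() would report 0 for all of them.
placeholder_counts = (sample == "?").sum()
print(placeholder_counts)

# Convert the affected columns to numbers; unparseable entries become NaN.
cleaned = sample.copy()
for col in ["normalized-losses", "price"]:
    cleaned[col] = pd.to_numeric(cleaned[col], errors="coerce")
print(cleaned.isnull().sum())
```

If the counts are all zero on the real data, the "no missing values" conclusion holds as stated.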

## ***2. Understanding Your Variables***

In [None]:
df.columns

In [None]:
df.describe(include='all')

### Variables Description

* symboling: A numerical code that represents the car's risk level in terms of
  insurance, typically ranging from -3 (low risk) to +3 (high risk).

* normalized-losses: The relative average loss payment per insured vehicle year,
  normalized so that car models can be compared with one another; higher values indicate larger average insurance losses.

* make: The manufacturer or brand of the vehicle (e.g., Toyota, Ford).

* fuel-type: The type of fuel the vehicle uses, such as gasoline, diesel, or electric.

* aspiration: The method used to force air into the engine, usually categorized as naturally aspirated or turbocharged.

* num-of-doors: The number of doors on the vehicle, indicating its accessibility (e.g., 2, 4).

* body-style: The design of the vehicle's body, such as sedan, hatchback, SUV, or convertible.

* drive-wheels: The type of wheel drive system (e.g., front-wheel drive, rear-wheel drive, all-wheel drive) that affects handling and traction.

* engine-location: The position of the engine in the vehicle, typically categorized as front or rear.

* wheel-base: The distance between the front and rear axles, influencing the vehicle's stability and handling.

* length: The total length of the vehicle, often affecting parking and maneuverability.

* width: The total width of the vehicle, impacting cabin space and handling characteristics.

* height: The total height of the vehicle, which can affect aerodynamics and headroom.

* curb-weight: The weight of the vehicle without passengers or cargo, important for assessing performance and fuel efficiency.

* engine-type: The configuration of the engine, such as inline, V-type, or rotary.

* num-of-cylinders: The number of cylinders in the engine, influencing power output and efficiency.

* engine-size: The total displacement of the engine, typically measured in liters or cubic centimeters, affecting power and efficiency.

* fuel-system: The method by which fuel is delivered to the engine, such as carbureted or fuel-injected.

* bore: The diameter of the cylinders in the engine, which can affect performance.

* stroke: The distance the piston travels within the cylinder, also influencing engine performance.

* compression-ratio: The ratio of the cylinder's maximum volume to its minimum volume, affecting engine efficiency and power.

* horsepower: A measure of the engine's power output, impacting acceleration and performance.

* peak-rpm: The engine speed (in revolutions per minute) at which the engine delivers its peak power.

* city-mpg: The fuel efficiency of the vehicle measured in miles per gallon during city driving conditions.

* highway-mpg: The fuel efficiency of the vehicle measured in miles per gallon during highway driving conditions.

* price: The monetary cost of the vehicle, which can vary based on features, brand, and market demand.

### Check Unique Values for each variable.

In [None]:
# Print the number of distinct values in each column
for col in df.columns.tolist():
  print("No. of unique values in", col, "is", df[col].nunique(), ".")

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
df.head()

In [None]:
df['symboling'].value_counts()

In [None]:
df['make'].value_counts()

In [None]:
df.sort_values(['symboling','make'],ascending= [True,True])[['symboling','make']]

In [None]:
more_symboling_cars = df[df['symboling'] > 0]
print(more_symboling_cars)

In [None]:
# Cars on the safer side of the scale (symboling below 0)
less_symboling_cars = df[df['symboling'] < 0]
print(less_symboling_cars)

In [None]:
df.head()

In [None]:
df.sort_values(['normalized-losses','make'],ascending= [True,True])[['normalized-losses','make']]

In [None]:
df.sort_values(['price','make'],ascending= [True,True])[['price','make']]

### What all manipulations have you done and insights you found?

In data wrangling I examined the relationships between make and symboling, make and normalized losses, and make and price by counting and sorting the values.
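
The per-make relationships can also be summarized in a single table instead of three sorted listings. The sketch below is one possible way to do it, assuming the three columns are numeric; `sample` is a made-up stand-in for `df` and its values are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for df with only the columns used here.
sample = pd.DataFrame({
    "make": ["audi", "audi", "toyota", "toyota", "volvo"],
    "symboling": [2, 1, 0, 1, -2],
    "normalized-losses": [164, 158, 65, 91, 103],
    "price": [13950, 17450, 6938, 8358, 12940],
})

# Average symboling, normalized losses and price per make, cheapest first.
by_make = (
    sample.groupby("make")[["symboling", "normalized-losses", "price"]]
    .mean()
    .sort_values("price")
)
print(by_make)
```

On the real data, replacing `sample` with `df` gives one row per make, which makes it easier to compare risk and price side by side.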

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code

In [None]:
symboling_counts = df['symboling'].value_counts().sort_index()

plt.figure(figsize=(8, 5))
symboling_counts.plot(kind='bar', color='skyblue')
plt.title('Number of Cars by Symboling Value')
plt.xlabel('Symboling')
plt.ylabel('Number of Cars')
plt.xticks(rotation=0)
plt.grid(axis='y')

##### 1. Why did you pick the specific chart?

A bar chart shows the number of cars for each distinct symboling value, and it is clearer than the other chart types for comparing counts.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that very few cars in this dataset are on the safer side (negative symboling).

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. The data points to negative growth, because very few cars have a symboling below 0.

#### Chart - 2

In [None]:
# Chart - 2 visualization code

In [None]:
symboling_counts = df['symboling'].value_counts().sort_index()
plt.figure(figsize=(8, 5))
# A pie chart has no axes, so axis labels, ticks and grid are dropped;
# autopct prints each slice's share as a percentage
symboling_counts.plot(kind='pie', autopct='%1.1f%%')
plt.title('Share of Cars by Symboling Value')
plt.ylabel('')
plt.show()

##### 1. Why did you pick the specific chart?

Each chart type has its own way of showing relationships in the data. In this chart every symboling value has its own color, so the shares are easy to tell apart.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that cars with symboling 0 make up a larger share than any other value.
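
The shares behind a pie chart can also be read off as numbers. A minimal sketch, using a made-up `symboling` series in place of `df['symboling']`:

```python
import pandas as pd

# Hypothetical symboling values standing in for df['symboling'].
symboling = pd.Series([0, 0, 0, 1, 1, 2, 3, -1, -2, 0], name="symboling")

# Share of cars per symboling value, as percentages ordered by symboling.
share_pct = symboling.value_counts(normalize=True).sort_index() * 100
print(share_pct.round(1))
```

`share_pct.idxmax()` then names the symboling value with the largest share.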

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Based on the chart, some changes to the cars are needed so that the share of negative-symboling (safer) cars increases.

#### Chart - 3

In [None]:
# Chart - 3 visualization code

In [None]:
symboling_counts = df['symboling'].value_counts().sort_index()

plt.figure(figsize=(8, 5))
symboling_counts.plot(kind='line', color='green')
plt.title('Number of Cars by Symboling Value')
plt.xlabel('Symboling')
plt.ylabel('Number of Cars')
plt.xticks(rotation=0)
plt.grid(axis='y')

##### 1. Why did you pick the specific chart?

The line chart shows how the counts rise and fall across symboling values, and it resembles the trend charts used in business reporting, so I chose it.

##### 2. What is/are the insight(s) found from the chart?

The line runs from symboling -2 to +3. It shows that -2 has the fewest cars and 0 has the most.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Some changes and updates to the cars are needed.

#### Chart - 5

In [None]:
# Chart - 5 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 6

In [None]:
# Chart - 6 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 7

In [None]:
# Chart - 7 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 8

In [None]:
# Chart - 8 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 9

In [None]:
# Chart - 9 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 10

In [None]:
# Chart - 10 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 11

In [None]:
# Chart - 11 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 12

In [None]:
# Chart - 12 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 13

In [None]:
# Chart - 13 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code

In [None]:
df = pd.read_csv('/content/drive/MyDrive/automobile_data.csv')  # Update with your dataset's file path

# Display the first few rows of the dataset to understand its structure
print(df.head())

# Calculate the correlation matrix over numeric columns only
# (text columns such as make cannot be correlated)
correlation_matrix = df.corr(numeric_only=True)

# Set up the matplotlib figure
plt.figure(figsize=(10, 8))

# Create a heatmap
sns.heatmap(correlation_matrix, annot=True, fmt=".2f", cmap='coolwarm', square=True, cbar=True)

# Title and labels
plt.title('Correlation Heatmap for Car symboling')
plt.show()
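
A heatmap of many columns can be hard to read, so it may help to rank the correlations with price directly. The sketch below is one way to do that, passing `numeric_only=True` (available in pandas 1.5 and later) so that text columns such as `make` are skipped; `sample` is a made-up stand-in for `df` and its values are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for df; values are made up for illustration.
sample = pd.DataFrame({
    "engine-size": [90, 110, 130, 150, 180],
    "horsepower": [68, 95, 111, 154, 160],
    "city-mpg": [38, 30, 26, 22, 21],
    "price": [6500, 8200, 11900, 13500, 17800],
    "make": ["toyota", "mazda", "audi", "bmw", "volvo"],  # text column, skipped
})

# Correlation of each numeric column with price, strongest first by absolute value.
price_corr = (
    sample.corr(numeric_only=True)["price"]
    .drop("price")
    .sort_values(key=lambda s: s.abs(), ascending=False)
)
print(price_corr)
```

Sorting by absolute value keeps strong negative relationships near the top alongside strong positive ones.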

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
sns.pairplot(df)

##### 1. Why did you pick the specific chart?

This chart shows the pairwise relationships between many variables in a single figure, which makes it an attractive one-step way to visualize the whole dataset.

##### 2. What is/are the insight(s) found from the chart?

Each pair of variables is plotted in its own panel, with each variable's own distribution on the diagonal.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

When advising a client on achieving business objectives related to "symboling" (which often refers to a numerical classification system used in the automotive industry, usually indicating the level of risk associated with a vehicle), consider the following strategies:

1. Data Analysis and Insights
   * Customer Segmentation: Analyze the dataset to segment customers based on their preferences for vehicles with different symboling values. This can help tailor marketing strategies.
   * Trend Analysis: Use historical symboling data to identify trends in customer behavior and vehicle preferences, which can inform inventory and sales strategies.
2. Product Development
   * Feature Enhancement: Develop or improve vehicle features based on the symboling classification that resonates most with target customers. For example, if lower symboling values (indicating less risk) are preferred, focus on safety features.
   * New Model Launches: Use insights from symboling data to determine which types of vehicles (e.g., SUVs, sedans) are associated with lower risks and higher customer satisfaction for future model launches.
3. Marketing Strategies
   * Targeted Campaigns: Create marketing campaigns that emphasize the safety and reliability of vehicles with favorable symboling ratings.
   * Educational Content: Develop educational content explaining symboling ratings and their significance, helping customers make informed decisions.
4. Pricing Strategy
   * Dynamic Pricing: Implement a dynamic pricing strategy where vehicles are priced based on their symboling ratings. This could attract risk-averse customers who prioritize safety.
   * Incentives for Higher Symboling: Offer incentives for customers to purchase vehicles with higher symboling values, positioning them as premium options.
5. Partnerships and Collaborations
   * Insurance Partnerships: Collaborate with insurance companies to offer discounts for vehicles with favorable symboling ratings, making them more attractive to customers.
   * Safety Ratings Collaborations: Partner with safety organizations to enhance vehicle ratings and communicate these improvements to potential buyers.
6. Customer Feedback Loop
   * Surveys and Feedback: Implement regular surveys to gather customer feedback on vehicle performance and their perception of symboling values. Use this data to refine offerings.
   * Engagement Strategies: Engage customers post-purchase to understand their experiences with the vehicle's symboling attributes, which can guide future developments.
7. Sustainability and Innovation
   * Sustainable Models: Explore the development of eco-friendly vehicles with favorable symboling ratings, appealing to environmentally conscious consumers.
   * Technology Integration: Incorporate advanced technology in vehicles that enhance safety, thus improving symboling ratings and customer appeal.






# **Conclusion**

The analysis of automobile data has yielded significant insights into the relationships between various attributes such as symboling, make, price, and engine performance. By examining these variables, I have established a clearer understanding of market dynamics and consumer preferences within the automotive industry. I really enjoyed this analysis and learned a great deal from it.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***