# Exploratory Data Analysis (EDA) Notebook
## Hilarious Vehicle Sales Insights Project

### Project Overview

This notebook is part of the "Hilarious Vehicle Sales Insights" project, aiming to analyze a dataset related to vehicle sales. The primary goal is to perform exploratory data analysis (EDA) and derive amusing insights for visualization in a web application.

### Table of Contents

1. [Introduction](#introduction)
2. [Data Loading](#data-loading)
3. [Data Pre-processing](#data-pre-processing)
4. [Exploratory Data Analysis](#exploratory-data-analysis)
5. [Data Visualization](#data-visualization)
6. [Conclusion](#conclusion)

## Introduction

In this project, we explore a dataset containing information about vehicle sales, including features such as mileage, year, price, and more. The humorous insights derived from this analysis will be visualized in a Streamlit web application. For instance, we might investigate questions like "Do red cars sell better?" or "Is there a correlation between mileage and sales?"
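A question such as "Do red cars sell better?" can be framed as a group-wise comparison. The sketch below shows one possible way to set that up on toy data. The `color` and `price` column names and the `summarize_by_color` helper are assumptions made for illustration, not the confirmed schema of the project dataset.

```python
import pandas as pd


def summarize_by_color(frame, color_col="color", price_col="price"):
    """Count rows and compute the median price per color (hypothetical helper)."""
    summary = (
        frame.groupby(color_col)[price_col]
        .agg(count="count", median_price="median")
        .sort_values("count", ascending=False)
    )
    return summary


# Toy data standing in for the real dataset
toy = pd.DataFrame({
    "color": ["red", "red", "blue", "white", "red", "blue"],
    "price": [9000, 11000, 7000, 15000, 10000, 8000],
})
summary = summarize_by_color(toy)
print(summary)
```

On the real data, the same call could be pointed at whichever columns hold the color and price values. Whether a higher count actually means "sells better" depends on what each row of the dataset represents.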

## Data Loading

Let's start by loading the dataset into a Pandas DataFrame.

```python
import pandas as pd

# Load the dataset
df = pd.read_csv('vehicles_us (2).csv')
```

## Data Pre-processing

Before diving into the analysis, we need to pre-process the data. This involves handling missing values, converting data types, and addressing any outliers.

```python
# Check for missing values in each column
missing_values = df.isnull().sum()
print(missing_values)
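
# --- Illustrative sketch (an assumption, not the notebook's confirmed method) ---
# One way the pre-processing steps named above could be carried out on a single
# column: convert it to a numeric dtype, fill missing values with the median,
# and drop rows outside 1.5 * IQR. The helper name `basic_clean`, the median
# fill, and the IQR rule are all hypothetical choices; adjust them to the data.
import pandas as pd  # already imported above; repeated so the sketch stands alone


def basic_clean(frame, column):
    cleaned = frame.copy()
    # Convert the column to a numeric dtype; unparseable entries become NaN
    cleaned[column] = pd.to_numeric(cleaned[column], errors='coerce')
    # Fill missing values with the column median
    cleaned[column] = cleaned[column].fillna(cleaned[column].median())
    # Keep only rows that lie within 1.5 * IQR of the quartiles
    q1, q3 = cleaned[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    in_range = cleaned[column].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return cleaned[in_range]


# Example usage (left commented out so the loaded data is not altered):
# df = basic_clean(df, 'price')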
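```

## Exploratory Data Analysis

```python
import plotly.express as px
import matplotlib.pyplot as plt

# Column names used below are assumed to match the CSV; confirm them with df.columns
fig = px.histogram(df, x='price', title='Distribution of Vehicle Prices')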

# Display the plot using Plotly Express
fig.show()

# Alternatively, if you prefer using Matplotlib for the same histogram
plt.hist(df['price'], bins=30, color='skyblue', edgecolor='black')
plt.title('Distribution of Vehicle Prices')
plt.xlabel('Price')
plt.ylabel('Frequency')
plt.show()

# Scatter plot of mileage against sales
fig = px.scatter(df, x='mileage', y='sales', title='Relationship Between Mileage and Sales')

# Display the plot using Plotly Express
fig.show()

# Alternatively, if you prefer using Matplotlib for the same scatter plot
plt.scatter(df['mileage'], df['sales'], color='green', alpha=0.5)
plt.title('Relationship Between Mileage and Sales')
plt.xlabel('Mileage')
plt.ylabel('Sales')
plt.show()
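
# --- Illustrative sketch (an assumption, not a result from the dataset) ---
# The scatter plot can be paired with a single number: the Pearson correlation
# coefficient between the two columns. `column_correlation` is a hypothetical
# helper introduced here for illustration only.
import pandas as pd  # already imported above; repeated so the sketch stands alone


def column_correlation(frame, x_col, y_col):
    # Series.corr computes the Pearson correlation by default and ignores
    # pairs in which either value is missing
    return frame[x_col].corr(frame[y_col])


# Example usage with the columns plotted above (commented out):
# print(column_correlation(df, 'mileage', 'sales'))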
```

## Data Visualization

```python
# 'df' is the DataFrame loaded in the Data Loading section
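# Count vehicles per color so that each bar's height is the number of vehicles
color_counts_df = df['color'].value_counts().reset_index()
color_counts_df.columns = ['color', 'count']
fig = px.bar(color_counts_df, x='color', y='count', title='Distribution of Vehicle Colors')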

# Display the plot using Plotly Express
fig.show()

# Alternatively, if you prefer using Matplotlib for the same bar chart
color_counts = df['color'].value_counts()
plt.bar(color_counts.index, color_counts.values, color='orange')
plt.title('Distribution of Vehicle Colors')
plt.xlabel('Color')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.show()

# Average price per year, so the line has one point per year rather than one per row
yearly_price_avg = df.groupby('year')['price'].mean().reset_index()
fig = px.line(yearly_price_avg, x='year', y='price', title='Vehicle Prices Over the Years')

# Display the plot using Plotly Express
fig.show()

# Alternatively, if you prefer using Matplotlib for the same line plot (reuses yearly_price_avg)
plt.plot(yearly_price_avg['year'], yearly_price_avg['price'], marker='o', color='purple')
plt.title('Vehicle Prices Over the Years')
plt.xlabel('Year')
plt.ylabel('Average Price')
plt.show()
```

## Conclusion

Our exploratory data analysis (EDA) of the vehicle sales dataset surfaced several insights. The key findings are summarized below.

### Distribution of Vehicle Prices

The histogram of vehicle prices shows a wide range, with most vehicles clustered in one part of that range. While most vehicles are priced reasonably, a few outliers carry significantly higher prices.

### Relationship Between Mileage and Sales

The scatter plot suggests a possible negative correlation between mileage and sales. Vehicles with lower mileage tend to have higher sales, which may indicate that customers prefer less-used vehicles.

### Distribution of Vehicle Colors

The bar chart shows how often each color appears in the dataset. Some colors are more common than others, which may point to trends in consumer preferences.

### Vehicle Prices Over the Years

The line plot tracks the average vehicle price for each year. Average prices may fluctuate, potentially influenced by factors like inflation, demand, and model releases.

These findings provide a foundation for further analysis and visualization in our web application. As the project proceeds, we'll dig deeper into these insights and use humor and creativity to engage users with vehicle sales statistics.


