<h1 style='text-align: center; font-size: 50px;'>Car Advertisement Price Analysis</h1>

## Introduction:

In this project, we work with a dataset of used-car advertisements. Our goal is to clean the data and prepare a report on how pricing depends on factors such as mileage and condition. The dataset is stored in a downloadable file. During data preprocessing we will:

- Display the dataset in a standardized format.
- Verify and fix data types.
- Identify and fill in missing values.
- Identify and remove duplicate values.
- Create plots that communicate the data clearly and concisely.

In [None]:
# Importing all necessary Libraries:
import pandas as pd
import numpy as np
import math
import seaborn as sns
import matplotlib.pyplot as plt
import scipy.stats as stats

In [None]:
# Loading the data file (adjust the path to where vehicles_us.csv is stored on your machine):
vehicles_df = pd.read_csv('vehicles_us.csv')
vehicles_df.sample(n=10)

## Fixing Data:

In [None]:
vehicles_df.info()

In [None]:
# Checking for duplicate values:
vehicles_df.duplicated().sum()
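The introduction lists removing duplicates as a preprocessing step, but this cell only counts them. A minimal sketch of the removal step, shown on a small hypothetical DataFrame (the helper name `drop_exact_duplicates` is illustrative, not part of the original notebook), could look like this; it treats only fully identical rows as duplicates:

```python
import pandas as pd


def drop_exact_duplicates(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without fully duplicated rows, re-indexed from 0."""
    return df.drop_duplicates().reset_index(drop=True)


# Hypothetical toy listings: the last two rows are exact duplicates.
toy = pd.DataFrame({
    'price': [5000, 12000, 12000],
    'odometer': [140000, 60000, 60000],
})
deduped = drop_exact_duplicates(toy)
print(len(toy), '->', len(deduped))  # 3 -> 2
```

On the real data this would be `vehicles_df = drop_exact_duplicates(vehicles_df)`; when the count above is already 0, the call leaves the rows unchanged.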

In [None]:
# Checking for missing values:
vehicles_df.isnull().sum()

In [None]:
# Missing 'is_4wd' values are treated as "not 4WD", so fill them with 0.0:
vehicles_df['is_4wd'] = vehicles_df['is_4wd'].fillna(0.0)

In [None]:
# Converting the column 'is_4wd' into str:
vehicles_df['is_4wd'] = vehicles_df['is_4wd'].astype(str) 

In [None]:
# Replacing the numeric flags with readable labels:
vehicles_df['is_4wd'] = vehicles_df['is_4wd'].replace({'1.0': 'Yes', '0.0': 'No'})
vehicles_df.sample(n=10)
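Converting to strings works because pandas renders the floats as `'1.0'` and `'0.0'`. The sketch below, on a hypothetical four-value Series, checks that the three-step conversion above gives the same labels as a single `Series.map` over the numeric values:

```python
import numpy as np
import pandas as pd

# Hypothetical 'is_4wd' values: 1.0 marks a 4WD car and NaN is treated as "not 4WD",
# the same assumption the notebook makes with fillna(0.0).
is_4wd = pd.Series([1.0, np.nan, 1.0, np.nan])

# The notebook's three steps: fill, cast to str ('1.0' / '0.0'), then relabel.
via_strings = is_4wd.fillna(0.0).astype(str).replace({'1.0': 'Yes', '0.0': 'No'})

# Equivalent single step that maps the numbers directly, so it does not depend
# on how the floats are spelled once converted to strings.
via_map = is_4wd.fillna(0.0).map({1.0: 'Yes', 0.0: 'No'})

print(via_map.tolist())  # ['Yes', 'No', 'Yes', 'No']
```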

In [None]:
# Finding the mode of 'paint_color':
mode_color = vehicles_df['paint_color'].mode()
mode_color


In [None]:
# Filling the missing values with the mode found above:
vehicles_df['paint_color'] = vehicles_df['paint_color'].fillna(mode_color[0])
vehicles_df.sample(n=10)
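`mode()` returns a Series rather than a single value, which is why `[0]` is needed. When several values tie for most frequent, pandas returns all of them in sorted order, so `[0]` silently picks the first one. A small hypothetical example:

```python
import numpy as np
import pandas as pd

# Hypothetical colour column with two missing values and a tie for most frequent.
colors = pd.Series(['white', np.nan, 'black', 'white', np.nan, 'black'])

modes = colors.mode()             # NaN is ignored; tied values are all returned, sorted
filled = colors.fillna(modes[0])  # modes[0] picks the first of the tied values

print(modes.tolist())   # ['black', 'white']
print(filled.tolist())  # ['white', 'black', 'black', 'white', 'black', 'black']
```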

In [None]:
# Finding the mode of 'model_year':
mode_model_year = vehicles_df['model_year'].mode()
mode_model_year

In [None]:
# Filling the missing values with the mode found above:
vehicles_df['model_year'] = vehicles_df['model_year'].fillna(mode_model_year[0])

In [None]:
# Converting 'model_year' type into int:
vehicles_df['model_year'] = vehicles_df['model_year'].astype(int)
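The mode fill has to happen before this cast: `astype(int)` raises an error if any NaN remains. If imputing a single year for every missing value feels too strong, pandas' nullable `Int64` dtype is an alternative that keeps missing years as `<NA>`. A sketch on hypothetical values:

```python
import numpy as np
import pandas as pd

# Hypothetical model years with one missing value.
years = pd.Series([2011.0, np.nan, 2015.0])

try:
    years.astype(int)  # plain NumPy integers have no way to represent NaN
    cast_failed = False
except ValueError:     # pandas raises a ValueError (or subclass) for non-finite values
    cast_failed = True

# Nullable integer dtype: keeps the gap as <NA> instead of requiring imputation.
nullable_years = years.astype('Int64')
print(cast_failed, nullable_years.tolist())
```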

In [None]:
# Finding the mode of 'cylinders':
mode_cylinders = vehicles_df['cylinders'].mode()
mode_cylinders

In [None]:
# Filling the missing values with the mode found above:
vehicles_df['cylinders'] = vehicles_df['cylinders'].fillna(mode_cylinders[0])

In [None]:
# Calculating the mean of 'odometer' (NaNs are skipped):
odometer_mean = vehicles_df['odometer'].mean(skipna=True)
odometer_mean

In [None]:
# Filling the missing values with the mean:
vehicles_df['odometer'] = vehicles_df['odometer'].fillna(odometer_mean)
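Filling with the overall mean gives every car with a missing reading the same mileage, and the mean can be pulled up by a few very high readings. One alternative, sketched below under the assumption that mileage roughly tracks `model_year`, is to fill each gap with the median of cars from the same model year, falling back to the overall median when a year has no readings at all. The helper `fill_with_group_median` and the toy rows are hypothetical:

```python
import numpy as np
import pandas as pd


def fill_with_group_median(df: pd.DataFrame, column: str, group_col: str) -> pd.Series:
    """Fill NaNs in `column` with the median of the row's `group_col` group,
    falling back to the overall median for groups with no observed values."""
    group_median = df.groupby(group_col)[column].transform('median')
    return df[column].fillna(group_median).fillna(df[column].median())


# Hypothetical listings: 2020 has no observed odometer readings at all.
toy = pd.DataFrame({
    'model_year': [2010, 2010, 2010, 2018, 2018, 2020],
    'odometer': [150000, np.nan, 130000, 30000, np.nan, np.nan],
})
toy['odometer'] = fill_with_group_median(toy, 'odometer', 'model_year')
print(toy['odometer'].tolist())
# [150000.0, 140000.0, 130000.0, 30000.0, 30000.0, 130000.0]
```

On the real data this would replace the mean fill with `vehicles_df['odometer'] = fill_with_group_median(vehicles_df, 'odometer', 'model_year')`.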

In [None]:
# Converting 'odometer' type into int:
vehicles_df['odometer'] = vehicles_df['odometer'].astype(int)
vehicles_df.sample(n=10)

In [None]:
# Converting 'date_posted' type into datetime:
vehicles_df['date_posted'] = pd.to_datetime(vehicles_df['date_posted'])
vehicles_df.head()
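With `date_posted` stored as datetimes, the `.dt` accessor exposes its components. As one possible use, sketched here on two hypothetical rows, a car's age when the ad was posted can be derived from `model_year`; the column name `age_years` and the convention that a car posted in its own model year has age 0 are illustrative choices:

```python
import pandas as pd

# Hypothetical listings with dates still stored as strings.
toy = pd.DataFrame({
    'date_posted': ['2018-06-23', '2019-01-07'],
    'model_year': [2011, 2013],
})
toy['date_posted'] = pd.to_datetime(toy['date_posted'])

# Whole years between the model year and the posting year (0 if posted in the model year).
toy['age_years'] = toy['date_posted'].dt.year - toy['model_year']
print(toy['age_years'].tolist())  # [7, 6]
```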


In [None]:
# Checking again for missing values:
vehicles_df.isnull().sum()

In [None]:
vehicles_df.info()

## Price VS Odometer:

Analyzing how the price of a car decreases with increased mileage.

In [None]:
# Selecting the columns used to see how price is affected by mileage and model year:
price_model_odo = vehicles_df[['price', 'model_year', 'odometer']]
price_model_odo.head()

In [None]:
# Scatterplot showing how the price is affected by the mileage:
plt.figure(figsize=(10, 6))
sns.scatterplot(data=price_model_odo, x='odometer', y='price', hue='model_year', palette='viridis')
plt.title('Price vs Odometer')
plt.xlabel('Odometer (miles)')
plt.ylabel('Price (USD)')
plt.show()

Looking at the scatterplot above, cars with lower odometer readings generally have higher prices, while cars with higher odometer readings tend to have lower prices.
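The trend in the scatterplot can also be summarized with a single number. Because the relationship need not be linear, a rank-based (Spearman) correlation is a reasonable choice; the sketch below computes it as the Pearson correlation of the ranks, so it needs only pandas. The helper name and the toy values are hypothetical:

```python
import pandas as pd


def spearman_rank_corr(df: pd.DataFrame, x: str, y: str) -> float:
    """Spearman rank correlation: Pearson correlation of the (average) ranks."""
    return df[x].rank().corr(df[y].rank())


# Hypothetical listings where price mostly falls as mileage rises.
toy = pd.DataFrame({
    'odometer': [20000, 60000, 90000, 150000, 210000],
    'price': [24000, 18000, 19000, 7000, 3500],
})
rho = spearman_rank_corr(toy, 'odometer', 'price')
print(round(rho, 2))  # -0.9
```

On the real data, `spearman_rank_corr(vehicles_df, 'odometer', 'price')` would quantify the negative trend; values near -1 indicate that price falls almost consistently as mileage rises.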

## Price Distribution by Condition:

Examining how a car's condition affects its price.

In [None]:
price_condition = vehicles_df[['price', 'condition']]
price_condition.head()

In [None]:
# Barplot showing how the mean price is affected by the condition:
plt.figure(figsize=(10, 6))
sns.barplot(data=price_condition, x='condition', y='price', errorbar=None)
plt.title('Mean Price by Car Condition')
plt.xlabel('Car Condition')
plt.ylabel('Mean Price (USD)')
plt.show()

Based on the plot above, cars in 'New' condition have the highest average price, followed by 'Like New' and 'Excellent' vehicles, which suggests that buyers pay a premium for better-kept cars. Cars in 'Salvage' and 'Fair' condition have significantly lower prices, likely reflecting lower demand.
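`sns.barplot` plots the mean price per condition by default, and a mean can be inflated by a few very expensive listings. A median per condition is a more robust summary; the sketch below uses a hypothetical helper and toy data in which one outlier would push the 'good' mean to about 26,667 while its median stays at 11,000. Passing `estimator=np.median` to `sns.barplot` would plot the same statistic.

```python
import pandas as pd


def median_price_by_condition(df: pd.DataFrame) -> pd.Series:
    """Median price for each condition, highest first."""
    return df.groupby('condition')['price'].median().sort_values(ascending=False)


# Hypothetical listings; the 60000 'good' car is an outlier.
toy = pd.DataFrame({
    'condition': ['new', 'new', 'good', 'good', 'good', 'salvage'],
    'price': [30000, 34000, 9000, 11000, 60000, 2500],
})
medians = median_price_by_condition(toy)
print(medians.to_dict())  # {'new': 32000.0, 'good': 11000.0, 'salvage': 2500.0}
```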

## Days listed Vs. Condition Vs. Mileage:

Exploring how mileage and condition affect how long cars stay listed.

In [None]:
# Drawing a reproducible random sample of 1,000 listings:
sampled_data = vehicles_df.sample(n=1000, random_state=42)

In [None]:
days_on_the_market = sampled_data[['condition', 'odometer', 'days_listed']]
days_on_the_market.head()                               

In [None]:
# Create a figure with 2 subplots:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))

# First subplot for Days Listed Vs. Condition:
sns.histplot(data=days_on_the_market, x='days_listed', hue='condition', bins=20, ax=ax1, palette='muted')
ax1.set_title('Days Listed Distribution by Condition')
ax1.set_xlabel('Days Listed')
ax1.set_ylabel('Number of Cars')

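# Second subplot for Days Listed Vs. Odometer, with the odometer grouped into mileage bands
# (a raw numeric 'odometer' hue would draw a separate histogram for almost every reading):
mileage_band = pd.cut(days_on_the_market['odometer'], bins=[0, 100_000, 150_000, np.inf],
                      labels=['<100k', '100k-150k', '150k+'], right=False)
sns.histplot(data=days_on_the_market.assign(mileage_band=mileage_band), x='days_listed',
             hue='mileage_band', bins=20, ax=ax2, palette='pastel')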
ax2.set_title('Days Listed Distribution by Mileage')
ax2.set_xlabel('Days Listed')
ax2.set_ylabel('Number of Cars')

# Adjust layout to prevent overlap:
plt.tight_layout()
plt.show()

From the histograms plotted above, we can see that:

- Cars with low mileage (<100k) and in better condition ('Good', 'Excellent', or 'Like New') sell faster, typically within 25-50 days.

- Cars with high mileage (>150k) and in poor condition ('Fair' or 'Salvage') remain listed longer, likely due to lower demand.
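The histograms look at condition and mileage one at a time, while the points above describe the two together. A table of median `days_listed` for each condition and mileage band would check that combined claim directly. The sketch below assumes mileage bands matching the thresholds in the text (under 100k, 100k to 150k, 150k and above); the helper name and the toy rows are hypothetical:

```python
import numpy as np
import pandas as pd


def median_days_by_condition_and_mileage(df: pd.DataFrame) -> pd.DataFrame:
    """Median days_listed per (condition, mileage band); rows are conditions."""
    bands = pd.cut(df['odometer'], bins=[0, 100_000, 150_000, np.inf],
                   labels=['<100k', '100k-150k', '150k+'], right=False)
    return (df.assign(mileage_band=bands)
              .groupby(['condition', 'mileage_band'], observed=True)['days_listed']
              .median()
              .unstack())


# Hypothetical listings.
toy = pd.DataFrame({
    'condition': ['excellent', 'excellent', 'fair', 'fair', 'good'],
    'odometer': [50000, 80000, 200000, 180000, 120000],
    'days_listed': [30, 40, 70, 90, 45],
})
table = median_days_by_condition_and_mileage(toy)
print(table)
```

Combinations with no listings show up as NaN. On the real data, the same call could be made on `sampled_data` or on the full `vehicles_df`.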

## General Conclusion:

The used-car market is strongly influenced by condition, mileage, and age. Cars in better condition ('Good', 'Excellent', or 'Like New') and with lower mileage (<100k) tend to sell faster, typically within 25-50 days, and command higher prices, likely due to strong demand.

In contrast, high-mileage cars (>150k) and those in poor condition ('Fair' or 'Salvage') remain listed longer and are priced lower, reflecting reduced buyer interest.