# Car Price Exploration

This notebook explores the car price dataset to identify which features may influence car prices. It loads the raw data, checks for missing values, summarizes the columns, and visualizes the price distribution and the correlations between features.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Apply seaborn's whitegrid theme to all plots
sns.set_theme(style='whitegrid')

In [2]:
# Load the raw data
data_path = '../data/raw/cars_sample.csv'
cars_df = pd.read_csv(data_path)

# Display the first few rows of the dataset
cars_df.head()

In [3]:
# Check for missing values
missing_values = cars_df.isnull().sum()
missing_values[missing_values > 0]
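
The missing-value check above reports raw counts only. A minimal sketch of a report that also shows the share of rows affected is given below. The helper name `missing_report` and the toy columns `mileage` and `year` are hypothetical and stand in for whatever columns `cars_sample.csv` actually contains.

```python
import numpy as np
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Return missing-value counts and percentages for columns with gaps."""
    counts = df.isnull().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(1),
    })
    # Keep only columns that have at least one missing value, worst first
    return report[report["missing"] > 0].sort_values("missing", ascending=False)


# Toy data for illustration only (not taken from the real dataset)
toy_df = pd.DataFrame({
    "price": [5000.0, np.nan, 9000.0, 12000.0],
    "mileage": [120000.0, 90000.0, np.nan, np.nan],
    "year": [2008, 2011, 2013, 2016],
})

report = missing_report(toy_df)
print(report)
```

On the real data, the same helper could be called as `missing_report(cars_df)`.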

In [4]:
# Summary statistics of the dataset
cars_df.describe()

In [5]:
# Visualize the distribution of car prices
plt.figure(figsize=(10, 6))
sns.histplot(cars_df['price'], bins=30, kde=True)
plt.title('Distribution of Car Prices')
plt.xlabel('Price')
plt.ylabel('Count')
plt.show()

In [6]:
# Correlation heatmap
plt.figure(figsize=(12, 8))
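# Restrict to numeric columns: corr() raises an error on string columns in pandas >= 2.0
correlation = cars_df.select_dtypes(include='number').corr()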
sns.heatmap(correlation, annot=True, fmt='.2f', cmap='coolwarm', square=True)
plt.title('Correlation Heatmap')
plt.show()
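
A full heatmap can be hard to scan when the goal is to see which features track price. One possible way to summarize it is to rank the numeric features by the absolute value of their correlation with `price`, as sketched below. The helper name `rank_price_correlations` and the toy columns `mileage`, `year`, and `fuel_type` are hypothetical; only the `price` column is taken from this notebook.

```python
import pandas as pd


def rank_price_correlations(df: pd.DataFrame, target: str = "price") -> pd.Series:
    """Rank numeric features by absolute Pearson correlation with the target."""
    numeric = df.select_dtypes(include="number")
    # Correlation of every numeric column with the target, excluding the target itself
    corr = numeric.corr()[target].drop(target)
    # Order by strength of the relationship while keeping the original sign
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


# Toy data for illustration only (not taken from the real dataset)
toy_df = pd.DataFrame({
    "price": [5000, 7000, 9000, 12000, 15000],
    "mileage": [120000, 90000, 70000, 40000, 20000],
    "year": [2008, 2011, 2013, 2016, 2019],
    "fuel_type": ["gas", "diesel", "gas", "gas", "diesel"],
})

ranked = rank_price_correlations(toy_df)
print(ranked)
```

On the real data, `rank_price_correlations(cars_df)` would give the same kind of ranking. Correlation measures linear association only, so a low value does not rule out a nonlinear relationship with price.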

## Conclusion

In this notebook, we loaded the car price dataset, checked for missing values, plotted the distribution of car prices, and examined the correlations between numeric features. These results inform feature selection by pointing to the features most strongly associated with car prices.