In [78]:
import streamlit as st
import pandas as pd
import plotly.express as px

Car Sales Dashboard

This project is a simple web application dashboard built with Streamlit and Plotly. It visualizes car advertisement data to help users find and purchase a car. The dashboard includes a histogram of car prices and a scatter plot showing the relationship between odometer readings and prices.

In [79]:
df = pd.read_csv("C:/Users/luken/P4/vehicles_us.csv", sep=',')
df.head()

Unnamed: 0,price,model_year,model,condition,cylinders,fuel,odometer,transmission,type,paint_color,is_4wd,date_posted,days_listed
0,9400,2011.0,bmw x5,good,6.0,gas,145000.0,automatic,SUV,,1.0,2018-06-23,19
1,25500,,ford f-150,good,6.0,gas,88705.0,automatic,pickup,white,1.0,2018-10-19,50
2,5500,2013.0,hyundai sonata,like new,4.0,gas,110000.0,automatic,sedan,red,,2019-02-07,79
3,1500,2003.0,ford f-150,fair,8.0,gas,,automatic,pickup,,,2019-03-22,9
4,14900,2017.0,chrysler 200,excellent,4.0,gas,80903.0,automatic,sedan,black,,2019-04-02,28


In [80]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 51525 entries, 0 to 51524
Data columns (total 13 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   price         51525 non-null  int64  
 1   model_year    47906 non-null  float64
 2   model         51525 non-null  object 
 3   condition     51525 non-null  object 
 4   cylinders     46265 non-null  float64
 5   fuel          51525 non-null  object 
 6   odometer      43633 non-null  float64
 7   transmission  51525 non-null  object 
 8   type          51525 non-null  object 
 9   paint_color   42258 non-null  object 
 10  is_4wd        25572 non-null  float64
 11  date_posted   51525 non-null  object 
 12  days_listed   51525 non-null  int64  
dtypes: float64(4), int64(2), object(7)
memory usage: 5.1+ MB


Based on the initial observations, the data frame has 51525 entries and 13 columns. Some data types are not appropriate: 'date_posted', for example, is stored as object rather than datetime, so I'll fix the data types. The non-null count is lower than the number of entries for several columns, such as 'paint_color', so the data contains missing values.

In [81]:
# Count missing values per column
df.isna().sum()

price               0
model_year       3619
model               0
condition           0
cylinders        5260
fuel                0
odometer         7892
transmission        0
type                0
paint_color      9267
is_4wd          25953
date_posted         0
days_listed         0
dtype: int64

Five columns have missing values: model_year, cylinders, odometer, paint_color, and is_4wd.

In [82]:
# Handle missing values by grouping on model and taking the per-model median of model_year, odometer, and cylinders.
# Note: transform('median') assigns the group median to every row, not only to the rows with missing values.
df['model_year'] = df.groupby('model')['model_year'].transform('median')
df['odometer'] = df.groupby('model')['odometer'].transform('median')
df['cylinders'] = df.groupby('model')['cylinders'].transform('median')
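
Assigning the result of `transform('median')` straight back to a column replaces observed values as well as missing ones. A minimal sketch of filling only the gaps, using a small made-up frame (the `toy` data and its values are assumptions for illustration, not taken from the dataset):

```python
import pandas as pd

# Hypothetical toy data: two models, each with one missing odometer value.
toy = pd.DataFrame({
    "model": ["a", "a", "a", "b", "b"],
    "odometer": [100.0, None, 300.0, 50.0, None],
})

# Per-model median, aligned row by row with the original frame.
group_median = toy.groupby("model")["odometer"].transform("median")

# fillna only touches the missing entries; observed readings stay as they are.
toy["odometer"] = toy["odometer"].fillna(group_median)
print(toy)
```

With this pattern the remaining-NaN counts are the same as with direct assignment, because a model with no readings at all still has no median to fall back on.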

In [83]:
df.isna().sum()

price               0
model_year          0
model               0
condition           0
cylinders           0
fuel                0
odometer           41
transmission        0
type                0
paint_color      9267
is_4wd          25953
date_posted         0
days_listed         0
dtype: int64

In [84]:
# 41 odometer values are still missing (models with no readings at all); replace them with 0 so the rows do not have to be dropped.
df['odometer'] = df['odometer'].fillna(0)
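
Filling with 0 makes those 41 listings look like unused vehicles in later plots. One possible alternative, sketched here on made-up data, is to fall back to the overall median when a model has no odometer readings at all. Whether that suits this dataset is an assumption, and the `toy` frame is purely illustrative.

```python
import pandas as pd

# Hypothetical toy data: model "c" has no odometer readings at all.
toy = pd.DataFrame({
    "model": ["a", "a", "b", "c"],
    "odometer": [100.0, 300.0, 200.0, None],
})

# The per-model median stays NaN for "c" because the whole group is missing.
per_model = toy.groupby("model")["odometer"].transform("median")

# Median over all observed readings, used as the fallback value.
overall = toy["odometer"].median()

# Fill from the per-model median first, then from the overall median.
toy["odometer"] = toy["odometer"].fillna(per_model).fillna(overall)
print(toy)
```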

In [85]:
df.isna().sum()

price               0
model_year          0
model               0
condition           0
cylinders           0
fuel                0
odometer            0
transmission        0
type                0
paint_color      9267
is_4wd          25953
date_posted         0
days_listed         0
dtype: int64

In [86]:
# Convert model_year and odometer to int
df['model_year'] = df['model_year'].astype(int)
df['odometer'] = df['odometer'].astype(int)
print(df.head())

   price  model_year           model  condition  cylinders fuel  odometer  \
0   9400        2010          bmw x5       good        6.0  gas    108500   
1  25500        2011      ford f-150       good        8.0  gas    121928   
2   5500        2012  hyundai sonata   like new        4.0  gas    105976   
3   1500        2011      ford f-150       fair        8.0  gas    121928   
4  14900        2014    chrysler 200  excellent        4.0  gas     85000   

  transmission    type paint_color  is_4wd date_posted  days_listed  
0    automatic     SUV         NaN     1.0  2018-06-23           19  
1    automatic  pickup       white     1.0  2018-10-19           50  
2    automatic   sedan         red     NaN  2019-02-07           79  
3    automatic  pickup         NaN     NaN  2019-03-22            9  
4    automatic   sedan       black     NaN  2019-04-02           28  


In [87]:
# Convert date_posted to datetime
df['date_posted'] = pd.to_datetime(df['date_posted'])

In [88]:
print(df.dtypes)

price                    int64
model_year               int64
model                   object
condition               object
cylinders              float64
fuel                    object
odometer                 int64
transmission            object
type                    object
paint_color             object
is_4wd                 float64
date_posted     datetime64[ns]
days_listed              int64
dtype: object


In [89]:
# Check for duplicate rows
print(df.duplicated().head())
print(df.duplicated().sum())

0    False
1    False
2    False
3    False
4    False
dtype: bool
0
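
The check above reports no duplicate rows, so nothing has to be removed. Had there been any, `drop_duplicates` would be the usual way to remove them. A small sketch on a made-up frame (the `toy` data is an assumption for illustration):

```python
import pandas as pd

# Hypothetical toy data with one exact duplicate row.
toy = pd.DataFrame({
    "model": ["a", "a", "b"],
    "price": [1000, 1000, 2000],
})

# Number of rows that repeat an earlier row.
print(toy.duplicated().sum())

# Keep the first occurrence of each row and rebuild a clean 0..n-1 index.
deduped = toy.drop_duplicates().reset_index(drop=True)
print(deduped)
```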


In [90]:
fig = px.histogram(df, x='price', title='Vehicle Price Distribution')
fig.show()

Most vehicles are priced below $50k.
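
A reading taken from a histogram can be backed by a number. The sketch below computes the share of listings under a threshold; the `prices` series is made up for illustration, and in the notebook it would be `df['price']`.

```python
import pandas as pd

# Hypothetical prices; in the notebook this would be df["price"].
prices = pd.Series([1500, 5500, 9400, 14900, 25500, 375000])

threshold = 50_000

# The comparison yields a boolean Series; its mean is the fraction of True values.
share_below = (prices < threshold).mean()
print(f"{share_below:.1%} of listings are below ${threshold:,}")
```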

In [91]:
fig = px.histogram(df, x='odometer', title='Odometer Reading Distribution')
fig.show()

About 4000 vehicles have an odometer reading of about 120k miles.

In [92]:
fig = px.scatter(df, x='odometer', y='price', color='condition',
                title='Price vs. Odometer by Vehicle Condition')
fig.show()

This figure shows each vehicle's condition, mileage, and price. It is natural to assume that vehicles with 0 miles should cost more than the others, but the plot shows a vehicle with an odometer reading of 105252 miles priced at $375K.
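
An outlier spotted in a scatter plot can be looked up directly in the data. A minimal sketch using `idxmax`, on a made-up frame that only mimics the point described above (the `toy` rows are assumptions, not actual records from the dataset):

```python
import pandas as pd

# Hypothetical toy listings; in the notebook this would be df.
toy = pd.DataFrame({
    "model": ["a", "b", "c"],
    "odometer": [0, 105252, 80000],
    "price": [30000, 375000, 14900],
})

# idxmax returns the index label of the highest price; loc fetches that row.
most_expensive = toy.loc[toy["price"].idxmax()]
print(most_expensive[["model", "odometer", "price"]])
```

Inspecting the full row this way helps decide whether such a listing is a data-entry error or a genuinely expensive vehicle.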