## Vehicle Breakdown Visuals

In this notebook we build visuals around the most common searches made by people looking to buy a vehicle. The goal is to optimize the search of vehicles in inventory by model, odometer reading, cylinders, and transmission, so that the best purchase is the easiest to find.

In [2]:
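import matplotlib.pyplot as plt  # needed for the bar chart built with plt.subplots below
import pandas as pd
import plotly.express as px
import streamlit as st

# Load the vehicle listings
df = pd.read_csv('vehicles_us.csv')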

In [3]:
display(df.sample(10))
print(df.info())

| | price | model_year | model | condition | cylinders | fuel | odometer | transmission | type | paint_color | is_4wd | date_posted | days_listed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4625 | 9990 | 2005.0 | ford f-250 sd | good | 8.0 | gas | | automatic | truck | white | | 2019-02-02 | 59 |
| 33013 | 4995 | 2009.0 | chevrolet impala | excellent | 6.0 | gas | 82000.0 | automatic | sedan | grey | | 2018-05-18 | 27 |
| 11525 | 1 | 2018.0 | chevrolet colorado | excellent | 6.0 | gas | 8032.0 | automatic | truck | white | 1.0 | 2018-12-26 | 37 |
| 12289 | 5500 | 2001.0 | chevrolet silverado 1500 | good | 8.0 | gas | 238465.0 | automatic | pickup | brown | 1.0 | 2019-01-31 | 62 |
| 34201 | 6000 | 2006.0 | ford mustang | good | 6.0 | gas | 103364.0 | automatic | coupe | | | 2018-10-19 | 46 |
| 6033 | 4000 | 2011.0 | ford fusion | excellent | 6.0 | gas | 130000.0 | automatic | sedan | white | | 2018-11-12 | 24 |
| 28245 | 3695 | 2006.0 | dodge grand caravan | good | 6.0 | gas | | automatic | mini-van | red | | 2018-07-20 | 15 |
| 4394 | 17963 | 2012.0 | jeep grand cherokee | excellent | 6.0 | gas | 73375.0 | automatic | SUV | black | 1.0 | 2019-03-04 | 67 |
| 270 | 34900 | 2017.0 | ford f-250 sd | excellent | 8.0 | gas | | automatic | truck | white | 1.0 | 2018-12-06 | 56 |
| 27505 | 6995 | 2011.0 | gmc acadia | good | 6.0 | gas | 160560.0 | automatic | SUV | grey | | 2018-09-04 | 31 |


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 51525 entries, 0 to 51524
Data columns (total 13 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   price         51525 non-null  int64  
 1   model_year    47906 non-null  float64
 2   model         51525 non-null  object 
 3   condition     51525 non-null  object 
 4   cylinders     46265 non-null  float64
 5   fuel          51525 non-null  object 
 6   odometer      43633 non-null  float64
 7   transmission  51525 non-null  object 
 8   type          51525 non-null  object 
 9   paint_color   42258 non-null  object 
 10  is_4wd        25572 non-null  float64
 11  date_posted   51525 non-null  object 
 12  days_listed   51525 non-null  int64  
dtypes: float64(4), int64(2), object(7)
memory usage: 5.1+ MB
None


Notes:
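Data loaded successfully: 51,525 rows and 13 columns. Five columns (model_year, cylinders, odometer, paint_color, is_4wd) contain missing values. Next, check for errors such as duplicated rows.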
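
The `info()` output above shows fewer non-null values than rows in several columns. A minimal sketch of how those gaps could be counted per column follows. It uses `sample`, a hypothetical stand-in frame built by hand from three of the sampled rows above, so it runs without the CSV; on the real data the same two lines would be applied to `df`.

```python
import pandas as pd

# Hypothetical stand-in for df: three of the sampled rows shown above,
# with None wherever the listing has no value.
sample = pd.DataFrame({
    'model': ['ford f-250 sd', 'chevrolet impala', 'ford mustang'],
    'odometer': [None, 82000.0, 103364.0],
    'paint_color': ['white', 'grey', None],
    'is_4wd': [None, None, None],
})

# Count missing values per column, then express them as a share of all rows.
missing = sample.isna().sum()
missing_share = (missing / len(sample)).round(2)

print(pd.DataFrame({'missing': missing, 'share': missing_share}))
```

Whether to drop or fill these values depends on the chart: the plots in this notebook simply leave out rows where the plotted value is missing.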

In [4]:
df.duplicated().sum()

0

Notes:
No duplicated rows were found, so we can begin the visualization process.

In [13]:
# Group by 'model' and sum the 'odometer' values
model_miles = df.groupby('model')['odometer'].sum().reset_index()

# Streamlit app
st.header('Total Miles by Model')

# Plot the data
fig, ax = plt.subplots(figsize=(21, 7))
ax.bar(model_miles['model'], model_miles['odometer'], color='skyblue')
ax.set_xlabel('Model')
ax.set_ylabel('Total Miles')
ax.set_title('Total Miles by Model')
plt.xticks(rotation=90)
plt.tight_layout()

# Display the plot in Streamlit
st.pyplot(fig)



Notes:
We now have a chart that compares the models by the total miles on their odometers. We can go into more depth by taking the top brands and seeing whether there is a difference based on the number of cylinders.
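
The dataset has no brand column, so comparing the top brands needs one to be derived first. The sketch below is one possible way to do that, not the method used in this notebook: it assumes the brand is the first word of `model`, stores it in a hypothetical `brand` column, and averages `odometer` by brand and cylinder count. `sample` is a hand-built stand-in made from rows in the sample shown earlier.

```python
import pandas as pd

# Hypothetical stand-in for df, built from rows in the sample shown earlier.
sample = pd.DataFrame({
    'model': ['chevrolet impala', 'chevrolet colorado',
              'chevrolet silverado 1500', 'ford mustang', 'ford fusion',
              'jeep grand cherokee', 'gmc acadia'],
    'cylinders': [6.0, 6.0, 8.0, 6.0, 6.0, 6.0, 6.0],
    'odometer': [82000.0, 8032.0, 238465.0, 103364.0, 130000.0,
                 73375.0, 160560.0],
})

# Assumption: the brand is the first word of the 'model' string.
sample['brand'] = sample['model'].str.split().str[0]

# Keep the two brands with the most listings in this small sample.
top_brands = sample['brand'].value_counts().head(2).index

# Average miles for each brand and cylinder count among the top brands.
avg_miles = (
    sample[sample['brand'].isin(top_brands)]
    .groupby(['brand', 'cylinders'])['odometer']
    .mean()
)

print(avg_miles)
```

The first-word rule would need a check against the full list of models before being relied on, since a multi-word brand name would be split incorrectly.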

In [9]:
# Streamlit app
st.title('Miles by Number of Cylinders')

# Create a scatter plot using Plotly Express
fig = px.scatter(df, x='cylinders', y='odometer', 
                 title='Miles by Number of Cylinders',
                 labels={'cylinders': 'Number of Cylinders', 'odometer': 'Miles'},
                 color='cylinders')

# Display the plot in Streamlit
st.plotly_chart(fig)


Notes:
With the difference in mileage by cylinder count visualized, we can add a chart for transmission type to compare average mileage.

In [7]:
# Group by 'transmission' and calculate the average 'odometer' values
transmission_avg_miles = df.groupby('transmission')['odometer'].mean().reset_index()

# Streamlit app
st.title('Average Miles by Transmission Type')

# Create a scatter plot using Plotly Express
fig = px.scatter(transmission_avg_miles, x='transmission', y='odometer', 
                 title='Average Miles by Transmission Type',
                 labels={'transmission': 'Transmission Type', 'odometer': 'Average Miles'},
                 color='transmission')

# Display the plot in Streamlit
st.plotly_chart(fig)

Notes: Finally, a price histogram and a scatter plot of price against model year give a broad overview before we draw any conclusions.

In [12]:
# Streamlit app
st.header('Vehicle Listings Analysis')

# Checkbox to show/hide histogram
show_histogram = st.checkbox('Show Price Histogram')

if show_histogram:
    fig_hist = px.histogram(df, x='price', title='Price Distribution')
    st.plotly_chart(fig_hist)

# Scatter plot
fig_scatter = px.scatter(df, x='model_year', y='price', color='condition', title='Price vs. Model Year')
st.plotly_chart(fig_scatter)

Conclusion:
Based on the charts above, the best choice among the vehicles listed is a V6 with a manual transmission, made by either Ford or Chevrolet.
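
A recommendation like this can be checked against the data with a small summary table. The sketch below is an assumption about how that check might look, not part of the original analysis: it counts listings and takes the median price and mileage for each combination of cylinders and transmission. `sample` is a hand-built stand-in from the rows sampled earlier, which all happen to be automatic, so it only illustrates the shape of the table.

```python
import pandas as pd

# Hypothetical stand-in for df, built from rows in the sample shown earlier.
sample = pd.DataFrame({
    'model': ['chevrolet impala', 'chevrolet silverado 1500', 'ford mustang',
              'ford fusion', 'jeep grand cherokee', 'gmc acadia'],
    'price': [4995, 5500, 6000, 4000, 17963, 6995],
    'cylinders': [6.0, 8.0, 6.0, 6.0, 6.0, 6.0],
    'transmission': ['automatic'] * 6,
    'odometer': [82000.0, 238465.0, 103364.0, 130000.0, 73375.0, 160560.0],
})

# One row per cylinders/transmission combination: number of listings,
# median asking price, and median odometer reading.
summary = sample.groupby(['cylinders', 'transmission']).agg(
    listings=('price', 'size'),
    median_price=('price', 'median'),
    median_miles=('odometer', 'median'),
)

print(summary)
```

Run on the full `df`, the same table would show how many manual V6 listings exist and how their price and mileage compare with the other combinations.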