# Software Development Tool Project

The focus of this project is to provide additional practice with common software engineering tasks.

Using the provided data (`vehicles_us.csv`) on new and used vehicles listed for sale, we will build
a web app that filters the listings by a variety of conditions.

In [6]:
import streamlit as st
import pandas as pd
import plotly.express as px

In [7]:
df = pd.read_csv('vehicles_us.csv')

In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 51525 entries, 0 to 51524
Data columns (total 13 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   price         51525 non-null  int64  
 1   model_year    47906 non-null  float64
 2   model         51525 non-null  object 
 3   condition     51525 non-null  object 
 4   cylinders     46265 non-null  float64
 5   fuel          51525 non-null  object 
 6   odometer      43633 non-null  float64
 7   transmission  51525 non-null  object 
 8   type          51525 non-null  object 
 9   paint_color   42258 non-null  object 
 10  is_4wd        25572 non-null  float64
 11  date_posted   51525 non-null  object 
 12  days_listed   51525 non-null  int64  
dtypes: float64(4), int64(2), object(7)
memory usage: 5.1+ MB


In [13]:
df['model_year'] = df['model_year'].fillna(df.groupby('model')['model_year'].transform('median'))
df['model_year'].unique()

array([2011. , 2013. , 2003. , 2017. , 2014. , 2015. , 2012. , 2008. ,
       2018. , 2009. , 2010. , 2007. , 2004. , 2005. , 2001. , 2006. ,
       1966. , 1994. , 2019. , 2000. , 2016. , 1993. , 1999. , 2006.5,
       1997. , 2002. , 1981. , 1995. , 1996. , 1975. , 1998. , 1985. ,
       1977. , 1987. , 1974. , 1990. , 1992. , 1991. , 1972. , 1967. ,
       1988. , 1969. , 1989. , 1978. , 1965. , 1979. , 1968. , 1986. ,
       1980. , 1964. , 1963. , 1984. , 1982. , 2010.5, 1973. , 1970. ,
       1955. , 1971. , 1976. , 1983. , 1954. , 1962. , 1948. , 1960. ,
       1908. , 1961. , 1936. , 1949. , 1958. , 1929. ])
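
The output above contains half-years such as `2006.5` and `2010.5`. These appear when a model has an even number of known years and the median falls between two of them. A minimal sketch of the same group-wise fill with a rounding step added, run on made-up toy data (the `toy` frame and its values are illustrative assumptions, not rows from `vehicles_us.csv`):

```python
import pandas as pd

# Toy data standing in for vehicles_us.csv (values are made up for illustration)
toy = pd.DataFrame({
    "model": ["a", "a", "a", "b", "b"],
    "model_year": [2006.0, 2007.0, None, 2010.0, None],
})

# Median of each model's known years, broadcast back onto every row of that model
group_median = toy.groupby("model")["model_year"].transform("median")

# Fill the gaps, then round so that imputed years are whole numbers
toy["model_year"] = toy["model_year"].fillna(group_median).round()

print(toy)
```

Whether to round, or to keep the fractional median, is a judgment call; rounding simply keeps the column compatible with whole-year filters.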

In [15]:
df['cylinders'] = df['cylinders'].fillna(df.groupby('model')['cylinders'].transform('median'))
df['cylinders'].unique()

array([ 6.,  4.,  8.,  5., 10.,  3., 12.])

In [18]:
df['odometer'] = df['odometer'].fillna(df.groupby('model_year')['odometer'].transform('median'))
df['odometer'].unique()

array([145000.,  88705., 110000., ..., 121778., 181500., 139573.])
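
One caveat with a group-wise fill: if every row in a group is missing the value, that group's median is itself `NaN` and the gap stays unfilled. The sketch below uses hypothetical toy values to show the effect, with a fallback to the overall median as one possible remedy; it is not a step taken in this notebook.

```python
import pandas as pd

# Toy data: the 1929 group has no known odometer reading at all
toy = pd.DataFrame({
    "model_year": [2010, 2010, 1929],
    "odometer": [100000.0, None, None],
})

# Group-wise median fill, as in the cell above
by_year = toy.groupby("model_year")["odometer"].transform("median")
toy["odometer"] = toy["odometer"].fillna(by_year)

# Count what the group-wise fill could not resolve
remaining = int(toy["odometer"].isna().sum())
print("still missing after group fill:", remaining)

# Assumed fallback: use the overall median for anything left over
toy["odometer"] = toy["odometer"].fillna(toy["odometer"].median())
```

Checking `df['odometer'].isna().sum()` after the fill is a cheap way to confirm whether such a fallback is needed on the real data.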

In [21]:
df['paint_color'] = df['paint_color'].fillna('No Info')
df['paint_color'].unique()

array(['No Info', 'white', 'red', 'black', 'blue', 'grey', 'silver',
       'custom', 'orange', 'yellow', 'brown', 'green', 'purple'],
      dtype=object)

In [22]:
df['is_4wd'] = df['is_4wd'].fillna(0)
df['is_4wd'].unique()

array([1., 0.])

In [None]:
#creating header with an option to filter the data and the checkbox:
#let users decide whether they want to see new cars from dealers or not


st.header('Market of used cars')
st.write("""
Filter the data below to see the information by different vehicle types
""")
show_new_cars = st.checkbox('Include new cars from dealers')

In [None]:
show_new_cars

In [None]:
if not show_new_cars:
    df = df[df.condition != 'new']

In [None]:
#creating options for filter from all vehicle types and different years
type_choice = df['type'].unique()
make_type_choice = st.selectbox('Select vehicle type:', type_choice)

In [None]:
make_type_choice

In [None]:
#next let's create a slider for years, so that users can filter cars by years of production
#creating min and max years as limits for sliders
min_year, max_year = int(df['model_year'].min()), int(df['model_year'].max())

year_range = st.slider(
    "Choose years",
    value = (min_year, max_year), min_value = min_year, max_value = max_year)

In [None]:
year_range

In [None]:
#creating actual range based on slider that will be used to filter in the dataset
actual_range = list(range(year_range[0], year_range[1]+1))

In [None]:
#filtering dataset on chosen vehicle type and chosen year range
filtered_type = df[(df.type == make_type_choice) & (df.model_year.isin(actual_range))]

#showing the final table in streamlit
st.dataframe(filtered_type)
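
Because the imputed `model_year` column contains fractional values such as `2006.5`, an `isin` test against a list of whole years silently drops those rows. A possible alternative is an inclusive range test with `Series.between`. The sketch below uses toy data, and `year_range` and `vehicle_type` are hard-coded stand-ins for the slider and selectbox values:

```python
import pandas as pd

# Toy data (made up); note the fractional year left over from median imputation
toy = pd.DataFrame({
    "type": ["suv", "suv", "sedan", "suv"],
    "model_year": [2005.0, 2006.5, 2006.0, 2012.0],
})

year_range = (2005, 2010)   # stand-in for the st.slider value
vehicle_type = "suv"        # stand-in for the st.selectbox value

# isin against whole years misses 2006.5
with_isin = toy[(toy["type"] == vehicle_type)
                & toy["model_year"].isin(range(year_range[0], year_range[1] + 1))]

# between() is inclusive on both ends by default and keeps 2006.5
with_between = toy[(toy["type"] == vehicle_type)
                   & toy["model_year"].between(year_range[0], year_range[1])]

print(len(with_isin), len(with_between))
```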

In [None]:
filtered_type

In [None]:
st.header("Price analysis")
st.write("""
Let's analyze what influences price the most. We will check how distribution of price varies depending on
transmission, cylinders, body type and condition
""")

#will create histograms with the split by parameter of choice: transmission, cylinders, type, and condition

#creating list of options to choose from
list_for_hist = ['transmission', 'cylinders', 'type', 'condition']

#creating selectbox
choice_for_hist = st.selectbox('Split for price distribution', list_for_hist)

#plotly histogram, where price is split by the choice made in the select box
fig1 = px.histogram(df, x = 'price', color = choice_for_hist)

#adding title ({} is the placeholder that format() fills with the chosen column)
fig1.update_layout(title = "<b> Split of price by {}</b>".format(choice_for_hist))

#embedding into streamlit
st.plotly_chart(fig1)

In [None]:
fig1.show()

In [None]:
# creating age category of cars, because we want to take it into account when analyzing the price
df['age'] = 2023 - df['model_year']

def age_category(x):
    #missing ages (NaN) fail every comparison and fall through to 'unknown'
    if x < 5:
        return '<5'
    elif 5 <= x < 10:
        return '5-10'
    elif 10 <= x < 20:
        return '10-20'
    elif x >= 20:
        return '20+'
    else:
        return 'unknown'

df['age_category'] = df['age'].apply(age_category)
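
The same binning can also be expressed without a row-wise `apply`. The sketch below uses `pd.cut` with left-closed bins on a small made-up `ages` series; it is an alternative formulation under the same bin edges, not the approach used in this notebook.

```python
import numpy as np
import pandas as pd

# Made-up ages, including a bin edge (5.0, 20.0) and a missing value
ages = pd.Series([3.0, 5.0, 12.5, 20.0, np.nan])

labels = ["<5", "5-10", "10-20", "20+"]
bins = [-np.inf, 5, 10, 20, np.inf]

# right=False makes each bin include its left edge: [5, 10), [10, 20), ...
categories = pd.cut(ages, bins=bins, labels=labels, right=False)

# Missing ages come back as NaN; register and assign an explicit 'unknown' label
categories = categories.cat.add_categories("unknown").fillna("unknown")

print(categories.tolist())
```

A categorical column built this way also keeps the labels in bin order, which can help when the categories are used as a plot legend.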

In [None]:
df['age_category']

In [None]:
st.write("""
Now let's check how price is affected by odometer, number of cylinders, or days listed
""")

#distribution of price depending on odometer, cylinders, days_listed with the split by age category

list_for_scatter = ['odometer', 'cylinders', 'days_listed']
choice_for_scatter = st.selectbox('Price dependency on ', list_for_scatter)
fig2 = px.scatter(df, x = 'price', y = choice_for_scatter, color = 'age_category', hover_data = ['model_year'])

fig2.update_layout(
    title = '<b> Price vs {}</b>'.format(choice_for_scatter))
st.plotly_chart(fig2)

In [None]:
fig2

# Conclusion

After handling the missing values in the dataset, we have the code for a web app that narrows a search by filtering the data on vehicle type and model year. The app also shows how price varies with different factors, including transmission, condition, odometer reading, number of cylinders, and number of days listed.