# CAR SALES ADVERTISEMENT APP DEVELOPMENT

***

#### Purpose of the project: To develop and deploy a web application about car sales advertisements to a cloud service so that it is accessible to the public.

#### In this project I want to test two hypotheses:
#### - Car price is determined by the odometer reading: heavily used cars are cheaper than newer, less used ones.
#### - Recent cars have been used less than older cars.

#### To test them and support further analysis, I work through the steps below in sequence.

***

### 1. IMPORT THE DATA AND LIBRARY

The first step is to import the necessary libraries and open the data file.
The following steps process the data: they handle missing values, check for duplicates and prepare the data frame for further analysis.

In [108]:
import pandas as pd
import plotly.express as px
import numpy as np


In [109]:
df = pd.read_csv('vehicles_us.csv') #open the data set
print(df)

df.info()

       price  model_year           model  condition  cylinders fuel  odometer  \
0       9400      2011.0          bmw x5       good        6.0  gas  145000.0   
1      25500         NaN      ford f-150       good        6.0  gas   88705.0   
2       5500      2013.0  hyundai sonata   like new        4.0  gas  110000.0   
3       1500      2003.0      ford f-150       fair        8.0  gas       NaN   
4      14900      2017.0    chrysler 200  excellent        4.0  gas   80903.0   
...      ...         ...             ...        ...        ...  ...       ...   
51520   9249      2013.0   nissan maxima   like new        6.0  gas   88136.0   
51521   2700      2002.0     honda civic    salvage        4.0  gas  181500.0   
51522   3950      2009.0  hyundai sonata  excellent        4.0  gas  128000.0   
51523   7455      2013.0  toyota corolla       good        4.0  gas  139573.0   
51524   6300      2014.0   nissan altima       good        4.0  gas       NaN   

      transmission    type 

***

### 2. PRE-PROCESSING DATA AND CLEANING

In [110]:
df.info() #to display what columns have missing values


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 51525 entries, 0 to 51524
Data columns (total 13 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   price         51525 non-null  int64  
 1   model_year    47906 non-null  float64
 2   model         51525 non-null  object 
 3   condition     51525 non-null  object 
 4   cylinders     46265 non-null  float64
 5   fuel          51525 non-null  object 
 6   odometer      43633 non-null  float64
 7   transmission  51525 non-null  object 
 8   type          51525 non-null  object 
 9   paint_color   42258 non-null  object 
 10  is_4wd        25572 non-null  float64
 11  date_posted   51525 non-null  object 
 12  days_listed   51525 non-null  int64  
dtypes: float64(4), int64(2), object(7)
memory usage: 5.1+ MB


There are 5 columns with missing values:
- model_year
- cylinders
- odometer
- paint_color
- is_4wd
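
The list above can also be produced programmatically. The following is a minimal sketch on a made-up toy frame (the values and the name `toy` are illustrative assumptions, not the real data set) that reports the count and share of missing values for each affected column.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the listings data (illustrative values only).
toy = pd.DataFrame({
    "price": [9400, 25500, 5500, 1500],
    "model_year": [2011.0, np.nan, 2013.0, 2003.0],
    "odometer": [145000.0, 88705.0, np.nan, np.nan],
})

# Count and share of missing values per column.
missing = pd.DataFrame({
    "missing_count": toy.isna().sum(),
    "missing_share": toy.isna().mean(),
})

# Keep only the columns that actually have gaps, worst first.
missing = missing[missing["missing_count"] > 0].sort_values(
    "missing_count", ascending=False
)
print(missing)
```

Seeing the share next to the count helps decide whether a column can be filled or needs a closer look first.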

1. clean 'cylinders' column from missing values.

In [111]:
print (df['cylinders'].isna().sum()) #to display missing values


5260


In [112]:
df['cylinders'] = df['cylinders'].fillna(0.0) #to replace missing values by 0.0
print (df['cylinders'].isna().sum())


0


5260 rows with missing values have been replaced with 0.

2. clean paint_color column from missing values.

In [113]:
print (df['paint_color'].isna().sum())


9267


In [114]:
df['paint_color'] = df['paint_color'].fillna('unkown') #to replace missing values by unkown
print (df['paint_color'].isna().sum())


0


9267 rows with missing values have been replaced by 'unkown'.

3. clean is_4wd column from missing values.

In [115]:
print (df['is_4wd'].isna().sum()) #to display missing values


25953


In [116]:
df['is_4wd'] = df['is_4wd'].fillna(0.0) #to replace missing values by 0.0
print (df['is_4wd'].isna().sum())


0


25953 rows with missing values have been replaced with 0 value.

4. clean 'model_year' column from missing values.

In [117]:
print (df['model_year'].isna().sum()) #to identify missing values

3619


I can't simply replace the missing years with 0 or delete the rows, because that would lose a lot of data. Instead, I will look at the proportion of missing values per car model to see if there is any pattern.

In [118]:
model_missing_prop = df.groupby('model')['model_year'].apply(lambda x: x.isnull().mean())

model_missing_prop = model_missing_prop.reset_index()
model_missing_prop.columns = ['model', 'missing_prop']

fig = px.bar(model_missing_prop, x='model', y='missing_prop', title='Missing Model Year Proportion by Car Model')
fig.update_xaxes(tickangle=45)
fig.show()


The visualization above shows no specific pattern: the proportion of missing values is similar across models, except for two models with a noticeably higher proportion than the rest.

I will fill each missing model_year with the median model_year of the cars that share the same model.

In [119]:
df['model_year'] = df['model_year'].fillna(df.groupby(['model'])['model_year'].transform('median'))
print(df['model_year'])

0        2011.0
1        2011.0
2        2013.0
3        2003.0
4        2017.0
          ...  
51520    2013.0
51521    2002.0
51522    2009.0
51523    2013.0
51524    2014.0
Name: model_year, Length: 51525, dtype: float64


In [120]:
print (df['model_year'].isna().sum())


0


In [121]:
missing_values = df['model_year'].isna()

# Explore unique values (might reveal non-numeric characters)
print(df['model_year'].unique())

[2011.  2013.  2003.  2017.  2014.  2015.  2012.  2008.  2018.  2009.
 2010.  2007.  2004.  2005.  2001.  2006.  1966.  1994.  2019.  2000.
 2016.  1993.  1999.  2006.5 1997.  2002.  1981.  1995.  1996.  1975.
 1998.  1985.  1977.  1987.  1974.  1990.  1992.  1991.  1972.  1967.
 1988.  1969.  1989.  1978.  1965.  1979.  1968.  1986.  1980.  1964.
 1963.  1984.  1982.  2010.5 1973.  1970.  1955.  1971.  1976.  1983.
 1954.  1962.  1948.  1960.  1908.  1961.  1936.  1949.  1958.  1929. ]


In [122]:
df['model_year'] = df['model_year'].astype(int)
print(df['model_year'].unique())


[2011 2013 2003 2017 2014 2015 2012 2008 2018 2009 2010 2007 2004 2005
 2001 2006 1966 1994 2019 2000 2016 1993 1999 1997 2002 1981 1995 1996
 1975 1998 1985 1977 1987 1974 1990 1992 1991 1972 1967 1988 1969 1989
 1978 1965 1979 1968 1986 1980 1964 1963 1984 1982 1973 1970 1955 1971
 1976 1983 1954 1962 1948 1960 1908 1961 1936 1949 1958 1929]


The column now has 0 missing values, and the years are stored as integers without the trailing '.0'. Note that the integer cast truncates the half-year medians (2006.5 and 2010.5) to 2006 and 2010.

The 3619 missing values have been replaced with the median model_year of the same model to restore the year values.
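
To make the group-wise fill easier to follow, here is a minimal sketch on a made-up toy frame (the models `a` and `b` and their years are illustrative assumptions). It shows how `transform('median')` aligns one median per model to every row, why a half-year value such as 2006.5 can appear, and what the integer cast does to it.

```python
import numpy as np
import pandas as pd

# Toy data: two models, each with one missing year (illustrative values).
toy = pd.DataFrame({
    "model": ["a", "a", "a", "b", "b", "b"],
    "model_year": [2006.0, 2007.0, np.nan, 2010.0, 2014.0, np.nan],
})

# Median of the known years within each model, aligned to the original rows.
group_median = toy.groupby("model")["model_year"].transform("median")

# Fill only the gaps; observed values stay untouched.
filled = toy["model_year"].fillna(group_median)
print(filled.tolist())

# Casting to int drops the decimal part, so a half-year median is truncated.
as_int = filled.astype(int)
print(as_int.tolist())
```

With an even number of known years in a group, the median falls between two years, which is where the half-year values come from.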

5. clean odometer column from missing values.

In [123]:
print (df['odometer'].isna().sum()) #to identify missing values


7892


In [124]:
df['odometer'] = df['odometer'].fillna(df.groupby(['model_year'])['odometer'].transform('median'))
print(df['odometer'])


0        145000.0
1         88705.0
2        110000.0
3        161397.0
4         80903.0
           ...   
51520     88136.0
51521    181500.0
51522    128000.0
51523    139573.0
51524     90000.0
Name: odometer, Length: 51525, dtype: float64


In [125]:
df['odometer'] = df['odometer'].fillna(0.0) #to replace missing values by 0.0
print (df['odometer'].isna().sum())

0


In [126]:
df['odometer'] = df['odometer'].astype(int)
print(df['odometer'].unique())

[145000  88705 110000 ... 121778 181500 139573]


The 7892 missing values have been replaced with the median odometer reading of cars from the same model_year; any value still missing after that step was set to 0. The column has then been converted to integers, which removes the decimal part.
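
The two-stage fill can be sketched on a made-up toy frame (all values are illustrative assumptions). The sketch counts how many rows the group median cannot fill, and shows the overall median as one possible fallback. That fallback is an alternative I am assuming for illustration; the notebook itself falls back to 0.

```python
import numpy as np
import pandas as pd

# Toy data: the 1929 car has no known odometer reading in its year group.
toy = pd.DataFrame({
    "model_year": [2010, 2010, 2010, 1929, 2015, 2015],
    "odometer": [100000.0, 120000.0, np.nan, np.nan, 40000.0, np.nan],
})

# Stage 1: median odometer among listings with the same model year.
by_year = toy.groupby("model_year")["odometer"].transform("median")
stage1 = toy["odometer"].fillna(by_year)

# Rows still missing belong to years with no known reading at all.
still_missing = int(stage1.isna().sum())
print("Still missing after stage 1:", still_missing)

# Stage 2 (assumed alternative to filling with 0): use the overall median.
overall_median = toy["odometer"].median()
stage2 = stage1.fillna(overall_median)
print(stage2.tolist())
```

Checking `still_missing` before the fallback shows how many listings would otherwise end up with an odometer of 0.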

In [127]:
df.columns #to display if there are typos or other errors on column titles

Index(['price', 'model_year', 'model', 'condition', 'cylinders', 'fuel',
       'odometer', 'transmission', 'type', 'paint_color', 'is_4wd',
       'date_posted', 'days_listed'],
      dtype='object')

I am checking whether the column titles need cleaning; in this case they are already clean.

In [128]:
df.isnull().sum() #display of missing values after processing


price           0
model_year      0
model           0
condition       0
cylinders       0
fuel            0
odometer        0
transmission    0
type            0
paint_color     0
is_4wd          0
date_posted     0
days_listed     0
dtype: int64

The missing values on the columns have been cleaned.

In [129]:
df.duplicated() #checking for duplicates in the data frame


0        False
1        False
2        False
3        False
4        False
         ...  
51520    False
51521    False
51522    False
51523    False
51524    False
Length: 51525, dtype: bool

In [130]:
df[df.duplicated()] #display the duplicated rows, if any


Unnamed: 0,price,model_year,model,condition,cylinders,fuel,odometer,transmission,type,paint_color,is_4wd,date_posted,days_listed


In [131]:
df.duplicated().sum() #number of duplicated rows


0

There are no duplicate rows in the data frame.

In the next steps I check for non-obvious duplicates in some columns, such as spelling variations of the same model or brand.

In [132]:
df.model.nunique() #number of distinct model names, to check for hidden duplicates


100

There are 100 unique model names on the data frame.

In [133]:
sorted(df.model.unique()) #to scan for duplicates in the 'model' column


['acura tl',
 'bmw x5',
 'buick enclave',
 'cadillac escalade',
 'chevrolet camaro',
 'chevrolet camaro lt coupe 2d',
 'chevrolet colorado',
 'chevrolet corvette',
 'chevrolet cruze',
 'chevrolet equinox',
 'chevrolet impala',
 'chevrolet malibu',
 'chevrolet silverado',
 'chevrolet silverado 1500',
 'chevrolet silverado 1500 crew',
 'chevrolet silverado 2500hd',
 'chevrolet silverado 3500hd',
 'chevrolet suburban',
 'chevrolet tahoe',
 'chevrolet trailblazer',
 'chevrolet traverse',
 'chrysler 200',
 'chrysler 300',
 'chrysler town & country',
 'dodge charger',
 'dodge dakota',
 'dodge grand caravan',
 'ford econoline',
 'ford edge',
 'ford escape',
 'ford expedition',
 'ford explorer',
 'ford f-150',
 'ford f-250',
 'ford f-250 sd',
 'ford f-250 super duty',
 'ford f-350 sd',
 'ford f150',
 'ford f150 supercrew cab xlt',
 'ford f250',
 'ford f250 super duty',
 'ford f350',
 'ford f350 super duty',
 'ford focus',
 'ford focus se',
 'ford fusion',
 'ford fusion se',
 'ford mustang',
 '

In [134]:
df.condition.unique() #checking for duplicates in the 'condition' column


array(['good', 'like new', 'fair', 'excellent', 'salvage', 'new'],
      dtype=object)

In [135]:
df.type.unique() #checking for duplicates in the 'type' column


array(['SUV', 'pickup', 'sedan', 'truck', 'coupe', 'van', 'convertible',
       'hatchback', 'wagon', 'mini-van', 'other', 'offroad', 'bus'],
      dtype=object)

In [136]:
df.paint_color.unique() #checking for duplicates in the 'paint_color' column


array(['unkown', 'white', 'red', 'black', 'blue', 'grey', 'silver',
       'custom', 'orange', 'yellow', 'brown', 'green', 'purple'],
      dtype=object)

In [137]:
df.fuel.unique() #checking for duplicates in the 'fuel' column


array(['gas', 'diesel', 'other', 'hybrid', 'electric'], dtype=object)

In [138]:
df.transmission.unique() #checking for duplicates in the 'transmission' column


array(['automatic', 'manual', 'other'], dtype=object)

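From the above steps I can conclude that there are no duplicate rows and that the condition, type, paint_color, fuel and transmission columns contain no spelling variants. The model column does contain near-duplicate names (for example 'ford f-150' and 'ford f150'), which I leave as they are here, so I can proceed with the analysis.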
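
The sorted model list includes spelling variants such as 'ford f-150' and 'ford f150'. If they needed to be merged, one option is a small mapping to a canonical name. The sketch below runs on a made-up toy frame; the `canonical` dictionary is a hypothetical mapping written for illustration, not something applied in this notebook.

```python
import pandas as pd

# Toy column with two pairs of spelling variants (illustrative values).
toy = pd.DataFrame({
    "model": ["ford f-150", "ford f150", "ford f250", "ford f-250", "bmw x5"],
})

# Hypothetical mapping from variant spellings to one canonical name.
canonical = {
    "ford f150": "ford f-150",
    "ford f250": "ford f-250",
}

# replace() leaves names that are not in the mapping unchanged.
toy["model_clean"] = toy["model"].replace(canonical)

n_before = toy["model"].nunique()
n_after = toy["model_clean"].nunique()
print(n_before, "->", n_after)
```

Writing the result to a new column keeps the original names available for comparison.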

In [139]:
df.info() #to display the result of fixing dupplicates and missing values


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 51525 entries, 0 to 51524
Data columns (total 13 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   price         51525 non-null  int64  
 1   model_year    51525 non-null  int64  
 2   model         51525 non-null  object 
 3   condition     51525 non-null  object 
 4   cylinders     51525 non-null  float64
 5   fuel          51525 non-null  object 
 6   odometer      51525 non-null  int64  
 7   transmission  51525 non-null  object 
 8   type          51525 non-null  object 
 9   paint_color   51525 non-null  object 
 10  is_4wd        51525 non-null  float64
 11  date_posted   51525 non-null  object 
 12  days_listed   51525 non-null  int64  
dtypes: float64(2), int64(4), object(7)
memory usage: 5.1+ MB


The output above shows that the columns have been cleaned: every column now has the same non-null count.
In short:
- Missing values in paint_color have been replaced by "unkown".
- Missing values in the cylinders and is_4wd columns have been replaced by zeroes.
- Missing values in model_year and odometer have been replaced by the median grouped by another column.
- The model_year column has been converted to integers, removing the decimal part of the year.
- The odometer column has been converted to integers, removing the decimal part of the number of km.

In the next steps I work on the 'price' column so that I can use it as a parameter in the analysis. First I look for outliers, then I remove them and keep only prices within a logical range.

In [140]:
print(df['price'].value_counts().loc[1]) #number of listings whose price is exactly 1


798


The output shows 798 listings with a price of 1, which I treat as outliers.

Next I will look at the minimum and maximum price in the column.

In [141]:
print(df['price'].max()) #to display the maximum amount of price
print(df['price'].min()) #to display the minimum amount of price


375000
1


I can't say that the maximum price is an outlier, because there are cars in that price range on the market.

In [142]:
df.sort_values("price") #to sort the 'price' column in ascending order


Unnamed: 0,price,model_year,model,condition,cylinders,fuel,odometer,transmission,type,paint_color,is_4wd,date_posted,days_listed
22900,1,2013,dodge charger,excellent,10.0,gas,25146,other,sedan,black,1.0,2018-09-29,60
10873,1,2018,ram 3500,excellent,6.0,gas,3047,automatic,truck,white,1.0,2018-11-22,27
9536,1,2016,dodge charger,excellent,10.0,gas,29406,other,sedan,red,1.0,2018-11-14,27
9537,1,2012,ford fusion,excellent,4.0,gas,110000,automatic,sedan,blue,1.0,2018-06-18,5
9538,1,2015,chevrolet camaro,excellent,10.0,gas,28926,other,coupe,grey,1.0,2018-10-06,7
...,...,...,...,...,...,...,...,...,...,...,...,...,...
30634,189000,2014,ford f-150,good,6.0,gas,90000,automatic,truck,black,0.0,2018-07-21,42
33434,189000,2014,ford f-150,good,6.0,gas,151248,automatic,truck,black,0.0,2019-02-05,102
27375,189000,2014,ford f-150,good,6.0,gas,151248,automatic,truck,black,0.0,2018-09-25,72
11359,300000,2015,ram 2500,excellent,0.0,diesel,78285,automatic,truck,grey,1.0,2018-10-15,39


In [143]:
df = df.drop(df[df['price'] == 1].index) #remove the listings with a price of 1
print(df)


       price  model_year           model  condition  cylinders fuel  odometer  \
0       9400        2011          bmw x5       good        6.0  gas    145000   
1      25500        2011      ford f-150       good        6.0  gas     88705   
2       5500        2013  hyundai sonata   like new        4.0  gas    110000   
3       1500        2003      ford f-150       fair        8.0  gas    161397   
4      14900        2017    chrysler 200  excellent        4.0  gas     80903   
...      ...         ...             ...        ...        ...  ...       ...   
51520   9249        2013   nissan maxima   like new        6.0  gas     88136   
51521   2700        2002     honda civic    salvage        4.0  gas    181500   
51522   3950        2009  hyundai sonata  excellent        4.0  gas    128000   
51523   7455        2013  toyota corolla       good        4.0  gas    139573   
51524   6300        2014   nissan altima       good        4.0  gas     90000   

      transmission    type 

Even though I removed the listings with a price of 1, other implausibly low prices remain, so I will remove every listing priced below 200, which I take as the minimum price for the analysis.

In [144]:
df.sort_values("price") #to display the 'price' column in ascending order to spot further outliers


Unnamed: 0,price,model_year,model,condition,cylinders,fuel,odometer,transmission,type,paint_color,is_4wd,date_posted,days_listed
44205,3,2005,jeep liberty,good,6.0,gas,153108,automatic,SUV,unkown,1.0,2018-06-19,22
50430,5,2011,toyota sienna,good,6.0,gas,123025,automatic,SUV,green,0.0,2018-12-03,5
31263,6,1999,ford f250,good,8.0,gas,173500,automatic,pickup,unkown,1.0,2019-02-07,53
39379,9,2010,subaru forester,good,4.0,gas,119,automatic,SUV,grey,1.0,2019-03-02,12
50971,10,2012,toyota prius,excellent,4.0,hybrid,101000,automatic,hatchback,green,0.0,2019-03-16,29
...,...,...,...,...,...,...,...,...,...,...,...,...,...
30634,189000,2014,ford f-150,good,6.0,gas,90000,automatic,truck,black,0.0,2018-07-21,42
34389,189000,2014,ford f-150,good,6.0,gas,151248,automatic,truck,black,0.0,2019-02-02,28
1668,189000,2014,ford f-150,good,6.0,gas,151248,automatic,truck,unkown,0.0,2019-03-20,21
11359,300000,2015,ram 2500,excellent,0.0,diesel,78285,automatic,truck,grey,1.0,2018-10-15,39


In [145]:
df = df.drop(df[df['price'] < 200].index) #remove the listings priced below 200
print(df)
           

       price  model_year           model  condition  cylinders fuel  odometer  \
0       9400        2011          bmw x5       good        6.0  gas    145000   
1      25500        2011      ford f-150       good        6.0  gas     88705   
2       5500        2013  hyundai sonata   like new        4.0  gas    110000   
3       1500        2003      ford f-150       fair        8.0  gas    161397   
4      14900        2017    chrysler 200  excellent        4.0  gas     80903   
...      ...         ...             ...        ...        ...  ...       ...   
51520   9249        2013   nissan maxima   like new        6.0  gas     88136   
51521   2700        2002     honda civic    salvage        4.0  gas    181500   
51522   3950        2009  hyundai sonata  excellent        4.0  gas    128000   
51523   7455        2013  toyota corolla       good        4.0  gas    139573   
51524   6300        2014   nissan altima       good        4.0  gas     90000   

      transmission    type 

In [146]:
median_price = df['price'].median()
average_price = int(df['price'].mean())
print("Average Price (rounded):", average_price)
print("Median Price:", median_price)


Average Price (rounded): 12350
Median Price: 9499.0


The average listing price is about 12,350 and the median is 9,499.

The price column now ranges from a minimum of 200 to a maximum of 375,000.
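
The price filter can also be written as a single boolean mask, which makes it easy to report how many rows a cut-off removes. This is a minimal sketch on made-up prices; the name `MIN_PRICE` is an assumption introduced for illustration, and the cut-off of 200 is the judgment call used in this notebook rather than a statistical rule.

```python
import pandas as pd

# Toy prices, including a few implausibly low values (illustrative only).
toy = pd.DataFrame({"price": [1, 1, 3, 150, 200, 1500, 9400, 25500]})

MIN_PRICE = 200  # same cut-off as in the text above

# True for the rows to keep.
mask = toy["price"] >= MIN_PRICE

kept = toy[mask].reset_index(drop=True)
removed = int((~mask).sum())

print("Rows removed:", removed)
print("Remaining price range:", kept["price"].min(), "to", kept["price"].max())
```

Keeping the threshold in a named constant makes it simple to try other cut-offs and compare how many listings each one drops.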

***

### 3. ANALYSIS OF DATA

In [147]:
df.groupby("condition").size() #number of listings per 'condition'


condition
excellent    23943
fair          1597
good         20110
like new      4710
new            139
salvage        115
dtype: int64

Most cars for sale are in excellent or good condition.

Next I will explore the model_year column to see whether it shows any meaningful relationship with price and condition.

In [148]:
df.groupby("model_year").size() #number of listings per 'model_year'


model_year
1908       2
1929       1
1936       1
1948       1
1949       1
        ... 
2015    3313
2016    2895
2017    2300
2018    1962
2019     330
Length: 68, dtype: int64

The model_year column contains outliers: some cars are more than 100 years old. To clean the column I will keep only cars from 1970 onward.

In [149]:
df_filtered = df[df['model_year'] >= 1970] #keep only model years from 1970 onward to disregard outliers
new_df = df_filtered
print(new_df)


       price  model_year           model  condition  cylinders fuel  odometer  \
0       9400        2011          bmw x5       good        6.0  gas    145000   
1      25500        2011      ford f-150       good        6.0  gas     88705   
2       5500        2013  hyundai sonata   like new        4.0  gas    110000   
3       1500        2003      ford f-150       fair        8.0  gas    161397   
4      14900        2017    chrysler 200  excellent        4.0  gas     80903   
...      ...         ...             ...        ...        ...  ...       ...   
51520   9249        2013   nissan maxima   like new        6.0  gas     88136   
51521   2700        2002     honda civic    salvage        4.0  gas    181500   
51522   3950        2009  hyundai sonata  excellent        4.0  gas    128000   
51523   7455        2013  toyota corolla       good        4.0  gas    139573   
51524   6300        2014   nissan altima       good        4.0  gas     90000   

      transmission    type 

FINAL CLEAN DATA FRAME

In [150]:
print(new_df) #to display the clean data frame after processing


       price  model_year           model  condition  cylinders fuel  odometer  \
0       9400        2011          bmw x5       good        6.0  gas    145000   
1      25500        2011      ford f-150       good        6.0  gas     88705   
2       5500        2013  hyundai sonata   like new        4.0  gas    110000   
3       1500        2003      ford f-150       fair        8.0  gas    161397   
4      14900        2017    chrysler 200  excellent        4.0  gas     80903   
...      ...         ...             ...        ...        ...  ...       ...   
51520   9249        2013   nissan maxima   like new        6.0  gas     88136   
51521   2700        2002     honda civic    salvage        4.0  gas    181500   
51522   3950        2009  hyundai sonata  excellent        4.0  gas    128000   
51523   7455        2013  toyota corolla       good        4.0  gas    139573   
51524   6300        2014   nissan altima       good        4.0  gas     90000   

      transmission    type 

***

### 4. VISUALIZATIONS

In [151]:
df_filtered = new_df #new_df already keeps only cars from 1970 onward, so no further year filter is needed
#scatter graph comparing the model year of each car with the km it has run
fig3 = px.scatter(df_filtered, x='model_year', y='odometer', title="Comparison of model year (for 1970 and later cars) and odometer")
fig3.show()


VISUALIZATION #1: The scatter plot shows that cars from 1990 to 2010 have the highest odometer readings (km run). They have been used more than the rest.

In [152]:
df_type = new_df.groupby('type').size() #number of listings per 'type' of car
print(df_type)


type
SUV            12131
bus               24
convertible      417
coupe           2176
hatchback       1032
mini-van        1159
offroad          212
other            252
pickup          6973
sedan          11938
truck          12059
van              612
wagon           1537
dtype: int64


The top 3 car types are SUV, truck and sedan.
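
A ranking like this can be read off directly by sorting the counts. The following minimal sketch uses a made-up toy column (the values are illustrative assumptions) and `value_counts()`, which returns the counts in descending order.

```python
import pandas as pd

# Toy 'type' column (illustrative values only).
toy = pd.Series(
    ["SUV"] * 4 + ["truck"] * 3 + ["sedan"] * 2 + ["coupe"],
    name="type",
)

# value_counts() sorts by count, largest first; head(3) keeps the top three.
top3 = toy.value_counts().head(3)
print(top3)
```

Sorting first avoids having to compare the counts by eye in an alphabetical listing.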

In [153]:
df_type = new_df.groupby('fuel').size() #number of listings per 'fuel' type
print(df_type)


fuel
diesel       3705
electric        5
gas         46300
hybrid        407
other         105
dtype: int64


The most common fuel is gas.

In [154]:
df_type = new_df.groupby('odometer').size() #number of listings per odometer reading
df_type = df_type.sort_values(ascending=False)
print(df_type)


odometer
140000    738
110000    725
99840     646
90000     637
123025    611
         ... 
376000      1
375000      1
53          1
373200      1
152977      1
Length: 17600, dtype: int64


In [155]:
average_odometer = int(new_df['odometer'].mean())
print("Average odometer (rounded):", average_odometer)


Average odometer (rounded): 116487


Cars on the listing have an average odometer reading of about 116,500 km.

In [156]:
df_type = new_df.groupby('transmission').size()
print(df_type)


transmission
automatic    46254
manual        2756
other         1512
dtype: int64


In [157]:
# histogram by groups with different colors comparing 'type' of cars and 'condition' of the car
fig = px.histogram(new_df, x="type", color='condition') 
fig.show()

VISUALIZATION #2: Histogram showing the distribution of car types by condition. Click the colors in the legend (multiple selection is supported) to show or hide the corresponding conditions.

In [158]:
# histogram to visualize groups by 'type' of car compared with the 'fuel' they use and the count added up for each one
fig = px.histogram(new_df, x="type", color='fuel') 
fig.show()

VISUALIZATION #3: Histogram showing the distribution of car types by fuel. Click the colors in the legend (multiple selection is supported) to show or hide the corresponding fuels. The large majority of cars are fueled by gas.
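
The counts behind a stacked histogram like this can also be tabulated, which makes it easier to name the most common car type for each fuel. This is a minimal sketch on a made-up toy frame (all values are illustrative assumptions) using `pd.crosstab`.

```python
import pandas as pd

# Toy listings (illustrative values only).
toy = pd.DataFrame({
    "type": ["SUV", "SUV", "sedan", "truck", "truck", "sedan"],
    "fuel": ["gas", "gas", "hybrid", "diesel", "gas", "gas"],
})

# Rows are car types, columns are fuels, cells are listing counts.
counts = pd.crosstab(toy["type"], toy["fuel"])
print(counts)

# For each fuel (column), the car type with the highest count.
top_type_per_fuel = counts.idxmax()
print(top_type_per_fuel)
```

A table like this gives exact numbers for the statements that are otherwise read off the chart.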

***

### 5. SUMMARY AND CONCLUSIONS

##### After the analysis performed, I can conclude the following about the hypotheses:

##### 1. Cars with a high odometer reading are not necessarily old cars: cars from 1990 to 2010 have the most kilometers run, so they have been used more than cars from other years.
##### 2. The odometer is not a crucial factor in determining a good price for a car. Price is determined by several parameters together.
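
One way to check the relationship between odometer and price directly is a correlation coefficient together with the median price per mileage band. The sketch below runs on a made-up toy frame, so its numbers say nothing about the real listings; the band edges and the labels `low`, `mid` and `high` are assumptions chosen for illustration.

```python
import pandas as pd

# Toy listings (illustrative values only, not taken from the data set).
toy = pd.DataFrame({
    "odometer": [10000, 50000, 100000, 150000, 200000, 120000],
    "price": [30000, 22000, 15000, 9000, 4000, 28000],
})

# Pearson correlation between odometer and price (pandas default method).
corr = toy["odometer"].corr(toy["price"])
print("Correlation:", round(corr, 2))

# Median price within assumed mileage bands.
bands = pd.cut(
    toy["odometer"],
    bins=[0, 75000, 150000, 250000],
    labels=["low", "mid", "high"],
)
median_by_band = toy.groupby(bands, observed=True)["price"].median()
print(median_by_band)
```

A coefficient close to 0 on the real data would support the conclusion above, while a strongly negative one would suggest that the odometer matters more than stated.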
    
##### - Cars from 1990 to 2010 have the most km run (highest odometer values).
##### - Top 4 types of cars with best condition to sell are: SUV, sedan, pickup and truck.
##### - Top 4 types of cars run by gas are: SUV, sedan, pickup and truck.
##### - Top types of cars run by diesel are: truck and pickup.
##### - Top types of cars run by hybrid are: sedan and hatchback.
##### - Top types of cars run by electricity are: sedan, truck and hatchback.

##### Most cars are fueled by gas.
##### Cars on the listing have an average odometer reading of about 116,500 km.
##### Most cars are automatic.




--END OF PROJECT--

In [160]:
new_df.to_csv('new_df.csv', index=False)