# Car Price Investigations


## Navigation:
* [Readme](https://github.com/Fazestar01/Car-Price-Analysis/blob/main/README.md)
* [Clean Data](https://github.com/Fazestar01/Car-Price-Analysis/blob/main/data/cleanedcardata.csv)
* [Raw Data](https://github.com/Fazestar01/Car-Price-Analysis/blob/main/data/CarPrice_Assignment.csv)
* [Dashboard](https://public.tableau.com/app/profile/kaori.ikarashi/viz/CarPriceAnalysis_17501618237170/Story1?publish=yes)



First, we import the necessary libraries.

In [72]:
import pandas as pd 
import numpy as np

We then load the data set and display the first 5 rows for an initial investigation.

In [73]:
df = pd.read_csv('../data/CarPrice_Assignment.csv')
df.head()

| | car_ID | symboling | CarName | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | ... | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 3 | alfa-romero giulia | gas | std | two | convertible | rwd | front | 88.6 | ... | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.0 |
| 1 | 2 | 3 | alfa-romero stelvio | gas | std | two | convertible | rwd | front | 88.6 | ... | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.0 |
| 2 | 3 | 1 | alfa-romero Quadrifoglio | gas | std | two | hatchback | rwd | front | 94.5 | ... | 152 | mpfi | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.0 |
| 3 | 4 | 2 | audi 100 ls | gas | std | four | sedan | fwd | front | 99.8 | ... | 109 | mpfi | 3.19 | 3.4 | 10.0 | 102 | 5500 | 24 | 30 | 13950.0 |
| 4 | 5 | 2 | audi 100ls | gas | std | four | sedan | 4wd | front | 99.4 | ... | 136 | mpfi | 3.19 | 3.4 | 8.0 | 115 | 5500 | 18 | 22 | 17450.0 |


## Check the shape of the data frame

In [74]:
df.shape

(205, 26)

With only 205 rows and 26 columns, the data frame is small enough to analyse in full, so there is no need to sample it.

## Checking for null values

In [75]:
df.isnull().sum()

car_ID              0
symboling           0
CarName             0
fueltype            0
aspiration          0
doornumber          0
carbody             0
drivewheel          0
enginelocation      0
wheelbase           0
carlength           0
carwidth            0
carheight           0
curbweight          0
enginetype          0
cylindernumber      0
enginesize          0
fuelsystem          0
boreratio           0
stroke              0
compressionratio    0
horsepower          0
peakrpm             0
citympg             0
highwaympg          0
price               0
dtype: int64

There are no null values in the data set.
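
To illustrate what `isnull().sum()` reports when values are missing, here is a minimal sketch on a small made-up frame. The `toy` data and the variable names are hypothetical and are not taken from the car data set. It counts the missing values per column and then keeps only the columns that contain any.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, used only to illustrate the check.
toy = pd.DataFrame({
    "horsepower": [111, np.nan, 154],
    "fueltype": ["gas", "diesel", None],
    "price": [1.0, 2.0, 3.0],
})

# Count the missing values in each column.
null_counts = toy.isnull().sum()

# Keep only the columns that contain at least one missing value.
columns_with_nulls = null_counts[null_counts > 0]
print(columns_with_nulls)
```

On a frame with many columns, filtering the counts this way makes it easier to see at a glance whether anything needs attention.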

## Check data types

In [76]:
df.dtypes


car_ID                int64
symboling             int64
CarName              object
fueltype             object
aspiration           object
doornumber           object
carbody              object
drivewheel           object
enginelocation       object
wheelbase           float64
carlength           float64
carwidth            float64
carheight           float64
curbweight            int64
enginetype           object
cylindernumber       object
enginesize            int64
fuelsystem           object
boreratio           float64
stroke              float64
compressionratio    float64
horsepower            int64
peakrpm               int64
citympg               int64
highwaympg            int64
price               float64
dtype: object
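
Rather than reading the full list by eye, the non-numeric columns can also be picked out programmatically. The sketch below assumes a small hypothetical `sample` frame that mimics a few columns of the car data; `select_dtypes(exclude="number")` returns every column that is not numeric.

```python
import pandas as pd

# Hypothetical three-row sample that mimics a few columns of the car data.
sample = pd.DataFrame({
    "car_ID": [1, 2, 3],
    "fueltype": ["gas", "gas", "diesel"],
    "cylindernumber": ["four", "six", "four"],
    "price": [1.0, 2.0, 3.0],
})

# Select every column that is not numeric; these are the ones worth inspecting.
non_numeric_columns = sample.select_dtypes(exclude="number").columns.tolist()
print(non_numeric_columns)
```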

We notice that cylindernumber has the object data type even though it describes a count, so it needs further investigation.

In [77]:
df['cylindernumber'].unique()


array(['four', 'six', 'five', 'three', 'twelve', 'two', 'eight'],
      dtype=object)

We discovered that cylindernumber is an object because it lists the numbers as words. To make analysis easier, we will convert the values to a numerical format.

In [78]:
df.replace({'cylindernumber': {'four': 4, 'six': 6, 'five': 5, 'three': 3, 'twelve': 12, 'two': 2, 'eight': 8}}, inplace=True)
df['cylindernumber'] = df['cylindernumber'].astype(int)

We check that the changes have successfully been made.

In [79]:
df.dtypes

car_ID                int64
symboling             int64
CarName              object
fueltype             object
aspiration           object
doornumber           object
carbody              object
drivewheel           object
enginelocation       object
wheelbase           float64
carlength           float64
carwidth            float64
carheight           float64
curbweight            int64
enginetype           object
cylindernumber        int32
enginesize            int64
fuelsystem           object
boreratio           float64
stroke              float64
compressionratio    float64
horsepower            int64
peakrpm               int64
citympg               int64
highwaympg            int64
price               float64
dtype: object
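
As an alternative sketch of the same word-to-number conversion, `Series.map` can be used with a dictionary. This is not the approach taken in the notebook; it is shown because `map` returns `NaN` for any word missing from the dictionary, which makes unexpected values easy to detect before casting. The `cylinders` sample and the name `WORD_TO_NUMBER` are hypothetical.

```python
import pandas as pd

# Mapping from number words to integers (same pairs as in the replace call above).
WORD_TO_NUMBER = {
    "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "eight": 8, "twelve": 12,
}

# Hypothetical sample column, used only for illustration.
cylinders = pd.Series(["four", "six", "twelve", "two"], name="cylindernumber")

# map() returns NaN for any word missing from the dictionary,
# so unmapped values can be detected before casting to int.
converted = cylinders.map(WORD_TO_NUMBER)
unmapped = cylinders[converted.isnull()].unique()
if len(unmapped) > 0:
    raise ValueError(f"Unmapped cylinder values: {list(unmapped)}")

converted = converted.astype(int)
print(converted.tolist())
```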

## Checking the unique values of object variables to spot any errors.

Checking CarName

In [80]:
df['CarName'].unique()

array(['alfa-romero giulia', 'alfa-romero stelvio',
       'alfa-romero Quadrifoglio', 'audi 100 ls', 'audi 100ls',
       'audi fox', 'audi 5000', 'audi 4000', 'audi 5000s (diesel)',
       'bmw 320i', 'bmw x1', 'bmw x3', 'bmw z4', 'bmw x4', 'bmw x5',
       'chevrolet impala', 'chevrolet monte carlo', 'chevrolet vega 2300',
       'dodge rampage', 'dodge challenger se', 'dodge d200',
       'dodge monaco (sw)', 'dodge colt hardtop', 'dodge colt (sw)',
       'dodge coronet custom', 'dodge dart custom',
       'dodge coronet custom (sw)', 'honda civic', 'honda civic cvcc',
       'honda accord cvcc', 'honda accord lx', 'honda civic 1500 gl',
       'honda accord', 'honda civic 1300', 'honda prelude',
       'honda civic (auto)', 'isuzu MU-X', 'isuzu D-Max ',
       'isuzu D-Max V-Cross', 'jaguar xj', 'jaguar xf', 'jaguar xk',
       'maxda rx3', 'maxda glc deluxe', 'mazda rx2 coupe', 'mazda rx-4',
       'mazda glc deluxe', 'mazda 626', 'mazda glc', 'mazda rx-7 gs',
       'mazda glc 

We will correct four misspelled brand names: Maxda to Mazda, Toyouta to Toyota, Porcshce to Porsche and Vokswagen to Volkswagen.

In [81]:
df.replace({'CarName': {'toyouta tercel': 'toyota tercel', 'porcshce panamera': 'porsche panamera', 'vokswagen rabbit': 'volkswagen rabbit'}}, inplace=True)

In [82]:
df.replace({'CarName': {'maxda rx3': 'mazda rx3','maxda glc deluxe':'mazda glc deluxe'}}, inplace=True)

In [83]:
df['CarName'].unique()

array(['alfa-romero giulia', 'alfa-romero stelvio',
       'alfa-romero Quadrifoglio', 'audi 100 ls', 'audi 100ls',
       'audi fox', 'audi 5000', 'audi 4000', 'audi 5000s (diesel)',
       'bmw 320i', 'bmw x1', 'bmw x3', 'bmw z4', 'bmw x4', 'bmw x5',
       'chevrolet impala', 'chevrolet monte carlo', 'chevrolet vega 2300',
       'dodge rampage', 'dodge challenger se', 'dodge d200',
       'dodge monaco (sw)', 'dodge colt hardtop', 'dodge colt (sw)',
       'dodge coronet custom', 'dodge dart custom',
       'dodge coronet custom (sw)', 'honda civic', 'honda civic cvcc',
       'honda accord cvcc', 'honda accord lx', 'honda civic 1500 gl',
       'honda accord', 'honda civic 1300', 'honda prelude',
       'honda civic (auto)', 'isuzu MU-X', 'isuzu D-Max ',
       'isuzu D-Max V-Cross', 'jaguar xj', 'jaguar xf', 'jaguar xk',
       'mazda rx3', 'mazda glc deluxe', 'mazda rx2 coupe', 'mazda rx-4',
       'mazda 626', 'mazda glc', 'mazda rx-7 gs', 'mazda glc 4',
       'mazda glc custo
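
The replacements above list each misspelled name in full. A possible alternative, sketched here under the assumption that the brand is always the first word of `CarName`, is to correct the brand prefix with a regular expression so that the model name is preserved whatever it is. The `names` sample and the `BRAND_FIXES` dictionary are hypothetical helpers built from the four typos named above.

```python
import pandas as pd

# Misspelled brand -> corrected brand (the four typos named above).
BRAND_FIXES = {
    "maxda": "mazda",
    "toyouta": "toyota",
    "porcshce": "porsche",
    "vokswagen": "volkswagen",
}

# Hypothetical sample of names, used only for illustration.
names = pd.Series(["maxda rx3", "maxda glc deluxe", "toyouta tercel", "audi fox"])

fixed = names
for wrong, right in BRAND_FIXES.items():
    # ^ anchors the match at the start of the name and \b ends it at the
    # word boundary, so only the brand is rewritten and the model is kept.
    fixed = fixed.str.replace(rf"^{wrong}\b", right, regex=True)

print(fixed.tolist())
```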

Next, we convert every car name to title case, so that each word starts with a capital letter.

In [84]:
df['CarName'] = df['CarName'].str.title()

In [85]:
df['CarName'].unique()

array(['Alfa-Romero Giulia', 'Alfa-Romero Stelvio',
       'Alfa-Romero Quadrifoglio', 'Audi 100 Ls', 'Audi 100Ls',
       'Audi Fox', 'Audi 5000', 'Audi 4000', 'Audi 5000S (Diesel)',
       'Bmw 320I', 'Bmw X1', 'Bmw X3', 'Bmw Z4', 'Bmw X4', 'Bmw X5',
       'Chevrolet Impala', 'Chevrolet Monte Carlo', 'Chevrolet Vega 2300',
       'Dodge Rampage', 'Dodge Challenger Se', 'Dodge D200',
       'Dodge Monaco (Sw)', 'Dodge Colt Hardtop', 'Dodge Colt (Sw)',
       'Dodge Coronet Custom', 'Dodge Dart Custom',
       'Dodge Coronet Custom (Sw)', 'Honda Civic', 'Honda Civic Cvcc',
       'Honda Accord Cvcc', 'Honda Accord Lx', 'Honda Civic 1500 Gl',
       'Honda Accord', 'Honda Civic 1300', 'Honda Prelude',
       'Honda Civic (Auto)', 'Isuzu Mu-X', 'Isuzu D-Max ',
       'Isuzu D-Max V-Cross', 'Jaguar Xj', 'Jaguar Xf', 'Jaguar Xk',
       'Mazda Rx3', 'Mazda Glc Deluxe', 'Mazda Rx2 Coupe', 'Mazda Rx-4',
       'Mazda 626', 'Mazda Glc', 'Mazda Rx-7 Gs', 'Mazda Glc 4',
       'Mazda Glc Custo

All values are now in title case. However, 'Bmw' should read 'BMW', so we correct it with a function.

In [86]:
def correct_bmw(name):
    name = name.title()  # Capitalizes first letter of each word
    return name.replace("Bmw", "BMW")

df['CarName'] = df['CarName'].apply(correct_bmw)
df['CarName'].unique()

array(['Alfa-Romero Giulia', 'Alfa-Romero Stelvio',
       'Alfa-Romero Quadrifoglio', 'Audi 100 Ls', 'Audi 100Ls',
       'Audi Fox', 'Audi 5000', 'Audi 4000', 'Audi 5000S (Diesel)',
       'BMW 320I', 'BMW X1', 'BMW X3', 'BMW Z4', 'BMW X4', 'BMW X5',
       'Chevrolet Impala', 'Chevrolet Monte Carlo', 'Chevrolet Vega 2300',
       'Dodge Rampage', 'Dodge Challenger Se', 'Dodge D200',
       'Dodge Monaco (Sw)', 'Dodge Colt Hardtop', 'Dodge Colt (Sw)',
       'Dodge Coronet Custom', 'Dodge Dart Custom',
       'Dodge Coronet Custom (Sw)', 'Honda Civic', 'Honda Civic Cvcc',
       'Honda Accord Cvcc', 'Honda Accord Lx', 'Honda Civic 1500 Gl',
       'Honda Accord', 'Honda Civic 1300', 'Honda Prelude',
       'Honda Civic (Auto)', 'Isuzu Mu-X', 'Isuzu D-Max ',
       'Isuzu D-Max V-Cross', 'Jaguar Xj', 'Jaguar Xf', 'Jaguar Xk',
       'Mazda Rx3', 'Mazda Glc Deluxe', 'Mazda Rx2 Coupe', 'Mazda Rx-4',
       'Mazda 626', 'Mazda Glc', 'Mazda Rx-7 Gs', 'Mazda Glc 4',
       'Mazda Glc Custo
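
The following sketch shows how `str.title()` treats these names and how the 'BMW' correction could be generalised with a lookup table. The `names` sample and the `OVERRIDES` dictionary are hypothetical; only the 'Bmw' entry comes from the notebook.

```python
import pandas as pd

# Hypothetical sample of names, used only for illustration.
names = pd.Series(["bmw 320i", "audi 100ls", "honda civic cvcc"])

# str.title() capitalises the first letter of every run of letters,
# including letters that follow a digit.
titled = names.str.title()

# Hypothetical override table: title-cased spelling -> preferred spelling.
OVERRIDES = {"Bmw": "BMW"}
for wrong, right in OVERRIDES.items():
    # regex=False treats the pattern as a literal string.
    titled = titled.str.replace(wrong, right, regex=False)

print(titled.tolist())
```

Because `title()` also capitalises letters that follow digits, model names such as '320i' become '320I', which matches the output shown above.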

Checking fueltype

In [87]:
df['fueltype'].unique()

array(['gas', 'diesel'], dtype=object)

Checking aspiration

In [88]:
df['aspiration'].unique()

array(['std', 'turbo'], dtype=object)

Checking doornumber

In [89]:
df['doornumber'].unique()

array(['two', 'four'], dtype=object)

Checking carbody

In [90]:
df['carbody'].unique()

array(['convertible', 'hatchback', 'sedan', 'wagon', 'hardtop'],
      dtype=object)

Checking drivewheel

In [91]:
df['drivewheel'].unique()

array(['rwd', 'fwd', '4wd'], dtype=object)

Checking enginelocation

In [92]:
df['enginelocation'].unique()

array(['front', 'rear'], dtype=object)

Checking fuelsystem

In [93]:
df['fuelsystem'].unique()

array(['mpfi', '2bbl', 'mfi', '1bbl', 'spfi', '4bbl', 'idi', 'spdi'],
      dtype=object)

We are satisfied with all unique values of the object variables.
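
One detail that is easy to miss by eye is padding: the `CarName` output above still shows 'Isuzu D-Max ' with a trailing space. The sketch below is one possible way to flag such values across all text columns in a single pass. The `sample` frame and the variable names are hypothetical.

```python
import pandas as pd

# Hypothetical sample that mimics a few columns of the car data.
sample = pd.DataFrame({
    "CarName": ["Isuzu Mu-X", "Isuzu D-Max ", "Jaguar Xj"],
    "fueltype": ["gas", "gas", "diesel"],
    "price": [1.0, 2.0, 3.0],
})

# Collect, per non-numeric column, the unique values that carry
# leading or trailing whitespace.
padded_values = {}
for column in sample.select_dtypes(exclude="number").columns:
    values = sample[column]
    padded = values[values != values.str.strip()].unique().tolist()
    if padded:
        padded_values[column] = padded

print(padded_values)
```

If a column is flagged, assigning `sample[column].str.strip()` back to that column would remove the padding.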

## Check for duplicates

In [94]:
df.duplicated().sum()

0

No duplicates found.
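
Note that `df.duplicated()` compares entire rows. If `car_ID` is a unique row identifier, as the first rows suggest, two otherwise identical records would not be flagged. The sketch below uses hypothetical toy data to show the difference between checking all columns and checking only the columns other than the identifier.

```python
import pandas as pd

# Hypothetical toy data: rows 0 and 1 differ only in their identifier.
toy = pd.DataFrame({
    "car_ID": [1, 2, 3],
    "CarName": ["Audi Fox", "Audi Fox", "BMW X1"],
    "price": [1.0, 1.0, 2.0],
})

# With the identifier included, no two rows are identical.
full_row_duplicates = toy.duplicated().sum()

# Ignoring the identifier reveals the repeated record.
feature_columns = [column for column in toy.columns if column != "car_ID"]
feature_duplicates = toy.duplicated(subset=feature_columns).sum()

print(full_row_duplicates, feature_duplicates)
```

Whether two cars with identical attributes count as duplicates is a judgment call for this data set, so the second check is best treated as something to inspect rather than to drop automatically.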

We are satisfied the data is clean.

## Save cleaned data to a new file

In [95]:
df.to_csv("../data/cleanedcardata.csv", index=False)
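
As a quick sanity check, a saved file can be read back and compared with the frame in memory. The sketch below does this for a hypothetical two-row `cleaned` sample written to a temporary folder; the file name is an assumption made for the example. Passing `index=False` stops pandas from writing the row index as an extra column.

```python
import os
import tempfile

import pandas as pd

# Hypothetical cleaned sample, used only for illustration.
cleaned = pd.DataFrame({
    "CarName": ["Audi Fox", "BMW X1"],
    "cylindernumber": [4, 6],
})

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "cleaned_sample.csv")  # hypothetical file name

    # index=False stops pandas from writing the row index as an extra column.
    cleaned.to_csv(path, index=False)

    # Read the file back to confirm the columns and values survived the round trip.
    reloaded = pd.read_csv(path)

print(list(reloaded.columns), reloaded.shape)
```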

The dashboard visualisations for this data can be found [here](https://public.tableau.com/app/profile/kaori.ikarashi/viz/CarPriceAnalysis_17501618237170/Story1?publish=yes).