# Coursework 2

### This code reads a CSV file, processes the data, and fits a multiple linear regression model that predicts the price of a car from its year, manufacturer, condition, fuel type, and odometer reading. Each step is explained below.

The code imports the pandas and NumPy libraries and reads a CSV file named "vehicles_michigan.csv" into a pandas DataFrame using `pd.read_csv()`.

```python
# import libraries
import pandas as pd
import numpy as np

# load the dataset into a DataFrame
df = pd.read_csv('./vehicles_michigan.csv')
```
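As a quick sanity check before fitting, the sketch below loads a hypothetical three-row CSV from a string with `io.StringIO` and inspects the dtypes and missing-value counts of the columns the model uses. The values, and the assumption that the file has exactly these six columns, are illustrative and not taken from `vehicles_michigan.csv`.

```python
import io

import pandas as pd

# Hypothetical three-row excerpt with the columns the regression relies on
csv_text = """price,year,manufacturer,condition,fuel,odometer
15500,2015,ford,good,gas,68000
42000,2021,tesla,excellent,electric,21000
9800,2012,honda,fair,gas,120000
"""

df_demo = pd.read_csv(io.StringIO(csv_text))

used_cols = ["price", "year", "manufacturer", "condition", "fuel", "odometer"]
print(df_demo[used_cols].dtypes)

# A NaN in year, odometer or price would propagate into X^T X or X^T y,
# so it is worth counting missing values before fitting.
print(df_demo[used_cols].isna().sum())
```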

The code processes the data to create predictor variables and a target variable, `y`:

- `predictor1` and `predictor5` come from the 'year' and 'odometer' columns. `pd.to_numeric()` converts each column to a numeric dtype (integers stay integers; nothing forces float), and `np.array(..., ndmin=2)` wraps it as a 2-D array of shape (1, n). Transposing with `.T` turns this row into an (n, 1) column vector, one row per car.
- `predictor2`, `predictor3`, and `predictor4` are created with `pd.get_dummies()`, which converts the 'manufacturer', 'condition', and 'fuel' columns into dummy (0/1) variables, one column per category. The `drop_first=True` argument drops the first category (in sorted order) of each column, which becomes the baseline. This avoids the dummy variable trap: a full set of dummies always sums to 1 per row and would be perfectly collinear with the intercept column.
- The target variable `y` is built the same way from the 'price' column: `np.array(..., ndmin=2)` gives a (1, n) array, and `.T` transposes it into an (n, 1) column vector.
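To make the shapes concrete, here is a small sketch with made-up years showing what `ndmin=2` and `.T` do to a pandas Series:

```python
import numpy as np
import pandas as pd

# Toy 'year' column standing in for df['year'] (hypothetical values)
years = pd.Series([2015, 2021, 2012])

row = np.array(pd.to_numeric(years), ndmin=2)  # ndmin=2 adds a leading axis: shape (1, 3)
col = row.T                                     # transpose: shape (3, 1), one row per car

print(row.shape, col.shape)  # (1, 3) (3, 1)
print(col.dtype)             # int64: pd.to_numeric keeps integer data as integers
```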


```python
predictor1 = np.array(pd.to_numeric(df['year']), ndmin=2).T      # year as an (n, 1) column vector
predictor2 = pd.get_dummies(df['manufacturer'], drop_first=True)  # manufacturer as dummy variables
predictor3 = pd.get_dummies(df['condition'], drop_first=True)     # condition as dummy variables
predictor4 = pd.get_dummies(df['fuel'], drop_first=True)          # fuel type as dummy variables
predictor5 = np.array(pd.to_numeric(df['odometer']), ndmin=2).T  # odometer as an (n, 1) column vector
y = np.array(df['price'], ndmin=2).T                              # target: price as an (n, 1) column vector
```
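A toy example (hypothetical fuel values) of what `drop_first=True` removes, and why keeping every dummy next to an intercept column makes the design matrix rank-deficient:

```python
import numpy as np
import pandas as pd

# Toy fuel column (hypothetical values)
fuel = pd.Series(["gas", "electric", "diesel", "gas"])

full = pd.get_dummies(fuel)                      # one column per category, sorted: diesel, electric, gas
reduced = pd.get_dummies(fuel, drop_first=True)  # 'diesel' (first in sorted order) becomes the baseline

print(list(full.columns))     # ['diesel', 'electric', 'gas']
print(list(reduced.columns))  # ['electric', 'gas']

# The full set of dummies sums to 1 in every row, duplicating the intercept
# column of ones, so the stacked matrix loses a rank (the dummy variable trap).
X_trap = np.column_stack([np.ones(len(fuel)), full.to_numpy(dtype=float)])
print(np.linalg.matrix_rank(X_trap), X_trap.shape[1])  # 3 4
```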

The code creates a matrix `X` by placing the predictor variables side by side with `np.column_stack()`. The first column of `X` is a column of ones, which corresponds to the intercept term of the multiple linear regression model.
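A minimal sketch, with made-up values, of how `np.column_stack()` combines a column of ones, an (n, 1) NumPy array, and a dummy DataFrame into one floating-point matrix. Recent pandas versions return boolean dummy columns; NumPy promotes them to 0.0/1.0 when they are stacked next to float columns.

```python
import numpy as np
import pandas as pd

year = np.array([[2015.0], [2021.0], [2012.0]])  # (3, 1) column vector
fuel_dummies = pd.get_dummies(pd.Series(["gas", "electric", "gas"]), drop_first=True)  # single 'gas' column

X_demo = np.column_stack([np.ones(year.shape), year, fuel_dummies])
print(X_demo.shape)  # (3, 3)
print(X_demo.dtype)  # float64
print(X_demo)        # columns: intercept, year, gas dummy
```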

First, it calculates the matrix `XTX` as the matrix product of the transpose of `X` and `X`. Second, it calculates the inverse of `XTX` using `np.linalg.inv()` and assigns it to `XTX_inv`. Third, it calculates `XTX_invXT` as the product of `XTX_inv` and the transpose of `X`. Finally, it calculates the weight vector `w` as the product of `XTX_invXT` and `y`; these are the least-squares coefficients of the model.
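For reference, these steps implement the closed-form ordinary least-squares solution. Setting the gradient of the squared error to zero gives the normal equations, and solving them this way requires $X^\top X$ to be invertible, which is one reason the dummy columns are created with `drop_first=True`:

$$
\begin{aligned}
\nabla_{\mathbf{w}} \lVert \mathbf{y} - X\mathbf{w} \rVert^2 &= -2X^\top(\mathbf{y} - X\mathbf{w}) = \mathbf{0} \\
X^\top X\,\mathbf{w} &= X^\top \mathbf{y} \\
\mathbf{w} &= (X^\top X)^{-1} X^\top \mathbf{y}
\end{aligned}
$$

Here `XTX` is $X^\top X$, `XTX_inv` is $(X^\top X)^{-1}$, and `XTX_invXT` is $(X^\top X)^{-1} X^\top$.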


```python
# Design matrix: a column of ones for the intercept, then every predictor side by side
X = np.column_stack([np.ones(predictor1.shape), predictor1, predictor2, predictor3, predictor4, predictor5])
XTX = np.dot(X.T, X)              # Step 1: X^T X
XTX_inv = np.linalg.inv(XTX)      # Step 2: (X^T X)^-1
XTX_invXT = np.dot(XTX_inv, X.T)  # Step 3: (X^T X)^-1 X^T
w = np.dot(XTX_invXT, y)          # Step 4: least-squares weights, shape (k, 1)
```
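The sketch below, on synthetic well-scaled data (the data and true weights are made up), checks that the explicit-inverse steps agree with two alternatives NumPy provides for the same least-squares problem: `np.linalg.solve`, which solves the normal equations without forming an inverse, and `np.linalg.lstsq`, which works on `X` directly. With raw features such as a year near 2000 and an odometer near 100,000, $X^\top X$ can be badly conditioned, so these alternatives are generally the safer choice; whether that matters for this dataset is not checked here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, well-scaled data (hypothetical): intercept + 2 numeric predictors + one 0/1 dummy
n = 200
X = np.column_stack([
    np.ones(n),
    rng.normal(size=n),
    rng.normal(size=n),
    rng.integers(0, 2, n),
])
true_w = np.array([[5.0], [2.0], [-1.5], [3.0]])
y = X @ true_w + rng.normal(scale=0.1, size=(n, 1))  # (n, 1) target, like the notebook's y

# The notebook's four steps, written with the @ operator
w_inv = np.linalg.inv(X.T @ X) @ X.T @ y

# Alternatives that avoid forming an explicit inverse
w_solve = np.linalg.solve(X.T @ X, X.T @ y)      # solves the normal equations directly
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)  # SVD-based, never forms X^T X

print(w_inv.ravel().round(2))
print(np.allclose(w_inv, w_solve), np.allclose(w_inv, w_lstsq))
```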

The code prompts the user to enter the car's year, manufacturer, condition, fuel type, and odometer reading.

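It then turns the input into a vector `x` that holds the same predictor variables, in the same order, as the columns of `X`. The first element of `x` is 1, matching the intercept term. Each categorical input is one-hot encoded over the sorted list of categories found in the data, and the first category is dropped, mirroring `pd.get_dummies(drop_first=True)`. Note that a value not present in the data leaves all its dummies at 0, so the car is silently priced as if it had the baseline category. Finally, the code calculates the predicted price as the dot product of `x` and `w` and prints it as a formatted string.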
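Written out, the prediction is a weighted sum. Because each categorical input sets at most one 1 within its group of dummies, each categorical term simply adds one coefficient. The symbols $w_m$, $w_c$, and $w_f$ are introduced here for illustration: they denote the coefficients of the entered manufacturer, condition, and fuel dummies, and are 0 when the entered value is the dropped baseline category.

$$
\hat{y} = \mathbf{x}^\top \mathbf{w} = w_0 + w_{\text{year}} \cdot \text{year} + w_m + w_c + w_f + w_{\text{odometer}} \cdot \text{odometer}
$$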

```python
year = int(input("Enter the year of the car: ")) # Try '2018'
manufacturer = input("Enter the manufacturer of the car: ") # Try 'tesla'
condition = input("Enter the condition of the car: ") # Try 'good'
fuel = input("Enter the fuel of the car: ") # Try 'electric'
odometer = int(input("Enter the odometer of the car: ")) # Try '22000'

x1 = np.array([year])                                 # input year
manufacturers = np.sort(df['manufacturer'].unique())  # sorted categories, same order as get_dummies
x2_arr = np.zeros(len(manufacturers))                 # one slot per manufacturer
x2_arr[np.where(manufacturers == manufacturer)] = 1   # set 1 for the entered manufacturer
x2 = x2_arr[1:]                                       # drop the baseline, as drop_first=True does
conditions = np.sort(df['condition'].unique())        # same encoding for condition
x3_arr = np.zeros(len(conditions))
x3_arr[np.where(conditions == condition)] = 1
x3 = x3_arr[1:]
fuels = np.sort(df['fuel'].unique())                  # same encoding for fuel type
x4_arr = np.zeros(len(fuels))
x4_arr[np.where(fuels == fuel)] = 1
x4 = x4_arr[1:]
x5 = np.array([odometer])                             # input odometer

x = np.concatenate(([1], x1, x2, x3, x4, x5))  # intercept + predictors, same order as the columns of X
price = np.dot(x, w)                           # predicted price, an array of shape (1,)

print(f"The car price is: ${price[0]:,.2f}")   # Should get over ~$40,000
```

```
Enter the year of the car: 2021
Enter the manufacturer of the car: tesla
Enter the condition of the car: excellent
Enter the fuel of the car: electric
Enter the odometer of the car: 22000
The car price is: $43,172.91
```
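The manual encoding in the prediction cell has to reproduce `pd.get_dummies(drop_first=True)` exactly. One way to package it is the hypothetical helper `encode_category` below. It is a sketch that sorts the categories the same way, drops the baseline, skips missing values (which `get_dummies` also ignores by default), and, as a design choice of this sketch, raises an error for an unknown category instead of silently treating it as the baseline.

```python
import numpy as np
import pandas as pd


def encode_category(values: pd.Series, choice: str) -> np.ndarray:
    """Hypothetical helper: map one input value to a dummy vector that matches
    the column order of pd.get_dummies(values, drop_first=True)."""
    categories = np.sort(values.dropna().unique())  # sorted, like get_dummies
    if choice not in categories:
        raise ValueError(f"unknown category {choice!r}; expected one of {list(categories)}")
    vec = (categories == choice).astype(float)      # one-hot over all categories
    return vec[1:]                                   # drop the baseline (first) category


fuel = pd.Series(["gas", "electric", "diesel", "gas"])  # toy data
print(encode_category(fuel, "electric"))  # [1. 0.]
print(encode_category(fuel, "diesel"))    # [0. 0.] (baseline)

# The vector matches the corresponding row of get_dummies(..., drop_first=True)
reduced = pd.get_dummies(fuel, drop_first=True).to_numpy(dtype=float)
print(np.array_equal(encode_category(fuel, "electric"), reduced[1]))  # True
```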
