# Intro to AI - Coursework
Joshua Luke Boddy - aczc760

Saeed Almansoori - 

### To-Do
- Functions to prepare each column of the dataset we need
- Decide on learning model (Multivariate regression?)
- Research integer value return (Possible with Regression)

## Data Preparation

Our data set is a CSV file, so it first has to be read into a data structure the program can work with, and then each column has to be formatted into a numerical representation that our model (still undecided) can use.

Columns needing data conversion:
- Manufacturer Name -> Categorical
- Model Name -> Categorical
- Transmission -> Categorical
- Color -> Categorical
- Engine Fuel -> Categorical
- Engine Has Gas -> Boolean
- Engine Type -> Categorical
- Body Type -> Categorical
- Has Warranty -> Boolean
- State -> Categorical
- Drivetrain -> Categorical

Extra Usable Columns (already numeric):
- Odometer Value -> Int
- Year Produced -> Int
- Engine Capacity -> Float

Data Label (target):
- Price USD -> Float

## Importing Required Libraries

The libraries below are used throughout the project. Importing them once at the start keeps the code tidy and makes them available in every later cell of the notebook.

In [190]:
import pandas as pd
import numpy as np

## Importing the Data Set from CSV

Reading the CSV file into a data structure is simple with the pandas library: its built-in `read_csv` function parses the file into a DataFrame and infers a data type for each column automatically, which is easier than parsing the file by hand with `with open(FILENAME, 'r') as file`.
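As a self-contained illustration of what `read_csv` does with `usecols`, the sketch below parses a small in-memory CSV through `io.StringIO` instead of `./cars.csv`. The rows and brand names are made up; only the column names come from the real file. Columns not listed in `usecols` (here `color`) are dropped, the kept columns stay in file order rather than list order, and each column's dtype is inferred on its own.

```python
import io

import pandas as pd

# Made-up rows standing in for cars.csv, using a subset of its column names
csv_text = """manufacturer_name,color,odometer_value,engine_has_gas,price_usd
BrandA,silver,190000,False,10900.0
BrandB,blue,290000,True,5000.0
BrandA,red,402000,False,2800.0
"""

# usecols keeps only these columns; their order in the result follows the file
wanted = ['price_usd', 'manufacturer_name', 'odometer_value', 'engine_has_gas']
toy = pd.read_csv(io.StringIO(csv_text), usecols=wanted)

# Inferred dtypes: text -> object (or str), whole numbers -> int64,
# True/False -> bool, decimals -> float64
print(toy.dtypes)
```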

In [191]:
columnList = [
    'manufacturer_name',
    'model_name',
    'transmission',
    'color',
    'engine_fuel',
    'engine_has_gas',
    'engine_type',
    'body_type',
    'has_warranty',
    'state',
    'drivetrain',
    'odometer_value',
    'year_produced',
    'engine_capacity',
    'price_usd'
]

data = pd.read_csv('./cars.csv', usecols=columnList)
print(data.dtypes)

manufacturer_name     object
model_name            object
transmission          object
color                 object
odometer_value         int64
year_produced          int64
engine_fuel           object
engine_has_gas          bool
engine_type           object
engine_capacity      float64
body_type             object
has_warranty            bool
state                 object
drivetrain            object
price_usd            float64
dtype: object


## Preparation Functions for Each Column

Each column needs to be converted to a numerical data type so that the neural network can perform arithmetic on it. Rather than writing a separate function for every column, we check each column's current data type (object, boolean or numeric) and write one conversion function per type.
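To make the per-type idea concrete, here is a minimal sketch on a made-up frame with one column of each kind found in the car data. It uses pandas' dtype predicates (`pd.api.types.is_bool_dtype`, `is_numeric_dtype`) to pick the conversion; the frame and its values are illustrative only.

```python
import pandas as pd

# Made-up frame with one column of each kind found in the car data
toy = pd.DataFrame({
    'color': ['red', 'blue', 'red', 'green'],     # categorical text
    'has_warranty': [False, True, False, False],  # boolean
    'year_produced': [2010, 2002, 2015, 2008],    # integer, already numeric
})

converted = {}
for name in toy.columns:
    col = toy[name]
    # Check bool first: pandas also counts bool as a numeric dtype
    if pd.api.types.is_bool_dtype(col):
        converted[name] = col.astype(int)
    elif pd.api.types.is_numeric_dtype(col):
        converted[name] = col
    else:
        # Text columns become integer category codes (sorted categories)
        converted[name] = col.astype('category').cat.codes

numeric = pd.DataFrame(converted)
print(numeric)
```

With sorted categories `blue`, `green`, `red`, the colour column becomes the codes 2, 0, 2, 1.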

In [192]:
# Convert From Object
def convertFromObject(column):
    # Treat the text column as categorical: each distinct value gets an
    # integer code (categories are sorted, codes start at 0) and the codes
    # are returned as a pandas Series of small integers
    return column.astype('category').cat.codes

# Convert From Boolean
def convertFromBoolean(column):
    # Simply convert from boolean to integer
    # where 0 is False and 1 is True
    return column.astype(int)

# Prepare a single column
# Chooses the conversion function for a column
# by checking its data type
def prepare_column(dtype, column):
    # The np.object, np.int and np.float aliases were removed in NumPy 1.24,
    # so pandas' dtype predicates are used instead
    if pd.api.types.is_object_dtype(dtype) or pd.api.types.is_string_dtype(dtype):
        return convertFromObject(column)
    elif pd.api.types.is_bool_dtype(dtype):
        return convertFromBoolean(column)
    elif pd.api.types.is_numeric_dtype(dtype):
        # int64 and float64 columns are already usable as they are
        return column
    else:
        raise TypeError("Data type %s of column %s is not handled" % (dtype, column.name))

# The overall preparation function
# This will prepare every column in the dataset
# to be read by the neural network
def prepare_dataset(dataset):
    for column in dataset.columns:
        dataset[column] = prepare_column(dataset[column].dtype, dataset[column])
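    labels = dataset.pop('price_usd')
    # Scale prices into the (0, 1) output range of the sigmoid;
    # classify() multiplies by the same factor to convert back to USD
    labels /= 100000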
    print(labels)
    print(labels.max())
    return dataset.to_numpy(), labels.to_numpy()
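`convertFromObject` keeps only the integer codes, so after `prepare_dataset` runs there is no record of which code stands for which manufacturer, colour, and so on. A minimal sketch of keeping that mapping, using pandas' `cat.categories` on a made-up column (the variable names are illustrative): code `i` always refers to `categories[i]`, and pandas sorts the categories when it infers them.

```python
import pandas as pd

# Hypothetical column of manufacturer names
names = pd.Series(['Volkswagen', 'Audi', 'Volkswagen', 'BMW'])

as_category = names.astype('category')
codes = as_category.cat.codes            # one integer code per row
categories = as_category.cat.categories  # code i corresponds to categories[i]

# Lookup table so codes can be turned back into names later
code_to_name = dict(enumerate(categories))
print(codes.tolist())   # [2, 0, 2, 1]
print(code_to_name)     # {0: 'Audi', 1: 'BMW', 2: 'Volkswagen'}

# Decode the column again
decoded = codes.map(code_to_name)
```

Storing `code_to_name` for each categorical column (for example in a dictionary keyed by column name) would be one way to interpret the prepared features later.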

## Using the Functions to Prepare the Data

Below we apply the functions defined in the previous section to our dataset. This step is essential: the network's matrix operations only work on numerical data, so without it the network cannot run at all.

In [193]:
data, labels = prepare_dataset(data)

0        0.109000
1        0.050000
2        0.028000
3        0.099990
4        0.021341
           ...   
38526    0.027500
38527    0.048000
38528    0.043000
38529    0.040000
38530    0.032000
Name: price_usd, Length: 38531, dtype: float64
0.5
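Dividing by 100,000 puts every label in this data set at or below 0.5 (the printed maximum), inside the (0, 1) range a sigmoid output can reach, and `classify` multiplies by 100,000 to get back to dollars. The input features, however, are passed to the network unscaled, so values such as odometer readings in the hundreds of thousands can push sigmoid units into saturation. The sketch below shows the label round trip and, as an assumption not used in the notebook, a per-column min-max scaling of features; `min_max_scale` and the example values are hypothetical.

```python
import numpy as np

PRICE_SCALE = 100000  # the constant prepare_dataset divides by and classify multiplies by

def scale_labels(prices_usd):
    return np.asarray(prices_usd, dtype=float) / PRICE_SCALE

def unscale_predictions(outputs):
    return np.asarray(outputs, dtype=float) * PRICE_SCALE

def min_max_scale(features):
    # Hypothetical helper: rescale each feature column to [0, 1]
    features = np.asarray(features, dtype=float)
    col_min = features.min(axis=0)
    col_range = features.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # avoid dividing by zero for constant columns
    return (features - col_min) / col_range

prices = [10900.0, 5000.0, 50000.0]  # example prices in USD
scaled = scale_labels(prices)
restored = unscale_predictions(scaled)

# Made-up rows: [odometer_value, year_produced, engine_capacity]
X = np.array([[190000, 2010, 2.5], [290000, 2002, 3.0], [402000, 2001, 2.5]])
X_scaled = min_max_scale(X)
```

If feature scaling were adopted, the minimum and range would need to be saved so the same transformation can be applied to any car passed to `classify`.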


## Model Implementation
TBD when we confirm whether we can use multivariate regression; for now, the cells below implement a small feed-forward neural network.
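For comparison with the to-do item on multivariate regression, here is a minimal sketch of ordinary least-squares linear regression with `np.linalg.lstsq`, run on synthetic data with known coefficients rather than the car features. The helper names are hypothetical. Predictions are real-valued; rounding with `np.rint` is one way to return integer prices. Note that a linear model would treat the category codes as ordered quantities.

```python
import numpy as np

def fit_linear_regression(X, y):
    # Append a column of ones so the last coefficient acts as the intercept
    X_design = np.column_stack([X, np.ones(len(X))])
    coefs, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    return coefs[:-1], coefs[-1]

def predict(X, weights, intercept):
    return X @ weights + intercept

# Synthetic data with known coefficients, standing in for the prepared features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 4.0 + rng.normal(scale=0.01, size=200)

weights, intercept = fit_linear_regression(X, y)
preds = predict(X, weights, intercept)

# Integer-valued predictions, if whole-number prices are required
int_preds = np.rint(preds).astype(int)
```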

In [200]:
class NeuralNetwork():
        def __init__(self, layers=(14, 28, 28, 1), learningRate=0.001):
            # Layer sizes and the learning rate are constructor arguments so the
            # architecture is customisable. The defaults are 14 inputs (one per
            # feature column once price_usd is removed), two hidden layers of 28
            # units and a single output unit for the scaled price.
            # Each layer's weights W<i> and biases B<i> are stored in a dictionary
            # and initialised uniformly in [-1, 1] from the layer sizes.
            self.layers = list(layers)
            self.learningRate = learningRate
            self.params = {}
            for i in range(1, len(self.layers)):
                self.params['W' + str(i)] = np.random.uniform(-1, 1, size=(self.layers[i - 1], self.layers[i]))
                self.params['B' + str(i)] = np.random.uniform(-1, 1, size=(self.layers[i]))
        
        def forwardPass(self, X):
            params = self.params
            params['A0'] = np.array(X)
            for i in range(1, len(self.layers)):
                # Z<i> = A<i-1> W<i> + B<i>, then a sigmoid on every layer, output included
                params['Z' + str(i)] = np.dot(params['A' + str(i - 1)], params['W' + str(i)]) + params['B' + str(i)]
                params['A' + str(i)] = self.sigmoid(params['Z' + str(i)])
            return params['A' + str(len(self.layers) - 1)]
        
        def backwardPass(self, Y, output):
            params = self.params
            changes = {}
            for i in reversed(range(1, len(self.layers))):
                if i == len(self.layers) - 1:
                    # Output layer: derivative of the mean squared error times the
                    # derivative of the sigmoid applied in forwardPass
                    error = 2 * (output - Y) / output.shape[0] * self.sigmoid(params['Z' + str(i)], derivative=True)
                else:
                    # Hidden layers: propagate the error back through W<i+1>
                    error = np.dot(error, params['W' + str(i + 1)].T) * self.sigmoid(params['Z' + str(i)], derivative=True)
                # np.outer gives the transpose of dLoss/dW<i>; updateParams transposes it back
                changes['W' + str(i)] = np.outer(error, params['A' + str(i - 1)])
                changes['B' + str(i)] = error
            return changes
        
        def updateParams(self, changes):
            for key, value in changes.items():
                self.params[key] -= self.learningRate * np.transpose(value)
        
        def classify(self, X):
            output = self.forwardPass(X)
            return output * 100000
            
        @staticmethod   
        def sigmoid(x, derivative = False):
            if derivative:
                return ((1/(1+np.exp(-x))) * (1 - 1/(1+np.exp(-x))))
            return 1/(1+np.exp(-x))
        
        @staticmethod 
        def softmax(x, derivative = False):
            exps = np.exp(x - x.max())
            if derivative:
                return exps / np.sum(exps, axis=0) * (1 - exps / np.sum(exps, axis=0))
            return exps / np.sum(exps, axis=0)
        
        
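        def train_function(self, X, Y, epochs=5000):
            # Stochastic gradient descent: one forward pass, backward pass and
            # parameter update per training row, repeated for every epoch
            for epoch in range(epochs):
                for x, y in zip(X, Y):
                    output = self.forwardPass(x)
                    changes = self.backwardPass(y, output)
                    self.updateParams(changes)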
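For reference, these are the gradients a backward pass should produce for this architecture, assuming a sigmoid on every layer (as `forwardPass` applies) and a squared-error loss. Activations are row vectors, $K$ is the index of the last layer, $n$ is the number of outputs (`output.shape[0]`), $\eta$ is `learningRate`, and $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$.

```latex
\begin{aligned}
Z^{i} &= A^{i-1} W^{i} + B^{i}, \qquad A^{i} = \sigma(Z^{i}), \qquad A^{0} = X \\
\mathcal{L} &= \frac{1}{n} \sum_{j=1}^{n} \left(A^{K}_{j} - Y_{j}\right)^{2} \\
\delta^{K} &= \frac{2}{n}\left(A^{K} - Y\right) \odot \sigma'(Z^{K}) \\
\delta^{i} &= \left(\delta^{i+1} \left(W^{i+1}\right)^{\top}\right) \odot \sigma'(Z^{i}), \qquad i = K-1, \dots, 1 \\
\frac{\partial \mathcal{L}}{\partial W^{i}} &= \left(A^{i-1}\right)^{\top} \delta^{i}, \qquad
\frac{\partial \mathcal{L}}{\partial B^{i}} = \delta^{i} \\
W^{i} &\leftarrow W^{i} - \eta \, \frac{\partial \mathcal{L}}{\partial W^{i}}, \qquad
B^{i} \leftarrow B^{i} - \eta \, \frac{\partial \mathcal{L}}{\partial B^{i}}
\end{aligned}
```

`np.outer(error, A)` builds $\left(\delta^{i}\right)^{\top} A^{i-1}$, the transpose of $\partial \mathcal{L} / \partial W^{i}$, which is why `updateParams` transposes each change before subtracting it.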

In [201]:
carsNN = NeuralNetwork()
output = carsNN.forwardPass(data[0])
changes = carsNN.backwardPass(labels[0], output)
carsNN.updateParams(changes)
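Because the backward pass is written by hand, a finite-difference gradient check is a common way to confirm that it matches the loss. The sketch below is standalone rather than calling the notebook's class: it re-implements a small two-layer sigmoid network with the same row-vector conventions and compares the backpropagated gradients with central differences. The function names, layer sizes and input values are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, x):
    # Row-vector convention: z = a @ W + b, sigmoid on every layer
    a1 = sigmoid(x @ params['W1'] + params['B1'])
    a2 = sigmoid(a1 @ params['W2'] + params['B2'])
    return a1, a2

def loss(params, x, y):
    _, out = forward(params, x)
    return np.mean((out - y) ** 2)

def analytic_grads(params, x, y):
    a1, a2 = forward(params, x)
    # d(mean squared error)/d(a2) times sigmoid'(z2) = a2 * (1 - a2)
    delta2 = 2 * (a2 - y) / a2.shape[0] * a2 * (1 - a2)
    delta1 = (delta2 @ params['W2'].T) * a1 * (1 - a1)
    return {'W1': np.outer(x, delta1), 'B1': delta1,
            'W2': np.outer(a1, delta2), 'B2': delta2}

def numeric_grads(params, x, y, eps=1e-6):
    grads = {}
    for key, value in params.items():
        grad = np.zeros_like(value)
        for idx in np.ndindex(value.shape):
            original = value[idx]
            value[idx] = original + eps
            plus = loss(params, x, y)
            value[idx] = original - eps
            minus = loss(params, x, y)
            value[idx] = original  # restore the parameter
            grad[idx] = (plus - minus) / (2 * eps)
        grads[key] = grad
    return grads

rng = np.random.default_rng(1)
params = {'W1': rng.uniform(-1, 1, (4, 5)), 'B1': rng.uniform(-1, 1, 5),
          'W2': rng.uniform(-1, 1, (5, 1)), 'B2': rng.uniform(-1, 1, 1)}
x = rng.uniform(0, 1, 4)  # one made-up, already-scaled input row
y = np.array([0.05])      # one scaled price label

ana = analytic_grads(params, x, y)
num = numeric_grads(params, x, y)
max_diff = max(np.max(np.abs(ana[k] - num[k])) for k in params)
print(max_diff)
```

A maximum difference many orders of magnitude below the gradients themselves indicates the backward pass is consistent with the loss; a large difference points to a mismatched derivative.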

In [202]:
carsNN.classify(data[1])

array([97192.19599943])
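A single prediction says little about the model. According to the label printout earlier, `labels[1]` is 0.05, i.e. $5,000, so this prediction from a network trained on one row is off by roughly $92,000. The sketch below computes the mean absolute error in dollars over many rows, assuming predictions come from `classify` (already multiplied by 100,000) while labels are still in the scaled form returned by `prepare_dataset`. `mean_absolute_error_usd` and the example values are hypothetical.

```python
import numpy as np

PRICE_SCALE = 100000  # prepare_dataset divides prices by this constant

def mean_absolute_error_usd(predictions_usd, scaled_labels):
    # Bring labels back to dollars before comparing with classify's output
    predictions_usd = np.ravel(predictions_usd)
    labels_usd = np.ravel(scaled_labels) * PRICE_SCALE
    return np.mean(np.abs(predictions_usd - labels_usd))

# Example predictions (USD) and scaled labels
preds = np.array([97192.2, 12000.0, 4500.0])
true_scaled = np.array([0.05, 0.10, 0.04])
mae = mean_absolute_error_usd(preds, true_scaled)
```

With the notebook's objects this could be called as `mean_absolute_error_usd([carsNN.classify(x) for x in data[:1000]], labels[:1000])`, ideally on rows held out from training.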