# Written Assignment 07
*author: Logan Reine*

## Introduction

### `MLequations_v3.ipynb` is a machine learning library of my own making.  Nearly all of the functions and equations used to calculate the answers below are defined in that file.  It will be submitted with this assignment as both a `.ipynb` file and a PDF file.

## Headings

In [139]:
%run MLequations_v3.ipynb
f = lambda x: print(f"{x}")

## Data

In [140]:
dose = pd.read_csv("dose.csv")
heat_load = pd.read_csv("heating-load.csv")
k_trick = pd.read_csv("kernel-trick.csv")
oxy_consume = pd.read_csv("oxygen-consumption.csv")
svm_1 = pd.read_csv("svm-1.csv")
svm_2 = pd.read_csv("svm-2.csv")

# 1 Multivariate Linear Regression Model

A multivariate linear regression model has been built to predict the heating load in a residential building based on a set of descriptive features describing the characteristics of the building. Heating load is the amount of heat energy required to keep a building at a specified temperature, usually 65 degrees Fahrenheit, during the winter regardless of outside temperature. The descriptive features used are the overall surface area of the building, the height of the building, the area of the building’s roof, and the percentage of wall area in the building that is glazed. This kind of model would be useful to architects or engineers when designing a new building. The trained model is:

    Heating Load = -26.030 + 0.0497 × Surface Area + 4.942 × Height - 0.090 × Roof Area + 20.523 × Glazing Area

In [141]:
heat_load

Unnamed: 0,ID,Surface Area,Height,Roof Area,Glazing Area
0,1,784.0,3.5,220.5,0.25
1,2,710.5,3.0,210.5,0.1
2,3,563.5,7.0,122.5,0.4
3,4,637.0,6.0,147.0,0.6


In [142]:
w = [-26.030, 0.0497, 4.942, -0.090, 20.523]

for i in range(len(heat_load)):
    d = heat_load.iloc[i, 1:].tolist()
    print(f"\tQuery {i + 1} Prediction: {multi_reg(w,d):.4f}")

	Query 1 Prediction: 15.5176
	Query 2 Prediction: 7.2151
	Query 3 Prediction: 33.7542
	Query 4 Prediction: 34.3647
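
The four predictions above can be reproduced without the `MLequations` library. The following is a minimal sketch that evaluates the trained model equation directly. The name `predict_linear` is hypothetical and is not the library's `multi_reg`. The query rows are copied from the table above.

```python
# Weights from the trained model: intercept first, then one weight per feature.
weights = [-26.030, 0.0497, 4.942, -0.090, 20.523]

# Query rows copied from the table above:
# (Surface Area, Height, Roof Area, Glazing Area)
queries = [
    (784.0, 3.5, 220.5, 0.25),
    (710.5, 3.0, 210.5, 0.10),
    (563.5, 7.0, 122.5, 0.40),
    (637.0, 6.0, 147.0, 0.60),
]


def predict_linear(w, features):
    """Return w[0] + w[1]*x1 + ... + w[m]*xm (hypothetical helper)."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], features))


predictions = [predict_linear(weights, q) for q in queries]
for i, p in enumerate(predictions, start=1):
    print(f"Query {i} prediction: {p:.4f}")
```

Keeping the intercept as `w[0]` and pairing the remaining weights with the features mirrors the order in which the model equation is written.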


# 2 Another Multivariate Linear Regression Model

You are asked to build a model that predicts the amount of oxygen that an astronaut consumes when performing five minutes of intense physical work. The descriptive features for the model will be the age of the astronaut and their average heart rate throughout the work. The regression model is:

    Oxycon = w[0] + w[1] × Age + w[2] × Heart Rate

The table below shows a historical dataset that has been collected for this task.

In [143]:
oxy_consume

Unnamed: 0,ID,Oxycon,Age,Heart Rate
0,1,37.99,41,138
1,2,47.34,42,153
2,3,44.38,37,151
3,4,28.17,46,133
4,5,27.07,48,126
5,6,37.85,44,145
6,7,44.72,43,158
7,8,36.42,46,143
8,9,31.21,37,138
9,10,54.85,38,158


### a. Assuming that the current weights in a multivariate linear regression model are w[0]= -59.50, w[1] = -0.15, and w[2] = 0.60, make a prediction for each training instance using this model.

In [144]:
w = [-59.50, -0.15,  0.60]

for i in range(len(oxy_consume)):
    d = oxy_consume.iloc[i, 2:].tolist()
    print(f"\tOxycon {i + 1} Prediction: {multi_reg(w,d):.2f}")

	Oxycon 1 Prediction: 17.15
	Oxycon 2 Prediction: 26.00
	Oxycon 3 Prediction: 25.55
	Oxycon 4 Prediction: 13.40
	Oxycon 5 Prediction: 8.90
	Oxycon 6 Prediction: 20.90
	Oxycon 7 Prediction: 28.85
	Oxycon 8 Prediction: 19.40
	Oxycon 9 Prediction: 17.75
	Oxycon 10 Prediction: 29.60
	Oxycon 11 Prediction: 19.85
	Oxycon 12 Prediction: 16.85


### b. Calculate the sum of squared errors for the set of predictions generated in part (a).

In [145]:
t = oxy_consume['Oxycon'].tolist()
w = [-59.50, -0.15,  0.60]

errors = squared_error_sum(t, w, oxy_consume, 2)

print(f"\tSum of squared errors: {errors:.4f}")

	Sum of squared errors: 2017.5932
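
As a rough cross-check of parts (a) and (b), the sketch below computes the predictions and the error total in plain Python. It rests on two assumptions. First, it uses only the ten rows displayed in the table above, while twelve predictions are printed in part (a), so its total is not expected to equal the value printed above. Second, it halves the sum of squares; this is a guess at what `squared_error_sum` does, made because the unhalved sum over the ten displayed rows alone already exceeds the printed value. The names `predict` and `half_sse` are hypothetical.

```python
# The ten training rows displayed above: (Oxycon, Age, Heart Rate).
rows = [
    (37.99, 41, 138), (47.34, 42, 153), (44.38, 37, 151), (28.17, 46, 133),
    (27.07, 48, 126), (37.85, 44, 145), (44.72, 43, 158), (36.42, 46, 143),
    (31.21, 37, 138), (54.85, 38, 158),
]
weights = [-59.50, -0.15, 0.60]


def predict(w, age, heart_rate):
    """Oxycon = w[0] + w[1] * Age + w[2] * Heart Rate."""
    return w[0] + w[1] * age + w[2] * heart_rate


def half_sse(w, data):
    """Half the sum of squared errors (the 1/2 factor is an assumption)."""
    return 0.5 * sum((t - predict(w, a, h)) ** 2 for t, a, h in data)


for i, (t, a, h) in enumerate(rows, start=1):
    print(f"Oxycon {i} prediction: {predict(weights, a, h):.2f}")
print(f"Half sum of squared errors (ten displayed rows): {half_sse(weights, rows):.4f}")
```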


### c. Assuming a learning rate of 0.000002, calculate the weights at the next iteration of the gradient descent algorithm.

In [146]:
t = oxy_consume['Oxycon'].tolist()
age = oxy_consume['Age'].tolist()
rate = oxy_consume['Heart Rate'].tolist()
w = [-59.50, -0.15,  0.60]
alpha = 0.000002


print(f"\tNew Weight w[0]: {w[0] + (alpha * error_sum(t, w, oxy_consume, 2)):.8f}")
print(f"\tNew Weight w[1]: {w[1] + (alpha * error_delta(t, w, age, oxy_consume, 2)):.8f}")
print(f"\tNew Weight w[2]: {w[2] + (alpha * error_delta(t, w, rate, oxy_consume, 2)):.8f}")

	New Weight w[0]: -59.49956706
	New Weight w[1]: -0.13174326
	New Weight w[2]: 0.66254392
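
The update used in part (c) adds, to each weight, the learning rate times the sum over training instances of the error multiplied by the matching feature value (with a constant 1 for `w[0]`). A minimal sketch of one such batch step follows. It assumes only the ten rows displayed in the table, so its new weights will differ slightly from those printed above. The names `gradient_step` and `half_sse` are hypothetical.

```python
# The ten training rows displayed above: (Oxycon, Age, Heart Rate).
rows = [
    (37.99, 41, 138), (47.34, 42, 153), (44.38, 37, 151), (28.17, 46, 133),
    (27.07, 48, 126), (37.85, 44, 145), (44.72, 43, 158), (36.42, 46, 143),
    (31.21, 37, 138), (54.85, 38, 158),
]


def predict(w, age, heart_rate):
    return w[0] + w[1] * age + w[2] * heart_rate


def half_sse(w, data):
    """Half the sum of squared errors (the 1/2 factor is an assumption)."""
    return 0.5 * sum((t - predict(w, a, h)) ** 2 for t, a, h in data)


def gradient_step(w, data, alpha):
    """One batch update: w[j] <- w[j] + alpha * sum(error_i * x_i[j])."""
    errors = [t - predict(w, a, h) for t, a, h in data]
    # Prepend a constant 1 so w[0] is updated by the same rule as the others.
    features = [(1, a, h) for _, a, h in data]
    return [
        w[j] + alpha * sum(e * x[j] for e, x in zip(errors, features))
        for j in range(len(w))
    ]


old_w = [-59.50, -0.15, 0.60]
new_w = gradient_step(old_w, rows, alpha=0.000002)
print("New weights:", [round(v, 8) for v in new_w])
print(f"Error before: {half_sse(old_w, rows):.4f}, after: {half_sse(new_w, rows):.4f}")
```

Comparing the error before and after the step is the same check that part (d) performs on the full dataset.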


### d. Calculate the sum of squared errors for a set of predictions generated using the new set of weights calculated in part (c).

In [147]:
w = [-59.49956706, -0.13174326, 0.66254392]

errors = squared_error_sum(t, w, oxy_consume, 2)

print(f"\tSum of squared errors with new weights: {errors:.4f}")

	Sum of squared errors with new weights: 468.2768


# 3 Logistic Regression Model

The effects that can occur when different drugs are taken together can be difficult for doctors to predict. Machine learning models can be built to help predict optimal dosages of drugs so as to achieve a medical practitioner’s goals. In the following figure, the image on the left shows a scatter plot of a dataset used to train a model to distinguish between dosages of two drugs that cause a dangerous interaction and those that cause a safe interaction. There are just two continuous features in this dataset, DOSE1 and DOSE2 (both normalized to the range (−1, 1) using range normalization), and two target levels, dangerous and safe. In the scatter plot, DOSE1 is shown on the horizontal axis, DOSE2 is shown on the vertical axis, and the shapes of the points represent the target level: crosses represent dangerous interactions and triangles represent safe interactions.

In the preceding figure, the image on the right shows a simple linear logistic regression model trained to perform this task. This model is:

P(TYPE = dangerous) = Logistic(0.6168 + 2.7320 × DOSE1 + 2.4809 × DOSE2)

Plainly, this model is not performing well.

### a. Would the similarity-based, information-based, or probability-based predictive modeling approaches already covered in this book be likely to do a better job of learning this model than the simple linear regression model?

    similarity-based

### b. A simple approach to adapting a logistic regression model to learn this type of decision boundary is to introduce a set of basis functions that will allow a non-linear decision boundary to be learned. In this case, a set of basis functions that generate a cubic decision boundary will work well. An appropriate set of basis functions is as follows:

ϕ0(⟨DOSE1, DOSE2⟩) = 1  
ϕ1(⟨DOSE1, DOSE2⟩) = DOSE1  
ϕ2(⟨DOSE1, DOSE2⟩) = DOSE2  
ϕ3(⟨DOSE1, DOSE2⟩) = DOSE1^2  
ϕ4(⟨DOSE1, DOSE2⟩) = DOSE2^2  
ϕ5(⟨DOSE1, DOSE2⟩) = DOSE1^3  
ϕ6(⟨DOSE1, DOSE2⟩) = DOSE2^3  
ϕ7(⟨DOSE1, DOSE2⟩) = DOSE1 × DOSE2

Training a logistic regression model using this set of basis functions leads to the following model:

P(TYPE = dangerous) =  
Logistic(  
−0.848 × ϕ0(⟨DOSE1, DOSE2⟩)  
+1.545 × ϕ1(⟨DOSE1, DOSE2⟩)  
−1.942 × ϕ2(⟨DOSE1, DOSE2⟩)  
+1.973 × ϕ3(⟨DOSE1, DOSE2⟩)  
+2.495 × ϕ4(⟨DOSE1, DOSE2⟩)  
+0.104 × ϕ5(⟨DOSE1, DOSE2⟩)  
+0.095 × ϕ6(⟨DOSE1, DOSE2⟩)  
+3.009 × ϕ7(⟨DOSE1, DOSE2⟩)  
)  

In [148]:
dose

Unnamed: 0,ID,DOSE1,DOSE2
0,1,0.5,0.75
1,2,0.1,0.75
2,3,-0.47,-0.39
3,4,-0.47,0.18


In [149]:
w = [-0.848, 1.545, -1.942, 1.973, 2.495, 0.104, 0.095, 3.009]

for i in range(len(dose)):
    print(f"\tQuery {i + 1} predictions: {logistic(w, i, dose):.4f}")

	Query 1 predictions: 0.8244
	Query 2 predictions: 0.3868
	Query 3 predictions: 0.6303
	Query 4 predictions: 0.1582
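
The model above is a weighted sum of the eight basis-function values passed through the logistic function. The sketch below evaluates it for the four queries using only the standard library. The names `basis`, `logistic_fn`, and `p_dangerous` are hypothetical and are not the library's `logistic` function.

```python
import math

# Weights for phi_0 .. phi_7, in the order the basis functions are listed.
weights = [-0.848, 1.545, -1.942, 1.973, 2.495, 0.104, 0.095, 3.009]

# (DOSE1, DOSE2) pairs copied from the query table above.
queries = [(0.50, 0.75), (0.10, 0.75), (-0.47, -0.39), (-0.47, 0.18)]


def basis(d1, d2):
    """Cubic basis expansion phi_0 .. phi_7 of a (DOSE1, DOSE2) pair."""
    return [1.0, d1, d2, d1 ** 2, d2 ** 2, d1 ** 3, d2 ** 3, d1 * d2]


def logistic_fn(z):
    """Standard logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def p_dangerous(w, d1, d2):
    """P(TYPE = dangerous) for one query."""
    z = sum(wk * phik for wk, phik in zip(w, basis(d1, d2)))
    return logistic_fn(z)


probabilities = [p_dangerous(weights, d1, d2) for d1, d2 in queries]
for i, p in enumerate(probabilities, start=1):
    print(f"Query {i} P(dangerous): {p:.4f}")
```

A value above 0.5 corresponds to a prediction of the dangerous level; a value below 0.5 corresponds to safe.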


# 4 Support Vector Machines

A support vector machine has been built to predict whether a patient is at risk of cardiovascular disease. In the dataset used to train the model, there are two target levels, high risk (the positive level, +1) and low risk (the negative level, -1), and three descriptive features: AGE, BMI, and BLOOD PRESSURE. The support vectors in the trained model are shown in the table below (all descriptive feature values have been standardized).

In [150]:
svm_1

Unnamed: 0,AGE,BMI,BLOOD PRESSURE,RISK
0,-0.4549,0.0095,0.2203,low risk
1,-0.2843,-0.5253,0.3668,low risk
2,0.3729,0.0904,-1.0836,high risk
3,0.558,0.2217,0.2115,high risk


In [151]:
svm_2

Unnamed: 0,ID,AGE,BMI,BLOOD PRESSURE
0,1,-0.8945,-0.3459,0.552
1,2,0.4571,0.4932,-0.4768
2,3,-0.3825,-0.6653,0.2855
3,4,0.7458,0.1253,-0.7986


In [152]:
fsvm_1 = format_frame(svm_1,'high risk')

alpha = [1.6811, 0.2384, 0.2055, 1.7139]
w0 = -0.0216

for i in range(len(svm_2)):  # iterate over the query instances, not the support vectors
    
    input_vector = svm_2.iloc[i, 1:].tolist()
    result = svm(input_vector, fsvm_1, alpha, w0)
    
    print(f"\tQuery {i + 1} prediction: {result:.4f}")

	Query 1 prediction: -2.1063
	Query 2 prediction: 1.1684
	Query 3 prediction: -1.2286
	Query 4 prediction: 1.6225
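
For reference, the four outputs above can be reproduced with a short, library-independent sketch. It assumes that `w0` is added inside the summation, that is, once per support vector, which is how the model equations in Section 5 are written. The more common formulation adds the bias only once and would give slightly different values. The names `dot` and `svm_output` are hypothetical.

```python
# Support vectors: ((AGE, BMI, BLOOD PRESSURE), target),
# with +1 for high risk and -1 for low risk.
support_vectors = [
    ((-0.4549, 0.0095, 0.2203), -1),
    ((-0.2843, -0.5253, 0.3668), -1),
    ((0.3729, 0.0904, -1.0836), +1),
    ((0.5580, 0.2217, 0.2115), +1),
]
alphas = [1.6811, 0.2384, 0.2055, 1.7139]
w0 = -0.0216

# Query instances: (AGE, BMI, BLOOD PRESSURE).
queries = [
    (-0.8945, -0.3459, 0.5520),
    (0.4571, 0.4932, -0.4768),
    (-0.3825, -0.6653, 0.2855),
    (0.7458, 0.1253, -0.7986),
]


def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def svm_output(q):
    """Sum over support vectors of (t * alpha * (d . q) + w0).

    Assumption: w0 is inside the sum, so it is added once per support vector.
    """
    return sum(t * a * dot(d, q) + w0 for (d, t), a in zip(support_vectors, alphas))


outputs = [svm_output(q) for q in queries]
for i, out in enumerate(outputs, start=1):
    label = "high risk" if out > 0 else "low risk"
    print(f"Query {i} output: {out:.4f} ({label})")
```

The sign of the output gives the predicted level: positive for high risk, negative for low risk.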


# 5 Efficient Implementation of the SVM Approach

The use of the kernel trick is key in writing efficient implementations of the support vector machine approach to predictive modelling. The kernel trick is based on the fact that the result of a kernel function applied to a support vector and a query instance is equivalent to the result of calculating the dot product between the support vector and the query instance after a specific set of basis functions has been applied to both. In other words, kernel(d, q) = ϕ(d) ⋅ ϕ(q).

### a. Using the support vector ⟨d[1], d[2]⟩ and the query instance ⟨q[1], q[2]⟩ as examples, show that applying a polynomial kernel with p=2, kernel(d,q) = (d⋅q+1)^2, is equivalent to calculating the dot product of the support vector and query instance after applying the following set of basis functions:  

ϕ0(⟨d[1], d[2]⟩) = d[1]^2  
ϕ1(⟨d[1], d[2]⟩) = d[2]^2  
ϕ2(⟨d[1], d[2]⟩) = sqrt(2) × d[1] × d[2]  
ϕ3(⟨d[1], d[2]⟩) = sqrt(2) × d[1]  
ϕ4(⟨d[1], d[2]⟩) = sqrt(2) × d[2]  
ϕ5(⟨d[1], d[2]⟩) = 1

    Expanding the polynomial kernel (d⋅q + 1)^2 and expanding the dot product ϕ(d)⋅ϕ(q) under the basis functions above give the same six terms, so the two expressions are identical.  Part (d) compares the two approaches numerically.
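
The expansion can be written out term by term. Starting from the dot product of the two transformed vectors and using the six basis functions listed in this part:

$$
\begin{aligned}
\phi(d)\cdot\phi(q)
&= d[1]^2 q[1]^2 + d[2]^2 q[2]^2 + 2\,d[1]d[2]\,q[1]q[2] + 2\,d[1]q[1] + 2\,d[2]q[2] + 1 \\
&= \left(d[1]q[1] + d[2]q[2]\right)^2 + 2\left(d[1]q[1] + d[2]q[2]\right) + 1 \\
&= \left(d \cdot q\right)^2 + 2\left(d \cdot q\right) + 1 \\
&= \left(d \cdot q + 1\right)^2 = \mathrm{kernel}(d, q)
\end{aligned}
$$

The factors of sqrt(2) in ϕ2, ϕ3, and ϕ4 are what produce the coefficients of 2 on the cross terms in the first line.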

### b. A support vector machine model has been trained to distinguish between dosages of two drugs that cause a dangerous interaction and those that interact safely. This model uses just two continuous features, DOSE1 and DOSE2, and two target levels, dangerous (the positive level, +1 ) and safe (the negative level, -1). The support vectors in the trained model are shown in the following table.

In [153]:
k_trick

Unnamed: 0,DOSE1,DOSE2,CLASS
0,0.2351,0.4016,1
1,-0.1764,-0.1916,1
2,0.3057,-0.9394,-1
3,0.559,0.6353,-1
4,-0.66,-0.1175,-1


In the trained model, the value of w0 is 0.3074, and the values of the α parameters are ⟨7.1655, 6.9060, 2.0033, 6.1144, 5.9538⟩.

Using the version of the support vector machine prediction model that uses basis functions (see the Equation below) with the basis functions given in Part (a), calculate the output of the model for a query instance with DOSE1 = 0.90 and DOSE2 = −0.90.

$$M_{\alpha,\phi,w_0}(q) = \sum_{i}\left(t_i \times \alpha[i] \times \left(\phi(d_i) \cdot \phi(q)\right) + w_0\right)$$

ϕ0(⟨d[1], d[2]⟩) = d[1]^2  
ϕ1(⟨d[1], d[2]⟩) = d[2]^2  
ϕ2(⟨d[1], d[2]⟩) = sqrt(2) × d[1] × d[2]  
ϕ3(⟨d[1], d[2]⟩) = sqrt(2) × d[1]  
ϕ4(⟨d[1], d[2]⟩) = sqrt(2) × d[2]  
ϕ5(⟨d[1], d[2]⟩) = 1  

In [154]:
w0 = 0.3074
alpha = [7.1655, 6.9060, 2.0033, 6.1144, 5.9538]
q = [0.90, -0.90]
t = k_trick['CLASS'].tolist()

result = svm_basis(t, q, k_trick, alpha, w0, 0, 2)

print(f"\tPrediction using basis functions: {result:.4f}")

	Prediction using basis functions: -3.0194


### c. Using the version of the support vector machine prediction model that uses a kernel function (see the Equation below) with the polynomial kernel function, calculate the output of the model for a query instance with DOSE1 = 0.22 and DOSE2 = 0.16.

$$M_{\alpha,\mathrm{kernel},w_0}(q) = \sum_{i}\left(t_i \times \alpha[i] \times \mathrm{kernel}(d_i, q) + w_0\right)$$

In [155]:
q = [0.22, 0.16]

result = svm_kernel(t, q, k_trick, alpha, w0, 0, 2)

print(f"\tPrediction using kernel trick: {result:.4f}")

	Prediction using kernel trick: 1.5064


### d. Verify that the answers calculated in Parts (b) and (c) of this question would have been the same if the alternative approach (basis functions or the polynomial kernel function) had been used in each case.

In [156]:
q = [0.90, -0.90]
result = svm_kernel(t, q, k_trick, alpha, w0, 0, 2)
print(f"\tPrediction using kernel trick part b: {result:.4f}")

q = [0.22, 0.16]
result = svm_basis(t, q, k_trick, alpha, w0, 0, 2)
print(f"\tPrediction using basis functions part c: {result:.4f}")

	Prediction using kernel trick part b: -3.9188
	Prediction using basis functions part c: 7.7295
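
Because ϕ(d) ⋅ ϕ(q) = (d ⋅ q + 1)^2 holds for every d and q, the basis-function and kernel versions of the model should return the same value for any query. The sketch below is one library-independent way to check this for both query instances. It assumes `w0` is added inside the summation, as the two model equations are written, and it uses the support vectors as displayed in the table, so its values are not expected to match the outputs recorded above. The names `phi`, `poly_kernel`, `model_basis`, and `model_kernel` are hypothetical and are not the library's `svm_basis` or `svm_kernel`.

```python
import math

# Support vectors: ((DOSE1, DOSE2), target), +1 dangerous and -1 safe.
support_vectors = [
    ((0.2351, 0.4016), +1),
    ((-0.1764, -0.1916), +1),
    ((0.3057, -0.9394), -1),
    ((0.5590, 0.6353), -1),
    ((-0.6600, -0.1175), -1),
]
alphas = [7.1655, 6.9060, 2.0033, 6.1144, 5.9538]
w0 = 0.3074


def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def phi(v):
    """Apply the six basis functions from part (a) to a 2-D vector."""
    d1, d2 = v
    r2 = math.sqrt(2)
    return [d1 ** 2, d2 ** 2, r2 * d1 * d2, r2 * d1, r2 * d2, 1.0]


def poly_kernel(d, q):
    """Polynomial kernel with p = 2: (d . q + 1)^2."""
    return (dot(d, q) + 1) ** 2


def model_basis(q):
    # Assumption: w0 is inside the sum, as in the equation in part (b).
    return sum(t * a * dot(phi(d), phi(q)) + w0
               for (d, t), a in zip(support_vectors, alphas))


def model_kernel(q):
    # Same model, with the dot product of transformed vectors replaced by the kernel.
    return sum(t * a * poly_kernel(d, q) + w0
               for (d, t), a in zip(support_vectors, alphas))


for q in [(0.90, -0.90), (0.22, 0.16)]:
    print(q, f"basis: {model_basis(q):.4f}", f"kernel: {model_kernel(q):.4f}")
```

In a check like this the two columns agree to floating-point precision. A gap such as the one between the basis-function and kernel outputs recorded above would therefore point to a difference in how the two library functions are implemented or called, rather than to the kernel trick itself.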
