# **Machine Learning - Regression**


# **Data Set Description:**
 Title  : **Yacht Hydrodynamics Data Set**

 Link   : https://archive.ics.uci.edu/ml/datasets/yacht+hydrodynamics

**Data Set Information:**

Predicting the residuary resistance of sailing yachts at the initial design stage is of great value for evaluating a ship's performance and for estimating the required propulsive power. Essential inputs include the basic hull dimensions and the boat velocity.

The Delft data set comprises 308 full-scale experiments, which were performed at the Delft Ship Hydromechanics Laboratory for that purpose.
These experiments include 22 different hull forms, derived from a parent form closely related to the Standfast designed by Frans Maas.

**Attribute Information:**

Variations concern hull geometry coefficients and the Froude number:

1. Longitudinal position of the center of buoyancy, adimensional.
2. Prismatic coefficient, adimensional.
3. Length-displacement ratio, adimensional.
4. Beam-draught ratio, adimensional.
5. Length-beam ratio, adimensional.
6. Froude number, adimensional.

The measured variable is the residuary resistance per unit weight of displacement:

7. Residuary resistance per unit weight of displacement, adimensional.


# **Importing the Data Set into Google Colab**

In [None]:
from google.colab import files
uploaded = files.upload()


Saving Yatch.xls to Yatch (1).xls


In [None]:
import io
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

# Read the uploaded Excel file from the in-memory bytes returned by files.upload()
df2 = pd.read_excel(io.BytesIO(uploaded['Yatch.xls']))

# Convert each column to a plain Python list, in column order
res = [df2[column].tolist() for column in df2.columns]

# Six predictors (X1..X6) followed by the target (Y)
X1, X2, X3, X4, X5, X6, Y = res[:7]
df2

| | Longitudinal position | Prismatic coefficient | Length-displacement ratio | Beam-draught ratio | Length-beam ratio | Froude number | Residuary resistance |
|---|---|---|---|---|---|---|---|
| 0 | -2.3 | 0.568 | 4.78 | 3.99 | 3.17 | 0.125 | 0.11 |
| 1 | -2.3 | 0.568 | 4.78 | 3.99 | 3.17 | 0.150 | 0.27 |
| 2 | -2.3 | 0.568 | 4.78 | 3.99 | 3.17 | 0.175 | 0.47 |
| 3 | -2.3 | 0.568 | 4.78 | 3.99 | 3.17 | 0.200 | 0.78 |
| 4 | -2.3 | 0.568 | 4.78 | 3.99 | 3.17 | 0.225 | 1.18 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 303 | -2.3 | 0.600 | 4.34 | 4.23 | 2.73 | 0.350 | 8.47 |
| 304 | -2.3 | 0.600 | 4.34 | 4.23 | 2.73 | 0.375 | 12.27 |
| 305 | -2.3 | 0.600 | 4.34 | 4.23 | 2.73 | 0.400 | 19.59 |
| 306 | -2.3 | 0.600 | 4.34 | 4.23 | 2.73 | 0.425 | 30.48 |


# **Calculating all the required variables**

In [None]:
# Squares of each X variable
x12= [i*i for i in X1]
x22= [i*i for i in X2]
x32= [i*i for i in X3]
x42= [i*i for i in X4]
x52= [i*i for i in X5]
x62= [i*i for i in X6]

# Element-wise products of each X variable with Y
x1y = [i*j for i,j in zip(X1,Y)]
x2y = [i*j for i,j in zip(X2,Y)]
x3y = [i*j for i,j in zip(X3,Y)]
x4y = [i*j for i,j in zip(X4,Y)]
x5y = [i*j for i,j in zip(X5,Y)]
x6y = [i*j for i,j in zip(X6,Y)]

# Element-wise products of each pair of X variables
x1x2 = [i*j for i,j in zip(X1,X2)]
x1x3 = [i*j for i,j in zip(X1,X3)]
x1x4 = [i*j for i,j in zip(X1,X4)]
x1x5 = [i*j for i,j in zip(X1,X5)]
x1x6 = [i*j for i,j in zip(X1,X6)]

# x2x1 equals x1x2, so it is not recomputed
x2x3 = [i*j for i,j in zip(X2,X3)]
x2x4 = [i*j for i,j in zip(X2,X4)]
x2x5 = [i*j for i,j in zip(X2,X5)]
x2x6 = [i*j for i,j in zip(X2,X6)]

x3x4 = [i*j for i,j in zip(X3,X4)]
x3x5 = [i*j for i,j in zip(X3,X5)]
x3x6 = [i*j for i,j in zip(X3,X6)]

x4x5 = [i*j for i,j in zip(X4,X5)]
x4x6 = [i*j for i,j in zip(X4,X6)]


x5x6 = [i*j for i,j in zip(X5,X6)]
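
The element-wise products above are only ever used through their sums. As a minimal sketch, assuming the predictors are stacked into a NumPy array, the same sums can be obtained with two matrix products. The toy values and the names `X`, `y`, `gram`, and `xty` are illustrative and do not come from the notebook.

```python
import numpy as np

# Toy data (made up for illustration): 4 samples, 2 predictors.
X = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [3.0, 1.5],
              [4.0, 3.0]])
y = np.array([1.0, 2.0, 2.5, 4.0])

# gram[i, j] equals sum(Xi * Xj); the diagonal holds the sums of squares.
gram = X.T @ X
# xty[i] equals sum(Xi * y).
xty = X.T @ y

# The same quantities computed the list-comprehension way, for comparison.
x1x2_sum = sum(a * b for a, b in zip(X[:, 0], X[:, 1]))
x1y_sum = sum(a * b for a, b in zip(X[:, 0], y))
```

With six predictors, `gram` would be a 6×6 symmetric matrix, which is why only the upper-triangle products (`x1x2`, `x1x3`, and so on) need to be computed in the cell above.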


# **Least Squares Regression:**

In [None]:
n = len(X1)
m1 = (n*sum(x1y) - sum(X1)*sum(Y))/(n*sum(x12) - sum(X1)*sum(X1))
m2 = (n*sum(x2y) - sum(X2)*sum(Y))/(n*sum(x22) - sum(X2)*sum(X2))
m3 = (n*sum(x3y) - sum(X3)*sum(Y))/(n*sum(x32) - sum(X3)*sum(X3))
m4 = (n*sum(x4y) - sum(X4)*sum(Y))/(n*sum(x42) - sum(X4)*sum(X4))
m5 = (n*sum(x5y) - sum(X5)*sum(Y))/(n*sum(x52) - sum(X5)*sum(X5))
m6 = (n*sum(x6y) - sum(X6)*sum(Y))/(n*sum(x62) - sum(X6)*sum(X6))
c  = (sum(Y) - m1*sum(X1)- m2*sum(X2) - m3*sum(X3) - m4*sum(X4) - m5*sum(X5) - m6*sum(X6) )/n

print(" m1:", m1,"\n",
      "m2:", m2,"\n",
      "m3:", m3,"\n",
      "m4:", m4,"\n",
      "m5:", m5,"\n",
      "m6:", m6,"\n",
      "c:", c)

 m1: 0.19342278824117998 
 m2: -18.59689851373811 
 m3: -0.17777295324286058 
 m4: -0.3435111664280745 
 m5: -0.06268840609878333 
 m6: 121.66757242757322 
 c: -11.127523959316473
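
Each slope in the cell above is computed with the single-predictor least-squares formula, one predictor at a time. The sketch below isolates that formula in a hypothetical helper, `simple_fit`, and applies it to made-up toy data so the arithmetic can be checked by hand. It is an illustration of the formula, not a replacement for the notebook's cell.

```python
def simple_fit(x, y):
    """Single-predictor least-squares slope and intercept (hypothetical helper)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    # Same expression as the m1..m6 lines: (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x * sum_x)
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept

# Toy data, made up for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = simple_fit(x, y)
print("slope:", slope, "intercept:", intercept)
```

For these toy values the numerator is 5·110.2 − 15·30.1 = 99.5 and the denominator is 5·55 − 15² = 50, giving a slope of about 1.99.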


In [None]:
# Predicted Y:
pred_y = []

for i in range(n):
  pred_y.append(m1*X1[i]+m2*X2[i]+m3*X3[i]+m4*X4[i]+m5*X5[i]+m6*X6[i]+c)

# R2 and adjusted R2 values
y_mean = np.mean(Y)  # computed once instead of on every iteration
SS_Residual = SS_total = 0
for i in range(n):
  SS_Residual += (Y[i] - pred_y[i])**2
  SS_total += (Y[i] - y_mean)**2

R2 = 1 - (SS_Residual/SS_total)
Adj_R2 = 1 - (1-R2)*((n-1)/(n-6-1))  # 6 predictors
print("R2 value:", R2, "Adj_R2 value:", Adj_R2)

R2 value: 0.6573437431428346 Adj_R2 value: 0.6505133858632899
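
The R2 and adjusted R2 computation is repeated for every model in this notebook. As a hedged sketch, it could be wrapped in a small function. The name `r2_scores` and the parameter `n_features` are assumptions introduced here, and the data is a made-up toy example.

```python
def r2_scores(y_true, y_pred, n_features):
    """Return (R2, adjusted R2); hypothetical helper, not part of the notebook."""
    n = len(y_true)
    y_mean = sum(y_true) / n
    ss_residual = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_total = sum((t - y_mean) ** 2 for t in y_true)
    r2 = 1 - ss_residual / ss_total
    # Adjusted R2 penalises the number of predictors used by the model.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)
    return r2, adj_r2

# Toy example with one predictor.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 1.9, 3.2, 3.8, 5.0]
r2, adj_r2 = r2_scores(y_true, y_pred, n_features=1)
print("R2 value:", r2, "Adj_R2 value:", adj_r2)
```

In the notebook's setting the call would use `n_features=6`, matching the `n-6-1` term in the cells.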


# **Analytical Gradient Descent (Normal Equations):**

In [None]:
a = np.array([[sum(x12),sum(x1x2),sum(x1x3),sum(x1x4),sum(x1x5),sum(x1x6),sum(X1)], 
              [sum(x1x2),sum(x22),sum(x2x3),sum(x2x4),sum(x2x5),sum(x2x6),sum(X2)],
              [sum(x1x3),sum(x2x3),sum(x32),sum(x3x4),sum(x3x5),sum(x3x6),sum(X3)],
              [sum(x1x4),sum(x2x4),sum(x3x4),sum(x42),sum(x4x5),sum(x4x6),sum(X4)],
              [sum(x1x5),sum(x2x5),sum(x3x5),sum(x4x5),sum(x52),sum(x5x6),sum(X5)],
              [sum(x1x6),sum(x2x6),sum(x3x6),sum(x4x6),sum(x5x6),sum(x62),sum(X6)],
              [sum(X1), sum(X2), sum(X3), sum(X4), sum(X5), sum(X6), n]])
b = np.array([sum(x1y), sum(x2y), sum(x3y), sum(x4y), sum(x5y), sum(x6y), sum(Y)])
# Solve the linear system a @ Sol = b for the six slopes and the intercept
Sol = np.linalg.solve(a, b)
m1, m2, m3, m4, m5, m6, c = Sol
print(" m1:", m1,"\n",
      "m2:", m2,"\n",
      "m3:", m3,"\n",
      "m4:", m4,"\n",
      "m5:", m5,"\n",
      "m6:", m6,"\n",
      "c:", c)

 m1: 0.1938443362247513 
 m2: -6.419375927745614 
 m3: 4.232998632551539 
 m4: -1.765694811473923 
 m5: -4.516431772721753 
 m6: 121.66757242757215 
 c: -19.236660788915305
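
The matrix `a` and vector `b` built above have the form of the normal equations for a linear model with an intercept. A minimal sketch of the same idea, assuming a design matrix with an appended column of ones, is shown below on synthetic data. The names `A`, `beta_normal`, and `beta_lstsq` and the generated data are illustrative only.

```python
import numpy as np

# Synthetic data (made up): 50 samples, 3 predictors, known coefficients plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 4.0 + rng.normal(scale=0.1, size=50)

# Append a column of ones so the last coefficient acts as the intercept c.
A = np.column_stack([X, np.ones(len(X))])

# Normal equations: (A^T A) beta = A^T y.
beta_normal = np.linalg.solve(A.T @ A, A.T @ y)

# np.linalg.lstsq solves the same least-squares problem without forming A^T A.
beta_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)

print("normal equations:", beta_normal)
print("lstsq:           ", beta_lstsq)
```

Both routes should agree to numerical precision. `np.linalg.lstsq` is generally the more numerically stable choice when predictors are strongly correlated.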


In [None]:
# Predicted Y:
pred_y = []

for i in range(n):
  pred_y.append(m1*X1[i]+m2*X2[i]+m3*X3[i]+m4*X4[i]+m5*X5[i]+m6*X6[i]+c)

# R2 and adjusted R2 values
y_mean = np.mean(Y)  # computed once instead of on every iteration
SS_Residual = SS_total = 0
for i in range(n):
  SS_Residual += (Y[i] - pred_y[i])**2
  SS_total += (Y[i] - y_mean)**2

R2 = 1 - (SS_Residual/SS_total)
Adj_R2 = 1 - (1-R2)*((n-1)/(n-6-1))  # 6 predictors
print("R2 value:", R2, "Adj_R2 value:", Adj_R2)

R2 value: 0.6575638322504073 Adj_R2 value: 0.650737862129153


# **Correlation**

In [None]:
from scipy.stats import pearsonr
r1,_ = pearsonr(X1,Y)
r2,_ = pearsonr(X2,Y)
r3,_ = pearsonr(X3,Y)
r4,_ = pearsonr(X4,Y)
r5,_ = pearsonr(X5,Y)
r6,_ = pearsonr(X6,Y)

m1 = r1*(sum(Y)/sum(X1))
m2 = r2*(sum(Y)/sum(X2))
m3 = r3*(sum(Y)/sum(X3))
m4 = r4*(sum(Y)/sum(X4))
m5 = r5*(sum(Y)/sum(X5))
m6 = r6*(sum(Y)/sum(X6))
c= (sum(Y) - m1*sum(X1)- m2*sum(X2) - m3*sum(X3) - m4*sum(X4) - m5*sum(X5) - m6*sum(X6) )/n

print("m1:", m1,"\n",
      "m2:", m2,"\n",
      "m3:", m3,"\n",
      "m4:", m4,"\n",
      "m5:", m5,"\n",
      "m6:", m6,"\n",
      "c:", c)

m1: -0.0850716290081626 
 m2: -0.5315082276991635 
 m3: -0.006503636512838151 
 m4: -0.033114101461447276 
 m5: -0.0033561836741930135 
 m6: 29.5728946463301 
 c: 2.26263831596289


In [None]:
# Predicted Y:
pred_y = []

for i in range(n):
  pred_y.append(m1*X1[i]+m2*X2[i]+m3*X3[i]+m4*X4[i]+m5*X5[i]+m6*X6[i]+c)

# R2 and adjusted R2 values
y_mean = np.mean(Y)  # computed once instead of on every iteration
SS_Residual = SS_total = 0
for i in range(n):
  SS_Residual += (Y[i] - pred_y[i])**2
  SS_total += (Y[i] - y_mean)**2

R2 = 1 - (SS_Residual/SS_total)
Adj_R2 = 1 - (1-R2)*((n-1)/(n-6-1))  # 6 predictors
print("R2 value:", R2, "Adj_R2 value:", Adj_R2)

R2 value: 0.2799234475650735 Adj_R2 value: 0.265569762134477


**Inference:**

**LSR:**

R2     = 0.6573437431428346

Adj_R2 = 0.6505133858632899

**GD:**

R2     =  0.6575638322504073

Adj_R2 =  0.650737862129153

**Correlation:**

R2     =  0.2799234475650735

Adj_R2 =  0.265569762134477

**Hence, GD > LSR > Correlation.** GD gives the highest R2 and adjusted R2, although only marginally above LSR, so it fits this dataset best.

