## Dataset

Property Rates

|#|City|Town|Size|Price|
|-|-|-|-|-|
|1|X|A|1|15|
|2|X|B|1|10|
|3|X|C|1|25|
|4|Y|A|2|30|
|5|Y|C|3|60|
|6|Y|B|4|40|

In [122]:
dataset = [
  ['City', 'Town', 'Size', 'Price'],
  ['X', 'A', 1, 15],
  ['X', 'B', 1, 10],
  ['X', 'C', 1, 25],
  ['Y', 'A', 2, 30],
  ['Y', 'C', 3, 60],
  ['Y', 'B', 4, 40],
]
data, columns = dataset[1:], dataset[0]

In [123]:
columns

['City', 'Town', 'Size', 'Price']

In [124]:
data

[['X', 'A', 1, 15],
 ['X', 'B', 1, 10],
 ['X', 'C', 1, 25],
 ['Y', 'A', 2, 30],
 ['Y', 'C', 3, 60],
 ['Y', 'B', 4, 40]]

## Label-Encoding

In [125]:
# One-hot encode a single column in place: the column at column_idx is removed
# from every row and one 0/1 indicator column per distinct value is appended.
# (Strictly speaking this is one-hot encoding rather than label encoding.)
def encode_column(data, column_idx, column_name='E'):
  # distinct values, in first-seen order
  categories = []
  for row in data:
    if row[column_idx] not in categories:
      categories.append(row[column_idx])

  for row in data:
    val = row.pop(column_idx)
    row.extend(1 if val == category else 0 for category in categories)

  # names of the appended indicator columns, e.g. City_X, City_Y
  return [f'{column_name}_{category}' for category in categories]

In [126]:
# Encode several columns. Note: the rows and the header list are modified in place,
# so `data` and `columns` defined earlier also reflect the encoding afterwards.
def label_encode(dataset, target_columns=()):
  data, columns = dataset[1:], dataset[0]

  for tc in target_columns:
    if tc not in columns:
      raise Exception(f"Target Column {tc} not found!")
    col_idx = columns.index(tc)

    columns += encode_column(data, col_idx, tc)
    del columns[col_idx]

  return [columns] + data

In [127]:
dataset = label_encode(dataset, ['City', 'Town'])

In [128]:
dataset

[['Size', 'Price', 'City_X', 'City_Y', 'Town_A', 'Town_B', 'Town_C'],
 [1, 15, 1, 0, 1, 0, 0],
 [1, 10, 1, 0, 0, 1, 0],
 [1, 25, 1, 0, 0, 0, 1],
 [2, 30, 0, 1, 1, 0, 0],
 [3, 60, 0, 1, 0, 0, 1],
 [4, 40, 0, 1, 0, 1, 0]]
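
The functions above rewrite the rows in place. For comparison, here is a minimal sketch of the same indicator encoding for one column that leaves its input untouched. The helper name `one_hot_column` is hypothetical and not part of the notebook.

```python
def one_hot_column(values):
    """Return (categories, encoded_rows) without modifying `values`."""
    # dict.fromkeys keeps the first-seen order of the distinct values
    categories = list(dict.fromkeys(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


towns = ['A', 'B', 'C', 'A', 'C', 'B']  # the Town column of the dataset
categories, encoded = one_hot_column(towns)
print(categories)  # ['A', 'B', 'C']
print(encoded[1])  # [0, 1, 0]
```

The encoded rows agree with the `Town_A`, `Town_B`, `Town_C` columns shown in the output above.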

## 1. Simple Linear Regression
In simple linear regression, we model the relationship between two variables by fitting a straight line to the data. The equation of the line is:

$${ y = \beta_0 + \beta_1 x }$$

- ${ y }$ is the dependent variable (target).
- ${ x }$ is the independent variable (predictor).
- ${ \beta_0 }$ is the y-intercept (the value of ${ y }$ when ${ x = 0 }$).
- ${ \beta_1 }$ is the slope of the line (the change in ${ y }$ for a one-unit change in ${ x }$).
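
For a single predictor, the least-squares values of the intercept and slope can also be computed directly, which gives a reference point for any iterative fit. The sketch below uses the standard closed-form formulas on the Size and Price columns of the dataset. The function name `fit_line` and the variable names are assumptions made for illustration.

```python
x = [1, 1, 1, 2, 3, 4]        # Size
y = [15, 10, 25, 30, 60, 40]  # Price


def fit_line(x, y):
    """Closed-form least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


b0_exact, b1_exact = fit_line(x, y)
print(b0_exact, b1_exact)  # 7.5 11.25
```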

In [136]:
size  = [row[0] for row in data]
price = [row[1] for row in data]

print("size  =",  size)
print("price =", price)

size  = [1, 1, 1, 2, 3, 4]
price = [15, 10, 25, 30, 60, 40]


In [139]:
x = size
y = price

In [137]:
# Initialize parameters

b0 = 0  # Intercept
b1 = 0  # Slope

learning_rate = 0.01
iterations = 1000

In [138]:
print("Before: b0 =", b0, "and b1 =", b1)

Before: b0 = 0 and b1 = 0


In [140]:
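# Train with stochastic gradient descent: the parameters are updated after every sample

m = len(y)  # number of samples
for _ in range(iterations): # epochs
    for i in range(m):
        # Calculate the prediction
        predicted = b0 + b1 * x[i]

        # Calculate the error
        error = predicted - y[i]

        # Update the parameters (gradient step on 0.5 * error**2)
        b0 -= learning_rate * error
        b1 -= learning_rate * error * x[i]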
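
The two update lines in the loop can be read as one gradient-descent step on the per-sample loss `0.5 * (predicted - y[i]) ** 2`, whose partial derivatives are `error` for `b0` and `error * x[i]` for `b1`. The sketch below checks this numerically with central finite differences. The helper names (`sample_loss`, `analytic_grad`, `numeric_grad`) are hypothetical.

```python
def sample_loss(b0, b1, xi, yi):
    """Half squared error for a single sample."""
    return 0.5 * (b0 + b1 * xi - yi) ** 2


def analytic_grad(b0, b1, xi, yi):
    """Gradient used by the update rule: (error, error * x)."""
    error = (b0 + b1 * xi) - yi
    return error, error * xi


def numeric_grad(b0, b1, xi, yi, h=1e-6):
    """Central finite-difference approximation of the same gradient."""
    g0 = (sample_loss(b0 + h, b1, xi, yi) - sample_loss(b0 - h, b1, xi, yi)) / (2 * h)
    g1 = (sample_loss(b0, b1 + h, xi, yi) - sample_loss(b0, b1 - h, xi, yi)) / (2 * h)
    return g0, g1


# One sample from the dataset (Size = 2, Price = 30) at the initial parameters
grad_exact = analytic_grad(0.0, 0.0, 2, 30)
grad_approx = numeric_grad(0.0, 0.0, 2, 30)
print(grad_exact)   # (-30.0, -60.0)
print(grad_approx)  # approximately the same values
```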

In [141]:
print("After: b0 =", b0, "and b1 =", b1)

After: b0 = 8.414609790730783 and b1 = 10.669631646501303


In [142]:
# predict for the second sample and compare with the actual price
predicted = b0 + b1 * x[1]
error = predicted - y[1]
error_pct = round(abs(error / y[1]) * 100, 2)
print(f"input={x[1]},\npredicted={predicted}\nactual = {y[1]}\nerror = {error}\nerror %age = {error_pct}% ")

input=1,
predicted=19.084241437232087
actual = 10
error = 9.084241437232087
error %age = 90.84% 


In [143]:
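def mse(X, y, b0, b1): # cost function: half the mean squared error, sum(error^2) / (2 * c)
  c = len(y)

  total_error = 0
  for i in range(c):
    predicted = b0 + b1 * X[i]
    total_error += (predicted - y[i]) ** 2 # squared error for sample i
  return total_error / (2 * c) # the extra factor 1/2 is a convention that simplifies the gradient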


In [144]:
mse(x, y, b0, b1)

53.3798408466173
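
Because the cost above divides by `2 * c`, the reported value is half the mean squared error. The following sketch recomputes it with NumPy from the trained parameters printed earlier and derives the plain MSE and RMSE from it. The function name `half_mse` and the `*_trained` variable names are assumptions for illustration.

```python
import numpy as np


def half_mse(x, y, b0, b1):
    """Vectorised cost: mean of squared residuals, divided by 2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    residuals = b0 + b1 * x - y
    return np.mean(residuals ** 2) / 2


x_vals = [1, 1, 1, 2, 3, 4]
y_vals = [15, 10, 25, 30, 60, 40]

# Parameters copied from the "After:" output above
b0_trained, b1_trained = 8.414609790730783, 10.669631646501303

cost = half_mse(x_vals, y_vals, b0_trained, b1_trained)
print(cost)               # should agree with the value reported above
print(2 * cost)           # plain mean squared error
print(np.sqrt(2 * cost))  # root mean squared error, in price units
```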

## 2. Multiple Linear Regression
In multiple linear regression, we model the relationship between one dependent variable and multiple independent variables. The equation is given by:

$${ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_n x_n }$$

- ${ y }$ is the dependent variable.
- ${ x_1, x_2, \ldots, x_n }$ are the independent variables.
- ${ \beta_0, \beta_1, \beta_2, \ldots, \beta_n }$ are the coefficients.
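
In matrix form the same equation gives the predictions for all rows at once as the feature matrix times the coefficient vector, plus the intercept. A minimal sketch with made-up numbers follows. `predict`, `X_demo`, and `w_demo` are hypothetical names, not part of the notebook.

```python
import numpy as np


def predict(X, weights, bias=0.0):
    """Return bias + X @ weights for every row of X."""
    return X @ weights + bias


X_demo = np.array([[1.0, 2.0],
                   [3.0, 4.0]])   # two samples, two features
w_demo = np.array([0.5, -1.0])    # beta_1, beta_2
y_demo = predict(X_demo, w_demo, bias=2.0)  # beta_0 = 2.0
print(y_demo)  # the two predictions are 0.5 and -0.5
```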


In [149]:
import numpy as np

In [162]:
# input and output
X = np.array([row[:1] + row[2:] for row in data])
Y = np.array([row[1] for row in data])

In [163]:
X

array([[1, 1, 0, 1, 0, 0],
       [1, 1, 0, 0, 1, 0],
       [1, 1, 0, 0, 0, 1],
       [2, 0, 1, 1, 0, 0],
       [3, 0, 1, 0, 0, 1],
       [4, 0, 1, 0, 1, 0]])
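
One property of this feature matrix is worth checking before fitting: every row has exactly one City indicator and exactly one Town indicator set, so the six columns are not linearly independent. The sketch below confirms this with `np.linalg.matrix_rank`. The name `X_check` is an assumption, chosen so the notebook's `X` is not overwritten.

```python
import numpy as np

# Same values as X above: Size, City_X, City_Y, Town_A, Town_B, Town_C
X_check = np.array([[1, 1, 0, 1, 0, 0],
                    [1, 1, 0, 0, 1, 0],
                    [1, 1, 0, 0, 0, 1],
                    [2, 0, 1, 1, 0, 0],
                    [3, 0, 1, 0, 0, 1],
                    [4, 0, 1, 0, 1, 0]])

n_features = X_check.shape[1]
rank = np.linalg.matrix_rank(X_check)
print(n_features, rank)  # 6 5

# City_X + City_Y equals Town_A + Town_B + Town_C in every row (both are 1)
city_sum = X_check[:, 1] + X_check[:, 2]
town_sum = X_check[:, 3] + X_check[:, 4] + X_check[:, 5]
print(np.array_equal(city_sum, town_sum))  # True
```

A rank below the number of columns means many different weight vectors produce identical predictions, so individual weights from a fit on this matrix should be interpreted with care.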

In [164]:
Y

array([15, 10, 25, 30, 60, 40])

In [165]:
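weights = np.ones(X.shape[1])  # one weight per feature, initialised to 1.0
bias = 0.0  # defined but not used below; the one-hot City columns sum to 1 in every row, so their weights can absorb the intercept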

learning_rate = 0.01
iterations = 1000

In [166]:
c = len(Y)
for epoch in range(iterations): # epochs
  for i in range(c):
    predicted = np.dot(X[i], weights)  # no separate bias term is added
    error = predicted - Y[i]
    weights -= learning_rate * error * X[i]  # per-sample gradient step
  print("iteration =", epoch, " weights = ", weights)

iteration = 0  weights =  [4.39277934 1.403848   2.00213722 1.37072304 1.30084272 1.73441947]
iteration = 1  weights =  [6.68637625 1.68220709 2.69416185 1.6211149  1.43953078 2.31572326]
iteration = 2  weights =  [8.23595879 1.87536345 3.1760914  1.78997347 1.46984066 2.79164071]
iteration = 3  weights =  [9.28194614 2.01059838 3.51563173 1.90356612 1.42814633 3.19451765]
iteration = 4  weights =  [9.98708171 2.10638954 3.75861629 1.9796779  1.33904224 3.54628569]
iteration = 5  weights =  [10.46152858  2.17525481  3.93607269  2.03035184  1.21914958  3.86182608]
iteration = 6  weights =  [10.77985726  2.22567741  4.06900614  2.06374393  1.07969285  4.15124677]
iteration = 7  weights =  [10.99254489  2.26340901  4.17163754  2.0853788   0.92824393  4.42142382]
iteration = 8  weights =  [11.13375939  2.29235194  4.2535953   2.09899968  0.76990271  4.67704484]
iteration = 9  weights =  [11.22662866  2.31515627  4.32139916  2.10714386  0.60809625  4.92131532]
iteration = 10  weights =  [11

In [172]:
# test on the second sample
predicted = np.dot(weights, X[1])
error = predicted - Y[1]
error_pct = round(abs(error / Y[1]) * 100, 2)
print(f"input={X[1]},\npredicted={predicted}\nactual = {Y[1]}\nerror = {error}\nerror %age = {error_pct}% ")

input=[1 1 0 0 1 0],
predicted=7.136794265700643
actual = 10
error = -2.863205734299357
error %age = 28.63% 
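
As a cross-check on the iteratively trained weights, a least-squares solution can be computed directly with `np.linalg.lstsq`. This is a sketch of an alternative method, not the approach used in this notebook. The names `X_ls`, `Y_ls`, and `w_ls` are assumptions. Because the feature matrix is rank deficient, `lstsq` returns the minimum-norm solution among the many that fit equally well.

```python
import numpy as np

# Same values as X and Y above
X_ls = np.array([[1, 1, 0, 1, 0, 0],
                 [1, 1, 0, 0, 1, 0],
                 [1, 1, 0, 0, 0, 1],
                 [2, 0, 1, 1, 0, 0],
                 [3, 0, 1, 0, 0, 1],
                 [4, 0, 1, 0, 1, 0]])
Y_ls = np.array([15, 10, 25, 30, 60, 40])

# rcond=None selects NumPy's default cutoff for small singular values
w_ls, _, rank_ls, _ = np.linalg.lstsq(X_ls, Y_ls, rcond=None)
pred_ls = X_ls @ w_ls

print(rank_ls)   # effective rank of the feature matrix
print(pred_ls)   # fitted prices for the six rows

# At a least-squares solution the residual is orthogonal to every column
print(np.allclose(X_ls.T @ (pred_ls - Y_ls), 0))
```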
