Deep Learning with NO Libraries: Part 1

You might have encountered articles/educators who told you that Deep Learning is a black box (specifically the hidden layers). The notion that you can't understand something when you can see it work so efficiently and, at times, with jaw-dropping accuracy is just an insult to the brilliant minds that have built such algorithms.

I want to demystify this notion by building Deep Learning algorithms (Neural Networks) from scratch. All you need is a basic understanding of the Python language and high school Mathematics. Let's go straight to implementation.

Neural networks work by predicting, comparing, and learning.

Here we have one input and one output. All we are doing is multiplying the input by a weight (with a single input, the dot product is just one multiplication).
Well, what are weights? Think of a weight as a knob that can be tweaked to get the desired result, just like you turn the volume knob to get that perfect sound that is neither too loud nor inaudible. We will see much more of weights in the upcoming parts.

The result is our prediction. This is a very naive but easy-to-understand neural network. It cannot yet compare and learn; we will get to that in later parts.
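To see the weight acting as a knob, here is a small sketch (not from the original post) that feeds the same input through the one-weight network with a few different weights. The helper name `predict` and the weight values are illustrative assumptions.

```python
# Hypothetical helper: a one-input, one-weight "network"
def predict(input_value, weight):
    return input_value * weight

input_value = 10  # e.g. number_of_toes[2]

# Turning the knob: the same input gives a different prediction for each weight
for weight in [0.05, 0.1, 0.2]:
    print(weight, predict(input_value, weight))
```

A bigger weight scales the input up, a smaller one scales it down; learning, later on, is just finding the right setting for this knob.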

# The network
weight = 0.1

def neural_network(input, weight):
    # Multiply the single input by the weight to get a prediction
    prediction = input * weight
    return prediction

# How we use the network to predict something
number_of_toes = [8.5, 9.5, 10, 9]
input = number_of_toes[2]
pred = neural_network(input, weight)
print(pred)


1.0


ùêÉùêûùêûùê© ùêãùêûùêöùê´ùêßùê¢ùêßùê† ùê∞ùê¢ùê≠ùê° ùêçùêé ùêãùê¢ùêõùê´ùêöùê´ùê¢ùêûùê¨: ùêèùêöùê´ùê≠ 2

In the previous post, we made an elementary neural network that makes a prediction from a single input and a weight.

In this post, we will use multiple inputs and weights to make the prediction. Think of the inputs as one row of a data frame with one target. We have randomly initialized the weights to predict that one target. Everything is almost the same as in the previous post, except that we use a new function, w_sum(), to compute the dot product: w_sum(a, b) multiplies the two vectors element-wise and adds up the products.

ùêçùêéùêìùêÑ: This is not a full-on neural network. It is a very easy way to look at the deep learning framework.

ùêìùê´ùêöùê¢ùêßùê¢ùêßùê† ùê®ùêß ùê¶ùêÆùê•ùê≠ùê¢ùê©ùê•ùêû ùêàùêßùê©ùêÆùê≠ùê¨:

def w_sum(a, b):
    '''
    Weighted sum of inputs and weights.
    Returns the dot product of two vectors.
    The assert statement checks that both vectors have the same length;
    if the condition is not True, Python raises an AssertionError.
    '''
    # Again the same idea: input * weight (a[i] * b[i]), then add everything up
    assert len(a) == len(b)
    output = 0
    for i in range(len(a)):  # 3 inputs
        output = output + (a[i] * b[i])
    return output


''' WEIGHTS '''

weights = [0.1, 0.2, 0]


''' INPUTS '''

x1 = [8.5, 9.5, 9.9, 9.0]
x2 = [0.65, 0.8, 0.8, 0.9]
x3 = [1.5, 1.2, 0.5, 1.0]


def neural_network(input, weights):
    pred = w_sum(input, weights)
    return pred


input = [x1[0], x2[0], x3[0]]  # first row: one value from each input

''' DRIVER CODE '''

pred = neural_network(input, weights)
print(pred)

0.9800000000000001


ùêÉùêûùêûùê© ùêãùêûùêöùê´ùêßùê¢ùêßùê† ùê∞ùê¢ùê≠ùê° ùêçùêé ùê•ùê¢ùêõùê´ùêöùê´ùê≤: ùêèùêöùê´ùê≠ 3

I know I said no libraries, but it's just NumPy. It's no biggie. This is simply the NumPy implementation of the previous code.

ùêçùêÆùê¶ùê©ùê≤ ùêàùê¶ùê©ùê•ùêûùê¶ùêûùêßùê≠ùêöùê≠ùê¢ùê®ùêß

import numpy as np

weights = np.array([0.1 , 0.2 , 0])

def neural_network(inputs, weights):
    # np.dot computes the weighted sum (dot product) in a single call
    pred = np.dot(inputs, weights)
    return pred

''' INPUTS '''

x1 = [8.5 , 9.5 , 9.9 , 9.0]
x2 = [0.65 , 0.8 , 0.8 , 0.9]
x3 = [1.5 , 1.2 , 0.5 , 1.0]

input = [x1[0] , x2[0] , x3[0]]


''' DRIVER CODE '''

pred = neural_network(input , weights)
print(pred)

0.9800000000000001
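One payoff of NumPy is that a single product can handle many rows at once. As a sketch (an extension, not part of the original post), stacking x1, x2 and x3 as columns gives a 4×3 matrix with one row per game, and one matrix-vector product predicts all four games with the same weights. The variable names `games` and `preds` are my own.

```python
import numpy as np

weights = np.array([0.1, 0.2, 0])

x1 = [8.5, 9.5, 9.9, 9.0]
x2 = [0.65, 0.8, 0.8, 0.9]
x3 = [1.5, 1.2, 0.5, 1.0]

# Shape (4, 3): each row holds the three inputs for one game
games = np.array([x1, x2, x3]).T

# One weighted sum per row -> one prediction per game
preds = games @ weights
print(preds)
```

The first entry is the same 0.98 as above; the other three are the predictions for games 2 to 4.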


ùêÉùêûùêûùê© ùêãùêûùêöùê´ùêßùê¢ùêßùê† ùê∞ùê¢ùê≠ùê° ùêçùêé ùê•ùê¢ùêõùê´ùêöùê´ùê¢ùêûùê¨: ùêèùêöùê´ùê≠ 4

Let's dive straight into predicting multiple outputs from multiple inputs. (Now that looks more like a neural network.)

You can view this architecture either as three weights coming out of each input node or as three weights entering each output node. I find the latter more helpful right now: think of this neural network as three separate dot products, or three separate weighted sums of the inputs. Each output node weighs the inputs individually before making its prediction.

Here we have a new function, vect_mat_mul(). It iterates through each row of weights (each row is a vector) and uses w_sum() to compute one prediction per row. In other words, it performs three weighted sums in succession and stores the results in an output vector.

Try the code out yourself. 😄
OUTPUT (rounded to two decimals): [0.62, 1.15, 2.18]

def w_sum(a, b):
    '''
    Weighted sum of inputs and weights for ONE output node.
    Returns the dot product of two vectors.

    We take the weighted sum for one output (one row of the weight matrix) at a time:
    the input is multiplied element-wise with a row such as [0.1, 0.1, -0.3]
    and the products are added up. The same is done for each row.

    The assert raises an AssertionError if the two vectors differ in length.
    '''
    assert len(a) == len(b)
    output = 0
    for i in range(len(a)):  # 3 inputs
        output = output + (a[i] * b[i])
    return output


def vect_mat_mul(vector, matrix):
    '''
    Multiply the input vector with the matrix of weights.
    Iterates through each row of weights (each row is a vector)
    and returns a list with one output per node.
    '''
    output = [0] * len(matrix)
    for i in range(len(matrix)):
        # One weighted sum per row: a 3x3 matrix and a 3-element vector give 3 outputs
        output[i] = w_sum(vector, matrix[i])
    return output


def neural_network(inputs, weights):
    pred = vect_mat_mul(inputs, weights)
    return pred


''' WEIGHTS '''

           # X1   X2    X3
weights = [[0.1, 0.1, -0.3],  # Y1
           [0.1, 0.2,  0.0],  # Y2
           [0.1, 1.3,  0.1]]  # Y3


''' INPUTS '''

x1 = [8.5, 9.5, 9.9, 9.0]
x2 = [0.65, 0.8, 0.8, 0.9]
x3 = [1.5, 1.2, 0.5, 1.0]

input = [x1[2], x2[1], x3[0]]
pred = neural_network(input, weights)
print([round(p, 2) for p in pred])

[0.62, 1.15, 2.18]
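The row-by-row loop in vect_mat_mul() is exactly a matrix-vector product. A NumPy sketch (my addition, reusing the weights and the input [9.9, 0.8, 1.5] from the cell above) makes the orientation explicit: each row of the weight matrix belongs to one output node, so the product is `weights @ inputs`. Swapping the order to `inputs @ weights` is an easy slip: it still runs on a 3×3 matrix, but it sums down the columns and gives a different answer.

```python
import numpy as np

weights = np.array([[0.1, 0.1, -0.3],   # Y1
                    [0.1, 0.2,  0.0],   # Y2
                    [0.1, 1.3,  0.1]])  # Y3
inputs = np.array([9.9, 0.8, 1.5])  # x1[2], x2[1], x3[0]

# Row-wise weighted sums: what vect_mat_mul() computes
row_wise = weights @ inputs
# The other order sums down the columns instead (a different network)
column_wise = inputs @ weights

print(row_wise.round(2), column_wise.round(2))
```

With a square weight matrix nothing raises an error when the order is swapped, so it is worth checking one output by hand.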


ùêÉùêûùêûùê© ùêãùêûùêöùê´ùêßùê¢ùêßùê† ùê∞ùê¢ùê≠ùê° ùêçùêé ùêãùê¢ùêõùê´ùêöùê´ùê¢ùêûùê¨!!! ùêèùêöùê´ùê≠ 5

So far we have made predictions. But how do we measure how good a prediction is? Is there a measure for it? Yes: how good or bad a prediction is, is measured by the error. Basically, we subtract the goal from the prediction to find the error.

As seen in the code below, we square the difference. There are two main reasons to do so:
1. Larger errors are penalised more than smaller ones.
2. The squared error is continuous and easy to differentiate, which we will need for gradient descent in a future post. (We will first take a different approach to updating the weights, but keep these points in mind.)
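A quick sketch of both reasons, plus one more: squaring makes every error positive, so misses in opposite directions cannot cancel each other out. The function name `squared_error` and the sample numbers are mine, chosen for illustration.

```python
def squared_error(pred, goal):
    return (pred - goal) ** 2

goal = 0.8
# Two predictions that miss by the same amount in opposite directions
over, under = 0.9, 0.7

raw_total = (over - goal) + (under - goal)            # the misses cancel out
squared_total = squared_error(over, goal) + squared_error(under, goal)

# A miss five times larger costs about 25 times more once squared
small, large = squared_error(0.7, goal), squared_error(0.3, goal)
print(raw_total, squared_total, large / small)

# The derivative is simple: d/dpred (pred - goal)^2 = 2 * (pred - goal)
pred, eps = 0.25, 1e-6
numeric = (squared_error(pred + eps, goal) - squared_error(pred - eps, goal)) / (2 * eps)
print(numeric, 2 * (pred - goal))
```

The last line compares a numerical slope with the formula 2 * (pred - goal); both come out at about -1.1 for the prediction of 0.25 used in this part.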


ùêÇùê®ùê¶ùê©ùêöùê´ùêû: ùêÉùê®ùêûùê¨ ùê®ùêÆùê´ ùêßùêûùê≠ùê∞ùê®ùê´ùê§ ùê¶ùêöùê§ùêû ùê†ùê®ùê®ùêù ùê©ùê´ùêûùêùùê¢ùêúùê≠ùê¢ùê®ùêßùê¨?

weight = 0.5
input = 0.5
goal_pred = 0.8

pred = input * weight

error = (pred - goal_pred) ** 2
print(error)

0.30250000000000005


ùêÉùêûùêûùê© ùêãùêûùêöùê´ùêßùê¢ùêßùê† ùê∞ùê¢ùê≠ùê° ùêçùêé ùêãùê¢ùêõùê´ùêöùê´ùê¢ùêûùê¨!!! ùêèùêÄùêëùêì 6

Let's look at a very naive way to update the weights: manually checking which nearby weight value gives a lower error. The method is simple. After making a prediction, you predict two more times, once with a slightly higher weight and once with a slightly lower weight. You then move the weight in whichever direction gave the smaller error. Repeating this enough times drives the error close to 0; in the run below it ends at about 1e-27.

ùêéùêîùêìùêèùêîùêì:

Error:0.30250000000000005 Prediction:0.25

Error:0.3019502500000001 Prediction:0.2505

Error:0.30140100000000003 Prediction:0.251

...

Error:1.000000000065505e-06 Prediction:0.7989999999999673

Error:2.5000000003280753e-07 Prediction:0.7994999999999672

Error:1.0799505792475652e-27 Prediction:0.7999999999999672


weight = 0.5
input = 0.5
goal_pred = 0.8

step_amount = 0.001   # how far the weight moves each iteration (learning rate)

prediction_list = []
error_list = []

for i in range(1101):
    prediction = input * weight
    error = (prediction - goal_pred) ** 2   # use the current prediction

    print("Error:" + str(error) + " Prediction:" + str(prediction))
    prediction_list.append(prediction)
    error_list.append(error)

    # Try a slightly higher weight
    up_prediction = input * (weight + step_amount)
    up_error = (goal_pred - up_prediction) ** 2

    # Try a slightly lower weight
    down_prediction = input * (weight - step_amount)
    down_error = (goal_pred - down_prediction) ** 2

    # Move the weight in whichever direction lowered the error
    if down_error < up_error:
        weight = weight - step_amount
    if down_error > up_error:
        weight = weight + step_amount

print(' Prediction is ', prediction_list)
print('     ')
print(' Error is ', error_list)


Error:0.30250000000000005 Prediction:0.25
Error:0.3019502500000001 Prediction:0.2505
Error:0.30140100000000003 Prediction:0.251
...
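Why 1101 iterations? A back-of-the-envelope check (my own, using the values above): the perfect weight is goal_pred / input = 1.6, which is 1.1 away from the starting weight of 0.5. With step_amount = 0.001 that is 1100 steps, and the first iteration only records the starting prediction, so 1101 passes bring the prediction to 0.8.

```python
weight, input_value, goal_pred, step_amount = 0.5, 0.5, 0.8, 0.001

# The weight that makes the prediction exact: input * weight == goal_pred
best_weight = goal_pred / input_value

# Each step moves the weight by step_amount, so the number of steps needed is
steps_needed = round((best_weight - weight) / step_amount)
print(best_weight, steps_needed)
```

Repeatedly adding 0.001 in floating point accumulates small rounding errors, which is why the final prediction in the output is 0.7999999999999672 rather than exactly 0.8.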

ùêÉùêûùêûùê© ùêãùêûùêöùê´ùêßùê¢ùêßùê† ùê∞ùê¢ùê≠ùê° ùêçùêé ùêãùê¢ùêõùê´ùêöùê´ùê¢ùêûùê¨!!! ùêèùêöùê´ùê≠ 7

In the previous post, I showed how to update the weights in a naive but intuitive way. Now we will look at a problem with this approach: if the step_amount (think of it as the learning rate) is too large, the prediction and the error keep oscillating within a fixed range instead of settling on the goal.
I have also made a visualisation to make this clearer (last page in the PDF).

ùêãùêûùê≠ùê¨ ùêîùêßùêùùêûùê´ùê¨ùê≠ùêöùêßùêù ùê≠ùê°ùêû ùê´ùêûùêöùê¨ùê®ùêß ùêõùêûùê°ùê¢ùêßùêù ùê¢ùê≠!!!
The error and prediction oscillate because the weight is always adjusted by the same fixed step amount (in this case, 0.4), no matter how close the prediction already is to the goal. The perfect weight here is 0.8 / 0.5 = 1.6, which the search cannot land on, so the network overshoots it and then alternates between a weight to the left and a weight to the right of it, causing the prediction and error to oscillate.
The example code only checks whether the error is smaller when the weight is increased or decreased by the step amount, so it uses the direction of improvement but not its size. Gradient descent goes further: it uses the derivative (gradient) of the error to choose both the direction and the size of each update, so the steps shrink as the error shrinks. In practice, neural networks are trained with gradient-based methods such as stochastic gradient descent, with backpropagation used to compute the gradients through more complex architectures. A learning rate that is too large can still make them oscillate or diverge.

ùêéùêîùêìùêèùêîùêì:
Error:0.30250000000000005 Prediction:0.25
Error:0.12250000000000003 Prediction:0.45
Error:0.022500000000000006 Prediction:0.65
Error:0.0025000000000000044 Prediction:0.8500000000000001
Error:0.022499999999999975 Prediction:0.6500000000000001
Error:0.0025000000000000044 Prediction:0.8500000000000001
Error:0.022499999999999975 Prediction:0.6500000000000001
Error:0.0025000000000000044 Prediction:0.8500000000000001
Error:0.022499999999999975 Prediction:0.6500000000000001
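The arithmetic behind the oscillation, sketched with the values from this example: the perfect weight is 0.8 / 0.5 = 1.6, but starting from 0.5 with a step of 0.4 the weight can only take the values 0.5 + k × 0.4. 1.6 is not among them, so the best the search can do is hop between the two neighbours, about 1.3 and 1.7. The names `reachable`, `below` and `above` are my own.

```python
weight, input_value, goal_pred, step_amount = 0.5, 0.5, 0.8, 0.4

best_weight = goal_pred / input_value   # 1.6

# With a fixed step the weight can only ever be 0.5 + k * 0.4
reachable = [weight + k * step_amount for k in range(5)]
below = max(w for w in reachable if w < best_weight)   # about 1.3
above = min(w for w in reachable if w > best_weight)   # about 1.7

# The two predictions the network bounces between (about 0.65 and 0.85)
print(input_value * below, input_value * above)
```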

weight = 0.5
input = 0.5
goal_pred = 0.8


'''
With a fixed step_amount,
unless the perfect weight is exactly n * step_amount away,
the network will eventually overshoot by some amount less than step_amount.
When it does,
it will then start alternating back and forth between each side of goal_pred.
'''
step_amount = 0.4   # a much larger learning rate

prediction_list = []
error_list = []

for i in range(1101):
    prediction = input * weight
    error = (prediction - goal_pred) ** 2   # use the current prediction

    print("Error:" + str(error) + " Prediction:" + str(prediction))
    prediction_list.append(prediction)
    error_list.append(error)

    up_prediction = input * (weight + step_amount)
    up_error = (goal_pred - up_prediction) ** 2

    down_prediction = input * (weight - step_amount)
    down_error = (goal_pred - down_prediction) ** 2

    if down_error < up_error:
        weight = weight - step_amount
    if down_error > up_error:
        weight = weight + step_amount


Error:0.30250000000000005 Prediction:0.25
Error:0.12250000000000003 Prediction:0.45
Error:0.022500000000000006 Prediction:0.65
Error:0.0025000000000000044 Prediction:0.8500000000000001
Error:0.022499999999999975 Prediction:0.6500000000000001
Error:0.0025000000000000044 Prediction:0.8500000000000001
...
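For contrast, a minimal sketch of the gradient-style update used in Part 8, applied to this single-weight problem. The learning rate alpha = 1.0 is my choice for illustration. Because the step is proportional to the error, it shrinks as the prediction approaches 0.8: each update multiplies the remaining miss by 1 - alpha * input**2 = 0.75, so the prediction converges instead of bouncing.

```python
weight, input_value, goal_pred = 0.5, 0.5, 0.8
alpha = 1.0   # assumed learning rate for this sketch

for _ in range(100):
    pred = input_value * weight
    delta = pred - goal_pred              # signed miss: how far and in which direction
    weight_delta = delta * input_value    # scaled by the input, as in Part 8
    weight -= alpha * weight_delta        # the step shrinks as the error shrinks

print(weight, input_value * weight)
```

After 100 updates the remaining miss is roughly 0.55 * 0.75**100, far below anything the fixed-step search could reach with a step of 0.4.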

🧠Deep Learning with NO Libraries!!! Part 8

In this part, we implement gradient descent learning with multiple outputs, over several steps of learning. The code implements a simple neural network and demonstrates how to train it with a basic form of gradient descent.

📜The neural_network() function takes a single input value and a list of weights, multiplies the input by each weight, and returns one prediction per output. The delta() function calculates the difference between the predicted and the true output values. The weight_delta() function computes the weight updates from the deltas and the input value.

The main loop runs for 200 iterations and performs the following steps in each iteration:

⚡1. Forward pass: It calls the neural_network() function to obtain a prediction.

⚡2. Delta calculation: It calculates the deltas between the prediction and the true values using the delta() function.

⚡3. Error calculation: It computes the squared errors between the prediction and the true values.

⚡4. Weight update: It updates the weights using the calculated weight deltas and the learning rate (alpha).

⚡5. Logging: It stores the predictions, errors, weight deltas, and updated weights for analysis and visualization.

This code provides a basic understanding of how a neural network can be trained using gradient descent to minimize the prediction errors. It demonstrates the iterative process of adjusting weights to improve the network's performance over time.
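To check these steps against Iteration 1 in the output, here is one pass written with NumPy arrays. This is a vectorised sketch of my own; the original code uses plain lists and loops, but the numbers should match.

```python
import numpy as np

weights = np.array([0.3, 0.2, 0.9])
alpha = 0.1
input_value = 0.65                  # wlrec[0]
true = np.array([0.1, 1.0, 0.1])    # hurt[0], win[0], sad[0]

pred = input_value * weights                # 1. forward pass
deltas = pred - true                        # 2. delta: pred - true
errors = deltas ** 2                        # 3. squared error per output
weight_deltas = deltas * input_value        # direction and amount for each weight
weights = weights - alpha * weight_deltas   # 4. weight update

print(pred, deltas, errors, weights)
```

Each output node has its own weight and its own delta, so the three updates are independent of each other; NumPy simply does all three at once.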

📜OUTPUT:
[0.061750000000000006, -0.5655, 0.3152500000000001]
Iteration:1
Delta:[0.095, -0.87, 0.4850000000000001]
Weight_Deltas:[0.061750000000000006, -0.5655, 0.3152500000000001]
Weights:[0.293825, 0.25655, 0.868475]
Pred:[0.195, 0.13, 0.5850000000000001]
Error:[0.009025, 0.7569, 0.2352250000000001]
-----------------------------------
[0.059141062499999994, -0.541607625, 0.3019306875]
Iteration:2
Delta:[0.09098624999999999, -0.8332425, 0.46450875]
Weight_Deltas:[0.059141062499999994, -0.541607625, 0.3019306875]
Weights:[0.28791089375, 0.3107107625, 0.83828193125]
Pred:[0.19098625, 0.1667575, 0.56450875]
Error:[0.008278497689062499, 0.69429306380625, 0.21576837882656252]
...
[1.1476697434897183e-05, -0.00010510238703509978, 5.859156058863197e-05]
Iteration:200
Delta:[1.765645759214951e-05, -0.00016169598005399966, 9.014086244404917e-05]
Weight_Deltas:[1.1476697434897183e-05, -0.00010510238703509978, 5.859156058863197e-05]
Weights:[0.15387216995732136, 1.538223285654005, 0.15397897294000892]
Pred:[0.10001765645759216, 0.999838304019946, 0.10009014086244405]
Error:[3.117504947033741e-10, 2.6145589965623455e-08, 8.125375082156995e-09]
-------------------------------------


Gradient Descent Learning with Multiple Outputs, over several steps of learning.

weights = [0.3, 0.2, 0.9]
alpha = 0.1  # learning rate
wlrec = [0.65, 1.0, 1.0, 0.9]

# targets (true values)
hurt = [0.1, 0.0, 0.0, 0.1]
win  = [1,   1,   8,   1]
sad  = [0.1, 0.0, 0.1, 0.2]

input = wlrec[0]
true = [hurt[0], win[0], sad[0]]

def neural_network(input, weights):
    # One input, three outputs: multiply the input by each weight
    output = [0, 0, 0]
    for i in range(len(weights)):
        output[i] = input * weights[i]
    return output

def delta(pred, true):
    # Difference between prediction and truth for each output (pred - true)
    output = [0, 0, 0]
    for i in range(len(true)):
        output[i] = pred[i] - true[i]
    return output

def weight_delta(deltas, input):
    # Direction and amount to move each weight: its delta scaled by the input
    output = [0, 0, 0]
    for i in range(len(deltas)):
        output[i] = deltas[i] * input
    return output

prediction_list = []
error_list = []
direction_and_amount_list = []
weight_list = []


for iteration in range(200):
    prediction = neural_network(input, weights)
    prediction_list.append(prediction)

    deltas = delta(prediction, true)

    error = [d ** 2 for d in deltas]  # squared error per output
    error_list.append(error)

    weight_deltas = weight_delta(deltas, input)
    direction_and_amount_list.append(weight_deltas)

    print(weight_deltas)

    # Use a separate loop variable so the iteration counter is not overwritten
    for j in range(len(weights)):
        weights[j] = weights[j] - alpha * weight_deltas[j]   # w_new = w_old - lr * weight_delta

    weight_list.append(weights.copy())  # copy, otherwise every entry is the same list
    print("Iteration:" + str(iteration + 1))
    print("Deltas:" + str(deltas))
    print("Weight_deltas:" + str(weight_deltas))
    print("Weights:" + str(weights))
    print("Pred:" + str(prediction))
    print("Error:" + str(error))

    print('-----------------------------')

[0.061750000000000006, -0.5655, 0.3152500000000001]
Iteration:1
Deltas:[0.095, -0.87, 0.4850000000000001]
Weight_deltas:[0.061750000000000006, -0.5655, 0.3152500000000001]
Weights:[0.293825, 0.25655, 0.868475]
Pred:[0.195, 0.13, 0.5850000000000001]
Error:[0.009025, 0.7569, 0.2352250000000001]
-----------------------------
[0.059141062499999994, -0.541607625, 0.3019306875]
Iteration:2
Deltas:[0.09098624999999999, -0.8332425, 0.46450875]
Weight_deltas:[0.059141062499999994, -0.541607625, 0.3019306875]
Weights:[0.28791089375, 0.3107107625, 0.83828193125]
Pred:[0.19098625, 0.1667575, 0.56450875]
Error:[0.008278497689062499, 0.69429306380625, 0.21576837882656252]
-----------------------------
...
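Why is weight_delta = delta × input the right direction to move? A small numerical check (my addition, using the starting weights above): the derivative of one node's squared error (input·w − true)² with respect to its weight w is 2·(input·w − true)·input, which is exactly twice weight_delta. The code drops the factor 2; it is simply absorbed into the learning rate alpha. The helper name `node_error` is hypothetical.

```python
input_value = 0.65
weights = [0.3, 0.2, 0.9]
true = [0.1, 1.0, 0.1]

def node_error(w, target):
    """Squared error of one output node as a function of its weight."""
    return (input_value * w - target) ** 2

eps = 1e-6
ratios = []
for w, target in zip(weights, true):
    # Numerical slope of the error curve at w (central difference)
    numeric = (node_error(w + eps, target) - node_error(w - eps, target)) / (2 * eps)
    # weight_delta as computed in the training loop: (pred - true) * input
    weight_delta = (input_value * w - target) * input_value
    ratios.append(numeric / weight_delta)

print(ratios)  # each ratio is close to 2
```

The sign matters too: a positive weight_delta means the error grows as the weight grows, so subtracting alpha * weight_delta moves the weight downhill.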