# Introduction:

#### What are Perceptrons:
In machine learning, the perceptron is an algorithm for supervised learning of binary classifiers. A binary classifier is a function which can decide whether or not an input, represented by a vector of numbers, belongs to some specific class. It is a type of linear classifier, i.e. a classification algorithm that makes its predictions based on a linear predictor function combining a set of weights with the feature vector.
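To make the "linear predictor" idea concrete, here is a minimal sketch of how a perceptron turns a feature vector into a 0/1 decision. The function name `perceptron_predict` and the weight and bias values are made up for illustration; they are not learned from any data.

```python
import numpy as np

def perceptron_predict(features, weights, bias):
    """Return 1 if the linear score w . x + b is positive, else 0."""
    score = np.dot(weights, features) + bias
    return int(score > 0)

# Hypothetical weights for a one-feature classifier that splits at 0.5.
weights = np.array([1.0])
bias = -0.5

low = perceptron_predict(np.array([0.2]), weights, bias)
high = perceptron_predict(np.array([0.9]), weights, bias)
print(low, high)
```

Learning a perceptron amounts to finding `weights` and `bias` so that this rule agrees with the labels as often as possible.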

Small, fast and compact, perceptrons are able to learn and develop an understanding of many datasets. Yes, they may not be as efficient or accurate as Deep Neural Networks, but they are still extremely useful models.

Let us begin by understanding what perceptrons really are by creating and deploying one for a simple dataset where everything greater than 0.5 is labelled 1 and everything below is labelled 0.

Before getting to perceptrons, though, let us look at an even simpler model: a single neuron with a linear activation, which is just a linear regression model.

### Imports:

In [1]:
import numpy as np
import math
import random
import tensorflow as tf
from sklearn.preprocessing import MinMaxScaler

### Creating A Dataset:

We create a dataset around the line y = mx + c, with m and c defined below, and add some noise and variation so that the exact m and c may not be the optimal answer in the end.

In [2]:
m = 121.7094
c = 891.2648

x = np.array([i for i in range(0, 50000, 3)][:16000])
y = np.array([m*i+c+random.randint(-2, 2)*random.random() for i in range(0, 50000, 3)][:16000])
print(y.shape, x.shape)

(16000,) (16000,)


### Shuffling:

In [3]:
index = np.array(range(y.shape[0]))
np.random.shuffle(index)

x = x[index]
y = y[index]


### Model and Fitting

In [4]:
model = tf.keras.models.Sequential([
                                    tf.keras.layers.Dense(1, activation='linear')
])
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
history = model.fit(x, y, epochs=200, verbose=0)
print(model.evaluate(x, y))

[504089837568.0, 615060.4375]


### Scaling:

In [5]:
x_scalar = x[index]/np.max(x)
y_scalar = y[index]/np.max(y)

In [6]:
model1 = tf.keras.models.Sequential([
                                    tf.keras.layers.Dense(1, activation='linear')
])
model1.compile(optimizer='adam', loss='mse', metrics=['mae'])
history = model1.fit(x_scalar, y_scalar, epochs=200, verbose=0)
print(model.evaluate(x, y))  # evaluates model on the unscaled data, not model1 on x_scalar, y_scalar

[504089837568.0, 615060.4375]


In [7]:
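# Note: this uses model (trained on the unscaled data), not model1.
y_pred = model.predict(x_scalar) * np.max(y)
# y has shape (n,) and y_pred has shape (n, 1), so y - y_pred broadcasts to (n, n).
mae = np.mean(np.abs(y - y_pred))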
print(mae)

850215580.4442718


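The numbers look no better after scaling, but cells 6 and 7 evaluate `model` (trained on the unscaled data) rather than `model1`, so they do not actually measure the scaled model. We will carry on without scaling anything down.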
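One pitfall worth isolating when computing an MAE by hand, as in the cell above: Keras `predict` on a `Dense(1)` model returns an array of shape `(n, 1)`, while the targets here have shape `(n,)`. Subtracting the two broadcasts to an `(n, n)` matrix instead of `n` residuals. The toy arrays below are made up to show the effect with NumPy alone.

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0])          # shape (3,), like y in the notebook
y_pred = np.array([[1.5], [2.5], [3.5]])    # shape (3, 1), like a Keras predict output

# Broadcasting pairs every prediction with every target: shape (3, 3).
broadcast_diff = y_true - y_pred
wrong_mae = np.mean(np.abs(broadcast_diff))

# Flattening the predictions first gives one residual per sample.
right_mae = np.mean(np.abs(y_true - y_pred.ravel()))

print(broadcast_diff.shape, wrong_mae, right_mae)
```

Here every prediction is off by exactly 0.5, which is what `right_mae` reports; `wrong_mae` is inflated because it also averages over mismatched pairs.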

Let us work it out another way: take two samples of root(n) points each (2 * root(n) points in total) and use them to initialize the weights.

In [8]:
index = np.array(range(y.shape[0]))
np.random.shuffle(index)

x = x[index]
y = y[index]

set_1 = np.array([x[:int(x.shape[0]**0.5)], y[:int(x.shape[0]**0.5)]]).T
set_2 = np.array([x[-int(x.shape[0]**0.5):], y[-int(x.shape[0]**0.5):]]).T

The equation of a line is y = mx + c, so for two points (x1, y1) and (x2, y2) on it:

m = (y1 - y2)/(x1 - x2)

c = y1 - m * x1
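As a quick sanity check of these two formulas, here is a small sketch for a single pair of points. The helper name `line_through` and the sample points are illustrative only; the notebook's own code applies the same arithmetic to many pairs at once.

```python
def line_through(p1, p2):
    """Slope and intercept of the line through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("vertical line: slope is undefined")
    m = (y1 - y2) / (x1 - x2)
    c = y1 - m * x1
    return m, c

# Two points on y = 2x + 5.
m_est, c_est = line_through((0.0, 5.0), (10.0, 25.0))
print(m_est, c_est)  # 2.0 5.0
```

With noisy data each pair gives a slightly different (m, c), which is why the next cell computes many candidates and averages the best ones.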

In [9]:
m = np.array([(i[1]-j[1])/(i[0]-j[0]) for i,j in zip(set_1, set_2)])
c = np.array([set_i[1] - m_i * set_i[0] for set_i, m_i in zip(set_1, m)])

mae_anal_set_1 = np.array([sum([abs(set_i[1] - (set_i[0]*m_i + c_i)) for set_i in set_1])/set_1.shape[0] for m_i, c_i in zip(m, c)])

sorted_index = np.argsort(mae_anal_set_1)
m = m[sorted_index]
c = c[sorted_index]
mae_anal_set_1 = mae_anal_set_1[sorted_index]

final_m, final_c = np.mean(m[:int(m.shape[0]**0.5)]), np.mean(c[:int(m.shape[0]**0.5)])

Now let us initialize the weights according to the above method.

In [10]:
print(model.get_weights())

[array([[96.113335]], dtype=float32), array([97.96404], dtype=float32)]


In [11]:
weights = [np.array([[final_m]]), np.array([final_c])]
model.set_weights(weights)

In [12]:
history = model.fit(x, y, epochs=10, verbose=1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


Now let us wrap this in a reusable class: the inputs are x and y, and the output is a list of weights that can be passed directly to model.set_weights(weights).

In [13]:
class Linear_weight_initializer:
  def __init__(self, x, y, sample_exp=0.5):
    self.x = x
    self.y = y
    self.sample_exp = sample_exp
    self.set_1 = None
    self.set_2 = None
    self.final_c = 0
    self.final_m = 0
  
  def setter(self):
    index = np.array(range(self.y.shape[0]))
    np.random.shuffle(index)

    self.x = self.x[index]
    self.y = self.y[index]

    self.set_1 = np.array([self.x[:int(self.x.shape[0]**self.sample_exp)], self.y[:int(self.x.shape[0]**self.sample_exp)]]).T
    self.set_2 = np.array([self.x[-int(self.x.shape[0]**self.sample_exp):], self.y[-int(self.x.shape[0]**self.sample_exp):]]).T

  def m_c_calculator(self):
    m = np.array([(i[1]-j[1])/(i[0]-j[0]) for i,j in zip(self.set_1, self.set_2)])
    c = np.array([set_i[1] - m_i * set_i[0] for set_i, m_i in zip(self.set_1, m)])

    mae_anal_set_1 = np.array([sum([abs(set_i[1] - (set_i[0]*m_i + c_i)) for set_i in self.set_1])/self.set_1.shape[0] for m_i, c_i in zip(m, c)])

    sorted_index = np.argsort(mae_anal_set_1)
    m = m[sorted_index]
    c = c[sorted_index]
    mae_anal_set_1 = mae_anal_set_1[sorted_index]

    self.final_m, self.final_c = np.mean(m[:int(m.shape[0]**self.sample_exp)]), np.mean(c[:int(m.shape[0]**self.sample_exp)])

  def weights_calc(self):
    self.setter()
    self.m_c_calculator()
    weights = [np.array([[self.final_m]]), np.array([self.final_c])]
    return weights

In [14]:
lwi = Linear_weight_initializer(x, y)
weights = lwi.weights_calc()
print(weights)

[array([[121.70939787]]), array([891.31568849])]


In [15]:
model.set_weights(weights)
history = model.fit(x, y, epochs=10, verbose=1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


Now Let us create a new Dataset with varying degrees of noise

In [16]:
def noisy_linear_dataset(n):
  m = random.random()*1000
  c = random.random()*1000

  x = np.array([i for i in range(0, 50000, 3)][:16000])
  y = np.array([m*i+c+random.randint(-n, n)*random.random() for i in range(0, 50000, 3)][:16000])

  return x, y

We will now train two models for 10 epochs each: one where TensorFlow starts from its default initialization and another where the weights are pre-initialized by the class above.

In the output, the columns labelled Function come from the model initialized with the class above, and the columns labelled Normal come from the model TensorFlow trains from its default initialization in the same number of epochs.

In [17]:
for noise in range(0, 100, 2):
  x, y = noisy_linear_dataset(noise)

  #Normally

  model0 = tf.keras.models.Sequential([
                                    tf.keras.layers.Dense(1, activation='linear', input_shape=(1,))
  ])
  model0.compile(optimizer='adam', loss='mse', metrics=['mae'])
  history = model0.fit(x, y, epochs=10, verbose=0)
  mse0, mae0 = model0.evaluate(x, y, verbose=0)

  mse0, mae0 = "{:.4f}".format(mse0), "{:.4f}".format(mae0)

  #Using Function

  model1 = tf.keras.models.Sequential([
                                    tf.keras.layers.Dense(1, activation='linear', input_shape=(1,))
  ])
  model1.compile(optimizer='adam', loss='mse', metrics=['mae'])

  lwi = Linear_weight_initializer(x, y)
  weights = lwi.weights_calc()

  model1.set_weights(weights)

  history = model1.fit(x, y, epochs=10, verbose=0)
  mse1, mae1 = model1.evaluate(x, y, verbose=0)

  mse1, mae1 = "{:.4f}".format(mse1), "{:.4f}".format(mae1)

  print("{:<7} {:<8} {:<25} {:<10} {:<10} {:<8} {:<25} {:<10} {:<10}".format(str(int(noise/2+1))+'/50','MAE Normal:', mae0, 'MAE Function:', mae1, 'MSE Normal:', mse0, 'MSE Function:', mse1))

1/50    MAE Normal: 5872077.0000              MAE Function: 0.1879     MSE Normal: 45972860174336.0000       MSE Function: 0.1376    
2/50    MAE Normal: 19854670.0000             MAE Function: 1.3253     MSE Normal: 525618164793344.0000      MSE Function: 3.3781    
3/50    MAE Normal: 5072431.0000              MAE Function: 2.7752     MSE Normal: 34306424569856.0000       MSE Function: 11.0488   
4/50    MAE Normal: 10406175.0000             MAE Function: 1.6333     MSE Normal: 144383613599744.0000      MSE Function: 4.9103    
5/50    MAE Normal: 15192724.0000             MAE Function: 5.2947     MSE Normal: 307766518349824.0000      MSE Function: 40.2653   
6/50    MAE Normal: 17302830.0000             MAE Function: 2.6631     MSE Normal: 399187682263040.0000      MSE Function: 12.9269   
7/50    MAE Normal: 17940998.0000             MAE Function: 4.5541     MSE Normal: 429174607052800.0000      MSE Function: 32.6124   
8/50    MAE Normal: 7026075.0000              MAE Function: 3.

After 10 epochs the pre-initialized model gives far better results, but this is not very useful in practice, since single-feature datasets are extremely rare in the real world.

Now let us move on to perceptrons with a single feature, i.e. logistic regression. For logistic regression, y = 1/(1 + e^-f(x)), where f(x) = alpha * x + beta. The output is then thresholded: generally anything greater than 0.5 is classified as 1 and anything below is classified as 0. So let us think of a way to approach alpha and beta faster. First, let's simply create a dataset and evaluate a model on it.
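The formula and the threshold rule can be sketched in a few lines of plain Python. The values of `alpha` and `beta` below are hypothetical, picked so that the decision boundary sits at x = 10; they are not the weights learned later in the notebook.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_label(x, alpha, beta, threshold=0.5):
    """Classify x as 1 when sigmoid(alpha * x + beta) reaches the threshold."""
    return 1 if sigmoid(alpha * x + beta) >= threshold else 0

alpha, beta = 1.0, -10.0     # hypothetical values for illustration
boundary = -beta / alpha     # f(x) = 0 here, so the sigmoid equals 0.5

print(boundary, predict_label(7.0, alpha, beta), predict_label(13.0, alpha, beta))
```

Because the sigmoid of 0 is exactly 0.5, the boundary is the point where alpha * x + beta = 0, which is the fact the initializer below relies on.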

In [18]:
x = np.array([10 * random.random() + random.random() * random.randint(-3, 3) for i in range(5000)] + [10 + 10 * random.random() + random.random() * random.randint(-3, 3) for i in range(5000)])
y = np.array([0 for i in range(5000)] + [1 for i in range(5000)])

indexes = np.array(range(10000))
np.random.shuffle(indexes)
x = x[indexes]
y = y[indexes]

In [19]:
model = tf.keras.models.Sequential([
                                    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history = model.fit(x, y, epochs=200, verbose=0)

print(model.evaluate(x, y))

[0.10060581564903259, 0.9610000252723694]


In [20]:
print(model.get_weights())

[array([[1.2968458]], dtype=float32), array([-13.006274], dtype=float32)]


One thing to notice about logistic regression is that f(x) must be 0 at the line of separation, because the sigmoid of 0 is exactly 0.5. The easiest way to achieve that is to set m = 1 and c = -x0 (that is, alpha = 1 and beta = -x0), where x0 is the point of separation, so the problem becomes finding x0. Sorting the whole dataset and looking for the region where the labels are most mixed would defeat the purpose, since np.sort alone has complexity O(n log n). Instead we go for the simplest sampling method: take root(n) points from the shuffled dataset and sort only those, which costs O(root(n) log(n)).
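The sampling idea can be sketched as follows, assuming `sample_x` and `sample_y` are the first root(n) points of an already shuffled dataset. This is a simplified illustration, not the class below: the function name `estimate_boundary` is made up, and it uses the plain median of the most mixed chunk, whereas the class averages the median and the mean.

```python
import numpy as np

def estimate_boundary(sample_x, sample_y):
    """Estimate x0 as the median of the chunk whose labels are most mixed."""
    order = np.argsort(sample_x)
    xs, ys = sample_x[order], sample_y[order]
    chunk = max(1, int(len(xs) ** 0.5))       # chunk size ~ sqrt(sample size)
    n_chunks = len(xs) // chunk
    xs = xs[:n_chunks * chunk].reshape(n_chunks, chunk)
    ys = ys[:n_chunks * chunk].reshape(n_chunks, chunk)
    # A chunk straddling the boundary has a mean label close to 0.5.
    best = int(np.argmin(np.abs(ys.mean(axis=1) - 0.5)))
    return float(np.median(xs[best]))

# Hand-made sample: points below 10 are labelled 0, points above are labelled 1.
sample_x = np.array([15.0, 4.0, 11.0, 1.0, 19.0, 9.0, 12.0, 7.0, 17.0])
sample_y = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1])

x0 = estimate_boundary(sample_x, sample_y)
alpha, beta = 1.0, -x0   # slope 1 and intercept -x0, so f(x0) = 0
print(x0, beta)  # 11.0 -11.0
```

If the class split happens to fall exactly on a chunk edge, no chunk is mixed and the estimate is unreliable, which is one reason a single sample is not enough.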

In [21]:
class Sigmoid_weight_initializer:
  def __init__(self, x, y, sample_exp=0.5, eps=1e-5, max_iter=1000):
    self.x = x
    self.y = y
    self.x_sub = None
    self.y_sub = None
    self.sets = None
    self.scores = None
    self.c = None
    self.weights = None
    self.i = 0
    self.eps = eps
    self.max_iter = max_iter

  def sub_generator(self):
    n_sub = int(self.x.shape[0]**0.5)
    self.x_sub = self.x[n_sub*self.i:n_sub*(self.i+1)]
    self.y_sub = self.y[n_sub*self.i:n_sub*(self.i+1)]
    indexes = np.argsort(self.x_sub)
    self.x_sub = self.x_sub[indexes]
    self.y_sub = self.y_sub[indexes]


  def set_generator(self):
    self.sub_generator()
    set_size = int(self.x_sub.shape[0]**0.5)
    self.sets = np.array([[self.x_sub[set_size*i:(i+1)*set_size], self.y_sub[set_size*i:(i+1)*set_size]] for i in range(self.y_sub.shape[0]//set_size)])

  def set_scorer(self):
    self.set_generator()
    self.scores = np.abs(np.array([np.mean(set_i[1]) for set_i in self.sets]) - 0.5)
    self.best_set = np.argsort(self.scores)[0]
    
  def weighter(self):
    self.set_scorer()
    self.c = - (np.median(self.sets[self.best_set][0]) + np.mean(self.sets[self.best_set][0]))/2
    self.weights = [np.array([[1]]), np.array([self.c])]
    return self.weights, self.c

  def dissimilarity_matrix(self, data):
    dissimilarity_matrix = np.empty([data.shape[0], data.shape[0]])
    for i in range(len(data)):
      for j in range(len(data)):
        d_ij = np.sum(np.square(data[i]-data[j]))
        dissimilarity_matrix[i][j] = d_ij

    return dissimilarity_matrix

  def get_perplexity(self, D_row, variance):
    A_row = np.exp(-D_row * variance)
    sumA = sum(A_row)
    perplexity = np.log(sumA) + variance * np.sum(D_row * A_row) / sumA
    return perplexity, A_row

  def affinity_matrix(self, dMatrix, perplexity):
    eps = self.eps
    (n, _) = dMatrix.shape
    variance_matrix = np.ones(dMatrix.shape[0])
    affinity_matrix = np.zeros(dMatrix.shape)
    logU = np.log(perplexity)
    for i in range(dMatrix.shape[0]):
      variance_min = -np.inf
      variance_max =  np.inf
      d_i = dMatrix[i, np.concatenate((np.r_[0:i], np.r_[i+1:n]))]
      (c_perplexity, thisA) = self.get_perplexity(d_i, variance_matrix[i])
      perplexity_diff = c_perplexity - logU
      tries = 0
      while (np.isnan(perplexity_diff) or np.abs(perplexity_diff) > eps) and tries < self.max_iter:
        if np.isnan(perplexity_diff):
          variance_matrix[i] = variance_matrix[i] / 10.0
        elif perplexity_diff > 0:
          variance_min = variance_matrix[i].copy()
          if variance_max == np.inf or variance_max == -np.inf:
            variance_matrix[i] = variance_matrix[i] * 2.0
          else:
            variance_matrix[i] = (variance_matrix[i] + variance_max) / 2.0
        else:
          variance_max = variance_matrix[i].copy()
          if variance_min == np.inf or variance_min == -np.inf:
            variance_matrix[i] = variance_matrix[i] / 2.0
          else:
            variance_matrix[i] = (variance_matrix[i] + variance_min) / 2.0
        (c_perplexity, thisA) = self.get_perplexity(d_i, variance_matrix[i])
        perplexity_diff = c_perplexity - logU
        tries += 1
      affinity_matrix[i, np.concatenate((np.r_[0:i], np.r_[i+1:n]))] = thisA
    return variance_matrix, affinity_matrix

  def binding_matrix(self, aMatrix):
    binding_matrix = aMatrix / aMatrix.sum(axis=1)[:,np.newaxis]
    return binding_matrix

  def outlier_probability(self, bMatrix):
    outlier_matrix = np.prod(1-bMatrix, 0)
    return outlier_matrix

  def sos(self, cleaned_data, perplexity): 
    dMatrix = self.dissimilarity_matrix(cleaned_data)
    var_matrix, aff_matrix = self.affinity_matrix(dMatrix, perplexity)
    bin_matrix = self.binding_matrix(aff_matrix)
    outlier_matrix = self.outlier_probability(bin_matrix)
    return outlier_matrix

  def weights_calc(self):
    weights_list = []
    for i in range(int(self.x.shape[0]/int(self.x.shape[0]**0.5))):
      self.i = i
      _, c = self.weighter()
      weights_list.append(c)
    weights_list = np.array(weights_list)
    weights_list1 = (weights_list - np.min(weights_list))/(np.max(weights_list) - np.min(weights_list))
    outlier_matrix = self.sos(weights_list1, weights_list1.shape[0]/4)
    outlier_matrix_best = np.argmin(outlier_matrix)
    self.c = weights_list[outlier_matrix_best]

    self.weights = [np.array([[1]]), np.array([self.c])]
    return self.weights

This performs worse than the linear initializer and depends more on randomness, so a single sample can place the boundary in the wrong spot and misclassify. A better implementation samples root(n) points multiple times and then uses outlier detection (the `sos` method) to discard the unrepresentative estimates, which is what the function weights_calc implements.
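The outlier step can be illustrated with a stripped-down sketch. It keeps the affinity, binding and outlier-probability steps of the class above, but as a simplifying assumption it uses one fixed `bandwidth` for every point instead of the per-point perplexity search in `affinity_matrix`. The function name `outlier_probabilities` and the candidate values are made up.

```python
import numpy as np

def outlier_probabilities(values, bandwidth=1.0):
    """Outlier probability per value, using a fixed-bandwidth affinity."""
    values = np.asarray(values, dtype=float)
    d = (values[:, None] - values[None, :]) ** 2   # squared distances
    a = np.exp(-d * bandwidth)                     # affinities
    np.fill_diagonal(a, 0.0)                       # no affinity with itself
    b = a / a.sum(axis=1, keepdims=True)           # binding probabilities per row
    # A point is an outlier if no other point "binds" to it.
    return np.prod(1.0 - b, axis=0)

# Hypothetical intercept candidates: four agree, one is far off.
candidates = np.array([-10.1, -9.9, -10.0, -10.2, -3.0])
probs = outlier_probabilities(candidates)
most_typical = candidates[np.argmin(probs)]
print(np.argmax(probs), most_typical)
```

The far-off candidate ends up with an outlier probability close to 1, while the value with the lowest probability is taken as the intercept, mirroring the `np.argmin(outlier_matrix)` step in weights_calc.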

In [22]:
swi = Sigmoid_weight_initializer(x, y)
weights = swi.weights_calc()

In [23]:
model = tf.keras.models.Sequential([
                                    tf.keras.layers.Dense(1, activation='sigmoid', input_shape=(1,))
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.set_weights(weights)
history = model.fit(x, y, epochs=20, verbose=1)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


The accuracy and loss end up close to the earlier run even though this model is trained for only 20 epochs, and the first epoch already starts much closer to the end result.

In [24]:
def noisy_sigmoid_dataset(noise):
  x = np.array([10 * random.random() + random.random() * random.randint(-noise, noise) for i in range(5000)] + [100 + 10 * random.random() + random.random() * random.randint(-noise, noise) for i in range(5000)])
  y = np.array([0 for i in range(5000)] + [1 for i in range(5000)])

  indexes = np.array(range(10000))
  np.random.shuffle(indexes)
  x = x[indexes]
  y = y[indexes]
  return x, y

As before, the columns labelled Function come from the model initialized with the class above, and the columns labelled Normal come from the model TensorFlow trains from its default initialization in the same number of epochs.

In [25]:
for noise in range(0, 100, 2):
  x, y = noisy_sigmoid_dataset(noise)

  #Normally

  model0 = tf.keras.models.Sequential([
                                    tf.keras.layers.Dense(1, activation='sigmoid', input_shape=(1,))
  ])
  model0.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
  history = model0.fit(x, y, epochs=10, verbose=0)
  loss0, acc0 = model0.evaluate(x, y, verbose=0)

  loss0, acc0 = "{:.4f}".format(loss0), "{:.4f}".format(acc0)

  #Using Function

  model1 = tf.keras.models.Sequential([
                                    tf.keras.layers.Dense(1, activation='sigmoid', input_shape=(1,))
  ])
  model1.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

  lwi = Sigmoid_weight_initializer(x, y)
  weights = lwi.weights_calc()

  model1.set_weights(weights)

  history = model1.fit(x, y, epochs=10, verbose=0)
  loss1, acc1 = model1.evaluate(x, y, verbose=0)

  loss1, acc1 = "{:.4f}".format(loss1), "{:.4f}".format(acc1)

  print("{:<7} {:<8} {:<25} {:<10} {:<10} {:<8} {:<25} {:<10} {:<10}".format(str(int(noise/2+1))+'/50','Acc Normal:', acc0, 'Acc Function:', acc1, 'Loss Normal:', loss0, 'Loss Function:', loss1))

1/50    Acc Normal: 1.0000                    Acc Function: 1.0000     Loss Normal: 0.0948                    Loss Function: 0.0000    
2/50    Acc Normal: 1.0000                    Acc Function: 1.0000     Loss Normal: 0.0888                    Loss Function: 0.0000    
3/50    Acc Normal: 0.5036                    Acc Function: 1.0000     Loss Normal: 0.4222                    Loss Function: 0.0000    
4/50    Acc Normal: 1.0000                    Acc Function: 1.0000     Loss Normal: 0.0785                    Loss Function: 0.0000    
5/50    Acc Normal: 0.9986                    Acc Function: 1.0000     Loss Normal: 0.2646                    Loss Function: 0.0000    
6/50    Acc Normal: 1.0000                    Acc Function: 1.0000     Loss Normal: 0.0918                    Loss Function: 0.0000    
7/50    Acc Normal: 1.0000                    Acc Function: 1.0000     Loss Normal: 0.0845                    Loss Function: 0.0000    
8/50    Acc Normal: 0.6230                    Ac

Yes, the loss is less reliable for the initializer we built, since it does not take the loss into account at all. This is the major drawback of the sigmoid initializer, so accuracy is the only metric that is consistently better.
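To see why accuracy and loss can disagree, here is a small sketch with made-up data. Two hypothetical weight settings share the same decision boundary at x = 10, so they classify identically, but their binary cross-entropy differs because the loss also depends on how steep the sigmoid is. The helper `evaluate` is illustrative and not part of the notebook.

```python
import numpy as np

def evaluate(x, y, alpha, beta):
    """Binary cross-entropy and accuracy of sigmoid(alpha * x + beta)."""
    p = 1.0 / (1.0 + np.exp(-(alpha * x + beta)))
    p = np.clip(p, 1e-7, 1 - 1e-7)               # avoid log(0)
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    acc = np.mean((p > 0.5) == (y == 1))
    return loss, acc

x = np.array([8.0, 9.0, 9.5, 10.5, 11.0, 12.0])
y = np.array([0, 0, 0, 1, 1, 1])

# Same boundary (x = 10), different steepness.
loss_flat, acc_flat = evaluate(x, y, alpha=0.1, beta=-1.0)
loss_steep, acc_steep = evaluate(x, y, alpha=5.0, beta=-50.0)
print(acc_flat, acc_steep, loss_flat, loss_steep)
```

An initializer that only locates the boundary fixes the accuracy but leaves the steepness, and therefore the loss, for gradient descent to tune.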

So, finally, we now have two functions: one that initializes the weights for linear regression and one for logistic regression. Hope you enjoyed reading.