In [9]:
#test import to make sure that the environment works
import torch
import numpy as np

layer = torch.tensor([1, 2, 3], dtype=float, requires_grad=True)

print(layer)

tensor([1., 2., 3.], dtype=torch.float64, requires_grad=True)


TEST

Outline Very Basic:

#The Main Idea behind Machine Learning

It is well known that machines are potent in processing defined algorithms with its combination of speed, memory, and accuracy. Once a human defined an algorithm, or a series of steps, for the computer to follow, it can do so faster and better than any other human.

However, machines themselves are unable to tackle the more abstract problems such as differentiating between a photo of a dog from a cat. 

To humans, this task may be trivial. However, humans themselves are also unable to clearly explain their thought process for separating dogs and cats in a concise way. They may suggest tips such as looking at its ears or tail, but this is another ambiguous question in itself, especially to a computer which percieves images not by its greater pattern, but each individual pixel and its color values. For humans, their brains are able to intuitively process the information in something akin to a black box, unsure of the exact algorithms underneath. Since humans are also unable to create a concise algorithm for classification, they cannot write code for a machine to follow in order to accomplish the same task.

So, does this mean that humans were born with an innate ability to differentiate between dogs and cats? There are no strong evidence supporting this argument, so the leading theory is that humans develop their classification abilities later on, probably by observing an uncountable amount of dogs and cats throughout their lives. This implies that the classification process can be learnt, most likely by identifying groups of hidden patterns that gives deeper insight than just the raw data itself.

**The broadest idea of machine learning is that there are intrinsic patterns in data. By matching and gathering a large amount input and output pairs, it may be possible to find the function or formula which converts an input into the desired corresponding output.**

The rest of this paper would discuss the more practical concepts in implementing simpler neural network models.

#Embedding: Representing Information:

Before making. This is most commonly done through vectors, matrices, and tensors. These are essencially an array or list of a certain dimension. This process is known as embedding.

#Neurons and Linear Layers:

The idea behind a neuron is that it is the smallest possible component in a larger neural network, just like a human's neuron to their brain. While biology and chemistry powers a human neuron, a machine's neuron is defined by math.

In mathamatics, multiplication is the most common way to alter a value's size by its proportion. For instance, multiplying X by 0.5 yields X/2, something half as large in magnitude, while multiplying X by 2 yields 2X, something twice as large in magnitude. This is a useful way to amplify or diminish a value's magnitude without changing its inherent composition (attributes such as its prime factors). Another simple way to manipulate values is addition. This operation can shift a value along the number line, or alternatively a vector along a certain axis. Although addition also affects the size of a value, it will disturb said value's composition. Overall, it is best to think of multiplication as adjusting a value's size, while addition acts as an offset.

These mathmatical ideas also applies to the field of machine learning, which comes in the form of a neuron. The neuron would be akin to a function, with two inherent adjustable properties known as the weight and the bias. The idea behind a neuron is that it will take an input signal, multiply it with its weight attribute, add the product with its bias attribute, and return the final sum as its modified output signal. For example, a neuron with a large weight would amplify the input signal into a larger output signal, and vice versa.

Here is the simple formula for a single neuron:

*y = wx + b*
*(where w is the weight, x is the input, and b is the bias value)*


Neurons are then organized into layers.

The main idea for almost all neural networks are its layers. Each layer would have a certain amount of "neurons" or "nodes", which would each hold a weight value. By assigning different weights to each neuron in the layers, the input signals will get amplified or diminished in its corresponding areas.

Practically, all weights, inputs, and biases are represented as matrices or tensors, both of which are a common way to group large amount of numbers. This allows for the ease of processing large amount of calculations which neural networks need. Alternatively, they are similar to an array of Nth dimension, depending on the requirements.

Besides the, a non-linear function is . Common examples include sigmoid and tanh.

Here is a human-friendly example of a neural network:

In [20]:
#Inputs
X = np.array([2, 3, 5])

#Neural Layer Properties
W = np.array([1, 2, 4])
B = 0

#Non-linear function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

class linear_neuron_layer:
    def __init__(self, w, b):
        self.w = w
        self.b = b
    
    def forward(self, x):
        if len(x) == len(self.w):
            return sigmoid(sum(x*self.w) + self.b)
        else:
            return "Weight/Input Mismatch"

neuron = linear_neuron_layer(W, B)

neuron.forward(X)
#y = sigmoid(28)
#y = 0.9999999999993086

0.9999999999993086

In [None]:
#Inputs
X = np.array([2, 3, 5])

#Neural Layer Properties
W = np.array([1, 2, 4])
B = 0

#Functions
def sigmoid(x): #Sigmoid function
    return 1 / (1 + np.exp(-x))

def sigmoidDerivative(x):
    return sigmoid(x) * (1 - sigmoid(x))

def error(y, t): #L2 Norm
    return 0.5 * np.power((t-y), 2)

def errorDerivative(y, t):
    return -(t-y)

class linear_neuron_layer:
    def __init__(self, w, b):
        self.w = w
        self.b = b
    
    def forward(self, x):
        if len(x) == len(self.w):
            return sigmoid(sum(x*self.w) + self.b)
        else:
            return "Weight/Input Mismatch"
    
    def updateWeights(self, g, u):
        if len(self.w) == len(g):
            self.w = (self.w + g) * u

neuron = linear_neuron_layer(W, B)

neuron.forward(X)
#y = sigmoid(28)
#y = 0.9999999999993086

0.9999999999993086

In [35]:
#Constants
bigErr = 100000

#Helpers
def sigmoid(x): #Sigmoid function
    return 1 / (1 + np.exp(-x))

def sigmoidDerivative(x):
    return sigmoid(x) * (1 - sigmoid(x))

def error(y, t): #L2 Norm
    return 0.5 * np.power((t-y), 2)

def errorDerivative(y, t):
    return -(t-y)

class NeuronLayerSingle:
    def __init__(self, w, b, f):
        self.w = w #weight list
        self.b = b #bias value
        self.f = f #non-linear function
    
    #functions
    def updateWeights(self, g, u):
        if len(self.w) == len(g):
            self.w = (self.w + g) * u

    def getInputSize(self):
        return len(self.w)

    def getOutputRaw(self, x):
        if len(x) == len(self.w):
            return sum(x*self.w) + self.b
        else:
            return "Weight/Input Mismatch"

    def getOutput(self, x):
        if len(x) == len(self.w):
            return self.f(self.getOutputRaw(x))
        else:
            return "Weight/Input Mismatch"


class TestModelSingle: #Single Layer Only
    def __init__(self, inputSize):

        mag = 1 / np.sqrt(inputSize) #shallow weight initialization

        self.layer = NeuronLayerSingle(np.array([(np.random.rand() * mag * 2 - mag) for i in range(inputSize)]), 0., sigmoid) #Neuron layer, init values [-1/sqrt(x), 1/sqrt(x)]
        self.output = None

        self.err = bigErr

    def run(self, x):
        self.output = self.layer.getOutput(x)
        return self.output
    
    def trainOnce(self, x, t, u):
        y = self.run(x)
        s = self.layer.getOutputRaw(x)
        err = errorDerivative(y, t)
        g = err * sigmoidDerivative(s) * x

        self.layer.updateWeights(g, u)

def actualXOR(x):
    a = x[0]
    b = x[1]
    if a == b:
        return 0
    return 1


trainedXOR = TestModelSingle(2)

xorOriginalWeight = trainedXOR.layer.w

for i in range(250): #arbitrary amount of training
    randomInput = np.array([int(np.random.rand() * 2) for i in range(2)])

    trainedXOR.trainOnce(randomInput, actualXOR(randomInput), 0.97)

#Note: u values were cherry picked for better results
#Another Note: Bias is always set to 0, so when input is [0, 0], the result would always be 0.5, since weights won't affect output

print("-------------XOR Output")

print(trainedXOR.run(np.array([0,0]))) #0, 0.5
print(trainedXOR.run(np.array([0,1]))) #1, 0.10408986683883949
print(trainedXOR.run(np.array([1,0]))) #1, 0.2770619324755566
print(trainedXOR.run(np.array([1,1]))) #0, 0.042628520033774785


-------------XOR Output
0.5
0.4357839017110881
0.2245687444884695
0.1827944569485668


Besides the, a non-linear function is . Common examples include sigmoid and tanh.



After summarizing everything, here are the general formula of a single neural network layer.

General formula:

y = f(x1w1 + x2w2 + x3w3 + ... + xnwn + b)

Formula in matrix form:

y = f(X * W + b)

where X is the input matrix, W is the matrix containing the weights, b is the bias term, and f is the non-linear function.


#Forward Pass and Backpropagation:

The formula described previously is the definition of a forward pass, which means putting inputs into a neural network and obtaining an output from it.

Backpropagation, on the other hand, uses an output and passes it back through the network to update its weights and biases. 

Project result:
trained XOR


#Common Pitfalls and Solutions

Neural Network training:
Problems: overfitting, underfitting
Techniques: Learning rate adjustment (optimizers), momentum(adagrad), dropout layers



#BELOW ARE UNFINISHED

Convolutional Layers:
Image processing, identify edges
simplify into smaller tensor (embedded vectors?)

Project result:
MNIST Reader
ResNet18 with CIFAR-10
ViT with CIFAR-10


Generative Programs:
GAN
generator vs discriminator

DDRM diffusion
image to noise and vise versa