# BASIC CONCEPTS

Here is example Python code for a **basic perceptron** that takes two binary inputs (x1 and x2) and produces a binary output (y) from given weights (w1 and w2) and a threshold (Z). The output is 1 when the weighted sum exceeds Z and 0 otherwise:

In [2]:
def perceptron(x1, x2, w1, w2, Z):
    # Calculate the weighted sum
    z = w1*x1 + w2*x2
    
    # Apply the activation function
    if z > Z:
        y = 1
    else:
        y = 0
        
    # Return the output
    return y

#you can call the function like this:
y = perceptron(0, 1, 0.5, -0.5, 0)
print(y)


0


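Here is example Python code for an **AND perceptron** that takes two binary inputs (x1 and x2) and produces a binary output (y) based on a threshold of 0.5 and equal weights of 0.5 for both inputs. Because the comparison is strict (z > Z), only the input (1, 1), whose weighted sum is 1.0, exceeds the threshold: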

In [3]:
def and_perceptron(x1, x2):
    # Set the weights and threshold for an AND perceptron
    w1 = 0.5
    w2 = 0.5
    Z = 0.5
    
    # Calculate the weighted sum
    z = w1*x1 + w2*x2
    
    # Apply the activation function
    if z > Z:
        y = 1
    else:
        y = 0
        
    # Return the output
    return y


y = and_perceptron(0, 0)
print(y)


0
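
The call above only checks one input pair. As a small sketch, the full truth table can be checked by evaluating all four combinations; the compact `and_perceptron` below restates the function above with the same weights and threshold, and `truth_table` is a name introduced here for illustration.

```python
def and_perceptron(x1, x2):
    # Same weights and threshold as the AND perceptron above
    w1, w2, Z = 0.5, 0.5, 0.5
    return 1 if w1 * x1 + w2 * x2 > Z else 0

# Evaluate all four binary input combinations
truth_table = {(x1, x2): and_perceptron(x1, x2) for x1 in (0, 1) for x2 in (0, 1)}
for inputs, output in truth_table.items():
    print(inputs, "->", output)
```

Only the pair (1, 1) should map to 1, which is the AND function.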


The **XOR function** cannot be implemented using a single-layer perceptron because it is not linearly separable. However, it can be implemented using a multi-layer perceptron (MLP) with at least one hidden layer.
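
The claim about linear separability can be illustrated by brute force. The sketch below searches a grid of weights and thresholds for a single perceptron that reproduces a target truth table. The helper `find_perceptron` and the grid range are assumptions chosen for illustration; a grid search is only a demonstration, not a proof.

```python
import itertools

def perceptron(x1, x2, w1, w2, Z):
    # Step activation on the weighted sum, as in the basic perceptron
    return 1 if w1 * x1 + w2 * x2 > Z else 0

def find_perceptron(targets, grid):
    """Return the first (w1, w2, Z) on the grid that reproduces `targets`, or None."""
    inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
    for w1, w2, Z in itertools.product(grid, repeat=3):
        if all(perceptron(x1, x2, w1, w2, Z) == t
               for (x1, x2), t in zip(inputs, targets)):
            return (w1, w2, Z)
    return None

# Hypothetical search grid: values from -2.0 to 2.0 in steps of 0.25
grid = [i / 4 for i in range(-8, 9)]

and_solution = find_perceptron([0, 0, 0, 1], grid)  # AND targets
xor_solution = find_perceptron([0, 1, 1, 0], grid)  # XOR targets
print("AND:", and_solution)
print("XOR:", xor_solution)
```

The search finds parameters for AND but returns `None` for XOR. The XOR constraints require Z >= 0, w1 > Z, w2 > Z and w1 + w2 <= Z, which cannot all hold at once.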

Here is example Python code for an **XOR MLP** that takes two binary inputs (x1 and x2) and produces a binary output (y) using a hidden layer with two neurons and a sigmoid activation function:

In [4]:
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def xor_mlp(x1, x2):
    # Set the weights and biases for the XOR MLP.
    # Hidden neuron 1 approximates OR, hidden neuron 2 approximates NAND,
    # and the output neuron approximates AND of the two, which gives XOR.
    w1 = np.array([[20, -20], [20, -20]])
    b1 = np.array([-10, 30])
    w2 = np.array([[20], [20]])
    b2 = np.array([-30])
    
    # Calculate the hidden layer
    z1 = np.dot(np.array([x1, x2]), w1) + b1
    h1 = sigmoid(z1)
    
    # Calculate the output layer
    z2 = np.dot(h1, w2) + b2
    y = sigmoid(z2)
    
    # Round the output to 0 or 1 (y is a one-element array)
    if y[0] > 0.5:
        y = 1
    else:
        y = 0
    
    # Return the output
    return y


y = xor_mlp(0, 1)
print(y)



1


A **perceptron** is a type of artificial neural network that can be used for binary classification problems. It takes a set of input signals, applies weights to those signals, and outputs a binary decision based on a threshold function. The perceptron learning algorithm adjusts the weights during training to improve the accuracy of the classification.

Here is example Python code for a trainable **perceptron** class. It takes two binary inputs (x1 and x2), produces a binary output (y) based on a threshold (Z) and weights (w1 and w2), and adds a `train` method that applies the perceptron learning rule:

In [6]:
class Perceptron:
    def __init__(self, w1, w2, Z):
        self.w1 = w1
        self.w2 = w2
        self.Z = Z
    
    def predict(self, x1, x2):
        # Calculate the weighted sum
        z = self.w1*x1 + self.w2*x2
        
        # Apply the activation function
        if z > self.Z:
            y = 1
        else:
            y = 0
            
        # Return the output
        return y
    
    def train(self, X, y, eta, epochs):
        for epoch in range(epochs):
            for i in range(X.shape[0]):
                # Calculate the prediction and error
                x1, x2 = X[i]
                y_pred = self.predict(x1, x2)
                error = y[i] - y_pred
                
                # Update the weights
                self.w1 += eta * error * x1
                self.w2 += eta * error * x2
                # The threshold acts as a negative bias, so it moves in the opposite direction
                self.Z -= eta * error
        
        # Print the final weights and threshold
        print('Final weights:', self.w1, self.w2)
        print('Final threshold:', self.Z)



import numpy as np

# Define the training data
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

# Create a Perceptron object and train it
p = Perceptron(0.1, 0.2, 0.3)
p.train(X, y, 0.1, 10)

# Test the trained perceptron
for i in range(X.shape[0]):
    x1, x2 = X[i]
    y_pred = p.predict(x1, x2)
    print(x1, x2, y_pred)



Final weights: 0.1 0.2
Final threshold: 0.3
0 0 0
0 1 0
1 0 0
1 1 1
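
In the run above the initial weights already classify every AND example correctly, so the learning rule never changes them. As a minimal sketch of the rule actually updating parameters, the example below starts from zero weights and learns the OR function. It uses a bias `b` in place of the threshold (equivalent to Z = -b); the function name `train_perceptron` and the OR targets are assumptions introduced for illustration.

```python
import numpy as np

def train_perceptron(X, y, eta=0.1, epochs=10):
    """Perceptron learning rule with a bias term b (equivalent to a threshold Z = -b)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x_i, target in zip(X, y):
            y_pred = 1 if np.dot(w, x_i) + b > 0 else 0
            error = target - y_pred
            # Weights and bias both move in the direction that reduces the error
            w += eta * error * x_i
            b += eta * error
    return w, b

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_or = np.array([0, 1, 1, 1])  # OR targets

w, b = train_perceptron(X, y_or)
predictions = [1 if np.dot(w, x_i) + b > 0 else 0 for x_i in X]
print("weights:", w, "bias:", b)
print("predictions:", predictions)
```

Because OR is linearly separable, the rule settles on a separating line within a few passes over the four examples.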


Here is example Python code that demonstrates how to train a simple neural network model with a fixed **number of epochs**:

In [7]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Define the model architecture
model = keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=(784,)),
    layers.Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Load and preprocess the data
(X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()
X_train = X_train.reshape((60000, 784)) / 255.0
X_test = X_test.reshape((10000, 784)) / 255.0
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

# Train the model with a fixed number of epochs
history = model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_test, y_test))

# Evaluate the trained model
test_loss, test_acc = model.evaluate(X_test, y_test)
print('Test accuracy:', test_acc)





Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Test accuracy: 0.9729999899864197


Here is example Python code that calculates the weighted sum of a 2D input vector using a weight vector:
**$x = (x_1, x_2)^\top$, $w_{h1} = (w_{h11}, w_{h12})^\top$, so $x^\top w_{h1} = w_{h11} x_1 + w_{h12} x_2$**

In [9]:
import numpy as np

# Example input values and weights
x1 = 1
x2 = 1
wh11 = 11
wh12 = 22

# Define the input vector x and the weight vector wh1
x = np.array([x1, x2])
wh1 = np.array([wh11, wh12])

# Calculate the weighted sum of x and wh1
weighted_sum = np.dot(x, wh1)

print(weighted_sum)


33


A **linear classifier** can be represented as a single-layer perceptron:

y = f(w1*x1 + w2*x2 + ... + wn*xn + b)

In this formula, y is the output (prediction) of the linear classifier, x1, x2, ..., xn are the input features, w1, w2, ..., wn are the weights for each feature, b is the bias term, and f is an activation function that maps the weighted sum to a final output value.

For a linear classifier, the activation function f is usually a simple identity function or a step function that returns 1 if the weighted sum is greater than a threshold and 0 otherwise. In other words, the output of the linear classifier is a binary classification decision based on a linear combination of the input features.

This formula can be implemented as a **single-layer perceptron** using the following code:


In [None]:
import numpy as np

def predict(x, w, b):
    """Predict the output of a linear classifier."""
    z = np.dot(x, w) + b
    y = np.where(z > 0, 1, 0)
    return y
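
As a usage sketch, the `predict` function above can be applied to a whole batch of inputs at once. The weights and bias below are illustrative values picked by hand (not learned) so that the classifier behaves like AND.

```python
import numpy as np

def predict(x, w, b):
    """Predict the output of a linear classifier."""
    z = np.dot(x, w) + b
    return np.where(z > 0, 1, 0)

# One row per sample
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
w = np.array([1.0, 1.0])  # illustrative weights
b = -1.5                  # illustrative bias, equivalent to a threshold of 1.5

y = predict(X, w, b)
print(y)
```

Because `np.dot` and `np.where` are vectorized, the same function handles a single sample or a matrix of samples without changes.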



Here's an example of how to create a **multi-layer perceptron (MLP)** with a **variable number of perceptrons** in the hidden layer using the Keras API in Python:

In [10]:
from keras.models import Sequential
from keras.layers import Dense

# Define the number of input features and classes
num_features = 10
num_classes = 2

# Define the number of perceptrons in the hidden layer
num_hidden = 50

# Create the MLP model
model = Sequential()
model.add(Dense(num_hidden, input_dim=num_features, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))

# Compile the model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])




# Activation functions

The **step function** is a simple activation function that maps the input to either 0 or 1 based on a threshold. It is discontinuous at the threshold and its derivative is zero everywhere else, so it gives gradient-based training no signal. This makes it unsuitable for networks trained with backpropagation.

The step function can be represented as:

In [11]:
def step(x):
    if x < 0:
        return 0
    else:
        return 1


The **sigmoid function** is a common activation function used in neural networks. It has a characteristic S-shaped curve and can map any input to a value between 0 and 1, making it suitable for binary classification tasks.

The sigmoid function can be represented as:

In [None]:
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    return sigmoid(x) * (1 - sigmoid(x))
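
One way to gain confidence in the derivative formula is to compare it with a central finite-difference approximation. The sketch below does this on a few sample points; the step size `h` and the test points are arbitrary choices for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    return sigmoid(x) * (1 - sigmoid(x))

x = np.linspace(-5, 5, 11)
h = 1e-5  # finite-difference step (assumed value)

# Central difference: (f(x + h) - f(x - h)) / (2h)
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
max_diff = np.max(np.abs(numeric - sigmoid_derivative(x)))
print("largest difference:", max_diff)
```

The derivative peaks at x = 0 with value 0.25 and shrinks toward zero for large |x|, which is the source of the vanishing-gradient behavior often associated with sigmoid units.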


The **linear activation function** simply outputs the input value, optionally scaled by a constant factor (the version below uses a factor of 1). The **polynomial activation function** raises the input to a given power, which lets it capture nonlinear relationships between the input and output of a neural network. Both can be represented as:

In [None]:
def linear(x):
    return x
def polynomial(x, degree):
    return x**degree


Here is a general example of how to apply a **nonlinear transformation** to input data using the numpy library in Python:

In [None]:
import numpy as np

# Generate some random input data
x = np.random.rand(100, 5)  # 100 samples, 5 input variables

# Apply a polynomial transformation of degree 2
x_poly = np.hstack([x, x**2])

# Apply a radial basis function (RBF) transformation
centers = np.random.rand(10, 5)  # 10 centers, same dimension as input
sigma = 0.1
x_rbf = np.exp(-np.sum((x[:, None, :] - centers)**2, axis=2) / (2 * sigma**2))

# Use the transformed data in a regression or classification model


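Here is a general example in the spirit of **projection pursuit regression (PPR)** using the scikit-learn library in Python. scikit-learn does not provide a PPR estimator, so the code projects the inputs onto M=2 directions with PCA and fits a linear regression on the projections. Strictly speaking this is principal component regression: a full PPR model would also learn the projection directions and a nonlinear ridge function for each direction from the target.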

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.decomposition import PCA
import numpy as np

# Generate some random input data
X = np.random.rand(100, 10)  # 100 samples, 10 input variables
y = np.random.rand(100)  # target variable

# Project onto M=2 directions with PCA, then fit a linear model on the projections
# (principal component regression, used here as a simple stand-in for PPR)
pca = PCA(n_components=2)
X_proj = pca.fit_transform(X)
lr = LinearRegression()
lr.fit(X_proj, y)

# Use the fitted model to predict new data
X_new = np.random.rand(10)  # one new sample with 10 input variables
X_new_proj = pca.transform(X_new.reshape(1, -1))
y_pred = lr.predict(X_new_proj)

# Alternatively, use a polynomial kernel approximation to model quadratic terms
from sklearn.kernel_approximation import PolynomialCountSketch
poly_sketch = PolynomialCountSketch(degree=2, random_state=0)
X_poly = poly_sketch.fit_transform(X)
lr_poly = LinearRegression()
lr_poly.fit(X_poly, y)
# Reuse the fitted sketch so new data is mapped with the same random features
y_pred_poly = lr_poly.predict(poly_sketch.transform(X_new.reshape(1, -1)))


Alternatively, we can use a **polynomial kernel approximation** to model quadratic terms directly, without projecting the data onto lower dimensions.
Here is example code using the scikit-learn library to perform **Support Vector Regression with a polynomial kernel**:

In [None]:
from sklearn.svm import SVR
import numpy as np

# generate sample data
X = np.random.rand(100, 2)
y = X[:, 0] ** 2 + X[:, 1] ** 2

# train SVR with a degree-2 polynomial kernel; the kernel models the quadratic
# terms implicitly, so no explicit PolynomialFeatures expansion is needed
svr = SVR(kernel='poly', degree=2)
svr.fit(X, y)

# predict on new data
X_new = np.random.rand(10, 2)
y_pred = svr.predict(X_new)


Here is example code in Python using the **ReLU function** from the TensorFlow library (TensorFlow 2.x, which runs eagerly and no longer needs placeholders or sessions):

In [None]:
import numpy as np
import tensorflow as tf

# generate random input (5 samples, 10 features)
input_data = np.random.randn(5, 10).astype(np.float32)

# define input tensor
x = tf.constant(input_data)

# apply the ReLU activation function
relu = tf.nn.relu(x)

# convert the result back to a NumPy array
output = relu.numpy()


Here is the code to apply the **tanh function** and its derivative in Python:

In [None]:
import numpy as np

def tanh(x):
    return np.tanh(x)

def tanh_derivative(x):
    return 1 - np.tanh(x)**2
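
The tanh function is closely related to the sigmoid: tanh(x) = 2 * sigmoid(2x) - 1, so it is a rescaled sigmoid with outputs in (-1, 1) instead of (0, 1). The short sketch below checks this identity numerically on a few arbitrary sample points.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def tanh_derivative(x):
    return 1 - np.tanh(x) ** 2

x = np.linspace(-3, 3, 7)

# tanh expressed through the sigmoid function
tanh_from_sigmoid = 2 * sigmoid(2 * x) - 1
identity_holds = np.allclose(tanh(x), tanh_from_sigmoid)
print("identity holds:", identity_holds)
print("derivative at 0:", tanh_derivative(0.0))
```

The derivative at 0 is 1, four times the sigmoid's maximum slope of 0.25.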


Here is example code for the **ReLU activation function and its derivative:**

In [None]:
import numpy as np

# ReLU activation function
def relu(x):
    return np.maximum(0, x)

# Derivative of ReLU activation function
def relu_derivative(x):
    return np.where(x > 0, 1, 0)


Here is example code for the **Leaky ReLU activation** function and its derivative:

In [None]:
import numpy as np

# Leaky ReLU activation function
def leaky_relu(x, alpha=0.01):
    return np.maximum(alpha*x, x)

# Derivative of Leaky ReLU activation function
def leaky_relu_derivative(x, alpha=0.01):
    return np.where(x > 0, 1, alpha)
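
As a usage sketch, the two functions above can be applied to a small array to see how negative inputs are handled. The sample values are arbitrary.

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    return np.maximum(alpha * x, x)

def leaky_relu_derivative(x, alpha=0.01):
    return np.where(x > 0, 1, alpha)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

activations = leaky_relu(x)
gradients = leaky_relu_derivative(x)
print(activations)
print(gradients)
```

Negative inputs are scaled by `alpha` rather than set to zero, so their gradient is `alpha` instead of 0. Note that this implementation assigns the slope `alpha` at exactly x = 0, and that `np.maximum(alpha * x, x)` only matches the Leaky ReLU definition when `alpha` is at most 1.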


Here is example code for the **Parametric ReLU activation function** and its derivative:

In [None]:
import numpy as np

class PReLU:
    def __init__(self, alpha=0.01):
        self.alpha = alpha
        
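    def forward(self, x):
        # Cache the input and the positive-region mask for the backward pass
        self.x = x
        self.mask = x > 0
        return np.where(self.mask, x, self.alpha * x)
    
    def backward(self, dout, lr=0.001):
        # Gradient w.r.t. the input: 1 where x > 0, alpha elsewhere
        dx = np.where(self.mask, dout, dout * self.alpha)
        # Gradient w.r.t. alpha: x where x <= 0, and 0 where x > 0
        dalpha = np.sum(dout * np.where(self.mask, 0.0, self.x))
        self.alpha -= lr * dalpha  # gradient step on alpha with learning rate lr
        return dx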


Here's the Python code for implementing the **ELU activation function:**

In [None]:
import numpy as np

def elu(x, alpha=1.0):
    """
    Computes the Exponential Linear Unit (ELU) activation function.
    
    Args:
        x (float or numpy array): Input to the activation function.
        alpha (float): Scale of the negative region; the output saturates at -alpha. Default is 1.0.
        
    Returns:
        float or numpy array: Output of the activation function.
    """
    return np.where(x > 0, x, alpha*(np.exp(x)-1))


def elu_derivative(x, alpha=1.0):
    """
    Computes the derivative of the Exponential Linear Unit (ELU) activation function.
    
    Args:
        x (float or numpy array): Input to the activation function.
        alpha (float): Scale of the negative region; the output saturates at -alpha. Default is 1.0.
        
    Returns:
        float or numpy array: Derivative of the activation function.
    """
    return np.where(x > 0, 1, alpha*np.exp(x))
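
A short usage sketch, with docstrings omitted for brevity, shows the two properties that distinguish ELU from ReLU: negative inputs saturate toward `-alpha` instead of being clipped to zero, and with `alpha = 1` the derivative is continuous at zero. The sample inputs are arbitrary.

```python
import numpy as np

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

def elu_derivative(x, alpha=1.0):
    return np.where(x > 0, 1, alpha * np.exp(x))

x = np.array([-10.0, -1.0, 0.0, 1.0])

outputs = elu(x)
slopes = elu_derivative(x)
print(outputs)
print(slopes)
```

For the input -10 the output is very close to -1 (the value of `-alpha`), while positive inputs pass through unchanged.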


**Softmax Function**

In [None]:
import numpy as np

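def softmax(z):
    # Subtract the maximum before exponentiating for numerical stability;
    # softmax is shift-invariant, so the result is unchanged
    exp_z = np.exp(z - np.max(z))
    return exp_z / np.sum(exp_z)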
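
The function above normalizes over every element of `z`, which suits a single vector of logits. For a batch with one row per sample, a row-wise variant is usually wanted. The sketch below is one possible version, with an `axis` parameter added as an assumption; it subtracts the row maximum before exponentiating so that large logits do not overflow.

```python
import numpy as np

def softmax(z, axis=-1):
    """Row-wise softmax with max subtraction for numerical stability."""
    z = np.asarray(z, dtype=float)
    shifted = z - np.max(z, axis=axis, keepdims=True)
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z, axis=axis, keepdims=True)

# Two rows that differ only by a constant shift of 999
logits = np.array([[1.0, 2.0, 3.0],
                   [1000.0, 1001.0, 1002.0]])
probs = softmax(logits)
print(probs)
print(probs.sum(axis=1))
```

Both rows give the same probabilities, since softmax is unchanged by adding a constant to every logit. Without the max subtraction, `np.exp(1000.0)` would overflow to infinity.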


# Gradient Optimization methods

**Gradient Descent method**

In [None]:
import numpy as np

# Define the cost function to optimize
def cost_function(X, y, theta):
    m = len(y)  # Number of training examples
    h = X @ theta  # Hypothesis function
    J = (1 / (2 * m)) * np.sum((h - y) ** 2)  # Cost function
    return J

# Define the Gradient Descent function
def gradient_descent(X, y, alpha, num_iterations):
    m, n = X.shape  # Number of training examples and features
    theta = np.zeros((n, 1))  # Initialize the parameters
    J_history = np.zeros((num_iterations, 1))  # History of cost function values
    
    for i in range(num_iterations):
        h = X @ theta  # Hypothesis function
        gradient = (1 / m) * X.T @ (h - y)  # Gradient of the cost function
        theta = theta - alpha * gradient  # Update the parameters
        J_history[i] = cost_function(X, y, theta)  # Store the cost function value
        
    return theta, J_history

# Test the Gradient Descent function
X = np.array([[1, 3], [1, 4], [1, 5], [1, 6]])  # Features matrix
y = np.array([[1], [2], [3], [4]])  # Labels vector
alpha = 0.05  # Learning rate (0.1 is too large for these unscaled features and makes the updates diverge)
num_iterations = 10000  # Number of iterations
theta, J_history = gradient_descent(X, y, alpha, num_iterations)  # Run Gradient Descent

# Print the learned parameters and the final cost function value
print("Learned parameters:")
print(theta)
print("Final cost function value:")
print(J_history[-1])
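
For a linear least-squares problem like this one, the minimizer can also be computed directly, which gives a reference value to compare the gradient descent result against. The sketch below uses `np.linalg.lstsq` on the same data; the variable name `theta_exact` is introduced here for illustration.

```python
import numpy as np

# Same data as the gradient descent test above
X = np.array([[1, 3], [1, 4], [1, 5], [1, 6]], dtype=float)
y = np.array([[1], [2], [3], [4]], dtype=float)

# Least-squares solution of X @ theta = y
theta_exact, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta_exact)
```

The data lie exactly on the line y = x - 2, so the exact parameters are an intercept of -2 and a slope of 1. Gradient descent should approach these values when the learning rate is small enough for the updates to converge.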


**Stochastic Gradient Descent (SGD) algorithm:**

In [None]:
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def stochastic_gradient_descent(X, y, learning_rate, epochs):
    # initialize the weights (logistic-regression model)
    weights = np.zeros(X.shape[1])
    
    # loop over the epochs
    for epoch in range(epochs):
        # shuffle the data for stochasticity
        permutation = np.random.permutation(X.shape[0])
        X, y = X[permutation], y[permutation]
        
        # loop over each data point
        for i in range(X.shape[0]):
            # calculate the predicted value using the current weights
            y_pred = sigmoid(np.dot(X[i], weights))
            
            # calculate the error and update the weights
            error = y[i] - y_pred
            weights += learning_rate * error * X[i]
    
    return weights


Here is example code for **Mini-batch Gradient Descent**:

In [None]:
import numpy as np

def mini_batch_gradient_descent(X, y, alpha, epochs, batch_size):
    # X: input data, shape (m, n)
    # y: output data, shape (m, 1)
    # alpha: learning rate
    # epochs: number of passes over the training data
    # batch_size: number of samples in each batch
    
    m = X.shape[0] # number of samples
    n = X.shape[1] # number of features
    num_batches = m // batch_size # number of full mini-batches (leftover samples are skipped each epoch)
    
    # initialize weights randomly
    W = np.random.randn(n, 1)
    
    for epoch in range(epochs):
        # shuffle the data
        permutation = np.random.permutation(m)
        X_shuffled = X[permutation,:]
        y_shuffled = y[permutation,:]
        
        # iterate over mini-batches
        for i in range(num_batches):
            start = i * batch_size
            end = (i+1) * batch_size
            X_batch = X_shuffled[start:end,:]
            y_batch = y_shuffled[start:end,:]
            
            # compute gradient
            error = X_batch.dot(W) - y_batch
            gradient = (1/batch_size) * X_batch.T.dot(error)
            
            # update weights
            W = W - alpha * gradient
    
    return W


The **momentum method** is an extension of gradient descent that incorporates previous gradients to accelerate the convergence. Here is the code for the momentum method:

In [None]:
import numpy as np

def momentum_gradient_descent(X, y, lr=0.01, beta=0.9, epochs=100):
    # Initialize parameters
    m, n = X.shape
    w = np.zeros((n, 1))
    v = np.zeros((n, 1))
    loss_history = []
    
    # Loop over epochs
    for i in range(epochs):
        # Compute gradient
        grad = np.dot(X.T, (np.dot(X, w) - y)) / m
        
        # Update velocity
        v = beta * v + (1 - beta) * grad
        
        # Update parameters
        w = w - lr * v
        
        # Compute loss
        loss = np.mean((np.dot(X, w) - y) ** 2)
        loss_history.append(loss)
        
    return w, loss_history
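
Written as equations, the update performed inside the loop above is the following. The symbols $g_t$ for the gradient and $\eta$ for the learning rate `lr` are notation introduced here; $\beta$ is the `beta` argument and $m$ is the number of training examples.

```latex
\begin{aligned}
g_t &= \frac{1}{m} X^\top \left( X w_{t-1} - y \right) \\
v_t &= \beta \, v_{t-1} + (1 - \beta) \, g_t, \qquad v_0 = 0 \\
w_t &= w_{t-1} - \eta \, v_t
\end{aligned}
```

This is the exponential-moving-average form of momentum. Another common form omits the $(1 - \beta)$ factor, which yields larger effective steps for the same $\eta$.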


**RMSProp (Root Mean Square Propagation)** is an optimization algorithm used for training neural networks. It is an adaptive learning rate method that divides each weight's update by the square root of an exponentially decaying average of that weight's squared gradients. Here's the code for RMSProp:

In [None]:
import numpy as np

def rmsprop(weights, gradients, lr, decay_rate, cache):
    eps = 1e-8 # small constant to avoid division by zero
    for i in range(len(weights)):
        cache[i] = decay_rate * cache[i] + (1 - decay_rate) * gradients[i]**2
        weights[i] -= lr * gradients[i] / (np.sqrt(cache[i]) + eps)
    return weights, cache
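
As a usage sketch, the function above can be called in a loop on a toy objective. The objective f(w) = sum(w**2), the starting point, and the hyperparameter values are all assumptions chosen for illustration. The `cache` array has to be created once (as zeros) and passed back in on every step.

```python
import numpy as np

def rmsprop(weights, gradients, lr, decay_rate, cache):
    eps = 1e-8  # small constant to avoid division by zero
    for i in range(len(weights)):
        cache[i] = decay_rate * cache[i] + (1 - decay_rate) * gradients[i] ** 2
        weights[i] -= lr * gradients[i] / (np.sqrt(cache[i]) + eps)
    return weights, cache

# Hypothetical objective: f(w) = sum(w**2), whose gradient is 2 * w
weights = np.array([1.0, -2.0])
cache = np.zeros_like(weights)

for step in range(1000):
    gradients = 2 * weights
    weights, cache = rmsprop(weights, gradients, lr=0.01, decay_rate=0.9, cache=cache)

print(weights)
```

Both weights end up close to the minimum at zero. Because each step is normalized by the gradient's recent magnitude, the step size stays on the order of `lr`, so with a constant learning rate the weights hover near the minimum rather than converging to it exactly.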


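Adam is an adaptive learning rate optimization algorithm that combines the advantages of the momentum method and RMSProp: it keeps a running average of the gradients (as momentum does) and of the squared gradients (as RMSProp does). Here is the code for the **Adam optimizer**: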

In [None]:
import numpy as np

class Adam:
    def __init__(self, learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
        self.learning_rate = learning_rate
        self.beta1 = beta1
        self.beta2 = beta2
        self.epsilon = epsilon
        self.m = 0
        self.v = 0
        self.t = 0
    
    def update(self, w, grad_wrt_w):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad_wrt_w
        self.v = self.beta2 * self.v + (1 - self.beta2) * np.square(grad_wrt_w)
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        w -= self.learning_rate * m_hat / (np.sqrt(v_hat) + self.epsilon)
        return w
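
As a usage sketch, the optimizer above can be used to minimize a toy objective. The objective f(w) = (w - 3)**2, the starting point, and the learning rate of 0.1 are assumptions chosen for illustration. One optimizer instance is kept per parameter array so that its moment estimates `m` and `v` persist across steps.

```python
import numpy as np

class Adam:
    def __init__(self, learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
        self.learning_rate = learning_rate
        self.beta1 = beta1
        self.beta2 = beta2
        self.epsilon = epsilon
        self.m = 0  # running average of gradients
        self.v = 0  # running average of squared gradients
        self.t = 0  # step counter for bias correction

    def update(self, w, grad_wrt_w):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad_wrt_w
        self.v = self.beta2 * self.v + (1 - self.beta2) * np.square(grad_wrt_w)
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        w -= self.learning_rate * m_hat / (np.sqrt(v_hat) + self.epsilon)
        return w

# Hypothetical objective: f(w) = (w - 3)**2, whose gradient is 2 * (w - 3)
optimizer = Adam(learning_rate=0.1)
w = np.array([0.0])

for _ in range(1000):
    grad = 2 * (w - 3.0)
    w = optimizer.update(w, grad)

print(w)
```

The parameter moves toward the minimum at 3, typically overshooting slightly and settling as the momentum term decays. Note that `update` modifies a NumPy array argument in place.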


# BACK PROPAGATION

**Backpropagation** is a supervised learning algorithm used for training artificial neural networks. It is based on the chain rule of calculus, and it allows the network to adjust its weights in order to minimize the difference between its output and the desired output.

The backpropagation algorithm can be divided into two phases: forward propagation and backward propagation. In the forward propagation phase, the input is fed through the network and the output is calculated. In the backward propagation phase, the error is calculated and propagated back through the network to adjust the weights.

Here is a high-level pseudocode for the backpropagation algorithm:

// **Feed-forward phase**

For each layer in the network:
    Calculate the output of each neuron in the layer based on the input and the weights
    Store the output for later use in the backpropagation phase
    
// **Backward propagation phase**

Calculate the error between the network's output and the desired output
For each layer in the network (starting from the output layer and working backwards):
    Calculate the error for each neuron in the layer based on the error from the next layer and the weights
    Update the weights for each neuron in the layer based on the error and the input


Here is example code for **backpropagation** in a **two-layer neural network** using the sigmoid activation function and a mean squared error loss:

In [None]:
import numpy as np

# Define the sigmoid activation function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Define the derivative of sigmoid function
def sigmoid_derivative(x):
    return sigmoid(x) * (1 - sigmoid(x))

# Initialize network weights and biases
n_input = 2
n_hidden = 3
n_output = 1

W1 = np.random.randn(n_hidden, n_input)
b1 = np.zeros((n_hidden, 1))
W2 = np.random.randn(n_output, n_hidden)
b2 = np.zeros((n_output, 1))

# Define input and output data
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]]).T
Y = np.array([[0, 1, 1, 0]])

# Define hyperparameters
learning_rate = 0.1
epochs = 10000

# Train the network using backpropagation
for i in range(epochs):
    # Forward pass
    Z1 = np.dot(W1, X) + b1
    A1 = sigmoid(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)

    # Compute error
    loss = np.mean((A2 - Y) ** 2)

    # Backward pass
    dA2 = 2 * (A2 - Y)
    dZ2 = dA2 * sigmoid_derivative(Z2)
    dW2 = np.dot(dZ2, A1.T)
    db2 = np.sum(dZ2, axis=1, keepdims=True)
    dA1 = np.dot(W2.T, dZ2)
    dZ1 = dA1 * sigmoid_derivative(Z1)
    dW1 = np.dot(dZ1, X.T)
    db1 = np.sum(dZ1, axis=1, keepdims=True)

    # Update weights and biases
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1

    # Print the loss after every 1000 epochs
    if i % 1000 == 0:
        print(f"Epoch {i}: loss = {loss}")
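
A common way to validate backpropagation formulas like the ones above is a numerical gradient check. The sketch below recomputes the same analytic gradients and compares them with central finite differences. The helper names, the random seed, and the step `eps` are assumptions introduced for illustration. Because the backward pass above starts from `2 * (A2 - Y)` without dividing by the number of samples, its gradients correspond to the summed squared error, so that is the loss differentiated here.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def forward(params, X):
    W1, b1, W2, b2 = params
    A1 = sigmoid(W1 @ X + b1)
    A2 = sigmoid(W2 @ A1 + b2)
    return A1, A2

def summed_squared_error(params, X, Y):
    _, A2 = forward(params, X)
    return np.sum((A2 - Y) ** 2)

def backward(params, X, Y):
    """Analytic gradients, using the same formulas as the training loop above."""
    W1, b1, W2, b2 = params
    A1, A2 = forward(params, X)
    dZ2 = 2 * (A2 - Y) * A2 * (1 - A2)   # sigmoid'(Z2) = A2 * (1 - A2)
    dW2 = dZ2 @ A1.T
    db2 = np.sum(dZ2, axis=1, keepdims=True)
    dZ1 = (W2.T @ dZ2) * A1 * (1 - A1)   # sigmoid'(Z1) = A1 * (1 - A1)
    dW1 = dZ1 @ X.T
    db1 = np.sum(dZ1, axis=1, keepdims=True)
    return [dW1, db1, dW2, db2]

def numerical_gradients(params, X, Y, eps=1e-6):
    """Central finite differences, perturbing one parameter entry at a time."""
    grads = []
    for p in params:
        g = np.zeros_like(p)
        for idx in np.ndindex(p.shape):
            original = p[idx]
            p[idx] = original + eps
            loss_plus = summed_squared_error(params, X, Y)
            p[idx] = original - eps
            loss_minus = summed_squared_error(params, X, Y)
            p[idx] = original  # restore the entry
            g[idx] = (loss_plus - loss_minus) / (2 * eps)
        grads.append(g)
    return grads

rng = np.random.default_rng(0)
params = [rng.standard_normal((3, 2)), np.zeros((3, 1)),
          rng.standard_normal((1, 3)), np.zeros((1, 1))]
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float).T
Y = np.array([[0, 1, 1, 0]], dtype=float)

analytic = backward(params, X, Y)
numeric = numerical_gradients(params, X, Y)
max_diff = max(np.max(np.abs(a - n)) for a, n in zip(analytic, numeric))
print("largest difference:", max_diff)
```

A very small difference indicates that the chain-rule expressions are consistent with the loss. Relative to the mean squared error that the training loop prints, these gradients are larger by a factor equal to the number of samples, which acts like a larger effective learning rate.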
