An **activation function** in a neural network transforms the weighted sum of a neuron's inputs into the neuron's output, and so determines how strongly the neuron is "activated" (i.e., whether it produces a strong output or not).

Here's what happens step by step:
1. Inputs come into a neuron (node), each multiplied by a weight.
2. These weighted inputs are summed, and often a **bias** is added.
3. The result is passed through an activation function, which transforms it into the neuron's output.

Mathematically:

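$\text{output} = \phi(w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b)$

where $x_i$ are the inputs, $w_i$ the weights, $b$ the bias, and $\phi$ the activation function.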
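As a quick numeric check of the formula, here is a minimal sketch that evaluates it by hand for one neuron. The weights and bias are hypothetical values picked only for illustration, and sigmoid is assumed as the activation function $\phi$; plain Python is used so that each term stays visible.

```python
import math

# Hypothetical values chosen only for illustration
weights = [0.2, -0.5, 0.1]
inputs = [0.5, -1.0, 2.0]
bias = 0.1

# Weighted sum: 0.2*0.5 + (-0.5)*(-1.0) + 0.1*2.0 + 0.1
z = sum(w * x for w, x in zip(weights, inputs)) + bias

# Sigmoid assumed as the activation function phi
output = 1 / (1 + math.exp(-z))

print(f"z = {z:.2f}")
print(f"output = {output:.2f}")
```

The weighted sum comes out at 0.9, and the sigmoid squashes it to roughly 0.71, a value strictly between 0 and 1.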


In [1]:
import numpy as np

In [2]:
class Neuron:
  # Initialize weights randomly and bias to zero
  def __init__(self, num_inputs):
    self.weights = np.random.randn(num_inputs)
    self.bias = 0.0

  # Sigmoid activation function
  def sigmoid(self, x):
    return 1 / (1 + np.exp(-x))

  # Compute weighted sum of inputs + bias
  def forward(self, inputs):
    weighted_sum = np.dot(inputs, self.weights) + self.bias
    # Apply activation function (sigmoid in this case)
    y = self.sigmoid(weighted_sum)
    print("y calculated", y)
    return y



# Create a neuron with 3 inputs
num_inputs = 3
neuron = Neuron(num_inputs)

# Example input vector
inputs = np.array([0.5, -1.0, 2.0])

# Perform forward propagation through the neuron
output = neuron.forward(inputs)
print("Inputs:", inputs)
print("Output:", output)  # varies between runs: the weights are random and no seed is set

y calculated 0.538181736229801
Inputs: [ 0.5 -1.   2. ]
Output: 0.538181736229801


# Common activation functions
| Function       | Formula                | Output Range | Notes                                                      |
| -------------- | ---------------------- | ------------ | ---------------------------------------------------------- |
| **Sigmoid**    | $\frac{1}{1 + e^{-x}}$ | (0, 1)       | Good for probabilities, but can cause vanishing gradients. |
| **Tanh**       | $\tanh(x)$             | (-1, 1)      | Zero-centered, but still has vanishing gradient issues.    |
| **ReLU**       | $\max(0, x)$           | \[0, ∞)      | Fast and simple; most common in hidden layers.             |
| **Leaky ReLU** | $\max(0.01x, x)$       | (−∞, ∞)      | Mitigates ReLU's "dying neuron" problem with a small negative-side slope. |
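The two functions from the table that the notebook does not implement elsewhere, tanh and Leaky ReLU, can be sketched with NumPy as follows. The helper names and the `alpha` parameter are assumptions made for this illustration; `alpha=0.01` matches the slope in the table's formula.

```python
import numpy as np

def tanh(x):
    # Zero-centered, output in (-1, 1)
    return np.tanh(x)

def tanh_deriv(x):
    # d/dx tanh(x) = 1 - tanh(x)^2; approaches 0 for large |x|
    return 1.0 - np.tanh(x) ** 2

def leaky_relu(x, alpha=0.01):
    # For 0 < alpha < 1 this is the same as max(alpha * x, x)
    return np.where(x > 0, x, alpha * x)

def leaky_relu_deriv(x, alpha=0.01):
    # Slope is 1 for positive inputs and alpha otherwise, so it never hits 0
    return np.where(x > 0, 1.0, alpha)

x = np.array([-2.0, 0.0, 2.0])
print("tanh:      ", tanh(x))
print("leaky_relu:", leaky_relu(x))
```

Unlike plain ReLU, whose derivative is exactly 0 for negative inputs, `leaky_relu_deriv` returns `alpha` there, so a neuron with a negative pre-activation still receives a small gradient.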


In [3]:
# Sigmoid and ReLU activation functions
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_deriv(x):
    return sigmoid(x) * (1 - sigmoid(x))

def relu(x):
    return np.maximum(0, x)

def relu_deriv(x):
    return (x > 0).astype(float)

# Inputs and labels for AND logic gate
X = np.array([[0, 0],
              [0, 1],
              [1, 0],
              [1, 1]])
y = np.array([[0], [0], [0], [1]])

# Random weight initialization
np.random.seed(0)
W1 = np.random.randn(2, 2)  # input to hidden (2 neurons)
b1 = np.zeros((1, 2))
W2 = np.random.randn(2, 1)  # hidden to output
b2 = np.zeros((1, 1))

# Training parameters
learning_rate = 0.1
epochs = 1000

for epoch in range(epochs):
    # === Forward pass ===
    Z1 = X @ W1 + b1
    A1 = relu(Z1)
    Z2 = A1 @ W2 + b2
    A2 = sigmoid(Z2)

    # === Loss (binary cross-entropy) ===
    loss = -np.mean(y * np.log(A2 + 1e-8) + (1 - y) * np.log(1 - A2 + 1e-8))

    # === Backward pass ===
    # Gradient of the summed (not averaged) cross-entropy w.r.t. Z2;
    # dividing by len(X) would match the mean loss computed above.
    dZ2 = A2 - y
    dW2 = A1.T @ dZ2
    db2 = np.sum(dZ2, axis=0, keepdims=True)

    dA1 = dZ2 @ W2.T
    dZ1 = dA1 * relu_deriv(Z1)
    dW1 = X.T @ dZ1
    db1 = np.sum(dZ1, axis=0, keepdims=True)

    # === Update weights and biases ===
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1

    if epoch % 100 == 0:
        print(f"Epoch {epoch}, Loss: {loss:.4f}")

# === Final output (A2 from the last forward pass, rounded to 0 or 1) ===
print("\nTrained predictions:")
print(np.round(A2))





Epoch 0, Loss: 1.0635
Epoch 100, Loss: 0.0281
Epoch 200, Loss: 0.0101
Epoch 300, Loss: 0.0056
Epoch 400, Loss: 0.0038
Epoch 500, Loss: 0.0028
Epoch 600, Loss: 0.0022
Epoch 700, Loss: 0.0018
Epoch 800, Loss: 0.0015
Epoch 900, Loss: 0.0013

Trained predictions:
[[0.]
 [0.]
 [0.]
 [1.]]
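The backward pass relies on the identity that, for a sigmoid output combined with binary cross-entropy, the gradient with respect to the pre-activation is simply `A2 - y`. The sketch below checks that identity numerically with central finite differences. The pre-activation values in `z` and the helper `bce_sum` are hypothetical, introduced only for this check; the loss is summed rather than averaged so that it matches the unscaled `A2 - y`.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def bce_sum(z, y):
    # Binary cross-entropy summed over the batch (no averaging)
    a = sigmoid(z)
    return -np.sum(y * np.log(a) + (1 - y) * np.log(1 - a))

# Hypothetical pre-activations; labels are those of the AND gate
z = np.array([[-1.0], [0.5], [2.0], [-0.3]])
y = np.array([[0.0], [0.0], [0.0], [1.0]])

# Analytic gradient used in the training loop
analytic = sigmoid(z) - y

# Central finite differences, one entry of z at a time
eps = 1e-6
numeric = np.zeros_like(z)
for i in range(z.shape[0]):
    z_plus, z_minus = z.copy(), z.copy()
    z_plus[i, 0] += eps
    z_minus[i, 0] -= eps
    numeric[i, 0] = (bce_sum(z_plus, y) - bce_sum(z_minus, y)) / (2 * eps)

max_diff = np.max(np.abs(analytic - numeric))
print("max |analytic - numeric| =", max_diff)
```

A difference close to zero indicates that the analytic expression agrees with the numerical slope of the loss.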
