### Logistic Regression Equations

![Screenshot from 2024-08-20 19-02-28.png](attachment:90bf2b91-6949-4ee1-8f08-cc1d8fab7714.png)

$w$: the weight vector.
$x^{(i)}$: the feature vector for the $i$-th sample.
$b$: the bias term.
$z^{(i)}$: the linear combination (the logit).
![Screenshot from 2024-08-20 19-06-54.png](attachment:2ec316f0-72c9-4d42-ac3a-a8fcff8a578e.png)

![image.png](attachment:592e6b4e-6904-43a3-9b64-327839b9f4f5.png)
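
As a minimal sketch of the per-sample equations, the snippet below evaluates one sample. It assumes the figures above show the usual logit $z^{(i)} = w^T x^{(i)} + b$, the sigmoid activation, and the cross-entropy loss, which is what the code later in this notebook computes. The numeric values and the helper names `sigmoid` and `sample_loss` are illustrative and not from the original.

```python
import numpy as np

def sigmoid(z):
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sample_loss(a, y):
    """Cross-entropy loss for one prediction a and one binary label y."""
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

# Illustrative values only (not from the original notebook)
w = np.array([0.5, -0.25, 0.1])   # weight vector, one entry per feature
x_i = np.array([1.0, 2.0, 3.0])   # feature vector of one sample
b = 0.2                           # bias term
y_i = 1                           # true label of that sample

z_i = np.dot(w, x_i) + b          # logit: w^T x + b
a_i = sigmoid(z_i)                # predicted probability that y = 1
loss_i = sample_loss(a_i, y_i)    # loss for this single sample

print(z_i, a_i, loss_i)
```

With these values the logit is 0.5, so the predicted probability lies a little above 0.6 and the loss is simply $-\log a^{(i)}$ because the label is 1.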

# Benefit of vectorization
Instead of computing the linear combination and the activation for each sample in a separate loop iteration, you can vectorize the operations so that a single matrix operation handles all samples at once.
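
To make the claim concrete, here is a small sketch that computes the activations twice, once with an explicit loop over samples and once with a single matrix product, and checks that both give the same result. The sizes, the seed, and the names `A_loop` and `A_vec` are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed, chosen only for repeatability
n_features, m_samples = 3, 5            # hypothetical sizes
W = rng.standard_normal((n_features, 1))
X = rng.standard_normal((n_features, m_samples))
b = 0.5

# Per-sample loop: one dot product and one sigmoid per column of X
A_loop = np.zeros((1, m_samples))
for i in range(m_samples):
    z_i = np.dot(W[:, 0], X[:, i]) + b
    A_loop[0, i] = 1.0 / (1.0 + np.exp(-z_i))

# Vectorized: a single matrix product covers all samples
Z = np.dot(W.T, X) + b
A_vec = 1.0 / (1.0 + np.exp(-Z))

print(np.allclose(A_loop, A_vec))
```

The two results should agree up to floating-point rounding. The vectorized form is also usually much faster for large `m_samples`, because the loop runs inside NumPy's compiled routines rather than in the Python interpreter.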

1- Vectorized Linear Combination
![Screenshot from 2024-08-20 19-11-40.png](attachment:e72771c1-98fa-4117-9fdb-d79a213e3bc1.png)

$W$: the weight vector, of shape `(n_features, 1)`.
$X$: the matrix of input features, where each column is one sample; its shape is `(n_features, m)`, with $m$ the number of samples.
$b$: the bias term, a scalar that is broadcast to match the shape of $Z$.
$Z$: the resulting row vector of logits for all samples, of shape `(1, m)`.
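
The shapes listed above can be checked directly. This is a small sketch with made-up sizes and trivial values (zeros and ones), chosen so that the effect of broadcasting the scalar `b` is easy to see; the intermediate name `WX` is introduced here for illustration.

```python
import numpy as np

n_features, m = 3, 4                     # hypothetical sizes
W = np.zeros((n_features, 1))            # weight vector, shape (n_features, 1)
X = np.ones((n_features, m))             # one column per sample
b = 2.0                                  # scalar bias

WX = np.dot(W.T, X)                      # (1, n_features) @ (n_features, m) -> (1, m)
Z = WX + b                               # b is broadcast to every entry of WX

print(W.T.shape, X.shape, Z.shape)
```

Because `W` is all zeros here, every entry of `Z` equals `b`, which shows that the scalar bias is added once per sample.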

2- Vectorized Activation: 
![Screenshot from 2024-08-20 19-20-04.png](attachment:5ff5b1b0-db10-4060-b947-110167ab84c4.png)
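
The activation is applied element-wise to `Z`, so the output `A` has the same shape `(1, m)`. The sketch below assumes the figure above shows the sigmoid $A = \sigma(Z)$, as in the code cells of this notebook. The two-branch formulation is a common numerically stable variant and not the notebook's own formula: it avoids overflow warnings from `np.exp` when a logit is very large in magnitude.

```python
import numpy as np

def sigmoid(Z):
    """Element-wise sigmoid that avoids overflow in np.exp for large |Z|."""
    Z = np.asarray(Z, dtype=float)
    A = np.empty_like(Z)
    pos = Z >= 0
    # For Z >= 0, exp(-Z) <= 1, so the usual formula is safe
    A[pos] = 1.0 / (1.0 + np.exp(-Z[pos]))
    # For Z < 0, rewrite as exp(Z) / (1 + exp(Z)) so exp never overflows
    exp_z = np.exp(Z[~pos])
    A[~pos] = exp_z / (1.0 + exp_z)
    return A

Z = np.array([[-1000.0, -2.0, 0.0, 2.0, 1000.0]])  # logits for 5 samples, shape (1, 5)
A = sigmoid(Z)
print(A.shape)
print(A)
```

For moderate logits this gives the same values as `1 / (1 + np.exp(-Z))`; the difference only matters at the extremes.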

In [1]:
import numpy as np


In [2]:

# Example weight vector, input feature matrix, and bias term
# (the numeric values are illustrative placeholders)
W = np.array([[0.5], [-0.25], [0.1]])  # Weight vector (n_features, 1)
X = np.array([[1.0, 2.0, 3.0],         # Feature matrix (n_features, m_samples)
              [0.5, 1.5, 2.5],
              [2.0, 0.0, 1.0]])
b = 1.0  # Bias term

# Vectorized linear combination
Z = np.dot(W.T, X) + b

# Sigmoid activation function
A = 1 / (1 + np.exp(-Z))

print(A)  # Output probabilities for each sample, shape (1, m_samples)

In [3]:
import numpy as np

# Initialize parameters
n_features = 3  # Example: 3 features
m_samples = 5   # Example: 5 samples
W = np.random.randn(n_features, 1)
b = 0.0
X = np.random.randn(n_features, m_samples)
Y = np.random.randint(0, 2, (1, m_samples))  # Random binary labels

# Hyperparameters
learning_rate = 0.01
num_iterations = 1000

# Gradient descent loop
for i in range(num_iterations):
    # Forward propagation
    Z = np.dot(W.T, X) + b
    A = 1 / (1 + np.exp(-Z))  # Sigmoid function
    
    # Compute the cost (optional for monitoring)
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m_samples
    
    # Backward pass: gradients of the cost with respect to Z, W, and b
    dZ = A - Y
    dW = np.dot(X, dZ.T) / m_samples
    db = np.sum(dZ) / m_samples
    
    # Update parameters
    W = W - learning_rate * dW
    b = b - learning_rate * db
    
    # Print the cost every 100 iterations
    if i % 100 == 0:
        print(f"Iteration {i}: Cost {cost}")

# Output the final weights and bias
print("Final weights:", W)
print("Final bias:", b)


Iteration 0: Cost 1.1860719745682233
Iteration 100: Cost 0.8520745604166569
Iteration 200: Cost 0.6636455950904162
Iteration 300: Cost 0.5576822644942685
Iteration 400: Cost 0.49489595181735907
Iteration 500: Cost 0.4548933865034101
Iteration 600: Cost 0.4274837774362087
Iteration 700: Cost 0.4074567976943535
Iteration 800: Cost 0.3920175875388531
Iteration 900: Cost 0.3795844146875801
Final weights: [[-0.62711664]
 [ 1.00913858]
 [-1.31878621]]
Final bias: -0.587258404845304
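
The gradient formulas used in the loop, `dW = X dZ^T / m` and `db = sum(dZ) / m` with `dZ = A - Y`, can be sanity-checked against finite differences of the cost. The following is a sketch under assumptions: the helper `cost_fn`, the seed, and the step size `eps` are introduced here for illustration and are not part of the original notebook.

```python
import numpy as np

def cost_fn(W, b, X, Y):
    """Mean cross-entropy cost, same formula as in the training loop."""
    m = X.shape[1]
    A = 1.0 / (1.0 + np.exp(-(np.dot(W.T, X) + b)))
    return -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m

rng = np.random.default_rng(0)             # fixed seed, for repeatability only
n_features, m_samples = 3, 5
W = rng.standard_normal((n_features, 1))
X = rng.standard_normal((n_features, m_samples))
Y = rng.integers(0, 2, (1, m_samples))     # random binary labels
b = 0.0

# Analytic gradients, computed as in the training loop
A = 1.0 / (1.0 + np.exp(-(np.dot(W.T, X) + b)))
dZ = A - Y
dW = np.dot(X, dZ.T) / m_samples
db = np.sum(dZ) / m_samples

# Central finite differences of the cost, one parameter at a time
eps = 1e-6
dW_num = np.zeros_like(W)
for j in range(n_features):
    step = np.zeros_like(W)
    step[j, 0] = eps
    dW_num[j, 0] = (cost_fn(W + step, b, X, Y) - cost_fn(W - step, b, X, Y)) / (2 * eps)
db_num = (cost_fn(W, b + eps, X, Y) - cost_fn(W, b - eps, X, Y)) / (2 * eps)

# Largest discrepancy between analytic and numerical gradients
print(np.max(np.abs(dW - dW_num)), abs(db - db_num))
```

Both printed differences should be very small if the analytic gradients match the cost function. A check like this is mainly useful when modifying the cost or adding terms such as regularization.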
