
* Logistic regression
* Support vector machine

LOGISTIC REGRESSION

In [1]:
import numpy as np

def sigmoid(z):
    """Sigmoid activation function."""
    return 1 / (1 + np.exp(-z))

def logistic_regression(X, y, learning_rate=0.01, num_epochs=1000):
    """Logistic Regression implementation using gradient descent."""
    n_samples, n_features = X.shape

    # Initialize weights and bias
    weights = np.zeros(n_features)
    bias = 0

    for epoch in range(num_epochs):
        # Forward pass
        logits = np.dot(X, weights) + bias
        predictions = sigmoid(logits)

        # Compute loss (binary cross-entropy); clip probabilities to avoid log(0)
        eps = 1e-15
        clipped = np.clip(predictions, eps, 1 - eps)
        loss = -np.mean(y * np.log(clipped) + (1 - y) * np.log(1 - clipped))

        # Compute gradients
        dw = (1/n_samples) * np.dot(X.T, (predictions - y))
        db = (1/n_samples) * np.sum(predictions - y)

        # Update weights and bias
        weights -= learning_rate * dw
        bias -= learning_rate * db

        # Print loss every 100 epochs
        if epoch % 100 == 0:
            print(f"Epoch {epoch}, Loss: {loss}")

    return weights, bias

# Example usage:
# Replace X_data and y_data with your own data
X_data = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
y_data = np.array([0, 0, 1, 1])

# Perform logistic regression
weights, bias = logistic_regression(X_data, y_data)

# Display the results
print("Logistic Regression Weights:", weights)
print("Logistic Regression Bias:", bias)


Epoch 0, Loss: 0.6931471805599453
Epoch 100, Loss: 0.6195332903611219
Epoch 200, Loss: 0.592068576113594
Epoch 300, Loss: 0.5667402582926928
Epoch 400, Loss: 0.5433708303396876
Epoch 500, Loss: 0.5217934102724424
Epoch 600, Loss: 0.5018514944201355
Epoch 700, Loss: 0.48339976088114667
Epoch 800, Loss: 0.4663043379465336
Epoch 900, Loss: 0.4504426760047134
Logistic Regression Weights: [ 0.91260242 -0.20690417]
Logistic Regression Bias: -1.1195065937669515


Logistic Regression is a widely used statistical method for binary classification problems, where the goal is to predict the probability that an instance belongs to a particular class. Despite its name, Logistic Regression is used for classification rather than regression.

### Key Concepts:

1. **Sigmoid Activation Function:**
   - Logistic Regression uses the sigmoid activation function to transform the output into a probability between 0 and 1.
   - The sigmoid function is defined as: \( \sigma(z) = \frac{1}{1 + e^{-z}} \), where \( z \) is the linear combination of features and weights.

2. **Linear Model:**
   - The linear model in Logistic Regression is expressed as: \( z = w_0 + w_1x_1 + w_2x_2 + \ldots + w_nx_n \), where \( w_1, \ldots, w_n \) are the feature weights, \( w_0 \) is the bias (intercept) term (called \( b \), or `bias`, in the code), \( x \) is the feature vector, and \( n \) is the number of features.

3. **Prediction:**
   - The predicted probability of belonging to class 1 is given by \( \hat{y} = \sigma(z) \).
   - If \( \hat{y} \) is greater than or equal to 0.5, the model predicts class 1; otherwise, it predicts class 0.

4. **Loss Function:**
   - Logistic Regression uses the binary cross-entropy loss function.
   - The loss for a single instance is given by: \( -[y\log(\hat{y}) + (1 - y)\log(1 - \hat{y})] \), where \( y \) is the actual class label.

5. **Gradient Descent:**
   - The goal is to minimize the loss by adjusting the weights and bias using gradient descent.
   - Partial derivatives of the loss with respect to the weights and bias are calculated, and each parameter is updated by a small step (scaled by the learning rate) in the opposite direction of its gradient.
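
The gradients used in the update step follow from the loss and the sigmoid defined above. The derivation below is a sketch, assuming \( m \) training instances and the mean binary cross-entropy \( L \) as the objective (the symbols \( m \), \( L \), and the index \( i \) are notation introduced here, and \( b \) denotes the bias):

```latex
\begin{aligned}
L &= -\frac{1}{m}\sum_{i=1}^{m}\left[y_i\log\hat{y}_i + (1-y_i)\log(1-\hat{y}_i)\right],
  \qquad \hat{y}_i = \sigma(z_i),\\
\frac{\partial L}{\partial z_i} &= \frac{1}{m}\,(\hat{y}_i - y_i)
  \quad\text{using } \sigma'(z) = \sigma(z)\,\bigl(1-\sigma(z)\bigr),\\
\frac{\partial L}{\partial w_j} &= \frac{1}{m}\sum_{i=1}^{m}(\hat{y}_i - y_i)\,x_{ij},
  \qquad
\frac{\partial L}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}(\hat{y}_i - y_i).
\end{aligned}
```

These expressions correspond to `dw` and `db` in the `logistic_regression` function.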

### Training Process:

1. **Initialization:**
   - Initialize weights (\( w \)) and bias (\( b \)) to zero or small random values.

2. **Forward Pass:**
   - Compute the weighted sum (\( z \)) of features for each instance.
   - Apply the sigmoid activation function to \( z \) to obtain predicted probabilities.

3. **Loss Computation:**
   - Compute the binary cross-entropy loss based on the actual class labels and predicted probabilities.

4. **Backward Pass (Gradient Descent):**
   - Calculate the gradients of the loss with respect to weights and bias.
   - Update weights and bias in the opposite direction of the gradients.

5. **Repeat:**
   - Repeat the forward pass, loss computation, and backward pass for a specified number of epochs or until convergence.
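
The "until convergence" stopping rule and the 0.5 prediction threshold can be sketched as follows. This is a minimal illustration rather than a definitive implementation: the function names `fit_until_converged` and `predict`, the parameters `tol` and `max_epochs`, and the chosen learning rate are assumptions introduced here, not part of the original code.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation function."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_until_converged(X, y, learning_rate=0.1, max_epochs=50000, tol=1e-6):
    """Gradient descent that stops once the loss changes by less than `tol`.

    `tol` and `max_epochs` are illustrative parameters (assumptions).
    Returns the weights, the bias, and the number of epochs actually run.
    """
    n_samples, n_features = X.shape
    weights = np.zeros(n_features)
    bias = 0.0
    prev_loss = np.inf
    eps = 1e-15  # keeps log() away from 0

    for epoch in range(max_epochs):
        # Forward pass
        predictions = sigmoid(X @ weights + bias)

        # Loss computation (binary cross-entropy)
        clipped = np.clip(predictions, eps, 1 - eps)
        loss = -np.mean(y * np.log(clipped) + (1 - y) * np.log(1 - clipped))

        # Convergence check: stop when the loss has effectively stopped changing
        if abs(prev_loss - loss) < tol:
            break
        prev_loss = loss

        # Backward pass and parameter update
        error = predictions - y
        weights -= learning_rate * (X.T @ error) / n_samples
        bias -= learning_rate * np.sum(error) / n_samples

    return weights, bias, epoch + 1

def predict(X, weights, bias, threshold=0.5):
    """Predict class 1 when the estimated probability reaches `threshold`."""
    return (sigmoid(X @ weights + bias) >= threshold).astype(int)

X_data = np.array([[1, 2], [2, 3], [3, 4], [4, 5]], dtype=float)
y_data = np.array([0, 0, 1, 1])

weights, bias, epochs_run = fit_until_converged(X_data, y_data)
labels = predict(X_data, weights, bias)
print("Epochs run:", epochs_run)
print("Predicted labels:", labels)
```

A tolerance on the change in loss is only one possible criterion; monitoring the gradient norm or a validation metric are common alternatives.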

### Advantages:

- **Interpretability:**
  - Logistic Regression provides interpretable coefficients, allowing you to understand the impact of each feature on the predicted probability.

- **Efficiency:**
  - It is computationally efficient and easy to implement.

- **Probabilistic Output:**
  - Logistic Regression provides a probabilistic output, making it suitable for scenarios where understanding the certainty of predictions is important.
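
One common way to read the coefficients is as odds ratios: because the model is linear in the log-odds, increasing feature \( x_j \) by one unit multiplies the odds of class 1 by \( e^{w_j} \). The sketch below illustrates this; the helper `odds_ratios` and the example weights are hypothetical values chosen for illustration, not results from the model trained earlier.

```python
import numpy as np

def odds_ratios(weights):
    """Return exp(w_j): the factor by which the odds of class 1 change
    when feature j increases by one unit, holding other features fixed."""
    return np.exp(np.asarray(weights, dtype=float))

# Hypothetical coefficients, for illustration only
example_weights = np.array([0.7, -0.2, 0.0])
ratios = odds_ratios(example_weights)

for j, (w, r) in enumerate(zip(example_weights, ratios)):
    print(f"feature {j}: weight={w:+.2f}, odds ratio={r:.3f}")
```

A ratio above 1 means the feature raises the odds of class 1, a ratio below 1 lowers them, and a ratio of exactly 1 (weight 0) means the feature has no effect.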

### Use Cases:

- **Binary Classification:**
  - Predicting whether an email is spam or not.
  - Identifying whether a transaction is fraudulent.

- **Medical Diagnosis:**
  - Predicting the probability of a disease based on patient characteristics.

- **Marketing:**
  - Predicting whether a customer will make a purchase.

Logistic Regression is a foundational algorithm in machine learning and serves as a baseline for more complex models. It works best when the log-odds of the positive class are approximately a linear function of the features, since the model's decision boundary is a hyperplane.

SUPPORT VECTOR MACHINE

In [2]:
import numpy as np

class SVM:
    def __init__(self, learning_rate=0.01, lambda_param=0.01, num_epochs=1000):
        self.learning_rate = learning_rate
        self.lambda_param = lambda_param
        self.num_epochs = num_epochs
        self.weights = None
        self.bias = None

    def fit(self, X, y):
        # Initialize weights and bias
        self.weights = np.zeros(X.shape[1])
        self.bias = 0

        # Gradient Descent
        for epoch in range(self.num_epochs):
            # Compute decision function (y_hat)
            decision_function = np.dot(X, self.weights) + self.bias

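            # Hinge margin violations: max(0, 1 - y * f(x))
            loss = 1 - y * decision_function
            loss[loss < 0] = 0  # Points beyond the margin contribute nothing

            # Gradient of the squared hinge loss sum(loss**2) plus the L2
            # penalty lambda * ||w||^2 (the plain hinge loss would use an
            # indicator of margin violation instead of `loss` here)
            dw = -2 * np.dot(X.T, loss * y) + 2 * self.lambda_param * self.weights
            db = -2 * np.sum(loss * y)  # Squared hinge loss gradient w.r.t. bias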

            # Update weights and bias
            self.weights -= self.learning_rate * dw
            self.bias -= self.learning_rate * db

    def predict(self, X):
        # Predict the class label (1 or -1) for each instance in X
        decision_function = np.dot(X, self.weights) + self.bias
        predictions = np.sign(decision_function)
        return predictions

# Example usage:
# Replace X_data and y_data with your own data (labels must be +1 or -1)
X_data = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
y_data = np.array([1, 1, -1, -1])

# Create and train SVM
svm_model = SVM()
svm_model.fit(X_data, y_data)

# Make predictions
predictions = svm_model.predict(X_data)

# Display the results
print("SVM Weights:", svm_model.weights)
print("SVM Bias:", svm_model.bias)
print("Predictions:", predictions)


SVM Weights: [-2.2526297   0.64927896]
SVM Bias: 3.329399424709091
Predictions: [ 1.  1. -1. -1.]


Support Vector Machine (SVM) is a supervised learning algorithm used for classification and regression tasks. SVM aims to find the optimal hyperplane that best separates different classes in the feature space. It is particularly effective in high-dimensional spaces and is known for its ability to handle non-linear relationships through the use of kernel functions.

### Key Concepts:

1. **Hyperplane:**
   - In a two-dimensional space, a hyperplane is a line that separates the two classes; in three dimensions it is a plane, and in an \( n \)-dimensional space it is an \( (n-1) \)-dimensional flat subspace.
   - SVM aims to find the hyperplane that maximizes the margin between classes.

2. **Margin:**
   - The margin is the distance between the hyperplane and the nearest data point from each class. SVM seeks to maximize this margin.
   - Larger margins often lead to better generalization to unseen data.

3. **Support Vectors:**
   - Support vectors are the data points that are closest to the hyperplane.
   - These points play a crucial role in determining the optimal hyperplane and the margin.

4. **Decision Function:**
   - The decision function of SVM is used to classify new instances.
   - For a given instance \( x \), the decision function \( f(x) = w \cdot x + b \) is proportional to the signed distance from the point to the hyperplane (the distance itself is \( f(x) / \lVert w \rVert \)). The sign of \( f(x) \) determines the predicted class.

5. **Kernel Trick:**
   - SVM can handle non-linear relationships between features and classes through the use of kernel functions.
   - Common kernels include linear, polynomial, and radial basis function (RBF) kernels.

6. **Soft Margin SVM:**
   - In cases where the data is not linearly separable, or there is noise in the data, SVM allows for a certain amount of misclassification.
   - This is achieved by introducing a slack variable for each training point, which measures how far that point falls inside the margin or onto the wrong side of the hyperplane.
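
The geometric quantities in this list can be computed directly from a weight vector and bias. The following is a minimal sketch under stated assumptions: the helper names, the hyperplane \( x_1 = 2.5 \), and the data (including one deliberately noisy point) are hypothetical and chosen only to make the numbers easy to follow.

```python
import numpy as np

def signed_distance(X, weights, bias):
    """Signed Euclidean distance from each row of X to the hyperplane w.x + b = 0."""
    return (X @ weights + bias) / np.linalg.norm(weights)

def margin_width(weights):
    """Distance between the margin boundaries w.x + b = +1 and w.x + b = -1."""
    return 2.0 / np.linalg.norm(weights)

def slack(X, y, weights, bias):
    """Slack xi_i = max(0, 1 - y_i * (w.x_i + b)).

    Zero for points on or outside the margin, between 0 and 1 for points
    inside the margin, and above 1 for misclassified points.
    """
    return np.maximum(0.0, 1.0 - y * (X @ weights + bias))

# Hypothetical hyperplane x1 = 2.5, scaled so the nearest clean points
# satisfy |w.x + b| = 1
weights = np.array([2.0, 0.0])
bias = -5.0

# Four clean points plus one noisy point labeled -1 on the positive side
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0], [2.75, 3.75]])
y = np.array([-1, -1, 1, 1, -1])

distances = signed_distance(X, weights, bias)
width = margin_width(weights)
xi = slack(X, y, weights, bias)

print("Signed distances:", distances)
print("Margin width:", width)
print("Slack values:", xi)
```

Points with zero slack that lie exactly on a margin boundary, together with any points that have positive slack, are the support vectors of a soft-margin SVM.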

### Training Process:

1. **Objective Function:**
   - The objective of SVM is to find the hyperplane that maximizes the margin while minimizing the classification error.
   - This is often formulated as a constrained optimization problem.

2. **Optimization:**
   - The optimization process involves finding the weights and bias of the hyperplane that satisfy the constraints and maximize the margin.

3. **Kernel Transformation:**
   - In cases where a linear hyperplane cannot separate the classes, SVM applies a kernel transformation to map the data into a higher-dimensional space.
   - The kernel function calculates the dot product in the transformed space without explicitly computing the transformation.

4. **Regularization:**
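   - SVM includes a regularization parameter to control the trade-off between maximizing the margin and minimizing the classification error. In the standard formulation this is \( C \), where a larger \( C \) penalizes margin violations more heavily; the `lambda_param` in the code above instead scales the penalty on the weights, so it acts in the opposite direction.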
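
The claim that a kernel computes a dot product in the transformed space without building the transformation can be checked numerically for a small case. The sketch below uses the homogeneous degree-2 polynomial kernel on two-dimensional inputs, for which the explicit feature map is known; the function names and the `gamma` default are illustrative assumptions, and the RBF kernel is included only to show its formula.

```python
import numpy as np

def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel: K(x, z) = (x . z)^2."""
    return np.dot(x, z) ** 2

def poly2_feature_map(x):
    """Explicit feature map for 2-D input: phi(x) = [x1^2, sqrt(2)*x1*x2, x2^2]."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel: K(x, z) = exp(-gamma * ||x - z||^2).

    Its feature space is infinite-dimensional, so only the implicit
    (kernel) form can be evaluated.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return np.exp(-gamma * np.dot(diff, diff))

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

# Implicit: one dot product in the original 2-D space
implicit = poly2_kernel(x, z)

# Explicit: map both points to 3-D, then take the dot product there
explicit = np.dot(poly2_feature_map(x), poly2_feature_map(z))

print("Kernel value:", implicit)
print("Explicit feature-space dot product:", explicit)
print("RBF kernel value:", rbf_kernel(x, z))
```

Both routes give the same value, but the kernel form never constructs the higher-dimensional vectors, which is what keeps kernel SVMs tractable when the feature space is very large.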

### Use Cases:

- **Image Classification:**
  - SVM can be used for image classification tasks, such as recognizing handwritten digits or classifying objects in images.

- **Text Classification:**
  - SVM is effective in text classification tasks, such as spam detection or sentiment analysis.

- **Bioinformatics:**
  - SVM is applied to bioinformatics for tasks like protein classification and gene expression analysis.

- **Finance:**
  - SVM is used in financial applications, including credit scoring and stock market prediction.

### Advantages:

- **Effective in High-Dimensional Spaces:**
  - SVM performs well in high-dimensional spaces, making it suitable for tasks with a large number of features.

- **Robust to Overfitting:**
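  - Maximizing the margin acts as a form of regularization, so a well-tuned SVM tends to be less prone to overfitting, even when the number of features is large relative to the number of samples.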

- **Effective with Non-Linear Relationships:**
  - SVM can handle non-linear relationships through the use of kernel functions.

### Limitations:

- **Computational Complexity:**
  - Training an SVM can be computationally expensive, particularly for large datasets.

- **Difficulty in Interpretability:**
  - Interpreting the meaning of the support vectors and the hyperplane in real-world terms can be challenging.

Support Vector Machine is a versatile and powerful algorithm, but the choice of the kernel and tuning parameters requires careful consideration based on the characteristics of the data. For practical applications, libraries like scikit-learn in Python provide efficient implementations of SVM with various kernel options and hyperparameter tuning capabilities.