# <center>Assignment 4</center>

### 1. Write down the prediction function and the cost function, with the corresponding Python code, in the context of logistic regression.

The <b>sigmoid function</b>, also known as the logistic function, maps any real-valued number to the range (0, 1). It can therefore convert the output of a linear model into the probability that an instance belongs to a particular class. The sigmoid function is defined as
$$
σ(z) = \frac{1}{1 + e^{-z}}
$$
The <b>prediction function</b> in logistic regression is the sigmoid function of the linear combination of the input features and the model parameters. It can be represented as follows:
$$h_θ(x) = \frac{1}{1 + e^{-(θ^T x)}}$$
Where: <br>

h<sub>θ</sub>(x) is the predicted output, <br>
θ is the vector of model parameters,<br>
x is the vector of input features.<br>
The <b>cost function</b> in logistic regression is the log loss function. It measures the difference between the predicted probabilities and the actual class labels. The goal is to minimize the cost function to find the optimal parameters θ for the logistic regression model. It can be represented as follows:
$$
J(θ) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(h_θ(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_θ(x^{(i)})) \right]
$$
Where:<br>

J(θ) is the cost,<br>
m is the number of instances in the dataset,<br>
y<sup>(i)</sup> is the actual output of the i-th instance,<br>
h<sub>θ</sub>(x<sup>(i)</sup>) is the predicted output of the i-th instance, which is the output of the sigmoid function applied to the linear combination of the input features and the model parameters

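```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real-valued z into (0, 1)."""
    return 1 / (1 + np.exp(-z))

def predict(theta, X):
    """Predicted probabilities for every row of X.

    theta holds the bias and the weights; X must contain a matching column of
    ones (question 4 puts it first, so theta = [b, w1, w2]).
    """
    return sigmoid(np.dot(X, theta))  # same as sigmoid(features @ w + b)

def cost_function(theta, X, y):
    """Average log loss J(theta) over the m training instances."""
    m = len(y)
    h = predict(theta, X)
    cost = -1/m * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
    return cost
```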
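
A quick sanity check on the cost function: when every parameter is zero, each prediction is σ(0) = 0.5, so the log loss must equal ln 2 ≈ 0.693 whatever the labels are. The sketch below verifies this with the standard library only. The names `predict_one`, `log_loss`, `X_toy` and `y_toy` are made up for this illustration, and the toy data is hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_one(theta, x):
    """Predicted probability for one feature vector x (bias entry included)."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

def log_loss(theta, X, y):
    """Average log loss J(theta) over all rows of X."""
    total = 0.0
    for x, label in zip(X, y):
        h = predict_one(theta, x)
        total += label * math.log(h) + (1 - label) * math.log(1 - h)
    return -total / len(y)

# Hypothetical toy data; the leading 1.0 in each row is the bias feature.
X_toy = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.5]]
y_toy = [0, 0, 1]

# With theta = 0 every prediction is 0.5, so the cost should be ln(2).
cost_at_zero = log_loss([0.0, 0.0], X_toy, y_toy)
print(cost_at_zero, math.log(2))
```

Any training step that lowers the cost below this value means the model is doing better than guessing 0.5 for every instance.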

### 2. Define types of logistic regression.

Logistic regression is a statistical model used in machine learning for classification problems. It can be categorized into three main types: <br>

<b>Binary Logistic Regression:</b> This is the most common form of logistic regression, where the response variable (outcome) can only belong to one of two categories. For example, predicting whether an email is spam or not spam.<br><br>

<b>Multinomial Logistic Regression:</b> In this type, the response variable can belong to one of three or more categories that do not have a natural ordering. For example, predicting a person’s preference for a presidential candidate when there are more than two candidates. <br><br>

<b>Ordinal Logistic Regression:</b> This type of logistic regression is used when the response variable can belong to one of three or more categories that have a natural ordering. For example, predicting a product’s rating as low, medium, or high.
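
To make the step from binary to multinomial logistic regression concrete: the multinomial model replaces the sigmoid with the softmax function, which turns one score per class into probabilities that sum to 1. The sketch below is only an illustration. The function name `softmax` and the example scores are assumptions, not part of the assignment.

```python
import math

def softmax(scores):
    """Turn one score per class into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three unordered classes (e.g. three candidates).
class_scores = [2.0, 1.0, 0.1]
probs = softmax(class_scores)
predicted_class = probs.index(max(probs))  # class with the highest probability
print(probs, predicted_class)

# With two classes and scores [z, 0], softmax reduces to the sigmoid of z.
z = 1.3
two_class = softmax([z, 0.0])
print(two_class[0], 1.0 / (1.0 + math.exp(-z)))
```

The last two lines show why binary logistic regression can be seen as the two-class special case of the multinomial model.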

### 3. List the differences between linear regression and logistic regression.

#### Linear Regression
1. Linear Regression is a supervised regression model.
2. In Linear Regression, the predicted value is a continuous number.
3. It is based on least squares estimation.
4. When the training data are plotted, a straight best-fit line is drawn that passes as close as possible to the points.
5. Linear regression is used to estimate the dependent variable as the independent variables change. For example, predicting the price of houses.
6. Linear regression assumes a normal (Gaussian) distribution of the dependent variable around the fitted line, that is, normally distributed errors.

#### Logistic Regression
1. Logistic Regression is a supervised classification model.
2. In Logistic Regression, the model predicts a probability between 0 and 1, which is then thresholded to class 1 or 0.
3. It is based on maximum likelihood estimation.
4. The fitted curve is a sigmoid: a positive slope gives an S-shaped curve and a negative slope gives a mirrored (Z-shaped) curve.
5. Logistic regression is used to calculate the probability of an event. For example, classifying whether tissue is benign or malignant.
6. Logistic regression assumes a binomial (Bernoulli) distribution of the dependent variable.
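
The contrast in point 2 of each list can be seen by feeding the same linear combination through both models. This is a minimal sketch with made-up parameters `w` and `b`. The helper names `linear_output` and `logistic_output` are hypothetical.

```python
import math

def linear_output(w, b, x):
    """Linear regression prediction: any real number."""
    return w * x + b

def logistic_output(w, b, x):
    """Logistic regression prediction: a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

w, b = 2.0, -1.0  # hypothetical parameters, chosen only for illustration

for x in (-3.0, 0.0, 3.0):
    z = linear_output(w, b, x)
    p = logistic_output(w, b, x)
    label = 1 if p >= 0.5 else 0  # threshold the probability to get a class
    print(x, z, round(p, 4), label)
```

The linear output grows without bound as x grows, while the logistic output stays between 0 and 1 and only becomes a class label after thresholding at 0.5.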


### 4. Suppose you are given the following dataset:

| x1 | x2 | y |
|-----|-----|---|
| 0.5 | 1 | 0 |
| 1 | 2 | 0 |
| 1.5 | 2.5 | 1 |
| 2 | 3 | 1 |

### where x1 and x2 are the independent variables and y is the dependent variable. In the context of logistic regression, find the optimized parameters after the 3rd iteration. Find the prediction for [1, 1.5] with respect to the optimized parameters.

```python
# Working through every gradient descent iteration by hand is long and
# error-prone, so the parameters are computed with code instead.
# Note: predict() is the function defined in the code for question 1.

import numpy as np

def gradient_descent(X, y, theta, alpha, iterations):
    """Run batch gradient descent on the log loss and return the updated theta."""
    m = len(y)
    for _ in range(iterations):
        predictions = predict(theta, X)
        error = predictions - y
        gradient = np.dot(X.T, error)
        theta -= alpha * (1/m) * gradient
    return theta

X = np.array([[0.5, 1], [1, 2], [1.5, 2.5], [2, 3]])
y = np.array([0, 0, 1, 1])
X = np.hstack((np.ones((X.shape[0], 1)), X))  # Add a column of ones for the bias term

# Initialize theta
theta = np.zeros(X.shape[1])

# Set the learning rate and the number of iterations
alpha = 0.1
iterations = 3

theta = gradient_descent(X, y, theta, alpha, iterations)
print("Optimized parameters: ", theta)

# Predict for [1, 1.5]
X_new = np.array([1, 1.5]).reshape(1, -1)
X_new = np.hstack((np.ones((X_new.shape[0], 1)), X_new))  # Add a column of ones for the bias term
prediction = predict(theta, X_new)
print("Prediction for [1, 1.5]: ", prediction)
```


```
Optimized parameters:  [-0.00682175  0.06503065  0.07733742]
Prediction for [1, 1.5]:  [0.54344393]
```
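
The update in `gradient_descent` follows from the gradient of the cost function J(θ) defined in question 1. As a check on the code, here is the first iteration worked by hand, assuming the same settings as in the code (θ initialised to zeros, α = 0.1, m = 4, and the bias column of ones placed first):

$$
\frac{\partial J}{\partial θ_j} = \frac{1}{m} \sum_{i=1}^{m} (h_θ(x^{(i)}) - y^{(i)}) x_j^{(i)}, \qquad θ_j := θ_j - α \frac{\partial J}{\partial θ_j}
$$

With θ = (0, 0, 0), every prediction is σ(0) = 0.5, so the errors h - y are (0.5, 0.5, -0.5, -0.5):

$$
X^T (h - y) = \begin{pmatrix} 0.5 + 0.5 - 0.5 - 0.5 \\ 0.25 + 0.5 - 0.75 - 1 \\ 0.5 + 1 - 1.25 - 1.5 \end{pmatrix} = \begin{pmatrix} 0 \\ -1 \\ -1.25 \end{pmatrix}
$$

$$
θ = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} - \frac{0.1}{4} \begin{pmatrix} 0 \\ -1 \\ -1.25 \end{pmatrix} = \begin{pmatrix} 0 \\ 0.025 \\ 0.03125 \end{pmatrix}
$$

Repeating the same step twice more, each time with the updated θ, gives the parameters printed above.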


### 5. Explain K-Nearest Neighbor (KNN) algorithm.

The K-Nearest Neighbors (KNN) algorithm is a simple and intuitive algorithm for classification and regression. It does not make mathematical assumptions about the data or learn a discriminative function. Instead, it memorizes the training dataset and predicts from the closest stored examples.


1. Let k be the number of neighbours and D be the set of training samples.
2. For each test sample t = (x′, y′):
    1. Compute d, the distance between t and every training sample (x, y) ∈ D.
    2. Sort the calculated distances d in ascending order.
    3. Get the top k rows from the sorted array.
    4. Get the most frequent class corresponding to these rows.
    5. Set the class of the test sample to the most frequent class.
3. Return the predicted class labels of the test samples.
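
The steps above can be sketched in a few lines of Python. This is a minimal illustration for classification with Euclidean distance, not a tuned implementation. The function name `knn_predict` and the toy data are assumptions, and tie-breaking between equally frequent classes is not handled specially.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k):
    """Predict the class of query by majority vote among its k nearest neighbours."""
    # Step 2.1: distance from the query to every training sample
    distances = [(math.dist(point, query), label)
                 for point, label in zip(train_points, train_labels)]
    # Step 2.2: sort by distance, ascending
    distances.sort(key=lambda pair: pair[0])
    # Step 2.3: keep the k closest rows
    nearest = distances[:k]
    # Steps 2.4 and 2.5: the most frequent class among them is the prediction
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two "a" points near the origin, two "b" points far away.
toy_points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
toy_labels = ["a", "a", "b", "b"]

toy_prediction = knn_predict(toy_points, toy_labels, (0.5, 0.5), k=3)
print(toy_prediction)
```

Choosing an odd k for a two-class problem avoids tied votes.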

### 6. How do you choose the optimal k for KNN model?

1. Cross-validation, such as <b>K-fold cross-validation</b>: the dataset is divided into K subsets (folds), and the holdout method is repeated K times. Each time, one of the K folds is used as the validation set and the other K-1 folds are combined to form the training set. The average error across all K trials is computed for each candidate number of neighbors k. The optimal k is usually the one that minimizes this average validation error.<br><br>
2. Elbow Method: Run KNN for a range of k values (say 1 to 20) and plot the validation error against k. The error typically decreases at first, reaches a minimum (the elbow point), and then rises again. The optimal k is the value at that point.<br><br>
3. Domain knowledge can also help. For example, when classifying a medical condition, only the closest observations may be relevant, so a small k is chosen.
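
As a rough sketch of the cross-validation approach, the code below scores several candidate k values and keeps the one with the lowest error. To stay short it uses leave-one-out cross-validation (K-fold with one fold per sample) rather than a general K-fold split. The helper names `knn_predict` and `loocv_error`, the candidate list and the data are all hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k):
    """Majority vote among the k training points closest to query."""
    ranked = sorted(zip(train_points, train_labels),
                    key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def loocv_error(points, labels, k):
    """Fraction of points misclassified when each is held out in turn."""
    mistakes = 0
    for i in range(len(points)):
        train_points = points[:i] + points[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        if knn_predict(train_points, train_labels, points[i], k) != labels[i]:
            mistakes += 1
    return mistakes / len(points)

# Hypothetical data: two clusters plus one deliberately noisy label.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (1.1, 1.3), (1.0, 1.2),
          (3.0, 3.0), (3.2, 2.9), (2.8, 3.1), (3.1, 3.3)]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]

candidate_ks = [1, 3, 5, 7]
errors = {k: loocv_error(points, labels, k) for k in candidate_ks}
# min() returns the first minimum, so ties go to the smallest candidate k.
best_k = min(candidate_ks, key=lambda k: errors[k])
print(errors, best_k)
```

Plotting `errors` against k would give the curve used by the elbow method.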

### 7. Suppose you are given the following dataset:

| x1 | x2 | y |
|-----|-----|---|
| 0.5 | 0.5 | 0 |
| 0.5 | 1 | 0 |
| 1 | 1 | 0 |
| 2 | 2.5 | 1 |
| 2.5 | 3 | 1 |
| 3 | 3 | 1 |

### where x1 and x2 are the independent variables and y is the dependent variable. Predict the class for [1.5, 1] for k=2 and k=3 respectively.


Solving KNN manually:

1. **Calculate the Euclidean distance** between the point [1.5, 1] and all points in the dataset. The Euclidean distance between two points $p = (p_1, p_2)$ and $q = (q_1, q_2)$ is given by $\sqrt{(p_1-q_1)^2 + (p_2-q_2)^2}$.

2. **Sort the distances** in ascending order and take the first k points.

3. **Vote for the class**: For k=2, take the 2 nearest points and use a majority vote to predict the class. For k=3, take the 3 nearest points and do the same.

Calculation:

- Distances to [1.5, 1]:
    - Point [0.5, 0.5]: $\sqrt{(0.5-1.5)^2 + (0.5-1)^2}$ = 1.118
    - Point [0.5, 1]: $\sqrt{(0.5-1.5)^2 + (1-1)^2}$ = 1.0
    - Point [1, 1]: $\sqrt{(1-1.5)^2 + (1-1)^2}$ = 0.5
    - Point [2, 2.5]: $\sqrt{(2-1.5)^2 + (2.5-1)^2}$ = 1.581
    - Point [2.5, 3]: $\sqrt{(2.5-1.5)^2 + (3-1)^2}$ = 2.236
    - Point [3, 3]: $\sqrt{(3-1.5)^2 + (3-1)^2}$ = 2.5
- Sorting by distance:

| Point | Distance | Class |
|-----------|---------|------|
| [1, 1] | 0.5 | 0 |
| [0.5, 1] | 1.0 | 0 |
| [0.5, 0.5] | 1.118 | 0 |
| [2, 2.5] | 1.581 | 1 |
| [2.5, 3] | 2.236 | 1 |
| [3, 3] | 2.5 | 1 |

- For k=2, the two closest points are [1, 1] and [0.5, 1], both of which have class 0. So the prediction for [1.5, 1] is 0.

- For k=3, the three closest points are [1, 1], [0.5, 1], and [0.5, 0.5], with distances 0.5, 1.0, and 1.118. All three have class 0, so the prediction for [1.5, 1] is also 0.
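
The manual calculation can be cross-checked with a short script. This is a sketch using only the standard library. The variable names and the helper `vote` are made up for this check.

```python
import math
from collections import Counter

# Dataset from the question: (x1, x2) points and their classes y.
points = [(0.5, 0.5), (0.5, 1.0), (1.0, 1.0), (2.0, 2.5), (2.5, 3.0), (3.0, 3.0)]
labels = [0, 0, 0, 1, 1, 1]
query = (1.5, 1.0)

# Rank every training point by its Euclidean distance to the query.
ranked = sorted(zip(points, labels), key=lambda pl: math.dist(pl[0], query))
for point, label in ranked:
    print(point, round(math.dist(point, query), 3), label)

def vote(k):
    """Majority class among the k nearest training points."""
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]

prediction_k2 = vote(2)
prediction_k3 = vote(3)
print(prediction_k2, prediction_k3)
```

Both votes are unanimous here because the three nearest points all belong to class 0.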


### 8. If the dataset is imbalanced, can the prediction by KNN be biased? Explain with an example.

<b>Yes</b>, if the dataset is imbalanced, the prediction by K-Nearest Neighbors (KNN) can be biased. Imbalanced datasets occur when one class significantly outnumbers the other class(es). In such cases, the majority class tends to dominate the predictions, leading to biased results. <br><br>
Suppose we have a binary classification problem where we want to predict whether a bank transaction is fraudulent (class 1) or not (class 0). However, the dataset is highly imbalanced, with only 1% of transactions being fraudulent (class 1) and 99% being non-fraudulent (class 0). <br><br>

<b>Example</b>: Now, let's consider a new transaction that we want to classify using the KNN algorithm. We choose k = 5 nearest neighbors for simplicity. <br><br>

In this imbalanced dataset:<br><br>

Because non-fraudulent transactions make up 99% of the data, most of the 5 nearest neighbors (say, 4 out of 5) are likely to belong to the majority class.<br>
Only a few of the neighbors (say, 1 out of 5) are likely to belong to the minority class (fraudulent transactions).<br><br>

So even if the new transaction is actually fraudulent, the majority vote classifies it as non-fraudulent.<br><br>
<center><b> or</b></center> <br>

Example: Class A has 900 instances and Class B has 50, with k = 3. Even if a new instance actually belongs to Class B, its three nearest neighbors may all be instances of Class A simply because Class A is so much more numerous, so the new instance is assigned to Class A as well.
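
The effect can be reproduced on a small synthetic dataset. In this sketch the majority class fills a grid of 100 points and the minority class has only 3 points. All names and coordinates are hypothetical and chosen so that the query sits right next to a minority point.

```python
import math
from collections import Counter

# Majority class 0: a dense 10 x 10 grid (100 points).
majority = [(float(i), float(j)) for i in range(10) for j in range(10)]
# Minority class 1: only three isolated points.
minority = [(4.5, 4.5), (6.5, 6.5), (2.5, 2.5)]

points = majority + minority
labels = [0] * len(majority) + [1] * len(minority)

def knn_predict(query, k):
    """Majority vote among the k points closest to query."""
    ranked = sorted(zip(points, labels), key=lambda pl: math.dist(pl[0], query))
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]

query = (4.6, 4.6)  # almost on top of the minority point (4.5, 4.5)

# k=1 sees only the minority point; k=5 adds four majority grid points.
print(knn_predict(query, 1), knn_predict(query, 5))
```

With k = 1 the query is assigned to the minority class, but with k = 5 the single minority neighbor is outvoted by four majority neighbors, which is the 4-out-of-5 situation described above.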