# CSCC11 - Introduction to Machine Learning, Fall 2022, Assignment 2

## Authors

Shawn Santhoshgeorge (1006094673) \
Anaqi Amir Razif (1005813880)

In [10]:
import pandas as pd
import numpy as np

In [11]:
# Import the Data and Setup X and y for Training
df_train = pd.DataFrame({
    "Width": [4, 6, 6, 6 , 6, 8 , 8],
    "Height": [4, 4, 5, 8, 10, 8, 10],
    "Orange": [1, 1, 1, 0, 0, 1, 0]
})

print(df_train)

# Split Data into X and Y
X_train = df_train[['Width', 'Height']].to_numpy()
X_train = np.hstack((X_train, np.ones((X_train.shape[0], 1)))) # Add Column of 1's for the bias term
Y_train = df_train['Orange'].to_numpy()

   Width  Height  Orange
0      4       4       1
1      6       4       1
2      6       5       1
3      6       8       0
4      6      10       0
5      8       8       1
6      8      10       0


### Write the corresponding optimization problem in terms of the data provided above and specify the parameters to be estimated



The optimization problem we are trying to solve is the following

Model: $P(\text{Orange} \mid \mathbf{X}) = \frac{1}{1 + e^{-\mathbf{w}^T \mathbf{X}}}$, where $\mathbf{w} = \begin{bmatrix} w_1 \\ w_2 \\ b \end{bmatrix}$ and $\mathbf{X} = \begin{bmatrix} \text{Width} \\ \text{Height} \\ 1 \end{bmatrix}$
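As a quick sanity check of the model, the sketch below evaluates $P(\text{Orange} \mid \mathbf{X})$ for the first training point using the initial weights from the training cell. It uses plain NumPy; the names `p_orange`, `w0` and `x1` are illustrative and not part of the original notebook.

```python
import numpy as np

def p_orange(x, w):
    """P(Orange | x) = 1 / (1 + exp(-w^T x)) for an augmented input x = [Width, Height, 1]."""
    return 1.0 / (1.0 + np.exp(-np.dot(w, x)))

w0 = np.array([0.3, -0.2, 0.7])  # initial weights [w1, w2, b] used in the training cell
x1 = np.array([4.0, 4.0, 1.0])   # first training point (Width=4, Height=4) plus the bias entry

print(p_orange(x1, w0))  # w^T x = 1.1, so this is sigmoid(1.1) ≈ 0.750
```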


Given the training data $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$ (here $N = 7$), the parameters to be estimated are $w_1$, $w_2$ and $b$. We estimate them by minimizing the negative log-likelihood:

$L(\mathbf{w}) = - \sum_{i=1}^N \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right]$, where $p_i = P(\text{Orange} \mid \mathbf{x}_i)$.
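The loss can be evaluated directly in NumPy. The sketch below is one possible way to do it, not part of the original notebook: it rewrites $-\log p_i = \log(1 + e^{-z_i})$ and $-\log(1 - p_i) = \log(1 + e^{z_i})$, with $z_i = \mathbf{w}^T \mathbf{x}_i$, and evaluates both with `np.logaddexp` so that probabilities rounding to 0 or 1 never produce `log(0)`. The function name `negative_log_likelihood` is hypothetical.

```python
import numpy as np

# Augmented inputs [Width, Height, 1] and Orange labels from the training table
X = np.array([[4, 4, 1], [6, 4, 1], [6, 5, 1], [6, 8, 1],
              [6, 10, 1], [8, 8, 1], [8, 10, 1]], dtype=float)
y = np.array([1, 1, 1, 0, 0, 1, 0], dtype=float)

def negative_log_likelihood(w):
    """L(w) = -sum[y log p + (1 - y) log(1 - p)] with p = sigmoid(X w)."""
    z = X @ w
    # np.logaddexp(0, -z) = log(1 + e^{-z}) = -log p, computed without overflow
    return np.sum(y * np.logaddexp(0, -z) + (1 - y) * np.logaddexp(0, z))

print(negative_log_likelihood(np.zeros(3)))  # every p_i = 1/2, so L = 7 log 2 ≈ 4.852
print(negative_log_likelihood(np.array([0.3, -0.2, 0.7])))
```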

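Taking the partial derivative with respect to each component of $\mathbf{w}$ gives the gradient $\frac{\partial}{\partial\mathbf{w}}L(\mathbf{w}) = -\sum_{i=1}^N (y_i - p_i)\,\mathbf{x}_i$. Setting this gradient to zero has no closed-form solution, so we minimize $L$ numerically with gradient descent, $\mathbf{w} \leftarrow \mathbf{w} - \eta \frac{\partial}{\partial\mathbf{w}}L(\mathbf{w})$, using step size $\eta = 0.01$ (`STEP_SIZE` in the code below).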
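A common way to confirm a hand-derived gradient is a finite-difference check. The sketch below is an illustration, not from the original notebook (`nll`, `analytic_gradient` and `numerical_gradient` are hypothetical helper names): it compares $-\sum_i (y_i - p_i)\mathbf{x}_i$, written in matrix form as `X.T @ (p - y)`, with central differences of $L(\mathbf{w})$ at the initial weights used in the training cell.

```python
import numpy as np

X = np.array([[4, 4, 1], [6, 4, 1], [6, 5, 1], [6, 8, 1],
              [6, 10, 1], [8, 8, 1], [8, 10, 1]], dtype=float)
y = np.array([1, 1, 1, 0, 0, 1, 0], dtype=float)

def nll(w):
    """Negative log-likelihood, evaluated stably with np.logaddexp."""
    z = X @ w
    return np.sum(y * np.logaddexp(0, -z) + (1 - y) * np.logaddexp(0, z))

def analytic_gradient(w):
    """dL/dw = -sum_i (y_i - p_i) x_i = X^T (p - y)."""
    p = 1 / (1 + np.exp(-(X @ w)))
    return X.T @ (p - y)

def numerical_gradient(f, w, h=1e-6):
    """Central finite differences, one coordinate of w at a time."""
    grad = np.zeros_like(w)
    for j in range(len(w)):
        step = np.zeros_like(w)
        step[j] = h
        grad[j] = (f(w + step) - f(w - step)) / (2 * h)
    return grad

w0 = np.array([0.3, -0.2, 0.7])
print(analytic_gradient(w0))
print(np.max(np.abs(analytic_gradient(w0) - numerical_gradient(nll, w0))))
```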



In [12]:
# Sigmoid
def sigmoid(values):
    """
    Return the value from the Sigmoid Function

    Args:
        - values (ndarray (Shape: (N, 1))): Result of the Dot Product with Model Parameters and Input (w^Tx)

    Output:
        Values from the Sigmoid Function
    """

    # Check that values is an array
    assert isinstance(values, np.ndarray), 'values must be an ndarray of Nx1'

    return 1 / (1 + np.exp(-values))
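The `sigmoid` above computes `np.exp(-values)`, which overflows (with a NumPy `RuntimeWarning`) for large negative inputs, even though the final result still rounds correctly to 0. If that matters, a numerically stable variant can use the identity $\sigma(z) = \exp(-\log(1 + e^{-z}))$. The sketch below is an assumption-labelled alternative, not the notebook's implementation; the name `stable_sigmoid` is hypothetical.

```python
import numpy as np

def stable_sigmoid(values):
    """Sigmoid computed as exp(-log(1 + exp(-z))) so no intermediate overflows."""
    values = np.asarray(values, dtype=float)
    # np.logaddexp(0, -z) = log(1 + e^{-z}), evaluated without computing e^{-z} directly
    return np.exp(-np.logaddexp(0.0, -values))

z = np.array([-1000.0, -1.0, 0.0, 1.0, 1000.0])
print(stable_sigmoid(z))
```

For moderate inputs it agrees with the original `sigmoid` to floating-point precision; it only differs in avoiding the overflow for extreme values of $\mathbf{w}^T\mathbf{x}$.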

In [13]:
w = np.asarray([0.3,-0.2, 0.7]) # Initial Weights
STEP_SIZE = 0.01

def train(x, y, init_w, iters=3):
    """
    Finds the model parameter estimations using Gradient Descent

    Args:
        - x: (ndarray (Shape: (N, 3))): An Nx3 matrix of inputs (Width, Height) augmented with a column of 1's.
        - y: (ndarray (Shape: (N,))): A length-N vector of 0/1 labels corresponding to the inputs.
        - init_w: (ndarray (Shape: (3,))): Initial weights and bias term for the model
        - iters (int): Number of iterations for the Gradient Descent Algorithm (Default=3)

    Output:
        - w: (ndarray (Shape: (3,))): Estimated weights and bias term for the model
    """

    # Creates a copy of the initial weights
    w = np.copy(init_w)

    # Compute the NLL gradient X^T (sigmoid(Xw) - y) and take a step against it
    for _ in range(iters):
        deltaW = np.dot(x.T, (sigmoid(np.dot(x, w)) - y))
        w -= STEP_SIZE * deltaW

    return w

# Model Parameter Optimization
print("Initial Model Parameters: ", w)
w = train(X_train, Y_train, w)
print("After Optimization: ", w)

# Model Testing on Training Data
Y_train_pred = sigmoid(np.dot(X_train, w)) >= 1/2
print("Train Data Result: ", 1 * Y_train_pred)

Initial Model Parameters:  [ 0.3 -0.2  0.7]
After Optimization:  [ 0.27208574 -0.35574855  0.69978802]
Train Data Result:  [1 1 1 0 0 1 0]
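Three iterations with a step size of 0.01 move the weights only slightly from their initial values. One way to check that these steps still make progress is to record $L(\mathbf{w})$ after each update. The sketch below repeats the same update rule as `train` and also returns the loss history; the name `train_with_history` is hypothetical and the stable loss formula is an assumption, not the notebook's code.

```python
import numpy as np

X = np.array([[4, 4, 1], [6, 4, 1], [6, 5, 1], [6, 8, 1],
              [6, 10, 1], [8, 8, 1], [8, 10, 1]], dtype=float)
y = np.array([1, 1, 1, 0, 0, 1, 0], dtype=float)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def nll(w):
    z = X @ w
    return np.sum(y * np.logaddexp(0, -z) + (1 - y) * np.logaddexp(0, z))

def train_with_history(init_w, step_size=0.01, iters=3):
    """Same gradient descent update as train(), plus the loss before and after each step."""
    w = np.array(init_w, dtype=float)
    history = [nll(w)]
    for _ in range(iters):
        w -= step_size * (X.T @ (sigmoid(X @ w) - y))
        history.append(nll(w))
    return w, history

w, history = train_with_history([0.3, -0.2, 0.7])
print(w)
print(np.round(history, 3))
```

A loss that increased between iterations would be a sign that `STEP_SIZE` is too large.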


In [14]:
# Import the Data and Setup X for Testing
X_test = np.array([(3,3), (4, 10), (9, 8), (9, 10)])
X_test = np.hstack((X_test, np.ones((X_test.shape[0], 1))))
Y_test_pred = sigmoid(np.dot(X_test, w)) >= 1/2
print("Test Data Result: ", 1 * Y_test_pred)

Test Data Result:  [1 0 1 0]


Therefore, the model assigns the new points to the following classes:

<center>

| Width | Height | Orange (predicted) |
|-------|--------|--------------------|
| 3     | 3      | 1                  |
| 4     | 10     | 0                  |
| 9     | 8      | 1                  |
| 9     | 10     | 0                  |

</center>
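Because $\sigma(z) \ge 1/2$ exactly when $z \ge 0$, the trained classifier's decision boundary is the line $w_1 \cdot \text{Width} + w_2 \cdot \text{Height} + b = 0$. The sketch below solves that line for Height and reproduces the predictions in the table. The weights are copied from the rounded "After Optimization" output above, and `boundary_height` and `predict` are hypothetical helper names.

```python
# Parameters copied from the "After Optimization" output above
w1, w2, b = 0.27208574, -0.35574855, 0.69978802

def boundary_height(width):
    """Height where w1*Width + w2*Height + b = 0, i.e. where P(Orange | x) = 1/2."""
    return -(w1 * width + b) / w2

def predict(width, height):
    # sigmoid(z) >= 1/2 exactly when z >= 0; since w2 < 0 this means height <= boundary_height(width)
    return int(height <= boundary_height(width))

test_points = [(3, 3), (4, 10), (9, 8), (9, 10)]
print([predict(wd, ht) for wd, ht in test_points])
```

Reading the boundary this way shows why wider but relatively short fruit is labelled Orange: the threshold height grows with width.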

### Discuss one advantage of Logistic Regression

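One advantage of Logistic Regression is that it has fewer model parameters than a classifier such as Gaussian class-conditionals. With $D$ input features it estimates only the $D + 1$ values in $\mathbf{w}$ (3 here: $w_1$, $w_2$ and $b$), whereas Gaussian class-conditionals with full covariances estimate a mean vector and a covariance matrix for each class plus the class prior (11 values for the two features here). With fewer parameters to estimate, training is relatively quick to compute.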
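The parameter counts can be made concrete for any number of features. The sketch below assumes Gaussian class-conditionals with a separate full (symmetric) covariance matrix per class and one free prior for two classes; both function names are hypothetical.

```python
def logistic_regression_params(d):
    """One weight per feature plus a bias term."""
    return d + 1

def gaussian_class_conditional_params(d, num_classes=2):
    """Per class: a mean (d values) and a symmetric covariance (d(d+1)/2 values),
    plus num_classes - 1 free class priors."""
    per_class = d + d * (d + 1) // 2
    return num_classes * per_class + (num_classes - 1)

for d in (2, 10, 100):
    print(d, logistic_regression_params(d), gaussian_class_conditional_params(d))
```

The gap grows quadratically with $D$, since each full covariance matrix has $O(D^2)$ entries.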

### Briefly explain whether Logistic Regression is discriminative or generative?

Logistic Regression is a discriminative model. It does not attempt to model the joint distribution of the inputs and labels, $P(X, \text{Orange})$, as a generative model would. Instead, it only models the conditional probability of the target output given the input, in this case $P(\text{Orange} \mid X)$.
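For contrast, a generative approach such as Gaussian class-conditionals models $P(X \mid \text{Orange})$ and $P(\text{Orange})$ and only then obtains $P(\text{Orange} \mid X)$ through Bayes' rule. The sketch below illustrates that route on the same training data under stated assumptions (maximum-likelihood mean and full covariance per class, NumPy only); all function names are hypothetical and this is not something the assignment asks for.

```python
import numpy as np

# Training data from the table above: (Width, Height) and the Orange label
X = np.array([[4, 4], [6, 4], [6, 5], [6, 8], [6, 10], [8, 8], [8, 10]], dtype=float)
y = np.array([1, 1, 1, 0, 0, 1, 0])

def fit_gaussian(points):
    """Maximum-likelihood mean and covariance for one class."""
    mean = points.mean(axis=0)
    cov = np.cov(points, rowvar=False, bias=True)  # bias=True divides by N (the MLE)
    return mean, cov

def log_gaussian(x, mean, cov):
    """Log density of a multivariate normal evaluated at x."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = d @ np.linalg.solve(cov, d)
    return -0.5 * (quad + logdet + len(x) * np.log(2 * np.pi))

# Generative step: model P(x | class) and the prior P(class) for each class
params = {c: fit_gaussian(X[y == c]) for c in (0, 1)}
priors = {c: np.mean(y == c) for c in (0, 1)}

def posterior_orange(x):
    """Bayes' rule: P(Orange | x) from the class-conditional densities and priors."""
    log_joint = {c: log_gaussian(x, *params[c]) + np.log(priors[c]) for c in (0, 1)}
    # Normalize in log space to avoid underflow
    return np.exp(log_joint[1] - np.logaddexp(log_joint[0], log_joint[1]))

print(posterior_orange(np.array([3.0, 3.0])))
```

Logistic Regression skips the density-fitting step entirely and parameterizes $P(\text{Orange} \mid X)$ directly.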