# Assignment 02

```
Build a logistic-regression binary classifier that distinguishes humans from horses, using a dataset of human and horse images
```

## Binary classification based on logistic regression

$(x_i, y_i)$ denotes a training example $x_i$ paired with its label $y_i \in \{0, 1\}$, for $i = 1, 2, \cdots, n$

$\hat{y}_i = \sigma(z_i)$ where $z_i = w^T x_i + b$ and $\sigma(z) = \frac{1}{1 + \exp(-z)}$

The loss function is defined by $\mathcal{L} = \frac{1}{n} \sum_{i=1}^n f_i(w, b)$

where $f_i(w, b) = - y_i \log \hat{y}_i - (1 - y_i) \log (1 - \hat{y}_i)$
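
The gradient needed for gradient descent follows from these definitions. The derivation below is a standard worked step and is not part of the original assignment text; it uses only the symbols defined above.

Since $\sigma'(z) = \sigma(z) (1 - \sigma(z))$, the chain rule gives $\frac{\partial f_i}{\partial z_i} = \left( -\frac{y_i}{\hat{y}_i} + \frac{1 - y_i}{1 - \hat{y}_i} \right) \hat{y}_i (1 - \hat{y}_i) = \hat{y}_i - y_i$

With $\frac{\partial z_i}{\partial w} = x_i$ and $\frac{\partial z_i}{\partial b} = 1$, this yields

$\nabla_w \mathcal{L} = \frac{1}{n} \sum_{i=1}^n (\hat{y}_i - y_i) x_i$ and $\frac{\partial \mathcal{L}}{\partial b} = \frac{1}{n} \sum_{i=1}^n (\hat{y}_i - y_i)$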

## Dataset

- The dataset consists of human and horse images, split into a training set and a validation set
- The classifier should be trained using the training set
- The classifier should be tested using the validation set

## Implementation

- Write the code in Python
- Use ```jupyter notebook``` as the programming environment
- You have to write your own implementation of the following:
    - compute the loss
    - compute the accuracy
    - compute the gradient of the loss with respect to the model parameters
    - update the model parameters
    - plot the results
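
The first four items in the list above can be sketched with NumPy as follows. This is a minimal sketch, not the required solution: all function names are hypothetical, `X` is assumed to be an `n x d` array with one flattened image per row, `y` an array of 0/1 labels, and the `eps` clipping is an added safeguard against `log(0)` that the assignment does not mention.

```python
import numpy as np


def sigmoid(z):
    # Logistic function sigma(z) = 1 / (1 + exp(-z)).
    return 1.0 / (1.0 + np.exp(-z))


def predict_proba(X, w, b):
    # y_hat_i = sigma(w^T x_i + b) for every row x_i of X at once.
    return sigmoid(X @ w + b)


def compute_loss(y, y_hat, eps=1e-12):
    # Mean cross-entropy; clipping (an assumption) avoids log(0).
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return float(np.mean(-y * np.log(y_hat) - (1.0 - y) * np.log(1.0 - y_hat)))


def compute_accuracy(y, y_hat, threshold=0.5):
    # Fraction of examples whose thresholded prediction matches the label.
    return float(np.mean((y_hat >= threshold) == (y == 1)))


def compute_gradient(X, y, y_hat):
    # Gradient of the mean cross-entropy with respect to w and b.
    residual = y_hat - y
    grad_w = X.T @ residual / X.shape[0]
    grad_b = float(np.mean(residual))
    return grad_w, grad_b


def update_parameters(w, b, grad_w, grad_b, learning_rate):
    # One gradient descent step.
    return w - learning_rate * grad_w, b - learning_rate * grad_b
```

Every function operates on whole arrays, so no Python loop over examples is needed.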

## Optimization

- Apply the gradient descent algorithm with an appropriate learning rate
- Run enough iterations for the algorithm to converge
- Vectorize the computation of the gradients and the update of the model parameters
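
The optimization steps above could look like the loop below. It is a sketch under stated assumptions: `gradient_descent`, its default `learning_rate` and `num_iterations`, and the two-dimensional toy data are all hypothetical and chosen only so that the example runs on its own; they are not values prescribed by the assignment.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gradient_descent(X, y, learning_rate=0.1, num_iterations=500):
    # Full-batch gradient descent on the mean cross-entropy loss.
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    losses = []
    for _ in range(num_iterations):
        y_hat = sigmoid(X @ w + b)
        # Record the loss before the update (clipping avoids log(0)).
        clipped = np.clip(y_hat, 1e-12, 1.0 - 1e-12)
        losses.append(float(np.mean(-y * np.log(clipped)
                                    - (1.0 - y) * np.log(1.0 - clipped))))
        # Vectorized gradient and parameter update.
        residual = y_hat - y
        w = w - learning_rate * (X.T @ residual) / n
        b = b - learning_rate * float(np.mean(residual))
    return w, b, losses


# Synthetic stand-in for the image data: two Gaussian clusters.
rng = np.random.default_rng(0)
X_toy = np.vstack([rng.normal(-1.0, 0.5, size=(50, 2)),
                   rng.normal(1.0, 0.5, size=(50, 2))])
y_toy = np.concatenate([np.zeros(50), np.ones(50)])

w_toy, b_toy, loss_history = gradient_descent(X_toy, y_toy)
```

On real images with raw pixel values in the range 0 to 255, a much smaller learning rate or inputs scaled to [0, 1] will probably be needed; that is an assumption about the data, not something the assignment states.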

## git commit

- Make a number of ```git commit```s at intermediate development steps, each with a descriptive message

## Output

- Plot the training loss at every iteration (x-axis: iteration, y-axis: loss)
- Plot the validation loss at every iteration (x-axis: iteration, y-axis: loss)
- Plot the training accuracy at every iteration (x-axis: iteration, y-axis: accuracy)
- Plot the validation accuracy at every iteration (x-axis: iteration, y-axis: accuracy)
- Present the final loss and accuracy on the training and validation datasets in a table as below:

| dataset    | loss       | accuracy   | 
|:----------:|:----------:|:----------:|
| training   |            |            |
| validation |            |            |
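
Each of the four plots requested above can be produced by one small helper. The sketch below assumes the per-iteration values were collected in a Python list during training; `plot_curve` is a hypothetical name, and the three numbers passed to it are made-up placeholders, not results.

```python
import matplotlib.pyplot as plt


def plot_curve(values, title, ylabel):
    # Draw one curve with the iteration index on the x-axis.
    fig, ax = plt.subplots()
    ax.plot(range(len(values)), values)
    ax.set_title(title)
    ax.set_xlabel("iteration")
    ax.set_ylabel(ylabel)
    return fig, ax


# Placeholder values only; replace with the recorded training history.
fig, ax = plot_curve([0.69, 0.52, 0.41], "training loss", "loss")
```

Calling the helper once each for training loss, validation loss, training accuracy and validation accuracy gives the four figures.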

## Submission

- A PDF file exported from jupyter notebook containing the code, results and comments [example: 20191234_02.pdf]
- A PDF file exported from the github website showing the history of git commits [example: 20191234_02_git.pdf]


In [1]:
import numpy as np
import matplotlib.pyplot as plt

In [93]:
'''
import torch
from torch.utils.data import Dataset, DataLoader
import torchvision.transforms as transforms
import torchvision
import os

NUM_EPOCH = 1

transform = transforms.Compose([#transforms.Resize((256,256)),  
                                transforms.Grayscale(),  # transforms.Grayscale() changes the shape from [3, 100, 100] to [1, 100, 100] (note: [channel, height, width])
                                transforms.ToTensor(),])


#train_data_path = 'relative path of training data set'
train_data_path = './horse-or-human/train'
trainset = torchvision.datasets.ImageFolder(root=train_data_path, transform=transform)
# change the values of batch_size and num_workers for your program
# if shuffle=True, the data is reshuffled at every epoch
trainloader = torch.utils.data.DataLoader(trainset, batch_size=3, shuffle=False, num_workers=1)  


validation_data_path = './horse-or-human/validation'
valset = torchvision.datasets.ImageFolder(root=validation_data_path, transform=transform)
# change the values of batch_size and num_workers for your program
valloader = torch.utils.data.DataLoader(valset, batch_size=3, shuffle=False, num_workers=1)  


for epoch in range(NUM_EPOCH):
    # load training images of the batch size for every iteration
    for i, data in enumerate(trainloader):

        # inputs is the image
        # labels is the class of the image
        inputs, labels = data

        # if you don't change the image size, it will be [batch_size, 1, 100, 100]
        print(inputs.shape)

        # labels is tensor([0, 0, 0]) for a batch of horses and tensor([1, 1, 1]) for a batch of humans
        print(labels)  





    # load validation images of the batch size for every iteration
    for i, data in enumerate(valloader):
        
        # inputs is the image
        # labels is the class of the image
        inputs, labels = data

        # if you don't change the image size, it will be [batch_size, 1, 100, 100]
        print(inputs.shape)

        # labels is tensor([0, 0, 0]) for a batch of horses and tensor([1, 1, 1]) for a batch of humans
        print(labels)    
'''

In [60]:
import os
from glob import glob
from PIL import Image

training_data_path = './horse-or-human/train'
validation_data_path = './horse-or-human/validation'

training_data_list_horses = glob(os.path.join(training_data_path, 'horses/*'))
training_data_list_humans = glob(os.path.join(training_data_path, 'humans/*'))

validation_data_list_horses = glob(os.path.join(validation_data_path, 'horses/*'))
validation_data_list_humans = glob(os.path.join(validation_data_path, 'humans/*'))

# 0 for horses
# 1 for humans

width = 100
height = 100

# count the files found by glob (the data arrays are only created further down)
number_of_training_data_horses = len(training_data_list_horses)
number_of_training_data_humans = len(training_data_list_humans)

number_of_validation_data_horses = len(validation_data_list_horses)
number_of_validation_data_humans = len(validation_data_list_humans)

number_of_training_data = number_of_training_data_horses + number_of_training_data_humans
number_of_validation_data = number_of_validation_data_horses + number_of_validation_data_humans

training_data_horses = np.zeros((len(training_data_list_horses), width, height))
training_data_humans = np.zeros((len(training_data_list_humans), width, height))

validation_data_horses = np.zeros((len(validation_data_list_horses), width, height))
validation_data_humans = np.zeros((len(validation_data_list_humans), width, height))

# index x width x height
for i, fname in enumerate(training_data_list_horses):
    tmp = Image.open(fname)
    training_data_horses[i,:,:] = np.array(tmp)

for i, fname in enumerate(training_data_list_humans):
    tmp = Image.open(fname)
    training_data_humans[i,:,:] = np.array(tmp)

for i, fname in enumerate(validation_data_list_horses):
    tmp = Image.open(fname)
    validation_data_horses[i,:,:] = np.array(tmp)
    
for i, fname in enumerate(validation_data_list_humans):
    tmp = Image.open(fname)
    validation_data_humans[i,:,:] = np.array(tmp)

training_data = np.vstack((training_data_horses, training_data_humans))
validation_data = np.vstack((validation_data_horses, validation_data_humans))

# labels: horses are stacked first and get 0, humans get 1
training_data_label = np.ones(len(training_data_list_horses) + len(training_data_list_humans))
validation_data_label = np.ones(len(validation_data_list_horses) + len(validation_data_list_humans))
training_data_label[:len(training_data_list_horses)] = 0
validation_data_label[:len(validation_data_list_horses)] = 0



In [67]:
# reshape to index x data
training_data = training_data.reshape(number_of_training_data, width*height)
validation_data = validation_data.reshape(number_of_validation_data, width*height)

In [68]:
def sigmoid(x):
    # logistic function: 1 / (1 + exp(-x))
    result = 1 / (1 + np.exp(-x))
    return result

In [69]:
def cross_entropy(y, y_hat):
    result = -y*np.log(y_hat) - (1-y)*np.log(1-y_hat)
    return result

In [71]:
def gradient(f, x, h=1e-5):
    # forward-difference approximation of the derivative of f at x
    result = (f(x+h)-f(x)) / h
    return result

In [74]:
def partial_difference_quotient(f, v, i, h):
    """Approximate the value of the i-th partial derivative of f at v."""

    w = [v_j + (h if j == i else 0)  # add h only to the i-th variable of v,
         for j, v_j in enumerate(v)]  # i.e. only the i-th variable changes

    return (f(w) - f(v)) / h

# Source: https://hamait.tistory.com/747 [HAMA blog]
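
Finite-difference helpers like the two above are too slow to train on 10,000-pixel inputs, but they are useful for checking a vectorized analytic gradient on a small example. The sketch below is one way to do that check; every name in it (`loss`, `analytic_gradient`, `numerical_gradient`, the `*_check` arrays) is hypothetical, and the parameters are packed as `w` followed by `b` purely for convenience.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def loss(params, X, y):
    # Mean cross-entropy; params packs w followed by b.
    w, b = params[:-1], params[-1]
    y_hat = sigmoid(X @ w + b)
    return np.mean(-y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat))


def analytic_gradient(params, X, y):
    # Vectorized gradient of the mean cross-entropy.
    w, b = params[:-1], params[-1]
    residual = sigmoid(X @ w + b) - y
    return np.append(X.T @ residual / len(y), np.mean(residual))


def numerical_gradient(f, v, h=1e-6):
    # Forward difference applied to one coordinate at a time.
    grad = np.zeros_like(v)
    for i in range(len(v)):
        shifted = v.copy()
        shifted[i] += h
        grad[i] = (f(shifted) - f(v)) / h
    return grad


# Tiny made-up problem, used only for the check.
X_check = np.array([[0.5, -1.0], [1.5, 2.0], [-0.3, 0.8]])
y_check = np.array([0.0, 1.0, 1.0])
params = np.array([0.1, -0.2, 0.05])

numeric = numerical_gradient(lambda p: loss(p, X_check, y_check), params)
analytic = analytic_gradient(params, X_check, y_check)
max_gap = float(np.max(np.abs(numeric - analytic)))
```

A small `max_gap` suggests that the analytic gradient is implemented consistently with the loss.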