# Assignment 2

## Instructions
- Your submission should be the `.ipynb` file named after you,
  like `YusufMesbah.ipynb`. It should include the answers to the questions in
  markdown cells.
- You are expected to follow the best practices for code writing and model
training. Poor coding style will be penalized.
- You are allowed to discuss ideas with your peers, but sharing code is not
allowed. Plagiarized code will result in a failing grade. If you use code from
the internet, cite it.
- If the instructions seem vague, use common sense.

# Task 1: ANN (30%)
For this task, you are required to build a fully connected feed-forward ANN model
for a multi-output regression problem.

For the given data, you need to do proper data preprocessing, design the ANN model,
then fine-tune your model architecture (number of layers, number of neurons,
activation function, learning rate, momentum, regularization).
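
One way to organize the fine-tuning step is a plain grid search over the listed hyperparameters. The sketch below is only an illustration: the `SEARCH_SPACE` values are placeholders rather than recommendations, and `evaluate` is a hypothetical callback that you would write yourself to train one model and return its validation loss.

```python
from itertools import product

# Hypothetical search space; the values are placeholders, not recommendations.
SEARCH_SPACE = {
    "hidden_layers": [1, 2, 3],
    "units": [8, 16, 32],
    "activation": ["relu", "tanh"],
    "learning_rate": [1e-2, 1e-3],
    "momentum": [0.0, 0.9],
    "l2": [0.0, 1e-4],
}


def iter_configs(space):
    """Yield one dict per combination of hyperparameter values."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def grid_search(space, evaluate):
    """Return (best_config, best_score); lower scores are better.

    `evaluate` is a caller-supplied function that trains a model for one
    configuration and returns its validation loss.
    """
    best_config, best_score = None, float("inf")
    for config in iter_configs(space):
        score = evaluate(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score
```

Scores passed back by `evaluate` should come from a validation split, so that the test set stays untouched until the final evaluation.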

For evaluating your model, use an $80/20$ train/test split.

### Data
You will be working with the data in `Task 1.csv` for predicting students'
scores in 3 different exams: math, reading and writing. The columns include:
 - gender
 - race
 - parental level of education
 - lunch meal plan at school
 - whether the student undertook the test preparation course

In [6]:
import numpy as np
import pandas as pd
import sklearn as sl
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder
from tensorflow import keras
from tensorflow.keras import datasets, layers, losses, models

In [7]:
data = pd.read_csv('Task 1.csv')
col_scores = ['math score','reading score', 'writing score']
data.head()


Unnamed: 0,gender,race/ethnicity,parental level of education,lunch,test preparation course,math score,reading score,writing score
0,male,group A,high school,standard,completed,67,67,63
1,female,group D,some high school,free/reduced,none,40,59,55
2,male,group E,some college,free/reduced,none,59,60,50
3,male,group B,high school,standard,none,77,78,68
4,male,group E,associate's degree,standard,completed,78,73,68


In [8]:
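def OE(changed_dataset, columns):
    """Ordinal-encode the given columns and return a modified copy."""
    changed_dataset = changed_dataset.copy()
    for column in columns:
        # Fit a fresh encoder per column; categories are ordered alphabetically.
        encoder = OrdinalEncoder()
        changed_dataset[column] = encoder.fit_transform(changed_dataset[[column]]).ravel()
    return changed_dataset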

def OHE(changed_dataset, columns_ch):
    """One-hot encode the given columns and return the widened dataset."""
    for column in columns_ch:
        encoder = OneHotEncoder()
        OneHot = encoder.fit_transform(changed_dataset[[column]]).toarray()

        # Name the new columns after the encoder's own (sorted) categories.
        # unique() follows order of appearance and would mislabel the columns.
        dat = pd.DataFrame(OneHot, columns=encoder.categories_[0],
                           index=changed_dataset.index)
        changed_dataset = changed_dataset.drop(columns=column).join(dat)
    return changed_dataset

In [9]:
new_df = OHE(
    OE(data, ['race/ethnicity', 'parental level of education']),
    ['gender', 'lunch', 'test preparation course'])
new_df.head()

Unnamed: 0,race/ethnicity,parental level of education,math score,reading score,writing score,female,male,free/reduced,standard,completed,none
0,0.0,2.0,67,67,63,0.0,1.0,0.0,1.0,1.0,0.0
1,3.0,5.0,40,59,55,1.0,0.0,1.0,0.0,0.0,1.0
2,4.0,4.0,59,60,50,0.0,1.0,1.0,0.0,0.0,1.0
3,1.0,2.0,77,78,68,0.0,1.0,0.0,1.0,0.0,1.0
4,4.0,0.0,78,73,68,0.0,1.0,0.0,1.0,1.0,0.0


In [10]:
def get_train_data(dataset, column, size):
    # Split features (everything except the score columns) from the targets
    x_train, x_test, y_train, y_test = train_test_split(
        dataset.drop(columns=column), dataset[column], test_size=size)
    return x_train, x_test, y_train, y_test

x_train, x_test, y_train, y_test = get_train_data(new_df, col_scores, 0.2)
x_test

Unnamed: 0,race/ethnicity,parental level of education,female,male,free/reduced,standard,completed,none
259,3.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0
874,2.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0
44,4.0,2.0,0.0,1.0,0.0,1.0,0.0,1.0
335,2.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0
672,2.0,5.0,1.0,0.0,1.0,0.0,0.0,1.0
...,...,...,...,...,...,...,...,...
666,2.0,4.0,1.0,0.0,0.0,1.0,1.0,0.0
213,2.0,5.0,0.0,1.0,0.0,1.0,0.0,1.0
698,2.0,4.0,0.0,1.0,1.0,0.0,1.0,0.0
789,2.0,5.0,0.0,1.0,1.0,0.0,0.0,1.0


In [11]:
# Baseline network: linear output layer with one unit per exam score
model = keras.Sequential([
    layers.Dense(8, activation='relu', input_shape=(8,)),
    layers.Dense(6, activation='relu'),
    layers.Dense(3, activation='linear'),
])

In [12]:
# Regression task: minimize mean squared error and track mean absolute error
model.compile(loss="mse",
              optimizer="adam",
              metrics=['mae'])
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 8)                 72        
                                                                 
 dense_1 (Dense)             (None, 6)                 54        
                                                                 
 dense_2 (Dense)             (None, 3)                 21        
                                                                 
Total params: 147
Trainable params: 147
Non-trainable params: 0
_________________________________________________________________


In [14]:
history = model.fit(x_train.to_numpy(), y_train.to_numpy(), batch_size=100, epochs=20, validation_split=0.1)



### Questions
1. What preprocessing techniques did you use? Why?
    - *Answer*
2. Describe the fine-tuning process and how you reached your model architecture.
    - *Answer*

# Task 2: CNN (40%)
For this task, you will be doing image classification:
- First, adapt your best model from Task 1 to work on this task, and
fit it on the new data. Then, evaluate its performance.
- After that, build a CNN model for image classification.
- Compare both models in terms of accuracy, number of parameters and speed of
inference (the time the model takes to predict 50 samples).
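
Inference speed can be measured with a small timing helper. The sketch below is one possible approach, not a prescribed one: `time_inference` and `dummy_predict` are hypothetical names, and the dummy function only stands in for a real model's prediction call (for example `model.predict` on a Keras model).

```python
import time


def time_inference(predict, samples, repeats=5):
    """Return the best wall-clock time in seconds for predict(samples)."""
    predict(samples)  # warm-up call, excluded from the measurement
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict(samples)
        timings.append(time.perf_counter() - start)
    return min(timings)


# Stand-in for a real model: any callable that accepts a batch works here.
def dummy_predict(batch):
    return [sum(row) for row in batch]


batch_of_50 = [[0.0] * 8 for _ in range(50)]
elapsed = time_inference(dummy_predict, batch_of_50)
print(f"50 samples: {elapsed * 1000:.3f} ms")
```

Timing both models on the same 50 samples, on the same hardware, keeps the comparison fair. Taking the minimum over several repeats reduces noise from other processes.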

For the given data, you need to do proper data preprocessing and augmentation,
and set up data loaders.
Then fine-tune your model architecture (number of layers, number of filters,
activation function, learning rate, momentum, regularization).

### Data
You will be working with the data in `triple_mnist.zip` for predicting 3-digit
numbers written in the image. Each image contains 3 digits similar to the
following example (whose label is `039`):

![example](https://github.com/shaohua0116/MultiDigitMNIST/blob/master/asset/examples/039/0_039.png?raw=true)
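
A label such as `039` can be treated as one class out of 1000, or as three separate digit targets. The sketch below illustrates the second option under the assumption that labels arrive as 3-character strings; `split_label` and `join_digits` are hypothetical helper names.

```python
DIGITS = "0123456789"


def split_label(label):
    """Convert a 3-digit label such as '039' into a tuple of ints."""
    if len(label) != 3 or any(ch not in DIGITS for ch in label):
        raise ValueError(f"expected a 3-digit label, got {label!r}")
    return tuple(int(ch) for ch in label)


def join_digits(digits):
    """Inverse of split_label: turn three predicted digits into a label."""
    return "".join(str(d) for d in digits)


print(split_label("039"))
print(join_digits((0, 3, 9)))
```

Keeping labels as strings until this step preserves leading zeros, which would be lost if `039` were parsed as an integer.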

In [None]:
# TODO: Implement task 2

### Questions
1. What preprocessing techniques did you use? Why?
    - *Answer*
2. What data augmentation techniques did you use?
    - *Answer*
3. Describe the fine-tuning process and how you reached your final CNN model.
    - *Answer*

# Task 3: Decision Trees and Ensemble Learning (15%)

For the `loan_data.csv` data, predict if the bank should give a loan or not.
You need to do the following:
- Fine-tune a decision tree on the data
- Fine-tune a random forest on the data
- Compare their performance
- Visualize your DT and one of the trees from the RF
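
The steps above could be sketched as follows with scikit-learn. This is only a sketch: it runs on synthetic data from `make_classification` as a stand-in for `loan_data.csv`, and the parameter grids are placeholders chosen to keep it fast.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for the loan data; swap in the real features and target.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Tune each model with cross-validation on the training split only.
dt_search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [2, 4, 8, None], "min_samples_leaf": [1, 5, 20]},
    cv=5,
)
rf_search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [4, 8, None]},
    cv=5,
)
dt_search.fit(X_train, y_train)
rf_search.fit(X_train, y_train)

# Compare the tuned models on the held-out test split.
dt_acc = accuracy_score(y_test, dt_search.predict(X_test))
rf_acc = accuracy_score(y_test, rf_search.predict(X_test))
print(f"DT: {dt_search.best_params_}, test accuracy {dt_acc:.3f}")
print(f"RF: {rf_search.best_params_}, test accuracy {rf_acc:.3f}")

# Text rendering of the tuned tree and of the first tree in the forest.
print(export_text(dt_search.best_estimator_, max_depth=2))
print(export_text(rf_search.best_estimator_.estimators_[0], max_depth=2))
```

`sklearn.tree.plot_tree` gives a graphical rendering of the same trees if matplotlib is available. If the classes in the real data turn out to be imbalanced, a metric such as F1 may be more informative than accuracy.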

For evaluating your models, use an $80/20$ train/test split.

### Data
- `credit.policy`: Whether the customer meets the credit underwriting criteria.
- `purpose`: The purpose of the loan.
- `int.rate`: The interest rate of the loan.
- `installment`: The monthly installments owed by the borrower if the loan is funded.
- `log.annual.inc`: The natural logarithm of the self-reported annual income of the borrower.
- `dti`: The debt-to-income ratio of the borrower.
- `fico`: The FICO credit score of the borrower.
- `days.with.cr.line`: The number of days the borrower has had a credit line.
- `revol.bal`: The borrower's revolving balance.
- `revol.util`: The borrower's revolving line utilization rate.

In [None]:
# TODO: Implement task 3

### Questions
1. How did the DT compare to the RF in performance? Why?
    - *Answer*
2. After fine-tuning, how does the max depth in DT compare to RF? Why?
    - *Answer*
3. What is ensemble learning? What are its pros and cons?
    - *Answer*
4. Briefly explain 2 types of boosting methods and 2 types of bagging methods.
Which of these categories does RF fall under?
    - *Answer*

# Task 4: Domain Gap (15%)

Evaluate your CNN model from Task 2 on SVHN data without retraining your model.
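
SVHN images are color photographs, so they usually need to be converted to the input format the CNN was trained on before evaluation. The sketch below assumes that the Task 2 model expects single-channel inputs scaled to `[0, 1]`; `to_grayscale_unit_range` is a hypothetical helper, the random batch only stands in for real SVHN images, and any resizing to the model's input size is left out.

```python
import numpy as np


def to_grayscale_unit_range(images):
    """Convert uint8 RGB images (N, H, W, 3) to float32 grayscale (N, H, W, 1)."""
    images = np.asarray(images, dtype=np.float32)
    # Standard luminance weights for the R, G and B channels.
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = images @ weights  # shape (N, H, W)
    return (gray / 255.0)[..., np.newaxis]


# Random stand-in for a batch of 32x32 RGB images.
rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)
gray_batch = to_grayscale_unit_range(batch)
print(gray_batch.shape, gray_batch.dtype)
```

Matching the input format does not remove the domain gap itself. It only ensures the model receives tensors of the shape and value range it expects.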

In [None]:
# TODO: Implement task 4

### Questions
1. How did your model perform? Why is it better/worse?
    - *Answer*
2. What is domain gap in the context of ML?
    - *Answer*
3. Suggest two ways through which the problem of domain gap can be tackled.
    - *Answer*