
#1.

The eigenvectors that PCA chooses to reduce dataset dimensionality are orthonormal. What does this mean and why is this desirable?

- This means the eigenvectors are mutually perpendicular (orthogonal) and each has unit length (normalized). The first eigenvector of the covariance matrix points along the direction in which the data vary most, showing how the features are related along that line. The second eigenvector captures the remaining pattern: the points follow the main line but sit off to the side of it by some amount. Taking the eigenvectors of the covariance matrix therefore extracts lines that characterize the data. Orthonormality is desirable because each component captures variance that the others do not (the projected coordinates are uncorrelated), and projecting onto the top components maximizes the retained variance while minimizing the reconstruction error.
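The orthonormality claim can be checked numerically. The following is a minimal sketch for the 2-D case only, using made-up data points and hypothetical helper names (`covariance_2d`, `principal_axes`); it relies on the closed-form rotation angle for a symmetric 2x2 matrix rather than a general eigen-solver.

```python
import math

def covariance_2d(points):
    """Return the mean and the entries (a, b, c) of the covariance matrix [[a, b], [b, c]]."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    a = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    c = sum((y - my) ** 2 for _, y in points) / (n - 1)
    b = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (a, b, c)

def principal_axes(a, b, c):
    """Unit eigenvectors of the symmetric matrix [[a, b], [b, c]]."""
    theta = 0.5 * math.atan2(2 * b, a - c)
    v1 = (math.cos(theta), math.sin(theta))    # direction of largest variance
    v2 = (-math.sin(theta), math.cos(theta))   # perpendicular direction
    return v1, v2

# Made-up data, for illustration only
points = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0)]
(mx, my), (a, b, c) = covariance_2d(points)
v1, v2 = principal_axes(a, b, c)

dot = v1[0] * v2[0] + v1[1] * v2[1]   # orthogonality: should be 0
norm1 = math.hypot(*v1)                # unit length: should be 1
norm2 = math.hypot(*v2)

# Project the centered points onto each axis
z1 = [(x - mx) * v1[0] + (y - my) * v1[1] for x, y in points]
z2 = [(x - mx) * v2[0] + (y - my) * v2[1] for x, y in points]
n = len(points)
var1 = sum(p * p for p in z1) / (n - 1)
var2 = sum(q * q for q in z2) / (n - 1)
cov_z = sum(p * q for p, q in zip(z1, z2)) / (n - 1)  # should be ~0 (uncorrelated)

print("dot:", dot, "norms:", norm1, norm2)
print("variance along v1:", var1, "along v2:", var2, "covariance:", cov_z)
```

Because the projected coordinates are uncorrelated, dropping the second coordinate discards only the variance along `v2`, which is the smaller of the two.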

---

#2.

Consider the AdaBoost algorithm summarized in the textbook and discussed in class. Recall that this algorithm updates weights for each training example in response to how accurately the base (weak) classifier performed on this example in the previous iteration.

Suppose that we start with a uniform weight distribution for 20 data points (each point is weighted 0.05). Describe how the weights will be adjusted after one iteration of AdaBoost if the weak classifier is a weighted Majority Classifier. In this example, 18 examples belong to the positive class and 2 examples belong to the negative class.

- $18+, 2-$

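(Weighted Error Rate) The weighted majority classifier predicts positive for every example, so it misclassifies the 2 negative examples: $\epsilon = 2 \cdot 0.05 = 2/20 = 0.1$

(Adaptive Parameter) $\alpha = \frac{1}{2}\ln\frac{1-\epsilon}{\epsilon} = \frac{1}{2}\ln\frac{1-0.1}{0.1} = \frac{1}{2}\ln 9 = \ln 3$

(Unnormalized new weight for each positive example, correctly classified) $0.05 \cdot \exp(-\ln 3) = 0.05 \cdot (1/3) = 1/60$

(Unnormalized new weight for each negative example, misclassified) $0.05 \cdot \exp(\ln 3) = 0.05 \cdot 3 = 3/20$

(Normalizer) $Z = 18 \cdot (1/60) + 2 \cdot (3/20) = 0.3 + 0.3 = 0.6$

**Positive class weights $= (1/60) / 0.6 = 1/36$ , Negative class weights $= (3/20) / 0.6 = 1/4$**

Each correctly classified positive example drops from $0.05$ to about $0.028$, and each misclassified negative example rises from $0.05$ to $0.25$, so both classes carry a total weight of $0.5$ in the next iteration.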
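The arithmetic above can be reproduced with a few lines of Python. This is a sketch of one AdaBoost reweighting step under the assumptions stated in the question (uniform initial weights, the 2 negative examples misclassified); the function name `adaboost_reweight` is hypothetical.

```python
import math

def adaboost_reweight(weights, misclassified):
    """One AdaBoost weight update: returns (error, alpha, normalized new weights)."""
    # Weighted error: total weight of the misclassified examples
    eps = sum(w for w, m in zip(weights, misclassified) if m)
    alpha = 0.5 * math.log((1 - eps) / eps)
    # Increase weights of mistakes, decrease weights of correct predictions
    unnorm = [w * math.exp(alpha if m else -alpha)
              for w, m in zip(weights, misclassified)]
    z = sum(unnorm)  # normalizer so the new weights sum to 1
    return eps, alpha, [w / z for w in unnorm]

weights = [0.05] * 20
# The majority classifier predicts positive everywhere: 18 correct, 2 wrong
misclassified = [False] * 18 + [True] * 2
eps, alpha, new_weights = adaboost_reweight(weights, misclassified)

print("error:", eps, "alpha:", alpha)
print("positive weight:", new_weights[0], "negative weight:", new_weights[-1])
```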

---

#3.

In this assignment you will build a simple reinforcement learning agent which uses Q-learning to play the game "frozen lake". The frozen lake is a 4x4 grid represented as the following:

Representation | Character meaning
--- | ---
SFFF     |  (S: starting point, safe)
FHFH     |  (F: frozen surface, safe)
FFFH     |  (H: hole, fall to your doom)
HFFG     |  (G: goal, where the frisbee is located)

The description of the problem is:
"Winter is here. You and your friends were tossing around a frisbee at the park when you made a wild throw that left the frisbee out in the middle of the lake. The water is mostly frozen, but there are a few holes where the ice has melted. If you step into one of those holes, you'll fall into the freezing water. At this time, there's an international frisbee shortage, so it's absolutely imperative that you navigate across the lake and retrieve the disc. However, the ice is slippery, so you won't always move in the direction you intend."

Below, I have started a program for you to implement a q-learning strategy to solve this problem. Each episode consists of
- restarting the environment state
- selecting and performing actions (for a maximum of max_steps steps)
- updating the q table after each step
- keeping track of the total reward for the episode

Report the reward averaged over the episodes as well as the reward for the last episode.
The initial hyperparameters are defined for you, but you can vary these to see how they affect performance.

When you select an action, add some element of chance by using the exploration versus exploitation option (the value of $\epsilon$ is provided in the code below). At the end of each episode, update the exploration parameter to decrease it with experience: $\epsilon = \epsilon_{\min} + (\epsilon_{\max} - \epsilon_{\min}) \times e^{-\text{decay rate} \times \text{episode number}}$. The example we went over in class should provide insight on using OpenAI Gym, but there is also documentation online at https://www.google.com/search?client=firefox-b-1-d&q=openai+gym.

Report the best results you observed and the hyperparameters that achieved those results.
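To get a feel for how quickly exploration fades under the decay formula above, the schedule can be evaluated on its own. This sketch uses the initial hyperparameter values from the code cell below as defaults; the function name `epsilon_at` is hypothetical.

```python
import math

def epsilon_at(episode, min_epsilon=0.01, max_epsilon=1.0, decay_rate=0.01):
    """Exploration rate after the given episode number (exponential decay)."""
    return min_epsilon + (max_epsilon - min_epsilon) * math.exp(-decay_rate * episode)

for episode in (0, 100, 500, 1000):
    print(episode, round(epsilon_at(episode), 4))
```

With `decay_rate = 0.01`, epsilon is already close to `min_epsilon` after roughly a thousand episodes, so in a run of 100000 episodes the agent acts almost greedily for nearly the whole run. A smaller decay rate would keep it exploring longer.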

In [None]:
'''
A frozenlake-v0 is a 4x4 grid world which looks as follows:
SFFF       (S: starting point, safe)
FHFH       (F: frozen surface, safe)
FFFH       (H: hole, fall to your doom)
HFFG       (G: goal, where the frisbee is located)
Additionally, there is a little uncertainty in the agent movement.
Q Learning - A simple q Learning algorithm is employed for the task.
The q values are stored in a table and these are updated in each iteration
to converge to their optimum values.
'''

import gym
import numpy as np
import random
import math

def main():
  env = gym.make("FrozenLake-v0")

  num_episodes = 100000
  max_steps = 500
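  gamma = 0.99            # discount rate used in the Q update below
  learning_rate = 0.1
  discount_rate = gamma   # same quantity as gamma, kept under the name used in the write-up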
  epsilon = 1.0
  max_epsilon = 1.0
  min_epsilon = 0.01
  decay_rate = 0.01

  # initialize the Q table
  state_space_size = env.observation_space.n
  action_space_size = env.action_space.n
  Q = np.zeros((state_space_size, action_space_size))
  rewards = []

  for i in range(num_episodes):
    state = env.reset()
    total_reward = 0
    done = False
    for j in range(max_steps):
      # pick a random number to decide between explore and exploit
      if random.uniform(0, 1) < epsilon:
        # explore: pick a random action
        action = env.action_space.sample()
      else:
        # exploit: pick the action with the highest Q value for this state
        action = np.argmax(Q[state])

      # update state, reward, done, and info based on taking action
      next_state, reward, done, info = env.step(action)
      # update total_reward
      total_reward += reward
      # update Q values
      old_value = Q[state, action]
      next_max = np.max(Q[next_state])
      new_value = (1 - learning_rate) * old_value + learning_rate * (reward + gamma * next_max)
      Q[state, action] = new_value
      # update state
      state = next_state
      # render the last episode, including its final step, to see the agent's path
      if i == (num_episodes-1):
        env.render()
      # if done, break
      if done:
        break
    # decay the exploration rate at the end of each episode
    epsilon = min_epsilon + (max_epsilon - min_epsilon)*np.exp(-decay_rate*i)
    rewards.append(total_reward)
  print(np.around(Q,6))
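  print('score:', np.mean(rewards))                # reward averaged over all episodes
  print('last episode reward:', rewards[-1])       # reward for the last episode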

main()

- The best result I observed was an average reward of about **0.66**, with a *learning rate of 0.1*, a *discount rate of 0.99*, and an *initial epsilon of 1.0*. (A higher learning rate and a lower discount rate produced lower scores.) Across many other trials with the same hyperparameters, I usually got about **0.65**.
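The Q update used in the loop above can be traced by hand on a couple of transitions. The sketch below applies the same rule to a plain list-of-lists table; the helper name `q_update` and the two transitions (state, action, and reward values) are illustrative assumptions, not output from the environment.

```python
def q_update(Q, state, action, reward, next_state, learning_rate=0.1, gamma=0.99):
    """Apply one Q-learning update in place and return the new Q[state][action]."""
    target = reward + gamma * max(Q[next_state])
    Q[state][action] = (1 - learning_rate) * Q[state][action] + learning_rate * target
    return Q[state][action]

# 16 states x 4 actions, all zeros, as in the program above
Q = [[0.0] * 4 for _ in range(16)]

# Illustrative transition: from state 14, action 2 reaches the goal (state 15), reward 1
v_goal = q_update(Q, 14, 2, 1.0, 15)

# Illustrative transition: from state 13, action 2 reaches state 14, reward 0
v_prev = q_update(Q, 13, 2, 0.0, 14)

print(v_goal, v_prev)
```

The second update is nonzero only because the first one already raised a value in state 14. This is how reward at the goal propagates backward through the table over many episodes, and why a lower discount rate shrinks the values of states far from the goal.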

#4.

(50 points) In this assignment you are asked to use keras and tensorflow to design and compare three different convolutional neural networks, two to recognize MNIST digits and two to recognize MNIST fashion. You can design the network structures as you like but each one should differ in both structure (number and ordering of convolution, ReLU, and pooling layers) and parameter values. Provide a brief justification of each network and summarize the performance of the alternative network structures. Specifically comment on how the network structure needs to be changed when you move from the MNIST dataset to the MNIST fashion dataset.

To get you started, I provided code below for one sample structure that classifies MNIST digits. For fashion, replace mnist = keras.datasets.mnist with mnist = keras.datasets.fashion_mnist.

In [None]:
### When testing, change the "mnist" and "class_names" variables to match the dataset.
### The comment next to each line says which dataset that line is for.
### Uncomment the lines for the dataset you want to use and comment out the others.

# TensorFlow and tf.keras
import tensorflow as tf
from tensorflow import keras

# Helper libraries
import numpy as np
import matplotlib.pyplot as plt

# Load dataset

# Using MNIST handwritten digits (28x28 grayscale images, 60K training, 10K testing)
# If you use a different dataset, change the class names and possibly the normalization.
#mnist = keras.datasets.mnist            # for MNIST digits
mnist = keras.datasets.fashion_mnist    # for MNIST fashion dataset
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
#class_names = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']  # for MNIST digits
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot'] # for MNIST fashion dataset
# Normalize inputs
train_images = train_images / 255.0
test_images = test_images / 255.0

# Display first 25 training images
plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i], cmap=plt.cm.binary)
    plt.xlabel(class_names[train_labels[i]])
plt.show()

# Reshape images for compatibility with convolutional layer
train_images = np.reshape(train_images, (-1,28,28,1))  # -1 infers the number of images
test_images = np.reshape(test_images, (-1,28,28,1))

# Build, train, and evaluate model

def eval_model(model, epochs=10):
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    print('num', len(train_labels))
    hist = model.fit(train_images, train_labels, epochs=epochs)
    # the history key is 'accuracy' in recent tf.keras and 'acc' in older releases
    train_acc = hist.history.get('accuracy', hist.history.get('acc'))[-1]
    test_loss, test_acc = model.evaluate(test_images, test_labels)
    model.summary()
    print('Epochs: ' + str(epochs))
    print('Training accuracy: ' + str(train_acc))
    print('Testing accuracy: ' + str(test_acc))

# Model 0: flattened image input, two 32-node dense ReLU hidden layers, 10-node dense softmax output

def run_model0(epochs=10):
    print(">>>>> Model 0:") # Dense, Dense
    model = keras.Sequential([
        keras.layers.Flatten(input_shape=(28,28,1)),
        keras.layers.Dense(units=32, activation=tf.nn.relu),
        keras.layers.Dense(units=32, activation=tf.nn.relu),
        keras.layers.Dense(units=10, activation=tf.nn.softmax)
        ])
    eval_model(model, epochs)

def run_model1(epochs=10):
    print(">>>>> Model 1:") # Conv2D(64), MaxPool2D
    model = keras.Sequential([
        keras.layers.Conv2D(filters=64, kernel_size=(3,3),input_shape=(28,28,1), activation=tf.nn.relu),
        keras.layers.MaxPooling2D(pool_size=(2,2)),
        keras.layers.Flatten(),
        keras.layers.Dense(units=10, activation=tf.nn.softmax)
        ])
    eval_model(model, epochs)

def run_model2(epochs=10):
    print(">>>>> Model 2:") # Conv2D(32), MaxPool2D, Conv2D(64), MaxPool2D
    model = keras.Sequential([
        keras.layers.Conv2D(filters=32, kernel_size=(3,3),input_shape=(28,28,1), activation=tf.nn.relu),
        keras.layers.MaxPooling2D(pool_size=(2,2)),
        keras.layers.Conv2D(filters=64, kernel_size=(3,3), activation=tf.nn.relu),  # input shape is inferred from the previous layer
        keras.layers.MaxPooling2D(pool_size=(2,2)),
        keras.layers.Flatten(),
        keras.layers.Dense(units=10, activation=tf.nn.softmax)
        ])
    eval_model(model, epochs)

def run_model3(epochs=10):
    print(">>>>> Model 3:") # Conv2D(64), MaxPool2D, Dense(64)
    model = keras.Sequential([
        keras.layers.Conv2D(filters=64, kernel_size=(3,3),input_shape=(28,28,1), activation=tf.nn.relu),
        keras.layers.MaxPooling2D(pool_size=(2,2)),
        keras.layers.Flatten(),
        keras.layers.Dense(units=64, activation=tf.nn.relu),
        keras.layers.Dense(units=10, activation=tf.nn.softmax)
        ])
    eval_model(model, epochs)

def run_model4(epochs=10):
    print(">>>>> Model 4:") # Conv2D(32), MaxPool2D, Conv2D(32), MaxPool2D, Dense(32), Dense(32)
    model = keras.Sequential([
        keras.layers.Conv2D(filters=32, kernel_size=(3,3),input_shape=(28,28,1), activation=tf.nn.relu),
        keras.layers.MaxPooling2D(pool_size=(2,2)),
        keras.layers.Conv2D(filters=32, kernel_size=(3,3), activation=tf.nn.relu),  # input shape is inferred from the previous layer
        keras.layers.MaxPooling2D(pool_size=(2,2)),
        keras.layers.Flatten(),
        keras.layers.Dense(units=32, activation=tf.nn.relu),
        keras.layers.Dense(units=32, activation=tf.nn.relu),
        keras.layers.Dense(units=10, activation=tf.nn.softmax)
        ])
    eval_model(model, epochs)

#######################################################################################################
# Three functions: One for MNIST digits dataset, Two for MNIST fashion dataset.

def run_model_test1(epochs=10):  # For Digits dataset
    print(">>>>> Test Model 1:") # Dense(64), Dense(64), Dense(64)
    model = keras.Sequential([
        keras.layers.Flatten(input_shape=(28,28,1)),
        keras.layers.Dense(units=64, activation=tf.nn.relu),
        keras.layers.Dense(units=64, activation=tf.nn.relu),
        keras.layers.Dense(units=64, activation=tf.nn.relu),
        keras.layers.Dense(units=10, activation=tf.nn.softmax)
        ])
    eval_model(model, epochs)

def run_model_test2(epochs=10):  # For Fashion dataset
    print(">>>>> Test Model 2:") # Conv2D(256), MaxPool2D, Conv2D(512), MaxPool2D
    model = keras.Sequential([
        keras.layers.Conv2D(filters=256, kernel_size=(3,3),input_shape=(28,28,1), activation=tf.nn.relu),
        keras.layers.MaxPooling2D(pool_size=(2,2)),
        keras.layers.Conv2D(filters=512, kernel_size=(3,3), activation=tf.nn.relu),  # input shape is inferred from the previous layer
        keras.layers.MaxPooling2D(pool_size=(2,2)),
        keras.layers.Flatten(),
        keras.layers.Dense(units=10, activation=tf.nn.softmax)
        ])
    eval_model(model, epochs)

def run_model_test3(epochs=10):  # For Fashion dataset
    print(">>>>> Test Model 3:") # Conv2D(128), MaxPool2D, Dense(512)
    model = keras.Sequential([
        keras.layers.Conv2D(filters=128, kernel_size=(3,3),input_shape=(28,28,1), activation=tf.nn.relu),
        keras.layers.MaxPooling2D(pool_size=(2,2)),
        keras.layers.Flatten(),
        keras.layers.Dense(units=512, activation=tf.nn.relu),
        keras.layers.Dense(units=10, activation=tf.nn.softmax)
        ])
    eval_model(model, epochs)

#######################################################################################################

#run_model0(epochs=10)
run_model_test1(epochs=10)  # For Digits dataset
run_model_test2(epochs=10)  # For Fashion dataset
run_model_test3(epochs=10)  # For Fashion dataset

(Accuracies are test-set results, not training results.)

- For *run_model_test1()*, I added one more Dense layer to the example code provided for the MNIST Digits dataset and raised the units of every hidden layer from 32 to 64. Accuracy on the MNIST Digits dataset went up slightly as a result. On the *MNIST Fashion dataset*, this model reaches only about **88%** accuracy, whereas it reaches about **98%** on the *MNIST Digits dataset*.

When the dataset changes from MNIST Digits to MNIST Fashion, the structure needs fewer repeated Conv2D and MaxPooling2D stages to get higher accuracy. I found that repeating them lowers accuracy on Fashion, whereas accuracy on Digits stays high most of the time. Dramatically increasing the units of the Dense (ReLU) layer and increasing the number of Conv2D filters both raise accuracy. Without a Dense (ReLU) layer, I had to use a much larger number of Conv2D filters to raise accuracy.

- For *run_model_test2()*, I used 256 filters in the first Conv2D and 512 filters in the second Conv2D, with no Dense (ReLU) layer at the end. On the *MNIST Fashion dataset*, this gives almost **91%** accuracy, which I think is slightly lower than the model that uses Conv2D and MaxPooling2D only once. On the *MNIST Digits dataset*, it gives even higher accuracy (**99%**).

- For *run_model_test3()*, I used 128 filters in Conv2D and 512 units in the Dense (ReLU) layer. This gives about **91-92%** accuracy on the *MNIST Fashion dataset*, so the Dense (ReLU) layer also seems to raise accuracy. On the *MNIST Digits dataset*, just like test2, it gives even higher accuracy (about **99%**).
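One way to compare the two Fashion structures above is to count their trainable parameters without building them. The sketch below does this by hand, assuming the Keras defaults used in the code (stride-1 "valid" convolutions, and pooling with stride equal to the pool size); the helper names are hypothetical, and `model.summary()` remains the authoritative source for the real models.

```python
def conv2d(shape, filters, kernel=3):
    """Output shape and parameter count of a stride-1, 'valid' Conv2D layer."""
    h, w, c = shape
    out = (h - kernel + 1, w - kernel + 1, filters)
    params = kernel * kernel * c * filters + filters  # weights + biases
    return out, params

def max_pool(shape, pool=2):
    """Output shape of a MaxPooling2D layer (no trainable parameters)."""
    h, w, c = shape
    return (h // pool, w // pool, c), 0

def dense(shape, units):
    """Dense layer applied to the flattened input."""
    n_in = 1
    for d in shape:
        n_in *= d
    return (units,), n_in * units + units

def count_params(layers, shape=(28, 28, 1)):
    """Walk the layer list and return (final output shape, total parameter count)."""
    total = 0
    for layer, arg in layers:
        shape, params = layer(shape, arg)
        total += params
    return shape, total

# Layer lists mirroring run_model_test2 and run_model_test3
test_model_2 = [(conv2d, 256), (max_pool, 2), (conv2d, 512), (max_pool, 2), (dense, 10)]
test_model_3 = [(conv2d, 128), (max_pool, 2), (dense, 512), (dense, 10)]

print("test model 2:", count_params(test_model_2))
print("test model 3:", count_params(test_model_3))
```

Under these assumptions, Test Model 2 has about 1.3 million parameters, mostly in the second Conv2D, while Test Model 3 has about 11.1 million, almost all in the Dense(512) layer that follows the single pooling stage. The two models reach similar Fashion accuracy with very different parameter budgets.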