# Assignment 2
*Due: November 19th, 23:59 Copenhagen (CET) time.* Some general remarks for handing in exercises:
- Each exercise comes with context and code from the exercise-set of which it is a part. It is up to you to recycle the right code. If this notebook can be executed from top to bottom on another computer (given the right libraries are installed and data stored) it makes it easier to give points for exercises that were only partially finished for whatever reason
- Make sure to answer each sub-exercise
- Commenting amply on your results makes it easier to understand that you were on the right track, even if the answer was wrong

### Week 5

> **Ex. 5.1.4**: Did the network finish training? Consider the generated text across epochs.
> 1. In the early batches (0-10), the generated text looks very bad. Can you explain why the low diversity generated text contains almost only the symbol " " (that is, spaces)?
> 2. The high diversity generated text is strange too, but in a different way. Explain how and why (include an explanation of what the diversity function does).



*Answer 5.1.4.1* <br>
The low-diversity text (diversity 0.2 in the `on_epoch_end` function) contains almost only the symbol " " because a low temperature makes the character predictions much less variable. To generate text, we pass the model's predicted probabilities to the `sample` function. There, the temperature (diversity) controls how the next character is chosen: at a value close to 1 we sample from the predicted probability vector essentially as the model returned it, while at a value close to 0 the distribution is sharpened so much that we almost always pick the character with the highest probability. In the early batches the model has learned little beyond the fact that the space is the most common character in the screenplay, since always predicting it is a quick way to reduce the loss. Low-diversity sampling therefore selects the space over and over.

*Answer 5.1.4.2* <br>
The high-diversity text is strange in a different way: it has too much variety, so it consists of letter sequences that do not form real words. The diversity function divides the log of each predicted probability by the temperature, exponentiates, and renormalizes. With a temperature close to 1 the probabilities are left essentially unchanged, so each character is sampled directly from the probability vector returned by the model. Early in training that vector is still a poor estimate of which characters are likely to follow the preceding ones, so unlikely characters are drawn often and the resulting sequences are not realistic (although their overall distribution might be more representative of the probability vector output by the network).
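
To make the effect of the diversity value concrete, here is a minimal sketch that applies the same rescaling as the `sample` function to a made-up probability vector. The helper name `apply_temperature` and the four probabilities are assumptions chosen for illustration, not output from the trained model.

```python
import numpy as np

def apply_temperature(probs, temperature):
    """Rescale a probability vector the way `sample` does before drawing from it."""
    logits = np.log(np.asarray(probs, dtype="float64")) / temperature
    scaled = np.exp(logits)
    return scaled / scaled.sum()

# Hypothetical early-training prediction in which the space is only slightly favoured.
probs = np.array([0.40, 0.25, 0.20, 0.15])  # e.g. " ", "e", "t", "a"

for t in [0.2, 0.5, 1.0]:
    print(t, np.round(apply_temperature(probs, t), 3))
```

At a diversity of 1.0 the vector is unchanged, while at 0.2 nearly all of the probability mass ends up on the first entry, which is why the low-diversity text collapses to one character.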


> **Ex. 5.1.6**: Do the same as above, but for 40 random letters (e.g. smash away on your keyboard) as seed. What happens? Can you explain why?

*Answer 5.1.6* <br>
Even when the seed is 40 random letters, the network quickly goes back to generating text that is reminiscent of the Pulp Fiction screenplay. While we might expect the generated text to resemble the random input, we see the same kind of text that the network was trained on, since the network has been fit (and has likely overfit) to that type of text and has never seen anything like the random seed during training. In addition, each generated character pushes one seed character out of the 40-character input window, so after 40 characters the model only sees its own output.
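
One mechanical reason the random seed has so little lasting influence can be sketched without the model at all. The snippet below mimics only the window update from the generation loop in this notebook (`sentence = sentence[1:] + next_char`); the constant character `"a"` is a stand-in assumption for whatever the network would predict.

```python
seqlen = 40
seed = "dnasklgnqrgnqerpignerpggerourgnqeroginoe"  # the random seed used in this notebook
window = seed

# Stand-in for the model: always emit "a". Only the window bookkeeping matters here.
for step in range(seqlen):
    next_char = "a"
    window = window[1:] + next_char

# After seqlen steps no seed character is left in the model's input.
print(window == "a" * seqlen)
```

From that point on, the input consists entirely of generated characters, so the output is shaped by what the network learned from the screenplay rather than by the seed.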

In [3]:
%matplotlib inline

import numpy as np
import random
import requests as rq
import sys
import io
from bs4 import BeautifulSoup
from keras.callbacks import LambdaCallback
from keras.models import Sequential
from keras.layers import Dense, LSTM
from keras.optimizers import RMSprop
from collections import Counter
from datetime import datetime
import keras
import keras.callbacks
from keras.callbacks import TensorBoard
%load_ext tensorboard

The tensorboard extension is already loaded. To reload it, use:
  %reload_ext tensorboard


In [None]:
response = rq.get("http://www.dailyscript.com/scripts/pulp_fiction.html")
text = BeautifulSoup(response.content, "html.parser").getText()

new_seed = "dnasklgnqrgnqerpignerpggerourgnqeroginoe"

chars = sorted(list(set(text)))
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

seqlen = 40
step = seqlen
sentences = []
for i in range(0, len(text) - seqlen - 1, step):
    sentences.append(text[i: i + seqlen + 1])

x = np.zeros((len(sentences), seqlen, len(chars)), dtype=bool)  # np.bool was removed in recent NumPy versions
y = np.zeros((len(sentences), seqlen, len(chars)), dtype=bool)

for i, sentence in enumerate(sentences):
    for t, (char_in, char_out) in enumerate(zip(sentence[:-1], sentence[1:])):
        x[i, t, char_indices[char_in]] = 1
        y[i, t, char_indices[char_out]] = 1

model = Sequential()
model.add(LSTM(128, input_shape=(seqlen, len(chars)), return_sequences=True))
model.add(Dense(len(chars), activation='softmax'))

model.compile(
    loss='categorical_crossentropy',
    optimizer=RMSprop(learning_rate=0.01),
    metrics=['categorical_crossentropy', 'accuracy']
)

def sample(preds, temperature=1.0):
    """Helper function to sample an index from a probability array."""
    preds = np.asarray(preds).astype('float64')
    preds = np.exp(np.log(preds) / temperature)  # rescale log-probabilities by the temperature
    preds = preds / np.sum(preds)                # renormalize so the values sum to 1
    probas = np.random.multinomial(1, preds, 1)  # draw one one-hot sample
    return np.argmax(probas)                     # index of the sampled character

def on_epoch_end(epoch, _):
    """Function invoked at end of each epoch. Prints generated text."""
    print()
    print('----- Generating text after Epoch: %d' % epoch)

    start_index = random.randint(0, len(text) - seqlen - 1)
     
    for diversity in [0.2, 0.5, 1.0]:
        print('----- diversity:', diversity)

        generated = ''
        sentence = new_seed
        generated += sentence
        print('----- Generating with seed: "' + sentence + '"')
        sys.stdout.write(generated)

        for i in range(5000):
            x_pred = np.zeros((1, seqlen, len(chars)))
            for t, char in enumerate(sentence):
                x_pred[0, t, char_indices[char]] = 1.

            preds = model.predict(x_pred, verbose=0)
            next_index = sample(preds[0, -1], diversity)
            next_char = indices_char[next_index]

            sentence = sentence[1:] + next_char

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()

print_callback = LambdaCallback(on_epoch_end=on_epoch_end)

model.fit(x, y,
          batch_size=128,
          epochs=50,
          callbacks=[print_callback])



Epoch 1/50
----- Generating text after Epoch: 0
----- diversity: 0.2
----- Generating with seed: "dnasklgnqrgnqerpignerpggerourgnqeroginoe"
dnasklgnqrgnqerpignerpggerourgnqeroginoe                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    

dnasklgnqrgnqerpignerpggerourgnqeroginoe                                                                                                                                                                                                                                                                     T                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          

       l             d      fI RImarotgeto              F  Bluni  e              e        oy           yt HAT

     odH etin, oild sto          AEO
EM            AO          BG                             – klT             c   e      w N      M          a"IIgr  n   NtTULMCT         e4     TB VAO
             i        r f     Se b    .        GAN
e        
     e    lV'"ler  
    BhmeEw.

       t                 i     ihe,sehapetdrit 
     a– n                                      UWf LLpfi    lfetU

  f   BiiFEBywS
AIB    oI  N?TNtrp         g"les, 
   a                                              tow 
SONEGL        aI     TfalkDd tn  
                 I fa.

S      wue       bIMrYs                        s d            e        i                y   Y                N        h      1    WwheNt         T I g  s              T       cu?    HLy a  M       e BcinscinCotp.y, o        NM      Y ,  I0         fran. Re       owe f T  Fb Nudlb  
 B         E'              IieTYaho    B'   I  

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        

                   tinomanit. Pan'thes hit Vor qunche bi                         VINCE

                                .                                                                                                      MVINT
             Yu-en. I)
                               of dollirr  bis Whe tosthon "

        D co that mrar hang shimhd wo 
                                                                 Wan'  ANNNEO(                                                                                          in Hhat  na whond PG f.

          lowhly mid we 
                  o ha dein or was, -mangr upf binge ch. Wine thrind, 
                                                                               in sopldyhitdad, IoVE

                                                                       s wAim bfir.

             MIF
                        BUCENGANS yuy ok bicr ar cewAN

           M wh. 
                                        JULES
                    s Mathe 
     

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       Butch helld the mitht a dall the purds 
                                                                                                                                                                                                                                                                         

                       his wiendat a lfarband.

             wea meee yokseler – anddbick, atars "in.

                         Whall whald and tcr oned aing the my it the readm" and ofwar snodes hist? Ror yous, oullawe noverres, ind?

      Buter ard –oUTH. ROARIV MALY – TOMANS

                                      ovemelf your hemeleHe's swat to drondysidiy akes frasgs 
                              fulles acloindin'mo peraling, aike sboeat.

                                      I ce perd, houldeldips.

                VINCENT
                     walk shees.

                                 Wilkt or and Jullnth is hilf the a kic the loo?

                           JURLMIS
                   wobral it you-dien des reverdy's beythale ytut'rnou troed haid andy you min 
                     MRESRARLRULSIRU" 
                    Farcand a ryer.

   BULES
             Mardomendy.

          Mar you loor, Vinces.  erders thin,.

                    Vincingst.

               Hu kere, 


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        

                 I'm a back?S              Houll the shat fime his messelh yuppins nomes gre 
         TUL H WOCS PAFSANTRHOP – MABING BLAY tiL quile the  preras? .
                          ou ambad nck?

                                                          Jig.

                        he Whit miase to stat in ey.

                          

### Week 6

> **Ex. 6.1.1**: In your own words, explain what the following function arguments do in
the different model loading functions:
1. `include_top`: This specifies whether or not to include the fully connected layer(s) at the top of a pre-trained model. If set to `True`, the model will include the fully connected classifier that was trained on the original dataset; if set to `False`, it is excluded, allowing the user to add their own top layers for their specific use case.
1. `weights`: This specifies which weights to load into the model. The options are `'imagenet'` (weights pre-trained on ImageNet), `None` (random initialization), or a path to a saved weights file.
1. `input_shape`: This specifies the shape of the input data that will be used with the pre-trained model. It only needs to be given when `include_top` is `False`; otherwise the model expects its default input shape.
1. `pooling`: This specifies the type of pooling applied to the output of the last convolutional block when `include_top` is `False`. It can be set to `'avg'`, `'max'`, or `None`.
1. `classes`: This specifies the number of classes, i.e. the number of outputs of the final layer. It is only used when `include_top` is `True` and no pre-trained weights are loaded.
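
As a rough illustration of what the `pooling` option does to the output of a model loaded with `include_top=False`, the sketch below reproduces global average and global max pooling with plain NumPy. The random feature map and its shape are assumptions for illustration; no Keras model is loaded here.

```python
import numpy as np

# Hypothetical feature map with shape (batch, height, width, channels),
# as a convolutional base might return with include_top=False and pooling=None.
rng = np.random.default_rng(0)
feature_map = rng.random((2, 7, 7, 1024))

avg_pooled = feature_map.mean(axis=(1, 2))  # what pooling='avg' computes
max_pooled = feature_map.max(axis=(1, 2))   # what pooling='max' computes

print(feature_map.shape, avg_pooled.shape, max_pooled.shape)
```

Both pooled results have one value per channel, which is why a pooled output can be passed straight to a classifier without flattening.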

> **Ex. 6.1.2**: Following Jason's example under 'Pre-Trained Model as Classifier'
classify [this image](https://images.squarespace-cdn.com/content/v1/58f0ecc029687fbef7b86b03/1583064484458-IM0UKAZIONS6E2CFCDJC/ke17ZwdGBToddI8pDm48kD5ENJpXCfmjfXuRxqpPb-1Zw-zPPgdn4jUwVcJE1ZvWQUxwkmyExglNqGp0IvTJZUJFbgE-7XRK3dMEBRBhUpyN2spBBImrH38afc2UL8XBF0s2RHqmX-QW0wG37RpCsIsNysB0CO3b7e86dkNKVNs/Otter+Makes+an+Immediate+U-Turn+Back+to+the+Water.jpg?format=1500w).
Print not just the most likely label, but everything that `decode_predictions` returns.
>
> ***Note***: *The VGG16 model he uses is 500 MB to download, and will take quite long to load and apply.
> Rather use one of the smaller models instead ([here](https://keras.io/applications/#documentation-for-individual-models)'s an overview of model sizes), such as DenseNet121.*

In [None]:
import tensorflow as tf
# example of using a pre-trained model as a classifier
from tensorflow.keras.preprocessing.image  import load_img
from tensorflow.keras.preprocessing.image  import img_to_array
from tensorflow.keras.applications.densenet import preprocess_input
from tensorflow.keras.applications.densenet import decode_predictions
from tensorflow.keras.applications.densenet import DenseNet121
# load an image from file
image = load_img('Otter.jpg', target_size=(224, 224))
# convert the image pixels to a numpy array
image = img_to_array(image)
# reshape data for the model
image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
# prepare the image for the DenseNet model
image = preprocess_input(image)
# load the model
model = DenseNet121()
# predict the probability across all output classes
yhat = model.predict(image)
# convert the probabilities to class labels
label = decode_predictions(yhat)
# retrieve the results
label = [(class_name, prob) for (_, class_name, prob) in label[0]]
# print the classification
for class_name, prob in label:
    print('%s (%.2f%%)' % (class_name, prob*100))

> **Ex. 6.2.2:** Now, extract features for each datapoint, using a pre-trained neural network, thus building train and test input matrices `x_train_FE` and `x_test_FE`. Train a logistic regression classifier on the learned features, and report the accuracy on the test data.
You should be getting a significantly better performance than when using the raw data. Why is that; what work did the pretrained network do for you to be able to use a linear classifier and get such great performance on a clearly nonlinear problem?

In [None]:
# Answer 6.2.2 Code Part 1/2
import numpy as np
import matplotlib.pyplot as plt
from skimage.transform import resize
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X = np.load('X_cat_vs_dog.npz')['arr_0']
Y = np.load('Y_cat_vs_dog.npz')['arr_0']

# Split train/test
x_train = X[0:500]
y_train = Y[0:500]
x_test = X[500:]
y_test = Y[500:]

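# Resize images
# NOTE: skimage's resize returns floats and, for integer inputs, rescales them to [0, 1]
# unless preserve_range=True, whereas DenseNet's preprocess_input expects pixel values
# in [0, 255]. Check the dtype and range of X before relying on this step.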
x_train_resized = np.array([resize(img, (224, 224, 3)) for img in x_train])
x_test_resized = np.array([resize(img, (224, 224, 3)) for img in x_test])

# Preprocess the images for the Densenet model
x_train_preprocessed = preprocess_input(x_train_resized)
x_test_preprocessed = preprocess_input(x_test_resized)

# Extract features using the Densenet model
model = DenseNet121(weights='imagenet', include_top=False, pooling='avg')
x_train_FE = model.predict(x_train_preprocessed)
x_test_FE = model.predict(x_test_preprocessed)

# Flatten features
x_train_FE = x_train_FE.reshape(x_train_FE.shape[0], -1)
x_test_FE = x_test_FE.reshape(x_test_FE.shape[0], -1)

# Train logistic regression classifier
clf = LogisticRegression(max_iter=1000)
clf.fit(x_train_FE, y_train)

# Evaluate on test set
accuracy = clf.score(x_test_FE, y_test)
print('Accuracy:', accuracy)

*Answer 6.2.2 Writing Part 2/2* <br>
The reason why using a pre-trained neural network to extract features results in significantly better performance than using raw data is that the pre-trained network has learned to extract relevant features from images that are useful for classification tasks. By using the output of a hidden layer as features, we are essentially using a compressed representation of the original image that retains important information for classification. This compressed representation is more informative than the raw pixel values and reduces the dimensionality of the data, making it easier for a linear classifier such as logistic regression to separate the different classes. In essence, the pre-trained neural network has done the feature engineering for us, allowing us to focus on building a simple, interpretable classifier.
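
The idea that a good feature representation turns a nonlinear problem into a linear one can be shown on toy data. The sketch below is an assumption-laden stand-in: the data are two synthetic rings rather than cat and dog images, the "extracted feature" is a hand-made squared radius rather than DenseNet activations, and the linear classifier is a least-squares fit rather than logistic regression.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Two classes arranged as an inner disc and an outer ring (not linearly separable).
radius = np.concatenate([rng.uniform(0, 1, n), rng.uniform(2, 3, n)])
angle = rng.uniform(0, 2 * np.pi, 2 * n)
points = np.column_stack([radius * np.cos(angle), radius * np.sin(angle)])
labels = np.concatenate([-np.ones(n), np.ones(n)])

def linear_accuracy(features, labels):
    """Fit a least-squares linear classifier and return its training accuracy."""
    design = np.column_stack([features, np.ones(len(features))])
    weights, *_ = np.linalg.lstsq(design, labels, rcond=None)
    return np.mean(np.sign(design @ weights) == labels)

raw_accuracy = linear_accuracy(points, labels)
# Hand-made "extracted feature": squared distance from the origin.
feature_accuracy = linear_accuracy((points ** 2).sum(axis=1, keepdims=True), labels)

print(raw_accuracy, feature_accuracy)
```

On the raw coordinates the linear classifier does little better than guessing, while on the single transformed feature it separates the classes. The pre-trained network plays the role of that transformation for images.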

### Week 7

> **Ex. 7.1.1**: What is typically the input and output of an autoencoder? What loss function is typically used?

*Answer 7.1.1* <br>
The input of an autoencoder is raw data and the output of an autoencoder is a reconstruction of the input based on a compressed version of the input. The loss functions typically used are mean squared error or binary crossentropy.
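
For reference, the two reconstruction losses mentioned above can be written out in a few lines of NumPy. This is a minimal sketch with made-up pixel values; the function names and the clipping constant `eps` are assumptions for illustration and are not taken from Keras.

```python
import numpy as np

def mse(x, x_hat):
    """Mean squared error between an input and its reconstruction."""
    return np.mean((x - x_hat) ** 2)

def binary_crossentropy(x, x_hat, eps=1e-7):
    """Binary cross-entropy for values in [0, 1]; clipping avoids log(0)."""
    x_hat = np.clip(x_hat, eps, 1 - eps)
    return -np.mean(x * np.log(x_hat) + (1 - x) * np.log(1 - x_hat))

x = np.array([0.0, 0.5, 1.0, 1.0])     # hypothetical input pixels scaled to [0, 1]
good = np.array([0.1, 0.5, 0.9, 0.8])  # reconstruction close to the input
bad = np.array([0.9, 0.1, 0.2, 0.3])   # reconstruction far from the input

print(mse(x, good), mse(x, bad))
print(binary_crossentropy(x, good), binary_crossentropy(x, bad))
```

Both losses compare the output with the input itself, which is why the input also serves as the training target.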

> **Ex. 7.1.4**: Run the experiment using different values of `latent_dim` (e.g. `[2,16,64,128,512]`) and store the validation loss of the last iteration in the `history` variable for each. Then plot it, my plot looks [like this](https://dhsvendsen.github.io/images/latentdim_vs_reconerror.png).

In [None]:
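import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.keras import layers, losses
from tensorflow.keras.datasets import fashion_mnist
from tensorflow.keras.models import Model

class Autoencoder(Model):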
    def __init__(self, latent_dim):
        super(Autoencoder, self).__init__()
        self.latent_dim = latent_dim   
        self.encoder = tf.keras.Sequential([
            layers.Flatten(),
            layers.Dense(latent_dim, activation='relu'),
            ])
        self.decoder = tf.keras.Sequential([
            layers.Dense(784, activation='sigmoid'),
            layers.Reshape((28, 28))
            ])

    def call(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return decoded

(x_train, _), (x_test, y_test) = fashion_mnist.load_data()
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.

n = 10
plt.figure(figsize=(20, 4))
for i in range(n):
    # display original
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_train[i])
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)



latent_dim_list = [2, 16, 64, 128, 512]
history_list = []
for dim in latent_dim_list:
    autoencoder = Autoencoder(latent_dim=dim)
    autoencoder.compile(optimizer='adam', loss=losses.MeanSquaredError())
    history = autoencoder.fit(x_train, x_train,
                              epochs=10,
                              shuffle=True,
                              validation_data=(x_test, x_test))
    # keep only the validation loss of the final epoch
    history_list.append(history.history['val_loss'][-1])
    print(history_list)

In [None]:
plt.plot(latent_dim_list, history_list, 'bo-')
plt.xlabel('Latent dimension')
plt.ylabel('Validation loss')
plt.title('Autoencoder validation loss by latent dimension')
plt.show()

> **Ex. 7.1.5**: Set the `latent_dim = 2` and describe what happens to the test data - the reconstructed sandal looks off, what do you think happens to it? Then plot the representation of the test data in the latent space, colouring each point according to its class and describe what you see. [Example plot](https://dhsvendsen.github.io/images/two_latent_dims_simple.png).

*Answer 7.1.5 Writing Part 1/2.* <br>
The reconstructed sandal looks very similar to the shoe. When the image is compressed down to a latent space of only 2 dimensions, the autoencoder encodes the sandal and the shoe to nearby points, treating them as similar classes, and therefore reconstructs the two similarly. From the plot, we can guess that the colors that form their own clearly defined groups, such as orange, red, or blue, represent classes that are reconstructed more accurately. In contrast, the colors that are scattered and overlap with others represent classes, like the sandal, that cannot be reconstructed accurately at this dimension.

In [None]:
#Answer 7.1.5 Code Part 2/2
autoencoder = Autoencoder(latent_dim = 2)
autoencoder.compile(optimizer='adam', loss=losses.MeanSquaredError())
history = autoencoder.fit(x_train, x_train,
                epochs=10,
                shuffle=True,
                validation_data=(x_test, x_test))

# Visualization cell
encoded_imgs = autoencoder.encoder(x_test).numpy()
decoded_imgs = autoencoder.decoder(encoded_imgs).numpy()

n = 10
plt.figure(figsize=(20, 4))
for i in range(n):
    # display original
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i])
    plt.title("original")
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # display reconstruction
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i])
    plt.title("reconstructed")
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

In [None]:
plt.scatter(encoded_imgs[:,0],encoded_imgs[:,1],color=["C"+str(i) for i in y_test])

> **Ex. 7.2.1**: Explain in your own words how a GAN works. Touch upon:
    > * What do the generator and discriminator networks do?
    > * What are their respective input and output?
    > * Your *iteration_images* should already have some images stored in it. Observe them and explain what delicate dance the two networks engage in during training. What would the accuracy of the discriminator be, faced with a perfect generator?

*Answer 7.2.1* <br>
A Generative Adversarial Network (GAN) is a machine learning model that consists of two neural networks, a generator and a discriminator, which are trained together, each against the other, so that the generated samples become more and more realistic. The goal of the generator is to generate a fake sample that fools the discriminator into thinking it is real. On the flip side, the discriminator has the goal of accurately determining whether the sample presented is real or fake. The generator takes in random noise as its input and outputs a fake sample. The discriminator takes in a sample (either real or fake) and outputs a prediction of whether it is real or fake. The two networks engage in a delicate dance during training: the generator tries to maximize the errors that the discriminator makes, while the discriminator tries to maximize its accuracy. They go back and forth, so that the generator improves at producing samples that are harder to tell apart from real ones, and the discriminator becomes better at distinguishing between real and fake. Faced with a perfect generator, the discriminator would not be able to distinguish between real and fake samples at all, so its accuracy should hypothetically be 50%, i.e. no better than guessing.
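
The 50% figure can be checked with a small sketch based on the standard result that, for a fixed generator, the best possible discriminator outputs p_real(x) / (p_real(x) + p_fake(x)). The one-dimensional Gaussians and all names below are assumptions for illustration; they are not the networks trained in the exercise.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

def optimal_discriminator(x, real_mean, fake_mean, std=1.0):
    """D*(x) = p_real(x) / (p_real(x) + p_fake(x)) for two 1-D Gaussians."""
    p_real = gaussian_pdf(x, real_mean, std)
    p_fake = gaussian_pdf(x, fake_mean, std)
    return p_real / (p_real + p_fake)

x = np.linspace(-3, 3, 7)
imperfect = optimal_discriminator(x, real_mean=0.0, fake_mean=2.0)  # generator is off target
perfect = optimal_discriminator(x, real_mean=0.0, fake_mean=0.0)    # generator matches the data

print(np.round(imperfect, 3))
print(np.round(perfect, 3))
```

When the fake distribution matches the real one, the best discriminator outputs 0.5 for every sample, which corresponds to chance-level accuracy.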

### Week 8

> **Ex. 8.1.1**:  Compared to the autoencoder of last week which mapped the data into a 2-D latent space, but was given enough model complexity to achieve a low reconstruction loss (the one generated in **Ex. 7.1.6**), how are the low dimensional representations of the images distributed in latent space?

*Answer 8.1.1* <br>
Compared to the variational autoencoder, the autoencoder with low reconstruction loss from last week places the class clusters far apart in latent space. The classes are therefore well separated, but decoding a point from the gaps between the clusters would give a garbage result. The variational autoencoder, on the other hand, also clusters the data, but the clusters lie very close together, with overlaps rather than gaps between them. This means it can confuse some classes, but decoding any point in that region gives a valid result, which is what makes it usable for generation.

> **Ex. 8.1.2**: A VAE has a data reconstruction term (like a regular autoencoder) and a regularizer term. How does the regularization term of the VAE lead to:
> - A smoother transition between images in data-space when moving around in latent space
> - An effective generative model

*Answer 8.1.2* <br>
The regularization term (the KL divergence) pushes the encoder outputs closer together, toward a Gaussian distribution, compared to a traditional autoencoder. This makes the transitions between classes smoother, since the classes lie close together and even overlap at times. Because the points in the latent space are now stochastic (each input is encoded as a probability distribution rather than a single point) and approximately follow this normal distribution, we can generate believable NEW data by drawing a sample from the normal distribution and passing it through the decoder.
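
The regularizer and the sampling step can be sketched in NumPy, assuming the common setup in which the encoder outputs a mean and a log-variance per latent dimension and the prior is a standard normal. The function names and the example encoder outputs are hypothetical and not taken from the exercise code.

```python
import numpy as np

def kl_to_standard_normal(mean, log_var):
    """KL( N(mean, exp(log_var)) || N(0, 1) ), summed over latent dimensions."""
    return -0.5 * np.sum(1 + log_var - mean ** 2 - np.exp(log_var), axis=-1)

def sample_latent(mean, log_var, rng):
    """Reparameterization: z = mean + std * eps with eps ~ N(0, 1)."""
    eps = rng.standard_normal(mean.shape)
    return mean + np.exp(0.5 * log_var) * eps

rng = np.random.default_rng(0)
mean = np.array([[0.0, 0.0], [3.0, -2.0]])      # hypothetical encoder means
log_var = np.array([[0.0, 0.0], [-4.0, -4.0]])  # hypothetical encoder log-variances

kl = kl_to_standard_normal(mean, log_var)
z = sample_latent(mean, log_var, rng)
print(kl)
print(z)
```

The first encoding already matches the prior and is not penalized, while the second, a narrow distribution far from the origin, receives a large penalty. Minimizing this term pulls all encodings toward the same region, which is what removes the gaps between classes.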