# Deadline

This assignment is due on **April 10, 2019**. Note that this is a group assignment, so only **one** member per team should submit on Canvas.

# Clone the Git Repository

Run the following to get the files needed for this assignment.

```
!git clone https://github.com/cis700/hw2-release.git
!mv hw2-release/* .
!rm -rf hw2-release/
```

**Collaboration Policy**

This homework assignment is meant to be done in **groups of 2**. You may work on this individually, but be warned that this homework assignment is extremely long and will be very difficult to do alone. We highly recommend you form groups.

You may collaborate with others at a high level; however, all LaTeX and code must be written independently of other groups. Groups found to be violating this policy will automatically receive a 0 for the assignment and will be referred to the Office of Student Conduct. By submitting an assignment, you agree that the work produced is your work and your work **only**.

**Late Policy**

With the exception of emergencies, any homework assignment submitted past the deadline will receive a 20% penalty for each day it is late.

**Online Policy**

You may look up guides online that give general advice or explanations on VAEs and GANs, and you may consult, for instance, the PyTorch documentation, but you **may not** copy code from anywhere online. To be transparent: GAN and VAE solutions (with their respective hyperparameters) exist online; however, copying any such code is **strictly prohibited**. We have spent many hours constructing this homework so that you do not have to use such resources, and we will be strict in enforcing this policy. Any violations will be reported directly to the OSC.


# Image Classification

## Question 1. Build a CNN Dog Classifier

**Understanding the Dataset**

For this assignment, we are going to use the [Stanford Dogs dataset](http://vision.stanford.edu/aditya86/ImageNetDogs/). We provide a data loader for you, which can be imported using the following code. Get familiar with the data loader and visualize some dog pictures.

**Q1a (2 pts):** Set the subset parameter for the data loader to 3 and visualize 5 pictures from the training set. Use subset = 3 for the rest of this question unless otherwise stated.

```python
from dogloader import dogs
```
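For Q1a, the plotting step could look like the minimal sketch below. It assumes that `dogs(...)` yields a map-style dataset whose items are `(image, label)` pairs, with `image` a `[3, H, W]` tensor in [0, 1]. Check `dogloader.py` for the real constructor arguments and item format; a `transforms.ToTensor()` step may be needed if it returns PIL images. `show_images` is a hypothetical helper, and a random `TensorDataset` stands in for the dog data.

```python
import matplotlib.pyplot as plt
import torch
from torch.utils.data import TensorDataset


def show_images(dataset, n=5):
    """Plot the first n (image, label) pairs of a dataset side by side.

    Assumes each image is a [3, H, W] float tensor with values in [0, 1].
    """
    fig, axes = plt.subplots(1, n, figsize=(3 * n, 3), squeeze=False)
    for idx, ax in enumerate(axes[0]):
        image, label = dataset[idx]
        # matplotlib expects channels last: [H, W, 3]
        ax.imshow(image.permute(1, 2, 0).clamp(0, 1).numpy())
        ax.set_title(f"label {int(label)}")
        ax.axis("off")
    return fig


# Stand-in data; replace with the training split you get from dogs(...)
fake_data = TensorDataset(torch.rand(8, 3, 64, 64), torch.randint(0, 3, (8,)))
fig = show_images(fake_data, n=5)
```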

**Logistic Regression Classification**

**Q1b (2 pts):** Use PyTorch to create a logistic regression model to classify the Dog Dataset. Plot the training curve and report the test accuracy.
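Below is a minimal sketch for Q1b under these assumptions:

* Images arrive as `[3, 32, 32]` tensors. Substitute the size your loader actually produces.
* Labels are integer class indices.
* Random tensors stand in for the dog data.

Multi-class logistic regression is implemented here as a single linear layer trained with `nn.CrossEntropyLoss`, which applies the softmax internally. `run_epoch` is a hypothetical helper. The per-epoch `(loss, accuracy)` pairs in `history` are what you would plot as the training curve.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


class LogisticRegression(nn.Module):
    """Multinomial logistic regression: one linear layer on flattened pixels.

    The softmax lives inside nn.CrossEntropyLoss, so forward() returns logits.
    """

    def __init__(self, in_features, num_classes):
        super().__init__()
        self.linear = nn.Linear(in_features, num_classes)

    def forward(self, x):
        return self.linear(x.flatten(start_dim=1))


def run_epoch(model, loader, optimizer=None):
    """One pass over loader; trains if an optimizer is given. Returns (loss, acc)."""
    criterion = nn.CrossEntropyLoss()
    model.train(optimizer is not None)
    total_loss, correct, count = 0.0, 0, 0
    with torch.set_grad_enabled(optimizer is not None):
        for images, labels in loader:
            logits = model(images)
            loss = criterion(logits, labels)
            if optimizer is not None:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total_loss += loss.item() * labels.size(0)
            correct += (logits.argmax(dim=1) == labels).sum().item()
            count += labels.size(0)
    return total_loss / count, correct / count


# Stand-in data: 3 classes of 3x32x32 images (replace with the dogs loaders)
torch.manual_seed(0)
data = TensorDataset(torch.rand(60, 3, 32, 32), torch.randint(0, 3, (60,)))
loader = DataLoader(data, batch_size=20, shuffle=True)
model = LogisticRegression(3 * 32 * 32, num_classes=3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
history = [run_epoch(model, loader, optimizer) for _ in range(5)]
test_loss, test_acc = run_epoch(model, loader)  # evaluation pass, no updates
```

In the assignment, the final evaluation pass would use a test loader instead of the training loader.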

**Feed Forward Neural Network Classification**

**Q1c (3 pts):** Create a feed-forward neural network to classify the Dog Dataset. Plot the training curve and report the test accuracy.
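One possible feed-forward classifier is sketched below. It stacks fully connected layers with ReLU activations on the flattened image. The hidden widths `(512, 256)`, the dropout rate, and the 64×64 input size are hypothetical starting points to tune, not required values. The model outputs logits, so it can be trained with the same cross-entropy loop you use for Q1b.

```python
import torch
import torch.nn as nn


def make_mlp(in_features, num_classes, hidden=(512, 256), dropout=0.5):
    """Feed-forward classifier: Flatten -> [Linear -> ReLU -> Dropout]* -> Linear."""
    layers = [nn.Flatten()]
    prev = in_features
    for width in hidden:
        layers += [nn.Linear(prev, width), nn.ReLU(), nn.Dropout(dropout)]
        prev = width
    layers.append(nn.Linear(prev, num_classes))  # logits for nn.CrossEntropyLoss
    return nn.Sequential(*layers)


model = make_mlp(3 * 64 * 64, num_classes=3)
logits = model(torch.rand(4, 3, 64, 64))
print(logits.shape)  # torch.Size([4, 3])
```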

**CNN Classification**

**Q1d (3 pts):** Create a CNN to classify the Dog Dataset. Plot the training curve and report the test accuracy.

**Q1e (5 pts):** Set subset = 120 and train a new model with the same architecture as in the previous question, changing only the last linear layer so that its number of outputs matches the new number of classes. Plot your training accuracy and report your final test accuracy. What do you notice about the training and testing accuracy? Make a note of your final testing accuracy.
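The CNN below is a hypothetical sketch: three conv/batch-norm/ReLU/max-pool blocks followed by global average pooling and one linear layer. The channel counts are arbitrary starting points. Because `nn.AdaptiveAvgPool2d(1)` makes the classifier independent of input resolution, moving from `subset = 3` to `subset = 120` only changes `num_classes`, which sizes the final linear layer.

```python
import torch
import torch.nn as nn


class DogCNN(nn.Module):
    """Small CNN: three conv blocks, global average pooling, linear classifier."""

    def __init__(self, num_classes):
        super().__init__()

        def block(c_in, c_out):
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                nn.BatchNorm2d(c_out),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),  # halves height and width
            )

        self.features = nn.Sequential(block(3, 32), block(32, 64), block(64, 128))
        self.pool = nn.AdaptiveAvgPool2d(1)  # works for any input resolution
        self.classifier = nn.Linear(128, num_classes)  # the only layer that changes

    def forward(self, x):
        x = self.pool(self.features(x)).flatten(start_dim=1)
        return self.classifier(x)


small = DogCNN(num_classes=3)    # subset = 3
full = DogCNN(num_classes=120)   # subset = 120: same body, new final layer
x = torch.rand(2, 3, 64, 64)
print(small(x).shape, full(x).shape)  # torch.Size([2, 3]) torch.Size([2, 120])
```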

## Question 2. Transfer Learning on ResNets for Image Classification 

In this question we will explore different ways to bootstrap your image classifier when training examples are scarce. The dog dataset has around 100 training images per class, and the variation between classes is limited (every class is a dog breed). With so few training examples, building a high-performing image classifier is challenging.

ImageNet is one of the most popular computer vision datasets. It contains more than 14 million images spanning over 20,000 categories. The Stanford Dogs dataset we are using is a subset of ImageNet, originally intended for fine-grained object classification tasks. PyTorch (through torchvision) provides ResNets pretrained on ImageNet, and with such a pretrained model we can build a dog classifier much more easily.

**Note:** In Question 2 set subset = 120 (the default).

**Q2 (15 pts):** Starting from the code cell below, load a pretrained ResNet model and replace its last fully connected layer with your own classification layer. Train the network, plot your training accuracy, and report your final test accuracy. Your test accuracy should be at least 70%. A few tips:

* After replacing the last fully connected layer, freeze the rest of the network so that only the new layer is trained; this speeds up training considerably.

* Pretrained ResNet models assume that the input is normalized as shown below and is at least 224×224 pixels.

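```python
from torchvision import transforms

input_size = 224
input_transforms = transforms.Compose([
    transforms.Resize((input_size, input_size)),
    transforms.ToTensor(),
    # ImageNet channel means and standard deviations expected by pretrained ResNets
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])
```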

## Question 3. Generative Models: Convolutional Autoencoders and Variational Autoencoders

In this question you will create a Convolutional AutoEncoder (CAE) and a Variational AutoEncoder (VAE) for the Stanford Dogs dataset. Before you embark on this part, please read the [slides](https://www.seas.upenn.edu/~cis700dl/slides/7M.pptx) thoroughly to ensure that you understand how an autoencoder works (i.e., the loss function and general structure). The slides provide some sample code that you can follow, but note that you may need to create a bigger network. Also, the sample code on the slides outputs only grayscale images, whereas your network should output color images.

To get good reconstruction results, the AE might take a long time to train (for instance, more than 2 hours); however, you should be able to tell whether your network is learning much earlier than that. Don't look only at your loss: also periodically examine your output images to make sure they are becoming sharper and sharper representations of your input images. When to stop training your AE is up to you, but the reconstructions should be reasonably close to the inputs.

**Note:** In this section, set subset = 120 (the default).

**Q3a (10 pts):** After creating your AE, plot its training curve and, in your report, give 5 examples of inputs alongside their reconstructions.
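A possible convolutional autoencoder for color images is sketched below. It assumes 3×64×64 inputs scaled to [0, 1] (for example, `Resize` plus `ToTensor` without the ImageNet `Normalize`), so that a sigmoid output can match them. The layer sizes are hypothetical. The encoder halves the spatial size four times down to a 256×4×4 code, and the decoder mirrors it with `ConvTranspose2d`. One MSE training step is shown on a random stand-in batch.

```python
import torch
import torch.nn as nn


class ConvAutoencoder(nn.Module):
    """Convolutional AE for 3x64x64 images in [0, 1]."""

    def __init__(self):
        super().__init__()
        # Encoder: 3x64x64 -> 32x32x32 -> 64x16x16 -> 128x8x8 -> 256x4x4
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.ReLU(),
        )
        # Decoder mirrors the encoder back to 3x64x64
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),
            nn.Sigmoid(),  # RGB outputs in [0, 1], matching ToTensor() inputs
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))


model = ConvAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()
images = torch.rand(8, 3, 64, 64)  # stand-in batch; use real dog images here
recon = model(images)
loss = criterion(recon, images)  # the reconstruction target is the input itself
optimizer.zero_grad()
loss.backward()
optimizer.step()
```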





**Q3b (5 pts):** An interesting property of AEs is that their latent "code" vectors can be meaningfully added and subtracted. Add and subtract different code vectors, pass the result through your decoder to create a new image, and explain why the image looks the way it does.
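The arithmetic itself is only a few lines. The sketch below encodes three images, forms `code(a) - code(b) + code(c)`, and decodes the result. `latent_arithmetic` is a hypothetical helper. With a trained CAE you would pass its encoder and decoder halves (for example, `model.encoder` and `model.decoder` if your model exposes them under those names). The untrained stand-in modules below only demonstrate the shapes.

```python
import torch
import torch.nn as nn


@torch.no_grad()
def latent_arithmetic(encoder, decoder, x_a, x_b, x_c):
    """Decode code(a) - code(b) + code(c) for three image batches."""
    code = encoder(x_a) - encoder(x_b) + encoder(x_c)
    return decoder(code)


# Untrained stand-ins for a CAE's encoder and decoder
enc = nn.Conv2d(3, 8, 4, stride=2, padding=1)
dec = nn.Sequential(nn.ConvTranspose2d(8, 3, 4, stride=2, padding=1), nn.Sigmoid())
a, b, c = (torch.rand(1, 3, 64, 64) for _ in range(3))
new_image = latent_arithmetic(enc, dec, a, b, c)
print(new_image.shape)  # torch.Size([1, 3, 64, 64])
```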

**Q3c (10 pts, optional extra credit):** Using an autoencoder can be a great way to increase classification accuracy without needing more data. For extra credit, use the AE that you just constructed to seed a classification network, then train it on all 120 classes of the Stanford Dogs dataset. Plot the training curve and report the test accuracy.

*(Hint: Employ a strategy similar to the one used for the ResNet: keep your AE's encoder, remove the decoder, and attach a fully connected layer with 120 outputs.)*
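One way to reuse a trained encoder as the start of a classifier is sketched below. `classifier_from_encoder` is hypothetical. `code_features` must equal the flattened size of your encoder's output (256×4×4 for the stand-in here). The `freeze` flag lets you choose between training only the new head and fine-tuning the whole network; which works better for you is an empirical question.

```python
import torch
import torch.nn as nn


def classifier_from_encoder(encoder, code_features, num_classes=120, freeze=False):
    """Reuse a trained AE encoder as the feature extractor of a classifier."""
    if freeze:
        for param in encoder.parameters():
            param.requires_grad = False
    return nn.Sequential(
        encoder,                                # trained weights seed the network
        nn.Flatten(),
        nn.Linear(code_features, num_classes),  # new head; the decoder is dropped
    )


# Stand-in encoder producing a 256x4x4 code from 3x64x64 images
encoder = nn.Sequential(
    nn.Conv2d(3, 64, 4, stride=4), nn.ReLU(),    # -> 64x16x16
    nn.Conv2d(64, 256, 4, stride=4), nn.ReLU(),  # -> 256x4x4
)
model = classifier_from_encoder(encoder, code_features=256 * 4 * 4)
print(model(torch.rand(2, 3, 64, 64)).shape)  # torch.Size([2, 120])
```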

**Q3d (20 pts):** Variational AutoEncoders, as covered in class, learn the probability distribution of the inputs and can be used to generate novel images. We can further pass convolutional features into a VAE to create a convolutional VAE. Convert your CAE from the previous question into a convolutional VAE with the following steps:

* In the encoder, generate two tensors of the same length representing the mean and standard deviation of the latent probability distribution.

* Sample from a normal distribution with the learned mean and standard deviation to generate the final encoding tensor; this is the reparameterization trick discussed in lecture.

* Decode from the encoding tensor just as in the convolutional autoencoder.

* Change your loss function from MSE reconstruction loss to the objective function for the VAE:
  * Reconstruction loss: binary cross-entropy (`binary_cross_entropy`) between the original and reconstructed images.
  * Regularization on the sampled latent normal distribution, which pulls it toward a standard normal.

$$ L_{reconstruction} = -\frac{1}{n} \sum_{i=1}^{n}\left(x_i \log f(z_i) + (1-x_i) \log\left(1-f(z_i)\right)\right) $$
$$ L_{regularization} = \frac{1}{2n}\sum_{i=1}^{n}\left(\mu_{i}^{2} + \sigma_{i}^2 - \log \sigma_i^2 - 1\right) $$
$$ L_{loss} = L_{regularization} + L_{reconstruction} $$
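The sampling step and the loss could be written as in the sketch below. It rests on several assumptions:

* The head predicts the log-variance $\log\sigma^2$ rather than $\sigma$ itself. This is a common numerically convenient parameterization, with $\sigma = \exp(\tfrac{1}{2}\log\sigma^2)$.
* Each $\frac{1}{n}$ is read as a mean over the elements of the corresponding tensor.
* The decoder ends in a sigmoid, so its outputs lie in [0, 1] as `binary_cross_entropy` requires, and the inputs are likewise in [0, 1].

Summing instead of averaging is also common, and this choice changes the relative weight of the two terms. `VAEHead` and `vae_loss` are hypothetical names. In your model, `z` would still need to be projected or reshaped back to the decoder's input shape.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VAEHead(nn.Module):
    """Maps encoder features to (mu, logvar) and samples z by reparameterization."""

    def __init__(self, in_features, latent_dim):
        super().__init__()
        self.fc_mu = nn.Linear(in_features, latent_dim)
        self.fc_logvar = nn.Linear(in_features, latent_dim)  # log(sigma^2)

    def forward(self, h):
        h = h.flatten(start_dim=1)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # z = mu + sigma * eps with eps ~ N(0, I): the randomness sits in eps,
        # so gradients can flow back through mu and sigma.
        eps = torch.randn_like(mu)
        z = mu + torch.exp(0.5 * logvar) * eps
        return z, mu, logvar


def vae_loss(recon, x, mu, logvar):
    """Mean BCE reconstruction term plus the mean regularization term."""
    reconstruction = F.binary_cross_entropy(recon, x, reduction="mean")
    # 1/(2n) * sum(mu^2 + sigma^2 - log(sigma^2) - 1), with sigma^2 = exp(logvar)
    regularization = 0.5 * torch.mean(mu.pow(2) + logvar.exp() - logvar - 1)
    return reconstruction + regularization


head = VAEHead(in_features=256 * 4 * 4, latent_dim=128)
features = torch.rand(8, 256, 4, 4)  # stand-in encoder output
z, mu, logvar = head(features)
recon = torch.rand(8, 3, 64, 64)     # stand-in for decoder(z), values in [0, 1]
x = torch.rand(8, 3, 64, 64)         # stand-in input images in [0, 1]
loss = vae_loss(recon, x, mu, logvar)
```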






## Question 4. Generative Models: GANs

In this question you will create a Generative Adversarial Network (GAN) for the Stanford Dogs dataset. Before you embark on this part, please read the [slides](https://www.seas.upenn.edu/~cis700dl/slides/7W.pptx) thoroughly to ensure that you understand the multiple loss functions and the general structure of a GAN. The slides provide some sample code that you can follow, but note that you may need to create a bigger network. Also, the sample code on the slides outputs only grayscale images, whereas your network should output color images.

GANs are notoriously difficult to train, so we do not expect you to get perfect-looking dogs even after multiple hours of training. The provided dataset has a great deal of variation between dog breeds, which makes it difficult to get good results from our simplistic GAN approach. If implemented correctly, your network should produce results that look blurry but vaguely dog-like.

![Dog1](https://imgur.com/76dpyLt.png)
![Dog2](https://imgur.com/z5Sldte.png)
![Dog3](https://i.imgur.com/RferuyY.png)
![Dog4](https://i.imgur.com/t9Rrbga.png)




**Q4a (2 pts):** Create a normally distributed vector $z$ with $\mu = 0$ and $\sigma = 1$ (i.e., the `np.random.normal` default). The vector should have size [batch size, feature length, 1, 1], where feature length is the length of your vector $z$. You don't need to submit anything for this part.
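In PyTorch, `torch.randn` samples from the standard normal distribution directly and can create the tensor on the GPU. The sketch below wraps it in a hypothetical `sample_noise` helper. A feature length of 100 is an assumed value to tune.

```python
import torch


def sample_noise(batch_size, feature_length, device="cpu"):
    """z ~ N(0, 1) with shape [batch_size, feature_length, 1, 1]."""
    return torch.randn(batch_size, feature_length, 1, 1, device=device)


z = sample_noise(64, 100)
print(z.shape)  # torch.Size([64, 100, 1, 1])
```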


**Q4b (2 pts):** Create the generator network. Use [ConvTranspose2d](https://pytorch.org/docs/stable/nn.html) to upsample the noise vector $z$ into a feature map with a number of channels of your choosing (we used 512, but you can and should tune this). Keep deconvolving until the number of channels is 3 (RGB) and the spatial size is (64, 64); in other words, the output of the generator should have size [batch size, 3, 64, 64]. Use batch norm and ReLU after every deconvolution except the last, and use a sigmoid layer at the end to create your final output (a ReLU right before the sigmoid would confine outputs to [0.5, 1)). Don't worry about the loss function for now. In your writeup, describe what your final network looks like and why you made these choices.
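The generator below is a hypothetical sketch in the DCGAN style:

* The first `ConvTranspose2d` (kernel 4, stride 1, no padding) turns the 1×1 noise into a 512×4×4 map.
* Each subsequent layer (kernel 4, stride 2, padding 1) doubles the spatial size until 3×64×64.

The feature length of 100 and `base=64` are assumptions to tune.

```python
import torch
import torch.nn as nn


def make_generator(feature_length=100, base=64):
    """DCGAN-style generator: [B, feature_length, 1, 1] -> [B, 3, 64, 64]."""

    def up(c_in, c_out, stride, padding):
        return [nn.ConvTranspose2d(c_in, c_out, 4, stride, padding, bias=False),
                nn.BatchNorm2d(c_out), nn.ReLU(inplace=True)]

    return nn.Sequential(
        *up(feature_length, base * 8, 1, 0),   # -> 512 x 4 x 4
        *up(base * 8, base * 4, 2, 1),         # -> 256 x 8 x 8
        *up(base * 4, base * 2, 2, 1),         # -> 128 x 16 x 16
        *up(base * 2, base, 2, 1),             # -> 64 x 32 x 32
        nn.ConvTranspose2d(base, 3, 4, 2, 1),  # -> 3 x 64 x 64, no BN/ReLU here
        nn.Sigmoid(),                          # pixel values in [0, 1]
    )


G = make_generator()
fake = G(torch.randn(16, 100, 1, 1))
print(fake.shape)  # torch.Size([16, 3, 64, 64])
```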

**Q4c (2 pts):** Create the discriminator network. This should feel very similar to building a CNN binary classifier that decides whether an image comes from the real data distribution or not. Don't worry about the loss function for this part. In your writeup, describe what your final network looks like and why you made these choices.
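A matching discriminator could be sketched as below. Strided convolutions halve the spatial size from 64 down to 4, and a final 4×4 convolution reduces the map to one logit per image. A sigmoid then turns that logit into the probability that `nn.BCELoss` expects.

The following are common DCGAN heuristics, assumed here rather than required:

* LeakyReLU with slope 0.2.
* No batch norm on the first layer.

```python
import torch
import torch.nn as nn


def make_discriminator(base=64):
    """DCGAN-style discriminator: [B, 3, 64, 64] -> [B] probabilities of 'real'."""

    def down(c_in, c_out, batch_norm=True):
        layers = [nn.Conv2d(c_in, c_out, 4, 2, 1, bias=not batch_norm)]
        if batch_norm:
            layers.append(nn.BatchNorm2d(c_out))
        layers.append(nn.LeakyReLU(0.2, inplace=True))
        return layers

    return nn.Sequential(
        *down(3, base, batch_norm=False),  # -> 64 x 32 x 32
        *down(base, base * 2),             # -> 128 x 16 x 16
        *down(base * 2, base * 4),         # -> 256 x 8 x 8
        *down(base * 4, base * 8),         # -> 512 x 4 x 4
        nn.Conv2d(base * 8, 1, 4, 1, 0),   # -> 1 x 1 x 1 logit
        nn.Sigmoid(),                      # probability for nn.BCELoss
        nn.Flatten(start_dim=0),           # [B, 1, 1, 1] -> [B]
    )


D = make_discriminator()
p_real = D(torch.rand(16, 3, 64, 64))
print(p_real.shape)  # torch.Size([16])
```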

**Q4d (3 pts):** Code the loss function for the generator. It should be as follows:

$$ L_G = -\frac{1}{n}\sum_{i=1}^{n}\log D(G(z_i))$$

An equivalent form that is more convenient to implement is:
$$ L_G = \frac{1}{n}\sum_{i=1}^{n}L_{CE}(D(G(z_i)), 1)$$

where $L_{CE}$ is the binary cross-entropy loss and $n$ is the batch size. Submit a screenshot (or use mcode) to include your implementation in the writeup. Why is the ground-truth label for the fake data 1 here? Explain your answer in the writeup.
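With `nn.BCELoss`, the second form is one call, as in the sketch below. It assumes the discriminator ends in a sigmoid and returns one probability per image, shape `[n]`. `generator_loss` is a hypothetical name. In the training loop, `d_fake` would be `D(G(z))` without detaching, so that gradients reach the generator.

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()  # default reduction="mean" averages over the batch


def generator_loss(d_fake):
    """L_G = (1/n) * sum_i L_CE(D(G(z_i)), 1) for probabilities d_fake of shape [n]."""
    real_labels = torch.ones_like(d_fake)  # G wants D to label its samples as real
    return bce(d_fake, real_labels)


d_fake = torch.tensor([0.1, 0.5, 0.9])
loss_g = generator_loss(d_fake)  # equals -mean(log(d_fake))
```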



**Q4e (3 pts):** Code the loss function for the discriminator. It should be as follows:

$$ L_D = \frac{1}{2n}\sum_{i=1}^{n}\left[L_{CE}(D(X_i), 1) + L_{CE}(D(G(z_i)), 0)\right]$$

where $L_{CE}$ is the binary cross-entropy loss and $n$ is the batch size. Be sure that you're not normalizing by $n$ twice: `BCELoss` with its default `reduction='mean'` already divides by the number of elements in its input, so averaging the two batch-mean terms gives exactly $L_D$.

**Important:** Remember to **detach** the generator's output (e.g., `G(z).detach()`) when calculating this loss. Think about why this is the case and the repercussions of not doing so. Detail the answer to this question in your writeup. Submit a screenshot (or use mcode) of your implementation in the writeup.
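A possible implementation is sketched below. It averages the two batch-mean `nn.BCELoss` terms, which matches the $\frac{1}{2n}$ normalization. The example uses tiny linear stand-ins for $G$ and $D$ to show one consequence of detaching: after `backward()` on $L_D$, the generator has received no gradient at all. `discriminator_loss` is a hypothetical name.

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()  # mean over the batch, so don't divide by n again


def discriminator_loss(d_real, d_fake):
    """L_D = 1/2 * [mean L_CE(D(x_i), 1) + mean L_CE(D(G(z_i)), 0)]."""
    loss_real = bce(d_real, torch.ones_like(d_real))
    loss_fake = bce(d_fake, torch.zeros_like(d_fake))
    return 0.5 * (loss_real + loss_fake)


G = nn.Linear(4, 3)  # stand-in generator
D = nn.Sequential(nn.Linear(3, 1), nn.Sigmoid(), nn.Flatten(start_dim=0))
z = torch.randn(5, 4)
real = torch.rand(5, 3)
loss_d = discriminator_loss(D(real), D(G(z).detach()))
loss_d.backward()
print(G.weight.grad)  # None: detach() kept L_D's gradients out of G
```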

**Q4f (3 pts):** Create two optimizers, one for your Discriminator network and one for your generator network. Describe your choices for the learning rate and optimizer in the writeup.
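The sketch below gives each network its own optimizer, so each `step()` only updates the parameters it was given. Adam with a learning rate of 2e-4 and `betas=(0.5, 0.999)` is the setting popularized by the DCGAN paper. Treat it as an assumed starting point to tune, not a prescribed answer. `make_optimizers` is a hypothetical helper.

```python
import torch
import torch.nn as nn


def make_optimizers(G, D, lr_g=2e-4, lr_d=2e-4, betas=(0.5, 0.999)):
    """One Adam optimizer per network, so each step only updates its own model."""
    opt_g = torch.optim.Adam(G.parameters(), lr=lr_g, betas=betas)
    opt_d = torch.optim.Adam(D.parameters(), lr=lr_d, betas=betas)
    return opt_g, opt_d


G, D = nn.Linear(4, 3), nn.Linear(3, 1)  # stand-ins for your generator/discriminator
opt_g, opt_d = make_optimizers(G, D)
```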

**Q4g (20 pts):** Put all of these parts together to create a GAN. The training loop should look as follows:



1.   Create your vector $z$
2.   Zero out the gradient for $G$
3.   Generate your fake image, and calculate $L_G$
4.   Backpropagate $L_G$ and have your optimizer take a step
5.   Zero out the gradient for $D$
6.   Calculate $L_D$
7.   Backpropagate $L_D$ and step your optimizer.
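The seven steps map onto one function as in the sketch below. `train_step` is hypothetical, and it assumes `D` returns one sigmoid probability per image.

* Calling `loss_g.backward()` also deposits gradients in `D`. They are harmless here because `opt_d.zero_grad()` clears them before $L_D$ is backpropagated.
* The fake batch is reused for $L_D$ in detached form.
* The returned `(L_G, L_D)` pair is what you might log, for example with `torch.utils.tensorboard.SummaryWriter.add_scalar`.

Tiny stand-in networks replace the real $G$ and $D$ in the usage lines.

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()


def train_step(G, D, opt_g, opt_d, real, feature_length):
    """One GAN iteration following steps 1-7; returns (L_G, L_D) as floats."""
    n = real.size(0)
    ones = torch.ones(n, device=real.device)
    zeros = torch.zeros(n, device=real.device)

    # 1. Sample z ~ N(0, 1) with shape [n, feature_length, 1, 1]
    z = torch.randn(n, feature_length, 1, 1, device=real.device)

    # 2-4. Generator update: push D to label the fakes as real
    opt_g.zero_grad()
    fake = G(z)
    loss_g = bce(D(fake).view(-1), ones)
    loss_g.backward()
    opt_g.step()

    # 5-7. Discriminator update on real images and detached fakes
    opt_d.zero_grad()
    loss_d = 0.5 * (bce(D(real).view(-1), ones)
                    + bce(D(fake.detach()).view(-1), zeros))
    loss_d.backward()
    opt_d.step()
    return loss_g.item(), loss_d.item()


# Tiny stand-ins: G maps 8-dim noise to 3x8x8 images, D maps images to [B, 1]
nz = 8
G = nn.Sequential(nn.Flatten(), nn.Linear(nz, 3 * 8 * 8), nn.Sigmoid(),
                  nn.Unflatten(1, (3, 8, 8)))
D = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
real = torch.rand(16, 3, 8, 8)  # stand-in for a batch of real images
history = [train_step(G, D, opt_g, opt_d, real, nz) for _ in range(3)]
```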

Hints for training:

1.   Make sure you're using TensorBoard to keep track of both the discriminator AND generator losses throughout training. Note that if your discriminator loss goes to 0, your training has hit a failure mode: look at how the generator loss is calculated and you will see that G's gradients vanish.
2.   Watch out for mode collapse. Display your generated images every few iterations to check not only that they are getting better but also that you are not getting the same image each time. If you are, your network is suffering from mode collapse.
3.   Analyze your generator loss and make sure that it roughly oscillates around some value. Unlike in most other tasks, your generator and discriminator losses are NOT intended to decrease monotonically; instead, they should bounce around a certain level, as in the example graph below. Your graph doesn't have to look exactly like this one.
![Sample Loss Graph](https://cdn-images-1.medium.com/max/1600/1*4A5bo8gVG9wmg-5wtqavOg.png)

What makes GANs difficult to train is that they are extremely sensitive to hyperparameters. Some hyperparameters to think about when tuning your GAN:


1.   The number of times you run your discriminator vs. your generator (generally you run your discriminator more times than your generator because the discriminator is what gives your generator feedback).
2.   The learning rate for your discriminator / generator
3.   Your image output size
4.   Your feature vector size (i.e. the length of z)
5.   Your standard neural network hyperparameters (e.g., number of layers, width of layers, batch size)



Overall, we are going to be fairly generous in grading your outputs for this part (i.e., do they look somewhat dog-like?). However, you can look to many other sources to improve your GAN's performance or understand GANs better:



1.   [Keep Calm and train a GAN](https://medium.com/@utk.is.here/keep-calm-and-train-a-gan-pitfalls-and-tips-on-training-generative-adversarial-networks-edd529764aa9)
2.   [Why is it so hard to train GAN's!](https://medium.com/@jonathan_hui/gan-why-it-is-so-hard-to-train-generative-advisory-networks-819a86b3750b)

And many more!

We will be awarding generous extra credit to those with particularly good GAN results. 

Be sure to start this part early, as you will definitely run into training issues that you wouldn't originally foresee!

After you finish creating your GAN, give us 5 of your best outputs (not all of your results will look reasonable) and plot your training curves for both the generator AND the discriminator. Report any final hyperparameter choices that you haven't already reported above (e.g., the batch size, model parameters).

