<a href="https://colab.research.google.com/github/damianomarsili/temp/blob/main/Final_Project_Submission_Template.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Image Inpainting with GANs
Project mentor: Guanghui Qin

Kevin Kim <kkim170@jh.edu>, Camden Shultz <cshultz3@jh.edu>, Jocelyn Hsu <jhsu37@jh.edu>, Damiano Marsili <dmarsil1@jh.edu>

LINK TO GHUB

# Outline and Deliverables - TODO

TODO: LINKS TO SECTIONS

### Uncompleted Deliverables
1. Different input images: We were unable to extend our use case beyond landscape images. Our model will run on other types of images, but its accuracy on non-landscape images is poor. Extending our use case would have required a significantly larger training set and therefore much longer training, and we did not have the time or computational resources to do so. However, we are confident that, given enough training data and time, our implementation can be extended to other image domains.


### Completed Deliverables
1. Masks on images: We discuss our masking and preprocessing [in "Pre-processing" below](#Pre-processing).
2. Trained a GAN on landscape images: We discuss training our GAN [in "Methods" below](TODO).
3. Successfully inpainted images: We discuss our inpainting technique [in "?" below](TODO).
4. Implemented Navier-Stokes baseline comparison: We discuss our baselines [in "Baselines" below](TODO).
5. Implemented FID evaluation metric: We discuss our evaluation metric [in "Experimental Setup" below](TODO).


### Additional Deliverables
1. We added a Fast Marching algorithm as a second baseline method. We discuss this [in "Baselines" below](TODO).
2. We experimented with different mask sizes and were able to maintain decent accuracy. We discuss this [in "?" below](TODO).
3. We added a feature allowing users to upload their own images and see how our model performs on them. We discuss this [in "?" below](TODO).

# Preliminaries

## Problem Definition

Our final project tackles the task of image inpainting: repairing missing or damaged portions of an image so that the generated image closely resembles the original. As an example, in the images below, an inpainting solution would attempt to fill in the white squares to produce an output that can plausibly pass for the original image. To achieve this task, we use a Generative Adversarial Network (GAN).


![Example of inpainting challenge](./sample_imgs/intro1.PNG)
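
As a rough illustration of the masked input shown above, the sketch below paints a white square onto an image stored as a NumPy array. The helper name `apply_square_mask`, the centered placement, and the 64-pixel mask size are assumptions made for illustration, not necessarily the preprocessing used in this project.

```python
import numpy as np

def apply_square_mask(image, size):
    """Return a copy of `image` with a centered white square, plus the boolean mask.

    `image` is an (H, W, 3) uint8 array; `size` is the side length of the square.
    """
    h, w = image.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    mask = np.zeros((h, w), dtype=bool)
    mask[top:top + size, left:left + size] = True
    masked = image.copy()
    masked[mask] = 255  # white out the region the model must fill in
    return masked, mask

# Example on a synthetic 256x256 gray image (hypothetical stand-in for a photo)
image = np.full((256, 256, 3), 128, dtype=np.uint8)
masked, mask = apply_square_mask(image, size=64)
print(mask.sum())  # number of masked pixels: 64 * 64 = 4096
```

The boolean mask is returned alongside the masked image because later steps need to know which pixels are missing.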

## Use cases
Image inpainting is a widely applicable image editing technique. One of the most useful applications is repairing damaged or old images. An example of this is given in the images below. The input images on the left have visible signs of damage, and we can use inpainting to 'repair' the images, as shown with the images on the right.


![Use cases](./sample_imgs/use_case.PNG)

Another application of image inpainting is the ability to remove unwanted items from an image. For instance, a picture of a scenic landscape taken on vacation may be interrupted by a tourist. Image inpainting can be used to remove the tourist from the image and restore the scenic landscape.

## Uniqueness
Image inpainting and the broader topic of GANs are unique and exciting because we are creating meaningful new data without labels. In contrast to the majority of the algorithms we covered in the course, which revolve around classification or regression, we are instead using machine learning to produce new data that can plausibly pass for a sample from the input dataset.

Our approach with GANs is related to the autoencoders we studied in class: in both cases, the model learns properties of the input in order to generate instances similar to it.

## Ethical Implications
The main ethical concern with this task is that it can be misused for deception. For instance, image inpainting can be used to remove evidence from pictures of a crime, or to produce realistic false images for the purpose of fake news. Some examples of falsified images are:
* Time's cover with OJ's skin darkened (June 27, 1994)
* National Geographic's cover of the pyramids
* Ford's advertisement in which a black person was changed into a white person
* A photo of Bill Clinton together with Ronald Reagan... published before the two ever met

## Dataset
For our project we used the MIT Places dataset, which is one of the largest image datasets. We chose it because it provides high coverage and high diversity of examples for landscapes, which we identified as our target image set. The dataset features 10 million 256x256 images covering 434 disjoint categories of scenes and landscapes. For primary testing, we mainly use low-variance images, as we hypothesize these will be easier to inpaint due to the low variance of RGB values across the pixels in each image. Our low-variance set includes the "Snow field", "Sky", "Mountains" and "Cornfields" categories. Each of the four categories contains around 15,000 images, for a total of about 60,000 images. We show a handful of examples below:

In [None]:
# TODO: Load your data and print 2-3 examples
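
The low-variance hypothesis above can be made concrete by measuring the spread of pixel values in an image. The sketch below is one hypothetical way to do so: `rgb_variance` is an illustrative helper, and the synthetic arrays stand in for real dataset images.

```python
import numpy as np

def rgb_variance(image):
    """Mean of the per-channel pixel variances of an (H, W, 3) image."""
    pixels = image.reshape(-1, 3).astype(np.float64)
    return pixels.var(axis=0).mean()

rng = np.random.default_rng(0)
# A perfectly uniform image, standing in for a very smooth scene
flat_sky = np.full((256, 256, 3), 200, dtype=np.uint8)
# Random noise, standing in for a highly textured scene
busy_scene = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)

print(rgb_variance(flat_sky))                             # 0.0
print(rgb_variance(busy_scene) > rgb_variance(flat_sky))  # True
```

A score like this could be used to rank categories by how uniform their images are, although a single number ignores spatial structure such as horizon lines.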

## Pre-processing

TODO: this section

What features did you use or choose not to use? Why?

If you have categorical labels, were your datasets class-balanced?

How did you deal with missing data? What about outliers?

What approach(es) did you use to pre-process your data? Why?

Are your features continuous or categorical? How do you treat these features differently?

In [None]:
# For those same examples above, what do they look like after being pre-processed?

In [None]:
# Visualize the distribution of your data before and after pre-processing.
#   You may borrow from how we visualized data in the Lab homeworks.

# Models and Evaluation

## Experimental Setup
As our evaluation metric, we decided to use Fréchet Inception Distance (FID). We had originally planned to use Inception Score (IS), but found that FID was more sensitive to small changes in the image that are extremely noticeable to humans. This was particularly true of blurring, which FID penalized much more strongly than IS did (for FID, lower scores indicate a better generated image). FID works by comparing the activations of a deep layer of a pretrained model named Inception v3. Because this layer is close to the output, its activations capture high-level image content and therefore approximate similarity as a human would interpret it. The formula for FID is given as:


$$\mathrm{FID} = \lVert \mu - \mu_w \rVert^2_2 + \mathrm{tr}\left(\Sigma + \Sigma_w - 2\left(\Sigma^{1/2} \Sigma_w \Sigma^{1/2}\right)^{1/2}\right)$$

where $\mu$ and $\mu_w$ are the means and $\Sigma$ and $\Sigma_w$ are the covariance matrices of the Inception v3 activations for the two sets of images being compared. We implement FID as follows:

In [1]:
import numpy as np
from scipy.linalg import sqrtm

def fid(inception_model, imgs1, imgs2):
  """Frechet Inception Distance between two batches of images."""
  # Extract feature activations from the Inception model
  act1 = inception_model.predict(imgs1)
  act2 = inception_model.predict(imgs2)

  # Mean and covariance of the activations (rows are samples)
  m1, cov1 = act1.mean(axis=0), np.cov(act1, rowvar=False)
  m2, cov2 = act2.mean(axis=0), np.cov(act2, rowvar=False)

  # Sum of squared differences between the means
  ss = np.sum((m1 - m2) ** 2)

  # Matrix square root of the covariance product
  cov_sqrt = sqrtm(cov1.dot(cov2))

  # sqrtm can return small imaginary parts from numerical error; discard them
  if np.iscomplexobj(cov_sqrt):
    cov_sqrt = cov_sqrt.real

  return ss + np.trace(cov1 + cov2 - 2.0 * cov_sqrt)
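
Note that the implementation takes the square root of the plain product $\Sigma \Sigma_w$ rather than of $\Sigma^{1/2} \Sigma_w \Sigma^{1/2}$ as written in the formula. The two give the same trace because both matrices have the same eigenvalues. As a sanity check on the formula, consider the one-dimensional case, where the covariances reduce to the variances $\sigma^2$ and $\sigma_w^2$:

$$\mathrm{FID} = (\mu - \mu_w)^2 + \sigma^2 + \sigma_w^2 - 2\sigma\sigma_w = (\mu - \mu_w)^2 + (\sigma - \sigma_w)^2$$

In this simplified setting the distance is zero exactly when both the means and the spreads of the two activation distributions match.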

For our loss function, we opted for (? TODO). We chose this loss function as (? TODO). We also attempted to use (? TODO: alternative loss) loss, although we found this function to perform worse upon experimentation. One reason for this may be (? TODO). The formula for our loss function is given as:

TODO: loss function formula

In [None]:
# TODO: Code for loss functions

For our data split, we decided on a (TODO: ?) split. We decided on this split as we felt we needed as many images as possible for training, since the model would need to learn a fairly complex representation in order to generate plausible images. Since testing relies on a pretrained GAN, we felt we did not need an excessively large testing set to evaluate our model's performance appropriately.

## Baselines 
We compared our model against two analytical baselines: Navier-Stokes and Fast Marching. Navier-Stokes is a differential-equation-based method borrowed from fluid dynamics: it continues edges and lines of equal intensity (isophotes) from the known pixels into the masked region and uses them to predict the missing pixel values. Fast Marching, on the other hand, starts at the border of the mask and works gradually towards the center, determining each missing pixel from a neighborhood of already-known pixels around it. These are reasonable baselines because they are standard pre-machine-learning methods for image inpainting, so they are well documented and their effectiveness has been previously established. Moreover, they are very easy to implement, so we could focus our attention on our model. We show some sample inpainted pictures using the baseline methods below. In each example, the left image is the original, the middle image shows the mask, and the right image shows the analytical inpainted solution.

Navier-Stokes:

![Navier Stokes example](./sample_imgs/nav_stokes1.PNG)

Fast-Marching:

![Fast-Marching image](./sample_imgs/fast_marching1.PNG)
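
OpenCV provides implementations of both baselines through `cv2.inpaint`, selected with the `cv2.INPAINT_NS` and `cv2.INPAINT_TELEA` flags. The sketch below shows one way to call them. It assumes OpenCV is installed, and the helper names, the centered square mask, and the radius of 3 pixels are illustrative choices rather than the exact settings used for the figures above.

```python
import numpy as np

def center_square_mask(height, width, size):
    """8-bit single-channel mask: 255 marks pixels to inpaint, 0 marks known pixels."""
    mask = np.zeros((height, width), dtype=np.uint8)
    top, left = (height - size) // 2, (width - size) // 2
    mask[top:top + size, left:left + size] = 255
    return mask

def inpaint_baselines(image, mask, radius=3):
    """Return (Navier-Stokes result, Fast Marching result) for an 8-bit image."""
    import cv2  # imported lazily so the mask helper can be used without OpenCV
    ns = cv2.inpaint(image, mask, radius, cv2.INPAINT_NS)
    fm = cv2.inpaint(image, mask, radius, cv2.INPAINT_TELEA)
    return ns, fm

mask = center_square_mask(256, 256, 64)
# With OpenCV available and `image` a (256, 256, 3) uint8 array:
# ns_result, fm_result = inpaint_baselines(image, mask)
```

Both calls take the same image and mask, so the two baselines can be compared on identical inputs.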

## Methods

We chose to implement a Generative Adversarial Network (GAN) for our image inpainting task. We chose this method because our ultimate goal is to infill a missing portion of an image by building a generator, and the adversarial nature of a GAN, which adds a discriminator, allows us to improve our generator's performance. First, we trained the discriminator on real landscape data and on data created by the (untrained) generator. Afterwards, we let the generator inpaint the masked portion of images and optimized the generator's performance by running the discriminator on the generator's output.
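
A common way to make the generator responsible only for the masked portion is to composite its output with the known pixels before the result is passed to the discriminator. The sketch below illustrates this step with NumPy arrays. The `composite` helper and the random stand-in for the generator output are assumptions for illustration, not a description of our exact pipeline.

```python
import numpy as np

def composite(original, generated, mask):
    """Keep known pixels from `original`; take masked pixels from `generated`.

    `original` and `generated` are (H, W, 3) float arrays; `mask` is an (H, W)
    boolean array that is True where pixels are missing.
    """
    m = mask[..., None]  # add a channel axis so the mask broadcasts over RGB
    return np.where(m, generated, original)

rng = np.random.default_rng(0)
original = rng.random((8, 8, 3))
generated = rng.random((8, 8, 3))  # hypothetical stand-in for a generator output
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True

result = composite(original, generated, mask)
```

With this step, pixels outside the mask are always identical to the original image, so any difference the discriminator detects must come from the infilled region or its boundary.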

Once the generator was trained and the hyperparameters tuned in accordance with the minimax loss (which the discriminator seeks to maximize and the generator seeks to minimize), we completed the image inpainting task on our test set. We evaluated the model's performance on the test set with the Fréchet Inception Distance (FID) because of its sensitivity to small pixel-value changes, which is essential to achieving realistic image inpainting.
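
For reference, the textbook GAN minimax objective is $V(D, G) = \mathbb{E}_x[\log D(x)] + \mathbb{E}_z[\log(1 - D(G(z)))]$, where the discriminator $D$ is trained to increase $V$ and the generator $G$ to decrease it. The sketch below estimates $V$ from discriminator outputs with NumPy. It uses this textbook formulation, which may differ from the exact loss in our training code, and the function name and sample probabilities are illustrative.

```python
import numpy as np

def minimax_value(d_real, d_fake, eps=1e-12):
    """Textbook GAN value V(D, G) estimated from discriminator outputs.

    `d_real` holds D's probabilities for real images, `d_fake` its
    probabilities for generated images; `eps` guards against log(0).
    """
    d_real = np.clip(d_real, eps, 1.0 - eps)
    d_fake = np.clip(d_fake, eps, 1.0 - eps)
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# An undecided discriminator outputs 0.5 everywhere: V = 2 * log(0.5)
undecided = minimax_value(np.full(4, 0.5), np.full(4, 0.5))

# A confident, correct discriminator pushes V up towards 0
confident = minimax_value(np.array([0.99, 0.98]), np.array([0.01, 0.02]))

discriminator_loss = -confident  # D maximizes V, i.e. minimizes -V
print(undecided, confident)
```

The closer $V$ gets to 0, the more easily the discriminator separates real from generated images, which is the regime in which the generator receives little useful training signal.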

Overall, it was easy to implement (TODO: ). We found training the GAN challenging: the discriminator loss often dropped to 0, which left the generator with vanishing gradients and caused it to inpaint masks with very incorrect values. Furthermore, GANs require very large datasets to train, so our model's performance was limited by computational resources. (TODO: still adding Poisson blending?)

(TODO) For each method, what hyperparameters did you evaluate? How sensitive was your model's performance to different hyperparameter settings?

In [None]:
# Code for training models, or link to your Git repository

In [None]:
# Show plots of how these models performed during training.
#  For example, plot train loss and train accuracy (or other evaluation metric) on the y-axis,
#  with number of iterations or number of examples on the x-axis.

## Results

Show tables comparing your methods to the baselines.

What about these results surprised you? Why?

Did your models over- or under-fit? How can you tell? What did you do to address these issues?

What does the evaluation of your trained models tell you about your data? How do you expect these models might behave differently on different data?  

In [None]:
# Show plots or visualizations of your evaluation metric(s) on the train and test sets.
#   What do these plots show about over- or under-fitting?
#   You may borrow from how we visualized results in the Lab homeworks.
#   Are there aspects of your results that are difficult to visualize? Why?

# Discussion

## What you've learned

Convolutional neural networks and the loss functions we learned in class were most relevant to our project, as our GAN is a specific type of neural network and we needed to choose the best loss function for our image inpainting task. What we found most surprising was the ability of computers to simulate creativity. (TODO: check if we're still doing poisson blending) If we had two more weeks to work on our project, we would like to incorporate Poisson blending to achieve seamless edges between the infilled mask and the original image.


What lessons did you take from this project that you want to remember for the next ML project you work on? Do you think those lessons would transfer to other datasets and/or models? Why or why not?

What was the most helpful feedback you received during your presentation? Why?
