## DeepSEE: Deep Disentangled Semantic Explorative Extreme Super-Resolution

This is a demo for our paper **Deep Disentangled Semantic Explorative Extreme Super-Resolution** ([ACCV 2020](http://accv2020.kyoto/) oral). 

Please check out our [project page](https://mcbuehler.github.io/DeepSEE/) for details and the paper.

## Setup
You can follow the steps below to set up the demo from scratch. 

Make sure to use a GPU runtime:
`Runtime -> Change runtime type -> GPU`

In [None]:
!git clone https://github.com/mcbuehler/DeepSEE

Cloning into 'DeepSEE'...
remote: Enumerating objects: 357, done.
remote: Counting objects: 100% (357/357), done.
remote: Compressing objects: 100% (262/262), done.
remote: Total 357 (delta 132), reused 307 (delta 87), pack-reused 0
Receiving objects: 100% (357/357), 6.34 MiB | 18.98 MiB/s, done.
Resolving deltas: 100% (132/132), done.


In [None]:
!pip install -r DeepSEE/requirements.txt;

Collecting wrapt==1.12.0
  Downloading https://files.pythonhosted.org/packages/ee/bc/7993faa8084b5a5dbabb07a197ae1b7590da4752dc80455d878573553e2f/wrapt-1.12.0.tar.gz
Collecting tqdm==4.40.0
  Downloading https://files.pythonhosted.org/packages/a5/13/cd55c23e3e158ed5b87cae415ee3844fc54cb43803fa3a0a064d23ecb883/tqdm-4.40.0-py2.py3-none-any.whl (54kB)
Collecting facenet-pytorch==2.1.1
  Downloading https://files.pythonhosted.org/packages/f6/95/55cd29c3de12df643f29a995046ca2a517d585358cef1acfee1d7a97a2f5/facenet_pytorch-2.1.1-py3-none-any.whl (1.9MB)
Collecting IPython==7.9.0
  Downloading https://files.pythonhosted.org/packages/81/2e/59cdacea6476a4c21b7c090a91250ffbcd085900f5eb9f4e4d68dd2ee4e3/ipython-7.9.0-py3-none-any.whl (775kB)
Collecting opencv-python==4.1.1.26

If you are asked to restart your runtime, please do so.

In [None]:
import os
os.chdir("DeepSEE")

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


**Add** this folder to your Google Drive. It contains the pre-trained checkpoints and some ready-to-use data samples.

https://drive.google.com/drive/folders/1pVwxpln-lame79yGYPgFA1KvV2iXT_I2?usp=sharing

In [None]:
from demo import Demo, get_demo_options, display_result
from util.util import display_regions
import torch
import numpy as np
torch.manual_seed(0)
np.random.seed(0)

base_path = "/content/gdrive/My Drive/DeepSEE_data/demo_data/"
checkpoints_dir = "/content/gdrive/My Drive/DeepSEE_data/checkpoints"

## Method

A low-resolution input image ($x_{lr} \in \mathbb{R}^{H_{lr}\times{W_{lr}}\times{3}}$) acts as a starting point that carries the low-frequency information. A generator ($G_{\Theta}$) upscales this image and hallucinates the high frequencies, yielding the high-resolution image $\hat x_{hr} \in \mathbb{R}^{H_{hr}\times{W_{hr}}\times{3}}$. As guidance, $G_{\Theta}$ leverages both a high-resolution semantic map ($M \in \mathbb{R}^{H_{hr}\times{W_{hr}}\times{N}}$, where $N$ is the number of semantic regions) and independent styles per region ($S\in\mathbb{R}^{N\times d}$, where $d$ is the style dimensionality). The upscaled image should thus retain the low-frequency information from the low-resolution image. In addition, it should be consistent with the semantic regions and have specific, yet independent, styles per region. We formally define our problem as

$\hat x_{hr} = G_{\Theta}(x_{lr}, \thinspace M, \thinspace S)$
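
To make the shapes in this formulation concrete, the following NumPy sketch builds dummy inputs for the 8x setting. The style dimensionality `d = 64`, the random contents, and the one-hot encoding of $M$ are assumptions chosen for illustration; $N = 19$ matches the region indices 0 to 18 listed later in this demo. The generator itself is not reproduced here.

```python
import numpy as np

H_lr, W_lr = 16, 16      # low-resolution size used in this demo
H_hr, W_hr = 128, 128    # high-resolution size for 8x upscaling
N = 19                   # number of semantic regions (indices 0..18)
d = 64                   # assumed style dimensionality, for illustration only

rng = np.random.default_rng(0)

# x_lr: low-resolution RGB image (random values as a placeholder)
x_lr = rng.random((H_lr, W_lr, 3), dtype=np.float32)

# Label map with one region index per high-resolution pixel
labels = rng.integers(0, N, size=(H_hr, W_hr))

# M: one-hot semantic map with one channel per region
M = np.eye(N, dtype=np.float32)[labels]

# S: one independent style vector (row) per region
S = rng.uniform(-1.0, 1.0, size=(N, d)).astype(np.float32)

print(x_lr.shape, M.shape, S.shape)
```

Each pixel of `M` has exactly one active channel, and each row of `S` can be edited without touching the others, which is what makes per-region control possible.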

A user is able to control the __appearance__ and __shape__ of each semantic region. 

## Example Results on 8x Upscaling

### Inference for the __Default__ Solution
We upscale a low-resolution (16x16) image to a high-resolution (128x128) image. We call this the __default solution__.

We provide these sample images from the [CelebA](http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html) dataset (bicubically downscaled to 16x16 pixels):

In [None]:
dataset = "CelebA"
os.listdir(os.path.join(base_path, dataset, "image_16x16"))

['185232.png',
 '195804.png',
 '199152.png',
 '194600.png',
 '201892.png',
 '183306.png',
 '197579.png']

For the __default solution__, the model requires two inputs:
1. the low-resolution image $x_{lr}$ and 
2. a semantic mask $M$ (we predicted the semantic mask from the low-resolution image)

The style matrix $S$ does not need to be supplied: it is predicted from the low-resolution image $x_{lr}$.
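
The two input paths follow the same folder layout for every sample, so they can be assembled by a small helper. The sketch below is a convenience written for this notebook: `build_run_kwargs` and `missing_inputs` are hypothetical names, not part of the DeepSEE repository. The dictionary keys mirror the ones passed to `demo.run` in the cells of this demo.

```python
import os


def build_run_kwargs(base_path, dataset, filename,
                     labels_dir="predicted_labels", name="demo"):
    """Assemble the keyword arguments for one sample (hypothetical helper)."""
    return {
        "name": name,
        "path_image_lr": os.path.join(base_path, dataset, "image_16x16", filename),
        "path_semantics": os.path.join(base_path, dataset, labels_dir, filename),
    }


def missing_inputs(kwargs):
    """Return the input paths in kwargs that do not exist on disk."""
    return [path for key, path in kwargs.items()
            if key.startswith("path_") and not os.path.exists(path)]


base_path = "/content/gdrive/My Drive/DeepSEE_data/demo_data/"
kwargs = build_run_kwargs(base_path, "CelebA", "185232.png")
print(missing_inputs(kwargs))  # lists any file that is not available yet
```

Checking for missing files before running inference gives a clearer message when the shared Drive folder has not been added yet.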

In [None]:
opt = get_demo_options("8x_independent_128x128")
opt.checkpoints_dir = checkpoints_dir
demo = Demo(opt)

In [None]:
filename = "185232.png"

kwargs = {
    "name": "demo",
    "path_image_lr": os.path.join(base_path, dataset, "image_16x16", filename),
    "path_semantics": os.path.join(base_path, dataset, "predicted_labels", filename)
}
result_default_solution = demo.run(**kwargs)
display_result(result_default_solution, size=(128, 128))

Encoding style from LR image...
Style computed.
Upscaling...
Done.



### Style Manipulations
We can add random noise to rows of the style matrix to generate multiple high-resolution variants.

* Experiment by changing `delta` to make the effect stronger or weaker.
* Target other or additional semantic regions by changing or adding indices in `region` (the `torch.stack` call in the cells below needs one `noise` entry per index).
* The style matrix (or rows thereof) can be computed from other high-resolution images (examples in the paper).
* Rows in the style matrix can be interpolated (examples in the paper).
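
As a rough illustration of the last point, the sketch below blends selected rows of two style matrices. It assumes plain linear interpolation and a `(1, N, d)` layout, which is inferred from the indexing used in the cells of this demo; the paper may interpolate differently. It uses NumPy arrays as stand-ins for the torch tensors, and `interpolate_style_rows` is a hypothetical helper.

```python
import numpy as np


def interpolate_style_rows(style_a, style_b, regions, alpha):
    """Blend the rows listed in `regions` from style_a towards style_b.

    alpha = 0 keeps style_a, alpha = 1 copies the rows of style_b.
    Rows not listed in `regions` are taken from style_a unchanged.
    """
    out = style_a.copy()
    out[:, regions, :] = ((1.0 - alpha) * style_a[:, regions, :]
                          + alpha * style_b[:, regions, :])
    return out


N, d = 19, 8  # d is a placeholder value for this example
rng = np.random.default_rng(0)
style_a = rng.uniform(-1, 1, size=(1, N, d))
style_b = rng.uniform(-1, 1, size=(1, N, d))

# Move the hair style (row 13) halfway from style_a to style_b
halfway = interpolate_style_rows(style_a, style_b, regions=[13], alpha=0.5)
print(halfway.shape)
```

Sweeping `alpha` from 0 to 1 would give a gradual transition for the chosen regions while all other regions keep their original style.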

Here is an overview of all available regions:

In [3]:
display_regions()

 0: Background
 1: Skin
 2: Nose
 3: Eyeglass
 4: Left eye
 5: Right eye
 6: Left eyebrow
 7: Right eyebrow
 8: Left Ear
 9: Right Ear
10: Mouth
11: Upper Lip
12: Lower Lip
13: Hair
14: Hat
15: Earring
16: Necklace
17: Neck
18: Cloth
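
Because `region` takes integer indices, a name-to-index lookup can make experiments easier to read. The list below is copied from the overview above; `REGION_NAMES` and `region_indices` are hypothetical names introduced for this sketch and are not provided by the repository.

```python
REGION_NAMES = [
    "Background", "Skin", "Nose", "Eyeglass", "Left eye", "Right eye",
    "Left eyebrow", "Right eyebrow", "Left Ear", "Right Ear", "Mouth",
    "Upper Lip", "Lower Lip", "Hair", "Hat", "Earring", "Necklace",
    "Neck", "Cloth",
]


def region_indices(*names):
    """Translate region names (case-insensitive) into style-matrix row indices."""
    lookup = {name.lower(): index for index, name in enumerate(REGION_NAMES)}
    return [lookup[name.lower()] for name in names]


print(region_indices("Upper Lip", "Lower Lip"))  # [11, 12]
```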


In the following example, we change the __lips__ by adding noise to the corresponding rows in the style matrix.

In [None]:
encoded_style_orig = result_default_solution["encoded_style"].detach().clone()
encoded_style_noisy = encoded_style_orig.clone()

delta = 0.03
region = [11, 12]  # Corresponds to the lips.

# We add the same noise to both the upper and lower lip
noise = torch.rand([1, encoded_style_orig.shape[2]]) * delta
encoded_style_noisy[:, region, :] = (encoded_style_orig[:, region, :] + torch.stack([noise, noise], dim=1)).clamp(-1, 1)

kwargs = {
    "name": "demo_style_manipulation",
    "path_image_lr": os.path.join(base_path, dataset, "image_16x16", filename),
    "path_semantics": os.path.join(base_path, dataset, "predicted_labels", filename),
    "encoded_style": encoded_style_noisy
}
result_stochastic = demo.run(**kwargs)
display_result(result_stochastic, size=(128, 128))

Upscaling...
Done.
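
The cell above stacks the noise vector twice because it targets exactly two regions. A variant that works for any number of regions could look like the sketch below. It uses NumPy arrays as stand-ins for the torch tensors (with tensors, `torch.rand` and `.clamp` play the roles of `rng.random` and `np.clip`), and `add_shared_noise` is a hypothetical helper, not part of the repository.

```python
import numpy as np


def add_shared_noise(style, regions, delta, rng):
    """Add one noise vector, scaled by delta, to every row in `regions`.

    style has shape (1, N, d). The noise is drawn uniformly from [0, 1),
    and the result is clipped to [-1, 1].
    """
    noisy = style.copy()
    noise = rng.random(style.shape[2]) * delta  # shape (d,), shared by all regions
    noisy[:, regions, :] = np.clip(style[:, regions, :] + noise, -1.0, 1.0)
    return noisy


rng = np.random.default_rng(0)
style = np.zeros((1, 19, 8))  # placeholder style matrix, d = 8 is arbitrary
noisy = add_shared_noise(style, regions=[11, 12], delta=0.03, rng=rng)
print(np.array_equal(noisy[:, 11, :], noisy[:, 12, :]))  # True: both lips share the noise
```

Broadcasting applies the same `(d,)` vector to every selected row, so the list in `regions` can have any length.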



Now let's also modify the __eyebrows__.

In [None]:
encoded_style_orig = result_default_solution["encoded_style"].detach().clone()
encoded_style_noisy = encoded_style_orig.clone()

delta = 0.03
region = [6, 7]  # Corresponds to the eyebrows.

noise = torch.rand([1, encoded_style_orig.shape[2]]) * delta
encoded_style_noisy[:, region, :] = (encoded_style_orig[:, region, :] + torch.stack([noise, noise], dim=1)).clamp(-1, 1)

kwargs = {
    "name": "demo_style_manipulation",
    "path_image_lr": os.path.join(base_path, dataset, "image_16x16", filename),
    "path_semantics": os.path.join(base_path, dataset, "predicted_labels", filename),
    "encoded_style": encoded_style_noisy
}
result_stochastic = demo.run(**kwargs)
display_result(result_stochastic, size=(128, 128))

Upscaling...
Done.



## Example Results on 32x Upscaling

We also provide a model for extreme upscaling from 16x16 to 512x512 pixels.
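
A quick calculation shows how much detail the generator has to hallucinate in each setting. The helper name `upscaling_stats` is introduced here only for illustration.

```python
def upscaling_stats(lr_size, hr_size):
    """Return the per-side factor and the number of HR pixels per LR pixel."""
    factor = hr_size // lr_size
    return factor, factor ** 2


for hr_size in (128, 512):
    factor, pixels_per_lr_pixel = upscaling_stats(16, hr_size)
    print(f"16x16 -> {hr_size}x{hr_size}: {factor}x per side, "
          f"{pixels_per_lr_pixel} output pixels per input pixel")
```

At 32x, every input pixel corresponds to 1024 output pixels, compared with 64 at 8x.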

Again, we provide ready-to-use examples. These samples are from the [CelebAMask-HQ](https://github.com/switchablenorms/CelebAMask-HQ) dataset (bicubically downscaled to 16x16 pixels):

In [None]:
dataset = "CelebAMask-HQ"
os.listdir(os.path.join(base_path, dataset, "image_16x16"))

['14337.png', '28368.png', '19776.png']

In [None]:
opt = get_demo_options("32x_independent")
opt.checkpoints_dir = checkpoints_dir
demo_32x = Demo(opt)

In [None]:
filename = "14337.png"

kwargs = {
    "name": "demo",
    "path_image_lr": os.path.join(base_path, dataset, "image_16x16", filename),
    "path_semantics": os.path.join(base_path, dataset, "predicted_labels", filename)
}
result_default_solution = demo_32x.run(**kwargs)
display_result(result_default_solution)

Encoding style from LR image...
Style computed.
Upscaling...
Done.



### Semantic Manipulations
We can repaint the semantic mask and run inference again. This yields different shapes in the upscaled image, but preserves the overall appearance.
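
Repainting can be as simple as overwriting region indices in a label map. The toy example below assumes that a mask stores one region index per pixel (using the indices from `display_regions()`); the actual file format of the provided label images is not verified here, and `repaint_rectangle` is a hypothetical helper.

```python
import numpy as np

HAIR, SKIN = 13, 1  # region indices from display_regions()


def repaint_rectangle(labels, top, left, height, width, region):
    """Return a copy of the label map with one rectangle set to `region`."""
    edited = labels.copy()
    edited[top:top + height, left:left + width] = region
    return edited


# Toy 8x8 label map: hair in the upper half, skin in the lower half
labels = np.full((8, 8), SKIN, dtype=np.uint8)
labels[:4, :] = HAIR

# Lower the hairline by one row in the middle of the image
edited = repaint_rectangle(labels, top=4, left=2, height=1, width=4, region=HAIR)
print(edited)
```

The original `labels` array is left untouched, so the default and the manipulated mask can be compared side by side.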

In [None]:
kwargs = {
    "name": "demo_semantic_manipulation",
    "path_image_lr": os.path.join(base_path, dataset, "image_16x16", filename),
    "path_semantics": os.path.join(base_path, dataset, "manipulated_labels", filename)
}
result_default_solution = demo_32x.run(**kwargs)
display_result(result_default_solution)

Encoding style from LR image...
Style computed.
Upscaling...
Done.



### Style Manipulations

In [None]:
encoded_style_orig = result_default_solution["encoded_style"].detach().clone()
encoded_style_noisy = encoded_style_orig.clone()
delta = 0.15
region = [1]  # Corresponds to the skin region.

encoded_style_noisy[:, region, :] = (encoded_style_orig[:, region, :] + (torch.rand(encoded_style_orig[:, region, :].shape) * delta)).clamp(-1, 1)
kwargs = {
    "name": "demo_style_manipulation",
    "path_image_lr": os.path.join(base_path, dataset, "image_16x16", filename),
    "path_semantics": os.path.join(base_path, dataset, "manipulated_labels", filename),
    "encoded_style": encoded_style_noisy
}
result = demo_32x.run(**kwargs)
display_result(result)

Upscaling...
Done.

