# **Feature Visualization** using **DeepDream** and **Tensorflow**

A notebook exploring how neural networks construct their understanding of images.

Sources and Acknowledgements:


*   Distill.Pub's blogpost: https://distill.pub/2017/feature-visualization/
*   Alex Mordvintsev's Notebook: https://github.com/krantirk/DeepDream/blob/master/deepdream.py
*   Official DeepDream Tutorial: https://www.tensorflow.org/tutorials/generative/deepdream
*   Tensorflow's Lucid Library: https://github.com/tensorflow/lucid





# [Feedback Form](https://docs.google.com/forms/d/e/1FAIpQLSe_A7TibnNTqZu1GOH53f2ebSUvXmxmAjw2Avszex_UEWfcVQ/viewform?usp=sf_link)

All the feedback is anonymous, and really helps me improve my events in the future. As a bonus, if you fill it out you get an extra surprise code cell! 

# Neural Networks are not Interpretable

That's a big problem. If we're talking about filtering out spam in emails, a mistake isn't the end of the world. But as neural networks start being used for self-driving cars and medical diagnoses, mistakes become matters of life and death.

Ideally, we should **always know** why a neural network made one decision as opposed to another.

# Visualizing Smaller Networks 

Smaller networks are easier to visualize, and it's a good strategy to thoroughly understand simple networks before you move on to larger ones if you're seriously interested in learning this stuff.

*   [Neural Network Zoo](https://www.asimovinstitute.org/neural-network-zoo/)
*   [3D Visualization of a Convolutional Neural Network](https://www.cs.ryerson.ca/~aharley/vis/conv/)
*   [Tensorflow's Neural Network Playground](https://playground.tensorflow.org/)



# Diving Deeper: GoogleNet

Let's talk briefly about what [GoogleNet](https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/googlenet.html) is, and its [architecture](https://miro.medium.com/max/2588/1*ZFPOSAted10TPd3hBQU8iQ.png). 

GoogleNet was the prize-winning architecture for the ImageNet Large Scale Visual Recognition Challenge in 2014, with 22 layers and roughly 7 million parameters (weights).

It's perfect for this demo because of its incredibly minimal computation requirements compared to other networks of this caliber.

The kind of questions we're trying to answer are: 


*   What is each neuron contributing to the final prediction?
*   Why is it doing that as opposed to something else?
*   What structure allows these neurons to collectively produce accurate predictions?



![alt text](https://miro.medium.com/max/2588/1*ZFPOSAted10TPd3hBQU8iQ.png)

# Baby Steps

Say we were tasked with finding out the answers to those questions.

Where do we start? 

How do we start?

# Back to Basics

We know that neural networks are made up of nodes, and connections between nodes, where each node represents its confidence of a certain pattern or "idea" being present in our input.

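When we pass an image through our network, each neuron's activation, or lack thereof, is represented by a number: the larger the number, the more strongly the neuron responds. (In the simplest textbook networks this number lies between 0 and 1.)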

Maybe we can get an idea of what that neuron is looking for by looking at what images in our dataset activated it the most.

Here are the images that resulted in the top activations for 4 randomly chosen neurons. 

Labels from Left to Right: (Each neuron is represented by a 3x3 grid)

"mixed4a: Unit 6", "mixed4a: Unit 240", "mixed4a: Unit 453", "mixed4a: Unit 492"

<details><summary>Dataset Results</summary>
<p>

<img src=https://distill.pub/2017/feature-visualization/images/why_optimization_examples.jpg width="15000">


</p>
</details>


And sure enough, there seems to be a strong connection between images that most activate a certain neuron.
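Finding these "top activating" images is just a ranking problem. Here is a minimal sketch, assuming we already have a table of activations with one row per dataset image and one column per neuron; the random `activations` array and the helper name `top_activating_images` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical activations: one row per dataset image, one column per neuron.
activations = rng.random((1000, 8))

def top_activating_images(activations, unit, k=9):
    """Return the indices of the k images with the highest activation for `unit`."""
    order = np.argsort(activations[:, unit])[::-1]  # strongest activation first
    return order[:k]

top9 = top_activating_images(activations, unit=3)
grid = top9.reshape(3, 3)  # image indices laid out like the 3x3 grids above
```

In a real experiment the activations would come from running every dataset image through the network and recording the chosen neuron's response.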


This can be generalized beyond photographs to anything that can be represented as an image. For example, a spectrogram of a song is an image that captures the features of the song sufficiently well.

For more details check out this [blog post](https://benanne.github.io/2014/08/05/spotify-cnns.html) by Sander Dieleman 
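To make the "song as an image" idea concrete, here is a rough NumPy sketch of a magnitude spectrogram. The `spectrogram` helper, the frame length, the hop size and the synthetic 1 kHz tone are all illustrative assumptions, not taken from the linked post.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram: rows are frequency bins, columns are time frames."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

sample_rate = 8000
t = np.arange(sample_rate) / sample_rate   # one second of audio
tone = np.sin(2 * np.pi * 1000 * t)        # a pure 1 kHz tone
spec = spectrogram(tone)                   # a 2D array we could feed to a CNN
```

For this pure tone the "image" is a single bright horizontal line; a real song would produce a much richer texture.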

This, however, is still an incomplete picture. Is the neuron on the right looking for the sky? Or a specific type of dome structure?

# Diving Deeper: More Networks

Based on techniques in this [paper](https://arxiv.org/abs/1311.2901) by Zeiler & Fergus, we can use yet another neural network to optimize for specific input images that lead to a certain activation in our network.

This way, we pick a neuron and, starting with a completely random image, make small changes to each pixel (using gradient descent) so that the neuron's activation rises.

Essentially: We are generating an image from scratch to maximally excite a certain neuron. This way we don't need to rely on any input images to see what a certain neuron is looking for.
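As a toy illustration of that loop, here is a sketch with a made-up "neuron" whose activation is a simple weighted sum of pixels. Everything in it (the `pattern`, the step size, the 8x8 image) is an assumption chosen so the gradient can be written by hand; a real network needs automatic differentiation.

```python
import numpy as np

rng = np.random.default_rng(0)

# A made-up "neuron" that responds to a vertical edge in an 8x8 patch.
pattern = np.zeros((8, 8))
pattern[:, :4] = -1.0   # prefers dark pixels on the left
pattern[:, 4:] = 1.0    # and bright pixels on the right

def activation(img):
    return float(np.sum(pattern * img))

def activation_grad(img):
    # For this linear toy neuron, the gradient w.r.t. the pixels is the pattern itself.
    return pattern

img = rng.uniform(0.4, 0.6, size=(8, 8))   # start from low-contrast noise
start = activation(img)
for _ in range(50):
    # Step in the direction that raises the activation, keeping pixels in [0, 1].
    img = np.clip(img + 0.02 * activation_grad(img), 0.0, 1.0)
end = activation(img)
```

After the loop, `img` has turned into the edge the neuron "wants" to see: black on the left, white on the right.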

Here's what that looks like: 

<details><summary>Noise to Feature</summary>
<p>

<img src=https://distill.pub/2017/feature-visualization/images/opt_progress_mixed4a-11.png width="15000">


</p>
</details>




Note: This is an incredibly complicated process. Depending on how you configure your second network, you have to trade off noise against the other problems that arise in this kind of optimization. More details are in this amazing [blog post](https://distill.pub/2017/feature-visualization/) by Distill.Pub.

<details><summary>Obligatory Meme</summary>
<p>

![alt text](https://i.imgflip.com/3o1x2x.jpg)

</p>
</details>



# So, what's everyone looking for?

Here we can see the output when we optimize images to activate the neurons from our previous dataset examples.

<img src=https://distill.pub/2017/feature-visualization/images/why_optimization_examples.jpg width="15000">

Neuron Labels from left to right:

"mixed4a, Unit 6", "mixed4a, Unit 240", "mixed4a, Unit 453", "mixed4a, Unit 492"


<details><summary>Visualization Results</summary>
<p>

<img src=https://distill.pub/2017/feature-visualization/images/why_optimization_neuron.png width="15000">


</p>
</details>


# What if we didn't start with noise?

An immediate question that arises is "What would the resulting image be if we didn't start with noise?"

We can find out by running the same process on images of our own.

# Practical: Exaggerating Features in our own Image

Let's get our hands dirty and write some code.

Here are the setup and import statements.

In [0]:
# Capture the output, otherwise we get annoying warnings and errors about upgrading to tensorflow 2
%%capture

!pip install --quiet lucid==0.0.5
#!pip install --quiet --upgrade-strategy=only-if-needed git+https://github.com/tensorflow/lucid.git

import warnings # Otherwise we get a bunch of "don't use TensorFlow 1 when there is TensorFlow 2" warnings
warnings.simplefilter("ignore")
import logging # Silence TensorFlow's own logger as well
logging.getLogger('tensorflow').disabled = True

from io import BytesIO
from IPython.display import clear_output, Image, display
import PIL.Image
from __future__ import print_function

import numpy as np
import scipy.ndimage as nd
import tensorflow as tf

import lucid.modelzoo.vision_models as models
from lucid.misc.io import show
import lucid.optvis.objectives as objectives
import lucid.optvis.param as param
import lucid.optvis.render as render
import lucid.optvis.transform as transform

This code fetches our images and loads the pre-trained Inception model (the GoogleNet graph) into memory.

In [0]:
# Capture the output, otherwise we get annoying warnings and errors about upgrading to tensorflow 2
%%capture

# Get our algorithm 
!wget -nc --no-check-certificate https://storage.googleapis.com/download.tensorflow.org/models/inception5h.zip && unzip -n inception5h.zip
# Get our image
!wget -nc https://i.imgur.com/AoPDzOT.jpg # Gogh Higher Res 
!wget -nc https://i.imgur.com/gPvEBRn.jpg # Gogh Lower Res
!wget -nc https://i.imgur.com/HgmXhmx.jpg # Mona Lisa Lower Res

# Open the image
with open("gPvEBRn.jpg", 'rb') as f:
  file_contents = f.read()

with open("HgmXhmx.jpg", 'rb') as f:
  lisa_contents = f.read()

# Creating a TensorFlow session and loading the model
model_fn = 'tensorflow_inception_graph.pb'
graph = tf.Graph()
sess = tf.InteractiveSession(graph=graph)
with tf.gfile.GFile(model_fn, 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
t_input = tf.placeholder(np.float32, name='input') # define the input tensor
imagenet_mean = 117.0
t_preprocessed = tf.expand_dims(t_input-imagenet_mean, 0)
tf.import_graph_def(graph_def, {'input':t_preprocessed})

def T(layer):
    '''Return the output tensor of the named GoogleNet layer, allowing us to
    peek inside the network'''
    return graph.get_tensor_by_name("import/%s:0"%layer)


Optional: Upload or link to your own image!


Use the two cells below to upload an image from your computer

In [0]:
user_image_flag = False # This way these cells won't run when you "Run All"
if user_image_flag:
  # Create file upload dialog
  from google.colab import files
  uploaded = files.upload()

In [0]:
if user_image_flag:
  if type(uploaded) is not dict: uploaded = uploaded.files  ## Deal with filedit versions
  # dict.keys() is not indexable in Python 3, so take the first uploaded file via an iterator
  file_contents = next(iter(uploaded.values()))

OR Use the cell below to link to an image 

In [0]:
if user_image_flag:
  !wget -nc https://link.to/image_name.jpg # Link to image
  with open("image_name.jpg", 'rb') as f:
    file_contents = f.read()

Display Source Image

In [0]:
def showarray(a, fmt='jpeg'):
    a = np.uint8(np.clip(a, 0, 255))
    f = BytesIO()
    PIL.Image.fromarray(a).save(f, fmt)
    display(Image(data=f.getvalue()))
    
img0 = sess.run(tf.image.decode_image(file_contents))
lisa = sess.run(tf.image.decode_image(lisa_contents))
showarray(img0)

# Image Manipulation and Gradient Descent Code 

Now onto the code that actually manipulates our image. We will use gradient descent to exaggerate certain features in the image, and we will also do some more complex manipulations.
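One detail worth isolating is how the step size is chosen. The sketch below mirrors the last line of `calc_grad_tiled` in the notebook's code: the gradient is rescaled so that the average pixel change is always `strength * 0.01`, no matter how large or small the raw gradient is. The helper name `normalized_step` and the random test data are assumptions for illustration.

```python
import numpy as np

def normalized_step(img, grad, strength=200.0):
    """Add the gradient to the image, rescaled so the mean absolute change is strength * 0.01."""
    step = strength * 0.01 / (np.abs(grad).mean() + 1e-7)  # 1e-7 avoids division by zero
    return img + grad * step

rng = np.random.default_rng(0)
img = rng.uniform(0, 255, size=(4, 4, 3))
small = rng.normal(size=img.shape) * 1e-3   # a tiny gradient
large = small * 1e6                         # the same direction, a million times bigger

change_small = np.abs(normalized_step(img, small) - img).mean()
change_large = np.abs(normalized_step(img, large) - img).mean()
```

Both changes come out at roughly 2 pixel-value units, which is why `strength` behaves like an "effective learning rate" regardless of the layer being visualized.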

<!-- Core Algorithm:


*   Gradient Descent 
*   Rendering Image 
*   Helpers -->



In [0]:
# Hyperparameters # 

octave_n = 4 # Number of times we scale the image, 
             # helps distribute the size of manipulations
octave_scale = 1.4 # Ratio by which we scale each time 

iter_n = 10 # Number of gradient steps per octave (each step "manipulates pixels"; no weights change)
strength = 200 # Effective learning rate

# Helper function that uses TensorFlow to resize an image
def resize(img, new_size):
    return sess.run(tf.image.resize_bilinear(img[np.newaxis,:], new_size))[0]

# Apply gradients to an image in a series of tiles
def calc_grad_tiled(img, t_grad, tile_size=256):
    '''Random shifts are applied to the image to blur tile boundaries over
    multiple iterations.
    This helps reduce noise in our image'''
    h, w = img.shape[:2]
    sx, sy = np.random.randint(tile_size, size=2)
    # We randomly roll the image in x and y to avoid seams between tiles.
    img_shift = np.roll(np.roll(img, sx, 1), sy, 0)
    grad = np.zeros_like(img)
    for y in range(0, max(h-tile_size//2, tile_size),tile_size):
        for x in range(0, max(w-tile_size//2, tile_size),tile_size):
            sub = img_shift[y:y+tile_size,x:x+tile_size]
            g = sess.run(t_grad, {t_input:sub})
            grad[y:y+tile_size,x:x+tile_size] = g
    imggrad = np.roll(np.roll(grad, -sx, 1), -sy, 0)
    # Add the image gradient to the image and return the result
    return img + imggrad*(strength * 0.01 / (np.abs(imggrad).mean()+1e-7))

# Putting it all together
# Applies deepdream at multiple scales
def render_deepdream(t_obj, input_img, show_steps = True):
    # Collapse the optimization objective to a single number (the loss)
    t_score = tf.reduce_mean(t_obj)
    # We need the gradient of the image with respect to the objective
    t_grad = tf.gradients(t_score, t_input)[0]

    # split the image into a number of octaves (laplacian pyramid)
    img = input_img
    octaves = []
    for i in range(octave_n-1):
        lo = resize(img, np.int32(np.float32(img.shape[:2])/octave_scale))
        octaves.append(img-resize(lo, img.shape[:2]))
        img = lo

    # generate details octave by octave
    for octave in range(octave_n):
        if octave>0:
            hi = octaves[-octave]
            img = resize(img, hi.shape[:2])+hi
        for i in range(iter_n):
            img = calc_grad_tiled(img, t_grad)
        if show_steps:
            clear_output()
            showarray(img)
    return img

Let's try this on our image!
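Before running it, it may help to see the octave logic of `render_deepdream` on its own. This is a simplified sketch under stated assumptions: it uses a fixed scale factor of 2 with block averaging and pixel repetition (the hypothetical `downsample` and `upsample` helpers) instead of the bilinear resize and the 1.4 scale used in the notebook, and it skips the gradient steps entirely.

```python
import numpy as np

def downsample(img):
    """Halve the resolution by averaging 2x2 blocks (stand-in for the bilinear resize)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    """Double the resolution by repeating pixels."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

rng = np.random.default_rng(0)
original = rng.random((32, 32))
octave_n = 4

# Split: store the detail lost at each downscale, as render_deepdream does.
img, details = original, []
for _ in range(octave_n - 1):
    lo = downsample(img)
    details.append(img - upsample(lo))
    img = lo

# Rebuild from coarse to fine; DeepDream would run its gradient steps at each scale.
for octave in range(octave_n):
    if octave > 0:
        img = upsample(img) + details[-octave]

reconstructed = img
```

Without any gradient steps the rebuilt image equals the original, which shows that the pyramid itself loses nothing; all visible changes come from the dreaming done at each scale.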



# mixed4a, Unit 6: The baseball spiral

In [0]:
feature_channel = 249 #@param {type:"slider", max: 512}
layer = "mixed4a"  #@param ["mixed4d_3x3_bottleneck_pre_relu", "mixed3a", "mixed3b", "mixed4a", "mixed4c", "mixed5a"]
if feature_channel >= T(layer).shape[3]:
  print("Feature channel exceeds size of layer ", layer, " feature space. ")
  print("Choose a smaller channel number.")
else:
  render_deepdream(T(layer)[:,:,:,feature_channel], img0)

# mixed4a, Unit 453: Clouds—or fluffiness?

In [0]:
feature_channel = 304 #@param {type:"slider", max: 512}
layer = "mixed4a"  #@param ["mixed4d_3x3_bottleneck_pre_relu", "mixed3a", "mixed3b", "mixed4a", "mixed4c", "mixed5a"]
strength = 100 #@param {type:"slider", max: 1000}

if feature_channel >= T(layer).shape[3]:
  print("Feature channel exceeds size of layer ", layer, " feature space. ")
  print("Choose a smaller channel number.")
else:
  render_deepdream(T(layer)[:,:,:,feature_channel], img0)

# mixed4a, Unit 492: Buildings or Sky?


In [0]:
feature_channel = 294 #@param {type:"slider", max: 512}
layer = "mixed4a"  #@param ["mixed4d_3x3_bottleneck_pre_relu", "mixed3a", "mixed3b", "mixed4a", "mixed4c", "mixed5a"]
if feature_channel >= T(layer).shape[3]:
  print("Feature channel exceeds size of layer ", layer, " feature space. ")
  print("Choose a smaller channel number.")
else:
  render_deepdream(T(layer)[:,:,:,feature_channel], img0)

# Back to Theory: Even this is incomplete

Usually we can find many different images that optimally excite a certain neuron.

Examples: 

mixed4a, 97

![alt text](https://distill.pub/2017/feature-visualization/images/diversity/mixed4a_97_diversity.png)

mixed4a, 143

![alt text](https://distill.pub/2017/feature-visualization/images/diversity/mixed4a_143_diversity.png)

A lot more research is needed on this topic. We are barely scratching the surface of the potential that neural networks have, and understanding them is crucial to being able to harness that potential in a safe way.



# Just one?

We can also generalize our process to excite more than one neuron at a time.
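One simple way to think about a two-neuron objective is as a weighted blend that slides from one neuron to the other across a batch of images. The sketch below shows only that idea, with a batch of 6 to match the parameterization used in this section; the helper names and the activation values are made up, and this is not a copy of Lucid's implementation.

```python
import numpy as np

def interpolation_weights(n_images):
    """Weights for blending from neuron A (first image) to neuron B (last image)."""
    w_b = np.arange(n_images) / (n_images - 1)
    return 1.0 - w_b, w_b

def blended_objective(act_a, act_b):
    """Per-image objective: a weighted mix of the two neurons' activations."""
    w_a, w_b = interpolation_weights(len(act_a))
    return w_a * act_a + w_b * act_b

# Hypothetical activations of two neurons on a batch of 6 images.
act_a = np.array([5.0, 4.0, 3.0, 2.0, 1.0, 0.0])
act_b = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
scores = blended_objective(act_a, act_b)
```

The first image is scored purely by neuron A, the last purely by neuron B, and the images in between must please both to different degrees.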



In [0]:
# Capture the output, otherwise we get annoying warnings and errors about upgrading to tensorflow 2
%%capture
# Let's import a model from the Lucid modelzoo!
model = models.InceptionV1()
model.load_graphdef()

In [0]:
def interpolate_param_f():
  unique = param.fft_image((6, 128, 128, 3))
  shared = [
    param.lowres_tensor((6, 128, 128, 3), (1, 128//2, 128//2, 3)),
    param.lowres_tensor((6, 128, 128, 3), (1, 128//4, 128//4, 3)),
    param.lowres_tensor((6, 128, 128, 3), (1, 128//8, 128//8, 3)),
    param.lowres_tensor((6, 128, 128, 3), (2, 128//8, 128//8, 3)),
    param.lowres_tensor((6, 128, 128, 3), (1, 128//16, 128//16, 3)),
    param.lowres_tensor((6, 128, 128, 3), (2, 128//16, 128//16, 3)),
  ]
  return param.to_valid_rgb(unique + sum(shared), decorrelate=True)

In [0]:
obj = objectives.channel_interpolate("mixed4a_pre_relu", 97, "mixed4a_pre_relu", 143)
_ = render.render_vis(model, obj, interpolate_param_f)

In [0]:
obj = objectives.channel_interpolate("mixed4a_pre_relu", 476, "mixed4a_pre_relu", 460)
_ = render.render_vis(model, obj, interpolate_param_f)

In [0]:
obj = objectives.channel_interpolate("mixed4a_pre_relu", 6, "mixed4a_pre_relu", 453)
_ = render.render_vis(model, obj, interpolate_param_f)

In [0]:
#@title
# You can use `Clear Output` if the animation gets annoying.
%%html
<style> 
  #animation {
    width: 128px;
    height: 128px;
    background: url('https://storage.googleapis.com/tensorflow-lucid/static/img/notebook-interpolation-example-run-4.png') left center;
    animation: play 1.5s steps(6) infinite alternate;
  }
  @keyframes play {
    100% { background-position: -768px; }
  }
</style><div id='animation'></div>

This interpolation process is also fairly complicated. There are a bunch of alignment issues you run into. More details [here](https://colab.research.google.com/github/tensorflow/lucid/blob/master/notebooks/differentiable-parameterizations/aligned_interpolation.ipynb).

The question we want to think about here is how the "idea" or "pattern" changes when we're considering more than one neuron.
Biologically speaking, most ideas are represented by groups of neurons working together. 

So rather than only trying to interpret single neurons, we should also attempt to interpret groups of neuron activations as a whole.
Current research shows that random group activations are often interpretable, but at a lower rate than single neurons. Keep in mind that the number of possible group activations is several orders of magnitude larger than the number of single activations.


# Let's try to exaggerate multiple features in our image

In fact, why not do a whole layer?

Let's pick one of the layers in GoogleNet and have our model optimize for activations in that layer.

We can start with the most basic layer, 'mixed3a'.
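The whole-layer objective used in this section is `tf.square(T(layer))`, which `render_deepdream` then averages into a single score. A small NumPy sketch of that score and its gradient is shown here; the tiny `acts` array is a made-up stand-in for real layer activations.

```python
import numpy as np

def layer_objective(acts):
    """Mean of squared activations over the whole layer (a single number)."""
    return np.mean(np.square(acts))

def layer_objective_grad(acts):
    """Gradient of that mean with respect to each activation: 2 * a / N."""
    return 2.0 * acts / acts.size

acts = np.array([[0.0, 1.0], [2.0, -3.0]])   # hypothetical activations
score = layer_objective(acts)
grad = layer_objective_grad(acts)
```

Because the gradient is proportional to the activation itself, neurons that already respond strongly get pushed the hardest, so the image ends up exaggerating whatever the layer already "sees" in it.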

In [0]:
octave_n = 4 #@param {type:"slider", max: 10}
octave_scale = 1.4 #@param {type:"number"}
iter_n = 25 #@param {type:"slider", max: 50}
strength = 200 #@param {type:"slider", max: 1000}
layer = "mixed3a"  #@param ["mixed3a", "mixed3b", "mixed4a", "mixed4c", "mixed5a", "mixed5b"]

final = render_deepdream(tf.square(T(layer)), img0)

How about a more complex layer?

In [0]:
octave_n = 4 #@param {type:"slider", max: 10}
octave_scale = 1.4 #@param {type:"number"}
iter_n = 50 #@param {type:"slider", max: 50}
strength = 150 #@param {type:"slider", max: 1000}
layer = "mixed4c"  #@param ["mixed3a", "mixed3b", "mixed4a", "mixed4c", "mixed5a", "mixed5b"]

final = render_deepdream(tf.square(T(layer)), img0)

# What type of structure can we notice, if any?

It turns out that our neurons early on in the network (closer to the input) pick up the most basic features in the image. Lines, edges, angles, colours, that type of thing.

As we progress further, the features get more complex. You start to see shapes and some complicated patterns.

And eventually the neurons/layers close to the output are starting to capture more and more complex patterns and ideas, such as whole objects, animals, etc.

Column labels from Left to Right: 

Edges (layer conv2d0), Textures (layer mixed3a), Patterns (layer mixed4a), Parts (layers mixed4b & mixed4c), Objects (layers mixed4d & mixed4e)

<details><summary>Final Structure</summary>
<p>

![alt text](https://distill.pub/2017/feature-visualization/images/sprite_hero.png)



</p>
</details>










# Style Transfer

Instead of individually picking the neurons we want to optimize for, can we instead optimize for another image? Yes!

As usual, this is fairly complicated. There are many different models for style transfer, and most of them are too computationally intensive to run on Google Colab. We will use an older, basic model from TF-Hub, and even then we will have to scale down our images.

There are different notions of "style" you can prioritize when you train a network for style transfer. How much focus do you place on colour? Texture? Geometric composition?

You can learn more about style transfer [here](https://github.com/MacgyverCode/Style-Transfer-Colab).

We need a different version of TensorFlow to use the TensorFlow Hub module, so we should also restart our runtime before proceeding.

In [0]:
try:
  # %tensorflow_version only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass
import tensorflow as tf
import numpy as np
import PIL.Image
import matplotlib.pyplot as plt

We need a few helper functions here to make our lives easier. 

*  We need to limit the maximum dimension of our images to 512 pixels, otherwise we run out of resources. We can do this by scaling the images as we load them.
*  We want to display the two images at once to see the styles being transferred.
*  We want to display the returned tensor as an image.
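The scaling rule from the first bullet can be checked on its own with plain Python. This sketch follows the same arithmetic as the `load_img` helper in this section (scale by `max_dim / longest side`, then truncate to integers); the function name `scaled_shape` is made up for illustration.

```python
def scaled_shape(height, width, max_dim=512):
    """Scale (height, width) so that the longer side becomes max_dim, keeping the aspect ratio."""
    scale = max_dim / max(height, width)
    return int(height * scale), int(width * scale)

landscape = scaled_shape(768, 1024)   # longer side is the width
portrait = scaled_shape(1024, 768)    # longer side is the height
tiny = scaled_shape(256, 128)         # smaller images are scaled up, too
```

Note that the rule always targets `max_dim`, so an image smaller than 512 pixels is enlarged rather than left alone.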


In [0]:
def load_img(path_to_img):
  max_dim = 512
  img = tf.io.read_file(path_to_img)
  img = tf.image.decode_image(img, channels=3)
  img = tf.image.convert_image_dtype(img, tf.float32)

  shape = tf.cast(tf.shape(img)[:-1], tf.float32)
  long_dim = max(shape)
  scale = max_dim / long_dim

  new_shape = tf.cast(shape * scale, tf.int32)

  img = tf.image.resize(img, new_shape)
  img = img[tf.newaxis, :]
  return img

def imshow(image, title=None):
  if len(image.shape) > 3:
    image = tf.squeeze(image, axis=0)

  plt.imshow(image)
  if title:
    plt.title(title)
    
def display_compare(content_image, style_image):
  plt.subplot(1, 2, 1)
  imshow(content_image, 'Content Image')

  plt.subplot(1, 2, 2)
  imshow(style_image, 'Style Image')

def tensor_to_image(tensor):
  tensor = tensor*255
  tensor = np.array(tensor, dtype=np.uint8)
  if np.ndim(tensor)>3:
    assert tensor.shape[0] == 1
    tensor = tensor[0]
  return PIL.Image.fromarray(tensor)

In [0]:
import tensorflow_hub as hub

content_path = tf.keras.utils.get_file('Gogh.jpg', 'https://i.imgur.com/gPvEBRn.jpg')
style_path = tf.keras.utils.get_file('Waves.jpg','https://i.imgur.com/TDSIlPY.jpg')

content_image = load_img(content_path)
style_image = load_img(style_path)

# content_image = load_img(style_path)
# style_image = load_img(content_path)

display_compare(content_image, style_image)


In [0]:
hub_module = hub.load('https://tfhub.dev/google/magenta/arbitrary-image-stylization-v1-256/1')
stylized_image = hub_module(tf.constant(content_image), tf.constant(style_image))[0]
tensor_to_image(stylized_image)

# Finally: Keep On Dreaming

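One thing that makes DeepDream so fascinating is that there are virtually no limits to what you can do with it. Just put everything in a loop and let it run.
It will keep modifying the image in new and interesting ways, ad infinitum. The result is kind of like Alice falling through the rabbit hole, discovering a strange new world along the way.

The zoom cell at the end of this section reuses `img0`, `T`, `resize`, `showarray` and `render_deepdream` from the DeepDream sections, so it needs the TensorFlow 1 runtime in which those cells were run, not the TensorFlow 2 runtime used for style transfer.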
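The "falling" effect comes from one small trick in the zoom loop of this section: after each dream pass the frame is enlarged and then cropped back to its original size around the centre. Here is a simplified sketch of that step, assuming an integer zoom factor and pixel repetition in place of the bilinear resize; `center_crop` and `zoom_in` are hypothetical helper names.

```python
import numpy as np

def center_crop(frame, out_h, out_w):
    """Cut an out_h x out_w window from the middle of the frame."""
    top = (frame.shape[0] - out_h) // 2
    left = (frame.shape[1] - out_w) // 2
    return frame[top:top + out_h, left:left + out_w]

def zoom_in(frame, factor=2):
    """Enlarge by an integer factor (pixel repetition), then crop back to the original size."""
    h, w = frame.shape[:2]
    big = frame.repeat(factor, axis=0).repeat(factor, axis=1)
    return center_crop(big, h, w)

frame = np.arange(16, dtype=float).reshape(4, 4)
zoomed = zoom_in(frame)
```

Only the middle of the frame survives each step, so details dreamed up near the centre keep growing while the edges slide out of view.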

In [0]:
layer = "mixed4c"  #@param ["mixed4d_3x3_bottleneck_pre_relu", "mixed3a", "mixed3b", "mixed4a", "mixed4c", "mixed5a"]
iter_n = 20 #@param {type:"slider", max: 50}
strength = 200 #@param {type:"slider", max: 1000}
zooming_steps = 20 #@param {type:"slider", max: 512}
zoom_factor = 1.4 #@param {type:"number"}
 
frame = img0
 
# newsize = np.int32(np.float32(frame.shape[:2])*zoom_factor)
# frame = resize(frame, newsize)
 
img_y, img_x, _ = img0.shape
for i in range(zooming_steps):
  frame = render_deepdream(tf.square(T(layer)), frame, False)
  # clear_output()
  showarray(frame)
  newsize = np.int32(np.float32(frame.shape[:2])*zoom_factor)
  frame = resize(frame, newsize)
  frame = frame[(newsize[0]-img_y)//2:(newsize[0]-img_y)//2+img_y,
                (newsize[1]-img_x)//2:(newsize[1]-img_x)//2+img_x,:] 
