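# 04 Image Style Transfer with Convolutional Neural Networks
## Principle
* Style transfer merges the style of a style image with the content of a content image: the generated image should be as close as possible to the content image in content, and as close as possible to the style image in style.
* This requires a **content loss function** and a **style loss function**, which are combined with weights into the total loss function.
* The procedure is as follows:
    * Generate a random image
    * In each iteration, adjust the image's pixel values according to the total loss function
    * After many iterations, obtain the optimized image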
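
The three steps above can be sketched on a toy problem with plain NumPy. This is only an illustration under stated assumptions: the two quadratic terms are hypothetical stand-ins for the real content and style losses, and `optimize_image`, `learning_rate`, and the target arrays are names invented for this sketch.

```python
import numpy as np

def optimize_image(content_target, style_target, alpha=1.0, beta=1.0,
                   learning_rate=0.1, iterations=200, seed=0):
    """Gradient descent on pixel values for a toy weighted loss (stand-in losses)."""
    rng = np.random.default_rng(seed)
    # Step 1: start from a random image.
    image = rng.uniform(0.0, 1.0, size=content_target.shape)
    for _ in range(iterations):
        # Step 2: gradient of the toy total loss
        #   alpha * sum((image - content_target)**2) + beta * sum((image - style_target)**2)
        grad = (2.0 * alpha * (image - content_target)
                + 2.0 * beta * (image - style_target))
        image -= learning_rate * grad
    # Step 3: the optimized image.
    return image

content_target = np.zeros((4, 4))
style_target = np.ones((4, 4))
result = optimize_image(content_target, style_target, alpha=1.0, beta=3.0)
print(result.round(3))
```

With these stand-in losses the pixels settle at the weighted average of the two targets, which shows how the two weights pull the result toward one image or the other.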
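## Content Loss Function
* Content loss cannot be measured simply by comparing pixels.
* Instead, the outputs of a CNN's convolutional layers serve as the representation of image content. Take VGG 19 as an example: it contains multiple convolutional layers and pooling layers, followed by fully connected layers at the end.<br/>
![jupyter](./VGG19.webp)
* This example uses the output of conv4_2 as the content representation and defines the content loss function as follows: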
$$ L_{content}(\vec{p}, \vec{x}, l) = \frac{1}{2} \sum_{i,j} \left( F_{ij}^{l} - P_{ij}^{l} \right)^2 $$
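
As a minimal sketch of the content loss, assuming the layer activations are already available as NumPy arrays (the function name `content_loss` and the toy matrices are hypothetical, chosen only for illustration):

```python
import numpy as np

def content_loss(generated_features, content_features):
    """Half the sum of squared differences between two activation arrays."""
    diff = generated_features - content_features
    return 0.5 * np.sum(diff ** 2)

# Toy activations: rows index feature maps, columns index positions.
P = np.array([[1.0, 2.0], [3.0, 4.0]])  # content image
F = np.array([[1.0, 0.0], [3.0, 2.0]])  # generated image
loss = content_loss(F, P)
print(loss)  # 4.0
```

Two entries differ by 2, so the sum of squares is 8 and the loss is 4.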

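## Style Loss Function
* Style is hard to pin down: it may involve brushstrokes, texture, structure, layout, use of color, and so on. Here the cross-correlations between the feature maps of a convolutional layer serve as the image's style. Take conv1_1 as an example:
    * It contains 64 feature maps, which is also the depth of the layer's output, or its number of channels
    * Each feature map is one interpretation of the previous layer's output, comparable to 64 people each understanding the same painting differently
    * These people may favor different styles, such as Impressionism, Modernism, Surrealism, or Expressionism
    * For an image in one style, some of them may admire it while others dislike it
    * For an image in another style, the preferences may be reversed
    * The differences among the 64 interpretations can be expressed by the cross-correlations of the feature maps, computed here with the Gram matrix
    * Different styles produce different cross-correlation results
* The Gram matrix is computed as follows. With 64 feature maps, the Gram matrix has size $64 \times 64$; the entry in row $ i $, column $ j $ is the cross-correlation between the $ i $-th and the $ j $-th feature map, computed as an inner product.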
$$ G_{ij}^{l} = \sum_{k} F_{ik}^{l} F_{jk}^{l} $$
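
The Gram matrix can be sketched in NumPy as follows. The `(height, width, channels)` layout and the function name `gram_matrix` are assumptions made for this example; each channel is flattened into a vector, and entry `[i, j]` is the inner product of channels `i` and `j`.

```python
import numpy as np

def gram_matrix(feature_maps):
    """Gram matrix of a (height, width, channels) activation array."""
    h, w, c = feature_maps.shape
    # Rows index spatial positions k, columns index feature maps.
    flat = feature_maps.reshape(h * w, c)
    # Entry [i, j] is the sum over k of flat[k, i] * flat[k, j].
    return flat.T @ flat

features = np.arange(12, dtype=np.float64).reshape(2, 2, 3)
G = gram_matrix(features)
print(G.shape)  # (3, 3)
print(G)
```

The result is symmetric, and its size depends only on the number of feature maps, not on the image resolution.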

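* The style loss function is defined as follows, weighting the differences in style representation across several convolutional layers: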
$$ E_l = \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G_{ij}^{l} - A_{ij}^{l} \right)^2 $$

$$ L_{style}(\vec{a}, \vec{x}) = \sum_{l=0}^{L} \omega_l E_l $$

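* Here the five convolutional layers `conv1_1`, `conv2_1`, `conv3_1`, `conv4_1`, and `conv5_1` are used to compute the style loss; different weights lead to different transfer results. (The deeper the layer, the more abstract its features.)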
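
A rough NumPy sketch of the per-layer style loss and its weighted sum over layers is shown below. The helper names (`layer_style_loss`, `style_loss`), the `(height, width, channels)` layout, and the tiny hand-made activations are all assumptions for illustration; a real run would take the activations from the network.

```python
import numpy as np

def gram_matrix(features):
    h, w, c = features.shape
    flat = features.reshape(h * w, c)
    return flat.T @ flat

def layer_style_loss(style_features, generated_features):
    """E_l: squared Gram-matrix difference scaled by 1 / (4 * N^2 * M^2)."""
    h, w, n = style_features.shape   # n = number of feature maps (N_l)
    m = h * w                        # m = size of each feature map (M_l)
    A = gram_matrix(style_features)
    G = gram_matrix(generated_features)
    return np.sum((G - A) ** 2) / (4.0 * n ** 2 * m ** 2)

def style_loss(style_by_layer, generated_by_layer, layer_weights):
    """Weighted sum of the per-layer losses."""
    return sum(w * layer_style_loss(style_by_layer[name], generated_by_layer[name])
               for name, w in layer_weights)

# Toy activations for two layers (1 x 1 spatial size, 2 feature maps).
style_acts = {'conv1_1': np.array([[[1.0, 0.0]]]), 'conv2_1': np.array([[[2.0, 2.0]]])}
generated_acts = {'conv1_1': np.array([[[0.0, 1.0]]]), 'conv2_1': np.array([[[2.0, 2.0]]])}
weights = [('conv1_1', 0.5), ('conv2_1', 1.0)]
total = style_loss(style_acts, generated_acts, weights)
print(total)  # 0.0625
```

Only `conv1_1` differs here: its Gram matrices differ in two entries by 1, giving 2 / 16 = 0.125, which the layer weight 0.5 scales to 0.0625.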
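## Total Loss Function
* The total loss function is the weighted sum of the content loss and the style loss; different weights lead to different transfer results.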
$$ L_{total}(\vec{p}, \vec{a}, \vec{x}) = \alpha L_{content}(\vec{p}, \vec{x}) + \beta L_{style}(\vec{a}, \vec{x}) $$
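
As a small worked example of the weighting, assuming the two losses have already been reduced to scalars (the function name `total_loss` and the loss values are made up for illustration; 5 and 100 are the content and style factors used in the script below):

```python
def total_loss(content_loss, style_loss, alpha, beta):
    """Weighted sum: alpha scales the content term, beta scales the style term."""
    return alpha * content_loss + beta * style_loss

# Hypothetical loss values.
balanced = total_loss(content_loss=2.0, style_loss=0.1, alpha=5, beta=100)
style_heavy = total_loss(content_loss=2.0, style_loss=0.1, alpha=5, beta=1000)
print(balanced, style_heavy)  # 20.0 110.0
```

Raising `beta` relative to `alpha` makes the style term dominate the gradient, so the optimizer favors matching the style image over preserving content.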

In [None]:
# -*- coding: utf-8 -*-

import tensorflow as tf
import numpy as np
import scipy.io
import scipy.misc  # scipy.io loads the VGG .mat weights; scipy.misc reads and saves images
import os
import time

# Print the current system time as a timestamp during training, which makes it easy to work out elapsed time.
def the_current_time():
	print(time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(int(time.time()))))

# Content image, style image, and output directory.
CONTENT_IMG = 'content.jpg'
STYLE_IMG = 'style5.jpg'
OUTPUT_DIR = 'neural_style_transfer_tensorflow/'

if not os.path.exists(OUTPUT_DIR):
	os.mkdir(OUTPUT_DIR)

# Width, height, and number of channels (3 for RGB) of the output image.
IMAGE_W = 800
IMAGE_H = 600
COLOR_C = 3

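# Weight of the noise image in the initial input image (a random noise layer blended with the content image), and the factors for the content and style losses.
# Note: in this script BETA scales the content loss and ALPHA scales the style loss (see total_loss), the reverse of the alpha/beta roles in the total loss formula.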
NOISE_RATIO = 0.7
BETA = 5
ALPHA = 100

VGG_MODEL = 'imagenet-vgg-verydeep-19.mat'
# Mean pixel values used by the pretrained VGG19 model, subtracted from input images during preprocessing.
MEAN_VALUES = np.array([123.68, 116.779, 103.939]).reshape((1, 1, 1, 3))

def load_vgg_model(path):
	'''
	Each layer of VGG-19 is identified by an integer index
	Details of the VGG19 model:
	- 0 is conv1_1 (3, 3, 3, 64)
	- 1 is relu
	- 2 is conv1_2 (3, 3, 64, 64)
	- 3 is relu    
	- 4 is maxpool
	- 5 is conv2_1 (3, 3, 64, 128)
	- 6 is relu
	- 7 is conv2_2 (3, 3, 128, 128)
	- 8 is relu
	- 9 is maxpool
	- 10 is conv3_1 (3, 3, 128, 256)
	- 11 is relu
	- 12 is conv3_2 (3, 3, 256, 256)
	- 13 is relu
	- 14 is conv3_3 (3, 3, 256, 256)
	- 15 is relu
	- 16 is conv3_4 (3, 3, 256, 256)
	- 17 is relu
	- 18 is maxpool
	- 19 is conv4_1 (3, 3, 256, 512)
	- 20 is relu
	- 21 is conv4_2 (3, 3, 512, 512)
	- 22 is relu
	- 23 is conv4_3 (3, 3, 512, 512)
	- 24 is relu
	- 25 is conv4_4 (3, 3, 512, 512)
	- 26 is relu
	- 27 is maxpool
	- 28 is conv5_1 (3, 3, 512, 512)
	- 29 is relu
	- 30 is conv5_2 (3, 3, 512, 512)
	- 31 is relu
	- 32 is conv5_3 (3, 3, 512, 512)
	- 33 is relu
	- 34 is conv5_4 (3, 3, 512, 512)
	- 35 is relu
	- 36 is maxpool
	- 37 is fullyconnected (7, 7, 512, 4096)
	- 38 is relu
	- 39 is fullyconnected (1, 1, 4096, 4096)
	- 40 is relu
	- 41 is fullyconnected (1, 1, 4096, 1000)
	- 42 is softmax
	'''
	vgg = scipy.io.loadmat(path)
	vgg_layers = vgg['layers'] # all layers of the model

	def _weights(layer, expected_layer_name):
		W = vgg_layers[0][layer][0][0][2][0][0]
		b = vgg_layers[0][layer][0][0][2][0][1]
		layer_name = vgg_layers[0][layer][0][0][0][0]
		assert layer_name == expected_layer_name
		return W, b

	def _conv2d_relu(prev_layer, layer, layer_name):
		W, b = _weights(layer, layer_name)
		W = tf.constant(W)
		b = tf.constant(np.reshape(b, (b.size)))
		return tf.nn.relu(tf.nn.conv2d(prev_layer, filter=W, strides=[1, 1, 1, 1], padding='SAME') + b)

	def _avgpool(prev_layer):
		return tf.nn.avg_pool(prev_layer, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
	# graph: dictionary holding the output of each VGG19 layer. The network is built layer by layer, each step being conv + ReLU or pooling.

	graph = {}
	graph['input']    = tf.Variable(np.zeros((1, IMAGE_H, IMAGE_W, COLOR_C)), dtype='float32')
	graph['conv1_1']  = _conv2d_relu(graph['input'], 0, 'conv1_1')
	graph['conv1_2']  = _conv2d_relu(graph['conv1_1'], 2, 'conv1_2')
	graph['avgpool1'] = _avgpool(graph['conv1_2'])
	graph['conv2_1']  = _conv2d_relu(graph['avgpool1'], 5, 'conv2_1')
	graph['conv2_2']  = _conv2d_relu(graph['conv2_1'], 7, 'conv2_2')
	graph['avgpool2'] = _avgpool(graph['conv2_2'])
	graph['conv3_1']  = _conv2d_relu(graph['avgpool2'], 10, 'conv3_1')
	graph['conv3_2']  = _conv2d_relu(graph['conv3_1'], 12, 'conv3_2')
	graph['conv3_3']  = _conv2d_relu(graph['conv3_2'], 14, 'conv3_3')
	graph['conv3_4']  = _conv2d_relu(graph['conv3_3'], 16, 'conv3_4')
	graph['avgpool3'] = _avgpool(graph['conv3_4'])
	graph['conv4_1']  = _conv2d_relu(graph['avgpool3'], 19, 'conv4_1')
	graph['conv4_2']  = _conv2d_relu(graph['conv4_1'], 21, 'conv4_2')
	graph['conv4_3']  = _conv2d_relu(graph['conv4_2'], 23, 'conv4_3')
	graph['conv4_4']  = _conv2d_relu(graph['conv4_3'], 25, 'conv4_4')
	graph['avgpool4'] = _avgpool(graph['conv4_4'])
	graph['conv5_1']  = _conv2d_relu(graph['avgpool4'], 28, 'conv5_1')
	graph['conv5_2']  = _conv2d_relu(graph['conv5_1'], 30, 'conv5_2')
	graph['conv5_3']  = _conv2d_relu(graph['conv5_2'], 32, 'conv5_3')
	graph['conv5_4']  = _conv2d_relu(graph['conv5_3'], 34, 'conv5_4')
	graph['avgpool5'] = _avgpool(graph['conv5_4'])
	return graph

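def content_loss_func(sess, model):
	# p: conv4_2 activations of the content image, evaluated now as a NumPy array
	# x: conv4_2 tensor of the image being optimized, still symbolic
	def _content_loss(p, x):
		N = p.shape[3]               # number of feature maps
		M = p.shape[1] * p.shape[2]  # height * width of each feature map
		# Scaled by 1 / (4 * N * M) instead of the 1/2 in the content loss formula; this only rescales the loss
		return (1.0 / (4 * N * M)) * tf.reduce_sum(tf.pow(x - p, 2))
	return _content_loss(sess.run(model['conv4_2']), model['conv4_2'])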

STYLE_LAYERS = [('conv1_1', 0.5), ('conv2_1', 1.0), ('conv3_1', 1.5), ('conv4_1', 3.0), ('conv5_1', 4.0)]

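def style_loss_func(sess, model):
	def _gram_matrix(F, N, M):
		# Flatten to (positions, feature maps); Ft^T * Ft is the N x N Gram matrix
		Ft = tf.reshape(F, (M, N))
		return tf.matmul(tf.transpose(Ft), Ft)

	def _style_loss(a, x):
		N = a.shape[3]               # number of feature maps
		M = a.shape[1] * a.shape[2]  # height * width of each feature map
		A = _gram_matrix(a, N, M)    # style image
		G = _gram_matrix(x, N, M)    # image being optimized
		return (1.0 / (4 * N ** 2 * M ** 2)) * tf.reduce_sum(tf.pow(G - A, 2))

	# Weighted sum of the per-layer losses over STYLE_LAYERS
	return sum([_style_loss(sess.run(model[layer_name]), model[layer_name]) * w for layer_name, w in STYLE_LAYERS])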

def generate_noise_image(content_image, noise_ratio=NOISE_RATIO):
	# Blend uniform noise in [-20, 20) with the content image to form the starting image
	noise_image = np.random.uniform(-20, 20, (1, IMAGE_H, IMAGE_W, COLOR_C)).astype('float32')
	input_image = noise_image * noise_ratio + content_image * (1 - noise_ratio)
	return input_image

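def load_image(path):
	# scipy.misc.imread / imresize (and imsave below) were deprecated in SciPy 1.0 and removed in later releases,
	# so this cell needs an older SciPy or an equivalent image library.
	image = scipy.misc.imread(path)
	image = scipy.misc.imresize(image, (IMAGE_H, IMAGE_W))
	image = np.reshape(image, ((1, ) + image.shape))  # add a batch dimension: (1, H, W, C)
	image = image - MEAN_VALUES  # subtract the per-channel mean used by VGG
	return image

def save_image(path, image):
	image = image + MEAN_VALUES  # undo the mean subtraction
	image = image[0]             # drop the batch dimension
	image = np.clip(image, 0, 255).astype('uint8')
	scipy.misc.imsave(path, image)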

the_current_time()

with tf.Session() as sess:
	content_image = load_image(CONTENT_IMG)
	style_image = load_image(STYLE_IMG)
	model = load_vgg_model(VGG_MODEL)

	input_image = generate_noise_image(content_image)
	sess.run(tf.global_variables_initializer())

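	# Feed the content image so that sess.run inside content_loss_func captures its conv4_2 activations as fixed values
	sess.run(model['input'].assign(content_image))
	content_loss = content_loss_func(sess, model)

	# Do the same with the style image for the style layers
	sess.run(model['input'].assign(style_image))
	style_loss = style_loss_func(sess, model)

	total_loss = BETA * content_loss + ALPHA * style_loss
	optimizer = tf.train.AdamOptimizer(2.0)
	# The VGG weights are constants, so the only variable being optimized is model['input'], i.e. the image itself
	train = optimizer.minimize(total_loss)

	# Initialize again so that Adam's internal variables exist, then load the noisy starting image
	sess.run(tf.global_variables_initializer())
	sess.run(model['input'].assign(input_image))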

	ITERATIONS = 2000
	for i in range(ITERATIONS):
		sess.run(train)
		if i % 100 == 0:
			output_image = sess.run(model['input'])
			the_current_time()
			print('Iteration %d' % i)
			print('Cost: ', sess.run(total_loss))

			save_image(os.path.join(OUTPUT_DIR, 'output_%d.jpg' % i), output_image)


In [None]:
import tensorflow as tf

cifar = tf.keras.datasets.cifar100
(x_train, y_train), (x_test, y_test) = cifar.load_data()
model = tf.keras.applications.ResNet50(
    include_top=True,
    weights=None,
    input_shape=(32, 32, 3),
    classes=100,)

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
model.compile(optimizer="adam", loss=loss_fn, metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=64)

In [None]:
# Import TensorFlow and print its version to check that the installation succeeded
import tensorflow as tf
print(tf.__version__)
