# Deep Convolutional Generative Adversarial Network (DCGAN)
## Tutorial

## 1. Introduction

This tutorial guides you through building a Deep Convolutional Generative Adversarial Network (DCGAN), i.e. a GAN built from convolutional layers.

Our aim is to generate samples that resemble as closely as possible those in the training set, by mapping vectors drawn from a latent space to images.

A DCGAN consists of two competing networks: a Generator and a Discriminator. To achieve our goal, we train their parameters to minimize two loss functions, $\mathcal{L}_{D}$ and $\mathcal{L}_{G}$, for the Discriminator and the Generator respectively.

In this notebook we focus on the application of generative methods in Machine Learning to a proxy dataset for the output of a Monte Carlo simulation in a High Energy Physics experiment.

For such datasets, the common metrics used to gauge the performance of a GAN (such as the Discriminator accuracy or the loss values) are not the only relevant ones: other statistical information matters too, for example the distributions of the variables that the dataset embeds.

For a comprehensive treatment of DCGANs refer to:
1. https://arxiv.org/abs/1406.2661
2. https://arxiv.org/abs/1606.03498v1



### 1.1 Important imports

In [None]:
import numpy as np
import pickle
import os
import tensorflow as tf
import matplotlib.pyplot as plt
import time
import datetime
from tensorflow.keras import layers, models
import tensorflow.keras.backend as kb


from IPython import display
import warnings; warnings.simplefilter('ignore')

# Load the TensorBoard notebook extension
%load_ext tensorboard

In [None]:
#get some utils from:
!git clone https://github.com/dlanci/UZHMLWorkshop2020-GAN
os.chdir('UZHMLWorkshop2020-GAN/')
from utils.utils import load_dataset
from utils.utils import generate_and_save_images

In [None]:
#load the dataset file
!wget -q --show-progress -O batch0.pickle "https://www.dropbox.com/s/wn8ilvp8k67grz4/batch0.pickle?dl=0";
!wget -q --show-progress -O batch1.pickle "https://www.dropbox.com/s/5vjme0o8drbi0v4/batch1.pickle?dl=0";
!wget -q --show-progress -O batch2.pickle "https://www.dropbox.com/s/rz2b8c4911kb4iy/batch2.pickle?dl=0";
!wget -q --show-progress -O batch3.pickle "https://www.dropbox.com/s/2wa94zzt7wq2002/batch3.pickle?dl=0";
!wget -q --show-progress -O batch4.pickle "https://www.dropbox.com/s/icntmsval8nync5/batch4.pickle?dl=0";

## 2. The dataset

### 2.1 Load the dataset

In [None]:
tuple_, tot_evts = load_dataset(path='.', test=False)

### 2.2 And explore it

We just loaded a Python $\texttt{dict}$. In this notebook we will use the NumPy $\texttt{ndarray}$ stored under the key <span style="color:red">'reco_imgs'</span>. This tensor has shape:

$$
tuple\_['reco\_imgs'].shape = [tot\_evts,
                                X\_pixels, 
                                Y\_pixels,
                                1]
$$

It contains $tot\_evts$ images whose pixels map the energy deposit of a particle in the calorimeter. Note that the number of channels of each image is 1, since the energy deposited in each pixel is a scalar.



### The LHCb Hadron Calorimeter
---------------
![Center](./nb_img/calo.jpeg)

from https://cds.cern.ch/record/1293073/files/Guz.pdf

In [None]:

X_pixels=48   # n of cells in the horizontal direction
Y_pixels=48   # n of cells in the vertical direction


#Use slicing to divide the loaded dataset 
#in train and test subsamples such that the train_set
#contains 90% of the original sample and the test_set
#the remainder
#(the builtin int replaces np.int, which was removed in NumPy 1.24)

X_train=tuple_['reco_imgs'][0:int(tot_evts*0.9)]
X_test=tuple_['reco_imgs'][int(tot_evts*0.9):tot_evts]



# ___________________________________________________________________
#now let's normalise the energy deposit per cell so that
#it reaches 1 as a maximum value

maxval=X_train[np.where(X_train!=0)].max()

X_train_norm = (X_train)/(maxval)
X_test_norm = (X_test)/(maxval)


# ___________________________________________________________________
#and let's take a look at one of those images, and
#the total energy deposit per image distribution 

plt.subplot(1,2,1)
plt.imshow(X_train_norm[2].reshape(X_pixels,Y_pixels))
plt.xlabel('X', fontsize=15)
plt.ylabel('Y', fontsize=15)
plt.title('A sample image', fontsize=15)
plt.colorbar()
plt.subplot(1,2,2)
plt.hist(np.sum(X_train_norm,axis=(1,2,3)),bins=100);
plt.xlabel('E (a.u.)', fontsize=15)
plt.ylabel('dN/dE', fontsize=15)
plt.title('Total energy deposit per image', fontsize=15)
fig = plt.gcf()
fig.set_size_inches(16,5)

### 2.3 Tensorflow Datasets


TensorFlow 2.x handles the data input pipeline, i.e. the batching and shuffling of the dataset (as well as many other useful utilities such as reshaping and one-hot encoding), through the $\texttt{tf.data.Dataset}$ class.

(See https://www.tensorflow.org/api_docs/python/tf/data/Dataset)

In [None]:
# Let's fix some numbers
BATCH_SIZE = 32
BUFFER_TRAIN_SIZE = int(tot_evts*0.9)   # builtin int: np.int was removed in NumPy 1.24
BUFFER_TEST_SIZE = int(tot_evts*0.1)


# ___________________________________________________________________
# Our dataset tensor is sliced along the first dimension, 
# (i.e. the # of events dimension) with from_tensor_slices:

train_dataset = tf.data.Dataset.from_tensor_slices(X_train_norm)
test_dataset  = tf.data.Dataset.from_tensor_slices(X_test_norm)

print("The full train dataset:              ",train_dataset)

# ___________________________________________________________________
# The sliced tensor is then shuffled, we set the BUFFER_SIZE as 
# the shape[0] of the X_train, and the option of reshuffling
# at every epoch.

train_dataset = train_dataset.shuffle(BUFFER_TRAIN_SIZE, 
                                      reshuffle_each_iteration=True)
test_dataset = test_dataset.shuffle(BUFFER_TEST_SIZE, 
                                      reshuffle_each_iteration=True)

print("The shuffled train dataset:          ",train_dataset)

# ___________________________________________________________________
# We divide our dataset tensor in batches of BATCH_SIZE images, and
# drop the leftover events that do not fill a complete batch


train_dataset = train_dataset.batch(BATCH_SIZE, 
                                    drop_remainder=True)
test_dataset = test_dataset.batch(BATCH_SIZE, 
                                    drop_remainder=True)

print("The batched train dataset:           ",train_dataset)


# ___________________________________________________________________
# elements in the dataset can still be accessed
# through an iterator 

it = iter(train_dataset)
img = next(it)
plt.imshow(img.numpy()[0].reshape(48,48))
plt.xlabel('X', fontsize=15)
plt.ylabel('Y', fontsize=15)
plt.title('A sample image', fontsize=15)
plt.colorbar()

## 3. Model building

### 3.1 Building the Generator model

TensorFlow 2.x adopts Keras as its high-level API. In this notebook we use the functional API, which creates models that are more flexible than those of the $\texttt{tf.keras.Sequential}$ API **[1]**. The functional API can handle models with non-linear topology, shared layers, and even multiple inputs or outputs.

We'll define two <span style="color:blue">functions</span>, each returning a model: one for the generator (G) and one for the discriminator (D). Let's start with G.

The generator is an upsampling network that maps points in the latent space to higher-dimensional objects, and in our case also to higher-rank objects, since the outputs are (48x48) images. The key operation of an upsampling network is the transposed convolution (often called deconvolution) **[2, 3]**, which we implement with $\texttt{Conv2DTranspose}$ **[4]**.


![Center](./nb_img/GAN/gen.jpeg)

**Useful references:**

* [1] https://www.tensorflow.org/guide/keras/functional

* [2] A guide to the arithmetic of convolution
https://arxiv.org/pdf/1603.07285v1.pdf

* [3] A comprehensive treatment of the Deconvolution operation:
https://distill.pub/2016/deconv-checkerboard/

* [4] https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2DTranspose
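As a rough check of the upsampling arithmetic discussed in [2]: with `padding='same'` (and no explicit output padding), a Keras `Conv2DTranspose` layer returns a spatial size equal to the input size times the stride, whatever the kernel size. The short sketch below applies this rule to a 12x12 starting grid and the strides (2, 2, 1) used by the generator in the next cell. The helper name `same_transpose_out` is hypothetical and only serves the illustration.

```python
def same_transpose_out(size, stride):
    """Spatial output size of a transposed convolution with padding='same'."""
    return size * stride


size = 12                    # side of the reshaped dense output (12 x 12 x 64)
sizes = [size]
for stride in (2, 2, 1):     # strides of the three Conv2DTranspose layers
    size = same_transpose_out(size, stride)
    sizes.append(size)

print(sizes)  # [12, 24, 48, 48]
```

The final size matches the 48x48 images of the dataset, which is why the dense layer is reshaped to a 12x12 grid.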

In [None]:
def make_generator_model(z_dim):
    
    n_nodes = 12*12
    
        
    z_input_init = layers.Input((z_dim,)) # As opposed to the Sequential API, the functional API 
                                          # requires an explicit layers.Input; its argument
                                          # (z_dim,) is the shape of the latent
                                          # space input vector
    
                           #output dim, # no bias b
    z_input = layers.Dense(64 * n_nodes, use_bias=False)(z_input_init) # Here we propagate the input
    z_input = layers.LeakyReLU(alpha=0.2)(z_input)                     # through dense connected layers    
    z_input = layers.Reshape((12, 12, 64))(z_input)                    # and reshape it as a rank 3 tensor
        
    # note that the 1st dimension (batch_size) is not specified
    

    hid = layers.Conv2DTranspose(filters=32,                           # number of output channels
                                 kernel_size=(4,4),                    # size of the convolution kernel              
                                 strides=(2,2),                        
                                 padding='same',
                                 use_bias=False)(z_input)
    #print(hid.shape)

    
    hid=layers.BatchNormalization()(hid)
    hid=layers.LeakyReLU(alpha=0.2)(hid)
    
    
    hid = layers.Conv2DTranspose(filters=16,
                                 kernel_size=(5,5), 
                                 strides=(2,2), 
                                 padding='same',
                                 use_bias=False)(hid)
    
    #print(hid.shape)
    
    hid=layers.BatchNormalization()(hid)
    hid=layers.LeakyReLU(alpha=0.2)(hid)    
    
    out = layers.Conv2DTranspose(filters=1,
                                 kernel_size=(5,5), 
                                 strides=(1,1), 
                                 padding='same',
                                 use_bias=False,
                                 activation='sigmoid')(hid)

    #print(hid.shape)
    
    model = models.Model(z_input_init, outputs=out)   #specify inputs and outputs of your model
    
    return model
    

In [None]:
noise_dim=128                                      # dimension of the latent vector for the generator
generator = make_generator_model(z_dim=noise_dim)  # let's create the generator and print
generator.summary()                                # a useful summary of the created layers

### 3.2 Building the Discriminator model


The discriminator is a downsampling network that is trained to tell true images apart from fake ones. The true images are the dataset images, while the fake images are the ones produced by the generator. The key operation to downsample and extract important features of the image is the convolution:





![Center](./nb_img/GAN/disc.jpeg)

**Useful references**

* [1] https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense
* [2] https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D
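The downsampling arithmetic can be checked by hand in the same spirit. With `padding='same'`, a Keras `Conv2D` layer returns a spatial size of `ceil(input / stride)`. The sketch below assumes the two stride-2 convolutions and the 128 output filters of the discriminator defined further down; `same_conv_out` is a hypothetical helper name.

```python
import math


def same_conv_out(size, stride):
    """Spatial output size of a convolution with padding='same'."""
    return math.ceil(size / stride)


size = 48                      # input images are 48 x 48 x 1
for stride in (2, 2):          # strides of the two Conv2D layers
    size = same_conv_out(size, stride)

n_filters_last = 128           # filters of the second Conv2D layer
n_features = size * size * n_filters_last

print(size, n_features)  # 12 18432
```

The flattened vector fed to the dense layers therefore has 18432 entries per image.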

### 3.2.1 Minibatch discrimination

Minibatch discrimination is an important method to prevent the Generator from collapsing to a parameter setting where it always emits the same point. Mode collapse of the Generator is often connected to the Discriminator returning the same output for similar inputs: all Generator outputs then race toward a single point that the Discriminator currently believes is highly realistic.

Minibatch discrimination lets the Discriminator look at several samples at once, so that it can detect Generator outputs that are too similar to each other and thereby push the Generator towards more diverse samples.

Let $f(x_{i}) \in \mathbb{R}^{A}$ denote the output of some intermediate layer of the discriminator ($i$ runs over the batch and $A$ is the feature dimension). Multiplying it by a tensor $T \in \mathbb{R}^{A \times B \times C}$ we obtain a matrix $M_{i} \in \mathbb{R}^{B \times C}$. We compute the $L_{1}$-distance between the rows $M_{i,b}$ of the resulting matrices across the $n$ samples of the batch and apply a negative exponential:

$$
o(x_{i})_{b} = \sum_{j=1}^{n} \exp( -|| M_{i,b} - M_{j,b}||_{L_{1}})
$$



The task of the discriminator is thus effectively still to classify single examples as real data or generated data, but it is now able to use the other examples in the minibatch as side information. 
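A minimal NumPy sketch of this computation may help to follow the shapes. It is not the Keras code of the next cell: the function name `minibatch_features` and the random inputs are assumptions made for the illustration, and the tensor $T$ is drawn at random here, whereas in a network it would be learned.

```python
import numpy as np


def minibatch_features(f, T):
    """Minibatch discrimination features.

    f : (n, A) intermediate features of the n samples in the batch
    T : (A, B, C) tensor (random here, trainable in a real model)
    Returns o : (n, B), one similarity score per sample and per row b.
    """
    M = np.einsum('na,abc->nbc', f, T)             # matrices M_i, shape (n, B, C)
    diffs = M[:, None, :, :] - M[None, :, :, :]    # M_i - M_j, shape (n, n, B, C)
    l1 = np.abs(diffs).sum(axis=-1)                # row-wise L1 distances, (n, n, B)
    return np.exp(-l1).sum(axis=1)                 # sum over j, shape (n, B)


rng = np.random.default_rng(0)
f = rng.normal(size=(4, 8))       # n = 4 samples, A = 8 features
T = rng.normal(size=(8, 3, 5))    # B = 3 rows, C = 5 columns
o = minibatch_features(f, T)
print(o.shape)  # (4, 3)
```

Since the sum includes the term $j = i$, every entry is at least 1, and a batch of identical samples gives the maximum value $n$: this is the signal the discriminator can use to spot a collapsed generator.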

In [None]:
def make_discriminator_model():
    

    
    image_input_layer = layers.Input((X_pixels,Y_pixels,1))                     # Again define the input
    
    hid = layers.Conv2D(64, kernel_size=4, strides=2, padding='same')(image_input_layer) # Example usage
    hid = layers.LeakyReLU(alpha=0.2)(hid)                                               # of a convolutional
    hid = layers.Dropout(0.3)(hid)                                                       # layer
    
    #print(hid.output)
    
    hid = layers.Conv2D(128, kernel_size=4, strides=2, padding='same')(hid)
    hid = layers.LeakyReLU(alpha=0.2)(hid)
    hid = layers.Dropout(0.3)(hid)
    
    #print(hid.output)
    
    feature = layers.Flatten()(hid)     #Here I call the output "feature" for future needs
    
    hid = layers.Dense(32)(feature)
    hid = layers.LeakyReLU(alpha=0.2)(hid)
    
    
    
    """
    #MINIBATCH DISCRIMINATION
    
    #dimensions of minibatch matrix
    n_kernels=12
    dim_kernel=12
    
    mb_discr = layers.Dense(n_kernels*dim_kernel)(hid)
    mb_discr = layers.Reshape((n_kernels,dim_kernel))(mb_discr)  #Matrix M
    
    diffs = kb.expand_dims(mb_discr, 3)-kb.expand_dims(kb.permute_dimensions(mb_discr, [1, 2, 0]), 0)
    abs_diffs = kb.sum(kb.abs(diffs), axis=2)
    
    minibatch_features = kb.sum(kb.exp(-abs_diffs),2)
    print(minibatch_features.shape)# (None, n_kernels)
    hid=layers.Concatenate()([hid, minibatch_features])
    """
    
    out = layers.Dense(1)(hid) #note that we don't activate the last layer of the discriminator
                               #as the loss function we will use requires the unactivated output
    
    model = models.Model(inputs=image_input_layer, outputs=[out, feature])
    
    
    return model
    

In [None]:
discriminator = make_discriminator_model()
discriminator.summary()

### 3.3 Tensorflow 2.x: eager execution

TensorFlow 2.x evaluates operations immediately (eager execution) **[1]**, without building graphs: operations return concrete values instead of constructing a computational graph to run later (as was the case in tf 1.X). This makes it easy to get started with TensorFlow and to debug models. In our case we can visualize an output of the untrained generator and the corresponding (unactivated) output of the discriminator:

**Useful references**

* [1] https://www.tensorflow.org/guide/eager


In [None]:
noise = tf.random.normal((1, noise_dim))
generated_image = generator(noise, training=False)
plt.imshow(generated_image[0,:,:,0])

In [None]:
decision, _ = discriminator(generated_image)
print (decision)

## 4 Definition of the Losses

In the next step we will define the functions to be minimized during the training process. The two models (G,D) participate in a non-cooperative game and try to minimize two different loss functions:


### 4.1 Discriminator Loss

The Discriminator's task is to correctly tell images of the original dataset apart from images created by the Generator network. In other words, we want to minimize the cross-entropy cost [1] of the Discriminator's predictions:

$$
\mathcal{L}_{D} = - \underbrace{\log(D(x))}_{\mathcal{L}^{\text{true images}}_{D}} - \underbrace{\log(1-D(G(z)))}_{\mathcal{L}^{\text{fake images}}_{D}}
$$

Where $D(x)$ represents the activated output of the discriminator on a sample from the true images dataset and $D(G(z))$ represents the activated output of the discriminator on a sample produced by the generator.


**Note:**  $\mathcal{L}^{\text{true images}}_{D}$ is minimum as $D(x)$ approaches 1 and conversely $\mathcal{L}^{\text{fake images}}_{D}$ is minimum as $D(G(z))$ approaches 0
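To make the formula concrete, here is a NumPy sketch of the same loss evaluated on unactivated outputs (logits). It uses the numerically stable expression `max(x, 0) - x*z + log(1 + exp(-|x|))` documented for `tf.nn.sigmoid_cross_entropy_with_logits` in [1]; the function names and the toy inputs are assumptions for the illustration.

```python
import numpy as np


def sigmoid_cross_entropy_with_logits(logits, labels):
    """Stable element-wise cross-entropy between sigmoid(logits) and labels."""
    x = np.asarray(logits, dtype=float)
    z = np.asarray(labels, dtype=float)
    return np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))


def discriminator_loss_np(real_logits, fake_logits):
    """-log D(x) - log(1 - D(G(z))), averaged over the batch."""
    real = sigmoid_cross_entropy_with_logits(real_logits, np.ones_like(real_logits))
    fake = sigmoid_cross_entropy_with_logits(fake_logits, np.zeros_like(fake_logits))
    return real.mean() + fake.mean()


# An undecided discriminator (D = 0.5 everywhere) pays 2*log(2)
undecided = discriminator_loss_np(np.zeros(4), np.zeros(4))
# A confident and correct discriminator pays almost nothing
confident = discriminator_loss_np(np.full(4, 10.0), np.full(4, -10.0))
print(undecided, confident)
```

Working on logits rather than on activated outputs avoids evaluating `log(0)` when the discriminator becomes very confident.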


### 4.1.1 One sided label smoothing

It was shown in [2] that replacing the classifier targets (0, 1) with one-sided smoothed values $(0, 1-\epsilon)$ improves the convergence of the network. For this reason the discriminator and generator losses below accept a small $\texttt{eps}$: the smoothed labels $\texttt{(1-eps)}$ are provided as a commented-out alternative to the plain targets.

**Useful links**

* [1] https://www.tensorflow.org/api_docs/python/tf/nn/sigmoid_cross_entropy_with_logits
* [2] https://arxiv.org/abs/1606.03498v1
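One way to see what the smoothing does, sketched here under the assumption of a single logit and a target $1-\epsilon$: with a target of exactly 1 the loss keeps decreasing as the logit grows, so the discriminator is pushed towards ever larger outputs, while with a smoothed target the loss has its minimum at the finite logit $\log((1-\epsilon)/\epsilon)$. The helper name `bce_from_logit` and the grid scan are illustrative choices only.

```python
import numpy as np


def bce_from_logit(x, target):
    """Binary cross-entropy of a logit x against a (possibly smoothed) target."""
    return np.maximum(x, 0) - x * target + np.log1p(np.exp(-np.abs(x)))


eps = 1e-3
x_grid = np.linspace(-10.0, 20.0, 30001)     # logits scanned in steps of 0.001

# Smoothed target: the minimum sits at a finite logit
x_best_smooth = x_grid[np.argmin(bce_from_logit(x_grid, 1.0 - eps))]
x_expected = np.log((1.0 - eps) / eps)

# Plain target: the loss is still decreasing at the edge of the grid
x_best_plain = x_grid[np.argmin(bce_from_logit(x_grid, 1.0))]

print(x_best_smooth, x_expected, x_best_plain)
```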






In [None]:
def discriminator_loss(real_output, fake_output, eps=1e-3):

    real_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
                logits=real_output,
                labels=tf.ones_like(real_output)))          #plain targets (no smoothing)
                #labels=(1-eps)*tf.ones_like(real_output))) #alternative: one sided label smoothing
    
    
    fake_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
                logits=fake_output,
                labels=tf.zeros_like(fake_output)))
    
    total_loss = real_loss + fake_loss
    
    return total_loss


discriminator_optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)   #define the minimization algorithm here

### 4.2 Generator Loss

Following the same reasoning that we used for the Discriminator we maximise the probability for the Generator to "fool" the discriminator and generate samples that are similar to the original dataset by minimizing the loss:

$$
\mathcal{L}_{G} = - \underbrace{\log(D(G(z)))}_{\mathcal{L}^{\text{fake images}}_{G}}
$$


**Note:** The Generator never sees what a sample from the true dataset looks like: it is trained only on the output of the Discriminator.
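In terms of the unactivated discriminator output $x$ on a fake image, $-\log(D(G(z))) = \log(1 + e^{-x})$, whose derivative with respect to $x$ is $\sigma(x) - 1$. The NumPy sketch below (function names are assumptions for the illustration) evaluates both: the gradient stays close to $-1$ when the discriminator confidently rejects the fake sample, so the generator keeps receiving a learning signal early in training.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def generator_loss_np(fake_logits):
    """-log(sigmoid(x)) averaged over the batch, in a numerically stable form."""
    x = np.asarray(fake_logits, dtype=float)
    return (np.maximum(-x, 0) + np.log1p(np.exp(-np.abs(x)))).mean()


def generator_loss_grad(fake_logits):
    """Per-sample derivative of -log(sigmoid(x)) with respect to the logit x."""
    return sigmoid(np.asarray(fake_logits, dtype=float)) - 1.0


for logit in (-5.0, 0.0, 5.0):
    print(logit, generator_loss_np([logit]), generator_loss_grad([logit])[0])
```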

### 4.2.1 Feature matching

Feature matching is a technique adopted to address the instability of GANs by specifying a new loss function for the Generator. Instead of directly minimizing $\mathcal{L}_{G}$, the new loss function requires the Generator to match the expected value of the features on an intermediate layer of the discriminator (the $\texttt{feature}$ output defined in section 3.2).
The new loss of the generator then becomes:

$$
\mathcal{L}_{G} = || \mathbb{E}_{x \sim p_{data}}f(x) - \mathbb{E}_{z \sim p_{z}} f(G(z))||^{2}_{2}
$$
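A literal NumPy transcription of this formula, given as a sketch: the expectations are estimated by averaging the features over the batch, and the squared $L_2$ norm of the difference of the two mean vectors is returned. The function name `feature_matching_loss` and the toy arrays are assumptions. Note that the commented-out `generator_loss_FEATURE` in the next cell computes a different quantity (the root mean square of the element-wise differences), so the two are not interchangeable.

```python
import numpy as np


def feature_matching_loss(true_features, fake_features):
    """|| mean_x f(x) - mean_z f(G(z)) ||_2^2 with batch-averaged features.

    Both inputs have shape (batch_size, n_features).
    """
    mean_true = np.mean(true_features, axis=0)   # estimate of E_x f(x)
    mean_fake = np.mean(fake_features, axis=0)   # estimate of E_z f(G(z))
    return np.sum((mean_true - mean_fake) ** 2)


rng = np.random.default_rng(1)
true_f = rng.normal(size=(32, 10))
fake_f = true_f[::-1]                 # same samples in a different order

print(feature_matching_loss(true_f, fake_f))            # ~0: only the means matter
print(feature_matching_loss(np.ones((32, 10)), np.zeros((32, 10))))  # 10.0
```

Because only batch means enter the loss, the pairing between real and generated samples inside the batch is irrelevant.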




In [None]:
def generator_loss_GAN(fake_output, eps=1e-3):
    gan_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
                logits=fake_output,
                labels=tf.ones_like(fake_output))) 
                #labels=(1-eps)*tf.ones_like(fake_output))) #one sided label smoothing
    return gan_loss
    
    
"""

FEATURE MATCHING


def generator_loss_FEATURE(true_features, fake_features):
    gan_loss= tf.sqrt(tf.reduce_mean(tf.pow(true_features-fake_features,2)))
    return gan_loss

"""    

generator_optimizer = tf.keras.optimizers.Adam(learning_rate=3e-4)   #define the minimization algorithm here

## 5 Set up the training cycle

### 5.1 Tensorboard for metrics inspection

Tensorboard is a useful tool to gather and compare the common metrics used to gauge the performance of our networks. In the next lines we set up the objects it needs.

**Useful links**

* Tensorboard toolkit https://www.tensorflow.org/tensorboard
* Keras metrics module https://www.tensorflow.org/api_docs/python/tf/keras/metrics

In [None]:
# Define our metrics

d_train_loss = tf.keras.metrics.Mean('d_train_loss', dtype=tf.float32)
g_train_loss = tf.keras.metrics.Mean('g_train_loss', dtype=tf.float32)
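# NOTE: despite their names, the next two metrics accumulate the binary
# cross-entropy of the discriminator logits, not a classification accuracy
d_train_accuracy_on_real = tf.keras.metrics.BinaryCrossentropy('train_accuracy_on_real', from_logits=True)
d_train_accuracy_on_fake = tf.keras.metrics.BinaryCrossentropy('train_accuracy_on_fake', from_logits=True)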

current_time = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
train_log_dir = './GAN/logs/gradient_tape/' + current_time + '/train'
train_summary_writer = tf.summary.create_file_writer(train_log_dir)



### 5.2 The train step function

In this step we define the function that performs a single training step. In tf 1.X, this is where you would define $\texttt{tf.Session}$ and $\texttt{tf.placeholder}$ objects to build the graph underlying your training cycles.
In tf 2.x a graph is created by wrapping a Python callable in $\texttt{tf.function}$ [1], either by calling it directly or by using it as a decorator (as done for the train_step() function in our case).

**Useful links**

* [1] https://www.tensorflow.org/api_docs/python/tf/function

In [None]:
num_examples_to_generate = 16                                   #fixed latent vectors, reused at every
seed = tf.random.normal([num_examples_to_generate, noise_dim])  #epoch to monitor the generator output

@tf.function #tf.function decorator, this causes the function to be "compiled".
def train_step(images):
    
    noise = tf.random.normal([BATCH_SIZE, noise_dim])

    with tf.GradientTape(persistent=True) as gen_tape, tf.GradientTape(persistent=True) as disc_tape:
        generated_images = generator(noise, training=True)

        real_output, real_feature = discriminator(images, training=True)
        fake_output, fake_feature = discriminator(generated_images, training=True)

        gen_loss = generator_loss_GAN(fake_output)
        
        
        """
        ENABLE FEATURE MATCHING
        
        gen_loss = generator_loss_FEATURE(real_feature, fake_feature)        
        
        """
        
        disc_loss = discriminator_loss(real_output, fake_output)
        
        
    for i in range(1): 
        gradients_of_generator = gen_tape.gradient(gen_loss, generator.trainable_variables)
        generator_optimizer.apply_gradients(zip(gradients_of_generator, generator.trainable_variables))
                                                                       
    gradients_of_discriminator = disc_tape.gradient(disc_loss, discriminator.trainable_variables)                                                                       
    discriminator_optimizer.apply_gradients(zip(gradients_of_discriminator, discriminator.trainable_variables))
    
    d_train_loss(disc_loss)
    g_train_loss(gen_loss)
    d_train_accuracy_on_real(tf.ones_like(real_output), real_output)
    d_train_accuracy_on_fake(tf.zeros_like(fake_output), fake_output)
    
    
    return gen_loss, disc_loss

    

### 5.3 The training loop

The train() function below iterates over the epochs, calls train_step() on every batch, writes the metrics to Tensorboard and plots a set of generated images at the end of each epoch.

In [1]:
"""
Example snippet on how to save intermediate statuses of the network (checkpoints)

checkpoint_dir = './GAN/training_checkpoints' #directory in which the checkpoint will be saved
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt")
checkpoint = tf.train.Checkpoint(generator_optimizer=generator_optimizer,
                                 discriminator_optimizer=discriminator_optimizer,
                                 generator=generator,
                                 discriminator=discriminator)
"""



def train(dataset, epochs):

    for epoch in range(epochs):
        print("____________")
        print("Epoch {}".format(epoch))        
        start = datetime.datetime.now()

        for image_batch in dataset:   # use the function argument, not the global train_dataset
            
            g_loss, d_loss = train_step(image_batch)

        
        with train_summary_writer.as_default():
            
            tf.summary.scalar('d_loss', d_train_loss.result(), step=epoch)
            tf.summary.scalar('g_loss', g_train_loss.result(), step=epoch)            
            tf.summary.scalar('d_accuracy_on_fake', d_train_accuracy_on_fake.result(), step=epoch)
            tf.summary.scalar('d_accuracy_on_real', d_train_accuracy_on_real.result(), step=epoch)            
        
        
        # Produce images for the GIF as we go
        display.clear_output(wait=True)
        generate_and_save_images(generator,
                                 epoch + 1,
                                 seed, maxval)
        
        """
        
        # Save a checkpoint of the model every 10 epochs
        if (epoch + 1) % 10 == 0:
            checkpoint.save(file_prefix = checkpoint_prefix)
        
        """

            

        print('Time for epoch {0} is {1:.1f} sec'.format(epoch + 1, (datetime.datetime.now()-start).total_seconds()))
        print("Done Epoch {} ".format(epoch + 1))
        print("____________")
    # Generate a batch of images after the final epoch
    display.clear_output(wait=True)
    
    generate_and_save_images(generator,  #function defined in the
                               epochs,   #utils.py, plots
                               seed,
                               maxval)

In [None]:
%tensorboard --logdir ./GAN/logs/gradient_tape

In [None]:
EPOCHS = 40
train(train_dataset, EPOCHS)

## 6 Deploy the model to generate a fake MC sample

### 6.1 Cope with real life conditions

These toy models are often deployed in an environment with limited resources, where generating a statistically significant ensemble in a single call, i.e. from one arbitrarily large batch of latent vectors, may not fit in memory. We therefore resort to a workaround: we generate the sample in several smaller batches and concatenate the results.
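The same idea can be written as a small reusable helper. The sketch below is framework-agnostic: `generate_in_batches` and `toy_generator` are hypothetical names, and the toy generator only stands in for the trained model so that the example runs on its own. Unlike an integer division into cycles, this version also produces the last, smaller batch when the total is not a multiple of the batch size.

```python
import numpy as np


def generate_in_batches(generate_fn, n_total, batch_size, latent_dim, rng):
    """Call generate_fn on successive batches of latent vectors and stack the outputs."""
    chunks = []
    for start in range(0, n_total, batch_size):
        n = min(batch_size, n_total - start)        # last batch may be smaller
        z = rng.normal(size=(n, latent_dim))        # fresh latent vectors per batch
        chunks.append(np.asarray(generate_fn(z)))
    return np.concatenate(chunks, axis=0)


def toy_generator(z):
    """Stand-in for the trained generator: returns blank 48x48x1 images."""
    return np.zeros((z.shape[0], 48, 48, 1))


samples = generate_in_batches(toy_generator, n_total=2500, batch_size=1000,
                              latent_dim=128, rng=np.random.default_rng(0))
print(samples.shape)  # (2500, 48, 48, 1)
```

With the real model, `generate_fn` could be something like `lambda z: generator(z, training=False).numpy()`.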

In [None]:
predictionary={} #create a python dictionary that will contain the full set

num_examples_to_generate=20000 # specify the total size of the set
n_samples_per_batch=2000       # specify the number of samples generated per cycle

n_cycles = num_examples_to_generate//n_samples_per_batch  #get the number of cycles as an integer

for i in range(n_cycles):
    seed = tf.random.normal([n_samples_per_batch, noise_dim]) #create a new seed for every iteration
    predictionary[i]=generator(seed, training=False).numpy()  #assign the numpy-ed version of the output
                                                              #of the i-th iteration
    
out=np.concatenate([predictionary[i] for i in range(n_cycles)])  #Create the total set by concatenating
                                                                 #the different dict items

### 6.2 Compare the fake E distribution with the true one

In scientific applications of DCGANs, the standard metrics used in section 5.1 and the by-eye verisimilitude of single samples are not enough. In our case, for example, the energy distribution of the simulated sample is of key importance for physics applications. Let's inspect how well this network reproduces the energy distribution of the true sample.
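Besides the visual comparison of the histograms, a single number can summarise the agreement. The sketch below is one possible choice, not a metric prescribed by this tutorial: it bins the total energy per image of both samples with common bin edges and returns half the integrated absolute difference of the two normalised histograms, which is 0 for identical histograms and 1 for non-overlapping ones. The function name and the toy images are assumptions.

```python
import numpy as np


def energy_histogram_distance(gen_images, true_images, bins=50, value_range=(0, 9)):
    """Half the integrated |difference| of the two normalised energy histograms."""
    e_gen = gen_images.sum(axis=(1, 2, 3))       # total energy per generated image
    e_true = true_images.sum(axis=(1, 2, 3))     # total energy per true image
    h_gen, edges = np.histogram(e_gen, bins=bins, range=value_range, density=True)
    h_true, _ = np.histogram(e_true, bins=edges, density=True)
    return 0.5 * np.sum(np.abs(h_gen - h_true) * np.diff(edges))


# Toy images whose total energy is exactly 1 or exactly 5 (arbitrary units)
low = np.zeros((10, 2, 2, 1))
low[:, 0, 0, 0] = 1.0
high = np.zeros((10, 2, 2, 1))
high[:, 0, 0, 0] = 5.0

print(energy_histogram_distance(low, low))    # identical samples
print(energy_histogram_distance(low, high))   # fully separated samples
```

Events whose energy falls outside `value_range` are ignored by the histograms, so the range should cover the bulk of both distributions.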

In [None]:
plt.subplot(1,2,1)
vals_gen=plt.hist(np.sum(out,axis=(1,2,3)), range=(0,9), density=True, bins=50, edgecolor='black')[0];
plt.subplot(1,2,2)
vals_true=plt.hist(np.sum(X_train_norm,axis=(1,2,3)), density=True, range=(0,9),bins=50, edgecolor='black')[0];
fig = plt.gcf()
fig.set_size_inches(15,5)

In [None]:
idx = np.arange(0,9, step=9/50)
diff=plt.bar(idx, height=(vals_true-vals_gen), edgecolor='black',
            linewidth=1, color='lightblue',width = .15, align = 'edge')

plt.xlabel('E (a.u.)')   # energies were normalised by maxval, hence arbitrary units
plt.ylabel('dN/dE')
plt.title("MC - NN output")
fig = plt.gcf()
fig.set_size_inches(15,4)