<h1><center>A residual neural network - ResNet</center></h1>
 

<h2>1. Introduction</h2>

<h3>1.1 Neural Networks</h3>

A neural network is a series of algorithms that endeavors to recognize underlying relationships in a set of data through a process that mimics the way the human brain operates. Neural networks can adapt to changing input, so the network generates the best possible result without needing to redesign the output criteria.

<img src="images/Simply_Neural_Network.png" width="300" height="300">
<center> A simple neural network </center>

Neural networks are composed of node layers: an input layer, one or more hidden layers, and an output layer. Each node connects to others and has an associated weight and threshold. If the output of an individual node is above the specified threshold value, that node is activated and sends data to the next layer of the network. Otherwise, no data is passed along to the next layer.
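The threshold behaviour described above can be sketched for a single node in a few lines of plain Python. This is only an illustration: the function name `node_output` and the input, weight, and threshold values are made up for the example and do not come from any library.

```python
def node_output(inputs, weights, threshold):
    """Return the weighted sum if it exceeds the threshold, otherwise 0.0."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    # The node "fires" only when the weighted sum is above the threshold.
    return weighted_sum if weighted_sum > threshold else 0.0

inputs = [1.0, 0.5, -1.0]
weights = [0.8, 0.4, 0.3]  # illustrative values, not trained weights

# Weighted sum is 0.8 + 0.2 - 0.3 = 0.7
active = node_output(inputs, weights, threshold=0.5)  # passes data on
silent = node_output(inputs, weights, threshold=1.0)  # stays inactive
```

Practical networks usually replace the hard threshold with a smooth activation function, but the idea of a node passing a signal on only when its weighted input is strong enough is the same.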

Neural networks rely on training data to learn and improve their accuracy over time. Once they are tuned for accuracy, they become powerful tools in computer science and artificial intelligence, allowing us to classify and cluster data at high speed. Speech or image recognition tasks can take minutes rather than the hours needed for manual identification by human experts.

<h3>1.2 VGG16 convolutional neural network</h3>

One of the most popular neural network models is VGG16.
VGG16 (also called OxfordNet) is a convolutional neural network architecture named after the Visual Geometry Group at Oxford, which developed it. It was used to win the ILSVRC2014 competition in 2014 and is still considered an excellent vision model.

<img src="images/VGG16_architecture.png" width="600" height="400">
<center> Architecture of VGG16 networks </center>

VGG16 is a convolutional neural network that is 16 layers deep. The model can be loaded with a set of weights pre-trained on ImageNet, a dataset of over 14 million images. Evaluated on its 1000-class subset, the model achieves 92.7% top-5 test accuracy.

<h2>2. ResNet neural network</h2>

<h3>2.1 Definition</h3>

ResNet is a specialized type of neural network, more precisely a convolutional network designed to handle more sophisticated deep learning tasks and models.
Its distinguishing feature is the use of so-called residual blocks. You will often see names such as ResNet50 or ResNet152, where the number indicates how many layers the network has.

Residual blocks play a key role here, so it is worth explaining what they do.
Neural networks are universal function approximators, and accuracy tends to increase with the number of layers. However, there is a limit beyond which additional layers no longer improve accuracy.

So, if neural networks are universal function approximators, they should be able to learn any function. In practice, however, it turns out that shallower networks often learn better than their deeper counterparts, which is quite counterintuitive.

<h3>2.2 Residual blocks</h3>

Training a network with many layers is a very difficult task. It is often the case that, despite additional layers and training iterations, the resulting network performs worse than one trained with fewer layers and fewer iterations.

As you might expect, increasing these two parameters has unfavorable consequences: training takes longer, which in this case is simply wasted time, and, as already mentioned, the trained model ends up worse.

The idea was to "inject" the signal from an earlier layer into a later one; this is called an 'identity shortcut connection'. It may sound complex, but the diagram below shows how simple the mechanism is.

<img src="images/residual_block.png" width="400" height="400">
<center> Residual block structure </center>

The figure above shows a single residual block used in ResNet networks. To explain how it works more clearly, let's consider two scenarios.

The first is the case where the layer weights are zero. The output of such a block then equals its input signal, because the layers contribute nothing new.

On the other hand, if the weights of these layers are non-zero (and thus the F(x) signal as well), there is a chance that the residual network will learn something new, and ideally something correct, because correct learning is what we care about most.

However, it is worth emphasizing the phrase "there is a chance". At the design stage no one could be sure that this approach would bring positive results, but the underlying assumption was very simple.
It was argued that introducing such a block (that is, adding the output of the new weighted layers to the original, unmodified signal) would not worsen the learning process and might even improve it. Ultimately, the experiment was successful and made ResNet popular.
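The two scenarios can be checked numerically with a minimal sketch of a residual block computing y = F(x) + x. The helper names (`relu`, `linear`, `residual_block`), the choice of F as two small weight matrices with a ReLU in between, and all numeric values are assumptions made for illustration. Bias terms, convolutions, and the activation that real ResNet blocks apply after the addition are left out.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, a) for a in v]

def linear(weights, v):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * a for w, a in zip(row, v)) for row in weights]

def residual_block(x, w1, w2):
    """Compute y = F(x) + x, where F(x) = W2 * relu(W1 * x)."""
    fx = linear(w2, relu(linear(w1, x)))
    # The identity shortcut: add the untouched input to the layer output.
    return [f + a for f, a in zip(fx, x)]

x = [1.0, -2.0, 0.5]

# Scenario 1: all weights are zero, so F(x) = 0 and the block returns x.
zeros = [[0.0] * 3 for _ in range(3)]
identity_out = residual_block(x, zeros, zeros)

# Scenario 2: non-zero weights, so the block adds something to x.
w1 = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]]
w2 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
learned_out = residual_block(x, w1, w2)
```

With zero weights the block is exactly the identity, which is why adding such a block should not make the network worse than it was without it.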

<h3>2.3 Additional information</h3>

ResNet, as already mentioned, is a powerful type of neural network that is widely used in image-related tasks. As described above, it adds the output of an earlier layer to a later one.

This helps to alleviate the so-called vanishing gradient problem, which is why adding more layers to a typical neural network often fails to improve the results and can even leave the trained network less accurate. It is the vanishing gradient problem that prevents us, beyond a certain point, from increasing the accuracy of a typical neural network.
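A rough scalar illustration of why the shortcut helps: during backpropagation the gradient is multiplied by each layer's local derivative, so many small factors shrink it towards zero. With a shortcut, each block computes x + F(x), so its local derivative is 1 + F'(x) instead of F'(x). The sketch below assumes, purely for illustration, that every layer has the same local derivative of 0.1; the function name `backprop_factor` is hypothetical.

```python
def backprop_factor(local_derivative, depth, use_skip):
    """Multiply the per-layer derivative factors across `depth` layers."""
    factor = 1.0
    for _ in range(depth):
        # A skip connection contributes the extra 1 from d(x)/dx.
        step = 1.0 + local_derivative if use_skip else local_derivative
        factor *= step
    return factor

plain = backprop_factor(0.1, depth=20, use_skip=False)     # on the order of 1e-20
with_skip = backprop_factor(0.1, depth=20, use_skip=True)  # roughly 6.7
```

Real networks work with matrices of derivatives rather than one scalar, so this is a simplification, but it shows how the identity path keeps the gradient from collapsing as depth grows.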


<h2>3. Practice</h2>

In this part, both of the discussed neural network models are put into practice, using the tensorflow.keras library and the ImageNet database.

ImageNet is an image database with a total of 14 million images and 22 thousand visual categories. As it is publicly available for research and educational use, it has been widely used in the research of object recognition algorithms, and has played an important role in the deep learning revolution.
ImageNet has been mostly used for researching object recognition algorithms on the subset of 1000 categories.

<h3>3.1 VGG16 model</h3>

For the VGG16 network, we used a ready-made pre-trained model, which lets us run tests straight away.
To do this, we created the following function, which loads a given image, preprocesses it, feeds it to the model, and returns the model's predictions.
The model returns a list of tuples, each containing a class ID, the name of a class it was trained on (objects, animals, food, etc.), and the probability it assigns to that class for the given image.
We can also list more than one such name-probability pair by setting the parameter top=n (where n is the number of classes with the highest probability to return).
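What a top=n selection does can be sketched without TensorFlow: sort the classes by probability and keep the first n. The function `top_n`, the label list, and the probabilities below are invented for the example and are not the library's implementation.

```python
def top_n(probabilities, labels, n=3):
    """Return the n (label, probability) pairs with the highest probability."""
    ranked = sorted(zip(labels, probabilities),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:n]

labels = ["strawberry", "bagel", "cheeseburger", "plunger"]
probabilities = [0.05, 0.40, 0.35, 0.20]  # made-up model output

top_two = top_n(probabilities, labels, n=2)
```

Here `top_two` holds the bagel and cheeseburger entries, in that order, because they have the two largest probabilities.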

In [37]:
from tensorflow.keras.preprocessing import image
import tensorflow.keras.preprocessing as keras_utils
import tensorflow.keras.applications.vgg16 as vgg16
import numpy as np

modelVGG16 = vgg16.VGG16(weights='imagenet', include_top=True)
# modelVGG16.summary()

def vgg16_network(image, model):
    img = keras_utils.image.load_img(image, target_size=(224, 224))
    x = keras_utils.image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    vgg16_input = vgg16.preprocess_input(x)
    vgg16_features = model.predict(vgg16_input)
    return vgg16.decode_predictions(vgg16_features, top=3)[0]

objectName1 = 'strawberry.jpg'
objectName2 = 'cheeseburger.jpg'
objectName3 = 'toilet_tissue.jpg'
print('Result of VGG16:')    
vgg16Result1 = vgg16_network('photos/'+objectName1, modelVGG16)
vgg16Result2 = vgg16_network('photos/'+objectName2, modelVGG16)
vgg16Result3 = vgg16_network('photos/'+objectName3, modelVGG16)
print("object: " + objectName1 + " => ",vgg16Result1[0][1:]) 
print("object: " + objectName2 + " => ",vgg16Result2[0][1:], '...', vgg16Result2[1][1:]) 
print("object: " + objectName3 + " => ",vgg16Result3[0][1:]) 

Result of VGG16:
object: strawberry.jpg =>  ('strawberry', 0.9970606)
object: cheeseburger.jpg =>  ('bagel', 0.276235) ... ('cheeseburger', 0.27371728)
object: toilet_tissue.jpg =>  ('toilet_tissue', 0.6716942)


As the example above shows, the model recognized two of the three objects, mistaking the cheeseburger for a bagel. Note that the second-ranked, correct result differs by only about 0.25 percentage points (0.2762 versus 0.2737).

<h3>3.2 ResNet model</h3>

The same example images used for the VGG16 network were used to test the ResNet network.

In [36]:
import tensorflow.keras.applications.resnet50 as resnet50

modelresnet50 = resnet50.ResNet50(weights='imagenet')
#modelresnet50.summary()

def ResNet_network(image, model):
    img = keras_utils.image.load_img(image, target_size=(224, 224))
    x = keras_utils.image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    resnet_input = resnet50.preprocess_input(x)
    resnet_features = model.predict(resnet_input)
    return resnet50.decode_predictions(resnet_features, top=3)[0]

print('Result of ResNet:')    
resnetResult1 = ResNet_network("photos/"+objectName1, modelresnet50)
resnetResult2 = ResNet_network("photos/"+objectName2, modelresnet50)
resnetResult3 = ResNet_network("photos/"+objectName3, modelresnet50)
print("object: " + objectName1 + " => ",resnetResult1[0][1:]) 
print("object: " + objectName2 + " => ",resnetResult2[0][1:]) 
print("object: " + objectName3 + " => ",resnetResult3[0][1:], '...', resnetResult3[1][1:]) 

Result of ResNet:
object: strawberry.jpg =>  ('strawberry', 0.99925846)
object: cheeseburger.jpg =>  ('cheeseburger', 0.29881862)
object: toilet_tissue.jpg =>  ('paper_towel', 0.55854785) ... ('toilet_tissue', 0.44143495)



In this case the model also had trouble identifying one of the objects, incorrectly classifying the toilet tissue as a paper towel, which can be considered a minor error given how similar the two objects are.

<h3>3.3 Combining the ResNet and VGG16 networks</h3>

As the two previous examples show, the models gave different predictions for the same data, so it is worth considering whether the results can be improved by combining the two networks, so that the final result is as close as possible to the target.

The algorithm below is a simple example of averaging the results of the two networks: a class that appears in both top-3 lists receives the mean of the two probabilities, while a class that appears in only one list keeps its original probability.

In [35]:
def hybridNetowrk(resnet, vgg16):
    result = [list(t) for t in resnet]
    for item in vgg16:
        clases = [res[0] for res in result]
        if(item[0] in clases):
            index = clases.index(item[0]) 
            result[index][2] = (result[index][2] + item[2]) / 2
        else:
            result.append(item)
    result = sorted(result, key=lambda x: x[2], reverse=True)
    return result

hybrid1 = hybridNetowrk(resnetResult2, vgg16Result2)
hybrid2 = hybridNetowrk(resnetResult3, vgg16Result3)
print('Result of hybrid (ResNet & VGG16):')
print("object: " + objectName2 + " => ", hybrid1[0][1:])
print("object: " + objectName3 + " => ", hybrid2[0][1:])

Result of hybrid (ResNet & VGG16):
object: cheeseburger.jpg =>  ['cheeseburger', 0.2862679362297058]
object: toilet_tissue.jpg =>  ['toilet_tissue', 0.5565645694732666]
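To see where the cheeseburger value comes from, the averaging step can be re-run by hand on the probabilities printed in sections 3.1 and 3.2. The sketch below is a self-contained restatement of the merging logic under a hypothetical name, `merge_top_predictions`. Only the entries visible in the printed output are included, and the class IDs (`id_bagel`, `id_cheeseburger`) are placeholders, because the real IDs are not shown in the output.

```python
def merge_top_predictions(resnet, vgg16):
    """Average probabilities of classes present in both lists; keep the rest."""
    merged = [list(entry) for entry in resnet]
    for entry in vgg16:
        ids = [m[0] for m in merged]
        if entry[0] in ids:
            i = ids.index(entry[0])
            merged[i][2] = (merged[i][2] + entry[2]) / 2
        else:
            # Class seen by only one network: probability is left unchanged.
            merged.append(list(entry))
    return sorted(merged, key=lambda m: m[2], reverse=True)

# Values taken from the printed results above; IDs are placeholders.
resnet_top = [("id_cheeseburger", "cheeseburger", 0.29881862)]
vgg16_top = [("id_bagel", "bagel", 0.276235),
             ("id_cheeseburger", "cheeseburger", 0.27371728)]

merged = merge_top_predictions(resnet_top, vgg16_top)
best = merged[0]  # cheeseburger, (0.29881862 + 0.27371728) / 2
```

Note that the bagel entry keeps its full 0.276235 because only VGG16 listed it, so a class that one network is very sure about can still outrank an averaged class. Averaging the complete probability vectors of both models would treat all classes equally, at the cost of handling all 1000 values.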


<h3>3.4 Performance comparison of the networks</h3>

In this part, all of the presented neural network models are tested on a sample set of 44 images, and their results are then compared.

In [27]:
import os
from prettytable import PrettyTable
import progressbar

directory = 'photos/'
resNet_score, vgg16_score, hybrid_score = 0, 0, 0
resNet_sum, vgg16_sum, hybrid_sum = 0, 0, 0
iteration  = 0
table = PrettyTable(['Object', 'Resnet predict', 'Resnet %', 'VGG16 predict', 'VGG16 %', 'Hybrid', 'Hybrid %'])
size = len(os.listdir(directory))
bar = progressbar.ProgressBar(maxval=size, widgets=[progressbar.Bar('=', '[', ']'), ' ', progressbar.Percentage()])

bar.start()
for filename in os.listdir(directory):
    bar.update(iteration+1)
    if filename.endswith(".jpg") or filename.endswith(".png"):
        iteration += 1
        resNet_predict = ResNet_network(directory+filename, modelresnet50)
        vgg16_predict  = vgg16_network(directory+filename, modelVGG16)
        hybrid_predict = hybridNetowrk(resNet_predict, vgg16_predict)
        resnet_val = resNet_predict[0][1]
        vggg16_val = vgg16_predict[0][1]
        hybrid_val = hybrid_predict[0][1]
        resnet_percentage = round(resNet_predict[0][2],5)
        vggg16_percentage = round(vgg16_predict[0][2],5)
        hybrid_percentage = round(hybrid_predict[0][2],5)
        if (resnet_val == filename.split('.')[0]):  
            resNet_score += 1
        else:
            resnet_val = "\033[0;31;40m" + resnet_val + "\033[0m"
        if (vggg16_val == filename.split('.')[0]):   
            vgg16_score  += 1
        else:
            vggg16_val = "\033[0;31;40m" + vggg16_val + "\033[0m"
        if (hybrid_val == filename.split('.')[0]):  
            hybrid_score += 1
        else:
            hybrid_val = "\033[0;31;40m" + hybrid_val + "\033[0m"
        table.add_row([filename.split('.')[0], resnet_val, resnet_percentage, vggg16_val, \
                       vggg16_percentage, hybrid_val, hybrid_percentage])
        resNet_sum += resnet_percentage
        vgg16_sum  += vggg16_percentage
        hybrid_sum += hybrid_percentage
    else:
        continue
bar.finish()

print("Table of test objects and model predictions:\n")
summary = PrettyTable(['Resnet Score', 'Resnet %', 'VGG16 score', 'VGG16 %', 'Hybrid', 'Hybrid %'])
summary.add_row([resNet_score, round(resNet_sum/iteration,5), vgg16_score, \
                 round(vgg16_sum/iteration,5), hybrid_score, round(hybrid_sum/iteration,5)])
print(table, "\n\nSUMMARY:\n", summary)



Table of test objects and model predictions:

+------------------+------------------+----------+------------------+---------+------------------+----------+
|      Object      |  Resnet predict  | Resnet % |  VGG16 predict   | VGG16 % |      Hybrid      | Hybrid % |
+------------------+------------------+----------+------------------+---------+------------------+----------+
|     plunger      |     plunger      |   1.0    |     plunger      | 0.99989 |     plunger      | 0.99994  |
|     basenji      |     basenji      | 0.99922  |     basenji      | 0.96796 |     basenji      | 0.98359  |
|    monastery     |      church      | 0.83119  |      castle      | 0.45355 |      church      | 0.50693  |
|     buckeye      |      abacus      | 0.97427  |      abacus      | 0.72811 |      abacus      | 0.85119  |
|     crayfish     |     cricket      | 0.62621  |     cricket      | 0.




<h2>4. Conclusions</h2>

As shown in section 3, the VGG16 model, with only 16 layers, did as well as the ResNet50 network. Many of the mistakes made are common to both networks, although there are exceptions. On this basis, after careful analysis, one could try to work out which kinds of images each network copes with better and which worse.

An interesting observation is that, although the hit rate was the same, ResNet50 was much more "sure" of its choice even when it chose wrong, which counts against it here. The probability it assigned to its predicted class was usually no lower than 0.7. For VGG16, the probability of the predicted class was much lower, with values as low as about 0.2. In other words, VGG16 had "more doubts" than ResNet50 when it made a wrong choice.
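One way to put a number on this observation is to average the top-1 probability over the wrong predictions only. The sketch below does so for the four complete rows visible in the results table in section 3.4 (plunger, basenji, monastery, buckeye). It is therefore not a figure for the whole 44-image set, and the helper name `mean_confidence_when_wrong` is hypothetical.

```python
# (object, resnet_prediction, resnet_prob, vgg16_prediction, vgg16_prob)
rows = [
    ("plunger", "plunger", 1.0, "plunger", 0.99989),
    ("basenji", "basenji", 0.99922, "basenji", 0.96796),
    ("monastery", "church", 0.83119, "castle", 0.45355),
    ("buckeye", "abacus", 0.97427, "abacus", 0.72811),
]

def mean_confidence_when_wrong(rows, pred_idx, prob_idx):
    """Mean top-1 probability over rows where the prediction is wrong."""
    wrong = [r[prob_idx] for r in rows if r[pred_idx] != r[0]]
    return sum(wrong) / len(wrong) if wrong else None

resnet_wrong = mean_confidence_when_wrong(rows, 1, 2)  # (0.83119 + 0.97427) / 2
vgg16_wrong = mean_confidence_when_wrong(rows, 3, 4)   # (0.45355 + 0.72811) / 2
```

On these rows ResNet50 averages about 0.90 on its wrong answers and VGG16 about 0.59, which is consistent with the observation above.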

Combining both neural networks into one model achieved the intended effect: this model correctly recognized the most objects in the dataset.