![Practicum AI Logo image](https://github.com/PracticumAI/practicumai.github.io/blob/main/images/logo/PracticumAI_logo_250x50.png?raw=true) <img src='https://github.com/PracticumAI/practicumai.github.io/blob/main/images/icons/practicumai_beginner.png?raw=true' align='right' width=50>
***
# *Practicum AI:* Deep Learning Basics

This exercise is adapted from Baig et al. (2020) <i>The Deep Learning Workshop</i> from <a href="https://www.packtpub.com/product/the-deep-learning-workshop/9781839219856">Packt Publishers</a> (Exercise 1.01, page 7).

## Deep learning for image recognition

Before we dive into the details of exactly _how_ deep learning works, let's explore it through an example. In this exercise, we will use a pre-trained deep learning model, [ResNet50](https://arxiv.org/abs/1512.03385), which has been trained on [ImageNet](https://image-net.org/), a collection of about 1.3 million images labeled as being in one of 1,000 categories. 

To help with this exercise, let us introduce you to Amelia.

Amelia is a biologist who loves research. If she could spend all day in the lab, she would. Unfortunately, she's in a service position... She feels like she spends more time in committee meetings than she does doing science! And now she's been assigned to her department's Annual Picnic Committee.

Her first task is figuring out what each person is bringing to the picnic. Amelia wants to automate the process as much as possible so she can get back to her research. She's going to use a camera to take pictures of the food as people arrive. Then she'll use a deep learning model to recognize what each person brought.

Let's help Amelia code her lunch recognition system!

Amelia is a Practicum AI alumna and recalls the AI Application Development Pathway.

![Practicum AI Application Pathway Image](https://github.com/PracticumAI/deep_learning_2_draft/blob/main/AI%20Application%20Pathway.png?raw=true) <img src='https://github.com/PracticumAI/deep_learning_2_draft/blob/main/AI%20Application%20Pathway.png' width=10 align='right' height=1>

With her lunch-recognizing camera, she's already completed Step 1: Choose a Problem! Because coding is flexible, the remaining six steps will jump around a bit. Don't worry, though: Amelia knows her stuff and will make sure we know where we are in the application process. Here's an overview of the steps in the application development process and how they correspond to the code in this Jupyter Notebook:
1. Choose a Problem - Making a lunch-recognizer!
2. Gather Good Data - Amelia is very busy; she doesn't have time to take thousands of images of food! Instead, she's going to use a model that's already been trained to "recognize" various food items (and hundreds of other things!).
3. Clean and Prep Data - The model she's using is already trained, so she doesn't need to worry about prepping training data. She'll have to do a bit of work to make sure that her new inputs are formatted correctly, however.
4. Choose a Model - Amelia needs a model that's already trained and that recognizes images. That narrows her search to models like ResNet.
5. Train the Model - Our heroine is going to use a model that's pre-trained so... Done! She's up and running with an AI application without needing to compile or train anything. Magic!
6. Evaluate the Model - As part of the evaluation process, Amelia is going to need to test the model to see how well it recognizes food.
7. Deploy the Model - Since Amelia is comfortable using Jupyter Notebooks, she's going to leave the application here. Embedding the model in another application is unnecessary (and beyond the scope of this course!).

#### 1. Import libraries

Import the necessary libraries. For this exercise, Amelia will use the pre-trained ResNet50 model that is part of Keras: `from tensorflow.keras.applications.resnet50 import ResNet50`. Check out the [Keras documentation](https://www.tensorflow.org/api_docs/python/tf/keras/applications/resnet50/ResNet50) for more details. 

In [1]:
# Import necessary libraries for image processing and deep learning. The image processing functions, like img_to_array, will help Amelia format the image to run through her model.

from tensorflow.keras.preprocessing.image import load_img
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet50 import preprocess_input
from tensorflow.keras.applications.resnet50 import decode_predictions

# Import base tensorflow and set seed to achieve consistent results.
import tensorflow as tf 
import numpy as np

seed = 42  # Set the seed for reproducibility

tf.random.set_seed(seed)
np.random.seed(seed)


#### 2. Instantiate the ResNet50 model

Instantiate the ResNet50 model as a variable. Instantiating is a programming term that means you're taking the 'blueprint' of something (in this case, ResNet50), and making an object out of it (the model we're going to use here). This step creates the instance of the model to use.

> &#x1F4DD;  Some Background on ResNet: 
> ResNet, short for Residual Network, won the 2015 ImageNet competition. It was introduced to address the vanishing gradient problem commonly faced when training very deep neural networks. As networks become deeper, gradients (values used to update network weights) can become extremely small, effectively halting training. 
> ResNet introduces the concept of "residual blocks."  As it processes data, instead of relying solely on the current "thought" or layer, it can also "refer back" to earlier layers, much like using recent memories to help recall older ones. These "references back" are called skip connections. They act like bridges, letting the network jump over some layers to ensure that even as it delves deeper into processing, it doesn't forget or lose important early details. This shortcut or skip connection allows gradients to propagate more easily through the network. 
> This architectural innovation has enabled the training of networks with depths previously thought unfeasible. ResNet models, with hundreds or even thousands of layers, have achieved state-of-the-art performance on many image classification benchmarks. 
> In this unit’s exercise we use the ResNet50 model, which, as its name suggests, consists of 50 layers.

<div style="padding: 10px; margin-bottom: 20px; border: thin solid #E5C250; border-left-width: 10px;background-color: #fff"><strong>Tip:</strong> You will likely see some output highlighted in red. While red is used for errors, it is also used for warnings. It can take some getting used to, but red is OK in this case...</div>

```python
mymodel = ResNet50() # Create an instance of the ResNet50 model pre-trained on ImageNet data
```
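
To make the skip connections from the ResNet background note more concrete, here is a minimal NumPy sketch of a residual block. It is an illustration only: real ResNet50 blocks use convolutions and batch normalization, and the function name `residual_block` and the small dense weight matrices `w1` and `w2` are assumptions made for this example.

```python
import numpy as np

def residual_block(x, w1, w2):
    """Toy residual block: output = relu(x + F(x)).

    F is two small dense layers here (an assumption for illustration;
    ResNet50 itself uses convolutional layers).
    """
    hidden = np.maximum(0.0, x @ w1)  # first layer followed by ReLU
    fx = hidden @ w2                  # second layer gives F(x)
    return np.maximum(0.0, x + fx)    # skip connection adds the input back in

x = np.array([[1.0, 2.0, 3.0]])
zeros = np.zeros((3, 3))

# With all-zero weights, F(x) is zero, so the input passes through unchanged.
out = residual_block(x, zeros, zeros)
print(out)  # [[1. 2. 3.]]
```

Because the input is added back to the block's output, a block that has learned nothing still passes its input forward, which is one intuition for why very deep stacks of these blocks remain trainable.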

In [2]:
# Code it!


#### 3. Load image

Amelia has a test image of the faculty favorite lunch, pizza, to use to test the system. Let's load her pizza image in.

Since ResNet50 was trained using images that are 224x224 pixels, we need to resize the input image to the same size.

<div style="padding: 10px; margin-bottom: 20px; border: thin solid #E5C250; border-left-width: 10px;background-color: #fff"><strong>Tip:</strong> The pizza image is stored in the images folder, so you must give the complete path to the image file.</div>

```python
myimage = load_img('images/pizza.jpg', target_size = (224, 224)) # Load an image file for testing, resizing it to the required input size of 224x224 pixels
```


In [3]:
# Code it!


#### 4. View the pizza image

Let's take a quick look at the image to verify that it's a pizza.  Type the variable name and run the code block.

```python
myimage
```

In [1]:
# Code it!



#### 5. Convert image to array

Convert the image to an array because the model expects it in this format.

```python
myimage = img_to_array(myimage) # Convert the loaded image to an array format suitable for processing
```

In [5]:
# Code it!


#### 6. Reshape image

Reshape the image. The model expects a batch of images, so the array needs a leading batch dimension (1 for our single image). Each image in the batch must be 224 pixels high and 224 pixels wide, with 3 channels, one for each color (Red, Green, Blue). If our image were greyscale, how many channels would we specify?

```python
myimage = myimage.reshape((1, 224, 224, 3)) # Reshape the image array to the format the model expects (batch size, height, width, color channels)
```
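
The effect of this reshape can be checked without loading a real picture. The sketch below uses an all-zero NumPy array as a stand-in for the output of `img_to_array`; the names `fake_image`, `batch`, and `batch_alt` are made up for this example.

```python
import numpy as np

# Stand-in for the array produced by img_to_array: (height, width, channels)
fake_image = np.zeros((224, 224, 3), dtype="float32")

# Add a leading batch dimension of size 1, as in the step above
batch = fake_image.reshape((1, 224, 224, 3))

# np.expand_dims is an equivalent way to add the batch dimension
batch_alt = np.expand_dims(fake_image, axis=0)

print(fake_image.shape)  # (224, 224, 3)
print(batch.shape)       # (1, 224, 224, 3)
print(batch_alt.shape)   # (1, 224, 224, 3)
```

The pixel values are untouched; only the shape changes, so the model sees "a batch containing one image".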

In [6]:
# Code it!


#### 7. Pre-process image

Execute the *preprocess_input()* function with the image.

```python
myimage = preprocess_input(myimage) # Preprocess the image to ensure its values are appropriate for the ResNet50 model
```
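
For a sense of what this step does to the numbers: the Keras documentation describes the ResNet50 `preprocess_input` as converting images from RGB to BGR and then zero-centering each color channel with respect to the ImageNet dataset, without scaling. The sketch below mimics that behavior by hand. It is not the library's code; the function name `caffe_style_preprocess` is hypothetical, and the channel means are the commonly cited ImageNet values, assumed here for illustration.

```python
import numpy as np

# Commonly cited ImageNet channel means in B, G, R order (assumed values)
IMAGENET_BGR_MEANS = np.array([103.939, 116.779, 123.68], dtype="float32")

def caffe_style_preprocess(batch_rgb):
    """Hand-written illustration: flip RGB to BGR, then subtract channel means."""
    batch_bgr = batch_rgb[..., ::-1]       # reverse the last (channel) axis
    return batch_bgr - IMAGENET_BGR_MEANS  # zero-center each channel

# A uniform mid-grey "image" batch with every pixel value set to 128
grey_batch = np.full((1, 224, 224, 3), 128.0, dtype="float32")
processed = caffe_style_preprocess(grey_batch)

print(processed.shape)     # (1, 224, 224, 3)
print(processed[0, 0, 0])  # one pixel after preprocessing, in B, G, R order
```

In the notebook itself, keep using `preprocess_input`, which stays matched to the model's training setup.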

In [7]:
# Code it!


#### 8. Execute predict method

Execute the model's predict method.

```python
myresult = mymodel.predict(myimage) # Use the model to predict the class (or category) of the image
```

In [2]:
# Code it!


#### 9. Get prediction label

The model's predict method returns an array of 1,000 probabilities, one for each ImageNet category. Convert the highest-scoring entries to their corresponding text labels.

```python
mylabel = decode_predictions(myresult) # Decode the prediction result to get human-readable class labels
```
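
The idea behind decoding can be sketched on a toy scale: sort the scores, keep the highest ones, and look up their labels. The example below uses four made-up categories instead of 1,000; `toy_labels`, `toy_scores`, and `toy_decode` are hypothetical names and values for illustration, not part of Keras.

```python
import numpy as np

# Hypothetical (class ID, label) pairs standing in for the ImageNet label list
toy_labels = [("toy_0", "pizza"), ("toy_1", "cheeseburger"),
              ("toy_2", "hotdog"), ("toy_3", "bagel")]

# Made-up model output for a batch of one image: shape (1, 4)
toy_scores = np.array([[0.70, 0.15, 0.10, 0.05]])

def toy_decode(scores, labels, top=2):
    """Return the `top` highest-scoring (ID, label, probability) tuples per image."""
    results = []
    for row in scores:
        best = np.argsort(row)[::-1][:top]  # indexes of the largest scores first
        results.append([labels[i] + (float(row[i]),) for i in best])
    return results

decoded = toy_decode(toy_scores, toy_labels)
print(decoded)
# [[('toy_0', 'pizza', 0.7), ('toy_1', 'cheeseburger', 0.15)]]
```

The result is a list with one entry per image, and each entry is itself a list of tuples sorted from most to least likely.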

In [25]:
# Code it!


#### 10. Assign list item to variable 

Assign the first item listed by the prediction to a variable - this is the label with the highest probability.

```python
# Extract the label with the highest predicted probability.
# Recall that Python indexes start at 0: the first [0] selects the results for the first (and only) image in the batch,
# and the second [0] selects the top prediction for that image.
mylabel = mylabel[0][0] 
```

In [26]:
# Code it!


#### 11. Embed label 

Embed the label in a sentence and then print it.

```python
# The 'mylabel' variable contains information about the prediction in the format (ID, Label, Probability).
# Using 'mylabel[1]' extracts the human-readable label (e.g., 'pizza') for the predicted class.
print("This is an image of a " + mylabel[1]) # Print the predicted class label in a formatted string

```
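
The nested structure of the decoded predictions can be explored without running the model. The sketch below copies part of the sample output printed later in this notebook for `squash_test.jpg`; the variable names `decoded`, `class_id`, `name`, and `probability` are chosen for this example.

```python
# First two entries of the sample output shown later in this notebook
decoded = [[('n02281787', 'lycaenid', 0.26136062),
            ('n03804744', 'nail', 0.11522047)]]

top = decoded[0][0]                  # first image in the batch, top prediction
class_id, name, probability = top    # unpack the (ID, Label, Probability) tuple

print("This is an image of a " + name)
print(f"Confidence: {probability:.1%}")  # Confidence: 26.1%
```

Unpacking the tuple into named variables can be easier to read than indexing with `[1]` and `[2]`, though both approaches give the same values.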

In [3]:
# Code it!


<div style="padding: 10px;margin-bottom: 20px;border: thin solid #E5C250;border-left-width: 10px;background-color: #fff"><strong>Tip:</strong> Although we use an image of a pizza here, you can use just about any image with this model. Try out this exercise multiple times with different images to see if you can fool it. The <a href='https://raw.githubusercontent.com/PracticumAI/deep_learning/main/resnet_labels.txt'>resnet_labels.txt</a> file lists all the categories this model is trained to classify.</div>

#### 12. Create a speech sentence

Create a longer sentence to convert to speech. Amelia wants her model to output an audio file to tell her the results so she doesn't even have to look up from her microscope to hear what's been brought.

```python
sayit = "This is an image of a " + mylabel[1] + " in full living color."
```

In [28]:
# Code it!


#### 13. Import gtts libraries

Import the required libraries. gTTS (Google Text-to-Speech) is an open source Python library that converts text to speech using Google Translate's cloud-based text-to-speech application programming interface (API).

In [30]:
!pip install gTTS
from gtts import gTTS
import os

#### 14. Execute the gtts function

Pass the sayit variable to the gTTS API.

```python
myobj = gTTS(text = sayit)
```

In [32]:
# Code it!


#### 15. Save the audio file

gTTS will convert the string you gave it into an audio file. Save the audio file. The default location is the current directory.

```python
myobj.save("prediction.mp3") # Save the audio file in the current directory.
```

In [33]:
# Code it!


<div style="padding: 10px;margin-bottom: 20px;border:  thin solid #30335D; border-left-width: 10px;background-color: #fff"><strong>Note:</strong> This last block of code is only needed if you are running Jupyter Notebooks on a local computer.  Otherwise, download the .mp3 file and listen to it on your computer.</div>

In [None]:
# Uncomment and run if running on local system as opposed to an HPC
# os.system("prediction.mp3")
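
Calling `os.system` with just a file name relies on the operating system's file associations, which generally works only on Windows. As a hedged alternative, the sketch below builds a command that asks the operating system to open a file with its default application. The helper name `open_command` is hypothetical, and the sketch assumes the standard `open` (macOS) and `xdg-open` (Linux desktop) tools are available.

```python
import platform

def open_command(path, system=None):
    """Return a command list that opens `path` with the OS default application."""
    system = system or platform.system()
    if system == "Windows":
        return ["cmd", "/c", "start", "", path]
    if system == "Darwin":  # macOS
        return ["open", path]
    return ["xdg-open", path]  # assumption: a Linux desktop with xdg-open installed

print(open_command("prediction.mp3", system="Darwin"))  # ['open', 'prediction.mp3']
```

To actually play the file on a local machine, the returned list could be passed to `subprocess.run`.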

#### 16. Let's put it all together

We can put all of these steps together in a function to make it easier to test more images.

In [38]:
# Define a function that automates the process of loading, processing, and predicting the class of an image
def whats_for_lunch(image): 
    myimage = load_img(image, target_size = (224, 224))
    myimage = img_to_array(myimage)
    myimage = myimage.reshape((1, 224, 224, 3))
    myimage = preprocess_input(myimage)
    myresult = mymodel.predict(myimage)
    mylabel = decode_predictions(myresult)
    toplabel = mylabel[0][0]

    if toplabel[1] == 'pizza':
        sayit = "Amelia, a participant brought " + toplabel[1] + ", add it to the list!"
    else:
        sayit = "Amelia, this is an image of " + toplabel[1] + ", you can go back to work."

    myobj = gTTS(text = sayit)

    return mylabel, myobj

In [39]:
label, soundclip2 = whats_for_lunch('squash_test.jpg')
print(label)

soundclip2.save("prediction2.mp3")

[[('n02281787', 'lycaenid', 0.26136062), ('n03804744', 'nail', 0.11522047), ('n02277742', 'ringlet', 0.055332493), ('n07836838', 'chocolate_sauce', 0.051150553), ('n07684084', 'French_loaf', 0.034306075)]]


Amelia should test with non-pizza meals too to make sure her system is working. Let's try this burger!

![Photo of a hamburger](images/hamburger.jpg)

In [None]:
label, soundclip3 = whats_for_lunch('images/hamburger.jpg')
print(label)

soundclip3.save("prediction3.mp3")

It looks like Amelia's classifier is working well as it predicted that the image was a cheeseburger. 

But what about food that looks a bit like pizza, but isn't? How about this quiche? Does it fool the AI? Can you find images that trick Amelia's image recognition model?

![A photo of quiche](images/quiche.jpg)

In [None]:
label, soundclip4 = whats_for_lunch('images/quiche.jpg')
print(label)

soundclip4.save("prediction4.mp3")

# Bonus Exercises:

1. This lunch image recognizer seems to recognize only one kind of lunch: pizza. Change the code to recognize each of the different kinds of food in ImageNet.
2. Change the code so that the lunch recognizer recognizes drinks, desserts, and main items generally.
3. Change the code so that if the confidence of the model's prediction is less than 75% (a probability of 0.75), the code output says it's not sure what the image is.
4. Change the code so that each image that's loaded has its label added to a list that is saved to a text file.