![Practicum AI Logo image](https://github.com/PracticumAI/practicumai.github.io/blob/main/images/logo/PracticumAI_logo_250x50.png?raw=true) <img src='https://github.com/PracticumAI/practicumai.github.io/blob/main/images/icons/practicumai_beginner.png?raw=true' align='right' width=50>
***
# *Practicum AI:* Deep Learning Basics


This exercise is adapted from Baig et al. (2020) <i>The Deep Learning Workshop</i> from <a href="https://www.packtpub.com/product/the-deep-learning-workshop/9781839219856">Packt Publishers</a> (Exercise 1.01, page 7).


## Deep learning for image recognition


Before we dive into the details of exactly _how_ deep learning works, let's explore it through an example. In this exercise, we will use a pre-trained deep learning model, [ResNet50](https://arxiv.org/abs/1512.03385), which has been trained on [ImageNet](https://image-net.org/), a collection of about 1.3 million images labeled as being in one of 1,000 categories.


We'll be using image recognition to identify... Squash. Lucky for us, ImageNet is something of a squash aficionado.


For this exercise, recall the AI Application Development Pathway:


![Practicum AI Application Pathway Image](https://github.com/PracticumAI/deep_learning_2_draft/blob/main/AI%20Application%20Pathway.png?raw=true) <img src='https://github.com/PracticumAI/deep_learning_2_draft/blob/main/AI%20Application%20Pathway.png' width=10 align='right' height=1>


Step 1: Choose a Problem! Due to the flexible nature of coding, implementing the next six steps will jump around a bit. Here's an overview of the steps in the application development process and how they correspond to the code in this Jupyter Notebook:
1. Choose a Problem - Making a squash-recognizer!
2. Gather Good Data - We're going to "cheat", and use a model that's already trained to identify images with squash in them.
3. Clean and Prep Data - This is already done for us with the pre-trained model.
4. Choose a Model - We need a model that's already trained, and one that recognizes images. That narrows our search to models like ResNet.
5. Train the Model - Done! We'll be up and running with an AI application without needing to compile or train anything. Magic!
6. Evaluate the Model - As part of the evaluation process, we are going to need to test the model to see how well it recognizes squash.
7. Deploy the Model - We're going to leave the application here in this Jupyter Notebook. Embedding the model in another application is unnecessary (and beyond the scope of this course!)


#### 1. Import libraries


Import the necessary libraries. For this exercise, we will use the pre-trained ResNet50 model that is part of Keras: `from tensorflow.keras.applications.resnet50 import ResNet50`. Check out the [Keras documentation](https://www.tensorflow.org/api_docs/python/tf/keras/applications/resnet50/ResNet50) for more details.

In [None]:
# Import necessary libraries for image processing and deep learning. The image processing functions, like img_to_array, will help format the images to run through our model.

from tensorflow.keras.preprocessing.image import load_img
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet50 import preprocess_input
from tensorflow.keras.applications.resnet50 import decode_predictions

# Import base tensorflow and set seed to achieve consistent results.
import tensorflow as tf 
import numpy as np

seed = 42  # Set the seed for reproducibility

tf.random.set_seed(seed)
np.random.seed(seed)


#### 2. Instantiate the ResNet50 model

Instantiate the ResNet50 model as a variable. Instantiating is a programming term that means taking the 'blueprint' of something (in this case, ResNet50) and making an object out of it (the model we're going to use here). This step creates the instance of the model to use.

> &#x1F4DD;  Some Background on ResNet: 
> ResNet, short for Residual Network, won the 2015 ImageNet competition. It was introduced to address the vanishing gradient problem commonly faced when training very deep neural networks. As networks become deeper, gradients (values used to update network weights) can become extremely small, effectively halting training. 
> ResNet introduces the concept of "residual blocks." As it processes data, instead of relying solely on the current "thought" or layer, it can also "refer back" to earlier layers, much like using recent memories to help recall older ones. These "references back" are called skip connections. They act like bridges, letting the network jump over some layers so that, even as it delves deeper into processing, it doesn't forget or lose important early details. These skip connections also allow gradients to propagate more easily through the network. 
> This architectural innovation has enabled the training of networks with depths previously thought unfeasible. ResNet models, with hundreds or even thousands of layers, have achieved state-of-the-art performance on many image classification benchmarks. 
> In this unit’s exercise we use the ResNet50 model, which, as its name suggests, consists of 50 layers.

<div style="padding: 10px; margin-bottom: 20px; border: thin solid #E5C250; border-left-width: 10px;background-color: #fff"><strong>Tip:</strong> You will likely see some output highlighted in red. While red is used for errors, it is also used for warnings. It can take some getting used to, but red is OK in this case...</div>

```python
mymodel = ResNet50() # Create an instance of the ResNet50 model pre-trained on ImageNet data
```
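
To make the skip connections from the background note above more concrete, here is a minimal sketch in plain Python. It is not how Keras implements ResNet50: the function names `layer` and `residual_block` are hypothetical, and a real residual block uses convolutions and batch normalization and applies its activation after the addition. The sketch only shows the core idea that a block outputs `F(x) + x`, so the input survives even when the layer itself contributes nothing.

```python
def layer(x, weights):
    """A toy 'layer': one weighted sum per output value, followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def residual_block(x, weights):
    """Return F(x) + x: the layer's output plus the unchanged input (the skip connection)."""
    fx = layer(x, weights)
    return [f + xi for f, xi in zip(fx, x)]


x = [1.0, -2.0, 3.0]
zero_weights = [[0.0] * 3 for _ in range(3)]  # a layer that has learned nothing

print(layer(x, zero_weights))           # [0.0, 0.0, 0.0] - the plain layer loses the input
print(residual_block(x, zero_weights))  # [1.0, -2.0, 3.0] - the skip connection carries it through
```

Because the input is added back unchanged, information (and, during training, gradients) has a direct path around the layer.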

In [None]:
# Code it!


#### 3. Load image

We'll need an image of a squash to test our model. Let's load a squash image.

Since ResNet50 was trained using images that are 224x224 pixels, we need to resize the input image to the same size.

<div style="padding: 10px; margin-bottom: 20px; border: thin solid #E5C250; border-left-width: 10px;background-color: #fff"><strong>Tip:</strong> The squash image is stored in the images folder, so the path you give must include that folder as well as the file name.
</div>

```python
myimage = load_img('images/squash_test.jpg', target_size = (224, 224)) # Load an image file for testing, resizing it to the required input size of 224x224 pixels
```


In [None]:
# Code it!


#### 4. View the squash image

Let's take a quick look at the image to verify that it's a squash. Type the variable name and run the code block.

```python
myimage
```

In [None]:
# Code it!


#### 5. Convert image to array

Convert the image to an array because the model expects it in this format.

```python
myimage = img_to_array(myimage) # Convert the loaded image to an array format suitable for processing
```

In [None]:
# Code it!


#### 6. Reshape image

Reshape the image.  All images fed to this model need to be 224 pixels high and 224 pixels wide, with 3 channels, one for each color (Red, Green, Blue).  If our image was greyscale, how many channels would we specify?

```python
myimage = myimage.reshape((1, 224, 224, 3)) # Reshape the image array to the format the model expects (batch size, height, width, color channels)
```
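
The reshape step only adds a leading "batch" dimension; the pixel values are untouched. As a rough sketch of that idea, the example below uses plain nested Python lists as a stand-in for the NumPy array, and a tiny 2x2 image instead of 224x224. The helper `shape_of` is hypothetical and exists only for this illustration.

```python
def shape_of(nested):
    """Return the shape of a regularly nested list, e.g. (height, width, channels)."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return tuple(shape)


# A tiny 2x2 RGB "image": 2 rows, 2 columns, 3 color values per pixel
tiny_image = [[[255, 0, 0], [0, 255, 0]],
              [[0, 0, 255], [255, 255, 255]]]

# Wrapping the image in a list creates a batch that contains one image
batch = [tiny_image]

print(shape_of(tiny_image))  # (2, 2, 3)
print(shape_of(batch))       # (1, 2, 2, 3)
```

The model works on batches of images, which is why a single image is passed as a batch of size 1.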

In [None]:
# Code it!


#### 7. Pre-process image

Execute the *preprocess_input()* function with the image.

```python
myimage = preprocess_input(myimage) # Preprocess the image to ensure its values are appropriate for the ResNet50 model
```
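
For the curious, here is a hedged sketch of what this preprocessing does to a single pixel. As far as the Keras documentation describes it, the ResNet50 version of `preprocess_input` converts the channels from RGB to BGR and then zero-centers each channel using the ImageNet channel means, without scaling. The function `preprocess_pixel` below is a hypothetical stand-in written for illustration, not the Keras implementation, and the mean values are stated here as an assumption about that "Caffe-style" preprocessing.

```python
# Assumed per-channel ImageNet means, in BGR order (Caffe-style preprocessing)
IMAGENET_MEAN_BGR = (103.939, 116.779, 123.68)


def preprocess_pixel(rgb):
    """Reorder one (R, G, B) pixel to (B, G, R) and subtract the per-channel mean."""
    r, g, b = rgb
    return tuple(value - mean for value, mean in zip((b, g, r), IMAGENET_MEAN_BGR))


# An orange-ish pixel: lots of red, some green, no blue
print(preprocess_pixel((255.0, 128.0, 0.0)))
```

After this step, pixel values are centered around zero rather than ranging from 0 to 255, which matches what the model saw during training.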

In [None]:
# Code it!


#### 8. Execute predict method

Execute the model's predict method.

```python
myresult = mymodel.predict(myimage) # Use the model to predict the class (or category) of the image
```

In [None]:
# Code it!


#### 9. Get prediction label

The model's predict method returns an array of probabilities, one for each of the 1,000 categories. Convert these to their corresponding text labels.

```python
mylabel = decode_predictions(myresult) # Decode the prediction result to get human-readable class labels
```
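
Conceptually, decoding means ranking the categories by probability and looking up their names. The sketch below illustrates this with a made-up four-class example; `decode_top_k`, `toy_classes`, the placeholder IDs, and the probabilities are all hypothetical and are not the Keras implementation or real model output. The real `decode_predictions` returns one such list per image in the batch.

```python
def decode_top_k(probabilities, classes, k=5):
    """Return the k most probable classes as (class_id, label, probability) tuples."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: probabilities[i],
                    reverse=True)
    return [(classes[i][0], classes[i][1], probabilities[i]) for i in ranked[:k]]


# Hypothetical stand-in for the 1,000 ImageNet classes; the IDs are placeholders
toy_classes = [("id_0", "zucchini"),
               ("id_1", "butternut_squash"),
               ("id_2", "pizza"),
               ("id_3", "banana")]
toy_probabilities = [0.10, 0.80, 0.03, 0.07]

top = decode_top_k(toy_probabilities, toy_classes, k=2)
print(top)  # [('id_1', 'butternut_squash', 0.8), ('id_0', 'zucchini', 0.1)]
```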

In [None]:
# Code it!


#### 10. Assign list item to variable 

Assign the first item listed by the prediction to a variable - this is the label with the highest probability.

```python
# Extract the label with the highest predicted probability. 
# Recall that in Python, all indexes start at 0, so the [0][0] indexing retrieves the first prediction from the first batch of results.
mylabel = mylabel[0][0] 
```

In [None]:
# Code it!


#### 11. Embed label 

Embed the label in a sentence and then print it.

```python
# The 'mylabel' variable contains information about the prediction in the format (ID, Label, Probability).
# Using 'mylabel[1]' extracts the human-readable label (e.g., 'butternut_squash') for the predicted class.
print("This is an image of a " + mylabel[1]) # Print the predicted class label in a formatted string

```
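
As an optional variation, the three parts of the label can be unpacked into named variables, which makes it easy to tidy up the name and include the probability. This is only a sketch: `example_label` is a hypothetical tuple in the (ID, Label, Probability) format, with a placeholder ID and a made-up probability rather than real model output.

```python
# Hypothetical prediction tuple in the (ID, Label, Probability) format
example_label = ("id_1", "butternut_squash", 0.8)

# Unpack the tuple into three named variables
class_id, name, probability = example_label

# Replace the underscore so the label reads naturally in a sentence
readable_name = name.replace("_", " ")

sentence = f"This is an image of a {readable_name} ({probability:.0%} confidence)"
print(sentence)  # This is an image of a butternut squash (80% confidence)
```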

In [None]:
# Code it!


<div style="padding: 10px;margin-bottom: 20px;border: thin solid #E5C250;border-left-width: 10px;background-color: #fff"><strong>Tip:</strong> Although we use an image of a squash here, you can use just about any image with this model. Try out this exercise multiple times with different images to see if you can fool it. The <a href='https://raw.githubusercontent.com/PracticumAI/deep_learning/main/resnet_labels.txt'>resnet_labels.txt</a> file lists all the categories this model is trained to recognize.</div>

#### 12. Create a speech sentence

Create a longer sentence to convert to speech. We want our model to output an audio file to tell us the results because, why not?

```python
sayit = "This is an image of a " + mylabel[1] + " in full living color."
```

In [None]:
# Code it!


#### 13. Import gtts libraries

Import the required libraries. gTTS (Google Text-to-Speech) is an open source Python library that interfaces with Google Translate's cloud-based text-to-speech API to convert text to speech.

In [None]:
%pip install gTTS
from gtts import gTTS
import os


#### 14. Execute the gtts function

Pass the sayit variable to the gTTS API.

```python
myobj = gTTS(text = sayit)
```

In [None]:
# Code it!


#### 15. Save the audio file

gTTS will convert the string you gave it into an audio file. Save the audio file. The default location is the current directory.

```python
myobj.save("prediction.mp3") # Save the audio file in the current directory.
```

In [None]:
# Code it!


<div style="padding: 10px;margin-bottom: 20px;border:  thin solid #30335D; border-left-width: 10px;background-color: #fff"><strong>Note:</strong> Download the .mp3 file from Atlas and listen to it on your computer. The audio file can be found in the same folder as this notebook.</div>

#### 16. Let's put it all together

We can put all of these steps together in a function to make it easier to test more images.

In [None]:
# Define a function that automates the process of loading, processing, and predicting the class of an image
def whats_this_image(image): 
    myimage = load_img(image, target_size = (224, 224))
    myimage = img_to_array(myimage)
    myimage = myimage.reshape((1, 224, 224, 3))
    myimage = preprocess_input(myimage)
    myresult = mymodel.predict(myimage)
    mylabel = decode_predictions(myresult)
    toplabel = mylabel[0][0]

    if toplabel[1] == 'butternut_squash':
        sayit = "Researcher, this is a " + toplabel[1] + ", you can breathe a sigh of relief!"
    else:
        sayit = "Researcher, this is a " + toplabel[1] + ", time to panic!"

    myobj = gTTS(text = sayit)

    return mylabel, myobj

In [None]:
label, soundclip2 = whats_this_image('images/definitely_not_squash.jpg')
print(label)

soundclip2.save("prediction2.mp3")

You should test with other images too to make sure the system is working. Find an open source image online or upload a picture of yours and give it a try.

In [None]:
label, soundclip3 = whats_this_image('images/your_images_name.jpg')
print(label)

soundclip3.save("prediction3.mp3")

It looks like our classifier is working well. You can also see if you can fool the classifier with images that look like squash but aren't. Find a squash-alike image and see how the classifier does:

In [None]:
label, soundclip4 = whats_this_image('images/your_squash_alike_images_name.jpg')
print(label)

soundclip4.save("prediction4.mp3")

# Bonus Exercises:

1. This squash image recognizer seems to recognize only one kind of squash... Change the code to recognize each of the different kinds of squash in ImageNet.
2. Change the code so that the squash recognizer can recognize the other kinds of squash in ImageNet as squash generally.
3. Change the code so that if the confidence of the model's prediction is less than 75% (a probability of 0.75), it says it's not sure what the image is.