# Image classification
## Objectives
- To learn how a convolutional neural network works
- To train a few image classification models using a web interface (Teachable Machine)
- To put those models to use in a python script

## Introduction
### The problem
Imagine the following scenario: you are trying to figure out whether there is a relationship between the medium in which an artwork is created and its subject matter. To do this, you need to see artworks -- lots of artworks. So you go to a few museums, but quickly realise you need a more systematic way of gathering your data. Plus, as is well known, the majority of a museum's collection is hidden away in the archives, not on display in the galleries. So you ask the museums whether they can give you digital facsimiles of their collections -- and you end up with over a million images on your hard drive. What do you do?
### The solution
Sure, you could go through each of the images one by one and classify them as being sculptures, or paintings, or drawings, and classify their subject matter as landscape, abstract, or portrait. You would get a high-quality dataset at the end of the process, but it would take you months to get there. You don't have months. You have two days.

This is where a computational approach can help you. Specifically, a subfield of Artificial Intelligence known as *computer vision*, which essentially allows automated systems to interpret the visual world in a human-like way. It is put to use in cutting-edge scenarios like self-driving cars -- to help distinguish between pedestrians, obstacles, and traffic lights, for example. If you have driven a car made in the last four or five years, you might have seen an example of this: when the dashboard shows you the speed limit for the road you are currently on, that is a real-world application of computer vision -- a camera at the front of the car noticed and identified a speed limit sign. More nefarious uses of computer vision also exist in the wild: [famously China uses security cameras to track the movements of its citizens](https://www.npr.org/2021/01/05/953515627/facial-recognition-and-beyond-journalist-ventures-inside-chinas-surveillance-sta?t=1652430994420).

![Automatic traffic sign detection](imgs\speed_limiter_header.jpg)

Our use case is, however, more innocent (though still problematic) and much simpler than the complicated systems in place for state surveillance or self-driving cars. All we need is an image-level classification to identify a work's medium and subject matter. To do that, we need to teach a machine how to identify these two characteristics and we will do that using a point-and-click webservice that is aptly called [Teachable Machine](https://teachablemachine.withgoogle.com/).

## Teachable Machine, Machine Learning, and Convolutional Neural Networks
Google's [Teachable Machine](https://teachablemachine.withgoogle.com/) is a web app that allows users to easily train and test simple image and sound classification models. Its image classification models fine-tune the popular, pre-trained [MobileNet](https://arxiv.org/abs/1704.04861), a small, low-latency, low-power model designed to run on mobile devices. By *model* here, we mean a *machine learning* model.

### Machine Learning
*Machine learning* is a subfield of AI dedicated to allowing computer systems to learn about a dataset and then make predictions *based on that learning.* For example, if we have access to all the past data relating to loan applications, we can use that to *predict* whether a particular applicant is likely to be granted or refused a loan.

Essentially, *machine learning* as a field of study tries to find the best ways in which computer systems can learn about the data. There are many different machine learning methods, but these can be roughly divided into two different groups: **supervised** and **unsupervised** machine learning.

- **supervised learning** means that the computer will learn from a dataset that has been, in some way, curated by a human -- usually for classification tasks in which we present examples of each category.

- **unsupervised learning** means that we won't tell the computer anything about the dataset -- usually for clustering or generative tasks.

MobileNet, used by Teachable Machine, is a specific type of supervised learning model known as a *Convolutional Neural Network.*

#### Convolutional Neural Networks
[Convolutional Neural Networks (CNN)](https://en.wikipedia.org/wiki/Convolutional_neural_network) are a type of algorithm inspired by brain architecture (hence the 'Neural'), and are commonly used to analyse visual imagery. The details are too complex to get into here, but essentially, you can imagine a series of 'neurons' arranged in layers; each layer of neurons is responsible for identifying a particular characteristic of the input image in increasing complexity -- for example, first edges, then surfaces, then what those surfaces are (the feature extraction phase), then putting it all together, then identifying what the object is (the classification phase).
![Convolutional Neural Network Diagram](imgs\convolutional_neural_network.png)
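
To make the idea of a layer 'identifying edges' slightly more concrete, here is a minimal sketch of the operation at the heart of a convolutional layer: sliding a small grid of weights (a *kernel*) over an image. The tiny image, the hand-written kernel, and the function name `convolve2d` are all illustrative assumptions; a real CNN such as MobileNet learns its kernels from data and stacks many of them.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding) and return the response map.

    As in most deep learning libraries, the kernel is not flipped, so this is
    strictly a cross-correlation.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w), dtype=np.float32)
    for row in range(out_h):
        for col in range(out_w):
            patch = image[row:row + kh, col:col + kw]
            output[row, col] = np.sum(patch * kernel)
    return output

# A 5x5 greyscale "image": dark on the left (0), bright on the right (1)
image = np.array([[0, 0, 0, 1, 1]] * 5, dtype=np.float32)

# A hand-written kernel that responds to dark-to-bright vertical edges
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=np.float32)

response = convolve2d(image, kernel)
print(response)
```

The response is zero where the window only covers the flat dark region and large where it straddles the edge, which is the sense in which an early layer 'detects' edges.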

When we fine-tune a CNN, what we do, essentially, is modify its last layer, i.e., the classification, so that the predicted results fall into the categories that we want.

### Teachable Machine walkthrough
Because Teachable Machine is a web-app, we can train our simple model easily through the browser, without having to worry too much about how it works under the hood (right now). So how do we do this?

1. Head over to [Teachable Machine](https://teachablemachine.withgoogle.com/)
2. Click 'Get Started'
3. Select 'Image Project'
4. Select 'Standard Image Project'

You should now see a simple diagrammatic view of what you need to do next: on the left, two windows entitled 'Class 1' and 'Class 2,' plus options to use the webcam or upload files; in the middle, a window enticingly entitled 'Training,' and at the end, 'Preview.'

![Teachable Machine interface](imgs\TM_interface.png)




## Try the interface

We'll begin by trying the Teachable Machine interface on a dataset unrelated to our problem, [Caltech101](https://en.wikipedia.org/wiki/Caltech_101).

The complete dataset contains nearly 10000 images, split into 101 image-level categories (hence the 101 in the title of the dataset). Although this particular dataset is relatively small, 101 categories are a little much, and we won't need the full 10000 images to train our (very small) model, so we will use a subset of the data.

### Test data

You should see a folder called `caltech101_dataset`. Inside that folder, you should see five more folders: `airplanes`, `butterfly`, `Faces`, `Motorbikes`, and `sunflower` -- these are the categories we are going to be training our model for (5 of the 101 available in the entire dataset). Each of those folders has a number of random images from that category. Have a look at a few to get a sense of their diversity (or lack thereof), quality, size, etc. You might already begin to see a few problems with the data, but right now we are only interested in it insofar as it will allow us to quickly learn how to train a model using Teachable Machine.

### Upload the images to Teachable Machine

1. Head over to [Teachable Machine](https://teachablemachine.withgoogle.com/)
2. Click 'Get Started'
3. Select 'Image Project'
4. Select 'Standard Image Project'
5. Click the 'Upload' button in 'Class 1'
6. Select all the images inside the `airplanes` folder
7. Rename 'Class 1' to 'Airplanes'
8. Repeat the process (points 5-7) for the remaining categories

At the end of this process, you should see something like this:
![](imgs\TM_Categories.png)

You might be able to spot a few more drawbacks with this dataset. For example, there are a lot more samples of Airplanes or Motorbikes than of Sunflowers or Butterflies. We might see later on how this affects our model's ability to categorise an image.

#### What happens when you train a model

From here, we could just click the 'Train Model' button and be off to the races, but before we do that, there are a few parameters in the 'Advanced' tab that are worth looking through. These are:

- `Epochs`
- `Batch Size`
- `Learning Rate`

Before we explore what each of these terms means, it will be good to have a sense of what happens when we *train* or *fine-tune* a model. 

When we train a model, we are feeding the algorithm each image of our dataset; for each image, the algorithm will put it through the various layers we discussed above in order to examine their characteristics, and then attribute those characteristics to the class we categorise the image as -- the more images we feed the algorithm, the easier it will be for it to distinguish between essential characteristics (i.e., the ones that define an airplane as an airplane), and the non-essential characteristics (i.e., things that are part of the image that do not make an airplane).

Another thing that happens during training is that the dataset is further divided into a `training set` and a `validation set`. This means that a certain percentage of the dataset is held back to validate the results of the training. In Teachable Machine this is done automatically for you, with 85% of the samples used for training and 15% held back. Once we train the model, we can explore a little more what the point of this division is.

With this very short and high-level explanation in mind, here's what those terms mean:

- `Epochs` are the number of times the entire dataset is fed to the algorithm. Once all the images have been analysed and classified, we have one epoch. You would tweak this number to try to improve the predictive capabilities of your algorithm. (Though be careful of [*overfitting*](https://www.ibm.com/cloud/learn/overfitting))

- `Batch Size` defines how many samples are fed to the algorithm at the same time; typically, you'd want to tweak this value if you are finding problems with system performance (e.g., it's taking too long to retrain the model).

- `Learning Rate` is a parameter that -- metaphorically -- defines how quickly the model learns about the data. Small tweaks to this value will influence the predictive capabilities of the model. What `Learning Rate` actually *is* is a little more complex; you can start by [reading the Wiki article about it](https://en.wikipedia.org/wiki/Learning_rate).
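
To see where these three parameters sit in an actual training loop, here is a toy sketch, assuming a 'model' with a single weight fitted by mini-batch gradient descent. It is not what Teachable Machine runs internally; the function name `train`, the toy data, and the default values are all made up for illustration.

```python
import random

def train(samples, epochs=50, batch_size=2, learning_rate=0.01):
    """Fit y = weight * x to (x, y) samples with mini-batch gradient descent."""
    weight = 0.0
    samples = list(samples)
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    for epoch in range(epochs):  # one epoch = one full pass over the data
        rng.shuffle(samples)
        for start in range(0, len(samples), batch_size):
            batch = samples[start:start + batch_size]  # `batch_size` samples at a time
            # Average gradient of the squared error over the batch
            gradient = sum(2 * (weight * x - y) * x for x, y in batch) / len(batch)
            weight -= learning_rate * gradient  # step size set by the learning rate
    return weight

# Toy data following y = 3x, so the ideal weight is 3
toy_data = [(x, 3 * x) for x in range(1, 9)]
learned_weight = train(toy_data)
print(f"learned weight: {learned_weight:.3f}")
```

Raising `learning_rate` too far in this sketch makes the weight overshoot and diverge, while lowering it means more epochs are needed to get close to 3; the same trade-off applies, in a much higher-dimensional form, to a real network.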

### Train the model

Click the 'Train Model' button, and *do not change tabs or close the browser*. You will see a progress bar that will count through the `epochs` we've defined earlier. At the end of this process, the 'Preview' window will become active, and we will be able to test how capable our model is at sorting images into the categories we've defined above.

### Test the model

Now, finally, it is time for the fun stuff. [Search for a few images for each category](https://images.google.com/) and see whether the model categorises them accurately. To test each image, make sure that the 'Input' radio button is active, and that 'File' is selected in the dropdown menu. Try to find images that would lead to a miscategorisation, and try to understand why they would be miscategorised. How would you tweak the training to correct that? Finally, click on the 'Under the hood' button in the 'Training' window, and have a look at the statistical results of your training. Try to make sense of them, particularly the 'Confusion Matrix', which is one of the main tools for assessing a model's performance.
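
If the confusion matrix is new to you, the following sketch shows how one is built from a list of true labels and a list of predicted labels. The labels are invented for illustration, and the function name `confusion_matrix` is our own; in this sketch rows are the true class and columns the predicted class.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Count predictions: rows are the true class, columns the predicted class."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, guess in zip(true_labels, predicted_labels):
        matrix[index[truth]][index[guess]] += 1
    return matrix

classes = ['Airplanes', 'Butterfly', 'Sunflower']
# Made-up labels for six test images, purely for illustration
true_labels = ['Airplanes', 'Airplanes', 'Butterfly', 'Butterfly', 'Sunflower', 'Sunflower']
predicted_labels = ['Airplanes', 'Airplanes', 'Butterfly', 'Sunflower', 'Sunflower', 'Sunflower']

matrix = confusion_matrix(true_labels, predicted_labels, classes)
for name, row in zip(classes, matrix):
    print(f"{name:<10} {row}")
```

Correct predictions land on the diagonal. Anything off the diagonal tells you which class is being mistaken for which: here, one butterfly was labelled as a sunflower.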

## Do it all over again

<div class = "alert alert-block alert-info">*You can skip this section if all you are interested in is putting the models to use: you should already have a `models` folder with a couple of pre-trained models. If you want to train the models yourself, [all the training data is available in this `.zip` file](https://drive.google.com/file/d/1YtRBLjBSzzFcuGVmiOznuwF1MoSolQnJ/view?usp=sharing)*</div>

Now that we are relatively familiar with what *training* is, and we know how to work with Teachable Machine, it's time to train a couple of models that can actually be useful to our problem -- i.e., cataloguing a collection of artworks by medium and subject matter.

A good model starts with good training data, and while we cannot guarantee that the data we're using is *good*, we can make sure it is adequate for our purposes. (The only reason we are not too concerned with the quality of our training set is that we are just doing this as a tutorial -- if we were to put this model to actual use, ensuring the quality of the training data would be paramount.)

In the zip file you've downloaded above, there should be two other folders. One, `art_media_dataset`, will be used to train a model to identify the medium of the artwork; the other, `Wiki_Art_dataset` will be used to train a model capable of identifying the subject matter. The steps we will be taking are the same in both cases, and are the same as the ones we took for our caltech101 model above:

1. Upload the contents of each folder inside the `art_media_dataset` to its own category
2. Train the model (this will take a few minutes)
3. Test the model in browser (with a random google image)
4. Export the model to your machine:
   1. Click the 'Export Model' button at the top of the 'Preview Window'
   2. Select the 'Tensorflow' tab (*not* Tensorflow.js)
   3. Make sure you have 'Keras' selected in the 'Model Conversion Type'
   4. Click 'Download my model' and save it to someplace where you can find it (converting the model will take a few minutes). The result will be a zip file containing two files: one will be a `.h5` which is the model itself, the other is a text file with the labels for each category. Rename the zipfile so that you can distinguish between both models.

Repeat the process for the `Wiki_Art_dataset`.



## Load the model

Now it's time to load the model. First we will need a few libraries:

In [24]:
# install pip package in the current jupyter kernel
import sys
!{sys.executable} -m pip install keras
!{sys.executable} -m pip install Pillow
!{sys.executable} -m pip install numpy

In [25]:
from keras.models import load_model
from PIL import Image, ImageOps
import numpy as np

The `keras.models` module is the one which will actually allow you to read the `.h5` file we saved earlier.

`PIL` (installed through the `Pillow` package) is a widely used Python library for image manipulation, of which we will need to do a little to get a prediction from the model.

Finally, `numpy` is another well-known library to work with scientific data. We will use this to create an input array for predictions.

Now it's time to actually load the models into memory:

In [26]:
# Load the model
model_medium = load_model('models/converted_keras_medium/keras_model_medium.h5', compile=False)

## Load an image and test the model

Now we will need to make sure the model is working, so we will test it with a single image.

Upload the `test` folder to your Google Colab, using the same methods as above.

We'll start by loading an image into memory:

In [27]:
image = Image.open('test/image3.jpg').convert('RGB') # opens the image and makes sure it has three colour channels

Now, we'll need to transform the image so that it is in the same format expected by the model. We will need to resize and crop it.

In [28]:
size = (224, 224) # defines the size of the image to 224x224 px
image = ImageOps.fit(image, size, Image.LANCZOS) # resizes the image to the size specified above and crops it into a square

Next, we'll need to turn the image into a numpy array, which is the data needed by the model to make the prediction.

In [29]:
image_array = np.asarray(image)
normalized_image_array = (image_array.astype(np.float32) / 127.0) - 1
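
As a quick sanity check on the arithmetic in that last line, the sketch below applies the same formula to the two extreme pixel values. It uses a made-up two-pixel array rather than the real image.

```python
import numpy as np

# Two extreme pixel values: pure black (0) and pure white (255)
pixels = np.array([0, 255], dtype=np.uint8)

# Same formula as above: divide by 127, then shift down by 1
scaled = (pixels.astype(np.float32) / 127.0) - 1
print(scaled)
```

Black maps to -1 and white to just over 1, so every colour channel ends up in roughly the range [-1, 1] rather than [0, 255].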

We now need to create the input array, and insert our new transformed image into it.

In [30]:
data = np.ndarray(shape=(1, 224, 224, 3), dtype=np.float32)
data[0] = normalized_image_array

Finally, we can get a prediction from the model:

In [31]:
prediction_medium = model_medium.predict(data)
print(prediction_medium)

[[5.3160015e-18 3.2935341e-19 1.3042406e-23 1.0000000e+00]]


This looks a little like gibberish, but it isn't. Remember that our model for medium had four classes ['Drawing', 'Engraving', 'Painting', 'Sculpture'] and what we have here are the values for each of those classes. What we need to do is find the highest value, and translate it to a class and confidence percentage.

In [32]:
category_medium = np.argmax(prediction_medium, axis = 1) # Returns an array with index of the maximum values
confidence_medium = prediction_medium[0][category_medium] # Returns an array with the maximum confidence value 
print(f"This is a {category_medium[0]} with confidence of {confidence_medium[0]}") # Prints the result

This is a 3 with confidence of 1.0


This is a little better, but still not ideal. What we want is to have a completely readable result, rather than a category number. Luckily, we have that `labels.txt` that we can put to good use now. We need to open it and turn it into a dictionary, so that we can print the actual category, rather than its index. So let's do that now.

In [33]:
labels_medium_dictionary = {}
with open('models/converted_keras_medium/labels_medium.txt') as labels_medium: # opens the file in a context manager
  for line in labels_medium:  # for each line in the file
    (key, value) = line.split(" ", 1) # splits the line into two (by whitespace); first value becomes the key, the second the value
    labels_medium_dictionary[int(key)] = value.rstrip("\n") # creates the entry in the dictionary

print(labels_medium_dictionary) # prints the result to the screen

{0: 'Drawing', 1: 'Engraving', 2: 'Painting', 3: 'Sculpture'}


Now that we have our dictionary, we can easily make our prediction much more readable:

In [34]:
print(f"This is a {labels_medium_dictionary[int(category_medium[0])]} with a confidence of {confidence_medium[0]}")

This is a Sculpture with a confidence of 1.0


We can now do the same for our genre model:

In [35]:
# Load the model
model_genre = load_model('models/converted_keras_genre/keras_model_genre.h5', compile=False)

In [36]:
# Create labels dictionary
labels_genre_dictionary = {}
with open('models/converted_keras_genre/labels_genre.txt') as labels_genre: # opens the file in a context manager
  for line in labels_genre:  # for each line in the file
    (key, value) = line.split(" ", 1) # splits the line into two (by whitespace); first value becomes the key, the second the value
    labels_genre_dictionary[int(key)] = value.rstrip("\n") # creates the entry in the dictionary

print(labels_genre_dictionary) # prints the result to the screen

# Get prediction
prediction_genre = model_genre.predict(data)
print(prediction_genre)

# Gets highest value prediction and confidence
category_genre = np.argmax(prediction_genre, axis = 1) # Returns an array with index of the maximum values
confidence_genre = prediction_genre[0][category_genre] # Returns an array with the maximum confidence value 

# Prints the result in readable form
print(f"This is a {labels_genre_dictionary[int(category_genre[0])]} with a confidence of {confidence_genre[0]}")


{0: 'Abstract', 1: 'Animal', 2: 'Cityscape', 3: 'Figurative', 4: 'Flower', 5: 'Genre Painting', 6: 'Landscape', 7: 'Marina', 8: 'Mythological', 9: 'Nude', 10: 'Portrait', 11: 'Religious', 12: 'Still Life', 13: 'Symbolic'}
[[1.7347479e-06 3.5765997e-04 1.4940852e-11 1.3129324e-01 1.5120938e-09
  7.5458574e-06 1.3809119e-08 3.8409483e-13 5.8085110e-02 2.2743238e-04
  8.0372030e-01 1.2844914e-14 3.5339461e-11 6.3070348e-03]]
This is a Portrait with a confidence of 0.8037203


Now, putting all the results together is just a question of tailoring the output:

In [37]:
print(f"Medium: {labels_medium_dictionary[int(category_medium[0])]}, Confidence: {confidence_medium[0]}\n \
Genre: {labels_genre_dictionary[int(category_genre[0])]}, Confidence: {confidence_genre[0]}")

Medium: Sculpture, Confidence: 1.0
 Genre: Portrait, Confidence: 0.8037203
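
Because the same steps (find the highest value, look up its label, read off the confidence) are repeated for each model, it can help to wrap them in a small function. The sketch below is one way to do that; `describe_prediction` is a hypothetical helper name, and the prediction values are stand-ins rather than real model output.

```python
import numpy as np

def describe_prediction(prediction, labels):
    """Return (label, confidence) for the first row of a model's output."""
    scores = np.asarray(prediction)[0]   # first (and here only) image
    best = int(np.argmax(scores))        # index of the highest score
    return labels[best], float(scores[best])

# Stand-in values shaped like the medium model's output
example_prediction = np.array([[0.02, 0.03, 0.05, 0.90]], dtype=np.float32)
example_labels = {0: 'Drawing', 1: 'Engraving', 2: 'Painting', 3: 'Sculpture'}

label, confidence = describe_prediction(example_prediction, example_labels)
print(f"{label} ({confidence:.0%})")
```

Passing the prediction and its matching labels dictionary in together also makes it harder to mix up the variables of one model with those of another.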


## Doing it over and over again

You might be thinking: why should I go through all this work just to check the results for one image, when we could do that just as easily through the web interface? That's a fair point, but remember, you're not doing all this for one image, you're doing it for a million! Luckily, writing a script to get classifications is just as easy to do for one as it is to do for one million. All we need is a little `for` loop.

Before that, we just need to collect some information about the names and number of files we have.

In [38]:
import os, os.path # imports the libraries to work with file paths

# We create a constant with the directory where the images we want to test are
IMAGE_DIR = "data_to_label/"

# Get a list of file names
file_names = [name for name in os.listdir(IMAGE_DIR) if '.jpg' in name]

# Get total number of files
nr_files = len(file_names)
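
Note that a test like `'.jpg' in name` also matches names such as `photo.jpg.bak`, and misses files ending in `.JPG`. If your own collection is less tidy than ours, a slightly stricter listing may be worth the extra lines. The following is a sketch using `pathlib`; the function name `list_images` and the set of extensions are assumptions you should adapt.

```python
import tempfile
from pathlib import Path

def list_images(directory, extensions=('.jpg', '.jpeg', '.png')):
    """Return sorted file names in `directory` whose extension is in `extensions`."""
    return sorted(
        path.name
        for path in Path(directory).iterdir()
        if path.is_file() and path.suffix.lower() in extensions
    )

# Try it on a throwaway folder containing a few empty, made-up files
with tempfile.TemporaryDirectory() as folder:
    for name in ('b.JPG', 'a.jpg', 'notes.txt', 'photo.jpg.bak'):
        (Path(folder) / name).touch()
    found = list_images(folder)

print(found)
```

Sorting the names also guarantees that the order of the files, and therefore of the predictions, is the same every time the notebook is run.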

Now that we know how many files we have, and what their names are, we can transform the images into a numpy array, like we did above. To make the code more readable, we begin by creating a little function that does that for us:

In [39]:
def transform_image(image_path):
  image = Image.open(image_path).convert('RGB') # opens the image and makes sure it has three colour channels
  size = (224, 224) # defines the size of the image to 224x224 px
  image = ImageOps.fit(image, size, Image.LANCZOS) # resizes the image to the size specified above and crops it into a square
  image_array = np.asarray(image)
  normalized_image_array = (image_array.astype(np.float32) / 127.0) - 1
  return normalized_image_array


Now we can create our input array, much like we did before, with one small difference: the first value will no longer be 1 (for a single image), but the number of images we want to test:

In [40]:
data = np.ndarray(shape=(nr_files, 224, 224, 3), dtype=np.float32)

We can test we've done it correctly by printing how long the array is:

In [41]:
print(len(data))

1054
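
Holding every image in one array is fine for a thousand files, but it is worth checking the arithmetic before trying it on a million. The sketch below estimates the memory a `float32` array of this shape needs and shows one possible way of splitting a list of file names into smaller groups. Both function names are hypothetical, and the file names are made up.

```python
def array_size_gb(nr_images, height=224, width=224, channels=3, bytes_per_value=4):
    """Memory needed for a float32 array of shape (nr_images, height, width, channels)."""
    return nr_images * height * width * channels * bytes_per_value / 1e9

def chunks(items, chunk_size):
    """Yield consecutive slices of `items`, each at most `chunk_size` long."""
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]

print(f"{array_size_gb(1054):.2f} GB for 1,054 images")
print(f"{array_size_gb(1_000_000):.0f} GB for 1,000,000 images")

example_names = [f"image_{i}.jpg" for i in range(10)]  # made-up file names
for group in chunks(example_names, 4):
    print(len(group), group[0])
```

Our 1,054 images need about 0.63 GB, but a million would need roughly 600 GB, far more than most machines have. In that situation you would build the array, predict, and store the results one group at a time.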


Now, the final step we need to take to create our input array is to iterate through our list of files and, for each one, normalise the image and add it to our data array:

In [42]:
for i, file in enumerate(file_names):
  file_path = IMAGE_DIR + file
  print(f"[{i}]: {file_path}") # just to check we are iterating correctly
  data[i] = transform_image(file_path)

[0]: data_to_label/Alfred_Sisley_1.jpg
[1]: data_to_label/Alfred_Sisley_10.jpg
[2]: data_to_label/Alfred_Sisley_11.jpg
[3]: data_to_label/Alfred_Sisley_12.jpg
[4]: data_to_label/Alfred_Sisley_13.jpg
[5]: data_to_label/Alfred_Sisley_14.jpg
[6]: data_to_label/Alfred_Sisley_15.jpg
[7]: data_to_label/Alfred_Sisley_16.jpg
[8]: data_to_label/Alfred_Sisley_17.jpg
[9]: data_to_label/Alfred_Sisley_18.jpg
[10]: data_to_label/Alfred_Sisley_19.jpg
[11]: data_to_label/Alfred_Sisley_2.jpg
[12]: data_to_label/Alfred_Sisley_20.jpg
[13]: data_to_label/Alfred_Sisley_21.jpg
[14]: data_to_label/Alfred_Sisley_3.jpg
[15]: data_to_label/Alfred_Sisley_4.jpg
[16]: data_to_label/Alfred_Sisley_5.jpg
[17]: data_to_label/Alfred_Sisley_6.jpg
[18]: data_to_label/Alfred_Sisley_7.jpg
[19]: data_to_label/Alfred_Sisley_8.jpg
[20]: data_to_label/Alfred_Sisley_9.jpg
[21]: data_to_label/Amedeo_Modigliani_1.jpg
[22]: data_to_label/Amedeo_Modigliani_10.jpg
[23]: data_to_label/Amedeo_Modigliani_11.jpg
[24]: data_to_label/Amed

Now, if we try to predict using this data array, we will see a slightly different result:

In [43]:
prediction_medium = model_medium.predict(data)
prediction_genre = model_genre.predict(data)
print(prediction_medium)

[[5.6839347e-01 4.2399037e-01 1.2071071e-03 6.4091063e-03]
 [8.8607979e-01 1.2016713e-04 1.1360569e-01 1.9432456e-04]
 [2.5517851e-02 9.2038503e-03 7.7084768e-01 1.9443069e-01]
 ...
 [9.9993742e-01 2.1748693e-07 6.2320600e-05 4.1531876e-15]
 [9.9901307e-01 6.7391945e-04 3.1302648e-04 7.8724093e-12]
 [9.9994648e-01 5.4980569e-12 5.3513220e-05 4.6586460e-11]]


Instead of having a single line with four values, we have multiple lines with four values each. You might have guessed it: each line corresponds to the predicted results for each image. So we now have to change our display code slightly, so that we are able to understand it. We iterate through our input files again, and print to console to make sure everything is readable.

In [44]:
category_medium = np.argmax(prediction_medium, axis = 1) # Returns an array with index of the maximum values for each file
category_genre = np.argmax(prediction_genre, axis = 1) # Returns an array with index of the maximum values for each file

for i, file in enumerate(file_names):
  medium = labels_medium_dictionary[int(category_medium[i])]
  confidence_medium = prediction_medium[i][category_medium[i]]*100 #to get a percentage value

  genre = labels_genre_dictionary[int(category_genre[i])]
  confidence_genre = prediction_genre[i][category_genre[i]]*100
  print(f"{file} -- Medium: {medium}({confidence_medium:.2f}%), Genre: {genre}({confidence_genre:.2f}%)")


Alfred_Sisley_1.jpg -- Medium: Drawing(56.84%), Genre: Landscape(99.97%)
Alfred_Sisley_10.jpg -- Medium: Drawing(88.61%), Genre: Landscape(99.99%)
Alfred_Sisley_11.jpg -- Medium: Painting(77.08%), Genre: Landscape(50.15%)
Alfred_Sisley_12.jpg -- Medium: Drawing(82.35%), Genre: Landscape(100.00%)
Alfred_Sisley_13.jpg -- Medium: Drawing(99.99%), Genre: Landscape(99.98%)
Alfred_Sisley_14.jpg -- Medium: Painting(98.85%), Genre: Landscape(100.00%)
Alfred_Sisley_15.jpg -- Medium: Painting(99.05%), Genre: Landscape(100.00%)
Alfred_Sisley_16.jpg -- Medium: Sculpture(94.87%), Genre: Cityscape(99.59%)
Alfred_Sisley_17.jpg -- Medium: Drawing(95.54%), Genre: Still Life(100.00%)
Alfred_Sisley_18.jpg -- Medium: Painting(94.89%), Genre: Landscape(100.00%)
Alfred_Sisley_19.jpg -- Medium: Painting(87.44%), Genre: Landscape(100.00%)
Alfred_Sisley_2.jpg -- Medium: Drawing(99.98%), Genre: Landscape(99.97%)
Alfred_Sisley_20.jpg -- Medium: Painting(100.00%), Genre: Landscape(98.73%)
Alfred_Sisley_21.jpg -- 

## Saving the output to a file

Finally, the downside of this type of output is that it becomes very hard to read the results on the screen. It would be much better if we could save it somewhere, and then read it later for analysis. That's exactly what we are doing next: saving the results to a `.csv` ('comma-separated values') file, which is no more than a simple table in text form.

We begin by creating the file in `append` mode, meaning, we can add stuff to it, rather than write over it:

In [45]:
with open('output.csv', 'a') as outfile:
  outfile.write('file_name,medium,medium_confidence,genre,genre_confidence\n')

Now, we simply change the `for` loop above to write to the file, instead of writing to the screen:

In [46]:
for i, file in enumerate(file_names):
  medium = labels_medium_dictionary[int(category_medium[i])]
  confidence_medium = prediction_medium[i][category_medium[i]]*100 #to get a percentage value

  genre = labels_genre_dictionary[int(category_genre[i])]
  confidence_genre = prediction_genre[i][category_genre[i]]*100
  with open('output.csv', 'a') as outfile:
    outfile.write(f"{file},{medium},{confidence_medium:.2f},{genre},{confidence_genre:.2f}\n")
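
Writing the commas by hand works as long as no file name or label contains a comma itself. As an alternative sketch, Python's built-in `csv` module takes care of quoting, and opening the file once in write mode avoids a second header row if the cell is run again. The function name `save_results`, the example rows, and the output path are all made up for illustration.

```python
import csv
import os
import tempfile

def save_results(rows, path):
    """Write (file_name, medium, medium_confidence, genre, genre_confidence) rows to `path`."""
    # Open the file once; 'w' replaces the output of any earlier run
    with open(path, 'w', newline='', encoding='utf-8') as outfile:
        writer = csv.writer(outfile)
        writer.writerow(['file_name', 'medium', 'medium_confidence', 'genre', 'genre_confidence'])
        for file_name, medium, medium_conf, genre, genre_conf in rows:
            writer.writerow([file_name, medium, f"{medium_conf:.2f}", genre, f"{genre_conf:.2f}"])

# Made-up rows; note the comma inside the first file name
example_rows = [
    ('Sisley, Alfred 1.jpg', 'Painting', 98.85, 'Landscape', 100.0),
    ('Amedeo_Modigliani_1.jpg', 'Painting', 91.2, 'Portrait', 87.456),
]
example_path = os.path.join(tempfile.gettempdir(), 'example_output.csv')
save_results(example_rows, example_path)

# Read the file back to check that the awkward file name survived
with open(example_path, newline='', encoding='utf-8') as infile:
    saved = list(csv.reader(infile))
print(saved[1])
```

The trade-off is that write mode discards previous results, so if you process your collection in several runs you would go back to append mode and write the header only once.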

## Conclusion

If you spend any time combing through the results we got above, you will probably conclude that they are not very good -- in fact, you are right, they are terrible. There's an old dictum that states that:

<blockquote class="twitter-tweet"><p lang="en" dir="ltr">Data Science is 99% preparation, 1% misinterpretation.</p>&mdash; Big Data Borat (@BigDataBorat) <a href="https://twitter.com/BigDataBorat/status/324892846685564930?ref_src=twsrc%5Etfw">April 18, 2013</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

and you can see both of those at play in the results we have here. You can improve the results by doing a few things:

### Improve the training data

Not to put too fine a point on it: the data we used to train both models is, at best, of dubious quality, and it simply isn't a very extensive training set. In both cases, we used a subset of a larger dataset which was, itself, of uncertain origin. With machine learning, if you put garbage in, you will get garbage out. This is the first point to address if you want to improve the results of the models; don't even think about reading the other strategies below until you've addressed it.

### Refine the categories

Again, the way in which the dataset was divided is questionable at best. That kind of classification is best left to the domain experts -- which we are not. You can refine the categories so that they overlap less than they do now and then, as per the point above, improve the training data to reflect that.

### Tweak the parameters: `epochs`, `learning rate`
Tweaking those parameters will give you the most immediate improvements (or deteriorations), but they will not fix the problems with your training data. A lot of trial and error, testing, and interpretation of results is involved in this tweaking, so be patient.

While the results may not be great, the method to get to them is sound. For this type of application, you've learned to:

1. Gather the data
1. Organise the data
1. Train the model
1. Get predictions from the model

Where this will take you next is your call!

## Acknowledgments and data sources

### Acknowledgments
This tutorial is greatly indebted to an (as yet unpublished) [Programming Historian](https://programminghistorian.org/) lesson, by [Nabeel Siddiqui](https://nabeelsiddiqui.net/). You can see the pre-publication draft [here](https://github.com/programminghistorian/ph-submissions/issues/414).

### Data sources

#### Caltech101
A subset of this dataset was acquired through manipulation of the version [available here](https://voxel51.com/docs/fiftyone/user_guide/dataset_zoo/datasets.html#caltech-101).

#### Art Images: Drawing/Painting/Sculptures/Engravings

A subset of this dataset was acquired through manipulation of the version [available here](https://www.kaggle.com/datasets/thedownhill/art-images-drawings-painting-sculpture-engraving).

#### WikiArt

A subset of this dataset was acquired through manipulation of the version [available here](https://www.kaggle.com/datasets/ipythonx/wikiart-gangogh-creating-art-gan).

#### Best Artworks of All Time

A subset of this dataset was acquired through manipulation of the version [available here](https://www.kaggle.com/datasets/ikarus777/best-artworks-of-all-time).