<a href="https://colab.research.google.com/github/debsarlin/afterschool/blob/master/DVLab_Workshop_2021_03_24.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Images as Data with the Distant Viewing Toolkit**

*Date*: 24 March 2021

# 1. Introduction

This document is a Jupyter notebook. It contains a mix of plain text (like this paragraph) and code in the open-source programming language Python. The notebook is hosted on Google's Colab platform, which allows us to run the code for free on a third-party system without installing Python and its many dependencies on our local machine. If you are interested in using these methods further, however, it should be possible to install all of this on your machine and run the code locally.

In this tutorial we will demonstrate two different computer vision algorithms using a subset of the functions available from the Distant Viewing Toolkit. We will look at a subset of 1000 images from the Harvard Art Museums' *American Professional Photographers* collection. This collection was assembled by Barbara Norfleet between 1975 and 1977 from forgotten negatives and prints held by over 25 professional photography studios. The photographs focus on social events that the studios were hired to photograph, such as weddings, parties, and beauty pageants. The entire collection can be found at [harvardartmuseums.org](https://harvardartmuseums.org/collections?q=%22American+Professional+Photographers+Collection%22).

To run the code below, all that is needed is to sign in to a Google account (you will be prompted to log in when you run the first code block below). For more information about this project, please see the [Distant Viewing Lab](https://www.distantviewing.org/), the [Collections as Data Website](https://collectionsasdata.github.io/), and the References section at the bottom of this page. Questions or issues with the code can be sent to Taylor Arnold (tarnold2@richmond.edu).

# 2. Install Python Modules

The Google Colab environment already has a running version of Python and several of the most common modules (third-party code that extends the basic language). We only need to install a few other pieces, such as Detectron2 and the Distant Viewing workshop code. To do this, hover over the code block below and click on the run button that appears in the upper left corner of the block.

In [None]:
!pip install -q pyyaml==5.1
!pip install -q Keras tensorflow
!pip install -q detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.8/index.html
!pip install -q git+https://github.com/distant-viewing/dvt_workshop.git
# exit(0) 

You may see one or two errors about extra dependencies. In our experience, these can be ignored for the moment. However, before continuing, it is important that you restart the Python runtime to properly finish the installation of the above components. To do this, click on the menu above and select the option **Runtime > Restart runtime**.


# 3. Download Data 

Now that we have the Python modules installed, we next need to download the image data and metadata. This can be done by running the following code block:

In [None]:
!wget https://distantviewing.org/appc_workshop_set.tar.gz -q -O appc_workshop_set.tar.gz
!wget https://distantviewing.org/appc_workshop_meta.csv -q -O appc_workshop_meta.csv
!tar -xf appc_workshop_set.tar.gz

Once finished, you will have a CSV file containing metadata about the collection as well as the 1000 digitized images that we will look at in this tutorial. 

# 4. Load Modules

As a final setup task, we need to load in all of the libraries that we will use in the workshop. Again, just run the following lines to load the modules.

In [None]:
# import some common libraries
import numpy as np
import pandas as pd
import cv2
from os.path import join
from google.colab.patches import cv2_imshow

# import functions from dvt_workshop
from dvt_workshop import (
    dvt_detectron_config,
    dvt_tidy_instance_data,
    dvt_show_instance_predictions,
    dvt_load_embed_image_model,
    dvt_embed_image
)

Ideally you will not see any output from the previous block, which means that everything loaded without a problem. If you did get an error, this likely indicates that a more serious problem has occurred.
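
If an import does fail, one way to narrow down the cause is to ask Python which modules it can locate. The block below is a minimal sketch using only the standard library; the helper name `find_missing` is our own and is not part of the workshop code. It checks top-level module names only, so it cannot confirm that submodules such as `google.colab.patches` are available.

```python
import importlib.util

# Top-level module names used in this section (submodules are not checked).
required = ["numpy", "pandas", "cv2", "dvt_workshop"]


def find_missing(names):
    """Return the names that Python cannot locate as importable modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = find_missing(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All modules were found.")
```

Any name reported as missing suggests that the installation step in Section 2 did not finish, or that the runtime was not restarted afterwards.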

# 5. Load Metadata


Now, let's load the metadata about the collection into Python. We do this using the function `pd.read_csv` and providing the name of the file that we downloaded above.

In [None]:
meta = pd.read_csv("appc_workshop_meta.csv")
meta

There is limited metadata here. Most important for us is the variable called *path*, which gives the filename associated with each image. We also see the approximate date when each image was taken and an indication of whether it is a digitized negative or a digitized print.
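
To illustrate how `pd.read_csv` turns a CSV file into a table, here is a small sketch that reads CSV text from memory rather than from disk. The rows and the column names `year` and `kind` are invented for illustration; only the column name `path` is taken from the workshop metadata.

```python
import io

import pandas as pd

# Hypothetical CSV text; only the "path" column name comes from the workshop file.
csv_text = """path,year,kind
img_000.jpeg,1935,negative
img_001.jpeg,1948,print
img_002.jpeg,1952,negative
"""

# read_csv accepts a file name or any file-like object such as StringIO.
toy_meta = pd.read_csv(io.StringIO(csv_text))

print(toy_meta.shape)    # (number of rows, number of columns)
print(toy_meta.path[1])  # file name stored in row 1

# Count how many rows fall into each category of the hypothetical "kind" column.
kind_counts = toy_meta["kind"].value_counts()
print(kind_counts)
```

The expression `toy_meta.path[1]` follows the same pattern as `meta.path[10]`, which is used later in the notebook to pick out a single image.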

# 6. Looking at an Image in Python

Let's grab an image from the corpus and display it in Python. To do this, we use the functions `cv2.imread` and `cv2_imshow`. The code is written to grab the image in row 10. You can change the number to anything between 0 and 999 to see a different image. (Note: If you get an error when running the code below, it is likely because you forgot to restart the runtime after running Section 2. To fix this, restart the runtime and run the code in Sections 3-5 again.)

In [None]:
im = cv2.imread(join("appc_workshop_set", meta.path[10]))
cv2_imshow(im)

Digital images are stored in Python as three large grids of numbers, one each for the red, green, and blue intensities of every pixel in the image. (OpenCV, the library behind `cv2.imread`, orders these grids as blue, green, red.)



We can see the size of the image by looking at the shape attribute:

In [None]:
im.shape

This image is 390 pixels high, 521 pixels wide, and has three color *channels* (red, green, and blue). The image is black and white, but technically coded as a color image that consists entirely of shades of gray. The original image has a higher resolution; we have created a smaller version here to make the code run quickly during the workshop.

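We can look at the first few rows and columns of the image. Here is the upper left-hand corner of the image's first channel. Because OpenCV orders the channels as blue, green, red, index 0 is technically the blue channel, but in a grayscale image all three channels hold the same values: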

In [None]:
im[:10, :10, 0]

A value of 0 indicates that the pixel is entirely black and a value of 255 indicates white (at least when the image is grayscale; colors in a color image are more difficult to describe directly). Looking at the pixels, we can see the black border of the image as the first four rows of pixels, followed by the relatively bright sky below it.
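
To make the grid-of-numbers idea concrete, the sketch below builds a tiny image by hand with NumPy. The 4-by-6 size and the pixel values are invented for illustration and are not taken from the collection.

```python
import numpy as np

# Hypothetical 4x6 image: a black border row on top, bright "sky" below.
height, width = 4, 6
gray = np.full((height, width), 200, dtype=np.uint8)  # light gray everywhere
gray[0, :] = 0                                        # first row is black

# Repeat the single grid three times to mimic a gray image stored as color.
toy = np.stack([gray, gray, gray], axis=-1)

print(toy.shape)       # (rows, columns, channels)
print(toy[:2, :3, 0])  # upper-left corner of the first channel

# In a gray image, every channel holds identical values.
channels_match = (
    np.array_equal(toy[:, :, 0], toy[:, :, 1])
    and np.array_equal(toy[:, :, 1], toy[:, :, 2])
)
print(channels_match)
```

Slicing works the same way on the real image: the first index selects rows, the second selects columns, and the third selects a channel.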

# 7. Running Object Segmentation

Now, let's do some computer vision with the data, starting with image segmentation. Thing-stuff segmentation does not work well with grayscale images, but object detection does, so we will use an object detection model here. We will load this model using the function `dvt_detectron_config` (it will automatically download some additional data the first time it is run).

In [None]:
dvt_image_segment, md = dvt_detectron_config()

Once we have the model loaded, we will run the function `dvt_image_segment` to detect objects and their locations within the image. Our function `dvt_tidy_instance_data` collects the objects into a rectangular dataset. We see that the algorithm has detected a number of different people, which matches our understanding of the image.

In [None]:
instances = dvt_image_segment(im)
dvt_tidy_instance_data(instances, md, meta.path[10])  # label rows with the path of the image loaded above

We can further visualize the predictions by drawing the detected objects over the image with the function `dvt_show_instance_predictions`.

In [None]:
dvt_show_instance_predictions(instances, md, im)

We see that the algorithm does a very good job of detecting and locating the people in an image that is both low-resolution and much older than the data used to train the model.

# 8. Image Segmentation Across the Corpus

Now that we know how to detect people and objects within an image, let's loop over the entire collection to detect objects in all 1000 images. The following block runs the same steps as above, but over every row of the metadata. It prints out a running summary after every 25 images to show that it is still running. The runtime depends a bit on how many people are using the Google servers, but the loop usually finishes in just a few minutes.

In [None]:
df = []
for i, path in enumerate(meta.path):  # loop over every image, including row 0
    im = cv2.imread(join("appc_workshop_set", path))
    instances = dvt_image_segment(im)
    df.append(dvt_tidy_instance_data(instances, md, path))
    if (i % 25) == 0:
        print("Done with {0:d} of {1:d}".format(i, meta.shape[0]))


df = pd.concat(df)  # combine the per-image tables into one

Looking at the output, we see that a large number of people and objects have been detected.

In [None]:
df

Just to get a sense of what has been detected, let's find those images that contain a high number of certain objects.

In [None]:
df.value_counts(subset=["path", "class"])

We can then look at the results. For example, this image of a ship with dozens of sailors looking over the railing has 42 people.

In [None]:
im = cv2.imread(join("appc_workshop_set", "137348_42928874_INV010704P.jpeg"))
instances = dvt_image_segment(im)
dvt_show_instance_predictions(instances, md, im)

We can then look at another image of a woman sitting at a desk. We detected 36 books on the bookshelves.

In [None]:
im = cv2.imread(join("appc_workshop_set", "122462_20428547_INV165733P.jpeg"))
instances = dvt_image_segment(im)
dvt_show_instance_predictions(instances, md, im)

Take a moment to look at what was and was not detected in these images. Notice that the algorithm is quite good at detecting people but struggles more with other kinds of objects (did you see the "mouse" on the desk in the second image?). We tend to use it most for detecting people, and even then do so with the understanding that it will both miss some people in the images and mistakenly detect people in others.
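
The counting step used earlier in this section can be tried on a toy table. In the sketch below, the column names `path` and `class` match those used with `df.value_counts` above, while the rows themselves are invented; the real table may contain additional columns.

```python
import pandas as pd

# Hypothetical detections: one row per detected object.
toy = pd.DataFrame({
    "path": ["a.jpeg", "a.jpeg", "a.jpeg", "b.jpeg", "b.jpeg", "c.jpeg"],
    "class": ["person", "person", "book", "person", "book", "book"],
})

# Count how often each (image, class) pair occurs, largest counts first.
counts = toy.value_counts(subset=["path", "class"])
print(counts)

# Keep only the rows for people and find the image with the most of them.
people = toy[toy["class"] == "person"]
people_per_image = people.groupby("path").size()
most_people = people_per_image.idxmax()
print(most_people)
```

Filtering on a single class before counting, as done here for people, is one way to focus on the category that the detector handles most reliably.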

# 9. Image Embeddings

As a second model, let's run an image embedding algorithm. This associates an image with a sequence of numbers. The specific numbers do not have an intrinsic meaning, but images with similar values will share common features. To start, we load the embedding model with the function `dvt_load_embed_image_model`. As with the previous model, this will download additional parts of the model the first time it is run.

In [None]:
dvt_embed = dvt_load_embed_image_model("fc1")

To compute the embedding, we run the function `dvt_embed_image`, which returns the sequence of numbers:

In [None]:
em = dvt_embed_image(join("appc_workshop_set", meta.path[22]), dvt_embed)
em

Looking at the shape of the output, we see that the embedding assigns 4096 numbers (2^12) to the image.

In [None]:
em.shape

Embeddings are not very interesting on their own. Let's compute the embedding for the entire corpus by looping over each row of our metadata. As with the example above, this will print out the progress after every 25 images, and it takes a couple of minutes on average to finish.

In [None]:
em_list = []
for i, path in enumerate(meta.path):
    em = dvt_embed_image(join("appc_workshop_set", path), dvt_embed)
    em_list.append(em)
    if (i % 25) == 0:
        print("Done with {0:d} of {1:d}".format(i, meta.shape[0]))


em = np.vstack(em_list)

The output now has 1000 rows (one for each image) and 4096 columns (to contain the embedding for each image).

In [None]:
em.shape

Finally, we can use these values to see which images in the collection are most similar to one another.
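
The stacking step at the end of the loop above can be illustrated on a small scale. This sketch assumes each embedding is a one-dimensional array and uses random vectors of length 4 as stand-ins for the real embeddings of length 4096.

```python
import numpy as np

# Hypothetical embeddings: three images, four numbers each.
rng = np.random.default_rng(0)
toy_list = [rng.normal(size=4) for _ in range(3)]

# np.vstack places each vector on its own row of a two-dimensional array.
toy_em = np.vstack(toy_list)
print(toy_em.shape)  # (3, 4)

# Row i of the stacked array is the embedding of image i.
row_matches = np.allclose(toy_em[1], toy_list[1])
print(row_matches)
```

Keeping one row per image means that row numbers in the embedding array line up with row numbers in the metadata table.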

# 10. Recommendation System

For this last section, let's build a basic recommendation system for our collection of images. Given a starting image, this system will find the other images in the corpus that are most similar to it in terms of their embeddings. To start, let's set up the plot area in Python to display an array of images all at once by running the following code.

In [None]:
import matplotlib.pyplot as plt
import matplotlib.patches as patches

plt.rcParams["figure.figsize"] = (16,12)

Now, the code below selects an image by its row number and displays the nine images with the most similar embeddings. Each image is always closest to itself, so the image in the upper left of the output is the reference image and the remaining eight are its nearest neighbors. You should experiment with setting the variable `ref_img_num` to different values between 0 and 999 to see how the algorithm works for different starting images.

In [None]:
ref_img_num = 612
idx = np.argsort(np.sum(np.abs(em - em[ref_img_num, :])**2, axis=1))[:9]

for ind, i in enumerate(idx):
    plt.subplots_adjust(left=0, right=1, bottom=0, top=1)
    plt.subplot(3, 3, ind + 1)

    img = cv2.imread(join("appc_workshop_set", meta.path.values[i]))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; matplotlib expects RGB
    plt.imshow(img)
    plt.axis("off")

Hopefully you can see some interesting patterns in the data. We found that the embedding tends to group images based on the number of people in the image and their relative sizes within the camera's frame. It also picks up on some common objects. We admit that not all of the results are perfect, with some images not having much in common with one another. This is due to the relatively small size of this subset of the collection. When run on larger collections, there is more room for each image to have very similar neighbors. You can see an example of this in action on our digital public project [Photogrammar](https://photogrammar.org).
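
The ranking logic behind the recommendation code can be checked on a toy example. The sketch below wraps the same squared Euclidean distance and `np.argsort` combination in a helper function; the name `nearest_images` and the two-number embeddings are our own inventions for illustration.

```python
import numpy as np


def nearest_images(em, ref, k):
    """Return the row indices of the k embeddings closest to row `ref`."""
    # Squared Euclidean distance from the reference row to every row.
    dist = np.sum((em - em[ref, :]) ** 2, axis=1)
    # Stable sort, so ties keep their original row order.
    return np.argsort(dist, kind="stable")[:k]


# Hypothetical 2-number embeddings for five images.
toy_em = np.array([
    [0.0, 0.0],
    [5.0, 5.0],
    [0.1, 0.0],
    [4.0, 5.0],
    [1.0, 1.0],
])

print(nearest_images(toy_em, ref=0, k=3))  # [0 2 4]
```

The reference row comes first because its distance to itself is zero. Squaring the distance does not change the ordering, which is why the square root can be skipped.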

# References

For more information about our work and projects, please see the following papers and projects:

- T. Arnold, P. Leonard, and L. Tilton (2017). "Knowledge creation through recommender systems." *Digital Scholarship in the Humanities*, Volume 32, Issue Supplement_2, December 2017, Pages ii151–ii157.
 [DOI: 10.1093/llc/fqx035](https://doi.org/10.1093/llc/fqx035).
- T. Arnold and L. Tilton (2019). "Distant Viewing: Analyzing Large Visual Corpora." *Digital Scholarship in the Humanities*, Volume 34, Issue Supplement_1, December 2019, Pages i3–i16.
 [DOI: 10.1093/llc/fqz013](https://doi.org/10.1093/llc/fqz013).
- T. Arnold, A. Berke, and L. Tilton (2019). "Visual Style in Two Network Era Sitcoms." *Journal of Cultural Analytics*. [DOI: 10.22148/16.043](https://doi.org/10.22148/16.043).
- T. Arnold and L. Tilton (2020). "Distant Viewing Toolkit: A Python Package for the Analysis of Visual Culture." *Journal of Open Source Software*, 5(45), 1800. [DOI: 10.21105/joss.01800](https://doi.org/10.21105/joss.01800).
- T. Arnold, N. Ayers, J. Madron, R. Nelson, L. Tilton, and L. Wexler (2021). [Photogrammar](https://photogrammar.org) (Version 3.0).
