# Machine Curation

This notebook presents the code that underpins the machine-curated tour of the Liverpool Biennial collection. The machine-curated exhibition can be viewed through the following [link](https://metaobjects.org/testing/liverpoolbiennial/).

We load data from GitHub and present the preprocessing steps that make it usable by our machine curator. Finally, we compute the similarity rankings across the Liverpool Biennial collection that allow viewers of the machine-curated exhibition to navigate their way through the collection.

## First steps: getting the data

As a first step, we need to clone the repository that accompanies this notebook. It contains the data we will need in the steps below.

In [None]:
!git clone https://github.com/DurhamARC/machine-curation.git

Unzip the 50 images that comprise the Liverpool Biennial 2021 collection.

In [None]:
%pushd machine-curation/datasets/liverpool_biennial_2021/original_images
!unzip 'images_part_*.zip'
%popd

Load the metadata that describes the Liverpool Biennial collection and its content.

In [None]:
import pandas as pd
lb2021 = pd.read_csv('machine-curation/datasets/liverpool_biennial_2021/LB2021_metadata.csv')

# Preprocessing: generating data

The next four sections create the data that our machine curator needs. We do this in four stages:


1.   A machine generated image is computed for each artwork title in the Liverpool Biennial.
2.   Keywords are extracted from the description of each artwork in the Liverpool Biennial.
3.   A machine generated caption is computed for each artwork in the Liverpool Biennial.
4.   Heatmaps are overlaid on each artwork in the Liverpool Biennial.

## Generate image from title

Each artwork in the Liverpool Biennial collection is associated with one machine generated image. This machine generated image is created by the `Imagine` model - a natural language to image model - from the [`big_sleep` module](https://github.com/lucidrains/big-sleep). That is, the code below maps artwork titles to machine generated images.

First, we install the relevant modules.

In [None]:
!pip install folium==0.2.1 # an idiosyncrasy of using pip with big-sleep in Colab
!pip install big-sleep

To test the model on a single artwork title (this takes around 30 minutes), run the following code. It will prompt you for the name of an artwork, which you should enter with an underscore separating each word.

When it finishes, a file called `<given_title>.best.png` will have been generated. This is the image that the machine associates with your artwork title.

In [None]:
import pandas as pd
from big_sleep import Imagine

image_caption = input("Enter the name of an artwork in your collection with an "
                      "underscore separating each word, e.g. Masterless_Voices")

dream = Imagine(
    text = image_caption,
    lr = 5e-2,
    epochs = 1,
    iterations = 1000,
    save_every=200,
    num_cutouts = 32,
    save_best = True,
)
dream()

As an example of the kind of output one might get, consider the work **`Masterless Voices`** by Ines Doujak and John Barker. A description of this artwork can be found [here](https://www.biennial.com/2021/exhibition/artists/ines-doujak-and-john-barker).

When the string `Masterless_Voices` is entered, the following output is generated:

![Machine_generated_image](https://raw.githubusercontent.com/DurhamARC/machine-curation/master/datasets/liverpool_biennial_2021/example_images/Masterless_Voices.best.png)

For comparison, below is the original artwork, whose title the ML model visualises in the image above.

![original_image](https://raw.githubusercontent.com/DurhamARC/machine-curation/master/datasets/liverpool_biennial_2021/example_images/Masterless_Voices.original.png)

The following code runs the model over all artwork titles in our dataset. The `*.best.png` files will be collected and saved in a folder called `title_to_image_data`.

This takes a very long time in Colab. To collect the data more quickly so that it can be used in the stages below, reduce the number of iterations passed to `Imagine()`.

In [None]:
import os

import pandas as pd
from big_sleep import Imagine

lb2021 = pd.read_csv('machine-curation/datasets/liverpool_biennial_2021/LB2021_metadata.csv')

for item in lb2021.clean_title:
    dream = Imagine(
        text=item,
        lr=5e-2,
        epochs=1,
        save_every=200,
        save_progress=True,
        iterations=1000,
        num_cutouts=32,
        save_best=True,
    )
    dream()

# Move 'best' machine generated images into a data dir
data_dir = "title_to_image_data"
os.makedirs(data_dir, exist_ok=True)  # do not fail if the folder already exists
for file in os.listdir():
    if file.endswith(".best.png"):
        os.rename(file, os.path.join(data_dir, file))
    elif file.endswith(".png"):
        # clean cwdir
        os.remove(file)


## Extract keywords


Each artwork in the Liverpool Biennial is associated with detailed written literature discussing the themes, contexts and factual basis underlying the artwork. For example:

*Pan African Flag for the Relic Travellers' Alliance*, by Larry Achiampong:

![original_image](https://raw.githubusercontent.com/DurhamARC/machine-curation/master/datasets/liverpool_biennial_2021/example_images/pan_african_flag.png)

This artwork maps to the following project description (from the Liverpool Biennial [website](https://www.liverpoolbiennial2021.com/)):

> "Larry Achiampong presents a series of eight different Pan African flags, exhibited across ten locations, on buildings and streets throughout Liverpool city centre. With some designs featuring 54 stars that represent the 54 countries of Africa, the flags evoke solidarity and collective empathy – while some of their locations speak to Liverpool’s connection with the enslavement of West Africans as part of the transatlantic slave trade. The colours of the flags reflect Pan African symbolism: green, black and red represent Africa’s land, people and the struggles the continent has endured respectively, while yellow-gold represents a new dawn and prosperity. Achiampong has configured these colours into icons that are suggestive of community, motion and the human figure in ascension. For Liverpool Biennial 2021, four of the artist’s flags from his original series are shown - Ascension, Community, Motion and Squadron - as well as four new flag designs that generate new symbolic constitutions; What I hear I Keep – related to the act of sending and receiving messages that resonate. Dualities – related to the connection between those born within the African continent and those of the African Diaspora. Bringers of Life – related to the eternal reverence of the elements that bring and fortify life. Mothership – in praise, honour and respect of the centre of community; Black Womxn. Supported by The African Arts Trust. 'What I Hear I Keep' was commissioned by De La Warr Pavilion. This artwork is now open, local residents can plan your visit here."

Our machine curator can extract keywords from the project description of each artwork in the Liverpool Biennial. We demonstrate this step here. Later, these keywords are used to move viewers to images that resonate with the keywords extracted from a given project description.

First, import the modules:

In [None]:
!pip install sentence_transformers

import numpy as np
import pandas as pd
import itertools
from sklearn.feature_extraction.text import CountVectorizer
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

Load the BERT model and read in our dataset (our workflow here follows [this](https://towardsdatascience.com/keyword-extraction-with-bert-724efca412ea#_=_) example):

In [None]:
model = SentenceTransformer('distilbert-base-nli-mean-tokens')

lb2021 = pd.read_csv('machine-curation/datasets/liverpool_biennial_2021/LB2021_metadata.csv')

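Define a diversification function. From the `nr_candidates` phrases most similar to the document, it picks the `top_n` phrases that are least similar to each other: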

In [None]:
def max_sum_sim(doc_embedding, word_embeddings, words, top_n, nr_candidates):
    # Cosine similarity between the document and each candidate, and between candidates
    distances = cosine_similarity(doc_embedding, word_embeddings)
    distances_candidates = cosine_similarity(word_embeddings,
                                             word_embeddings)

    # Keep the nr_candidates words most similar to the document
    words_idx = list(distances.argsort()[0][-nr_candidates:])
    words_vals = [words[index] for index in words_idx]
    distances_candidates = distances_candidates[np.ix_(words_idx, words_idx)]

    # Calculate the combination of words that are the least similar to each other
    min_sim = np.inf
    candidate = None
    for combination in itertools.combinations(range(len(words_idx)), top_n):
        sim = sum([distances_candidates[i][j] for i in combination for j in combination if i != j])
        if sim < min_sim:
            candidate = combination
            min_sim = sim

    return [words_vals[idx] for idx in candidate]
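
To see what the diversification step does, here is a minimal sketch on hand-made two-dimensional vectors, written in plain Python rather than with scikit-learn. The words, the vectors and the helper names `cosine` and `diversify` are invented for illustration; real embeddings come from the BERT model loaded above.

```python
import itertools
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def diversify(doc_vec, word_vecs, words, top_n, nr_candidates):
    # Keep the nr_candidates words most similar to the document
    ranked = sorted(range(len(words)), key=lambda i: cosine(doc_vec, word_vecs[i]))
    candidates = ranked[-nr_candidates:]
    # Among those, pick the top_n words with the lowest summed pairwise similarity
    best, best_sim = None, math.inf
    for combo in itertools.combinations(candidates, top_n):
        sim = sum(cosine(word_vecs[i], word_vecs[j])
                  for i in combo for j in combo if i != j)
        if sim < best_sim:
            best, best_sim = combo, sim
    return [words[i] for i in best]

# Toy data (hypothetical): two near-duplicates, one related word, one unrelated word
words = ["flag", "flags", "dawn", "unrelated"]
word_vecs = [(1.0, 0.9), (1.0, 0.8), (0.2, 1.0), (-1.0, 0.0)]
doc_vec = (1.0, 1.0)

selected = diversify(doc_vec, word_vecs, words, top_n=2, nr_candidates=3)
print(selected)
```

In this toy case `unrelated` is dropped at the candidate stage, and the near-duplicates `flag` and `flags` are not selected together.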

Define a function to extract keywords:

In [None]:
n_gram_range = (3, 3)
stop_words = "english"

def extract_keywords(project_description):
    count = CountVectorizer(ngram_range=n_gram_range, stop_words=stop_words).fit([project_description])
    candidates = count.get_feature_names_out().tolist()  # get_feature_names() was removed in scikit-learn 1.2
    doc_embedding = model.encode([project_description])
    candidate_embeddings = model.encode(candidates)
    return max_sum_sim(doc_embedding, candidate_embeddings, candidates, top_n=5, nr_candidates=20)
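
The candidate phrases passed to the model are three-word n-grams built after stop words have been removed. The sketch below approximates that step in plain Python so the shape of the candidates is easy to inspect. The tiny `STOP_WORDS` set and the function name `candidate_ngrams` are assumptions for illustration; scikit-learn's own English stop-word list is far longer.

```python
import re

# A tiny stand-in stop-word list (hypothetical); scikit-learn's "english" list is much longer
STOP_WORDS = {"the", "and", "of", "a", "to", "in"}

def candidate_ngrams(text, n=3):
    # Lowercase and keep tokens of two or more word characters
    tokens = re.findall(r"\b\w\w+\b", text.lower())
    # Drop stop words before building n-grams
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Slide a window of n tokens over the remaining words
    ngrams = {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
    return sorted(ngrams)

example = candidate_ngrams("The flags evoke solidarity and collective empathy")
print(example)
```

Because stop words are removed first, a candidate such as `evoke solidarity collective` can join words that were not adjacent in the original sentence.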


Now we can extract keywords from the example we looked at above - *Pan African Flag for the Relic Travellers' Alliance* by Larry Achiampong.

In [None]:
# Extract the project description from our LB csv
project_description = lb2021.loc[lb2021.artist == "larry-achiampong", "featured_text"].values[0]

# Extract keywords from project description
keywords = extract_keywords(project_description)

print("The keywords for the 'Pan African Flag for the Relic Travellers' Alliance' project description are:")
for count, keyword in enumerate(keywords):
    print(f"  {count+1}) '{keyword}'")

There we have it - the keywords for the 'Pan African Flag for the Relic Travellers' Alliance' project description.

We can now use the same methodology to extract keywords from all LB2021 project descriptions:

In [None]:
all_keywords = []
for project_description in lb2021.featured_text:
    all_keywords.append(extract_keywords(project_description))

for index, keyword_set in enumerate(all_keywords):
    print(f"The keywords for '{lb2021.iloc[index].clean_title}' by {lb2021.iloc[index].artist} are:")
    # use a separate name for the inner counter so it does not shadow the outer index
    for rank, keyword in enumerate(keyword_set, start=1):
        print(f"  {rank}) '{keyword}'")

## Generate image captions

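We generate image captions using [CATR](https://github.com/saahiluppal/catr), an image captioning model. First, we clone the repository and install its requirements: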

In [None]:
!git clone https://github.com/saahiluppal/catr.git
%pushd catr/
!pip install -r requirements.txt
%popd

We can generate captions using the following syntax:

`!python predict.py --path <path-to-png>`

For example:

In [None]:
import os
import subprocess

image_path = os.path.join(
    os.sep, 
    'content', 
    'machine-curation', 
    lb2021.loc[lb2021.clean_title == 'Ammonite', 'path_to_original_image'].values[0]
)
output = subprocess.run(['python','catr/predict.py', '--path', image_path], capture_output=True)
print(f"The machine generated caption for Ammonite is: '{output.stdout.decode('ascii').rstrip()}'")

For interest, here is a picture of the original artwork, *Ammonite*, by Alice Channer:

![original_image](https://raw.githubusercontent.com/DurhamARC/machine-curation/master/datasets/liverpool_biennial_2021/example_images/Ammonite.png)


Now we can repeat the same process to get machine generated captions for all images in the Liverpool Biennial:

In [None]:
machine_generated_captions = []
for index, artwork in enumerate(lb2021.clean_title):
    image_path = os.path.join(
        os.sep,
        'content',
        'machine-curation',
        lb2021.iloc[index].path_to_original_image
    )
    output = subprocess.run(['python','catr/predict.py', '--path', image_path], capture_output=True)
    machine_generated_captions.append(output.stdout.decode('ascii').rstrip())
    print(f"The machine generated caption for '{artwork}' by {lb2021.iloc[index].artist} is: '{output.stdout.decode('ascii').rstrip()}'")

Our original machine curator uses the VLP model at this stage to generate captions. For more information about VLP see this [link](https://github.com/LuoweiZhou/VLP). This was preceded by feature extraction using [detectron-vlp](https://github.com/LuoweiZhou/detectron-vlp). We were, however, unable to reproduce this workflow in Colab because of a number of dependencies that could not be fulfilled within the Colab environment, for example old versions of Torch and CUDA. CATR, used above, is a simple alternative with similar results.

## Heatmaps

Heatmaps are important in our machine-curated exhibition. They appear to the 'right' of the original image itself. The overlay is informed by the machine generated caption computed above. That is, the heatmap often highlights objects that appear in the machine's caption - it gives us a view into what a machine 'sees' in an artwork, the way it parses objects and interprets concepts.

First import the modules:

In [None]:
import os
import torch
!pip install git+https://github.com/openai/CLIP.git
import clip
from PIL import Image
import numpy as np
!pip install torchray
from torchray.attribution.grad_cam import grad_cam

And load the CLIP language model:

In [None]:
device = "cuda" if torch.cuda.is_available() else "cpu"
def get_model():
    return clip.load("RN50", device=device, jit=False)
model, preprocess = get_model()

Load the images:

In [None]:
images=[]
for index, artwork in enumerate(lb2021.clean_title):
    image_path = os.path.join(
        os.sep,
        'content',
        'machine-curation',
        lb2021.iloc[index].path_to_original_image
    )
    image = preprocess(Image.open(image_path).convert("RGB"))
    images.append(image)

# stack into one batch and move it to the same device as the model (works on CPU too)
image_input = torch.stack(images).to(device)

These functions are taken from miniClip ([here](https://github.com/HendrikStrobelt/miniClip/blob/main/miniclip/imageWrangle.py)). They allow us to compute heatmaps within Colab without having to install miniClip and all its dependencies (some of which are incompatible with our other code).

In [None]:
from matplotlib import cm

def min_max_norm(array):
    lim = [array.min(), array.max()]
    array = array - lim[0] 
    array.mul_(1 / (1.e-10+ (lim[1] - lim[0])))
    return array

def torch_to_rgba(img):
    img = min_max_norm(img)
    rgba_im = img.permute(1, 2, 0).cpu()
    if rgba_im.shape[2] == 3:
        rgba_im = torch.cat((rgba_im, torch.ones(*rgba_im.shape[:2], 1)), dim=2)
    assert rgba_im.shape[2] == 4
    return rgba_im

def numpy_to_image(img, size):
    """
    takes a [0..1] normalized rgba input and returns resized image as [0...255] rgba image
    """
    resized = Image.fromarray((img*255.).astype(np.uint8)).resize((size, size))
    return resized

def heatmap(image:torch.Tensor, heatmap: torch.Tensor, size=None, alpha=.6):
    if not size:
        size = image.shape[1]

    img = torch_to_rgba(image).numpy() # [0...1] rgba numpy "image"
    hm = cm.hot(min_max_norm(heatmap).numpy()) # [0...1] rgba numpy "image"

    img = np.array(numpy_to_image(img,size))
    hm = np.array(numpy_to_image(hm, size))

    return Image.fromarray((alpha * hm + (1-alpha)*img).astype(np.uint8))
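
The helpers above do two things: they rescale values into the range 0 to 1, and they blend the coloured heatmap with the image using a weight `alpha`. A minimal plain-Python sketch of both steps on a handful of numbers follows. The names `rescale` and `blend` and the sample values are illustrative assumptions, not part of miniClip.

```python
def rescale(values, eps=1e-10):
    # Shift so the smallest value is 0, then divide by the range
    # (eps avoids division by zero when all values are equal)
    lo, hi = min(values), max(values)
    return [(v - lo) / (eps + (hi - lo)) for v in values]

def blend(heat, pixel, alpha=0.6):
    # Weighted average of a heatmap value and an image value, both in 0..255,
    # truncated to an integer as a uint8 conversion would do
    return int(alpha * heat + (1 - alpha) * pixel)

saliency = [2.0, 4.0, 6.0]
normalised = rescale(saliency)
print([round(v, 3) for v in normalised])

# With alpha=0.7 the heatmap dominates: a fully "hot" value over a black pixel
blended = blend(heat=255, pixel=0, alpha=0.7)
print(blended)
```

A larger `alpha` makes the heatmap more prominent and the underlying artwork fainter.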


Now we can generate a heatmap overlay for all images in the Liverpool Biennial. These new images are saved in a new directory: `attention_maps`. Heatmaps are generated based on the machine generated captions computed above.

In [None]:
layer='layer4.2.relu'
alpha=0.7 # can be changed

!rm -rf attention_maps
os.mkdir("attention_maps")


att_img_path=[] 
for i in range (len(lb2021)):
    image = image_input[i].reshape([1,3,224,224])
    txt_input = machine_generated_captions[i]
    tokenized_text = clip.tokenize(txt_input).to(device)
    with torch.no_grad():
        image_features = model.encode_image(image)
        text_features = model.encode_text(tokenized_text)
        image_features_norm = image_features.norm(dim=-1, keepdim=True)
        image_features_new = image_features / image_features_norm
        text_features_norm = text_features.norm(dim=-1, keepdim=True)
        text_features_new = text_features / text_features_norm
    text_prediction = (text_features_new* image_features_norm)
    saliency = grad_cam(model.visual, image.type(model.dtype), text_prediction, saliency_layer=layer)
    
    hm = heatmap(image[0], saliency[0][0,].detach().type(torch.float32).cpu(), alpha=alpha)    
    img_name = os.path.join("attention_maps", "att_map_"+lb2021.iloc[i].clean_title.replace(" ", "_")+".png")
    att_img_path.append(img_name)
    hm.convert("RGB").save(img_name)
    

A few of the attention maps are reproduced here (you will find the rest in a folder called `attention_maps` after running the cell above). Each caption is paired with its heatmap, showing how objects named in the caption are identified and highlighted.

"A group of five different types of toothbrushes."

(`Tongues` by Anu Põder)

![original_image](https://raw.githubusercontent.com/DurhamARC/machine-curation/master/datasets/liverpool_biennial_2021/example_images/att_map_Tongues.png)

"A car with a lot of surfboards in the back of it." 

(`Superposition` by Erick Beltrán)

![original_image](https://raw.githubusercontent.com/DurhamARC/machine-curation/master/datasets/liverpool_biennial_2021/example_images/att_map_Superposition.png)

"A large building with a clock on the front of it" 

(`Pan African Flag for the Relic Travellers' Alliance` by Larry Achiampong)

![original_image](https://raw.githubusercontent.com/DurhamARC/machine-curation/master/datasets/liverpool_biennial_2021/example_images/att_map_Pan_African_Flag_for_the_Relic_Travellers'_Alliance.png)

"A red liquid in a glass filled with liquid" 

(`The Goblets` by Ane Graff)

![original_image](https://raw.githubusercontent.com/DurhamARC/machine-curation/master/datasets/liverpool_biennial_2021/example_images/att_map_The_Goblets.png)

# CLIP similarities

Having completed the preprocessing steps above, we are now in a position to use this data to create connections between artworks in the collection. These connections are based on similarity values and inform the way viewers can use the machine curator to move between images. There are three steps to this workflow:


1.   The similarities between machine generated captions are computed.
2.   The similarities between machine generated images are computed.
3.   The similarities between artworks and computed keywords are computed.



First, we import the modules:

In [None]:
import json

import torch
import clip
from PIL import Image
import pandas as pd
import numpy as np
from scipy.spatial import distance

Load the CLIP model:


In [None]:
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Working with {device}")
model, preprocess = clip.load("ViT-B/32", device=device)


## Caption similarities

Now we can compute the similarity values between the machine generated captions that belong to each artwork in the Liverpool Biennial collection. These similarity values are based on the cosine distance of the CLIP text features.

In the machine curated exhibition this data is used when navigating right. That is, when moving to the right the viewer is shown the artwork whose machine generated caption is most similar to the caption of the artwork currently being viewed.

In [None]:
txt_input = machine_generated_captions
text = clip.tokenize(txt_input).to(device)

with torch.no_grad():
    Caption_features = model.encode_text(text)
Caption_features /= Caption_features.norm(dim=-1, keepdim=True)

# calculate the ordering of indices and distance values based on the distance between caption features 
CaptionTxtDistance_ind=np.argsort(distance.cdist(Caption_features.cpu().numpy(),Caption_features.cpu().numpy(), metric='cosine')).squeeze()
CaptionTxtDistance_val=np.sort(distance.cdist(Caption_features.cpu().numpy(),Caption_features.cpu().numpy(), metric='cosine')).squeeze()

# save list of tuples (index, similarity value)
CaptionTxtDistance=[]
for i in range (len(lb2021)):
    tup=[]
    for j in range (1,len(lb2021)):
        tup.append((int(CaptionTxtDistance_ind[i][j]), float(1.0-CaptionTxtDistance_val[i][j]))) #similarity=1-distance
    CaptionTxtDistance.append(tup)

print(CaptionTxtDistance)
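
The result is, for each artwork, a list of `(index, similarity)` tuples ordered from most to least similar. The plain-Python sketch below builds the same structure for three made-up two-dimensional feature vectors. The names `cosine` and `rank_by_similarity` and the toy vectors are assumptions for illustration. Unlike the cell above, which drops the first sorted column on the assumption that it is the artwork itself, the sketch leaves out the self-comparison explicitly.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def rank_by_similarity(features):
    rankings = []
    for i, u in enumerate(features):
        # Compare item i with every other item, leaving out i itself
        pairs = [(j, cosine(u, v)) for j, v in enumerate(features) if j != i]
        # Most similar first
        pairs.sort(key=lambda pair: pair[1], reverse=True)
        rankings.append(pairs)
    return rankings

# Toy features (hypothetical): item 1 lies between items 0 and 2
toy_features = [(1.0, 0.0), (1.0, 1.0), (-1.0, 1.0)]
toy_rankings = rank_by_similarity(toy_features)
for i, pairs in enumerate(toy_rankings):
    print(i, [(j, round(s, 3)) for j, s in pairs])
```

The first tuple in each list identifies the nearest neighbour, which is the artwork a viewer would be taken to next.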


Now let us look at the similarities for the following artwork:

In [None]:
print(lb2021.iloc[0].clean_title)
print(machine_generated_captions[0])


The similarities to the above are recorded here:

In [None]:
print(CaptionTxtDistance[0])

The most dissimilar caption is:

In [None]:
most_dissimilar_tuple = CaptionTxtDistance[0][-1]
print(most_dissimilar_tuple)
print(machine_generated_captions[most_dissimilar_tuple[0]])

The most similar caption is:

In [None]:
most_similar_tuple = CaptionTxtDistance[0][0]
print(most_similar_tuple)
print(machine_generated_captions[most_similar_tuple[0]])

## Similarities between generated images

Here we compute similarity values between artworks based on the cosine distance of the CLIP image features, where the images are the machine generated images.

This is used in our machine-curated exhibition when navigating left. That is, to the left of the original image, we show a computer generated image; clicking on that image leads the viewer to the most similar generated image in the collection.

In [None]:
#load (from title) generated images and extract image features

gen_images=[]
for item in lb2021.generated_img_path:
    image = preprocess(Image.open(item).convert("RGB"))
    gen_images.append(image)

# stack into one batch and move it to the same device as the model
image_input = torch.stack(gen_images).to(device)
with torch.no_grad():
    gen_images_features = model.encode_image(image_input).float()

gen_images_features /= gen_images_features.norm(dim=-1, keepdim=True)

# calculate the ordering of indices and distance values based on the distance between generated image features 
gen_images_distance_ind = np.argsort(distance.cdist(gen_images_features.cpu().numpy(),gen_images_features.cpu().numpy(), metric='cosine')).squeeze()
gen_images_distance_val = np.sort(distance.cdist(gen_images_features.cpu().numpy(),gen_images_features.cpu().numpy(), metric='cosine')).squeeze()

# save list of tuples (index, similarity value)
gen_images_distance=[]
for i in range (len(lb2021)):
    tup=[]
    for j in range (1,len(lb2021)):
        tup.append((int(gen_images_distance_ind[i][j]), float(1.0-gen_images_distance_val[i][j]))) #similarity=1-distance
    gen_images_distance.append(tup)

print(gen_images_distance)


## Similarities between artworks and computed keywords

Here we compute similarity values between artworks based on the cosine distance of the joint CLIP image and text features. By image we refer to the original artworks and by text we signify the keywords that were extracted from artwork descriptions by a BERT language model.

This step enables the machine curator to determine which images resonate with another image's keywords. In the exhibition this is used when navigating using the down arrow.

In [None]:
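# load keywords and extract text features
# (assumes `all_keywords` from the keyword extraction step; each artwork's phrases are joined into one string)
keyword_strings = [", ".join(keyword_set) for keyword_set in all_keywords]
text = clip.tokenize(keyword_strings).to(device)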

with torch.no_grad():
    keywords_features = model.encode_text(text)
keywords_features /= keywords_features.norm(dim=-1, keepdim=True)

# load original artwork images and extract image features
original_images=[]
for img in lb2021.path_to_original_image:
    img_path = os.path.join('machine-curation', img)
    image = preprocess(Image.open(img_path).convert("RGB"))
    original_images.append(image)

# stack into one batch and move it to the same device as the model
image_input = torch.stack(original_images).to(device)
with torch.no_grad():
    original_images_features = model.encode_image(image_input).float()
original_images_features /= original_images_features.norm(dim=-1, keepdim=True)

# joint features from image and text features
joint_features = (original_images_features+keywords_features)/2

# calculate the ordering of indices and distance values based on the distance between joint image and text features
joint_features_distance_ind=np.argsort(distance.cdist(joint_features.cpu().numpy(),joint_features.cpu().numpy(),metric='cosine')).squeeze()
joint_features_distance_val=np.sort(distance.cdist(joint_features.cpu().numpy(),joint_features.cpu().numpy(),metric='cosine')).squeeze()

# save list of tuples (index, similarity value)
joint_features_distance=[]
for i in range (len(lb2021)):
    tup=[]
    for j in range (1,len(lb2021)):
        tup.append((int(joint_features_distance_ind[i][j]), float(1.0-joint_features_distance_val[i][j]))) #similarity=1-distance
    joint_features_distance.append(tup)

print(joint_features_distance)
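
The joint feature used above is the element-wise average of a unit-length image vector and a unit-length text vector. A minimal plain-Python sketch on two made-up vectors shows the arithmetic. The names `normalise` and `joint_feature` and the sample numbers are illustrative assumptions; the real vectors are CLIP features.

```python
import math

def normalise(vec):
    # Scale a vector to unit length
    length = math.sqrt(sum(x * x for x in vec))
    return [x / length for x in vec]

def joint_feature(image_vec, text_vec):
    # Normalise each vector, then average them element by element
    image_unit = normalise(image_vec)
    text_unit = normalise(text_vec)
    return [(a + b) / 2 for a, b in zip(image_unit, text_unit)]

# (3, 4) normalises to (0.6, 0.8) and (0, 2) to (0, 1)
joint = joint_feature([3.0, 4.0], [0.0, 2.0])
print([round(x, 3) for x in joint])
```

Normalising first gives the image and the keywords equal weight in the average, whatever the raw magnitudes of their feature vectors.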