# Training a model for the Tribune

This notebook follows the [TensorFlow for Poets](https://codelabs.developers.google.com/codelabs/tensorflow-for-poets/) tutorial to train a new model for classifying images in the Tribune collection.

First we'll clone the code repository.

In [None]:
!git clone https://github.com/googlecodelabs/tensorflow-for-poets-2

Now let's move into the new directory.

In [None]:
cd tensorflow-for-poets-2

## Our categories

For our initial experiment we're going to try to distinguish between two categories: protests and portraits.

In [None]:
img_sets = {
    'protests': ['FL4520808', 'FL4520807', 'FL4520809', 'FL4520810', 'FL4520811', 'FL4520812', 'FL4520813', 'FL4520814', 'FL4520816', 'FL4520817', 'FL4520818', 'FL4520820', 'FL4520821', 'FL4520822', 'FL4520823', 'FL4520825', 'FL4520826', 'FL4520827', 'FL4520828', 'FL4520829', 'FL4520830', 'FL4520832', 'FL4520833', 'FL4520834', 'FL4520835', 'FL4520836', 'FL4562467', 'FL4562470', 'FL4562473', 'FL4562477', 'FL4562493', 'FL4562496', 'FL4562498', 'FL4562502', 'FL4562504', 'FL4562506', 'FL4562507', 'FL4562514', 'FL4562526', 'FL4562531', 'FL4562534', 'FL4562538', 'FL4562543', 'FL4562548', 'FL4431373', 'FL4431375', 'FL4431376', 'FL4431377', 'FL4431405', 'FL4431403'],
    'portraits': ['FL4549209', 'FL4564140', 'FL4549684', 'FL4545567', 'FL4488477', 'FL4545569', 'FL4534794', 'FL4510388', 'FL4513567', 'FL4513591', 'FL4513594', 'FL4468261', 'FL4531198', 'FL4531240', 'FL4517378', 'FL4517384', 'FL4529746', 'FL4512049', 'FL4512055', 'FL4485185', 'FL4487605', 'FL4487592', 'FL4485540', 'FL4484944', 'FL4484950', 'FL4481774', 'FL4481787', 'FL4478835', 'FL4486661', 'FL4486662', 'FL4474330', 'FL4474354', 'FL4480349', 'FL4480384', 'FL4486300', 'FL4473256', 'FL4474185', 'FL4474152', 'FL4479422', 'FL4479449', 'FL4474018', 'FL4472433', 'FL4479794', 'FL4466608', 'FL4466614', 'FL4450989', 'FL4489424', 'FL4480459', 'FL4588049', 'FL4492349', 'FL4502482', 'FL4491527', 'FL4444441', 'FL4490697', 'FL4433631', 'FL4434468', 'FL4430650', 'FL4430652', 'FL4468274', 'FL4529677', 'FL4532361', 'FL4495950', 'FL8797006', 'FL4522775', 'FL4517556', 'FL4517563', 'FL4518600', 'FL4515829', 'FL4515847', 'FL4519602', 'FL4424262', 'FL4424263', 'FL4424264', 'FL4424278', 'FL4424279', 'FL4588015', 'FL4588016', 'FL4588017', 'FL4537870', 'FL4537872', 'FL4537873', 'FL4537874', 'FL4537878', 'FL4537880', 'FL4537881', 'FL4537882', 'FL4537883', 'FL4537888', 'FL4537889', 'FL4537891', 'FL4537895', 'FL4537896', 'FL4537897', 'FL4537899', 'FL4537902', 'FL4537906', 'FL4537907', 'FL4537909', 'FL4537911', 'FL4540963', 'FL4540964', 'FL4540966', 'FL4540970', 'FL4540972', 'FL4540973', 'FL4540975', 'FL4539968', 'FL4539969', 'FL4539970', 'FL4539971', 'FL4539972', 'FL4539974', 'FL4539988', 'FL4539989', 'FL4490339', 'FL4538816', 'FL4538817', 'FL4538818', 'FL4538825', 'FL4538826', 'FL4538827', 'FL4538828', 'FL4538829', 'FL4538838', 'FL4538839', 'FL4538840', 'FL4538841']
}
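
The two lists are not the same length, and it is easy to paste an ID twice or into both categories, so it may be worth checking the sets before downloading anything. The following is a minimal sketch; `summarise_sets` and `shared_ids` are hypothetical helpers, not part of the tutorial code.

```python
from collections import Counter


def summarise_sets(img_sets):
    """Return {category: (total, duplicates)} for a dict of image-ID lists.

    Hypothetical helper, not part of the tutorial code.
    """
    summary = {}
    for name, ids in img_sets.items():
        counts = Counter(ids)
        # IDs listed more than once within the same category
        duplicates = sorted(i for i, n in counts.items() if n > 1)
        summary[name] = (len(ids), duplicates)
    return summary


def shared_ids(img_sets):
    """Return the IDs that appear in more than one category."""
    seen = Counter()
    for ids in img_sets.values():
        # Use a set so repeats within one category are counted once
        seen.update(set(ids))
    return sorted(i for i, n in seen.items() if n > 1)
```

Running `summarise_sets(img_sets)` and `shared_ids(img_sets)` in a new cell shows how many images each category has and flags any repeated IDs.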

Download the training images.

In [None]:
import os
from urllib.parse import urlparse
from tqdm import tqdm_notebook
import requests

# Download the training images for each category into tf_files/tribune/<category>
for img_set in ['protests', 'portraits']:
    img_dir = os.path.join('tf_files', 'tribune', img_set)
    os.makedirs(img_dir, exist_ok=True)
    for img in tqdm_notebook(img_sets[img_set]):
        img_url = 'https://s3-ap-southeast-2.amazonaws.com/wraggetribune/images/500/{0}-500.jpg'.format(img)
        parsed = urlparse(img_url)
        filename = os.path.join(img_dir, os.path.basename(parsed.path))
        response = requests.get(img_url, stream=True)
        # Stop on a failed request rather than saving an error page as a .jpg
        response.raise_for_status()
        with open(filename, 'wb') as fd:
            for chunk in response.iter_content(chunk_size=128):
                fd.write(chunk)
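
The same download steps are needed again later for test images, so they could be wrapped in a small function. The sketch below is one possible shape for that. It uses the standard library's `urllib.request` instead of `requests` so that it has no extra dependencies, and it skips files that are already on disk. The names `image_url`, `local_path`, and `download_image` are hypothetical; the URL pattern is the one used in this notebook.

```python
import os
import shutil
import urllib.request

# URL pattern copied from the download cell in this notebook
BASE_URL = 'https://s3-ap-southeast-2.amazonaws.com/wraggetribune/images/500/{0}-500.jpg'


def image_url(img_id):
    """Build the URL of the 500px version of an image."""
    return BASE_URL.format(img_id)


def local_path(img_dir, img_id):
    """Build the path the image is saved to inside img_dir."""
    return os.path.join(img_dir, '{0}-500.jpg'.format(img_id))


def download_image(img_id, img_dir, timeout=30):
    """Download one image unless it is already saved; return its local path."""
    os.makedirs(img_dir, exist_ok=True)
    filename = local_path(img_dir, img_id)
    if os.path.exists(filename):
        return filename
    # Write to a temporary name first so an interrupted download
    # is never mistaken for a complete image
    partial = filename + '.part'
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses
    with urllib.request.urlopen(image_url(img_id), timeout=timeout) as response, \
            open(partial, 'wb') as fd:
        shutil.copyfileobj(response, fd)
    os.replace(partial, filename)
    return filename
```

With this in place, `download_image('FL4520808', os.path.join('tf_files', 'tribune', 'protests'))` would fetch a single training image, and rerunning the cell would not download it again.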

In [None]:
ls tf_files/tribune

Run this in a terminal rather than in a notebook cell, so that TensorBoard keeps running in the background while you train.

I'm assuming this won't be possible on Binder?

```
tensorboard --logdir tf_files/training_summaries &
```

## Train the model

In [None]:
%%bash
IMAGE_SIZE=224
ARCHITECTURE="mobilenet_0.50_${IMAGE_SIZE}"

python -m scripts.retrain \
  --bottleneck_dir=tf_files/bottlenecks \
  --how_many_training_steps=500 \
  --model_dir=tf_files/models/ \
  --summaries_dir=tf_files/training_summaries/"${ARCHITECTURE}" \
  --output_graph=tf_files/tribune_graph.pb \
  --output_labels=tf_files/tribune_labels.txt \
  --architecture="${ARCHITECTURE}" \
  --image_dir=tf_files/tribune

## Test the trained model

First let's test against the training set.

In [None]:
# Make a list of all the training images to test against
import os
import random
from IPython.display import display, HTML
imgs = []
data_dir = 'tf_files/tribune/'
# Each subdirectory of data_dir is a category containing .jpg files
for img_dir in [d for d in os.listdir(data_dir) if os.path.isdir(os.path.join(data_dir, d))]:
    for img in [i for i in os.listdir(os.path.join(data_dir, img_dir)) if i.endswith('.jpg')]:
        imgs.append(os.path.join(data_dir, img_dir, img))

In [None]:
# Choose one image at random
img = random.choice(imgs)
display(HTML('<img src="tensorflow-for-poets-2/{0}"><br>{0}'.format(img)))

In [None]:
!python -m scripts.label_image --graph=tf_files/tribune_graph.pb --labels=tf_files/tribune_labels.txt --image=$img
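
To check more than one image at a time, the command's output would need to be read back into Python. The sketch below assumes that `scripts.label_image` prints one line per label in the form `label (score=0.12345)`. That format is an assumption and should be checked against the output of the cell above. `parse_scores` and `classify` are hypothetical names, and `classify` has to be run from inside the `tensorflow-for-poets-2` directory.

```python
import re
import subprocess
import sys

# Assumed output format: "<label> (score=<number>)", one label per line
SCORE_LINE = re.compile(r'^(?P<label>.+?) \(score=(?P<score>[0-9.]+)\)$')


def parse_scores(output):
    """Extract {label: score} from the text printed by label_image."""
    scores = {}
    for line in output.splitlines():
        match = SCORE_LINE.match(line.strip())
        if match:
            scores[match.group('label')] = float(match.group('score'))
    return scores


def classify(img_path):
    """Run label_image on one image and return its parsed scores."""
    result = subprocess.run(
        [sys.executable, '-m', 'scripts.label_image',
         '--graph=tf_files/tribune_graph.pb',
         '--labels=tf_files/tribune_labels.txt',
         '--image={0}'.format(img_path)],
        stdout=subprocess.PIPE, universal_newlines=True, check=True)
    return parse_scores(result.stdout)
```

Something like `results = {img: classify(img) for img in imgs}` would then collect scores for every image in the list. Each call starts a new Python process, so this is likely to be slow for large sets.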

## Test against a randomly selected image from the complete collection

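Let's see how our model goes against images it probably hasn't seen before. Because the image is drawn at random from the complete collection, it could occasionally be one of the training images.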

In [None]:
# Load Tribune images data
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/GLAM-Workbench/ozglam-data-records-of-resistance/master/data/images.csv')

In [None]:
# Set up a directory for test images
test_dir = os.path.join('tf_files', 'tribune_tests')
os.makedirs(test_dir, exist_ok=True)
images = df['images']

In [None]:
# Get a random image
img = images.sample(1).iloc[0]
img_url = 'https://s3-ap-southeast-2.amazonaws.com/wraggetribune/images/500/{0}-500.jpg'.format(img)
filename = os.path.join(test_dir, '{}-500.jpg'.format(img))
response = requests.get(img_url, stream=True)
# Stop here if the request failed, rather than saving an error page as a .jpg
response.raise_for_status()
with open(filename, 'wb') as fd:
    for chunk in response.iter_content(chunk_size=128):
        fd.write(chunk)
display(HTML('<img src="tensorflow-for-poets-2/{0}"><br>{0}'.format(filename)))

In [None]:
!python -m scripts.label_image --graph=tf_files/tribune_graph.pb --labels=tf_files/tribune_labels.txt --image=$filename
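
A random pick from the complete collection can land on an image that was used for training. One way to rule that out is to filter the training IDs out before sampling. This is a minimal sketch, assuming the values in the `images` column use the same IDs as `img_sets`; `unseen_ids` and `pick_unseen` are hypothetical helpers.

```python
import random


def unseen_ids(all_ids, img_sets):
    """Return the IDs in all_ids that are not in any training category."""
    training = set()
    for ids in img_sets.values():
        training.update(ids)
    # Keep the original order of all_ids
    return [i for i in all_ids if i not in training]


def pick_unseen(all_ids, img_sets):
    """Choose one random ID that was not used for training."""
    return random.choice(unseen_ids(all_ids, img_sets))
```

Replacing `images.sample(1).iloc[0]` with `pick_unseen(images, img_sets)` would then guarantee that the test image is new to the model.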