<h2>Download IIIF image as tiles and stitch the tiles</h2>

This notebook shows how to download full images from an IIIF server. Given an image identifier, the code opens its info.json, reads the image height and width and the tile height and width, downloads the individual tiles, and stitches them into a full image. This is necessary because the IIIF server used here does not return images larger than 1000 x 1000 pixels in a single request.

IIIF defines an <b>Image API</b> (https://iiif.io/api/image/3.0/#53-sizes) and a <b>Presentation API</b> (https://iiif.io/api/presentation/3.0/).

The <b>Image API</b> specifies how an image is returned in response to a standard HTTP request. The URL can specify the region, size, rotation, quality characteristics and format of the requested image. The API also defines the info.json file structure, which includes technical properties of the image such as the image size, tile size, image format, and color quality.

The <b>Presentation API</b> provides the information needed to view the images online. It defines the manifest.json file structure, which includes descriptive, rights and linking information for the object and gives the client enough information to begin displaying something to the user quickly.

In [1]:
import cv2
import os
from pandas import read_csv
import requests
import numpy as np

In [2]:
def read_image_data(file_name):
    # Reads image identifiers from a csv file
    df = read_csv(file_name)
    image_identifier_list = df.loc[df['side'] == 'recto', 'image_identifier'].tolist()
    input_files = [x.strip() for x in image_identifier_list if x != ""]  
    return input_files

In [3]:
# Print an example identifier
identifiers = read_image_data('image_data.csv')
print(identifiers[0])

8b8c3e3eb8e50dbad6fa81260c4f5f826c57ee675950dc07b1525110b8959470


The Image API can be called by requesting an image using the following URL template:
{scheme}://{server}{/prefix}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}

Scheme = https <br/>
Server = iaa.iiifhosting.com <br/>
Prefix = iiif <br/>
Identifier = 8b8c3e3eb8e50dbad6fa81260c4f5f826c57ee675950dc07b1525110b8959470 <br/>
Region = the rectangular portion of the underlying image content to be returned. It can be specified by pixel coordinates, by percentage, or by the value full, which returns the full image. <br/>
Size = the dimensions to which the extracted region is scaled. <br/>
Rotation = mirroring and rotation. <br/>
Quality = whether the image is delivered in color, grayscale or black and white. <br/>
Format = the format of the returned image, expressed as a suffix at the end of the URI, e.g. .jpg, .png or .tif, as in default.jpg. <br/>
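
The template can be filled in with a small helper. This is a minimal sketch rather than part of the notebook's workflow: the function name `build_iiif_image_url` and its default arguments are assumptions chosen for illustration. The default size `full` follows version 2 of the Image API, which is the version this server reports.

```python
def build_iiif_image_url(server, prefix, identifier,
                         region="full", size="full", rotation="0",
                         quality="default", fmt="jpg", scheme="https"):
    """Fill in the IIIF Image API URL template (hypothetical helper)."""
    # Template:
    # {scheme}://{server}{/prefix}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    prefix_part = "/" + prefix.strip("/") if prefix else ""
    return (f"{scheme}://{server}{prefix_part}/{identifier}/"
            f"{region}/{size}/{rotation}/{quality}.{fmt}")


# Request the 256 x 256 pixel region at the top-left corner, unscaled
url = build_iiif_image_url(
    "iaa.iiifhosting.com", "iiif",
    "8b8c3e3eb8e50dbad6fa81260c4f5f826c57ee675950dc07b1525110b8959470",
    region="0,0,256,256",
)
print(url)
```

Keeping each URL component as a separate argument makes it easy to vary only the region when requesting tiles.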

In [4]:
# Open and print an example info.json
prefix = 'https://iaa.iiifhosting.com/iiif/dede6fd6633bc35ec6b65690ef499e8f2b1772d5b687d340c3a2d01cd7ed2e3c/'
info_url = prefix + 'info.json'
info = requests.get(info_url).json()
print(info)

{'@context': 'http://iiif.io/api/image/2/context.json', '@id': 'https://iaa.iiifhosting.com/iiif/dede6fd6633bc35ec6b65690ef499e8f2b1772d5b687d340c3a2d01cd7ed2e3c', 'protocol': 'http://iiif.io/api/image', 'width': 7216, 'height': 5412, 'sizes': [{'width': 225, 'height': 169}, {'width': 451, 'height': 338}, {'width': 902, 'height': 676}], 'tiles': [{'width': 256, 'height': 256, 'scaleFactors': [1, 2, 4, 8, 16, 32]}], 'profile': ['http://iiif.io/api/image/2/level1.json', {'formats': ['jpg'], 'qualities': ['native', 'color', 'gray'], 'supports': ['regionByPct', 'regionSquare', 'sizeByForcedWh', 'sizeByWh', 'sizeAboveFull', 'rotationBy90s', 'mirroring']}]}
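
Only a few of these fields are needed for tiling: the image width and height and the tile width and height. The sketch below pulls them out of a parsed info.json. The helper name `tile_layout` and the abbreviated `sample_info` dictionary are illustrative assumptions; the values are copied from the output above.

```python
def tile_layout(info):
    """Return the image and tile dimensions from a parsed info.json dict."""
    tiles = info["tiles"][0]  # first tile definition, as used in this notebook
    return {
        "width": info["width"],
        "height": info["height"],
        "tile_width": tiles["width"],
        # In the Image API, a tile's height defaults to its width when omitted
        "tile_height": tiles.get("height", tiles["width"]),
        "scale_factors": tiles["scaleFactors"],
    }


# Abbreviated copy of the info.json printed above
sample_info = {
    "width": 7216,
    "height": 5412,
    "tiles": [{"width": 256, "height": 256, "scaleFactors": [1, 2, 4, 8, 16, 32]}],
}
layout = tile_layout(sample_info)
print(layout)
```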


The IIIF server does not return images larger than 1000 x 1000 pixels in a single request. To see this, let's request the full region at the full size and check the result: its shape is only 1000 x 1000 x 3 rather than height x width x 3.

In [5]:
# get image sizes from info.json and construct the image_url
height = str(info['height'])
width = str(info['width'])
image_url = prefix + 'full/' + width + ',' + height + '/0/default.jpg'
print(image_url)

https://iaa.iiifhosting.com/iiif/dede6fd6633bc35ec6b65690ef499e8f2b1772d5b687d340c3a2d01cd7ed2e3c/full/7216,5412/0/default.jpg


In [6]:
# Convert http response image to a numpy array image
response = requests.get(image_url, stream=True).raw
image_array = np.asarray(bytearray(response.read()), dtype="uint8")
image = cv2.imdecode(image_array, cv2.IMREAD_COLOR)

In [7]:
print(image.shape)

(1000, 1000, 3)
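
One way to avoid silently working with a reduced image is to compare the decoded array's shape with the dimensions reported in info.json. The following is a minimal sketch; the helper name `was_downscaled` is an assumption, and the shapes are the ones seen in this notebook.

```python
def was_downscaled(image_shape, info):
    """Return True if a decoded image is smaller than info.json reports."""
    # NumPy/OpenCV image shapes are ordered (height, width, channels)
    returned_height, returned_width = image_shape[:2]
    return returned_height < info["height"] or returned_width < info["width"]


reported = {"width": 7216, "height": 5412}
print(was_downscaled((1000, 1000, 3), reported))   # shape of the capped response
print(was_downscaled((5412, 7216, 3), reported))   # shape of a full-size image
```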


Therefore, we download the full image as tiles and stitch the tiles together to rebuild the full image.
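
The tiling itself is a walk over a grid in which the last row and the last column may be smaller than a full tile. The sketch below isolates that grid logic from the download step. The generator name `tile_regions` is an assumption made for illustration; the dimensions are those of the example image.

```python
def tile_regions(width, height, tile_width, tile_height):
    """Yield (x, y, w, h) regions that cover the image without overlap."""
    for y in range(0, height, tile_height):
        h = min(tile_height, height - y)  # the last row may be shorter
        for x in range(0, width, tile_width):
            w = min(tile_width, width - x)  # the last column may be narrower
            yield x, y, w, h


regions = list(tile_regions(7216, 5412, 256, 256))
print(len(regions))   # number of tile requests needed
print(regions[0])     # top-left tile
print(regions[-1])    # bottom-right tile
```

For the 7216 x 5412 example image with 256 x 256 tiles, this gives 29 columns and 22 rows, or 638 tile requests, and the bottom-right tile measures only 48 x 36 pixels.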

In [8]:
# Get the full image size and tile size from info.json.
# Download the image tile by tile and stitch the tiles into a full-size image.

stiched_image = np.empty((info['height'], info['width'], 3), dtype=np.uint8)
tile_width = info['tiles'][0]['width']
tile_height = info['tiles'][0]['height']

y = 0
while y < info['height']:
    x = 0
    adj_height = min(tile_height, info['height'] - y)
    while x < info['width']:
        adj_width = min(tile_width, info['width'] - x)
        tile_url = prefix + str(x) + ',' + str(y)+ ',' + str(adj_width)+ ',' + str(adj_height) + '/full/0/default.jpg'
        response = requests.get(tile_url, stream=True).raw
        tile_array = np.asarray(bytearray(response.read()), dtype="uint8")
        tile_image = cv2.imdecode(tile_array, cv2.IMREAD_COLOR)
        stiched_image[y:y+adj_height, x:x+adj_width] = tile_image
        x = x + tile_width
    y = y + tile_height


In [9]:
print(stiched_image.shape)

(5412, 7216, 3)


In [None]:
def download_iiif_image(identifier):
    # Downloads the tiles of an image and stitches them into a full-size image
    prefix = 'https://iaa.iiifhosting.com/iiif/' + identifier + '/'
    info_url = prefix + 'info.json'
    info = requests.get(info_url).json()
    stiched_image = np.empty((info['height'], info['width'], 3), dtype=np.uint8)
    tile_width = info['tiles'][0]['width']
    tile_height = info['tiles'][0]['height']

    y = 0
    while y < info['height']:
        x = 0
        adj_height = min(tile_height, info['height'] - y)
        while x < info['width']:
            adj_width = min(tile_width, info['width'] - x)
            tile_url = prefix + str(x) + ',' + str(y)+ ',' + str(adj_width)+ ',' + str(adj_height) + '/full/0/default.jpg'
            response = requests.get(tile_url, stream=True).raw
            tile_array = np.asarray(bytearray(response.read()), dtype="uint8")
            tile_image = cv2.imdecode(tile_array, cv2.IMREAD_COLOR)
            stiched_image[y:y+adj_height, x:x+adj_width] = tile_image
            x = x + tile_width
        y = y + tile_height
    
    return stiched_image

In [None]:
out_path = 'images/'
in_path = 'image_data.csv'

input_files = read_image_data(in_path)

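# cv2.imwrite does not create missing folders, so make sure the folder exists
os.makedirs(out_path, exist_ok=True)

for input_file in input_files:
    # Download and stitch the tiles, then save the full image as a JPEG
    image = download_iiif_image(input_file)
    cv2.imwrite(out_path + input_file + '.jpg', image)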

In [None]:

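from pathlib import Path

# Assumed sources for these names, which the notebook never imports:
# ProxyConnector from aiohttp_socks and UserAgent from fake_useragent.
from aiohttp_socks import ProxyConnector
from fake_useragent import UserAgent


def download_image(image_url: str, jpg_outfile: Path):
    # Downloads one image through a proxy with a random user agent and saves
    # it as a JPEG. iiif_image_from_url is not defined in this notebook and
    # is assumed to be provided elsewhere.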
    ua = UserAgent()
    proxy_address = "socks5://jbfbllio-rotate:j5793hhv0ak3@p.webshare.io:80/"
    connector = ProxyConnector.from_url(proxy_address)
    img = iiif_image_from_url(
        image_url.replace("https", "http"),
        headers={"USER-AGENT": ua.random},
        connector=connector,
    )
    cv2.imwrite(str(jpg_outfile), img, [int(cv2.IMWRITE_JPEG_QUALITY), 95])
   

In [None]:
def read_image_data(file_name):
    # Reads image identifiers from a csv file; the set removes duplicates
    df = read_csv(file_name)
    image_identifier_list = df.loc[df['side'] == 'recto', 'image_identifier'].tolist()
    input_files = set(x.strip() for x in image_identifier_list if x != "")   
    return input_files

In [None]:
import nest_asyncio
nest_asyncio.apply()

out_path = 'images'
image_data_path = 'image_data.csv'
input_files = read_image_data(image_data_path)
print(f"Requested to download {len(input_files)} images to {out_path}.")

# cv2.imwrite does not create missing folders, so make sure the folder exists
os.makedirs(out_path, exist_ok=True)

for input_file in input_files:
    download_image(
        f"https://iaa.iiifhosting.com/iiif/{input_file}",
        f"{out_path}/{input_file}.jpg",
    )
print(f"Finished downloading all {len(input_files)} images to {out_path} folder.")

