<h2>Download IIIF image as tiles and stitch the tiles</h2>

This notebook shows how to download full-size images from an IIIF server. Given an image identifier, the code opens its info.json, reads the image height and width and the tile height and width, downloads the individual tiles and stitches them into a full image. This is necessary because the IIIF server used here does not allow downloading images larger than 1000 x 1000 pixels in a single request.

IIIF defines an <b>Image API</b> (https://iiif.io/api/image/3.0/#53-sizes) and a <b>Presentation API</b> (https://iiif.io/api/presentation/3.0/).

The <b>Image API</b> provides information for returning an image in response to a standard HTTP request. The URL can specify the region, size, rotation, quality characteristics and format of the requested image. It defines the info.json file structure that includes technical properties of the image such as the image size, tile size, image format, and color quality.

The <b>Presentation API</b> provides the information needed to view the images online. It defines the manifest.json file structure, which includes descriptive, rights and linking information for the object and contains enough information for the client to begin displaying something quickly to the user.

<h3>Follow Dead Sea Scrolls (DSS) fragment images to their source</h3>

The IAA photographs the DSS fragments and hosts the images via an IIIF server (https://iaa.iiifhosting.com). The public image archive can be viewed here (https://www.deadseascrolls.org.il/explore-the-archive). Fragment manifests are here (https://dss.digitalbibleonline.org/manifests/all/). Each manuscript consists of several fragments. Each fragment is imaged from recto and verso at different wavelengths (https://youtu.be/MY-8AQZOe_w), in infrared and color formats. Given a manuscript name, https://www.qumran-digital.org/qd-images/ presents its fragment images in all their available formats.

A fragment manifest contains an identifier for each of its images. To download the fragment images of a manuscript with a given name (e.g. 4Q171), we first fetch its fragments' manifests from (https://dss.digitalbibleonline.org/manifests/all/). Then we gather the identifier of the image format of interest (e.g. recto and infrared) from each fragment manifest. Finally, we download the fragment image from (https://iaa.iiifhosting.com) using that identifier.
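The identifier-gathering step can be sketched as follows. This is only an illustration under assumptions: it assumes the manifests follow the IIIF Presentation 2.x layout (`sequences` → `canvases` → `images` → `resource` → `service`), that the image identifier is the last path segment of the service `@id`, and that the canvas label names the image format. The function name `extract_image_identifiers` and the sample manifest (including its identifier) are hypothetical and not taken from the real manifests.

```python
def extract_image_identifiers(manifest):
    """Return (canvas label, image identifier) pairs from a manifest dict.

    Assumes a IIIF Presentation 2.x style structure; real manifests may differ.
    """
    pairs = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                service = image.get("resource", {}).get("service", {})
                service_id = service.get("@id", "")
                if service_id:
                    # Assumption: the identifier is the last path segment of the service URL
                    identifier = service_id.rstrip("/").split("/")[-1]
                    pairs.append((canvas.get("label", ""), identifier))
    return pairs


# Hypothetical, hand-written manifest used only to exercise the function
sample_manifest = {
    "sequences": [{
        "canvases": [{
            "label": "Near Infra-Red (NIR) -Recto",
            "images": [{
                "resource": {
                    "service": {"@id": "https://iaa.iiifhosting.com/iiif/example-identifier"}
                }
            }],
        }]
    }]
}

print(extract_image_identifiers(sample_manifest))
```

Filtering the returned pairs by label would then give the identifiers for a single image format.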

In this project we are given a list of manuscript names (https://tauex-my.sharepoint.com/:x:/g/personal/berat_tauex_tau_ac_il/EUWgmjVoV0BKjE2mL3Wp5sgBdkUXjnkmsct34MmqRZ_EjA?e=j5bZWf). These manuscripts are dated by internal evidence and can be used as ground truth for the manuscript dating problem. We fetched the fragment image identifiers of interest (https://github.com/beratkurar/hebrew_letter_detection/blob/main/fragment_identifiers.csv) from the manifests (https://dss.digitalbibleonline.org/manifests/all/).

In [None]:
import cv2
import os
from pandas import read_csv
import requests
import numpy as np

In [None]:
def read_image_data(file_name):
    # Read image identifiers from a csv file
    df = read_csv(file_name)
    # Keep only the near-infrared recto images and skip rows with a missing identifier
    image_identifier_list = df.loc[df['label'] == 'Near Infra-Red (NIR) -Recto', 'image_identifier'].dropna().tolist()
    # Alternative filter:
    #image_identifier_list = df.loc[df['side'] == 'recto', 'image_identifier'].tolist()
    # Strip surrounding whitespace and drop empty entries
    input_files = [x.strip() for x in image_identifier_list if x.strip() != ""]
    return input_files

In [None]:
# Print an example identifier
identifiers = read_image_data('fragment_identifiers.csv')
print(identifiers[1])

7eac0d1613687328eb4a8be51fff265eb9108deae862fa3f7ae2858db3c47046


The Image API can be called by requesting an image using the following URL template:
{scheme}://{server}{/prefix}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}

Scheme = https <br/>
Server = iaa.iiifhosting.com <br/>
Prefix = iiif <br/>
Identifier = 7eac0d1613687328eb4a8be51fff265eb9108deae862fa3f7ae2858db3c47046 <br/>
Region = The rectangular portion of the underlying image content to be returned. The region can be specified by pixel coordinates, by percentage, or by the value full, which specifies that the full image should be returned. <br/>
Size = The dimensions to which the extracted region is to be scaled. <br/>
Rotation = Specifies mirroring and rotation. <br/>
Quality = Determines whether the image is delivered in color, grayscale or black and white. <br/>
Format = The format of the returned image, expressed as a suffix at the end of the URI, e.g. default.jpg, default.png, default.tif. <br/>
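As a small illustration of the template above, the URL can be assembled from its parts. This is a sketch, not part of the original notebook: the helper name `build_image_url` and its default values are assumptions chosen to match the server used here.

```python
def build_image_url(identifier, region="full", size="full", rotation="0",
                    quality="default", fmt="jpg",
                    scheme="https", server="iaa.iiifhosting.com", prefix="iiif"):
    # Fill in {scheme}://{server}{/prefix}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    return f"{scheme}://{server}/{prefix}/{identifier}/{region}/{size}/{rotation}/{quality}.{fmt}"


identifier = "7eac0d1613687328eb4a8be51fff265eb9108deae862fa3f7ae2858db3c47046"
# Request the 256 x 256 pixel region whose top-left corner is at (0, 0)
print(build_image_url(identifier, region="0,0,256,256"))
```

The region string uses the pixel form `x,y,w,h`, which is the same form the tile requests later in this notebook use.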

In [None]:
# Open and print an example info.json
prefix = 'https://iaa.iiifhosting.com/iiif/7eac0d1613687328eb4a8be51fff265eb9108deae862fa3f7ae2858db3c47046/'
info_url = prefix + 'info.json'
info = requests.get(info_url).json()
print(info)

{'@context': 'http://iiif.io/api/image/2/context.json', '@id': 'https://iaa.iiifhosting.com/iiif/7eac0d1613687328eb4a8be51fff265eb9108deae862fa3f7ae2858db3c47046', 'protocol': 'http://iiif.io/api/image', 'width': 7216, 'height': 5412, 'sizes': [{'width': 225, 'height': 169}, {'width': 451, 'height': 338}, {'width': 902, 'height': 676}], 'tiles': [{'width': 256, 'height': 256, 'scaleFactors': [1, 2, 4, 8, 16, 32]}], 'profile': ['http://iiif.io/api/image/2/level1.json', {'formats': ['jpg'], 'qualities': ['native', 'color', 'gray'], 'supports': ['regionByPct', 'regionSquare', 'sizeByForcedWh', 'sizeByWh', 'sizeAboveFull', 'rotationBy90s', 'mirroring']}]}


The IIIF server does not allow downloading images larger than 1000 x 1000 pixels. To see this, let's request the full region at the full size and check that the shape of the returned image is only 1000 x 1000 x 3 rather than height x width x 3.

In [None]:
# get image sizes from info.json and construct the image_url
height = str(info['height'])
width = str(info['width'])
image_url = prefix + 'full/' + width + ',' + height + '/0/default.jpg'
print(image_url)

https://iaa.iiifhosting.com/iiif/7eac0d1613687328eb4a8be51fff265eb9108deae862fa3f7ae2858db3c47046/full/7216,5412/0/default.jpg


In [None]:
# Convert the HTTP response body to a numpy array image
response = requests.get(image_url)
image_array = np.frombuffer(response.content, dtype=np.uint8)
image = cv2.imdecode(image_array, cv2.IMREAD_COLOR)

In [None]:
print(image.shape)

(1000, 1000, 3)


Therefore, we download the full image as tiles and stitch the tiles together to rebuild the full image.
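To make the tiling concrete, the regions to request can be enumerated without any network access. The sketch below uses the image and tile sizes from the info.json printed above; the generator name `tile_regions` is an assumption introduced only for this illustration.

```python
def tile_regions(image_width, image_height, tile_width, tile_height):
    """Yield (x, y, w, h) pixel regions that cover the image row by row."""
    for y in range(0, image_height, tile_height):
        # Tiles in the last row may be shorter than tile_height
        h = min(tile_height, image_height - y)
        for x in range(0, image_width, tile_width):
            # Tiles in the last column may be narrower than tile_width
            w = min(tile_width, image_width - x)
            yield x, y, w, h


regions = list(tile_regions(7216, 5412, 256, 256))
print(len(regions))   # 29 columns x 22 rows = 638 tiles
print(regions[0])     # (0, 0, 256, 256)
print(regions[-1])    # (7168, 5376, 48, 36)
```

Each tuple maps directly onto the `x,y,w,h` region part of a tile URL, so a 7216 x 5412 image needs 638 requests at this tile size.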

In [None]:
# Get the full image size and tile size from info.json.
stiched_image = np.empty((info['height'], info['width'], 3), dtype=np.uint8)
tile_width = info['tiles'][0]['width']
tile_height = info['tiles'][0]['height']

# Download the image tile by tile and stitch the tiles into a full size image.
y = 0
while y < info['height']:
    x = 0
    # Tiles in the last row may be shorter than tile_height
    adj_height = min(tile_height, info['height'] - y)
    while x < info['width']:
        # Tiles in the last column may be narrower than tile_width
        adj_width = min(tile_width, info['width'] - x)
        tile_url = prefix + str(x) + ',' + str(y) + ',' + str(adj_width) + ',' + str(adj_height) + '/full/0/default.jpg'
        response = requests.get(tile_url)
        tile_array = np.frombuffer(response.content, dtype=np.uint8)
        tile_image = cv2.imdecode(tile_array, cv2.IMREAD_COLOR)
        stiched_image[y:y+adj_height, x:x+adj_width] = tile_image
        x = x + tile_width
    y = y + tile_height


In [None]:
print(stiched_image.shape)

(5412, 7216, 3)


In [None]:
def download_iiif_image(identifier):
    # Download tiles and stitch them into a full size image

    # Construct the prefix using the identifier
    prefix = 'https://iaa.iiifhosting.com/iiif/' + identifier + '/'
    # Read info.json
    info_url = prefix + 'info.json'
    info = requests.get(info_url).json()
    # Read the image sizes and construct an empty image
    stiched_image = np.empty((info['height'], info['width'], 3), dtype=np.uint8)
    # Read the tile sizes
    tile_width = info['tiles'][0]['width']
    tile_height = info['tiles'][0]['height']
    # Download tile by tile and stitch the tiles into a full size image
    y = 0
    while y < info['height']:
        x = 0
        # Tiles in the last row may be shorter than tile_height
        adj_height = min(tile_height, info['height'] - y)
        while x < info['width']:
            # Tiles in the last column may be narrower than tile_width
            adj_width = min(tile_width, info['width'] - x)
            tile_url = prefix + str(x) + ',' + str(y) + ',' + str(adj_width) + ',' + str(adj_height) + '/full/0/default.jpg'
            response = requests.get(tile_url)
            tile_array = np.frombuffer(response.content, dtype=np.uint8)
            tile_image = cv2.imdecode(tile_array, cv2.IMREAD_COLOR)
            stiched_image[y:y+adj_height, x:x+adj_width] = tile_image
            x = x + tile_width
        y = y + tile_height

    return stiched_image
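When such a function is run over many identifiers, a single failing request (for example an info.json response that is not valid JSON) should be recorded rather than lost. The following is a minimal sketch of one way to do that with a thread pool from the standard library; it is not the approach used in this notebook. The names `run_in_parallel` and `fake_download` are assumptions, and `fake_download` is only a stand-in for a real download function so that the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_in_parallel(task, items, max_workers=4):
    """Run task(item) for every item in a thread pool.

    Returns two dicts keyed by item: successful results and raised exceptions.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(task, item): item for item in items}
        for future in as_completed(futures):
            item = futures[future]
            try:
                results[item] = future.result()
            except Exception as exc:
                # Keep the exception so the failing identifier can be inspected or retried
                errors[item] = exc
    return results, errors


def fake_download(identifier):
    # Stand-in for a real download function; fails for one identifier on purpose
    if identifier == "bad":
        raise ValueError("info.json could not be decoded")
    return identifier.upper()


results, errors = run_in_parallel(fake_download, ["a", "bad", "b"])
print(sorted(results.items()))
print(errors)
```

Collecting the exceptions per identifier makes it possible to retry or report the failed downloads after the loop has finished.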

In [None]:
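import asyncio
import functools

def background(f):
    # Run the decorated function in the event loop's default thread pool executor
    def wrapped(*args, **kwargs):
        # run_in_executor only forwards positional arguments, so bind all arguments with partial
        return asyncio.get_event_loop().run_in_executor(None, functools.partial(f, *args, **kwargs))
    return wrapped

image_identifiers_file = 'fragment_identifiers.csv'
image_download_folder = 'images/'
# cv2.imwrite does not create missing folders, so make sure the target folder exists
os.makedirs(image_download_folder, exist_ok=True)
identifiers = read_image_data(image_identifiers_file)

# Download one image and save it; the decorator makes each call run in a background thread
@background
def download_and_save_images(identifier):
    full_size_image = download_iiif_image(identifier)
    cv2.imwrite(image_download_folder + identifier + '.jpg', full_size_image)

# Run loop iterations in parallel
for identifier in identifiers:
    download_and_save_images(identifier)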

Future exception was never retrieved
future: <Future finished exception=JSONDecodeError('Expecting value: line 1 column 1 (char 0)')>
Traceback (most recent call last):
  File "/home/nachum/berat/anaconda3/envs/tfenv/lib/python3.9/site-packages/requests/models.py", line 971, in json
    return complexjson.loads(self.text, **kwargs)
  File "/home/nachum/berat/anaconda3/envs/tfenv/lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/home/nachum/berat/anaconda3/envs/tfenv/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/nachum/berat/anaconda3/envs/tfenv/lib/python3.9/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/nachum/ber