# Prepare illustrations

Initially, we stored the illustrations extracted from the _Repertorio_ on a web server and used the raw file URLs (under `https://pages.ceres.rub.de/diga/terms/illustrations/`) for the `foaf:depiction` property of our concepts.

To clean things up a bit, we do the following:

1. Use a w3id namespace instead of raw URLs.
2. Use the concept ID as illustration ID instead of the autogenerated filenames.
3. Write metadata into the PNG files.
4. Write RDF metadata into a separate RDF graph.

In [1]:
from pathlib import Path
import re

import rdflib

from PIL.PngImagePlugin import PngImageFile, PngInfo

In [2]:
OLD_ILLUSTRATIONS_DIR = Path.home() / 'Downloads' / 'Repertorio_clean' / 'Repertorio_clean' / 'illustrations'
NEW_ILLUSTRATIONS_DIR = Path('illustrations')
NEW_ILLUSTRATIONS_DIR.mkdir(exist_ok=True)


CREDITS = 'ISMEO/Franceso Martore'
AUTHOR = 'Franceso Martore'
LICENSE = 'CC BY 4.0'
LICENSE_URL = 'https://creativecommons.org/licenses/by/4.0/'
SOURCE = ('Faccenna, Domenico, and Anna Filigenzi. 2007. '
          'Repertorio terminologico per la schedatura delle sculture dell’arte gandharica. Rome: IsIAO.')
COMMENT = 'Edited by the DiGA project <https://w3id.org/diga/>.'

OLD_URL_PREFIX = 'https://pages.ceres.rub.de/diga/terms/illustrations/'
# Escape the prefix and the dot before the extension so that both match literally.
OLD_URL_PATTERN = re.escape(OLD_URL_PREFIX) + r'(.*)\.png'
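
The URL pattern is what later recovers the autogenerated file name from each old `foaf:depiction` URL. A minimal sketch of that extraction, using one of the file names from the old illustrations directory; the sketch escapes the prefix and the dot before the extension so that they match literally, and the variable names `pattern` and `sample_url` are only for illustration:

```python
import re

OLD_URL_PREFIX = 'https://pages.ceres.rub.de/diga/terms/illustrations/'

# Capture everything between the prefix and the ".png" extension.
pattern = re.compile(re.escape(OLD_URL_PREFIX) + r'(.*)\.png')

sample_url = OLD_URL_PREFIX + 'Repertorio-198_Graphic_1621265402512_353.png'
match = pattern.match(sample_url)
old_name = match.group(1)
print(old_name)
```

A URL that does not start with the old prefix yields no match (`None`), so calling `.group(1)` on it would raise an `AttributeError`.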

We start with the current data from VocBench.

In [3]:
g = rdflib.Graph()
g.parse('diga_terms_vocbench.ttl')

<Graph identifier=Nc0aa901507d042639b8a16c613aad2cd (<class 'rdflib.graph.Graph'>)>

Now let’s run a few tests to make sure that our assumptions about the data are met.

In [4]:
res = g.query('''
SELECT ?concept ?id ?img
WHERE {
    ?concept a skos:Concept .
    ?concept dc:identifier ?id .
    ?concept foaf:depiction ?img .
} LIMIT 3
''')

for concept, id_, img in res:
    print(concept, id_, re.match(OLD_URL_PATTERN, img).group(1))

https://w3id.org/diga/terms/1000747715 1000747715 Repertorio-198_Graphic_1621265402512_353
https://w3id.org/diga/terms/1035523561 1035523561 Repertorio-059_Graphic_1621258346688_505
https://w3id.org/diga/terms/1036537964 1036537964 Repertorio-032_Graphic_1619536280979_211


Are there any concepts with more than one image? (In that case, we could not simply use the concept ID as the image name.)

In [5]:
res = g.query('''
SELECT ?concept (count(?img) as ?imgcount)
WHERE {
    ?concept a skos:Concept .
    ?concept foaf:depiction ?img .
} GROUP BY ?concept
HAVING ( count(?img) > 1 )
''')
for concept, count in res:
    print(concept, count.value)

The query returns nothing, so no concept has more than one image (as expected), and we can continue.
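
As a cross-check, the same grouping can be done in plain Python. This is a minimal sketch, assuming the `foaf:depiction` statements have been pulled out as (concept, image) pairs; the sample pairs below are invented for illustration and deliberately contain one concept with two images:

```python
from collections import Counter

# Hypothetical (concept, image) pairs standing in for foaf:depiction triples.
depictions = [
    ('concept/1', 'img/a.png'),
    ('concept/2', 'img/b.png'),
    ('concept/2', 'img/c.png'),
]

# Count images per concept and keep only concepts with more than one.
image_counts = Counter(concept for concept, _ in depictions)
multiple = {concept: n for concept, n in image_counts.items() if n > 1}
print(multiple)
```

An empty dictionary corresponds to the empty result of the SPARQL query above.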

Are there concepts that have an image but no identifier?

In [6]:
res = g.query('''
SELECT ?concept
WHERE {
    ?concept a skos:Concept .
    ?concept foaf:depiction ?img .
    FILTER NOT EXISTS { ?concept dc:identifier ?id . }
}
''')
for concept, in res:
    print(concept)

There are none either, so it is safe to use the identifier as the file name.

We create a new RDF graph for image information.

In [7]:
img_g = rdflib.Graph()

In [8]:
from rdflib.namespace import RDF, FOAF, DC, DCTERMS

DIGA = rdflib.Namespace('https://w3id.org/diga/')
ILLUSTRATIONS = rdflib.Namespace('https://w3id.org/diga/illustrations/')
CC = rdflib.Namespace('http://creativecommons.org/ns#')

img_g.bind('foaf', FOAF)
img_g.bind('dc', DC)
img_g.bind('dct', DCTERMS)
img_g.bind('diga', DIGA)
img_g.bind('di', ILLUSTRATIONS)
img_g.bind('cc', CC)

Now we walk through the original data. For illustration purposes, let’s look at the transformation steps with the first result.

First, get some basic data.

In [9]:
res = g.query('''
SELECT ?concept ?id ?label ?img
WHERE {
    ?concept a skos:Concept .
    ?concept dc:identifier ?id .
    ?concept foaf:depiction ?img .
    ?concept skosxl:prefLabel ?xlabel .
    ?xlabel skosxl:literalForm ?label .
    FILTER ( langMatches(lang(?label), 'en') )
}
''')

for concept, id_, label, img in res:
    old_name = re.match(OLD_URL_PATTERN, img).group(1)
    print(concept, id_, label, old_name)
    old_path = OLD_ILLUSTRATIONS_DIR / f'{old_name}.png'
    print(old_path, old_path.is_file())
    new_path = NEW_ILLUSTRATIONS_DIR / f'{id_}.png'
    print(new_path)
    break

https://w3id.org/diga/terms/1000747715 1000747715 spreading upwards and downwards, with lanceolate leaves and blossoms Repertorio-198_Graphic_1621265402512_353
/home/frederik/Downloads/Repertorio_clean/Repertorio_clean/illustrations/Repertorio-198_Graphic_1621265402512_353.png True
illustrations/1000747715.png


Now, let’s add metadata to the PNG, rename it and save it at the new location.

In [10]:
image = PngImageFile(old_path)
metadata = PngInfo()
metadata.add_text('Author', AUTHOR)
metadata.add_text('Title', label)
metadata.add_text('Source', SOURCE)
metadata.add_text('Copyright', f'© {CREDITS}, {LICENSE} <{LICENSE_URL}>')
metadata.add_text('Comment', COMMENT)
image.save(new_path, pnginfo=metadata)

Let’s see what the metadata look like in the final file.

In [11]:
im2 = PngImageFile(new_path)
im2.text

{'Author': 'Franceso Martore',
 'Title': 'spreading upwards and downwards, with lanceolate leaves and blossoms',
 'Source': 'Faccenna, Domenico, and Anna Filigenzi. 2007. Repertorio terminologico per la schedatura delle sculture dell’arte gandharica. Rome: IsIAO.',
 'Copyright': '© ISMEO/Franceso Martore, CC BY 4.0 <https://creativecommons.org/licenses/by/4.0/>',
 'Comment': 'Edited by the DiGA project <https://w3id.org/diga/>.'}
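
To check the metadata without Pillow, the text chunks can also be read straight from the PNG bytes. The following is a minimal standard-library sketch, assuming only uncompressed `tEXt` chunks are of interest; it ignores `zTXt` and `iTXt` chunks, so values containing characters outside Latin-1 may not show up. The helper names `make_chunk` and `read_text_chunks`, the 1×1 test image, and the `Example Author` value are made up for illustration:

```python
import struct
import zlib

PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'


def make_chunk(chunk_type, data):
    """Serialize one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack('>I', len(data)) + chunk_type + data + struct.pack('>I', crc)


def read_text_chunks(png_bytes):
    """Return a dict of keyword -> text for all tEXt chunks (Latin-1)."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError('not a PNG file')
    result = {}
    pos = len(PNG_SIGNATURE)
    while pos < len(png_bytes):
        length, chunk_type = struct.unpack('>I4s', png_bytes[pos:pos + 8])
        data = png_bytes[pos + 8:pos + 8 + length]
        if chunk_type == b'tEXt':
            # A tEXt chunk is "keyword NUL text".
            keyword, _, text = data.partition(b'\x00')
            result[keyword.decode('latin-1')] = text.decode('latin-1')
        pos += 12 + length  # 4 (length) + 4 (type) + data + 4 (CRC)
    return result


# Build a tiny 1x1 grayscale PNG carrying a single tEXt chunk.
ihdr = struct.pack('>IIBBBBB', 1, 1, 8, 0, 0, 0, 0)
idat = zlib.compress(b'\x00\x00')  # one filter byte + one pixel
png = (PNG_SIGNATURE
       + make_chunk(b'IHDR', ihdr)
       + make_chunk(b'tEXt', b'Author\x00Example Author')
       + make_chunk(b'IDAT', idat)
       + make_chunk(b'IEND', b''))

text_chunks = read_text_chunks(png)
print(text_chunks)
```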

Now let’s add RDF metadata.

In [12]:
iuri = ILLUSTRATIONS.term(id_)
img_g.add((iuri, RDF.type, FOAF.Image))
img_g.add((iuri, DCTERMS.creator, rdflib.Literal(AUTHOR)))
img_g.add((iuri, DCTERMS.title, label))
img_g.add((iuri, DC.source, rdflib.Literal(SOURCE)))
img_g.add((iuri, DCTERMS.license, rdflib.URIRef(LICENSE_URL)))
img_g.add((iuri, CC.license, rdflib.URIRef(LICENSE_URL)))
img_g.add((iuri, CC.attributionName, rdflib.Literal(CREDITS)))
img_g.add((iuri, DCTERMS.publisher, DIGA.term('')))

print(img_g.serialize(format='turtle'))

@prefix cc: <http://creativecommons.org/ns#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix di: <https://w3id.org/diga/illustrations/> .
@prefix diga: <https://w3id.org/diga/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

di:1000747715 a foaf:Image ;
    cc:attributionName "ISMEO/Franceso Martore" ;
    cc:license <https://creativecommons.org/licenses/by/4.0/> ;
    dc:source "Faccenna, Domenico, and Anna Filigenzi. 2007. Repertorio terminologico per la schedatura delle sculture dell’arte gandharica. Rome: IsIAO." ;
    dct:creator "Franceso Martore" ;
    dct:license <https://creativecommons.org/licenses/by/4.0/> ;
    dct:publisher diga: ;
    dct:title "spreading upwards and downwards, with lanceolate leaves and blossoms"@en .




Now let’s do the same for all of the data.

In [13]:
res = g.query('''
SELECT ?concept ?id ?label ?img
WHERE {
    ?concept a skos:Concept .
    ?concept dc:identifier ?id .
    ?concept foaf:depiction ?img .
    ?concept skosxl:prefLabel ?xlabel .
    ?xlabel skosxl:literalForm ?label .
    FILTER ( langMatches(lang(?label), 'en') )
}
''')

for concept, id_, label, img in res:
    old_name = re.match(OLD_URL_PATTERN, img).group(1)
    old_path = OLD_ILLUSTRATIONS_DIR / f'{old_name}.png'
    new_path = NEW_ILLUSTRATIONS_DIR / f'{id_}.png'

    # Write image+metadata
    image = PngImageFile(old_path)
    metadata = PngInfo()
    metadata.add_text('Author', AUTHOR)
    metadata.add_text('Title', label)
    metadata.add_text('Source', SOURCE)
    metadata.add_text('Copyright', f'© {CREDITS}, {LICENSE} <{LICENSE_URL}>')
    metadata.add_text('Comment', COMMENT)
    image.save(new_path, pnginfo=metadata)
    
    # Add RDF info
    iuri = ILLUSTRATIONS.term(id_)
    img_g.add((iuri, RDF.type, FOAF.Image))
    img_g.add((iuri, DCTERMS.creator, rdflib.Literal(AUTHOR)))
    img_g.add((iuri, DCTERMS.title, label))
    img_g.add((iuri, DC.source, rdflib.Literal(SOURCE)))
    img_g.add((iuri, DCTERMS.license, rdflib.URIRef(LICENSE_URL)))
    img_g.add((iuri, CC.license, rdflib.URIRef(LICENSE_URL)))
    img_g.add((iuri, CC.attributionName, rdflib.Literal(CREDITS)))
    img_g.add((iuri, DCTERMS.publisher, DIGA.term('')))
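
The loop assumes that every old file name resolves to an existing PNG; a single missing file would abort the run partway through. A minimal sketch of a pre-flight check that could be run first, assuming the old names have been collected into a list; the helper `find_missing` and the file names `present` and `absent` are hypothetical:

```python
from pathlib import Path
import tempfile


def find_missing(source_dir, names):
    """Return the names for which no '<name>.png' exists in source_dir."""
    return [name for name in names
            if not (source_dir / f'{name}.png').is_file()]


# Demonstrate with a temporary directory holding one placeholder file.
with tempfile.TemporaryDirectory() as tmp:
    source_dir = Path(tmp)
    (source_dir / 'present.png').write_bytes(b'')
    missing = find_missing(source_dir, ['present', 'absent'])

print(missing)
```

An empty list would mean that all source files are in place.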

Now we can save the resulting RDF file.

In [14]:
with open('diga_illustrations.rdf', 'wb') as outfile:
    img_g.serialize(destination=outfile, format='xml')

As the final step, we update the `foaf:depiction` property in the original data. This is done here purely as a test; the actual transformation is a one-time operation that is run directly in VocBench.

In [15]:
g.bind('diga_illustrations', ILLUSTRATIONS)

res = g.update('''
DELETE { ?concept foaf:depiction ?img . }
INSERT { ?concept foaf:depiction ?new_img . }
WHERE {
    ?concept a skos:Concept .
    ?concept dc:identifier ?id .
    ?concept foaf:depiction ?img .
    BIND(URI(CONCAT(STR(diga_illustrations:), ?id)) as ?new_img)
}
''')

Now test the results of the update.

In [16]:
res = g.query('''
SELECT ?concept ?img
WHERE {
    ?concept a skos:Concept .
    ?concept foaf:depiction ?img .
} LIMIT 3
''')
for concept, img in res:
    print(concept, img)

https://w3id.org/diga/terms/1000747715 https://w3id.org/diga/illustrations/1000747715
https://w3id.org/diga/terms/1035523561 https://w3id.org/diga/illustrations/1035523561
https://w3id.org/diga/terms/1036537964 https://w3id.org/diga/illustrations/1036537964
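
The rewrite performed by the `BIND(URI(CONCAT(...)))` expression amounts to appending the concept identifier to the illustrations namespace. A minimal sketch of the same mapping in plain Python, using the three identifiers from the output above; the function name `new_depiction` is only for illustration:

```python
ILLUSTRATIONS_NS = 'https://w3id.org/diga/illustrations/'


def new_depiction(identifier):
    """Build the new illustration URI from a concept identifier."""
    return ILLUSTRATIONS_NS + identifier


new_uris = [new_depiction(identifier)
            for identifier in ('1000747715', '1035523561', '1036537964')]
for uri in new_uris:
    print(uri)
```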
