# Exploring some CVMA CGIF Data



In [2]:
import gzip, httpx
import pyoxigraph as px
from rich import print

In [3]:
store = px.Store()
store.bulk_load(gzip.open("cvma.ttl.gz").read(), "text/turtle")

## An example CGIF metadata record from the CVMA website

In [10]:
r = httpx.get("https://corpusvitrearum.de/id/F3431/about.cgif")
print(r.text)

## The same record from the CVMA CGIF TTL dump

In [4]:
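# All triples with the record itself as subject
for s, p, o, _ in store.quads_for_pattern(px.NamedNode("https://corpusvitrearum.de/id/F3431"), None, None):
    print(s, p, o)
# All triples describing one of the referenced Iconclass terms
for s, p, o, _ in store.quads_for_pattern(px.NamedNode("https://iconclass.org/73A22"), None, None):
    print(s, p, o)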


## There is 'extra' data in the CGIF LDJSON

Note how the CGIF fragment (from the website) contains extra data on the referenced IRIs. This is *very useful*, and the triples for these extra entities are also stored in the full CGIF data dump. In the raw data dump the same triples are repeated, so whether duplicates are created depends on the triplestore into which the dump is loaded. In the Oxigraph implementation this is fine, because adding the same triple is idempotent, but there is no guarantee that all stores behave the same way.

This extra data, which defines the vocabulary each schema:keywords value comes from, should be exploited to store the triple in the Research Data Graph under a different, mapped property.
For example, schema:keywords values from https://iconclass.org/ can be stored with something like n4c:hasIconclass, and values from https://geonames.org/ with n4c:hasGeoname (where n4c is the prefix for https://nfdi4culture.de/ontology).
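A minimal sketch of this mapping idea, in plain Python without a triplestore. The function names and the lookup table are hypothetical, the properties are written as prefixed names rather than full IRIs, and the sample values are only illustrative.

```python
# Hypothetical lookup: schema:inDefinedTermSet value -> mapped property
VOCAB_TO_PROPERTY = {
    "https://iconclass.org/": "n4c:hasIconclass",
    "https://geonames.org/": "n4c:hasGeoname",
}


def mapped_property(term_set):
    """Return the mapped property for a vocabulary, else schema:keywords."""
    return VOCAB_TO_PROPERTY.get(term_set, "schema:keywords")


def remap_keywords(subject, keywords, term_sets):
    """Build (subject, property, keyword) tuples using the mapped properties.

    term_sets maps a keyword IRI to its schema:inDefinedTermSet value;
    keywords without an entry keep the generic schema:keywords property.
    """
    return [(subject, mapped_property(term_sets.get(k)), k) for k in keywords]


# Illustrative input: one Iconclass keyword and one keyword of unknown origin
triples = remap_keywords(
    "https://corpusvitrearum.de/id/F3431",
    ["https://iconclass.org/73A22", "https://example.org/unknown"],
    {"https://iconclass.org/73A22": "https://iconclass.org/"},
)
for t in triples:
    print(t)
```

Keying the lookup on the schema:inDefinedTermSet value, rather than on the shape of the keyword IRI, relies on exactly the extra data that the CGIF records already carry.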




In [14]:
# Read the dump as text and count how often one term definition is repeated
with gzip.open("cvma.ttl.gz", "rt") as f:
    raw = f.read()
print(raw.count("""iconclass:49L8
  a schema:DefinedTerm ;
  schema:inDefinedTermSet "https://iconclass.org/"^^xsd:string .
"""))

## Linking back to more metadata

Looking at the webpage for the CVMA entry above, there is also a link to the "content" metadata. This is not about the ImageObject, but about the thing depicted by the image.
See: https://corpusvitrearum.de/id/F3431/about.json

In [16]:
r = httpx.get("https://corpusvitrearum.de/id/F3431/about.json")
print(r.text)

This metadata about the physical object being described is not linked in the CGIF at the moment, but it would be valuable for research where "more" information is needed, or for federated searches.
To illustrate: in this fragment the width and height are measurements in cm, whereas in the ImageObject the width and height are the pixel dimensions of the image.

What property can we add to the CGIF to refer to the extra metadata?