# [The Getty Arts & Architecture Thesaurus](https://www.getty.edu/research/tools/vocabularies/aat/) (AAT)

Note: this notebook assumes that you run it in [Visual Studio Code](https://code.visualstudio.com/) with the following extensions: Python, SPARQL Executor, and REST Client. Some functionality may not be available otherwise.

## What is the AAT? 

The [**Getty Arts & Architecture Thesaurus**](https://www.getty.edu/research/tools/vocabularies/aat/) is a multilingual, semantically structured **controlled vocabulary**. It encompasses terms, descriptions, and other information for generic concepts related to visual art, architecture, archaeology, and other cultural heritage. Importantly, the AAT contains generic terms, not iconographic subjects or proper names. In other words: ["each concept is a case of many (a generic thing), not a case of one (a specific thing)"](https://www.getty.edu/research/tools/vocabularies/aat/about.html). The full AAT database contains around 74,460 concept records (subject_id) and 503,230 terms (term_id). Each concept record contains one or more terms (e.g., singular/plural forms, spelling variations, translations). A record minimally contains a unique numeric ID, a term, and an indication of its position in the hierarchy. Often it also contains a description of the concept, a list of associated or equivalent terms, and temporal information. The AAT has been translated into Dutch by the Netherlands Institute for Art History. Note that the AAT is a compiled resource, so it is not a complete, definitive collection of concepts. It expands through community [contributions](https://www.getty.edu/research/tools/vocabularies/contributors.html) by domain experts.


### Terminology
Here you find a list of the most important terminology related to the AAT. 
- `Facet`: The major subdivision or upper layer of AAT's hierarchical structure, each containing classes of concepts. In total, there are 7 facets.  
- `Hierarchy`: Groupings of terminology that are arranged within the facets.  
- `Record`: A single entry that contains information about a specific concept.  
- `Term`: A word or phrase that represents the concept in the record. Includes singular/plural forms, spelling variations, and translations. A record can contain multiple terms.   
- `Descriptor` or preferred term: The term that is used by default to refer to a concept.
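
The terminology above can be summarized in a small data model. The sketch below is a simplified illustration, not the Getty's actual schema: the class and field names (`Term`, `Record`, `descriptor`) are hypothetical, and which terms are flagged as preferred in the example is assumed. The labels themselves come from the SPARQL output further down in this notebook.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Term:
    text: str                # the word or phrase, e.g. a singular or plural form
    language: str            # language code, e.g. "en" or "nl"
    preferred: bool = False  # True if this term is the descriptor for its language


@dataclass
class Record:
    subject_id: str                 # unique numeric ID, kept as a string
    parent_id: Optional[str]        # position in the hierarchy (None if unknown here)
    terms: list[Term] = field(default_factory=list)

    def descriptor(self, language: str = "en") -> Optional[str]:
        """Return the preferred term (descriptor) for a language, or None if there is none."""
        for term in self.terms:
            if term.language == language and term.preferred:
                return term.text
        return None


# Hypothetical example: the preferred flags are assumptions, not taken from the AAT.
record = Record(
    subject_id="300008458",
    parent_id=None,
    terms=[
        Term("cathedral cities", "en", preferred=True),
        Term("cathedral city", "en"),
        Term("domsteden", "nl", preferred=True),
        Term("domstad", "nl"),
    ],
)
print(record.descriptor("nl"))
```

One record holds many terms, and the descriptor is simply the term flagged as preferred for a given language.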

Take a look at this example record, and see the terminology in practice. 

<div>
<img src="img/example_record.png" width="500"/>
</div>


[Image source](https://www.getty.edu/research/tools/vocabularies/aat/AAT-Users-Manual.pdf), p. 32.

## What does the AAT contain?

The AAT is structured as a hierarchical database of concepts, containing seven **facets** that each host a number of **hierarchies**. Below you can see an outline of the facets and hierarchies, including example terms. See [here](https://www.getty.edu/vow/AATHierarchy?find=beauty&logic=AND&note=&page=1&subjectid=300000000) for a collapsible overview on the Getty website, which lets you browse through the structure and inspect the leaf nodes.

The root of the structure is called *Top of the AAT hierarchies*. Each facet in the outline below links to its hierarchy on the Getty website.

Most facets are relatively straightforward in their meaning. The final facet, *Brand Names*, allows for necessary additions by the conservation community, particularly where a material, process, or object does not have a generic name and the names are under trademark protection. The largest facet is the *Objects* facet. 

- [`Root`](https://www.getty.edu/vow/AATHierarchy?find=beauty&logic=AND&note=&subjectid=300000000): **Top of the AAT hierarchies**
	- [`ASSOCIATED CONCEPT`](https://www.getty.edu/vow/AATHierarchy?find=beauty&logic=AND&note=&subjectid=300179462) FACET \
		(e.g. *beauty*, *socialism*, *cultural pluralism*)
		- Associated Concepts 
	- [`PHYSICAL ATTRIBUTES`](https://www.getty.edu/vow/AATHierarchy?find=beauty&logic=AND&note=&page=1&subjectid=300264087) FACET \
		(e.g. *borders, round, waterlogged*)
		- Attributes and Properties
		- Conditions and Effects
		- Design Elements
		- Color
	- [`STYLES AND PERIODS`](https://www.getty.edu/vow/AATHierarchy?find=beauty&logic=AND&note=&subjectid=300015646) FACET \
		(e.g. *Abstract Expressionist, Yoruba*)
		- Styles and Periods
	- [`AGENTS`](https://www.getty.edu/vow/AATHierarchy?find=beauty&logic=AND&note=&page=1&subjectid=300264089) FACET \
		(e.g. *printmakers, landscape architects*)
		- People
		- Organizations
		- Living Organisms
	- [`ACTIVITIES`](https://www.getty.edu/vow/AATHierarchy?find=beauty&logic=AND&note=&subjectid=300264090) FACET \
		(e.g. *archaeology, engineering, analyzing*)
		- Disciplines
		- Functions
		- Events
		- Physical and Mental Activities
		- Processes and Techniques
	- [`MATERIALS`](https://www.getty.edu/vow/AATHierarchy?find=beauty&logic=AND&note=&subjectid=300264091) FACET \
		(e.g. *iron, clay, artificial ivory*)
		- Materials
	- [`OBJECTS`](https://www.getty.edu/vow/AATHierarchy?find=beauty&logic=AND&note=&subjectid=300264092) FACET \
		(e.g. *paintings, facades, cathedrals, chairs*)
		- Object Groupings and Systems
		- Object Genres 
		- Components
		- Built Environment
			- Settlements and Landscapes
			- Built Complexes and Districts
			- Single Built Works
			- Open Spaces and Site Elements
		- Furnishings and Equipment
			- Furnishings
			- Costume
			- Tools and Equipment
			- Weapons and Ammunition
			- Measuring Devices
			- Containers
			- Sound Devices
			- Recreational Artifacts
			- Transportation Vehicles
		- Visual and Verbal Communication
			- Visual Works
			- Exchange Media
			- Information Forms
	- [`BRAND NAMES`](https://www.getty.edu/vow/AATHierarchy?find=beauty&logic=AND&note=&subjectid=300343372) FACET \
		(e.g. *Agfacolor, Arches paper*)
		- Brand Names
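
The same hierarchy can also be walked programmatically instead of in the browser. The sketch below asks the SPARQL endpoint (described later in this notebook) for the direct children of one node, using only the Python standard library. It assumes that the Getty ontology links a child to its preferred parent with `gvp:broaderPreferred` and stores the preferred label under `gvp:prefLabelGVP`/`xl:literalForm`; check the Getty LOD documentation before relying on these property names. `children_query` and `fetch_children` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "http://vocab.getty.edu/sparql"


def children_query(subject_id: str) -> str:
    """Build a SPARQL query for the direct children of an AAT record and their preferred labels."""
    # Assumption: gvp:broaderPreferred points from a child to its preferred parent,
    # and gvp:prefLabelGVP / xl:literalForm holds the preferred label text.
    return f"""
PREFIX aat: <http://vocab.getty.edu/aat/>
PREFIX gvp: <http://vocab.getty.edu/ontology#>
PREFIX xl: <http://www.w3.org/2008/05/skos-xl#>
SELECT ?child ?label WHERE {{
    ?child gvp:broaderPreferred aat:{subject_id} ;
           gvp:prefLabelGVP [ xl:literalForm ?label ] .
}}"""


def fetch_children(subject_id: str) -> list[tuple[str, str]]:
    """Send the query via the SPARQL 1.1 protocol (HTTP GET) and return (URI, label) pairs."""
    url = ENDPOINT + "?" + urlencode({"query": children_query(subject_id)})
    request = Request(url, headers={"Accept": "application/sparql-results+json"})
    with urlopen(request, timeout=60) as response:
        data = json.load(response)
    return [
        (row["child"]["value"], row["label"]["value"])
        for row in data["results"]["bindings"]
    ]


# Usage (requires network access): the hierarchies directly under the Objects facet.
# for uri, label in fetch_children("300264092"):
#     print(uri, label)
```

Calling `fetch_children` repeatedly on the returned IDs walks down the tree one level at a time.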


### Take a look under the hood: a Turtle file
The AAT was originally released in XML and relational table formats. More recently, it has been made available as [Linked Open Data (LOD)](https://www.getty.edu/research/tools/vocabularies/lod/index.html) in JSON, RDF, N3/Turtle, and N-Triples formats. Working with the XML or relational table formats is not recommended, because they may become obsolete in the future; the LOD formats are the more robust choice.

Let's take a look at what a single record looks like in [Turtle](https://en.wikipedia.org/wiki/Turtle_(syntax)) (Terse RDF Triple Language, `.ttl`) format. Turtle is a syntax for expressing data in the Resource Description Framework (RDF), a model for describing resources on the web and exchanging data about them. Turtle is designed to be more human-readable than other RDF syntaxes such as RDF/XML. It uses URIs to uniquely identify concepts, which makes it easier to retrieve data and link related information.


Open the [aat_300123559.ttl file](sparql/aat_300123559.ttl) in a new window and try to identify the following information: 
- The record identifier 
- The label name 
- The date of the most recent modification
- The parent label
- The Dutch translations



You may have found that the record identifier is *300123559*, the label name is *Attributes and Properties (hierarchy name)*, the date of the most recent modification is *2015-07-03*, and the parent label is *Physical Attributes Facet*. The Dutch translations of the terms and descriptions are marked with the `@nl` language tag.

## How can the AAT be used? 
The AAT can be downloaded in various formats for local use, but more integrated options are also available, such as the SPARQL endpoint and the API. The sections below explain how to use them.


### [SPARQL Endpoint](https://vocab.getty.edu/)
The SPARQL endpoint lets you take advantage of the linked data principles built into the AAT and run complex queries over it.

Below you find a Python implementation of a SPARQL query. It is a simple query that retrieves all Dutch and English term labels of a given record. You can change the output by specifying different `subjectID`s.

```python
%pip install SPARQLWrapper
```

```python
from SPARQLWrapper import SPARQLWrapper, JSON


def query_getty_aat(subjectID):
    # The public Getty Vocabularies SPARQL endpoint
    endpoint = "http://vocab.getty.edu/sparql"

    # Retrieve all Dutch and English labels of the record with the given subjectID
    query = f"""
    PREFIX aat: <http://vocab.getty.edu/aat/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT * {{
        aat:{subjectID} rdfs:label ?label .
        FILTER (LANG(?label) = 'nl' || LANG(?label) = 'en')
    }}"""

    # Initialize the SPARQL wrapper and request JSON results
    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)

    # Execute the query and convert the JSON response to a Python dict
    results = sparql.query().convert()

    # Print one label per line
    print("Subject ID: ", subjectID)
    for result in results["results"]["bindings"]:
        print(" " * 3, "label: ", result["label"]["value"])
    print()


######## Change the subjectIDs here ########
subjectIDs = ["300008458", "300386018"]

for subjectID in subjectIDs:
    query_getty_aat(subjectID)
```



```text
Subject ID:  300008458
    label:  cathedral cities
    label:  cathedral city
    label:  cities, cathedral
    label:  domsteden
    label:  domstad

Subject ID:  300386018
    label:  Costa Rican
    label:  Costa Ricaans
```
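
SPARQLWrapper returns the standard SPARQL JSON results format, in which each language-tagged literal carries its tag under an `xml:lang` key. A small helper can therefore group the labels per language instead of printing them in one list, which is handy if you only want to keep the Dutch terms. `labels_by_language` is a hypothetical name, and the sample input is hand-written to mirror part of the output above (the language tags are assumed from the label text).

```python
from collections import defaultdict


def labels_by_language(results: dict) -> dict[str, list[str]]:
    """Group the ?label literals of a SPARQL JSON result by their language tag."""
    grouped = defaultdict(list)
    for binding in results["results"]["bindings"]:
        label = binding["label"]
        # Literals without a language tag end up under the empty string.
        grouped[label.get("xml:lang", "")].append(label["value"])
    return dict(grouped)


# Hand-written sample in the SPARQL JSON results shape.
sample = {
    "head": {"vars": ["label"]},
    "results": {"bindings": [
        {"label": {"type": "literal", "xml:lang": "en", "value": "cathedral cities"}},
        {"label": {"type": "literal", "xml:lang": "en", "value": "cathedral city"}},
        {"label": {"type": "literal", "xml:lang": "nl", "value": "domsteden"}},
    ]},
}

print(labels_by_language(sample))
# {'en': ['cathedral cities', 'cathedral city'], 'nl': ['domsteden']}
```

Inside `query_getty_aat`, calling `labels_by_language(results)` on the converted response would give the same grouping for live data.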



### [API](http://vocabsservices.getty.edu/Docs/Getty_Vocabularies_Web_Services_Documentation_v2.pdf)


The AAT also offers an **API** (Application Programming Interface), which lets developers access and manipulate the data programmatically. This makes it easier to integrate the AAT into applications and websites.

The API provides a straightforward way to retrieve specific information, such as related terms or hierarchical structures. The [AAT programming guidelines](http://vocabsservices.getty.edu/AATService.asmx) list 11 supported operations, including **GetChildren**, **GetParents**, and **GetSubjectTerms**, which allow you to explore the relationships between concepts. Each operation is documented with examples for the **SOAP 1.1**, **SOAP 1.2**, **HTTP GET**, and **HTTP POST** protocols.

To get started with the API, try out the **GetChildren** operation using the **HTTP POST** method [here](http://vocabsservices.getty.edu/AATService.asmx?op=AATGetChildren), with a subject ID such as `300015646`.

Alternatively, you can run **HTTP GET** requests for the operations that are included in the current repository (see the files [GetParentLabel](http/getparentlabel.http) and [getSubjectTerms](http/getsubjectterms.http)). GET requests are a more straightforward way to retrieve data; with the REST Client extension, you can run them by clicking the `Send request` button.
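
The same kind of HTTP GET request can also be sent from Python with the standard library. This is a sketch under two assumptions: that the service follows the usual ASMX pattern `AATService.asmx/<Operation>?<parameter>=<value>` for GET requests, and that `AATGetChildren` takes a single `subjectID` parameter. Check the operation pages for the exact parameter names. The response schema is not described here, so the helper searches for elements by tag name, and the tag name in the usage comment is hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of the AAT web service, as linked in the programming guidelines
SERVICE_URL = "http://vocabsservices.getty.edu/AATService.asmx"


def build_get_url(operation: str, **params: str) -> str:
    """Build an HTTP GET URL, assuming the ASMX pattern SERVICE_URL/Operation?name=value."""
    return f"{SERVICE_URL}/{operation}?{urlencode(params)}"


def fetch_xml(operation: str, **params: str) -> ET.Element:
    """Send the GET request and parse the XML response (requires network access)."""
    with urlopen(build_get_url(operation, **params), timeout=60) as response:
        return ET.fromstring(response.read())


def texts_by_local_name(root: ET.Element, local_name: str) -> list[str]:
    """Collect the stripped text of all elements with the given tag, ignoring XML namespaces."""
    return [
        (element.text or "").strip()
        for element in root.iter()
        if element.tag.rsplit("}", 1)[-1] == local_name
    ]


# Usage (network call, so commented out). Inspect the raw XML first:
# "Preferred_Term" is a hypothetical element name.
# root = fetch_xml("AATGetChildren", subjectID="300015646")
# print(texts_by_local_name(root, "Preferred_Term"))
```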


## AAT in other projects 


- [Termennetwerk](https://termennetwerk.netwerkdigitaalerfgoed.nl/) presents a Dutch search engine for terms, and links them to their URIs in a variety of thesauri. 

- [Europeana](https://www.europeana.eu/en) is an online information portal that provides access to millions of cultural heritage resources by aggregating metadata from museums, libraries, and archives across Europe. Part of its records are encoded with AAT URIs, which Europeana uses to retrieve translations of the records [(source)](https://doi.org/10.7152/nasko.v5i1.15179).

- The paper [A Methodology for Semantic Enrichment of Cultural Heritage Images Using Artificial Intelligence Technologies](https://doi.org/10.3390/jimaging7080121) proposes a method for analyzing and enriching a collection of cultural images. Their test case is concerned with food in a cultural context, and thus includes a body of food-related images. They combine ontologies (including the AAT), which represent concepts consistently, with Computer Vision tools to enrich the image descriptions.

- ["Linking HBIM graphical and semantic information through the Getty AAT"](https://doi.org/10.1088/1757-899X/364/1/012100)


## Other Getty Vocabularies

### [ULAN](https://www.getty.edu/research/tools/vocabularies/ulan/about.html)
The AAT is one of several vocabularies developed by the Getty Research Institute. There are [more](https://www.getty.edu/research/tools/vocabularies/index.html), for example the Union List of Artist Names (ULAN). It contains records for artists and other agents in the cultural landscape, specifically the visual arts. This includes [names, relationships, and biographical information for makers and other people and corporate bodies](https://www.getty.edu/research/tools/vocabularies/ulan/about.html#scope).

Similar to the AAT, the ULAN consists of *records*, in this case each referring to a unique person, institution, or corporate body. A record can contain multiple *terms* that may capture [given names, pseudonyms, variant spellings, names in multiple languages, and names that have changed over time](https://www.getty.edu/research/tools/vocabularies/ulan/about.html#scope). A minimal record contains the following fields: record type, name, name source, display biography, nationality, role, birth date, and death date.

See an overview of the ULAN facets below. The information is taken from the [ULAN documentation](https://www.getty.edu/research/tools/vocabularies/ulan/about.html#scope).


- `PERSONS, ARTISTS` Facet: represents information about individuals involved in the creation or production of works of art or architecture (e.g., *Rembrandt van Rijn*)
- `CORPORATE BODIES` Facet: represents information about corporate bodies, defined as two or more people working together to create or produce art or architecture (e.g., *Adler and Sullivan*)
- `NON-ARTISTS` Facet: mostly represents patrons, who often had input in the creative process, and occasionally donors, sitters, and others whose names are required for indexing visual works but who are themselves not artists
- `UNKNOWN PEOPLE BY CULTURE` Facet: refers to the generic culture in which a work was created (e.g., *unknown Aztec*, or simply *Aztec*)
- `UNIDENTIFIED NAMED PEOPLE` Facet: people or corporate bodies whose identity is knowable but has not yet been thoroughly researched

Records may be linked through associative relationships, including professional relationships (e.g., *assistant of*, *influenced*) and familial relationships (e.g., *child of*, *sibling of*). See [here](https://www.getty.edu/vow/ULANFullDisplay?find=van+gogh&role=&nation=&page=1&subjectid=500115588) for an example of a ULAN record.

The ULAN is also available as Linked Open Data and can be accessed through the SPARQL endpoint. Most ULAN records are in English, but notes and descriptions are sometimes translated into other languages, including Dutch.
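
The label query from the SPARQL section can be pointed at the ULAN by swapping the namespace. The sketch below only builds the query string, which can then be passed to `sparql.setQuery(...)` as in the AAT example. It assumes that ULAN records live in the `http://vocab.getty.edu/ulan/` namespace and, like the AAT records queried above, carry `rdfs:label` literals. Because ULAN names are often not tagged as Dutch, the language filter is optional here. `getty_label_query` is a hypothetical helper, and `500115588` is the ID of the ULAN record linked above.

```python
GETTY_NAMESPACES = {
    "aat": "http://vocab.getty.edu/aat/",
    "ulan": "http://vocab.getty.edu/ulan/",
}


def getty_label_query(vocab: str, subject_id: str, languages=None) -> str:
    """Build a SPARQL query for the labels of one Getty record, optionally filtered by language."""
    if vocab not in GETTY_NAMESPACES:
        raise ValueError(f"unknown vocabulary: {vocab}")
    lang_filter = ""
    if languages:
        lang_list = ", ".join(f"'{lang}'" for lang in languages)
        lang_filter = f"\n    FILTER (LANG(?label) IN ({lang_list}))"
    return (
        f"PREFIX {vocab}: <{GETTY_NAMESPACES[vocab]}>\n"
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?label WHERE {\n"
        f"    {vocab}:{subject_id} rdfs:label ?label .{lang_filter}\n"
        "}"
    )


print(getty_label_query("ulan", "500115588"))
```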


### Iconography Authority

## AAT and Data Station SSH

The following properties are useful: 
- `Subject_ID` allows for a link back to the source 
- `Term_Language` (possibly only keep Dutch and English entries)
- `Term_ID` 
- `Term_Text` 
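
A minimal sketch of that selection, assuming the terms are available as rows (dicts) keyed by the four property names above, and that `Term_Language` holds codes such as `'nl'` and `'en'`; the actual encoding in the Getty downloads may differ. `select_terms` is a hypothetical helper, and the rows, `Term_ID`s, and the `Extra` field are placeholders.

```python
KEEP_FIELDS = ("Subject_ID", "Term_Language", "Term_ID", "Term_Text")


def select_terms(rows, languages=("nl", "en")):
    """Keep only the useful properties of term rows written in the given languages."""
    return [
        {name: row[name] for name in KEEP_FIELDS}
        for row in rows
        if row.get("Term_Language") in languages
    ]


# Hypothetical rows; IDs, the third language, and the Extra field are placeholders.
rows = [
    {"Subject_ID": "300008458", "Term_Language": "en", "Term_ID": "T1",
     "Term_Text": "cathedral cities", "Extra": "dropped"},
    {"Subject_ID": "300008458", "Term_Language": "nl", "Term_ID": "T2",
     "Term_Text": "domsteden", "Extra": "dropped"},
    {"Subject_ID": "300008458", "Term_Language": "xx", "Term_ID": "T3",
     "Term_Text": "placeholder", "Extra": "dropped"},
]

for term in select_terms(rows):
    print(term["Subject_ID"], term["Term_Language"], term["Term_Text"])
```

Keeping `Subject_ID` on every row preserves the link back to the source record.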




There are different `Record_Types` for nodes:
- Facet
- Hierarchy
- Concept
- GuideTerm (placeholder to create a level in the hierarchy)
- (ScopeNote)
- (ObsoleteSubject)
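
Facets, hierarchy names, and guide terms mainly give the tree its structure, so a keyword list would likely keep only `Concept` records. A sketch, assuming each record is a dict with a `Record_Type` field that uses the values listed above. `keyword_records` is a hypothetical helper, and the sample records are illustrative, with names borrowed from the outline earlier in this notebook.

```python
KEYWORD_TYPES = {"Concept"}


def keyword_records(records, keep_types=KEYWORD_TYPES):
    """Drop structural records (facets, hierarchy names, guide terms, obsolete subjects)."""
    return [record for record in records if record.get("Record_Type") in keep_types]


# Illustrative records; the guide term text is a placeholder.
records = [
    {"Record_Type": "Facet", "Term_Text": "Objects Facet"},
    {"Record_Type": "Hierarchy", "Term_Text": "Built Environment"},
    {"Record_Type": "GuideTerm", "Term_Text": "<hypothetical guide term>"},
    {"Record_Type": "Concept", "Term_Text": "cathedral cities"},
]

print([record["Term_Text"] for record in keyword_records(records)])
```

Passing a different `keep_types` set (for example including `"Hierarchy"`) makes it easy to test whether hierarchy names are useful for a given facet.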


![example of the use of different terms](img/008-complex-hierarchy.png)

[image source](https://vocab.getty.edu/doc/#Term)


### Challenges
- **Size**: the AAT contains more than 50,000 records that all refer to unique concepts. If we used the Dutch terms for every concept, we would have more than 50,000 keywords to choose from. Many of these terms may be topically irrelevant, or too specific or too broad for our purposes.
- **Lack of generalizability across facets**: each facet has its own internal logic. A path that leads to useful terms in one facet might not work for another facet.



### Recommendations 
- Curation per facet: establish a path to a layer that yields keywords at the desired level of granularity for each facet. Some facets may be excluded entirely (e.g. `Brand Names`)
- It may be more efficient to combine some of the facets with information from other thesauri/structured vocabularies 
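
The curation-per-facet idea can be sketched as a depth cut-off: for each facet, choose how many levels below the facet node the keywords should come from. In this sketch guide terms are treated as transparent, so they are skipped and do not count as a level, on the assumption that they only add structure. The toy tree reuses names from the outline above but is simplified, and the guide term is a placeholder; in practice the parent/child links would come from the Getty data (for example via the SPARQL endpoint). `child_nodes` and `nodes_at_depth` are hypothetical helpers.

```python
GUIDE = "<hypothetical guide term>"

# Simplified toy hierarchy: parent -> list of children.
tree = {
    "Objects Facet": ["Built Environment", "Furnishings and Equipment"],
    "Built Environment": ["Settlements and Landscapes", "Single Built Works"],
    "Furnishings and Equipment": ["Furnishings", "Costume"],
    "Furnishings": [GUIDE],
    GUIDE: ["chairs"],
}


def child_nodes(tree, node, guide_terms):
    """Direct children of a node, looking through guide terms to their children."""
    result = []
    for child in tree.get(node, []):
        if child in guide_terms:
            result.extend(child_nodes(tree, child, guide_terms))
        else:
            result.append(child)
    return result


def nodes_at_depth(tree, root, depth, guide_terms=frozenset()):
    """Return all nodes exactly `depth` levels below `root` (guide terms are not counted)."""
    level = [root]
    for _ in range(depth):
        level = [child for node in level for child in child_nodes(tree, node, guide_terms)]
    return level


print(nodes_at_depth(tree, "Objects Facet", 2))
print(nodes_at_depth(tree, "Objects Facet", 3, guide_terms={GUIDE}))
```

A per-facet configuration would then simply map each facet to its chosen depth.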

### Let's look at the facets 

TODO: \
Add examples \
Add conclusions 


[`Associated concepts`](https://www.getty.edu/vow/AATHierarchy?find=beauty&logic=AND&note=&subjectid=300179462)
- e.g. philosophical concepts, scientific concepts, etc.
+ leaf nodes useful
± hierarchy names sometimes useful


[`Physical attributes`](https://www.getty.edu/vow/AATHierarchy?find=beauty&logic=AND&note=&page=1&subjectid=300264087)
- leaf nodes not that useful (too specific)
± hierarchy names somewhat useful


[`Styles and Periods`](https://www.getty.edu/vow/AATHierarchy?find=beauty&logic=AND&note=&subjectid=300015646)
+ nodes under the hierarchy nodes useful
- hierarchy nodes and guide terms not useful



[`Agents`](https://www.getty.edu/vow/AATHierarchy?find=beauty&logic=AND&note=&page=1&subjectid=300264089)
- categories of people/organizations/animals/plants
- too detailed?
- leaf nodes, or the nodes one level above the leaves, might be useful
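
Both candidate sets for the Agents facet (the leaf nodes, and the nodes one level above them) are easy to derive from parent-to-children links. A sketch on a hypothetical toy tree: the node names only illustrate the shape of the facet and are not actual AAT records, and `leaves_and_parents` is a hypothetical helper.

```python
# Hypothetical toy tree: parent -> list of children.
tree = {
    "Agents Facet": ["People"],
    "People": ["artists", "landscape architects"],
    "artists": ["printmakers", "painters"],
}


def leaves_and_parents(tree, root):
    """Return (leaf nodes, nodes with at least one leaf child) below root, in depth-first order."""
    leaves, parents = [], []

    def walk(node):
        kids = tree.get(node, [])
        if not kids:
            leaves.append(node)
            return
        # A node counts as "one level above the leaves" if any of its children is a leaf.
        if any(not tree.get(kid) for kid in kids):
            parents.append(node)
        for kid in kids:
            walk(kid)

    walk(root)
    return leaves, parents


leaves, parents = leaves_and_parents(tree, "Agents Facet")
print(leaves)
print(parents)
```

Note that in an unbalanced tree a node can be both a leaf's parent and a grandparent (here `People`), so the "one level above the leaves" set may mix granularities.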


[`Activities`](https://www.getty.edu/vow/AATHierarchy?find=beauty&logic=AND&note=&subjectid=300264090)
- does not seem really useful to me 
- a lot of verbs that you probably wouldn't use to describe a dataset 

[`Materials`](https://www.getty.edu/vow/AATHierarchy?find=beauty&logic=AND&note=&subjectid=300264091)
- hierarchy and guide term nodes are not useful
- not sure if the leaf nodes are interesting 

[`Objects`](https://www.getty.edu/vow/AATHierarchy?find=beauty&logic=AND&note=&subjectid=300264092)
- largest category 
- hierarchy nodes useful
- leaf nodes useful (when they sit directly under hierarchy nodes)
- deeper leaf nodes may be too specific (e.g. types of comic operas)


[`Brand Names`](https://www.getty.edu/vow/AATHierarchy?find=beauty&logic=AND&note=&subjectid=300343372)
- not useful for our purpose; this facet was introduced by the conservation community for materials, processes, and objects that don't have a generic name but rather a name under trademark protection.



--> I'd say `Associated concepts`, `Styles and Periods`, and `Objects` are the facets with the concepts that could be most useful as keywords for finding/describing datasets.  