In [29]:
# imports, SPARQL prefixes, and function definitions

from pprint import pprint
from SPARQLWrapper import SPARQLWrapper, JSON, TURTLE, CSV 

prefixes = '''
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX aat: <http://vocab.getty.edu/aat/>
PREFIX gvp: <http://vocab.getty.edu/ontology#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX skosxl: <http://www.w3.org/2008/05/skos-xl#>
PREFIX xl: <http://www.w3.org/2008/05/skos-xl#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX ulan: <http://vocab.getty.edu/ulan/>
'''


def sparql_query(query, format):
    """Run `query` (with the shared prefixes prepended) against the Getty endpoint."""
    formats = {"json": JSON, "turtle": TURTLE, "csv": CSV}
    return_format = formats[format]
    endpoint = "http://vocab.getty.edu/sparql"
    sparql = SPARQLWrapper(endpoint)

    sparql.setQuery(prefixes + query)
    sparql.setReturnFormat(return_format)

    # JSON results are parsed into a dict; CSV comes back as raw bytes
    results = sparql.query().convert()
    return results


def print_sparql_results(results):
    # expects JSON results; prints every binding (row), one after another
    for row in results["results"]["bindings"]:
        pprint(row)


ULAN is organized according to the following Facets:
* Persons, Artists: http://vocab.getty.edu/ulan/500000002 (total of 963.716 concepts)
* Corporate Bodies: http://vocab.getty.edu/ulan/500000003
* Unknown Person by Culture: http://vocab.getty.edu/ulan/500125081
* Non-Artists: http://vocab.getty.edu/ulan/500299802
* Unidentified Named People and Firms: http://vocab.getty.edu/ulan/500355043

In the following queries, I explore how to retrieve the concepts that belong to one of these Facets.

In [36]:
# query the ULAN ConceptScheme for members of the ulan:500000002 "Persons, Artists" facet

ulan_concepts_q = '''
SELECT *
WHERE {
    ?concept skos:inScheme ulan: ;
             gvp:broader ulan:500000002 ;
#            rdfs:label ?label .
             xl:prefLabel ?label .
    ?label xl:literalForm ?label_literal .
}
ORDER BY ?concept
LIMIT 100
'''

ulan_concepts = sparql_query(query=ulan_concepts_q, format='csv')
# pprint(ulan_concepts)




In [37]:

with open('ulan-artists.csv', 'wb') as csv_f:
    csv_f.write(ulan_concepts)

# Labels in ULAN

In ULAN, concept labels are available under both `rdfs:label` and `xl:prefLabel`. However, each of them has some issues.

**rdfs:label issues**
* often there is more than one `rdfs:label` per concept, so which one should be chosen?
* labels are often wrapped in literal quotation marks; this can be addressed by processing the labels in SPARQL
* there is no consistent pattern: although the quotation marks seem to enclose the "Surname, Name" pattern, **not all concepts have labels like this**, e.g. ulan:500000009


example:
```
concept,rdfs:label
http://vocab.getty.edu/ulan/500000004,Samuel John Carter
http://vocab.getty.edu/ulan/500000004,"Carter, Samuel (sr.)"
http://vocab.getty.edu/ulan/500000004,"Carter, Samuel John"
http://vocab.getty.edu/ulan/500000005,Giovanni Battista Merano
http://vocab.getty.edu/ulan/500000005,"Merano, Giovanni Battista"
http://vocab.getty.edu/ulan/500000006,"Meyer, Hannes"
http://vocab.getty.edu/ulan/500000006,Hannes Meyer
http://vocab.getty.edu/ulan/500000007,Augustinus Corvus
http://vocab.getty.edu/ulan/500000007,"Cordus, Augustus"
http://vocab.getty.edu/ulan/500000007,"Corvus, Augustinus"
http://vocab.getty.edu/ulan/500000009,Francesco Morelli
http://vocab.getty.edu/ulan/500000009,Francisco Morelli
```
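One way to handle the multiple labels and the quotation marks outside SPARQL is to parse the CSV with Python's `csv` module. This is only a minimal sketch: it assumes the quotation marks are ordinary CSV field quoting (which the parser strips), that the label column is named `label`, and it uses a hypothetical rule, `pick_label`, that prefers the first "Surname, Name" form when one exists.

```python
import csv
import io
from collections import defaultdict

# A few rows copied from the example above; the header name "label" is an assumption.
sample_csv = '''concept,label
http://vocab.getty.edu/ulan/500000004,Samuel John Carter
http://vocab.getty.edu/ulan/500000004,"Carter, Samuel (sr.)"
http://vocab.getty.edu/ulan/500000004,"Carter, Samuel John"
http://vocab.getty.edu/ulan/500000009,Francesco Morelli
http://vocab.getty.edu/ulan/500000009,Francisco Morelli
'''


def group_labels(csv_text):
    """Map each concept URI to the list of its labels, in file order."""
    labels = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        labels[row["concept"]].append(row["label"])
    return dict(labels)


def pick_label(labels):
    """Hypothetical rule: first label containing a comma, else the first label."""
    for label in labels:
        if "," in label:
            return label
    return labels[0]


grouped = group_labels(sample_csv)
chosen = {concept: pick_label(labels) for concept, labels in grouped.items()}
pprint_ready = sorted(chosen.items())  # stable order for inspection
```

The rule is deliberately simple; for concepts such as ulan:500000009, which have no "Surname, Name" label, it falls back to whichever label comes first.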

**xl:prefLabel issues**
* although many labels have a Dutch spelling (term URIs ending in `-nl`), not all of them do
* often there is more than one spelling, sometimes just one


examples:

```
concept,label,label_literal
http://vocab.getty.edu/ulan/500000004,http://vocab.getty.edu/ulan/term/1500000004-nl,"Carter, Samuel John"
http://vocab.getty.edu/ulan/500000005,http://vocab.getty.edu/ulan/term/1500000005-nl,"Merano, Giovanni Battista"
http://vocab.getty.edu/ulan/500000006,http://vocab.getty.edu/ulan/term/1500000006,"Meyer, Hannes"
http://vocab.getty.edu/ulan/500000007,http://vocab.getty.edu/ulan/term/1500000007-nl,"Corvus, Augustinus"
http://vocab.getty.edu/ulan/500000009,http://vocab.getty.edu/ulan/term/1500000012,"Morelli, Francesco"
http://vocab.getty.edu/ulan/500000010,http://vocab.getty.edu/ulan/term/1500000013,"Réattu, Jacques"
http://vocab.getty.edu/ulan/500000010,http://vocab.getty.edu/ulan/term/1500000014-nl,"Reattu, Jacques"
http://vocab.getty.edu/ulan/500000011,http://vocab.getty.edu/ulan/term/1500000015-nl,"Pegram, Frederick"
http://vocab.getty.edu/ulan/500000012,http://vocab.getty.edu/ulan/term/1500280905,Paul Richards
http://vocab.getty.edu/ulan/500000013,http://vocab.getty.edu/ulan/term/1500000017-nl,"Royle, Stanley"
http://vocab.getty.edu/ulan/500000014,http://vocab.getty.edu/ulan/term/1500000018-nl,"Russell, Henrietta"
```
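To reduce these rows to one label per concept, one option is to look at the suffix of the term URI. The sketch below is an assumption-laden illustration, not a rule taken from ULAN: it treats a term URI ending in `-nl` as the Dutch spelling, and the hypothetical helper `one_label_per_concept` prefers a term without that suffix when both exist.

```python
import csv
import io

# Rows copied from the example above.
sample_csv = '''concept,label,label_literal
http://vocab.getty.edu/ulan/500000006,http://vocab.getty.edu/ulan/term/1500000006,"Meyer, Hannes"
http://vocab.getty.edu/ulan/500000010,http://vocab.getty.edu/ulan/term/1500000013,"Réattu, Jacques"
http://vocab.getty.edu/ulan/500000010,http://vocab.getty.edu/ulan/term/1500000014-nl,"Reattu, Jacques"
http://vocab.getty.edu/ulan/500000011,http://vocab.getty.edu/ulan/term/1500000015-nl,"Pegram, Frederick"
'''


def one_label_per_concept(csv_text):
    """Keep one literal per concept, preferring term URIs without the '-nl' suffix."""
    chosen = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        concept = row["concept"]
        is_nl = row["label"].endswith("-nl")
        # Store (is_nl, literal); a stored '-nl' entry is replaced by a plain one.
        if concept not in chosen or (chosen[concept][0] and not is_nl):
            chosen[concept] = (is_nl, row["label_literal"])
    return {concept: literal for concept, (_, literal) in chosen.items()}


labels = one_label_per_concept(sample_csv)
```

Concepts that only have a `-nl` term, such as ulan:500000011, keep that term's literal.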

Concepts from the following Facets are returned when we query for concepts that have `gvp:broader FACET-URI`:

  

- `PERSONS, ARTISTS` Facet: ulan:500000002
- `CORPORATE BODIES` Facet: ulan:500000003
- `NON-ARTISTS` Facet (ulan:500299802): mostly represents patrons, who often had input in the creative process, and occasionally donors, sitters, and others whose names are required for indexing visual works but who are not themselves artists.
- `UNKNOWN PEOPLE BY CULTURE` Facet (ulan:500125081): refers to the generic culture in which a work was created (e.g. *unknown Aztec*, or simply *Aztec*)
- `UNIDENTIFIED NAMED PEOPLE` Facet (ulan:500355043): people or corporate bodies whose identity is knowable but has not yet been thoroughly researched
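Since the only thing that changes between Facets is the URI after `gvp:broader`, the query can be generated from a small lookup table. The following is a sketch under assumptions: the dictionary keys and the `facet_query` helper are hypothetical names introduced here, while the Facet IDs are the ones listed in this notebook.

```python
# Facet IDs as listed in this notebook; the keys are hypothetical short names.
FACETS = {
    "persons_artists": "500000002",
    "corporate_bodies": "500000003",
    "unknown_by_culture": "500125081",
    "non_artists": "500299802",
    "unidentified_named": "500355043",
}


def facet_query(facet_key, limit=100):
    """Build the SELECT query (without prefixes) for the concepts of one Facet."""
    facet_id = FACETS[facet_key]
    return f'''
SELECT ?concept ?label ?label_literal
WHERE {{
    ?concept skos:inScheme ulan: ;
             gvp:broader ulan:{facet_id} ;
             xl:prefLabel ?label .
    ?label xl:literalForm ?label_literal .
}}
ORDER BY ?concept
LIMIT {limit}
'''


query = facet_query("corporate_bodies", limit=10)
```

The resulting string can then be passed to the `sparql_query` function defined at the top of the notebook, for example `sparql_query(query=query, format='csv')`.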
