# DraCor Postdata Middle-Ware Tryouts

Prerequisites: a running Stardog instance with a database `PD_KG` into which the POSTDATA Knowledge Graph has been imported.
I used this repo: https://github.com/dh-network/stardog-docker-compose. The data was supplied by POSTDATA and imported manually into the database as described in the repo.

To get started with Stardog, I used the pystardog tutorial notebook: https://github.com/stardog-union/pystardog/blob/develop/notebooks/tutorial.ipynb

In [4]:
#use the official stardog package: https://pystardog.readthedocs.io/en/latest/
import stardog

In [15]:
# libraries used in the pystardog notebook to render the results as pandas DataFrames (may not be needed later)
import io
import pandas as pd

## Connect to Stardog

In [6]:
# set sparql endpoint (stardog), user name and password
database_name = "PD_KG"

usr = "admin"
pwd = "admin"

endpoint = "http://localhost:5820"

In [7]:
#as in the pystardog notebook https://github.com/stardog-union/pystardog/blob/develop/notebooks/tutorial.ipynb
connection_details = {
  'endpoint': endpoint,
  'username': usr,
  'password': pwd
}

In [8]:
conn = stardog.Connection(database_name, **connection_details)

## Simple Test

In [10]:
query = """
SELECT * WHERE {
  ?s ?p ?o
}
LIMIT 10
"""

In [11]:
csv_results = conn.select(query, content_type='text/csv')
df = pd.read_csv(io.BytesIO(csv_results))
df.head()

| | s | p | o |
|---|---|---|---|
| 0 | http://postdata.linhd.uned.es/resource/a_unknown | http://postdata.linhd.uned.es/ontology/postdat... | UNKNOWN |
| 1 | http://postdata.linhd.uned.es/resource/st_1_te... | http://postdata.linhd.uned.es/ontology/postdat... | http://postdata.linhd.uned.es/kos/ |
| 2 | http://postdata.linhd.uned.es/resource/st_1_te... | http://postdata.linhd.uned.es/ontology/postdat... | http://postdata.linhd.uned.es/kos/ |
| 3 | http://postdata.linhd.uned.es/resource/st_1_te... | http://postdata.linhd.uned.es/ontology/postdat... | http://postdata.linhd.uned.es/kos/ |
| 4 | http://postdata.linhd.uned.es/resource/st_1_lu... | http://postdata.linhd.uned.es/ontology/postdat... | http://postdata.linhd.uned.es/kos/ |


In [12]:
#to get a dictionary instead
#query_results = conn.select(query)
#type(query_results)
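The dictionary returned by `conn.select(query)` follows the SPARQL 1.1 Query Results JSON format (`head.vars`, `results.bindings`). As a sketch, it could also be turned into a DataFrame directly, without the CSV round-trip; `bindings_to_df` is a hypothetical helper, not part of pystardog, and it keeps every value as a plain string (no datatype conversion).

```python
import pandas as pd


def bindings_to_df(results: dict) -> pd.DataFrame:
    """Convert a SPARQL JSON result (as returned by `conn.select(query)`) to a DataFrame.

    Only the "value" of each term is kept (as a string); variables that are
    unbound in a row (e.g. from an OPTIONAL block) become missing values.
    """
    columns = results["head"]["vars"]
    rows = [
        {var: binding[var]["value"] if var in binding else None for var in columns}
        for binding in results["results"]["bindings"]
    ]
    # passing `columns` keeps the column order and works for empty results, too
    return pd.DataFrame(rows, columns=columns)


# usage (needs the connection): df = bindings_to_df(conn.select(query))
```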

In [14]:
# to close the connection
#conn.close()

## Helper Functions
Set up functions to query Stardog and parse the results (e.g. as a pandas DataFrame).

In [35]:
def sparql(query, parse=False):
    """
    Helper function to send a SPARQL query to Stardog (requires an established connection `conn`).
    The optional parameter `parse` can be used to get a pandas DataFrame back.
    """
    if parse:
        csv_results = conn.select(query, content_type='text/csv')
        df = pd.read_csv(io.BytesIO(csv_results))
        return df
    else:
        results = conn.select(query)
        return results

In [50]:
# inject a URI into a query at the <$> placeholder
def replace_placeholder(query, uri):
    """Replace the placeholder `$` in a query with `uri`.

    Note: every `$` in the query is replaced.
    """
    placeholder = "$"
    return query.replace(placeholder, uri)
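`replace_placeholder` substitutes every `$` in the query string. SPARQL also allows variables written as `$var`, so a query using that syntax would be corrupted. A slightly stricter variant, sketched here under the assumption that the placeholder is always written as `<$>` (as in the queries below), replaces only that token and rejects URIs containing characters that SPARQL does not allow inside an IRI reference. `inject_uri` is a hypothetical name.

```python
import re

# Characters not allowed inside a SPARQL IRIREF (<...>): <>"{}|^`\ and
# the control characters/space #x00-#x20.
_INVALID_IRI_CHARS = re.compile(r'[<>"{}|^`\\\x00-\x20]')


def inject_uri(query: str, uri: str, placeholder: str = "<$>") -> str:
    """Replace only the `<$>` placeholder with `<uri>`; `$var` variables stay intact."""
    if placeholder not in query:
        raise ValueError(f"query does not contain the placeholder {placeholder}")
    if _INVALID_IRI_CHARS.search(uri):
        raise ValueError(f"not usable as a SPARQL IRI: {uri!r}")
    return query.replace(placeholder, f"<{uri}>")


example_query = "SELECT ?name WHERE { <$> <http://postdata.linhd.uned.es/ontology/postdata-core#name> $name }"
print(inject_uri(example_query, "http://postdata.linhd.uned.es/resource/p_juana-ines-de-la-cruz"))
```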

## Query the Postdata Knowledge Graph
I played around in Stardog Studio and came up with some queries. These need to be tested here and then eventually wrapped in functions to be used in an API.

### poeticWorks

#### List all works

A "poeticWork" is of the class `http://postdata.linhd.uned.es/ontology/postdata-core#PoeticWork`.

In [29]:
query = """
PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>

SELECT ?work ?title WHERE {
    ?work a pdc:PoeticWork ;
          pdc:title ?title.
}
LIMIT 1000000
"""
# by default Stardog returns only 1,000 results; the LIMIT needs to exceed the expected number of works
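Instead of choosing a LIMIT large enough for the expected number of results, the results could be fetched page by page with `LIMIT`/`OFFSET`. A minimal sketch, assuming the query has no LIMIT of its own and contains an `ORDER BY` (SPARQL does not guarantee a stable result order otherwise); `select_all` is a hypothetical helper, and `run_query` stands for something like `lambda q: sparql(q, parse=True)`.

```python
from typing import Callable

import pandas as pd


def select_all(query: str, run_query: Callable[[str], pd.DataFrame],
               page_size: int = 1000) -> pd.DataFrame:
    """Fetch all results of a SELECT query page by page via LIMIT/OFFSET.

    `query` must not contain a LIMIT/OFFSET of its own and should contain an
    ORDER BY clause, otherwise the pages are not guaranteed to be consistent.
    `run_query` sends a query and returns a DataFrame.
    """
    pages = []
    offset = 0
    while True:
        page = run_query(f"{query}\nLIMIT {page_size}\nOFFSET {offset}")
        if len(page) > 0 or not pages:  # keep at least one page (for the columns)
            pages.append(page)
        if len(page) < page_size:  # a short (or empty) page is the last one
            break
        offset += page_size
    return pd.concat(pages, ignore_index=True)


# usage (needs the connection):
# works = select_all(
#     """PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>
#     SELECT ?work ?title WHERE { ?work a pdc:PoeticWork ; pdc:title ?title . }
#     ORDER BY ?work""",
#     lambda q: sparql(q, parse=True),
# )
```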

In [36]:
#use the helper function to send the query
results = sparql(query,parse=True)

In [37]:
type(results)

pandas.core.frame.DataFrame

In [39]:
len(results)

1003

In [40]:
results.head()

| | work | title |
|---|---|---|
| 0 | http://postdata.linhd.uned.es/resource/pw_juan... | Sabrás, querido Fabio |
| 1 | http://postdata.linhd.uned.es/resource/pw_juan... | Divino dueño mío |
| 2 | http://postdata.linhd.uned.es/resource/pw_jaci... | En corros aquí y allí, |
| 3 | http://postdata.linhd.uned.es/resource/pw_pedr... | Al Marqués de Priego |
| 4 | http://postdata.linhd.uned.es/resource/pw_vice... | Part of: |


In [41]:
# a simple uri of a work
results["work"][0]

'http://postdata.linhd.uned.es/resource/pw_juana-ines-de-la-cruz_sabras-querido-fabio'

#### Works, Titles and Authors (Creators in the WorkConception)
A "PoeticWork" is linked to a "WorkConception" `<http://postdata.linhd.uned.es/ontology/postdata-core#WorkConception>` via the property "wasInitiatedBy" `<http://postdata.linhd.uned.es/ontology/postdata-core#wasInitiatedBy>`.

The property "hasAgentRole" `<http://postdata.linhd.uned.es/ontology/postdata-core#hasAgentRole>` connects the "WorkConception" to an "AgentRole" `<http://postdata.linhd.uned.es/ontology/postdata-core#AgentRole>`.

An "AgentRole" is classified with the property "roleFunction" `<http://postdata.linhd.uned.es/ontology/postdata-core#roleFunction>`, which links to the KOS, e.g. `<http://postdata.linhd.uned.es/kos/Creator>` (this one should be used for the author).

The author is connected to the "AgentRole" with the property "hasAgent" `<http://postdata.linhd.uned.es/ontology/postdata-core#hasAgent>`.

The "Agent" is a `<http://postdata.linhd.uned.es/ontology/postdata-core#Person>`; its name is a literal given via `pdc:name`.

In [45]:
query = """
PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>

SELECT ?Work ?Title ?Agent ?Name WHERE {
    ?Work a pdc:PoeticWork ;
        pdc:title ?Title .
    
    OPTIONAL { 
        ?Work pdc:wasInitiatedBy ?WorkConception .
        
        ?WorkConception pdc:hasAgentRole ?AgentRole .
        
        ?AgentRole pdc:roleFunction <http://postdata.linhd.uned.es/kos/Creator> ; 
                   pdc:hasAgent ?Agent .
        
        ?Agent pdc:name ?Name .
    }
}
LIMIT 1000000
"""

In [46]:
results = sparql(query,parse=True)
results.head()

| | Work | Title | Agent | Name |
|---|---|---|---|---|
| 0 | http://postdata.linhd.uned.es/resource/pw_juan... | Sabrás, querido Fabio | http://postdata.linhd.uned.es/resource/p_juana... | Juana Inés de la Cruz |
| 1 | http://postdata.linhd.uned.es/resource/pw_juan... | Divino dueño mío | http://postdata.linhd.uned.es/resource/p_juana... | Juana Inés de la Cruz |
| 2 | http://postdata.linhd.uned.es/resource/pw_jaci... | En corros aquí y allí, | http://postdata.linhd.uned.es/resource/p_jacin... | Jacinto Polo de Medina |
| 3 | http://postdata.linhd.uned.es/resource/pw_pedr... | Al Marqués de Priego | http://postdata.linhd.uned.es/resource/p_pedro... | Pedro Fernández Marañón |
| 4 | http://postdata.linhd.uned.es/resource/pw_vice... | Part of: | http://postdata.linhd.uned.es/resource/p_vicen... | Vicente Wenceslao Querolt |


In [47]:
len(results)

1003

Apparently, no poem has more than one author: the query returns 1003 rows, the same number as there are works. But this could happen (a work would then appear in several rows), so the query might need to be adapted.
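One way to handle several creators per work, sketched on the pandas side (an alternative would be `GROUP_CONCAT` in the SPARQL query itself): group the rows by work and collect the agents and names into lists. `collapse_authors` is a hypothetical helper; it assumes the column names of the query above, and drops names/agents left unbound by the `OPTIONAL` block.

```python
import pandas as pd


def collapse_authors(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per (Work, Title) with lists of author URIs and names."""
    return (
        df.groupby(["Work", "Title"], sort=False)  # keep the original order of works
        .agg(
            # dropna() removes values left unbound by the OPTIONAL block
            Agents=("Agent", lambda s: s.dropna().tolist()),
            Names=("Name", lambda s: s.dropna().tolist()),
        )
        .reset_index()
    )


# usage: works_with_authors = collapse_authors(results)
```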

#### Author(s) of a given PoeticWork

An adaptation of the query above that returns the authors of a given PoeticWork.

In [54]:
query = """
PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>

SELECT ?Agent ?Name WHERE {
    <$> a pdc:PoeticWork ;
        pdc:wasInitiatedBy ?WorkConception .
        
        ?WorkConception pdc:hasAgentRole ?AgentRole .
        
        ?AgentRole pdc:roleFunction <http://postdata.linhd.uned.es/kos/Creator> ; 
                   pdc:hasAgent ?Agent .
        
        OPTIONAL {
            ?Agent pdc:name ?Name .
        }
}
"""

In [56]:
uri = "http://postdata.linhd.uned.es/resource/pw_juana-ines-de-la-cruz_sabras-querido-fabio"
print(replace_placeholder(query,uri))
results = sparql(replace_placeholder(query,uri))


PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>

SELECT ?Agent ?Name WHERE {
    <http://postdata.linhd.uned.es/resource/pw_juana-ines-de-la-cruz_sabras-querido-fabio> a pdc:PoeticWork ;
        pdc:wasInitiatedBy ?WorkConception .
        
        ?WorkConception pdc:hasAgentRole ?AgentRole .
        
        ?AgentRole pdc:roleFunction <http://postdata.linhd.uned.es/kos/Creator> ; 
                   pdc:hasAgent ?Agent .
        
        OPTIONAL {
            ?Agent pdc:name ?Name .
        }
}



In [57]:
results

{'head': {'vars': ['Agent', 'Name']},
 'results': {'bindings': [{'Agent': {'type': 'uri',
     'value': 'http://postdata.linhd.uned.es/resource/p_juana-ines-de-la-cruz'},
    'Name': {'type': 'literal', 'value': 'Juana Inés de la Cruz'}}]}}

An endpoint returning the creators of a poem, `/corpora/{corpus}/poems/{poem}/authors`, could return something along the lines of the `authors` array of the DraCor API endpoint `https://dracor.org/api/corpora/{corpus}/play/{play}`:

```
"authors": [
    {
      "name": "Lessing, Gotthold Ephraim",
      "fullname": "Gotthold Ephraim Lessing",
      "shortname": "Lessing",
      "refs": [
        {
          "ref": "Q34628",
          "type": "wikidata"
        },
        {
          "ref": "118572121",
          "type": "pnd"
        }
      ]
    }
  ]
```

In [93]:
def get_authors_of_poem(poem_uri: str) -> list:
    """Get the authors of a poem.

    Returns the agents in an AgentRole with the roleFunction "Creator".

    Args:
        poem_uri (str): URI of a PoeticWork
    Returns:
        list: List of dictionaries containing the author's name and id (= URI).
    """

    query = """
        PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>

        SELECT ?Agent ?Name WHERE {
            <$> a pdc:PoeticWork ;
                pdc:wasInitiatedBy ?WorkConception .

            ?WorkConception pdc:hasAgentRole ?AgentRole .

            ?AgentRole pdc:roleFunction <http://postdata.linhd.uned.es/kos/Creator> ;
                       pdc:hasAgent ?Agent .

            OPTIONAL {
                ?Agent pdc:name ?Name .
            }
        }
    """
    sparql_results = sparql(replace_placeholder(query, poem_uri))

    authors = []
    for binding in sparql_results["results"]["bindings"]:
        authors.append({
            # ?Name is OPTIONAL in the query, so it can be missing from a binding
            "name": binding.get("Name", {}).get("value"),
            "id": binding["Agent"]["value"],
        })

    return authors

In [94]:
get_authors_of_poem(uri)

[{'name': 'Juana Inés de la Cruz',
  'id': 'http://postdata.linhd.uned.es/resource/p_juana-ines-de-la-cruz'}]
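A sketch of how this list could be mapped onto the DraCor-style `authors` array shown above. POSTDATA only provides a single name literal and the agent URI, so the name is used for both `name` and `fullname` (DraCor's "Surname, Forename" form cannot be derived reliably), `shortname` is left out, and the URI goes into `refs` with a hypothetical ref type `"postdata"` that DraCor does not define. `to_dracor_authors` is a hypothetical name.

```python
import json


def to_dracor_authors(authors: list) -> list:
    """Turn [{"name": ..., "id": ...}] into DraCor-style author objects (sketch)."""
    return [
        {
            # POSTDATA has one name literal; DraCor's "name" would be "Surname, Forename"
            "name": author["name"],
            "fullname": author["name"],
            # "postdata" is a hypothetical ref type, not one defined by DraCor
            "refs": [{"ref": author["id"], "type": "postdata"}],
        }
        for author in authors
    ]


example_authors = [{"name": "Juana Inés de la Cruz",
                    "id": "http://postdata.linhd.uned.es/resource/p_juana-ines-de-la-cruz"}]
print(json.dumps({"authors": to_dracor_authors(example_authors)}, ensure_ascii=False, indent=2))
```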

#### Information on a single corpus (the whole graph)
DraCor's endpoint `/corpora` returns information on all corpora in the database. Setting the parameter `include=metrics` adds the statistics that are displayed. See https://dracor.org/doc/api#/public/list-corpora. The data on a single corpus looks like this:

```
{
    "licence": "CC BY-NC 3.0",
    "licenceUrl": "https://creativecommons.org/licenses/by-nc/3.0/deed.en_US",
    "description": "Derived from the [Folger Shakespeare Library](https://shakespeare.folger.edu/). Enhancements documented in our [README at GitHub](https://github.com/dracor-org/shakedracor).",
    "uri": "https://dracor.org/api/corpora/shake",
    "title": "Shakespeare Drama Corpus",
    "name": "shake",
    "acronym": "ShakeDraCor",
    "metrics": {
      "plays": 37,
      "characters": 1433,
      "male": 797,
      "female": 116,
      "text": 37,
      "sp": 31066,
      "stage": 10450,
      "wordcount": {
        "text": 908286,
        "sp": 876744,
        "stage": 41230
      },
      "updated": "2022-07-02T23:36:24.109+02:00"
    },
    "repository": "https://github.com/dracor-org/shakedracor"
  }
```
We cannot really list "corpora" in the POSTDATA Knowledge Graph because it contains only one, but we still have to implement an endpoint that provides information for the front page.

We can at least provide some metrics, e.g.

```
"metrics": {
      "authors" : 1,
      "poems": 1,
      "stanzas": 1,
      "verses": 1,
      "words" : 1
      }
```
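A sketch of how the `metrics` object could be filled once the count functions exist. It assumes each metric is computed by a zero-argument function returning an int (such as `count_poeticWorks` and `count_Stanzas` below), so metrics still missing (authors, verses, words) can be added later as further entries. `corpus_metrics` is a hypothetical name.

```python
from typing import Callable, Dict


def corpus_metrics(counters: Dict[str, Callable[[], int]]) -> dict:
    """Build the "metrics" object by calling one count function per metric."""
    return {"metrics": {key: count() for key, count in counters.items()}}


# usage once the count functions below are defined:
# corpus_metrics({"poems": count_poeticWorks, "stanzas": count_Stanzas})
```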

In [105]:
def count_poeticWorks(corpus=None) -> int:
    """Count poeticWorks in a corpus.
    
    Returns:
        int: Number of poetic works/poems.
        
    TODO: handle multiple corpora
    """
    
    query = """
    PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>
    
    SELECT (COUNT(?poeticWork) AS ?count) WHERE {
    ?poeticWork a pdc:PoeticWork .
    } 
    LIMIT 1000000
    """
    
    sparql_results = sparql(query)
    work_count = int(sparql_results["results"]["bindings"][0]["count"]["value"])
    
    return work_count

In [106]:
count_poeticWorks()

1003

In [113]:
def count_Stanzas(corpus=None) -> int:
    """Count stanzas in a corpus.
    
    Returns:
        int: Number of stanzas.
        
    TODO: handle multiple corpora
    """
    
    query = """
    PREFIX pdp: <http://postdata.linhd.uned.es/ontology/postdata-poeticAnalysis#>
    
    SELECT (COUNT(?Stanza) AS ?count) WHERE {
    ?Stanza a pdp:Stanza .
    } 
    LIMIT 1000000
    """
    
    sparql_results = sparql(query)
    stanza_count = int(sparql_results["results"]["bindings"][0]["count"]["value"])
    return stanza_count

In [114]:
count_Stanzas()

5109

In [None]:
# TODO: general function to count entities of a given type in the graph.
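A sketch of such a general function: it takes the class URI instead of hard-coding it, using the same `COUNT` query pattern as `count_poeticWorks` and `count_Stanzas`. `count_instances` is a hypothetical name; `run_query` is assumed to return the SPARQL JSON result, like the `sparql` helper above with `parse=False`.

```python
from typing import Callable


def count_instances(class_uri: str, run_query: Callable[[str], dict]) -> int:
    """Count the instances of `class_uri` in the graph."""
    query = f"""
    SELECT (COUNT(?entity) AS ?count) WHERE {{
        ?entity a <{class_uri}> .
    }}
    """
    results = run_query(query)
    return int(results["results"]["bindings"][0]["count"]["value"])


# usage (needs the connection); should match count_poeticWorks() and count_Stanzas():
# count_instances("http://postdata.linhd.uned.es/ontology/postdata-core#PoeticWork", sparql)
# count_instances("http://postdata.linhd.uned.es/ontology/postdata-poeticAnalysis#Stanza", sparql)
```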

In [None]:
# TODO: count verses and words.

## DraCor-specific boilerplate

For a single play, the DraCor API returns the following (`https://dracor.org/api/corpora/ger/play/lessing-emilia-galotti`):
```
{
  "id": "ger000088",
  "name": "lessing-emilia-galotti",
  "corpus": "ger",
  "title": "Emilia Galotti",
  "author": {
    "name": "Lessing, Gotthold Ephraim",
    "warning": "The single author property is deprecated. Use the array of 'authors' instead!"
  },
  "authors": [
    {
      "name": "Lessing, Gotthold Ephraim",
      "fullname": "Gotthold Ephraim Lessing",
      "shortname": "Lessing",
      "refs": [
        {
          "ref": "Q34628",
          "type": "wikidata"
        },
        {
          "ref": "118572121",
          "type": "pnd"
        }
      ]
    }
  ],
  "genre": "Tragedy",
  "libretto": false,
  "allInSegment": 30,
  "allInIndex": 0.6976744186046512,
  "cast": [...],
  "segments": [...],
  "yearWritten": null,
  "yearPremiered": "1772-03-13",
  "yearPrinted": "1772",
  "yearNormalized": 1772,
  "wikidataId": "Q782653",
  "subtitle": "Ein Trauerspiel in fünf Aufzügen",
  "relations": [...],
  "source": {
    "name": "TextGrid Repository",
    "url": "http://www.textgridrep.org/textgrid:rksp.0"
  },
  "originalSource": "Gotthold Ephraim Lessing: Werke. Herausgegeben von Herbert G. Göpfert in Zusammenarbeit mit Karl Eibl, Helmut Göbel, Karl S. Guthke, Gerd Hillen, Albert von Schirmding und Jörg Schönert, Band 1–8, München: Hanser, 1970 ff."
}
```