# DraCor Postdata Middle-Ware Tryouts

Prerequisites: a running Stardog instance with a database `PD_KG` into which the POSTDATA Knowledge Graph has been imported.
I used this repo: https://github.com/dh-network/stardog-docker-compose; the data was supplied by POSTDATA and imported manually into the database as described in the repo.

Used this to get started with Stardog: https://github.com/stardog-union/pystardog/blob/develop/notebooks/tutorial.ipynb

In [4]:
#use the official stardog package: https://pystardog.readthedocs.io/en/latest/
import stardog

In [15]:
#libraries, that are in the pystardog notebook; used to render the results as pd dataframes (maybe don't need them later)
import io
import pandas as pd

## Connect to stardog

In [6]:
# set sparql endpoint (stardog), user name and password
database_name = "PD_KG"

usr = "admin"
pwd = "admin"

endpoint = "http://localhost:5820"

In [7]:
#as in the pystardog notebook https://github.com/stardog-union/pystardog/blob/develop/notebooks/tutorial.ipynb
connection_details = {
  'endpoint': endpoint,
  'username': usr,
  'password': pwd
}

In [8]:
conn = stardog.Connection(database_name, **connection_details)

## Simple Test

In [10]:
query = """
SELECT * WHERE {
  ?s ?p ?o
}
LIMIT 10
"""

In [11]:
csv_results = conn.select(query, content_type='text/csv')
df = pd.read_csv(io.BytesIO(csv_results))
df.head()

Unnamed: 0,s,p,o
0,http://postdata.linhd.uned.es/resource/a_unknown,http://postdata.linhd.uned.es/ontology/postdat...,UNKNOWN
1,http://postdata.linhd.uned.es/resource/st_1_te...,http://postdata.linhd.uned.es/ontology/postdat...,http://postdata.linhd.uned.es/kos/
2,http://postdata.linhd.uned.es/resource/st_1_te...,http://postdata.linhd.uned.es/ontology/postdat...,http://postdata.linhd.uned.es/kos/
3,http://postdata.linhd.uned.es/resource/st_1_te...,http://postdata.linhd.uned.es/ontology/postdat...,http://postdata.linhd.uned.es/kos/
4,http://postdata.linhd.uned.es/resource/st_1_lu...,http://postdata.linhd.uned.es/ontology/postdat...,http://postdata.linhd.uned.es/kos/


In [12]:
#to get a dictionary instead
#query_results = conn.select(query)
#type(query_results)

In [14]:
#to close the connection
#conn.close()

## Helper Functions
Setup functions to query and parse the results (e.g. as pd dataframe).

In [35]:
def sparql(query, parse=False):
    """
    Helper function to send a SPARQL query to Stardog (a connection `conn` must already be established).
    The optional parameter `parse` can be used to get a pandas dataframe back.
    """
    if parse:
        csv_results = conn.select(query, content_type='text/csv')
        df = pd.read_csv(io.BytesIO(csv_results))
        return df
    else:
        results = conn.select(query)
        return results

In [50]:
#inject <$> to queries
def replace_placeholder(query,uri):
    """Replaces the placeholder in a query.
    """
    placeholder = "$"
    return query.replace(placeholder,uri)
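
Because `replace_placeholder` substitutes every `$` in the query, a URI containing characters such as `>` or whitespace would produce a malformed or altered query. Below is a minimal sketch of a stricter variant, assuming the placeholder is always written as `<$>`. The name `inject_uri` and the validation rule are illustrative and not part of the notebook above.

```python
import re

# Characters that SPARQL does not allow inside an IRI reference (<...>):
# control characters and space (U+0000 to U+0020) plus < > " { } | ^ ` \
_ILLEGAL_IRI_CHARS = re.compile(r'[\x00-\x20<>"{}|^`\\]')


def inject_uri(query: str, uri: str, placeholder: str = "<$>") -> str:
    """Replace the placeholder with <uri> after a basic sanity check (hypothetical helper)."""
    if _ILLEGAL_IRI_CHARS.search(uri):
        raise ValueError(f"not a valid IRI for a SPARQL query: {uri!r}")
    if placeholder not in query:
        raise ValueError("query does not contain the placeholder")
    return query.replace(placeholder, f"<{uri}>")


print(inject_uri("SELECT * WHERE { <$> ?p ?o }", "http://example.org/a"))
```

Replacing the whole `<$>` token instead of the bare `$` also leaves any other `$` in the query untouched.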

## Query the Postdata Knowledge Graph
I played around in Stardog Studio and came up with some queries. These need to be tested here and then eventually wrapped in functions to be used in an API.

### poeticWorks

#### List all works

A "poeticWork" is of the class `http://postdata.linhd.uned.es/ontology/postdata-core#PoeticWork`.

In [29]:
query = """
PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>

SELECT ?work ?title WHERE {
    ?work a pdc:PoeticWork ;
          pdc:title ?title.
}
LIMIT 1000000
"""
# by default stardog returns 1.000 results only; need to set a LIMIT that exceeds the number of works expected 

In [36]:
#use the helper function to send the query
results = sparql(query,parse=True)

In [37]:
type(results)

pandas.core.frame.DataFrame

In [39]:
len(results)

1003

In [40]:
results.head()

Unnamed: 0,work,title
0,http://postdata.linhd.uned.es/resource/pw_juan...,"Sabrás, querido Fabio"
1,http://postdata.linhd.uned.es/resource/pw_juan...,Divino dueño mío
2,http://postdata.linhd.uned.es/resource/pw_jaci...,"En corros aquí y allí,"
3,http://postdata.linhd.uned.es/resource/pw_pedr...,Al Marqués de Priego
4,http://postdata.linhd.uned.es/resource/pw_vice...,Part of:


In [41]:
# a simple uri of a work
results["work"][0]

'http://postdata.linhd.uned.es/resource/pw_juana-ines-de-la-cruz_sabras-querido-fabio'

#### Works, Titles and Authors (Creators in the WorkConception)
A "PoeticWork" is connected via the property "wasInitiatedBy" `<http://postdata.linhd.uned.es/ontology/postdata-core#wasInitiatedBy>` to a "WorkConception" `<http://postdata.linhd.uned.es/ontology/postdata-core#WorkConception>` (this is the property the queries below use).

This "WorkConception" has an "AgentRole": the property "hasAgentRole" `<http://postdata.linhd.uned.es/ontology/postdata-core#hasAgentRole>` connects it to an "AgentRole" `<http://postdata.linhd.uned.es/ontology/postdata-core#AgentRole>`.

An "AgentRole" can be classified with "roleFunction" `<http://postdata.linhd.uned.es/ontology/postdata-core#roleFunction>`, which links to the KOS, e.g. `<http://postdata.linhd.uned.es/kos/Creator>` (this one should be used for the author).

The author is connected to the "AgentRole" with the property "hasAgent" `<http://postdata.linhd.uned.es/ontology/postdata-core#hasAgent>`.

The "Agent" is a `<http://postdata.linhd.uned.es/ontology/postdata-core#Person>`.
The "Agent" has a name via `pdc:name` (a literal!).

In [45]:
query = """
PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>

SELECT ?Work ?Title ?Agent ?Name WHERE {
    ?Work a pdc:PoeticWork ;
        pdc:title ?Title .
    
    OPTIONAL { 
        ?Work pdc:wasInitiatedBy ?WorkConception .
        
        ?WorkConception pdc:hasAgentRole ?AgentRole .
        
        ?AgentRole pdc:roleFunction <http://postdata.linhd.uned.es/kos/Creator> ; 
                   pdc:hasAgent ?Agent .
        
        ?Agent pdc:name ?Name .
    }
}
LIMIT 1000000
"""

In [46]:
results = sparql(query,parse=True)
results.head()

Unnamed: 0,Work,Title,Agent,Name
0,http://postdata.linhd.uned.es/resource/pw_juan...,"Sabrás, querido Fabio",http://postdata.linhd.uned.es/resource/p_juana...,Juana Inés de la Cruz
1,http://postdata.linhd.uned.es/resource/pw_juan...,Divino dueño mío,http://postdata.linhd.uned.es/resource/p_juana...,Juana Inés de la Cruz
2,http://postdata.linhd.uned.es/resource/pw_jaci...,"En corros aquí y allí,",http://postdata.linhd.uned.es/resource/p_jacin...,Jacinto Polo de Medina
3,http://postdata.linhd.uned.es/resource/pw_pedr...,Al Marqués de Priego,http://postdata.linhd.uned.es/resource/p_pedro...,Pedro Fernández Marañón
4,http://postdata.linhd.uned.es/resource/pw_vice...,Part of:,http://postdata.linhd.uned.es/resource/p_vicen...,Vicente Wenceslao Querolt


In [47]:
len(results)

1003

The result has 1003 rows, the same as the number of works, so apparently no poem in the graph has more than one author. This could happen, though, so the query may need to be adapted.
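
Whether a work has several creators can also be checked directly on the query result by collecting the agents per work. This is a minimal sketch on plain Python rows. The function name `works_with_multiple_authors` and the sample rows with `example.org` URIs are made up for illustration. With the dataframe from above, the rows could be obtained via `results.to_dict("records")`.

```python
from collections import defaultdict


def works_with_multiple_authors(rows):
    """Map each work URI to its distinct agents, keeping only works with more than one agent."""
    agents_by_work = defaultdict(set)
    for row in rows:
        agent = row.get("Agent")
        # pandas represents missing values as NaN (a float), so only non-empty strings count
        if isinstance(agent, str) and agent:
            agents_by_work[row["Work"]].add(agent)
    return {work: sorted(agents) for work, agents in agents_by_work.items() if len(agents) > 1}


# Illustrative rows only, not taken from the knowledge graph
sample_rows = [
    {"Work": "http://example.org/pw_1", "Agent": "http://example.org/p_a"},
    {"Work": "http://example.org/pw_2", "Agent": "http://example.org/p_a"},
    {"Work": "http://example.org/pw_2", "Agent": "http://example.org/p_b"},
]
print(works_with_multiple_authors(sample_rows))
```

An empty dictionary would confirm that every work has at most one creator.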

#### Author(s) of a given PoeticWork

The query above, adapted to get the authors of a given PoeticWork.

In [54]:
query = """
PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>

SELECT ?Agent ?Name WHERE {
    <$> a pdc:PoeticWork ;
        pdc:wasInitiatedBy ?WorkConception .
        
        ?WorkConception pdc:hasAgentRole ?AgentRole .
        
        ?AgentRole pdc:roleFunction <http://postdata.linhd.uned.es/kos/Creator> ; 
                   pdc:hasAgent ?Agent .
        
        OPTIONAL {
            ?Agent pdc:name ?Name .
        }
}
"""

In [56]:
uri = "http://postdata.linhd.uned.es/resource/pw_juana-ines-de-la-cruz_sabras-querido-fabio"
print(replace_placeholder(query,uri))
results = sparql(replace_placeholder(query,uri))


PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>

SELECT ?Agent ?Name WHERE {
    <http://postdata.linhd.uned.es/resource/pw_juana-ines-de-la-cruz_sabras-querido-fabio> a pdc:PoeticWork ;
        pdc:wasInitiatedBy ?WorkConception .
        
        ?WorkConception pdc:hasAgentRole ?AgentRole .
        
        ?AgentRole pdc:roleFunction <http://postdata.linhd.uned.es/kos/Creator> ; 
                   pdc:hasAgent ?Agent .
        
        OPTIONAL {
            ?Agent pdc:name ?Name .
        }
}



In [57]:
results

{'head': {'vars': ['Agent', 'Name']},
 'results': {'bindings': [{'Agent': {'type': 'uri',
     'value': 'http://postdata.linhd.uned.es/resource/p_juana-ines-de-la-cruz'},
    'Name': {'type': 'literal', 'value': 'Juana Inés de la Cruz'}}]}}
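
The SPARQL JSON result nests each value under its variable name. Here is a small sketch of flattening it into plain dictionaries. `bindings_to_rows` is a hypothetical helper name, and variables left unbound by an `OPTIONAL` clause are filled with `None`.

```python
def bindings_to_rows(sparql_results: dict) -> list:
    """Turn SPARQL JSON results into a list of {variable: value} dictionaries."""
    variables = sparql_results["head"]["vars"]
    rows = []
    for binding in sparql_results["results"]["bindings"]:
        # unbound variables are simply absent from a binding, hence the .get() calls
        rows.append({var: binding.get(var, {}).get("value") for var in variables})
    return rows


# the result shown above
example = {
    "head": {"vars": ["Agent", "Name"]},
    "results": {"bindings": [{
        "Agent": {"type": "uri",
                  "value": "http://postdata.linhd.uned.es/resource/p_juana-ines-de-la-cruz"},
        "Name": {"type": "literal", "value": "Juana Inés de la Cruz"},
    }]},
}
print(bindings_to_rows(example))
```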

An endpoint to get the creators of a poem, `/corpora/{corpus}/poems/{poem}/authors`, could return something along the lines of the `authors` field in the DraCor API response of `https://dracor.org/api/corpora/{corpus}/play/{play}`:

```
"authors": [
    {
      "name": "Lessing, Gotthold Ephraim",
      "fullname": "Gotthold Ephraim Lessing",
      "shortname": "Lessing",
      "refs": [
        {
          "ref": "Q34628",
          "type": "wikidata"
        },
        {
          "ref": "118572121",
          "type": "pnd"
        }
      ]
    }
  ]
```

In [93]:
def get_authors_of_poem(poem_uri:str) -> list:
    """Get the authors of a poem.

    Returns the agents in an AgentRole with the roleFunction "Creator".
    
    Args:
        poem_uri (str): URI of a poeticWork
    Returns:
        list: List of dictionaries containing information on author name and id = uri.
    """
    
    query = """
        PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>

        SELECT ?Agent ?Name WHERE {
            <$> a pdc:PoeticWork ;
            pdc:wasInitiatedBy ?WorkConception .
        
            ?WorkConception pdc:hasAgentRole ?AgentRole .
        
            ?AgentRole pdc:roleFunction <http://postdata.linhd.uned.es/kos/Creator> ; 
                   pdc:hasAgent ?Agent .
        
            OPTIONAL {
                ?Agent pdc:name ?Name .
            }
    }
    """
    sparql_results = sparql(replace_placeholder(query,poem_uri))
    
    authors = []
    for binding in sparql_results["results"]["bindings"]:
        author = {}
        # ?Name is OPTIONAL in the query, so a binding may lack it
        author["name"] = binding.get("Name", {}).get("value")
        author["id"] = binding["Agent"]["value"]
        authors.append(author)
        
    
    return authors

In [94]:
get_authors_of_poem(uri)

[{'name': 'Juana Inés de la Cruz',
  'id': 'http://postdata.linhd.uned.es/resource/p_juana-ines-de-la-cruz'}]

### Information on a single corpus (the whole graph)
DraCor's endpoint `/corpora` returns information on all corpora in the database. This endpoint is used to display the stats by setting the parameter `include=metrics`. See https://dracor.org/doc/api#/public/list-corpora. The data on a single corpus:

```
{
    "licence": "CC BY-NC 3.0",
    "licenceUrl": "https://creativecommons.org/licenses/by-nc/3.0/deed.en_US",
    "description": "Derived from the [Folger Shakespeare Library](https://shakespeare.folger.edu/). Enhancements documented in our [README at GitHub](https://github.com/dracor-org/shakedracor).",
    "uri": "https://dracor.org/api/corpora/shake",
    "title": "Shakespeare Drama Corpus",
    "name": "shake",
    "acronym": "ShakeDraCor",
    "metrics": {
      "plays": 37,
      "characters": 1433,
      "male": 797,
      "female": 116,
      "text": 37,
      "sp": 31066,
      "stage": 10450,
      "wordcount": {
        "text": 908286,
        "sp": 876744,
        "stage": 41230
      },
      "updated": "2022-07-02T23:36:24.109+02:00"
    },
    "repository": "https://github.com/dracor-org/shakedracor"
  }
```
We cannot really list "corpora" in the POSTDATA Knowledge Graph because it contains only one, but we still have to implement an endpoint that provides the information for the front page.

We can at least provide some metrics, e.g.

```
"metrics": {
      "authors" : 1,
      "poems": 1,
      "stanzas": 1,
      "verses": 1,
      "words" : 1
      }
```

In [121]:
def count_works(corpus=None) -> int:
    """Count poeticWorks in a corpus.
    
    Returns:
        int: Number of poetic works/poems.
        
    TODO: handle multiple corpora
    """
    
    query = """
    PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>
    
    SELECT (COUNT(?poeticWork) AS ?count) WHERE {
    ?poeticWork a pdc:PoeticWork .
    } 
    LIMIT 1000000
    """
    
    sparql_results = sparql(query)
    work_count = int(sparql_results["results"]["bindings"][0]["count"]["value"])
    
    return work_count

In [122]:
count_works()

1003

In [115]:
def count_stanzas(corpus=None) -> int:
    """Count stanzas in a corpus.
    
    Returns:
        int: Number of stanzas.
        
    TODO: handle multiple corpora
    """
    
    query = """
    PREFIX pdp: <http://postdata.linhd.uned.es/ontology/postdata-poeticAnalysis#>
    
    SELECT (COUNT(?Stanza) AS ?count) WHERE {
    ?Stanza a pdp:Stanza .
    } 
    LIMIT 1000000
    """
    
    sparql_results = sparql(query)
    stanza_count = int(sparql_results["results"]["bindings"][0]["count"]["value"])
    return stanza_count

In [116]:
count_stanzas()

5109

In [123]:
def count_verses(corpus=None) -> int:
    """Count verses/lines in a corpus.
    
    Returns:
        int: Number of verselines.
        
    TODO: handle multiple corpora
    """
    
    query = """
    PREFIX pdp: <http://postdata.linhd.uned.es/ontology/postdata-poeticAnalysis#>
    
    SELECT (COUNT(?line) AS ?count) WHERE {
        ?line a pdp:Line .
    } 
    LIMIT 1000000
    """
    
    sparql_results = sparql(query)
    verses_count = int(sparql_results["results"]["bindings"][0]["count"]["value"])
    return verses_count

In [120]:
count_verses()

105236

In [124]:
def count_words(corpus=None) -> int:
    """Count words in a corpus.
    
    Returns:
        int: Number of words.
        
    TODO: handle multiple corpora
    """
    
    query = """
    PREFIX pdp: <http://postdata.linhd.uned.es/ontology/postdata-poeticAnalysis#>
    
    SELECT (COUNT(?word) AS ?count) WHERE {
        ?word a <http://postdata.linhd.uned.es/ontology/postdata-poeticAnalysis#Word> .
    } 
    LIMIT 1000000
    """
    
    sparql_results = sparql(query)
    word_count = int(sparql_results["results"]["bindings"][0]["count"]["value"])
    return word_count

In [125]:
count_words()

571355

Syllables are different: There are "GrammaticalSyllables" `<http://postdata.linhd.uned.es/ontology/postdata-poeticAnalysis#GrammaticalSyllable>` and "MetricalSyllables" `<http://postdata.linhd.uned.es/ontology/postdata-poeticAnalysis#MetricalSyllable>`.

In [133]:
def count_metrical_syllables(corpus=None) -> int:
    """Count metrical syllables in a corpus.
    
    Returns:
        int: number of metrical syllables
    """
    
    query = """
    PREFIX pdp: <http://postdata.linhd.uned.es/ontology/postdata-poeticAnalysis#>
    
    SELECT (COUNT(?syllable) AS ?count) WHERE {
        ?syllable a <http://postdata.linhd.uned.es/ontology/postdata-poeticAnalysis#MetricalSyllable> .
    } 
    LIMIT 1000000
    """
    
    sparql_results = sparql(query)
    syllable_count = int(sparql_results["results"]["bindings"][0]["count"]["value"])
    
    return syllable_count
    

In [134]:
count_metrical_syllables()

299990

In [135]:
def count_grammatical_syllables(corpus=None) -> int:
    """Count grammatical syllables in a corpus.
    
    Returns:
        int: number of grammatical syllables
    """
    
    query = """
    PREFIX pdp: <http://postdata.linhd.uned.es/ontology/postdata-poeticAnalysis#>
    
    SELECT (COUNT(?syllable) AS ?count) WHERE {
        ?syllable a <http://postdata.linhd.uned.es/ontology/postdata-poeticAnalysis#GrammaticalSyllable> .
    } 
    LIMIT 1000000
    """
    
    sparql_results = sparql(query)
    syllable_count = int(sparql_results["results"]["bindings"][0]["count"]["value"])
    
    return syllable_count

In [136]:
count_grammatical_syllables()

1054869

Authors (Agents/Persons) should only be counted if they are connected to a WorkConception through an AgentRole with the role function "Creator". See above.

In [129]:
def count_authors(corpus=None) -> int:
    """Count authors in a corpus.
    
    Authors (Agents/Persons) are only counted if they are 
    connected to a "WorkConception" through an "AgentRole" with the function "Creator".
    
    Returns:
        int: Number of authors.
        
    TODO: handle multiple corpora
    """
    
    query = """
    PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>

    SELECT (COUNT(DISTINCT ?Agent) AS ?count) WHERE {
        ?WorkConception a pdc:WorkConception ;
            pdc:hasAgentRole ?AgentRole .
        
        ?AgentRole pdc:roleFunction <http://postdata.linhd.uned.es/kos/Creator> ; 
            pdc:hasAgent ?Agent .
    }
    LIMIT 1000000
    """
    
    sparql_results = sparql(query)
    authors_count = int(sparql_results["results"]["bindings"][0]["count"]["value"])
    return authors_count

In [130]:
count_authors()

73

#### Combine the corpus metrics

In [137]:
def get_corpus_metrics(corpus=None) -> dict:
    """Get metrics for a given corpus.
    
    Returns:
        dict: corpus metrics
    """
    
    metrics = {}
    metrics["authors"] = count_authors(corpus)
    metrics["poems"] = count_works(corpus)
    metrics["stanzas"] = count_stanzas(corpus)
    metrics["verses"] = count_verses(corpus)
    metrics["words"] = count_words(corpus)
    metrics["grammatical_syllables"] = count_grammatical_syllables(corpus)
    metrics["metrical_syllables"] = count_metrical_syllables(corpus)
    
    return metrics 

In [138]:
get_corpus_metrics()

{'authors': 73,
 'poems': 1003,
 'stanzas': 5109,
 'verses': 105236,
 'words': 571355,
 'grammatical_syllables': 1054869,
 'metrical_syllables': 299990}

In [None]:
# TODO: general function to count entities of a given type in the graph.
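
One way such a general function might look is sketched below. The names `build_count_query` and `count_entities` are hypothetical, and the function that actually sends the query is passed in as `run_query` so that the helper `sparql` from above (or a stub for testing) can be used.

```python
from typing import Callable

PDC = "http://postdata.linhd.uned.es/ontology/postdata-core#"
PDP = "http://postdata.linhd.uned.es/ontology/postdata-poeticAnalysis#"


def build_count_query(class_uri: str) -> str:
    """Build a SPARQL query that counts the instances of the given class."""
    return (
        "SELECT (COUNT(?entity) AS ?count) WHERE {\n"
        f"    ?entity a <{class_uri}> .\n"
        "}"
    )


def count_entities(class_uri: str, run_query: Callable[[str], dict]) -> int:
    """Count instances of a class; run_query must return SPARQL JSON results as a dict."""
    results = run_query(build_count_query(class_uri))
    return int(results["results"]["bindings"][0]["count"]["value"])


print(build_count_query(PDP + "Stanza"))
```

With this in place, `count_stanzas()` would reduce to `count_entities(PDP + "Stanza", sparql)`. Only `count_authors` still needs its own query because it follows a path through the graph instead of counting a class.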

#### Corpus info
Because we don't really have metadata on the corpus, we fake this here for demonstrator purposes.

DraCor example:
```
{
    "licence": "CC BY-NC 3.0",
    "licenceUrl": "https://creativecommons.org/licenses/by-nc/3.0/deed.en_US",
    "description": "Derived from the [Folger Shakespeare Library](https://shakespeare.folger.edu/). Enhancements documented in our [README at GitHub](https://github.com/dracor-org/shakedracor).",
    "uri": "https://dracor.org/api/corpora/shake",
    "title": "Shakespeare Drama Corpus",
    "name": "shake",
    "acronym": "ShakeDraCor",
    "metrics": {
      "plays": 37,
      "characters": 1433,
      "male": 797,
      "female": 116,
      "text": 37,
      "sp": 31066,
      "stage": 10450,
      "wordcount": {
        "text": 908286,
        "sp": 876744,
        "stage": 41230
      },
      "updated": "2022-07-02T23:36:24.109+02:00"
    },
    "repository": "https://github.com/dracor-org/shakedracor"
  }
```

In [146]:
def get_corpus_info(corpus=None, metrics=False) -> dict:
    """Get information on a corpus.
    
    Args:
        corpus (optional): select a corpus (not implemented yet). Defaults to None.
        metrics (optional): include corpus metrics. Defaults to False.
    Returns:
        dict: information on the given corpus.
    TODO: include more data (e.g. repository; see DraCor output)
    """
    # there is no mechanism yet to filter by corpus, so only the default corpus is supported
    
    corpus_data = {}
    
    if corpus is None:
        # corpus defaults to None --> get the default POSTDATA corpus
        corpus_data["name"] = "postdata"
        corpus_data["title"] = "POSTDATA Corpus"
        corpus_data["description"] = "POSTDATA Knowledge Graph of Spanish Poetry. See https://postdata.linhd.uned.es"
    
    if metrics:
        corpus_data["metrics"] = get_corpus_metrics(corpus)
        
    return corpus_data

In [147]:
get_corpus_info(metrics=True)

{'name': 'postdata',
 'title': 'POSTDATA Corpus',
 'description': 'POSTDATA Knowledge Graph of Spanish Poetry. See https://postdata.linhd.uned.es',
 'metrics': {'authors': 73,
  'poems': 1003,
  'stanzas': 5109,
  'verses': 105236,
  'words': 571355,
  'grammatical_syllables': 1054869,
  'metrical_syllables': 299990}}

#### List of available corpora
For the DraCor frontend (uses `/corpora` with param `include=metrics`) we need the data on the corpora wrapped in an array, even though we only have one corpus for the demonstrator.

In [161]:
def get_corpora(metrics=False) -> list:
    """Get a list of corpora
    
    Only one corpus is returned at the moment!
    TODO: handle more corpora
    """
    data = get_corpus_info(metrics=metrics)
    
    return [data]

In [154]:
get_corpora(metrics=True)

[{'name': 'postdata',
  'title': 'POSTDATA Corpus',
  'description': 'POSTDATA Knowledge Graph of Spanish Poetry. See https://postdata.linhd.uned.es',
  'metrics': {'authors': 73,
   'poems': 1003,
   'stanzas': 5109,
   'verses': 105236,
   'words': 571355,
   'grammatical_syllables': 1054869,
   'metrical_syllables': 299990}}]

### Data on a single Poem
(as included in the list of poems returned by the DraCor `/corpora/{corpus}` endpoint)

see DraCor example:
```
{
      "writtenYear": "1908",
      "wikidataId": "Q25556355",
      "source": "Татарская электронная библиотека",
      "id": "tat000001",
      "title": "Беренче театр",
      "sourceUrl": "http://kitap.net.ru/galiaskar/2.php",
      "networkSize": "7",
      "name": "qamal-berenche-teatr",
      "yearNormalized": 1908,
      "printYear": null,
      "subtitle": "Комедия 1 пәрдәдә",
      "premiereYear": null,
      "authors": [
        {
          "name": "Камал, Галиәсгар",
          "fullname": "Галиәсгар Камал",
          "shortname": "Камал",
          "refs": [
            {
              "ref": "Q2497099",
              "type": "wikidata"
            }
          ],
          "fullnameEn": "Ğäliäsğar Kamal",
          "nameEn": "Kamal, Ğäliäsğar",
          "shortnameEn": "Kamal",
          "alsoKnownAs": [
            "Ğäliäsğar Kamal"
          ]
        }
      ],
      "networkdataCsvUrl": "https://dracor.org/api/corpora/tat/play/qamal-berenche-teatr/networkdata/csv",
      "author": {
        "name": "Камал, Галиәсгар"
      },
      "subtitleEn": "Comedy in 1 Act",
      "titleEn": "First Theatre"
    }
```

In [166]:
#we test with 
poem_uri = "http://postdata.linhd.uned.es/resource/pw_juana-ines-de-la-cruz_sabras-querido-fabio"

In [165]:
#Function to get title of poem
def get_poem_title(poem_uri:str) -> str:
    """Get the title of a poem
    
    Args:
        poem_uri (str): URI of the poem
    
    Returns:
        str: Title of the poem
    """
    
    query = """
    PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>

    SELECT ?title WHERE {
        <$> a pdc:PoeticWork ;
            pdc:title ?title.
    }
    """
    
    sparql_results = sparql(replace_placeholder(query,poem_uri))
    title = str(sparql_results["results"]["bindings"][0]["title"]["value"])
    return title

In [167]:
get_poem_title(poem_uri)

'Sabrás, querido Fabio'

In [173]:
#Function to convert poem uri to postdata poetry lab link
# this will be used in "sourceUrl" (which is somewhat wrong, but will do because it links back to poetry lab)

def work_uri_to_poetry_lab_url(poem_uri:str) -> str:
    """Convert the URI of a poem into a link to poetry lab platform
    """
    poetry_lab_base_url = "http://poetry.linhd.uned.es:3000" + "/en/"
    
    #In the Graph: http://postdata.linhd.uned.es/resource/pw_juana-ines-de-la-cruz_sabras-querido-fabio
    # On the platform: http://poetry.linhd.uned.es:3000/es/author/juana-ines-de-la-cruz/poetic-work/sabras-querido-fabio
    #Split on "_"
    
    author_part = poem_uri.split("_")[1]
    title_part = poem_uri.split("_")[2]
    
    poetry_lab_url = poetry_lab_base_url + "author/" + author_part + "/poetic-work/" + title_part
    
    return poetry_lab_url

In [174]:
work_uri_to_poetry_lab_url(poem_uri)

'http://poetry.linhd.uned.es:3000/en/author/juana-ines-de-la-cruz/poetic-work/sabras-querido-fabio'

In [179]:
#in DraCor we have id/uri and a name. Don't know if this is feasible
def work_uri_to_poem_name(poem_uri:str) -> str:
    """Convert the URI to a local name consisting of author + "_" + "title"
    """
    author_part = poem_uri.split("_")[1]
    title_part = poem_uri.split("_")[2]
    
    poem_name = author_part + "_" + title_part
    
    return poem_name

In [180]:
work_uri_to_poem_name(poem_uri)

'juana-ines-de-la-cruz_sabras-querido-fabio'
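
Both conversion functions split the whole URI on `_` and silently assume the pattern `pw_{author}_{title}`. A sketch of a stricter shared helper that only looks at the last path segment and fails loudly on unexpected URIs follows. The name `split_work_uri` is hypothetical, and the assumed pattern is inferred from the one example URI above.

```python
def split_work_uri(poem_uri: str) -> tuple:
    """Split a PoeticWork URI into (author_part, title_part).

    Assumes the local name has the form pw_{author}_{title}.
    """
    local_name = poem_uri.rsplit("/", 1)[-1]
    parts = local_name.split("_")
    if len(parts) != 3 or parts[0] != "pw" or not all(parts):
        raise ValueError(f"unexpected PoeticWork URI: {poem_uri!r}")
    return parts[1], parts[2]


author_part, title_part = split_work_uri(
    "http://postdata.linhd.uned.es/resource/pw_juana-ines-de-la-cruz_sabras-querido-fabio"
)
print(author_part, title_part)
```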

In [164]:
# do we have information on dates? 
# (need to sparql that in Stardog Studio first; probably look for all distinct properties in the work conception)

In [183]:
def get_poem_metadata(poem_uri:str) -> dict:
    """Get Metadata of a single poem.
    """
    
    poem_data = {}
    poem_data["id"] = poem_uri
    
    #don't know if this works; only for POSTDATA but assumes that the URIs are always structured the same way
    poem_data["name"] = work_uri_to_poem_name(poem_uri)
    poem_data["title"] = get_poem_title(poem_uri)
    poem_data["authors"] = get_authors_of_poem(poem_uri)
    
    # this only works for postdata
    poem_data["source"] = "POSTDATA Poetry Lab"
    poem_data["sourceUrl"] = work_uri_to_poetry_lab_url(poem_uri)
    
    return poem_data

In [184]:
#test this
get_poem_metadata(poem_uri)

{'id': 'http://postdata.linhd.uned.es/resource/pw_juana-ines-de-la-cruz_sabras-querido-fabio',
 'name': 'juana-ines-de-la-cruz_sabras-querido-fabio',
 'title': 'Sabrás, querido Fabio',
 'authors': [{'name': 'Juana Inés de la Cruz',
   'id': 'http://postdata.linhd.uned.es/resource/p_juana-ines-de-la-cruz'}],
 'source': 'POSTDATA Poetry Lab',
 'sourceUrl': 'http://poetry.linhd.uned.es:3000/en/author/juana-ines-de-la-cruz/poetic-work/sabras-querido-fabio'}

### List of Works needed by the DraCor frontend

To create a view like https://dracor.org/ger we need to create a response like that of the endpoint https://dracor.org/doc/api#/public/list-corpus-content

```
{
  "description": "Edited by Daniil Skorinkin and Frank Fischer. Features a handful of plays in Tatar language, provided through Tatar Electronic Library.",
  "title": "Tatar Drama Corpus",
  "repository": "https://github.com/dracor-org/tatdracor",
  "name": "tat",
  "dramas": [
    {
      "writtenYear": "1908",
      "wikidataId": "Q25556355",
      "source": "Татарская электронная библиотека",
      "id": "tat000001",
      "title": "Беренче театр",
      "sourceUrl": "http://kitap.net.ru/galiaskar/2.php",
      "networkSize": "7",
      "name": "qamal-berenche-teatr",
      "yearNormalized": 1908,
      "printYear": null,
      "subtitle": "Комедия 1 пәрдәдә",
      "premiereYear": null,
      "authors": [
        {
          "name": "Камал, Галиәсгар",
          "fullname": "Галиәсгар Камал",
          "shortname": "Камал",
          "refs": [
            {
              "ref": "Q2497099",
              "type": "wikidata"
            }
          ],
          "fullnameEn": "Ğäliäsğar Kamal",
          "nameEn": "Kamal, Ğäliäsğar",
          "shortnameEn": "Kamal",
          "alsoKnownAs": [
            "Ğäliäsğar Kamal"
          ]
        }
      ],
      "networkdataCsvUrl": "https://dracor.org/api/corpora/tat/play/qamal-berenche-teatr/networkdata/csv",
      "author": {
        "name": "Камал, Галиәсгар"
      },
      "subtitleEn": "Comedy in 1 Act",
      "titleEn": "First Theatre"
    }
  ],
  "acronym": "TatDraCor"
}
```

In [189]:
def get_poem_uris(corpus=None) -> list:
    """Helper function to get a list of URIs of PoeticWorks
    """
    query = """
    PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>

    SELECT ?work WHERE {
        ?work a pdc:PoeticWork .
    }
    LIMIT 1000000
    """
    
    sparql_results = sparql(query)
    bindings = sparql_results["results"]["bindings"]
    
    poem_uris = []
    
    for binding in bindings:
        poem_uris.append(binding["work"]["value"])
    
    return poem_uris

In [191]:
#get_poem_uris() returns a very long list
len(get_poem_uris())

1003

In [192]:
def get_corpus_content(corpus=None) -> dict:
    """Returns metadata on the corpus.
        
        Similar to DraCor's https://dracor.org/doc/api#/public/list-corpus-content
        
        Returns:
            dict: data on the corpus listing all the poems
    """
    
    corpus_data = get_corpus_info(metrics=False)
    
    corpus_data["poems"] = []
    
    poem_uris = get_poem_uris()
    
    for poem_uri in poem_uris:
        poem_data = get_poem_metadata(poem_uri)
        corpus_data["poems"].append(poem_data)
    
    return corpus_data

In [203]:
#%%time
#this is a little bit slow, hmpf
#uncomment to show
#get_corpus_content()
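
The slowness presumably comes from `get_poem_metadata` sending two queries per poem (title and authors), i.e. roughly 2,000 round trips for 1003 works. A possible alternative is to run the single "Works, Titles and Authors" query from above once and assemble the records in Python. The sketch below shows only the assembly step. `assemble_poems` is a hypothetical name, the sample bindings use made-up `example.org` URIs, and if a work had several titles only the first would be kept.

```python
def assemble_poems(bindings: list) -> list:
    """Group SPARQL JSON bindings (?Work ?Title ?Agent ?Name) into one record per work."""
    poems = {}
    for binding in bindings:
        work_uri = binding["Work"]["value"]
        poem = poems.setdefault(
            work_uri,
            {"id": work_uri, "title": binding["Title"]["value"], "authors": []},
        )
        # ?Agent and ?Name sit inside OPTIONAL, so they may be missing
        if "Agent" in binding:
            author = {
                "name": binding.get("Name", {}).get("value"),
                "id": binding["Agent"]["value"],
            }
            if author not in poem["authors"]:
                poem["authors"].append(author)
    return list(poems.values())


# Illustrative bindings only, not taken from the knowledge graph
sample_bindings = [
    {"Work": {"value": "http://example.org/pw_a_x"}, "Title": {"value": "X"},
     "Agent": {"value": "http://example.org/p_a"}, "Name": {"value": "A"}},
    {"Work": {"value": "http://example.org/pw_a_x"}, "Title": {"value": "X"},
     "Agent": {"value": "http://example.org/p_b"}, "Name": {"value": "B"}},
    {"Work": {"value": "http://example.org/pw_c_y"}, "Title": {"value": "Y"}},
]
print(assemble_poems(sample_bindings))
```

The `name`, `source` and `sourceUrl` fields could then be added per record with the URI conversion functions, which need no further queries.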

In [200]:
#create example data for postman mock server
import json  # needed for json.dump below; not imported earlier in the notebook

example_data = get_corpus_content()
with open("corpus_content_example.json", "w", encoding='utf-8') as outfile:
    json.dump(example_data, outfile, ensure_ascii=False)