# DraCor Postdata Middle-Ware Tryouts

**Live-Version**: This notebook can operate on POSTDATA's Stardog instance. It was derived from `query_tryouts.ipynb`, which worked with a local triple store.

POSTDATA uses named graphs for "Scansions"; therefore, the queries had to be adapted.

In [3]:
#use the official stardog package: https://pystardog.readthedocs.io/en/latest/
import stardog

In [4]:
# Libraries used in the pystardog notebook to render the results as pandas DataFrames (may not be needed later)
import io
import pandas as pd

The queries and functions below were developed against a local Stardog installation. I tested them against POSTDATA's Poetry Lab triple store: some worked, others failed. See the designated notebook for the implementation with Poetry Lab.

## Connect to stardog

In [5]:
import json

In [6]:
#local use: config.local.json
#poetrylab: config.poetrylab.json

#config_file = "config.local.json"
config_file = "config.poetrylab.json"

with open(config_file) as f:
    config = json.load(f)

In [7]:
usr = config["server"]["credentials"]["user"]
pwd = config["server"]["credentials"]["password"]

# f-string: works whether "port" is stored as a string or as a number in the JSON file
endpoint = f'{config["server"]["protocol"]}://{config["server"]["url"]}:{config["server"]["port"]}'
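The cells above expect a JSON config with a `server` section. A minimal sketch of that layout, reconstructed only from the keys the code reads (all values are made-up placeholders; 5820 is Stardog's default port), together with a hypothetical helper `endpoint_from_config` that builds the endpoint URL:

```python
import json

# Hypothetical config in the layout implied by the keys read above;
# every value is a placeholder, not the real POSTDATA setting.
EXAMPLE_CONFIG = """
{
  "server": {
    "protocol": "https",
    "url": "stardog.example.org",
    "port": 5820,
    "database": "my-database",
    "credentials": {"user": "reader", "password": "secret"}
  }
}
"""


def endpoint_from_config(config: dict) -> str:
    """Build the Stardog endpoint URL from the "server" section of a config."""
    server = config["server"]
    # An f-string accepts "port" as a JSON number or as a string;
    # plain "+" concatenation would raise a TypeError for a number.
    return f'{server["protocol"]}://{server["url"]}:{server["port"]}'


example_config = json.loads(EXAMPLE_CONFIG)
print(endpoint_from_config(example_config))  # https://stardog.example.org:5820
```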

In [8]:
#as in the pystardog notebook https://github.com/stardog-union/pystardog/blob/develop/notebooks/tutorial.ipynb
connection_details = {
  'endpoint': endpoint,
  'username': usr,
  'password': pwd
}

In [9]:
database_name = config["server"]["database"]

In [10]:
#connection_details

In [11]:
conn = stardog.Connection(database_name, **connection_details)

## Simple Test

In [12]:
query = """
SELECT * WHERE {
  ?s ?p ?o
}
LIMIT 10
"""

In [13]:
csv_results = conn.select(query, content_type='text/csv')
df = pd.read_csv(io.BytesIO(csv_results))
df.head()

|   | s | p | o |
|---|---|---|---|
| 0 | http://postdata.linhd.uned.es/resource/sc_juan... | http://postdata.linhd.uned.es/ontology/postdat... | http://postdata.linhd.uned.es/A_juana-ines-de-... |
| 1 | http://postdata.linhd.uned.es/resource/sp_juan... | http://postdata.linhd.uned.es/ontology/postdat... | http://postdata.linhd.uned.es/resource/sc_juan... |
| 2 | http://postdata.linhd.uned.es/kos/automaticsca... | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://www.w3.org/2004/02/skos/core#Concept |
| 3 | http://postdata.linhd.uned.es/kos/AutomaticAnn... | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://www.w3.org/2004/02/skos/core#Concept |
| 4 | http://postdata.linhd.uned.es/resource/sc_juan... | http://postdata.linhd.uned.es/ontology/postdat... | http://postdata.linhd.uned.es/M_juana-ines-de-... |


In [14]:
#to get a dictionary instead
#query_results = conn.select(query)
#type(query_results)

In [15]:
# to close the connection
#conn.__exit__()

## Named Graphs
POSTDATA uses named graphs for "Scansions".
Use this query to check which named graphs exist:
```
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?g WHERE {
	GRAPH ?g { ?sub ?pred ?obj .}
} LIMIT 10
```
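When called without `content_type`, `conn.select()` returns the result as SPARQL JSON: a dict with `head` and `results.bindings`, as in the output of the author query further down. A minimal sketch of a hypothetical helper `binding_values` that collects the values of one variable, e.g. the graph URIs bound to `?g` in the query above; the example result is hand-made and its graph URIs are placeholders:

```python
def binding_values(sparql_json: dict, var: str) -> list:
    """Return the values bound to `var` in a SPARQL 1.1 JSON result.

    Rows in which `var` is unbound (e.g. because of an OPTIONAL) are skipped.
    """
    return [
        row[var]["value"]
        for row in sparql_json["results"]["bindings"]
        if var in row
    ]


# Hand-made result in the SPARQL JSON shape; the graph URIs are placeholders.
example_result = {
    "head": {"vars": ["g"]},
    "results": {"bindings": [
        {"g": {"type": "uri", "value": "http://example.org/graph/scansion-1"}},
        {"g": {"type": "uri", "value": "http://example.org/graph/scansion-2"}},
        {},  # a row without ?g
    ]},
}
print(binding_values(example_result, "g"))
# ['http://example.org/graph/scansion-1', 'http://example.org/graph/scansion-2']
```

With a live connection this would be called as `binding_values(conn.select(graph_query), "g")`, where `graph_query` (a hypothetical name) holds the query above.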

## Helper Functions
Set up functions to query Stardog and parse the results (e.g. as a pandas DataFrame).

In [16]:
def sparql(query, parse=False):
    """
    Helper function to send a SPARQL query to Stardog (requires an established connection `conn`).
    The optional parameter `parse` can be used to get a pandas DataFrame back. 
    """
    if parse:
        csv_results = conn.select(query, content_type='text/csv')
        df = pd.read_csv(io.BytesIO(csv_results))
        return df
    else:
        results = conn.select(query)
        return results

In [17]:
# Inject a URI into queries that use the <$> placeholder
def replace_placeholder(query,uri):
    """Replace the placeholder `$` (written as `<$>` in the queries) with `uri`.
    """
    placeholder = "$"
    return query.replace(placeholder,uri)
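`replace_placeholder` replaces every `$` in the query. In SPARQL, `$` is also a valid variable prefix (`$title` means the same as `?title`), so a query written in that style would be corrupted, and a URI taken from an API request is inserted without any check. A minimal sketch of a stricter variant, assuming the queries keep the `<$>` placeholder; `inject_uri` is a hypothetical name, and the check only rejects the characters the SPARQL grammar excludes from IRIs, it is not a full IRI validation:

```python
import re

# Characters the SPARQL 1.1 grammar does not allow inside an IRIREF (<...>):
# U+0000 to U+0020 (controls and space) and <>"{}|^`\
_IRI_FORBIDDEN = re.compile(r'[\x00-\x20<>"{}|^`\\]')


def inject_uri(query: str, uri: str, placeholder: str = "<$>") -> str:
    """Replace the `<$>` placeholder with `<uri>` after a basic IRI check.

    Only the bracketed placeholder is replaced, so variables written in
    the `$var` syntax are left alone.
    """
    if _IRI_FORBIDDEN.search(uri):
        raise ValueError(f"not usable as a SPARQL IRI: {uri!r}")
    if placeholder not in query:
        raise ValueError(f"placeholder {placeholder!r} not found in query")
    return query.replace(placeholder, f"<{uri}>")


q = "SELECT $title WHERE { <$> <http://postdata.linhd.uned.es/ontology/postdata-core#title> $title . }"
print(inject_uri(q, "http://postdata.linhd.uned.es/resource/pw_juana-ines-de-la-cruz_sabras-querido-fabio"))
```

The result is the same as with `replace_placeholder` for the queries in this notebook, but a value such as `http://example.org/a> ?p ?o } #` is rejected instead of being spliced into the query.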

## Query the Postdata Knowledge Graph
I played around in Stardog Studio and came up with some queries. These need to be tested here and then eventually wrapped in functions to be used in an API.

### poeticWorks

#### List all works

A "poeticWork" is of the class `http://postdata.linhd.uned.es/ontology/postdata-core#PoeticWork`.

In [18]:
query = """
PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>

SELECT ?work ?title WHERE {
    ?work a pdc:PoeticWork ;
          pdc:title ?title.
}
LIMIT 1000000
"""
# by default, Stardog returns only 1,000 results; set a LIMIT that exceeds the expected number of works
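Instead of a very large LIMIT, the rows could also be fetched page by page. A minimal sketch, assuming a query without its own LIMIT/OFFSET and with an ORDER BY; `fetch_all` is a hypothetical helper, and its `run` argument could be `conn.select`, which returns the SPARQL JSON dict when no content type is given:

```python
from typing import Callable, Dict, List


def fetch_all(query: str, run: Callable[[str], dict], page_size: int = 1000) -> List[Dict]:
    """Collect all result rows of `query` by paging with LIMIT/OFFSET.

    `query` must not contain LIMIT/OFFSET itself and should end with an
    ORDER BY clause; without it the row order, and thus the pages, are not
    guaranteed to be stable between requests.
    `run` sends a query and returns the SPARQL JSON result as a dict.
    """
    rows: List[Dict] = []
    offset = 0
    while True:
        page = run(f"{query}\nLIMIT {page_size}\nOFFSET {offset}")
        bindings = page["results"]["bindings"]
        rows.extend(bindings)
        if len(bindings) < page_size:  # last (partial or empty) page
            return rows
        offset += page_size


# With a live connection (not run here):
# works_query = """PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>
# SELECT ?work ?title WHERE { ?work a pdc:PoeticWork ; pdc:title ?title . }
# ORDER BY ?work"""
# rows = fetch_all(works_query, conn.select)
```

Paging trades one large response for several smaller requests; if the total is an exact multiple of `page_size`, one extra (empty) request is sent.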

In [19]:
#use the helper function to send the query
results = sparql(query,parse=True)

In [20]:
type(results)

pandas.core.frame.DataFrame

In [21]:
len(results)

10081

In [22]:
results.head()

|   | work | title |
|---|---|---|
| 0 | http://postdata.linhd.uned.es/resource/pw_juan... | Sabrás, querido Fabio |
| 1 | http://postdata.linhd.uned.es/resource/pw_juan... | Silvio, tu opinión va errada |
| 2 | http://postdata.linhd.uned.es/resource/pw_juan... | Hombres necios que acusáis |
| 3 | http://postdata.linhd.uned.es/resource/pw_juan... | Si acaso, Fabio mío |
| 4 | http://postdata.linhd.uned.es/resource/pw_juan... | Mientras la gracia me excita |


In [23]:
# a simple uri of a work
results["work"][0]

'http://postdata.linhd.uned.es/resource/pw_juana-ines-de-la-cruz_sabras-querido-fabio'

#### Works, Titles and Authors (Creators in the WorkConception)
A "PoeticWork" is connected to a "WorkConception" (`<http://postdata.linhd.uned.es/ontology/postdata-core#WorkConception>`) via `<http://postdata.linhd.uned.es/ontology/postdata-core#wasInitiatedBy>`.

The "WorkConception" is linked to an "AgentRole" (`<http://postdata.linhd.uned.es/ontology/postdata-core#AgentRole>`) via the property "hasAgentRole" (`<http://postdata.linhd.uned.es/ontology/postdata-core#hasAgentRole>`).

An "AgentRole" is classified with "roleFunction" (`<http://postdata.linhd.uned.es/ontology/postdata-core#roleFunction>`), which links to the KOS, e.g. `<http://postdata.linhd.uned.es/kos/Creator>` (this one should be used for the author).

The author is connected to the "AgentRole" with the property "hasAgent" (`<http://postdata.linhd.uned.es/ontology/postdata-core#hasAgent>`).

The "Agent" is a `<http://postdata.linhd.uned.es/ontology/postdata-core#Person>`; its name is a literal, attached with `pdc:name`.

In [24]:
query = """
PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>

SELECT ?Work ?Title ?Agent ?Name WHERE {
    ?Work a pdc:PoeticWork ;
        pdc:title ?Title .
    
    OPTIONAL { 
        ?Work pdc:wasInitiatedBy ?WorkConception .
        
        ?WorkConception pdc:hasAgentRole ?AgentRole .
        
        ?AgentRole pdc:roleFunction <http://postdata.linhd.uned.es/kos/Creator> ; 
                   pdc:hasAgent ?Agent .
        
        ?Agent pdc:name ?Name .
    }
}
LIMIT 1000000
"""

In [25]:
results = sparql(query,parse=True)
results.head()

|   | Work | Title | Agent | Name |
|---|---|---|---|---|
| 0 | http://postdata.linhd.uned.es/resource/pw_juan... | Sabrás, querido Fabio | http://postdata.linhd.uned.es/resource/p_juana... | Juana Inés de la Cruz |
| 1 | http://postdata.linhd.uned.es/resource/pw_juan... | Sabrás, querido Fabio | http://postdata.linhd.uned.es/resource/p_juana... | Juana Ines de La Cruz |
| 2 | http://postdata.linhd.uned.es/resource/pw_juan... | Silvio, tu opinión va errada | http://postdata.linhd.uned.es/resource/p_juana... | Juana Inés de la Cruz |
| 3 | http://postdata.linhd.uned.es/resource/pw_juan... | Silvio, tu opinión va errada | http://postdata.linhd.uned.es/resource/p_juana... | Juana Ines de La Cruz |
| 4 | http://postdata.linhd.uned.es/resource/pw_juan... | Hombres necios que acusáis | http://postdata.linhd.uned.es/resource/p_juana... | Juana Inés de la Cruz |


In [26]:
len(results)

10384

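The result has more rows than there are works (10384 vs. 10081). At least some of the extra rows come from spelling variants of an author's name (see the two rows for "Sabrás, querido Fabio" above). No poem with more than one author shows up in the sample, but this could happen, so the query may need to be adapted.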
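To check the result for poems with more than one author, the distinct agents per work can be counted; counting Agent URIs rather than rows keeps name variants from looking like co-authors. A minimal pandas sketch on a toy DataFrame shaped like `results` above (the helper name and the toy URIs and names are hypothetical):

```python
import pandas as pd


def works_with_several_agents(df: pd.DataFrame) -> pd.Series:
    """Number of distinct Agent URIs per work, for works with more than one.

    Rows that differ only in the spelling of the same agent's name count once;
    works without any agent (NaN from the OPTIONAL) count as 0 and are dropped.
    """
    per_work = df.groupby("Work")["Agent"].nunique()
    return per_work[per_work > 1]


# Toy data shaped like the query result above; URIs and names are hypothetical.
toy = pd.DataFrame({
    "Work": ["pw_a", "pw_a", "pw_b", "pw_b", "pw_c"],
    "Agent": ["p_x", "p_x", "p_y", "p_z", None],
    "Name": ["Name X", "Name X (variant)", "Name Y", "Name Z", None],
})
print(works_with_several_agents(toy))
```

On the real data this would be `works_with_several_agents(results)`; an empty Series means no multi-author poems in the result.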

#### Author(s) of a given PoeticWork

An adaptation of the query above that gets the authors of a given PoeticWork.

In [27]:
query = """
PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>

SELECT ?Agent ?Name WHERE {
    <$> a pdc:PoeticWork ;
        pdc:wasInitiatedBy ?WorkConception .
        
        ?WorkConception pdc:hasAgentRole ?AgentRole .
        
        ?AgentRole pdc:roleFunction <http://postdata.linhd.uned.es/kos/Creator> ; 
                   pdc:hasAgent ?Agent .
        
        OPTIONAL {
            ?Agent pdc:name ?Name .
        }
}
"""

In [28]:
uri = "http://postdata.linhd.uned.es/resource/pw_juana-ines-de-la-cruz_sabras-querido-fabio"
print(replace_placeholder(query,uri))
results = sparql(replace_placeholder(query,uri))


PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>

SELECT ?Agent ?Name WHERE {
    <http://postdata.linhd.uned.es/resource/pw_juana-ines-de-la-cruz_sabras-querido-fabio> a pdc:PoeticWork ;
        pdc:wasInitiatedBy ?WorkConception .
        
        ?WorkConception pdc:hasAgentRole ?AgentRole .
        
        ?AgentRole pdc:roleFunction <http://postdata.linhd.uned.es/kos/Creator> ; 
                   pdc:hasAgent ?Agent .
        
        OPTIONAL {
            ?Agent pdc:name ?Name .
        }
}



In [29]:
results

{'head': {'vars': ['Agent', 'Name']},
 'results': {'bindings': [{'Agent': {'type': 'uri',
     'value': 'http://postdata.linhd.uned.es/resource/p_juana-ines-de-la-cruz'},
    'Name': {'type': 'literal', 'value': 'Juana Inés de la Cruz'}},
   {'Agent': {'type': 'uri',
     'value': 'http://postdata.linhd.uned.es/resource/p_juana-ines-de-la-cruz'},
    'Name': {'type': 'literal', 'value': 'Juana Ines de La Cruz'}}]}}

An endpoint to get the creators of a poem, `/corpora/{corpus}/poems/{poem}/authors`, could return something along the lines of the DraCor API `https://dracor.org/api/corpora/{corpus}/play/{play}`:

```
"authors": [
    {
      "name": "Lessing, Gotthold Ephraim",
      "fullname": "Gotthold Ephraim Lessing",
      "shortname": "Lessing",
      "refs": [
        {
          "ref": "Q34628",
          "type": "wikidata"
        },
        {
          "ref": "118572121",
          "type": "pnd"
        }
      ]
    }
  ]
```

### Additional Info on a person
Properties from a person to an object (`?person ?p ?o`), where `pdc:` stands for `<http://postdata.linhd.uned.es/ontology/postdata-core#>` (e.g. `pdc:diedIn` is `<http://postdata.linhd.uned.es/ontology/postdata-core#diedIn>`):

* `pdc:diedIn`
* `pdc:wasBorn`
* `rdfs:label`
* `pdc:hasEducation`
* `pdc:ethnicity`
* `pdc:portrait`
* `<http://www.w3.org/2002/07/owl#sameAs>`
* `pdc:article`
* `pdc:description`
* `pdc:hasOccupation`
* `pdc:movement`
* `pdc:religiousAffiliation`
* `pdc:gender`
* `pdc:genre`

Inverse properties (from a subject to the person, `?s ?p ?person`):
* `pdc:hasAgent`
* `pdc:broughtIntoLife`
* `pdc:wasDeathOf`

Birth and death follow the CIDOC CRM event pattern; the birth or death event has:

* `pdc:hasTimeSpan`
* `pdc:tookPlaceAt`
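```
PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT ?person ?sameAs FROM <tag:stardog:api:context:local> WHERE {
  ?person a pdc:Person ;
          owl:sameAs ?sameAs .
}
```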
This gives the link to Wikidata, but only for 10 instances.

The same holds for `pdc:article`, which returns 20 instances (English and Spanish links), presumably for the same persons, as this query suggests:

```
PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT ?person ?article ?wikidata FROM <tag:stardog:api:context:local> WHERE {
  ?person a pdc:Person ;
          pdc:article ?article ;
          owl:sameAs ?wikidata .
}
```

There is more information on Birth and Death...

#### Wikidata

In DraCor, the Wikidata reference of an author looks like this:
```
"refs": [
            {
              "ref": "Q2497099",
              "type": "wikidata"
            }
        ]
```

In [30]:
def wikidata_uri_of_person(person_uri:str) -> str:
    """Get the WIKIDATA URI of a Person"""
    
    query = """
    PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>

    SELECT * FROM <tag:stardog:api:context:local> WHERE {
      <$> a pdc:Person ;
      owl:sameAs ?wikidata .
    }
    """
    sparql_results = sparql(replace_placeholder(query,person_uri))
    
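    bindings = sparql_results["results"]["bindings"]
    
    # Only a single, unambiguous owl:sameAs link to a Wikidata entity is used
    if len(bindings) == 1:
        value = bindings[0]["wikidata"]["value"]
        if value.startswith("http://www.wikidata.org/entity/"):
            return value
    
    # no link, several links, or a link that is not a Wikidata entity
    return None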
    

In [31]:
person_uri = "http://postdata.linhd.uned.es/resource/p_lope-de-vega"
wikidata_uri_of_person(person_uri)

'http://www.wikidata.org/entity/Q165257'

In [32]:
print(wikidata_uri_of_person("http://postdata.linhd.uned.es/resource/p_juana-ines-de-la-cruz"))

None


In [33]:
def wikidata_uri_to_author_ref(uri:str) -> dict:
    """Helper function to generate a ref for the refs of an author.
    Returns None if `uri` is not a Wikidata entity URI."""
    if uri is not None and "http://www.wikidata.org/entity/" in uri:
        wd = uri.replace("http://www.wikidata.org/entity/","")
        ref = {
            "ref" : wd ,
            "type" : "wikidata"
            }
        return ref
    return None
        

In [34]:
wikidata_uri_to_author_ref(wikidata_uri_of_person(person_uri))

{'ref': 'Q165257', 'type': 'wikidata'}

In [35]:
#if there is no wikidata uri
wikidata_uri_to_author_ref(wikidata_uri_of_person("http://postdata.linhd.uned.es/resource/p_juana-ines-de-la-cruz"))

In [36]:
def get_authors_of_poem(poem_uri:str, include_wikidata=False) -> list:
    """Get the authors of a poem.
    Returns the agents in an AgentRole with the roleFunction "Creator".
    
    Args:
        poem_uri (str): URI of a poeticWork
        include_wikidata (bool): add Wikidata refs where available (one extra query per author)
    Returns:
        list: List of dictionaries containing the author's name and uri.
    """
    # Grouping is needed because for some poems, e.g. http://postdata.linhd.uned.es/resource/pw_juana-ines-de-la-cruz_sabras-querido-fabio,
    # the author was returned several times
    
    query = """
        PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>

        SELECT ?Agent (SAMPLE(?PersName) AS ?Name)  WHERE {
            <$> a pdc:PoeticWork ;
            pdc:wasInitiatedBy ?WorkConception .
        
            ?WorkConception pdc:hasAgentRole ?AgentRole .
        
            ?AgentRole pdc:roleFunction <http://postdata.linhd.uned.es/kos/Creator> ; 
                   pdc:hasAgent ?Agent .
        
            OPTIONAL {
                ?Agent pdc:name ?PersName .
            }
    }
    GROUP BY ?Agent
    """
    sparql_results = sparql(replace_placeholder(query,poem_uri))
    
    authors = []
    for binding in sparql_results["results"]["bindings"]:
        author = {}
        # ?Name comes from an OPTIONAL pattern and may be unbound
        author["name"] = binding["Name"]["value"] if "Name" in binding else None
        author["uri"] = binding["Agent"]["value"]        
        
        if include_wikidata:
            try:
                wikidata = wikidata_uri_of_person(author["uri"])
                if wikidata is not None:
                    author["refs"] = [wikidata_uri_to_author_ref(wikidata)]
            except Exception:
                # the Wikidata ref is optional; skip it if the lookup fails
                pass
        authors.append(author)
        
    
    return authors

In [37]:
%%time
get_authors_of_poem(uri)

CPU times: user 2.6 ms, sys: 1.12 ms, total: 3.72 ms
Wall time: 150 ms


[{'name': 'Juana Ines de La Cruz',
  'uri': 'http://postdata.linhd.uned.es/resource/p_juana-ines-de-la-cruz'}]

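The ungrouped query above returned the same author twice. The reason is spelling variance: the person has two `pdc:name` values:
```
<http://postdata.linhd.uned.es/ontology/postdata-core#name>  "Juana Inés de la Cruz"
<http://postdata.linhd.uned.es/ontology/postdata-core#name>  "Juana Ines de La Cruz"
```
`SAMPLE` picks one of them arbitrarily, which is why the calls before and after this note return different spellings.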
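Instead of dropping one spelling with `SAMPLE`, the variants could be kept. A minimal sketch that post-processes the result of the ungrouped query (the `results` dict shown above), assuming DraCor's `alsoKnownAs` key (see the DraCor example of a single play below) is an acceptable place for them; `group_name_variants` is a hypothetical name:

```python
from typing import Dict, List, Optional


def group_name_variants(sparql_json: dict) -> List[Dict]:
    """Merge the rows of the ungrouped author query by Agent URI.

    The first name seen becomes "name"; other spellings go to "alsoKnownAs"
    (a key borrowed from DraCor's author objects). Which spelling comes first
    depends on the row order, which SPARQL does not fix without ORDER BY.
    """
    authors: Dict[str, Dict] = {}  # insertion-ordered since Python 3.7
    for row in sparql_json["results"]["bindings"]:
        uri = row["Agent"]["value"]
        name: Optional[str] = row["Name"]["value"] if "Name" in row else None
        author = authors.setdefault(uri, {"uri": uri, "name": name, "alsoKnownAs": []})
        if author["name"] is None:
            author["name"] = name
        elif name is not None and name != author["name"] and name not in author["alsoKnownAs"]:
            author["alsoKnownAs"].append(name)
    return list(authors.values())


# The result of the ungrouped query for "Sabrás, querido Fabio", as shown above
result = {"head": {"vars": ["Agent", "Name"]},
          "results": {"bindings": [
              {"Agent": {"type": "uri", "value": "http://postdata.linhd.uned.es/resource/p_juana-ines-de-la-cruz"},
               "Name": {"type": "literal", "value": "Juana Inés de la Cruz"}},
              {"Agent": {"type": "uri", "value": "http://postdata.linhd.uned.es/resource/p_juana-ines-de-la-cruz"},
               "Name": {"type": "literal", "value": "Juana Ines de La Cruz"}}]}}
print(group_name_variants(result))
```

Adding `ORDER BY ?Agent ?Name` to the query would make the choice of the main spelling reproducible, though not necessarily the preferred one.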

In [38]:
%%time
get_authors_of_poem(uri,include_wikidata=True)

CPU times: user 3.84 ms, sys: 1.32 ms, total: 5.15 ms
Wall time: 252 ms


[{'name': 'Juana Inés de la Cruz',
  'uri': 'http://postdata.linhd.uned.es/resource/p_juana-ines-de-la-cruz'}]

In [39]:
%%time
get_authors_of_poem("http://postdata.linhd.uned.es/resource/pw_lope-de-vega_1-al-sujeto-de-la-dama-que-le-dijo-dios-le-provea",include_wikidata=True)

CPU times: user 3.81 ms, sys: 1.44 ms, total: 5.25 ms
Wall time: 247 ms


[{'name': 'Lope de Vega',
  'uri': 'http://postdata.linhd.uned.es/resource/p_lope-de-vega',
  'refs': [{'ref': 'Q165257', 'type': 'wikidata'}]}]

### Authors
There is an endpoint in the official API to get authors, so I wouldn't really have to replicate that. Still, I need a list of authors to use in OpenRefine.

In [174]:
def get_authors(corpus=None) -> list:
    """Get a list of authors.
    
    Authors (Agents/Persons) are only included if they are 
    connected to a "WorkConception" in an "AgentRole" with the function "Creator".
        
    TODO: handle multiple corpora
    """
    
    # Standard SPARQL 1.1 does not allow an AS alias to reuse a variable that is
    # in scope in the WHERE pattern (e.g. SAMPLE(?Name) AS ?Name). Stardog accepted
    # it, but the pattern variables are renamed here to stay portable.
    query = """
    PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>

    SELECT ?Agent (SAMPLE(?PersName) AS ?Name) (SAMPLE(?BDate) AS ?BirthDate) (SAMPLE(?BPlace) AS ?BirthPlace) (SAMPLE(?DDate) AS ?DeathDate) (SAMPLE(?DPlace) AS ?DeathPlace) FROM <tag:stardog:api:context:local> WHERE {
        ?WorkConception a pdc:WorkConception ;
            pdc:hasAgentRole ?AgentRole .
        
        ?AgentRole pdc:roleFunction <http://postdata.linhd.uned.es/kos/Creator> ; 
            pdc:hasAgent ?Agent .
        
        OPTIONAL {
            ?Agent pdc:name ?PersName .
        }
        OPTIONAL {
            ?Agent pdc:wasBorn ?Birth .
            OPTIONAL {
                ?Birth pdc:hasTimeSpan ?BirthTS .
                ?BirthTS pdc:date ?BDate .
            }
            OPTIONAL {
                ?Birth pdc:tookPlaceAt ?BPlace .
            }
        }
        OPTIONAL {
            ?Agent pdc:diedIn ?Death .
            OPTIONAL {
                ?Death pdc:hasTimeSpan ?DeathTS .
                ?DeathTS pdc:date ?DDate .
            }
            OPTIONAL {
                ?Death pdc:tookPlaceAt ?DPlace .
            }
        }
    }
    GROUP BY ?Agent
    LIMIT 1000000
    """
    
    sparql_results = sparql(query)
    authors = []
    for binding in sparql_results["results"]["bindings"]:
        author = {}
        author["uri"] = binding["Agent"]["value"]
        # ?Name comes from an OPTIONAL pattern and may be unbound
        author["name"] = binding["Name"]["value"] if "Name" in binding else None
        if "BirthDate" in binding:
            author["birthDate"] = binding["BirthDate"]["value"].replace("T00:00:00Z","")
        if "BirthPlace" in binding:
            author["birthPlace"] =  {}
            author["birthPlace"]["ref"] = binding["BirthPlace"]["value"].replace("http://www.wikidata.org/entity/","")
            author["birthPlace"]["type"] = "wikidata"
        if "DeathDate" in binding:
            author["deathDate"] = binding["DeathDate"]["value"].replace("T00:00:00Z","")
        if "DeathPlace" in binding:
            author["deathPlace"] =  {}
            author["deathPlace"]["ref"] = binding["DeathPlace"]["value"].replace("http://www.wikidata.org/entity/","")
            author["deathPlace"]["type"] = "wikidata"
        authors.append(author)
    return authors


In [177]:
all_authors = get_authors()

In [178]:
len(all_authors)

1192

In [179]:
#store the authors
with open("authors.json", "w", encoding='utf-8') as outfile:
    json.dump(all_authors, outfile, ensure_ascii=False)
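OpenRefine can import the JSON file directly, but nested objects such as `birthPlace` may be easier to handle as flat columns. A minimal sketch using `pandas.json_normalize`; `authors_to_csv` and the sample records are hypothetical:

```python
import os
import tempfile

import pandas as pd


def authors_to_csv(authors: list, path: str) -> pd.DataFrame:
    """Flatten the author dicts and write them to a CSV file.

    json_normalize turns {"birthPlace": {"ref": ..., "type": ...}} into the
    columns "birthPlace.ref" and "birthPlace.type"; missing keys become NaN.
    """
    df = pd.json_normalize(authors)
    df.to_csv(path, index=False, encoding="utf-8")
    return df


# Hypothetical records in the shape produced by get_authors()
sample_authors = [
    {"uri": "http://example.org/p_a", "name": "Author A", "birthDate": "1600",
     "birthPlace": {"ref": "Q123", "type": "wikidata"}},
    {"uri": "http://example.org/p_b", "name": "Author B"},
]
csv_path = os.path.join(tempfile.gettempdir(), "authors_sample.csv")
sample_df = authors_to_csv(sample_authors, csv_path)
print(sample_df.columns.tolist())
```

For the real data this would be `authors_to_csv(all_authors, "authors.csv")`.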

### Information on a single corpus (the whole graph)
DraCor's endpoint `/corpora` returns information on all corpora in the database. This endpoint is used to display the stats by setting the parameter `include=metrics`. See https://dracor.org/doc/api#/public/list-corpora. The data on a single corpus:

```
{
    "licence": "CC BY-NC 3.0",
    "licenceUrl": "https://creativecommons.org/licenses/by-nc/3.0/deed.en_US",
    "description": "Derived from the [Folger Shakespeare Library](https://shakespeare.folger.edu/). Enhancements documented in our [README at GitHub](https://github.com/dracor-org/shakedracor).",
    "uri": "https://dracor.org/api/corpora/shake",
    "title": "Shakespeare Drama Corpus",
    "name": "shake",
    "acronym": "ShakeDraCor",
    "metrics": {
      "plays": 37,
      "characters": 1433,
      "male": 797,
      "female": 116,
      "text": 37,
      "sp": 31066,
      "stage": 10450,
      "wordcount": {
        "text": 908286,
        "sp": 876744,
        "stage": 41230
      },
      "updated": "2022-07-02T23:36:24.109+02:00"
    },
    "repository": "https://github.com/dracor-org/shakedracor"
  }
```
We cannot really list "corpora" in the POSTDATA Knowledge Graph because only one is included, but we still have to implement an endpoint that provides information for the front page.

We can at least provide some metrics, e.g.

```
"metrics": {
      "authors" : 1,
      "poems": 1,
      "stanzas": 1,
      "verses": 1,
      "words" : 1
      }
```

In [40]:
def count_works(corpus=None) -> int:
    """Count poeticWorks in a corpus.
    
    Returns:
        int: Number of poetic works/poems.
        
    TODO: handle multiple corpora
    """
    
    query = """
    PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>
    
    SELECT (COUNT(?poeticWork) AS ?count) FROM <tag:stardog:api:context:local> WHERE {
    ?poeticWork a pdc:PoeticWork .
    } 
    LIMIT 1000000
    """
    
    sparql_results = sparql(query)
    work_count = int(sparql_results["results"]["bindings"][0]["count"]["value"])
    
    return work_count

In [41]:
count_works()

10071

In [42]:
def count_stanzas(corpus=None) -> int:
    """Count stanzas in a corpus.
    
    Returns:
        int: Number of stanzas.
        
    TODO: handle multiple corpora
    """
    
    #For the production version we explicitly have to include the Union Graph with FROM to get results
    query = """
    PREFIX pdp: <http://postdata.linhd.uned.es/ontology/postdata-poeticAnalysis#>
    
    SELECT (COUNT(?Stanza) AS ?count) FROM <tag:stardog:api:context:local> WHERE {
        ?Stanza a pdp:Stanza .
    }
    """
    
    sparql_results = sparql(query)
    stanza_count = int(sparql_results["results"]["bindings"][0]["count"]["value"])
    return stanza_count

In [43]:
count_stanzas()

81122

In [44]:
def count_verses(corpus=None) -> int:
    """Count verses/lines in a corpus.
    
    Returns:
        int: Number of verselines.
        
    TODO: handle multiple corpora
    """
    
    query = """
    PREFIX pdp: <http://postdata.linhd.uned.es/ontology/postdata-poeticAnalysis#>
    
    SELECT (COUNT(?line) AS ?count) FROM <tag:stardog:api:context:local> WHERE {
        ?line a pdp:Line .
    } 
    LIMIT 1000000
    """
    
    sparql_results = sparql(query)
    verses_count = int(sparql_results["results"]["bindings"][0]["count"]["value"])
    return verses_count

In [45]:
count_verses()

544498

In [46]:
def count_words(corpus=None) -> int:
    """Count words in a corpus.
    
    Returns:
        int: Number of words.
        
    TODO: handle multiple corpora
    """
    
    query = """
    PREFIX pdp: <http://postdata.linhd.uned.es/ontology/postdata-poeticAnalysis#>
    
    SELECT (COUNT(?word) AS ?count) FROM <tag:stardog:api:context:local> WHERE {
        ?word a <http://postdata.linhd.uned.es/ontology/postdata-poeticAnalysis#Word> .
    } 
    LIMIT 1000000
    """
    
    sparql_results = sparql(query)
    word_count = int(sparql_results["results"]["bindings"][0]["count"]["value"])
    return word_count

In [47]:
count_words()

2988230

There are two kinds of syllables: "GrammaticalSyllables" `<http://postdata.linhd.uned.es/ontology/postdata-poeticAnalysis#GrammaticalSyllable>` and "MetricalSyllables" `<http://postdata.linhd.uned.es/ontology/postdata-poeticAnalysis#MetricalSyllable>`.

In [48]:
def count_metrical_syllables(corpus=None) -> int:
    """Count metrical syllables in a corpus.
    
    Returns:
        int: number of metrical syllables
    """
    
    query = """
    PREFIX pdp: <http://postdata.linhd.uned.es/ontology/postdata-poeticAnalysis#>
    
    SELECT (COUNT(?syllable) AS ?count) FROM <tag:stardog:api:context:local> WHERE {
        ?syllable a <http://postdata.linhd.uned.es/ontology/postdata-poeticAnalysis#MetricalSyllable> .
    } 
    LIMIT 1000000
    """
    
    sparql_results = sparql(query)
    syllable_count = int(sparql_results["results"]["bindings"][0]["count"]["value"])
    
    return syllable_count
    

In [49]:
count_metrical_syllables()

1259036

In [50]:
def count_grammatical_syllables(corpus=None) -> int:
    """Count grammatical syllables in a corpus.
    
    Returns:
        int: number of grammatical syllables
    """
    
    query = """
    PREFIX pdp: <http://postdata.linhd.uned.es/ontology/postdata-poeticAnalysis#>
    
    SELECT (COUNT(?syllable) AS ?count) FROM <tag:stardog:api:context:local> WHERE {
        ?syllable a <http://postdata.linhd.uned.es/ontology/postdata-poeticAnalysis#GrammaticalSyllable> .
    } 
    LIMIT 1000000
    """
    
    sparql_results = sparql(query)
    syllable_count = int(sparql_results["results"]["bindings"][0]["count"]["value"])
    
    return syllable_count

In [51]:
count_grammatical_syllables()

2116388

Authors (Agents/Persons) should only be counted if they are connected to a WorkConception in an AgentRole with the role function "Creator" (see above).

In [52]:
def count_authors(corpus=None) -> int:
    """Count authors in a corpus.
    
    Authors (Agents/Persons) are only counted if they are 
    connected to a "WorkConception" in an "AgentRole" with the function "Creator".
    
    Returns:
        int: Number of authors.
        
    TODO: handle multiple corpora
    """
    
    query = """
    PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>

    SELECT (COUNT(DISTINCT ?Agent) AS ?count) FROM <tag:stardog:api:context:local> WHERE {
        ?WorkConception a pdc:WorkConception ;
            pdc:hasAgentRole ?AgentRole .
        
        ?AgentRole pdc:roleFunction <http://postdata.linhd.uned.es/kos/Creator> ; 
            pdc:hasAgent ?Agent .
    }
    LIMIT 1000000
    """
    
    sparql_results = sparql(query)
    authors_count = int(sparql_results["results"]["bindings"][0]["count"]["value"])
    return authors_count

In [53]:
count_authors()

1192
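`count_works`, `count_stanzas`, `count_verses`, `count_words` and the two syllable counters differ only in the class they count (`count_authors` follows the AgentRole path and does not fit this pattern). A minimal sketch of one parametrised helper; the names are hypothetical, and passing the query function in as `run` (e.g. `conn.select`) lets it be tried without a server:

```python
from typing import Callable

PDP = "http://postdata.linhd.uned.es/ontology/postdata-poeticAnalysis#"


def build_count_query(class_uri: str) -> str:
    """COUNT query for one class over Stardog's union graph (as in the functions above)."""
    return (
        "SELECT (COUNT(?s) AS ?count) FROM <tag:stardog:api:context:local> "
        f"WHERE {{ ?s a <{class_uri}> . }}"
    )


def count_instances(class_uri: str, run: Callable[[str], dict]) -> int:
    """Run the COUNT query and read the single value from the SPARQL JSON result.

    An aggregate without GROUP BY returns exactly one row, so no LIMIT is needed.
    """
    result = run(build_count_query(class_uri))
    return int(result["results"]["bindings"][0]["count"]["value"])


# With a live connection (not run here):
# count_instances(PDP + "Stanza", conn.select)
# count_instances("http://postdata.linhd.uned.es/ontology/postdata-core#PoeticWork", conn.select)
```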

#### Combine the corpus metrics

In [54]:
def get_corpus_metrics(corpus=None) -> dict:
    """Get metrics for a given corpus.
    
    Returns:
        dict: corpus metrics
    """
    
    metrics = {}
    metrics["authors"] = count_authors(corpus)
    metrics["poems"] = count_works(corpus)
    metrics["stanzas"] = count_stanzas(corpus)
    metrics["verses"] = count_verses(corpus)
    metrics["words"] = count_words(corpus)
    metrics["grammaticalSyllables"] = count_grammatical_syllables(corpus)
    metrics["metricalSyllables"] = count_metrical_syllables(corpus)
    
    return metrics 

In [55]:
%%time
get_corpus_metrics()

CPU times: user 9.68 ms, sys: 1.61 ms, total: 11.3 ms
Wall time: 3.88 s


{'authors': 1192,
 'poems': 10071,
 'stanzas': 81122,
 'verses': 544498,
 'words': 2988230,
 'grammaticalSyllables': 2116388,
 'metricalSyllables': 1259036}

The numbers might be wrong: if a poem has two scansions (which is almost always the case), I think these corpus-wide queries count its stanzas, verses, words and syllables twice.

#### Corpus info
Because we don't really have metadata on the corpus, we fake this here for demonstrator purposes.

DraCor example: see the `/corpora` output for the Shakespeare Drama Corpus above.

In [56]:
def get_corpus_info(corpus=None, metrics=False) -> dict:
    """Get information on a corpus.
    
    Args:
        corpus (optional): select a corpus (not implemented yet). Defaults to None.
        metrics (optional): include corpus metrics. Defaults to False.
    Returns:
        dict: information on the given corpus.
    TODO: include more data (e.g. repository; see DraCor output)
    """
    # There is no mechanism yet to filter by corpus, so only the default corpus is described
    
    corpus_data = {}
    
    if corpus is None:
        # corpus defaults to None --> get the default POSTDATA corpus
        corpus_data["name"] = "postdata"
        corpus_data["title"] = "POSTDATA Corpus"
        corpus_data["description"] = "POSTDATA Knowledge Graph of Poetry. See https://postdata.linhd.uned.es"
    
    if metrics:
        corpus_data["metrics"] = get_corpus_metrics(corpus)
        
    return corpus_data  

In [57]:
get_corpus_info(metrics=True)

{'name': 'postdata',
 'title': 'POSTDATA Corpus',
 'description': 'POSTDATA Knowledge Graph of Poetry. See https://postdata.linhd.uned.es',
 'metrics': {'authors': 1192,
  'poems': 10071,
  'stanzas': 81122,
  'verses': 544498,
  'words': 2988230,
  'grammaticalSyllables': 2116388,
  'metricalSyllables': 1259036}}

#### List of available corpora
For the DraCor frontend (which uses `/corpora` with the parameter `include=metrics`), the data on the corpora must be wrapped in an array, even though there is only one corpus in the demonstrator.

In [58]:
def get_corpora(metrics=False) -> list:
    """Get a list of corpora
    
    Only one corpus is returned at the moment!
    TODO: handle more corpora
    """
    data = get_corpus_info(metrics=metrics)
    
    return [data]

In [59]:
get_corpora(metrics=True)

[{'name': 'postdata',
  'title': 'POSTDATA Corpus',
  'description': 'POSTDATA Knowledge Graph of Poetry. See https://postdata.linhd.uned.es',
  'metrics': {'authors': 1192,
   'poems': 10071,
   'stanzas': 81122,
   'verses': 544498,
   'words': 2988230,
   'grammaticalSyllables': 2116388,
   'metricalSyllables': 1259036}}]

### Data on a single Poem
(as included in the list of poems returned by the DraCor `/corpora/{corpus}` endpoint)

See the DraCor example:
```
{
      "writtenYear": "1908",
      "wikidataId": "Q25556355",
      "source": "Татарская электронная библиотека",
      "id": "tat000001",
      "title": "Беренче театр",
      "sourceUrl": "http://kitap.net.ru/galiaskar/2.php",
      "networkSize": "7",
      "name": "qamal-berenche-teatr",
      "yearNormalized": 1908,
      "printYear": null,
      "subtitle": "Комедия 1 пәрдәдә",
      "premiereYear": null,
      "authors": [
        {
          "name": "Камал, Галиәсгар",
          "fullname": "Галиәсгар Камал",
          "shortname": "Камал",
          "refs": [
            {
              "ref": "Q2497099",
              "type": "wikidata"
            }
          ],
          "fullnameEn": "Ğäliäsğar Kamal",
          "nameEn": "Kamal, Ğäliäsğar",
          "shortnameEn": "Kamal",
          "alsoKnownAs": [
            "Ğäliäsğar Kamal"
          ]
        }
      ],
      "networkdataCsvUrl": "https://dracor.org/api/corpora/tat/play/qamal-berenche-teatr/networkdata/csv",
      "author": {
        "name": "Камал, Галиәсгар"
      },
      "subtitleEn": "Comedy in 1 Act",
      "titleEn": "First Theatre"
    }
```

In [60]:
# poem used for testing
poem_uri = "http://postdata.linhd.uned.es/resource/pw_juana-ines-de-la-cruz_sabras-querido-fabio"

In [61]:
#Function to get title of poem
def get_poem_title(poem_uri:str) -> str:
    """Get the title of a poem
    
    Args:
        poem_uri (str): URI of the poem
    
    Returns:
        str: Title of the poem
    """
    
    query = """
    PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>

    SELECT ?title FROM <tag:stardog:api:context:local> WHERE {
        <$> a pdc:PoeticWork ;
            pdc:title ?title.
    }
    """
    
    sparql_results = sparql(replace_placeholder(query,poem_uri))
    title = str(sparql_results["results"]["bindings"][0]["title"]["value"])
    return title

In [62]:
get_poem_title(poem_uri)

'Sabrás, querido Fabio'

In [182]:
def get_poem_creation_year(poem_uri:str) -> str:
    """Get the year attributed to the creation of the work.
    
    Returns None if there is no date or if it is marked as uncertain (contains "?").
    """
    query = """
    PREFIX pdp: <http://postdata.linhd.uned.es/ontology/postdata-poeticAnalysis#>
    PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>

    SELECT ?CreationDate FROM <tag:stardog:api:context:local> WHERE {
        ?Creation pdc:initiated <$>;
              pdc:hasTimeSpan ?ts.
  
      ?ts pdc:date ?CreationDate.
    } 
    LIMIT 1000000
    """
    # The date can be an <http://www.w3.org/2001/XMLSchema#date>; then it is more or less valid.
    # Unclear dates are marked: ¿?, but also ¿Ca. 1580? or ¿1603? (not sure if these are all the available modifiers)
    # Example that works: http://postdata.linhd.uned.es/resource/pw_gongora-luis-de_la-que-ya-fue-de-las-aves
    # Example with an uncertain date: http://postdata.linhd.uned.es/resource/pw_gongora-luis-de_cantemos-a-la-jineta
    # TODO: maybe throw an exception instead of returning None?
    
    sparql_results = sparql(replace_placeholder(query,poem_uri))
    bindings = sparql_results["results"]["bindings"]
    if not bindings:
        # no creation date in the graph
        return None
    date = bindings[0]["CreationDate"]["value"]
    if "?" not in date:
        year = date.split("-")[0]
        return year
    return None
    

# CONTINUE HERE

In [183]:
get_poem_creation_year("http://postdata.linhd.uned.es/resource/pw_gongora-luis-de_la-que-ya-fue-de-las-aves")

'1611'
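The comments in `get_poem_creation_year` note that dates can carry uncertainty markers such as `¿?`, `¿Ca. 1580?` or `¿1603?`, and the function returns `None` for all of them. A minimal sketch of an alternative that still extracts a year but flags it as uncertain, assuming that the year is always the first run of exactly four digits in the literal and that `¿`, `?` or `Ca.` are the only uncertainty markers; `parse_creation_year` is a hypothetical helper, not part of the notebook.

```python
import re
from typing import Optional, Tuple

# First run of exactly four digits, assumed to be the year
YEAR_PATTERN = re.compile(r"(?<!\d)(\d{4})(?!\d)")

def parse_creation_year(date: str) -> Tuple[Optional[str], bool]:
    """Split a POSTDATA date literal into (year, is_uncertain).

    Assumption: uncertain dates are wrapped in Spanish question marks ("¿...?")
    and may contain "Ca." (circa); everything else is treated as exact.
    """
    is_uncertain = "?" in date or "¿" in date or "ca." in date.lower()
    match = YEAR_PATTERN.search(date)
    year = match.group(1) if match else None
    return year, is_uncertain
```

For an `xsd:date` value such as `1611-01-01` this returns `("1611", False)`, for `¿Ca. 1580?` it returns `("1580", True)`, and for `¿?` it returns `(None, True)`, so the caller can decide whether uncertain years should be shown.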

In [63]:
#Function to convert poem uri to postdata poetry lab link
# this will be used in "sourceUrl" (which is somewhat wrong, but will do because it links back to poetry lab)

def work_uri_to_poetry_lab_url(poem_uri:str) -> str:
    """Convert the URI of a poem into a link to poetry lab platform
    """
    poetry_lab_base_url = "http://poetry.linhd.uned.es:3000" + "/en/"
    
    #In the Graph: http://postdata.linhd.uned.es/resource/pw_juana-ines-de-la-cruz_sabras-querido-fabio
    # On the platform: http://poetry.linhd.uned.es:3000/es/author/juana-ines-de-la-cruz/poetic-work/sabras-querido-fabio
    #Split on "_"
    
    author_part = poem_uri.split("_")[1]
    title_part = poem_uri.split("_")[2]
    
    poetry_lab_url = poetry_lab_base_url + "author/" + author_part + "/poetic-work/" + title_part
    
    return poetry_lab_url

In [64]:
work_uri_to_poetry_lab_url(poem_uri)

'http://poetry.linhd.uned.es:3000/en/author/juana-ines-de-la-cruz/poetic-work/sabras-querido-fabio'

In [65]:
#in DraCor we have id/uri and a name. Don't know if this is feasible
def work_uri_to_poem_name(poem_uri:str) -> str:
    """Convert the URI to a local name consisting of author + "_" + title
    """
    author_part = poem_uri.split("_")[1]
    title_part = poem_uri.split("_")[2]
    
    poem_name = author_part + "_" + title_part
    
    return poem_name

In [66]:
work_uri_to_poem_name(poem_uri)

'juana-ines-de-la-cruz_sabras-querido-fabio'

In [67]:
def poem_name_to_work_uri(poem_name:str) -> str:
    """Convert poem_name back to URI"""
    uri = "http://postdata.linhd.uned.es/resource/" + "pw_" + poem_name
    return uri

In [68]:
poem_name_to_work_uri("juana-ines-de-la-cruz_sabras-querido-fabio")

'http://postdata.linhd.uned.es/resource/pw_juana-ines-de-la-cruz_sabras-querido-fabio'

In [69]:
#have not tested this!
def author_title_parts_to_poem_uri(author_part:str,title_part:str) -> str:
    """Concat author part and title part to poem URI"""
    uri = "http://postdata.linhd.uned.es/resource/" + "pw_" + author_part + "_" + title_part
    return uri
    

In [70]:
author_title_parts_to_poem_uri("juana-ines-de-la-cruz", "sabras-querido-fabio")

'http://postdata.linhd.uned.es/resource/pw_juana-ines-de-la-cruz_sabras-querido-fabio'
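The helpers above split the URI on every `_`, which assumes that neither the base URL nor the author or title slugs contain an underscore. A slightly more defensive sketch, assuming POSTDATA work URIs always have the form `http://postdata.linhd.uned.es/resource/pw_{author}_{title}`: it checks the prefix and splits only at the first `_` after it, so an underscore in the title would stay in the title part (an underscore in the author slug would still be ambiguous). `split_poem_uri` is a hypothetical helper.

```python
from typing import Tuple

RESOURCE_BASE = "http://postdata.linhd.uned.es/resource/"
WORK_PREFIX = "pw_"

def split_poem_uri(poem_uri: str) -> Tuple[str, str]:
    """Return (author_part, title_part) of a PoeticWork URI."""
    if not poem_uri.startswith(RESOURCE_BASE + WORK_PREFIX):
        raise ValueError(f"Not a POSTDATA PoeticWork URI: {poem_uri}")
    local_name = poem_uri[len(RESOURCE_BASE + WORK_PREFIX):]
    # partition splits at the first "_" only; the rest is treated as the title slug
    author_part, sep, title_part = local_name.partition("_")
    if not sep or not author_part or not title_part:
        raise ValueError(f"Cannot split author and title in: {poem_uri}")
    return author_part, title_part
```

Rebuilding the URI from the two parts (as `author_title_parts_to_poem_uri` does) should give back the original URI, which makes a cheap round-trip check.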

In [71]:
import hashlib
def shorthash(textstring:str, chars:int=8) -> str:
    """Create a truncated SHA-1 hash"""
    #chars sets the number of characters to keep
    full_hash = hashlib.sha1(textstring.encode("UTF-8")).hexdigest()
    return full_hash[:chars]

In [72]:
shorthash(poem_uri)

'572fb37a'

In [73]:
def poem_uri_to_id(poem_uri:str,prefix:str="pd") -> str:
    """Generate an ID by hashing the uri"""
    return prefix + "_" + shorthash(poem_uri)

In [74]:
poem_uri_to_id(poem_uri)

'pd_572fb37a'
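`poem_uri_to_id` keeps only the first 8 hex characters (32 bits) of the hash, so two different URIs could end up with the same ID. With the birthday approximation, for n = 10071 poems (the count returned by `get_poem_uris()` below) the chance of at least one collision is 1 − exp(−n(n−1)/2³³), roughly 1.2%. A small sketch to estimate that probability and to detect actual collisions in a list of URIs; the helper names are hypothetical and `shorthash` is repeated here so the block runs on its own.

```python
import hashlib
import math
from collections import defaultdict
from typing import Dict, List

def shorthash(textstring: str, chars: int = 8) -> str:
    """Truncated SHA-1 hex digest (same logic as the notebook's shorthash)."""
    return hashlib.sha1(textstring.encode("UTF-8")).hexdigest()[:chars]

def collision_probability(n: int, chars: int = 8) -> float:
    """Birthday approximation: n items hashed into 16**chars buckets."""
    buckets = 16 ** chars
    return 1 - math.exp(-n * (n - 1) / (2 * buckets))

def find_id_collisions(uris: List[str], chars: int = 8) -> Dict[str, List[str]]:
    """Return every short hash that is shared by more than one URI."""
    by_hash = defaultdict(list)
    for uri in uris:
        by_hash[shorthash(uri, chars)].append(uri)
    return {h: group for h, group in by_hash.items() if len(group) > 1}
```

Running `find_id_collisions(get_poem_uris())` once against the live data would show whether the 8-character IDs are actually unique; if not, increasing `chars` is the simplest fix.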

In [75]:
# Dates might now be in the live version, could try to get additional info on authors and works

# DATES ?

In [76]:
def get_poem_metadata(poem_uri:str, **kwargs) -> dict:
    """Get Metadata of a single poem.
    """
    
    poem_data = {}
    #id is a truncated SHA-1 hash of the poem uri
    poem_data["id"] = poem_uri_to_id(poem_uri)
    poem_data["uri"] = poem_uri
    
    #not sure if this is robust: it only works for POSTDATA and assumes that the URIs are always structured the same way
    poem_data["name"] = work_uri_to_poem_name(poem_uri)
    poem_data["title"] = get_poem_title(poem_uri)
    
    #Wikidata refs of the authors can be included with include_wikidata=True; might not be foolproof
    if kwargs and "include_wikidata" in kwargs:
        poem_data["authors"] = get_authors_of_poem(poem_uri, include_wikidata=kwargs["include_wikidata"])
    else:
        poem_data["authors"] = get_authors_of_poem(poem_uri)
        
    
    
    # this only works for postdata
    poem_data["source"] = "POSTDATA Poetry Lab"
    poem_data["sourceUrl"] = work_uri_to_poetry_lab_url(poem_uri)
    
    return poem_data

In [77]:
#test this
get_poem_metadata(poem_uri)

{'id': 'pd_572fb37a',
 'uri': 'http://postdata.linhd.uned.es/resource/pw_juana-ines-de-la-cruz_sabras-querido-fabio',
 'name': 'juana-ines-de-la-cruz_sabras-querido-fabio',
 'title': 'Sabrás, querido Fabio',
 'authors': [{'name': 'Juana Ines de La Cruz',
   'uri': 'http://postdata.linhd.uned.es/resource/p_juana-ines-de-la-cruz'}],
 'source': 'POSTDATA Poetry Lab',
 'sourceUrl': 'http://poetry.linhd.uned.es:3000/en/author/juana-ines-de-la-cruz/poetic-work/sabras-querido-fabio'}

In [78]:
%%time
#with wikidata
get_poem_metadata("http://postdata.linhd.uned.es/resource/pw_lope-de-vega_1-al-sujeto-de-la-dama-que-le-dijo-dios-le-provea",include_wikidata=True)

CPU times: user 5 ms, sys: 1.44 ms, total: 6.44 ms
Wall time: 270 ms


{'id': 'pd_0360be3e',
 'uri': 'http://postdata.linhd.uned.es/resource/pw_lope-de-vega_1-al-sujeto-de-la-dama-que-le-dijo-dios-le-provea',
 'name': 'lope-de-vega_1-al-sujeto-de-la-dama-que-le-dijo-dios-le-provea',
 'title': '- 1 - Al sujeto de la dama que le dijo «Dios le provea» ',
 'authors': [{'name': 'Lope de Vega',
   'uri': 'http://postdata.linhd.uned.es/resource/p_lope-de-vega',
   'refs': [{'ref': 'Q165257', 'type': 'wikidata'}]}],
 'source': 'POSTDATA Poetry Lab',
 'sourceUrl': 'http://poetry.linhd.uned.es:3000/en/author/lope-de-vega/poetic-work/1-al-sujeto-de-la-dama-que-le-dijo-dios-le-provea'}

### List of Works needed by the DraCor frontend

To create a view like https://dracor.org/ger, we need to return a response like the one of the endpoint https://dracor.org/doc/api#/public/list-corpus-content:

```
{
  "description": "Edited by Daniil Skorinkin and Frank Fischer. Features a handful of plays in Tatar language, provided through Tatar Electronic Library.",
  "title": "Tatar Drama Corpus",
  "repository": "https://github.com/dracor-org/tatdracor",
  "name": "tat",
  "dramas": [
    {
      "writtenYear": "1908",
      "wikidataId": "Q25556355",
      "source": "Татарская электронная библиотека",
      "id": "tat000001",
      "title": "Беренче театр",
      "sourceUrl": "http://kitap.net.ru/galiaskar/2.php",
      "networkSize": "7",
      "name": "qamal-berenche-teatr",
      "yearNormalized": 1908,
      "printYear": null,
      "subtitle": "Комедия 1 пәрдәдә",
      "premiereYear": null,
      "authors": [
        {
          "name": "Камал, Галиәсгар",
          "fullname": "Галиәсгар Камал",
          "shortname": "Камал",
          "refs": [
            {
              "ref": "Q2497099",
              "type": "wikidata"
            }
          ],
          "fullnameEn": "Ğäliäsğar Kamal",
          "nameEn": "Kamal, Ğäliäsğar",
          "shortnameEn": "Kamal",
          "alsoKnownAs": [
            "Ğäliäsğar Kamal"
          ]
        }
      ],
      "networkdataCsvUrl": "https://dracor.org/api/corpora/tat/play/qamal-berenche-teatr/networkdata/csv",
      "author": {
        "name": "Камал, Галиәсгар"
      },
      "subtitleEn": "Comedy in 1 Act",
      "titleEn": "First Theatre"
    }
  ],
  "acronym": "TatDraCor"
}
```

In [79]:
def get_poem_uris(corpus=None) -> list:
    """Helper function to get a list of URIs of PoeticWorks
    """
    query = """
    PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>

    SELECT ?work WHERE {
        ?work a pdc:PoeticWork .
    }
    LIMIT 1000000
    """
    
    sparql_results = sparql(query)
    bindings = sparql_results["results"]["bindings"]
    
    poem_uris = []
    
    for binding in bindings:
        poem_uris.append(binding["work"]["value"])
    
    return poem_uris

In [80]:
#get_poem_uris() returns a very long list
len(get_poem_uris())

10071

In [81]:
def get_corpus_content(corpus=None, **kwargs) -> dict:
    """Returns metadata on the corpus.
        
        Similar to DraCor's https://dracor.org/doc/api#/public/list-corpus-content
        
        Returns:
            dict: data on the corpus listing all the poems
    """
    
    corpus_data = get_corpus_info(metrics=False)
    
    corpus_data["poems"] = []
    
    poem_uris = get_poem_uris()
    
    for poem_uri in poem_uris:
        poem_data = get_poem_metadata(poem_uri)
        corpus_data["poems"].append(poem_data)
    
    return corpus_data

In [82]:
#%%time
#this is a little bit slow, hmpf
#uncomment to show
#get_corpus_content()

In [83]:
#this works with the poetry lab live system, but it takes a very, very long time...
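`get_corpus_content` sends several sequential SPARQL requests per poem (title, authors, ...) for about 10,000 poems, so most of the time is spent waiting on the network. One possible speed-up is to fetch the poem metadata concurrently with a thread pool. This is a sketch under two unverified assumptions: that the `sparql()` helper is thread-safe and that the Poetry Lab Stardog server tolerates a few parallel requests. `map_concurrently` is a hypothetical helper built on the standard library's `concurrent.futures`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")

def map_concurrently(func: Callable[[T], R], items: Iterable[T], max_workers: int = 8) -> List[R]:
    """Apply func to every item using a thread pool; results keep the input order.

    An exception raised by func is re-raised when the results are collected.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(func, items))
```

Inside `get_corpus_content`, the loop could then become `corpus_data["poems"] = map_concurrently(get_poem_metadata, get_poem_uris(), max_workers=4)`. Keeping `max_workers` small avoids overloading a shared server.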

In [84]:
#create example data for postman mock server
example_data = get_corpus_content()
with open("corpus_content_example.json", "w", encoding='utf-8') as outfile:
    json.dump(example_data, outfile, ensure_ascii=False)

In [85]:
#example poem uri
poem_uri

'http://postdata.linhd.uned.es/resource/pw_juana-ines-de-la-cruz_sabras-querido-fabio'

### Text of a single poem
The text can be retrieved from the Redaction, which has a property `pdc:text`; the lines are separated by newlines (`\n`). Later, we might use the `Accept` header: `text/plain` would return the plain text with `\n` and no stanza structure, while a second version would display the poem according to the POSTDATA JSON, with stanzas as lists, e.g.:

```
{
        "author": "Abschatz, Hans A\u00dfmann von",
        "authorRef": "pnd:118500279",
        "publicationDate": "1970",
        "title": "Die fremde Regung",
        "text": [
            [
                "Im Mittel aller Lust/ die Gl\u00fcck und Zeit mir geben/",
                "Kan ich ohn Silvien nicht fr\u00f6lich leben;",
                "Und wenn ich bey ihr bin/ so spielet um mein Hertz",
                "Ein angenehmer Schmertz."
            ],
            [
                "Mein Sinn f\u00fchlt sich gereizt von unbekandtem Triebe/",
                "Ich such/ und treffe sie doch ohne Furcht nicht an.",
                "Wofern ein Mensch iemahls unwissend lieben kan/",
                "So glaub ich/ da\u00df ich liebe."
            ]
        ]
    }
```

In [86]:
#use poem with stanzas as example 
poem_uri = "http://postdata.linhd.uned.es/resource/pw_carlos-mendoza_noviembre"

In [87]:
def get_poem_plaintext(poem_uri:str) -> str:
    """Returns the text of a poem"""
    
    query = """
    PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>
    
    SELECT ?Text FROM <tag:stardog:api:context:local> WHERE {
        <$> pdc:isRealisedThrough ?Redaction .
        ?Redaction pdc:text ?Text .
    }
    """
    
    sparql_results = sparql(replace_placeholder(query,poem_uri))
    text = sparql_results["results"]["bindings"][0]["Text"]["value"]
    return text

In [88]:
get_poem_plaintext(poem_uri)

'Conmemórase ahora los difuntos,\nva la gente en tropel al Campo santo,\ny oye luego el Tenorio con encanto,\nteatro y devoción andando juntos.\n\nSon la ropa y la lumbre los asuntos\nque ocasionan el mayor quebranto,\nteniéndose los pobres, con espanto,\nsin fuego, por cadáveres presuntos.\n\nS abren de los ricos los salones,\nse cierran como pueden los desvanes,\ngoza, el que tiene, caras diversiones;\n\npadece, el que no tiene, mil afanes;\ny si hay quien se duerme entre edredones\nmuchos más se adormitan como Adanes.'

In [89]:
print(get_poem_plaintext(poem_uri))

Conmemórase ahora los difuntos,
va la gente en tropel al Campo santo,
y oye luego el Tenorio con encanto,
teatro y devoción andando juntos.

Son la ropa y la lumbre los asuntos
que ocasionan el mayor quebranto,
teniéndose los pobres, con espanto,
sin fuego, por cadáveres presuntos.

S abren de los ricos los salones,
se cierran como pueden los desvanes,
goza, el que tiene, caras diversiones;

padece, el que no tiene, mil afanes;
y si hay quien se duerme entre edredones
muchos más se adormitan como Adanes.


#### Text as JSON

A list of stanzas, each of which is a list of lines:

```
[
    [
    "Im Mittel aller Lust/ die Gl\u00fcck und Zeit mir geben/",
    "Kan ich ohn Silvien nicht fr\u00f6lich leben;",
    "Und wenn ich bey ihr bin/ so spielet um mein Hertz",
    "Ein angenehmer Schmertz."
    ],
    [
    "Mein Sinn f\u00fchlt sich gereizt von unbekandtem Triebe/",
    "Ich such/ und treffe sie doch ohne Furcht nicht an.",
    "Wofern ein Mensch iemahls unwissend lieben kan/",
    "So glaub ich/ da\u00df ich liebe."
    ]
]
```
Such a structure could probably be created with SPARQL CONSTRUCT, but I don't know exactly how to do that.
Meanwhile, I created a query that uses `pdp:stanzaNumber` and `pdp:relativeLineNumber`; the nested structure then has to be built in Python.
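If only the plain text is needed, the nested structure can also be derived from `pdc:text` without touching the scansion, assuming (as in the `Noviembre` output above) that a blank line always marks a stanza break. This is a sketch; whether every Redaction in the Knowledge Graph follows that convention has not been checked, and the result reflects the editor's line breaks rather than the scansion's stanza division. `plaintext_to_stanzas` is a hypothetical helper.

```python
import re
from typing import List

def plaintext_to_stanzas(text: str) -> List[List[str]]:
    """Split a poem's plain text into stanzas (lists of lines).

    Assumes stanzas are separated by one or more blank (or whitespace-only) lines.
    """
    stanzas = []
    # A stanza break is a newline, optional whitespace (incl. further newlines), and a newline
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.split("\n") if line.strip()]
        if lines:
            stanzas.append(lines)
    return stanzas
```

Usage: `plaintext_to_stanzas(get_poem_plaintext(poem_uri))`. Unlike the scansion lines, this keeps the original punctuation spacing of the text.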

In [90]:
def get_poem_text_json(poem_uri:str) -> list:
    """Returns the text of a poem as list
        
        stanzas are lists themselves.
    """
    
    query = """
    PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>
    PREFIX pdp: <http://postdata.linhd.uned.es/ontology/postdata-poeticAnalysis#>

    SELECT ?StanzaNo ?LineNo ?LineContent FROM <tag:stardog:api:context:local> {
        <$> pdc:isRealisedThrough ?Redaction .
    
        ?Redaction pdp:wasInputFor ?ScansionProcess .
    
        ?ScansionProcess pdp:generated ?Scansion .
    
        ?Scansion pdp:typeOfScansion <http://postdata.linhd.uned.es/kos/automaticscansion>. #only automatic
        ?Scansion pdp:hasStanza ?Stanza .
    
        ?Stanza pdp:stanzaNumber ?StanzaNo ;
            pdp:hasLine ?Line .

        ?Line pdp:content ?LineContent ;
          pdp:relativeLineNumber ?LineNo .
                            
    }
    ORDER BY ?StanzaNo ?LineNo
    """
    
    sparql_results = sparql(replace_placeholder(query,poem_uri))
    
    #collect the lines per stanza; the numbers come back as strings, so convert them to int
    stanzas = {}
    
    for binding in sparql_results["results"]["bindings"]:
        stanza_no = int(binding["StanzaNo"]["value"])
        line_no = int(binding["LineNo"]["value"])
        line_content = binding["LineContent"]["value"]
        
        stanzas.setdefault(stanza_no, []).append((line_no, line_content))
    
    #sort numerically in Python as well, so that 10 comes after 9 even if ORDER BY sorts lexically
    text = []
    for stanza_no in sorted(stanzas):
        lines = sorted(stanzas[stanza_no], key=lambda pair: pair[0])
        text.append([line_content for _, line_content in lines])
    
    return text

In [91]:
get_poem_text_json(poem_uri)
#note the space before punctuation (e.g. "difuntos ,"); I don't know why, but this is exactly what pdp:content of the Line returns

[['Conmemórase ahora los difuntos ,',
  'va la gente en tropel al Campo santo ,',
  'y oye luego el Tenorio con encanto ,',
  'teatro y devoción andando juntos .'],
 ['Son la ropa y la lumbre los asuntos',
  'que ocasionan el mayor quebranto ,',
  'teniéndose los pobres , con espanto ,',
  'sin fuego , por cadáveres presuntos .'],
 ['S abren de los ricos los salones ,',
  'se cierran como pueden los desvanes ,',
  'goza , el que tiene , caras diversiones ;'],
 ['padece , el que no tiene , mil afanes ;',
  'y si hay quien se duerme entre edredones',
  'muchos más se adormitan como Adanes .']]

In [92]:
get_poem_text_json("http://postdata.linhd.uned.es/resource/pw_gongora-luis-de_de-pura-honestidad-templo-sagrado")

[['De pura honestidad templo sagrado ,',
  'cuyo bello cimiento y gentil muro ,',
  'de blanco nácar y alabastro duro',
  'fue por divina mano fabricado ;'],
 ['pequeña puerta de coral preciado ,',
  'claras lumbreras de mirar seguro ,',
  'que a la esmeralda fina el verde puro',
  'habéis para viriles usurpado ;'],
 ['soberbio techo , cuyas cimbrias de oro',
  'al claro sol , en cuanto en torno gira ,',
  'ornan de luz , coronan de belleza ;'],
 ['ídolo bello , a quien humilde adoro ,',
  'oye piadoso al que por ti suspira ,',
  'tus himnos canta , y tus virtudes reza .']]

In [93]:
#need a function to clean up the whitespace problems.
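The lines returned by `pdp:content` contain a space before every punctuation mark (`difuntos ,`, `juntos .`). A minimal clean-up sketch with regular expressions, assuming the only problem is whitespace inserted before closing punctuation and after opening marks such as `¿`, `¡`, `(` or `«`; it has only been checked against the examples above, not run over the whole corpus. Both function names are hypothetical.

```python
import re
from typing import List

# Whitespace before closing punctuation: "difuntos ," -> "difuntos,"
SPACE_BEFORE_CLOSING = re.compile(r"\s+([,.;:!?)\]»])")
# Whitespace after opening punctuation: "¿ Qué" -> "¿Qué"
SPACE_AFTER_OPENING = re.compile(r"([¿¡(\[«])\s+")

def clean_line(line: str) -> str:
    """Remove tokenizer-style spaces around punctuation in a single verse line."""
    line = SPACE_BEFORE_CLOSING.sub(r"\1", line)
    line = SPACE_AFTER_OPENING.sub(r"\1", line)
    return line.strip()

def clean_text(stanzas: List[List[str]]) -> List[List[str]]:
    """Apply clean_line to every line of a list-of-stanzas structure."""
    return [[clean_line(line) for line in stanza] for stanza in stanzas]
```

`clean_text(get_poem_text_json(poem_uri))` would then return the stanzas with conventional spacing. Dashes and ellipses are not handled and may need extra rules.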

### Single Poem View
This would be roughly equivalent to `https://dracor.org/api/corpora/ger/play/lessing-emilia-galotti`, i.e. `/corpora/{corpus}/poems/{poem-name}`, and would build on the metadata of a single poem:

In [94]:
get_poem_metadata(poem_uri)

{'id': 'pd_c64285d2',
 'uri': 'http://postdata.linhd.uned.es/resource/pw_carlos-mendoza_noviembre',
 'name': 'carlos-mendoza_noviembre',
 'title': 'Noviembre',
 'authors': [{'name': 'Carlos Mendoza',
   'uri': 'http://postdata.linhd.uned.es/resource/p_carlos-mendoza'}],
 'source': 'POSTDATA Poetry Lab',
 'sourceUrl': 'http://poetry.linhd.uned.es:3000/en/author/carlos-mendoza/poetic-work/noviembre'}

The authoritative source of an analysis of a poem would be the scansion. POSTDATA implemented a query that returns all information on the scansion. There is no need to replicate that; we would rather add some aggregated information, like number of stanzas and lines in stanzas.

The automatic scansion can be viewed in the poem-viewer: http://poetry.linhd.uned.es:3000/en/poem-viewer/A_carlos-mendoza_noviembre_1645475669320137

Could add a link to the scansions: `/corpora/{corpus}/poems/{poem}/scansions`
```
{
"scansions" : [
    {
    "id" : "{URI}" , 
    "type" : "automatic",
    "viewerUrl" : "http://poetry.linhd.uned.es:3000/en/poem-viewer/A_carlos-mendoza_noviembre_1645475669320137"
    }
]
}
```

In [95]:
def scansion_graph_to_viewer_url(graph_name:str) -> str:
    """Transform the name of a graph to a url to view the scansion in poetry lab"""
    viewer_base_url = "http://poetry.linhd.uned.es:3000/en/poem-viewer/"
    
    graph_name_part = graph_name.replace("http://postdata.linhd.uned.es/","")
    viewer_url = viewer_base_url + graph_name_part
        
    return viewer_url

In [96]:
def get_scansions_metadata(poem_uri:str) -> list:
    """Returns basic metadata about the Scansions for a given poem identified by URI
    """
    
    query= """
    PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>
    PREFIX pdp: <http://postdata.linhd.uned.es/ontology/postdata-poeticAnalysis#>

    SELECT ?Scansion ?ScansionType ?graph FROM <tag:stardog:api:context:local> WHERE {
        <$> pdc:isRealisedThrough ?Redaction .
    
        ?Redaction pdp:wasInputFor ?ScansionProcess .
    
        ?ScansionProcess pdp:generated ?Scansion .
    
        ?Scansion pdp:typeOfScansion ?ScansionType.
        
        OPTIONAL {
            ?Scansion pdp:graphName ?graph .
        }
    }
    """
    
    sparql_results = sparql(replace_placeholder(query,poem_uri))
    
    scansions = []
    
    for binding in sparql_results["results"]["bindings"]:
        scansion = {}
        scansion["uri"] = binding["Scansion"]["value"]
        
        #the query returns the KOS entry; since we are simplifying anyway, translate it to a more readable form
        if binding["ScansionType"]["value"] == "http://postdata.linhd.uned.es/kos/automaticscansion":
            scansion["type"] = "automatic"
        elif binding["ScansionType"]["value"] == "http://postdata.linhd.uned.es/kos/ManualAnnotation":
            scansion["type"] = "manual"
        else:
            #fallback
            scansion["type"] = binding["ScansionType"]["value"]
        
        #get the graph and the viewer url
        try:
            scansion["graphUri"] = binding["graph"]["value"]
            scansion["viewerUrl"] = scansion_graph_to_viewer_url(scansion["graphUri"])
        except KeyError:
            #the graph is OPTIONAL in the query, so it may be missing
            pass
        
        scansions.append(scansion)
    
    return scansions

In [97]:
get_scansions_metadata("http://postdata.linhd.uned.es/resource/pw_lope-de-vega_1-al-sujeto-de-la-dama-que-le-dijo-dios-le-provea")

[{'uri': 'http://postdata.linhd.uned.es/resource/sc_lope-de-vega_1-al-sujeto-de-la-dama-que-le-dijo-dios-le-provea_adso_16454747423186843',
  'type': 'automatic',
  'graphUri': 'http://postdata.linhd.uned.es/A_lope-de-vega_1-al-sujeto-de-la-dama-que-le-dijo-dios-le-provea_16454747423186843',
  'viewerUrl': 'http://poetry.linhd.uned.es:3000/en/poem-viewer/A_lope-de-vega_1-al-sujeto-de-la-dama-que-le-dijo-dios-le-provea_16454747423186843'},
 {'uri': 'http://postdata.linhd.uned.es/resource/sc_lope-de-vega_1-al-sujeto-de-la-dama-que-le-dijo-dios-le-provea_adso_1645474742181371',
  'type': 'manual',
  'graphUri': 'http://postdata.linhd.uned.es/M_lope-de-vega_1-al-sujeto-de-la-dama-que-le-dijo-dios-le-provea_1645474742181371',
  'viewerUrl': 'http://poetry.linhd.uned.es:3000/en/poem-viewer/M_lope-de-vega_1-al-sujeto-de-la-dama-que-le-dijo-dios-le-provea_1645474742181371'}]

In [98]:
def get_graph_of_scansion(scansion_uri:str) -> str:
    """Returns the URI of the named graph of a scansion"""
    query = """
    PREFIX pdp: <http://postdata.linhd.uned.es/ontology/postdata-poeticAnalysis#>
    PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>

    SELECT ?graph FROM <tag:stardog:api:context:local> WHERE {
      <$> pdp:graphName ?graph .
    }
    """
    sparql_results = sparql(replace_placeholder(query,scansion_uri))
    
    return sparql_results["results"]["bindings"][0]["graph"]["value"]
    

In [99]:
get_graph_of_scansion("http://postdata.linhd.uned.es/resource/sc_lope-de-vega_1-al-sujeto-de-la-dama-que-le-dijo-dios-le-provea_adso_16454747423186843")

'http://postdata.linhd.uned.es/A_lope-de-vega_1-al-sujeto-de-la-dama-que-le-dijo-dios-le-provea_16454747423186843'

In [100]:
def scansion_uri_to_scansion_viewer_url(scansion_uri:str) -> str:
    """Get a viewer url for a scansion"""
    
    scansion_graph = get_graph_of_scansion(scansion_uri)
        
    return scansion_graph_to_viewer_url(scansion_graph)

In [101]:
#test
scansion_uri = "http://postdata.linhd.uned.es/resource/sc_lope-de-vega_1-al-sujeto-de-la-dama-que-le-dijo-dios-le-provea_adso_16454747423186843"

In [102]:
scansion_uri_to_scansion_viewer_url(scansion_uri)

'http://poetry.linhd.uned.es:3000/en/poem-viewer/A_lope-de-vega_1-al-sujeto-de-la-dama-que-le-dijo-dios-le-provea_16454747423186843'

In [103]:
get_scansions_metadata(poem_uri)

[{'uri': 'http://postdata.linhd.uned.es/resource/sc_carlos-mendoza_noviembre_disco2-1_16454756691771069',
  'type': 'manual',
  'graphUri': 'http://postdata.linhd.uned.es/M_carlos-mendoza_noviembre_16454756691771069',
  'viewerUrl': 'http://poetry.linhd.uned.es:3000/en/poem-viewer/M_carlos-mendoza_noviembre_16454756691771069'},
 {'uri': 'http://postdata.linhd.uned.es/resource/sc_carlos-mendoza_noviembre_disco2-1_1645475669320137',
  'type': 'automatic',
  'graphUri': 'http://postdata.linhd.uned.es/A_carlos-mendoza_noviembre_1645475669320137',
  'viewerUrl': 'http://poetry.linhd.uned.es:3000/en/poem-viewer/A_carlos-mendoza_noviembre_1645475669320137'}]

Possible structure for referencing the scansion an analysis is based on: `{ "source" : {"uri" : "scansion-uri"} }`

In [104]:
def get_source_scansion_uri(poem_uri:str, scansion:str="auto") -> str:
    """Get the URI of the scansion that is the source of the analysis (currently always the automatic one; `scansion` is not used yet)"""
    query = """
    PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>
    PREFIX pdp: <http://postdata.linhd.uned.es/ontology/postdata-poeticAnalysis#>

    SELECT ?Scansion FROM <tag:stardog:api:context:local> WHERE {
        <$> pdc:isRealisedThrough ?Redaction .
    
        ?Redaction pdp:wasInputFor ?ScansionProcess .
    
        ?ScansionProcess pdp:generated ?Scansion .
    
        ?Scansion pdp:typeOfScansion <http://postdata.linhd.uned.es/kos/automaticscansion> .
    }
    """
    # the automatic scansion is the one normally used for retrieving analysis data
    
    sparql_results = sparql(replace_placeholder(query,poem_uri))
    
    return sparql_results["results"]["bindings"][0]["Scansion"]["value"]

In [106]:
%%time
get_source_scansion_uri(poem_uri)

CPU times: user 2.46 ms, sys: 1.52 ms, total: 3.98 ms
Wall time: 125 ms


'http://postdata.linhd.uned.es/resource/sc_carlos-mendoza_noviembre_disco2-1_1645475669320137'
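The `scansion` parameter of `get_source_scansion_uri` (and of the metric functions) is not used yet. One way to support `"manual"` as well, sketched on the structure returned by `get_scansions_metadata` (a list of dicts with `uri` and `type`), is to pick the scansion by type in Python instead of hard-coding the KOS URI in every query. The mapping from `"auto"` to `"automatic"` is an assumption about how the parameter values would be named; `select_scansion` is a hypothetical helper.

```python
from typing import Dict, List

# Assumed mapping from the notebook's parameter values to the simplified types
SCANSION_TYPES = {"auto": "automatic", "manual": "manual"}

def select_scansion(scansions: List[Dict[str, str]], scansion: str = "auto") -> Dict[str, str]:
    """Return the first scansion of the requested type ("auto" or "manual")."""
    if scansion not in SCANSION_TYPES:
        raise ValueError(f"Unknown scansion type: {scansion}")
    wanted = SCANSION_TYPES[scansion]
    for entry in scansions:
        if entry.get("type") == wanted:
            return entry
    raise LookupError(f"No {wanted} scansion available")
```

Usage: `select_scansion(get_scansions_metadata(poem_uri), "manual")["uri"]` would return the URI of the manual scansion, which could then be passed on to the analysis queries.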

#### Metrics on a poem
An endpoint that returns some kind of "metrics", e.g. `/corpora/{corpus}/poems/{poem}/analysis`; alternatively, this could be included in the basic metadata:

```
{
    "numOfStanzas" : 4 ,
    "numOfLines" : 14 ,
    "numOfLinesInStanzas" : [4,4,3,3] ,
    "StanzaRhymeSchema" : ["abba", "abba", "cdc", "cdc"]
    ...
}
```
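The structural part of these metrics can also be derived from the list-of-stanzas structure returned by `get_poem_text_json`, which could serve as a cross-check for the SPARQL counts below or save a few requests. A sketch, assuming the nested list reflects the stanza division of the automatic scansion; rhyme schemes still have to come from the scansion. `metrics_from_text` is a hypothetical helper.

```python
from typing import Dict, List, Union

def metrics_from_text(text: List[List[str]]) -> Dict[str, Union[int, List[int]]]:
    """Derive basic structural metrics from a list of stanzas (lists of lines)."""
    lines_in_stanzas = [len(stanza) for stanza in text]
    return {
        "numOfStanzas": len(text),
        "numOfLines": sum(lines_in_stanzas),
        "numOfLinesInStanzas": lines_in_stanzas,
    }
```

For a sonnet such as `Noviembre`, this yields the values from the example above: 4 stanzas, 14 lines and `[4, 4, 3, 3]`.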

In [108]:
def get_numOfStanzas(poem_uri:str, scansion:str="auto") -> int:
    """Returns the number of stanzas
    
    Args:
        scansion (str, optional): Return data based on automatic or manual ("manual") scansion. Defaults to "auto" (automatic).
    
    TODO: implement retrieving data based on "manual".
    """
    
    query = """
    PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>
    PREFIX pdp: <http://postdata.linhd.uned.es/ontology/postdata-poeticAnalysis#>

    SELECT (COUNT(?Stanza) as ?count) FROM <tag:stardog:api:context:local> WHERE {
        <$> pdc:isRealisedThrough ?Redaction .
    
        ?Redaction pdp:wasInputFor ?ScansionProcess .
    
        ?ScansionProcess pdp:generated ?Scansion .
    
        ?Scansion pdp:typeOfScansion <http://postdata.linhd.uned.es/kos/automaticscansion> ;
              pdp:hasStanza ?Stanza .
    }
    """
    
    sparql_results = sparql(replace_placeholder(query,poem_uri))
    return int(sparql_results["results"]["bindings"][0]["count"]["value"])
    

In [109]:
get_numOfStanzas(poem_uri)

4

In [110]:
def get_numOfLines(poem_uri:str, scansion:str="auto") -> int:
    """Returns the number of lines/verses
    
    Args:
        scansion (str, optional): Return data based on automatic or manual ("manual") scansion. Defaults to "auto" (automatic).
    
    TODO: implement retrieving data based on "manual".
    """
    
    query = """
    PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>
    PREFIX pdp: <http://postdata.linhd.uned.es/ontology/postdata-poeticAnalysis#>

    SELECT (COUNT(?Line) AS ?count) FROM <tag:stardog:api:context:local> WHERE {
        <$> pdc:isRealisedThrough ?Redaction .
    
        ?Redaction pdp:wasInputFor ?ScansionProcess .
    
        ?ScansionProcess pdp:generated ?Scansion .
    
        ?Scansion pdp:typeOfScansion <http://postdata.linhd.uned.es/kos/automaticscansion> ;
              pdp:hasStanza ?Stanza .
    
        ?Stanza pdp:hasLine ?Line .
    }
    """
    
    sparql_results = sparql(replace_placeholder(query,poem_uri))
    return int(sparql_results["results"]["bindings"][0]["count"]["value"])

In [111]:
get_numOfLines(poem_uri)

14

In [139]:
def get_rhymeSchemes(poem_uri:str, scansion:str="auto") -> list:
    """Returns the rhyme schemes of the stanzas of a poem
    """
    
    #this changed in the Knowledge Graph
    query = """
    PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>
    PREFIX pdp: <http://postdata.linhd.uned.es/ontology/postdata-poeticAnalysis#>

    SELECT ?StanzaNo ?rhymeScheme FROM <tag:stardog:api:context:local> WHERE {
        <$> pdc:isRealisedThrough ?Redaction .
    
        ?Redaction pdp:wasInputFor ?ScansionProcess .
    
        ?ScansionProcess pdp:generated ?Scansion .
    
        ?Scansion pdp:typeOfScansion <http://postdata.linhd.uned.es/kos/automaticscansion> ;
              pdp:hasStanza ?Stanza .
  
        ?Stanza pdp:stanzaNumber ?StanzaNo ;
            pdp:rhymeScheme ?rhymeScheme .
    }
    ORDER BY ?StanzaNo
    """
    
    sparql_results = sparql(replace_placeholder(query,poem_uri))
    
    #hopefully the ordering works for 1, 11 ...
    rhyme_schemes = []
    for binding in sparql_results["results"]["bindings"]:
        rhyme_schemes.append(binding["rhymeScheme"]["value"])
    
    return rhyme_schemes
    

In [140]:
#this still doesn't work!
get_rhymeSchemes(poem_uri)

['abba', 'abba', 'a-a', 'a-a']
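The comment in `get_rhymeSchemes` worries whether `ORDER BY ?StanzaNo` sorts 1, 2, ..., 11 correctly. If the stanza numbers are stored as numeric literals (e.g. `xsd:integer`), SPARQL orders them numerically; if they are plain strings, the order is lexical ("10" before "2"). Since the literal type has not been checked here, a defensive option is to sort the bindings in Python by the integer value. A sketch on the standard SPARQL JSON results format; `sort_bindings_numerically` is a hypothetical helper.

```python
from typing import Any, Dict, List

# One row of the SPARQL JSON results: {"var": {"type": ..., "value": ...}, ...}
Binding = Dict[str, Dict[str, Any]]

def sort_bindings_numerically(bindings: List[Binding], var: str) -> List[Binding]:
    """Sort SPARQL JSON result bindings by the integer value of one variable."""
    return sorted(bindings, key=lambda binding: int(binding[var]["value"]))
```

In `get_rhymeSchemes`, the loop would then iterate over `sort_bindings_numerically(sparql_results["results"]["bindings"], "StanzaNo")`.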

In [115]:
#"numOfLinesInStanzas" : [4,4,3,3] ,
def get_numOfLines_in_stanzas(poem_uri:str, scansion:str="auto") -> list:
    """Returns the number of lines of all stanzas"""
    
    query = """
    PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>
    PREFIX pdp: <http://postdata.linhd.uned.es/ontology/postdata-poeticAnalysis#>

    SELECT ?StanzaNumber (COUNT(?Line) AS ?count) FROM <tag:stardog:api:context:local> WHERE {
        <$> pdc:isRealisedThrough ?Redaction .
    
    ?Redaction pdp:wasInputFor ?ScansionProcess .
    
    ?ScansionProcess pdp:generated ?Scansion .
    
    ?Scansion pdp:typeOfScansion <http://postdata.linhd.uned.es/kos/automaticscansion> ;
              pdp:hasStanza ?Stanza .
    
    ?Stanza pdp:stanzaNumber ?StanzaNumber ;
            pdp:hasLine ?Line .

    
    }
    GROUP BY ?StanzaNumber
    ORDER BY ?StanzaNumber
    """
    
    sparql_results = sparql(replace_placeholder(query,poem_uri))
    
    line_counts = []
    for binding in sparql_results["results"]["bindings"]:
        line_counts.append(int(binding["count"]["value"]))
    
    return line_counts
    

In [116]:
get_numOfLines_in_stanzas(poem_uri)

[4, 4, 3, 3]

In [117]:
def get_numOfWords(poem_uri:str, scansion:str="auto") -> int:
    """Returns the number of words in a poem"""
    
    query = """
    PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>
    PREFIX pdp: <http://postdata.linhd.uned.es/ontology/postdata-poeticAnalysis#>

    SELECT (COUNT(?Word) AS ?count) FROM <tag:stardog:api:context:local> WHERE {
        <$> pdc:isRealisedThrough ?Redaction .
    
    ?Redaction pdp:wasInputFor ?ScansionProcess .
    
    ?ScansionProcess pdp:generated ?Scansion .
    
    ?Scansion pdp:typeOfScansion <http://postdata.linhd.uned.es/kos/automaticscansion> ;
              pdp:hasStanza ?Stanza .
    
    ?Stanza pdp:hasLine ?Line .
    
    ?Line pdp:hasWord ?Word .
    }
    """
    
    sparql_results = sparql(replace_placeholder(query,poem_uri))
    
    return int(sparql_results["results"]["bindings"][0]["count"]["value"])
    
    
    

In [118]:
get_numOfWords(poem_uri)

87

In [119]:
def get_numOfWords_in_stanzas(poem_uri:str, scansion:str="auto") -> list:
    """Returns the number of words per line, grouped into stanzas
    """
    
    query = """
    PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>
    PREFIX pdp: <http://postdata.linhd.uned.es/ontology/postdata-poeticAnalysis#>

    SELECT (SAMPLE(?StanzaNumber) AS ?StanzaNo) (SAMPLE(?relativeLineNumber) AS ?LineNo) ?absoluteLineNumber (COUNT(?Word) AS ?count) FROM <tag:stardog:api:context:local> WHERE {
        <$> pdc:isRealisedThrough ?Redaction .
    
        ?Redaction pdp:wasInputFor ?ScansionProcess .
    
        ?ScansionProcess pdp:generated ?Scansion .
    
        ?Scansion pdp:typeOfScansion <http://postdata.linhd.uned.es/kos/automaticscansion> ;
              pdp:hasStanza ?Stanza .
    
        ?Stanza pdp:stanzaNumber ?StanzaNumber ;
            pdp:hasLine ?Line .
    
        ?Line pdp:relativeLineNumber ?relativeLineNumber ;
          pdp:absoluteLineNumber ?absoluteLineNumber ;
          pdp:hasWord ?Word .
    }
    GROUP BY ?absoluteLineNumber
    ORDER BY ?absoluteLineNumber
    """
    
    
    #replace the uri of the poem
    query = replace_placeholder(query,poem_uri)
    sparql_results = sparql(query)
    
    words = []
    stanza = []
    current_stanza = None
    for binding in sparql_results["results"]["bindings"]:
        stanza_no = int(binding["StanzaNo"]["value"])
        
        if current_stanza is not None and stanza_no != current_stanza:
            #next stanza: store the finished one and start a new one
            words.append(stanza)
            stanza = []
        current_stanza = stanza_no
        stanza.append(int(binding["count"]["value"]))
        
    #append the last stanza (if there was any data at all)
    if stanza:
        words.append(stanza)
         
    return words
    
    

In [120]:
get_numOfWords_in_stanzas(poem_uri)

[[4, 8, 7, 5], [8, 5, 5, 5], [7, 6, 6], [7, 8, 6]]
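The loop that groups per-line values into stanzas is repeated almost verbatim in several functions below. Here is a minimal sketch of factoring it out with `itertools.groupby`. It assumes the bindings follow the SPARQL JSON results layout used above (`binding[var]["value"]`) and arrive ordered by line number, so that lines of one stanza are consecutive. `group_by_stanza` is a hypothetical helper, not part of the notebook.

```python
from itertools import groupby
from typing import Any, Callable


def group_by_stanza(bindings: list, stanza_var: str, value_var: str,
                    cast: Callable[[str], Any] = str) -> list:
    """Group per-line values into one list per stanza (hypothetical helper).

    Assumes `bindings` is the SPARQL JSON "bindings" list, ordered by line,
    so all lines of a stanza are adjacent.
    """
    # groupby starts a new group whenever the stanza number changes
    grouped = groupby(bindings, key=lambda b: int(b[stanza_var]["value"]))
    return [[cast(b[value_var]["value"]) for b in lines] for _, lines in grouped]


# Usage with bindings shaped like those of the words-per-line query
bindings = [
    {"StanzaNo": {"value": "1"}, "count": {"value": "4"}},
    {"StanzaNo": {"value": "1"}, "count": {"value": "8"}},
    {"StanzaNo": {"value": "2"}, "count": {"value": "8"}},
]
print(group_by_stanza(bindings, "StanzaNo", "count", cast=int))  # [[4, 8], [8]]
```

The sketch behaves differently from the hand-written loop in two edge cases. The loop starts with `current_stanza = 0`, so it emits an empty leading list whenever stanza numbering starts at 1. For an empty result, the loop returns `[[]]`, whereas the sketch returns `[]`. Any caller that indexes `[0]` would therefore need a guard.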

In [125]:
#overall count of syllables in poem
def get_numOfSyllables(poem_uri:str, syllable_type:str="metrical", scansion:str="auto") -> int:
    """Returns the overall number of syllables in the poem
    
    Args:
        syllable_type (str, optional): Type of Syllable to count (grammatical or metrical). Defaults to "metrical".
    """
    
    query = """
    PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>
    PREFIX pdp: <http://postdata.linhd.uned.es/ontology/postdata-poeticAnalysis#>

    SELECT (COUNT(?Syllable) AS ?count) FROM <tag:stardog:api:context:local>  WHERE {
        <$> pdc:isRealisedThrough ?Redaction .
    
        ?Redaction pdp:wasInputFor ?ScansionProcess .
    
        ?ScansionProcess pdp:generated ?Scansion .
    
        ?Scansion pdp:typeOfScansion <http://postdata.linhd.uned.es/kos/automaticscansion> ;
              pdp:hasStanza ?Stanza .
    
        ?Stanza pdp:stanzaNumber ?StanzaNumber ;
            pdp:hasLine ?Line .
    
        
    #could be: pdp:hasGrammaticalSyllable or pdp:hasMetricalSyllable
    ?Line pdp:§ ?Syllable .
    }
    """
    
    #need to replace the work-uri and a property "§"
    
    query = replace_placeholder(query,poem_uri)
    
    #have to set the type of syllable by replacing "§" in the query as well
    # can be "hasGrammaticalSyllable"  or "hasMetricalSyllable"
    if syllable_type == "metrical":
        query = query.replace("§", "hasMetricalSyllable")
    elif syllable_type == "grammatical":
        query = query.replace("§", "hasGrammaticalSyllable")
    else:
        raise ValueError(f"Syllable type '{syllable_type}' is not valid; use 'metrical' or 'grammatical'.")
    
    sparql_results = sparql(query)
    
    return int(sparql_results["results"]["bindings"][0]["count"]["value"])
    

In [123]:
#default metrical
#get_numOfSyllables(poem_uri)
get_numOfSyllables(poem_uri, syllable_type="metrical")

154

In [124]:
#grammatical syllables
get_numOfSyllables(poem_uri, syllable_type="grammatical")

166
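Both syllable functions substitute the `§` marker with a property name through an `if`/`elif` chain. The sketch below expresses the same mapping as a lookup table. It assumes only the two properties named in the notebook (`pdp:hasMetricalSyllable`, `pdp:hasGrammaticalSyllable`). `SYLLABLE_PROPERTIES` and `set_syllable_property` are hypothetical names. The helper raises `ValueError`, which is a subclass of `Exception`, so existing `except Exception` handlers still catch it.

```python
# Hypothetical lookup table: syllable type -> pdp: property local name
SYLLABLE_PROPERTIES = {
    "metrical": "hasMetricalSyllable",
    "grammatical": "hasGrammaticalSyllable",
}


def set_syllable_property(query: str, syllable_type: str = "metrical") -> str:
    """Replace the '§' marker in a query with the matching property (hypothetical helper)."""
    try:
        prop = SYLLABLE_PROPERTIES[syllable_type]
    except KeyError:
        raise ValueError(
            f"Syllable type {syllable_type!r} is not valid; "
            f"expected one of {sorted(SYLLABLE_PROPERTIES)}"
        ) from None
    return query.replace("§", prop)


print(set_syllable_property("?Line pdp:§ ?Syllable .", "grammatical"))
# ?Line pdp:hasGrammaticalSyllable ?Syllable .
```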

Expected shape of the syllable counts per line, grouped by stanza:
```
[
[10,10,10,10] ,
[...]
]
```

In [126]:
def get_numOfSyllables_in_stanzas(poem_uri:str, syllable_type:str="metrical", scansion:str="auto") -> list:
    """Returns the number of syllables per line grouped by stanzas
    
    Args:
        syllable_type (str, optional): Type of Syllable to count (grammatical or metrical). Defaults to "metrical".
    """
    
    query = """
    PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>
    PREFIX pdp: <http://postdata.linhd.uned.es/ontology/postdata-poeticAnalysis#>

    SELECT (SAMPLE(?StanzaNumber) AS ?StanzaNo) (SAMPLE(?relativeLineNumber) AS ?relativeLineNo) ?absoluteLineNumber (COUNT(?Syllable) AS ?count) FROM <tag:stardog:api:context:local>  WHERE {
        <$> pdc:isRealisedThrough ?Redaction .
    
        ?Redaction pdp:wasInputFor ?ScansionProcess .
    
        ?ScansionProcess pdp:generated ?Scansion .
    
        ?Scansion pdp:typeOfScansion <http://postdata.linhd.uned.es/kos/automaticscansion> ;
              pdp:hasStanza ?Stanza .
    
        ?Stanza pdp:stanzaNumber ?StanzaNumber ;
            pdp:hasLine ?Line .
    
    
        ?Line pdp:relativeLineNumber ?relativeLineNumber ;
          pdp:absoluteLineNumber ?absoluteLineNumber ;
          pdp:§ ?Syllable .
    }
    GROUP BY ?absoluteLineNumber
    ORDER BY ?absoluteLineNumber
    """
    
    #replace the uri of the poem
    query = replace_placeholder(query,poem_uri)
    
    #have to set the type of syllable by replacing "§" in the query as well
    # can be "hasGrammaticalSyllable"  or "hasMetricalSyllable"
    if syllable_type == "metrical":
        query = query.replace("§", "hasMetricalSyllable")
    elif syllable_type == "grammatical":
        query = query.replace("§", "hasGrammaticalSyllable")
    else:
        raise ValueError(f"Syllable type '{syllable_type}' is not valid; use 'metrical' or 'grammatical'.")
    
    sparql_results = sparql(query)
    
    syllables = []
    stanza = []
    current_stanza = 0
    for binding in sparql_results["results"]["bindings"]:
        stanza_no = int(binding["StanzaNo"]["value"])
        #print("this:" + str(stanza_no))
        #print("current:" + str(current_stanza))
        
        if current_stanza == stanza_no:
            stanza.append(int(binding["count"]["value"]))
        else:
            #next stanza
            #print("next")
            syllables.append(stanza)
            #set this as current stanza
            current_stanza = stanza_no
            #reset the stanza
            stanza = []
            stanza.append(int(binding["count"]["value"]))
        
    #append the last stanza
    syllables.append(stanza)
         
    return syllables

In [127]:
#use default metrical syllables
get_numOfSyllables_in_stanzas(poem_uri)

[[11, 11, 11, 11], [11, 11, 11, 11], [11, 11, 11], [11, 11, 11]]

In [128]:
#get the counts for grammatical syllables
get_numOfSyllables_in_stanzas(poem_uri, syllable_type="grammatical")

[[12, 12, 13, 12], [12, 11, 11, 11], [11, 11, 12], [12, 13, 13]]
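The per-line counts and the overall counts come from separate queries, so each can be used to check the other. The nested lists should sum to the totals, and their lengths should match the number of lines per stanza (`[4, 4, 3, 3]` for this poem in the analysis output). The small sketch below runs this check on the values printed above for *Noviembre*: 154 metrical syllables, 166 grammatical syllables and 87 words. `check_consistency` is a hypothetical helper.

```python
def flatten_total(per_stanza: list) -> int:
    """Sum a nested [[line, line, ...], ...] list of per-line counts."""
    return sum(sum(stanza) for stanza in per_stanza)


def check_consistency(total: int, per_stanza: list, lines_in_stanzas: list) -> bool:
    """True if the nested counts add up to `total` and have the expected stanza shape."""
    return (flatten_total(per_stanza) == total
            and [len(stanza) for stanza in per_stanza] == lines_in_stanzas)


lines_in_stanzas = [4, 4, 3, 3]
metrical = [[11, 11, 11, 11], [11, 11, 11, 11], [11, 11, 11], [11, 11, 11]]
grammatical = [[12, 12, 13, 12], [12, 11, 11, 11], [11, 11, 12], [12, 13, 13]]
words = [[4, 8, 7, 5], [8, 5, 5, 5], [7, 6, 6], [7, 8, 6]]

print(check_consistency(154, metrical, lines_in_stanzas))     # True
print(check_consistency(166, grammatical, lines_in_stanzas))  # True
print(check_consistency(87, words, lines_in_stanzas))         # True
```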

In [143]:
#grammatical stress
def get_grammaticalStressPatterns_in_stanza(poem_uri:str,scansion:str="auto") -> list:
    """Get grammatical stress patterns of lines grouped into stanzas"""
    query = """
    PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>
    PREFIX pdp: <http://postdata.linhd.uned.es/ontology/postdata-poeticAnalysis#>

    SELECT ?StanzaNumber ?absoluteLineNumber ?grammaticalStressPattern FROM <tag:stardog:api:context:local> WHERE {
        <$> pdc:isRealisedThrough ?Redaction .
    
    ?Redaction pdp:wasInputFor ?ScansionProcess .
    
    ?ScansionProcess pdp:generated ?Scansion .
    
    ?Scansion pdp:typeOfScansion <http://postdata.linhd.uned.es/kos/automaticscansion> ;
              pdp:hasStanza ?Stanza .
    
    ?Stanza pdp:stanzaNumber ?StanzaNumber ;
            pdp:hasLine ?Line .  
  
    ?Line pdp:absoluteLineNumber ?absoluteLineNumber ;
         pdp:grammaticalStressPattern ?grammaticalStressPattern .
    }
    ORDER BY ?absoluteLineNumber
    """
    
    query = replace_placeholder(query,poem_uri)
    sparql_results = sparql(query)
    
    stress_patterns = []
    stanza = []
    current_stanza = 0
    for binding in sparql_results["results"]["bindings"]:
        stanza_no = int(binding["StanzaNumber"]["value"])
        #print("this:" + str(stanza_no))
        #print("current:" + str(current_stanza))
        
        if current_stanza == stanza_no:
            stanza.append(binding["grammaticalStressPattern"]["value"])
        else:
            #next stanza
            #print("next")
            stress_patterns.append(stanza)
            #set this as current stanza
            current_stanza = stanza_no
            #reset the stanza
            stanza = []
            stanza.append(binding["grammaticalStressPattern"]["value"])
        
    #append the last stanza
    stress_patterns.append(stanza)
         
    return stress_patterns
    
    

In [144]:
#this does not work
get_grammaticalStressPatterns_in_stanza(poem_uri)

[['--+---+---+-', '+-+---+-+-+-', '-+-+---+---+-', '-+----+-+-+-'],
 ['+-+---+---+-', '---+---+-+-', '-+---+---+-', '-+---+---+-'],
 ['++---+---+-', '-+---+---+-', '+---+-+---+-'],
 ['-+---++-+-+-', '--+--+-----+-', '+-+---+----+-']]
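Each pattern is a string with one character per syllable. In the output above, the string lengths match the grammatical syllable counts per line (12, 12, 13, 12 in the first stanza). The sketch below reads off the 1-based positions of the stressed syllables. It assumes that `+` marks a stressed and `-` an unstressed syllable, which the data suggests but this section does not document. `stressed_positions` is a hypothetical helper.

```python
def stressed_positions(pattern: str) -> list:
    """1-based positions of '+' (assumed: stressed) syllables in a stress pattern."""
    return [i for i, mark in enumerate(pattern, start=1) if mark == "+"]


first_stanza = ['--+---+---+-', '+-+---+-+-+-', '-+-+---+---+-', '-+----+-+-+-']
print([len(p) for p in first_stanza])       # [12, 12, 13, 12]
print(stressed_positions(first_stanza[0]))  # [3, 7, 11]
```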

In [145]:
#meter: metrical patterns of the lines
def get_metricalPatterns_in_stanza(poem_uri:str,scansion:str="auto") -> list:
    """Get metrical patterns of lines grouped into stanzas"""
    query = """
    PREFIX pdc: <http://postdata.linhd.uned.es/ontology/postdata-core#>
    PREFIX pdp: <http://postdata.linhd.uned.es/ontology/postdata-poeticAnalysis#>

    SELECT ?StanzaNumber ?absoluteLineNumber ?metricalPattern FROM <tag:stardog:api:context:local> WHERE {
        <$> pdc:isRealisedThrough ?Redaction .
    
    ?Redaction pdp:wasInputFor ?ScansionProcess .
    
    ?ScansionProcess pdp:generated ?Scansion .
    
    ?Scansion pdp:typeOfScansion <http://postdata.linhd.uned.es/kos/automaticscansion> ;
              pdp:hasStanza ?Stanza .
    
    ?Stanza pdp:stanzaNumber ?StanzaNumber ;
            pdp:hasLine ?Line .  
  
      ?Line pdp:absoluteLineNumber ?absoluteLineNumber ;
         pdp:patterningMetricalScheme ?metricalPattern .
    }
    ORDER BY ?absoluteLineNumber
    """
    
    query = replace_placeholder(query,poem_uri)
    sparql_results = sparql(query)
    
    metrical_patterns = []
    stanza = []
    current_stanza = 0
    for binding in sparql_results["results"]["bindings"]:
        stanza_no = int(binding["StanzaNumber"]["value"])
        #print("this:" + str(stanza_no))
        #print("current:" + str(current_stanza))
        
        if current_stanza == stanza_no:
            stanza.append(binding["metricalPattern"]["value"])
        else:
            #next stanza
            #print("next")
            metrical_patterns.append(stanza)
            #set this as current stanza
            current_stanza = stanza_no
            #reset the stanza
            stanza = []
            stanza.append(binding["metricalPattern"]["value"])
        
    #append the last stanza
    metrical_patterns.append(stanza)
         
    return metrical_patterns

In [146]:
#this does not work
get_metricalPatterns_in_stanza(poem_uri)

[['--+--+---+-', '+-+--+-+-+-', '+-+--+---+-', '-+---+-+-+-'],
 ['+-+--+---+-', '---+---+-+-', '-+---+---+-', '-+---+---+-'],
 ['++---+---+-', '-+---+---+-', '+--+-+---+-'],
 ['-+--++-+-+-', '--+--+---+-', '+-+--+---+-']]
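Sometimes a query contains a hard-coded poem URI instead of the `<$>` placeholder, which is easy to leave behind when copying a query from a test cell. In that case the substitution silently does nothing, and the function returns the same poem's data for every URI. A guard in the substitution step catches this. `replace_placeholder` itself is not shown in this section, so `fill_poem_uri` below is a hypothetical stand-in. It assumes the template marks the subject with the literal `<$>` used in the queries above.

```python
def fill_poem_uri(query: str, poem_uri: str, placeholder: str = "<$>") -> str:
    """Insert a poem URI into a query template, failing loudly if there is no placeholder.

    Hypothetical stand-in for replace_placeholder; assumes the template marks
    the subject as the literal '<$>'.
    """
    if placeholder not in query:
        raise ValueError(f"Query has no {placeholder} placeholder; is a URI hard-coded?")
    return query.replace(placeholder, f"<{poem_uri}>")


template = "SELECT * WHERE { <$> pdc:isRealisedThrough ?Redaction . }"
print(fill_poem_uri(template, "http://postdata.linhd.uned.es/resource/pw_carlos-mendoza_noviembre"))
# SELECT * WHERE { <http://postdata.linhd.uned.es/resource/pw_carlos-mendoza_noviembre> pdc:isRealisedThrough ?Redaction . }
```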

In [147]:
def get_poem_analysis(poem_uri:str, scansion:str="auto") -> dict:
    """Returns metrics/analysis of a poem based on a scansion"""
    
    analysis = {}
    
    #provenance: the scansion this analysis is based on
    #(the helpers below are called without `scansion`, so they fall back to their defaults)
    scansion_uri = get_source_scansion_uri(poem_uri,scansion)
    analysis["source"] = {"uri": scansion_uri}
    
    #Number of Stanzas
    analysis["numOfStanzas"] = get_numOfStanzas(poem_uri)
    
    #Overall number of Lines
    analysis["numOfLines"] = get_numOfLines(poem_uri)
    
    #Overall count of words
    analysis["numOfWords"] = get_numOfWords(poem_uri)
    
    # Number of Lines in Stanzas
    analysis["numOfLinesInStanzas"] = get_numOfLines_in_stanzas(poem_uri)
    
    #rhyme scheme
    analysis["rhymeSchemesOfStanzas"] = get_rhymeSchemes(poem_uri)
    
    #overall count of metrical syllables
    analysis["numOfMetricalSyllables"] = get_numOfSyllables(poem_uri, syllable_type="metrical")
    
    #overall count of grammatical syllables
    analysis["numOfGrammaticalSyllables"] = get_numOfSyllables(poem_uri, syllable_type="grammatical")
    
    #metrical syllables in lines of stanzas
    analysis["numOfMetricalSyllablesInStanzas"] = get_numOfSyllables_in_stanzas(poem_uri, syllable_type="metrical")
    
    #grammatical syllables in lines of stanzas
    analysis["numOfGrammaticalSyllablesInStanzas"] = get_numOfSyllables_in_stanzas(poem_uri, syllable_type="grammatical")
    
    #or maybe put them together into one dictionary:
    # "numOfSyllablesInStanzas" : {"metricalSyllables" : [[],[]] , "grammaticalSyllables" : [[],[]] }
    
    #Words in stanzas
    analysis["numOfWordsInStanzas"] = get_numOfWords_in_stanzas(poem_uri)
    
    #grammatical stress
    analysis["grammaticalStressPatternsInStanzas"] = get_grammaticalStressPatterns_in_stanza(poem_uri)
    
    #meter
    analysis["metricalPatternsInStanzas"] = get_metricalPatterns_in_stanza(poem_uri)
    
    
    return analysis

The analysis should carry at least some provenance information, i.e. the ID/URI of the scansion it is based on. For now this is the `source` entry.
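A comment in `get_poem_analysis` suggests grouping both syllable counts under a single `numOfSyllablesInStanzas` key. The sketch below performs that restructuring as a post-processing step on the returned dict. The key names `metricalSyllables` and `grammaticalSyllables` come from that comment; `merge_syllable_counts` is hypothetical.

```python
def merge_syllable_counts(analysis: dict) -> dict:
    """Return a copy of `analysis` with both per-stanza syllable lists under one key."""
    merged = dict(analysis)  # shallow copy, so the input dict keeps its original keys
    merged["numOfSyllablesInStanzas"] = {
        "metricalSyllables": merged.pop("numOfMetricalSyllablesInStanzas"),
        "grammaticalSyllables": merged.pop("numOfGrammaticalSyllablesInStanzas"),
    }
    return merged


analysis = {
    "numOfStanzas": 4,
    "numOfMetricalSyllablesInStanzas": [[11, 11, 11, 11], [11, 11, 11, 11], [11, 11, 11], [11, 11, 11]],
    "numOfGrammaticalSyllablesInStanzas": [[12, 12, 13, 12], [12, 11, 11, 11], [11, 11, 12], [12, 13, 13]],
}
merged = merge_syllable_counts(analysis)
print(merged["numOfSyllablesInStanzas"]["metricalSyllables"][2])  # [11, 11, 11]
```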

In [148]:
%%time
get_poem_analysis(poem_uri)

CPU times: user 20.3 ms, sys: 2.56 ms, total: 22.8 ms
Wall time: 3.81 s


{'source': {'uri': 'http://postdata.linhd.uned.es/resource/sc_carlos-mendoza_noviembre_disco2-1_1645475669320137'},
 'numOfStanzas': 4,
 'numOfLines': 14,
 'numOfWords': 87,
 'numOfLinesInStanzas': [4, 4, 3, 3],
 'rhymeSchemesOfStanzas': ['abba', 'abba', 'a-a', 'a-a'],
 'numOfMetricalSyllables': 154,
 'numOfGrammaticalSyllables': 166,
 'numOfMetricalSyllablesInStanzas': [[11, 11, 11, 11],
  [11, 11, 11, 11],
  [11, 11, 11],
  [11, 11, 11]],
 'numOfGrammaticalSyllablesInStanzas': [[12, 12, 13, 12],
  [12, 11, 11, 11],
  [11, 11, 12],
  [12, 13, 13]],
 'numOfWordsInStanzas': [[4, 8, 7, 5], [8, 5, 5, 5], [7, 6, 6], [7, 8, 6]],
 'grammaticalStressPatternsInStanzas': [['--+---+---+-',
   '+-+---+-+-+-',
   '-+-+---+---+-',
   '-+----+-+-+-'],
  ['+-+---+---+-', '---+---+-+-', '-+---+---+-', '-+---+---+-'],
  ['++---+---+-', '-+---+---+-', '+---+-+---+-'],
  ['-+---++-+-+-', '--+--+-----+-', '+-+---+----+-']],
 'metricalPatternsInStanzas': [['--+--+---+-',
   '+-+--+-+-+-',
   '+-+--+---+-',

### Sample Data to test with GraphQL

In [135]:
def get_poem_with_analysis(poem_uri:str, **kwargs) -> dict:
    """Returns data on poem with analysis included
    """
    poem_data = get_poem_metadata(poem_uri, **kwargs)
    poem_data["analysis"] = get_poem_analysis(poem_uri)
    return poem_data

In [136]:
%%time
get_poem_with_analysis(poem_uri)

CPU times: user 21.3 ms, sys: 2.92 ms, total: 24.2 ms
Wall time: 3.79 s


{'id': 'pd_c64285d2',
 'uri': 'http://postdata.linhd.uned.es/resource/pw_carlos-mendoza_noviembre',
 'name': 'carlos-mendoza_noviembre',
 'title': 'Noviembre',
 'authors': [{'name': 'Carlos Mendoza',
   'uri': 'http://postdata.linhd.uned.es/resource/p_carlos-mendoza'}],
 'source': 'POSTDATA Poetry Lab',
 'sourceUrl': 'http://poetry.linhd.uned.es:3000/en/author/carlos-mendoza/poetic-work/noviembre',
 'analysis': {'source': {'uri': 'http://postdata.linhd.uned.es/resource/sc_carlos-mendoza_noviembre_disco2-1_1645475669320137'},
  'numOfStanzas': 4,
  'numOfLines': 14,
  'numOfWords': 87,
  'numOfLinesInStanzas': [4, 4, 3, 3],
  'rhymeSchemesOfStanzas': [],
  'numOfMetricalSyllables': 154,
  'numOfGrammaticalSyllables': 166,
  'numOfMetricalSyllablesInStanzas': [[11, 11, 11, 11],
   [11, 11, 11, 11],
   [11, 11, 11],
   [11, 11, 11]],
  'numOfGrammaticalSyllablesInStanzas': [[12, 12, 13, 12],
   [12, 11, 11, 11],
   [11, 11, 12],
   [12, 13, 13]],
  'numOfWordsInStanzas': [[4, 8, 7, 5], [8

In [None]:
%%time
#with wikidata
get_poem_with_analysis("http://postdata.linhd.uned.es/resource/pw_lope-de-vega_1-al-sujeto-de-la-dama-que-le-dijo-dios-le-provea",include_wikidata=True)

In [None]:
%%time
poem_ids = get_poem_uris()

In [None]:
%%time
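errors = []
poems_with_analysis = []
#use a separate loop variable so the global poem_uri used above is not overwritten
for uri in poem_ids:
    try:
        poems_with_analysis.append(get_poem_with_analysis(uri))
    except Exception:
        #collect failing poems instead of aborting the run; a bare except would also swallow KeyboardInterrupt
        errors.append(uri)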

In [None]:
with open("poems_analysis_example.json", "w", encoding='utf-8') as outfile:
    json.dump(poems_with_analysis, outfile, ensure_ascii=False)

In [None]:
len(errors)

#### Testing filters on this data

In [None]:
%%time
#filter for poems with 4 Stanzas
filtered = list(filter(lambda item: item["analysis"]["numOfStanzas"] == 4 , poems_with_analysis))

In [None]:
len(filtered)

In [None]:
filtered[0]

In [None]:
%%time

#filter for sonnet stanza structure
filtered = list(filter(lambda item: item["analysis"]["numOfLinesInStanzas"] == [4,4,3,3] , poems_with_analysis))

In [None]:
len(filtered)

In [None]:
filtered[0]

In [None]:
%%time
#Petrarchan sonnet rhyme scheme (abba abba cdc dcd)
filtered = list(filter(lambda item: item["analysis"]["rhymeSchemesOfStanzas"] == ['abba', 'abba', 'cdc', 'dcd'] , poems_with_analysis))

In [None]:
len(filtered)

In [None]:
#abba abba with no rhyme detected in the tercets ('-' appears to mark an unrhymed line)
filtered = list(filter(lambda item: item["analysis"]["rhymeSchemesOfStanzas"] == ['abba', 'abba', '---', '---'] , poems_with_analysis))

In [None]:
len(filtered)

In [None]:
#iambic tetrameter somewhere in the first stanza (a line whose metrical pattern is exactly "-+-+-+-+-")
filtered = list(filter(lambda item: "-+-+-+-+-" in item["analysis"]["metricalPatternsInStanzas"][0] , poems_with_analysis))

In [None]:
len(filtered)

In [None]:
filtered
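The filter cells above repeat `list(filter(lambda item: ...))` with a different condition each time. Below is a sketch of a reusable helper over the `analysis` sub-dict. It also includes a line-level check that compares each metrical pattern either exactly, as the `in` test on the list does, or as a substring. `filter_by_analysis` and `has_line_pattern` are hypothetical names, and the two records are illustrative only, not real poems.

```python
from typing import Callable


def filter_by_analysis(poems: list, key: str, predicate: Callable) -> list:
    """Keep poems whose analysis[key] satisfies predicate; poems lacking the key are skipped."""
    return [p for p in poems
            if key in p.get("analysis", {}) and predicate(p["analysis"][key])]


def has_line_pattern(stanzas: list, pattern: str, stanza_index: int = 0,
                     exact: bool = True) -> bool:
    """True if some line of stanzas[stanza_index] matches pattern, exactly or as a substring."""
    if stanza_index >= len(stanzas):
        return False  # guards against poems with fewer stanzas (or an empty result)
    return any((line == pattern) if exact else (pattern in line)
               for line in stanzas[stanza_index])


# Illustrative records only (not real poems)
poems = [
    {"id": "a", "analysis": {"numOfLinesInStanzas": [4, 4, 3, 3],
                             "metricalPatternsInStanzas": [["-+-+-+-+-"]]}},
    {"id": "b", "analysis": {"numOfLinesInStanzas": [4, 4],
                             "metricalPatternsInStanzas": [["--+-+-+-+-+-"]]}},
]
sonnets = filter_by_analysis(poems, "numOfLinesInStanzas", lambda v: v == [4, 4, 3, 3])
print([p["id"] for p in sonnets])  # ['a']
exact = filter_by_analysis(poems, "metricalPatternsInStanzas",
                           lambda s: has_line_pattern(s, "-+-+-+-+-"))
print([p["id"] for p in exact])  # ['a']
partial = filter_by_analysis(poems, "metricalPatternsInStanzas",
                             lambda s: has_line_pattern(s, "-+-+-+-+-", exact=False))
print([p["id"] for p in partial])  # ['a', 'b']
```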