## Debates to RDF

This notebook is draft code that updates the Akoma Ntoso (AKN) debates XML to reflect changes made to the ontology, then extracts metadata to RDF.

For more information on the ontology, read the [Debates section of the wiki](https://github.com/Oireachtas/ontology/wiki/Debates).

I am making the changes to [the AKN file](../debates/AK-dail-2015-11-12.xml) in the debates folder and saving the result as a new file.

In [31]:
import re
import json
from lxml import etree
from rdflib import URIRef, Literal, Namespace, Graph
from rdflib.namespace import RDF, OWL, SKOS, DCTERMS, XSD, RDFS, FOAF

In [3]:
AKN = {"akn": "http://docs.oasis-open.org/legaldocml/ns/akn/3.0/CSD13"}
xml = "../debates/AK-dail-2015-11-12.xml"
baseURI = "http://oireachtas.ie"

### Metadata elements


As the file contains data for the oral part of the debate (writtens are stored separately), the FRBRWork attributes need to be updated. See the [Metadata](https://github.com/Oireachtas/ontology/wiki/Debates#metadata) section of the wiki for the specification.

Note that under the Akoma Ntoso naming convention, the "@" character in an FRBRExpression URI marks it as the original expression of the work. This is not strictly true of revised Official Reports, but we currently have no way of telling the difference, so the original expression of a debate is whatever this file turns out to be.

The original FRBRExpression URIs have a language value of ``eng``. This should be ``mul``, because it is not (easily) possible to determine whether a debate is in English or Irish.

In [48]:
# Raw string, so that \d is not treated as an invalid string escape
regex = re.compile(r"\d{4}-\d{2}-\d{2}(?!/debate|/writtens)")

root = etree.parse(xml).getroot()

work = root.find(".//{*}FRBRWork")
# Create FRBRname in the AKN namespace so that it matches its siblings
etree.SubElement(work, "{%s}FRBRname" % AKN["akn"], {"value": "debate"})


name = root.find(".//{*}FRBRWork/{*}FRBRname").attrib['value']
for uri in root.xpath(".//akn:identification/*//*[starts-with(@value, '/akn')]", namespaces=AKN):
    print(re.sub("{.+}", "", uri.getparent().tag)+ "/" + re.sub("{.+}", "", uri.tag) )
    value = uri.attrib['value'].replace("eng@", "mul@")
    print("Original:", value)
    span = regex.search(value).span()
    uri.attrib['value'] = value[:span[1]] + "/" + name + value[span[1]:]
    print("New:", uri.attrib['value'], "\n---")

FRBRWork/FRBRthis
Original: /akn/ie/debateRecord/dail/2015-11-12/main
New: /akn/ie/debateRecord/dail/2015-11-12/debate/main 
---
FRBRWork/FRBRuri
Original: /akn/ie/debateRecord/dail/2015-11-12
New: /akn/ie/debateRecord/dail/2015-11-12/debate 
---
FRBRExpression/FRBRthis
Original: /akn/ie/debateRecord/dail/2015-11-12/mul@/main
New: /akn/ie/debateRecord/dail/2015-11-12/debate/mul@/main 
---
FRBRExpression/FRBRuri
Original: /akn/ie/debateRecord/dail/2015-11-12/mul@
New: /akn/ie/debateRecord/dail/2015-11-12/debate/mul@ 
---
FRBRManifestation/FRBRthis
Original: /akn/ie/debateRecord/dail/2015-11-12/mul@/main.xml
New: /akn/ie/debateRecord/dail/2015-11-12/debate/mul@/main.xml 
---
FRBRManifestation/FRBRuri
Original: /akn/ie/debateRecord/dail/2015-11-12/mul@.akn
New: /akn/ie/debateRecord/dail/2015-11-12/debate/mul@.akn 
---
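
The rewrite above can also be written as a small pure function, which makes it easy to check against the printed output. This is only a sketch: `insert_doc_name` is a hypothetical helper name, and it reuses the notebook's regular expression and the `eng@` to `mul@` replacement. Unlike the cell above, it returns the value unchanged when no bare date is found, so re-running it on an already rewritten URI is harmless.

```python
import re

# Same pattern as the notebook: an ISO date that is not already followed by
# a document-name segment.
DATE = re.compile(r"\d{4}-\d{2}-\d{2}(?!/debate|/writtens)")


def insert_doc_name(value, name="debate"):
    """Set the language to mul and insert /<name> straight after the date."""
    value = value.replace("eng@", "mul@")
    match = DATE.search(value)
    if match is None:
        # Already rewritten, or not a dated URI: leave it alone.
        return value
    end = match.end()
    return value[:end] + "/" + name + value[end:]


print(insert_doc_name("/akn/ie/debateRecord/dail/2015-11-12/eng@/main.xml"))
```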


Add a heading to the Prelude debateSection. This heading is displayed on the website but is not in the original XML.

In [49]:
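prelude = root.xpath(".//akn:debateSection[@name='prelude']", namespaces=AKN)[0]
heading = etree.SubElement(prelude, "{%s}heading" % AKN["akn"])  # AKN namespace, like its siblings
heading.text = "Prelude"
prelude.insert(0, heading)  # SubElement appends last; move the heading to the front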

### TLCPerson references

In the original AKN, TLCPerson references point to the OIR:Member URI. Pointing instead to the specific org:Membership (of a Dáil or Seanad) would make it easier to link to other information the website needs, such as constituency and party, which would otherwise require a more expensive query within date ranges. The information associated with OIR:Member is then only one step away. There may be a cost, however, when searching for speeches by a Member across multiple houses, so it would be worthwhile testing this over a larger set of debate files.

In [50]:
for person in root.xpath(".//akn:meta/akn:references/akn:TLCPerson", namespaces=AKN):
    person.attrib['href'] = person.attrib['href'] + "/dail/31"
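
One thing to watch in a notebook: re-running the cell above appends `/dail/31` a second time. A minimal sketch of an idempotent version follows. `with_membership` is a hypothetical helper, and the sample href is made up for illustration.

```python
def with_membership(href, house="dail", term="31"):
    """Append /<house>/<term> to a member href unless it is already there."""
    suffix = "/{}/{}".format(house, term)
    if href.endswith(suffix):
        # Already points at the membership: nothing to do.
        return href
    return href.rstrip("/") + suffix


# Illustrative href only; real values come from the TLCPerson elements.
once = with_membership("/ie/oireachtas/member/Example-Member.D.2011-03-09")
twice = with_membership(once)
print(once)
print(once == twice)
```

In the loop above this would read `person.attrib['href'] = with_membership(person.attrib['href'])`.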

### Unmatched Members
Unmatched member URIs still need to be audited; I thought I had already fixed them.
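
As a starting point for that audit, the sketch below lists every TLCPerson whose href still contains `unmatched`, rather than looking only at the first one. It uses the standard library's `xml.etree.ElementTree` so that it runs on its own. The sample document, its hrefs and the `unmatched_people` name are all invented for illustration.

```python
import xml.etree.ElementTree as ET

AKN_NS = "http://docs.oasis-open.org/legaldocml/ns/akn/3.0/CSD13"

# Invented sample; a real run would parse the debate file instead.
SAMPLE = """<akomaNtoso xmlns="http://docs.oasis-open.org/legaldocml/ns/akn/3.0/CSD13">
  <debate><meta><references source="#source">
    <TLCPerson eId="MichaelKitt" href="/ie/oireachtas/member/unmatchedMember" showAs="Mr. Michael Kitt"/>
    <TLCPerson eId="ExampleMember" href="/ie/oireachtas/member/Example-Member" showAs="Example Member"/>
  </references></meta></debate>
</akomaNtoso>"""


def unmatched_people(root):
    """Return (eId, showAs) for every TLCPerson whose href contains 'unmatched'."""
    tag = "{%s}TLCPerson" % AKN_NS
    return [
        (p.get("eId"), p.get("showAs"))
        for p in root.iter(tag)
        if "unmatched" in p.get("href", "")
    ]


print(unmatched_people(ET.fromstring(SAMPLE)))
```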


In [43]:
with open("../data/members.json", "r") as f:
    memberLU = {m['pId']: m['eId'] for m in json.load(f)}

In [51]:
# The unmatched Michael Kitt is Michael P. Kitt, whose pId is MichaelPKitt
unm = root.xpath(".//akn:TLCPerson[contains(@href, 'unmatched')]", namespaces=AKN)[0]
unm.attrib['href'] = unm.attrib['href'].replace("unmatchedMember", memberLU['MichaelPKitt'])
unm.attrib['showAs'] = "Mr. Michael P. Kitt"
print(unm.attrib)

{'eId': 'MichaelKitt', 'href': '/ie/oireachtas/member//member/Michael-P-Kitt.D.1975-03-04/dail/31', 'showAs': 'Mr. Michael P. Kitt'}
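
The changes are meant to be saved as a new file, but no cell shows that step. The sketch below is one way it might look, using the standard library's `xml.etree.ElementTree` as a stand-in so that it runs without lxml. The output file name, the tiny sample document and `save_copy` are assumptions, not part of the notebook.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

AKN_NS = "http://docs.oasis-open.org/legaldocml/ns/akn/3.0/CSD13"
# Keep AKN as the default namespace on output instead of an ns0: prefix.
ET.register_namespace("", AKN_NS)


def save_copy(root, path):
    """Write the (modified) tree to a new file, leaving the source untouched."""
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)


# Tiny stand-in document; the notebook would pass its own root instead.
root = ET.fromstring(
    '<akomaNtoso xmlns="%s"><debate name="sample"/></akomaNtoso>' % AKN_NS
)
# Hypothetical output name, written to a temporary directory for the demo.
out_path = os.path.join(tempfile.mkdtemp(), "AK-dail-2015-11-12-updated.xml")
save_copy(root, out_path)

# Parse the copy back to confirm it is well-formed and still in the AKN namespace.
reloaded = ET.parse(out_path).getroot()
print(reloaded.tag)
```

With lxml, the equivalent call would be `etree.ElementTree(root).write(path, encoding="UTF-8", xml_declaration=True)`.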


### Converting to RDF

When converting to RDF, FRBR elements map to their RDA equivalents. I'm mapping only the FRBRuri elements for now. 

ToDo: Extend the ontology to cover both contributors (those listed as TLCPerson) and speakers (those identified in speech nodes).

In [85]:
g = Graph()

In [86]:
OIR = Namespace("http://oireachtas.ie/ontology#")
# RDA class URIs have the form http://rdaregistry.info/Elements/c/C10001
RDA = Namespace("http://rdaregistry.info/Elements/c/")
METALEX = Namespace("http://www.metalex.eu/metalex/2008-05-02#")

In [87]:
workURI = baseURI+root.xpath(".//akn:FRBRWork/akn:FRBRuri/@value", 
                                  namespaces=AKN)[0]
work = URIRef(workURI)

In [88]:
# C10001 is RDA:Work
g.add(( work, 
       RDF.type, 
       RDA.C10001))
# C10006 is RDA:Expression
g.add(( URIRef(baseURI+root.xpath(".//akn:FRBRExpression/akn:FRBRuri/@value", 
                                  namespaces=AKN)[0]), 
       RDF.type, 
       RDA.C10006))
# C10007 is RDA:Manifestation
g.add(( URIRef(baseURI+root.xpath(".//akn:FRBRManifestation/akn:FRBRuri/@value", 
                                  namespaces=AKN)[0]), 
       RDF.type, 
       RDA.C10007))
g.add(( work, 
       RDF.type, 
       OIR.Debate))
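
The three typing statements above follow one pattern, so they could be driven from a small table. A sketch, assuming the same FRBR to RDA class codes as in the comments above. It uses the standard library parser and returns plain `(uri, class code)` pairs so that it runs without rdflib; `frbr_types` and the sample document are illustrative only.

```python
import xml.etree.ElementTree as ET

AKN = {"akn": "http://docs.oasis-open.org/legaldocml/ns/akn/3.0/CSD13"}
BASE_URI = "http://oireachtas.ie"

# FRBR level -> RDA class code, as in the comments of the cell above.
FRBR_TO_RDA = {
    "FRBRWork": "C10001",
    "FRBRExpression": "C10006",
    "FRBRManifestation": "C10007",
}

# Cut-down sample using the rewritten URIs printed earlier in the notebook.
SAMPLE = """<akomaNtoso xmlns="http://docs.oasis-open.org/legaldocml/ns/akn/3.0/CSD13">
  <debate><meta><identification source="#source">
    <FRBRWork><FRBRuri value="/akn/ie/debateRecord/dail/2015-11-12/debate"/></FRBRWork>
    <FRBRExpression><FRBRuri value="/akn/ie/debateRecord/dail/2015-11-12/debate/mul@"/></FRBRExpression>
    <FRBRManifestation><FRBRuri value="/akn/ie/debateRecord/dail/2015-11-12/debate/mul@.akn"/></FRBRManifestation>
  </identification></meta></debate>
</akomaNtoso>"""


def frbr_types(root):
    """Return (absolute URI, RDA class code) for each FRBR level that is present."""
    pairs = []
    for level, rda_class in FRBR_TO_RDA.items():
        uri = root.find(".//akn:%s/akn:FRBRuri" % level, AKN)
        if uri is not None:
            pairs.append((BASE_URI + uri.get("value"), rda_class))
    return pairs


pairs = frbr_types(ET.fromstring(SAMPLE))
for uri, rda_class in pairs:
    print(uri, rda_class)
```

Each pair would then become `g.add((URIRef(uri), RDF.type, RDA[rda_class]))` in the notebook.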

### Member roles in debates

ToDo: these should be object properties, not classes.

Members are participants (OIR:debateParticipantOf) in a debate if they are recorded as speaking, voting or (in the case of a committee) attending. Members who are participants also belong to one of three categories: OIR:speakerOf, OIR:voterOf or OIR:attendeeOf. A voter is one of OIR:TáVoter, OIR:NílVoter or OIR:StaonVoter.



In [98]:
for member in root.xpath(".//akn:TLCPerson/@href", namespaces=AKN):
    memberURI = URIRef(baseURI+member)
    g.add(( work, METALEX.participant, memberURI ))
    g.add(( memberURI, METALEX.participantOf, work ))

In [96]:
for dbs in root.xpath(".//akn:debateSection[./akn:speech]", namespaces=AKN):
    # Distinct speaker references in this section, e.g. "#MichaelKitt"
    spkrs = set(dbs.xpath(".//akn:speech/@by", namespaces=AKN))
    for spkr in spkrs:
        # Strip the leading "#" to get the TLCPerson eId
        href = root.xpath(".//akn:TLCPerson[@eId='{}']/@href".format(spkr[1:]), namespaces=AKN)
        # Skip speakers with no TLCPerson entry, or an ambiguous one
        if len(href) == 1:
            dbsURI = URIRef(workURI+"/"+dbs.attrib['eId'])
            spkrURI = URIRef(baseURI+href[0])
            g.add(( spkrURI, OIR.speakerOf, dbsURI))
            g.add(( dbsURI, OIR.speaker, spkrURI ))
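
The cell above runs one XPath query over the whole document for every speaker. Over a larger set of debate files it may be cheaper to build the eId to href lookup once. The sketch below does that with the standard library parser; the sample document, its eIds and `speakers_by_section` are invented. One difference to note: a duplicated eId would silently keep the last href here, whereas the cell above skips it.

```python
import xml.etree.ElementTree as ET

NS = "{http://docs.oasis-open.org/legaldocml/ns/akn/3.0/CSD13}"

# Invented sample with one known speaker and one unresolvable reference.
SAMPLE = """<akomaNtoso xmlns="http://docs.oasis-open.org/legaldocml/ns/akn/3.0/CSD13">
  <debate>
    <meta><references source="#source">
      <TLCPerson eId="MemberA" href="/ie/oireachtas/member/Member-A/dail/31" showAs="Member A"/>
    </references></meta>
    <debateBody>
      <debateSection eId="dbsect_2" name="debate">
        <speech by="#MemberA"><p>First contribution.</p></speech>
        <speech by="#MemberA"><p>Second contribution.</p></speech>
        <speech by="#Unknown"><p>Speaker with no TLCPerson entry.</p></speech>
      </debateSection>
    </debateBody>
  </debate>
</akomaNtoso>"""


def speakers_by_section(root):
    """Map each debateSection eId to the sorted hrefs of its resolvable speakers."""
    # Build the eId -> href lookup once instead of one query per speaker.
    people = {p.get("eId"): p.get("href") for p in root.iter(NS + "TLCPerson")}
    result = {}
    for section in root.iter(NS + "debateSection"):
        # Like the XPath predicate above: only sections with a direct speech child.
        if section.find(NS + "speech") is None:
            continue
        refs = {s.get("by", "").lstrip("#") for s in section.iter(NS + "speech")}
        hrefs = sorted(people[r] for r in refs if r in people)
        if hrefs:
            result[section.get("eId")] = hrefs
    return result


print(speakers_by_section(ET.fromstring(SAMPLE)))
```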
            

In [108]:
for dbs in root.xpath(".//akn:debateSection[@name='ta']", namespaces=AKN):
    voteURI = URIRef(workURI+"/"+dbs.getparent().attrib['eId'])
    voters = [p[1:] for p in dbs.xpath(".//akn:person/@refersTo", namespaces=AKN)]
    for voter in voters:
        href = root.xpath(".//akn:TLCPerson[@eId='{}']/@href".format(voter), namespaces=AKN)
        voterURI = URIRef(baseURI+href[0])
        g.add(( voterURI, OIR.voterOf, voteURI))
        g.add(( voteURI, OIR.voter, voterURI ))
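
The cell above covers only the `ta` section and would raise an `IndexError` on a voter with no TLCPerson entry. The sketch below shows how the same walk might cover all three lobbies and collect unresolved references instead of failing. The section names `nil` and `staon`, the `division` wrapper, the eIds and `voters_by_division` are assumptions for illustration; only `ta` appears in the notebook.

```python
import xml.etree.ElementTree as ET

NS = "{http://docs.oasis-open.org/legaldocml/ns/akn/3.0/CSD13}"

# Invented sample: one division with a Tá and a Níl list.
SAMPLE = """<akomaNtoso xmlns="http://docs.oasis-open.org/legaldocml/ns/akn/3.0/CSD13">
  <debate>
    <meta><references source="#source">
      <TLCPerson eId="MemberA" href="/ie/oireachtas/member/Member-A/dail/31" showAs="Member A"/>
      <TLCPerson eId="MemberB" href="/ie/oireachtas/member/Member-B/dail/31" showAs="Member B"/>
    </references></meta>
    <debateBody>
      <debateSection eId="vote_1" name="division">
        <debateSection eId="vote_1-ta" name="ta">
          <p><person refersTo="#MemberA">Member A</person></p>
        </debateSection>
        <debateSection eId="vote_1-nil" name="nil">
          <p><person refersTo="#MemberB">Member B</person>
             <person refersTo="#Missing">Unknown</person></p>
        </debateSection>
      </debateSection>
    </debateBody>
  </debate>
</akomaNtoso>"""


def voters_by_division(root, section_names=("ta", "nil", "staon")):
    """Return (votes, unresolved) as lists of (division eId, lobby, href or ref)."""
    people = {p.get("eId"): p.get("href") for p in root.iter(NS + "TLCPerson")}
    # ElementTree has no getparent(), so record each element's parent first.
    parents = {child: parent for parent in root.iter() for child in parent}
    votes, unresolved = [], []
    for section in root.iter(NS + "debateSection"):
        name = section.get("name")
        if name not in section_names:
            continue
        division = parents[section].get("eId")
        for person in section.iter(NS + "person"):
            ref = person.get("refersTo", "").lstrip("#")
            if ref in people:
                votes.append((division, name, people[ref]))
            else:
                unresolved.append((division, name, ref))
    return votes, unresolved


votes, unresolved = voters_by_division(ET.fromstring(SAMPLE))
print(votes)
print(unresolved)
```

Each `votes` tuple could then become a pair of `g.add` calls as in the cell above, and `unresolved` gives a list to audit.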