# DataDictionary_RDF_development

This notebook steps through the development of a method to convert a UKDS DataDictionary .rtf file to an RDF file.

## Initial setup

### Import packages

In [1]:
import os, ukds
import pandas as pd

### Set filepaths

This sets a filepath to an example data dictionary on a local file system, in this case the 'uktus15_household_ukda_data_dictionary.rtf' file.

In [2]:
base_dir=os.path.join(*[os.pardir]*4,r'_Data\United_Kingdom_Time_Use_Survey_2014-2015\UKDA-8128-tab')
dd_fp=os.path.join(base_dir,r'mrdoc\allissue\uktus15_household_ukda_data_dictionary.rtf')

### Create DataDictionary

A ukds.DataDictionary instance is created and the .rtf file is read into it.

In [3]:
dd=ukds.DataDictionary()
dd.read_rtf(dd_fp)

### Number of variables

In [4]:
print('Number of variables in the file is:', len(dd.variable_list))

Number of variables in the file is: 335


### First two variables and their metadata

Shows the first two variables and their associated data as given in the `variable_list` attribute.

In [5]:
dd.variable_list[0:2]

[{'pos': '1',
  'variable': 'serial',
  'variable_label': 'Household number',
  'variable_type': 'numeric',
  'SPSS_measurement_level': 'SCALE',
  'SPSS_user_missing_values': '',
  'value_labels': ''},
 {'pos': '2',
  'variable': 'strata',
  'variable_label': 'Strata',
  'variable_type': 'numeric',
  'SPSS_measurement_level': 'SCALE',
  'SPSS_user_missing_values': '',
  'value_labels': {-2.0: 'Schedule not applicable'}}]

### First five variables and their metadata

Shows the first five variables and their associated data as a pandas DataFrame.

In [6]:
df=pd.DataFrame(data=dd.variable_list)
df=df[dd.variable_list[0].keys()]
df.head()

| | pos | variable | variable_label | variable_type | SPSS_measurement_level | SPSS_user_missing_values | value_labels |
|---|---|---|---|---|---|---|---|
| 0 | 1 | serial | Household number | numeric | SCALE | | |
| 1 | 2 | strata | Strata | numeric | SCALE | | {-2.0: 'Schedule not applicable'} |
| 2 | 3 | psu | Primary sampling unit | numeric | SCALE | | {-2.0: 'Schedule not applicable'} |
| 3 | 4 | HhOut | Final outcome - household | numeric | SCALE | | {0.0: 'Outstanding', 640.0: 'Unknown whether a... |
| 4 | 5 | hh_wt | Household weight | numeric | SCALE | | |


## Discussion

### Aim

The aim here is to take the DataDictionary instance and convert it into RDF data. This will make it easier to query and to combine with the other UKDS data tables and data dictionaries.

The RDFlib Python package is used for working with RDF data.

The aim can be refined to: **create a method for the DataDictionary class that takes an RDFlib Graph instance as an argument and returns the same Graph instance populated with the DataDictionary variable_list data.**

### Sample call

Sample code could look like:

```python
import rdflib
g=rdflib.Graph()
g=dd.to_rdf(g) # dd is a DataDictionary instance
```

### Format of RDF file

The sample code above would output RDF data in the form of an RDFlib Graph instance. Assuming the initial graph was empty, what would the returned RDF data look like?

The proposal is that the RDF file would look as shown below. This shows the data in Turtle (.ttl) format for the first variable *serial*:

```turtle
@prefix o8128: <http://purl.org/berg/ontology/10.5255/UKDA-SN-8128-1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ukds: <http://purl.org/berg/ontology/UKDS/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

o8128:serial a rdf:Property ;
    ukds:SPSS_measurement_level "SCALE" ;
    ukds:pos 1 ;
    ukds:variable "serial" ;
    ukds:variable_label "Household number" ;
    ukds:variable_type "numeric" .
```

Here each variable is designated as an rdf:Property (shown above as `o8128:serial a rdf:Property`). These properties will then be used as predicates when describing the UKDS data tables (for example, we might say that the first household in a .tab file has a *serial* value of "11010903" using the triple `_:household1 o8128:serial "11010903"`).
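As a rough illustration of how these properties could later serve as predicates, the sketch below turns one row of a data table into Turtle text. The function name `row_to_turtle`, the blank node label and the sample row are assumptions made for illustration only; they are not part of the ukds package, and the values are written as plain string literals without any escaping.

```python
def row_to_turtle(row, subject, prefix='o8128'):
    """Return Turtle text describing one data-table row.

    row: dict mapping variable names to values (hypothetical sample data)
    subject: a Turtle subject term, e.g. the blank node label '_:household1'
    prefix: the prefix of the Data Dictionary namespace
    """
    # each variable name becomes a predicate in the Data Dictionary namespace
    pairs = ['%s:%s "%s"' % (prefix, variable, value)
             for variable, value in row.items()]
    return '%s %s .' % (subject, ' ;\n\t'.join(pairs))


# hypothetical first row of a .tab file, with the values kept as strings
sample_row = {'serial': '11010903', 'strata': '-2'}
print(row_to_turtle(sample_row, '_:household1'))
```

The subject here is a blank node; a real conversion of the .tab files might instead assign each household its own URI.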

### Namespaces

#### `o8128: <http://purl.org/berg/ontology/10.5255/UKDA-SN-8128-1/>`

To create RDF data about the *serial* variable, the variable needs to be an RDF resource with a URI assigned to it. As the UK Data Service does not currently provide URIs for its variables, a custom one is needed. The domain should be under the control of the URI creators, so the base URI `http://purl.org/berg` is used. As this is a schema or ontology description (of the underlying .tab file data), it is placed under the `ontology` path segment. Finally, an identifier for the data dictionary is appended, based on the DOI of the UKDS study `10.5255/UKDA-SN-8128-1`. The prefix used is "o8128", where "o" refers to "ontology".
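The naming scheme described above could be sketched as a small helper that derives the prefix and the namespace URI from a study DOI. The function name `dd_namespace_from_doi` is hypothetical, and the sketch assumes that the DOI always ends in the form `UKDA-SN-<study number>-<version>`.

```python
def dd_namespace_from_doi(doi, base='http://purl.org/berg/ontology/'):
    """Return (prefix, uri) for a study DOI such as '10.5255/UKDA-SN-8128-1'."""
    # the study number is the third hyphen-separated part of 'UKDA-SN-8128-1'
    study_number = doi.split('/')[-1].split('-')[2]
    prefix = 'o' + study_number   # "o" refers to "ontology"
    uri = base + doi + '/'        # trailing slash so variable names can be appended
    return prefix, uri


prefix, uri = dd_namespace_from_doi('10.5255/UKDA-SN-8128-1')
print(prefix)           # o8128
print(uri + 'serial')   # http://purl.org/berg/ontology/10.5255/UKDA-SN-8128-1/serial
```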

#### `ukds: <http://purl.org/berg/ontology/UKDS/>`

All UKDS Data Dictionary variables have a number of properties in common, such as their position (*pos*) or their variable label (*variable_label*). These properties are defined in a separate ontology `http://purl.org/berg/ontology/UKDS/` which can be referred to in the RDF files of all UKDS data dictionaries.

## Developing the method

### Imports

In [7]:
import rdflib
from rdflib.namespace import RDF

### Set up input variables

In [8]:
g=rdflib.Graph() # an empty graph
dd_prefix='o8128' # the prefix for the Data Dictionary uri
dd_uri=r'http://purl.org/berg/ontology/10.5255/UKDA-SN-8128-1/' # the Data Dictionary uri

### Set up namespaces and bind them to the graph

In [9]:
dd_namespace=rdflib.Namespace(dd_uri)
ukds_namespace=rdflib.Namespace(r'http://purl.org/berg/ontology/UKDS/')
g.bind(dd_prefix,dd_namespace), g.bind('ukds',ukds_namespace)

(None, None)

### Function to add variable_list data to graph

In [10]:
def add_variable_list_data(pos,
                           variable,
                           variable_label,
                           variable_type,
                           SPSS_measurement_level,
                           SPSS_user_missing_values,
                           value_labels):
    """Adds the data from a variable_list variable to the RDFlib graph
    
    """
    
    g.add((dd_namespace[variable],RDF.type,RDF.Property))
    g.add((dd_namespace[variable],ukds_namespace.pos,rdflib.Literal(int(pos))))
    g.add((dd_namespace[variable],ukds_namespace.variable,rdflib.Literal(variable)))
    g.add((dd_namespace[variable],ukds_namespace.variable_label,rdflib.Literal(variable_label)))
    g.add((dd_namespace[variable],ukds_namespace.variable_type,rdflib.Literal(variable_type)))
    g.add((dd_namespace[variable],ukds_namespace.SPSS_measurement_level,rdflib.Literal(SPSS_measurement_level)))
    
    if SPSS_user_missing_values:
        for x in SPSS_user_missing_values.split(','):
            g.add((dd_namespace[variable],ukds_namespace.SPSS_user_missing_values,rdflib.Literal(x)))
    
    if value_labels:
        for k,v in value_labels.items():
            a=rdflib.BNode()
            g.add((dd_namespace[variable],ukds_namespace.value_labels,a))
            g.add((a,ukds_namespace.label,rdflib.Literal(v)))
            g.add((a,ukds_namespace.value,rdflib.Literal(str(k))))

#### Test on the *serial* variable

In [11]:
g.update("DELETE WHERE { ?s ?p ?o }")
add_variable_list_data(**dd.get_variable_dict("serial"))
print(g.serialize(format='ttl').decode())

@prefix o8128: <http://purl.org/berg/ontology/10.5255/UKDA-SN-8128-1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ukds: <http://purl.org/berg/ontology/UKDS/> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

o8128:serial a rdf:Property ;
    ukds:SPSS_measurement_level "SCALE" ;
    ukds:pos 1 ;
    ukds:variable "serial" ;
    ukds:variable_label "Household number" ;
    ukds:variable_type "numeric" .




#### Test on the *strata* variable

In [12]:
g.update("DELETE WHERE { ?s ?p ?o }")
add_variable_list_data(**dd.get_variable_dict("strata"))
print(g.serialize(format='ttl').decode())

@prefix o8128: <http://purl.org/berg/ontology/10.5255/UKDA-SN-8128-1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ukds: <http://purl.org/berg/ontology/UKDS/> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

o8128:strata a rdf:Property ;
    ukds:SPSS_measurement_level "SCALE" ;
    ukds:pos 2 ;
    ukds:value_labels [ ukds:label "Schedule not applicable" ;
            ukds:value "-2.0" ] ;
    ukds:variable "strata" ;
    ukds:variable_label "Strata" ;
    ukds:variable_type "numeric" .




#### Test on the *IYear* variable

In [13]:
g.update("DELETE WHERE { ?s ?p ?o }")
add_variable_list_data(**dd.get_variable_dict("IYear"))
print(g.serialize(format='ttl').decode())

@prefix o8128: <http://purl.org/berg/ontology/10.5255/UKDA-SN-8128-1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ukds: <http://purl.org/berg/ontology/UKDS/> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

o8128:IYear a rdf:Property ;
    ukds:SPSS_measurement_level "NOMINAL" ;
    ukds:pos 7 ;
    ukds:value_labels [ ukds:label "Don't know" ;
            ukds:value "-8.0" ],
        [ ukds:label "Item not applicable" ;
            ukds:value "-1.0" ],
        [ ukds:label "No answer/refused" ;
            ukds:value "-9.0" ],
        [ ukds:label "Schedule not applicable" ;
            ukds:value "-2.0" ],
        [ ukds:label "Interview not achieved" ;
            ukds:value "-7.0" ] ;
    ukds:variable "IYear" ;
    ukds:variable_label "Interview Year" ;
    ukds:variable_type "numeric" .




## Putting it all together

### Final method for the DataDictionary class

In [14]:
import rdflib
from rdflib.namespace import RDF

def to_rdf(self,graph,prefix,uri):
    """Places the DataDictionary data in an rdflib Graph.
    
    Arguments:
        - graph (rdflib.Graph): a graph to place the data in
        - prefix (str): a prefix for the Data Dictionary ontology
        - uri (str): a uri for the Data Dictionary ontology
    
    Returns:
        - (rdflib.Graph): the input graph with the DataDictionary data inserted into it.
        
    """
    
    def add_variable_list_data(pos,
                               variable,
                               variable_label,
                               variable_type,
                               SPSS_measurement_level,
                               SPSS_user_missing_values,
                               value_labels):
        "Adds the data from a variable_list variable to the RDFlib graph"

        graph.add((dd_namespace[variable],RDF.type,RDF.Property))
        graph.add((dd_namespace[variable],ukds_namespace.pos,rdflib.Literal(int(pos))))
        graph.add((dd_namespace[variable],ukds_namespace.variable,rdflib.Literal(variable)))
        graph.add((dd_namespace[variable],ukds_namespace.variable_label,rdflib.Literal(variable_label)))
        graph.add((dd_namespace[variable],ukds_namespace.variable_type,rdflib.Literal(variable_type)))
        graph.add((dd_namespace[variable],ukds_namespace.SPSS_measurement_level,rdflib.Literal(SPSS_measurement_level)))

        if SPSS_user_missing_values:
            for x in SPSS_user_missing_values.split(','):
                graph.add((dd_namespace[variable],ukds_namespace.SPSS_user_missing_values,rdflib.Literal(x)))

        if value_labels:
            for k,v in value_labels.items():
                a=rdflib.BNode()
                graph.add((dd_namespace[variable],ukds_namespace.value_labels,a))
                graph.add((a,ukds_namespace.label,rdflib.Literal(v)))
                graph.add((a,ukds_namespace.value,rdflib.Literal(str(k))))
    
    
    dd_namespace=rdflib.Namespace(uri)
    graph.bind(prefix,dd_namespace)
    ukds_namespace=rdflib.Namespace(r'http://purl.org/berg/ontology/UKDS/')
    graph.bind('ukds',ukds_namespace)
    
    for x in self.variable_list:
        add_variable_list_data(**x)
    
    return graph

In [15]:
kls=ukds.DataDictionary
kls.to_rdf=to_rdf
dd=kls()
dd.read_rtf(dd_fp)
g=rdflib.Graph()
g=dd.to_rdf(graph=g,prefix='o8128',uri='http://purl.org/berg/ontology/10.5255/UKDA-SN-8128-1/')

In [16]:
print(g.serialize(format='ttl').decode())

@prefix o8128: <http://purl.org/berg/ontology/10.5255/UKDA-SN-8128-1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ukds: <http://purl.org/berg/ontology/UKDS/> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

o8128:Accom a rdf:Property ;
    ukds:SPSS_measurement_level "NOMINAL" ;
    ukds:pos 23 ;
    ukds:value_labels [ ukds:label "No answer/refused" ;
            ukds:value "-9.0" ],
        [ ukds:label "Item not applicable" ;
            ukds:value "-1.0" ],
        [ ukds:label "Schedule not applicable" ;
            ukds:value "-2.0" ],
        [ ukds:label "Other" ;
            ukds:value "4.0" ],
        [ ukds:label "Interview not achieved" ;
            ukds:value "-7.0" ],
        [ ukds:label "House or bungalow" ;
            ukds:value "1.0" ],
        [ ukds:label "Room or rooms" ;
            ukds:value "3.0" ],
        [ ukds:labe

In [17]:
g.serialize('uktus15_household_ukda_data_dictionary.ttl',format='ttl')

## Some queries

In [18]:
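import json
from pandas.io.json import json_normalize # both imports are needed by run_query

def run_query(query):
    "Runs a SPARQL query on the graph g and returns the result values as a DataFrame"
    df=json_normalize(json.loads(g.query(query).serialize(format='json'))['results']['bindings'])
    df=df[[x for x in df.columns if x.endswith('value')]]
    for c in df.columns:
        # replace the full namespace uris with their prefixes for readability
        df[c]=df[c].str.replace(r'http://purl.org/berg/ontology/10.5255/UKDA-SN-8128-1/','o8128:')
        df[c]=df[c].str.replace(r'http://purl.org/berg/ontology/UKDS/','ukds:')
    return df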

### Variables by 'pos' order

In [20]:
from pandas.io.json import json_normalize
import json
query="""
PREFIX o8128: <http://purl.org/berg/ontology/10.5255/UKDA-SN-8128-1/>
PREFIX ukds: <http://purl.org/berg/ontology/UKDS/>

SELECT ?pos ?var ?variable_label
WHERE
    {
        ?var ukds:pos ?pos ;
             ukds:variable_label ?variable_label .
    }
ORDER BY ?pos
LIMIT 10
"""
run_query(query)

| | pos.value | var.value | variable_label.value |
|---|---|---|---|
| 0 | 1 | o8128:serial | Household number |
| 1 | 2 | o8128:strata | Strata |
| 2 | 3 | o8128:psu | Primary sampling unit |
| 3 | 4 | o8128:HhOut | Final outcome - household |
| 4 | 5 | o8128:hh_wt | Household weight |
| 5 | 6 | o8128:IMonth | Interview month |
| 6 | 7 | o8128:IYear | Interview Year |
| 7 | 8 | o8128:DM014 | Number of children aged 0-14 |
| 8 | 9 | o8128:DM016 | Number of children aged 0-16 |
| 9 | 10 | o8128:DM510 | Number of children aged 5-10 |


## Version 2

The problem with the above version is that it can take a long time to convert a large file from .rtf to .ttl. The time is spent both in converting the .rtf data into an rdflib Graph and then in serializing the rdflib Graph to a .ttl file.

This second approach therefore doesn't use rdflib. Instead it reads the .rtf file and then writes the data directly to a .ttl file using Python's standard file-writing methods.

The output of this new method should be the same as that of the first method developed above.
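One thing rdflib handled automatically in the first version is the escaping of string literals. When Turtle is written by plain string formatting, a label containing a double quote or a backslash would produce an invalid file. A minimal sketch of an escaping helper is given here; the name `turtle_string` is hypothetical, and it is not used by the functions in this notebook.

```python
def turtle_string(text):
    """Return text as a double-quoted Turtle string literal."""
    escaped = (str(text)
               .replace('\\', '\\\\')   # backslash first, so later escapes are not doubled
               .replace('"', '\\"')
               .replace('\n', '\\n')
               .replace('\r', '\\r')
               .replace('\t', '\\t'))
    return '"%s"' % escaped


print(turtle_string("Don't know"))    # "Don't know"
print(turtle_string('Said "yes"'))    # "Said \"yes\""
```

Single quotes need no escaping inside a double-quoted Turtle string, which is why labels such as "Don't know" pass through unchanged.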


### Function to write variable_list data to file

In [50]:
def add_variable_list_data2(file,
                            prefix,
                            pos,
                            variable,
                            variable_label,
                            variable_type,
                            SPSS_measurement_level,
                            SPSS_user_missing_values,
                            value_labels):
    """Writes the data from a variable_list variable to the file as Turtle text
    
    """
    
    l=[]
    l.append('%s:%s a rdf:Property' % (prefix,variable))
    l.append('ukds:pos %s' % pos)
    l.append('ukds:variable "%s"' % variable)
    l.append('ukds:variable_label "%s"' % variable_label)
    l.append('ukds:variable_type "%s"' % variable_type)
    l.append('ukds:SPSS_measurement_level "%s"' % SPSS_measurement_level)
    
    if SPSS_user_missing_values:
        l1=[]
        for x in SPSS_user_missing_values.split(','):
            l1.append('"%s"' % x)
        l.append('ukds:SPSS_user_missing_values %s' % ' ,\n\t\t'.join(l1))
        
    if value_labels:
        l2=[]
        for k,v in value_labels.items():
            l2.append('[ ukds:label "%s" ; ukds:value "%s" ]' % (v,k))
        l.append('ukds:value_labels %s' % ' ,\n\t\t'.join(l2))
    
    file.write(' ;\n\t'.join(l)+' .')


#### Test on the *IYear* variable

In [51]:
with open('test.ttl','w') as file:
    add_variable_list_data2(file,'o8128',**dd.get_variable_dict("IYear"))
with open('test.ttl','r') as file:
    print(file.read())

o8128:IYear a rdf:Property ;
	ukds:pos 7 ;
	ukds:variable "IYear" ;
	ukds:variable_label "Interview Year" ;
	ukds:variable_type "numeric" ;
	ukds:SPSS_measurement_level "NOMINAL" ;
	ukds:value_labels [ ukds:label "Don't know" ; ukds:value "-8.0" ] ,
		[ ukds:label "Interview not achieved" ; ukds:value "-7.0" ] ,
		[ ukds:label "Schedule not applicable" ; ukds:value "-2.0" ] ,
		[ ukds:label "Item not applicable" ; ukds:value "-1.0" ] ,
		[ ukds:label "No answer/refused" ; ukds:value "-9.0" ] .


#### Test on the *strata* variable

In [48]:
with open('test.ttl','w') as file:
    add_variable_list_data2(file,'o8128',**dd.get_variable_dict("strata"))
with open('test.ttl','r') as file:
    print(file.read())

o8128:strata a rdf:Property ;
	ukds:pos 2 ;
	ukds:variable "strata" ;
	ukds:variable_label "Strata" ;
	ukds:variable_type "numeric" ;
	ukds:SPSS_measurement_level "SCALE" ;
	ukds:value_labels [ ukds:label "Schedule not applicable" ; ukds:value "-2.0" ] .


#### Test on the *serial* variable

In [32]:
with open('test.ttl','w') as file:
    add_variable_list_data2(file,'o8128',**dd.get_variable_dict("serial"))
with open('test.ttl','r') as file:
    print(file.read())

o8128:serial a rdf:Property ;
	ukds:pos 1 ;
	ukds:variable "serial" ;
	ukds:variable_label "Household number" ;
	ukds:variable_type "numeric" ;
	ukds:SPSS_measurement_level "SCALE" .


### Final method for the DataDictionary class

In [83]:
def to_ttl(self,filename,prefix,uri):
    """Writes the DataDictionary data to a .ttl file.
    
    Arguments:
        - filename (str): the name of the output .ttl file
        - prefix (str): a prefix for the Data Dictionary ontology
        - uri (str): a uri for the Data Dictionary ontology
        
    """
    def write_variable_list_data(file,
                                 prefix,
                                 pos,
                                 variable,
                                 variable_label,
                                 variable_type,
                                 SPSS_measurement_level,
                                 SPSS_user_missing_values,
                                 value_labels):
        """Writes the data from a variable_list variable to the file

        """

        l=[]
        l.append('%s:%s a rdf:Property' % (prefix,variable))
        l.append('ukds:pos %s' % pos)
        l.append('ukds:variable "%s"' % variable)
        l.append('ukds:variable_label "%s"' % variable_label)
        l.append('ukds:variable_type "%s"' % variable_type)
        l.append('ukds:SPSS_measurement_level "%s"' % SPSS_measurement_level)

        if SPSS_user_missing_values:
            l1=[]
            for x in SPSS_user_missing_values.split(','):
                l1.append('"%s"' % x)
            l.append('ukds:SPSS_user_missing_values %s' % ' ,\n\t\t'.join(l1))

        if value_labels:
            l2=[]
            for k,v in value_labels.items():
                l2.append('[ ukds:label "%s" ; ukds:value "%s" ]' % (v,k))
            l.append('ukds:value_labels %s' % ' ,\n\t\t'.join(l2))

        file.write(' ;\n\t'.join(l)+' .')

    with open(filename,'w',encoding="UTF-8") as file:
        file.write('@prefix %s: <%s> .\n' % (prefix,uri))
        file.write('@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n')
        file.write('@prefix ukds: <http://purl.org/berg/ontology/UKDS/> .\n')
        file.write('\n')
    
        for x in self.variable_list:
            write_variable_list_data(file,prefix,**x)
            file.write('\n')
            
    return

In [87]:
def to_ttl(self,filename,prefix,uri):
    """Writes the DataDictionary data to one or more .ttl files.
    
    Starts a new file once the current one exceeds 30 MB (checked every 500 variables).
    
    Arguments:
        - filename (str): the name of the output .ttl file, without extension; '_0.ttl', '_1.ttl', ... is appended
        - prefix (str): a prefix for the Data Dictionary ontology
        - uri (str): a uri for the Data Dictionary ontology
        
    """
    def write_variable_list_data(file,
                                 prefix,
                                 pos,
                                 variable,
                                 variable_label,
                                 variable_type,
                                 SPSS_measurement_level,
                                 SPSS_user_missing_values,
                                 value_labels):
        """Writes the data from a variable_list variable to the file

        """

        l=[]
        l.append('%s:%s a rdf:Property' % (prefix,variable))
        l.append('ukds:pos %s' % pos)
        l.append('ukds:variable "%s"' % variable)
        l.append('ukds:variable_label "%s"' % variable_label)
        l.append('ukds:variable_type "%s"' % variable_type)
        l.append('ukds:SPSS_measurement_level "%s"' % SPSS_measurement_level)

        if SPSS_user_missing_values:
            l1=[]
            for x in SPSS_user_missing_values.split(','):
                l1.append('"%s"' % x)
            l.append('ukds:SPSS_user_missing_values %s' % ' ,\n\t\t'.join(l1))

        if value_labels:
            l2=[]
            for k,v in value_labels.items():
                l2.append('[ ukds:label "%s" ; ukds:value "%s" ]' % (v,k))
            l.append('ukds:value_labels %s' % ' ,\n\t\t'.join(l2))

        file.write(' ;\n\t'.join(l)+' .\n\n')

    file_index=0
    i=iter(self.variable_list)
    index=0
    
    while True:
        
        with open(filename+'_'+str(file_index)+'.ttl','w',encoding="UTF-8") as file:
            file.write('@prefix %s: <%s> .\n' % (prefix,uri))
            file.write('@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n')
            file.write('@prefix ukds: <http://purl.org/berg/ontology/UKDS/> .\n')
            file.write('\n')

            while True: 

                try:
                    x = next(i)
                except StopIteration:
                    return
            
                write_variable_list_data(file,prefix,**x)
                
                if index%500==0:
                    file.flush() # flush buffered writes so getsize reflects the data written so far
                    filesize_mb=os.path.getsize(filename+'_'+str(file_index)+'.ttl')/(1024*1024.0)
                    if filesize_mb>30:
                        file_index+=1
                        break
                        
                index+=1
    return
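An alternative way to split the output, sketched here under stated assumptions, is to count the bytes as they are written rather than asking the file system for the file size. The function name `write_chunks` and its parameters are hypothetical and not part of the ukds package. Each block stands for the Turtle text of one variable, and a new file is only opened when there is a block to put in it, so no file containing just the prefixes is left at the end.

```python
import os
import tempfile


def write_chunks(blocks, filename, header='', max_bytes=30 * 1024 * 1024):
    """Write text blocks to filename_0.ttl, filename_1.ttl, ... and return the paths.

    A new file is started once the current one has reached max_bytes.
    """
    paths = []
    file = None
    written = 0
    try:
        for block in blocks:
            if file is None or written >= max_bytes:
                if file is not None:
                    file.close()
                path = '%s_%s.ttl' % (filename, len(paths))
                file = open(path, 'w', encoding='UTF-8')
                paths.append(path)
                file.write(header)   # e.g. the @prefix lines
                written = len(header.encode('UTF-8'))
            file.write(block)
            written += len(block.encode('UTF-8'))
    finally:
        if file is not None:
            file.close()
    return paths


# small demonstration with a 20-byte limit in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    paths = write_chunks(['a' * 15 + '\n', 'b' * 15 + '\n', 'c' * 15 + '\n'],
                         os.path.join(tmp, 'dd'), max_bytes=20)
    print([os.path.basename(p) for p in paths])   # ['dd_0.ttl', 'dd_1.ttl']
```

As in the method above, the limit is only checked between variables, so a file can exceed `max_bytes` by up to the size of one block.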

In [88]:
kls=ukds.DataDictionary
kls.to_ttl=to_ttl
dd=kls()
dd.read_rtf(dd_fp)
dd.to_ttl(filename='dd',prefix='o8128',uri='http://purl.org/berg/ontology/10.5255/UKDA-SN-8128-1/')
with open('dd_0.ttl','r') as file:
    print(file.read())

@prefix o8128: <http://purl.org/berg/ontology/10.5255/UKDA-SN-8128-1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ukds: <http://purl.org/berg/ontology/UKDS/> .

o8128:serial a rdf:Property ;
	ukds:pos 1 ;
	ukds:variable "serial" ;
	ukds:variable_label "Household number" ;
	ukds:variable_type "numeric" ;
	ukds:SPSS_measurement_level "SCALE" .

o8128:strata a rdf:Property ;
	ukds:pos 2 ;
	ukds:variable "strata" ;
	ukds:variable_label "Strata" ;
	ukds:variable_type "numeric" ;
	ukds:SPSS_measurement_level "SCALE" ;
	ukds:value_labels [ ukds:label "Schedule not applicable" ; ukds:value "-2.0" ] .

o8128:psu a rdf:Property ;
	ukds:pos 3 ;
	ukds:variable "psu" ;
	ukds:variable_label "Primary sampling unit" ;
	ukds:variable_type "numeric" ;
	ukds:SPSS_measurement_level "SCALE" ;
	ukds:value_labels [ ukds:label "Schedule not applicable" ; ukds:value "-2.0" ] .

o8128:HhOut a rdf:Property ;
	ukds:pos 4 ;
	ukds:variable "HhOut" ;
	ukds:variable_label "Final outcome - 

In [80]:
g=rdflib.Graph()
g.parse('test.ttl',format='ttl')
len(g)

13779

In [82]:
print(g.serialize(format='ttl').decode())

@prefix o8128: <http://purl.org/berg/ontology/10.5255/UKDA-SN-8128-1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ukds: <http://purl.org/berg/ontology/UKDS/> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

o8128:Accom a rdf:Property ;
    ukds:SPSS_measurement_level "NOMINAL" ;
    ukds:pos 23 ;
    ukds:value_labels [ ukds:label "House or bungalow" ;
            ukds:value "1.0" ],
        [ ukds:label "Flat or maisonette" ;
            ukds:value "2.0" ],
        [ ukds:label "Room or rooms" ;
            ukds:value "3.0" ],
        [ ukds:label "Other" ;
            ukds:value "4.0" ],
        [ ukds:label "Item not applicable" ;
            ukds:value "-1.0" ],
        [ ukds:label "No answer/refused" ;
            ukds:value "-9.0" ],
        [ ukds:label "Don't know" ;
            ukds:value "-8.0" ],
        [ ukds:label "Interview not a