# Using the LiDO link extractor

This notebook shows how to use caselawnet to retrieve the links between a set of nodes (cases).

In [1]:
import sys
import os
sys.path.insert(0, os.path.abspath('..'))

In [2]:
import caselawnet
import pandas as pd
import rdflib

[nltk_data] Downloading package punkt to /home/dafne/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [26]:
from caselawnet import lido_parser

These are a few cases that concern employer liability.

In [27]:
eclis = ['ECLI:NL:HR:2008:BB6175',
    'ECLI:NL:HR:2011:BR5215',
    'ECLI:NL:HR:1999:AD2996',
    'ECLI:NL:HR:2001:ZC3689',
    'ECLI:NL:HR:2008:BC9344']

To use the link extractor API you need a valid login; see http://linkeddata.overheid.nl/front/portal/services

In [28]:
auth = {}
# settings.cfg contains Python assignments that define LIDO_USERNAME and LIDO_PASSWD
filename = '../settings.cfg'
with open(filename) as f:
    exec(compile(f.read(), filename, 'exec'))
auth['username'] = LIDO_USERNAME
auth['password'] = LIDO_PASSWD
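
Because the cell above executes the settings file directly in the notebook namespace, every name defined in that file becomes a global. A minimal sketch of a more contained variant follows, assuming the settings file holds plain Python assignments such as `LIDO_USERNAME = '...'`. The helper `load_settings` and the placeholder credentials are hypothetical and only serve the illustration.

```python
import os
import tempfile

def load_settings(path):
    """Execute a Python-syntax settings file in its own namespace and return its names."""
    namespace = {}
    with open(path) as f:
        exec(compile(f.read(), path, 'exec'), namespace)
    # exec adds '__builtins__' to the namespace; keep only the user-defined names
    return {k: v for k, v in namespace.items() if not k.startswith('__')}

# Demonstration with a throwaway file and placeholder values (not real credentials)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'settings.cfg')
    with open(demo_path, 'w') as f:
        f.write("LIDO_USERNAME = 'example-user'\nLIDO_PASSWD = 'example-password'\n")
    settings = load_settings(demo_path)

demo_auth = {'username': settings['LIDO_USERNAME'],
             'password': settings['LIDO_PASSWD']}
```

Only run this on settings files you trust, since the file contents are still executed as Python code.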

We extend the dataset by one degree: every case that links to, or is linked from, a case in our set is added.

In [14]:
parser = lido_parser.LinkExtractorXMLParser(auth)
links_df, leg_df = lido_parser.get_links_articles(eclis, parser, nr_degrees=1)
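
To make the idea of a one-degree extension concrete, here is a small sketch on a made-up link table. It is one possible reading of the extension (add all direct neighbors, then keep every link whose endpoints both lie in the extended set) and is not taken from the caselawnet implementation. The function `expand_one_degree` and the node names are hypothetical.

```python
import pandas as pd

# Hypothetical toy link table; the identifiers are placeholders, not real ECLIs
all_links = pd.DataFrame({
    'source': ['A', 'B', 'C', 'C', 'D'],
    'target': ['B', 'C', 'A', 'D', 'E'],
})
seed = {'B'}

def expand_one_degree(links, nodes):
    """Add all direct neighbors of `nodes`, then keep links inside the extended set."""
    nodes = set(nodes)
    # Links with at least one endpoint in the seed set
    touches = links['source'].isin(nodes) | links['target'].isin(nodes)
    neighbors = set(links.loc[touches, 'source']) | set(links.loc[touches, 'target'])
    expanded = nodes | neighbors
    # Links with both endpoints in the extended node set
    inside = links['source'].isin(expanded) & links['target'].isin(expanded)
    return expanded, links[inside].reset_index(drop=True)

toy_nodes, toy_links = expand_one_degree(all_links, seed)
```

In this toy example the extended set is `A`, `B` and `C`, and the link from `C` to `A` is kept even though neither endpoint was in the seed set. Under this reading, links between two newly added cases can appear in the result.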

How many links do we get?

In [16]:
links_df.shape

(73, 2)

How many nodes do we have in the extended set?

In [19]:
# Series.append was removed in pandas 2.0; pd.concat gives the same result
len(set(pd.concat([links_df['source'], links_df['target']])))

44

How many of the links connect two nodes from the original set?

In [24]:
sum(links_df['source'].isin(eclis) & links_df['target'].isin(eclis))

1

How many links connect two nodes that were both outside the original set?

In [25]:
sum(~links_df['source'].isin(eclis) & ~links_df['target'].isin(eclis))

35
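
The two counts above leave 73 - 1 - 35 = 37 links that connect exactly one original case to a newly added one. As a sketch, all three categories can be computed in one pass by counting how many endpoints of each link are in the original set. The helper `link_breakdown` and the toy data are hypothetical; with the real data you would call it as `link_breakdown(links_df, eclis)`.

```python
import pandas as pd

def link_breakdown(links, original):
    """Count links by how many of their endpoints belong to the original node set."""
    original = list(original)
    n_original = (links['source'].isin(original).astype(int)
                  + links['target'].isin(original).astype(int))
    return {
        'both_original': int((n_original == 2).sum()),
        'one_original': int((n_original == 1).sum()),
        'none_original': int((n_original == 0).sum()),
    }

# Toy data with placeholder node names
toy = pd.DataFrame({'source': ['A', 'B', 'C'], 'target': ['B', 'C', 'A']})
counts = link_breakdown(toy, ['B'])
```

The three counts always sum to the number of rows in the link table, which gives a quick consistency check.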