# Scraping of information on UK employment law cases

As a first attempt at web scraping with BeautifulSoup, I tried to fulfil a brief that a friend of mine had hired a data scientist to complete:

- For all the cases listed on the https://www.gov.uk/employment-tribunal-decisions/ website, retrieve the case number, country, date, parties involved and their legal representation, and the outcome of the case.
- There are approximately 1000 pages with 50 cases each on the website.
- Most of the information can be found in various chunks of metadata in the page HTML. Only the legal representation is recorded in a case PDF, which is available on the page of each case. Unfortunately, the text could not be read straight from the PDF; it had to be converted to docx before reading.
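The "approximately 1000 pages" figure can be checked against the result count that the listing page reports in its `govuk:search-result-count` meta tag (54673 in the output captured in this notebook). The sketch below assumes that count and 50 cases per page; the helper names `page_count` and `listing_urls` are made up for illustration.

```python
import math

# Figures taken from this notebook: the listing page reported 54673 results
# and the brief states 50 cases per listing page.
RESULT_COUNT = 54673
CASES_PER_PAGE = 50
BASE = "https://www.gov.uk/employment-tribunal-decisions"


def page_count(result_count, per_page):
    """Number of listing pages needed to show every result."""
    return math.ceil(result_count / per_page)


def listing_urls(n_pages, base=BASE):
    """Yield the URL of each listing page, numbered from 1."""
    for page in range(1, n_pages + 1):
        yield f"{base}?page={page}"


n_pages = page_count(RESULT_COUNT, CASES_PER_PAGE)
urls = list(listing_urls(n_pages))
print(n_pages)
print(urls[0])
print(urls[-1])
```

Under these assumptions the last listing page is page 1094. The count on the live site will have changed since, so it is safer to read it from the page than to hard-code it.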

In [10]:
from bs4 import BeautifulSoup
import requests

In [11]:
def get_html_data(url):
    # Fetch the page (give up after 5 seconds) and parse it with Python's built-in HTML parser
    response = requests.get(url, timeout=5)
    data = BeautifulSoup(response.content, 'html.parser')
    return data



In [13]:
# retrieves the first webpage with the list of court cases
get_html_data("https://www.gov.uk/employment-tribunal-decisions?page=1")
BeautifulSoup

<!DOCTYPE html>

<!--[if lt IE 9]><html class="lte-ie8" lang="en"><![endif]--><!--[if gt IE 8]><!--><html lang="en">
<!--<![endif]-->
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<meta content="Find decisions on Employment Tribunal cases in England, Wales and Scotland." property="og:description"/>
<meta content="Employment tribunal decisions" property="og:title"/>
<meta content="https://www.gov.uk/employment-tribunal-decisions" property="og:url"/>
<meta content="article" property="og:type"/>
<meta content="GOV.UK" property="og:site_name"/>
<meta content="Employment tribunal decisions - GOV.UK" name="govuk:base_title"/>
<meta content="noindex" name="robots"/>
<meta content="54673" name="govuk:search-result-count"/>
<meta content="summary" name="twitter:card"/>
<meta content="&lt;EA73&gt;&lt;CO1133&gt;" name="govuk:analytics:organisations"/>
<meta content="true" name="govuk:static-analytics:strip-postcodes"/>
<meta content="1b5e08c8-ddde-4637-9375-f79e085ba

In [22]:
# creates a list with the html for all the pages on the government website within range. 
html = []
for i in range (1,2):
    html.append(get_html_data("https://www.gov.uk/employment-tribunal-decisions?page=" + str(i)))

In [24]:
html

[<!DOCTYPE html>
 
 <!--[if lt IE 9]><html class="lte-ie8" lang="en"><![endif]--><!--[if gt IE 8]><!--><html lang="en">
 <!--<![endif]-->
 <head>
 <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
 <meta content="Find decisions on Employment Tribunal cases in England, Wales and Scotland." property="og:description"/>
 <meta content="Employment tribunal decisions" property="og:title"/>
 <meta content="https://www.gov.uk/employment-tribunal-decisions" property="og:url"/>
 <meta content="article" property="og:type"/>
 <meta content="GOV.UK" property="og:site_name"/>
 <meta content="Employment tribunal decisions - GOV.UK" name="govuk:base_title"/>
 <meta content="noindex" name="robots"/>
 <meta content="54673" name="govuk:search-result-count"/>
 <meta content="summary" name="twitter:card"/>
 <meta content="&lt;EA73&gt;&lt;CO1133&gt;" name="govuk:analytics:organisations"/>
 <meta content="true" name="govuk:static-analytics:strip-postcodes"/>
 <meta content="1b5e08c8-ddde-4

In [27]:
page = requests.get("https://www.gov.uk/employment-tribunal-decisions?page=1")
soup = BeautifulSoup(page.content, 'html.parser')

In [28]:
print(soup.prettify())


<!DOCTYPE html>
<!--[if lt IE 9]><html class="lte-ie8" lang="en"><![endif]-->
<!--[if gt IE 8]><!-->
<html lang="en">
 <!--<![endif]-->
 <head>
  <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
  <meta content="Find decisions on Employment Tribunal cases in England, Wales and Scotland." property="og:description"/>
  <meta content="Employment tribunal decisions" property="og:title"/>
  <meta content="https://www.gov.uk/employment-tribunal-decisions" property="og:url"/>
  <meta content="article" property="og:type"/>
  <meta content="GOV.UK" property="og:site_name"/>
  <meta content="Employment tribunal decisions - GOV.UK" name="govuk:base_title"/>
  <meta content="noindex" name="robots"/>
  <meta content="54673" name="govuk:search-result-count"/>
  <meta content="summary" name="twitter:card"/>
  <meta content="&lt;EA73&gt;&lt;CO1133&gt;" name="govuk:analytics:organisations"/>
  <meta content="true" name="govuk:static-analytics:strip-postcodes"/>
  <meta content="1b5e

In [42]:
list(soup.children)


['html',
 '\n',
 '[if lt IE 9]><html class="lte-ie8" lang="en"><![endif]',
 '[if gt IE 8]><!',
 <html lang="en">
 <!--<![endif]-->
 <head>
 <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
 <meta content="Find decisions on Employment Tribunal cases in England, Wales and Scotland." property="og:description"/>
 <meta content="Employment tribunal decisions" property="og:title"/>
 <meta content="https://www.gov.uk/employment-tribunal-decisions" property="og:url"/>
 <meta content="article" property="og:type"/>
 <meta content="GOV.UK" property="og:site_name"/>
 <meta content="Employment tribunal decisions - GOV.UK" name="govuk:base_title"/>
 <meta content="noindex" name="robots"/>
 <meta content="54673" name="govuk:search-result-count"/>
 <meta content="summary" name="twitter:card"/>
 <meta content="&lt;EA73&gt;&lt;CO1133&gt;" name="govuk:analytics:organisations"/>
 <meta content="true" name="govuk:static-analytics:strip-postcodes"/>
 <meta content="1b5e08c8-ddde-4637-937

In [36]:
[type(item) for item in list(soup.children)]


[bs4.element.Doctype,
 bs4.element.NavigableString,
 bs4.element.Comment,
 bs4.element.Comment,
 bs4.element.Tag,
 bs4.element.NavigableString]

In [43]:
html = list(soup.children)[4]

In [53]:
body = list(html.children)[5]

In [54]:
list(body.children)

['\n',
 <script>document.body.className = ((document.body.className) ? document.body.className + ' js-enabled' : 'js-enabled');</script>,
 '\n',
 <div id="skiplink-container">
 <div>
 <a class="skiplink govuk-link" href="#content">Skip to main content</a>
 </div>
 </div>,
 '\n',
 <div aria-label="cookie banner" class="gem-c-cookie-banner govuk-clearfix" data-module="cookie-banner" data-nosnippet="" id="global-cookie-message" role="region">
 <div class="gem-c-cookie-banner__wrapper govuk-width-container">
 <div class="govuk-grid-row">
 <div class="govuk-grid-column-two-thirds">
 <div class="gem-c-cookie-banner__message">
 <span class="govuk-heading-m">Tell us whether you accept cookies</span>
 <p class="govuk-body">We use <a class="govuk-link" href="/help/cookies">cookies to collect information</a> about how you use GOV.UK. We use this information to make the website work as well as possible and improve government services.</p>
 </div>
 <div class="gem-c-cookie-banner__buttons">
 <div c

In [67]:
cases = list(soup.find_all("li", class_="gem-c-document-list__item"))

In [75]:
cases[0].get_text()

'\nMs A Mullally v Virgin Atlantic Airways Ltd and Mr Shaun Laverty: 2302421/2019\nEmployment Tribunal decision.\n\n\n                  Decided: 10 December 2019\n\n\n'

In [81]:
cases[0].find("a")['data-ecommerce-path']

'/employment-tribunal-decisions/ms-a-mullally-v-virgin-atlantic-airways-ltd-and-mr-shaun-laverty-2302421-2019'

In [83]:
cases[0].find("a")['href']

'/employment-tribunal-decisions/ms-a-mullally-v-virgin-atlantic-airways-ltd-and-mr-shaun-laverty-2302421-2019'

In [84]:
# retrieves the webpage for the specific court case
details = get_html_data("https://www.gov.uk/"+cases[0].find("a")['href'])

In [87]:
details

<!DOCTYPE html>

<!--[if lt IE 9]><html class="lte-ie8" lang="en"><![endif]--><!--[if gt IE 8]><!--><html lang="en">
<!--<![endif]-->
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<meta content="Employment Tribunal decision." property="og:description"/>
<meta content="Ms A Mullally v Virgin Atlantic Airways Ltd and Mr Shaun Laverty: 2302421/2019" property="og:title"/>
<meta content="https://www.gov.uk/employment-tribunal-decisions/ms-a-mullally-v-virgin-atlantic-airways-ltd-and-mr-shaun-laverty-2302421-2019" property="og:url"/>
<meta content="article" property="og:type"/>
<meta content="GOV.UK" property="og:site_name"/>
<meta content="summary" name="twitter:card"/>
<meta content="Employment Tribunal decision." name="description"/>
<meta content="courts-sentencing-tribunals" name="govuk:taxon-slugs"/>
<meta content="courts-sentencing-tribunals" name="govuk:taxon-slug"/>
<meta content="357110bb-cbc5-4708-9711-1b26e6c63e86" name="govuk:taxon-ids"/>
<meta cont

In [89]:
# the individual case pages seem to have the same structure as the government webpage. 
[type(item) for item in list(details.children)]


[bs4.element.Doctype,
 bs4.element.NavigableString,
 bs4.element.Comment,
 bs4.element.Comment,
 bs4.element.Tag,
 bs4.element.NavigableString]

## Now that we've messed around, let's tackle our first goal: retrieve the date for the first case on the first page

In [90]:
# get the first page
page = requests.get("https://www.gov.uk/employment-tribunal-decisions?page=1")
soup = BeautifulSoup(page.content, 'html.parser')

cases = list(soup.find_all("li", class_="gem-c-document-list__item"))
case = cases[0]

In [92]:
case.get_text()

'\nMs A Mullally v Virgin Atlantic Airways Ltd and Mr Shaun Laverty: 2302421/2019\nEmployment Tribunal decision.\n\n\n                  Decided: 10 December 2019\n\n\n'

In [100]:
str(case.get_text()).split('Decided: ')[1].split('\n')[0]

'10 December 2019'

That gives us the date of the first case on the first page. That was the easy part. 
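The chained `split` calls raise an `IndexError` as soon as an entry lacks the `Decided: ` label. A slightly more defensive variant is sketched here with a regular expression; the function name `decided_date` is hypothetical, and `%B` assumes an English locale for the month name.

```python
import re
from datetime import date, datetime
from typing import Optional

# Listing text copied from the cell output above.
listing_text = (
    "\nMs A Mullally v Virgin Atlantic Airways Ltd and Mr Shaun Laverty: 2302421/2019\n"
    "Employment Tribunal decision.\n\n\n"
    "                  Decided: 10 December 2019\n\n\n"
)


def decided_date(text: str) -> Optional[date]:
    """Return the date after 'Decided:', or None if the label is missing."""
    match = re.search(r"Decided:\s*(\d{1,2} [A-Za-z]+ \d{4})", text)
    if match is None:
        return None
    # "%d %B %Y" parses strings such as "10 December 2019"
    return datetime.strptime(match.group(1), "%d %B %Y").date()


print(decided_date(listing_text))
print(decided_date("no date in this text"))
```

Returning `None` instead of raising lets a long scraping loop record the gap and carry on.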

## Next step: what else is in that first tag?

In [101]:
case

<li class="gem-c-document-list__item">
<a class="gem-c-document-list__item-title gem-c-document-list__item-link" data-ecommerce-path="/employment-tribunal-decisions/ms-a-mullally-v-virgin-atlantic-airways-ltd-and-mr-shaun-laverty-2302421-2019" data-ecommerce-row="1" data-track-action="Employment tribunal decisions.1" data-track-category="navFinderLinkClicked" data-track-label="/employment-tribunal-decisions/ms-a-mullally-v-virgin-atlantic-airways-ltd-and-mr-shaun-laverty-2302421-2019" data-track-options='{"dimension28":50,"dimension29":"Ms A Mullally v Virgin Atlantic Airways Ltd and Mr Shaun Laverty: 2302421/2019"}' href="/employment-tribunal-decisions/ms-a-mullally-v-virgin-atlantic-airways-ltd-and-mr-shaun-laverty-2302421-2019">Ms A Mullally v Virgin Atlantic Airways Ltd and Mr Shaun Laverty: 2302421/2019</a>
<p class="gem-c-document-list__item-description">Employment Tribunal decision.</p>
<ul class="gem-c-document-list__item-metadata">
<li class="gem-c-document-list__attribute">
 

Actually, rather than extracting the date from the text, it looks like we can read it from the datetime attribute of the time tag.

In [103]:
case.find("time")['datetime']

'2019-12-10'

That's a lot easier!
There doesn't seem to be much more information at this stage, so we will have to follow the url to the case page.
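One detail when following the link: the `href` values shown earlier start with a `/`, so concatenating them onto `"https://www.gov.uk/"` produces a double slash in the URL. The requests in this notebook evidently still worked, but `urllib.parse.urljoin` from the standard library builds the address without relying on that. A minimal sketch using the `href` from the first case:

```python
from urllib.parse import urljoin

BASE_URL = "https://www.gov.uk/"
# href copied from the first case on the listing page
href = (
    "/employment-tribunal-decisions/"
    "ms-a-mullally-v-virgin-atlantic-airways-ltd-and-mr-shaun-laverty-2302421-2019"
)

naive = BASE_URL + href          # plain concatenation keeps both slashes
joined = urljoin(BASE_URL, href)  # resolves the path against the site root

print(naive)
print(joined)
```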

In [106]:
casepage = requests.get("https://www.gov.uk/"+case.find("a")['href'])
casesoup = BeautifulSoup(casepage.content, 'html.parser')

In [109]:
list(casesoup.children)

['html',
 '\n',
 '[if lt IE 9]><html class="lte-ie8" lang="en"><![endif]',
 '[if gt IE 8]><!',
 <html lang="en">
 <!--<![endif]-->
 <head>
 <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
 <meta content="Employment Tribunal decision." property="og:description"/>
 <meta content="Ms A Mullally v Virgin Atlantic Airways Ltd and Mr Shaun Laverty: 2302421/2019" property="og:title"/>
 <meta content="https://www.gov.uk/employment-tribunal-decisions/ms-a-mullally-v-virgin-atlantic-airways-ltd-and-mr-shaun-laverty-2302421-2019" property="og:url"/>
 <meta content="article" property="og:type"/>
 <meta content="GOV.UK" property="og:site_name"/>
 <meta content="summary" name="twitter:card"/>
 <meta content="Employment Tribunal decision." name="description"/>
 <meta content="courts-sentencing-tribunals" name="govuk:taxon-slugs"/>
 <meta content="courts-sentencing-tribunals" name="govuk:taxon-slug"/>
 <meta content="357110bb-cbc5-4708-9711-1b26e6c63e86" name="govuk:taxon-ids"/>
 

In [110]:
[type(item) for item in list(casesoup.children)]


[bs4.element.Doctype,
 bs4.element.NavigableString,
 bs4.element.Comment,
 bs4.element.Comment,
 bs4.element.Tag,
 bs4.element.NavigableString]

In [112]:
casehtml = list(casesoup.children)[4]

In [123]:
casebody = list(casehtml.children)[5]

In [130]:
casebody.find_all("dd",class_="app-c-important-metadata__definition")

[<dd class="app-c-important-metadata__definition">10 December 2019</dd>,
 <dd class="app-c-important-metadata__definition"><a class="govuk-link app-link" href="/employment-tribunal-decisions?tribunal_decision_country%5B%5D=england-and-wales">England and Wales</a></dd>,
 <dd class="app-c-important-metadata__definition">
 <a class="govuk-link app-link" href="/employment-tribunal-decisions?tribunal_decision_categories%5B%5D=health-safety">Health &amp; Safety</a>, <a class="govuk-link app-link" href="/employment-tribunal-decisions?tribunal_decision_categories%5B%5D=public-interest-disclosure">Public Interest Disclosure</a>, <a class="govuk-link app-link" href="/employment-tribunal-decisions?tribunal_decision_categories%5B%5D=sex-discrimination">Sex Discrimination</a>, <a class="govuk-link app-link" href="/employment-tribunal-decisions?tribunal_decision_categories%5B%5D=unfair-dismissal">Unfair Dismissal</a>
 </dd>]

In [131]:
# Yet another way to get the date
casebody.find_all("dd",class_="app-c-important-metadata__definition")[0].get_text()

'10 December 2019'

In [133]:
# And here's the country
casebody.find_all("dd",class_="app-c-important-metadata__definition")[1].get_text()

'England and Wales'

In [142]:
# Another Ctrl+F search for 'Withdrawal' gives us the outcome.
casebody.find_all("div", class_="gem-c-govspeak govuk-govspeak direction-ltr")[0].get_text().split('- ')[1].split('.')[0]

'Withdrawal'

In [143]:
# And the same div also contains the case number. 
casebody.find_all("div", class_="gem-c-govspeak govuk-govspeak direction-ltr")[0].get_text()

'\nRead the full decision in  Ms A Mullally v Virgin Atlantic Airways Ltd and Mr Shaun Laverty: 2302421/2019 - Withdrawal.\n'

In [145]:
# casenumber
casebody.find_all("div", class_="gem-c-govspeak govuk-govspeak direction-ltr")[0].get_text().split('/')[0].split(': ')[1]

'2302421'

## Let's start tidying this up into a dataframe

In [146]:
import pandas as pd

In [179]:
case0 = {'CaseNr': [casebody.find_all("div", class_="gem-c-govspeak govuk-govspeak direction-ltr")[0].get_text().split('/')[0].split(': ')[1]], 
         'Country': [casebody.find_all("dd",class_="app-c-important-metadata__definition")[1].get_text()], 
         'Date': [case.find("time")['datetime']], 
         'Outcome' : [casebody.find_all("div", class_="gem-c-govspeak govuk-govspeak direction-ltr")[0].get_text().split('- ')[1].split('.')[0]]}

In [180]:
case_df = pd.DataFrame.from_dict(case0)

In [181]:
case_df

| | CaseNr | Country | Date | Outcome |
|---|---|---|---|---|
| 0 | 2302421 | England and Wales | 2019-12-10 | Withdrawal |


Search for jurisdiction

In [182]:
casebody

<body>
<script>document.body.className = ((document.body.className) ? document.body.className + ' js-enabled' : 'js-enabled');</script>
<div id="skiplink-container">
<div>
<a class="skiplink govuk-link" href="#content">Skip to main content</a>
</div>
</div>
<div aria-label="cookie banner" class="gem-c-cookie-banner govuk-clearfix" data-module="cookie-banner" data-nosnippet="" id="global-cookie-message" role="region">
<div class="gem-c-cookie-banner__wrapper govuk-width-container">
<div class="govuk-grid-row">
<div class="govuk-grid-column-two-thirds">
<div class="gem-c-cookie-banner__message">
<span class="govuk-heading-m">Tell us whether you accept cookies</span>
<p class="govuk-body">We use <a class="govuk-link" href="/help/cookies">cookies to collect information</a> about how you use GOV.UK. We use this information to make the website work as well as possible and improve government services.</p>
</div>
<div class="gem-c-cookie-banner__buttons">
<div class="gem-c-cookie-banner__butto

In [215]:
casebody.find_all('dd', class_='app-c-important-metadata__definition')[2].get_text()[1:-1]

'Health & Safety, Public Interest Disclosure, Sex Discrimination, Unfair Dismissal'

Slightly messy, and the field seems to hold several jurisdiction codes at once, but we found it.
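If the individual jurisdiction codes are needed later, the combined string can be split into a list. This is only a sketch: it assumes that no single jurisdiction name contains a comma, and `split_jurisdictions` is a made-up helper name.

```python
# Raw text as returned by get_text(), including the surrounding newlines
raw = "\nHealth & Safety, Public Interest Disclosure, Sex Discrimination, Unfair Dismissal\n"


def split_jurisdictions(raw_text):
    """Split a comma-separated jurisdiction string into clean entries."""
    # strip() removes the newlines and spaces around each entry;
    # the `if` drops empty pieces such as those from a trailing comma
    return [part.strip() for part in raw_text.split(",") if part.strip()]


print(split_jurisdictions(raw))
```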

Finally, the claimant and defendant, which are actually in the URL.

In [192]:
case.find("a")['href'].split('/')[-1].split('-v-')

['ms-a-mullally',
 'virgin-atlantic-airways-ltd-and-mr-shaun-laverty-2302421-2019']

In [193]:
claimant = case.find("a")['href'].split('/')[-1].split('-v-')[0]

In [194]:
claimant

'ms-a-mullally'

In [205]:
import re
defendent = case.find("a")['href'].split('/')[-1].split('-v-')[1].split(str(int(re.search(r'\d+', case.find("a")['href'].split('/')[-1].split('-v-')[1]).group())))[0]

In [206]:
defendent

'virgin-atlantic-airways-ltd-and-mr-shaun-laverty-'
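The nested `split` and `re.search` calls leave a trailing hyphen on the defendant. As an alternative sketch, a single regular expression can take the slug apart, assuming every slug ends in `-<case number>-<four-digit year>` as this one does. The helper name `parse_slug` is hypothetical, and slugs that do not fit the pattern return `None`.

```python
import re

# claimant, then "-v-", then defendant, then "-<digits>-<4-digit year>" at the end
SLUG_PATTERN = re.compile(
    r"^(?P<claimant>.+?)-v-(?P<defendant>.+?)-(?P<number>\d+)-(?P<year>\d{4})$"
)


def parse_slug(href):
    """Split a case URL into claimant, defendant, case number and year."""
    slug = href.rstrip("/").split("/")[-1]
    match = SLUG_PATTERN.match(slug)
    if match is None:
        return None
    parts = match.groupdict()
    # The slug uses hyphens where the title has spaces
    parts["claimant"] = parts["claimant"].replace("-", " ")
    parts["defendant"] = parts["defendant"].replace("-", " ")
    return parts


href = (
    "/employment-tribunal-decisions/"
    "ms-a-mullally-v-virgin-atlantic-airways-ltd-and-mr-shaun-laverty-2302421-2019"
)
print(parse_slug(href))
```

The slug is lower-case and drops punctuation, so names recovered this way lose their original capitalisation.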

## Let's add these to our dataframe

In [216]:
case0 = {'CaseNr': [casebody.find_all("div", class_="gem-c-govspeak govuk-govspeak direction-ltr")[0].get_text().split('/')[0].split(': ')[1]], 
         'Country': [casebody.find_all("dd",class_="app-c-important-metadata__definition")[1].get_text()], 
         'Date': [case.find("time")['datetime']], 
         'Jurisdiction' : [casebody.find_all('dd', class_='app-c-important-metadata__definition')[2].get_text()[1:-1]],
         'Outcome' : [casebody.find_all("div", class_="gem-c-govspeak govuk-govspeak direction-ltr")[0].get_text().split('- ')[1].split('.')[0]],
         'Claimant' : [case.find("a")['href'].split('/')[-1].split('-v-')[0]],
         'Defendent' : [case.find("a")['href'].split('/')[-1].split('-v-')[1].split(str(int(re.search(r'\d+', case.find("a")['href'].split('/')[-1].split('-v-')[1]).group())))[0]]}

In [217]:
case_df = pd.DataFrame.from_dict(case0)

In [218]:
case_df

| | CaseNr | Country | Date | Jurisdiction | Outcome | Claimant | Defendent |
|---|---|---|---|---|---|---|---|
| 0 | 2302421 | England and Wales | 2019-12-10 | Health & Safety, Public Interest Disclosure, S... | Withdrawal | ms-a-mullally | virgin-atlantic-airways-ltd-and-mr-shaun-laverty- |


## That looks great. Let's tidy everything up and call it a day

In [None]:
import re
import pandas as pd

In [264]:
page = requests.get("https://www.gov.uk/employment-tribunal-decisions?page=1")
soup = BeautifulSoup(page.content, 'html.parser')

cases = list(soup.find_all("li", class_="gem-c-document-list__item"))
case = cases[0]
casepage = requests.get("https://www.gov.uk/"+case.find("a")['href'])
casesoup = BeautifulSoup(casepage.content, 'html.parser')
casehtml = list(casesoup.children)[4]
casebody = list(casehtml.children)[5]

case0 = {'CaseNr': [casebody.find_all("div", class_="gem-c-govspeak govuk-govspeak direction-ltr")[0].get_text().split('/20')[0].split(': ')[1]], 
         'Country': [casebody.find_all("dd",class_="app-c-important-metadata__definition")[1].get_text()], 
         'Date': [case.find("time")['datetime']], 
         'Jurisdiction' : [casebody.find_all('dd', class_='app-c-important-metadata__definition')[2].get_text()[1:-1]],
         'Outcome' : [casebody.find_all("div", class_="gem-c-govspeak govuk-govspeak direction-ltr")[0].get_text().split('- ')[1].split('.')[0]],
         'Claimant' : [case.find("a")['href'].split('/')[-1].split('-v-')[0]],
         'Defendent' : [case.find("a")['href'].split('/')[-1].split('-v-')[1].split(str(int(re.search(r'\d+', case.find("a")['href'].split('/')[-1].split('-v-')[1]).group())))[0]]}
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
case_df = pd.concat([case_df, pd.DataFrame.from_dict(case0)])


In [267]:
columns = case_df.columns

Next steps:
- Ms K Game v Unique Associates Ltd T/a Threshold Surveyors: 3325880/2017 is an anomaly. Have a closer look
- loop over case in cases
- loop for pagenumber in range...
- Find where legal representation sits

Splitting on '/' to get the case number wasn't specific enough. Changed it to '/20', assuming all cases are from this millennium.
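The anomaly appears to come from the `/` in `T/a`: splitting on the first `/` cuts the sentence before the colon, so the following `split(': ')[1]` has nothing to return. Rather than relying on `'/20'`, one option is to match the whole "Read the full decision in ..." sentence at once. The sketch below is an assumption-laden illustration. The second test string is reconstructed from the case title and the outcome shown in this notebook, not copied from the page, and `parse_decision_line` is a made-up name. Case numbers in other formats would simply return `None`.

```python
import re

DECISION_PATTERN = re.compile(
    r"Read the full decision in\s+"
    r"(?P<claimant>.+?) v (?P<defendant>.+): "
    r"(?P<number>\d+)/(?P<year>\d{4}) - (?P<outcome>[^.]+)\."
)


def parse_decision_line(text):
    """Return claimant, defendant, case number, year and outcome, or None."""
    match = DECISION_PATTERN.search(text)
    return match.groupdict() if match else None


# Copied from the cell output earlier in the notebook
first = (
    "\nRead the full decision in  Ms A Mullally v Virgin Atlantic Airways Ltd "
    "and Mr Shaun Laverty: 2302421/2019 - Withdrawal.\n"
)
# Reconstructed (assumed) sentence for the anomalous case
anomaly = (
    "\nRead the full decision in  Ms K Game v Unique Associates Ltd "
    "T/a Threshold Surveyors: 3325880/2017 - Judgment.\n"
)

print(parse_decision_line(first))
print(parse_decision_line(anomaly))
```

Because the case number is anchored to the `: digits/year` shape, a slash elsewhere in a party name no longer matters, and no assumption about the century is needed.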

In [317]:
page = requests.get("https://www.gov.uk/employment-tribunal-decisions?page=1")
soup = BeautifulSoup(page.content, 'html.parser')
case_df = pd.DataFrame(columns = columns)
cases = list(soup.find_all("li", class_="gem-c-document-list__item"))

for case in cases:
    casepage = requests.get("https://www.gov.uk/"+case.find("a")['href'])
    casesoup = BeautifulSoup(casepage.content, 'html.parser')
    casehtml = list(casesoup.children)[4]
    casebody = list(casehtml.children)[5]

    case0 = {'CaseNr': [casebody.find_all("div", class_="gem-c-govspeak govuk-govspeak direction-ltr")[0].get_text().split('/20')[0].split(': ')[1]], 
             'Country': [casebody.find_all("dd",class_="app-c-important-metadata__definition")[1].get_text()], 
             'Date': [case.find("time")['datetime']], 
             'Jurisdiction' : [find_jurisdiction(casebody)],
             'Outcome' : [casebody.find_all("div", class_="gem-c-govspeak govuk-govspeak direction-ltr")[0].get_text().split('- ')[1].split('.')[0]],
             'Claimant' : [casebody.find_all("div", class_="gem-c-govspeak govuk-govspeak direction-ltr")[0].get_text()[27:].split(' v ')[0]],
             'Defendent' : [casebody.find_all("div", class_="gem-c-govspeak govuk-govspeak direction-ltr")[0].get_text()[27:].split(' v ')[1].split(': ')[0]]}
    case_df = pd.concat([case_df, pd.DataFrame.from_dict(case0)])


In [318]:
case_df

| | CaseNr | Country | Date | Jurisdiction | Outcome | Claimant | Defendent |
|---|---|---|---|---|---|---|---|
| 0 | 1601153 | England and Wales | 2018-06-19 | \nSex Discrimination, Unfair Dismissal\n | Final | Miss N Wood | Nicola Rebustini |
| 0 | 2302421 | England and Wales | 2019-12-10 | \nHealth & Safety, Public Interest Disclosure,... | Withdrawal | Ms A Mullally | Virgin Atlantic Airways Ltd and Mr Shaun Laverty |
| 0 | 2303643 | England and Wales | 2020-01-09 | Disability Discrimination | Withdrawal | Mr G Gray | London Fire Brigade |
| 0 | 2301192 | England and Wales | 2020-01-06 | \nPublic Interest Disclosure, Sex Discriminati... | Withdrawal | Mrs C Henderson | Ashton Care Home Ltd |
| 0 | 2303800 | England and Wales | 2019-12-19 | \nBreach of Contract, Unlawful Deduction from ... | Withdrawal | Miss S Brown | Heatherwood Nursing Home Ltd |
| 0 | 2303040 | England and Wales | 2019-12-19 | Sex Discrimination | Withdrawal | Miss S Parker | Pharmaceuticals Direct Ltd and Pharmadent Ltd |
| 0 | 2300468 | England and Wales | 2020-01-08 | \nDisability Discrimination, Unfair Dismissal\n | Withdrawal | Mr R Hamilton | St. Mungo’s Community Housing Association |
| 0 | 3325880 | England and Wales | 2020-01-09 | \nBreach of Contract, Contract of Employment, ... | Judgment | Ms K Game | Unique Associates Ltd T/a Threshold Surveyors |
| 0 | 2302522 | England and Wales | 2019-12-19 | \nBreach of Contract, Unfair Dismissal, Unlawf... | Withdrawal | Miss H Barrett-Smith | PJ (SPAR) Ltd |
| 0 | 2302201 | England and Wales | 2019-12-09 | \nDisability Discrimination, Public Interest D... | Withdrawal | Mrs T Prime | Cloudflare Ltd |


This time we made it to dickson

In [316]:
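def find_jurisdiction(casebody):
    text = casebody.find_all('dd', class_='app-c-important-metadata__definition')[2].get_text()
    # Bug: '/n' is a slash followed by the letter n, not the newline character '\n',
    # so neither check ever matches and the newlines stay in the text.
    # The corrected version follows further down.
    if text[0:2] == '/n':
        text = text[2:]
    if text[-2:-1] == '/':
        text = text[:-1]
    return text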


'Working Time Regulations'

In [340]:
case_df = pd.DataFrame(columns = columns)
cases = list(soup.find_all("li", class_="gem-c-document-list__item"))

case = cases[0]
casepage = requests.get("https://www.gov.uk/"+case.find("a")['href'])
casesoup = BeautifulSoup(casepage.content, 'html.parser')
casehtml = list(casesoup.children)[4]
casebody = list(casehtml.children)[5]

case0 = {'CaseNr': [casebody.find_all("div", class_="gem-c-govspeak govuk-govspeak direction-ltr")[0].get_text().split('/20')[0].split(': ')[1]], 
         'Country': [casebody.find_all("dd",class_="app-c-important-metadata__definition")[1].get_text()], 
         'Date': [case.find("time")['datetime']], 
         'Jurisdiction' : [find_jurisdiction(casebody)],
         'Outcome' : [casebody.find_all("div", class_="gem-c-govspeak govuk-govspeak direction-ltr")[0].get_text().split('- ')[1].split('.')[0]],
         'Claimant' : [casebody.find_all("div", class_="gem-c-govspeak govuk-govspeak direction-ltr")[0].get_text()[27:].split(' v ')[0]],
         'Defendent' : [casebody.find_all("div", class_="gem-c-govspeak govuk-govspeak direction-ltr")[0].get_text()[27:].split(' v ')[1].split(': ')[0]]}
case_df = pd.concat([case_df, pd.DataFrame.from_dict(case0)])


In [341]:
case_df

| | CaseNr | Country | Date | Jurisdiction | Outcome | Claimant | Defendent |
|---|---|---|---|---|---|---|---|
| 0 | 1601153 | England and Wales | 2018-06-19 | Sex Discrimination, Unfair Dismissal | Final | Miss N Wood | Nicola Rebustini |


In [338]:
def find_jurisdiction(casebody):
    text = casebody.find_all('dd', class_='app-c-important-metadata__definition')[2].get_text()
    if text[0:1] == '\n':
        text = text[1:]
    if text[-1] == '\n':
        text = text[:-1]
    return text

'\nSex Discrimination, Unfair Dismissal'

'Sex Discrimination, Unfair Dismissal'
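The difference between the two attempts comes down to `'/n'` (two characters: a slash and an n) versus `'\n'` (one newline character). The sketch below shows the difference and a shorter alternative using `str.strip`. That alternative is a swapped-in technique, not what the notebook uses. It removes all surrounding whitespace rather than exactly one newline on each side, and it does not fail on an empty string. `clean_jurisdiction` is a made-up name.

```python
# Raw value as it appears in the table with the unstripped newlines
raw = "\nSex Discrimination, Unfair Dismissal\n"

# '/n' is two characters, '\n' is a single newline character
print(len("/n"), len("\n"))
print(raw[0:2] == "/n")   # the first attempt's check never matches
print(raw[0:1] == "\n")   # the corrected check does


def clean_jurisdiction(text):
    """Remove leading and trailing whitespace, including newlines."""
    return text.strip()


print(repr(clean_jurisdiction(raw)))
print(repr(clean_jurisdiction("")))  # no IndexError on empty input
```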

## Tidying up time

In [364]:
def find_jurisdiction(casebody):
    text = casebody.find_all('dd', class_='app-c-important-metadata__definition')[2].get_text()
    if text[0:1] == '\n':
        text = text[1:]
    if text[-1] == '\n':
        text = text[:-1]
    return text

In [343]:
page = requests.get("https://www.gov.uk/employment-tribunal-decisions?page=1")
soup = BeautifulSoup(page.content, 'html.parser')
case_df = pd.DataFrame(columns = columns)
cases = list(soup.find_all("li", class_="gem-c-document-list__item"))

In [None]:
case_df = pd.DataFrame(columns = columns)

for i in range(10,1095):
    print("page ", i)
    page = requests.get("https://www.gov.uk/employment-tribunal-decisions?page="+str(i))
    soup = BeautifulSoup(page.content, 'html.parser')
    cases = list(soup.find_all("li", class_="gem-c-document-list__item"))

    for case in cases:
        casepage = requests.get("https://www.gov.uk/"+case.find("a")['href'])
        casesoup = BeautifulSoup(casepage.content, 'html.parser')
        casehtml = list(casesoup.children)[4]
        casebody = list(casehtml.children)[5]
        
        try:
            case0 = {'CaseNr': [casebody.find_all("div", class_="gem-c-govspeak govuk-govspeak direction-ltr")[0].get_text().split('/20')[0].split(': ')[1]], 
                     'Country': [casebody.find_all("dd",class_="app-c-important-metadata__definition")[1].get_text()], 
                     'Date': [case.find("time")['datetime']], 
                     'Jurisdiction' : [find_jurisdiction(casebody)],
                     'Outcome' : [casebody.find_all("div", class_="gem-c-govspeak govuk-govspeak direction-ltr")[0].get_text().split('- ')[1].split('.')[0]],
                     'Claimant' : [casebody.find_all("div", class_="gem-c-govspeak govuk-govspeak direction-ltr")[0].get_text()[27:].split(' v ')[0]],
                     'Defendent' : [casebody.find_all("div", class_="gem-c-govspeak govuk-govspeak direction-ltr")[0].get_text()[27:].split(' v ')[1].split(': ')[0]]}
            print(casebody.find_all("div", class_="gem-c-govspeak govuk-govspeak direction-ltr")[0].get_text()[27:].split(' v ')[0])
            case_df = pd.concat([case_df, pd.DataFrame.from_dict(case0)])
        except Exception:  # a bare except would also swallow KeyboardInterrupt
            print("error on case: ",casebody.find_all("div", class_="gem-c-govspeak govuk-govspeak direction-ltr")[0].get_text() )


page  10
A
Ms D Pfeiffer
Mrs T Marshall
Mr Stephen Dillon
Mr Mustapha Alli
Mr GJ Edwards
Mr F Kabengele
Mr B Kell
Mr A Leyland and others
Miss R Simon
Miss N Crookes
Mr M Hazelgrove
 Ms A Logan
Mrs V Gibson James
Mrs P Price
Mrs A Webb
Mr S Lamberton Pine
Mr S Collett
Mr M Bennett
Leigh Thomas
Mr J Taylor
Mr G Saggu
Miss L Ballard
Miss E Lee
Miss A Shipperbottom
Mr C Palenzuela
Dr Uwhubetine and Dr Njoku
Mr J Esteller Dura
Mr L Stoker
Mrs K Abbiss
Mr P Collopy
Mr S Thomas
Mr S Millea
Mr S Togwell
Miss E Striano
Ms M Scott-Barnes
Mr M Gurure
Miss E Webb
Mrs N Moore
Mr A Lasisi
Ms E Sule
Miss E Otamere
Miss S Smyth
Mr B Bokesse
Miss Amy Maynard
Mr M Rahman
Facilicom Cleaning Services Ltd
Mx G Pedliham
Mrs Claire Macdonald
Ms E Sule
page  11
Mr J Brookes
Mr O Lyons
Mr L Williams
Mrs J Cuthew
Miss Andrea Fisher
Mr E Montenegro
Miss Joanna Bednarz
Mr D Schofield
Ms V Baer
Mrs M Murray
Mrs K Mikolajczyk-Bieniecka
Mr S Skupien
Mr P Pagkalis
Mr J Linton
Mr J Evans
Mr J Dockerill
Mr H Oliver
Mi

  Ms C Knox
Mr L Rhodes-Kelly
  Ms C Father
Mr J Szydelko and Others
Miss K Holdsworth
  Ms B Carter
Miss F Anwar
  Mrs S Raj
Miss E Zajko
  Miss O Jones
 Mr C Jones
 Mr W Newell
  Mrs L Dolan
  Mrs C Rotsou
  Mr W Thompson
  Mr S Budd
  Mr S Barns
  Mr R Waterman
Ms S Howe
  Mr R Earnshaw
Leanne Prins
  Mr P Mills
  Mr P Brazier
 Mr R Jones
 Mr NG Pullin
  Mr Canales-Tilve
 Mr L Gueye
  Mr K McCall
 Mr K Roberts
  Mr J Bliss
 Mr H Mohammed
  Mr D Strand
Mr Mavar
 Mr E Hughes
 Mr D Roberts
 Mr C Matthews
 Mr B Roberts
 Mr A Cassy
Mr Alberto Scotti and Others
 Mrs J Griffiths
page  20
 Mrs J Heighton
 Mrs J Mwape
 Mr S L Jones
 Mr M Adamczewski
 Mr J Evans
Mr P Wilshaw
Ms J D Colclough
Miss T Jan
Mr M Brookes
Mrs M Bradbourn-Miles
Ms E Bogulska
Mr F Wheatland
Mrs C Otshudy
Mr C Edwards
Mr E Creez
Mrs S Calder
  Mr C Pitterson
 Mr A Thomas
  Miss S Wedderburn-Stewart
  Miss S Ashley
  Miss L Slaoui
  Miss J Giles
  Miss D Ashby
  Miss C Bennett
Mr Robert S Rae
Ms J MacPherson
Mr P Joseph

Ms T Stephens
Ms M Duggan
Ms C Obaseki
Ms A Moss
Ms N Eadon
Mrs T Stevens
Mrs T Amos
Mrs M McKinson
Mrs JK Poonia
Mrs D Hooper
Mr V Georgiev
Mr R Choudhury
Mr P Mefful
Mr N Jellow
Mr M D Almond
Mr K O Apedo
Mrs J Shah
  Mr I Paso
 Ms G Marett
Mr R Davda
Mr D Cooper and Ms G Hayles
Ms T Bibi
 Miss K Gill
Miss E Rudnicka
Miss J Hardaker
page  30
Mr J Franklin
Mr J Akpan
Mr E Neves
Mr D Gillespie
Mr B Da Silva
Mr AG Campbell
Mr A Hussain
Mr A Hunt
Miss P Smith
Miss M Hafeez
Miss I Kavaliauskaite
Miss G Marsh
Miss C Harris
Miss A Wood
Miss A Gadd
 Mr S Cornell
 Mr P Kenyon
 Mr M H Mateparae
 Mr M Wilson
 Mr M Vickers
 Mr M Toqeer
Mr M Connor
 Mr J Hornby
 Mr G Davenport
 Mr E Eaton
Mr G Martin
 Mr C Realey
Mr C Hickson
 Miss V Hagan
 Miss H Arkwright
 Miss C Waude
Mr G Young
Mr J Dunderdale
Miss C Lomas
 Miss K Spence
 Mr M Gregory
Miss S Bywater
Mr E Wylie
 Ms L Merlock
Miss C Hissey
Ms S Cowley
Mr N Tilley
Miss E Duncan
Ms S Scott
  Ms J Pickering
error on case:  
Read the full decision 

Miss K Touch
 Mr R Lord
Miss E Bowman
Mr O Ogebule
Mr D Bishop
Miss H Crawshaw
 Mrs S Ward
Mrs C Sinclair
Miss K Stefanko and others
Mrs J Pratten
Mr Bradley Rainford
Mrs Helen Davis and Miss Charmaine Hanlon
Miss M Brooks
Miss K Langner
Mr H Dymond
Mrs Helen Larkin
Mr T Pedersen
Miss H Paterson
Mr S Moody
Miss H Crocker
Mr J Abraham
 Mrs L O’Dwyer
Mrs L Buckland
Mr T Nelson
 Mr D Wells
page  41
Mr J Bosheya
Ms K Allen
 Mr F Robson
Ms S Brierley and others
 Ms R Day
 Rebecca Marsden
 Mr A Tomassetti
Mr R Fray
 Miss H Collings
 Mrs E Grindall
Miss S Holding
Mrs J Hawksworth
Miss S Shahriar
  Miss M Clayton
 Ms J A Winstanley
Mr Arthur Moan
 Ms E Farrow
Mr L Theaker
Ms R Campbell and others
Mr P Wilson
Mr P Richardson
Mr D P Herbert OBE
Mrs D Murgatroyd
Mr L Parker
Miss R Heer
Mr M Owen
Ms T Abbey-Philip
Mr J Gwynn and others
Mr D Cox
Miss C Osborne
Mr K Lipton
Mr H Wilson
Miss N McKay
Ms E Whelan
Mrs L Haverstock
Mr J McLaughlin
Ms P Mauchline
Mr William Campbell as Executor for The Lat

  Mr N Stothard
 Mr S Fitzgerald
  Dr I Ogilvie
  Miss M Mahlangobe
 Mr Barry Walsh
page  51
Miss Stacy Worrall
  Mr B Stifani
  Ms C A Roche
  Mr M Cooke
  Ms JE Somerfield
  Ms S Stevens and Ms K Whitfield
  Mr S Kerry
  Mrs A Roberts
  Mr S Shaw
  Mrs N Watkins
  Mrs A Jones
  Mrs A Stevenson
  Miss J Cheeseman
  Mr S Garvey
Ms Angelika Niemanski
 Mr A McCarthy
Mr R Smith
Mr S Carrubba
Mr C Wallis
Miss L Parson
Ms A Veloso
Mr S Baker
Miss Y Carr
Mr E Gonsalves
 Mrs S Board
Mr J Hanley
Mr A Banner
Miss C Lamb
Mr G Thornton
Mrs D Gaull
Mr A Vij
Mrs J Swanwick
Mrs M Hart
Mr J Davies
Mrs I Phoenix
Mrs A Miskiewicz
Mr N Toure
Mr Andrzej Tomala
Mr O Petrov
Miss L Hirons
 Mrs Marylyn Smith
 Miss A Chadwick
Mr K Irving
Mrs E Kelly
Miss A Johnston
Mrs J McConaghy
Miss G Kelly
 Mr K Mitchell
Mr A Thompson
Mr L Allman
page  52
Mr W Readman
Mr K Lidster
Miss E Peace
Ms G Mucklow
Mr D Ions
Mr JA Seaman and others
Mr G Allan
Mr N Hallam
Mrs Joanne Thompson
Mrs Julie Waugh
Ms S M Ditchburn and oth

Mr M Doherty
Mr Colin Flannigan
Mr G Imrie
Ms Jessica Ann Skinner
 Gareth Lush
Mr C Maison
Mr AG Badita
Archibald Park
Miss M Paul
Miss T Thomas
Miss EM Inglis
Miss A Noble
Miss A Nichol
EFG
Ms F White
Ms D Thorne
Mrs S Robinson
Mrs S Hirani
Mrs N Dourado
Mrs M Tameczka
Mrs G Badra
Mrs D Goodwin
Mrs A Hooker
Mr W Adams
Mr T Vasarevicius
 Mr T Pickett
Mr Stephen Duorado
Mr Robin Smith
Mr R Fisher
Mr R Fighiroaia
Mr P J MacCormick
Mr K Walsom
Mr J Kuisys
Mr J Bobo
Mr D Hicklin
Mr C Gibson
Mr A Micallef
Mr A Hooley
Miss W Ridehalgh
Miss S Simpson
Miss J Bowden
Miss D Challis
Mr Ibrahim Sesay
  Ms A Russell
  Mrs J Hastings
  Ms L Boyd
page  63
  Ms A Currie
  Ms E Shaw
  Mrs K McMahon
  Mrs A Hogg
  Ms C Nolan
  Ms B Wason
  Ms J Connor
  Mr H Ridley
  Mrs ME Thorburn
  Mrs J Morrison
  Ms A Jamieson
  Ms E Hutchison
  Ms T O’Neill
  Mr J Condy
  Mrs A Blake
  Mr R Whyte
  Mr William Munro
Ms V Rich
Ms S Anderson
Ms M Burrows
Ms D Arteaga
Mrs R Hackett
Mrs N Williams
Mrs G Apic
Mrs C Mord

Ms S Bennett
Mr M Imran
Mr J Graham
Mr A Conte
Mr A Bancroft
Mr U Hasan
Ms S Jowett
Miss M Grinberg
Miss L Grady
Mr P Kolomanski
Ms P Bhudia
Ms E Raisey-Skeats
Ms S Rose
Mrs C Horne-Seabridge
Mr G Vitale
Mrs A Wisniewska
Mrs D Marshall
Mr Minal Devkvaran and others
Mr Leo Smith
Mr Stephen Beech
Mr M Holt
Mrs Rebecca Chapman
Mr F Morris
Miss N Lunt
Miss G Dixon
Miss C Reid
Mr L Green
Miss S El-Raie
Mr Glen Peacock
Mr S Barallon
page  74
 Mr C Warner
Mrs K Tuggey
Ms T Sera Nkosi
Mr Z Wozniak
Mr K Burger
Mrs U Rodrigo
Mr N Gregory
Dr D Gurdasani
Miss M Higgs
Mr R Lappage
Mrs H Branchflower
Ms A Shaikh
Mr M Ahmed
Sam Edgington
Mrs MD Webb
G
Mr F Khan
Ms A Rafeeq
Miss N Green
Mr T Dworakowski
Mrs R McGuire
Exeter City AFC Ltd
Mrs L McPherson
Ms L Taylor-Jones
Mrs J Hickman
Mrs RK Tamang
Mr B Singh
Mr M Byrne
Mr JA Lawal
Mr C Hingston
Mr R Raybold
Mr S Mayhew
Mr J Libby
Miss S Tehrani
Mr M Adamczyk
Miss P Smith
Ms C Butterfield
Mrs M Giannopoulou
Mr N Benatar
Mr I Morrell
Mrs S Maszuchin
Mis

  Mr N Charlesworth
  Mr K Wildon
  Ms Lolita Boiko
  Ms K Pattni
Mr P Allwright
Mr L D’Rozario
  Mr G Park
  Miss M Bryan
Mr K Shehzad
Miss N Green
Mr J Tanner
Mr J Reeves
Mr I S Pennels
Mr C Wilmont
Ms R Magigi
Mr D Mellin
Mr D Coughtrey
Miss S Chase
Miss M Miller
  Mrs D Anker
  Miss K Ditchfield
  Mr G O’Connor
Miss D Magyar
Mrs C Gammon
X
Miss N Harrison
Ms B Cluffer
Mr J Fulham
Mr T Morris
Miss C Cummings
  Miss S Lucas
Mr A Smart
  Miss E Harkin
Mr T Jackson
  Mrs S M Tranter
 Mr M Young
page  84
  Mr D Tickle and others
Miss L Klimaite
  Miss E Hindle
Mr A Hope
Mr J Sideeq
Mr W Hoch
Miss E Corpuz
 Mr G Cunningham
Miss L Garcia Cerezo
Mrs H Crosby
 Mr M Parekh
Mrs S Nkomazana
Mr O Blackwell
Mrs K Higgs
Mrs S Ward
Mrs K Milsom
Mrs C Robinson
Mr C Cutrano
Mr WG Brumeau-Green
Mr T Rose and Mr K Tooze
Mr T Miles
Mr P Jasina
Mr O Justice
Mr M Winston-Smith
Mr L Fraser
 Mr Lee Ford
Mr J W Greenaway
Mr J Thomas
Mr G Newman
Mr D Halpin
Miss C Ashenden
Mr A Smith
Mrs M Williams
 Miss D S

Mr G Kardamis
Mr B Taylor
Mr A Sommerville
Mr M Madden
Mr G Campbell
Ms M Logan
Ms M MacKenzie
Mr G Knight
Mrs B Watson
Mrs L Muir
Mrs Nicola Davidson
Mrs C Mochan
 Martha Kirby
Mrs A Cassie
Miss S Dale
Ms M Palanga
Mrs A Macey
Mr S Miah
Mrs S Ajith
Ms R Carter
Mrs M Maviyani
Miss S McMahon
Mr A Goodey
Mr N Champion
Mrs K Kricinaite Poderiene
Ms Anser Karim
Mr F Gueret
Ms E Kingstone
Mrs A Eni
Mr R Thompson
Mrs A Jenkins
Mr C Loncar-Martin
Mr A James
Mr C Smith
Mr D Gonthier
Mrs L Harle
Mr A Keen and Mr A McElwee
Mr A Viljoen
Miss S Dad
Mr P Volkner
Mr A Crofton
Ms S Jackson
Mr R White
Mrs S Millard
page  95
Mr T Brown
Mr L Cree
Miss C Newell
Mr J Rose
Ms T Liles-Taylor
Ms B McInerney
Ms C Gonzalez-Garcia
Miss J Kavanagh
 Ms C Baldwin and others
Mrs J Lawson-Davies
Miss A Blackshields
Mr D Clarke and others
Ms S Iqbal
Mr M Sekha
Ms A Joaquim and Ms K Garcia Ferrus
Miss D Barahona Guerra
Mr Nathan Noel
Ms H Rawlins
Levinna Ola
Mr G Alex
Ms D Radovic
Mr M Johnson
Mr R Reid
Mr B Allen
Mr 

  Ms F Nell
  Mrs M Sidders
  Mr A Varga
  Mr K Adugyamfi
 Mr W Augustine
 Ms M Patel
Mr M Case
  Ms D Ward
  Miss S Drewett
  Mr P Martyna
  Mrs Z Venskiene
  Mrs L Lososovakaja
  Miss E Stupiak
  Ms R Cepaite
  Mr M Smith
  Ms R Matuzeviciute
  Ms Z Belevica
  Mrs K Fracasso
  Mrs Z Venskiene
  Mr P Martyna
  Mrs M Smith
  Mr G Wolf
  Mr AJ Gooneratne
  Mr T Premraj
  Ms K Guobuziene
  Miss A Visan
  Ms D Holmes
  Miss I Noor
  Mr V Birlea
Miss E Walker
  Mr C Playfair
 Mr G Clark
  Mrs J Chapman
error on case:  
Read the full decision in   Mrs K Cocklin v WM Morrison Supermarkets plc:3202500/2018 - Withdrawal.

  Ms H Huggins
  Mr B Marsh
page  106
  Miss D Rivett
  Miss J Lynch
  Mrs L Gara
  Mr N Perry
Ms B Piotrowska
Mr David Towers
Miss Janette Stevenson
Miss Olivia Pih
Mr S Frodsham
 Miss K Bednarek
Mrs J Burns
Mr K Coates
Miss VM Jones
Ms Y Hough
Mrs C Thompson
Mr M Livesey
Mr B Gleaves
Mr P Pickard
Mr H Sulaiman
Davron Hackney
Miss C Wall
Mr Pawan Punn
Mr S Heath
Mr A Alexand

  Mrs K Smith
Mr J Hilaire
  Mrs B Melville
  Mrs M Houston
  Ms F Gray
  Mrs S Hall
  Mr I Murdoch for the Late Mrs A Murdoch
  Mrs M McLaughlin
  Miss K Hastings
  Mrs I Nelson
  Mrs A Byron
  Mr A Shepherdson
  Ms L Carstairs
  Miss A Stewart
  Mrs E Mackie
page  116
  Mr J Kerr
  Ms S Fox
 Ms P Whitelaw
  Ms E Mackie
  Miss M Elo
  Mr J Hetherington
  Mr L Royal
  Ms M Kunz
error on case:  
Read the full decision in   Mrs M A Adenusi V Amritpal Singh Walia Whitfield Ventures Ltd: 2202383/2019 - Judgment.

  Mr M Sawney
  Mr M Tingling
  Mr N Saeed
  Mrs I Simkuviene
  Mr G Ballantyne
  Mr V Sehdev
  Mrs M Kashmiri
  Mr D Pearce
  Braye Demolition and Plant Services Ltd
  Mr P Philipse
  Mrs M Da Silvarodrigues
  Mr P Mitchell
  Mr G Bruccoleri
  Ms D Callegaris
  Mrs J Hutton
  Mr M Kelly
  Mr T Rollo
Mr S Inchiostro
Ms D Donnelly
Mr O Navarro
 Mr B Ellis
Miss N Thompson
Mrs G Jellicoe
 Ms K Soos
Mr M Ionel Stelian
Mr J Hart
Miss J Martindale
  Mr A Bagnall and others
 Mrs H Kirsop

In [368]:
case_df.to_csv(r'case summary.csv')


In [379]:
page = requests.get("https://www.gov.uk/employment-tribunal-decisions/ms-c-gray-as-executor-for-ms-gp-simpson-v-north-lanarkshire-council-103476-2007-and-others")
casesoup = BeautifulSoup(page.content, 'html.parser')

casehtml = list(casesoup.children)[4]
casebody = list(casehtml.children)[5]

# Pull the summary sentence once; every text field below is parsed from it.
decision_text = casebody.find_all("div", class_="gem-c-govspeak govuk-govspeak direction-ltr")[0].get_text()
case_nr = decision_text.split('/20')[0].split(': ')[1]

try:
    case0 = {'CaseNr': [case_nr],
             'Country': [casebody.find_all("dd", class_="app-c-important-metadata__definition")[1].get_text()],
             # `case` is not defined in this cell; it must still be set from an earlier cell
             'Date': [case.find("time")['datetime']],
             'Jurisdiction': [find_jurisdiction(casebody)],
             # raises IndexError when the sentence has no '- <outcome>' part
             'Outcome': [decision_text.split('- ')[1].split('.')[0]],
             # [27:] skips the leading '\nRead the full decision in '
             'Claimant': [decision_text[27:].split(' v ')[0]],
             'Defendent': [decision_text[27:].split(' v ')[1].split(': ')[0]]}
except Exception:
    # A bare `except:` would also swallow KeyboardInterrupt, so catch Exception instead
    print("error on case: ", case_nr)
    
    

error on case:  103476


In [378]:
casebody.find_all("div", class_="gem-c-govspeak govuk-govspeak direction-ltr")[0].get_text()

'\nRead the full decision in Ms C Gray As Executor for Ms GP Simpson v North Lanarkshire Council: 103476/2007 and Others.\n'
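
The sentence above has no `- <outcome>` part, so `split('- ')[1]` raises an `IndexError`. That appears to be why case 103476 is reported as an error. The two other failures printed earlier look like similar format variations: `plc:3202500/2018` has no space after the colon, and the Adenusi case uses an upper-case ` V `. Below is a minimal sketch of a more tolerant parser for this sentence. The function name `parse_decision_text`, the regular expression and the extra `Year` field are my own assumptions and not part of the original notebook. The `'Defendent'` key keeps the notebook's spelling so the result still lines up with `case_df`.

```python
import re

PREFIX = "Read the full decision in"

# Matches ": 103476/2007" (space after the colon optional), optionally
# followed by " - Outcome" up to the next full stop.
CASE_TAIL_RE = re.compile(
    r"\s*:\s*(?P<case_nr>\d+)/(?P<year>\d{4})(?:\s*-\s*(?P<outcome>[^.]+))?"
)


def parse_decision_text(text):
    """Hypothetical helper: split the summary sentence into its fields.

    Returns None when the sentence does not look like a decision summary.
    Fields that are absent (for example the outcome) are returned as None.
    """
    text = " ".join(text.split())  # collapse newlines and repeated spaces
    if not text.startswith(PREFIX):
        return None
    body = text[len(PREFIX):].strip()

    tail = CASE_TAIL_RE.search(body)
    if tail is None:
        return None

    # Everything before the case number is "claimant v respondent".
    parties = body[:tail.start()]
    # Prefer lower-case " v " so an initial such as "Miss V Hagan" is not
    # mistaken for the separator; fall back to " V " only if needed.
    for sep in (" v ", " V "):
        if sep in parties:
            claimant, respondent = parties.split(sep, 1)
            break
    else:
        claimant, respondent = parties, None

    outcome = tail.group("outcome")
    return {
        "CaseNr": tail.group("case_nr"),
        "Year": tail.group("year"),
        "Claimant": claimant.strip(),
        "Defendent": respondent.strip() if respondent else None,
        "Outcome": outcome.strip() if outcome else None,
    }


if __name__ == "__main__":
    sample = ("\nRead the full decision in Ms C Gray As Executor for "
              "Ms GP Simpson v North Lanarkshire Council: 103476/2007 and Others.\n")
    print(parse_decision_text(sample))
```

This sketch still has limits: a sentence that uses only an upper-case ` V ` and also has a claimant with the initial V would be split in the wrong place, and only the first case number is captured when a decision covers several.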