Census of Drug and Alcohol Treatment Services in Northern Ireland

In [78]:
from gssutils import *
scraper = Scraper('https://www.health-ni.gov.uk/publications/census-drug-and-alcohol-treatment-services-northern-ireland-2017')
scraper

## Census of drug and alcohol treatment services in Northern Ireland 2017

### Distributions

1. Drug and Alcohol Census 2017 ([application/pdf](https://www.health-ni.gov.uk/sites/default/files/publications/health/drug-alcohol-census-2017.pdf))
1. Data from Census of Drug and Alcohol Treatment Services ([MS Excel Spreadsheet](https://www.health-ni.gov.uk/sites/default/files/publications/dhssps/data-census-drug-alcohol-treatment-services.xlsx))
1. Pre-release Access List Drug and Alcohol Census ([application/pdf](https://www.health-ni.gov.uk/sites/default/files/publications/dhssps/pre-release-drug-alcohol-census.pdf))


In [79]:
tabs = {tab.name: tab for tab in scraper.distribution(
    title='Data from Census of Drug and Alcohol Treatment Services').as_databaker()}
tabs.keys()
tabs

{'Contents & Notes to tables': {<B6 'Breakdown by Age and Gender'>, <A4 'Contents'>, <A16 ''>, <B4 ''>, <A19 3.0>, <B9 'Breakdown by Trust'>, <B1 ''>, <A8 'Table 3'>, <B17 'A dashed line (-) represents a cell count of less than three and a * represents a cell that has been masked. This is in order to avoid issues involving personal disclosure, where it may be possible to identify an individual from the data provided.  '>, <A6 'Table 1'>, <B2 ''>, <A14 ''>, <A9 'Table 4'>, <A15 'Notes to Tables'>, <A13 ''>, <B15 ''>, <A12 ''>, <B16 ''>, <B7 'Breakdown by Service Type'>, <A18 2.0>, <A11 ''>, <B13 ''>, <A1 'Census of Drug and Alcohol Treatment Services in Northern Ireland:'>, <A17 1.0>, <B18 'All percentages are rounded to the nearest percentage point.'>, <B14 ''>, <B11 ''>, <A5 ''>, <B8 'Breakdown by Residential Status'>, <B10 'Comparison Table'>, <A10 'Table 5'>, <B12 ''>, <B19 'Hospital Information System (HIS) figures relate to the number of emergency admissions to HSC hospitals for o

In [80]:
tables = []
for tab_name, script in [
    ('Table 1', 'Treatment Services by Age and Gender.ipynb'),
    ('Table 2', 'Treatment Services by Service Type(Age).ipynb'),
    ('Table 2', 'Treatment Services by Service Type(sex).ipynb'),
    ('Table 3', 'Treatment Services by Residential status(Age).ipynb'),
    ('Table 3', 'Treatment Services by Residential Status(sex).ipynb'),
    ('Table 4', 'Treatment Services by Trust(Age).ipynb'),
    ('Table 4', 'Treatment Services by Trust(sex).ipynb')
]:
    tab = tabs[tab_name]
    %run "$script"
    tables.append(new_table)

In [84]:
tidy = pd.concat(tables)
tidy.count()
# Debug export for comparison; index 7 assumes an eighth table was appended above.
tables[7].to_csv('testCompareCC.csv', index = False)

In [11]:
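# Assign the result back; calling fillna(inplace=True) on a selected column may not update the frame.
tidy['Treatment Type'] = tidy['Treatment Type'].fillna('Total')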

In [12]:
tidy.count()

Period                          724
Sex                             724
Age                             724
Service Type                    724
Residential Status              724
Treatment Type                  724
Health and Social Care Trust    724
Measure Type                    724
Unit                            724
Value                           724
dtype: int64

In [13]:
tidy['Treatment Type'].unique()

array(['Alcohol Only', 'Drugs Only', 'Drugs & Alcohol', 'Total',
       'Under 18s', '18 and over', 'All'], dtype=object)

In [14]:
tidy.head()

Unnamed: 0,Period,Sex,Age,Service Type,Residential Status,Treatment Type,Health and Social Care Trust,Measure Type,Unit,Value
0,1 March 2017,Persons,Under 18,All,All,Alcohol Only,All,Count,People,95
1,1 March 2017,Persons,Under 18,All,All,Drugs Only,All,Count,People,324
2,1 March 2017,Persons,Under 18,All,All,Drugs & Alcohol,All,Count,People,294
3,1 March 2017,Persons,Under 18,All,All,Total,All,Count,People,713
4,1 March 2017,Persons,18 and over,All,All,Alcohol Only,All,Count,People,2482


In [15]:
# Take a copy so later column assignments do not act on a view of the full frame.
tidy = tidy[tidy['Treatment Type'] == 'Alcohol Only'].copy()

In [16]:
tidy.head()

Unnamed: 0,Period,Sex,Age,Service Type,Residential Status,Treatment Type,Health and Social Care Trust,Measure Type,Unit,Value
0,1 March 2017,Persons,Under 18,All,All,Alcohol Only,All,Count,People,95
4,1 March 2017,Persons,18 and over,All,All,Alcohol Only,All,Count,People,2482
8,1 March 2017,Persons,All Ages,All,All,Alcohol Only,All,Count,People,2577
12,1 March 2017,Male,Under 18,All,All,Alcohol Only,All,Count,People,31
16,1 March 2017,Male,18 and over,All,All,Alcohol Only,All,Count,People,1536


In [17]:
tidy['Sex'].unique()

array(['Persons', 'Male', 'Female', 'Male (%)', 'Female (%)',
       '% of Total', '% of all Males ', '% of all Females', 'Female  ',
       'All'], dtype=object)

In [18]:
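# Filter with a boolean mask: index labels repeat after pd.concat, so dropping by label
# would also remove unrelated rows that happen to share a label.
tidy = tidy[~tidy['Sex'].isin(['% of Total', '% of all Males ', '% of all Females'])]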

In [19]:
tidy['Sex'].unique()

array(['Persons', 'Male', 'Female', 'Male (%)', 'Female (%)', 'Female  ',
       'All'], dtype=object)
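
`pd.concat` keeps each table's own index, so the same labels can occur once per table, and `drop` removes every row carrying a given label. The sketch below uses two small made-up frames (not the census data) to compare dropping by index label with filtering by a boolean mask.

```python
import pandas as pd

# Two made-up tables, each with its own default index 0, 1.
a = pd.DataFrame({'Sex': ['Male', '% of Total'], 'Value': [10, 50]})
b = pd.DataFrame({'Sex': ['Female', 'Persons'], 'Value': [12, 22]})
combined = pd.concat([a, b])  # index labels are now 0, 1, 0, 1

unwanted = ['% of Total']

# Dropping by label removes every row labelled 1, including 'Persons'.
by_label = combined.drop(combined[combined['Sex'].isin(unwanted)].index)

# A boolean mask removes only the rows that actually match.
by_mask = combined[~combined['Sex'].isin(unwanted)]

print(list(by_label['Sex']))
print(list(by_mask['Sex']))
```

Passing `ignore_index=True` to `pd.concat` is another way to avoid repeated labels, at the cost of losing the per-table row numbers.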

In [20]:
tidy['Period'].unique()

array(['1 March 2017', '1st March 2007', '1st March 2010',
       '1st March 2012', '1st September2014'], dtype=object)

In [21]:
from datetime import datetime

In [22]:
tidy.Period = pd.to_datetime(tidy.Period).dt.strftime('%Y-%m-%d')

In [23]:
tidy['Period'] = 'day/' + tidy['Period']
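
The period labels mix several spellings ('1 March 2017', '1st March 2007', '1st September2014'), and whether `pd.to_datetime` can infer all of them may depend on the pandas version. As a minimal sketch, assuming every label is a day, a full English month name and a four-digit year, the labels could be normalised first and then parsed with an explicit format. The helper name `normalise_period` is hypothetical and not part of the notebook.

```python
import re
import pandas as pd

def normalise_period(label):
    """Turn labels such as '1st September2014' into 'day/2014-09-01'."""
    # Remove ordinal suffixes: '1st' -> '1'.
    cleaned = re.sub(r'(\d+)(st|nd|rd|th)\b', r'\1', label.strip())
    # Insert the missing space when a month name runs into the year.
    cleaned = re.sub(r'([A-Za-z])(\d{4})$', r'\1 \2', cleaned)
    # Parse with an explicit format so nothing is left to inference.
    parsed = pd.to_datetime(cleaned, format='%d %B %Y')
    return 'day/' + parsed.strftime('%Y-%m-%d')

for label in ['1 March 2017', '1st March 2007', '1st September2014']:
    print(label, '->', normalise_period(label))
```

It could be applied with `tidy['Period'].map(normalise_period)` in place of the two conversion cells.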

In [24]:
tidy['Age'].unique()

array(['Under 18 ', '18 and over', 'All Ages', 'All'], dtype=object)

In [25]:
tidy['Age'] = tidy['Age'].map(
    lambda x: {
        'Under 18 ' : 'under-18', 
        '18 and over' : '18-plus',
        'All years': 'all' ,
        'All': 'all'
        }.get(x, x))
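
Because `.get(x, x)` passes through any label that has no entry in the dictionary, a mismatch between the keys and the observed labels goes unnoticed. A minimal sketch of a check follows, using the keys from the cell above and the labels reported by `unique()`; the variable names are illustrative only.

```python
import pandas as pd

# Keys copied from the mapping cell above.
age_map = {
    'Under 18 ': 'under-18',
    '18 and over': '18-plus',
    'All years': 'all',
    'All': 'all',
}

# Labels as listed by tidy['Age'].unique().
ages = pd.Series(['Under 18 ', '18 and over', 'All Ages', 'All'])

# Labels with no mapping entry are left unchanged by .get(x, x).
unmapped = sorted(set(ages.unique()) - set(age_map))
print(unmapped)
```

An empty list would mean that every observed label is covered by the mapping.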

In [26]:
tidy['Treatment Type'] = 'alcohol'

In [27]:
def user_perc(x, y):
    # Service Type labels ending in "(%)" hold percentages rather than counts.
    if str(x) in ('Statutory (%)', 'Non-statutory (%)', 'Prison (%)'):
        return 'Percentage'
    return y

tidy['Measure Type'] = tidy.apply(lambda row: user_perc(row['Service Type'], row['Measure Type']), axis = 1)



In [28]:
def user_perc(x, y):
    # Residential Status labels ending in "(%)" hold percentages rather than counts.
    if str(x) in ('Residential (%)', 'Non-residential (%)', 'Mixed (%)'):
        return 'Percentage'
    return y

tidy['Measure Type'] = tidy.apply(lambda row: user_perc(row['Residential Status'], row['Measure Type']), axis = 1)



In [29]:
tidy['Health and Social Care Trust'].unique()

array(['All', 'Belfast', 'Northern', 'South Eastern', 'Southern',
       'Western', 'Northern (%)', 'South Eastern (%)', 'Southern (%)',
       'Western (%)', 'Emergency admissions (HIS) (%)',
       'Emergency admissions (HIS)', 'Belfast (%)',
       'Emergency admissions (HIS)  (%)'], dtype=object)

In [30]:
def user_perc(x, y):
    # Trust labels ending in "(%)" hold percentages rather than counts.
    if str(x) in ('Belfast (%)', 'Northern (%)', 'South Eastern (%)',
                  'Southern (%)', 'Western (%)', 'Emergency admissions (HIS) (%)'):
        return 'Percentage'
    return y

tidy['Measure Type'] = tidy.apply(lambda row: user_perc(row['Health and Social Care Trust'], row['Measure Type']), axis = 1)

In [31]:
def user_perc(x, y):
    # Catch the variant of the HIS label that has two spaces before "(%)".
    if str(x) == 'Emergency admissions (HIS)  (%)':
        return 'Percentage'
    return y

tidy['Measure Type'] = tidy.apply(lambda row: user_perc(row['Health and Social Care Trust'], row['Measure Type']), axis = 1)

In [32]:
tidy['Health and Social Care Trust'] = tidy['Health and Social Care Trust'].str.rstrip(' (%)')

In [33]:
tidy['Health and Social Care Trust'] = tidy['Health and Social Care Trust'].str.lower()

In [34]:
tidy['Health and Social Care Trust'].unique()

array(['all', 'belfast', 'northern', 'south eastern', 'southern',
       'western', 'emergency admissions (his'], dtype=object)
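
The truncated label 'emergency admissions (his' in the output above comes from how `str.rstrip` works: its argument is a set of characters to strip, not a suffix, so the closing bracket of '(HIS)' is removed as well. The sketch below uses a few sample labels to compare this with a regular expression that removes only a trailing "(%)"; the regular expression is an illustrative alternative, not what the notebook does.

```python
import pandas as pd

labels = pd.Series([
    'Belfast (%)',
    'Emergency admissions (HIS)',
    'Emergency admissions (HIS)  (%)',
])

# rstrip removes any trailing run of the characters ' ', '(', '%' and ')'.
stripped_chars = labels.str.rstrip(' (%)')

# Anchored pattern: optional whitespace followed by a literal "(%)" at the end.
stripped_suffix = labels.str.replace(r'\s*\(%\)$', '', regex=True)

print(list(stripped_chars))
print(list(stripped_suffix))
```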

In [35]:
tidy['Health and Social Care Trust'] = tidy['Health and Social Care Trust'].map(
    lambda x: {
        'south eastern' : 'south-eastern', 
        'emergency admissions (his)' : 'emergency-admissions',
        'emergency admissions (his' : 'emergency-admissions'
        }.get(x, x))

In [36]:
tidy['Measure Type'] = tidy['Measure Type'].map(
    lambda x: {
        'Headcount' : 'Count', 
        'Percentage of Headcount' : 'Percentage',
        }.get(x, x))

In [37]:
tidy['Sex'].unique()

array(['Persons', 'Male', 'Female', 'Male (%)', 'Female (%)', 'Female  ',
       'All'], dtype=object)

In [38]:
def user_perc(x, y):
    # Sex labels ending in "(%)" hold percentages rather than counts.
    if str(x) in ('Male (%)', 'Female (%)'):
        return 'Percentage'
    return y

tidy['Measure Type'] = tidy.apply(lambda row: user_perc(row['Sex'], row['Measure Type']), axis = 1)
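
The `user_perc` cells all apply the same rule: a row is a percentage when one of its dimension labels ends in "(%)". As a minimal sketch on a small made-up frame, the rule could be written once as a vectorised mask. Note that this matches any label ending in "(%)", which is slightly broader than the explicit lists used in the cells above.

```python
import pandas as pd

# Made-up rows standing in for the tidy data.
df = pd.DataFrame({
    'Sex': ['Male', 'Male (%)', 'Persons'],
    'Service Type': ['All', 'All', 'Prison (%)'],
    'Measure Type': ['Count', 'Count', 'Count'],
})

dimension_columns = ['Sex', 'Service Type']

# Start with "no row is a percentage" and OR in each column's matches.
is_percentage = pd.Series(False, index=df.index)
for column in dimension_columns:
    is_percentage |= df[column].str.endswith('(%)')

df.loc[is_percentage, 'Measure Type'] = 'Percentage'
print(df)
```

This has to run before the "(%)" suffixes are stripped from the labels, as is the case for the `user_perc` cells.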



In [39]:
tidy['Sex'] = tidy['Sex'].map(
    lambda x: {
        'Female' : 'F', 
        'Male' : 'M',
        'Persons' : 'T',
        'Female  ' : 'F',
        'Male (%)' : 'M',
        'Female (%)': 'F'
        }.get(x, x))

In [40]:
tidy['Sex'] = tidy['Sex'].str.rstrip('(%)')

In [41]:
tidy['Service Type'].unique()

array(['All', 'Statutory', 'Non-statutory', 'Prison', 'Statutory (%)',
       'Non-statutory (%)', 'Prison (%)'], dtype=object)

In [42]:
tidy['Service Type'] = tidy['Service Type'].map(
    lambda x: {
        'Total' : 'all',
        'total' : 'all'
        }.get(x, x))

In [43]:
tidy['Service Type'] = tidy['Service Type'].str.rstrip(' (%)')

In [44]:
tidy['Service Type'] = tidy['Service Type'].str.lower()

In [45]:
tidy['Residential Status'] = tidy['Residential Status'].str.rstrip(' (%)')

In [46]:
tidy['Residential Status'] = tidy['Residential Status'].str.lower()

In [47]:
tidy['Health and Social Care Trust'] = tidy['Health and Social Care Trust'].map(
    lambda x: {
        'total' : 'all' 
        }.get(x, x))

In [48]:
tidy['Residential Status'] = tidy['Residential Status'].map(
    lambda x: {
        'total' : 'all' 
        }.get(x, x))

In [49]:
tidy.head()

Unnamed: 0,Period,Sex,Age,Service Type,Residential Status,Treatment Type,Health and Social Care Trust,Measure Type,Unit,Value
0,day/2017-03-01,T,under-18,all,all,alcohol,all,Count,People,95
4,day/2017-03-01,T,18-plus,all,all,alcohol,all,Count,People,2482
8,day/2017-03-01,T,All Ages,all,all,alcohol,all,Count,People,2577
12,day/2017-03-01,M,under-18,all,all,alcohol,all,Count,People,31
16,day/2017-03-01,M,18-plus,all,all,alcohol,all,Count,People,1536


In [50]:
from pathlib import Path
out = Path('out')
out.mkdir(exist_ok=True)
tidy.to_csv(out / 'observations.csv', index = False)

In [51]:
tidy[tidy['Measure Type'] != 'Percentage'].to_csv(out / 'observations-no-percentages.csv', index=False)

The spreadsheet has a metadata tab containing the abstract and contact details. **Todo: extract these and work out which license actually applies.**

In [52]:
md_tab = tabs['Metadata']
md_heading = md_tab.filter('Metadata')
md_heading.assert_one()
properties = md_heading.fill(DOWN).is_not_whitespace()
values = properties.fill(RIGHT).is_not_whitespace()
headings = properties - values.fill(LEFT)
properties = properties - headings
cs = ConversionSegment(values, [
    HDim(properties, "Property", DIRECTLY, LEFT),
    HDim(headings, "Heading", CLOSEST, UP)
], includecellxy=True)
savepreviewhtml(cs)

0,1,2
OBS,Property,Heading

0,1
Metadata,
,
Identification,
Dataset Title,Census of Drug and Alcohol Treatment Services in Northern Ireland: 1 March 2017 (Tables)
Abstract,This bulletin summarises the number of people in treatment for problem drug and/or alcohol misuse and relates to a snapshot of those in treatment on 1 March 2017.
Year of Data,"1 March 2017 (with 2007, 2010, 2012 and 2014 comparisons)"
Reporting period,Snapshot
,
Classification,
National Statistics Theme,Health and Social Care


In [53]:
from gssutils.metadata import THEME

for i, row in cs.topandas().iterrows():
    v = md_tab.get_at(row.__x, row.__y).value.strip()
    if row.Property == 'Abstract':
        scraper.dataset.description = v
    elif row.Property == 'National Statistics Theme':
        scraper.dataset.theme = THEME[pathify(v)]
    elif row.Property == 'Keyword':
        # Build a list rather than a lazy map object, which can only be iterated once.
        scraper.dataset.keyword = [k.strip() for k in v.split(' ')]
    elif row.Property == 'Email Address':
        scraper.dataset.contactEmail = f'mailto:{v}'




In [55]:
scraper.dataset.family = 'health'
scraper.dataset.license = 'http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/'

with open(out / 'dataset.trig', 'wb') as metadata:
    metadata.write(scraper.generate_trig())