Table 2.1.1: Age and Gender of all young people in treatment 2016-17

In [1]:
from gssutils import *

if is_interactive():
    import requests
    from cachecontrol import CacheControl
    from cachecontrol.caches.file_cache import FileCache
    from cachecontrol.heuristics import LastModified
    from pathlib import Path

    session = CacheControl(requests.Session(),
                           cache=FileCache('.cache'),
                           heuristic=LastModified())

    sourceFolder = Path('in')
    sourceFolder.mkdir(exist_ok=True)

    inputURL = 'https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/664944/'\
                    'Young-people-statistics-data-tables-from-the-national-drug-treatment-monitoring-system-2016-2017.xls'
    inputFile = sourceFolder / 'Young-people-statistics-data-tables-from-the-national-drug-treatment-monitoring-system-2016-2017.xls'
    response = session.get(inputURL)
    response.raise_for_status()  # stop here on an HTTP error instead of saving an error page
    with open(inputFile, 'wb') as f:
        f.write(response.content)

In [2]:
tab = loadxlstabs(inputFile, sheetids='2.1.1 Age and Gender')[0]

Loading in\Young-people-statistics-data-tables-from-the-national-drug-treatment-monitoring-system-2016-2017.xls which has size 281600 bytes
Table names: ['2.1.1 Age and Gender']


In [3]:
observations = tab.excel_ref('B5').expand(DOWN).expand(RIGHT).is_not_blank()
observations

{<D10 2794.0>, <C9 0.27>, <D6 179.0>, <F12 16436.0>, <C8 0.19>, <E11 0.26>, <C7 0.07>, <B6 105.0>, <B10 1315.0>, <C12 1.0>, <E10 0.26>, <B9 1555.0>, <B7 396.0>, <E9 0.26>, <G5 0.0>, <C10 0.23>, <F6 284.0>, <B8 1061.0>, <B12 5669.0>, <F11 4029.0>, <D9 2791.0>, <B5 12.0>, <G7 0.06>, <D8 1549.0>, <D7 594.0>, <E6 0.02>, <E12 1.0>, <C6 0.02>, <G10 0.25>, <D12 10767.0>, <D5 56.0>, <G9 0.26>, <F5 68.0>, <F9 4346.0>, <E8 0.14>, <C11 0.22>, <F10 4109.0>, <E5 0.01>, <F8 2610.0>, <G11 0.25>, <E7 0.06>, <D11 2804.0>, <G6 0.02>, <G12 1.0>, <G8 0.16>, <B11 1225.0>, <C5 0.0>, <F7 990.0>}

In [4]:
age = tab.excel_ref('A5').expand(DOWN).is_not_blank()
age

{<A10 '16-17'>, <A7 '13-14'>, <A12 'Total clients'>, <A6 '12-13'>, <A11 '17-18'>, <A9 '15-16'>, <A8 '14-15'>, <A5 'Under 12'>}

In [5]:
sex = tab.excel_ref('B3').expand(RIGHT).is_not_blank() 
sex

{<D3 'Male'>, <B3 'Female'>, <F3 'Persons'>}

In [6]:
measuretype = tab.excel_ref('B4').expand(RIGHT).is_not_blank() 
measuretype

{<F4 'n'>, <C4 '%'>, <D4 'n'>, <G4 '%'>, <B4 'n'>, <E4 '%'>}

In [7]:
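Dimensions = [
    # Age band: the label in column A on the same row as the observation
    HDim(age, 'Basis of treatment', DIRECTLY, LEFT),
    # Sex: each row-3 label heads an n/% column pair, so take the nearest label at or to the left
    HDim(sex, 'Clients in treatment', CLOSEST, LEFT),
    # Measure type ('n' or '%'): the row-4 label directly above the observation
    HDim(measuretype, 'Measure Type', DIRECTLY, ABOVE),
    HDimConst('Unit', 'People')
]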
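
To make the lookup rules in the cell above concrete, here is a minimal plain-Python sketch of how a DIRECTLY lookup and a CLOSEST lookup could resolve the headers for one observation column. It does not call databaker: the helpers `directly_above`, `closest_left` and `describe` are hypothetical names used only for illustration, and the header positions are copied from the cell outputs above.

```python
# Header cells copied from the notebook outputs, keyed by column letter.
sex_row3 = {'B': 'Female', 'D': 'Male', 'F': 'Persons'}
measure_row4 = {'B': 'n', 'C': '%', 'D': 'n', 'E': '%', 'F': 'n', 'G': '%'}


def directly_above(headers, column):
    """Return the header in exactly the same column (hypothetical helper)."""
    return headers[column]


def closest_left(headers, column):
    """Return the nearest header at or to the left of `column` (hypothetical helper).

    Assumes single-letter column names, so plain string comparison orders them.
    """
    candidates = [col for col in headers if col <= column]
    if not candidates:
        raise KeyError(f'no header at or left of column {column}')
    return headers[max(candidates)]


def describe(column):
    """Resolve the sex and measure-type labels for an observation column."""
    return closest_left(sex_row3, column), directly_above(measure_row4, column)


for column in 'BCDEFG':
    print(column, describe(column))
```

Column C has no sex label of its own in row 3, so the closest-left rule falls back to the label in column B, which is the pairing shown in the preview of `new_table` further down.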

In [8]:
c1 = ConversionSegment(observations, Dimensions, processTIMEUNIT=True)
# if is_interactive():
#     savepreviewhtml(c1)

In [9]:
new_table = c1.topandas()
new_table




Unnamed: 0,OBS,Basis of treatment,Clients in treatment,Measure Type,Unit
0,12.0,Under 12,Female,n,People
1,0.0,Under 12,Female,%,People
2,56.0,Under 12,Male,n,People
3,0.01,Under 12,Male,%,People
4,68.0,Under 12,Persons,n,People
5,0.0,Under 12,Persons,%,People
6,105.0,12-13,Female,n,People
7,0.02,12-13,Female,%,People
8,179.0,12-13,Male,n,People
9,0.02,12-13,Male,%,People


In [10]:
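# Drop observations whose value is exactly zero (this also removes the 0.0 percentage cells, e.g. C5 and G5)
new_table = new_table[new_table['OBS'] != 0]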

In [11]:
new_table.columns = ['Value' if x=='OBS' else x for x in new_table.columns]

In [12]:
new_table['Basis of treatment'] = 'Ag/' + new_table['Basis of treatment']

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.


In [13]:
new_table.head()

Unnamed: 0,Value,Basis of treatment,Clients in treatment,Measure Type,Unit
0,12.0,Ag/Under 12,Female,n,People
2,56.0,Ag/Under 12,Male,n,People
3,0.01,Ag/Under 12,Male,%,People
4,68.0,Ag/Under 12,Persons,n,People
6,105.0,Ag/12-13,Female,n,People
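
The warning printed after cell 12 is pandas' SettingWithCopyWarning. It shows up because `new_table` was produced by a boolean filter in cell 10, so pandas flags it as a possible copy of the original frame when a column is assigned afterwards. The assignment still takes effect, as the preview above shows. A minimal sketch of one common way to avoid the warning, assuming an explicit `.copy()` after the filter is acceptable here; `raw` and `filtered` are stand-in names and the frame holds only the first three rows of the earlier preview.

```python
import pandas as pd

# Stand-in for the table from c1.topandas(): first three rows of the preview.
raw = pd.DataFrame({
    'OBS': [12.0, 0.0, 56.0],
    'Basis of treatment': ['Under 12', 'Under 12', 'Under 12'],
})

# Filtering returns a new object; .copy() makes it an independent frame,
# so later column assignments are unambiguous.
filtered = raw[raw['OBS'] != 0].copy()
filtered['Basis of treatment'] = 'Ag/' + filtered['Basis of treatment']

print(filtered)
```

The original row labels are kept after filtering, which is why the index in the notebook previews skips from 0 to 2.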


In [14]:
new_table['Basis of treatment'].unique()

array(['Ag/Under 12', 'Ag/12-13', 'Ag/13-14', 'Ag/14-15', 'Ag/15-16',
       'Ag/16-17', 'Ag/17-18', 'Ag/Total clients'], dtype=object)

In [15]:
new_table['Basis of treatment'] = new_table['Basis of treatment'].map(
    lambda x: {
        'Ag/Total clients' : 'Ag/All years' 
        }.get(x, x))



In [16]:
new_table['Clients in treatment'] = new_table['Clients in treatment'].map(
    lambda x: {
        'Female' : 'F', 
        'Male' : 'M',
        'Persons' : 'T'
        }.get(x, x))



In [17]:
new_table.head()

Unnamed: 0,Value,Basis of treatment,Clients in treatment,Measure Type,Unit
0,12.0,Ag/Under 12,F,n,People
2,56.0,Ag/Under 12,M,n,People
3,0.01,Ag/Under 12,M,%,People
4,68.0,Ag/Under 12,T,n,People
6,105.0,Ag/12-13,F,n,People
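
The recoding cells use `.map` with a dictionary `.get(x, x)` fallback so that values missing from the mapping pass through unchanged. The sketch below illustrates that behaviour on stand-in data and compares it with `Series.replace`, which gives the same result for exact whole-value matches. The `'Unknown'` entry is a hypothetical value added only to show the fallback; it does not occur in the source table.

```python
import pandas as pd

sex_codes = {'Female': 'F', 'Male': 'M', 'Persons': 'T'}

# Stand-in column; 'Unknown' is a hypothetical value that is not in the mapping.
clients = pd.Series(['Female', 'Male', 'Persons', 'Unknown'])

# Pattern used in the notebook: look up each value, fall back to the value itself.
via_map = clients.map(lambda x: sex_codes.get(x, x))

# Series.replace with a dict swaps exact matches and leaves other values alone.
via_replace = clients.replace(sex_codes)

# A plain .map(dict) would instead turn unmapped values into NaN.
via_plain_map = clients.map(sex_codes)

print(via_map.tolist())
print(via_plain_map.isna().tolist())
```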


In [18]:
new_table['Measure Type'] = new_table['Measure Type'].map(
    lambda x: {
        'n' : 'Count', 
        '%' : 'Percentage',
        }.get(x, x))



In [19]:
new_table.tail()

Unnamed: 0,Value,Basis of treatment,Clients in treatment,Measure Type,Unit
43,1.0,Ag/All years,F,Percentage,People
44,10767.0,Ag/All years,M,Count,People
45,1.0,Ag/All years,M,Percentage,People
46,16436.0,Ag/All years,T,Count,People
47,1.0,Ag/All years,T,Percentage,People


In [20]:
new_table.dtypes

Value                   float64
Basis of treatment       object
Clients in treatment     object
Measure Type             object
Unit                     object
dtype: object

In [21]:
new_table['Value'] = new_table['Value'].astype(str)



In [22]:
new_table.head(3)

Unnamed: 0,Value,Basis of treatment,Clients in treatment,Measure Type,Unit
0,12.0,Ag/Under 12,F,Count,People
2,56.0,Ag/Under 12,M,Count,People
3,0.01,Ag/Under 12,M,Percentage,People
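
Casting the float column with `astype(str)` keeps the float formatting, so counts are stored as the text `12.0` rather than `12`. A small sketch on stand-in values taken from the preview above; the variable names are illustrative only.

```python
import pandas as pd

# First three values from the preview above.
values = pd.Series([12.0, 56.0, 0.01])

# astype(str) formats each float as Python's str() would.
as_text = values.astype(str)

print(as_text.tolist())
```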


In [23]:
new_table['Period'] = '2016-17'
new_table['Substance'] = 'All'
new_table = new_table[['Period','Basis of treatment','Substance','Clients in treatment','Measure Type','Value','Unit']]



In [24]:
if is_interactive():
    destinationFolder = Path('out')
    destinationFolder.mkdir(exist_ok=True, parents=True)
    new_table.to_csv(destinationFolder / 'table2.1.1.csv', index=False)

In [25]:
new_table.head()

Unnamed: 0,Period,Basis of treatment,Substance,Clients in treatment,Measure Type,Value,Unit
0,2016-17,Ag/Under 12,All,F,Count,12.0,People
2,2016-17,Ag/Under 12,All,M,Count,56.0,People
3,2016-17,Ag/Under 12,All,M,Percentage,0.01,People
4,2016-17,Ag/Under 12,All,T,Count,68.0,People
6,2016-17,Ag/12-13,All,F,Count,105.0,People
