Table 2.2.1: Ethnicity of all young people in treatment 2016-17

In [1]:
from gssutils import *

if is_interactive():
    import requests
    from cachecontrol import CacheControl
    from cachecontrol.caches.file_cache import FileCache
    from cachecontrol.heuristics import LastModified
    from pathlib import Path

    session = CacheControl(requests.Session(),
                           cache=FileCache('.cache'),
                           heuristic=LastModified())

    sourceFolder = Path('in')
    sourceFolder.mkdir(exist_ok=True)

    inputURL = 'https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/664944/'\
                    'Young-people-statistics-data-tables-from-the-national-drug-treatment-monitoring-system-2016-2017.xls'
    inputFile = sourceFolder / 'Young-people-statistics-data-tables-from-the-national-drug-treatment-monitoring-system-2016-2017.xls'
    response = session.get(inputURL)
    with open(inputFile, 'wb') as f:
        f.write(response.content)

In [2]:
tab = loadxlstabs(inputFile, sheetids='2.2.1 Ethnicity')[0]

Loading in\Young-people-statistics-data-tables-from-the-national-drug-treatment-monitoring-system-2016-2017.xls which has size 281600 bytes
Table names: ['2.2.1 Ethnicity']


In [3]:
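# Numeric cells from B5 downwards and to the right, skipping blanks.
# The selection starts at row 5, so nothing in row 4 (labelled 'White British' in column A) is picked up.
observations = tab.excel_ref('B5').expand(DOWN).expand(RIGHT).is_not_blank()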
observations

{<C10 0.01>, <B19 98.0>, <B21 16351.0>, <B15 170.0>, <C8 0.02>, <B14 180.0>, <B7 460.0>, <C5 0.04>, <B13 181.0>, <C15 0.01>, <B12 220.0>, <B10 243.0>, <B17 103.0>, <C18 0.01>, <C6 0.03>, <C17 0.01>, <C20 0.0>, <C19 0.01>, <C7 0.03>, <B6 522.0>, <B9 302.0>, <B16 152.0>, <B22 85.0>, <B23 16436.0>, <B18 100.0>, <C9 0.02>, <B11 224.0>, <B20 7.0>, <C12 0.01>, <C11 0.01>, <C21 1.0>, <C14 0.01>, <B8 321.0>, <C13 0.01>, <B5 610.0>, <C16 0.01>}

In [4]:
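# Row labels in column A (ethnic groups, despite the variable name).
# A23 is excluded, so the B23 observation has no label (NaN) until it is filled in later.
age = tab.excel_ref('A4').expand(DOWN).is_not_blank() - tab.excel_ref('A23')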
age

{<A19 'Indian'>, <A5 'Other White'>, <A16 'White & Asian'>, <A10 'Not Stated'>, <A13 'Bangladeshi'>, <A15 'Other Asian'>, <A17 'White Irish'>, <A8 'Other Mixed'>, <A18 'White & Black African'>, <A6 'White & Black Caribbean'>, <A12 'Other'>, <A22 'Missing or inconsistent data'>, <A20 'Chinese'>, <A7 'Caribbean'>, <A14 'Pakistani'>, <A9 'African'>, <A4 'White British'>, <A11 'Other Black'>, <A21 'Total'>}
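
As a quick sanity check on the two selections above, the counts in column B can be compared with the totals in B21 and B23. The sketch below copies the values by hand from the printed output (it does not read the spreadsheet) and assumes the 'Total' row is meant to be the sum of the individual groups. Under that assumption, the groups captured in rows 5 to 20 fall well short of the total, which suggests that a row outside the selection (row 4 is the obvious candidate) holds the remainder.

```python
# Counts copied by hand from the B5:B20 cells shown in the observations output.
group_counts = [610, 522, 460, 321, 302, 243, 224, 220,
                181, 180, 170, 152, 103, 100, 98, 7]
total = 16351          # B21, labelled 'Total'
missing = 85           # B22, labelled 'Missing or inconsistent data'
grand_total = 16436    # B23, no label selected

captured = sum(group_counts)
shortfall = total - captured   # count not covered by rows 5 to 20

print(captured, shortfall)
print(total + missing == grand_total)
```

The last line confirms that B23 is the 'Total' plus the 'Missing or inconsistent data' row.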

In [5]:
# Column headers in row 3: 'n' (count) and '%' (percentage).
measuretype = tab.excel_ref('B3').expand(RIGHT).is_not_blank()
measuretype

{<C3 '%'>, <B3 'n'>}

In [6]:
Dimensions = [
    HDim(age, 'Basis of treatment', DIRECTLY, LEFT),
    HDimConst('Clients in treatment', 'All young clients'),
    HDim(measuretype, 'Measure Type', DIRECTLY, ABOVE),
    HDimConst('Unit', 'People'),
]

In [7]:
c1 = ConversionSegment(observations, Dimensions, processTIMEUNIT=True)
# if is_interactive():
#     savepreviewhtml(c1)
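
The DIRECTLY lookups declared in Dimensions can be pictured as follows: each observation takes the label found in its own row of column A (LEFT) and the header found in its own column of row 3 (ABOVE). The sketch below is a plain-Python illustration of that idea on a few cells copied from the outputs above. The `cells` dictionary and the `tidy_rows` helper are hypothetical and are not part of databaker or gssutils, whose internals may differ.

```python
# Illustrative only: a plain-Python picture of a "directly LEFT / directly ABOVE"
# lookup. This is not how databaker is implemented.
cells = {
    ('A', 5): 'Other White', ('A', 6): 'White & Black Caribbean',
    ('B', 3): 'n', ('C', 3): '%',
    ('B', 5): 610.0, ('C', 5): 0.04,
    ('B', 6): 522.0, ('C', 6): 0.03,
}

def tidy_rows(cells, label_col='A', header_row=3):
    """Pair each observation with the label in its row and the header in its column."""
    rows = []
    # Walk the sheet row by row, then column by column.
    for (col, row), value in sorted(cells.items(), key=lambda kv: (kv[0][1], kv[0][0])):
        if col == label_col or row == header_row:
            continue  # skip the label and header cells themselves
        rows.append({
            'OBS': value,
            'Basis of treatment': cells.get((label_col, row)),  # same row, column A
            'Measure Type': cells.get((col, header_row)),       # same column, row 3
        })
    return rows

rows = tidy_rows(cells)
for r in rows:
    print(r)
```

An observation whose row has no selected label gets None here, which mirrors the missing 'Basis of treatment' value for the B23 observation.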

In [8]:
new_table = c1.topandas()
new_table




Unnamed: 0,OBS,Basis of treatment,Clients in treatment,Measure Type,Unit
0,610.0,Other White,All young clients,n,People
1,0.04,Other White,All young clients,%,People
2,522.0,White & Black Caribbean,All young clients,n,People
3,0.03,White & Black Caribbean,All young clients,%,People
4,460.0,Caribbean,All young clients,n,People
5,0.03,Caribbean,All young clients,%,People
6,321.0,Other Mixed,All young clients,n,People
7,0.02,Other Mixed,All young clients,%,People
8,302.0,African,All young clients,n,People
9,0.02,African,All young clients,%,People


In [9]:
# Drop rows whose observation is exactly zero (e.g. the 0.0 percentage in C20).
new_table = new_table[new_table['OBS'] != 0]

In [10]:
new_table.columns = ['Value' if x=='OBS' else x for x in new_table.columns]

In [11]:
new_table['Basis of treatment'] = 'Ethnicity/' + new_table['Basis of treatment']

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.


In [12]:
new_table.head()

Unnamed: 0,Value,Basis of treatment,Clients in treatment,Measure Type,Unit
0,610.0,Ethnicity/Other White,All young clients,n,People
1,0.04,Ethnicity/Other White,All young clients,%,People
2,522.0,Ethnicity/White & Black Caribbean,All young clients,n,People
3,0.03,Ethnicity/White & Black Caribbean,All young clients,%,People
4,460.0,Ethnicity/Caribbean,All young clients,n,People


In [13]:
new_table['Basis of treatment'].unique()

array(['Ethnicity/Other White', 'Ethnicity/White & Black Caribbean',
       'Ethnicity/Caribbean', 'Ethnicity/Other Mixed',
       'Ethnicity/African', 'Ethnicity/Not Stated',
       'Ethnicity/Other Black', 'Ethnicity/Other',
       'Ethnicity/Bangladeshi', 'Ethnicity/Pakistani',
       'Ethnicity/Other Asian', 'Ethnicity/White & Asian',
       'Ethnicity/White Irish', 'Ethnicity/White & Black African',
       'Ethnicity/Indian', 'Ethnicity/Chinese', 'Ethnicity/Total',
       'Ethnicity/Missing or inconsistent data', nan], dtype=object)

In [14]:
new_table['Basis of treatment'] = new_table['Basis of treatment'].map(
    lambda x: {
        'Ethnicity/Total' : 'Ethnicity/All' 
        }.get(x, x))



In [15]:
new_table.head()

Unnamed: 0,Value,Basis of treatment,Clients in treatment,Measure Type,Unit
0,610.0,Ethnicity/Other White,All young clients,n,People
1,0.04,Ethnicity/Other White,All young clients,%,People
2,522.0,Ethnicity/White & Black Caribbean,All young clients,n,People
3,0.03,Ethnicity/White & Black Caribbean,All young clients,%,People
4,460.0,Ethnicity/Caribbean,All young clients,n,People


In [16]:
new_table['Measure Type'] = new_table['Measure Type'].map(
    lambda x: {
        'n' : 'Count', 
        '%' : 'Percentage',
        }.get(x, x))



In [17]:
new_table.tail()

Unnamed: 0,Value,Basis of treatment,Clients in treatment,Measure Type,Unit
30,7.0,Ethnicity/Chinese,All young clients,Count,People
32,16351.0,Ethnicity/All,All young clients,Count,People
33,1.0,Ethnicity/All,All young clients,Percentage,People
34,85.0,Ethnicity/Missing or inconsistent data,All young clients,Count,People
35,16436.0,,All young clients,Count,People
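
The two recoding steps above use `map` with a dictionary and a `dict.get(x, x)` fallback so that unlisted values pass through unchanged. A roughly equivalent sketch uses `Series.replace` with the same dictionaries. The two short Series below are made-up samples, not the real columns.

```python
import pandas as pd

measure = pd.Series(['n', '%', 'n'])
basis = pd.Series(['Ethnicity/Total', 'Ethnicity/Chinese'])

# Values found in the dict are swapped; everything else is left unchanged,
# which matches the dict.get(x, x) fallback used with map().
measure_recoded = measure.replace({'n': 'Count', '%': 'Percentage'})
basis_recoded = basis.replace({'Ethnicity/Total': 'Ethnicity/All'})

print(measure_recoded.tolist())
print(basis_recoded.tolist())
```

Either form works; `replace` simply removes the need for the lambda.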


In [18]:
new_table['Basis of treatment'].fillna('Ethnicity/All inclusice Missing or inconsistent data', inplace = True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._update_inplace(new_data)


In [19]:
new_table.dtypes

Value                   float64
Basis of treatment       object
Clients in treatment     object
Measure Type             object
Unit                     object
dtype: object

In [20]:
new_table['Value'] = new_table['Value'].astype(str)



In [21]:
new_table.head(3)

Unnamed: 0,Value,Basis of treatment,Clients in treatment,Measure Type,Unit
0,610.0,Ethnicity/Other White,All young clients,Count,People
1,0.04,Ethnicity/Other White,All young clients,Percentage,People
2,522.0,Ethnicity/White & Black Caribbean,All young clients,Count,People
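
Casting the float column to str keeps the trailing '.0' on counts, as the output above shows. If whole-number values should be written without it, one possible approach is to format each value conditionally. This is an assumption about the desired output rather than something the notebook does, and `format_value` is a hypothetical helper.

```python
import pandas as pd

values = pd.Series([610.0, 0.04, 16351.0])

def format_value(v):
    """Render whole numbers without a decimal part; leave other floats as-is."""
    v = float(v)
    return str(int(v)) if v.is_integer() else str(v)

as_text = values.map(format_value)
print(as_text.tolist())
```

Note that a percentage of exactly 1.0 would also be written as '1' under this rule, so it may be preferable to apply it to the Count rows only.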


In [22]:
new_table['Period'] = '2016-17'
new_table['Substance'] = 'All'
new_table = new_table[['Period','Basis of treatment','Substance','Clients in treatment','Measure Type','Value','Unit']]



In [23]:
if is_interactive():
    destinationFolder = Path('out')
    destinationFolder.mkdir(exist_ok=True, parents=True)
    new_table.to_csv(destinationFolder / 'table2.2.1.csv', index=False)

In [24]:
new_table.tail()

Unnamed: 0,Period,Basis of treatment,Substance,Clients in treatment,Measure Type,Value,Unit
30,2016-17,Ethnicity/Chinese,All,All young clients,Count,7.0,People
32,2016-17,Ethnicity/All,All,All young clients,Count,16351.0,People
33,2016-17,Ethnicity/All,All,All young clients,Percentage,1.0,People
34,2016-17,Ethnicity/Missing or inconsistent data,All,All young clients,Count,85.0,People
35,2016-17,Ethnicity/All inclusice Missing or inconsisten...,All,All young clients,Count,16436.0,People
