In [1]:
import pandas as pd
import os
from collections import defaultdict

# Intro

This notebook shows the methodology for building a 'work from home' sampler and a 'key worker' sampler for the population of London. Each sampler uses a distribution conditioned on a person's occupation, gender and work status (full-time or part-time).

## Methodology

Firstly we manually define three mappings based on SOC2010 occupation categories (level 2):

- occ_mapping: mapping from SOC2010 cats to the lopops occupation cats
- key_worker_mapping: mapping from SOC2010 cats to key worker cat {0: not key worker, 1: key worker}
- home_worker_mapping: mapping from SOC2010 cats to home worker cat {0: not able to WFH, 1: able to WFH}

We then use these mappings to build frequency-based distributions that depend on person occupation, work status and gender.
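As a rough illustration of what 'frequency-based' means here, the sketch below computes the share of workers able to work from home within a single occupation, work status and gender group. The counts, the column names (`status`, `count`) and the helper `wfh_share` are made up for this example; they are not NOMIS figures and not the notebook's own code.

```python
import pandas as pd

# Hypothetical toy counts for one group (illustrative only, not NOMIS data).
toy = pd.DataFrame({
    "occ": ["occ2", "occ2", "occ2"],
    "gender": ["female", "female", "female"],
    "status": ["ft", "ft", "ft"],
    "wfh": [1, 1, 0],          # 1: able to WFH, 0: not able to WFH
    "count": [300, 100, 400],  # number of workers in each SOC2010 category
})


def wfh_share(df):
    """Share of workers with wfh == 1 in each (occ, status, gender) group."""
    keys = ["occ", "status", "gender"]
    totals = df.groupby(keys)["count"].sum()
    able = df[df["wfh"] == 1].groupby(keys)["count"].sum()
    # Groups with no wfh == 1 rows are missing from `able`, so fill them with 0.
    return (able / totals).fillna(0.0)


shares = wfh_share(toy)
print(shares.loc[("occ2", "ft", "female")])  # (300 + 100) / 800 = 0.5
```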

## Occupation Descriptions

We use the following descriptions to make the occ_mapping:

### occ1 Modern professional occupations

eg: teacher – nurse – physiotherapist – social worker – welfare officer – artist – musician – police officer (sergeant or above) – software designer

### occ2 Clerical and intermediate occupations

eg: secretary – personal assistant – clerical worker – office clerk – call centre agent – nursing auxiliary – nursery nurse

### occ3 Senior managers or administrators

(usually responsible for planning, organising and co-ordinating work, and for finance)

eg: finance manager – chief executive

### occ4 Technical and craft occupations

eg: motor mechanic – fitter – inspector – plumber – printer – tool maker – electrician – gardener – train driver

### occ5 Semi-routine manual and service occupations

eg: postal worker – machine operative – security guard – caretaker – farm worker – catering assistant – receptionist – sales assistant

### occ6 Routine manual and service occupations

eg: HGV driver – van driver – cleaner – porter – packer – sewing machinist – messenger – labourer – waiter/waitress – bar staff

### occ7 Middle or junior managers

eg: office manager – retail manager – bank manager – restaurant manager – warehouse manager – publican

### occ8 Traditional professional occupations

eg: accountant – solicitor – medical practitioner – scientist – civil/mechanical engineer

## Mappings

In [2]:
# {SOC2010 category: [occ, key_worker, home_worker]}

full_mapping = {
    '111 Chief Executives and Senior Officials': ['occ3', 0, 1.0],
    '112 Production Managers and Directors': ['occ3', 0, 1.0],
    '113 Functional Managers and Directors': ['occ3', 0, 1.0],
    '115 Financial Institution Managers and Directors': ['occ3', 0, 1.0],
    '116 Managers and Directors in Transport and Logistics': ['occ3', 0, 1.0],
    '117 Senior Officers in Protective Services': ['occ3', 0, 1.0],
    '118 Health and Social Services Managers and Directors': ['occ3', 0, 1.0],
    '119 Managers and Directors in Retail and Wholesale': ['occ3', 0, 1.0],
    '121 Managers and Proprietors in Agriculture Related Services': ['occ3', 0, 1.0],
    '122 Managers and Proprietors in Hospitality and Leisure Services': ['occ3', 0, 1.0],
    '124 Managers and Proprietors in Health and Care Services': ['occ3', 0, 1.0],
    '125 Managers and Proprietors in Other Services': ['occ3', 0, 1.0],
    '211 Natural and Social Science Professionals': ['occ8', 0, 1.0],
    '212 Engineering Professionals': ['occ8', 0, 1.0],
    '213 Information Technology and Telecommunications Professionals': ['occ1', 0, 1.0],
    '214 Conservation and Environment Professionals': ['occ1', 0, 1.0],
    '215 Research and Development Managers': ['occ1', 0, 1.0],
    '221 Health Professionals': ['occ8', 1, 0.0],
    '222 Therapy Professionals': ['occ8', 1, 0.0],
    '223 Nursing and Midwifery Professionals': ['occ8', 1, 0.0],
    '231 Teaching and Educational Professionals': ['occ8', 1, 0.0],
    '241 Legal Professionals': ['occ8', 0, 1.0],
    '242 Business, Research and Administrative Professionals': ['occ1', 0, 1.0],
    '243 Architects, Town Planners and Surveyors': ['occ8', 0, 1.0],
    '244 Welfare Professionals': ['occ1', 0, 1.0],
    '245 Librarians and Related Professionals': ['occ8', 0, 1.0],
    '246 Quality and Regulatory Professionals': ['occ1', 0, 1.0],
    '247 Media Professionals': ['occ1', 0, 1.0],
    '311 Science, Engineering and Production Technicians': ['occ1', 0, 1.0],
    '312 Draughtspersons and Related Architectural Technicians': ['occ8', 0, 1.0],
    '313 Information Technology Technicians': ['occ1', 0, 1.0],
    '321 Health Associate Professionals': ['occ1', 0, 1.0],
    '323 Welfare and Housing Associate Professionals': ['occ1', 0, 1.0],
    '331 Protective Service Occupations': ['occ1', 0, 1.0],
    '341 Artistic, Literary and Media Occupations': ['occ1', 0, 1.0],
    '342 Design Occupations': ['occ1', 0, 1.0],
    '344 Sports and Fitness Occupations': ['occ1', 0, 1.0],
    '351 Transport Associate Professionals': ['occ1', 0, 1.0],
    '352 Legal Associate Professionals': ['occ8', 0, 1.0],
    '353 Business, Finance and Related Associate Professionals': ['occ1', 0, 1.0],
    '354 Sales, Marketing and Related Associate Professionals': ['occ1', 0, 1.0],
    '355 Conservation and Environmental associate professionals': ['occ1', 0, 1.0],
    '356 Public Services and Other Associate Professionals': ['occ1', 0, 1.0],
    '411 Administrative Occupations: Government and Related Organisations': ['occ2', 0, 1.0],
    '412 Administrative Occupations: Finance': ['occ2', 0, 1.0],
    '413 Administrative Occupations: Records': ['occ2', 0, 1.0],
    '415 Other Administrative Occupations': ['occ2', 0, 1.0],
    '416 Administrative Occupations: Office Managers and Supervisors': ['occ7', 0, 1.0],
    '421 Secretarial and Related Occupations': ['occ2', 0, 1.0],
    '511 Agricultural and Related Trades': ['occ5', 1, 0.0],
    '521 Metal Forming, Welding and Related Trades': ['occ4', 0, 0.0],
    '522 Metal Machining, Fitting and Instrument Making Trades': ['occ4', 0, 0.0],
    '523 Vehicle Trades': ['occ4', 0, 0.0],
    '524 Electrical and Electronic Trades': ['occ4', 0, 0.0],
    '525 Skilled Metal, Electrical and Electronic Trades Supervisors': ['occ4', 0, 0.0],
    '531 Construction and Building Trades': ['occ4', 0, 0.0],
    '532 Building Finishing Trades': ['occ4', 0, 0.0],
    '533 Construction and Building Trades Supervisors': ['occ4', 0, 0.0],
    '541 Textiles and Garments Trades': ['occ6', 0, 0.0],
    '542 Printing Trades': ['occ6', 0, 0.0],
    '543 Food Preparation and Hospitality Trades': ['occ5', 0, 0.0],
    '544 Other Skilled Trades': ['occ4', 0, 0.0],
    '612 Childcare and Related Personal Services': ['occ2', 0, 0.0],
    '613 Animal Care and Control Services': ['occ2', 0, 0.0],
    '614 Caring Personal Services': ['occ2', 0, 0.0],
    '621 Leisure and Travel Services': ['occ2', 0, 0.0],
    '622 Hairdressers and Related Services': ['occ2', 0, 0.0],
    '623 Housekeeping and Related Services': ['occ2', 0, 0.0],
    '624 Cleaning and Housekeeping Managers and Supervisors': ['occ7', 0, 0.0],
    '711 Sales Assistants and Retail Cashiers': ['occ6', 0, 0.0],
    '712 Sales Related Occupations': ['occ6', 0, 0.0],
    '713 Sales Supervisors': ['occ7', 0, 0.0],
    '721 Customer Service Occupations': ['occ6', 0, 0.0],
    '722 Customer Service Managers and Supervisors': ['occ7', 0, 0.0],
    '811 Process Operatives': ['occ5', 0, 0.0],
    '812 Plant and Machine Operatives': ['occ5', 0, 0.0],
    '813 Assemblers and Routine Operatives': ['occ5', 0, 0.0],
    '814 Construction Operatives': ['occ5', 0, 0.0],
    '821 Road Transport Drivers': ['occ5', 1, 0.0],
    '822 Mobile Machine Drivers and Operatives': ['occ5', 0, 0.0],
    '823 Other Drivers and Transport Operatives': ['occ5', 1, 0.0],
    '911 Elementary Agricultural Occupations': ['occ6', 1, 0.0],
    '912 Elementary Construction Occupations': ['occ6', 0, 0.0],
    '913 Elementary Process Plant Occupations': ['occ6', 0, 0.0],
    '921 Elementary Administration Occupations': ['occ6', 0, 0.0],
    '923 Elementary Cleaning Occupations': ['occ6', 0, 0.0],
    '924 Elementary Security Occupations': ['occ6', 1, 0.0],
    '925 Elementary Sales Occupations': ['occ6', 0, 0.0],
    '926 Elementary Storage Occupations': ['occ6', 0, 0.0],
    '927 Other Elementary Services Occupations': ['occ6', 0, 0.0]
}

In [3]:
occ_mapping = {k: v[0] for k, v in full_mapping.items()}
key_worker_mapping = {k: v[1] for k, v in full_mapping.items()}
home_worker_mapping = {k: v[2] for k, v in full_mapping.items()}
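Because the mapping is written by hand, a quick consistency check can catch typos before the distributions are built. The sketch below is one possible check, run on a two-entry excerpt of the mapping; the names `VALID_OCCS`, `validate_mapping` and `excerpt` are hypothetical and not part of the notebook.

```python
# Allowed occupation labels: occ1 to occ8, as described in the notebook.
VALID_OCCS = {f"occ{i}" for i in range(1, 9)}


def validate_mapping(mapping):
    """Return a list of (SOC2010 category, problem) pairs; empty if all is well."""
    problems = []
    for soc, (occ, key_worker, home_worker) in mapping.items():
        if occ not in VALID_OCCS:
            problems.append((soc, "unknown occupation label"))
        if key_worker not in (0, 1):
            problems.append((soc, "key worker flag is not 0 or 1"))
        if home_worker not in (0, 1):  # 0.0 and 1.0 also pass this test
            problems.append((soc, "home worker flag is not 0 or 1"))
    return problems


# Two entries copied from the mapping, used here only as an excerpt.
excerpt = {
    "111 Chief Executives and Senior Officials": ["occ3", 0, 1.0],
    "821 Road Transport Drivers": ["occ5", 1, 0.0],
}
print(validate_mapping(excerpt))  # an empty list means no problems were found
```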

## Distribution
We retrieve employment data by SOC2010 code from the NOMIS official labour market statistics at https://www.nomisweb.co.uk/datasets/aps210/reports/employment-by-sex-by-ftpt-by-emp-self. We extract the data for London only and use the provided breakdown by gender and work status (full-time/part-time) to build a distribution of key workers and of workers able to work from home.

In [4]:
def clean(df):
    """
    Prepare SOC2010 table: force the count columns to numeric, then combine the
    two full-time columns into 'ft' and the two part-time columns into 'pt'.
    """
    df = df.copy()  # work on a copy so a filtered slice is not modified in place

    for col in ['Full-time', 'Full-time.1', 'Part-time', 'Part-time.1']:
        # non-numeric cells become NaN and are then counted as 0
        df[col] = pd.to_numeric(df[col], errors='coerce').fillna(0).astype(int)

    df['ft'] = df['Full-time'] + df['Full-time.1']
    df['pt'] = df['Part-time'] + df['Part-time.1']

    return df[['SOC2010', 'ft', 'pt']].copy()
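To make the effect of `errors='coerce'` concrete, here is a minimal sketch of the same cleaning steps on a toy table. The counts and the non-numeric markers (`'!'`, `'*'`) are invented for illustration; the real spreadsheet may use different markers.

```python
import pandas as pd

# Toy table with the column names used by clean(); all values are made up.
raw = pd.DataFrame({
    "SOC2010": ["111 Chief Executives and Senior Officials",
                "821 Road Transport Drivers"],
    "Full-time": [1000, "!"],    # '!' stands in for a non-numeric cell
    "Full-time.1": [200, 300],
    "Part-time": ["*", 50],      # '*' stands in for a non-numeric cell
    "Part-time.1": [10, 20],
})

count_cols = ["Full-time", "Full-time.1", "Part-time", "Part-time.1"]

# Non-numeric cells become NaN, are filled with 0, then everything is cast to int.
numeric = raw[count_cols].apply(pd.to_numeric, errors="coerce").fillna(0).astype(int)

tidy = pd.DataFrame({
    "SOC2010": raw["SOC2010"],
    "ft": numeric["Full-time"] + numeric["Full-time.1"],
    "pt": numeric["Part-time"] + numeric["Part-time.1"],
})
print(tidy)
```

Note that treating a non-numeric cell as zero is a simplification: it undercounts any category whose figures were withheld rather than genuinely zero.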

In [5]:
def load_data(occ_mapping):
    """
    Load and combine male and female data
    """
    path = '/Users/fred.shone/Downloads/emp04sep2018.xls'
    frames = []

    for sheet_name, gender in [('Women', 'female'), ('Men', 'male')]:
        df = pd.read_excel(path, sheet_name=sheet_name, skiprows=4)
        df = df.loc[df.SOC2010.isin(list(occ_mapping))]  # keep only the SOC2010 codes in the mapping
        df = clean(df)
        df['gender'] = gender
        frames.append(df)

    data = pd.concat(frames)
    data['occ'] = data['SOC2010'].map(occ_mapping)

    return data

In [6]:
def build_dist(occ_mapping, name, mapping):
    """
    Build distribution dict {occ: {work status: {gender: probability}}}, where the
    probability is the share of workers whose flag in `mapping` equals 1.
    """
    data = load_data(occ_mapping)
    data[name] = data['SOC2010'].map(mapping)  # apply the flag mapping (wfh or key worker)

    # melt work status (ft/pt) into a 'variable' column, with counts in 'value'
    melted = pd.melt(data, id_vars=['gender', 'occ', name], value_vars=['ft', 'pt'])

    # total count for each (gender, occ, work status, flag) combination
    grouped = melted.groupby(['gender', 'occ', 'variable', name]).value.sum()

    # one column per flag value; keep both 0 and 1 even if one of them never occurs
    grouped = grouped.unstack(level=name).reindex(columns=[0, 1]).fillna(0)

    totals = grouped[0] + grouped[1]
    prob = (grouped[1] / totals).fillna(0.0)  # share with flag == 1; 0.0 for empty groups (0/0)

    dist = defaultdict(lambda: defaultdict(dict))  # build dict

    for (gender, occ, work), p in prob.items():
        dist[occ][work][gender] = p

    return dist
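The nested dict returned by `build_dist` can be used to draw a yes/no outcome for an individual person. The sketch below shows one way to do that with a Bernoulli draw. The function `sample_flag` and the dict `example_dist` are hypothetical; the probabilities are rounded, illustrative values in the same shape as the output, not the exact results.

```python
import random

# Hypothetical excerpt shaped like the output of build_dist:
# {occ: {work status: {gender: probability}}}
example_dist = {
    "occ1": {"ft": {"female": 1.0, "male": 1.0}},
    "occ2": {"ft": {"female": 0.51, "male": 0.63}},
    "occ4": {"ft": {"female": 0.0, "male": 0.0}},
}


def sample_flag(dist, occ, work, gender, rng=random):
    """Return True with the probability stored for this occ, work status and gender."""
    p = dist[occ][work][gender]
    # rng.random() is uniform on [0, 1), so p == 1.0 is always True and p == 0.0 never is.
    return rng.random() < p


rng = random.Random(0)  # seeded generator so repeated runs give the same draws
draws = [sample_flag(example_dist, "occ2", "ft", "male", rng) for _ in range(1000)]
print(sum(draws) / len(draws))  # should be close to 0.63
```

Passing a seeded `random.Random` instance keeps the draws reproducible, which is useful when the sampler feeds a larger synthetic population.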

## Work From Home Sampler Distribution

In [7]:
wfh_distribution = build_dist(occ_mapping, 'wfh', home_worker_mapping)
wfh_distribution

defaultdict(<function __main__.build_dist.<locals>.<lambda>()>,
            {'occ1': defaultdict(dict,
                         {'ft': {'female': 1.0, 'male': 1.0},
                          'pt': {'female': 1.0, 'male': 1.0}}),
             'occ2': defaultdict(dict,
                         {'ft': {'female': 0.5097493472126429,
                           'male': 0.6306422558286575},
                          'pt': {'female': 0.49521826378117134,
                           'male': 0.42618583086831563}}),
             'occ3': defaultdict(dict,
                         {'ft': {'female': 1.0, 'male': 1.0},
                          'pt': {'female': 1.0, 'male': 1.0}}),
             'occ4': defaultdict(dict,
                         {'ft': {'female': 0.0, 'male': 0.0},
                          'pt': {'female': 0.0, 'male': 0.0}}),
             'occ5': defaultdict(dict,
                         {'ft': {'female': 0.0, 'male': 0.0},
                          'pt': {'female': 0.0, 'male': 0.0

## Key Worker Sampler Distribution

In [8]:
key_distribution = build_dist(occ_mapping, 'key', key_worker_mapping)
key_distribution

defaultdict(<function __main__.build_dist.<locals>.<lambda>()>,
            {'occ1': defaultdict(dict,
                         {'ft': {'female': 0.0, 'male': 0.0},
                          'pt': {'female': 0.0, 'male': 0.0}}),
             'occ2': defaultdict(dict,
                         {'ft': {'female': 0.0, 'male': 0.0},
                          'pt': {'female': 0.0, 'male': 0.0}}),
             'occ3': defaultdict(dict,
                         {'ft': {'female': 0.0, 'male': 0.0},
                          'pt': {'female': 0.0, 'male': 0.0}}),
             'occ4': defaultdict(dict,
                         {'ft': {'female': 0.0, 'male': 0.0},
                          'pt': {'female': 0.0, 'male': 0.0}}),
             'occ5': defaultdict(dict,
                         {'ft': {'female': 0.2146727132583496,
                           'male': 0.5181683994382312},
                          'pt': {'female': 0.28206362192598294,
                           'male': 0.726142049613971}}