# ILCH Databaker Recipes

This recipe creates four datasets: one growth and one level dataset from the ILCH SA (seasonally adjusted) spreadsheet, and the same pair from the NSA (not seasonally adjusted) spreadsheet.

This notebook contains the following sections:

1.) Introduction, selecting filenames & shared functions

2.) The recipe code

3.) Loading and running the NSA recipe

4.) Loading and running the SA recipe


In [1]:
from databaker.framework import *

In [2]:

# FILENAMES - this is the ONLY bit that should need to change
sa_inputfile = "ilchtablestemplatesa.xls"
nsa_inputfile = "ilchtablestemplatensa.xls"


In [3]:
# Shared functions

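# Get the growth period from the tab title in cell A1
def get_growthPeriod(tab):
    tab_title = tab.excel_ref('A1')
    
    if tab_title.filter(contains_string("year on year")):
        gp = "Annual"
    elif tab_title.filter(contains_string("quarter on quarter")):
        gp = "Quarterly"
    elif tab_title.filter(contains_string("growth rates")):
        gp = "Annual"
    else:
        # Fail loudly rather than reaching the return with 'gp' unset
        raise ValueError("Cannot determine the growth period from cell A1 of tab: " + tab.name)
    return gp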


# Get the measure type
def get_measureType(tab):
    tab_title = tab.excel_ref('A1')
                       
    if tab_title.filter(contains_string("year on year")):
        mt = "Percent"
    elif tab_title.filter(contains_string("quarter on quarter")):
        mt = "Percent"
    elif tab_title.filter(contains_string("growth rates")):
        mt = "Percent"
    else:
        mt = "Index"
    return mt
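
The two shared functions apply the same keyword checks to the title in cell A1. As a rough illustration only, the decision table can be sketched with plain strings and no databaker objects. The helper name `classify_title` and the sample titles are hypothetical and do not come from the spreadsheets.

```python
# Hypothetical stand-in for the A1 title checks; plain substring tests only.
def classify_title(title):
    """Return (growth_period, measure_type) for a tab title string."""
    if "year on year" in title:
        return ("Annual", "Percent")
    if "quarter on quarter" in title:
        return ("Quarterly", "Percent")
    if "growth rates" in title:
        return ("Annual", "Percent")
    # No growth keyword: treat the tab as a level tab with no growth period
    return (None, "Index")


# Invented titles, for illustration only
for title in ["Index of costs, year on year change",
              "Index of costs, quarter on quarter change",
              "Industry growth rates",
              "Industry level"]:
    print(title, "->", classify_title(title))
```

The order of the checks matters: a title containing both "growth rates" and "quarter on quarter" is classified by whichever keyword is tested first.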



## The Recipe Code

There are two blocks: one defines a function for the "Growth" recipe, the other for the "Level" recipe.

In [4]:

def growth_recipe(saOrNsa):
    
    conversionsegments = []

    for tab in tabs_growth:

        # Anchor on the single cell containing "eriod" (assumed to be the period header), one to the left of the cell with "Agriculture"
        anchor = tab.filter(contains_string("eriod")).assert_one()

        # set up a waffle
        datarows = anchor.fill(DOWN).is_not_blank()
        datacols = anchor.shift(DOWN).fill(RIGHT).is_not_blank()
        obs = datarows.waffle(datacols).is_not_blank()

        # set the growth period & measuretype
        gp = get_growthPeriod(tab)
        mt = get_measureType(tab)

        dimensions = [
                HDimConst(MEASURETYPE, mt),
                HDim(datarows, TIME, DIRECTLY, LEFT),
                HDim(datacols.parent(), "Costs", DIRECTLY, ABOVE),
                HDim(anchor.fill(RIGHT).parent(), "SIC", CLOSEST, LEFT),
                HDimConst("Growth Period", gp),
                HDimConst("SA / NSA", saOrNsa)
                     ]

        # TIME has weird data markings, strip them out
        time = dimensions[1]
        assert time.name == 'TIME', "Time needs to be dimension 1"
        for val in time.hbagset:
            # Any label containing '(r)' or the letter 'p' is cut to its first 6 characters
            if '(r)' in val.value or 'p' in val.value:
                time.cellvalueoverride[val.value] = val.value[:6]

        conversionsegment = ConversionSegment(tab, dimensions, obs)
        conversionsegments.append(conversionsegment)
    
    return conversionsegments
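
The observation selection relies on `datarows.waffle(datacols)`. A minimal sketch of the idea in plain Python follows, assuming `waffle` returns the cells that lie in the same row as a `datarows` cell and the same column as a `datacols` cell. The function below is a hypothetical stand-in, not the library's implementation, and the `(x, y)` coordinates are invented.

```python
# Hypothetical model of a waffle: cells are (x, y) tuples, x = column, y = row.
def waffle(row_cells, col_cells):
    """Return the grid of cells sharing a row with row_cells and a column with col_cells."""
    return [(cx, ry) for (_, ry) in row_cells for (cx, _) in col_cells]


# Invented layout: anchor at (0, 0), cost headers on row 1, period labels in column 0
datarows = [(0, 2), (0, 3)]          # period labels, below the anchor
datacols = [(1, 1), (2, 1), (3, 1)]  # cost headers, right of the cell under the anchor
obs = waffle(datarows, datacols)
print(obs)
```

Each period label combined with each cost header yields one observation cell, which is why the recipe then only needs to drop the blank ones.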


In [9]:

def level_recipe(saOrNsa):
    
    conversionsegments = []

    for tab in tabs_level:

        # Anchor on the single cell containing "eriod" (assumed to be the period header), one to the left of the cell with "Agriculture"
        anchor = tab.filter(contains_string("eriod")).assert_one()

        # set up a waffle
        datarows = anchor.fill(DOWN).is_not_blank()
        datacols = anchor.shift(DOWN).fill(RIGHT).is_not_blank()
        obs = datarows.waffle(datacols).is_not_blank()
        
        # set the measuretype
        mt = get_measureType(tab)

        dimensions = [
                HDim(datarows, TIME, DIRECTLY, LEFT),
                HDim(datacols.parent(), "Costs", DIRECTLY, ABOVE),
                HDim(anchor.fill(RIGHT).parent(), "SIC", CLOSEST, LEFT),
                HDimConst(MEASURETYPE, mt),
                HDimConst("SA / NSA", saOrNsa)
                     ]

        # TIME has weird data markings, strip them out
        time = dimensions[0]
        assert time.name == 'TIME', "Time needs to be dimension 0"
        for val in time.hbagset:
            # Any label containing '(r)' or the letter 'p' is cut to its first 6 characters
            if '(r)' in val.value or 'p' in val.value:
                time.cellvalueoverride[val.value] = val.value[:6]

        conversionsegment = ConversionSegment(tab, dimensions, obs)
        conversionsegments.append(conversionsegment)
    
    return conversionsegments
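
The TIME clean-up in both recipes keeps the first six characters of any label that carries a marker. A stricter alternative, sketched here under the assumption that every period looks like `2000Q1`, is to extract the period with a regular expression and fail on anything else. The helper `clean_period` and the sample labels are hypothetical.

```python
import re

# Assumed period format: four digits, 'Q', then a quarter number 1-4
PERIOD = re.compile(r"\d{4}Q[1-4]")


def clean_period(label):
    """Return the leading period of a label, dropping any trailing marker."""
    match = PERIOD.match(label.strip())
    if match is None:
        raise ValueError("Unrecognised period label: %r" % label)
    return match.group(0)


# Invented labels, for illustration only
for label in ["2000Q1", "2016Q2 (r)", "2016Q3 (p)"]:
    print(repr(label), "->", clean_period(label))
```

Unlike a fixed-width slice, this raises an error on a label that does not start with a period instead of silently truncating it.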


## Loading and Running the NSA Recipes

In [10]:

tabs = loadxlstabs(nsa_inputfile)

Loading ilchtablestemplatensa.xls which has size 289792 bytes
Table names: ['INTRODUCTION', 'DEFINITIONS', '1. Industry level', '2. Sector level', '3. Industry growth rates', '4. Sector growth Rates']


In [11]:

# get the growth and level tabs
tabs_growth = [x for x in tabs if 'growth' in x.name]
tabs_level = [x for x in tabs if 'level' in x.name]

# Sanity check
assert len(tabs_growth) == 2, "We expect the NSA file to have 2 tabs with the word 'growth' in them"
assert len(tabs_level) == 2, "We expect the NSA file to have 2 tabs with the word 'level' in them"


In [12]:

# Growth, NSA
outputfile = 'Output-NSA-growth-' + nsa_inputfile[:-4] + '.csv'
writetechnicalCSV(outputfile, growth_recipe("Not seasonally adjusted"))

# Level, NSA
outputfile = 'Output-NSA-level-' + nsa_inputfile[:-4] + '.csv'
writetechnicalCSV(outputfile, level_recipe("Not seasonally adjusted"))


writing 2 conversion segments into /home/goatchurch/sensiblecode/quickcode-ons-recipes/ILCH/Output-NSA-growth-ilchtablestemplatensa.csv
conversionwrite segment size 5952 table '3. Industry growth rates; TIMEUNIT='Quarter'
conversionwrite segment size 2232 table '4. Sector growth Rates; TIMEUNIT='Quarter'
writing 2 conversion segments into /home/goatchurch/sensiblecode/quickcode-ons-recipes/ILCH/Output-NSA-level-ilchtablestemplatensa.csv
conversionwrite segment size 6336 table '1. Industry level; TIMEUNIT='Quarter'
conversionwrite segment size 2376 table '2. Sector level; TIMEUNIT='Quarter'
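
The output names above are built by slicing the last four characters off the input filename, which only works for a three-letter extension such as `.xls`. A small sketch of an extension-agnostic variant follows. The helper `output_name` is hypothetical and not part of the notebook.

```python
import os


# Hypothetical helper: build the output CSV name from the input filename
def output_name(adjustment, kind, inputfile):
    """Return e.g. 'Output-NSA-growth-<stem>.csv' for any input extension."""
    stem, _ext = os.path.splitext(inputfile)
    return "Output-{}-{}-{}.csv".format(adjustment, kind, stem)


print(output_name("NSA", "growth", "ilchtablestemplatensa.xls"))
print(output_name("NSA", "level", "ilchtablestemplatensa.xlsx"))
```

With an `.xlsx` input, `inputfile[:-4]` would leave a trailing dot in the stem, whereas `os.path.splitext` removes the whole extension.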


## Loading and Running the SA Recipes

In [13]:
tabs = loadxlstabs(sa_inputfile)

Loading ilchtablestemplatesa.xls which has size 407552 bytes
Table names: ['INTRODUCTION', 'DEFINITIONS', '1. Industry level SA', '2. Sector level SA', '3. Industry annual growth SA', '4. Sector annual growth SA', '5. Industry quarterly growth SA', '6. Sector quarterly growth SA']


In [14]:

# get the growth and level tabs
tabs_growth = [x for x in tabs if 'growth' in x.name]
tabs_level = [x for x in tabs if 'level' in x.name]

# Sanity check
assert len(tabs_growth) == 4, "We expect the SA file to have 4 tabs with the word 'growth' in them"
assert len(tabs_level) == 2, "We expect the SA file to have 2 tabs with the word 'level' in them"

In [15]:

# Growth, SA
outputfile = 'Output-SA-growth-' + sa_inputfile[:-4] + '.csv'
writetechnicalCSV(outputfile, growth_recipe("Seasonally Adjusted"))   # 'A' to match previous months

# Level, SA
outputfile = 'Output-SA-level-' + sa_inputfile[:-4] + '.csv'
writetechnicalCSV(outputfile, level_recipe("Seasonally adjusted"))

writing 4 conversion segments into /home/goatchurch/sensiblecode/quickcode-ons-recipes/ILCH/Output-SA-growth-ilchtablestemplatesa.csv
conversionwrite segment size 5952 table '3. Industry annual growth SA; TIMEUNIT='Quarter'
conversionwrite segment size 2232 table '4. Sector annual growth SA; TIMEUNIT='Quarter'
conversionwrite segment size 6240 table '5. Industry quarterly growth SA; TIMEUNIT='Quarter'
conversionwrite segment size 2340 table '6. Sector quarterly growth SA; TIMEUNIT='Quarter'
writing 2 conversion segments into /home/goatchurch/sensiblecode/quickcode-ons-recipes/ILCH/Output-SA-level-ilchtablestemplatesa.csv
conversionwrite segment size 6336 table '1. Industry level SA; TIMEUNIT='Quarter'
conversionwrite segment size 2376 table '2. Sector level SA; TIMEUNIT='Quarter'


In [16]:
topandas(level_recipe("Seasonally adjusted")[0])


Unnamed: 0,SIC,-9,Costs,-6,SA / NSA,-2
0,"ILCH_A\nAgriculture, Forestry and Fishing",92.8,Labour Costs per Hour,Index,Seasonally adjusted,2000Q1
1,"ILCH_A\nAgriculture, Forestry and Fishing",92.3,Wage Costs per Hour,Index,Seasonally adjusted,2000Q1
2,"ILCH_A\nAgriculture, Forestry and Fishing",94.1,Other Costs per Hour,Index,Seasonally adjusted,2000Q1
3,"ILCH_A\nAgriculture, Forestry and Fishing",92.7,Labour Costs per Hour Excluding Bonuses and Ar...,Index,Seasonally adjusted,2000Q1
4,ILCH_B\nMining and Quarrying,90.8,Labour Costs per Hour,Index,Seasonally adjusted,2000Q1
5,ILCH_B\nMining and Quarrying,91.3,Wage Costs per Hour,Index,Seasonally adjusted,2000Q1
6,ILCH_B\nMining and Quarrying,82.6,Other Costs per Hour,Index,Seasonally adjusted,2000Q1
7,ILCH_B\nMining and Quarrying,96.4,Labour Costs per Hour Excluding Bonuses and Ar...,Index,Seasonally adjusted,2000Q1
8,"ILCH_C1\nManufacturing - Food Products, Bevera...",94.0,Labour Costs per Hour,Index,Seasonally adjusted,2000Q1
9,"ILCH_C1\nManufacturing - Food Products, Bevera...",93.8,Wage Costs per Hour,Index,Seasonally adjusted,2000Q1
