In [1]:
%run StdPackages.ipynb
os.chdir(d['py'])
from IOfunctions import *
os.chdir(d['curr'])

No clean-up of work-folder


# Running ```readIO``` with options

We can use the class ```readIO``` to perform the operations in ```readIO_tutorial``` as follows:

### Initialize and adjust settings

*Required settings:*

In [2]:
name = 'GR18'
file_v = os.path.join(d['data'], 'IO2018_v.xlsx')
file_i = os.path.join(d['data'], 'IO2018_I.xlsx')
file_k = os.path.join(d['data'], 'IO2018_K.xlsx')
file_mappings = os.path.join(d['data'], 'GR2018_mappings.xlsx')

*For v:*

In [3]:
rowMarkers = {'P': {'ref': 'Dansk produktion','offset': {}},
              'M': {'ref': 'Import', 'offset': {}},
              'OT': {'ref': 'Andre udenlandske transaktioner', 'offset': {}},
              'PI': {'ref': 'Primære inputs', 'offset': {}},
              'TI': {'ref': 'Input/ endelig anvendelse i køberpriser', 'offset': {}},
              'PV': {'ref': 'Produktionsværdi', 'offset': {}}
             }
colMarkers = {'In': {'ref': 'Input i produktionen (Transaktionskode 2000)', 'offset': {'colE': -2}},
              'C' : {'ref': 'Privat forbrug (Transaktions-kode 3110)', 'offset': {'colE': -1}},
              'G_NPISH': {'ref': 'NPISH (Transaktionskode 3130)', 'offset': {}},
              'G_MVPC' : {'ref': 'Markedsmæssigt individuelt offentligt forbrug (Transaktionskode 3141)', 'offset': {}},
              'G_NMVPC': {'ref': 'Ikke markedsmæssigt individuelt offentligt forbrug (Transaktionskode 3142)', 'offset': {}},
              'G_CPC':   {'ref': 'Kollektivt offentligt forbrug (Transaktionskode 3200)', 'offset': {}},
              'I': {'ref': 'Faste bruttoinvesteringer', 'offset': {}},
              'Other': {'ref': 'Andre Anvendelser', 'offset':{}},
              'T': {'ref': 'Total'}
             }
category = {'taxCategories': ['Produktskatter og subsidier, netto', 'Moms', 'Andre produktionsskatter', 'Andre produktionssubsidier'],
            'wageCategory' : 'Aflønning af ansatte',
            'residualIncomeCategory': 'Overskud af produktionen og blandet indkomst',
            'itoryCategories': ['5300','5200'],
            'exportCategory': '6000'}

*For i:*

In [4]:
kwargs_i = {'rMarker': 'Investering i alt, købepriser',
            'cMarkers': ['Investerende brancher', 'Total'],
            'row': 0, 'col': 1, 'rowIndex': 3} # look for identifiers and sectors in these rows/columns

*For k (similar to v):*

In [5]:
kwargs_k = {'rMarker': {'init': {'ref': 'Investerende brancher', 'offset': {}},
                        'end':  {'ref': 'Total', 'offset':{}}},
            'cMarker': {'init': {'ref': 'Typer af durables', 'offset': {}},
                        'end':  {'ref': 'Total', 'offset':{}}}}

*Initialize (note: this takes close to a second because the Excel data has to be processed). Because of the default options in the class, the two statements below (the first is commented out) initialize the same instance:*

In [6]:
# I = readIO(name = name, file_v = file_v, file_i = file_i, file_k = file_k, kwargs_v = {'category': category, 'rowMarkers': rowMarkers, 'colMarkers': colMarkers}, kwargs_i = kwargs_i)
I = readIO(name = name, file_v = file_v, file_i = file_i, file_k = file_k) # because of default options in the class, this is an equivalent statement
I.__dict__.keys()

dict_keys(['db', 'wb', 'IO', 'locs', 'blocks', 's'])

*If the default options are appropriate for most settings, individual settings can be adjusted either at initialization or afterwards through the settings dictionary ```I.s```:*

In [7]:
I.s['v']['exportCategory'] = '5000' # change export category 
I.s['v']['exportCategory'] = category['exportCategory'] # change it back again

### Process data

In [8]:
I()

<pyDatabases.gpyDB.gpyDB.GpyDB at 0x1e94ceb71c0>

# Squaring IO value, IO investment, and durables data

The data on values may be at a different level of aggregation than the investment/durables data. In the Danish case, for instance, we can distinguish between 146 sectors in the IO values data (```IO_v```), but only 69 in the data on investments/durables. Similarly, the data on values distinguishes between 12 categories of investment goods, while the investment/durables data only has 7 types. If the three data sources are already consistent, this step can be skipped. Otherwise, we proceed as follows:
1. Define a mapping from the 69-sector to the 146-sector level. This is a so-called one-to-many mapping, as it associates one category at the 69 level with multiple categories at the 146 level. This is essentially a *disaggregation* of data. To do this, we need a distributional key that determines how each aggregate sector is allocated across its subsectors. Here, we simply use the relative sizes of the subsectors. Another approach is to use residual income as the key; however, as this variable may be negative for some subsectors, it can produce negative or otherwise implausible shares.
2. Define a mapping from the 12 investment goods to the 7. Sum over values and keep the 7-investment-good level.
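
The disaggregation in step 1 can be illustrated with plain pandas on toy data. This is only a sketch under stated assumptions: the sector labels (`A`, `A1`, ...), the numbers, and the variable names are hypothetical and are not taken from the Danish data or from the `IOfunctions` module.

```python
import pandas as pd

# Hypothetical one-to-many mapping: two aggregate sectors, three detailed sectors.
mapping = pd.MultiIndex.from_tuples(
    [("A", "A1"), ("A", "A2"), ("B", "B1")], names=["sAgg", "s"]
)

# Size of each detailed sector, used as the distributional key (made-up numbers).
size = pd.Series([30.0, 10.0, 5.0], index=mapping)

# Share of each detailed sector within its aggregate sector.
weights = size / size.groupby("sAgg").transform("sum")

# Data observed only at the aggregate level (made-up numbers).
inv_agg = pd.Series({"A": 100.0, "B": 20.0}).rename_axis("sAgg")

# Repeat each aggregate value for its detailed sectors and apply the weights.
agg_labels = weights.index.get_level_values("sAgg")
inv_detail = (weights * inv_agg.reindex(agg_labels).to_numpy()).droplevel("sAgg")

# The disaggregation should leave the total unchanged.
assert abs(inv_detail.sum() - inv_agg.sum()) < 1e-9
```

Because the weights sum to one within each aggregate sector, the total is preserved by construction. A key with negative entries (such as residual income) would break the interpretation of the weights as shares.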

#### Mapping from 69 to 146 sector level

*Read mapping from 69 to 146 sector levels:*

In [9]:
wb_mappings = read.simpleLoad(file_mappings)
m = read.maps(wb_mappings['69to146'])['s69tos146'].vals

*Force it to use strings in the mapping:*

In [10]:
m = m.set_levels([level.astype(str) for level in m.levels])
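
A small sketch of why the cast to strings can matter, assuming (hypothetically) that the sector codes come out of the spreadsheet as integers while the sets in the database use string labels. The codes below are made up for illustration.

```python
import pandas as pd

# Hypothetical mapping whose codes were parsed as integers.
m_int = pd.MultiIndex.from_tuples(
    [(1, 10), (1, 11), (2, 20)], names=["s69", "s146"]
)

# Cast every level to string so labels can match string-typed sets.
m_str = m_int.set_levels([level.astype(str) for level in m_int.levels])

# Membership tests with string labels now succeed.
assert ("1", "10") in m_str
```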

*Create weights using the size of the sectors:*

In [11]:
sectorValue = adjMultiIndex.applyMult( adj.rc_pd(I.db.get('vD').groupby('s').sum()+I.db.get('TotalTax'), I.db.get('s_p')),
                                      m.rename(['sAgg','s']))
weights = sectorValue / (sectorValue.groupby('sAgg').sum())

*Apply the weights to ```vD_inv``` and ```vD_dur```, the only two variables defined over the smaller 69-sector index (NB: only run this cell once, since it overwrites both variables in the database):*

In [12]:
dataCheck = {'vD_inv': sum(I.db.get('vD_inv')),
             'vD_dur': sum(I.db.get('vD_dur'))}
I.db['vD_inv'] = (I.db.get('vD_inv').rename_axis(index = {'s':'sAgg'}) * weights).droplevel('sAgg')
I.db['vD_dur'] = (I.db.get('vD_dur').rename_axis(index = {'s':'sAgg'}) * weights).droplevel('sAgg')
for k in dataCheck:
    assert abs(dataCheck[k]-sum(I.db.get(k)))<1e-6, f"Disaggregation from 69 to 146 sector changed the sum of {k}"

#### Mapping from 12 to 7 investment goods

*In this case, the aggregation may affect many sets and variables. Thus, we use the more general ```aggregateDB``` class from ```pyDatabases.gpyDB_wheels``` to handle it. For the Danish data, we can infer the mapping from 12 to 7 investment goods from the names:*

In [13]:
nfull = I.db.get('s_i') # original set, lots of indices
ni = I.db.get('vD_inv').index.levels[0] # new index - fewer, aggregated indices
syntax = ni[ni.str.endswith('x')].str.rstrip('x') 
subset = nfull[nfull.str.startswith(tuple(syntax))]
nfull2ni = {k: k  if not k.startswith(tuple(syntax)) else k[:-1]+'x' for k in nfull} # mapping from full set to more aggregated one
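
The name-based inference above can be traced on a toy example. The labels below are hypothetical and only mimic the pattern the cell relies on: an aggregated label ends in `x`, and the detailed labels it covers share everything but the last character.

```python
import pandas as pd

# Hypothetical detailed labels and their aggregated counterparts.
nfull_toy = pd.Index(["I_5110", "I_5111", "I_5120", "I_5131", "I_5132"])
ni_toy = pd.Index(["I_511x", "I_5120", "I_513x"])

# Prefixes of the aggregated labels: drop the trailing 'x' placeholder.
prefixes = tuple(ni_toy[ni_toy.str.endswith("x")].str.rstrip("x"))

# Labels sharing a prefix map to '<prefix>x'; all others map to themselves.
toy_map = {k: k[:-1] + "x" if k.startswith(prefixes) else k for k in nfull_toy}

print(toy_map)
```

Note that `k[:-1] + 'x'` assumes each detailed label is exactly one character longer than its prefix; labels that differ in more than the last character would need a different rule.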

*Apply mapping to all symbols in the database:*

In [14]:
aggregateDB.readSets(I.db) # the aggDB method works through manipulations of sets s,n - this defines them by reading from other symbols in the database.
m = pd.MultiIndex.from_tuples(nfull2ni.items(), names = ['s','sAgg']) # define mapping as multiindex
m = m.union(adj.rc_pd(pd.MultiIndex.from_arrays([I.db.get('s'), I.db.get('s').rename('sAgg')]), ('not', m.droplevel('sAgg'))), sort = False) # all elements that are not in the mapping, fill in as a mapping on the form (x,x).
aggregateDB.aggDB(I.db, m)

<pyDatabases.gpyDB.gpyDB.GpyDB at 0x1e94ceb71c0>
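
For intuition, the effect of such a mapping on a single variable can be sketched in plain pandas. This is not how ```aggregateDB.aggDB``` is implemented; it only illustrates the idea of completing a partial mapping with identity pairs (x,x) and summing within each aggregated label. All labels and numbers are hypothetical.

```python
import pandas as pd

# Hypothetical detailed data (made-up numbers).
values = pd.Series({"I_5110": 1.0, "I_5111": 2.0, "I_5120": 4.0}).rename_axis("s")

# Partial mapping that only covers the labels being merged.
partial = {"I_5110": "I_511x", "I_5111": "I_511x"}

# Complete the mapping: labels without an entry map to themselves, i.e. (x, x).
full = {k: partial.get(k, k) for k in values.index}

# Relabel each element and sum within each aggregated label.
agg_labels = values.index.map(full).rename("sAgg")
aggregated = values.groupby(agg_labels).sum()

# Aggregation should leave the total unchanged.
assert aggregated.sum() == values.sum()
```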