In [1]:
%run StdPackages.ipynb
d['data'] = os.path.join(d['data'], 'IO2018')

No clean-up of work-folder


### 1. Load data:

*Specify raw data:*

In [2]:
name = 'IO2018'
file_v = os.path.join(d['data'], 'IO2018_v.xlsx')
file_i = os.path.join(d['data'], 'IO2018_I.xlsx')
file_k = os.path.join(d['data'], 'IO2018_K.xlsx')
file_p = None # no specific price data
file_mappings = os.path.join(d['data'], 'GR2018_mappings.xlsx')

*Initialize class and process data:*

In [3]:
I = IOfunctions.readIO(name = name, file_v = file_v, file_i = file_i, file_k = file_k) # because of default options in the class, this is an equivalent statement
I()

<pyDatabases.gpyDB.gpyDB.GpyDB at 0x204a2f2a1c0>

### 2. Square value, investment, and durables data

#### 2.1. Align scales for the different datasets

The value and investment data are measured in 1000 DKK, whereas the durables data are in million DKK. We therefore rescale the value and investment data to million DKK:

In [4]:
for k in ('vTax','TotalTax','vD','vC','vC_tax','vD_inv'):
    I.db[k] = I.db.get(k)/1000 # from 1000 DKK to mio. DKK

#### 2.2. From 69 to 146 sectors 

Investment and durables data use a 69-sector specification, whereas the value data uses 146 sectors. We use the mapping from ```GR2018_mappings``` to split the 69-sector data into 146 sectors.

In [5]:
wb_mappings = read.simpleLoad(file_mappings)
auxMaps = read.maps(wb_mappings['AuxMaps'])
m = auxMaps['s69tos146'].vals

*Force it to use strings in the mapping:*

In [6]:
m = m.set_levels([level.astype(str) for level in m.levels])

*Create weights using the size of the sectors:*

In [7]:
sectorValue = adjMultiIndex.applyMult( adj.rc_pd(I.db.get('vD').groupby('s').sum()+I.db.get('TotalTax'), I.db.get('s_p')),
                                      m.rename(['sAgg','s']))
weights = sectorValue / (sectorValue.groupby('sAgg').sum())

*Apply to ```vD_inv```, ```vD_dur``` and ```vD_depr```, the only three variables defined over the smaller 69-sector index (NB: only run this cell once!):*

In [8]:
dataCheck = {'vD_inv': sum(I.db.get('vD_inv')),
             'vD_dur': sum(I.db.get('vD_dur')),
             'vD_depr': sum(I.db.get('vD_depr'))}
I.db['vD_inv'] = (I.db.get('vD_inv').rename_axis(index = {'s':'sAgg'}) * weights).droplevel('sAgg')
I.db['vD_dur'] = (I.db.get('vD_dur').rename_axis(index = {'s':'sAgg'}) * weights).droplevel('sAgg')
I.db['vD_depr']= (I.db.get('vD_depr').rename_axis(index = {'s':'sAgg'}) * weights).droplevel('sAgg')
for k in dataCheck:
    assert abs(dataCheck[k]-sum(I.db.get(k)))<1e-6, f"Disaggregation from 69 to 146 sectors changed the sum of {k}"
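
The weighting logic can be illustrated on toy data with plain pandas. This is only a sketch: the sector names and numbers below are made up, and it does not use the ```adjMultiIndex``` or ```adj``` helpers from the notebook. It shows that weights summing to one within each aggregate sector leave the total unchanged after the split.

```python
import pandas as pd

# Hypothetical mapping from 2 aggregate sectors (sAgg) to 4 detailed sectors (s).
mapping = pd.MultiIndex.from_tuples(
    [("A", "a1"), ("A", "a2"), ("B", "b1"), ("B", "b2")], names=["sAgg", "s"]
)

# Hypothetical size of each detailed sector (plays the role of sectorValue).
size = pd.Series([30.0, 10.0, 5.0, 15.0], index=mapping)

# Share of each detailed sector within its aggregate; sums to 1 per sAgg.
weights = size / size.groupby(level="sAgg").transform("sum")

# Data that only exists on the aggregate sectors (plays the role of vD_inv).
agg_data = pd.Series([100.0, 40.0], index=pd.Index(["A", "B"], name="sAgg"))

# Repeat each aggregate value for its detailed sectors, then scale by the weights.
agg_values = agg_data.loc[weights.index.get_level_values("sAgg")].to_numpy()
detailed = pd.Series(agg_values * weights.to_numpy(), index=weights.index).droplevel("sAgg")

# Same check as in the notebook: the split must not change the total.
assert abs(detailed.sum() - agg_data.sum()) < 1e-6
```

Running the split a second time on already disaggregated data would no longer line up with the aggregate index, which is why the notebook cell should only be executed once.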

#### 2.3. From 12 to 7 durables

*Aggregated durables are marked by a trailing ```x``` in their name; use this naming convention to detect the mapping from the 12-level to the 7-level aggregation:*

In [9]:
nfull = I.db.get('s_i') # original set, lots of indices
ni = I.db.get('vD_inv').index.levels[0] # new index - fewer, aggregated indices
syntax = ni[ni.str.endswith('x')].str.rstrip('x') 
subset = nfull[nfull.str.startswith(tuple(syntax))]
nfull2ni = {k: k  if not k.startswith(tuple(syntax)) else k[:-1]+'x' for k in nfull} # mapping from full set to more aggregated one
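
A minimal, self-contained sketch of the same prefix-matching idea follows. The element names are hypothetical and chosen only to show the pattern: an aggregated name ending in ```x``` absorbs every detailed name that shares its prefix, and all other names map to themselves.

```python
import pandas as pd

# Hypothetical detailed set (plays the role of nfull).
nfull = pd.Index(["I_B1", "I_B2", "I_M1", "I_M2", "I_T"], name="n")
# Hypothetical aggregated set (plays the role of ni); a trailing 'x' marks an aggregate.
ni = pd.Index(["I_Bx", "I_Mx", "I_T"], name="n")

# Prefixes of the aggregated elements: 'I_Bx' -> 'I_B', 'I_Mx' -> 'I_M'.
prefixes = tuple(ni[ni.str.endswith("x")].str.rstrip("x"))

# Detailed names with a matching prefix get their last character replaced by 'x';
# everything else maps to itself. This assumes a single trailing character
# distinguishes the detailed elements within one aggregate.
nfull2ni = {k: k[:-1] + "x" if k.startswith(prefixes) else k for k in nfull}
```

Every value of the resulting dictionary should be an element of the aggregated set, which is a cheap consistency check on the naming convention.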

*Apply mapping to all symbols in the database:*

In [10]:
aggregateDB.readSets(I.db) # the aggDB method works through manipulations of sets s,n - this defines them by reading from other symbols in the database.
m = pd.MultiIndex.from_tuples(nfull2ni.items(), names = ['s','sAgg']) # define mapping as multiindex
m = m.union(adj.rc_pd(pd.MultiIndex.from_arrays([I.db.get('s'), I.db.get('s').rename('sAgg')]), ('not', m.droplevel('sAgg'))), sort = False) # all elements that are not in the mapping, fill in as a mapping on the form (x,x).
aggregateDB.aggDB(I.db, m)

<pyDatabases.gpyDB.gpyDB.GpyDB at 0x16e16da4ac0>
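
The two steps above (completing a partial mapping with identity pairs, then aggregating) can be sketched with plain pandas on toy data. The names and values are hypothetical, and the group-and-sum step is only an assumption about what an aggregation over such a mapping amounts to for a single series; it is not the implementation of ```aggregateDB.aggDB```.

```python
import pandas as pd

# Hypothetical full set and a partial mapping that only covers two elements.
full_set = pd.Index(["a1", "a2", "b", "c"], name="s")
partial = pd.MultiIndex.from_tuples([("a1", "ax"), ("a2", "ax")], names=["s", "sAgg"])

# Elements missing from the mapping are added as identity pairs (x, x).
unmapped = full_set.difference(partial.get_level_values("s"))
identity = pd.MultiIndex.from_arrays([unmapped, unmapped], names=["s", "sAgg"])
mapping = partial.append(identity)

# Toy variable defined over the full set.
values = pd.Series([1.0, 2.0, 3.0, 4.0], index=full_set)

# Relabel each element with its aggregate and sum within the new labels.
lookup = dict(mapping.tolist())
aggregated = values.rename(index=lookup).groupby(level=0).sum()
```

Because every element of the full set appears exactly once in the completed mapping, the aggregated series has the same total as the original one.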

#### 2.4. Clean up "other foreign transactions"

*Map "other foreign transactions" to the standard import categories:*

In [11]:
I.cleanOtherForeignTransactions()

*Here, we also remove ```n_Fother``` entirely from the database (this is not part of the automated clean-up):*

In [12]:
I.db['n'] = adj.rc_pd(I.db.get('n'), ('not', I.db.get('n_Fother')))
del I.db.series['n_Fother']

### 3. Reorder sets

In [13]:
for k,v in I.db.getTypes(['variable','parameter']).items():
    I.db[k] = IOfunctions.stdSort(v.vals) # apply the standard sort order to each variable/parameter

### 4. Add additional data/regulation

Add ```vAssets``` defined over sectors ```s``` and types of investment goods ```a``` - here just the totals for the households and the government sectors:

In [14]:
totalNetWealth = pd.Series([3520405.512, 25287], index = pd.MultiIndex.from_tuples([('HH','total'), ('G','total')], names = ['s','a']))
gpyDB.add_or_merge_vals(I.db, totalNetWealth, 'vAssets')

### 5. Export full database

In [15]:
I.db.export(repo = d['data'])