In [1]:
%run StdPackages.ipynb
d['rawData'] = os.path.join(d['data'],'rawData69') # add path to raw data folder
d['processedData'] = os.path.join(d['data'],'processedData') # update path to processed data folder
os.chdir(d['py'])
from loadIO import *
import RAS

No clean-up of work-folder


# Create Stylized Model Data

The final data used for a CGE model depends on the specific modeling framework. This notebook creates a stylized version that we use in various UCPH projects. Specifically, we assume the following:
* All equilibrium prices are normalized at 1.
* Government consumption is aggregated into one type.

In [2]:
IO = os.path.join(d['processedData'], 'IO68_2dur')
name = 'stylizedCGEData'

This notebook shows by example how to aggregate IO data using a few simple mappings.

*Load IO:*

In [3]:
db = GpyDB(IO, name = name)
ws = db.ws # work from this workspace as the main one

We model only total government consumption and do not split it by the ```gc``` set that is included in the full data. The total is already in the ```vD``` variable, so here we simply remove the components that we do not need:

In [4]:
[db.series.database.pop(k) for k in ('gc','vC','vC_tax')];

Next, we remove all zero elements in variables:

In [5]:
[db.__setitem__(k, db(k)[db(k)!=0]) for k in db.getTypes(['var'])];
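The filtering step relies on ordinary pandas boolean masking. A minimal sketch with a hypothetical toy series (the index labels and values are made up for illustration and are not from the IO data):

```python
import pandas as pd

# Hypothetical toy series indexed by sector (s) and good (n).
idx = pd.MultiIndex.from_tuples(
    [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")], names=["s", "n"]
)
v = pd.Series([1.5, 0.0, -0.2, 0.0], index=idx, name="vD")

# Boolean-mask filtering keeps only the non-zero entries.
# Negative entries are kept; they are handled later by the RAS step.
v_sparse = v[v != 0]
print(v_sparse)
```

Dropping zeros keeps the stored symbols sparse, so only combinations that actually appear in the data define the domains later on.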

Translate depreciation of durables into rates, and distinguish between investments (flow) and durables (stock), using the investment good syntax ```I_x``` for durable ```x```. Define the mapping ```dur2inv``` and the subsets ```dur_p, inv_p```. Add investments and the value of durables to the vector ```vD```:

In [6]:
db['rDepr'] = db('vD_depr')/db('vD_dur')
db['dur_p'] = db('vD_dur').index.levels[db['vD_dur'].domains.index('n')]
db['inv_p'] = db('dur_p').map(lambda x: f'I_{x}')
db['dur2inv'] = pd.MultiIndex.from_arrays([db('dur_p'), db('inv_p').rename('nn')])
db('vD_inv').index = db('vD_inv').index.set_levels(db('vD_inv').index.levels[db['vD_inv'].domains.index('n')].map(lambda x: f'I_{x}'), level = 'n')
db['vD'] = db('vD_inv').combine_first(db('vD')).combine_first(db('vD_dur'))
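The same steps can be sketched in plain pandas, without the `GpyDB` wrapper. The durable names `iB` and `iM`, the sector `s1`, and all values below are hypothetical:

```python
import pandas as pd

# Hypothetical durable stocks and depreciation flows for one sector.
idx = pd.MultiIndex.from_tuples([("s1", "iB"), ("s1", "iM")], names=["s", "n"])
vD_dur = pd.Series([200.0, 50.0], index=idx)
vD_depr = pd.Series([10.0, 5.0], index=idx)

# Depreciation rate = depreciation flow / stock; pandas aligns on the index.
rDepr = vD_depr / vD_dur

# Durable names, the matching investment goods (I_x), and the mapping between them.
dur_p = vD_dur.index.get_level_values("n").unique()
inv_p = dur_p.map(lambda x: f"I_{x}")
dur2inv = pd.MultiIndex.from_arrays([dur_p, inv_p.rename("nn")])
print(rDepr)
print(list(dur2inv))
```

Keeping the stock under the name `x` and the flow under `I_x` lets both live in the same `vD` vector without index collisions.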

Define the subset of values to fix (at zero). Here, a value is fixed if it is less than the absolute threshold value *and* less than the relative value of the given sector:

### 1. RAS adjustment

The RAS adjustment removes negative and (potentially) small values in the IO data. We remove small values to get a sparser demand system (for computational reasons) and negative values because standard CGE formulations require positive values. In principle, we can skip this step if we have sufficient computing power and a sufficiently flexible mathematical framework to handle negative values.

We have three types of implementation here:
* The simple RAS adjustment (```simpleRAS```): This performs matrix adjustments iteratively to make sure that the sums across rows (index $s$) and columns (index $n$) remain the same after we have imposed zero values in the IO matrix.
* Minimize distortions (```absRAS```): This minimizes the sum of squared differences between the new data values and the original ones under the constraint that row sums (index $s$) and column sums (index $n$) remain the same. This is a quadratic programming (QP) problem that may be solved more efficiently with other solvers (e.g. KNITRO or CPLEX) than the one we have available here (CONOPT, which we prefer for CNS/NLP type models).
* Minimize relative distortions (```shareRAS```): This minimizes the sum of squared differences in data input *shares* instead. The ```shareRAS.pdf``` explains this in a bit more detail. This is also a QP problem, so it is probably better to implement with CPLEX, but it works well even with CONOPT on somewhat more aggregated data.
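
As a rough sketch of the second option, let $v_{s,n}$ denote the adjusted values, $v^0_{s,n}$ the original ones, and $Z$ the set of entries fixed at zero. This notation is ours and is not taken from the ```RAS``` module; the actual implementation may add bounds or exempt some rows/columns from the balancing constraints. The problem then has the form:
$$\begin{align}
    \min_{v}\ & \sum_{s,n}\left(v_{s,n}-v^0_{s,n}\right)^2 \\
    \text{s.t. } & \sum_{n} v_{s,n} = \sum_{n} v^0_{s,n} \quad \forall s, \\
    & \sum_{s} v_{s,n} = \sum_{s} v^0_{s,n} \quad \forall n, \\
    & v_{s,n} = 0 \quad \forall (s,n)\in Z.
\end{align}$$
The objective is quadratic and all constraints are linear, which is what makes this a QP problem.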

#### ```simpleRAS```:

Define threshold (fix at zero if value is lower than this), the initial data that we want to adjust (```v0```), if there are any rows/columns that we do not *need* to balance, and the matrix with values fixed at zero (```vBar```):

In [7]:
threshold = 1/100 # everything less than 1 million 
v0 = adj.rc_pd(db('vD'), ('and', [('or', [db('n_p'), db('n_F')]),
                                  ('or', [db('s_p'), db('s_i')])]))
leaveCols = db('n_F') # goods (columns) that we do not need to balance
leaveRows = None # sectors (rows) that we do not need to balance
vBar = v0[v0<threshold] * 0

This runs the simple RAS algorithm (it takes a bit of time and may require a lot of iterations to exit):

In [None]:
vD = RAS.simpleRAS(v0, vBar, leaveCols = leaveCols, leaveRows = leaveRows, tol = 1e-6, iterMax = 250)
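
To make the iteration concrete, here is a minimal textbook sketch of biproportional (RAS) scaling in NumPy. It is not the implementation in `RAS.simpleRAS`: the function name `simple_ras`, its signature, and the toy matrix are assumptions for illustration, and it ignores the `leaveCols`/`leaveRows` options.

```python
import numpy as np

def simple_ras(v0, zero_mask, tol=1e-6, iter_max=250):
    """Impose zeros where zero_mask is True, then rescale rows and columns
    alternately until the original row and column sums are restored."""
    row_targets = v0.sum(axis=1)
    col_targets = v0.sum(axis=0)
    v = np.where(zero_mask, 0.0, v0)
    for _ in range(iter_max):
        v = v * (row_targets / v.sum(axis=1))[:, None]  # match row sums
        v = v * (col_targets / v.sum(axis=0))[None, :]  # match column sums
        # Columns are exact after the last step, so only rows need checking.
        if np.abs(v.sum(axis=1) - row_targets).max() < tol:
            return v
    raise RuntimeError("RAS did not converge within iter_max iterations")

# Hypothetical 3x3 demand matrix with two entries below the threshold.
v0 = np.array([[10.0, 0.004, 5.0],
               [2.0, 8.0, 0.003],
               [4.0, 1.0, 6.0]])
threshold = 1 / 100
v = simple_ras(v0, v0 < threshold)
print(v.round(4))
```

Because scaling is multiplicative, entries fixed at zero stay at zero throughout, while the remaining entries absorb the small amounts that were removed.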

#### ```absRAS```:

In [None]:
# ras = RAS.absRAS(v0, vBar, ws = ws, leaveCols = leaveCols)
# ras.db['adjTerm'] = adj.rc_pd(vD, ras.db('active'))/ras.db('vD0')-1
# ras.db.mergeInternal() # write gdx file based on python symbols
# job = ws.add_job_from_string(ras())
# job.run(databases = ras.db.database)
# vD = GpyDB(job.out_db)('vD') # get solution

#### ```shareRAS```:

In [None]:
# threshold = 1 # only remove negative values and values less than 100,000 
# rasSettings = IOfunctions.standardCleanSettings(dbi, threshold) #
# sols = {}
# for k,v in rasSettings.items():
#     ras = shareRAS(v['v0'], v['vBar'], ws = ws, **v['kwargs'])
#     job = ws.add_job_from_string(ras())
#     ras.db.mergeInternal()
#     job.run(databases = ras.db.database)
#     sols[k] = GpyDB(job.out_db) # store databases 

*Merge things back up again:*

In [None]:
vD_full = vD.combine_first(db('vD'))
vD_full = vD_full[vD_full!=0] # remove zero values again
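
The merge relies on `Series.combine_first`, which keeps the caller's values where they are defined and falls back on the argument elsewhere. A small sketch with hypothetical labels and values:

```python
import pandas as pd

idx_full = pd.MultiIndex.from_tuples(
    [("s1", "x"), ("s1", "y"), ("s1", "L")], names=["s", "n"]
)
vD_orig = pd.Series([10.0, 0.004, 3.0], index=idx_full)

# Adjusted values exist only for the balanced block (here goods x and y).
vD_adj = pd.Series([10.004, 0.0], index=idx_full[:2])

# Adjusted values take precedence; entries outside the block come from vD_orig.
vD_full = vD_adj.combine_first(vD_orig)
vD_full = vD_full[vD_full != 0]  # drop the entries that were fixed at zero
print(vD_full)
```

Note that an explicit zero in the adjusted series is not treated as missing, so it overrides the original small value before being filtered out.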

*Remove residual income category (we don't currently use this in the model; it will enter the return on durables instead):*

In [None]:
db['vD'] = adj.rc_pd(vD_full, ('not', pd.Index(['resIncome'], name = 'n')))

### 2. Create other variables

Next, we create other model variables: first the value of supply, and then prices and quantities separately (we assume that all prices equal 1, at least in the baseline year):
* The value of supply ```vS[t,s,n]``` is defined for domestic production and investment sectors by summing over demand from all other sectors. By default, $s = n$, as each sector produces only a single output.
* The equilibrium price ```p[t,n]``` is set to $1$ for each good $n$. This is defined for all domestic and foreign goods.
* For the durable types, we set the quantity $qD[t,s,n] = vD[t,s,n]$, then we add the price vector ```pD_dur[t,s,n]``` from the static user cost term:
$$\begin{align}
    \text{staticUserCost}_K = p_I \left(\dfrac{R}{1+\pi}+\delta_K-1\right),
\end{align}$$
    where $p_I$ is the price of the investment good that corresponds to durable $K$, $R$ is the real interest rate factor, $\pi$ is the inflation rate, and $\delta_K$ is the depreciation rate of durable $K$.
* Finally, we set $qD[t,s,n] = vD[t,s,n] / p[t,n]$ for non-durables.

In [None]:
db['R_LR'] = 1.03
db['infl_LR'] = 0.02
model_vS(db)
model_p(db)
model_durables(db, db('R_LR'), db('infl_LR'))
model_quantNonDurables(db) 
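
As a worked example of the static user cost formula, the snippet below plugs in the long-run values set above together with a hypothetical depreciation rate of 5% (the depreciation rate is an assumption for illustration, not a value from the data):

```python
R = 1.03      # interest rate factor, as R_LR above
infl = 0.02   # inflation rate, as infl_LR above
delta = 0.05  # hypothetical depreciation rate (not from the data)
p_I = 1.0     # price of the investment good, normalized to 1

static_user_cost = p_I * (R / (1 + infl) + delta - 1)
print(round(static_user_cost, 4))  # 0.0598
```

With these numbers, holding one unit of the durable for a period costs roughly 6% of the price of the investment good.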

### 3. Create other subsets and mappings

Subsets of goods/sectors:

In [None]:
db['nEqui'] = db('vS').index.droplevel('s').unique() # what goods require an equilibrium condition
db['d_qS'] = db['vS'].index 
db['d_qD'] = adj.rc_pd(db('vD'), db('nEqui')).index 
db['d_qSEqui'] = adj.rc_pd(db['d_qS'].vals, ('not', db('s_HH'))) # Subset of qS values to be endogenized in general equilibrium
db['d_pEqui'] = pd.Index(['L'], name ='n') # Subset of prices to be endogenized in general equilibrium 
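
The first and third lines can be illustrated with plain pandas. In this sketch, `isin` on the `n` level stands in for `adj.rc_pd`, which is a project-specific helper; the labels and values are hypothetical:

```python
import pandas as pd

# Hypothetical supply: each domestic sector supplies its own good (s = n).
vS = pd.Series(
    [100.0, 80.0, 20.0],
    index=pd.MultiIndex.from_tuples(
        [("a", "a"), ("b", "b"), ("I_K", "I_K")], names=["s", "n"]
    ),
)
# Hypothetical demand, including one foreign good (b_F) with no domestic supply.
vD = pd.Series(
    [30.0, 5.0, 12.0],
    index=pd.MultiIndex.from_tuples(
        [("a", "b"), ("a", "b_F"), ("b", "a")], names=["s", "n"]
    ),
)

# Goods that need an equilibrium condition: everything that is supplied.
nEqui = vS.index.droplevel("s").unique()

# Demand entries for those goods only (stand-in for adj.rc_pd).
d_qD = vD[vD.index.get_level_values("n").isin(nEqui)].index
print(list(nEqui), list(d_qD))
```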

#### 3.1. Trade mappings

Define the mappings:
* ```dom2for[n,nn]```: Mapping from domestic to the equivalent foreign goods (with syntax ```x,x_F```).
* ```dExport[t,s,n]```: Foreign sectors' demand for domestic goods.
* ```dImport[t,s,n,nn]```: Sector, domestic good, foreign good combinations (s,n,nn) in the data, i.e. where a sector demands both the domestic and the foreign type of a product.
* ```dImport_dom[t,s,n]```: Sector, domestic good combinations (s,n) where the sector only demands the domestic and not the corresponding foreign good.
* ```dImport_for[t,s,n]```: Sector, foreign good combinations (s,n) where the sector only demands the foreign and not the corresponding domestic good.

In [None]:
db['dom2for'] = pd.MultiIndex.from_arrays([db('n_p').sort_values(), db('n_F').sort_values().rename('nn')])
db['dExport'] = adj.rc_pd(db('vD'), db('s_f')).index
vD_dom = stdSort(adjMultiIndex.applyMult(adj.rc_pd(db('vD'), db('n_p')), db('dom2for')))
vD_for = adj.rc_pd(db('vD'), db('n_F')).rename_axis(index= {'n':'nn'})
db['dImport'] = stdSort(adj.rc_pd(vD_dom, vD_for)).index
db['dImport_dom'] = adj.rc_pd(vD_dom, ('not', vD_for)).droplevel('nn').index
db['dImport_for'] = adj.rc_pd(vD_for, ('not', db('dImport'))).rename_axis(index = {'nn':'n'}).index
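
The logic behind the three import mappings can be sketched with plain pandas index operations. This is a simplified stand-in for the `adj.rc_pd` and `adjMultiIndex.applyMult` helpers used above; it drops the time index, and all labels are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical demand: s1 buys x (domestic and foreign) and y (domestic only);
# s2 buys y only from abroad.
vD = pd.Series(
    1.0,
    index=pd.MultiIndex.from_tuples(
        [("s1", "x"), ("s1", "x_F"), ("s1", "y"), ("s2", "y_F")], names=["s", "n"]
    ),
)
n_p = pd.Index(["x", "y"], name="n")
n_F = pd.Index(["x_F", "y_F"], name="n")
dom2for = dict(zip(n_p.sort_values(), n_F.sort_values()))

dom = vD[vD.index.get_level_values("n").isin(n_p)].index
foreign = vD[vD.index.get_level_values("n").isin(n_F)].index

# Attach the foreign counterpart nn to each domestic (s, n) pair.
dom_nn = pd.MultiIndex.from_tuples(
    [(s, n, dom2for[n]) for s, n in dom], names=["s", "n", "nn"]
)
# True where the sector also demands the foreign counterpart.
has_for = np.array([(s, nn) in foreign for s, n, nn in dom_nn])

dImport = dom_nn[has_for]                       # both domestic and foreign
dImport_dom = dom_nn[~has_for].droplevel("nn")  # domestic only
dImport_for = foreign.difference(               # foreign only
    dImport.droplevel("n").rename(["s", "n"])
)
print(list(dImport), list(dImport_dom), list(dImport_for))
```

Pairing goods by sorted order, as in the `dom2for` line above, only works if the domestic and foreign names sort in the same order, which the ```x,x_F``` syntax is meant to ensure.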

### 4. Export

In [None]:
AggDB.readSets(db) # define sets from variables/parameters defined throughout
db.export()