In [1]:
import numpy as np, scipy, pandas as pd
from scipy import stats
rng = np.random.default_rng(seed = 105)
stats.truncnorm.random_state = rng
from symMaps.__init__ import *
_numtypes = (int,float,np.generic)

# Documentation for `SimpleSys` and use of helper classes `Lag`, `Lead`, `Roll`.

The class is used to navigate problems that take a single vector of inputs while dealing with many different variables, potentially defined over different indices. Below, we also show examples using the helper classes `Lag`, `Lead`, `Roll`.

## 1. Example

As an example, we'll use the optimization problem defined by:
$$\begin{align}
    \max &\sum_{i, t_0\leq t\leq T} \beta_i^{t-t_0} \omega_i \ln\left(c_{i,t}\right) \tag{1} \\
    m_{i,t} &= R_t m_{i,t-1}-c_{i,t} + y_{i,t}, && \forall  i, t_0 \leq t\leq T \tag{2}\\
    m_{i,T} &= m_{i,T-1}, && \forall i \tag{3}
\end{align}$$
given $m_{i,t_0-1}, y_{i,t}, R_t, \beta_i, \omega_i$.

Concretely, we consider the simple case with three types of agents $i \in \lbrace L, M, H\rbrace $ and 5 time periods $t = 2010, 2011, \ldots, 2014$, and collect all relevant objects in a database.
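To make equations (1) and (2) concrete, the following is a minimal sketch in plain pandas that iterates the law of motion for $m_{i,t}$ and evaluates the objective for a given consumption path. The parameter values and the helper names `assets` and `welfare` are assumptions made for illustration; they are not part of `symMaps`, and the terminal condition (3) is only reported as a residual, not imposed.

```python
import numpy as np
import pandas as pd

# Illustrative values (assumptions, not the notebook's random draws)
i = pd.Index(["L", "M", "H"], name="i")
t0, T = 2010, 2014
t = pd.Index(range(t0, T + 1), name="t")
beta = pd.Series([0.90, 0.95, 0.99], index=i)
omega = pd.Series(1 / len(i), index=i)
R = pd.Series(1.05, index=t)
y = pd.DataFrame(1.0, index=t, columns=i)
c = pd.DataFrame(1.5, index=t, columns=i)      # a naive consumption path
m_init = pd.Series([1.0, 2.0, 3.0], index=i)   # m_{i, t0-1}

def assets(c, y, R, m_init):
    """Iterate m_t = R_t * m_{t-1} - c_t + y_t, i.e. equation (2)."""
    m = pd.DataFrame(index=c.index, columns=c.columns, dtype=float)
    prev = m_init
    for period in c.index:
        prev = R.loc[period] * prev - c.loc[period] + y.loc[period]
        m.loc[period] = prev
    return m

def welfare(c, beta, omega, t0):
    """Evaluate the objective (1) for a consumption path c (rows t, columns i)."""
    horizon = c.index.to_numpy() - t0
    discount = pd.DataFrame(beta.to_numpy()[None, :] ** horizon[:, None],
                            index=c.index, columns=c.columns)
    return float((discount * omega * np.log(c)).sum().sum())

m = assets(c, y, R, m_init)
W = welfare(c, beta, omega, t0)
terminal_residual = m.loc[T] - m.loc[T - 1]    # zero when condition (3) holds
```

An optimizer would search over `c` (and `m`) so that the residual is zero and `W` is maximal; stacking these variables into one vector is what the rest of this notebook sets up.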

**Code:**

The ```SimpleSys``` class is initialized with a database (`pyDbs.SimpleDB`):

In [2]:
system = SimpleSys()

Add data:

In [3]:
db = system.db
db['i'] = pd.Index(['L','M','H'], name  = 'i')
t0, T = 2010, 2014
db['t'] = pd.Index(range(t0, T+1), name = 't')
# Parameters:
db['β'] = pd.Series(sorted(stats.truncnorm.rvs(0, 1, size = len(db('i')))), index = db('i'))
db['ω'] = pd.Series(1/len(db('i')), index = db('i'))
db['y'] = pd.Series(1, index = pd.MultiIndex.from_product([db('i'), db('t')]))
db['R'] = pd.Series(1/db('β').mean()-0.5+stats.truncnorm.rvs(0,1, size = len(db['t'])), index = db('t'))
db['m0'] = pd.Series((10*db('β')**2).values, index = pd.MultiIndex.from_product([db('i'), db('t')[0:1]]))
db['weights'] = adjMultiIndex.bc(db('β'), db('t')).pow(pd.Series(db('t')-t0, index = db('t'))) * db('ω') # define weighting in welfare function

## 2. Define key variables and compile

The endogenous variables in the system are specified using a dictionary that ties the variable name to the relevant domains. In some instances (most nonlinear programming applications), it makes sense to provide not only domains, but also starting values for the optimization problem. We start by adding some naive values to the database:

In [4]:
db['m'] = pd.Series(.5, index = pd.MultiIndex.from_product([db('t'), db('i')]))
db['c'] = pd.Series(1.5, index = pd.MultiIndex.from_product([db('t'), db('i')]))

Next, we add these to ```self.v```, the dictionary of variables to be stacked:

In [5]:
system.v = {k: db[k].index for k in ('m','c')}

*Note: We add scalars by mapping from symbol name to None.*

The compilation stage is now straightforward: we stack the indices of each symbol into a global integer index by calling ```compile```:

In [6]:
system.compile()

The ```self.maps``` attribute now contains a mapping from the ```pd.Index``` to the global index.
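As a rough illustration of what such a mapping can look like, the sketch below assigns consecutive integer positions to each symbol's index in plain pandas. The helper `stack_maps` and the scalar symbol `z` are hypothetical and only illustrate the idea; the actual layout of ```self.maps``` may differ.

```python
import numpy as np
import pandas as pd

def stack_maps(domains):
    """Map each symbol's index to a block of positions in one global integer index.

    `domains` maps symbol name -> pd.Index (or None for a scalar symbol).
    """
    maps, offset = {}, 0
    for name, idx in domains.items():
        size = 1 if idx is None else len(idx)
        positions = np.arange(offset, offset + size)
        # Scalars get a single position; indexed symbols get a series of positions
        maps[name] = int(positions[0]) if idx is None else pd.Series(positions, index=idx, name=name)
        offset += size
    return maps, offset

t = pd.Index(range(2010, 2015), name="t")
i = pd.Index(["L", "M", "H"], name="i")
domain = pd.MultiIndex.from_product([t, i])
maps, n = stack_maps({"m": domain, "c": domain, "z": None})
```

Here `m` occupies positions 0 to 14, `c` positions 15 to 29, and the scalar `z` position 30, so a stacked vector has length `n = 31`.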

### 2.A. Auxiliary methods

We can now get an initial guess of the stacked vector $x_0$. This call uses data from ```self.db``` if available, and otherwise adds a vector with values specified by kwarg ```fill_value = 0```: 

In [7]:
x0 = system.x0() # get vector using database

We can also get vectors of lower and upper bounds by passing the ```attr``` keyword, here for the lower bounds:

In [8]:
system.x0(attr = 'lo')

array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan])

Given a stacked vector, $x$, we can also split the solution into symbols again using ```__call__```:

In [9]:
system(x0, 'm') # extract m part of the stacked vector

array([0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
       0.5, 0.5])

We can also use the ```get``` method to return the symbol as a pandas series:

In [10]:
system.get(x0, 'm')

t     i
2010  L    0.5
      M    0.5
      H    0.5
2011  L    0.5
      M    0.5
      H    0.5
2012  L    0.5
      M    0.5
      H    0.5
2013  L    0.5
      M    0.5
      H    0.5
2014  L    0.5
      M    0.5
      H    0.5
Name: m, dtype: float64

*Note: If the symbol is a scalar, this returns a scalar instead of a pandas series.*

The ```self.unloadSol(x)``` method returns a dictionary with all symbols from the stacked vector:

In [11]:
system.unloadSol(x0)

{'m': t     i
 2010  L    0.5
       M    0.5
       H    0.5
 2011  L    0.5
       M    0.5
       H    0.5
 2012  L    0.5
       M    0.5
       H    0.5
 2013  L    0.5
       M    0.5
       H    0.5
 2014  L    0.5
       M    0.5
       H    0.5
 Name: m, dtype: float64,
 'c': t     i
 2010  L    1.5
       M    1.5
       H    1.5
 2011  L    1.5
       M    1.5
       H    1.5
 2012  L    1.5
       M    1.5
       H    1.5
 2013  L    1.5
       M    1.5
       H    1.5
 2014  L    1.5
       M    1.5
       H    1.5
 Name: c, dtype: float64}
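The reverse operation can be sketched without `symMaps` as follows: cut the stacked vector at the cumulative sizes of the symbols and attach each symbol's index to its chunk. The helper `unload_sketch` is hypothetical and assumes the symbols are stacked in the order of the dictionary; it is not the implementation behind ```unloadSol```.

```python
import numpy as np
import pandas as pd

def unload_sketch(x, domains):
    """Split stacked vector x into one pandas series (or scalar) per symbol."""
    sizes = [1 if idx is None else len(idx) for idx in domains.values()]
    chunks = np.split(np.asarray(x), np.cumsum(sizes)[:-1])
    return {
        name: (chunk[0] if idx is None else pd.Series(chunk, index=idx, name=name))
        for (name, idx), chunk in zip(domains.items(), chunks)
    }

t = pd.Index(range(2010, 2015), name="t")
i = pd.Index(["L", "M", "H"], name="i")
domain = pd.MultiIndex.from_product([t, i])
x = np.concatenate([np.full(15, 0.5), np.full(15, 1.5)])  # m first, then c
sol = unload_sketch(x, {"m": domain, "c": domain})
```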

## 3. Adjusted symbols

A common need with symbols defined over indices is a lagged or shifted index. For instance, equations (1)-(3) use the lagged symbol $m_{i,t-1}$. We may also need such adjustments for parameters that are not part of the vector $x$. The following subsections show different ways to create lagged, shifted, or rolled indices and to set up new vectors from them.

### 3.A. Lags and leads using pandas indices

**Example: Symbols defined over 1d index**

In [12]:
v = pd.Series(np.linspace(1, 2, 4), index = pd.Index(range(4), name = 't'), name = 'v') # 1d variable defined over numerical index

Roll the index by one element:

In [13]:
roll = 1
v.index.values

array([0, 1, 2, 3])

In [14]:
np.roll(v.index, roll)

array([3, 0, 1, 2])

Assume we are interested in the symbol $v_{t-lag}$, where $lag$ is an integer. The simple first step is to shift the index itself, e.g.:

In [15]:
lag = 1
vl = pd.Series(v.values, index = v.index+lag, name = f'v[t-{lag}]')
pd.concat([v, vl],axis = 1).sort_index()

Unnamed: 0_level_0,v,v[t-1]
t,Unnamed: 1_level_1,Unnamed: 2_level_1
0,1.0,
1,1.333333,1.0
2,1.666667,1.333333
3,2.0,1.666667
4,,2.0


The interpretation of the symbol $v_{t-1}(t)$ is that when the index $t$ evaluates to $1$, then $v_{t-1}(1) = v(0)$. As we can see above, this lag poses a question: the original index $t$ runs from $0$ to $3$, so the lagged symbol has no value at $t=0$ and gains a value at the new element $t=4$. How do we treat these endpoints? The ```adjMI``` includes two options to choose from here:
* Option 1: `fkeep` (default `False`). With `fkeep = False`, we only use the part of the index where $t-lag$ maps to an element in the original index. In our instance, we lose the $t=4$ observation.
* Option 2: `bfill` $\in$ {`False`, `'ss'`, `'exo'`} (default `'exo'`). This determines what happens at elements where the lagged value is missing ($t=0$ in our instance).
    * If `bfill == False`: Drop the missing (`nan`) values.
    * If `bfill == 'ss'`: Impose a steady state, i.e. use the symbol's own value at that element.
    * If `bfill == 'exo'`: Add values exogenously, using the value passed as `exo`.

This is done using the Lag class:

In [16]:
Lag.series(v, lag, fkeep = False, bfill = 'exo', exo = 0) # default options - drops the value with t=4 and adds the new value for t = 0 as 0

t
0    0.000000
1    1.000000
2    1.333333
3    1.666667
dtype: float64

In [17]:
Lag.series(v, lag, fkeep = True) # If we do not drop t = 4 

t
0    0.000000
1    1.000000
2    1.333333
3    1.666667
4    2.000000
dtype: float64

In [18]:
Lag.series(v, lag, fkeep = False, bfill = False) # Drop outside values (t=0)

t
1    1.000000
2    1.333333
3    1.666667
dtype: float64

In [19]:
Lag.series(v, lag, fkeep = False, bfill = 'ss') # use steady state assumption for missing data (t=0)

t
0    1.000000
1    1.000000
2    1.333333
3    1.666667
dtype: float64
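For intuition, the four 1d results above can be mimicked in plain pandas with a shift of the index followed by a reindex and a fill. The function `lag_series_sketch` below is a hypothetical stand-in written for this illustration; it reproduces the outputs shown above but makes no claim about how `Lag.series` is implemented.

```python
import numpy as np
import pandas as pd

def lag_series_sketch(v, lag, fkeep=False, bfill="exo", exo=0.0):
    """Return v[t-lag] as a series indexed by t (1d numerical index only)."""
    shifted = pd.Series(v.to_numpy(), index=v.index + lag)   # value v(t) now sits at t + lag
    target = v.index.union(shifted.index) if fkeep else v.index
    out = shifted.reindex(target)                            # NaN where t - lag is outside the index
    if bfill == "exo":
        return out.fillna(exo)    # fill with an exogenous value
    if bfill == "ss":
        return out.fillna(v)      # steady state: v[t-lag] = v[t] where missing
    return out.dropna()           # bfill = False: drop missing elements

v1d = pd.Series(np.linspace(1, 2, 4), index=pd.Index(range(4), name="t"), name="v")
exo_fill = lag_series_sketch(v1d, 1)                       # t = 0..3, first value 0
keep_all = lag_series_sketch(v1d, 1, fkeep=True)           # t = 0..4
dropped = lag_series_sketch(v1d, 1, bfill=False)           # t = 1..3
steady = lag_series_sketch(v1d, 1, bfill="ss")             # t = 0..3, first value v(0)
```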

**Example: Symbols defined over nd index**

In [20]:
v = pd.Series(np.linspace(1, 2, 16), index = pd.MultiIndex.from_product([pd.Index(range(4), name = 't'), pd.Index(np.arange(10,10+5*4, 5), name = 'i')]), name = 'v')

The procedure is similar, except that we now specify which level of the index to lag:

In [21]:
lag = 1
level = 't'
# lag = {'t': 1} # equivalent
Lag.series(v, lag, level = level, fkeep = False, bfill = 'exo', exo = 0) # default options - drops the value with t=4 and adds the new value for t = 0 as 0

t  i 
0  10    0.000000
   15    0.000000
   20    0.000000
   25    0.000000
1  10    1.000000
   15    1.066667
   20    1.133333
   25    1.200000
2  10    1.266667
   15    1.333333
   20    1.400000
   25    1.466667
3  10    1.533333
   15    1.600000
   20    1.666667
   25    1.733333
dtype: float64

In [22]:
Lag.series(v, lag, level = level, fkeep = True) # If we do not drop t = 4 

t  i 
0  10    0.000000
   15    0.000000
   20    0.000000
   25    0.000000
1  10    1.000000
   15    1.066667
   20    1.133333
   25    1.200000
2  10    1.266667
   15    1.333333
   20    1.400000
   25    1.466667
3  10    1.533333
   15    1.600000
   20    1.666667
   25    1.733333
4  10    1.800000
   15    1.866667
   20    1.933333
   25    2.000000
dtype: float64

In [23]:
Lag.series(v, lag, level = level, fkeep = False, bfill = False) # Drop outside values (t=0)

t  i 
1  10    1.000000
   15    1.066667
   20    1.133333
   25    1.200000
2  10    1.266667
   15    1.333333
   20    1.400000
   25    1.466667
3  10    1.533333
   15    1.600000
   20    1.666667
   25    1.733333
dtype: float64

In [24]:
Lag.series(v, lag, level = level, fkeep = False, bfill = 'ss') # use steady state assumption for missing data (t=0)

t  i 
0  10    1.000000
   15    1.066667
   20    1.133333
   25    1.200000
1  10    1.000000
   15    1.066667
   20    1.133333
   25    1.200000
2  10    1.266667
   15    1.333333
   20    1.400000
   25    1.466667
3  10    1.533333
   15    1.600000
   20    1.666667
   25    1.733333
dtype: float64

**Example: Symbols defined over nd index - adjust multiple indices**

In [25]:
lag = {'t': 1, 'i': 5}
Lag.series(v, lag, fkeep = False, bfill = 'exo', exo = 0) # default options - drops values with t = 4 or i = 30 and fills the missing values (t = 0 or i = 10) with 0

t  i 
0  10    0.000000
   15    0.000000
   20    0.000000
   25    0.000000
1  10    0.000000
   15    1.000000
   20    1.066667
   25    1.133333
2  10    0.000000
   15    1.266667
   20    1.333333
   25    1.400000
3  10    0.000000
   15    1.533333
   20    1.600000
   25    1.666667
dtype: float64

In [26]:
Lag.series(v, lag, fkeep = True) # If we do not drop t = 4, i = 30

t  i 
0  10    0.000000
   15    0.000000
   20    0.000000
   25    0.000000
1  10    0.000000
   15    1.000000
   20    1.066667
   25    1.133333
   30    1.200000
2  10    0.000000
   15    1.266667
   20    1.333333
   25    1.400000
   30    1.466667
3  10    0.000000
   15    1.533333
   20    1.600000
   25    1.666667
   30    1.733333
4  15    1.800000
   20    1.866667
   25    1.933333
   30    2.000000
dtype: float64

In [27]:
Lag.series(v, lag, fkeep = False, bfill = False) # Drop outside values (t=0, i = 10)

t  i 
1  15    1.000000
   20    1.066667
   25    1.133333
2  15    1.266667
   20    1.333333
   25    1.400000
3  15    1.533333
   20    1.600000
   25    1.666667
dtype: float64

In [28]:
Lag.series(v, lag, fkeep = False, bfill = 'ss') # use steady state assumption for missing data (t = 0 or i = 10)

t  i 
0  10    1.000000
   15    1.066667
   20    1.133333
   25    1.200000
1  10    1.266667
   15    1.000000
   20    1.066667
   25    1.133333
2  10    1.533333
   15    1.266667
   20    1.333333
   25    1.400000
3  10    1.800000
   15    1.533333
   20    1.600000
   25    1.666667
dtype: float64
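The same idea extends to a `pd.MultiIndex` with lags on several levels: shift the chosen levels, rebuild the index, and reindex. The sketch below uses a hypothetical helper `lag_levels_sketch` in plain pandas; it reproduces the outputs shown above for `lag = {'t': 1, 'i': 5}`, but how `Lag.series` works internally is not something this sketch claims to show.

```python
import numpy as np
import pandas as pd

def lag_levels_sketch(v, lags, fkeep=False, bfill="exo", exo=0.0):
    """Lag a series over the levels named in `lags`, e.g. {'t': 1, 'i': 5}."""
    frame = v.index.to_frame(index=False)
    for level, lag in lags.items():
        frame[level] = frame[level] + lag                    # shift the level's elements
    shifted = pd.Series(v.to_numpy(), index=pd.MultiIndex.from_frame(frame))
    target = v.index.union(shifted.index) if fkeep else v.index
    out = shifted.reindex(target)                            # NaN where the lagged element is missing
    if bfill == "exo":
        return out.fillna(exo)
    if bfill == "ss":
        return out.fillna(v)                                 # steady state: use v at the same element
    return out.dropna()

vnd = pd.Series(
    np.linspace(1, 2, 16),
    index=pd.MultiIndex.from_product(
        [pd.Index(range(4), name="t"), pd.Index(np.arange(10, 30, 5), name="i")]
    ),
    name="v",
)
lags = {"t": 1, "i": 5}
exo_nd = lag_levels_sketch(vnd, lags)                  # missing values filled with 0
ss_nd = lag_levels_sketch(vnd, lags, bfill="ss")       # missing values filled with v itself
drop_nd = lag_levels_sketch(vnd, lags, bfill=False)    # only t = 1..3, i = 15..25 remain
keep_nd = lag_levels_sketch(vnd, lags, fkeep=True)     # adds elements with t = 4 or i = 30
```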

### 3.B. Lags and leads in the global index

*To do: Create mapping to global linear index and define auxiliary symbols from the stacked vector $x$.* 

### 3.C. Rolling/shifting

Shifting or rolling an index by a number of elements is related to lagging, but differs in what it operates on. Lagging requires the index to be numerical, and the operation is done directly on the set elements. Rolling instead moves the elements of the defined index by position and wraps around at the ends, so it relies on the ordering of the main symbol. Shifting is similar to rolling, but keeps a break between the first and last element of the index. For example, rolling the index $\lbrace 1,2,3\rbrace$ by one element yields $\lbrace 3,1,2 \rbrace$, whereas shifting it yields $\lbrace NaN, 1,2 \rbrace$, where ```NaN``` may be replaced by some other fill value.
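The difference between the two operations can be checked directly with `np.roll` and pandas' `shift`, which serve here only as stand-ins for the `Roll` helper and a shift helper:

```python
import numpy as np
import pandas as pd

elements = np.array([1, 2, 3])

# Rolling wraps around: the last element re-enters at the front
rolled = np.roll(elements, 1)

# Shifting keeps the break between last and first element: the gap becomes NaN ...
shifted = pd.Series(elements).shift(1)

# ... or some other fill value of our choice
shifted_filled = pd.Series(elements).shift(1, fill_value=0)
```

Here `rolled` contains 3, 1, 2, `shifted` contains NaN, 1, 2, and `shifted_filled` contains 0, 1, 2.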

Test rolling (for circular indices):

In [25]:
Roll.series(v, -1, level = 't')

t  i 
1  10    1.000000
   15    1.066667
   20    1.133333
   25    1.200000
2  10    1.266667
   15    1.333333
   20    1.400000
   25    1.466667
3  10    1.533333
   15    1.600000
   20    1.666667
   25    1.733333
0  10    1.800000
   15    1.866667
   20    1.933333
   25    2.000000
dtype: float64
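One way to reproduce the rolled series above in plain pandas is to relabel the elements of the chosen level with their rolled counterparts while leaving the values in place. The helper `roll_level_sketch` is hypothetical; it matches the output shown above, but whether `Roll.series` works this way internally is an assumption.

```python
import numpy as np
import pandas as pd

def roll_level_sketch(v, roll, level):
    """Relabel one index level circularly, keeping the values in their original order."""
    labels = v.index.get_level_values(level).unique()
    mapping = dict(zip(labels, np.roll(labels.to_numpy(), roll)))   # old label -> rolled label
    frame = v.index.to_frame(index=False)
    frame[level] = frame[level].map(mapping)
    return pd.Series(v.to_numpy(), index=pd.MultiIndex.from_frame(frame), name=v.name)

vroll = pd.Series(
    np.linspace(1, 2, 16),
    index=pd.MultiIndex.from_product(
        [pd.Index(range(4), name="t"), pd.Index(np.arange(10, 30, 5), name="i")]
    ),
    name="v",
)
rolled_series = roll_level_sketch(vroll, -1, level="t")
```

With `roll = -1` the labels map as 0 to 1, 1 to 2, 2 to 3 and 3 to 0, so the last block of values ends up at `t = 0` instead of being dropped.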

Shifting: Akin to lag operations, but on categorical indices.

In [26]:
shift = 1
x = np.roll(v.index.levels[0],shift)