In [1]:
import pandas as pd, numpy as np, pyDbs
from pyDbs import SymMaps as SM, adj, adjMultiIndex
from scipy import stats
rng = np.random.default_rng(seed = 105)
stats.truncnorm.random_state = rng

# SymMaps

The class is used to navigate problems that take a single vector of inputs $x$ but deal with many different variables, potentially defined over different indices.

## Example

As an example, we'll use the optimization problem defined by:
$$\begin{align}
    \max &\sum_{i, t_0\leq t\leq T} \beta_i^{t-t_0} \omega_i \ln\left(c_{i,t}\right) \tag{1} \\
    m_{i,t} &= R_t m_{i,t-1}-c_{i,t} + y_{i,t}, && \forall  i, t_0 \leq t\leq T \tag{2}\\
    m_{i,T} &= m_{i,T-1}, && \forall i, \tag{3} \\ 
\end{align}$$
given $m_{i,t_0-1}, y_{i,t}, R_t, \beta_i, \omega_i$.

Consider the simple case with three types of agents $i \in \lbrace L, M, H\rbrace$ and five time periods $t = 2010, 2011, \ldots, 2014$, and let us collect all relevant objects in a database:

In [2]:
db = pyDbs.SimpleDB() # database
# Sets:
db['i'] = pd.Index(['L','M','H'], name  = 'i')
t0, T = 2010, 2014
db['t'] = pd.Index(range(t0, T+1), name = 't')
# Parameters:
db['β'] = pd.Series(sorted(stats.truncnorm.rvs(0, 1, size = len(db['i']))), index = db['i'])
db['ω'] = pd.Series(1/len(db['i']), index = db['i'])
db['y'] = pd.Series(1, index = pd.MultiIndex.from_product([db['i'], db['t']]))
db['R'] = pd.Series(1/db['β'].mean()-0.5+stats.truncnorm.rvs(0,1, size = len(db['t'])), index = db['t'])
db['m0']= pd.Series((10*db['β']**2).values, index = pd.MultiIndex.from_product([db['i'], db['t'][0:1]]))
db['weights'] = adjMultiIndex.bc(db['β'], db['t']).pow(pd.Series(db['t']-t0, index = db['t'])) * db['ω'] # define weighting in welfare function

### Define core symbols

The model is solved by setting up a single vector $x$ that is the stacked representation of vectors **$c, m$** that are defined for all $t,i$. We initialize a SymMaps object with these two symbols (either as ```pd.Series``` or ```pd.Index```). The ```self.compile``` method defines an index used to navigate the full vector $x$ using a global linear index:

In [3]:
s = SM(symbols = {'c': pd.MultiIndex.from_product([db['i'], db['t']]), 
                   'm': pd.MultiIndex.from_product([db['i'], db['t']])}
       )
s.compile()

To test different methods, we define a stacked vector $x$ as a numpy array:

In [4]:
x = rng.random(size = s.len)

The ```self.compile``` method establishes a dictionary of mappings (stored at ```self.maps```) that, for each symbol ($c,m$ in our case), contains a mapping from the relevant ```pd.Index``` to the numerical indices in the stacked $x$. For the $m$ variable, for instance, this mapping looks as follows:

In [5]:
s['m']

i  t   
L  2010    15
   2011    16
   2012    17
   2013    18
   2014    19
M  2010    20
   2011    21
   2012    22
   2013    23
   2014    24
H  2010    25
   2011    26
   2012    27
   2013    28
   2014    29
dtype: int64
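
To make the idea of a global linear index concrete, the following is a minimal sketch in plain pandas. It is not the pyDbs implementation; the helper name `build_maps` is hypothetical, and the sketch assumes that symbols are simply stacked in the order they are declared.

```python
import numpy as np
import pandas as pd

def build_maps(symbols):
    """Hypothetical helper: give each symbol a contiguous block of positions in x."""
    maps, offset = {}, 0
    for name, idx in symbols.items():
        # Positions offset, offset+1, ... labelled by the symbol's own index
        maps[name] = pd.Series(np.arange(offset, offset + len(idx)), index=idx)
        offset += len(idx)
    return maps, offset  # offset ends up as the total length of x

i = pd.Index(['L', 'M', 'H'], name='i')
t = pd.Index(range(2010, 2015), name='t')
full = pd.MultiIndex.from_product([i, t])

maps, n = build_maps({'c': full, 'm': full})
print(maps['m'].loc[('L', 2010)], n)  # 15 30
```

With 15 elements in $c$ stacked first, the block for $m$ starts at position 15, which agrees with the mapping shown above.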

Now, we can use the custom ```__call__``` method (with syntax ```self(x, k)```) to get the part of the vector $x$ that belongs to the symbol ```k```:

In [50]:
s(x, 'm') # get the values in vector x that we refer to as the symbol 'm'

array([0.380102  , 0.35163855, 0.21946259, 0.72390011, 0.37699435,
       0.37691073, 0.63893963, 0.34841258, 0.67772011, 0.61233288,
       0.50691337, 0.64130159, 0.63316618, 0.33799887, 0.65869185])

If we prefer to return the object as a ```pd.Series```, we can instead use the ```get``` method:

In [59]:
s.get(x, 'm')

i  t   
L  2010    0.380102
   2011    0.351639
   2012    0.219463
   2013    0.723900
   2014    0.376994
M  2010    0.376911
   2011    0.638940
   2012    0.348413
   2013    0.677720
   2014    0.612333
H  2010    0.506913
   2011    0.641302
   2012    0.633166
   2013    0.337999
   2014    0.658692
Name: m, dtype: float64
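
The difference between the two return types can be sketched with plain numpy and pandas. This is an illustration under assumptions, not the library's code: the position map `pos` and the toy vector `x` below are made up for the example.

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_product(
    [pd.Index(['L', 'M'], name='i'), pd.Index([2010, 2011], name='t')])
pos = pd.Series(np.arange(4, 8), index=idx)  # assumed map: 'm' occupies x[4:8]

x = np.arange(8) / 10                         # toy stacked vector

# Array version: integer indexing into x (analogous in spirit to s(x, 'm'))
as_array = x[pos.to_numpy()]

# Series version: same values, labelled by the symbol's index (analogous to s.get(x, 'm'))
as_series = pd.Series(as_array, index=pos.index, name='m')
```

The array version avoids constructing a pandas object, which matters when the function is evaluated many times by a solver; the series version is convenient for inspection and for index-aligned arithmetic.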

Finally, the ```getr``` method performs the "getting" more robustly; we can for instance pass kwargs used to subset the symbol relying on the ```adj.rc_pd``` method. This gets the $m$ vector, but only keeps the "L" and "M" types:

In [1]:
s.getr(x, 'm', c = pd.Index(['L','M'], name = 'i')) # the 'c' keyword will be passed to the method adj.rc_pd


### Adjusted symbols

In some instances, we need some adjusted version of the symbols $m,c$ that $x$ consists of. For instance, in the example in equations (1)-(3), we need to get the lagged version $m_{i,t-1}$.

* Adjusted symbols are defined using mappings from the original main index to a new adjusted one. The ```pd.Index``` to ```pd.Index``` mapping is stored at ```self.auxMapsIdx```.
* We include default methods for creating lagged, rolled, or shifted indices, and to fill in assumptions of missing values, steady state and similar.
* The compilation stage creates fixed maps to the global linear index and commits this to the ```self.auxMaps``` attribute.
* For dynamically compiled objects, we can always use the ```getr``` method directly with relevant mappings or lags/rolls/shifts in the indices.

#### Roll/shift/lag index

The three methods roll/shift/lag have slightly different uses:
1. Lag: Numerical index levels can be lagged, i.e. the operation is applied directly to the elements; this can result in new index elements. 
2. Roll: As the name suggests, this uses already defined index levels and rolls them. That is, this relies on the ordering of the main symbol. 
3. Shift: Functions as the Roll, but keeps a break between the first and last element in an index. For example, rolling the index $\lbrace 1,2,3\rbrace$ one element yields $\lbrace 3,1,2 \rbrace$, whereas shifting it leaves $\lbrace NaN, 1,2 \rbrace$ or some other value that we use to fill in for ```NaN```.
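
The three operations can be illustrated on the index $\lbrace 1,2,3\rbrace$ with plain numpy and pandas. This is only a sketch of the concepts; it does not use the pyDbs methods.

```python
import numpy as np
import pandas as pd

idx = pd.Index([1, 2, 3], name='t')
labels = pd.Series(idx, index=idx)

# Lag: arithmetic on the labels themselves; 0 is a new element not in idx
lagged = idx - 1                                          # [0, 1, 2]

# Roll: reorder the existing labels, wrapping the last one to the front
rolled = pd.Index(np.roll(idx.to_numpy(), 1), name='t')   # [3, 1, 2]

# Shift: like roll, but the wrap-around is replaced by a missing value
shifted = labels.shift(1)                                 # [NaN, 1, 2]

# One way to fill the break: let the first element map to itself
shifted_nn = shifted.fillna(labels)                       # [1, 1, 2]
```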

##### 1. Lag index:

In [9]:
lags = {'t': 1} # lag the 't' index by 1
symbol = 'm'
name = 'm[t-1]'
kwargs = {'c': ('not', pd.Index([t0], name = 't'))} # 'c' is a condition on the index. Here, don't use initial year t0

Add this as an adjusted symbol that we can access with the syntax ```'m[t-1]'```. Note that lagging the index yields values of $t$ that are not defined (in our case $t=2009$), hence the condition not to include $t_0$:

In [10]:
s.addLaggedSym(name, symbol, lags, **kwargs)

We add this symbol by (1) adding a mapping to ```self.auxMapsIdx``` from the original to new index, (2) using this mapping on the "full" symbol already stored in ```self.maps```: 

In [11]:
m = s.lagMaps(adj.rc_pd(s[symbol], **kwargs), lags) # step 1: obtain new mapping - this is added to self.auxMapsIdx
newMap = s.applyMapGlobalIdx(s[symbol], m) # step 2: use mapping to map to global linear index - this is added to self.auxMaps

Note that, instead of adding the criterion to exclude the first period $t_0$ from the lagged symbol, we can choose to "dropna" automatically. This eliminates any mapped index that is not included in the "global" counterpart:

In [12]:
s.addLaggedSym(name, symbol, lags, dropna = True)

This is slower than explicitly dropping the unavailable index levels, because it checks the entire index of the symbol against the "global" counterpart.
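
The two ways of handling undefined lags can be compared in a small pandas sketch. The position map `m` below is a toy stand-in and the steps are an assumption about the general idea, not a reproduction of ```addLaggedSym```.

```python
import numpy as np
import pandas as pd

i = pd.Index(['L', 'M'], name='i')
t = pd.Index(range(2010, 2013), name='t')
m = pd.Series(np.arange(6), index=pd.MultiIndex.from_product([i, t]))  # toy position map

# Lag the 't' level by 1: each (i, t) should point to the label (i, t-1)
lagged_labels = pd.MultiIndex.from_arrays(
    [m.index.get_level_values('i'), m.index.get_level_values('t') - 1],
    names=['i', 't'])

# Look up the positions of the lagged labels; (i, 2009) does not exist and gives NaN
lagged = pd.Series(m.reindex(lagged_labels).to_numpy(), index=m.index)

# "dropna"-style: check every entry and discard those without a counterpart
lagged_dropna = lagged.dropna().astype(int)

# Explicit condition: exclude the initial year directly, no full check needed
explicit = lagged[lagged.index.get_level_values('t') != 2010].astype(int)
```

Both routes give the same map here; the explicit condition just encodes knowledge about which labels are missing instead of discovering it.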

##### 2. Roll index:

Roll the yearly index by 1, following the order in which the index is sorted. Generally, the method returns e.g. $m[t-1]$ when the index $t$ is a range index. By convention, rolling wraps around: the first element $t_0$ maps to the last element $T$, i.e. $\text{roll}(t_0, 1)\mapsto T$.

In [13]:
rolls = {'t': 1} # roll the 't' index by 1
symbol = 'm'
name = 'mRoll'
s.addRolledSym(name, symbol, rolls)
pd.concat([s['m'].rename('m'), s['mRoll'].rename('rolled')], axis = 1) # compare original to the rolled symbol

         m  rolled
i t               
L 2010  15      19
  2011  16      15
  2012  17      16
  2013  18      17
  2014  19      18
M 2010  20      24
  2011  21      20
  2012  22      21
  2013  23      22
  2014  24      23
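
The rolled map in the table above can be reproduced for the types $L$ and $M$ with a group-wise `np.roll`. This is a sketch under the assumption that rolling happens within each $i$; it is not taken from the pyDbs source.

```python
import numpy as np
import pandas as pd

i = pd.Index(['L', 'M'], name='i')
t = pd.Index(range(2010, 2015), name='t')
m = pd.Series(np.arange(15, 25),
              index=pd.MultiIndex.from_product([i, t]), name='m')

# Roll the positions by one within each agent type; the first year wraps to the last
rolled = m.groupby(level='i', sort=False).transform(
    lambda g: pd.Series(np.roll(g.to_numpy(), 1), index=g.index)
).rename('rolled')

# Applying the rolled positions to a stacked vector picks out the rolled values
x = np.linspace(0.0, 1.0, 25)       # toy stacked vector
m_rolled = x[rolled.to_numpy()]
```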


##### 3. Shift index:

Shifting the index by 1 is similar to rolling it, i.e. it returns $m[t-1]$ when $t$ is a range index. However, whereas rolling wraps the first element around to the last by convention ($\text{roll}(t_0,1)\mapsto T$), shifting lets us handle this break in different ways. For instance, passing the option ```opt = {'useLoc': 'nn'}``` applies the "nearest neighbor" (or steady state) assumption. That is, the convention is that $\text{shift}(t_0, 1) \mapsto t_0$:

In [14]:
shifts = {'t': 1}
symbol = 'm'
name = 'mShifted'
opt = {'useLoc': 'nn'} # nearest-neighbor loc
s.addShiftedSym(name, symbol, shifts, opt = opt)
pd.concat([s['m'].rename('m'), s['mShifted'].rename('Shifted')], axis = 1) # compare original to the shifted symbol

         m  Shifted
i t                
L 2010  15       15
  2011  16       15
  2012  17       16
  2013  18       17
  2014  19       18
M 2010  20       20
  2011  21       20
  2012  22       21
  2013  23       22
  2014  24       23


## Example - revisited

Now, let us create the relevant symbols to use throughout the example. In the objective (1), we need to sum over $i$ and $t$ and match operations on the $i$-index. We can match the $i$ index either by setting up a matrix with corresponding dimensions or, lazily, by relying on pandas index alignment:

In [15]:
def fOpt(x):
    return sum(s.get(x,'c').apply(np.log) * db['weights'])

Next, let us define the law of motion *gap* for all $i,t$:

In [16]:
def LOM(x):
    return (db['R'] * pd.concat([s.get(x,'m[t-1]'), db['m0']], axis = 0) - s.get(x,'c')+ db['y'] - s.get(x,'m')).values

Finally, we can define the transversality condition either directly or by adding the specific elements $m[i,T]$ and $m[i,T-1]$ to the compiler as adjusted symbols. This last version means that the function call does not need to identify the right part of the vector $x$ every time it is evaluated:

In [17]:
s.auxMaps['m[T]'] = adj.rc_pd(s['m'], db['t'][-1:])
s.auxMaps['m[T-1]'] = adj.rc_pd(s['m'], db['t'][-2:-1])
def TVC(x):
    return s(x,'m[T]')-s(x,'m[T-1]')
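
To see what the gap functions measure, here is a self-contained numpy sketch for a single agent. All numbers (`R`, `m0`, the consumption path) are made up for illustration, and the stacking order of `x` is an assumption.

```python
import numpy as np

T = 5                       # number of periods (toy value)
R = np.full(T, 1.05)        # assumed constant gross return
y = np.ones(T)              # income of 1 each period, as in the example
m0 = 2.0                    # assumed initial assets m_{t0-1}

def unpack(x):
    return x[:T], x[T:]     # x stacks c first, then m

def lom_gap(x):
    c, m = unpack(x)
    m_lag = np.concatenate(([m0], m[:-1]))  # m_{t-1}, with m0 filling the first period
    return R * m_lag - c + y - m            # zero wherever equation (2) holds

def tvc_gap(x):
    _, m = unpack(x)
    return m[-1] - m[-2]                    # zero when equation (3) holds

# Build a path that satisfies (2) by construction and evaluate both gaps
c = np.full(T, 1.2)
m = np.empty(T)
prev = m0
for k in range(T):
    m[k] = R[k] * prev - c[k] + y[k]
    prev = m[k]
x = np.concatenate((c, m))
print(np.allclose(lom_gap(x), 0.0))  # True
```

A solver would drive both gaps to zero simultaneously; the path above satisfies the law of motion but, in general, not the terminal condition.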

## Calling adjusted symbols on the fly (without compilation)

Now, assume that we have solved the model and the vector $x$ is the solution vector. We can unload the solution to a dictionary of ```pandas``` series with the call:

In [18]:
sol = s.unloadSol(x)

This dict also includes the adjusted symbols that we have added along the way. But, let us now assume that all we want is the adjusted symbol $m[t+1]$ with the steady state assumption. We could include this in the compilation stage as a symbol and then automatically have it unloaded with the method ```unloadSol```, but we can also call it *ad hoc* from the solution vector $x$:

In [19]:
# Options used to create symbol:
name = 'm[t+1]'
symbol = 'm'
shifts = {'t': -1}
opt = {'useLoc': 'nn'}
s.getShiftFromSol(x, symbol, shifts, opt = opt) # get the shifted symbol directly from x

i  t   
L  2010    0.351639
   2011    0.219463
   2012    0.723900
   2013    0.376994
   2014    0.376994
M  2010    0.638940
   2011    0.348413
   2012    0.677720
   2013    0.612333
   2014    0.612333
H  2010    0.641302
   2011    0.633166
   2012    0.337999
   2013    0.658692
   2014    0.658692
dtype: float64

We can also get the symbol directly from the "non-shifted" version of the symbol:

In [20]:
s.getShift(sol['m'], shifts, opt = opt)

i  t   
L  2010    0.351639
   2011    0.219463
   2012    0.723900
   2013    0.376994
   2014    0.376994
M  2010    0.638940
   2011    0.348413
   2012    0.677720
   2013    0.612333
   2014    0.612333
H  2010    0.641302
   2011    0.633166
   2012    0.337999
   2013    0.658692
   2014    0.658692
dtype: float64
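
The same forward shift can be checked by hand with plain pandas, using the values of $m$ for the types $L$ and $M$ printed earlier. The group-wise shift plus fill below is an assumed equivalent of the nearest-neighbor option, not the library's own code.

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [pd.Index(['L', 'M'], name='i'), pd.Index(range(2010, 2015), name='t')])
m = pd.Series([0.380102, 0.351639, 0.219463, 0.723900, 0.376994,
               0.376911, 0.638940, 0.348413, 0.677720, 0.612333], index=idx)

# Shift by -1 within each agent type to get m[t+1]; the last year has no
# successor, so it is filled with its own value (steady-state assumption)
m_next = m.groupby(level='i', sort=False).shift(-1).fillna(m)
```

For both types, the result repeats the 2014 value in the last period, which matches the output shown above.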

Finally, the ```getShift``` method also lets us use lags/shifts/rolls of the indices of series that are not among the compiled symbols. For instance, the interest rate:

In [21]:
x = db['R']
shifts = -1
opt = {'useLoc': 'nn'}
s.getShift(x, shifts, opt = opt)

t
2010    1.667108
2011    1.498320
2012    2.168236
2013    1.801419
2014    1.801419
dtype: float64

## Adding a scalar

In [22]:
sTest = SM(symbols = s.symbols.copy())
sTest.symbols['scalar'] = None # add a scalar symbol.
sTest.compile()

Test vector:

In [23]:
x = rng.random(size = sTest.len)

The ```get``` method still returns a ```pd.Series``` (it always does):

In [24]:
sTest.get(x, 'scalar')

0    0.274523
Name: scalar, dtype: float64

But the more robust method, ```getr```, returns it as a scalar:

In [25]:
sTest.getr(x,'scalar')

0.27452286136122217
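
One way to picture a scalar symbol is as a single slot in the stacked vector. The layout below is hypothetical (the scalar is placed last and labelled by a length-one range index) and is only meant to show why one access style yields a one-element series and the other a float.

```python
import numpy as np
import pandas as pd

t = pd.Index(range(2010, 2013), name='t')
maps = {
    'm': pd.Series(np.arange(3), index=t),               # positions 0..2
    'scalar': pd.Series([3], index=pd.RangeIndex(1)),    # assumed: one slot at the end
}

x = np.array([0.1, 0.2, 0.3, 0.9])  # toy stacked vector

pos = maps['scalar']
as_series = pd.Series(x[pos.to_numpy()], index=pos.index, name='scalar')  # length-one series
as_float = float(as_series.iloc[0])                                       # plain Python float
```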