In [1]:
import numpy as np, scipy, pandas as pd
from pyDbs.__init__ import *

# Documentation for classes in `pyDbs.simpleDB`

In [2]:
pd.set_option('display.max_rows', 5)

This notebook covers two related classes: `GpyDict` and `SimpleDB`. Both are simple (key, value) databases whose keys are strings and whose values are `Gpy` symbols (see `docs_Gpy.ipynb`). We go through each of them here.

*Define some pandas series and indices that we can add to the databases:*

In [3]:
idx1 = pd.Index(range(10), name = 'a')
idx2 = pd.Index(range(11,15), name = 'b')
mIdx1 = pd.MultiIndex.from_product([idx1,idx2])
mIdx2 = pd.MultiIndex.from_arrays([idx1[0:4], idx2[0:4]])
s1 = pd.Series(range(len(idx1)), index = idx1, name = 's1')
s2 = pd.Series(range(len(idx2[2:])), index = idx2[2:], name = 's2')
s3 = pd.Series(range(len(mIdx1[0:20])), index = mIdx1[0:20], name = 's3')
s4 = pd.Series(range(len(mIdx2)), index = mIdx2, name = 's4')

## 1. ```pyDbs.SimpleDB```

The `pyDbs.SimpleDB` class organizes `Gpy` symbols in a (key, value) database and adds a number of methods and properties on top.

#### A. Main attributes

The database is organized with three attributes - all can be specified at initialization:
* `self.name`: Name of database. The idea is to use this for storing/loading specific databases, but it is not used internally in the code for this class.
* `self.symbols`: Dictionary with (key,value) pairs.
* `self.alias`: pd.MultiIndex with two levels 'from' and 'to'. We elaborate on the use of aliased sets in a separate subsection below.

*Initialization:*

In [4]:
db = SimpleDB(name = None, symbols = None, alias = None) # initialize database with default options explicitly written

#### B. Aliased sets

The database class has built-in logic for aliased sets; these are sets that we can reference with more than one identifier. As an application, think of a geographical index, `g`, that contains a set of countries `{Denmark, Germany, Sweden}`. We can set up a dummy (defined as a multiindex) that identifies neighboring countries: this would be a dummy that links a selection of the geographic areas to the set itself. We handle this by specifying a second identifier for `g`, e.g. `gg` (we will revisit the methods for setting/getting symbols to/from the database below):

In [5]:
db['g'] = pd.Index(['Denmark', 'Germany', 'Sweden'], name = 'g')

We add an alias with the `self.updateAlias(alias = None)` method by specifying the sets and their alias:

In [6]:
db.updateAlias(alias = [('g','gg')])
db.alias # print alias mapping to see the added alias

MultiIndex([('g', 'gg')],
           names=['from', 'to'])

We add the indication of neighboring countries as follows, with the level names `g` and `gg` indicating the two identifiers used for the same geographic index:

In [7]:
db['neighbours'] = pd.MultiIndex.from_tuples([('Denmark', 'Germany'),
                                              ('Denmark', 'Sweden')], names = ['g','gg'])
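The idea behind the alias mapping can be illustrated with plain pandas. This is a minimal sketch, not the `SimpleDB` implementation: the helper `resolve_alias` is hypothetical and only shows how a `('from', 'to')` multiindex lets both levels of `neighbours` be traced back to the same base set `g`.

```python
import pandas as pd

# Alias mapping in the same shape as db.alias above: levels 'from' and 'to'.
alias = pd.MultiIndex.from_tuples([('g', 'gg')], names=['from', 'to'])

def resolve_alias(name, alias):
    """Hypothetical helper: return the set that `name` aliases, or `name` itself."""
    lookup = dict(zip(alias.get_level_values('to'), alias.get_level_values('from')))
    return lookup.get(name, name)

g = pd.Index(['Denmark', 'Germany', 'Sweden'], name='g')
neighbours = pd.MultiIndex.from_tuples(
    [('Denmark', 'Germany'), ('Denmark', 'Sweden')], names=['g', 'gg'])

# Every level of `neighbours` resolves to the base set 'g' ...
base_sets = [resolve_alias(level, alias) for level in neighbours.names]
# ... so the elements of each level can be checked against that one set.
levels_valid = all(neighbours.get_level_values(level).isin(g).all()
                   for level in neighbours.names)
```

Names without an alias entry are returned unchanged, so the same lookup works for ordinary sets.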

#### C. Built-in methods (setitem, getitem, call, iter, len, delitem, ... )

The database has specified methods for the following:
* `self.__iter__`: Iterates over `self.symbols.values()`, i.e. the `Gpy` instances stored in the (key,value) database.
* `self.__len__`: Returns length of `self.symbols`, i.e. the number of `Gpy` symbols stored in the (key,value) database.
* `self.__delitem__(item)`: Deletes `item` from `self.symbols`.
* `self.__getitem__(item)`: Returns `self.symbols[item]`, i.e. the `Gpy` instance from the (key,value) database.
* `self.__setitem__(item, value)`: Adds the (item,value) as a key,value pair in the dictionary `self.symbols`.
    * If `value` is not a `Gpy` symbol, the `value` is added as `Gpy.c(value, name = item)`, i.e. by initializing a suitable `Gpy` instance and making sure that the name of this symbol is set to `item`. If `value` is a pd.Series or pd.DataFrame --> add as `GpyVariable`, if `value` is a `pd.Index` --> add as `GpySet`, if scalar --> add as `GpyScalar`.
* `self.__call__(item, attr = 'v')`: The call method is implemented as a version of `self.__getitem__(item)` that returns a specific attribute of the `Gpy` instance. The default option is the attribute `v`, which returns the (main) pandas object of the `Gpy` symbol `item` (or simply the scalar value if the symbol is a `GpyScalar`).
* `self.set(item, value, **kwargs)`: This method is added as a version of `self.__setitem__(item, value)` that allows for additional arguments (`**kwargs`) to be passed to the `Gpy.c(value, name = item, **kwargs)` method when initializing as a `Gpy` instance.

**Examples:**

*Iter:*

In [14]:
[symbol.name for symbol in db] # test __iter__

['g', 'neighbours']

*len:*

In [16]:
len(db) # two symbols so far

2

*delitem*:

In [20]:
del db['neighbours'] # remove a symbol

*getitem*

In [23]:
db['g'] # __getitem__ returns Gpy instance

<pyDbs.gpy.GpySet at 0x186b5329550>

*setitem*

In [32]:
db['xVariable'] = pd.Series(1, index = db('g')) # add using pandas-like object
db['yVariable'] = Gpy.c(pd.Series(1, index = db('g')), name = 'yVariable') # add using Gpy instance; equivalent method (but name 'yVariable')

*call:*

In [35]:
db('xVariable') # return pandas-like object of the Gpy symbol

g
Denmark    1
Germany    1
Sweden     1
dtype: int64
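To make the behaviour of these methods concrete, the following is a rough sketch of how such a container could be written. It does not reproduce the `pyDbs` source: `MiniSymbol` and `MiniDB` are hypothetical stand-ins for a `Gpy` symbol and `SimpleDB`, and type-specific wrapping (set, variable, scalar) is left out.

```python
import pandas as pd

class MiniSymbol:
    """Hypothetical stand-in for a Gpy symbol: a name plus a main value `v`."""
    def __init__(self, v, name):
        self.v, self.name = v, name

class MiniDB:
    """Hypothetical sketch of a (key, value) database keyed by symbol name."""
    def __init__(self):
        self.symbols = {}
    def __iter__(self):
        # Iterate over the stored symbols, not over the keys.
        return iter(self.symbols.values())
    def __len__(self):
        return len(self.symbols)
    def __getitem__(self, item):
        return self.symbols[item]
    def __delitem__(self, item):
        del self.symbols[item]
    def __setitem__(self, item, value):
        # Wrap raw values in a symbol named after the key; store symbols as they are.
        if not isinstance(value, MiniSymbol):
            value = MiniSymbol(value, name=item)
        self.symbols[item] = value
    def __call__(self, item, attr='v'):
        # Shortcut for reading one attribute of a stored symbol.
        return getattr(self.symbols[item], attr)

mini = MiniDB()
mini['g'] = pd.Index(['Denmark', 'Germany', 'Sweden'], name='g')
mini['x'] = pd.Series(1, index=mini('g'))
names = [symbol.name for symbol in mini]
```

Separating `__getitem__` (returns the symbol) from `__call__` (returns the underlying pandas object) mirrors the distinction between `db['g']` and `db('g')` in the examples above.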

#### D. Other methods

The database has a few other methods included:
* `self.getTypes(types = None)`: Returns selection of `self.symbols` with `Gpy` instances of the types included in the `types` argument (`scalar`,`set`,`variable` can be added). If `types = None` this defaults to `types = ['variable']`.
* `self.getdomains(setName, types = None)`: Returns selection of `self.getTypes(types)` where the index level `setName` is in the domain of the relevant set/variable. If `types = None` this defaults to `types = ['variable']`.
* `self.aomGpy(symbol, **kwargs)`: Add or merge (aom) the symbol `symbol` to the database, where the symbol is a `Gpy` instance. This works through the method `mergeGpy` defined for `Gpy` instances (see `docs_Gpy.ipynb`).
* `self.aom(symbol, **kwargs)`: Add or merge (aom) the symbol `symbol` to the database, where the symbol and kwargs are first passed through the `Gpy.c(symbol, **kwargs)` method and then added/merged using the `self.aomGpy` method.
* `self.mergeDbs(dbOther, **kwargs)`: Iterates through `self.aomGpy` for all symbols in `dbOther`.
* `self.readSets(types = None)`: Iterates over all symbols of specified types and runs the ```self.aom``` method on every index level in the symbols. This can be useful if we e.g. supply a database with all relevant variables used in a model; we can then infer the relevant sets used in the model from this statement.
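The set-inference idea behind `readSets` can be sketched with pandas alone. This is an illustration under assumptions, not the library's code: the helper `infer_sets` is hypothetical, and it uses a plain index union where `SimpleDB` would go through `self.aom`.

```python
import pandas as pd

def infer_sets(variables):
    """Hypothetical helper: collect the unique elements of every index level."""
    sets = {}
    for series in variables:
        for level in series.index.names:
            values = series.index.get_level_values(level).unique()
            # Merge with what other variables already contributed to this set.
            sets[level] = values if level not in sets else sets[level].union(values)
    return sets

idx_a = pd.Index(range(3), name='a')
idx_b = pd.Index(range(11, 13), name='b')
s_a = pd.Series(1.0, index=idx_a)                                        # defined over a
s_ab = pd.Series(2.0, index=pd.MultiIndex.from_product([idx_a[1:], idx_b]))  # over (a, b)
s_a5 = pd.Series(3.0, index=pd.Index([5], name='a'))                     # extra element of a

sets = infer_sets([s_a, s_ab, s_a5])
```

Here the set `a` ends up as the union of the elements seen across all three variables, while `b` is read from the single variable that uses it.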

## 2. ```pyDbs.GpyDict```

The `pyDbs.GpyDict` class is very similar to `pyDbs.SimpleDB`, but differs in a couple of important ways. First, it does not implement all the methods of ```SimpleDB```; it only has 7 methods (we go through them below). Second, for the (key, value) pairs in the database, it does not assume that the key matches the name of the value.

**Methods:**
* Methods that are identical to `SimpleDB`: `self.__len__`, `self.__iter__`, `self.__delitem__`, `self.__call__`, `self.__getitem__`.

    This leaves two methods that work differently here; both concern the way symbols are added:

* `self.__setitem__(item, value)`: This allows for a couple of different ways of adding symbols to `self.symbols`
    * If `value` is a `Gpy` symbol: Straightforward add with `item` as the identifier (key) in `self.symbols` and `value` as the value.
    * Elif `item` is a string: In this case, we add the symbol as in `SimpleDB` by passing `value` through the `Gpy.c(value, name = item)` method.
    * Elif `item` is a tuple: The first element in the tuple is the identifier used as key in `self.symbols`, the second element is used as `name` when adding the `value` using `Gpy.c(value, name = item[1])`.
* `self.set(item, value, **kwargs)`: Akin to `self.__setitem__`, except it allows us to pass `**kwargs` to the `Gpy.c(value, name = item[1], **kwargs)` call.
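The three cases in `__setitem__` amount to a small dispatch on the types of `value` and `item`. The sketch below illustrates that logic only: `MiniSymbol` and `MiniDict` are hypothetical stand-ins for `Gpy` symbols and `GpyDict`, and the final `TypeError` branch is an assumption rather than documented behaviour.

```python
class MiniSymbol:
    """Hypothetical stand-in for a Gpy symbol."""
    def __init__(self, v, name):
        self.v, self.name = v, name

class MiniDict:
    """Hypothetical sketch of the three-way dispatch when adding symbols."""
    def __init__(self):
        self.symbols = {}
    def __getitem__(self, item):
        return self.symbols[item]
    def __setitem__(self, item, value):
        if isinstance(value, MiniSymbol):
            # Already a symbol: store under `item`, keep the symbol's own name.
            self.symbols[item] = value
        elif isinstance(item, str):
            # Raw value with a string key: the key doubles as the name.
            self.symbols[item] = MiniSymbol(value, name=item)
        elif isinstance(item, tuple):
            # Raw value with a (key, name) tuple: key and name may differ.
            key, name = item
            self.symbols[key] = MiniSymbol(value, name=name)
        else:
            # Assumption for this sketch: reject any other kind of key.
            raise TypeError("item must be a string or a (key, name) tuple")

d = MiniDict()
d['a'] = 1.0                                   # key and name are both 'a'
d[('keyId', 'nameOfSymbol')] = 2.0             # key 'keyId', name 'nameOfSymbol'
d['alias'] = MiniSymbol(3.0, name='original')  # key 'alias', name stays 'original'
```

Checking for an existing symbol first means a symbol's name is never overwritten by the key it is stored under.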

*Initialization:*

In [38]:
dbDict = GpyDict(symbols = None) # initialize database with default options explicitly written

*Add a symbol with a different identifier than the name*

In [48]:
dbDict[('keyId', 'nameOfSymbol')] = pd.Series(0, index = db('g'))
print(dbDict.symbols) # stored as 'keyId'

{'keyId': <pyDbs.gpy.GpyVariable object at 0x00000186B73F3A70>}


In [49]:
print(dbDict['keyId'].name) # name of the Gpy symbol

nameOfSymbol
