# Some Python basics

Before using `modelflow` with the World Bank's MFMod models, users will need to understand at least some basic elements of `python` syntax and usage. In particular, they will need to understand what packages, libraries and classes are, and how to access them.

## Starting Python in Windows

To begin using `modelflow`, python itself needs to be started.  This can be done either using the `Anaconda` navigator or from the command line shell. In either case, the user will need to start python and select the `modelflow` environment. 


## Starting a Python session with modelflow

Start the `Anaconda Prompt (miniconda3)` app. This opens a command window with the base python environment active. Next, switch to the modelflow environment:

```
conda activate modelflow

```

Now start Jupyter from the folder where you want to work:

```
cd <path where you want to start> 
jupyter notebook
```

 
## Starting Python from the command line

First, activate Anaconda/conda by opening a command prompt and issuing:

```
%USERPROFILE%\miniconda3\Scripts\activate.bat
```

Then, to get access to the library, activate the modelflow environment with the same command as above: `conda activate modelflow`.

## Python  packages, libraries and classes

Some features of `python` are built in and available out of the box. Others build on these basic features.

A **python class** is a code template that defines a python object. Classes can have properties (variables or data) and methods (behaviours or functions) associated with them. In python a class is defined with the keyword `class`. An object of a class is created (instantiated) by calling the class's "constructor", a special method that creates an object that is an instance of the class.
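As an illustration (a hypothetical toy class, not part of modelflow), the sketch below defines a class with two properties and one method, then instantiates it by calling the class name, which runs the special initializer method `__init__`. The names `Economy`, `grow` and `"Atlantis"` are invented for this example.

```python
class Economy:
    """A toy class (hypothetical, for illustration only)."""

    def __init__(self, name, gdp):
        # Runs when a new instance is created; stores the instance's properties
        self.name = name  # property: the economy's name
        self.gdp = gdp    # property: the level of GDP

    def grow(self, rate):
        # Method: updates this instance's GDP by a growth rate and returns it
        self.gdp = self.gdp * (1 + rate)
        return self.gdp


# Calling the class name creates (instantiates) an object of the class
country = Economy("Atlantis", 100.0)
country.grow(0.05)
print(country.name, country.gdp)
```

Each instance keeps its own copy of the properties, so a second `Economy(...)` object would grow independently of `country`.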

A **module** is a Python object consisting of Python code. A module can define functions, classes and variables. A module can also include runnable code.

A **python package** is a collection of related modules. When a program needs a module from an external package, that package (or the module within it) must be **imported** into the current session before its contents can be used.

A **python library** is a collection of related modules or packages. 


`Modelflow` is a python package that *inherits* (builds on or adds to) the methods and properties of classes from other `python` packages like `pandas`, `numpy` and `matplotlib`.

```{note}
In modelflow the model is a class, and we can create an instance of a model (an object filled with the characteristics of the class) by executing the code `mymodel = model(myformulas)`. See below for a working example.
```
## Importing packages, libraries, modules and classes

Some libraries, packages, and modules are part of the core python package and will be available (importable) from the get-go.  Others are not, and need to be installed before importing them into a session.

If you followed the modelflow installation instructions you have already downloaded and installed on your computer all the packages necessary for running World Bank models under modelflow.  But to work with them in a given Jupyter Notebook session or in a program context, you will also need to ```import``` them into your session before you call them.  

:::{note}
**Installation** of a package is not the same as **import**ing a package. To be imported a package must be installed once on the computer that wishes to use it.  Once it has been installed, the package must be imported into each python session where it is to be used.
:::

Typically a python program will start with the importation of the libraries, classes and modules that will be used.  Because a Jupyter Notebook is essentially a heavily annotated program, it also requires that packages used be imported.

As described above, packages, libraries and modules are containers that can include other elements. Take for example the module `math`, which is part of the Python standard library.

To import `math` we execute the command `import math`. Having done that, we can call the functions and access the constants defined in it.

```python
# In a code cell, "#" starts a comment: text after the # is not executed
import math

# Now that math has been imported, we can access the elements defined in it.
# For example, math defines the constant pi, which we access by prefixing it
# with the module name
math.pi
```

```
3.141592653589793
```

### Import specific elements or classes from a module or library

The python module `math` contains several functions and constants.

These can also be imported directly, so that they can be called without being preceded by the name of their library. To do this, we use the **from** syntax: `from math import pi,cos,sin` imports the constant pi and the two functions cos and sin, and allows us to call them directly.

Compare these calls with the one in the preceding section: there, pi had to be preceded by its namespace designator, i.e. `math.pi`. Below, pi is imported directly and can be referred to simply as `pi`.

```python
from math import pi,cos,sin

print(pi)
print(cos(3))
```

```
3.141592653589793
-0.9899924966004454
```


### Import a module or class but give it an alias

Instead of using its full name as above, a module or class can be given an alias: a name that is hopefully shorter but still obvious enough that the reader knows what is being referred to.

For example, `import math as m` allows pi to be accessed with the more succinct syntax `m.pi`.

```python
import math as m
print(m.pi)
print(m.cos(3))
```

```
3.141592653589793
-0.9899924966004454
```


### Standard aliases

Some packages are so frequently used that by convention they have been "assigned" specific aliases.

For example:

**Common aliases**

|Alias|Aliased package|Example|Functionality|
|:--|:--|:--|:--|
|pd|pandas|`import pandas as pd`|Pandas is used for storing and retrieving data|
|np|numpy|`import numpy as np`|Numpy gives access to some advanced mathematical features|


You don't have to use these conventions, but following them will make your code easier to read for others who are familiar with them.




# Introduction to Pandas dataframes

Modelflow is built on top of the Pandas library. Pandas is the Swiss Army knife of data science and can perform an impressive array of data-oriented tasks.

This tutorial is a very short introduction to how pandas dataframes are used with Modelflow. For a more complete discussion see any of the many tutorials on the internet, notably:


* [Pandas homepage](https://pandas.pydata.org/)
* [Pandas community tutorials](https://pandas.pydata.org/pandas-docs/stable/getting_started/tutorials.html)



## Import the pandas library

As with any python program, a package or library must first be imported into the session before it can be used. As noted above, by convention pandas is imported as pd.

```python
import pandas as pd
```

Pandas, like any library, contains many classes and methods. The discussion below focuses on **Series** and **DataFrames**, two classes that are part of the pandas library. Both `Series` and `DataFrames` are containers that can be used to store time-series data, and they come with a number of very useful methods for displaying and manipulating such data.
Unlike in some other statistical packages, neither `Series` nor `DataFrames` are inherently or exclusively time-series in nature. `Modelflow` and macroeconomists use them in this way, but out of the box the classes themselves are not dated in any way.

## The `pandas` class `Series`

A pandas `Series` is a class that can be used to instantiate an object holding a one-dimensional array of values together with an index (a label for each value).

The constructor for a `Series` object is `pandas.Series()`. What is passed inside the parentheses determines the nature of the series object generated. Python does not support method overloading in the way that C++ or Java do, but a single constructor can accept arguments of different types and behave differently depending on how the data used to initialize it are organized. `pandas.Series()` works this way: it accepts, among other things, a list, a dictionary, a NumPy array or a single value.
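To illustrate, the sketch below passes four different kinds of input to the same `pd.Series()` constructor. The variable names are only for illustration.

```python
import numpy as np
import pandas as pd

# One constructor, four kinds of input
from_list = pd.Series([1, 2, 3])                       # list -> default 0-based index
from_array = pd.Series(np.array([1.5, 2.5]))           # NumPy array of floats
from_dict = pd.Series({2020: 10, 2021: 11})            # dict -> keys become the index
from_scalar = pd.Series(0, index=[2020, 2021, 2022])   # scalar -> repeated for every label

print(from_scalar)
```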

### Series declared from a list

The simplest way to create a Series is to pass an array of values as a Python list to the Series constructor.

```{note} 
A list in python is a comma-delimited collection of items. The items can be text, numbers or even more complex objects. When declared (and when displayed), lists are enclosed in square brackets.

For example, both of the following are perfectly good lists:

`mylist=[2,7,8,9]`

`mylist2=["Some text","Some more Text",2,3]`

The list is entirely agnostic about the type of data it contains.

```

In the examples below, `Simplest`, `simple2` and `simple3` are all series, although `simple3`, which is derived from a list mixing text and numeric values, would be hard to interpret as an economic series.

```python
values=[7,8,9,10,11]
weird=["Some text","Some more Text",2,3]

# Here the constructor is passed a numeric list
Simplest=pd.Series([2,3,4,5,6])
Simplest
```

```
0    2
1    3
2    4
3    5
4    6
dtype: int64
```

```python
# In this case the constructor is passed a variable that contains a list
simple2=pd.Series(values)
simple2
```

```
0     7
1     8
2     9
3    10
4    11
dtype: int64
```

```python
# Here the constructor is passed a variable containing a list that is a mix of
# text and numeric values
simple3=pd.Series(weird)
simple3
```

```
0         Some text
1    Some more Text
2                 2
3                 3
dtype: object
```

Note that the three series do not all have the same length: `Simplest` and `simple2` have five elements, while `simple3` has four.

Moreover, when constructed in this way (by passing a list to the constructor), each of these `Series` is automatically assigned a zero-based index (a numerical index that starts at 0).

### Series declared using a specific index

In this example the series `Simplest` and `simple2` are recreated (overwritten), but this time an index is specified. Here the index is declared as another list.

```python
# In this example the constructor is given both the values
# and specific values for the index
Simplest=pd.Series([2,3,4,5,6],index=[1966,1967,1996,1999,2000])
Simplest
```

```
1966    2
1967    3
1996    4
1999    5
2000    6
dtype: int64
```

```python
simple2=pd.Series(values,index=[1966,1967,1996,1999,2000])
simple2
```

```
1966     7
1967     8
1996     9
1999    10
2000    11
dtype: int64
```

Now the Series look more like time series data!

### Create Series from a dictionary

In python a dictionary is a data structure that is more generally known in computer science as an associative array. A dictionary consists of a collection of key-value pairs, where each key-value pair *maps* or *links* the key to its associated value.  

```{note}
A dictionary is enclosed in curly brackets {}, versus a list which is enclosed in square brackets[].
```

Thus `mydict={"1966":2,"1967":3,"1968":4,"1969":5,"2000":-15}` creates an object called `mydict`, which maps (or links) the key "1966" to the value 2.
```{note}
In this example the key is a string, but we could just as easily have made it a numerical value:
```

`mydict2={1966:2,1967:3,1968:4,1969:5,2000:-15}` creates an object called `mydict2` that links (maps) the integer key 1966 to the value 2.


The series constructor also accepts a dictionary, mapping its keys to the index of the Series and its values to the values of the Series.



```python
mydict2={1966:2,1967:3,1968:4,1969:5,2000:6}
simple2=pd.Series(mydict2)
simple2
```

```
1966    2
1967    3
1968    4
1969    5
2000    6
dtype: int64
```
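The distinction between string and integer keys carries over to the index of the resulting series. The sketch below (variable names are illustrative) builds one series from each kind of dictionary; a label must then be looked up with the same type as the original key.

```python
import pandas as pd

# String keys produce string index labels; integer keys produce integer labels
str_keys = pd.Series({"1966": 2, "1967": 3})
int_keys = pd.Series({1966: 2, 1967: 3})

print(str_keys["1966"])  # look up with a string label
print(int_keys[1966])    # look up with an integer label
```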

## Properties and methods of `DataFrames` in `modelflow`

Any class can have both properties (data) and methods (functions that operate on the data of a particular instance of the class). In object-oriented programming languages like python, classes can be built as supersets of existing classes. The `modelflow` class `model` inherits or encapsulates all of the features of the pandas dataframe and extends them in many important ways. Some of the methods below are standard pandas methods; others have been added by `modelflow`.

Much more detail on standard pandas dataframes can be found on the [official pandas website](https://pandas.pydata.org/docs/reference/frame.html).

### `DataFrame`s
The `DataFrame` is the primary structure of pandas: a two-dimensional data structure with named rows and columns. Each column can have a different data type (numeric, string, etc.).

By convention, a dataframe is often called df, or some other modifier followed by df, to make the code easier to read.

### Creating or instantiating a dataframe

Like any object, a `DataFrame` can be created by calling the constructor of the pandas class `DataFrame`.  

There are very many ways to create a `DataFrame`. The `pandas.DataFrame()` constructor accepts data organized in several forms (as with `Series`), but always returns an instance of a `DataFrame` object, i.e. a variable whose contents are a `DataFrame`.

The code example below creates a `DataFrame` with three columns, B, C and E, indexed from 2018 to 2021. Macroeconomists may interpret the index as dates, but for pandas they are just numbers.

Below a `DataFrame` named `df` is instantiated from a dictionary and assigned a specific index by passing a list of years as the index.

```python
df = pd.DataFrame({'B': [1,1,1,1],'C':[1,2,3,6],'E':[4,4,4,4]},index=[2018,2019,2020,2021])
df
```

```
      B  C  E
2018  1  1  4
2019  1  2  4
2020  1  3  4
2021  1  6  4
```


```{note}
In the `DataFrame`s used in macrostructural models like MFMod, each column is often interpreted as the time series of an economic variable. So in this dataframe, B, C and E would each normally be interpreted as economic time series.

That said, there is nothing in the `DataFrame` class that suggests that the data it stores must be time-series or even numeric in nature.

```

### Adding a column to a dataframe

If a value is assigned to a column that does not exist, pandas adds a column with that name and fills it with the assigned values (or the result of the calculation).

:::{note}

The size of the object (for example a list) assigned to the new column must match the size (number of rows) of the pre-existing `DataFrame`. A single value is the exception: it is broadcast to every row.
:::



```python
df["NEW"]=[10,12,10,13]
df
```

```
      B  C  E  NEW
2018  1  1  4   10
2019  1  2  4   12
2020  1  3  4   10
2021  1  6  4   13
```


### Revising values

If the column already exists, assigning to it with `=` replaces the values in its rows with the values on the right-hand side of the statement.

```{warning}
The dimensions of the list assigned with `=` must match those of the `DataFrame` (i.e. there must be exactly as many values as there are rows). Alternatively, if only a single value is provided, it replaces all of the values in the specified column (it is broadcast to every row of the column).
```

```python
df["NEW"]=[11,12,10,14]

df
```

```
      B  C  E  NEW
2018  1  1  4   11
2019  1  2  4   12
2020  1  3  4   10
2021  1  6  4   14
```


```python
# replace all of the rows of column B with the same value
df['B']=17
df
```

```
       B  C  E  NEW
2018  17  1  4   11
2019  17  2  4   12
2020  17  3  4   10
2021  17  6  4   14
```
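A minimal sketch of the rules in the warning above, using a small stand-alone `DataFrame`: a single value is broadcast, a list of the right length is accepted, and a list of the wrong length is rejected by pandas with a `ValueError`.

```python
import pandas as pd

df = pd.DataFrame({"B": [1, 1, 1, 1]}, index=[2018, 2019, 2020, 2021])

df["B"] = 17                   # a single value is broadcast to every row
df["NEW"] = [11, 12, 10, 14]   # a list must supply exactly one value per row
values_before = df["NEW"].tolist()

mismatch_rejected = False
try:
    df["NEW"] = [1, 2, 3]      # three values for four rows
except ValueError as err:
    mismatch_rejected = True
    print("Rejected:", err)
```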


## Column names in Modelflow
```{margin} Modelflow variable names
Modelflow places more restrictions on column names than do pandas *per se*.

```
While pandas dataframes are very liberal in what names can be given to columns, ```modelflow``` is more restrictive.

Specifically, in modelflow a variable name must:

* start with a letter
* be upper case

Thus while all these are legal column names in pandas, some are illegal in modelflow.

| Variable name | Legal in<br>modelflow? | Reason |
|:-------|:-------------|:--------|
| IB | Yes | <span style='color:Green'>Starts with a letter and is uppercase</span> |
| ib | No |<span style='color:red'> lowercase letters are not allowed</span>|
| 42ANSWER | No |<span style='color:Red'> does not start with a letter </span>|
| \_HORSE1 | No |<span style='color:Red'>does not start with a letter </span>|
| A_VERY_LONG_NAME_THAT_IS_LEGAL_3 | Yes |<span style='color:Green'> Starts with a letter and is uppercase </span>|
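The two rules above can be checked mechanically. The helper below is hypothetical (it is not part of modelflow) and assumes, beyond the two rules stated, that the remaining characters may only be upper-case letters, digits or underscores, as in the examples in the table.

```python
import re

# Hypothetical check, not a modelflow function: first character an upper-case
# letter; the rest (assumption) upper-case letters, digits or underscores
LEGAL_NAME = re.compile(r"[A-Z][A-Z0-9_]*")


def is_legal_modelflow_name(name):
    """Return True if name satisfies the assumed modelflow naming rules."""
    return LEGAL_NAME.fullmatch(name) is not None


for name in ["IB", "ib", "42ANSWER", "_HORSE1", "A_VERY_LONG_NAME_THAT_IS_LEGAL_3"]:
    print(f"{name:35} {is_legal_modelflow_name(name)}")
```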

## .index and time dimensions in Modelflow

As we saw above, series have indices.  Dataframes also have indices, which are the row names of the dataframe.

In `modelflow` the index is typically understood to represent a date.

For yearly models a list of integers, as in the above example, works fine.

For higher-frequency models the index can be one of pandas' date-related index types, such as a `PeriodIndex` or a `DatetimeIndex`.

:::{warning}

Not all date types work well with the graphics routines of modelflow. Users are advised to use the `pd.period_range()` method to generate date indexes.

For example:
```python
# df must have exactly as many rows as there are quarters in dates
dates = pd.period_range(start='1975q1',end='2125q4',freq='Q')
df.index=dates
```

:::
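A shorter, self-contained sketch of the same approach, assuming a quarterly `DataFrame` of eight rows: the `PeriodIndex` returned by `pd.period_range()` must have exactly as many entries as the `DataFrame` has rows.

```python
import pandas as pd

# Eight quarters, 2020Q1 to 2021Q4, returned as a PeriodIndex
dates = pd.period_range(start="2020Q1", end="2021Q4", freq="Q")

# A DataFrame with one row per quarter; the lengths must match
df = pd.DataFrame({"A": range(8)})
df.index = dates
print(df.head(3))
```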

### Leads and lags
In modelflow, lags are indicated by following the variable name with a number in parentheses: `A(-1)` and `A(-2)` denote one- and two-period lags (the number after the minus sign gives the number of periods lagged). Positive numbers are used for leads (no + sign is required).

When a method defined by the `modelflow` class encounters something like `A(-1)`, it takes the value from the row above the current row, regardless of whether the index is an integer, a year, a quarter or a millisecond. The same goes for leads: `A(+1)` returns the value of `A` in the next row.

As a result, in a quarterly model `B=A(-4)` assigns to B the value of A in the same quarter of the previous year.
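Modelflow handles `A(-1)` itself, but for readers coming from pandas the row-based behaviour described above corresponds to `Series.shift()`. The sketch below is only a pandas analogue, not how modelflow implements lags: `shift(1)` moves values down one row (a lag) and `shift(-1)` moves them up one row (a lead), leaving `NaN` where no earlier or later row exists.

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30, 40, 50]}, index=[2018, 2019, 2020, 2021, 2022])

# Pandas analogue of modelflow's A(-1): the value from the row above
df["A_LAG1"] = df["A"].shift(1)
# Pandas analogue of A(+1): the value from the row below
df["A_LEAD1"] = df["A"].shift(-1)
print(df)
```

By the same analogy, `B=A(-4)` in a quarterly model corresponds to `df["A"].shift(4)`.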



### .columns lists the column names of a dataframe

The attribute `.columns` returns the names of the columns in the dataframe.

```python
df.columns
```

```
Index(['B', 'C', 'E', 'NEW'], dtype='object')
```

### .size returns the number of elements

Applied to the column names, `df.columns.size` therefore returns the number of columns in a dataframe.

```python
df.columns.size
```

```
4
```

The dataframe df has 4 columns. 

### .eval() calculates an expression on the data of a dataframe

`.eval()` is a native dataframe method that performs calculations on a `DataFrame` and returns a revised copy of the `DataFrame`. With this method, expressions can be evaluated and new columns created.

```python
df.eval('''X = B*C
           THE_ANSWER = 42''')
```

```
       B  C  E  NEW    X  THE_ANSWER
2018  17  1  4   11   17          42
2019  17  2  4   12   34          42
2020  17  3  4   10   51          42
2021  17  6  4   14  102          42
```


```python
df
```

```
       B  C  E  NEW
2018  17  1  4   11
2019  17  2  4   12
2020  17  3  4   10
2021  17  6  4   14
```


In the above example the resulting dataframe is displayed but is not stored.

To store it, the results of the calculation must be assigned to a variable.  The pre-existing dataframe can be overwritten by assigning it the result of the eval statement.

```python
df=df.eval('''X = B*C
           THE_ANSWER = 42''')
df
```

```
       B  C  E  NEW    X  THE_ANSWER
2018  17  1  4   11   17          42
2019  17  2  4   12   34          42
2020  17  3  4   10   51          42
2021  17  6  4   14  102          42
```


With this operation the new columns X and THE_ANSWER have been appended to the dataframe df.
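As an alternative to reassigning the result, pandas' `.eval()` also accepts `inplace=True`, which modifies the `DataFrame` directly and returns `None`. A minimal sketch on a small stand-alone `DataFrame`:

```python
import pandas as pd

df = pd.DataFrame({"B": [17, 17], "C": [1, 2]}, index=[2018, 2019])

# inplace=True adds the new column to df itself instead of returning a copy
result = df.eval("X = B*C", inplace=True)
print(df)
```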

:::{note}
The `.eval()` method is a native pandas method. As such, it cannot handle lagged variables, because pandas does not support the idea of a lagged variable.

The `.mfcalc()` and `.upd()` methods discussed below are `modelflow` features that extend the native functionality of the `DataFrame` so that such calculations can be performed.
:::

### .loc[] selects a portion (slice) of a dataframe 

The `.loc[]` indexer allows you to display and/or revise specific sub-sections of the rows and columns of a dataframe.

#### .loc[row,column] A single element

`.loc[row,column]` operates on a single cell of the dataframe. Thus the code below displays the value in column C for the row with index 2019.

```python
df.loc[2019,'C']
```

```
2
```

#### .loc[:,column] A single column

A lone colon in a `.loc` statement selects all the rows (or all the columns). Here it selects all of the rows.

```python
df.loc[:,'C']
```

```
2018    1
2019    2
2020    3
2021    6
Name: C, dtype: int64
```

#### .loc[row,:] A single row 

Here the colon selects all of the columns for the chosen row.

```python
df.loc[2019,:]
```

```
B             17
C              2
E              4
NEW           12
X             34
THE_ANSWER    42
Name: 2019, dtype: int64
```

####  .loc[:,[names...]] Several columns

Passing a list in either the rows or columns portion of the loc statement will allow multiple rows or columns to be displayed.

```python
df.loc[[2018,2021],['B','C']]
```

```
       B  C
2018  17  1
2021  17  6
```


#### .loc using the colon to select a range

With the colon operator we can also select a range of rows.

Here from 2018 to 2020; note that with `.loc` both the start and the end labels are included.

```python
df.loc[2018:2020,['B','C']]
```

```
       B  C
2018  17  1
2019  17  2
2020  17  3
```
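Because `.loc[]` works with labels, a range such as `2018:2020` includes both endpoints. This differs from position-based selection with `.iloc[]`, which, like ordinary python slicing, excludes the stop position. A minimal sketch comparing the two on a small stand-alone `DataFrame`:

```python
import pandas as pd

df = pd.DataFrame({"C": [1, 2, 3, 6]}, index=[2018, 2019, 2020, 2021])

by_label = df.loc[2018:2020, "C"]   # label-based: 2018, 2019 and 2020 are all included
by_position = df.iloc[0:2, 0]       # position-based: rows 0 and 1 only, stop excluded

print(by_label.tolist())
print(by_position.tolist())
```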


#### .loc[] can also be used on the left hand side to assign values to specific cells
This can be very handy when updating scenarios.


```python
df.loc[2019:2020,'C'] = 17
df
```

```
       B   C  E  NEW    X  THE_ANSWER
2018  17   1  4   11   17          42
2019  17  17  4   12   34          42
2020  17  17  4   10   51          42
2021  17   6  4   14  102          42
```


```{warning}
The dimensions on the left- and right-hand sides of = should match: either they should be the same, or the right-hand side should be broadcastable into the left-hand slice.

For more on broadcasting [see here](https://jakevdp.github.io/PythonDataScienceHandbook/02.05-computation-on-arrays-broadcasting.html)
```
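A minimal sketch of the rule in the warning above, on a small stand-alone `DataFrame`: a single value is broadcast into the selected slice, a list with one value per selected cell is accepted, and a list of the wrong length is rejected by pandas with a `ValueError`.

```python
import pandas as pd

df = pd.DataFrame({"C": [1, 2, 3, 6]}, index=[2018, 2019, 2020, 2021])

df.loc[2019:2020, "C"] = 17          # a single value is broadcast to both cells
after_scalar = df["C"].tolist()

df.loc[2019:2020, "C"] = [20, 30]    # one value per selected cell
after_list = df["C"].tolist()

mismatch_rejected = False
try:
    df.loc[2019:2020, "C"] = [1, 2, 3]   # three values for a two-cell slice
except ValueError:
    mismatch_rejected = True
```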

**For more info on the .loc[] method**
- [Description](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html)
- [Search](https://www.google.com/search?q=pandas+dataframe+loc&newwindow=1)


**For more info on pandas:**
- [Pandas homepage](https://pandas.pydata.org/)
- [Pandas community tutorials](https://pandas.pydata.org/pandas-docs/stable/getting_started/tutorials.html)