
# Some Python basics

Before using `modelflow` with the World Bank's MFMod models, users will need to understand some basic elements of `python` syntax and usage. Notably, they will need to understand packages, libraries and classes, and how to access them.

## Starting python in windows

To begin using `modelflow`, python itself needs to be started.  This can be done either using the `Anaconda` navigator or from the command line shell. In either case, the user will need to start python and select the `modelflow` environment. 



## Anaconda navigator

1. Start Anaconda Navigator by typing Anaconda in the Start window and opening the Navigator (see Figure).
2. From Anaconda Navigator select the `Modelflow` environment (see figure)

```{figure} ./AnacondaNav1.png
---
height: 225px
name: Start Anaconda Navigator
---
A newly created Jupyter Notebook session
```
3. Once the environment is selected, the user can start any of the following by clicking on it:
    1. Jupyter Notebook environment
    2. The command line environment
    3. A programming IDE environment

```{figure} ./NavigatorChoices.png
---
height: 225px
name: Start Anaconda Navigator
---
A newly created Jupyter Notebook session
```
 


## Python  packages, libraries and classes

Some features of `python` are built in. Others build on these basic features.

A **python class** is a code template that defines a python object. Classes can have properties (variables or data) and methods (behaviours or functions) associated with them. In python, a class is defined with the keyword `class`. An object of a given class is created (instantiated) using the class's "constructor", a special method that creates an object that is an instance of the class.
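
As a minimal sketch of these ideas, the block below defines a small class with two properties and one method and then instantiates it. The class `Country`, its properties and the values used are hypothetical and serve only as an illustration; they are not part of `modelflow`.

```python
class Country:
    """A minimal class with two properties and one method."""

    def __init__(self, name, gdp):
        # __init__ is the constructor: it runs when an object is instantiated
        self.name = name   # property (data)
        self.gdp = gdp     # property (data)

    def grow(self, rate):
        """Method: return GDP after growing by `rate` (0.25 means 25 percent)."""
        return self.gdp * (1 + rate)


# Instantiate an object by calling the class as if it were a function
example = Country("Exampleland", 200.0)

print(example.name)        # access a property
print(example.grow(0.25))  # call a method
```

Properties and methods are both reached with the dot syntax (`object.name`), which is the same syntax used later in this chapter for `math.pi` or `df.columns`.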

A **module** is a Python object consisting of Python code. A module can define functions, classes and variables. A module can also include runnable code.

A **python package** is a collection of modules that are related to each other. When a module from an external package is required by a program, that package (or module in the package) must  be **imported** into the current session in order for its modules to be accessible.  

A **python library** is a collection of related modules or packages. 

`Modelflow` is a python package that builds on (or adds to) the methods and properties of other `python` packages like `pandas`, `numpy` and `matplotlib`.
 

## Importing packages, libraries, modules and classes

Some libraries, packages, and modules are part of the core python package and will be available (importable) from the get-go.  Others are not, and need to be installed before importing them into a session.

If you followed the modelflow installation instructions you have already downloaded and installed on your computer all the packages necessary for running World Bank models under modelflow.  But to work with them in a given Jupyter Notebook session or in a program context, you will also need to ```import``` them into your session before you call them.  

:::{note}
**Installation** of a package is not the same as **import**ing a package. Installation downloads a package's programs from the internet onto the user's machine, making them available to be imported when required.  To be imported a package must be installed once on the computer that wishes to use it.  Once it has been installed, the package must be imported into each python session where it is to be used.
:::
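
The distinction can be checked from within a session. The sketch below uses the standard-library function `importlib.util.find_spec` to test whether a package is installed (importable) without actually importing it. The helper name `is_installed` and the second package name are made up for this illustration.

```python
import importlib.util


def is_installed(package_name):
    """Return True if `package_name` could be imported in this environment."""
    return importlib.util.find_spec(package_name) is not None


# math ships with python, so it is always available
print(is_installed("math"))

# a deliberately made-up name that should not exist on any machine
print(is_installed("a_package_that_is_not_installed"))
```

A package for which this returns `False` has to be installed before an `import` statement for it can succeed.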

Typically a python program will start with the importation of the libraries, classes and modules that will be used.  Because a Jupyter Notebook is essentially a heavily annotated program, it also requires that packages used be imported.

As described above, packages, libraries and modules are containers that can include other elements. Take for example the module `math`, which is part of the python standard library.

To import `math` we execute the command ```import math```. Having done that we can call the functions and data that are defined in it.



In [16]:
# the "#" in a code cell indicates a comment; text after the # will not be executed
import math

# Now that we have imported math we can access the elements defined in 
# the module.
# For example math contains a definition for pi; we can access it by referring to 
# the pi attribute of the module math

math.pi

3.141592653589793

### Import specific elements or classes from a module or library

The python module ```math``` contains several functions and constants.  

Rather than importing the whole module (as above), these functions and constants can be imported directly using the **from syntax**. 

 ```from math import pi,cos,sin```

When imported in this fashion, the user does not have to precede the function or constant with the name of its library. The above ```from math import pi,cos,sin``` command imports the constant pi and the two functions cos and sin from the math module directly and allows the user to call them using their names without preceding them with `math.`.

Compare these calls with the one in the preceding section: there the reference to the constant pi has to be preceded by its namespace designator math, i.e. ```math.pi```. Below we import pi directly and can refer to it simply as pi.

In [17]:
from math import pi,cos,sin

print(pi)
print(cos(3))


3.141592653589793
-0.9899924966004454


### Import a module but give it an alias

An imported module can also be given an alias, which is ideally shorter than its official name but still obvious enough that the user knows what module is being referred to.

For example  ```import math as m``` allows a reference to pi using the more succinct syntax ```m.pi```.

In [18]:
import math as m
print(m.pi)
print(m.cos(3))

3.141592653589793
-0.9899924966004454


### Standard aliases

Some packages are so frequently used that by convention they have been "assigned" specific aliases.

For example:

**Common aliases**

|Alias|Aliased package|Example|Functionality|
|:--|:--|:--|:--|
|pd|pandas|`import pandas as pd`|Pandas is used for storing and retrieving data|
|np|numpy|`import numpy as np`|Numpy gives access to some advanced mathematical features|


You don't have to use these conventions, but doing so will make your code easier to read for others who are familiar with them.




# Introduction to Pandas, Pandas Series and Pandas dataframes

Modelflow is built on top of the Pandas library. Pandas is the Swiss Army knife of data science and can perform an impressive array of data-oriented tasks.

This tutorial is a very short introduction to how pandas series and dataframes are used with Modelflow. For a more complete discussion see any of the many tutorials on the internet, notably:


* [Pandas homepage](https://pandas.pydata.org/)
* [Pandas community tutorials](https://pandas.pydata.org/pandas-docs/stable/getting_started/tutorials.html)



## Import the pandas library

As with any python program, in order to use a package or library it must first be imported into the session. As noted above, by convention pandas is imported as `pd`.

In [19]:
import pandas as pd 

Pandas, like any library, contains many classes and methods.  The discussion below focuses on **Series** and **DataFrames**, two classes that are part of the pandas library.  Both `series` and `dataframes` are containers that can be used to store time-series data and that have associated with them a number of very useful methods for displaying and manipulating time-series data.  

Unlike other statistical packages neither `series` nor `dataframes` are inherently or exclusively time-series in nature. `Modelflow` and macro-economists use them in this way, but the classes themselves are not dated or exclusively numerical out-of-the-box.

## The `series` class in `Pandas`

`Series` is a class that is part of the pandas package and can be used to instantiate an object that holds a one-dimensional array of values together with an index that labels each value.

The constructor for a `Series` object is ```pandas.Series()```.  The content inside the parentheses will determine the nature of the series-object generated.  Python does not allow several constructors with the same name to be defined for a class; instead, the single `Series` constructor accepts several kinds of input (for example a list or a dictionary) and builds the object according to how the data used to initialize it is organized.

### Series declared from a list

The simplest way to create a Series is to pass an array of values as a Python list to the Series constructor.

```{note} 
A list in python is a comma delimited collection of items.  It could be text, numbers or even more complex objects.  When declared (and returned) lists are enclosed in square brackets.

For example both of the following two lines are perfectly good examples of lists.

mylist=[2,7,8,9]

mylist2=["Some text","Some more Text",2,3]

The list is entirely agnostic about the type of data it contains.

```
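
A few basic operations on a list are sketched below, using the list `mylist` from the note. The variable names `first`, `last` and `n_items` are chosen for this illustration only.

```python
mylist = [2, 7, 8, 9]

first = mylist[0]       # positions are counted from zero
last = mylist[-1]       # negative positions count from the end
n_items = len(mylist)   # number of items in the list

mylist.append(10)       # a list can grow after it has been created

print(first, last, n_items)
print(mylist)
```

The zero-based positions used here are the same ones that pandas assigns by default as the index of a `Series` built from a list.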

In the examples below Simplest, simple2 and simple3 are all series, although simple3, which is derived from a list mixing text and numeric values, would be hard to interpret as an economic series.

In [20]:
values=[7,8,9,10,11]
weird=["Some text","Some more Text",2,3]

# Here the constructor is passed a numeric list
Simplest=pd.Series([2,3,4,5,6,7])
Simplest



0    2
1    3
2    4
3    5
4    6
5    7
dtype: int64

In [21]:
# In this case the constructor is passed a variable that contains a list
simple2=pd.Series(values)
simple2

0     7
1     8
2     9
3    10
4    11
dtype: int64

In [22]:
# Here the constructor is passed a variable containing a list that is a mix of 
# alphanumerics and numerical values
simple3=pd.Series(weird)
simple3

0         Some text
1    Some more Text
2                 2
3                 3
dtype: object

Note that the three series have different lengths.  

Moreover, constructed in this way (by passing a list to the constructor), each of these `Series` is automatically assigned a zero-based index (a numerical index that starts with 0).

### Series declared using a specific index

In this example the series Simplest and simple2 are recreated (overwritten), but this time an index is specified. Here the index is declared as a(nother) list. 




In [23]:
# In this example the constructor is given both the values 
# and specific values for the index
Simplest=pd.Series([2,3,4,5,6],index=[1966,1967,1996,1999,2000])
Simplest



1966    2
1967    3
1996    4
1999    5
2000    6
dtype: int64

In [24]:
simple2=pd.Series(values,index=[1966,1967,1996,1999,2000])
simple2

1966     7
1967     8
1996     9
1999    10
2000    11
dtype: int64

Now these Series look more like time-series data!
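
Once a series has a meaningful index, individual values can be looked up by their label rather than by their position. The sketch below contrasts the two, using the standard pandas accessors `.loc[]` (by label) and `.iloc[]` (by position). The series name `gdp` is made up for this example.

```python
import pandas as pd

gdp = pd.Series([2, 3, 4, 5, 6], index=[1966, 1967, 1996, 1999, 2000])

value_1996 = gdp.loc[1996]   # look up by index label
third_value = gdp.iloc[2]    # look up by position (zero-based)

print(value_1996, third_value)
print(list(gdp.index))
```

Both lookups return the same value here, because the label 1996 happens to sit in the third position of the index.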

### Create Series from a dictionary

In python, a dictionary is a data structure that is more generally known in computer science as an associative array. A dictionary consists of a collection of key-value pairs, where each key-value pair *maps* or *links* the key to its associated value.  

```{note}
A dictionary is enclosed in curly brackets `{}`, whereas a list is enclosed in square brackets `[]`.
```

Thus `mydict={"1966":2,"1967":3,"1968":4,"1969":5,"2000":-15}` creates a dictionary object called `mydict`. `mydict` maps (or links) the key "1966" to the value 2.

```{note}
In this example the key was a string, but we could just as easily have made it a numerical value.
```

`mydict2={1966:2,1967:3,1968:4,1969:5,2000:-15}` creates an object called `mydict2` that links (maps) the key 1966 to the value 2.


The series constructor also accepts a dictionary, and maps the key to the index of the Series. 
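
Because the keys become the index, the type of the key matters. The sketch below builds one series from text keys and one from integer keys and shows that a label only matches if it has the same type as the key. The variable names are chosen for this illustration.

```python
import pandas as pd

by_text_key = pd.Series({"1966": 2, "1967": 3})
by_int_key = pd.Series({1966: 2, 1967: 3})

print(list(by_text_key.index))   # labels are strings
print(list(by_int_key.index))    # labels are integers

print(by_int_key.loc[1966])         # works: 1966 is an index label
print("1966" in by_int_key.index)   # the string "1966" is not a label here
```

For time-series work, integer years are usually the more convenient choice, since they can be compared and used in ranges.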



In [25]:
mydict2={1966:2,1967:3,1968:4,1969:5,2000:6}
simple2=pd.Series(mydict2)
simple2

1966    2
1967    3
1968    4
1969    5
2000    6
dtype: int64

## The `DataFrame` class in `Pandas`

The `DataFrame` is the primary structure of pandas. It is a two-dimensional data structure with named rows and columns.  Each column can have a different data type (numeric, string, etc).

By convention, a dataframe is often called df or some other modifier followed by df, to assist in reading the code.

Much more detail on standard pandas dataframes can be found on the [official pandas website](https://pandas.pydata.org/docs/reference/frame.html).

### Creating or instantiating a dataframe

Like any object, a `DataFrame` can be created by calling the constructor of the pandas class `DataFrame`.  

A class has a single constructor, but that constructor can accept many different kinds of input, so there are very many ways to create a `dataframe`. The `pandas.DataFrame()` method is the constructor for the `DataFrame` class. It can be called in several forms (as with `Series`), but always returns an instance of a `DataFrame` object, i.e. a variable whose contents are a `DataFrame`.

The code example below creates a `DataFrame` of three columns B, C and E, indexed from 2018 to 2021.  Macroeconomists may interpret the index as dates, but for pandas they are just numbers.  

Below a `DataFrame` named `df` is instantiated from a dictionary and assigned a specific index by passing a list of years as the index.

In [26]:

df = pd.DataFrame({'B': [1,1,1,1],'C':[1,2,3,6],'E':[4,4,4,4]},index=[2018,2019,2020,2021])
df 

| |B|C|E|
|:--|:--|:--|:--|
|2018|1|1|4|
|2019|1|2|4|
|2020|1|3|4|
|2021|1|6|4|


```{note}
In the `DataFrame`s that are used in macrostructural models like MFMod, each column is often interpreted as a time-series of an economic variable. So in this dataframe, B, C and E would normally each be interpreted as economic time series. 

That said, there is nothing in the `DataFrame` class that suggests that the data it stores must be time-series or even numeric in nature.

```
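
The link between the two classes can be illustrated with a short sketch: selecting a single column of a `DataFrame` returns a `Series` that shares the dataframe's index. The name `df_demo` is used here only to avoid overwriting the `df` of the main text.

```python
import pandas as pd

df_demo = pd.DataFrame({'B': [1, 1, 1, 1], 'C': [1, 2, 3, 6], 'E': [4, 4, 4, 4]},
                       index=[2018, 2019, 2020, 2021])

col_c = df_demo['C']           # select one column by its name

print(type(col_c).__name__)    # the class of the selected column
print(list(col_c.index))       # same index as the dataframe
print(col_c.loc[2021])         # value of C for the row labelled 2021
```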

### Alternative ways to set the time period of a dated index

A somewhat more creative way to initialize the dataframe for dates would use a list comprehension (a compact loop) to generate the dates that get passed to the constructor as an argument.  

Below a dataframe df with two columns (A and B) is initialized with the value 100 for all data points.

The index is defined dynamically by the list comprehension ```index=[2020+v for v in range(number_of_rows)]```, which runs number_of_rows times (6 times in this example), with v taking the values 0, 1, ..., 5 and producing 2020+0, 2020+1, ..., 2020+5. The resulting list whose values are assigned to index is \[2020,2021,2022,2023,2024,2025\].

The big advantage of this method is that if the user wanted to have data created for the period 1990 to 2030, they would only have to change number_of_rows from 6 to 41, and then change the starting date in the list comprehension from 2020 to 1990.

In [30]:
#define the number of years for which the data is to be created.
number_of_rows = 6 

# call the dataframe constructor
df = pd.DataFrame(100,
       index=[2020+v for v in range(number_of_rows)], # create row index
       # equivalent to index=[2020,2021,2022,2023,2024,2025] 
       columns=['A','B'])                                 # create column name 
df

| |A|B|
|:--|:--|:--|
|2020|100|100|
|2021|100|100|
|2022|100|100|
|2023|100|100|
|2024|100|100|
|2025|100|100|


This second example simplifies the creation even further by specifying the start and end points as a range. Note that `range(2020,2030)` stops before its end point, so the last year in the index is 2029.

In [28]:


df1 = pd.DataFrame(200,
       index=[v for v in range(2020,2030)], # create row index
       # equivalent to index=[2020,2021,...,2029] 
       columns=['A1','B1'])                                 # create column name 
df1

| |A1|B1|
|:--|:--|:--|
|2020|200|200|
|2021|200|200|
|2022|200|200|
|2023|200|200|
|2024|200|200|
|2025|200|200|
|2026|200|200|
|2027|200|200|
|2028|200|200|
|2029|200|200|
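
As a further sketch, a `range` object can be passed to the constructor directly, without wrapping it in a list comprehension. The example assumes the 1990 to 2030 period mentioned earlier; the names `start_year`, `end_year` and `df_long` are made up for the illustration.

```python
import pandas as pd

start_year, end_year = 1990, 2030

# range() excludes its end point, hence end_year + 1
df_long = pd.DataFrame(100,
                       index=range(start_year, end_year + 1),
                       columns=['A', 'B'])

print(len(df_long))                          # number of rows
print(df_long.index[0], df_long.index[-1])   # first and last year
```

Keeping the start and end years in variables means that only one line needs to change when a different period is required.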


### Adding a column to a dataframe

If a value is assigned to a column that does not exist, pandas will add a column with that name and fill it with values resulting from  the calculation.

:::{note}

The size of the object assigned to the new column must match the size (number of rows) of the pre-existing `DataFrame`.
:::



In [32]:
df["NEW"]=[10,12,10,13,14,15]  # df originally has 6 rows so we must supply 6 data points for this command to run error free
df

| |A|B|NEW|
|:--|:--|:--|:--|
|2020|100|100|10|
|2021|100|100|12|
|2022|100|100|10|
|2023|100|100|13|
|2024|100|100|14|
|2025|100|100|15|
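
To illustrate what happens when the sizes do not match, the sketch below tries to add a column of three values to a dataframe with six rows. pandas raises a `ValueError` and the dataframe is left without the new column. The names `df_demo` and `TOO_SHORT` are hypothetical.

```python
import pandas as pd

df_demo = pd.DataFrame(100, index=range(2020, 2026), columns=['A', 'B'])

try:
    df_demo["TOO_SHORT"] = [10, 12, 10]   # 3 values for 6 rows
    assignment_failed = False
except ValueError as err:
    assignment_failed = True
    print("pandas refused the assignment:", err)

print(list(df_demo.columns))   # the column was not added
```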


### Revising values

If the column exists, then the `=` operator will revise the values of the rows with the values assigned in the statement.

```{warning}
The dimensions of the list assigned via the `=` operator must be the same as the `DataFrame` (i.e. there must be exactly as many values as there are rows).  Alternatively if only one value is provided, then that value will replace all of the values in the specified column (be broadcast to the other rows in the column).
```

In [33]:
df["NEW"]=[11,12,10,14,2,1]

df

| |A|B|NEW|
|:--|:--|:--|:--|
|2020|100|100|11|
|2021|100|100|12|
|2022|100|100|10|
|2023|100|100|14|
|2024|100|100|2|
|2025|100|100|1|


In [35]:
# replace all of the rows of column B with the same value
df['B']=17
df

| |A|B|NEW|
|:--|:--|:--|:--|
|2020|100|17|11|
|2021|100|17|12|
|2022|100|17|10|
|2023|100|17|14|
|2024|100|17|2|
|2025|100|17|1|


### .columns lists the column names of a dataframe

The attribute ```.columns``` returns the names of the columns in the dataframe.

In [36]:
df.columns


Index(['A', 'B', 'NEW'], dtype='object')

### .size indicates the number of elements

The ```.size``` attribute returns the number of elements in an object, so ```df.columns.size``` returns the number of columns in a dataframe.

In [37]:
df.columns.size

3

The dataframe df has 3 columns. 
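
As a related sketch, the standard pandas attribute `.shape` returns the number of rows and the number of columns in one step. The dataframe `df_demo` below is a stand-in with the same dimensions as `df`.

```python
import pandas as pd

df_demo = pd.DataFrame(100, index=range(2020, 2026), columns=['A', 'B', 'NEW'])

n_rows, n_cols = df_demo.shape   # (number of rows, number of columns)

print(n_rows, n_cols)
print(len(df_demo), df_demo.columns.size)   # the same two numbers, obtained separately
```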

### .eval() evaluates an expression on the data of a dataframe

`.eval` is a native dataframe method, which does calculations on a `dataframe` and returns a revised `dataframe`. With this method expressions can be evaluated and new columns created.  

In [39]:
df.eval('''X = B*NEW
           THE_ANSWER = 42''')

| |A|B|NEW|X|THE_ANSWER|
|:--|:--|:--|:--|:--|:--|
|2020|100|17|11|187|42|
|2021|100|17|12|204|42|
|2022|100|17|10|170|42|
|2023|100|17|14|238|42|
|2024|100|17|2|34|42|
|2025|100|17|1|17|42|


Because the result of the `df.eval()` call was not assigned to anything, least of all the dataframe df, the value of df is unchanged.

In [40]:
df

| |A|B|NEW|
|:--|:--|:--|:--|
|2020|100|17|11|
|2021|100|17|12|
|2022|100|17|10|
|2023|100|17|14|
|2024|100|17|2|
|2025|100|17|1|


To store the results of the calculation, they must be assigned to a variable.  The pre-existing dataframe can be overwritten by assigning it the result of the eval statement.

In [42]:
df=df.eval('''X = B*NEW
           THE_ANSWER = 42''')
df

| |A|B|NEW|X|THE_ANSWER|
|:--|:--|:--|:--|:--|:--|
|2020|100|17|11|187|42|
|2021|100|17|12|204|42|
|2022|100|17|10|170|42|
|2023|100|17|14|238|42|
|2024|100|17|2|34|42|
|2025|100|17|1|17|42|


With this operation the new columns, X and THE_ANSWER, have been appended to the dataframe df.

:::{note}
The ```.eval()``` method is a native pandas method.  As such it cannot handle lagged variables (because the expression syntax used by pandas has no notation for a lagged variable). 

The ```.mfcalc()``` and the ```.upd()``` methods discussed in the next chapter are `modelflow` features that extend the functionalities native to `dataframe` and allow such calculations to be performed.  
:::
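
For comparison, the sketch below shows how a lagged value can be obtained in plain pandas outside of an `.eval()` expression, using the standard `Series.shift()` method. This is not the `modelflow` syntax and says nothing about how `.mfcalc()` works internally; the column names `NEW_LAG` and `CHANGE` are made up for the illustration.

```python
import pandas as pd

df_demo = pd.DataFrame({'NEW': [11, 12, 10, 14, 2, 1]}, index=range(2020, 2026))

# shift(1) moves every value down one row, so each year sees the previous year's value
df_demo['NEW_LAG'] = df_demo['NEW'].shift(1)

# change from the previous year; undefined (NaN) for the first year
df_demo['CHANGE'] = df_demo['NEW'] - df_demo['NEW_LAG']

print(df_demo)
```

The first row has no predecessor, so its lagged value is missing (`NaN`).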

### .loc[] selects a portion (slice) of a dataframe 

The ```.loc[]``` method allows you to display and/or revise specific sub-sections of a column or row in a dataframe.

#### .loc[row,column] A single element

```.loc[row,column]``` operates on a single cell in the dataframe.  Thus the statement below displays the value in the column NEW for the observation with index 2023.

In [44]:
df.loc[2023,'NEW']

14

#### .loc[:,column] A single column

The lone colon in a loc statement indicates all the rows or columns.  Here all of the rows.

In [45]:
df.loc[:,'NEW']

2020    11
2021    12
2022    10
2023    14
2024     2
2025     1
Name: NEW, dtype: int64

#### .loc[row,:] A single row 

Here all of the columns, for the selected row.

In [46]:
df.loc[2023,:]

A             100
B              17
NEW            14
X             238
THE_ANSWER     42
Name: 2023, dtype: int64

####  .loc[:,[names...]] Several columns

Passing a list in either the rows or columns portion of the loc statement will allow multiple rows or columns to be displayed.

In [48]:
df.loc[[2021,2024],['B','NEW']]

| |B|NEW|
|:--|:--|:--|
|2021|17|12|
|2024|17|2|


#### .loc using the colon to select a range

With the colon operator we can also select a range of results.

Here from 2021 to 2023. Unlike ordinary python slices, a range of labels in ```.loc[]``` includes both end points.

In [49]:

df.loc[2021:2023,['B','NEW']]


| |B|NEW|
|:--|:--|:--|
|2021|17|12|
|2022|17|10|
|2023|17|14|
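
The treatment of the end point is worth a small sketch, because it differs from the rest of python. A label range in `.loc[]` includes its end point, while a position range in `.iloc[]` or in an ordinary list slice does not. The names used below are chosen for this illustration.

```python
import pandas as pd

df_demo = pd.DataFrame({'NEW': [11, 12, 10, 14, 2, 1]}, index=range(2020, 2026))

by_label = df_demo.loc[2021:2023, 'NEW']   # labels 2021, 2022 and 2023
by_position = df_demo.iloc[1:3]['NEW']     # positions 1 and 2 only
years = list(df_demo.index)[1:3]           # plain list slice, also excludes the end

print(list(by_label.index))
print(list(by_position.index))
print(years)
```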


#### .loc[] can also be used on the left hand side to assign values to specific cells

This can be very handy when updating scenarios.


In [50]:
df.loc[2022:2024,'NEW'] = 17
df

| |A|B|NEW|X|THE_ANSWER|
|:--|:--|:--|:--|:--|:--|
|2020|100|17|11|187|42|
|2021|100|17|12|204|42|
|2022|100|17|17|170|42|
|2023|100|17|17|238|42|
|2024|100|17|17|34|42|
|2025|100|17|1|17|42|


```{warning}
The dimensions on the right hand side of = and the left hand side should match. That is: either the dimensions should be the same, or it must be possible to `broadcast` the right hand side into the left hand slice.

For more on broadcasting [see here](https://jakevdp.github.io/PythonDataScienceHandbook/02.05-computation-on-arrays-broadcasting.html)
```
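
The matching rule can be sketched as follows: a list with one value per selected row is accepted, a single value is broadcast, and a list of the wrong length is rejected. The sketch assumes that pandas signals the mismatch with a `ValueError`; the names `df_demo`, `values_after_valid` and `mismatch_rejected` are hypothetical.

```python
import pandas as pd

df_demo = pd.DataFrame({'NEW': [11, 12, 10, 14, 2, 1]}, index=range(2020, 2026))

df_demo.loc[2022:2024, 'NEW'] = [7, 8, 9]   # three values for three rows
df_demo.loc[2025, 'NEW'] = 0                # a single value for a single cell
values_after_valid = list(df_demo['NEW'])

try:
    df_demo.loc[2022:2024, 'NEW'] = [1, 2]  # two values for three rows
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

print(values_after_valid)
print(mismatch_rejected)
```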

**For more info on the .loc[] method**
- [Description](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html)
- [Search](https://www.google.com/search?q=pandas+dataframe+loc&newwindow=1)


**For more info on pandas:**
- [Pandas homepage](https://pandas.pydata.org/)
- [Pandas community tutorials](https://pandas.pydata.org/pandas-docs/stable/getting_started/tutorials.html)