## ```.mfcalc()```: an extension of standard pandas

Like ```.upd()```, the ```.mfcalc()``` method can be used to extend the functionality of standard pandas. It is, however, a much more powerful method: it can be used to solve models or mini-models, or to see how modelflow normalizes equations. It can be particularly useful when creating scenarios, uses that are presented elsewhere.

Here, the focus is on using ```.mfcalc()``` to perform quick-and-dirty calculations and to modify dataframes.


### Workspace initialization

Set up the Python session to use pandas and modelflow by importing their packages. ```modelmf``` is an extension of pandas dataframes that is part of the modelflow installation package (and is also used by modelflow itself).

```python
# Some settings to make Jupyter notebooks run a bit more smoothly:
# autoreload re-imports modules whose source code has changed
%load_ext autoreload
%autoreload 2
```

```python
import pandas as pd  # Python data science library
import modelmf       # Adds useful features to pandas dataframes,
                     # using utilities initially developed for modelflow
```

### Create a simple dataframe

Create a pandas dataframe with one column named A and 6 rows.

Set the index to the years 2020 through 2025 and set the values of all the cells to 100.


* ```pd.DataFrame``` creates a dataframe ([Description](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html#pandas.DataFrame))

* The expression ```[v for v in range(2020,2026)]``` dynamically creates a Python list and fills it with the integers from 2020 through 2025 (the end value passed to ```range()``` is excluded).


```python
df = pd.DataFrame(                         # call the dataframe constructor
    100.000,                               # the value of every cell
    index=[v for v in range(2020, 2026)],  # index: the years 2020-2025
    columns=['A']                          # the column name
)
df  # the result of the last statement is displayed in the output cell
```

|      | A     |
|------|-------|
| 2020 | 100.0 |
| 2021 | 100.0 |
| 2022 | 100.0 |
| 2023 | 100.0 |
| 2024 | 100.0 |
| 2025 | 100.0 |
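A quick check of the list expression used in the cell above, in plain pandas (nothing modelflow-specific is assumed): ```list(range(2020, 2026))``` produces the same list, and a ```range``` object can also be passed directly as the index.

```python
import pandas as pd

# The list comprehension from the text and an equivalent, shorter spelling
years = [v for v in range(2020, 2026)]
assert years == list(range(2020, 2026))
print(years)  # [2020, 2021, 2022, 2023, 2024, 2025]

# A range object can also be passed directly as the index
df = pd.DataFrame(100.0, index=range(2020, 2026), columns=['A'])
print(df.index.tolist() == years)  # True
```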


## ```.mfcalc()``` in action

### ```.mfcalc()``` example to calculate a new series

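Use ```.mfcalc()``` to calculate a new column (series) as a function of the existing column A.

The call below creates a new column X. As the output shows, the lower-case names in the formula refer to upper-case columns: ```a``` refers to column A, and the new variable ```x``` appears as column X.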

```python
df.mfcalc('x = x(-1) + a')
```

|      | A     | X     |
|------|-------|-------|
| 2020 | 100.0 | 0.0   |
| 2021 | 100.0 | 100.0 |
| 2022 | 100.0 | 200.0 |
| 2023 | 100.0 | 300.0 |
| 2024 | 100.0 | 400.0 |
| 2025 | 100.0 | 500.0 |



By default, ```.mfcalc()``` initializes a new variable with zeroes.
Moreover, if a formula passed to ```.mfcalc()``` contains a lag, a value is calculated for a row only if there is data in the series for the preceding row.

Combining these two behaviours explains the result of ```df.mfcalc('x = x(-1) + a')```: X is zero in 2020, because X was not defined for 2019 (indeed, no such row exists), and in each subsequent row the contemporaneous value of A is added to the preceding value of X.
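To make the two rules concrete, here is a plain-pandas sketch that reproduces the X column above. It is not how ```.mfcalc()``` is implemented; it only mimics the two behaviours just described, using an explicit loop.

```python
import pandas as pd

df = pd.DataFrame(100.0, index=range(2020, 2026), columns=['A'])

a = df['A'].tolist()
x = [0.0] * len(a)            # rule 1: a new variable starts out as zeroes
for i in range(1, len(a)):    # rule 2: the lag x(-1) is missing in the first row,
    x[i] = x[i - 1] + a[i]    #         so the calculation starts in the second row

res = df.assign(X=x)          # a new dataframe; df itself is unchanged
print(res['X'].tolist())      # [0.0, 100.0, 200.0, 300.0, 400.0, 500.0]
```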

Once again, the result of ```.mfcalc()``` is displayed. However, because the result of the ```df.mfcalc()``` call was not assigned to a variable (there is no equals sign to the left of the call), it is displayed but not stored.

:::{note}
In the example above, a dataframe with the result is created and displayed, but the ```df``` dataframe itself did not change, as displaying it again (below) shows. To keep the result, it must be assigned to a variable, as in the next section.
:::

```python
df
```

|      | A     |
|------|-------|
| 2020 | 100.0 |
| 2021 | 100.0 |
| 2022 | 100.0 |
| 2023 | 100.0 |
| 2024 | 100.0 |
| 2025 | 100.0 |


### Storing the result of an ```.mfcalc()``` call

In this instance, the result of the ```.mfcalc()``` call is assigned to the variable ```df2``` and therefore stored.


```python
df2 = df.mfcalc('x = x(-1) + a')  # Assign the result to df2
df2
```

|      | A     | X     |
|------|-------|-------|
| 2020 | 100.0 | 0.0   |
| 2021 | 100.0 | 100.0 |
| 2022 | 100.0 | 200.0 |
| 2023 | 100.0 | 300.0 |
| 2024 | 100.0 | 400.0 |
| 2025 | 100.0 | 500.0 |
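The same return-a-new-dataframe pattern is common in standard pandas. A minimal sketch with ```DataFrame.assign``` (the column B is a hypothetical example): the original is left untouched, and rebinding the name to the returned dataframe, as in ```df = df.mfcalc('x = x(-1) + a')```, is how the original name would be updated.

```python
import pandas as pd

df = pd.DataFrame(100.0, index=range(2020, 2026), columns=['A'])

# assign() returns a new dataframe and leaves df untouched
df2 = df.assign(B=df['A'] * 2)
print(list(df.columns))   # ['A']
print(list(df2.columns))  # ['A', 'B']

# To update df itself, rebind the name to the returned dataframe
df = df.assign(B=df['A'] * 2)
print(list(df.columns))   # ['A', 'B']
```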


### Recalculate A so it grows by 2 percent

```.mfcalc()``` knows that it cannot start calculating A in 2020, because the lagged variable A(-1) has no value in 2019.

```.mfcalc()``` therefore begins its calculation in 2021. Note that the existing value for 2020 is preserved. This behaviour differs from other programs, which might return an n/a value for 2020.



```python
res = df.mfcalc('a =  1.02 *  a(-1)')
res
```

|      | A          |
|------|------------|
| 2020 | 100.0      |
| 2021 | 102.0      |
| 2022 | 104.04     |
| 2023 | 106.1208   |
| 2024 | 108.243216 |
| 2025 | 110.40808  |


```python
res.pct_change() * 100  # to display the percent changes
```

|      | A   |
|------|-----|
| 2020 | NaN |
| 2021 | 2.0 |
| 2022 | 2.0 |
| 2023 | 2.0 |
| 2024 | 2.0 |
| 2025 | 2.0 |
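For comparison, the same 2 percent path can be reproduced in plain pandas. This sketch only mimics the behaviour shown above (2020 kept, later years recalculated from the updated previous year); it makes no claim about how ```.mfcalc()``` works internally.

```python
import pandas as pd

df = pd.DataFrame(100.0, index=range(2020, 2026), columns=['A'])

a = df['A'].tolist()
for i in range(1, len(a)):     # 2020 has no lag, so its value is kept
    a[i] = 1.02 * a[i - 1]     # a = 1.02 * a(-1), using the updated previous year

res = df.assign(A=a)
growth = res['A'].pct_change() * 100   # NaN in 2020, 2.0 afterwards
print(res['A'].round(6).tolist())
print(growth.round(6).tolist())
```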




### mfcalc(), the showeq option

The ```showeq``` option defaults to ```False```.

When it is set to ```True``` (the example below passes ```1```), ```.mfcalc()``` prints the normalized form of the entered equation.


```python
df.mfcalc('dlog( a) =  0.02', showeq=1);  # the trailing ; suppresses display of the returned dataframe
```

```
FRML <> A=EXP(LOG(A(-1))+0.02)$
```


In ```modelflow```, the expression ```dlog(a)``` refers to the difference in the natural logarithm, $dlog(x_t) \equiv \ln(x_t)-\ln(x_{t-1})$, which is approximately equal to the growth rate of the variable.

```.mfcalc()``` normalizes the equation so that the system solves for $a$ as follows:

$$
\begin{aligned}
dlog(a_t) &= 0.02\\
\ln(a_t)-\ln(a_{t-1}) &= 0.02\\
\ln(a_t) &= \ln(a_{t-1})+0.02\\
a_t &= e^{\ln(a_{t-1})+0.02}\\
a_t &= a_{t-1}\, e^{0.02}
\end{aligned}
$$

which, expressed in the business logic language of ```modelflow```, is:

```A=EXP(LOG(A(-1))+0.02)```
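As a numerical check of the derivation, the sketch below applies the normalized formula in plain pandas and NumPy (not modelflow) and confirms that the log difference is 0.02 in every year after 2020, while the percent growth rate is slightly higher, $e^{0.02}-1 \approx 0.0202$. This is why ```dlog``` is only approximately the growth rate.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(100.0, index=range(2020, 2026), columns=['A'])

a = df['A'].tolist()
for i in range(1, len(a)):                      # 2020 keeps its value (no lag)
    a[i] = np.exp(np.log(a[i - 1]) + 0.02)      # A = EXP(LOG(A(-1)) + 0.02)
res = df.assign(A=a)

dlog = np.log(res['A']).diff()      # ln(A_t) - ln(A_{t-1})
growth = res['A'].pct_change()      # A_t / A_{t-1} - 1

print(dlog.iloc[1:].round(6).tolist())  # [0.02, 0.02, 0.02, 0.02, 0.02]
print(round(growth.iloc[1], 6))         # 0.020201, i.e. e**0.02 - 1
```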


```python
df
```

|      | A     |
|------|-------|
| 2020 | 100.0 |
| 2021 | 100.0 |
| 2022 | 100.0 |
| 2023 | 100.0 |
| 2024 | 100.0 |
| 2025 | 100.0 |


### Using the diff() operator ($\Delta$) with mfcalc

An equation using the ```diff()``` operator is normalized so that the value to the right of the equals sign is added to the lagged value of the variable inside ```diff()```. Thus, ```diff(a) = x``` normalizes to ```a = a(-1) + x```.


```python
df.mfcalc('diff(a) =  2', showeq=1)
```

```
FRML <> A=A(-1)+(2)$
```

|      | A     |
|------|-------|
| 2020 | 100.0 |
| 2021 | 102.0 |
| 2022 | 104.0 |
| 2023 | 106.0 |
| 2024 | 108.0 |
| 2025 | 110.0 |
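Because ```a = a(-1) + 2``` simply accumulates a constant step, the same path can also be built without a loop. A plain-pandas sketch (an illustration of the arithmetic, not of modelflow's internals) using a running total with ```cumsum()```:

```python
import pandas as pd

df = pd.DataFrame(100.0, index=range(2020, 2026), columns=['A'])

# Start from the 2020 value and add a running total of the step
step = pd.Series(2.0, index=df.index)
step.iloc[0] = 0.0                    # no step is applied in 2020 (no lag available)
res = df.assign(A=df['A'].iloc[0] + step.cumsum())

print(res['A'].tolist())                  # [100.0, 102.0, 104.0, 106.0, 108.0, 110.0]
print(res['A'].diff().iloc[1:].tolist())  # [2.0, 2.0, 2.0, 2.0, 2.0]
```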


### mfcalc with several equations and arguments
In addition to a single equation, several equations can be passed to ```.mfcalc()``` in one call.

However, **be careful**: the equations in one call are executed simultaneously, over a common time frame, which, combined with the treatment of lags, means that results may differ from what they would be if the commands were run sequentially.

For example:

```python
res = df.mfcalc('''
diff(a) =  2
x = a + 42
''')

res

# use res.diff() to see the difference
```

|      | A     | X     |
|------|-------|-------|
| 2020 | 100.0 | 0.0   |
| 2021 | 102.0 | 144.0 |
| 2022 | 104.0 | 146.0 |
| 2023 | 106.0 | 148.0 |
| 2024 | 108.0 | 150.0 |
| 2025 | 110.0 | 152.0 |


Here, diff(a) cannot be calculated for 2020 because there is no value for A in 2019.

As a result, ```modelflow``` generates a result only for the periods 2021 through 2025, and it is this result that is passed to the second equation, which adds 42 to it. X in 2020 is therefore not 142, as one might have expected, but zero, the value to which the newly created variable defaults.

Compare the results above with the results below, where the two steps are performed in two separate calls to ```.mfcalc()```.



```python
res1 = df.mfcalc('''
diff(a) =  2
''')

res2 = res1.mfcalc('''
x = a + 42
''')
res2
```

|      | A     | X     |
|------|-------|-------|
| 2020 | 100.0 | 142.0 |
| 2021 | 102.0 | 144.0 |
| 2022 | 104.0 | 146.0 |
| 2023 | 106.0 | 148.0 |
| 2024 | 108.0 | 150.0 |
| 2025 | 110.0 | 152.0 |


:::{Danger}
In ```.mfcalc()```, when a single call contains multiple equations, they are executed simultaneously. Combined with ```.mfcalc()```'s treatment of lags, this means that the other equations in the call are calculated only for the periods in which the lagged equation can be calculated. As a consequence, results may differ from what you might expect, and from what you would see if you ran the commands sequentially.
:::
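The difference between one call and two calls can be mimicked in plain pandas. In the sketch below, ```one_call``` and ```two_calls``` are hypothetical helper names; they reproduce the two X columns shown above under the assumption, consistent with those outputs, that equations in one call share a calculation period starting in 2021.

```python
import pandas as pd

df = pd.DataFrame(100.0, index=range(2020, 2026), columns=['A'])

def one_call(frame):
    """Mimic one call with both equations: they share a calculation period
    that starts in the second row, because diff(a) needs last year's A."""
    a = frame['A'].tolist()
    x = [0.0] * len(a)             # the new variable defaults to zero
    for i in range(1, len(a)):
        a[i] = a[i - 1] + 2        # diff(a) = 2
        x[i] = a[i] + 42           # x = a + 42, only over the same period
    return frame.assign(A=a, X=x)

def two_calls(frame):
    """Mimic two separate calls: the second equation has no lag,
    so it is calculated for every row, including the first."""
    a = frame['A'].tolist()
    for i in range(1, len(a)):
        a[i] = a[i - 1] + 2
    step1 = frame.assign(A=a)
    return step1.assign(X=step1['A'] + 42)

print(one_call(df)['X'].tolist())   # [0.0, 144.0, 146.0, 148.0, 150.0, 152.0]
print(two_calls(df)['X'].tolist())  # [142.0, 144.0, 146.0, 148.0, 150.0, 152.0]
```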

### Setting a time frame with mfcalc
In some circumstances it can be useful to limit the time frame over which the calculations are performed. Adding a line with a start date and an end date enclosed in ```<>```, such as ```<2023 2025>```, restricts the calculation to that period.

Below, A is recalculated only from 2023 onward and, as in the example above, X is zero in the periods (here 2020 through 2022) for which the expressions are not executed.

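```python
res = df.mfcalc('''
<2023 2025>
diff(a) =  2
x = a + 42
''')

# res.diff() would show the period-to-period changes
res
```

|      | A     | X     |
|------|-------|-------|
| 2020 | 100.0 | 0.0   |
| 2021 | 100.0 | 0.0   |
| 2022 | 100.0 | 0.0   |
| 2023 | 102.0 | 144.0 |
| 2024 | 104.0 | 146.0 |
| 2025 | 106.0 | 148.0 |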
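What the ```<2023 2025>``` window does to the numbers can be sketched in plain pandas with label-based ```.loc``` indexing. This only mimics the result above; it assumes the window does not start in the first row, so the lagged value ```A(-1)``` always exists.

```python
import pandas as pd

df = pd.DataFrame(100.0, index=range(2020, 2026), columns=['A'])

start, end = 2023, 2025      # the <2023 2025> window; start must not be the first row
res = df.copy()
res['X'] = 0.0               # the new variable defaults to zero everywhere
for year in range(start, end + 1):
    res.loc[year, 'A'] = res.loc[year - 1, 'A'] + 2   # diff(a) = 2
    res.loc[year, 'X'] = res.loc[year, 'A'] + 42      # x = a + 42

print(res)
```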
