# Software Engineering for Economists

## Building Confidence in a Model

* ***Verification*** How accurately does the computational model solve the underlying equations of the model for the quantities of interest?

* ***Validation*** How accurately does the model represent reality for the quantities of interest?

* ***Uncertainty Quantification*** How do the various sources of error and uncertainty feed into uncertainty in the model-based prediction of the quantities of interest?

<img src='images/conceptual_model.png' width=750 height=750>

<h3>Developing, Estimating, and Validating Dynamic Microeconometric Models</h3>
<br>
<center><a href="http://www.youtube.com/watch?feature=player_embedded&v=0hazaPBAYWE" target="_blank"><img src="http://img.youtube.com/vi/0hazaPBAYWE/0.jpg" alt="Video thumbnail" width="500" height="400" border="10" /></a></center>

### Software Engineering as Part of the Verification Step

* verify correctness

* increase transparency

* free cognitive resources

$\Rightarrow$ work responsibly

<img src='images/venn_diagram.png' width=750 height=750>

### Basic Elements



<style>
table,td,tr,th {border:none!important}
</style>
<table style="width:90%">
  <tbody>
    <tr height="40">
      <td style="vertical-align:top; padding-left:30px;">
          <li>Version Control</li>
      </td>
      <td style="vertical-align:top; padding-left:30px;">
          <li>Code Review</li>
      </td>
    </tr>
    <tr height="40">
      <td style="vertical-align:top; padding-left:30px;">
          <li>Testing</li>
      </td>
      <td style="vertical-align:top; padding-left:30px;">
          <li>Profiling </li>
      </td>
    </tr>
    <tr height="40">
      <td style="vertical-align:top; padding-left:30px;">
          <li>Continuous Integration</li>
      </td>
    </tr>
  </tbody>
</table> 


## Research Example

### Ambiguity in Dynamic Models of Educational Choice

* Plausible

    * better description of agent decision problem
    
* Meaningful

    * reinterpretation of economic phenomenon
    * reevaluation of policy interventions

* Tractable

### Computational Challenges and Software Engineering

* ensure correctness of implementation

* manage numerous numerical components

* address performance constraints

* guarantee recomputability and extensibility

* ...

$\Rightarrow$ building, documenting, and communicating expertise

## Running Example

Throughout this lecture we will work with a simple example from expected utility theory. We consider a simple utility function that captures an agent's preferences over uncertain outcomes.

$$ u(x) = x^\alpha,$$

where $\alpha, x > 0$. We can calculate the expected utility (EU) as:

$$E U= \int_{-\infty}^{\infty} u(x) f(x)dx,$$

where $f(x)$ is the probability density function of $x$. In our case, $x$ is drawn from a lognormal distribution, where $\mu$ and $\sigma$ denote the mean and standard deviation of $\ln x$. To solve the integral, we implement a simple Monte Carlo integration. See Skrainka & Judd (2013) for the importance of choosing the right integration strategy.
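
Because the utility function is a power function, this particular integral also has a closed form, which makes a convenient benchmark for any numerical integration strategy. A short derivation, assuming that $\ln x$ is normally distributed with mean $\mu$ and standard deviation $\sigma$:

$$x^\alpha = e^{\alpha \ln x}, \qquad \alpha \ln x \sim \mathcal{N}(\alpha\mu, \alpha^2\sigma^2) \quad\Rightarrow\quad E U = E[x^\alpha] = \exp\left(\alpha\mu + \tfrac{1}{2}\alpha^2\sigma^2\right).$$

For $\alpha = 1$ this reduces to $\exp(\mu + \sigma^2/2)$, the mean of the lognormal distribution.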

In [1]:
# SciPy Stack
import numpy as np

In [2]:
def get_expected_utility(alpha, mean, sd):
    """ Get the expected returns by drawing numerous
    random deviates from a lognormal distribution.
    """
    # Guard interface
    assert (isinstance(mean, float))
    assert (isinstance(sd, float))
    assert (isinstance(alpha, float))
    assert (sd >= 0.00)
    assert (alpha >= 0.00)
    
    # Set parametrization for Monte Carlo 
    # integration.
    num_draws = 10000000
    
    # Draw the deviates from the lognormal distribution.
    deviates = get_random_deviates(mean, sd, num_draws)
    
    # Calculate the average utility from all deviates.
    rslt = np.mean(deviates ** alpha)
    
    # Check result
    assert (isinstance(rslt, float))
        
    # Finishing
    return rslt

In [3]:
def get_random_deviates(mean, sd, num_draws):
    """ Get random deviates from a lognormal 
    distribution.
    """
    # Draw deviates from lognormal distribution.
    deviates = []
    for _ in range(num_draws):
        deviate = np.random.lognormal(mean, sd)
        deviates += [deviate]
    
    # Type Conversion
    deviates = np.array(deviates)
    
    # Finishing
    return deviates

## Version Control

Version control records changes to a file or set of files over time so that you can recall (and compare) specific versions later. This part of the lecture draws heavily on the free e-book [*Pro Git*](http://www.git-scm.com/book/en/v2) and Blischak et al. (2016).

Tracking changes to your code over time has a variety of benefits:

* Traceability

* Clarity 

* Flexibility

* Reduced duplication and error 

$\Rightarrow$ transparent, reproducible, and collaborative research

In our quick tour, we will do the following:

* set up a local repository

* commit files

* track changes to files

* compare versions of files 

Let us initialize a repository ...

```shell
$ git init
Initialized empty Git repository in .../git_material/.git/
```

... and provide some information about ourselves.

```shell
$ git config user.name "Philipp Eisenhauer"
$ git config user.email "eisenhauer@policy-lab.org"
```

First of all, let us take stock and check the status of our local repository:

```shell
$ git status
On branch master

Initial commit

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	auxiliary.py
	get_expected_utility.py
	integration_rules.py
	utility_functions.py
```

<img src='images/lifecycle.png' width=1000 height=1000>

We now set up tracking of all our files and revisit the status:

```shell
$ git add *.py
$ git status
On branch master

Initial commit

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

	new file:   auxiliary.py
	new file:   get_expected_utility.py
	new file:   integration_rules.py
	new file:   utility_functions.py
```

We are now ready for our initial commit.

```shell
$ git commit -a -m'Initial commit of directory content.'
[master (root-commit) eaafed6] Initial commit of directory content.
 4 files changed, 92 insertions(+)
 create mode 100644 auxiliary.py
 create mode 100644 get_expected_utility.py
 create mode 100644 integration_rules.py
 create mode 100644 utility_functions.py
```

Adding a message to our commit makes it much easier to return to it later.
    
```shell
$ git log
commit eaafed680ea54c95393f35faf56f160d560e6f1e
Author: Philipp Eisenhauer <eisenhauer@policy-lab.org>
Date:   Mon Jan 25 14:45:28 2016 +0100

    Initial commit of directory content.

```  


This brings us back to the beginning of our repository lifecycle.

```shell
$ git status
On branch master
nothing to commit, working directory clean
```   

Let us modify the default number of simulation draws in *integration_rules.py* using our favourite editor and check for the current status of our files.

```shell
$ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   integration_rules.py

no changes added to commit (use "git add" and/or "git commit -a")
```   



What exactly is the difference?

```shell
$ git diff
```  

In case we want to discard our changes now:

```shell
$ git checkout integration_rules.py
```  

This discards our changes immediately. However, we can always return to any version of a file that was ever committed.

```shell
$ git checkout IDENTIFIER integration_rules.py
```  


Once you are familiar with the basic workflow for versioning your own code, the natural next step is to share your code online. [*GitHub*](https://github.com) is a popular online hosting service. Among other benefits, you then have a backup copy of your files and can establish a transparent workflow with your co-authors.

Let us check out the repository for my current research on ambiguity [here](https://github.com/robustToolbox).

For a more detailed look at what *Git* has to offer, check out [Blischak et al. (2016)](http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004668).

## Testing

### Types of Tests

* Unit Tests

* Integration Tests

* Regression Tests 

In [4]:
def generate_random_request():
    """ Generate a random admissible request.
    """
    # Draw random deviates that honor the support
    # constraints.
    mean = np.random.normal()
    alpha, sd = np.random.uniform(size=2)

    # Finishing
    return alpha, mean, sd

In [5]:
# Generate and evaluate a random request.
alpha, mean, sd = generate_random_request()
rslt = get_expected_utility(alpha, mean, sd)
# Print out request and result
print('Request: {0:.3f}, {1:.3f}, {2:.3f}  Result: {3:.3f}'.format(alpha, mean, sd, rslt))

Request: 0.235, 1.078, 0.745  Result: 1.309


In [6]:
def test_random_requests():
    """ Draw a whole host of random requests to 
    ensure that the function works for all admissible 
    values.
    """    
    for _ in range(5):
        # Get expected utility.
        alpha, mean, sd = generate_random_request()
        get_expected_utility(alpha, mean, sd)


def test_results():
    """ Test some previous knowledge about the results.
    """
    for _ in range(5):
        # Get expected utility.
        alpha, mean, sd = generate_random_request()
        rslt = get_expected_utility(alpha, mean, sd)
        # Assertions
        assert rslt > 0   


In [7]:
def test_closed_form():
    """ Test the simulated result against the closed 
    form solution in the special case of linear utility.
    """
    for _ in range(1):
        _, mean, sd = generate_random_request()
        alpha = 1.0
        # Get expected returns using simulation.
        simulated = get_expected_utility(alpha, mean, sd)
        # Get expected returns using closed form.
        closed = np.exp(mean + (sd ** 2) * 0.5)
        # Assertions. Note the small number of decimal places
        # required. Given the precision of the Monte Carlo integration,
        # this test is bound to fail sometimes.
        np.testing.assert_almost_equal(closed, simulated, decimal=3)
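
The closed-form check above can be made less fragile by fixing the seed and scaling the tolerance with the Monte Carlo standard error. The following is a minimal sketch rather than the implementation from the repository: `simulate_expected_utility` is a hypothetical, compact stand-in for `get_expected_utility` with an explicit `seed` and a smaller `num_draws`, and the closed form for general `alpha` assumes that `mean` and `sd` parametrize the underlying normal distribution, as in `np.random.lognormal`.

```python
import numpy as np


def simulate_expected_utility(alpha, mean, sd, num_draws=100000, seed=123):
    """ Hypothetical stand-in for get_expected_utility that takes an
    explicit seed and also returns the Monte Carlo standard error.
    """
    # A dedicated random state keeps the draws reproducible.
    rng = np.random.RandomState(seed)
    utilities = rng.lognormal(mean, sd, size=num_draws) ** alpha
    # Point estimate and standard error of the sample mean.
    return utilities.mean(), utilities.std(ddof=1) / np.sqrt(num_draws)


def test_closed_form_general_alpha():
    """ Compare the simulation to the closed form for a general alpha,
    allowing a deviation of five standard errors.
    """
    alpha, mean, sd = 0.5, 0.2, 0.7
    simulated, std_err = simulate_expected_utility(alpha, mean, sd)
    closed = np.exp(alpha * mean + 0.5 * (alpha * sd) ** 2)
    assert abs(simulated - closed) < 5 * std_err


def test_reproducibility():
    """ Regression-style check: the same seed yields the same result.
    """
    first = simulate_expected_utility(0.5, 0.2, 0.7)
    second = simulate_expected_utility(0.5, 0.2, 0.7)
    assert first == second


test_closed_form_general_alpha()
test_reproducibility()
```

Tying the tolerance to the standard error means the test tightens automatically as `num_draws` grows, instead of relying on a hand-picked number of decimal places.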

In principle, we can now run our tests one-by-one:

In [8]:
test_random_requests()

test_results()

test_closed_form()


How about test automation? Let us now run our test battery in the terminal. See our repository [here](https://github.com/softEcon/talks/blob/master/ZICE/software_engineering/tests.py) for the script. We are using [*py.test*](http://pytest.org). As usual, several alternatives exist: (1) [*nose*](https://nose.readthedocs.org/en/latest/), (2) [unittest](https://docs.python.org/2/library/unittest.html#module-unittest).

In [None]:
%%bash
py.test tests.py --verbose 

How do we know how much of our code base we in fact cover with our testing efforts so far?

In [None]:
%%bash
py.test tests.py --cov=sandbox

## Automated Code Review

There are multiple dimensions to code quality:

* Correctness

* Maintainability

* Readability

* Scalability

Several tools have helped me improve the quality of my code along these dimensions: (1) [*QuantifiedCode*](https://www.quantifiedcode.com/) and (2) [*Codacy*](https://www.codacy.com). Let us visit *QuantifiedCode* online and take a look around.

<img src='images/quantifiedcode.png' width=950 height=1000>

Using these tools helps you improve your own programming skills over time, as each issue is well explained and accompanied by best practices. More lightweight solutions are also available: (1) [*Pylint*](http://www.pylint.org/), (2) [*Pyflakes*](https://github.com/pyflakes/pyflakes), and (3) [*pep8*](https://github.com/PyCQA/pep8).

## Profiling

In [None]:
%load_ext snakeviz

In [None]:
# standard library
import cProfile

import sandbox
from sandbox import get_expected_utility

# To begin, let us look at a table of profile data.
cProfile.run("get_expected_utility(1.0, 0.0, 1.0)")

<img src='images/snakeviz.png' width=950 height=1000>

Let us inspect our code.

In [None]:
cProfile.run("get_expected_utility(1.0, 0.0, 1.0)", "sandbox.prof")
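
Passing a file name as the second argument to `cProfile.run` writes the raw statistics to disk instead of printing them. A minimal sketch of how such a file can be read back with the standard library's `pstats` module follows; `slow_draws` and the temporary file name are hypothetical stand-ins so that the example runs without the `sandbox` module.

```python
import cProfile
import os
import pstats
import tempfile

import numpy as np


def slow_draws(num_draws):
    """ Hypothetical stand-in for a loop-based sampler.
    """
    deviates = []
    for _ in range(num_draws):
        deviates.append(np.random.lognormal(0.0, 1.0))
    return np.array(deviates)


# Profile a single call and store the raw statistics in a file.
fname = os.path.join(tempfile.mkdtemp(), 'example.prof')
profiler = cProfile.Profile()
profiler.runcall(slow_draws, 10000)
profiler.dump_stats(fname)

# Load the file and print the five entries with the largest
# cumulative time.
stats = pstats.Stats(fname)
stats.sort_stats('cumulative').print_stats(5)
```

Sorting by cumulative time puts the functions that dominate the total runtime, including time spent in their callees, at the top of the table.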

In [None]:
%time rslt = get_expected_utility(1.0, 0.0, 1.0, version='slow')
print('')
%time rslt = get_expected_utility(1.0, 0.0, 1.0, version='fast')

cProfile.run("get_expected_utility(1.0, 0.0, 1.0, 'fast')", "sandbox_fast.prof")
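
The `sandbox` module behind the `version` argument is not shown here. One plausible source of such a speed difference, offered as an assumption rather than a description of that module, is replacing the Python loop in `get_random_deviates` with a single vectorized NumPy call. The names `get_random_deviates_slow` and `get_random_deviates_fast` are hypothetical.

```python
import time

import numpy as np


def get_random_deviates_slow(mean, sd, num_draws):
    """ Loop-based sampler, mirroring get_random_deviates.
    """
    deviates = []
    for _ in range(num_draws):
        deviates += [np.random.lognormal(mean, sd)]
    return np.array(deviates)


def get_random_deviates_fast(mean, sd, num_draws):
    """ Hypothetical vectorized sampler that draws all deviates
    in a single NumPy call.
    """
    return np.random.lognormal(mean, sd, size=num_draws)


# Time both samplers on the same number of draws.
for sampler in (get_random_deviates_slow, get_random_deviates_fast):
    start = time.time()
    deviates = sampler(0.0, 1.0, 100000)
    elapsed = time.time() - start
    print('{0}: {1:.4f} seconds'.format(sampler.__name__, elapsed))
```

Both functions return an array of the same shape drawn from the same distribution, so the rest of the computation does not need to change.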

## Continuous Integration Workflow

**Automation Steps**

* Version Control

* Testing and Coverage

* Code Review

* Build

* ...

using [GitHub](https://github.com) and its [integrations](https://github.com/integrations).

<img src='images/continous_integration.png' width=950 height=1000>

## Best Practices

* Set up a scalable workflow right from the beginning.

* Build your model in a hierarchical way.

* Develop a testing harness as you implement and refine your code.

* Initially focus on readability, then tackle performance issues.


## Next Steps

* Check out more detailed lectures at the *Software Engineering for Economists Initiative* [online](https://github.com/softEcon).
    * Contribute a lecture on a special topic of your choice. See [here](http://nbviewer.ipython.org/github/softEcon/specials/blob/master/toolkit_for_advanced_optimization/lecture.ipynb) for an example.
    
* Explore the material on [*Software Carpentry*](http://software-carpentry.org).
    
* Sign up for a personal [*GitHub*](https://github.com/) account and create a remote repository.

* If you are using a Windows machine, download and install [*Ubuntu Desktop*](http://www.ubuntu.com/desktop).

* Set aside a day or two to establish a continuous integration workflow for your current research project.

<style>
li { margin: 1em 3em; padding: 0.2em; }
</style>

<h2>Contact</h2>

<br><br>
<b>Philipp Eisenhauer</b>
<ul>
  <li> Mail  <a href="mailto:eisenhauer@policy-lab.org">eisenhauer@policy-lab.org</a></li><br>
   <li>Web  <a href="http://www.policy-lab.org/peisenha">http://www.policy-lab.org/peisenha</a></li><br>
  <li>Repository  <a href="https://github.com/peisenha">https://github.com/peisenha</a></li>
</ul>

<br><br>
<b>Software Engineering for Economists Initiative</b>
<ul>
  <li>Repository  <a href="http://softecon.github.io">http://softecon.github.io</a></li><br>
</ul>

