# Software Engineering for Economists

## Building Confidence in a Model

* ***Verification*** How accurately does the computation solve the underlying equations of the model for the quantities of interest?

* ***Validation*** How accurately does the model represent reality for the quantities of interest?

* ***Uncertainty Quantification*** How do the various sources of error and uncertainty feed into uncertainty in the model-based prediction of the quantities of interest?

<img src='images/conceptual_model.png' width=750 height=750>

### Developing, Estimating, and Validating Dynamic Microeconometric Models

<br>
<center><a href="http://www.youtube.com/watch?feature=player_embedded&v=0hazaPBAYWE" target="_blank"><img src="http://img.youtube.com/vi/0hazaPBAYWE/0.jpg" alt="Video thumbnail" width="500" height="400" border="10" /></a></center>


### Software Engineering facilitates Verification Step

* verify correctness

* increase transparency

* free cognitive resources

<img src='images/venn_diagram.png' width=750 height=750>

## Research Example

### Ambiguity in Dynamic Models of Educational Choice

* Plausible

    * better description of agent decision problem
    
* Meaningful

    * reinterpretation of economic phenomenon
    * reevaluation of policy interventions

* Tractable

### Computational Challenges and Software Engineering

* ensure correctness of implementation

* manage numerous numerical components

* address performance constraints

* guarantee recomputability and extensibility

* ...

$\Rightarrow$ building, documenting, and communicating expertise



## Example

Throughout this lecture we will work with a simple example. We are interested in writing code to calculate an agent's expected utility $EU$ from a lottery. The generic problem is

$$EU = \int_{-\infty}^{\infty} u(x) f(x)dx,$$

where $x$ is a possible outcome, $f(x)$ denotes the probability density function, and $u(x)$ the utility function.

We start with a simple utility function

$$ u(x) = x^{0.3},$$

where $x > 0$ and $x$ is drawn from a standard lognormal distribution with shape parameter $s$. To solve the integral, we opt for naive Monte Carlo integration. See [Skrainka & Judd (2013)](http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1870703) for the importance of choosing the right integration strategy.
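
For reference, naive Monte Carlo integration replaces the integral with a sample average over $N$ independent draws $x_i$ from $f$. The second line is a closed-form benchmark for this particular example. It assumes the usual parametrization $\ln x \sim \mathcal{N}(0, s^2)$, which is my reading of the setup rather than something stated explicitly above.

$$\widehat{EU}_N = \frac{1}{N} \sum_{i=1}^{N} u(x_i), \qquad x_i \sim f,$$

$$E\left[x^{0.3}\right] = E\left[e^{0.3 \ln x}\right] = \exp\left(\frac{0.3^2 s^2}{2}\right) = \exp\left(0.045\, s^2\right).$$

For $s = 1$ the benchmark is $\exp(0.045) \approx 1.046$, so any implementation of the integral should land close to this value.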

# Prototyping

In [None]:
from scipy.stats import lognorm

deviates = lognorm.rvs(1, size=1000)

Rslt = 0.0
for deviate in deviates:
    Rslt += deviate ** 0.3

print('Expected Utility {}'.format(Rslt/1000))

### Problems

* Hard-coded problem parameters are spread out across the code.

* No naming convention.

* There is no obvious way the code scales to solve a more generic problem.

* The implementation is not tested.

* We have not spent any time on performance considerations.
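
A first step away from the prototype could look like the sketch below. It is only an illustration: the function names `power_utility` and `expected_utility` and their arguments are hypothetical, and I use the standard library's random number generator so that the snippet has no dependencies. The closed-form value in the last lines assumes $\ln x \sim \mathcal{N}(0, s^2)$.

```python
import math
import random


def power_utility(x, alpha=0.3):
    """Evaluate the utility function u(x) = x ** alpha for x > 0."""
    return x ** alpha


def expected_utility(utility, shape=1.0, num_draws=1000, seed=123):
    """Approximate E[u(x)] by naive Monte Carlo integration, where ln(x)
    is normal with mean zero and standard deviation ``shape``.
    """
    # A dedicated generator keeps the result reproducible for a given seed.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_draws):
        total += utility(rng.lognormvariate(0.0, shape))
    return total / num_draws


approx = expected_utility(power_utility, num_draws=100000)
# Benchmark under the stated assumption: E[x ** a] = exp(a ** 2 * s ** 2 / 2).
exact = math.exp(0.5 * 0.3 ** 2)
print('Expected Utility {:.4f} (closed form {:.4f})'.format(approx, exact))
```

All problem parameters now enter through the function signature, and the number of draws appears in exactly one place.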

### What would a more mature implementation look like?

$\quad$
```shell

run.py

.. /testing
   ... tests.py
   ... checks.py

eu_calculations.py
integration_rules.py
utility_functions.py

auxiliary.py
```

### Main Improvements

* Clear and extensible structure

* Testing infrastructure available

* Performance considerations addressed

* Alternative integration strategies available
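
To give an idea of what "alternative integration strategies" within an extensible structure might mean, here is a small sketch in which the integration rule is selected by name. Everything in it is hypothetical and not taken from the actual modules: the function names, the `INTEGRATION_RULES` dictionary, and the deterministic quantile-based rule that I picked as the second strategy purely for illustration.

```python
import math
import random
from statistics import NormalDist


def naive_monte_carlo(func, shape, num_points, seed=123):
    """Average func over random draws from the lognormal distribution."""
    rng = random.Random(seed)
    draws = (rng.lognormvariate(0.0, shape) for _ in range(num_points))
    return sum(func(x) for x in draws) / num_points


def quantile_midpoint(func, shape, num_points):
    """Average func over the lognormal quantiles at the midpoints of an
    equally spaced probability grid. This rule is deterministic.
    """
    normal = NormalDist()
    total = 0.0
    for i in range(num_points):
        prob = (i + 0.5) / num_points
        total += func(math.exp(shape * normal.inv_cdf(prob)))
    return total / num_points


# Adding a strategy only requires a new function and a new entry here.
INTEGRATION_RULES = {
    'naive_mc': naive_monte_carlo,
    'quantile_midpoint': quantile_midpoint,
}


def compute_expected_utility(alpha, shape, technique, num_points=10000):
    """Dispatch the calculation to the requested integration rule."""
    rule = INTEGRATION_RULES[technique]
    return rule(lambda x: x ** alpha, shape, num_points)


for technique in sorted(INTEGRATION_RULES):
    print(technique, compute_expected_utility(0.3, 1.0, technique))
```

The calling code never needs to know how a rule works internally, which is what makes swapping strategies cheap.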

## Goal of the Lectures

Provide some guidance on how to get from a prototype to a more mature implementation of your research project in a systematic way.

### Dimensions of Code Quality

* Correctness

* Maintainability

* Readability

* Scalability

### Basic Tools

<style>
table,td,tr,th {border:none!important}
</style>
<table style="width:90%">
  <tbody>
    <tr height="40">
      <td style="vertical-align:top; padding-left:30px;">
          <li>Version Control</li>
      </td>
      <td style="vertical-align:top; padding-left:30px;">
          <li>Code Review</li>
      </td>
    </tr>
    <tr height="40">
      <td style="vertical-align:top; padding-left:30px;">
          <li>Testing</li>
      </td>
      <td style="vertical-align:top; padding-left:30px;">
          <li>Profiling </li>
      </td>
    </tr>
    <tr height="40">
      <td style="vertical-align:top; padding-left:30px;">
          <li>Continuous Integration</li>
      </td>
    </tr>
  </tbody>
</table> 

## Version Control

Version Control records changes to a file or set of files over time so that you can recall (and compare) specific versions later. I draw heavily on the free e-book [*Pro Git*](http://www.git-scm.com/book/en/v2) and [Blischak et al. (2016)](http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004668).

Tracking changes to your code over time has a variety of benefits:

* Tractability

* Clarity 

* Flexibility

* Reduced duplication and error 

$\Rightarrow$ transparent, reproducible, and collaborative research

In our quick tour, we will do the following:

* set up local repository

* commit files

* track changes to files

* compare versions of files 

Let us initialize a repository ...

```shell
$ git init
Initialized empty Git repository in material/.git/
```

... and provide some information about ourselves.

```shell
$ git config user.name "Philipp Eisenhauer"
$ git config user.email "eisenhauer@policy-lab.org"
```

First of all, let us take stock and check the status of our local repository:

```shell
$ git status
On branch master

Initial commit

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	__init__.py
	auxiliary.py
	eu_calculations.py
	integration_rules.py
	run.py
	testing/
	utility_functions.py
    
```

<img src='images/lifecycle.png' width=1000 height=1000>

We now set up tracking of all our files and revisit the status:

```shell
$ git add *
$ git status
On branch master

Initial commit

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

	new file:   __init__.py
	new file:   auxiliary.py
	new file:   eu_calculations.py
	new file:   integration_rules.py
	new file:   run.py
	new file:   testing/__init__.py
	new file:   testing/checks.py
	new file:   testing/tests.py
	new file:   utility_functions.py
```

We are now ready for our initial commit.

```shell
$ git commit -a -m'Initial commit of directory content.'
 9 files changed, 443 insertions(+)
 create mode 100644 __init__.py
 create mode 100644 auxiliary.py
 create mode 100644 eu_calculations.py
 create mode 100644 integration_rules.py
 create mode 100644 run.py
 create mode 100644 testing/__init__.py
 create mode 100644 testing/checks.py
 create mode 100644 testing/tests.py
 create mode 100644 utility_functions.py
 
```

By adding a message to our commit, it is much easier to return to it later.
    
```shell
$ git log
commit 49593b8c447a9f023a35ff724a3467db2fe10037
Author: Philipp Eisenhauer <eisenhauer@policy-lab.org>
Date:   Mon Feb 1 21:11:58 2016 +0100

    Initial commit of directory content.

```  


This brings us back to the beginning of our repository lifecycle.

```shell
$ git status
On branch master
nothing to commit, working directory clean
```   

Let us modify the setup of the random seed in *integration_rules.py* using our favourite editor and check for the current status of our files.

```shell
$ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   integration_rules.py

no changes added to commit (use "git add" and/or "git commit -a")
```   



What exactly is the difference?

```shell
$ git diff
diff --git a/integration_rules.py b/integration_rules.py
index 708080c..d54aa06 100644
--- a/integration_rules.py
+++ b/integration_rules.py
@@ -19,7 +19,8 @@ def naive_monte_carlo(func, bounds, num_draws, implementation, seed):
     lower, upper = bounds
 
     # Draw requested number of deviates.
-    np.random.seed(seed)
+
+    np.random.seed(123)
     deviates = np.random.uniform(lower, upper, size=num_draws)
 
     # Implement native Monte Carlo approach.
```  



In case we want to discard our changes now:

```shell
$ git checkout integration_rules.py
```  

In this case, we discard our changes immediately. However, we can always return to any version of our file ever committed.

```shell
$ git checkout IDENTIFIER integration_rules.py
```  

Once you are familiar with the basic workflow for versioning your own code, the natural next step is to share your code online. [*GitHub*](https://github.com) is a popular online hosting service. Among other benefits, you then have a backup copy of your files and can establish a transparent workflow with your co-authors.

Let us check out the repository for my current research on ambiguity [here](https://github.com/robustToolbox).


## Testing

### Types of Tests

* Unit Tests

* Integration Tests

* Regression Tests 

* ...
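
To make the distinction between these types a bit more concrete, here is a minimal sketch. The functions `utility` and `expected_utility` are hypothetical stand-ins written only for this illustration, and the closed-form value assumes $\ln x \sim \mathcal{N}(0, s^2)$.

```python
import math
import random


def utility(x, alpha):
    """Power utility, the single unit under test."""
    return x ** alpha


def expected_utility(alpha, shape, num_draws, seed):
    """Naive Monte Carlo estimate that combines drawing and evaluation."""
    rng = random.Random(seed)
    draws = [rng.lognormvariate(0.0, shape) for _ in range(num_draws)]
    return sum(utility(x, alpha) for x in draws) / num_draws


def test_unit_utility():
    """Unit test: one function, checked against values known by hand."""
    assert utility(1.0, 0.3) == 1.0
    assert math.isclose(utility(4.0, 0.5), 2.0)


def test_integration_against_closed_form():
    """Integration test: the components together should come close to the
    analytical value exp(alpha ** 2 * shape ** 2 / 2).
    """
    approx = expected_utility(0.3, 1.0, num_draws=100000, seed=123)
    assert math.isclose(approx, math.exp(0.045), abs_tol=0.01)


def test_regression_same_seed():
    """Regression-style check: a fixed seed must reproduce the result. A
    real regression test would compare against a number stored earlier.
    """
    first = expected_utility(0.3, 1.0, num_draws=1000, seed=123)
    assert first == expected_utility(0.3, 1.0, num_draws=1000, seed=123)


for test in (test_unit_utility,
             test_integration_against_closed_form,
             test_regression_same_seed):
    test()
print('all tests passed')
```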


### Example Tests

In [None]:
def test_random_requests():
    """ Draw a whole host of random requests to ensure that the function
    works for all admissible values.
    """
    for _ in range(100):
        # Generate random request.
        alpha, shape, technique, int_options = generate_random_request()
        # Perform calculation.
        get_baseline_lognormal(alpha, shape, technique, int_options)

In [None]:
def test_naive_implementations():
    """ Test whether the results from the fast and slow implementation of the
    naive Monte Carlo integration are identical.
    """
    technique = 'naive_mc'
    for _ in range(10):
        # Generate random request.
        alpha, shape, _, int_options = generate_random_request()
        # Loop over alternative implementations.
        baseline = None
        for implementation in ['fast', 'slow']:
            int_options['naive_mc']['implementation'] = implementation
            rslt = get_baseline_lognormal(alpha, shape, technique, int_options)
            if baseline is None:
                baseline = rslt
        # Test equality.
        np.testing.assert_almost_equal(baseline, rslt)

In [None]:
def test_regression():
    """ This regression test ensures that the results do not change unnoticed
    during refactoring.
    """
    # Fix the seed so that the random request is reproducible.
    np.random.seed(123)
    # Generate random request.
    alpha, shape, technique, int_options = generate_random_request()
    # Perform calculation.
    rslt = get_baseline_lognormal(alpha, shape, technique, int_options)
    # Ensure equivalence with expected results up to numerical precision.
    np.testing.assert_almost_equal(rslt, 0.21990743996551923)

### Key Challenges

* How do I come up with good tests?

* How can I make the process of testing the most convenient?

In principle, we can now run each of the tests one-by-one:

```shell
$ python testing/tests.py
```  
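
For a call like this to execute anything, the test module has to invoke its own test functions when it is run as a script. The sketch below shows one way this could be organized. It is an assumption on my part and not a description of the actual *tests.py*; the two placeholder tests and the helper `run_all_tests` are hypothetical.

```python
def test_addition():
    """Placeholder test standing in for the real tests."""
    assert 1 + 1 == 2


def test_power():
    """Second placeholder test."""
    assert 2 ** 3 == 8


def run_all_tests(namespace):
    """Call every function whose name starts with ``test_`` and return the
    names of the tests that were run.
    """
    executed = []
    for name in sorted(namespace):
        candidate = namespace[name]
        if name.startswith('test_') and callable(candidate):
            candidate()
            executed.append(name)
    return executed


if __name__ == '__main__':
    # Only triggered when the file is run directly as a script.
    for name in run_all_tests(dict(globals())):
        print('passed', name)
```

Maintaining this by hand quickly becomes tedious, which is the motivation for a test runner.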


How about test automation? Let us now run our test battery in the terminal. We are using [*py.test*](http://pytest.org). As usual, several alternatives exist: (1) [*nose*](https://nose.readthedocs.org/en/latest/), (2) [unittest](https://docs.python.org/2/library/unittest.html#module-unittest).

```shell
$ py.test 
```  


How do we know how much of our code base we in fact cover with our testing efforts so far?

```shell
$ py.test --cov=example
```

## Automated Code Review


There are several tools out there that I found useful in the past to improve the quality of my code along these dimensions: (1) [*QuantifiedCode*](https://www.quantifiedcode.com/) and (2) [*Codacy*](https://www.codacy.com). Let us visit *QuantifiedCode* online and take a look around.

<img src='images/quantifiedcode.png' width=950 height=1000>

Using these tools helps you improve your own programming skills over time, as each issue is well explained and best practices are provided. More lightweight solutions are also available: (1) [*Pylint*](http://www.pylint.org/), (2) [*Pyflakes*](https://github.com/pyflakes/pyflakes), and (3) [*pep8*](https://github.com/PyCQA/pep8).

## Profiling

In [None]:
%load_ext snakeviz

In [None]:
# standard library
import cProfile

import sandbox
from sandbox import get_expected_utility

# To begin, let us print a table of profile data.
cProfile.run("get_expected_utility(1.0, 0.0, 1.0)")

<img src='images/snakeviz.png' width=950 height=1000>

Let us inspect our code.

In [None]:
cProfile.run("get_expected_utility(1.0, 0.0, 1.0)", "sandbox.prof")
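
Profile data can also be inspected without leaving Python by using `pstats` from the standard library. `pstats.Stats` accepts either the name of a saved profile file or a profiler object. As the `sandbox` module is not available here, the sketch below profiles a hypothetical stand-in function, `slow_sum_of_powers`, and passes the profiler object directly.

```python
import cProfile
import io
import pstats


def slow_sum_of_powers(num_terms, alpha):
    """Deliberately simple loop that serves as the profiling target."""
    total = 0.0
    for i in range(1, num_terms + 1):
        total += i ** alpha
    return total


profiler = cProfile.Profile()
profiler.enable()
slow_sum_of_powers(100000, 0.3)
profiler.disable()

# Collect the report in a string buffer instead of printing it directly.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
# Sort by cumulative time and restrict the table to the top five entries.
stats.sort_stats('cumulative').print_stats(5)
report = stream.getvalue()
print(report)
```

Sorting by cumulative time is usually a good starting point, as it points to the functions in which the program spends most of its time overall.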

In [None]:
%time rslt = get_expected_utility(1.0, 0.0, 1.0, version='slow')
print('')
%time rslt = get_expected_utility(1.0, 0.0, 1.0, version='fast')

cProfile.run("get_expected_utility(1.0, 0.0, 1.0, 'fast')", "sandbox_fast.prof")
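
What might distinguish a slow from a fast version? A common source of speedups in numerical Python code is replacing explicit loops with vectorized NumPy operations. The sketch below illustrates this under my own assumptions. The functions `evaluate_slow` and `evaluate_fast` are hypothetical and not the implementations inside `sandbox`.

```python
import timeit

import numpy as np


def evaluate_slow(deviates, alpha):
    """Average utility computed with an explicit Python loop."""
    total = 0.0
    for deviate in deviates:
        total += deviate ** alpha
    return total / len(deviates)


def evaluate_fast(deviates, alpha):
    """Average utility computed with a vectorized NumPy expression."""
    return np.mean(deviates ** alpha)


rng = np.random.RandomState(123)
deviates = rng.lognormal(mean=0.0, sigma=1.0, size=100000)

# Both versions must agree before it makes sense to compare their speed.
np.testing.assert_allclose(evaluate_slow(deviates, 0.3),
                           evaluate_fast(deviates, 0.3))

for func in (evaluate_slow, evaluate_fast):
    seconds = timeit.timeit(lambda: func(deviates, 0.3), number=5)
    print('{:<15}{:.4f} seconds'.format(func.__name__, seconds))
```

Checking for identical results first mirrors the idea behind `test_naive_implementations`: a faster implementation is only useful if it is still correct.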

## Continuous Integration Workflow

**Automation Steps**

* Version Control

* Testing and Coverage

* Code Review

* Build

* ...

using [GitHub](https://github.com) and its [integrations](https://github.com/integrations).

<img src='images/continous_integration.png' width=950 height=1000>

## Best Practices

* Set up a scalable workflow right from the beginning.

* Build your model in a hierarchical way.

* Develop a testing harness as you implement and refine your code.

* Initially focus on readability and then tackle performance issues.


## Next Steps

* Check out more detailed lectures at the *Software Engineering for Economists Initiative* [online](https://github.com/softEcon).
    * Contribute a lecture on a special topic of your choice. See [here](http://nbviewer.ipython.org/github/softEcon/specials/blob/master/toolkit_for_advanced_optimization/lecture.ipynb) for an example.
    
* Explore the material on [*Software Carpentry*](http://software-carpentry.org).
    
* Sign up for a personal [*GitHub*](https://github.com/) account and create a remote repository.

* If you are using a Windows machine, download and install [*Ubuntu Desktop*](http://www.ubuntu.com/desktop).

* Set aside a day or two to establish a continuous integration workflow for your current research project.

<style>
li { margin: 1em 3em; padding: 0.2em; }
</style>

<h2>Contact</h2>

<br><br>
<b>Philipp Eisenhauer</b>
<ul>
  <li> Mail  <a href="mailto:eisenhauer@policy-lab.org">eisenhauer@policy-lab.org</a></li><br>
   <li>Web  <a href="http://www.policy-lab.org/peisenha">http://www.policy-lab.org/peisenha</a></li><br>
  <li>Repository  <a href="https://github.com/peisenha">https://github.com/peisenha</a></li>
</ul>

<br><br>
<b>Software Engineering for Economists Initiative</b>
<ul>
  <li>Repository  <a href="http://softecon.github.io">http://softecon.github.io</a></li><br>
</ul>

