### SynthPop Explanation
This notebook summarizes the way SynthPop works.

First, some notes about the structure of SynthPop.
It has a few dependencies:
- `census`, a Python library that wraps the U.S. Census Bureau API
- `pandas`, a Python library used to create and work with dataframes (like tables)
- `numpy`
- `os`, the Python standard library module used here to store the Census API key in an environment variable (not strictly necessary)

SynthPop itself consists of a few separate scripts, each handling a different aspect of the synthesis process:

If you are using ACS estimates via the Census API, use the following SynthPop tools:
- `synthesizer.py` which relies on a 'recipe' that is the output of:
    - `starter.py` which relies on data queried and formatted from the Census API by:
        - `census_helpers.py` which relies on the Census API python wrapper provided by: 
            - `census.py`
            
You can also use SynthPop with any other data source (i.e. not ACS data). To do this, use:
- `zone_synthesizer.py`, a set of functions that accepts marginals and sample files in CSV format and produces a synthesized population.

Here are slightly more detailed descriptions of each element of SynthPop:

The SynthPop module `census_helpers.py` relies on `census` and is a set of functions that assist with downloading and processing census data for a given geography. It lets you select the geography and columns of interest and download data at the block group or tract level (or both).

`synthesizer.py` uses a 'recipe' which is the output of the `starter.py` script.  

`starter.py` uses `census_helpers.py` to generate and return:
- household marginals
- person marginals
- household joint distribution
- person joint distribution
- tract to PUMA map (a dictionary showing the relationship between tracts and PUMAs)

It returns them as a 'recipe', i.e. bundled together rather than as easily accessible individual files.
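
To make the ingredients of the recipe concrete, here is a minimal sketch built on a made-up sample of households. The column names (`income`, `cars`), the totals, and the tract and PUMA identifiers are all hypothetical toy values, not SynthPop output. The sketch only illustrates what a marginal, a joint distribution, and a tract to PUMA map can look like as pandas objects and a dictionary.

```python
import pandas as pd

# Toy sample of households (hypothetical categories, not real survey records).
sample = pd.DataFrame({
    "income": ["low", "low", "high", "high", "high"],
    "cars":   ["none", "one", "one", "one", "two or more"],
})

# Joint distribution: how often each combination of categories occurs in the sample.
joint = sample.groupby(["income", "cars"]).size().rename("frequency")

# Marginals: totals for each category of one variable at a time.
# These are made-up totals for a single geography.
marginals = pd.Series({
    ("income", "low"): 40, ("income", "high"): 60,
    ("cars", "none"): 15, ("cars", "one"): 55, ("cars", "two or more"): 30,
})

# Tract to PUMA map: a plain dictionary keyed by tract ID (made-up IDs).
tract_to_puma = {"000100": "00100", "000200": "00100", "000300": "00200"}

print(joint)
print(marginals["income"].sum(), marginals["cars"].sum())
```

Note that the marginals for each variable add up to the same number of households, while the joint distribution comes from the sample and records which combinations occur together.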

### Using SynthPop:
**Step 1:**
Install SynthPop and its dependencies:
1. Install Python (Anaconda recommended) and the dependencies:
    - `census`
    - `numexpr`
    - `numpy`
    - `pandas`
    - `scipy`
    - `us`

   (Everything except `census` and `us` is included with Anaconda.)

2. Download the source code from GitHub. Install SynthPop by running `python setup.py install` in the `synthpop` directory.

**Step 2:**  
1. Load the libraries.

2. Set the API key. (If you don't already have one, you can get one [here](https://api.census.gov/data/key_signup.html).)
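
One way to avoid pasting the key into the notebook is to set it in the shell before launching Jupyter and read it back with `os.environ`. The sketch below assumes the variable is named `CENSUS`, matching the name used in this notebook; `get_census_key` is a hypothetical helper, not part of SynthPop.

```python
import os


def get_census_key(env_var="CENSUS"):
    """Return the Census API key stored in an environment variable.

    Hypothetical helper: raises a clear error when the key is missing
    instead of failing later inside an API call.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Census API key."
        )
    return key
```

With this in place, `Starter(get_census_key(), ...)` could be used wherever the notebook reads `os.environ["CENSUS"]` directly.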

In [1]:
%load_ext autoreload
%autoreload 2
from synthpop.recipes.starter2 import Starter
from synthpop.synthesizer import synthesize_all, enable_logging 
import os
import pandas as pd
enable_logging()

# Set the Census API key (replace the value below with your own key)
os.environ["CENSUS"] = "d95e144b39e17f929287714b0b8ba9768cecdc9f"

### Synthesis of a whole county with pre-set variables:
The following code takes a long time (multiple hours) to execute, but it runs the synthesis for every block group in a whole county.

Note: the county name must have the form 'Name County', e.g. 'Kings County'; otherwise it will not work.
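
A small check run before the long synthesis can catch a badly formatted name early. The helper below is hypothetical and only tests the 'Name County' pattern described in the note; it does not confirm that the county exists in the given state.

```python
def check_county_name(name):
    """Return the name unchanged if it looks like 'Name County', else raise."""
    parts = name.split()
    if len(parts) < 2 or parts[-1] != "County":
        raise ValueError(f"Expected a name like 'Kings County', got {name!r}")
    return name
```

For example, `check_county_name("Mecklenburg County")` returns the name, while `check_county_name("Mecklenburg")` raises a `ValueError`.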

In [None]:
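# TAKES FOREVER TO RUN, PLEASE SKIP
def synthesize_counties(counties):
    # Collect each county's synthesize_all result so the caller gets it back
    # (the original version returned None, leaving `hh` empty).
    results = {}
    for county in counties:
        starter = Starter(os.environ["CENSUS"], "NC", county)
        results[county] = synthesize_all(starter)
    return results
%time hh = synthesize_counties(["Mecklenburg County"])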

### Test synthesis for just one block group

In [2]:
starter = Starter(os.environ["CENSUS"], "NC", "Mecklenburg County")

In [None]:
# FIPS codes: state 37 (North Carolina), county 119 (Mecklenburg), then tract and block group
ind = pd.Series(["37", "119", "005706", "4"], index=["state", "county", "tract", "block group"])
synthesize_all(starter, indexes=[ind])
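
The index values above are the pieces of a 12-digit block group GEOID: 2 digits for the state, 3 for the county, 6 for the tract, and 1 for the block group. If you start from a GEOID string, a sketch like the following splits it into the same Series that the cell above builds by hand. `block_group_index` is a hypothetical helper, not part of SynthPop.

```python
import pandas as pd


def block_group_index(geoid):
    """Split a 12-digit block group GEOID into its FIPS components."""
    if len(geoid) != 12 or not geoid.isdigit():
        raise ValueError(f"Expected a 12-digit GEOID string, got {geoid!r}")
    return pd.Series(
        [geoid[:2], geoid[2:5], geoid[5:11], geoid[11]],
        index=["state", "county", "tract", "block group"],
    )


ind = block_group_index("371190057064")
print(ind.tolist())  # ['37', '119', '005706', '4']
```

Keeping the components as strings preserves the leading zeros in the tract code.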