# Percent change computation walkthrough
In this walkthrough, we're going to learn about computing new columns from existing columns. The ideal test of this is percent change. Percent change is a very common computation in data journalism, so knowing how to do it in Agate is important. As always, you start by importing Agate.

In [1]:
import agate

Now get some data. We'll be using [some old UCR data](https://www.dropbox.com/s/b22egk8gdsoyc9e/ucrdata.csv?dl=0) from the FBI that we've used before to demonstrate Excel.

In [3]:
crimes = agate.Table.from_csv('ucrdata.csv')

In [4]:
print(crimes)

|-------------------------+---------------|
|  column_names           | column_types  |
|-------------------------+---------------|
|  CleanState             | Text          |
|  APStyle                | Text          |
|  CleanName              | Text          |
|  2012Population         | Number        |
|  2012ViolentCrime       | Number        |
|  2012Murder             | Number        |
|  2012ForcibleRape       | Number        |
|  2012Robbery            | Number        |
|  2012AggravatedAssault  | Number        |
|  2012PropertyCrime      | Number        |
|  2012Burglary           | Number        |
|  2012LarcenyTheft       | Number        |
|  2012MotorVehicleTheft  | Number        |
|  2012Arson              | Number        |
|  2011Population         | Number        |
|  2011ViolentCrime       | Number        |
|  2011Murder             | Number        |
|  2011Forciblerape       | Number        |
|  2011Robbery            | Number        |
|  2011AggravatedAssault  | Numb

So the code for calculating a percent change is really quite easy. It's about the same as calculating a median or a mean. Instead of an aggregate, which works down a whole column of the table, we use compute, which works across each row of the table. Aggregate = one column in, one value out. Compute = one row in, one new value for that row. Got it?
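If the column-versus-row distinction feels abstract, here is a rough plain-Python sketch of the idea. It does not use Agate at all, and the cities and numbers are made up for illustration.

```python
from statistics import median

# Hypothetical mini-table: each dict is one row (one city). Values are made up.
rows = [
    {"city": "A", "crimes_2011": 100, "crimes_2012": 110},
    {"city": "B", "crimes_2011": 200, "crimes_2012": 180},
    {"city": "C", "crimes_2011": 50, "crimes_2012": 75},
]

# Aggregate-style: walk down ONE column and boil it down to ONE value.
median_2012 = median(row["crimes_2012"] for row in rows)

# Compute-style: walk across EACH row and produce one new value per row.
differences = [row["crimes_2012"] - row["crimes_2011"] for row in rows]

print(median_2012)   # a single number for the whole table
print(differences)   # one number for every row
```

The aggregate gives you one answer for the table. The compute gives you a whole new column.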

In [5]:
change1112 = crimes.compute([
    ('vc_change_1112', agate.PercentChange('2011ViolentCrime', '2012ViolentCrime')),
    ('pc_change_1112', agate.PercentChange('2011PropertyCrime', '2012PropertyCrime'))
])

TypeError: unsupported operand type(s) for -: 'NoneType' and 'NoneType'

Sad trombone. An unsupported operand type error means you just tried to do something with a thing that doesn't do that, like subtracting two words or capitalizing a number. In this case, we tried to subtract two null objects, which Python calls None. Null is nothing, and you cannot subtract nothing from nothing. So we're going to have to filter those rows out. We've done this before. You have code you can copy and reuse.
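You can reproduce the same error without Agate or the data file. This is a minimal sketch with two made-up rows, where None stands in for an empty cell in the CSV.

```python
# Hypothetical rows; None stands in for an empty cell. Values are made up.
rows = [
    {"city": "A", "before": 100, "after": 110},
    {"city": "B", "before": None, "after": None},
]

# Subtracting None from None raises the same kind of error we just saw.
try:
    rows[1]["after"] - rows[1]["before"]
except TypeError as error:
    message = str(error)
    print(message)

# The fix: keep only the rows where both values are actually present.
complete_rows = [
    row for row in rows
    if row["before"] is not None and row["after"] is not None
]
print(complete_rows)
```

Filtering first and calculating second is the same pattern we're about to use on the real table.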

In [5]:
changes = crimes.where(lambda row: row['2012ViolentCrime'] is not None)

Now that we've done that, we can calculate the change in reported violent crime and the reported property crime for each city in our table.

In [6]:
change1112 = changes.compute([
    ('vc_change_1112', agate.PercentChange('2011ViolentCrime', '2012ViolentCrime')),
    ('pc_change_1112', agate.PercentChange('2011PropertyCrime', '2012PropertyCrime')),
])
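If you want to see the arithmetic behind a percent change, here is a small sketch of the standard formula: new minus old, divided by old, times 100. The `percent_change` function is a hypothetical helper written for this example, not part of Agate, and the counts are made up.

```python
from decimal import Decimal

def percent_change(before, after):
    """Return the change from before to after as a percentage of before."""
    return (after - before) / before * 100

# Hypothetical counts, not taken from the UCR file.
growth = percent_change(Decimal("200"), Decimal("250"))
decline = percent_change(Decimal("200"), Decimal("150"))

# Order matters: swapping the arguments answers a different question.
swapped = percent_change(Decimal("250"), Decimal("200"))

print(growth, decline, swapped)
```

Notice that the old value goes first and the new value goes second. Going from 200 to 250 is not the same change as going from 250 to 200.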

And let's see what that looks like. 

In [7]:
change1112.print_table(max_rows=10)

|---------------+---------+--------------+----------------+------------------+------------+------------------+-------------+-----------------------+-------------------+--------------+------------------+-----------------------+-----------+----------------+------------------+------------+------------------+-------------+-----------------------+-------------------+--------------+------------------+-----------------------+----------------+-----------------------+------------+------------------+-------------+-----------------------+------------------------+--------------+------------------+-----------------------+----------------+-----------------------+------------+----------+-------------+-----------------------+------------------------+--------------+------------------+-----------------------+--------------------------------+--------------------------------|
|  CleanState   | APStyle | CleanName    | 2012Population | 2012ViolentCrime | 2012Murder | 2012ForcibleRape | 2012Robbery | 2012Ag

Oy. That's ugly. There's a handy little trick called select that lets us keep only the fields we need from the table. In this case, we need a city, a state and the changes in violent crime and property crime. So we're going to create a new table only for the purposes of printing it out.

In [8]:
for_printing = change1112.select(['CleanName', 'APStyle', 'vc_change_1112', 'pc_change_1112'])

In [9]:
sorted_cities = for_printing.order_by('vc_change_1112', reverse=True)
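Selecting and sorting are easy to picture in plain Python, too. This sketch uses a made-up list of rows, not Agate, to show the same two steps: keep a few fields, then sort from biggest to smallest.

```python
# Hypothetical rows with one column we don't want to print. Values are made up.
rows = [
    {"city": "A", "state": "Neb.", "population": 1000, "vc_change": 5.0},
    {"city": "B", "state": "Iowa", "population": 2000, "vc_change": 12.5},
    {"city": "C", "state": "Kan.", "population": 1500, "vc_change": -3.0},
]

# "Select": build new rows that keep only the named fields.
keep = ["city", "state", "vc_change"]
selected = [{name: row[name] for name in keep} for row in rows]

# "Order by, reversed": largest change first.
ranked = sorted(selected, key=lambda row: row["vc_change"], reverse=True)

# Limit how many rows get printed.
top_two = ranked[:2]
for row in top_two:
    print(row)
```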

In [10]:
sorted_cities.print_table(max_rows=10)

|------------------+---------+-------------------------------+--------------------------------|
|  CleanName       | APStyle |                vc_change_1112 |                pc_change_1112  |
|------------------+---------+-------------------------------+--------------------------------|
|  North Las Vegas | Nev.    | 117.0294494238156209987195903 | 91.02519848118743527787366241  |
|  Fullerton       | Calif.  |  47.7124183006535947712418301 | 10.90140845070422535211267606  |
|  Odessa          | Texas   |  47.4598930481283422459893048 | 18.60242501595405232929164008  |
|  Sioux Falls     | S.D.    |  43.4090909090909090909090909 |  7.09581474399830040365413214  |
|  Providence      | R.I.    |  38.6780905752753977968176255 | 30.79193310378750614854894245  |
|  Green Bay       | Wisc.   |  37.8016085790884718498659517 | 23.05785123966942148760330579  |
|  Antioch         | Calif.  |  30.5623471882640586797066015 | 22.82468370772011360702297960  |
|  Santa Clarita   | Calif.  |  30.53435

Much better. Much much better. But what's wrong with this? It's the percent change in the raw number of crimes, NOT the rate, so it ignores changes in population. In order to calculate the rates, we have to create a formula that does that. To create a formula, you'll use something called Formula, but it will call a function that you'll create. This will seem like a lot, but you'll see pretty quickly that it's pretty straightforward.
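Why does the difference between the number and the rate matter? Here is a quick sketch with a made-up town where crime went up but the population went up faster. None of these numbers come from the UCR data, and `percent_change` is a hypothetical helper, not an Agate function.

```python
from decimal import Decimal

def percent_change(before, after):
    """Standard percent change: new minus old, divided by old, times 100."""
    return (after - before) / before * 100

# Hypothetical town. Values are made up for illustration.
crimes_2011, population_2011 = Decimal("100"), Decimal("50000")
crimes_2012, population_2012 = Decimal("110"), Decimal("60000")

# Change in the raw count of crimes.
count_change = percent_change(crimes_2011, crimes_2012)

# Change in the rate per 100,000 residents.
rate_2011 = crimes_2011 / population_2011 * 100000
rate_2012 = crimes_2012 / population_2012 * 100000
rate_change = percent_change(rate_2011, rate_2012)

print(count_change)  # positive: more crimes were reported
print(rate_change)   # negative: fewer crimes per resident
```

The count says crime rose. The rate says any one resident was less likely to be a victim. Those are two very different stories.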

In [19]:
def pcvc_rate_12(row):
    rate = (row['2012ViolentCrime']/row['2012Population'])*100000
    return rate

def pcpc_rate_12(row):
    rate = (row['2012PropertyCrime']/row['2012Population'])*100000
    return rate

def pcvc_rate_11(row):
    rate = (row['2011ViolentCrime']/row['2011Population'])*100000
    return rate

def pcpc_rate_11(row):
    rate = (row['2011PropertyCrime']/row['2011Population'])*100000
    return rate

ratechange1112 = changes.compute([
    ('vc_rate_11', agate.Formula(agate.Number(), pcvc_rate_11)),
    ('vc_rate_12', agate.Formula(agate.Number(), pcvc_rate_12)),
    ('pc_rate_11', agate.Formula(agate.Number(), pcpc_rate_11)),
    ('pc_rate_12', agate.Formula(agate.Number(), pcpc_rate_12)),
])

Let's unpack one of the functions. You start any function by defining it -- `def` -- and giving it a name. I've called mine pcvc_rate for per capita violent crime rate, followed by the year. If your function takes an input -- there's information sent in with it -- then you have to name that input in the parentheses after the name of your function. In this case, we're just giving the whole row of data to the function so we can use it.

Now, inside the function we create something called rate and set it equal to a calculation. This calculation is almost EXACTLY like the Excel formulas you learned, with just a different way of referencing cells. So it says first divide row['2012ViolentCrime'] by row['2012Population']. See how that's just the row that was passed in (called row) and then we reference WHICH FIELD WE WANT with ['2012Population'] or whatever we need, using the column names we saw when we printed the table earlier. Then, after we've divided those two numbers, we multiply the result by 100,000 to get the number of crimes per 100,000 people. Then, on the next line, we return the rate. A function used in a Formula has to return the value that goes in the new column.

All the other functions are just the same thing with small adjustments for year or property crimes. 
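Since the four functions differ only in which columns they read, you could also build them from one template. This is a sketch of that idea, not something the walkthrough requires. `make_rate_function` is a hypothetical helper that isn't part of Agate, and the row below is a plain dictionary with made-up values standing in for a real row.

```python
from decimal import Decimal

def make_rate_function(crime_column, population_column, per=100000):
    """Build a row function that returns crimes per `per` residents."""
    def rate(row):
        return row[crime_column] / row[population_column] * per
    return rate

# One line per rate instead of one full function per rate.
pcvc_rate_12 = make_rate_function("2012ViolentCrime", "2012Population")
pcvc_rate_11 = make_rate_function("2011ViolentCrime", "2011Population")

# Hypothetical row using the tutorial's column names; the values are made up.
row = {
    "2012ViolentCrime": Decimal("500"), "2012Population": Decimal("250000"),
    "2011ViolentCrime": Decimal("400"), "2011Population": Decimal("200000"),
}
print(pcvc_rate_12(row))
print(pcvc_rate_11(row))
```

Each function it builds takes a row and returns a number, which is the same shape as the hand-written versions above. Writing them out one at a time is perfectly fine while you're learning.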

Then, to get those rates, we create a new table called ratechange1112 and, just like our percent change calculations before, we compute new fields using our formulas. The agate.Formula bits go like this: agate.Formula(WHAT WILL THIS RETURN, WHICH FUNCTION ARE YOU USING). So in the agate.Formula parens, we tell it we're going to return an agate.Number(), and which function we are using. Simple.

Then, after we've created that new table, we can do our percent change calculations on the rates.

In [20]:
pctratechange1112 = ratechange1112.compute([
    ('vcrate_change_1112', agate.PercentChange('vc_rate_11', 'vc_rate_12')),
    ('pcrate_change_1112', agate.PercentChange('pc_rate_11', 'pc_rate_12'))
])

In [21]:
for_rate_printing = pctratechange1112.select(['CleanName', 'APStyle', 'vcrate_change_1112', 'pcrate_change_1112'])

In [22]:
sorted_rate_cities = for_rate_printing.order_by('vcrate_change_1112', reverse=True)

In [23]:
sorted_rate_cities.print_table(max_rows=10)

|------------------+---------+-------------------------------+--------------------------------|
|  CleanName       | APStyle |            vcrate_change_1112 |            pcrate_change_1112  |
|------------------+---------+-------------------------------+--------------------------------|
|  North Las Vegas | Nev.    | 114.0031423601369171202513888 | 88.36150049439796904889932853  |
|  Fullerton       | Calif.  |  45.8934180969584275393977846 |  9.53571633840455611499338015  |
|  Odessa          | Texas   |  45.1946723241198478101750049 | 16.78050133548510987444306295  |
|  Sioux Falls     | S.D.    |  41.0599037599302827841418594 |  5.34147608854323396234492103  |
|  Providence      | R.I.    |  38.6375510001080715879520038 | 30.75369887816941690781787048  |
|  Green Bay       | Wisc.   |  35.7621239875616156959793610 | 21.23657648056043777968911657  |
|  Santa Clarita   | Calif.  |  29.9117117279790786682683823 |  6.26638856122483880736218020  |
|  Milwaukee       | Wisc.   |  29.56126

But that still looks uglier than heck to me. How about we clean up those changes and limit the number of decimal places they print? To do that, we'll have to do a little more than you might think. For technical reasons I won't get into, you can't use Python's built-in round function. You must use a method on Decimal called quantize. It works almost exactly like our rate functions earlier. You'll create a function, pass in the row, and just attach `.quantize` to the row[''] bits you need. This example is almost straight out of the documentation.

In [24]:
from decimal import Decimal

def round_vcchange(row):
    return row['vcrate_change_1112'].quantize(Decimal('0.1'))

def round_pcchange(row):
    return row['pcrate_change_1112'].quantize(Decimal('0.1'))

rounded_change = sorted_rate_cities.compute([
    ('vc_rounded', agate.Formula(agate.Number(), round_vcchange)),
    ('pc_rounded', agate.Formula(agate.Number(), round_pcchange)),
])
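To see what quantize does on its own, here is a short sketch using only Python's decimal module. The Decimal you pass in acts as a pattern: `Decimal('0.1')` means one decimal place, `Decimal('0.01')` means two. The `ROUND_HALF_UP` part at the end is an optional extra that the walkthrough doesn't need.

```python
from decimal import Decimal, ROUND_HALF_UP

value = Decimal("114.0031423601369171202513888")

# The argument is a pattern for how many decimal places to keep.
one_place = value.quantize(Decimal("0.1"))
two_places = value.quantize(Decimal("0.01"))
print(one_place, two_places)

# By default, exact ties round to the nearest even digit.
half_even = Decimal("0.25").quantize(Decimal("0.1"))

# If you want ties to always round up, say so explicitly.
half_up = Decimal("0.25").quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
print(half_even, half_up)
```

The tie-breaking rule only matters when a value sits exactly halfway between two choices, which is rare with numbers as long as ours.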

In [25]:
for_rate_printing = rounded_change.select(['CleanName', 'APStyle', 'vc_rounded', 'pc_rounded'])

In [26]:
for_rate_printing.print_table()

|-------------------------+---------+------------+-------------|
|  CleanName              | APStyle | vc_rounded | pc_rounded  |
|-------------------------+---------+------------+-------------|
|  North Las Vegas        | Nev.    |      114.0 |       88.4  |
|  Fullerton              | Calif.  |       45.9 |        9.5  |
|  Odessa                 | Texas   |       45.2 |       16.8  |
|  Sioux Falls            | S.D.    |       41.1 |        5.3  |
|  Providence             | R.I.    |       38.6 |       30.8  |
|  Green Bay              | Wisc.   |       35.8 |       21.2  |
|  Santa Clarita          | Calif.  |       29.9 |        6.3  |
|  Milwaukee              | Wisc.   |       29.6 |        0.1  |
|  Antioch                | Calif.  |       28.8 |       21.1  |
|  Burbank                | Calif.  |       26.6 |       -2.9  |
|  Escondido              | Calif.  |       25.3 |       19.6  |
|  Carlsbad               | Calif.  |       24.7 |        5.8  |
|  Santa Clara           

# Assignment
This is an assignment I've given to 202 and 302 students with Excel. If you've taken a class with me, or been in 302 when I was there, you've done this before in Excel. Now you're going to do it with Agate.

1. Download [this dataset of population estimates](https://www.dropbox.com/s/p5isgdfgpam7w13/population.csv?dl=0) from the US Census Bureau. 
2. Calculate the percent change in population for every county in the US from 2010 to 2014. 
3. Round that change off to a single decimal place. 
4. Sort it fastest growing to fastest shrinking. Print it to the screen but limit it to 50 rows.

After you've done that, submit it to Blackboard and enjoy this NYTimes story about [the end of the North Dakota oil boom](http://www.nytimes.com/2016/02/08/us/built-up-by-oil-boom-north-dakota-now-has-an-emptier-feeling.html).