# An introduction to Agate

Agate is a Python library that makes analyzing data repeatable and transparent. The steps to reach an answer are explicit and documented, and easy to share on GitHub or as Markdown files exported from Jupyter Notebook.

The first step is always to import the libraries you need. In this case, that's Agate.

In [2]:
import agate

Our next step is to get some data to work with. I've created a file that has the graduation rate and the in-state and out-of-state costs for each university in the Big 10 athletic conference. It's a simple file, with 14 records, that we can use to demonstrate functions and processes that data journalists would normally use a spreadsheet for. So let's first make an Agate table from that CSV file. It's easy.

In [4]:
big10 = agate.Table.from_csv('../../../Data/colleges.csv')

To see the structure of the table we just imported, we simply pass the table to the print function.

In [5]:
print(big10)

|-----------------------------------------------------------------------------+---------------|
|  column_names                                                               | column_types  |
|-----------------------------------------------------------------------------+---------------|
|  UnitID                                                                     | Number        |
|  Institution Name                                                           | Text          |
|  Total price for in-state students living on campus 2012-13 (DRVIC2012)     | Number        |
|  Total price for out-of-state students living on campus 2012-13 (DRVIC2012) | Number        |
|  Graduation rate  total cohort (DRVGR2012)                                  | Number        |
|-----------------------------------------------------------------------------+---------------|



If we want to see the table itself, we use `print_table()` like this: 

In [6]:
big10.print_table()

|----------+--------------------------------------------+------------------------------------------------------------------------+----------------------------------------------------------------------------+--------------------------------------------|
|   UnitID | Institution Name                           | Total price for in-state students living on campus 2012-13 (DRVIC2012) | Total price for out-of-state students living on campus 2012-13 (DRVIC2012) | Graduation rate  total cohort (DRVGR2012)  |
|----------+--------------------------------------------+------------------------------------------------------------------------+----------------------------------------------------------------------------+--------------------------------------------|
|  151,351 | Indiana University-Bloomington             |                                                                 23,116 |                                                                     44,566 |                                  

But an unsorted table is rarely useful. We want to see it in some kind of order. Let's sort the table by graduation rate and see who is doing well and who isn't.

In [7]:
big10.order_by('Graduation rate  total cohort (DRVGR2012)', reverse=True).print_table()

|----------+--------------------------------------------+------------------------------------------------------------------------+----------------------------------------------------------------------------+--------------------------------------------|
|   UnitID | Institution Name                           | Total price for in-state students living on campus 2012-13 (DRVIC2012) | Total price for out-of-state students living on campus 2012-13 (DRVIC2012) | Graduation rate  total cohort (DRVGR2012)  |
|----------+--------------------------------------------+------------------------------------------------------------------------+----------------------------------------------------------------------------+--------------------------------------------|
|  147,767 | Northwestern University                    |                                                                 60,840 |                                                                     60,840 |                                  

But a sorted table is still hard to read. Agate has a neat feature called `print_bars` that draws a little chart right in your output as you go.

In [6]:
big10.order_by('Graduation rate  total cohort (DRVGR2012)', reverse=True).print_bars('Institution Name', 'Graduation rate  total cohort (DRVGR2012)', width=115)

Institution Name                           Graduation rate  total cohort (DRVGR2012)
Northwestern University                                                           93 ▓░░░░░░░░░░░░░░░░░░░░░░░░░░░  
University of Michigan-Ann Arbor                                                  91 ▓░░░░░░░░░░░░░░░░░░░░░░░░░░   
Pennsylvania State University-Main Campus                                         86 ▓░░░░░░░░░░░░░░░░░░░░░░░░░    
University of Illinois at Urbana-Champaign                                        84 ▓░░░░░░░░░░░░░░░░░░░░░░░░     
Ohio State University-Main Campus                                                 82 ▓░░░░░░░░░░░░░░░░░░░░░░░░     
University of Maryland-College Park                                               82 ▓░░░░░░░░░░░░░░░░░░░░░░░░     
University of Wisconsin-Madison                                                   82 ▓░░░░░░░░░░░░░░░░░░░░░░░░     
Michigan State University                                                         79 ▓░░░░░░░░░░░░░░░░░

Okay, so the Harvard of the Plains doesn't have such a hot graduation rate. But what else can we do? What about the average tuition, or more precisely the average total price for in-state students living on campus?

In [8]:
average_tuition = big10.aggregate(agate.Mean('Total price for in-state students living on campus 2012-13 (DRVIC2012)'))

In [9]:
print(average_tuition)

27652.85714285714285714285714


The median tuition is just as easy -- you change the variable name and the type of aggregate and boom, done. 

In [9]:
median_tuition = big10.aggregate(agate.Median('Total price for in-state students living on campus 2012-13 (DRVIC2012)'))

In [10]:
print(median_tuition)

24473.5
