# Data wrangling

### What we'll cover

* Loading data
* Transforming it
* Storing it

### How we'll do it

We'll be working with a dataset, **TODO**. We'll use it to answer some real-life questions, with examples before each step. The goal is to have you write code you can use as a reference later on, so we'll try to explain what's happening.

## Loading data

### Loading from CSVs

The `agate` library (formerly `journalism`), by ex-Tribune developer Christopher Groskopf, who also wrote `csvkit`, was built with journalists handling CSVs in mind ([here's the documentation](https://agate.readthedocs.org/)). The fundamental unit in `agate` is the table, and there's a one-step method for creating a table from a CSV file:

    import agate

    data = agate.Table.from_csv('data.csv')

#### Your turn

Load the data located in the **TODO** file into an `agate` table and play around with it. How do you access a row's data? How would you answer a question like "sum up all the values of a column in this spreadsheet?"

    import agate

It's worth mentioning that there's another, more general-purpose way to load data from a CSV: `csv.DictReader`. When you read a CSV file with `DictReader`, you get an iterator that yields one dictionary per row, with the header row (assuming there is one) supplying the dictionary's keys.

    from csv import DictReader

    # newline='' lets the csv module handle line endings itself
    with open('data.csv', newline='') as fh:
        reader = DictReader(fh)
        for row in reader:
            # row is now a dict: row[column_name] == column_value
            print(row)
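
To make the shape of each row concrete, here is a minimal sketch that reads CSV text from an in-memory string instead of a file and totals one column. The sample data and the `sum_column` helper are invented for illustration and are not part of the workshop dataset. Note that `DictReader` hands back every value as a string, so numbers have to be converted before they can be added.

```python
import csv
import io

# Hypothetical sample data standing in for a real CSV file.
SAMPLE_CSV = "name,amount\nalice,10\nbob,32\n"


def sum_column(fh, column_name):
    """Add up the numeric values in one column of an open CSV file."""
    reader = csv.DictReader(fh)
    # Every value arrives as a string, so convert before adding.
    return sum(float(row[column_name]) for row in reader)


total = sum_column(io.StringIO(SAMPLE_CSV), "amount")
print(total)
```

The same helper should work with a real file handle from `open('data.csv', newline='')`, provided the column contains only values that `float` can parse.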

### Loading from an API

### Loading from JSON

The `json` module includes two main functions: `loads`, which parses a string of JSON into Python objects, and `dumps`, which turns Python objects into a JSON string. Their file-based counterparts are `load` and `dump`. If you haven't worked with JSON before, it's a convenient way of passing data objects around as plain text.

`loads` takes a single string of JSON-formatted data as an argument and returns a Python object.

    >>> import json
    >>> data = json.loads('{"foo":"bar"}')
    >>> print(data['foo'])
    bar
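
Since `dumps` has no example above, here is a small sketch of a round trip: a Python dictionary is serialized to a JSON string and then parsed back. The `record` dictionary is made up for illustration, and `sort_keys=True` is used only to make the output order predictable.

```python
import json

# Hypothetical record, invented for illustration.
record = {"name": "Ada", "tags": ["a", "b"], "count": 3}

# dumps turns a Python object into a JSON-formatted string.
text = json.dumps(record, sort_keys=True)
print(text)

# loads reverses the process and returns an equivalent Python object.
restored = json.loads(text)
print(restored == record)
```

Dictionaries, lists, strings, numbers, booleans and `None` survive the round trip unchanged; other types, such as dates, need to be converted to one of these before calling `dumps`.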

#### Your turn

Load the data located in the **TODO** file into a data object.

    import json

## Transforming data

### Summing up a column of data

### Filtering rows of data

### Sorting rows of data

### String cleaning

### Geocoding addresses

### Comparing dates and date strings 

## Storing data

### Saving data as a CSV

### Saving data as JSON

### Saving data to S3