In [0]:
%pylab inline

In [0]:
import os
import json
import pandas as pd
import rasterio
import shapely.geometry
from affine import Affine

RASTER_FILE = os.path.join(
    os.path.expanduser('~'), 'bh', 'data', 'satellite',
    'SVDNB_npp_20140201-20140228_75N180W_vcmcfg_'
    'v10_c201507201052.avg_rade9.tif'
)
COUNTIES_GEOJSON_FILE = os.path.join(
    os.path.expanduser('~'), 'bh', 'data',
    'us_counties_5m.json'
)
STATES_TEXT_FILE = os.path.join(
    os.path.expanduser('~'), 'bh', 'data',
    'state.txt'
)


The geojson file for U.S. counties is just a regular json file. It happens to have
a non-`utf8` encoding (`latin-1`), so we need to specify the encoding when we
read it.

We obtained our geojson from [Eric Celeste](http://eric.clst.org/Stuff/USGeoJSON)'s
awesome website. We use the `5m` counties file, which is geojson for U.S. counties
at a scale of 1:5,000,000. You can find more information about geojson
and shapefiles
[here](http://chimera.labs.oreilly.com/books/1230000000345/ch12.html#_choose_a_resolution).


In the geojson, the `STATE` property attached to each county is a numeric code rather than a
name. So to get readable state names, we need to be able to translate these integer codes
to states. It's actually pretty hard to find a reference for this mapping, but it turns
out this [census.gov](http://www2.census.gov/geo/docs/reference/state.txt) resource has
everything we need, in a pipe-separated file we can read using `pandas`.
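
To make the file layout concrete, here is a minimal sketch that parses a few
pipe-separated rows using only the standard library. The three sample rows and the
`STUSAB` column are assumptions written by hand for this example; only the `STATE`
and `STATE_NAME` column names come from the code in this notebook, and the real
file has more rows and may have more columns.

```python
import csv
import io

# Hand-written sample in the assumed layout of state.txt (not the real file).
SAMPLE_STATE_TXT = """STATE|STUSAB|STATE_NAME
01|AL|Alabama
06|CA|California
36|NY|New York
"""


def parse_state_codes(text):
    """Map integer state codes to state names from pipe-separated text."""
    reader = csv.DictReader(io.StringIO(text), delimiter='|')
    return {int(row['STATE']): row['STATE_NAME'] for row in reader}


state_names = parse_state_codes(SAMPLE_STATE_TXT)
print(state_names[36])  # New York
```

Converting each code with `int` mirrors what `pandas` does when it infers an
integer column, which is why the county lookup in this notebook casts
`properties.STATE` to `int` before indexing.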


In [0]:
import io

# The file is latin-1 encoded, so we decode it while reading.
with io.open(COUNTIES_GEOJSON_FILE, 'r', encoding='latin-1') as f:
    counties_raw_geojson = json.load(f)

states_df = pd.read_csv(STATES_TEXT_FILE, sep='|').set_index('STATE')
# here we're selecting just the state names column from the indexed DataFrame, so that
# later in the code we can use it as if it were a dict
states = states_df['STATE_NAME']


This top-level geojson object is a `dict` with two keys:
* `type`, which specifies that this is a `FeatureCollection`, and
* `features`, which is a json array of geojson objects for each county.

Since we want to be able to look up counties by name, we'll rearrange this
into a dictionary with county names as keys and the geojson objects for each
county as values, by looking up the `properties.NAME` and `properties.STATE`
keys in each county's geojson object (it is important to use the state as well as
the name, because the same county name appears in several states).

Note that since there are non-ascii characters in some counties' names, in
Python 2 we have to make sure there's a `u` in front of the formatting string
we use, otherwise we get a `UnicodeEncodeError` from the ascii codec.
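
As a small illustration of why the state has to be part of the key, here is a
sketch on a made-up, three-county `FeatureCollection`. The sample features
(with geometries omitted), the `sample_states` dict, and the `county_key`
helper are all hypothetical stand-ins for the real data and lookup; storing
`STATE` as a string is also an assumption.

```python
# Hypothetical miniature FeatureCollection; geometries omitted for brevity.
sample_geojson = {
    'type': 'FeatureCollection',
    'features': [
        {'type': 'Feature', 'properties': {'STATE': '36', 'NAME': 'New York'}},
        {'type': 'Feature', 'properties': {'STATE': '06', 'NAME': 'Orange'}},
        {'type': 'Feature', 'properties': {'STATE': '36', 'NAME': 'Orange'}},
    ],
}
# Stand-in for the state-code lookup built from state.txt.
sample_states = {6: 'California', 36: 'New York'}


def county_key(feature, state_lookup):
    """Build a "state: county" key from a county feature's properties."""
    props = feature['properties']
    return u'{state}: {county}'.format(
        state=state_lookup[int(props['STATE'])],
        county=props['NAME'],
    )


by_name = {county_key(f, sample_states): f for f in sample_geojson['features']}
names_only = {f['properties']['NAME'] for f in sample_geojson['features']}

print(sorted(by_name))  # three distinct keys
print(len(names_only))  # only two distinct county names
```

Keying on the county name alone would silently drop one of the two Orange
counties, while the combined key keeps all three features.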


In [0]:
def get_county_name_from_geo_obj(geo_obj):
    """
    Use the NAME and STATE properties of a county's geojson
    object to get a name "state: county" for that county.
    """
    return u'{state}: {county}'.format(
        state=states[int(geo_obj['properties']['STATE'])],
        county=geo_obj['properties']['NAME']
    )

print(states.index)
counties_geojson = {
    get_county_name_from_geo_obj(county_geojson): county_geojson
    for county_geojson in counties_raw_geojson['features']
}

print(len(counties_geojson))
print(sorted(counties_geojson.keys())[:10])


These geojson objects contain a `properties` subobject with metadata
about each county, and a `geometry` object describing the shape of each
county. Python's `shapely` package has tools for working with these.

If you hand the `geometry` dictionary to `shapely.geometry.shape`, you get back
a `Polygon` or `MultiPolygon` object (depending on the geometry type) that knows
how to describe itself. As an extra bonus, it knows how to display itself in an
IPython notebook. Let's take a look at Manhattan (which is New York County in
New York State):


In [0]:
ny_shape = shapely.geometry.shape(counties_geojson['New York: New York']['geometry'])
print('%r' % ny_shape)
ny_shape


Now we're ready to combine our county geojson with raster data. We'll use the
`rasterio` package to work with the `.tif` files we pulled from the noaa.gov website.


In [0]:
raster_file = rasterio.open(RASTER_FILE, 'r')


A rasterio file lets you index raster image data as an array of pixels. The
data files are pretty large, but they allow indexing into small slices. Furthermore,
raster files include metadata indicating an affine mapping between pixels and
latitude / longitude coordinates.

Let's try doing this for New York county. We can get the longitude and latitude
bounds of the smallest rectangle containing the whole county by using the
`bounds` property of the `shapely.geometry.MultiPolygon` instance:
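
To show what those four numbers mean, here is a pure-Python sketch that
computes the same kind of bounding box directly from a geojson geometry
dictionary. The `geometry_bounds` helper and the toy geometry are made up for
illustration; this is not how `shapely` computes it, but it returns the values
in the same `(minx, miny, maxx, maxy)` order, where `x` is longitude and `y`
is latitude.

```python
def geometry_bounds(geometry):
    """Return (min_x, min_y, max_x, max_y) for a Polygon or MultiPolygon dict."""
    if geometry['type'] == 'Polygon':
        polygons = [geometry['coordinates']]
    elif geometry['type'] == 'MultiPolygon':
        polygons = geometry['coordinates']
    else:
        raise ValueError('unsupported geometry type: %s' % geometry['type'])
    # Each polygon is a list of rings; each ring is a list of [x, y] positions.
    xs = [pt[0] for poly in polygons for ring in poly for pt in ring]
    ys = [pt[1] for poly in polygons for ring in poly for pt in ring]
    return (min(xs), min(ys), max(xs), max(ys))


# Made-up geometry: two small rectangles, with x = longitude and y = latitude.
toy_geometry = {
    'type': 'MultiPolygon',
    'coordinates': [
        [[[-74.0, 40.0], [-73.0, 40.0], [-73.0, 41.0], [-74.0, 41.0], [-74.0, 40.0]]],
        [[[-72.5, 40.5], [-72.0, 40.5], [-72.0, 42.0], [-72.5, 42.0], [-72.5, 40.5]]],
    ],
}

toy_bounds = geometry_bounds(toy_geometry)
print(toy_bounds)  # (-74.0, 40.0, -72.0, 42.0)
```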


In [0]:
ny_shape.bounds


Next, we can use the `index` method of the rasterio file, which accepts
`(x, y)` coordinates (here `(longitude, latitude)`) and returns `(row, col)`
indices for the pixel containing that point. This tells us which part of the
raster file we want to slice into:
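
To build some intuition for what a lookup like this has to do, here is a
pure-Python sketch for a north-up raster, taking `x` as longitude and `y` as
latitude and returning `(row, col)`. The `index_north_up` helper, the toy
transform, and the corner coordinates are all made up for illustration, and
this is not rasterio's actual implementation.

```python
import math


def index_north_up(transform, x, y):
    """Return (row, col) of the pixel containing the point (x, y).

    `transform` is (a, b, c, d, e, f) with b == d == 0 (a north-up raster).
    """
    a, b, c, d, e, f = transform
    if b != 0 or d != 0:
        raise ValueError('this sketch only handles north-up rasters')
    col = int(math.floor((x - c) / a))
    row = int(math.floor((y - f) / e))
    return row, col


# Made-up transform: 0.5-degree pixels, upper-left corner at (-180, 75).
toy_transform = (0.5, 0.0, -180.0, 0.0, -0.5, 75.0)

# Lower-left and upper-right corners of a made-up bounding box.
bottom, left = index_north_up(toy_transform, -74.2, 40.3)
top, right = index_north_up(toy_transform, -73.1, 41.6)
print((top, bottom), (left, right))  # (66, 69) (211, 213)
```

Because the `e` coefficient is negative, the corner with the larger latitude
gets the smaller row index, so the row range has to be written as
`(top, bottom)`.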


In [0]:
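# `index` takes (x, y) = (longitude, latitude) and returns (row, col).
bottom, left = raster_file.index(*ny_shape.bounds[0:2])  # lower-left corner
top, right = raster_file.index(*ny_shape.bounds[2:4])    # upper-right corner
# A window is ((row_start, row_stop), (col_start, col_stop)).
raster_window = ((top, bottom), (left, right))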
raster_window


Note that we used `(top, bottom)` rather than `(bottom, top)`. The
reason, which we will see in more detail later, is that this raster
file orders pixel rows in decreasing order of latitude.

Now we're ready to load the pixels themselves. To do this we just
pass the window to the `read` method of the raster file, which returns
a numpy `float32` array of pixel intensities. We specify `indexes=1` so
that we'll get a 2d array rather than a 3d array whose first dimension
has size 1. This is related to more advanced features of the rasterio
api that we won't be using here.
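
As a rough mental model of windowed reading, a window of the form
`((row_start, row_stop), (col_start, col_stop))` behaves like a pair of Python
slices over the pixel grid. The sketch below uses plain nested lists rather
than rasterio, and the `read_window` helper and toy grid are made up for
illustration.

```python
def read_window(pixels, window):
    """Slice a row-major 2d grid with ((row_start, row_stop), (col_start, col_stop))."""
    (row_start, row_stop), (col_start, col_stop) = window
    return [row[col_start:col_stop] for row in pixels[row_start:row_stop]]


# Made-up 4x5 grid where each value encodes its own position as 10 * row + col.
toy_pixels = [[10 * r + c for c in range(5)] for r in range(4)]

toy_window = ((1, 3), (2, 5))  # rows 1 and 2, columns 2 through 4
toy_slice = read_window(toy_pixels, toy_window)
print(toy_slice)  # [[12, 13, 14], [22, 23, 24]]

# Swapping the row bounds selects nothing, which is why the order matters.
print(read_window(toy_pixels, ((3, 1), (2, 5))))  # []
```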


In [0]:
raster_array = raster_file.read(indexes=1, window=raster_window)
raster_array
(raster_array < 0).mean()
raster_array.shape


We can plot the array using an image plot to see what the light
distribution looks like. Remember that because this is a bounding box
and Manhattan isn't a rectangle, our plot will include light originating from
outside of New York County.


In [0]:
from matplotlib import pyplot as plt
plt.imshow(raster_array)
plt.show()


This is pretty cool, but how can we isolate Manhattan?

To do so, we'll need to use some more functionality from `shapely`, but we'll
also need to adjust the rasterio file's affine mapping to properly map into
the slice of data we are using for our bounding box.

To understand how to do this, we need to think about how rasterio represents
the mapping between pixels and latitude / longitude coordinates. The `affine`
property of the raster file is an `Affine` object:


In [0]:
raster_file.affine


An
```
Affine(a, b, c,
       d, e, f)
```
instance represents a 2d
transformation of the form
\[
  \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix}
  =
  \begin{pmatrix}
    a & b & c \\
    d & e & f \\
    0 & 0 & 1
  \end{pmatrix}
  \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}.
\]

In the context of a rasterio file, the original coordinates
$x$ and $y$ are the column and row indices into the pixel array,
and the mapped coordinates $x'$ and $y'$ are the longitude and
latitude.
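
Multiplying out the matrix product, the same transformation can be written as
two scalar equations, which may be easier to read:
\[
  x' = a x + b y + c,
  \qquad
  y' = d x + e y + f.
\]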

In the case of our image, it is aligned perfectly with the
equator, which is why `b` and `d` are zero. The `a` and `e`
coefficients give the scalings for longitude and latitude
(the negative `e` means we index top to bottom), while
`c` and `f` give the longitude and latitude of the upper-left
corner of the raster.




In [0]:
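rfa = raster_file.affine

# Move the origin to the upper-left corner of our window. Since `b` and `d`
# are zero, shifting by `left` columns and `top` rows only changes `c` and `f`.
ny_affine = Affine(
    rfa.a, rfa.b, rfa.c + left * rfa.a,
    rfa.d, rfa.e, rfa.f + top * rfa.e
)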
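
One way to sanity-check a shifted transform like this is to confirm that a
pixel addressed relative to the window lands on the same coordinates as the
corresponding pixel in the full raster. The sketch below does this in pure
Python with plain tuples `(a, b, c, d, e, f)` instead of `Affine` objects; the
helper names, the toy transform, and the window origin are all made up for
illustration.

```python
def apply_transform(transform, col, row):
    """Map pixel indices (col, row) to coordinates (x', y')."""
    a, b, c, d, e, f = transform
    return (a * col + b * row + c, d * col + e * row + f)


def window_transform(transform, left, top):
    """Transform for a window whose upper-left pixel is (row=top, col=left)."""
    a, b, c, d, e, f = transform
    # The new origin is wherever the full transform sends the window's corner.
    new_c, new_f = apply_transform(transform, left, top)
    return (a, b, new_c, d, e, new_f)


full = (0.5, 0.0, -180.0, 0.0, -0.5, 75.0)  # made-up north-up transform
left, top = 211, 66                          # made-up window origin
shifted = window_transform(full, left, top)
print(shifted)  # (0.5, 0.0, -74.5, 0.0, -0.5, 42.0)

# Pixel (col=2, row=1) of the window is pixel (col=213, row=67) of the raster.
print(apply_transform(shifted, 2, 1) == apply_transform(full, left + 2, top + 1))
```

With `b` and `d` equal to zero, the new origin reduces to `c + left * a` and
`f + top * e`.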
