# Download

In this notebook, we download aerial images like the one below, which covers a small part of Phoenix, AZ.

![example_aerial_image](img/notebook/download_example.png)

First, we review the [source](#Source) for the images and how they are stored. Then, we [systematically determine](#Find-Images) which images are needed to encompass entire cities. Lastly, the required images are [downloaded](#Download-Images) locally.

## Source

We'll be working with aerial imagery from the National Agriculture Imagery Program (NAIP), available from an [Amazon Web Services (AWS) S3 bucket](https://registry.opendata.aws/usda-naip/).

First, install the AWS command line interface.

```shell
pip install awscli
```

You'll need an AWS account, and you'll need to [configure the AWS CLI to work with your account](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html).

Then get the NAIP manifest.

```shell
aws s3api get-object --bucket aws-naip --key manifest.txt --request-payer requester data/naip/manifest.txt
```

Here's a preview of the manifest contents:

```shell
grep "^az/2015/1m/rgbir" data/naip/manifest.txt | head
```

returns

```shell
az/2015/1m/rgbir/31109/m_3110901_ne_12_1_20150621.tif
az/2015/1m/rgbir/31109/m_3110901_nw_12_1_20150621.tif
az/2015/1m/rgbir/31109/m_3110901_se_12_1_20150621.tif
az/2015/1m/rgbir/31109/m_3110901_sw_12_1_20150621.tif
az/2015/1m/rgbir/31109/m_3110902_ne_12_1_20150621.tif
az/2015/1m/rgbir/31109/m_3110902_nw_12_1_20150621.tif
az/2015/1m/rgbir/31109/m_3110902_se_12_1_20150621.tif
az/2015/1m/rgbir/31109/m_3110902_sw_12_1_20150621.tif
az/2015/1m/rgbir/31109/m_3110903_ne_12_1_20150621.tif
az/2015/1m/rgbir/31109/m_3110903_nw_12_1_20150621.tif
```

Notice the naming convention, as detailed [here](https://docs.opendata.aws/aws-naip/readme.html). The preview above lists `rgbir` keys; for this project, we'll primarily be using the `rgb` images at 1m resolution (i.e., 1 meter per pixel).
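
As a rough illustration of that convention, the sketch below splits a manifest key into its parts. The function name `parse_naip_key` and the field names are hypothetical, chosen for this example; the positions of the quadrangle ID and quadrant are read off the sample keys shown above.

```python
from collections import namedtuple

# Hypothetical container for the pieces of a manifest key
NaipKey = namedtuple(
    'NaipKey', 'state year resolution bands block quad_id quadrant filename')


def parse_naip_key(key):
    """Split a key such as
    'az/2015/1m/rgbir/31109/m_3110901_ne_12_1_20150621.tif'
    into its path components plus the quadrangle ID and quadrant."""
    state, year, resolution, bands, block, filename = key.strip().split('/')
    stem = filename.rsplit('.', 1)[0]   # drop the '.tif' extension
    fields = stem.split('_')            # ['m', '3110901', 'ne', ...]
    return NaipKey(state, year, resolution, bands, block,
                   fields[1], fields[2], filename)


parsed = parse_naip_key('az/2015/1m/rgbir/31109/m_3110901_ne_12_1_20150621.tif')
print(parsed.quad_id, parsed.quadrant)  # 3110901 ne
```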

A single image file can be downloaded with a command like:

```shell
aws s3api get-object --bucket aws-naip --key az/2015/1m/rgb/34114/m_3411438_ne_11_1_20150605.tif --request-payer requester data/naip/m_3411438_ne_11_1_20150605.tif
```

In the rest of this notebook, we programmatically download all the image files for a select group of cities in Arizona.

## Find Images

In [1]:
from geopy.geocoders import GoogleV3
from shapely.geometry import box
from math import ceil

import subprocess
import pprint
import json
import os

import numpy as np

Since the filenames in the manifest are based on geographic latitude/longitude coordinates, we need a process that figures out which files correspond to a given city. The basic idea is to take the viewport recommended by Google Maps (i.e. a bounding box in latitude/longitude coordinates) and download all images within that viewport. 

We'll use the Google geocoder available via `geopy` to convert city names to viewport coordinates. The functions `make_grid` and `get_quad_ids` defined below are then used to get a list of all quadrangles that overlap with the viewport. 

In [2]:
def make_grid(lat, lon):
    '''
    Create a 1x1 degree grid divided into 64 7.5x7.5 minute USGS quadrangle cells.
    The grid is in dictionary format with keys being the USGS quadrangle IDs and
    values defining the NE and SW latitude, longitude corners, e.g.
        grid['3311064'] = {'northeast': {'lat': 33.125, 'lng': -110}, 
                           'southwest': {'lat': 33, 'lng': -110.125}}
    
    Input: A latitude, longitude pair. 
    
    Returns the 1x1 degree block in dictionary format that contains the input coordinate pair. 
    '''
    nw_lat = int(lat) + 1
    nw_lon = int(lon) - 1
    
    ## Format quadrangle ID
    lat_str = str(int(lat))
    if abs(lon) < 100:
        lon_str = '0' + str(abs(int(lon)))
    else:
        lon_str = str(abs(int(lon)))
    
    grid = {}
    step = 0.125
    grid_size = 8
    for i in range(grid_size):
        for j in range(grid_size):
            ne = (nw_lat - i*step , nw_lon + (j+1)*step)
            sw = (ne[0] - step, ne[1] - step)
            
            ## Continue format quadrangle ID
            ndx = 1 + np.ravel_multi_index((i , j), (grid_size, grid_size))
            if ndx < 10:
                ndx_str = '0' + str(ndx)
            else:
                ndx_str = str(ndx)
            
            quad_id = lat_str + lon_str + ndx_str
            grid[quad_id] = {'northeast': {'lat': ne[0], 'lng': ne[1]}, 
                             'southwest': {'lat': sw[0], 'lng': sw[1]}}
    return grid
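
To make the numbering concrete, here is a small self-contained sketch that computes the ID of the single quadrangle containing a point. It follows the same layout as `make_grid` (cell 01 in the northwest corner, numbered west to east, then north to south). The helper name `quad_id_for_point` is hypothetical, and the sketch assumes northern latitudes and western longitudes, as is the case for Arizona.

```python
from math import floor


def quad_id_for_point(lat, lon, grid_size=8):
    """ID of the 7.5-minute quadrangle containing (lat, lon).

    Assumes lat >= 0 and lon <= 0 (northern and western hemispheres).
    """
    if lat < 0 or lon > 0:
        raise ValueError('sketch only handles northern/western coordinates')

    lat_deg = floor(lat)    # south edge of the 1x1 degree block
    lon_deg = floor(-lon)   # east edge of the block, in degrees west
    step = 1 / grid_size

    # Row 0 is the northernmost row, column 0 the westernmost column
    row = min(int((lat_deg + 1 - lat) / step), grid_size - 1)
    col = min(int((lon + lon_deg + 1) / step), grid_size - 1)
    ndx = row * grid_size + col + 1

    return f'{lat_deg}{lon_deg:03d}{ndx:02d}'


# Coordinates for Globe, AZ as returned by the geocoder later in this notebook
print(quad_id_for_point(33.3942223, -110.7864984))  # 3311034
```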

In [3]:
def get_quad_ids(viewport):
    '''
    Create a list of USGS quadrangles that encompass the input viewport
    
    viewport: A dictionary giving the NE and SW latitude, longitude coordinates
    that bound a viewport. Format is as returned by a geopy geolocator e.g.
    {'northeast': {'lat': 32.32016610000001, 'lng': -110.708204}, 'southwest': {'lat': 31.9916539, 'lng': -111.059406}}
    
    Returns a list of USGS quadrangle IDs (strings)
    '''
    view_ne_lat = viewport['northeast']['lat']
    view_ne_lon = viewport['northeast']['lng']
    view_sw_lat = viewport['southwest']['lat']
    view_sw_lon = viewport['southwest']['lng']
    
    
    ## Get a grid of quadrangles that encompass the viewport
    lats = range(int(view_sw_lat), int(view_ne_lat) + 1)
    lons = range(int(view_sw_lon), int(view_ne_lon) + 1)
    grid = {}
    for i in lats:
        for j in lons:
            grid_ij = make_grid(i, j)
            grid.update(grid_ij)
            
    ## Check each quadrangle to see if it intersects the viewport 
    quad_ids = []
    view_box = box(view_sw_lon, view_sw_lat, view_ne_lon, view_ne_lat)
    for quad_id, tile_view in grid.items():
        tile_ne_lat = tile_view['northeast']['lat']
        tile_ne_lon = tile_view['northeast']['lng']
        tile_sw_lat = tile_view['southwest']['lat']
        tile_sw_lon = tile_view['southwest']['lng']
        
        tile_box = box(tile_sw_lon, tile_sw_lat, tile_ne_lon, tile_ne_lat)
        intersection = tile_box.intersection(view_box)
        if intersection.area > 0:
            quad_ids.append(quad_id)
        
    return quad_ids
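
Because both the viewport and the tiles are axis-aligned rectangles, the positive-area test can also be written without `shapely`. The sketch below is an alternative formulation, not the code used in this notebook; `boxes_overlap` is a hypothetical name, and the dictionaries use the same NE/SW format as above.

```python
def boxes_overlap(a, b):
    """True if two NE/SW bounding boxes share a region of positive area.

    Boxes that only touch along an edge or at a corner do not count.
    """
    west = max(a['southwest']['lng'], b['southwest']['lng'])
    east = min(a['northeast']['lng'], b['northeast']['lng'])
    south = max(a['southwest']['lat'], b['southwest']['lat'])
    north = min(a['northeast']['lat'], b['northeast']['lat'])
    return west < east and south < north


# Example viewport and the bounds make_grid assigns to quadrangle 3511151
viewport = {'northeast': {'lat': 35.2401021, 'lng': -111.50679},
            'southwest': {'lat': 35.12231300000001, 'lng': -111.709128}}
tile = {'northeast': {'lat': 35.25, 'lng': -111.625},
        'southwest': {'lat': 35.125, 'lng': -111.75}}

print(boxes_overlap(viewport, tile))  # True
```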

With those functions in hand, we apply them to the cities of interest. In particular, we'll create a dictionary of dictionaries, one per city. Each key is a city name like 'Phoenix', and each value is a dictionary of other useful info that we'll add, such as state, geographic coordinates, and a list of all the quadrangle IDs that overlap the city viewport.

For this project, we download data for the state capital, Phoenix, plus a few other cities. Later, we'll see that Phoenix makes up the training data for a neural network, while the other cities serve as test cases for the perimeter detection procedure.

In [4]:
places = [
    'Lake Havasu City',
    'Bullhead',
    'Phoenix',
    'Yuma',
    'Globe',
    'Flagstaff'
]

places = {place: {'state': 'az'} for place in places}

In [5]:
api_key = os.environ.get('GOOG_GEOLOCATOR_API_KEY')
geolocator = GoogleV3(api_key)

for place, place_info in places.items():
    
    state = place_info['state']
    loc = geolocator.geocode(place + ' ' + state)
    lat, lon = loc.latitude, loc.longitude
    viewport = loc.raw['geometry']['viewport']
    
    quad_ids = get_quad_ids(viewport)
    place_info['viewport'] = viewport
    place_info['quad_ids'] = quad_ids
    place_info['coords'] = (lat, lon)

pprint.pprint(places)

{'Bullhead': {'coords': (35.1359386, -114.5285981),
              'quad_ids': ['3511451',
                           '3511452',
                           '3511453',
                           '3511459',
                           '3511460',
                           '3511461'],
              'state': 'az',
              'viewport': {'northeast': {'lat': 35.205767, 'lng': -114.457185},
                           'southwest': {'lat': 35.040306,
                                         'lng': -114.6467591}}},
 'Flagstaff': {'coords': (35.1982836, -111.651302),
               'quad_ids': ['3511151', '3511152', '3511159', '3511160'],
               'state': 'az',
               'viewport': {'northeast': {'lat': 35.2401021, 'lng': -111.50679},
                            'southwest': {'lat': 35.12231300000001,
                                          'lng': -111.709128}}},
 'Globe': {'coords': (33.3942223, -110.7864984),
           'quad_ids': ['3311034', '3311035', '3311042', '3311043'],

Next, loop through the manifest to find the RGB images corresponding to each quadrangle ID we found above. We add the key for each image file to a list in the city dictionaries for later download.

Note that it took a bit of manual work to figure out that the latest collection years for the states I wanted (mostly Arizona, but also a bit of Nevada and California since some cities are on the border) were 2014 and 2015, so the search is limited to those years.

In [6]:
manifest_path = 'data/naip/manifest.txt'

for _, place_info in places.items():
    place_info['naip_keys'] = []

with open(manifest_path, 'r') as manifest:
    for line in manifest:
        
        ## Only want RGB
        if '/rgb/' not in line: continue
        
        items = line.split('/')
        man_year = items[1]
        man_res = items[2]
        man_file = items[5]
        man_quad = man_file[2:9]
        man_doqq = man_file[2:12]
        
        ## Only consider 1m resolution images from 2014/2015
        if man_res == '1m' and man_year in ['2014', '2015']:
    
            ## Check if DOQQ already exists in any of the keys
            for place, place_info in places.items():
                if man_quad in place_info['quad_ids']:
                    if not any(man_doqq in key for key in place_info['naip_keys']):
                        place_info['naip_keys'].append(line.strip())

print()
pprint.pprint(places)


{'Bullhead': {'coords': (35.1359386, -114.5285981),
              'naip_keys': ['az/2015/1m/rgb/35114/m_3511452_ne_11_1_20150528.tif',
                            'az/2015/1m/rgb/35114/m_3511452_nw_11_1_20150528.tif',
                            'az/2015/1m/rgb/35114/m_3511452_se_11_1_20150528.tif',
                            'az/2015/1m/rgb/35114/m_3511452_sw_11_1_20150528.tif',
                            'az/2015/1m/rgb/35114/m_3511453_ne_11_1_20150528.tif',
                            'az/2015/1m/rgb/35114/m_3511453_nw_11_1_20150528.tif',
                            'az/2015/1m/rgb/35114/m_3511453_se_11_1_20150528.tif',
                            'az/2015/1m/rgb/35114/m_3511453_sw_11_1_20150528.tif',
                            'az/2015/1m/rgb/35114/m_3511459_ne_11_1_20150528.tif',
                            'az/2015/1m/rgb/35114/m_3511459_se_11_1_20150528.tif',
                            'az/2015/1m/rgb/35114/m_3511460_ne_11_1_20150528.tif',
                            'az/20
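
One thing to keep in mind is that the loop above keeps whichever key for a given DOQQ is encountered first in the manifest, which is not necessarily the most recent year. If the newest image is preferred, a variation along the following lines could be used. This is a sketch rather than the approach taken in this notebook; `latest_key_per_doqq` is a hypothetical name, and the 2014 key in the example is made up purely for illustration.

```python
def latest_key_per_doqq(keys):
    """Keep one key per DOQQ, preferring the most recent collection year."""
    best = {}
    for key in keys:
        key = key.strip()
        parts = key.split('/')
        year, filename = parts[1], parts[-1]
        doqq = filename[2:12]   # e.g. '3511452_ne', as in the loop above
        # Four-digit years compare correctly as strings
        if doqq not in best or year > best[doqq][0]:
            best[doqq] = (year, key)
    return sorted(key for _, key in best.values())


sample_keys = [
    'az/2014/1m/rgb/35114/m_3511452_ne_11_1_20140601.tif',  # made-up key
    'az/2015/1m/rgb/35114/m_3511452_ne_11_1_20150528.tif',
    'az/2015/1m/rgb/35114/m_3511452_nw_11_1_20150528.tif',
]
for key in latest_key_per_doqq(sample_keys):
    print(key)
```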

## Download Images

Finally, loop through each key and download it locally. The path to each local file is added to the info dictionary, and the entire dictionary is saved to a local file for downstream use.
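
As an aside, the download command can also be assembled as an argument list rather than a shell string, which avoids quoting problems if a path ever contains spaces and makes it easy to check whether the download succeeded. A minimal sketch, where `build_get_object_cmd` and `download` are hypothetical helper names and the flags are the ones already used in this notebook:

```python
import subprocess


def build_get_object_cmd(key, dest, bucket='aws-naip'):
    """Argument list for the same `aws s3api get-object` call used above."""
    return ['aws', 's3api', 'get-object',
            '--bucket', bucket,
            '--key', key,
            '--request-payer', 'requester',
            dest]


def download(key, dest):
    """Run the command without a shell; True if the CLI exited with code 0."""
    result = subprocess.run(build_get_object_cmd(key, dest))
    return result.returncode == 0


cmd = build_get_object_cmd(
    'az/2015/1m/rgb/34114/m_3411438_ne_11_1_20150605.tif',
    'data/naip/m_3411438_ne_11_1_20150605.tif')
print(' '.join(cmd))
```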

In [16]:
aws_cmd_base = 'aws s3api get-object --bucket aws-naip --key {} --request-payer requester {}'
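img_dir = 'data/naip/img/download/'

## Make sure the download directory exists before writing into it
os.makedirs(img_dir, exist_ok = True)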

for place, place_info in places.items():
    place_info['img_paths'] = []
    
    for key in place_info['naip_keys']:
        local_download_path = img_dir + key.split('/')[-1]
        place_info['img_paths'].append(local_download_path)
        
        if os.path.isfile(local_download_path):
            print('Skipping {}\n'.format(key))
        else:
            aws_cmd = aws_cmd_base.format(key, local_download_path)
            print('Downloading {} with the following command:\n{}\n'.format(key, aws_cmd))
            subprocess.run(aws_cmd, shell = True)
            
with open('data/naip/download_info.json', 'w') as places_out:
    json.dump(places, places_out, sort_keys = True, indent = 4)

Skipping az/2015/1m/rgb/34114/m_3411429_ne_11_1_20150606.tif

Skipping az/2015/1m/rgb/34114/m_3411429_nw_11_1_20150606.tif

Skipping az/2015/1m/rgb/34114/m_3411429_se_11_1_20150606.tif

Skipping az/2015/1m/rgb/34114/m_3411430_ne_11_1_20150605.tif

Skipping az/2015/1m/rgb/34114/m_3411430_nw_11_1_20150605.tif

Skipping az/2015/1m/rgb/34114/m_3411430_se_11_1_20150605.tif

Skipping az/2015/1m/rgb/34114/m_3411430_sw_11_1_20150605.tif

Downloading az/2015/1m/rgb/34114/m_3411431_ne_11_1_20150606.tif with the following command:
aws s3api get-object --bucket aws-naip --key az/2015/1m/rgb/34114/m_3411431_ne_11_1_20150606.tif --request-payer requester data/naip/img/download/m_3411431_ne_11_1_20150606.tif

Downloading az/2015/1m/rgb/34114/m_3411431_nw_11_1_20150606.tif with the following command:
aws s3api get-object --bucket aws-naip --key az/2015/1m/rgb/34114/m_3411431_nw_11_1_20150606.tif --request-payer requester data/naip/img/download/m_3411431_nw_11_1_20150606.tif

Downloading az/2015/1m/rgb