https://www.kaggle.com/rtatman/188-million-us-wildfires

https://www.fs.usda.gov/rds/archive/Product/RDS-2013-0009.4/

## Prepare Notebook

In [1]:
import os
import pandas as pd
from geopandas import GeoDataFrame
import shapely.geometry
import shapely.speedups
shapely.speedups.enable()

import mapbox
import palettable

import sqlite3

import warnings
warnings.filterwarnings('ignore')

import fiona
fiona.supported_drivers

input_filename = '../data/188-million-us-wildfires/src/FPA_FOD_20170508.sqlite'
conn = sqlite3.connect(input_filename)

The whole process of loading the data into a DataFrame, converting it into a GeoDataFrame, and then writing out a file is not particularly efficient. That said, it is useful if we choose to engineer any new features before rendering the map.

The data we're working with contains coordinates in the [NAD83](https://en.wikipedia.org/wiki/North_American_Datum) coordinate system. In order to work with `tippecanoe` we will want to reproject the data into [WGS 84](https://en.wikipedia.org/wiki/World_Geodetic_System).

Let's take a quick peek at our data to ensure it looks reasonable.
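One way to take that quick peek, sketched with the standard library only: pull a handful of rows and the coordinate bounds straight from SQLite before loading everything into pandas. The helper names `peek` and `coordinate_bounds` are hypothetical, and the in-memory table below holds made-up rows standing in for the real `Fires` table; in the notebook you would pass `conn` instead of `demo`.

```python
import sqlite3

def peek(conn, table="Fires", limit=5):
    """Return the column names and the first few rows of a table."""
    cur = conn.execute(f"SELECT * FROM {table} LIMIT ?", (limit,))
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()

def coordinate_bounds(conn, table="Fires"):
    """Return (min_lat, max_lat, min_lon, max_lon) as a rough sanity check."""
    return conn.execute(
        f"SELECT MIN(LATITUDE), MAX(LATITUDE), MIN(LONGITUDE), MAX(LONGITUDE) "
        f"FROM {table}"
    ).fetchone()

# Stand-in for the real database: a tiny in-memory table with made-up rows.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE Fires (FIRE_NAME TEXT, FIRE_YEAR INTEGER, LATITUDE REAL, LONGITUDE REAL)"
)
demo.executemany(
    "INSERT INTO Fires VALUES (?, ?, ?, ?)",
    [("DEMO A", 2005, 40.0, -121.0), ("DEMO B", 2010, 33.5, -84.25)],
)

columns, rows = peek(demo)
print(columns)
for row in rows:
    print(row)
print(coordinate_bounds(demo))
```

If the bounds fall well outside the United States and its territories, something is probably off with the coordinates.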

## Load Data

In [2]:
query = '''
    SELECT
        NWCG_REPORTING_AGENCY,
        NWCG_REPORTING_UNIT_ID,
        NWCG_REPORTING_UNIT_NAME,
        FIRE_NAME,
        COMPLEX_NAME,
        FIRE_YEAR,
        DISCOVERY_DATE,
        DISCOVERY_DOY,
        DISCOVERY_TIME,
        STAT_CAUSE_CODE,
        STAT_CAUSE_DESCR,
        CONT_DATE,
        CONT_DOY,
        CONT_TIME,
        FIRE_SIZE,
        FIRE_SIZE_CLASS,
        LATITUDE,
        LONGITUDE,
        OWNER_CODE,
        OWNER_DESCR,
        STATE,
        COUNTY
    FROM
        Fires;
'''

df = pd.read_sql_query(query, conn)
geometry = [shapely.geometry.Point(xy) for xy in zip(df.LONGITUDE, df.LATITUDE)]
df.drop(['LONGITUDE', 'LATITUDE'], axis=1, inplace=True)
crs = {'init': 'epsg:4269'}
gdf = GeoDataFrame(df, crs=crs, geometry=geometry)
del df
gdf = gdf.to_crs({'init': 'epsg:4326'})

## Map Visualization

### Generate Colors for STAT_CAUSE_CODE

Let's dig into the causes of fires a bit more. First, we'll list out all of the possible values.

To save space in our vector tiles, we'll include the Cause Code rather than the Cause Description. To display human-readable names, let's build a quick mapping and print it as JSON for inclusion in our JS.

In [3]:
cause_desc_counts = gdf \
    .groupby(['STAT_CAUSE_DESCR', 'STAT_CAUSE_CODE']) \
    .size()\
    .reset_index()\
    .rename(columns={0:"n_occurances"})\
    .sort_values(['n_occurances'], ascending=False)\
    .reset_index(drop=True)
cause_desc_counts['STAT_CAUSE_CODE'] = cause_desc_counts['STAT_CAUSE_CODE'].astype(int)

def assign_color(row):
    try:
        return palettable.matplotlib.Inferno_13.hex_colors[row.name]
    except IndexError:
        return '#ff0000'
    
cause_desc_counts['color'] = cause_desc_counts.apply(assign_color, axis=1)
display(cause_desc_counts)
cause_desc_counts.to_json(orient='records')

| | STAT_CAUSE_DESCR | STAT_CAUSE_CODE | n_occurances | color |
|---|---|---|---|---|
| 0 | Debris Burning | 5 | 429028 | #000004 |
| 1 | Miscellaneous | 9 | 323805 | #110A30 |
| 2 | Arson | 7 | 281455 | #320A5E |
| 3 | Lightning | 1 | 278468 | #57106E |
| 4 | Missing/Undefined | 13 | 166723 | #781C6D |
| 5 | Equipment Use | 2 | 147612 | #9A2865 |
| 6 | Campfire | 4 | 76139 | #BC3754 |
| 7 | Children | 8 | 61167 | #D84C3E |
| 8 | Smoking | 3 | 52869 | #ED6925 |
| 9 | Railroad | 6 | 33455 | #F98C0A |


'[{"STAT_CAUSE_DESCR":"Debris Burning","STAT_CAUSE_CODE":5,"n_occurances":429028,"color":"#000004"},{"STAT_CAUSE_DESCR":"Miscellaneous","STAT_CAUSE_CODE":9,"n_occurances":323805,"color":"#110A30"},{"STAT_CAUSE_DESCR":"Arson","STAT_CAUSE_CODE":7,"n_occurances":281455,"color":"#320A5E"},{"STAT_CAUSE_DESCR":"Lightning","STAT_CAUSE_CODE":1,"n_occurances":278468,"color":"#57106E"},{"STAT_CAUSE_DESCR":"Missing\\/Undefined","STAT_CAUSE_CODE":13,"n_occurances":166723,"color":"#781C6D"},{"STAT_CAUSE_DESCR":"Equipment Use","STAT_CAUSE_CODE":2,"n_occurances":147612,"color":"#9A2865"},{"STAT_CAUSE_DESCR":"Campfire","STAT_CAUSE_CODE":4,"n_occurances":76139,"color":"#BC3754"},{"STAT_CAUSE_DESCR":"Children","STAT_CAUSE_CODE":8,"n_occurances":61167,"color":"#D84C3E"},{"STAT_CAUSE_DESCR":"Smoking","STAT_CAUSE_CODE":3,"n_occurances":52869,"color":"#ED6925"},{"STAT_CAUSE_DESCR":"Railroad","STAT_CAUSE_CODE":6,"n_occurances":33455,"color":"#F98C0A"},{"STAT_CAUSE_DESCR":"Powerline","STAT_CAUSE_CODE":11,"n_o
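On the JS side, a lookup keyed by cause code may be handier than a list of records. Here is a minimal sketch of that reshaping, assuming the record layout shown in the output above; `build_cause_lookup` is a hypothetical helper, and only three of the records are copied in for brevity.

```python
import json

# A few records copied from the output above; the real list is longer.
records = [
    {"STAT_CAUSE_DESCR": "Debris Burning", "STAT_CAUSE_CODE": 5,
     "n_occurances": 429028, "color": "#000004"},
    {"STAT_CAUSE_DESCR": "Arson", "STAT_CAUSE_CODE": 7,
     "n_occurances": 281455, "color": "#320A5E"},
    {"STAT_CAUSE_DESCR": "Lightning", "STAT_CAUSE_CODE": 1,
     "n_occurances": 278468, "color": "#57106E"},
]

def build_cause_lookup(records):
    """Map each cause code (as a string, since JSON keys are strings)
    to its display label and color."""
    return {
        str(r["STAT_CAUSE_CODE"]): {"label": r["STAT_CAUSE_DESCR"], "color": r["color"]}
        for r in records
    }

cause_lookup = build_cause_lookup(records)
print(json.dumps(cause_lookup, indent=2))
```

In the notebook, the same function could be fed `cause_desc_counts.to_dict(orient='records')` instead of the hand-copied list.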

Reference: https://gis.stackexchange.com/questions/148834/creating-a-really-large-shapefile-without-eating-all-the-virtual-memory

### Point Data to GeoJSON

Dump our GeoDataFrame out to GeoJSON. This step takes a while and gives little feedback while it runs. In my experience, the output file is just over 1000 MB, so you can keep an eye on the size of that file to check progress.

In [None]:
geojson_filename = '../data/188-million-us-wildfires/FPA_FOD_20170508.geojson'

if not os.path.exists(geojson_filename):
    gdf.to_file(geojson_filename, driver="GeoJSON")
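Since the export blocks the notebook kernel, the file size has to be watched from somewhere else, for example a terminal or a second notebook. A small sketch of such a check follows; `file_size_mb` and `progress_line` are hypothetical helpers, and the 1000 MB default is just the rough figure mentioned above, not a measured constant. The demo writes a throwaway file rather than touching the real output.

```python
import os
import tempfile

def file_size_mb(path):
    """Return the file size in megabytes, or 0.0 if the file does not exist yet."""
    try:
        return os.path.getsize(path) / 1_000_000
    except OSError:
        return 0.0

def progress_line(path, expected_mb=1000):
    """Format a one-line progress report against a rough expected size."""
    size = file_size_mb(path)
    pct = min(100.0, 100.0 * size / expected_mb)
    return f"{path}: {size:.1f} MB (~{pct:.0f}% of {expected_mb} MB)"

# Demo against a throwaway file rather than the real GeoJSON output.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.geojson")
    missing_mb = file_size_mb(demo_path)  # nothing written yet
    with open(demo_path, "wb") as f:
        f.write(b"\0" * 2_000_000)
    demo_mb = file_size_mb(demo_path)
    print(progress_line(demo_path))
```

Calling `progress_line(geojson_filename)` every so often from a separate process gives a crude progress indicator.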

### Tile Generation and Upload

#### STAT_CAUSE_CODE

Reference: https://github.com/mapbox/tippecanoe

`tippecanoe -o FPA_FOD_20170508.mbtiles -f -Z1 -z20 -y DISCOVERY_DATE,CONT_DATE,FIRE_SIZE_CLASS,STAT_CAUSE_CODE --drop-densest-as-needed FPA_FOD_20170508.geojson`

In [None]:
mbtiles_filename = '../data/188-million-us-wildfires/FPA_FOD_20170508.mbtiles'

if not os.path.exists(mbtiles_filename):
    !tippecanoe -o {mbtiles_filename} -ae -D 18 -d 18 -m 10 -rg -y STAT_CAUSE_CODE -y FIRE_YEAR -y FIRE_NAME -L fires:{geojson_filename}
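If you would rather not rely on IPython's `!` shell syntax, the same invocation can be assembled as an argument list for `subprocess`. This is only a sketch: `tippecanoe_args` and `run_tippecanoe` are hypothetical helpers, the flags are the ones already used in the cell above, and the file names in the demo are placeholders.

```python
import shutil
import subprocess

def tippecanoe_args(mbtiles_path, geojson_path, attributes, layer="fires",
                    extra_flags=("-ae", "-D", "18", "-d", "18", "-m", "10", "-rg")):
    """Build the tippecanoe command line as a list of arguments."""
    args = ["tippecanoe", "-o", mbtiles_path, *extra_flags]
    for name in attributes:
        args += ["-y", name]  # keep only the listed attributes in the tiles
    args += ["-L", f"{layer}:{geojson_path}"]
    return args

def run_tippecanoe(args):
    """Run the command if tippecanoe is on the PATH; otherwise report and skip."""
    if shutil.which(args[0]) is None:
        print("tippecanoe not found on PATH; skipping")
        return None
    return subprocess.run(args, check=True)

# Placeholder file names, for illustration only.
args = tippecanoe_args(
    "out.mbtiles", "in.geojson", ["STAT_CAUSE_CODE", "FIRE_YEAR", "FIRE_NAME"]
)
print(" ".join(args))
```

Passing a list avoids shell quoting problems if a path ever contains spaces.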

Reference: https://github.com/mapbox/mapbox-sdk-py https://www.mapbox.com/api-documentation/#uploads

Use `mapbox-geostats` to query the resulting mbtiles file and review the results.

In [None]:
!mapbox-geostats {mbtiles_filename}

Use `mapbox-tile-copy` to move our tiles up to an S3 bucket. Don't forget to 1) allow public access to the bucket's keys, 2) set up static site hosting, and 3) enable CORS requests.

In [None]:
!mapbox-tile-copy {mbtiles_filename} s3://{s3_destination_bucket}/{z}/{x}/{y}
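For the CORS step, here is a rough sketch of what a permissive, read-only rule set could look like. The specific values are assumptions for illustration, not settings taken from this project; in particular, you will likely want to restrict `AllowedOrigins` to your own site.

```python
import json

# Hypothetical rule set allowing read-only requests from any origin.
cors_configuration = {
    "CORSRules": [
        {
            "AllowedMethods": ["GET", "HEAD"],
            "AllowedOrigins": ["*"],
            "AllowedHeaders": ["*"],
            "MaxAgeSeconds": 3000,
        }
    ]
}

print(json.dumps(cors_configuration, indent=2))

# With boto3 installed and AWS credentials configured (both assumptions here),
# the rules could be applied along these lines:
#   import boto3
#   boto3.client("s3").put_bucket_cors(
#       Bucket=s3_destination_bucket, CORSConfiguration=cors_configuration
#   )
```

The same rules can also be entered by hand in the bucket's permissions settings in the AWS console.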