## Iggy demo with kepler.gl visualization

This notebook gives a quick example of how to enrich some user data with Iggy and then visualize the resulting features using [kepler.gl](https://kepler.gl/).

We're assuming you have some Iggy data on hand (if not, you can download a sample [here](https://docs.askiggy.com/download/sample-data)) and have unzipped the downloaded package into a local directory like so:

```bash
tar xzvf iggy-package-wkt-20211110214810_fl_pinellas_quadkeys.tar.gz
```

The resulting data (parquet files) should then be accessible in the directory `iggy-package-wkt-20211110214810_fl_pinellas_quadkeys`. 

Ok? Let's go!

In [1]:
# Install dependencies if needed
!pip install pandas geopandas pyarrow shapely keplergl

In [2]:
import pandas as pd
import geopandas as gpd
from shapely import wkt
from shapely.geometry import Point, Polygon
from keplergl import KeplerGl



### Let's start with a dataset

We'll assume you're using Iggy to enrich some data you already have on hand, like a dataset of properties your company holds or the number of users by zip code.

For this demo, we'll start with a 2018 dataset of public pools in Florida, which can be downloaded [here](https://download.fgdl.org/pub/state/public_pools_mar18.zip). The next two code blocks download the data, read it, and transform it into a data frame where each row represents a zip code and a column holds the number of public pools in that zip.

In [3]:
!wget https://download.fgdl.org/pub/state/public_pools_mar18.zip
!unzip public_pools_mar18.zip

In [4]:
# read data
pools_gdf = gpd.read_file('public_pools_mar18.shp')
# calculate pool counts by zip
pools = pd.DataFrame(pools_gdf)[["ZIP_CODE", "ENTITYNUMB"]].groupby(["ZIP_CODE"]).count()
pools.rename(columns={"ENTITYNUMB": "pool_count"}, inplace=True)

So we're starting with a simple data frame that has two columns: `ZIP_CODE` (which is currently the index), and `pool_count`. If you're following along with your own data, just get it to the point where you have it in a pandas DataFrame and go from here.

In [5]:
pools.head()

| ZIP_CODE | pool_count |
|---|---|
| 32003 | 26 |
| 32008 | 2 |
| 32024 | 4 |
| 32025 | 10 |
| 32033 | 5 |
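
If you're following along with your own data, the aggregation step can be sketched on a toy data frame like the one below. The rows are made up for illustration; only the column names `ZIP_CODE` and `ENTITYNUMB` come from the pools dataset. Storing zip codes as text is an assumption worth making explicit, since it keeps leading zeros intact and matches the string index seen later in the kepler output.

```python
import pandas as pd

# Hypothetical raw records: one row per pool, with the zip stored as text
# so that leading zeros (e.g. "02108") would survive.
records = pd.DataFrame(
    {
        "ZIP_CODE": ["33701", "33701", "33702", "33556"],
        "ENTITYNUMB": [101, 102, 103, 104],
    }
)

# Count rows per zip; the result is indexed by ZIP_CODE, like `pools` above.
toy_pools = (
    records.groupby("ZIP_CODE")["ENTITYNUMB"]
    .count()
    .to_frame("pool_count")
)
print(toy_pools)
```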


### Loading and enriching with Iggy data

The next step is to load the Iggy data that we'll use to enrich our pools dataset.

In [6]:
VERSION_ID = "20211110214810"
PREFIX = "fl_pinellas_quadkeys"

In [7]:
# Load Iggy zip code data and index it by `id`, which holds the zip code.
# Adjust the path below to wherever you extracted the Iggy package.
iggy_zips = pd.read_parquet(f"../iggy-data/iggy-package-wkt-{VERSION_ID}_{PREFIX}/{PREFIX}_zipcode_{VERSION_ID}")
iggy_zips.set_index("id", inplace=True)

# Merge it with our pools
iggy_pools = pools.merge(iggy_zips, left_index=True, right_index=True, suffixes=("", "_zipcode"))

In [8]:
iggy_pools.head()

| | pool_count | name | area_sqkm | perimeter_km | population | poi_count | poi_count_per_sqkm | poi_count_per_capita | poi_is_transportation_count | poi_is_transportation_count_per_sqkm | ... | national_forest_count_per_sqkm | national_forest_count_per_capita | national_forest_intersecting_area_in_sqkm | national_forest_pct_area_intersecting_boundary | public_park_count | public_park_count_per_sqkm | public_park_count_per_capita | public_park_intersecting_area_in_sqkm | public_park_pct_area_intersecting_boundary | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 33556 | 12 | 33556 | 106.265117 | 58.167026 | 23182 | 140 | 1.31746 | 0.006039 | 2 | 0.018821 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 6.0 | 0.056463 | 0.000259 | 0.277289 | 0.002609 | POLYGON((-82.651165 28.173266, -82.651144 28.1... |
| 33626 | 36 | 33626 | 38.051243 | 34.312747 | 30743 | 306 | 8.041787 | 0.009953 | 11 | 0.289084 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 6.0 | 0.157682 | 0.000195 | 0.835024 | 0.021945 | POLYGON((-82.649149 28.098901, -82.649126 28.0... |
| 33635 | 26 | 33635 | 16.020221 | 30.481792 | 18650 | 134 | 8.364429 | 0.007185 | 17 | 1.061159 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | POLYGON((-82.648565 28.033407, -82.648567 28.0... |
| 33701 | 66 | 33701 | 10.382468 | 16.786176 | 15728 | 788 | 75.897177 | 0.050102 | 177 | 17.04797 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 30.0 | 2.889486 | 0.001907 | 0.747974 | 0.072042 | POLYGON((-82.646726 27.785689, -82.646743 27.7... |
| 33702 | 65 | 33702 | 33.046359 | 35.880793 | 32019 | 437 | 13.223847 | 0.013648 | 124 | 3.752304 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 15.0 | 0.453908 | 0.000468 | 0.650914 | 0.019697 | MULTIPOLYGON(((-82.669789 27.826477, -82.66978... |


Great! Now we have a dataframe with not one but 227 columns that describe each zip code.

You may have noticed that the number of rows shrank from 897 to 52. This is because `merge` performs an inner join by default, so only zip codes present in both data frames are kept, and our Iggy sample dataset only contains 52 zip codes in Pinellas County.
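
To see exactly which zip codes fall out of a merge like this, one option is a left join with pandas' `indicator=True` flag. The sketch below uses two toy data frames standing in for `pools` and `iggy_zips`; the frame names and the choice of zips are illustrative assumptions, not part of the Iggy package.

```python
import pandas as pd

# Toy stand-ins for `pools` and `iggy_zips` (illustrative rows only).
toy_pools = pd.DataFrame(
    {"pool_count": [26, 12, 36]}, index=["32003", "33556", "33626"]
)
toy_iggy = pd.DataFrame(
    {"population": [23182, 30743]}, index=["33556", "33626"]
)

# how="left" keeps every row of toy_pools; indicator=True adds a `_merge`
# column that says whether a match was found on the right-hand side.
checked = toy_pools.merge(
    toy_iggy, left_index=True, right_index=True, how="left", indicator=True
)

# Rows flagged "left_only" had no counterpart in toy_iggy.
unmatched = checked[checked["_merge"] == "left_only"]
print(unmatched.index.tolist())
```

Running the same check against the real `pools` and `iggy_zips` frames should list the zip codes outside the sample's coverage.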

### Visualization

Next let's visualize the enriched pools data using kepler.

The first thing we'll need to do is turn our vanilla DataFrame into a GeoDataFrame, so that kepler can understand the geometries.

In [9]:
# Pull the WKT strings out of the DataFrame and parse them into shapely geometries
pools_geom = iggy_pools.pop("geometry")
pools_geom = gpd.GeoSeries(pools_geom.map(wkt.loads), crs="EPSG:4326")  # EPSG:4326 is WGS84 lon/lat

# Convert to GeoDataFrame and replace missing feature values with 0
iggy_pools_gdf = gpd.GeoDataFrame(iggy_pools, geometry=pools_geom)
iggy_pools_gdf = iggy_pools_gdf.fillna(0)
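
To make the WKT parsing step above more concrete, here is a minimal sketch that parses a single string with shapely and inspects the result. The polygon coordinates are made up; only the text format mirrors the `geometry` column.

```python
from shapely import wkt

# A tiny made-up polygon in WKT, in the same text form as the `geometry` column.
sample = "POLYGON((-82.65 28.17, -82.64 28.17, -82.64 28.18, -82.65 28.18, -82.65 28.17))"

geom = wkt.loads(sample)  # text -> shapely geometry object
print(geom.geom_type)     # kind of geometry that was parsed
print(geom.is_valid)      # False would point to a malformed shape, e.g. a self-intersecting ring
print(geom.bounds)        # (min lon, min lat, max lon, max lat)
```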



Finally, let's take a look!

In [10]:
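# Avoid naming the variable `map`, which would shadow Python's built-in function
kepler_map = KeplerGl()
kepler_map.add_data(data=iggy_pools_gdf, name="enriched_pools")

User Guide: https://docs.kepler.gl/docs/keplergl-jupyter


In [11]:
kepler_map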

KeplerGl(data={'enriched_pools': {'index': ['33556', '33626', '33635', '33701', '33702', '33703', '33704', '33…

When you first load the map, it'll show you the outlines of zip codes in Pinellas County and every zip will have the same color.

You can change the colors to reflect features by:
- clicking on the little arrow in the top left of the map
- clicking the down arrow at the right side of the `enriched_pools` box 
- clicking on the three dots by "Fill Color"
- selecting a feature in the "Color Based On" box 
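
If none of the existing columns captures what you want to color by, one option is to derive a new column before handing the data to kepler. The sketch below computes pools per 1,000 residents on a toy frame built from the first two rows of the merged table; the column name `pools_per_1k_residents` is a hypothetical choice, and the zero-population guard is an assumption about how you might want to handle empty zips.

```python
import pandas as pd

# Toy rows using the first two zips from the merged table above.
toy = pd.DataFrame(
    {"pool_count": [12, 36], "population": [23182, 30743]},
    index=["33556", "33626"],
)

# Pools per 1,000 residents; a zero population becomes NaN instead of inf.
population = toy["population"].where(toy["population"] > 0)
toy["pools_per_1k_residents"] = toy["pool_count"] / population * 1000
print(toy)
```

Applied to the real GeoDataFrame, the new column would need to be added before the `add_data` call for it to show up in the "Color Based On" box.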

### Exporting

Now let's say you prefer to use kepler.gl in their web app, instead of here in the notebook. Or, maybe you haven't been able to get the kepler map to render in this notebook and have exhausted all of the install suggestions in [kepler's docs](https://docs.kepler.gl/docs/keplergl-jupyter) and just want to see the data. 

You can export your GeoDataFrame to a file in GeoJSON format, which you can then drag and drop into your kepler browser window:

In [13]:
iggy_pools_gdf.to_file("iggy-pools-export.json", driver="GeoJSON", index=True)
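
Before dragging the file into the browser, it can be worth a quick sanity check that the export looks the way you expect. The helper below, `summarize_geojson`, is a hypothetical function written for this sketch using only the standard library; it writes a small made-up stand-in file so that it runs on its own.

```python
import json
import os
import tempfile


def summarize_geojson(path):
    """Return (feature count, sorted property names, set of geometry types)."""
    with open(path, encoding="utf-8") as f:
        collection = json.load(f)
    features = collection.get("features", [])
    property_names = sorted(
        {key for feat in features for key in (feat.get("properties") or {})}
    )
    geometry_types = {
        feat["geometry"]["type"] for feat in features if feat.get("geometry")
    }
    return len(features), property_names, geometry_types


# Stand-in file with one made-up feature, so the sketch runs on its own.
sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"pool_count": 12},
            "geometry": {"type": "Point", "coordinates": [-82.65, 28.17]},
        }
    ],
}
sample_path = os.path.join(tempfile.mkdtemp(), "sample-export.json")
with open(sample_path, "w", encoding="utf-8") as f:
    json.dump(sample, f)

summary = summarize_geojson(sample_path)
print(summary)
```

With the real export, you would call `summarize_geojson("iggy-pools-export.json")` instead and compare the feature count against the number of rows in `iggy_pools_gdf`.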