# Map-Based Visualization with cuxfilter

In this notebook, we’ll walk through how to quickly build a map-based visualization from a cuDF DataFrame using [cuxfilter](https://github.com/rapidsai/cuxfilter), a RAPIDS framework that enables fast, interactive, multi-dimensional filtering of datasets with 100 million+ rows.

In [1]:
import cudf
import cuxfilter
import numpy as np

## Dataset

The store location data we'll be using for this demo comes from the [Walmart Store location data](https://data.world/data-hut/walmart-store-location-data) dataset and the population data we'll be using comes from the [U.S. Population by zip code, 2010-2016]() dataset, both found on [data.world](https://data.world/).

In [2]:
store_df = cudf.read_csv('https://query.data.world/s/bin3w5x2c52guoqvce7tu42rjhgki5', dtype=str)
pop_df = cudf.read_csv('https://query.data.world/s/7u7a4tau6my73leqt5srpcjmixuxpd', dtype=str)

Let's give the data a quick look to see what we're working with.

In [3]:
store_df.head(5)

|  | name | url | street_address | city | state | zip_code | country | phone_number_1 | phone_number_2 | fax_1 | ... | email_2 | website | open_hours | latitude | longitude | facebook | twitter | instagram | pinterest | youtube |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Conway Supercenter | https://www.walmart.com/store/5/conway-ar/details | 1155 Hwy 65 North | Conway | AR | 72032 | US | 501-329-0023 |  |  | ... |  |  | monday - friday : 00:00-24:00, saturday : 00:0... | 35.10866 | -92.436905 |  |  |  |  |  |
| 1 | Sikeston Supercenter | https://www.walmart.com/store/9/sikeston-mo/de... | 1303 S Main St | Sikeston | MO | 63801 | US | 573-472-3020 |  |  | ... |  |  | monday - friday : 00:00-24:00, saturday : 00:0... | 36.857394 | -89.586051 |  |  |  |  |  |
| 2 | Tahlequah Supercenter | https://www.walmart.com/store/10/tahlequah-ok/... | 2020 S Muskogee Ave | Tahlequah | OK | 74464 | US | 918-456-8804 |  |  | ... |  |  | monday - friday : 00:00-24:00, saturday : 00:0... | 35.888765 | -94.979859 |  |  |  |  |  |
| 3 | Mountain Home Supercenter | https://www.walmart.com/store/11/mountain-home... | 65 Wal Mart Dr | Mountain Home | AR | 72653 | US | 870-492-9299 |  |  | ... |  |  | monday - friday : 00:00-24:00, saturday : 00:0... | 36.3549565 | -92.3410256 |  |  |  |  |  |
| 4 | Claremore Supercenter | https://www.walmart.com/store/12/claremore-ok/... | 1500 S Lynn Riggs Blvd | Claremore | OK | 74017 | US | 918-341-2765 |  |  | ... |  |  | monday - friday : 00:00-24:00, saturday : 00:0... | 36.293955 | -95.627125 |  |  |  |  |  |


In [4]:
pop_df.head(5)

|  | zip_code | y-2016 | y-2015 | y-2014 | y-2013 | y-2012 | y-2011 | y-2010 | aggregate |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 601 | 17800 | 17982 | 18088 | 18450 | 18544 | 18533 | 18570 | 127967 |
| 1 | 602 | 39716 | 40260 | 40859 | 41302 | 41640 | 41930 | 41520 | 287227 |
| 2 | 603 | 51565 | 52408 | 53162 | 53683 | 54540 | 54475 | 54689 | 374522 |
| 3 | 606 | 6320 | 6331 | 6415 | 6591 | 6593 | 6386 | 6615 | 45251 |
| 4 | 610 | 27976 | 28328 | 28805 | 28963 | 29141 | 29111 | 29016 | 201340 |


## Preparation

We want a dataframe that holds each store's zip prefix along with purchase data generated at random from the local population.

In [5]:
df = store_df.join(pop_df, on="zip_code")[['name', "zip_code", "y-2016"]]
df = df.rename({"y-2016":"local_pop"}, axis=1).astype({"local_pop":"int"})
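`DataFrame.join` in pandas-style APIs matches rows against the other frame's index, and depending on the cuDF version the `on` argument may not select a key column in `pop_df`. If the goal is to pair each store with the population of its own zip code, an explicit key-based `merge` is one way to express that. The sketch below uses pandas with made-up toy rows so that it runs without a GPU; `cudf.DataFrame.merge` accepts the same `on` and `how` arguments. The toy frames and the byte-order-mark cleanup are assumptions for illustration, not part of the original notebook.

```python
import pandas as pd

# Toy stand-ins for store_df and pop_df (all values are made up).
stores = pd.DataFrame({
    "name": ["Store A", "Store B", "Store C"],
    "zip_code": ["72032", "63801", "99999"],
})
population = pd.DataFrame({
    "\ufeffzip_code": ["63801", "72032"],  # header carrying a stray byte-order mark
    "y-2016": ["20000", "60000"],
})

# Defensive step (assumption): strip a byte-order mark from column names
# so that the key column can be addressed as plain "zip_code".
population = population.rename(columns=lambda c: c.lstrip("\ufeff"))

# Inner merge on zip_code: rows pair up by key value, not by row position,
# and stores without a population entry are dropped.
merged = stores.merge(population[["zip_code", "y-2016"]], on="zip_code", how="inner")
merged = merged.rename(columns={"y-2016": "local_pop"}).astype({"local_pop": "int64"})

print(merged)
```

With a key-based merge, "Store C" drops out because its zip code has no population row, which makes unmatched stores easy to spot.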

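Next, we'll create some random purchase data for our analysis. Each store's purchase count is a random 60% to 79% of the local population, and its revenue is the purchase count times a random amount between 40.00 and 49.99.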

In [6]:
df["purchases"] = df["local_pop"] * cudf.Series(np.random.randint(60,80,len(df))/100, index=df.index)
df["purchases"] = df["purchases"].astype("int")

df["revenue"] = df["purchases"] * cudf.Series(np.random.randint(4000,5000,len(df))/100, index=df.index)
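Because the cell above draws from NumPy's global random state, the generated numbers change on every run. A minimal CPU-only sketch of the same arithmetic with a seeded generator is shown below; the seed value and the three sample populations are arbitrary choices for illustration, and the variable names are not from the original notebook.

```python
import numpy as np

# Seeded generator so the synthetic data is repeatable (seed is arbitrary).
rng = np.random.default_rng(seed=0)

# Illustrative local populations, one per store.
local_pop = np.array([870, 15725, 734])

# Share of the population that purchases: integers 60..79 scaled to 0.60..0.79.
purchase_rate = rng.integers(60, 80, size=local_pop.size) / 100
purchases = (local_pop * purchase_rate).astype(int)

# Revenue per purchase: integers 4000..4999 scaled to 40.00..49.99.
unit_revenue = rng.integers(4000, 5000, size=local_pop.size) / 100
revenue = purchases * unit_revenue

print(purchases)
print(revenue)
```

The resulting arrays could be wrapped in `cudf.Series(..., index=df.index)` exactly as in the cell above.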

Then we'll create a column for the three-digit zip prefixes, since that's what we'll aggregate on for our visualization.

In [7]:
df["zip"] = df["zip_code"].str.slice(0,3).astype("int")
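Slicing the first three characters assumes every zip code is a five-character string. A zip code that has lost its leading zeros (for example `601` instead of `00601`) would produce the prefix 601 rather than 006. The plain-Python sketch below shows one way to guard against that by padding before slicing; `zip3` is a hypothetical helper, not part of cuDF or cuxfilter.

```python
def zip3(zip_code: str) -> int:
    """Return the three-digit zip prefix as an integer (hypothetical helper)."""
    # Restore leading zeros that may have been lost, e.g. "601" -> "00601".
    padded = zip_code.strip().zfill(5)
    # Keep the first three digits; int() drops leading zeros, so "006" -> 6.
    return int(padded[:3])


print(zip3("72758"))  # 727
print(zip3("601"))    # 6
```

On a cuDF string column, the same idea would be a zero-padding step before `.str.slice(0, 3)`.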

In [8]:
df

|  | name | zip_code | local_pop | purchases | revenue | zip |
|---|---|---|---|---|---|---|
| 224 | Rogers Gas Station | 72758 | 870 | 556 | 23774.56 | 727 |
| 225 | Kosciusko Supercenter | 39090 | 15725 | 10693 | 467070.24 | 390 |
| 226 | Crockett Supercenter | 75835 | 734 | 447 | 20016.66 | 758 |
| 227 | Harrisburg Supercenter | 62946 | 985 | 699 | 32957.85 | 629 |
| 228 | Paola Supercenter | 66071 | 869 | 634 | 28174.96 | 660 |
| ... | ... | ... | ... | ... | ... | ... |
| 1083 | De Funiak Springs Supercenter | 32433 | 211 | 143 | 6875.44 | 324 |
| 1084 | Albemarle Supercenter | 28001 | 32 | 23 | 994.29 | 280 |
| 1085 | Port Allen Supercenter | 70767 | 86 | 66 | 2981.22 | 707 |
| 1086 | Hartsville Supercenter | 29550 | 313 | 209 | 10142.77 | 295 |


Then we can convert the cuDF DataFrame into a cuxfilter DataFrame to prepare for visualization.

In [9]:
cux_df = cuxfilter.DataFrame.from_dataframe(df)

## Creating the Visualization

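Now we can visualize our data using the choropleth chart built into cuxfilter. First, we point to a GeoJSON file that describes the map regions and count the distinct zip prefixes in our data.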

In [10]:
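# GeoJSON file with the region shapes (zip3, per its file name)
geoJSONSource = 'https://raw.githubusercontent.com/rapidsai/cuxfilter/GTC-2018-mortgage-visualization/javascript/demos/GTC%20demo/src/data/zip3-ms-rhs-lessprops.json'
# Number of distinct zip prefixes, passed to the chart as data_points
size = len(df["zip"].unique())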

We can create a 3D heatmap in which color represents the average revenue of stores in a given zip prefix and elevation represents the average number of purchases made at those stores.

In [11]:
chart0 = cuxfilter.charts.choropleth(
    x='zip',
    color_column='revenue', color_aggregate_fn='mean',
    elevation_column='purchases', elevation_factor=100, elevation_aggregate_fn='mean',
    geoJSONSource=geoJSONSource, data_points=size, add_interaction=True,
)

d = cux_df.dashboard([chart0], layout=cuxfilter.layouts.single_feature, theme=cuxfilter.themes.dark, title='Purchase Dashboard')
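Conceptually, with `color_aggregate_fn='mean'` and `elevation_aggregate_fn='mean'`, each zip prefix ends up with one color value and one elevation value, each a mean over the stores in that prefix. The plain-Python sketch below illustrates that grouping on made-up rows; `mean_by_zip` is a hypothetical helper and this is not how cuxfilter implements the aggregation, which runs on the GPU.

```python
from collections import defaultdict
from statistics import mean

# Made-up store rows for illustration only.
rows = [
    {"zip": 727, "revenue": 100.0, "purchases": 10},
    {"zip": 727, "revenue": 300.0, "purchases": 30},
    {"zip": 390, "revenue": 50.0, "purchases": 5},
]


def mean_by_zip(rows, column):
    """Group rows by zip prefix and average the given column (hypothetical helper)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["zip"]].append(row[column])
    return {zip_prefix: mean(values) for zip_prefix, values in groups.items()}


color_values = mean_by_zip(rows, "revenue")        # drives the color scale
elevation_values = mean_by_zip(rows, "purchases")  # drives the extrusion height

print(color_values)
print(elevation_values)
```

The number of keys in each result corresponds to the count of distinct zip prefixes, which is what `size` holds in the notebook.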



Finally, we can display the chart.

In [12]:
chart0.view()



Or we can display the dashboard in a new window.

In [13]:
d.show()

Dashboard running at port 46687
