# Using Cuxfilter to Plot Coordinates on a Map of the United States
### A map of Harris County households and tanks with a distance range slider

### Import statements

Import the packages and libraries used for the visualization.

In [1]:
import geopandas as gpd
import os

import cuxfilter
from cuxfilter.layouts import feature_and_five_edge, feature_and_double_base, feature_and_base
import cudf
import numpy as np

import holoviews as hv
import pandas as pd



### Setting ```DATA_DIR```
To read files from this repository, ```DATA_DIR``` must point to the data folder within it. The cell below assumes that ```os.getcwd()``` returns the path to this notebook's folder, ```xxx/codeplus-celine-dcc-package/visualizations```, where ```xxx``` is the path where you cloned the repository. If it does not, call ```os.chdir(path)``` with ```path``` set to ```xxx/codeplus-celine-dcc-package/visualizations``` before running ```DATA_DIR = os.getcwd()```.

In [2]:
DATA_DIR = os.getcwd()
DATA_DIR = DATA_DIR.replace('visualizations', 'data')
DATA_DIR

'/hpc/home/at341/ondemand/codeplus-celine-dcc-package/data'
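
The string replacement above rewrites every occurrence of ```visualizations``` in the path, so it would give a wrong result if a parent directory also contained that word. Below is a minimal sketch of an alternative, assuming the ```data``` and ```visualizations``` folders are siblings inside the repository. The helper name ```sibling_data_dir``` and the example path are hypothetical and not part of the original notebook.

```python
from pathlib import Path

def sibling_data_dir(cwd, data_name="data"):
    """Return the path of the `data_name` folder that sits next to `cwd`.

    Assumes `cwd` is the notebook folder (e.g. .../visualizations) and that
    the data folder is its sibling in the same repository.
    """
    return Path(cwd).parent / data_name

# Illustrative path only; in the notebook you would pass Path.cwd() instead.
example = sibling_data_dir("/home/user/visualizations_repo/visualizations")
print(example.as_posix())
```

Only the last path component is swapped, so a parent folder such as ```visualizations_repo``` is left untouched.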

### Reading household distance data

This preprocessed file, created in processing notebook **06_case_studies_dist_processing**, contains the distance between households in Harris County and tanks, already calculated in miles. The dataframe also records whether each household has children, whether it includes elderly members, the age code of the head of household, the coordinates of the tanks and households, and each tank's type and diameter.

The ```lat_3857``` and ```lon_3857``` columns give the position of each point we plot on our cuxfilter dashboard. The remaining variables drive the range slider and multiselect tools the user can interact with.

In [None]:
df_harris = pd.read_parquet(DATA_DIR + '/harris_dist.parquet')
df_harris = df_harris[df_harris['distance_category'] != 4]
df_harris

| Unnamed: 0 | has_child | age_code | lat_3857 | lon_3857 | tank_type | diameter | distance_m | distance_mi | distance_category | is_elderly |
|---|---|---|---|---|---|---|---|---|---|---|
| 13 | 2 | I | -1.059221e+07 | 3.458798e+06 | closed_roof_tank | 29.4 | 5008.432385 | 3.112096 | 3 | 2 |
| 17 | 2 | M | -1.059430e+07 | 3.462129e+06 | closed_roof_tank | 29.4 | 3159.104500 | 1.962977 | 3 | 1 |
| 20 | 1 | G | -1.059041e+07 | 3.462799e+06 | closed_roof_tank | 29.4 | 6540.607560 | 4.064145 | 3 | 2 |
| 32 | 2 | C | -1.057942e+07 | 3.459363e+06 | narrow_closed_roof_tank | 6.0 | 3262.534222 | 2.027245 | 3 | 2 |
| 44 | 1 | C | -1.061478e+07 | 3.466496e+06 | external_floating_roof_tank | 10.2 | 5651.484940 | 3.511670 | 3 | 2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 500003 | 0 | | -1.061374e+07 | 3.477134e+06 | narrow_closed_roof_tank | 5.4 | | 35.000000 | 0 | 0 |
| 500004 | 0 | | -1.057771e+07 | 3.453203e+06 | closed_roof_tank | 10.2 | | 35.000000 | 0 | 0 |
| 500005 | 0 | | -1.057989e+07 | 3.455635e+06 | narrow_closed_roof_tank | 6.0 | | 35.000000 | 0 | 0 |
| 500006 | 0 | | -1.059765e+07 | 3.460715e+06 | closed_roof_tank | 29.4 | | 35.000000 | 0 | 0 |
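
The ```_3857``` suffix on the coordinate columns refers to EPSG:3857 (Web Mercator), which expresses positions in meters rather than degrees. The sketch below shows the standard spherical Web Mercator formulas as an illustration. It is not the preprocessing notebook's code, and the function name ```lonlat_to_web_mercator``` and the Houston coordinates are assumptions chosen for the example. Judging by the sample rows above, ```lat_3857``` holds values near -1.06e7, which is the magnitude this projection gives for the x (easting) coordinate around Harris County, while ```lon_3857``` holds values near 3.46e6, the magnitude of y (northing).

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, used by EPSG:3857

def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to EPSG:3857 (x, y) in meters."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

# Approximate coordinates of downtown Houston (illustrative, not from the dataset).
x, y = lonlat_to_web_mercator(-95.37, 29.76)
print(f"x = {x:.3e} m, y = {y:.3e} m")
```

This is consistent with passing ```lat_3857``` as ```x``` and ```lon_3857``` as ```y``` when the scatter chart is defined later in the notebook.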


### Transforming to cuxfilter dataframe

The cells below convert the pandas dataframe into a cuDF dataframe, then wrap that cuDF dataframe in a Cuxfilter dataframe, which is the object the Cuxfilter library builds charts and dashboards from.

In [5]:
cdf = cudf.DataFrame.from_pandas(df_harris) 

In [6]:
cux_df = cuxfilter.DataFrame.from_dataframe(cdf) 

### Defining label maps
The Datashader plotting library, which Cuxfilter uses to render our visualization on Graphics Processing Units (GPUs), is optimized for working with large dataframes. Here, we're plotting over 1 million points. One limitation, however, is that the range slider and multiselect charts only take numerical inputs. This means that instead of categorizing each household by whether or not it has children with a string label such as ```'Children'``` or ```'No Children'```, we must label it numerically.

Hence, our ```has_child``` column holds numerical indicators: ```0``` indicates that the point is a tank, ```1``` indicates that the point is a household with children, and ```2``` indicates that the point is a household without children. The ```is_elderly``` column follows the same structure. For the ```distance_category``` column, ```0``` indicates that the point is a tank, ```1``` is a household in the 0.5-mile category, ```2``` is a household in the 1-mile category, and ```3``` is a household in the 5-mile category. The sample rows above show category ```3``` households roughly 2 to 4 miles from a tank, so each distance is best read as an upper bound rather than an exact value.

The label maps below associate each numerical value in our dataframe with a string label, which is displayed on the multiselect charts. The ```colors``` list provides the hex codes used to color each point on the map.

In [4]:
label_map_distance = {0: 'Tank', 1: '0.5 miles away', 
             2: '1 mile away', 3: '5 miles away'}

label_map_elderly = {0: 'Tank', 1: 'Elderly', 
             2: 'Not Elderly'}

label_map_children = {0: 'Tank', 1: 'Children', 
             2: 'No Children'}

colors = ['#05c1ff', '#ff0000', '#ff00a4', '#a11aeb']
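
To illustrate how a numerical category column pairs with a label map, here is a minimal sketch that bins a distance in miles into an integer code. The actual binning happens in the preprocessing notebook and is not shown in this document, so the thresholds, the use of code ```4``` for anything beyond the last bound, and the function name ```distance_to_category``` are all assumptions made for the example.

```python
from bisect import bisect_left

# Assumed upper bounds (in miles) for categories 1, 2 and 3; illustrative only.
UPPER_BOUNDS_MI = [0.5, 1.0, 5.0]

label_map_distance = {0: 'Tank', 1: '0.5 miles away',
                      2: '1 mile away', 3: '5 miles away'}

def distance_to_category(distance_mi):
    """Map a household-to-tank distance to an integer category code.

    Distances beyond the last bound get code 4 (an assumption about what the
    category filtered out at load time might represent).
    """
    return bisect_left(UPPER_BOUNDS_MI, distance_mi) + 1

for distance in [0.3, 0.8, 3.112096, 7.5]:
    code = distance_to_category(distance)
    print(distance, code, label_map_distance.get(code, 'unlabelled'))
```

Keeping the codes as small integers lets the charts filter on numbers while the label map supplies the readable text.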

### Defining cuxfilter charts

This code defines the charts for our interactive dashboard. The ```points``` chart is the main map with households and tanks plotted. We use projected coordinates in the EPSG:3857 (Web Mercator) coordinate system, as it is the one used by the Cuxfilter library. The points for tanks and households are colored differently by setting the ```aggregate_col``` parameter to the ```distance_category``` column in our dataframe. The ```aggregate_fn``` parameter, which can be set to an aggregation such as ```max```, ```min```, or ```mean```, specifies which aggregation of that column to perform when coloring the points.

This column has four categories, as described above, and the ```colors``` list has four colors, each of which is assigned to one of the distance categories. Each point on the map is therefore colored by its distance category.

The next four charts define the interactive widgets: ```.multi_select``` creates a multiselect chart, while ```.range_slider``` creates a range slider. In each definition, we specify the dataframe column the chart should pull from. For the multiselect charts, we also pass the label map Cuxfilter should use to label the options. For example, the ```distance_category``` chart is a multiselect chart that pulls from the ```distance_category``` column, and its options are labelled according to the ```label_map_distance``` label map.

In [7]:
# Main map: one point per household or tank, colored by distance category.
points = cuxfilter.charts.scatter(
    x='lat_3857', y='lon_3857',
    aggregate_col='distance_category', aggregate_fn='max',
    color_palette=colors, pixel_shade_type='linear',
    tile_provider="CartoDark", legend=True,
    title='Households in Harris County in Close Proximity to Tanks',
)

# Multiselect charts: each label map turns integer codes into readable options.
distance_category = cuxfilter.charts.multi_select('distance_category', label_map=label_map_distance)
age = cuxfilter.charts.multi_select('is_elderly', label_map=label_map_elderly)
children = cuxfilter.charts.multi_select('has_child', label_map=label_map_children)

# Range slider over the household-to-tank distance in miles.
distance_slider = cuxfilter.charts.range_slider('distance_mi')
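
When many points fall into the same screen pixel, the aggregation chosen in ```aggregate_fn``` decides which value colors that pixel. The following is a simplified pure-Python sketch of that idea with ```'max'``` as the aggregation. It is not Datashader's implementation, and the function ```rasterize_max```, the 2 x 2 grid, and the sample points are hypothetical.

```python
def rasterize_max(points, x_range, y_range, width, height):
    """Bin (x, y, value) points into a height x width grid, keeping the max value per cell.

    Assumes every point lies inside x_range and y_range.
    """
    grid = [[None] * width for _ in range(height)]
    x_min, x_max = x_range
    y_min, y_max = y_range
    for x, y, value in points:
        # Scale each coordinate to a column/row index, clamping the upper edge.
        col = min(int((x - x_min) / (x_max - x_min) * width), width - 1)
        row = min(int((y - y_min) / (y_max - y_min) * height), height - 1)
        current = grid[row][col]
        grid[row][col] = value if current is None else max(current, value)
    return grid

points = [
    (0.1, 0.1, 0),  # tank
    (0.2, 0.2, 3),  # household in category 3, same pixel as the tank
    (0.9, 0.9, 1),  # household in category 1, alone in its pixel
]
grid = rasterize_max(points, (0.0, 1.0), (0.0, 1.0), width=2, height=2)
print(grid)
```

In this toy model, a pixel shared by a tank (```0```) and a household takes the household's category, because the larger code wins under ```max```.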

Finally, we use the ```.dashboard``` method to put these charts together as an interactive dashboard for the user. We first specify the main charts which will be displayed in the ```layout``` we choose, then specify the charts that will be displayed on the dashboard's ```sidebar```.

In [8]:
d = cux_df.dashboard([points, distance_slider], 
                     sidebar = [distance_category, age, children], layout = cuxfilter.layouts.feature_and_base) 
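
Conceptually, the multiselect charts and the range slider all narrow the same set of rows, and the map shows whatever remains. The sketch below models that behaviour on three rows copied from the sample output earlier in this notebook. It is only a mental model: Cuxfilter applies its filters to the cuDF dataframe on the GPU, the helper ```apply_filters``` is hypothetical, and treating the slider bounds as inclusive is an assumption.

```python
# Three rows copied from the sample output earlier in this notebook.
rows = [
    {"id": 13, "distance_category": 3, "distance_mi": 3.112096},
    {"id": 17, "distance_category": 3, "distance_mi": 1.962977},
    {"id": 500003, "distance_category": 0, "distance_mi": 35.0},
]

def apply_filters(rows, selected_categories, distance_range):
    """Keep rows whose category is selected and whose distance lies in the range."""
    low, high = distance_range
    return [
        row for row in rows
        if row["distance_category"] in selected_categories
        and low <= row["distance_mi"] <= high
    ]

# Select only category 3 in the multiselect and set the slider to 0-2.5 miles.
visible = apply_filters(rows, selected_categories={3}, distance_range=(0.0, 2.5))
print([row["id"] for row in visible])  # prints [17]
```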

### Displaying interactive dashboard

Running the commands below displays the interactive dashboard. The user can use the multiselect charts to view specific subsets of points, and can ```ctrl```-click to select multiple categories in one multiselect at a time. The user can also use the range slider chart to view households within a certain distance range from a storage tank.

Using these interactive tools creates a copy of the data every time a user interacts with them, which may cause a memory allocation error. You can run ```nvidia-smi``` to see how much memory you are using on the GPUs.

In [14]:
d.show()
d.app(sidebar_width=290) # run the dashboard within the notebook cell

Dashboard running at port 49325
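
The GPU memory check mentioned above can also be scripted from Python. This is a minimal sketch that calls ```nvidia-smi``` with its CSV query options and parses the result. The helper names ```parse_memory_csv``` and ```gpu_memory_usage``` are hypothetical, and the function returns an empty list when ```nvidia-smi``` is missing or its output cannot be parsed.

```python
import shutil
import subprocess

def parse_memory_csv(text):
    """Parse 'used, total' MiB pairs, one GPU per line, into a list of tuples."""
    usage = []
    for line in text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        usage.append((used, total))
    return usage

def gpu_memory_usage():
    """Return [(used_mib, total_mib), ...] per GPU, or [] if it cannot be queried."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.used,memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        )
        return parse_memory_csv(result.stdout)
    except (subprocess.CalledProcessError, OSError, ValueError):
        return []

for index, (used, total) in enumerate(gpu_memory_usage()):
    print(f"GPU {index}: {used} / {total} MiB used")
```

Running this before and after interacting with the dashboard gives a rough view of how much memory each interaction adds.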
