# Working with Mapbox Scatterplots (`scatter_mapbox`) and Density Heatmaps (`density_mapbox`) in Plotly

These plot types are especially useful for continuous values tied to a spatial location (latitude and longitude), such as earthquake magnitudes, the VEIs of volcanoes, or temperatures at research sites.

## The data
The CSV we will be working with contains information on 822 significant volcanic eruptions from 1750 BCE to 2020 CE. The data originally comes from <a href="https://www.ngdc.noaa.gov/hazard/volcano.shtml">NCEI</a>, but this tutorial uses the CSV available <a href="https://scipython.com/book2/chapter-9-data-analysis-with-pandas/examples/analysing-the-history-of-volcanic-eruptions-with-pandas/">on this webpage</a>. We will focus on representing volcano locations along with their Volcanic Explosivity Index (VEI) values. The <a href="https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=&cad=rja&uact=8&ved=2ahUKEwjEmYPx5YP6AhUtL0QIHVAmAPgQFnoECAMQAw&url=https%3A%2F%2Fwww.nps.gov%2Fsubjects%2Fvolcanoes%2Fvolcanic-explosivity-index.htm&usg=AOvVaw0CGWzkA9rRDJkxveGFr36S">NPS</a> defines the VEI as a scale that describes the size of explosive volcanic eruptions based on magnitude and intensity.

## Wrangle the data
First, we will get the data into a form more suitable for plotting. We read in the data from the URL containing the CSV, keep only the rows with non-NA values for VEI, and keep the columns VEI, Latitude, Longitude, Name, and Country. A single location can have multiple eruptions, so we also create a second dataframe holding the mean VEI for each volcano. We will plot both versions: the individual events (where one location can have several VEI values) and the per-volcano averages.

In [78]:
# Initial importing of packages and data with a bit of wrangling
import pandas as pd
import plotly.express as px


df = pd.read_csv('https://scipython.com/static/media/2/examples/E9/volcanic-eruptions.csv') # Read in a csv from a webpage
df = df[df['VEI'].notna()] # Keep non-NAs for value of interest
df = df[['VEI', 'Latitude', 'Longitude', 'Name', 'Country']] # Select columns of interest

df_vei_means = df.groupby(['Latitude', 'Longitude', 'Name', 'Country']).mean().reset_index() # Take mean VEI for each lat-lon set
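
To make the averaging step concrete, here is a minimal sketch on a tiny, made-up dataframe. The volcano names, countries, and values are hypothetical stand-ins and are not taken from the real CSV. It uses the same grouping columns as above, and selects `VEI` explicitly before taking the mean, which is one way to be explicit about which column is averaged.

```python
import pandas as pd

# Hypothetical toy rows standing in for the real eruption data
toy = pd.DataFrame({
    "VEI": [2, 4, 6, 3],
    "Latitude": [40.8, 40.8, 14.0, 14.0],
    "Longitude": [14.4, 14.4, 121.0, 121.0],
    "Name": ["Volcano A", "Volcano A", "Volcano B", "Volcano B"],
    "Country": ["Country X", "Country X", "Country Y", "Country Y"],
})

# One row per volcano, holding the mean VEI of its eruptions
toy_means = (
    toy.groupby(["Latitude", "Longitude", "Name", "Country"], as_index=False)["VEI"]
    .mean()
)

print(toy_means)
```

The four event rows collapse to two rows, one per volcano, which is the shape `df_vei_means` has relative to `df`.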

## Mapbox Scatterplots
In the code chunks below, we explore how to visualize the data (first without averaging, then with averages by volcano) using `scatter_mapbox`. The comment next to each line of code explains what should be passed to that argument. Documentation can be found <a href="https://plotly.github.io/plotly.py-docs/generated/plotly.express.scatter_mapbox.html">here</a>.

In [79]:
# Creating a Mapbox Scatterplot (-without- taking an average for each lat-lon set with multiple values)
fig1 = px.scatter_mapbox(
    df, # Dataframe name
    lat = 'Latitude', # Latitude column
    lon = 'Longitude', # Longitude column
    color = 'VEI', # Values for determining the color of each point
    size = 'VEI', # Values for determining the size of each point
    size_max = 15, # Setting a limit on the size of points
    hover_data = ['VEI', 'Latitude', 'Longitude', 'Name', 'Country'], # Choose what data to display when you hover over a point
    center = dict(lat=10, lon=10), # Choose the center point on the map
    zoom = 0, # Set the map's zoom level (0-20)
    mapbox_style = "stamen-watercolor", # Which kind of map you want underneath your data
    color_continuous_scale = px.colors.diverging.RdBu_r, # Choose your own built-in color scheme! _r is the reverse color direction
    title = "Mapbox Scatterplot: VEIs of Significant Volcanic Events, 1750 BCE-2020 CE"
) 

fig1.show()

In [80]:
# Creating a Mapbox Scatterplot (-with- taking an average for each lat-lon set with multiple values)
fig2 = px.scatter_mapbox(
    df_vei_means, # Dataframe name
    lat = 'Latitude', # Latitude column
    lon = 'Longitude', # Longitude column
    color = 'VEI', # Values for determining the color of each point
    size = 'VEI', # Values for determining the size of each point
    size_max = 15, # Setting a limit on the size of points
    hover_data = ['VEI', 'Latitude', 'Longitude', 'Name', 'Country'], # Choose what data to display when you hover over a point
    center = dict(lat=10, lon=10), # Choose the center point on the map
    zoom = 0, # Set the map's zoom level (0-20)
    mapbox_style = "stamen-watercolor", # Which kind of map you want underneath your data
    color_continuous_scale = px.colors.diverging.RdBu_r, # Choose your own built-in color scheme! _r is the reverse color direction
    title = "Mapbox Scatterplot: VEIs of Significant Volcanic Events, 1750 BCE-2020 CE <br>                                              (averages by volcano)"
) 

fig2.show()

### Additional notes for Mapbox Scatterplot

If several values share one latitude-longitude set, their points are drawn on top of each other. Mapping `size` to your value of interest can help keep all of the values at a location visible, if they are all relevant to you.

There is unfortunately no jitter feature at the moment, so if you would rather use jitter to separate values at the same latitude and longitude, you have to create the jittered data yourself before plotting. Otherwise, you could plot a single value for each lat-lon set, such as the max or the average for each location, or choose a year range in which no location has multiple values.
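
As a rough sketch of the two workarounds above, the block below adds small random offsets to rows that share a lat-lon set, and separately keeps only the max VEI per location. The helper name `jitter_duplicates`, its `scale` default of 0.5 degrees, and the toy rows are all assumptions made for illustration; none of this is a Plotly feature.

```python
import numpy as np
import pandas as pd


def jitter_duplicates(frame, lat="Latitude", lon="Longitude", scale=0.5, seed=0):
    """Return a copy in which rows sharing a lat-lon set get small random offsets.

    `scale` is the maximum offset in degrees (an arbitrary illustrative default).
    """
    rng = np.random.default_rng(seed)
    out = frame.copy()
    out[[lat, lon]] = out[[lat, lon]].astype(float)
    # Only rows whose lat-lon set occurs more than once are moved
    dup = out.duplicated(subset=[lat, lon], keep=False).to_numpy()
    n = int(dup.sum())
    out.loc[dup, lat] = (out.loc[dup, lat] + rng.uniform(-scale, scale, n)).clip(-90, 90)
    # Longitude is not wrapped at +/-180 here; handle that separately if needed
    out.loc[dup, lon] = out.loc[dup, lon] + rng.uniform(-scale, scale, n)
    return out


# Hypothetical toy rows: two events at one location, one event at another
toy = pd.DataFrame({
    "VEI": [2, 5, 3],
    "Latitude": [40.8, 40.8, 14.0],
    "Longitude": [14.4, 14.4, 121.0],
})

# Workaround 1: jitter the stacked rows so each one gets its own position
jittered = jitter_duplicates(toy)

# Workaround 2: keep a single value per location, here the maximum VEI
toy_max = toy.groupby(["Latitude", "Longitude"], as_index=False)["VEI"].max()

print(jittered)
print(toy_max)
```

Either result can be passed to `px.scatter_mapbox` in place of `df`. Keep in mind that jittered points no longer sit at the true coordinates, so a small `scale` and a note in the plot title are probably sensible.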

## Mapbox Density Heatmaps
In the code chunks below, we explore how to visualize the data (first without averaging, then with averages by volcano) using `density_mapbox`. The comment next to each line of code explains what should be passed to that argument. Documentation can be found <a href="https://plotly.com/python-api-reference/generated/plotly.express.density_mapbox.html">here</a>.

In [81]:
# Creating a Mapbox Density Heatmap (-without- taking an average for each lat-lon set with multiple values)
fig3 = px.density_mapbox(
    df, # Dataframe name
    lat = 'Latitude', # Latitude column
    lon = 'Longitude', # Longitude column
    z = 'VEI', # Values used as each point's weight in the density
    radius = 8, # Set the radius of influence of each point, in pixels
    hover_data = ['VEI', 'Latitude', 'Longitude', 'Name', 'Country'], # Choose what data to display when you hover over a point
    center = dict(lat=10, lon=10), # Choose the center point on the map
    zoom = 0, # Set the map's zoom level (0-20)
    mapbox_style = "stamen-watercolor", # Which kind of map you want underneath your data
    color_continuous_scale = px.colors.diverging.RdBu_r, # Choose your own built-in color scheme! _r is the reverse color direction
    title = "Mapbox Density Heatmap: VEIs of Significant Volcanic Events, 1750 BCE-2020 CE"
) 

fig3.show()

In [82]:
# Creating a Mapbox Density Heatmap (-with- taking an average for each lat-lon set with multiple values)
fig4 = px.density_mapbox(
    df_vei_means, # Dataframe name
    lat = 'Latitude', # Latitude column
    lon = 'Longitude', # Longitude column
    z = 'VEI', # Values used as each point's weight in the density
    radius = 8, # Set the radius of influence of each point, in pixels
    hover_data = ['VEI', 'Latitude', 'Longitude', 'Name', 'Country'], # Choose what data to display when you hover over a point
    center = dict(lat=10, lon=10), # Choose the center point on the map
    zoom = 0, # Set the map's zoom level (0-20)
    mapbox_style = "stamen-watercolor", # Which kind of map you want underneath your data
    color_continuous_scale = px.colors.diverging.RdBu_r, # Choose your own built-in color scheme! _r is the reverse color direction
    title = "Mapbox Density Heatmap: VEIs of Significant Volcanic Events, 1750 BCE-2020 CE <br>                                              (averages by volcano)"
) 

fig4.show()

### Additional notes for Mapbox Density Heatmaps

If you would like to represent more general trends of high and low values across areas, as well as the density of locations with your relevant values (i.e., the density of volcanic eruption sites), you may want to use `density_mapbox` instead of `scatter_mapbox`, especially if you have multiple values for each lat-lon set (as suggested by <a href="https://community.plotly.com/t/is-it-possible-to-jitter-scatter-mapbox/41927">a member of the Plotly team</a>). The key difference in the call is that you set `z` and `radius` instead of `color`, `size`, and `size_max`.

However, this type of mapbox graph can blur some of the specific values in your data, so make sure it displays your values the way you intend before choosing it. For example, when there are multiple values for one lat-lon set, you will only be able to hover to see one value, whereas in `scatter_mapbox` you can hover on the differently sized circles to see each value.

You may also see an individual point drawn in a different color than its value alone would suggest, even when it is the only point at that lat-lon set. For example, a point with a VEI of 7 that was red in the `scatter_mapbox` plot may show up white or blue in a `density_mapbox` plot when you zoom in on it, although it will still sit higher on the color scale than a point with a lower VEI. The color scale shown on the side is therefore not a direct mapping from color to individual `z` values.
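
To judge how much the caveats above matter for a given dataset, it can help to count how many rows share each lat-lon set before choosing a plot type. The sketch below does this on made-up rows. The column name `n_events` and the toy values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical toy rows: three events at one location, one event at another
toy = pd.DataFrame({
    "VEI": [2, 4, 6, 3],
    "Latitude": [40.8, 40.8, 40.8, 14.0],
    "Longitude": [14.4, 14.4, 14.4, 121.0],
})

# Number of events recorded at each lat-lon set
events_per_location = (
    toy.groupby(["Latitude", "Longitude"])
    .size()
    .rename("n_events")
    .reset_index()
)

# Locations where points would stack on top of each other
stacked = events_per_location[events_per_location["n_events"] > 1]

# Fraction of all rows that share their location with at least one other row
share_stacked = toy.duplicated(subset=["Latitude", "Longitude"], keep=False).mean()

print(stacked)
print(share_stacked)
```

If only a small fraction of rows are stacked, `scatter_mapbox` may lose little. If most rows are stacked, a density heatmap or a per-location summary is probably the clearer choice.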