![Header image: Example of a bivariate map of EU countries](https://raw.githubusercontent.com/yotkadata/plotly-bivariate-choropleth/main/img/header.png)

# Bivariate choropleth map using Plotly Choropleth Mapbox

Based on the example by [_empet_](https://chart-studio.plotly.com/~empet/15191/texas-bivariate-choropleth-assoc/#/)

**Choropleth** maps are a great way to visualize values on a map. But sometimes we want to show not just one variable, but the relationship between two. That's where **bivariate maps** come in. They show the values of two variables at once, using a color or symbol for each of them. In this example, we use two colors that are blended together. That way, we can see areas where both variables are high or low, and also those where one value is high and the other one is low.

When searching for ways to create bivariate maps using Plotly, I was surprised to find little information on how to do it. What helped me a lot was the example that [_empet_ published on plotly.com](https://chart-studio.plotly.com/~empet/15191/texas-bivariate-choropleth-assoc/#/). I built on that and tried to make it more generic and reusable. I hope it worked. This notebook walks you through the whole process.

Source code: https://github.com/yotkadata/plotly-bivariate-choropleth/blob/main/plotly-bivariate-choropleth.ipynb




## Part I: Generic code

First, we declare functions and other things that are independent of the data in our examples. We start by importing the necessary libraries:

In [None]:
import numpy as np
import pandas as pd
import json, requests
import plotly.graph_objs as go
import os
import math
from urllib.parse import urlparse

### Define functions

In the next step, we define all the functions we will use later on. The 'core' functions for bivariate maps are `set_interval_value()` and `prepare_df()`. Both take their functionality from empet's example, but the latter works a little differently and, in my opinion, is simpler than the original.
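
To make the idea behind these two functions concrete before reading them, here is a minimal, self-contained sketch of the binning step on toy data. It is not the notebook's implementation: the helper name `bivariate_index` is made up for illustration, and it uses `np.digitize` instead of a Python loop, although it follows the same rule (break points at percentiles 33 and 66, index = x bin + 3 * y bin).

```python
import numpy as np

def bivariate_index(x, y):
    """Hypothetical helper: map two value lists to positions 0..8 in a 3x3 color matrix."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Break points at percentiles 33 and 66, as in prepare_df()
    x_breaks = np.percentile(x, [33, 66])
    y_breaks = np.percentile(y, [33, 66])

    # right=True puts a value equal to a break point into the lower bin,
    # matching the `x <= break_1` comparison in set_interval_value()
    x_bins = np.digitize(x, x_breaks, right=True)
    y_bins = np.digitize(y, y_breaks, right=True)

    # Position in the 9-color matrix: x varies fastest, y slowest
    return (x_bins + 3 * y_bins).tolist()

# Toy data: x rises while y falls
toy_x = [1, 2, 3, 4, 5, 6]
toy_y = [6, 5, 4, 3, 2, 1]
toy_index = bivariate_index(toy_x, toy_y)
print(toy_index)
```

In this toy case the first two entries land at index 6 (low x, high y) and the last two at index 2 (high x, low y), which are the two "disagreement" corners of the color matrix.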

In [None]:
"""
Function to set default variables
"""

def conf_defaults():
    # Define some variables for later use
    conf = {
        'plot_title': 'Bivariate choropleth map using Plotly',  # Title text
        'plot_title_size': 20,  # Font size of the title
        'width': 1000,  # Width of the final map container
        'ratio': 0.8,  # Ratio of height to width
        'center_lat': 0,  # Latitude of the center of the map
        'center_lon': 0,  # Longitude of the center of the map
        'map_zoom': 3,  # Zoom factor of the map
        'hover_x_label': 'Label x variable',  # Label to appear on hover
        'hover_y_label': 'Label y variable',  # Label to appear on hover
        'borders_width': 0.5,  # Width of the geographic entity borders
        'borders_color': '#f8f8f8',  # Color of the geographic entity borders

        # Define settings for the legend
        'top': 1,  # Vertical position of the top right corner (0: bottom, 1: top)
        'right': 1,  # Horizontal position of the top right corner (0: left, 1: right)
        'box_w': 0.04,  # Width of each rectangle
        'box_h': 0.04,  # Height of each rectangle
        'line_color': '#f8f8f8',  # Color of the rectangles' borders
        'line_width': 0,  # Width of the rectangles' borders
        'legend_x_label': 'Higher x value',  # x variable label for the legend 
        'legend_y_label': 'Higher y value',  # y variable label for the legend
        'legend_font_size': 9,  # Legend font size
        'legend_font_color': '#333',  # Legend font color
    }

    # Calculate height
    conf['height'] = conf['width'] * conf['ratio']
    
    return conf


"""
Function to recalculate values in case width is changed
"""
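def recalc_vars(new_width, variables, conf=None):

    # Work on a copy (or on fresh defaults) so that repeated calls
    # do not rescale the caller's dictionary a second time
    conf = dict(conf) if conf is not None else conf_defaults()
    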
    # Calculate the factor of the changed width
    factor = new_width / 1000
    
    # Apply factor to all variables that have been passed to the function
    for var in variables:
        if var == 'map_zoom':
            # Calculate the zoom factor
            # Mapbox zoom is based on a log scale. map_zoom needs to be set to value ideal for our map at 1000px.
            # So factor = 2 ^ (zoom - map_zoom) and zoom = log(factor) / log(2) + map_zoom
            conf[var] = math.log(factor) / math.log(2) + conf[var]
        else:
            conf[var] = conf[var] * factor

    return conf


"""
Function to load GeoJSON file with geographical data of the entities
"""

def load_geojson(geojson_url, data_dir='data', local_file=False):
    
    # Make sure data_dir is a string
    data_dir = str(data_dir)
    
    # Set name for the file to be saved
    if not local_file:
        # Use original file name if none is specified
        url_parsed = urlparse(geojson_url)
        local_file = os.path.basename(url_parsed.path)
        
    geojson_file = data_dir + '/' + str(local_file)

    # Create folder for data if it does not exist
    if not os.path.exists(data_dir):
        os.makedirs(data_dir)

    # Download GeoJSON in case it doesn't exist
    if not os.path.exists(geojson_file):

        # Make http request for remote file data and stop on HTTP errors
        geojson_request = requests.get(geojson_url)
        geojson_request.raise_for_status()

        # Save file to local copy
        with open(geojson_file, 'wb') as file:
            file.write(geojson_request.content)

    # Load GeoJSON file (the context manager closes the file handle again)
    with open(geojson_file, 'r') as file:
        geojson = json.load(file)
    
    # Return GeoJSON object
    return geojson


"""
Function that assigns a value (x) to one of three bins (0, 1, 2).
The break points for the bins can be defined by break_1 and break_2.
"""

def set_interval_value(x, break_1, break_2):
    if x <= break_1: 
        return 0
    elif break_1 < x <= break_2: 
        return 1
    else: 
        return 2


"""
Function that adds a column 'biv_bins' to the dataframe containing the 
position in the 9-color matrix for the bivariate colors
    
Arguments:
    df: Dataframe
    x: Name of the column containing values of the first variable
    y: Name of the column containing values of the second variable

"""

def prepare_df(df, x='x', y='y'):
    
    # Check if arguments match all requirements
    if df[x].shape[0] != df[y].shape[0]:
        raise ValueError('ERROR: The list of x and y coordinates must have the same length.')
    
    # Calculate break points at percentiles 33 and 66
    x_breaks = np.percentile(df[x], [33, 66])
    y_breaks = np.percentile(df[y], [33, 66])
    
    # Assign values of both variables to one of three bins (0, 1, 2)
    x_bins = [set_interval_value(value_x, x_breaks[0], x_breaks[1]) for value_x in df[x]]
    y_bins = [set_interval_value(value_y, y_breaks[0], y_breaks[1]) for value_y in df[y]]
    
    # Calculate the position of each x/y value pair in the 9-color matrix of bivariate colors
    df['biv_bins'] = [int(value_x + 3 * value_y) for value_x, value_y in zip(x_bins, y_bins)]
    
    return df
   


"""
Function to create a color square containing the 9 colors to be used as a legend
"""

def create_legend(fig, colors, conf=conf_defaults()):
    
    # Reverse the order of colors
    legend_colors = colors[:]
    legend_colors.reverse()

    # Calculate coordinates for all nine rectangles
    coord = []

    # Adapt height to ratio to get squares
    width = conf['box_w']
    height = conf['box_h']/conf['ratio']
    
    # Loop through rows and columns to calculate the corners of the squares
    for row in range(1, 4):
        for col in range(1, 4):
            coord.append({
                'x0': round(conf['right']-(col-1)*width, 4),
                'y0': round(conf['top']-(row-1)*height, 4),
                'x1': round(conf['right']-col*width, 4),
                'y1': round(conf['top']-row*height, 4)
            })

    # Create shapes (rectangles)
    for i, value in enumerate(coord):
        # Add rectangle
        fig.add_shape(go.layout.Shape(
            type='rect',
            fillcolor=legend_colors[i],
            line=dict(
                color=conf['line_color'],
                width=conf['line_width'],
            ),
            xref='paper',
            yref='paper',
            xanchor='right',
            yanchor='top',
            x0=coord[i]['x0'],
            y0=coord[i]['y0'],
            x1=coord[i]['x1'],
            y1=coord[i]['y1'],
        ))
    
    # Add text for first variable (once, outside the loop over the rectangles)
    fig.add_annotation(
        xref='paper',
        yref='paper',
        xanchor='left',
        yanchor='top',
        x=coord[8]['x1'],
        y=coord[8]['y1'],
        showarrow=False,
        text=conf['legend_x_label'] + ' 🠒',
        font=dict(
            color=conf['legend_font_color'],
            size=conf['legend_font_size'],
        ),
        borderpad=0,
    )

    # Add text for second variable (once, outside the loop over the rectangles)
    fig.add_annotation(
        xref='paper',
        yref='paper',
        xanchor='right',
        yanchor='bottom',
        x=coord[8]['x1'],
        y=coord[8]['y1'],
        showarrow=False,
        text=conf['legend_y_label'] + ' 🠒',
        font=dict(
            color=conf['legend_font_color'],
            size=conf['legend_font_size'],
        ),
        textangle=270,
        borderpad=0,
    )
    
    return fig


"""
Function to create the map

Arguments:
    df: The dataframe that contains all the necessary columns
    colors: List of 9 blended colors
    geojson: GeoJSON object with the geographic entities (as returned by load_geojson())
    x: Name of the column that contains values of first variable (defaults to 'x')
    y: Name of the column that contains values of second variable (defaults to 'y')
    ids: Name of the column that contains ids that connect the data to the GeoJSON (defaults to 'id')
    name: Name of the column containing the geographic entity to be displayed as a description (defaults to 'name')
    conf: Dictionary of settings (defaults to the values from conf_defaults())
"""

def create_bivariate_map(df, colors, geojson, x='x', y='y', ids='id', name='name', conf=conf_defaults()):
    
    if len(colors) != 9:
        raise ValueError('ERROR: The list of bivariate colors must have a length equal to 9.')
    
    # Recalculate values if width differs from default
    if conf['width'] != 1000:
        conf = recalc_vars(conf['width'], ['height', 'plot_title_size', 'legend_font_size', 'map_zoom'], conf)
        
    # Prepare the dataframe with the necessary information for our bivariate map
    df_plot = prepare_df(df, x, y)
    
    # Create the figure
    fig = go.Figure(go.Choroplethmapbox(
        geojson=geojson,
        locations=df_plot[ids],
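        z=df_plot['biv_bins'],
        zmin=0,  # Pin the color range to the 9 bins so that each bin index
        zmax=8,  # always maps to the same color, even if some bins are empty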
        marker_line_width=.5,
        colorscale=[
            [0/8, colors[0]],
            [1/8, colors[1]],
            [2/8, colors[2]],
            [3/8, colors[3]],
            [4/8, colors[4]],
            [5/8, colors[5]],
            [6/8, colors[6]],
            [7/8, colors[7]],
            [8/8, colors[8]],
        ],
        customdata=df_plot[[name, ids, x, y]],  # Add data to be used in hovertemplate
        hovertemplate='<br>'.join([  # Data to be displayed on hover
            '<b>%{customdata[0]}</b> (ID: %{customdata[1]})',
            conf['hover_x_label'] + ': %{customdata[2]}',
            conf['hover_y_label'] + ': %{customdata[3]}',
            '<extra></extra>',  # Remove secondary information
        ])
    ))

    # Add some more details
    fig.update_layout(
        title=dict(
            text=conf['plot_title'],
            font=dict(
                size=conf['plot_title_size'],
            ),
        ),
        mapbox_style='white-bg',
        width=conf['width'],
        height=conf['height'],
        autosize=True,
        mapbox=dict(
            center=dict(lat=conf['center_lat'], lon=conf['center_lon']),  # Set map center
            zoom=conf['map_zoom']  # Set zoom
        ),
    )

    fig.update_traces(
        marker_line_width=conf['borders_width'],  # Width of the geographic entity borders
        marker_line_color=conf['borders_color'],  # Color of the geographic entity borders
        showscale=False,  # Hide the colorscale
    )

    # Add the legend
    fig = create_legend(fig, colors, conf)
    
    return fig
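
The zoom adjustment in `recalc_vars()` is easier to follow with numbers. The sketch below isolates that one formula; `zoom_for_width` is a hypothetical helper name, and the reference width of 1000 px is the value hard-coded in the function above.

```python
import math

def zoom_for_width(new_width, zoom_at_reference, reference_width=1000):
    """Hypothetical helper: adjust a zoom level that was chosen for reference_width."""
    factor = new_width / reference_width
    # Each zoom step doubles the map scale, so the zoom shifts by log2(factor)
    return zoom_at_reference + math.log2(factor)

# A map tuned to zoom 3 at 1000 px, rendered at half and at double the width
half_width_zoom = zoom_for_width(500, 3)
double_width_zoom = zoom_for_width(2000, 3)
print(half_width_zoom, double_width_zoom)
```

Halving the width lowers the zoom by one level and doubling it raises the zoom by one level, so the same geographic extent stays visible.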

### Define lists of blended colors 

To get bivariate maps working, we need a set of nine colors that are the result of blending two main colors. Here are some examples, but others can easily be added:

1) "pink-blue" by [Joshua Stevens](http://www.joshuastevens.net/cartography/make-a-bivariate-choropleth-map/)
2) "teal-red" by [Joshua Stevens](http://www.joshuastevens.net/cartography/make-a-bivariate-choropleth-map/)
3) "blue-orange" from the [ArcGIS Website](https://pro.arcgis.com/de/pro-app/latest/help/mapping/layer-properties/bivariate-colors.htm)

![Three examples of color sets](https://raw.githubusercontent.com/yotkadata/plotly-bivariate-choropleth/main/img/colors.png)

In [None]:
# Define sets of 9 colors to be used
# Order: bottom-left, bottom-center, bottom-right, center-left, center-center, center-right, top-left, top-center, top-right
color_sets = {
    'pink-blue':   ['#e8e8e8', '#ace4e4', '#5ac8c8', '#dfb0d6', '#a5add3', '#5698b9', '#be64ac', '#8c62aa', '#3b4994'],
    'teal-red':    ['#e8e8e8', '#e4acac', '#c85a5a', '#b0d5df', '#ad9ea5', '#985356', '#64acbe', '#627f8c', '#574249'],
    'blue-orange': ['#fef1e4', '#fab186', '#f3742d', '#97d0e7', '#b0988c', '#ab5f37', '#18aee5', '#407b8f', '#5c473d']
}
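
If none of these sets fits, a palette can also be generated. The following is only a sketch of one possible approach, channel-wise multiply blending of two three-step color ramps. It is an assumption for illustration, not how the published sets above were designed, and all helper names are made up. The result uses the same order as `color_sets` (x varies fastest, y slowest).

```python
def hex_to_rgb(hex_color):
    """Convert '#rrggbb' to a tuple of three integers (0..255)."""
    hex_color = hex_color.lstrip('#')
    return tuple(int(hex_color[i:i + 2], 16) for i in (0, 2, 4))

def rgb_to_hex(rgb):
    """Convert a tuple of three integers (0..255) to '#rrggbb'."""
    return '#{:02x}{:02x}{:02x}'.format(*rgb)

def multiply_blend(color_a, color_b):
    """Blend two colors by multiplying their channels (white leaves a color unchanged)."""
    blended = tuple(
        round(a * b / 255)
        for a, b in zip(hex_to_rgb(color_a), hex_to_rgb(color_b))
    )
    return rgb_to_hex(blended)

def blend_palette(x_ramp, y_ramp):
    """Hypothetical helper: build 9 colors from two ramps of 3 colors each (low to high)."""
    return [multiply_blend(x_color, y_color) for y_color in y_ramp for x_color in x_ramp]

# Example ramps (illustrative choice): both start at white
x_ramp = ['#ffffff', '#ace4e4', '#5ac8c8']
y_ramp = ['#ffffff', '#dfb0d6', '#be64ac']
custom_set = blend_palette(x_ramp, y_ramp)
print(custom_set)
```

Because both ramps start at white, the bottom row and the left column of the matrix reproduce the input ramps, and only the four remaining cells are true blends.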

## Part II: Prepare our data

The next step is where we load our data and "connect" it to the map to actually create the map. Basically, we build a dataframe that contains at least one column with the **id** that will map the data to the geospatial data, two columns with the values of our two variables (**x** and **y**) and ideally a **name** or description of the geographic entities.

We then load the geospatial data and pass both dataframe and the geospatial data to the function `create_bivariate_map()`. If the columns of the dataframe have exactly the names mentioned above (id, x, y, name), the function will work. If not, we have to pass the column names as keyword arguments to the function (see example 2 below).
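
As a small illustration of the expected input, the sketch below builds a dataframe with hypothetical source column names (`region_code`, `region`, `var_a` and `var_b` are invented for this example, as are the values) and renames them to the defaults. Passing the original names via the `ids`, `name`, `x` and `y` keyword arguments would work equally well.

```python
import pandas as pd

# Toy data with made-up column names and values
raw = pd.DataFrame({
    'region_code': ['A', 'B', 'C'],
    'region': ['Region A', 'Region B', 'Region C'],
    'var_a': [1.0, 2.5, 4.0],
    'var_b': [10, 30, 20],
})

# Rename to the column names that create_bivariate_map() expects by default
df_example = raw.rename(columns={
    'region_code': 'id',
    'region': 'name',
    'var_a': 'x',
    'var_b': 'y',
})
print(df_example.columns.tolist())
```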

### First example: Hotspot frequency and vulnerability index by county

The workflow follows empet's example, but the data are different: a county-level table with a vulnerability index and one column per hotspot variable, shown on a map centered on California. The vulnerability index is used as the y variable, and each hotspot variable takes the role of x in turn, so we end up with one map per hotspot variable.

#### Load data

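We load the data from a CSV file and do some minor transformations: two columns are renamed to the default names `name` and `y`, and `y` is multiplied by 100 and converted to integers. The id column (`COUNTY_CODE`) is read as a string and, like the x column, is passed to the map function by name later on.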

In [None]:
"""
Get data and write it to a dataframe containing
    id: Id of the geographic entity (needs to be the same as references in the geospatial data)
    name: Name of the geographic entity to be displayed as a description
    x: Values of the first variable
    y: Values of the second variable
"""

data = pd.read_csv("hotspot_vind_county.csv", dtype = {"COUNTY_CODE": str})

In [None]:
data = data.rename(columns={
    'STCNTY': 'name', 
    '0': 'y'})

In [None]:
data['y'] = data['y'] * 100

In [None]:
data['y'] = data['y'].astype(np.int64)

#### Load geospatial data (GeoJSON)

Next we load the GeoJSON file for the geospatial representation. Its features already carry an `id` property, which the values in our `COUNTY_CODE` column need to match, so no further transformation is needed.
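
A common reason for an empty map is a mismatch between the ids in the dataframe and those in the GeoJSON. The sketch below is a hypothetical check (the helper `unmatched_ids` and the tiny inline GeoJSON are made up for illustration). It assumes each feature stores its identifier in a top-level `id` field, which is where Plotly looks by default.

```python
def unmatched_ids(geojson, ids):
    """Hypothetical helper: return the ids that have no matching GeoJSON feature."""
    feature_ids = {str(feature.get('id')) for feature in geojson['features']}
    return sorted(set(map(str, ids)) - feature_ids)

# Tiny stand-in for a real GeoJSON file (geometries omitted)
toy_geojson = {
    'type': 'FeatureCollection',
    'features': [
        {'type': 'Feature', 'id': '06001', 'properties': {}, 'geometry': None},
        {'type': 'Feature', 'id': '06003', 'properties': {}, 'geometry': None},
    ],
}

# The second id has lost its leading zero, as happens when codes are read as numbers
missing = unmatched_ids(toy_geojson, ['06001', '6003'])
print(missing)
```

The id `'6003'` is reported as unmatched, which is the kind of problem that reading the code column as a string avoids.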

In [None]:
import plotly.express as px
import matplotlib.pyplot as plt

from urllib.request import urlopen
import json

geojson_url = 'https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json'

In [None]:
geojson = load_geojson(geojson_url, local_file='ca-counties.geojson')

In [None]:
# Load conf defaults
conf = conf_defaults()

# Override some variables
#conf['plot_title'] = 'Hotspot frequency and vulnerability index'
conf['width'] = 1000  # Width of the final map container
conf['ratio'] = 0.8  # Ratio of height to width
conf['height'] = conf['width'] * conf['ratio']  # Height of the final map container
conf['center_lat'] = 37.  # Latitude of the center of the map
conf['center_lon'] = -119.  # Longitude of the center of the map
conf['map_zoom'] = 5.2  # Zoom factor of the map
#conf['hover_x_label'] = 'Heatwaves and pollution day count'  # Label to appear on hover
#conf['hover_y_label'] = 'Vulnerability index'  # Label to appear on hover

# Define settings for the legend
conf['line_width'] = 0.5  # Width of the rectangles' borders
conf['legend_x_label'] = 'More exposure days'  # x variable label for the legend
conf['legend_y_label'] = 'Higher vulnerability'  # y variable label for the legend

In [None]:
import plotting
hs_dict = plotting.get_hotspopt_dict()

In [None]:
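# Make sure the output folder exists before writing images
os.makedirs('figures', exist_ok=True)

# Create, show and save one map per hotspot variable
for x in hs_dict:
    conf['plot_title'] = hs_dict[x]["title_map"] + ' & vulnerability'
    plt2 = create_bivariate_map(
        data,
        color_sets['pink-blue'],
        geojson,
        x=x,
        ids='COUNTY_CODE',
        conf=conf,
    )
    plt2.show()
    # Static image export requires the kaleido package
    plt2.write_image("figures/map_" + x + "_svi.png")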