# Habitat suitability under climate change

[Our changing climate is changing where key grassland species can live,
and grassland management and restoration practices will need to take
this into
account.](https://www.frontiersin.org/articles/10.3389/fpls.2017.00730/full)

In this coding challenge, you will create a habitat suitability model
for a species of your choice that lives in the continental United States
(CONUS). We have this limitation because the downscaled climate data we
suggest, the [MACAv2 dataset](https://www.climatologylab.org/maca.html),
is only available in the CONUS – if you find other downscaled climate
data at an appropriate resolution you are welcome to choose a different
study area. If you don’t have anything in mind, you can take a look at
*Sorghastrum nutans*, a grass native to North America. [In the past 50
years, its range has moved
northward](https://www.gbif.org/species/2704414).

Your suitability assessment will be based on combining multiple data
layers related to soil, topography, and climate. You will also need to
create a **modular, reproducible workflow** using functions and loops.
To do this effectively, we recommend planning your code out in advance
using a technique such as a pseudocode outline or a flow diagram. We
recommend breaking each of the blocks below into multiple steps. You do
not need to write a step for every line of code unless you find
that useful. As a rule of thumb, aim for steps that cover the major
structures of your code in 2-5 line chunks.
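
One way to turn such a plan into code is to write the outline as comments first, then hang small functions and a loop on it. The sketch below is only an illustration of that structure: the function names (`load_layer`, `build_suitability`), the site names, and the periods are placeholders invented for this example, and the loader returns a text label instead of real data.

```python
# Hypothetical study design; replace with your own sites and periods
SITES = ["site_a", "site_b"]
PERIODS = [(1976, 2005), (2036, 2065)]


def load_layer(site, variable, period=None):
    """Placeholder loader: return a label for the layer that would be loaded."""
    # Step 1: build a file path or URL for this site/variable/period
    # Step 2: download once, then open and clip to the site boundary
    if period is None:
        return f"{variable}:{site}"
    return f"{variable}:{site}:{period[0]}-{period[1]}"


def build_suitability(site, period):
    """Placeholder model: collect the layers that would be combined."""
    # Step 3: static layers (soil, topography) do not depend on the period
    layers = [load_layer(site, "soil"), load_layer(site, "elevation")]
    # Step 4: climate layers do depend on the period
    layers.append(load_layer(site, "climate", period))
    # Step 5: harmonize the grids and combine them into one suitability layer
    return layers


# Loop over every site/period combination instead of copying code
results = {}
for site in SITES:
    for period in PERIODS:
        results[(site, period)] = build_suitability(site, period)
```

Each numbered comment is a candidate step for your own outline; once the real loading and combining code exists, the loop at the bottom should not need to change.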

## STEP 1: STUDY OVERVIEW

Before you begin coding, you will need to design your study.

<link rel="stylesheet" type="text/css" href="./assets/styles.css"><div class="callout callout-style-default callout-titled callout-respond"><div class="callout-header"><div class="callout-icon-container"><i class="callout-icon"></i></div><div class="callout-title-container flex-fill">Reflect and Respond</div></div><div class="callout-body-container callout-body"><p>What question do you hope to answer about potential future changes in
habitat suitability?</p></div></div>

YOUR QUESTION HERE

In [14]:
%pip install pygbif




In [15]:
# load packages

# reproducible file paths
import os
from glob import glob
import pathlib

# gbif packages
# need to install pygbif
import pygbif.occurrences as occ
import pygbif.species as species
from getpass import getpass

# unzipping and handling gbif data
import zipfile
import time

# deal with spatial data
import geopandas as gpd
import xrspatial

# deal w/ other types of data
import numpy as np
import pandas as pd
import rioxarray as rxr
import rioxarray.merge as rxrm

# invalid geometries
from shapely.geometry import MultiPolygon, Polygon

# packages for visualizing
import holoviews as hv
import hvplot.pandas
import hvplot.xarray

In [16]:
# make reproducible file paths
data_dir = os.path.join(
    # home directory
    pathlib.Path.home(),

    # eda directory
    'earth-analytics',
    'data',

    # project directory
    'aspen-habitat-suitability'
)

# make the directory
os.makedirs(data_dir, exist_ok=True)

In [17]:
# set gbif dir
gbif_dir = os.path.join(data_dir, 'gbif_aspen')

In [18]:
# access gbif
reset_credentials = True

# enter gbif username, password, and email
credentials = dict(
    GBIF_USER=(input, 'GBIF username'),
    GBIF_PWD=(getpass, 'GBIF password'),
    GBIF_EMAIL=(input, 'GBIF email')
)

for env_variable, (prompt_func, prompt_text) in credentials.items():
    # delete credential from the environment if requested
    if reset_credentials and (env_variable in os.environ):
        os.environ.pop(env_variable)
    
    # ask for credential and save to environment
    if env_variable not in os.environ:
        os.environ[env_variable] = prompt_func(prompt_text)

In [19]:
# species names
species_name = 'Populus tremuloides'

# species info for gbif
species_info = species.name_lookup(species_name,
                                   rank = 'SPECIES')

# grab first result
first_result = species_info['results'][0]

# get species key
species_key = first_result['nubKey']

# check first_result
first_result['species'], species_key

('Populus tremuloides', 3040215)

In [20]:
# save species key
species_key = 3040215

In [21]:
# set a file pattern 
gbif_pattern = os.path.join(gbif_dir,
                            '*.csv')

# download it once
if not glob(gbif_pattern):
    # only submit a new query if there is no download key yet
    if 'GBIF_DOWNLOAD_KEY' not in os.environ:
        # submit query to GBIF to get all data from all years
        gbif_query = occ.download([
            f'speciesKey = {species_key}',
            'hasCoordinate = True'
        ])
        os.environ['GBIF_DOWNLOAD_KEY'] = gbif_query[0]

    download_key = os.environ['GBIF_DOWNLOAD_KEY']

    # wait for download to build, checking the status every 5 seconds
    wait = occ.download_meta(download_key)['status']
    while not wait == 'SUCCEEDED':
        time.sleep(5)
        wait = occ.download_meta(download_key)['status']

    # download the data
    download_info = occ.download_get(
        os.environ['GBIF_DOWNLOAD_KEY'],
        path = data_dir
    )

    # unzip it
    with zipfile.ZipFile(download_info['path']) as download_zip:
        download_zip.extractall(path = gbif_dir)

# find csv file path
gbif_path = glob(gbif_pattern)[0]

INFO:Your download key is 0000376-250225214225278
INFO:Download file size: 3160905 bytes
INFO:On disk at C:\Users\riede\earth-analytics\data\aspen-habitat-suitability/0000376-250225214225278.zip


gbif download citation: GBIF.org (26 February 2025) GBIF Occurrence Download  https://doi.org/10.15468/dl.s7t3zx

In [22]:
# open gbif data
gbif_df = pd.read_csv(
    gbif_path,
    delimiter = '\t'
)

# check dataframe
gbif_df.head()



| | gbifID | datasetKey | occurrenceID | kingdom | phylum | class | order | family | genus | species | ... | identifiedBy | dateIdentified | license | rightsHolder | recordedBy | typeStatus | establishmentMeans | lastInterpreted | mediaType | issue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 997428553 | 95c938a8-f762-11e1-a439-00145eb45e9a | c3221b5b-2097-4e2c-877f-db4bfcd016f1 | Plantae | Tracheophyta | Magnoliopsida | Malpighiales | Salicaceae | Populus | Populus tremuloides | ... | Keith Shaw | | CC_BY_4_0 | | Keith Shaw | | | 2025-02-13T16:11:52.062Z | | GEODETIC_DATUM_INVALID;GEODETIC_DATUM_ASSUMED_... |
| 1 | 930742206 | 0096dfc0-9925-47ef-9700-9b77814295f1 | http://bioimages.vanderbilt.edu/ind-kaufmannm/... | Plantae | Tracheophyta | Magnoliopsida | Malpighiales | Salicaceae | Populus | Populus tremuloides | ... | Maurice J. Kaufmann | 1972-01-01T00:00:00 | CC0_1_0 | | Maurice J. Kaufmann | | native | 2025-02-06T17:33:39.616Z | StillImage | |
| 2 | 930742181 | 0096dfc0-9925-47ef-9700-9b77814295f1 | http://bioimages.vanderbilt.edu/ind-kaufmannm/... | Plantae | Tracheophyta | Magnoliopsida | Malpighiales | Salicaceae | Populus | Populus tremuloides | ... | Maurice J. Kaufmann | 1972-01-01T00:00:00 | CC0_1_0 | | Maurice J. Kaufmann | | native | 2025-02-06T17:33:43.367Z | StillImage | |
| 3 | 930742153 | 0096dfc0-9925-47ef-9700-9b77814295f1 | http://bioimages.vanderbilt.edu/ind-kaufmannm/... | Plantae | Tracheophyta | Magnoliopsida | Malpighiales | Salicaceae | Populus | Populus tremuloides | ... | Maurice J. Kaufmann | 1972-01-01T00:00:00 | CC0_1_0 | | Maurice J. Kaufmann | | native | 2025-02-06T17:33:43.497Z | StillImage | |
| 4 | 930740127 | 0096dfc0-9925-47ef-9700-9b77814295f1 | http://bioimages.vanderbilt.edu/ind-baskauf/14... | Plantae | Tracheophyta | Magnoliopsida | Malpighiales | Salicaceae | Populus | Populus tremuloides | ... | Steven J. Baskauf | 2002-07-30T00:00:00 | CC0_1_0 | | Steven J. Baskauf | | native | 2025-02-06T17:33:43.275Z | StillImage | |


In [None]:
# see all gbif_df columns
gbif_df.columns

Index(['gbifID', 'datasetKey', 'occurrenceID', 'kingdom', 'phylum', 'class',
       'order', 'family', 'genus', 'species', 'infraspecificEpithet',
       'taxonRank', 'scientificName', 'verbatimScientificName',
       'verbatimScientificNameAuthorship', 'countryCode', 'locality',
       'stateProvince', 'occurrenceStatus', 'individualCount',
       'publishingOrgKey', 'decimalLatitude', 'decimalLongitude',
       'coordinateUncertaintyInMeters', 'coordinatePrecision', 'elevation',
       'elevationAccuracy', 'depth', 'depthAccuracy', 'eventDate', 'day',
       'month', 'year', 'taxonKey', 'speciesKey', 'basisOfRecord',
       'institutionCode', 'collectionCode', 'catalogNumber', 'recordNumber',
       'identifiedBy', 'dateIdentified', 'license', 'rightsHolder',
       'recordedBy', 'typeStatus', 'establishmentMeans', 'lastInterpreted',
       'mediaType', 'issue'],
      dtype='object')

In [None]:
# make gbif_df spatial
gbif_gdf = (
    gpd.GeoDataFrame(
        gbif_df,
        geometry = gpd.points_from_xy(
            # x value comes from decimalLongitude
            gbif_df.decimalLongitude,
            # y value comes from decimalLatitude
            gbif_df.decimalLatitude
        ),
        # assign a crs to the gdf
        crs = 'EPSG:4326'
    )
)

# check gbif_gdf
gbif_gdf.head()

| | gbifID | datasetKey | occurrenceID | kingdom | phylum | class | order | family | genus | species | ... | dateIdentified | license | rightsHolder | recordedBy | typeStatus | establishmentMeans | lastInterpreted | mediaType | issue | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 997428553 | 95c938a8-f762-11e1-a439-00145eb45e9a | c3221b5b-2097-4e2c-877f-db4bfcd016f1 | Plantae | Tracheophyta | Magnoliopsida | Malpighiales | Salicaceae | Populus | Populus tremuloides | ... | | CC_BY_4_0 | | Keith Shaw | | | 2025-02-13T16:11:52.062Z | | GEODETIC_DATUM_INVALID;GEODETIC_DATUM_ASSUMED_... | POINT (-113.30542 49.46786) |
| 1 | 930742206 | 0096dfc0-9925-47ef-9700-9b77814295f1 | http://bioimages.vanderbilt.edu/ind-kaufmannm/... | Plantae | Tracheophyta | Magnoliopsida | Malpighiales | Salicaceae | Populus | Populus tremuloides | ... | 1972-01-01T00:00:00 | CC0_1_0 | | Maurice J. Kaufmann | | native | 2025-02-06T17:33:39.616Z | StillImage | | POINT (-105.3479 39.74944) |
| 2 | 930742181 | 0096dfc0-9925-47ef-9700-9b77814295f1 | http://bioimages.vanderbilt.edu/ind-kaufmannm/... | Plantae | Tracheophyta | Magnoliopsida | Malpighiales | Salicaceae | Populus | Populus tremuloides | ... | 1972-01-01T00:00:00 | CC0_1_0 | | Maurice J. Kaufmann | | native | 2025-02-06T17:33:43.367Z | StillImage | | POINT (-105.3479 39.74944) |
| 3 | 930742153 | 0096dfc0-9925-47ef-9700-9b77814295f1 | http://bioimages.vanderbilt.edu/ind-kaufmannm/... | Plantae | Tracheophyta | Magnoliopsida | Malpighiales | Salicaceae | Populus | Populus tremuloides | ... | 1972-01-01T00:00:00 | CC0_1_0 | | Maurice J. Kaufmann | | native | 2025-02-06T17:33:43.497Z | StillImage | | POINT (-105.3479 39.74944) |
| 4 | 930740127 | 0096dfc0-9925-47ef-9700-9b77814295f1 | http://bioimages.vanderbilt.edu/ind-baskauf/14... | Plantae | Tracheophyta | Magnoliopsida | Malpighiales | Salicaceae | Populus | Populus tremuloides | ... | 2002-07-30T00:00:00 | CC0_1_0 | | Steven J. Baskauf | | native | 2025-02-06T17:33:43.275Z | StillImage | | POINT (-112.1184 36.04881) |


In [27]:
# plot gbif_gdf to see where the occurrences are
gbif_gdf.hvplot(
    # treat the plot as geographic and assume lat/lon coordinates
    geo = True,
    # overlay the plot on EsriImagery tiles
    tiles = 'EsriImagery',
    # set title
    title = 'Quaking Aspen (Populus tremuloides) Occurrences in GBIF',
    # set fill and line color
    fill_color = None, line_color = 'black'
)


Site determination:
* Uinta National Forest
    * [Forest Resources of the Uinta National Forest](https://www.fs.usda.gov/rm/pubs_series/forest_resources/uinta.pdf)

* Kaibab National Forest
    * [About the Forest](https://www.fs.usda.gov/main/kaibab/about-forest)
    
The forest boundary URL used below comes from this USFS dataset page: https://data-usfs.hub.arcgis.com/datasets/usfs::fs-national-forests-dataset-us-forest-service-proclaimed-forests/about

In [52]:
# set up national forest url
forest_url = ("https://apps.fs.usda.gov/arcx/rest/services/EDW/EDW_ProclaimedForestBoundaries_01/MapServer/0/query?where=1%3D1&outFields=*&geometry=&geometryType=esriGeometryEnvelope&inSR=4326&spatialRel=esriSpatialRelIntersects&outSR=4326&f=json")

# set up path to save forest data
forest_dir = os.path.join(data_dir, 'site_aspen')
os.makedirs(forest_dir, exist_ok=True)


# Join forest shapefile path
forest_path = os.path.join(forest_dir, 'S_USA.ProclaimedForestBoundaries.shp')

# Only download once
if not os.path.exists(forest_path):
    forest_gdf = gpd.read_file(forest_url)
    forest_gdf.to_file(forest_path)

# Create forest_gdf
forest_gdf = gpd.read_file(forest_path)

INFO:Created 30 records


In [53]:
# Check forest_gdf
forest_gdf

| | OBJECTID | PROCLAIMED | FORESTNAME | GIS_ACRES | SHAPE.AREA | SHAPE.LEN | geometry |
|---|---|---|---|---|---|---|---|
| 0 | 200571 | 66329010328 | Bighorn National Forest | 1112646.0 | 0.509899 | 4.943363 | POLYGON ((-107.54298 44.93779, -107.54298 44.9... |
| 1 | 200572 | 93007010328 | Gifford Pinchot National Forest | 1532173.0 | 0.72229 | 11.679336 | MULTIPOLYGON (((-122.27148 45.75119, -122.2711... |
| 2 | 200573 | 96812010328 | Manti-La Sal National Forest | 1337654.0 | 0.561732 | 9.953308 | MULTIPOLYGON (((-111.40257 39.98081, -111.4009... |
| 3 | 200574 | 96813010328 | Uinta National Forest | 961743.0 | 0.411377 | 6.874785 | MULTIPOLYGON (((-111.55131 40.59378, -111.5513... |
| 4 | 200575 | 105935010328 | Kaibab National Forest | 1601003.0 | 0.647156 | 9.140002 | MULTIPOLYGON (((-112.06366 36.8781, -112.06366... |
| 5 | 200576 | 106640010328 | Fremont National Forest | 1713917.0 | 0.760518 | 11.282002 | MULTIPOLYGON (((-120.53326 42.57082, -120.5285... |
| 6 | 200577 | 106887010328 | Mt. Baker National Forest | 1317677.0 | 0.648567 | 10.659633 | MULTIPOLYGON (((-121.75954 48.99733, -121.7512... |
| 7 | 200578 | 107266010328 | Olympic National Forest | 695868.5 | 0.337426 | 8.17974 | MULTIPOLYGON (((-123.23508 48.01368, -123.2297... |
| 8 | 200579 | 107474010328 | Wallowa National Forest | 1064857.0 | 0.496222 | 9.027734 | MULTIPOLYGON (((-117.56261 45.48274, -117.5625... |
| 9 | 200580 | 108201010328 | Wenatchee National Forest | 1963052.0 | 0.952602 | 15.7073 | MULTIPOLYGON (((-121.17501 47.19079, -121.1750... |


In [60]:
# Create individual gdf for Uinta
uinta_forest_gdf = forest_gdf[forest_gdf['FORESTNAME']=='Uinta National Forest']

# Plot Uinta National Forest
uinta_forest_gdf.hvplot(
    # treat the plot as geographic and assume lat/lon coordinates
    geo = True,
    # overlay the plot on EsriImagery tiles
    tiles = 'EsriImagery',
    # set title
    title = 'Uinta National Forest',
    # set fill and line color
    fill_color = None, line_color = 'pink',
    frame_width = 300
)

In [61]:
# Create individual gdf for Kaibab
kaibab_forest_gdf = forest_gdf[forest_gdf['FORESTNAME']=='Kaibab National Forest']

# Plot Kaibab National Forest
kaibab_forest_gdf.hvplot(
    # treat the plot as geographic and assume lat/lon coordinates
    geo = True,
    # overlay the plot on EsriImagery tiles
    tiles = 'EsriImagery',
    # set title
    title = 'Kaibab National Forest',
    # set fill and line color
    fill_color = None, line_color = 'blue',
    # set frame width
    frame_width = 300
)

In [62]:
# intersect aspen occurrence with forest_gdf
# only keep the occurrences that are in the forest_gdf shapefile
aspen_forest = gpd.overlay(gbif_gdf, forest_gdf, how = 'intersection')

In [64]:
# how many occurrences per site?
value_counts = aspen_forest['FORESTNAME'].value_counts()
value_counts

FORESTNAME
Coconino National Forest           269
Coronado National Forest           240
Kaibab National Forest             131
Uinta National Forest              129
Medicine Bow National Forest       121
Apache National Forest             114
Manti-La Sal National Forest        76
Sitgreaves National Forest          48
Wenatchee National Forest           39
Fremont National Forest             37
Tonto National Forest               32
Gifford Pinchot National Forest     31
Bighorn National Forest             29
Wallowa National Forest             25
Whitman National Forest             22
Prescott National Forest            22
Winema National Forest              19
Snoqualmie National Forest           8
Chugach National Forest              6
Mt. Baker National Forest            4
Angeles National Forest              1
Name: count, dtype: int64

In [67]:
# combine both sites into one
sites_gdf = gpd.GeoDataFrame(pd.concat([uinta_forest_gdf, kaibab_forest_gdf], ignore_index=True))
sites_gdf

| | OBJECTID | PROCLAIMED | FORESTNAME | GIS_ACRES | SHAPE.AREA | SHAPE.LEN | geometry |
|---|---|---|---|---|---|---|---|
| 0 | 200574 | 96813010328 | Uinta National Forest | 961743.024 | 0.411377 | 6.874785 | MULTIPOLYGON (((-111.55131 40.59378, -111.5513... |
| 1 | 200575 | 105935010328 | Kaibab National Forest | 1601002.978 | 0.647156 | 9.140002 | MULTIPOLYGON (((-112.06366 36.8781, -112.06366... |


In [68]:
# Plot both National Forests
sites_gdf.hvplot(
    # treat the plot as geographic and assume lat/lon coordinates
    geo = True,
    # overlay the plot on EsriImagery tiles
    tiles = 'EsriImagery',
    # set title
    title = 'Kaibab & Uinta National Forests',
    # set fill and line color
    fill_color = None, line_color = 'white',
    # set frame width
    frame_width = 400
)

### Species

<link rel="stylesheet" type="text/css" href="./assets/styles.css"><div class="callout callout-style-default callout-titled callout-task"><div class="callout-header"><div class="callout-icon-container"><i class="callout-icon"></i></div><div class="callout-title-container flex-fill">Try It</div></div><div class="callout-body-container callout-body"><p>Select the species you want to study, and research its habitat
parameters in scientific studies or other reliable sources. You will
want to look for reviews or overviews of the data, since an individual
study may not have the breadth needed for this purpose. In the US, the
Natural Resources Conservation Service (NRCS) often has helpful fact
sheets about different species. University Extension programs are also
good resources for summaries.</p>
<p>Based on your research, select soil, topographic, and climate
variables that you can use to determine if a particular location and
time period is a suitable habitat for your species.</p></div></div>
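
Once you have tolerance ranges for each variable, one common way to turn them into a suitability value is to score each variable from 0 to 1 and multiply the scores. The sketch below assumes a trapezoid-shaped score and is not necessarily the method this assignment expects; the function names and every threshold in the example are made up for illustration and are not habitat values for any real species.

```python
def fuzzy_score(value, low, optimal_low, optimal_high, high):
    """Score a value from 0 (unsuitable) to 1 (optimal) using a trapezoid.

    Values at or outside [low, high] score 0, values inside
    [optimal_low, optimal_high] score 1, and values in between
    are scored by linear interpolation.
    """
    if value <= low or value >= high:
        return 0.0
    if value < optimal_low:
        return (value - low) / (optimal_low - low)
    if value > optimal_high:
        return (high - value) / (high - optimal_high)
    return 1.0


def combine_scores(scores):
    """Multiply per-variable scores so any unsuitable variable gives 0."""
    combined = 1.0
    for score in scores:
        combined *= score
    return combined


# Made-up thresholds for a hypothetical species (NOT real habitat values)
ph_score = fuzzy_score(6.5, low=5.0, optimal_low=6.0, optimal_high=7.0, high=8.0)
slope_score = fuzzy_score(2.5, low=0.0, optimal_low=5.0, optimal_high=15.0, high=30.0)
suitability = combine_scores([ph_score, slope_score])
```

Multiplying is a deliberate design choice in this sketch: a location that fails on any one variable gets a suitability of 0 no matter how good the others are. The same arithmetic can be applied cell by cell to raster layers once they share a grid.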

<link rel="stylesheet" type="text/css" href="./assets/styles.css"><div class="callout callout-style-default callout-titled callout-respond"><div class="callout-header"><div class="callout-icon-container"><i class="callout-icon"></i></div><div class="callout-title-container flex-fill">Reflect and Respond</div></div><div class="callout-body-container callout-body"><p>Write a description of your species. What habitat is it found in?
What is its geographic range? What, if any, are conservation threats to
the species? What data will shed the most light on habitat suitability
for this species?</p></div></div>

<span style="color: purple;">

Species: *Populus tremuloides*

Common Name: Quaking Aspen

The quaking aspen has some versatility in terms of its possible habitats. It can live in 

</span>

### Sites

<link rel="stylesheet" type="text/css" href="./assets/styles.css"><div class="callout callout-style-default callout-titled callout-task"><div class="callout-header"><div class="callout-icon-container"><i class="callout-icon"></i></div><div class="callout-title-container flex-fill">Try It</div></div><div class="callout-body-container callout-body"><p>Select at least two sites to study, such as two of the U.S. National
Grasslands. You can download the <a
href="https://data.fs.usda.gov/geodata/edw/edw_resources/shp/S_USA.NationalGrassland.zip">USFS
National Grassland Units</a> and select your study sites. Generate a
site map for each location.</p>
<p>When selecting your sites, you might want to look for places that are
marginally habitable for this species, since those locations will be
most likely to show changes due to climate.</p></div></div>

<link rel="stylesheet" type="text/css" href="./assets/styles.css"><div class="callout callout-style-default callout-titled callout-respond"><div class="callout-header"><div class="callout-icon-container"><i class="callout-icon"></i></div><div class="callout-title-container flex-fill">Reflect and Respond</div></div><div class="callout-body-container callout-body"><p>Write a site description for each of your sites, or for all of your
sites as a group if you have chosen a large number of linked sites. What
differences or trends do you expect to see among your sites?</p></div></div>

YOUR SITE DESCRIPTION HERE

### Time periods

In general when studying climate, we are interested in **climate
normals**, which are typically calculated from 30 years of data so that
they reflect the climate as a whole and not a single year which may be
anomalous. So if you are interested in the climate around 2050, download
data covering at least 2035-2065.
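
As a rough illustration of the idea, the sketch below averages annual values over a window centered on a target year. The function name `climate_normal`, the `half_width` parameter, and the synthetic temperature series are all assumptions made for this example; real data would come from your downloaded climate layers.

```python
def climate_normal(annual_values, center_year, half_width=15):
    """Average annual values over the window around center_year.

    annual_values maps year -> value. With half_width=15 and
    center_year=2050 the window is 2035-2065 inclusive.
    """
    years = range(center_year - half_width, center_year + half_width + 1)
    missing = [year for year in years if year not in annual_values]
    if missing:
        raise ValueError(f"Missing years: {missing}")
    return sum(annual_values[year] for year in years) / len(years)


# Synthetic series: a steady warming trend (made-up numbers)
annual_temperature = {
    year: 10.0 + 0.02 * (year - 2000) for year in range(2006, 2100)
}
normal_2050 = climate_normal(annual_temperature, 2050)
```

Raising an error on missing years is a choice made here so that a short record cannot silently pass as a full normal.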

<link rel="stylesheet" type="text/css" href="./assets/styles.css"><div class="callout callout-style-default callout-titled callout-respond"><div class="callout-header"><div class="callout-icon-container"><i class="callout-icon"></i></div><div class="callout-title-container flex-fill">Reflect and Respond</div></div><div class="callout-body-container callout-body"><p>Select at least two 30-year time periods to compare, such as
historical and 30 years into the future. These time periods should help
you to answer your scientific question.</p></div></div>

YOUR TIME PERIODS HERE

### Climate models

There is a great deal of uncertainty among the many global climate
models available. One way to work with the variety is by using an
**ensemble** of models to try to capture that uncertainty. This also
gives you an idea of the range of possible values you might expect! To
be most efficient with your time and computing resources, you can use a
subset of all the climate models available to you. However, for each
scenario, you should attempt to include models that are:

-   Warm and wet
-   Warm and dry
-   Cold and wet
-   Cold and dry

for each of your sites.

To figure out which climate models to use, you will need to access
summary data near your sites for each of the climate models. You can do
this using the [Climate Toolbox Future Climate Scatter
tool](https://climatetoolbox.org/tool/Future-Climate-Scatter). There is
no need to write code to select your climate models, since this choice
is something that requires your judgement and only needs to be done
once.

If your question requires it, you can also choose to include multiple
climate variables, such as temperature and precipitation, and/or
multiple emissions scenarios, such as RCP4.5 and RCP8.5.
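
Every model, scenario, variable, and time period you add multiplies the number of datasets to download, so it can help to list the combinations before you start. This is a minimal sketch: the model names are placeholders standing in for the models you pick, and the scenario and variable codes are assumptions that you should check against the naming used by your climate data source.

```python
from itertools import product

# Placeholder names; substitute the models you actually selected
models = ["model_warm_wet", "model_warm_dry", "model_cold_wet", "model_cold_dry"]
# Assumed codes for emissions scenarios and climate variables
scenarios = ["rcp45", "rcp85"]
variables = ["tasmax", "pr"]
# Hypothetical 30-year period
periods = [(2036, 2065)]

# One dictionary per dataset that would need to be downloaded
runs = [
    {
        "model": model,
        "scenario": scenario,
        "variable": variable,
        "start": period[0],
        "end": period[1],
    }
    for model, scenario, variable, period in product(
        models, scenarios, variables, periods
    )
]

print(len(runs))
```

With four models, two scenarios, two variables, and one period this lists 16 downloads per site, which is a useful check on whether your design fits your time and computing resources.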

<link rel="stylesheet" type="text/css" href="./assets/styles.css"><div class="callout callout-style-default callout-titled callout-task"><div class="callout-header"><div class="callout-icon-container"><i class="callout-icon"></i></div><div class="callout-title-container flex-fill">Try It</div></div><div class="callout-body-container callout-body"><p>Choose at least 4 climate models that cover the range of possible
future climate variability at your sites. How did you choose?</p></div></div>

LIST THE CLIMATE MODELS YOU SELECTED HERE AND CITE THE CLIMATE TOOLBOX