# Mapping migration

Introduction to vector data operations

## STEP 0: Set up

To get started on this notebook, you’ll need to restore any variables
from previous notebooks to your workspace. To save time and memory, make
sure to specify which variables you want to load.

In [1]:
%store -r gbif_gdf ecoregions_gdf


### Identify the ecoregion for each observation

You can combine the ecoregions and the observations **spatially** using
a method called `.sjoin()`, which stands for spatial join.

<link rel="stylesheet" type="text/css" href="./assets/styles.css"><div class="callout callout-style-default callout-titled callout-read"><div class="callout-header"><div class="callout-icon-container"><i class="callout-icon"></i></div><div class="callout-title-container flex-fill">Read More</div></div><div class="callout-body-container callout-body"><p>Check out the <a
href="https://geopandas.org/en/stable/docs/user_guide/mergingdata.html#spatial-joins"><code>geopandas</code>
documentation on spatial joins</a> to help you figure this one out. You
can also ask your favorite LLM (large language model, such as ChatGPT).</p></div></div>

<link rel="stylesheet" type="text/css" href="./assets/styles.css"><div class="callout callout-style-default callout-titled callout-task"><div class="callout-header"><div class="callout-icon-container"><i class="callout-icon"></i></div><div class="callout-title-container flex-fill">Try It: Perform a spatial join</div></div><div class="callout-body-container callout-body"><p>Identify the correct values for the <code>how=</code> and
<code>predicate=</code> parameters of the spatial join.</p></div></div>

In [2]:
gbif_ecoregion_gdf = (
    ecoregions_gdf
    # Match the CRS of the GBIF data and the ecoregions
    .to_crs(gbif_gdf.crs)
    # Find ecoregion for each observation
    .sjoin(
        gbif_gdf,
        how='right', 
        predicate='contains')
)
gbif_ecoregion_gdf

| | index_left | OBJECTID | ECO_NAME | BIOME_NUM | BIOME_NAME | REALM | ECO_BIOME_ | NNH | ECO_ID | SHAPE_LENG | SHAPE_AREA | NNH_NAME | COLOR | COLOR_BIO | COLOR_NNH | LICENSE | gbifID | month | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 160.0 | 162.0 | Chiapas Depression dry forests | 2.0 | Tropical & Subtropical Dry Broadleaf Forests | Neotropic | NO02 | 4.0 | 528.0 | 13.574493 | 1.181290 | Nature Imperiled | #76D40A | #CCCD65 | #EE1E23 | CC-BY 4.0 | 5840232682 | 3 | POINT (-93.09082 16.75711) |
| 1 | 793.0 | 799.0 | Willamette Valley oak savanna | 8.0 | Temperate Grasslands, Savannas & Shrublands | Nearctic | NE08 | 4.0 | 403.0 | 12.142545 | 1.695309 | Nature Imperiled | #A87001 | #FEFF73 | #EE1E23 | CC-BY 4.0 | 5716113565 | 5 | POINT (-123.17495 44.07056) |
| 2 | 141.0 | 143.0 | Central Mexican matorral | 13.0 | Deserts & Xeric Shrublands | Nearctic | NE13 | 4.0 | 427.0 | 54.694349 | 5.157283 | Nature Imperiled | #CC9141 | #CC6767 | #EE1E23 | CC-BY 4.0 | 5716040825 | 2 | POINT (-99.09634 19.29198) |
| 3 | 624.0 | 629.0 | Sierra Madre de Oaxaca pine-oak forests | 3.0 | Tropical & Subtropical Coniferous Forests | Neotropic | NO03 | 2.0 | 557.0 | 24.173893 | 1.215253 | Nature Could Reach Half Protected | #00421C | #88CE66 | #7BC141 | CC-BY 4.0 | 5715916092 | 11 | POINT (-96.12075 17.01428) |
| 4 | 718.0 | 724.0 | Talamancan montane forests | 1.0 | Tropical & Subtropical Moist Broadleaf Forests | Neotropic | NO01 | 1.0 | 506.0 | 36.135686 | 1.339368 | Half Protected | #4C7300 | #38A700 | #257339 | CC-BY 4.0 | 5715897879 | 6 | POINT (-85.01513 10.71931) |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 46085 | 341.0 | 345.0 | Isthmian-Atlantic moist forests | 1.0 | Tropical & Subtropical Moist Broadleaf Forests | Neotropic | NO01 | 2.0 | 470.0 | 51.187393 | 4.816350 | Nature Could Reach Half Protected | #23DB01 | #38A700 | #7BC141 | CC-BY 4.0 | 4028988425 | 1 | POINT (-79.64917 9.07761) |
| 46086 | 804.0 | 810.0 | Yucatán moist forests | 1.0 | Tropical & Subtropical Moist Broadleaf Forests | Neotropic | NO01 | 2.0 | 519.0 | 27.741105 | 5.968650 | Nature Could Reach Half Protected | #6C9400 | #38A700 | #7BC141 | CC-BY 4.0 | 4028921985 | 1 | POINT (-86.96908 20.47673) |
| 46087 | 120.0 | 122.0 | Central American pine-oak forests | 3.0 | Tropical & Subtropical Coniferous Forests | Neotropic | NO03 | 2.0 | 553.0 | 130.831169 | 9.312385 | Nature Could Reach Half Protected | #267300 | #88CE66 | #7BC141 | CC-BY 4.0 | 4028867559 | 2 | POINT (-89.0445 13.66092) |
| 46088 | 118.0 | 120.0 | Central American dry forests | 2.0 | Tropical & Subtropical Dry Broadleaf Forests | Neotropic | NO02 | 3.0 | 527.0 | 96.749052 | 5.654780 | Nature Could Recover | #DCF63D | #CCCD65 | #F9A91B | CC-BY 4.0 | 4022369471 | 1 | POINT (-89.22574 13.54412) |
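
To see what the `how=` and `predicate=` arguments do, it can help to run the same kind of join on a few made-up shapes. The sketch below is not part of the notebook's data: the `regions` and `points` tables, their names, and their coordinates are all hypothetical, chosen so that one point falls outside every polygon.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Two toy "ecoregions" (unit squares); hypothetical data for illustration
regions = gpd.GeoDataFrame(
    {"ECO_NAME": ["West", "East"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)

# Three toy observations; the last one lies outside both squares
points = gpd.GeoDataFrame(
    {"gbifID": [101, 102, 103]},
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5), Point(5.0, 5.0)],
    crs="EPSG:4326",
)

# how="right" keeps every row of the right-hand table (the points);
# predicate="contains" pairs each polygon with the points it contains
joined = regions.sjoin(points, how="right", predicate="contains")
print(joined[["ECO_NAME", "gbifID"]])
```

With `how='right'`, the point that matches no polygon is still kept, with missing values in the ecoregion columns. That would also explain why columns such as `index_left` appear as floats in the output above, since pandas stores integer columns with missing values as floats.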


### Count the observations in each ecoregion each month

<link rel="stylesheet" type="text/css" href="./assets/styles.css"><div class="callout callout-style-default callout-titled callout-task"><div class="callout-header"><div class="callout-icon-container"><i class="callout-icon"></i></div><div class="callout-title-container flex-fill">Try It: Group observations by ecoregion</div></div><div class="callout-body-container callout-body"><ol type="1">
<li>Replace <code>columns_to_group_by</code> with a list of columns.
Keep in mind that you will end up with one row for each group – you want
to count the observations in each ecoregion by month.</li>
<li>Select only month/ecoregion combinations that have more than one
occurrence recorded, since a single occurrence could be an error.</li>
<li>Use the <code>.groupby()</code> and <code>.mean()</code> methods to
compute the mean occurrences by ecoregion and by month.</li>
<li>Run the code – it will normalize the number of occurrences by month
and ecoregion.</li>
</ol></div></div>

In [3]:
occurrence_df = (
    gbif_ecoregion_gdf
    # Select only necessary columns
    [["ECO_NAME","month","gbifID"]]
    # For each ecoregion, for each month...
    .groupby(["ECO_NAME","month"])
    # ...count the number of occurrences
    .agg(occurrences=("gbifID", "count"))
    .reset_index()
)

# Get rid of rare observations (possible misidentification?)
occurrence_df = occurrence_df[occurrence_df["occurrences"] > 1]

# Take the mean by ecoregion
mean_occurrences_by_ecoregion = (
    occurrence_df
    .groupby("ECO_NAME", as_index=False)["occurrences"]
    .mean()
    .rename(columns={"occurrences": "mean_occurrences"})
)
# Take the mean by month
mean_occurrences_by_month = (
    occurrence_df
    .groupby("month", as_index=False)["occurrences"]
    .mean()
    .rename(columns={"occurrences": "mean_occurrences"})
)
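
The same count, filter, and average pattern is easier to follow on a table small enough to check by hand. This is a minimal sketch on hypothetical data (the `toy` DataFrame and its values are invented for illustration), using the same column names as the cell above.

```python
import pandas as pd

# Hypothetical toy observations: one row per occurrence record
toy = pd.DataFrame({
    "ECO_NAME": ["A", "A", "A", "B", "B", "B"],
    "month":    [5, 5, 6, 5, 5, 5],
    "gbifID":   [1, 2, 3, 4, 5, 6],
})

# One row per (ecoregion, month) group, counting the records in each
counts = (
    toy
    .groupby(["ECO_NAME", "month"])
    .agg(occurrences=("gbifID", "count"))
    .reset_index()
)

# Drop groups with a single record
counts = counts[counts["occurrences"] > 1]

# Mean count per month across the remaining ecoregions
by_month = counts.groupby("month", as_index=False)["occurrences"].mean()
```

Here ecoregion A in month 6 has only one record, so it is dropped before the means are taken. Filtering first means the single-record groups do not pull the monthly and ecoregion means down.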

### Normalize the observations

<link rel="stylesheet" type="text/css" href="./assets/styles.css"><div class="callout callout-style-default callout-titled callout-task"><div class="callout-header"><div class="callout-icon-container"><i class="callout-icon"></i></div><div class="callout-title-container flex-fill">Try It: Normalize</div></div><div class="callout-body-container callout-body"><ol type="1">
<li>Divide occurrences by the mean occurrences by month AND the mean
occurrences by ecoregion</li>
</ol></div></div>
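
Written as a formula, the normalization divides each count by the product of the two means. The symbols are notation introduced here for illustration and do not appear in the notebook: $n_{e,m}$ is the number of occurrences in ecoregion $e$ during month $m$, $\bar{n}_{e}$ is the mean count for that ecoregion, and $\bar{n}_{m}$ is the mean count for that month.

```latex
\[
  \text{norm}_{e,m} = \frac{n_{e,m}}{\bar{n}_{e} \, \bar{n}_{m}}
\]
```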

In [4]:
# Merge means onto the counts
occurrence_df = (
    occurrence_df
    .merge(
        mean_occurrences_by_month.rename(columns={"mean_occurrences": "mean_by_month"}),
        on="month", how="left"
    )
    .merge(
        mean_occurrences_by_ecoregion.rename(columns={"mean_occurrences": "mean_by_ecoregion"}),
        on="ECO_NAME", how="left"
    )
)

# Normalize by space (ecoregion) and time (month)
occurrence_df["norm_occurrences"] = (
    occurrence_df["occurrences"] /
    (occurrence_df["mean_by_month"] * occurrence_df["mean_by_ecoregion"])
)

occurrence_df.head()

| | ECO_NAME | month | occurrences | mean_by_month | mean_by_ecoregion | norm_occurrences |
|---|---|---|---|---|---|---|
| 0 | Alaska-St. Elias Range tundra | 6 | 3 | 112.854167 | 3.0 | 0.008861 |
| 1 | Arizona Mountains forests | 5 | 4 | 150.639344 | 13.0 | 0.002043 |
| 2 | Arizona Mountains forests | 9 | 29 | 103.4 | 13.0 | 0.021574 |
| 3 | Arizona Mountains forests | 10 | 6 | 43.714286 | 13.0 | 0.010558 |
| 4 | Bajío dry forests | 9 | 2 | 103.4 | 2.5 | 0.007737 |
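
As a quick sanity check, the normalized values can be recomputed by hand from the other columns. The sketch below copies the first two rows of the output above; the `normalize` helper is a hypothetical name used only for this check.

```python
# Values copied from the first two rows of the output table above
rows = [
    # (occurrences, mean_by_month, mean_by_ecoregion, norm_occurrences)
    (3, 112.854167, 3.0, 0.008861),
    (4, 150.639344, 13.0, 0.002043),
]


def normalize(occurrences, mean_by_month, mean_by_ecoregion):
    """Divide a count by the product of its month and ecoregion means."""
    return occurrences / (mean_by_month * mean_by_ecoregion)


for occ, m_month, m_eco, shown in rows:
    value = normalize(occ, m_month, m_eco)
    # The table shows six decimal places, so allow for display rounding
    assert abs(value - shown) < 1e-6
```

For the first row, 3 / (112.854167 × 3.0) is about 0.008861, which matches the `norm_occurrences` column.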


<link rel="stylesheet" type="text/css" href="./assets/styles.css"><div class="callout callout-style-default callout-titled callout-task"><div class="callout-header"><div class="callout-icon-container"><i class="callout-icon"></i></div><div class="callout-title-container flex-fill">Try It</div></div><div class="callout-body-container callout-body"><p>Make sure to store the new version of your <code>DataFrame</code> for
other notebooks!</p>
<div id="f13606e9" class="cell" data-execution_count="9">
<div class="sourceCode" id="cb1"><pre
class="sourceCode python cell-code"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="op">%</span>store occurrence_df</span></code></pre></div>
</div></div></div>

## STEP -1: Wrap up

Don’t forget to store your variables so you can use them in other
notebooks! Replace `var1` and `var2` with the variables you want to save,
separated by spaces.

In [5]:
%store occurrence_df

Stored 'occurrence_df' (DataFrame)


Finally, be sure to `Restart` and `Run all` to make sure your notebook
works all the way through!