# Biogeography Notebook 2

The goal of this notebook is to access and integrate diverse data sets to visualize correlations and discover patterns to address questions of species’ responses to environmental change. We will use programmatic tools to show how to use Berkeley resources such as the biodiversity data from biocollections and online databases, field stations, climate models, and other environmental data.

This notebook is a continuation of [Biogeography Notebook 1](http://datahub.berkeley.edu/user-redirect/interact?account=ds-modules&repo=IB-ESPM-105&branch=master&path=fall2019/notebook1.ipynb).

If you have any questions getting the Jupyter notebook to run, try dropping into [data peer consulting](https://data.berkeley.edu/education/data-peer-consulting).

## Table of Contents

3 - [Mapping](#mapping)

4 - [Comparing California Oak Species](#oak)

5 - [Cal-Adapt](#caladapt)

## Helpful Reminders

### Text cells
In a notebook, each rectangle containing text or code is called a *cell*.

Text cells (like this one) can be edited by double-clicking on them. They're written in a simple format called [Markdown](http://daringfireball.net/projects/markdown/syntax) to add formatting and section headings.  You don't need to learn Markdown, but you might want to.

After you edit a text cell, click the "run cell" button at the top to confirm any changes. (Try not to delete the instructions of the lab.)

**The only text cells that need to be modified are labeled "YOUR RESPONSE HERE" and are right below yellow question boxes. To edit a response, double click on YOUR RESPONSE HERE and type in your answer. Afterwards, run the cell with Shift-Enter.**

### Code cells
Other cells contain code in the Python 3 language. Running a code cell will execute all of the code it contains.

To run the code in a code cell, first click on that cell to activate it.  It'll be highlighted with a little green or blue rectangle.  Next, either press the Run button or hold down the `shift` key and press `return` or `enter`.

The only code cells that need to be modified are right below a blue exercise box.

### Comments
Comments are statements in English that the computer ignores. We use comments to explain what the surrounding code does. Comments appear in green after the `#` symbol like below:

In [None]:
1 + 2 # After you run this, you should see 3 as the output

Run this cell to set up the programming environment. It will take a few seconds.

In [None]:
%%capture
!pip install --no-cache-dir shapely
!pip install -U folium

%matplotlib inline

import pandas as pd
import folium
import json
import requests  # used below to request the reserve boundaries
from pandas import json_normalize

from shapely.geometry import Point, mapping
from shapely.geometry.polygon import Polygon
from shapely import geometry as sg, wkt
from scripts.espm_module import *
from IPython.display import display, HTML
import matplotlib.pyplot as plt
import otter
from otter.export import export_notebook

plt.style.use('seaborn')

# Part 3: Mapping  <a id='mapping'></a>

In programming, we often reuse chunks of code. So instead of copy/pasting it and repeating the same code over and over again, we have something called a **function**, which gives a name to a block of code. This allows us to just call the function instead of rewriting code we used before.

For example, this is a function that squares an input.

In [None]:
# This code creates a function named square
def square(n):
    return n * n

In [None]:
# Let's find the square of 5
square(5)

In [None]:
# Let's try it with -3
square(-3)

Our use of functions later in the notebook is more complex than this example. We will use them in order to reduce the amount of code in this notebook. For now, you can just ignore the details and structure of how functions work. Just remember that a **function** is a shortcut to easily re-run old code and that the `def` keyword means we are creating a function.

---

These functions get the species records from the API (like in Part 2). The function `get_species_records` gives us the raw/unorganized records, while `get_species_records_df` gives us the data in a **DataFrame** (table of data). They use the same commands as we used in Part 2.

In [None]:
def get_species_records(scientific_name):
    req = GBIFRequest()  # creating a request to the API
    params = {'scientificName': scientific_name}  # setting our parameters (the specific species we want)
    pages = req.get_pages(params)  # using those parameters to complete the request
    records = [rec for page in pages for rec in page['results'] if rec.get('decimalLatitude')]  # sift out valid records
    return records
    
def get_species_records_df(scientific_name):
    records = get_species_records(scientific_name) # Get the records using the function above
    records_df = json_normalize(records) # Convert the raw records into a DataFrame
    return records_df

This creates the **DataFrame** (table of data) we used in Part 2 using one of the functions we defined in the cell above. It will take a few seconds to get the records from the API.

In [None]:
argia_agrioides_df = get_species_records_df('Argia agrioides')
argia_agrioides_df.head() # Show the first 5 records
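To see what the list comprehension inside `get_species_records` is doing, here is a small sketch that uses made-up pages instead of a real API response (the IDs and coordinates are invented for illustration). It flattens the pages into one list and keeps only the records with a latitude. One detail worth knowing: because the filter tests whether the latitude is "truthy", a record sitting exactly on the equator (latitude `0.0`) would also be dropped. The second variant, which is only a suggestion and not part of the notebook's code, drops a record only when the latitude is actually missing.

```python
# Made-up pages shaped like a paged API response; real GBIF records have many more fields.
pages = [
    {'results': [
        {'gbifID': 'a1', 'decimalLatitude': 37.4, 'decimalLongitude': -122.2},
        {'gbifID': 'a2'},  # no coordinates at all
    ]},
    {'results': [
        {'gbifID': 'b1', 'decimalLatitude': 0.0, 'decimalLongitude': 10.0},  # on the equator
        {'gbifID': 'b2', 'decimalLatitude': 34.0, 'decimalLongitude': -119.7},
    ]},
]

# Same pattern as get_species_records: flatten the pages, keep records with a truthy latitude.
records = [rec for page in pages for rec in page['results'] if rec.get('decimalLatitude')]
kept_ids = [rec['gbifID'] for rec in records]

# A stricter variant (an assumption, not the notebook's code): only drop missing latitudes.
records_strict = [rec for page in pages for rec in page['results']
                  if rec.get('decimalLatitude') is not None]
strict_ids = [rec['gbifID'] for rec in records_strict]

print(kept_ids)
print(strict_ids)
```

For a species found only in California the two versions should give the same result, so the simpler filter is fine here.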

Since we are about to map all of the _Argia agrioides_ specimens by their collection, let's assign each collection a color. These colors are chosen randomly each time the cell is run, so you can re-run the cell if you don't like them.

In [None]:
color_dict, html_key = assign_colors(argia_agrioides_df, 'collectionCode')
display(HTML(html_key))
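The real `assign_colors` lives in `scripts/espm_module.py`, which isn't shown in this notebook, so the sketch below is only a guess at the general idea and not the actual implementation. The name `assign_colors_sketch` and the collection codes are made up. It gives every distinct value a random hex color and builds a small HTML key, which is roughly what the two outputs above (`color_dict` and `html_key`) appear to be.

```python
import random

def assign_colors_sketch(values):
    """Hypothetical stand-in: map each distinct value to a random hex color."""
    color_dict = {}
    for value in sorted(set(values)):
        # Pick a random number between 0x000000 and 0xFFFFFF and format it like '#1a2b3c'
        color_dict[value] = '#{:06x}'.format(random.randint(0, 0xFFFFFF))
    # Build a simple HTML legend with one colored line per value
    html_key = ''.join(
        '<span style="color:{}">{}</span><br />'.format(color, value)
        for value, color in color_dict.items()
    )
    return color_dict, html_key

collection_codes = ['COLL_A', 'COLL_B', 'COLL_A', 'COLL_C']  # made-up example codes
color_dict_sketch, html_key_sketch = assign_colors_sketch(collection_codes)
print(color_dict_sketch)
```

Because the colors are random, you'll see different values every time you run it, just like the cell above.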

Folium is a useful library for generating map visualizations. Here, we create a function that handles the Folium mapping for us.

In [None]:
# This function generates a map visualization using data from species_df and child (if a value is given)
# grouping_criteria tells Folium how to group specimens by color (ex. by collection or by species)
# child is any secondary data we want to display (ex. UC Reserve boundaries)
# Note: the colors come from color_dict, so run assign_colors before calling this function
def map_species_with_folium(species_df, grouping_criteria, child=None):
    species_map = folium.Map(location=[37.359276, -122.179626], zoom_start=5) # Creates the starting map location & zoom
    if child: # If a child is given, add it to the map
        species_map.add_child(child)
    for _, row in species_df.iterrows(): # For every specimen in the species record, do the following:
        lat, long = row['decimalLatitude'], row['decimalLongitude'] # Get the specimen latitude/longitude
        # Add the specimen to the map
        folium.CircleMarker((lat, long), color=color_dict[row[grouping_criteria]]).add_to(species_map)
    return species_map

Let's map the _Argia agrioides_ specimen distribution using the function we just created.

In [None]:
argia_agrioides_map = map_species_with_folium(argia_agrioides_df, 'collectionCode')
argia_agrioides_map
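Notice that we called `map_species_with_folium` with only two inputs even though it lists three. That works because `child=None` gives the third input a default value. Here is a toy sketch of the same idea. The function `describe_map` and its inputs are made up for illustration and have nothing to do with Folium itself.

```python
def describe_map(species_name, child=None):
    """Toy function with an optional input, mirroring `child=None` above."""
    layers = [species_name]
    if child:  # This only runs when a child was passed in
        layers.append(child)
    return layers

print(describe_map('Argia agrioides'))                           # child is left out
print(describe_map('Argia agrioides', child='reserve outlines')) # child is given
```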

---

Let's map the distribution of _Argia agrioides_ with the boundaries of UC Reserves.

To get the boundaries for all the reserves, we will need to send a request to get GeoJSON, which is a format for encoding a variety of geographic data structures. With this code, we can request GeoJSON for all reserves and plot occurrences of the species.

First, we'll assign the API URL that has the data to a new variable `url`. Then, we make the requests just like we did earlier through the GBIF. You'll see a huge mess of mostly numbers. This is a JSON of all the UC Reserves and the coordinates of their boundaries.

In [None]:
url = 'https://ecoengine.berkeley.edu/api/layers/reserves/features/'
reserves = requests.get(url, params={'page_size': 30}).json()
reserves

There are some reserves that the EcoEngine didn't catch. We'll add the information for "Blodgett", "Hopland", and "Sagehen" manually.

In [None]:
station_urls = {
    'Blodgett Reserve': 'https://raw.githubusercontent.com/BNHM/spatial-layers/master/wkt/BlodgettForestResearchStation.wkt',
    'Hopland Reserve': 'https://raw.githubusercontent.com/BNHM/spatial-layers/master/wkt/HoplandResearchAndExtensionCenter.wkt',
    'Sagehen Reserve': 'https://raw.githubusercontent.com/BNHM/spatial-layers/master/wkt/SagehenCreekFieldStation.wkt'
}
reserves['features'] += [{'type': 'Feature', 'properties': {'name': name}, 'geometry':
                          mapping(wkt.loads(requests.get(url).text))} for name, url in station_urls.items()]

This code goes through our list of reserves and outputs their names. Make sure "Blodgett", "Hopland", and "Sagehen" are included!

In [None]:
[r['properties']['name'] for r in reserves['features']]
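If the big block of numbers was hard to read, this sketch shows the same kind of structure in miniature. It is hand-written with made-up reserves and coordinates, and it assumes the response is laid out as a GeoJSON feature collection (which the `reserves['features']` lookups above suggest). Each feature has `properties` (like a name) and a `geometry` (the shape). GeoJSON lists coordinates as `[longitude, latitude]`, and a polygon's outline repeats its first point at the end to close the shape.

```python
import json

# A hand-written collection with one made-up square "reserve"
reserves_sketch = {
    'type': 'FeatureCollection',
    'features': [
        {
            'type': 'Feature',
            'properties': {'name': 'Example Reserve'},
            'geometry': {
                'type': 'Polygon',
                'coordinates': [[[-122.3, 37.3], [-122.1, 37.3], [-122.1, 37.5],
                                 [-122.3, 37.5], [-122.3, 37.3]]],
            },
        }
    ],
}

# Adding another feature works the same way as the += we used for the missing reserves
reserves_sketch['features'] += [{
    'type': 'Feature',
    'properties': {'name': 'Another Example Reserve'},
    'geometry': {'type': 'Point', 'coordinates': [-120.0, 39.0]},
}]

names = [f['properties']['name'] for f in reserves_sketch['features']]
print(names)
print(json.dumps(reserves_sketch)[:60])  # GeoJSON is ordinary JSON text once written out
```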

We can send this `geojson` directly to our mapping library `folium`. We already defined a function to do this for us, so the code is much shorter. You'll have to zoom in, but you should see blue outlined areas. Those are the reserves!

In [None]:
reserve_points = folium.features.GeoJson(reserves) # This tells Folium our reserve boundaries
argia_agrioides_and_reserves_map = map_species_with_folium(argia_agrioides_df, 'collectionCode', child=reserve_points)
argia_agrioides_and_reserves_map

**To answer the question, double click on YOUR RESPONSE HERE. Then run the cell afterwards.**

<!-- BEGIN QUESTION -->

<div class="alert alert-block alert-warning">
    <b>QUESTION 1:</b>
    <br />
    The UC Reserves are a tremendous resource for researchers and students. You can zoom in to make the reserve boundaries more visible and see the geographic characteristics of each reserve. 
    <br />
    Find one reserve where <i>A. agrioides</i> was collected. Do the characteristics of the reserve fit with what you know about the biology of <i>Argia agrioides</i> (mainly lower elevation, riparian zone)? Is there another reserve that also seems like it is a suitable habitat?
</div>



YOUR RESPONSE HERE

<!-- END QUESTION -->

---

Now that we've mapped the _Argia agrioides_ specimens, let's do that with a different species.

<!-- BEGIN QUESTION -->

<div class="alert alert-block alert-info">
    <b>EXERCISE 1:</b>
    <br />
    Pick a species and replace ... with its scientific name. Make sure to add quotation marks around the name!
</div>

**Hint:** Here's what the code looks like if we used _Argia agrioides_ again:
```
my_species_df = get_species_records_df('Argia agrioides')
my_species_df.head()
```

In [None]:
# print pdf
my_species_df = get_species_records_df('...')
my_species_df.head() # Show the first 5 records

<!-- END QUESTION -->

If the output above doesn't contain a table, that means either you didn't enter a name or the scientific name isn't in the database. Make sure you typed it correctly without abbreviating the species name. You might also have to use a different capitalization.

<!-- BEGIN QUESTION -->


<div class="alert alert-block alert-info">
    <b>EXERCISE 2:</b>
    <br />
    Assign colors to each collection by replacing ... with the name of the DataFrame we just created (my_species_df). Make sure you <b>don't</b> add quotation marks this time! Also, be careful to not accidentally delete the comma!
</div>

**Hint:** Here's what the code looks like with the `argia_agrioides_df` DataFrame:
```
color_dict, html_key = assign_colors(argia_agrioides_df, 'collectionCode')
display(HTML(html_key))
```

In [None]:
# print in pdf
color_dict, html_key = assign_colors(..., 'collectionCode')
display(HTML(html_key))

<!-- END QUESTION -->


Let's map your species with Folium!

<!-- BEGIN QUESTION -->


<div class="alert alert-block alert-info">
    <b>EXERCISE 3:</b>
    <br />
    Now let's map your species. Replace ... with the name of the DataFrame we just created (my_species_df). Make sure you <b>don't</b> add quotation marks this time! Also, be careful to not accidentally delete the comma!
</div>

**Hint:** Here's what the code looks like with the `argia_agrioides_df` DataFrame:
```
reserve_points = folium.features.GeoJson(reserves) # Adds reserve boundaries
my_species_map = map_species_with_folium(argia_agrioides_df, 'collectionCode', child=reserve_points)
my_species_map
```

In [None]:
#print out pdf
reserve_points = folium.features.GeoJson(reserves) # Adds reserve boundaries
my_species_map = map_species_with_folium(..., 'collectionCode', child=reserve_points)
my_species_map

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->



<div class="alert alert-block alert-warning">
    <b>QUESTION 2:</b>
    <br />
    Make some inferences about the biology of your mapped organism from the mapped distribution. Consider the edges of the species' range. What conditions (both biotic and abiotic) might be limiting the range?
</div>


YOUR RESPONSE HERE

<!-- END QUESTION -->

---

Now let's go back to looking at _Argia agrioides_.

We can also find out how many _Argia agrioides_ records fall within each station. First we'll have to add a column to our DataFrame that makes points out of the latitude and longitude coordinates.

In [None]:
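station_df = argia_agrioides_df.copy() # Work on a copy so the original DataFrame stays unchanged

def make_point(row):
    return Point(row['decimalLongitude'], row['decimalLatitude']) # Point takes (x, y), which is (longitude, latitude)

station_df['point'] = station_df.apply(make_point, axis=1)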

Now we can write a little function to check whether that point is in one of the stations, and if it is, we'll add that station in a new column called `station`. Then we'll apply that function to the DataFrame.

In [None]:
def in_station(reserves, row):
    reserve_polygons = []
    for r in reserves['features']:
        name, poly = r['properties']['name'], sg.shape(r['geometry'])
        reserve_polygons.append({'id': name, 'geometry': poly})
    sid = False
    for r in reserve_polygons:
        if r['geometry'].contains(row['point']):
            sid = r['id']
    return sid

station_df['station'] = station_df.apply(lambda row: in_station(reserves, row),axis=1)
in_stations_df = station_df[station_df['station'] != False]
in_stations_df.head()

Let's see if this corresponds to what we observed on the map:

In [None]:
in_stations_df.groupby(['species', 'station'])['station'].count().unstack().plot.barh(stacked=True);
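If you're curious how a computer can decide whether a point is inside a reserve, here is a sketch of one classic approach called ray casting. This is not how `shapely` implements `contains` internally, and the square below is made up. The idea is to draw an imaginary line heading east from the point and count how many edges of the polygon it crosses: an odd number of crossings means the point is inside.

```python
def point_in_polygon(lon, lat, ring):
    """Ray casting: count how many polygon edges a ray heading east from the point crosses."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # The last corner connects back to the first
        # Does this edge pass through the point's latitude?
        if (y1 > lat) != (y2 > lat):
            # Longitude where the edge crosses that latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:  # The crossing is to the east of the point
                inside = not inside
    return inside

# A made-up square "reserve", with corners given as (longitude, latitude)
square_ring = [(-122.3, 37.3), (-122.1, 37.3), (-122.1, 37.5), (-122.3, 37.5)]

print(point_in_polygon(-122.2, 37.4, square_ring))  # a point inside the square
print(point_in_polygon(-121.0, 37.4, square_ring))  # a point to the east of the square
```

This simple version doesn't handle special cases such as points lying exactly on an edge, which is one reason we rely on a library in the notebook.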

---

# Part 4: Comparing California Oak Species  <a id='oak'></a>

Instead of investigating just one species, let’s compare several different species.

Oaks are common woody plants across North America. Almost all oaks are trees, but in drier areas with poorer soils they can also be found as shrub oaks. This variety makes them a great model system to illuminate the processes of speciation, adaptation, and expression. Let’s see the distribution of four California oak species!

<table style='center'>
  <tr>
    <td style="text-align: center; vertical-align: middle;">
        <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/6/6d/Large_Blue_Oak.jpg/220px-Large_Blue_Oak.jpg" alt="Quercus douglasii" />
        <br />
        Quercus douglasii
    </td>
    <td style="text-align: center; vertical-align: middle;">
        <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/8/86/Valley_Oak_Mount_Diablo.jpg/220px-Valley_Oak_Mount_Diablo.jpg" alt="Quercus lobata" />
        <br />
        Quercus lobata
    </td>
  </tr>
  <tr>
    <td style="text-align: center; vertical-align: middle;">
        <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/b/b8/Quercusduratadurata.jpg/220px-Quercusduratadurata.jpg" alt="Quercus durata" />
        <br />
        Quercus durata
    </td>
    <td style="text-align: center; vertical-align: middle;">
        <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/d1/Quercus_agrifolia_foliage.jpg/220px-Quercus_agrifolia_foliage.jpg" alt="Quercus agrifolia" />
        <br />
        Quercus agrifolia
    </td>
  </tr>
</table>

Let's get the California oak records. This cell will take a while to run (about 30 seconds).

In [None]:
species_dfs = []
species = ['Quercus douglasii', 'Quercus lobata', 'Quercus durata', 'Quercus agrifolia']

# Here, we're getting the species record from the API for the four species of oak trees listed above
for s in species:
    species_dfs.append(get_species_records_df(s))

# Combine the data we received into one DataFrame
oak_df = pd.concat(species_dfs, axis=0, sort=True)

In [None]:
oak_df.head() # Show the first 5 rows of our data

The table above only shows us the first 5 rows. Run the cell below to see how many total records we have.

In [None]:
len(oak_df)

Let's see how those records are distributed by species.

In [None]:
# This creates a bar graph showing the distribution of species in our records
oak_df['scientificName'].value_counts().plot.barh();

We can also map these like we did with *Argia agrioides* above:

In [None]:
color_dict, html_key = assign_colors(oak_df, 'scientificName')
display(HTML(html_key))

In [None]:
oak_map = map_species_with_folium(oak_df, 'scientificName', child=folium.features.GeoJson(reserves))
oak_map


<!-- BEGIN QUESTION -->

<div class="alert alert-block alert-warning">
    <b>QUESTION 3:</b>
    <br />
    Examine the map you generated of <i>Quercus</i> spp. In some places the geographic ranges of the species overlap, and in other parts of California the ranges do not overlap. Discuss factors that create this pattern in the oak community. Include the concepts of niche and competitive exclusion.
</div>


YOUR RESPONSE HERE

<!-- END QUESTION -->


---

# Part 5: Cal-Adapt  <a id='caladapt'></a>

Let's go back to the data from _Argia agrioides_ with the GBIF API. This will take a few seconds. The output is also really long. Remember you can click the area to the left of the cell (below the red `Out[ ]`) to expand/collapse the output.

In [None]:
# Get the records in raw form (a list of dictionaries rather than a DataFrame)
argia_agrioides_records = get_species_records('Argia agrioides')
argia_agrioides_records[:5] # Show the first 5 records
# These look different from the table we saw earlier because they are the raw records

Now we will use the [Cal-Adapt](http://www.cal-adapt.org/) Web API to work with time series raster data. It will request an entire time series for any geometry and return a DataFrame for each record in all of our _Argia agrioides_ records. This cell also takes a while to run (1-3 minutes).

In [None]:
req = CalAdaptRequest()
record_geometry = [dict(rec, geometry=sg.Point(rec['decimalLongitude'], rec['decimalLatitude']))
             for rec in argia_agrioides_records]
ca_df = req.concat_features(record_geometry, 'gbifID')
ca_df.head() # Show the first five rows
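One piece of the cell above that may look unfamiliar is `dict(rec, geometry=...)`. It makes a new dictionary containing everything in `rec` plus one extra `geometry` entry, without changing the original record. Here is a sketch with a made-up record, where a plain `(longitude, latitude)` tuple stands in for the `sg.Point` so that it runs without any extra libraries.

```python
rec = {'gbifID': '123', 'decimalLatitude': 37.4, 'decimalLongitude': -122.2}  # made-up record

# Copy the record and add a 'geometry' entry; the tuple is a stand-in for sg.Point(lon, lat)
rec_with_geometry = dict(rec, geometry=(rec['decimalLongitude'], rec['decimalLatitude']))

print(rec_with_geometry)
print('geometry' in rec)  # the original record is left untouched
```

Note that longitude comes first, since a point is written as (x, y).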


<!-- BEGIN QUESTION -->

<div class="alert alert-block alert-warning">
    <b>QUESTION 4:</b>
    <br />
    What is Cal-Adapt? What can it be used for?
</div>


YOUR RESPONSE HERE

<!-- END QUESTION -->


This looks like the time series data we want for each record (the unique ID numbers are the columns). Each record has the projected temperature in Fahrenheit for 170 years (one year per row!). We can plot the projections for the first few records:

In [None]:
# Make a line plot using the first 9 columns of ca_df.
ca_df.iloc[:,:9].plot();

# Use matplotlib to title your plot.
plt.title('Argia agrioides - %s' % req.slug)

# Use matplotlib to add labels to the x and y axes of your plot.
plt.xlabel('Year', fontsize=18)
plt.ylabel('Degrees (Fahrenheit)', fontsize=16);

It looks like temperature is increasing across the board wherever these observations are occurring. We can calculate the average temperature for each year across observations in California:

In [None]:
tmax_means = ca_df.mean(axis=1)
# This is just some Pandas code to make the data prettier
pd.DataFrame(tmax_means).reset_index().rename(columns={'event':'Year', 0:'Avg Projected Temp'})

What's happening to the average temperature that *Argia agrioides* is going to experience in the coming years across California?

In [None]:
tmax_means.plot();
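In case `mean(axis=1)` felt like magic, it averages across each row, so we end up with one average per year instead of one per record. The sketch below does the same calculation by hand with made-up numbers (these are not real Cal-Adapt projections).

```python
from statistics import mean

# Made-up values: one entry per year, one temperature per record
temps_by_year = {
    2020: [70.0, 72.0, 74.0],
    2021: [71.0, 73.0, 75.0],
    2022: [72.0, 74.0, 76.0],
}

# Average across the records within each year, like ca_df.mean(axis=1)
yearly_means = {year: mean(values) for year, values in temps_by_year.items()}
print(yearly_means)
```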


<!-- BEGIN QUESTION -->

<div class="alert alert-block alert-warning">
    <b>QUESTION 5:</b>
    <br />
    Is there a temperature at which <i>Argia agrioides</i> cannot survive? Is there one at which they particularly thrive?
</div>


YOUR RESPONSE HERE

<!-- END QUESTION -->


<!-- BEGIN QUESTION -->

<div class="alert alert-block alert-warning">
    <b>QUESTION 6:</b>
    <br />
    What does this tell you about Santa Cruz Island? As time goes on and the temperature increases, might Santa Cruz Island serve as a refuge for <i>Argia agrioides</i>?
</div>


YOUR RESPONSE HERE

<!-- END QUESTION -->

---

**Make sure that you've answered questions 1-6. Also make sure you've done all 3 exercises.**

In [None]:
#right click on the link and open in new tab
from IPython.display import display, HTML
export_notebook("notebook2.ipynb", filtering = True, pagebreaks = False)
display(HTML("Download your PDF <a href='notebook2.pdf' download>here</a>."))

You are finished with this notebook! Please run the above cell to generate a download link for your submission file.

If the download link does not work, open a new tab and go to https://datahub.berkeley.edu, click the box next to `notebook2.pdf`, then click the "Download" link below the menu bar.

**Check the PDF before submitting and make sure all of your answers & code changes are shown.**

---

Notebook developed by: Michelle Koo, Nina Pak, Natalie Graham, Monica Wilkinson, Andy Sheu, Harry Li

[Data Science Modules](http://data.berkeley.edu/education/modules)