Last updated: 27 Jul 2018

# Exploring US Endangered Species Data with Python

This notebook retrieves data from the US Fish & Wildlife Service on endangered and threatened species listed under the Endangered Species Act. It runs a simple analysis and returns a graph of species added under the ESA over time.

In [None]:
%matplotlib inline
from bs4 import BeautifulSoup 
import pandas as pd
import re, requests
import matplotlib.pyplot as plt
plt.rcParams['font.size'] = 16
plt.rcParams['figure.figsize'] = [16, 8]

## Data source

Data are retrieved from the U.S. Fish & Wildlife Service [Environmental Conservation Online System](https://ecos.fws.gov/ecp/), which reports on Threatened & Endangered species in the United States that are protected by the Endangered Species Act (ESA). 

A number of different reports are available here: https://ecos.fws.gov/ecp/species-reports. They are mostly provided as sortable tables, viewable online.

## What question are we asking?

Looking through the available reports, there are data on how many requests (petitions) were made to list species, and on how many species were officially listed.

For the purposes of this analysis, let's ask the following question:

> *How has the rate of species being listed under the ESA changed over time?*

## Downloading listing years

We can use the `requests` and `BeautifulSoup` libraries to scrape the [U.S. Federal Endangered and Threatened Species by Calendar Year](https://ecos.fws.gov/ecp0/reports/species-listings-count-by-year-report) report to get the years that species were listed under the ESA.

In [None]:
url = 'https://ecos.fws.gov/ecp0/reports/species-listings-count-by-year-report'
r = requests.get(url)
soup = BeautifulSoup(r.text, "html5lib")
rows = soup.table.find_all('tr')
listing_years = set()
for row in rows:
    values = row.find_all('td')
    if len(values) > 0:
        listing_years.add(values[0].string)
listing_years = sorted(list(listing_years))
print('Years when species were listed: {}'.format(", ".join(listing_years)))
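
The row-scanning pattern in the cell above can be tried offline against a small HTML fragment. The sketch below is only an illustration: the table contents are made up, the helper name `extract_first_column` is hypothetical, and it uses the built-in `html.parser` backend instead of `html5lib` so that no extra parser package is needed. It also reads cell text with `get_text(strip=True)`, a choice made here to tolerate stray whitespace or nested tags.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the ECOS report; the real page layout may differ.
sample_html = """
<table>
  <tr><th>Calendar Year</th><th>Number of Listings</th></tr>
  <tr><td>1967</td><td>10</td></tr>
  <tr><td>1970</td><td>5</td></tr>
  <tr><td>1967</td><td>3</td></tr>
</table>
"""

def extract_first_column(html):
    """Return the sorted, de-duplicated text of the first <td> in each row."""
    soup = BeautifulSoup(html, "html.parser")
    values = set()
    for row in soup.table.find_all("tr"):
        cells = row.find_all("td")
        if cells:  # header rows contain only <th>, so they are skipped
            values.add(cells[0].get_text(strip=True))
    return sorted(values)

sample_years = extract_first_column(sample_html)
print(sample_years)
```

Collecting into a set before sorting mirrors the notebook's approach: a year that appears in several rows is kept only once.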

## Downloading species listed for each year

We can take the list of years we determined from the last step, and again use `requests` and `BeautifulSoup` to query the [Species Listed During Calendar Year](https://ecos.fws.gov/ecp0/reports/species-listings-by-year-report?year=2018) reports for each year in `listing_years`.

In [None]:
url = 'https://ecos.fws.gov/ecp0/reports/species-listings-by-year-report'
all_species = []
for y in listing_years:
    params = {'year': y}
    r = requests.get(url, params)
    soup = BeautifulSoup(r.text, "html5lib")
    rows = soup.table.find_all('tr')
    for row in rows:
        species = {}
        data = row.find_all('td')
        if len(data) > 0:
            splink = data[2].a['href']
            spcode = re.findall(r'spcode=(\w{4})', splink)[0]
            species['spcode'] = spcode
            species['sciname'] = data[2].string
            species['status'] = data[3].string
            species['listyear'] = y
            all_species.append(species)
print(f'{len(all_species)} species records read from Species Listed During Calendar Year')
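
The species code is pulled out of each link with a regular expression, and indexing the result with `[0]` raises an `IndexError` if a link has no `spcode` parameter. A minimal sketch of a more forgiving variant follows. The helper name `extract_spcode` and both example links are hypothetical, chosen only to illustrate the pattern.

```python
import re

# Compiled once; the raw string keeps \w from being read as a string escape.
SPCODE_PATTERN = re.compile(r'spcode=(\w{4})')

def extract_spcode(href):
    """Return the four-character species code in href, or None if absent."""
    match = SPCODE_PATTERN.search(href)
    return match.group(1) if match else None

# Hypothetical links in the style the scraper expects; not taken from ECOS.
code_found = extract_spcode('/ecp0/profile/speciesProfile?spcode=A001')
code_missing = extract_spcode('/ecp0/reports/some-other-page')
print(code_found, code_missing)
```

Returning `None` lets the caller decide whether to skip a row or log it, instead of stopping the whole scrape on one unexpected link.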

Let's take the `all_species` records and put them into a `pandas` DataFrame. There are a number of duplicate values in the DataFrame that we can also drop.

In [None]:
df_species = pd.DataFrame(all_species)
df_species = df_species[['spcode', 'sciname', 'status', 'listyear']]
df_species = df_species.drop_duplicates()
print(f'{len(df_species)} species records remaining after dropping duplicates')
df_species.head()

Let's look at the numbers of species listed in each year.

In [None]:
year_counts = df_species['listyear'].value_counts().sort_index()
plt.rcParams['figure.figsize'] = [16, 8]
year_counts.plot(kind='bar', color='orange');

Let's also look at the percentage of species in each current conservation status.

In [None]:
status_count = df_species['status'].value_counts().sort_index()
plt.rcParams['figure.figsize'] = [8, 8]
status_count.plot(kind='pie', autopct='%1.1f%%');

## Downloading petition data

One of the ways that species get listed under the ESA is by petition from the public. ECOS makes these petitions available in a nice JSON format for us.

In [None]:
url = 'https://ecos.fws.gov/ecp/report/table/petitions-received.json'
params = {'active': 'any'}
r = requests.get(url, params).json()
print('{} petition records downloaded'.format(len(r['data'])))

In [None]:
columnheaders = [x['title'] for x in r['metadata']['columns']]
print('Fields available in petition data:\n{}'.format("; ".join(columnheaders)))

Let's put the petition data into a DataFrame as well.

In [None]:
df_petitions = pd.DataFrame(r['data'], columns=columnheaders)
df_petitions.head()

The `Date Received by the FWS` field contains the date that the petition was submitted. We can add a column to the petition data that contains the year component of this date.

In [None]:
df_petitions['year_received'] = df_petitions['Date Received by the FWS'].str[-4:]
df_petitions.head()
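
Taking the last four characters works only when every date string ends in a four-digit year. As a hedged alternative, the sketch below parses the date explicitly. The `MM/DD/YYYY` format is an assumption (the report's actual format is not stated here), and `year_from_date` is a hypothetical helper name.

```python
from datetime import datetime

def year_from_date(date_string, fmt='%m/%d/%Y'):
    """Return the year as a string, or None if the value cannot be parsed.

    The default format is an assumption; adjust it to match the real data.
    """
    try:
        return str(datetime.strptime(date_string.strip(), fmt).year)
    except (ValueError, AttributeError):
        # ValueError: text does not match fmt; AttributeError: value is not a string
        return None

year_ok = year_from_date('07/27/2018')
year_bad = year_from_date('unknown')
year_none = year_from_date(None)
print(year_ok, year_bad, year_none)
```

A helper like this could be applied to the date column with `.apply(year_from_date)`, which would surface malformed dates as missing values instead of as meaningless four-character strings.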

Some petitions request that multiple species be listed under the ESA. We can find these exceptions if we look in the `Petition Title` field. We can use a function to check if the Petition Title refers to multiple species. 

In [None]:
def parseSpeciesNumber(petitionTitle):
    nspecies = re.findall(r'\d+', petitionTitle)
    if len(nspecies) > 0:
        return int(nspecies[0])
    else:
        return 1

In [None]:
df_petitions['n_species'] = df_petitions['Petition Title'].apply(parseSpeciesNumber)
# Manual overrides: reset these rows to 1 (presumably titles whose first number is not a species count)
df_petitions.at[576, 'n_species'] = 1
df_petitions.at[577, 'n_species'] = 1
df_petitions.at[582, 'n_species'] = 1
df_petitions.at[598, 'n_species'] = 1
df_petitions.at[715, 'n_species'] = 1
df_petitions.sort_values(by='n_species', ascending=False).head()
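
The first-number heuristic treats any digits in a title as a species count, which is one way a title could end up needing a manual correction. The sketch below restates the function under a hypothetical name, `parse_species_number`, and runs it on invented titles to show both a genuine count and a false positive.

```python
import re

def parse_species_number(petition_title):
    """Return the first integer found in the title, or 1 if there is none."""
    numbers = re.findall(r'\d+', petition_title)
    return int(numbers[0]) if numbers else 1

# Invented titles for illustration; they are not taken from the ECOS data.
titles = [
    'Petition to list the example beetle',        # no number, so 1
    'Petition to list 25 example snail species',  # genuine count
    'Petition to list the example bat (2010)',    # a year, wrongly read as a count
]
counts = [parse_species_number(t) for t in titles]
print(counts)
```

Sorting by `n_species` in descending order, as the cell above does, is a quick way to spot implausibly large values such as years.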

Let's look at a bar graph to see the number of species petitioned per year.

In [None]:
petitions_per_year = df_petitions[['year_received', 'n_species']].groupby('year_received').sum()
plt.rcParams['figure.figsize'] = [16, 8]
petitions_per_year.plot(kind='bar', color='blue');

## Combining species and petition data

We can merge the two DataFrames that summarize species and petitions by year.

In [None]:
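# Name the listing counts explicitly; newer pandas versions label value_counts() output 'count'
df_merged = pd.concat([petitions_per_year, year_counts.rename('listyear')], axis=1, sort=True).fillna(0)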
df_merged = df_merged.rename(index=str, columns={"listyear": "Species Added", "n_species": "Species Petitioned"})
df_merged.head()
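
To see how the column-wise `pd.concat` lines up the two yearly summaries, here is a toy version with made-up numbers. The variable names and values are illustrative only. The Series is named explicitly because the default name produced by `value_counts()` differs between pandas versions.

```python
import pandas as pd

# Made-up yearly totals; the index holds years as strings, as in the notebook.
petitioned = pd.DataFrame({'n_species': [3, 7]}, index=['1990', '1992'])
listed = pd.Series([4, 2], index=['1990', '1991'], name='listyear')

# axis=1 aligns rows on the index; years missing from one input become NaN,
# which fillna(0) then converts to zero.
toy_merged = pd.concat([petitioned, listed], axis=1, sort=True).fillna(0)
toy_merged = toy_merged.rename(
    columns={'listyear': 'Species Added', 'n_species': 'Species Petitioned'})
print(toy_merged)
```

Every year present in either input appears in the result, so a year with petitions but no listings (or the reverse) shows up with a zero in the other column.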

We can plot the merged DataFrame to see how the species added and the species petitioned compare each year.

In [None]:
df_merged.plot.bar();

## Conclusions

There seem to be a couple of broad trends we can draw from these analyses.

- Looking at the numbers of species petitioned per year, it seems that petitioning to list species under the ESA has become more common over time. There are few petitions before 1990, and what seems to be a considerably larger number after 2000.


- Looking at the species added per year, the data seem to be multimodal. Perhaps this could reflect the political climate of the time, and the executive branch's views on protecting endangered and threatened species?

And there are more analyses we could pursue, if we wanted to delve more deeply:

- Explore the actual `Petition Finding(s)` from the petition data. How successful are petitions?


- Do petitions or FWS scientists have a greater impact on getting species listed under the ESA?


- How do the types of species (plant, animal, invertebrate, etc.) affect the success of a petition, or the likelihood that a species will be listed?

Feel free to clone this notebook and do some of these further analyses, if you are interested in learning more!