In [None]:
# Run this cell to set up your notebook
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# The Ocean Health Index

The goal of the Ocean Health Index is to give scientists a method of assessing the state of the oceans over time and making predictions for the future. Scientists, economists, and sociologists used surveys to determine the specific factors that people want and expect from the ocean. These factors were then grouped into ten categories called "goals". Each goal is scored out of 100 according to how well its benefits are maximized without compromising the ocean's ability to deliver those same benefits in the future. Read more about the Ocean Health Index here: https://oceanhealthindex.org/methodology/

In [None]:
# load the data
ohi = pd.read_csv("https://oceanhealthindex.org/data/scores.csv")
ohi

The "goal" column contains two-letter abbreviations for the ten goals described above. This website explains what each abbreviation means: https://ohi-science.org/ohiprep_v2021/Reference/methods_and_results/Supplement_Results.html

In [None]:
# Here are all of the different goal abbreviations. 
# Go to the link above to read what they stand for.
ohi['goal'].unique()

In [None]:
ohi['region_name'].unique()

## Analyzing distributions of ocean health indexes

Let's focus on data from 2021. In the cell below, isolate rows where the "scenario" column is 2021, and drop 0 and NaN values. Then, plot a histogram showing the distribution of all ocean health indexes.

In [None]:
# isolate data from the year 2021 
only2021 = ohi[ohi['scenario'] == 2021]

# remove 0 and NaN values
only2021 = only2021[only2021['value'] != 0].dropna()

# plot a histogram showing the number of values that fall within each bin
sns.histplot(data=only2021, x='value', binwidth=5, color='red')
plt.xlabel('Ocean Health Index')
plt.title('Number of Ocean Health Indexes')
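
A note on the cleaning step above: `dropna()` with no arguments removes a row when any column is missing, not only the "value" column. The sketch below uses a small invented table (the `note` column is hypothetical and not part of the OHI data) to show how that differs from restricting the check to "value".

```python
import numpy as np
import pandas as pd

# Toy stand-in for the OHI table; every number here is invented.
toy = pd.DataFrame({
    "region_name": ["A", "B", "C", "D"],
    "value": [72.0, 0.0, np.nan, 85.0],
    "note": [None, "x", "y", None],  # hypothetical extra column with gaps
})

# Drop zero scores first, as in the cell above.
nonzero = toy[toy["value"] != 0]

# dropna() with no arguments drops a row if ANY column is missing.
dropped_any = nonzero.dropna()

# subset=["value"] drops a row only when the score itself is missing.
dropped_value_only = nonzero.dropna(subset=["value"])

print(len(dropped_any), len(dropped_value_only))
```

If the real table has missing entries in columns other than "value", the two approaches would keep different rows, so it may be worth checking which one is intended.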

The histogram above shows the distribution of ocean health indexes in 2021. What do you notice?

Let's compute the mean ocean health index below. 

In [None]:
avg = np.mean(only2021['value'])
print('The average ocean health index is', avg)

As described above, each goal has its own index. Let's create a histogram showing the distribution of indexes for the clean water goal.

In [None]:
clean_water = ohi[ohi['goal'] == 'CW']
clean_water = clean_water[clean_water['scenario'] == 2021]
sns.histplot(data=clean_water, x='value', binwidth=5) 
plt.xlabel('Ocean Health Index')
plt.title('Distribution of ocean health indexes for clean water goal')

Do the same thing for a few more goals.

In [None]:
tourism = ohi[ohi['goal'] == 'TR']
tourism = tourism[tourism['scenario'] == 2021]
sns.histplot(data=tourism, x='value', binwidth=5, color='purple') 
plt.xlabel('Ocean Health Index')
plt.title('Distribution of ocean health indexes for tourism and recreation goal')

In [None]:
food = ohi[ohi['goal'] == 'FP']
food = food[food['scenario'] == 2021]
sns.histplot(data=food, x='value', binwidth=5, color='brown') 
plt.xlabel('Ocean Health Index')
plt.title('Distribution of ocean health indexes for food provision goal')
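
The three goal histograms repeat the same two filtering lines. One possible way to avoid the repetition is a small helper; `goal_values` is a hypothetical name introduced here for illustration, and the toy table stands in for `ohi` with invented numbers.

```python
import pandas as pd

def goal_values(df, goal, year):
    """Return the 'value' column for one goal abbreviation and one scenario year."""
    mask = (df["goal"] == goal) & (df["scenario"] == year)
    return df.loc[mask, "value"]

# Toy rows standing in for `ohi`; the numbers are invented.
toy = pd.DataFrame({
    "goal": ["CW", "CW", "TR", "FP", "CW"],
    "scenario": [2021, 2020, 2021, 2021, 2021],
    "value": [80.0, 78.0, 55.0, 60.0, 90.0],
})

cw_2021 = goal_values(toy, "CW", 2021)
print(cw_2021.tolist())  # [80.0, 90.0]
```

In the notebook, something like `sns.histplot(x=goal_values(ohi, 'TR', 2021), binwidth=5)` could then replace the two filtering lines in each cell.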

## Analyzing change in ocean health indexes over time

In [None]:
# make a new df with year in one column, average score in next column
vals = ohi.groupby('scenario')['value'].mean()
averages = pd.DataFrame(vals).reset_index()
averages
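
To see what the group-by step does, here is a minimal sketch on an invented table: rows are grouped by year and the "value" column is averaged within each group.

```python
import pandas as pd

# Toy rows standing in for `ohi`; the numbers are invented.
toy = pd.DataFrame({
    "scenario": [2020, 2020, 2021, 2021],
    "value": [60.0, 70.0, 62.0, 74.0],
})

# One row per year, with the mean of "value" for that year.
toy_averages = toy.groupby("scenario")["value"].mean().reset_index()
print(toy_averages)
```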

Let's visualize the change in average ocean health indexes over time. Create a line plot below.

In [None]:
sns.lineplot(data=averages, x='scenario', y='value')
plt.xlabel('Year')
plt.ylabel('Average ocean health index')
plt.title('Average ocean health indexes over time')

Now, let's calculate the change in the average index from year to year. The next cell creates a new column "differences" in the data frame averages that holds the difference between each row and the one above it (row i+1 minus row i). The first row has no row above it, so its difference is set to 0. Don't worry too much about understanding the code, just run the next cell.

In [None]:
diff = np.array([0])
for i in np.arange(len(averages)-1):
    x = averages.iloc[i+1, 1] - averages.iloc[i,1]
    diff = np.append(diff,x)
averages['differences'] = diff #create a new column in table
averages
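
For reference, pandas has a built-in `diff()` method that should give the same column as the loop above. This is a sketch on invented yearly averages, not the real OHI results.

```python
import pandas as pd

# Toy yearly averages; the numbers are invented for illustration.
toy_averages = pd.DataFrame({
    "scenario": [2012, 2013, 2014, 2015],
    "value": [60.0, 62.0, 63.0, 63.5],
})

# diff() subtracts the previous row from each row. The first row has no
# predecessor, so it comes back as NaN and is filled with 0 to match the loop.
toy_averages["differences"] = toy_averages["value"].diff().fillna(0)

print(toy_averages["differences"].tolist())  # [0.0, 2.0, 1.0, 0.5]
```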

Let's visualize the change in the differences over time below with a line graph.

In [None]:
sns.lineplot(data=averages, x='scenario', y='differences')
plt.xlabel('Year')
plt.ylabel('Change in average index')
plt.title('Average differences in scores per year')

Look at the plot you created above. Notice that the year-over-year difference in the average score decreases starting around 2013. What does that mean about the ocean health indexes? Are the indexes increasing or decreasing? Are they increasing or decreasing at the same rate over time?
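
One way to approach these questions: the sign of each difference tells you whether the average rose or fell that year, and the trend in the differences tells you whether the change is speeding up or slowing down. A toy sketch with invented differences (not the real OHI results):

```python
import numpy as np

# Invented year-over-year differences, not the real OHI results.
differences = np.array([2.0, 1.0, 0.5])

# Every change is positive, so the underlying values keep rising.
still_increasing = bool(np.all(differences > 0))

# Each change is smaller than the one before, so the rise is slowing.
slowing_down = bool(np.all(np.diff(differences) < 0))

print(still_increasing, slowing_down)  # True True
```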

## Let's look at French Polynesia

Let's look at French Polynesia by subsetting the data to rows where the "region_name" column equals "French Polynesia".

In [None]:
# isolate data for French Polynesia (all years)
FP = ohi[ohi['region_name'] == 'French Polynesia']

# remove 0 and NaN values
FP = FP[FP['value'] != 0].dropna()
FP

In [None]:
FP_score = FP[FP['dimension'] == 'score']
# build the mask from FP_score itself so its index matches the rows being filtered
FP_score_Biod = FP_score[FP_score['long_goal'] == 'Biodiversity']

FP_score_Biod

In [None]:
sns.lineplot(data=FP_score_Biod, x='scenario', y='value')
plt.xlabel('Year')
plt.ylabel('Score')
plt.title('Biodiversity Score')
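
The French Polynesia subset above is built in several steps. As a sketch of an alternative, the conditions can be combined into a single boolean mask with `&`. The table below is an invented stand-in for `ohi` that uses the same column names.

```python
import pandas as pd

# Toy rows standing in for `ohi`; the values are invented.
toy = pd.DataFrame({
    "region_name": ["French Polynesia", "French Polynesia", "Fiji", "French Polynesia"],
    "dimension": ["score", "status", "score", "score"],
    "long_goal": ["Biodiversity", "Biodiversity", "Biodiversity", "Clean Waters"],
    "scenario": [2021, 2021, 2021, 2021],
    "value": [88.0, 90.0, 75.0, 70.0],
})

# Each comparison needs its own parentheses because & binds tighter than ==.
mask = (
    (toy["region_name"] == "French Polynesia")
    & (toy["dimension"] == "score")
    & (toy["long_goal"] == "Biodiversity")
)
fp_biodiversity = toy[mask]

print(fp_biodiversity["value"].tolist())  # [88.0]
```

Building every condition from the same data frame keeps the mask aligned with the rows being filtered.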