<a target="_blank" href="https://colab.research.google.com/github/GP115/Labs/blob/main/notebooks/pulse_assignment_5.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# ***Assignment 5: Taking the Pulse of the Oceans & Fishing Activity***

Written by Shin Nakayama (shinn1@stanford.edu), Fio Micheli (micheli@stanford.edu)

[Stanford Center for Ocean Solutions](https://oceansolutions.stanford.edu), Doerr School of Sustainability

Modified by Flora Huang (flora221@stanford.edu)


This assignment is due on Thursday 2024-02-08, before class. Change the name of your notebook to `tpp_assignment_5_sunetID.ipynb`. Share your completed notebook with the TAs (akroo@stanford.edu and flora221@stanford.edu) using the share banner at the top. For help submitting, see the Canvas walkthrough. If you are still having technical difficulties, email us before the deadline.

## **INTRODUCTION TO THE ASSIGNMENT**
Fisheries play a vital role in our food security, economy, and ecosystems, while also having significant impacts on the oceans. Industrialization of fisheries has allowed us to catch more fish with engine-powered fishing fleets and gear. Non-selective fishing gear has led to bycatch of unwanted and endangered species, while bottom trawlers have destroyed coral reefs by dragging nets over the seabed. Currently, 35% of fish stocks are overfished, and one study predicts that there will be no fish in the ocean by 2048, calling for better fisheries management.

## **DATA SETS**
Data sets used in this assignment are: 1) global fishing effort, 2) regional fishing effort, 3) ocean chlorophyll a and sea surface temperature, and 4) a shapefile of exclusive economic zones (EEZ).  
1. `fishing_hours.parquet` &mdash; Pre-processed gridded data of daily fishing hours from 2016-01-01 to 2021-12-31, summed within 1 × 1-degree grid cells. The original data are composed of the location, timestamp, and fishing hours associated with each AIS signal. Data taken from [Global Fishing Watch](https://globalfishingwatch.org).
    - `date`: year-month-day
    - `lat`: latitude
    - `lon`: longitude
    - `gear`: fishing gear type (trawlers, purse seines, etc.)
    - `flag`: vessel flag as a three-letter ISO code (that is, the State in which a vessel is registered)
    - `fishing_hours`: total fishing hours in a 1 x 1-degree grid
2. `fishing_hours_x.parquet` &mdash; Pre-processed gridded data of daily fishing hours from 2016-01-01 to 2021-12-31, summed within 0.1 × 0.1-degree grid cells, for specific regions.
    - `fishing_hours_peru.parquet`: Near Peru with longitude [-100, -60] and latitude [-30, 10]
    - `fishing_hours_w_africa.parquet`: Near West Africa with longitude [-40, 0] and latitude [-10, 30]
    - `fishing_hours_pacific.parquet`: In the Pacific with longitude [140, 180] and latitude [-35, 5]
3. `modis.parquet` &mdash; Pre-processed gridded data of monthly mean measurements from the MODIS AQUA sensor from 2016-01 to 2021-12, averaged within 1 × 1-degree grid cells. Data taken from [NASA MODIS Terra satellite imagery](https://developers.google.com/earth-engine/datasets/catalog/NASA_OCEANDATA_MODIS-Terra_L3SMI).
    - `month`: year-month
    - `sst`: sea surface temperature (°C)
    - `chlor_a`: chlorophyll a concentration (mg/m<sup>3</sup>)
4. `World_EEZ_v12` &mdash; shapefile of Exclusive Economic Zones (EEZ). Data taken from [MarineRegions.org](https://www.marineregions.org).


## **TOOLBOX**
All the Python functions and packages you will use in this assignment are in the toolbox for the course. We add new tools to the toolbox with each assignment as new ways of analyzing and visualizing data are introduced.

<a target="_blank" href="https://colab.research.google.com/github/GP115/Labs/blob/main/notebooks/pulse_toolbox.ipynb">
  <img src="https://github.com/GP115/labs/blob/main/toolbox.png?raw=true" alt="Open In Colab" width="80"/>
</a>
  
This week you will be working with both gridded data and dataframes. Try leveraging what we've been learning in the past few weeks!

- `numpy` (Numerical Python) is a package for numerical arrays and mathematical operations.
- `pandas` (referred to as `pd` in code) is a package for working efficiently with tabular datasets, which it stores as dataframes.
- `matplotlib` is a package for generating plots.
- `xarray` is a package that builds on numpy and pandas to support labeled, gridded (multi-dimensional) data.
- `cartopy` is a package for geospatial data processing, used to produce maps and other geospatial analyses.
- `datetime` is a module that represents years, months, days, and times as single date/time objects.

## **THE LEARNING GOALS FOR THE WEEK**

Students will:
- learn about the ways in which climate change and human activity are impacting planet Earth, *with a focus this week on how fishing activity responds to climate.*

- become familiar with the wide range of sensors available to study various components of the Earth system. These include sensors on satellites, aircraft, and ground-based platforms, as well as sensors deployed above or beneath the surface on land or water. *This week we will use vessel-borne GPS positions broadcast through the Automatic Identification System (AIS) for fishing effort and MODIS satellite imagery data for ocean productivity and temperature.*

- become familiar with the basic physical principles (resolution, sampling, processing workflows, etc.) common to all sensors. *This week we will use data with different sampling frequencies and resolutions.*

- work with various sources of data, learning how to access, analyze, synthesize, and describe the data to quantify trends; think critically and creatively about how to project these trends into the future. *This week we will describe the spatial and temporal patterns of the data.*

- describe the complex interactions between human activity and various components of the Earth system. *This week we will explore the potential drivers of fishing activities by overlaying different datasets on the same plots and with statistics such as linear regression and cross-correlation.*

- become motivated to think about new sensors and new ways of using sensor data to study the planet. This is always the last question in each assignment.

# **In Class Assignment**

## 1) **Install and Import Packages**: numpy, pandas, matplotlib, xarray, cartopy
Complete this in the code cell below.

In [None]:
!pip install cartopy

In [None]:
import numpy as np
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.colors import LogNorm
import xarray as xr
import cartopy.crs as ccrs
import cartopy.feature as cf

## 2) **Importing required data**
Run the following code cells to load the data into a variable called fish.

In [None]:
!git clone https://premonition.stanford.edu/sgkang09/taking_the_pulse_ocean_data.git

In [None]:
! git clone https://premonition.stanford.edu/taking-the-pulse-of-the-planet/world-eez-v12

In [None]:
# load data
fish = pd.read_parquet('./taking_the_pulse_ocean_data/fishing_hours.parquet')

In [None]:
# convert string to datetime object -- this is data cleanup that we do for you
fish['date'] = pd.to_datetime(fish['date'], format='%Y-%m-%d')
fish

## 3) **Understanding Global Patterns of Fishing Activity**

Fishing activity can be inferred from the Automatic Identification System (AIS). AIS was originally designed to avoid collisions at sea, and the International Maritime Organization requires distant-water fishing fleets to transmit their positions at sea. From their movement patterns, we can infer whether a vessel is fishing or not using a machine-learning approach (see [this paper](https://journals.plos.org/plosone/article?id=10.1371%2Fjournal.pone.0158248) for details). The AIS data are processed and curated by [Global Fishing Watch](https://globalfishingwatch.org), a non-profit organization with a mission of transparency for sustainable fisheries.

### **a)** Observe the patterns of global fishing hours by grouping by date and using ```.sum()``` to add up the total hours recorded each day. Then, plot it out!
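
As a minimal sketch of the group-and-sum pattern described here, the example below uses a small synthetic table in place of `fish` (the values are invented for illustration; only the column names `date` and `fishing_hours` come from the data description).

```python
import pandas as pd

# Synthetic stand-in for the `fish` table: two grid cells per day over three days.
toy = pd.DataFrame({
    "date": pd.to_datetime(["2016-01-01", "2016-01-01",
                            "2016-01-02", "2016-01-02",
                            "2016-01-03", "2016-01-03"]),
    "fishing_hours": [1.0, 2.0, 0.5, 0.5, 4.0, 1.0],
})

# One row per date, holding the sum of all hours recorded on that day.
daily_total = toy.groupby("date")["fishing_hours"].sum()
print(daily_total)
```

The result is a Series indexed by date, so calling `daily_total.plot()` would draw it as a time series.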

### **b)** What temporal pattern do you notice in your plot? What do you think causes these patterns? (1-2 sentences)

### **c)** Plot a map of fishing hours and observe the spatial patterns.
The work of standardizing fishing hours is done for you already. For your workflow, you may want to think about regrouping your fish data in a way that looks at fishing_hours_km2 spatially. Once you've done so, try manipulating your data into an xarray gridded data format to plot!
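
The standardization divides fishing hours by an approximate grid-cell area. The sketch below shows that approximation on its own, assuming one degree is about 111 km and that east-west distances shrink with the cosine of latitude; the function name `cell_area_km2` and its `cell_size_deg` parameter are hypothetical helpers, not part of the toolbox.

```python
import math

KM_PER_DEGREE = 111.0  # approximate length of one degree of latitude


def cell_area_km2(lat_deg: float, cell_size_deg: float = 1.0) -> float:
    """Approximate area of a square grid cell located at latitude lat_deg."""
    side_km = KM_PER_DEGREE * cell_size_deg
    # The east-west side shrinks by cos(latitude); the north-south side does not.
    return side_km * side_km * math.cos(math.radians(lat_deg))


equator = cell_area_km2(0.0)   # 111 * 111 km^2
polar = cell_area_km2(-78.0)   # the same value as area_km2 in the rows at lat = -78
```

A cell at 78 degrees south covers roughly a fifth of the area of a cell at the equator, which is why raw hours per cell are not directly comparable across latitudes.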

As a bit of optional work: try looking around the notebook and seeing if there's any way to make your plot look nicer -- think about figure sizes or adding in the continents.
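
One possible way to move from a long table to a grid is sketched below on synthetic data (the numbers are invented). It assumes the `xarray` package is installed, since pandas relies on it for `Series.to_xarray()`.

```python
import pandas as pd
import xarray as xr

# Synthetic long-format table: one row per observation in a (lat, lon) cell.
toy = pd.DataFrame({
    "lat": [0, 1, 1, 1],
    "lon": [10, 10, 11, 11],
    "fishing_hours_km2": [0.1, 0.3, 0.4, 0.6],
})

# Sum within each cell, then turn the (lat, lon) index into grid dimensions.
cell_totals = toy.groupby(["lat", "lon"])["fishing_hours_km2"].sum()
grid = cell_totals.to_xarray()

print(grid)
```

Cells with no rows in the table, such as (lat=0, lon=11) here, come out as NaN in the grid. The resulting `DataArray` has a `.plot()` method, which can draw onto a cartopy axis in the same way as the Peru map later in this notebook.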

In [None]:
fish['area_km2'] = 111 * 111 * np.cos(fish['lat'] * np.pi / 180)
fish

| | date | lat | lon | gear | flag | fishing_hours | area_km2 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2016-01-01 | -78 | -178 | set_longlines | RUS | 16.628333 | 2561.679943 |
| 1 | 2016-01-01 | -78 | -177 | set_longlines | RUS | 3.347222 | 2561.679943 |
| 2 | 2016-01-01 | -74 | -118 | set_longlines | UKR | 17.537222 | 3396.127861 |
| 3 | 2016-01-01 | -74 | -117 | set_longlines | UKR | 3.983611 | 3396.127861 |
| 4 | 2016-01-01 | -73 | -180 | set_longlines | RUS | 21.068056 | 3602.311774 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 10891139 | 2021-12-31 | 77 | 13 | trawlers | RUS | 4.600833 | 2771.621941 |
| 10891140 | 2021-12-31 | 78 | 8 | trawlers | RUS | 2.299722 | 2561.679943 |
| 10891141 | 2021-12-31 | 78 | 9 | trawlers | RUS | 4.818611 | 2561.679943 |
| 10891142 | 2021-12-31 | 79 | 8 | trawlers | RUS | 55.911568 | 2350.957632 |


In [None]:
# standardize fishing hours to unit area
# 1 degree is roughly 111 km at the equator and smaller at higher latitudes
fish['area_km2'] = 111 * 111 * np.cos(fish['lat'] * np.pi / 180)
#fish['area_km2'] = fish['area_km2'].astype(float) #convert numpy float to pandas float
fish['fishing_hours_km2'] = fish['fishing_hours']/fish['area_km2']
fish

| | date | lat | lon | gear | flag | fishing_hours | area_km2 | fishing_hours_km2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2016-01-01 | -78 | -178 | set_longlines | RUS | 16.628333 | 2561.679943 | 0.006491 |
| 1 | 2016-01-01 | -78 | -177 | set_longlines | RUS | 3.347222 | 2561.679943 | 0.001307 |
| 2 | 2016-01-01 | -74 | -118 | set_longlines | UKR | 17.537222 | 3396.127861 | 0.005164 |
| 3 | 2016-01-01 | -74 | -117 | set_longlines | UKR | 3.983611 | 3396.127861 | 0.001173 |
| 4 | 2016-01-01 | -73 | -180 | set_longlines | RUS | 21.068056 | 3602.311774 | 0.005848 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 10891139 | 2021-12-31 | 77 | 13 | trawlers | RUS | 4.600833 | 2771.621941 | 0.00166 |
| 10891140 | 2021-12-31 | 78 | 8 | trawlers | RUS | 2.299722 | 2561.679943 | 0.000898 |
| 10891141 | 2021-12-31 | 78 | 9 | trawlers | RUS | 4.818611 | 2561.679943 | 0.001881 |
| 10891142 | 2021-12-31 | 79 | 8 | trawlers | RUS | 55.911568 | 2350.957632 | 0.023782 |


Using the fish dataset, group fishing_hours_km2 spatially, summing up similar to what you did in part a.

In [None]:
fishing_spatial = fish.groupby(['lat', 'lon'])['fishing_hours_km2'].sum()
fishing_spatial

### **d)** Modify the scale of the map to offer more visual depth. Does a linear colorscale offer a good representation of the fishing hours? (1-2 sentences)

### **e)** Using the toolbox, replot the fishing hours map using a logarithmic scale.
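
To see what a logarithmic color scale does to the numbers before plotting anything, the sketch below compares matplotlib's `Normalize` (linear) and `LogNorm` on invented values spanning several orders of magnitude. It only illustrates the normalization step and is not the toolbox function itself.

```python
import numpy as np
from matplotlib.colors import LogNorm, Normalize

# Invented values spanning five orders of magnitude.
values = np.array([0.0001, 0.001, 0.01, 0.1, 1.0, 10.0])

# Both norms map data onto the 0..1 range that a colormap consumes.
linear = np.asarray(Normalize(vmin=values.min(), vmax=values.max())(values))
log = np.asarray(LogNorm(vmin=values.min(), vmax=values.max())(values))

print("linear:", np.round(linear, 4))
print("log:   ", np.round(log, 4))
```

On the linear scale, every value except the largest lands in the bottom tenth of the color range, while the log scale spaces each factor of ten evenly. Passing `norm=LogNorm(...)` to a plotting call applies the same mapping to the colors.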

### **f)** In what cases would you use a linear scale versus a logarithmic scale for plots? (1-2 sentences)

### **g)** What do you observe about the spatial distribution of fishing hours? (3-5 sentences)




### **h)** Finally, we can take a look at this through a heatmap.

No need for coding in this section, just run the cells below.
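
For intuition about the long-to-wide step used for the heatmap, here is a minimal sketch on a synthetic table (values invented), assuming the same column names as `fish`.

```python
import pandas as pd

# Synthetic long-format table: several rows can share a (lat, date) pair.
toy = pd.DataFrame({
    "date": pd.to_datetime(["2016-01-01", "2016-01-01", "2016-01-01", "2016-01-02"]),
    "lat": [0, 0, 1, 1],
    "fishing_hours_km2": [0.5, 0.25, 1.0, 2.0],
})

# Rows become latitudes, columns become dates, and duplicates are summed.
wide = toy.pivot_table(index="lat", columns="date",
                       values="fishing_hours_km2", aggfunc="sum")

# Reindexing adds latitudes that have no data as rows of NaN.
wide = wide.reindex(range(-1, 3))
print(wide)
```

Latitude and date combinations with no records stay NaN, which a heatmap typically leaves blank.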


In [None]:
# transform from a long to wide format
# rows are latitude, columns are days
pivoted_data = fish.pivot_table(columns='date', index='lat', values='fishing_hours_km2', aggfunc='sum')
# expand the index to every integer latitude from -90 to 89 (this also sorts rows in ascending order)
pivoted_data = pivoted_data.reindex(np.arange(-90, 90, 1))
pivoted_data

In [None]:
# plot heatmap
fig, ax = plt.subplots(figsize=(16, 6))
date = list(pivoted_data.columns)
lat = list(pivoted_data.index)
im = ax.pcolormesh(date, lat, pivoted_data.values, cmap='magma_r')
ax.set_xlabel('Year')
ax.set_ylabel('Latitude')
fmt_month = mdates.MonthLocator()
ax.xaxis.set_minor_locator(fmt_month)
ax.grid(True, which='both', alpha=0.2)
fig.colorbar(im, ax=ax);

### **i)** What do you observe about the spatial and temporal distribution of fishing hours, especially compared to the maps? Is one clearer than the other? (1-2 sentences)


## 4) **Exploring the Biological Drivers of Underlying Fishing Effort**

Considering that humans are the top fish predators equipped with highly-advanced sensors, fishing activity should be strongly correlated with the fish distribution. However, we lack global data on fish distribution to support this. Therefore, we will look at a known correlate of fish abundance &mdash; primary production, which can be measured as chlorophyll a, a pigment in phytoplankton used for photosynthesis.

In this course, we will use NASA MODIS satellite imagery, pre-downloaded and pre-processed. MODIS (Moderate Resolution Imaging Spectroradiometer) is a sensor carried on NASA's Terra and Aqua satellites. It measures reflected and emitted radiation (ocean color and temperature) in 36 spectral bands at spatial resolutions ranging from 250 m to 1 km, depending on the band. Its swath covers the entire Earth every 1-2 days. We will use MODIS AQUA data.


#### **a)** Setting up the MODIS DataFrame
No code needed in this section, feel free to just run the below cells to set up our data, which is stored in the variable modis.

In [None]:
# read ocean climate data
modis = pd.read_parquet('./taking_the_pulse_ocean_data/modis.parquet')

In [None]:
# convert month string to datetime
modis['year_month'] = pd.to_datetime(modis['month'], format='%Y-%m')
modis

#### **b)** Plot out the average spatial distribution of chlorophyll a
You may use either a linear or logarithmic plot -- whatever you think conveys the data best!

#### **c)** What patterns do you notice in this map? How does this compare to the spatial distribution of fishing hours in part 3c? (1-2 sentences)

#### **d)** Plot mean chlorophyll a (x-axis) and fishing hours (y-axis) for each latitude

Merge the data of chlorophyll a and fishing hours on each latitude.
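
A minimal sketch of a per-latitude merge is shown below on synthetic tables (values invented). It assumes chlorophyll is averaged and fishing effort is summed within each latitude, and it uses an inner join so only latitudes present in both tables survive; other choices are equally defensible.

```python
import pandas as pd

# Synthetic stand-ins for the modis and fish tables.
chl = pd.DataFrame({"lat": [0, 0, 1, 1, 2],
                    "chlor_a": [0.2, 0.4, 1.0, 3.0, 0.5]})
effort = pd.DataFrame({"lat": [0, 1, 1, 3],
                       "fishing_hours_km2": [0.1, 0.2, 0.3, 0.9]})

# Collapse each table to one row per latitude.
chl_by_lat = chl.groupby("lat", as_index=False)["chlor_a"].mean()
effort_by_lat = effort.groupby("lat", as_index=False)["fishing_hours_km2"].sum()

# Keep only latitudes that appear in both tables.
merged = chl_by_lat.merge(effort_by_lat, on="lat", how="inner")
print(merged)
```

Each row of `merged` then holds one x value (`chlor_a`) and one y value (`fishing_hours_km2`), which is the shape a scatter plot needs.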

#### **e)** Is the plot you created above a good demonstration of the relationship between chlorophyll a and fishing hours? (1-2 sentences)

#### **f)** Plot time series of chlorophyll a and fishing hours, respectively, and observe how monthly fishing hours are correlated to chlorophyll.

The fishing hours plot is given to you, because we have to further clean the data to put it in year-month format. However, you will have to plot the modis chlorophyll a data on your own. For modis, use 'year_month' for your time column.

In [None]:
# time series
# monthly total fishing hours, to match the temporal resolution of the modis data
fish['year'] = fish['date'].dt.year
fish['month'] = fish['date'].dt.month
fish['year_month'] = pd.to_datetime(fish['year']*100 + fish['month'], format='%Y%m')

fish_monthly = fish[['year_month', 'fishing_hours_km2']].groupby('year_month').sum()
fish_monthly.plot()

#### **g)** Between these two plots, do you observe any temporal patterns? Why do you think they might be related or not related? (3-5 sentences)
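
If you want to go beyond comparing the plots by eye, one option is a lagged correlation. The sketch below uses two synthetic monthly series, where the second is a copy of the first delayed by three months, to show the mechanics; `lagged_corr` is a hypothetical helper and the series are not real chlorophyll or fishing data.

```python
import numpy as np
import pandas as pd

months = pd.date_range("2016-01-01", periods=36, freq="MS")
phase = 2 * np.pi * np.arange(36) / 12

# Synthetic annual cycle, and the same cycle arriving three months later.
chl = pd.Series(np.sin(phase), index=months)
effort = pd.Series(np.sin(phase - 2 * np.pi * 3 / 12), index=months)


def lagged_corr(x, y, max_lag):
    """Pearson correlation between x(t) and y(t + lag) for lag = 0..max_lag."""
    # shift(-lag) moves later values of y back so they line up with earlier x.
    return {lag: x.corr(y.shift(-lag)) for lag in range(max_lag + 1)}


corr_by_lag = lagged_corr(chl, effort, max_lag=6)
best_lag = max(corr_by_lag, key=corr_by_lag.get)
print(best_lag, round(corr_by_lag[best_lag], 3))
```

The lag with the highest correlation suggests how many months one series trails the other. A strong correlation at some lag does not by itself show that one series drives the other.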

# After Class Assignment


## 1) **Exploring Non-biological drivers of Underlying Fishing Effort**

Fishing activities are also shaped by other factors, such as holidays, COVID, and fuel prices. Also, fishing regulations should influence the activity. How strongly do they influence fishing activities?

### **a)** Plot time series of fishing hours for Chinese-flagged vessels and non-Chinese-flagged vessels, respectively, and explore how fishing activities are influenced by social / cultural events.

The dataset of Chinese-flagged vessels, named `chinese_vessels`, and the dataset of non-Chinese-flagged vessels, named `other_vessels`, are given to you in the code cell below.
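
As an aside, the same comparison can be sketched without creating two separate tables, by labelling each row and grouping on month and label. The example below uses synthetic rows (values invented) and assumes that rows with a missing flag belong in the "other" group.

```python
import pandas as pd

# Synthetic rows; one flag is missing on purpose.
toy = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-15", "2020-01-20", "2020-02-10",
                            "2020-02-11", "2020-02-12"]),
    "flag": ["CHN", "ESP", "CHN", None, "ESP"],
    "fishing_hours": [5.0, 2.0, 1.0, 3.0, 4.0],
})

# Label each row; a missing flag compares as not equal to "CHN".
group = toy["flag"].eq("CHN").map({True: "CHN", False: "other"})

# Monthly totals per group, with one column per group.
monthly = (
    toy.assign(group=group)
       .groupby([pd.Grouper(key="date", freq="MS"), "group"])["fishing_hours"]
       .sum()
       .unstack("group")
)
print(monthly)
```

With one column per group, `monthly.plot()` would draw both time series on shared axes.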

In [None]:
# activity of chinese-flagged vessels
chinese_vessels = fish[fish['flag']=='CHN'].copy()

# activity of non-chinese-flagged vessels
other_vessels = fish[(fish['flag']!='CHN') | fish['flag'].isnull()].copy()

### **b)** What observations can you make of the two plots you made above? Are there any yearly patterns that exist? (3-5 sentences)

### **c)**  Plot a heatmap of high-resolution fishing hours for Peru, and explore how fishing activities are influenced by geospatial regulations by adding EEZ on the map.

The code below is given to you -- no need to write anything!
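
One step in the given code scales the coordinates by 10 and converts them to integers before grouping. The sketch below illustrates why floating-point coordinates are awkward grouping keys; `grid_key` is a hypothetical helper, and rounding rather than truncating is a choice made in this sketch.

```python
def grid_key(coord_deg: float, cells_per_degree: int = 10) -> int:
    """Integer index of the grid cell; round() absorbs tiny float errors."""
    return round(coord_deg * cells_per_degree)


computed = 0.1 * 3   # a coordinate produced by arithmetic
literal = 0.3        # the same coordinate typed directly

floats_match = computed == literal                  # the floats differ slightly
keys_match = grid_key(computed) == grid_key(literal)

# Truncating instead of rounding can land one cell too low.
truncated = int(0.29 * 100)
rounded = round(0.29 * 100)

print(floats_match, keys_match, truncated, rounded)
```

Two coordinates that should be identical can differ in their last binary digit and would then fall into different groups, whereas integer keys compare exactly.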

In [None]:
# plot near Peru

data = pd.read_parquet('./taking_the_pulse_ocean_data/fishing_hours_peru.parquet')

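# area of a 1 x 1-degree cell, as in the global data; these cells are 0.1 x 0.1 degrees,
# so the densities below are a constant factor of 100 too small (the spatial pattern is unaffected)
data['area_km2'] = 111 * 111 * np.cos(data['lat'] * np.pi / 180)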
data['area_km2'] = data['area_km2'].astype(float)
data['fishing_hours_km2'] = data['fishing_hours']/data['area_km2']

data['date'] = pd.to_datetime(data['date'], format='%Y-%m-%d')

# change lat lon to integer (we want to sum within each grid but floating point causes problems when grouping)
# round before casting so that values such as 2.9999999 do not truncate to the wrong cell
data['lat'] = np.round(data['lat'].values * 10).astype(int)
data['lon'] = np.round(data['lon'].values * 10).astype(int)

# sum within each grid
summary = data.groupby(['lat', 'lon'])['fishing_hours_km2'].sum()
summary

In [None]:
eez = gpd.read_file('./world-eez-v12')

In [None]:
# convert to xarray
xarray_data = summary.to_xarray()

# back to 0.1 degree intervals
xarray_data['lat'] = 0.1*xarray_data['lat']
xarray_data['lon'] = 0.1*xarray_data['lon']

In [None]:
# plot on a log scale
fig = plt.figure(figsize=(16, 6))
ax = plt.axes(projection=ccrs.PlateCarree())

xarray_data.plot(
    cmap="magma_r",
    norm=LogNorm(vmin=0.0001, vmax=float(xarray_data.max())),
    ax=ax, zorder=-1,
    add_colorbar=True
)

ax.add_feature(cf.LAND, linewidth=0)
ax.add_feature(cf.BORDERS)
ax.add_feature(cf.COASTLINE)

eez.plot(ax=ax, linestyle='--', color='black')

ax.set_extent([-100, -60, -30, 10], crs=ccrs.PlateCarree())

### **d)** In the map above, the dashed lines represent the EEZ (exclusive economic zone), where coastal nations have jurisdiction over natural resources. Often, these areas have strict regulations for foreign-flagged vessels, while areas outside of it do not. Given what you know about the EEZ now, what observations can you make about fishing areas and political boundaries from the map above? (3-5 sentences)

---
# Supplemental Information
How to access original Global Fishing Watch data & NASA MODIS Terra data
#### Global Fishing Watch
- Online web map on their [website](https://globalfishingwatch.org/our-map/)
- R package `gfwr` ([link](https://github.com/GlobalFishingWatch/gfwr)) &mdash; an R API to download fishing data. Note that this is still at an early development stage.

#### NASA MODIS data
- The easiest way is to access the data on [Google Earth Engine](https://developers.google.com/earth-engine/datasets) through a Python API `earthengine-api` ([link](https://github.com/google/earthengine-api))
- You can also find the data in other repositories, including [Amazon Web Services](https://aws.amazon.com/opendata/?wwps-cards.sort-by=item.additionalFields.sortDate&wwps-cards.sort-order=desc) or [Copernicus](https://cds.climate.copernicus.eu), or get the original data from the [NASA website](https://oceancolor.gsfc.nasa.gov).

(Note that AIS is mandated only for distant-water fleets engaged in international voyages, and therefore it does not cover small-scale/artisanal fisheries in coastal areas and within EEZs. Also, some areas have low AIS coverage, including Southeast Asia and the Gulf of Mexico. AIS devices can be tampered with or turned off.)

(Note that high primary production does not always favor fish growth. A combination of high temperature and nutrient runoff can cause harmful algal blooms, leading to hypoxic dead zones and fish kills. If you are interested, you can explore the relationship between chlorophyll-a `chlor_a`, sea surface temperature `sst` and particulate organic carbon `poc`, in MODIS data.)