---
title: "Technical Documentation, State of the Ecosystem Report"
author: "Northeast Fisheries Science Center"
date: "`r format(Sys.Date(), '%e %B %Y')`"
site: bookdown::bookdown_site
knit: "bookdown::render_book"
always_allow_html: true
documentclass: book
bibliography: ["bibliography/introduction.bib","bibliography/aggregate_groups.bib","bibliography/seasonal_sst_anomaly_maps.bib","bibliography/Aquaculture.bib","bibliography/Bennet_indicator.bib","bibliography/bottom_temperature.bib","bibliography/bottom_temp_highres.bib","bibliography/Revenue_Diversity.bib","bibliography/ches_bay_water_quality.bib","bibliography/phytoplankton.bib","bibliography/ecosystem_overfishing.bib","bibliography/comm_eng.bib","bibliography/calanus_stage.bib","bibliography/ches_bay_temp.bib","bibliography/conceptmods.bib","bibliography/Condition.bib","bibliography/EPU.bib","bibliography/Expected_Number.bib","bibliography/cold_pool_index.bib","bibliography/sandlance.bib","bibliography/gulf_stream_index.bib","bibliography/habitat_diversity.bib","bibliography/habitat_vulnerability.bib","bibliography/Ich_div.bib","bibliography/long_term_sst.bib","bibliography/MAB_HAB.bib","bibliography/NE_HAB.bib","bibliography/habs.bib","bibliography/occupancy.bib","bibliography/productivity_tech_memo.bib","bibliography/RW.bib","bibliography/seabird_ne.bib","bibliography/seal_pup.bib","bibliography/slopewater_proportions.bib","bibliography/Species_dist.bib","bibliography/survey_data.bib","bibliography/thermal_hab_proj.bib","bibliography/trans_dates.bib","bibliography/trend_analysis.bib","bibliography/zooplankton.bib","bibliography/cold_pool_index.bib","bibliography/forage_energy_density.bib","bibliography/Forage_Fish_Biomass_Index.bib","bibliography/marine_heatwave.bib","bibliography/protected_species_hotspots.bib","bibliography/ocean_acidification.bib","bibliography/wind_habitat_occupancy.bib","bibliography/warm_core_rings.bib", "bibliography/glossary.bib","packages.bib"]
geometry: "left=1.0in, right=1.0in, top=1.0in, bottom=1.0in, includefoot"
biblio-style: apalike
link-citations: true
github-repo: NOAA-EDAB/tech-doc
description: "This book documents each indicator and analysis used in State of the Ecosystem reporting"
---


# Introduction {-}

The purpose of this document is to collate the methods used to access, collect, process, and analyze derived data ("indicators") used to describe the status and trend of social, economic, ecological, and biological conditions in the Northeast Shelf Large Marine Ecosystem (see figure, below). These indicators are further synthesized in State of the Ecosystem Reports produced annually by the [Northeast Fisheries Science Center](https://www.nefsc.noaa.gov/) for the [New England Fishery Management Council](https://www.nefmc.org/) and the [Mid-Atlantic Fishery Management Council](http://www.mafmc.org/). The metadata for each indicator (in accordance with the [Public Access to Research Results (PARR) directive](http://obamawhitehouse.archives.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf)) and the methods used to construct each indicator are described in the subsequent chapters, with each chapter title corresponding to an indicator or analysis present in State of the Ecosystem Reports. The most recent and usable HTML version of this document can be found at the [NOAA EDAB GitHub](https://noaa-edab.github.io/tech-doc/). The PDF version of this document is for archiving only. [PDF versions](https://repository.library.noaa.gov/welcome) from previous years are archived in NOAA's Institutional Repository.

Indicators included in this document were selected to clearly align with management objectives, which is required for integrated ecosystem assessment [@levin_integrated_2009] and has been advised many times in the literature [@degnbol_review_2004; @jennings_indicators_2005; @rice_framework_2005; @link_translating_2005]. A difficulty with practical implementation of this in ecosystem reporting can be the lack of clearly specified ecosystem-level management objectives (although some have been suggested [@murawski_definitions_2000]). In our case, considerable effort had already been applied to derive both general goals and operational objectives from US legislation such as the Magnuson-Stevens Fishery Conservation and Management Act ([MSA](https://www.fisheries.noaa.gov/resource/document/magnuson-stevens-fishery-conservation-and-management-act)) and from regional sources [@depiper_operationalizing_2017]. These objectives are somewhat general and would need refinement together with managers and stakeholders; however, they serve as a useful starting point to structure ecosystem reporting.

```{r setup, echo=FALSE, message = FALSE, warning = FALSE, results='hide'}
knitr::opts_chunk$set(echo = FALSE,
                      message = FALSE,
                      warning = FALSE,
                      #dev = "cairo_pdf",
                      fig.path = here::here("images/"))
knitr::opts_chunk$set(tidy.opts = list(width.cutoff = 60), tidy = TRUE)
#update.packages(ask = FALSE, checkBuilt = TRUE)  # update R packages
#tinytex::tlmgr_update() 

#source directories
image.dir <- here::here("images")
r.dir <- here::here("R")
gis.dir <- here::here("gis")
data.dir <- here::here("data")
#Plotting and data libraries
remotes::install_github("noaa-edab/ecodata")
remotes::install_github("noaa-edab/stocksmart")
remotes::install_github("thomasp85/patchwork")
#remotes::install_github("andybeet/arfit")

library(ggplot2)
#library(formatR)
#library(magrittr)
library(dplyr)
library(tidyr)
library(ecodata)
library(here)
library(kableExtra)
library(ggrepel)
#library(stringr)
library(patchwork)
library(heatwaveR)
library(gridExtra)
library(vegan)
library(grid)
library(rpart)
library(knitr)
library(rmarkdown)
library(readr)
library(RColorBrewer)
library(DT)
library(AICcmodavg)

library(plyr)
library(cowplot)
#library(plotly)
#GIS libraries
library(sf)
#library(rgdal)
#library(raster)
library(marmap)
library(ggspatial)


#Time series constants
 shade.alpha <- 0.3
 shade.fill <- "lightgrey"
 lwd <- 1
 pcex <- 2
 trend.alpha <- 0.5
 trend.size <- 2
 hline.size <- 1
 hline.alpha <- 0.35
 hline.lty <- "dashed"
 label.size <- 5
 hjust.label <- 1.5
 letter_size <- 4
 feeding.guilds1 <- c("Piscivore","Planktivore","Benthivore","Benthos")
 feeding.guilds <- c("Apex Predator","Piscivore","Planktivore","Benthivore","Benthos")
 x.shade.min <- 2012 
 x.shade.max <- 2022
#Function for custom ggplot facet labels
label <- function(variable,value){
  return(facet_names[value])
}


#Map line parameters
map.lwd <- 0.4
#CRS
crs <- "+proj=longlat +lat_1=35 +lat_2=45 +lat_0=40 +lon_0=-77 +x_0=0 +y_0=0 +datum=NAD83 +no_defs +ellps=GRS80 +towgs84=0,0,0"
#Coastline shapefile
# coast <- ne_countries(scale = 10,
#                           continent = "North America",
#                           returnclass = "sf") %>%
#              sf::st_transform(crs = crs)
# # #State polygons
# ne_states <- ne_states(country = "united states of america",
#                                       returnclass = "sf") %>%
#   sf::st_transform(crs = crs)
# #high-res polygon of Maine
# new_england <- read_sf(gis.dir,"new_england")
#EPU shapefile
#epu_sf <- ecodata::epu_sf %>%
#  filter(EPU %in% c("MAB","GB","GOM"))
#identifiers
council <- "Mid-Atlantic Fishery Management Council"
council_abbr <- "MAFMC"
epu <- "Mid-Atlantic Bight"
epu_abbr <- "MAB"
region <- "Mid-Atlantic"
region_abbr <- "MA"
```

(ref:neusmap) Map of Northeast U.S. Continental Shelf Large Marine Ecosystem from @Hare2016.


```{r neusmap, message = FALSE, warning=FALSE, fig.align='center', fig.height=6, echo = F, fig.cap='(ref:neusmap)'}
knitr::include_graphics("images/journal.pone.0146756.g002.PNG")

```


The list below shows which versions of all related products correspond to a specific State of the Ecosystem report cycle. The reports and supporting products, including the technical documentation, are developed annually. DOI links are added once they become available, so they may lag behind the release of a report.

**DOIs**

*  [MAFMC SOE 2020](https://doi.org/10.25923/1f8j-d564) 
*  [NEFMC SOE 2020](https://doi.org/10.25923/4tdk-eg57) 
*  [Technical Documentation SOE 2020](https://doi.org/10.25923/64pf-sc70) 
*  [MAFMC SOE 2021](https://repository.library.noaa.gov/view/noaa/29525) 
*  [NEFMC SOE 2021](https://repository.library.noaa.gov/view/noaa/29524) 
*  [Technical Documentation SOE 2021](https://repository.library.noaa.gov/view/noaa/29277) 
*  [MAFMC SOE 2022](https://doi.org/10.25923/5s5y-0h81) 
*  [NEFMC SOE 2022](https://doi.org/10.25923/ypv2-mw79) 
*  [Technical Documentation SOE 2022](https://doi.org/10.25923/xq8b-dn10) 


<!--chapter:end:index.Rmd-->

# Data and Code Access {#erddap}

### About

The Technical Documentation for the State of the Ecosystem (SOE) reports is a [bookdown](https://bookdown.org) document, hosted on the NOAA Northeast Fisheries Science Center (NEFSC) Ecosystems Dynamics and Assessment Branch [GitHub page](https://github.com/NOAA-EDAB) and developed in R. Derived data used to populate figures in this document are queried directly from the [ecodata](https://github.com/NOAA-EDAB/ecodata) R package or the NEFSC [ERDDAP server](https://comet.nefsc.noaa.gov/erddap/info/index.html?page=1&itemsPerPage=1000). ERDDAP queries are made using the R package [rerddap](https://cran.r-project.org/web/packages/rerddap/vignettes/Using_rerddap.html).

```{r global-opts1, echo = FALSE}
knitr::opts_chunk$set(tidy.opts=list(width.cutoff=60),tidy=TRUE)
```
### Accessing data and build code

In this technical documentation, we hope to shine a light on the processing and analytical steps involved to get from source data to final product. This means that whenever possible, we have included the code involved in source data extraction, processing, and analyses. We have also attempted to thoroughly describe all methods in place of or in supplement to provided code. Example plotting code for each indicator is presented in sections titled "Plotting", and these code chunks can be used to recreate the figures found in ecosystem reporting documents where each respective indicator was included[^1].

Source data for the derived indicators in this document are linked to in the text unless there are privacy concerns involved. In that case, it may be possible to access source data by reaching out to the Point of Contact associated with that data set. Derived data sets make up the majority of the indicators presented in ecosystem reporting documents, and these data sets are available for download through the [ecodata](https://github.com/NOAA-EDAB/ecodata) R package. 

### Building the document

Start a local build of the SOE bookdown document by first cloning the project's associated [git repository](https://github.com/NOAA-EDAB/tech-doc). Next, if you would like to build a past version of the document, use `git checkout [version_commit_hash]` to revert the project to a past commit of interest, and set `build_latest <- FALSE` in this [code chunk](https://github.com/NOAA-EDAB/tech-doc/tree/master/R/stored_scripts/erddap_query_and_build_code.R). This ensures that the project builds from a cached data set rather than from the most recent versions present on the NEFSC ERDDAP server. Once the `tech-doc.Rproj` file is opened in RStudio, run `bookdown::serve_book()` from the console to build the document.
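The clone and checkout steps described above can also be scripted from R. The following is a minimal sketch, not part of the project's build code: the helper name `clone_tech_doc()`, its arguments, and the destination folder are hypothetical, and it assumes that `git` is installed and on the system path.

```r
# Hypothetical helper (not part of the tech-doc repository): assembles the git
# commands described above and optionally runs them with system2().
clone_tech_doc <- function(dest = "tech-doc", commit = NULL, run = TRUE) {
  repo_url <- "https://github.com/NOAA-EDAB/tech-doc"
  cmds <- list(c("clone", repo_url, dest))
  if (!is.null(commit)) {
    # Revert the cloned project to a past commit of interest
    cmds <- c(cmds, list(c("-C", dest, "checkout", commit)))
  }
  if (run) {
    for (args in cmds) system2("git", args)
  }
  invisible(cmds)
}

# Inspect the commands without running them; "abc1234" is a placeholder hash
cmds <- clone_tech_doc(commit = "abc1234", run = FALSE)
print(cmds)

# After cloning (and setting build_latest <- FALSE for a past version),
# the book can be built from within the project:
# bookdown::serve_book()
```

Calling the helper with `run = FALSE` only returns the command arguments, which makes it easy to check what would be executed before touching the file system.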

#### A note on data structures

The majority of the derived time series used in State of the Ecosystem reports are in long format. This approach was taken so that all disparate data sets could be "bound" together for ease of use in our base plotting [functions](https://github.com/NOAA-EDAB/ecodata/tree/master/R).
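As an illustration of why long format makes binding straightforward, the sketch below reshapes a wide table and stacks it with a second indicator. The data values and the column names (`Time`, `EPU`, `Var`, `Value`) are assumptions chosen for the example and are not taken from a specific `ecodata` data set.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide-format indicator: one column per variable
wide_a <- data.frame(Time = 2018:2020,
                     EPU = "MAB",
                     revenue = c(10, 12, 11),
                     landings = c(5, 6, 7))

# Reshape to long format: one row per Time x EPU x variable
long_a <- pivot_longer(wide_a,
                       cols = c(revenue, landings),
                       names_to = "Var",
                       values_to = "Value")

# A second hypothetical indicator that is already in long format
long_b <- data.frame(Time = 2018:2020,
                     EPU = "GB",
                     Var = "sst_anomaly",
                     Value = c(0.4, 0.6, 0.9))

# Because both share the same columns, they can be stacked directly
all_indicators <- bind_rows(long_a, long_b)
print(all_indicators)
```

With every indicator stored this way, a single plotting function can filter on `Var` and `EPU` instead of needing to know the column layout of each data set.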

[^1]: There are multiple R scripts sourced throughout this document in an attempt to keep code concise. These scripts include [BasePlot_source.R](https://github.com/NOAA-EDAB/tech-doc/blob/master/R/BasePlot_source.R), [GIS_source.R](https://github.com/NOAA-EDAB/tech-doc/blob/master/R/GIS_source.R), and [get_erddap.R](https://github.com/NOAA-EDAB/tech-doc/blob/master/R/get_erddap.R). The scripts `BasePlot_source.R` and `GIS_source.R` refer to deprecated code used prior to the 2019 State of the Ecosystem reports. Indicators that were not included in reports after 2018 make use of this syntax, whereas newer indicators typically use `ggplot2` for plotting.


<!--chapter:end:chapters/erddap_query_and_build.Rmd-->

# Aggregate Groups {#species_groupings}

**Description**: Mappings of species into aggregate group categories for different analyses

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2018+), State of the Ecosystem - Mid-Atlantic (2018+) 

**Indicator category**: Synthesis of published information 

**Contributor(s)**: Geret DePiper, Sarah Gaichas, Sean Hardison, Sean Lucey
  
**Data steward**: Sean Lucey <Sean.Lucey@noaa.gov>
  
**Point of contact**: Sean Lucey <Sean.Lucey@noaa.gov>
  
**Public availability statement**: Source data are available to the public (see Data Sources).
```{r global-opts2, echo = FALSE}
knitr::opts_chunk$set(tidy.opts=list(width.cutoff=60),tidy=TRUE)
```

## Methods
The State of the Ecosystem (SOE) reports are delivered to the New England Fishery Management Council (NEFMC) and Mid-Atlantic Fishery Management Council (MAFMC) to provide ecosystem context. To better understand that broader ecosystem context, many of the indicators are reported at an aggregate level rather than at a single species level. Species were assigned to an aggregate group following the classification scheme of @garrison2000dietary and @link2006EMAX. Both works classified species into feeding guilds based on food habits data collected at the Northeast Fisheries Science Center (NEFSC). In 2017, the SOE used seven specific feeding guilds (plus an "other" category; Table \@ref(tab:soe2017class)). These seven were the same guilds used in @garrison2000dietary, which also distinguished ontogenetic shifts in species diets.

For the purposes of the SOE, species were only assigned to one category based on the most prevalent size available to commercial fisheries.  However, several of those categories were confusing to the management councils, so in 2018 those categories were simplified to five (plus "other"; Table \@ref(tab:soe2018class)) along the lines of @link2006EMAX.  In addition to feeding guilds, species managed by the councils have been identified.  This is done to show the breadth of what a given council is responsible for within the broader ecosystem context.

In the 2020 report, squids were moved from planktivores to piscivores based on the majority of their diet being either fish or other squid.  

```{r soe2017class, eval = T, echo = F}
soe.17.class <- data.frame('Feeding Guild' = c('Apex Predator', 'Piscivore', 
                                               'Macrozoo-piscivore', 'Macroplanktivore', 
                                               'Mesoplanktivore', 'Benthivore',
                                               'Benthos', 'Other'),
                           Description = c('Top of the food chain', 'Fish eaters', 
                                           'Shrimp and small fish eaters', 'Amphipod and shrimp eaters',
                                           'Zooplankton eaters', 'Bottom eaters',
                                           'Things that live on the bottom', 
                                           'Things not classified above'))

kable(soe.17.class, booktabs = TRUE,
      caption = "Aggregate groups used in the 2017 SOE.  Classifications are based on Garrison and Link (2000).")
```

```{r soe2018class, eval = T, echo = F}
soe.18.class <- data.frame('Feeding Guild' = c('Apex Predator', 'Piscivore', 
                                               'Planktivore', 'Benthivore',
                                               'Benthos', 'Other'),
                           Description = c('Top of the food chain', 'Fish eaters', 
                                           'Zooplankton eaters', 'Bottom eaters',
                                           'Things that live on the bottom', 
                                           'Things not classified above'))

kable(soe.18.class, booktabs = TRUE,
      caption = "Aggregate groups used since the 2018 SOE.  Classifications are based on Link et al. (2006).")
```

### Data sources

To match aggregate groups with various data sources, a look-up table was generated that includes species' common names (COMNAME) along with their scientific names (SCINAME) and several species codes. SVSPP codes are used by the NEFSC Ecosystems Surveys Branch (ESB) in their fishery-independent Survey Database (SVDBS), while NESPP3 codes refer to the codes used by the Commercial Fisheries Database System (CFDBS) for fishery-dependent data. A third species code provided is the ITISSPP, which refers to species identifiers used by the Integrated Taxonomic Information System (ITIS). Digits within ITIS codes are hierarchical, with different positions in the identifier referring to higher or lower taxonomic levels. More information about the SVDBS, CFDBS, and ITIS species codes is available in the links provided below.

Management responsibilities for different species are listed under the column "Fed.managed" (NEFMC, MAFMC, or JOINT for jointly managed species). More information about these species is available on the FMC websites listed below. Species groupings listed in the "NEIEA" column were developed for presentation on the Northeast Integrated Ecosystem Assessment ([NE-IEA](https://www.integratedecosystemassessment.noaa.gov/regions/northeast)) website. These groupings are based on EMAX groupings [@link2006EMAX], but were adjusted based on conceptual models developed for the NE-IEA program that highlight focal components in the Northeast Large Marine Ecosystem (i.e. those components with the largest potential for perturbing ecosystem dynamics). NE-IEA groupings were further simplified to allow for effective communication through the NE-IEA website.

#### Supplemental information

See the following links for more information regarding the NEFSC ESB Bottom Trawl Survey, CFDBS, and ITIS:  

*    https://www.itis.gov/  
*    https://inport.nmfs.noaa.gov/inport/item/22561  
*    https://inport.nmfs.noaa.gov/inport/item/22560  
*    https://inport.nmfs.noaa.gov/inport/item/27401  	

More information about the NE-IEA program is available [here](http://integratedecosystemassessment.noaa.gov).

More information about the New England Fishery Management Council is available [here](https://www.nefmc.org/).

More information about the Mid-Atlantic Fishery Management Council is available [here](http://www.mafmc.org/).

### Data extraction 

Species lists are pulled from SVDBS and CFDBS.  They are merged using the ITIS code.  Classifications from Garrison and Link [@garrison2000dietary] and Link et al. [@link2006EMAX] are added manually. The R code used in the extraction process can be found [here](https://github.com/slucey/RSurvey/blob/master/Species_list.R).
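The merge on ITIS code can be sketched as follows. This is an illustration under stated assumptions rather than the extraction code linked above: the species names, all code values, and the guild assignments are placeholders invented for the example, and only the column names (COMNAME, SVSPP, NESPP3, ITISSPP) follow the look-up table described in this chapter.

```r
# Hypothetical survey (SVDBS) species list; codes are placeholders
svdbs <- data.frame(ITISSPP = c(1001, 1002, 1003),
                    SVSPP   = c(11, 12, 13),
                    COMNAME = c("SPECIES A", "SPECIES B", "SPECIES C"))

# Hypothetical commercial (CFDBS) species list; codes are placeholders
cfdbs <- data.frame(ITISSPP = c(1001, 1002, 1004),
                    NESPP3  = c("101", "102", "104"))

# Merge the two lists on the shared ITIS code, keeping species that
# appear in only one of the databases
species_list <- merge(svdbs, cfdbs, by = "ITISSPP", all = TRUE)

# Guild classifications are added manually; here via a small hand-built table
guilds <- data.frame(ITISSPP = c(1001, 1002),
                     Feeding.Guild = c("Piscivore", "Benthivore"))
species_list <- merge(species_list, guilds, by = "ITISSPP", all.x = TRUE)

print(species_list)
```

Species present in only one database keep `NA` in the code column of the other, which makes gaps in the look-up table easy to spot.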


<!--chapter:end:chapters/aggregate_groups.rmd-->

# Annual SST Cycles

**Description**: Annual SST Cycles

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2018), State of the Ecosystem - Mid-Atlantic (2018) 

**Indicator category**: Database pull with analysis

**Contributor(s)**: Sean Hardison, Vincent Saba
  
**Data steward**: Kimberly Bastille, <kimberly.bastille@noaa.gov>
  
**Point of contact**: Kimberly Bastille, <kimberly.bastille@noaa.gov>
  
**Public availability statement**: Source data are available [here](https://www.esrl.noaa.gov/psd/data/gridded/data.noaa.oisst.v2.highres.html). 

## Methods

### Data sources
Data for annual sea surface temperature (SST) cycles were derived from the NOAA optimum interpolation sea surface temperature (OISST) high resolution dataset ([NOAA OISST V2 dataset](https://www.esrl.noaa.gov/psd/data/gridded/data.noaa.oisst.v2.highres.html)) provided by NOAA's Earth System Research Laboratory's Physical Sciences Division, Boulder, CO. The data extend from 1981 to present, and provide a 0.25&deg; x 0.25&deg; global grid of SST measurements [@Reynolds2007]. Gridded SST data were masked according to the extent of Ecological Production Units (EPU) in the Northeast Large Marine Ecosystem (NE-LME) (See ["EPU_Extended" shapefiles](https://github.com/NOAA-EDAB/tech-doc/tree/master/gis)).


### Data extraction 
Daily mean sea surface temperature data for 2017 and for each year during the period of 1981-2012 were downloaded from the NOAA [OI SST V2 site](https://www.esrl.noaa.gov/psd/data/gridded/data.noaa.oisst.v2.highres.html) to derive the long-term climatological mean for the period. The use of a 30-year climatological reference period is a standard procedure for meteorological observing [@WMO2017]. These reference periods serve as benchmarks for comparing current or recent observations, and for the development of standard anomaly data sets. The reference period of 1982-2012 was chosen to be consistent with previous versions of the State of the Ecosystem report.
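The comparison of a single year against a climatological reference period can be sketched as below. This is a simplified illustration, not the processing script used for the indicator: the SST values are simulated, the data frame and column names are hypothetical, and the climatology is computed as a simple day-of-year mean over the 1982-2012 reference period.

```r
set.seed(1)

# Simulated daily SST for a single region (a stand-in for masked OISST data)
dates <- seq(as.Date("1982-01-01"), as.Date("2017-12-31"), by = "day")
sst_df <- data.frame(date = dates,
                     yr   = as.integer(format(dates, "%Y")),
                     doy  = as.integer(format(dates, "%j")))
sst_df$sst <- 12 + 8 * sin(2 * pi * (sst_df$doy - 120) / 365.25) +
  rnorm(nrow(sst_df), sd = 0.5)

# Long-term climatological mean for each day of year (reference: 1982-2012)
ref  <- sst_df[sst_df$yr >= 1982 & sst_df$yr <= 2012, ]
clim <- aggregate(sst ~ doy, data = ref, FUN = mean)
names(clim)[2] <- "clim_sst"

# Compare the 2017 annual cycle with the climatology
cycle_2017 <- merge(sst_df[sst_df$yr == 2017, ], clim, by = "doy")
cycle_2017$anomaly <- cycle_2017$sst - cycle_2017$clim_sst

head(cycle_2017)
```

Plotting `sst` and `clim_sst` against `doy` gives the annual cycle figure, with `anomaly` showing where the year departed from the long-term mean.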

R code used in extraction and processing can be found [here](https://github.com/NOAA-EDAB/tech-doc/blob/master/R/stored_scripts/annual_sst_cycles_extraction_and_processing.R).

<!--chapter:end:chapters/Annual_SST_cycle_indicator.Rmd-->

# Aquaculture {#aquaculture}

**Description**: Aquaculture indicators

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2017, 2018 (Different Methods), 2021+), State of the Ecosystem - Mid-Atlantic (2017, 2018, 2019)

**Indicator category**: Synthesis of published information

**Contributor(s)**: Christopher Schillaci, Maine DMR, NH DES, MA DMF, RI CRMC, MD DNR
  
**Data steward**: Chris Schillaci <christopher.schillaci@noaa.gov>
  
**Point of contact**: Chris Schillaci <christopher.schillaci@noaa.gov>
  
**Public availability statement**: Source data are publicly available


## Methods

### Data Sources 
Data were synthesized from state-specific sources, listed below.

* [State of Maine, Department of Marine Resources.](https://www.maine.gov/dmr/aquaculture/data/index.html)                

* [State of New Hampshire, Marine Aquaculture Compendium](https://drive.google.com/file/d/1eCg0cP2rsjZ0AAloPuxIyDiA01urOcjR/view?usp=sharing)

* [State of Massachusetts, Division of Marine Fisheries](https://www.mass.gov/service-details/dmf-annual-reports)

* [State of Rhode Island, Coastal Resource Management Council](http://www.crmc.ri.gov/aquaculture.html)        

* [State of Maryland, Aquaculture Coordinating Council](https://calendarmedia.blob.core.windows.net/assets/1495a281-9eab-422a-9f90-a16ac9686db8.pdf) 

### Data Extraction/Analysis
Production, described as the number of oysters harvested, is collected by individual states. This means that time series may vary by state. A table of start years is shown below. Individual state information is available at the above links.

Only the New England State of the Ecosystem report includes aquaculture information, because of reporting issues and because many Mid-Atlantic states do not have available data.

| State         | Timeseries Start Year |
|---------------|-----------------------|
| Maine         | 2009  |
| New Hampshire | 2013  |
| Massachusetts | 2009  |
| Rhode Island  | 2009  |
| New Jersey    | 2012* |
| Maryland      | 2012  |
| Virginia      | 2009  |

\* Only includes data through 2016.

No further analysis was conducted on these data.

### Data processing

Aquaculture data were formatted for inclusion in the `ecodata` R package using the code found [here](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_aquaculture.R).


## Methods 2017-2019
Aquaculture data included in the State of the Ecosystem (SOE) report were time series of number of oysters sold in Virginia, Maryland, and New Jersey. 


### Data sources
Virginia oyster harvest data are collected from mail and internet-based surveys of active oyster aquaculture operations on both sides of the Chesapeake Bay, which are then synthesized in an annual report [@Hudson2017a]. In Maryland, shellfish aquaculturists are required to report their monthly harvests to the Maryland Department of Natural Resources (MD-DNR). The MD-DNR then aggregates the harvest data for release in the Maryland Aquaculture Coordinating Council Annual Report [@ACC2017], from which data were collected. Similar to Virginia, New Jersey releases annual reports synthesizing electronic survey results from lease-holding shellfish growers. Data from New Jersey reflect cage-reared oysters grown from hatchery seed [@Calvo2017].


### Data extraction 
Data were collected directly from state aquaculture reports. Oyster harvest data in MD were reported in bushels, which were then converted to individual oysters using an estimate of 300 oysters bushel$^{-1}$. View processing code for this indicator [here](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_aquaculture.R).
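The bushel conversion amounts to a single multiplication. In the sketch below only the factor of 300 oysters per bushel comes from the text; the harvest values and object names are hypothetical.

```r
# Conversion factor stated above
oysters_per_bushel <- 300

# Hypothetical Maryland harvest reported in bushels (values are made up)
md_harvest <- data.frame(year = 2012:2014,
                         bushels = c(100, 250, 400))

# Convert bushels to an estimated number of individual oysters
md_harvest$oysters <- md_harvest$bushels * oysters_per_bushel
print(md_harvest)
```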

### Data analysis
No data analyses occurred for this indicator.

<!--chapter:end:chapters/Aquaculture_indicators.Rmd-->

# Bennet Indicator {#bennet}

**Description**: Bennet Indicator

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2018, 2019, 2020, 2021), State of the Ecosystem - Mid-Atlantic (2018, 2019, 2020, 2021) 

**Indicator category**: Database pull with analysis

**Contributor(s)**: John Walden
  
**Data steward**: Kimberly Bastille, <kimberly.bastille@noaa.gov>
  
**Point of contact**: John Walden, <john.walden@noaa.gov>
  
**Public availability statement**: Derived CFDBS data are available for this analysis (see [Comland](#comdat)).
  
## Methods

### Data sources
Data used in the Bennet Indicator were derived from the Comland data set, a processed subset of the Commercial Fisheries Database System (CFDBS). The derived Comland data set is available for download [here](https://comet.nefsc.noaa.gov/erddap/tabledap/group_landings_soe_v1.html).

### Data extraction 
For information regarding processing of CFDBS, please see [Comland](#comdat) methods. The Comland dataset containing seafood landings data was subsetted to US landings after 1964 where revenue was $\ge$ 0 for each Ecological Production Unit (i.e. Mid-Atlantic Bight, Georges Bank, and Gulf of Maine). Each EPU was run in an individual R script, and the code specific to Georges Bank is shown [here](https://github.com/NOAA-EDAB/tech-doc/blob/master/R/stored_scripts/bennet_extraction.R).

### Data analysis

Revenue earned by harvesting resources from a Large Marine Ecosystem (LME) at time *t* is a function of both the quantity landed of each species and the prices paid for landings. Changes in revenue between any two years depend on both prices and quantities in each year, and both may be changing simultaneously. For example, an increase in the harvest of higher priced species, such as scallops, can lead to an overall increase in total revenue from an LME between time periods even if quantities landed of other species decline. Although measurement of revenue change is useful, the ability to see what drives revenue change, whether it is changing harvest levels, the mix of species landed, or price changes, provides additional valuable information. Therefore, it is useful to decompose revenue change into two parts: one that is due to changing quantities (or volumes), and a second that is due to changing prices. In an LME, the quantity component will yield useful information about how the species mix of harvests is changing through time.

A Bennet indicator (BI) is used to examine revenue change between 1964 and 2015 for two major LME regions. It is composed of a volume indicator (VI), which measures changes in quantities, and a price indicator (PI), which measures changes in prices. The Bennet (1920) indicator was first used to show how a change in social welfare could be decomposed into the sum of a price change indicator and a quantity change indicator [@Cross2009]. It is called an indicator because it is based on differences in value between time periods rather than ratios, which are referred to as indices. The BI is the indicator equivalent of the more popular Fisher index [@Balk2010], and has been used to examine revenue changes in Swedish pharmacies, productivity change in U.S. railroads [@lim2009], and dividend changes in banking operations [@Grifell-Tatje2004]. An attractive feature of the BI is that the overall indicator is equal to the sum of its subcomponents [@Balk2010]. This additivity allows us to examine whether changing quantities or prices of separate species groups are driving revenue change in each EPU between 1964 and 2015.

Revenue in a given year for any species group is the product of quantity landed times price, and the sum of revenue from all groups is total revenue from the LME. In any year, both prices and quantities can change from prior years, leading to total revenue change. At time t, revenue (R) is defined as $$R^{t} = \sum_{j=1}^{J}p_{j}^{t}y_{j}^{t},$$
where $p_{j}$ is the price for species group $j$, and $y_{j}$ is the quantity landed of species group $j$. Revenue change between any two time periods, say $t+1$ and $t$, is then $R^{t+1}-R^{t}$, which can also be expressed as:
$$\Delta R = \sum_{j=1}^{J}p_{j}^{t+1}y_{j}^{t+1}-\sum_{j=1}^{J}p_{j}^{t}y_{j}^{t}.$$
This change can be decomposed further, yielding a VI and PI. The VI is calculated using the following formula [@Georgianna2017]:

$$VI = \frac{1}{2}(\sum_{j=1}^{J}p_{j}^{t+1}y_{j}^{t+1} - \sum_{j=1}^{J}p_{j}^{t+1}y_{j}^{t} + \sum_{j=1}^{J}p_{j}^{t}y_{j}^{t+1} - \sum_{j=1}^{J}p_{j}^{t}y_{j}^{t})$$
The price indicator (PI) is calculated as follows:
$$PI = \frac{1}{2}(\sum_{j=1}^{J}y_{j}^{t+1}p_{j}^{t+1} - \sum_{j=1}^{J}y_{j}^{t+1}p_{j}^{t} + \sum_{j=1}^{J}y_{j}^{t}p_{j}^{t+1} - \sum_{j=1}^{J}y_{j}^{t}p_{j}^{t})$$
Total revenue change between time $t$ and $t+1$ is the sum of the VI and PI. Since revenue change is being driven by changes in the individual prices and quantities landed of each species group, changes at the species group level can be examined separately by taking advantage of the additive property of the indicator. For example, if there are five different species groups, the sum of the VI for each group will equal the overall VI, and the sum of the PI for each group will equal the overall PI. 
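The decomposition above can be checked numerically. The following sketch computes the VI and PI contribution of each species group from the formulas in this section and confirms that their sum equals total revenue change. The function name `bennet_components()` and the prices and quantities are hypothetical; this is not the script used to produce the indicator.

```r
# Hypothetical helper: per-group Bennet volume and price indicators.
# p0, y0 are prices and quantities at time t; p1, y1 are those at time t+1.
bennet_components <- function(p0, p1, y0, y1) {
  stopifnot(length(p0) == length(p1),
            length(p0) == length(y0),
            length(p0) == length(y1))
  # VI term for group j: 0.5 * (p_j^{t+1} + p_j^t) * (y_j^{t+1} - y_j^t)
  vi <- 0.5 * (p1 + p0) * (y1 - y0)
  # PI term for group j: 0.5 * (y_j^{t+1} + y_j^t) * (p_j^{t+1} - p_j^t)
  price_ind <- 0.5 * (y1 + y0) * (p1 - p0)
  data.frame(VI = vi, PI = price_ind, BI = vi + price_ind)
}

# Made-up prices and landings for three species groups
p0 <- c(2, 5, 1);       p1 <- c(3, 4, 1.5)
y0 <- c(100, 40, 200);  y1 <- c(90, 60, 180)

bi <- bennet_components(p0, p1, y0, y1)
print(bi)

# Additivity: group-level terms sum to the overall VI and PI,
# and VI + PI equals total revenue change
delta_r <- sum(p1 * y1) - sum(p0 * y0)
all.equal(sum(bi$VI) + sum(bi$PI), delta_r)
```

Expanding each per-group product recovers the four summation terms in the VI and PI formulas, so summing the `VI` and `PI` columns gives the overall indicators.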


### Data processing

Bennet indicator time series were formatted for inclusion in the `ecodata` R package using the R code found [here](https://raw.githubusercontent.com/NOAA-EDAB/ecodata/master/data-raw/get_bennet.R).

<!--chapter:end:chapters/Bennet_indicator.Rmd-->

# Bottom temperature - GLORYS

**Description**: Time series of annual bottom temperatures on the Northeast Continental Shelf from the GLORYS model.

**Indicator category**: 

**Found in:** State of the Ecosystem - Gulf of Maine & Georges Bank (2021); State of the Ecosystem - Mid-Atlantic Bight (2021)

**Contributor(s)**: Joe Caracappa <joseph.caracappa@noaa.gov>

**Data steward**: Joe Caracappa <joseph.caracappa@noaa.gov>

**Point of contact**: Joe Caracappa <joseph.caracappa@noaa.gov>

**Public availability statement**: Source data are publicly available. 

## Methods

### Data sources

The three-dimensional temperature of the Northeast US shelf is downloaded from the Copernicus Marine Environment Monitoring Service (CMEMS; https://marine.copernicus.eu/). Source data are available [at this link](https://resources.marine.copernicus.eu/?option=com_csw&task=results?option=com_csw&view=details&product_id=GLOBAL_REANALYSIS_PHY_001_030).


### Data extraction

NA

### Data analysis

The GLORYS12V1 daily bottom temperature product was downloaded as a flat 8 km grid subsetted over the northwest Atlantic. The EPUNOESTUARIES.shp polygons were then used to match GLORYS grid cells to EPUs. Daily mean bottom temperature by EPU was calculated as the mean across grid cells, weighted by the area of each GLORYS grid cell, and annual bottom temperature was calculated as the mean of the daily values. A 1994-2010 climatology was used to best match the climatology used for observed bottom temperature (the model does not go back any further), and the annual bottom temperature anomaly by EPU was calculated relative to this climatology.
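The sequence of steps (area-weighted daily mean, annual mean, anomaly relative to the 1994-2010 climatology) can be sketched with simulated data. Everything below is illustrative: the grid cells, cell areas, temperatures, and object names are assumptions, and only ten "days" per year are simulated to keep the example small.

```r
set.seed(42)

# Simulated grid cells already matched to EPUs (two cells per EPU)
grid <- expand.grid(cell = 1:4, day = 1:10, year = 1994:2012)
grid$EPU      <- ifelse(grid$cell <= 2, "MAB", "GB")
grid$area_km2 <- c(60, 64, 58, 62)[grid$cell]   # hypothetical cell areas
grid$bt       <- 8 + 0.05 * (grid$year - 1994) + rnorm(nrow(grid), sd = 0.3)

# 1. Daily mean bottom temperature by EPU, weighted by grid cell area
groups <- split(grid, list(grid$EPU, grid$year, grid$day), drop = TRUE)
daily <- do.call(rbind, lapply(groups, function(g) {
  data.frame(EPU = g$EPU[1], year = g$year[1], day = g$day[1],
             bt = weighted.mean(g$bt, w = g$area_km2))
}))

# 2. Annual bottom temperature as the mean of the daily values
annual <- aggregate(bt ~ EPU + year, data = daily, FUN = mean)

# 3. Climatology over 1994-2010 and annual anomaly by EPU
ref  <- annual[annual$year >= 1994 & annual$year <= 2010, ]
clim <- aggregate(bt ~ EPU, data = ref, FUN = mean)
names(clim)[2] <- "bt_clim"
annual <- merge(annual, clim, by = "EPU")
annual$anomaly <- annual$bt - annual$bt_clim

head(annual[order(annual$EPU, annual$year), ])
```

By construction, the anomalies average to zero over the reference years within each EPU, which is a convenient check on the climatology step.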

### Data processing

Derived bottom temperature data were formatted for inclusion in the `ecodata` R package using the R code found [here](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_bottom_temp.R).

<!--chapter:end:chapters/bottom_temperature_GLORYS.Rmd-->

# Bottom temperature - High Resolution {#bottom_temp_seasonal_gridded}

**Description**: Seasonal bottom temperatures on the Northeast Continental Shelf between 1959 and 2022 in a 1/12° grid. 

**Indicator category**: Published Methods, Synthesis of published information

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2023); State of the Ecosystem - Mid-Atlantic Bight (2023)

**Contributor(s)**: Hubert du Pontavice, Vincent Saba, Zhuomin Chen

**Data steward**: Hubert du Pontavice, hubert.dupontavice@noaa.gov

**Point of contact**: Hubert du Pontavice, hubert.dupontavice@noaa.gov

**Public availability statement**: Source data are NOT publicly available. Please email hubert.dupontavice@noaa.gov for further information and queries of bottom temperature source data.

## Methods

### Data sources

#### Study area

The bottom temperature product covered the northeast U.S. shelf marine ecosystem (NEUS) and specifically an area of four Ecological Production Units (EPUs) defined by NOAA's Northeast Fisheries Science Center (https://noaa-edab.github.io/tech-doc/epu.html).

#### Design of the gridded bottom temperature time series

The bottom temperature product is on a horizontal 1/12 degree grid between 1959 and 2022 and is made of daily bottom temperature estimates from:

* Bias-corrected ROMS-NWA (ROMScor) between 1959 and 1992, which was regridded to the same 1/12 degree grid as GLORYS using bilinear interpolation;
* GLORYS12v1 in its original 1/12 degree grid between 1993 and 2020;
* GLO12v3 (called PSY4V3R1 in @duPontavice2023 and @Lellouche2018) in its original 1/12 degree grid for 2021;
* GLO12v4 in its original 1/12 degree grid for 2022.
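
As a small illustration of how the composite series is assembled, the year ranges in the list above can be written as a lookup. The function name `bt_source` is hypothetical; only the year ranges and product names come from the list.

```r
# Return the source product used for each year of the composite series
bt_source <- function(year) {
  stopifnot(all(year >= 1959 & year <= 2022))
  as.character(cut(
    year,
    breaks = c(1958, 1992, 2020, 2021, 2022),  # right-closed intervals
    labels = c("ROMScor", "GLORYS12v1", "GLO12v3", "GLO12v4")
  ))
}

bt_source(c(1959, 1992, 1993, 2020, 2021, 2022))
```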

#### Ocean model data

Four ocean models were used to get high-resolution daily bottom temperature on the NEUS between 1959 and 2022.

For the period between 1959 and 1992, we used daily ocean bottom temperature from the long-term (1958–2007) high-resolution numerical simulation of the Northwest Atlantic Ocean in the Regional Ocean Modeling System (ROMS), a split-explicit, free-surface, terrain-following, hydrostatic, primitive equation model (@Shchepetkin2005). The model domain covers the Northwest Atlantic Ocean with ~7 km horizontal resolution and 40 vertical terrain-following layers. A detailed description of ROMS-NWA can be found in @Chen2018.

For the period between 1993 and 2020, the daily bottom temperature outputs from the GLORYS12v1 ocean reanalysis product were used. GLORYS12v1 is a global, eddy-resolving, data-assimilating ocean hindcast from Mercator Ocean (European Union-Copernicus Marine Service, 2018; Fernandez and Lellouche, 2018; @Lellouche2021) with 1/12 degree horizontal resolution and 50 vertical levels. The base ocean model is the Nucleus for European Modelling of the Ocean 3.1 (NEMO 3.1; Madec, 2016), driven at the surface by the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA-Interim reanalysis (@Dee2011). Remotely sensed and in situ observations are jointly assimilated by means of a reduced-order Kalman filter.

For the year 2021, we used daily bottom temperature from the Operational Mercator global ocean analysis and forecast system (GLO12v3, called PSY4V3R1 in @duPontavice2023 and @Lellouche2018). GLO12v3 is a global, eddy-resolving ocean monitoring and forecasting system (@Lellouche2018) that uses the same ocean model grid as GLORYS12v1 (1/12 degree horizontal resolution and 50 vertical levels) and has many similarities with it. Remotely sensed and in situ observations are also jointly assimilated by means of a reduced-order Kalman filter.

For the year 2022, we used GLO12v4, which is a revised and updated version of GLO12v3 (European Union-Copernicus Marine Service, 2016). The general model structure is similar to GLO12v3, with some changes in model configuration, parameterizations, relaxations to avoid spurious drifts, river inputs, atmospheric fluxes and data assimilation (more detail in https://data.marine.copernicus.eu/product/GLOBAL_ANALYSISFORECAST_PHY_001_024/description).

#### Bias-correction process of ROMS-NWA

We used the methodology presented in @duPontavice2023, which is based on the Northwest Atlantic Regional Ocean Climatology (NWARC). The first step was to regrid ROMS-NWA bottom temperature over the same 1/10 degree horizontal grid as the NWARC using bilinear interpolation. Then, we conducted the bottom temperature bias-correction in the 1/10 degree NWARC grid using monthly climatologies from NWARC over four decadal periods from 1955 to 1994. A monthly bias was calculated in each 1/10 degree grid cell and for each decade (1955–1964, 1965–1974, 1975–1984, 1985–1994). Based on this monthly bias, we estimated a daily bias for each decade in each grid cell. Lastly, for each ROMS-NWA grid cell we identified the bias from the closest 1/10 degree NWARC grid cell and subtracted the daily bias from the daily ROMS-NWA bottom temperature for all years and days of each decade.
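
The monthly-to-daily step can be sketched for a single grid cell and decade as follows. Two assumptions are made here that the text does not confirm: the bias is expressed as model minus climatology, so that subtracting it removes a warm bias, and the daily bias is obtained by linear interpolation between mid-month values. All numbers are hypothetical.

```r
# Hypothetical monthly bias (model minus climatology, deg C), one cell, one decade
monthly_bias  <- c(0.8, 0.9, 1.0, 1.2, 1.4, 1.5, 1.6, 1.5, 1.3, 1.1, 0.9, 0.8)
mid_month_doy <- c(15, 46, 74, 105, 135, 166, 196, 227, 258, 288, 319, 349)

# Pad with the previous December and the next January so that the
# interpolation wraps around the year boundary
x <- c(mid_month_doy[12] - 365, mid_month_doy, mid_month_doy[1] + 365)
y <- c(monthly_bias[12], monthly_bias, monthly_bias[1])

# Assumed approach: linear interpolation of the monthly bias to daily values
daily_bias <- approx(x, y, xout = 1:365)$y

# Hypothetical uncorrected daily model bottom temperature for one year
roms_daily <- 8 + 3 * sin(2 * pi * (1:365 - 120) / 365)

# Bias-corrected daily bottom temperature
roms_corrected <- roms_daily - daily_bias
```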

### Data processing

Derived bottom temperature data were formatted for inclusion in the `ecodata` R package using the R code found [here](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_bottom_temp_comp.R).

<!--chapter:end:chapters/bottom_temperature_highres.Rmd-->

# Bottom temperature - in situ {#bottom_temp}

**Description**: Time series of annual in situ bottom temperatures on the Northeast Continental Shelf.

**Indicator category**: Extensive analysis; not yet published

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2019+); State of the Ecosystem - Mid-Atlantic Bight (2019+)

**Contributor(s)**: Paula Fratantoni, paula.fratantoni@noaa.gov

**Data steward**: Kimberly Bastille, kimberly.bastille@noaa.gov

**Point of contact**: Paula Fratantoni, paula.fratantoni@noaa.gov

**Public availability statement**: Source data are publicly available at ftp://ftp.nefsc.noaa.gov/pub/hydro/matlab_files/yearly and in the World Ocean Database housed at  http://www.nodc.noaa.gov/OC5/SELECT/dbsearch/dbsearch.html under institute code number 258.

## Methods

### Data sources

The bottom temperature index incorporates near-bottom temperature measurements collected on Northeast Fisheries Science Center (NEFSC) surveys from 1977 to the present. Early measurements were made using surface bucket samples, mechanical bathythermographs and expendable bathythermograph probes, but by 1991 the CTD (conductivity, temperature, and depth) instrument had become standard equipment on all NEFSC surveys. Near-bottom refers to the deepest observation at each station that falls within 10 m of the reported water depth. Observations encompass the entire continental shelf area extending from Cape Hatteras, NC to Nova Scotia, Canada, inclusive of the Gulf of Maine and Georges Bank.

### Data extraction

While all processed hydrographic data are archived in an Oracle database (OCDBS), we work from Matlab-formatted files stored locally. 

### Data analysis

Ocean temperature on the Northeast U.S. Shelf varies significantly on seasonal timescales. Any attempt to resolve year-to-year changes requires that this seasonal variability be quantified and removed to avoid bias. This process is complicated by the fact that NEFSC hydrographic surveys follow a stratified random sampling design: stations are not repeated at fixed locations year after year, so temperature variability cannot be assessed at fixed station locations. Instead, we consider the variation of the average bottom temperature within four [Ecological Production Units](#epu) (EPUs): Middle Atlantic Bight, Georges Bank, Gulf of Maine and Scotian Shelf. Within each EPU, ocean temperature observations are extracted from the collection of measurements made within 10 m of the bottom on each survey and an area-weighted average temperature is calculated. The result is a time series of regional average near-bottom temperature with a temporal resolution that matches the survey frequency in the database.

Anomalies are subsequently calculated relative to a reference annual cycle, estimated using a multiple linear regression model to fit an annual harmonic (365-day period) to historical regional average temperatures from 1981-2010. The curve fitting technique used to formulate the reference annual cycle follows the methodologies outlined by @mountain1991. The reference period was chosen because it is the standard climatological period adopted by the World Meteorological Organization. The resulting anomaly time series represents the difference between the time series of regional mean temperatures and the corresponding reference temperatures predicted by the reference annual cycle for the same time of year. Finally, a reference annual average temperature (calculated as the average across the reference annual cycle) is added back to the anomaly time series to convert temperature anomalies back to ocean bottom temperature.
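
The harmonic fit and anomaly calculation can be sketched with `lm()`. This is an illustration under stated assumptions, not the MATLAB code used for the indicator: the data are synthetic, the column names are hypothetical, and the harmonic is written as sine and cosine terms with a 365-day period so that it can be fitted by multiple linear regression.

```r
# Synthetic regional mean bottom temperatures for one EPU, four surveys per year
obs <- data.frame(
  year = rep(1981:2012, each = 4),
  doy  = rep(c(60, 150, 240, 330), times = 32)
)
# Synthetic truth: 8 deg C mean, 3 deg C annual harmonic, +1 deg C after 2010
obs$temp_c <- 8 + 3 * cos(2 * pi * (obs$doy - 250) / 365) +
  ifelse(obs$year > 2010, 1, 0)

# Reference annual cycle: annual harmonic fitted over 1981-2010
ref <- subset(obs, year >= 1981 & year <= 2010)
fit <- lm(temp_c ~ sin(2 * pi * doy / 365) + cos(2 * pi * doy / 365), data = ref)

# Anomaly: observed regional mean minus the reference cycle at the same time of year
obs$anomaly <- obs$temp_c - predict(fit, newdata = obs)

# Reference annual average: mean of the fitted cycle across the year,
# added back to express anomalies as bottom temperature
ref_annual_mean <- mean(predict(fit, newdata = data.frame(doy = 1:365)))
obs$temp_adj <- obs$anomaly + ref_annual_mean
```

Because the seasonal cycle has been removed, `temp_adj` is comparable across surveys that took place at different times of year.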


### Data processing

Derived bottom temperature data were formatted for inclusion in the `ecodata` R package using the R code found [here](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_bottom_temp.R).

<!--chapter:end:chapters/bottom_temperature_insitu.Rmd-->

# Calanus Stage

**Description**: *Calanus* abundance by life stage

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2021), State of the Ecosystem - Mid-Atlantic (2021)

**Indicator category**:  Database pull with analysis

**Contributor(s)**:  Ryan Morse

**Data steward**: Ryan Morse <ryan.morse@noaa.gov>
  
**Point of contact**: Ryan Morse <ryan.morse@noaa.gov>
  
**Public availability statement**:  Please contact Harvey Walsh (<harvey.walsh@noaa.gov>) for raw data. 


## Methods

### Data sources
Zooplankton data are from the National Oceanic and Atmospheric Administration Marine Resources Monitoring, Assessment and Prediction (MARMAP) program and Ecosystem Monitoring (EcoMon) cruises detailed extensively in @Kane2007, @Kane2011, and @Morse2017.


### Data analysis
This index tracks the overall abundance of mature adult *Calanus finmarchicus* copepods and immature copepodite stage-5 (c5) *Calanus finmarchicus* copepods on the US Northeast Shelf ecosystem. The life cycle of *C. finmarchicus* relies on an overwintering phase (diapause): immature c5 copepodites build a lipid reserve before entering diapause and remain at depth until favorable conditions for growth emerge. Because of this lipid reserve, diapausing c5 copepodites are a primary food source for many organisms, including the North Atlantic right whale.

Data are processed similarly to @Morse2017, except that cruises were partitioned into three seasons based on the median day of the year (DOY) for a given cruise. Cruises with a median DOY between 0 and 120 were classified as spring (i.e., their bimonthly median dates correspond to 1 or 3). Cruises with a median DOY between 121 and 243 were classified as summer (bimonthly means of 5 or 7). Cruises with a median DOY between 244 and 366 were classified as fall (bimonthly mean cruise date of 9 or 11). Samples were assigned to EPUs based on their location and converted from raw counts to number per 100 m^3^ following MARMAP protocols. Samples were then aggregated to EPU by year using log-transformed abundance. Cruises with fewer than 10 sampling days were removed because the surveys were incomplete. Samples were limited to *Calanus finmarchicus* adults and copepodite stage-5 (c5) for inclusion as an indicator.
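
The season assignment and cruise filter described above can be sketched in base R. The data frame `cruises` and its column names are hypothetical; the DOY breaks and the 10-day threshold are those given in the text.

```r
# Hypothetical cruise summary: median day of year and number of sampling days
cruises <- data.frame(
  cruise      = c("A", "B", "C", "D"),
  median_doy  = c(75, 160, 300, 120),
  sample_days = c(14, 12, 8, 21)
)

# Season from median DOY: 0-120 spring, 121-243 summer, 244-366 fall
cruises$season <- cut(cruises$median_doy,
                      breaks = c(0, 120, 243, 366),
                      labels = c("spring", "summer", "fall"),
                      include.lowest = TRUE)

# Remove incomplete surveys (fewer than 10 sampling days)
cruises <- subset(cruises, sample_days >= 10)
```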


Code used to analyze *Calanus* stage data can be found [at this link](https://github.com/NOAA-EDAB/tech-doc/blob/master/R/stored_scripts/CalanusStage_SOE.R). 

### Data processing

The Calanus Stage indicator was formatted for inclusion in the `ecodata` R package using the R script  found [here](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_calanus_stage.R).

<!--chapter:end:chapters/calanus_stage.Rmd-->

# Catch and Fleet Diversity {#commercial_div}

**Description**: Permit-level species diversity and Council-level fleet diversity.

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2018+), State of the Ecosystem - Mid-Atlantic (2018+)

**Indicator category**: Database pull with analysis; Published methods

**Contributor(s)**: Geret DePiper, Min-Yang Lee
  
**Data steward**: Geret DePiper, <geret.depiper@noaa.gov>
  
**Point of contact**: Geret DePiper, <geret.depiper@noaa.gov>
  
**Public availability statement**: Source data are not publicly available due to PII restrictions. Derived time series are available for download [here](https://comet.nefsc.noaa.gov/erddap/tabledap/comm_data_soe_v1.html).
  
## Methods
Diversity estimates have been developed to understand whether specialization, or alternatively stovepiping, is occurring in fisheries of the Northeastern Large Marine Ecosystem. We use the average effective Shannon indices for species revenue at the permit level, for all permits landing any amount of [NEFMC](https://www.nefmc.org/) or [MAFMC](http://www.mafmc.org/) Fishery Management Plan (FMP) species within a year (including both Monkfish and Spiny Dogfish). We also use the effective Shannon index of fleet revenue diversity and count of active fleets to assess the extent to which the distribution of fishing changes across fleet segments.

### Data sources
Data for these diversity estimates come from a variety of sources, including the Commercial Fishery Dealer Database, Vessel Trip Reports, Clam logbooks, vessel characteristics from the Permit database, and the WPU series producer price index. These data are typically not available to the public.

### Data extraction 
The following describes both the permit-level species and fleet diversity data generation. Price data were extracted from the Commercial Fishery Dealer database (CFDERS) and linked to Vessel Trip Reports by a hierarchical matching algorithm that matched date and port of landing at its highest resolution. Code used in these analyses is available upon request.

<!-- For NOAA personnel: Code currently archived in the \\\\net\\home2\\mlee\\diversity\\code folder, while data is currently archived in \\\\net\\home2\\mlee\\diversity folder. -->

Output data were then matched to vessel characteristics from the VPS VESSEL data set. For the permit-level estimate, species groups are based on a slightly refined NESPP3 code (Table \@ref(tab:spp-groupings)), defined in the data as "myspp", which is further developed in the script to rectify inconsistencies in the data.


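```{r spp-groupings, eval = TRUE}
# Species groupings used for the permit-level diversity estimate
# (`%>%` is assumed to be attached in the book's setup chunk)
raw.dir <- here::here("data")
spp <- read.csv(file.path(raw.dir, "spp_groupings.csv"), stringsAsFactors = FALSE)

# Rename columns for display and keep only those shown in the table
spp <- spp %>%
  dplyr::rename(`Common Name` = COMNAME,
                `Scientific Name` = SCINAME) %>%
  dplyr::select(Group, NESPP3, `Common Name`, `Scientific Name`)

# Render the table: repeat the header across pages and merge repeated group labels
knitr::kable(spp, caption = "Species grouping", booktabs = TRUE, longtable = TRUE) %>%
  kableExtra::kable_styling(
    latex_options = c("repeat_header", "scale_down"), font_size = 5) %>%
  kableExtra::collapse_rows(columns = 1)
```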


For the fleet diversity metric, gears include:

* scallop dredge (gearcodes DRS, DSC, DTC, and DTS);
* other dredges (gearcodes DRM, DRO, and DRU);
* gillnet (gearcodes GND, GNT, GNO, GNR, and GNS);
* hand (gearcode HND);
* longline (gearcodes LLB and LLP);
* bottom trawl (gearcodes OTB, OTF, OTO, OTC, OTS, OHS, OTR, OTT, and PTB);
* midwater trawl (gearcodes OTM and PTM);
* pot (gearcodes PTL, PTW, PTC, PTE, PTF, PTH, PTO, PTS, and PTX);
* purse seine (gearcode PUR);
* hydraulic clam dredge (gearcode DRC).

Vessels were further grouped by length categories of less than 30 feet, 30 to 50 feet, 50 to 75 feet, and 75 feet and above. All revenue was deflated to real dollars using the "WPU0223" Producer Price Index with a base of January 2015. Stata code for data processing is available [here](https://github.com/NOAA-EDAB/tech-doc/tree/master/data/Human_Dimensions_code).
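
The fleet segment assignment and the deflation step can be sketched as follows. This is an R illustration, not the Stata code linked above. The data frame `trips`, the index values, and the base value `ppi_base` are hypothetical, and the treatment of boundary lengths (a 30-foot vessel falls in the 30 to 50 foot category) is an assumption.

```r
# Hypothetical trips with nominal revenue and the PPI value for the landing month
trips <- data.frame(
  gear      = c("scallop dredge", "gillnet", "bottom trawl", "pot"),
  length_ft = c(28, 30, 74.9, 90),
  nominal   = c(1000, 2000, 3000, 4000),
  ppi       = c(80, 100, 125, 100)
)
ppi_base <- 100  # hypothetical index value for the base month (January 2015)

# Length categories; intervals are left-closed (assumption)
trips$length_cat <- cut(trips$length_ft,
                        breaks = c(0, 30, 50, 75, Inf),
                        labels = c("<30", "30-50", "50-75", "75+"),
                        right = FALSE)

# A fleet segment is a gear and length combination
trips$fleet <- paste(trips$gear, trips$length_cat)

# Deflate nominal revenue to real (base-period) dollars
trips$real <- trips$nominal * ppi_base / trips$ppi
```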

### Data analysis
The permit-level species effective Shannon index is calculated as 
$$\exp\left(-\sum_{i=1}^{N}p_{ijt}\ln(p_{ijt})\right)$$
for all $j$, with $p_{ijt}$ representing the proportion of revenue generated by species or species group $i$ for permit $j$ in year $t$, and is a composite of richness (the number of species landed) and abundance (the revenue generated from each species). The annual arithmetic mean value of the effective Shannon index across permits is used as the indicator of permit-level species diversity. 

In a similar manner, the fleet diversity metric is estimated as 
$$\exp\left(-\sum_{k=1}^{K}p_{kt}\ln(p_{kt})\right)$$
where $p_{kt}$ represents the proportion of total revenue generated by fleet segment $k$ (gear and length combination) in year $t$, and $K$ is the number of fleet segments. The indices each run from 1996 to 2017. A count of the number of fleets active in every year is also provided to assess whether changes in fleet diversity are caused by shifts in richness (number of fleets) or evenness (concentration of revenue). The work is based on analysis conducted in @eric_m_thunberg_measures_2015 and published in @gaichas_framework_2016.
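
Both indices use the same calculation applied to different revenue shares, which can be sketched in a few lines of R. The function name `effective_shannon` and all revenue values are hypothetical; zero-revenue entries are dropped because $p\ln(p)$ tends to zero as $p$ tends to zero.

```r
# Effective Shannon index: exp(-sum(p * ln(p))) over revenue shares p
effective_shannon <- function(revenue) {
  p <- revenue[revenue > 0] / sum(revenue)
  exp(-sum(p * log(p)))
}

# Hypothetical revenue by species group for two permits in one year
permit_rev <- list(
  permit_1 = c(cod = 500, haddock = 500),      # even split across two groups
  permit_2 = c(scallop = 1000, monkfish = 0)   # fully specialized
)
permit_index     <- sapply(permit_rev, effective_shannon)
permit_diversity <- mean(permit_index)  # annual mean across permits

# Hypothetical revenue by fleet segment (gear and length) in one year
fleet_rev <- c(trawl_50_75 = 400, trawl_75plus = 400,
               pot_lt30 = 100, gillnet_30_50 = 100)
fleet_diversity <- effective_shannon(fleet_rev)
fleet_count     <- sum(fleet_rev > 0)   # number of active fleets
```

An even split across two species groups gives an index of 2, and a fully specialized permit gives 1. In the fleet example the index is below the count of four active fleets because revenue is concentrated in two of them.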

### Data processing

Catch and fleet diversity indicators were formatted for inclusion in the `ecodata` R package using the R script  found [here](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_commercial_div.R).

<!--chapter:end:chapters/Catch_and_Fleet_Diversity_indicators.Rmd-->

# Chesapeake Bay Salinity and Temperature {#ches_bay_sal}

**Description**: Chesapeake Bay Salinity and Temperature

**Found in**: State of the Ecosystem - Mid-Atlantic (2020+)

**Indicator category**: Database pull with analysis

**Contributor(s)**: Bruce Vogt, Charles Pellerin
  
**Data steward**: Charles Pellerin, <charles.pellerin@noaa.gov>
  
**Point of contact**: Bruce Vogt, <bruce.vogt@noaa.gov> 
  
**Public availability statement**: Source data are publicly available.

## Methods


### Data sources
The National Oceanic and Atmospheric Administration’s (NOAA) Chesapeake Bay Interpretive Buoy System ([CBIBS](https://buoybay.noaa.gov/data)) is a network of observing platforms (buoys) that collect meteorological, oceanographic, and water-quality data and relay that information using wireless technology. The stations have been in place since 2007. The Sting Ray station was deployed in July of 2008 and has been monitoring conditions intermittently since then. Data are recorded in situ and sent to a server over a cellular modem. 

The standard CBIBS instrument is a WETLabs WQM (water quality monitor) mounted in the buoy well approximately 0.5 meters below the surface. Sea-Bird Scientific has since purchased WETLabs and is now the manufacturer of the instruments. The WQM instruments are calibrated and swapped out on a regular basis. Salinity is stored as a `double` with units of PSU.

### Data extraction 
Data are inserted directly into a database from the real-time system over the cellular network. The general public can use [this link](https://buoybay.noaa.gov/observations/data-download) to explore and pull data from the CBIBS database. The process for data extraction for this indicator can be found [here](https://github.com/NOAA-EDAB/tech-doc/tree/master/R/stored_scripts/ches_bay_sal_extraction.txt).


### Data analysis
The data are processed with a [Python script](https://github.com/NOAA-EDAB/tech-doc/tree/master/R/stored_scripts/ches_bay_sal_analysis.py). The script creates an array and runs the data through a [QARTOD](https://ioos.noaa.gov/project/qartod/) (Quality Assurance/Quality Control of Real-Time Oceanographic Data) routine. The result is a set of quality flags. Only data flagged as good are used in the indicator plot.

The stations include Annapolis ([AN](https://buoybay.noaa.gov/locations/annapolis)), Gooses Reef ([GR](https://buoybay.noaa.gov/locations/gooses-reef)), Potomac ([PL](https://buoybay.noaa.gov/locations/potomac)), and York Spit ([YS](https://buoybay.noaa.gov/locations/york-spit)). 
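
The idea of flagging observations and keeping only the good ones can be illustrated with a QARTOD-style gross range test. The sketch below is written in R for consistency with the rest of this document and is not the Python script linked above. The function name `gross_range_flag`, the thresholds, and the observations are hypothetical; the flag codes follow the QARTOD convention (1 = pass, 3 = suspect, 4 = fail, 9 = missing).

```r
# Hypothetical hourly salinity observations (PSU) from one buoy
obs <- data.frame(
  time     = as.POSIXct("2020-07-01 00:00", tz = "UTC") + 3600 * 0:5,
  salinity = c(12.1, 12.3, NA, 55.0, 12.2, 31.0)
)

# Gross range test: values outside fail_span fail, values outside
# suspect_span are suspect, missing values are flagged as missing
gross_range_flag <- function(x, fail_span, suspect_span) {
  flag <- rep(1L, length(x))
  flag[!is.na(x) & (x < suspect_span[1] | x > suspect_span[2])] <- 3L
  flag[!is.na(x) & (x < fail_span[1] | x > fail_span[2])] <- 4L
  flag[is.na(x)] <- 9L
  flag
}

# Placeholder thresholds, not the values used for CBIBS
obs$flag <- gross_range_flag(obs$salinity,
                             fail_span = c(0, 40), suspect_span = c(1, 30))

# Keep only observations that pass
good <- obs[obs$flag == 1L, ]
```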

### Data processing
Code for processing salinity data can be found [here](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_ch_bay_sal.R).

<!--chapter:end:chapters/ches_bay_sal.Rmd-->

# Chesapeake Bay Seasonal SST Anomalies {#ches_bay_sst}

**Description**:  Chesapeake Bay Seasonal SST Anomalies

**Found in**: State of the Ecosystem - Mid-Atlantic (2021+)

**Indicator category**: Database pull with analysis

**Contributor(s)**: Bruce Vogt, Ron Vogel
  
**Data steward**: Ron Vogel, <ronald.vogel@noaa.gov>
  
**Point of contact**: Bruce Vogt, <bruce.vogt@noaa.gov> 
  
**Public availability statement**: Source data are publicly available [here](https://eastcoast.coastwatch.noaa.gov/cw_avhrr.php).

## Methods


### Data sources
Data for Chesapeake Bay seasonal sea surface temperature (SST) anomalies were derived from the NOAA Multi-satellite AVHRR SST data set, available from NOAA CoastWatch [East Coast Regional Node](https://eastcoast.coastwatch.noaa.gov). The data set is a composite of overpasses from all operational satellites currently flying the Advanced Very High Resolution Radiometer (AVHRR) instrument. SST is derived using the Operational Non-linear Multichannel SST Algorithm (@Li2001a, @Li2001b). Both daytime and nighttime overpasses are composited into daily and then seasonal SST products. The data extend from 2008 to present, and provide a 1.25 km x 1.25 km grid of SST measurements.

### Data analysis
Anomaly maps of SST are generated by first creating a long-term ‘climatological’ seasonal average SST for the years from 2008 to the year immediately prior to the current year `(max(Year) - 1)`. This reference period serves as a benchmark for comparing current observations. The long-term seasonal average is then subtracted from the current-year seasonal SST to give the anomaly. Seasons for Chesapeake Bay are Dec-Feb (winter), Mar-May (spring), Jun-Aug (summer), and Sep-Nov (fall).
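
For a single grid cell, the seasonal anomaly calculation can be sketched as follows. The data frame `sst` and its values are hypothetical, and the sign convention (current-year value minus long-term average, so that positive anomalies are warmer than the reference) is assumed.

```r
# Hypothetical seasonal mean SST (deg C) for one grid cell
sst <- data.frame(
  Year   = rep(2008:2012, each = 2),
  Season = rep(c("winter", "summer"), times = 5),
  sst_c  = c(4.0, 25.0, 4.5, 25.5, 5.0, 26.0, 4.5, 25.5, 6.0, 27.0)
)

current_year <- max(sst$Year)

# Climatology: seasonal mean over 2008 through max(Year) - 1
ref  <- subset(sst, Year >= 2008 & Year <= current_year - 1)
clim <- aggregate(sst_c ~ Season, data = ref, FUN = mean)
names(clim)[2] <- "clim_c"

# Anomaly: current-year seasonal SST minus the long-term seasonal average
cur <- merge(subset(sst, Year == current_year), clim, by = "Season")
cur$anomaly <- cur$sst_c - cur$clim_c
```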


### Data processing
Code for processing Chesapeake Bay temperature data can be found [here](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_ch_bay_temp.R).

<!--chapter:end:chapters/ches_bay_sst.Rmd-->

# Chesapeake Bay Water Quality Standards Attainment {#ches_bay_wq}

**Description**: A multimetric indicator describing the attainment status of Chesapeake Bay with respect to three water quality standards criteria, namely, dissolved oxygen, chlorophyll-a, and water clarity/submerged aquatic vegetation. 

**Indicator category**: Published method; Database pull with analysis

**Found in**: State of the Ecosystem - Mid-Atlantic (2019, 2022)

**Contributor(s)**: Qian Zhang, Richard Tian, and Peter Tango
  
**Data steward**: Qian Zhang, <qzhang@chesapeakebay.net>
  
**Point of contact**: Qian Zhang, <qzhang@chesapeakebay.net>
  
**Public availability statement**: Data are publicly available (see Data Sources below).

## Methods

To protect the aquatic living resources of Chesapeake Bay, the [Chesapeake Bay Program](https://www.chesapeakebay.net/) (CBP) partnership has developed a guidance framework of ambient water quality criteria with designated uses and assessment procedures for dissolved oxygen, chlorophyll-a, and water clarity/submerged aquatic vegetation (SAV) [@usepa2003]. To achieve consistent assessment over time and between jurisdictions, a multimetric indicator was proposed by the CBP partnership to provide a means for tracking the progress in all 92 management segments of Chesapeake Bay [@usepa2017]. This indicator has been computed for each three-year assessment period since 1985-1987, providing an integrated measure of Chesapeake Bay’s water quality condition over the last three decades.

### Data sources

The multimetric indicator required monitoring data on dissolved oxygen (DO) concentrations, chlorophyll-a concentrations, water clarity, SAV acreage, water temperature, and salinity. SAV acreage has been measured by the Virginia Institute of Marine Science in collaboration with the CBP and is available via http://web.vims.edu/bio/sav/StateSegmentAreaTable.htm. Data for all other parameters were obtained from the [CBP Water Quality Database](http://www.chesapeakebay.net/data/downloads/cbp_water_quality_database_1984_present). These data have been routinely reported to the CBP by the Maryland Department of Natural Resources, Virginia Department of Environmental Quality, Old Dominion University, Virginia Institute of Marine Science, and citizen/volunteer monitoring initiatives.

### Data analysis

**Criteria attainment assessment**

Monitoring data of DO, chlorophyll-a, and water clarity/SAV were processed and compared with water quality criteria thresholds according to different designated uses (DUs). These DUs are migratory spawning and nursery (MSN), open water (OW), deep water (DW), deep channel (DC), and shallow water (SW), which reflect the seasonal nature of water column structure and the life history needs of living resources. Station-level DO and chlorophyll-a data were spatially interpolated in three dimensions.

Salinity and water temperature data were used to compute the vertical density structure of the water column, which was translated into layers of different DUs. Criteria attainment was determined by comparing violation rates over a 3-year period to a reference cumulative frequency distribution that represents the extent of allowable violation. This approach was implemented using FORTRAN codes, which are provided as a zipped folder. For water clarity/SAV, the single best year in the 3-year assessment period was compared with the segment-specific acreage goal, the water clarity goal, or a combination of both. For more details, refer to the Methods section of @zhang2018.

**Indicator calculation**

The multimetric indicator quantifies the fraction of segment-DU-criterion combinations that meet all applicable season-specific thresholds for each 3-year assessment period from 1985-1987 to 2017-2019. For each 3-year assessment period, all applicable segment-DU-criterion combinations were evaluated in a binomial fashion and scored 1 for “in attainment” and 0 for “nonattainment”. The classified status of each segment-DU-criterion combination was weighted via segments’ surface area and summed to obtain the multimetric index score. This weighting scheme was adopted for two reasons: (1) segments vary in size over four orders of magnitude, and (2) surface area of each segment does not change with time or DUs, unlike seasonally variable habitat volume or bottom water area [@usepa2017]. For more details, refer to the Methods section of @zhang2018.
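
The scoring and weighting can be sketched for one assessment period. The data frame `combos` and its values are hypothetical, and normalizing by the total weighted area (so that the score is a percentage) is an assumption; see @zhang2018 for the full procedure.

```r
# Hypothetical segment-DU-criterion combinations for one 3-year assessment period
combos <- data.frame(
  segment  = c("S1", "S1", "S2", "S3"),
  du       = c("OW", "DW", "OW", "SW"),
  area_km2 = c(100, 100, 10, 40),  # segment surface area
  attained = c(1, 0, 1, 1)         # 1 = in attainment, 0 = nonattainment
)

# Surface-area-weighted share of combinations in attainment (percent)
score <- 100 * sum(combos$area_km2 * combos$attained) / sum(combos$area_km2)
```

With these values the unweighted share in attainment is 75%, while the area-weighted score is 60% because the one nonattaining combination belongs to a large segment.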

The indicator provides an integrated measure of Chesapeake Bay’s water quality condition (Figure 1). In 2017-2019, 33.1% of all tidal water segment-DU-criterion combinations are estimated to have met or exceeded applicable water quality criteria thresholds, which marks the best 3-year status since 1985-1987. The indicator has a positive and statistically significant trend from 1985-1987 to 2017-2019, which shows that Chesapeake Bay is on a positive trajectory toward recovery. This pattern was statistically linked to total nitrogen reduction, indicating responsiveness of attainment status to management actions implemented to reduce nutrients in the system. 

Patterns of attainment of individual DUs are variable (Figure 2). Changes in OW-DO, DC-DO, and water clarity/SAV have shown long-term improvements, which have contributed to overall attainment indicator improvement. By contrast, MSN-DO attainment experienced a sharp spike in the first few assessment periods but generally degraded after the 1997-1999 assessment period, which has implications for the survival, growth, and reproduction of migratory and resident tidal freshwater fish during the spawning and nursery season in tidal freshwater to low-salinity habitats. The status and trends of tidal segments’ attainment may be used to inform siting decisions of aquaculture operations in Chesapeake Bay.

### Data processing

The indicator data set was formatted for inclusion in the `ecodata` R package using the R script found [here](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_ches_bay_wq.R). 


<!--chapter:end:chapters/ches_bay_water_quality.Rmd-->

# Cold Pool Index {#cold_pool}

**Description**: Cold Pool Index - three annual cold pool indices (and the standard errors) between 1958 and 2021. 

**Found in**: State of the Ecosystem - Mid-Atlantic (2020 (Different Methods), 2021 (Different Methods), 2022+)

**Indicator category**: Published methods, Extensive analysis, not yet published

**Contributor(s)**: Hubert du Pontavice, Vincent Saba, Zhuomin Chen
  
**Data steward**: Kimberly Bastille <Kimberly.bastille@noaa.gov>
  
**Point of contact**: Hubert du Pontavice <hubert.dupontavice@noaa.gov>
  
**Public availability statement**: Source data are NOT publicly available. Please email hubert.dupontavice@noaa.gov for further information and for access to the ROMS-NWA bottom temperature data.

## Methods
The methodology for the cold pool index changed between the 2020, 2021, and 2022 SOEs. The most recent methods are described first, with older methods below.


The cold pool is an area of relatively cold bottom water that forms on the US northeast shelf in the Mid-Atlantic Bight.


### Data Sources
The three cold pool indices were calculated using a high-resolution long-term bottom temperature product. All the details on the bottom temperature dataset are available in the [Bottom Temperature - High Resolution](https://noaa-edab.github.io/tech-doc/bottom-temperature---high-resolution.html) chapter and in @duPontavice2023.


### Data Analysis 

#### Cold Pool Domain

The first step was to define the Cold Pool domain, which is typically located within the MAB and the southern flank of Georges Bank (@Chen2018; @Houghton1982; @Lentz2017). Here, we delineated a spatial domain covering the management area of the SNEMA yellowtail flounder (since this method was initially developed to study the Cold Pool impact on yellowtail flounder recruitment), comprising the MAB and the SNE shelf between the 20 and 200 m isobaths (@Chen2018; @Chen2020). We restricted the time period to June (to match the start of the settlement period; @Sullivan2005) through September (the average end date of the Cold Pool, calendar day 269, estimated by @Chen2020). The Cold Pool domain was defined as the area in which average bottom temperature was cooler than 10°C between June and September from 1959 to 2022. We then developed the three Cold Pool indices using bottom temperature from ocean models.
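
The domain definition amounts to a threshold on the long-term mean of each grid cell. A minimal sketch, using a hypothetical matrix of June-September mean bottom temperatures with three cells and three years standing in for the full 1959-2022 grid:

```r
# Hypothetical June-September mean bottom temperature (deg C):
# rows are grid cells, columns are years
bt_jun_sep <- matrix(c( 8.0,  9.0,  8.5,
                        9.5, 10.0,  9.0,
                       11.0, 12.0, 11.5),
                     nrow = 3, byrow = TRUE,
                     dimnames = list(paste0("cell", 1:3),
                                     c("1959", "1960", "1961")))

# A cell belongs to the Cold Pool domain if its long-term mean is below 10 deg C
long_term_mean <- rowMeans(bt_jun_sep)
in_domain <- long_term_mean < 10
```

The second cell reaches 10°C in one year but stays in the domain because its long-term mean is 9.5°C.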

#### Cold Pool Index (Model_CPI)

The Cold Pool Index (Model_CPI) was adapted from @miller2016 based on the method developed in @dupontavice2022. Residual temperature was calculated in each grid cell, $i$, in the Cold Pool domain as the difference between the average bottom temperature in year $y$ ($T_{i,y}$) and the average bottom temperature over the period 1959–2022 ($\bar{T}_{i,1959-2022}$) between June and September. Model_CPI was calculated as the mean residual temperature over the Cold Pool domain such that:


$$\mathrm{CPI}_y = \frac{\sum_{i=1}^{n}\left(T_{i,y} - \bar{T}_{i,1959-2022}\right)}{n}$$


where $n$ is the number of grid cells over the Cold Pool domain.
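
A minimal sketch of the Model_CPI calculation, using a hypothetical matrix of June-September mean bottom temperatures for two grid cells inside the Cold Pool domain and three years in place of the full 1959-2022 period:

```r
# Hypothetical June-September mean bottom temperature (deg C) in the domain:
# rows are grid cells, columns are years
bt <- matrix(c(8.0,  9.0, 8.5,
               9.5, 10.0, 9.0),
             nrow = 2, byrow = TRUE,
             dimnames = list(c("cell1", "cell2"), c("1959", "1960", "1961")))

# Residual temperature: each cell-year value minus the long-term mean of that cell
residual <- bt - rowMeans(bt)

# Model_CPI: mean residual over the n grid cells, one value per year
cpi <- colMeans(residual)
```

Negative values indicate years in which the domain was colder than its long-term mean.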

#### Persistence Index (Model_PI)

The temporal component of the Cold Pool was calculated using the persistence index (Model_PI). Model_PI measures the duration of the Cold Pool and is estimated using the month when bottom temperature rises above 10°C after the Cold Pool is formed each year. We first selected the area over the Cold Pool domain in which bottom temperature falls below 10°C between June and October. We then calculated the “residual month” in each grid cell, $i$, in the Cold Pool domain as the difference between the month when bottom temperature rises above 10°C in year $y$ and the average of those months over the period 1959–2022. Then, Model_PI was calculated as the mean “residual month” over the Cold Pool domain:


$${PI}_y = \frac{\sum_{i=1}^{n}{\left({Month}_{i,\ y} - {\overline{Month}}_{i,\ 1959-2022}\right)}}{n}$$
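The two steps behind the persistence index can be sketched as follows. Both function names are hypothetical, and the handling of cells that never warm above the threshold (returned as `NA` and ignored in the averages) is an assumption of this sketch rather than a documented rule.

```r
# Month in which bottom temperature first rises above the threshold
# after it has dropped below it; NA if that never happens.
warming_month <- function(temp, months, threshold = 10) {
  below <- which(temp < threshold)
  if (length(below) == 0) return(NA_integer_)
  after <- which(temp > threshold & seq_along(temp) > below[1])
  if (length(after) == 0) NA_integer_ else months[after[1]]
}

# month_mat: rows = grid cells, columns = years, values = warming month
persistence_index <- function(month_mat) {
  clim  <- rowMeans(month_mat, na.rm = TRUE)  # mean warming month per cell
  resid <- sweep(month_mat, 1, clim)          # "residual month"
  colMeans(resid, na.rm = TRUE)               # mean over cells, per year
}

# Hypothetical monthly series for one cell, June (6) to October (10)
wm <- warming_month(c(9.0, 9.5, 9.8, 10.5, 11.0), months = 6:10)

# Hypothetical warming months for two cells over three years
month_mat <- matrix(c(9, 10, 8,
                      9,  9, 9),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(NULL, c("1959", "1960", "1961")))
pi_y <- persistence_index(month_mat)
```

Positive values correspond to years in which the Cold Pool persisted later than average.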


#### Spatial Extent Index (Model_SEI)

The spatial component of the Cold Pool and the habitat it provides was calculated using the Spatial Extent Index (Model_SEI). Model_SEI is estimated as the number of grid cells where bottom temperature remains below 10°C for at least 2 months between June and September.
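A minimal sketch of the cell count is given below. It assumes monthly mean bottom temperature is stored as a three-dimensional array (cell, month, year) and counts any two cold months, not necessarily consecutive ones; the text does not state which interpretation is used, so this is an assumption. The function name and values are hypothetical.

```r
# bt: array [cell, month, year] of monthly mean bottom temperature (°C)
#     for June-September
spatial_extent_index <- function(bt, threshold = 10, min_months = 2) {
  # Number of months below the threshold for each cell and year
  cold_months <- apply(bt < threshold, c(1, 3), sum)
  # Number of cells meeting the minimum duration, per year
  colSums(cold_months >= min_months)
}

# Hypothetical data: 3 cells x 4 months x 2 years (filled cell-first)
bt <- array(c( 8.0,  9.5, 11.0,   # year 1, June
               9.0, 10.2, 11.0,   # year 1, July
               9.5, 10.8, 12.0,   # year 1, August
              10.5, 11.0, 12.0,   # year 1, September
               8.0,  9.0, 10.5,   # year 2, June
               8.0,  9.8, 10.5,   # year 2, July
               9.0, 10.1, 11.0,   # year 2, August
               9.0, 11.0, 11.0),  # year 2, September
            dim = c(3, 4, 2))

sei <- spatial_extent_index(bt)
sei
```

Multiplying the count by the grid-cell area would convert the index to an areal extent, if that is needed.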

All of the above methods follow @dupontavice2022.

Bottom temperatures from the GLORYS reanalysis and the Global Ocean Physics Analysis were not processed.

Bottom temperature from ROMS-NWA (used for the period 1959-1992) was bias-corrected. Previous studies that focused on the ROMS-NWA-based Cold Pool highlighted a strong and consistent warm bias in bottom temperature of about 1.5°C during the stratified seasons over the period 1958-2007 (@Chen2018; @Chen2020). To bias-correct bottom temperature from ROMS-NWA, we used the monthly climatologies of observed bottom temperature from the Northwest Atlantic Ocean regional climatology (NWARC) over decadal periods from 1955 to 1994. The NWARC provides high-resolution (1/10° grid) fields of quality-controlled in situ ocean temperature based on a large volume of observed temperature data (@Seidov2016a, @Seidov2016b) (https://www.ncei.noaa.gov/products/northwest-atlantic-regional-climatology). The first step was to re-grid the ROMS-NWA to obtain bottom temperature over the same 1/10° grid as the NWARC. Then, a monthly bias was calculated in each grid cell and for each decade (1955–1964, 1965–1974, 1975–1984, 1985–1994) in the MAB and on the SNE shelf:

$${BIAS}_{i,\ d}=\ T_{i,d}^{Climatology}\ -\ {\bar{T}}_{i,\ d}^{ROMS-NWA}$$


where $T_{i,d}^{Climatology}$ is the NWARC bottom temperature in grid cell $i$ for decade $d$ and ${\bar{T}}_{i,\ d}^{ROMS-NWA}$ is the average ROMS-NWA bottom temperature over decade $d$ in grid cell $i$. All of the above methods follow @dupontavice2022.
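The bias term can be illustrated for a single month and decade. The sketch below computes the bias exactly as defined above and then adds it to a ROMS-NWA value; applying the correction by simple addition is an assumption of this example, and all object names and numbers are hypothetical.

```r
# Hypothetical values for one month and one decade, two grid cells (°C)
nwarc_clim        <- c(cell1 = 9.9,  cell2 = 9.6)   # observed climatology
roms_decadal_mean <- c(cell1 = 11.4, cell2 = 10.9)  # ROMS-NWA decadal mean

# Bias as defined above: climatology minus model decadal mean
bias <- nwarc_clim - roms_decadal_mean

# Assumed correction step: add the bias to the model value
# for the same cell, month, and decade
roms_monthly <- c(cell1 = 11.0, cell2 = 10.5)
roms_corrected <- roms_monthly + bias
roms_corrected
```

A negative bias lowers the model temperature, which is consistent with the warm bias reported for ROMS-NWA.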

### Data processing

Code used to process the cold pool indicator can be found in the `ecodata` package [here](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_cold_pool.R).

## 2021 Methods
**Point of Contact**: Zhuomin Chen, <zhuomin.chen@uconn.edu>

### Data Sources
The three-dimensional temperature of the Northeast US shelf is downloaded from the CMEMS (https://marine.copernicus.eu/). Source data is available [at this link](https://resources.marine.copernicus.eu/?option=com_csw&task=results?option=com_csw&view=details&product_id=GLOBAL_REANALYSIS_PHY_001_030).

### Data Analysis
Depth-averaged spatial temperature is calculated based on the daily Cold Pool dataset, which is quantified following @Chen2018.

### Data processing

Code used to process the cold pool indicator can be found in the `ecodata` package [here](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_cold_pool.R).

## 2020 Methods
**Point of Contact**: Chris Melrose, <chris.melrose@noaa.gov>

### Data sources
Data were drawn from the NEFSC Hydrographic Database. These data represent the annual mean bottom temperature residual for September-October in the Mid-Atlantic Bight cold pool region from 1977 to 2018.

### Data extraction 


### Data analysis
Methods are published in @miller2016. The [original MATLAB source code](https://github.com/NOAA-EDAB/tech-doc/tree/master/R/stored_scripts/cold_pool_analysis.txt) used in that paper was provided by Jon Hare and was used in this analysis.


<!--chapter:end:chapters/cold_pool_index.Rmd-->

# Commercial Landings Data {#comdat}

**Description**: Commercial landings data pull

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2017+), State of the Ecosystem - Mid-Atlantic (2017+)

**Indicator category**: Database pull

**Contributor(s)**: Sean Lucey

**Data steward**: Sean Lucey, <Sean.Lucey@noaa.gov>

**Point of contact**: Sean Lucey, <Sean.Lucey@noaa.gov>

**Public availability statement**: Raw data are not publicly available due to confidentiality of individual fishery participants.  Derived indicator outputs are
available [here](https://comet.nefsc.noaa.gov/erddap/tabledap/group_landings_soe_v1.html).


## Methods

Fisheries-dependent data for the Northeast Shelf extend back several decades. Data from the 1960s onward are housed in the Commercial database (CFDBS) of the Northeast Fisheries Science Center, which contains the commercial fisheries dealer purchase records (weigh-outs) collected by National Marine Fisheries Service (NMFS) Statistical Reporting Specialists and state agencies from Maine to Virginia. The data format has changed slightly over the time series, with three distinct time frames as noted in Table \@ref(tab:calibration1) below.  

```{r calibration1, eval = T, echo = F}
com.tables <- data.frame(Table = c('WOLANDS', 'WODETS', 'CFDETS_AA'),
                         Years = c('1964 - 1981', '1982 - 1993', '1994 - present'))
knitr::kable(com.tables, caption="Data formats", booktabs = T) #%>% 
  #kableExtra::kable_styling(full_width = F)

```

Comlands is an R database pull that consolidates the landings records from 1964 on and attempts to associate them with NAFO statistical areas (Figure \@ref(fig:StatAreaMap)). The script is divided into three sections. The first pulls domestic landings data from the yearly landings tables and merges them into a single data source. The second section applies an algorithm to associate landings that are not allocated to a statistical area using similar characteristics of the trip to trips with known areas. The final section pulls foreign landings from the Northwest Atlantic Fisheries Organization website and rectifies species and gear codes so they can be merged along with domestic landings.

```{r StatAreaMap, fig.cap="Map of the Northwest Atlantic Fisheries Organization (NAFO) Statistical Areas.  Colors represent the Ecological Production Unit (EPU) with which the statistical area is associated.", echo=F, eval=T, out.width = "50%", fig.align = "center"}

image.dir <- here::here('images')

knitr::include_graphics(file.path(image.dir, 'Stat_Area_Map.jpg'))
```

During the first section, the Comlands script pulls the temporal and spatial information as well as vessel and gear characteristics associated with the landings in addition to the weight, value, and utilization code of each species in the landings record.  The script includes a toggle to use landed weights as opposed to live weights.  For all but shellfish species, live weights are used for the State of the Ecosystem report.  Due to the volume of data contained within each yearly landings table, landings are aggregated by species, utilization code, and area as well as by month, gear, and tonnage class.  All weights are then converted from pounds to metric tons.  Landings values are also adjusted for inflation using the Producer Price Index by Commodity for Processed Foods and Feeds: Unprocessed and Packaged Fish.  Inflation is based on January of the terminal year of the data pull ensuring that all values are in current dollar prices.
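The unit conversion and inflation adjustment described above can be sketched as follows. The price index values, column names, and landings are made up for illustration and are not the actual Producer Price Index series or the `comlandr` implementation.

```r
# Pounds to metric tons: 1 lb = 0.45359237 kg, 1 metric ton = 1000 kg
LB_TO_MT <- 0.45359237 / 1000

# Hypothetical landings records
landings <- data.frame(year   = c(2000, 2010, 2020),
                       pounds = c(2204.62262, 1e6, 5e5),
                       value  = c(100, 200, 300))

# Hypothetical price index by year; 2020 is the terminal year of the pull
ppi <- c("2000" = 50, "2010" = 80, "2020" = 100)
terminal_year <- "2020"

landings$mt <- landings$pounds * LB_TO_MT

# Express every value in terminal-year dollars
landings$value_adj <- landings$value * ppi[[terminal_year]] /
  unname(ppi[as.character(landings$year)])
landings
```

Values from the terminal year are unchanged, while earlier values are scaled up by the ratio of the terminal-year index to the index for their own year.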


```{r geartypes, eval = T, echo = F}

gear.table <- data.frame('gear code' = c(1,2,3,4,5,6,7,8,9),
                         'Major gear' = c('Otter Trawls', 'Scallop Dredges',
                                        'Other Dredges', 'Gillnets', 'Longlines',
                                        'Seines', 'Pots/Traps', 'Midwater', 'Other'))
names(gear.table) <- c("","Major gear")


knitr::kable(gear.table, caption = "Gear types used in commercial landings",  booktabs=T)# %>%
  #kableExtra::kable_styling(full_width = F)
```

Several species have additional steps after the data is pulled from CFDBS.  Skates are typically landed as a species complex.  In order to segregate the catch into species, the ratio of individual skate species in the NEFSC bottom trawl survey is used to disaggregate the landings. A similar algorithm is used to separate silver and offshore hake which can be mistaken for one another.  Finally, Atlantic herring landings are pulled from a separate database as the most accurate weights are housed by the State of Maine.  Comlands pulls from the State database and replaces the less accurate numbers from the federal database.
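The survey-ratio approach for the skate complex can be illustrated with a small sketch. The species list, survey biomass values, and landings total are hypothetical; the actual procedure in Comlands may stratify these ratios (for example by year or area) in ways not shown here.

```r
# Hypothetical bottom trawl survey biomass by skate species
survey_biomass <- c(little = 50, winter = 30, thorny = 20)

# Share of each species in the survey
survey_prop <- survey_biomass / sum(survey_biomass)

# Hypothetical landings reported for the skate complex (metric tons)
complex_landings_mt <- 1000

# Split the complex landings in proportion to the survey composition
species_landings_mt <- complex_landings_mt * survey_prop
species_landings_mt
```

The species-level landings always sum to the reported complex total, so no catch is created or lost by the split.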

The majority of landings data are associated with a NAFO Statistical Area.  For those that are not, Comlands attempts to assign them to an area using similar characteristics of trips where the area is known.  To simplify this task, landings data are further aggregated into quarter and half year, small and large vessels, and eight major gear categories (Table \@ref(tab:geartypes)).  Landings are then proportioned to areas that meet similar characteristics based on the proportion of landings in each area by that temporal/vessel/gear combination.  If a given attribute is unknown, the algorithm attempts to assign it one, once again based on matched characteristics of known trips.  Statistical areas are then assigned to their respective [Ecological Production Unit](#epu) (Table \@ref(tab:statareas)).  
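The proportional allocation can be sketched with a toy example. The stratum labels, column names, and landings below are hypothetical, and the sketch omits the step that fills in unknown attributes; it only shows how landings with no area are split according to known landings in the same temporal/vessel/gear combination.

```r
# Landings with a known statistical area, by stratum
# (stratum = temporal/vessel/gear combination)
known <- data.frame(stratum = c("Q1-small-trawl", "Q1-small-trawl",
                                "Q2-large-dredge"),
                    area    = c(513, 521, 525),
                    mt      = c(60, 40, 10))

# Landings with no statistical area
unknown <- data.frame(stratum = c("Q1-small-trawl", "Q2-large-dredge"),
                      mt      = c(10, 5))

# Proportion of known landings in each area within a stratum
known$prop <- known$mt / ave(known$mt, known$stratum, FUN = sum)

# Split the unknown landings across areas using those proportions
alloc <- merge(unknown, known[, c("stratum", "area", "prop")], by = "stratum")
alloc$mt_assigned <- alloc$mt * alloc$prop

# Map statistical areas to EPUs (subset of the statistical area table)
epu_lookup <- c("513" = "Gulf of Maine", "521" = "Georges Bank",
                "525" = "Georges Bank")
alloc$EPU <- unname(epu_lookup[as.character(alloc$area)])
alloc
```

Because the proportions sum to one within each stratum, the total assigned weight equals the total unknown-area weight.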

```{r statareas, eval = T, echo = F}
area.table <- data.frame(EPU = c('Gulf of Maine', 'Georges Bank', 'Mid-Atlantic'),
                         'Stat Areas' = c('500, 510, 512, 513, 514, 515',
                                        '521, 522, 523, 524, 525, 526, 551, 552, 561, 562',
                                        '537, 539, 600, 612, 613, 614, 615, 616, 621, 622, 625, 626, 631, 632'))
names(area.table) [2]<-  "Stat Areas"
knitr::kable(area.table, caption = "Statistical areas making up each EPU") %>% 
  kableExtra::kable_styling(latex_options = "HOLD_position")
```

The final step of Comlands is to pull the foreign landings from the [NAFO database](https://www.nafo.int/Data/frames).  US landings are removed from this extraction so as not to be double counted.  NAFO codes and CFDBS codes differ so the script rectifies those codes to ensure that the data is seamlessly merged into the domestic landings.  Foreign landings are flagged so that they can be removed if so desired.


### Data sources
Comlands is a database query of the NEFSC commercial fishery database (CFDBS). More information about the CFDBS is available [here](https://inport.nmfs.noaa.gov/inport/item/27401).  

### Data extraction 

[`comlandr`](https://github.com/NOAA-EDAB/comlandr) is a package used to extract relevant data from the database.  


### Data processing

The landings data were formatted for inclusion in the `ecodata` R package with this [R code](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_comdat.R).

### Data analysis

Fisheries-dependent data from Comlands are used in several indicators for the State of the Ecosystem report; the more complicated analyses are detailed in their own sections (e.g., the [Bennet index](#bennet)).  The most straightforward uses of these data are the regional total and aggregate landings indicators.  Regional totals sum landings three ways: 1) all landings regardless of management authority and eventual use (i.e., food or bait), 2) all landings used for seafood regardless of management authority, and 3) all landings used for seafood and managed by the regional fisheries management council for whom the report is presented.

Landings are also calculated by aggregate groups per region.  These are calculated by first assigning the various species into [aggregate groups](#aggroups).  Landings are then summed by year, [EPU](#epu), aggregate group, and whether they are managed by the regional fisheries management council or not.  Proportions of managed landings to total landings are also calculated and have been reported in some reports.
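The aggregation described above can be sketched in base R. The column names, group labels, and landings are illustrative only and do not reproduce the `ecodata` processing script.

```r
# Hypothetical landings already assigned to aggregate groups
landings <- data.frame(
  year    = c(2019, 2019, 2019, 2019),
  EPU     = c("GB", "GB", "GB", "GB"),
  group   = c("Piscivore", "Piscivore", "Benthivore", "Benthivore"),
  managed = c(TRUE, FALSE, TRUE, TRUE),
  mt      = c(30, 10, 20, 5)
)

# Sum by year, EPU, aggregate group, and management status
by_status <- aggregate(mt ~ year + EPU + group + managed,
                       data = landings, FUN = sum)

# Proportion of managed landings to total landings per group
total   <- aggregate(mt ~ year + EPU + group, data = landings, FUN = sum)
managed <- aggregate(mt ~ year + EPU + group,
                     data = landings[landings$managed, ], FUN = sum)
prop <- merge(total, managed, by = c("year", "EPU", "group"),
              suffixes = c("_total", "_managed"))
prop$prop_managed <- prop$mt_managed / prop$mt_total
prop
```

Note that the inner merge drops groups with no managed landings; a production version would need to decide how to report a proportion of zero for those groups.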


<!--chapter:end:chapters/landings_data.Rmd-->

# Conceptual Models

**Description**: Conceptual models for the New England (Georges Bank and Gulf of Maine) and Mid-Atlantic regions of the Northeast US Large Marine Ecosystem

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2018+), State of the Ecosystem - Mid-Atlantic (2018+) 

**Indicator category**: Synthesis of published information, Extensive analysis; not yet published

**Contributor(s)**: Sarah Gaichas, Patricia Clay, Geret DePiper, Gavin Fay, Michael Fogarty, Paula Fratantoni, Robert Gamble, Sean Lucey, Charles Perretti, Patricia Pinto da Silva, Vincent Saba, Laurel Smith, Jamie Tam, Steve Traynor, Robert Wildermuth 

**Data steward**: Sarah Gaichas, <sarah.gaichas@noaa.gov>

**Point of contact**: Sarah Gaichas, <sarah.gaichas@noaa.gov>

**Public availability statement**: All source data aside from confidential commercial fisheries data (relevant only to some components of the conceptual models) are available to the public (see Data Sources below).


## Methods
Conceptual models were constructed to facilitate multidisciplinary analysis and discussion of the linked social-ecological system for integrated ecosystem assessment. The overall process was to first identify the components of the model (focal groups, human activities, environmental drivers, and objectives), and then to document criteria for including groups and linkages and what the specific links were between the components.

The prototype conceptual model used to design Northeast US conceptual models for each ecosystem production unit (EPU) was designed by the California Current IEA program. The California Current IEA developed an [overview conceptual model for the Northern California Current Large Marine Ecosystem (NCC)](https://www.integratedecosystemassessment.noaa.gov/regions/california-current/cc-ecosystem-components), with models for each [focal ecosystem component](https://www.integratedecosystemassessment.noaa.gov/regions/california-current/cc-coastalpelagicspecies#overview) that detailed the [ecological](https://www.integratedecosystemassessment.noaa.gov/regions/california-current/cc-coastalpelagicspecies#ecologicalinteractions), [environmental](https://www.integratedecosystemassessment.noaa.gov/regions/california-current/cc-coastalpelagicspecies#environmentalDrivers), and [human system](https://www.integratedecosystemassessment.noaa.gov/regions/california-current/cc-coastalpelagicspecies#humanActivities) linkages. Another set of conceptual models outlined [habitat](https://www.integratedecosystemassessment.noaa.gov/regions/california-current/cc-habitat) linkages. 

An initial conceptual model for Georges Bank and the Gulf of Maine was outlined at the 2015 ICES WGNARS meeting. It specified four categories: large-scale drivers, focal ecosystem components, human activities, and human well-being. Strategic management objectives were included in the conceptual model, which had not been done in the NCC. Focal ecosystem components were defined as aggregate species groups that had associated US management objectives (outlined within WGNARS for IEAs, see @depiper_operationalizing_2017): groundfish, forage fish, fished invertebrates, living habitat, and protected species. These categories roughly align with Fishery Management Plans (FMPs) for the New England Fishery Management Council. The Mid-Atlantic conceptual model was developed along similar lines, but the focal groups included demersals, forage fish, squids, medium pelagics, clams/quahogs, and protected species to better align with the Mid-Atlantic Council's FMPs.

```{r draftmod, echo = F, eval = T, out.width='80%'}

knitr::include_graphics(file.path(image.dir, 'GBGOMconceptual1.png'))
```

After the initial draft model was outlined, working groups were formed to develop three submodels following the CCE example: ecological, environmental, and human dimensions. The general approach was to specify what was being included in each group, what relationship was represented by a link between groups, what threshold of the relationship was used to determine whether a relationship was significant enough to be included (we did not want to model everything), the direction and uncertainty of the link, and documentation supporting the link between groups. This information was recorded in a [spreadsheet](https://comet.nefsc.noaa.gov/erddap/tabledap/concept_model_2018.html). Submodels were then merged together by common components using the "merge" function in the (currently unavailable) desktop version of Mental Modeler (http://www.mentalmodeler.org/#home; @gray_mental_2013). The process was applied to Georges Bank (GB), the Gulf of Maine (GOM), and the Mid-Atlantic Bight (MAB) [Ecological Production Units](#epu). 

### Data sources

#### Ecological submodels
Published food web (EMAX) models for each subregion [@link_documentation_2006; @link_northeast_2008], food habits data collected by NEFSC trawl surveys [@smith_trophic_2010], and other literature sources [@smith_consumption_2015] were consulted. Expert judgement was also used to adjust historical information to current conditions, and to include broad habitat linkages to Focal groups. 

#### Environmental submodels
Published literature on the primary environmental drivers (seasonal and interannual) in each EPU was consulted. 
Sources for Georges Bank included @backus_georges_1987 and @townsend_oceanography_2006. 
Sources for the Gulf of Maine included @smith_mean_1983, @smith_interannual_2001, @mupparapu_role_2002, @townsend_oceanography_2006, @smith_regime_2012, and @mountain_labrador_2012.  
Sources for the Mid Atlantic Bight included @houghton_middle_1982, @beardsley_nantucket_1985, @lentz_climatology_2003, @mountain_variability_2003,   @glenn_biogeochemical_2004, @sullivan_evidence_2005, @castelao_seasonal_2008, @shearman_long-term_2009, @castelao_temperature_2010, @gong_seasonal_2010, @gawarkiewicz_direct_2012, @forsyth_recent_2015, @fratantoni_description_2015, @zhang_dynamics_2015, @miller_state-space_2016, and @lentz_seasonal_2017.

#### Human dimensions submodels
Fishery catch and bycatch information was drawn from multiple regional datasets, including the Greater Atlantic Regional Office Vessel Trip Reports & Commercial Fisheries Dealer databases, Northeast Fishery Observer Program & Northeast At-Sea Monitoring databases, Northeast Fisheries Science Center Social Sciences Branch cost survey, and the Marine Recreational Information Program database. Further synthesis of human welfare derived from fisheries was drawn from @fare_adjusting_2006, @walden_productivity_2012, @lee_inverse_2013, @lee_hedonic_2014, and @lee_applying_2017. Bycatch of protected species was taken from @waring_us_2015, with additional insights from @bisack_measuring_2014. The top 3 linkages were drawn for each node. For example, the top 3 recreational species for the Mid-Atlantic were used to draw linkages between the recreational fishery and species focal groups. A similar approach was used for relevant commercial fisheries in each region.

Habitat-fishery linkages were drawn from unpublished reports, including:  

1. Mid-Atlantic Fishery Management Council. 2016. [Amendment 16](http://www.mafmc.org/actions/msb-am16) to the Atlantic Mackerel, Squid, and Butterfish Fishery Management Plan: Measures to protect deep sea corals from Impacts of Fishing Gear. Environmental Assessment, Regulatory Impact Review, and Initial Regulatory Flexibility Analysis. Dover, DE. August, 2016. 

2. NOAA. 2016. Deep sea coral research and technology program 2016 Report to Congress. http://www.habitat.noaa.gov/protection/corals/deepseacorals.html retrieved February 8, 2017.  

3. New England Fishery Management Council. 2016. Habitat Omnibus Deep-Sea Coral Amendment: Draft. http://www.nefmc.org/library/omnibus-deep-sea-coral-amendment Retrieved Feb 8, 2017.

4. Bachman et al. 2011. The Swept Area Seabed Impact (SASI) Model: A Tool for Analyzing the Effects of Fishing on Essential Fish Habitat. New England Fishery Management Council Report. Newburyport, MA.

Tourism and habitat linkages were drawn from unpublished reports, including: 

1. http://neers.org/RESOURCES/Bibliographies.html                               

2. Great Bay (GoM) resources  http://greatbay.org/about/publications.htm        

3. Meaney, C.R. and C. Demarest. 2006. Coastal Pollution and New England Fisheries. Report for the New England Fishery Management Council. Newburyport, MA.

4. List of valuation studies, by subregion and/or state, can be found at http://www.oceaneconomics.org/nonmarket/valestim.asp.

Published literature on human activities in each EPU was consulted. 

Sources for protected species and tourism links included @hoagland_demand_2000 and @lee_economic_2010. 

Sources for links between environmental drivers and human activities included @adams_uncertainty_1973, @matzarakis_proceedings_2001, @scott_climate_2004, @hess_climate_2008, @colburn_social_2012, @jepson_development_2013, and @colburn_indicators_2016. 

Sources for cultural practices and attachments links included @pauly_putting_1997, @mcgoodwin_understanding_2001, @st_martin_making_2001, @norris-raynbird_for_2004, @pollnac_toward_2006, @clay_defining_2007, @clay_definingfishing_2008, @everett_role_2008, @donkersloot_politics_2010, @lord_understanding_2011, @halpern_index_2012, @wynveen_natural_2012, @cortes-vazquez_identity_2013, @koehn_progress_2013, @potschin_landscapes_2013, @reed_beyond_2013, @urquhart_constructing_2013, @blasiak_paradigms_2014, @klain_what_2014, @poe_cultural_2014, @brown_we_2015, @donatuto_evaluating_2015, @khakzad_role_2016, @oberg_surviving_2016, and @seara_perceived_2016.  

### Data extraction 

#### Ecological submodels
"Data" included model-estimated quantities used to determine whether inclusion thresholds were met for each potential link in the conceptual model. A matrix with diet composition for each modeled group is an input to the food web model. A matrix of mortalities caused by each predator and fishery on each modeled group is a direct output of a food web model (e.g. Ecopath). Food web model biomass flows between species, fisheries, and detritus were summarized using algorithms implemented in Visual Basic by Kerim Aydin, NOAA NMFS Alaska Fisheries Science Center. Because EMAX model groups were aggregated across species, diet compositions for selected individual species were taken from the NEFSC food habits database using the FEAST program (example query below). These diet queries were consulted as supplemental information. 

An example FEAST SQL script for Cod weighted diet on Georges Bank can be found [here](https://github.com/NOAA-EDAB/tech-doc/tree/master/R/stored_scripts/conceptual_models_extraction.sql).
Queries for different species are standardized by the FEAST application and would differ only in the svspp code. 

#### Environmental submodels
Information was synthesized entirely from published sources and expert knowledge; no additional data extraction was completed for the environmental submodels.

#### Human dimensions submodels
Recreational fisheries data were extracted from the 2010-2014 MRIP datasets. Original data can be found [here](data/top10_prim1_common_mode.xlsx) for each region (New England or Mid-Atlantic as defined by states). 

Commercial fishing data were developed as part of the State of the Ecosystem Report, including revenue and food production estimates, with data extraction methodology discussed in the relevant sections of the technical document. In addition, the Northeast Regional Input/Output Model [@steinback_scott_northeast_2006] was used as the basis for the strength of the employment linkages.

### Data analysis
<!--Text description of analysis methods, similar in structure and detail to a peer-reviewed paper methods section.-->
#### Ecological submodels
Aggregated diet and mortality information was examined to determine the type of link, direction of link, and which links between which groups should be included in the conceptual models. Two types of ecological links were defined using food web models: prey links and predation/fishing mortality links. Prey links resulted in positive links between the prey group and the focal group, while predation/fishing mortality links resulted in negative links to the focal group to represent energy flows. The intent was to include only the most important linkages between focal groups and with other groups supporting or causing mortality on focal species groups. Therefore, threshold levels of diet and mortality were established (based on those that would select the top 1-3 prey and predators of each focal group): 10% to include a link (or add a linked group) in the model and 20% to include as a strong link. A Primary Production group was included in each model and linked to pelagic habitat to allow environmental effects on habitat to be connected to the ecological submodel. Uncertainty for the inclusion of each link and for the magnitude of each link was qualitatively assessed and noted in the [spreadsheet](https://comet.nefsc.noaa.gov/erddap/tabledap/concept_model_2018.html). 
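The threshold rule can be expressed as a small classification function. The function name, labels, and diet proportions below are hypothetical, and treating the thresholds as inclusive (a proportion of exactly 10% or 20% qualifies) is an assumption of this sketch.

```r
# Classify a diet or mortality proportion against the inclusion thresholds
classify_link <- function(p, include = 0.10, strong = 0.20) {
  ifelse(p >= strong, "strong link",
         ifelse(p >= include, "link", "not included"))
}

# Hypothetical diet composition of one focal group
diet <- c(forage_fish = 0.35, benthos = 0.12, squid = 0.04)

link_type <- classify_link(diet)
link_type
```

Applied to a full diet or mortality matrix, the same function would flag which prey and predator groups meet the criteria for each focal group.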

Four habitat categories (Pelagic, Seafloor and Demersal, Nearshore, and Freshwater and Estuarine) were included in ecological submodels as placeholders to be developed further along with habitat-specific research. Expert opinion was used to include the strongest links between each habitat type and each Focal group (noting that across species and life stages, members of these aggregate groups likely occupy many if not all of the habitat types). Link direction and strength were not specified. Environmental drivers were designed to link to habitats, rather than directly to Focal groups, to represent each habitat's important mediation function.

EMAX model groups were aggregated to focal groups for the Georges Bank (GB), Gulf of Maine (GOM) and Mid-Atlantic Bight (MAB) conceptual models according to Table \@ref(tab:groups). "Linked groups" directly support or impact the Focal groups as described above.

```{r groups, eval = T, echo = F}
#read in EMAXconceptualmodgroups.csv and kable it
#print(data.dir)
 # # emaxgroups <- read.csv(paste0(data.dir,"/EMAXconceptualmodgroups.csv"),stringsAsFactors=F)
 # names(emaxgroups) <- c("Group Type", "Region", "Conceptual model group", "EMAX group(s)", "Notes")


emaxgroups <- readRDS(here::here("data","Emax.RDS")) 
knitr::kable(emaxgroups, caption="Relationship between food web model groups and conceptual model focal groups. Pinnipeds not included in GB and Seabirds not included in MAB.", booktabs = T) %>% 
  kableExtra::kable_styling(font_size = 8) %>% 
  landscape()
```


Ecological submodels were constructed and visualized in Mental Modeler (Fig. \@ref(fig:draftGOMeco)). Here, we show only the Gulf of Maine submodels as examples.

```{r draftGOMeco, fig.cap="Gulf of Maine Ecological submodel", echo = F, eval = T, out.width='80%'}

knitr::include_graphics(file.path(image.dir, 'MM_GoM_Ecological.png'))
```

#### Environmental submodels
Environmental submodels were designed to link key oceanographic processes in each ecosystem production unit to the four general habitat categories (Pelagic, Seafloor and Demersal, Nearshore, and Freshwater and Estuarine) with emphasis on the most important physical processes in each ecosystem based on expert knowledge as supported by literature review. The basis of each submodel was the set of environmental variables observable at management-relevant scales as identified by [WGNARS](http://ices.dk/sites/pub/Publication%20Reports/Expert%20Group%20Report/SSGRSP/2014/WGNARS14.pdf): Surface and Bottom Water Temperature and Salinity, Freshwater Input, and Stratification (as well as sea ice timing and cover, which is not relevant to the northeast US shelf). Key drivers changing these observable variables and thus structuring habitat dynamics in each [Ecological Production Unit](#epu) were added to the model using expert consensus. 

Environmental submodels were initially constructed and visualized in Mental Modeler (Fig. \@ref(fig:draftGOMenv)).
```{r draftGOMenv, fig.cap="Gulf of Maine Environmental submodel", echo = F, eval = T, out.width='80%'}

knitr::include_graphics(file.path(image.dir, 'MM_GoM_Climate.png'))
```

#### Human dimensions submodels
The top 3 species from each mode of recreational fishing (shoreside, private boat, party/charter) were used to assess the potential for missing links between the recreational fishing activity and biological focal components. Given the predominance of Mid-Atlantic groundfish in recreational fishing off New England (summer flounder, bluefish, striped bass), a Mid-Atlantic groundfish focal component was added to the Georges Bank EPU model. The magnitude of benefits generated from recreational fishing was scaled to reflect expert knowledge of target species, coupled with the MRIP data highlighted above. Scales were held consistent across the focal components within recreational fishing.

No additional biological focal components were added to the commercial fishing activity, beyond what was developed in the ecological submodel. Benefits derived from commercial fishing were scaled to be consistent with the State of the Ecosystem revenue estimates, as modulated by expert knowledge and additional data sources. For example, the percentage of landings sold as food was used to map fishing activity to the commercial fishery food production objective, and the Northeast Regional Input/Output Model [@steinback_scott_northeast_2006] was used to define the strength of the employment linkages. For profitability, expert knowledge was used to reweight revenue landings, based on ancillary cost data available [@das_chhandita_northeast_2013; @das_chhandita_overview_2014]. Human activities and objectives for the conceptual submodel are defined in @depiper_operationalizing_2017. As shown in Figure \@ref(fig:draftGOMhuman), human dimensions submodels were also initially constructed and visualized in Mental Modeler.

```{r draftGOMhuman, fig.cap="Gulf of Maine Human dimensions submodel", echo = F, eval = T, out.width='80%'}

knitr::include_graphics(file.path(image.dir, 'MM_GoM_Human_Connections.png'))
```

#### Merged models
All links and groups from each submodel were preserved in the full merged model for each system. Mental Modeler was used to merge the submodels. Full models were then re-drawn in Dia (http://dia-installer.de/) with color codes for each model component type for improved readability. Examples for each system are below. 

```{r diaGB, fig.cap="Georges Bank conceptual model", echo = F, eval = T, out.width='80%'}

knitr::include_graphics(file.path(image.dir, 'GBoverview5.png'))
```


```{r diaGOM, fig.cap="Gulf of Maine conceptual model", echo = F, eval = T, out.width='80%'}

knitr::include_graphics(file.path(image.dir, 'GoMoverview4.png'))
```


```{r diaMAB, fig.cap="Mid-Atlantic Bight conceptual model", echo = F, eval = T, out.width='80%'}

knitr::include_graphics(file.path(image.dir, 'MAB_3.png'))
```

#### Communication tools
The merged models were redrawn for use in communications with the public. These versions lead off the State of the Ecosystem reports for both Fishery Management Councils to provide an overview of linkages among environmental drivers, ecological systems, and human systems. 

```{r prettyNE, fig.cap="New England conceptual model for public communication", echo = F, eval = T, out.width='80%'}

knitr::include_graphics(file.path(image.dir, 'GOM_GB_conmod_overview.jpg'))
```

```{r prettyMA, fig.cap="Mid-Atlantic conceptual model for public communication", echo = F, eval = T, out.width='80%'}

knitr::include_graphics(file.path(image.dir, 'MAB_conmod_overview.jpg'))
```

<!--
What packages or libraries did you use in your work flow?
```{r, echo = T}
sessionInfo(package = NULL)


#Use this to output a detailed list of the package information
current.session <- sessionInfo(package = NULL)
current.session$otherPkgs
```


Include accompanying R code, pseudocode, flow of scripts, and/or link to location of code used in analyses.
```{r, echo = T, eval = F}
# analysis code
```
-->

<!--chapter:end:chapters/conceptualmodels.Rmd-->

# Ecological Production Units {#epu}

**Description**: Ecological Production Units

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2018+), State of the Ecosystem - Mid-Atlantic (2018+) 

**Indicator category**: Extensive analysis, not yet published

<!-- 1. Database pull -->
<!-- 2. Database pull with analysis -->
<!-- 3. Synthesis of published information -->
<!-- 4. Extensive analysis, not yet published -->
<!-- 5. Published methods -->
**Contributor(s)**: Robert Gamble

**Data steward**: NA

**Point of contact**: Robert Gamble, <robert.gamble@noaa.gov>

**Public availability statement**: Ecological production unit (EPU) shapefiles are available [here](https://github.com/NOAA-EDAB/tech-doc/tree/master/gis). More information about source data used to derive EPUs can be found [here](https://www.integratedecosystemassessment.noaa.gov/sites/default/files/pdf/ne-ecological-production-units-paper.pdf).


## Methods
To define ecological production units (EPUs), we assembled a set of physiographic, oceanographic and biotic variables on the Northeast U.S. Continental Shelf, an area of approximately 264,000 km^2^ within the 200 m isobath. The physiographic and hydrographic variables selected have been extensively used in previous analyses of oceanic provinces and regions [e.g., @Roff2000]. Primary production estimates have also been widely employed for this purpose in conjunction with physical variables [@Longhurst2007] to define ecological provinces throughout the world ocean. 

We did not include information on zooplankton, benthic invertebrates, fish, protected species, or fishing patterns in our analysis. The biomass and production of the higher trophic level groups in this region has been sharply perturbed by fishing and other anthropogenic influences. Similarly, fishing patterns are affected by regulatory change, market and economic factors and other external influences. 

Because these malleable patterns of change are often unconnected with underlying productivity, we excluded factors directly related to fishing practices. The physiographic variables considered in this analysis are listed in Table \@ref(tab:epuinputs). They include bathymetry and surficial sediments. The physical oceanographic and hydrographic measurements include sea surface temperature, annual temperature span, and temperature gradient, derived from satellite observations for the period 1998 to 2007. 

### Data sources
Shipboard observations of surface and bottom water temperature and salinity came from surveys conducted in spring and fall. Daily sea surface temperature (SST, &deg;C) measurements at 4 km resolution were derived from nighttime scenes composited from the AVHRR sensor on NOAA's polar-orbiting satellites and from NASA's MODIS TERRA and MODIS AQUA sensors. We extracted information for the annual mean SST, temperature span, and temperature gradients from these sources. The latter metric provides information on frontal zone locations. 


```{r epuinputs,  echo = F, include = T, warning = F, message = F, results='asis'}
#Table: (\#label) Variables used in derivation of Ecological Production Units
tab <- '
|Variables|Sampling Method|Units|
|:-----------------------|:-----------------------|:-----------------------|
|Bathymetry|Soundings/Hydroacoustics|Meters|
|Surficial Sediments|Benthic Grab|Krumbein Scale|
|Sea Surface Temperature|Satellite Imagery (4km grid)|&deg;C annual average|
|Sea Surface Temperature gradient|Satellite Imagery (4km grid)|dimensionless|
|Sea Surface Temperature span|Satellite Imagery (4km grid)|&deg;C annual average|
|Surface Temperature|Shipboard hydrography (point)|&deg;C (Spring and Fall)|
|Bottom Temperature|Shipboard hydrography (point)|&deg;C (Spring and Fall)|
|Surface Salinity|Shipboard hydrography (point)|psu (Spring and Fall)|
|Bottom Salinity|Shipboard hydrography (point)|psu (Spring and Fall)|
|Stratification|Shipboard hydrography (point)|Sigma-t units (Spring and Fall)|
|Chlorophyll-a|Satellite Imagery (1.25 km grid)|mg/m^3^ (annual average)|
|Chlorophyll-a gradient|Satellite Imagery (1.25 km grid)|dimensionless|
|Chlorophyll-a span|Satellite Imagery (1.25 km grid)|mg/m^3^ (annual average)|
|Primary Production|Satellite Imagery (1.25 km grid)|gC/m^3^/year (cumulative)|
|Primary Production gradient|Satellite Imagery (1.25 km grid)|dimensionless|
|Primary Production span|Satellite Imagery (1.25 km grid)|gC/m^3^/year (cumulative)|
'
#cat(tab)
df<-readr::read_delim(tab, delim="|")
df<-df[-c(1,2) ,c("Variables","Sampling Method","Units")]
knitr::kable(
  df, booktabs = TRUE,
  caption = 'Variables used in derivation of Ecological Production Units.'
)
```


The biotic measurements included satellite-derived estimates of chlorophyll *a* (CHLa) mean concentration, annual span, and CHLa gradients, along with related measures of primary production. Daily merged SeaWiFS/MODIS-Aqua CHLa (CHL, mg m^-3^) and SeaWiFS photosynthetically available radiation (PAR, Einsteins m^-2^ d^-1^) scenes at 1.25 km resolution were obtained from the NASA Ocean Biology Processing Group. 

### Data extraction
NA

### Data analysis
In all cases, we standardized the data to common spatial units by taking annual means of each observation type within spatial units of 10' latitude by 10' longitude to account for the disparate spatial and temporal scales at which these observations are taken. There are over 1000 spatial cells in this analysis. Shipboard sampling used to obtain direct hydrographic measurements is constrained by a minimum sampling depth of 27 m, specified on the basis of prescribed safe operating procedures. As a result, nearshore waters are not fully represented in our initial specifications of ecological production units. 
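
The gridding step described above can be illustrated with a minimal sketch. The data frame `obs` and its column names (`lon`, `lat`, `year`, `value`) are hypothetical and are not taken from the original analysis; the sketch only shows one way to assign point observations to 10' by 10' cells and take annual means.

```r
# Hypothetical point observations (column names are assumptions for illustration)
obs <- data.frame(
  lon   = c(-70.02, -70.05, -69.80, -70.03),
  lat   = c( 42.01,  42.12,  42.40,  42.10),
  year  = c(2000, 2000, 2000, 2001),
  value = c(10, 12, 8, 11)
)

# 10 minutes of arc is 1/6 of a degree, so multiplying by 6 and taking
# the floor gives an integer index for each 10' x 10' cell
obs$cell_x <- floor(obs$lon * 6)
obs$cell_y <- floor(obs$lat * 6)

# Annual mean of the observations within each cell
cell_means <- aggregate(value ~ cell_x + cell_y + year, data = obs, FUN = mean)
cell_means
```

Integer cell indices are used here rather than cell-corner coordinates so that grouping does not depend on floating-point equality.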

The size of the spatial units employed further reflects a compromise between retaining spatial detail and minimizing the need for spatial interpolation of some data sets. For shipboard data sets characterized by relatively coarse spatial resolution, where necessary, we first constructed an interpolated map using an inverse distance weighting function before including it in the analysis. Although alternative interpolation schemes based on geostatistical approaches are possible, we considered the inverse distance weighting function to be both tractable and robust for this application. 
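
As an illustration of inverse distance weighting, the sketch below predicts values at target locations from nearby station values. The function name `idw_predict`, the planar distance calculation, and the distance power of 2 are assumptions made for this example; the settings used in the original analysis are not stated here.

```r
# Inverse distance weighting: predict at target points (x0, y0) from
# observed points (x, y, z). `power = 2` is a common default, assumed here.
idw_predict <- function(x, y, z, x0, y0, power = 2) {
  vapply(seq_along(x0), function(j) {
    d <- sqrt((x - x0[j])^2 + (y - y0[j])^2)
    # A target that coincides with a station takes the observed value
    if (any(d == 0)) return(mean(z[d == 0]))
    w <- 1 / d^power
    sum(w * z) / sum(w)
  }, numeric(1))
}

# Hypothetical station values interpolated to two target locations
stations <- data.frame(x = c(0, 1, 0), y = c(0, 0, 1), z = c(10, 20, 30))
interpolated <- idw_predict(stations$x, stations$y, stations$z,
                            x0 = c(0, 0.5), y0 = c(0, 0.5))
interpolated
```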

We first employed a spatial principal components analysis [PCA; e.g. @Pielou1984; @Legendre1998] to examine the multivariate structure of the data and to account for any inter-correlations among the variables to be used in subsequent analysis. The variables included in the analysis exhibited generally skewed distributions and we therefore transformed each to natural logarithms prior to analysis. 

The PCA was performed on the correlation matrix of the transformed observations. We selected the eigenvectors associated with eigenvalues of the dispersion matrix with scores greater than 1.0 [the Kaiser-Guttman criterion; @Legendre1998] for all subsequent analysis. These eigenvectors represent orthogonal linear combinations of the original variables used in the analysis. 
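
A minimal sketch of the transformation and component selection follows, using simulated data in place of the gridded variables. The matrix `cells` and its column names are hypothetical; `prcomp()` with `scale. = TRUE` is used here as one standard way to run a PCA on the correlation matrix, not necessarily the software used in the original analysis.

```r
set.seed(1)
# Hypothetical cell-by-variable matrix of positive, right-skewed values
cells <- matrix(rlnorm(200 * 5), ncol = 5,
                dimnames = list(NULL, paste0("var", 1:5)))
cells[, 2] <- cells[, 1] * rlnorm(200, sdlog = 0.2)  # induce some correlation

log_cells <- log(cells)                                  # natural-log transform
pca <- prcomp(log_cells, center = TRUE, scale. = TRUE)   # PCA on the correlation matrix

eigenvalues <- pca$sdev^2
keep   <- which(eigenvalues > 1)           # Kaiser-Guttman criterion
scores <- pca$x[, keep, drop = FALSE]      # scores on the retained components
```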

We delineated ecological subunits by applying a disjoint cluster analysis based on Euclidean distances, using the K-means procedure [@Legendre1998], to the principal component scores. The use of non-independent variables can strongly influence the results of classification analyses of this type [@Pielou1984], hence the interest in using the PCA results in the cluster analysis. 

The eigenvectors were represented as standard normal deviates. We used a Pseudo-F Statistic described by @Milligan1985 to objectively define the number of clusters to use in the analysis. The general approach employed is similar to that of @Host1996 for the development of regional ecosystem classifications for terrestrial systems.
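
The clustering step can be sketched as follows. This example assumes that the pseudo-F statistic takes the Calinski-Harabasz form (between-cluster sum of squares over within-cluster sum of squares, each divided by its degrees of freedom); the simulated scores, the helper `pseudo_f`, and the range of cluster numbers tried are all hypothetical.

```r
set.seed(42)
# Hypothetical principal component scores with three well-separated groups
scores <- rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 10), ncol = 2),
  cbind(rnorm(50, mean = 0), rnorm(50, mean = 10))
)
z <- scale(scores)  # express scores as standard normal deviates

# Pseudo-F: (between SS / (k - 1)) / (within SS / (n - k))
pseudo_f <- function(fit, n) {
  k <- length(fit$size)
  (fit$betweenss / (k - 1)) / (fit$tot.withinss / (n - k))
}

ks <- 2:6
f_values <- vapply(ks, function(k) {
  pseudo_f(kmeans(z, centers = k, nstart = 25), nrow(z))
}, numeric(1))

best_k <- ks[which.max(f_values)]  # number of clusters with the largest pseudo-F
```

Multiple random starts (`nstart`) are used because K-means can converge to a local optimum from a single start.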

After the analyses were done, we next considered options for interpolation of nearshore boundaries resulting from depth-related constraints on shipboard observations. For this, we relied on information from satellite imagery. For the missing nearshore areas in the Gulf of Maine and Mid-Atlantic Bight, the satellite information for chlorophyll concentration and sea surface temperature indicated a direct extension from adjacent observations. For the Nantucket Shoals region south of Cape Cod, similarities in tidal mixing patterns reflected in chlorophyll and temperature observations indicated an affinity with Georges Bank and the boundaries were changed accordingly.

Finally, we considered consolidating ecological subareas so that nearshore regions are treated as special zones nested within the adjacent shelf regions. Similar considerations led to nesting the continental slope regions within adjacent shelf regions in the Mid-Atlantic and Georges Bank regions. This led to four major units: Mid-Atlantic Bight, Georges Bank, Western-Central Gulf of Maine (simply "Gulf of Maine" in the State of the Ecosystem), and Scotian Shelf-Eastern Gulf of Maine. As the State of the Ecosystem reports are specific to FMC managed regions, the Scotian Shelf-Eastern Gulf of Maine EPU is not considered in SOE indicator analyses. 


```{r EPUmap, fig.cap="Map of the four Ecological Production Units, including the Mid-Atlantic Bight (light blue), Georges Bank (red), Western-Central Gulf of Maine (or Gulf of Maine; green), and Scotian Shelf-Eastern Gulf of Maine (dark blue)", fig.align='center', echo = F}

knitr::include_graphics(file.path(image.dir,"EPUs.jpg"))

```

### Data processing

Shapefiles were converted to `sf` objects for inclusion in the `ecodata` R package using the R code found [here](https://raw.githubusercontent.com/NOAA-EDAB/ecodata/master/data-raw/get_epu_sf.R).


<!--chapter:end:chapters/EPU.Rmd-->

# Ecosystem Overfishing {#ppr}

**Description**: Ecosystem Overfishing Indices (Primary Production Required, Fogarty, Ryther)

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2021+), State of the Ecosystem - Mid-Atlantic (2021+)

**Indicator category**: Database pull with analysis; Published methods

**Contributor(s)**: Michael Fogarty, Andrew Beet

**Data steward**: Andrew Beet, [andrew.beet\@noaa.gov](mailto:andrew.beet@noaa.gov){.email}

**Point of contact**: Andrew Beet, [andrew.beet\@noaa.gov](mailto:andrew.beet@noaa.gov){.email}

**Public availability statement**: Source data are not publicly available due to PII restrictions.

```{r pprchunk1, echo = F, eval = T}
imagePath <- here::here("images")
```

## Methods

We use the definition of ecosystem overfishing from @link2019eof: 

1. The sum of catches is flat or declining
1. Total catch per unit effort is declining
1. Total landings relative to ecosystem production exceed suitable limits

All of the indices are based on the principle of energy transfer up the foodweb from primary producers.

### Fogarty & Ryther Indices

The Fogarty index is defined as the ratio of total catches to total primary productivity in an ecosystem [@link2019eof]. The units are parts per thousand.

The Ryther index is defined as total catch per unit area in the ecosystem [@link2019eof]. The units are mt km^-2^ year^-1^.

A modification of the indices is used. Total landings are used in lieu of total catch. This will have the effect of reducing the value of the index (compared to using total catch).

### Primary Production Required (PPR)

The index is a measure of the impact of fishing on the base of the foodweb. The amount of potential yield we can expect from a marine ecosystem depends on the amount of production entering at the base of the food web, primarily in the form of phytoplankton; the pathways this energy follows to reach harvested species; the efficiency of transfer of energy at each step in the food web; and the fraction of this production that is removed by the fisheries. Species such as scallops and clams primarily feed directly on larger phytoplankton species and therefore require only one step in the transfer of energy. The loss of energy at each step can exceed 80-90%. For many fish species, as many as 2-4 steps may be necessary. Given the trophic level and the efficiency of energy transfer of the species in the ecosystem, the amount of phytoplankton production required (PPR) to account for the observed catch can be estimated.

The index for Primary Production Required (PPR) was adapted from @pauly1995ppr.

$$PPR_t = \sum_{i=1}^{n_t}  \left(\frac{landings_{t,i}}{9}\right) \left(\frac{1}{TE}\right)^{TL_i-1}$$

where $n_t$ is the number of species in time $t$, $landings_{t,i}$ is the landings of species $i$ in time $t$, $TL_i$ is the trophic level of species $i$, and $TE$ is the trophic efficiency. The PPR estimate assumes a 9:1 ratio for the conversion of wet weight to carbon and a 15% transfer efficiency per trophic level ($TE$ = 0.15).

The index is presented as a percentage of [estimated primary production](https://noaa-edab.github.io/tech-doc/chl-pp.html) (PP) available over the geographic region of interest, termed an [Ecological Production Unit](https://noaa-edab.github.io/tech-doc/epu.html) (EPU). The scaled index is estimated by dividing the PPR index in time $t$ by the estimated primary production in time $t$.

$$scaledPPR_t = \frac{PPR_t}{PP_t}$$

The species selected in each year were determined by their cumulative contribution to total landings. A threshold of at least 80% of the total landings is used.
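
The PPR and scaled PPR equations above can be written as a short function. This is a minimal sketch with hypothetical landings, trophic levels, and primary production; the function name `ppr` and its argument names are illustrative and are not the interface of the `eofindices` package.

```r
# PPR_t = sum_i (landings_i / 9) * (1 / TE)^(TL_i - 1)
ppr <- function(landings, trophic_level, te = 0.15, ww_to_carbon = 9) {
  sum((landings / ww_to_carbon) * (1 / te)^(trophic_level - 1))
}

# Hypothetical landings (wet weight) and trophic levels for one year
landings      <- c(900, 450)
trophic_level <- c(2, 3)

ppr_t  <- ppr(landings, trophic_level)
pp_t   <- 1e6              # hypothetical primary production, in the same carbon units
scaled <- ppr_t / pp_t     # scaledPPR_t = PPR_t / PP_t
```

Because the exponent is $TL_i - 1$, a species at trophic level 1 contributes only its carbon weight, and each additional trophic level multiplies the requirement by $1/TE$.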

#### Data sources

Data for this index come from a variety of sources. The landings data come from the Commercial Fishery Database (CFDBS), species trophic level information comes from [fishbase](http://fishbase.de) and [sealifebase](http://sealifebase.ca), and primary production estimates are derived from [satellites](https://noaa-edab.github.io/tech-doc/chl-pp.html). Some of these data are typically not available to the public.

#### Data extraction

Landings are extracted from the commercial fisheries database (CFDBS) using the methods described in the chapter [Commercial Landings Data.](https://noaa-edab.github.io/tech-doc/comdat.html)

Trophic level information for each species is obtained from [fishbase](http://fishbase.de) and [sealifebase](http://sealifebase.ca) using the R package [rfishbase](https://github.com/ropensci/rfishbase) [@froese2019fishbase] in tandem with the package [eofindices.](https://github.com/NOAA-EDAB/eofindices/)

Primary Production is estimated using the methods described in the chapter [Chlorophyll a and Primary Production.](https://noaa-edab.github.io/tech-doc/chl-pp.html)

#### Data analysis

##### Primary Production Required

Annual (wet weight) landings are calculated for each species for a given EPU. For each year, the landings are sorted in descending order by species and the cumulative landings are calculated. The species that account for the top 80% of total cumulative landings are selected. The trophic level for each of these species is then obtained from fishbase/sealifebase. At this point the PPR index is calculated. The units of the index are $gC\ year^{-1}$ for the EPU. The index is converted to $gC\ m^{-2}\ year^{-1}$ by dividing by the area (in $m^2$) of the EPU.
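
The species selection step can be sketched as below. The landings values are hypothetical, and the sketch assumes that species are added in descending order until their cumulative share first reaches the 80% threshold.

```r
# Hypothetical annual landings by species for one EPU and year
landings <- c(herring = 50, scallop = 35, lobster = 10, cod = 5)

sorted   <- sort(landings, decreasing = TRUE)
cum_prop <- cumsum(sorted) / sum(sorted)   # cumulative share of total landings

# Keep species until the cumulative share first reaches the threshold
threshold <- 0.80
n_keep    <- which(cum_prop >= threshold)[1]
selected  <- names(sorted)[seq_len(n_keep)]
selected
```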

To normalize the index, the total Primary Production for the given EPU is required. This is calculated as described in the chapter [Chlorophyll a and Primary Production](https://noaa-edab.github.io/tech-doc/chl-pp.html). The units are also converted to $gC\ m^{-2}\ year^{-1}$.

The index is then normalized by dividing the index in year $t$ by the total primary production in year $t$.

##### Fogarty and Ryther Indices

Total annual (wet weight) landings are calculated for a given EPU (summed over all species). The units for both primary production and landings are $mt\ km^{-2}\ year^{-1}$. A factor of (1/9) is used to convert landings to weight in carbon. The area in $km^2$ of each EPU is obtained from the shapefile used to define the area.
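
A minimal sketch of the two indices follows, using hypothetical values. It assumes that the carbon conversion is applied only in the Fogarty ratio (so that landings and primary production are compared in carbon units) and that the Ryther index is reported in wet weight per unit area; this reading is an assumption for illustration and should be checked against the `eofindices` package.

```r
# Hypothetical inputs for one EPU and year
landings_mt <- 90000    # total landings, wet weight (mt)
area_km2    <- 60000    # EPU area (km^2), e.g. taken from the shapefile
pp_carbon   <- 250      # primary production (mt C km^-2 year^-1)

landings_per_area <- landings_mt / area_km2    # mt km^-2 year^-1

# Ryther index: landings per unit area
ryther <- landings_per_area

# Fogarty index: landings (as carbon, using the 1/9 factor) relative to
# primary production, expressed in parts per thousand
fogarty <- (landings_per_area / 9) / pp_carbon * 1000
```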

#### Plotting

Six plots are produced for each EPU:

-   The normalized PPR index (along with the associated landings).
-   Total primary production
-   Mean trophic level of the species included in the index (weighted by their landings)
-   Species composition of landings
-   The Fogarty index (with reference levels)
-   The Ryther Index (with reference levels)

All plots are created using the [eofindices](https://github.com/NOAA-EDAB/eofindices) package.

See the [workedExample vignette](https://NOAA-EDAB.github.io/eofindices/articles/workedExample2021.html) in the [eofindices](https://github.com/NOAA-EDAB/eofindices/) package for plotting code.

Figures for Mid-Atlantic Bight are presented in this document. For Georges Bank and the Gulf of Maine, please visit [here](https://noaa-edab.github.io/eofindices/articles/currentIndices.html)

#### Mid-Atlantic Bight (MAB)


```{r, pprmab,  echo=FALSE, out.width='50%', fig.cap="Ecosystem overfishing indicators."}
knitr::include_graphics(paste0(imagePath,"/PPR-MAB-0_80.png"))
knitr::include_graphics(paste0(imagePath,"/MTL-MAB-0_80.png"))
knitr::include_graphics(paste0(imagePath,"/PP-MAB.png"))
knitr::include_graphics(paste0(imagePath,"/composition-MAB-0_80.png"))
knitr::include_graphics(paste0(imagePath,"/ryther-index-MAB.png"))
knitr::include_graphics(paste0(imagePath,"/fogarty-index-Constant-MAB.png"))
```


<!--chapter:end:chapters/ecosystem_overfishing.Rmd-->

# Expected Number of Species {#exp_n}

**Description**: Time Series of Expected Number of Species per Tow in NEFSC BTS

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2021+), State of the Ecosystem - Mid-Atlantic (2021+)

**Indicator category**: Database pull with analysis; Published methods

**Contributor(s)**: Sean Lucey
  
**Data steward**: Sean Lucey, <sean.lucey@noaa.gov>
  
**Point of contact**: Sean Lucey, <sean.lucey@noaa.gov>
  
**Public availability statement**: 


## Methods
Diversity estimates have been developed to understand whether the overall structure of the ecosystem has remained stable or is changing.  There are a large number of diversity indices that can be used to measure diversity; for the purposes of the State of the Ecosystem report we report on the expected number of species in a sample size ($E(S_n)$).  This index was originally developed by @sanders1968 and later refined by @hurlbert1971 using a hypergeometric probability distribution.  These “rarefied” samples allow for comparisons between sample sites with varying number of species present.  The estimate of $E(S_n)$ is less biased than other diversity indices which usually increase with sample size.  It also has a more meaningful biological interpretation than other indices.  For example, if a predator eats 10 random individuals, $E(S_n)$ will predict the number of species consumed.


### Data sources
Data used for the calculation of the expected number of species come from the Northeast Fisheries Science Center's survey database (SVDBS) as pulled in the Survdat data set.  These data are available to qualified researchers upon request. More information on the data request process is available under the "Access Information" field [here](https://inport.nmfs.noaa.gov/inport/item/22560).

### Data analysis
The expected number of species ($E(S_n)$) was calculated for each survey tow as:

\begin{equation}
E(S_n) = \sum_{i=1}^S{ \Bigg( 1 - \frac{\binom{N-N_i}{n}}{\binom{N}{n}} \Bigg) }  
\end{equation}

where $S$ is the total number of species present, $N$ the total number of individuals, and $N_i$ the number of individuals of the *i*th species.  The result represents the expected number of species in a sample of *n* individuals randomly selected from the tow without replacement.  The calculation is made using the `rarefy` function of the `vegan` package [@R-vegan] using an *n* of 1000.

The number of species represented in these samples of 1000 fishes is then averaged over the survey for each [Ecological Production Unit](https://noaa-edab.github.io/tech-doc/epu.html).  Because no survey calibration factor exists to account for differences in the number of species caught between the NOAA Ship Albatross IV and NOAA Ship Henry B. Bigelow, the two time series are kept separate.  
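
The $E(S_n)$ equation above can be evaluated directly in base R. The sketch below is an illustration of the formula rather than the `vegan` implementation; the function name `expected_species` and the example tow are hypothetical. Log-binomial coefficients are used because the binomial coefficients themselves overflow for sample sizes such as 1000.

```r
# Expected number of species in a random sample of n individuals drawn
# without replacement from a tow with species counts `counts`
expected_species <- function(counts, n) {
  counts <- counts[counts > 0]
  N <- sum(counts)
  stopifnot(n <= N)
  # Probability that species i is absent from the sample:
  # choose(N - N_i, n) / choose(N, n), computed on the log scale
  p_absent <- exp(lchoose(N - counts, n) - lchoose(N, n))
  sum(1 - p_absent)
}

# Hypothetical tow with five species
tow <- c(500, 300, 150, 45, 5)
es_100 <- expected_species(tow, n = 100)
```

When `n` equals the total number of individuals, the function returns the observed number of species, which is a useful sanity check.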


### Data processing
Data were formatted for inclusion in the `ecodata` R package using the R code found [here](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_exp_n.R).

<!--chapter:end:chapters/Expected_Number.Rmd-->

# Fish Condition Indicator {#condition}

**Description**: Relative condition

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2018+), State of the Ecosystem - Mid-Atlantic (2018+) 

**Indicator category**: Database pull with analysis

**Contributor(s)**: Laurel Smith
  
**Data steward**: Laurel Smith, <laurel.smith@noaa.gov>
  
**Point of contact**: Laurel Smith, <laurel.smith@noaa.gov>
  
**Public availability statement**: NEFSC survey data used in these analyses are available upon request (see [BTS metadata](https://inport.nmfs.noaa.gov/inport/item/22560) for access procedures). Derived condition data are available [here](https://github.com/Laurels1/Condition/tree/master/data).


## Methods
Relative condition (Kn) was introduced by @Cren1951a as a way to remove the influence of length on condition, and @Blackwell2000 noted that Kn may be useful in detecting prolonged physical stress on fish populations. Relative condition is calculated as
$$Kn = W/W',$$ where $W$ is the weight of an individual fish and $W'$ is the predicted length-specific mean weight for the fish population in a given region. Here, relative condition was calculated for finfish stocks commonly caught on the Northeast Fisheries Science Center’s (NEFSC) autumn bottom trawl survey, from 1992-present. 


For this work, length-weight coefficients from @Wigley2003 were used to calculate $W'$. Individual fish weights were total body weights from Northeast Fisheries Science Center (NEFSC) fall bottom trawl surveys. Most finfish species included in this study are spring or summer spawners, so the fall survey was chosen to reduce the variability in gonad weights that affects the spring survey as butterfish ramp up for spawning. Kn was averaged at the resolution of NEFSC bottom trawl strata. 

The `Condition` package used for calculations and plotting of fish condition factor can be found on [GitHub](https://github.com/Laurels1/Condition).

### Data sources
Individual fish lengths (to the nearest 0.5 cm) and weights (grams) were collected on the NEFSC bottom trawl surveys from 1992-present aboard RVs Albatross IV, Delaware II and the Henry B. Bigelow  (see [Survdat](#survdat)). A small number of outlier values were removed when calculating the length-weight parameters.

### Data extraction
Data were extracted from NEFSC's survey database (SVDBS) using the R script found [here](https://github.com/Laurels1/Condition/blob/master/R/pull_from_svdbs.R).

### Data analysis

Relative condition is calculated by fish species and EPU as ($Kn$ formula found above) where $W$ is the weight of an individual fish and $W'$ is the predicted length-specific mean weight for the fish population in a given region. Predicted weight was calculated as: 

$$\textrm{Weight} = e^{Fall_{coef}} * \textrm{Length}^{Fall_{exp}},$$

where $Fall_{coef}$ and $Fall_{exp}$ are from @Wigley2003.
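
The calculation can be sketched as follows. The coefficient values, units, and column names below are hypothetical placeholders and are not the published parameters; in practice the species-specific fall coefficients would be substituted, with lengths and weights in the units those coefficients expect.

```r
# Hypothetical stand-ins for the fall length-weight parameters
fall_coef <- -11.5   # log-scale coefficient (assumed value)
fall_exp  <- 3.0     # length exponent (assumed value)

# Hypothetical individual fish records
fish <- data.frame(length_cm = c(20, 30), weight_kg = c(0.09, 0.25))

# Predicted length-specific weight: W' = exp(Fall_coef) * Length^Fall_exp
fish$pred_weight <- exp(fall_coef) * fish$length_cm^fall_exp

# Relative condition: Kn = W / W'
fish$Kn <- fish$weight_kg / fish$pred_weight
fish
```

A value of Kn above 1 indicates a fish heavier than predicted for its length, and a value below 1 indicates a lighter fish.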
 

The code found [here](https://github.com/Laurels1/Condition/blob/master/R/RelConditionEPU.R) was used in the analysis of fish condition.

<!--chapter:end:chapters/Condition_indicator.Rmd-->

# Fish Productivity Indicator {#productivity_anomaly}
 
**Description**: Groundfish productivity estimated as the ratio of small fish to large fish

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2017, 2018, 2020), State of the Ecosystem - Mid-Atlantic (2017, 2018, 2019, 2020)

**Indicator category**: Database pull with analysis; Published methods

**Contributor(s)**: 
  
**Data steward**: Kimberly Bastille, <kimberly.bastille@noaa.gov>
  
**Point of contact**: Kimberly Bastille, <kimberly.bastille@noaa.gov>
  
**Public availability statement**: Source data are available upon request.


## Methods


### Data sources
Survey data were obtained from the Northeast Fisheries Science Center (NEFSC) trawl database. These data in their derived form are available through [Survdat](#survdat).


### Data extraction 
Data were extracted from [Survdat](#survdat).


### Data analysis
We defined size thresholds separating small and large fish for each species based on the 20th percentile of the length distribution across all years. This threshold was then used to calculate a small and large fish index (numbers below and above the threshold, respectively) each year. Although the length percentile corresponding to age-1 fish will vary with species, we use the 20th percentile as an approximation. Biomass was calculated using length–weight relationships directly from the survey data. Following @wigley_length-weight_2003, the length-weight relationship was modeled as 
$$\ln W = \ln a + b \ln L$$
where $W$ is weight (kg), $L$ is length (cm), and $a$ and $b$ are parameters fit via linear regression. The ratio of small fish numbers of the following year to larger fish biomass in the current year was used as the index of recruitment success. The fall and spring recruitment success anomalies were averaged to provide an annual index of recruitment success.
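
A minimal sketch of the regression and the recruitment-success ratio follows, using simulated data. All values and object names are hypothetical; the sketch fits the log-log length-weight model with `lm()` and then pairs small-fish numbers in year $t+1$ with large-fish biomass in year $t$.

```r
set.seed(7)
# Hypothetical length (cm) and weight (kg) samples
len <- runif(200, 10, 80)
wt  <- exp(-11 + 3 * log(len) + rnorm(200, sd = 0.05))

# ln W = ln a + b ln L, fit by linear regression
fit  <- lm(log(wt) ~ log(len))
ln_a <- unname(coef(fit)[1])
b    <- unname(coef(fit)[2])

# Hypothetical annual indices for years 1 to 4
small_n   <- c(120, 150, 90, 200)   # small-fish numbers
large_bio <- c(40, 50, 45, 60)      # large-fish biomass

# Small-fish numbers in the following year over large-fish biomass in the current year
recruit_success <- small_n[-1] / large_bio[-length(large_bio)]
recruit_success
```

The ratio series is one element shorter than the inputs because the final year has no following-year small-fish index.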

Further details of the methods are described in @perretti_regime_2017.

### Data processing

Productivity data were formatted for inclusion in the `ecodata` R package using the R code found [here](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_productivity_anomaly.R).

<!--chapter:end:chapters/productivity_for_tech_memo.Rmd-->

# Fisheries Revenue in Wind Development Areas {#wind_revenue}

**Description**: Top Species Revenue from Wind Development Areas

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank 2022+ (Different Methods 2021), State of the Ecosystem - Mid-Atlantic 2022+ (Different Methods 2021)

**Indicator category**: Database pull with analysis

**Contributor(s)**: Benjamin Galuardi, Douglas Christel
  
**Data steward**: Doug Christel <douglas.christel@noaa.gov>
  
**Point of contact**: Doug Christel <douglas.christel@noaa.gov>
  
**Public availability statement**: Source data are NOT publicly available. Please email douglas.christel@noaa.gov for further information and queries of indicator source data.

## Methods

### Data Sources

Vessel trip report (VTR) data, modeled using the fishing footprint method (DePiper 2014 and Benjamin et al. 2017), were linked with dealer reports to obtain annual landings and revenue within wind lease areas. Dealer report data were used for annual GARFO landings and revenue.


### Data Analysis

Raster data of VTR records modeled with the fishing footprint method were integrated with dealer data and overlaid on existing and proposed wind lease areas to obtain landings and revenue in each area by year.  

### Data Processing 

Data were formatted for inclusion in the `ecodata` R package using the R code found [here](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_wind_revenue.R).

## Methods 2021
### Data Sources

This indicator is derived from the data underpinning the "Socioeconomic Impacts of Atlantic Offshore Wind Development" web site, which can be accessed at https://www.fisheries.noaa.gov/resource/data/socioeconomic-impacts-atlantic-offshore-wind-development.

The underlying raster data is defined in Benjamin S, Lee MY, DePiper G. 2018. Visualizing fishing data as rasters. NEFSC Ref Doc 18-12; 24 p.

This raster data was then linked to the Greater Atlantic Regional Office's Data Matching Imputation System (https://www.fisheries.noaa.gov/inport/item/17328) to derive revenue estimates from the Wind Energy Areas, defined as of December 11, 2020. Of note is that the version of DMIS utilized for this reporting includes the SFCLAM data missing from the traditional DMIS dataset. All revenue estimates are deflated to 2019 dollars using the St. Louis Federal Reserve's Quarterly Implicit GDP Deflator, which can be accessed at https://fred.stlouisfed.org/data/GDPDEF.txt
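
Deflating nominal revenue to a base year can be sketched as below. The deflator values are made up for illustration and are not actual GDPDEF figures, and the sketch assumes one deflator value per year (for example, an annual average of the quarterly series).

```r
# Hypothetical deflator index by year (not actual GDPDEF values)
deflator <- c("2017" = 100, "2018" = 102, "2019" = 104)

# Hypothetical nominal revenue by year
revenue <- data.frame(year = c(2017, 2018, 2019),
                      nominal = c(1000, 1020, 1040))

# Real revenue in 2019 dollars: nominal * (deflator in 2019 / deflator in year)
base_2019 <- deflator[["2019"]]
revenue$real_2019 <- revenue$nominal * base_2019 /
  unname(deflator[as.character(revenue$year)])
revenue
```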

### Data Analysis
Code used to analyze this data can be [found here](https://github.com/NOAA-EDAB/tech-doc/blob/master/R/stored_scripts/WindRevenue_Code_for_Dissemination.R)


<!--chapter:end:chapters/Wind_area_fisheries_revenue.Rmd-->

# Fishery Reliance and Social Vulnerability {#engagement}

**Description**: Fishing community commercial and recreational fishing reliance and social vulnerability

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2018+), State of the Ecosystem - Mid-Atlantic (2018+) 

**Indicator category**: Database pull with analysis

**Contributor(s)**: Lisa L. Colburn, Changhua Weng
  
**Data steward**: Changhua Weng <changhua.weng@noaa.gov>

**Point of contact**: Lisa L. Colburn <lisa.colburn@noaa.gov>
  
**Public availability statement**: The source data used to construct the commercial fishing engagement and reliance indices include confidential information and are not available publicly. However, the commercial fishing engagement and reliance indices are not confidential so are available to the public. All calculated indices can be found [here](https://www.fisheries.noaa.gov/national/socioeconomics/social-indicators-coastal-communities).
  

## Methods


### Data sources
NOAA Fisheries' Community Social Vulnerability Indicators (CSVIs) were developed using secondary data including social, demographic and fisheries variables. The social and demographic data were downloaded from the 2018 American Community Survey (ACS) 5-yr estimates Dataset at the [U.S. Census American Community Survey (ACS)](https://www.census.gov/programs-surveys/acs/) for coastal communities at the Census Designated Place (CDP) level, and in some cases the County Subdivision (MCD) level. Commercial fisheries data were pulled from the SOLE server located at Northeast Fisheries Science Center in Woods Hole, MA. The recreational fishing information is publicly accessible through the [Marine Recreational Information Program (MRIP)](https://www.st.nmfs.noaa.gov/recreational-fisheries/MRIP/), and for this analysis was custom requested from NOAA Fisheries headquarters.


### Data extraction 
Commercial fisheries data were pulled from the NEFSC SOLE server in Woods Hole, MA.

SQL and SAS code for data extraction and processing steps can be found [here](https://github.com/NOAA-EDAB/tech-doc/tree/master/R/stored_scripts/comm_rel_vuln_extraction.sql). 


### Data analysis
The indicators were developed using the methodology described in @Jacob2010, @Jacob2013, @colburn_social_2012 and @Jepson2013. Indicators were constructed through principal component analysis with a single factor solution, and the following criteria had to be met: a minimum variance explained of 45%; a Kaiser-Meyer-Olkin measure of sampling adequacy above .500; factor loadings above .350; a Bartlett's test of sphericity significance below .05; and an Armor's Theta reliability coefficient above .500. Factor scores for each community were ranked based on standard deviations into the following categories: High (>=1.00 SD), MedHigh (.500-.999 SD), Moderate (.000-.499 SD) and Low (<.000 SD).
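
The ranking of factor scores into categories can be illustrated with `cut()`. The scores below are hypothetical, and the sketch assumes the category boundaries are closed on the lower side, so that a score of exactly 1.00 SD is High and a score of exactly 0 is Moderate.

```r
# Hypothetical factor scores, in standard deviation units
scores <- c(-0.30, 0.00, 0.25, 0.50, 0.75, 1.00, 1.80)

category <- cut(
  scores,
  breaks = c(-Inf, 0, 0.5, 1, Inf),
  labels = c("Low", "Moderate", "MedHigh", "High"),
  right  = FALSE   # intervals are [lower, upper): [0, 0.5), [0.5, 1), [1, Inf)
)
data.frame(scores, category)
```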

### Data processing

Data were formatted for inclusion in the ecodata R package using the R script found [here](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_engagement.R).

<!--chapter:end:chapters/Comm_rel_vuln_indicator.Rmd-->

# Forage Fish Biomass Indices {#forage_index} 


**Description**: Forage Fish Biomass Indices 

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2023), State of the Ecosystem - Mid-Atlantic (2023)

**Indicator category**: Extensive analysis, not yet published

**Contributor(s)**: Sarah Gaichas, James Gartland, Brian Smith, Elizabeth Ng, Michael Celestino, Anthony Wood, Katie Drew, Abigail Tyrell, and James Thorson
  
**Data steward**: Sarah Gaichas, <Sarah.Gaichas@noaa.gov>

**Point of contact**: Sarah Gaichas, <Sarah.Gaichas@noaa.gov>

**Public availability statement**: Source data are publicly available. All data and code available on GitHub at [sgaichas/bluefishdiet](https://github.com/sgaichas/bluefishdiet). 
  

## Methods

Forage fish indices were developed in support of the Bluefish Research Track stock assessment working group. Key methods are briefly reported here. Detailed methods, results, and model diagnostics are available in the [full working paper posted here](https://sgaichas.github.io/bluefishdiet/VASTcovariates_forageindex_WP.html). 

Small pelagic forage species are difficult to survey directly, so we developed a novel method of assessing small pelagic fish aggregate abundance using predator diet data. We used piscivore diet data collected from multiple bottom trawl surveys within a Vector Autoregressive Spatio-Temporal (VAST, @thorson2017; @thorson2019) model to assess trends of small pelagic forage species on the Northeast US shelf. This approach uses survey-sampled predator stomach contents as observations to develop a survey index for forage fish, following @ng2021, which used predator stomach data to create a biomass index for a single prey, Atlantic herring. 

We adapted the approach of @ng2021 to get an index for "bluefish prey" in aggregate rather than a single prey species. We include inshore and offshore regions by combining results across two regional bottom trawl surveys, the Northeast Fisheries Science Center (NEFSC) survey and the Northeast Area Monitoring and Assessment Program (NEAMAP) survey, as was done for summer flounder biomass in @perretti2019. Finally, since bluefish themselves are somewhat sparsely sampled by the surveys, we aggregate all predators that have a similar diet composition to bluefish to better represent bluefish prey biomass. 

### Data sources  
Data used to develop this indicator come from multispecies diet data collected on the Northeast Fisheries Science Center (NEFSC) and Northeast Area Monitoring and Assessment Program (NEAMAP) bottom trawl surveys. Sea surface temperature (SST) data came from in-situ measurements collected on the NEFSC and NEAMAP surveys, as well as from NOAA High Resolution SST data (Optimum Interpolation Sea Surface Temperature, OISST, @Reynolds2007), provided by the NOAA/OAR/ESRL PSL, Boulder, Colorado, USA, from their website at https://psl.noaa.gov/data/gridded/data.noaa.oisst.v2.highres.html. This is the same source data used in [seasonal SST anomaly analyses](https://noaa-edab.github.io/tech-doc/seasonal-sst-anomalies.html). 

### Data extraction  
NEFSC survey diet data were extracted and provided by Brian Smith (NEFSC). NEAMAP survey diet data were extracted and processed by James Gartland (VIMS). Code to extract the OISST information was modified from [code](https://github.com/kimberly-bastille/ecopull/blob/main/.github/workflows/pull_satellite_data.yml) kindly provided by Kim Bastille. It pulls daily gridded SST for each year 1985-2021 using her code starting at line 260, together with Kim's `nc_to_raster` function for the NEUS shelf, available at [this link](https://github.com/kimberly-bastille/ecopull/blob/main/R/utils.R). The full OISST extraction script is available at [this link](https://github.com/sgaichas/bluefishdiet/blob/main/pull_OISST.R), with visualizations of survey in-situ temperatures compared with OISST at [this link](https://sgaichas.github.io/bluefishdiet/SSTmethods.html). 

### Data analysis  
Estimating the forage index involved defining the input dataset and running multiple configurations of the VAST model. Steps involved in defining the dataset included defining "bluefish prey", defining a set of piscivore predators with similar diets to bluefish, integrating diet data from two regional surveys, and integrating supplementary SST data to fill gaps in in-situ temperature measurements. Steps involved in running the VAST model included decisions on spatial footprint, model structure, model selection to determine if spatial and spatio-temporal random effects were supported by the data, and further model selection to determine which catchability covariates were best supported by the data. Finally, subsets of the spatial domain were defined to match bluefish assessment inputs (survey and recreational fishery CPUE) for potential use as covariates in bluefish stock assessment models, and a bias-corrected [@thorson2016] forage index for each spatial subset was generated. 

#### Forage fish in bluefish diets

Using NEFSC bottom trawl survey diet data from 1973-2021, 20 small pelagic groups were identified as major bluefish prey with 10 or more observations (in descending order of observations): Longfin squids (*Doryteuthis* formerly *Loligo* sp.), Anchovy family (Engraulidae), bay anchovy (*Anchoa mitchilli*), Atlantic butterfish (*Peprilus triacanthus*), Cephalopoda, striped anchovy (*Anchoa hepsetus*), red eye round herring (*Etrumeus teres*), Sandlance (*Ammodytes* sp.), scup (*Stenotomus chrysops*), silver hake (*Merluccius bilinearis*), shortfin squids (*Illex* sp.), Atlantic herring (*Clupea harengus*), Herring family (Clupeidae), Bluefish (*Pomatomus saltatrix*), silver anchovy (*Engraulis eurystole*), longfin inshore squid (*Doryteuthis pealeii*), Atlantic mackerel (*Scomber scombrus*), flatfish (Pleuronectiformes), weakfish (*Cynoscion regalis*), and Atlantic menhaden (*Brevoortia tyrannus*). 

Prey categories such as fish unidentified, Osteichthyes, and unidentified animal remains were not included in the prey list. Although unidentified fish and Osteichthyes can comprise a significant portion of bluefish stomach contents, we cannot assume that unidentified fish in other predator stomachs represent unidentified fish in bluefish stomachs.

#### Predators feeding similarly to bluefish 

All size classes of 50 fish predators captured in the NEFSC bottom trawl survey were grouped by diet similarity to identify the size classes of piscivore species with the most similar diet to bluefish in the region. Diet similarity analysis was completed using the Schoener similarity index (@schoener1970; B. Smith, pers. comm.), and is available via [this link on the NEFSC food habits shiny app](https://fwdp.shinyapps.io/tm2020/#4_DIET_OVERLAP_AND_TROPHIC_GUILDS). The working group evaluated several clustering methods to develop the predator list (see [this link with detailed cluster results](https://sgaichas.github.io/bluefishdiet/PreySimilarityUpdate.html)). 

Predators with highest diet similarity to Bluefish from the NEFSC diet database (1973-2020) include Atlantic cod, Atlantic halibut, buckler dory, cusk, fourspot flounder, goosefish, longfin squid, shortfin squid, pollock, red hake, sea raven, silver hake, spiny dogfish, spotted hake, striped bass, summer flounder, thorny skate, weakfish, and white hake. The NEAMAP survey operates closer to shore than the current NEFSC survey. The NEAMAP dataset includes predators sampled by the NEFSC survey and adds two species, Spanish mackerel and spotted sea trout, not captured by the NEFSC survey offshore but included based on working group expert judgement of prey similarity to bluefish. Predator size classes included are listed in Table 2 of the forage fish index working paper at [this link](https://sgaichas.github.io/bluefishdiet/VASTcovariates_forageindex_WP.html). 

#### VAST Input Dataset

Diets from all 22 piscivores (including bluefish) were combined for the 20 forage fish (bluefish prey) groups at each surveyed location, and the mean weight of forage fish per predator stomach at each location was calculated. Data for each station included station ID, year, season, date, latitude, longitude, vessel, mean bluefish prey weight (g), mean piscivore length (cm), number of piscivore species, and sea surface temperature (degrees C). Because approximately 10% of survey stations were missing in-situ sea water temperature measurements, National Oceanic and Atmospheric Administration Optimum Interpolation Sea Surface Temperature (NOAA OI SST) V2 High Resolution Dataset [@Reynolds2007] data provided by the NOAA PSL, Boulder, Colorado, USA, from their website at https://psl.noaa.gov were used to fill gaps. For survey stations with in-situ temperature measurements, the in-situ measurement was retained. For survey stations with missing temperature data, OI SST was substituted for input into VAST models.
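
The temperature gap-filling rule described above can be sketched as follows. This is a minimal illustration with made-up station records; the column names (`insitu_sst`, `oisst`, `sst_source`) are assumptions for the example and do not necessarily match those in the working-group dataset.

```r
# Made-up station records; values and column names are illustrative only.
stations <- data.frame(
  station_id = c("A1", "A2", "A3", "A4"),
  insitu_sst = c(12.4, NA, 18.1, NA),     # degrees C; NA = no in-situ reading
  oisst      = c(12.9, 15.2, 17.6, 20.3)  # degrees C; matched OISST value
)

# Keep the in-situ measurement where it exists; otherwise substitute OISST.
missing <- is.na(stations$insitu_sst)
stations$sst <- ifelse(missing, stations$oisst, stations$insitu_sst)

# Record which source supplied each value so the substitution is traceable.
stations$sst_source <- ifelse(missing, "OISST", "in-situ")
print(stations)
```

Keeping a source flag alongside the filled value makes it straightforward to check how many stations relied on the satellite product.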

The final dataset input to VAST is available at [this link](https://github.com/sgaichas/bluefishdiet/blob/main/fhdat/bluepyagg_stn_all_OISST.rds). 

#### VAST Modeling

Models were developed combining all data for the year ("Annual") and with separate data for "Spring" (collection months January-June) and "Fall" (collection months July-December) to align with assumptions used in the bluefish stock assessment. Modeled years included 1985-2021 to align with other data inputs in the bluefish stock assessment. 

VAST is structured to estimate fixed and random effects across two linear predictors, which are then multiplied to estimate an index of the quantity of interest. Following what @ng2021 did for herring, we apply a Poisson-link delta model to estimate expected prey mass per predator stomach. However, we use a higher resolution (500 knots, estimated by k-means clustering of the data) to define the spatial dimensions of each seasonal model. Two-step model selection first compared whether the data supported estimation of spatial and spatio-temporal random effects, and then evaluated whether catchability covariates improved fits. Best fit models included spatial and spatio-temporal random effects, with predator mean length, number of predator species, and sea surface temperature as catchability covariates. See [this link for detailed results of model selection](https://sgaichas.github.io/bluefishdiet/VASTcovariates_forageindex_WP.html). 
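
To make the "two linear predictors, multiplied" idea concrete, the sketch below writes out the Poisson-link delta transformation as we understand it: the first predictor gives a numbers density `n`, the second gives a mass per individual `w`, and these are converted to an encounter probability and a positive catch rate whose product is `a * n * w`. This is an illustrative reading only, not code from VAST; the function `poisson_link()` and its argument names are hypothetical, and the VAST documentation remains the authoritative definition.

```r
# Hypothetical helper illustrating a Poisson-link delta transformation.
# n: numbers density (exp of the first linear predictor)
# w: mass per individual (exp of the second linear predictor)
# a: area swept (set to 1 when the effective sampling area is unknown)
poisson_link <- function(n, w, a = 1) {
  p <- 1 - exp(-a * n)  # probability of encountering at least one individual
  r <- a * n * w / p    # expected mass given an encounter
  data.frame(encounter_prob = p, positive_rate = r, expected = p * r)
}

out <- poisson_link(n = c(0.1, 1, 5), w = c(2, 2, 2))
print(out)
```

Under this formulation the expected value per sample equals `a * n * w`, which is why the index is proportional to the product of the two predictors.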

Similar to findings of @ng2021, a vessel effect was not supported, but the predator length covariate may more directly model vessel differences in predator catch that affect stomach contents than modeling a vessel catchability covariate directly. @ng2021 found that predator length covariates were strongly supported as catchability covariates (larger predators being more likely to have more prey in stomachs). In our aggregate predator dataset, we also found strong support for including the number of predator species as a catchability covariate. The rationale for including number of predator species is that more species "sampling" the prey field at a particular station may result in a higher encounter rate (more stations with positive bluefish prey in stomachs). Water temperature was also supported as a catchability covariate, perhaps because temperature affects predator feeding rate.

Although the forage fish index is based on trawl-surveyed fish predators and the area swept of the net capturing predators is available, determining the actual area swept of the predators "sampling" the prey field is less clear. Therefore, we set area swept to 1 as recommended for "sampling gears" with unknown effective sampling areas, which means our forage abundance index does not have an interpretable scale, but should be proportional to actual forage biomass [@thorson2019]. 

#### Spatial Forage Indices

Spring, fall, and annual prey indices were developed for the full VAST extrapolation grid, as well as [survey strata-based EPU definitions](https://noaa-edab.github.io/tech-doc/survdat.html#fig:epustrata). NEFSC survey strata definitions are built into the VAST `northwest-atlantic` extrapolation grid. The forage index was calculated and bias corrected [@thorson2016] for all defined strata within VAST.

Full VAST model results for Fall, Spring, and Annual models, along with diagnostics, are available at [this link](https://sgaichas.github.io/bluefishdiet/VASTcovariates_forageindex_WP.html).

Code used to develop this indicator from the full set of VAST model index outputs is [available here](https://github.com/NOAA-EDAB/tech-doc/blob/master/R/stored_scripts/SOE-VASTForageIndices.R).

<!--chapter:end:chapters/Forage_Fish_Biomass_Index.Rmd-->

# Forage Fish Energy Density {#energy_density}

**Description**: Forage Energy Density indicators

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2020+), State of the Ecosystem - Mid-Atlantic (2020+)

**Indicator category**: Database pull with analysis

**Contributor(s)**: Mark Wuenschel, Ken Oliveira and Kelcie Bean
  
**Data steward**: Mark Wuenschel <mark.wuenschel@noaa.gov>
  
**Point of contact**: Mark Wuenschel <mark.wuenschel@noaa.gov>
  
**Public availability statement**: Source data are publicly available.

## Methods

The forage fish energy density indicator comes from a collaborative project between UMASS Dartmouth Biology Department (Dr. Ken Oliveira, M.S. student Kelcie Bean) and NEFSC Population Biology Branch (Mark Wuenschel). The study focuses on evaluating energy content of the species in Table \@ref(tab:foragefish).

```{r foragefish, eval = T, echo = F}

# Study species for the forage fish energy density project
forage.tab <- data.frame(
  "Common Name" = c("Atlantic herring", "alewife", "silver hake",
                    "butterfish", "northern sandlance", "Atlantic mackerel",
                    "longfin squid", "northern shortfin squid"),
  "Scientific Name" = c("*Clupea harengus*", "*Alosa pseudoharengus*",
                        "*Merluccius bilinearis*", "*Peprilus triacanthus*",
                        "*Ammodytes dubius*", "*Scomber scombrus*",
                        "*Loligo pealeii*", "*Illex illecebrosus*"),
  check.names = FALSE  # keep the spaces in the column names
)

# Render the table; the chunk label makes it referenceable as tab:foragefish
knitr::kable(forage.tab, caption = "List of forage fish study species.", booktabs = TRUE)
```


### Data sources
NEFSC spring and fall bottom trawl surveys. 

### Data extraction
NA 

### Data analysis

Samples from NEFSC spring and fall bottom trawl surveys were analyzed for proximate composition and energy density. Predictive relationships between the percent dry weight of samples and energy density were developed, and samples from current surveys are being analyzed for percent dry weight to enable estimation of energy content (@Bean2020). The energy density of forage species differed from that reported in prior studies in the 1980s and 1990s (@steimle1985, @lawson1998, Table \@ref(tab:foragefish)).

Sampling and laboratory analyses are ongoing, with the goal of continuing routine monitoring of the energy density of these species. 
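
The idea of predicting energy density from percent dry weight can be sketched as below. This is an illustration only: the calibration values are synthetic, the units are assumed, and a simple linear fit is assumed because the functional form used in the study is not stated here.

```r
# Synthetic calibration data for illustration only; not measured values.
calib <- data.frame(
  pct_dry_weight = c(20, 22, 25, 28, 30, 33),
  energy_density = c(4.1, 4.9, 6.0, 7.2, 7.9, 9.1)  # assumed units: kJ per g wet weight
)

# Assumed functional form: a simple linear relationship.
fit <- lm(energy_density ~ pct_dry_weight, data = calib)

# Apply the fitted relationship to new samples with only dry weight measured.
new_samples <- data.frame(pct_dry_weight = c(24, 31))
new_samples$predicted_ed <- predict(fit, newdata = new_samples)
print(new_samples)
```

Once such a relationship is calibrated, routine monitoring only needs the dry-weight measurement for each new sample.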

### Data processing

Code for building the table used in the SOE can be found 
[here](https://github.com/NOAA-EDAB/ecodata/blob/master/chunk-scripts/macrofauna.Rmd-forage.R).

<!--chapter:end:chapters/forage_energy_density.Rmd-->

# Gray Seal Pups {#seal_pups}

**Description**: Gray seal pup counts 
 

**Indicator category**: Extensive analysis

**Found in**: State of the Ecosystem - New England (2023)

**Contributor(s)**: Stephanie Wood
  
**Data steward**: Stephanie Wood <Stephanie.Wood@noaa.gov>
  
**Point of contact**: Stephanie Wood <Stephanie.Wood@noaa.gov>
  
**Public availability statement**: Reach out to Stephanie Wood <stephanie.wood@noaa.gov> for data. 


## Methods

### Data sources

Data come from NOAA's NEFSC Aerial Surveys (@wood2022).

### Data analysis

Image processing and modelling are described in an NEFSC Center Reference Document (@wood2022). 

### Data Processing

The gray seal pup indicator was formatted for inclusion in the `ecodata` R package with the code found [here](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_sealpups.R).

<!--chapter:end:chapters/seal_pups.Rmd-->

# Gulf Stream Index {#gsi}

**Description**: Annual time series of the Gulf Stream index

**Indicator category**: Published method

**Found in**: State of the Ecosystem - New England (2019 (Different Methods), 2020+), 
State of the Ecosystem - Mid-Atlantic (2019 (Different Methods), 2020+) 

**Contributor(s)**: Zhuomin Chen, Young-oh Kwon
  
**Data steward**: Vincent Saba, <vincent.saba@noaa.gov>
  
**Point of contact**: Vincent Saba, <vincent.saba@noaa.gov>
  
**Public availability statement**: Source data are publicly available at [CMEMS](http://marine.copernicus.eu/services-portfolio/access-to-products/?option=com_csw&view=details&product_id=SEALEVEL_GLO_PHY_L4_REP_OBSERVATIONS_008_047). Index data are NOT publicly available so please email vincent.saba@noaa.gov for further information and queries of GSI indicator data.

## Methods
The methods used to calculate the Gulf Stream Index changed between the 2019 and 2020 SOEs. The most recent methods are described first, with the older methods below.

The Gulf Stream index is a position anomaly: the larger the value of the index, the farther north the northern wall of the Gulf Stream is for that year. 

### Data sources
Data used in this analysis come from the Copernicus Marine Environment Monitoring Service [CMEMS - GLOBAL OCEAN GRIDDED L4 SEA SURFACE HEIGHTS AND DERIVED VARIABLES REPROCESSED (1993-ONGOING)](http://marine.copernicus.eu/services-portfolio/access-to-products/?option=com_csw&view=details&product_id=SEALEVEL_GLO_PHY_L4_REP_OBSERVATIONS_008_047).

### Data analysis

The GSI is calculated based on the method presented by @perez-hernandez2014, who constructed a simple 16-point Gulf Stream index by selecting grid points following the maximum standard deviation of sea level height anomalies every 1.33° of longitude between 52°W and 72°W and averaging them. The value of 1.33° is based on the resolution of the AVISO satellite dataset. We followed the same method but used the dataset from [CMEMS](http://marine.copernicus.eu/services-portfolio/access-to-products/?option=com_csw&view=details&product_id=SEALEVEL_GLO_PHY_L4_REP_OBSERVATIONS_008_047), which has a 0.25° x 0.25° resolution. We therefore selected points every 1° of longitude between 52°W and 72°W, giving 21 points in total, and averaged them.
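
The point-selection and averaging steps might be sketched as follows. This is a toy example on random numbers, not the contributors' code: the array layout, the latitude range, and the reading that sea level anomalies at the selected points are averaged at each time step are all assumptions made for illustration.

```r
set.seed(1)

lon <- seq(-72, -52, by = 1)    # one point per degree of longitude (21 in total)
lat <- seq(33, 45, by = 0.25)   # assumed latitude search range
n_time <- 24                    # number of time steps in the toy series

# Synthetic sea level anomalies, dimensions [lon, lat, time].
sla <- array(rnorm(length(lon) * length(lat) * n_time),
             dim = c(length(lon), length(lat), n_time))

# Standard deviation through time at every grid cell.
sla_sd <- apply(sla, c(1, 2), sd)

# For each longitude, the latitude index where that standard deviation peaks.
lat_idx <- apply(sla_sd, 1, which.max)

# Anomaly time series at each selected point: a [time, lon] matrix.
sel <- vapply(seq_along(lon),
              function(i) sla[i, lat_idx[i], ],
              numeric(n_time))

# Average across the selected points at each time step.
gsi <- rowMeans(sel)
```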

### Data Processing

The Gulf Stream index data set was formatted for inclusion in the `ecodata` R package with the code found [here](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_gsi.R).

## 2019 Methods

As summarized from @joyce2019, ocean temperature data from NOAA NODC were sorted by latitude, longitude, and time using a resolution of 1`r ifelse(knitr::is_latex_output(), '\\textdegree', '&deg;')` of longitude and latitude and 3 months of time, with a Gaussian squared weighting from the desired point in a window twice the size of the desired resolution. Editing was used to reject duplicate samples and 3$\sigma$ outliers from each selected sample point prior to performing the weighting and averaging; the latter was only carried out when there were at least three data points in the selected interval for each sample point. Typically, 50 or more data values were available. The resulting temperature field was therefore smoothed. Data at nine points along the Gulf Stream north wall were used to assemble a spatial/temporal sampling of the temperature at 200 m along the north wall from 75`r ifelse(knitr::is_latex_output(), '\\textdegree ', '&deg;')`W to 55`r ifelse(knitr::is_latex_output(), '\\textdegree ', '&deg;')`W. The leading mode of temperature variability of the Gulf Stream is equivalent to a north‐south shift of 50–100 km, which is zonally of one sign and amounts to 50\% of the seasonal‐interannual variance between 75`r ifelse(knitr::is_latex_output(), '\\textdegree ', '&deg;')`W and 55`r ifelse(knitr::is_latex_output(), '\\textdegree ', '&deg;')`W. The temporal behavior of this mode (PC1) shows the temporal shift of the Gulf Stream path with a dominant approximately 8‐ to 10‐year periodicity over much of the period. 
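
The outlier-editing rule can be illustrated with a simplified sketch. It applies 3$\sigma$ rejection and the three-point minimum to a single sample point, but uses an unweighted mean in place of the Gaussian squared weighting and omits duplicate removal; the helper `edited_mean()` and the temperature values are hypothetical.

```r
# Hypothetical helper: reject values more than n_sigma standard deviations
# from the mean, then average only if at least min_n values remain.
# Simplification: an unweighted mean stands in for the Gaussian weighting.
edited_mean <- function(x, n_sigma = 3, min_n = 3) {
  x <- x[!is.na(x)]
  if (length(x) < min_n) return(NA_real_)
  keep <- abs(x - mean(x)) <= n_sigma * sd(x)
  x <- x[keep]
  if (length(x) < min_n) return(NA_real_)
  mean(x)
}

# Made-up 200 m temperatures: 49 plausible values and one gross outlier.
temps <- c(rep(c(14.8, 15.0, 15.2), length.out = 49), 40)
edited_mean(temps)        # the outlier is dropped before averaging
edited_mean(c(15.1, NA))  # too few points, so no estimate is returned
```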

### Data Sources

Ocean temperatures at 200 m are available at https://www.nodc.noaa.gov/OC5/3M_HEAT_CONTENT/.

### Data analysis

For detailed analytical methods, see @joyce2019. 

### Data processing and plotting

Data processing and plotting remained the same between years. 

<!--chapter:end:chapters/gulf_stream_index.Rmd-->

# Habitat Occupancy Models {#hab-occu}

**Description**: Habitat Occupancy

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2018), State of the Ecosystem - Mid-Atlantic (2018)

**Indicator category**: Database pull with analysis; Extensive analysis; not yet published; Published methods

**Contributor(s)**: Kevin Friedland
  
**Data steward**: Kevin Friedland, <kevin.friedland@noaa.gov>
  
**Point of contact**: Kevin Friedland, <kevin.friedland@noaa.gov>
  
**Public availability statement**: Source data are available upon request (see [Survdat](#survdat), [CHL/PP](#chl-pp), and Data Sources below for more information). Model-derived time series are available [here](https://comet.nefsc.noaa.gov/erddap/tabledap/SOE_habitat_soe_v1.html).


## Methods
Habitat area with a probability of occupancy greater than 0.5 was modeled for many [species throughout the Northeast Large Marine Ecosystem (NE-LME)](https://www.nefsc.noaa.gov/ecosys/current-conditions/occupancy-change.html) using random forest decision tree models. 

### Data sources
Models were parameterized using a suite of static and dynamic predictor variables, with occurrence and catch per unit effort (CPUE) of species from spring and fall Northeast Fisheries Science Center (NEFSC) bottom trawl surveys (BTS) serving as response variables. Sources of variables used in the analyses are described below.

#### Station depth
The NEFSC BTS data included depth observations made concurrently with trawls at each station. Station depth was a static variable for these analyses. 

#### Ocean temperature and salinity
Sea surface and bottom water temperature and salinity measurements were included as dynamic predictor variables in the model, and were collected using Conductivity/Temperature/Depth (CTD) instruments. Ocean temperature and salinity measurements had the highest temporal coverage during the spring (February-April) and fall (September-November) months. Station salinity data were available from 1992 to 2016. 

#### Habitat descriptors
A variety of benthic habitat descriptors were incorporated as predictor variables in occupancy models (Table \@ref(tab:habitatdesc)). The majority of these parameters are based on depth (e.g. *BPI*, *VRM*, *Prcurv*, *rugosity*, *seabedforms*, *slp*, and *slpslp*). The vorticity variable is based on current estimates, and the variable *soft_sed* is based on sediment grain size. 


```{r habitatdesc, echo = F, results='asis', message=F, warning=F}

tab <- '
|Variables|Notes|References|
|:-----------------------|:-----------------------|:-----------------------|
|Complexity - Terrain Ruggedness Index|The difference in elevation values from a center cell and the eight cells immediately surrounding it. Each of the difference values are squared to make them all positive and averaged. The index is the square root of this average.|@Riley1999|
|Namera bpi|BPI is a second order derivative of the surface depth using the TNC Northwest Atlantic Marine Ecoregional Assessment ("NAMERA") data with an inner radius=5 and outer radius=50.|@Lundblad2006|
|Namera_vrm|Vector Ruggedness Measure (VRM) measures terrain ruggedness as the variation in three-dimensional orientation of grid cells within a neighborhood based on The Nature Conservancy Northwest Atlantic Marine Ecoregional Assessment ("NAMERA") data.|@Hobson1972; @Sappington2007|
|Prcurv (2 km, 10 km, and 20 km)|Benthic profile curvature at 2km, 10km and 20 km spatial scales was derived from depth data.|@Winship2018|
|Rugosity|A measure of small-scale variations of amplitude in the height of a surface, the ratio of the real to the geometric surface area.|@Friedman2012|
|seabedforms|Seabed topography as measured by a combination of seabed position and slope.|[http://www.northeastoceandata.org/](http://www.northeastoceandata.org/)|
|Slp (2 km, 10 km, and 20 km)|Benthic slope at 2km, 10km and 20km spatial scales.|@Winship2018|
|Slpslp (2 km, 10 km, and 20 km)|Benthic slope of slope at 2km, 10km and 20km spatial scales|@Winship2018|
|soft_sed|Soft-sediments is based on grain size distribution from the USGS usSeabed: Atlantic coast offshore surficial sediment data.|[http://www.northeastoceandata.org/](http://www.northeastoceandata.org/)|
|Vort (fall - fa; spring - sp; summer - su; winter - wi)|Benthic current vorticity at a 1/6 degree (approx. 19 km) spatial scale.|@Kinlan2016|
'

df<-readr::read_delim(tab, delim="|")
df<-df[-c(1,2,3) ,c("Variables","Notes","References")]
knitr::kable(
  df, booktabs = TRUE,
  caption = 'Habitat descriptors used in model parameterization.') %>% 
  kableExtra::kable_styling(font_size = 6) %>% 
  landscape()
```

#### Zooplankton
Zooplankton data are acquired through the NEFSC Ecosystem Monitoring Program ("EcoMon"). For more information regarding the collection process for these data, see @Kane2007, @Kane2011, and @Morse2017. The bio-volumes of the 18 most abundant zooplankton taxa were considered as potential predictor variables.  

#### Remote sensing data
Both chlorophyll concentration and sea surface temperature (SST) from remote sensing sources were incorporated as static predictor variables in the model. During the period of 1997-2016, chlorophyll concentrations were derived from observations made by the Sea-viewing Wide Field-of-view Sensor (SeaWiFS), Moderate Resolution Imaging Spectroradiometer (MODIS-Aqua), Medium Resolution Imaging Spectrometer (MERIS), and Visible Infrared Imaging Radiometer Suite (VIIRS). 

### Data processing

#### Zooplankton
Missing values in the EcoMon time series were addressed by summing data over five-year time steps for each seasonal time frame and interpolating a complete field using ordinary kriging. Missing values necessitated interpolation for spring data in 1989, 1990, 1991, and 1994. The same was true of the fall data for 1989, 1990, and 1992.

#### Remote sensing data
An overlapping time series of observations from the four sensors listed above was created using a bio-optical model inversion algorithm [@Maritorena2010]. Monthly SST data were derived from MODIS-Terra sensor data (available [here](https://oceancolor.gsfc.nasa.gov/data/terra/)). 

#### Ocean temperature and salinity
Date-of-collection corrections for ocean temperature data were developed using linear regressions for the spring and fall time frames, standardizing to collection dates of April 3 and October 11 for spring and fall, respectively. No correction was performed for salinity data. Annual data for ocean temperature and salinity were combined with climatology by season through an optimal interpolation approach. Specifically, mean bottom temperature or salinity was calculated by year and season on a 0.5&deg; grid across the ecosystem, and data from grid cells with >80% temporal coverage were used to calculate a final seasonal mean. Annual seasonal means were then used to calculate combined anomalies for seasonal surface and bottom climatologies. 
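
One plausible reading of the date-of-collection correction is sketched below: temperature is regressed on day of year within a season, and each observation is shifted along the fitted slope to the standard date. The regression structure actually used is not described here, so this is an assumption, and the observations are synthetic.

```r
# Synthetic spring observations: day of year and temperature (degrees C).
obs <- data.frame(
  doy  = c(60, 75, 90, 105, 120),
  temp = c(4.0, 4.6, 5.2, 5.8, 6.4)
)

# Seasonal warming rate estimated by linear regression (assumed structure).
fit <- lm(temp ~ doy, data = obs)
slope <- unname(coef(fit)["doy"])

# Standard spring date of April 3, expressed as day of year in a non-leap year.
target_doy <- as.integer(format(as.Date("2015-04-03"), "%j"))

# Shift each observation to the value expected on the standard date.
obs$temp_std <- obs$temp + slope * (target_doy - obs$doy)
print(obs)
```

Because the synthetic data are perfectly linear, every corrected value collapses to the same temperature, which is a convenient check that the adjustment removes the date effect.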

An annual field was then estimated using raw data observations for a year, season, and depth using universal kriging [@automap], with depth included as a covariate (on a standard 0.1&deg; grid). This field was then combined with the climatology anomaly field and adjusted by the annual mean using the variance field from kriging as the basis for a weighted mean between the two. The variance field was divided into quartiles with the lowest quartile assigned a weighting of 4:1 between the annual and climatology values. The optimally interpolated field at these locations was therefore skewed towards the annual data, reflecting their proximity to actual data locations and associated low kriging variance. The highest kriging variance quartile (1:1) reflected less information from the annual field and more from the climatology.
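
The variance-based weighting can be sketched as follows. Only the 4:1 weighting for the lowest-variance quartile and the 1:1 weighting for the highest are stated above; the intermediate weights of 3:1 and 2:1 used here are assumptions, and the helper `blend_fields()` and all values are hypothetical.

```r
# Hypothetical helper: weighted mean of annual and climatology values, with
# the annual weight set by the quartile of kriging variance.
# weights[1] applies to the lowest-variance quartile, weights[4] to the highest.
# The middle two weights (3 and 2) are assumed for illustration.
blend_fields <- function(annual, clim, krig_var, weights = c(4, 3, 2, 1)) {
  q <- cut(krig_var,
           breaks = quantile(krig_var, probs = seq(0, 1, 0.25)),
           include.lowest = TRUE, labels = FALSE)
  w <- weights[q]
  (w * annual + clim) / (w + 1)
}

annual   <- rep(10, 8)           # made-up annual kriged values
clim     <- rep(5, 8)            # made-up climatology-based values
krig_var <- seq(0.1, 0.8, 0.1)   # made-up kriging variances

blended <- blend_fields(annual, clim, krig_var)
print(round(blended, 3))
```

Cells with low kriging variance sit near observations, so the blended value stays close to the annual estimate; cells far from data are pulled toward the climatology.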

### Data analysis

#### Occupancy models
Prior to fitting the occupancy models, predictor variables were tested for multi-collinearity and removed if found to be correlated. The final model variables were then chosen utilizing a model selection process as shown by @Murphy2010 and implemented with the R package `rfUtilities` [@rfUtilities-package]. Occupancy models were then fit as two-factor classification models (absence as 0 and presence as 1) using the `randomForest` R package [@randomForest]. 

#### Selection criteria and variable importance
The `irr` R package [@irr] was used to calculate Area Under the ROC Curve (AUC) and Cohen's Kappa for assessing accuracy of occupancy habitat models. Variable importance was assessed by plotting the occurrence of a variable as a root variable versus the mean minimum node depth for the variable [@randomForestExplainer], as well as by plotting the Gini index decrease versus accuracy decrease.
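
For readers unfamiliar with Cohen's Kappa, the base R sketch below computes it directly from observed and predicted presence/absence. It is meant to show what the statistic measures and is not the `irr` call used in the analysis; the helper `cohens_kappa()` and the example vectors are hypothetical.

```r
# Hypothetical helper: Cohen's Kappa for binary presence (1) / absence (0).
cohens_kappa <- function(observed, predicted) {
  tab <- table(factor(observed, levels = c(0, 1)),
               factor(predicted, levels = c(0, 1)))
  n <- sum(tab)
  p_obs <- sum(diag(tab)) / n                       # observed agreement
  p_exp <- sum(rowSums(tab) * colSums(tab)) / n^2   # agreement expected by chance
  (p_obs - p_exp) / (1 - p_exp)
}

observed  <- c(1, 1, 1, 1, 0, 0, 0, 0, 1, 0)  # made-up survey outcomes
predicted <- c(1, 1, 1, 0, 0, 0, 0, 1, 1, 0)  # made-up model classifications

kappa <- cohens_kappa(observed, predicted)
print(kappa)
```

Kappa discounts the agreement that would arise by chance, so it is more conservative than raw classification accuracy when presences and absences are unbalanced.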

<!--chapter:end:chapters/occupancy_indicator.Rmd-->

# Habitat Diversity {#habitat_diversity}

**Description**: Species richness was derived from the Northeast Regional Habitat Assessment models for 55 common species sampled by the 2000-2019 spring and fall NEFSC bottom trawl surveys. The joint species distribution model controls for differences in capture efficiency across survey vessels.
 

**Indicator category**: Extensive analysis

**Found in**: State of the Ecosystem - New England (2023), 
State of the Ecosystem - Mid-Atlantic (2023) 

**Contributor(s)**: Chris Haak <ChrisHaak@gmail.com>
  
**Data steward**: Laurel Smith <laurel.smith@noaa.gov>
  
**Point of contact**: Laurel Smith <laurel.smith@noaa.gov>
  
**Public availability statement**: This analysis is based on NEFSC bottom trawl survey data, which are publicly available. Please reach out to Laurel Smith with questions.


## Methods


### Data sources

Abundance data were extracted from the NEFSC’s SVDBS database using Survdat for 55 fish and invertebrate species regularly sampled on spring and fall NEFSC bottom trawl surveys.

Species included in NRHA Diversity Index: 


| Common Name             | Scientific Name                   |
|-------------------------|-----------------------------------|
| Acadian Redfish         | *Sebastes fasciatus*              |
| Alewife                 | *Alosa pseudoharengus*            |
| American Lobster        | *Homarus americanus*              |
| American Plaice         | *Hippoglossoides platessoides*    |
| American Shad           | *Alosa sapidissima*               |
| Atlantic Cod            | *Gadus morhua*                    |
| Atlantic Croaker        | *Micropogonias undulatus*         |
| Atlantic Herring        | *Clupea harengus*                 |
| Atlantic Mackerel       | *Scomber scombrus*                |
| Barndoor Skate          | *Dipturus laevis*                 |
| Black Sea Bass          | *Centropristis striata*           | 
| Blackbelly Rosefish     | *Helicolenus dactylopterus*       |
| Blueback Herring        | *Alosa aestivalis*                |
| Bluefish                | *Pomatomus saltatrix*             |
| Butterfish              | *Peprilus triacanthus*            |
| Chain Dogfish           | *Scyliorhinus retifer*            |
| Clearnose Skate         | *Rostroraja eglanteria*           |
| Fawn Cusk Eel           | *Lepophidium profundorum*         |
| Fourbeard Rockling      | *Enchelyopus cimbrius*            |
| Fourspot Flounder       | *Hippoglossina oblonga*           |
| Goosefish               | *Lophius americanus*              |
| Gulf Stream Flounder    | *Citharichthys arctifrons*        |
| Haddock                 | *Melanogrammus aeglefinus*        |
| Horseshoe Crab          | *Limulus polyphemus*              |
| Jonah Crab              | *Cancer borealis*                 |
| Little Skate            | *Leucoraja erinaceus*             |
| Longfin Squid           | *Doryteuthis (Amerigo) pealeii*   |
| Longhorn Sculpin        | *Myoxocephalus octodecemspinosus* |
| Northern Searobin       | *Prionotus carolinus*             |
| Northern Shortfin Squid | *Illex illecebrosus*              |
| Northern Shrimp         | *Pandalus borealis*               |
| Ocean Pout              | *Zoarces americanus*              |
| Offshore Hake           | *Merluccius albidus*              |
| Pollock                 | *Pollachius virens*               |
| Red Hake                | *Urophycis chuss*                 |
| Rosette Skate           | *Leucoraja garmani*               |
| Scup                    | *Stenotomus chrysops*             |
| Sea Raven               | *Hemitripterus americanus*        |
| Sea Scallop             | *Placopecten magellanicus*        |
| Silver Hake             | *Merluccius bilinearis*           |
| Smooth Dogfish          | *Mustelus canis*                  |
| Smooth Skate            | *Malacoraja senta*                |
| Spiny Dogfish           | *Squalus acanthias*               |
| Spot                    | *Leiostomus xanthurus*            |
| Spotted Hake            | *Urophycis regia*                 |
| Striped Searobin        | *Prionotus evolans*               |
| Summer Flounder         | *Paralichthys dentatus*           |
| Thorny Skate            | *Amblyraja radiata*               |
| Weakfish                | *Cynoscion regalis*               |
| White Hake              | *Urophycis tenuis*                |
| Windowpane Flounder     | *Scophthalmus aquosus*            |
| Winter Flounder         | *Pseudopleuronectes americanus*   |
| Winter Skate            | *Leucoraja ocellata*              |
| Witch Flounder          | *Glyptocephalus cynoglossus*      |
| Yellowtail Flounder     | *Myzopsetta ferruginea*           |


Data were converted to presence/absence for species richness modeling.
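
As a minimal sketch of this conversion, assume a tow-by-species matrix of catch counts. The object name `catch`, the three species, and all values are hypothetical and serve only to illustrate the step.

```r
# Hypothetical tow-by-species catch matrix (rows = tows, columns = species);
# values are illustrative only, not survey data.
catch <- matrix(
  c(12, 0, 3,
     0, 0, 1,
     5, 2, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("tow", 1:3), c("cod", "haddock", "scup"))
)

# Any positive catch becomes 1 (present); zero catch becomes 0 (absent).
presence <- (catch > 0) * 1L

# Observed richness per tow is the number of species present.
tow_richness <- rowSums(presence)
```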

### Data analysis

#### Species Richness 

Estimated species richness is the number of unique species expected to be observed in NEFSC bottom trawl surveys conducted in a given ecological production unit (EPU) and year, based on a fitted joint species distribution/habitat suitability model (considering only the 55 commonly occurring species listed above).

 
#### Model Fitting


A spatiotemporal joint species distribution model was fitted to n = 13,231 observations of presence/absence in the Spring and Fall NEFSC bottom trawl surveys for the years 2000-2019, using the [Community Level Basis Function Model (CBFM) framework](https://github.com/fhui28/CBFM) with a binomial error distribution and logit link function. The probability of presence was modeled as a function of environmental predictor variables (using smooth terms), a vessel effect (factor) to account for changes in sampling gear, and spatiotemporal (Lat, Lon, Month) and temporal (Year) random effects, which were estimated hierarchically through a set of species-common basis functions. The model thus controls for differences in capture efficiency across survey vessels, permitting predictions on a common scale (here calibrated to the FRV Albatross IV).
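
The CBFM fitting code is not reproduced here. As a much simpler stand-in, the sketch below fits a single-species binomial GLM with a logit link and a vessel factor to simulated data, then predicts with the vessel fixed to one reference level so that predictions share a common scale. This is not the CBFM model: it has no smooth terms, basis functions, or random effects, and the data, variable names, second vessel level, and effect sizes are all assumptions made for illustration.

```r
set.seed(1)

# Simulated tows (illustrative only): bottom temperature and survey vessel.
n <- 500
tows <- data.frame(
  bottom_temp = runif(n, 2, 20),
  vessel = factor(sample(c("Albatross IV", "Other vessel"), n, replace = TRUE),
                  levels = c("Albatross IV", "Other vessel"))
)

# Assumed true process: presence declines with temperature, and the second
# vessel is more efficient at catching the species.
eta <- 2 - 0.25 * tows$bottom_temp + 0.8 * (tows$vessel == "Other vessel")
tows$present <- rbinom(n, size = 1, prob = plogis(eta))

# Binomial GLM with logit link; the vessel factor absorbs differences in
# capture efficiency between vessels.
fit <- glm(present ~ bottom_temp + vessel,
           family = binomial(link = "logit"), data = tows)

# Predict every tow as if it had been sampled by the reference vessel.
ref <- tows
ref$vessel <- factor("Albatross IV", levels = levels(tows$vessel))
p_common <- predict(fit, newdata = ref, type = "response")
```

Fixing the vessel level at prediction time is what puts all predicted probabilities on the scale of a single vessel, regardless of which vessel made each tow.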

#### Environmental Covariates

Covariate values (i.e., environmental parameters) corresponding to the approximate location (and time, when applicable) of each observation (i.e., tow) were extracted from the following sources:

Monthly mean surface and bottom temperature, surface and bottom salinity, and sea surface height anomaly were obtained from the GLORYS12V1 reanalysis (@Lellouche2012), as were annual minimum and maximum surface and bottom temperatures.

Monthly mean underwater optical parameters, including the intensity (photosynthetically active radiation - PAR) and spectral composition (hue angle) of downwelling light at mid-water column, were estimated from remote sensing data, following the methods of @Lee2005 and @Lee2022, respectively.

Hydrodynamic stress near the seabed (95th quantile) was obtained from the USGS Sea Floor Stress and Sediment Mobility database (Dalyander et al. 2012).


Annually-integrated chlorophyll was obtained from the Oceancolour-CCI (version 5) release (https://www.oceancolour.org/).

Bathymetric position index (BPI), benthic structural complexity, and sediment type data were estimated following the methods described at: https://www.conservationgateway.org/ConservationByGeography/NorthAmerica/UnitedStates/edc/reportsdata/marine/namera/namera/Pages/default.aspx/

#### Estimating Richness

Simulating from the fitted model, we generated 100 random draws of “joint” predictions of the species assemblage observed in the survey, taking into account species residual covariances (see @Wilkinson2021 for additional details). We used these to produce estimates of the mean species richness (and corresponding 95% prediction intervals) across all observations within each ecological production unit (EPU) for each modeled year (2000-2019).
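
The summary step can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the draws are simulated as independent Bernoulli outcomes (real joint draws would come from the fitted model and carry residual covariances), the EPU labels and years are hypothetical, and richness is assumed to be averaged across tows within each draw before summarising across draws.

```r
set.seed(42)

# Illustrative dimensions: 100 draws (as in the text), 40 tows, 55 species.
n_draws <- 100
n_tows <- 40
n_species <- 55

# Hypothetical tow metadata.
tows <- data.frame(
  EPU = rep(c("GB", "GOM"), each = n_tows / 2),
  Year = rep(c(2000, 2001), times = n_tows / 2)
)

# Stand-in for joint predictive draws: a draws x tows x species array of 0/1.
draws <- array(rbinom(n_draws * n_tows * n_species, 1, 0.3),
               dim = c(n_draws, n_tows, n_species))

# Richness per draw and tow = number of species predicted present.
richness <- apply(draws, c(1, 2), sum)   # n_draws x n_tows matrix

# For each EPU-year, average richness across tows within each draw, then
# summarise across draws with the mean and the 2.5% and 97.5% quantiles.
groups <- split(seq_len(n_tows), list(tows$EPU, tows$Year), drop = TRUE)
summary_tab <- do.call(rbind, lapply(names(groups), function(g) {
  per_draw <- rowMeans(richness[, groups[[g]], drop = FALSE])
  data.frame(group = g,
             mean = mean(per_draw),
             lower = unname(quantile(per_draw, 0.025)),
             upper = unname(quantile(per_draw, 0.975)))
}))
```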


### Data Processing

The Habitat Diversity indicator was formatted for inclusion in the `ecodata` R package with the code found [here](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_habitat_diversity.R).

<!--chapter:end:chapters/habitat_diversity.Rmd-->

# Habitat Vulnerability 

**Description**: A summary of habitat vulnerability and the importance of such habitats to managed species. 

**Indicator category**: Extensive analysis, not yet published

**Found in**: State of the Ecosystem - New England (2021), 
State of the Ecosystem - Mid-Atlantic (2021) 

**Contributor(s)**: Mark Nelson, Mike Johnson, Emily Farr, Grace Roskar
  
**Data steward**: Grace Roskar <grace.roskar@noaa.gov>
  
**Point of contact**: Mark Nelson, Mike Johnson, Emily Farr
  
**Public availability statement**: Data from the Northeast Fish Climate Vulnerability Assessment and ACFHP’s species-habitat matrix are publicly available. However, the data from the Northeast Habitat Climate Vulnerability Assessment are not yet published. Please email emily.farr@noaa.gov or mike.r.johnson@noaa.gov for further information and queries.

## Methods


### Data sources

Data came from the Northeast Habitat Climate Vulnerability Assessment (HCVA; not yet published), the Northeast Fish and Shellfish Climate Vulnerability Assessment ([FCVA](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0146756))  and the Atlantic Coastal Fish Habitat Partnership’s ([ACFHP](https://www.atlanticfishhabitat.org/species-habitat-matrix/#:~:text=The%20Species%2DHabitat%20Matrix%20is,selected%20fish%20and%20invertebrate%20species)) Species-Habitat Matrix. 

### Data analysis

We assessed the vulnerability of 52 marine, estuarine, and riverine habitats in the Northeast U.S. to climate change. The northern and southern boundaries of the study area are the U.S./Canadian border and Cape Hatteras, NC, respectively, and the study includes habitats out to the U.S. EEZ and up-river to capture the full range of diadromous species. 

This habitat climate vulnerability assessment (HCVA) builds on the Northeast Fish and Shellfish Climate Vulnerability Assessment (FCVA) completed in 2016 (@hare2016), and uses a similar framework. While the species assessment primarily examined climate vulnerability based on life history, the HCVA assesses the vulnerability of the habitats themselves to climate change, and complements the species assessment by improving our understanding of how vulnerable habitats will impact fish and invertebrate populations.

To better understand which species depend on vulnerable habitats, the Atlantic Coastal Fish Habitat Partnership (ACFHP) habitat-species matrix (@kritzer2016) was used in conjunction with the results of the HCVA and the FCVA. The ACFHP matrix identified the importance of nearshore benthic habitats to each life stage of select fish species, which helps identify species that may be highly dependent on the highly vulnerable habitats identified in the HCVA.

#### HCVA Methods:
The Northeast HCVA is a trait-based vulnerability assessment which was adapted from the framework developed for NOAA’s Fish Stock Climate Vulnerability Assessment (@morrison2015). The HCVA considers the overall vulnerability of habitat to climate change to be a function of two main components: exposure and sensitivity. The exposure component considers the magnitude and overlap of projected changes in climate with the distribution of each habitat. Climate exposure is assessed using end-of-century climate projections based on the Intergovernmental Panel on Climate Change RCP 8.5 emissions scenario. The sensitivity component includes nine habitat attributes, or traits, that are believed to be indicative of the response of a habitat to potential changes in climate. The assessment methodology relies on an expert opinion-based approach to determine the sensitivity of each habitat to potential climate change related impacts. The sensitivity is combined with the climate exposure information to determine the overall vulnerability rank. 

#### Methods for linking habitat vulnerability results with species:
The Atlantic Coastal Fish Habitat Partnership (ACFHP) habitat-species matrix (@kritzer2016) was consulted and linked with the results of the HCVA and the FCVA in order to understand which federally managed species are highly dependent on highly vulnerable habitats.

The ACFHP habitat-species matrix evaluated the importance of 26 benthic habitat types to select fish and invertebrate species. Each habitat type was assigned a rank of “very high,” “high,” “moderate,” or “low,” reflecting a species’ use of the habitat at a specific life stage. Detailed descriptions of the rationale behind the rankings can be found in @kritzer2016. 

Using habitat descriptions from @kritzer2016, the 26 habitats analyzed by ACFHP were matched with HCVA habitats that best fit under the same description. Several habitat types that were included in the HCVA but not assessed by ACFHP were omitted from this analysis (e.g., manmade hard bottom habitats, aquaculture, invasive salt marsh and wetlands, water column habitats). A species-habitat matrix was then created using the species that were assessed in both the FCVA and by ACFHP, and the habitat importance ranking (very high, high, moderate, low) from the ACFHP matrix for each habitat type. Because the ACFHP habitat types were broader, several HCVA habitats often fit under a single ACFHP habitat; therefore, to determine which HCVA habitat a species/life stage actually uses under the broader ACFHP habitat, species profiles from the Northeast Regional Habitat Assessment (NRHA) and EFH Source Documents were consulted.

Species highlighted here are those that are highly dependent on highly vulnerable habitats. To set the criteria, a ranking matrix was created that compares the habitat vulnerability rankings with the habitat importance rankings. For the purposes of this submission, “high dependence on a highly vulnerable habitat” encompasses moderate use of very highly vulnerable habitats, high use of highly or very highly vulnerable habitats, or very high use of moderately, highly, or very highly vulnerable habitats.
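
The stated criteria can be written out as a small lookup. The sketch below is illustrative only: the function and object names are hypothetical, and the four-level vulnerability scale (including a "low" level) is an assumption, since the text names only the moderate, high, and very high levels.

```r
# Ordered rank scales; the importance scale is taken from the text, the
# vulnerability scale is assumed to use the same four levels.
importance_levels <- c("low", "moderate", "high", "very high")
vulnerability_levels <- c("low", "moderate", "high", "very high")

# TRUE where a habitat-use / habitat-vulnerability pair meets the criteria:
#   moderate use  + very high vulnerability
#   high use      + high or very high vulnerability
#   very high use + moderate, high, or very high vulnerability
is_high_dependence <- function(importance, vulnerability) {
  imp <- match(importance, importance_levels)
  vul <- match(vulnerability, vulnerability_levels)
  (imp == 2 & vul == 4) | (imp == 3 & vul >= 3) | (imp == 4 & vul >= 2)
}

# Full ranking matrix: rows = habitat importance, columns = habitat vulnerability.
ranking_matrix <- outer(importance_levels, vulnerability_levels, is_high_dependence)
dimnames(ranking_matrix) <- list(importance = importance_levels,
                                 vulnerability = vulnerability_levels)
```

For example, `is_high_dependence("high", "very high")` is `TRUE`, while `is_high_dependence("moderate", "high")` is `FALSE`.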


### Data Processing

The Habitat Vulnerability information table was formatted for inclusion in the `ecodata` R package with the code found [here](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_habitat_vulnerability.R).

### Plotting

Table [found here](https://noaa-edab.github.io/ecodata/Hab_table). 


<!--chapter:end:chapters/habitat_vulnerability.Rmd-->

# Harbor Porpoise and Gray Seal Bycatch {#harborporpoise}


**Description**: Harbor Porpoise and Gray Seal Indicator

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2018+), State of the Ecosystem - Mid-Atlantic (2018+)

**Indicator category**: Synthesis of published information; Published methods

**Contributor(s)**: Christopher D. Orphanides, Debra Palka
  
**Data steward**: Debra Palka <debra.palka@noaa.gov>

**Point of contact**: Debra Palka <debra.palka@noaa.gov>

**Public availability statement**: Source data are available in public [stock assessment reports](https://www.fisheries.noaa.gov/national/marine-mammal-protection/marine-mammal-stock-assessment-reports-region).
  

## Methods


### Data sources
Reported harbor porpoise bycatch estimates and potential biological removal levels are detailed in the publicly available [Marine Mammal Protection Stock Assessments](https://www.fisheries.noaa.gov/national/marine-mammal-protection/marine-mammal-stock-assessment-reports-region). More detailed documentation of the methods employed can be found in NOAA Fisheries Northeast Fisheries Science Center (NEFSC) Center Reference Documents (CRDs) found on the NEFSC [publications page](https://www.nefsc.noaa.gov/publications/crd).


### Data extraction 
Annual gillnet bycatch estimates are documented in the CRDs. These feed into the Stock Assessment Reports which report both the annual bycatch estimate and the mean 5-year estimate. 

### Data analysis
Bycatch estimates as found in stock assessment reports were plotted along with confidence intervals. The confidence intervals were calculated from published CVs assuming a normal distribution ($\sigma = \bar{x} \times CV$; $CI = \bar{x} \pm 1.96\sigma$), where $\bar{x}$ is the published bycatch estimate.
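
A minimal sketch of this calculation is shown below. The years, estimates, and CVs are hypothetical values chosen for illustration and do not come from any stock assessment report.

```r
# Hypothetical published values (illustrative only).
bycatch <- data.frame(
  year = c(2015, 2016, 2017),
  estimate = c(300, 250, 400),   # annual bycatch estimate (animals)
  cv = c(0.30, 0.25, 0.20)       # published coefficient of variation
)

# Standard deviation implied by the CV, then a normal-approximation 95% interval.
bycatch$sd <- bycatch$estimate * bycatch$cv
bycatch$lower <- bycatch$estimate - 1.96 * bycatch$sd
bycatch$upper <- bycatch$estimate + 1.96 * bycatch$sd
```

Because the interval is symmetric around the estimate, the lower bound becomes negative whenever the CV exceeds 1/1.96 (about 0.51).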

Data were analyzed and formatted for inclusion in the `ecodata` R package using the R code found here, [Harbor Porpoise](https://raw.githubusercontent.com/NOAA-EDAB/ecodata/master/data-raw/get_harborporpoise.R) and [Gray Seal](https://raw.githubusercontent.com/NOAA-EDAB/ecodata/master/data-raw/get_grayseal.R).

<!--chapter:end:chapters/HP_indicator.Rmd-->

# Harmful Algal Bloom - Alexandrium Indicator {#habs}

**Description**: *Alexandrium catenella* annual cyst abundance in the Gulf of Maine

**Found in**: 2022 Indicator Catalog

**Indicator category**: Published methods, Database pull with analysis

**Contributor(s)**: Yizhen Li, NOAA/NOS NCCOS Stressor Detection and Impacts Division, HAB Forecasting Branch, Silver Spring MD
  
**Data steward**: Moe Nelson <david.moe.nelson@noaa.gov>
  
**Point of contact**: Moe Nelson <david.moe.nelson@noaa.gov>

**Public availability statement**: Source data are NOT publicly available. Data were provided upon request by Yizhen Li.  Data are also used in operational HAB forecast models, freely available to the public.


## Methods
### Data Sources

*Alexandrium* cysts in sediments of the Gulf of Maine have been monitored through a cooperative effort of NOAA, WHOI, and other partners for over twenty years. Sampling methods are described in @Anderson2005. In the annual survey cruises, samples are obtained with a Craib corer, and *Alexandrium* cysts are counted from the top 1 cm of the sediment layer. Results are extrapolated to estimate overall cyst abundance in the eastern, western, and entire Gulf of Maine. Results are reported as estimated total numbers of cells (in units of 10^16^ cells) in the Eastern Gulf of Maine (east of Penobscot Bay), Western Gulf of Maine (west of Penobscot Bay), Bay of Fundy (2003-2013 only), and the entire Gulf of Maine.


### Data Extraction
Tabular data were provided by Yizhen Li, NOAA/NOS NCCOS Stressor Detection and Impacts Division, HAB Forecasting Branch.

### Data Analysis

The spatial distribution and abundance of cyst cells from the annual survey are used to drive an ecosystem forecast model for the Gulf of Maine (@Anderson2005, @Li2009, @Li2020, @McGillicuddy2011). The model also includes many other dynamic oceanographic inputs such as currents, temperature, and nutrients. The operational Harmful Algal Bloom forecast is available online at https://coastalscience.noaa.gov/research/stressor-impacts-mitigation/hab-forecasts/gulf-of-maine-alexandrium-catenella-predictive-models/.

### Data Processing

Code for processing *Alexandrium* cyst data can be found [here](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_habs.R).

<!--chapter:end:chapters/habs_alexandrium.Rmd-->

# Harmful Algal Bloom Indicator - Mid Atlantic

```{r, echo = F, message=F}

#Load packages
library(knitr)
library(rmarkdown)

```

**Description**: An aggregation of reported algal bloom data in Chesapeake Bay from 2007 to 2017.

**Found in**: State of the Ecosystem - Mid-Atlantic (2018)

**Indicator category**: Database pull

**Contributor(s)**: Sean Hardison, Virginia Department of Health

**Data steward**: Kimberly Bastille, <kimberly.bastille@noaa.gov>

**Point of contact**: Kimberly Bastille, <kimberly.bastille@noaa.gov>

**Public availability statement**: Source data for this indicator are available [here](https://github.com/NOAA-EDAB/tech-doc/tree/master/data/CB_HAB). Processed time series can be found [here](http://comet.nefsc.noaa.gov/erddap/tabledap/CBhabs_ann_soe_v2.html).


## Methods
We presented two indicator time series for reports of algal blooms in the southern portion of Chesapeake Bay from 2007 to 2017. The first indicator was observations of algal blooms above 5000 cells ml^-1^. This threshold was developed by the Virginia Department of Health (VDH) for *Microcystis* spp. algal blooms based on World Health Organization guidelines [@WHO2003; @VDH2011]. VDH also uses this threshold for blooms of other algal species in Virginia waters. When cell concentrations are above 5000 cells ml^-1^, VDH recommends initiation of biweekly water sampling and that relevant local agencies be notified of the elevated cell concentrations.

The second indicator we reported, blooms of *Cochlodinium polykrikoides* at cell concentrations >300 cells ml^-1^, was chosen due to reports of high ichthyotoxicity at these levels. @Tang2009 showed that fish exposed to cultured *C. polykrikoides* at densities as low as 330 cells ml^-1^ experienced 100% mortality within 1 hour; this density is often far lower than *C. polykrikoides* cell concentrations seen in the field. Algal bloom data were not available for 2010 or 2015. The algal bloom information presented here is a synthesis of reported events and has been updated to include data not presented in the 2018 State of the Ecosystem Report.

### Data sources
Source data were obtained from VDH. Sampling, identification, and bloom characterization were completed by the VDH, Phytoplankton Analysis Laboratory at Old Dominion University (ODU), Reece Lab at the Virginia Institute of Marine Science (VIMS), and Virginia Department of Environmental Quality. Problem algal species were targeted for identification via light microscopy followed by standard or quantitative PCR assays and/or enzyme-linked immunosorbent assay (ELISA). Reports specifying full methodologies from ODU, VIMS, and VDH source data are available upon request.

### Data extraction
Data were extracted from a series of spreadsheets provided by the VDH. We quantified the number of algal blooms in each year reaching target cell density thresholds in the southern Chesapeake Bay.

(ref:r-extract) All reported algal blooms >5000 cells ml^-1^ (black), and reports of *C. polykrikoides* blooms >300 cells ml^-1^ (red) between 2007 and 2017.

R code used in extracting harmful algal bloom data can be found [here](https://github.com/NOAA-EDAB/tech-doc/tree/master/R/stored_scripts/mab_hab_extraction.R).
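
The counting step described in this section can be sketched as follows. This is not the stored extraction script: the data frame, its column names, and every value in it are hypothetical and stand in for the VDH spreadsheets.

```r
# Hypothetical bloom reports (illustrative only, not VDH data).
reports <- data.frame(
  year = c(2007, 2007, 2008, 2008, 2008, 2009),
  species = c("Cochlodinium polykrikoides", "Microcystis spp.",
              "Cochlodinium polykrikoides", "Cochlodinium polykrikoides",
              "Microcystis spp.", "Cochlodinium polykrikoides"),
  cells_per_ml = c(450, 12000, 6000, 200, 3000, 350)
)

# Indicator 1: any reported bloom above 5000 cells per ml.
above_5000 <- reports$cells_per_ml > 5000

# Indicator 2: C. polykrikoides blooms above 300 cells per ml.
cpoly_above_300 <- reports$species == "Cochlodinium polykrikoides" &
  reports$cells_per_ml > 300

# Number of qualifying reports in each year.
years <- factor(reports$year, levels = sort(unique(reports$year)))
counts <- data.frame(
  year = as.integer(levels(years)),
  n_above_5000 = as.vector(tapply(above_5000, years, sum)),
  n_cpoly_above_300 = as.vector(tapply(cpoly_above_300, years, sum))
)
```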


### Data analysis
No data analysis steps took place for this indicator.

<!--chapter:end:chapters/MAB_HABs_indicator.Rmd-->

# Harmful Algal Bloom Indicator - New England

**Description**: Regional incidence of shellfish bed closures due to presence of toxins associated with harmful algae. 

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2018)

**Indicator Category**: Synthesis of published information

**Contributor(s)**: Dave Kulis, Donald M Anderson, Sean Hardison

**Data steward**: Kimberly Bastille, <kimberly.bastille@noaa.gov>

**Point of contact**: Kimberly Bastille, <kimberly.bastille@noaa.gov>

**Public availability statement**: Data are publicly available (see Data Sources below).

## Methods
The New England Harmful Algal Bloom (HAB) indicator is a synthesis of shellfish bed closures related to the presence of HAB-associated toxins above threshold levels from 2007-2016 (Figure \@ref(fig:NE-HAB)). Standard detection methods were used to identify the presence of toxins associated with Amnesic Shellfish Poisoning (ASP), Paralytic Shellfish Poisoning (PSP), and Diarrhetic Shellfish Poisoning (DSP) by state and federal laboratories. 

#### Paralytic Shellfish Poisoning
The most common cause of shellfish bed closures in New England is the presence of paralytic shellfish toxins (PSTs) produced by the dinoflagellate *Alexandrium catenella*. All New England states except Maine relied on the Association of Official Analytical Chemists (AOAC) approved mouse bioassay method to detect PSTs in shellfish during the 2007-2016 period reported here [@Anonymous2005].

In Maine, PST detection methods were updated in May 2014 when the state adopted the hydrophilic interaction liquid chromatography (HILIC) UPLC-MS/MS protocol [@Boundy2015] in accordance with National Shellfish Sanitation Program (NSSP) requirements. Prior to this, the primary method used to detect PSTs in Maine was the mouse bioassay.

#### Amnesic Shellfish Poisoning
Amnesic shellfish poisoning (ASP) is caused by the toxin domoic acid (DA), which is produced by several phytoplankton species belonging to the genus *Pseudo-nitzschia*. In New England, an HPLC-UV method [@Quilliam1995] is used for ASP detection.

#### Diarrhetic Shellfish Poisoning
Diarrhetic Shellfish Poisoning (DSP) is rare in New England waters, but the presence of the DSP-associated okadaic acid (OA) in mussels was confirmed in Massachusetts in 2015 (J. Deeds, personal communication, July 7, 2018). Preliminary testing for OA in Massachusetts uses the commercially available Protein Phosphatase Inhibition Assay (PPIA), and results are confirmed through LC-MS/MS when necessary [@Smienk2012; @Stutts2017].


### Data sources


Data used in this indicator were drawn from the 2017 Report on the ICES-IOC Working Group on Harmful Algal Bloom Dynamics (WGHABD). The report and data are available [here](http://www.ices.dk/sites/pub/Publication%20Reports/Expert%20Group%20Report/SSGEPD/2017/01%20WGHABD%20-%20Report%20of%20the%20ICES%20-%20IOC%20Working%20Group%20on%20Harmful%20Algal%20Bloom%20Dynamics.pdf).


Closure information was collated from information provided by the following organizations:
```{r closuresrc, echo = F, include = T, results='asis', message=FALSE, warning=F}
# check.names = FALSE keeps the space in the "Source Organization" column name
tabl <- data.frame(State = c("Maine","New Hampshire","Massachusetts","Rhode Island","Connecticut"),
                   `Source Organization` = c("Maine Department of Marine Resources",
                                           "New Hampshire Department of Environmental Services",
                                           "Massachusetts Division of Marine Fisheries",
                                           "Rhode Island Department of Environmental Management",
                                           "Connecticut Department of Agriculture"),
                   check.names = FALSE)
knitr::kable(
  tabl, booktabs = TRUE,
  caption = 'Shellfish closure information providers.'
)

```


### Data extraction

Data were extracted visually from the original report, and their accuracy was confirmed with the report authors.


### Data analysis

No data analysis steps took place for this indicator.

### Plotting

The script used to develop the figure in the SOE report can be found [here](https://github.com/NOAA-EDAB/tech-doc/tree/master/R/stored_scripts/ne_hab_plotting.R).

<!--chapter:end:chapters/NE_HABs_indicator.Rmd-->

# Harmful Algal Blooms - Paralytic Shellfish Poisoning Indicator

**Description**: Paralytic Shellfish Poisoning (PSP) toxins in the Gulf of Maine

**Found In**: 2022 Indicator Catalog

**Indicator category**: Published methods, Database pull

**Contributor(s)**: Yizhen Li, NOAA/NOS NCCOS Stressor Detection and Impacts Division, HAB Forecasting Branch, Silver Spring, MD.  Ayman Mabrouk, NOAA/NOS NCCOS Marine Spatial Ecology Division, Silver Spring, MD.
  
**Data steward**: Moe Nelson <david.moe.nelson@noaa.gov>
  
**Point of contact**: Moe Nelson <david.moe.nelson@noaa.gov>

**Public availability statement**: Source data are NOT publicly available. Data can be acquired upon request.

## Methods
### Data Sources

The data set was provided by Yizhen Li, NOAA/NOS NCCOS Stressor Detection and Impacts Division, HAB Forecasting Branch, Silver Spring, MD. Graphics and summaries were developed by Ayman Mabrouk, NOAA/NOS NCCOS Marine Spatial Ecology Division, Biogeography Branch, Silver Spring, MD.

Original data were collected by the State of Maine, Department of Marine Resources, which samples and tests blue mussels (*Mytilus edulis*) in coastal shellfish areas for HAB biotoxins on a weekly basis from March through October.

[Maine Department of Marine Resources – Biotoxins in Maine](https://www.maine.gov/dmr/shellfish-sanitation-management/programs/biotoxininfo.html)

[Massachusetts Division of Marine Fisheries](https://www.mass.gov/service-details/psp-red-tide-monitoring)

[Massachusetts Division of Marine Fisheries – Shellfish classification areas](https://www.mass.gov/service-details/shellfish-classification-areas)

[New Hampshire Department of Environmental Services](https://www.des.nh.gov/water/healthy-swimming/harmful-algal-blooms)

### Data Analysis

Blue mussels (*Mytilus edulis*) were sampled at designated sites each year from March through October, and tissues were analyzed for the presence and quantity of Paralytic Shellfish Poisoning (PSP) toxins. Saxitoxin (STX) is a well-known PSP toxin, but a bloom can generate a range of related PSP toxins. Therefore, in many monitoring programs, toxins are reported as "µg STX equivalents per 100 grams of shellfish tissue", where the quantity of each toxin present is normalized by its toxicity relative to STX (@Chung2010).


Data include the total number of samples collected at multiple locations in each calendar year (2005-2019), the numbers of samples above and below the PSP threshold of 44 µg/100 g, and the percentage of samples above the threshold. Simple bar and line graphs are used to plot the values for each variable as time series for 2005-2019.

The operational Harmful Algal Bloom forecast is available online at https://coastalscience.noaa.gov/research/stressor-impacts-mitigation/hab-forecasts/gulf-of-maine-alexandrium-catenella-predictive-models/.
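
The yearly sample summary described in this section can be sketched as follows. The sample values and object names are hypothetical, and counting a sample exactly at the threshold as "below" is an assumption made for the example.

```r
# Hypothetical mussel toxicity results in ug STX eq. per 100 g (illustrative only).
samples <- data.frame(
  year = c(2005, 2005, 2005, 2006, 2006),
  toxicity = c(10, 60, 44, 120, 30)
)

threshold <- 44  # ug per 100 g, as stated in the text

# Flag samples strictly above the threshold (assumption: ties count as below).
samples$above <- samples$toxicity > threshold

# Summarise by calendar year.
n_total <- tapply(samples$above, samples$year, length)
n_above <- tapply(samples$above, samples$year, sum)
summary_by_year <- data.frame(
  year = as.integer(names(n_total)),
  n_samples = as.vector(n_total),
  n_above = as.vector(n_above),
  n_below = as.vector(n_total - n_above),
  pct_above = as.vector(100 * n_above / n_total)
)
```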

### Data Processing

Code for processing PSP data can be found [here](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_habs.R).

<!--chapter:end:chapters/habs_psp.Rmd-->

# Highly Migratory Species Landings {#hms_landings}

**Description**: Atlantic Highly Migratory Species Landings

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2020 (different methods), 2021+), State of the Ecosystem - Mid-Atlantic (2020 (different methods), 2021+)

**Indicator category**: Database pull with analysis

**Contributor(s)**: Heather Baertlein, Jackie Wilson, George Silva, Jennifer Cudney
  
**Data steward**: Kimberly Bastille
  
**Point of contact**: Jennifer Cudney <jennifer.cudney@noaa.gov>
  
**Public availability statement**: Source data are NOT publicly available.

## Methods
### Data Sources
Data came from the eDealer database (https://www.fisheries.noaa.gov/atlantic-highly-migratory-species/atlantic-highly-migratory-species-dealer-reporting) and Bluefin Tuna Dealer reports on SAFIS (https://www.accsp.org/what-we-do/safis/). The eDealer data were supplemented with ACCSP records, GulfFIN records, and vessel logbook catches for which no dealer reports were submitted.

### Data Analysis
Data were processed for Fisheries of the United States and then aggregated by region to avoid confidentiality issues. Data for Atlantic sharks, swordfish, bigeye tuna, albacore tuna, yellowfin tuna, and skipjack tuna were initially extracted from our eDealer database. Additional landings of these HMS not in eDealer were found in ACCSP, GulfFIN, and the SEFSC Atlantic HMS vessel logbook records. Bluefin tuna landings data from the Bluefin Tuna Dealer reports in SAFIS were also extracted and combined with the eDealer data for other HMS.

Quality assurance procedures were applied, including the removal of duplicate records. Duplicates may arise from multiple submissions of reports by the same dealer. They may also arise when two or more dealers report the same landings in “Packing” situations. While most vessels immediately sell their catch to the dealer at their port of landing, some vessels sell their catch to a dealer(s) in another location. Transport to alternate locations requires processing of the fish to preserve quality. This processing activity is done by the dealer at the port of landing and is referred to as "Packing". Differences in federal and state definitions of who is considered the “dealer” of the product, and thus ultimately responsible for submitting the landings report, often result in multiple reports being created for the same landings. These duplicate reports need to be accounted for when summarizing the data to reflect accurate landings. Therefore, duplicate reports of the same landing were identified and eliminated prior to summarizing the data for the Fisheries of the United States.

All reported landings were converted to live weights using conversion ratios appropriate for the species/species group and reported grade of the product. Shark fins were not converted to live weight because these weights are already included in the converted whole weight of the reported shark landing.

The states where landings occurred were grouped into ‘ecological production units’ (EPUs), as defined by GARFO staff. “New England”, or NE, includes Maine, New Hampshire, and Massachusetts, as well as landings from Canada. The “Mid-Atlantic Bight”, or MAB, includes states from Rhode Island to North Carolina.

Seven HMS Management Groups represent 26 highly migratory species in the dataset. HMS Management Groups may include a single species or a group of species. The groups are “Bluefin Tuna”, “BAYS”, “Swordfish”, “Large Coastal Sharks”, “Small Coastal Sharks”, “Pelagic Sharks”, and “Smoothhound Sharks”. “BAYS” includes bigeye, albacore, yellowfin, and skipjack tunas. “Large Coastal Sharks” includes blacktip, bull, great hammerhead, scalloped hammerhead, smooth hammerhead, lemon, nurse, sandbar, silky, spinner, and tiger sharks. “Small Coastal Sharks” includes Atlantic sharpnose, blacknose, bonnethead, and finetooth sharks. “Pelagic Sharks” includes blue, porbeagle, shortfin mako, and thresher sharks. “Smoothhound Sharks” includes the smooth dogfish shark.

Price per pound was used to determine the ex-vessel value. For landings with prices per pound reported as “N/A”, 0, $0.01 or left blank, average prices were calculated for each species and state. Those averages replaced the missing values to determine landings revenue. Revenue from sales to the aquarium trade was also excluded to avoid extreme values associated with shipping live specimens.
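
The price substitution was carried out in SAS and Excel; the R sketch below only illustrates the logic. The records, column names, and prices are hypothetical, and the average is assumed to be a simple unweighted mean of the validly reported prices within each species and state.

```r
# Hypothetical landings records (illustrative only).
landings <- data.frame(
  species = c("Swordfish", "Swordfish", "Swordfish", "Yellowfin Tuna", "Yellowfin Tuna"),
  state = c("MA", "MA", "MA", "NC", "NC"),
  pounds = c(100, 200, 50, 300, 150),
  price_per_lb = c(4.00, NA, 0.01, 3.00, 0)
)

# Treat missing prices and prices of $0.01 or less as unreported.
unreported <- is.na(landings$price_per_lb) | landings$price_per_lb <= 0.01

# Average reported price for each species and state (assumed simple mean
# over records with a valid price).
valid_price <- ifelse(unreported, NA_real_, landings$price_per_lb)
avg_price <- ave(valid_price, landings$species, landings$state,
                 FUN = function(x) mean(x, na.rm = TRUE))

# Fill unreported prices with the species-state average, then compute revenue.
landings$price_filled <- ifelse(unreported, avg_price, landings$price_per_lb)
landings$revenue <- landings$pounds * landings$price_filled
```

A species and state with no valid price at all would yield `NaN` here and would need a fallback rule, which the text does not describe.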


Highly migratory species landings include 26 species of tunas, sharks, and swordfish.

Data were processed and analyzed using SAS and Microsoft Excel pivot tables. The count of dealers and vessels in each regional species/management group sum was used to determine whether enough records were available to make the data public or whether they needed to be marked as confidential. Additionally, the shares of landings reported by individual dealers/fishermen were compared within each regional species/management group sum to determine whether any one entity contributed more than 2/3 of the total, which would require the sum to be marked as confidential.
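
The two confidentiality checks can be sketched as below. This is an illustration, not the SAS or Excel workflow: the records are hypothetical, only dealers are counted (the text also counts vessels), and the minimum entity count `min_entities` is an assumed value because the text does not state the number required.

```r
# Hypothetical records for one region (illustrative only).
records <- data.frame(
  group = c("BAYS", "BAYS", "BAYS", "BAYS", "Swordfish", "Swordfish", "Swordfish"),
  dealer = c("D1", "D2", "D3", "D4", "D1", "D2", "D3"),
  pounds = c(100, 120, 90, 110, 900, 50, 50)
)

min_entities <- 3    # ASSUMED minimum count; not stated in the text
max_share <- 2 / 3   # dominance limit stated in the text

# For each management group: the number of distinct dealers and the largest
# share of total landings contributed by any single dealer.
check_group <- function(d) {
  by_dealer <- tapply(d$pounds, d$dealer, sum)
  data.frame(group = d$group[1],
             n_dealers = length(by_dealer),
             top_share = max(by_dealer) / sum(by_dealer))
}
flags <- do.call(rbind, lapply(split(records, records$group), check_group))

# Confidential if too few dealers or if one dealer exceeds the dominance limit.
flags$confidential <- flags$n_dealers < min_entities | flags$top_share > max_share
```

In this example the "Swordfish" sum is flagged because one dealer accounts for 90% of the landings, even though three dealers reported.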

### Data Processing 

HMS landings data were formatted for inclusion in the `ecodata` R package using the R code found [here](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_hms_landings.R).

## Methods 2020


### Data sources
Data were drawn from the eDealer database (https://www.fisheries.noaa.gov/atlantic-highly-migratory-species/atlantic-highly-migratory-species-dealer-reporting) and from Bluefin Tuna Dealer reports on [SAFIS](https://www.accsp.org/what-we-do/safis/). The eDealer data were supplemented with GulfFIN records and vessel logbook catches for which no dealer reports were submitted.


### Data extraction 
Data were processed for Fisheries of the United States and then aggregated by regions to avoid confidentiality issues.

Data for Atlantic sharks, swordfish, bigeye tuna, albacore tuna, yellowfin tuna and skipjack tuna were initially extracted from our eDealer database. Additional landings of these HMS not in eDealer were found in GulfFIN records. Bluefin tuna landings data from the Bluefin Tuna Dealer reports in SAFIS were also extracted and combined with the eDealer data for other HMS.

Quality assurance procedures were then applied. Duplicate records were removed from the data. Duplicates may arise when the same dealer submits a report more than once, or when two or more dealers report the same landings in “Packing” situations. While most vessels immediately sell their catch to the dealer at their port of landing, some vessels sell their catch to one or more dealers in another location. Transport to alternate locations requires processing of the fish to preserve quality. This processing is done by the dealer at the port of landing and is referred to as “Packing”. Differences in federal and state definitions of who is considered the “dealer” of the product, and thus ultimately responsible for submitting the landings report, often result in multiple reports being created for the same landings. These duplicate reports need to be accounted for when summarizing the data to reflect accurate landings. Therefore, duplicate reports of the same landing were identified and eliminated prior to summarizing the data for the Fisheries of the United States.

Revenue from sales to the aquarium trade was also excluded to avoid extreme values associated with shipping live specimens. 

All reported landings were converted to live weights using conversion ratios appropriate for the species/species group and reported grade of the product. Shark fins were not converted to live weight because these weights are already included in the converted whole weight of the reported shark landing.

Price per pound was used to determine the ex-vessel value. For landings with prices per pound reported as “N/A”, 0, $0.01 or left blank, average prices were calculated for each species and state. Those averages replaced the missing values to determine landings revenue.

The extract only includes species with more than $1,000 in landings in the region for that year to avoid issues with data confidentiality.  Other species landed include: tiger sharks, porbeagle, bonnethead, blacknose, blue, lemon, silky and smooth hammerhead sharks. However, these are not reported because of low volume and resulting data confidentiality issues.


### Data analysis
Highly migratory species landings include 19 species of tunas, sharks and swordfish (Table \@ref(tab:hmsspp)).

```{r hmsspp, eval = T, echo = F}

rec_spp <- read.csv(here::here("data","hms_species.csv")) 

knitr::kable(rec_spp, caption="Species included in the highly migratory species landings reported in the SOE.") %>%
  kableExtra::column_spec(2, italic = TRUE)
```

Data were processed and analyzed using SAS and Microsoft Excel pivot tables. The count of records marked as confidential and the number of states represented in each regional species sum were used to determine whether enough records were available to make the data public or whether the sum needed to be marked as confidential.


<!--chapter:end:chapters/hms_landings.Rmd-->

# Highly Migratory Species POP Catch Per Unit Effort {#hms_cpue}

**Description**: CPUE from pelagic observer program (POP) observed hauls, presented as number of fish per haul, is provided for the northeast (shelf-wide) by year/species from 1992-2019.

**Indicator category**: Database pull with analysis

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2021+), State of the Ecosystem - Mid-Atlantic (2021+)

**Contributor(s)**: Tobey Curtis, Jennifer Cudney
  
**Data steward**:  Tobey Curtis, Jennifer Cudney
  
**Point of contact**:  Jennifer Cudney <jennifer.cudney@noaa.gov>
  
**Public availability statement**: Source data are NOT publicly available. Pelagic observer data are considered confidential and must be screened to ensure that they meet the "rule of three" requirements at the set and vessel level before they can be distributed. 


## Methods

### Data sources
Data for this indicator were compiled by Larry Beerkircher of the NOAA Southeast Fisheries Science Center. These data come from the NOAA Southeast Fisheries Science Center databases that hold Atlantic highly migratory species information from the pelagic observer program. 

The Southeast Pelagic Observer Program monitors the pelagic longline fleet all along the Northeast U.S. Shelf. This covers approximately 1000 hauls per year and targets 8% coverage of the fishing effort. 


### Data analysis

Catch per unit effort was calculated as number of individuals per haul and summarized by year and species. 
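
As an illustration of this calculation, the following R sketch computes the number of fish per haul by year and species. The `catch` and `hauls` tables and their column names are hypothetical, and the sketch assumes that every observed haul counts as effort, including hauls that caught none of a given species; the text does not state how zero-catch hauls were treated.

```r
# Hypothetical observed catch: one row per haul and species caught.
catch <- data.frame(
  year    = c(2018, 2018, 2018, 2019, 2019),
  haul_id = c("h1", "h2", "h2", "h3", "h4"),
  species = c("Blue", "Blue", "Shortfin Mako", "Blue", "Shortfin Mako"),
  n_fish  = c(4, 2, 1, 6, 3)
)

# Hypothetical effort: number of observed hauls per year
# (2019 includes one haul, h5, with no recorded catch).
hauls <- data.frame(year = c(2018, 2019), n_hauls = c(2, 3))

# Total individuals by year and species, divided by hauls in that year.
total <- aggregate(n_fish ~ year + species, data = catch, FUN = sum)
cpue  <- merge(total, hauls, by = "year")
cpue$cpue <- cpue$n_fish / cpue$n_hauls
```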

The species groupings are available in the table below. 

| Category       | Common Name          | Species Name                 |
|----------------|----------------------|------------------------------|
| Small Coastal  | Atlantic Sharpnose   | *Rhizoprionodon terraenovae* |
| Small Coastal  | Blacknose            | *Carcharhinus acronotus*     |
| Small Coastal  | Bonnethead           | *Sphyrna tiburo*             |
| Small Coastal  | Finetooth            | *Carcharhinus isodon*        |
| Large Coastal  | Blacktip             | *Carcharhinus limbatus*      |  
| Large Coastal  | Bull                 | *Carcharhinus leucas*        |
| Large Coastal  | Great Hammerhead     | *Sphyrna mokarran*           |
| Large Coastal  | Lemon                | *Negaprion brevirostris*     |
| Large Coastal  | Nurse                | *Ginglymostoma cirratum*     |
| Large Coastal  | Sandbar              | *Carcharhinus plumbeus*      |
| Large Coastal  | Scalloped Hammerhead | *Sphyrna lewini*             |
| Large Coastal  | Silky                | *Carcharhinus falciformis*   |
| Large Coastal  | Smooth Hammerhead    | *Sphyrna zygaena*            | 
| Large Coastal  | Spinner              | *Carcharhinus brevipinna*    |
| Large Coastal  | Tiger                | *Galeocerdo cuvier*          |
| Prohibited     | Atlantic Angel       | *Squatina dumeril*           |
| Prohibited     | Basking              | *Cetorhinus maximus*         |
| Prohibited     | Bigeye Thresher      | *Alopias superciliosus*      |
| Prohibited     | Bignose              | *Carcharhinus altimus*       |
| Prohibited     | Night                | *Carcharhinus signatus*      |
| Prohibited     | Sand Tiger           | *Carcharias taurus*          |
| Prohibited     | Sevengill            | *Notorynchus cepedianus*     |
| Prohibited     | Sixgill              | *Hexanchus griseus*          |
| Prohibited     | White                | *Carcharodon carcharias*     |
| Pelagic        | Blue                 | *Prionace glauca*            |
| Pelagic        | Dusky                | *Carcharhinus obscurus*      |
| Pelagic        | Oceanic Whitetip     | *Carcharhinus longimanus*    |
| Pelagic        | Porbeagle            | *Lamna nasus*                |
| Pelagic        | Shortfin Mako        | *Isurus oxyrinchus*          |
| Pelagic        | Thresher             | *Alopias vulpinus*           |

### Data Processing 
Code used to process these data can be found on GitHub: [NOAA-EDAB/ecodata](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_hms_cpue.R).

<!--chapter:end:chapters/hms_cpue.Rmd-->

# Highly Migratory Species Stock Status {#hms_stock_status}

**Description**:  Summary of the most recent stock assessment results for each assessed Atlantic HMS species. 

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2022+), State of the Ecosystem - Mid-Atlantic (2022+)

**Indicator category**: Synthesis of published information

**Contributor(s)**: Jennifer Cudney, Ben Duffin, Dan Crear, Tobey Curtis
  
**Data steward**: Jennifer Cudney, <Jennifer.Cudney@noaa.gov>
  
**Point of contact**: Jennifer Cudney, <Jennifer.Cudney@noaa.gov>
  
**Public availability statement**: Source data are publicly available.

## Methods


### Data sources
Data were collected from Atlantic HMS SAFE Reports (see 2021 report, https://www.fisheries.noaa.gov/atlantic-highly-migratory-species/atlantic-highly-migratory-species-stock-assessment-and-fisheries-evaluation-reports), the Fishery Stock Status Determinations webpage (https://www.fisheries.noaa.gov/national/population-assessments/fishery-stock-status-updates), SEDAR assessments (www.sedarweb.org), and ICCAT assessments (https://www.iccat.int/en/assess.html).


### Data analysis
Stock status information is compiled annually from stock assessments completed by the International Commission for the Conservation of Atlantic Tunas (ICCAT) (tunas, sharks, swordfish) and the Southeast Data, Assessment, and Review (SEDAR) process (Atlantic HMS sharks). Species with a range of uncertainty estimates for F/Fmsy and B/Bmsy, and species whose assessments were completed very recently, may not be included in Stock SMART queries. Where a range of estimates was available, we selected the most precautionary values: the high end for Fyr/Fmsy and the low end for Byr/Bmsy. 

Stock status information was plotted on a Kobe chart using modified code from the 2021 SOE Technical Documentation. Although Gulf of Mexico stock information is provided, we only plotted Atlantic stocks to maintain relevance. Atlantic blacknose shark was considered an outlier because its Fyr/Fmsy is 22.53. The y-axis is not scaled to include this species in the Kobe plot, so it was added to the top left segment of the plot and labeled with its Fyr/Fmsy value. The grey box lists species with unknown F/Fmsy and/or B/Bmsy.
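
The selection of precautionary values and the placement of stocks on a Kobe chart can be sketched in R as follows. This is an illustration only: the stock names and values are hypothetical, and the quadrants are defined simply by reference lines at 1 on each axis, which may differ from the thresholds used in formal status determinations.

```r
# Hypothetical ranges of assessment estimates; names and values are illustrative.
estimates <- data.frame(
  stock  = c("Stock A", "Stock A", "Stock B", "Stock B", "Stock C"),
  f_fmsy = c(0.8, 1.2, 0.4, 0.6, NA),
  b_bmsy = c(1.1, 0.9, 1.5, 1.3, 0.7)
)

# Return NA when no estimate exists, otherwise the requested extreme.
pick <- function(x, f) if (all(is.na(x))) NA_real_ else f(x, na.rm = TRUE)

# Most precautionary values: highest F/Fmsy and lowest B/Bmsy per stock.
f_high <- tapply(estimates$f_fmsy, estimates$stock, function(x) pick(x, max))
b_low  <- tapply(estimates$b_bmsy, estimates$stock, function(x) pick(x, min))

kobe <- data.frame(
  stock  = names(f_high),
  f_fmsy = as.numeric(f_high),
  b_bmsy = as.numeric(b_low[names(f_high)])
)

# Assign each stock to a Kobe quadrant; stocks missing either ratio are "unknown".
kobe$quadrant <- ifelse(
  is.na(kobe$f_fmsy) | is.na(kobe$b_bmsy), "unknown",
  ifelse(kobe$f_fmsy > 1 & kobe$b_bmsy < 1, "F above Fmsy, B below Bmsy",
  ifelse(kobe$f_fmsy > 1, "F above Fmsy, B at or above Bmsy",
  ifelse(kobe$b_bmsy < 1, "F at or below Fmsy, B below Bmsy",
         "F at or below Fmsy, B at or above Bmsy")))
)
```

Stocks labeled "unknown" here correspond to those listed in the grey box of the plot.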


The table below shows naming conventions used in the plot. 
```{r}
stock_status<-ecodata::hms_stock_status %>% 
  tidyr::separate(Var, c("Species_Abbreviation", "Common_Name"), ":") %>% 
  dplyr::select(Species_Abbreviation, Common_Name) %>% 
  dplyr::distinct()

knitr::kable(stock_status)
```

### Data processing
Code for processing Atlantic HMS Stock status data can be found [here](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_hms_stock_status.R).

<!--chapter:end:chapters/hms_stock_status.Rmd-->

# Hudson River Flow

**Description**: Mean annual flow of the Hudson River in cubic meters per second at the USGS gauge 01358000 at Green Island, New York.

**Found In**: 2022 Indicator Catalog

**Indicator category**: 

**Contributor(s)**: Laura Gruenburg, Janet Nye, Kurt Heim
  
**Data steward**: Laura Gruenburg <laura.gruenburg@stonybrook.edu>
  
**Point of contact**: Laura Gruenburg <laura.gruenburg@stonybrook.edu>

**Public availability statement**: Source data are publicly available

## Methods
### Data Sources

River gauge data from USGS gauge 01358000 were obtained from [USGS water data](https://waterdata.usgs.gov/nwis/uv?site_no=01358000).

### Data Analysis

Mean annual flow rate was calculated by averaging all flow rate observations within each year. Flow rates were converted from cubic feet per second to cubic meters per second.

A linear trend and a nonlinear GAM were calculated for the resulting annual mean flow rate time series.  [Attached code](https://github.com/NOAA-EDAB/tech-doc/blob/master/R/stored_scripts/Gruenburg_Hudson_River_Flow.R) shows this process in detail.
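
A minimal R sketch of the annual averaging, unit conversion, and linear trend is given here for orientation. The `flow` data frame and its values are hypothetical and are not taken from the gauge record. The nonlinear GAM is not shown; it could be fit with a package such as `mgcv`.

```r
# Hypothetical discharge observations in cubic feet per second (cfs).
flow <- data.frame(
  date = as.Date(c("2019-01-01", "2019-07-01", "2020-01-01",
                   "2020-07-01", "2021-01-01", "2021-07-01")),
  flow_cfs = c(10000, 20000, 12000, 24000, 14000, 28000)
)

# 1 ft = 0.3048 m exactly, so 1 ft^3 = 0.3048^3 m^3 (about 0.0283 m^3).
cfs_to_cms <- 0.3048^3

# Average all observations within each year, then convert units.
flow$year <- as.integer(format(flow$date, "%Y"))
annual <- aggregate(flow_cfs ~ year, data = flow, FUN = mean)
annual$flow_cms <- annual$flow_cfs * cfs_to_cms

# Linear trend in annual mean flow (slope in cubic meters per second per year).
trend <- lm(flow_cms ~ year, data = annual)
slope <- unname(coef(trend)["year"])
```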

### Data Processing

Code for processing Hudson River flow data can be found [here](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_hudson_river_flow.R).

<!--chapter:end:chapters/hudson_river_flow.Rmd-->

# Inshore bottom trawl surveys {#mab_inshore_survey}

**Description**: Inshore surveys include the Northeast Area Monitoring and Assessment Program (NEAMAP) survey, Massachusetts Division of Marine Fisheries Bottom Trawl Survey, and Maine/New Hampshire Inshore Trawl Survey. 

**Indicator category**: Database pull with analysis

**Found in**: State of the Ecosystem - Mid-Atlantic (2019+), State of the Ecosystem - New England (2019+)

**Contributor(s)**: James Gartland, Matt Camisa, Rebecca Peters, Sean Lucey
  
**Data steward**: Kimberly Bastille, <kimberly.bastille@noaa.gov>
  
**Points of contact**: James Gartland (NEAMAP), <jgartlan@vims.edu>; Rebecca Peters (ME/NH survey), <rebecca.j.peters@maine.gov>; Sean Lucey (MA Inshore Survey), <sean.lucey@noaa.gov>
  
**Public availability statement**: Data are available upon request. 

## Methods

### Data Sources

All inshore bottom trawl survey data sets were derived from raw survey data. NEAMAP source data are available for download [here](https://www.vims.edu/research/departments/fisheries/programs/multispecies_fisheries_research/abundance_indices/index.php). More detailed information describing NEAMAP survey methods is available on the [NEAMAP website](http://neamap.net). ME/NH inshore survey data are available upon request (see Points of Contact). Technical documentation for ME/NH survey methods and survey updates are made available through the [Maine Department of Marine Resources](https://www.maine.gov/dmr/science-research/projects/trawlsurvey/index.html). Data from the MA Inshore Bottom Trawl Survey are stored on local servers at the Northeast Fisheries Science Center (Woods Hole, MA), and are also available upon request. More information about the MA Inshore Bottom Trawl Survey is available [here](https://www.mass.gov/service-details/review-trawl-survey-updates).

### Data extraction

Source data from the Massachusetts DMF Bottom Trawl Survey were extracted using this [R script](https://github.com/slucey/RSurvey/blob/master/Mass_survey.R).

### Data Processing

The following R code was used to process inshore bottom trawl data into the `ecodata` R package.  

**New England**

https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_inshore_survdat.R

**Massachusetts**

https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_mass_inshore_survey.R

**Mid-Atlantic (NEAMAP)**

https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_mab_inshore_survey.R

### Data Analysis

Biomass indices were provided as stratified mean biomass (kg `r ifelse(knitr::is_latex_output(), 'tow\\textsuperscript{-1}', 'tow<sup>-1</sup>')`) for all inshore surveys. Time series of stratified mean biomass were calculated for ME/NH and NEAMAP surveys through the following procedure:

1. All species catch weights were summed for each tow and for each feeding guild category. 
2. The average weight per tow, with its associated variance and standard deviation, was calculated for each survey, region, stratum, and feeding guild.
3. Stratified mean biomass was then calculated as the sum of the weighted averages of the strata, where the weight of a given stratum was the proportion of the survey area accounted for by that stratum. 
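
The procedure above can be sketched in R for a single survey and feeding guild. The `tows` and `strata` tables, their column names, and all values are hypothetical. The sketch starts from catch weights already summed by tow and guild (step 1) and omits the variance calculation.

```r
# Hypothetical tow-level catch weights (kg), already summed by feeding guild.
tows <- data.frame(
  stratum = c("s1", "s1", "s2", "s2", "s2"),
  guild   = "Piscivore",
  kg      = c(10, 20, 40, 50, 60)
)

# Hypothetical stratum areas (any consistent unit).
strata <- data.frame(stratum = c("s1", "s2"), area = c(300, 100))

# Step 2: mean weight per tow within each stratum and guild.
stratum_mean <- aggregate(kg ~ stratum + guild, data = tows, FUN = mean)

# Step 3: weight each stratum mean by its share of the total survey area.
stratum_mean <- merge(stratum_mean, strata, by = "stratum")
stratum_mean$weight <- stratum_mean$area /
  ave(stratum_mean$area, stratum_mean$guild, FUN = sum)
stratum_mean$weighted_kg <- stratum_mean$kg * stratum_mean$weight

# Stratified mean biomass (kg per tow) is the sum of the weighted stratum means.
strat_biomass <- aggregate(weighted_kg ~ guild, data = stratum_mean, FUN = sum)
```

Here the stratum means are 15 and 50 kg per tow with area weights of 0.75 and 0.25, giving a stratified mean of 23.75 kg per tow.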

Stratified mean biomass was also calculated for the MA Inshore Bottom Trawl Survey. These calculations followed those used to find stratified mean biomass by feeding guild in the NEFSC Bottom Trawl Survey and are described in greater detail [here](#survdat). The R code used to derive the stratified mean biomass indices for MA Inshore time series is provided below. 

R code used for analysis can be found [here](https://github.com/NOAA-EDAB/tech-doc/blob/master/R/stored_scripts/inshore_survey_analysis.R).

<!--chapter:end:chapters/inshore_bottom_trawl_surveys.Rmd-->

# Long-term Sea Surface Temperature {#long_term_sst}


**Description**: Long-term sea-surface temperatures

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2017+), State of the Ecosystem - Mid-Atlantic (2017+)

**Indicator category**: Database pull

**Contributor(s)**: Kevin Friedland
  
**Data steward**: Kevin Friedland, <kevin.friedland@noaa.gov>
  
**Point of contact**: Kevin Friedland, <kevin.friedland@noaa.gov>
  
**Public availability statement**: Source data are available [here](https://www.esrl.noaa.gov/psd/data/gridded/data.noaa.ersst.v5.html).


## Methods
Data for long-term sea-surface temperatures were derived from the National Oceanic and Atmospheric Administration (NOAA) extended reconstructed sea surface temperature data set (ERSST V5). The ERSST V5 dataset is parsed into 2&deg; x 2&deg; gridded bins from 1854 to the present with monthly temporal resolution. Data were interpolated in regions with limited spatial coverage, and heavily damped during the period 1854-1880 when collection was inconsistent [@Huang2017; @huang2017extended]. For this analysis, 19 bins were selected that encompassed the Northeast US Continental Shelf region [see @Friedland2007]. 


### Data sources
This indicator is derived from the [NOAA ERSST V5 dataset](https://www.esrl.noaa.gov/psd/data/gridded/data.noaa.ersst.v5.html) [@Huang2017].


### Data extraction 

```{r coordinates, echo = F, eval = T, results='asis'}
df <- data.frame(  
  Longitude = c(-74,-74,-72,-70,-70,-70,-68,-68),
  Latitude = c(40,38,40,44,42,40,44,42)
)

knitr::kable(df,
             caption="Coordinates used in NOAA ERSST V5 data extraction.",  booktabs=T) #%>%
  #kableExtra::kable_styling(full_width = F) 
```

R code used in extracting time series of long-term SST data can be found [here](https://github.com/NOAA-EDAB/tech-doc/tree/master/R/stored_scripts/long-term-sst-extraction.R).

### Data Processing

Data were formatted for inclusion in the `ecodata` R package with the R code found [here](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_long_term_sst.R).

<!--chapter:end:chapters/long_term_sst_indicator.Rmd-->

# MAFMC ABC/ACL and Catch {#abc_acl}

**Description**: The catch limit (either ABC or ACL) and total catch from 2012 - 2020 for all MAFMC species and sector (commercial or recreational), if appropriate.

**Found in**:  State of the Ecosystem - Mid-Atlantic (2022) 

**Indicator category**: Synthesis of published information, Database pull

**Contributor(s)**: Jessica Coakley, Kiley Dancy, Jose Montanez, Julia Beaty, Karson Coutre, Jason Didden
  
**Data steward**: Brandon Muffley <bmuffley@mafmc.org>
  
**Point of contact**: Brandon Muffley <bmuffley@mafmc.org>
  
**Public availability statement**: Source data are publicly available

## Methods

### Data Sources

These data were compiled from MAFMC [Fishery Information Documents](https://www.mafmc.org/fishery-performance-reports), Stock Assessment reports, [SSC reports](https://www.mafmc.org/ssc), GARFO catch/landings database, and MRIP queries. 

### Data Analysis

Each stock has a threshold and catch value assigned to it from the sources above. The table below shows where the information comes from for each stock. 

|  Stock            | Fishery      |  Catch Threshold |  
|-------------------|--------------|------------------|
| Ocean Quahog      | Commercial   | ABC |
| Surfclam          | Commercial   | ABC |
| Summer Flounder   | Recreational | ABC |
| Summer Flounder   | Commercial   | ABC |
| Scup              | Recreational | ABC |
| Scup              | Commercial   | ABC |
| Atlantic Mackerel | Recreational | ABC |
| Atlantic Mackerel | Commercial   | ABC |
| Black Sea Bass    | Recreational | ABC |
| Black Sea Bass    | Commercial   | ABC |
| Butterfish        | Commercial   | ABC |
| Longfin Squid     | Commercial   | ABC |
| Illex Squid       | Commercial   | ABC |
| Golden Tilefish   | Commercial   | TAL |
| Blueline Tilefish | Recreational | ABC |
| Blueline Tilefish | Commercial   | ABC |
| Bluefish          | Both         | ABC |
| Spiny Dogfish     | Both         | ABC | 
| Chub Mackerel     | Both         | ACL |


The Acceptable Biological Catch (ABC) for each managed stock is set by the MAFMC Scientific and Statistical Committee (SSC), and the Annual Catch Limit (ACL), if appropriate, is developed by the Council. Recreational data come from [MRIP](https://www.fisheries.noaa.gov/topic/recreational-fishing-data) (Marine Recreational Information Program), and commercial catch comes from either the NEFSC assessment lead or the GARFO database. 

Each species is processed differently, depending upon data availability, sectors, fleets, and other factors. 

### Data Processing 

Data were formatted for inclusion in the `ecodata` R package using the R code found [here](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_abc.acl.R).

<!--chapter:end:chapters/Quota_Catch_MA.Rmd-->

# Marine Heatwave {#heatwave}

**Description**: Marine Heatwave

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2020+), Mid-Atlantic (2020+)

**Indicator category**: Published methods, Database pull with analysis

**Contributor(s)**: Vincent Saba
  
**Data steward**: Kimberly Bastille <kimberly.bastille@noaa.gov>
  
**Point of contact**: Vincent Saba <vincent.saba@noaa.gov>
  
**Public availability statement**: Source data are publicly available. Please email vincent.saba@noaa.gov for further information and queries of the marine heatwave indicator data.

## Methods

Marine heatwaves (MHWs) measure not just anomalously high temperature, but how long the ecosystem is subjected to the high temperature. They are driven by both atmospheric and oceanographic factors and can have dramatic impacts on marine ecosystems. Marine heatwaves are measured in terms of intensity (water temperature) and duration (the cumulative number of degree days) using measurements of sea surface temperature (surface MHWs) or a combination of observations and models of bottom temperature (bottom MHWs). 

**2023-**

Recent research by @jacox_thermal_2020 and @jacox_global_2022 has modified the MHW methodology originally developed by @hobday2016.  

The new MHW indices use the entire temperature time-series for the baseline climatology (e.g. 1982-2022 in the 2023 report), and the global warming trend is removed (i.e. we detrended the data to create a shifting baseline instead of a fixed baseline).  This new MHW method allows us to discern true extreme events from long-term ocean warming (climate change).  Surface MHW events are defined as warming events that last for five or more days with temperatures above the 90th percentile of the historical daily climatology (1982-2022).  Bottom MHW events are defined as warming events that last for thirty or more days with bottom temperatures above the 90th percentile of the historical daily climatology (1982-2022).  The longer duration criterion for bottom temperature reflects the longer persistence of ocean bottom temperature anomalies on the U.S. northeast shelf (@chen_seasonal_2021).  

The new MHW indices can now discern extreme events that truly are “extreme” rather than occupying most of the year as was the case in the Gulf of Maine in 2021 using previous methods.  Because this approach moves from a fixed baseline to a shifting baseline by detrending ocean temperature data and using the entire time-series as a climatology, the global warming signal is removed and thus we are left with extremes in the variability of ocean temperature.  A combination of long-term ocean warming and MHWs should be used to assess total heat stress on marine organisms.
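
The effect of detrending can be illustrated with a short R sketch. It assumes a linear trend is removed, which is not stated explicitly above, and it uses a hypothetical annual temperature series rather than the daily data used for the indicator.

```r
# Hypothetical temperature series: a warming trend plus year-to-year variability.
year <- 1982:2022
temp <- 10 + 0.04 * (year - 1982) + 0.5 * sin(year)

# Remove the fitted linear trend but keep the long-term mean, so that only
# variability around a shifting baseline remains.
fit <- lm(temp ~ year)
detrended <- residuals(fit) + mean(temp)

# Slope of the detrended series (numerically zero by construction).
detrended_slope <- unname(coef(lm(detrended ~ year))["year"])
```

Thresholds such as the 90th percentile would then be computed from the detrended series, so that exceedances reflect variability rather than the warming trend itself.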

**2020-2022**

Marine heatwave analysis (surface only) was conducted for Georges Bank, Gulf of Maine, and the Middle Atlantic Bight according to the definition in @hobday2016.  

### Data sources

[NOAA high-res OISST (daily, 25-km, 1982-2019)](https://www.esrl.noaa.gov/psd)

Marine heatwave analysis was conducted for Georges Bank, Gulf of Maine, and the Middle Atlantic Bight according to the definition in @hobday2016.  Heatwaves are defined as temperatures that exceed the 90th percentile for at least 5 consecutive days for surface heatwaves and at least 30 consecutive days for bottom heatwaves. 


### Data extraction 

Each yearly (global) file was downloaded, concatenated into a single NetCDF file using NCO (Unix), and then cropped to the USNES region using Ferret.  Each EPU's SST time series was averaged within .shp file boundaries for the MAB, GB, and GOM (also done in Ferret) and saved to three .csv files.

### Data analysis

**2023-** Maximum Intensity and Duration (number of days in a heatwave state, N days) are calculated using NOAA OISST daily sea surface temperature data (25-km resolution) from January 1982 to December of the most recent year. The heatwaves are calculated based on the algorithms in @hobday2016, using a climatology of 1982 to the most recent year.  These metrics were calculated in R using the heatwaveR package (https://robwschlegel.github.io/heatwaveR/).

**2020-2022**
The marine heatwave metrics Maximum Intensity [deg. C] and Cumulative Intensity [deg. C x days] are calculated using NOAA OISST daily sea surface temperature data (25-km resolution) from January 1982 to December 2019.  The heatwaves are calculated based on the algorithms in @hobday2016, using a climatology of 1982-2011.  These metrics were calculated in R using the heatwaveR package (https://robwschlegel.github.io/heatwaveR/).
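
The core of the event definition can be sketched in base R. This is a simplified illustration, not the heatwaveR implementation: the daily values, climatological mean, and 90th-percentile threshold are hypothetical, and refinements of the published algorithm (such as how the daily climatology is smoothed or how short gaps between events are handled) are omitted.

```r
# Hypothetical daily SST with its climatological mean and 90th-percentile threshold.
sst       <- c(15.0, 17.2, 17.5, 18.0, 17.8, 17.4, 15.5, 17.3, 17.6, 15.0)
clim_mean <- rep(16, length(sst))
clim_p90  <- rep(17, length(sst))
min_days  <- 5  # minimum duration for a surface heatwave

# Find runs of consecutive days above the threshold.
runs   <- rle(sst > clim_p90)
ends   <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1
keep   <- runs$values & runs$lengths >= min_days

events <- data.frame(
  start    = starts[keep],
  end      = ends[keep],
  duration = runs$lengths[keep]
)

# Intensity is measured as the anomaly relative to the climatological mean.
anomaly <- sst - clim_mean
events$max_intensity <- mapply(function(s, e) max(anomaly[s:e]), events$start, events$end)
events$cum_intensity <- mapply(function(s, e) sum(anomaly[s:e]), events$start, events$end)
```

In this example only the five-day run on days 2 through 6 qualifies; the two-day exceedance on days 8 and 9 is too short to count as an event.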


### Data processing 

Marine Heatwave data were formatted for inclusion in the `ecodata` R package using this [R code](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_heatwave.R).

<!--chapter:end:chapters/marine_heatwave.Rmd-->

# Ocean Acidification {#ocean_acidification}

**Description**: Maps of regional carbonate chemistry

**Indicator category**: Synthesis of published information or openly accessible datasets

**Found in:** State of the Ecosystem - Gulf of Maine & Georges Bank (2021+); State of the Ecosystem - Mid-Atlantic Bight (2021+)

**Contributor(s)**: Grace Saba, Lori Garzio, Chris Melrose, Janet Nye, Teresa Schwemmer, Baoshan Chen

**Data steward**: Grace Saba <saba@marine.rutgers.edu>

**Point of contact**: Grace Saba <saba@marine.rutgers.edu>

**Public availability statement**: Source data are available to the public (see Data Sources).


## Methods

The New England Fishery Management Council (NEFMC) and Mid-Atlantic Fishery Management Council (MAFMC) have recently requested regional Ocean Acidification (OA) information in the State of the Ecosystem reports. The work included in the State of the Ecosystem 2021 report, [seasonal dynamics of pH in shelf waters in the Mid-Atlantic](https://noaa-edab.github.io/tech-doc/ocean-acidification.html#ref-Wright2020), was synthesized from @Wright2020. The maps included in the State of the Ecosystem 2022 reports include a plot of bottom pH in summer over the entire U.S. Northeast Shelf (2007-present), and glider-based pH profiles during summer 2021 in both the Mid-Atlantic Bight (MAB) and the Gulf of Maine. The plots in the 2023 State of the Ecosystem reports, maps of bottom summer aragonite saturation ($\Omega_{Arag}$) and locations where summer bottom $\Omega_{Arag}$ reached lab-derived sensitivity levels of designated target species, were developed using openly accessible, quality-controlled data from vessel-based discrete samples and glider-based measurements (see Data Sources).

### Data sources

Glider-based observations of pH (and other variables including temperature, salinity, chlorophyll-*a*, and dissolved oxygen) began in the southern Mid-Atlantic Bight region in May 2018 (@Saba2019), and seasonal glider pH missions thereafter began in February 2019 (@Wright2020; although no deployments occurred in 2020 as a result of the COVID pandemic). Simultaneous measurements from the glider's pH, temperature, and salinity sensors enable the derivation of total alkalinity and calculation of other carbonate system parameters including $\Omega_{Arag}$. The glider pH observation program expanded spatially, with additional deployments in the northern MAB (New York Bight) and the Gulf of Maine, starting in February 2021. A typical glider mission runs for about 4 weeks, covers 500 km, and collects data through the full water column. Full-resolution delayed-mode glider datasets containing raw pH voltages can be found on [RUCOOL's Glider ERDDAP Server](http://slocum-data.marine.rutgers.edu/erddap/index.html). Fully-processed and time-shifted pH glider datasets can be found [here](https://marine.rutgers.edu/~lgarzio/cinar_soe/glider_data/).


Vessel-based discrete carbonate chemistry data were mined from the Coastal Ocean Data Analysis Product in North America, version v2021 ([CODAP-NA](https://essd.copernicus.org/articles/13/2777/2021/); @Jiang2021). This data product synthesizes two decades of quality-controlled inorganic carbon system parameters (including pH, total alkalinity, dissolved inorganic carbon) along with other physical and chemical parameters (temperature, salinity, dissolved oxygen, nutrients) collected from the North American continental shelves.

Additionally, two recent vessel-based datasets that were not included in CODAP-NA (@Jiang2021) were included in this synthesis. These datasets were collected during more recent NOAA NEFSC Ecosystem Monitoring (EcoMon) surveys (June 2019, Cruise ID HB1902; August 2019, Cruise ID GU1902; August 2021, Cruise ID PC2104) and include quality-controlled spectrophotometric-based pH measurements on discrete water samples. Data can be accessed through the [NCEI Ocean Carbon and Acidification Data Portal](https://www.ncei.noaa.gov/access/ocean-carbon-acidification-data-system-portal/).

### Data extraction

Glider data were processed and quality-controlled by software technician Lori Garzio at Rutgers University.

[CODAP-NA data](https://www.ncei.noaa.gov/data/oceans/ncei/ocads/metadata/0219960.html) were accessed and downloaded on October 14, 2021. 

EcoMon datasets were accessed and downloaded on October 13, 2022.

### Data processing

For processing and quality-control procedures of glider-based data, see @Wright2020. Glider data used in this synthesis were limited to summer only (June - August).

Data from CODAP-NA were filtered temporally to include only those collected during summer months (June-August) and were spatially limited to the U.S. Northeast Shelf. The resulting datasets included those from major vessel-based campaigns (East Coast Ocean Acidification, ECOA I and II cruises 2015 and 2018; The Gulf of Maine and East Coast Carbon cruises, GOMECC 2007 and 2012; EcoMon 2012-2013, 2015-2019).

For vessel-based datasets, when $\Omega_{Arag}$ was unavailable it was calculated using PyCO2SYS (@Humphreys2022) with inputs of pressure, temperature, salinity, total alkalinity, and pH.


For MAB glider datasets, total alkalinity was calculated from salinity using a linear relationship determined from *in situ* water sampling data taken during glider deployment and recovery in addition to ship-based water samples (@Wright2020). For the Gulf of Maine glider dataset, total alkalinity was calculated from temperature and salinity using Table 3 Equation IV in @McGarry2021. Calculations for $\Omega_{Arag}$ were then conducted using PyCO2SYS (@Humphreys2022) with inputs of pressure, temperature, salinity, total alkalinity, and pH.


### Plotting


1. A plot of bottom $\Omega_{Arag}$ in summer over the entire U.S. Northeast Shelf (all available data from 2007-present and includes both glider-based measurements and vessel-based discrete samples)
  + This map is included in both MAFMC and NEFMC reports.
  + Bottom values were defined as the median of the measurements (or calculated $\Omega_{Arag}$ values) within the deepest 1 m of a glider profile or, for vessel-based measurements, the deepest measurement of a vertical CTD/Rosette cast where water samples were collected, for profiles deeper than 10 m. To validate whether the deepest measurement was at or near the bottom, the sampling depth was compared to water column depth (when provided) or to water depths extracted from a [GEBCO](https://www.gebco.net/data_and_products/gridded_bathymetry_data/) bathymetry grid based on the sample collection coordinates. Any glider profiles/vessel-based casts with the deepest measurement shallower than the bottom 20% of total water column depth were removed. This allowed for a sliding scale instead of a strict cutoff (e.g., 1 m above the bottom).

2. Maps depicting locations where summer bottom $\Omega_{Arag}$ reached lab-derived sensitivity levels of designated target species
  + Sensitivity levels of $\Omega_{Arag}$ were defined for each species as values of $\Omega_{Arag}$ where negative responses by an organism were observed during an experimental laboratory study. Typically, these laboratory experiments measure organism responses under ocean acidification conditions (lower pH, lower $\Omega_{Arag}$) against a control under ambient conditions (higher pH, higher $\Omega_{Arag}$). Most laboratory experiments have used a range of $\Omega_{Arag}$ between 0.5 to 2.0, which does not encompass the full range of $\Omega_{Arag}$ observed *in situ*. The metrics measured (e.g., survival, growth, calcification) can be different between experiments, but negative responses could include decreased survival, reduced growth, reduced calcification rate, reduced hatching success, and malformation. Because laboratory perturbation experiments testing the responses of organisms to ocean acidification conditions are a relatively new approach and logistically quite challenging, there are currently few published studies for individual species. Recent studies have also started incorporating additional stressors, which makes defining an OA-focused sensitivity level difficult. Therefore, with additional future studies, the $\Omega_{Arag}$ sensitivity levels defined here for these species are subject to change.
  + For the MAFMC report, designated target species included Atlantic sea scallop (*Placopecten magellanicus*) and longfin squid (*Doryteuthis pealeii*). The sensitivity value used for Atlantic sea scallop was $\Omega_{Arag}$ ≤ 1.1 at 9 °C, based on reduced adult calcification rate observed at this level in @Cameron2022. The sensitivity value used for longfin squid was $\Omega_{Arag}$ ≤ 0.96, based on embryo and paralarvae malformation, increased time to hatching and decreased hatching success, and changes to mantle length and statolith morphology observed at this level in @Zakroff2019 and @Zakroff2020. Habitat depth ranges used for plotting the observed $\Omega_{Arag}$ values ≤ sensitivity $\Omega_{Arag}$ values for Atlantic sea scallop and longfin squid were limited to 25-200 meters (NEFSC 2014) and 0-400 meters (@Jacobson2005), respectively.
  + For the NEFMC report, designated target species included Atlantic cod (*Gadus morhua*) and American lobster (*Homarus americanus*). The sensitivity value used for Atlantic cod was $\Omega_{Arag}$ ≤ 1.31 at 10 °C, based on decreased larval survival observed at this level in @Stiasny2016. The sensitivity value used for American lobster was $\Omega_{Arag}$ ≤ 1.09, based on decreased stage V and VI juvenile survival observed at this level in @Noisette2021. Habitat depth ranges used for plotting the observed $\Omega_{Arag}$ values ≤ sensitivity $\Omega_{Arag}$ values for Atlantic cod and American lobster were limited to 10-200 meters (@Gregory2004; @DeCelles2017) and 10-700 meters (@Mercaldo-Allen1994), respectively.
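
The bottom-depth validation described in item 1 can be illustrated with a minimal R sketch. It is not the code from the repository linked below; the data frame, column names, and the helper `is_bottom()` are hypothetical, and it assumes that "deeper than 10 m" refers to the deepest sample depth of the profile.

```r
# Hypothetical casts: deepest sample depth and water-column depth, both in meters.
casts <- data.frame(
  cast_id      = c("A", "B", "C", "D"),
  max_sample_m = c(48, 30, 95, 8),
  bottom_m     = c(50, 50, 100, 9)
)

# Keep casts deeper than 10 m whose deepest sample lies within the
# bottom 20% of the water column (sample depth >= 80% of bottom depth).
is_bottom <- function(max_sample_m, bottom_m, min_depth_m = 10, frac = 0.20) {
  max_sample_m > min_depth_m & max_sample_m >= (1 - frac) * bottom_m
}

bottom_casts <- casts[is_bottom(casts$max_sample_m, casts$bottom_m), ]
```

Because the cutoff is a fraction of the water column depth, the allowed distance above the bottom grows with depth, which is the sliding scale described above.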


Code for data manipulation and plotting can be found here: https://github.com/lgarzio/cinar-soe.


```{r mab-oa, fig.cap = "Left panel: Bottom aragonite saturation state ($\\Omega_{Arag}$; summer only: June-August) on the U.S. Northeast Shelf based on quality-controlled vessel- and glider-based datasets from 2007-present. Right panel: Locations where summer bottom $\\Omega_{Arag}$ were at or below the laboratory-derived sensitivity level for Atlantic sea scallop (top panel) and longfin squid (bottom). Gray circles indicate locations where carbonate chemistry samples were collected, but bottom $\\Omega_{Arag}$ values were higher than sensitivity values determined for that species.", out.width="90%"}

knitr::include_graphics(here::here(file.path("images", "Saba_Fig_SOE_MAFMC - Grace Saba.jpg")))

```


```{r ne-oa, fig.cap = "Left panel: Bottom aragonite saturation state ($\\Omega_{Arag}$; summer only: June-August) on the U.S. Northeast Shelf based on quality-controlled vessel- and glider-based datasets from 2007-present. Right panel: Locations where summer bottom $\\Omega_{Arag}$ were at or below the laboratory-derived sensitivity level for Atlantic cod (top panel) and American lobster (bottom). The Atlantic cod sensitivity value of $\\Omega_{Arag}$ ≤ 1.31 is based on decreased larval survival observed at this level in Stiasny et al. (2016). The American lobster sensitivity value of $\\Omega_{Arag}$ ≤ 1.09 is based on decreased stage V and VI juvenile survival observed at this level in Noisette et al. (2021). Gray circles indicate locations where carbonate chemistry samples were collected, but bottom $\\Omega_{Arag}$ values were higher than sensitivity values determined for that species. ", out.width="90%"}

knitr::include_graphics(here::here(file.path("images", "Saba_Fig_SOE_NEFMC - Grace Saba.jpg")))

```


<!--chapter:end:chapters/ocean_acidification.Rmd-->

# Phytoplankton {#chl_pp}

**Description**: Phytoplankton products - Chlorophyll *a*, Primary Production, and Phytoplankton Size Class

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2018+), State of the Ecosystem - Mid-Atlantic (2018+)

**Indicator category**: Database pull; Database pull with analysis; Published methods

**Contributor(s)**: Kimberly Hyde
  
**Data steward**: Kimberly Hyde, kimberly.hyde@noaa.gov
  
**Point of contact**: Kimberly Hyde, kimberly.hyde@noaa.gov
  
**Public availability statement**: Source data used in these analyses are publicly available. 


## Current Methods
### Data sources

Daily Level 3 mapped (4 km resolution, sinusoidally projected) satellite ocean color data are acquired from the European Space Agency’s [Ocean Colour Climate Change Initiative](https://www.oceancolour.org/) (OC-CCI; version 6.0) and [GlobColour Project](https://www.globcolour.info/).  The OC-CCI data are the primary ocean color data source; however, the data latency is approximately 6-12 months.  GlobColour ocean color data are used to supplement the OC-CCI data to complete the time series for the current year. Sea Surface Temperature (SST) data include the 4 km nighttime NOAA Advanced Very High Resolution Radiometer (AVHRR) Pathfinder (@Casey2010; @Saha2018) and the Group for High Resolution Sea Surface Temperature (GHRSST) Multiscale Ultrahigh Resolution (MUR, version 4.1) Level 4 (@Chin2017; @Project2015) data. AVHRR Pathfinder data are used as the SST source until 2002 and MUR SST in subsequent years.


### Data extraction

NA

### Data analysis


The L3 OC-CCI products merge data from multiple ocean color sensors (SeaWiFS, MODIS Aqua, MERIS, VIIRS, Sentinel 3A and 3B OLCI) and include chlorophyll *a* (CHL-CCI), remote sensing reflectance $(R_{rs}(\lambda))$, and several inherent optical property (IOP) products. The CHL-CCI blended algorithm attempts to weight the outputs of the best-performing chlorophyll algorithms based on the water types present, which improves performance in nearshore water compared to open-ocean algorithms. The L3 GlobColour products use data from the same ocean color sensors as the OC-CCI, but the chlorophyll *a* product is derived from the Garver, Siegel, and Maritorena (GSM) algorithm, which is a semi-analytical bio-optical model (@OReilly1998). GlobColour also provides a photosynthetically available radiation (PAR) product, which is the mean daily photon flux density in the visible range (400 to 700 nm) and is used in the primary production calculations. The global OC-CCI, GlobColour, and SST data are mapped to the same sinusoidal map projection and subset to the east coast region (SW longitude=82.5$^\circ$W, SW latitude=22.5$^\circ$N, NE longitude=51.5$^\circ$W, NE latitude=48.5$^\circ$N).


#### Data Interpolation

For use in the primary production model, the daily CHL and AVHRR SST data are temporally interpolated and smoothed (CHLINT and SSTINT respectively).  The interpolation increases the data coverage and is necessary to better match data collected from different sensors and different times. The daily PAR data are not affected by cloud cover and the MUR SST data are a blended, gap-free product, so these parameters are not interpolated. Daily data at each pixel location are linearly interpolated based on days in the time series using [interpx.pro](https://github.com/callumenator/idl/blob/master/external/JHUAPL/INTERPX.PRO). Prior to interpolation, the CHL data are log-transformed to account for the log-normal distribution of chlorophyll data (@Campbell1995). The time series are processed in one-year chunks, with each yearly series including 60 days from the previous year and 60 days from the following year to improve the interpolation at the beginning and end of the year. Following interpolation, the data are smoothed with a tri-cube filter (width=7) using IDL’s [CONVOL](https://www.harrisgeospatial.com/docs/CONVOL.html) program. To avoid over-interpolating when several days of data are missing from the time series, the interpolated data are removed and replaced with blank data if the window of interpolation spans more than 7 days for CHL or 10 days for SST.
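
The gap-limited interpolation can be sketched in R for a single pixel. This is an illustration only, not the IDL processing chain: the function `interp_log_gaps()` and the example series are hypothetical, the tri-cube smoothing step is omitted, and the "window of interpolation" is assumed to be the number of days between the two valid observations that bracket a gap.

```r
# Hypothetical daily chlorophyll series (mg m^-3) with cloud gaps (NA).
day <- 1:12
chl <- c(1.0, NA, 4.0, NA, NA, NA, NA, NA, NA, NA, NA, 2.0)

interp_log_gaps <- function(day, x, max_span = 7) {
  ok  <- !is.na(x)
  obs <- day[ok]
  # Linear interpolation in log space, then back-transform.
  filled <- exp(approx(obs, log(x[ok]), xout = day)$y)
  # Days between the valid observations bracketing each day.
  lo <- findInterval(day, obs)
  lo[lo == 0] <- NA
  span <- obs[pmin(lo + 1, length(obs))] - obs[lo]
  # Blank out interpolated values where the gap is too wide.
  filled[!ok & (is.na(span) | span > max_span)] <- NA
  filled
}

chl_int <- interp_log_gaps(day, chl, max_span = 7)
```

In this example the one-day gap at day 2 is filled with the log-space midpoint of its neighbors, while days 4 to 11 stay missing because the bracketing observations are 9 days apart.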

#### Primary Productivity

The Vertically Generalized Production Model (VGPM) estimates net primary production (PP) as a function of chlorophyll *a*, photosynthetically available radiation (PAR), and photosynthetic efficiency (@Behrenfeld1997). In the VGPM-Eppley version, the original temperature-dependent function to estimate the chlorophyll-specific photosynthetic efficiency is replaced with the exponential “Eppley” function (Equation \@ref(eq:two)) as modified by @Morel1991. The VGPM calculates the daily amount of carbon fixed based on the maximum rate of chlorophyll-specific carbon fixation in the water column, sea surface daily photosynthetically available radiation, the euphotic depth (the depth where light is 1% of that at the surface), chlorophyll *a* concentration, and the number of daylight hours (Equation \@ref(eq:three)).

\begin{equation}
P_{max}^{b}(SST) = 4.6 * 1.065^{SST-20^{\circ}} 
(\#eq:two) 
\end{equation}
Where $P_{max}^{b}$ is the maximum carbon fixation rate and *SST* is sea surface temperature.

\begin{equation}
PP_{eu} = 0.66125 * P_{max}^{b} * \frac{I_{0}}{I_{0}+4.1} * Z_{eu} * \textrm{CHL} * \text{DL}
(\#eq:three) 
\end{equation}

Where $PP_{eu}$ is the daily amount of carbon fixed integrated from the surface to the euphotic depth (mgC m^-2^ day^-1^), $P_{max}^{b}$ is the maximum carbon fixation rate within the water column (mgC mgChl^-1^ hr^-1^), $I_{0}$ is the daily integrated molar photon flux of sea surface PAR (mol quanta m^-2^ day^-1^), $Z_{eu}$ is the euphotic depth (m), CHL is the daily interpolated CHLINT-CCI (mg m^-3^), and DL is the photoperiod (hours) calculated for the day of the year and latitude according to @Kirk1994. The light dependent function $(I_{0}/(I_{0}+4.1))$ describes the relative change in the light saturation fraction of the euphotic zone as a function of surface PAR ($I_0$).  $Z_{eu}$ is derived from an estimate of the total chlorophyll concentration within the euphotic layer ($\textrm{CHL}_{eu}$) based on the Case I models of @Morel1989:

* For $\textrm{CHL}_{eu} > 10.0\;\;\;\;\;Z_{eu} = 568.2 * \textrm{CHL}_{eu}^{-0.746}$
* For $\textrm{CHL}_{eu} \leq 10.0\;\;\;\;\;Z_{eu} = 200.0 * \textrm{CHL}_{eu}^{-0.293}$
* For $\textrm{CHL}_{0} \leq 1.0\;\;\;\;\;\textrm{CHL}_{eu} = 38.0 * \textrm{CHL}_{0}^{0.425}$
* For $\textrm{CHL}_{0} > 1.0\;\;\;\;\;\textrm{CHL}_{eu} = 40.2 * \textrm{CHL}_{0}^{0.507}$

Where $\textrm{CHL}_0$ is the surface chlorophyll concentration.
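
As a worked illustration, the equations and piecewise relationships above can be written as a few R functions. This is a sketch for a single pixel under stated assumptions, not the operational code: the function names are hypothetical, SST is taken in degrees Celsius, and the photoperiod is supplied directly rather than computed from latitude and day of year.

```r
# Total euphotic-layer chlorophyll from surface chlorophyll (mg m^-3).
chl_eu <- function(chl0) {
  ifelse(chl0 <= 1.0, 38.0 * chl0^0.425, 40.2 * chl0^0.507)
}

# Euphotic depth (m) from the euphotic-layer chlorophyll.
z_eu <- function(chl0) {
  ceu <- chl_eu(chl0)
  ifelse(ceu > 10.0, 568.2 * ceu^(-0.746), 200.0 * ceu^(-0.293))
}

# Eppley temperature function: maximum carbon fixation rate.
pb_max <- function(sst) 4.6 * 1.065^(sst - 20)

# Daily depth-integrated primary production (mgC m^-2 day^-1).
vgpm_eppley <- function(chl, sst, par, day_length) {
  0.66125 * pb_max(sst) * (par / (par + 4.1)) * z_eu(chl) * chl * day_length
}

# Example call with made-up inputs: CHL = 1 mg m^-3, SST = 15 C,
# PAR = 40 mol quanta m^-2 day^-1, 14 hours of daylight.
pp <- vgpm_eppley(chl = 1, sst = 15, par = 40, day_length = 14)
```

All functions are vectorized, so the same calls work on vectors of pixel values.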

#### Phytoplankton Size Class

Phytoplankton size classes (PSC) are calculated according to @Turner2021. The regionally tuned abundance-based model is based on the three-component model of @Brewin2010 that varies as a function of SST (@Brewin2017, @Moore2020). The model uses a look-up table with parameters indexed by SST, developed using a local data set of HPLC diagnostic pigment-derived phytoplankton size fractions matched with coincident satellite SST.

#### Statistics and Anomalies

Statistics, including the arithmetic mean, geometric mean, median, standard deviation, and coefficient of variation are calculated at daily (3 and 8-day running means), weekly, monthly, and annual time steps, and for several climatological periods. Annual statistics used the monthly means as inputs to avoid a summer time bias when more data are available due to reduced cloud cover. The daily, weekly, monthly and annual climatological statistics include the entire time series for each specified period. For example, the climatological January uses the monthly mean from each January in the time series and the climatological annual uses the annual mean from each year. The CHL and PP climatological statistics include data from both SeaWiFS (1997-2007) and MODIS (2008-2017). 

Weekly, monthly and annual anomalies are calculated for each product by taking the difference between the mean of the input time period (i.e. week, month, year) and the climatological mean for the same period. Because bio-optical data are typically log-normally distributed (@Campbell1995), the CHL and PP data were log-transformed before taking the difference and then untransformed, resulting in an anomaly ratio.
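
A small R example shows why the back-transformed difference is a ratio. The values are made up, and the sketch assumes base-10 logarithms and a climatological mean computed in log space; neither detail is specified above.

```r
# Hypothetical monthly mean chlorophyll (mg m^-3) for the same month in three years.
monthly_chl <- c(`2019` = 0.8, `2020` = 1.0, `2021` = 1.25)

# Climatological mean in log space.
clim_log <- mean(log10(monthly_chl))

# Difference in log space, back-transformed: observed / climatology.
anom_ratio <- 10^(log10(monthly_chl) - clim_log)
```

An anomaly ratio of 1 means the period matched the climatology, values above 1 indicate higher than average concentrations, and values below 1 indicate lower than average concentrations.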

The ecological production unit (EPU) shapefile that excludes the estuaries was used to spatially extract all data located within an ecoregion from the statistic and anomaly files. The median values, which are equivalent to the geometric mean for log-normally distributed data, were used for the CHL and PP data.

### Data processing

CHL and PPD time series were formatted for inclusion in the `ecodata` R package using the R code found [here](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_chl_pp.R).

Code used to process the phytoplankton size class indicator can be found in the `ecodata` package [here](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_phyto_size.R).

## 2018-2020 Methods
### Data sources
Level 1A ocean color remote sensing data from the Sea-viewing Wide Field-of-view Sensor (SeaWiFS) [@NASA1] on the OrbView-2 satellite and the Moderate Resolution Imaging Spectroradiometer (MODIS) [@NASA2] on the Aqua satellite were acquired from the NASA Ocean Biology Processing Group (OBPG).  Sea Surface Temperature (SST) data included the 4 km nighttime NOAA Advanced Very High Resolution Radiometer (AVHRR) Pathfinder [@Casey2010; @Saha2018] and the Group for High Resolution Sea Surface Temperature (GHRSST) Multiscale Ultrahigh Resolution (MUR, version 4.1) Level 4 [@Chin2017; @Project2015] data.  Prior to June 2002, AVHRR Pathfinder data are used as the SST source and MUR SST in subsequent years.

### Data analysis
The SeaWiFS and MODIS L1A files were processed using the NASA Ocean Biology Processing Group [SeaDAS](https://seadas.gsfc.nasa.gov/) software version 7.4.  All MODIS files were spatially subset to the U.S. East Coast (SW longitude=-82.5, SW latitude=22.5, NE longitude=-51.5, NE latitude=48.5) using [L1AEXTRACT_MODIS](https://seadas.gsfc.nasa.gov/help/seadas-processing/ProcessL1aextract_modis.html). SeaWiFS files were subset using the same coordinates prior to being downloaded from the [Ocean Color Web Browser](https://oceancolor.gsfc.nasa.gov/cgi/browse.pl?sen=am).  SeaDAS's [L2GEN](https://seadas.gsfc.nasa.gov/help/seadas-processing/ProcessL2gen.html) program was used to generate Level 2 (L2) files using the default settings and optimal ancillary files, and the [L2BIN](https://seadas.gsfc.nasa.gov/help/seadas-processing/ProcessL2bin.html) program spatially and temporally aggregated the L2 files to create daily Level 3 binned (L3B) files.  The daily files were binned at 2 km resolution, are stored in a global, nearly equal-area, [integerized sinusoidal grid](https://oceancolor.gsfc.nasa.gov/docs/format/l3bins/), and use the default [L2 ocean color flag masks](https://oceancolor.gsfc.nasa.gov/atbd/ocl2flags/).  The global SST data were also subset to the same East Coast region and remapped to the same sinusoidal grid.    

The L2 files contain several ocean color products including the default chlorophyll *a* product (CHL-OCI), photosynthetically available radiation (PAR), remote sensing reflectance $(R_{rs}(\lambda))$, and several inherent optical property products (IOPs).  The CHL-OCI product combines two algorithms, the O'Reilly band ratio (OCx) algorithm [@OReilly1998] and the Hu color index (CI) algorithm [@SOE5].  The SeaDAS default CHL-OCI algorithm diverges slightly from @SOE5 in that the transition between CI and OCx occurs at 0.15 < CI < 0.2 mg m^-3^ to ensure a smooth [transition](https://oceancolor.gsfc.nasa.gov/atbd/chlor_a/). The regional chlorophyll *a* algorithm by @SOE12 was used to create a second chlorophyll product (CHL-PAN).  CHL-PAN is an empirical algorithm derived from *in situ* sampling within the Northeast Shelf Large Marine Ecosystem (NES-LME) and demonstrated significant improvements over the standard NASA operational algorithm in the NES-LME [@SOE13].  A third-order polynomial function (Equation \@ref(eq:one)) is used to derive [CHL-PAN] from Rrs band ratios (RBR): 

\begin{equation}
log[\textrm{CHL-PAN}] = A_{0} + A_{1}X + A_{2}X^{2} + A_{3}X^{3},  
(\#eq:one) 
\end{equation}

where $X = log(R_{rs}(\lambda_{1})/R_{rs}(\lambda_{2}))$ and $A_{i} (i = 0, 1, 2, \textrm{or }  3)$ are sensor and RBR specific coefficients:

* If SeaWiFS and RBR is $R_{rs}(490)/R_{rs}(555)(R_{^3{\mskip -5mu/\mskip -3mu}_5})$ then: $A_0=0.02534, A_1=-3.033, A_2=2.096, A_3=-1.607$
* If SeaWiFS and RBR is $R_{rs}(490)/R_{rs}(670)(R_{^3{\mskip -5mu/\mskip -3mu}_6})$  then: $A_0=1.351, A_1=-2.427, A_2=0.9395, A_3=-0.2432$
* If MODIS and RBR is $R_{rs}(488)/R_{rs}(547)(R_{^3{\mskip -5mu/\mskip -3mu}_5})$  then: $A_0=0.03664, A_1=-3.451, A_2=2.276, A_3=-1.096$
* If MODIS and RBR is $R_{rs}(488)/R_{rs}(667)(R_{^3{\mskip -5mu/\mskip -3mu}_6})$  then: $A_0=1.351, A_1=-2.427, A_2=0.9395, A_3=-0.2432$

C~3/5~ and C~3/6~ were calculated for each sensor specific RBR (R~3/5~ and R~3/6~ respectively) and then the following criteria were used to derive CHL-PAN:
<ol type="a">
  <li>If $R_{^3{\mskip -5mu/\mskip -3mu}_5}>0.15$ or $R_{6} <0.0001$ then $\textrm{CHL-PAN} = C_{^3{\mskip -5mu/\mskip -3mu}_5};$</li>
  <li> Otherwise, $\textrm{CHL-PAN} = \textrm{max}(C_{^3{\mskip -5mu/\mskip -3mu}_5}, C_{^3{\mskip -5mu/\mskip -3mu}_6})$,</li>
</ol>
where $R_6$ is $R_{rs}(670)$ (SeaWiFS) or $R_{rs}(667)$ (MODIS) [@SOE13]. 
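
The polynomial and selection criteria can be sketched in R as follows. The function and argument names are hypothetical, the coefficients are copied from the list above, and the logarithm is assumed to be base 10, which the text does not state explicitly. This is an illustration rather than the SeaDAS implementation.

```r
# Coefficients A0..A3 by sensor and band ratio, copied from the list above.
pan_coef <- list(
  seawifs = list(r35 = c(0.02534, -3.033, 2.096, -1.607),
                 r36 = c(1.351, -2.427, 0.9395, -0.2432)),
  modis   = list(r35 = c(0.03664, -3.451, 2.276, -1.096),
                 r36 = c(1.351, -2.427, 0.9395, -0.2432))
)

# Third-order polynomial in X = log10(band ratio), back-transformed to mg m^-3.
pan_poly <- function(ratio, a) {
  x <- log10(ratio)
  10^(a[1] + a[2] * x + a[3] * x^2 + a[4] * x^3)
}

# rrs_blue: Rrs(490 or 488); rrs_green: Rrs(555 or 547); rrs_red: Rrs(670 or 667).
chl_pan <- function(rrs_blue, rrs_green, rrs_red, sensor = c("seawifs", "modis")) {
  sensor <- match.arg(sensor)
  a   <- pan_coef[[sensor]]
  r35 <- rrs_blue / rrs_green
  c35 <- pan_poly(r35, a$r35)
  c36 <- pan_poly(rrs_blue / rrs_red, a$r36)
  # Criterion (a): use C3/5; otherwise (b): take the larger of C3/5 and C3/6.
  ifelse(r35 > 0.15 | rrs_red < 0.0001, c35, pmax(c35, c36))
}
```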


<!--chapter:end:chapters/phytoplankton.Rmd-->

# Plankton Diversity

**Description**: NOAA NEFSC Oceans and Climate branch public ichthyoplankton dataset

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2021), State of the Ecosystem - Mid-Atlantic (2021)

**Indicator category**: Database pull with analysis 

**Contributor(s)**: Harvey J. Walsh
  
**Data steward**: Harvey Walsh, harvey.walsh@noaa.gov
  
**Point of contact**: Harvey Walsh, harvey.walsh@noaa.gov
  
**Public availability statement**: Source data are available to the public [here](ftp://ftp.nefsc.noaa.gov/pub/hydro/zooplankton_data/). Derived data for this indicator are available [here](https://comet.nefsc.noaa.gov/erddap/tabledap/ichthyo_div_soe_v1.html).


## Methods
Data from the NOAA Northeast Fisheries Science Center (NEFSC) Oceans and Climate branch (OCB) public dataset were used to examine changes in diversity of abundance among 45 ichthyoplankton taxa.  The 45 taxa were established [@RN126], and include the most abundant taxa from the 1970s to present that represent consistency in the identification of larvae. 

### Data sources
Multi-species plankton surveys cover the entire Northeast US shelf from Cape Hatteras, North Carolina, to Cape Sable, Nova Scotia, four to six times per year.  A random-stratified design based on the NEFSC bottom trawl survey design [@Azarovitz1981] is used to collect samples from 47 strata. The number of strata is lower than the trawl survey as many of the narrow inshore and shelf-break strata are combined in the EcoMon design. 
The area encompassed by each stratum determined the number of samples in each stratum. Samples were collected both day and night using a 61 cm bongo net. Net tow speed was 1.5 knots and maximum sample depth was 200 m. Double oblique tows were a minimum of 5 minutes in duration, and fished from the surface to within 5 m of the seabed or to a maximum depth of 200 m. The volume filtered of all collections was measured with mechanical flowmeters mounted across the mouth of each net. 

Processing of most samples was conducted at the Morski Instytut Rybacki (MIR) in Szczecin, Poland; the remaining samples were processed at the NEFSC or the Atlantic Reference Center, St Andrews, Canada.  Larvae were identified to the lowest possible taxa and enumerated for each sample.  Taxon abundance for each station was standardized to number under 10 m^-2^ sea surface.

### Data extraction
Data retrieved from NOAA NEFSC Oceans and Climate branch [public dataset](ftp://ftp.nefsc.noaa.gov/pub/hydro/zooplankton_data/).
Filename: "EcoMon_Plankton_Data_v3_0.xlsx", File Date: 10/20/2016

### Data analysis
Not all detailed data processing steps are currently included in this document, but general steps are outlined. Data were grouped into seasons: spring = February, March, April and fall = September, October, November. Stratified weighted mean abundance was calculated for each taxon for each year and season across all plankton strata (n = 47) for 17 years (1999 to 2015). The Shannon Diversity Index and the count of positive taxa were calculated for each season and year.

MATLAB code used to calculate diversity indices can be found using this [link](https://github.com/NOAA-EDAB/tech-doc/tree/master/R/stored_scripts/ich_div_analysis).
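
For readers without MATLAB, the two summary metrics can be illustrated with a short R sketch. It is not a translation of the linked scripts: the abundance values and the `shannon()` helper are hypothetical, and natural logarithms are assumed.

```r
# Hypothetical stratified mean abundances for one season and year.
abund <- c(taxon_a = 50, taxon_b = 30, taxon_c = 20, taxon_d = 0)

# Shannon diversity index: H = -sum(p * log(p)) over taxa with p > 0.
shannon <- function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
}

h          <- shannon(abund)   # diversity for this season and year
n_positive <- sum(abund > 0)   # count of taxa with positive abundance
```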


### Data processing

Forage Anomaly data sets were formatted for inclusion in the `ecodata` R package using the R code found [here](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_forage_anomaly.R).

Ichthyoplankton diversity data sets were formatted for inclusion in the `ecodata` R package using the R code found [here](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_ichthyoplankton.R).

<!--chapter:end:chapters/plankton_diversity.Rmd-->

# Protected Species Hotspots {#persistent_hotspots}

**Description**: Integrated persistent annual hotspots derived from at-sea observations of seabirds, cetaceans and sea turtles collected on systematic ship and aerial surveys

**Found in**: State of the Ecosystem - Mid-Atlantic (2022+), State of the Ecosystem - New England (2022+)

**Indicator category**: Extensive analysis, not yet published, Database pull with analysis

**Contributor(s)**: Timothy P. White <timothy.white@boem.gov>
  
**Data steward**: Timothy P. White <timothy.white@boem.gov>
  
**Point of contact**: Timothy P. White <timothy.white@boem.gov>
  
**Public availability statement**: Source data are publicly available. Please contact Timothy White for more details. 

## Methods

Individual hotspot richness maps represent annual persistent hotspots of 71 species, as well as common taxa that are challenging to identify to the species level on at-sea surveys but whose abundance and spatial patterns contribute significantly to richness and diversity in the Atlantic EEZ (seabirds, n = 49; marine mammals, n = 18; turtles, n = 4). The integrated maps represent very high densities and very high persistence; however, one or both parameters can be adjusted to identify other important locations, for example, to reveal areas of high density and moderate persistence. Individual species-specific hotspots were defined using the 75th percentile of the annual density distribution on gridded segmented transects. This density threshold identified locations of enhanced abundance on daily gridded transects. Persistence probabilities for each grid cell were quantified by summing the number of times a given cell was classified as a hotspot to produce a spatial organization of hotspots coupled with persistence probabilities ranging from 0 to 1. These probabilities were also thresholded using the 75th percentile to locate highly persistent areas of single-species hotspots, and then summed across each grid cell to resolve multi-species hotspots. The minimum survey effort for each cell in the grid was five days.

### Data sources

The annual persistent hotspot maps presented here of seabirds, cetaceans, and sea turtles were derived from observations and survey effort archived in publicly available databases such as the Bureau of Ocean Energy Management’s Northwest Atlantic Seabird Catalog; NOAA Northeast Fisheries Science Center’s (NEFSC) AMAPPS database; NEFSC’s Right Whale Aerial Survey database; and the MassCEC/NEAq database of cetacean and turtle surveys. Observer-based programs use two main survey methods to estimate densities at sea from ships and aircraft: 1) the strip-width method (@White2020) and 2) distance sampling (@Palka2017). 


### Data analysis

Not all detailed data processing steps are currently included in this document, but general steps are outlined. Species-specific persistent hotspots were computed with observations and survey effort collected on ship and aerial surveys from 1978-2020. Species-specific hotspots were derived with daily timesteps on 10 x 10 km grids covering the Atlantic EEZ. Hotspot probabilities (i.e., persistence) were derived by summing the number of daily hotspots divided by the number of time steps (@Gende2006), which produced a continuum of probabilistic hotspots ranging from 0 to 1 across a final species-specific grid. Annual hotspot richness maps were derived by summing the species-specific grid cells with high persistence.
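
The persistence calculation for a single species can be sketched in base R. The density matrix below is made up, and for brevity the 75th percentile is taken over all pooled values rather than over a true annual density distribution; the actual analysis was spatial and is not reproduced here.

```r
# Hypothetical daily densities: rows are grid cells, columns are survey days.
dens <- matrix(
  c(0, 1, 5, 9,
    2, 8, 7, 9,
    1, 0, 2, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("cell", 1:3), paste0("day", 1:4))
)

# A cell is a daily hotspot when its density exceeds the 75th percentile.
threshold  <- quantile(dens, 0.75, na.rm = TRUE)
is_hotspot <- dens > threshold

# Persistence: number of hotspot days divided by number of surveyed days.
persistence <- rowSums(is_hotspot, na.rm = TRUE) / rowSums(!is.na(dens))
```

Repeating this per species, thresholding each persistence grid, and summing the resulting 0/1 grids cell by cell would give the multi-species richness described above.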


### Data processing

Persistent hotspots were computed with the `sf` and `raster` R packages.

<!--chapter:end:chapters/protected_species_hotspots.Rmd-->

# Quota and Catch - New England 

**Description**: The catch limit (either ABC or ACL) and total catch for all NEFMC species and sector (commercial or recreational), if appropriate.

**Found in**:  State of the Ecosystem - New England (2023) 

**Indicator category**: Synthesis of published information, Database pull

**Contributor(s)**: Kimberly Bastille
  
**Data steward**: Kimberly Bastille <kimberly.bastille@noaa.gov>
  
**Point of contact**: Kimberly Bastille <kimberly.bastille@noaa.gov>
  
**Public availability statement**: Source data are publicly available

## Methods

### Data Sources

Data were obtained from the NMFS [Species Information System (SIS)](https://apps-st.fisheries.noaa.gov/sis/#no-back-button). 

SIS Annual Catch Limit reports were used to collate data for each Fisheries Management Plan (FMP). The Acceptable Biological Catch and Grand Total Catch (Commercial + Recreational) were recorded. 

### Data Analysis

Each stock has a threshold and catch value assigned to it from the sources above. The table below outlines the data pull for each FMP. 

| FMP                  | Quota Type | Fishing Year                   |
|----------------------|------------|--------------------------------|
| Atlantic Herring     | ACL        | January 1 through December 31  |
| Atlantic Sea Scallop | ACL        | April 1 through March 31       |
| Red Crab             | ACL        | April 1 through March 31       |
| Skates               | ACL        | May 1 through April 30         |
| Groundfish           | ACL        | May 1 through April 30         |
| Monkfish             | ACL        | May 1 through April 30         |
| Golden Tilefish      | ACL        | November 1 through October 31  |


### Data Processing 

Data were formatted for inclusion in the `ecodata` R package using the R code found [here](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_abc.acl.R).

<!--chapter:end:chapters/Quota_Catch_NE.Rmd-->

# Recreational Fishing Indicators {#recdat}

**Description**: A variety of indicators derived from MRIP Recreational Fisheries Statistics, including total recreational catch, total angler trips by region, annual diversity of recreational fleet effort, and annual diversity of managed species.

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2017+), State of the Ecosystem - Mid-Atlantic (2017+)

**Indicator category**: Database pull with analysis
  
**Contributor(s)**: Geret DePiper, Scott Steinbeck
  
**Data steward**: Geret DePiper, <geret.depiper@noaa.gov>
 
**Point of contact**: Geret DePiper, <geret.depiper@noaa.gov>

**Public availability statement**: Data sets are publicly available (see Data Sources below).


## Methods
We use total recreational harvest as an indicator of seafood production and total recreational trips and total recreational anglers as proxies for recreational value generated from the Mid-Atlantic and New England regions respectively. We estimate both recreational catch diversity in species managed by the Fisheries Management Councils (Mid-Atlantic, MAFMC; New England, NEFMC; and Atlantic States, ASFMC) and fleet effort diversity using the effective Shannon index. 
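
As an illustration, the effective Shannon index is commonly defined as the exponential of the Shannon entropy, which can be read as the equivalent number of equally abundant categories. The sketch below assumes that definition; the catch values and the `effective_shannon()` helper are hypothetical, and the analysis scripts linked under Data analysis remain the authoritative implementation.

```r
# Hypothetical recreational harvest (numbers of fish) by species for one year.
catch <- c(striped_bass = 400, bluefish = 400, scup = 200)

# Effective Shannon index: exp(H), where H = -sum(p * log(p)).
effective_shannon <- function(x) {
  p <- x[x > 0] / sum(x)
  exp(-sum(p * log(p)))
}

catch_div <- effective_shannon(catch)
```

When all categories contribute equally, the index equals the number of categories; uneven catches pull it below that count.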

### Data sources
All recreational fishing indicator data, including number of recreationally harvested fish, number of angler trips, and number of anglers, were downloaded from the Marine Recreational Information Program [MRIP Recreational Fisheries Statistics Queries](https://www.st.nmfs.noaa.gov/recreational-fisheries/data-and-documentation/queries/index) portal. Relevant metadata including information regarding data methodology updates are available at the query site. Note that 2017 data were considered preliminary at the time of the data pull. 

Data sets were queried by region on the MRIP site, and for the purposes of the State of the Ecosystem reports, the "NORTH ATLANTIC" and "MID-ATLANTIC" regions were mapped to the New England and Mid-Atlantic report versions respectively. All query pages are accessible through the [MRIP Recreational Fisheries Statistics](https://www.st.nmfs.noaa.gov/recreational-fisheries/data-and-documentation/queries/index) site. 

The number of recreationally harvested fish was found by selecting "TOTAL HARVEST (A + B1)" on the [Catch Time Series Query](https://www.st.nmfs.noaa.gov/recreational-fisheries/data-and-documentation/run-a-data-query) page. Catch diversity estimates were also derived from the total catch time series (see below). Species included in the diversity of catch analysis can be found in Table \@ref(tab:rec-groups). The Mid-Atlantic Fishery Management Council asked that species managed by the South Atlantic Fishery Management Council be distinguished in the analysis of recreational species diversity. 


```{r rec-groups, eval = TRUE, echo = FALSE}

rec_spp <- read.csv(here::here("data","rec_spp_list.csv")) 

knitr::kable(rec_spp, caption="Species included in recreational catch diversity analysis.") %>%
  kableExtra::kable_styling(font_size = 8)
```


Angler trips (listed as "TOTAL" trips) were pulled from the MRIP [Effort Time Series Query](https://www.st.nmfs.noaa.gov/recreational-fisheries/data-and-documentation/run-a-data-query) page, and included data from 1981 - 2021. Time series of recreational fleet effort diversity were calculated from this data set (see below). The number of anglers was the total number of anglers from the Marine Recreational Fishery Statistics Survey (MRFSS) Participation Time Series Query, and includes data from 1981 - 2016. 

### Data analysis

**Recreational fleet effort diversity**

Code used for effort diversity data analysis can be found [here](https://github.com/NOAA-EDAB/tech-doc/blob/master/R/stored_scripts/rec_effort_div_analysis.R). 

**Recreational catch diversity**

Code used for catch diversity data analysis can be found [here](https://github.com/NOAA-EDAB/tech-doc/blob/master/R/stored_scripts/rec_catch_div_analysis.R). 


### Data processing

Recreational fishing indicators were formatted for inclusion in the `ecodata` R package using this [code](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_rec.R).

<!--chapter:end:chapters/Recreational_Data.Rmd-->

# Recreational Shark Fishing Indicators

**Description**: Recreational Shark Landings

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2021+), State of the Ecosystem - Mid-Atlantic (2021+)

**Indicator category**: Database pull with analysis
  
**Contributor(s)**: Kimberly Bastille
  
**Data steward**: Kimberly Bastille, <kimberly.bastille@noaa.gov>
 
**Point of contact**: Kimberly Bastille, <kimberly.bastille@noaa.gov>

**Public availability statement**: Data sets are publicly available (see Data Sources below).


## Methods

### Data sources
All recreational shark fishing indicator data were downloaded from the Marine Recreational Information Program [MRIP Recreational Fisheries Statistics Queries](https://www.st.nmfs.noaa.gov/recreational-fisheries/data-and-documentation/queries/index) portal. 

From the main [Recreational fisheries statistics queries](https://www.fisheries.noaa.gov/data-tools/recreational-fisheries-statistics-queries) page, follow the [download query](https://www.st.nmfs.noaa.gov/SASStoredProcess/do?) link and make the following selections:

| Prompt            | Selected                            |
|-------------------|-------------------------------------|
| Minimum Year      | 1981                                |
| Maximum Year      | *Max year available                 |
| Data Type         | Estimate: Catch                     |
| Wave Options      | All Waves                           |   
| Geographical Area | Not Specified                       |
| Species           | *26 Species outlined in table below |
| Output            | Download CSV as ZIP File            |

The ZIP file was used in the following analysis. 


### Data analysis

Only data from regions "4 = New England" and "5 = Mid-Atlantic" were retained, which removed data from regions not relevant to the State of the Ecosystem reports. The data were then grouped into categories using the table below, which is the same species list entered under "Species" in the MRIP query. Data were grouped by year, category, and region, and the summed landings for each group were used as the indicator of recreational shark harvest. 

| Category       | Common Name          | Species Name                 |
|----------------|----------------------|------------------------------|
| Small Coastal  | Atlantic Sharpnose   | *Rhizoprionodon terraenovae* |
| Small Coastal  | Blacknose            | *Carcharhinus acronotus*     |
| Small Coastal  | Bonnethead           | *Sphyrna tiburo*             |
| Small Coastal  | Finetooth            | *Carcharhinus isodon*        |
| Large Coastal  | Blacktip             | *Carcharhinus limbatus*      |  
| Large Coastal  | Bull                 | *Carcharhinus leucas*        |
| Large Coastal  | Great Hammerhead     | *Sphyrna mokarran*           |
| Large Coastal  | Lemon                | *Negaprion brevirostris*     |
| Large Coastal  | Nurse                | *Ginglymostoma cirratum*     |
| Large Coastal  | Sandbar              | *Carcharhinus plumbeus*      |
| Large Coastal  | Scalloped Hammerhead | *Sphyrna lewini*             |
| Large Coastal  | Silky                | *Carcharhinus falciformis*   |
| Large Coastal  | Smooth Hammerhead    | *Sphyrna zygaena*            | 
| Large Coastal  | Spinner              | *Carcharhinus brevipinna*    |
| Large Coastal  | Tiger                | *Galeocerdo cuvier*          |
| Prohibited     | Atlantic Angel       | *Squatina dumeril*           |
| Prohibited     | Basking              | *Cetorhinus maximus*         |
| Prohibited     | Bigeye Thresher      | *Alopias superciliosus*      |
| Prohibited     | White                | *Carcharodon carcharias*     |
| Pelagic        | Blue                 | *Prionace glauca*            |
| Pelagic        | Dusky                | *Carcharhinus obscurus*      |
| Pelagic        | Oceanic Whitetip     | *Carcharhinus longimanus*    |
| Pelagic        | Porbeagle            | *Lamna nasus*                |
| Pelagic        | Shortfin Mako        | *Isurus oxyrinchus*          |
| Pelagic        | Thresher             | *Alopias vulpinus*           |
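
The filter, categorize, and sum steps described above can be sketched in base R as follows. This is a minimal illustration only: the `catch` records, their column names, and the extra region code are hypothetical and do not reflect the actual MRIP download format, and the lookup table is an excerpt of the full species table.

```r
# Hypothetical MRIP-style catch records (column names and values are assumptions)
catch <- data.frame(
  year     = c(2019, 2019, 2019, 2019, 2020, 2020),
  region   = c(4, 5, 5, 6, 4, 5),  # 4 = New England, 5 = Mid-Atlantic, 6 = some other region
  species  = c("Blue", "Blacktip", "Tiger", "Bull", "Porbeagle", "Blacktip"),
  landings = c(120, 300, 50, 75, 40, 210)
)

# Excerpt of the species-to-category lookup from the table above
categories <- data.frame(
  species  = c("Blue", "Porbeagle", "Blacktip", "Tiger", "Bull"),
  category = c("Pelagic", "Pelagic", "Large Coastal", "Large Coastal", "Large Coastal")
)

# Keep only the regions relevant to the State of the Ecosystem reports
ne_catch <- catch[catch$region %in% c(4, 5), ]

# Attach the management category to each record
ne_catch <- merge(ne_catch, categories, by = "species")

# Sum landings by year, category, and region
shark_indicator <- aggregate(landings ~ year + category + region,
                             data = ne_catch, FUN = sum)
shark_indicator
```

The processing code linked under "Data processing" below remains the authoritative implementation.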

### Data processing

Recreational shark fishing indicators were formatted for inclusion in the `ecodata` R package using this [code](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_rec_hms.R).

<!--chapter:end:chapters/rec_hms.Rmd-->

# Regime Shift Analysis


**Description**: Qualitative regime shift analysis with plotting tool

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2023+), State of the Ecosystem - Mid-Atlantic (2023+)

**Indicator category**: 

**Contributor(s)**: Kimberly Bastille

**Data steward**: NA

**Point of contact**: Kimberly Bastille, <kimberly.bastille@noaa.gov>

**Public availability statement**: NA


## Methods
The regime analysis uses the [`rpart`](https://cran.r-project.org/web/packages/rpart/vignettes/longintro.pdf) package to identify breaks in a time series. `rpart` builds regression trees by recursive partitioning, following the methodology outlined in *Classification and Regression Trees*, a 1984 book by Leo Breiman and others.

The code used to calculate the statistics behind the plotting visuals can be found in [`ecodata`](https://github.com/NOAA-EDAB/ecodata/blob/master/R/StatREGIME.R). Lines 12-16 show the tree calculations and the pruning. 

There are many ways to calculate regime shifts. This method had been applied previously for select indicators and has been scaled up to apply to other time series datasets for the State of the Ecosystem reports. 
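
As a rough illustration of the approach (not a copy of `StatREGIME.R`), the sketch below grows a regression tree with time as the only predictor, prunes it at the complexity parameter with the lowest cross-validated error, and reports the split points as candidate shift years. The function name `find_regime_breaks`, the pruning rule, and the simulated series are assumptions made for this example.

```r
library(rpart)

# Return candidate regime-shift locations for a single time series
find_regime_breaks <- function(time, value) {
  dat <- data.frame(Time = time, Value = value)

  # Regression tree ("anova" method) with time as the only predictor
  tree <- rpart::rpart(Value ~ Time, data = dat, method = "anova")

  # A root-only tree has no splits and therefore no breaks
  if (NROW(tree$splits) == 0) return(numeric(0))

  # Prune at the complexity parameter with the lowest cross-validated error
  best   <- which.min(tree$cptable[, "xerror"])
  pruned <- rpart::prune(tree, cp = tree$cptable[best, "CP"])
  if (NROW(pruned$splits) == 0) return(numeric(0))

  # Split values on Time mark where the fitted mean changes
  sort(unique(unname(pruned$splits[, "index"])))
}

# Simulated series with a step change between 1999 and 2000
set.seed(1)
years  <- 1980:2019
sst    <- c(rnorm(20, mean = 10, sd = 0.1), rnorm(20, mean = 12, sd = 0.1))
breaks <- find_regime_breaks(years, sst)
breaks
```

Because cross-validation is random, results on noisy series can vary between runs unless a seed is set.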

### Data source(s)
NA

### Data extraction
NA

### Data analysis

The red vertical lines indicate the years in which a shift occurs. 

**Example plot**
```{r , echo = F, fig.align="center", eval = T, fig.cap=""}

lt_sst <- ecodata::long_term_sst %>% 
  dplyr::mutate(hline = mean(Value, na.rm = TRUE))

hline <- mean(lt_sst$Value)

lt_sst %>% 
  ggplot2::ggplot(aes(x = Time, y = Value, group = Var)) +
  ggplot2::annotate("rect", fill = shade.fill, alpha = shade.alpha,
      xmin = x.shade.min , xmax = x.shade.max,
      ymin = -Inf, ymax = Inf) +
  ecodata::geom_gls() +
  #ecodata::geom_lm(aes(x = Time, y = Value, group = Var))+
  ecodata::geom_regime()+
  ggplot2::geom_line() +
  ggplot2::geom_point() +
  ggplot2::geom_hline(aes(yintercept = hline),
             size = hline.size,
             alpha = hline.alpha,
           linetype = hline.lty)+
  ggplot2::ylab("Temperature (C)") +
  ggplot2::xlab(element_blank())+
  ggplot2::ggtitle("Long-term SST") +
  ggplot2::scale_x_continuous(expand = c(0.01, 0.01), breaks = seq(1840,2020,10))+
  ecodata::theme_facet() +
  ggplot2::theme(strip.text=element_text(hjust=0,
                                face = "italic"))+
  ecodata::theme_title()


```


<!--chapter:end:chapters/regime_shift_analysis.Rmd-->

# Right Whale Abundance {#narw}


```{r,  echo = F, message=F}

#Load packages
library(knitr)
library(rmarkdown)

```
**Description**: Right Whale

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2017+), State of the Ecosystem - Mid-Atlantic (2017+)

**Indicator category**: Synthesis of published information; Published methods

**Contributor(s)**: Christopher D. Orphanides
  
**Data steward**: Chris Orphanides, chris.orphanides@noaa.gov
  
**Point of contact**: Richard Pace, richard.pace@noaa.gov
  
**Public availability statement**: Source data are available from the New England Aquarium upon request. Derived data are available [here](http://comet.nefsc.noaa.gov/erddap/tabledap/protected_species_soe_v1.html).

## Methods

### Data sources
The North Atlantic right whale abundance estimates were taken from a published document [see @Pace2017], except for the 2016 and 2017 estimates, which were taken from the 2016 National Oceanic and Atmospheric Administration marine mammal stock assessment [@Hayes2017] and an unpublished 2017 stock assessment.

Calf birth estimates were taken from a report [@narw2019] published annually by the North Atlantic Right Whale Consortium. 

### Data extraction 
Data were collected from existing reports and validated by report authors. 

### Data analysis
Analysis for right whale abundance estimates is provided by @Pace2017, and code can be found in the [supplemental materials](https://onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1002%2Fece3.3406&file=ece33406-sup-0001-SupInfo.docx). 

### Data processing

Time series of right whale and calf abundance estimates were formatted for inclusion in the `ecodata` R package using this R [code](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_narw.R).

<!--chapter:end:chapters/RW_indicator.rmd-->

# SAFMC managed spp

**Description**: SAFMC Species on NES

**Found in**: State of the Ecosystem - Mid-Atlantic (2020), State of the Ecosystem - New England (2020)

**Indicator category**: Database pull

**Contributor(s)**: Sean Lucey
  
**Data steward**: Sean Lucey <Sean.Lucey@noaa.gov>
  
**Point of contact**: Sean Lucey <Sean.Lucey@noaa.gov>
  
**Public availability statement**: Source data are available to qualified researchers upon request (see "Access Information" [here](https://inport.nmfs.noaa.gov/inport/item/22560)).

## Methods


### Data sources
The [Survdat](#survdat) data set was used to examine the presence of "southern" species (table \@ref(tab:southern)) in Mid-Atlantic and New England waters.

### Data extraction 
Survdat was subset to common "southern" species (table \@ref(tab:southern)). 

```{r southern, eval = T, echo = F}
comnames <- c('Black snapper', 'Queen snapper', 'Mutton snapper', 'Schoolmaster snapper',
              'Blackfin snapper', 'Northern Red snapper', 'Cubera snapper',
              'Grey snapper', 'Mahogany snapper', 'Dog snapper', 'Lane snapper',
              'Silk snapper', 'Yellowtail snapper', 'Vermilion snapper',
              'Bank sea bass', 'Rock sea bass', 'Black sea bass',
              'Rock hind', 'Graysby', 'Calico grouper',
              'Yellowedge grouper', 'Coney', 'Red hind',
              'Atlantic goliath grouper', 'Red grouper', 'Misty grouper',
              'Warsaw grouper', 'Snowy grouper', 'Nassau grouper',
              'Black grouper', 'Yellowmouth grouper', 'Gag grouper',
              'Scamp grouper', 'Tiger grouper', 'Yellowfin grouper',
              'Sheepshead', 'Grass porgy', 'Jolthead porgy',
              'Saucereye porgy', 'Whitebone porgy', 'Knobbed porgy',
              'Red porgy', 'Longspine porgy', 'Black margate',
              'Porkfish', 'White margate', 'Tomtate',
              'Smallmouth grunt', 'French grunt', 'Spanish grunt',
              'Cottonwick grunt', "Sailor's grunt", 'White grunt', 'Blue Striped grunt',
              'Grey triggerfish', 'Queen triggerfish', 'Ocean triggerfish',
              'Hogfish', 'Puddingwife wrasse', 'Yellow jack',
              'Blue runner', 'Crevalle jack', 'Bar jack', 'Greater amberjack',
              'Almaco jack')

scinames <- c('Apsilus dentatus', 'Etelis oculatus', 'Lutjanus analis', 'Lutjanus apodus', 
              'Lutjanus buccanella', 'Lutjanus campechanus', 'Lutjanus cyanopterus', 
              'Lutjanus griseus', 'Lutjanus mahogoni', 'Lutjanus jocu', 'Lutjanus synagris', 
              'Lutjanus vivanus', 'Ocyurus chrysurus', 'Rhomboplites aurorubens', 
              'Centropristis ocyurus', 'Centropristis philadelphica', 'Centropristis striata', 
              'Epinephelus adscensionis', 'Epinephelus cruentatus', 'Epinephelus drummondhayi', 
              'Epinephelus flavolimbatus', 'Epinephelus fulvus',  'Epinephelus guttatus', 
              'Epinephelus itajara', 'Epinephelus morio', 'Epinephelus mystacinus', 
              'Epinephelus nigritus', 'Epinephelus niveatus', 'Epinephelus striatus', 
              'Mycteroperca bonaci', 'Mycteroperca interstitialis', 'Mycteroperca microlepis', 
              'Mycteroperca phenax', 'Mycteroperca tigris', 'Mycteroperca venenosa', 
              'Archosargus probatocephalus', 'Calamus arctifrons', 'Calamus bajonado', 
              'Calamus calamus', 'Calamus leucosteus', 'Calamus nodosus', 
              'Pagrus pagrus', 'Stenotomus caprinus', 'Anisotremus surinamensis', 
              'Anisotremus virginicus', 'Haemulon album', 'Haemulon aurolineatum', 
              'Haemulon chrysargyreum', 'Haemulon flavolineatum', 'Haemulon macrostomum', 
              'Haemulon melanurum', 'Haemulon parra', 'Haemulon plumieri', 'Haemulon sciurus',
              'Balistes capriscus', 'Balistes vetula', 'Canthidermis sufflamen',
              'Lachnolaimus maximus', 'Halichoeres radiatus', 'Caranx bartholomaei', 
              'Caranx crysos', 'Caranx hippos', 'Caranx ruber', 'Seriola dumerili', 
              'Seriola rivoliana')

type <- c(rep('Snappers', 14), rep('Sea Basses', 3), rep('Groupers', 18), 
          rep('Porgies', 8), rep('Grunts', 11), rep('Triggerfishes', 3),
          rep('Wrasses', 2), rep('Jacks', 6))

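# check.names = FALSE keeps the spaces in the column headers shown in the table
southern.sp <- data.frame('Common Name' = comnames,
                          'Scientific Name' = scinames,
                          Group = type,
                          check.names = FALSE)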

knitr::kable(southern.sp, booktabs = TRUE, longtable = T,
      caption = "Southern Species that were examined within the NEFSC trawl survey data") %>% 
  kableExtra::column_spec(2, italic = TRUE)

```

### Data analysis
The presence/absence of "southern" species was broadly examined for all species listed in table \@ref(tab:southern). These species proved to be extremely rare in the bottom trawl survey. When a species was present, it was found during the fall survey and not the spring survey. No trends were apparent in the data. The one species that was commonly present was the blue runner (*Caranx crysos*). Stations were binned temporally into three categories: prior to 2001, 2001 - 2010, and since 2010. Stations were then plotted on a map of the survey region and visually inspected.
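
The temporal binning can be sketched with `cut()`. The station years below are made up, and the sketch assumes that 2010 itself belongs to the "2001 - 2010" bin, which the description above leaves ambiguous.

```r
# Hypothetical years in which a species was caught at a station
station_year <- c(1995, 2000, 2001, 2005, 2010, 2011, 2018)

# Right-closed intervals: (-Inf, 2000], (2000, 2010], (2010, Inf)
period <- cut(
  station_year,
  breaks = c(-Inf, 2000, 2010, Inf),
  labels = c("Prior to 2001", "2001 - 2010", "Since 2010")
)

# Number of stations in each period
table(period)
```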

### Data processing

Blue runner (*Caranx crysos*) data were formatted for inclusion in the `ecodata` R package using this [R code](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_blue_runner.R).

<!--chapter:end:chapters/SAFMC_spp.Rmd-->

# Sandlance

**Description**: Sandlance survey data from Stellwagen Bank National Marine Sanctuary

**Found In**: 2022 Indicator Catalog

**Indicator category**: Published methods

**Contributor(s)**: David N. Wiley, Tammy L. Silva
  
**Data steward**: Moe Nelson <david.moe.nelson@noaa.gov>
  
**Point of contact**: Moe Nelson <david.moe.nelson@noaa.gov>

**Public availability statement**: Source data are publicly available.

## Methods
### Data Sources

This data set is taken directly from Table 1, @Silva2020.  See full citation in "References" section below.

### Data Analysis

Data processing and analysis methods are described in @Silva2020.  The catch counts of sand lance and observational counts of humpback whales and great shearwater were used to derive spatial metrics (center of gravity, and inertia) for each species.  Equations for these spatial metrics are provided in Table 2 of @Silva2020. The spatial metrics (center of gravity,  inertia) were used to calculate the global index of collocation (GIC) to quantify spatial overlap between pairs of species for each cruise.  GICs for species pairs are reported in Table 3 of @Silva2020, but data were not sufficient to calculate GICs for each pair of species in each cruise.
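
For orientation, the sketch below computes the center of gravity, inertia, and GIC using the forms commonly given in the spatial indicator literature, with GIC taken as one minus the squared distance between centers of gravity divided by the sum of that squared distance and the two inertias. It assumes equal areas of influence for all stations and projected coordinates in consistent units. The function names and station data are hypothetical, and the exact equations used for the indicator are those in Table 2 of @Silva2020.

```r
# Center of gravity: mean station location weighted by counts z
center_of_gravity <- function(x, y, z) {
  c(x = sum(x * z) / sum(z), y = sum(y * z) / sum(z))
}

# Inertia: count-weighted mean squared distance from the center of gravity
inertia <- function(x, y, z) {
  cg <- center_of_gravity(x, y, z)
  sum(((x - cg[["x"]])^2 + (y - cg[["y"]])^2) * z) / sum(z)
}

# Global index of collocation for two species sampled at the same stations
gic <- function(x, y, z1, z2) {
  d2 <- sum((center_of_gravity(x, y, z1) - center_of_gravity(x, y, z2))^2)
  denom <- d2 + inertia(x, y, z1) + inertia(x, y, z2)
  if (denom == 0) return(NA_real_)  # both species confined to one shared point
  1 - d2 / denom
}

# Hypothetical station coordinates (km) and counts for one cruise
x <- c(0, 1, 2, 3)
y <- c(0, 0, 1, 1)
sand_lance <- c(10, 5, 1, 0)
humpback   <- c(0, 1, 3, 6)

gic_same <- gic(x, y, sand_lance, sand_lance)  # identical distributions
gic_pair <- gic(x, y, sand_lance, humpback)    # partially overlapping distributions
```

Values near 1 indicate that the two distributions are centered close together relative to their spread, and values near 0 indicate little overlap.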


### Data Processing

Code for processing sand lance data can be found [here](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_sandlance.R).

<!--chapter:end:chapters/sandlance.Rmd-->

# Submerged Aquatic Vegetation {#SAV}

**Description**: Chesapeake Bay Submerged Aquatic Vegetation Trends

**Found in**:  State of the Ecosystem - Mid-Atlantic (2022+) 

**Indicator category**: Database pull with analysis

**Contributor(s)**: David Wilcox, Brooke Landry, Christopher Patrick
  
**Data steward**: David Wilcox <dwilcox@vims.edu>
  
**Point of contact**: David Wilcox <dwilcox@vims.edu>
  
**Public availability statement**: Source data are NOT publicly available. Please email David Wilcox at dwilcox@vims.edu for further information about the submerged aquatic vegetation indicator.

## Methods

### Data Sources
Data for this indicator come from the aerial survey of submerged aquatic vegetation coverage in the Chesapeake Bay: https://www.chesapeakeprogress.com/abundant-life/sav.

### Data Extraction
The data are available as an Excel spreadsheet through the Downloads `Data (.xlsx)` link. The data used are in the “Salinity zone totals” tab, from which the hectares column can be extracted for each salinity zone.

### Data Analysis 
The [analysis and methods](https://d18lev1ok5leia.cloudfront.net/chesapeakeprogress/chart-assets/submerged-aquatic-vegetation-sav-abundance-1984-2019/Analysis-and-Methods_2020-Submerged-Aquatic-Vegetation_Prelim_070621_final.pdf) are described on the Chesapeake Progress page.  

### Data Processing 

Data were formatted for inclusion in the `ecodata` R package using the R code found [here](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_sav.R).

<!--chapter:end:chapters/SAV.Rmd-->

# Seabird diet and productivity - New England {#seabird_ne}

**Description**: Common tern annual diet and productivity at seven Gulf of Maine colonies managed by the National Audubon Society's Seabird Restoration Program

**Indicator category**: Published method

**Found in**: State of the Ecosystem - New England (2019+)

**Contributor(s)**: Don Lyons, Steve Kress, Paula Shannon, Sue Schubel
                
**Data steward**: Don Lyons, <dlyons@audubon.org>
  
**Point of contact**: Don Lyons, <dlyons@audubon.org>
  
**Public availability statement**: Please email dlyons@audubon.org for further information and queries on this indicator source data.


## Methods

**Chick diet**

Common tern (*Sterna hirundo*) chick diet was quantified at each of the seven nesting sites by observing chick provisioning from portable observation blinds. The locations of observation blinds within each site were chosen to maximize the number of visible nests, and provisioning observations took place between mid-June and early August annually. Chick diet was observed during one or two three- to four-hour periods per day, timed according to nest activity levels (most often in the morning hours). Observations began as soon as chicks hatched and continued until the chicks fledged or died. 

Most common tern prey species were identifiable to the species level due to distinct size, color and shape. However, when identification was not possible or was unclear, prey species were listed as "unknown" or "unknown fish". More detailed methods can be found in @hall2000. 

**Nest productivity**

Common tern nest productivity, in terms of the number of fledged chicks per nest, was collected annually from fenced enclosures at island nesting sites (known as "productivity plots"). Newly hatched chicks within these enclosures were weighed, marked or banded, and observed until fledging, death, or until a 15 day period had passed when chicks were assumed to have fledged. Productivity was also quantified from observer blinds for nests outside of the productivity plots where chicks were marked for identification. More detailed methods for quantifying nest productivity can be found in @hall2004.


### Data sources

Common tern diet and nest productivity data were provided by the National Audubon Society's Seabird Restoration Program.

### Data processing

Diet and productivity data were formatted for inclusion in the `ecodata` R package using this R [code](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_seabird_ne.R).


### Data analysis

Raw diet data were used to create time series of mean Shannon diversity through time and across study sites using the `vegan` R package [@R-vegan]. Code for this calculation can be found [here](https://github.com/NOAA-EDAB/tech-doc/blob/master/R/stored_scripts/seabird_ne_div_analysis.R). Diet diversity is presented along with nest productivity (+/- 1 SE).
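
To make the diversity calculation concrete, the base R sketch below computes the Shannon index, H = -sum(p * ln p), for each site and year and then averages across sites within a year. The diet counts, column names, and the order of averaging are assumptions for illustration. `vegan::diversity()` with `index = "shannon"` computes the same index, and the linked script remains the authoritative version of the calculation.

```r
# Shannon diversity from a vector of prey counts (natural log)
shannon_diversity <- function(counts) {
  p <- counts / sum(counts)
  p <- p[p > 0]          # zero counts contribute nothing to the index
  -sum(p * log(p))
}

# Hypothetical prey counts by year, site, and prey type
diet <- data.frame(
  year = c(2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019),
  site = c("A", "A", "B", "B", "A", "A", "B", "B"),
  prey = rep(c("hake", "herring"), times = 4),
  n    = c(10, 10, 5, 15, 20, 0, 8, 8)
)

# Diversity for each site-year combination
by_site <- aggregate(n ~ year + site, data = diet, FUN = shannon_diversity)
names(by_site)[names(by_site) == "n"] <- "shannon"

# Mean diversity across sites for each year
mean_by_year <- aggregate(shannon ~ year, data = by_site, FUN = mean)
mean_by_year
```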

Code used to create the figures below can be found at these links: [diet diversity](https://github.com/NOAA-EDAB/ecodata/blob/master/chunk-scripts/macrofauna.Rmd-tern-diet-diversity.R), [prey frequencies](https://github.com/NOAA-EDAB/ecodata/blob/master/chunk-scripts/macrofauna.Rmd-stacked-bar-prey-freq.R), and [common tern productivity](https://github.com/NOAA-EDAB/ecodata/blob/master/chunk-scripts/macrofauna.Rmd-aggregate-prod.R).

<!--chapter:end:chapters/seabird_ne.Rmd-->

# Seasonal SST Anomalies {#seasonal_sst_anomaly_gridded}

**Description**: Seasonal SST Anomalies

**Indicator category**: Database pull with analysis

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2018+), State of the Ecosystem - Mid-Atlantic (2018+)

**Contributor(s)**: Sean Hardison, Vincent Saba
  
**Data steward**: Kimberly Bastille, <kimberly.bastille@noaa.gov>
  
**Point of contact**: Kimberly Bastille, <kimberly.bastille@noaa.gov>
  
**Public availability statement**: Source data are available [here](https://www.esrl.noaa.gov/psd/data/gridded/data.noaa.oisst.v2.highres.html).


## Methods

### Data sources
Data for seasonal sea surface temperature anomalies (Fig. \@ref(fig:MAB-SST)) were derived from the National Oceanic and Atmospheric Administration optimum interpolation sea surface temperature high resolution data set ([NOAA OISST V2](https://www.esrl.noaa.gov/psd/data/gridded/data.noaa.oisst.v2.highres.html)) provided by NOAA Earth System Research Laboratory's Physical Sciences Division, Boulder, CO. The data extend from 1981 to present, and provide a 0.25&deg; x 0.25&deg; global grid of SST measurements [@Reynolds2007]. 


In 2021, the daily OISST data set was updated; two papers describe the new version and compare it with the previous one [@Huang2021assessment; @Huang2021improvements]. 


### Data extraction 

Individual files containing daily mean SST data for each year from 1981 to present were downloaded from the [NOAA OISST V2 site](https://www.esrl.noaa.gov/psd/data/gridded/data.noaa.oisst.v2.highres.html). Yearly data provided as layered rasters were masked according to the extent of the Northeast US Continental Shelf. Data were split into three-month seasons (Winter = Jan, Feb, Mar; Spring = Apr, May, Jun; Summer = Jul, Aug, Sep; Fall = Oct, Nov, Dec).  

This step runs as a GitHub Action, which is available online in [`ecopull`](https://github.com/kimberly-bastille/ecopull/actions). 

### Data analysis
We calculated the long-term mean (LTM) for each season-specific stack of rasters over the period of 1982-2010, and then subtracted the LTM from daily mean SST values to find the SST anomaly for a given year. The use of climatological reference periods is a standard procedure for the calculation of meteorological anomalies [@WMO2017]. Prior to the 2019 State of the Ecosystem reports, SST anomaly information made use of a 1982-2012 reference period. A 1982-2010 reference period was adopted to facilitate calculating anomalies from a standard [NOAA ESRL](https://www.esrl.noaa.gov/psd/data/gridded/data.noaa.oisst.v2.highres.html) data set.
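
The anomaly calculation can be sketched for a single grid cell as shown below. The daily SST series is simulated and the object names are hypothetical. The indicator itself applies the same arithmetic to raster stacks, using the code linked in this section.

```r
# Simulated daily SST (degrees C) for one grid cell, 1982-2020
set.seed(42)
dates <- seq(as.Date("1982-01-01"), as.Date("2020-12-31"), by = "day")
doy   <- as.numeric(format(dates, "%j"))
sst   <- data.frame(
  date = dates,
  sst  = 12 + 8 * sin(2 * pi * (doy - 120) / 365) + rnorm(length(dates), sd = 0.5)
)

# Assign each day to a three-month season
sst$year   <- as.integer(format(sst$date, "%Y"))
month      <- as.integer(format(sst$date, "%m"))
sst$season <- c("Winter", "Spring", "Summer", "Fall")[(month - 1) %/% 3 + 1]

# Long-term mean (LTM) per season over the 1982-2010 reference period
in_ref <- sst$year >= 1982 & sst$year <= 2010
ltm    <- tapply(sst$sst[in_ref], sst$season[in_ref], mean)

# Daily anomaly = daily SST minus the LTM of the matching season
sst$anomaly <- sst$sst - as.numeric(ltm[sst$season])

# Mean seasonal anomaly for each year
seasonal_anom <- aggregate(anomaly ~ year + season, data = sst, FUN = mean)
head(seasonal_anom)
```

By construction, the anomalies average to zero within each season over the reference period, which is a useful check when changing the reference years.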

R code used to extract and process the [gridded](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_seasonal_oisst_anom_gridded.R) and [timeseries](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_seasonal_oisst_anom.R) data can be found in the `ecodata` package.

<!--chapter:end:chapters/Seasonal_SST_anomaly_maps_indicator.Rmd-->

# Single Species Status Indicator {#stockstatus} {#stock_status}


**Description**: Summary of the most recent stock assessment results for each assessed species.

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2017+), State of the Ecosystem - Mid-Atlantic (2017+)

**Indicator category**: Synthesis of published information (StockSMART)

**Contributor(s)**: Sarah Gaichas, Andy Beet, Jeff Vieser, Chris Legault

**Data steward**: Sarah Gaichas <sarah.gaichas@noaa.gov>

**Point of contact**: Sarah Gaichas <sarah.gaichas@noaa.gov>

**Public availability statement**: All stock assessment results are publicly available (see Data Sources). Summarized data are available [here](http://comet.nefsc.noaa.gov/erddap/tabledap/assess_soe_v1.htmlTable?No,Entity_Name,Science_Center,Assessment_Year,Last_Data_Year,Assessment_Level,Citation,Comments,Best_F,F_Year,Flimit,Fmsy,F_Flimit,F_Fmsy,Best_B,B_Year,B_Blimit,B_Bmsy,Stock_Level_Relative_to_Bmsy,Bmsy,Blim).

## Methods

### Data sources
<!--Please provide a text description of data sources, inlcuding primary collection methods. What equipment was used to turn signal to data? From which vessel were data collected and how? What quality control procedures were employed, if any?--> 

"Data" used for this indicator are the outputs of stock assessment models and review processes, including reference points (proxies for fishing mortality limits and stock biomass targets and limits), and the current fishing mortality rate and biomass of each stock. These metrics are reported to a national repository, [Stock SMART](https://www.st.nmfs.noaa.gov/stocksmart?app=homepage).

Recent stock assessment updates for each species are available on the Northeast Fisheries Science Center (NEFSC) website using the form here: https://apps-nefsc.fisheries.noaa.gov/saw/sasi/sasi_report_options.php

For example, to download the 2020 assessment data, check the box for Year 2020 on the form. Then check each available 2020 species and stock area in turn, and download the .zip of "all files". 

Species with 2020 updates included: Acadian redfish, Atlantic halibut, Atlantic herring, Atlantic Sea Scallop, Atlantic surfclam, Atlantic wolffish, Butterfish, Longfin squid, Ocean Pout, Ocean quahog, Red Hake (2 stocks), Silver hake (2 stocks), Windowpane flounder (2 stocks), Winter flounder (3 stocks). 

These 2020 stock assessment results were compiled as preliminary information by Jeff Vieser, who provided the spreadsheet `NE Stock Assessment Results.xlsx` on 10 December 2020. These results are considered preliminary until uploaded to StockSMART.


### Data extraction

Beginning in 2020 for the 2021 SOE, we used Andy Beet's [stocksmart package](https://github.com/NOAA-EDAB/stocksmart) to extract assessment results from [Stock SMART](https://www.st.nmfs.noaa.gov/stocksmart?app=homepage). 

The code used to work up this data can be found in [`sgaichas/stockstatusindicator`](https://github.com/sgaichas/stockstatusindicator). 

```{r}

library(stocksmart)
```


Two data frames are in the `stocksmart` package, `stockAssessmentData` and `stockAssessmentSummary`.

In `stockAssessmentData` we have time series. Columns are `r names(stockAssessmentData)` and the reported metrics are `r unique(stockAssessmentData$Metric)`. 


```{r}
#library(DT)
DT::datatable(head(stockAssessmentData), rownames = FALSE)

```

In `stockAssessmentSummary` we have assessment metadata. Columns are `r (names(stockAssessmentSummary))`.

```{r}

DT::datatable(head(stockAssessmentSummary), rownames = FALSE, options = list(scrollX = TRUE))

```
In 2021, `stocksmart` was updated with all current assessments, so data extraction was simply:

```{r make-2021assess, eval=FALSE}
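library(dplyr)  # provides %>%, filter(), select(), arrange(), rename()
library(here)   # builds file paths relative to the project root

# Keep NEFSC assessments and the fields needed for the stock status indicator
assess2021 <- stockAssessmentSummary %>%
  filter(`Science Center` == "NEFSC") %>%
  select(c(`Stock Name`, Jurisdiction, FMP, `Science Center`, 
           `Stock Area`, `Assessment Year`, `Last Data Year`,
           `F Year`, `Estimated F`, Flimit, Fmsy, `F/Flimit`, 
           `F/Fmsy`, Ftarget, `F/Ftarget`, `B Year`, `Estimated B`,
           `B Unit`, Blimit, Bmsy, `B/Blimit`, `B/Bmsy`)) %>%
  arrange(Jurisdiction, `Stock Name`, FMP, `Assessment Year`) %>%
  # Rename to the field names used in earlier assess.csv files
  rename(Entity.Name = `Stock Name`,
         Assessment.Year = `Assessment Year`,
         F.Fmsy = `F/Fmsy`,
         B.Bmsy = `B/Bmsy`)
write.csv(assess2021, here("assess.csv"))

# Carry the 2020 decoder forward under the year-neutral file name
decode <- read.csv(here("2020decoder.csv"))
  
write.csv(decode, here("decoder.csv"))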

```

Year-specific naming conventions for assess and decoder files were dropped in 2021 to facilitate future data updates.


In 2020, assessment summary data were extracted from `stockAssessmentSummary` for 2019 and prior records, and the 2020 assessments results were added from the preliminary results provided by Jeff Vieser. 

* The `assess.csv` fields used in previous years were recreated from stockSMART to include necessary metadata:

```{r newassess, echo=TRUE, eval=FALSE}

new2019assess <- stockAssessmentSummary %>%
  filter(`Science Center` == "NEFSC") %>%
  rename(Entity.Name = `Stock Name`) %>%
  rename_all(list(~make.names(.)))

```

* Add 2020 assessments and write `2020assess.csv` data contribution:

```{r addrow-replace, echo=TRUE, eval=FALSE}

prelim2020 <- read.csv(here("NE Stock Assessment Results.csv")) %>%
  filter(Assessment.Year == 2020) %>%
  rename(Entity.Name = Stock,
         FSSI.Stock. = FSSI,
         Estimated.F = Best.F,
         Estimated.B = Best.B,
         Review.Result = Review.Type) %>%
  select(-c(Year, Status.Stock., Record.Status, TimeSeries.Data.,
            Survey.Links., Adequate, Minimum.F, Maximum.F,
            Minimum.B, Maximum.B, 
            Stock.Level.Relative.to.Bmsy:Decision.memo.related.to.inadequate.rebuilding.progress))


update2020assess <- bind_rows(new2019assess, prelim2020)

write.csv(update2020assess, here("2020assess.csv"))

```

The `decoder.csv` data contribution was updated in December 2020 to retain only Entity.Name, Council, and Code fields (used by `get_stocks`):

```{r newdecoder, echo=TRUE, eval=FALSE}

newdecoder <- read.csv(here("2019decoder.csv")) %>%
  select(Entity.Name, Code, Council)

write.csv(newdecoder, here("2020decoder.csv"))

```


For the 2017-2020 SOEs, each assessment document was searched to find the following information (often but not always summarized under a term of reference to determine stock status in the executive summary), and the spreadsheets were updated by hand:

*    **Bcur**: current year biomass (most often spawning stock biomass (SSB), or whatever units the reference points are in)

*    **Fcur**: current year fishing mortality, F

*    **Bref**: biomass reference point, a proxy of Bmsy (the target)

*    **Fref**: fishing mortality reference point, a proxy of Fmsy

### Data processing

R code used to process the stock status data set for inclusion in the `ecodata` R package can be found [here](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_stocks.R).

### Data analysis
<!--Text description of analysis methods, similar in structure and detail to a peer-reviewed paper methods section-->

For each assessed species, Bcur is divided by Bref and Fcur is divided by Fref. They are then plotted for each species on an x-y plot, with Bcur/Bref on the x axis, and Fcur/Fref on the y axis. 
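
A minimal sketch of this calculation is shown below. The stock names and values are invented, and the dashed reference lines at a ratio of 1 are an illustrative addition rather than a description of the published figure.

```r
# Hypothetical assessment results for three stocks
stocks <- data.frame(
  stock = c("Stock A", "Stock B", "Stock C"),
  Bcur  = c(1200, 450, 800),    # current biomass
  Bref  = c(1000, 900, 800),    # biomass reference point (Bmsy proxy)
  Fcur  = c(0.15, 0.40, 0.20),  # current fishing mortality
  Fref  = c(0.25, 0.30, 0.20)   # fishing mortality reference point (Fmsy proxy)
)

# Ratios of current values to their reference points
stocks$B_ratio <- stocks$Bcur / stocks$Bref
stocks$F_ratio <- stocks$Fcur / stocks$Fref

# x-y plot with biomass status on the x axis and fishing status on the y axis
plot(stocks$B_ratio, stocks$F_ratio,
     xlab = "Bcur / Bref", ylab = "Fcur / Fref", pch = 19)
abline(h = 1, v = 1, lty = 2)  # ratio of 1 = current value equals reference point
text(stocks$B_ratio, stocks$F_ratio, labels = stocks$stock, pos = 3)
```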

<!--What packages or libraries did you use in your work flow?-->


<!--Include accompanying R code, pseudocode, flow of scripts, and/or link to location of code used in analyses.-->

<!--chapter:end:chapters/singlespp_status_indicator.Rmd-->

# Slopewater proportions {#slopewater}

**Description**: Percent total of water type observed in the deep Northeast Channel (150-200 m water depth).

**Indicator category**: Published methods

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2019+)

**Contributors**: Paula Fratantoni, paula.fratantoni@noaa.gov; David Mountain, NOAA Fisheries, retired.

**Data steward**: Kimberly Bastille, kimberly.bastille@noaa.gov

**Point of contact**: Paula Fratantoni, paula.fratantoni@noaa.gov

**Public availability statement**: Source data are publicly available at ftp://ftp.nefsc.noaa.gov/pub/hydro/matlab_files/yearly and in the World Ocean Database housed at  http://www.nodc.noaa.gov/OC5/SELECT/dbsearch/dbsearch.html under institute code 258

## Methods

### Data sources

The slope water composition index incorporates temperature and salinity measurements collected on Northeast Fisheries Science Center surveys between 1977-present within the geographic confines of the Northeast Channel in the Gulf of Maine.  Early measurements were made using water samples collected primarily with Niskin bottles at discrete depths, mechanical bathythermographs and expendable bathythermograph probes, but by 1991 the CTD (an acronym for conductivity, temperature, and depth) became standard equipment on all NEFSC surveys.  

### Data extraction

While all processed hydrographic data are archived in an Oracle database (OCDBS), we work from Matlab-formatted files stored locally. 

### Data analysis

Temperature and salinity measurements are examined to assess the composition of the waters entering the Gulf of Maine through the Northeast Channel.  The analysis closely follows the methodology described by @mountain2012.   This method assumes that the waters flowing into the Northeast Channel between 150 and 200 meters depth are composed of slope waters, originating offshore of the continental shelf, and shelf waters, originating on the continental shelf south of Nova Scotia. 

For each survey in the hydrographic archive, ocean temperature and salinity observations sampled in the area just inside the Northeast Channel (bounded by 42.2-42.6`r ifelse(knitr::is_latex_output(),"\\textdegree" ,'&deg;')` latitude north and 66-66.8`r ifelse(knitr::is_latex_output(),"\\textdegree" ,'&deg;')` longitude west) and between 150 - 200 meters depth are extracted and a volume-weighted average temperature and salinity is calculated.    The volume weighting is accomplished by apportioning the area within the Northeast Channel polygon among the stations occupying the region, based on inverse distance squared weighting.  The result of this calculation is a timeseries of volume-average temperature and salinity having a temporal resolution that matches the survey frequency in the database.  
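
One way to picture the weighting step is sketched below. This is an illustration under stated assumptions and not the operational code: the station values, the regular grid, the grid spacing, and the use of distances in degrees rather than kilometers are all hypothetical choices made for brevity.

```r
# Hypothetical stations inside the Northeast Channel box, with 150-200 m layer means
stations <- data.frame(
  lon  = c(-66.7, -66.4, -66.1),
  lat  = c(42.3, 42.5, 42.4),
  temp = c(7.0, 8.0, 9.0),
  sal  = c(34.2, 34.6, 34.9)
)

# Regular grid over the bounding box given in the text (spacing is an assumption)
grid <- expand.grid(
  lon = seq(-66.8, -66.0, by = 0.05),
  lat = seq(42.2, 42.6, by = 0.05)
)

# Share of the gridded area apportioned to each station by inverse distance squared
idw_weights <- function(grid, stations, eps = 1e-12) {
  w <- lapply(seq_len(nrow(stations)), function(j) {
    d2 <- (grid$lon - stations$lon[j])^2 + (grid$lat - stations$lat[j])^2
    1 / (d2 + eps)  # eps avoids division by zero when a cell sits on a station
  })
  w <- matrix(unlist(w), nrow = nrow(grid))
  w <- w / rowSums(w)  # weights within each grid cell sum to 1
  colMeans(w)          # average over cells gives each station's share
}

w <- idw_weights(grid, stations)
avg_temp <- sum(w * stations$temp)
avg_sal  <- sum(w * stations$sal)
```

Repeating this for every survey would give one weighted temperature and salinity pair per survey, which is the form of the time series described above.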

The average temperature and salinity observed at depth in the Northeast Channel is assumed to be the product of mixing between three distinct sources having the following temperature and salinity characteristics: (1) Warm Slope Water (T=10 `r ifelse(knitr::is_latex_output(),"\\textdegree " ,'&deg;')`C, S=35), (2) Labrador Slope Water (T=6 `r ifelse(knitr::is_latex_output(),"\\textdegree " ,'&deg;')`C, S=34.7) and (3) Scotian Shelf Water (T=2 `r ifelse(knitr::is_latex_output(),"\\textdegree " ,'&deg;')`C, S=32).  As described by @mountain2012, the relative proportion of each source is determined via a rudimentary 3-point mixing algorithm.  On a temperature-salinity diagram, lines connecting the T-S coordinates for these three sources form a triangle, the sides of which represent mixing lines between the sources. A water sample that is a mixture of two sources will have a temperature and salinity that falls somewhere along the line connecting the two sources on the temperature-salinity diagram.  Observations of temperature and salinity collected within the Northeast Channel would be expected to fall within the triangle if the water sampled is a mixture of the three sources. Simple geometry allows us to calculate the relative proportion of each source in a given measurement.  As an example, a line drawn from the T-S point representing shelf water through an observed T-S in the center of the triangle will intersect the opposite side of the triangle (the mixing line connecting the coordinates of the two slope water sources).  This intersecting T-S value may then be used to calculate the relative proportions (percentage) of the two slope water sources.  Using this method, the percentage of Labrador slope water and Warm slope water are determined for the timeseries of volume-average temperature and salinity.
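
The triangle construction is equivalent to solving a small linear system: the three source fractions must sum to one and must reproduce the observed temperature and salinity. The sketch below uses the end-member values listed above; the function name, the abbreviations `WSW`, `LSW` and `SSW`, and the example observation are illustrative and are not taken from the analysis code.

```r
# End-member properties as listed in the text
sources <- data.frame(
  name = c("WSW", "LSW", "SSW"),  # Warm Slope, Labrador Slope, Scotian Shelf
  temp = c(10, 6, 2),
  sal  = c(35, 34.7, 32)
)

# Solve f1 + f2 + f3 = 1, sum(f * temp) = T_obs, sum(f * sal) = S_obs
mix_fractions <- function(temp, sal, src = sources) {
  A <- rbind(src$temp, src$sal, rep(1, 3))
  f <- solve(A, c(temp, sal, 1))
  names(f) <- as.character(src$name)
  f
}

# Hypothetical observation at the centroid of the mixing triangle
f <- mix_fractions(temp = 6, sal = 33.9)

# Relative share of the two slope water sources, which corresponds to the
# point where the line from the shelf water vertex meets the slope water side
slope_share <- f[c("WSW", "LSW")] / sum(f[c("WSW", "LSW")])
```

A negative fraction indicates an observation outside the triangle, which is one symptom of the limitations discussed in the next paragraph.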

It should be noted that our method assumes that the temperature and salinity properties associated with the source water masses are constant.  In reality, these may vary from year to year, modified by atmospheric forcing, mixing and/or advective processes.  Likewise, other sources are periodically introduced into the Northeast Channel, including intrusions of Gulf Stream water flowing into the Gulf of Maine and modified shelf water flowing out of the Gulf of Maine along the flank of Georges Bank.  These sources are not explicitly considered in the 3-point mixing algorithm and may introduce errors in the proportional estimates.  Code used to calculate slopewater proportions can be found [here](https://github.com/NOAA-EDAB/tech-doc/blob/master/R/stored_scripts/slopewater_analysis.R).


### Data processing

Source data were formatted for inclusion in the `ecodata` R package using the R code found [here](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_slopewater.R).

<!--chapter:end:chapters/slopewater_proportions.Rmd-->

# Species Density Estimates

**Description**: Current and Historical Species Distributions

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2017, 2018), State of the Ecosystem - Mid-Atlantic (2017, 2018)

**Indicator category**: Database pull; Database pull with analysis

**Contributor**: Kevin Friedland

**Data steward**: Kevin Friedland

**Point of contact**: Kevin Friedland, kevin.friedland@noaa.gov

**Public availability statement**: Source data are publicly available.


## Methods
We used kernel density plots to depict shifts in species' distributions over time. These figures characterize the probability of a species occurring in a given area based on Northeast Fisheries Science Center (NEFSC) Bottom Trawl Survey data. Kernel density estimates (KDEs) of distributions are shown for the period of 1970-1979 (shaded blue) and the most recent three years of survey data (shaded red) (e.g. Figure \@ref(fig:kde-fig)). Results are typically visualized for spring and fall bottom trawl surveys separately. 

Three probability levels (25%, 50%, 75%) are shown for each time period, where the 25% region depicts the core area of the distribution and the 75% region shows the area occupied more broadly by the species. A wide array of KDEs for many ecologically and economically important species on the Northeast US Continental Shelf are available [here](https://www.nefsc.noaa.gov/ecosys/current-conditions/kernel-density.html).
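
The probability regions can be illustrated with a generic two-dimensional kernel density estimate. The sketch below is not the code used for the published figures: it assumes unweighted, simulated tow positions and uses `kde2d` from the `MASS` package, and the helper `hdr_level` is a hypothetical name.

```r
library(MASS)  # provides kde2d()

# Simulated positions standing in for tows where a species was caught
set.seed(1)
lon <- rnorm(500, mean = -70, sd = 1)
lat <- rnorm(500, mean = 41, sd = 0.5)

dens <- kde2d(lon, lat, n = 100)

# Density value whose enclosed region holds a given share of total probability
hdr_level <- function(z, prob) {
  zs  <- sort(as.vector(z), decreasing = TRUE)
  cum <- cumsum(zs) / sum(zs)
  zs[which(cum >= prob)[1]]
}

lev <- sapply(c(0.25, 0.50, 0.75), function(p) hdr_level(dens$z, p))

# The 25% contour encloses the core; the 75% contour encloses the broader range
contour(dens$x, dens$y, dens$z, levels = rev(lev),
        labels = c("75%", "50%", "25%"),
        xlab = "Longitude", ylab = "Latitude")
```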

### Data sources
Current and historical species distributions are based on the NEFSC Bottom Trawl Survey data (aka ["Survdat"](#survdat)) and depth strata. Strata are available as shapefiles that can be downloaded [here](https://github.com/NOAA-EDAB/tech-doc/tree/master/gis) (listed as "strata.shp"). 

### Data analysis

Code used for species density analysis can be found [here](https://github.com/NOAA-EDAB/tech-doc/blob/master/R/stored_scripts/species_density_analysis.R). 
```{r , code = readLines("https://raw.githubusercontent.com/NOAA-EDAB/tech-doc/master/R/stored_scripts/species_density_analysis.R"), eval=F, echo=F}

```

<!--chapter:end:chapters/Species_density_estimates.Rmd-->

# Species Distribution Indicators {#species_dist}

**Description**: Species mean depth, along-shelf distance, and distance to coastline

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2017+), State of the Ecosystem - Mid-Atlantic (2017+)

**Indicator category**: Extensive analysis; not yet published

**Contributor(s)**: Kevin Friedland
  
**Data steward**: Kevin Friedland, <kevin.friedland@noaa.gov>
  
**Point of contact**: Kevin Friedland, <kevin.friedland@noaa.gov>
  
**Public availability statement**: Source data are available upon request (read more [here](https://inport.nmfs.noaa.gov/inport/item/22560)). Derived data may be downloaded [here](https://comet.nefsc.noaa.gov/erddap/tabledap/SOE_habitat_soe_v1.html).


## Methods
Three metrics quantifying spatial-temporal distribution shifts within fish populations were developed by @Friedland2018, including mean depth, along-shelf distance, and distance to coastline. Along-shelf distance is a metric for quantifying the distribution of a species through time along the axis of the US Northeast Continental Shelf, which extends northeastward from the Outer Banks of North Carolina. Values in the derived time series correspond to mean distance in km from the southwest origin of the along-shelf axis at 0 km. The along-shelf axis begins at 76.53&deg;W 34.60&deg;N and terminates at 65.71&deg;W 43.49&deg;N. 

Once mean distance is found, depth of occurrence and distance to coastline can be calculated for each species' positional center. Analyses present in the State of the Ecosystem (SOE) reports include mean depth and along-shelf distance for Atlantic cod, sea scallop, summer flounder, and black sea bass. 
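
To make the along-shelf metric concrete, the sketch below projects positions onto the axis defined by the two endpoints given above and takes a biomass-weighted mean. It is a simplified illustration and not the published method: the flat-earth (equirectangular) approximation, the biomass weighting, the function names, and the example tows are all assumptions.

```r
# Axis endpoints from the text (longitude, latitude in decimal degrees)
axis_start <- c(lon = -76.53, lat = 34.60)
axis_end   <- c(lon = -65.71, lat = 43.49)

# Convert positions to km east/north of the axis origin (flat-earth approximation)
to_km <- function(lon, lat) {
  km_per_deg <- 6371 * pi / 180
  lat0 <- mean(c(axis_start[["lat"]], axis_end[["lat"]])) * pi / 180
  cbind(x = (lon - axis_start[["lon"]]) * km_per_deg * cos(lat0),
        y = (lat - axis_start[["lat"]]) * km_per_deg)
}

# Distance along the axis: scalar projection onto the axis direction
along_shelf_km <- function(lon, lat) {
  end_km <- to_km(axis_end[["lon"]], axis_end[["lat"]])
  unit   <- end_km / sqrt(sum(end_km^2))
  as.vector(to_km(lon, lat) %*% t(unit))
}

# Hypothetical tows for one species in one year
tows <- data.frame(
  lon     = c(-74.0, -72.5, -70.0),
  lat     = c(38.5, 40.0, 41.5),
  biomass = c(2, 5, 3)
)
mean_distance <- weighted.mean(along_shelf_km(tows$lon, tows$lat), w = tows$biomass)
```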


### Data sources
Data for these indicators were derived from fishery-independent bottom trawl survey data collected by the Northeast Fisheries Science Center (NEFSC). 


<!-- ### Data extraction  -->


### Data analysis

Species distribution indicators were derived using the R code found [here](https://github.com/NOAA-EDAB/tech-doc/blob/master/R/stored_scripts/species_distribution_analysis.R). 


### Data processing

Distribution indicators were further formatted for inclusion in the `ecodata` R package using the R code found [here](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_species_dist.R).

<!--chapter:end:chapters/Species_dist_indicators.Rmd-->

# Stomach fullness

**Description**: Stomach Fullness

**Found in**: State of the Ecosystem - Mid-Atlantic (2020), State of the Ecosystem - New England (2020)

**Indicator category**: Database pull with analysis

**Contributor(s)**: Laurel Smith
  
**Data steward**: Kimberly Bastille <kimberly.bastille@noaa.gov>
  
**Point of contact**: Kimberly Bastille <kimberly.bastille@noaa.gov>
  
**Public availability statement**: NEFSC survey data used in these analyses are available upon request (see [Food Habits Database (FHDBS)](https://inport.nmfs.noaa.gov/inport) for access procedures). Derived stomach fullness data are available.


## Methods
An index of stomach fullness was calculated from NEFSC autumn bottom trawl food habits data, as a simple ratio of estimated stomach content weight to total weight of an individual fish. Stomach fullness may be a better measure than absolute stomach weight if combining across species into a feeding guild, to prevent larger animals with heavier stomachs from dominating the index. An average stomach fullness was calculated annually for each species and Ecological Production Unit (EPU).
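
The ratio and the annual averaging can be sketched as follows. The data frame, its column names, and the weights are hypothetical and do not reflect the structure of the food habits database.

```r
# Hypothetical individual fish records (weights in grams)
stomachs <- data.frame(
  year       = c(2018, 2018, 2018, 2019, 2019),
  epu        = "GB",
  species    = "Atlantic cod",
  stomach_wt = c(10, 30, 20, 5, 15),
  fish_wt    = c(1000, 1500, 2000, 500, 1000)
)

# Stomach fullness: stomach content weight relative to total fish weight
stomachs$fullness <- stomachs$stomach_wt / stomachs$fish_wt

# Average fullness by year, EPU and species
annual <- aggregate(fullness ~ year + epu + species, data = stomachs, FUN = mean)
```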

### Data sources
Stomach contents weights and individual fish weights (both to the nearest gram) were collected on the NEFSC bottom trawl surveys from 1992-present aboard RVs Albatross IV, Delaware II and the Henry B. Bigelow (see [Food Habits Database (FHDBS)](https://inport.nmfs.noaa.gov/inport) for access procedures).

### Data extraction 
NEFSC food habits data summarized in the R data file allfh.RData were obtained from Brian Smith (Brian.Smith@noaa.gov) for this index.

### Data analysis
The stomach fullness index was calculated using the R script found [here](https://github.com/Laurels1/StomachFullness). 

### Data processing
Fish stomach fullness index was formatted for inclusion in the `ecodata` R package using this [R code](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_stom_fullness.R). Stomach fullness was expressed as an annual anomaly for each species in each region.

<!--chapter:end:chapters/stom_fullness.Rmd-->

# Storminess Indicator

**Description**: Long-term trends in storminess based on wind speed and wave height exceeding specific extreme thresholds that are related to the effect on fisheries and fishing behavior.

**Indicator category**: Database pull with analysis

**Found in**: 

**Contributor(s)**: Art DeGaetano (Cornell, Northeast Climate Center), Gabe Larouche (Cornell, Northeast Climate Center), Kimberly Hyde (NEFSC), Ellen Mecray (NOAA/NESDIS/NCEI)
  
**Data steward**:  Art DeGaetano <Arthur.T.DeGaetano@noaa.gov>  
  
**Point of contact**:  Art DeGaetano <Arthur.T.DeGaetano@noaa.gov>  

  
**Public availability statement**:  Source data are freely available to the public (see Data Sources).


## Methods

### Data sources
Data come from the European Centre for Medium-Range Weather Forecasts atmospheric reanalysis version 5 (ERA5), accessed via the [Copernicus Climate Change Service](https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels?tab=form). The 3-hour data cover the Earth on a 30 km grid and are freely available to the public.

### Data Extraction
The following variables were extracted from the input data: 

 + 10m_u_component_of_wind
 + 10m_v_component_of_wind
 + Mean_sea_level_pressure
 + Mean_wave_period
 + significant_height_of_combined_wind_waves_and_swell

Extractions were limited to the region bounded by 80°W, 50°N, 60°W, and 20°N.  

The extraction code (`fetch_data.py`) is available at: https://github.com/nrcc-cornell/regional-swh  

Data were subset into five regions: southern Mid-Atlantic Bight, northern Mid-Atlantic Bight, Georges Bank, western Gulf of Maine, and eastern Gulf of Maine.

```{r, echo = F, fig.align="center", fig.cap=""}

knitr::include_graphics(c(file.path(image.dir, "NEUS_regions.png")))

```


### Data Processing 

Code for processing the wind and wave data (`process_data.py`) can be found at: 
https://github.com/nrcc-cornell/regional-swh 

The wind index is defined by four thresholds set at the beginning of the processing code: `gale_thres` = 34 kts, `temporal_thres` = 3 hours, `intervene` = 96 hours, and `st` = 0.25.  With these settings, the index defines storminess events as wind speeds ≥34 kts that persist for at least 3 hours, are separated from previous events by at least 96 hours, and occur at more than 25% of the 30 km grid points within a region.
For wave height data, the same thresholds are used, except that `gale_thres` is replaced with `wave_t` = 5, which restricts the index to events with wave heights >5 m.  


The data were analyzed at their base 1-hour temporal and 30 km spatial resolutions.  At each gridpoint falling within a region (e.g. southern Mid-Atlantic bight) the raw data were screened to identify winds exceeding the gale threshold. Then each point was further analyzed to determine if at least 3 consecutive hours exceeded the threshold.  If more than 25% of the grid points within the region met these criteria, an event was indicated and the annual event tally for the region was increased by one, provided it was separated from a previous event by >96 hours. 
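
The screening logic can be sketched in R as below. The operational code is the Python script referenced above; this is an independent illustration whose function name, input layout (one row per hour, one column per grid point), and simulated wind speeds are assumptions. Argument names mirror the thresholds listed above.

```r
# Count storm events for one region from wind speeds in knots.
# `speed`: matrix with one row per hour and one column per grid point.
count_events <- function(speed, gale_thres = 34, temporal_thres = 3,
                         intervene = 96, st = 0.25) {
  exceed <- speed >= gale_thres

  # At each grid point, keep only hours that belong to a run of at least
  # `temporal_thres` consecutive exceedances
  in_run <- apply(exceed, 2, function(x) {
    r <- rle(x)
    rep(r$values & r$lengths >= temporal_thres, r$lengths)
  })
  in_run <- matrix(in_run, nrow = nrow(exceed))

  # Hours when more than a fraction `st` of grid points meet the criteria
  event_hours <- which(rowMeans(in_run) > st)
  if (length(event_hours) == 0) return(0L)

  # A new event starts only after a gap of more than `intervene` hours
  sum(diff(event_hours) > intervene) + 1L
}

# Simulated region with four grid points and 300 hours of calm background
speed <- matrix(10, nrow = 300, ncol = 4)
speed[10:13, 1:2]   <- 40  # event 1
speed[50:52, 1:2]   <- 40  # within 96 hours of event 1, so not a new event
speed[150:155, 3]   <- 40  # only 25% of grid points, so not an event
speed[200:203, 1:2] <- 40  # event 2
speed[280:281, ]    <- 40  # only 2 consecutive hours, so not an event

n_events <- count_events(speed)  # 2
```

For wave heights, the same function could be called with significant wave height as input and the 5 m threshold passed as `gale_thres`, noting that this sketch uses ≥ rather than a strict inequality.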

Code used to process storminess data for inclusion in `ecodata` can be found on GitHub - [NOAA-EDAB/ecodata](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_storminess.R).

<!--chapter:end:chapters/storminess.Rmd-->

# Survey Data {#survdat}

**Description**: Survdat (Survey database)

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2017+), State of the Ecosystem - Mid-Atlantic (2017+)

**Indicator category**: Database pull

**Contributor(s)**: Sean Lucey

**Data steward**: Sean Lucey <sean.lucey@noaa.gov>

**Point of contact**: Sean Lucey <sean.lucey@noaa.gov>

**Public availability statement**: Source data are available to qualified researchers upon request (see "Access Information" [here](https://inport.nmfs.noaa.gov/inport/item/22560)). Derived data used in SOE reports are available [here](https://comet.nefsc.noaa.gov/erddap/tabledap/group_landings_soe_v1.html).

No surveys in 2020.

## Methods
The Northeast Fisheries Science Center (NEFSC) has been conducting standardized bottom trawl surveys
in the fall since 1963 and spring since 1968.  The surveys follow a stratified random design.  Fish
species and several invertebrate species are enumerated on a tow by tow basis [@Azarovitz1981].  
The data are housed in the NEFSC's survey database (SVDBS) maintained by the Ecosystem Survey Branch.  

Direct pulls from the database are not advisable as there have been several gear modifications and
vessel changes over the course of the time series [@Miller_2010].  Survdat was developed as a database 
query that applies the appropriate calibration factors for a seamless time series since the 1960s.
As such, it is the base for many of the other analyses conducted for the State of the Ecosystem
report that involve fisheries independent data.

The Survdat script can be broken down into two sections.  The first pulls the raw data from SVDBS.
While the script is able to pull data from more than just the spring and fall bottom trawl surveys,
for the purposes of the State of the Ecosystem reports only the spring and fall data are used.
Survdat identifies those research cruises associated with the seasonal bottom trawl surveys and pulls
the station and biological data.  Station data includes tow identification (cruise, station, 
and stratum), tow location and date, as well as several environmental variables (depth, surface/bottom salinity, 
and surface/bottom temperature).  Stations are filtered for representativeness using a station, haul, gear
(SHG) code for tows prior to 2009 and a tow, operations, gear, and acquisition (TOGA) code from 2009
onward.  The codes that correspond to a representative tow (SHG <= 136 or TOGA <= 1324) are the same
used by assessment biologists at the NEFSC.  Biological data includes the total biomass and abundance
by species, as well as lengths and number at length.
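
The representativeness filter can be illustrated as follows. The data frame and its column names are hypothetical, and the codes are treated as plain numbers for simplicity; the Survdat script itself should be consulted for the exact implementation.

```r
# Hypothetical tows with quality codes; names and values are illustrative only
tows <- data.frame(
  year = c(2005, 2005, 2012, 2012),
  shg  = c(136, 237, NA, NA),    # station, haul, gear code (tows before 2009)
  toga = c(NA, NA, 1324, 1734)   # tow, operations, gear, acquisition code (2009 onward)
)

# Apply the SHG rule before 2009 and the TOGA rule from 2009 onward
representative <- ifelse(tows$year < 2009, tows$shg <= 136, tows$toga <= 1324)

good_tows <- tows[representative, ]
```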

The second section of the Survdat script applies the calibration factors.  There are four calibration
factors applied (Table \@ref(tab:calibration)).  Calibration factors are pulled directly from SVDBS.  Vessel conversions were made from 
either the NOAA Ship *Delaware II* or NOAA Ship *Henry Bigelow* to the NOAA Ship *Albatross IV*, which was 
the primary vessel for most of the time series.  The Albatross was decommissioned in 2009 and the Bigelow is 
now the primary vessel for the bottom trawl survey.

```{r calibration, eval = T, echo = F}
cal.factors <- data.frame(Name = c('Door Conversion', 'Net Conversion', 'Vessel Conversion I', 'Vessel Conversion II'),
                          Code = c('DCF', 'GCF', 'VCF', 'BCF'),
                          Applied = c('<1985', '1973 - 1981 (Spring)', 'Delaware II records', 'Henry Bigelow records'))
kable(cal.factors, booktabs = TRUE,
      caption = "Calibration factors for NEFSC trawl survey data")
```

The output from Survdat is an RData file that contains all the station and biological data, corrected
as noted above, from the NEFSC Spring Bottom Trawl Survey and NEFSC Fall Bottom Trawl Survey.  The RData
file is a data.table, a powerful wrapper for the base data.frame (https://cran.r-project.org/web/packages/data.table/data.table.pdf).
There are also a series of tools that have been developed in order to utilize the Survdat data set
(https://github.com/NOAA-EDAB/survdat).

### Data sources
Survdat is a database query of the NEFSC survey database (SVDBS). These data are available to qualified researchers upon request. More information on the data request process is available under the "Access Information" field [here](https://inport.nmfs.noaa.gov/inport/item/22560).

### Data extraction 
Extraction methods are described above. The R code found [here](https://noaa-edab.github.io/survdat/) was used in the survey data extraction process.


### Data analysis
The fisheries independent data contained within Survdat are used in a variety of
products; the more complicated analyses are detailed in their own sections.  The most straightforward use of these data is for the resource species aggregate biomass 
indicators.  For the purposes of the aggregate biomass indicators, fall and spring 
survey data are treated separately.  Additionally, all length data are dropped and 
species separated by sex at the catch level are merged back together.

Since 2020, survey strata were characterized as being within an [Ecological Production Unit](#epu) based on where at least 50% of the area of the strata was located (Figure \@ref(fig:epustrata)).  While this does not create a perfect match for the EPU boundaries, it allows us to calculate the variance associated with the index as the survey was designed. 


```{r epustrata, fig.cap="Map of the Northeast Shelf broken into the four Ecological Production Units by strata. Strata were assigned to an EPU based on which one contained at least 50% of the area of the strata." , out.width="90%",echo = F}

knitr::include_graphics(file.path(image.dir,"EPU_Designations_Map.jpg"))

```
  

Prior to 2020, Survdat was first post stratified into EPUs by labeling stations by the EPU they fell within using the `over` function from the `rgdal` R package [@rgdal].  Next, the total number of stations within each EPU per year is counted using unique station records. Biomass is summed by species per year per EPU.  Those sums are divided by the appropriate station count to get the EPU mean.  Finally, the mean biomasses are summed by [aggregate groups](#aggroups). These steps are encompassed in the [processing code](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_agg_bio.R), which also includes steps taken to format the data set for inclusion in the `ecodata` R package.
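
The averaging steps in the pre-2020 approach can be sketched with base R. This is an illustration only: the data frame, its column names, the group assignments, and the biomass values are hypothetical, and the linked processing code remains the reference implementation.

```r
# Hypothetical catch records already labeled with an EPU and aggregate group
catch <- data.frame(
  year    = 2019,
  epu     = "GB",
  station = c(1, 2, 1, 3),
  species = c("Atlantic cod", "Atlantic cod", "Spiny dogfish", "Haddock"),
  group   = c("Piscivore", "Piscivore", "Piscivore", "Benthivore"),
  biomass = c(6, 3, 12, 9)
)

# 1. Count unique stations per year and EPU
stations <- unique(catch[, c("year", "epu", "station")])
n_sta <- aggregate(station ~ year + epu, data = stations, FUN = length)
names(n_sta)[names(n_sta) == "station"] <- "n_stations"

# 2. Sum biomass by species, year and EPU
sp_sum <- aggregate(biomass ~ year + epu + species + group, data = catch, FUN = sum)

# 3. Divide by the station count, so stations without the species count as zeros
sp_mean <- merge(sp_sum, n_sta)
sp_mean$mean_biomass <- sp_mean$biomass / sp_mean$n_stations

# 4. Sum the species means within each aggregate group
agg <- aggregate(mean_biomass ~ year + epu + group, data = sp_mean, FUN = sum)
```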

<!--chapter:end:chapters/survey_data.rmd-->

# Thermal Habitat Projections

**Description**: Species Thermal Habitat Projections

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2018), State of the Ecosystem - Mid-Atlantic (2018)

**Indicator category**: Published methods

**Contributor(s)**: Vincent Saba
  
**Data steward**: Vincent Saba, <vincent.saba@noaa.gov>
  
**Point of contact**: Vincent Saba, <vincent.saba@noaa.gov>
  
**Public availability statement**: Source data are available to the public. Model outputs for thermal habitat projections are available [here](https://comet.nefsc.noaa.gov/erddap/info/index.html?page=1&itemsPerPage=1000).


## Methods

This indicator is based on work reported in @Kleisner2017.

### Data sources

#### Global Climate Model Projection

We used the [National Oceanic and Atmospheric Administration's Geophysical Fluid Dynamics Laboratory (NOAA GFDL) CM2.6 simulation](https://www.gfdl.noaa.gov/high-resolution-climate-modeling/), consisting of (1) an 1860 pre-industrial control, which brings the climate system into near-equilibrium with 1860 greenhouse gas concentrations, and (2) a transient climate response (2xCO2) simulation where atmospheric CO2 is increased by 1% per year, which results in a doubling of CO2 after 70 years. The climate change response from CM2.6 was based on the difference between these two experimental runs. Refer to @Saba2016 for further details. 

#### Modeling Changes in Suitable Thermal Habitat

The NOAA Northeast Fisheries Science Center, U.S. Northeast Shelf (NES) bottom trawl survey, which has been conducted for almost 50 years in the spring and fall, provides a rich source of data on historical and current marine species distribution, abundance, and habitat, as well as oceanographic conditions [@Azarovitz1981]. The survey was implemented to meet several objectives: (1) monitor trends in abundance, biomass, and recruitment, (2) monitor the geographic distribution of species, (3) monitor ecosystem changes, (4) monitor changes in life history traits (e.g., trends in growth, longevity, mortality, and maturation, and food habits), and (5) collect baseline oceanographic and environmental data. These data can be leveraged for exploring future changes in the patterns of abundance and distribution of species in the region. 

### Data analysis

#### Global Climate Model Projection

The CM2.6 80-year projections can be roughly assigned to a time period by using the Intergovernmental Panel on Climate Change (IPCC) Representative Concentration Pathways (RCPs), which describe four different 21st century pathways of anthropogenic greenhouse gas emissions, air pollutant emissions, and land use [@IPCC2014]. The four RCPs comprise a stringent mitigation scenario (RCP2.6), two intermediate scenarios (RCP4.5 and RCP6.0), and one scenario with very high greenhouse gas emissions (RCP8.5). For RCP8.5, the global average temperature at the surface warms by 2 °C by approximately 2060-2070 relative to the 1986-2005 climatology (see Figure SPM.7a in [IPCC, 2013](https://www.ipcc.ch/pdf/assessment-report/ar5/wg1/WG1AR5_SPM_FINAL.pdf)). For CM2.6, the global average temperature warms by 2 °C by approximately years 60-80 (see Fig. 1 in @winton_has_2014). Therefore, the last 20 years of the transient climate response simulation roughly correspond to 2060-2080 of the RCP8.5 scenario. 

Here, the monthly differences in surface and bottom temperatures ('deltas') for spring (February-April) and fall (September-November) are added to an average annual temperature climatology for spring and fall, respectively, derived from observed surface and bottom temperatures to produce an 80-year time series of future bottom and surface temperatures in both seasons. The observed temperatures come from the NEFSC spring and fall bottom trawl surveys conducted from 1968 to 2013 and represent approximately 30,000 observations over the time series. 
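
A minimal sketch of this delta approach for a single location is shown below. The model temperatures, the warming rate, and the observed climatology values are all invented for illustration, and the seasonal averaging of monthly deltas is an assumption about how the months are combined.

```r
n_years <- 80

# Hypothetical model bottom temperatures (rows = years, columns = months)
control   <- matrix(8, nrow = n_years, ncol = 12)
warming   <- seq(0, 3, length.out = n_years)  # invented linear warming
transient <- control + warming                # adds warming[i] to every month of year i

# Monthly deltas: transient run minus control run
delta <- transient - control

# Seasonal deltas: spring = February-April, fall = September-November
spring_delta <- rowMeans(delta[, 2:4])
fall_delta   <- rowMeans(delta[, 9:11])

# Add the deltas to a hypothetical observed seasonal climatology (deg C)
obs_clim_spring <- 6.5
obs_clim_fall   <- 11.0
future_spring <- obs_clim_spring + spring_delta
future_fall   <- obs_clim_fall + fall_delta
```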


#### Modeling Changes in Suitable Thermal Habitat

We modeled individual species thermal habitat across the whole U.S. NES and not by sub-region because we did not want to assume that species would necessarily maintain these assemblages in the future. Indeed, the goal here is to determine future patterns of thermal habitat availability for species on the U.S. NES in more broad terms. We fit one generalized additive model (GAM) based on both spring and fall data (i.e., an annual model as opposed to separate spring and fall models) and use it to project potential changes in distribution and magnitude of biomass separately for each season for each species. By creating a single annual model based on temperature data from both spring and fall, we ensure that the full thermal envelope of each species is represented. For example, if a species with a wide thermal tolerance has historically been found in cooler waters in the spring, and in warmer waters in the fall, an annual model will ensure that if there are warmer waters in the spring in the future, that species will have the potential to inhabit those areas. Additionally, because the trawl survey data are subject to many zero observations, we use delta-lognormal GAMs [@Wood2011a], which model presence-absence separately from logged positive observations. The response variables in each of the GAMs are presence/absence and logged positive biomass of each assemblage or individual species, respectively. A binomial error family is used in the presence/absence models and a Gaussian error family is used in the models with logged positive biomass. 

The predictor variables are surface and bottom temperature and depth (all measured by the survey at each station), fit with penalized regression splines, and survey stratum, which accounts for differences in regional habitat quality across the survey region. Stratum may be considered to account for additional information not explicitly measured by the survey (e.g., bottom rugosity). Predictions of species abundance are calculated as the product of the predictions from the presence-absence model, the exponentiated predictions from the logged positive biomass model, and a correction factor to account for the retransformation bias associated with the log transformation [@Duan1983; and see @Pinsky2013]. 
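
The structure of such a two-part model can be sketched with the `mgcv` package. Everything below is illustrative: the survey data are simulated, the variable names are hypothetical, the default smooth settings are an assumption, and the correction factor is implemented as a smearing estimate (the mean of the exponentiated residuals), which is one common form of the retransformation correction.

```r
library(mgcv)

# Simulated survey data with many zero catches
set.seed(42)
n <- 600
survey <- data.frame(
  bottemp  = runif(n, 2, 18),
  surftemp = runif(n, 4, 24),
  depth    = runif(n, 20, 300),
  stratum  = factor(sample(c("s1", "s2", "s3"), n, replace = TRUE))
)
p_true <- plogis(1 - 0.08 * (survey$bottemp - 9)^2)
survey$present <- rbinom(n, 1, p_true)
survey$biomass <- survey$present *
  exp(1 - 0.03 * (survey$bottemp - 9)^2 + rnorm(n, sd = 0.5))

# Part 1: presence/absence
pa_mod <- gam(present ~ s(bottemp) + s(surftemp) + s(depth) + stratum,
              family = binomial, data = survey)

# Part 2: logged biomass where the species was caught
positive <- survey[survey$biomass > 0, ]
pos_mod <- gam(log(biomass) ~ s(bottemp) + s(surftemp) + s(depth) + stratum,
               family = gaussian, data = positive)

# Correction for retransformation bias (smearing estimate)
smear <- mean(exp(residuals(pos_mod, type = "response")))

# Predicted biomass = P(present) * exp(predicted log biomass) * correction
predict_biomass <- function(newdata) {
  p     <- predict(pa_mod, newdata, type = "response")
  log_b <- predict(pos_mod, newdata, type = "response")
  as.vector(p * exp(log_b) * smear)
}

# Hypothetical future conditions: 2 deg C warmer at surface and bottom
future <- transform(survey, bottemp = bottemp + 2, surftemp = surftemp + 2)
pred_future <- predict_biomass(future)
```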

We calculated the suitable thermal habitat both in terms of changes in 'suitable thermal abundance', defined as the species density possible given appropriate temperature, depth and bathymetric conditions, and changes in 'suitable thermal area', defined as the size of the physical area potentially occupied by a species given appropriate temperature, depth and bathymetric conditions. Suitable thermal abundance is determined from the predictions from the GAMs (i.e., a prediction of biomass). However, this quantity should not be interpreted directly as a change in future abundance or biomass, but instead as the potential abundance of a species in the future given changes in temperature and holding all else (e.g., fishing effort, species interactions, productivity, etc.) constant. Suitable thermal area is determined as a change in the suitable area that a species distribution occupies in the future and is derived from the area of the kernel density of the distribution. To ensure that the estimates are conservative, we select all points with values greater than one standard deviation above the mean. We then compute the area of these kernels using the `gArea` function from the `rgeos` package in R [@Bivand2011].

<!--chapter:end:chapters/Thermal_hab_proj_indicator.Rmd-->

# Transition Dates {#trans_dates}

**Description**: Sea surface temperature transition dates
 

**Indicator category**: Extensive analysis

**Found in**: State of the Ecosystem - New England (2023), State of the Ecosystem - Mid-Atlantic (2023) 

**Contributor(s)**: Kevin Friedland <kevin.friedland@noaa.gov>
  
**Data steward**: Kimberly Bastille <kimberly.bastille@noaa.gov>
  
**Point of contact**: Kimberly Bastille <kimberly.bastille@noaa.gov>
  
**Public availability statement**: Data are publicly available. 


## Methods


### Data sources

Data come from NOAA's high resolution blended analysis of daily sea surface temperature on a 1/4 degree grid and are available online from the [Physical Sciences Laboratory](https://psl.noaa.gov/data/gridded/data.noaa.oisst.v2.highres.html). 

### Data analysis

Analysis was recreated from @friedland2015. 


### Data Processing

The Transition Date indicator was formatted for inclusion in the `ecodata` R package with the code found [here](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_transition_dates.R).

<!--chapter:end:chapters/trans_dates.Rmd-->

# Trend Analysis


**Description**: Time series trend analysis

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2018+), State of the Ecosystem - Mid-Atlantic (2018+)

**Indicator category**: Extensive analysis, not yet published

**Contributor(s)**: Sean Hardison, Charles Perretti, Geret DePiper

**Data steward**: NA

**Point of contact**: Kimberly Bastille, <kimberly.bastille@noaa.gov>

**Public availability statement**: NA


## Methods
Summarizing trends for ecosystem indicators is desirable, but the power of statistical tests to detect a trend is hampered by low sample size and autocorrelated observations [see @Nicholson2004; @Wagner2013; @VonStorch1999a].  Prior to 2018, time series indicators in State of the Ecosystem reports were presented with trend lines based on a Mann-Kendall test for monotonic trends to test significance (p < 0.05) of both long term (full time series) and recent (2007–2016) trends, although not all time series were considered for trend analysis due to limited series lengths. There was also concern that a Mann-Kendall test would not account for any autocorrelation present in State of the Ecosystem (SOE) indicators.

In a simulation study [@hardison2019], we explored the effect of time series length and autocorrelation strength on statistical power of three trend detection methods: a generalized least squares model selection approach, the Mann-Kendall test, and Mann-Kendall test with trend-free pre-whitening. Methods were applied to simulated time series of varying trend and autocorrelation strengths. Overall, when sample size was low (N = 10) there were high rates of false trend detection, and similarly, low rates of true trend detection. Both of these forms of error were further amplified by autocorrelation in the trend residuals. Based on these findings, we selected a minimum series length of N = 30 for indicator time series before assessing trend.

We also chose to use a GLS model selection (GLS-MS) approach to evaluate indicator trends in the 2018 (and future) State of the Ecosystem reports, as this approach performed best overall in the simulation study. GLS-MS also allowed for both linear and quadratic model fits and quantification of uncertainty in trend estimates. The model selection procedure for the GLS approach fits four models to each time series and selects the best fitting model using AICc. The models are: 1) linear trend with uncorrelated residuals, 2) linear trend with correlated residuals, 3) quadratic trend with uncorrelated residuals, and 4) quadratic trend with correlated residuals. That is, the models are of the form

$$ Y_t = \alpha_0 + \alpha_1X_t + \alpha_2X_t^2 + \epsilon_t$$
$$\epsilon_t = \rho\epsilon_{t-1} + \omega_t$$

$$\omega_t \sim N(0, \sigma^2)$$

Where $Y_t$ is the observation in time $t$, $X_t$ is the time index, $\epsilon_t$ is the residual in time $t$, and $\omega_t$ is a normally distributed random variable. Setting $\alpha_2 = 0$ yields the linear trend model, and $\rho = 0$ yields the uncorrelated residuals model.

The best fit model was tested against the null hypothesis of no trend through a likelihood ratio test (p < 0.05). All models were fit using the R package `nlme` [@Pinheiro2017] and AICc was calculated using the R package `AICcmodavg` [@Mazerolle2017a]. In SOE time series figures, significant positive trends were colored orange, and negative trends purple. 
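
The model selection procedure described above can be sketched in a few lines of R. This is a minimal illustration on simulated data, not the production script linked under Data analysis. Several details are assumptions made for the example: the simulated series, the AR(1) form of the correlated residuals (`corAR1`), the hand-written `aicc()` helper (the report used `AICcmodavg`), and the choice of an intercept-only null model that shares the residual structure of the selected model.

```r
library(nlme)

# Hypothetical indicator series: linear trend plus AR(1) noise, N = 30
set.seed(42)
n <- 30
dat <- data.frame(time = seq_len(n))
dat$y <- 0.05 * dat$time +
  as.numeric(arima.sim(list(ar = 0.4), n = n, sd = 0.5))

# Small-sample corrected AIC, computed by hand (illustrative helper)
aicc <- function(fit) {
  k <- attr(logLik(fit), "df")   # number of estimated parameters
  n_obs <- length(resid(fit))    # number of observations
  AIC(fit) + (2 * k * (k + 1)) / (n_obs - k - 1)
}

# Four candidate models, fit by maximum likelihood so that models with
# different fixed effects can be compared
fits <- list(
  linear        = gls(y ~ time, data = dat, method = "ML"),
  linear_ar1    = gls(y ~ time, data = dat,
                      correlation = corAR1(form = ~ time), method = "ML"),
  quadratic     = gls(y ~ time + I(time^2), data = dat, method = "ML"),
  quadratic_ar1 = gls(y ~ time + I(time^2), data = dat,
                      correlation = corAR1(form = ~ time), method = "ML")
)

# Select the model with the lowest AICc
aicc_vals <- sapply(fits, aicc)
best_name <- names(which.min(aicc_vals))
best <- fits[[best_name]]

# Assumed null model: no trend, same residual structure as the best model
null_fit <- if (grepl("ar1", best_name)) {
  gls(y ~ 1, data = dat, correlation = corAR1(form = ~ time), method = "ML")
} else {
  gls(y ~ 1, data = dat, method = "ML")
}

# Likelihood ratio test of the best model against the null
lrt <- anova(null_fit, best)
p_value <- lrt$`p-value`[2]
```

Under these assumptions, a trend would be flagged when `p_value` is below 0.05.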

### Data source(s)
NA

### Data extraction
NA

### Data analysis

Code used for trend analysis can be found [here](https://github.com/NOAA-EDAB/tech-doc/blob/master/R/stored_scripts/trend_analysis.R).

<!--chapter:end:chapters/Trend_analysis.Rmd-->

# Verified Records of Southern Kingfish

**Description**: Fisheries Observer Data – Verified Records of Southern Kingfish

**Found in**: State of the Ecosystem - Mid-Atlantic (2018)

**Indicator category**: Database pull 

**Contributor(s)**: Debra Duarte, Loren Kellogg
  
**Data steward**: Gina Shield, gina.shield@noaa.gov
  
**Point of contact**: Gina Shield, gina.shield@noaa.gov
  
**Public availability statement**: Due to PII concerns data for this indicator are not publicly available.


## Methods

### Data sources

The Fisheries Monitoring and Research Division deploys observers on commercial fisheries trips from Maine to North Carolina. On observed tows, observers must fully document all kept and discarded species encountered. Observers must comply with a Species Verification Program (SVP), which requires photo or sample submissions of high priority species at least once per quarter. Photos and samples submitted for verification are identified independently by at least two reviewers.

The derived data presented in the Mid-Atlantic State of the Ecosystem report for southern kingfish include only records verified by the SVP. The occurrence of southern kingfish in SVP records was chosen for inclusion in the report because of the increase of the species in SVP observer records since 2010. These data are not a complete list from the Northeast Fisheries Observer Program (NEFOP). Southern kingfish are less common than northern kingfish in observer data and are possibly misidentified, so we have initially included records here only when a specimen record was submitted to and verified through the SVP (see Data extraction).

### Data extraction 
SQL query for observer data extraction can be found [here](https://github.com/NOAA-EDAB/tech-doc/blob/master/R/stored_scripts/observer_data_extraction.sql).


### Data analysis
Time series were summed by year and plotted, and mapped data for individual records were plotted according to the location where gear was hauled. As coordinate data were not always available for each record, the map does not include all occurrences of southern kingfish, but was included for spatial context.

<!--chapter:end:chapters/observer_data_indicator.Rmd-->

# Warm Core Rings {#wcr}

**Description**: Warm Core Rings

**Found in**: State of the Ecosystem - Mid-Atlantic (2020+), State of the Ecosystem - New England (2020+)

**Indicator category**: Published Results

**Contributor(s)**: Avijit Gangopadhyay  <avijit.gangopadhyay@umassd.edu>
  
**Data steward**: Avijit Gangopadhyay 
  
**Point of contact**: Avijit Gangopadhyay
  
**Public availability statement**: Data is available upon request. 

## Methods

The plot showing the number of warm core ring formations and regime shift replicates figure 3 in @gangopadhyay2019.  Detailed methods on the warm core ring time series and regime shift analysis are described in the manuscript.

### Data sources


[Gulf Stream charts from Jennifer Clark](https://jcgulfstream.com/charts/) are the primary data source for the warm core ring analysis in @gangopadhyay2019. The Gulf Stream charts use infrared (IR) imagery, satellite altimetry data, and surface in-situ temperature data. Three-day composite images are regularly produced by NOAA and/or the Johns Hopkins University Applied Physics Lab (fermi) group (see http://fermi.jhuapl.edu for more details).


### Data extraction 
The data from @gangopadhyay2019 were provided by Avijit Gangopadhyay, *School for Marine Science and Technology, University of Massachusetts Dartmouth, MA*.

### Data analysis
A sequential regime shift detection algorithm was used to identify the regimes evident in the warm core ring formation time-series.  See @gangopadhyay2019 for details.

### Data processing
Warm core ring data were formatted for inclusion in the `ecodata` R package using this [R code](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_warm_core_rings.R).

<!--chapter:end:chapters/warm_core_rings.Rmd-->

# Waterbird productivity - Mid-Atlantic

**Description**: Virginia waterbird data 

**Indicator category**: Published Results

**Found in**: State of the Ecosystem - Mid-Atlantic (2020)

**Contributor(s)**: Ruth Boettcher
                
**Data steward**: Kimberly Bastille <kimberly.bastille@noaa.gov>
  
**Point of contact**: Kimberly Bastille <kimberly.bastille@noaa.gov>
  
**Public availability statement**: Data are publicly available.


## Methods


### Data sources

Virginia colonial waterbird breeding pair population estimates were derived from Table 4 of "Status and distribution of colonial waterbirds in coastal Virginia: 2018 breeding season." Center for Conservation Biology Technical Report Series, CCBTR-18-17. College of William and Mary & Virginia Commonwealth University, Williamsburg, VA. Available at: https://scholarworks.wm.edu/cgi/viewcontent.cgi?article=1237&context=ccb_reports

### Data analysis

NA

### Data processing

VA colonial waterbird data were formatted for inclusion in the `ecodata` R package using this [R code](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_seabird_MAB.R).

<!--chapter:end:chapters/Seabird_MAB.Rmd-->

# WEA Fishing Port Landings {#wind_port}

**Description**: Port Landings from within Wind Lease Areas and Community Social Vulnerability Indicators

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2022+), State of the Ecosystem - Mid-Atlantic (2022+)

**Indicator category**: Port Landings from within Wind Lease Areas and Community Social Vulnerability Indicators

**Contributor(s)**: Angela Silva, Doug Christel
  
**Data steward**: Angela Silva
  
**Point of contact**: Angela Silva <angela.silva@noaa.gov>
  
**Public availability statement**: Source data are NOT publicly available. Please email angela.silva@noaa.gov for further information and queries of Speed and Extent of Offshore Wind Development indicator source data.

## Methods
### Data Sources
Social Indicators Data:

* https://www.fisheries.noaa.gov/national/socioeconomics/social-indicators-coastal-communities
* https://www.st.nmfs.noaa.gov/data-and-tools/social-indicators/

Wind Data:

* https://www.greateratlantic.fisheries.noaa.gov/ro/fso/reports/WIND/ALL_WEA_BY_AREA_DATA.html
* https://www.fisheries.noaa.gov/resource/data/socioeconomic-impacts-atlantic-offshore-wind-development


### Data Analysis

Cumulative port landings (pounds) and revenue (dollars) from Wind Energy Areas (WEA) were pulled for communities along the Northeast US Shelf from 2010 to 2019. The percentage of landings from wind lease areas was calculated relative to total landings for those communities. Environmental Justice and Gentrification Vulnerability indicators were then matched to these communities.
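
As a small illustration of the percentage calculation, the sketch below uses made-up port names and landings values. The column names are hypothetical and do not reflect the structure of the confidential source data.

```r
# Hypothetical cumulative landings (pounds) by port, 2010 to 2019
landings <- data.frame(
  port              = c("Port A", "Port B"),
  wea_landings_lb   = c(1200, 300),     # landings from within wind lease areas
  total_landings_lb = c(24000, 15000)   # all landings for the community
)

# Percent of each community's landings that came from wind lease areas
landings$pct_wea <- 100 * landings$wea_landings_lb / landings$total_landings_lb
```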


### Data Processing 

Data were formatted for inclusion in the `ecodata` R package using the R code found [here](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_wind_port.R).

<!--chapter:end:chapters/wind_port_landings_EJ.Rmd-->

# Wind Energy Development Timeline {#wind_dev_speed}

**Description**: Wind Energy Lease Area Development 

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2021+), State of the Ecosystem - Mid-Atlantic (2021+)

**Indicator category**: Published methods, Synthesis of published information, Database pull, Database pull with analysis

**Contributor(s)**: Angela Silva, Andrew Lipsky, Doug Christel
  
**Data steward**: Angela Silva
  
**Point of contact**: Angela Silva <angela.silva@noaa.gov>
  
**Public availability statement**: Source data are NOT publicly available. Please email angela.silva@noaa.gov for further information and queries of Speed and Extent of Offshore Wind Development indicator source data.

## Methods
### Data Sources
BOEM lease area, Call Areas, Planning Area shapefiles: https://www.boem.gov/renewable-energy/mapping-and-data/renewable-energy-gis-data

Maine Area of Interest: Maine Department of Marine Resources, Central Atlantic Bight planning area draft (BOEM communication, INTERNAL ONLY private shapefile); Foundation and Cable data from South Fork Final Environmental Impact Statement (SFWF FEIS) data tables E-4, E-4-1, E-2: https://www.boem.gov/sites/default/files/documents/renewable-energy/state-activities/SFWF%20FEIS.pdf


### Data Analysis
All data were updated for 2022 with the South Fork Wind Farm FEIS, and the following assumptions were made about future wind areas:

* (1) There are no reported values for foundations, cable acres and miles, and year of construction for the NY WEA, Maine AOI, and Central Atlantic Bight draft planning area.
* (2) To estimate these variables, the ratio of each (Cumul_FNDS, Cumul_Offsh_Cbl_Acres, Cumul_OffExp_Inter_Cab_Miles, TBNSinstall_no) was calculated using reported values for existing lease areas. All data are reported as "2030".

Spatial Analysis for Project_Acres:


Project Areas and Call Area acres were calculated using BOEM Project Area Shapefiles (Project_Areas_12_3_2019), BOEM NY Call Area Shapefiles (NY_Call_Areas), and NY Call Area Primary and Secondary Recommendation shapefiles (BOEM_NY_Draft_WEAs_11_1_2018) in ArcMap. 


Project_Areas_12_3_2019, NY_Call_Areas, and BOEM_NY_Draft_WEAs_11_1_2018 acres were calculated using the Add Field and Field Calculator tools. Python Expression = `!shape.area@acres!`


Project_Name: Project names from Table E-4 of the South Fork FEIS were matched to shapefiles by name and lease number.


FDNS: Number of foundations proposed or expected for each Project area taken directly from Table E-4 of South Fork DEIS.


Offsh_Cbl_Acres: Values taken directly from Table A-4 in South Fork DEIS (Table A-4: Offshore Wind Leasing Activities in the U.S. East Coast: Projects and Assumptions [part 2], pg. E-3-4). Total values for MA/RI lease areas Bay State Wind, Liberty Wind, OCS-A 0522 Remainder, OCS-A 0500 Remainder, OCS-A 0521 Remainder, OCS-A 0520 were aggregated in the table (567 total acres).  Values were evenly distributed across the 6 Project areas.  As such, these values should be treated as estimates until more information is released specific to individual project areas and their landing sites. 


Dominion Energy was presented as 3 phases in Table E-4 for Project_Name (Dominion Energy Phase1, Dominion Energy Phase 2, Dominion Energy Phase 3). Only one Project shapefile area exists for this lease  area OCS-A 0483. Therefore, the total shapefile acreage was evenly divided between 3 phases similar to how the foundations were treated in table E-4 (Future Offshore Wind Project Construction Schedule, pg. E-14). 


OffExpCab_Miles: Offshore export cable length. OCS-A 0482, OCS-A 0519, and OCS-A 0490 had a combined 360 offshore export cable miles reported in Table E-4. This number was divided by 3, and 120 miles were assigned to each of these three project areas.
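
The even-allocation rule used for aggregated table values can be written as a short helper. This is only a sketch of the arithmetic described above; the function name `allocate_evenly` is hypothetical and is not part of the processing code.

```r
# Split an aggregated total evenly across the named project areas
allocate_evenly <- function(total, areas) {
  setNames(rep(total / length(areas), length(areas)), areas)
}

# 567 offshore cable acres reported in aggregate for six MA/RI lease areas
cable_acres <- allocate_evenly(
  567,
  c("Bay State Wind", "Liberty Wind", "OCS-A 0522 Remainder",
    "OCS-A 0500 Remainder", "OCS-A 0521 Remainder", "OCS-A 0520")
)

# 360 offshore export cable miles reported in aggregate for three lease areas
cable_miles <- allocate_evenly(360, c("OCS-A 0482", "OCS-A 0519", "OCS-A 0490"))
```

Each of the six MA/RI areas receives 94.5 acres and each of the three lease areas receives 120 miles, which is why these values should be read as placeholders rather than project-specific estimates.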


### Data Processing 

Data were formatted for inclusion in the `ecodata` R package using the R code found [here](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_wind_dev_speed.R).

<!--chapter:end:chapters/Wind_dev_speed.Rmd-->

# Wind lease areas and habitat occupancy overlap

**Description**: Wind lease areas and habitat occupancy

**Found in**: State of the Ecosystem - Mid-Atlantic (2020)

**Indicator category**: Database pull with analysis; Extensive analysis, not yet published; Published methods

**Contributor(s)**: Kevin Friedland
  
**Data steward**: Kimberly Bastille <kimberly.bastille@noaa.gov>
  
**Point of contact**: Kimberly Bastille <kimberly.bastille@noaa.gov>
  
**Public availability statement**: Source data are publicly available.

## Methods

Habitat area with a probability of occupancy greater than 0.5 was modeled for many species throughout the Northeast Large Marine Ecosystem (NE-LME) [@friedland2020]. The methodology for the habitat occupancy models is discussed in a [separate chapter](#hab-occu).

The [Bureau of Ocean Energy Management](https://www.boem.gov/) (BOEM) is the agency responsible for the development of offshore wind energy. Existing and proposed lease areas were overlaid with habitat occupancy models to determine the species most likely to be found in the wind lease areas (Table \@ref(tab:wind-table)).

### Data extraction 

Shapefiles of existing and proposed BOEM lease areas (as of Feb 2019) were taken from the [BOEM website](https://www.boem.gov/renewable-energy/mapping-and-data/renewable-energy-gis-data).

### Data analysis

For the purposes of this indicator, the Northeast Shelf was broken into three general areas (North, Mid and South). The species shown in the table below (Table \@ref(tab:wind-table)) are those that have the highest average probability of occupancy in the lease areas.

### Data processing

Code used to format wind lease area and habitat occupancy overlap for inclusion in the `ecodata` package can be found [here](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_wind_occupancy.R).

<!--chapter:end:chapters/wind_habitat_occupancy.Rmd-->

# Zooplankton {#zoo_abundance_anom}

**Description**: Annual time series of zooplankton abundance

**Found in**: State of the Ecosystem - Gulf of Maine & Georges Bank (2017+), State of the Ecosystem - Mid-Atlantic (2017+)

**Indicator category**: Database pull with analysis; Synthesis of published information; Extensive analysis, not yet published; Published methods

**Contributor(s)**: Ryan Morse, Kevin Friedland
  
**Data steward**: Harvey Walsh, <harvey.walsh@noaa.gov>; Mike Jones, <michael.jones@noaa.gov>
  
**Point of contact**: Ryan Morse, <ryan.morse@noaa.gov>; Harvey Walsh, <harvey.walsh@noaa.gov>; Kevin Friedland, <kevin.friedland@noaa.gov>
  
**Public availability statement**: Source data through 2019 are publicly available [here](ftp://ftp.nefsc.noaa.gov/pub/hydro/zooplankton_data/), and data through 2021 are available upon request from harvey.walsh@noaa.gov. Derived data can be found [here](https://comet.nefsc.noaa.gov/erddap/tabledap/zoo_abundance_soe_v1.html).

## Methods

### Data sources
Zooplankton data are from the National Oceanic and Atmospheric Administration Marine Resources Monitoring, Assessment and Prediction (MARMAP) program and Ecosystem Monitoring (EcoMon) cruises detailed extensively in @Kane2007, @Kane2011, and @Morse2017.

### Data extraction 
Data are from the publicly available plankton dataset at NCEI Accession 0187513. The accession metadata has a list of excluded samples and cruises based on @Kane2007 and @Kane2011 in addition to other collection details.

### Data analysis

#### Annual abundance anomalies


Data are processed similarly to @Kane2007 and @Perretti2017, where a mean annual abundance by date is computed by area for each species meeting inclusion metrics set in @Morse2017. This is accomplished by binning all samples for a given species to bi-monthly collection dates based on median cruise date and taking the mean, then fitting a spline interpolation between mean bi-monthly abundance to give expected abundance on any given day of the year. 

Code used for zooplankton data analysis can be found [here](https://github.com/NOAA-EDAB/tech-doc/blob/master/R/stored_scripts/zooplankton_analysis.R). 
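
The binning-and-interpolation step can be illustrated with base R. The sketch below assumes six bi-monthly bins placed at hypothetical day-of-year midpoints, made-up mean abundances, and a natural cubic spline from `stats::splinefun`. The actual bin dates, abundance transformation, and spline type used in the analysis may differ.

```r
# Hypothetical bi-monthly bins: representative day of year and mean abundance
bin_day  <- c(31, 92, 153, 214, 275, 336)
bin_mean <- c(2.1, 3.4, 4.8, 4.2, 3.0, 2.3)  # illustrative values only

# Interpolating spline through the bi-monthly means; returns a function
# giving expected abundance for any day of the year within the bin range
expected_abundance <- splinefun(bin_day, bin_mean, method = "natural")

# Expected abundance on the day a hypothetical sample was collected
sample_day <- 120
expected_on_sample_day <- expected_abundance(sample_day)
```

Because the spline interpolates, it reproduces the bin means exactly at the bin dates and fills in smooth expected values between them.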

#### Copepod

Abundance anomalies are computed from the expected abundance on the day of sample collection. Abundance anomaly time series are constructed for *Centropages typicus*, *Pseudocalanus* spp., *Calanus finmarchicus*, and total zooplankton biovolume. The small-large copepod size index is computed by averaging the individual abundance anomalies of *Pseudocalanus* spp., *Centropages hamatus*, *Centropages typicus*, and *Temora longicornis*, and subtracting the abundance anomaly of *Calanus finmarchicus*. This index tracks the overall dominance of the small bodied copepods relative to the largest copepod in the Northeast U.S. region, *Calanus finmarchicus*.
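
The small-large copepod index described above amounts to a row-wise mean and a subtraction. The following sketch uses invented annual anomaly values and hypothetical column names to show the arithmetic.

```r
# Hypothetical annual abundance anomalies by taxon
anom <- data.frame(
  year                 = 2001:2003,
  pseudocalanus        = c(0.2, -0.1, 0.4),
  centropages_hamatus  = c(0.1,  0.0, 0.3),
  centropages_typicus  = c(0.3, -0.2, 0.5),
  temora_longicornis   = c(0.0,  0.1, 0.2),
  calanus_finmarchicus = c(-0.2, 0.3, -0.1)
)

# Small-bodied taxa that are averaged together
small_taxa <- c("pseudocalanus", "centropages_hamatus",
                "centropages_typicus", "temora_longicornis")

# Index = mean anomaly of small copepods minus the Calanus finmarchicus anomaly
anom$small_large <- rowMeans(anom[, small_taxa]) - anom$calanus_finmarchicus
```

Positive values indicate years in which the small-bodied taxa were relatively more abundant than *Calanus finmarchicus*.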

#### Euphausiids and Cnidarians 

Stratified abundances of euphausiids and cnidarians were included in the 2020 State of the Ecosystem reports. These were calculated as the log of the estimated absolute number of individuals.


#### Seasonal abundance

Time series of zooplankton abundance in the spring and fall months have been presented in the 2019 Mid-Atlantic State of the Ecosystem report. Raw abundance data were sourced from the EcoMon cruises referenced above, and ordinary kriging was used to estimate seasonal abundance over the Northeast Shelf. These data were then aggregated further into time series of mean abundance by Ecological Production Unit. 

#### Zooplankton Diversity

A time series of zooplankton diversity (effective Shannon) was calculated using 42 zooplankton classifications collected from the EcoMon cruises referenced above.
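
Effective Shannon diversity is commonly defined as the exponential of the Shannon entropy, which expresses diversity as an equivalent number of equally abundant taxa. A minimal sketch of that definition follows. The function name and input counts are hypothetical, and the exact implementation used for this indicator is not documented here.

```r
# Effective Shannon diversity: exp(H), where H = -sum(p * log(p))
effective_shannon <- function(counts) {
  p <- counts[counts > 0] / sum(counts)  # proportions, dropping empty taxa
  exp(-sum(p * log(p)))
}

# Four equally abundant taxa give an effective diversity of four
even_community <- effective_shannon(c(10, 10, 10, 10))

# A community dominated by one taxon has a lower effective diversity
uneven_community <- effective_shannon(c(97, 1, 1, 1))
```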

### Data processing

Zooplankton abundance indicators were formatted for inclusion in the `ecodata` R package using the code at these links: [abundance anomaly](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_zoo_abun_anom.R) and [seasonal abundance](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_zoo_oi.R).

<!--chapter:end:chapters/Zooplankton_indicators.Rmd-->

# Glossary


**Apex Predator:**
Predators with no natural predators of their own, such as large sharks, toothed whales, seals, tunas, and billfish.

**Benthivore:**
Predator feeding on bottom-dwelling prey, such as lobster and haddock.

**Benthos:**
Organisms that live on or in the sea bottom [@madden2004], such as scallop and quahog.

**Bmsy:**
The weight (biomass) of a group of fish necessary to produce maximum sustainable yield (MSY) [@nwfsc].  

**Catch:**
The total number (or weight) of fish caught by fishing operations. The component of fish that comes into contact with fishing gear, which is retained by the gear [@unfao]. 

**Climate Vulnerability:**
The degree to which the habitat/species are unable to cope with negative impacts of climate change.

**Climatology:**
Average conditions over a specific time period. 

**Cold Pool:**
Area of relatively cold bottom water that forms on the US northeast shelf in the Mid-Atlantic Bight. 

**Commercial Fishery:**
Large-scale industry selling fish, shellfish and other aquatic animals. 

**Community Engagement:**
A mathematical measure of how engaged a community is in commercial fisheries. This index includes the amount of landings, dealers and permits. 

**Conceptual Model:**
A representation of the most current understanding of the major system features and processes of a particular environment [@madden2004]. 
 
**Condition:**
A mathematical measurement of the “plumpness,” or the general health of a fish or group of fishes [@wallace1994]. 

**Continental Shelf:**
Underwater portion (shelf) of the continent, extending seaward from the shore to the edge of the continental slope where the depth increases rapidly [@unfao]. 

**Continental Slope:**
Part of the continental margin; the ocean floor from the continental shelf to the continental rise [@madden2004]. 

**Ecological Production Unit (EPU):**
A specific geographic region of similar physical features and plankton characteristics supporting an ecological community within a large marine ecosystem (LME). 

**Ecosystem Assessment:**
A social process through which the findings of science concerning the causes of ecosystem change, their consequences for human well-being, and management and policy options are presented to decision makers [@unfao]. 

**Effort:**
The amount of time and fishing power used to harvest fish; includes gear size, boat size, and horsepower [@wallace1994]. 

**Elasmobranch:**
Describes a group of fish without a hard bony skeleton, including sharks, skates, and rays [@unfao]. 


**Endangered Species:**
A species as defined in the US Endangered Species Act, that is in danger of extinction through a significant portion of its range [@noaaglos]. 


**Energy Density:**
A measurement of the amount of energy (calories) contained in a certain amount of food or prey organism. 

**Estuary:**
Coastal body of brackish water which may be an important nursery habitat for many species of interest. 

**Estuarine:**
Conditions found in an estuary: shallow water, high variability in water temperature, salt content, nutrients, and oxygen level.

**Eutrophication:**
The enrichment of water by nutrients causing increased growth of algae and higher forms of plant life creating an imbalance of organisms present in the water and to the quality of the water they live in [@ospar2003].  

**Exclusive Economic Zone:**
The EEZ is the area that extends from the seaward boundaries of the coastal states 3 to 200 nautical miles off the U.S. coast. Within this area, the United States claims exclusive fishery management authority over all fishery resources [@nmfs2004]. 

**Feeding Guild:**
A group of species consuming similar prey species; for example, planktivores are different species that all eat plankton. 

**Fishery:**
The combination of fish and fishers in a region, the latter fishing for similar or the same species with similar or the same gear types [@madden2004]. 

**Fishery-Dependent Data:**
Data collected directly on a fish or fishery from commercial or sport fishermen and seafood dealers. Common methods include logbooks, trip tickets, port sampling, fishery observers, and phone surveys [@wallace1994]. 

**Fishery-Independent Data:**
Stock/habitat/environmental data collected independently of the activity of the fishing sector usually on a research vessel [@unfao]. 


**Fmsy:**
The rate of removal of fish from a population by fishing that, if applied constantly, would result in maximum sustainable yield (MSY) [@unfao]. 

**Forage Species:**
Species used as prey by a larger predator for its food. Includes small schooling fishes such as anchovies, sardines, herrings, capelin, smelts, and menhaden [@unfao].
 
**GB:**
Georges Bank Ecological Production Unit [@techdoc]. 

**GOM:**
Gulf of Maine Ecological Production Unit [@techdoc]. 

**Groundfish:**
Group of commercially harvested ocean bottom-oriented fish in cooler regions of the Northern Hemisphere including cods, flounders, and other associated species. The exact species list varies regionally. 

**Gulf Stream:**
A warm ocean current flowing northward along the eastern United States.

**Habitat:**
1. The environment in which the fish live, including everything that surrounds and affects its life, e.g. water quality, bottom, vegetation, associated species (including food supplies); 2. The site and particular type of local environment occupied by an organism [@unfao]. 
 
**Harvest:**
The total number or weight of fish caught and kept from an area over a period of time [@wallace1994]. 

**Highly Migratory Species:**
Marine species whose life cycle includes lengthy migrations, usually through the exclusive economic zones of two or more countries as well as into international waters. This term usually is used to denote tuna and tuna-like species, sharks, swordfish, and billfish [@unfao]. 

**Ichthyoplankton:**
Fish eggs and larvae belonging to the planktonic community [@unfao]. 

**Indicator:**
A variable, pointer, or index. Its fluctuation reveals the variations in key elements of a system. The position and trend of the indicator in relation to reference points or values indicate the present state and dynamics of the system. Indicators provide a bridge between objectives and action [@unfao].

**Landings:**
The number or weight of fish unloaded by commercial fishermen or brought to shore by recreational fishermen for personal use. Landings are reported at the locations at which fish are brought to shore [@wallace1994].


**Large Marine Ecosystem (LME):**
A geographic area of an ocean that has distinct physical and oceanographic characteristics, productivity, and trophically dependent populations [@unfao]. 

**MAB:**
Mid-Atlantic Bight Ecological Production Unit [@techdoc]. 

**Marine Heatwave:**
Period of five or more days where sea surface temperature is warmer than 90% of all previously measured temperatures based on a 30-year historical baseline period [@hobday2016]. 

**Marine Mammals:**
Warm-blooded animals that live in marine waters and breathe air directly. These include porpoises, dolphins, whales, seals, and sea lions [@wallace2000]. 

**Mortality Event:**
The death of one or more individuals of a species. 

**Northeast Shelf:**
The Northeast U.S. Continental Shelf Large Marine Ecosystem (NES LME). The region spans from Cape Hatteras, NC to Nova Scotia and includes the waters between the eastern coastline of the U.S. and the continental shelf break. 

**Ocean Acidification (OA):**
Global-scale changes in ocean marine carbonate chemistry driven by ocean uptake of atmospheric carbon dioxide (CO2). Human-induced ocean acidification specifically refers to the significant present shifts in the marine carbonate system that are a direct result of the exponential increase in atmospheric CO2 concentrations associated with human activities like fossil fuel use [@jewett2020].

**Overfished:**
When a stock’s biomass is below the point at which the stock can produce sustainable yield. The term is used when biomass has been estimated to be below a limit biological reference point: in the US, when biomass is less than ½ of Bmsy [@unfao]. 

**Overfishing:**
Whenever a stock is subjected to a fishing mortality greater than the fishing mortality that produces maximum sustainable yield (MSY) on a continuing basis [@unfao]. 

**Phytoplankton:**
Microscopic single-celled, free-floating algae (plants) that take up carbon dioxide and use nutrients and sunlight to produce biomass and form the base of the food web [@unfao]. 

**Piscivore:**
Predator whose diet primarily consists of fish and squid, such as cod and striped bass. 

**Planktivore:**
Predator whose diet primarily consists of plankton, such as herring and mackerel. 

**Primary Production:**
The amount of energy produced by the assimilation and fixation of inorganic carbon  and other nutrients by autotrophs (plants and certain bacteria) [@unfao]. 

**Primary Production Required:**
Indicator expressing the total amount of fish removed from an area as a fraction of the total primary production in the area [@pauly1995].

**Primary Productivity:**
The rate at which food energy is generated, or fixed, by photosynthesis or chemosynthesis.  

**Probability of Occupancy:**
The modeled probability that a species occurs in a specific area. 

**Productivity:**
Relates to the birth, growth and death rates of a stock. A highly productive stock is characterized by high birth, growth, and mortality rates, and as a consequence, a high turnover and production to biomass ratios (P/B) [@unfao]. 

**Recreational Fishery:**
Fishing for fun or competition instead of profit like a commercial fishery. Includes for-hire charter and party boats, private boats, and shore-based fishing activities.

**Recruitment:**
The number of young fish entering the population each year at the age first caught in fishing/survey gear.

**Revenue:**
The dollar value commercial fishermen receive for selling landed fish. 

**Salinity:**
The total mass of salts dissolved in seawater per unit of water; generally expressed in parts per thousands (ppt) or practical salinity units (psu) [@madden2004]. 

**Satellite Imagery:**
Imagery of the ocean surface gathered by earth-orbiting satellites [@unfao]. 

**Slopewater Proportion:**
The proportion of deep water entering the Gulf of Maine through the Northeast channel from two main water sources. The Labrador slope water is colder water moving south from Canada and Warm slope water is warmer water moving north from the southern U.S. [@techdoc].  

**Socio-Economic:**
The combination or interaction of social and economic factors and involves topics such as distributional issues, labor market structure, social and opportunity costs, community dynamics, and decision-making processes [@unfao]. 

**SS:**
Scotian Shelf Ecological Production Unit [@techdoc].

**Stock:**
A part of a fish population usually with a particular migration pattern, specific spawning grounds, and subject to a distinct fishery. Total stock refers to both juveniles and adults, either in numbers or by weight [@unfao].


**Trophic Level:**
Position in the food chain determined by the number of energy-transfer steps to that level. Primary producers constitute the lowest level, followed by zooplankton, etc. [@unfao].

**Warm Core Ring:**
A clockwise turning eddy of cold water surrounding warm water in the center that breaks away from the Gulf Stream as it meanders. 

**Water Quality:**
The chemical, physical, and biological characteristics of water in respect to its suitability for a particular purpose [@noaaglos]. 

**Zooplankton:**
Plankton consisting of small animals and the immature stages of larger animals, ranging from microscopic organisms to large species, such as jellyfish.


(ref:neusmap1) Map of Northeast U.S. Continental Shelf Large Marine Ecosystem from @Hare2016.


```{r neusmap1, message = FALSE, warning=FALSE, fig.align='center', fig.height=6, echo = F, fig.cap='(ref:neusmap1)'}
knitr::include_graphics(here::here("images/journal.pone.0146756.g002.PNG"))
 
```

<!--chapter:end:chapters/pl_lan_glossary.Rmd-->

# References {-}

```{r include=FALSE}
#`r if (knitr::is_html_output()) '# References {-}'`
knitr::write_bib(c(
  .packages(), 'bookdown', 'knitr', 'rmarkdown', 'htmlwidgets', 'webshot', 'DT',
  'miniUI', 'tufte', 'servr', 'citr', 'rticles'
), 'packages.bib')
```

<!--chapter:end:chapters/references.Rmd-->