# CCFRP data: convert to DwC

**Pre-processing:** CCFRP data were originally shared as an Excel file with multiple sheets. Each sheet was saved as a .csv:
1. CPUE.Avg.Summary.all_spp --> CPUE_Avg.csv
2. CPUE.SE.Summary.all.spp --> CPUE_SE.csv
3. Counts.Summary.all.spp --> Counts.csv

This could be automated if desired.
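
A minimal sketch of that automation, assuming the workbook can be read by `pandas.read_excel` (which needs an Excel engine such as openpyxl installed). The workbook filename in the example call is a placeholder, and writing without the index is an assumption about how the CSVs should look; the sheet names and CSV filenames are the ones listed above.

```python
from pathlib import Path

import pandas as pd

# Sheet name in the workbook -> CSV filename, as listed above
SHEET_TO_CSV = {
    'CPUE.Avg.Summary.all_spp': 'CPUE_Avg.csv',
    'CPUE.SE.Summary.all.spp': 'CPUE_SE.csv',
    'Counts.Summary.all.spp': 'Counts.csv',
}


def write_sheets(sheets, out_dir, sheet_to_csv=SHEET_TO_CSV):
    """Write each DataFrame in `sheets` (keyed by sheet name) to its CSV file."""
    out_dir = Path(out_dir)
    written = []
    for sheet_name, csv_name in sheet_to_csv.items():
        target = out_dir / csv_name
        sheets[sheet_name].to_csv(target, index=False)
        written.append(target)
    return written


def excel_to_csvs(workbook_path, out_dir, sheet_to_csv=SHEET_TO_CSV):
    """Read only the listed sheets from the workbook and save each as a CSV."""
    # Passing a list to sheet_name returns a dict of {sheet name: DataFrame}
    sheets = pd.read_excel(workbook_path, sheet_name=list(sheet_to_csv))
    return write_sheets(sheets, out_dir, sheet_to_csv)


# Example call (placeholder workbook name):
# excel_to_csvs('CCFRP_workbook.xlsx', '.')
```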

**Reminder:** Still have to deal with the olive/yellowtail species name issue.

In [1]:
## Imports

import pandas as pd
import numpy as np
import random

from datetime import datetime # for handling dates
import pytz # for handling time zones


In [2]:
## Ensure my general functions for the MPA data integration project can be imported, and import them

import sys
sys.path.insert(0, "C:\\Users\\dianalg\\PycharmProjects\\PythonScripts\\MPA data integration")

import WoRMS # functions for querying WoRMS REST API

In [3]:
## Load CCFRP count data

path = 'C:\\Users\\dianalg\\Documents\\Work\\MBARI\\MPA Data Integration\\CCFRP\\'
filename = 'Counts.csv'
data = pd.read_csv(path+filename)

data.head()

Unnamed: 0,Area,Site,Lat Center Point,Lon Center Point,Year,Barred Sand Bass,Bat Ray,Bigmouth Sole,Black-and-Yellow Rockfish,Black Rockfish,...,Vermilion Rockfish,White Croaker,White Seabass,Widow Rockfish,Wolf Eel,Yelloweye Rockfish,Yellowfin Croaker,Yellowtail Jack,Yellowtail Rockfish,Total
0,Trinidad,REF,41.115,-124.173,2018,0,0,0,0,708,...,2,0,0,0,0,0,0,0,22,898
1,Trinidad,REF,41.115,-124.173,2019,0,0,0,0,384,...,1,0,0,0,0,1,0,0,16,504
2,Cape Mendocino,MPA,40.426,-124.478,2017,0,0,0,0,113,...,9,0,0,0,0,3,0,0,4,229
3,Cape Mendocino,MPA,40.426,-124.478,2018,0,0,0,0,58,...,20,0,0,0,0,10,0,0,7,300
4,Cape Mendocino,MPA,40.426,-124.478,2019,0,0,0,0,52,...,15,0,0,0,0,6,0,0,10,234


In [4]:
## Load scientific names

path = 'C:\\Users\\dianalg\\PycharmProjects\\PythonScripts\\MPA data integration\\CCFRP\\'
filename = 'CCFRP_common_to_scientific.csv'
species = pd.read_csv(path+filename)

species.head()

Unnamed: 0,common_names,scientific_names
0,Bigmouth Sole,Hippoglossina stomata
1,Longfin Sanddab,Citharichthys xanthostigma
2,Pacific Halibut,Hippoglossus stenolepis
3,Pelagic Stingray,Pteroplatytrygon violacea
4,Northern Anchovy,Engraulis mordax


### Convert data to long format

In [5]:
## I don't think we want to include Total as a species, so drop it

data.drop('Total', axis=1, inplace=True)

In [6]:
## How many unique areas are there?

len(data['Area'].unique())

16

In [7]:
## How many years has each area been surveyed?

num_years = data.groupby(['Area', 'Site'])['Year'].count()
num_years

Area              Site
Anacapa Island    MPA      3
                  REF      3
Ano Nuevo         MPA     13
                  REF     13
Bodega Head       MPA      3
                  REF      3
Cape Mendocino    MPA      3
                  REF      3
Carrington Point  MPA      3
                  REF      3
Farallon Islands  MPA      2
                  REF      2
Laguna Beach      MPA      1
                  REF      1
Piedras Blancas   MPA     11
                  REF     11
Point Buchon      MPA     13
                  REF     13
Point Conception  MPA      1
                  REF      1
Point Lobos       MPA     13
                  REF     13
South La Jolla    MPA      3
                  REF      3
Stewarts Point    MPA      3
                  REF      3
Swamis            MPA      3
                  REF      3
Ten Mile          MPA      3
                  REF      3
Trinidad          REF      2
Name: Year, dtype: int64

In [8]:
## So how many rows should each species have after converting to long format?

sum(num_years)

158

In [9]:
## Melt data

data_long = pd.melt(data, id_vars=data.columns[0:5].tolist(), var_name='species_common_name', value_name='count')
data_long.head()

Unnamed: 0,Area,Site,Lat Center Point,Lon Center Point,Year,species_common_name,count
0,Trinidad,REF,41.115,-124.173,2018,Barred Sand Bass,0
1,Trinidad,REF,41.115,-124.173,2019,Barred Sand Bass,0
2,Cape Mendocino,MPA,40.426,-124.478,2017,Barred Sand Bass,0
3,Cape Mendocino,MPA,40.426,-124.478,2018,Barred Sand Bass,0
4,Cape Mendocino,MPA,40.426,-124.478,2019,Barred Sand Bass,0


In [10]:
## Check number of records per species

print(data_long[data_long['species_common_name'] == 'Barred Sand Bass'].shape)
print(data_long[data_long['species_common_name'] == 'Garibaldi'].shape)
print(data_long[data_long['species_common_name'] == 'Unknown'].shape)

(158, 7)
(158, 7)
(158, 7)
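
The spot check above only looks at three species. A minimal sketch of checking every species at once, shown on a toy frame so it runs on its own (the toy table is illustrative; in this notebook the same `groupby` would run on `data_long`, with 158 as the expected count):

```python
import pandas as pd

# Toy stand-in for the wide count table
toy_wide = pd.DataFrame({
    'Area': ['Trinidad', 'Trinidad', 'Cape Mendocino'],
    'Site': ['REF', 'REF', 'MPA'],
    'Year': [2018, 2019, 2017],
    'Black Rockfish': [708, 384, 113],
    'Unknown': [0, 0, 0],
})
toy_long = pd.melt(toy_wide, id_vars=['Area', 'Site', 'Year'],
                   var_name='species_common_name', value_name='count')

# Every species should have one row per Area/Site/Year combination
rows_per_species = toy_long.groupby('species_common_name').size()
expected = len(toy_wide)
all_match = (rows_per_species == expected).all()

print(rows_per_species)
print('All species have', expected, 'rows:', all_match)
```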


### Join to obtain scientific names

In [11]:
## Merge

data_long = data_long.merge(species, how='left', left_on='species_common_name', right_on='common_names')
data_long.head()

Unnamed: 0,Area,Site,Lat Center Point,Lon Center Point,Year,species_common_name,count,common_names,scientific_names
0,Trinidad,REF,41.115,-124.173,2018,Barred Sand Bass,0,Barred Sand Bass,Paralabrax nebulifer
1,Trinidad,REF,41.115,-124.173,2019,Barred Sand Bass,0,Barred Sand Bass,Paralabrax nebulifer
2,Cape Mendocino,MPA,40.426,-124.478,2017,Barred Sand Bass,0,Barred Sand Bass,Paralabrax nebulifer
3,Cape Mendocino,MPA,40.426,-124.478,2018,Barred Sand Bass,0,Barred Sand Bass,Paralabrax nebulifer
4,Cape Mendocino,MPA,40.426,-124.478,2019,Barred Sand Bass,0,Barred Sand Bass,Paralabrax nebulifer


In [12]:
## Double check that only Unknown species have missing scientific_names

# pd.set_option('display.max_rows', None)
pd.set_option('display.max_rows', 60)
data_long[data_long['scientific_names'].isnull()]

Unnamed: 0,Area,Site,Lat Center Point,Lon Center Point,Year,species_common_name,count,common_names,scientific_names
12482,Trinidad,REF,41.115,-124.173,2018,Unknown,0,Unknown,
12483,Trinidad,REF,41.115,-124.173,2019,Unknown,0,Unknown,
12484,Cape Mendocino,MPA,40.426,-124.478,2017,Unknown,0,Unknown,
12485,Cape Mendocino,MPA,40.426,-124.478,2018,Unknown,0,Unknown,
12486,Cape Mendocino,MPA,40.426,-124.478,2019,Unknown,0,Unknown,
...,...,...,...,...,...,...,...,...,...
12635,South La Jolla,MPA,32.815,-117.298,2018,Unknown,0,Unknown,
12636,South La Jolla,MPA,32.815,-117.298,2019,Unknown,0,Unknown,
12637,South La Jolla,REF,32.839,-117.302,2017,Unknown,0,Unknown,
12638,South La Jolla,REF,32.839,-117.302,2018,Unknown,0,Unknown,
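
Scanning a truncated table by eye can miss a stray unmatched name. A minimal sketch of the same check done programmatically, on toy frames so it is self-contained (the toy rows are illustrative; in this notebook the last three lines would run on the merged `data_long`):

```python
import pandas as pd

toy_long = pd.DataFrame({
    'species_common_name': ['Black Rockfish', 'Unknown', 'Black Rockfish', 'Unknown'],
    'count': [708, 0, 384, 0],
})
toy_species = pd.DataFrame({
    'common_names': ['Black Rockfish'],
    'scientific_names': ['Sebastes melanops'],
})

merged = toy_long.merge(toy_species, how='left',
                        left_on='species_common_name', right_on='common_names')

# Common names that did not pick up a scientific name in the join
unmatched = set(merged.loc[merged['scientific_names'].isnull(), 'species_common_name'])
print(unmatched)
assert unmatched <= {'Unknown'}, f'Unexpected unmatched names: {unmatched}'
```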


**Question:** Should we include observations of unknown species in this data set? My gut reaction is no; those are important data in some contexts, but not very helpful in a presence/absence data set.

In [13]:
## Drop unnecessary columns

data_long.drop(['species_common_name', 'common_names'], axis=1, inplace=True)
data_long.head()

Unnamed: 0,Area,Site,Lat Center Point,Lon Center Point,Year,count,scientific_names
0,Trinidad,REF,41.115,-124.173,2018,0,Paralabrax nebulifer
1,Trinidad,REF,41.115,-124.173,2019,0,Paralabrax nebulifer
2,Cape Mendocino,MPA,40.426,-124.478,2017,0,Paralabrax nebulifer
3,Cape Mendocino,MPA,40.426,-124.478,2018,0,Paralabrax nebulifer
4,Cape Mendocino,MPA,40.426,-124.478,2019,0,Paralabrax nebulifer


### Conversion terms

**eventID** - Need to create this. Perhaps Area_Site_Year (e.g. Trinidad_REF_2018) <br>
**year** - year. <span style="color:red">Is eventDate required?</span><br>
**habitat** - <span style="color:red">Perhaps this is a good heading for reference versus mpa information?</span> <br>
**location** - <span style="color:red">Is there anything in Location that would be a good fit for area data?</span> <br>
**decimalLatitude, decimalLongitude** - Lat Center Point, Lon Center Point. <span style="color:red">Is there some way to give the corners of the grid as well? Or indicate that this is the center of a larger area over which the result is cumulative? Perhaps something associated with sampling? Should a **samplingProtocol** be linked?</span><br>
**occurrenceID** - Need to create this. <span style="color:red">Ideas? Can it just be a number increasing from 1 to the number of occurrences?</span> <br>
**scientificName** - scientific_names <br>
**scientificNameID** - WoRMS ID <br>
**taxonID** - WoRMS taxon ID <br>
**nameAccordingTo** - WoRMS <br>
**occurrenceStatus** - present <br>
**basisOfRecord** - HumanObservation <br>
**individualCount** - count <br>
**organismQuantity, organismQuantityType** - <span style="color:red">Do we want to include CPUE this way? Can join this data set with CPUE data. **Actually, CPUE isn't really an "organism quantity." Perhaps there's a better option under MeasurementOrFact?** MeasurementOrFact seems reasonable, would include the fields **measurementType, measurementValue, measurementAccuracy, measurementUnit.**</span>

<span style="color:red">**Do we want to include some kind of attribution, like institutionCode? Or will that be clear enough in the data submission?**</span>

**Where are all the places the MPA data will ultimately be submitted, anyway?**

### Assemble count data


In [14]:
### Build eventID and put it in a new data frame

eventID = data_long['Area'] + '_' + data_long['Site'] + '_' + data_long['Year'].astype('str')
converted = pd.DataFrame({'eventID':eventID})
converted.head()

Unnamed: 0,eventID
0,Trinidad_REF_2018
1,Trinidad_REF_2019
2,Cape Mendocino_MPA_2017
3,Cape Mendocino_MPA_2018
4,Cape Mendocino_MPA_2019


In [15]:
## Add year

converted['year'] = data_long['Year']
converted.head()

Unnamed: 0,eventID,year
0,Trinidad_REF_2018,2018
1,Trinidad_REF_2019,2019
2,Cape Mendocino_MPA_2017,2017
3,Cape Mendocino_MPA_2018,2018
4,Cape Mendocino_MPA_2019,2019


In [16]:
## Add habitat

converted['habitat'] = data_long['Site']
converted.head()

Unnamed: 0,eventID,year,habitat
0,Trinidad_REF_2018,2018,REF
1,Trinidad_REF_2019,2019,REF
2,Cape Mendocino_MPA_2017,2017,MPA
3,Cape Mendocino_MPA_2018,2018,MPA
4,Cape Mendocino_MPA_2019,2019,MPA


In [17]:
## Change MPA and REF to something more interpretable

habitat_dict = {
    'REF':'fished area',
    'MPA':'marine protected area'
}
converted['habitat'] = converted['habitat'].replace(habitat_dict)
converted.head()

Unnamed: 0,eventID,year,habitat
0,Trinidad_REF_2018,2018,fished area
1,Trinidad_REF_2019,2019,fished area
2,Cape Mendocino_MPA_2017,2017,marine protected area
3,Cape Mendocino_MPA_2018,2018,marine protected area
4,Cape Mendocino_MPA_2019,2019,marine protected area


In [18]:
## Add decimal latitude and decimal longitude

converted['decimalLatitude'] = data_long['Lat Center Point']
converted['decimallongitude'] = data_long['Lon Center Point']
converted.head()

Unnamed: 0,eventID,year,habitat,decimalLatitude,decimallongitude
0,Trinidad_REF_2018,2018,fished area,41.115,-124.173
1,Trinidad_REF_2019,2019,fished area,41.115,-124.173
2,Cape Mendocino_MPA_2017,2017,marine protected area,40.426,-124.478
3,Cape Mendocino_MPA_2018,2018,marine protected area,40.426,-124.478
4,Cape Mendocino_MPA_2019,2019,marine protected area,40.426,-124.478


**Need to remember to double check whether lats and longs are in WGS84.**
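
Code cannot tell which datum the coordinates use; that has to come from the data provider. What can be sketched is a basic range check to catch gross errors, plus recording the datum in the Darwin Core `geodeticDatum` term once it is confirmed. The toy frame below reuses three center points from the tables above, and the value `'WGS84'` is an assumption until verified.

```python
import pandas as pd

toy = pd.DataFrame({
    'decimalLatitude': [41.115, 40.426, 32.815],
    'decimalLongitude': [-124.173, -124.478, -117.298],
})

# Range check only: this catches swapped or mistyped values, not a wrong datum
lat_ok = toy['decimalLatitude'].between(-90, 90)
lon_ok = toy['decimalLongitude'].between(-180, 180)
coords_ok = bool((lat_ok & lon_ok).all())
print('All coordinates in valid range:', coords_ok)

# Add only after the provider confirms the datum (assumed here)
toy['geodeticDatum'] = 'WGS84'
```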

In [19]:
## Add occurrenceID

converted['occurrenceID'] = range(1, converted.shape[0]+1)
converted.head()

Unnamed: 0,eventID,year,habitat,decimalLatitude,decimallongitude,occurrenceID
0,Trinidad_REF_2018,2018,fished area,41.115,-124.173,1
1,Trinidad_REF_2019,2019,fished area,41.115,-124.173,2
2,Cape Mendocino_MPA_2017,2017,marine protected area,40.426,-124.478,3
3,Cape Mendocino_MPA_2018,2018,marine protected area,40.426,-124.478,4
4,Cape Mendocino_MPA_2019,2019,marine protected area,40.426,-124.478,5
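
A bare row number is unique within this file but says nothing about where the record came from. One possible alternative, sketched here as an option rather than a settled convention, is to build the occurrenceID from the eventID plus a counter within each event. The `_occ` separator is a placeholder I made up for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    'eventID': ['Trinidad_REF_2018', 'Trinidad_REF_2019',
                'Trinidad_REF_2018', 'Trinidad_REF_2019'],
})

# Number the occurrences within each event (1, 2, ...) and append to the eventID
within_event = toy.groupby('eventID').cumcount() + 1
toy['occurrenceID'] = toy['eventID'] + '_occ' + within_event.astype(str)

# The identifier must still be unique across the whole table
assert toy['occurrenceID'].is_unique
print(toy)
```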


#### Use new WoRMS functions to add in scientific name information

In [26]:
## Get unique scientific names, remove nan's

sci_names = data_long['scientific_names'].dropna().unique()
sci_names[0:5]

array(['Paralabrax nebulifer', 'Myliobatis californica',
       'Hippoglossina stomata', 'Sebastes chrysomelas',
       'Sebastes melanops'], dtype=object)

In [27]:
%%time

## Call run_get_worms_from_scientific_name

name_id_dict, name_name_dict, name_taxid_dict = WoRMS.run_get_worms_from_scientific_name(sci_names)

Wall time: 1min 10s


**Note** that right now, WoRMS is matching the Olive or Yellowtail Rockfish category with Sebastes.

In [43]:
## Add scientific name-related columns

converted['scientificName'] = data_long['scientific_names']

converted['scientificNameID'] = data_long['scientific_names'].replace(name_id_dict)

converted['taxonID'] = data_long['scientific_names'].replace(name_taxid_dict)
converted.head()

Unnamed: 0,eventID,year,habitat,decimalLatitude,decimallongitude,occurrenceID,scientificName,scientificNameID,taxonID
0,Trinidad_REF_2018,2018,fished area,41.115,-124.173,1,Paralabrax nebulifer,urn:lsid:marinespecies.org:taxname:282059,282059.0
1,Trinidad_REF_2019,2019,fished area,41.115,-124.173,2,Paralabrax nebulifer,urn:lsid:marinespecies.org:taxname:282059,282059.0
2,Cape Mendocino_MPA_2017,2017,marine protected area,40.426,-124.478,3,Paralabrax nebulifer,urn:lsid:marinespecies.org:taxname:282059,282059.0
3,Cape Mendocino_MPA_2018,2018,marine protected area,40.426,-124.478,4,Paralabrax nebulifer,urn:lsid:marinespecies.org:taxname:282059,282059.0
4,Cape Mendocino_MPA_2019,2019,marine protected area,40.426,-124.478,5,Paralabrax nebulifer,urn:lsid:marinespecies.org:taxname:282059,282059.0


**Note:** taxonID is currently a float instead of an integer because the column contains NaN values. This can be handled once we decide what to do with the 'Unknown' species category.
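
If the missing values end up staying, pandas' nullable integer dtype is one way to keep whole-number IDs alongside them. A minimal sketch on a toy column:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'taxonID': [282059.0, np.nan, 282059.0]})

# 'Int64' (capital I) is the nullable integer dtype; NaN becomes <NA>
toy['taxonID'] = toy['taxonID'].astype('Int64')
print(toy['taxonID'])
```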

In [46]:
## Add final name-related columns

converted['nameAccordingTo'] = 'WoRMS'
converted['occurrenceStatus'] = 'present'
converted['basisOfRecord'] = 'HumanObservation'

converted.head()

Unnamed: 0,eventID,year,habitat,decimalLatitude,decimallongitude,occurrenceID,scientificName,scientificNameID,taxonID,nameAccordingTo,occurrenceStatus,basisOfRecord
0,Trinidad_REF_2018,2018,fished area,41.115,-124.173,1,Paralabrax nebulifer,urn:lsid:marinespecies.org:taxname:282059,282059.0,WoRMS,present,HumanObservation
1,Trinidad_REF_2019,2019,fished area,41.115,-124.173,2,Paralabrax nebulifer,urn:lsid:marinespecies.org:taxname:282059,282059.0,WoRMS,present,HumanObservation
2,Cape Mendocino_MPA_2017,2017,marine protected area,40.426,-124.478,3,Paralabrax nebulifer,urn:lsid:marinespecies.org:taxname:282059,282059.0,WoRMS,present,HumanObservation
3,Cape Mendocino_MPA_2018,2018,marine protected area,40.426,-124.478,4,Paralabrax nebulifer,urn:lsid:marinespecies.org:taxname:282059,282059.0,WoRMS,present,HumanObservation
4,Cape Mendocino_MPA_2019,2019,marine protected area,40.426,-124.478,5,Paralabrax nebulifer,urn:lsid:marinespecies.org:taxname:282059,282059.0,WoRMS,present,HumanObservation


#### Add count data

In [47]:
## Add count data

converted['individualCount'] = data_long['count']
converted.head()

Unnamed: 0,eventID,year,habitat,decimalLatitude,decimallongitude,occurrenceID,scientificName,scientificNameID,taxonID,nameAccordingTo,occurrenceStatus,basisOfRecord,individualCount
0,Trinidad_REF_2018,2018,fished area,41.115,-124.173,1,Paralabrax nebulifer,urn:lsid:marinespecies.org:taxname:282059,282059.0,WoRMS,present,HumanObservation,0
1,Trinidad_REF_2019,2019,fished area,41.115,-124.173,2,Paralabrax nebulifer,urn:lsid:marinespecies.org:taxname:282059,282059.0,WoRMS,present,HumanObservation,0
2,Cape Mendocino_MPA_2017,2017,marine protected area,40.426,-124.478,3,Paralabrax nebulifer,urn:lsid:marinespecies.org:taxname:282059,282059.0,WoRMS,present,HumanObservation,0
3,Cape Mendocino_MPA_2018,2018,marine protected area,40.426,-124.478,4,Paralabrax nebulifer,urn:lsid:marinespecies.org:taxname:282059,282059.0,WoRMS,present,HumanObservation,0
4,Cape Mendocino_MPA_2019,2019,marine protected area,40.426,-124.478,5,Paralabrax nebulifer,urn:lsid:marinespecies.org:taxname:282059,282059.0,WoRMS,present,HumanObservation,0


### How might we add in the CPUE data?

In [52]:
## Load CPUE data

path = 'C:\\Users\\dianalg\\Documents\\Work\\MBARI\\MPA Data Integration\\CCFRP\\'
filename = 'CPUE_Avg.csv'
cpue = pd.read_csv(path+filename)

filename = 'CPUE_SE.csv'
cpue_err = pd.read_csv(path+filename)

cpue.head()

Unnamed: 0,Area,Site,Lat Center Point,Lon Center Point,Year,Barred Sand Bass,Bat Ray,Bigmouth Sole,Black-and-Yellow Rockfish,Black Rockfish,...,Vermilion Rockfish,White Croaker,White Seabass,Widow Rockfish,Wolf Eel,Yelloweye Rockfish,Yellowfin Croaker,Yellowtail Jack,Yellowtail Rockfish,Total
0,Trinidad,REF,41.115,-124.173,2018,0.0,0.0,0.0,0.0,9.404579,...,0.027174,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.297565,11.96054
1,Trinidad,REF,41.115,-124.173,2019,0.0,0.0,0.0,0.0,5.344784,...,0.014205,0.0,0.0,0.0,0.0,0.013587,0.0,0.0,0.221865,7.006875
2,Cape Mendocino,MPA,40.426,-124.478,2017,0.0,0.0,0.0,0.0,3.205102,...,0.253702,0.0,0.0,0.0,0.0,0.08411,0.0,0.0,0.116012,6.506929
3,Cape Mendocino,MPA,40.426,-124.478,2018,0.0,0.0,0.0,0.0,1.649172,...,0.557115,0.0,0.0,0.0,0.0,0.284795,0.0,0.0,0.200139,8.507731
4,Cape Mendocino,MPA,40.426,-124.478,2019,0.0,0.0,0.0,0.0,1.434477,...,0.429367,0.0,0.0,0.0,0.0,0.172666,0.0,0.0,0.278409,6.552735


In [53]:
## Perform initial processing steps and convert to long-form

# Drop species 'Total'
cpue.drop('Total', axis=1, inplace=True)
cpue_err.drop('Total', axis=1, inplace=True)

# Melt data (take the id columns from each frame itself rather than from `data`)
cpue_long = pd.melt(cpue, id_vars=cpue.columns[0:5].tolist(), var_name='species_common_name', value_name='cpue')
cpue_err_long = pd.melt(cpue_err, id_vars=cpue_err.columns[0:5].tolist(), var_name='species_common_name', value_name='cpue_se')
cpue_long.head()

Unnamed: 0,Area,Site,Lat Center Point,Lon Center Point,Year,species_common_name,cpue
0,Trinidad,REF,41.115,-124.173,2018,Barred Sand Bass,0.0
1,Trinidad,REF,41.115,-124.173,2019,Barred Sand Bass,0.0
2,Cape Mendocino,MPA,40.426,-124.478,2017,Barred Sand Bass,0.0
3,Cape Mendocino,MPA,40.426,-124.478,2018,Barred Sand Bass,0.0
4,Cape Mendocino,MPA,40.426,-124.478,2019,Barred Sand Bass,0.0


In [54]:
## Add error column to cpue

cpue_long['cpue_se'] = cpue_err_long['cpue_se']
cpue_long.head()

Unnamed: 0,Area,Site,Lat Center Point,Lon Center Point,Year,species_common_name,cpue,cpue_se
0,Trinidad,REF,41.115,-124.173,2018,Barred Sand Bass,0.0,0.0
1,Trinidad,REF,41.115,-124.173,2019,Barred Sand Bass,0.0,0.0
2,Cape Mendocino,MPA,40.426,-124.478,2017,Barred Sand Bass,0.0,0.0
3,Cape Mendocino,MPA,40.426,-124.478,2018,Barred Sand Bass,0.0,0.0
4,Cape Mendocino,MPA,40.426,-124.478,2019,Barred Sand Bass,0.0,0.0
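
Copying the column across by position works only if both files list rows and species in exactly the same order. A minimal sketch of a key-based join that does not depend on row order, shown on toy frames (the standard error values are made up):

```python
import pandas as pd

keys = ['Area', 'Site', 'Year', 'species_common_name']

toy_cpue = pd.DataFrame({
    'Area': ['Trinidad', 'Trinidad'],
    'Site': ['REF', 'REF'],
    'Year': [2018, 2019],
    'species_common_name': ['Black Rockfish', 'Black Rockfish'],
    'cpue': [9.404579, 5.344784],
})
# Same keys, deliberately in a different row order
toy_se = pd.DataFrame({
    'Area': ['Trinidad', 'Trinidad'],
    'Site': ['REF', 'REF'],
    'Year': [2019, 2018],
    'species_common_name': ['Black Rockfish', 'Black Rockfish'],
    'cpue_se': [0.5, 0.9],
})

# validate='one_to_one' raises MergeError if any key combination is duplicated
joined = toy_cpue.merge(toy_se, on=keys, how='left', validate='one_to_one')
print(joined)
```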


In [55]:
## Check that number of rows for count and cpue data match

print(converted.shape)
cpue_long.shape

(14220, 13)


(14220, 8)

In [58]:
## Add CPUE data to converted

converted['measurmentType'] = 'average catch per unit effort'
converted['measurementValue'] = cpue_long['cpue']
converted['measurementAccuracy'] = cpue_long['cpue_se']
converted['measurementUnit'] = 'number of fish per angler hour'

converted.head()

Unnamed: 0,eventID,year,habitat,decimalLatitude,decimallongitude,occurrenceID,scientificName,scientificNameID,taxonID,nameAccordingTo,occurrenceStatus,basisOfRecord,individualCount,measurmentType,measurementValue,measurementAccuracy,measurementUnit
0,Trinidad_REF_2018,2018,fished area,41.115,-124.173,1,Paralabrax nebulifer,urn:lsid:marinespecies.org:taxname:282059,282059.0,WoRMS,present,HumanObservation,0,average catch per unit effort,0.0,0.0,number of fish per angler hour
1,Trinidad_REF_2019,2019,fished area,41.115,-124.173,2,Paralabrax nebulifer,urn:lsid:marinespecies.org:taxname:282059,282059.0,WoRMS,present,HumanObservation,0,average catch per unit effort,0.0,0.0,number of fish per angler hour
2,Cape Mendocino_MPA_2017,2017,marine protected area,40.426,-124.478,3,Paralabrax nebulifer,urn:lsid:marinespecies.org:taxname:282059,282059.0,WoRMS,present,HumanObservation,0,average catch per unit effort,0.0,0.0,number of fish per angler hour
3,Cape Mendocino_MPA_2018,2018,marine protected area,40.426,-124.478,4,Paralabrax nebulifer,urn:lsid:marinespecies.org:taxname:282059,282059.0,WoRMS,present,HumanObservation,0,average catch per unit effort,0.0,0.0,number of fish per angler hour
4,Cape Mendocino_MPA_2019,2019,marine protected area,40.426,-124.478,5,Paralabrax nebulifer,urn:lsid:marinespecies.org:taxname:282059,282059.0,WoRMS,present,HumanObservation,0,average catch per unit effort,0.0,0.0,number of fish per angler hour


**Question:** How to specify that measurementAccuracy is standard error? Where to provide information on how average CPUE is obtained? How many replicates go into this average?
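
One possible way to make the standard error explicit is to store the mean and its standard error as two separate measurement rows, each with its own measurementType, instead of putting the standard error in measurementAccuracy. This is a sketch of one option to raise with Patrick, not an established convention for this data set; the measurementType wording and the toy standard errors are placeholders.

```python
import pandas as pd

toy = pd.DataFrame({
    'occurrenceID': [1, 2],
    'cpue': [9.404579, 5.344784],
    'cpue_se': [0.9, 0.5],  # made-up standard errors
})

# One row per (occurrence, measurement) instead of one column per measurement
mof = toy.melt(id_vars='occurrenceID', value_vars=['cpue', 'cpue_se'],
               var_name='measurementType', value_name='measurementValue')
mof['measurementType'] = mof['measurementType'].map({
    'cpue': 'average catch per unit effort',
    'cpue_se': 'standard error of average catch per unit effort',
})
mof['measurementUnit'] = 'number of fish per angler hour'
mof = mof.sort_values(['occurrenceID', 'measurementType']).reset_index(drop=True)
print(mof)
```

Details on how the average is computed and how many replicates go into it could then go in free-text fields such as measurementMethod or measurementRemarks.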

### Save

In [59]:
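## Save

# Correct two misspelled column names so they match the Darwin Core terms
converted.rename(columns={'decimallongitude': 'decimalLongitude',
                          'measurmentType': 'measurementType'}, inplace=True)
converted.to_csv('CCFRP_converted.csv', index=False, na_rep='NaN')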

### Questions for Patrick

1. Grouped olive and yellowtail rockfish: Eliminate, leave at Sebastes, include occurrenceRemarks?
2. Include 'Unknown' as a species category? Could identify as Pisces and clarify that the species was unknown in occurrenceRemarks?
3. How to include 'Area' field? Is there anything appropriate in Location?
4. How to deal with the fact that lat, long are center points of an area? 
5. Is it OK for occurrenceID just to be a row number?
6. I feel like including zeros is important in these data, but it's not really a "presence" record then. 
7. Do we need to include some kind of attribution under institutionCode? Or will the information in the submission be enough?
8. How to specify that measurementAccuracy is standard error? Where to provide information on how average CPUE is obtained? How many replicates go into this average?
9. **Where are all the places we're hoping to submit the MPA data?**
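
For question 6, one option to discuss is deriving occurrenceStatus from the count, so that zero counts become explicit absence records instead of being labeled present. A minimal sketch on a toy column, assuming the zeros are kept in the data set:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'individualCount': [0, 708, 22, 0]})

# Zero counts are marked 'absent'; anything above zero is 'present'
toy['occurrenceStatus'] = np.where(toy['individualCount'] > 0, 'present', 'absent')
print(toy)
```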

*Also, ask Patrick lingering VARS question when you have him on the phone.*