# Sound production to presence

This notebook uses the coastwatch erddap to collect species presence information from SanctSound sound production datasets.

It creates a `sanctsound_presence.zip` file that contains when animals were acoustically present at a specific time and location.

The notebook `presence_to_occurrence.ipynb` reads the results of this notebook and converts them to an occurrence table. 
The notebook `sound_propagation_processing.ipynb` reads the occurrence table and adds information about the coordinateUncertainty from sound propagation modeling data.


Let's search the [Coastwatch ERDDAP](https://coastwatch.pfeg.noaa.gov/erddap/index.html) for datasets that contain the following information:

```
sanctsound "Sound Production"
```

In [None]:
import erddapy

erddapy.__version__

'2.0.1'

In [1]:
from erddapy import ERDDAP
import pandas as pd

server = "https://coastwatch.pfeg.noaa.gov/erddap/"

protocol = "griddap"

search_for = 'sanctsound "Sound Production"'

e = ERDDAP(server=server, protocol=protocol)

url = e.get_search_url(search_for=search_for, response="csv")

datasets = pd.read_csv(url)[["Dataset ID","Title"]]

datasets

Unnamed: 0,Dataset ID,Title
0,noaaSanctSound_GR01_01_dolphins_1h,NOAA-Navy Sanctuary Soundscape Monitoring Proj...
1,noaaSanctSound_GR01_02_dolphins_1h,NOAA-Navy Sanctuary Soundscape Monitoring Proj...
2,noaaSanctSound_GR01_03_dolphins_1h,NOAA-Navy Sanctuary Soundscape Monitoring Proj...
3,noaaSanctSound_GR01_04_dolphins_1h,NOAA-Navy Sanctuary Soundscape Monitoring Proj...
4,noaaSanctSound_GR01_05_dolphins_1h,NOAA-Navy Sanctuary Soundscape Monitoring Proj...
...,...,...
690,noaaSanctSound_SB03_08_finwhale_1d,NOAA-Navy Sanctuary Soundscape Monitoring Proj...
691,noaaSanctSound_SB03_09_finwhale_1d,NOAA-Navy Sanctuary Soundscape Monitoring Proj...
692,noaaSanctSound_SB03_10_finwhale_1d,NOAA-Navy Sanctuary Soundscape Monitoring Proj...
693,noaaSanctSound_SB03_11_finwhale_1d,NOAA-Navy Sanctuary Soundscape Monitoring Proj...


# Next let's start building our presence table from each dataset

Now lets find the `start_time` and `end_time` when animals were present (e.g. `dolphin_presence == 1`).

To do this we look through the variables in the dataset to find the data variable which ends with the phrase `presence` or `detection_count`. This will return the string of the variable name (e.g. `dolphin_presence`). Then, we want to filter the dataset for when that variable only has values equal to 1.0 (or present). Then, we drop any entrys not equal to 1.0.

This returns a filtered xarray dataset of only presence values along with `start_time`, `end_time`, and all the associated metadata.

In [7]:
%%time

df_final = pd.DataFrame()

df_broken = pd.DataFrame()

for index, row in datasets.iterrows():
    
    print('\n{}/{} datasets\n'.format(index+1,datasets.shape[0]))
    print('querying {}'.format(row['Dataset ID']))
    
    e.dataset_id = row['Dataset ID']
    
    ds = e.to_xarray()
    
    time_var = list(ds.coords)[0]
    
    print('{} ({} rows): {}'.format(row['Dataset ID'],ds.coords.dims[time_var],ds.geospatial_bounds))
    
    # set up try/except to test for presence vars, skip datasets without them.
    try:
        da = [da for varname, da in ds.data_vars.items() if (varname.endswith("presence") | varname.endswith("detection_count"))][0]
    except:
        
        string = 'Skipping {} - no presence vars'.format(row['Dataset ID'])
        
        df_broke = pd.DataFrame({'Dataset ID': [row['Dataset ID']], 
                                'reason': [string]})
        df_broken = pd.concat([df_broken, df_broke])
        
        continue

    
    # subset to only presences (presence var == 1)
    ds_subset = da[da.values != 0]
    
    # kick out if ds_subset is empty
    if len(ds_subset[time_var]) == 0:
        
        string = '{} obs - moving to next dataset'.format(len(ds_subset[time_var]))
        
        df_broke = pd.DataFrame({'Dataset ID': [row['Dataset ID']], 
                                'reason': [string]})
        
        df_broken = pd.concat([df_broken, df_broke])
        
        continue
        
    df = ds_subset.to_dataframe().reset_index()
    
    print('Subsetted to ({} rows)'.format(len(ds_subset[time_var])))
    
    # Store global attributes as data.
    df['dataset_id'] = row['Dataset ID']
    df['WKT'] = ds.geospatial_bounds
    df['decimalLatitude'] = ds.geospatial_bounds.split(" ")[1].replace("(","")
    df['decimalLongitude'] = ds.geospatial_bounds.split(" ")[2].replace(")","")
    df['vernacularName'] = ds.title.split(",")[1].replace(" Sound Production","").replace(" Sound Producion","").lower().lstrip()
    
    df_final = pd.concat([df_final, df])

print(f'There were {df_broken.shape[0]} datasets that either didn\'t have a presence variable or didn\'t contain presence data')


1/695 datasets

querying noaaSanctSound_GR01_01_dolphins_1h
noaaSanctSound_GR01_01_dolphins_1h (3185 rows): POINT (31.396417 -80.8904)
Subsetted to (191 rows)

2/695 datasets

querying noaaSanctSound_GR01_02_dolphins_1h
noaaSanctSound_GR01_02_dolphins_1h (3029 rows): POINT (31.396417 -80.8904)
Subsetted to (56 rows)

3/695 datasets

querying noaaSanctSound_GR01_03_dolphins_1h
noaaSanctSound_GR01_03_dolphins_1h (2532 rows): POINT (31.396417 -80.8904)
Subsetted to (146 rows)

4/695 datasets

querying noaaSanctSound_GR01_04_dolphins_1h
noaaSanctSound_GR01_04_dolphins_1h (2693 rows): POINT (31.396417 -80.8904)
Subsetted to (110 rows)

5/695 datasets

querying noaaSanctSound_GR01_05_dolphins_1h
noaaSanctSound_GR01_05_dolphins_1h (3344 rows): POINT (31.396417 -80.8904)
Subsetted to (341 rows)

6/695 datasets

querying noaaSanctSound_GR02_02_dolphins_1h
noaaSanctSound_GR02_02_dolphins_1h (2741 rows): POINT (31.376133 -80.839133)
Subsetted to (291 rows)

7/695 datasets

querying noaaSanctSoun

In [9]:
df_final.sample(n=5)

Unnamed: 0,start_time,dolphin_presence,dataset_id,WKT,decimalLatitude,decimalLongitude,vernacularName,time,bluewhale_presence,bluewhale_manual_presence,...,pinniped_presence,redgrouper_detection_count,seiwhale_presence,atlanticcod_presence,blackgrouper_detection_count,humpbackwhale_presence,killerwhale_presence,minkewhale_presence,plainfinmidshipman_presence,northatlanticrightwhale_presence
19562,NaT,,noaaSanctSound_MB01_03_bluewhale,POINT (36.798 -121.976),36.798,-121.976,blue whale,2019-10-30 21:50:59.648,1.0,,...,,,,,,,,,,
28438,NaT,,noaaSanctSound_CI02_07_bluewhale,POINT (34.0855 -120.5224),34.0855,-120.5224,blue whale,2021-08-31 23:05:59.872,1.0,,...,,,,,,,,,,
13354,2020-04-24 10:33:53.125999872,,noaaSanctSound_CI03_03_bocaccio,POINT (33.48687 -119.01609),33.48687,-119.01609,bocaccio,NaT,,,...,,,,,,,,,,
46079,NaT,,noaaSanctSound_CI02_05_bluewhale,POINT (34.0853 -120.5223),34.0853,-120.5223,blue whale,2020-10-18 03:15:56.104,1.0,,...,,,,,,,,,,
6426,NaT,,noaaSanctSound_CI04_08_bluewhale,POINT (33.8485 -120.1159),33.8485,-120.1159,blue whale,2021-08-22 23:56:29.424,1.0,,...,,,,,,,,,,


## WoRMS Mapping
WoRMS lookup. Abby Benson created a mapping table which we will use below to insert the appropriate WoRMS idenfitiers.

In [10]:
df_mapping = pd.read_csv('SanctSound_SpeciesLookupTable.csv')

df_mapping

Unnamed: 0,vernacularName,scientificName,scientificNameID,taxonRank,kingdom,propagationFrequency
0,dolphin,Cetacea,urn:lsid:marinespecies.org:taxname:2688,Infraorder,Animalia,5000
1,blue whale,Balaenoptera musculus,urn:lsid:marinespecies.org:taxname:137090,Species,Animalia,63
2,bocaccio,Sebastes paucispinis,urn:lsid:marinespecies.org:taxname:274833,Species,Animalia,300
3,fin whale,Balaenoptera physalus,urn:lsid:marinespecies.org:taxname:137091,Species,Animalia,20
4,pinniped,Pinnipedia,urn:lsid:marinespecies.org:taxname:148736,Infraorder,Animalia,1000
5,red grouper,Epinephelus morio,urn:lsid:marinespecies.org:taxname:159354,Species,Animalia,125
6,sei whale,Balaenoptera borealis,urn:lsid:marinespecies.org:taxname:137088,Species,Animalia,63
7,black grouper,Mycteroperca bonaci,urn:lsid:marinespecies.org:taxname:159231,Species,Animalia,125
8,humpback whale,Megaptera novaeangliae,urn:lsid:marinespecies.org:taxname:137092,Species,Animalia,300
9,killer whale,Orcinus orca,urn:lsid:marinespecies.org:taxname:137102,Species,Animalia,1000


Now lets add in the WoRMS mapping for species information.

In [11]:
# merge in the WoRMS species information
df_presence = df_final.merge(df_mapping, how='left', on='vernacularName')  

df_presence.sample(5)

Unnamed: 0,start_time,dolphin_presence,dataset_id,WKT,decimalLatitude,decimalLongitude,vernacularName,time,bluewhale_presence,bluewhale_manual_presence,...,humpbackwhale_presence,killerwhale_presence,minkewhale_presence,plainfinmidshipman_presence,northatlanticrightwhale_presence,scientificName,scientificNameID,taxonRank,kingdom,propagationFrequency
181509,NaT,,noaaSanctSound_CI04_03_bluewhale,POINT (33.84888 -120.117),33.84888,-120.117,blue whale,2019-11-03 03:10:25.984000000,1.0,,...,,,,,,Balaenoptera musculus,urn:lsid:marinespecies.org:taxname:137090,Species,Animalia,63
381921,NaT,,noaaSanctSound_MB01_03_bluewhale,POINT (36.798 -121.976),36.798,-121.976,blue whale,2019-11-10 01:22:41.208000000,1.0,,...,,,,,,Balaenoptera musculus,urn:lsid:marinespecies.org:taxname:137090,Species,Animalia,63
247910,NaT,,noaaSanctSound_CI04_05_bluewhale,POINT (33.8489 -120.1171),33.8489,-120.1171,blue whale,2020-10-12 23:36:04.064000256,1.0,,...,,,,,,Balaenoptera musculus,urn:lsid:marinespecies.org:taxname:137090,Species,Animalia,63
128655,NaT,,noaaSanctSound_CI04_02_bluewhale,POINT (33.8489 -120.1175),33.8489,-120.1175,blue whale,2019-07-23 17:47:28.248000256,1.0,,...,,,,,,Balaenoptera musculus,urn:lsid:marinespecies.org:taxname:137090,Species,Animalia,63
270676,NaT,,noaaSanctSound_CI04_06_bluewhale,POINT (33.8489 -120.1174),33.8489,-120.1174,blue whale,2020-11-28 22:21:12.792000000,1.0,,...,,,,,,Balaenoptera musculus,urn:lsid:marinespecies.org:taxname:137090,Species,Animalia,63


In [12]:
df_presence.columns

Index(['start_time', 'dolphin_presence', 'dataset_id', 'WKT',
       'decimalLatitude', 'decimalLongitude', 'vernacularName', 'time',
       'bluewhale_presence', 'bluewhale_manual_presence', 'bocaccio_presence',
       'finwhale_presence', 'pinniped_presence', 'redgrouper_detection_count',
       'seiwhale_presence', 'atlanticcod_presence',
       'blackgrouper_detection_count', 'humpbackwhale_presence',
       'killerwhale_presence', 'minkewhale_presence',
       'plainfinmidshipman_presence', 'northatlanticrightwhale_presence',
       'scientificName', 'scientificNameID', 'taxonRank', 'kingdom',
       'propagationFrequency'],
      dtype='object')

## Determining `time`

Okay, we have two time variables: 
`start_time`, and `time`

We need to make one `eventDate`!

Let's first check to see if we can mash things together.

First, lets print out all the times when `time` has an entry:

In [13]:
df_presence.loc[df_presence['time'].notna(),['start_time','time']].sample(20)

Unnamed: 0,start_time,time
13763,NaT,2020-08-05 06:15:05.240000000
591552,NaT,2020-12-01 23:45:29.952000256
338704,NaT,2020-12-10 07:26:42.672000000
507469,NaT,2018-12-20 22:49:31.608000000
95345,NaT,2021-09-06 14:13:55.112000000
54322,NaT,2020-10-18 23:21:49.528000000
53154,NaT,2020-10-17 06:36:43.120000000
567323,NaT,2020-09-06 08:19:35.520000000
231718,NaT,2020-09-20 18:45:20.512000000
292438,NaT,2021-07-28 05:49:55.104000000


Okay, so let's see if `start_time` is only NaN for all those rows:

In [15]:
df_presence.loc[df_presence['time'].notna(),'start_time'].unique()

array(['NaT'], dtype='datetime64[ns]')

Looking good! We have only NaN's returned so we don't have conflicting dates between `start_time` and `time`.

Fantastic! So, this means we can make a new column for `eventDate` which merges `time` into `start_time`.

In [16]:
#df_presence_copy = df_presence.copy()

# start eventDate column with values where `time` exists.
df_presence['eventDate'] = df_presence.loc[df_presence['time'].notna(),['time']]

# fillna with values from start_time
df_presence['eventDate'].fillna(df_presence['start_time'], inplace=True)

df_presence[['eventDate','time','start_time']].sample(n=5)

Unnamed: 0,eventDate,time,start_time
116785,2019-06-27 13:06:12.336,2019-06-27 13:06:12.336,NaT
551719,2019-11-17 01:13:48.928,2019-11-17 01:13:48.928,NaT
470289,2021-01-31 03:24:13.920,2021-01-31 03:24:13.920,NaT
592650,2020-12-09 00:54:28.576,2020-12-09 00:54:28.576,NaT
428263,2020-10-19 14:05:28.296,2020-10-19 14:05:28.296,NaT


In [17]:
df_presence.loc[df_presence['eventDate'].isna()]

Unnamed: 0,start_time,dolphin_presence,dataset_id,WKT,decimalLatitude,decimalLongitude,vernacularName,time,bluewhale_presence,bluewhale_manual_presence,...,killerwhale_presence,minkewhale_presence,plainfinmidshipman_presence,northatlanticrightwhale_presence,scientificName,scientificNameID,taxonRank,kingdom,propagationFrequency,eventDate


## Double check we moved the right values

Show me where `time` is NaN and we used `start_time`.

In [18]:
df_presence.loc[df_presence['time'].isna(),['start_time','eventDate','time']].sample(5)

Unnamed: 0,start_time,eventDate,time
672147,2020-01-09 19:00:00.000000000,2020-01-09 19:00:00.000000000,NaT
625625,2020-04-13 04:24:32.341000192,2020-04-13 04:24:32.341000192,NaT
676220,2019-06-23 09:00:00.000000000,2019-06-23 09:00:00.000000000,NaT
657342,2019-09-24 05:44:58.614000128,2019-09-24 05:44:58.614000128,NaT
608600,2019-10-22 03:08:01.860000256,2019-10-22 03:08:01.860000256,NaT


Show me where `start_time` is NaN and we used `time`.

In [19]:
df_presence.loc[df_presence['start_time'].isna(),['start_time','eventDate','time']].sample(5)

Unnamed: 0,start_time,eventDate,time
409503,NaT,2020-08-17 08:24:09.200000000,2020-08-17 08:24:09.200000000
317524,NaT,2021-09-24 13:03:51.176000000,2021-09-24 13:03:51.176000000
216989,NaT,2020-08-25 05:17:54.328000000,2020-08-25 05:17:54.328000000
529394,NaT,2020-08-09 02:50:25.232000000,2020-08-09 02:50:25.232000000
124050,NaT,2019-07-13 19:48:09.192000256,2019-07-13 19:48:09.192000256


Now, lets make `eventDate` the index for our DataFrame so we can make a nice plot and output the dates in a format we like.

In [20]:
df_presence['eventDate'] = pd.to_datetime(df_presence['eventDate'], format='%Y-%m-%d %H:%M:%S.%f')

df_presence

Unnamed: 0,start_time,dolphin_presence,dataset_id,WKT,decimalLatitude,decimalLongitude,vernacularName,time,bluewhale_presence,bluewhale_manual_presence,...,killerwhale_presence,minkewhale_presence,plainfinmidshipman_presence,northatlanticrightwhale_presence,scientificName,scientificNameID,taxonRank,kingdom,propagationFrequency,eventDate
0,2018-12-15 04:00:00,1.0,noaaSanctSound_GR01_01_dolphins_1h,POINT (31.396417 -80.8904),31.396417,-80.8904,dolphin,NaT,,,...,,,,,Cetacea,urn:lsid:marinespecies.org:taxname:2688,Infraorder,Animalia,5000,2018-12-15 04:00:00
1,2018-12-15 05:00:00,1.0,noaaSanctSound_GR01_01_dolphins_1h,POINT (31.396417 -80.8904),31.396417,-80.8904,dolphin,NaT,,,...,,,,,Cetacea,urn:lsid:marinespecies.org:taxname:2688,Infraorder,Animalia,5000,2018-12-15 05:00:00
2,2018-12-15 06:00:00,1.0,noaaSanctSound_GR01_01_dolphins_1h,POINT (31.396417 -80.8904),31.396417,-80.8904,dolphin,NaT,,,...,,,,,Cetacea,urn:lsid:marinespecies.org:taxname:2688,Infraorder,Animalia,5000,2018-12-15 06:00:00
3,2018-12-15 07:00:00,1.0,noaaSanctSound_GR01_01_dolphins_1h,POINT (31.396417 -80.8904),31.396417,-80.8904,dolphin,NaT,,,...,,,,,Cetacea,urn:lsid:marinespecies.org:taxname:2688,Infraorder,Animalia,5000,2018-12-15 07:00:00
4,2018-12-15 18:00:00,1.0,noaaSanctSound_GR01_01_dolphins_1h,POINT (31.396417 -80.8904),31.396417,-80.8904,dolphin,NaT,,,...,,,,,Cetacea,urn:lsid:marinespecies.org:taxname:2688,Infraorder,Animalia,5000,2018-12-15 18:00:00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
713671,2020-11-18 00:00:00,,noaaSanctSound_SB03_12_finwhale_1d,POINT (42.25508 -70.179047),42.25508,-70.179047,fin whale,NaT,,,...,,,,,Balaenoptera physalus,urn:lsid:marinespecies.org:taxname:137091,Species,Animalia,20,2020-11-18 00:00:00
713672,2020-11-19 00:00:00,,noaaSanctSound_SB03_12_finwhale_1d,POINT (42.25508 -70.179047),42.25508,-70.179047,fin whale,NaT,,,...,,,,,Balaenoptera physalus,urn:lsid:marinespecies.org:taxname:137091,Species,Animalia,20,2020-11-19 00:00:00
713673,2020-11-20 00:00:00,,noaaSanctSound_SB03_12_finwhale_1d,POINT (42.25508 -70.179047),42.25508,-70.179047,fin whale,NaT,,,...,,,,,Balaenoptera physalus,urn:lsid:marinespecies.org:taxname:137091,Species,Animalia,20,2020-11-20 00:00:00
713674,2020-11-21 00:00:00,,noaaSanctSound_SB03_12_finwhale_1d,POINT (42.25508 -70.179047),42.25508,-70.179047,fin whale,NaT,,,...,,,,,Balaenoptera physalus,urn:lsid:marinespecies.org:taxname:137091,Species,Animalia,20,2020-11-21 00:00:00


## Write presence file

In [21]:
# overwrite to csv file
fname = 'data/sanctsound_presence.zip'
df_presence.to_csv(fname, index=False, compression='zip')

df_presence.sample(10)

Unnamed: 0,start_time,dolphin_presence,dataset_id,WKT,decimalLatitude,decimalLongitude,vernacularName,time,bluewhale_presence,bluewhale_manual_presence,...,killerwhale_presence,minkewhale_presence,plainfinmidshipman_presence,northatlanticrightwhale_presence,scientificName,scientificNameID,taxonRank,kingdom,propagationFrequency,eventDate
234983,NaT,,noaaSanctSound_CI04_05_bluewhale,POINT (33.8489 -120.1171),33.8489,-120.1171,blue whale,2020-09-23 23:51:51.280000000,1.0,,...,,,,,Balaenoptera musculus,urn:lsid:marinespecies.org:taxname:137090,Species,Animalia,63,2020-09-23 23:51:51.280000000
510351,NaT,,noaaSanctSound_MB02_03_bluewhale,POINT (36.6495 -121.9084),36.6495,-121.9084,blue whale,2019-10-14 16:59:19.216000000,1.0,,...,,,,,Balaenoptera musculus,urn:lsid:marinespecies.org:taxname:137090,Species,Animalia,63,2019-10-14 16:59:19.216000000
517690,NaT,,noaaSanctSound_MB02_03_bluewhale,POINT (36.6495 -121.9084),36.6495,-121.9084,blue whale,2019-11-15 11:09:46.360000256,1.0,,...,,,,,Balaenoptera musculus,urn:lsid:marinespecies.org:taxname:137090,Species,Animalia,63,2019-11-15 11:09:46.360000256
72830,NaT,,noaaSanctSound_CI02_07_bluewhale,POINT (34.0855 -120.5224),34.0855,-120.5224,blue whale,2021-07-13 10:04:36.968000000,1.0,,...,,,,,Balaenoptera musculus,urn:lsid:marinespecies.org:taxname:137090,Species,Animalia,63,2021-07-13 10:04:36.968000000
528180,NaT,,noaaSanctSound_MB02_05_bluewhale,POINT (36.6484 -121.9075),36.6484,-121.9075,blue whale,2020-07-14 17:41:54.488000000,1.0,,...,,,,,Balaenoptera musculus,urn:lsid:marinespecies.org:taxname:137090,Species,Animalia,63,2020-07-14 17:41:54.488000000
520102,NaT,,noaaSanctSound_MB02_03_bluewhale,POINT (36.6495 -121.9084),36.6495,-121.9084,blue whale,2019-12-01 16:28:58.192000256,1.0,,...,,,,,Balaenoptera musculus,urn:lsid:marinespecies.org:taxname:137090,Species,Animalia,63,2019-12-01 16:28:58.192000256
472381,2021-08-13 08:19:10.856,,noaaSanctSound_MB01_09_bluewhale,POINT (36.798 -122.9758),36.798,-122.9758,blue whale,NaT,1.0,,...,,,,,Balaenoptera musculus,urn:lsid:marinespecies.org:taxname:137090,Species,Animalia,63,2021-08-13 08:19:10.856000000
246192,NaT,,noaaSanctSound_CI04_05_bluewhale,POINT (33.8489 -120.1171),33.8489,-120.1171,blue whale,2020-10-10 04:17:40.000000000,1.0,,...,,,,,Balaenoptera musculus,urn:lsid:marinespecies.org:taxname:137090,Species,Animalia,63,2020-10-10 04:17:40.000000000
228775,NaT,,noaaSanctSound_CI04_05_bluewhale,POINT (33.8489 -120.1171),33.8489,-120.1171,blue whale,2020-09-12 07:26:43.552000000,1.0,,...,,,,,Balaenoptera musculus,urn:lsid:marinespecies.org:taxname:137090,Species,Animalia,63,2020-09-12 07:26:43.552000000
552081,NaT,,noaaSanctSound_MB03_02_bluewhale,POINT (36.37021 -122.314903),36.37021,-122.314903,blue whale,2019-11-21 03:49:34.480000000,1.0,,...,,,,,Balaenoptera musculus,urn:lsid:marinespecies.org:taxname:137090,Species,Animalia,63,2019-11-21 03:49:34.480000000
