# Sound propagation model data processing

This notebook brings in the sound propagation modeling results and links them to the occurrence records.

Sound propagation modeling data are available on Google Cloud at https://console.cloud.google.com/storage/browser/noaa-passive-bioacoustic/sanctsound/products/sound_propagation_models;tab=objects?prefix=&forceOnObjectsSortingFiltering=false

We can use the [`gsutil cp`](https://cloud.google.com/storage/docs/gsutil/commands/cp) command to recursively download files from Google Cloud to a local directory (here, the current directory).

```
gsutil cp -r gs://noaa-passive-bioacoustic/sanctsound/products/sound_propagation_models/ci01/sanctsound_ci01_propmodeling/data/ .
```

But we only want the netCDF data, and from each file we only want the `listening_range` value.

```
ds = xr.open_dataset('SanctSound_CI01_propmodeling_SD0001m_SL165dB_FQ01000Hz_Apr_radarformat_highres.nc')
ds.variables['listening_range']

Out[6]:
<xarray.Variable (month: 1)>
array([2815])
Attributes:
    long_name:    distance_from_hydrophone_to_zero_SNR
    Description:  The median distance from the hydrophone to a zero signal-to...
    units:        m
```



In [1]:
import pandas as pd
import xarray as xr

# Function to download public files.

From https://cloud.google.com/storage/docs/access-public-data#storage-download-public-object-python

```
conda install google-cloud-storage
```

https://console.cloud.google.com/storage/browser/_details/noaa-passive-bioacoustic/sanctsound/products/sound_propagation_models/ci01/sanctsound_ci01_propmodeling/data/SanctSound_CI01_propmodeling_SD0001m_SL165dB_FQ01000Hz_Apr_radarformat_highres.nc;tab=live_object

In [2]:
# Download the readme for the noaa-passive-bioacoustic bucket

from google.cloud import storage
import os

storage_client = storage.Client.create_anonymous_client()

bucket_name = 'noaa-passive-bioacoustic'
delimiter='/'
bucket=storage_client.get_bucket(bucket_name)
blobs=bucket.list_blobs(delimiter=delimiter) #List all objects that satisfy the filter.


for blob in blobs:
    print(blob.name)
    if not os.path.exists(blob.name):
        blob.download_to_filename(blob.name)

README.pdf


In [3]:
from IPython.display import IFrame

IFrame(blob.name, width=900, height=1200)

Install `gsutil` from https://cloud.google.com/storage/docs/gsutil_install

and recursively download netCDF files from `noaa-passive-bioacoustic/sanctsound/products/sound_propagation_models`

The gsutil URI for one of the datasets is:
```
gs://noaa-passive-bioacoustic/sanctsound/products/sound_propagation_models/ci01/sanctsound_ci01_propmodeling/data/SanctSound_CI01_propmodeling_SD0001m_SL165dB_FQ01000Hz_Apr_radarformat_highres.nc
```

Data citation:
```
NOAA National Centers for Environmental Information. 2017. Passive Acoustic Data Collection. NOAA National Centers for Environmental Information.
https://doi.org/10.25921/PF0H-SQ72. access date
```

In [4]:
# for one station
#!gcloud storage ls gs://noaa-passive-bioacoustic/sanctsound/products/sound_propagation_models/sb03/**/*.nc
    
# for all stations

temp = !gcloud storage ls gs://noaa-passive-bioacoustic/sanctsound/products/sound_propagation_models/**/*.nc

print(f'Found {len(temp)} files:')    

for file in temp:
    print(file.split('/')[-1])

Found 924 files:
SanctSound_CI01_propmodeling_SD0001m_SL165dB_FQ01000Hz_Apr_radarformat_highres.nc
SanctSound_CI01_propmodeling_SD0001m_SL165dB_FQ01000Hz_Jan_radarformat_highres.nc
SanctSound_CI01_propmodeling_SD0001m_SL165dB_FQ01000Hz_Jul_radarformat_highres.nc
SanctSound_CI01_propmodeling_SD0001m_SL165dB_FQ01000Hz_Oct_radarformat_highres.nc
SanctSound_CI01_propmodeling_SD0001m_SL165dB_FQ05000Hz_Apr_radarformat_highres.nc
SanctSound_CI01_propmodeling_SD0001m_SL165dB_FQ05000Hz_Jan_radarformat_highres.nc
SanctSound_CI01_propmodeling_SD0001m_SL165dB_FQ05000Hz_Jul_radarformat_highres.nc
SanctSound_CI01_propmodeling_SD0001m_SL165dB_FQ05000Hz_Oct_radarformat_highres.nc
SanctSound_CI01_propmodeling_SD0010m_SL185dB_FQ00125Hz_Apr_radarformat_highres.nc
SanctSound_CI01_propmodeling_SD0010m_SL185dB_FQ00125Hz_Jan_radarformat_highres.nc
SanctSound_CI01_propmodeling_SD0010m_SL185dB_FQ00125Hz_Jul_radarformat_highres.nc
SanctSound_CI01_propmodeling_SD0010m_SL185dB_FQ00125Hz_Oct_radarformat_highres.nc

In [5]:
## BE CAREFUL WITH THIS >900 FILES!
#
# !gsutil cp gs://noaa-passive-bioacoustic/sanctsound/products/sound_propagation_models/**/*.nc data\sound_propagation\

In [6]:
# We might be able to use xarray to read the data directly from Google Cloud:

# url = 'gs://noaa-passive-bioacoustic/sanctsound/products/sound_propagation_models/ci01/sanctsound_ci01_propmodeling/data/SanctSound_CI01_propmodeling_SD0001m_SL165dB_FQ01000Hz_Apr_radarformat_highres.nc'

# xr.open_dataset(url,
#                 engine="netcdf4"
#                )

# Match occurrence data with Sound propagation model

Now we can bring in the occurrence data and determine how to link the two resources together.

As a reminder, the occurrence data were collected from https://coastwatch.pfeg.noaa.gov/erddap/search/index.html?page=1&itemsPerPage=1000&searchFor=noaaSanctSound

In [7]:
df_occur = pd.read_csv('data/occurrence.zip', compression='zip')

df_occur.head(5)

Unnamed: 0,WKT,decimalLatitude,decimalLongitude,vernacularName,scientificName,scientificNameID,taxonRank,kingdom,eventDate,occurrenceID
0,POINT (31.396417 -80.8904),31.396417,-80.8904,dolphin,Cetacea,urn:lsid:marinespecies.org:taxname:2688,Infraorder,Animalia,2018-12-15T04:00:00.000000Z,noaaSanctSound_GR01_01_dolphins_1h_2018-12-15T...
1,POINT (31.396417 -80.8904),31.396417,-80.8904,dolphin,Cetacea,urn:lsid:marinespecies.org:taxname:2688,Infraorder,Animalia,2018-12-15T05:00:00.000000Z,noaaSanctSound_GR01_01_dolphins_1h_2018-12-15T...
2,POINT (31.396417 -80.8904),31.396417,-80.8904,dolphin,Cetacea,urn:lsid:marinespecies.org:taxname:2688,Infraorder,Animalia,2018-12-15T06:00:00.000000Z,noaaSanctSound_GR01_01_dolphins_1h_2018-12-15T...
3,POINT (31.396417 -80.8904),31.396417,-80.8904,dolphin,Cetacea,urn:lsid:marinespecies.org:taxname:2688,Infraorder,Animalia,2018-12-15T07:00:00.000000Z,noaaSanctSound_GR01_01_dolphins_1h_2018-12-15T...
4,POINT (31.396417 -80.8904),31.396417,-80.8904,dolphin,Cetacea,urn:lsid:marinespecies.org:taxname:2688,Infraorder,Animalia,2018-12-15T18:00:00.000000Z,noaaSanctSound_GR01_01_dolphins_1h_2018-12-15T...


Let's collect unique station and locality identifiers to match to propagation results.

In [8]:
df_occur['occurrenceID'].str.split('_',expand=True)[1].unique()

array(['GR01', 'GR02', 'GR03', 'CI01', 'CI02', 'CI03', 'CI04', 'CI05',
       'MB01', 'MB02', 'MB03', 'OC02', 'OC03', 'OC04', 'SB02', 'SB03',
       'FK01', 'FK02', 'FK03', 'HI03', 'HI04', 'PM05', 'SB01', 'OC01'],
      dtype=object)
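
Splitting on `_` works as long as every `occurrenceID` has the same layout. A slightly more defensive sketch uses `str.extract`, so that IDs which do not match show up as `NaN` instead of silently producing a wrong token. The ID strings below are copied from the truncated displays in this notebook (the trailing `...` is the display cut-off), and reading the two-digit token after the site as a deployment number is an assumption.

```python
import pandas as pd

# occurrenceID values as shown in the truncated notebook output ("..." is the cut-off).
ids = pd.Series([
    "noaaSanctSound_GR01_01_dolphins_1h_2018-12-15T...",
    "noaaSanctSound_CI01_01_bluewhale_2018-11-07T22...",
    "noaaSanctSound_SB03_08_finwhale_1d_2020-02-28T...",
])

# Assumed layout: <prefix>_<site>_<deployment>_...; only the first three tokens are used.
parts = ids.str.extract(
    r"^(?P<prefix>[^_]+)_(?P<site>[A-Z]{2}\d{2})_(?P<deployment>\d{2})_"
)

# IDs that do not follow the assumed layout come back as NaN in every column.
unmatched = int(parts["site"].isna().sum())

print(parts)
print(f"unmatched IDs: {unmatched}")
```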

# Investigate downloaded sound propagation files

We only downloaded a subset of the propagation model data for testing. There are **924** files, so we should explore how we might be able to do this work without downloading all the data.

What does our sound propagation model output look like? Let's look at the first file that we downloaded.

In [9]:
import os

directory = 'data/sound_propagation/'

fname = os.listdir(directory)[0]

print(fname,'\n')

ds = xr.open_dataset(directory+fname, engine='netcdf4')

ds.info()

SanctSound_CI01_propmodeling_SD0001m_SL165dB_FQ01000Hz_Apr_radarformat_highres.nc 

xarray.Dataset {
dimensions:
	month = 1 ;
	depth = 1 ;
	bearing = 361 ;
	range = 13853 ;

variables:
	float64 month(month) ;
		month:long_name = month_of_climatological_sound_speed_profiles ;
		month:description = month # for GDEM sound speed profiles: 1-Jan, 2-Feb etc ;
		month:units = 1 ;
	float64 depth(depth) ;
		depth:long_name = sound_source_depth ;
		depth:description = Depth of the sound source ;
		depth:units = m ;
	float64 bearing(bearing) ;
		bearing:axis = Y ;
		bearing:long_name = true_north_bearing_from_hydrophone ;
		bearing:units = degrees_true ;
	float64 range(range) ;
		range:axis = X ;
		range:long_name = range_away_from_hydrophone ;
		range:units = km ;
	|S4 site() ;
		site:long_name = SanctSound_site_name ;
	float64 latitude(bearing, range) ;
		latitude:standard_name = latitude ;
		latitude:units = degrees_north ;
	float64 longitude(bearing, range) ;
		longitude:standard_name = longi

We know we want the data from the variable `listening_range`.

In [10]:
ds['listening_range']

Grab all the `listening_range` values (in meters) along with their file names to see if we can match them to the occurrence data.

In [11]:

# Collect one small dataframe per file, then concatenate once at the end
frames = []

fnames = os.listdir(directory)

for fname in fnames:
    
    with xr.open_dataset(directory+fname, engine='netcdf4') as ds:
    
        df_temp = ds[['listening_range','depth','site']].to_dataframe().reset_index()

        # add the file name and values parsed from the global attributes
        df_temp['fname'] = fname
        df_temp['site'] = df_temp['site'].values[0].decode('utf-8')
        df_temp['freq_Hz'] = int(ds.SoundSourcefrequency.replace('Hz','').strip())
        df_temp['hydrophone_depth_m'] = ds.HydrophoneDepth.replace('m','').strip()
        df_temp.rename(columns={'listening_range':'listening_range_m','month':'climatology'}, inplace=True)
        
        frames.append(df_temp)

df_listening_range = pd.concat(frames)
        
df_listening_range.sample(5)

Unnamed: 0,climatology,depth,listening_range_m,site,fname,freq_Hz,hydrophone_depth_m
0,4.0,10.0,59866,CI03,SanctSound_CI03_propmodeling_SD0010m_SL185dB_F...,125,22.7
0,7.0,20.0,8862,CI01,SanctSound_CI01_propmodeling_SD0020m_SL170dB_F...,300,17.5
0,4.0,10.0,21261,CI04,SanctSound_CI04_propmodeling_SD0010m_SL185dB_F...,125,152.5
0,7.0,10.0,63579,CI03,SanctSound_CI03_propmodeling_SD0010m_SL185dB_F...,125,22.7
0,7.0,20.0,128344,CI02,SanctSound_CI02_propmodeling_SD0020m_SL192dB_F...,63,73.5


Print out the dataframe to share via chat

In [12]:
# pd.set_option('display.max_colwidth', None)

# columns = ['site', 'climatology', 'freq_Hz', 'depth', 'listening_range_m']  # 'month' was renamed to 'climatology' above

# print(df_listening_range.sort_values(by=columns, ascending=True).to_csv(columns=columns,index=False))

Find occurrence records associated with the station `CI01`.

In [13]:
df_occur.loc[df_occur['occurrenceID'].str.contains('CI01')]

Unnamed: 0,WKT,decimalLatitude,decimalLongitude,vernacularName,scientificName,scientificNameID,taxonRank,kingdom,eventDate,occurrenceID
3447,POINT (34.0438 -120.0811),34.0438,-120.0811,blue whale,Balaenoptera musculus,urn:lsid:marinespecies.org:taxname:137090,Species,Animalia,2018-11-07T22:02:01.456000Z,noaaSanctSound_CI01_01_bluewhale_2018-11-07T22...
3448,POINT (34.0438 -120.0811),34.0438,-120.0811,blue whale,Balaenoptera musculus,urn:lsid:marinespecies.org:taxname:137090,Species,Animalia,2018-11-15T15:18:02.648000Z,noaaSanctSound_CI01_01_bluewhale_2018-11-15T15...
3449,POINT (34.0438 -120.0811),34.0438,-120.0811,blue whale,Balaenoptera musculus,urn:lsid:marinespecies.org:taxname:137090,Species,Animalia,2018-11-15T15:18:55.896000Z,noaaSanctSound_CI01_01_bluewhale_2018-11-15T15...
3450,POINT (34.0438 -120.0811),34.0438,-120.0811,blue whale,Balaenoptera musculus,urn:lsid:marinespecies.org:taxname:137090,Species,Animalia,2018-11-15T15:19:49.144000Z,noaaSanctSound_CI01_01_bluewhale_2018-11-15T15...
3451,POINT (34.0438 -120.0811),34.0438,-120.0811,blue whale,Balaenoptera musculus,urn:lsid:marinespecies.org:taxname:137090,Species,Animalia,2018-11-15T15:22:30.936000Z,noaaSanctSound_CI01_01_bluewhale_2018-11-15T15...
...,...,...,...,...,...,...,...,...,...,...
709891,POINT (34.0436 -120.0803),34.0436,-120.0803,plainfin midshipman,Porichthys notatus,urn:lsid:marinespecies.org:taxname:275658,Species,Animalia,2021-09-07T02:57:13.500000Z,noaaSanctSound_CI01_08_plainfinmidshipman_2021...
709892,POINT (34.0436 -120.0803),34.0436,-120.0803,plainfin midshipman,Porichthys notatus,urn:lsid:marinespecies.org:taxname:275658,Species,Animalia,2021-09-08T02:59:29.500000Z,noaaSanctSound_CI01_08_plainfinmidshipman_2021...
709893,POINT (34.0436 -120.0803),34.0436,-120.0803,plainfin midshipman,Porichthys notatus,urn:lsid:marinespecies.org:taxname:275658,Species,Animalia,2021-09-09T03:05:29.500000Z,noaaSanctSound_CI01_08_plainfinmidshipman_2021...
709894,POINT (34.0436 -120.0803),34.0436,-120.0803,plainfin midshipman,Porichthys notatus,urn:lsid:marinespecies.org:taxname:275658,Species,Animalia,2021-09-10T02:59:30.500000Z,noaaSanctSound_CI01_08_plainfinmidshipman_2021...


Now find the **listening ranges** for station `CI01` for the January climatology (month = 1.0).

In [14]:
df_listening_range.loc[(df_listening_range['site']=='CI01') & (df_listening_range['climatology']==1.0)].sort_values(by=['freq_Hz'])

Unnamed: 0,climatology,depth,listening_range_m,site,fname,freq_Hz,hydrophone_depth_m
0,1.0,15.0,4861,CI01,SanctSound_CI01_propmodeling_SD0015m_SL189dB_F...,20,17.5
0,1.0,20.0,15001,CI01,SanctSound_CI01_propmodeling_SD0020m_SL192dB_F...,63,17.5
0,1.0,10.0,18270,CI01,SanctSound_CI01_propmodeling_SD0010m_SL185dB_F...,125,17.5
0,1.0,20.0,9492,CI01,SanctSound_CI01_propmodeling_SD0020m_SL170dB_F...,300,17.5
0,1.0,1.0,4593,CI01,SanctSound_CI01_propmodeling_SD0001m_SL165dB_F...,1000,17.5
0,1.0,1.0,3028,CI01,SanctSound_CI01_propmodeling_SD0001m_SL165dB_F...,5000,17.5
0,1.0,19.0,7325,CI01,SanctSound_CI01_propmodeling_SD0019m_SL176dB_F...,12000,17.5


Let's look at the SanctSound website and see how we might be able to link these together.

https://sanctsound.portal.axds.co/#sanctsound/sanctuary/channel-islands/site/CI01

Since the propagation model data are separated into quarterly climatologies for months 1, 4, 7, and 10, we can use pandas to group the occurrence records by quarter, starting in January.

In [15]:
df_occur['eventDate'] = pd.to_datetime(df_occur['eventDate'])

df_occur['site'] = df_occur['occurrenceID'].str.split("_",expand=True)[1]

df_occur.sample(5)

Unnamed: 0,WKT,decimalLatitude,decimalLongitude,vernacularName,scientificName,scientificNameID,taxonRank,kingdom,eventDate,occurrenceID,site
350645,POINT (34.018 -119.3168),34.018,-119.3168,blue whale,Balaenoptera musculus,urn:lsid:marinespecies.org:taxname:137090,Species,Animalia,2021-10-31 05:11:21.096000+00:00,noaaSanctSound_CI05_08_bluewhale_2021-10-31T05...,CI05
389315,POINT (36.798 -121.976),36.798,-121.976,blue whale,Balaenoptera musculus,urn:lsid:marinespecies.org:taxname:137090,Species,Animalia,2019-11-20 17:07:53.344000+00:00,noaaSanctSound_MB01_03_bluewhale_2019-11-20T17...,MB01
712108,POINT (42.255235 -70.1796283),42.255235,-70.179628,fin whale,Balaenoptera physalus,urn:lsid:marinespecies.org:taxname:137091,Species,Animalia,2020-02-28 00:00:00+00:00,noaaSanctSound_SB03_08_finwhale_1d_2020-02-28T...,SB03
710672,POINT (36.6484 -121.9075),36.6484,-121.9075,plainfin midshipman,Porichthys notatus,urn:lsid:marinespecies.org:taxname:275658,Species,Animalia,2020-07-18 04:02:04.500000+00:00,noaaSanctSound_MB02_05_plainfinmidshipman_2020...,MB02
232183,POINT (33.8489 -120.1171),33.8489,-120.1171,blue whale,Balaenoptera musculus,urn:lsid:marinespecies.org:taxname:137090,Species,Animalia,2020-09-21 07:59:18.240000+00:00,noaaSanctSound_CI04_05_bluewhale_2020-09-21T07...,CI04


Map the months to the appropriate climatology

Months | Climatology
-------|------------
1,2,3 | 1
4,5,6 | 4
7,8,9 | 7
10,11,12 | 10

See this gist for confirmation they line up: <https://gist.github.com/ocefpaf/412a6ddcfa3524862160653f1718da5f>

In [16]:
df_occur['climatology'] = pd.Series(dtype=float)

mask = (df_occur['eventDate'].dt.quarter==1)

df_occur.loc[mask,'climatology'] = 1

mask = (df_occur['eventDate'].dt.quarter==2)

df_occur.loc[mask,'climatology'] = 4

mask = (df_occur['eventDate'].dt.quarter==3)

df_occur.loc[mask,'climatology'] = 7

mask = (df_occur['eventDate'].dt.quarter==4)

df_occur.loc[mask, 'climatology'] = 10
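
The same month-to-climatology mapping can be written as one vectorized expression, since quarter `q` starts in month `3 * (q - 1) + 1`. The sketch below checks that on twelve illustrative timestamps (one per calendar month, not taken from the occurrence data). The result is cast to float on the assumption that it needs to match the float `month` values read from the netCDF files.

```python
import pandas as pd

# Illustrative timestamps, one per calendar month (not from the occurrence data).
event_dates = pd.Series(pd.date_range("2020-01-01", periods=12, freq="MS", tz="UTC"))

# Quarter q (1-4) starts in month 3*(q-1)+1, i.e. 1, 4, 7, 10.
climatology = ((event_dates.dt.quarter - 1) * 3 + 1).astype(float)

print(pd.DataFrame({"month": event_dates.dt.month, "climatology": climatology}))
```

Applied to the notebook's data, the equivalent would be assigning the expression on `df_occur['eventDate']` to `df_occur['climatology']` in a single step.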

Use [`DataFrame.merge`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html) to combine the two datasets.

First, however, we pick an appropriate frequency. In this example we choose 125 Hz.

**Note: We have not copied all of the listening range data over, so some occurrence records will have no match (their `_merge` value is `left_only`).**

In [17]:
df_listening_range_125 = df_listening_range.loc[df_listening_range['freq_Hz']==125]

df_combined = df_occur.merge(df_listening_range_125,how='left', on=['site','climatology'], indicator=True)

df_combined.sample(10)

Unnamed: 0,WKT,decimalLatitude,decimalLongitude,vernacularName,scientificName,scientificNameID,taxonRank,kingdom,eventDate,occurrenceID,site,climatology,depth,listening_range_m,fname,freq_Hz,hydrophone_depth_m,_merge
454226,POINT (36.7977 -121.9757),36.7977,-121.9757,blue whale,Balaenoptera musculus,urn:lsid:marinespecies.org:taxname:137090,Species,Animalia,2020-12-10 23:52:10.256000+00:00,noaaSanctSound_MB01_06_bluewhale_2020-12-10T23...,MB01,10.0,,,,,,left_only
45913,POINT (34.0853 -120.5223),34.0853,-120.5223,blue whale,Balaenoptera musculus,urn:lsid:marinespecies.org:taxname:137090,Species,Animalia,2020-10-04 19:46:18.704000+00:00,noaaSanctSound_CI02_05_bluewhale_2020-10-04T19...,CI02,10.0,10.0,91118.0,SanctSound_CI02_propmodeling_SD0010m_SL185dB_F...,125.0,73.5,both
355187,POINT (36.798 -121.976),36.798,-121.976,blue whale,Balaenoptera musculus,urn:lsid:marinespecies.org:taxname:137090,Species,Animalia,2018-12-08 02:52:04.088000+00:00,noaaSanctSound_MB01_01_bluewhale_2018-12-08T02...,MB01,10.0,,,,,,left_only
85323,POINT (34.0855 -120.5224),34.0855,-120.5224,blue whale,Balaenoptera musculus,urn:lsid:marinespecies.org:taxname:137090,Species,Animalia,2021-08-13 03:25:43.800000+00:00,noaaSanctSound_CI02_07_bluewhale_2021-08-13T03...,CI02,7.0,10.0,91118.0,SanctSound_CI02_propmodeling_SD0010m_SL185dB_F...,125.0,73.5,both
147975,POINT (33.84888 -120.117),33.84888,-120.117,blue whale,Balaenoptera musculus,urn:lsid:marinespecies.org:taxname:137090,Species,Animalia,2019-08-13 21:55:40.208000+00:00,noaaSanctSound_CI04_03_bluewhale_2019-08-13T21...,CI04,7.0,10.0,22493.0,SanctSound_CI04_propmodeling_SD0010m_SL185dB_F...,125.0,152.5,both
339343,POINT (34.0178 -119.3171),34.0178,-119.3171,blue whale,Balaenoptera musculus,urn:lsid:marinespecies.org:taxname:137090,Species,Animalia,2020-12-11 19:13:20.120000+00:00,noaaSanctSound_CI05_06_bluewhale_2020-12-11T19...,CI05,10.0,,,,,,left_only
300729,POINT (33.8485 -120.1159),33.8485,-120.1159,blue whale,Balaenoptera musculus,urn:lsid:marinespecies.org:taxname:137090,Species,Animalia,2021-08-23 23:30:07.872000+00:00,noaaSanctSound_CI04_08_bluewhale_2021-08-23T23...,CI04,7.0,10.0,22493.0,SanctSound_CI04_propmodeling_SD0010m_SL185dB_F...,125.0,152.5,both
319202,POINT (33.8485 -120.1159),33.8485,-120.1159,blue whale,Balaenoptera musculus,urn:lsid:marinespecies.org:taxname:137090,Species,Animalia,2021-09-26 05:44:55.184000+00:00,noaaSanctSound_CI04_08_bluewhale_2021-09-26T05...,CI04,7.0,10.0,22493.0,SanctSound_CI04_propmodeling_SD0010m_SL185dB_F...,125.0,152.5,both
578077,POINT (36.37021 -122.314903),36.37021,-122.314903,blue whale,Balaenoptera musculus,urn:lsid:marinespecies.org:taxname:137090,Species,Animalia,2020-10-05 21:12:36.704000+00:00,noaaSanctSound_MB03_04_bluewhale_2020-10-05T21...,MB03,10.0,,,,,,left_only
341378,POINT (34.0178 -119.3171),34.0178,-119.3171,blue whale,Balaenoptera musculus,urn:lsid:marinespecies.org:taxname:137090,Species,Animalia,2021-01-10 11:08:39.592000+00:00,noaaSanctSound_CI05_06_bluewhale_2021-01-10T11...,CI05,1.0,,,,,,left_only
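
To see how large the gaps are, the `_merge` indicator can be tabulated per site. The sketch below uses small made-up stand-ins for `df_occur` and `df_listening_range_125` (all values are invented for illustration). It also passes `validate="many_to_one"`, which makes pandas raise an error if the listening-range table has more than one row per `site`/`climatology` pair; whether that holds for the real 125 Hz subset is an assumption worth checking, because duplicate keys would duplicate occurrence rows.

```python
import pandas as pd

# Toy stand-ins for df_occur and df_listening_range_125; all values are made up.
occur = pd.DataFrame({
    "site": ["CI02", "CI02", "MB01", "CI05"],
    "climatology": [10.0, 7.0, 10.0, 1.0],
})
ranges = pd.DataFrame({
    "site": ["CI02", "CI02"],
    "climatology": [7.0, 10.0],
    "listening_range_m": [1000, 2000],
})

combined = occur.merge(
    ranges,
    how="left",
    on=["site", "climatology"],
    indicator=True,
    validate="many_to_one",  # fail loudly if a site/climatology pair is duplicated in ranges
)

# Count matched ("both") and unmatched ("left_only") occurrence rows per site.
coverage = pd.crosstab(combined["site"], combined["_merge"].astype(str))
print(coverage)
```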
