# Processing EK60 Data to Compute MVBS

## Step 1: Setting Up Configuration Files
We begin by importing the required libraries and specifying the paths to the dataset and pipeline configuration files. The dataset configuration describes the input data and the output location, and the pipeline configuration describes the processing stages to run.

In [1]:
from pathlib import Path
from echoflow import echoflow_start, StorageType, glob_url

dataset_config = Path("./datastore.yaml").resolve()
pipeline_config = Path("./pipeline.yaml").resolve()

### Step 1.1: Configuration Files

Familiarize yourself with the configuration options by exploring the documentation for:

- Pipeline Configuration: Learn about configuration settings for the pipeline by referring to [Pipeline Configuration](./pipelineconfiguration.md).

- Datastore Configuration: Understand the various configuration options related to data storage by reading [Datastore Configuration](./datastoreconfiguration.md).

These documents provide detailed information on the configurations used during the setup process.

## Step 2: Getting Data
Next, we'll use the glob_url function to retrieve the list of URLs matching a pattern. In this case, we're targeting raw EK60 data files from the SH1707 survey. The second argument holds the storage options; anon=True requests anonymous access to the public bucket.

In [2]:
all_files = glob_url("s3://ncei-wcsd-archive/data/raw/Bell_M._Shimada/SH1707/EK60/*.raw", {"anon": True})

severe performance issues, see also https://github.com/dask/dask/issues/10276

To fix, you should specify a lower version bound on s3fs, or
update the current installation.
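The glob pattern used above is the same one stored as a template in the dataset configuration for this run (see the `urlpath` and `parameters` entries printed in Step 4). As a minimal sketch of how such a template expands, the hypothetical helper `render_template` below substitutes each `{{ name }}` placeholder with a regular expression. It is an illustration only and is not echoflow's own rendering code.

```python
import re


def render_template(template: str, params: dict) -> str:
    """Replace each ``{{ name }}`` placeholder with ``params[name]``.

    Hypothetical helper for illustration; not part of echoflow.
    """
    def substitute(match: re.Match) -> str:
        return str(params[match.group(1)])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


# Values copied from the dataset configuration printed in Step 4
urlpath = "s3://ncei-wcsd-archive/data/raw/{{ ship_name }}/{{ survey_name }}/{{ sonar_model }}/*.raw"
parameters = {
    "ship_name": "Bell_M._Shimada",
    "survey_name": "SH1707",
    "sonar_model": "EK60",
}

pattern = render_template(urlpath, parameters)
print(pattern)
```

The rendered string is identical to the pattern passed to glob_url in this step, so the listing retrieved here should correspond to the files the pipeline sees.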



## Step 3: Preparing Files
We'll now extract the file names from the URLs and write the first five to a transect listing file. Limiting the listing to five files keeps this example run short.

In [3]:
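files = []
for file in all_files:
    name = file.split("/")[-1]               # drop the S3 prefix, keep the file name
    files.append(name.rsplit(".raw", 1)[0])  # drop only the trailing ".raw" extension

# Write the first five file names to the transect listing;
# the context manager closes the file even if a write fails
with open("EK60_SH1707_Shimada.txt", "w") as transect:
    for f in files[:5]:
        transect.write(f + ".raw\n")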

## Step 4: Processing with echoflow
Now, we're ready to kick off the data processing using echoflow. We'll provide the dataset and pipeline configurations, along with additional options.

In [4]:
options = {"storage_options_override": False}
data = echoflow_start(dataset_config=dataset_config, pipeline_config=pipeline_config, options=options)


Checking Configuration



Configuration Check Completed

Checking Connection to Prefect Server

Starting the Pipeline




Dataset Configuration Loaded For This Run
--------------------------------------------------
{'name': 'Bell_M._Shimada-SH1707-EK60', 'sonar_model': 'EK60', 'raw_regex': '(.*)-?D(?P<date>\\w{1,8})-T(?P<time>\\w{1,6})', 'args': {'urlpath': 's3://ncei-wcsd-archive/data/raw/{{ ship_name }}/{{ survey_name }}/{{ sonar_model }}/*.raw', 'parameters': {'ship_name': 'Bell_M._Shimada', 'survey_name': 'SH1707', 'sonar_model': 'EK60'}, 'storage_options': {'anon': True}, 'transect': {'file': './EK60_SH1707_Shimada.txt'}, 'default_transect_num': 2017, 'json_export': True}, 'output': {'urlpath': 's3://echoflow-workground/test-flow-9-1-24', 'retention': False, 'overwrite': True, 'storage_options': {'block_name': 'echoflow-aws-credentials', 'type': 'AWS'}}}


Pipeline Configuration Loaded For This Run
--------------------------------------------------
{'active_recipe': 'standard', 'use_local_dask': True, 'n_workers': 3, 'pipeline': [{'recipe_name': 'standard', 'stages': [{'name': 'echoflow_open_raw', 


Initiliazing Singleton Object

Reading Configurations


Total Files ['s3://ncei-wcsd-archive/data/raw/Bell_M._Shimada/SH1707/EK60/Summer2017-D20170615-T190214.raw', 's3://ncei-wcsd-archive/data/raw/Bell_M._Shimada/SH1707/EK60/Summer2017-D20170615-T190843.raw', 's3://ncei-wcsd-archive/data/raw/Bell_M._Shimada/SH1707/EK60/Summer2017-D20170615-T212409.raw', 's3://ncei-wcsd-archive/data/raw/Bell_M._Shimada/SH1707/EK60/Summer2017-D20170615-T212933.raw', 's3://ncei-wcsd-archive/data/raw/Bell_M._Shimada/SH1707/EK60/Summer2017-D20170615-T213153.raw', 's3://ncei-wcsd-archive/data/raw/Bell_M._Shimada/SH1707/EK60/Summer2017-D20170615-T214606.raw', 's3://ncei-wcsd-archive/data/raw/Bell_M._Shimada/SH1707/EK60/Summer2017-D20170615-T214712.raw', 's3://ncei-wcsd-archive/data/raw/Bell_M._Shimada/SH1707/EK60/Summer2017-D20170615-T214734.raw', 's3://ncei-wcsd-archive/data/raw/Bell_M._Shimada/SH1707/EK60/Summer2017-D20170619-T225736.raw', 's3://ncei-wcsd-archive/data/raw/Bell_M._Shimada/SH1707/EK60/Summer2017-D20170619-T231647.raw', 's3://ncei-wcsd-archive/dat

[{'instrument': 'EK60', 'file_path': 's3://ncei-wcsd-archive/data/raw/Bell_M._Shimada/SH1707/EK60/Summer2017-D20170615-T190214.raw', 'transect_num': 2017, 'month': 6, 'year': 2017, 'jday': 166, 'datetime': '2017-06-15T19:02:14'}, {'instrument': 'EK60', 'file_path': 's3://ncei-wcsd-archive/data/raw/Bell_M._Shimada/SH1707/EK60/Summer2017-D20170615-T190843.raw', 'transect_num': 2017, 'month': 6, 'year': 2017, 'jday': 166, 'datetime': '2017-06-15T19:08:43'}, {'instrument': 'EK60', 'file_path': 's3://ncei-wcsd-archive/data/raw/Bell_M._Shimada/SH1707/EK60/Summer2017-D20170615-T212409.raw', 'transect_num': 2017, 'month': 6, 'year': 2017, 'jday': 166, 'datetime': '2017-06-15T21:24:09'}, {'instrument': 'EK60', 'file_path': 's3://ncei-wcsd-archive/data/raw/Bell_M._Shimada/SH1707/EK60/Summer2017-D20170615-T212933.raw', 'transect_num': 2017, 'month': 6, 'year': 2017, 'jday': 166, 'datetime': '2017-06-15T21:29:33'}, {'instrument': 'EK60', 'file_path': 's3://ncei-wcsd-archive/data/raw/Bell_M._Shimad

Storing JSON Metadata


Out path is  s3://echoflow-workground/test-flow-9-1-24/json_metadata/Bell_M._Shimada-SH1707-EK60.json


<Client: 'tcp://127.0.0.1:58713' processes=3 threads=18, memory=15.63 GiB>
Scheduler at :  tcp://127.0.0.1:58713
--------------------------------------------------

Executing stage :  name='echoflow_open_raw' module='echoflow.stages.subflows.open_raw' external_params=None options={'save_raw_file': True, 'use_raw_offline': True, 'use_offline': True} prefect_config=None
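The per-file metadata in the log above (`month`, `year`, `jday`, `datetime`) is consistent with applying the `raw_regex` from the dataset configuration to each file name. The sketch below shows one way to reproduce those fields with the standard library. The function name `parse_raw_name` is hypothetical, and the way echoflow applies the pattern internally is an assumption.

```python
import re
from datetime import datetime

# raw_regex as it appears in the dataset configuration printed above
RAW_REGEX = r"(.*)-?D(?P<date>\w{1,8})-T(?P<time>\w{1,6})"


def parse_raw_name(name: str) -> dict:
    """Extract date and time fields from a raw file name (hypothetical helper)."""
    match = re.match(RAW_REGEX, name)
    if match is None:
        raise ValueError(f"{name!r} does not match raw_regex")
    # Combine the captured date and time groups into one timestamp
    stamp = datetime.strptime(match["date"] + match["time"], "%Y%m%d%H%M%S")
    return {
        "month": stamp.month,
        "year": stamp.year,
        "jday": stamp.timetuple().tm_yday,  # day of the year
        "datetime": stamp.isoformat(),
    }


info = parse_raw_name("Summer2017-D20170615-T190214.raw")
print(info)
```

For the first file in the listing this yields month 6, year 2017, day of year 166, and the timestamp 2017-06-15T19:02:14, matching the first metadata entry in the log.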


## Step 5: Results
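Finally, let's inspect the processed data returned by echoflow_start. The output below shows a nested list holding an xarray Dataset whose Sv variable contains the MVBS, indexed by channel, ping_time, and echo_range.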

In [5]:
data

[[<xarray.Dataset>
  Dimensions:            (channel: 3, ping_time: 456, echo_range: 38)
  Coordinates:
    * channel            (channel) <U37 'GPT  18 kHz 009072058c8d 1-1 ES18-11' ...
    * echo_range         (echo_range) float64 0.0 20.0 40.0 ... 700.0 720.0 740.0
    * ping_time          (ping_time) datetime64[ns] 2017-06-15T19:02:00 ... 201...
  Data variables:
      Sv                 (channel, ping_time, echo_range) float64 dask.array<chunksize=(3, 456, 38), meta=np.ndarray>
      frequency_nominal  (channel) float64 dask.array<chunksize=(3,), meta=np.ndarray>
      water_level        float64 ...
  Attributes:
      processing_function:          commongrid.compute_MVBS
      processing_software_name:     echopype
      processing_software_version:  0.8.2
      processing_time:              2024-01-09T21:08:09Z]]
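Because the result shown above is a list nested inside another list, it can be convenient to flatten it before working with the individual datasets. The sketch below uses a hypothetical `flatten` helper and placeholder strings in place of real datasets. The exact nesting returned by echoflow_start for other pipelines is an assumption.

```python
def flatten(nested):
    """Yield every non-list item from an arbitrarily nested list."""
    for item in nested:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item


# Placeholder strings standing in for the xarray datasets returned by echoflow_start
data = [["dataset_a"], ["dataset_b", "dataset_c"]]

datasets = list(flatten(data))
print(datasets)
```

With the real result, each element of `datasets` would be an xarray Dataset, so `datasets[0]["Sv"]` would select the MVBS variable from the first one.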