# Processing EK60 Data to Extract Target Strength


## Step 1: Fetch Configuration Files

We begin by importing the required libraries and specifying the paths for the dataset and pipeline configuration files. These files contain the necessary information for data processing.

In [1]:
from pathlib import Path
from echodataflow import echodataflow_start, glob_url

dataset_config = Path("./datastore.yaml").resolve()
pipeline_config = Path("./pipeline.yaml").resolve()
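
Note that `Path.resolve()` only normalizes a path; by default it does not check that the file exists. The following is a minimal sketch of an early existence check. `require_file` is a hypothetical helper introduced here for illustration and is not part of echodataflow.

```python
from pathlib import Path


def require_file(path: Path) -> Path:
    """Return the resolved path, or raise if it is not an existing file."""
    resolved = Path(path).resolve()
    if not resolved.is_file():
        raise FileNotFoundError(f"Configuration file not found: {resolved}")
    return resolved


# Possible usage with the paths from this notebook (uncomment to run):
# dataset_config = require_file(Path("./datastore.yaml"))
# pipeline_config = require_file(Path("./pipeline.yaml"))
```

Failing here gives a clearer error than a failure later inside the pipeline run.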

### Step 1.1: Configuration Files

Before running the notebook, review the documentation for the two configuration files:

- Pipeline Configuration: the stages the pipeline runs and their options. See [Pipeline Configuration](./pipelineconfiguration.md).

- Datastore Configuration: where the raw data are read from and where the output is stored. See [Datastore Configuration](./datastoreconfiguration.md).

## Step 2: Getting Data
Next, we use the `glob_url` function to list the URLs that match a glob pattern, here the raw EK60 data files from the SH1707 survey of the Bell M. Shimada. The `{'anon': True}` storage option requests anonymous access to the bucket.

In [2]:
all_files = glob_url("s3://ncei-wcsd-archive/data/raw/Bell_M._Shimada/SH1707/EK60/*.raw", {'anon': True})

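
The dataset configuration echoed in the Step 4 log describes the same location as a template, with `{{ ship_name }}`, `{{ survey_name }}` and `{{ sonar_model }}` placeholders filled from a `parameters` mapping. How echodataflow renders this template internally is not shown in this notebook; the sketch below uses a plain regular-expression substitution as a stand-in, and `render` is a hypothetical helper.

```python
import re

# Values copied from the dataset configuration printed in the Step 4 log
urlpath = "s3://ncei-wcsd-archive/data/raw/{{ ship_name }}/{{ survey_name }}/{{ sonar_model }}/*.raw"
parameters = {
    "ship_name": "Bell_M._Shimada",
    "survey_name": "SH1707",
    "sonar_model": "EK60",
}


def render(template: str, values: dict) -> str:
    """Replace each {{ key }} placeholder with values[key] (illustrative only)."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: values[m.group(1)], template)


rendered = render(urlpath, parameters)
print(rendered)
# s3://ncei-wcsd-archive/data/raw/Bell_M._Shimada/SH1707/EK60/*.raw
```

The rendered string is the same pattern passed to `glob_url` above.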



## Step 3: Preparing Files
We now strip each URL down to its file name and write the first 20 names to a transect listing, EK60_SH1707_Shimada.txt. The dataset configuration references this file under its `group` setting (visible in the Step 4 log).

In [3]:
# Strip the directory and the .raw extension from each URL
files = [Path(file).stem for file in all_files]

# Write the first 20 file names to the transect listing, one per line
with open('EK60_SH1707_Shimada.txt', 'w') as transect:
    for f in files[:20]:
        transect.write(f + ".raw\n")
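
To confirm what ends up in a listing, the file can be read back after writing. The sketch below does this in a temporary directory with made-up file names; `write_listing` and the `Example-...` names are hypothetical and serve only to exercise the idea.

```python
import tempfile
from pathlib import Path


def write_listing(names, path, limit=20):
    """Write at most `limit` names to `path`, one per line; return the count written."""
    selected = list(names)[:limit]
    Path(path).write_text("".join(f"{name}\n" for name in selected))
    return len(selected)


# Hypothetical file names, not taken from the SH1707 archive
example_names = [f"Example-D20170101-T{i:06d}.raw" for i in range(25)]

with tempfile.TemporaryDirectory() as tmp:
    listing = Path(tmp) / "listing.txt"
    written = write_listing(example_names, listing)
    lines = listing.read_text().splitlines()

print(written, lines[0], lines[-1])
```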

## Step 4: Processing with echodataflow
Now, we're ready to kick off the data processing using echodataflow. We'll provide the dataset and pipeline configurations, along with additional options.

In [4]:
options = {"storage_options_override": False}
data = echodataflow_start(
    dataset_config=dataset_config,
    pipeline_config=pipeline_config,
    options=options,
)


Checking Configuration

Configuration Check Completed

Checking Connection to Prefect Server

Starting the Pipeline
echodataflow_trigger.py  Echodataflow Trigger : Dataset Configuration Loaded For This Run
echodataflow_trigger.py  Echodataflow Trigger : --------------------------------------------------
echodataflow_trigger.py  Echodataflow Trigger : {"name": "Bell_M._Shimada-SH1707-EK60", "sonar_model": "EK60", "raw_regex": "(.*)-?D(?P<date>\\w{1,8})-T(?P<time>\\w{1,6})", "args": {"urlpath": "s3://ncei-wcsd-archive/data/raw/{{ ship_name }}/{{ survey_name }}/{{ sonar_model }}/*.raw", "parameters": {"ship_name": "Bell_M._Shimada", "survey_name": "SH1707", "sonar_model": "EK60"}, "storage_options": {"anon": true}, "group": {"file": "./EK60_SH1707_Shimada.txt"}, "group_name": 2017, "json_export": true}, "output": {"urlpath": "./echodataflow-output", "retention": true, "overwrite": true}}



initialization_flow.py  Init Flow : <Client: 'tcp://127.0.0.1:64186' processes=5 threads=20, memory=15.63 GiB>
initialization_flow.py  Init Flow : Scheduler at : tcp://127.0.0.1:64186
initialization_flow.py  Init Flow : --------------------------------------------------
initialization_flow.py  Init Flow : Executing stage : name='echodataflow_open_raw' module='echodataflow.stages.subflows.open_raw' external_params=None options={'save_raw_file': True, 'use_raw_offline': True, 'use_offline': True, 'use_dask': True} prefect_config=None dependson=None
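
The dataset configuration in the log above includes a `raw_regex` with named groups `date` and `time` (the doubled backslashes are JSON escaping). The sketch below applies that pattern to a made-up file name to show what the groups capture. The file name is hypothetical, and how echodataflow itself uses the match is not shown in this notebook.

```python
import re

# raw_regex from the logged dataset configuration, with JSON escaping removed
raw_regex = re.compile(r"(.*)-?D(?P<date>\w{1,8})-T(?P<time>\w{1,6})")

# Hypothetical file name following a D<date>-T<time> convention
name = "Example-D20170625-T001612.raw"

match = raw_regex.match(name)
fields = match.groupdict() if match else {}
print(fields)
# {'date': '20170625', 'time': '001612'}
```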

## Step 5: Results
Finally, let's take a look at the first entry from the processed data.

In [5]:
data[0][0]

Output (Dask-backed arrays in the returned entry, summarized by shape):

| Arrays | Shape | Chunk shape | Array bytes | Chunk bytes | Dask graph | Data type |
|---|---|---|---|---|---|---|
| 2 | (3, 19, 3888) | (2, 10, 1944) | 1.69 MiB | 303.75 kiB | 8 chunks in 2 graph layers | float64 numpy.ndarray |
| 8 | (3,) | (3,) | 24 B | 24 B | 1 chunks in 2 graph layers | float64 numpy.ndarray |
| 4 | (3, 19) | (3, 19) | 456 B | 456 B | 1 chunks in 2 graph layers | float64 numpy.ndarray |
| 1 | (1,) | (1,) | 612 B | 612 B | 1 chunks in 2 graph layers | (not shown) |
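
The byte sizes and chunk counts in this output follow from the array shapes. As a quick check, the sketch below recomputes them for the (3, 19, 3888) float64 array with (2, 10, 1944) chunks; `nbytes` and `n_chunks` are hypothetical helpers written for this example.

```python
import math

ITEMSIZE = 8  # bytes per float64 element


def nbytes(shape):
    """Size in bytes of a float64 array with the given shape."""
    return math.prod(shape) * ITEMSIZE


def n_chunks(shape, chunks):
    """Number of chunks when each dimension is split into blocks of the given size."""
    return math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))


shape, chunks = (3, 19, 3888), (2, 10, 1944)
array_mib = nbytes(shape) / 2**20   # whole array, in MiB
chunk_kib = nbytes(chunks) / 2**10  # one full-size chunk, in KiB
count = n_chunks(shape, chunks)

print(round(array_mib, 2), chunk_kib, count)
# 1.69 303.75 8
```

The same arithmetic gives 456 B for the (3, 19) arrays and 24 B for the (3,) arrays.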


**Congratulations!** You've successfully processed EK60 data using echodataflow. This notebook provides a simplified overview, and you can explore the capabilities of echodataflow for more advanced processing tasks.

Feel free to modify the parameters, paths, and configurations as needed to adapt to your data and requirements.