(data-format:raw-data)=
# Raw converted data

(data-format:echodata-object)=
## The `EchoData` object

`EchoData` is an object that handles raw converted data from either raw instrument files (via `open_raw`) or previously converted and standardized raw files (via `open_converted`). It is essentially a container for multiple xarray `Dataset` objects, each corresponding to one of the netCDF4 groups specified in the SONAR-netCDF4 convention. `EchoData` objects are used for accessing and exploring echosounder data, for calibration and other processing, and for [serializing into netCDF4 or Zarr file formats](convert.html#file-export).

A sample `EchoData` object is presented below, showing the hierarchical structure of the SONAR-netCDF4 version 1 groups. Click on a group to drill down to variables and attributes and to examine the structure and representative content of an `EchoData` object.

```python
from pathlib import Path
import echopype as ep

bucket = "ncei-wcsd-archive"
rawdirpath = "data/raw/Bell_M._Shimada/SH1707/EK60/Summer2017-D20170728-T181619.raw"
s3raw_fpath = f"s3://{bucket}/{rawdirpath}"
ed = ep.open_raw(s3raw_fpath, sonar_model='EK60', storage_options={'anon': True})

# Manually populate additional metadata about the dataset and the platform
# -- SONAR-netCDF4 Top-level Group attributes
ed['Top-level'].attrs['title'] = "2017 Pacific Hake Acoustic Trawl Survey"
ed['Top-level'].attrs['summary'] = (
    f"EK60 raw file {s3raw_fpath} from the {ed['Top-level'].attrs['title']}, "
    "converted to a SONAR-netCDF4 file using echopype."
)
# -- SONAR-netCDF4 Platform Group attributes
ed['Platform'].attrs['platform_type'] = "Research vessel"
ed['Platform'].attrs['platform_name'] = "Bell M. Shimada"
ed['Platform'].attrs['platform_code_ICES'] = "315"
```

```text
21:33:49  parsing file Summer2017-D20170728-T181619.raw, time of first ping: 2017-Jul-28 18:16:19
```

```python
ed
```

(data-format:mod-to-sonart-netcdf4)=
## Modifications to SONAR-netCDF4

Echopype follows the [ICES SONAR-netCDF4 convention ver.1](http://www.ices.dk/sites/pub/Publication%20Reports/Cooperative%20Research%20Report%20(CRR)/CRR341.pdf) when possible to create interoperable data. However, to fully leverage the power of label-aware manipulation provided by the [xarray](https://docs.xarray.dev/en/stable/) library and enhance coherence of data representation for scientific echosounders, we (the echopype developers) have made decisions to deviate from the convention in key aspects. These changes are explained below.

### Organization of multi-frequency data

Echopype implements a modification of the SONAR-netCDF4 data model that optimizes the efficiency and usability of data access and filtering (“slicing”) at the expense of potentially increased file storage. For each sonar beam, the convention defines data variables such as `backscatter_r` using a one-dimensional ragged array structure that uses a custom variable-length vector data type (`sample_t`), with `ping_time` as its coordinate dimension; each frequency channel is stored in a separate netCDF4 group (`Sonar/Beam_group1`, `Sonar/Beam_group2`, ...).

Echopype restructures this multi-group ragged array representation into a single-group, 4-dimensional gridded representation, with dimensions `(channel, range_sample, ping_time, beam)` across all channels. Here, the `ping_time` and `beam` dimensions are defined in the convention, whereas the `channel` and `range_sample` (along-range sample number) dimensions are echopype-specific modifications. Data from each frequency channel (i.e., each transducer, for echosounders) are mapped along the `channel` dimension, and echo data from each ping are mapped along the `range_sample` dimension. These consolidated, uniform multi-channel (or multi-frequency) [`DataArrays`](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.html) are stored in `Sonar/Beam_group1`, `Sonar/Beam_group2`, and potentially other such groups (`Sonar/Beam_group3`, etc.) in the netCDF data model.
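To make the gridded layout concrete, the following is a minimal sketch that builds a synthetic xarray `DataArray` with the four dimensions named above and selects from it by label. It does not use echopype itself: the channel labels, array sizes, and random values are made up for illustration, and a real `backscatter_r` variable will carry different coordinates and attributes.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for a gridded variable such as `backscatter_r`.
# Channel labels and sizes below are hypothetical, chosen only for illustration.
channels = ["ch_18kHz", "ch_38kHz", "ch_120kHz"]
ping_time = np.array(
    ["2017-07-28T18:16:19", "2017-07-28T18:16:20"], dtype="datetime64[ns]"
)
n_range_sample = 5
n_beam = 1

rng = np.random.default_rng(0)
backscatter_r = xr.DataArray(
    rng.normal(size=(len(channels), n_range_sample, ping_time.size, n_beam)),
    dims=("channel", "range_sample", "ping_time", "beam"),
    coords={
        "channel": channels,
        "range_sample": np.arange(n_range_sample),
        "ping_time": ping_time,
        "beam": ["1"],
    },
    name="backscatter_r",
)

# Label-based selection of one channel, then positional slicing along range.
subset = backscatter_r.sel(channel="ch_38kHz").isel(range_sample=slice(0, 3))
print(subset.dims)
print(subset.shape)
```

Because all channels share one array, selecting a channel is a single `.sel` call rather than a lookup across separate netCDF4 groups.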

:::{Note}
Due to flexibility in echosounder settings, the number of samples along sonar range (i.e., the length of the `range_sample` dimension) can differ across `ping_time` or `channel`. Echopype addresses this by padding pings or channels that have fewer samples with `NaN`, to maintain the uniform shape of the 4-dimensional gridded representation.

The `NaN` padding approach can consume a large amount of memory in some specific cases, depending on the echosounder setup. This is an issue we are actively working on. See [#489](https://github.com/OSOceanAcoustics/echopype/issues/489) for details.
:::
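The padding idea described in the note can be sketched with plain NumPy. This is an illustration only, not echopype's internal implementation: the helper `pad_pings_to_grid` is a hypothetical name, and it handles a single channel with a 2-dimensional `(range_sample, ping_time)` grid for brevity.

```python
import numpy as np


def pad_pings_to_grid(pings):
    """Stack variable-length 1-D pings into a (range_sample, ping_time) grid.

    Pings shorter than the longest one are padded with NaN at the far end.
    Hypothetical helper for illustration, not part of echopype.
    """
    n_samples = max(len(p) for p in pings)
    grid = np.full((n_samples, len(pings)), np.nan, dtype=np.float64)
    for i, p in enumerate(pings):
        grid[: len(p), i] = p
    return grid


# Three pings with 3, 1, and 2 samples respectively (made-up values).
ragged = [np.array([1.0, 2.0, 3.0]), np.array([4.0]), np.array([5.0, 6.0])]
grid = pad_pings_to_grid(ragged)

# Number of grid cells that hold padding rather than data.
n_padded = int(np.isnan(grid).sum())
print(grid.shape, n_padded)
```

The count of padded cells shows where the memory cost comes from: a single ping with many more samples than the rest forces every other ping to be padded to the same length.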

<!-- Below is a comparison of data representations defined in the convention and in echopype.

### ADD FIGURE -->

<!-- ### Other modifications

HERE PUT IN OTHER MODIFICATIONS WE HAVE MADE FOR ECHOPYPE SPECIFIC NEEDS. -->

## Data from different echosounders

### Power/Angle data

For single-beam setups, only the echo power (or intensity) data are available, and these data are stored in the variable `backscatter_r` (the `r` suffix denotes the real part of the signal). This is the case for data from the AZFP echosounder, or from EK60/EK80 echosounders paired with single-beam transducers (see below for more details on EK80 data).

For split-beam setups, the echo power data are similarly stored in the variable `backscatter_r`, but with the additional split-beam angle data for each sample (along `range_sample`) stored in variables `angle_alongship` and `angle_athwartship`. This is the case for data from the EK60 echosounder or the EK80 echosounder configured to store power/angle data.

All the above data variables (`backscatter_r`, `angle_alongship`, `angle_athwartship`) use the gridded representation with dimensions `(channel, range_sample, ping_time, beam)`. Here, the length of the `beam` dimension is 1. This length is intuitive for single-beam data. For split-beam data, the length of this dimension is also 1, because the power/angle data are already in a form derived from the split-beam transducer sectors. All data are stored in the `Sonar/Beam_group1` group.

### Complex data

A deviation from the above occurs when raw _complex_ samples are recorded by EK80 echosounders paired with split-beam transducers. In this case, both the `backscatter_r` and `backscatter_i` variables exist and contain the real and imaginary parts of the echo waveform data, respectively. These variables have dimensions `(channel, range_sample, ping_time, beam)` as before, but the length of the `beam` dimension can be 3 or 4, depending on the specific transducer used in the setup. The `angle_alongship` and `angle_athwartship` variables are not present in such files.
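As a rough illustration of how the two variables relate, the sketch below recombines synthetic real and imaginary parts into complex samples with NumPy. The array sizes and values are made up, and averaging across the `beam` dimension is only one simple way to collapse the sectors for a quick look; it is an assumption made here and should not be read as echopype's calibration procedure.

```python
import numpy as np

# Synthetic arrays laid out as (channel, range_sample, ping_time, beam),
# with 4 transducer sectors along the beam dimension. Sizes are hypothetical.
shape = (1, 6, 2, 4)
rng = np.random.default_rng(0)
backscatter_r = rng.normal(size=shape)  # real part
backscatter_i = rng.normal(size=shape)  # imaginary part

# Recombine the two real-valued variables into complex samples.
samples = backscatter_r + 1j * backscatter_i

# Illustrative reduction: average over sectors (last axis), then take the
# squared magnitude of the averaged complex signal.
sector_mean = samples.mean(axis=-1)
magnitude_sq = np.abs(sector_mean) ** 2
print(samples.dtype, magnitude_sq.shape)
```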

:::{Note}
It is possible for power/angle data and complex data to coexist in files collected by EK80 echosounders, since each frequency channel can be configured separately. In this case, the complex data are stored in the `Sonar/Beam_group1` group and the power/angle data are stored in the `Sonar/Beam_group2` group.
:::
