# Processing notebook: preparing the draft for IoT Edge Device's main

## L0 - Unprocessed data

### Setting up project root for module imports in Jupyter Notebook

In [1]:
# Import the os module to work with the file system
import os

# Get the absolute path of the current notebook.
# This assumes the notebook's name is "L0_tour.ipynb".
# Note: "L0_tour.ipynb" is located in the "notebooks" folder of the "oceanstream" repo.
# This folder is typically ignored to maintain a clear organization of the project.

notebook_path = os.path.abspath("L0_tour.ipynb")
print(f"notebook_path: {notebook_path}")
# Navigate up two directories from the notebook's path to get to the project root
# os.pardir represents the parent directory, so using it twice moves up two levels
project_root = os.path.abspath(os.path.join(notebook_path, os.pardir, os.pardir))

# Print the project root path to verify
print(f"project_root: {project_root}")

# Import the sys module to modify the Python path
import sys

# Append the project root to the Python path so that modules from the project can be imported
sys.path.append(project_root)

notebook_path: /Users/simedroniraluca/Documents/pineview/oceanstream/notebooks/L0_tour.ipynb
project_root: /Users/simedroniraluca/Documents/pineview/oceanstream
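
The same setup can be sketched with `pathlib`, which avoids hard-coding the notebook's file name. This is only an illustration: the helper name `add_project_root` is hypothetical, and it assumes that the working directory (for example `notebooks`) sits a known number of levels below the project root.

```python
import sys
from pathlib import Path
from typing import Optional


def add_project_root(levels_up: int = 1, start: Optional[Path] = None) -> Path:
    """Return the directory `levels_up` levels above `start` and add it to sys.path.

    Hypothetical helper (not part of oceanstream). `start` defaults to the
    current working directory, assumed to be the folder holding the notebook.
    """
    root = (start or Path.cwd()).resolve()
    for _ in range(levels_up):
        root = root.parent
    root_str = str(root)
    # Avoid appending the same entry again when the cell is re-run.
    if root_str not in sys.path:
        sys.path.append(root_str)
    return root


project_root = add_project_root(levels_up=1)
print(f"project_root: {project_root}")
```

Checking `sys.path` before appending keeps the path list free of duplicates when the cell is executed several times in one session.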


Specify the path to the test data used in this notebook:

In [2]:
L0_tour_data_path = os.path.join(project_root, "test_data", "L0_tour_data")

### Files manipulation

In [3]:
from oceanstream.L0_unprocessed_data import (
    raw_reader,
)

**`raw_reader` remarks**
- The name `raw_reader` might be too specific, as the module does more than just deal with raw file reading.

**File finder**

In [4]:
# Use the `file_finder` function of `raw_reader` to search the "L0_tour_data" directory for files of a given type.
# Display all `.raw` files from the "L0_tour_data" directory
display(raw_reader.file_finder(L0_tour_data_path, file_type='raw'))
print("________________________________________________________________________________________________")

# Displaying all `.nc` files from the "L0_tour_data" directory
display(raw_reader.file_finder(L0_tour_data_path, file_type='nc'))
print("____________________________________________________________________________________________________")
# Find and display all `.raw` files from the specified list of paths
display(raw_reader.file_finder(
    paths=[
        os.path.join(L0_tour_data_path, "JR161-D20061118-T010645.raw"),
        os.path.join(L0_tour_data_path, "Summer2017-D20170620-T011027.nc"),
    ],
    file_type='raw',
))
print("_____________________________________________________________________________________________________")

# Displaying `.zarr` files.
# Because a `.zarr` store is a directory rather than a single file, the function
# does not detect it when searching the "L0_tour_data" directory.
# Passing the full path to the `.zarr` store works, but it returns only the
# metadata files at the top level of the store (see the output below).
# Possible future improvement: make the function detect "file.zarr"
# directly within the "L0_tour_data" path.

display(raw_reader.file_finder(os.path.join(L0_tour_data_path, "Summer2017-D20170620-T011027.zarr"),
                               file_type='zarr'))
display(raw_reader.file_finder(L0_tour_data_path, file_type='zarr'))

['/Users/simedroniraluca/Documents/pineview/oceanstream/test_data/L0_tour_data/D20161109-T163350.raw',
 '/Users/simedroniraluca/Documents/pineview/oceanstream/test_data/L0_tour_data/JR161-D20061118-T010645.raw',
 '/Users/simedroniraluca/Documents/pineview/oceanstream/test_data/L0_tour_data/JR179-D20080410-T150637.raw',
 '/Users/simedroniraluca/Documents/pineview/oceanstream/test_data/L0_tour_data/JR230-D20091215-T121917.raw']

________________________________________________________________________________________________


['/Users/simedroniraluca/Documents/pineview/oceanstream/test_data/L0_tour_data/JR161-D20061118-T010645.nc',
 '/Users/simedroniraluca/Documents/pineview/oceanstream/test_data/L0_tour_data/JR161-D20061127-T115759.nc',
 '/Users/simedroniraluca/Documents/pineview/oceanstream/test_data/L0_tour_data/JR161-D20061127-T144557.nc',
 '/Users/simedroniraluca/Documents/pineview/oceanstream/test_data/L0_tour_data/test_dataset_Sv.nc']

____________________________________________________________________________________________________


['/Users/simedroniraluca/Documents/pineview/oceanstream/test_data/L0_tour_data/JR161-D20061118-T010645.raw']

_____________________________________________________________________________________________________


['/Users/simedroniraluca/Documents/pineview/oceanstream/test_data/L0_tour_data/Summer2017-D20170620-T011027.zarr/.zattrs',
 '/Users/simedroniraluca/Documents/pineview/oceanstream/test_data/L0_tour_data/Summer2017-D20170620-T011027.zarr/.zgroup',
 '/Users/simedroniraluca/Documents/pineview/oceanstream/test_data/L0_tour_data/Summer2017-D20170620-T011027.zarr/.zmetadata']

[]

**Remarks**:
- **General remarks on the `file_finder` function of the `raw_reader` module**:
    - About `.zarr`:
        - The function does not recognize `.zarr` stores when searching a directory, because they are stored as directories rather than single files.
        - To work with a `.zarr` store, users currently need to pass the full path to the `.zarr` directory.
        - Suggested improvement: treat `.zarr` directories as valid data entities and, if necessary, search within them.
    - About cloud storage URIs:
        - The function is designed for local file paths and does not support cloud storage URIs, such as S3 paths.
        - Suggested improvement: integrate a cloud storage client (e.g., `boto3` for S3) so that the function can handle cloud storage URIs.
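
The `.zarr` suggestion above could be sketched as follows. This is a minimal illustration, not the `raw_reader.file_finder` implementation: the function name `find_data_entities` is hypothetical, it handles local paths only, and it assumes that any directory whose name ends in `.zarr` is a Zarr store.

```python
import os
from typing import List


def find_data_entities(root: str, file_type: str) -> List[str]:
    """List files, or `.zarr` directories, under `root` with the given extension.

    Hypothetical sketch: a directory ending in ".zarr" is treated as one
    data entity and is never searched for other file types.
    """
    suffix = "." + file_type.lstrip(".")
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        stores = [d for d in dirnames if d.endswith(".zarr")]
        if suffix == ".zarr":
            # Record each store as a single entity.
            matches.extend(os.path.join(dirpath, d) for d in stores)
        else:
            matches.extend(
                os.path.join(dirpath, f) for f in filenames if f.endswith(suffix)
            )
        # Prune stores in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in stores]
    return sorted(matches)
```

Pruning `dirnames` in place relies on the default top-down traversal of `os.walk`, which lets the caller decide which subdirectories are visited.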

**File integrity checker**

In [5]:
# Verify that the provided echo sounder file is
# readable by echopype and extract
# essential metadata such as the campaign ID, date of measurement,
# and sonar model.
raw_file_ck = raw_reader.file_integrity_checking(
    raw_reader.file_finder(
        paths=[
            os.path.join(L0_tour_data_path, "JR161-D20061118-T010645.raw"),
            os.path.join(L0_tour_data_path, "Summer2017-D20170620-T011027.nc"),
        ],
        file_type='raw',
    )[0]
)

nc_file_ck = raw_reader.file_integrity_checking(raw_reader.file_finder(L0_tour_data_path, file_type='nc')[0])

zarr_file_ck = raw_reader.file_integrity_checking(os.path.join(L0_tour_data_path, "JR161-D20061118-T010645.zarr"))

display(raw_file_ck)
display(nc_file_ck)
display(zarr_file_ck)

{'file_path': '/Users/simedroniraluca/Documents/pineview/oceanstream/test_data/L0_tour_data/JR161-D20061118-T010645.raw',
 'campaign_id': 'JR161',
 'date': datetime.datetime(2006, 11, 18, 1, 6, 45),
 'sonar_model': 'EK60',
 'file_integrity': True}

{'file_path': '/Users/simedroniraluca/Documents/pineview/oceanstream/test_data/L0_tour_data/JR161-D20061118-T010645.nc',
 'campaign_id': 'JR161',
 'date': datetime.datetime(2006, 11, 18, 1, 6, 45),
 'sonar_model': 'EK60',
 'file_integrity': True}

{'file_path': '/Users/simedroniraluca/Documents/pineview/oceanstream/test_data/L0_tour_data/JR161-D20061118-T010645.zarr',
 'campaign_id': 'JR161',
 'date': datetime.datetime(2006, 11, 18, 1, 6, 45),
 'sonar_model': 'EK60',
 'file_integrity': True}

In [6]:
raw_jr161_file_info = raw_reader.file_integrity_checking(os.path.join(L0_tour_data_path, "JR161-D20061118-T010645.raw"))
raw_jr179_file_info = raw_reader.file_integrity_checking(os.path.join(L0_tour_data_path, "JR179-D20080410-T150637.raw"))
nc_jr161_file_info_1 = raw_reader.file_integrity_checking(os.path.join(L0_tour_data_path, "JR161-D20061127-T115759.nc"))
nc_jr161_file_info_2 = raw_reader.file_integrity_checking(os.path.join(L0_tour_data_path, "JR161-D20061127-T144557.nc"))

display(raw_jr161_file_info)
display(raw_jr179_file_info)

{'file_path': '/Users/simedroniraluca/Documents/pineview/oceanstream/test_data/L0_tour_data/JR161-D20061118-T010645.raw',
 'campaign_id': 'JR161',
 'date': datetime.datetime(2006, 11, 18, 1, 6, 45),
 'sonar_model': 'EK60',
 'file_integrity': True}

{'file_path': '/Users/simedroniraluca/Documents/pineview/oceanstream/test_data/L0_tour_data/JR179-D20080410-T150637.raw',
 'campaign_id': 'JR179',
 'date': datetime.datetime(2008, 4, 10, 15, 6, 37),
 'sonar_model': 'EK60',
 'file_integrity': True}
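
The `campaign_id` and `date` fields shown above match what can be read from the file names. As a rough sketch of that step only, the hypothetical `parse_echosounder_name` below assumes the naming pattern `<campaign>-D<YYYYMMDD>-T<HHMMSS>` with an optional campaign prefix; how `file_integrity_checking` actually derives these fields, and what it does for names without a prefix, is not shown in this notebook.

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Optional, Tuple

# Assumed pattern: optional "<campaign>-" prefix, then "D<YYYYMMDD>-T<HHMMSS>".
_NAME_PATTERN = re.compile(
    r"^(?:(?P<campaign>.+)-)?D(?P<date>\d{8})-T(?P<time>\d{6})$"
)


def parse_echosounder_name(path: str) -> Tuple[Optional[str], datetime]:
    """Return (campaign_id, timestamp) parsed from the file name of `path`.

    Hypothetical helper; campaign_id is None when the name has no prefix.
    """
    stem = Path(path).stem  # drops the final extension (.raw, .nc, .zarr)
    match = _NAME_PATTERN.match(stem)
    if match is None:
        raise ValueError(f"Unrecognised file name: {path}")
    timestamp = datetime.strptime(match["date"] + match["time"], "%Y%m%d%H%M%S")
    return match["campaign"], timestamp


print(parse_echosounder_name("JR161-D20061118-T010645.raw"))
# → ('JR161', datetime.datetime(2006, 11, 18, 1, 6, 45))
```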

**Checked raw files reader**

In [7]:
# Read multiple raw echo sounder files and return a list of datasets.
# Process a list of file information dictionaries, open each raw file
# using the specified sonar model, and return the corresponding datasets.
# Input data: list of dictionaries,
# each containing file information
# as provided by the file_integrity_checking function.
jr161_jr179_ds_list = raw_reader.read_raw_files([raw_jr161_file_info, raw_jr179_file_info])
print("_________________________________List of the read datasets___________________________________________")
display(jr161_jr179_ds_list)
print("_________________________________Example of read dataset___________________________________________")
display(jr161_jr179_ds_list[0])

_________________________________List of the read datasets___________________________________________


[<EchoData: standardized raw data from Internal Memory>
 Top-level: contains metadata about the SONAR-netCDF4 file format.
 ├── Environment: contains information relevant to acoustic propagation through water.
 ├── Platform: contains information about the platform on which the sonar is installed.
 │   └── NMEA: contains information specific to the NMEA protocol.
 ├── Provenance: contains metadata about how the SONAR-netCDF4 version of the data were obtained.
 ├── Sonar: contains sonar system metadata and sonar beam groups.
 │   └── Beam_group1: contains backscatter power (uncalibrated) and other beam or channel-specific data, including split-beam angle data when they exist.
 └── Vendor_specific: contains vendor-specific information about the sonar and the data.,
 <EchoData: standardized raw data from Internal Memory>
 Top-level: contains metadata about the SONAR-netCDF4 file format.
 ├── Environment: contains information relevant to acoustic propagation through water.
 ├── Platform: co

_________________________________Example of read dataset___________________________________________


**Processed files reader**

In [8]:
# Read multiple processed echo sounder files and return a list of datasets.
# Process a list of file paths, open each processed file,
# and collect the corresponding datasets in a list.
nc_jr161_18_file_path = os.path.join(L0_tour_data_path, "JR161-D20061118-T010645.nc")
nc_jr161_27_file_path = os.path.join(L0_tour_data_path, "JR161-D20061127-T115759.nc")
zarr_s20170620_file_path = os.path.join(L0_tour_data_path, "Summer2017-D20170620-T011027.zarr")
print("_________________________________List of the read datasets___________________________________________")
nc_zarr_ds_list = raw_reader.read_processed_files([nc_jr161_18_file_path,
                                                   nc_jr161_27_file_path,
                                                   zarr_s20170620_file_path])
display(nc_zarr_ds_list)
print("_________________________________Example of read dataset___________________________________________")
display(nc_zarr_ds_list[1])

_________________________________List of the read datasets___________________________________________


[<EchoData: standardized raw data from /Users/simedroniraluca/Documents/pineview/oceanstream/test_data/L0_tour_data/JR161-D20061118-T010645.nc>
 Top-level: contains metadata about the SONAR-netCDF4 file format.
 ├── Environment: contains information relevant to acoustic propagation through water.
 ├── Platform: contains information about the platform on which the sonar is installed.
 │   └── NMEA: contains information specific to the NMEA protocol.
 ├── Provenance: contains metadata about how the SONAR-netCDF4 version of the data were obtained.
 ├── Sonar: contains sonar system metadata and sonar beam groups.
 │   └── Beam_group1: contains backscatter power (uncalibrated) and other beam or channel-specific data, including split-beam angle data when they exist.
 └── Vendor_specific: contains vendor-specific information about the sonar and the data.,
 <EchoData: standardized raw data from /Users/simedroniraluca/Documents/pineview/oceanstream/test_data/L0_tour_data/JR161-D20061127-T115759

_________________________________Example of read dataset___________________________________________


**Raw file converter to 'nc' or 'zarr'**

In [9]:
# Convert multiple raw echo sounder files to the
# specified file type and save them.
# Input data:
# - List of dictionaries,
#   each containing file information
#   as provided by the file_integrity_checking function.
# - Directory path where the converted files will be saved.
# - Desired file type for saving the converted files. Options are 'nc' or 'zarr'.
saving_path = os.path.join(L0_tour_data_path, "test_convert_raw_files")
raw_reader.convert_raw_files([raw_jr161_file_info, raw_jr179_file_info], saving_path, save_file_type='nc')
raw_reader.convert_raw_files([raw_jr179_file_info], saving_path, save_file_type='zarr')

['/Users/simedroniraluca/Documents/pineview/oceanstream/test_data/L0_tour_data/test_convert_raw_files/JR179-D20080410-T150637.zarr']

**File similarity grouper**

In [10]:
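# Split a list of file information dictionaries into sublists based on their similarity.
# Note: nc_jr161_file_info_1 is passed twice, so the last group in the output
# lists the same file twice; nc_jr161_file_info_2 was probably intended.
splited_files_info = raw_reader.split_files([raw_jr161_file_info,
                                             raw_jr179_file_info,
                                             nc_jr161_file_info_1,
                                             nc_jr161_file_info_1])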
splited_files_info

[[{'file_path': '/Users/simedroniraluca/Documents/pineview/oceanstream/test_data/L0_tour_data/JR161-D20061118-T010645.raw',
   'campaign_id': 'JR161',
   'date': datetime.datetime(2006, 11, 18, 1, 6, 45),
   'sonar_model': 'EK60',
   'file_integrity': True}],
 [{'file_path': '/Users/simedroniraluca/Documents/pineview/oceanstream/test_data/L0_tour_data/JR179-D20080410-T150637.raw',
   'campaign_id': 'JR179',
   'date': datetime.datetime(2008, 4, 10, 15, 6, 37),
   'sonar_model': 'EK60',
   'file_integrity': True}],
 [{'file_path': '/Users/simedroniraluca/Documents/pineview/oceanstream/test_data/L0_tour_data/JR161-D20061127-T115759.nc',
   'campaign_id': 'JR161',
   'date': datetime.datetime(2006, 11, 27, 11, 57, 59),
   'sonar_model': 'EK60',
   'file_integrity': True},
  {'file_path': '/Users/simedroniraluca/Documents/pineview/oceanstream/test_data/L0_tour_data/JR161-D20061127-T115759.nc',
   'campaign_id': 'JR161',
   'date': datetime.datetime(2006, 11, 27, 11, 57, 59),
   'sonar_mode

**Files concatenator**

In [11]:
raw_reader.concatenate_files([nc_jr161_file_info_1, nc_jr161_file_info_2])

**`concatenate_files` remarks**
- It might be useful to integrate `concatenate_files` with `split_files`. `split_files` divides a list of file information dictionaries into sublists of similar files, so `concatenate_files` could return a list of `echodata` objects, one per sublist. Each `echodata` derived from a sublist of several similar files would then hold the concatenated data of those files.
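
One way this integration might look is sketched below. The grouping key (campaign, sonar model, and file extension) and both function names are assumptions made for illustration; the similarity criteria that `split_files` really uses may differ, and `concatenate` stands in for any callable that merges one group of file information dictionaries.

```python
from collections import defaultdict
from datetime import datetime
from pathlib import Path
from typing import Callable, List


def group_similar_files(files_info: List[dict]) -> List[List[dict]]:
    """Group file-info dicts sharing campaign, sonar model and extension.

    Hypothetical grouping key; not the criteria of raw_reader.split_files.
    """
    groups = defaultdict(list)
    for info in files_info:
        key = (
            info["campaign_id"],
            info["sonar_model"],
            Path(info["file_path"]).suffix,
        )
        groups[key].append(info)
    # Sort each group chronologically so it is ready for concatenation.
    return [sorted(group, key=lambda i: i["date"]) for group in groups.values()]


def concatenate_by_group(files_info: List[dict], concatenate: Callable) -> list:
    """Apply `concatenate` to every group and return one result per group."""
    return [concatenate(group) for group in group_similar_files(files_info)]


# Usage with minimal, made-up file-info dictionaries.
example = [
    {"file_path": "b.nc", "campaign_id": "JR161", "sonar_model": "EK60",
     "date": datetime(2006, 11, 27, 14, 45, 57)},
    {"file_path": "c.raw", "campaign_id": "JR179", "sonar_model": "EK60",
     "date": datetime(2008, 4, 10, 15, 6, 37)},
    {"file_path": "a.nc", "campaign_id": "JR161", "sonar_model": "EK60",
     "date": datetime(2006, 11, 27, 11, 57, 59)},
]
# Here `len` replaces a real concatenation function, giving the group sizes.
print(concatenate_by_group(example, concatenate=len))  # → [2, 1]
```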

### Time continuity checker

In [12]:
from oceanstream.L0_unprocessed_data import (
    ensure_time_continuity,
)

In [13]:
raw_jr230_file_path = os.path.join(L0_tour_data_path, "JR230-D20091215-T121917.raw")
raw_jr230_file_info = raw_reader.file_integrity_checking(raw_jr230_file_path)

jr230_ed = raw_reader.read_raw_files([raw_jr230_file_info])

jr230_ed = jr230_ed[0]
jr230_ed

In [14]:
ensure_time_continuity.check_reversed_time(jr230_ed, "Sonar/Beam_group1", "ping_time")

False

In [15]:
raw_ek80_file_path = os.path.join(L0_tour_data_path, "D20161109-T163350.raw")
raw_ek80_file_info = raw_reader.file_integrity_checking(raw_ek80_file_path)
ek80_ed = raw_reader.read_raw_files([raw_ek80_file_info])[0]

ensure_time_continuity.check_reversed_time(ek80_ed, "Sonar/Beam_group1", "ping_time")

False

**Time reversal handler**

In [16]:
# Correct reversed timestamps in raw EK60/80 data files. 
# Use the median pinging interval within a specified local window prior to the reversed timestamp 
# to estimate the next ping time under certain conditions.

ensure_time_continuity.fix_time_reversions(ek80_ed, {"Sonar/Beam_group1": "ping_time"})


**Remark on time reversal:** 
- We need data that contains time reversals to properly test this functionality.
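
Until such data is available, the idea can be exercised on synthetic timestamps. The sketch below is a plain-Python illustration of the approach described in the cell comment (median ping interval in a local window before the reversal), not the `ensure_time_continuity` implementation. The function names, the default window size, the fallback step, and the choice to treat only strictly earlier timestamps as reversals are all assumptions.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import List


def has_reversed_time(times: List[datetime]) -> bool:
    """True if any timestamp is earlier than the one before it."""
    return any(later < earlier for earlier, later in zip(times, times[1:]))


def fix_reversed_time(
    times: List[datetime],
    window: int = 5,
    fallback: timedelta = timedelta(seconds=1),
) -> List[datetime]:
    """Return a copy of `times` in which each reversed timestamp is replaced.

    A reversed timestamp becomes the previous timestamp plus the median of
    up to `window` ping intervals preceding it. `fallback` is used when no
    earlier interval exists (an arbitrary choice for this sketch).
    """
    fixed = list(times)
    for i in range(1, len(fixed)):
        if fixed[i] < fixed[i - 1]:
            # window + 1 timestamps give `window` intervals.
            recent = fixed[max(0, i - window - 1): i]
            intervals = [
                (b - a).total_seconds() for a, b in zip(recent, recent[1:])
            ]
            step = timedelta(seconds=median(intervals)) if intervals else fallback
            fixed[i] = fixed[i - 1] + step
    return fixed


# Synthetic ping times with one reversal at index 3.
t0 = datetime(2020, 1, 1)
pings = [t0 + timedelta(seconds=s) for s in (0, 1, 2, 1, 4)]
print(has_reversed_time(pings))                     # → True
print(has_reversed_time(fix_reversed_time(pings)))  # → False
```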


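### Execution time

**Remark on `%time`:**
- In the cells below, `%time` stands on a line by itself, so it times an empty statement rather than the call on the following line. The microsecond-scale values reported are therefore the overhead of the magic, not the run time of the `oceanstream` functions. To time a call, place `%time` on the same line as the statement, or put `%%time` on the first line of the cell.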

In [17]:
%time
raw_reader.file_finder(L0_tour_data_path, file_type='raw')

CPU times: user 2 µs, sys: 1 µs, total: 3 µs
Wall time: 5.25 µs


['/Users/simedroniraluca/Documents/pineview/oceanstream/test_data/L0_tour_data/D20161109-T163350.raw',
 '/Users/simedroniraluca/Documents/pineview/oceanstream/test_data/L0_tour_data/JR161-D20061118-T010645.raw',
 '/Users/simedroniraluca/Documents/pineview/oceanstream/test_data/L0_tour_data/JR179-D20080410-T150637.raw',
 '/Users/simedroniraluca/Documents/pineview/oceanstream/test_data/L0_tour_data/JR230-D20091215-T121917.raw']

In [18]:
%time
raw_file_ck = raw_reader.file_integrity_checking(
    raw_reader.file_finder(
        paths=[
            os.path.join(L0_tour_data_path, "JR161-D20061118-T010645.raw"),
            os.path.join(L0_tour_data_path, "Summer2017-D20170620-T011027.nc"),
        ],
        file_type='raw',
    )[0]
)

CPU times: user 2 µs, sys: 0 ns, total: 2 µs
Wall time: 5.01 µs


In [19]:
%time
jr161_jr179_ds_list = raw_reader.read_raw_files([raw_jr161_file_info, raw_jr179_file_info])

CPU times: user 2 µs, sys: 1 µs, total: 3 µs
Wall time: 4.77 µs


In [20]:
print('nc')
%time
nc_jr161_18_file_path = os.path.join(L0_tour_data_path, "JR161-D20061118-T010645.nc")
print('zarr')
%time
zarr_s20170620_file_path = os.path.join(L0_tour_data_path, "Summer2017-D20170620-T011027.zarr")

nc
CPU times: user 2 µs, sys: 1e+03 ns, total: 3 µs
Wall time: 7.15 µs
zarr
CPU times: user 3 µs, sys: 1 µs, total: 4 µs
Wall time: 7.87 µs


In [21]:
print('nc')
%time
raw_reader.convert_raw_files([raw_jr161_file_info, raw_jr179_file_info], saving_path, save_file_type='nc')
print('zarr')
%time
raw_reader.convert_raw_files([raw_jr179_file_info], saving_path, save_file_type='zarr')

nc
CPU times: user 4 µs, sys: 2 µs, total: 6 µs
Wall time: 8.82 µs
zarr
CPU times: user 2 µs, sys: 1 µs, total: 3 µs
Wall time: 6.2 µs


['/Users/simedroniraluca/Documents/pineview/oceanstream/test_data/L0_tour_data/test_convert_raw_files/JR179-D20080410-T150637.zarr']

In [22]:
%time
splited_files_info = raw_reader.split_files([raw_jr161_file_info,
                                             raw_jr179_file_info,
                                             nc_jr161_file_info_1,
                                             nc_jr161_file_info_1])

CPU times: user 2 µs, sys: 1 µs, total: 3 µs
Wall time: 5.96 µs


In [23]:
%time
raw_reader.concatenate_files([nc_jr161_file_info_1, nc_jr161_file_info_2])

CPU times: user 4 µs, sys: 2 µs, total: 6 µs
Wall time: 9.06 µs


In [24]:
%time
ensure_time_continuity.check_reversed_time(jr230_ed, "Sonar/Beam_group1", "ping_time")

CPU times: user 2 µs, sys: 1e+03 ns, total: 3 µs
Wall time: 4.77 µs


False

In [25]:
%time
ensure_time_continuity.fix_time_reversions(ek80_ed, {"Sonar/Beam_group1": "ping_time"})

CPU times: user 3 µs, sys: 1 µs, total: 4 µs
Wall time: 5.25 µs


In [26]:
from datetime import timedelta

5 * timedelta(microseconds=3) + 5*timedelta(microseconds=4) + timedelta(microseconds=5)

datetime.timedelta(microseconds=40)
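
A total like the one above could also be accumulated programmatically instead of being summed by hand. The following is a small sketch using only the standard library; the `timed` context manager and the `timings` dictionary are hypothetical names, and the two timed blocks are placeholders for the `raw_reader` and `ensure_time_continuity` calls.

```python
import time
from contextlib import contextmanager
from datetime import timedelta

# Maps a step label to its measured wall-clock duration.
timings = {}


@contextmanager
def timed(label: str):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = timedelta(seconds=time.perf_counter() - start)


# Placeholder workloads standing in for the notebook's processing steps.
with timed("sleep"):
    time.sleep(0.01)
with timed("sum"):
    sum(range(100_000))

# Start the sum from an empty timedelta so the result is a timedelta.
total = sum(timings.values(), timedelta())
print(f"total: {total}")
```

Because the measurement wraps the block itself, each recorded duration covers the work inside the `with` statement, whatever the number of lines it spans.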