# Raw EK60 Data
## pyEcholab Tutorial
Retrieved from:
[pyEcholab - Plotting Raw EK60 Data Example](https://github.com/CI-CMG/pyEcholab/blob/master/examples/cloud%20tutorials/Plotting_Raw_EK60_Data.ipynb)

# Setup

In [1]:
from echolab2.instruments import EK60
from echolab2.plotting.matplotlib import echogram
from echolab2.processing import mask, line
import numpy as np
import pandas as pd
import os

In [2]:
import boto3, botocore
from botocore import UNSIGNED
from botocore.client import Config

s3 = boto3.resource(
    's3',
    aws_access_key_id='',
    aws_secret_access_key='',
    config=Config(signature_version=UNSIGNED)
)

BUCKET = 'noaa-wcsd-pds'

# Read the data
- Read in some data from the S3 bucket 
- Pick two files with the same channels, but with different pulse lengths and a different installation order

In [3]:
paths = [
    'data/raw/Oscar_Dyson/DY1201/EK60/',
    'data/raw/Oscar_Dyson/DY1706/EK60/'
]

rawfiles = [
    'DY1201_EK60-D20120214-T231011.raw',
    'DY1706_EK60-D20170609-T005736.raw'
]


try:
    for ii in range(len(rawfiles)):
        if rawfiles[ii] not in os.listdir('.'):
            print(paths[ii] + rawfiles[ii])
            s3.Bucket(BUCKET).download_file(paths[ii] + rawfiles[ii], rawfiles[ii])
            print('downloaded: ', rawfiles[ii])
        else:
            print('already found: ', rawfiles[ii])
except botocore.exceptions.ClientError as e:
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.")
    else:
        raise

already found:  DY1201_EK60-D20120214-T231011.raw
already found:  DY1706_EK60-D20170609-T005736.raw
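
The two file names appear to embed the recording start date and time (`D20120214-T231011` lines up with the data start time of 2012-02-14T23:10:11 reported further down). As a minimal sketch, assuming that `-DYYYYMMDD-THHMMSS.raw` pattern holds for other files too, the timestamp can be recovered from the name alone. The helper `raw_file_start_time` is hypothetical and is not part of pyEcholab.

```python
import re
from datetime import datetime

# Assumed naming pattern, inferred only from the two file names in this tutorial.
_NAME_PATTERN = re.compile(r"-D(\d{8})-T(\d{6})\.raw$")


def raw_file_start_time(filename):
    """Return the timestamp embedded in a .raw file name (hypothetical helper)."""
    match = _NAME_PATTERN.search(filename)
    if match is None:
        raise ValueError(f"unrecognized file name: {filename}")
    return datetime.strptime(match.group(1) + match.group(2), "%Y%m%d%H%M%S")


for name in [
    "DY1201_EK60-D20120214-T231011.raw",
    "DY1706_EK60-D20170609-T005736.raw",
]:
    print(name, "->", raw_file_start_time(name).isoformat())
```

This can be handy for sorting or filtering a list of keys by time before downloading anything.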


In [4]:
# Create an instance of the EK60 instrument.
# This is the top level object used to interact with EK60 data sources.
ek60 = EK60.EK60()

# Use the read_raw method to read in a data file.
ek60.read_raw(rawfiles)

# Print some basic info about our object.  As you will see, 10 channels are
# reported.  Each file has 5 channels, and they are, in fact, physically the
# same hardware.  10 channels are reported because their transceiver numbers
# in the ER60 software changed.
print(ek60)

<class 'echolab2.instruments.EK60.EK60'> at 0x125bbb250
    EK60 object contains data from 10 channels:
        1:GPT  18 kHz 009072034d45 3-1 ES18-11
        2:GPT  38 kHz 009072033fa2 1-1 ES38B
        3:GPT  70 kHz 009072058c6c 2-1 ES70-7C
        4:GPT 120 kHz 00907205794e 5-1 ES120-7C
        5:GPT 200 kHz 0090720346a8 4-1 ES200-7C
        6:GPT  18 kHz 009072034d45 1-1 ES18-11
        7:GPT  38 kHz 009072033fa2 2-1 ES38B
        8:GPT  70 kHz 009072058c6c 3-1 ES70-7C
        9:GPT 120 kHz 00907205794e 4-1 ES120-7C
        10:GPT 200 kHz 0090720346a8 5-1 ES200-7C
    data start time: 2012-02-14T23:10:11.298
      data end time: 2017-06-09T01:05:15.638
    number of pings: 899
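
To see that the 10 reported channels map onto 5 pieces of hardware, the channel IDs printed above can be grouped by everything except the field that differs between the two files (for example `3-1` versus `1-1`). This is only a sketch: the dictionary is copied by hand from the output above, and splitting the ID on whitespace is an assumption about its layout rather than a documented format.

```python
from collections import defaultdict

# Channel IDs copied from the printed EK60 summary above.
channel_ids = {
    1: "GPT  18 kHz 009072034d45 3-1 ES18-11",
    2: "GPT  38 kHz 009072033fa2 1-1 ES38B",
    3: "GPT  70 kHz 009072058c6c 2-1 ES70-7C",
    4: "GPT 120 kHz 00907205794e 5-1 ES120-7C",
    5: "GPT 200 kHz 0090720346a8 4-1 ES200-7C",
    6: "GPT  18 kHz 009072034d45 1-1 ES18-11",
    7: "GPT  38 kHz 009072033fa2 2-1 ES38B",
    8: "GPT  70 kHz 009072058c6c 3-1 ES70-7C",
    9: "GPT 120 kHz 00907205794e 4-1 ES120-7C",
    10: "GPT 200 kHz 0090720346a8 5-1 ES200-7C",
}


def group_by_hardware(ids):
    """Group channel numbers by (frequency in kHz, hex ID, transducer name)."""
    groups = defaultdict(list)
    for number, channel_id in ids.items():
        # Assumed layout: type, frequency, unit, hex ID, transceiver field, transducer.
        _, freq, _unit, hex_id, _transceiver, transducer = channel_id.split()
        groups[(int(freq), hex_id, transducer)].append(number)
    return dict(groups)


groups = group_by_hardware(channel_ids)
for (freq, hex_id, transducer), numbers in sorted(groups.items()):
    print(f"{freq:>3} kHz {hex_id} {transducer}: channels {numbers}")
```

Under these assumptions, the 38 kHz hardware shows up as channels 2 and 7, which are the two channels used in the rest of the tutorial.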



# Parse the data
- Parse the data
- Retrieve data from the first and second channels of the 38 kHz transceiver

In [5]:
# Get a reference to the RawData object that contains data from the first 38 kHz
# channel.
raw_data_38_1 = ek60.get_raw_data(channel_number=2)

# The sample data from channel 2 is contained in a 136x994 array.  The data was
# recorded with a 1024us transmit pulse length, which on the EK60 and related
# hardware results in a sample interval of 256us (sample interval = pulse
# length / 4).  The data were recorded in 2012.

print(raw_data_38_1)

<class 'echolab2.instruments.EK60.RawData'> at 0x124b91960
                channel(s): [GPT  38 kHz 009072033fa2 1-1 ES38B]
    frequency (first ping): 38000.0
 pulse length (first ping): 0.001024
           data start time: 2012-02-14T23:10:11.298
             data end time: 2012-02-14T23:10:51.642
           number of pings: 136
    power array dimensions: (136,994)
    angle array dimensions: (136,994)



```
# The sample data from channel 2 is contained in a 136x994 array.  The data was
# recorded with a 1024us transmit pulse length, which on the EK60 and related
# hardware results in a sample interval of 256us (sample interval = pulse
# length / 4).  The data were recorded in 2012.
```
1. **EK60:** The EK60 is a scientific echo sounder used for quantitative measurements of fish and other marine life. It emits pulses of sound and records the echoes that bounce back from objects (like fish or the ocean floor), allowing it to create a detailed picture of the underwater environment. 

2. **136x994 array:** These are the dimensions of the sample data array for this channel. As the printed summary shows, the first dimension is the number of pings (136) and the second is the number of samples recorded per ping (994), so each element holds one sample from one ping. Successive pings advance in time, and successive samples within a ping come from increasing range.

3. **1024us transmit pulse length:** This describes the duration of each individual sound pulse that the echosounder sends out. Each pulse lasts for 1024 microseconds.

4. **Sample interval of 256us (sample interval = pulse length / 4):** The "sample interval" is the time between consecutive samples taken by the echosounder. Here the sample interval is one-fourth of the transmit pulse length, which gives 256 microseconds (1024us / 4). In other words, the receiver takes four samples in a time span equal to one pulse duration.
    - When the system sends out a pulse and listens for the echo, it doesn't just listen for a set number of data points. Instead, it continuously listens for a certain period of time, and takes a "snapshot" of the signal at regular intervals (the sample interval).
    - The exact length of time it listens for will depend on a variety of factors, including the specific settings used during data collection (such as the range setting, which determines how deep the system is listening), as well as environmental factors like the depth of the water.
    - Each of these snapshots (samples) is recorded, and together they make up the data for a single ping. Many samples per ping, repeated over many pings, add up to a large amount of data, hence the large number of columns in the array.
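
The arithmetic in the list above can be sketched in a few lines. The factor of 4 comes from the tutorial's comment (sample interval = pulse length / 4); the function names are hypothetical, and the second set of numbers (512 us pulse length, 1059 samples) is taken from the 2017 channel introduced in the next cell.

```python
def sample_interval(pulse_length_s, samples_per_pulse=4):
    """Sample interval in seconds, using the pulse length / 4 rule quoted above."""
    return pulse_length_s / samples_per_pulse


def listening_time(n_samples, pulse_length_s):
    """Time in seconds spanned by the samples recorded for one ping."""
    return n_samples * sample_interval(pulse_length_s)


for label, pulse_length, n_samples in [
    ("2012, channel 2", 1024e-6, 994),
    ("2017, channel 7", 512e-6, 1059),
]:
    dt = sample_interval(pulse_length)
    duration = listening_time(n_samples, pulse_length)
    print(f"{label}: dt = {dt * 1e6:.0f} us, {n_samples} samples span {duration * 1e3:.1f} ms")
```

With these numbers, each 2012 ping covers roughly a quarter of a second of listening, while each 2017 ping covers roughly half that, despite having more samples.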


In [6]:
# Also get a reference to the RawData object that contains data from the
# second 38 kHz channel.
# Channel 7's sample data is a 763x1059 array recorded with a 512us pulse length,
# resulting in a sample interval of 128us.  These data were recorded in 2017.
raw_data_38_2 = ek60.get_raw_data(channel_number=7)

In [7]:
print(raw_data_38_2)

<class 'echolab2.instruments.EK60.RawData'> at 0x125bbb520
                channel(s): [GPT  38 kHz 009072033fa2 2-1 ES38B]
    frequency (first ping): 38000.0
 pulse length (first ping): 0.000512
           data start time: 2017-06-09T00:57:36.074
             data end time: 2017-06-09T01:05:13.929
           number of pings: 763
    power array dimensions: (763,1059)
    angle array dimensions: (763,1059)



In [8]:
# Check the 'power' param before processing second 38 kHz channel
power_before_get_power_function = raw_data_38_2.power

In [9]:
power_before_get_power_function

array([[  14.134299 ,   24.63507  ,   25.422924 , ..., -119.20082  ,
        -115.94358  , -114.26205  ],
       [  14.1225395,   24.63507  ,   25.422924 , ..., -119.377205 ,
        -123.63396  , -130.99509  ],
       [  14.1225395,   24.63507  ,   25.422924 , ..., -132.38264  ,
        -124.609955 , -123.37526  ],
       ...,
       [  14.1225395,   24.63507  ,   25.422924 , ..., -116.75495  ,
        -118.2601   , -122.22288  ],
       [  14.1225395,   24.63507  ,   25.422924 , ..., -123.41054  ,
        -117.90733  , -116.41394  ],
       [  14.1225395,   24.63507  ,   25.422924 , ..., -128.14941  ,
        -123.99849  , -119.6359   ]], dtype=float32)

In [10]:
# Call get_power to get a ProcessedData object that contains power data.  We
# provide no arguments so we get all pings ordered by time.
processed_power_2 = raw_data_38_2.get_power()
print(processed_power_2)

<class 'echolab2.processing.processed_data.ProcessedData'> at 0x11053b100
                channel(s): [GPT  38 kHz 009072033fa2 2-1 ES38B]
                 frequency: 38000.0
           data start time: 2017-06-09T00:57:36.074
             data end time: 2017-06-09T01:05:13.929
            number of pings: 763
            data attributes: ping_time (763)
                            data (763,1059)
                            range (1059)



In [11]:
# Check the 'power' param after processing second 38 kHz channel
power_after_get_power_function = processed_power_2.data

In [12]:
power_after_get_power_function

array([[  14.134299 ,   24.63507  ,   25.422924 , ..., -119.20082  ,
        -115.94358  , -114.26205  ],
       [  14.1225395,   24.63507  ,   25.422924 , ..., -119.377205 ,
        -123.63396  , -130.99509  ],
       [  14.1225395,   24.63507  ,   25.422924 , ..., -132.38264  ,
        -124.609955 , -123.37526  ],
       ...,
       [  14.1225395,   24.63507  ,   25.422924 , ..., -116.75495  ,
        -118.2601   , -122.22288  ],
       [  14.1225395,   24.63507  ,   25.422924 , ..., -123.41054  ,
        -117.90733  , -116.41394  ],
       [  14.1225395,   24.63507  ,   25.422924 , ..., -128.14941  ,
        -123.99849  , -119.6359   ]], dtype=float32)

In [13]:
power_before_get_power_function.dtype

dtype('float32')

In [14]:
# Compare 'power' param before and after processing second 38 kHz channel
np.all(power_before_get_power_function == power_after_get_power_function)

True
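
The `==` comparison above works here because this array contains no NaN values. Once arrays are NaN-padded (as the raw power array is after the append later in this tutorial), an element-wise `==` reports a mismatch at every NaN, since NaN never equals itself. A small sketch with made-up values shows the difference; `np.array_equal(..., equal_nan=True)` requires NumPy 1.19 or newer.

```python
import numpy as np

# Made-up values; the NaN stands in for padding at the end of a shorter ping.
a = np.array([[14.13, np.nan], [24.63, -119.2]], dtype=np.float32)
b = a.copy()

# Element-wise comparison: the NaN position compares as False.
elementwise_equal = bool(np.all(a == b))

# NaN-aware comparison: NaNs in matching positions count as equal.
nan_aware_equal = bool(np.array_equal(a, b, equal_nan=True))

print("np.all(a == b):", elementwise_equal)
print("np.array_equal(a, b, equal_nan=True):", nan_aware_equal)
```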

## Inspect data after processing second 38 kHz channel

In [15]:
# Ping time
processed_power_2.ping_time

array(['2017-06-09T00:57:36.074', '2017-06-09T00:57:36.745',
       '2017-06-09T00:57:37.336', '2017-06-09T00:57:37.938',
       '2017-06-09T00:57:38.539', '2017-06-09T00:57:39.140',
       '2017-06-09T00:57:39.751', '2017-06-09T00:57:40.352',
       '2017-06-09T00:57:40.963', '2017-06-09T00:57:41.564',
       '2017-06-09T00:57:42.155', '2017-06-09T00:57:42.755',
       '2017-06-09T00:57:43.356', '2017-06-09T00:57:43.948',
       '2017-06-09T00:57:44.549', '2017-06-09T00:57:45.150',
       '2017-06-09T00:57:45.751', '2017-06-09T00:57:46.342',
       '2017-06-09T00:57:46.934', '2017-06-09T00:57:47.525',
       '2017-06-09T00:57:48.126', '2017-06-09T00:57:48.718',
       '2017-06-09T00:57:49.308', '2017-06-09T00:57:49.899',
       '2017-06-09T00:57:50.500', '2017-06-09T00:57:51.092',
       '2017-06-09T00:57:51.692', '2017-06-09T00:57:52.293',
       '2017-06-09T00:57:52.895', '2017-06-09T00:57:53.495',
       '2017-06-09T00:57:54.097', '2017-06-09T00:57:54.698',
       '2017-06-09T00:57

In [16]:
# Range
processed_power_2.range

array([0.00000000e+00, 9.44639966e-02, 1.88927993e-01, ...,
       9.97539804e+01, 9.98484444e+01, 9.99429084e+01])
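
The printed range values look like a uniformly spaced axis starting at zero. As a sketch under that assumption, the axis can be rebuilt from the step size alone, and the standard two-way travel relation (range step = sound speed x sample interval / 2) gives the sound speed the step implies. The sound speed is not printed anywhere above, so the value computed here is an inference and not something read from the file.

```python
import numpy as np

SAMPLE_INTERVAL_S = 128e-6       # 512 us pulse length / 4
RANGE_STEP_M = 9.44639966e-02    # second element of the printed range array
N_SAMPLES = 1059                 # samples per ping for this channel

# Two-way travel: range_step = c * dt / 2, so c = 2 * range_step / dt.
implied_sound_speed = 2.0 * RANGE_STEP_M / SAMPLE_INTERVAL_S

# Rebuild the axis, assuming it starts at 0 and is evenly spaced.
range_m = np.arange(N_SAMPLES) * RANGE_STEP_M

print(f"implied sound speed: {implied_sound_speed:.1f} m/s")
print(f"last two range values: {range_m[-2]:.7f}, {range_m[-1]:.7f}")
```

The rebuilt axis ends at about 99.94 m, in line with the last value printed above, and the implied sound speed comes out at about 1476 m/s.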

In [17]:
# Power
processed_power_2.data

array([[  14.134299 ,   24.63507  ,   25.422924 , ..., -119.20082  ,
        -115.94358  , -114.26205  ],
       [  14.1225395,   24.63507  ,   25.422924 , ..., -119.377205 ,
        -123.63396  , -130.99509  ],
       [  14.1225395,   24.63507  ,   25.422924 , ..., -132.38264  ,
        -124.609955 , -123.37526  ],
       ...,
       [  14.1225395,   24.63507  ,   25.422924 , ..., -116.75495  ,
        -118.2601   , -122.22288  ],
       [  14.1225395,   24.63507  ,   25.422924 , ..., -123.41054  ,
        -117.90733  , -116.41394  ],
       [  14.1225395,   24.63507  ,   25.422924 , ..., -128.14941  ,
        -123.99849  , -119.6359   ]], dtype=float32)

<strong >Combine data from the first and second channels of the 38 kHz transceiver into a single raw data object.</strong>

In [18]:
print(raw_data_38_2)

# Append the 2nd object's data to the first and print out the results.
raw_data_38_1.append(raw_data_38_2)

# The result of this append is that raw_data_38_1 now contains data from 899
# pings.  The first 136 pings are the 2012 data and the next 763 the 2017
# data.  The sample data arrays are 899x1059 and the object contains 2 unique
# sample intervals.

print(raw_data_38_1)

# Insert the 2nd object's data into the first at ping 50.
raw_data_38_1.insert(raw_data_38_2, ping_number=50)

# Now raw_data_38_1 contains 1662 pings. Pings 1-50 are from the 2012 data.
# Pings 51-813 are the 763 pings from the 2017 data. Pings 814-899 are the
# rest of the 2012 data and pings 900-1662 are a second copy of the 2017 data.

print(raw_data_38_1)

<class 'echolab2.instruments.EK60.RawData'> at 0x125bbb520
                channel(s): [GPT  38 kHz 009072033fa2 2-1 ES38B]
    frequency (first ping): 38000.0
 pulse length (first ping): 0.000512
           data start time: 2017-06-09T00:57:36.074
             data end time: 2017-06-09T01:05:13.929
           number of pings: 763
    power array dimensions: (763,1059)
    angle array dimensions: (763,1059)

<class 'echolab2.instruments.EK60.RawData'> at 0x124b91960
                channel(s): [GPT  38 kHz 009072033fa2 1-1 ES38B, GPT  38 kHz 009072033fa2 2-1 ES38B]
    frequency (first ping): 38000.0
 pulse length (first ping): 0.001024
           data start time: 2012-02-14T23:10:11.298
             data end time: 2017-06-09T01:05:13.929
           number of pings: 899
    power array dimensions: (899,1059)
    angle array dimensions: (899,1059)

<class 'echolab2.instruments.EK60.RawData'> at 0x124b91960
                channel(s): [GPT  38 kHz 009072033fa2 1-1 ES38B, GPT  38 kHz 0090
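
The ping bookkeeping in the comments above can be reproduced with plain NumPy arrays. This is a sketch of the idea only, not pyEcholab's implementation: it assumes shorter pings are right-padded with NaN so that arrays with different sample counts can be stacked (consistent with the NaN values at the end of the first raw power row shown at the end of this section), and it fills the arrays with 0 for 2012 and 1 for 2017 so the origin of each ping is easy to check. `pad_samples` is a hypothetical helper.

```python
import numpy as np


def pad_samples(data, n_samples):
    """Right-pad the sample axis with NaN (hypothetical helper)."""
    padded = np.full((data.shape[0], n_samples), np.nan, dtype=data.dtype)
    padded[:, : data.shape[1]] = data
    return padded


power_2012 = np.zeros((136, 994), dtype=np.float32)   # stand-in for channel 2
power_2017 = np.ones((763, 1059), dtype=np.float32)   # stand-in for channel 7

n_samples = max(power_2012.shape[1], power_2017.shape[1])
block_2012 = pad_samples(power_2012, n_samples)
block_2017 = pad_samples(power_2017, n_samples)

# Append: the 2017 pings follow the 2012 pings.
appended = np.concatenate([block_2012, block_2017], axis=0)

# Insert after ping 50: split the appended array and place a second copy there.
inserted = np.concatenate([appended[:50], block_2017, appended[50:]], axis=0)

print("after append:", appended.shape)
print("after insert:", inserted.shape)
```

Under these assumptions the shapes match the printed summaries: 899x1059 after the append and 1662 pings after the insert.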

<strong >The new combined data needs to be processed</strong>

At this point, we have a 1662x1059 array with data recorded at two different sample intervals. To convert this data into a ProcessedData object, it has to be resampled to a constant sample interval. By default, the get_* methods resample to the shortest sample interval (highest resolution) in the data being returned. In our case, the 2012 pings, recorded with a sample interval of 256us, are resampled to 128us.
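
One simple way to picture the resampling is to repeat each 256us sample twice so that it lines up with a 128us grid. The sketch below does that with `np.repeat` on a small stand-in array. It is an illustration under that assumption and may differ from how pyEcholab resamples internally; `upsample_by_repetition` is a hypothetical name.

```python
import numpy as np


def upsample_by_repetition(samples, factor):
    """Repeat each sample `factor` times along the sample axis."""
    return np.repeat(samples, factor, axis=1)


# Two stand-in pings with 994 samples each, as in the 2012 data.
power_256us = np.arange(2 * 994, dtype=np.float32).reshape(2, 994)

factor = 256 // 128  # ratio of the two sample intervals
power_128us = upsample_by_repetition(power_256us, factor)

print("before:", power_256us.shape)
print("after: ", power_128us.shape)
```

Doubling 994 samples gives 1988, which matches the 1988-sample range axis reported for the processed data in the next cell.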

The files were also recorded with slightly different sound speed values. We are not supplying a constant sound speed (or any other calibration values) to the get_power method, so it will use the calibration parameter values stored in the RawData object. When no sound speed calibration data is provided, the get_* methods interpolate range using the sound speed that occurs most often in the data (in other words, they interpolate as few pings as possible).
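
Picking "the sound speed that occurs most often" amounts to taking the mode of the per-ping values. A minimal sketch with `np.unique` follows. The two sound speed values are hypothetical placeholders (the actual values stored in the files are not shown in this tutorial), and the ping counts assume the 1662-ping object built above: 136 pings from 2012 and two copies of the 763 pings from 2017.

```python
import numpy as np


def most_common_value(values):
    """Return the value that occurs most often in a 1-D array."""
    unique_values, counts = np.unique(values, return_counts=True)
    return unique_values[np.argmax(counts)]


# Hypothetical per-ping sound speeds in m/s.
sound_speed = np.concatenate([np.full(136, 1470.0), np.full(2 * 763, 1476.0)])

reference_speed = most_common_value(sound_speed)
needs_interpolation = sound_speed != reference_speed

print("reference sound speed:", reference_speed)
print("pings to interpolate:", int(needs_interpolation.sum()))
```

With these placeholder values, only the 136 pings from 2012 would need their range interpolated, which is the "fewest pings" behavior the paragraph describes.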

When we request data using the get_* methods, we can provide a time range or ping range to limit the data returned. If no range is given, all of the data is returned. By default, the data will be in time order. You can force the method to return data in ping order (the order in which it exists in the RawData object) by setting the time_order keyword to False. Advanced indexing can be done outside of the get_* methods and passed into them using the return_indices keyword.
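
The difference between ping order and time order can be sketched with a handful of timestamps taken from the outputs above, arranged the way an insert might leave them (2017 pings sitting between 2012 pings). The sketch only builds an index array with NumPy and does not call any pyEcholab method. According to the paragraph above, such an index array is the kind of thing the return_indices keyword accepts.

```python
import numpy as np

# Ping order after a hypothetical insert: 2012, 2017, 2017, 2012.
ping_time = np.array(
    [
        "2012-02-14T23:10:11.298",
        "2017-06-09T00:57:36.074",
        "2017-06-09T00:57:36.745",
        "2012-02-14T23:10:51.642",
    ],
    dtype="datetime64[ms]",
)

# Indices that put the pings in time order. A stable sort keeps pings with
# identical timestamps (e.g. duplicated data) in their original ping order.
time_order = np.argsort(ping_time, kind="stable")

print("ping order:", np.arange(ping_time.size).tolist())
print("time order:", time_order.tolist())
```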

In [19]:
# Call get_power to get a ProcessedData object that contains power data.  We
# provide no arguments so we get all pings ordered by time.
processed_power_1 = raw_data_38_1.get_power()
print(processed_power_1)

<class 'echolab2.processing.processed_data.ProcessedData'> at 0x125bb99f0
                channel(s): [GPT  38 kHz 009072033fa2 1-1 ES38B, GPT  38 kHz 009072033fa2 2-1 ES38B, GPT  38 kHz 009072033fa2 2-1 ES38B]
                 frequency: 38000.0
           data start time: 2012-02-14T23:10:11.298
             data end time: 2017-06-09T01:05:13.929
            number of pings: 1662
            data attributes: ping_time (1662)
                            data (1662,1988)
                            range (1988)



In [20]:
Sv = raw_data_38_1.get_Sv()
print(Sv)

<class 'echolab2.processing.processed_data.ProcessedData'> at 0x125bbb880
                channel(s): [GPT  38 kHz 009072033fa2 1-1 ES38B, GPT  38 kHz 009072033fa2 2-1 ES38B, GPT  38 kHz 009072033fa2 2-1 ES38B]
                 frequency: 38000.0
           data start time: 2012-02-14T23:10:11.298
             data end time: 2017-06-09T01:05:13.929
            number of pings: 1662
            data attributes: ping_time (1662)
                            data (1662,1988)
                            range (1988)



In [21]:
Sv.data[0]

array([ 11.2262094 ,  11.40182853,  19.63388228, ..., -73.62142424,
       -73.61524745, -73.60907288])

In [22]:
raw_data_38_1.power[0]

array([16.59192657, 24.99959946, 25.23477936, ...,         nan,
               nan,         nan])