# PREAMBLE

#### This notebook follows the same steps as `spectrogram_generator`, but for several datasets at once

A few libraries are imported here to run the code.
You only need to adapt `path_osmose_home`, which points to the OSmOSE working directory.


In [None]:
# FILL IN RED PARTS !
import os
import sys
from pathlib import Path

from OSmOSE import Spectrogram, Job_builder
from OSmOSE.utils.core_utils import display_folder_storage_info, list_dataset

sys.path.append(r"../src")
from utils_datarmor import generate_spectro, monitor_job

path_osmose_home = r"/home/datawork-osmose/"
path_osmose_dataset = os.path.join(path_osmose_home, "dataset")

jb = Job_builder()

display_folder_storage_info(path_osmose_home)

In [None]:
# FILL IN RED PARTS !
list_dataset(path_osmose_dataset, "dataset/name")

# Summary

**I. Select dataset** : choose your dataset to be processed and get key metadata on it

**II. Configure spectrograms** : define all spectrogram parameters, and adjust them based on spectrograms computed on the fly

**III. Generate spectrograms** : launch the complete generation of spectrograms

# I. Select dataset 

If your datasets are part of a recording campaign, please provide the campaign names in the list `list_campaign_name`; in that case each dataset should be present in `{path_osmose_dataset}/{campaign_name}/{dataset_name}`. Otherwise, set each campaign name to the empty string `""`.
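
As a minimal sketch of the folder layout described above (the helper `dataset_folder` is hypothetical and not part of OSmOSE), note that `pathlib` ignores empty components, so an empty campaign name simply yields `{path_osmose_dataset}/{dataset_name}`:

```python
from pathlib import PurePosixPath


def dataset_folder(path_osmose_dataset, campaign_name, dataset_name):
    """Hypothetical helper: build the folder where a dataset is expected."""
    # Empty path components are ignored, so campaign_name = "" is safe.
    return PurePosixPath(path_osmose_dataset, campaign_name, dataset_name)


# Dataset that belongs to a campaign
print(dataset_folder("/home/datawork-osmose/dataset", "APOCADO3", "C5D1_ST7181"))
# Dataset without a campaign
print(dataset_folder("/home/datawork-osmose/dataset", "", "C5D1_ST7181"))
```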

In [None]:
# FILL IN RED PARTS !
list_dataset_name = [
    "C5D1_ST7181",
    "C5D1_ST7194",
    "C5D2_ST7189",
    "C5D2_ST7190",
    "C5D3_ST7189",
    "C5D3_ST7190",
    "C5D4_ST7181",
    "C5D4_ST7194",
    "C5D5_ST7181",
    "C5D5_ST7194",
    "C5D6_ST7189",
    "C5D6_ST7190",
    "C5D7_ST7181",
    "C5D7_ST7194",
    "C5D8_ST7189",
    "C5D8_ST7190",
    "C5D9_ST7181",
    "C5D9_ST7194",
]

list_campaign_name = ["APOCADO3"] * len(list_dataset_name)

## Metadata of one dataset

Here you can display several parameters from a single dataset by selecting it with `i`

In [None]:
# FILL IN GREEN PART !
i = 0

dataset_name = list_dataset_name[i]
campaign_name = list_campaign_name[i]

dataset = Spectrogram(
    dataset_path=Path(path_osmose_dataset, campaign_name, dataset_name),
    owner_group="gosmose",
    local=False,
)

print(dataset)

# II. Configure spectrograms

Set your spectrogram parameters; they will be the same for all your datasets.

The following two parameters, `spectro_duration` (in s) and `dataset_sr` (in Hz), allow you to process your data using different file durations (i.e. segmentation) and/or sampling rates (i.e. resampling). `spectro_duration` is the maximal duration of the spectrogram display window.

To process audio files from your original folder (i.e. without any segmentation and/or resampling), use the original audio file duration and sample rate estimated when your dataset was uploaded (they are printed by the previous cell).

Then, you can set `zoom_level`, the number of zoom levels you want (they are used in our web-based annotation tool APLOSE). With `zoom_level = 0`, your shortest spectrogram display window has a duration of `spectro_duration` seconds (that is, no zoom at all); with `zoom_level = 1`, a duration of `spectro_duration`/2 seconds; with `zoom_level = 2`, a duration of `spectro_duration`/4 seconds; and so on.
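
The halving rule above can be sketched as follows. The function `zoom_window_durations` is an illustrative helper, not an OSmOSE function:

```python
def zoom_window_durations(spectro_duration, zoom_level):
    """Return the display window duration (in s) for each level from 0 to zoom_level."""
    # Each additional zoom level halves the duration of the display window.
    return [spectro_duration / 2**level for level in range(zoom_level + 1)]


print(zoom_window_durations(10, 2))  # [10.0, 5.0, 2.5]
```

The last value of the list is the duration of the shortest display window.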

After that, you can set the following standard spectrogram parameters: `nfft` (in samples), `window_size` (in samples) and `overlap` (in %). **Note that these parameters set the resolution of your shortest spectrogram display window, i.e. the one obtained with the highest zoom level.**
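
To get a feel for these values, here is a rough sketch of the resolution they imply. It relies on the usual short-time Fourier transform relations and assumes that the hop between two windows is rounded down to a whole number of samples; OSmOSE may round differently. The function `spectrogram_resolution` is illustrative only.

```python
def spectrogram_resolution(dataset_sr, nfft, window_size, overlap):
    """Return (frequency step in Hz, time step in s) of a spectrogram."""
    # Spacing between two consecutive frequency bins.
    frequency_step = dataset_sr / nfft
    # Number of samples between the starts of two consecutive windows
    # (assumption: rounded down to an integer).
    hop = int(window_size * (1 - overlap / 100))
    time_step = hop / dataset_sr
    return frequency_step, time_step


freq_step, time_step = spectrogram_resolution(128000, 1024, 1024, 20)
print(f"{freq_step} Hz per bin, {time_step * 1000:.2f} ms per column")
```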

Finally:
- `batch_number` indicates the number of concurrent jobs. A higher number can speed things up, but only up to a point, and this feature is not yet fully reliable.

- `save_matrix` should be set to `True` if you want to generate the numpy matrices along with your png spectrograms.

### /!\ These parameters will be applied to all the selected datasets /!\

In [None]:
# FILL IN GREEN PARTS !
spectro_duration = 10
dataset_sr = 128000

zoom_level = 0

nfft = 1024
window_size = 1024
overlap = 20

batch_number = 10

save_matrix = False
force_init = False

#### Amplitude normalization 

Finally, several modes of data/spectrogram normalization are available.

Normalization over raw data samples with the variable `data_normalization_param` (default value `'none'`, i.e. no normalization):
- instrument-based normalization with the three parameters `sensitivity` (in dB, default value = 0), `gain_dB` (in dB, default value = 0) and `peak_voltage` (in V, default value = 1), set per dataset through `list_sensitivity`, `list_gain_dB` and `list_peak_voltage`. With the default values, no normalization is performed;

- z-score normalization over a given time period through the variable `zscore_duration`, applied directly to your raw time series. The possible values are:
    - `zscore_duration = 'original'`: the audio file duration will be used as the time period;
    - `zscore_duration = '10H'`: any time period given as a string using a standard [time alias](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases). This period should be longer than your file duration.
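
The two raw-data modes can be illustrated with the sketch below. Both functions are illustrative and are not the OSmOSE implementation. The instrument formula is an assumption based on a common hydrophone calibration convention (samples in full scale between -1 and 1, sensitivity in dB re 1 V/µPa); it is at least consistent with the statement that the default values leave the data unchanged.

```python
import statistics


def instrument_normalize(samples, sensitivity=0.0, gain_dB=0.0, peak_voltage=1.0):
    """Scale full-scale samples to pressure units (assumed convention)."""
    # Full scale -> volts with peak_voltage, then volts -> pressure with the
    # combined sensitivity and gain (both in dB).
    factor = peak_voltage / 10 ** ((sensitivity + gain_dB) / 20)
    return [s * factor for s in samples]


def zscore_normalize(samples):
    """Shift the samples to zero mean and scale them to unit standard deviation."""
    mean = statistics.fmean(samples)
    std = statistics.pstdev(samples)
    return [(s - mean) / std for s in samples]


raw = [0.5, -0.25, 0.1, -0.4]
print(instrument_normalize(raw))  # default values: samples are unchanged
print(zscore_normalize(raw))
```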

Normalization over spectra with the variable `spectro_normalization_param` (default value `'density'`, see OSmOSEanalytics/documentation/theory_spectrogram.pdf for details):
- density-based normalization by setting `spectro_normalization_param = 'density'`
- spectrum-based normalization by setting `spectro_normalization_param = 'spectrum'`
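
As a hedged illustration of how the two modes differ, the sketch below computes the scaling factors commonly applied to the squared FFT magnitude (the convention used, for instance, by `scipy.signal.spectrogram`). The Hann window and these exact conventions are assumptions here; the theory document mentioned above remains the reference for OSmOSE.

```python
import math


def normalization_scales(window, sample_rate):
    """Return (density scale, spectrum scale) applied to |FFT|^2."""
    sum_w = sum(window)
    sum_w2 = sum(w * w for w in window)
    density_scale = 1.0 / (sample_rate * sum_w2)  # power per Hz
    spectrum_scale = 1.0 / sum_w**2  # power per frequency bin
    return density_scale, spectrum_scale


n = 1024
# Periodic Hann window (assumed window shape).
hann = [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]
density_scale, spectrum_scale = normalization_scales(hann, 128000)

# Constant offset, in dB, between a 'spectrum' and a 'density' spectrogram.
offset_dB = 10 * math.log10(spectrum_scale / density_scale)
print(f"{offset_dB:.1f} dB")
```

Under these assumptions the two modes differ only by a constant offset in dB, which depends on the window and the sampling rate.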

In the cell below, you can also set the amplitude dynamics in dB through the parameters `dynamic_max` and `dynamic_min`, choose the colormap to be used with `colormap` (see possible options in the [documentation](https://matplotlib.org/stable/tutorials/colors/colormaps.html)) and specify the cutoff frequency `hp_filter_min_freq` of a high-pass filter if needed.

In [None]:
# FILL IN GREEN PARTS !
list_sensitivity = [
    -175.9,
    -175.7,
    -174.5,
    -174.7,
    -174.5,
    -174.7,
    -175.9,
    -175.7,
    -175.9,
    -175.7,
    -174.5,
    -174.7,
    -175.9,
    -175.7,
    -174.5,
    -174.7,
    -175.9,
    -175.7,
]

list_gain_dB = [0] * len(list_sensitivity)  # parameter for 'instrument' mode
list_peak_voltage = [2] * len(list_sensitivity)  # parameter for 'instrument' mode

In [None]:
# FILL IN RED and GREEN PARTS !
data_normalization_param = "instrument"  # 'instrument' OR 'zscore' OR 'none'
spectro_normalization_param = "density"  # 'density' OR 'spectrum'
zscore_duration = ""  # parameter for 'zscore' mode, values = time alias OR 'original'
dynamic_min = 0  # dB
dynamic_max = 120  # dB
colormap = "viridis"
hp_filter_min_freq = 1  # Hz

In [None]:
list_datetime_begin = [
    None,
    None,
    None,
    None,
    None,
    None,
    None,
    None,
    None,
    None,
    None,
    None,
    None,
    None,
    None,
    None,
    None,
    None,
]

list_datetime_end = [
    None,
    None,
    None,
    None,
    None,
    None,
    None,
    None,
    None,
    None,
    None,
    None,
    None,
    None,
    None,
    None,
    None,
    None,
]

In [None]:
# JUST RUN THIS CELL : NOTHING TO FILL !

for campaign_name, dataset_name, sensitivity, gain_dB, peak_voltage, datetime_begin, datetime_end in zip(
    list_campaign_name,
    list_dataset_name,
    list_sensitivity,
    list_gain_dB,
    list_peak_voltage,
    list_datetime_begin,
    list_datetime_end,
):

    print(f"\n### {dataset_name}")

    dataset = Spectrogram(
        dataset_path=Path(path_osmose_dataset, campaign_name, dataset_name),
        owner_group="gosmose",
        local=False,
    )

    dataset.spectro_duration = spectro_duration
    dataset.dataset_sr = dataset_sr
    dataset.nfft = nfft
    dataset.window_size = window_size
    dataset.overlap = overlap
    dataset.data_normalization = data_normalization_param
    dataset.zscore_duration = zscore_duration
    dataset.sensitivity = sensitivity
    dataset.gain_dB = gain_dB
    dataset.peak_voltage = peak_voltage
    dataset.spectro_normalization = spectro_normalization_param
    dataset.dynamic_max = dynamic_max
    dataset.dynamic_min = dynamic_min
    dataset.colormap = colormap
    dataset.hp_filter_min_freq = hp_filter_min_freq
    dataset.batch_number = batch_number

    ## segmentation
    dataset.initialize(
        env_name=sys.executable.replace("/bin/python", ""),
        force_init=force_init,
        datetime_begin=datetime_begin,
        datetime_end=datetime_end,
    )

    ## spectrogram generation
    generate_spectro(
        dataset=dataset,
        path_osmose_dataset=path_osmose_dataset,
        overwrite=True,
        save_matrix=save_matrix,
        datetime_begin=datetime_begin,
        datetime_end=datetime_end,
    )

### Track progress
You can monitor the status of specific jobs by passing their IDs to `monitor_job`, either as a list, e.g. `file_list = ['job1_ID', 'job2_ID']`, or as a single string, e.g. `file_list = 'job1_ID'`.

In [None]:
# FILL IN RED PART !
monitor_job(["9893958.datarmor0", "9893959.datarmor0", "9893960.datarmor0"])