# Save Different Stages of the Chirp Sequence Pipeline as MAT Files

If you haven't gone through the `Walkthrough.ipynb` notebook, please take a look at that notebook first, as it contains descriptions of all configurable parameters used here.

In [None]:
import Pkg;
Pkg.develop(path="../BatlabJuliaUtils")
using BatlabJuliaUtils
using Plots;
using Printf;
using MAT;
using Statistics;
using DataStructures;

## Specify MAT Files

In [None]:
AUDIO_FILENAME = "../data/Pu166_02.mat";
CENTROID_FILENAME = "../data/centroid/Pu166_002_centroidxyz.mat";
MIC_POSITION_FILENAME = "../data/mic_positions_fall2021.mat";

CENTROID_VARIABLE_NAME = collect(keys(matread(CENTROID_FILENAME)))[1];
MIC_POSITION_VARIABLE_NAME = collect(keys(matread(MIC_POSITION_FILENAME)))[1];

In [None]:
(@printf "Do the variable names for these MAT files look right?\n\tFor the centroid file: \"%s\",\n\tand for the mic position file: \"%s\"" CENTROID_VARIABLE_NAME MIC_POSITION_VARIABLE_NAME)

If the variable names for the MAT files don't look right, then uncomment and run the following two cells.

In [None]:
# println("The keys of the centroid file are:\n", keys(matread(CENTROID_FILENAME)), "\nand the MAT file looks like")
# centroids = matread(CENTROID_FILENAME)

In [None]:
# println("The keys of the mic position file are:\n", keys(matread(MIC_POSITION_FILENAME)), "\nand the MAT file looks like")
# mic_locations = matread(MIC_POSITION_FILENAME)

## Set up where to save data

In [None]:
## Makes a new folder in the current directory to store the data
mkpath("saved_chirp_sequences");

Each function that saves data takes the arguments `dataset_name` and `save_dir`, along with others such as the microphone and centroid data.
- `dataset_name` is the name of the dataset, e.g. `Pu166_01`. The name of any MAT file saved by this notebook will start with this dataset name.
- `save_dir` is the directory in which to save the data. A default directory was created in the cell above.
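
As a rough illustration of this naming scheme, the sketch below builds the path of each stage's MAT file from `save_dir` and `dataset_name`. The helper `stage_filename` and the table `STAGE_SUFFIXES` are hypothetical names introduced here, not part of `BatlabJuliaUtils`; the suffixes are the ones used by the `@sprintf` calls later in this notebook.

```julia
# Hypothetical lookup table (not part of BatlabJuliaUtils): file-name suffix per stage.
STAGE_SUFFIXES = Dict(
    1 => "high_snr_regions",
    2 => "chirp_sequences",
    3 => "chirps_and_melodies",
    4 => "optimization_result",
)

# Hypothetical helper: path of the MAT file saved for a given stage.
stage_filename(save_dir, dataset_name, stage) =
    joinpath(save_dir, string(dataset_name, "_", STAGE_SUFFIXES[stage], ".mat"))

for stage in 1:4
    println(stage_filename("./saved_chirp_sequences", "Pu166_02", stage))
end
```

Using `joinpath` rather than a hard-coded `/` keeps the path valid on operating systems with a different separator.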

In [None]:
DATASET_NAME = "Pu166_02";
SAVE_DIR = "./saved_chirp_sequences";

## Read in microphone and centroid data

In [None]:
## Read in microphone data
y = readmicdata(AUDIO_FILENAME);

In [None]:
centroids = Matrix(transpose( matread(CENTROID_FILENAME)[CENTROID_VARIABLE_NAME]));
mic_positions = Matrix(transpose( matread(MIC_POSITION_FILENAME)[MIC_POSITION_VARIABLE_NAME]));

## Stage 1: High-SNR Regions
**Note**: to run later sections, you need to run the **Parameters** and **Load Helper Functions** parts of all previous sections. You don't need to run the **Save Data** part of a previous stage, however: if you only want the output from Stage 2, for example, you can skip the **Save Data** part of Stage 1. Please do read through the description of the saved data for all stages, so that you understand what is saved for each stage.
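
The **Parameters** cells of earlier stages matter because each stage's parameter dictionary splats the previous stage's dictionary into its own (for example, `CHIRP_SEQ_PARAMS` contains `HIGH_SNR_PARAMS...`), and is then splatted into the save function as keyword arguments. A minimal sketch of that pattern, using made-up values and a hypothetical function `describe` that is not part of `BatlabJuliaUtils`:

```julia
# Illustrative values only; the real dictionaries are defined in the Parameters cells.
stage1_params = Dict(:signal_thresh => 30, :min_peak_thresh => 35)

# Splatting a Dict inside the Dict constructor copies its key => value pairs.
stage2_params = Dict(:any_mic_snr_thresh => 45, stage1_params...)

# Hypothetical function: keyword arguments it does not name are collected in `kwargs`.
function describe(; any_mic_snr_thresh=0, kwargs...)
    return (any_mic_snr_thresh, sort(collect(keys(kwargs))))
end

# A Dict with Symbol keys can be splatted after `;` as keyword arguments.
result = describe(; stage2_params...)
println(result)
```

If an earlier **Parameters** cell has not been run, the splatted dictionary is undefined and the later cell fails with an `UndefVarError`.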

### Parameters (From `Walkthrough.ipynb`)

In [None]:
### SET THESE PARAMETERS ###
SIGNAL_THRESH = 30
MAXFILTER_LENGTH_MS = 0.1
MAXFILTER_LENGTH = Int64(round(MAXFILTER_LENGTH_MS / 1000 * FS));

MIN_PEAK_THRESH = 35;
SNR_DROP_THRESH = 20;
PEAK_SNR_THRESH_RADIUS = 2500;

TAIL_SNR_THRESH = 20;
TAIL_MAXFILTER_LENGTH = 50;
#############################
HIGH_SNR_PARAMS = Dict(
    :signal_thresh => SIGNAL_THRESH,
    :maxfilter_length => MAXFILTER_LENGTH,
    :min_peak_thresh => MIN_PEAK_THRESH,
    :snr_drop_thresh => SNR_DROP_THRESH,
    :peak_snr_thresh_radius => PEAK_SNR_THRESH_RADIUS,
    :tail_snr_thresh => TAIL_SNR_THRESH,
    :tail_maxfilter_length => TAIL_MAXFILTER_LENGTH
);

### Load Helper Functions

In [None]:
include("SaveHighSnrRegions.jl");

In [None]:
function getmicfield(mat_data, name_after_mic_k_, mic)
    return mat_data["mic_" * string(mic) * "_" * name_after_mic_k_];
end

### Save Data

In [None]:
savehighsnrregions(y, DATASET_NAME, SAVE_DIR; HIGH_SNR_PARAMS...);

### Breakdown of saved data
A MAT file will be stored in `save_dir` from above, with the name format `{dataset_name}_high_snr_regions.mat` (for example, `Pu166_01_high_snr_regions.mat`).

In [None]:
filename_stage1 = (@sprintf "%s/%s_high_snr_regions.mat" SAVE_DIR DATASET_NAME)

In [None]:
saved_mat_data_stage1 = matreadsorted(filename_stage1)

In [None]:
mics = Vector{Int}(undef, 0);
num_high_snr_regions_per_mic = Dict();
for key=keys(saved_mat_data_stage1)
    maybe_match = match(r"mic_(\d+)_high_snr_region_onsets_ms", key); # \d+ also matches mic numbers with more than one digit
    if isnothing(maybe_match)
        continue
    end
    mic = parse(Int64, maybe_match[1]);
    mics = vcat(mics, mic);
    num_high_snr_regions_per_mic[mic] = length(saved_mat_data_stage1[key]);
    @printf "***For mic %d, there were %d high-SNR regions found.***\n" mic length(saved_mat_data_stage1[key])
end
println("Mics that found high-SNR regions: ", mics)

**The variables in the mat file are**:
- `mic_k_data_per_high_snr_region`: array where each column is a different
    high-SNR region for microphone `k`. Zeros are added to the end of each column to make all columns
    the same length.
- `mic_k_high_snr_region_lengths`: length, in audio samples, of each high-SNR region.
- `mic_k_high_snr_region_onsets_ms`: time, in milliseconds since the beginning of
    the microphone data, that the high-SNR region starts.
- `mic_k_snr_data_per_high_snr_region`: SNR of each high-SNR region, in the same
    format as `mic_k_data_per_high_snr_region`.

`k` is the number of any microphone that found at least one high-SNR region.
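
To make the zero-padded layout concrete, here is a small self-contained sketch. It builds a toy dictionary with the same key names as the saved MAT file (all values are made up) and recovers one region without its padding. The helper `trimmed_region` is a hypothetical name, and the `Int(...)` conversion assumes the lengths may be read back from the MAT file as floating-point numbers.

```julia
# Toy high-SNR regions of different lengths for one microphone (made-up values).
regions = [[0.1, -0.2, 0.3], [0.5, 0.4], [0.9, -0.8, 0.7, -0.6]]
lengths = length.(regions)

# Pad each region with trailing zeros so every column has the same number of rows.
padded = zeros(maximum(lengths), length(regions))
for (col, region) in enumerate(regions)
    padded[1:length(region), col] = region
end

# Same key names as the saved file, but toy contents.
toy_mat_data = Dict(
    "mic_1_data_per_high_snr_region" => padded,
    "mic_1_high_snr_region_lengths" => lengths,
    "mic_1_high_snr_region_onsets_ms" => [12.5, 40.0, 97.25],
)

# Hypothetical helper: region `i` of mic `k`, with the zero padding removed.
function trimmed_region(mat_data, k, i)
    len = Int(mat_data["mic_$(k)_high_snr_region_lengths"][i])
    return mat_data["mic_$(k)_data_per_high_snr_region"][1:len, i]
end

println(trimmed_region(toy_mat_data, 1, 2))  # [0.5, 0.4]
```

The stored lengths are what distinguish padding from genuine zero-valued samples, so trimming should always use them rather than searching for trailing zeros.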

**We can get the mic data for the `i`-th high-SNR region for microphone `k` as follows:**

In [None]:
k = 1; # SET TO THE MIC YOU WANT
i = 10; # SET TO WHICH HIGH-SNR REGION YOU WANT

@assert !isnothing(findfirst(mics .== k)) "Mic chosen didn't find any high-SNR regions!"
@assert i <= num_high_snr_regions_per_mic[k] "i is larger than the number of high-SNR regions found!"

In [None]:
high_snr_region_length = getmicfield(saved_mat_data_stage1, "high_snr_region_lengths", k)[i];
mic_data = getmicfield(saved_mat_data_stage1, "data_per_high_snr_region", k);

## Remove extra zeros at the end!
mic_data = mic_data[1:high_snr_region_length, i]
plotSTFTtime(mic_data, noverlap=255)

## Stage 2: Chirp Sequences

### Parameters (From `Walkthrough.ipynb`)

In [None]:
### SET THESE PARAMETERS ###
TEMPORAL_TOLERANCE_MS = 2;
SINGLE_MIC_SNR_THRESH = 85; 
ANY_MIC_SNR_THRESH = 45; 
#############################
CHIRP_SEQ_PARAMS = Dict(
    :vocalization_start_tolerance_ms => TEMPORAL_TOLERANCE_MS,
    :single_mic_snr_thresh => SINGLE_MIC_SNR_THRESH,
    :any_mic_snr_thresh => ANY_MIC_SNR_THRESH,
    HIGH_SNR_PARAMS...
);

### Load Helper Functions

In [None]:
include("SaveChirpSequences.jl");

### Save Data

In [None]:
savechirpsequences(y, centroids, mic_positions, DATASET_NAME, SAVE_DIR; CHIRP_SEQ_PARAMS...);

In [None]:
filename_stage2 = (@sprintf "%s/%s_chirp_sequences.mat" SAVE_DIR DATASET_NAME)
saved_mat_data_stage2 = matreadsorted(filename_stage2)

### Breakdown of saved data

**`vocalization_times`**: list of estimated times that the bat vocalized.

In [None]:
vocalization_times = saved_mat_data_stage2["vocalization_times"];
println("There were ", length(vocalization_times), " vocalizations!");
myplot(vocalization_times, ones(length(vocalization_times)), line=:stem, marker=:circle, color=:1, markersize=5,
    title="Vocalizations ", xlabel="Milliseconds", ylabel="", size=(1200, 200), yrange=(0, 1.2), legend=false)

**`valid_mics`**: matrix with N columns and 4 rows, where N is the total number of chirp sequences. Each column consists of 0s and 1s for whether each microphone heard anything for the chirp sequence corresponding to that column.

For instance, if `valid_mics` is
```
1   1   0   1   1
0   0   0   0   0
1   0   1   1   1
0   1   1   0   0
```
this means that the first vocalization was picked up by microphones 1 and 3, the second vocalization was picked up by microphones 1 and 4, the third was picked up by microphones 3 and 4, _etc._
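
The same lookup can be done programmatically. The sketch below uses the example matrix above; `mics_for_vocalization` and `vocalizations_for_mic` are hypothetical helper names introduced for illustration.

```julia
valid_mics_example = [1 1 0 1 1;
                      0 0 0 0 0;
                      1 0 1 1 1;
                      0 1 1 0 0]

# Microphones (rows) that picked up vocalization `j` (a column).
mics_for_vocalization(valid_mics, j) = findall(!iszero, valid_mics[:, j])

# Vocalizations (columns) that microphone `k` (a row) picked up.
vocalizations_for_mic(valid_mics, k) = findall(!iszero, valid_mics[k, :])

println(mics_for_vocalization(valid_mics_example, 1))  # [1, 3]
println(mics_for_vocalization(valid_mics_example, 2))  # [1, 4]
println(vocalizations_for_mic(valid_mics_example, 4))  # [2, 3]
```

Using `!iszero` rather than `== 1` means the helpers also work if the matrix is read back from the MAT file as floating-point values.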

In [None]:
valid_mics_stage2 = saved_mat_data_stage2["valid_mics"]

**Rest of the variables:**
- `mic_k_chirp_seq_lengths`: length, in samples, of the single-mic chirp sequence heard by microphone `k` for each vocalization. This will be `0` for vocalizations that mic `k` did not pick up.
- `mic_k_data_per_chirp_seq`: oscillogram data for the single-mic chirp sequence heard by microphone `k` for each vocalization. Zeros are added to the end of each column to make all columns the same length. This is all zeros for vocalizations that mic `k` did not pick up.
- `mic_k_snr_data_per_chirp_seq`: same as `mic_k_data_per_chirp_seq`, but SNR data instead of oscillogram data.
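
Putting `valid_mics` and the per-mic variables together, the following sketch collects the unpadded data from every microphone that heard a given vocalization. It runs on a toy dictionary with the same key names as the Stage 2 file (two mics, three vocalizations, made-up values); `chirp_seq_per_mic` is a hypothetical helper, not a function from `BatlabJuliaUtils`.

```julia
# Toy stand-in for the Stage 2 MAT contents (made-up values).
toy_stage2 = Dict(
    "valid_mics" => [1 0 1; 1 1 0],
    "mic_1_chirp_seq_lengths" => [3, 0, 2],
    "mic_1_data_per_chirp_seq" => [0.1 0.0 0.7; 0.2 0.0 0.8; 0.3 0.0 0.0],
    "mic_2_chirp_seq_lengths" => [2, 3, 0],
    "mic_2_data_per_chirp_seq" => [0.4 0.6 0.0; 0.5 0.7 0.0; 0.0 0.8 0.0],
)

# Hypothetical helper: maps each mic that heard vocalization `j` to its unpadded data.
function chirp_seq_per_mic(mat_data, j)
    out = Dict{Int, Vector{Float64}}()
    for k in findall(!iszero, mat_data["valid_mics"][:, j])
        len = Int(mat_data["mic_$(k)_chirp_seq_lengths"][j])
        out[k] = mat_data["mic_$(k)_data_per_chirp_seq"][1:len, j]
    end
    return out
end

per_mic = chirp_seq_per_mic(toy_stage2, 1)
println(sort(collect(keys(per_mic))))  # [1, 2]
```

Mics that did not hear the vocalization are simply absent from the result, which avoids passing an empty signal to a plotting function.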

**Get microphone data for a vocalization**:

In [None]:
MIC = 1; ## Change this to any: 1, 2, 3, or 4 ##
CHIRP_NUM = 1; # You can change this too
if valid_mics_stage2[MIC, CHIRP_NUM] == 0
    println("Mic ", MIC, " did not hear vocalization ", CHIRP_NUM, ".");
else
    seq_length = getmicfield(saved_mat_data_stage2, "chirp_seq_lengths", MIC)[CHIRP_NUM];
    data = getmicfield(saved_mat_data_stage2, "data_per_chirp_seq", MIC)[1:seq_length, CHIRP_NUM];
    plotSTFTtime(data, noverlap=255)
end

## Stage 3: Chirps and Melodies

### Parameters (From `Walkthrough.ipynb`)

In [None]:
### SET THESE PARAMETERS ###
MAXIMUM_MELODY_SLOPE = 5;
MELODY_DROP_THRESH_DB = 20;
FIND_HIGHEST_SNR_IN_FIRST_MS = 1.5;
MELODY_THRESH_DB_LOW = -20;
MOVING_AVG_SIZE = 10;
MELODY_DROP_THRESH_DB_START = 35; 
#############################
CHIRP_MELODY_PARAMS = Dict(
    :maximum_melody_slope => MAXIMUM_MELODY_SLOPE,
    :melody_drop_thresh_db => MELODY_DROP_THRESH_DB,
    :melody_thresh_db_low => MELODY_THRESH_DB_LOW,
    :moving_avg_size => MOVING_AVG_SIZE,
    :melody_drop_thresh_db_start => MELODY_DROP_THRESH_DB_START,
    :find_highest_snr_in_first_ms => FIND_HIGHEST_SNR_IN_FIRST_MS,
    CHIRP_SEQ_PARAMS...
);

### Load Helper Functions

In [None]:
include("SaveChirpsAndMelodies.jl");

### Save Data

In [None]:
savechirpsandmelodies(y, centroids, mic_positions, DATASET_NAME, SAVE_DIR; CHIRP_MELODY_PARAMS...);

In [None]:
filename_stage3 = (@sprintf "%s/%s_chirps_and_melodies.mat" SAVE_DIR DATASET_NAME)
saved_mat_data_stage3 = matreadsorted(filename_stage3)

### Breakdown of saved data

**`valid_mics`**: exactly the same as in Stage 2.

**`highest_snr_estimated_chirp_per_chirp_seq`**: matrix where each column corresponds to a different bat vocalization. Each column contains the highest-SNR estimate of the bat vocalization for the corresponding multi-mic chirp sequence.

**`highest_snr_chirp_length_per_chirp_seq`**: length, in samples, of the chirps in `highest_snr_estimated_chirp_per_chirp_seq`.

**`highest_snr_melody_kHz_per_chirp_seq`**: matrix where each column corresponds to a different bat vocalization. Each column is the fundamental harmonic, in kHz, estimated using the corresponding column of `highest_snr_estimated_chirp_per_chirp_seq`.

**`updated_vocalization_times`**: updated, and hopefully more accurate, estimates of when each bat vocalization happened.

**Rest of the variables:**

- `mic_k_chirp_lengths`: length, in samples, of the vocalization estimated by mic `k` for each multi-mic chirp sequence. This (and all subsequent variables listed here) will be `0` for vocalizations that mic `k` did not pick up.
- `mic_k_estimated_chirp_per_chirp_seq`: matrix where each column corresponds to a different bat vocalization. Each column contains the vocalization that mic `k` estimated for the corresponding multi-mic chirp sequence. 
- `mic_k_melody_kHz_per_chirp_seq`: matrix where each column corresponds to a different bat vocalization.  Each column is the fundamental harmonic, in kilohertz, estimated by mic `k`, of the bat vocalization. This will be the same length as `mic_k_estimated_chirp_per_chirp_seq`.
- `mic_k_samples_cut_off_from_beginning_per_chirp_seq`: the number of samples, if any, that were cut off from the beginning of the corresponding estimated chirp due to low SNR.
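
As an illustration of working with the per-mic variables, the sketch below trims each melody to its chirp length and reports its first and last frequency, skipping vocalizations the mic did not pick up. The dictionary holds made-up values under the same key names as the Stage 3 file, and `melody_endpoints` is a hypothetical helper.

```julia
# Toy stand-in for Stage 3 data from one mic (made-up values).
toy_stage3 = Dict(
    "mic_1_chirp_lengths" => [4, 0, 3],
    "mic_1_melody_kHz_per_chirp_seq" => [60.0 0.0 55.0;
                                         50.0 0.0 45.0;
                                         40.0 0.0 35.0;
                                         30.0 0.0 0.0],
)

# Hypothetical helper: for each vocalization mic `k` picked up, return the first
# and last melody value (kHz) after removing the zero padding.
function melody_endpoints(mat_data, k)
    lengths = mat_data["mic_$(k)_chirp_lengths"]
    melodies = mat_data["mic_$(k)_melody_kHz_per_chirp_seq"]
    out = Dict{Int, Tuple{Float64, Float64}}()
    for (j, len) in enumerate(lengths)
        len == 0 && continue  # mic `k` did not pick up vocalization `j`
        melody = melodies[1:Int(len), j]
        out[j] = (first(melody), last(melody))
    end
    return out
end

println(melody_endpoints(toy_stage3, 1)[1])  # (60.0, 30.0)
```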

In [None]:
updated_vocalization_times = saved_mat_data_stage3["updated_vocalization_times"];
println("There were ", length(updated_vocalization_times), " vocalizations!");
# myplot(updated_vocalization_times, ones(length(updated_vocalization_times)), line=:stem, marker=:circle, color=:1, markersize=5,
#     title="Vocalizations ", xlabel="Milliseconds", ylabel="", size=(1200, 200), yrange=(0, 1.2), legend=false)

**Get the estimated chirp and melody for a vocalization, using the highest-SNR data**:

In [None]:
CHIRP_NUM = 8; # You can change this to look at another vocalization
len = saved_mat_data_stage3["highest_snr_chirp_length_per_chirp_seq"][CHIRP_NUM];
estimated_chirp = saved_mat_data_stage3["highest_snr_estimated_chirp_per_chirp_seq"][1:len, CHIRP_NUM];
estimated_melody_kHz = saved_mat_data_stage3["highest_snr_melody_kHz_per_chirp_seq"][1:len, CHIRP_NUM];

In [None]:
plotSTFTtime(estimated_chirp, noverlap=255);
plot!(audioindextoms.((1:len) .+ 128), estimated_melody_kHz, linewidth=3, color=:blue)

## Stage 4: Optimized Chirps

**Note**: this section is not for the impatient; the `saveoptimizationresult` cell takes a long time to run.

### Parameters (From `Walkthrough.ipynb`)

In [None]:
### SET THESE PARAMETERS ###
H_FFT_THRESH = 0.1;
DATA_FITTING_WEIGHT = 70;
H_SPARSITY_WEIGHT = 10;
MELODY_WEIGHT = 35;
MAX_ITER = 10000;
MELODY_RADIUS_START = 10;
MELODY_RADIUS_END = 0;
#############################
OPT_PARAMS = Dict(
    :h_fft_thresh => H_FFT_THRESH,
    :data_fitting_weight => DATA_FITTING_WEIGHT,
    :h_sparsity_weight => H_SPARSITY_WEIGHT,
    :melody_weight => MELODY_WEIGHT,
    :max_iter => MAX_ITER,
    :melody_radius_start => MELODY_RADIUS_START,
    :melody_radius_end => MELODY_RADIUS_END,
    CHIRP_MELODY_PARAMS...
);

### Load Helper Functions

In [None]:
include("SaveOptimizationResult.jl");

### Save Data

In [None]:
saveoptimizationresult(y, centroids, mic_positions, DATASET_NAME, SAVE_DIR; OPT_PARAMS...);

In [None]:
filename_stage4 = (@sprintf "%s/%s_optimization_result.mat" SAVE_DIR DATASET_NAME)
saved_mat_data_stage4 = matreadsorted(filename_stage4)

### Breakdown of saved data

**`valid_mics`**: exactly the same as in Stage 2.

**`estimated_vocalizations`**: matrix where each column corresponds to a different bat vocalization. Each column is what the optimization algorithm estimates the bat vocalization to be.

**`estimated_vocalization_lengths`**: the estimated length, in microphone samples, of each bat vocalization.

**`mic_k_impulse_response_per_chirp_seq`**: matrix where each column corresponds to a different bat vocalization. Each column is the impulse response that maps the estimated vocalization to the corresponding microphone output. This is all zero if the microphone didn't pick up the vocalization.

**`impulse_response_lengths`**: the length of the impulse responses for each vocalization. The impulse response for each microphone has the same length.
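
To illustrate what it means for an impulse response to map a vocalization to a microphone output, the sketch below assumes the mapping is an ordinary linear convolution. That is the standard reading of "impulse response", but the exact model used by the optimization code (for example, how propagation delay is handled) is not shown in this notebook. The signals are made up, and `convolve` is a hand-written helper so that no extra package is required.

```julia
# Direct (full) linear convolution of two vectors.
function convolve(x::AbstractVector, h::AbstractVector)
    y = zeros(promote_type(eltype(x), eltype(h)), length(x) + length(h) - 1)
    for (n, xn) in enumerate(x), (m, hm) in enumerate(h)
        y[n + m - 1] += xn * hm
    end
    return y
end

# Toy vocalization and toy impulse response: a direct path plus one echo that
# arrives 2 samples later at half amplitude (made-up values).
vocalization = [1.0, 2.0, 3.0]
impulse_response = [1.0, 0.0, 0.5]

predicted_mic_output = convolve(vocalization, impulse_response)
println(predicted_mic_output)  # [1.0, 2.0, 3.5, 1.0, 1.5]
```

With the saved data, the analogous inputs would be a trimmed column of `estimated_vocalizations` and a trimmed column of `mic_k_impulse_response_per_chirp_seq`.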

**Get the estimated chirp for a vocalization**:

_Run the following cell once_:

In [None]:
i = 0;

_Then run this cell again and again to loop through the vocalizations:_

In [None]:
i = min(i+1, length(saved_mat_data_stage4["estimated_vocalization_lengths"]));
(@printf "Vocalization %d\n" i);
if i == length(saved_mat_data_stage4["estimated_vocalization_lengths"])
    println("(this is the last vocalization)");
end

CHIRP_NUM = i; # You can change this to look at another vocalization
len = saved_mat_data_stage4["estimated_vocalization_lengths"][CHIRP_NUM];
estimated_chirp = saved_mat_data_stage4["estimated_vocalizations"][1:len, CHIRP_NUM];
plotSTFTtime(estimated_chirp, noverlap=255)

**Compare with Highest-SNR Mic Output**

The top plot is the output of the optimization algorithm (should be de-echoed), and the bottom plot is what the highest-SNR microphone picked up.

In [None]:
i = 0;

In [None]:
i = min(i+1, length(saved_mat_data_stage4["estimated_vocalization_lengths"]));
(@printf "Vocalization %d\n" i);
if i == length(saved_mat_data_stage4["estimated_vocalization_lengths"])
    println("(this is the last vocalization)");
end

CHIRP_NUM = i; # You can change this to look at another vocalization
len = saved_mat_data_stage4["estimated_vocalization_lengths"][CHIRP_NUM];
estimated_chirp = saved_mat_data_stage4["estimated_vocalizations"][1:len, CHIRP_NUM];
chirp_init = saved_mat_data_stage3["highest_snr_estimated_chirp_per_chirp_seq"][1:len, CHIRP_NUM];
p1 = plotSTFTtime(estimated_chirp, noverlap=255, title="Optimized Vocalization Estimate");
p2 = plotSTFTtime(chirp_init, noverlap=255, title="Highest-SNR Mic Output");
plot(p1, p2, layout=(2, 1), size=(1100, 600))