# Instructions to Use this Notebook

1. To run this notebook, first run `Libraries to Install`, `Imports`, and the subsections of `Constants and Classes` (`Constants`, `Model`, and `Annotation_Maker`), in that order. Then, depending on whether you want to use `Generate CSV` or `Generate Training Images`, run that cell. Make sure you have set the user parameters you want before running it.

2. User-adjustable parameters are located towards the bottom of this notebook under `Generate CSV` and `Generate Training Images`, where each one is explained.
 
3. The very bottom of this notebook lists how long different setups took to run, to give you an idea of your runtime (e.g. spectral subtraction vs. no spectral subtraction, an 11-minute file vs. a 3-hour file, generating images with predictions drawn on top vs. without).

### Common Troubleshooting Issues For When the Code Does Not Run
- Make sure the paths are valid and formatted properly
  - If the path is a directory, it must end with a `/`
  - If the path is an audio file or model file, make sure the file exists and that the path is correct
- If the code crashes due to memory issues, try splitting the audio file into multiple, shorter audio files. You can monitor memory usage in the right corner of this notebook in Colab
- If you didn't use "Run all" to run this notebook, make sure you run the cells in order starting from `Libraries to Install`
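
The path checks above can be run up front with a small helper. This is only a minimal sketch: `check_paths` is a hypothetical name that is not part of this notebook, and it assumes you pass directory paths and file paths separately.

```python
import os

def check_paths(dir_paths=(), file_paths=()):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    for d in dir_paths:
        # Directory paths are concatenated with file names, so they need a trailing slash.
        if d is not None and not d.endswith("/"):
            problems.append(f"directory path should end with '/': {d}")
    for f in file_paths:
        # Audio, model, and noise files must already exist.
        if f is not None and not os.path.isfile(f):
            problems.append(f"file not found: {f}")
    return problems
```

For example, `check_paths(dir_paths=[image_segment_path, box_path], file_paths=[audio_path, tf_path, noise_file])` could be called before generating a csv; values left as `None` are skipped.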


### Libraries to Install

Run the following cell to install the libraries that are not preinstalled on Google Colab.

In [None]:
!pip install pydub



### Imports

The following imports are used to create the `Annotation_Maker` and `Model` classes.

In [None]:
# imports
import numpy as np
import tensorflow as tf
from PIL import Image,ImageDraw
import os

import glob
from pydub import AudioSegment
import librosa
import matplotlib.pyplot as plt
import scipy.signal as signal
import csv
import gc 

import shutil

import warnings
warnings.filterwarnings("ignore")  # suppresses a divide-by-zero warning that is harmless because the library handles it

### Constants and Classes

There are two classes defined in the following cells: `Model` and `Annotation_Maker`. `Model` is a wrapper class for the `tflite` model generated by Google Cloud Vision. `Annotation_Maker` uses `Model` to generate a csv file.

#### Constants

Important variables used in the classes `Model` and `Annotation_Maker`
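
The time constants are in milliseconds because pydub measures and slices audio in milliseconds. As a rough illustration of how they are used, the sketch below reproduces the windowing arithmetic: the audio is padded up to a multiple of 30 seconds and then cut into one-minute windows that start every 30 seconds. `window_bounds` and the upper-case constant names are hypothetical and exist only for this example.

```python
MINUTE_MS = 60 * 1000        # mirrors `minute`
HALF_MINUTE_MS = 30 * 1000   # mirrors `half_minute`

def window_bounds(length_ms):
    """Return (start_s, end_s) for each one-minute window, stepping 30 s at a time."""
    remainder = length_ms % HALF_MINUTE_MS
    if remainder:
        # Pad with silence up to the next 30-second mark.
        length_ms += HALF_MINUTE_MS - remainder
    bounds = []
    start = 0
    while start + MINUTE_MS <= length_ms:
        bounds.append((start // 1000, (start + MINUTE_MS) // 1000))
        start += HALF_MINUTE_MS
    return bounds

windows = window_bounds(11 * MINUTE_MS)
print(len(windows), windows[0], windows[-1])
```

Because consecutive windows overlap by 30 seconds, the same sound can be detected in two images, which is why overlapping predictions are merged later on.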

In [None]:

# time constants for pydub
minute = 1000*60
second = 1000
half_minute = 1000*30

# pyplot spectrogram constants
ymax = 2000 #max frequency
dim = 11.05 #image dimensions (forces the image to be a square image whose sides are 600px )
prefix = 'whale_'

# converting to time_frequency dictionary constants
frequency_max = 2000
time_max = 60

# image scaling constants
W_H = 600 #pixel dimensions of image

# spectral subtraction constant to convert pydub segments to work with librosa
int16_max = 32767

# class mappings
class_dict = {
    0: "hb whale",
    1: "other"
}

#### Model

The wrapper class for the `tflite` model
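
Besides running the interpreter, the wrapper drops every detection whose score is below `thresh`. The sketch below shows that filtering step on plain Python lists, without TensorFlow; `filter_detections` is a hypothetical helper and the detections are made-up values.

```python
def filter_detections(boxes, classes, scores, thresh=0.5):
    """Keep only the detections whose confidence score is at least `thresh`."""
    kept = [(b, c, s) for b, c, s in zip(boxes, classes, scores) if s >= thresh]
    return {
        "boxes": [b for b, _, _ in kept],
        "classes": [c for _, c, _ in kept],
        "scores": [s for _, _, s in kept],
        "num_det": len(kept),
    }

# Made-up detections: boxes are [y0, x0, y1, x1] in normalized (0 to 1) image coordinates.
result = filter_detections(
    boxes=[[0.1, 0.2, 0.3, 0.4], [0.5, 0.5, 0.9, 0.9]],
    classes=[0, 1],
    scores=[0.8, 0.3],
)
print(result["num_det"], result["classes"])
```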

In [None]:

# wrapper class for the tflite model, used in the Annotation_Maker class
class Model:
  
  def __init__(self, tf_path, thresh = 0.5):
    # thresh is the confidence threshold for a prediction
    self.interpreter = None
    self.input_details = None
    self.input_shape = None

    self._init_model(tf_path)
    self.thresh = thresh
    self.d_x = self.input_shape[1]
    self.d_y = self.input_shape[2]

  def _init_model(self, tf_path):
    #helper function to initialize the variables in init
    self.interpreter = tf.lite.Interpreter(model_path=tf_path)
    self.input_details = self.interpreter.get_input_details()[0]
    self.input_shape = self.input_details['shape']
    self.output_details = self.interpreter.get_output_details()
    self.interpreter.allocate_tensors()  

  
  def _reshape_input(self, image_array):
    # converts the image into a numpy array and rearranges it to what the tflite model expects
    return [np.array(Image.open(img).resize((self.d_x, self.d_y)))[:, :,:3].reshape(1,self.d_x, self.d_y,3) for img in image_array]


  
  def _filter_preds(self, image_name):
      # formats the predictions from the tflite model in a usable way
      # also filters out predictions whose scores are below the confidence threshold

      num_det = int(self.interpreter.get_tensor(self.output_details[3]['index'])[0])
      boxes = self.interpreter.get_tensor(self.output_details[0]['index'])[0]
      classes = self.interpreter.get_tensor(self.output_details[1]['index'])[0]
      scores = self.interpreter.get_tensor(self.output_details[2]['index'])[0]
  
      ret_boxes = []
      ret_classes = []
      ret_scores = []
      ret_det = num_det              
                                     
      for i in range(num_det):

        if scores[i] >= self.thresh:
          ret_boxes.append(boxes[i])
          ret_classes.append(classes[i])
          ret_scores.append(scores[i])
        else:
          ret_det -= 1

      return {
          "image_name": image_name,
          "boxes": ret_boxes,
          "classes": ret_classes,
          "scores": ret_scores,
          "num_det":ret_det
      }


  def _draw_box(self, pred, dir):
    # draws prediction boxes onto a single image (classes are not labeled though)
     
    img = Image.open(pred["image_name"])
    draw = ImageDraw.Draw(img)

    for box in pred["boxes"]:
      new_box = box * W_H
      x0 = new_box[1]
      x1 = new_box[3]

      y0 = new_box[0]
      y1 = new_box[2]

      draw.rectangle([x0, y0, x1,y1 ],outline = "black")

    img.save(dir + os.path.basename(pred["image_name"]))


  def get_details(self):
    # get the tensor details about the tf_lite model
    return {
        "input": self.interpreter.get_input_details(),
        "output": self.interpreter.get_output_details()
    }
      

  def predict(self, image_array):
    # returns the predictions for each image
    input_data = self._reshape_input(image_array)
    image_preds = []

    for i in range(len(image_array)):
      self.interpreter.set_tensor(self.input_details['index'], input_data[i])
      self.interpreter.invoke()
      image_preds.append(self._filter_preds(image_array[i]))
 
    return image_preds

    
  def export_boxed_images(self, prediction_dict, dir_path):
    # draws the predictions for all images that are predicted on (classes are not labeled though)
    for pred in prediction_dict:
      self._draw_box(pred, dir_path)


#### Annotation_Maker

The class that generates both the csv and training images.
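
One step of this class that is easy to misread is the conversion from a predicted box to seconds and hertz. The sketch below illustrates it under the same assumptions as the constants: each image spans 60 seconds and 0 to 2000 Hz, and image names look like `whale_<start>_<end>`. `box_to_time_freq` and the upper-case constants are hypothetical names used only here.

```python
import os

TIME_MAX_S = 60           # mirrors `time_max`: each image spans one minute
FREQUENCY_MAX_HZ = 2000   # mirrors `frequency_max`: top of the plotted frequency axis

def box_to_time_freq(image_name, box):
    """Map a normalized [y0, x0, y1, x1] box to seconds and hertz."""
    y0, x0, y1, x1 = box
    # The second underscore-separated field of the file name is the window's start second.
    window_start = int(os.path.basename(image_name).split("_")[1])
    return {
        "time_start": window_start + TIME_MAX_S * x0,
        "time_end": window_start + TIME_MAX_S * x1,
        # y grows downward in image coordinates, so the top edge (y0) is the higher frequency.
        "freq_low": FREQUENCY_MAX_HZ * (1 - y1),
        "freq_high": FREQUENCY_MAX_HZ * (1 - y0),
    }

print(box_to_time_freq("/tmp/whale_30_90.png", [0.25, 0.5, 0.75, 1.0]))
```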

In [None]:

class Annotation_Maker:

  def __init__(self, tf_path = None, spectral_subtraction_noise_file = None, thresh_model = 0.5, thresh_area = 0.5):
    
    # if tf_path is not defined, you can only generate training images
    if tf_path:
      self.model = Model(tf_path, thresh_model)

    # self.mns will indicate if we are using spectral subtraction
    if spectral_subtraction_noise_file:
      nw, nsr = librosa.load(spectral_subtraction_noise_file, sr=None, mono=True)
      self.mns= np.mean(np.abs(librosa.stft(nw)), axis=1)
      del nw, nsr
      self.mns = self.mns.reshape((self.mns.shape[0],1))
    else:
      self.mns = None

    # threshold for merging overlapping boxes
    self.thresh = thresh_area


  def _handle_audio_file(self, audio_file, img_dir):
    #splits the audio file into minute long spectrograms every 30 seconds

    whale_song = AudioSegment.from_wav(audio_file)
    song_length = len(whale_song)
    pause = half_minute - (song_length % half_minute)

    if pause != half_minute:
      whale_song = whale_song + AudioSegment.silent(duration=pause)
    else:
      pause = 0

    song_length += pause
    prev_minute = 0
    
    if self.mns is not None: #if the model is trained on spectrally subtracted images
      while (prev_minute + minute) <= song_length:
        segment = whale_song[prev_minute:(prev_minute + minute)]
        self._spectral_subtraction(img_path=img_dir, start=int(prev_minute/1000), end=int((prev_minute + minute)/1000), segment = segment)
        prev_minute += half_minute
        gc.collect()

    else:
      while (prev_minute + minute) <= song_length:
        segment = whale_song[prev_minute:(prev_minute + minute)]
        Annotation_Maker._create_spectrogram(img_path=img_dir, start=int(prev_minute/1000), end=int((prev_minute + minute)/1000), segment = segment)
        prev_minute += half_minute

  @classmethod 
  def _create_spectrogram(cls, img_path, start, end, segment = None, samples = None, sample_rate = None):
      # generates a spectrogram from a minute long sound segment
      if segment is not None:
        samples = np.array(segment.get_array_of_samples())
        sample_rate = segment.frame_rate

      frequencies, times, spectrogram = signal.spectrogram(samples, sample_rate)

      fig,ax = plt.subplots()
      plt.specgram(samples,Fs=sample_rate,NFFT=14705)

      plt.axis(ymax=ymax)
      ax.set_aspect(1./ax.get_data_ratio())
      plt.axis('off')
      
      fig.set_size_inches(dim,dim, forward=True)

      new_file = prefix + str(start) + '_' + str(end)
      plt.savefig(img_path + new_file, bbox_inches='tight', pad_inches = 0)
      plt.close()


  def _spectral_subtraction(self, img_path, start,end, segment):
    # spectrally subtracts the minute long sound segment with given noise file before generating an image out of it

    w = np.array(segment.get_array_of_samples()).astype(np.float32)/int16_max
    sr = segment.frame_rate

    s= librosa.stft(w)    # Short-time Fourier transform
    angle = np.angle(s)  # get phase

    ss = np.abs(s)         # get magnitude
    b =np.exp(1.0j* angle) # use this phase information when Inverse Transform

    sa = ss - self.mns  
    sa0 = sa * b  # apply phase information
    y = (librosa.istft(sa0) * int16_max).astype(np.int16) 

    Annotation_Maker._create_spectrogram(img_path=img_path, start=start, end=end, samples=y, sample_rate=sr)


  def _predict_on_images(self, img_array, box_path):
    # runs the model on the images generated from _handle_audio_file
    # if box_path is defined, exports the drawn image predictions from the model there
    preds = self.model.predict(img_array)
    if box_path is not None:
      if not os.path.exists(box_path):
        os.makedirs(box_path)
      self.model.export_boxed_images(preds, box_path)
    
    return preds
    
  @classmethod
  def _process_preds(cls, preds):
    #converts predictions to time-frequency domain
    for pred in preds:
      time_freq_list = []
      for box,class_num, score in zip(pred["boxes"], pred["classes"], pred["scores"]):
        time_freq_list.append(
            cls._convert_to_time_freq(pred["image_name"], box[0], box[1], box[2], box[3], class_num, score)
        )
      pred["time_freq"] = time_freq_list

    return preds

  @classmethod
  def _convert_to_time_freq(cls, image_name, y0, x0, y1, x1, class_num, score):
    # helper function to convert to time-frequency domain
    time_start = int(os.path.basename(image_name).split("_")[1])
    time_begin = (time_max * x0) + time_start
    time_end = (time_max * x1) + time_start

    freq_low = frequency_max - (frequency_max * y1) 
    freq_high = frequency_max - (frequency_max * y0)

    return {
            "class": class_num,
            "time_start": time_begin,
            "time_end": time_end,
            "freq_low":freq_low,
            "freq_high": freq_high,
            "score": score
           }

  @classmethod
  def _handle_overlaps(cls, preds, area_thresh):  
    # condenses predictions of the same class with at least area_thresh of overlap
    # returns a list of dictionaries whose format is defined in _condense_to_pred
    pred_list = []

    for pred in preds:
      for time_freq in pred["time_freq"]:
        pred_list.append(time_freq)

    pred_list = sorted(pred_list, key= lambda k: k["time_start"]) 

    return cls._condense_preds(pred_list, area_thresh)
    
  @classmethod
  def _condense_preds(cls, pred_list, area_thresh):
    # helper function that condenses predictions of the same class
    i = 0
    list_len = len(pred_list)
    while i + 1 < list_len:
      
      j = i + 1

      while j < list_len and pred_list[j]["time_start"] < pred_list[i]["time_end"]:
        if cls._check_class_and_area(pred_list[i], pred_list[j], area_thresh):
          pred_list[i] =  cls._condense_to_pred(pred_list[i], pred_list[j])
          del pred_list[j]
          list_len -= 1
          i-=1
          break
        else:
          j+=1

      i += 1
    
    return pred_list

  @classmethod
  def _condense_to_pred(cls, A, B):
    # helper function to condense 2 predictions
    return {
        "class": A["class"],
        "time_start": A["time_start"],
        "time_end": max(A["time_end"],B["time_end"]),
        "freq_low": min(A["freq_low"],B["freq_low"]),
        "freq_high": max(A["freq_high"],B["freq_high"]),
        "score": (A["score"] + B["score"])/2 # averaging the score
    }
  @classmethod
  def _check_class_and_area(cls, pred_A, pred_B, area_thresh):
    # boolean function to determine if we should condense two predictions

    if pred_A["class"] == pred_B["class"]:
      # predictions are sorted by start time, but B may end before A does
      t = min(pred_A["time_end"], pred_B["time_end"]) - pred_B["time_start"]
      f = min(pred_A["freq_high"], pred_B["freq_high"]) - max(pred_A["freq_low"], pred_B["freq_low"])

      shared_area = t * f

      A = (pred_A["time_end"] - pred_A["time_start"]) * (pred_A["freq_high"] - pred_A["freq_low"])
      B = (pred_B["time_end"] - pred_B["time_start"]) * (pred_B["freq_high"] - pred_B["freq_low"])

      return shared_area > 0 and (shared_area/A >= area_thresh or shared_area/B >= area_thresh)
    return False

  @classmethod
  def _make_csv(cls, output_csv_path, pred_list):
    # generates the csv after all the predictions have been condensed 
    # pred_list is a list of dictionaries whose format is defined in _condense_to_pred
    rows = []
    first_row = ["Class", "Time Start(s)", "Time End(s)", "Frequency Low(Hz)", "Frequency High(Hz)", "Delta Time(s)","Delta Frequency(Hz)", "Score"]

    rows.append(first_row)
    
    for time_freq in pred_list:
      class_name = class_dict[time_freq["class"]]
      begin = time_freq["time_start"]
      end = time_freq["time_end"]
      low = time_freq["freq_low"]
      high = time_freq["freq_high"]
      delta_t = end-begin
      delta_f = high-low
      score = time_freq["score"]
      rows.append([class_name, begin, end, low, high, delta_t, delta_f, score])

    with open(output_csv_path, "w", newline="") as csv_file:
      writer = csv.writer(csv_file)
      writer.writerows(rows)

  def generate_training_images(self, audio_file, image_segment_path):
      # generates training images for google cloud vision
      # if a noise file is given when initializing the class, it will generate spectrally subtracted images
      if not os.path.exists(image_segment_path):
        os.makedirs(image_segment_path)
      self._handle_audio_file(audio_file, image_segment_path)

  def generate_csv(self, audio_path, image_segment_path, output_csv_path, box_path = None):

    # generates a csv out of an audio file detecting the whale and other noises in it
    # can only be run if tf_path is defined when initializing the class

    # the csv is formatted as such: 
    # "Class", "Time Start(s)", "Time End(s)", "Frequency Low(Hz)", "Frequency High(Hz)", "Delta Time(s)","Delta Frequency(Hz)", "Score"

    if not os.path.exists(image_segment_path):
      os.makedirs(image_segment_path)
    self._handle_audio_file(audio_path, image_segment_path)

    img_arr = glob.glob(image_segment_path + "*")
    preds = self._predict_on_images(img_arr, box_path)
    del img_arr
    gc.collect()

    processed_preds = Annotation_Maker._process_preds(preds)
    del preds
    gc.collect()

    processed_preds = Annotation_Maker._handle_overlaps(processed_preds, self.thresh)
    Annotation_Maker._make_csv(output_csv_path, processed_preds)
    del processed_preds
    gc.collect()

# additional functions to wrap Annotation_Maker
def generate_a_csv(audio_path, tf_path, output_csv_path, thresh_model, thresh_area, image_segment_path, box_path, noise_file):
  am = Annotation_Maker(tf_path=tf_path, spectral_subtraction_noise_file=noise_file, thresh_model=thresh_model, thresh_area=thresh_area)
  display("Beginning to Generate a CSV")
  if image_segment_path is None:
    temp_path = "./tmp/"
    am.generate_csv(audio_path=audio_path, image_segment_path=temp_path, output_csv_path=output_csv_path, box_path=box_path)
    shutil.rmtree(temp_path)
  else:
    am.generate_csv(audio_path=audio_path, image_segment_path=image_segment_path, output_csv_path=output_csv_path, box_path=box_path)
  gc.collect()
  display("Completed")


def generate_the_training_images(audio_path, image_segment_path, noise_file):
  display("Beginning to Generate Training Images")
  am = Annotation_Maker(spectral_subtraction_noise_file= noise_file)
  am.generate_training_images(audio_path, image_segment_path)
  gc.collect()
  display("Completed")

### **Generate CSV**

Run the following cell to generate a csv with the following format:  
`Class, Time Start(s), Time End(s), Frequency Low(Hz), Frequency High(Hz), Delta Time(s), Delta Frequency(Hz), Score`
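
Once a csv in this format exists, it can be read back with the standard `csv` module. The following is a minimal sketch, not part of the notebook: `load_detections` is a hypothetical helper, and the sample rows are made-up values that only illustrate the column layout.

```python
import csv

def load_detections(lines, class_name=None, min_score=0.0):
    """Parse csv lines (an open file or a list of strings) and filter the rows."""
    rows = csv.DictReader(lines)
    return [
        row for row in rows
        if (class_name is None or row["Class"] == class_name)
        and float(row["Score"]) >= min_score
    ]

# Made-up rows in the same column layout as the generated csv.
sample = [
    "Class,Time Start(s),Time End(s),Frequency Low(Hz),Frequency High(Hz),Delta Time(s),Delta Frequency(Hz),Score",
    "hb whale,12.0,15.5,200.0,900.0,3.5,700.0,0.91",
    "other,40.0,41.0,50.0,120.0,1.0,70.0,0.55",
]
whales = load_detections(sample, class_name="hb whale")
print(len(whales), whales[0]["Time Start(s)"])
```

With a real file, pass the open file object instead of `sample`, e.g. `with open(output_csv_path, newline="") as f: whales = load_detections(f, class_name="hb whale")`. All values come back as strings, so convert them with `float` before doing arithmetic.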

Modify the values in the cell below to control what the code does.

#### *General User modifiable parameters:*

`audio_path`: The path to the audio file in which you want to detect whale sounds.

`tf_path`: The path to the `tflite` model.

`output_csv_path`: The path of the csv you want to create.

`thresh_model` (decimal between 0 and 1): The minimum confidence score a prediction must have to be kept.

`thresh_area` (decimal between 0 and 1): The minimum fraction of either prediction's area that two overlapping predictions of the same class must share to be merged into a single prediction.
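
To make `thresh_area` concrete, here is a simplified sketch of the merge test on two predictions given in seconds and hertz. `should_merge` is a hypothetical function written for this explanation; it assumes the shared region is the plain rectangle intersection of the two boxes.

```python
def should_merge(a, b, area_thresh=0.5):
    """Decide whether two predictions overlap enough to be treated as one."""
    if a["class"] != b["class"]:
        return False
    # Width and height of the rectangle the two boxes share.
    width = min(a["time_end"], b["time_end"]) - max(a["time_start"], b["time_start"])
    height = min(a["freq_high"], b["freq_high"]) - max(a["freq_low"], b["freq_low"])
    if width <= 0 or height <= 0:
        return False  # the boxes do not overlap at all
    shared = width * height
    area_a = (a["time_end"] - a["time_start"]) * (a["freq_high"] - a["freq_low"])
    area_b = (b["time_end"] - b["time_start"]) * (b["freq_high"] - b["freq_low"])
    # Merge if the shared region covers enough of either box.
    return shared / area_a >= area_thresh or shared / area_b >= area_thresh

first = {"class": 0, "time_start": 0, "time_end": 10, "freq_low": 100, "freq_high": 500}
second = {"class": 0, "time_start": 5, "time_end": 15, "freq_low": 100, "freq_high": 500}
third = {"class": 0, "time_start": 9, "time_end": 19, "freq_low": 100, "freq_high": 500}
print(should_merge(first, second), should_merge(first, third))
```

Here `first` and `second` share half of their area and are merged at the default threshold, while `first` and `third` share only a tenth and stay separate. Raising `thresh_area` makes merging stricter.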

`image_segment_path` (default value is `None`): If `None`, the code generates the images for the model to predict on in a temporary directory and deletes them afterwards. If given a path to a directory, it creates that directory if it doesn't exist and keeps the images stored there after generating the csv.  
**Important:** Make sure your path ends in a **/** if you give it a directory. 

`box_path` (default value is `None`): If `None`, the code will not generate images showing the initial, unprocessed predictions given by the `tflite` model. If given a path, those images will be generated and stored at that path. Giving this value a path typically adds 1-3 minutes to the total run time on Google Colab.  
**Important:** Make sure your path ends in a **/** if you give it a directory. 


#### *Spectral Subtraction specific modifiers:*

**Note**: Spectral subtraction roughly doubles the run time of this code, with minimal improvement for our current models.

`noise_file` (default value is `None`): The path to the background noise file used in spectral subtraction. Do not set this value unless you are using a model trained on spectrally subtracted images. If `None`, spectral subtraction is not used.
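
For readers unfamiliar with the technique, spectral subtraction removes an estimate of the background noise magnitude from each frequency bin while keeping the original phase. The notebook does this with librosa over whole audio segments; the sketch below only shows the idea on a single hand-written frame of complex values. `subtract_noise` is a hypothetical helper, and the numbers are made up.

```python
import cmath

def subtract_noise(frame, noise_magnitude):
    """Subtract a per-bin noise magnitude from one frame, keeping each bin's phase."""
    cleaned = []
    for value, noise in zip(frame, noise_magnitude):
        magnitude = abs(value) - noise   # can go negative; this sketch does not clamp it
        phase = cmath.phase(value)
        # Recombine the reduced magnitude with the original phase.
        cleaned.append(magnitude * cmath.exp(1j * phase))
    return cleaned

frame = [3 + 4j, 0 + 2j, 1 + 0j]   # made-up spectrum values for three frequency bins
noise = [1.0, 0.5, 0.25]           # made-up mean noise magnitude for the same bins
cleaned = subtract_noise(frame, noise)
print([round(abs(v), 3) for v in cleaned])
```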





In [None]:

audio_path = "/content/whale_song_segment_0_10.wav"
tf_path = "/content/drive/MyDrive/Capstone - Whale Sounds/Models/Model6/Model6_model-export_iod_tflite-Model6-2021-05-30T05_10_46.676148Z_model.tflite"
output_csv_path = "small_test_s.csv"

thresh_model = 0.5
thresh_area = 0.5

image_segment_path = None
box_path = None

noise_file = None

generate_a_csv(audio_path, tf_path, output_csv_path, thresh_model, thresh_area, image_segment_path, box_path, noise_file)
gc.collect()


### **Generate Training Images**

Run the following cell to generate training images to use in Google Cloud Vision. Make sure the `image_segment_path` is a place in your Google Drive so you can download the images easily.

#### *General User modifiable parameters:*

`audio_path`: The path to the audio file you want to use for training.

`image_segment_path`: The path to the directory where your training images will be stored.

**Important:** Make sure your path ends in a **/** if you give it a directory. 

#### *Spectral Subtraction specific modifiers:*

`noise_file` (default value is `None`): The path to the background noise file used in spectral subtraction. Only set this if you want to train a model on spectrally subtracted images in Google Cloud Vision.



In [None]:
# uncomment the lines below to run this code
# audio_path = "/content/whale_song_segment_0_10.wav"
# image_segment_path = "/content/test_dir_small/"

# noise_file = None

# generate_the_training_images(audio_path, image_segment_path, noise_file)
# gc.collect()

# Performance Metrics

#### Generating CSVs

##### Small File (11 Minutes), No Spectral Subtraction, No Prediction Outputs

In [None]:
# %%time
# audio_path = "/content/whale_song_segment_0_10.wav"
# tf_path = "/content/drive/MyDrive/Capstone - Whale Sounds/Models/Model5/Model_No_Spectral_Subtraction.tflite" 
# output_csv_path = "small_test_no_s_no_p.csv"

# thresh_model = 0.5
# thresh_area = 0.5

# image_segment_path = None
# box_path = None
# noise_file = None

# generate_a_csv(audio_path, tf_path, output_csv_path, thresh_model, thresh_area, image_segment_path, box_path, noise_file)

'Beginning to Generate a CSV'

'Completed'

CPU times: user 35.9 s, sys: 213 ms, total: 36.1 s
Wall time: 36.1 s


##### Small File (11 Minutes), No Spectral Subtraction, With Prediction Outputs


In [None]:
# %%time
# audio_path = "/content/whale_song_segment_0_10.wav"
# tf_path = "/content/drive/MyDrive/Capstone - Whale Sounds/Models/Model5/Model_No_Spectral_Subtraction.tflite" 
# output_csv_path = "small_test_no_s_w_p.csv"

# thresh_model = 0.5
# thresh_area = 0.5

# image_segment_path = None
# box_path = "/content/box_dir/"
# noise_file = None

# generate_a_csv(audio_path, tf_path, output_csv_path, thresh_model, thresh_area, image_segment_path, box_path, noise_file)

'Beginning to Generate a CSV'

'Completed'

CPU times: user 40.7 s, sys: 262 ms, total: 40.9 s
Wall time: 40.8 s


##### Small File (11 Minutes), Spectral Subtraction, No Prediction Outputs

In [None]:
# %%time
# audio_path = "/content/whale_song_segment_0_10.wav"
# tf_path = "/content/drive/MyDrive/Capstone - Whale Sounds/Models/Model6/Model_Spectral_Subtraction.tflite"
# output_csv_path = "small_test_w_s_no_p.csv"

# thresh_model = 0.5
# thresh_area = 0.5

# image_segment_path = None
# box_path = None
# noise_file = "/content/drive/MyDrive/Capstone - Whale Sounds/Matt-Bounding-Box-Work/Sounds/Original/noise.wav"

# generate_a_csv(audio_path, tf_path, output_csv_path, thresh_model, thresh_area, image_segment_path, box_path, noise_file)

'Beginning to Generate a CSV'

'Completed'

CPU times: user 1min 21s, sys: 2.4 s, total: 1min 23s
Wall time: 1min 26s


##### Small File (11 Minutes), Spectral Subtraction, With Prediction Outputs

In [None]:
# %%time
# audio_path = "/content/whale_song_segment_0_10.wav"
# tf_path = "/content/drive/MyDrive/Capstone - Whale Sounds/Models/Model6/Model_Spectral_Subtraction.tflite"
# output_csv_path = "small_test_w_s_w_p.csv"

# thresh_model = 0.5
# thresh_area = 0.5

# image_segment_path = None
# box_path = "/content/box_dir/"
# noise_file = "/content/drive/MyDrive/Capstone - Whale Sounds/Matt-Bounding-Box-Work/Sounds/Original/noise.wav"

# generate_a_csv(audio_path, tf_path, output_csv_path, thresh_model, thresh_area, image_segment_path, box_path, noise_file)

'Beginning to Generate a CSV'

'Completed'

CPU times: user 1min 24s, sys: 955 ms, total: 1min 25s
Wall time: 1min 25s


##### Large File (approx. 3hrs), No Spectral Subtraction, No Prediction Outputs

In [None]:
# %%time
# audio_path = "/content/drive/MyDrive/Capstone - Whale Sounds/amp_671658014.180929033558.wav"
# tf_path = "/content/drive/MyDrive/Capstone - Whale Sounds/Models/Model5/Model_No_Spectral_Subtraction.tflite"
# output_csv_path = "large_test_no_s_no_p.csv"

# thresh_model = 0.5
# thresh_area = 0.5

# image_segment_path = None
# box_path = None
# noise_file = None

# generate_a_csv(audio_path, tf_path, output_csv_path, thresh_model, thresh_area, image_segment_path, box_path, noise_file)

'Beginning to Generate a CSV'

'Completed'

CPU times: user 10min 40s, sys: 24.3 s, total: 11min 4s
Wall time: 11min 39s


##### Large File (approx. 3hrs), No Spectral Subtraction, With Prediction Outputs

In [None]:
# %%time
# audio_path = "/content/drive/MyDrive/Capstone - Whale Sounds/amp_671658014.180929033558.wav"
# tf_path = "/content/drive/MyDrive/Capstone - Whale Sounds/Models/Model5/Model_No_Spectral_Subtraction.tflite"
# output_csv_path = "large_test_no_s_w_p.csv"

# thresh_model = 0.5
# thresh_area = 0.5

# image_segment_path = None
# box_path = "/content/box_dir/"
# noise_file = None

# generate_a_csv(audio_path, tf_path, output_csv_path, thresh_model, thresh_area, image_segment_path, box_path, noise_file)

'Beginning to Generate a CSV'

  Z = 10. * np.log10(spec)


'Completed'

CPU times: user 12min 3s, sys: 51.8 s, total: 12min 55s
Wall time: 13min 37s



##### Large File (approx. 3hrs), Spectral Subtraction, No Prediction Outputs



In [None]:
# %%time
# audio_path = "/content/drive/MyDrive/Capstone - Whale Sounds/amp_671658014.180929033558.wav"
# tf_path = "/content/drive/MyDrive/Capstone - Whale Sounds/Models/Model6/Model_Spectral_Subtraction.tflite"
# output_csv_path = "large_test_w_s_no_p.csv"

# thresh_model = 0.5
# thresh_area = 0.5

# image_segment_path = None
# box_path = None
# noise_file = "/content/drive/MyDrive/Capstone - Whale Sounds/Matt-Bounding-Box-Work/Sounds/Original/noise.wav"

# generate_a_csv(audio_path, tf_path, output_csv_path, thresh_model, thresh_area, image_segment_path, box_path, noise_file)

'Beginning to Generate a CSV'

'Completed'

CPU times: user 21min 1s, sys: 23.7 s, total: 21min 24s
Wall time: 22min 4s


##### Large File (approx. 3hrs), Spectral Subtraction, With Prediction Outputs


In [None]:
# %%time
# audio_path = "/content/drive/MyDrive/Capstone - Whale Sounds/amp_671658014.180929033558.wav"
# tf_path = "/content/drive/MyDrive/Capstone - Whale Sounds/Models/Model6/Model_Spectral_Subtraction.tflite"
# output_csv_path = "large_test_w_s_w_p.csv"

# thresh_model = 0.5
# thresh_area = 0.5

# image_segment_path = None
# box_path = "/content/box_dir/"
# noise_file = "/content/drive/MyDrive/Capstone - Whale Sounds/Matt-Bounding-Box-Work/Sounds/Original/noise.wav"

# generate_a_csv(audio_path, tf_path, output_csv_path, thresh_model, thresh_area, image_segment_path, box_path, noise_file)

'Beginning to Generate a CSV'

'Completed'

CPU times: user 22min 16s, sys: 29.3 s, total: 22min 45s
Wall time: 23min 24s


### Generating Training Images

#### Large File (approx. 3hrs), No Spectral Subtraction

In [None]:
# %%time
# audio_path = "/content/drive/MyDrive/Capstone - Whale Sounds/amp_671658014.180929033558.wav"
# image_segment_path = "/content/test_dir/"

# noise_file = None

# generate_the_training_images(audio_path, image_segment_path, noise_file)
# gc.collect()

'Beginning to Generate Training Images'

'Completed'

CPU times: user 7min 47s, sys: 24.6 s, total: 8min 12s
Wall time: 8min 34s


#### Large File (approx. 3hrs), Spectral Subtraction

In [None]:
# %%time
# audio_path = "/content/drive/MyDrive/Capstone - Whale Sounds/amp_671658014.180929033558.wav"
# image_segment_path = "/content/test_dir/"

# noise_file = "/content/drive/MyDrive/Capstone - Whale Sounds/Matt-Bounding-Box-Work/Sounds/Original/noise.wav"

# generate_the_training_images(audio_path, image_segment_path, noise_file)
# gc.collect()

'Beginning to Generate Training Images'

'Completed'

CPU times: user 19min 43s, sys: 37.1 s, total: 20min 20s
Wall time: 20min 59s
