# How to prepare the audio (manually)
- select any long single audio file.
- extract the audio and the subtitles.
- then in Audacity, start a new project
  - import the audio file
  - then add a label track via Tracks > Add New > Label Track
  - then make sure every 10-20 second stretch of audio has a label (this is the longest task)
  - as for export options:
    - select the folder (ideally an empty one) where you want your WAV files to be placed
    - for audio options choose Mono
    - for sample rate, choose "Other" and type 24000 into the dialog that comes up
    - for export range, choose Multiple Files
    - for the "Split files based on" section, choose Labels
    - in the "Name files" section, choose "Numbering after File name prefix" and choose a prefix that you'll also use in the TTS training Notebook (such as "B1" or similar)
    - click on Export
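
To double-check the export settings above (Mono, 24000 Hz), a small check like the following could be run on the exported files. This is only a sketch: `check_wav_format` is a hypothetical helper, not part of this Notebook, and it assumes the files are plain PCM WAVs that Python's standard `wave` module can read.

```python
import wave


def check_wav_format(path, expected_rate=24000, expected_channels=1):
    """Return a list of problems found in one WAV file (an empty list means OK)."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        if rate != expected_rate:
            problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
        if channels != expected_channels:
            problems.append(f"{channels} channel(s), expected {expected_channels}")
    return problems
```

Calling `check_wav_format("B1-00001.wav")` on an exported file should return an empty list if the export options were applied as described.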

# How to prepare the audio (automatically)
follow this [notebook](https://colab.research.google.com/drive/1ZK-2lAV2DokrN92sYPJRl47XyuuluCJe?usp=sharing). it is not a fully reliable resource, though, because it uses Whisper to transcribe and label the audio.

# Prerequisites for this ipynb:
- each audio clip should be 12-20 secs long
- the folder should contain a labels.txt file.
- the labels.txt file must be Audacity compatible
- compress the folder into a ZIP file and then upload it here to make the files ready for training.
- *ZIP file example:*
  - *B1-00001.wav*
  - *B1-00002.wav*
  - *B1-00003.wav*
  - *B1-00004.wav*
  - *labels.txt*

- name your files:
    - all files must begin with the **same prefix** (such as B1-) and always **have 5 digits** after it, e.g. *B1-00001.wav*
- results structure:
  - *ZIP file example:*
    - *metadata.csv*
    - *metadata_phonetical.csv*
    - *training.csv*
    - *validation.csv*
    - *wavs/B1-00001.wav*
    - *wavs/B1-00002.wav*
    - *wavs/B1-00003.wav*
    - *wavs/B1-00004.wav*
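
The naming rule for the input ZIP (same prefix, 5 digits, plus a *labels.txt* file) can be checked before uploading. The sketch below is an assumption-laden helper (`check_archive_names` is a made-up name, not something this Notebook defines) that works on a plain list of file names.

```python
import re


def check_archive_names(names, prefix="B1-"):
    """Return a list of problems for the file names found in the input ZIP."""
    problems = []
    # prefix followed by exactly 5 digits and the .wav extension
    pattern = re.compile(re.escape(prefix) + r"\d{5}\.wav")
    if "labels.txt" not in names:
        problems.append("labels.txt is missing")
    for name in names:
        if name.endswith(".wav") and not pattern.fullmatch(name):
            problems.append(f"unexpected WAV name: {name}")
    return problems
```

The list of names could come from `zipfile.ZipFile("wavs.zip").namelist()`; an empty result means the archive follows the convention.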

In [None]:
#################
# Settings Form #
#################

# @title 1. Settings & Data Sources
# @markdown ***Data source to be used:***
# @markdown <small>(choose a data source option and then run this cell)</small>
data_source = "Google Drive (Google Colab only)" # @param ["Google Drive (Google Colab only)", "External Files (Dropbox etc.)"]

# @markdown ***Path of a ZIP file containing WAV files:***
# @markdown <br><small>An archive with all WAV files to be used for training.</small>
# @markdown <br><small>See [Section 0](#scrollTo=6DEC0-d48M-G) to verify that your ZIP file is in the correct form.</small>
# @markdown <br><small>If you're using Google Colab, this is the relative path from your Drive root (e.g. *tts/wavs.zip*)</small>
# @markdown <br><small>If you're using External Files, use a full URL (e.g. *https://www.example.com/wavs.zip*)</small>
wav_files_path = "tts/wavs.zip" # @param {type:"string"}

# @markdown ***WAV files prefix:***
# @markdown <br><small>Prefix for each of the WAV files, as described in [Section 0](#scrollTo=6DEC0-d48M-G).</small>
wavs_prefix = 'B1-' # @param {type:"string"}

# @markdown ***WAV files in ZIP archive are already prepared:***
# @markdown <br><small>If you're resuming training or have previously prepared your WAV files, so they conform to the structure generated by this Notebook (see [Section 0](#scrollTo=6DEC0-d48M-G)), tick this checkbox. In that case, this Notebook will neither convert your WAV files nor trim leading and trailing silences from them.</small>
wav_files_prepared_for_training = False # @param {type:"boolean"}

# @markdown ***Silence trimming threshold:***
# @markdown <br><small>This value is used as the "*-af silenceremove*" ffmpeg parameter.</small>
# @markdown <br><small>I found the default (-60dB) to be good enough to trim silences from well-recorded, denoised WAV files.</small>
# @markdown <br><small>If this trims too much of the audio, try -96dB for a more lenient trim.</small>
db_trim_value = "-60dB" # @param {type:"string"}

# @markdown ***1st stage model ZIP archive (restoration):***
# @markdown <br><small>If you're resuming training and have your 1st stage model (and potentially its config file) stored in a ZIP file, provide its path here.</small>
# @markdown <br><small>If you're using Google Colab, this is the relative path from your Drive root (e.g. *tts/model_training_checkpoint.zip*)</small>
# @markdown <br><small>If you're using External Files, use a full URL (e.g. *https://www.example.com/model_training_checkpoint.zip*)</small>
model_1st_stage_backup_zip_file_path = "tts/model_backup_stage_1.zip" # @param {type:"string"}

# @markdown ***2nd stage model ZIP archive (restoration):***
# @markdown <br><small>If you're resuming training and have your 2nd stage model (and potentially its config file) stored in a ZIP file, provide its path here.</small>
# @markdown <br><small>If you're using Google Colab, this is the relative path from your Drive root (e.g. *tts/model_training_checkpoint_stage_2.zip*)</small>
# @markdown <br><small>If you're using External Files, use a full URL (e.g. *https://www.example.com/model_training_checkpoint_stage_2.zip*)</small>
model_2nd_stage_backup_zip_file_path = "tts/model_backup_stage_2.zip" # @param {type:"string"}

# @markdown ***Phonemization language:***
# @markdown <br><small>This language will be used to convert your text into phonemes.</small>
# @markdown <br><small>This value must be one of the languages compatible with the phonemizer: https://pypi.org/project/phonemizer/</small>
# @markdown <br><small>For example, if you're using the default espeak phonemization: https://github.com/espeak-ng/espeak-ng/blob/master/docs/languages.md</small>
phonemization_language = 'en-us' # @param {type:"string"}

# @markdown ***Training VS validation data percentage:***
# @markdown <br><small>This value sets what percentage of the total data will be used for training.</small>
# @markdown <br><small>Default value: 90 (i.e. 90% training data, 10% validation data)</small>
training_data_percentage = 90 # @param {type:"integer"}

# @markdown ***Labels text replacements:***
# @markdown <br><small>In case some words in your labels.txt file are pronounced differently than they appear, set your replacement rules here.</small>
# @markdown <br><small>This text is in a JSON form and needs to be updated in the same form in order for this cell to keep working.</small>
# @markdown <br><small>The example default are some replacements from a fantasy book, so you can see the syntax of the JSON value.</small>
text_replacements = { "Bannick":"ban-nick", "Banshee": "ban-shee", "Cait Sidhe": "kay-th shee", "Coblynau": "cob-lee-now", "Daoine": "doon-ya", "Djinn": "jin", "Glastig": "glass-tig", "Gwragen": "guh-war-a-gen", "Kelpie": "kel-pee", "Kitsune": "kit-soon", "Lamia": "lay-me-a", "Manticore": "man-tee-core", "Nixie": "nix-ee", "Peri": "pear-ee", "Piskie": "piss-key", "Pixie": "pix-ee", "Puca": "puh-ca", "Roane": "ro-an", "Selkie": "sell-key", "Sidhe": "shee", "Silene": "sigh-lean", "The Luidaeg": "the lou-sha-k", "Tuatha": "tootha", "de Dannan": "day danan", "Tybalt": "tiebolt", "Tylwyth": "till-with", "Teg": "teeg", "Undine": "un-deen", "Will o' Wisp": "will-oh wisp" } # @param {type:"raw"}


##################
# Execution Code #
##################
if data_source == "Google Drive (Google Colab only)":
  from google.colab import drive
  drive.mount('/gdrive')

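The `text_replacements` rules above are applied in Section 3 as plain substring replacements, one after another, in the order they appear in the JSON value. A minimal sketch of that behaviour (the helper name `apply_replacements` is only illustrative) shows why longer phrases should be listed before shorter ones they contain:

```python
def apply_replacements(sentence, replacements):
    """Apply each replacement rule in dictionary order, like the loop in Section 3."""
    for original, replacement in replacements.items():
        sentence = sentence.replace(original, replacement)
    return sentence


# "Cait Sidhe" is listed before "Sidhe", so the longer phrase wins
rules = {"Cait Sidhe": "kay-th shee", "Sidhe": "shee"}
print(apply_replacements("The Cait Sidhe met a Sidhe.", rules))
# prints: The kay-th shee met a shee.
```

With the two rules in the opposite order, "Sidhe" would be replaced first and the "Cait Sidhe" rule would never match.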
# 2. Required programs and libraries

This section will install everything you'll need later.
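
Once the install cell below has finished, it can be worth confirming that the command-line tools used later in this Notebook are actually on the PATH. The following is a rough sketch; the tool list and the `missing_tools` helper are assumptions based on the commands this Notebook calls, not an official requirement list.

```python
import shutil

# command-line tools called in later sections of this Notebook
REQUIRED_TOOLS = ["ffmpeg", "soxi", "espeak-ng", "zip", "unzip", "shuf"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from the list that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

If `missing_tools()` returns an empty list, all of the listed tools were found.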

In [None]:
# install system packages (Debian, Ubuntu)
!sudo apt update && sudo apt -y install curl ffmpeg sox libtool musl-dev gcc-multilib g++-multilib espeak espeak-ng sox zip unzip

# install Python dependencies
!pip install pyloudnorm soundfile nltk phonemizer wget

# 3. Labels and WAV Files Preparation

In [None]:
if wav_files_path == "":
  raise Exception("The wav_files_path setting is empty. Please set this option in section 1, then run the cell in section 1 and re-run this cell again.")

# copy WAV files from Google Drive, if we are using it
if data_source == "Google Drive (Google Colab only)":
  !cp /gdrive/MyDrive/$wav_files_path ./wavs.zip
else:
  # download WAV files from a predefined location
  import wget
  wget.download( wav_files_path, 'wavs.zip' )

# extract WAV files from the ZIP file into a local folder
!rm -rf wavs
!mkdir wavs
!unzip wavs.zip -d wavs
!rm -f wavs.zip

# copy the original WAV files to safely renamed ones
# and phonemize the labels.txt file, so it can be used with the StyleTTS2 system
import os
import nltk
import phonemizer
import pandas as pd
from nltk.tokenize import word_tokenize
from shutil import copyfile, rmtree
from pathlib import Path

if wav_files_prepared_for_training == False:
  nltk.download('punkt')
  global_phonemizer = phonemizer.backend.EspeakBackend( language=phonemization_language, preserve_punctuation=True, with_stress=True )

def text_to_phonemes(text):
  text = text.strip()
  #print("Text before phonemization: ", text)
  ps = global_phonemizer.phonemize([text])
  #print("Text after phonemization: ", ps)
  ps = word_tokenize(ps[0])
  ps = ' '.join(ps)
  #print("Final text after tokenization: ", ps)
  return ps

wavs_counter = 1
wavs_export_path = './wavs'
wavs_processed_path = './wavs_processed'
wavs_final_path = './wavs_final'
wavs_filtered_path = './wavs_filtered'
input_filename = './wavs/labels.txt'
output_filename_clean = './wavs_final/metadata.csv'
output_filename_phonetical = './wavs_final/metadata_phonetical.csv'

if wav_files_prepared_for_training == False:
  # change Windows CR+LF line endings to Linux LF line endings
  WINDOWS_LINE_ENDING = b'\r\n'
  UNIX_LINE_ENDING = b'\n'

  with open( input_filename, 'rb' ) as open_file:
    content = open_file.read()

  # Windows ➡ Unix
  content = content.replace( WINDOWS_LINE_ENDING, UNIX_LINE_ENDING )

  # Unix ➡ Windows
  # content = content.replace( UNIX_LINE_ENDING, WINDOWS_LINE_ENDING )

  with open( input_filename, 'wb') as open_file:
      open_file.write(content)


# delete and recreate wavs_final_path when rerunning
if wav_files_prepared_for_training == False:
  if os.path.exists(wavs_final_path):
    rmtree(wavs_final_path)
  os.mkdir(wavs_final_path)

  # delete and recreate wavs_processed_path when rerunning
  if os.path.exists(wavs_processed_path):
    rmtree(wavs_processed_path)
  os.mkdir(wavs_processed_path)

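  # write as UTF-8, since the phonemized text contains non-ASCII (IPA) characters
  fp_clean = open(output_filename_clean, 'w', encoding='utf-8')
  fp_phonetical = open(output_filename_phonetical, 'w', encoding='utf-8')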

  df = pd.read_csv(input_filename, sep='\t', encoding='utf-8', usecols=[0, 1, 2], names=['Start', 'End', 'Description'], quoting=3)

  for sentence in df['Description'].to_list():
    if not pd.isnull(sentence):
      # replace special words
      for original_text, replacement_text in text_replacements.items():
        sentence = sentence.replace( original_text, replacement_text )

      # adjust file name: zero-pad the counter to 5 digits (e.g. 1 -> 00001)
      real_counter = str( wavs_counter ).zfill( 5 )

      wav_path_orig = wavs_prefix + real_counter + ".wav"
      wav_path = wav_path_orig.replace(" ", "_").replace("M", "m")
      wav_filename = wav_path.replace(".wav", "")
      copyfile(wavs_export_path + '/' + wav_path_orig, wavs_processed_path + '/' + wav_path)
      print(wavs_prefix + real_counter + '.wav (' + ( wavs_processed_path + '/' + wav_path ) + '): ' + sentence)

      # write the sentence into the original LJSpeech metadata form file
      fp_clean.write(f"{wav_filename}|{sentence}|{sentence}\n")

      # transliterate using espeak's phonemizer to conform with StyleTTS2's expected input
      phonemized = text_to_phonemes( sentence )
      fp_phonetical.write(f"{wav_filename}.wav|{phonemized}|0\n")
      wavs_counter = wavs_counter + 1
  fp_clean.close()
  fp_phonetical.close()
else:
  # we've uploaded prepared WAV files, just rename the folder
  !mv ./wavs ./wavs_final


# process WAV files, trimming leading and trailing silences,
# converting their sample rate to 24 kHz and changing them to mono
if wav_files_prepared_for_training == False:
  paths = Path( wavs_processed_path ).glob("**/*.wav")
  processing_index = 0
  paths_length = len( os.listdir( wavs_processed_path ) )

  temp1 = wavs_processed_path + "/temp1.wav"
  temp2 = wavs_processed_path + "/temp2.wav"
  temp3 = wavs_processed_path + "/temp3.wav"

  for filepath in paths:
    file = str(filepath)
    if file.endswith(".wav"):
      processing_index = processing_index + 1
      print( str( processing_index ) + " / " + str( paths_length ) + ": processing " + file + ", output into " + file.replace( wavs_processed_path.replace( "./", "" ), wavs_final_path ) )
      os.popen("ffmpeg -i " + file + " -af silenceremove=1:0:" + db_trim_value +" " + temp1).read()
      os.popen("ffmpeg -i " + temp1 + " -af areverse " + temp2).read()
      #os.popen("ffmpeg -i " + temp2 + " -af silenceremove=1:0:-96dB " + temp3).read()
      os.popen("ffmpeg -i " + temp2 + " -af silenceremove=1:0:" + db_trim_value + " " + temp3).read()
      os.popen("ffmpeg -y -i " + temp3 + " -map_metadata -1 -af areverse -ar 24000 -ac 1 " + file.replace( wavs_processed_path.replace( "./", "" ), wavs_final_path ) ).read()
      os.remove(temp1)
      os.remove(temp2)
      os.remove(temp3)


  # randomize metadata file, so the phrases from labels.txt are fed to this TTS system in random sequence
  !shuf ./wavs_final/metadata_phonetical.csv > ./wavs_final/shuffled.csv


  # prepare a script that would split the resulting phonetical transcription
  # into training and validation file by a ratio of 90:10 by default (10% validation data, 90% training data)
  # (this can be changed by changing the training_data_percentage variable on top of this Notebook)
  with open('splitter.sh', 'w') as file:
    file.write('''# Specify the names for the training and validation files
training_file="training.csv"
validation_file="validation.csv"

# Calculate the line count for the split
total_lines=$(wc -l ./wavs_final/shuffled.csv | cut -d" " -f1)
training_lines=$((total_lines * {training_data_percentage} / 100))

# Use the split command with the specified names for output files
cd ./wavs_final && split -l $training_lines shuffled.csv "$training_file"_

# Rename the generated files to the desired names
mv ./"$training_file"_aa ./"$training_file"
mv ./"$training_file"_ab ./"$validation_file"
rm -f ./shuffled.csv'''.format(training_data_percentage = training_data_percentage))

  !sudo chmod +x ./splitter.sh
  !./splitter.sh


  # cleanup temporary folders and move WAV files into a TTS structure
  !rm -rf ./wavs
  !rm -rf ./wavs_processed
  !rm -rf ./wavs_tmp
  !mkdir ./wavs_final/wavs
  !mv ./wavs_final/*.wav ./wavs_final/wavs

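The shuffle-and-split step above relies on `shuf` and `split`. Note that `split -l` writes as many chunks as needed, so with a training percentage below 50 it produces more than two files and the script only renames the first two. As an alternative, here is a pure-Python sketch of the same idea; `split_metadata` is a hypothetical helper, not what this Notebook runs.

```python
import random


def split_metadata(lines, training_percentage=90, seed=None):
    """Shuffle metadata lines and split them into (training, validation) lists."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    # same integer arithmetic as the shell script: total * percentage / 100
    cut = len(shuffled) * training_percentage // 100
    return shuffled[:cut], shuffled[cut:]
```

The two returned lists could then be written to *training.csv* and *validation.csv*; every input line ends up in exactly one of them, whatever the percentage.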
# 4. WAV Files Validation (optional)

In this section, you can validate the quality of your WAV files and see their durations. You can optionally choose to create a subset of all WAV files that conform to your minimum and maximum audio length preference (see the last step of this section).
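
The length check in the last step boils down to sorting files into three groups by duration. A minimal sketch, assuming the durations are already available as a `{filename: seconds}` dictionary (`partition_by_duration` is an illustrative name, not a function from this Notebook):

```python
def partition_by_duration(durations, min_seconds=1.0, max_seconds=11.0):
    """Split {filename: seconds} into (valid, too_short, too_long) dictionaries."""
    valid, too_short, too_long = {}, {}, {}
    for name, seconds in durations.items():
        if seconds < min_seconds:
            too_short[name] = seconds
        elif seconds > max_seconds:
            too_long[name] = seconds
        else:
            valid[name] = seconds
    return valid, too_short, too_long
```

Files exactly at the minimum or maximum count as valid here, which matches the strict `<` and `>` comparisons used in the filtering cell.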

In [None]:
# download and compile the Waveform Amplitude Distribution Analysis tool (WADA)
# used to estimate the Signal-to-Noise Ratio (SNR) of your WAV files
# ... nothing here is configurable, just run this cell and check the output if you like
!wget http://www.cs.cmu.edu/~robust/archive/algorithms/WADA_SNR_IS_2008/WadaSNR.tar.gz
!mkdir -p WADA_SNR_IS_2008/ && tar -xvf WadaSNR.tar.gz -C WADA_SNR_IS_2008
!cd ./WADA_SNR_IS_2008/Build/ && rm -rf *.o && make clean && make

import sys
import glob
import subprocess
import tempfile
import IPython
import soundfile as sf
import numpy as np
from tqdm import tqdm
from multiprocessing import Pool
from matplotlib import pylab as plt
%matplotlib inline

# Set the meta parameters
DATA_PATH = wavs_final_path + "/wavs"
NUM_PROC = 1
CURRENT_PATH = "./"

# define the SNR compute function
def compute_file_snr(file_path):
    """ Convert given file to required format with FFMPEG and process with WADA."""
    _, sr = sf.read(file_path)
    new_file = file_path.replace(".wav", "_tmp.wav")
    print(new_file)
    if sr != 16000:
      command = f'ffmpeg -i "{file_path}" -ac 1 -acodec pcm_s16le -y -ar 16000 "{new_file}"'
    else:
      command = f'cp "{file_path}" "{new_file}"'
    os.system(command)
    output = !./WADA_SNR_IS_2008/Exe/WADASNR -i {new_file} -t ./WADA_SNR_IS_2008/Exe/Alpha0.400000.txt -ifmt mswav
    snr = float(output[-2].split()[-2])
    os.system(f'rm "{new_file}"')
    return snr, file_path

# define SNR results output function
def output_snr_with_audio(idx):
    file_idx = file_idxs[idx]
    file_name = file_names[file_idx]
    wav, sr = sf.read(file_name)
    # multi channel to single channel
    if len(wav.shape) == 2:
        wav = wav[:, 0]
    print(f" > {file_name} - snr:{snrs[file_idx]}")
    IPython.display.display(IPython.display.Audio(wav, rate=sr))

# run WADA script
!./WADA_SNR_IS_2008/Exe/WADASNR -i ./WADA_SNR_IS_2008/SampleCorrupt/sb01_00dB_White.sph -t ./WADA_SNR_IS_2008/Exe/Alpha0.400000.txt -ifmt nist

# validate number of WAV files
wav_files = glob.glob(f"{DATA_PATH}/*.wav", recursive=True)
print(f"Number of wav files found {len(wav_files)}")

In [None]:
# nothing to configure here either - just run and check the output if you want to
if NUM_PROC == 1:
    file_snrs = [None] * len(wav_files)
    for idx, wav_file in tqdm(enumerate(wav_files)):
        tup = compute_file_snr(wav_file)
        file_snrs[idx] = tup
else:
    with Pool(NUM_PROC) as pool:
        file_snrs = list(tqdm(pool.imap(compute_file_snr, wav_files), total=len(wav_files)))

snrs = [tup[0] for tup in file_snrs]

error_idxs = np.where(np.isnan(snrs) == True)[0]
error_files = [wav_files[idx] for idx in error_idxs]

file_snrs = [i for j, i in enumerate(file_snrs) if j not in error_idxs]
file_names = [tup[1] for tup in file_snrs]
snrs = [tup[0] for tup in file_snrs]
file_idxs = np.argsort(snrs)

print("")
print("")
print(f"Average SNR of the dataset:{np.mean(snrs)}")

In [None]:
# find the files with the worst SNR
N = 3  # number of files to fetch
for i in range(N):
  #file_idx = file_idxs[i]
  #print(file_names[file_idx] + '|' + str(snrs[file_idx]))
  output_snr_with_audio(i)

In [None]:
# find best recordings
N = 3  # number of files to fetch
for i in range(N):
    output_snr_with_audio(-i-1)

In [None]:
# visualize the distribution of SNR values
plt.hist(snrs, bins=100)

In [None]:
# check durations of your WAV files to see if you can spot some that are too short
# or too long ( i.e. < 1s or > 30s )
#!echo "filename|duration" > wavdurations.csv;
!echo "" > wavdurations.csv;

!for file in ./wavs_final/wavs/*.wav; do duration=$(eval soxi -D "$file"); echo "${file}|$duration" >> wavdurations.csv; done

!cat wavdurations.csv

# @markdown ***WAV files duration check and filtering***
# @markdown <br><small>WAV files should generally be longer than 1 second and shorter than about 10 - 12 seconds.</small>
# @markdown <br><small>You can set your own limits and run this cell to see whether any of your WAV files are either too short or too long.</small>
min_wav_duration_seconds = 1 # @param {type:"number"}
max_wav_duration_seconds = 11 # @param {type:"number"}

# @markdown <br><small>If you'd like to save only the files that conform to the min-max interval set above (along with updated training.csv and validation.csv files) into a separate folder, tick this checkbox.</small>
save_valid_files_separately = True # @param {type:"boolean"}

files_too_short = {}
files_too_long = {}
min_seconds = 0
max_seconds = 0
some_files_filtered_out = False

wavs_filtered_path = './wavs_filtered'

if save_valid_files_separately == True:
  !rm -rf $wavs_filtered_path
  !mkdir $wavs_filtered_path && cd $wavs_filtered_path && mkdir wavs
  !cp $wavs_final_path/training.csv $wavs_filtered_path
  !cp $wavs_final_path/validation.csv $wavs_filtered_path

  # load all lines of training and validation files,
  # so we can remove the ones we don't want to see in them
  with open( wavs_filtered_path + '/training.csv', 'r', newline = '', encoding='UTF8') as f:
    lines_training = f.readlines()

  with open( wavs_filtered_path + '/validation.csv', 'r', newline = '', encoding='UTF8') as f:
    lines_validation = f.readlines()

df = pd.read_csv('wavdurations.csv', sep='|', encoding='utf-8', usecols=[0, 1], names=['Filename', 'Duration'])
for row in zip(df['Filename'].to_list(), df['Duration'].to_list()):
  should_copy_this_file = True

  if ( float( row[ 1 ] ) < min_wav_duration_seconds ):
    files_too_short[ row[ 0 ] ] = row[ 1 ]
    should_copy_this_file = False

  if ( float( row[ 1 ] ) > max_wav_duration_seconds ):
    files_too_long[ row[ 0 ] ] = row[ 1 ]
    should_copy_this_file = False

  if ( float( row[ 1 ] ) < min_seconds or min_seconds == 0 ):
    min_seconds = float( row[ 1 ] )

  if ( float( row[ 1 ] ) > max_seconds ):
    max_seconds = float( row[ 1 ] )

  filename_origin = row[ 0 ]

  if save_valid_files_separately == True:
    if should_copy_this_file:
      # copy this file into the filtered files folder
      filename_target = filename_origin.replace( wavs_final_path, wavs_filtered_path )
      !cp $filename_origin $filename_target
    else:
      some_files_filtered_out = True
      # remove this file from training and validation arrays
      tmp_training = []
      tmp_validation = []
      filename_clear = filename_origin.replace( wavs_final_path + '/wavs/', '' )
      print( "filtering out " + filename_clear + " (" + str( row[ 1 ] ) + "s)" )

      for line in lines_training:
        if line.split('|')[0] != filename_clear:
          tmp_training.append( line )

      lines_training = tmp_training

      for line in lines_validation:
        if line.split('|')[0] != filename_clear:
          tmp_validation.append( line )

      lines_validation = tmp_validation

# rewrite training and validation files, if we removed files from them
if some_files_filtered_out == True:
  with open( wavs_filtered_path + '/training.csv', 'w', newline = '', encoding='UTF8') as f:
    for line in lines_training:
      f.write( line )

  with open( wavs_filtered_path + '/validation.csv', 'w', newline = '', encoding='UTF8') as f:
    for line in lines_validation:
      f.write( line )

  print( "WAVs length filtering successful, relevant files stored under " + wavs_filtered_path )

print( "\n" )
print("Minimum seconds in a WAV file found: " + str( min_seconds ) )
print("Maximum seconds in a WAV file found: " + str( max_seconds ) )
print( "\n" )

if ( len( files_too_short ) ):
  print( "WAV files too short:" )
  for file_path, file_duration in files_too_short.items():
    print( " - " + file_path + " = " + str( file_duration ) + "s" )

print( "\n" )

if ( len( files_too_long ) ):
  print( "WAV files too long:" )
  for file_path, file_duration in files_too_long.items():
    print( " - " + file_path + " = " + str( file_duration ) + "s" )

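The filtering loops above rebuild the training and validation lists once for every rejected file. The same result can be sketched as a single pass over the lines, assuming the rejected file names (e.g. *B1-00003.wav*) were collected first; `drop_filtered_lines` is a hypothetical helper, not part of this Notebook.

```python
def drop_filtered_lines(lines, excluded_filenames):
    """Keep only metadata lines whose first |-separated field is not excluded."""
    excluded = set(excluded_filenames)
    return [line for line in lines if line.split("|")[0] not in excluded]
```

It would be called once for the training lines and once for the validation lines before writing both files back.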
# 5. WAV Files Backup (optional)

Here you can back up your processed WAV files and copy them onto Google Drive (if connected) or download a backup copy of the ZIP file.
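
Outside of Colab, where the `zip` shell command may not be available, the archive could also be created with Python's standard library. This is just a sketch under that assumption; `backup_folder` is a made-up helper name.

```python
import shutil


def backup_folder(folder, archive_base):
    """Zip the contents of `folder` into `archive_base`.zip and return the archive path."""
    # archive_base should point outside of `folder`, so the ZIP does not include itself
    return shutil.make_archive(archive_base, "zip", root_dir=folder)
```

For example, `backup_folder("./wavs_final", "./wavs_backup")` would create *wavs_backup.zip* with *metadata.csv*, *training.csv*, *validation.csv* and the *wavs* folder at the top level of the archive.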

In [None]:
# @markdown ***Path for the WAVS backup ZIP file:***
# @markdown <br><small>This is the relative path from your Drive root (e.g. *tts/wavs_processed.zip*)</small>
# @markdown <br><small>Leave empty if you only want to zip-up your WAV files and download them manually.</small>
wav_files_backup_path = "tts/wavs_backup.zip" # @param {type:"string"}
make_backup_from = "Filtered WAV files (result of running length check in Step 4)" # @param ["Unfiltered processed WAV files (Step 4 length check not ran)", "Filtered WAV files (result of running length check in Step 4)"]

# local WAV files backup file name
wavs_backup_local_file_name = "wavs_backup.zip"

# remove old file if we're backing up again
!rm -f ./$wavs_backup_local_file_name

# make backup of the resulting WAV files
if make_backup_from == "Unfiltered processed WAV files (Step 4 length check not ran)":
  !cd $wavs_final_path && zip -r ../$wavs_backup_local_file_name *
else:
  !cd $wavs_filtered_path && zip -r ../$wavs_backup_local_file_name *

# copy backup onto Google Drive or print backup info
if data_source == "Google Drive (Google Colab only)":
  if wav_files_backup_path != "":
    !rm -f /gdrive/MyDrive/$wav_files_backup_path
    !cp ./$wavs_backup_local_file_name /gdrive/MyDrive/$wav_files_backup_path
    print( "\n\nThe backup file was successfully copied over to your Drive.\nYou can also download the file " + wavs_backup_local_file_name + " directly from the root folder of this Notebook." )
  else:
    print( "\n\nThe wav_files_backup_path setting was empty - the ZIP file was NOT copied over to your Drive.\nYou can download the ZIP file (" + wavs_backup_local_file_name + ") directly from the root folder of this Notebook." )
else:
  print( "\n\nYour WAV files were backed up.\nYou can download the ZIP file (" + wavs_backup_local_file_name +") directly from the root folder of this Notebook." )