## Premise: 

A babble tape is a digital file meant to be played in the background during
conversations. The file is complex. Forty voice tracks run simultaneously
(thirty-two in English, eight in other languages), and each track is compressed
in frequency and time to produce additional “voices” that fill the entire
frequency spectrum. There are also various non-human mechanical noises, and
a periodic supersonic burst (inaudible to adult listeners) engineered
specifically to interfere with the automatic gain-control system by which an
eavesdropping device configures itself to best pick up an audio signal. Most pertinent for
present purposes, the voices on a babble tape used by an attorney include
those of the client and the attorney themselves. The dense mélange of voices
increases the difficulty of discerning any single voice.

(Source: https://we.riseup.net/assets/355198/Obfuscation.pdf)


## Step 1: 

To make your own babble tape, record about a minute of yourself talking, with no background noise. If you would rather read a prepared text, you can read the phonetic pangrams below:

A pangram is a sentence or phrase that contains every letter of the alphabet at least once. Phonetic pangrams are sentences that contain all forty sounds of English, i.e. they use all the phonemes (or phones) of English rather than all the alphabetic characters (source: Quora, https://www.quora.com/Is-there-a-text-that-covers-the-entire-English-phonetic-range).
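
The alphabetic definition is easy to check in code. The sketch below uses a hypothetical helper, `is_pangram`, which is not part of this notebook. It only checks letters; checking phoneme coverage would need a pronunciation dictionary and is not attempted here.

```python
import string

def is_pangram(text):
    """Return True if text contains every letter a-z at least once (case-insensitive)."""
    letters_seen = {ch for ch in text.lower() if ch in string.ascii_lowercase}
    return len(letters_seen) == 26

print(is_pangram("The quick brown fox jumps over the lazy dog"))
print(is_pangram("Hello, world"))
```

Note that a phonetic pangram does not have to be an alphabetic pangram, so this check is only an illustration of the first definition.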

Here is the text:

    "That quick beige fox jumped in the air over each thin dog. Look out, I shout, for he's foiled you again, creating chaos."

    "Are those shy Eurasian footwear, cowboy chaps, or jolly earthmoving headgear?"

    "The hungry purple dinosaur ate the kind, zingy fox, the jabbering crab, and the mad whale and started vending and quacking."

    "With tenure, Suzie’d have all the more leisure for yachting, but her publications are no good."

    "Shaw, those twelve beige hooks are joined if I patch a young, gooey mouth."

    "The beige hue on the waters of the loch impressed all, including the French queen, before she heard that symphony again, just as young Arthur wanted." 
    

## Step 2:

- Convert your audio file into a WAV file.
- Use dynamic range compression to even out the very loud and very quiet parts of the audio recording (source: https://medium.com/@jud.dagnall/dynamic-range-compression-for-audio-with-ffmpeg-and-compand-621fe2b1a892)
- The ffmpeg volume mapping (input dB/output dB) is as follows:

    - -80/-900: Remove the really quiet stuff.
    - -45/-15: Make the quietest part of the audience questions pretty clear (a 3x increase). You will likely need to fiddle with this if there’s a lot of audience noise, chairs moving, etc. that you don’t want to hear.
    - -27/-9: Make the medium part of the questions easy to hear.
    - -5/-5: Keep the normal to loud voice unchanged (for now).
    - 20/20: Just an extra anchor point to keep the loud stuff loud (for now).


In [None]:
!ffmpeg -i recording.m4a  -filter_complex "compand=attacks=0:points=-80/-900|-45/-15|-27/-9|-5/-5|20/20" recording.wav
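
To get a feel for what the points in the command above do to a given level, the mapping can be approximated as straight-line interpolation between the listed input/output pairs (in dB). This is a simplified sketch: it ignores compand's soft-knee smoothing and its attack/decay timing, it clamps levels outside the listed range, and `compand_output_db` is a hypothetical helper name.

```python
# (input dB, output dB) pairs copied from the compand points in the ffmpeg command.
POINTS = [(-80, -900), (-45, -15), (-27, -9), (-5, -5), (20, 20)]

def compand_output_db(input_db, points=POINTS):
    """Approximate the output level by linear interpolation between points.

    Levels outside the listed range are clamped to the first or last point,
    which is a simplification made for this sketch.
    """
    if input_db <= points[0][0]:
        return float(points[0][1])
    if input_db >= points[-1][0]:
        return float(points[-1][1])
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= input_db <= x1:
            fraction = (input_db - x0) / (x1 - x0)
            return y0 + fraction * (y1 - y0)

for level_db in (-45, -36, -27, -5):
    print(f"{level_db} dB in -> {compand_output_db(level_db):.1f} dB out")
```

Under this approximation, a passage at -36 dB comes out at about -12 dB, while anything at -5 dB or louder is left unchanged.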

## Step 3:

Slice up the audio into onset segments (using librosa's onset detection algorithm)

In [6]:
import pandas as pd
import numpy as np
import librosa
import random
from pydub import AudioSegment
from pysndfx import AudioEffectsChain


def onset_detection_timestamps(sound_file, sr=44100):
    """Detects onsets in an audio file and returns their timestamps

    Args:
        sound_file (string): The path to an audio file
        sr (int, optional): The sample rate to load the audio at. Defaults to 44100.

    Returns:
        [float]: A list of onset times in seconds, rounded to two decimals
    """
    y, sr = librosa.load(sound_file, sr=sr)
    C = np.abs(librosa.cqt(y=y, sr=sr))
    o_env = librosa.onset.onset_strength(sr=sr, S=librosa.amplitude_to_db(C, ref=np.max))
    onset_time = librosa.onset.onset_detect(onset_envelope=o_env, sr=sr, units='time', backtrack=True)
    return np.round(onset_time, 2).tolist()

def split_audio_by_onset_timestamps(sound_file, times):
    '''
    - break up audio into onset segments
    - if an audio segment is 0.3 seconds or shorter, stitch it together with the next segment
    '''
    start = 0
    audio_chunks = []
    audio = AudioSegment.from_wav(sound_file)

    for t in times:
        end = int(round(t * 1000))  # pydub works in milliseconds
        duration = (end - start) / 1000

        if duration > 0.3:
            audio_chunks.append(audio[start:end])
            start = end

    return audio_chunks
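
The merging rule in `split_audio_by_onset_timestamps` can be checked on plain numbers before running it on audio. The sketch below mirrors the loop using made-up onset times; `segment_boundaries` is a hypothetical helper that returns (start, end) pairs in milliseconds instead of audio chunks.

```python
def segment_boundaries(times, min_duration=0.3):
    """Return (start_ms, end_ms) pairs for each segment.

    An onset that would end a segment of min_duration seconds or less is
    skipped, so that short stretch is merged into the next segment.
    """
    boundaries = []
    start_ms = 0
    for t in times:
        end_ms = int(round(t * 1000))
        if (end_ms - start_ms) / 1000 > min_duration:
            boundaries.append((start_ms, end_ms))
            start_ms = end_ms
    return boundaries

# Made-up onset times in seconds, for illustration only.
example_times = [0.1, 0.5, 0.6, 1.2]
print(segment_boundaries(example_times))
```

As in the function above, audio after the last accepted onset does not end up in any segment.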


In [2]:
# define file names:
#input_audio_file = 'recording.wav'
#output_babble_audio_file = "babble_me.wav"

input_audio_file = 'Phonetic_pangrams_all.wav'
output_babble_audio_file = "babble_ahnjili.wav"

In [None]:
#get the onset detection starting time stamps
times = onset_detection_timestamps(input_audio_file, sr=44100)

#break up audio into onset segments
audio = AudioSegment.from_wav(input_audio_file)
audio_chunks = split_audio_by_onset_timestamps(input_audio_file, times)

## Step 4:

- Combine the segmented audio (in a random order) into a single audio file
- Create multiple versions of the randomized combined segmented audio
- Overlay the different versions and apply panning to the different layers
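
The shuffling step can be sketched without any audio. Below, short strings stand in for audio segments, and the order list records which original segment landed in each position, which is what makes it possible to undo the shuffle later. `shuffle_with_order` and `restore_order` are hypothetical helper names, not functions from this notebook.

```python
import random

def shuffle_with_order(chunks, rng=random):
    """Return the chunks in random order plus the original index of each."""
    order = list(range(len(chunks)))
    rng.shuffle(order)
    return [chunks[i] for i in order], order

def restore_order(shuffled, order):
    """Undo shuffle_with_order by sorting the chunks on their original index."""
    return [chunk for _, chunk in sorted(zip(order, shuffled))]

# Strings standing in for audio segments, for illustration only.
chunks = ["seg0", "seg1", "seg2", "seg3", "seg4"]
shuffled, order = shuffle_with_order(chunks, random.Random(0))
print(shuffled, order)
print(restore_order(shuffled, order) == chunks)
```

Each layer of the babble tape is one such shuffle of the same segments, so the layers contain the same material in different orders.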

In [8]:
#now stitch together the segments in a random order and create x versions of the audio files so that we can eventually layer them together

def shuffle_and_join(audio_chunks):
    '''
    - Stitch together the segments in a random order
    - Each segment gets a random level of panning
    - order_list represents the shuffle sequence, as indices into audio_chunks
    '''
    order_list = list(range(len(audio_chunks)))
    random.shuffle(order_list)

    combined_sounds = AudioSegment.empty()
    for idx in order_list:
        combined_sounds += audio_chunks[idx].pan(random.uniform(-1, 1))

    return combined_sounds, tuple(order_list)


def stitch_audio(audio_chunks, _layers = 7):
    if _layers == 0:
        #a single shuffled version, returned directly rather than in a list
        return shuffle_and_join(audio_chunks)

    #create x versions of the audio so that we can eventually layer them together
    combined_sounds_list = []
    combined_order_list = []

    for i in range(0, _layers):
        combined_sounds, order_list = shuffle_and_join(audio_chunks)
        combined_sounds_list.append(combined_sounds)
        combined_order_list.append(order_list)

    return combined_sounds_list, combined_order_list


def overlay_audio(combined_sounds_list, _layers = 7):
    '''
    Overlaying the first _layers versions of the audio together
    '''
    babble = combined_sounds_list[0]
    for layer in combined_sounds_list[1:_layers]:
        babble = babble.overlay(layer)

    return babble

In [None]:
combined_sounds_list, combined_order_list = stitch_audio(audio_chunks, _layers = 7)
babble = overlay_audio(combined_sounds_list, _layers = 7)

In [None]:
#That's it! Now writing the audio file

file_handle = babble.export(output_babble_audio_file, format="wav")
audio = AudioSegment.from_wav(output_babble_audio_file)
audio

## Step 5 (optional)

Add a bit of reverb to the recording

In [None]:
#Just adding a bit of reverb

infile = output_babble_audio_file
outfile = infile[:-4]+"_reverb.wav"

fx = (
    AudioEffectsChain()
    .reverb()
)

# Apply the reverb directly to the audio file and store the result on disk.
# (Calling fx on the file and then again on the returned ndarray would apply the reverb twice.)
fx(infile, outfile)

audio = AudioSegment.from_wav(outfile)
audio

# Encryption and Decryption

Potentially, the babble tape can be used as a kind of encryption key to hide audio. One could encrypt the 'true' audio by overlaying the babble tape on it, and decrypt it by overlaying a phase-inverted copy of the same babble tape, which should cancel the babble and leave the original audio.
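
The idea can be sketched on plain lists of samples. The example below assumes 16-bit samples and a mix that clips at the 16-bit range, as fixed-point audio mixing typically does; `mix` and `invert_phase` are hypothetical helpers and the sample values are made up. It shows that the inversion recovers the secret exactly when nothing clips, and that clipping during the mix breaks the recovery.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def clip(value):
    """Clamp a sample to the 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, value))

def mix(a, b):
    """Sum two equal-length sample lists, clipping each result."""
    return [clip(x + y) for x, y in zip(a, b)]

def invert_phase(samples):
    """Flip the sign of every sample."""
    return [clip(-s) for s in samples]

# Made-up samples, for illustration only.
secret = [1000, -2000, 3000, -4000]
babble = [500, 12000, -7000, 250]

encrypted = mix(secret, babble)
decrypted = mix(encrypted, invert_phase(babble))
print(decrypted == secret)

# If the mix clips, the inverted babble no longer cancels cleanly.
loud_secret = [5000, -2000, 3000, -4000]
loud_babble = [30000, 12000, -7000, 250]
recovered = mix(mix(loud_secret, loud_babble), invert_phase(loud_babble))
print(recovered == loud_secret)
```

In practice this suggests keeping both the babble tape and the secret quiet enough that their sum stays within range, and using exactly the same babble file, sample for sample, on both sides.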

## Encryption

In [4]:
#Load a secret audio file
myAudioFile = "Random_Secret.wav"
secret = AudioSegment.from_file(myAudioFile, format="wav")
secret

In [9]:
# #get the onset timestamps of the random_secret.wav
# secret_times = onset_detection_timestamps("Random_Secret.wav", sr=44100)

# #split the file based on the timestamps
# secret_audio_chunks = split_audio_by_onset_timestamps("Random_Secret.wav", secret_times)

# #get the randomized audio as well as the randomized order of the audiofile
# secret_combined_sounds, secret_order_list = stitch_audio(secret_audio_chunks, _layers = 0)

# #export the randomized audio file as a wav file
# file_handle = secret_combined_sounds.export("Random_Secret_Randomized.wav", format="wav")
# secret_random = AudioSegment.from_wav("Random_Secret_Randomized.wav")
# secret_random

In [11]:
#Load a babble audio file (the reverb output from Step 5; change the name to match your own output file)
myAudioFile = "babble_me_reverb.wav"
babble = AudioSegment.from_file(myAudioFile, format="wav")
babble

In [30]:
#Overlay the secret on the babble tape (the result has the length of the babble tape)
encrypt = babble.overlay(secret)
encrypt

## Decryption

In [13]:
def detect_leading_silence(sound, silence_threshold=-50.0, chunk_size=10):
    '''
    sound is a pydub.AudioSegment
    silence_threshold in dB
    chunk_size in ms

    iterate over chunks until you find the first one with sound
    '''
    trim_ms = 0 # ms

    assert chunk_size > 0 # to avoid infinite loop
    while sound[trim_ms:trim_ms+chunk_size].dBFS < silence_threshold and trim_ms < len(sound):
        trim_ms += chunk_size

    return trim_ms

def remove_leading_silence(sound):
    start_trim = detect_leading_silence(sound)
    end_trim = detect_leading_silence(sound.reverse())
    duration = len(sound)    
    trimmed_sound = sound[start_trim:duration-end_trim]
    
    return trimmed_sound
    
#sound = AudioSegment.from_file("/path/to/file.wav", format="wav")
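
The silence check above compares each chunk's loudness in dBFS against a threshold. The sketch below reproduces that logic on plain lists of 16-bit samples, assuming dBFS is computed as 20·log10(RMS / full scale); `dbfs` and `leading_silence_samples` are hypothetical helpers, and the block size is counted in samples rather than milliseconds.

```python
import math

MAX_AMPLITUDE = 32768  # full scale for 16-bit audio

def dbfs(samples):
    """Loudness of a block of samples in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms / MAX_AMPLITUDE) if rms > 0 else float("-inf")

def leading_silence_samples(samples, threshold_db=-50.0, block=4):
    """Count samples at the start whose blocks stay below threshold_db."""
    trimmed = 0
    while trimmed < len(samples) and dbfs(samples[trimmed:trimmed + block]) < threshold_db:
        trimmed += block
    return trimmed

# Made-up samples: two near-silent blocks followed by one loud block.
quiet = [0, 1, -1, 0, 2, -2, 1, 0]
loud = [8000, -9000, 7000, -8500]
print(leading_silence_samples(quiet + loud))
```

Running the same check on the reversed samples gives the amount of trailing silence, which is how `remove_leading_silence` trims both ends.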

In [25]:
#Invert phase of audio file
babble_invert = babble.invert_phase()
babble_invert

#Overlay the inverted babble tape on the encrypted audio to cancel out the babble
decrypt = encrypt.overlay(babble_invert)

#Export the decrypted audio as a wav file
file_handle = decrypt.export("Random_Secret_Decrypt.wav", format="wav")
decrypt

In [26]:
#removing any potential leading silence
#decrypt = remove_leading_silence(decrypt)


In [27]:
# def sort_list(order_list, secret_audio_chunks):
#     zipped_lists = zip(order_list, secret_audio_chunks)
#     sorted_pairs = sorted(zipped_lists)
#     tuples = zip(*sorted_pairs)
#     sorted_order_list, sorted_secret_audio_chunks = [list(tuple) for tuple in  tuples]
#     for idx,t in enumerate(sorted_secret_audio_chunks):
#         if idx == 0:
#             combined_sounds = sorted_secret_audio_chunks[idx]
#         else:
#             combined_sounds += sorted_secret_audio_chunks[idx]
#     return sorted_order_list,combined_sounds

# #https://www.adamsmith.haus/python/answers/how-to-sort-two-lists-together-in-python

In [28]:
# #get the onset timestamps of the random_secret.wav
# decrypt_times = onset_detection_timestamps("Random_Secret_Decrypt.wav", sr=44100)

# #split the file based on the timestamps
# decrypt_audio_chunks = split_audio_by_onset_timestamps("Random_Secret_Decrypt.wav", decrypt_times)

# sorted_order_list,decrypted_secret_audio_chunks = sort_list(secret_order_list, decrypt_audio_chunks)
# decrypted_secret_audio_chunks