#### This notebook covers some basics of working with audio files in Python:

* Loading and playing audio files
* Plotting audio in time and frequency domain
* Resampling audio (changing the sampling rate)
* Audio steganography example

#### Downloading example audio files and installing required libraries

In [None]:
!wget -nc http://cs.uef.fi/~vvestman/sounds/Im_Superman.wav
!wget -nc http://cs.uef.fi/~vvestman/sounds/Count_Of_Three-8khz.wav
  
# Backup links:
#!wget -nc https://vvestman.github.io/summerschool19/sounds/Im_Superman.wav  
#!wget -nc https://vvestman.github.io/summerschool19/sounds/Count_Of_Three-8khz.wav
  
!pip install soundfile
!pip install bitstring

#### Playing audio in notebooks

In [None]:
import IPython
IPython.display.Audio('Im_Superman.wav')


#### Loading audio using PySoundFile (http://pysoundfile.readthedocs.org/)

*   Plotting the audio signal



In [None]:
import soundfile
import matplotlib.pyplot as plt
audio_signal, sampling_rate = soundfile.read('Im_Superman.wav')
print('Sampling rate: {} samples/second'.format(sampling_rate))
print('Signal size: {} samples'.format(audio_signal.shape[0]))
print('Signal duration: {:.3f} seconds'.format(audio_signal.shape[0] / sampling_rate))
plt.plot(audio_signal)
plt.tight_layout()
plt.figure()
plt.plot(audio_signal[2000:2100], marker='x')
plt.title('Zoomed in view to samples 2000-2100')
plt.tight_layout()

#### Using short time Fourier transform to obtain magnitude spectrogram of speech



The short-time Fourier transform (STFT) splits the signal into short, overlapping frames and applies the Fourier transform to each frame individually. Below, frames are $25$ ms long and a new frame starts every $10$ ms, so consecutive frames overlap by $25-10=15$ ms.
The Fourier transform gives complex-valued outputs. A magnitude spectrogram of speech keeps only the magnitudes of these complex values, which can be obtained with NumPy's (https://www.numpy.org/) ```abs()``` function.
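The framing, windowing, and magnitude steps can be sketched directly in NumPy. This is a minimal illustration, not librosa's implementation: ```magnitude_spectrogram``` is a hypothetical helper, it assumes a Hann window (```np.hanning``` is symmetric, whereas librosa defaults to a periodic Hann), and it does not pad the signal edges, so it produces slightly fewer frames than ```librosa.stft``` with its default ```center=True```.

```python
import numpy as np

def magnitude_spectrogram(signal, sampling_rate, frame_ms=25.0, hop_ms=10.0, n_fft=2048):
    """Minimal STFT magnitude sketch: split into overlapping frames, window each
    frame, FFT it, keep magnitudes. Returns shape (n_fft // 2 + 1, n_frames)."""
    win_length = int(round(frame_ms / 1000 * sampling_rate))  # 25 ms -> samples
    hop_length = int(round(hop_ms / 1000 * sampling_rate))    # 10 ms -> samples
    if win_length > n_fft or len(signal) < win_length:
        raise ValueError("need win_length <= n_fft and at least one full frame")

    # Frame i covers samples [i * hop_length, i * hop_length + win_length)
    n_frames = 1 + (len(signal) - win_length) // hop_length
    window = np.hanning(win_length)  # assumption: symmetric Hann window
    frames = np.stack([signal[i * hop_length:i * hop_length + win_length] * window
                       for i in range(n_frames)])

    # rfft zero-pads each frame to n_fft samples and returns only the
    # n_fft // 2 + 1 non-redundant bins of the complex spectrum
    spectrum = np.fft.rfft(frames, n=n_fft, axis=1)
    return np.abs(spectrum).T  # magnitudes, laid out frequency x time like librosa.stft

# Example: one second of a 1 kHz tone sampled at 8 kHz
sr = 8000
tone = np.sin(2 * np.pi * 1000 * np.arange(sr) / sr)
S = magnitude_spectrogram(tone, sr)
print(S.shape)                          # (1025, 98)
print(np.argmax(S[:, 10]) * sr / 2048)  # frequency of the strongest bin, near 1000 Hz
```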

We use Librosa's (https://librosa.github.io/librosa/index.html) STFT implementation.

In [None]:
import numpy as np
import librosa
from librosa.display import specshow

window_length = int(0.025 * sampling_rate)
hop_length = int(0.01 * sampling_rate)

spectrogram = np.abs(librosa.stft(audio_signal, hop_length=hop_length, win_length=window_length))

# Plotting the spectrogram:
specshow(librosa.amplitude_to_db(spectrogram, ref=np.max), sr=sampling_rate, hop_length=hop_length, y_axis='linear', x_axis='time')
plt.title('Spectrogram')
plt.colorbar(format='%+2.0f dB')
plt.tight_layout()

In the above code, ```spectrogram``` is a 2D NumPy array. Its shape is printed below:

In [None]:
print(spectrogram.shape)

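By default, ```librosa.stft``` uses an FFT size of ```n_fft=2048```: the 25 ms analysis window is zero-padded to 2048 samples, and the Fourier transform of each frame returns 2048 complex values (frequency bins). Because the input frame is real-valued, its spectrum is conjugate-symmetric, so only the first $2048/2 + 1 = 1025$ values are unique. These cover frequencies from 0 Hz up to the Nyquist frequency (half the sampling rate), and only they are retained. (See also https://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampling_theorem)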


The size of the second dimension is the number of frames (one every ```hop_length``` samples, i.e. every 10 ms).
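Both dimensions can be checked with a few lines of NumPy. The sketch below assumes the default ```n_fft=2048```, a 44.1 kHz sampling rate, and a 10 ms hop; ```expected_stft_shape``` is a hypothetical helper whose frame count assumes librosa's default ```center=True``` padding.

```python
import numpy as np

n_fft = 2048            # librosa.stft default
sampling_rate = 44100   # assumed: the 44.1 kHz file loaded above
hop_length = 441        # 10 ms at 44.1 kHz

# First dimension: for a real-valued frame the FFT is conjugate-symmetric,
# X[n_fft - k] == conj(X[k]), so only bins k = 0 .. n_fft // 2 are unique.
frame = np.random.default_rng(0).standard_normal(n_fft)
X = np.fft.fft(frame)
print(np.allclose(X[1:][::-1], np.conj(X[1:])))             # True
X_half = np.fft.rfft(frame)                                  # keeps only the unique bins
print(len(X_half), np.allclose(X_half, X[:n_fft // 2 + 1]))  # 1025 True

# The retained bins run from 0 Hz up to the Nyquist frequency, sampling_rate / 2
bin_freqs = np.fft.rfftfreq(n_fft, d=1 / sampling_rate)
print(f"bin spacing {bin_freqs[1]:.2f} Hz, last bin {bin_freqs[-1]:.1f} Hz")
# bin spacing 21.53 Hz, last bin 22050.0 Hz

def expected_stft_shape(num_samples, hop_length, n_fft=2048):
    """Hypothetical helper: predicted librosa.stft output shape, assuming center=True."""
    n_bins = 1 + n_fft // 2
    # center=True pads n_fft // 2 samples on both sides, so there is one frame
    # centered at every multiple of hop_length from 0 up to num_samples
    n_frames = 1 + num_samples // hop_length
    return n_bins, n_frames

# Second dimension: one second of 44.1 kHz audio with a 10 ms hop
print(expected_stft_shape(44100, hop_length))   # (1025, 101)
```

In the notebook, ```expected_stft_shape(len(audio_signal), hop_length)``` should match the printed ```spectrogram.shape```.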


#### Resampling audio
The loaded audio file is sampled at 44.1 kHz. Let's resample the audio to 8 kHz:

In [None]:
audio_signal = librosa.resample(audio_signal, orig_sr=sampling_rate, target_sr=8000)
sampling_rate = 8000

window_length = int(0.025 * sampling_rate)
hop_length = int(0.01 * sampling_rate)

spectrogram = np.abs(librosa.stft(audio_signal, hop_length=hop_length, win_length=window_length))
librosa.display.specshow(librosa.amplitude_to_db(spectrogram, ref=np.max), sr=sampling_rate, hop_length=hop_length, y_axis='linear', x_axis='time')
plt.title('Spectrogram')
plt.colorbar(format='%+2.0f dB')
plt.tight_layout()
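Resampling to 8 kHz lowers the highest representable frequency (the Nyquist frequency) from 22.05 kHz to 4 kHz, which is why the spectrogram above now stops at 4 kHz. Content above 4 kHz has to be filtered out before the sampling rate is reduced; otherwise it folds back (aliases) to lower frequencies. ```librosa.resample``` performs this band-limited filtering. To see what would go wrong without it, here is a deliberately naive resampler (```naive_resample``` is a hypothetical helper using linear interpolation with no anti-aliasing filter; it is for illustration only and is not what librosa does):

```python
import numpy as np

def naive_resample(signal, orig_sr, target_sr):
    """Hypothetical helper: linear-interpolation resampler WITHOUT anti-aliasing."""
    n_out = int(np.ceil(len(signal) * target_sr / orig_sr))
    t_in = np.arange(len(signal)) / orig_sr    # original sample times (s)
    t_out = np.arange(n_out) / target_sr       # new sample times (s)
    return np.interp(t_out, t_in, signal)

sr_in, sr_out = 44100, 8000
t = np.arange(sr_in) / sr_in               # one second
tone = np.sin(2 * np.pi * 5000 * t)        # 5 kHz: above the 4 kHz Nyquist limit at 8 kHz
resampled = naive_resample(tone, sr_in, sr_out)

spectrum = np.abs(np.fft.rfft(resampled))
peak_hz = np.argmax(spectrum) * sr_out / len(resampled)
print(peak_hz)   # close to 3000 Hz: the 5 kHz tone has aliased to 8000 - 5000 Hz
```

With ```librosa.resample```, the 5 kHz tone should instead be largely removed by the anti-aliasing filter rather than reappearing at 3 kHz.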

#### Audio steganography with the least significant bit (LSB) coding
The idea is to embed hidden data (secret message) into a speech file (carrier). After embedding the secret message, the altered speech file should sound like the original carrier file.

Let ```"Im_Superman.wav"``` be the carrier and let ```"Count_Of_Three-8khz.wav"``` be the secret message (secret message could be some other kind of data as well, such as text).
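At the level of a single 16-bit sample, LSB coding is plain bit masking: clear the lowest ```n_bits``` bits of the carrier sample and write secret bits into them. This changes the sample value by at most ```2**n_bits - 1``` within the 16-bit range -32768..32767, which is why the change can be hard to hear for small ```n_bits```. A minimal sketch (```set_low_bits``` is a hypothetical helper, not part of the notebook):

```python
import numpy as np

def set_low_bits(sample, value, n_bits):
    """Hypothetical helper: replace the n_bits least significant bits of a
    16-bit signed sample with the low n_bits bits of `value`."""
    mask = (1 << n_bits) - 1
    u = int(sample) & 0xFFFF                  # 16-bit two's-complement pattern as 0..65535
    u = (u & ~mask) | (value & mask)          # clear the low bits, then write the secret bits
    return u - 0x10000 if u >= 0x8000 else u  # back to a signed 16-bit value

print(np.binary_repr(1000, 16))                      # 0000001111101000
print(np.binary_repr(set_low_bits(1000, 1, 1), 16))  # 0000001111101001
print(set_low_bits(1000, 0b111, 3))                  # 1007
print(set_low_bits(-1, 0, 2))                        # -4
```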

In [None]:
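carrier, carrier_sr = soundfile.read('Im_Superman.wav', dtype='int16')
message, message_sr = soundfile.read('Count_Of_Three-8khz.wav', dtype='int16')

# Repeat the message five times to make a longer secret signal
# (whatever does not fit into the carrier is cut off by lsb_embed below)
message = np.hstack((message, message, message, message, message))

IPython.display.Audio('Count_Of_Three-8khz.wav')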

A function that embeds data to the least significant bits of the carrier signal:

In [None]:
from bitstring import Bits

def lsb_embed(carrier, data, n_bits=1):
  # Assumes that carrier is a 1-D (mono) int16 array, data is int16, and 1 <= n_bits <= 16
    
  # Convert all integer values of secret message to binary strings:
  secret_bits = []
  for value in np.nditer(data):
    secret_bits.append(np.binary_repr(value, 16))
  
  # Join all binary strings together
  secret_bits = ''.join(secret_bits)
  
  # Pad with zeros (or truncate) so that there are exactly n_bits secret bits per carrier sample
  secret_bits = secret_bits.ljust(carrier.size * n_bits, '0')[:carrier.size * n_bits]
    
  # Modify the least significant bits of carrier to contain hidden data
  audio_with_hidden_data = np.zeros(carrier.shape, dtype=carrier.dtype)
  for i in range(len(carrier)):
    # Convert ith value of carrier to a 16-bit two's-complement binary string:
    binary_string = np.binary_repr(carrier[i], 16)
    # Replace the last n_bits bits of the binary string with the next n_bits bits of the secret message:
    altered_binary = binary_string[:-n_bits] + secret_bits[i*n_bits:i*n_bits+n_bits]
    audio_with_hidden_data[i] = Bits(bin=altered_binary).int # Binary string to signed int
      
  return audio_with_hidden_data

Next, we hide the message in the 10 least significant bits of each carrier sample (```n_bits=10```) using the above function, save the stego audio (audio with a hidden signal) to a file, and finally play it:

In [None]:
audio_with_hidden_data = lsb_embed(carrier, message, 10)
soundfile.write('audio_with_hidden_message.wav', audio_with_hidden_data, carrier_sr)
IPython.display.Audio('audio_with_hidden_message.wav')

Does it sound different from the original file?

In [None]:
IPython.display.Audio('Im_Superman.wav') # Original wav file

A function that retrieves the embedded hidden data:

In [None]:
def lsb_retrieve(signal, n_bits=1):
  
  # Collect the least significant bits of the 'stego' signal
  secret_bits = []
  for value in np.nditer(signal):
    ls_bit = np.binary_repr(value, 16)[-n_bits:]
    secret_bits.append(ls_bit)
  
  # Join bits together to form a binary string
  secret_bits = ''.join(secret_bits)
  
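  # Drop trailing bits that do not form a complete 16-bit value
  # (an explicit end index is needed: secret_bits[:-0] would return an empty string)
  secret_bits = secret_bits[:len(secret_bits) - len(secret_bits) % 16]
  
  # Convert chunks of 16 consecutive bits to 16-bit integers to retrieve the secret data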
  retrieved_audio = np.zeros(len(secret_bits) // 16, dtype=np.int16)
  for i in range(retrieved_audio.size):
    retrieved_audio[i] = Bits(bin=secret_bits[i*16:(i+1)*16]).int
    
  return retrieved_audio
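A matching vectorized sketch of ```lsb_retrieve``` (```lsb_retrieve_vectorized``` is a hypothetical name; NumPy only, assuming 1-D int16 input and 1 <= ```n_bits``` <= 16). It reads the low bits in the same MSB-first order and, like ```lsb_retrieve```, discards leftover bits that do not fill a complete 16-bit sample:

```python
import numpy as np

def lsb_retrieve_vectorized(signal, n_bits=1):
    """Hypothetical vectorized counterpart of lsb_retrieve (1-D int16 input)."""
    signal = np.ascontiguousarray(signal, dtype=np.int16)
    # Low n_bits of each sample, taken from the unsigned 16-bit pattern
    low = signal.view(np.uint16) & np.uint16((1 << n_bits) - 1)
    # Expand each value into n_bits bits, most significant bit first
    shifts = np.arange(n_bits - 1, -1, -1)
    bits = ((low[:, None].astype(np.int64) >> shifts) & 1).astype(np.uint8).ravel()
    # Keep only complete 16-bit groups (explicit end index avoids the [:-0] pitfall)
    usable = bits.size - bits.size % 16
    # Pack bits into bytes (MSB first) and read each byte pair as a big-endian int16
    return np.packbits(bits[:usable]).view('>i2').astype(np.int16)
```

It should be usable in place of ```lsb_retrieve(audio_with_hidden_data, 10)``` in the next cell.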

In [None]:
retrieved_hidden_message = lsb_retrieve(audio_with_hidden_data, 10)
soundfile.write('retrieved_hidden_message.wav', retrieved_hidden_message, message_sr)
IPython.display.Audio('retrieved_hidden_message.wav')




---
--- 
#### Exercise: the example above modifies the 10 least significant bits of each sample. Try using only the least significant bit (```n_bits=1```), then two, three, or more. How many bits can you modify without being able to hear the difference?
---
---
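Listening is the real test in this exercise, but a number can help compare settings. Replacing the lowest ```n_bits``` bits changes each sample by at most ```2**n_bits - 1```, so the worst-case error roughly doubles with every extra bit. The sketch below (```snr_db``` is a hypothetical helper) measures the signal-to-noise ratio between the original carrier and the stego audio; SNR does not model human hearing, so treat it only as a rough companion to your ears.

```python
import numpy as np

def snr_db(original, stego):
    """Hypothetical helper: SNR (dB) of `stego` relative to `original`.
    A rough, non-perceptual measure of how much the embedding changed the audio."""
    original = np.asarray(original, dtype=np.float64)  # float avoids int16 overflow
    noise = np.asarray(stego, dtype=np.float64) - original
    noise_energy = np.sum(noise ** 2)
    if noise_energy == 0:
        return np.inf  # identical signals
    return 10 * np.log10(np.sum(original ** 2) / noise_energy)
```

For example, compute ```snr_db(carrier, lsb_embed(carrier, message, n))``` for n = 1, 2, 3, ... and compare the numbers with what you hear (```lsb_embed``` is slow, so this takes a while for long files).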