Audio transcoding is the conversion of an audio file from one coding format to another. The most common audio coding formats are:

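- MP3: lossy, high compression; some signal information is discarded
- WAV: typically uncompressed PCM, so file sizes are larger
- AAC: lossy, standardized successor to MP3 that generally sounds better at the same bitrate
- FLAC: lossless compression, smaller than WAV with no loss of signal; may need to be decoded before analysis
- OPUS: lossy codec well suited to the human voice; the losses can be problematic for analysis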

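For example, 1 minute of audio stored as WAV takes about 3.5 MB; converted to FLAC, it takes only about 1.5 MB. The exact figures depend on sample rate, bit depth, and channel count, but the saving is really useful for storage.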
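The size of uncompressed PCM audio follows directly from the sample rate, bit depth, channel count, and duration. The sketch below is only an illustration: `pcm_size_bytes` is a hypothetical helper name, the values are CD-quality settings chosen as an assumption (not the settings of the file discussed above), and the small WAV header is ignored.

```python
def pcm_size_bytes(sample_rate, bit_depth, channels, seconds):
    """Size of raw PCM audio data in bytes (WAV header not included)."""
    bytes_per_sample = bit_depth // 8
    return sample_rate * bytes_per_sample * channels * seconds

# One minute of 16-bit stereo audio at 44.1 kHz (illustrative values)
size = pcm_size_bytes(sample_rate=44_100, bit_depth=16, channels=2, seconds=60)
print(size / 1_000_000, "MB")
```

Halving the sample rate or recording in mono halves the size, which is why two one-minute WAV files can differ a lot in size.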

Audio channels are the separate input and output signals of a recording:
- mono: 1 channel, 1 microphone, 1 output on speakers
- stereo: 2 channels, 2 microphones (left and right), 2 speakers
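In code, a stereo recording is commonly held as a 2-D array with one column per channel. The sketch below uses a small made-up array (not one of the test files) to show how the channels can be split and how a mono downmix can be obtained by averaging them; averaging is one simple choice among several.

```python
import numpy as np

# Hypothetical stereo signal: 4 frames, 2 channels (left, right)
stereo = np.array([[0.2, 0.4],
                   [0.0, 1.0],
                   [-0.5, 0.5],
                   [1.0, 1.0]])

left, right = stereo[:, 0], stereo[:, 1]  # split the two channels
mono = stereo.mean(axis=1)                # downmix to mono by averaging

print(stereo.shape, mono.shape)
```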

# Basic audio processing in Python

## I. Read Files
- Pydub: simple features for manipulating audio files

In [37]:
!pip install pyttsx3

In [57]:
%config Completer.use_jedi = False

In [51]:
# Standard library
import time
import wave

# Third-party
import librosa
import matplotlib.pyplot as plt
import numpy as np
import pyttsx3
import scipy
import scipy.stats  # imported explicitly for scipy.stats.skew, kurtosis, mode, iqr
import sounddevice as sd
import soundfile as sf
import sox
from IPython.display import Audio
from numpy.fft import fft, ifft
from pydub import AudioSegment
from scipy.io import wavfile
from scipy.io.wavfile import read, write

%matplotlib inline

In [9]:
data1 = AudioSegment.from_wav("test1.wav")

In [7]:
data2 = AudioSegment.from_wav("test2.wav")

In [8]:
data2

In [10]:
data3 = AudioSegment.from_wav("test3.wav")

In [11]:
data3

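In [16]:
# librosa returns float samples; by default it resamples to 22050 Hz and mixes down to mono
y, sr = librosa.load('test1.wav')

In [18]:
# The standard-library wave module returns a reader object rather than the samples
w_data1 = wave.open('test1.wav', mode='rb')

In [22]:
# SciPy returns the file's own sample rate and the samples in their stored dtype
fs, s_data_1 = wavfile.read('test1.wav')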

In [23]:
s_data_1

array([ 8,  8,  8, ..., -8, -8, -8], dtype=int16)
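Each loader above returns something different, so it helps to check a file's basic properties before processing it. The sketch below uses only the standard-library `wave` module. Because the test files are not available here, it first writes a one-second 440 Hz tone to a temporary file; the tone, the 8000 Hz rate, and the file name `tone.wav` are all assumptions made for the example.

```python
import math
import os
import struct
import tempfile
import wave

# Write a short 16-bit mono test tone so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), "tone.wav")
sample_rate = 8000
n = sample_rate  # one second of audio
samples = [int(10000 * math.sin(2 * math.pi * 440 * i / sample_rate)) for i in range(n)]
with wave.open(path, "wb") as wf:
    wf.setnchannels(1)           # mono
    wf.setsampwidth(2)           # 2 bytes = 16 bits per sample
    wf.setframerate(sample_rate)
    wf.writeframes(struct.pack("<%dh" % n, *samples))

# Read the header back and derive the duration
with wave.open(path, "rb") as wf:
    channels = wf.getnchannels()
    width = wf.getsampwidth()
    rate = wf.getframerate()
    n_frames = wf.getnframes()

duration = n_frames / rate
print(channels, width, rate, n_frames, duration)
```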

## II. Manipulate files 

- Combine files: concatenate test1.wav and test2.wav into test.wav (this requires the SoX command-line tool to be installed)

In [27]:
!sox test1.wav test2.wav test.wav

zsh:1: command not found: sox
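When the SoX command-line tool is not installed, as in the output above, WAV files with the same format can also be concatenated with the standard-library `wave` module. The sketch below is one possible approach, not a replacement for SoX: `write_silence` and `concat_wavs` are hypothetical helper names, and the inputs are short silent files generated only so the example runs on its own.

```python
import os
import tempfile
import wave

def write_silence(path, n_frames, rate=8000):
    """Write n_frames of 16-bit mono silence (demo input only)."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * n_frames)

def concat_wavs(inputs, output):
    """Append the frames of each input WAV; all inputs must share one format."""
    with wave.open(output, "wb") as out:
        fmt = None
        for p in inputs:
            with wave.open(p, "rb") as src:
                current = src.getparams()[:3]  # (nchannels, sampwidth, framerate)
                if fmt is None:
                    fmt = current
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif current != fmt:
                    raise ValueError("input files have different formats: " + p)
                out.writeframes(src.readframes(src.getnframes()))

tmp = tempfile.mkdtemp()
a, b, c = (os.path.join(tmp, name) for name in ("a.wav", "b.wav", "c.wav"))
write_silence(a, 1000)
write_silence(b, 2000)
concat_wavs([a, b], c)

with wave.open(c, "rb") as wf:
    total_frames = wf.getnframes()
print(total_frames)
```

Files with different sample rates or channel counts would need to be converted to a common format first.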


In [32]:
# List every audio input and output device that sounddevice can see
mics = sd.query_devices()
for mic in mics:
    print(mic)

{'name': 'Ngoc’s AirPods', 'hostapi': 0, 'max_input_channels': 1, 'max_output_channels': 0, 'default_low_input_latency': 0.178, 'default_low_output_latency': 0.01, 'default_high_input_latency': 0.206, 'default_high_output_latency': 0.1, 'default_samplerate': 16000.0}
{'name': 'Ngoc’s AirPods', 'hostapi': 0, 'max_input_channels': 0, 'max_output_channels': 2, 'default_low_input_latency': 0.01, 'default_low_output_latency': 0.16733333333333333, 'default_high_input_latency': 0.1, 'default_high_output_latency': 0.17666666666666667, 'default_samplerate': 48000.0}
{'name': 'MacBook Pro Micrô', 'hostapi': 0, 'max_input_channels': 1, 'max_output_channels': 0, 'default_low_input_latency': 0.034520833333333334, 'default_low_output_latency': 0.01, 'default_high_input_latency': 0.043854166666666666, 'default_high_output_latency': 0.1, 'default_samplerate': 48000.0}
{'name': 'Loa MacBook Pro', 'hostapi': 0, 'max_input_channels': 0, 'max_output_channels': 2, 'default_low_input_latency': 0.01, 'defaul

In [36]:
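duration = 10  # recording length in seconds
fs = 10000     # sample rate in Hz
channels = 1   # mono
# sd.rec returns immediately; sd.wait() blocks until the recording is finished
myrecording = sd.rec(int(duration * fs), samplerate=fs, channels=channels)
sd.wait()
sf.write('sync_record.wav', myrecording, fs)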

In [44]:
engine = pyttsx3.init() 
engine.say("I love you very much") 
engine.runAndWait()

In [45]:
x, sr = librosa.load('test1.wav')

In [49]:
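# Normalized FFT bin frequencies (cycles per sample) for a signal of this length.
# This global is shadowed by the parameter of describe_freq and is not used by it.
freqs = np.fft.fftfreq(y.size)

def describe_freq(freqs):
    """Return summary statistics of the values in a 1-D array."""
    mean = np.mean(freqs)
    std = np.std(freqs)
    maxv = np.amax(freqs)
    minv = np.amin(freqs)
    median = np.median(freqs)
    skew = scipy.stats.skew(freqs)
    kurt = scipy.stats.kurtosis(freqs)
    q1 = np.quantile(freqs, 0.25)
    q3 = np.quantile(freqs, 0.75)
    # scipy.stats.mode returns an array in older SciPy and a scalar in newer SciPy;
    # np.atleast_1d handles both cases for 1-D input
    mode = np.atleast_1d(scipy.stats.mode(freqs)[0])[0]
    iqr = scipy.stats.iqr(freqs)

    return [mean, std, maxv, minv, median, skew, kurt, q1, q3, mode, iqr]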

In [52]:
describe_freq(x)

[-6.767919e-05,
 0.06057845,
 0.8909219,
 -1.0230628,
 0.0005608357,
 -1.5327012538909912,
 17.186635211617666,
 -0.009401044808328152,
 0.016137802973389626,
 -0.00024414062,
 0.025538847781717777]

In [53]:
scipy.stats.describe(x)

DescribeResult(nobs=3344985, minmax=(-1.0230628, 0.8909219), mean=-6.767919e-05, variance=0.0036697497, skewness=-1.5327012538909912, kurtosis=17.186635211617666)
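The statistics above are computed on the waveform samples, so they describe amplitudes. To look at frequencies in Hz, the FFT bins have to be scaled by the sample rate. The sketch below uses a synthetic 440 Hz sine at an assumed 8000 Hz sample rate (not one of the test files) and finds the strongest frequency with `np.fft.rfft` and `np.fft.rfftfreq`.

```python
import numpy as np

sr_demo = 8000                           # sample rate in Hz (illustrative)
t = np.arange(sr_demo) / sr_demo         # one second of sample times
signal = np.sin(2 * np.pi * 440 * t)     # pure 440 Hz tone

spectrum = np.abs(np.fft.rfft(signal))                    # magnitude per bin
freqs_hz = np.fft.rfftfreq(signal.size, d=1 / sr_demo)    # bin frequencies in Hz
peak_hz = freqs_hz[np.argmax(spectrum)]                   # strongest frequency

print(peak_hz)
```

Passing `d=1 / sr_demo` is what converts the bins from cycles per sample to Hz; without it, `fftfreq` and `rfftfreq` return normalized frequencies.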

In [61]:
# The energy of a signal is the sum of its squared sample amplitudes; louder or longer signals have more energy
def energy(x):
    return np.sum(x**2)

In [62]:
def rmse(x):
    return np.sqrt(np.mean(x**2))

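In [58]:
# Frame-wise RMS from librosa. It is stored under its own name so that it does not
# overwrite the rmse function; y= is passed by keyword, as recent librosa versions require.
rms_frames = librosa.feature.rms(y=x)[0]

In [59]:
rms_frames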

array([0.14294598, 0.14226553, 0.14721823, ..., 0.00154444, 0.00102161,
       0.00024408], dtype=float32)

In [63]:
rmse(x)

0.0605785
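The two results above differ in shape because librosa computes one RMS value per short frame, while the hand-written function computes a single value for the whole signal. The sketch below illustrates the idea with plain NumPy. `frame_rms` is a hypothetical helper; the frame length of 2048 and hop of 512 match librosa's documented defaults, but no padding is applied here, so the values are not identical to librosa's output.

```python
import numpy as np

def frame_rms(x, frame_length=2048, hop_length=512):
    """RMS of each frame of x; frames overlap and no padding is applied."""
    n_frames = 1 + (len(x) - frame_length) // hop_length
    return np.array([
        np.sqrt(np.mean(x[i * hop_length : i * hop_length + frame_length] ** 2))
        for i in range(n_frames)
    ])

# Made-up signal: a loud first half followed by a silent second half
x_demo = np.concatenate([np.ones(4096), np.zeros(4096)])

frames = frame_rms(x_demo)                   # one value per frame
global_rms = np.sqrt(np.mean(x_demo ** 2))   # one value for the whole signal

print(len(frames), frames[0], frames[-1], global_rms)
```

The frame-wise values show where the signal is loud or quiet over time, which the single global value cannot do.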