# Voice Standardize Examples

This Jupyter notebook is part of a larger voice feature extraction toolkit developed to study brain ageing through voice features.

Specifically, this notebook contains examples of how to use pydub to standardize digital voice audio files with varying metadata.

## Installation instructions

For installation and FFmpeg setup instructions, see the [README](README.md).


## Sample Input Files and Metadata

The files referred to in these examples are sample audio files with varying formats and metadata. Each audio file's metadata is captured in a JSON file and a CSV file.


### Files In Sample Audio



The following example files are used to demonstrate the standardization process:

In [6]:
flac_mono = '../sample_audio/flac/mono_first_ten_Sample_HV_Clip.flac'
flac_stereo = '../sample_audio/flac/first_ten_Sample_HV_Clip.flac'
mp3_mono = '../sample_audio/mp3/common_voice_en_21635524.mp3'
mp3_stereo = '../sample_audio/mp3/first_ten_Sample_HV_Clip.mp3'
m4a_mono = '../sample_audio/m4a/mono_zoom_audio.m4a'
m4a_stereo = '../sample_audio/m4a/sample_zoom_audio.m4a'
wav_mono = '../sample_audio/wav/mono_first_ten_Sample_HV_Clip.wav'
wav_stereo = '../sample_audio/wav/first_ten_Sample_HV_Clip.wav'

The metadata files below were generated with the write_metadata function, which takes an audio file path and optional keyword arguments and writes that file's metadata to disk.

It outputs a JSON file and a CSV file that capture the metadata of the specified file. The FFmpeg command used to create the audio file, and the equivalent pydub command, are appended to the end of the JSON.
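
As a rough illustration of that output layout, the sketch below writes a metadata dictionary to sibling JSON and CSV files and appends extra entries to the JSON only. It is not the toolkit's `write_metadata`; the name `write_metadata_sketch`, its parameters, and the idea that the CSV omits the appended entries are assumptions made for illustration.

```python
import csv
import json
from pathlib import Path


def write_metadata_sketch(audio_fp, metadata, append_json_dict=None):
    """Hypothetical stand-in for write_metadata: save metadata next to audio_fp."""
    audio_path = Path(audio_fp)
    json_path = audio_path.with_suffix(".json")
    csv_path = audio_path.with_suffix(".csv")

    # Assumption: the CSV holds only the extracted metadata, one column per key.
    with csv_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(metadata))
        writer.writeheader()
        writer.writerow(metadata)

    # The JSON holds the same metadata, with any appended entries at the end.
    json_data = {**metadata, **(append_json_dict or {})}
    with json_path.open("w") as f:
        json.dump(json_data, f, indent=2)

    return json_path, csv_path
```

The audio file itself is not read here; in practice the metadata dictionary would come from a probe of the file.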

In [7]:
from metadata import write_metadata

kwargs = {'append_json_dict':{'ffmpeg_command': f"ffmpeg -i '{wav_mono}' -compression_level 5 -af aformat=s16:44100 '{flac_mono}'"}}
write_metadata(flac_mono, **kwargs)
kwargs = {'append_json_dict': {'ffmpeg_command': f"ffmpeg -i '{wav_stereo}' -compression_level 5 -af aformat=s16:44100 '{flac_stereo}'"}}
write_metadata(flac_stereo, **kwargs)
kwargs = {'append_json_dict':{'ffmpeg_command': f"ffmpeg -i '{wav_stereo}' -ar 44100 -ac 2 '{mp3_stereo}'"}}
write_metadata(mp3_stereo, **kwargs)
kwargs = {'append_json_dict':{'ffmpeg_command': f"ffmpeg -i '{m4a_stereo}' -ac 1 '{m4a_mono}'"}}
write_metadata(m4a_mono, **kwargs)
kwargs = {'append_json_dict':{'ffmpeg_command': f"ffmpeg -i '{wav_stereo}' -ar 44100 -ac 1 '{wav_mono}'"}}
write_metadata(wav_mono, **kwargs)


../sample_audio/flac/mono_first_ten_Sample_HV_Clip.json
../sample_audio/flac/mono_first_ten_Sample_HV_Clip.csv
../sample_audio/flac/first_ten_Sample_HV_Clip.json
../sample_audio/flac/first_ten_Sample_HV_Clip.csv
../sample_audio/mp3/first_ten_Sample_HV_Clip.json
../sample_audio/mp3/first_ten_Sample_HV_Clip.csv
../sample_audio/m4a/mono_zoom_audio.json
../sample_audio/m4a/mono_zoom_audio.csv
../sample_audio/wav/mono_first_ten_Sample_HV_Clip.json
../sample_audio/wav/mono_first_ten_Sample_HV_Clip.csv


These are the original audio files:

In [8]:
write_metadata(wav_stereo)
write_metadata(m4a_stereo)
write_metadata(mp3_mono)

../sample_audio/wav/first_ten_Sample_HV_Clip.json
../sample_audio/wav/first_ten_Sample_HV_Clip.csv
../sample_audio/m4a/sample_zoom_audio.json
../sample_audio/m4a/sample_zoom_audio.csv
../sample_audio/mp3/common_voice_en_21635524.json
../sample_audio/mp3/common_voice_en_21635524.csv


## Example: Standardize Audio

In [9]:
from pydub_standardize import standardize

### Convert to 16 kHz WAV (pcm_s16le)

The standardize() function takes an audio file path (audio_fp) and various keyword arguments, standardizes the file with pydub, and outputs the standardized file.

The following code blocks standardize the example audio files to a sampling rate of 16,000 Hz, the WAV (Waveform Audio File) format, and 16-bit PCM encoding (pcm_s16le).
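
For orientation, a conversion like this can be expressed directly with pydub's `AudioSegment` API. The sketch below is an assumption about what such a wrapper might do, not the actual body of `standardize()`; the function name `convert_sketch` and its explicit output path argument are hypothetical. It needs pydub and FFmpeg installed to run.

```python
def convert_sketch(audio_fp, out_fp, sampling_rate=16000,
                   out_fmt="wav", out_encoding="pcm_s16le"):
    """Hypothetical helper: resample audio_fp and export it with pydub."""
    # Imported here so this module still loads where pydub is not installed.
    from pydub import AudioSegment

    audio = AudioSegment.from_file(audio_fp)     # decode the input file
    audio = audio.set_frame_rate(sampling_rate)  # resample, e.g. to 16 kHz
    # codec is forwarded to FFmpeg as the audio codec of the output file.
    audio.export(out_fp, format=out_fmt, codec=out_encoding)
    return out_fp
```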

#### Example with wav_mono

In [10]:
# Original file
wav_mono = '../sample_audio/wav/mono_first_ten_Sample_HV_Clip.wav'

# Standardization arguments
kwargs = {'sampling_rate': 16000, 'out_fmt': 'wav',
              'out_encoding': 'pcm_s16le', 'run_validate': True,
              'raise_error': True, 'write_meta': True}

# Function call
standardize(wav_mono, **kwargs)

../sample_audio/wav/pydub/ar_16000_c-a_pcm_s16le/mono_first_ten_Sample_HV_Clip.json
../sample_audio/wav/pydub/ar_16000_c-a_pcm_s16le/mono_first_ten_Sample_HV_Clip.csv
See ../sample_audio/wav/pydub/ar_16000_c-a_pcm_s16le/mono_first_ten_Sample_HV_Clip.wav for the standardized audio file.


The rest of the files are standardized in the same way:

#### flac_mono

In [11]:
flac_mono = '../sample_audio/flac/mono_first_ten_Sample_HV_Clip.flac'

kwargs = {'sampling_rate': 16000, 'out_fmt': 'wav',
              'out_encoding': 'pcm_s16le', 'run_validate': True,
              'raise_error': True, 'write_meta': True}

standardize(flac_mono, **kwargs)

../sample_audio/flac/pydub/ar_16000_c-a_pcm_s16le/mono_first_ten_Sample_HV_Clip.json
../sample_audio/flac/pydub/ar_16000_c-a_pcm_s16le/mono_first_ten_Sample_HV_Clip.csv
See ../sample_audio/flac/pydub/ar_16000_c-a_pcm_s16le/mono_first_ten_Sample_HV_Clip.wav for the standardized audio file.


#### wav_stereo

In [12]:
wav_stereo = '../sample_audio/wav/first_ten_Sample_HV_Clip.wav'

kwargs = {'sampling_rate': 16000, 'out_fmt': 'wav',
              'out_encoding': 'pcm_s16le', 'run_validate': True,
              'raise_error': True, 'write_meta': True}

standardize(wav_stereo, **kwargs)

../sample_audio/wav/pydub/ar_16000_c-a_pcm_s16le/first_ten_Sample_HV_Clip.json
../sample_audio/wav/pydub/ar_16000_c-a_pcm_s16le/first_ten_Sample_HV_Clip.csv
See ../sample_audio/wav/pydub/ar_16000_c-a_pcm_s16le/first_ten_Sample_HV_Clip.wav for the standardized audio file.


#### m4a_mono

In [13]:
m4a_mono = '../sample_audio/m4a/mono_zoom_audio.m4a'


kwargs = {'sampling_rate': 16000, 'out_fmt': 'wav',
              'out_encoding': 'pcm_s16le', 'run_validate': True,
              'raise_error': True, 'write_meta': True}

standardize(m4a_mono, **kwargs)

../sample_audio/m4a/pydub/ar_16000_c-a_pcm_s16le/mono_zoom_audio.json
../sample_audio/m4a/pydub/ar_16000_c-a_pcm_s16le/mono_zoom_audio.csv
See ../sample_audio/m4a/pydub/ar_16000_c-a_pcm_s16le/mono_zoom_audio.wav for the standardized audio file.


#### m4a_stereo

In [14]:
m4a_stereo = '../sample_audio/m4a/sample_zoom_audio.m4a'


kwargs = {'sampling_rate': 16000, 'out_fmt': 'wav',
              'out_encoding': 'pcm_s16le', 'run_validate': True,
              'raise_error': True, 'write_meta': True}

standardize(m4a_stereo, **kwargs)

../sample_audio/m4a/pydub/ar_16000_c-a_pcm_s16le/sample_zoom_audio.json
../sample_audio/m4a/pydub/ar_16000_c-a_pcm_s16le/sample_zoom_audio.csv
See ../sample_audio/m4a/pydub/ar_16000_c-a_pcm_s16le/sample_zoom_audio.wav for the standardized audio file.


#### mp3_mono

In [15]:
mp3_mono = '../sample_audio/mp3/common_voice_en_21635524.mp3'


kwargs = {'sampling_rate': 16000, 'out_fmt': 'wav',
              'out_encoding': 'pcm_s16le', 'run_validate': True,
              'raise_error': True, 'write_meta': True}

standardize(mp3_mono, **kwargs)

../sample_audio/mp3/pydub/ar_16000_c-a_pcm_s16le/common_voice_en_21635524.json
../sample_audio/mp3/pydub/ar_16000_c-a_pcm_s16le/common_voice_en_21635524.csv
See ../sample_audio/mp3/pydub/ar_16000_c-a_pcm_s16le/common_voice_en_21635524.wav for the standardized audio file.


#### mp3_stereo

In [16]:
mp3_stereo = '../sample_audio/mp3/first_ten_Sample_HV_Clip.mp3'


kwargs = {'sampling_rate': 16000, 'out_fmt': 'wav',
              'out_encoding': 'pcm_s16le', 'run_validate': True,
              'raise_error': True, 'write_meta': True}

standardize(mp3_stereo, **kwargs)

../sample_audio/mp3/pydub/ar_16000_c-a_pcm_s16le/first_ten_Sample_HV_Clip.json
../sample_audio/mp3/pydub/ar_16000_c-a_pcm_s16le/first_ten_Sample_HV_Clip.csv
See ../sample_audio/mp3/pydub/ar_16000_c-a_pcm_s16le/first_ten_Sample_HV_Clip.wav for the standardized audio file.
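
Because every call in this section uses the same keyword arguments, the repetition could be folded into a loop. The helper below is a hypothetical convenience, not part of the toolkit; it takes the standardizing function as a parameter so the sketch stays self-contained.

```python
def standardize_all(audio_fps, standardize_fn, **kwargs):
    """Hypothetical helper: apply standardize_fn to each path with shared kwargs."""
    results = {}
    for audio_fp in audio_fps:
        # Every file gets the same settings; whatever the call returns is kept per path.
        results[audio_fp] = standardize_fn(audio_fp, **kwargs)
    return results


# Usage in this notebook might look like:
# shared = {'sampling_rate': 16000, 'out_fmt': 'wav',
#           'out_encoding': 'pcm_s16le', 'run_validate': True,
#           'raise_error': True, 'write_meta': True}
# standardize_all([wav_mono, flac_mono, wav_stereo, m4a_mono,
#                  m4a_stereo, mp3_mono, mp3_stereo], standardize, **shared)
```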


### Convert to 16 kHz WAV (pcm_s16le), stereo to mono

Using the standardize() function, the following code blocks standardize the example audio files to a sampling rate of 16,000 Hz, the WAV format, 16-bit PCM encoding (pcm_s16le), and a single channel (mono).
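
Downmixing to one channel typically means averaging the left and right samples of each frame. The standard-library sketch below shows that idea for 16-bit PCM data; it illustrates the concept under that assumption and is not the code path `standardize()` uses. The name `stereo_to_mono_frames` is hypothetical.

```python
import array
import sys


def stereo_to_mono_frames(stereo_bytes):
    """Average interleaved 16-bit little-endian L/R samples into one channel."""
    samples = array.array("h")   # signed 16-bit integers
    samples.frombytes(stereo_bytes)
    if sys.byteorder == "big":
        samples.byteswap()       # WAV PCM data is little-endian
    left, right = samples[0::2], samples[1::2]
    # Floor-averaging two 16-bit values always fits in 16 bits.
    mono = array.array("h", ((l + r) // 2 for l, r in zip(left, right)))
    if sys.byteorder == "big":
        mono.byteswap()
    return mono.tobytes()
```

The bytes returned by `readframes()` from Python's `wave` module for a 16-bit stereo file can be passed to this function directly.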

#### Example with wav_stereo

In [17]:
# Original file
wav_stereo = '../sample_audio/wav/first_ten_Sample_HV_Clip.wav'

# Arguments
kwargs = {'sampling_rate': 16000, 'out_fmt': 'wav',
              'out_encoding': 'pcm_s16le', 'to_mono': True,
              'run_validate': True, 'raise_error': True,
              'write_meta': True}

# Function call
standardize(wav_stereo, **kwargs)

../sample_audio/wav/pydub/ar_16000_c-a_pcm_s16le_ac_1/first_ten_Sample_HV_Clip.json
../sample_audio/wav/pydub/ar_16000_c-a_pcm_s16le_ac_1/first_ten_Sample_HV_Clip.csv
See ../sample_audio/wav/pydub/ar_16000_c-a_pcm_s16le_ac_1/first_ten_Sample_HV_Clip.wav for the standardized audio file.


#### Example with flac_stereo

In [18]:
flac_stereo = '../sample_audio/flac/first_ten_Sample_HV_Clip.flac'


kwargs = {'sampling_rate': 16000, 'out_fmt': 'wav',
              'out_encoding': 'pcm_s16le', 'to_mono': True,
              'run_validate': True, 'raise_error': True,
              'write_meta': True}

standardize(flac_stereo, **kwargs)

../sample_audio/flac/pydub/ar_16000_c-a_pcm_s16le_ac_1/first_ten_Sample_HV_Clip.json
../sample_audio/flac/pydub/ar_16000_c-a_pcm_s16le_ac_1/first_ten_Sample_HV_Clip.csv
See ../sample_audio/flac/pydub/ar_16000_c-a_pcm_s16le_ac_1/first_ten_Sample_HV_Clip.wav for the standardized audio file.


### Convert to 16 kHz FLAC with compression_level=5

Using the standardize() function, the following code blocks standardize the example audio files to a sampling rate of 16,000 Hz, the FLAC file format and encoding, and a compression level of 5.
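
With pydub, an encoder option such as the FLAC compression level can be passed to FFmpeg through the `parameters` argument of `AudioSegment.export()`. The helper below is a minimal sketch of that idea; whether `standardize()` forwards the option this way is an assumption, and `flac_export_kwargs` is a hypothetical name.

```python
def flac_export_kwargs(compression_level=5):
    """Build keyword arguments for pydub's AudioSegment.export() targeting FLAC."""
    # FFmpeg's FLAC encoder accepts levels 0 (fastest) to 12 (smallest output).
    if not 0 <= compression_level <= 12:
        raise ValueError("compression_level must be between 0 and 12")
    return {
        "format": "flac",
        "codec": "flac",
        # Extra command-line options that pydub passes through to FFmpeg.
        "parameters": ["-compression_level", str(compression_level)],
    }


# Possible use with pydub (requires pydub and FFmpeg; paths are placeholders):
# from pydub import AudioSegment
# audio = AudioSegment.from_file("input.wav").set_frame_rate(16000)
# audio.export("output.flac", **flac_export_kwargs(5))
```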

#### Example with wav_stereo

In [19]:
# Original file
wav_stereo = '../sample_audio/wav/first_ten_Sample_HV_Clip.wav'

# Arguments
kwargs = {'sampling_rate': 16000, 'out_fmt': 'flac',
              'out_encoding': 'flac', 'compression_level': 5,
              'run_validate': True, 'raise_error': True,
              'write_meta': True}

# Function call
standardize(wav_stereo, **kwargs)

../sample_audio/wav/pydub/ar_16000_c-a_flac_compression_level_5/first_ten_Sample_HV_Clip.json
../sample_audio/wav/pydub/ar_16000_c-a_flac_compression_level_5/first_ten_Sample_HV_Clip.csv
See ../sample_audio/wav/pydub/ar_16000_c-a_flac_compression_level_5/first_ten_Sample_HV_Clip.flac for the standardized audio file.


#### Example with mp3_mono

In [20]:
mp3_mono = '../sample_audio/mp3/common_voice_en_21635524.mp3'


kwargs = {'sampling_rate': 16000, 'out_fmt': 'flac',
              'out_encoding': 'flac', 'compression_level': 5,
              'run_validate': True, 'raise_error': True,
              'write_meta': True }

standardize(mp3_mono, **kwargs)

../sample_audio/mp3/pydub/ar_16000_c-a_flac_compression_level_5/common_voice_en_21635524.json
../sample_audio/mp3/pydub/ar_16000_c-a_flac_compression_level_5/common_voice_en_21635524.csv
See ../sample_audio/mp3/pydub/ar_16000_c-a_flac_compression_level_5/common_voice_en_21635524.flac for the standardized audio file.
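
The paths printed in this notebook follow a visible pattern: a `pydub` directory next to the input file, a subdirectory named after the settings, then the original file stem with the new extension. The sketch below reproduces that pattern as inferred from the output shown here. The function name `expected_output_path` is hypothetical, and the order of the `ac_1` and `compression_level` parts when both apply is a guess.

```python
from pathlib import Path


def expected_output_path(audio_fp, sampling_rate, out_fmt, out_encoding,
                         to_mono=False, compression_level=None):
    """Guess the output path from the naming pattern seen in the printed output."""
    parts = [f"ar_{sampling_rate}", f"c-a_{out_encoding}"]
    if to_mono:
        parts.append("ac_1")
    if compression_level is not None:
        parts.append(f"compression_level_{compression_level}")
    src = Path(audio_fp)
    # <input dir>/pydub/<settings tag>/<stem>.<output format>
    return src.parent / "pydub" / "_".join(parts) / f"{src.stem}.{out_fmt}"
```

A helper like this could be handy for locating a standardized file without parsing the printed message.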


## File Validation

The validate_files() function compares the original sample output files with newly generated files using SHA-256 hashes. It outputs a CSV file containing the comparison details.

See the [README](README.md) for more details on the contents of the output CSV.
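
Comparing files by SHA-256 amounts to hashing each file's bytes and checking whether the digests match. The sketch below shows that idea with `hashlib`; it is not the toolkit's `validate_files()`, and the function names and CSV column names are assumptions.

```python
import csv
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_files_sketch(pairs, report_csv):
    """Write one CSV row per (expected, generated) pair with a match flag."""
    fieldnames = ["expected", "generated",
                  "expected_sha256", "generated_sha256", "match"]
    rows = []
    for expected_fp, generated_fp in pairs:
        expected_hash = sha256_of_file(expected_fp)
        generated_hash = sha256_of_file(generated_fp)
        rows.append({
            "expected": str(expected_fp),
            "generated": str(generated_fp),
            "expected_sha256": expected_hash,
            "generated_sha256": generated_hash,
            "match": expected_hash == generated_hash,
        })
    with open(report_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

A hash match means the two files are byte-identical, so a difference in embedded metadata, such as an encoder version string, would also show up as a mismatch.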

In [21]:
from validate import generate_comparison_files, validate_files

In [22]:
generate_comparison_files()
validate_files()

../sample_audio/wav/test_output/ar_16000_c-a_pcm_s16le/mono_first_ten_Sample_HV_Clip.json
../sample_audio/wav/test_output/ar_16000_c-a_pcm_s16le/mono_first_ten_Sample_HV_Clip.csv
See ../sample_audio/wav/test_output/ar_16000_c-a_pcm_s16le/mono_first_ten_Sample_HV_Clip.wav for the standardized audio file.
../sample_audio/wav/test_output/ar_16000_c-a_pcm_s16le/first_ten_Sample_HV_Clip.json
../sample_audio/wav/test_output/ar_16000_c-a_pcm_s16le/first_ten_Sample_HV_Clip.csv
See ../sample_audio/wav/test_output/ar_16000_c-a_pcm_s16le/first_ten_Sample_HV_Clip.wav for the standardized audio file.
../sample_audio/flac/test_output/ar_16000_c-a_pcm_s16le/mono_first_ten_Sample_HV_Clip.json
../sample_audio/flac/test_output/ar_16000_c-a_pcm_s16le/mono_first_ten_Sample_HV_Clip.csv
See ../sample_audio/flac/test_output/ar_16000_c-a_pcm_s16le/mono_first_ten_Sample_HV_Clip.wav for the standardized audio file.
../sample_audio/m4a/test_output/ar_16000_c-a_pcm_s16le/mono_zoom_audio.json
../sample_audio/m4a/t