From vs_release_16k.zip we extract two folders:  
- audio_16k  
- meta  

We use the zipfile module for the extraction and the concurrent.futures module to run it across threads, which speeds it up.  
https://superfastpython.com/multithreaded-unzip-files/#Unzip_Files_Concurrently_with_Processes

In [1]:
import os
from zipfile import ZipFile
from concurrent.futures import ThreadPoolExecutor

def unzip_file(handle, filename, path):
    # Extract a single archive member into the target directory
    handle.extract(path=path, member=filename)

output_dir = "data"
zip_filename = "vs_release_16k.zip"
dirs_to_extract = ["audio_16k", "meta"]

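# Extract only if the output directory does not exist yet.
# A missing archive raises FileNotFoundError from ZipFile itself.
if not os.path.exists(output_dir):
    with ZipFile(zip_filename, 'r') as zf:
        filenames = zf.namelist()

        # Keep only the members that live inside the wanted folders
        files_to_extract = [
            name for name in filenames
            if any(name.startswith(prefix) for prefix in dirs_to_extract)
        ]

        with ThreadPoolExecutor() as exe:
            futures = [
                exe.submit(unzip_file, zf, name, output_dir)
                for name in files_to_extract
            ]
            # Re-raise any exception that happened inside a worker thread
            for future in futures:
                future.result()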

print(os.listdir(output_dir))

['audio_16k', 'meta']
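
To check the prefix filter and the threaded extraction in isolation, here is a minimal sketch on a throwaway archive. The helper name `extract_selected` and the toy file names are hypothetical and only serve the illustration; the real archive is not touched.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from zipfile import ZipFile

def extract_selected(zip_path, output_dir, prefixes):
    """Extract only the members whose path starts with one of `prefixes`."""
    with ZipFile(zip_path, "r") as zf:
        members = [n for n in zf.namelist() if n.startswith(tuple(prefixes))]
        with ThreadPoolExecutor() as exe:
            futures = [exe.submit(zf.extract, n, output_dir) for n in members]
            # Calling result() re-raises errors from the worker threads
            for future in futures:
                future.result()
    return members

# Build a toy archive with one unwanted folder (hypothetical contents)
tmp_dir = tempfile.mkdtemp()
toy_zip = os.path.join(tmp_dir, "toy.zip")
with ZipFile(toy_zip, "w") as zf:
    zf.writestr("audio_16k/a.wav", b"fake audio")
    zf.writestr("meta/info.txt", b"fake metadata")
    zf.writestr("other/skip.txt", b"not extracted")

out_dir = os.path.join(tmp_dir, "out")
extracted = extract_selected(toy_zip, out_dir, ["audio_16k/", "meta/"])
print(sorted(os.listdir(out_dir)))  # ['audio_16k', 'meta']
```

Matching on `"audio_16k/"` with the trailing slash avoids picking up a sibling entry whose name merely begins with the same characters.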


Now we load the data into NumPy arrays so that we can work with it:

In [7]:
import os
import wave
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def read_wav_file(wav_file):
    # Read every frame of the file as raw bytes
    with wave.open(wav_file, 'rb') as wf:
        num_frames = wf.getnframes()
        frames = wf.readframes(num_frames)
    # Decode as 16-bit samples (assumes 16-bit PCM audio)
    wave_array = np.frombuffer(frames, dtype=np.int16)
    # Use the file name without its extension as the identifier
    wave_name = os.path.splitext(os.path.basename(wav_file))[0]
    return wave_name, wave_array

base_dir = "data"
audio_subdir = "audio_16k"  # renamed from `dir` to avoid shadowing the built-in

audio_dir = os.path.join(base_dir, audio_subdir)
audios = os.listdir(audio_dir)
audio_paths = [os.path.join(audio_dir, file) for file in audios]

with ThreadPoolExecutor() as exe:
    results = list(exe.map(read_wav_file, audio_paths))

results[0:5]

FileNotFoundError: [WinError 3] El sistema no puede encontrar la ruta especificada: 'data\\audio_16k'

Note: each thread reads a different file and no mutable state is shared between threads, so there are no race conditions.
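
Since each task only opens its own file and returns a fresh value, the same pattern can be reused for other per-file checks. The sketch below is an illustration on toy files, not part of the original pipeline: the helper `wav_params` and the `clip_*.wav` names are hypothetical. It reads the header of each WAV in a thread pool, which is one way to confirm that the audio really is 16-bit before decoding it with `dtype=np.int16`. It also shows that `exe.map` returns results in the order of the input paths.

```python
import os
import tempfile
import wave
from concurrent.futures import ThreadPoolExecutor

def wav_params(path):
    """Return (file name, channels, bytes per sample, sample rate, frame count)."""
    with wave.open(path, "rb") as wf:
        return (os.path.basename(path), wf.getnchannels(),
                wf.getsampwidth(), wf.getframerate(), wf.getnframes())

# Write four tiny mono 16-bit WAV files of silence (toy data)
tmp_dir = tempfile.mkdtemp()
toy_paths = []
for i in range(4):
    path = os.path.join(tmp_dir, f"clip_{i}.wav")
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)        # 2 bytes per sample = 16-bit
        wf.setframerate(16000)
        wf.writeframes(b"\x00\x00" * (i + 1))
    toy_paths.append(path)

# Each worker touches only its own file; nothing is written to shared state
with ThreadPoolExecutor() as exe:
    params = list(exe.map(wav_params, toy_paths))

for row in params:
    print(row)
```

A sample width other than 2 would mean that `np.int16` is the wrong dtype for that file.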

f0003_0_cough.wav  
f -> female  
(o -> old? unconfirmed; check against the transcription)  
0 -> index of that cough (an individual may have more than one cough recorded)  
cough -> one of the 6 labels
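
The naming scheme above can be split into fields with a small parser. This is a sketch under assumptions: the pattern is inferred from the single example `f0003_0_cough.wav`, the function `parse_audio_name` is hypothetical, and the digits after the leading letter are returned as an opaque `code` because their meaning is not stated here.

```python
import re

# Assumed pattern: <letter><digits>_<index>_<label>, e.g. "f0003_0_cough"
NAME_PATTERN = re.compile(
    r"^(?P<prefix>[A-Za-z])(?P<code>\d+)_(?P<index>\d+)_(?P<label>[A-Za-z]+)$"
)

def parse_audio_name(name):
    """Split a file name such as 'f0003_0_cough.wav' into its fields."""
    stem = name[:-4] if name.lower().endswith(".wav") else name
    match = NAME_PATTERN.match(stem)
    if match is None:
        raise ValueError(f"Unexpected file name: {name!r}")
    return {
        "prefix": match["prefix"],     # e.g. "f" for female
        "code": match["code"],         # digits after the prefix, kept as text
        "index": int(match["index"]),  # index of the recording for that individual
        "label": match["label"],       # one of the 6 labels
    }

info = parse_audio_name("f0003_0_cough.wav")
print(info)
# {'prefix': 'f', 'code': '0003', 'index': 0, 'label': 'cough'}
```

Raising on names that do not match makes any deviation from the assumed scheme visible early instead of producing silently wrong fields.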

In [None]:
import matplotlib.pyplot as plt
name, arr = results[0]
plt.plot(arr)
plt.title(name)
plt.show()