# Biocomputing with Brainoware
### Procedure 3 - Brainoware software framework
#### 1. Information encoding
**Author: Huiyu Chu**  
**Date: June 6, 2025**  
**Description**: This part describes how to convert audio clips (or already processed audio features, like cepstral coefficients) to binary spatiotemporal pulses that will be delivered to stimulation electrodes in the next step.  

1. (Optional) Convert the raw speech or audio dataset to matrices of cepstral coefficients. Python audio libraries such as librosa and spafe provide straightforward interfaces for this. With librosa, for example, load each audio clip in the dataset, calculate its Mel-Frequency Cepstral Coefficients (MFCCs), and save the resulting coefficient matrices for future use:

In [3]:
import librosa
import numpy as np
import os

# Assume the audio dataset is in the "1_Speech_Commands_dataset" subfolder of the current working directory and that this subfolder contains only audio files. For display purposes, only two single-word speech files from the Speech Commands dataset are used to demonstrate how raw audio files can be converted to a cepstral feature matrix.
## Speech Commands Dataset credits: Warden, Pete. "Speech commands: A public dataset for single-word speech recognition." Dataset available from http://download.tensorflow.org/data/speech_commands_v0.01.tar.gz (2017)

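dataset_path = "./1_Speech_Commands_dataset/"
# Keep regular files only, so that the "processed_dataset" subfolder created
# when saving the features is skipped if this cell is run again; sort the
# names for a reproducible clip order
audio_files = [
    os.path.join(dataset_path, filename)
    for filename in sorted(os.listdir(dataset_path))
    if os.path.isfile(os.path.join(dataset_path, filename))
    ]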

def compute_mfcc(audio_path, n_mfcc=12):
    """
    Calculate the MFCCs of an audio file

    Args:
        audio_path (str): path to the audio file
        n_mfcc (int, optional): number of coefficients to return

    Returns:
        np.ndarray [shape=(n_mfcc, t)]: n_mfcc coefficients for each of the t time windows
    """
    # librosa.load resamples to 22050 Hz by default
    y, sr = librosa.load(audio_path)
    mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfccs

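# Compute the MFCC features of each audio file; the transpose gives one
# (num_time_windows, num_mfcc) matrix per clip. np.array stacks them into a
# single 3-D array only if all clips have the same number of time windows
mfcc_features = np.array(
    [compute_mfcc(audio_file).T for audio_file in audio_files]
    )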
print("Shape of MFCC features:", mfcc_features.shape)

# Save processed dataset in .npy format with the shape (num_audios, num_time_windows, num_mfcc)
save_path = os.path.join(dataset_path, "processed_dataset")
os.makedirs(save_path, exist_ok=True)
np.save(os.path.join(save_path, "mfcc_features.npy"), mfcc_features)

Shape of MFCC features: (2, 44, 12)
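
The stacking in the cell above works when every clip yields the same number of time windows. If the clips differ in duration, one option is to pad the shorter matrices with NaN rows before stacking, in the same spirit as the NaN padding used for the Japanese Vowels data in Step 2. The following is a minimal sketch under that assumption; `pad_and_stack` is a hypothetical helper, and the two toy matrices stand in for real transposed MFCC outputs:

```python
import numpy as np

def pad_and_stack(matrices, pad_value=np.nan):
    """Stack (num_time_windows, num_coeffs) matrices of unequal length.

    Hypothetical helper: shorter matrices are padded at the end with
    pad_value so that all of them share the longest time dimension.
    """
    max_len = max(m.shape[0] for m in matrices)
    num_coeffs = matrices[0].shape[1]
    stacked = np.full((len(matrices), max_len, num_coeffs), pad_value, dtype=float)
    for i, m in enumerate(matrices):
        stacked[i, : m.shape[0], :] = m
    return stacked

# Toy stand-ins for two transposed MFCC matrices of different durations
clip_a = np.ones((44, 12))
clip_b = np.ones((30, 12))
padded = pad_and_stack([clip_a, clip_b])
print("Shape after padding:", padded.shape)
```

The NaN rows mark padding explicitly, so later steps can ignore them with NaN-aware functions such as `np.nanmin` and `np.nanmax`.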


2. The previous step can be skipped for the Japanese Vowels Dataset used in the paper [Cai et al., 2023], because this dataset has already been converted to cepstral coefficients through Linear Predictive Coding (LPC), another common speech feature extraction method. However, it requires additional preprocessing. Download the Japanese Vowels Dataset from https://archive.ics.uci.edu/static/public/128/japanese+vowels.zip, unzip it to the ./1_Japanese_Vowels_dataset subfolder of the current working directory, and generate a feature matrix with the same (num_audios, num_time_windows, num_coefficients) layout as in Step 1:


In [6]:
# Imports are repeated here because the optional Step 1 may have been skipped
import os
import numpy as np

# Load the Japanese Vowels training dataset in ./1_Japanese_Vowels_dataset/ae.train
with open('./1_Japanese_Vowels_dataset/ae.train', 'r') as f:
    all_content = f.read()

# Speech clips (blocks of LPC cepstral coefficients) are separated by blank lines
speech_blocks = all_content.strip().split('\n\n')

# Convert all blocks into a NumPy array
max_windows = 29  # every block is padded to 29 time windows
num_coeffs = 12   # cepstral coefficients per time window
lpcc_features = []
for block in speech_blocks:
    # Parse each line of the block into a row of floats
    matrix = [[float(x) for x in line.split()] for line in block.strip().split('\n')]
    # Append NaN rows so that every block has 29 rows
    matrix += [[np.nan] * num_coeffs for _ in range(max_windows - len(matrix))]
    lpcc_features.append(matrix)
lpcc_features_np = np.array(lpcc_features)[:240]  # Keep the first 240 clips; shape: (240, 29, 12)

print("Shape of LPCC features:", lpcc_features_np.shape)
save_path = "./1_Japanese_Vowels_dataset/processed_dataset"
os.makedirs(save_path, exist_ok=True)
np.save(f"{save_path}/lpcc_features.npy", lpcc_features_np)

Shape of LPCC features: (240, 29, 12)
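
Because shorter clips are padded with NaN rows, it can be useful to recover how many real time windows each clip contains. The sketch below shows one way to do this on a small toy array that stands in for `lpcc_features_np`; the names `toy`, `is_padding`, and `valid_lengths` are illustrative and not part of the original notebook:

```python
import numpy as np

# Toy stand-in for lpcc_features_np: 2 clips, 4 time windows, 3 coefficients
toy = np.full((2, 4, 3), np.nan)
toy[0, :3] = 0.1   # clip 0 has 3 real time windows
toy[1, :2] = -0.2  # clip 1 has 2 real time windows

# A time window is padding if all of its coefficients are NaN
is_padding = np.isnan(toy).all(axis=2)
valid_lengths = (~is_padding).sum(axis=1)
print(valid_lengths)  # [3 2]
```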


3. The raw audio dataset has now been converted to a three-dimensional NumPy array of cepstral coefficients (e.g., the Japanese Vowels Dataset has a shape of (240, 29, 12)), where the first dimension is the number of audio clips, the second is the number of time windows in each audio clip, and the third is the number of cepstral coefficients in each time window. Next, min-max normalize each clip to the range [0, 1], ignoring the NaN padding when computing the minimum and maximum, and apply a binary threshold at 0.5 to convert the float coefficients to {0, 1} values:

In [8]:
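# Per-clip maximum and minimum, ignoring the NaN padding
lpcc_features_nanmax = np.nanmax(lpcc_features_np, axis=(1, 2), keepdims=True)
lpcc_features_nanmin = np.nanmin(lpcc_features_np, axis=(1, 2), keepdims=True)
# Min-max normalize each clip to [0, 1]
lpcc_features_norm = (lpcc_features_np - lpcc_features_nanmin) / (
    lpcc_features_nanmax - lpcc_features_nanmin)
# Replace the NaN padding with 0 so that padded time windows stay below the threshold
lpcc_features_norm = np.nan_to_num(lpcc_features_norm)
# Binarize: 1 where the normalized coefficient exceeds 0.5, otherwise 0
lpcc_features_binary = np.where(lpcc_features_norm > 0.5, 1, 0)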
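
To make the normalization and thresholding concrete, here is a small worked example on a toy array with a single clip, two time windows, and three coefficients, one of which is NaN padding. The array values are made up for illustration; the operations mirror the cell above:

```python
import numpy as np

# Toy clip: shape (1, 2, 3), with one NaN standing in for padding
toy = np.array([[[0.0, 2.0, 4.0],
                 [1.0, 3.0, np.nan]]])

lo = np.nanmin(toy, axis=(1, 2), keepdims=True)  # 0.0 for this clip
hi = np.nanmax(toy, axis=(1, 2), keepdims=True)  # 4.0 for this clip

# Normalized values: [[0.0, 0.5, 1.0], [0.25, 0.75, nan -> 0.0]]
norm = np.nan_to_num((toy - lo) / (hi - lo))

# The comparison is strict, so a value of exactly 0.5 maps to 0
binary = np.where(norm > 0.5, 1, 0)
print(binary[0].tolist())  # [[0, 0, 1], [0, 1, 0]]
```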

4. For each time window in each audio clip, collect the indices of the non-zero entries; these will serve as the indices of the stimulation electrodes during the stimulation stage:

In [9]:
import pickle

# Nested list indexed as [audio clip][time window] -> list of active electrode indices
selected_stimulation_electrode_indices = []
for i in range(lpcc_features_binary.shape[0]):
    selected_stimulation_electrode_indices_per_clip = []
    for j in range(lpcc_features_binary.shape[1]):
        # Indices of the coefficients equal to 1, stored as plain Python ints
        active_indices = np.nonzero(lpcc_features_binary[i][j])[0].tolist()
        selected_stimulation_electrode_indices_per_clip.append(active_indices)
    selected_stimulation_electrode_indices.append(selected_stimulation_electrode_indices_per_clip)

save_path = "./1_Japanese_Vowels_dataset/processed_dataset"
os.makedirs(save_path, exist_ok=True)
with open(f'{save_path}/selected_stimulation_electrode_indices.pkl', 'wb') as f:
    pickle.dump(selected_stimulation_electrode_indices, f)

[PAUSE POINT] This concludes the preprocessing of the Japanese Vowels Dataset. The saved nested list selected_stimulation_electrode_indices (indexed by audio clip, then by time step) determines which stimulation electrodes are activated at each time step of each audio clip. For example, when the index list at time step i is [0, 1, 4], stimulation electrodes #0, #1, and #4 are activated and receive a bipolar voltage stimulation at time step i, while the other 9 electrodes remain inactive.
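
As an illustration of how an index list maps back to a spatial pulse pattern, the sketch below expands one list into a 12-element on/off vector, assuming one electrode per cepstral coefficient as in the example above. The helper `indices_to_pulse_vector` and the constant `NUM_ELECTRODES` are hypothetical names introduced here, not part of the Brainoware framework:

```python
import numpy as np

NUM_ELECTRODES = 12  # assumption: one stimulation electrode per cepstral coefficient

def indices_to_pulse_vector(indices, num_electrodes=NUM_ELECTRODES):
    """Expand a list of active electrode indices into a 0/1 vector (hypothetical helper)."""
    pulses = np.zeros(num_electrodes, dtype=int)
    pulses[np.asarray(indices, dtype=int)] = 1
    return pulses

# Time step with electrodes #0, #1, and #4 active
active = indices_to_pulse_vector([0, 1, 4])
print(active.tolist())  # [1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]

# An empty list (e.g., a padded time window) activates no electrode
silent = indices_to_pulse_vector([])
print(int(silent.sum()))  # 0
```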