File Name: mini_project_02c.ipynb

Description: This notebook extracts a broad set of audio features from valve recordings, handles class imbalance with SMOTE, and trains an SVM classifier to distinguish normal from abnormal valve states. It evaluates the model on a held-out test set, then predicts the state of new audio files and reports both the actual and predicted labels. It also documents the iterative process that raised model accuracy from 88% to 99%.

Note: Dataset not included due to GitHub size constraints.

Record of Revisions (Date | Author | Change):  
10/29/2025 | Rhys DeLoach | Initial creation

In [143]:
# Import libraries
import os
from pathlib import Path

import librosa
import numpy as np
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

In [144]:
# Feature Extraction Function
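def featureExtract(path):
    """Return a 1-D vector of summary statistics (mean/std per feature) for one audio file."""
    y, sr = librosa.load(path)  # librosa defaults: mono, resampled to 22,050 Hz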

    # Features
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    mfcc_delta = librosa.feature.delta(mfcc)
    mfcc_delta2 = librosa.feature.delta(mfcc, order=2)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)
    spec_centroid = librosa.feature.spectral_centroid(y=y, sr=sr)
    spec_bandwidth = librosa.feature.spectral_bandwidth(y=y, sr=sr)
    spec_rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)
    spec_contrast = librosa.feature.spectral_contrast(y=y, sr=sr)
    zcr = librosa.feature.zero_crossing_rate(y)
    rms = librosa.feature.rms(y=y)
    tonnetz = librosa.feature.tonnetz(y=y, sr=sr)

    # Summarize each feature over time frames (mean and std)
    feats = np.hstack([
        np.mean(mfcc, axis=1),
        np.std(mfcc, axis=1),
        np.mean(mfcc_delta, axis=1),
        np.std(mfcc_delta, axis=1),
        np.mean(mfcc_delta2, axis=1),
        np.std(mfcc_delta2, axis=1),
        np.mean(chroma, axis=1),
        np.std(chroma, axis=1),
        np.mean(spec_centroid),
        np.std(spec_centroid),
        np.mean(spec_bandwidth),
        np.std(spec_bandwidth),
        np.mean(spec_rolloff),
        np.std(spec_rolloff),
        np.mean(spec_contrast, axis=1),
        np.std(spec_contrast, axis=1),
        np.mean(zcr),
        np.std(zcr),
        np.mean(rms),
        np.std(rms),
        np.mean(tonnetz, axis=1),
        np.std(tonnetz, axis=1),
    ])

    return feats
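
The function above turns a variable-length recording into a fixed-length vector by taking the mean and standard deviation of each feature over time. The sketch below illustrates that idea with NumPy only. The random matrices are stand-ins for the librosa outputs, and `summarize`, `row_counts`, and `fake_feature_vector` are hypothetical names introduced for illustration; the row counts are taken from the 13 MFCCs requested above and the column names defined later in the notebook.

```python
import numpy as np

def summarize(frames):
    """Collapse a (n_rows, n_frames) matrix into per-row mean and std."""
    frames = np.atleast_2d(frames)
    return np.hstack([frames.mean(axis=1), frames.std(axis=1)])

# Rows per feature (assumed to match the notebook's column names);
# the values themselves are random placeholders, not audio features.
row_counts = {
    "mfcc": 13, "mfcc_delta": 13, "mfcc_delta2": 13,
    "chroma": 12, "centroid": 1, "bandwidth": 1, "rolloff": 1,
    "contrast": 7, "zcr": 1, "rms": 1, "tonnetz": 6,
}

rng = np.random.default_rng(0)

def fake_feature_vector(n_frames):
    """Build a stand-in feature vector for a clip with n_frames frames."""
    return np.hstack([
        summarize(rng.normal(size=(rows, n_frames)))
        for rows in row_counts.values()
    ])

short_clip = fake_feature_vector(n_frames=40)
long_clip = fake_feature_vector(n_frames=400)
print(short_clip.shape, long_clip.shape)
```

Both vectors have the same length regardless of clip duration, which is what lets every recording share one row layout in the feature table.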

In [145]:
# Extract Features/Labels and Create Audio Dataframe
rootDir = 'data/valve'

# Define feature columns
feature_names = (
    [f'mfcc{i}' for i in range(1, 14)]
    + [f'mfcc_std{i}' for i in range(1, 14)]
    + [f'mfcc_delta{i}' for i in range(1, 14)]
    + [f'mfcc_delta_std{i}' for i in range(1, 14)]
    + [f'mfcc_delta2{i}' for i in range(1, 14)]
    + [f'mfcc_delta2_std{i}' for i in range(1, 14)]
    + [f'chroma{i}' for i in range(1, 13)]
    + [f'chroma_std{i}' for i in range(1, 13)]
    + ['centroid', 'centroid_std', 'bandwidth', 'bandwidth_std', 'rolloff', 'rolloff_std']
    + [f'contrast{i}' for i in range(1, 8)]
    + [f'contrast_std{i}' for i in range(1, 8)]
    + ['zeroCrossing', 'zeroCrossing_std', 'rms', 'rms_std']
    + [f'tonnetz{i}' for i in range(1, 7)]
    + [f'tonnetz_std{i}' for i in range(1, 7)]
)
features = pd.DataFrame(columns=feature_names)
labels = []

for dirPath, dirNames, fileNames in os.walk(rootDir): # Search root directory
    for fileName in fileNames:
        if os.path.splitext(fileName)[1] == '.wav': # Pull all .wav files
            path = Path(os.path.join(dirPath, fileName))
            
            state = path.parts[2]  # Label is the folder directly under rootDir
            labels.append(state)

            feats = featureExtract(path)

            features.loc[len(features)] = feats

# Normalize features
scaler = StandardScaler()
features_scaled = scaler.fit_transform(features)
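
Because the scaler above is fit on every recording before the train/test split, the test recordings contribute to the mean and variance used for scaling. A minimal sketch of a split-first variant follows. It uses synthetic data from `make_classification` as a stand-in for the feature table; the 90/10 class ratio, the `demo_` names, and the `stratify` argument are assumptions for illustration, not part of the original run.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature table (assumption: ~10% minority class).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=20, weights=[0.9, 0.1], random_state=40
)

# Split first; stratify keeps the class ratio similar in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=40
)

# Fit the scaler on the training split only, then reuse it on the test split.
demo_scaler = StandardScaler()
X_tr_scaled = demo_scaler.fit_transform(X_tr)
X_te_scaled = demo_scaler.transform(X_te)

print(X_tr_scaled.shape, X_te_scaled.shape)
```

With this ordering the test split is transformed with statistics it did not help compute, so the reported accuracy is less likely to be optimistic.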

In [150]:
# Train Model
# Split dataset
X_train, X_test, y_train, y_test = train_test_split(features_scaled, labels, test_size=0.2, random_state=40)

smote = SMOTE(random_state=40) # Resampling to account for class imbalance
X_res, y_res = smote.fit_resample(X_train, y_train)

svc = SVC(kernel='rbf', C=10, gamma='scale', class_weight='balanced', random_state=40) # Train support vector classifier model
svc.fit(X_res, y_res)

# Predict
y_pred = svc.predict(X_test)
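
For intuition about the resampling step, the sketch below shows the interpolation idea behind SMOTE: each synthetic sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. This is a simplified NumPy illustration, not imblearn's implementation; `interpolate_minority` and the random `X_minority` data are hypothetical, and the function assumes there are more minority samples than `k`.

```python
import numpy as np

def interpolate_minority(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority neighbours."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances between minority samples.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    base = rng.integers(0, len(X_min), size=n_new)              # starting points
    picked = neighbours[base, rng.integers(0, k, size=n_new)]   # one neighbour each
    gap = rng.random((n_new, 1))                                 # position on the segment
    return X_min[base] + gap * (X_min[picked] - X_min[base])

demo_rng = np.random.default_rng(1)
X_minority = demo_rng.normal(loc=3.0, size=(10, 4))  # hypothetical minority class
synthetic = interpolate_minority(X_minority, n_new=30)
print(synthetic.shape)
```

Because every synthetic row lies between two real minority samples, the new points stay inside the region the minority class already occupies rather than being exact duplicates.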

In [151]:
# Evaluate
print("Accuracy:", accuracy_score(y_test, y_pred))

Accuracy: 0.9892086330935251


In [162]:
# Validate
valDir = 'data/Valve_Data_for_Prediction'

for dirPath, dirNames, fileNames in os.walk(valDir): # Search validation directory
    for fileName in fileNames:
        if os.path.splitext(fileName)[1] == '.wav': # Pull all .wav files
            path = Path(os.path.join(dirPath, fileName))
            feats = pd.DataFrame([featureExtract(path)], columns=feature_names)
            scaledFeats = scaler.transform(feats)
            y_predVal = svc.predict(scaledFeats)
            print(f'For file {fileName}...')
            if fileName == 'Valve1_000NB.wav':
                print('Actual State: normal')
            else:
                print('Actual State: abnormal')

            print(f'Predicted State: {y_predVal[0]}\n')

For file Valve1_000NB.wav...
Actual State: normal
Predicted State: normal

For file Valve2_000AB.wav...
Actual State: abnormal
Predicted State: abnormal
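
The validation cell hard-codes which file is normal. If the file names follow a consistent pattern, the actual state could instead be read from the name. The sketch below assumes that the two letters before `.wav` encode the state (`NB` for normal, `AB` for abnormal); this is inferred from the two file names above and is not confirmed by the dataset. `SUFFIX_TO_STATE` and `state_from_name` are hypothetical names.

```python
from pathlib import Path

# Assumption: the two-letter suffix before ".wav" encodes the state.
SUFFIX_TO_STATE = {"NB": "normal", "AB": "abnormal"}

def state_from_name(file_name):
    """Return the state implied by the file name, or None if the suffix is unknown."""
    return SUFFIX_TO_STATE.get(Path(file_name).stem[-2:].upper())

for name in ["Valve1_000NB.wav", "Valve2_000AB.wav"]:
    print(name, "->", state_from_name(name))
```

Returning `None` for an unrecognized suffix makes a naming mismatch visible instead of silently labeling the file abnormal.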



Observation: The main challenge I encountered was class imbalance. My initial model reached 88% accuracy but predicted “normal” for both validation files, including the abnormal one. To address this, I tried several strategies: testing different models, adding more features, tuning hyperparameters, and resampling the training data with SMOTE. Together, these steps raised test accuracy to 99%, and the final model classifies both validation files correctly.
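
One way a model can score well on accuracy while still missing the abnormal class is by favoring the majority label. The sketch below illustrates this with a hypothetical test set of 88 normal and 12 abnormal recordings. The ratio is chosen only to mirror the 88% figure and is not the actual class balance of the valve dataset.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Hypothetical test set: 88 normal and 12 abnormal recordings.
y_true_demo = np.array(["normal"] * 88 + ["abnormal"] * 12)
# A degenerate model that labels every recording "normal".
y_pred_demo = np.array(["normal"] * 100)

acc = accuracy_score(y_true_demo, y_pred_demo)
abnormal_recall = recall_score(y_true_demo, y_pred_demo, pos_label="abnormal")
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=["normal", "abnormal"])

print(f"accuracy: {acc:.2f}")
print(f"abnormal recall: {abnormal_recall:.2f}")
print(cm)  # rows: actual normal/abnormal, columns: predicted normal/abnormal
```

Reporting per-class recall or the confusion matrix alongside accuracy makes this failure mode easy to spot, since the abnormal row shows every abnormal recording landing in the "normal" column.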