# Booker Data Analysis

Now that we have the Booker data, I will analyze the audio files. The recordings contain a lot of background noise, so I first preprocess them by removing the noise manually in Adobe Audition.

Here's a list of extracted features:

1. MFCC
2. Zero Crossings
3. Spectral Centroid
4. Spectrogram
5. Chroma FFT
6. Energy
7. RMS Energy
8. Spectral Rolloff
9. Phonation Rate
10. Speech Productivity
11. Speech Rate
12. Articulation Rate
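Apart from the prosodic ratios, the features above are computed frame by frame, and `compute_all_features` reduces each frame-level series to a mean and a variance. A minimal NumPy sketch of that pooling step, returning a dictionary instead of printing, might look like this. The helper name `pool_mean_var` and its key format are hypothetical, the MFCC matrix in the usage lines is synthetic, and the sketch uses float64 variances where the notebook uses float32.

```python
import numpy as np

def pool_mean_var(frames, prefix, skip_first=False):
    """Summarize a (n_coeffs, n_frames) feature matrix as per-row mean and variance.

    frames: one row per coefficient, one column per analysis frame.
    skip_first: drop row 0 (e.g. MFCC c0), mirroring loops that start at 1.
    """
    frames = np.atleast_2d(np.asarray(frames, dtype=np.float64))
    start = 1 if skip_first else 0
    out = {}
    for i in range(start, frames.shape[0]):
        out[f"{prefix}_{i}_mean"] = float(np.mean(frames[i]))
        out[f"{prefix}_{i}_var"] = float(np.var(frames[i]))  # population variance (ddof=0)
    return out

# Usage with a synthetic stand-in for a 14 x n_frames MFCC matrix.
rng = np.random.default_rng(0)
fake_mfcc = rng.normal(size=(14, 100))
features = pool_mean_var(fake_mfcc, "mfcc", skip_first=True)
print(len(features))  # 26 entries: mean and variance for coefficients 1-13
```

A dictionary per recording is easier to compare across patients or stack into a table than printed lines.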


In [1]:
import librosa 
import librosa.display
from scipy.io import wavfile as wav
import speech_recognition as sr
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import kurtosis
import sklearn
from pydub import AudioSegment 
from pydub.silence import split_on_silence 
import ffmpeg
from sklearn.svm import SVC



In [34]:
def compute_all_features(audio, path, sample_rate):
    # `audio` is expected to come from librosa.load(path), so `sample_rate` is
    # librosa's default of 22050 Hz. Recent librosa releases require the signal
    # to be passed as the keyword argument `y=`, so every call below uses it.

    # Compute Zero Crossings, then their mean and variance.
    audio_zc = librosa.zero_crossings(audio, pad=False)
    print("\nZero Crossings Mean:" + str(np.mean(audio_zc)))
    print("Zero Crossings Variance:" + str(np.var(audio_zc, dtype=np.float32)))

    # Compute Spectral Centroid, then its mean and variance.
    spectral_centroids = librosa.feature.spectral_centroid(y=audio, sr=sample_rate)[0]
    print("\nSpectral Centroid Mean:" + str(np.mean(spectral_centroids)))
    print("Spectral Centroid Variance:" + str(np.var(spectral_centroids, dtype=np.float32)))

    # Compute RMS Energy (512-sample frames, 256-sample hop), then its mean and variance.
    rmse = librosa.feature.rms(y=audio, frame_length=512, hop_length=256, center=True)[0]
    print("\nRMS Energy Mean:" + str(np.mean(rmse)))
    print("RMS Energy Variance:" + str(np.var(rmse, dtype=np.float32)))

    # Compute Spectral Rolloff, then its mean and variance.
    # A small constant offset (+0.01) is added to the signal, as in the original run.
    spectral_rolloff = librosa.feature.spectral_rolloff(y=audio + 0.01, sr=sample_rate)[0]
    print("\nSpectral Rolloff Mean:" + str(np.mean(spectral_rolloff)))
    print("Spectral Rolloff Variance:" + str(np.var(spectral_rolloff, dtype=np.float32)))

    # Compute prosodic features from the voiced (non-silent) intervals.
    prosodic_features = {
        "phonation_rate": 0,
        "speech_productivity": 0,
        "speech_rate": 0,
        "articulation_rate": 0
    }
    # Reuse the already-loaded signal instead of reading `path` a second time;
    # `path` stays in the signature so existing calls keep working.
    audio_duration = librosa.get_duration(y=audio, sr=sample_rate)
    voiced_intervals = librosa.effects.split(y=audio, top_db=20)
    total_voiced_duration = float(sum((end - start) / sample_rate for start, end in voiced_intervals))
    total_voiced_duration = min(total_voiced_duration, audio_duration)
    total_silenced_duration = audio_duration - total_voiced_duration
    prosodic_features["phonation_rate"] = total_voiced_duration / audio_duration
    # Guard against recordings in which no voiced interval is detected.
    if total_voiced_duration > 0:
        prosodic_features["speech_productivity"] = total_silenced_duration / total_voiced_duration
    # speech_rate and articulation_rate will be calculated from the transcripts, for better accuracy.
    print("\nProsodic Features: " + str(prosodic_features))

    # Compute 14 MFCCs and report coefficients 1-13; coefficient 0, which mostly
    # reflects overall frame energy, is skipped.
    audio_mfcc = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=14)
    print("\nMFCC Means:")
    for i in range(1, 14):
        print("MFCC Mean for Coefficient-" + str(i) + ": " + str(np.mean(audio_mfcc[i])))
    print("\nMFCC Variances:")
    for i in range(1, 14):
        print("MFCC Variance for Coefficient-" + str(i) + ": " + str(np.var(audio_mfcc[i], dtype=np.float32)))

    # Compute the 12-bin chromagram (one bin per pitch class), then per-bin means and variances.
    chromagram = librosa.feature.chroma_stft(y=audio, sr=sample_rate, hop_length=512)
    print("\nChromagram Means:")
    for i in range(12):
        print("Chromagram Mean for Coefficient-" + str(i) + ": " + str(np.mean(chromagram[i])))
    print("\nChromagram Variances:")
    for i in range(12):
        print("Chromagram Variance for Coefficient-" + str(i) + ": " + str(np.var(chromagram[i], dtype=np.float32)))

    # Optional (disabled): visualize the spectrogram in dB.
    # audio_spectrogram = librosa.stft(audio)
    # audio_spectrogram_db = librosa.amplitude_to_db(np.abs(audio_spectrogram))
    # plt.figure(figsize=(14, 5))
    # librosa.display.specshow(audio_spectrogram_db, sr=sample_rate, x_axis='time', y_axis='hz')
    # plt.colorbar()
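The `speech_rate` and `articulation_rate` fields stay at 0 in every output because they are deferred to the transcripts. Below is a minimal sketch of the whole prosodic block as a pure function. It assumes intervals in the `(start_sample, end_sample)` form that `librosa.effects.split` returns, plus a hypothetical transcript word count `n_words`. It uses the common definitions of speech rate (units per second of total time) and articulation rate (units per second of voiced time, pauses excluded); counting words rather than syllables is an assumption.

```python
import numpy as np

def prosodic_features(voiced_intervals, sample_rate, duration, n_words=None):
    """Prosodic ratios from (start_sample, end_sample) voiced intervals.

    n_words: hypothetical word count from a transcript; when None,
    speech_rate and articulation_rate stay at 0 as in the notebook.
    """
    intervals = np.asarray(voiced_intervals, dtype=np.int64).reshape(-1, 2)
    voiced = float(np.sum(intervals[:, 1] - intervals[:, 0])) / sample_rate
    voiced = min(voiced, duration)  # same clamp as the notebook
    silent = duration - voiced
    feats = {
        "phonation_rate": voiced / duration if duration > 0 else 0.0,
        # NaN when nothing voiced was found, instead of dividing by zero
        "speech_productivity": silent / voiced if voiced > 0 else float("nan"),
        "speech_rate": 0.0,
        "articulation_rate": 0.0,
    }
    if n_words is not None:
        feats["speech_rate"] = n_words / duration  # pauses included
        feats["articulation_rate"] = n_words / voiced if voiced > 0 else 0.0  # pauses excluded
    return feats

# Two one-second voiced intervals inside a four-second recording at 22050 Hz.
print(prosodic_features([[0, 22050], [44100, 66150]], 22050, 4.0, n_words=10))
```

Keeping the arithmetic separate from the audio loading makes it easy to check on hand-made intervals before running it on patient recordings.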

In [35]:
# Patient 50064
filename1 = 'booker_audio_files/50064/50064.wav'
audio, sample_rate = librosa.load(filename1) 
compute_all_features(audio, filename1, sample_rate)


Zero Crossings Mean:0.031361363630157374
Zero Crossings Variance:0.030377826

Spectral Centroid Mean:901.9457383994927
Spectral Centroid Variance:146120.34

RMS Energy Mean:0.100771286
RMS Energy Variance:0.004343562

Spectral Rolloff Mean:1747.7632726513364
Spectral Rolloff Variance:743616.06

Prosodic Features: {'phonation_rate': 0.995806239098304, 'speech_productivity': 0.004211422601141212, 'speech_rate': 0, 'articulation_rate': 0}

MFCC Means:
MFCC Mean for Coefficient-1: 165.15202
MFCC Mean for Coefficient-2: -6.6579204
MFCC Mean for Coefficient-3: 36.626087
MFCC Mean for Coefficient-4: 13.852268
MFCC Mean for Coefficient-5: 12.565309
MFCC Mean for Coefficient-6: 3.2800207
MFCC Mean for Coefficient-7: -1.4506346
MFCC Mean for Coefficient-8: 3.6713154
MFCC Mean for Coefficient-9: 6.163911
MFCC Mean for Coefficient-10: -2.0189536
MFCC Mean for Coefficient-11: 3.071324
MFCC Mean for Coefficient-12: 5.2797112
MFCC Mean for Coefficient-13: -2.8786998

MFCC Variances:
...
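Two of the printed statistics are fully determined by others. `librosa.zero_crossings` returns a 0/1 flag per sample, so its variance is p(1 - p) where p is the mean: for patient 50064, 0.03136 × (1 - 0.03136) ≈ 0.03038, matching the printed variance. Likewise, since phonation rate is voiced/total and speech productivity is silent/voiced, productivity equals (1 - phonation rate) / phonation rate. The sketch below checks both identities against the numbers printed above; the helper names are hypothetical.

```python
def bernoulli_var(p):
    # Variance (ddof=0) of a 0/1 indicator series whose mean is p.
    return p * (1.0 - p)

def productivity_from_phonation(phonation_rate):
    # silent/voiced rewritten through phonation_rate = voiced/total.
    return (1.0 - phonation_rate) / phonation_rate

# Values copied from the patient 50064 output above.
zc_mean = 0.031361363630157374
phonation_rate = 0.995806239098304

print(bernoulli_var(zc_mean))                       # compare with the printed ZC variance
print(productivity_from_phonation(phonation_rate))  # compare with the printed speech_productivity
```

If these features are later compared or fed to a model, the ZC variance and speech productivity columns add no information beyond ZC mean and phonation rate.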

In [37]:
# Patient 50063
filename1 = 'booker_audio_files/50063/50063.wav'
audio, sample_rate = librosa.load(filename1) 
compute_all_features(audio, filename1, sample_rate)


Zero Crossings Mean:0.03178537901900962
Zero Crossings Variance:0.030775068

Spectral Centroid Mean:889.0570388347704
Spectral Centroid Variance:319750.12

RMS Energy Mean:0.079679936
RMS Energy Variance:0.003519452

Spectral Rolloff Mean:1539.4018234052533
Spectral Rolloff Variance:2094844.1

Prosodic Features: {'phonation_rate': 0.9847462676989224, 'speech_productivity': 0.0154900128098189, 'speech_rate': 0, 'articulation_rate': 0}

MFCC Means:
MFCC Mean for Coefficient-1: 161.23753
MFCC Mean for Coefficient-2: 28.973692
MFCC Mean for Coefficient-3: 32.278355
MFCC Mean for Coefficient-4: 7.9164352
MFCC Mean for Coefficient-5: 12.3973675
MFCC Mean for Coefficient-6: -6.5472717
MFCC Mean for Coefficient-7: -2.1759808
MFCC Mean for Coefficient-8: 4.1985564
MFCC Mean for Coefficient-9: 8.523919
MFCC Mean for Coefficient-10: -0.46817333
MFCC Mean for Coefficient-11: 9.002686
MFCC Mean for Coefficient-12: -4.390409
MFCC Mean for Coefficient-13: -2.3235826

MFCC Variances:
...

In [38]:
# Patient 50059
filename1 = 'booker_audio_files/50059/50059.wav'
audio, sample_rate = librosa.load(filename1) 
compute_all_features(audio, filename1, sample_rate)


Zero Crossings Mean:0.0324321372545126
Zero Crossings Variance:0.031380307

Spectral Centroid Mean:926.156540875741
Spectral Centroid Variance:187086.78

RMS Energy Mean:0.08039084
RMS Energy Variance:0.0034302706

Spectral Rolloff Mean:1721.7142456520012
Spectral Rolloff Variance:1356889.0

Prosodic Features: {'phonation_rate': 0.872364990794665, 'speech_productivity': 0.14630918314256072, 'speech_rate': 0, 'articulation_rate': 0}

MFCC Means:
MFCC Mean for Coefficient-1: 161.37962
MFCC Mean for Coefficient-2: 5.547751
MFCC Mean for Coefficient-3: 34.576374
MFCC Mean for Coefficient-4: 7.791508
MFCC Mean for Coefficient-5: 8.032114
MFCC Mean for Coefficient-6: 1.1233388
MFCC Mean for Coefficient-7: 4.3674273
MFCC Mean for Coefficient-8: 3.8192782
MFCC Mean for Coefficient-9: -1.4477811
MFCC Mean for Coefficient-10: 0.333811
MFCC Mean for Coefficient-11: -1.0468962
MFCC Mean for Coefficient-12: -0.61345625
MFCC Mean for Coefficient-13: 3.5625873

MFCC Variances:
...

In [39]:
# Patient 50057
filename1 = 'booker_audio_files/50057/50057.wav'
audio, sample_rate = librosa.load(filename1) 
compute_all_features(audio, filename1, sample_rate)


Zero Crossings Mean:0.04743126503983004
Zero Crossings Variance:0.045181543

Spectral Centroid Mean:1063.8904747599042
Spectral Centroid Variance:403329.5

RMS Energy Mean:0.117535636
RMS Energy Variance:0.0048512933

Spectral Rolloff Mean:1811.2967266613923
Spectral Rolloff Variance:2038095.1

Prosodic Features: {'phonation_rate': 1.0, 'speech_productivity': 0.0, 'speech_rate': 0, 'articulation_rate': 0}

MFCC Means:
MFCC Mean for Coefficient-1: 175.40233
MFCC Mean for Coefficient-2: 13.735252
MFCC Mean for Coefficient-3: 13.81646
MFCC Mean for Coefficient-4: -18.284494
MFCC Mean for Coefficient-5: 19.938261
MFCC Mean for Coefficient-6: -4.943243
MFCC Mean for Coefficient-7: -8.8996
MFCC Mean for Coefficient-8: 5.963514
MFCC Mean for Coefficient-9: 12.946238
MFCC Mean for Coefficient-10: 1.5585879
MFCC Mean for Coefficient-11: 6.4151607
MFCC Mean for Coefficient-12: 1.195284
MFCC Mean for Coefficient-13: 0.41297102

MFCC Variances:
MFCC Variance for Coefficient-1: 1407.8855
...

In [41]:
# Patient 50051
# This patient had:
# BDI Score = 4 at ED
# BDI Score = 2 after treatment
filename1 = 'booker_audio_files/50051/50051.wav'
audio, sample_rate = librosa.load(filename1) 
compute_all_features(audio, filename1, sample_rate)


Zero Crossings Mean:0.06425597818187143
Zero Crossings Variance:0.060127147

Spectral Centroid Mean:1495.3839485235508
Spectral Centroid Variance:144252.83

RMS Energy Mean:0.046276104
RMS Energy Variance:0.0007027266

Spectral Rolloff Mean:3023.1095569292024
Spectral Rolloff Variance:906166.25

Prosodic Features: {'phonation_rate': 0.8182268796750707, 'speech_productivity': 0.2221549118468388, 'speech_rate': 0, 'articulation_rate': 0}

MFCC Means:
MFCC Mean for Coefficient-1: 127.85316
MFCC Mean for Coefficient-2: -14.550718
MFCC Mean for Coefficient-3: 39.20887
MFCC Mean for Coefficient-4: 6.597484
MFCC Mean for Coefficient-5: 17.933455
MFCC Mean for Coefficient-6: -3.440447
MFCC Mean for Coefficient-7: -4.4502873
MFCC Mean for Coefficient-8: -4.0730553
MFCC Mean for Coefficient-9: 9.211929
MFCC Mean for Coefficient-10: 4.4426985
MFCC Mean for Coefficient-11: 11.246981
MFCC Mean for Coefficient-12: 3.4335325
MFCC Mean for Coefficient-13: 4.5402703

MFCC Variances:
...
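Across the outputs above, the RMS Energy mean ranges from about 0.046 (patient 50051) to about 0.118 (patient 50057). A pure gain change, such as a speaker sitting closer to the microphone, scales RMS linearly but leaves the spectral centroid unchanged, so recording level feeds straight into the energy features. The sketch below demonstrates both properties on a synthetic two-tone frame. It is a simplified single-frame NumPy version (no windowing or framing) rather than librosa's implementation, and `frame_rms` and `spectral_centroid` are hypothetical helpers.

```python
import numpy as np

def frame_rms(frame):
    # Root-mean-square amplitude of one frame.
    return float(np.sqrt(np.mean(np.square(frame))))

def spectral_centroid(frame, sample_rate):
    # Magnitude-weighted mean frequency of one frame (no window applied).
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    return float(np.sum(freqs * mag) / np.sum(mag))

sr = 22050
t = np.arange(2048) / sr
frame = 0.1 * np.sin(2 * np.pi * 440 * t) + 0.05 * np.sin(2 * np.pi * 1500 * t)
louder = 2.0 * frame  # hypothetical: same signal recorded with twice the gain

print(frame_rms(louder) / frame_rms(frame))                           # RMS doubles
print(spectral_centroid(louder, sr) / spectral_centroid(frame, sr))   # centroid unchanged
```

One possible mitigation is to normalize each recording to a common RMS or peak level before extracting features; the trade-off is that genuine differences in vocal loudness are removed as well.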

## Observations

1. The numbers do not conclusively point toward any trait.
2. Too many confounding variables seem to bias the features. I suspect gender plays a role in the spectral centroid values. In some recordings the speaker is closer to the microphone, which pushes their RMS energy above the others'.
3. After looking at the outputs, I felt that only phonation rate and speech rate were helpful.
4. Comparing only the ED sessions of different patients lacks context. We need to change our approach and compare each patient's ED and FU sessions instead.
5. It is also not enough to compare just the ED session with the last FU session. Take patient 50051: their BDI scores were 4 -> 6 -> 7 -> 5 -> 2, first rising and then falling. This shows that we cannot expect a steady increase or decrease across sessions.
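Observations 4 and 5 suggest comparing sessions within a patient rather than across patients. The minimal NumPy sketch below uses only the BDI scores quoted for patient 50051 to show how session-to-session steps expose the rise and fall that an ED-versus-last-FU comparison hides. The helper names are hypothetical, and the same functions could be applied to any per-session feature, such as phonation rate.

```python
import numpy as np

def session_changes(scores):
    """Step-to-step changes across sessions and whether the series is monotonic."""
    steps = np.diff(np.asarray(scores, dtype=float))
    monotonic = bool(np.all(steps >= 0) or np.all(steps <= 0))
    return steps, monotonic

def deltas_from_baseline(values):
    """Change of each follow-up value relative to the first (ED) session."""
    v = np.asarray(values, dtype=float)
    return v[1:] - v[0]

bdi_50051 = [4, 6, 7, 5, 2]  # ED followed by the FU sessions, from observation 5
steps, monotonic = session_changes(bdi_50051)
print(steps, monotonic)                 # steps +2, +1, -2, -3; not monotonic
print(deltas_from_baseline(bdi_50051))  # +2, +3, +1, -2 relative to ED
```

Here the ED-to-last-FU change is -2, while the peak score of 7 sits 3 points above ED, which a two-point comparison would miss.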