# Booker Data Analysis of Patients with Increasing BDIs

Now that we have the Booker data, I will analyze the audio files. The recordings are quite noisy, so I preprocess them by manually removing the noise in Adobe Audition.

Here's a list of extracted features:

1. MFCC
2. Zero Crossings
3. Spectral Centroid
4. Spectrogram
5. Chroma FFT
6. Energy
7. RMS Energy
8. Spectral Rolloff
9. Phonation Rate
10. Speech Productivity
11. Speech Rate
12. Articulation Rate

This notebook analyzes patients whose 12-month BDI score is greater than their initial BDI score. My intent is to compare these patients with patients whose BDI decreased after 12 months.
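
The selection rule can be written down explicitly. The sketch below is only an illustration: it uses the initial and 12-month BDI scores of the three patients analyzed in this notebook, and the helper name `split_by_bdi_change` is my own, not part of any library.

```python
# (initial BDI at ED, BDI at 12 months), taken from the patient cells in this notebook.
bdi_scores = {
    "50063": (5, 45),
    "50086": (5, 8),
    "50087": (8, 9),
}


def split_by_bdi_change(scores):
    """Split patient IDs into those whose BDI increased and those whose BDI did not."""
    increased, not_increased = [], []
    for patient_id, (initial, month_12) in scores.items():
        if month_12 > initial:
            increased.append(patient_id)
        else:
            not_increased.append(patient_id)
    return increased, not_increased


increased, not_increased = split_by_bdi_change(bdi_scores)
print("Increased:", increased)
print("Not increased:", not_increased)
```

Patients with an unchanged score land in the second group here; that tie-breaking choice is an assumption and could just as well be a third group.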

In [1]:
import librosa 
import librosa.display
from scipy.io import wavfile as wav
import speech_recognition as sr
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import kurtosis
import sklearn
from pydub import AudioSegment 
from pydub.silence import split_on_silence 
import ffmpeg
from sklearn.svm import SVC



In [2]:
def compute_all_features(audio, path, sample_rate):
    
    # Compute zero crossings (one boolean per sample); the mean is the fraction
    # of samples at which the signal changes sign. Also report the variance.
    audio_zc = librosa.zero_crossings(audio, pad=False)
    print("\nZero Crossings Mean:"+str(np.mean(audio_zc)))
    print("Zero Crossings Variance:"+str(np.var(audio_zc, dtype = np.float32)))
        
        
    # Compute the spectral centroid per frame, then its mean and variance.
    # The audio is passed as y=...; recent librosa versions only accept it as a keyword.
    spectral_centroids = librosa.feature.spectral_centroid(y=audio, sr=sample_rate)[0]
    print("\nSpectral Centroid Mean:"+str(np.mean(spectral_centroids)))
    print("Spectral Centroid Variance:"+str(np.var(spectral_centroids, dtype = np.float32)))
    
    
    # Compute RMS energy per frame (512-sample frames, 256-sample hop), then its mean and variance.
    rmse = librosa.feature.rms(y=audio, frame_length=512, hop_length=256, center=True)
    rmse = rmse[0]
    print("\nRMS Energy Mean:"+str(np.mean(rmse)))
    print("RMS Energy Variance:"+str(np.var(rmse, dtype = np.float32)))
    
    
    # Compute spectral rolloff per frame, then its mean and variance.
    # audio+0.01 adds a small constant offset to every sample before the analysis.
    spectral_rolloff = librosa.feature.spectral_rolloff(y=audio+0.01, sr=sample_rate)[0]
    print("\nSpectral Rolloff Mean:"+str(np.mean(spectral_rolloff)))
    print("Spectral Rolloff Variance:"+str(np.var(spectral_rolloff, dtype = np.float32)))
    
    
    # Compute all prosodic features.
    prosodic_features = {
        "phonation_rate" : 0,
        "speech_productivity" : 0,
        "speech_rate" : 0,
        "articulation_rate" : 0
    }
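    # Duration from the samples already in memory; this avoids the file-name keyword,
    # which was renamed between librosa versions.
    audio_duration = librosa.get_duration(y=audio, sr=sample_rate)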
    audio_for_prosody, sample_rate = librosa.load(path, duration=audio_duration) 
    voiced_intervals = librosa.effects.split(y=audio_for_prosody, top_db=20)
    total_voiced_duration = 0
    for interval in voiced_intervals:
        total_voiced_duration = total_voiced_duration + ((interval[1]-interval[0])/sample_rate)
    if total_voiced_duration > audio_duration:
        total_voiced_duration = audio_duration
    total_silenced_duration = audio_duration-total_voiced_duration
    prosodic_features["phonation_rate"] = total_voiced_duration/audio_duration
    prosodic_features["speech_productivity"] = (total_silenced_duration)/total_voiced_duration
    # speech_rate and articulation_rate stay 0 here; they will be calculated from transcripts for better accuracy.
    print("\nProsodic Features: " + str(prosodic_features))
    
    
    # Compute 14 MFCCs and report the mean and variance of coefficients 1-13
    # (coefficient 0 is skipped by the loops below).
    audio_mfcc = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=14)
    print("\nMFCC Means:")
    for i in range(1, 14):
        print("MFCC Mean for Coefficient-"+ str(i) +": "+str(np.mean(audio_mfcc[i])))
    print("\nMFCC Variances:")
    for i in range(1, 14):
        print("MFCC Variance for Coefficient-"+ str(i) +": "+str(np.var(audio_mfcc[i], dtype = np.float32)))
    
    
    # Compute the 12-bin chromagram (one row per pitch class), then the mean and variance of each row.
    chromagram = librosa.feature.chroma_stft(y=audio, sr=sample_rate, hop_length=512)
    print("\nChromagram Means:")
    for i in range(0, 12):
        print("Chromagram Mean for Coefficient-"+ str(i) +": "+str(np.mean(chromagram[i])))
    print("\nChromagram Variances:")
    for i in range(0, 12):
        print("Chromagram Variance for Coefficient-"+ str(i) +": "+str(np.var(chromagram[i], dtype = np.float32)))
    
    
    
    
    
    '''
    # Visualize Spectrogram
    audio_spectrogram = librosa.stft(audio)
    audio_spectrogram_db = librosa.amplitude_to_db(abs(audio_spectrogram))
    plt.figure(figsize=(14, 5))
    librosa.display.specshow(audio_spectrogram_db, sr=sample_rate, x_axis='time', y_axis='hz')
    plt.colorbar()
    '''
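
The prosodic part of the function can also be isolated into a small pure function, which makes it easier to check by hand. This is a rough sketch, not the code used for the outputs in this notebook. It keeps the notebook's definitions of phonation rate and speech productivity, but the zero-voiced guard, the `word_count` argument (which would come from a transcript), and the word-based definitions of speech rate and articulation rate are my assumptions.

```python
def prosodic_features(voiced_intervals, sample_rate, audio_duration, word_count=None):
    """Compute prosodic ratios from voiced intervals given as (start, end) sample indices.

    word_count is a hypothetical input that would come from a transcript.
    """
    voiced = sum((end - start) / sample_rate for start, end in voiced_intervals)
    voiced = min(voiced, audio_duration)  # same clamp as in compute_all_features
    silent = audio_duration - voiced

    features = {
        "phonation_rate": voiced / audio_duration,
        # Same definition as in the notebook: silence time over voiced time.
        # Returning infinity for an all-silent recording is an assumption.
        "speech_productivity": silent / voiced if voiced > 0 else float("inf"),
        "speech_rate": None,
        "articulation_rate": None,
    }
    if word_count is not None:
        # Assumed definitions: words per second over the whole recording,
        # and words per second over voiced time only.
        features["speech_rate"] = word_count / audio_duration
        features["articulation_rate"] = word_count / voiced if voiced > 0 else None
    return features


# Two voiced stretches of one second each inside a four-second recording.
example = prosodic_features([(0, 22050), (44100, 66150)], sample_rate=22050,
                            audio_duration=4.0, word_count=6)
print(example)
```

The interval format matches what `librosa.effects.split` returns, so its output could be passed in directly.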

In [3]:
# Patient 50063
# This patient had:
# Starting BDI Score = 5 at ED
# 12 Month BDI Score = 45 after treatment
filename1 = 'booker_audio_files/50063/50063.wav'
audio, sample_rate = librosa.load(filename1) 
compute_all_features(audio, filename1, sample_rate)


Zero Crossings Mean:0.03178537901900962
Zero Crossings Variance:0.030775068

Spectral Centroid Mean:889.0570388347704
Spectral Centroid Variance:319750.12

RMS Energy Mean:0.079679936
RMS Energy Variance:0.003519452

Spectral Rolloff Mean:1539.4018234052533
Spectral Rolloff Variance:2094844.1

Prosodic Features: {'phonation_rate': 0.9847462676989224, 'speech_productivity': 0.0154900128098189, 'speech_rate': 0, 'articulation_rate': 0}

MFCC Means:
MFCC Mean for Coefficient-1: 161.23753
MFCC Mean for Coefficient-2: 28.973692
MFCC Mean for Coefficient-3: 32.278355
MFCC Mean for Coefficient-4: 7.9164352
MFCC Mean for Coefficient-5: 12.3973675
MFCC Mean for Coefficient-6: -6.5472717
MFCC Mean for Coefficient-7: -2.1759808
MFCC Mean for Coefficient-8: 4.1985564
MFCC Mean for Coefficient-9: 8.523919
MFCC Mean for Coefficient-10: -0.46817333
MFCC Mean for Coefficient-11: 9.002686
MFCC Mean for Coefficient-12: -4.390409
MFCC Mean for Coefficient-13: -2.3235826

MFCC Variances:
MFCC Variance fo

In [4]:
# Patient 50086
# This patient had:
# Starting BDI Score = 5 at ED
# 12 Month BDI Score = 8 after treatment
filename1 = 'booker_audio_files/50086/50086.wav'
audio, sample_rate = librosa.load(filename1) 
compute_all_features(audio, filename1, sample_rate)


Zero Crossings Mean:0.03862702777971176
Zero Crossings Variance:0.037134975

Spectral Centroid Mean:1084.1563901501725
Spectral Centroid Variance:148816.62

RMS Energy Mean:0.042531665
RMS Energy Variance:0.0003516314

Spectral Rolloff Mean:2156.7564619348404
Spectral Rolloff Variance:1141605.8

Prosodic Features: {'phonation_rate': 1.0, 'speech_productivity': 0.0, 'speech_rate': 0, 'articulation_rate': 0}

MFCC Means:
MFCC Mean for Coefficient-1: 152.23257
MFCC Mean for Coefficient-2: 2.5575066
MFCC Mean for Coefficient-3: 43.54949
MFCC Mean for Coefficient-4: -3.9327133
MFCC Mean for Coefficient-5: 7.1173816
MFCC Mean for Coefficient-6: -0.11063726
MFCC Mean for Coefficient-7: 9.230918
MFCC Mean for Coefficient-8: 11.646799
MFCC Mean for Coefficient-9: 6.668939
MFCC Mean for Coefficient-10: 0.5056702
MFCC Mean for Coefficient-11: 6.6670923
MFCC Mean for Coefficient-12: -5.131416
MFCC Mean for Coefficient-13: 0.74996686

MFCC Variances:
MFCC Variance for Coefficient-1: 971.9264
MFCC 

In [5]:
# Patient 50087
# This patient had:
# Starting BDI Score = 8 at ED
# 12 Month BDI Score = 9 after treatment
filename1 = 'booker_audio_files/50087/50087.wav'
audio, sample_rate = librosa.load(filename1) 
compute_all_features(audio, filename1, sample_rate)


Zero Crossings Mean:0.03476469706177146
Zero Crossings Variance:0.03355611

Spectral Centroid Mean:1088.4390991148196
Spectral Centroid Variance:123291.45

RMS Energy Mean:0.051659793
RMS Energy Variance:0.0009620652

Spectral Rolloff Mean:2190.3087985131046
Spectral Rolloff Variance:1263699.2

Prosodic Features: {'phonation_rate': 1.0, 'speech_productivity': 0.0, 'speech_rate': 0, 'articulation_rate': 0}

MFCC Means:
MFCC Mean for Coefficient-1: 148.93312
MFCC Mean for Coefficient-2: 17.735344
MFCC Mean for Coefficient-3: 40.27894
MFCC Mean for Coefficient-4: -22.60575
MFCC Mean for Coefficient-5: 8.038126
MFCC Mean for Coefficient-6: 7.494821
MFCC Mean for Coefficient-7: 17.367987
MFCC Mean for Coefficient-8: 16.495794
MFCC Mean for Coefficient-9: 16.732061
MFCC Mean for Coefficient-10: -0.36738697
MFCC Mean for Coefficient-11: 2.0305731
MFCC Mean for Coefficient-12: 2.9771938
MFCC Mean for Coefficient-13: 1.5465822

MFCC Variances:
MFCC Variance for Coefficient-1: 518.7714
MFCC Var
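
Printing each statistic makes it hard to line patients up against each other. One option, sketched here with the standard library only, is to collect each feature's mean and variance into a flat dict per recording so that rows can be compared or tabulated. The helper name `summarize` and the sample values are placeholders I made up for illustration.

```python
from statistics import fmean, pvariance


def summarize(name, values):
    """Return the mean and population variance of one feature track as a flat dict."""
    values = [float(v) for v in values]
    return {
        f"{name}_mean": fmean(values),
        # pvariance divides by N, like np.var with its default ddof=0.
        f"{name}_var": pvariance(values),
    }


# Placeholder frame values, not real measurements.
row = {"patient_id": "example"}
row.update(summarize("rms", [0.5, 1.5, 1.0, 1.0]))
print(row)
```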

## Observations

1. The spectral features don't show any conclusive trends.
2. Too many variables seem to introduce bias. For example, gender appears to affect the spectral centroid values, and in some recordings the speaker is closer to the mic, which pushes their RMS energy higher than the rest.
3. After looking at the outputs, I felt that only phonation rate and speech rate were helpful.
4. Comparing only the ED sessions of different patients LACKS context. We need to change our approach and try comparing ED and FU sessions.
5. There is also little point in comparing just the ED session and the last FU session. Take patient 50051, for example, whose BDI values were 4 -> 6 -> 7 -> 5 -> 2: the values first increase and then decrease. This shows that we can't expect a steady increase or decrease.
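
The point about patient 50051 can be checked mechanically. Below is a minimal sketch that classifies a BDI sequence by the sign of its session-to-session changes; the function name `trajectory_shape` and the four labels are my own choices.

```python
def trajectory_shape(scores):
    """Classify a sequence of BDI scores as increasing, decreasing, flat, or mixed."""
    diffs = [later - earlier for earlier, later in zip(scores, scores[1:])]
    if all(d == 0 for d in diffs):
        return "flat"
    if all(d >= 0 for d in diffs):
        return "increasing"
    if all(d <= 0 for d in diffs):
        return "decreasing"
    return "mixed"


bdi_50051 = [4, 6, 7, 5, 2]  # BDI values of patient 50051 quoted above
print(trajectory_shape(bdi_50051))   # mixed
print(bdi_50051[-1] - bdi_50051[0])  # -2, the ED-to-last-FU change on its own
```

Looking only at the first and last value gives -2 and hides the rise to 7 in between, which is why the whole sequence needs to be considered.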

## Takeaways

1. After listening to some FU sessions, I feel that we should focus on comparing a patient's speech in ED and FU sessions for BETTER CONTEXT. Comparing patient A's audio features with patient B's isn't giving us a clear picture. Also, patients with the same BDI score can have very different speaking styles: some were super-talkative, while others couldn't speak much, yet had the same BDI.
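
A within-patient comparison could start as simply as subtracting the ED features from the FU features of the same patient. The following is a sketch under that assumption; `feature_deltas` is a hypothetical helper, and the numbers are placeholders, not measurements from the Booker data.

```python
def feature_deltas(ed_features, fu_features):
    """Return FU minus ED for every feature present in both sessions."""
    shared = sorted(set(ed_features) & set(fu_features))
    return {name: fu_features[name] - ed_features[name] for name in shared}


# Placeholder numbers for illustration only; they are not measurements from the Booker data.
ed_session = {"phonation_rate": 0.75, "rms_mean": 0.5}
fu_session = {"phonation_rate": 0.5, "rms_mean": 0.25, "zc_mean": 0.03}

deltas = feature_deltas(ed_session, fu_session)
print(deltas)
```

Features that are missing from either session are dropped rather than treated as zero, so a partially processed recording does not produce misleading differences.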

## Questions

1. Should I try some sort of ML model, since I can't spot any trends by simply inspecting the features myself?
2. Should I compare a patient's ED and FU sessions to see fluctuations? I think this would provide a better baseline. Comparing the same patient's audio throughout their therapy could give us insights into how well the therapy is progressing, couldn't it?

## Readings about the Beck Depression Inventory
https://www.ismanet.org/doctoryourspirit/pdfs/Beck-Depression-Inventory-BDI.pdf