# Interactive Demonstration
## OpenWillis user tutorial

This notebook walks through an interactive demonstration of the basic functions in OpenWillis for processing audio and video files. It is intended to give users a sense of what it's like to work with OpenWillis in a Jupyter notebook environment.

__Note:__ Be sure that you have gone through the OpenWillis [installation steps](https://www.notion.so/brooklynhealth/Installing-OpenWillis-and-jupyter-notebook-14983a8fe047814b88ced7d3831791f2?pvs=12) prior to continuing. 

For the demo, users can install either `openwillis` or the separate subpackages `openwillis-voice`, `openwillis-transcribe`, `openwillis-speech`, and `openwillis-face`. 

Functions in `openwillis-gps` are not covered in this demo. 

First, we'll load the necessary libraries. Some warning messages may appear but these can be safely ignored if your environment is set up correctly. 

In [None]:
import whisperx
import pandas as pd
import os

Before getting into the analysis portion, we need to load some data we can work with. For this demonstration, we will use some sample audio and video files of a person reading from a list of [standardized sentences](https://www.cs.columbia.edu/~hgs/audio/harvard.html). 

We'll also look at two samples from mock clinical interviews as a use case for the speaker separation functions. 

This data can be loaded straight from GitHub onto your local machine and into this Jupyter notebook environment using `git clone`.

First, change the directory path (below) to reference a local folder on your computer where you plan to store the sample data.

In [None]:
os.chdir("/Users/michelleworthington/Documents/")

In [None]:
!git clone https://github.com/bklynhlth/sample_data.git

Next, we'll organize these files so they are easy to access in the code that follows.

In [None]:
audio_dir = 'sample_data/audio_files'
interview_dir = 'sample_data/audio_files/multiple_speakers'
video_dir = 'sample_data/video_files'
baseline_dir = 'sample_data/video_files/baseline_videos'

audio_files = [f for f in os.listdir(audio_dir) if f.endswith('.wav')]
interview_files = [f for f in os.listdir(interview_dir)]
video_files = [f for f in os.listdir(video_dir) if f.endswith('.mp4')]
bl_files = [f for f in os.listdir(baseline_dir)]

Let's check to make sure we have the correct files in each of `audio_files`, `interview_files`, `video_files`, and `bl_files`. We're only working with data from 5 video/audio recordings and 2 interview files, so we should expect 4 lists of files with names like 'sentences_1_audio.wav', 'sentences_1_video.mp4', and 'sentences_1_bl_video.mp4', numbered 1-5 (plus 'interview_1clip.wav' and 'interview_2clip.wav').

In [None]:
[audio_files, interview_files, video_files, bl_files]
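
If you prefer a programmatic check over reading the lists by eye, the sketch below compares a listing against the names we expect. The helper names (`expected_names`, `missing_files`) are hypothetical and not part of OpenWillis, and the `found` list is a made-up example with one file deliberately left out.

```python
def expected_names(template, ids=range(1, 6)):
    """Build expected file names such as 'sentences_1_audio.wav' for ids 1-5."""
    return [template.format(i) for i in ids]


def missing_files(found, expected):
    """Return the expected names that are absent from `found`, sorted."""
    return sorted(set(expected) - set(found))


# Hypothetical listing in which file 4 is missing
found = [
    'sentences_1_audio.wav',
    'sentences_2_audio.wav',
    'sentences_3_audio.wav',
    'sentences_5_audio.wav',
]
expected = expected_names('sentences_{}_audio.wav')

print(missing_files(found, expected))
```

In the notebook, you could pass `audio_files`, `video_files`, or `bl_files` as `found`; an empty result means nothing is missing.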

### __Workflow 1:__ Single speaker, vocal acoustics and speech characteristics

This example maps onto the workflow described in the user tutorials [here](https://www.notion.so/brooklynhealth/Analyzing-audio-with-a-single-speaker-14983a8fe04781389f08dadbf0667381) to examine vocal acoustics and speech characteristics for an audio file with a single speaker. 

### 1.1 - Vocal acoustics

For this function, you should have installed either `openwillis` or `openwillis-voice`. 

In [None]:
# Remove the # in front of the library you have installed and run this cell. 

# import openwillis as ow
# import openwillis.voice as owv

#### 1.1a - Processing a single file

Now, we can proceed with our processing. First, we'll just process a single audio file from the 'audio_files' folder above. 

In [None]:
framewise, summary = ow.vocal_acoustics(audio_path = 'sample_data/audio_files/sentences_1_audio.wav', voiced_segments = False, option = 'simple')
# change to owv.vocal_acoustics if working from openwillis-voice instead of openwillis

Now let's take a look at our summary data to make sure it populated correctly. We can first specify that we want to print all rows and columns. 

In [None]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

In [None]:
summary

If the columns populated, we should be in good shape! As a check, the 'f0_mean' value should be approximately 72. Notice that some of the final columns are 'NaN' - these features are specifically for the more advanced options and will not populate if 'simple' is specified in the 'option' parameter. 
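
Both checks can also be done in code. The sketch below uses a made-up one-row dataframe in place of the real `summary` output: only the 'f0_mean' column name comes from this demo, the other column names are invented, and the tolerance of 2 is an arbitrary assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `summary`; only 'f0_mean' is a real column name
summary_demo = pd.DataFrame({
    'f0_mean': [72.4],
    'made_up_simple_feature': [0.9],
    'made_up_advanced_feature': [np.nan],
})


def all_nan_columns(df):
    """Return the names of columns in which every value is NaN."""
    return df.columns[df.isna().all()].tolist()


def is_close(value, target, tol):
    """Return True if `value` is within `tol` of `target`."""
    return bool(abs(value - target) <= tol)


print(all_nan_columns(summary_demo))
print(is_close(summary_demo['f0_mean'].iloc[0], target=72, tol=2))
```

Running `all_nan_columns(summary)` on the real output lists the features that were not populated under the 'simple' option.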

#### 1.1b - Processing multiple files

Here, let's go ahead and run vocal acoustics on all 5 files in our folder using a for loop. 

In [None]:
folder_path = 'sample_data/audio_files'

framewise_data = pd.DataFrame()
summary_data = pd.DataFrame()

for filename in os.listdir(folder_path):
  if filename.endswith('.wav'):
    audio_path = os.path.join(folder_path, filename)

    # Run vocal acoustics function
    framewise, summary = ow.vocal_acoustics(audio_path = audio_path, voiced_segments = False, option = 'simple')
    # change to owv.vocal_acoustics if working from openwillis-voice instead of openwillis

    # Here, make sure we can identify each file by adding the name in the first column of the dataframe, remove '.wav' from the name
    filename_no_ext = os.path.splitext(filename)[0]

    # Add filename column as the first column using insert()
    framewise.insert(0, 'filename', filename_no_ext)
    summary.insert(0, 'filename', filename_no_ext)

    # Store results for each file in each dataframe
    framewise_data = pd.concat([framewise_data, framewise], ignore_index=True)
    summary_data = pd.concat([summary_data, summary], ignore_index=True)

Let's take a look at the first few rows of data to make sure it's looking good:

In [None]:
summary_data.head()

Below, we will save this output as a .csv so we can further analyze and run statistical tests on the output.

In [None]:
output_dir = 'sample_data' # can change to a different output path if desired
output_filename = 'summary_data.csv'
output_csv_path = os.path.join(output_dir, output_filename)

summary_data.to_csv(output_csv_path, index = False)
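
As a quick illustration of picking the saved file back up for analysis, the sketch below writes a small table to a temporary folder, reads it back, and prints descriptive statistics. The table is invented for the example (only the 'filename' and 'f0_mean' column names appear in this demo), and the temporary folder is used so that nothing in `sample_data` is overwritten.

```python
import os
import tempfile

import pandas as pd

# Hypothetical summary table; the values are invented for illustration
demo = pd.DataFrame({
    'filename': ['sentences_1_audio', 'sentences_2_audio'],
    'f0_mean': [72.0, 75.5],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, 'summary_data.csv')
    demo.to_csv(csv_path, index=False)   # same call pattern as the cell above
    reloaded = pd.read_csv(csv_path)     # read the file back into a dataframe

# Descriptive statistics for the numeric columns
print(reloaded.describe())
```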

### 1.2 - Speech characteristics

For this function, you should have installed either `openwillis`, or both `openwillis-transcribe` and `openwillis-speech`. 

In [None]:
# Remove the # in front of these libraries if you installed them individually and run this cell.
# Do not run this if you have already imported the full openwillis library.

# import openwillis.transcribe as owt
# import openwillis.speech as ows

#### 1.2a - Processing a single file

Now we will continue with the speech characteristics function. First, we need to transcribe our file. Here, we are using the 'vosk' transcription function:

In [None]:
transcript_json, transcript_text = ow.speech_transcription_vosk(filepath = 'sample_data/audio_files/sentences_1_audio.wav')
# change to owt.speech_transcription_vosk if working from openwillis-transcribe instead of openwillis

Then we will pass the JSON file from the transcription function directly to the speech characteristics function:

In [None]:
words, turns, summary_sc = ow.speech_characteristics(json_conf = transcript_json, option = 'simple')
# change to ows.speech_characteristics if working from openwillis-speech instead of openwillis

In [None]:
# Examine summary data 

summary_sc

As a check, 'file_length' should be 16.53 seconds. We'll see some NaNs here as well for some of the more advanced linguistic features that don't apply to this sample. 

#### 1.2b - Processing multiple files

The below code will run the speech characteristics function on multiple files: 

In [None]:
folder_path = 'sample_data/audio_files'

word_data = pd.DataFrame()
turns_data = pd.DataFrame()
summary_sc_data = pd.DataFrame()

for filename in os.listdir(folder_path):
  if filename.endswith('.wav'):
    audio_path = os.path.join(folder_path, filename)

    # Transcribe
    transcript_json, transcript_text = ow.speech_transcription_vosk(filepath = audio_path)
    # change to owt.speech_transcription_vosk if working from openwillis-transcribe instead of openwillis  
    words, turns, summary_sc = ow.speech_characteristics(json_conf = transcript_json, option = 'simple')
    # change to ows.speech_characteristics if working from openwillis-speech instead of openwillis

    # Here, make sure we can identify each file by adding the name in the first column of the dataframe, remove '.wav' from the name
    filename_no_ext = os.path.splitext(filename)[0]

    # Add filename column as the first column using insert()
    words.insert(0, 'filename', filename_no_ext)
    turns.insert(0, 'filename', filename_no_ext)
    summary_sc.insert(0, 'filename', filename_no_ext)

    # Store results for each file in each dataframe
    word_data = pd.concat([word_data, words], ignore_index=True)
    turns_data = pd.concat([turns_data, turns], ignore_index=True)
    summary_sc_data = pd.concat([summary_sc_data, summary_sc], ignore_index=True)

In [None]:
summary_sc_data.head()
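
The batch loops in this workflow grow each dataframe with `pd.concat` on every iteration. An alternative pattern, sketched below, collects the per-file results in a list and concatenates once at the end, which avoids re-copying the accumulated rows each time. The `fake_summary` function is a hypothetical stand-in for an OpenWillis call so that the example runs on its own.

```python
import os

import pandas as pd


def fake_summary(name):
    """Hypothetical stand-in for an OpenWillis call; returns a one-row dataframe."""
    return pd.DataFrame({'value': [len(name)]})


filenames = ['sentences_1_audio.wav', 'sentences_2_audio.wav', 'notes.txt']

rows = []
for filename in filenames:
    if not filename.endswith('.wav'):
        continue  # skip anything that is not a .wav file

    summary_demo = fake_summary(filename)

    # Label the rows with the file name, minus its extension
    summary_demo.insert(0, 'filename', os.path.splitext(filename)[0])
    rows.append(summary_demo)

# Concatenate once, after the loop
summary_all = pd.concat(rows, ignore_index=True)
print(summary_all)
```

For the 5 files in this demo the difference is negligible; it matters more as the number of files grows.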

### __Workflow 2:__ Multiple speakers, vocal acoustics and speech characteristics

For this workflow, you should have installed either `openwillis`, or `openwillis-transcribe` and `openwillis-speech` (plus `openwillis-voice` for the vocal acoustics step). The main difference compared to Workflow 1 is that we first need to separate the clinician and the participant so that we only analyze the participant's speech. 

In [None]:
# Remove the # in front of these libraries if you installed them individually and run this cell.
# Do not run this if you have already imported the full openwillis library.

# import openwillis.transcribe as owt
# import openwillis.speech as ows

### 2.1 - Separating speakers in the audio file

#### 2.1a - Processing a single file

First, we will transcribe the interview clip using the WhisperX model. The sample interviews are extracted from mock PANSS interviews, so we can specify this context in the code. This will help the model identify the 'participant' from the 'clinician' in the resulting separated audio files. 

__Note:__ The processing time of this function is approximately proportional to the file length. Here, we kept the clips short (under 5 minutes) to move things along, but this step will still take a few minutes. Keep this in mind when planning for your own research. 

In [None]:
transcript_json, transcript_text = ow.speech_transcription_whisper(filepath = 'sample_data/audio_files/multiple_speakers/interview_2clip.wav', 
                                                                   model = 'tiny', 
                                                                   hf_token = 'YOUR_HF_TOKEN', # replace with your own Hugging Face access token
                                                                   compute_type = 'float32',
                                                                   context = 'panss')
# change to owt.speech_transcription_whisper if working from openwillis-transcribe instead of openwillis
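
Access tokens are best kept out of notebooks that might be shared. One common approach, sketched below, is to read the token from an environment variable and pass the result as `hf_token`. The variable name `HF_TOKEN` and the helper `get_hf_token` are assumptions made for this example, not something OpenWillis requires.

```python
import os


def get_hf_token(env_var='HF_TOKEN'):
    """Read an access token from an environment variable, failing loudly if unset."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Hugging Face access token."
        )
    return token


# For demonstration only: set a dummy value so the example runs on its own.
# In practice, set the variable outside the notebook (e.g. in your shell profile).
os.environ['HF_TOKEN'] = 'dummy-token-for-demo'

hf_token = get_hf_token()
print(hf_token == 'dummy-token-for-demo')
```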

The next step is to separate the audio for each speaker based on the transcript above and save them as different files for downstream processing. 

In [None]:
output_dir = 'sample_data/audio_files/multiple_speakers/' # change to your local directory where the files are stored

speaker_dict = ow.speaker_separation_labels(filepath = 'sample_data/audio_files/multiple_speakers/interview_2clip.wav', transcript_json =  transcript_json)
# change to owt.speaker_separation_labels if working from openwillis-transcribe instead of openwillis

ow.to_audio(filepath = 'sample_data/audio_files/multiple_speakers/interview_2clip.wav', speaker_dict = speaker_dict, output_dir = output_dir)
# change to owt.to_audio if working from openwillis-transcribe instead of openwillis

Now, if you go to your local folder, you will see two new files: 'interview_2clip_clinician.wav' and 'interview_2clip_participant.wav'. 
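
To confirm this from the notebook rather than the file browser, the sketch below builds the two expected paths and checks whether each exists. The naming pattern (original file name plus '_clinician' or '_participant') is inferred from the file names above, and `expected_speaker_files` is a hypothetical helper, not an OpenWillis function.

```python
import os


def expected_speaker_files(filepath, output_dir, labels=('clinician', 'participant')):
    """Build the per-speaker output paths we expect for one input recording."""
    stem, ext = os.path.splitext(os.path.basename(filepath))
    return [os.path.join(output_dir, f"{stem}_{label}{ext}") for label in labels]


paths = expected_speaker_files(
    'sample_data/audio_files/multiple_speakers/interview_2clip.wav',
    'sample_data/audio_files/multiple_speakers/',
)

for path in paths:
    # Prints True only once the separated files have been written
    print(path, os.path.exists(path))
```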

From here, we can return to the steps from Workflow 1, focusing on _just_ the participant: we run speech characteristics on the participant's turns in the transcript (using the `speaker_label` parameter) and vocal acoustics on the participant-only recording. 

In [None]:
words, turns, summary_sc = ow.speech_characteristics(json_conf = transcript_json, option = 'simple', speaker_label = 'participant')
# change to ows.speech_characteristics if working from openwillis-speech instead of openwillis

In [None]:
framewise, summary = ow.vocal_acoustics(audio_path = 'sample_data/audio_files/multiple_speakers/interview_2clip_participant.wav', voiced_segments = False, option = 'simple')
# change to owv.vocal_acoustics if working from openwillis-voice instead of openwillis

In [None]:
summary_sc

In [None]:
summary

### __Workflow 3:__ Single speaker, video processing

For this workflow, you should have installed either `openwillis` or `openwillis-face`.

In [None]:
# Remove the # in front of this library if you installed it individually 
# Do not run this if you only installed the full openwillis library

# import openwillis.face as owf

### 3.1 - Facial expressivity

#### 3.1a - Processing a single file

From here, let's take a look at the video data. We'll start with just running facial expressivity on a single video file. 

The video used in this example is 18 seconds long, and the estimated runtime of `facial_expressivity` is approximately the same. For longer videos, you should expect runtimes approximately proportional to the video duration. 

In [None]:
framewise_loc, framewise_disp, summary_fe = ow.facial_expressivity(filepath = 'sample_data/video_files/sentences_1_video.mp4', baseline_filepath = 'sample_data/video_files/baseline_videos/sentences_1_bl_video.mp4')
# change to owf.facial_expressivity if working from openwillis-face instead of openwillis

We can look at the output in a couple of ways. We can look at the `framewise_disp` output to get a sense of displacement for each facial landmark at each frame. This dataframe contains quite a bit of data, so we can also look at the `summary_fe` output which will give us an overall displacement summary for each composite facial area.

In [None]:
framewise_disp.head()

In [None]:
summary_fe

Just for demonstration, if we don't include a baseline video, the displacement calculations will differ: 

In [None]:
framewise_loc_nobl, framewise_disp_nobl, summary_fe_nobl = ow.facial_expressivity(filepath = 'sample_data/video_files/sentences_1_video.mp4')
# change to owf.facial_expressivity if working from openwillis-face instead of openwillis

summary_fe_nobl

#### 3.1b - Processing multiple files

When running this function on multiple video files, make sure to match the video file to the baseline file using a subject identifier as demonstrated in the for loop below. 

In [None]:
folder_path = 'sample_data/video_files'
baseline_folder = 'sample_data/video_files/baseline_videos/'

frames_data = pd.DataFrame()
displacement_data = pd.DataFrame()
summary_fe_data = pd.DataFrame()

for filename in os.listdir(folder_path):
    if filename.endswith('.mp4'):
        video_path = os.path.join(folder_path, filename)
        
        # Extract identifier from filename (assuming a pattern like "sentences_1_video.mp4", which gives "sentences_1")
        identifier = "_".join(filename.split("_")[:2])  
        baseline_filename = f"{identifier}_bl_video.mp4"  # Construct baseline filename
        baseline_filepath = os.path.join(baseline_folder, baseline_filename)
        
        # Run facial expressivity - this sample uses the same video as a baseline because the samples are of the same person
        framewise_loc, framewise_disp, summary_fe = ow.facial_expressivity(filepath = video_path, baseline_filepath = baseline_filepath)
        # change to owf.facial_expressivity if working from openwillis-face instead of openwillis
    
        # Here, make sure we can identify each file by adding the name in the first column of the dataframe, remove '.mp4' from the name
        filename_no_ext = os.path.splitext(filename)[0]

        # Add filename column as the first column using insert()
        framewise_loc.insert(0, 'filename', filename_no_ext)
        framewise_disp.insert(0, 'filename', filename_no_ext)
        summary_fe.insert(0, 'filename', filename_no_ext)

        # Store results for each file in each dataframe
        frames_data = pd.concat([frames_data, framewise_loc], ignore_index=True)
        displacement_data = pd.concat([displacement_data, framewise_disp], ignore_index=True)
        summary_fe_data = pd.concat([summary_fe_data, summary_fe], ignore_index=True)

In [None]:
summary_fe_data.head()
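
Because the baseline file name is constructed from the video file name, a missing or misnamed baseline would only surface as an error partway through the loop. The sketch below shows one way to check the pairing up front. It reuses the same identifier logic as the loop above; the helper names and the example file lists are hypothetical.

```python
def baseline_name(video_filename):
    """Map 'sentences_1_video.mp4' to 'sentences_1_bl_video.mp4'."""
    identifier = "_".join(video_filename.split("_")[:2])
    return f"{identifier}_bl_video.mp4"


def pair_with_baselines(video_names, baseline_names):
    """Return (pairs, unmatched): videos mapped to baselines, and videos without one."""
    available = set(baseline_names)
    pairs, unmatched = {}, []
    for name in video_names:
        candidate = baseline_name(name)
        if candidate in available:
            pairs[name] = candidate
        else:
            unmatched.append(name)
    return pairs, unmatched


# Hypothetical listings in which the second video has no baseline
videos = ['sentences_1_video.mp4', 'sentences_2_video.mp4']
baselines = ['sentences_1_bl_video.mp4']

pairs, unmatched = pair_with_baselines(videos, baselines)
print(pairs)
print(unmatched)
```

In the notebook, `video_files` and `bl_files` from the setup cells could be passed in directly.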

### 3.2 - Emotional expressivity

#### 3.2a - Processing a single file

When running the `emotional_expressivity` function, be aware that it is considerably slower than `facial_expressivity`. For the 18-second video, processing time is approximately 50 seconds. For longer videos, plan for a processing time of about 2.5x the file length. 

In [None]:
framewise_ee, summary_ee = ow.emotional_expressivity(filepath = 'sample_data/video_files/sentences_1_video.mp4', baseline_filepath = 'sample_data/video_files/baseline_videos/sentences_1_bl_video.mp4')
# change to owf.emotional_expressivity if working from openwillis-face instead of openwillis

In [None]:
summary_ee
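
The runtime guidance in this workflow can be turned into a quick back-of-the-envelope estimate for your own videos. The sketch below is only a rough planning aid: the factors (about 1x for `facial_expressivity` and about 2.5x for `emotional_expressivity`) are taken from the text of this demo, the helper name is hypothetical, and actual runtimes will depend on your hardware.

```python
def estimated_runtime_s(duration_s, factor):
    """Rough runtime estimate: video duration multiplied by a per-function factor."""
    return duration_s * factor


# Factors taken from the guidance in this demo; treat them as approximations
FACTORS = {'facial_expressivity': 1.0, 'emotional_expressivity': 2.5}

for minutes in (1, 5, 30):
    duration_s = minutes * 60
    for name, factor in FACTORS.items():
        estimate_min = estimated_runtime_s(duration_s, factor) / 60
        print(f"{minutes:>2} min video, {name}: about {estimate_min:.1f} min")
```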

Again, just for demonstration, if we don't include a baseline video, the expressivity metrics will differ: 

In [None]:
framewise_ee_nobl, summary_ee_nobl = ow.emotional_expressivity(filepath = 'sample_data/video_files/sentences_1_video.mp4')
# change to owf.emotional_expressivity if working from openwillis-face instead of openwillis

summary_ee_nobl

#### 3.2b - Processing multiple files

In [None]:
folder_path = 'sample_data/video_files'
baseline_folder = 'sample_data/video_files/baseline_videos/'  

frames_ee_data = pd.DataFrame()
summary_ee_data = pd.DataFrame()

for filename in os.listdir(folder_path):
    if filename.endswith('.mp4'):
        video_path = os.path.join(folder_path, filename)
        
        # Extract identifier from filename (assuming a pattern like "sentences_1_video.mp4", which gives "sentences_1")
        identifier = "_".join(filename.split("_")[:2])  
        baseline_filename = f"{identifier}_bl_video.mp4"  # Construct baseline filename
        baseline_filepath = os.path.join(baseline_folder, baseline_filename)

        # Run emotional expressivity - this sample uses a clip from the same video as a baseline because the samples are of the same person
        framewise_ee, summary_ee = ow.emotional_expressivity(filepath = video_path, baseline_filepath = baseline_filepath)
        # change to owf.emotional_expressivity if working from openwillis-face instead of openwillis
    
        # Here, make sure we can identify each file by adding the name in the first column of the dataframe, remove '.mp4' from the name
        filename_no_ext = os.path.splitext(filename)[0]

        # Add filename column as the first column using insert()
        framewise_ee.insert(0, 'filename', filename_no_ext)
        summary_ee.insert(0, 'filename', filename_no_ext)

        # Store results for each file in each dataframe
        frames_ee_data = pd.concat([frames_ee_data, framewise_ee], ignore_index=True)
        summary_ee_data = pd.concat([summary_ee_data, summary_ee], ignore_index=True)

In [None]:
summary_ee_data.head()

### 3.3 - Eye blink rate

#### 3.3a - Processing a single file

In [None]:
ear, blinks, summary_eb = ow.eye_blink_rate(video = 'sample_data/video_files/sentences_1_video.mp4')
# change to owf.eye_blink_rate if working from openwillis-face instead of openwillis

In [None]:
summary_eb

The above output tells us that there were 8 blinks in the recording and that they occurred at a rate of 27 blinks per minute (in this case, because the recording is only 18 seconds long, the rate is extrapolated to a full minute).
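
The extrapolation is simple arithmetic: 8 blinks in 18 seconds scales to 8 / 18 × 60 ≈ 26.7, or about 27 blinks per minute. The sketch below reproduces that calculation; it illustrates the arithmetic only and is not necessarily how OpenWillis computes the value internally.

```python
def blinks_per_minute(n_blinks, duration_s):
    """Scale a blink count over `duration_s` seconds to a per-minute rate."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return n_blinks / duration_s * 60


rate = blinks_per_minute(8, 18)
print(round(rate, 2))  # 26.67
print(round(rate))     # 27
```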

#### 3.3b - Processing multiple files

In [None]:
video_folder = 'sample_data/video_files' 

ear_data = pd.DataFrame()
blinks_data = pd.DataFrame()
summary_eb_data = pd.DataFrame()

for filename in os.listdir(video_folder):
    if filename.endswith('.mp4'):
        video_path = os.path.join(video_folder, filename)

        ear, blinks, summary_eb = ow.eye_blink_rate(video=video_path)
        # change to owf.eye_blink_rate if working from openwillis-face instead of openwillis

        # Remove file extension from filename
        filename_no_ext = os.path.splitext(filename)[0]

        # Add filename column as the first column using insert()
        ear.insert(0, 'filename', filename_no_ext)
        blinks.insert(0, 'filename', filename_no_ext)
        summary_eb.insert(0, 'filename', filename_no_ext)

        # Store results for each file in each dataframe
        ear_data = pd.concat([ear_data, ear], ignore_index=True)
        blinks_data = pd.concat([blinks_data, blinks], ignore_index=True)
        summary_eb_data = pd.concat([summary_eb_data, summary_eb], ignore_index=True)

In [None]:
summary_eb_data
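
Since every summary table in this workflow carries a 'filename' column, the per-file summaries can be combined into a single table for downstream analysis. The sketch below shows the idea with two small made-up tables. It assumes one summary row per file, and the metric column names are invented for the example rather than taken from OpenWillis output.

```python
import pandas as pd

# Hypothetical per-file summaries; metric names and values are invented
fe_demo = pd.DataFrame({
    'filename': ['sentences_1_video', 'sentences_2_video'],
    'made_up_fe_metric': [0.12, 0.15],
})
eb_demo = pd.DataFrame({
    'filename': ['sentences_1_video', 'sentences_2_video'],
    'made_up_blink_metric': [27.0, 22.0],
})

# Join on 'filename'; validate raises an error if a file appears more than once
combined = fe_demo.merge(eb_demo, on='filename', how='outer', validate='one_to_one')
print(combined)
```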