<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">


# DSI-SG-42 Capstone Project:

### FeelFlow AI: Decoding Emotions, Advancing Patient Support

---

### **Background**

In Singapore, addressing mental health issues among younger generations, particularly Gen Z and millennials, is urgent: increasing pressures from work, school, and personal relationships are leading to anxiety, depression, and substance abuse. Recognizing this, the Ministry of Health and AI Singapore (NUS) have initiated the "Mental Health with AI" Seminar to integrate AI technologies with clinical practice and enhance therapeutic processes.

The aim of this study is to develop a real-time emotion predictor app. The objective is to ease the task of assessing patients' emotional well-being, which is crucial in enabling more accurate diagnosis and treatment. The app is in its beta stage and is intended to be presented at the seminar. Further discussions on adoption and integration into pre-existing apps and software can be opened during this seminar.

### **Problem Statement**
##### *Discerning people's emotions can sometimes be an unnerving guessing game. How can clinicians use speech emotion recognition technology to accurately assess patients' emotional well-being, thereby improving diagnosis and treatment outcomes?*

### **Table of Contents**

### 1. [Scraping (YouTube audio)](#scraping-youtube-audio)
   #### 1.1 [Scrape the YouTube audio from video](#scrape-the-youtube-from-video)
   #### 1.2 [Extract audio clips into chunks](#extract-audio-clips-into-chunks)
   ##### 1.2.1 [Splitting Data (The Family Man)](#splitting-data-the-family-man)
   ##### 1.2.2 [Splitting Data (Once Suicidal, Now I Help Other Youths with Depression)](#splitting-data-once-suicidal-now-i-help-other-youths-with-depression)
   ##### 1.2.3 [Splitting Data (I Tried to Take My Life at 16)](#splitting-data-i-tried-to-take-my-life-at-16)
   ##### 1.2.4 [Splitting Data (Gen Z & Millennials' Verdict on Therapy)](#splitting-data-gen-z-millennials-verdict-on-therapy)
   ##### 1.2.5 [Splitting Data (People With Mental Health Conditions)](#splitting-data-people-with-mental-health-conditions)
   ##### 1.2.6 [Splitting Data (3 Strangers Talk About Depression And Mental Health)](#splitting-data-3-strangers-talk-about-depression-and-mental-health)
   ##### 1.2.7 [Splitting Data (College Students in Singapore Open Up About Anxiety)](#splitting-data-college-students-in-singapore-open-up-about-anxiety)
   ##### 1.2.8 [Splitting Data (Singaporeans Try - Therapy (Mental Health Special))](#splitting-data-singaporeans-try-therapy-mental-health-special)
   #### 1.3 [Compile all clips into 1 single folder](#compile-all-clips-into-1-single-folder)

## **1. Scraping (YouTube audio)**<a id='scraping-youtube-audio'></a>

##### *Note: It is advisable to run this notebook on Python 3.9.6.*

A total of 8 videos were considered for this study. We reviewed each video and kept only the segments where subjects share their mental health journeys and experiences. In sharing, some of them relive the pain and emotions, which we can then analyse further as part of the data.

1) [The Family Man](https://www.youtube.com/watch?v=M9jUD3t3WkA):
    This is a snippet of a longer docuseries by Channel NewsAsia, detailing a middle-aged man's journey in facing depression.

    When you seem to have it all - supportive wife, 2 kids, a career - but: "I struggle to come up with any moment when I can remember being happy". Depression nearly cost 38-year-old Mak Kean Loong everything. Now he's sharing his story to help others understand better, though he's been warned he'll be "marked".

2) [Once Suicidal, Now I Help Other Youths with Depression](https://www.youtube.com/watch?v=bmHbagXzius&t=244s):
    Mocked in secondary school for his voice and the way he looked, Ryan attempted suicide by drowning himself. When his bullies found out, they asked, "Why aren't you dead yet?"

    Years of bullying made him fear going to school, and when he sought help from family members and the school counsellor for depression, they deemed him "too sensitive".

    The turning point came when he started to attend polytechnic, and a lecturer made him the class chairperson. "She said that no matter what happened to me in the past, she still thinks I have the potential to do well...  That was when the sudden realisation came in... I have the power to rewrite my narrative."

3) [I Tried to Take My Life at 16](https://www.youtube.com/watch?v=5b3rkc8Fa_w): On this episode of Can Ask Meh?, Valerie talks about her road to recovery after she tried to take her life at 16. While it hasn't been a smooth journey, she has since found things that make life worth living for. 

4) [Gen Z & Millennials' Verdict on Therapy](https://www.youtube.com/watch?v=5c8F1AQgA2I&t=180s): The stark spotlight on mental health awareness has inevitably created a dichotomy: those who readily embrace professional help, and those who remain cynical about it. This video by RICE Media opens up the topic of therapy, talking to loved ones about it, and the mental health space in Singapore. Some respondents in this video sound seemingly neutral, which adds variety to the data.

5) [People With Mental Health Conditions](https://www.youtube.com/watch?v=Eelae4tilBE&t=9s): “Does it run in the genes?”, “Are you dangerous and unpredictable?”. On this episode of Can Ask Meh, Rachel, Wei Chong, Sumaiyah and Desmond answer the tough questions you have for people with mental health conditions in a candid fashion.

6) [3 Strangers Talk About Depression And Mental Health](https://www.youtube.com/watch?v=SbSHA_nzDrQ): An episode by The Smart Local in which 3 strangers share their experiences of depression and mental health. Note that one of the respondents in the video also appears in another video in this dataset. This is useful, as it provides a "control" if we want to analyse variation in his speech emotion.

7) [College Students in Singapore Open Up About Anxiety](https://www.youtube.com/watch?v=hUaBbR5EwX4): A local Singaporean YouTuber, then a student, shares his story about anxiety, along with others in his circle who fought similar battles with academic stress. They also share what helped them cope during that period, and the importance of mental health.

8) [Singaporeans Try - Therapy (Mental Health Special)](https://www.youtube.com/watch?v=PTFizkVqfk4&t=1109s): In this special episode of Singaporeans Try, three TSL members tell their stories on camera for the first time to a professional therapist and discuss personal issues such as self-esteem, depression, anxiety, relationships and even suicide.

### Importing Libraries

In [2]:
# pip install -r ../requirements_3.9.txt

Usage: __main__.py [options]

ERROR: Invalid requirement: --no-cache-dir yt-dlp
__main__.py: error: no such option: --no-cache-dir

You should consider upgrading via the '/Users/amoz/.pyenv/versions/3.9.6/bin/python -m pip install --upgrade pip' command.
Note: you may need to restart the kernel to use updated packages.


In [1]:
import os
import yt_dlp
import sys
import ffmpeg
import noisereduce as nr
import numpy as np

from moviepy.editor import AudioFileClip
from spleeter.separator import Separator
from pydub import AudioSegment



### **1.1 Scrape the YouTube audio from video**<a id='scrape-the-youtube-from-video'></a>

In [None]:
from moviepy.audio.AudioClip import AudioArrayClip

# Folder that receives the downloaded (pre-split) audio
DOWNLOAD_DIR = '/Users/amoz/GA/DSI-SG-42-Capstone-FeelFlowAI/dataset/YouTube (pre-split)'
SAMPLE_RATE = 22050

def robust_download(url, file_name_prefix, max_retries=3):
    """Download the best audio stream as WAV, retrying on failure."""
    ydl_opts = {
        'format': 'bestaudio/best',
        'postprocessors': [{
            'key': 'FFmpegExtractAudio',
            'preferredcodec': 'wav',
            'preferredquality': '192',
        }],
        'outtmpl': os.path.join(DOWNLOAD_DIR, f'{file_name_prefix}.%(ext)s'),
    }

    for attempt in range(max_retries):
        try:
            with yt_dlp.YoutubeDL(ydl_opts) as ydl:
                ydl.download([url])
            return os.path.join(DOWNLOAD_DIR, f'{file_name_prefix}.wav')
        except Exception as e:
            print(f"Attempt {attempt + 1} of {max_retries} failed due to {e}")
    raise RuntimeError("Failed to download video after multiple attempts")

def download_audio(url, file_name_prefix, start_time=None, end_time=None):
    """Download one video, optionally trim it, reduce noise and save it as WAV."""
    target_path = robust_download(url, file_name_prefix)

    try:
        audio_clip = AudioFileClip(target_path)
        if start_time is not None or end_time is not None:
            audio_clip = audio_clip.subclip(start_time or 0, end_time)

        # to_soundarray returns an array of shape (n_samples, n_channels)
        audio_np = audio_clip.to_soundarray(nbytes=2, fps=SAMPLE_RATE)
        # noisereduce expects (n_channels, n_samples), so transpose first
        reduced_noise_audio = nr.reduce_noise(y=audio_np.T, sr=SAMPLE_RATE)

        # Ensure reduced noise audio is two-dimensional before transposing back
        if reduced_noise_audio.ndim == 1:
            reduced_noise_audio = reduced_noise_audio[np.newaxis, :]

        # Wrap the array in a clip (AudioFileClip only accepts file paths)
        final_audio = AudioArrayClip(reduced_noise_audio.T, fps=SAMPLE_RATE)
        # Save under a segment-specific name: the raw download is deleted below,
        # and two segments of the same video must not overwrite each other
        suffix = f"_{start_time or 0}-{end_time if end_time is not None else 'end'}"
        final_path = os.path.join(DOWNLOAD_DIR, f"{file_name_prefix}{suffix}.wav")
        final_audio.write_audiofile(final_path, codec='pcm_s16le')
        final_audio.close()
        audio_clip.close()
    except Exception as e:
        print(f"Failed processing audio from {url}. Error: {e}")
    finally:
        if os.path.exists(target_path):
            os.remove(target_path)  # Clean up the raw downloaded file

if __name__ == "__main__":
    videos = {
        "https://www.youtube.com/watch?v=SbSHA_nzDrQ": [(30, 11*60+22)],
        "https://www.youtube.com/watch?v=5b3rkc8Fa_w": [(None, None)],
        "https://www.youtube.com/watch?v=PTFizkVqfk4&t=1109s": [(3*60+47, None)],
        "https://www.youtube.com/watch?v=5c8F1AQgA2I&t=180s": [(None, None)],
        "https://www.youtube.com/watch?v=Eelae4tilBE&t=9s": [(None, None)],
        "https://www.youtube.com/watch?v=bmHbagXzius&t=244s": [(None, None)],
        "https://www.youtube.com/watch?v=hUaBbR5EwX4": [(2*60+30, 21*60+3)],
        "https://www.youtube.com/watch?v=M9jUD3t3WkA": [(0, 2*60+45), (3*60+13, 5*60+51)]
    }
    for url, times in videos.items():
        # Keep only the video ID: drop trailing parameters such as "&t=1109s"
        file_name_prefix = url.split("watch?v=")[-1].split("&")[0]
        for start, end in times:
            download_audio(url, file_name_prefix, start_time=start, end_time=end)

[youtube] Extracting URL: https://www.youtube.com/watch?v=SbSHA_nzDrQ
[youtube] SbSHA_nzDrQ: Downloading webpage
[youtube] SbSHA_nzDrQ: Downloading ios player API JSON
[youtube] SbSHA_nzDrQ: Downloading android player API JSON




[youtube] SbSHA_nzDrQ: Downloading m3u8 information
[info] SbSHA_nzDrQ: Downloading 1 format(s): 251
[download] Destination: /Users/amoz/GA/DSI-SG-42-Capstone-FeelFlowAI/dataset/YouTube (pre-split)/SbSHA_nzDrQ.webm
[download] 100% of   10.93MiB in 00:00:00 at 16.67MiB/s    
[ExtractAudio] Destination: /Users/amoz/GA/DSI-SG-42-Capstone-FeelFlowAI/dataset/YouTube (pre-split)/SbSHA_nzDrQ.wav
Deleting original file /Users/amoz/GA/DSI-SG-42-Capstone-FeelFlowAI/dataset/YouTube (pre-split)/SbSHA_nzDrQ.webm (pass -k to keep)
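
The file name prefix in the cell above is derived from the video URL, and the segment bounds are written as arithmetic such as `11*60+22`. As a minimal sketch, assuming standard `watch?v=` URLs, the standard library can handle both jobs. `extract_video_id` and `to_seconds` are hypothetical helper names for illustration and are not part of the original notebook.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url):
    """Return the value of the `v` query parameter, ignoring extras such as `t`."""
    query = parse_qs(urlparse(url).query)
    if "v" not in query:
        raise ValueError(f"No video ID found in {url!r}")
    return query["v"][0]


def to_seconds(timestamp):
    """Convert 'mm:ss' or 'hh:mm:ss' into a number of seconds."""
    seconds = 0
    for part in timestamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


print(extract_video_id("https://www.youtube.com/watch?v=PTFizkVqfk4&t=1109s"))
print(to_seconds("11:22"))  # same value as 11*60+22
```

Parsing the query string rather than splitting on `watch?v=` keeps parameters like `&t=1109s` out of the file name.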



### **1.2 Extract audio clips into chunks**<a id='extract-audio-clips-into-chunks'></a>

The audio splitter (`spleeter`) could not process the full-length audio downloaded earlier, so each video's audio was first divided into chunks. An example of a chunk, used in the cell below, is `The Family Man pt1.2.wav`. Each extracted file can be at most 345KB, hence each chunk may yield multiple audio files.

#### **1.2.1 Splitting Data (The Family Man)**<a id='splitting-data-the-family-man'></a>

In [None]:
def separate_audio(file_path, output_path):
    separator = Separator('spleeter:2stems')
    separator.separate_to_file(file_path, output_path)

if __name__ == "__main__":
    files = [
        "The Family Man pt1.2.wav",
        "The Family Man pt1.3.wav",
        "The Family Man pt1.4.wav",
        "The Family Man pt2.wav",
        "The Family Man pt3.wav"
    ]
    input_folder = "../dataset/YouTube (pre-split)"
    output_folder = "../dataset/YouTube"

    for file in files:
        file_path = os.path.join(input_folder, file)
        separate_audio(file_path, output_folder)

#### **1.2.2 Splitting Data (Once Suicidal, Now I Help Other Youths with Depression)**<a id='splitting-data-once-suicidal-now-i-help-other-youths-with-depression'></a>

In [None]:
def separate_audio(file_path, output_path):
    separator = Separator('spleeter:2stems')
    separator.separate_to_file(file_path, output_path)

if __name__ == "__main__":
    files = [
        "Once Suicidal, Now I Help Other Youths With Depression pt1.wav",
        "Once Suicidal, Now I Help Other Youths With Depression pt2.wav"
    ]
    input_folder = "/Users/moshimozzie/GA/personal/capstone/dataset/YouTube"
    output_folder = "/Users/moshimozzie/GA/personal/capstone/dataset/YouTube"

    for file in files:
        file_path = os.path.join(input_folder, file)
        separate_audio(file_path, output_folder)

#### **1.2.3 Splitting Data (I Tried to Take My Life at 16)**<a id='splitting-data-i-tried-to-take-my-life-at-16'></a>

In [None]:
def separate_audio(file_path, output_path):
    separator = Separator('spleeter:2stems')
    separator.separate_to_file(file_path, output_path)

if __name__ == "__main__":
    files = [
        "I Tried to Take My Life at 16 | Can Ask Meh pt1.wav",
        "I Tried to Take My Life at 16 | Can Ask Meh pt2.wav"
    ]
    input_folder = "/Users/moshimozzie/GA/personal/capstone/dataset/YouTube"
    output_folder = "/Users/moshimozzie/GA/personal/capstone/dataset/YouTube"

    for file in files:
        file_path = os.path.join(input_folder, file)
        separate_audio(file_path, output_folder)

#### **1.2.4 Splitting Data (Gen Z & Millennials' Verdict on Therapy)**<a id='splitting-data-gen-z-millennials-verdict-on-therapy'></a>

In [None]:
def separate_audio(file_path, output_path):
    separator = Separator('spleeter:2stems')
    separator.separate_to_file(file_path, output_path)

if __name__ == "__main__":
    files = [
        "Gen Z & Millennials' Verdict on Therapy | Singapore, Unfiltered.wav"
    ]
    input_folder = "/Users/moshimozzie/GA/personal/capstone/dataset/YouTube"
    output_folder = "/Users/moshimozzie/GA/personal/capstone/dataset/YouTube"

    for file in files:
        file_path = os.path.join(input_folder, file)
        separate_audio(file_path, output_folder)

#### **1.2.5 Splitting Data (People With Mental Health Conditions)**<a id='splitting-data-people-with-mental-health-conditions'></a>

In [None]:
def separate_audio(file_path, output_path):
    separator = Separator('spleeter:2stems')
    separator.separate_to_file(file_path, output_path)

if __name__ == "__main__":
    files = [
        "People With Mental Health Conditions | Can Ask Meh.wav"
    ]
    input_folder = "/Users/moshimozzie/GA/personal/capstone/dataset/YouTube"
    output_folder = "/Users/moshimozzie/GA/personal/capstone/dataset/YouTube"

    for file in files:
        file_path = os.path.join(input_folder, file)
        separate_audio(file_path, output_folder)

#### **1.2.6 Splitting Data (3 Strangers Talk About Depression And Mental Health)**<a id='splitting-data-3-strangers-talk-about-depression-and-mental-health'></a>

In [None]:
def separate_audio(file_path, output_path):
    separator = Separator('spleeter:2stems')
    separator.separate_to_file(file_path, output_path)

if __name__ == "__main__":
    files = [
        "3 Strangers Talk About Depression And Mental Health.wav"
    ]
    input_folder = "/Users/moshimozzie/GA/personal/capstone/dataset/YouTube"
    output_folder = "/Users/moshimozzie/GA/personal/capstone/dataset/YouTube"

    for file in files:
        file_path = os.path.join(input_folder, file)
        separate_audio(file_path, output_folder)

#### **1.2.7 Splitting Data (College Students in Singapore Open Up About Anxiety)**<a id='splitting-data-college-students-in-singapore-open-up-about-anxiety'></a>

In [None]:
def separate_audio(file_path, output_path):
    separator = Separator('spleeter:2stems')
    separator.separate_to_file(file_path, output_path)

if __name__ == "__main__":
    files = [
        "College Students in Singapore Open Up About Anxiety.wav"
    ]
    input_folder = "/Users/moshimozzie/GA/personal/capstone/dataset/YouTube"
    output_folder = "/Users/moshimozzie/GA/personal/capstone/dataset/YouTube"

    for file in files:
        file_path = os.path.join(input_folder, file)
        separate_audio(file_path, output_folder)

#### **1.2.8 Splitting Data (Singaporeans Try - Therapy (Mental Health Special))**<a id='splitting-data-singaporeans-try-therapy-mental-health-special'></a>

In [None]:
# Split audio files into 2 stems (data - Singaporeans Try - Therapy (Mental Health Special))
def separate_audio(file_path, output_path):
    separator = Separator('spleeter:2stems')
    separator.separate_to_file(file_path, output_path)

if __name__ == "__main__":
    files = [
        "Singaporeans Try - Therapy (Mental Health Special) pt1.wav",
        "Singaporeans Try - Therapy (Mental Health Special) pt2.wav"
    ]
    input_folder = "/Users/moshimozzie/GA/personal/capstone/dataset/YouTube"
    output_folder = "/Users/moshimozzie/GA/personal/capstone/dataset/YouTube"

    for file in files:
        file_path = os.path.join(input_folder, file)
        separate_audio(file_path, output_folder)
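
Each cell in section 1.2 writes its stems under the output folder. The sketch below lists where those files should land, assuming spleeter's default output layout of `<output folder>/<input file name without extension>/<stem>.wav` and the stem names `vocals` and `accompaniment` for the `2stems` model. `expected_stem_paths` is a hypothetical helper, not part of the original notebook.

```python
import os

STEMS_2 = ("vocals", "accompaniment")  # stem names assumed for 'spleeter:2stems'


def expected_stem_paths(output_folder, file_name, stems=STEMS_2):
    """Map each stem name to the path where its WAV file is expected."""
    base = os.path.splitext(os.path.basename(file_name))[0]
    return {stem: os.path.join(output_folder, base, f"{stem}.wav") for stem in stems}


paths = expected_stem_paths("../dataset/YouTube", "The Family Man pt1.2.wav")
for stem, path in paths.items():
    status = "exists" if os.path.exists(path) else "missing"
    print(f"{stem}: {path} ({status})")
```

Running this after a separation cell gives a quick check that every input file produced both stems.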

### **1.3 Compile all clips into 1 single folder**<a id='compile-all-clips-into-1-single-folder'></a>

After extracting and splitting the data, we have about 2,041 two-second audio files that can be used as data for prediction.

In [None]:
# Compile all the above segmented audio in a folder

def process_audio_files(input_folder, output_folder):
    # Ensure output folder exists
    if not os.path.exists(output_folder):
        os.makedirs(output_folder)

    # List all files in the input directory
    for filename in os.listdir(input_folder):
        if filename.endswith(".wav"):  # Check if the file is a WAV file
            path = os.path.join(input_folder, filename)
            audio = AudioSegment.from_wav(path)
            
            # Chunk length in milliseconds (2 seconds); the floor division below
            # drops any trailing audio shorter than one full chunk
            chunk_length_ms = 2000
            num_chunks = len(audio) // chunk_length_ms
            
            # Split the audio and save each chunk
            for i in range(num_chunks):
                start_ms = i * chunk_length_ms
                end_ms = start_ms + chunk_length_ms
                chunk = audio[start_ms:end_ms]
                chunk.export(os.path.join(output_folder, f"{filename[:-4]}_chunk_{i+1}.wav"), format="wav")
    

input_folder = '/Users/moshimozzie/GA/personal/capstone/dataset/YouTube (pre-split)'
output_folder = '/Users/moshimozzie/GA/personal/capstone/dataset/YouTube'
process_audio_files(input_folder, output_folder)
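
The chunking rule above is a floor division: a clip yields as many full 2-second chunks as fit, and any trailing remainder is dropped. The sketch below reproduces that rule with only the standard-library `wave` module on a synthetic clip, so it can be checked without `pydub` or the dataset. The function name `split_wav` and the silent demo clip are assumptions made for illustration.

```python
import os
import tempfile
import wave


def split_wav(path, output_folder, chunk_seconds=2):
    """Split a WAV file into fixed-length chunks, dropping any trailing remainder."""
    os.makedirs(output_folder, exist_ok=True)
    stem = os.path.splitext(os.path.basename(path))[0]
    written = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = src.getframerate() * chunk_seconds
        num_chunks = src.getnframes() // frames_per_chunk
        for i in range(num_chunks):
            frames = src.readframes(frames_per_chunk)
            out_path = os.path.join(output_folder, f"{stem}_chunk_{i + 1}.wav")
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)  # header frame count is corrected on close
                dst.writeframes(frames)
            written.append(out_path)
    return written


# Demo clip: 5 seconds of 16-bit mono silence at 22,050 Hz
workdir = tempfile.mkdtemp()
demo_path = os.path.join(workdir, "demo.wav")
with wave.open(demo_path, "wb") as demo:
    demo.setnchannels(1)
    demo.setsampwidth(2)
    demo.setframerate(22050)
    demo.writeframes(b"\x00\x00" * 22050 * 5)

chunks = split_wav(demo_path, os.path.join(workdir, "chunks"))
print(len(chunks))  # 5 s // 2 s = 2 chunks; the last second is dropped
```

Because the remainder is discarded, the total number of chunks is the sum of each clip's duration floor-divided by the chunk length, not the total duration divided by the chunk length.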