## **Urdu Speech Emotion Classification**

[Read the accompanying blog post](https://medium.com/@rameeshamalik.143/building-an-urdu-speech-emotion-detection-with-deepgram-transformers-71f0a7f2e216)

# Step 1: Install Required Libraries


1. **Deepgram SDK:** Client for the Deepgram API, which converts speech to text.
2. **Transformers:** Provides pre-trained models for text classification.
3. **Datasets:** Helps manage and preprocess text data.
4. **Streamlit:** Makes it easy to build web apps.
5. **Pyngrok:** Exposes the locally running app through a public ngrok tunnel.
6. **SoundFile:** Reads and writes audio files.

In [None]:
!pip install deepgram-sdk transformers datasets streamlit pyngrok soundfile


Collecting deepgram-sdk
  Downloading deepgram_sdk-3.7.7-py3-none-any.whl.metadata (13 kB)
Collecting datasets
  Downloading datasets-3.1.0-py3-none-any.whl.metadata (20 kB)
Collecting streamlit
  Downloading streamlit-1.40.2-py2.py3-none-any.whl.metadata (8.4 kB)
Collecting pyngrok
  Downloading pyngrok-7.2.1-py3-none-any.whl.metadata (8.3 kB)

# Step 2: Load Your Dataset

In [None]:
from google.colab import drive
import os

# Mount Google Drive
drive.mount('/content/drive')

# Define the dataset path
dataset_path = "/content/drive/MyDrive/NLP/Q1/URDU-Dataset-master"
print("Dataset Path:", dataset_path)

# Check folders
folders = os.listdir(dataset_path)
print("Subfolders:", folders)  # Lists everything in the dataset folder: the four emotion folders plus README.md


Mounted at /content/drive
Dataset Path: /content/drive/MyDrive/NLP/Q1/URDU-Dataset-master
Subfolders: ['README.md', 'Happy', 'Sad', 'Angry', 'Neutral']
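
The listing above contains `README.md` alongside the four emotion folders. The following is a minimal sketch of a helper that keeps only subdirectories; the name `list_emotion_folders` is hypothetical and not part of the original notebook.

```python
import os


def list_emotion_folders(root):
    """Return the sorted names of the subdirectories of `root`, skipping plain files."""
    return sorted(
        name
        for name in os.listdir(root)
        if os.path.isdir(os.path.join(root, name))
    )
```

Calling `list_emotion_folders(dataset_path)` would then leave out `README.md`, assuming the dataset folder holds nothing but emotion subfolders and loose files.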



# Step 3: Prepare the Dataset

In [None]:
import os

# Define emotion labels as numbers
emotion_map = {
    "Neutral": 0,
    "Sad": 1,
    "Angry": 2,
    "Happy": 3,
}

# Prepare dataset list
dataset = []

for emotion, label in emotion_map.items():
    folder_path = os.path.join(dataset_path, emotion)
    if os.path.isdir(folder_path):
        for file in os.listdir(folder_path):
            if file.endswith(".wav"):
                dataset.append({
                    "audio_path": os.path.join(folder_path, file),
                    "label": label,
                })

# Check dataset size
print("Number of samples:", len(dataset))


Number of samples: 400
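
Before training, it can help to check how the samples are spread across the four emotions. This is a small sketch using toy data; `count_per_label` and `id_to_name` are illustrative names, and the real `dataset` list from the cell above could be passed in place of `toy_samples`.

```python
from collections import Counter


def count_per_label(samples, id_to_name):
    """Count samples per emotion name, including emotions with zero samples."""
    counts = Counter(sample["label"] for sample in samples)
    return {id_to_name[label]: counts.get(label, 0) for label in sorted(id_to_name)}


# Inverse of the emotion_map used in this step
id_to_name = {0: "Neutral", 1: "Sad", 2: "Angry", 3: "Happy"}

# Toy data standing in for the real dataset list
toy_samples = [{"label": 0}, {"label": 2}, {"label": 2}]
print(count_per_label(toy_samples, id_to_name))
```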


# Step 4: Transcribe Speech with Deepgram

**Setup Deepgram**

In [None]:
!pip install deepgram-sdk==2.12.0  # Pin version 2.12.0; the code below uses the v2 `Deepgram` client

import asyncio
import os
from deepgram import Deepgram  # Import after installing the pinned version

# Initialize the Deepgram client; the key is read from the environment rather than hard-coded
DEEPGRAM_API_KEY = os.environ["DEEPGRAM_API_KEY"]
dg_client = Deepgram(DEEPGRAM_API_KEY)

Collecting deepgram-sdk==2.12.0
  Downloading deepgram_sdk-2.12.0-py3-none-any.whl.metadata (27 kB)
Downloading deepgram_sdk-2.12.0-py3-none-any.whl (25 kB)
Installing collected packages: deepgram-sdk
  Attempting uninstall: deepgram-sdk
    Found existing installation: deepgram-sdk 3.7.7
    Uninstalling deepgram-sdk-3.7.7:
      Successfully uninstalled deepgram-sdk-3.7.7
Successfully installed deepgram-sdk-2.12.0
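
API keys are safer outside the notebook source. The sketch below reads a secret from an environment variable and fails with a readable message when it is missing; `require_env` is a hypothetical helper, and the variable name `DEEPGRAM_API_KEY` is an assumption rather than something the Deepgram SDK requires.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or raise if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running this cell.")
    return value
```

With this in place, the client could be created with `Deepgram(require_env("DEEPGRAM_API_KEY"))`.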


**Transcribe Audio Files**

In [None]:
# Transcribe a single audio file and return its transcript ('' if none is returned).
# Note: no 'language' option is passed here, unlike the app in Step 6, which passes 'language': 'ur'.
async def transcribe_file(file_path):
    with open(file_path, 'rb') as audio:
        source = {'buffer': audio, 'mimetype': 'audio/wav'}
        response = await dg_client.transcription.prerecorded(source, {'punctuate': True})
    transcript = response.get('results', {}).get('channels', [{}])[0].get('alternatives', [{}])[0].get('transcript', '')
    return transcript

# Batch transcription
async def transcribe_dataset(dataset):
    for sample in dataset:
        transcript = await transcribe_file(sample["audio_path"])
        sample["transcript"] = transcript
    return dataset

import nest_asyncio
nest_asyncio.apply()

loop = asyncio.get_event_loop()
dataset = loop.run_until_complete(transcribe_dataset(dataset))

print("First Transcription:", dataset[0])

First Transcription: {'audio_path': '/content/drive/MyDrive/NLP/Q1/URDU-Dataset-master/Neutral/SM5_F20_N048.wav', 'label': 0, 'transcript': ''}
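
The batch loop above awaits one file at a time. A possible alternative, sketched here under stated assumptions, is to run several requests concurrently while capping how many are in flight. `transcribe_all` and `fake_transcribe` are hypothetical names, and `fake_transcribe` only stands in for a real call such as `transcribe_file`; whether the Deepgram account tolerates the chosen concurrency is not verified here.

```python
import asyncio


async def transcribe_all(paths, transcribe_one, max_concurrent=5):
    """Run `transcribe_one(path)` for every path, at most `max_concurrent` at a time.

    Results are returned in the same order as `paths`.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def worker(path):
        async with semaphore:
            return await transcribe_one(path)

    return await asyncio.gather(*(worker(path) for path in paths))


async def fake_transcribe(path):
    """Stand-in for a real transcription call; returns a placeholder string."""
    await asyncio.sleep(0)
    return f"transcript of {path}"
```

In a notebook cell this could be driven with `await transcribe_all(paths, transcribe_file)` or through the event loop as in the cell above.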


# Step 5: Fine-Tune a Model for Emotion Detection

**Preprocess Data**

In [None]:
from datasets import Dataset

# Prepare text and label lists
texts = [sample["transcript"] for sample in dataset if "transcript" in sample]
labels = [sample["label"] for sample in dataset if "transcript" in sample]

# Create Hugging Face Dataset
text_dataset = Dataset.from_dict({"text": texts, "label": labels})
print(text_dataset)


Dataset({
    features: ['text', 'label'],
    num_rows: 400
})
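
The first sample printed in Step 4 has an empty transcript, and the filter `if "transcript" in sample` keeps such samples because the key is present. A minimal sketch for separating usable transcripts from empty ones follows; `split_empty_transcripts` is a hypothetical helper.

```python
def split_empty_transcripts(samples):
    """Split samples into (usable, empty) by whether the transcript has any text."""
    usable, empty = [], []
    for sample in samples:
        text = (sample.get("transcript") or "").strip()
        (usable if text else empty).append(sample)
    return usable, empty
```

Checking `len(empty)` before training gives a quick sense of how much text the classifier will actually see.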


In [None]:
from sklearn.model_selection import train_test_split

# Prepare texts and labels
texts = [sample["transcript"] for sample in dataset if "transcript" in sample]
labels = [sample["label"] for sample in dataset if "transcript" in sample]

# Split data into training and testing sets
train_texts, test_texts, train_labels, test_labels = train_test_split(texts, labels, test_size=0.2, random_state=42)

print(f"Training samples: {len(train_texts)}, Testing samples: {len(test_texts)}")

Training samples: 320, Testing samples: 80
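
A plain random split does not guarantee that each emotion appears in the same proportion in both sets. scikit-learn's `train_test_split` accepts a `stratify=labels` argument for this. The pure-Python sketch below illustrates the idea only; `stratified_split` is a hypothetical function, not the scikit-learn implementation.

```python
import random
from collections import defaultdict


def stratified_split(texts, labels, test_size=0.2, seed=42):
    """Split texts and labels so each label keeps roughly `test_size` of its items for testing."""
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return (
        pick(texts, train_idx),
        pick(texts, test_idx),
        pick(labels, train_idx),
        pick(labels, test_idx),
    )
```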


In [None]:
from transformers import pipeline

# Load the pre-trained base model; its classification head is not yet trained
classifier = pipeline("text-classification",  model="distilroberta-base")

# Classify each test transcript
results = []
for text in test_texts:
    result = classifier(text)
    results.append(result[0]["label"])

# Compare with actual labels
print(" Results:", results[:5])
print("Actual Labels:", test_labels[:5])


Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at distilroberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


 Results: ['LABEL_1', 'LABEL_1', 'LABEL_1', 'LABEL_1', 'LABEL_1']
Actual Labels: [2, 2, 0, 2, 0]
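
The pipeline returns string labels such as `LABEL_1`, while the ground truth uses integers, so the two need to be put on the same footing before they can be compared. The sketch below converts the strings and computes accuracy; `label_to_id` and `accuracy` are hypothetical helpers, and the example values are the five predictions and labels printed above.

```python
def label_to_id(label):
    """Convert a pipeline label such as 'LABEL_1' to the integer 1."""
    return int(label.rsplit("_", 1)[1])


def accuracy(predicted_labels, actual_ids):
    """Fraction of predictions whose numeric id matches the actual id."""
    if not actual_ids:
        return 0.0
    predicted_ids = [label_to_id(label) for label in predicted_labels]
    correct = sum(p == a for p, a in zip(predicted_ids, actual_ids))
    return correct / len(actual_ids)


print(accuracy(["LABEL_1"] * 5, [2, 2, 0, 2, 0]))  # prints 0.0
```

None of the five shown predictions match, which is expected from a classification head that has not been trained yet.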


**Train the Model**

In [None]:
import torch
torch.cuda.empty_cache()


In [None]:
# Preprocess the text data for fine-tuning the model
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments

# Tokenizer and model setup
model_name = "distilroberta-base"  # You can use any other model from Hugging Face if needed
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Add padding token if not present
if tokenizer.pad_token is None:
    tokenizer.add_special_tokens({'pad_token': '[PAD]'})

model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=4 , ignore_mismatched_sizes=True, pad_token_id=tokenizer.pad_token_id)  # 4 labels for the 4 emotions

# Tokenize datasets
def preprocess(batch):
    return tokenizer(batch["text"], padding=True, truncation=True)

# Create Hugging Face Datasets for fine-tuning
train_dataset = Dataset.from_dict({"text": train_texts, "label": train_labels})
test_dataset = Dataset.from_dict({"text": test_texts, "label": test_labels})

train_dataset = train_dataset.map(preprocess, batched=True)
test_dataset = test_dataset.map(preprocess, batched=True)

# Training Arguments
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    num_train_epochs=1,
    per_device_train_batch_size=1,  # Batch size reduced to 1 to save GPU memory
    gradient_accumulation_steps=4,  # Accumulate gradients over 4 steps (effective batch size of 4)
    save_steps=10,
    save_total_limit=2,
    logging_dir="./logs",
    report_to="none"
)

# Train setup
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
)

# Train the model
trainer.train()

# Save fine-tuned model
model.save_pretrained("/content/drive/MyDrive/NLP/fine_tuned_urdu_emotion_model")  # Change to match your Drive path
tokenizer.save_pretrained("/content/drive/MyDrive/NLP/fine_tuned_urdu_emotion_model")  # Change to match your Drive path

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at distilroberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1     | No log        | 1.393691        |
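
One way to read the validation loss is to compare it with the cross-entropy of a uniform guess, which for four classes is ln 4. The short calculation below is a sketch of that comparison; `chance_level_loss` is a hypothetical helper.

```python
import math


def chance_level_loss(num_classes):
    """Cross-entropy of a uniform prediction over `num_classes` classes."""
    return math.log(num_classes)


print(round(chance_level_loss(4), 4))  # prints 1.3863
```

A validation loss of 1.393691 sits very close to this value, which suggests that after one epoch the model may not be doing much better than guessing.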


('/content/drive/MyDrive/NLP/fine_tuned_urdu_emotion_model/tokenizer_config.json',
 '/content/drive/MyDrive/NLP/fine_tuned_urdu_emotion_model/special_tokens_map.json',
 '/content/drive/MyDrive/NLP/fine_tuned_urdu_emotion_model/vocab.json',
 '/content/drive/MyDrive/NLP/fine_tuned_urdu_emotion_model/merges.txt',
 '/content/drive/MyDrive/NLP/fine_tuned_urdu_emotion_model/added_tokens.json',
 '/content/drive/MyDrive/NLP/fine_tuned_urdu_emotion_model/tokenizer.json')
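
The training run above reports only the loss. `Trainer` accepts a `compute_metrics` callable that receives the evaluation logits and labels and returns a dictionary of metrics. The following is a dependency-free sketch of such a function reporting accuracy; it is one possible implementation, not the one used in this notebook.

```python
def compute_metrics(eval_pred):
    """Compute accuracy from (logits, labels), taking the arg-max of each row of logits."""
    logits, labels = eval_pred
    predictions = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return {"accuracy": correct / len(labels) if len(labels) else 0.0}
```

Passing `compute_metrics=compute_metrics` when constructing the `Trainer` would add an accuracy figure to each evaluation.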

# Step 6: Build the Web App

In [None]:
import os
from pyngrok import ngrok

# Set your ngrok authtoken (read from the environment rather than hard-coded)
ngrok.set_auth_token(os.environ["NGROK_AUTHTOKEN"])




In [None]:
%%writefile app.py
import streamlit as st
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
import asyncio
from deepgram import Deepgram
import nest_asyncio
import wave
import os

# Apply nest_asyncio
nest_asyncio.apply()

# Read the Deepgram API key from the environment rather than hard-coding it
DEEPGRAM_API_KEY = os.environ["DEEPGRAM_API_KEY"]
dg_client = Deepgram(DEEPGRAM_API_KEY)

# Load fine-tuned model and tokenizer
model_name = "/content/drive/MyDrive/NLP/fine_tuned_urdu_emotion_model"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Debugging function to check audio file properties
def debug_audio_file(file_path):
    try:
        with wave.open(file_path, 'rb') as wf:
            st.write(f"Channels: {wf.getnchannels()}")
            st.write(f"Sample Width: {wf.getsampwidth()}")
            st.write(f"Frame Rate: {wf.getframerate()}")
            st.write(f"Frame Count: {wf.getnframes()}")
    except wave.Error as e:
        st.error(f"Error in audio file: {e}")

# Function to transcribe using Deepgram
async def transcribe_file(file_path):
    with open(file_path, "rb") as audio:
        source = {"buffer": audio, "mimetype": "audio/wav"}
        try:
            response = await dg_client.transcription.prerecorded(
                source,
                {"punctuate": True, "language": "ur"}
            )
            st.write("Full Deepgram Response:", response)  # Log the response for debugging
            transcript = (
                response.get("results", {})
                .get("channels", [{}])[0]
                .get("alternatives", [{}])[0]
                .get("transcript", None)
            )
            return transcript
        except Exception as e:
            st.error(f"Error during transcription: {e}")
            return None

# Streamlit app UI
st.title("Urdu Emotion Classification from Speech")

# Upload audio file
uploaded_file = st.file_uploader("Upload a speech file", type=["wav"])

if uploaded_file is not None:
    # Save and debug the uploaded file
    temp_path = "temp_audio.wav"
    with open(temp_path, "wb") as f:
        f.write(uploaded_file.getbuffer())
    st.audio(temp_path, format="audio/wav")

    # Debug the audio file
    debug_audio_file(temp_path)

    # Transcribe the file
    transcript = asyncio.run(transcribe_file(temp_path))

    if transcript:
        st.write("Transcribed Text: ", transcript)

        # Use the model to predict the emotion
        emotion_classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
        result = emotion_classifier(transcript)

        # Map the label to emotion
        emotion_map = {
            0: "Neutral",
            1: "Sad",
            2: "Angry",
            3: "Happy"
        }

        try:
            label_id = int(result[0]['label'].replace("LABEL_", ""))
            emotion_label = emotion_map.get(label_id, "Unknown")
            st.write(f"Predicted Emotion: {emotion_label}")
        except Exception as e:
            st.error(f"Error in processing model output: {e}")
    else:
        st.error("Failed to transcribe the audio.")


Writing app.py
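
The app writes every upload to the same file name, `temp_audio.wav`, so two simultaneous users could overwrite each other's audio. The sketch below writes each upload to a uniquely named temporary file and reads the same WAV properties that the debug function displays, plus the duration. `save_upload`, `wav_info`, and `cleanup` are hypothetical helpers and not part of the app as written.

```python
import os
import tempfile
import wave


def save_upload(data, suffix=".wav"):
    """Write uploaded bytes to a uniquely named temporary file and return its path."""
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        tmp.write(data)
        return tmp.name


def wav_info(path):
    """Return basic properties of a WAV file, including its duration in seconds."""
    with wave.open(path, "rb") as wf:
        frames, rate = wf.getnframes(), wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_width": wf.getsampwidth(),
            "frame_rate": rate,
            "frames": frames,
            "duration_s": frames / rate if rate else 0.0,
        }


def cleanup(path):
    """Delete the temporary file if it still exists."""
    if os.path.exists(path):
        os.remove(path)
```

In the app, `save_upload(uploaded_file.getbuffer())` could replace the fixed path, with `cleanup(path)` called once the prediction has been shown.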


# Step 7: Run the App

In [None]:
from pyngrok import ngrok

# Set up a tunnel to the Streamlit app
public_url = ngrok.connect('8501')
print(f"Streamlit app is live at {public_url}")


Streamlit app is live at NgrokTunnel: "https://2ec0-34-172-14-143.ngrok-free.app" -> "http://localhost:8501"


In [None]:
!streamlit run app.py &  # The '&' runs the command in the background



Collecting usage statistics. To deactivate, set browser.gatherUsageStats to false.

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://172.28.0.12:8501
  External URL: http://34.172.14.143:8501

2024-12-09 11:02:13.453210: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-12-09 11:02:13.495768: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-12-09 11:02:13.507301: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-12-09 11:02:20.504 Ex