# Audio Speech Sentiment


Dataset: https://www.kaggle.com/datasets/imsparsh/audio-speech-sentiment/data



In [2]:
import pandas as pd
import librosa
import speech_recognition as sr
from transformers import T5ForConditionalGeneration, T5Tokenizer
from tqdm.auto import tqdm
# Initialize tqdm for pandas apply()
tqdm.pandas()



### **Loading the Dataset**

Load `TRAIN.csv` into a DataFrame and add a `full_path` column pointing to each audio file. The `Class` column holds the sentiment label for each file.

In [3]:
trainPath = 'archive2/TRAIN/'
df_base = pd.read_csv('archive2/TRAIN.csv')
df_base['full_path'] = df_base['Filename'].apply(lambda x: trainPath + x)

df_base.head()

Unnamed: 0,Filename,Class,full_path
0,346.wav,Negative,archive2/TRAIN/346.wav
1,163.wav,Neutral,archive2/TRAIN/163.wav
2,288.wav,Negative,archive2/TRAIN/288.wav
3,279.wav,Negative,archive2/TRAIN/279.wav
4,244.wav,Negative,archive2/TRAIN/244.wav


### **Audio to Text Conversion**

Convert the audio files to text using the SpeechRecognition library. `recognize_google` sends each clip to Google's Web Speech API, so this step needs network access and can be slow on large datasets. Its accuracy depends on the quality of the audio and the clarity of speech.

In [4]:
# Assuming df_base is your DataFrame and it has a 'full_path' column with audio file paths
recognizer = sr.Recognizer()

def audio_to_text(path):
    try:
        with sr.AudioFile(path) as source:
            audio_data = recognizer.record(source)
            text = recognizer.recognize_google(audio_data)
            return text, False  # Text and a flag indicating no error
    except (sr.UnknownValueError, sr.RequestError, ValueError) as e:
        return "Error: " + str(e), True  # Indicate an error occurred

# Apply the function with progress tracking
# The result is a DataFrame with two columns from the tuple returned by audio_to_text
df_base[['transcript', 'error']] = df_base['full_path'].progress_apply(lambda x: pd.Series(audio_to_text(x)))

# Filter out the errors if necessary
df_base_clean = df_base[df_base['error'] == False].drop(columns=['error'])


  0%|          | 0/250 [00:00<?, ?it/s]

In [12]:
df_base_clean.head()

Unnamed: 0,Filename,Class,full_path,transcript,formatted_text
0,346.wav,Negative,archive2/TRAIN/346.wav,how dare you say that,classify sentiment: how dare you say that
1,163.wav,Neutral,archive2/TRAIN/163.wav,you can do a lot better than this,classify sentiment: you can do a lot better th...
2,288.wav,Negative,archive2/TRAIN/288.wav,no one likes you,classify sentiment: no one likes you
3,279.wav,Negative,archive2/TRAIN/279.wav,you should be punished for this,classify sentiment: you should be punished for...
4,244.wav,Negative,archive2/TRAIN/244.wav,you do not have common,classify sentiment: you do not have common
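
The transcription step above builds a `pd.Series` per row to split the `(text, error)` tuple into two columns. A minimal sketch of an alternative is shown below, assuming a transcriber with the same return shape as `audio_to_text`. The names `transcribe_paths` and `fake_transcribe` are hypothetical and exist only so the example runs offline.

```python
import pandas as pd


def transcribe_paths(paths, transcribe):
    """Apply `transcribe(path) -> (text, had_error)` to every path.

    Returns a DataFrame with a 'transcript' column and a boolean 'error'
    column, indexed like `paths`.
    """
    results = [transcribe(p) for p in paths]
    frame = pd.DataFrame(results, columns=["transcript", "error"], index=paths.index)
    return frame.astype({"error": bool})


# Hypothetical stand-in for audio_to_text so the sketch needs no audio or network.
def fake_transcribe(path):
    if path.endswith("bad.wav"):
        return "Error: unreadable file", True
    return "hello", False


paths = pd.Series(["a.wav", "bad.wav", "c.wav"])
out = transcribe_paths(paths, fake_transcribe)

# With a real boolean column, failures can be dropped with a plain mask.
clean = out[~out["error"]].copy()
```

In the notebook, `transcribe_paths(df_base['full_path'], audio_to_text)` would play the role of the `progress_apply` call, at the cost of losing the progress bar unless the list comprehension is wrapped in `tqdm`.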


### **Prepare Text Data for T5**

Format the transcripts as input for the T5 model by prepending a task prefix, here `"classify sentiment: "`, so that each input states the task explicitly.

In [5]:
df_base_clean['formatted_text'] = "classify sentiment: " + df_base_clean['transcript']
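
The prefixing step can be wrapped in a small helper so that different prefixes are easy to compare. This is a sketch only; `add_prefix` is a hypothetical name. Note that `classify sentiment: ` is not one of the prefixes used in T5's original multi-task training. The T5 paper's SST-2 sentiment task used `sst2 sentence: `, but SST-2 has only positive and negative labels, and whether that prefix gives better results on these transcripts is untested here.

```python
import pandas as pd


def add_prefix(transcripts, prefix="classify sentiment: "):
    """Return the transcripts with surrounding whitespace removed and `prefix` prepended."""
    return prefix + transcripts.str.strip()


transcripts = pd.Series(["how dare you say that", " no one likes you "])

formatted = add_prefix(transcripts)
# Assumption: an alternative prefix worth trying, taken from T5's SST-2 task.
sst2_style = add_prefix(transcripts, prefix="sst2 sentence: ")
```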


### **Load and Setup T5 Model**

Ensure you have the `transformers` library installed (`T5Tokenizer` also requires `sentencepiece`), and then load the T5 model along with its tokenizer. Choose a model size that balances accuracy against compute cost, such as `t5-small`.

In [6]:
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained('t5-small')
model = T5ForConditionalGeneration.from_pretrained('t5-small')


You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


### **Sentiment Analysis with T5**

Iterate over the formatted text and use the T5 model to predict the sentiment. Given the potentially large number of texts, consider batching this operation or using a subset of data to test your setup first.
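
The batching suggested above could look roughly like the sketch below. It assumes a Hugging Face tokenizer and model such as the `tokenizer` and `model` loaded in the previous section, passed in as arguments. The names `chunked` and `predict_sentiment_batched` and the default `batch_size=16` are illustrative choices, not part of the original notebook.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def predict_sentiment_batched(texts, tokenizer, model, batch_size=16):
    """Generate one output string per input text, `batch_size` texts at a time."""
    predictions = []
    for batch in chunked(list(texts), batch_size):
        # Pad to the longest text in the batch so the inputs form one tensor.
        encoded = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        outputs = model.generate(**encoded)
        predictions.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return predictions
```

A call such as `predict_sentiment_batched(df_base_clean['formatted_text'], tokenizer, model)` returns a list in the same order as the input, which can then be assigned to a new column.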

In [7]:
def predict_sentiment(text):
    input_ids = tokenizer.encode(text, return_tensors="pt")
    outputs = model.generate(input_ids)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example: test the setup on a small subset first
# For large datasets, consider using DataLoader from torch.utils.data or similar batching techniques
# .copy() makes the subset an independent DataFrame, so adding a column does not raise SettingWithCopyWarning
subset_df = df_base_clean.head(20).copy()
subset_df['sentiment'] = subset_df['formatted_text'].apply(predict_sentiment)


In [11]:
subset_df.head()

Unnamed: 0,Filename,Class,full_path,transcript,formatted_text,sentiment
0,346.wav,Negative,archive2/TRAIN/346.wav,how dare you say that,classify sentiment: how dare you say that,?
1,163.wav,Neutral,archive2/TRAIN/163.wav,you can do a lot better than this,classify sentiment: you can do a lot better th...,a lot better than this.
2,288.wav,Negative,archive2/TRAIN/288.wav,no one likes you,classify sentiment: no one likes you,
3,279.wav,Negative,archive2/TRAIN/279.wav,you should be punished for this,classify sentiment: you should be punished for...,
4,244.wav,Negative,archive2/TRAIN/244.wav,you do not have common,classify sentiment: you do not have common,sentiment sentiment: you do not have common
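
The generated strings in the table above are not label names: some are empty and others echo part of the input. Before comparing against `Class`, the free-form output would need to be mapped onto the dataset's labels. A minimal sketch follows, assuming the three labels seen in this dataset. `normalize_prediction` and the `"unknown"` fallback are hypothetical names introduced for illustration.

```python
LABELS = ("positive", "negative", "neutral")


def normalize_prediction(generated, labels=LABELS, fallback="unknown"):
    """Map free-form generated text to a capitalized label, or `fallback` if none fits."""
    text = generated.strip().lower()
    matches = [label for label in labels if label in text]
    # Accept only an unambiguous match; anything else is treated as unparseable.
    return matches[0].capitalize() if len(matches) == 1 else fallback


examples = ["negative", " Positive.", "?", "", "sentiment sentiment: you do not have common"]
normalized = [normalize_prediction(e) for e in examples]
```

Counting how many predictions fall into the fallback bucket gives a quick check of whether the model is producing usable labels at all.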


### **Testing**

In [8]:
import os

testPath = 'archive2/TEST/'
test_files = [f for f in os.listdir(testPath) if f.endswith('.wav')]
df_test = pd.DataFrame(test_files, columns=['Filename'])
df_test['full_path'] = df_test['Filename'].apply(lambda x: os.path.join(testPath, x))

# Transcribe each file. [0] keeps only the text, so files that fail carry their
# "Error: ..." message into the transcript column and on to the model.
df_test['transcript'] = df_test['full_path'].progress_apply(lambda x: audio_to_text(x)[0])

df_test['formatted_text'] = "classify sentiment: " + df_test['transcript']

# Reuse predict_sentiment (defined in the previous section) with the already-loaded T5 model
df_test['predicted_sentiment'] = df_test['formatted_text'].progress_apply(predict_sentiment)


  0%|          | 0/110 [00:00<?, ?it/s]

  0%|          | 0/110 [00:00<?, ?it/s]



In [10]:
df_test.head()

Unnamed: 0,Filename,full_path,transcript,formatted_text,predicted_sentiment
0,249.wav,archive2/TEST/249.wav,no one likes to talk with you,classify sentiment: no one likes to talk with you,sentiment: no one likes to talk with you.
1,315.wav,archive2/TEST/315.wav,you are talking very rudely,classify sentiment: you are talking very rudely,
2,300.wav,archive2/TEST/300.wav,you are a totally careless person,classify sentiment: you are a totally careless...,
3,260.wav,archive2/TEST/260.wav,Error: Audio file could not be read as PCM WAV...,classify sentiment: Error: Audio file could no...,": Audio file could not be read as PCM WAV, AIF..."
4,15.wav,archive2/TEST/15.wav,I like your attitude,classify sentiment: I like your attitude,I like your attitude


In [9]:
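from sklearn.metrics import accuracy_score

# df_test is built from a directory listing and has no label column, so score the
# labeled training subset instead (Class vs. generated text, case-insensitive).
y_true = subset_df['Class'].str.lower()
y_pred = subset_df['sentiment'].str.strip().str.lower()
accuracy = accuracy_score(y_true, y_pred)
print(f'Accuracy: {accuracy:.4f}')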