# EmotiVoice: An Intelligent Voice Assistant with Emotion Detection

**Author:** Jai Kumar

**Course:** EXC Project  
**Date:** April 13, 2025

## Index
1. Problem Statement
2. Introduction
3. Literature Review
4. Bill of Materials / Software Used
5. Implementation
6. Proof of Concept
7. References

## 1. Problem Statement
Design and implement a software-only intelligent voice assistant that detects user emotion from both voice tone and spoken text and responds in an emotionally aware way. The solution should require no external hardware and run on a personal computer.

## 2. Introduction
Voice assistants have become widely adopted, yet most lack emotional intelligence. EmotiVoice bridges this gap using AI-based emotion detection through speech tone and sentiment analysis. This enhances interactivity and allows the assistant to respond in a more empathetic manner.

## 3. Literature Review
- Pretrained models like `Wav2Vec2` and libraries like `TextBlob` make emotion recognition more accessible.
- Tools like Hugging Face `transformers` and `pyttsx3` help build robust, interactive assistants.
- Prior work on the RAVDESS dataset and on NLP-based sentiment analysis indicates that emotional cues in voice and text are informative about user intent.

## 4. Bill of Materials / Software Used
- Python 3.10+
- `speech_recognition` (installed as `SpeechRecognition`; microphone input also needs `PyAudio`)
- `pyttsx3`
- `textblob`
- `transformers`
- `torch`
- `torchaudio`
- `tkinter` (standard library)
- `librosa`
- Jupyter Notebook
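
Before running the notebook, a quick check like the one below can report which of the listed packages are importable. This is only a sketch: `check_dependencies` and `REQUIRED_MODULES` are hypothetical names, not part of the project, and the entries are the import names assumed for the packages above.

```python
import importlib.util

# Import names assumed for the packages listed above.
REQUIRED_MODULES = [
    "speech_recognition", "pyttsx3", "textblob", "transformers",
    "torch", "torchaudio", "tkinter", "librosa",
]


def check_dependencies(modules=REQUIRED_MODULES):
    """Return {module_name: bool} by looking up each module without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, available in check_dependencies().items():
        print(f"{name:20s} {'ok' if available else 'MISSING'}")
```

Any module reported as missing would need to be installed before the corresponding part of the assistant can run.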

## 5. Implementation
The implementation consists of several modules:
- Speech Recognition using the Google Web Speech API (via `speech_recognition`)
- Text Sentiment Detection using TextBlob
- Voice Tone Emotion Detection using Hugging Face `transformers` (mocked from text polarity in the snippet below)
- Text-to-Speech using pyttsx3
- GUI with Tkinter
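
One way the modules above might fit together is sketched here, with each stage passed in as a plain callable so the flow can be exercised without a microphone, a network connection, or a model download. All names (`TurnResult`, `run_turn`, and the parameter names) are hypothetical and chosen for illustration; this is not the project's actual wiring.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TurnResult:
    text: str
    text_emotion: str
    tone_emotion: str
    response: str


def run_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],         # e.g. a speech_recognition wrapper
    text_emotion: Callable[[str], str],         # e.g. a TextBlob-based classifier
    tone_emotion: Callable[[bytes], str],       # e.g. an audio-emotion model
    respond: Callable[[str, str, str], str],    # builds the reply text
    speak: Callable[[str], None],               # e.g. a pyttsx3 wrapper
) -> TurnResult:
    """Run one listen -> analyse -> respond cycle using the injected stages."""
    text = transcribe(audio)
    from_text = text_emotion(text)
    from_tone = tone_emotion(audio)
    response = respond(text, from_text, from_tone)
    speak(response)
    return TurnResult(text, from_text, from_tone, response)


if __name__ == "__main__":
    # Stub stages stand in for the real modules.
    result = run_turn(
        audio=b"",
        transcribe=lambda _audio: "I love this",
        text_emotion=lambda _text: "Happy",
        tone_emotion=lambda _audio: "Neutral",
        respond=lambda text, t, v: f"You sound {t.lower()}.",
        speak=print,
    )
    print(result)
```

Keeping the stages as injected callables means the GUI layer could call `run_turn` with the real implementations while tests use stubs.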

### Code Snippets
```python
import speech_recognition as sr
from textblob import TextBlob
import pyttsx3

# ------------------------
# 1. Speech Recognition
# ------------------------
recognizer = sr.Recognizer()
mic = sr.Microphone()

print("\n[1] Listening for your voice input...")
with mic as source:
    recognizer.adjust_for_ambient_noise(source)
    audio = recognizer.listen(source)

try:
    text = recognizer.recognize_google(audio)
    print(f"\nRecognized Text: {text}")

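except sr.UnknownValueError:
    print("Sorry, could not understand your speech.")
    raise SystemExit(1)  # stop here: the later steps need `text`
except sr.RequestError as e:
    # Raised when the Google Web Speech API is unreachable or rejects the request
    print(f"Could not request results; {e}")
    raise SystemExit(1)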

# ------------------------
# 2. Sentiment Analysis
# ------------------------
print("\n[2] Running Sentiment Analysis...")
blob = TextBlob(text)
polarity = blob.sentiment.polarity
subjectivity = blob.sentiment.subjectivity
print(f"Polarity: {polarity}")
print(f"Subjectivity: {subjectivity}")

# ------------------------
# 3. Text to Speech Output
# ------------------------
print("\n[3] Responding with TTS...")
tts = pyttsx3.init()
# Placeholder reply: hard-coded for the demo, not derived from the recognized text
response = "It's 11:04 PM"
tts.say(response)
tts.runAndWait()
print(f"Spoken Output: {response}")

# ------------------------
# 4. Mock Emotion Detection (based on sentiment)
# ------------------------
print("\n[4] Detecting Emotion...")
if polarity > 0.6:
    emotion = "Happy"
elif polarity < -0.4:
    emotion = "Angry"
elif 0.1 < polarity <= 0.6:
    emotion = "Content"
else:
    emotion = "Neutral"

print(f"Final Emotion Detected: {emotion}")
```
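
The threshold mapping in step 4 can also be isolated as a pure function, which makes the boundaries easy to check without a microphone. The sketch below reuses the thresholds from the snippet above; the function names and the reply prefixes in `RESPONSE_PREFIX` are hypothetical and only illustrate how an emotionally aware reply might be assembled.

```python
def emotion_from_polarity(polarity: float) -> str:
    """Map a TextBlob-style polarity in [-1, 1] to a coarse emotion label."""
    if polarity > 0.6:
        return "Happy"
    if polarity < -0.4:
        return "Angry"
    if polarity > 0.1:  # covers 0.1 < polarity <= 0.6
        return "Content"
    return "Neutral"


# Hypothetical reply prefixes, one per label (not taken from the project).
RESPONSE_PREFIX = {
    "Happy": "Glad to hear that!",
    "Content": "Sounds good.",
    "Neutral": "Okay.",
    "Angry": "I'm sorry this is frustrating.",
}


def emotional_response(polarity: float, answer: str) -> str:
    """Prepend an emotion-dependent prefix to the assistant's answer."""
    emotion = emotion_from_polarity(polarity)
    return f"{RESPONSE_PREFIX[emotion]} {answer}"


if __name__ == "__main__":
    for p in (0.8, 0.3, 0.0, -0.7):
        print(p, emotion_from_polarity(p), "|", emotional_response(p, "It's 11:04 PM"))
```

Note that the boundary values themselves (0.6, 0.1, and -0.4) fall into the lower or neutral bucket, matching the strict comparisons in the original snippet.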

![Mock GUI](mainoutput.png)

## 6. Proof of Concept
The assistant was tested with multiple sentences and tone samples. Below are screenshots from the working GUI:

![Mock GUI](mainoutput2.png)
![Mock GUI](mainoutput3.png)

*Figure 1: GUI Displaying Emotion and Response*

## 7. References
- TextBlob documentation: https://textblob.readthedocs.io/en/dev/
- Real Python speech recognition guide: https://realpython.com/python-speech-recognition/
- pyttsx3 on PyPI: https://pypi.org/project/pyttsx3/
