# Discovery Phase: AI-Powered Mood Tracker and Music Therapy App

## Objective
The objective of this phase is to build a baseline model that detects a user's emotional state (e.g., stress, sadness, happiness) from multi-modal inputs: facial expressions, voice, and text. The baseline system will also recommend music suited to the detected mood as a form of music therapy.

---

## Key Tasks

### 1. Data Collection
- **Facial Expression Data**:
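  - Collect or access datasets of labeled facial expressions such as FER2013 or AffectNet, or build custom datasets. These public datasets use basic-emotion labels (e.g., happy, sad, angry, fearful) and have no "stressed" class, so stress must either be mapped from existing labels or annotated in custom data.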
- **Voice Emotion Data**:
  - Use datasets such as RAVDESS or CREMA-D, or custom-recorded voice samples, to learn how emotion is expressed through acoustic cues in speech (e.g., pitch, tone, speaking rate).
- **Text Sentiment Data**:
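  - Leverage text datasets such as IMDB reviews, Twitter sentiment data, or custom journal entries. IMDB and most Twitter sentiment sets carry only positive/negative polarity labels, so detecting specific emotions such as stress or sadness requires emotion-labeled corpora or annotated journal entries.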
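Because each public dataset uses its own label set, a practical first step is mapping those labels onto the app's target moods. The sketch below is a minimal example that assumes the documented RAVDESS file-naming convention (emotion code in the third of seven hyphen-separated fields) and the standard FER2013 class order (0-6). The function names are hypothetical, and the mapping to app moods, including treating anger and fear as proxies for stress, is an illustrative assumption that would need validation.

```python
from pathlib import Path
from typing import Optional

# RAVDESS encodes emotion in the third hyphen-separated field of each file name,
# e.g. "03-01-06-01-02-01-12.wav" -> emotion code "06" (fearful).
RAVDESS_EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}

# FER2013 stores emotion as an integer class index in this order.
FER2013_EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

# Hypothetical mapping onto the app's moods. Neither dataset has a "stress"
# label, so anger/fear serve as a rough proxy (an assumption, not an established
# equivalence). Labels not listed here map to None and would be dropped.
APP_MOOD = {
    "happy": "happiness",
    "sad": "sadness",
    "angry": "stress",
    "fearful": "stress",
    "fear": "stress",
    "neutral": "neutral",
    "calm": "neutral",
}


def ravdess_label(path: str) -> str:
    """Return the RAVDESS emotion name encoded in a file name."""
    fields = Path(path).stem.split("-")
    if len(fields) != 7:
        raise ValueError(f"unexpected RAVDESS file name: {path}")
    return RAVDESS_EMOTIONS[fields[2]]


def fer2013_label(index: int) -> str:
    """Return the FER2013 emotion name for a class index (0-6)."""
    return FER2013_EMOTIONS[index]


def to_app_mood(label: str) -> Optional[str]:
    """Map a dataset-specific emotion name to an app mood, or None if unused."""
    return APP_MOOD.get(label)


print(to_app_mood(ravdess_label("Actor_12/03-01-06-01-02-01-12.wav")))
print(to_app_mood(fer2013_label(3)))
```

Keeping the mapping in one table makes it easy to revise once real user data shows whether these proxies are reasonable.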

### 2. Model Selection & Prototyping
- **Facial Expression Model**:
  - Start from pre-trained CNN backbones such as VGG16 or ResNet and fine-tune them on facial expression data, or use networks built specifically for emotion recognition.
  - Experiment with OpenCV or MediaPipe for real-time face detection and landmark extraction; the detected face region is then passed to the expression classifier.
- **Voice Emotion Recognition**:
  - Use recurrent models such as LSTMs for emotion recognition in voice data.
  - Pre-process audio with tools like librosa to extract features such as MFCCs (Mel-frequency cepstral coefficients), which the model consumes as a time sequence.
- **Text Sentiment Analysis**:
  - Fine-tune transformer-based models such as BERT, RoBERTa, or DistilBERT for text emotion and sentiment classification.
  - As a baseline, train traditional classifiers such as SVM or Naive Bayes on bag-of-words or TF-IDF features.
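The traditional text baseline can be assembled in a few lines. This is a minimal sketch assuming scikit-learn is available; `build_baseline` is a hypothetical helper, and the six training sentences are invented placeholders, so the fitted model says nothing about real-world accuracy.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def build_baseline(kind: str = "svm"):
    """TF-IDF features (unigrams + bigrams) followed by a linear SVM or Naive Bayes."""
    vectorizer = TfidfVectorizer(lowercase=True, ngram_range=(1, 2))
    classifier = LinearSVC() if kind == "svm" else MultinomialNB()
    return make_pipeline(vectorizer, classifier)


# Invented journal-style examples for illustration only; a real baseline needs
# an emotion-labeled corpus and a held-out test split.
texts = [
    "deadline tomorrow and I can't stop worrying",
    "so much pressure at work, my chest feels tight",
    "I miss her and everything feels empty",
    "crying again tonight, nothing seems worth it",
    "great day with friends, laughed a lot",
    "got the job offer, feeling fantastic",
]
labels = ["stress", "stress", "sadness", "sadness", "happiness", "happiness"]

model = build_baseline("svm")
model.fit(texts, labels)
print(model.predict(["worrying about the deadline at work"]))
```

Passing `"nb"` instead of `"svm"` gives the Naive Bayes variant; `MultinomialNB` accepts TF-IDF input because the features are non-negative. Either pipeline provides a cheap reference score against which the fine-tuned transformer models can be compared.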

### 3. Model Training
- **Multi-modal Fusion**:
  - Develop separate models for facial, voice, and text analysis and evaluate each one individually.
  - Combine their outputs with late (decision-level) fusion, for example a weighted average of each model's class probabilities, and check whether the fused prediction is more accurate than the best single modality.
- **Evaluation Metrics**:
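  - Use accuracy, precision, recall, and F1 score to evaluate the emotion detection models, reporting per-class and macro-averaged scores because emotion datasets are often class-imbalanced.
  - Use confusion matrices to see which emotion classes are most often confused with one another.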
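The fusion and evaluation steps can be illustrated together. This sketch assumes every modality model outputs a probability distribution over the same, hypothetically ordered classes, and it uses NumPy plus scikit-learn's metrics. The probabilities, ground-truth labels, and modality weights are made up for illustration; in practice the weights would be tuned on validation data.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

MOODS = ["stress", "sadness", "happiness"]  # hypothetical class order shared by all models


def late_fusion(probs_by_modality, weights):
    """Decision-level (late) fusion: weighted average of per-modality probabilities.

    probs_by_modality maps a modality name to an (n_samples, n_classes) array whose
    rows are probability distributions over MOODS, in the same class order.
    """
    total = sum(weights[m] for m in probs_by_modality)
    fused = sum(weights[m] * np.asarray(p, dtype=float) for m, p in probs_by_modality.items())
    return fused / total


# Made-up softmax outputs for four samples, for illustration only.
probs = {
    "face":  [[0.7, 0.2, 0.1], [0.2, 0.5, 0.3], [0.1, 0.1, 0.8], [0.4, 0.4, 0.2]],
    "voice": [[0.6, 0.3, 0.1], [0.3, 0.6, 0.1], [0.2, 0.2, 0.6], [0.5, 0.2, 0.3]],
    "text":  [[0.5, 0.4, 0.1], [0.1, 0.7, 0.2], [0.1, 0.3, 0.6], [0.2, 0.6, 0.2]],
}
weights = {"face": 0.4, "voice": 0.3, "text": 0.3}  # hypothetical weights

fused = late_fusion(probs, weights)
y_pred = fused.argmax(axis=1)
y_true = np.array([0, 1, 2, 0])  # made-up ground truth (indices into MOODS)

# Macro-averaged F1 weights every class equally, which matters for imbalanced data.
macro_f1 = f1_score(y_true, y_pred, average="macro")
# Rows are true classes, columns are predicted classes, both in MOODS order.
cm = confusion_matrix(y_true, y_pred, labels=list(range(len(MOODS))))
print(f"macro F1 = {macro_f1:.3f}")
print(cm)
```

Running it prints a macro F1 of 0.778 and a confusion matrix whose stress row is `[1 1 0]`: the fourth sample, truly stress, is fused to sadness (0.40 vs 0.37), which is exactly the kind of confusion the matrix is meant to surface.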

### 4. Music Recommendation System (Baseline)
- **Music Tagging**:
  - Categorize music tracks based on mood (e.g., relaxing, uplifting, energizing).
  - Use an existing API (e.g., the Spotify Web API) to access mood-based playlists, or build a custom dataset of tracks with mood tags.
- **Initial Recommendation Logic**:
  - Develop simple recommendation logic: if stress is detected, play relaxing music; if sadness is detected, play uplifting music, etc.
  - Personalize recommendations over time based on user feedback and preferences.
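The rule-based logic and feedback-driven personalization can be sketched as a small class. This is a minimal sketch under stated assumptions: the catalog is a plain dict of hypothetical track IDs to mood tags (in practice populated by the tagging step), the happiness-to-energizing rule is a hypothetical addition beyond the two rules above, and personalization is a simple per-user like/skip counter rather than a full recommendation algorithm.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Baseline rules: detected mood -> music category.
# stress -> relaxing and sadness -> uplifting come from the plan;
# happiness -> energizing is a hypothetical extra rule.
MOOD_TO_CATEGORY = {
    "stress": "relaxing",
    "sadness": "uplifting",
    "happiness": "energizing",
}


class BaselineRecommender:
    """Rule-based recommender with a simple per-user feedback score."""

    def __init__(self, catalog: Dict[str, str]):
        # catalog maps a (hypothetical) track ID to its mood tag, e.g. "relaxing".
        self.catalog = catalog
        # scores[(mood, track_id)] accumulates +1 (liked) / -1 (skipped) feedback.
        self.scores: Dict[Tuple[str, str], int] = defaultdict(int)

    def candidates(self, mood: str) -> List[str]:
        """Tracks whose tag matches the category the rules assign to this mood."""
        category = MOOD_TO_CATEGORY.get(mood)
        return [track for track, tag in self.catalog.items() if tag == category]

    def recommend(self, mood: str) -> Optional[str]:
        tracks = self.candidates(mood)
        if not tracks:
            return None  # no rule, or no tagged tracks, for this mood
        # Highest feedback score wins; ties keep catalog order (max returns the first).
        return max(tracks, key=lambda track: self.scores[(mood, track)])

    def feedback(self, mood: str, track_id: str, liked: bool) -> None:
        self.scores[(mood, track_id)] += 1 if liked else -1


catalog = {"rain_sounds": "relaxing", "piano_calm": "relaxing", "sunny_pop": "uplifting"}
rec = BaselineRecommender(catalog)
print(rec.recommend("stress"))  # "rain_sounds" (first in catalog order)
rec.feedback("stress", "rain_sounds", liked=False)
print(rec.recommend("stress"))  # "piano_calm" after the skip
```

Keying scores by `(mood, track)` lets a track be disliked while stressed but still liked while sad; a production system would replace the counter with a learned preference model.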

---

## Timeline
- **Weeks 1-2**: Data collection and dataset exploration for facial, voice, and text sentiment analysis.
- **Week 3**: Model selection, experimentation, and prototyping for individual modalities.
- **Week 4**: Model training, tuning, and evaluation.
- **Week 5**: Basic implementation of music recommendation logic and testing.
- **Week 6**: Multi-modal model fusion and initial testing.

---

## Next Steps
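- After the discovery phase, refine the models by integrating additional data sources (e.g., physiological signals from wearable devices).
- Improve model accuracy through a continuous feedback loop: collect user feedback on mood predictions and recommendations, and use it to retrain or fine-tune the models.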
- Begin development of real-time notification and seamless music playback systems.

---

## Tools & Frameworks
- **Facial Expression Analysis**: OpenCV, MediaPipe, TensorFlow/Keras, PyTorch.
- **Voice Analysis**: librosa, pyAudioAnalysis, RNN/LSTM architectures.
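- **Text Sentiment**: Hugging Face Transformers, spaCy, and traditional NLP techniques (e.g., TF-IDF with SVM or Naive Bayes via scikit-learn).
- **Music Recommendation**: Spotify Web API, YouTube Data API (YouTube Music has no official public API), personalized recommendation algorithms.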
