This repository contains a deep learning model for recognizing emotion in speech. The model is trained on features generated by the Mel-Frequency Cepstral Coefficients (MFCC) transform, which emphasizes the parts of an audio signal that are most important for human speech perception while discarding less relevant information. Training uses the Toronto Emotional Speech Set (TESS), and the model reaches a maximum accuracy of 48%. The project also offers some insight into how humans perceive emotion in speech; for background, see the "Feature Extraction" section of my write-up: https://shorturl.at/rtFOT
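Conceptually, MFCC extraction works in a few steps: take the spectrum of a short frame, map it onto the mel scale (which mimics how humans perceive pitch), apply log compression, and decorrelate with a DCT. The following NumPy-only sketch illustrates these steps; the frame length, filter count, and coefficient count are illustrative choices, not the values used in this repository:

```python
import numpy as np

def hz_to_mel(f):
    # The mel scale compresses high frequencies, mimicking human pitch perception
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale
    mel_points = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for j in range(left, center):
            fbank[i - 1, j] = (j - left) / max(center - left, 1)
        for j in range(center, right):
            fbank[i - 1, j] = (right - j) / max(right - center, 1)
    return fbank

def mfcc_frame(frame, sr, n_filters=26, n_coeffs=13):
    n_fft = len(frame)
    # 1) Power spectrum of the windowed frame
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2
    # 2) Mel filterbank energies, 3) log compression
    energies = np.log(mel_filterbank(n_filters, n_fft, sr) @ spectrum + 1e-10)
    # 4) DCT-II to decorrelate; keep only the first n_coeffs coefficients
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * n + 1) / (2 * n_filters))
    return dct @ energies

# Example: one 512-sample frame of a 440 Hz tone sampled at 16 kHz
sr = 16000
t = np.arange(512) / sr
coeffs = mfcc_frame(np.sin(2 * np.pi * 440 * t), sr)
print(coeffs.shape)  # (13,)
```

In practice, librosa performs all of this in a single call, as shown later in the setup instructions.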
1) Python
2) NumPy: For numerical computations and array manipulation.
3) librosa: For audio processing and feature extraction (MFCC).
4) TensorFlow (with its Keras API): For building and training the deep learning model.
5) Streamlit: For building the user interface and deployment.
6) Anaconda: For managing environments and dependencies (optional but recommended).
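To show how these pieces fit together, here is a hedged sketch of a Keras classifier over averaged MFCC features. The layer sizes, the 40-feature input, and the 7-class output (TESS contains seven emotion categories) are illustrative assumptions, not the repository's actual architecture:

```python
from tensorflow.keras import layers, models

# Illustrative architecture: a small dense network mapping a 40-dimensional
# MFCC feature vector to 7 emotion classes. Layer sizes are assumptions.
model = models.Sequential([
    layers.Input(shape=(40,)),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.3),          # regularization against overfitting
    layers.Dense(64, activation="relu"),
    layers.Dense(7, activation="softmax"),  # one probability per emotion
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
print(model.output_shape)  # (None, 7)
```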
1) Setup Environment
a) Create a new Anaconda environment for the project:
conda create --name emotion_recognition python=3.8
conda activate emotion_recognition
b) Install the following required libraries:
pip install numpy librosa tensorflow streamlit
2) Deployment with Streamlit:
a) Create a Streamlit app file (e.g., app.py) to serve as the user interface.
b) Place the trained model file in the same folder as app.py.
c) In app.py, import the necessary libraries and load the trained model.
d) Run the Streamlit app locally using the command streamlit run app.py.