This is a machine learning and audio signal processing project that classifies a real-time audio signal as speech or music using a deep neural network (DNN) and a convolutional neural network (CNN). The long-term goal is to create an AI personal assistant that listens to audio streams and summarizes their content for the end user.
The project uses the GTZAN music/speech collection dataset.
All the WAV audio files should be extracted to the Data/Files folder.
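Before running the preprocessing step, it can help to confirm that the WAV files landed where expected. The following is a minimal sketch, assuming the `Data/Files` layout described above; `list_wav_files` is a hypothetical helper for illustration and is not part of the project code.

```python
from pathlib import Path


def list_wav_files(root="Data/Files"):
    """Return every .wav file under root, sorted by path.

    The extension match is case-insensitive, and a missing folder
    yields an empty list instead of raising.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    return sorted(
        p for p in root_path.rglob("*")
        if p.is_file() and p.suffix.lower() == ".wav"
    )
```

Calling `len(list_wav_files())` from the repository root gives a quick count of the extracted files; a result of 0 suggests the archive was extracted somewhere else.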
Python Version
Python 3.9.12
Installing Virtual Environment
python -m pip install --user virtualenv
Creating New Virtual Environment
python -m venv envname
Activating Virtual Environment
source envname/bin/activate
Upgrade PIP
python -m pip install --upgrade pip
Installing Packages
python -m pip install -r requirements.txt
pip install PyAudio
#Data preprocessing
python main.py -s p
#Model Training
python main.py -s t
#Real-time Demonstration
python main.py -s r
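The three commands above differ only in the value passed to `-s`. How `main.py` actually handles this flag is not shown here, so the following is only a sketch of one way such a stage selector could be written with `argparse`; the `STAGES` mapping, the stage names, and `parse_stage` are assumptions made for illustration.

```python
import argparse

# Hypothetical mapping from the -s value to a stage name.
# The real main.py may organize its stages differently.
STAGES = {
    "p": "preprocess",
    "t": "train",
    "r": "realtime",
}


def parse_stage(argv=None):
    """Parse the -s flag and return the name of the selected stage."""
    parser = argparse.ArgumentParser(description="Speech/music classifier")
    parser.add_argument(
        "-s",
        dest="stage",
        choices=sorted(STAGES),
        required=True,
        help="p = data preprocessing, t = model training, r = real-time demo",
    )
    args = parser.parse_args(argv)
    return STAGES[args.stage]
```

With this sketch, `parse_stage(["-s", "t"])` returns `"train"`, and any value outside `p`, `t`, `r` makes `argparse` exit with a usage message.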
DNN Model Summary
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 32) 8224
dense_1 (Dense) (None, 64) 2112
dense_2 (Dense) (None, 128) 8320
dense_3 (Dense) (None, 256) 33024
dense_4 (Dense) (None, 512) 131584
dense_5 (Dense) (None, 256) 131328
dense_6 (Dense) (None, 128) 32896
dropout (Dropout) (None, 128) 0
dense_7 (Dense) (None, 64) 8256
dense_8 (Dense) (None, 2) 130
=================================================================
Total params: 355,874
Trainable params: 355,874
Non-trainable params: 0
_________________________________________________________________
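The parameter counts in the dense-layer summary above can be reproduced by hand, since a fully connected layer has one weight per input-output pair plus one bias per output. The sketch below does that in plain Python. The input size of 256 is not stated in the summary; it is inferred from the first layer's count, 8224 = (256 + 1) * 32, and should be treated as an assumption.

```python
def dense_params(n_in, n_out):
    """Parameters of a dense layer: n_in * n_out weights plus n_out biases."""
    return n_in * n_out + n_out


def dnn_param_counts(input_dim, units):
    """Per-layer parameter counts for a stack of dense layers."""
    counts = []
    n_in = input_dim
    for n_out in units:
        counts.append(dense_params(n_in, n_out))
        n_in = n_out
    return counts


# Assumed input size, inferred from the first layer's parameter count.
INPUT_DIM = 256
# Units of dense .. dense_8 as listed in the summary. The dropout layer
# is omitted because it has no parameters and does not change the shape.
UNITS = [32, 64, 128, 256, 512, 256, 128, 64, 2]

counts = dnn_param_counts(INPUT_DIM, UNITS)
total = sum(counts)
print(counts)
print(total)
```

The per-layer values and their sum match the `Param #` column and the `Total params` line of the summary.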
CNN Model Summary
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 101, 1290, 32) 320
max_pooling2d (MaxPooling2D) (None, 50, 645, 32) 0
conv2d_1 (Conv2D) (None, 48, 643, 64) 18496
max_pooling2d_1 (MaxPooling2D) (None, 24, 321, 64) 0
conv2d_2 (Conv2D) (None, 22, 319, 64) 36928
flatten (Flatten) (None, 449152) 0
dense (Dense) (None, 64) 28745792
dense_1 (Dense) (None, 2) 130
=================================================================
Total params: 28,801,666
Trainable params: 28,801,666
Non-trainable params: 0
_________________________________________________________________
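The output shapes and parameter counts in the convolutional summary above can be traced layer by layer. The sketch below assumes 3x3 kernels with stride 1 and no padding, 2x2 max pooling, and a single-channel input of shape (103, 1292, 1). None of these are stated in the summary; they are inferred because they reproduce the listed shapes and counts.

```python
def conv2d_valid(shape, filters, k=3):
    """Output shape and parameter count of a stride-1, unpadded k x k convolution."""
    h, w, c_in = shape
    params = (k * k * c_in + 1) * filters  # weights plus one bias per filter
    return (h - k + 1, w - k + 1, filters), params


def max_pool(shape, size=2):
    """Output shape of non-overlapping max pooling (floor division)."""
    h, w, c = shape
    return (h // size, w // size, c)


def dense_params(n_in, n_out):
    """Parameters of a dense layer: weights plus biases."""
    return n_in * n_out + n_out


shape = (103, 1292, 1)  # assumed input shape, inferred from the first Conv2D output
shapes, params = [], []

for filters, pool_after in [(32, True), (64, True), (64, False)]:
    shape, p = conv2d_valid(shape, filters)
    shapes.append(shape)
    params.append(p)
    if pool_after:
        shape = max_pool(shape)
        shapes.append(shape)

flat = shape[0] * shape[1] * shape[2]   # size after the Flatten layer
params.append(dense_params(flat, 64))   # dense
params.append(dense_params(64, 2))      # dense_1
total = sum(params)
print(shapes, flat, params, total)
```

The trace shows that the first dense layer after `flatten` holds almost all of the network's parameters, because it connects 449,152 flattened features to 64 units.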
Running Tests
python -m pytest --verbose
| Model | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|
| DNN Model | 0.9812 | 0.9980 | 0.9647 | 0.9810 |
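The F1-score in the table is the harmonic mean of precision and recall, so it can be cross-checked from the other two columns. The sketch below is a sanity check on the reported numbers, not the project's evaluation code; `f1_from` is a hypothetical helper name.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values taken from the DNN row of the table above.
f1 = f1_from(0.9980, 0.9647)
print(round(f1, 4))
```

Because the table's precision and recall are already rounded to four decimals, the recomputed value can differ from the reported 0.9810 in the last digit.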