Speech Processing in Real-Time with CNNs

Overview

This repo contains the source code and pdf my MSc Data Science dissertation project, Speech Processing in Real-Time with Convolutional Neural Networks. The project was a module of study for The University of Birmingham 2024-2025. The project successfully solved speech processing tasks by creating lightweight CNNs that can make inference in real-time. These CNNs are in contrast to current SOTA solutions (at time of writing) which are much larger and can't make accurate inference outside of their development environments.

Set-Up

Download and install Python 3.9.13 from the official website: https://www.python.org/downloads/release/python-3913/
Open the terminal.
Navigate to the project root folder (cxb1114/):

cd path/to/cxb1114

Create a virtual environment called .venv using Python 3.9.13:

python3.9 -m venv .venv

(If python3.9 is not in your PATH on Windows, use the full path to the Python executable, e.g. C:\Python39\python.exe -m venv .venv)

Activate the virtual environment.

Windows:

.venv\Scripts\activate.bat

macOS/Linux:

source .venv/bin/activate

Installed required python packages:

pip install -r requirements.txt

Repository Structure

Practical work, experiments and model training can be found in notebooks/.
The model was trained on over 10GB of data which was in data/ but is too large to host on GitLab.
Source code, including model files, helper functions and classes can be found in src/.
thesis/ is the LaTeX source for the final project report.
The apps which run the models are found in app/. More about this in "Setting up the apps" below.

|-- app/
|   |-- macOS/
|   |   `-- run_VAD.sh
|   |
|   |-- Win/
|   |   `-- run_VAD.bat
|   |
|
|-- data/
|   |-- data.wav
|   `-- ...
|
|-- notebooks/
|   |-- DataWrangling.ipynb
|   |-- HPTuning.ipynb
|   |-- ModelTraining1_5Class.ipynb
|   |-- ModelTraining2_4Class.ipynb
|   |-- ModelTraining3_VAD.ipynb
|   `-- Visuals.ipynb
|
|-- src/
|   |-- app_Count.py
|   |-- app_VAD.py
|   |-- data.py
|   |-- model.py
|   |
|   |-- utils/
|   |   |-- helpers.py
|   |   |-- SpectrogramExtractor.py
|   |   `-- predict.py
|   |
|   `-- model/
|       |-- ConvCount_5_OptunaResults_F1_234.pt
|       |-- ConvCount_full.pt
|       |-- ConvCount_postVAD.pt
|       `-- ConvCount_VAD.pt
|
|-- thesis/
|   `-- Dissertation.pdf
|
|-- demo.pptx
`-- requirements.txt

Setting up the apps

There are two VAD scripts, for both Windows and macOS. These can be accessed in app/. When ran (double click) the scripts will open an app in your browser (localhost).

Note for macOS Users

The files must be made executable first:

chmod +x app/macOS/*.sh.

The macOS scripts couldn't be tested by the author; please report any issues and I'll assist in a solution.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Speech Processing in Real-Time with CNNs

Overview

Set-Up

Repository Structure

Setting up the apps

Note for macOS Users

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
app		app
data		data
notebooks		notebooks
src		src
thesis		thesis
.gitignore		.gitignore
FinalReport.pdf		FinalReport.pdf
README.md		README.md
demo.pptx		demo.pptx
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

Speech Processing in Real-Time with CNNs

Overview

Set-Up

Repository Structure

Setting up the apps

Note for macOS Users

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages