This project implements an Audio Language Model (ALM) that jointly recognizes, understands, and reasons over speech and non-speech audio.
Developed by: Akshay Rathod
- Speech Recognition for Asian languages (Mandarin, Urdu, Hindi, Telugu, Tamil, Bangla) and English
- Non-Speech Audio Understanding (music, alarms, environmental noises)
- Speaker Diarization (differentiating between speakers)
- Paralinguistic Analysis (emotion, tone, hesitation)
- Audio Event Detection (car honking, dog barking, aircraft sounds, etc.)
- Joint understanding of speech and non-speech elements for complex reasoning
- Web interface for easy access and deployment
├── data/
│   ├── raw/
│   ├── processed/
│   └── datasets.py
├── models/
│   ├── alm_model.py
│   ├── speech_encoder.py
│   ├── audio_encoder.py
│   └── fusion_module.py
├── training/
│   ├── train.py
│   └── trainer.py
├── utils/
│   ├── preprocessing.py
│   └── evaluation.py
├── config/
│   └── config.yaml
├── templates/
│   └── index.html
├── main.py
├── web_app.py
├── deploy.py
└── requirements.txt
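The models/ layout above suggests a two-encoder design: speech_encoder.py and audio_encoder.py each produce feature sequences that fusion_module.py combines before reasoning. As a rough sketch of one common way to implement such fusion (cross-attention), where all class and parameter names are illustrative assumptions rather than the repository's actual API:

    import torch
    import torch.nn as nn

    class CrossAttentionFusion(nn.Module):
        """Hypothetical fusion of speech and non-speech audio features.

        Illustrative only; the repository's fusion_module.py may differ.
        """

        def __init__(self, dim: int = 512, num_heads: int = 8):
            super().__init__()
            # Speech tokens attend over non-speech audio tokens.
            self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.norm = nn.LayerNorm(dim)

        def forward(self, speech: torch.Tensor, audio: torch.Tensor) -> torch.Tensor:
            # speech: (batch, T_speech, dim); audio: (batch, T_audio, dim)
            fused, _ = self.cross_attn(speech, audio, audio)
            return self.norm(speech + fused)  # residual + layer norm

    # Example: fuse 100 speech frames with 50 non-speech audio frames.
    fusion = CrossAttentionFusion()
    out = fusion(torch.randn(2, 100, 512), torch.randn(2, 50, 512))
    print(out.shape)  # torch.Size([2, 100, 512])

Cross-attention of this kind lets each speech frame gather context from concurrent non-speech events, which is what joint understanding of the two streams relies on.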
- Python 3.8+
- PyTorch 1.9+
- Transformers
- Librosa
- SoundFile
- NumPy
- Pandas
- Flask
- Gunicorn
- Clone the repository:
  git clone https://github.com/Akshay-Notfound/Deep-Learning-based-ALM.git
  cd Deep-Learning-based-ALM
- Install the required dependencies:
  pip install -r requirements.txt
- (Optional) Initialize the project with sample data:
  python init_project.py
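To confirm the core dependencies installed correctly, a quick optional sanity check (not part of the repository) is:

    python -c "import torch, transformers, librosa, soundfile, flask; print(torch.__version__)"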
The project includes a web interface built with Flask for easy access to the ALM functionality.
To run the web interface locally:
- Start the web application:
  python web_app.py
- Open your browser and navigate to http://localhost:5000
To run the application in development mode:
python deploy.py --mode dev
To create production deployment scripts:
python deploy.py --mode prod
This generates deploy.sh (for Linux/macOS) and deploy.bat (for Windows) scripts for deploying the application in a production environment.
For manual deployment, you can use Gunicorn:
gunicorn --bind 0.0.0.0:8000 --workers 4 web_app:app
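As a rule of thumb from Gunicorn's documentation, set --workers to roughly (2 × CPU cores) + 1 and adjust for your host.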
Run the ALM system from the command line:
python main.py --config config/config.yaml --checkpoint path/to/checkpoint.pt --audio path/to/audio.wav
- Upload an audio file using the web interface
- Optionally, ask a question about the audio content
- View the analysis results including speech recognition, audio events, speaker diarization, and paralinguistic analysis
- GET / - Main web interface
- POST /analyze - Analyze an uploaded audio file
- GET /api/status - Check API status
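The analyze endpoint can also be called programmatically. The sketch below assumes a multipart form with a file field and an optional question field; check web_app.py for the actual field names:

    import requests

    # Hypothetical client for the local web app; field names are assumptions.
    with open("path/to/audio.wav", "rb") as f:
        resp = requests.post(
            "http://localhost:5000/analyze",
            files={"file": f},
            data={"question": "What sounds occur besides speech?"},
        )
    resp.raise_for_status()
    print(resp.json())  # analysis results returned by the server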
Run the demo to see a simulation of the ALM capabilities without requiring heavy dependencies:
python alm_demo.py
This project is licensed under the MIT License - see the LICENSE file for details.
Akshay Rathod - Final Year Project
Copyright (c) 2025 Akshay Rathod. All rights reserved.
This Audio Language Model (ALM) system is provided "as is" without warranty of any kind, either express or implied. The developer makes no representations or warranties regarding the accuracy, reliability, or suitability of the system for any purpose. Use of this system is at your own risk.
In no event shall the developer be liable for any direct, indirect, incidental, special, exemplary, or consequential damages arising out of the use or inability to use this system.
The project also includes a Streamlit web interface for easy access to the ALM functionality. To run it:
- Install the required dependencies:
  pip install streamlit librosa
- Run the Streamlit app:
  streamlit run streamlit_app.py
- If the streamlit command is not found, try:
  python -m streamlit run streamlit_app.py
- On Windows, you might need to specify Python 3.10 explicitly:
  py -3.10 -m streamlit run streamlit_app.py
- The app will open in your default browser at http://localhost:8501
The Streamlit interface provides:
- File upload for audio analysis
- Question answering about audio content
- Visual results display
- Demo mode to see sample outputs
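As a rough idea of how such an interface fits together, a minimal upload-and-analyze page might look like the sketch below (illustrative only; streamlit_app.py in the repository is the actual implementation):

    import librosa
    import streamlit as st

    st.title("Audio Language Model Demo")

    uploaded = st.file_uploader("Upload an audio file", type=["wav", "mp3", "flac"])
    question = st.text_input("Ask a question about the audio (optional)")

    if uploaded is not None:
        st.audio(uploaded)  # playback widget
        uploaded.seek(0)    # rewind before decoding
        # 16 kHz is a common sample rate for speech models.
        audio, sr = librosa.load(uploaded, sr=16000)
        st.write(f"Loaded {len(audio) / sr:.1f} s of audio at {sr} Hz")
        if question:
            st.write("Question:", question)
        # A real app would run the ALM here and render its analysis.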