Swar-Saransha ~ An Automatic Nepali Speech Recognition and Summarizer System

About This Project

A system which can convert Nepali speech to text and generate summary of text developed as major project for final year of computer engineering. This project introduces an innovative Nepali speech recognition and summarizer system that has the potential to revolutionize the way we process and analyze spoken information. Our system leverages the power of deep learning algorithms, including CNN, Resnet, BiLSTM, and wav2vec2, to transcribe speech into text with remarkable accuracy and speed. Additionally, the summarizer component of our system employs advanced natural language processing techniques, such as the TextRank Algorithm and Transformer mt5, to condense lengthy speeches into concise and informative summaries, making it easier to digest large amounts of information. This system has significant applications for various fields, including journalism, legal transcription, and business meetings. By streamlining the process of speech transcription and information extraction, our system can save time, increase productivity, and improve the overall quality of speech-to-text analysis.

Motivation Behind This Project

The Nepali language is spoken by more than 17 million people worldwide, making it one of the most widely spoken languages in South Asia. However, there is a significant need for automated speech recognition and summarization tools for Nepali, as there are currently limited options available for accurately transcribing and summarizing Nepali speech. By developing Swar-Saransha, we aim to address this need and provide a valuable tool for individuals, businesses, and organizations that rely on Nepali language communication. The project has the potential to greatly enhance access to information, improve language learning, and facilitate more efficient and effective communication in Nepal.

Features

Nepali Speech Recognition that generates transcript from input speech
Nepali Speech Summarization that generates summary from the generated transcript
Options to download the generated transcript and summary
Simple UI to perform automatic speech recognition and summarization

Tools and Technologies Used:

For Deep Learning ~ Python, Keras, Huggingface Transformers, Pytorch, Librosa, pandas, numpy, matplotlib
For Designs and Diagrams ~ Figma, StarUML
For Frontend ~ React JS, Bootstrap, HTML, CSS, JS
For Backend ~ Python, FastAPI
For Coding ~ Microsoft VS Code, Google Colaboratory, Kaggle
For Version Contol ~ GitHub
For Documentation ~ OverLeaf, Notion, Google Docs, Canva
For Project Management ~ ClickUp

Steps to setup and run the Swar Saransha web app in your local machine

Clone this repo

git clone https://github.com/JuJu2181/Automatic-Nepali-Speech-Recognition-and-Summarizer.git

Create virtual environment using Python version 3.9.10 and use it for the project

python -m venv myenv

P.S: "If you use some other Python versions you may face dependency errors when installing the requirements so it is advised to use Python 3.9.10"

Install all the requirements from requirements.txt

pip install -r requirements.txt

For setting up Backend navigate to backend folder

cd Web_App/backend

Download files for wav2vec2 model from here and place the files inside "nepalimodel/best_wav2vec_model" folder.
Your directory structure should look like this ⬇️
Download files for Abstractive summarizer from here and place the files inside "abstractive" folder.
Your directory structure should look like this ⬇️
Run this command to start backend server

uvicorn api:app --reload

For frontend, open a new terminal and navigate to the frontend folder

cd Web_App/frontend

Make sure you have react installed in your computer, then run following command to install all necessary react packages.

npm install --force

To open the webapp run frontend server while still keeping the backend server running

npm start

In the frontend once you get Model Load Success notification as in image below, you are all set to go and start using the webapp to perform speech recognition and summarization.
Check out the demo video below to learn more about how to use the web app.

Demo Video for Web App

Automatic.Nepali.Speech.Recognition.and.Summary.and.1.more.page.-.Personal.-.Microsoft.Edge.2023-03-17.22-25-50.mp4

Screen Shots For Web App

1. Home Page

2. Speech Recognition From File

3. Summarizer

4. Live Speech

5. Speech Recognition Output

5. Summarizer Output

5. Evaluation of Transcript

Directory Structure

Training folder has all files needed for model training
Web_App folder has all files needed for webapp(frontend and backend included) for real time testing

Project Members

Anish Shilpakar
Anjaan Khadka
Sudip Shrestha
Sachin Manandhar

Feel Free to contact us if you want to know more about how we trained and integrated these model.

Name		Name	Last commit message	Last commit date
Latest commit History 79 Commits
Training		Training
Web_App		Web_App
screenshots		screenshots
.gitignore		.gitignore
model_summary.txt		model_summary.txt
readme.md		readme.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Training

Training

Web_App

Web_App

screenshots

screenshots

.gitignore

.gitignore

model_summary.txt

model_summary.txt

readme.md

readme.md

requirements.txt

requirements.txt

Repository files navigation

Swar-Saransha ~ An Automatic Nepali Speech Recognition and Summarizer System

About This Project

Motivation Behind This Project

Features

Tools and Technologies Used:

Steps to setup and run the Swar Saransha web app in your local machine

Demo Video for Web App

Screen Shots For Web App

1. Home Page

2. Speech Recognition From File

3. Summarizer

4. Live Speech

5. Speech Recognition Output

5. Summarizer Output

5. Evaluation of Transcript

Directory Structure

Project Members

About

Releases

Packages

Contributors 3

Languages

JuJu2181/Automatic-Nepali-Speech-Recognition-and-Summarizer

Folders and files

Latest commit

History

Repository files navigation

Swar-Saransha ~ An Automatic Nepali Speech Recognition and Summarizer System

About This Project

Motivation Behind This Project

Features

Tools and Technologies Used:

Steps to setup and run the Swar Saransha web app in your local machine

Demo Video for Web App

Screen Shots For Web App

1. Home Page

2. Speech Recognition From File

3. Summarizer

4. Live Speech

5. Speech Recognition Output

5. Summarizer Output

5. Evaluation of Transcript

Directory Structure

Project Members

About

Topics

Resources

Stars

Watchers

Forks

Languages