IDL-team-7: Listen, Attend, and Spell: Transcript Generation in Anime

Repository for the final project of Team 7 in 11785 Introduction to Deep Learning (S21).

In this project, the team attempted to build an end-to-end model that generates speaker-labeled transcripts. The training data comes from the anime movie Kimi no Na wa (KNNW).

Data:

Log spectrogram of the KNNW original audio soundtrack
Labeled original transcript
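
To illustrate the first data item, here is a minimal sketch of how a log spectrogram can be computed from raw audio samples. It is not taken from preprocess.py: the function name `log_spectrogram`, the frame length, the hop size, and the toy sine-wave input are all assumptions chosen for the example, and a real pipeline would normally use an FFT library rather than the naive DFT shown here.

```python
import cmath
import math


def log_spectrogram(samples, frame_len=64, hop=32, eps=1e-10):
    """Return a list of frames, each a list of log-magnitude values.

    Hypothetical helper for illustration only. Each frame is Hann-windowed,
    transformed with a naive DFT, and reduced to the frame_len // 2 + 1
    non-negative frequency bins.
    """
    # Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(n_bins):
            # Naive O(N^2) DFT; acceptable for a demo, too slow for real audio.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            # eps keeps the logarithm finite for silent bins.
            row.append(math.log(abs(acc) + eps))
        frames.append(row)
    return frames


# Toy input (assumed values): a pure tone that falls exactly on DFT bin 8.
sample_rate = 1024
tone_hz = 128  # bin index = tone_hz * frame_len / sample_rate = 8
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(256)]

spec = log_spectrogram(signal)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), len(spec[0]), peak_bin)
```

The result is a time-by-frequency grid: one row per frame and one column per frequency bin, which is the kind of input a speech model such as LAS typically consumes.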

Model:

Modified LAS model for speech recognition (with transfer learning)
CNN-LSTM model for speaker identification

Performance:

Achieved an average Levenshtein distance of 15.27 for speech recognition
Achieved an average classification accuracy of 57% for speaker identification
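
For readers unfamiliar with the two metrics above, the following sketch shows how an average Levenshtein (edit) distance and a classification accuracy can be computed. It is an illustration under assumptions, not the project's evaluation code: the function names are hypothetical, the toy strings and labels are made up, and the README does not state whether the distance was measured over characters or tokens (this sketch works on any sequence).

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute cost 1)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if item_a == item_b else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def mean_levenshtein(predictions, references):
    """Average edit distance over paired predictions and references."""
    if len(predictions) != len(references) or not references:
        raise ValueError("need two non-empty lists of equal length")
    total = sum(levenshtein(p, r) for p, r in zip(predictions, references))
    return total / len(references)


def accuracy(predicted_labels, true_labels):
    """Fraction of labels predicted correctly."""
    if len(predicted_labels) != len(true_labels) or not true_labels:
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)


# Toy data (hypothetical, not from the project).
avg_distance = mean_levenshtein(["abc", "abd"], ["abc", "abc"])
speaker_accuracy = accuracy(["A", "B", "A", "B"], ["A", "B", "B", "B"])
print(avg_distance, speaker_accuracy)
```

A lower average distance means the generated transcripts are closer to the references, while a higher accuracy means more utterances were assigned to the correct speaker.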

How to run:

Download the zipped code
Use KNNW_end2end.ipynb for training and generating results
Modify the source code loading section as needed.
Initialize training sessions with different parameters.
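
One way to organize the last step is to enumerate parameter combinations up front and start one training session per combination. The sketch below only builds the list of configurations. The helper `expand_grid` and every parameter name and value in it are hypothetical placeholders, not the notebook's or session.py's actual interface.

```python
import itertools


def expand_grid(grid):
    """Return one dict per combination of the values in grid.

    Hypothetical helper: grid maps a parameter name to a list of candidates.
    Keys are sorted so the order of the combinations is deterministic.
    """
    keys = sorted(grid)
    return [dict(zip(keys, values))
            for values in itertools.product(*(grid[k] for k in keys))]


# Placeholder parameter names and values, chosen only for illustration.
grid = {
    "learning_rate": [1e-3, 1e-4],
    "batch_size": [32, 64],
    "teacher_forcing": [0.9],
}

configs = expand_grid(grid)
for config in configs:
    # In the notebook, each config would be passed to a new training session.
    print(config)
```

Keeping each configuration as a plain dictionary makes it easy to log alongside the resulting metrics, so runs with different parameters can be compared afterwards.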

Keitokuch/IDL-team-g6
