Repository for the final project of Team 7 in 11785 Introduction to Deep Learning (Spring 2021)
In this project, the team attempted to build an end-to-end model that generates speaker-labeled transcripts. The training data was obtained from the anime movie Kimi no Na wa (KNNW).
- Log spectrograms of the original KNNW audio soundtrack
- Original transcript, labeled by speaker
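As a rough illustration of the audio input format, the sketch below computes a log-magnitude spectrogram with a Hann-windowed short-time Fourier transform in NumPy. It is not the team's preprocessing code: the function name `log_spectrogram`, the 16 kHz sample rate, the 400-sample window (25 ms), the 160-sample hop (10 ms), and the `eps` offset are all illustrative assumptions.

```python
import numpy as np

def log_spectrogram(signal, frame_len=400, hop=160, eps=1e-10):
    """Log-magnitude spectrogram via a Hann-windowed STFT.

    Hypothetical helper: frame_len=400 and hop=160 mean 25 ms windows with a
    10 ms hop at an assumed 16 kHz sample rate.
    """
    signal = np.asarray(signal, dtype=np.float64)
    # Pad very short clips so at least one full frame exists.
    if len(signal) < frame_len:
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    # Index matrix of overlapping frames: shape (n_frames, frame_len).
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * window
    # Magnitude of the one-sided FFT: shape (n_frames, frame_len // 2 + 1).
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # eps keeps log() finite for silent bins.
    return np.log(magnitude + eps)

# Usage: one second of a 1 kHz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)
spec = log_spectrogram(tone)
print(spec.shape)  # (98, 201): 98 frames x 201 frequency bins
```

With 400-sample frames at 16 kHz each bin spans 40 Hz, so the 1 kHz tone peaks in bin 25 of every frame.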
- Modified LAS (Listen, Attend and Spell) model for speech recognition, trained with transfer learning
- CNN-LSTM model for speaker identification
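For context on the LAS listener, the standard design stacks pyramidal BLSTM layers and, between them, halves the time resolution by concatenating each pair of adjacent frames. The NumPy sketch below shows only that reshaping step; the BLSTM layers are omitted, the function name `pyramid_reduce` is hypothetical, and whether the team's modified LAS keeps this step is not stated in this README.

```python
import numpy as np

def pyramid_reduce(x):
    """Halve the time axis by concatenating adjacent frames.

    x: array of shape (T, D). Returns shape (T // 2, 2 * D).
    An odd trailing frame is dropped, as in a common pBLSTM implementation.
    """
    T, D = x.shape
    T_even = T - (T % 2)
    # Row-major reshape places frames 2i and 2i+1 side by side in row i.
    return x[:T_even].reshape(T_even // 2, 2 * D)

# Usage: three reductions shrink 98 frames of 40-dim features by ~8x in time.
feats = np.random.randn(98, 40)
reduced = feats
for _ in range(3):
    reduced = pyramid_reduce(reduced)
print(reduced.shape)  # (12, 320)
```

Shortening the encoder output this way gives the attention decoder far fewer time steps to attend over, which is the main reason LAS uses a pyramid.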
- Achieved an average Levenshtein distance of 15.27 for speech recognition
- Achieved an average classification accuracy of 57% for speaker identification
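The two metrics above can be computed roughly as follows. This is a sketch, not the team's evaluation code: it assumes character-level Levenshtein distance averaged over utterances and plain per-sample accuracy for speaker labels, and the function names and example labels are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]

def average_distance(predictions, references):
    """Mean Levenshtein distance over paired transcripts."""
    total = sum(levenshtein(p, r) for p, r in zip(predictions, references))
    return total / len(references)

def accuracy(predicted_labels, true_labels):
    """Fraction of samples whose predicted speaker matches the true speaker."""
    correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)

print(average_distance(["kitten", "flaw"], ["sitting", "lawn"]))  # 2.5
print(accuracy(["A", "B", "A", "A"], ["A", "A", "A", "A"]))       # 0.75
```

Keeping only two rows of the dynamic-programming table reduces memory from O(len(a) x len(b)) to O(len(b)), which matters little for single utterances but helps when scoring long transcripts.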
- Download the zipped code
- Use KNNW_end2end.ipynb to train the models and generate results
- Modify the source code loading section as needed.
- Start training sessions with different hyperparameters.