This is an end-to-end system that trains a neural network to estimate time-frequency masks, which are then applied to remove noise from the signal recorded while the device is playing audio.
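As a rough illustration of what "masks" means here, the sketch below computes two training targets from the reference list: the ideal ratio mask (IRM) from Wang et al. (2014), and the MSE-optimal real-valued ratio mask, which we assume corresponds to the optimal ratio mask (ORM) of the last reference. It also applies a mask to a noisy spectrum. It assumes the clean-speech and noise STFTs are already available as complex NumPy arrays (for example from `scipy.signal.stft`). The function names are hypothetical and are not taken from this repository.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero in silent bins


def ideal_ratio_mask(speech_spec, noise_spec, beta=0.5):
    """IRM = (|S|^2 / (|S|^2 + |N|^2)) ** beta, with beta = 0.5 as in Wang et al."""
    s_pow = np.abs(speech_spec) ** 2
    n_pow = np.abs(noise_spec) ** 2
    return (s_pow / (s_pow + n_pow + EPS)) ** beta


def optimal_ratio_mask(speech_spec, noise_spec):
    """Real mask m minimising |m * Y - S|^2 per bin: m = Re(S * conj(Y)) / |Y|^2.

    Expanding Y = S + N gives (|S|^2 + Re(S N*)) / (|S|^2 + |N|^2 + 2 Re(S N*)).
    Unlike the IRM, this mask is not confined to [0, 1].
    """
    mixture = speech_spec + noise_spec
    numerator = np.real(speech_spec * np.conj(mixture))
    denominator = np.abs(mixture) ** 2
    return numerator / (denominator + EPS)


def apply_mask(mixture_spec, mask):
    """Scale each time-frequency bin of the noisy spectrum by the mask."""
    return mask * mixture_spec


def mse(estimate, target):
    return float(np.mean(np.abs(estimate - target) ** 2))


# Synthetic complex spectra: (frequency bins, frames), e.g. a 512-point STFT.
rng = np.random.default_rng(0)
shape = (257, 100)
S = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
N = 0.5 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
Y = S + N

irm = ideal_ratio_mask(S, N)
orm = optimal_ratio_mask(S, N)
print("MSE with IRM:", mse(apply_mask(Y, irm), S))
print("MSE with ORM:", mse(apply_mask(Y, orm), S))
```

At inference time the clean speech is unknown, so the network predicts the mask from features of the noisy input. The estimated mask is then applied to the noisy STFT before resynthesis with an inverse STFT.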
References:
- Zhang, Hao, and DeLiang Wang. "Deep Learning for Acoustic Echo Cancellation in Noisy and Double-Talk Scenarios." Proc. Interspeech (2018).
- Wang, Yuxuan, et al. "On Training Targets for Supervised Speech Separation." IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 12 (2014): 1849-1858. doi:10.1109/TASLP.2014.2352935
- aishoot/LSTM_PIT_Speech_Separation (GitHub repository).
- "Using Optimal Ratio Mask as Training Target for Supervised Speech Separation."