This project was part of my Natural Language Processing Nanodegree, which I completed in late 2020. This particular Nanodegree – in fact, this particular project – had been my goal throughout my studies of machine learning. I was so excited to work on it back then, and I'm still excited to share the work with you now. Machine translation has a long and fascinating history that spanned many different approaches before the widespread commercial adoption of Neural Machine Translation (NMT) around 2016. The NMT pipeline below, which I built with TensorFlow via Keras, reflects common state-of-the-art practices from that era, though it was already somewhat dated when I built it in 2020, thanks largely to Google Brain's attention-based Transformer model.
The development notebook examines and preprocesses the data, tests a couple of RNN architecture features, and finally assembles a bidirectional RNN model with embedding for training and prediction of English-to-French translations, reaching a reasonable validation accuracy of 95% after just 10 training epochs.
- Translate English text into French text
- Create a pipeline that could be modified to translate between any pair of languages
- Test the performance difference between word IDs and embeddings
- Test the performance difference and training needs of simple and bidirectional RNNs
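The word-ID representation mentioned in the goals above can be illustrated with a short sketch. This is not the project's actual code (which would typically use Keras's `Tokenizer` and `pad_sequences` utilities); the function names and example sentences here are hypothetical, but the idea is the same: map each token to an integer ID and pad every sequence to a fixed length before feeding it to the network.

```python
# Illustrative sketch of word-ID preprocessing for an NMT pipeline.
# Names and sentences are made up for demonstration; a real pipeline
# would use keras.preprocessing.text.Tokenizer and pad_sequences.

def build_vocab(sentences):
    """Map each unique token to an integer ID; 0 is reserved for padding."""
    vocab = {"<pad>": 0}
    for sentence in sentences:
        for token in sentence.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab

def to_padded_ids(sentences, vocab, max_len):
    """Replace tokens with their IDs and pad/truncate to max_len."""
    sequences = []
    for sentence in sentences:
        ids = [vocab[token] for token in sentence.lower().split()]
        sequences.append((ids + [0] * max_len)[:max_len])
    return sequences

english = ["new jersey is sometimes quiet", "paris is sometimes quiet"]
vocab = build_vocab(english)
padded = to_padded_ids(english, vocab, max_len=6)
```

An embedding layer then replaces each integer ID with a learned dense vector, which is the alternative representation the project benchmarks against raw IDs.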
- This project was part of Udacity's Natural Language Processing Nanodegree.
- Language data was provided by Udacity as a select subset of WMT's machine translation dataset.
Copyright © 2020-2022 Sean von Bayern
Licensed under the MIT License