NLP Project - ENSAE 2021: Classifying names by origin
Text classification is an important task in Natural Language Processing (NLP). Its main applications are sentiment analysis (used, for example, in website moderation or marketing), intent detection, and document categorization (such as spam filtering for mail). For this reason, text classification is a good way to get started in NLP. The PyTorch library offers a tutorial with this in mind, but its scope is limited and it has weaknesses in data acquisition, in the model used, and in its lack of critical analysis of the results. In this work, we present a revised version of the PyTorch tutorial that covers all the building blocks of a standard NLP project.
More information can be found in the project report.
All code can be run in Colab using the provided notebook.
Click this direct link to open the notebook on Colab, then save a copy to your workspace to run or edit the code.
Contributions are welcome: open a new issue or PR, or email me!