name-classification

NLP Project - ENSAE 2021: Classifying names by origin

Text classification is an important task in Natural Language Processing (NLP). Its main applications are sentiment analysis (used, for example, in website moderation or marketing), intent detection, and document categorization (such as spam filtering in email). This breadth makes text classification a good way to get started in NLP. The PyTorch documentation offers a tutorial with this in mind, but its scope is limited and it has weaknesses in data acquisition, in the choice of model, and in the lack of critical analysis of the results. In this work, we present a revised version of the PyTorch tutorial that covers all the building blocks of a standard NLP project.

More information can be found in the project report (ML_for_NLP_ENSAE_LE_CORNEC.pdf).

All code can be run in Colab using the provided notebook (name_classification.ipynb).
Open the notebook on Colab, then save a copy to your workspace to run or edit the code.

Contributions

Contributions are welcome: open a new issue or PR, or email me!

