This document is intended for students either writing their (nlp-related) bachelor or master thesis or working on their (nlp-related) consulting project under our supervision.

assenmacher-mat/howto_nlp

Where to start in the NLP jungle?

Disclaimer I: This document is intended for students either writing their (nlp-related) bachelor or master thesis or working on their (nlp-related) student consulting project under our supervision.

Disclaimer II: Please note that this document is subject to continuous change. Every time we find a new, nice source we will add it. Some stuff might also be deleted over time if not considered useful anymore.

Authors: Matthias Aßenmacher // Christian Heumann

Note: Most important resources are marked by a ⚠️

Last change: 03-03-2022

I. Set up all necessary communication channels

  • Join Mattermost (ask Matthias for the invite link)
  • Ask Matthias to add you to the “NLP”-channel on Mattermost
  • Ask Matthias to add you to our internal “NLP” mailing list
  • You can reach Matthias via Mattermost or e-mail; Christian prefers to be contacted via e-mail
    (For e-mails related to your thesis/project, make sure to cc the other supervisor so that no information asymmetries arise)
  • We have had very good experiences with (approx.) bi-weekly meetings for short status updates and prefer to work with you in this fashion
    (Nevertheless, this is not mandatory; we just think it helps you to (i) get started and (ii) stay on track)
  • We will have a so-called “NLP Colloquium” every now and then (intended 4 times a year) where all of our BA/MA/consulting students present their work to the others. This meeting is rather informal (mostly intended to connect you with each other), so there is no need for high-gloss slides or anything like that. Jupyter notebooks, interesting figures, or slides are all fine.
    We will announce this via the mailing list and via Mattermost.
    Dates for 2022:
    • 01.04. at 13h s.t.
    • 01.07. at 13h s.t.
    • 21.10. at 14h s.t.
    • 16.12. at 13h s.t.
  • The mailing list will be mostly used for announcements, while in the Mattermost channel we will occasionally also post (nlp-related) stuff we consider interesting.
  • TALK TO US sooner rather than later if any problems occur which you are not able to solve on your own. Open (and timely) communication is (in our opinion) key to successful supervision/cooperation during theses or consulting projects.

II. Useful materials for starting with NLP

1. Get familiar with the basics:

  • Pre-Processing (e.g. in Python with NLTK or spaCy)
  • One-hot encoding of words, the bag-of-words (bow) approach, its applications in ML, drawbacks & limitations (just google this stuff, you will find enough material).
  • Extensions of the bow approach, like n-grams or tf-idf (also just google this).
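
To make the basics above a bit more concrete, here is a minimal sketch in plain Python (standard library only). It is not taken from any of the linked resources; the helper names (`tokenize`, `one_hot`, `ngrams`, `bag_of_words`, `tf_idf`) and the toy corpus are made up for illustration, and the tf-idf weighting shown is just one common, unsmoothed variant. In practice you would probably rely on NLTK, spaCy or similar libraries instead.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Very naive pre-processing: lowercase and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def one_hot(token, vocab):
    """One-hot vector of a single token with respect to a fixed vocabulary."""
    return [1 if token == entry else 0 for entry in vocab]


def ngrams(tokens, n):
    """All contiguous n-grams (as tuples) of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(docs):
    """Return (vocabulary, count vectors) for a list of tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for doc in docs:
        vec = [0] * len(vocab)
        for tok, count in Counter(doc).items():
            vec[index[tok]] = count
        vectors.append(vec)
    return vocab, vectors


def tf_idf(vectors):
    """Weight counts by (relative term frequency) * log(N / document frequency)."""
    if not vectors:
        return []
    n_docs = len(vectors)
    n_terms = len(vectors[0])
    # Document frequency: in how many documents does each term occur?
    df = [sum(1 for vec in vectors if vec[j] > 0) for j in range(n_terms)]
    weighted = []
    for vec in vectors:
        total = sum(vec)
        weighted.append([
            (vec[j] / total) * math.log(n_docs / df[j]) if total else 0.0
            for j in range(n_terms)
        ])
    return weighted


# Hypothetical toy corpus, purely for illustration
corpus = ["The cat sat on the mat.", "The dog sat on the log."]
docs = [tokenize(text) for text in corpus]

vocab, counts = bag_of_words(docs)
weights = tf_idf(counts)
bigrams = ngrams(docs[0], 2)
cat_vector = one_hot("cat", vocab)
```

Note how words that occur in every document (here e.g. "the") get a tf-idf weight of zero, while the bow counts treat them as the most prominent feature. This is one of the limitations of plain bow that tf-idf tries to address.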

2. Get familiar with the Python environment:

  • In general (it's a little different from the "plug-and-play" style in which you can install R and R-Studio)
  • Find a comfortable setup:
    • One alternative could be using Miniconda for Python, package management and virtualenvs, together with e.g. VS Code as IDE
    • Another alternative (nice for beginners) is Anaconda, an all-in-one solution that comes with various IDEs (e.g. Spyder, which very closely resembles R-Studio)
  • Jupyter Notebooks / Lab
  • Google Colaboratory

3. Start looking into neural networks and deep learning for NLP

4. Milestone papers:

5. Make use of the overwhelming offer of blogs, tutorials (or the internet in general): Here are some nice online resources

6. Software
