Disclaimer I: This document is intended for students either writing their (nlp-related) bachelor or master thesis or working on their (nlp-related) student consulting project under our supervision.
Disclaimer II: Please note that this document is subject to continuous change. Every time we find a new, nice source we will add it. Some stuff might also be deleted over time if not considered useful anymore.
Authors: Matthias Aßenmacher // Christian Heumann
Note: The most important resources are marked by a ⚠️
Last change: 03-03-2022
- Join Mattermost (ask Matthias for the invite link)
- Ask Matthias to add you to the “NLP”-channel on Mattermost
- Ask Matthias to add you to our internal “NLP” mailing list
- You can reach Matthias via Mattermost or E-Mail, Christian prefers to be contacted via E-Mail
(In case of e-mails related to your thesis/project, make sure to cc the respective other supervisor in order to avoid information asymmetries.)
- We have had very good experiences with (approx.) bi-weekly meetings for short status updates and prefer to work together with you in this fashion.
(Nevertheless, this is not mandatory; we just think it helps you to (i) get started and (ii) stay on track.)
- We will have a so-called “NLP Colloquium” every now and then (intended four times a year) where all of our BA-/MA-/consulting students present their work to the others. This meeting is rather informal in character (mostly intended to connect you with each other), so there is no need for high-gloss slides or anything like that. Jupyter notebooks, interesting figures, plain slides: all of that is fine.
We will announce this via the mailing list and via Mattermost.
Dates for 2022:
- 01.04. at 13h s.t.
- 01.07. at 13h s.t.
- 21.10. at 14h s.t.
- 16.12. at 13h s.t.
- The mailing list will be mostly used for announcements, while in the Mattermost channel we will occasionally also post (nlp-related) stuff we consider interesting.
- TALK TO US sooner rather than later if any problems occur that you are not able to solve on your own. Open (and timely) communication is (in our opinion) key to a successful supervision/cooperation during theses or consulting projects.
- Pre-Processing (e.g. in Python with NLTK or spaCy)
- One-hot-encoding of words, the bag-of-words (bow) approach, its applications in ML, drawbacks & limitations (just google this stuff, you will find enough material).
- Extensions of the bow approach, like n-grams or tf-idf (also just google this; a minimal code sketch for both is given below).
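To make the items above a bit more concrete, here is a minimal sketch of a typical pre-processing plus bag-of-words/tf-idf pipeline. It assumes spaCy and scikit-learn are installed and that the small English model `en_core_web_sm` has been downloaded; treat it as an illustration, not a recipe.

```python
# Minimal sketch: pre-processing (spaCy) -> bag-of-words / tf-idf (scikit-learn).
# Assumes: pip install spacy scikit-learn && python -m spacy download en_core_web_sm
import spacy
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

nlp = spacy.load("en_core_web_sm")

docs = [
    "Natural language processing is fun.",
    "Bag-of-words models ignore word order.",
]

def preprocess(text):
    # Lowercase, lemmatize, and drop stop words and punctuation.
    return " ".join(
        tok.lemma_.lower()
        for tok in nlp(text)
        if not tok.is_stop and not tok.is_punct
    )

cleaned = [preprocess(d) for d in docs]

# Bag-of-words: raw term counts per document (here with unigrams + bigrams).
bow = CountVectorizer(ngram_range=(1, 2))
X_bow = bow.fit_transform(cleaned)

# tf-idf: re-weight the counts by how document-specific each term is.
tfidf = TfidfVectorizer(ngram_range=(1, 2))
X_tfidf = tfidf.fit_transform(cleaned)

print(bow.get_feature_names_out())
print(X_tfidf.toarray().round(2))
```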
- In general, setting up a working Python environment is a little different from the plug-and-play style in which you can install R and RStudio.
- Find a comfortable setup:
- Jupyter Notebooks / Lab
- Google Colaboratory
- Pretty nice book for a broad overview of everything until self-attention, useful for covering the basics: Goldberg (2017)
- Good overview on Embeddings: Pilehvar & Camacho-Collados (2021)
- Overview on DL in general: Deep Learning book
- Or more basic: An Introduction to Statistical Learning
- ⚠️ Hugging Face Transformer course ⚠️
- Internal teaching resources (LMU):
- Our lecture from WS 20/21: https://moodle.lmu.de/course/view.php?id=10268
- Same lecture, slightly updated (WS 21/22): https://moodle.lmu.de/course/view.php?id=17645
- Booklet from our NLP seminar (summer semester 2020): https://slds-lmu.github.io/seminar_nlp_ss20/
- (Examples of) Supervised theses: https://www.misoda.statistik.uni-muenchen.de/studium_lehre/theses_old/index.html
- Course about ABSA (aspect-based sentiment analysis) from a student consulting project: https://lisa-wm.github.io/nlp-twitter-r-bert/
- Important conceptual foundation
- Bengio, Yoshua, et al. "A neural probabilistic language model." Journal of Machine Learning Research 3 (2003): 1137-1155.
- Modification of the idea of Bengio et al.: the internal representations of the neural network become the primary objective. The resulting architecture is called word2vec and learns static word embeddings (a small gensim-based sketch follows after the two papers below).
- Mikolov, Tomas, et al. "Efficient estimation of word representations in vector space." arXiv preprint arXiv:1301.3781(2013).
- Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." Advances in neural information processing systems. 2013.
- Alternative framework to word2vec
- Pennington, Jeffrey, Richard Socher, and Christopher Manning. "Glove: Global vectors for word representation." Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). 2014.
- Extending the embedding idea from word2vec to sentence/paragraph/document level
- Le, Quoc, and Tomas Mikolov. "Distributed representations of sentences and documents." International conference on machine learning. 2014.
- Sequence-to-sequence models
- Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. "Sequence to sequence learning with neural networks." Advances in neural information processing systems. 2014.
- Extending the embedding idea to subword tokens
- Bojanowski, Piotr, et al. "Enriching word vectors with subword information." Transactions of the Association for Computational Linguistics 5 (2017): 135-146.
- Joulin, Armand, et al. "Bag of tricks for efficient text classification." arXiv preprint arXiv:1607.01759 (2016).
- Important foundations for the so-called Attention & Self-Attention mechanism (a small numpy sketch follows after the three papers below)
- Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. "Neural machine translation by jointly learning to align and translate." arXiv preprint arXiv:1409.0473 (2014).
- Luong, Minh-Thang, Hieu Pham, and Christopher D. Manning. "Effective approaches to attention-based neural machine translation." arXiv preprint arXiv:1508.04025 (2015).
- Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems. 2017.
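For intuition, here is a minimal numpy sketch of the scaled dot-product self-attention from Vaswani et al. (2017); it deliberately leaves out multi-head logic and masking, and the random projection matrices stand in for learned weights.

```python
# Minimal sketch: scaled dot-product self-attention (no multi-head, no masking).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)        # similarity of each query to each key
    weights = softmax(scores, axis=-1)     # attention weights, rows sum to 1
    return weights @ V, weights            # weighted sum of the values

# Toy example: a "sequence" of 3 tokens with embedding dimension 4.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))

# In self-attention, queries, keys and values are linear projections of X.
W_q, W_k, W_v = (rng.normal(size=(4, 4)) for _ in range(3))
out, attn = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v)
print(attn.round(2))   # 3x3 matrix: how much each token attends to every other token
```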
- Some of the most famous models that (a) learn contextualized embeddings and (b) can be used for transfer learning (a minimal usage sketch follows after the papers below)
- Radford, Alec, et al. "Improving language understanding by generative pre-training." pdf (2018).
- Peters, Matthew E., et al. "Deep contextualized word representations." arXiv preprint arXiv:1802.05365 (2018).
- Devlin, Jacob, et al. "Bert: Pre-training of deep bidirectional transformers for language understanding." Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers).
- Radford, Alec, et al. "Language models are unsupervised multitask learners." OpenAI Blog 1.8 (2019).
- Yang, Zhilin, et al. "XLNet: Generalized Autoregressive Pretraining for Language Understanding." Advances in neural information processing systems 32 (2019).
- Liu, Yinhan, et al. "RoBERTa: A Robustly Optimized BERT Pretraining Approach." arXiv preprint arXiv:1907.11692 (2019).
- Lan, Zhenzhong, et al. "Albert: A lite bert for self-supervised learning of language representations." arXiv preprint arXiv:1909.11942 (2019).
- Raffel, Colin, et al. "Exploring the limits of transfer learning with a unified text-to-text transformer." arXiv preprint arXiv:1910.10683 (2019).
- Sanh, Victor, et al. "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter." arXiv preprint arXiv:1910.01108 (2019).
- Clark, Kevin, et al. "Electra: Pre-training text encoders as discriminators rather than generators." arXiv preprint arXiv:2003.10555 (2020).
- Brown, Tom B., et al. "Language models are few-shot learners." arXiv preprint arXiv:2005.14165 (2020).
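To see what "contextualized" means in practice, here is a minimal sketch of extracting contextual embeddings from a pre-trained BERT model via the Hugging Face transformers library (see the course linked above); the checkpoint name and example sentences are our choices for illustration.

```python
# Minimal sketch: contextualized embeddings from pre-trained BERT via transformers.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentences = ["The bank raised interest rates.", "We sat on the river bank."]
inputs = tokenizer(sentences, padding=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One contextualized vector per (sub-)token: "bank" gets a different representation
# in each sentence, unlike the static word2vec embeddings above.
print(outputs.last_hidden_state.shape)   # (batch, sequence length, hidden size)
```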
- Heavily used benchmark data sets (a short loading sketch follows after the papers below)
- Wang, Alex, et al. "GLUE: A multi-task benchmark and analysis platform for natural language understanding." arXiv preprint arXiv:1804.07461 (2018).
- Wang, Alex, et al. "Superglue: A stickier benchmark for general-purpose language understanding systems." arXiv preprint arXiv:1905.00537 (2019).
- Pranav Rajpurkar, et al. "SQuAD: 100,000+ questions for machine comprehension of text" Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2383–2392, Austin, Texas. Association for Computational Linguistics.
- Pranav Rajpurkar, et al. "Know what you don’t know: Unanswerable questions for SQuAD" In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 784–789, Melbourne, Australia.
- Zero-/Few-Shot Learning (TO DO; a small pipeline-based sketch follows after the two papers below)
- Brown, Tom B., et al. "Language models are few-shot learners." arXiv preprint arXiv:2005.14165 (2020).
- Schick, Timo, and Hinrich Schütze. "It's not just size that matters: Small language models are also few-shot learners." arXiv preprint arXiv:2009.07118 (2020).
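As a small illustration of zero-shot classification, here is a sketch using the Hugging Face pipeline API, where an NLI model scores candidate labels it was never explicitly trained on; the specific checkpoint and labels are our assumptions.

```python
# Minimal sketch: zero-shot text classification with an NLI-based model.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

result = classifier(
    "The new graphics card delivers twice the frame rate of its predecessor.",
    candidate_labels=["technology", "politics", "sports"],
)
print(result["labels"][0], round(result["scores"][0], 3))  # most likely label + score
```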
- Prompting/Prompt-Engineering (a small fill-mask prompting sketch follows after the papers below)
- Liu, Pengfei, et al. "Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing." arXiv preprint arXiv:2107.13586 (2021).
- Wei, Jason, et al. "Finetuned language models are zero-shot learners." arXiv preprint arXiv:2109.01652 (2021).
- Lester, Brian, et al. "The Power of Scale for Parameter-Efficient Prompt Tuning" In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 3045–3059, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
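To illustrate the "pre-train, prompt, and predict" idea, here is a sketch that recasts sentiment classification as a cloze task for a masked language model; the prompt wording and checkpoint are our assumptions for illustration only.

```python
# Minimal sketch: prompting a masked language model (cloze-style classification).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

review = "The plot was predictable and the acting was wooden."
prompt = f"{review} Overall, the movie was [MASK]."

# The model's preferred fillers for [MASK] act as a verbalizer for the sentiment label.
for pred in fill_mask(prompt, top_k=5):
    print(pred["token_str"], round(pred["score"], 3))
```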
5. Make use of the vast range of blogs and tutorials (or the internet in general). Here are some nice online resources:
- https://github.com/ivan-bilan/The-NLP-Pandect (Exhaustive overview on basically everything)
- https://mccormickml.com/ (+ https://www.youtube.com/channel/UCoRX98PLOsaN8PtekB9kWrw)
- https://openai.com/blog/
- https://www.deepmind.com/blog
- https://syncedreview.com/
- https://thegradient.pub/
- https://jalammar.github.io/ (+ https://www.youtube.com/channel/UCmOwsoHty5PrmE-3QhUBfPQ)
- https://ruder.io/nlp-news/ (+ his thesis: https://ruder.io/thesis/neural_transfer_learning_for_nlp.pdf)
- https://dair.ai/ (pretty nice blog)
- https://github.com/tomohideshibata/BERT-related-papers (Exhaustive list of BERT related papers)
- Summaries by Yannic Kilcher: https://www.youtube.com/channel/UCZHmQk67mSJgfCCTn7xBfew