This is a course offered in Trinity Term 2021 at the Department of Politics and International Relations, University of Oxford.
Course Convener: Ashrakat Elshehawy
We will use Python throughout this course. The course starts with a short Python refresher and then introduces text-processing techniques, including tokenization, stemming, lemmatization, part-of-speech tagging, and named-entity recognition. It also covers methods for managing and manipulating text data in Python. We will then turn to numerical representations of text, such as word embeddings, and discuss metrics of text similarity. After that, we will focus on unsupervised machine learning methods, such as clustering and topic modelling, and on supervised machine learning methods, with an emphasis on classification techniques and sentiment analysis.
All Python sheets used in class, exercises, exercise solutions, and slides will be updated here on a weekly basis.
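To give a flavour of these topics, here is a minimal sketch of tokenization, a bag-of-words representation, and cosine similarity using only the Python standard library. It is not taken from the course sheets: the function names, the regular expression, and the example sentences are assumptions made for illustration, and the course itself may rely on dedicated NLP libraries instead.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (illustrative rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bag_of_words(text):
    """Represent a text numerically as a mapping from token to count."""
    return Counter(tokenize(text))


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words count vectors."""
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical example sentences, not course data.
doc_a = "The parliament passed the new budget."
doc_b = "The new budget was passed by parliament."
doc_c = "Voters protested in the capital."

sim_ab = cosine_similarity(bag_of_words(doc_a), bag_of_words(doc_b))
sim_ac = cosine_similarity(bag_of_words(doc_a), bag_of_words(doc_c))
print(f"a vs b: {sim_ab:.2f}, a vs c: {sim_ac:.2f}")
```

The two sentences about the budget share most of their vocabulary and therefore score higher than the unrelated pair. Raw counts ignore word order and meaning, which is one motivation for richer representations such as word embeddings.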
Quick access to our Python sheets:
- Python Refresher 1 Link
- Python Refresher 2 Link
- Week 1 - Introduction to NLP, Preparing Corpora, and Text Preprocessing Link
- Week 2 - Building an NLP Pipeline, Lemmatization, Stemming, POS Tagging, NER. An introduction to dictionary methods and the use of counting for computational text analysis Link
- Week 3 - Vector Space Representation and Unsupervised Techniques Link
- Week 4 - Supervised Techniques - Classification and Sentiment Analysis Link
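
As a rough illustration of the dictionary methods and counting mentioned for Week 2, the sketch below scores a text by counting matches against two small word lists. The word lists, the function name `dictionary_score`, and the scoring rule are hypothetical and chosen only for this example; they do not come from the course material.

```python
import re
from collections import Counter

# Hypothetical toy dictionaries; real applications use curated lexicons.
POSITIVE = {"good", "great", "support", "success"}
NEGATIVE = {"bad", "crisis", "fail", "oppose"}


def dictionary_score(text):
    """Return (positive - negative) / (positive + negative) matches, or 0.0 if none."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    pos = sum(counts[w] for w in POSITIVE)
    neg = sum(counts[w] for w in NEGATIVE)
    total = pos + neg
    if total == 0:
        return 0.0  # no dictionary words found
    return (pos - neg) / total


score = dictionary_score("A great success, despite the crisis.")
print(round(score, 2))
```

A score near 1 means mostly positive dictionary words, a score near -1 mostly negative ones. Exact-match counting like this misses inflected forms such as "failed", which is one reason stemming and lemmatization are usually applied first.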