In this project, we will learn how to work with natural language data. Specifically, we will cover:
• What makes natural language different from other types of data
• How to prepare text data for automatic processing
• How to annotate data for supervised classification
• How to train and run a classifier for a basic NLP task
In this project, we will work with Python and Jupyter Notebook.
We use the TweetEval repository, a collection of seven datasets for different classification tasks based on social media posts. The repository can be found here: https://github.com/cardiffnlp/tweeteval.git
Each dataset is presented in the same format, with fixed training, validation and test splits. We will work on two of these tasks:
• Binary Classification Task: Classifying Speech as either "Irony" or "Not Irony"
• Multiclass Classification Task: Predicting Emojis
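Because every dataset shares one format, a single loader can serve both tasks. The following is a minimal sketch, not the repository's own tooling. It assumes a local clone in which each task lives under datasets/<task>/ and each split is stored as a pair of line-aligned files, <split>_text.txt and <split>_labels.txt; check this against your clone. The function name load_split and its parameters are our own, chosen for illustration.

```python
from pathlib import Path


def read_lines(path):
    """Return the lines of a UTF-8 file, split on '\n' only.

    Splitting on '\n' alone (rather than on every Unicode line break)
    keeps texts and labels aligned even if a post contains unusual
    whitespace characters.
    """
    with open(path, encoding="utf-8", newline="\n") as f:
        lines = f.read().split("\n")
    # A file that ends with a newline yields one empty trailing entry.
    if lines and lines[-1] == "":
        lines.pop()
    return [line.strip() for line in lines]


def load_split(root, task, split):
    """Load one split of one task as (texts, labels).

    root  : path to the local clone (assumed layout: root/datasets/<task>/)
    task  : task directory name, e.g. "irony" or "emoji" (assumed names)
    split : "train", "val" or "test" (assumed file prefixes)
    """
    task_dir = Path(root) / "datasets" / task
    texts = read_lines(task_dir / f"{split}_text.txt")
    labels = [int(label) for label in read_lines(task_dir / f"{split}_labels.txt")]
    if len(texts) != len(labels):
        raise ValueError(
            f"{task}/{split}: {len(texts)} texts but {len(labels)} labels"
        )
    return texts, labels
```

With the repository cloned into a folder named tweeteval, a call such as load_split("tweeteval", "irony", "train") would return the training posts and their integer labels. The labels are plain integers, so consult the task's label mapping in the repository to see which class each integer stands for.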