Natural language processing project investigating binary and multi-class classification tasks.

dmdequin/fyp2021p4g03


First Year Project - ITU 2021

Project 4 - Natural Language Processing - Group 3

Project Goals

In this project, we will learn how to work with natural language data. We will learn:

• What makes natural language different from other types of data
• How to prepare text data for automatic processing
• How to annotate data for supervised classification
• How to train and run a classifier for a basic NLP task

Project Code

In this project we work with Python and Jupyter Notebook.

Data

We use the TweetEval repository, a collection of seven datasets for different classification tasks based on social media posts. The repository can be found here: https://github.com/cardiffnlp/tweeteval.git

Each dataset is presented in the same format and with fixed training, validation and test splits.
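
Because every dataset shares one format, a single helper can load any split of any task. The sketch below is a minimal illustration, not the project's own loader. It assumes each task folder holds files named `<split>_text.txt` (one tweet per line) and `<split>_labels.txt` (one integer label per line), with `split` being `train`, `val` or `test`. The function name `load_split` and the helper `read_lines` are hypothetical.

```python
from pathlib import Path


def read_lines(path):
    """Return the lines of a UTF-8 text file, split on newline characters only."""
    # newline="\n" keeps other separators that may occur inside a tweet
    # from being treated as line breaks.
    with open(path, encoding="utf-8", newline="\n") as handle:
        return [line.rstrip("\r\n") for line in handle]


def load_split(task_dir, split):
    """Load the texts and integer labels of one split of a task folder.

    Assumed layout (not confirmed by this README):
    <task_dir>/<split>_text.txt and <task_dir>/<split>_labels.txt.
    """
    task_dir = Path(task_dir)
    texts = read_lines(task_dir / f"{split}_text.txt")
    # Ignore blank lines in the label file, e.g. a trailing empty line.
    labels = [
        int(line)
        for line in read_lines(task_dir / f"{split}_labels.txt")
        if line.strip()
    ]
    if len(texts) != len(labels):
        raise ValueError(
            f"{split}: {len(texts)} texts but {len(labels)} labels in {task_dir}"
        )
    return texts, labels
```

With a local clone of the data, a call such as `texts, labels = load_split("tweeteval/datasets/irony", "train")` would return the training tweets and their labels; the path shown is an assumption about where the clone lives.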

This project will focus on two tasks:

Binary Classification Task: Classifying tweets as either "Irony" or "Not Irony"

Multiclass Classification Task: Predicting Emojis