UMwarrior/ProjectTeam-Task1

 
 

Project-1

Project Description

CCET ACM Student Chapter brings you the opportunity to learn the basics of text processing and analysis through this project.

The goal of the project is to predict which Tweets are about real disasters and which are not. You have been provided with three data sets (train, test, and a sample submission). Use the train and test data sets accordingly.
Twitter has become an important communication channel in times of emergency. The ubiquity of smartphones enables people to announce an emergency they’re observing in real time. Because of this, more agencies (such as disaster relief organizations and news agencies) are interested in programmatically monitoring Twitter.

But it’s not always clear whether a person’s words are actually announcing a disaster.
Take this example: "on Plus Side LOOK AT THE SKY LAST NIGHT IT WAS ABLAZE"
The author explicitly uses the word “ABLAZE” but means it metaphorically. This is clear to a human right away, especially with the visual aid. But it’s less clear to a machine.
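
To illustrate why this is hard for a machine, here is a minimal sketch of a naive keyword matcher. The keyword set and the function name `looks_like_disaster` are assumptions made up for illustration; they are not part of the provided data or of any library. The matcher flags the tweet above simply because it contains the word "ablaze".

```python
import re

# Hypothetical keyword list, chosen only for illustration.
DISASTER_KEYWORDS = {"ablaze", "earthquake", "flood", "wildfire"}


def looks_like_disaster(tweet: str) -> bool:
    """Return True if the tweet contains any keyword (case-insensitive)."""
    words = set(re.findall(r"[a-z']+", tweet.lower()))
    return bool(words & DISASTER_KEYWORDS)


tweet = "on Plus Side LOOK AT THE SKY LAST NIGHT IT WAS ABLAZE"
print(looks_like_disaster(tweet))  # True, although the tweet is metaphorical
```

A model that looks at the surrounding words, rather than at single keywords, has a better chance of telling the two cases apart.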

In this task, you’re challenged to build a machine learning model that predicts which Tweets are about real disasters and which ones aren’t. You’ll have access to a dataset of 10,000 tweets that were hand-classified. Disclaimer: The dataset for this competition contains text that may be considered profane, vulgar, or offensive.

If this is your first time working on an NLP problem, the resources below can help you get up and running.

freecodecamp
Ultimate guide to deal with Text Data (using Python) – for Data Scientists and Engineers
8 Free Resources For Beginners To Learn Natural Language Processing

A few techniques you can use are:

  1. XGBoost
  2. Naive Bayes
  3. LSTM
  4. GloVe
  5. BERT
    and many more...
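
As a starting point, the Naive Bayes entry in the list above can be sketched from scratch using only the standard library. This is a minimal multinomial Naive Bayes with add-one smoothing, not a tuned solution; the class name `NaiveBayes`, the `tokenize` helper, and the four toy tweets are assumptions made up for illustration. In practice you would probably train on the provided train set and might prefer an existing library implementation.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # ignore words never seen in training
                score += math.log((self.word_counts[label][token] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Toy training data, made up for illustration (1 = disaster, 0 = not).
texts = [
    "forest fire spreading near the town",
    "earthquake damages buildings downtown",
    "this new song is fire",
    "my heart is ablaze with joy",
]
labels = [1, 1, 0, 0]

model = NaiveBayes().fit(texts, labels)
print(model.predict("fire near buildings"))  # 1
print(model.predict("my new song"))          # 0
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing keeps a word that appears in only one class from zeroing out the other class's score.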

Prerequisites:

  1. Python.
  2. Some familiarity with Jupyter Notebook and Kaggle is an advantage.
  3. Data preprocessing, visualization, and basic text analysis.
  4. Remember, the approach you follow carries more weight than the output.
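
For the data preprocessing prerequisite, a tweet-cleaning step might look like the sketch below. The function name `clean_tweet`, the specific cleaning rules, and the sample tweet are assumptions made up for illustration; which steps actually help depends on the model you choose.

```python
import re


def clean_tweet(text: str) -> str:
    """Strip URLs, mentions, and punctuation, then normalize whitespace."""
    text = re.sub(r"https?://\S+", " ", text)     # drop URLs
    text = re.sub(r"@\w+", " ", text)             # drop @mentions
    text = text.replace("#", " ")                 # keep the hashtag word itself
    text = re.sub(r"[^A-Za-z0-9\s']", " ", text)  # drop remaining punctuation
    return re.sub(r"\s+", " ", text).strip().lower()


# Made-up example tweet, not taken from the data set.
raw = "Forest #fire near La Ronge! @user http://example.com/abc"
print(clean_tweet(raw))  # forest fire near la ronge
```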

Submission Format:

Everyone must fork this repository, create a folder containing their Jupyter notebook and a submission file, and open a Pull Request against this repository.
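
A submission file could be produced along these lines. This is a sketch that assumes the sample submission has two columns named `id` and `target`; check the provided sample submission file and adjust the header if it differs. The function name `write_submission` and the predictions shown are made up for illustration.

```python
import csv
import io


def write_submission(rows, out):
    """Write (id, target) pairs as CSV to a file-like object."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "target"])  # assumed header, verify against the sample file
    for tweet_id, target in rows:
        writer.writerow([tweet_id, int(target)])


# Made-up predictions: (tweet id, predicted label).
predictions = [(0, 1), (2, 0), (3, 1)]

buffer = io.StringIO()
write_submission(predictions, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `open("submission.csv", "w", newline="")`, where the file name is again only an example.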

Submission Deadline: 14 November 2021

In case of queries, feel free to contact:

  • Muskaan Chopra (+91-8146194184)
  • Kriti Aggarwal (+91-9915164472)
