NLP Sentiment Analysis on Twitter posts to detect natural disasters occurring in real time.

rkkwan/disaster-rapid-alert

Utilizing Twitter for Disaster Detection

The goal of this project was to use social media, specifically Twitter, to identify natural disasters as they occur. In this repo you will find the following:

Notebooks

  1. Gathering Data
  2. Feature Engineering
  3. Benchmark Model
  4. Model Tuning with Doc2Vec
  5. Making Predictions
  6. Time Series Analysis

Deck

Utilizing Twitter for Disaster Detection

Paper

Utilizing Twitter for Disaster Detection

This project was developed by:

Tofer Kim,
Ritchie Kwan, and
Will Stecher,
with special thanks to Li Zhong.

Problem Statement

Traditional methods for alerting on disaster-related events like earthquakes and tsunamis rely on information derived from official sources (e.g. USGS). Twitter can be a valuable resource for sharing information regarding disaster-related events. We will attempt to identify "relevant" tweets in order to investigate Twitter trends that can be used to detect natural disasters as they occur. These methods, along with geolocation and population data, can be implemented to serve as an alert to first-responders (e.g. FEMA) and potentially help save lives.

Executive Summary

Using a provided dataset of over 10,000 disaster-related tweets, each labeled "relevant" or "not relevant", we trained an NLP classifier to label new tweets the same way. We then narrowed the scope of the project to a single type of natural disaster, wildfires, and used the keywords "wildfire" and "forest fire" to collect tweets from two distinct date ranges: the beginning of the 2018 California wildfires, and a more recent period after those fires had subsided.
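The keyword collection and relevance classification steps can be sketched as follows. This is a minimal, dependency-free illustration, not the code from the notebooks: it assumes a bag-of-words representation and logistic regression fitted with plain batch gradient descent. The function names (`matches_keywords`, `tokenize`, `featurize`, `train_logreg`, `predict_proba`) and the four toy tweets are hypothetical and made up for illustration only.

```python
import math
import re
from collections import Counter

# Collection keywords named in the README.
KEYWORDS = ("wildfire", "forest fire")


def matches_keywords(text: str) -> bool:
    """Hypothetical pre-filter: keep tweets that mention any collection keyword."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in KEYWORDS)


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def featurize(text: str, vocab: dict[str, int]) -> Counter:
    """Sparse bag-of-words counts keyed by vocabulary index; unknown words are dropped."""
    return Counter(vocab[tok] for tok in tokenize(text) if tok in vocab)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(texts, labels, epochs=300, lr=0.5):
    """Fit logistic regression (label 1 = relevant) with batch gradient descent."""
    vocab: dict[str, int] = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    weights = [0.0] * len(vocab)
    bias = 0.0
    feats = [featurize(t, vocab) for t in texts]
    n = len(texts)
    for _ in range(epochs):
        grad_w = [0.0] * len(vocab)
        grad_b = 0.0
        for x, y in zip(feats, labels):
            p = sigmoid(bias + sum(weights[i] * c for i, c in x.items()))
            err = p - y  # derivative of log loss w.r.t. the logit
            for i, c in x.items():
                grad_w[i] += err * c
            grad_b += err
        # Step against the mean gradient over the batch.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return vocab, weights, bias


def predict_proba(text, model) -> float:
    """Estimated probability that a tweet is 'relevant'."""
    vocab, weights, bias = model
    x = featurize(text, vocab)
    return sigmoid(bias + sum(weights[i] * c for i, c in x.items()))


# Toy, made-up examples (NOT from the project dataset).
train_texts = [
    "Wildfire forcing evacuations near the highway",
    "Forest fire smoke: residents told to evacuate",
    "This new song is a wildfire",
    "That concert was fire",
]
train_labels = [1, 1, 0, 0]
model = train_logreg(train_texts, train_labels)
```

With the real labeled dataset, the vocabulary would be built from all 10,000+ tweets, and a library implementation such as scikit-learn's `LogisticRegression` would usually replace the hand-written training loop.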

Conclusion & Recommendations

After running our logistic regression model to classify tweets from the two date ranges above, we analyzed the frequency of "relevant" tweets. Our findings show that at the beginning of the 2018 California wildfires, there were consistently more than 180 "relevant" tweets within a 300-second (5-minute) window. Tweets from the more recent period never exceeded this threshold of 180 "relevant" tweets. This distinction can be used to detect future wildfires with a delay as short as 5 minutes.
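The 180-tweets-per-5-minutes rule can be sketched as a trailing sliding window over the timestamps of tweets the classifier labels "relevant". The README does not say whether the windows were fixed bins or sliding, so this sketch assumes a sliding window, timestamps in seconds sorted in ascending order, and a strict `>` comparison for "more than 180". `detect_alerts` and its parameters are hypothetical names, not code from the notebooks.

```python
from collections import deque
from typing import Iterable

WINDOW_SECONDS = 300  # 5-minute window from the README
THRESHOLD = 180       # alert when MORE than 180 relevant tweets fall in one window


def detect_alerts(relevant_timestamps: Iterable[float],
                  window: float = WINDOW_SECONDS,
                  threshold: int = THRESHOLD) -> list[float]:
    """Return the times at which the trailing window first exceeds the threshold.

    Each returned time t marks a moment when more than `threshold` relevant
    tweets occurred in the interval (t - window, t]. After an alert fires, no
    new alert is raised until the count drops back to the threshold or below.
    """
    recent: deque = deque()
    alerts: list[float] = []
    alerting = False
    for t in relevant_timestamps:
        recent.append(t)
        # Drop tweets that have fallen out of the trailing window.
        while recent[0] <= t - window:
            recent.popleft()
        count = len(recent)
        if count > threshold and not alerting:
            alerts.append(t)
            alerting = True
        elif count <= threshold:
            alerting = False
    return alerts
```

Each timestamp is appended to and removed from the deque at most once, so a stream of n relevant tweets is processed in O(n) time, which suits a near-real-time alert.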

Next Steps

We can apply the same methods with new keywords and date ranges to observe and detect other types of natural disasters, and further train our Doc2Vec model to classify these events. With access to Twitter geolocation and population data, we could also estimate the affected areas and the number of people affected by future disasters.
