Project Title: Climate Risk & Disaster Management

Project Statement: During disasters, people turn to social media to share what is happening. Not every post is about the disaster, however, and reviewing them all by hand is slow. An automated system that quickly picks out disaster-related tweets can support faster warnings and response.

Description: This project builds a machine-learning classifier that labels each tweet as disaster-related or not, so that relevant posts do not have to be filtered by hand. The tweet text is cleaned and converted to word counts with basic Natural Language Processing (NLP) steps, and a Multinomial Naive Bayes model performs the classification. Faster identification of relevant posts can support early warnings, quick decision-making, and effective disaster management.
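
To make the approach concrete before touching the real data, here is a minimal sketch of the word-count plus Naive Bayes idea. The tweets and labels are made up for illustration (they are not rows from train.csv), and the names `toy_texts`, `toy_labels`, and `toy_model` are hypothetical. Bundling the two steps with `make_pipeline` is one convenient option, not necessarily how the rest of this notebook is organised.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up example tweets (NOT from train.csv); 1 = disaster, 0 = not disaster
toy_texts = [
    "Forest fire spreading near the highway, evacuation ordered",
    "Flood waters rising fast, residents told to evacuate",
    "Earthquake shakes the city, buildings damaged",
    "Wildfire smoke forces evacuation of the town",
    "I love this new song, it is fire",
    "Great pizza tonight with friends",
    "My exam was a total disaster lol",
    "Watching a movie and relaxing at home",
]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]

# CountVectorizer turns each tweet into word counts;
# MultinomialNB learns which words are typical for each class.
toy_model = make_pipeline(CountVectorizer(), MultinomialNB())
toy_model.fit(toy_texts, toy_labels)

toy_predictions = toy_model.predict([
    "Evacuation ordered as flood waters rise",
    "Pizza and a movie with friends",
])
print(toy_predictions)
```

Note that "fire" and "disaster" also appear in the non-disaster examples, which is exactly why a learned classifier is preferable to simple keyword matching.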


---



In [1]:
# Essential Libraries
import pandas as pd                # for handling CSV files
import numpy as np                 # for numerical operations

# Machine Learning (Sklearn)
from sklearn.feature_extraction.text import CountVectorizer  # Convert text → numbers
from sklearn.model_selection import train_test_split         # Train/test split
from sklearn.naive_bayes import MultinomialNB                # Naive Bayes classifier
from sklearn.metrics import accuracy_score, classification_report  # Model evaluation

# Visualization
import matplotlib.pyplot as plt     # For plotting graphs
import seaborn as sns               # For better visualization


In [2]:
from google.colab import files
uploaded = files.upload()   # Upload train.csv file here


Saving train.csv to train.csv


In [3]:
# Load dataset into DataFrame
df = pd.read_csv("train.csv")


In [4]:
# Dataset info
print("\n--- Dataset Info ---")
df.info()   # info() prints its report directly and returns None, so it is not wrapped in print()

# Statistical summary
print("\n--- Dataset Description ---")
print(df.describe())

# Missing values check
print("\n--- Missing Values ---")
print(df.isnull().sum())



--- Dataset Info ---
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7613 entries, 0 to 7612
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   id        7613 non-null   int64 
 1   keyword   7552 non-null   object
 2   location  5080 non-null   object
 3   text      7613 non-null   object
 4   target    7613 non-null   int64 
dtypes: int64(2), object(3)
memory usage: 297.5+ KB

--- Dataset Description ---
                 id      target
count   7613.000000  7613.00000
mean    5441.934848     0.42966
std     3137.116090     0.49506
min        1.000000     0.00000
25%     2734.000000     0.00000
50%     5408.000000     0.00000
75%     8146.000000     1.00000
max    10873.000000     1.00000

--- Missing Values ---
id             0
keyword       61
location    2533
text           0
target         0
dtype: int64
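
Reading the numbers above: the target mean of 0.42966 means roughly 43% of the tweets are labeled as disasters, about 0.8% of rows (61 of 7613) lack a keyword, and about 33% (2533 of 7613) lack a location. The sketch below shows one way to compute such shares directly. It runs on a tiny made-up frame with the same column names, and `summarize_labels_and_gaps` and `toy_df` are hypothetical names introduced only for this illustration.

```python
import pandas as pd

def summarize_labels_and_gaps(frame, label_column="target"):
    """Return class shares and per-column missing-value percentages."""
    # Share of each label value (0 / 1), sorted by label
    class_share = frame[label_column].value_counts(normalize=True).sort_index()
    # isnull().mean() gives the fraction of missing entries per column
    missing_pct = frame.isnull().mean().mul(100).round(2)
    return class_share, missing_pct

# Tiny made-up frame with the same column names as train.csv
toy_df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "keyword": ["fire", None, "flood", "storm"],
    "location": [None, None, "Canada", None],
    "text": ["a", "b", "c", "d"],
    "target": [1, 0, 0, 1],
})

class_share, missing_pct = summarize_labels_and_gaps(toy_df)
print(class_share)
print(missing_pct)
```

Calling `summarize_labels_and_gaps(df)` on the loaded dataset should give the same kind of summary. Since only the text and target columns are complete, a text-only classifier does not need to impute keyword or location.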


In [5]:
# Show first 10 rows
df.head(10)


   id keyword location                                               text  target
0   1     NaN      NaN  Our Deeds are the Reason of this #earthquake M...       1
1   4     NaN      NaN             Forest fire near La Ronge Sask. Canada       1
2   5     NaN      NaN  All residents asked to 'shelter in place' are ...       1
3   6     NaN      NaN  13,000 people receive #wildfires evacuation or...       1
4   7     NaN      NaN  Just got sent this photo from Ruby #Alaska as ...       1
5   8     NaN      NaN  #RockyFire Update => California Hwy. 20 closed...       1
6  10     NaN      NaN  #flood #disaster Heavy rain causes flash flood...       1
7  13     NaN      NaN  I'm on top of the hill and I can see a fire in...       1
8  14     NaN      NaN  There's an emergency evacuation happening now ...       1
9  15     NaN      NaN  I'm afraid that the tornado is coming to our a...       1
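
The sample tweets contain hashtags, mixed case, and punctuation, so some text cleaning is a natural step before vectorizing. The sketch below is one possible cleaning function, offered as an assumption about what "cleaning" might involve rather than the exact steps used in this project. The name `clean_tweet` and the specific regular expressions are illustrative.

```python
import re

def clean_tweet(text):
    """Lowercase a tweet and strip URLs, @mentions, '#' signs, and punctuation."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # drop URLs
    text = re.sub(r"@\w+", " ", text)                    # drop @mentions
    text = text.replace("#", " ")                        # keep the hashtag word, drop '#'
    text = re.sub(r"[^a-z0-9\s]", " ", text)             # drop remaining punctuation
    return re.sub(r"\s+", " ", text).strip()             # collapse repeated whitespace

print(clean_tweet("Forest fire near La Ronge Sask. Canada"))
# forest fire near la ronge sask canada
```

On the loaded data this could be applied with `df["text"].apply(clean_tweet)`. Keeping the hashtag word (for example "earthquake" from "#earthquake") preserves a signal that is likely useful to the classifier.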
