
Disaster Response Pipeline

NLP classification model to categorise disaster response messages via an interactive web app.

Table of Contents

  1. Installation
  2. Project Motivation
  3. Data Preprocessing and Modeling
  4. File Descriptions
  5. Licensing, Authors, and Acknowledgements

Installation

Running the code here should require no libraries beyond the Anaconda distribution of Python. The code should run with no issues using Python version 3.

Project Motivation

The motivation behind undertaking this project was to gain familiarity with the process of developing an ETL pipeline to train a machine learning model. The steps of the process were as follows:

  1. Creating a file to extract and transform labelled disaster response messages from CSV files and load the data into a SQLite database (see the sketch after this list)
  2. Developing and tuning a classification model and saving the model into a pickle file
  3. Creating a Flask app to display Plotly visualisations of the training data and provide an interface for predicting the labels of new messages
  4. Writing the HTML files to display the app as a web page using Bootstrap templates
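
As a rough illustration of step 1, the ETL script could take a shape like the following. This is a minimal sketch rather than the repository's exact code: the column names and the delimited "name-value" category format are assumptions.

    import sys

    import pandas as pd
    from sqlalchemy import create_engine

    def run_etl(messages_path, categories_path, db_path):
        # Extract: load the two CSV files and merge them on the shared id column
        messages = pd.read_csv(messages_path)
        categories = pd.read_csv(categories_path)
        df = messages.merge(categories, on="id")

        # Transform: split a delimited categories column (e.g. "related-1;request-0")
        # into one binary column per label
        cats = df["categories"].str.split(";", expand=True)
        cats.columns = cats.iloc[0].str.rsplit("-", n=1).str[0]
        cats = cats.apply(lambda col: col.str.rsplit("-", n=1).str[1].astype(int))
        df = pd.concat([df.drop(columns="categories"), cats], axis=1).drop_duplicates()

        # Load: write the cleaned table into the SQLite database
        engine = create_engine(f"sqlite:///{db_path}")
        df.to_sql("messages", engine, index=False, if_exists="replace")

    if __name__ == "__main__":
        run_etl(*sys.argv[1:4])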

The version of the files contained in this repository can be used to render the app on a local machine.

Data Preprocessing and Modeling

The data for this project was provided in two separate CSV files: one containing the messages and the other containing their binary labels across 36 categories.

After merging the data sets, the steps for processing the messages into a model-friendly format were as follows:

  1. Remove all punctuation and special characters from the text using a regular expression
  2. Tokenise each document in the text
  3. Lemmatise tokens, set to lower case, strip whitespace and filter out stop words
  4. Convert documents into a matrix of vectorised token counts
  5. Transform matrix into tf-idf representation

This process rendered the messages into a matrix of features suitable for training a random forest classification model, with the 36 categories acting as a multilabel target variable.
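
For illustration, steps 1 to 3 might be implemented along the following lines (a sketch assuming NLTK with the punkt, wordnet and stopwords resources downloaded, not necessarily the repository's exact code):

    import re

    from nltk.corpus import stopwords
    from nltk.stem import WordNetLemmatizer
    from nltk.tokenize import word_tokenize

    STOP_WORDS = set(stopwords.words("english"))
    lemmatizer = WordNetLemmatizer()

    def tokenize(text):
        # Step 1: remove punctuation and special characters
        text = re.sub(r"[^a-zA-Z0-9]", " ", text)
        # Step 2: tokenise the document
        tokens = word_tokenize(text)
        # Step 3: lemmatise, set to lower case, strip whitespace, drop stop words
        return [
            lemmatizer.lemmatize(tok).lower().strip()
            for tok in tokens
            if tok.lower() not in STOP_WORDS
        ]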

The NLP preprocessing and model fitting were bundled into a single pipeline, so that the whole model can be saved as one object and later loaded in the web app.
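
A minimal sketch of what such a pipeline could look like, reusing the tokenize function sketched above with scikit-learn's standard components (the table name, column layout and default hyperparameters are assumptions; the actual train_classifier.py may differ, e.g. in the parameters tuned):

    import pickle

    import pandas as pd
    from sqlalchemy import create_engine
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
    from sklearn.multioutput import MultiOutputClassifier
    from sklearn.pipeline import Pipeline

    # Load the cleaned data written by the ETL step
    engine = create_engine("sqlite:///data/DisasterResponse.db")
    df = pd.read_sql_table("messages", engine)
    X = df["message"]
    y = df.iloc[:, 4:]  # assumes the 36 label columns follow the message columns

    pipeline = Pipeline([
        ("vect", CountVectorizer(tokenizer=tokenize)),             # step 4: token counts
        ("tfidf", TfidfTransformer()),                             # step 5: tf-idf weights
        ("clf", MultiOutputClassifier(RandomForestClassifier())),  # multilabel random forest
    ])

    pipeline.fit(X, y)

    # Persist the whole fitted pipeline so the web app can load a single object
    with open("models/classifier.pkl", "wb") as f:
        pickle.dump(pipeline, f)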

File Descriptions

  • app: folder containing the HTML files for the web page and the run.py file that loads the data, creates the visualisations and deploys the model
  • data: folder containing the CSV files of the original data, the process_data.py file containing the ETL pipeline, and the final data set loaded into a SQLite database
  • models: folder containing the train_classifier.py file to train and save the model, and the final model in classifier.pkl

To render the dashboard locally:

  1. Download the repository
  2. Run the ETL pipeline:
    python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
  3. Build and train the model:
    python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
  4. Navigate to the app folder and launch the app:
    python run.py
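
For reference, the prediction side of run.py essentially amounts to loading the pickled pipeline and calling predict on the submitted message. The following is a hypothetical sketch only: the route name, paths, port and JSON response are assumptions, and the actual app renders HTML templates with Plotly visualisations instead.

    import pickle

    import pandas as pd
    from flask import Flask, jsonify, request
    from sqlalchemy import create_engine

    app = Flask(__name__)

    # Load the category names and the fitted pipeline (paths assume launching from app/)
    engine = create_engine("sqlite:///../data/DisasterResponse.db")
    df = pd.read_sql_table("messages", engine)
    category_names = df.columns[4:]

    # Note: unpickling the pipeline requires the same tokenize function to be importable
    with open("../models/classifier.pkl", "rb") as f:
        model = pickle.load(f)

    @app.route("/go")
    def go():
        # Predict a binary value for each of the 36 categories of the query message
        query = request.args.get("query", "")
        labels = model.predict([query])[0]
        return jsonify(dict(zip(category_names, (int(l) for l in labels))))

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=3001)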

Licensing, Authors, and Acknowledgements

The data for this project was provided by Figure Eight (now Appen).
