The Disaster Response Pipeline project builds a web app around an ML model that analyzes messages sent during a natural disaster and classifies them by the type of need or aid being requested. The web app shows statistics on the messages used to train the model and lets users submit new messages to see how the model would classify them.
The project should run on any Python 3.* deployment. An environment.yml file is provided for setting up a new environment; for instance, it can be used to create a conda environment with the required packages via:
conda env create -n myenv -f environment.yml
- To run the ETL pipeline, which cleans the data and stores it in a database:
  python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
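The ETL step in data/process_data.py can be sketched roughly as follows. This is an illustrative outline, not the actual script: the function names are hypothetical, and the real script may use SQLAlchemy rather than the stdlib sqlite3 shown here.

```python
import sqlite3

import pandas as pd


def load_and_clean(messages_csv, categories_csv):
    """Merge the messages and categories files and expand the category
    string into one binary column per category."""
    messages = pd.read_csv(messages_csv)
    categories = pd.read_csv(categories_csv)
    df = messages.merge(categories, on="id")
    # Each row's 'categories' field looks like "related-1;request-0;..."
    cats = df["categories"].str.split(";", expand=True)
    cats.columns = [value.split("-")[0] for value in cats.iloc[0]]
    for col in cats.columns:
        # Keep only the trailing digit of each "name-0/1" entry.
        cats[col] = cats[col].str[-1].astype(int)
    return pd.concat([df.drop(columns="categories"), cats], axis=1).drop_duplicates()


def save_to_database(df, db_path, table_name="Messages"):
    """Write the cleaned frame to a SQLite database table."""
    with sqlite3.connect(db_path) as conn:
        df.to_sql(table_name, conn, index=False, if_exists="replace")
```

The key transformation is splitting the single semicolon-delimited category string into one 0/1 column per category, which is the shape the training step expects.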
- To train and export the model:
  python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
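The training step typically pairs text features with a multi-output classifier, since each message can match several categories at once. A minimal sketch of what models/train_classifier.py might build (function names and estimator choices are illustrative, not necessarily those in the actual script):

```python
import pickle

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline


def build_model():
    """TF-IDF text features followed by one random-forest classifier
    per category column."""
    return Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("clf", MultiOutputClassifier(RandomForestClassifier(n_estimators=50))),
    ])


def train_and_save(df, model_path):
    """Fit the pipeline on the cleaned data and pickle it to disk."""
    X = df["message"]
    Y = df.drop(columns=["id", "message"])  # remaining columns are 0/1 categories
    model = build_model()
    model.fit(X, Y)
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    return model
```

MultiOutputClassifier fits an independent classifier per category column, which is the simplest way to handle messages that belong to several categories simultaneously.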
- To start the web application, run the following command from the app directory:
  python run.py
- Open the provided address in a browser, e.g. http://0.0.0.0:3001/
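The shape of run.py is a small Flask app that loads the pickled model and serves a classification route. The sketch below uses a stub in place of the real model so it is self-contained; the route name, port, and stub are assumptions, not the actual app code.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


class StubModel:
    """Stand-in for the pickled classifier that run.py would load
    from models/classifier.pkl."""

    def predict(self, texts):
        return [["related", "request"] for _ in texts]


model = StubModel()


@app.route("/go")
def go():
    """Classify a user-submitted message and return the matched categories."""
    query = request.args.get("query", "")
    return jsonify({"query": query, "categories": model.predict([query])[0]})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=3001)
```

In the real app the route would render go.html with the model's predictions rather than returning JSON, but the request-predict-respond flow is the same.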
This project was created as part of the assignments for Udacity's Data Science Nanodegree. The Disaster Response Pipeline project involves analyzing data from messages sent during a natural disaster via social media, news, and other channels.
The goal of this project is to build a Machine Learning pipeline that ingests and cleans the original data and trains an NLP model capable of categorizing future messages by the type of need they relate to. This supports the different teams involved in disaster relief by providing a way to quickly classify messages sent in real time.
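A core step in such an NLP pipeline is normalizing and tokenizing the raw message text before it is vectorized. The actual pipeline may well use nltk tokenization and lemmatization; the regex-based function below is only a minimal stand-in for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text, replace non-alphanumeric characters with
    spaces, and split into word tokens."""
    text = re.sub(r"[^a-zA-Z0-9]", " ", text.lower())
    return text.split()
```

For example, tokenize("We NEED water, now!") yields ["we", "need", "water", "now"], giving the downstream vectorizer a uniform token stream.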
The home page of the web application will show overall statistics of the messages used to train the model, like the one presented below:
A message dialog is also provided for classifying new messages. Once a message is submitted, the categories matched for it are shown on a screen like the one below:
The repository contains the following file structure:
├── app
│ ├── run.py # Web app startup script
│ └── templates
│ ├── go.html
│ └── master.html
├── data
│ ├── disaster_categories.csv # Original categories used for model training
│ ├── disaster_messages.csv # Original messages used for model training
│ ├── DisasterResponse.db # Database of messages
│ └── process_data.py # Code to load and clean the data and export it to the database
├── environment.yml
├── LICENSE
├── models
│ ├── classifier.pkl # Pretrained model
│ └── train_classifier.py # Code to train and save model
└── README.md # This README
The data used for the analysis is provided by Figure Eight. This work is part of an assignment for the Data Science Nanodegree offered by Udacity.
The code for this project is available on GitHub.
This project is distributed under MIT License.