DISASTER RESPONSE - FIGURE8

Project Summary

In this project, Figure Eight provided texts and tweets from various natural disasters in 2010 and 2012, in their original language and with an English translation. All messages are labeled against 36 categories.

After cleaning and consolidating the data into an SQLite database, we extract features and test different Machine Learning (ML) algorithms to better predict the categories of future messages and potentially help organize a response more efficiently.

Local installation

  • Clone the GitHub repository: git clone https://github.com/6one2/DisasterResponsePipeline_Figure8.git
  • Create a virtual environment with pipenv shell and install the required packages with pipenv install

For deployment purposes, a custom package (herokutils) contains the classes used in both the training and the prediction phases.

Choosing the model (running the scripts locally)

After running the ETL script data/process_data.py, you will be able to run the ML pipeline model/train_classifier.py.

To run the ETL and save the database:
python data/process_data.py data/messages.csv data/categories.csv data/DisasterResponse.db
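
In broad strokes, process_data.py merges the two CSV files, expands the combined category labels into 36 binary columns, removes duplicates, and writes the result to SQLite. The sketch below illustrates that flow under the assumption that both CSV files share an id column and that categories arrive in the "related-1;request-0;..." format; the actual script may differ in its details.

```python
# Illustrative ETL sketch (not the repository's exact code).
import sys
import pandas as pd
from sqlalchemy import create_engine

def run_etl(messages_csv, categories_csv, database_path):
    messages = pd.read_csv(messages_csv)
    categories = pd.read_csv(categories_csv)
    df = messages.merge(categories, on="id")

    # Split the single "categories" column into one binary column per category.
    cats = df["categories"].str.split(";", expand=True)
    cats.columns = [value.split("-")[0] for value in cats.iloc[0]]
    for col in cats:
        cats[col] = cats[col].str[-1].astype(int)

    df = pd.concat([df.drop(columns="categories"), cats], axis=1).drop_duplicates()
    engine = create_engine(f"sqlite:///{database_path}")
    df.to_sql("DisasterResponse", engine, index=False, if_exists="replace")

if __name__ == "__main__":
    run_etl(*sys.argv[1:4])
```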

To run the ML pipeline that trains the classifier and saves the model:
python model/train_classifier.py data/DisasterResponse.db model/classifier.pkl

The ML pipeline runs as follows (a sketch of the pipeline is shown after the notes below):

  1. Create the features: see model/train_classifier.build_model()
     • TF-IDF on cleaned and lemmatized words
     • custom features (counts of nouns, cardinals, characters, punctuation marks, stopwords, and mean word length), scaled between 0 and 1 to match the TF-IDF scores
  2. Run a grid search that tests first a RandomForest classifier and then a Support Vector classifier with different parameters.

Other classifiers have been tested (LogisticRegression, AdaBoost, MultinomialNB, ...), but RandomForest and SVC provided the best scores. We decided to run train_classifier.py with only params2 or params3 in GridSearchCV() for speed; switch to params1 for a full grid search.
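
For orientation, here is a minimal sketch of the kind of pipeline build_model() assembles: a FeatureUnion of TF-IDF and scaled custom counts feeding a multi-output classifier inside a grid search. The TextStats class and the reduced parameter grid are illustrative placeholders, not the actual classes shipped in herokutils or the repository's params1/params2/params3.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import MinMaxScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier
from sklearn.model_selection import GridSearchCV

class TextStats(BaseEstimator, TransformerMixin):
    """Hypothetical custom transformer: simple per-message counts, scaled to [0, 1] later."""
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # character count and word count as two numeric features
        return np.array([[len(text), len(text.split())] for text in X])

pipeline = Pipeline([
    ("features", FeatureUnion([
        ("tfidf", TfidfVectorizer(stop_words="english")),
        ("stats", Pipeline([
            ("counts", TextStats()),
            ("scale", MinMaxScaler()),   # match the 0-1 range of TF-IDF scores
        ])),
    ])),
    ("clf", MultiOutputClassifier(RandomForestClassifier())),
])

# Reduced parameter grid in the spirit of params2/params3.
param_grid = {"clf__estimator__n_estimators": [50, 100]}
model = GridSearchCV(pipeline, param_grid, cv=3, verbose=2)
```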

The weighted average scores of the best estimator per category are printed to the console and saved to model/model_results.md.
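
A hypothetical way to produce such a per-category report with scikit-learn (illustrative only; the repository's reporting code may differ):

```python
from sklearn.metrics import classification_report

def report_per_category(Y_test, Y_pred, category_names):
    """Collect the weighted-average precision/recall/F1 for each category."""
    lines = []
    for i, name in enumerate(category_names):
        rep = classification_report(
            Y_test[:, i], Y_pred[:, i], output_dict=True, zero_division=0
        )
        avg = rep["weighted avg"]
        lines.append(
            f"| {name} | {avg['precision']:.2f} | {avg['recall']:.2f} | {avg['f1-score']:.2f} |"
        )
    return "\n".join(lines)
```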

The category child_alone has been removed from the dataset because it contains only one class.

Running the App

The app is hosted on Heroku here.

Because of the constraints of the free Heroku dynos, the model deployed with this app was trained on only 5,000 random samples from DisasterResponse.db in order to limit the size of the model.

To run the app locally, after generating the classifier, run: python app.py

Verify that the name of the classifier file on line 26 of app.py is correct.
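
For context, here is a minimal sketch of what the relevant part of app.py could look like, assuming the model is loaded with joblib near line 26; route names and template handling in the real file may differ.

```python
# Hypothetical sketch -- only the classifier path matters here; it must match
# the file written by train_classifier.py.
import joblib
from flask import Flask, request, jsonify

app = Flask(__name__)
model = joblib.load("model/classifier.pkl")  # <- verify this file name

@app.route("/classify")
def classify():
    query = request.args.get("query", "")
    prediction = model.predict([query])[0]
    return jsonify(prediction.tolist())

if __name__ == "__main__":
    app.run(debug=True)
```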

Deployment warning

If you intend to deploy the app, the choice of classifier may affect deployment on Heroku (for the free service). A large classifier file (such as the one produced by the RandomForest classifier) might require Git LFS tracking on GitHub and a dedicated buildpack integration on Heroku. For more info read this.

Project Structure

.
├── Pipfile
├── Pipfile.lock
├── Procfile
├── README.md
├── app.py
├── data
│   ├── DisasterResponse.db
│   ├── categories.csv
│   ├── messages.csv
│   ├── plot_data.py
│   └── process_data.py
├── model
│   ├── classifier.pkl
│   ├── model_results.md
│   └── train_classifier.py
├── nltk.txt
└── templates
    ├── go.html
    └── master.html

References

  1. Datasets from Figure Eight: https://appen.com/datasets/combined-disaster-response-data/
  2. Udacity Data Science Nanodegree: link
