This project is part of the Data Science Nanodegree Program by Udacity, in collaboration with Figure Eight. The dataset contains pre-labelled tweets and messages from real-life disaster events. The aim of the project is to build a Natural Language Processing (NLP) model that categorizes messages in real time.
This project is divided into the following key sections:
- Process the data: build an ETL pipeline to extract data from the source, clean it, and save it in a SQLite database
- Build a machine learning pipeline to train a classifier that can assign text messages to various categories
- Run a web app that shows model results in real time
Clone this GIT repository:

```shell
git clone https://github.com/GuillaumeVerb/Disaster-Response-Pipelines.git
```
Run the following commands in the project's root directory to set up the database and model:
To run the ETL pipeline that cleans the data and stores it in the database:

```shell
python data/process_data.py data/messages.csv data/categories.csv data/DisasterResponse.db
```
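The ETL step described above can be sketched as follows. The tiny inline sample, the column names, and the `DisasterMessages` table name are assumptions for illustration; the real `process_data.py` reads the two CSV files and writes `data/DisasterResponse.db`.

```python
import sqlite3
import pandas as pd

# Tiny inline sample mimicking the assumed CSV layouts (the real inputs are
# data/messages.csv and data/categories.csv; column names are assumptions)
messages = pd.DataFrame({
    "id": [1, 2],
    "message": ["We need water and food", "Weather update - a cold front"],
    "genre": ["direct", "news"],
})
categories = pd.DataFrame({
    "id": [1, 2],
    "categories": ["related-1;request-1;water-1", "related-0;request-0;water-0"],
})

# Extract: merge the two sources on the shared id column
df = messages.merge(categories, on="id")

# Transform: split the packed categories string into one 0/1 column per category
cats = df["categories"].str.split(";", expand=True)
cats.columns = [value.split("-")[0] for value in cats.iloc[0]]
for col in cats.columns:
    cats[col] = cats[col].str[-1].astype(int)
df = pd.concat([df.drop(columns="categories"), cats], axis=1).drop_duplicates()

# Load: persist the cleaned table to SQLite (in-memory here to keep the
# sketch self-contained; the real script writes data/DisasterResponse.db)
conn = sqlite3.connect(":memory:")
df.to_sql("DisasterMessages", conn, index=False, if_exists="replace")
```

Splitting the single `categories` string into one binary column per category is what makes the data usable for a multi-output classifier downstream.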
To run the ML pipeline that trains the classifier and saves it:

```shell
python models/train_classifer.py data/DisasterResponse.db models/classifier.pkl
```
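A minimal sketch of such a machine learning pipeline is shown below. The inline training sample, the two category names, and the choice of TF-IDF features with a random forest are illustrative assumptions; the real `train_classifer.py` loads the cleaned table from `data/DisasterResponse.db` and writes `models/classifier.pkl`.

```python
import pickle
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

# Tiny inline training sample; category names (related, water) are assumptions
X = [
    "we need water",
    "send food and water please",
    "weather update today",
    "a cold front is coming",
]
y = [[1, 1], [1, 1], [0, 0], [0, 0]]  # one binary target per category

# Text features feed a multi-output classifier so each message can get
# several category labels at once
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", MultiOutputClassifier(
        RandomForestClassifier(n_estimators=10, random_state=0))),
])
pipeline.fit(X, y)

# Serialize the trained model as the project does (models/classifier.pkl);
# bytes in memory here to keep the sketch self-contained
blob = pickle.dumps(pipeline)
```

Wrapping the base estimator in `MultiOutputClassifier` fits one classifier per category column, which matches the multi-label nature of the disaster messages.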
Run the following command in the app's directory to start the web app:

```shell
python run.py
```
Go to http://0.0.0.0:3001/
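The core of the web app can be sketched as a minimal Flask route. The route name, the stub classifier, and the JSON response are illustrative assumptions, not the project's actual `run.py`, which loads the pickled pipeline and renders the `go.html` template.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# The real run.py loads the pickled pipeline from models/classifier.pkl;
# this stub classifier keeps the sketch self-contained
def classify(message):
    return {"water": int("water" in message.lower())}

@app.route("/go")
def go():
    # Mirrors the app's classification-result step: take a query, return labels
    query = request.args.get("query", "")
    return jsonify(query=query, categories=classify(query))

# In the real app this would be started with app.run(host="0.0.0.0", port=3001)
```

The real app renders HTML templates rather than JSON, but the flow is the same: read the user's message from the request, run it through the model, and return the predicted categories.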
```
- app
| - template
| |- master.html          # main page of web app
| |- go.html              # classification result page of web app
|- run.py                 # Flask file that runs the app
- data
|- categories.csv         # data to process
|- messages.csv           # data to process
|- process_data.py        # ETL pipeline that cleans data and stores it in the database
|- DisasterResponse.db    # database to save clean data to
- models
|- train_classifer.py     # ML pipeline that trains and saves the classifier
|- classifier.pkl         # saved model
- README.md
```
- Udacity for providing an amazing Data Science Nanodegree Program
- Figure Eight for providing the relevant dataset to train the model