This project was completed as part of the Udacity Data Science Nanodegree program.
Its goal is to correctly classify messages into disaster- and emergency-related categories.
The following sections describe how to install and run the application locally.
- Python 3.6+
- Web framework: Flask, gunicorn
- Data Visualization: plotly
- Database: SQLite (via SQLAlchemy)
- Data Preparation & Modelling: pandas, NLTK (averaged_perceptron_tagger, punkt, wordnet), scikit-learn
For a full list of requirements, please see the requirements.txt file.
To clone the project, use the following: git clone https://github.com/dagrewal/nlp-disaster-app.git
To install the dependencies using pip, make sure that you are in the root directory of the project. You can check this by running ls
in the terminal (or dir in the Windows command prompt) and confirming that requirements.txt is listed. Then install the dependencies with: pip install -r requirements.txt
Once everything has been installed correctly, start the application from the terminal (or command prompt) with gunicorn nlp-disaster-app:app
and navigate to localhost:8000 in your web browser.
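Taken together, the installation steps can be summarized as the terminal session below. The nltk.downloader line is one way to fetch the NLTK corpora listed under the dependencies (the package names are NLTK's download identifiers); it is not part of the original instructions, so skip it if your environment already has the data.

```shell
git clone https://github.com/dagrewal/nlp-disaster-app.git
cd nlp-disaster-app
pip install -r requirements.txt
# one-time download of the NLTK data used by the tokenizer
python -m nltk.downloader punkt averaged_perceptron_tagger wordnet
# serve the app on gunicorn's default port, 8000
gunicorn nlp-disaster-app:app
```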
If you experience any issues during the installation process, please log your issue using the Issues tab in the GitHub project.
There are two Python scripts, used to (a) read in and clean the raw data and store it in a SQLite database, and (b) prepare the data, engineer new features and train a multi-output supervised learning model on a training dataset.
Using the terminal (or command prompt):
- Navigate to
nlp-disaster-app/data
- Run
python process_data.py disaster_messages.csv disaster_categories.csv [insert_database_name].db
The script reads in the two .csv files, cleans the data as specified in the clean_data function, and stores the cleaned data in the database you named in the arguments above. The database is saved to the same folder as the script.
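As a rough illustration (not the actual script), the cleaning step might look like the sketch below. The clean_data function here is a simplified stand-in and the toy rows are invented; it assumes the categories column encodes labels as semicolon-separated name-0/1 pairs, as in the Figure Eight dataset.

```python
import sqlite3
import pandas as pd

def clean_data(messages, categories):
    """Merge messages with categories and expand the single
    'categories' string into one binary column per label."""
    df = messages.merge(categories, on="id")
    # e.g. "related-1;request-0" -> columns related=1, request=0
    expanded = df["categories"].str.split(";", expand=True)
    expanded.columns = [v.split("-")[0] for v in expanded.iloc[0]]
    for col in expanded.columns:
        expanded[col] = expanded[col].str[-1].astype(int)
    df = pd.concat([df.drop(columns="categories"), expanded], axis=1)
    return df.drop_duplicates()

# Tiny illustrative inputs (not the real Figure Eight data)
messages = pd.DataFrame({"id": [1, 2],
                         "message": ["help needed", "all fine"]})
categories = pd.DataFrame({"id": [1, 2],
                           "categories": ["related-1;request-1",
                                          "related-0;request-0"]})

df = clean_data(messages, categories)
# persist to SQLite, mirroring what process_data.py does with the
# database name passed on the command line ("example.db" is illustrative)
with sqlite3.connect("example.db") as conn:
    df.to_sql("messages", conn, index=False, if_exists="replace")
```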
Using the terminal (or command prompt):
- Navigate to
nlp-disaster-app/models
- Run
python train_classifier.py ../data/[insert_database_name].db [insert_saved_model_name]
The script prepares the data for training, engineers new features and trains a supervised learning model on the prepared training data, then saves the model as a .pkl file in the same folder as the script. For specifics on the feature engineering and model development, inspect the functions within the script.
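A minimal sketch of the training stage is shown below, with inline toy data in place of the database and scikit-learn's default tokenizer in place of the script's NLTK-based one. The example messages and the two label columns are invented for illustration.

```python
import pickle
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

# Toy training data: texts plus one binary target per category
# (e.g. columns: related, request)
X = ["water needed urgently", "send food please",
     "weather is nice today", "no help required"]
y = [[1, 1], [1, 1], [0, 0], [0, 0]]

# TF-IDF features feeding one classifier per output category
model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", MultiOutputClassifier(LogisticRegression(max_iter=1000))),
])
model.fit(X, y)

# Save the fitted pipeline, as train_classifier.py does
with open("classifier.pkl", "wb") as f:
    pickle.dump(model, f)
```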
Once everything has finished running (note that the train_classifier.py script will take a while), navigate to nlp-disaster-app,
run gunicorn nlp-disaster-app:app
and navigate to localhost:8000 in your web browser.
Readers are invited to improve the model's performance by engineering new features and trying different supervised learning models. They could also add more visualizations to the home page of the application.
This project is licensed under the MIT License: https://opensource.org/licenses/MIT
The data used for this project was provided by Figure Eight.