trthatcher/Mahalangur

Mahalangur: Summiting the Himalayas

Mahalangur is a small Python data science project that demonstrates how to build scikit-learn models within a replicable project structure and how to create a simple web API and visualization with Flask. The project structure is largely based on the Cookiecutter Data Science template and is outlined in the Project Organization section below.

Visualization Demo

A demo of the web interface for Mahalangur is hosted on PythonAnywhere.

Acknowledgement

This project is named after the Mahalangur Himal, a section of the Himalayas that contains four of the six tallest mountains in the world, including Mount Everest. The expedition data is sourced from the Himalayan Database.

License

This project is ISC licensed. However, the climb data is sourced from the Himalayan Database. Please contact its maintainers if you wish to use the data for anything other than personal use.

Project Organization

The repository is a Python project using the following folder structure:

Mahalangur
├── LICENSE
├── README.md          <- The top-level README for developers using this project
├── Makefile           <- Makefile with commands like `make install_requirements`
│
├── mahalangur
│   ├── assets         <- Serialized models that are to be distributed with the
│   │                     package
│   │
│   ├── data           <- Code for downloading or generating raw data
│   │   ├── metadata   <- Static metadata such as code tables
│   │   └── sql        <- Database definitions for SQLite datastore
│   │
│   ├── feat           <- Code to turn raw data into features for modelling
│   ├── web            <- Flask API and web visualization code
│   └── rfmodel.py     <- Code for training the model
│
├── notebooks          <- Jupyter notebooks used for exploring/analysing the data and
│                         for prototyping models
│
└── references         <- Data dictionaries, manuals and other explanatory materials

Usage

Installation

Clone Mahalangur to a folder of your choice:

git clone https://github.com/trthatcher/Mahalangur.git
cd Mahalangur

Next, create the mahalangur conda environment and install the requirements:

make environment
conda activate mahalangur
make install_requirements

This will install the package and its dependencies.

Getting the Latest Data

By default, this package downloads training data to a .mahalangur directory in your home directory. You can override this location by setting the MAHALANGUR_HOME environment variable to a directory of your choosing. The Mahalangur data directory is laid out as follows:

.mahalangur
│
├── data
│   ├── raw            <- Raw data is downloaded to this directory
│   ├── processed      <- Processed data is stored in this directory
│   └── mahalangur.db  <- This database is created to store the processed data
│
├── metadata           <- Processed metadata is output here
│
└── models             <- Serialized models are output here
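
To make the override rule concrete, here is a minimal Python sketch of how the data directory could be resolved. The helper name `resolve_data_home` is hypothetical and is not taken from the package; the sketch also assumes that an empty MAHALANGUR_HOME is treated the same as an unset one.

```python
import os
from pathlib import Path


def resolve_data_home(env=None):
    """Return the Mahalangur data directory (hypothetical helper).

    Uses MAHALANGUR_HOME when it is set and non-empty; otherwise falls
    back to ~/.mahalangur, the default location described above.
    """
    env = os.environ if env is None else env
    override = env.get("MAHALANGUR_HOME")
    if override:
        # Expand a leading "~" so values like "~/everest" also work.
        return Path(override).expanduser()
    return Path.home() / ".mahalangur"


if __name__ == "__main__":
    home = resolve_data_home()
    # Sub-paths follow the layout shown in the tree above.
    print(home / "data" / "raw")
    print(home / "data" / "mahalangur.db")
    print(home / "models")
```

Running the script with and without MAHALANGUR_HOME set shows where the raw data, the SQLite database, and the serialized models would be expected under this assumption.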

To download the latest version of the Himalayan Database, run the following command in the terminal:

make dataset

This will populate the .mahalangur/data directory with updated extracts and transfer them into the mahalangur.db SQLite database. An updated model can be created by running:

make model_rf

The model will be stored in the .mahalangur/models directory. To update the model used by the package itself, copy the new model file into the package's mahalangur/assets directory.
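
The transfer step can be done by hand or scripted. The following is a rough sketch of one way to script it; `copy_model_to_assets` is a hypothetical helper that does not exist in the package, and the name of the serialized model file is not specified here, so the caller has to supply the actual path.

```python
import shutil
from pathlib import Path


def copy_model_to_assets(model_path, assets_dir):
    """Copy one serialized model file into an assets directory.

    Hypothetical helper: both paths are supplied by the caller, because
    the model file name is an assumption and not documented above.
    """
    model_path = Path(model_path)
    assets_dir = Path(assets_dir)
    if not model_path.is_file():
        raise FileNotFoundError(f"No model file at {model_path}")
    # Create the target directory if it does not exist yet.
    assets_dir.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves file metadata such as modification time.
    return Path(shutil.copy2(model_path, assets_dir / model_path.name))
```

From the repository root, a call might look like `copy_model_to_assets(Path.home() / ".mahalangur" / "models" / "<model file>", "mahalangur/assets")`, where `<model file>` is a placeholder for whatever file `make model_rf` produced. Whether the package picks up the copied file depends on the file name it expects, which this sketch does not check.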

Starting the API

Once the package has been cloned and installed, you can run the web visualization locally with the following command:

make api