catwhiskers/m5-prediction-accuracy

m5-prediction-accuracy

Objective

This repository demonstrates how to use Amazon SageMaker for machine learning tasks. The dataset is Walmart sales data. We use machine learning algorithms to forecast the unit sales of each item in each store for the next 28 days.
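To make the 28-day objective concrete, here is a minimal sketch of holding out the final 28 days of one item/store series for validation. The function name `split_forecast_horizon` and the toy series are hypothetical and are not taken from the notebooks in this repository.

```python
def split_forecast_horizon(daily_units, horizon=28):
    """Split one item/store daily sales series into history and a holdout window.

    daily_units: list of unit sales ordered by day (oldest first).
    horizon: number of trailing days to hold out (28 in this project).
    """
    if len(daily_units) <= horizon:
        raise ValueError("series must be longer than the forecast horizon")
    return daily_units[:-horizon], daily_units[-horizon:]


# Toy series of 100 days (made-up values, for illustration only).
history = list(range(100))
train, holdout = split_forecast_horizon(history)
print(len(train), len(holdout))  # 72 28
```

Holding out the same number of days as the forecast horizon keeps the validation setup aligned with the actual prediction task.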

Structure

Preparation

Open a SageMaker notebook. For reference, I ran the code in this repository on an ml.c5.2xlarge instance with a 5 GB EBS volume.

Data analysis and preprocessing

This notebook shows how to do data analysis and preprocessing on a SageMaker notebook. The required packages are pre-installed, so the notebook can be executed directly without provisioning a machine.

Performing training and prediction by a SageMaker built-in algorithm - xgboost

SageMaker provides many built-in algorithms that can be used directly; a list is available for reference. All algorithms are packaged as Docker images and hosted on ECR (Elastic Container Registry). In this notebook, we use xgboost 1.0-1 for training and inference.
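As a rough illustration of the built-in algorithm flow, the sketch below looks up the xgboost 1.0-1 image and launches a training job with the SageMaker Python SDK (v2). It is not the notebook's code: the hyperparameter values, the instance type, and the arguments `role_arn`, `train_s3_uri`, and `output_s3_uri` are assumptions chosen for illustration. The SDK is imported inside the function so the hyperparameter helper can be used without it.

```python
def xgboost_hyperparameters(num_round=100):
    """Return illustrative hyperparameters (assumed values, not tuned)."""
    return {
        "objective": "reg:squarederror",  # regression on unit sales
        "num_round": num_round,
        "max_depth": 6,
        "eta": 0.1,
        "subsample": 0.8,
    }


def train_builtin_xgboost(role_arn, train_s3_uri, output_s3_uri):
    """Launch a training job with the built-in xgboost 1.0-1 container.

    Requires the `sagemaker` package and AWS credentials; all three
    arguments are placeholders the caller must supply.
    """
    import sagemaker
    from sagemaker.estimator import Estimator
    from sagemaker.inputs import TrainingInput

    session = sagemaker.Session()
    # Resolve the ECR image URI of the built-in algorithm for this region.
    image_uri = sagemaker.image_uris.retrieve(
        "xgboost", session.boto_region_name, version="1.0-1"
    )
    estimator = Estimator(
        image_uri=image_uri,
        role=role_arn,
        instance_count=1,
        instance_type="ml.m5.xlarge",  # assumed instance type
        output_path=output_s3_uri,
        sagemaker_session=session,
    )
    estimator.set_hyperparameters(**xgboost_hyperparameters())
    # The built-in xgboost container accepts CSV input on the "train" channel.
    estimator.fit({"train": TrainingInput(train_s3_uri, content_type="text/csv")})
    return estimator
```

For CSV input, the built-in xgboost container expects the label in the first column and no header row.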

Performing training by your own algorithm

On SageMaker, you can also define and use your own algorithms. This notebook demonstrates how to bring your own container.
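For orientation, a custom training container is started by SageMaker with the `train` command. It finds its hyperparameters under `/opt/ml/input/config/hyperparameters.json` and its input channels under `/opt/ml/input/data/<channel>`, and it is expected to write model artifacts to `/opt/ml/model`. The sketch below shows a minimal entry point following that layout. The function name `run_training`, the `prefix` argument (added so it can be tried locally), and the placeholder "model" are hypothetical and not taken from the notebook.

```python
import json
from pathlib import Path


def run_training(prefix="/opt/ml"):
    """Minimal training entry point following the SageMaker container layout."""
    prefix = Path(prefix)

    # Hyperparameters arrive as a JSON object whose values are strings.
    hp_path = prefix / "input" / "config" / "hyperparameters.json"
    hyperparameters = json.loads(hp_path.read_text()) if hp_path.exists() else {}

    # Each input channel is mounted as a directory; "train" is an assumed name.
    train_dir = prefix / "input" / "data" / "train"
    train_files = sorted(p.name for p in train_dir.glob("*")) if train_dir.exists() else []

    # Placeholder "model": a real algorithm would fit on the training files here.
    model_dir = prefix / "model"
    model_dir.mkdir(parents=True, exist_ok=True)
    model_path = model_dir / "model.json"
    model_path.write_text(
        json.dumps({"hyperparameters": hyperparameters, "train_files": train_files})
    )
    return model_path
```

Everything written under the model directory is packaged by SageMaker and uploaded to S3 when the job finishes.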

SageMaker Experiments and Debugger

Efficient communication between data scientists is important. SageMaker Experiments lets the team share experiment information transparently. Moreover, experiment results can be easily reproduced, because the input and output artifacts are kept in AWS S3 and the hyperparameters, machine types, and algorithms used are recorded as well.

It is also important to have a mechanism to monitor experiments and detect problems early. For further troubleshooting, the critical metrics and/or tensors need to be recorded. SageMaker Debugger provides a mechanism for the team to monitor training jobs, and this notebook demonstrates how to use SageMaker Experiments and Debugger.
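As a hedged sketch of how the two features attach to a training job with the SageMaker Python SDK (v2): a training job is associated with an experiment through an `experiment_config` dictionary passed to `fit()`, and Debugger is enabled through a hook configuration and rules passed to the estimator. The helper names, the `"metrics"` collection, the save interval, and the choice of the `loss_not_decreasing` rule are assumptions for illustration, not the notebook's settings.

```python
def experiment_config(experiment_name, trial_name, display_name="Training"):
    """Build the dictionary that links a training job to an experiment trial."""
    return {
        "ExperimentName": experiment_name,
        "TrialName": trial_name,
        "TrialComponentDisplayName": display_name,
    }


def debugger_settings(s3_output_path):
    """Return a Debugger hook config and rule list (requires `sagemaker`)."""
    from sagemaker.debugger import (
        CollectionConfig,
        DebuggerHookConfig,
        Rule,
        rule_configs,
    )

    # Save the "metrics" collection every 10 steps (assumed interval).
    hook_config = DebuggerHookConfig(
        s3_output_path=s3_output_path,
        collection_configs=[
            CollectionConfig(name="metrics", parameters={"save_interval": "10"})
        ],
    )
    # Built-in rule that flags a training job whose loss stops improving.
    rules = [Rule.sagemaker(rule_configs.loss_not_decreasing())]
    return hook_config, rules
```

The hook config and rules would be passed to the estimator as `debugger_hook_config=` and `rules=`, and the dictionary to `fit(..., experiment_config=...)`. The named experiment and trial must already exist.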
