This repository demonstrates how to use AWS SageMaker for machine learning tasks. The dataset is Walmart sales data. We use machine learning algorithms to predict the unit sales of each item in each store over the next 28 days.
Open a SageMaker notebook. FYI: I use an ml.c5.2xlarge instance with 5 GB of EBS storage to run the code in this repository.
This notebook shows how to do data analysis and preprocessing on a SageMaker notebook. The packages are pre-installed, so we can run the notebook directly without provisioning a machine.
On SageMaker, many built-in algorithms can be used directly. There is a list for reference. All algorithms are managed as Docker images and hosted on ECR (Elastic Container Registry). In this notebook, we use xgboost 1.0-1 for training and inference.
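As a small illustration of feeding data to the built-in XGBoost algorithm: for CSV training input it expects the target in the first column and no header row. The sketch below serializes records in that layout. The function name `to_xgboost_csv` and the column names are hypothetical and chosen for illustration only; they are not taken from the notebook.

```python
import csv
import io


def to_xgboost_csv(records, target, features):
    """Serialize dict records as header-less CSV with the target in column 0."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for record in records:
        # Target first, then the features in a fixed order; no header row is written.
        writer.writerow([record[target]] + [record[name] for name in features])
    return buffer.getvalue()


# Hypothetical rows; the column names are illustrative, not the notebook's.
rows = [
    {"units_sold": 3, "sell_price": 2.5, "day_of_week": 1},
    {"units_sold": 0, "sell_price": 2.5, "day_of_week": 2},
]
csv_text = to_xgboost_csv(rows, target="units_sold", features=["sell_price", "day_of_week"])
```

The resulting text can be written to a file and uploaded to S3 as the training channel. Keeping the feature order fixed matters, because the same column order has to be used again at inference time.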
On SageMaker, you can also define your own algorithms. This notebook demonstrates how to bring your own container.
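To make the idea of bringing your own container more concrete, here is a minimal sketch of a training entry point that follows SageMaker's container layout: hyperparameters are read from `/opt/ml/input/config/hyperparameters.json`, input channels are mounted under `/opt/ml/input/data/<channel>`, and anything written to `/opt/ml/model` is saved as the model artifact. The `max_depth` hyperparameter, the `train` channel name, and the placeholder "model" file are assumptions for illustration and may differ from what the notebook does.

```python
import json
import sys
from pathlib import Path


def train(prefix="/opt/ml"):
    """Run one training job using SageMaker's container directory layout."""
    root = Path(prefix)

    # SageMaker writes the job's hyperparameters here; every value arrives as a string.
    hp_path = root / "input" / "config" / "hyperparameters.json"
    hyperparameters = json.loads(hp_path.read_text()) if hp_path.exists() else {}
    max_depth = int(hyperparameters.get("max_depth", "6"))  # hypothetical hyperparameter

    # Each input channel is mounted under input/data/<channel_name>.
    train_dir = root / "input" / "data" / "train"  # channel name is an assumption
    files = sorted(p.name for p in train_dir.glob("*")) if train_dir.exists() else []

    # Placeholder "model": a real script would fit and serialize an actual model here.
    model_dir = root / "model"
    model_dir.mkdir(parents=True, exist_ok=True)
    model_path = model_dir / "model.json"
    model_path.write_text(json.dumps({"max_depth": max_depth, "trained_on": files}))
    return model_path


if __name__ == "__main__":
    # SageMaker starts a training container with the single argument "train".
    if len(sys.argv) > 1 and sys.argv[1] == "train":
        train()
```

The `prefix` parameter exists only so the script can be exercised locally against a temporary directory before the image is built and pushed to ECR.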
Efficient communication between data scientists is important. SageMaker Experiments lets the team share experiment information transparently. Moreover, experiment results can be easily reproduced, since the input/output artifacts are kept in AWS S3 and the hyperparameters, machine types, and algorithms used are recorded as well.
It is also important to have a mechanism to monitor experiments and detect problems early. For further troubleshooting, it is necessary to record the critical metrics and/or tensors. SageMaker Debugger provides a mechanism for the team to monitor training jobs, and this notebook demonstrates how to use SageMaker Experiments and Debugger.