# Building your own Scikit container

Reference:  
https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/scikit_bring_your_own/scikit_bring_your_own.ipynb



Reasons to build your own container:  
- the specific version you need is not supported
- you need to configure specific dependencies
- you want to use a different training or hosting setup than the one provided




How Amazon SageMaker runs your training image:  
https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo-dockerfile.html

How Amazon SageMaker provides training information:  
https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo-running-container.html

Scikit-learn decision trees:  
https://scikit-learn.org/stable/modules/tree.html


## Training

/opt/ml  
├── input  
│   ├── config  
│   │   ├── hyperparameters.json  
│   │   └── resourceConfig.json  
│   └── data  
│       └── <channel_name>  
│           └── <_input data>  
├── model  
│   └── <model files>  
└── output  
    └── failure  
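
As a rough sketch of how a `train` program might use this layout (this is not the code from the referenced notebook): the function below reads `hyperparameters.json`, lists the files in one input channel, pickles whatever the supplied `fit` callable returns into `model/`, and writes the traceback to `output/failure` if anything goes wrong. The channel name `training`, the file name `model.pkl`, and the `fit` callable are assumptions made for illustration.

```python
import json
import pickle
import traceback
from pathlib import Path


def run_training(fit, prefix="/opt/ml", channel="training"):
    """Run one training job against the /opt/ml layout; return an exit code.

    `fit(files, params)` is a hypothetical callable that trains and returns
    a picklable model. `channel` and `model.pkl` are assumed names.
    """
    prefix = Path(prefix)
    try:
        # SageMaker writes hyperparameter values as strings; convert as needed.
        with open(prefix / "input" / "config" / "hyperparameters.json") as f:
            params = json.load(f)

        # Files provided for the channel appear under input/data/<channel>.
        data_dir = prefix / "input" / "data" / channel
        files = sorted(p for p in data_dir.iterdir() if p.is_file())
        if not files:
            raise ValueError(f"no input files found in {data_dir}")

        model = fit(files, params)

        # Files left in model/ are packaged as the job's model artifact.
        model_dir = prefix / "model"
        model_dir.mkdir(parents=True, exist_ok=True)
        with open(model_dir / "model.pkl", "wb") as f:
            pickle.dump(model, f)
        return 0
    except Exception:
        # The text written to output/failure describes why the job failed.
        output_dir = prefix / "output"
        output_dir.mkdir(parents=True, exist_ok=True)
        (output_dir / "failure").write_text(traceback.format_exc())
        return 1
```

In the container, `train` could end with something like `sys.exit(run_training(my_fit))`, where `my_fit` is your own function, so that a non-zero exit code marks the job as failed.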


## Hosting
/opt/ml  
└── model  
    └── <_model files>  
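
At hosting time the container only sees the model files, so the serving code has to load them itself. A minimal sketch, assuming the model was saved with `pickle` under the hypothetical name `model.pkl`:

```python
import pickle
from pathlib import Path


def load_model(model_dir="/opt/ml/model", filename="model.pkl"):
    """Load a pickled model from the hosting model directory.

    `filename` is an assumed name; use whatever the training step wrote.
    """
    path = Path(model_dir) / filename
    if not path.is_file():
        raise FileNotFoundError(f"model file not found: {path}")
    with open(path, "rb") as f:
        return pickle.load(f)
```

Unpickling can execute arbitrary code, so this approach only makes sense for artifacts produced by your own training job.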
    

## The Container

├── Dockerfile    
├── build_and_push.sh  
└── decision_trees  
    ├── nginx.conf   
    ├── predictor.py  
    ├── serve  
    ├── train   
    └── wsgi.py  


* __nginx.conf__ is the configuration file for the nginx front-end. Generally, you should be able to take this file as-is.
* __predictor.py__ is the program that implements the Flask web server and the decision tree predictions for this app. You'll want to customize the prediction parts for your application. Since this algorithm is simple, all the processing happens in this file, but you may choose to have separate files for implementing your custom logic.
* __serve__ is the program started when the container is started for hosting. It simply launches the gunicorn server, which runs multiple instances of the Flask app defined in predictor.py. You should be able to take this file as-is.
* __train__ is the program that is invoked when the container is run for training. You will modify this program to implement your training algorithm.
* __wsgi.py__ is a small wrapper used to invoke the Flask app. You should be able to take this file as-is.
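
To make the division of labour concrete, here is a stripped-down stand-in for `predictor.py`. It is written as a plain WSGI callable rather than a Flask app so that it runs without extra packages, and it is not the code from the referenced notebook. It answers the two requests SageMaker sends to a hosting container: `GET /ping` for health checks and `POST /invocations` for predictions. The names `make_app`, `load_model`, and `predict` are hypothetical.

```python
def make_app(load_model, predict):
    """Build a WSGI app with the /ping and /invocations routes.

    `load_model()` returns the model object; `predict(model, text)` takes the
    request body as text and returns the response body as text. Both are
    placeholders for your own code.
    """
    state = {}

    def get_model():
        # Load lazily and keep the model for later requests in this worker.
        if "model" not in state:
            state["model"] = load_model()
        return state["model"]

    def app(environ, start_response):
        method = environ["REQUEST_METHOD"]
        path = environ.get("PATH_INFO", "")

        if method == "GET" and path == "/ping":
            # Report healthy only if the model can actually be loaded.
            try:
                get_model()
                status, body = "200 OK", b"\n"
            except Exception:
                status, body = "503 Service Unavailable", b"\n"
        elif method == "POST" and path == "/invocations":
            length = int(environ.get("CONTENT_LENGTH") or 0)
            text = environ["wsgi.input"].read(length).decode("utf-8")
            status, body = "200 OK", predict(get_model(), text).encode("utf-8")
        else:
            status, body = "404 Not Found", b"not found\n"

        headers = [("Content-Type", "text/plain"),
                   ("Content-Length", str(len(body)))]
        start_response(status, headers)
        return [body]

    return app
```

With this shape, a `wsgi.py`-style wrapper only needs to expose `app = make_app(...)` at module level for gunicorn to import.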

import pandas as pd