
## **Prepare Environment**

**Prepare Account**

    IAM-role: arn:aws-cn:iam::055665198835:role/service-role/AmazonSageMaker-ExecutionRole-20220209T142703
    ECR-URL: 055665198835.dkr.ecr.cn-north-1.amazonaws.com.cn
    region: cn-north-1
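
The ECR URL above follows a fixed pattern built from the account ID and region, with China regions using the `amazonaws.com.cn` domain. A minimal sketch of that pattern follows; `ecr_registry` and `image_uri` are hypothetical helper names, not part of any AWS SDK.

```python
def ecr_registry(account_id: str, region: str) -> str:
    """Return the ECR registry host for an account and region."""
    # China regions (for example cn-north-1) use the amazonaws.com.cn domain.
    domain = "amazonaws.com.cn" if region.startswith("cn-") else "amazonaws.com"
    return f"{account_id}.dkr.ecr.{region}.{domain}"


def image_uri(account_id: str, region: str, repo_name: str, tag: str = "latest") -> str:
    """Return the full image URI in the {ECR-URL}/{repo_name}:{tag} form."""
    return f"{ecr_registry(account_id, region)}/{repo_name}:{tag}"


print(ecr_registry("055665198835", "cn-north-1"))
```

The same string is what the `docker tag`, `docker login`, and `docker push` steps expect as `{ECR-URL}`.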

**Local Environment**
- Set up a user: https://docs.amazonaws.cn/en_us/AmazonECR/latest/userguide/get-set-up-for-amazon-ecr.html

- Install the CLI: https://docs.amazonaws.cn/en_us/cli/latest/userguide/getting-started-install.html

- Configure the CLI: https://docs.amazonaws.cn/cli/latest/userguide/cli-configure-quickstart.html

- Access ECR from the CLI: https://docs.amazonaws.cn/AmazonECR/latest/userguide/getting-started-cli.html


**SageMaker Notebook Instance**

- https://docs.aws.amazon.com/sagemaker/latest/dg/adapt-training-container.html

**Reference**

- https://docs.aws.amazon.com/sagemaker/latest/dg/adapt-training-container.html

- https://github.com/aws/sagemaker-training-toolkit

- Notebook code

## **Prepare Docker**

**Prepare Code**

- Start original image

      docker run -it {image-v1}

- Prepare train code

      import tensorflow as tf

      # Load MNIST and scale pixel values to [0, 1].
      mnist = tf.keras.datasets.mnist
      (x_train, y_train), (x_test, y_test) = mnist.load_data()
      x_train, x_test = x_train / 255.0, x_test / 255.0

      # A small fully connected classifier.
      model = tf.keras.models.Sequential([
          tf.keras.layers.Flatten(input_shape=(28, 28)),
          tf.keras.layers.Dense(128, activation='relu'),
          tf.keras.layers.Dropout(0.2),
          tf.keras.layers.Dense(10, activation='softmax'),
      ])
      model.compile(optimizer='adam',
                    loss='sparse_categorical_crossentropy',
                    metrics=['accuracy'])
      model.fit(x_train, y_train, epochs=1)
      model.evaluate(x_test, y_test)

- Modify Parameter

  - Hyperparameter

        # A learning rate is fractional, so parse it as a float rather than an int.
        parser.add_argument("--learning-rate", type=float, default=0.001)
    
  - System parameter

      Data_path:

        # Set by SageMaker: SM_CHANNEL_TRAINING=/opt/ml/input/data/training
        # Set by SageMaker: SM_CHANNEL_TESTING=/opt/ml/input/data/testing
        parser.add_argument("--training", type=str, default=os.environ["SM_CHANNEL_TRAINING"])
        parser.add_argument("--testing", type=str, default=os.environ["SM_CHANNEL_TESTING"])

      Model_Dir:

        # Set by SageMaker: SM_MODEL_DIR=/opt/ml/model
        # Use an optional flag; a positional argument would ignore the default.
        parser.add_argument("--model_dir", type=str, default=os.environ["SM_MODEL_DIR"])

- Verify Code in Docker

- Put the code at: */opt/ml/code*
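
The parameter handling above can be gathered into one function in the training script. This is a minimal sketch, not the author's script: it assumes the learning rate is parsed as a float and the model directory is an optional `--model_dir` flag. `parse_args` and its `env` argument are hypothetical names, and the fallback paths are the SageMaker defaults listed above, used so the script also starts outside a SageMaker container.

```python
import argparse
import os


def parse_args(argv=None, env=None):
    """Parse hyperparameters and SageMaker system paths for the train script."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser()

    # Hyperparameter (assumed to be a float).
    parser.add_argument("--learning-rate", type=float, default=0.001)

    # Data channels: prefer the SM_CHANNEL_* variables, else the default paths.
    parser.add_argument("--training", type=str,
                        default=env.get("SM_CHANNEL_TRAINING", "/opt/ml/input/data/training"))
    parser.add_argument("--testing", type=str,
                        default=env.get("SM_CHANNEL_TESTING", "/opt/ml/input/data/testing"))

    # Where the trained model should be written.
    parser.add_argument("--model_dir", type=str,
                        default=env.get("SM_MODEL_DIR", "/opt/ml/model"))

    # Ignore extra arguments the script does not know about.
    args, _unknown = parser.parse_known_args(argv)
    return args


args = parse_args(["--learning-rate", "0.01"], env={})
print(args.learning_rate, args.training, args.model_dir)
```

Passing `env` explicitly makes it easy to verify the script inside the container before the image is committed.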

**Install Toolkit**

        pip3 install sagemaker-training

**Save Docker Image**

        docker commit {container-ID} {image-v2}
        


## **Build the Docker Image**

**Dockerfile**

    FROM {image-v2}
    ENV SAGEMAKER_PROGRAM {runfile}
    ENV PATH {PATH}
    # ENV PATH /opt/intel/oneapi/intelpython/latest/bin/libfabric:/opt/intel/oneapi/intelpython/latest/bin:/opt/intel/oneapi/intelpython/latest/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/spark-3.2.0-bin-hadoop3.2/bin:/usr/lib/jvm/java-8-openjdk-amd64/jre/bin

**Build the image**

    docker build -t {image-v3} .

**Tag the image**

        docker tag {image-v3} {ECR-URL}/{repo_name}:{tag}

**Push the image**

- Generate the ECR repo

        algorithm_name={repo_name}
        aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1
        if [ $? -ne 0 ]
        then
            aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
        fi

- Log in to ECR with Docker

        aws ecr get-login-password --region {region} | docker login --username AWS --password-stdin {ECR-URL}

  For example:

        aws ecr get-login-password --region cn-north-1 | docker login --username AWS --password-stdin 055665198835.dkr.ecr.cn-north-1.amazonaws.com.cn

- Push to ECR

        docker push {ECR-URL}/{repo_name}:{tag}
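
The create-if-missing check for the repository can also be written in Python. This is a sketch that assumes a boto3 ECR client is passed in; `ensure_repository` is a hypothetical helper name. Taking the client as an argument keeps the function free of any direct dependency on AWS credentials.

```python
def ensure_repository(ecr_client, repo_name):
    """Create the ECR repository if it is missing; return True if it was created."""
    try:
        # Raises RepositoryNotFoundException when the repository does not exist.
        ecr_client.describe_repositories(repositoryNames=[repo_name])
        return False
    except ecr_client.exceptions.RepositoryNotFoundException:
        ecr_client.create_repository(repositoryName=repo_name)
        return True


# Usage (requires boto3 and configured credentials):
#   import boto3
#   ensure_repository(boto3.client("ecr", region_name="cn-north-1"), "my-repo")
```

The return value tells a deployment script whether the repository was newly created.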

## **Test**

**Local Test**

    from sagemaker.estimator import Estimator

    estimator = Estimator(
        image_uri={image-v3},
        role={IAM-role},
        instance_count=1,
        instance_type='local',
    )
    estimator.fit()

**Online Test**

- Push the data to S3 bucket

        import sagemaker

        sagemaker_session = sagemaker.session.Session()
        bucket = sagemaker_session.default_bucket()
        s3_folder = "hydroai-integration/TwitterRecSys2021Dataset_sample"
        print(sagemaker_session.upload_data("TwitterRecSys2021Dataset_sample", bucket, s3_folder))

- Test

        import sagemaker
        import json

        def json_encode_hyperparameters(hyperparameters):
            return {str(k): json.dumps(v) for (k, v) in hyperparameters.items()}
        hyperparameters = json_encode_hyperparameters({
          # Substitute your own token at run time; do not commit a real one.
          "sigopt_api_token": "{sigopt-api-token}",
          "observation_budget": 1, 
          "enable_sigopt": True})

        est = sagemaker.estimator.Estimator(
            image_uri={ECR-URL}/{repo_name}:{tag},
            role={IAM-role},
            instance_count=1,
            instance_type='ml.m5.4xlarge',
            base_job_name="hydroai-integration-recsys-test",
            hyperparameters=hyperparameters,
        )
            
        est.fit({"training": {S3-URL}})
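
Because `json_encode_hyperparameters` turns every value into a JSON string, the training script has to decode the values before using them. The sketch below shows the round trip. `json_decode_hyperparameters` is a hypothetical helper, and reading the token from a `SIGOPT_API_TOKEN` environment variable is an assumption made here to keep the secret out of the notebook. How the encoded values reach the script depends on the container's entry point.

```python
import json
import os


def json_encode_hyperparameters(hyperparameters):
    """Encode every value as a JSON string, as in the estimator setup."""
    return {str(k): json.dumps(v) for k, v in hyperparameters.items()}


def json_decode_hyperparameters(encoded):
    """Hypothetical inverse: restore the original Python values."""
    return {k: json.loads(v) for k, v in encoded.items()}


encoded = json_encode_hyperparameters({
    # Assumed variable name; an empty string is used if it is not set.
    "sigopt_api_token": os.environ.get("SIGOPT_API_TOKEN", ""),
    "observation_budget": 1,
    "enable_sigopt": True,
})
print(encoded["enable_sigopt"])  # prints: true (a string, not a Python bool)

decoded = json_decode_hyperparameters(encoded)
print(decoded["enable_sigopt"] is True)  # prints: True
```

Without the decode step, a check such as `if enable_sigopt:` would always pass, because the non-empty string `"false"` is truthy.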