# Setup for Fine-tuning Korean ReRanker using Amazon SageMaker
- Container: conda_python3
- We recommend Python 3.10 or later.
- Version check: `!python -V`
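
The same check can be run inside the kernel, which confirms the interpreter the notebook actually uses rather than whichever `python` is first on the shell path. This is a minimal sketch; the helper name `meets_minimum` is hypothetical and not part of the original notebook.

```python
import sys

MIN_PYTHON = (3, 10)  # recommended minimum from the list above


def meets_minimum(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


status = "OK" if meets_minimum() else "older than the recommended 3.10"
print("Python", sys.version.split()[0], status)
```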

## 1. Install python SDK
- **The notebook kernel restarts after the packages are installed.**

In [15]:
install_needed = True

In [16]:
import sys
import IPython

if install_needed:
    print("installing deps and restarting kernel")
    !sudo curl -L "https://github.com/docker/compose/releases/download/v2.7.0/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
    !sudo chmod +x /usr/local/bin/docker-compose
    !{sys.executable} -m pip install -U pip
    !{sys.executable} -m pip install -U awscli
    !{sys.executable} -m pip install -U botocore
    !{sys.executable} -m pip install -U boto3
    !{sys.executable} -m pip install -U sagemaker 
    !{sys.executable} -m pip install -U termcolor
    !{sys.executable} -m pip install -U transformers
    !{sys.executable} -m pip install -U datasets
    !{sys.executable} -m pip install -U sentencepiece
    !{sys.executable} -m pip install -U FlagEmbedding

    IPython.Application.instance().kernel.do_shutdown(True)

installing deps and restarting kernel
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 24.5M  100 24.5M    0     0   125M      0 --:--:-- --:--:-- --:--:--  125M
Collecting pip
  Downloading pip-23.3.2-py3-none-any.whl.metadata (3.5 kB)
Downloading pip-23.3.2-py3-none-any.whl (2.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 58.7 MB/s eta 0:00:00
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 23.3.1
    Uninstalling pip-23.3.1:
      Successfully uninstalled pip-23.3.1
Successfully installed pip-23.3.2
Collecting awscli
  Downloading awscli-1.32.6-py3-none-any.whl.metadata (11 kB)
Collecting botocore==1.34.6 (from awscli)
  Downloading botocore-1.34.6-py3-none-any.whl.metadata 

## 2. Building serving image
- The fine-tuned reranker model is served with the AWS `HuggingFace Inference Containers`.
    - For details on the native Deep Learning Containers (DLC), see this [link](https://github.com/aws/deep-learning-containers/blob/master/available_images.md).
- Serving works reliably only with `transformers >= 4.36.2`, but the native container ships transformers 4.28.1.
- For that reason, this example serves the model from a custom container image.
- **[Important] Using ECR requires the `AmazonEC2ContainerRegistryFullAccess` permission.**
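
To see whether a given environment already satisfies the version requirement, the installed `transformers` version can be compared against 4.36.2. This is a minimal sketch using only the standard library; `parse_version` and `transformers_is_recent_enough` are hypothetical helpers, and the parsing deliberately ignores pre-release suffixes such as `.dev0` or `rc1`.

```python
import re
from importlib import metadata

REQUIRED = (4, 36, 2)  # minimum transformers version stated above


def parse_version(text):
    """Turn '4.36.2' or '4.37.0.dev0' into a tuple of its leading integers."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def transformers_is_recent_enough(installed):
    """Return True if the version string is at least REQUIRED."""
    return parse_version(installed) >= REQUIRED


try:
    installed = metadata.version("transformers")
    verdict = "OK" if transformers_is_recent_enough(installed) else "too old for serving"
    print("transformers", installed, verdict)
except metadata.PackageNotFoundError:
    print("transformers is not installed in this environment")
```

Running the same check inside the serving container, rather than in the notebook kernel, is what tells you whether the custom image is needed.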

In [9]:
import boto3
from utils.ecr import ecr_handler

In [10]:
%%writefile src/serving/Dockerfile-serving

FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:2.0.0-transformers4.28.1-gpu-py310-cu118-ubuntu20.04
RUN pip install -U pip
RUN pip install -U botocore
RUN pip install -U awscli
RUN pip install -U boto3
RUN pip install -U sagemaker
RUN pip install -U transformers
ENV PYTHONUNBUFFERED=TRUE

Overwriting src/serving/Dockerfile-serving


In [11]:
build_image = True

### **[Caution]** Do not change the region and account ID in the code below
`ecr.build_docker(docker_dir, dockerfile, repository_name, strRegionName="us-east-1", strAccountId="763104351884")`
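
The fixed values appear to identify the registry that hosts the base image in the `FROM` line, not your own account, which is why they stay the same even when your image is pushed elsewhere. The sketch below makes that visible by splitting the base image URI into its parts; `split_ecr_uri` and the regular expression are illustrative assumptions, not part of `utils.ecr`.

```python
import re

# Base image taken from the FROM line of Dockerfile-serving.
BASE_IMAGE = (
    "763104351884.dkr.ecr.us-east-1.amazonaws.com/"
    "huggingface-pytorch-inference:2.0.0-transformers4.28.1-gpu-py310-cu118-ubuntu20.04"
)

# <account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>
ECR_URI = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repository>[^:]+):(?P<tag>.+)$"
)


def split_ecr_uri(uri):
    """Split an ECR image URI into account, region, repository, and tag."""
    match = ECR_URI.match(uri)
    if match is None:
        raise ValueError(f"not an ECR image URI: {uri}")
    return match.groupdict()


base = split_ecr_uri(BASE_IMAGE)
print(base["account"], base["region"])  # -> 763104351884 us-east-1
```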

In [12]:
if build_image:

    ecr = ecr_handler()
    region = boto3.Session().region_name
    account_id = boto3.client("sts").get_caller_identity().get("Account")

    repository_name = "ko-reranker-serve"  # <-- set the Docker repository name you want
    repository_name = repository_name.lower()
    dockerfile = "Dockerfile-serving"
    docker_dir = "./src/serving/"
    tag = "latest"

    ecr.build_docker(docker_dir, dockerfile, repository_name, strRegionName="us-east-1", strAccountId="763104351884")
    ecr_repository_uri = ecr.register_image_to_ecr(region, account_id, repository_name, tag)
    
else:
    ecr_repository_uri = "<your ecr repo uri>" #"419974056037.dkr.ecr.us-east-1.amazonaws.com/ko-reranker-serve"

/home/ec2-user/SageMaker/fine-tune-reranker-kr
/home/ec2-user/SageMaker/fine-tune-reranker-kr/src/serving
strDockerFile Dockerfile-serving
aws ecr get-login --region 'us-east-1' --registry-ids '763104351884' --no-include-email


https://docs.docker.com/engine/reference/commandline/login/#credentials-store



Login Succeeded

Sending build context to Docker daemon  3.584kB

Step 1/8 : FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:2.0.0-transformers4.28.1-gpu-py310-cu118-ubuntu20.04
 ---> 561ff95be4ef
Step 2/8 : RUN pip install -U pip
 ---> Using cache
 ---> 3b6307422d72
Step 3/8 : RUN pip install -U botocore
 ---> Running in 22c930f916be
Collecting botocore
  Downloading botocore-1.34.6-py3-none-any.whl.metadata (5.6 kB)
Downloading botocore-1.34.6-py3-none-any.whl (11.9 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.9/11.9 MB 82.5 MB/s eta 0:00:00
Installing collected packages: botocore
  Attempting uninstall: botocore
    Found existing installation: botocore 1.31.9
    Uninstalling botocore-1.31.9:
      Successfully uninstalled botocore-1.31.9
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
awscli 1.29.9 requires botoco

https://docs.docker.com/engine/reference/commandline/login/#credentials-store



Login Succeeded

aws ecr create-repository --repository-name 'ko-reranker-serve'
docker tag 'ko-reranker-serve:latest' '419974056037.dkr.ecr.us-east-1.amazonaws.com/ko-reranker-serve:latest'
docker push '419974056037.dkr.ecr.us-east-1.amazonaws.com/ko-reranker-serve:latest'
== REGISTER AN IMAGE TO ECR ==
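
The last lines of the log show the registration step as three shell commands: create the repository, tag the local image, and push it. The sketch below only reconstructs those command strings from their inputs so the naming pattern is easy to follow. It is not the implementation of `ecr_handler.register_image_to_ecr`, which is not shown here; the function names are hypothetical and the account ID is a placeholder.

```python
def target_image_uri(account_id, region, repository_name, tag="latest"):
    """Build the ECR URI the image is pushed to, following the log's pattern."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository_name}:{tag}"


def registration_commands(account_id, region, repository_name, tag="latest"):
    """Return the three commands seen in the log, in the order they appear."""
    uri = target_image_uri(account_id, region, repository_name, tag)
    return [
        f"aws ecr create-repository --repository-name '{repository_name}'",
        f"docker tag '{repository_name}:{tag}' '{uri}'",
        f"docker push '{uri}'",
    ]


# Placeholder account ID; substitute your own account and region.
for command in registration_commands("123456789012", "us-east-1", "ko-reranker-serve"):
    print(command)
```

The resulting URI has the same form as the value expected in `ecr_repository_uri` when `build_image` is `False`.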
