---

<p><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/5/50/Oracle_logo.svg/2560px-Oracle_logo.svg.png" width="200" align = "left"></p>



## **<h1 align ="right"><b> Oracle CloudWorld - Las Vegas</b></h1>**

# **<h1 align ="middle"><b>NVIDIA Triton Inference Server, deploying GPT2</b></h1>**

Step-by-step instructions for deploying GPT2 on NVIDIA Triton Inference Server using OCI Data Science Model Deployment.

---

In [None]:
## Note: the steps below run in Frankfurt. For another region, change 'fra' in 'fra.ocir.io' to that region's key ('xx.ocir.io').

In [1]:
# Conda environment installed and used: Tensorflow28_p38_gpu_v1

# additional:
#!pip install transformers tf2onnx
#!pip install oracle-ads --upgrade
#!pip install oci --upgrade
#!pip install tensorflow --upgrade

import os
from transformers import GPT2Tokenizer, TFGPT2LMHeadModel
import oci
import ads

---

# **1. GPT2 Model**

## **1.1 Run Model Downloader and Vocab Downloader**

In [None]:
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = TFGPT2LMHeadModel.from_pretrained(
    "gpt2", from_pt=True, pad_token_id=tokenizer.eos_token_id
)
model.save_pretrained("./gpt2model", saved_model=True)

In [None]:
!mkdir -p ./vocab

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.save_vocabulary("./vocab")

## **1.2 Convert the TensorFlow model to ONNX format**

In [None]:
!mkdir -p ./converted_output

In [None]:
os.system("python -m tf2onnx.convert --saved-model ./gpt2model/saved_model/1 --opset 11  --output ./converted_output/model.onnx")
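
`os.system` only returns the exit status, so a failed conversion can go unnoticed in the notebook. A minimal sketch of the same call through `subprocess.run(..., check=True)`, which raises on a non-zero exit code. The helper names `build_tf2onnx_cmd` and `convert_to_onnx` are hypothetical; the flags and paths are the ones used in the cell above.

```python
import subprocess
import sys


def build_tf2onnx_cmd(saved_model_dir, output_path, opset=11):
    """Return the tf2onnx command as an argument list (no shell quoting needed)."""
    return [
        sys.executable, "-m", "tf2onnx.convert",
        "--saved-model", saved_model_dir,
        "--opset", str(opset),
        "--output", output_path,
    ]


def convert_to_onnx(saved_model_dir, output_path, opset=11):
    """Run the conversion; raises CalledProcessError if tf2onnx exits non-zero."""
    subprocess.run(build_tf2onnx_cmd(saved_model_dir, output_path, opset), check=True)


# Build (but do not run) the command for the paths used in this notebook.
cmd = build_tf2onnx_cmd("./gpt2model/saved_model/1", "./converted_output/model.onnx")
print(" ".join(cmd[1:]))
```

Calling `convert_to_onnx(...)` with the same two paths would perform the actual conversion, assuming `tf2onnx` is installed in the active conda environment.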

## **1.3 Test: load model.onnx and inspect its input and output layers**

In [6]:
# import onnx

# onnx_model = onnx.load("./converted_output/model.onnx")
# input_all = onnx_model.graph.input
# output = onnx_model.graph.output

# print(input_all)
# print("######################################################################################")
# print(output)


-----

## **1.4 Clone Git repo and model artifacts**

In [None]:
!git clone https://github.com/oracle-samples/oci-data-science-ai-samples.git
!mv /home/datascience/4_ocw_las_vegas_triton/oci-data-science-ai-samples/model-deployment/containers/Triton/gpt2_ensemble/gpt-pipeline /home/datascience/4_ocw_las_vegas_triton/

## **1.5 Copy merges.txt and vocab.json to the encoder and decoder folders**

In [None]:
!cp ./vocab/merges.txt ./gpt-pipeline/encoder/1/
!cp ./vocab/vocab.json ./gpt-pipeline/encoder/1/
!cp ./vocab/merges.txt ./gpt-pipeline/decoder/1/
!cp ./vocab/vocab.json ./gpt-pipeline/decoder/1/
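
The four `cp` commands above can also be expressed in Python, which fails loudly if a tokenizer file is missing. This is a minimal sketch; the helper name `copy_vocab` is hypothetical and the folder layout (`encoder/1`, `decoder/1`) is taken from the commands above.

```python
import shutil
from pathlib import Path

VOCAB_FILES = ("merges.txt", "vocab.json")


def copy_vocab(vocab_dir, pipeline_dir, steps=("encoder", "decoder"), version="1"):
    """Copy the tokenizer files into <pipeline_dir>/<step>/<version>/ for each step."""
    copied = []
    for step in steps:
        target = Path(pipeline_dir) / step / version
        target.mkdir(parents=True, exist_ok=True)
        for name in VOCAB_FILES:
            source = Path(vocab_dir) / name
            if not source.is_file():
                raise FileNotFoundError(f"missing tokenizer file: {source}")
            copied.append(Path(shutil.copy2(source, target / name)))
    return copied


# Example call for this notebook's folders:
# copy_vocab("./vocab", "./gpt-pipeline")
```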

## **1.6 Make dir and copy model.onnx**

In [None]:
!mkdir -p ./gpt-pipeline/gpt2/1
!cp ./converted_output/model.onnx ./gpt-pipeline/gpt2/1/
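
Before building the container it can help to check that the assembled folder follows the usual Triton model repository layout: one directory per model, each with a `config.pbtxt` and at least one numeric version subdirectory. The sketch below assumes every model in this pipeline ships its own `config.pbtxt`; the function name `check_model_repository` is hypothetical.

```python
from pathlib import Path


def check_model_repository(repo_dir):
    """Return a list of layout problems found in a Triton-style model repository.

    Expected layout: <repo>/<model>/config.pbtxt plus at least one
    numeric version directory such as <repo>/<model>/1/.
    """
    problems = []
    repo = Path(repo_dir)
    # Ignore hidden folders such as .ipynb_checkpoints.
    models = sorted(
        p for p in repo.iterdir() if p.is_dir() and not p.name.startswith(".")
    )
    if not models:
        problems.append(f"{repo}: no model directories found")
    for model in models:
        if not (model / "config.pbtxt").is_file():
            problems.append(f"{model.name}: missing config.pbtxt")
        versions = [p for p in model.iterdir() if p.is_dir() and p.name.isdigit()]
        if not versions:
            problems.append(f"{model.name}: no numeric version directory")
    return problems


# Example call for this notebook's folder:
# print(check_model_repository("./gpt-pipeline"))
```

An empty list means the layout looks complete; it does not validate the contents of `config.pbtxt` or the model files.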

---

# **2. Triton Inference Server**

## **2.1 Build Triton Server**

In [None]:
# Create a Linux compute instance, download its private key, and upload the key to this directory.

In [None]:
## Steps in the terminal

# restrict permissions on the private key
chmod 400 /home/datascience/4_ocw_las_vegas_triton/private_key.key

# ssh into the compute instance
ssh -i /home/datascience/4_ocw_las_vegas_triton/private_key.key opc@89.168.91.125

# install Docker on the compute instance
sudo yum install -y docker

In [None]:
# exit the ssh connection
exit

# copy docker file to compute
scp -i /home/datascience/4_ocw_las_vegas_triton/private_key.key -pr /home/datascience/4_ocw_las_vegas_triton/Dockerfile opc@89.168.91.125:/home/opc
scp -i /home/datascience/4_ocw_las_vegas_triton/private_key.key -pr /home/datascience/4_ocw_las_vegas_triton/entrypoint.sh opc@89.168.91.125:/home/opc

## **2.2 Build Docker image**

In [None]:
# Note: the image tag was changed from 1.0.0 to 1.1.0

In [None]:
#copy and run in terminal
ssh -i /home/datascience/4_ocw_las_vegas_triton/private_key.key opc@89.168.91.125 "docker build -t triton-server:1.1.0 . -f Dockerfile"

##### Output example (from the earlier 1.0.0 build):
###### Successfully tagged localhost/triton-server:1.0.0
###### ecdf0956040bf0a4b192d3ae072100adadc1e925260cf1be221d94dcb5740df5

## **2.3 Log in to OCIR with Docker**

In [None]:
## 1. Generate an Auth Token: in OCI, go to User Settings and create an Auth Token. The token is used as the password.
## 2. The user name for logging in to OCIR is <tenancy-namespace>/<username>. Example: oraseemeaanalytics/oracleidentitycloudservice/bob.peulen@oracle.com

In [None]:
# ssh into compute
ssh -i /home/datascience/4_ocw_las_vegas_triton/private_key.key opc@89.168.91.125

## Log in with your user name and Auth Token (replace <auth-token> with the token from step 1; never commit a real token)
#docker login fra.ocir.io
docker login -u 'frqap2zhtzbe/oracleidentitycloudservice/bob.peulen@oracle.com' --password '<auth-token>' fra.ocir.io

## **2.4 Create Container Registry**

In OCI, go to Container Registry and click "Create Repository". The steps below use the repository name "triton_inference_server".

## **2.5 Tag Docker image**

In [None]:
docker tag triton-server:1.1.0 fra.ocir.io/frqap2zhtzbe/triton_inference_server:latest

## **2.6 Push Docker image to OCIR**

In [None]:
docker push fra.ocir.io/frqap2zhtzbe/triton_inference_server:latest
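
The image reference used for tagging, pushing, and later in the deployment configuration follows the pattern `<region-key>.ocir.io/<tenancy-namespace>/<repository>:<tag>`. A small sketch for building it in one place, so the region key and repository name stay consistent across steps. The helper name `ocir_image` is hypothetical; the example values are the ones used in this notebook.

```python
def ocir_image(region_key, namespace, repository, tag="latest"):
    """Build '<region-key>.ocir.io/<namespace>/<repository>:<tag>'."""
    parts = {
        "region_key": region_key,
        "namespace": namespace,
        "repository": repository,
        "tag": tag,
    }
    for label, value in parts.items():
        # Reject empty values and values with spaces, which cannot form a valid reference.
        if not value or " " in value:
            raise ValueError(f"invalid {label}: {value!r}")
    return f"{region_key}.ocir.io/{namespace}/{repository}:{tag}"


image = ocir_image("fra", "frqap2zhtzbe", "triton_inference_server")
print(image)
```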

---

# **3. Store model in Model Catalog**

In [None]:
# Rename the artifact folder 'gpt-pipeline' to 'model_repository' before zipping, e.g. mv ./gpt-pipeline ./model_repository

In [None]:
!zip -r artifacts_gpt2_v5.zip ./model_repository/ 
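
The same archive can be produced from Python with the standard library, which also makes it easy to list what ended up in the zip before uploading it. A minimal sketch; the helper name `zip_model_repository` is hypothetical, and it keeps the top-level folder name inside the archive as `zip -r` does above.

```python
import shutil
import zipfile
from pathlib import Path


def zip_model_repository(repo_dir, archive_stem):
    """Zip repo_dir into <archive_stem>.zip, keeping the top-level folder name."""
    repo = Path(repo_dir).resolve()
    archive = shutil.make_archive(
        archive_stem, "zip", root_dir=repo.parent, base_dir=repo.name
    )
    # Read the entry names back so the caller can inspect the archive contents.
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
    return archive, names


# Example call for this notebook's folder:
# archive, names = zip_model_repository("./model_repository", "artifacts_gpt2_v5")
```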

# **4. Model Deployment**

In [None]:
## see steps here to deploy from UI: https://blogs.oracle.com/ai-and-datascience/post/oci-nvidia-triton-inference-server

In [None]:
from oci import data_science
from oci.data_science.models import (
    CreateModelDeploymentDetails, InstanceConfiguration, ModelConfigurationDetails,
    OcirModelDeploymentEnvironmentConfigurationDetails, UpdateModelConfigurationDetails,
)
# The image_digest can be found in the Container Registry

## **4.1 Define Model Deployment**

In [None]:
config = oci.config.from_file()

# Initialize service client with default config file
data_science_client = oci.data_science.DataScienceClient(config)

instance = InstanceConfiguration(instance_shape_name = "VM.GPU.A10.1")

# Create a model configuration details object
model_config_details = ModelConfigurationDetails(
    model_id= "ocid1.datasciencemodel.oc1.eu-frankfurt-1.amaaaaaangencdyagh4jvam2sc7omvrwhtle2t47yhspvobfeecadivetrcq",
    instance_configuration = instance
)

# Create the container environment configuration
environment_config_details = OcirModelDeploymentEnvironmentConfigurationDetails(
    environment_configuration_type="OCIR_CONTAINER",
    environment_variables={'CONTAINER_TYPE': 'TRITON'},
    image="fra.ocir.io/frqap2zhtzbe/triton_inference_server:latest",
    image_digest="sha256:ac88175fdc3e77db43cc382b65c1f93b242fa6d9947d074308714c0f2ddf9984",
    cmd=[
        "/entrypoint.sh",
        "/opt/ds/model/deployed_model",  # comma added: without it this string is concatenated with "None"
        "None",
        "5000"
    ],
    server_port=5000,
    health_check_port=5000
)

# create a model type deployment
single_model_deployment_config_details = data_science.models.SingleModelDeploymentConfigurationDetails(
    deployment_type="SINGLE_MODEL",
    model_configuration_details=model_config_details,
    environment_configuration_details=environment_config_details
)

#logging


# set up parameters required to create a new model deployment.
create_model_deployment_details = CreateModelDeploymentDetails(    
    display_name= "gpt2_triton",  
    model_deployment_configuration_details = single_model_deployment_config_details,
    compartment_id = "ocid1.compartment.oc1..aaaaaaaae3n6r6hrjipbap2hojicrsvkzatrtlwvsyrpyjd7wjnw4za3m75q",
    project_id = "ocid1.datascienceproject.oc1.eu-frankfurt-1.amaaaaaangencdyaik5ssdqk4as2bhldxprh7vnqpk7yycsm7vymd344cgua"
)


## **4.2 Create model deployment**

In [None]:
## create model deployment
create_model_deployment_response = data_science_client.create_model_deployment(
    create_model_deployment_details=create_model_deployment_details)
print(create_model_deployment_response.data)
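
Creating the deployment is asynchronous, so the endpoint is not usable until the deployment reaches the `ACTIVE` lifecycle state. The sketch below is a generic polling loop under that assumption; `wait_for_state` is a hypothetical helper, and the state source is passed in as a callable so the loop itself does not depend on the SDK.

```python
import time


def wait_for_state(get_state, target="ACTIVE", failed=("FAILED",),
                   interval_s=30.0, timeout_s=3600.0, sleep=time.sleep):
    """Poll get_state() until it returns `target`; raise on failure or timeout."""
    waited = 0.0
    while True:
        state = get_state()
        if state == target:
            return state
        if state in failed:
            raise RuntimeError(f"deployment entered state {state}")
        if waited >= timeout_s:
            raise TimeoutError(f"still {state} after {timeout_s} seconds")
        sleep(interval_s)
        waited += interval_s


# Assumed wiring to the client and response created above:
# deployment_id = create_model_deployment_response.data.id
# wait_for_state(
#     lambda: data_science_client.get_model_deployment(deployment_id).data.lifecycle_state
# )
```

The interval and timeout defaults are illustrative; GPU deployments can take a while to provision, so adjust them to your environment.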

----

# **5. Model Inference**

In [2]:
data = "Machine learning is a field of computer science"

In [3]:
import requests
import oci
from oci.signer import Signer

url = "https://modeldeployment.eu-frankfurt-1.oci.customer-oci.com/ocid1.datasciencemodeldeployment.oc1.eu-frankfurt-1.amaaaaaangencdyaevohj36x3qs4id5fcwhbehudd6kvpcipybxugq7gmzxa/predict"


config = oci.config.from_file("~/.oci/config")
auth = Signer(
   tenancy=config['tenancy'],
   user=config['user'],
   fingerprint=config['fingerprint'],
   private_key_file_location=config['key_file'],
   pass_phrase=config['pass_phrase'])


count = 0
max_gen_len = 10
gen_sentence = data

while count < max_gen_len:
    payload = {
            "inputs": [
                {
                    "name": "TEXT",
                    "datatype": "BYTES",
                    "shape": [1],
                    "data": [gen_sentence],
                }
            ]
        }

    headers = {"model_name": "ensemble_model", "model_version": "1"}

    ret = requests.post(
            url,
            json=payload,
            auth=auth,
            headers=headers
        )

    print(ret.status_code)
    res = ret.json()
    next_seq = str(res["outputs"][0]['data'][0])
    gen_sentence += " " + next_seq

    count += 1

print("Input Seq::", data)
print("Out Seq::", gen_sentence)


200
200
200
200
200
200
200
200
200
200
Input Seq:: Machine learning is a field of computer science
Out Seq:: Machine learning is a field of computer science that has been around for a long time . It
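
The loop above calls the endpoint once per generated token and appends the first output value to the sentence. A sketch of the same logic split into small functions, so it can be exercised without a live endpoint. The function names are hypothetical; the payload fields (`TEXT`, `BYTES`, shape `[1]`) and the response path (`outputs[0]["data"][0]`) are the ones used in the cell above.

```python
def build_payload(text):
    """Wrap one string as a single-element BYTES tensor named TEXT."""
    return {
        "inputs": [
            {"name": "TEXT", "datatype": "BYTES", "shape": [1], "data": [text]}
        ]
    }


def append_next_token(sentence, response_json):
    """Append the first output value of the response to the sentence."""
    outputs = response_json.get("outputs") or []
    if not outputs or not outputs[0].get("data"):
        raise ValueError(f"unexpected response: {response_json!r}")
    return sentence + " " + str(outputs[0]["data"][0])


def generate(post, prompt, max_new_tokens=10):
    """Call post(payload) -> dict repeatedly, adding one token per call."""
    sentence = prompt
    for _ in range(max_new_tokens):
        sentence = append_next_token(sentence, post(build_payload(sentence)))
    return sentence


# Assumed wiring to the request code above:
# post = lambda payload: requests.post(url, json=payload, auth=auth, headers=headers).json()
# print(generate(post, data))
```

Passing the HTTP call in as `post` keeps the token loop testable with a stub and makes it easy to add status checks or retries in one place.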


In [None]:
# documentation
# https://blogs.oracle.com/ai-and-datascience/post/llama2-oci-data-science-cloud-platform