# Deploying Granite Code models in Amazon SageMaker

---

This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook.

![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-west-2/inference|generativeai|huggingfacetgi|ibm-granite|granite-code-instruct.ipynb)

---

The IBM Granite Code models are a family of high-performance, foundational language models pre-trained on over 3 trillion tokens of code and natural language data across 116 programming languages. These models range from 3 billion to 34 billion parameters and come in base and instruction-following variants.

What sets the Granite Code models apart is their strong performance on a wide range of code intelligence tasks such as code generation, translation, analysis, and refactoring, where they often outperform larger open-source models.

IBM has released the Granite Code models as open source under the permissive Apache 2.0 license, which allows both research and commercial use. The models are available on [Hugging Face](https://huggingface.co/ibm-granite).

[Hugging Face](https://huggingface.co/) is a popular open source hub for machine learning (ML) models. AWS and Hugging Face have a partnership that allows a seamless integration through SageMaker with a set of AWS Deep Learning Containers (DLCs) for training and inference in PyTorch or TensorFlow, and Hugging Face estimators and predictors for the SageMaker Python SDK. SageMaker features and capabilities help developers and data scientists get started with natural language processing (NLP) on AWS with ease.

In this notebook, we deploy a Granite Code model on Amazon SageMaker to accelerate legacy code conversion and modernisation use cases.

## Deploying Granite Code models in Amazon SageMaker

### Prepare the environment

In [None]:
!pip install -U sagemaker -q

In [None]:
import json
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

sagemaker_session = sagemaker.Session()
account_id = sagemaker_session.account_id()
role = sagemaker.get_execution_role()
region = sagemaker_session.boto_region_name

# Pinned Hugging Face TGI container image (TGI 2.0.3) for the Granite models
image_uri = "763104351884.dkr.ecr.{}.amazonaws.com/huggingface-pytorch-tgi-inference:2.3.0-tgi2.0.3-gpu-py310-cu121-ubuntu22.04-v2.0".format(
    region
)

# print ecr image uri
print(f"llm image uri: {image_uri}")

### Create the SageMaker model

Link to the Granite Code models in HuggingFace: https://huggingface.co/ibm-granite

Next, we configure the model object by specifying the `image_uri` of the managed TGI container, the execution role for the endpoint, and a set of environment variables. The `HF_MODEL_ID` variable identifies the model on the Hugging Face Hub that will be deployed.

You should also define `SM_NUM_GPUS`, which specifies the tensor parallelism degree of the model. Tensor parallelism splits the model across multiple GPUs, which is necessary when working with LLMs that are too big for a single GPU. Set `SM_NUM_GPUS` to the number of GPUs available on your selected instance type. In this tutorial, we set it to 4 because the selected instance type, ml.g5.12xlarge, has 4 GPUs.

The `HuggingFaceModel` object ties together the container image, the environment variables, and the role. When the endpoint starts, the TGI container downloads the Granite model weights from the Hugging Face Hub and serves them from a SageMaker inference endpoint.

In [None]:
# sagemaker config
instance_type = "ml.g5.12xlarge"
number_of_gpu = 4
health_check_timeout = 600

# Hub model configuration
hub = {
    "HF_MODEL_ID": "ibm-granite/granite-20b-code-instruct",  # model ID on the Hugging Face Hub
    # ml.g5.12xlarge has 4 GPUs, so the model weights are sharded across 4 GPUs.
    # If you test on ml.g5.2xlarge (1 GPU), set this to 1.
    "SM_NUM_GPUS": json.dumps(number_of_gpu),
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    image_uri=image_uri,
    env=hub,
    role=role,
)
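
As a minimal sketch of the configuration above, the tensor parallelism degree can be looked up from the instance type instead of being hard-coded. The `GPUS_PER_INSTANCE` table and the `build_tgi_env` helper are hypothetical names introduced for illustration; they are not part of the SageMaker SDK, and the table covers only a few `ml.g5` sizes.

```python
import json

# Hypothetical lookup table (not part of the SageMaker SDK): GPUs per instance type.
GPUS_PER_INSTANCE = {
    "ml.g5.2xlarge": 1,
    "ml.g5.12xlarge": 4,
    "ml.g5.48xlarge": 8,
}


def build_tgi_env(model_id, instance_type):
    """Return the container environment for a given model and instance type."""
    if instance_type not in GPUS_PER_INSTANCE:
        raise ValueError(f"Unknown GPU count for instance type: {instance_type}")
    return {
        "HF_MODEL_ID": model_id,
        # Environment variable values must be strings, hence json.dumps.
        "SM_NUM_GPUS": json.dumps(GPUS_PER_INSTANCE[instance_type]),
    }


env = build_tgi_env("ibm-granite/granite-20b-code-instruct", "ml.g5.12xlarge")
print(env)
```

Failing loudly on an unknown instance type avoids silently deploying with the wrong degree of parallelism.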

After creating the HuggingFaceModel, we deploy it to Amazon SageMaker using the deploy method. We use the ml.g5.12xlarge instance type, which has 4 NVIDIA A10G GPUs and 96 GB of GPU memory in total.

In [None]:
# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    container_startup_health_check_timeout=health_check_timeout,
)

SageMaker will now create our endpoint and deploy the model to it. This can take 10-15 minutes.
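
To see why a multi-GPU instance is a reasonable choice for the 20B model, a rough estimate helps. The sketch below assumes the weights are held in 16-bit precision (2 bytes per parameter) and ignores the KV cache, activations, and framework overhead, so it gives a lower bound rather than a sizing guarantee. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_parameters, bytes_per_parameter=2):
    """Approximate memory for the model weights alone, in GB (10**9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9


params_20b = 20e9     # roughly 20 billion parameters
gpu_memory_gb = 24    # one NVIDIA A10G
num_gpus = 4          # ml.g5.12xlarge

weights_gb = weight_memory_gb(params_20b)
fits_one_gpu = weights_gb <= gpu_memory_gb
fits_all_gpus = weights_gb <= gpu_memory_gb * num_gpus

print(f"weights: ~{weights_gb:.0f} GB")
print(f"fits on one GPU: {fits_one_gpu}")
print(f"fits across {num_gpus} GPUs: {fits_all_gpus}")
```

Under these assumptions the weights alone exceed a single A10G, which is why the model is sharded across all four GPUs.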

### Run inference with the model

Now that the Granite Code model is deployed to a SageMaker endpoint, we can start generating or converting code. We use the predict method of the predictor to run inference on the endpoint. Generation can be controlled with different parameters, which are defined in the parameters attribute of the payload.
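
The request body has the shape `{"inputs": ..., "parameters": {...}}`. As a sketch, a small helper can keep the shared defaults in one place and let each example override only what differs. `DEFAULT_PARAMETERS` and `build_payload` are hypothetical names; the keys and values are the ones used in the examples of this notebook.

```python
import json

# Generation parameters used by the examples in this notebook.
DEFAULT_PARAMETERS = {
    "do_sample": True,
    "top_p": 0.6,
    "temperature": 0.1,
    "top_k": 50,
    "max_new_tokens": 300,
    "repetition_penalty": 1.03,
    "stop": ["<end of code>"],
}


def build_payload(prompt, **overrides):
    """Build a request body, overriding selected default parameters."""
    unknown = set(overrides) - set(DEFAULT_PARAMETERS)
    if unknown:
        # Reject misspelled keys instead of sending them to the endpoint.
        raise KeyError(f"Unexpected parameter(s): {sorted(unknown)}")
    return {"inputs": prompt, "parameters": {**DEFAULT_PARAMETERS, **overrides}}


payload = build_payload(
    "Question:\n# Write a Python function that adds two numbers.\n\nAnswer:",
    max_new_tokens=1000,
)
print(json.dumps(payload, indent=2))
```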

#### Example 1: Code Generation

In this example, we want to write a function in the Python programming language that reverses a string.

In [None]:
prompt_1 = """Using the directions below, generate Python code for the specified task.

Question:
# Write a Python function that prints 'Hello World!' string 'n' times.

Answer:
def print_n_times(n):
    for i in range(n):
        print("Hello World!")

<end of code>

Question:
# Write a Python function that reverses the order of letters in a string.
# The function named 'reversed' takes the argument 'my_string', which is a string. It returns the string in reverse order.

Answer:"""

In [None]:
# hyperparameters for llm
payload = {
    "inputs": prompt_1,
    "parameters": {
        "do_sample": True,
        "top_p": 0.6,
        "temperature": 0.1,
        "top_k": 50,
        "max_new_tokens": 300,
        "repetition_penalty": 1.03,
        "stop": ["<end of code>"],
    },
}

# send request to endpoint
response = predictor.predict(payload)

print(response[0]["generated_text"][len(prompt_1) :])
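
The slice `[len(prompt_1) :]` assumes that the endpoint returns the prompt followed by the completion. A slightly more defensive sketch removes the prompt only when it is actually present and also cuts the text at a stop sequence in case the response includes it. The helper `extract_completion` is a hypothetical name, and the sample strings are made up for illustration; they are not real endpoint output.

```python
def extract_completion(generated_text, prompt, stop_sequences=()):
    """Return only the newly generated text, without prompt or stop marker."""
    text = generated_text
    # Drop the echoed prompt only if it is really there.
    if text.startswith(prompt):
        text = text[len(prompt):]
    # Cut at the earliest stop sequence that appears in the text, if any.
    cut = len(text)
    for stop in stop_sequences:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    # Strip surrounding newlines but keep code indentation intact.
    return text[:cut].strip("\n")


sample_prompt = "Question:\n# Reverse a string.\n\nAnswer:"
sample_completion = "\ndef reverse_string(s):\n    return s[::-1]\n\n<end of code>"

with_prompt = extract_completion(sample_prompt + sample_completion, sample_prompt, ["<end of code>"])
without_prompt = extract_completion(sample_completion, sample_prompt, ["<end of code>"])
print(with_prompt)
```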

The output contains Python code similar to the following snippet:
```
def reverse_string(my_string):
    return my_string[::-1]
```
Be sure to test the generated code to verify that it works as you expect.

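For example, if you run `reverse_string("good morning")` with the function shown above, the result is `gninrom doog`. If the model names the function `reversed`, as the prompt requests, be aware that this name shadows Python's built-in `reversed`.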
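
One way to test generated code, sketched here under the assumption that the output matches the snippet above, is to execute it in a separate namespace and assert on a few inputs. Because `exec` runs arbitrary code, this should only be done in an isolated environment with output you have reviewed.

```python
# Stand-in for the model output; in the notebook this would come from the response.
generated_code = '''
def reverse_string(my_string):
    return my_string[::-1]
'''

# Execute the snippet in its own namespace so it cannot overwrite notebook variables.
namespace = {}
exec(generated_code, namespace)

reverse_string = namespace["reverse_string"]
assert reverse_string("good morning") == "gninrom doog"
assert reverse_string("") == ""
assert reverse_string("a") == "a"
print("all checks passed")
```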

#### Example 2: Code Conversion

In this example, we want to convert code from one programming language to another. The prompt below converts a code snippet from C++ to Python.

In [None]:
prompt_2 = """
Question:
Translate the following code from C++ to Python.
C++:
#include "bits/stdc++.h"
using namespace std;
bool isPerfectSquare(long double x) {
  long double sr = sqrt(x);
  return ((sr - floor(sr)) == 0);
}
void checkSunnyNumber(int N) {
  if (isPerfectSquare(N + 1)) {
    cout << "Yes\\n";
  } else {
    cout << "No\\n";
  }
}
int main() {
  int N = 8;
  checkSunnyNumber(N);
  return 0;
}

Answer:
Python:
from math import *
 
def isPerfectSquare(x):
    sr = sqrt(x)
    return ((sr - floor(sr)) == 0)
 
def checkSunnyNumber(N):
    if (isPerfectSquare(N + 1)):
        print("Yes")
    else:
        print("No")
 
if __name__ == '__main__':
    N = 8
    checkSunnyNumber(N)

<end of code>

Question:
Translate the following code from C++ to Python.
C++:
#include <bits/stdc++.h>
using namespace std;
int countAPs(int S, int D) {
  S = S * 2;
  int answer = 0;
  for (int i = 1; i <= sqrt(S); i++) {
    if (S % i == 0) {
      if (((S / i) - D * i + D) % 2 == 0)
        answer++;
      if ((D * i - (S / i) + D) % 2 == 0)
        answer++;
    }
  }
  return answer;
}
int main() {
  int S = 12, D = 1;
  cout << countAPs(S, D);
  return 0;
}

Answer:
"""
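
The prompt above follows a few-shot pattern: one or more solved question and answer pairs, each closed by the `<end of code>` marker, followed by an open question. As a sketch, the same structure can be assembled programmatically. The helper `build_translation_prompt` is a hypothetical name, and the one-line C++ functions are toy inputs chosen only to keep the example short.

```python
END_MARKER = "<end of code>"


def build_translation_prompt(examples, source_code, source_lang="C++", target_lang="Python"):
    """Assemble a few-shot translation prompt from (source, target) code pairs."""
    instruction = f"Translate the following code from {source_lang} to {target_lang}."
    parts = []
    for example_source, example_target in examples:
        parts.append(
            f"Question:\n{instruction}\n{source_lang}:\n{example_source}\n\n"
            f"Answer:\n{target_lang}:\n{example_target}\n\n{END_MARKER}\n"
        )
    # The final question is left open so that the model writes the answer.
    parts.append(f"Question:\n{instruction}\n{source_lang}:\n{source_code}\n\nAnswer:\n")
    return "\n".join(parts)


examples = [("int add(int a, int b) { return a + b; }", "def add(a, b):\n    return a + b")]
prompt = build_translation_prompt(examples, "int sq(int x) { return x * x; }")
print(prompt)
```

Using the same end marker in the prompt and in the `stop` parameter lets generation end after a single answer.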

Send the prompt to the Granite Code model deployed on the SageMaker endpoint. The generation parameters below can be adjusted as needed.

In [None]:
# hyperparameters for llm
payload_2 = {
    "inputs": prompt_2,
    "parameters": {
        "do_sample": True,
        "top_p": 0.6,
        "temperature": 0.1,
        "top_k": 50,
        "max_new_tokens": 1000,
        "repetition_penalty": 1.03,
        "stop": ["<end of code>"],
    },
}

# send request to endpoint
response_2 = predictor.predict(payload_2)

print(response_2[0]["generated_text"][len(prompt_2) :])

The output contains Python code similar to the following snippet:
```
Python:
from math import *
 
def countAPs(S, D):
    S = S * 2
    answer = 0
    for i in range(1, int(sqrt(S)) + 1):
        if S % i == 0:
            if ((S // i) - D * i + D) % 2 == 0:
                answer += 1
            if (D * i - (S // i) + D) % 2 == 0:
                answer += 1
    return answer
 
if __name__ == '__main__':
    S = 12
    D = 1
    print(countAPs(S, D))

```
Be sure to test the generated code to verify that it works as you expect.
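
For a translation, one way to test is to compare the result against a reference that mirrors the original C++ semantics. The sketch below assumes the model output matches the snippet above. The names `count_aps_translated`, `count_aps_reference`, and `c_rem` are hypothetical; `c_rem` emulates the C++ `%` operator, whose result takes the sign of the dividend, unlike Python's `%`.

```python
from math import sqrt


def count_aps_translated(S, D):
    """Python translation, as in the example output above."""
    S = S * 2
    answer = 0
    for i in range(1, int(sqrt(S)) + 1):
        if S % i == 0:
            if ((S // i) - D * i + D) % 2 == 0:
                answer += 1
            if (D * i - (S // i) + D) % 2 == 0:
                answer += 1
    return answer


def c_rem(a, b):
    """Remainder with the sign of the dividend, as C++ `%` computes it."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r


def count_aps_reference(S, D):
    """Mirror of the C++ loop; S and i are positive, so // matches C++ division."""
    S = S * 2
    answer = 0
    i = 1
    while i <= sqrt(S):
        if c_rem(S, i) == 0:
            if c_rem((S // i) - D * i + D, 2) == 0:
                answer += 1
            if c_rem(D * i - (S // i) + D, 2) == 0:
                answer += 1
        i += 1
    return answer


# Compare both versions on a grid of small positive inputs.
for S in range(1, 200):
    for D in range(0, 10):
        assert count_aps_translated(S, D) == count_aps_reference(S, D), (S, D)

print(count_aps_translated(12, 1))  # prints 4
```

The two remainder conventions differ for negative operands, but both agree on whether the remainder is zero, which is all this function checks.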

#### Example 3: Code Conversion (C to Java)

In this example, you want to convert code from one programming language to another. The prompt below converts a code snippet from C to Java.

Specifically, we cover common programming constructs like linked lists and file I/O operations. The C code is converted to Java while preserving the functionality and logic. In the Java code, we utilize classes, objects, and Java-specific APIs like `FileWriter` and `BufferedReader` to achieve similar results as the C code.

* #1: The C code implements a singly linked list data structure. It defines a Node struct containing an integer data value and a pointer to the next node. The code provides functions to create a new node, add a node to the end of the list, and print the list.
* #2: The C code demonstrates how to write data to a file and then read the data back from the file. It opens a file named "example.txt" in write mode, writes a string to the file, and closes the file. Then, it opens the same file in read mode, reads the contents into a buffer, and prints the buffer to the console.

In [None]:
# Raw string, so that C escape sequences such as \n reach the model verbatim
prompt_3 = r"""
Question:
Translate the following code from C to Java.

C Code:

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct Node {
    int data;
    struct Node* next;
} Node;

Node* createNode(int data) {
    Node* newNode = (Node*)malloc(sizeof(Node));
    newNode->data = data;
    newNode->next = NULL;
    return newNode;
}

void addNode(Node** head, int data) {
    Node* newNode = createNode(data);
    if (*head == NULL) {
        *head = newNode;
        return;
    }
    Node* temp = *head;
    while (temp->next != NULL) {
        temp = temp->next;
    }
    temp->next = newNode;
}

void printList(Node* head) {
    Node* temp = head;
    while (temp != NULL) {
        printf("%d ", temp->data);
        temp = temp->next;
    }
    printf("\n");
}

int main() {
    Node* head = NULL;
    addNode(&head, 1);
    addNode(&head, 2);
    addNode(&head, 3);
    printList(head);
    return 0;
}
```

Java Code:

```java
class Node {
    int data;
    Node next;

    Node(int data) {
        this.data = data;
        next = null;
    }
}

class LinkedList {
    Node head;

    void addNode(int data) {
        Node newNode = new Node(data);
        if (head == null) {
            head = newNode;
            return;
        }
        Node temp = head;
        while (temp.next != null) {
            temp = temp.next;
        }
        temp.next = newNode;
    }

    void printList() {
        Node temp = head;
        while (temp != null) {
            System.out.print(temp.data + " ");
            temp = temp.next;
        }
        System.out.println();
    }

    public static void main(String[] args) {
        LinkedList list = new LinkedList();
        list.addNode(1);
        list.addNode(2);
        list.addNode(3);
        list.printList();
    }
}
```

<end of code>

Question:
Translate the following code from C to Java.

C Code:

```c
#include <stdio.h>
#include <stdlib.h>

int main() {
    FILE* file = fopen("example.txt", "w");
    if (file == NULL) {
        printf("Error opening file!\n");
        return 1;
    }

    fprintf(file, "This is an example of writing to a file.\n");
    fclose(file);

    file = fopen("example.txt", "r");
    if (file == NULL) {
        printf("Error opening file!\n");
        return 1;
    }

    char buffer[100];
    while (fgets(buffer, sizeof(buffer), file) != NULL) {
        printf("%s", buffer);
    }

    fclose(file);
    return 0;
}
```
Answer:
"""

In [None]:
# hyperparameters for llm
payload_3 = {
    "inputs": prompt_3,
    "parameters": {
        "do_sample": True,
        "top_p": 0.6,
        "temperature": 0.1,
        "top_k": 50,
        "max_new_tokens": 1000,
        "repetition_penalty": 1.03,
        "stop": ["<end of code>"],
    },
}

# send request to endpoint
response_3 = predictor.predict(payload_3)

print(response_3[0]["generated_text"][len(prompt_3) :])

The output contains Java code similar to the following snippet:
```
import java.io.*;

public class FileExample {
    public static void main(String[] args) {
        try (FileWriter writer = new FileWriter("example.txt");
             BufferedReader reader = new BufferedReader(new FileReader("example.txt"))) {

            writer.write("This is an example of writing to a file.");

            String line;
            while ((line = reader.readLine())!= null) {
                System.out.println(line);
            }
        } catch (IOException e) {
            System.out.println("An error occurred: " + e.getMessage());
        }
    }
}
```
Be sure to test the generated code to verify that it works as you expect. In the snippet above, for example, the reader is opened while the writer is still open and its buffer has not been flushed, so the program may print nothing.

### Clean up

In [None]:
predictor.delete_model()
predictor.delete_endpoint()

## Notebook CI Test Results

This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.


![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-east-1/inference|generativeai|huggingfacetgi|ibm-granite|granite-code-instruct.ipynb)

![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-east-2/inference|generativeai|huggingfacetgi|ibm-granite|granite-code-instruct.ipynb)

![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-west-1/inference|generativeai|huggingfacetgi|ibm-granite|granite-code-instruct.ipynb)

![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ca-central-1/inference|generativeai|huggingfacetgi|ibm-granite|granite-code-instruct.ipynb)

![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/sa-east-1/inference|generativeai|huggingfacetgi|ibm-granite|granite-code-instruct.ipynb)

![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-1/inference|generativeai|huggingfacetgi|ibm-granite|granite-code-instruct.ipynb)

![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-2/inference|generativeai|huggingfacetgi|ibm-granite|granite-code-instruct.ipynb)

![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-3/inference|generativeai|huggingfacetgi|ibm-granite|granite-code-instruct.ipynb)

![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-central-1/inference|generativeai|huggingfacetgi|ibm-granite|granite-code-instruct.ipynb)

![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-north-1/inference|generativeai|huggingfacetgi|ibm-granite|granite-code-instruct.ipynb)

![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-southeast-1/inference|generativeai|huggingfacetgi|ibm-granite|granite-code-instruct.ipynb)

![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-southeast-2/inference|generativeai|huggingfacetgi|ibm-granite|granite-code-instruct.ipynb)

![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-northeast-1/inference|generativeai|huggingfacetgi|ibm-granite|granite-code-instruct.ipynb)

![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-northeast-2/inference|generativeai|huggingfacetgi|ibm-granite|granite-code-instruct.ipynb)

![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-south-1/inference|generativeai|huggingfacetgi|ibm-granite|granite-code-instruct.ipynb)
