# Run Llama 2 Models in SageMaker JumpStart

---
In this demo notebook, we demonstrate how to use the SageMaker Python SDK to deploy a JumpStart model for Text Generation using the Llama 2 fine-tuned model optimized for dialogue use cases.

---

## Setup

***

In [5]:
%pip install --upgrade --quiet sagemaker

Note: you may need to restart the kernel to use updated packages.


***
You can continue with the default model or choose a different one. This notebook runs with the following model IDs:
- `meta-textgeneration-llama-2-7b-f`
- `meta-textgeneration-llama-2-13b-f`
- `meta-textgeneration-llama-2-70b-f`
***

In [93]:
model_id = "meta-textgeneration-llama-2-7b-f"

In [94]:
model_version = "3.*"

## Deploy model

***
You can now deploy the model using SageMaker JumpStart. For successful deployment, you must manually change the `accept_eula` argument in the model's deploy method to `True`.
***

In [95]:
from sagemaker.jumpstart.model import JumpStartModel

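model = JumpStartModel(
    model_id=model_id,
    model_version=model_version,
    # ml.g5.2xlarge is sized for the 7B model; the 13B and 70B models need larger instances
    # (omit instance_type to use the JumpStart default for the chosen model)
    instance_type="ml.g5.2xlarge",
)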

In [96]:
predictor = model.deploy(accept_eula=True)

----------!

## Invoke the endpoint

***
### Supported Parameters

***
This model supports many parameters while performing inference. They include:

* **max_length:** Model generates text until the output length (which includes the input context length) reaches `max_length`. If specified, it must be a positive integer.
* **max_new_tokens:** Model generates text until the output length (excluding the input context length) reaches `max_new_tokens`. If specified, it must be a positive integer.
* **num_beams:** Number of beams used in beam search. If specified, it must be an integer greater than or equal to `num_return_sequences`.
* **no_repeat_ngram_size:** Model ensures that a sequence of words of `no_repeat_ngram_size` is not repeated in the output sequence. If specified, it must be a positive integer greater than 1.
* **temperature:** Controls the randomness in the output. Higher temperature results in output sequence with low-probability words and lower temperature results in output sequence with high-probability words. If `temperature` -> 0, it results in greedy decoding. If specified, it must be a positive float.
* **early_stopping:** If True, text generation is finished when all beam hypotheses reach the end of sentence token. If specified, it must be boolean.
* **do_sample:** If True, sample the next word as per the likelihood. If specified, it must be boolean.
* **top_k:** In each step of text generation, sample from only the `top_k` most likely words. If specified, it must be a positive integer.
* **top_p:** In each step of text generation, sample from the smallest possible set of words with cumulative probability `top_p`. If specified, it must be a float between 0 and 1.
* **return_full_text:** If True, input text will be part of the output generated text. If specified, it must be boolean. The default value for it is False.
* **stop:** If specified, it must be a list of strings. Text generation stops if any one of the specified strings is generated.

You may specify any subset of the parameters above when invoking the endpoint. The example prompts below show how to invoke the endpoint with them.
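A request body is a JSON object with the prompt under `inputs` and the chosen parameters under `parameters`, the same shape used with `predictor.predict` later in this notebook. As a minimal sketch, the hypothetical helper `build_payload` below (not part of the SageMaker SDK) assembles such a payload and checks a few of the documented constraints on the client side; the endpoint still performs its own validation.

```python
from typing import Any, Dict, List, Optional


def build_payload(
    prompt: str,
    max_new_tokens: int = 256,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    top_k: Optional[int] = None,
    stop: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build an endpoint payload and check documented parameter constraints."""
    if max_new_tokens <= 0:
        raise ValueError("max_new_tokens must be a positive integer")
    params: Dict[str, Any] = {"max_new_tokens": max_new_tokens}
    if temperature is not None:
        if temperature <= 0:
            raise ValueError("temperature must be a positive float")
        params["temperature"] = temperature
    if top_p is not None:
        if not 0 <= top_p <= 1:
            raise ValueError("top_p must be a float between 0 and 1")
        params["top_p"] = top_p
    if top_k is not None:
        if top_k <= 0:
            raise ValueError("top_k must be a positive integer")
        params["top_k"] = top_k
    if stop is not None:
        if not all(isinstance(s, str) for s in stop):
            raise ValueError("stop must be a list of strings")
        params["stop"] = stop
    return {"inputs": prompt, "parameters": params}


payload = build_payload("<s>[INST] What is Amazon SageMaker? [/INST]", max_new_tokens=128, top_p=0.9, temperature=0.6)
# response = predictor.predict(payload)  # requires the deployed endpoint
```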

**NOTE**: If `max_new_tokens` is not defined, the model may generate up to the maximum total tokens allowed, which is 4K for these models. This may result in endpoint query timeout errors, so it is recommended to set `max_new_tokens` when possible. For the 7B, 13B, and 70B models, we recommend setting `max_new_tokens` no greater than 1500, 1000, and 500 respectively, while keeping the total number of tokens below 4K.
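The recommendation above can be encoded as a small clamp on `max_new_tokens`. This is a sketch that assumes "4K" means 4,096 tokens and that you already know how many tokens the prompt uses (for example, from a Llama 2 tokenizer); `CONTEXT_LIMIT`, `MAX_NEW_TOKENS_CAP`, and `choose_max_new_tokens` are hypothetical names.

```python
CONTEXT_LIMIT = 4096  # assumed value of the "4K" total-token limit

# Recommended upper bounds on max_new_tokens for each model ID listed in this notebook
MAX_NEW_TOKENS_CAP = {
    "meta-textgeneration-llama-2-7b-f": 1500,
    "meta-textgeneration-llama-2-13b-f": 1000,
    "meta-textgeneration-llama-2-70b-f": 500,
}


def choose_max_new_tokens(model_id: str, prompt_tokens: int, requested: int) -> int:
    """Clamp a requested max_new_tokens to the per-model cap and to the context left after the prompt."""
    remaining = CONTEXT_LIMIT - prompt_tokens
    if remaining <= 0:
        raise ValueError("the prompt alone exceeds the context limit")
    return max(1, min(requested, MAX_NEW_TOKENS_CAP[model_id], remaining))
```

For example, a 3,900-token prompt to the 70B model leaves room for at most 196 new tokens, which is below the 500-token recommendation.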

**NOTE**: This model only supports 'system', 'user' and 'assistant' roles, starting with 'system', then 'user' and alternating (u/a/u/a/u...).
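The role-ordering rule can be checked before a request is sent. The sketch below uses a hypothetical `validate_roles` helper and treats the leading `'system'` message as optional, which matches how the `format_messages` utility later in this notebook handles it.

```python
from typing import Dict, List


def validate_roles(messages: List[Dict[str, str]]) -> None:
    """Raise ValueError unless roles are: optional 'system', then 'user', 'assistant', ..., ending with 'user'."""
    roles = [m["role"] for m in messages]
    offset = 1 if roles and roles[0] == "system" else 0  # a system message may only come first
    body = roles[offset:]
    if not body:
        raise ValueError("at least one 'user' message is required")
    for i, role in enumerate(body):
        expected = "user" if i % 2 == 0 else "assistant"
        if role != expected:
            raise ValueError(f"messages[{i + offset}] has role {role!r}, expected {expected!r}")
    if body[-1] != "user":
        raise ValueError("the last message must be from 'user'")
```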

***

### Example prompts
***
The examples in this section demonstrate how to perform text generation with conversational dialog as prompt inputs. Example payloads are retrieved programmatically from the `JumpStartModel` object.

Input messages for Llama-2 chat models should exhibit the following format. The model only supports 'system', 'user' and 'assistant' roles, starting with 'system', then 'user' and alternating (u/a/u/a/u...). The last message must be from 'user'. A simple user prompt may look like the following:
```
<s>[INST] {user_prompt} [/INST]
```
You may also add a system prompt with the following syntax:
```
<s>[INST] <<SYS>>
{system_prompt}
<</SYS>>

{user_prompt} [/INST]
```
Finally, you can have a conversational interaction with the model by including all previous user prompts and assistant responses in the input:
```
<s>[INST] <<SYS>>
{system_prompt}
<</SYS>>

{user_prompt_1} [/INST] {assistant_response_1} </s><s>[INST] {user_prompt_2} [/INST]
```
***

In [28]:
# example_payloads = model.retrieve_all_examples()

# for payload in example_payloads[:1]:
#     response = predictor.predict(payload.body)
#     print("\nInput\n", payload.body, "\n\nOutput\n", response[0]["generated_text"], "\n\n===============")

# lasson-specific

## Similarity calculation methods

In [40]:
# lasson: install the BLEU dependencies
# !pip install nltk sacrebleu

## BLEU

In [47]:
!sacrebleu --version

sacrebleu 2.4.2


In [66]:
import sacrebleu

# Reference text (string)
reference_text = "The patient is referred due to the rapid progression of the tumor despite initial treatment modalities. Further evaluation and management from a specialized neuro-oncology team are essential for this recurrent and aggressive tumor."

# Generated (candidate) text (string)
candidate_text = "The patient is referred due to the rapid progression of the tumor despite initial treatment modalities. Further evaluation and management from a specialized neuro-oncology team are essential for this recurrent and aggressive tumor."

# Compute the BLEU score.
# The intent was to adjust the n-gram weights: the default is (0.25, 0.25, 0.25, 0.25) for 1- to 4-grams,
# and (0.5, 0.5, 0, 0) would consider only 1-grams and 2-grams.
# sacrebleu.corpus_bleu has no weights argument, so this call always uses the default uniform weighting.
# (nltk.translate.bleu_score.sentence_bleu does accept a weights argument.)
bleu = sacrebleu.corpus_bleu([candidate_text], [[reference_text]])
print(f"BLEU score: {bleu.score}")
print(bleu)

BLEU score: 100.00000000000004
BLEU = 100.00 100.0/100.0/100.0/100.0 (BP = 1.000 ratio = 1.000 hyp_len = 35 ref_len = 35)
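To see what custom weights such as `(0.5, 0.5, 0, 0)` would do, the sketch below computes a simplified sentence-level BLEU: clipped n-gram precisions combined with a weighted geometric mean, multiplied by the brevity penalty. It assumes whitespace tokenization and no smoothing, so its scores will not exactly match sacrebleu's; `weighted_bleu` is a hypothetical helper, and the weights are assumed to sum to 1.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def weighted_bleu(candidate: str, reference: str, weights: Sequence[float] = (0.5, 0.5, 0, 0)) -> float:
    """Simplified sentence BLEU on a 0-100 scale with custom n-gram weights (whitespace tokens, no smoothing)."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_sum = 0.0
    for n, w in enumerate(weights, start=1):
        if w == 0:
            continue  # n-gram orders with zero weight do not affect the score
        cand_ngrams = ngram_counts(cand, n)
        ref_ngrams = ngram_counts(ref, n)
        total = sum(cand_ngrams.values())
        # Clip each candidate n-gram count by how often it occurs in the reference
        matched = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        if total == 0 or matched == 0:
            return 0.0  # a zero precision makes the geometric mean zero
        log_sum += w * math.log(matched / total)
    # Brevity penalty: candidates shorter than the reference are penalized
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return 100 * bp * math.exp(log_sum)
```

Applied to the identical `candidate_text` and `reference_text` above, `weighted_bleu(candidate_text, reference_text)` returns 100.0.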


## ROUGE

In [65]:
# !pip install rouge

In [67]:
from rouge import Rouge

# Compute ROUGE scores
rouge = Rouge()
# With a single candidate/reference pair, avg=True only changes the return value from a one-element list to a dict
scores = rouge.get_scores(candidate_text, reference_text, avg=True)
print(f"ROUGE scores: {scores}")

ROUGE scores: {'rouge-1': {'r': 1.0, 'p': 1.0, 'f': 0.999999995}, 'rouge-2': {'r': 1.0, 'p': 1.0, 'f': 0.999999995}, 'rouge-l': {'r': 1.0, 'p': 1.0, 'f': 0.999999995}}
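Here `r`, `p`, and `f` are recall, precision, and F1. The F values of 0.999999995 rather than 1.0 are consistent with a small epsilon (about 1e-8) in the F1 denominator: 2·1·1 / (1 + 1 + 1e-8) ≈ 0.999999995. To show what is being measured, here is a minimal sketch of ROUGE-1 (clipped unigram overlap) and ROUGE-L (longest common subsequence). It assumes lowercased whitespace tokens, so results can differ slightly from the `rouge` package, which applies its own preprocessing; the function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def _prf(overlap: int, cand_len: int, ref_len: int) -> Tuple[float, float, float]:
    """Precision, recall, and F1 from an overlap count and the two lengths."""
    p = overlap / cand_len if cand_len else 0.0
    r = overlap / ref_len if ref_len else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def rouge1(candidate: str, reference: str) -> Tuple[float, float, float]:
    """(precision, recall, F1) of clipped unigram overlap."""
    cand, ref = Counter(candidate.lower().split()), Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # & keeps the minimum count of each shared word
    return _prf(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence (dynamic programming, one row at a time)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> Tuple[float, float, float]:
    """(precision, recall, F1) based on the longest common subsequence."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    return _prf(lcs_length(cand, ref), len(cand), len(ref))
```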


# Run Llama 2 on a referral letter

In [104]:
# lasson: send one referral letter to the endpoint
with open("../letter.txt", "r", encoding="utf-8") as f:
    letter = f.read()
# print(letter)

response = predictor.predict({'inputs': letter,
                             'parameters': {'max_new_tokens': 128}})

print("Output:\n", response[0]["generated_text"].strip(), end="\n\n\n")

Output:
 {"referral_content": "During a routine eye examination, we noted significant disc swelling in both eyes. The referral is due to the presentation of disc swelling in both eyes. This finding was corroborated by the presence of blurred disc margins and swollen discs on fundoscopy. Given these concerning features, further diagnostic workup and specialized management are warranted to rule out underlying pathologies such as intracranial hypertension or other optic neuropathies."}




## Evaluate Llama 2 on every record of a JSONL split

In [106]:
import json


In [141]:
import json

import sacrebleu
from rouge import Rouge


def evaluate_jsonl_with_llama2(path):
    """Query the endpoint for every record in a JSONL file and score the extracted referral content.

    Uses the global `predictor` created in the deployment step.
    Returns the predicted texts, the per-record BLEU scores, and the per-record ROUGE scores.
    """
    # Each line is one JSON record with "instruction", "whole_letter", and "referral_content" keys
    test_data_json = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip blank lines
                test_data_json.append(json.loads(line))

    rouge = Rouge()
    evaluate_list = []
    bleu_score_list = []
    rouge_score_list = []

    for single_test in test_data_json:
        instruction = single_test["instruction"]
        whole_letter = single_test["whole_letter"]
        reference_text = single_test["referral_content"]
        prompt = f"{instruction}\n\n###\n\n{whole_letter}\n\n###"
        response = predictor.predict({"inputs": prompt, "parameters": {"max_new_tokens": 128}})

        # The model is expected to answer with a JSON object containing "referral_content"
        try:
            candidate_text = json.loads(response[0]["generated_text"].strip())["referral_content"]
        except Exception:  # malformed JSON, missing key, or unexpected response shape
            candidate_text = "extract failure"
        evaluate_list.append(candidate_text)

        bleu = sacrebleu.corpus_bleu([candidate_text], [[reference_text]])
        bleu_score_list.append(bleu.score)
        print(bleu.score)

        # Compute ROUGE scores; without avg=True, get_scores returns a one-element list for this pair
        scores = rouge.get_scores(candidate_text, reference_text)
        rouge_score_list.append(scores)

    return evaluate_list, bleu_score_list, rouge_score_list
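The evaluation counts any response that is not exactly one JSON object as `"extract failure"`, which then scores 0 against the reference. If the model sometimes wraps its JSON in extra text, a more lenient parser may recover more answers. The sketch below is an optional alternative, not what produced the scores in this notebook: the hypothetical helper `extract_referral_content` scans for each `{` and uses the standard library's `json.JSONDecoder.raw_decode` to parse one JSON object from that position, ignoring any trailing text.

```python
import json
from typing import Optional


def extract_referral_content(text: str, key: str = "referral_content") -> Optional[str]:
    """Return the string under `key` from the first JSON object in `text` that contains it, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value at the start of the string and reports where it ended
            obj, _ = decoder.raw_decode(text[start:])
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and isinstance(obj.get(key), str):
            return obj[key]
        start = text.find("{", start + 1)
    return None
```

Inside the loop, `candidate_text = extract_referral_content(response[0]["generated_text"]) or "extract failure"` would keep the same failure marker.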

In [142]:
train_evaluate_list, train_bleu_score_list,train_rouge_score_list = evaluate_jsonl_with_llama2("../train_test_data/train.jsonl")

100.00000000000004
100.00000000000004
0.0
100.00000000000004
100.00000000000004
100.00000000000004
21.47111723416974
100.00000000000004
0.0
62.16502425002141
2.1817477645132297
79.24997800443228
0.0
100.00000000000004
37.854187754140106
100.00000000000004
60.453809371402485
100.00000000000004
100.00000000000004
8.208499862389884
55.803514577004734
100.00000000000004
94.24306874344771
32.91929878079057
47.676062866897006
100.00000000000004
100.00000000000004
100.00000000000004
3.5823421191287177
100.00000000000004
100.00000000000004
0.0
100.00000000000004
92.20093321379696
100.00000000000004
100.00000000000004
42.90620009431088
1.488123358577587
94.21727090955797
1.8315638888734187
1.4820386596780344
0.0
100.00000000000004
4.9787068367863965
100.00000000000004
38.88955639892231
77.23309772005763
100.00000000000004
100.00000000000004
0.0
100.00000000000004
100.00000000000004
100.00000000000004
38.88955639892231
1.4491339833989647
1.1008462059749267
0.0
100.00000000000004
100.000000000000
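The per-letter BLEU scores are easier to compare across splits as summary statistics. The sketch below uses only the standard library; the name `summarize_bleu` and the chosen statistics are illustrative, and the 99.99 threshold for an exact match is an assumption meant to absorb floating-point values such as 100.00000000000004.

```python
import statistics
from typing import Dict, List


def summarize_bleu(scores: List[float], perfect_threshold: float = 99.99) -> Dict[str, float]:
    """Summary statistics for a list of sentence-level BLEU scores on a 0-100 scale."""
    if not scores:
        raise ValueError("no scores to summarize")
    n = len(scores)
    return {
        "count": n,
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "exact_match_rate": sum(s >= perfect_threshold for s in scores) / n,
        "zero_rate": sum(s == 0.0 for s in scores) / n,  # e.g. "extract failure" responses
    }
```

For example, `summarize_bleu(train_bleu_score_list)` summarizes the scores printed above.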

In [117]:
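# Evaluate the test split. The path is assumed by analogy with train.jsonl; adjust it to your data.
test_evaluate_list, test_bleu_score_list, test_rouge_score_list = evaluate_jsonl_with_llama2("../train_test_data/test.jsonl")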

In [144]:
print(test_bleu_score_list)

[40.656965974059936, 100.00000000000004, 100.00000000000004, 100.00000000000004, 91.574062728735, 1.1657779351238686, 4.31593092614526, 17.776857282813793, 100.00000000000004, 100.00000000000004, 100.00000000000004, 100.00000000000004, 100.00000000000004, 49.246428767541, 36.78794411714425, 3.176605498590708, 100.00000000000004, 38.289288597511224, 14.264897010923287, 16.70480665692845, 100.00000000000004]


In [148]:
print(test_evaluate_list)

['The referral reason is the presence of XFM on the anterior surface of the IOL in the right eye.', 'The reason for referral is due to the presence of a neurovascular conflict seen on MRI, which has led to dislocation and atrophy of the trigeminal root, causing significant paroxysmal pain. Given her family history of TN, advanced evaluation and potential surgical intervention might be required.', 'The patient needs a referral for a comprehensive cardiac evaluation due to the findings suggestive of Ebstein Disease, which likely contributed to his sudden cardiac death. Timely diagnosis and appropriate management are crucial for this condition.', 'The patient presented with fever, sudden onset nausea, abdominal pain, and hematemesis for the past three days. Given his complex medical background and the acute nature of his current symptoms, further specialist evaluation and management are warranted.', 'Reason for referral: The patient has persistent skin sores bilaterally in her groin and s

In [147]:
print(train_bleu_score_list)

[100.00000000000004, 100.00000000000004, 0.0, 100.00000000000004, 100.00000000000004, 100.00000000000004, 21.47111723416974, 100.00000000000004, 0.0, 62.16502425002141, 2.1817477645132297, 79.24997800443228, 0.0, 100.00000000000004, 37.854187754140106, 100.00000000000004, 60.453809371402485, 100.00000000000004, 100.00000000000004, 8.208499862389884, 55.803514577004734, 100.00000000000004, 94.24306874344771, 32.91929878079057, 47.676062866897006, 100.00000000000004, 100.00000000000004, 100.00000000000004, 3.5823421191287177, 100.00000000000004, 100.00000000000004, 0.0, 100.00000000000004, 92.20093321379696, 100.00000000000004, 100.00000000000004, 42.90620009431088, 1.488123358577587, 94.21727090955797, 1.8315638888734187, 1.4820386596780344, 0.0, 100.00000000000004, 4.9787068367863965, 100.00000000000004, 38.88955639892231, 77.23309772005763, 100.00000000000004, 100.00000000000004, 0.0, 100.00000000000004, 100.00000000000004, 100.00000000000004, 38.88955639892231, 1.4491339833989647, 1.

In [149]:
print(train_evaluate_list)

["Given the complexity of Jimmy's condition, he requires specialized care from a pediatric urologist for ongoing management and potential further intervention. Your expertise in this area would be greatly beneficial for optimal patient outcomes.", "During a routine eye examination, I observed swollen discs and indistinct margins in both of Ms. Doe's optic nerves. Her symptoms include headaches and transient visual obscurations. Given the severity of these findings, I believe that comprehensive neuro-ophthalmological assessment is essential to rule out any potential underlying conditions, such as increased intracranial pressure or other neurological issues.", 'extract failure', 'The patient is being referred to you for further evaluation and management of recently diagnosed Type 2 Diabetes Mellitus, which has shown signs of poor glycemic control despite current treatment regimens. His recent HbA1c levels have been persistently elevated despite adherence to medication and lifestyle chang
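Because `evaluate_jsonl_with_llama2` calls `rouge.get_scores` without `avg=True`, each entry of `train_rouge_score_list` and `test_rouge_score_list` is a one-element list holding a dict shaped like the one printed in the ROUGE section. A sketch for averaging one metric across letters follows; `mean_rouge` is a hypothetical helper.

```python
from typing import Dict, List


def mean_rouge(
    score_list: List[List[Dict[str, Dict[str, float]]]],
    metric: str = "rouge-l",
    field: str = "f",
) -> float:
    """Average score_list[i][0][metric][field] over all samples."""
    values = [sample[0][metric][field] for sample in score_list]
    if not values:
        raise ValueError("empty score list")
    return sum(values) / len(values)
```

For example, `mean_rouge(train_rouge_score_list)` gives the mean ROUGE-L F1 on the training split.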

In [115]:
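# Compare prediction 6 of the test split with its reference.
# test_data_json is local to evaluate_jsonl_with_llama2, so load the test JSONL records into it first.
print(test_evaluate_list[6])
print(test_data_json[6]["referral_content"])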

The reason for this referral is the confirmed diagnosis of Anisakis simplex infection.
The reason for this referral is the confirmed diagnosis of Anisakis simplex infection. The parasite was carefully removed using standard biopsy forceps, and microbiological examination confirmed the species. Although her symptoms improved rapidly post-endoscopic removal, further follow-up and potential treatment are necessary to ensure complete eradication and to monitor for any residual effects.
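This pair shows how the brevity penalty can dominate BLEU. The prediction is an exact prefix of the reference, so every clipped n-gram precision is 1 and BLEU reduces to 100 × BP, where BP = exp(1 − r/c) and c and r are the candidate and reference lengths in tokens. The sketch below assumes a simple tokenizer that splits off punctuation but keeps hyphenated words together. For these two strings it gives c = 14 and r = 58, and 100 × exp(1 − 58/14) ≈ 4.3159 matches the value at index 6 of `test_bleu_score_list`, which suggests sacrebleu's default tokenizer counts these strings the same way.

```python
import math
import re

candidate = "The reason for this referral is the confirmed diagnosis of Anisakis simplex infection."
reference = (
    "The reason for this referral is the confirmed diagnosis of Anisakis simplex infection. "
    "The parasite was carefully removed using standard biopsy forceps, and microbiological "
    "examination confirmed the species. Although her symptoms improved rapidly post-endoscopic "
    "removal, further follow-up and potential treatment are necessary to ensure complete "
    "eradication and to monitor for any residual effects."
)


def tokenize(text: str) -> list:
    """Words (keeping internal hyphens) and single punctuation marks as separate tokens."""
    return re.findall(r"[\w-]+|[^\w\s]", text)


c, r = len(tokenize(candidate)), len(tokenize(reference))
# The candidate is a prefix of the reference, so all n-gram precisions are 1 and BLEU = 100 * BP
brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
print(c, r, round(100 * brevity_penalty, 4))  # 14 58 4.3159
```

A short, correct extract is therefore scored far below a full match; comparing ROUGE recall and precision separately makes this kind of truncation easier to spot.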


***
While not used in the previously provided example payloads, you can format your own messages to the Llama-2 model with the following utility function.
***

In [None]:
from typing import Dict, List


def format_messages(messages: List[Dict[str, str]]) -> str:
    """Format messages for Llama-2 chat models.
    
    The model only supports 'system', 'user' and 'assistant' roles, starting with 'system', then 'user' and 
    alternating (u/a/u/a/u...). The last message must be from 'user'.
    """
    prompt: List[str] = []

    if messages[0]["role"] == "system":
        content = "".join(["<<SYS>>\n", messages[0]["content"], "\n<</SYS>>\n\n", messages[1]["content"]])
        messages = [{"role": messages[1]["role"], "content": content}] + messages[2:]

    for user, answer in zip(messages[::2], messages[1::2]):
        prompt.extend(["<s>", "[INST] ", (user["content"]).strip(), " [/INST] ", (answer["content"]).strip(), "</s>"])

    prompt.extend(["<s>", "[INST] ", (messages[-1]["content"]).strip(), " [/INST] "])

    return "".join(prompt)


dialog = [
    {"role": "system", "content": "Always answer with Haiku"},
    {"role": "user", "content": "I am going to Paris, what should I see?"},
]

prompt = format_messages(dialog)

## Clean up the endpoint

In [143]:
# Delete the SageMaker endpoint
predictor.delete_model()
predictor.delete_endpoint()