# Run Llama 2 Models in SageMaker JumpStart

---
This demo notebook shows how to use the SageMaker Python SDK to deploy a JumpStart text-generation model, using the Llama 2 fine-tuned model optimized for dialogue use cases.

---

## Setup

***

In [1]:
%pip install --upgrade --quiet sagemaker

Note: you may need to restart the kernel to use updated packages.


***
You can continue with the default model or choose a different one. This notebook runs with the following model IDs:
- `meta-textgeneration-llama-2-7b-f`
- `meta-textgeneration-llama-2-13b-f`
- `meta-textgeneration-llama-2-70b-f`
***

In [34]:
model_id = "meta-textgeneration-llama-2-7b-f"

In [35]:
model_version = "3.2.0"

## Deploy model

***
You can now deploy the model using SageMaker JumpStart. For successful deployment, you must manually change the `accept_eula` argument in the model's deploy method to `True`.
***

In [36]:
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id=model_id, model_version=model_version, instance_type="ml.g5.2xlarge")

In [37]:
predictor = model.deploy(accept_eula=True)

INFO:sagemaker:Creating model with name: meta-textgeneration-llama-2-7b-f-2024-07-30-10-29-46-017
INFO:sagemaker:Creating endpoint-config with name meta-textgeneration-llama-2-7b-f-2024-07-30-10-29-51-544
INFO:sagemaker:Creating endpoint with name meta-textgeneration-llama-2-7b-f-2024-07-30-10-29-51-544


----------!

## Invoke the endpoint

***
### Supported Parameters

***
This model supports many parameters while performing inference. They include:

* **max_length:** Model generates text until the output length (which includes the input context length) reaches `max_length`. If specified, it must be a positive integer.
* **max_new_tokens:** Model generates text until the output length (excluding the input context length) reaches `max_new_tokens`. If specified, it must be a positive integer.
* **num_beams:** Number of beams used in beam search. If specified, it must be an integer greater than or equal to `num_return_sequences`.
* **no_repeat_ngram_size:** Model ensures that a sequence of words of `no_repeat_ngram_size` is not repeated in the output sequence. If specified, it must be a positive integer greater than 1.
* **temperature:** Controls the randomness in the output. Higher temperature results in output sequence with low-probability words and lower temperature results in output sequence with high-probability words. If `temperature` -> 0, it results in greedy decoding. If specified, it must be a positive float.
* **early_stopping:** If True, text generation is finished when all beam hypotheses reach the end of sentence token. If specified, it must be boolean.
* **do_sample:** If True, sample the next word as per the likelihood. If specified, it must be boolean.
* **top_k:** In each step of text generation, sample from only the `top_k` most likely words. If specified, it must be a positive integer.
* **top_p:** In each step of text generation, sample from the smallest possible set of words with cumulative probability `top_p`. If specified, it must be a float between 0 and 1.
* **return_full_text:** If True, input text will be part of the output generated text. If specified, it must be boolean. The default value for it is False.
* **stop:** If specified, it must be a list of strings. Text generation stops if any one of the specified strings is generated.

We may specify any subset of the parameters mentioned above while invoking an endpoint. Next, we show an example of how to invoke an endpoint with these arguments.

**NOTE**: If `max_new_tokens` is not defined, the model may generate up to the maximum total tokens allowed, which is 4K for these models. This may result in endpoint query timeout errors, so it is recommended to set `max_new_tokens` when possible. For the 7B, 13B, and 70B models, we recommend setting `max_new_tokens` no greater than 1500, 1000, and 500 respectively, while keeping the total number of tokens less than 4K.

**NOTE**: This model only supports 'system', 'user' and 'assistant' roles, starting with 'system', then 'user' and alternating (u/a/u/a/u...).
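
As a rough illustration of the parameters and limits above, the following sketch builds a request payload and checks a few of the documented constraints before anything is sent. The helper `build_payload` and the `RECOMMENDED_MAX_NEW_TOKENS` table are assumptions made for this example and are not part of the SageMaker SDK; the payload shape (`inputs` plus `parameters`) follows the `predictor.predict` calls used later in this notebook, and the inclusive bounds on `top_p` are also an assumption.

```python
# Hypothetical helper for illustration; not part of the SageMaker SDK.
# Caps follow the recommendation above (7B: 1500, 13B: 1000, 70B: 500).
RECOMMENDED_MAX_NEW_TOKENS = {
    "meta-textgeneration-llama-2-7b-f": 1500,
    "meta-textgeneration-llama-2-13b-f": 1000,
    "meta-textgeneration-llama-2-70b-f": 500,
}


def build_payload(prompt, model_id, max_new_tokens=256, **parameters):
    """Build a request dict after checking a few of the documented constraints."""
    cap = RECOMMENDED_MAX_NEW_TOKENS[model_id]
    if isinstance(max_new_tokens, bool) or not isinstance(max_new_tokens, int):
        raise ValueError("max_new_tokens must be a positive integer")
    if max_new_tokens <= 0:
        raise ValueError("max_new_tokens must be a positive integer")
    if max_new_tokens > cap:
        raise ValueError(f"max_new_tokens is above the recommended cap of {cap}")

    temperature = parameters.get("temperature")
    if temperature is not None and temperature <= 0:
        raise ValueError("temperature must be a positive float")

    top_p = parameters.get("top_p")
    if top_p is not None and not 0 <= top_p <= 1:  # inclusive bounds assumed
        raise ValueError("top_p must be a float between 0 and 1")

    stop = parameters.get("stop")
    if stop is not None:
        if not isinstance(stop, list) or not all(isinstance(s, str) for s in stop):
            raise ValueError("stop must be a list of strings")

    return {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens, **parameters},
    }


payload = build_payload(
    "Summarize the referral reason in one sentence.",
    "meta-textgeneration-llama-2-7b-f",
    max_new_tokens=128,
    temperature=0.6,
    top_p=0.9,
)
print(payload)
```

The resulting dictionary could then be passed to `predictor.predict(payload)` once an endpoint is available.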

***

### Example prompts
***
The examples in this section demonstrate how to perform text generation with conversational dialog as prompt inputs. Example payloads are retrieved programmatically from the `JumpStartModel` object.

Input messages for Llama-2 chat models should exhibit the following format. The model only supports 'system', 'user' and 'assistant' roles, starting with 'system', then 'user' and alternating (u/a/u/a/u...). The last message must be from 'user'. A simple user prompt may look like the following:
```
<s>[INST] {user_prompt} [/INST]
```
You may also add a system prompt with the following syntax:
```
<s>[INST] <<SYS>>
{system_prompt}
<</SYS>>

{user_prompt} [/INST]
```
Finally, you can have a conversational interaction with the model by including all previous user prompts and assistant responses in the input:
```
<s>[INST] <<SYS>>
{system_prompt}
<</SYS>>

{user_prompt_1} [/INST] {assistant_response_1} </s><s>[INST] {user_prompt_2} [/INST]
```
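
The templates above can be assembled programmatically. The sketch below is one possible way to turn a list of role/content messages into a single prompt string; the function name `format_llama2_prompt` and the message dictionaries are assumptions made for this example, not an API provided by JumpStart.

```python
def format_llama2_prompt(messages):
    """Render messages ({"role": ..., "content": ...}) in the Llama 2 chat format."""
    if not messages:
        raise ValueError("at least one message is required")

    # An optional system message may come first.
    system = None
    if messages[0]["role"] == "system":
        system = messages[0]["content"]
        messages = messages[1:]

    # The remaining messages must alternate user/assistant and end with user.
    if len(messages) % 2 == 0:
        raise ValueError("the last message must be from 'user'")
    for index, message in enumerate(messages):
        expected = "user" if index % 2 == 0 else "assistant"
        if message["role"] != expected:
            raise ValueError(f"message {index} should have role '{expected}'")

    parts = []
    for index in range(0, len(messages), 2):
        user_text = messages[index]["content"]
        if index == 0 and system is not None:
            user_text = f"<<SYS>>\n{system}\n<</SYS>>\n\n{user_text}"
        parts.append(f"<s>[INST] {user_text} [/INST]")
        if index + 1 < len(messages):
            parts.append(f" {messages[index + 1]['content']} </s>")
    return "".join(parts)


conversation = [
    {"role": "system", "content": "Answer briefly."},
    {"role": "user", "content": "What is BLEU?"},
    {"role": "assistant", "content": "An n-gram overlap metric."},
    {"role": "user", "content": "What range does it take?"},
]
print(format_llama2_prompt(conversation))
```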
***

In [None]:
# example_payloads = model.retrieve_all_examples()

# for payload in example_payloads[:1]:
#     response = predictor.predict(payload.body)
#     print("\nInput\n", payload.body, "\n\nOutput\n", response[0]["generated_text"], "\n\n===============")

## BLEU

In [11]:
%pip install nltk sacrebleu

Collecting sacrebleu
  Downloading sacrebleu-2.4.2-py3-none-any.whl.metadata (58 kB)
Collecting portalocker (from sacrebleu)
  Downloading portalocker-2.10.1-py3-none-any.whl.metadata (8.5 kB)
Downloading sacrebleu-2.4.2-py3-none-any.whl (106 kB)
Downloading portalocker-2.10.1-py3-none-any.whl (18 kB)
Installing collected packages: portalocker, sacrebleu
Successfully installed portalocker-2.10.1 sacrebleu-2.4.2
Note: you may need to restart the kernel to use updated packages.


In [None]:
!sacrebleu --version

sacrebleu 2.4.2


In [18]:
import sacrebleu

# Reference text (string)
reference_text = "The patient is referred due to the rapid progression of the tumor despite initial treatment modalities. Further evaluation and management from a specialized neuro-oncology team are essential for this recurrent and aggressive tumor."

# Generated text (string)
candidate_text = "The patient is referred due to the rapid progression of the tumor despite initial treatment modalities. Further evaluation and management from a specialized neuro-oncology team are essential for this recurrent and aggressive tumor."

# Compute the BLEU score.
# The conventional n-gram weights are (0.25, 0.25, 0.25, 0.25), for 1-grams through 4-grams.
# The weights could be adjusted, for example to (0.5, 0.5) to consider only 1-grams and 2-grams.
# NOTE: `weights` is never passed to sacrebleu, so the call below still computes
# standard 4-gram BLEU.
weights = (0.5, 0.5, 0, 0)

bleu = sacrebleu.corpus_bleu([candidate_text], [[reference_text]])
print(f"BLEU score (): {bleu.score}")
print(bleu)

BLEU score (): 100.00000000000004
BLEU = 100.00 100.0/100.0/100.0/100.0 (BP = 1.000 ratio = 1.000 hyp_len = 35 ref_len = 35)
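
The cell above defines n-gram weights but still computes standard 4-gram BLEU, as the four precisions in the output show. To make the effect of the weights concrete, here is a simplified, dependency-free sketch of sentence-level BLEU with custom weights. It is not sacrebleu's implementation: it assumes whitespace tokenization and applies no smoothing, so its scores will generally differ from sacrebleu's. The names `ngram_counts` and `weighted_bleu` are made up for this example.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def weighted_bleu(candidate, reference, weights=(0.5, 0.5, 0, 0)):
    """Simplified BLEU (0-100) with per-order weights and a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0

    log_precision = 0.0
    for n, weight in enumerate(weights, start=1):
        if weight == 0:
            continue  # this n-gram order is ignored
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        # Clipped matches: each candidate n-gram counts at most as often
        # as it appears in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0:
            return 0.0  # no smoothing in this sketch
        log_precision += weight * math.log(overlap / total)

    # Brevity penalty for candidates shorter than the reference.
    if len(cand) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    return 100.0 * brevity_penalty * math.exp(log_precision)


identical = weighted_bleu("the cat sat down", "the cat sat down")
shorter = weighted_bleu("the cat sat", "the cat sat down")
print(identical, shorter)
```

With weights `(0.5, 0.5, 0, 0)` only unigram and bigram precision contribute, so a candidate that is a truncated copy of the reference is penalized only through the brevity penalty.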


In [38]:
import re
import json
import pandas as pd
import sacrebleu
import logging

ROUGE scores: {'rouge-1': {'r': 1.0, 'p': 1.0, 'f': 0.999999995}, 'rouge-2': {'r': 1.0, 'p': 1.0, 'f': 0.999999995}, 'rouge-l': {'r': 1.0, 'p': 1.0, 'f': 0.999999995}}


## Invoke Llama 2 on a sample letter

In [68]:
with open("letter1.txt", "r", encoding="utf-8") as f:
    letter = f.read()
# print(letter)

response = predictor.predict({'inputs': letter,
                             'parameters': {'max_new_tokens': 128}})

print("Output:\n", response[0]["generated_text"].strip(), end="\n\n\n")

Output:
 Output:

is_Papilledema: False
referral_content: "Further diagnostic evaluation and potential therapeutic intervention from a hepatologist are recommended to determine the nature of the liver mass and address the elevated hepatic transaminases."
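
The response above follows a simple `key: value` layout. A minimal sketch of turning such text into a dictionary is shown below; the helper name `parse_key_value_response` is hypothetical, and it assumes each field sits on its own line, which the model does not guarantee.

```python
def parse_key_value_response(text):
    """Parse `key: value` lines into a dict, unquoting strings and converting booleans."""
    fields = {}
    for line in text.splitlines():
        key, separator, value = line.partition(":")
        value = value.strip().strip('"')
        # Skip lines without a separator, and labels such as "Output:" with no value.
        if not separator or not key.strip() or not value:
            continue
        if value in ("True", "False"):
            value = value == "True"
        fields[key.strip()] = value
    return fields


sample_response = (
    "Output:\n\n"
    "is_Papilledema: False\n"
    'referral_content: "Further diagnostic evaluation and potential therapeutic '
    "intervention from a hepatologist are recommended to determine the nature of "
    'the liver mass and address the elevated hepatic transaminases."'
)
parsed = parse_key_value_response(sample_response)
print(parsed)
```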




In [6]:
# Lewis: evaluate every record in test.jsonl with Llama 2

In [59]:
from concurrent.futures import ThreadPoolExecutor, as_completed
from requests.exceptions import Timeout

logging.basicConfig(level=logging.INFO)

def replace_output_prefix(input_str):
    result = re.sub(r'(?i)^Output:\s*', '', input_str)
    return result

def parse_model_response(response_text):
    # Split the response into lines and look for the referral_content line
    lines = response_text.split('\n')
    for line in lines:
        if "referral_content:" in line:
            return line.split("referral_content:", 1)[1].strip().strip('"')
    return "extract failure"

def evaluate_jsonl_with_llama2(predictor, path, output_jsonl_path, csv_file_path):
    test_data_json = []
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            test_data_json.append(json.loads(line.strip()))
    
    bleu_score_list = []
    evaluate_list = []

    for single_test in test_data_json:
        instruction = single_test.get("instruction", "")
        whole_letter = single_test.get("whole_letter", "")
        referral_content = single_test.get("referral_content", "")
        prompt = f"{instruction}\n\n###\n\n{whole_letter}\n\n###"
        response = predictor.predict({'inputs': prompt, 'parameters': {'max_new_tokens': 512}})
        
        reference_text = referral_content
        candidate_text = "extract failure"
        try:
            tmp_str = replace_output_prefix(response[0]["generated_text"].strip())
            candidate_text = parse_model_response(tmp_str)
        except Exception as err:
            logging.error(f"Error processing ID: {single_test.get('id', 'unknown')}")
            # Log the raw response; indexing into it here could raise a second exception.
            logging.error(f"Response: {response!r}")
            logging.error(err)
        finally:
            evaluate_list.append(candidate_text)
            bleu = sacrebleu.corpus_bleu([candidate_text], [[reference_text]])
            bleu_score_list.append(bleu.score)
            single_test["bleu"] = bleu.score
            single_test["predict_referral_content"] = candidate_text
    
    with open(output_jsonl_path, mode='w', encoding='utf-8') as f:
        for single_test in test_data_json:
            f.write(json.dumps(single_test, ensure_ascii=False) + '\n')
    logging.info(f"Predicted data has been saved to {output_jsonl_path}.")
    
    # Create and save CSV file
    csv_data = []

    for single_test in test_data_json:
        csv_data.append({
            "id": single_test.get("id", ""),
            "name": single_test.get("name", ""),
            "instruction": single_test.get("instruction", ""),
            "whole_letter": single_test.get("whole_letter", ""),
            "referral_content": single_test.get("referral_content", ""),
            "predict_referral_content": single_test.get("predict_referral_content", ""),
            "bleu": single_test.get("bleu", 0),
        })

    df = pd.DataFrame(csv_data)
    df.to_csv(csv_file_path, index=False, encoding='utf-8')
    logging.info(f"CSV file has been saved to {csv_file_path}")
    
    return evaluate_list, bleu_score_list


In [60]:
test_evaluate_list, test_bleu_score_list = evaluate_jsonl_with_llama2(
    predictor,
    "test_data_735/test.jsonl",
    "300724-4lewis-predict_llama2_7b_f_test.jsonl",
    "300724-4lewis-predict_llama2_7b_f_test.csv",
)

INFO:root:Predicted data has been saved to 300724-4lewis-predict_llama2_7b_f_test.jsonl.
INFO:root:CSV file has been saved to 300724-4lewis-predict_llama2_7b_f_test.csv
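
The evaluation loop above queries the endpoint one record at a time, and a single failed request stops the whole run. As a sketch of an alternative, the code below issues requests concurrently with a thread pool and records failures instead of aborting. `predict_all` and `fake_predict` are hypothetical names; `fake_predict` stands in for `predictor.predict` so the example runs without an endpoint. Whether a real endpoint benefits from concurrent requests depends on the instance and serving configuration, which is not tested here.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def predict_all(predict_fn, payloads, max_workers=4):
    """Call predict_fn on every payload concurrently; results keep the input order."""
    results = [None] * len(payloads)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(predict_fn, p): i for i, p in enumerate(payloads)}
        for future in as_completed(futures):
            index = futures[future]
            try:
                results[index] = future.result()
            except Exception as err:
                # Store the failure so one bad request does not abort the run.
                results[index] = err
    return results


def fake_predict(payload):
    """Stand-in for predictor.predict; fails on an empty prompt."""
    if not payload["inputs"]:
        raise ValueError("empty prompt")
    return [{"generated_text": f'referral_content: "{payload["inputs"]}"'}]


payloads = [
    {"inputs": text, "parameters": {"max_new_tokens": 512}}
    for text in ["first letter", "", "third letter"]
]
responses = predict_all(fake_predict, payloads)
for response in responses:
    print(response)
```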


In [61]:
from concurrent.futures import ThreadPoolExecutor, as_completed
from requests.exceptions import Timeout

logging.basicConfig(level=logging.INFO)

def replace_output_prefix(input_str):
    result = re.sub(r'(?i)^Output:\s*', '', input_str)
    return result

def parse_model_response(response_text):
    # Split the response into lines and look for the referral_content line
    lines = response_text.split('\n')
    for line in lines:
        if "referral_content:" in line:
            return line.split("referral_content:", 1)[1].strip().strip('"')
    return "extract failure"

def evaluate_jsonl_with_llama2(predictor, path, output_jsonl_path, csv_file_path):
    test_data_json = []
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            test_data_json.append(json.loads(line.strip()))
    
    bleu_score_list = []
    evaluate_list = []

    for single_test in test_data_json:
        instruction = single_test.get("instruction", "")
        whole_letter = single_test.get("whole_letter", "")
        referral_content = single_test.get("referral_content", "")
        prompt = f"{instruction}\n\n###\n\n{whole_letter}\n\n###"
        response = predictor.predict({'inputs': prompt, 'parameters': {'max_new_tokens': 512}})
        
        reference_text = referral_content
        candidate_text = "extract failure"
        try:
            tmp_str = replace_output_prefix(response[0]["generated_text"].strip())
            candidate_text = parse_model_response(tmp_str)
            # Fallback mechanism to handle cases where primary extraction fails
            if candidate_text == "extract failure":
                if "referral_content:" in tmp_str:
                    candidate_text = tmp_str.split("referral_content:", 1)[1].strip().strip('"')
                elif "The reason for this referral is" in tmp_str:
                    candidate_text = tmp_str.split("The reason for this referral is", 1)[1].split(".")[0].strip()
                elif "The referral is being made because" in tmp_str:
                    candidate_text = tmp_str.split("The referral is being made because", 1)[1].split(".")[0].strip()
        except Exception as err:
            logging.error(f"Error processing ID: {single_test.get('id', 'unknown')}")
            # Log the raw response; indexing into it here could raise a second exception.
            logging.error(f"Response: {response!r}")
            logging.error(err)
        finally:
            evaluate_list.append(candidate_text)
            bleu = sacrebleu.corpus_bleu([candidate_text], [[reference_text]])
            bleu_score_list.append(bleu.score)
            single_test["bleu"] = bleu.score
            single_test["predict_referral_content"] = candidate_text
    
    with open(output_jsonl_path, mode='w', encoding='utf-8') as f:
        for single_test in test_data_json:
            f.write(json.dumps(single_test, ensure_ascii=False) + '\n')
    logging.info(f"Predicted data has been saved to {output_jsonl_path}.")
    
    # Create and save CSV file
    csv_data = []

    for single_test in test_data_json:
        csv_data.append({
            "id": single_test.get("id", ""),
            "name": single_test.get("name", ""),
            "instruction": single_test.get("instruction", ""),
            "whole_letter": single_test.get("whole_letter", ""),
            "referral_content": single_test.get("referral_content", ""),
            "predict_referral_content": single_test.get("predict_referral_content", ""),
            "bleu": single_test.get("bleu", 0),
        })

    df = pd.DataFrame(csv_data)
    df.to_csv(csv_file_path, index=False, encoding='utf-8')
    logging.info(f"CSV file has been saved to {csv_file_path}")
    
    return evaluate_list, bleu_score_list

In [62]:
test_evaluate_list, test_bleu_score_list = evaluate_jsonl_with_llama2(
    predictor,
    "test_data_735/test.jsonl",
    "300724-5lewis-predict_llama2_7b_f_test.jsonl",
    "300724-5lewis-predict_llama2_7b_f_test.csv",
)

INFO:root:Predicted data has been saved to 300724-5lewis-predict_llama2_7b_f_test.jsonl.
INFO:root:CSV file has been saved to 300724-5lewis-predict_llama2_7b_f_test.csv


In [66]:
from concurrent.futures import ThreadPoolExecutor, as_completed
from requests.exceptions import Timeout

logging.basicConfig(level=logging.INFO)

def replace_output_prefix(input_str):
    result = re.sub(r'(?i)^Output:\s*', '', input_str)
    return result

def parse_model_response(response_text):
    # Split the response into lines and look for the referral_content line
    lines = response_text.split('\n')
    for line in lines:
        if "referral_content:" in line:
            return line.split("referral_content:", 1)[1].strip().strip('"')
    return "extract failure"

def extract_referral_content(tmp_str):
    # Primary extraction using common patterns
    patterns = [
        r'referral_content\s*:\s*"(.*?)"',
        r'The reason for this referral is\s*(.*?)\.',
        r'The referral is being made because\s*(.*?)\.',
        r'Referral Reason:\s*(.*?)\.',
        r'The reason for referral is\s*(.*?)\.'
    ]
    
    for pattern in patterns:
        match = re.search(pattern, tmp_str)
        if match:
            return match.group(1).strip()

    # Fallback extraction logic
    sentences = tmp_str.split('. ')
    for sentence in sentences:
        if 'referral' in sentence.lower():
            return sentence.strip()
    
    return "extract failure"

def evaluate_jsonl_with_llama2(predictor, path, output_jsonl_path, csv_file_path):
    test_data_json = []
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            test_data_json.append(json.loads(line.strip()))
    
    bleu_score_list = []
    evaluate_list = []

    for single_test in test_data_json:
        instruction = single_test.get("instruction", "")
        whole_letter = single_test.get("whole_letter", "")
        referral_content = single_test.get("referral_content", "")
        prompt = f"{instruction}\n\n###\n\n{whole_letter}\n\n###"
        response = predictor.predict({'inputs': prompt, 'parameters': {'max_new_tokens': 512}})
        
        reference_text = referral_content
        candidate_text = "extract failure"
        try:
            tmp_str = replace_output_prefix(response[0]["generated_text"].strip())
            candidate_text = extract_referral_content(tmp_str)
        except Exception as err:
            logging.error(f"Error processing ID: {single_test.get('id', 'unknown')}")
            # Log the raw response; indexing into it here could raise a second exception.
            logging.error(f"Response: {response!r}")
            logging.error(err)
        finally:
            evaluate_list.append(candidate_text)
            bleu = sacrebleu.corpus_bleu([candidate_text], [[reference_text]])
            bleu_score_list.append(bleu.score)
            single_test["bleu"] = bleu.score
            single_test["predict_referral_content"] = candidate_text
    
    with open(output_jsonl_path, mode='w', encoding='utf-8') as f:
        for single_test in test_data_json:
            f.write(json.dumps(single_test, ensure_ascii=False) + '\n')
    logging.info(f"Predicted data has been saved to {output_jsonl_path}.")
    
    # Create and save CSV file
    csv_data = []

    for single_test in test_data_json:
        csv_data.append({
            "id": single_test.get("id", ""),
            "name": single_test.get("name", ""),
            "instruction": single_test.get("instruction", ""),
            "whole_letter": single_test.get("whole_letter", ""),
            "referral_content": single_test.get("referral_content", ""),
            "predict_referral_content": single_test.get("predict_referral_content", ""),
            "bleu": single_test.get("bleu", 0),
        })

    df = pd.DataFrame(csv_data)
    df.to_csv(csv_file_path, index=False, encoding='utf-8')
    logging.info(f"CSV file has been saved to {csv_file_path}")
    
    return evaluate_list, bleu_score_list


In [67]:
test_evaluate_list, test_bleu_score_list = evaluate_jsonl_with_llama2(
    predictor,
    "test_data_735/test.jsonl",
    "300724-6lewis-predict_llama2_7b_f_test.jsonl",
    "300724-6lewis-predict_llama2_7b_f_test.csv",
)

INFO:root:Predicted data has been saved to 300724-6lewis-predict_llama2_7b_f_test.jsonl.
INFO:root:CSV file has been saved to 300724-6lewis-predict_llama2_7b_f_test.csv
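
One detail worth checking in the regex-based extraction above: by default `.` does not match a newline and matching is case-sensitive, so a quoted value that wraps onto a second line, or a key written with different capitalization, falls through to the fallback logic. The sketch below demonstrates this with the first pattern from `extract_referral_content`; the two sample strings are invented for illustration and are not taken from the test data.

```python
import re

PATTERN = r'referral_content\s*:\s*"(.*?)"'

# Invented samples: a value that spans two lines, and a differently cased key.
multi_line = 'referral_content: "Further evaluation\nby a hepatologist is recommended."'
different_case = 'Referral_Content: "Urgent neurology review."'

strict_multi = re.search(PATTERN, multi_line)
relaxed_multi = re.search(PATTERN, multi_line, re.DOTALL)

strict_case = re.search(PATTERN, different_case)
relaxed_case = re.search(PATTERN, different_case, re.IGNORECASE)

print(strict_multi, strict_case)  # both None
print(" ".join(relaxed_multi.group(1).split()))
print(relaxed_case.group(1))
```

Passing `re.DOTALL | re.IGNORECASE` to `re.search` would cover both cases, at the cost of matching more liberally.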


In [74]:
from concurrent.futures import ThreadPoolExecutor, as_completed
from requests.exceptions import Timeout

logging.basicConfig(level=logging.INFO)

def replace_output_prefix(input_str):
    result = re.sub(r'(?i)^Output:\s*', '', input_str)
    return result

def parse_model_response(response_text):
    # Split the response into lines and look for the referral_content line
    lines = response_text.split('\n')
    for line in lines:
        if "referral_content:" in line:
            return line.split("referral_content:", 1)[1].strip().strip('"')
    return "extract failure"

def extract_referral_content(tmp_str):
    # Primary extraction using common patterns
    patterns = [
        r'referral_content\s*:\s*"(.*?)"',
        r'The reason for this referral is\s*(.*?)\.',
        r'The referral is being made because\s*(.*?)\.',
        r'Referral Reason:\s*(.*?)\.',
        r'The reason for referral is\s*(.*?)\.'
    ]
    
    for pattern in patterns:
        match = re.search(pattern, tmp_str)
        if match:
            return match.group(1).strip()

    # Fallback extraction logic
    sentences = tmp_str.split('. ')
    for sentence in sentences:
        if 'referral' in sentence.lower():
            return sentence.strip()
    
    return "extract failure"

def clean_extracted_text(text):
    # Further clean and format the extracted text
    text = text.replace('\n', ' ').strip()
    text = re.sub(r'\s+', ' ', text)
    return text

def evaluate_jsonl_with_llama2(predictor, path, output_jsonl_path, csv_file_path):
    test_data_json = []
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            test_data_json.append(json.loads(line.strip()))
    
    bleu_score_list = []
    evaluate_list = []

    for single_test in test_data_json:
        instruction = single_test.get("instruction", "")
        whole_letter = single_test.get("whole_letter", "")
        referral_content = single_test.get("referral_content", "")
        prompt = f"{instruction}\n\n###\n\n{whole_letter}\n\n###"
        response = predictor.predict({'inputs': prompt, 'parameters': {'max_new_tokens': 512}})
        
        reference_text = referral_content if referral_content is not None else ""
        candidate_text = "extract failure"
        try:
            tmp_str = replace_output_prefix(response[0]["generated_text"].strip())
            candidate_text = extract_referral_content(tmp_str)
            candidate_text = clean_extracted_text(candidate_text)
        except Exception as err:
            logging.error(f"Error processing ID: {single_test.get('id', 'unknown')}")
            # Log the raw response; indexing into it here could raise a second exception.
            logging.error(f"Response: {response!r}")
            logging.error(err)
        finally:
            evaluate_list.append(candidate_text)
            bleu = sacrebleu.corpus_bleu([candidate_text], [[reference_text]])
            bleu_score_list.append(bleu.score)
            single_test["bleu"] = bleu.score
            single_test["predict_referral_content"] = candidate_text
    
    with open(output_jsonl_path, mode='w', encoding='utf-8') as f:
        for single_test in test_data_json:
            f.write(json.dumps(single_test, ensure_ascii=False) + '\n')
    logging.info(f"Predicted data has been saved to {output_jsonl_path}.")
    
    # Create and save CSV file
    csv_data = []

    for single_test in test_data_json:
        csv_data.append({
            "id": single_test.get("id", ""),
            "name": single_test.get("name", ""),
            "instruction": single_test.get("instruction", ""),
            "whole_letter": single_test.get("whole_letter", ""),
            "referral_content": single_test.get("referral_content", ""),
            "predict_referral_content": single_test.get("predict_referral_content", ""),
            "bleu": single_test.get("bleu", 0),
        })

    df = pd.DataFrame(csv_data)
    df.to_csv(csv_file_path, index=False, encoding='utf-8')
    logging.info(f"CSV file has been saved to {csv_file_path}")
    
    return evaluate_list, bleu_score_list

In [75]:
test_evaluate_list, test_bleu_score_list = evaluate_jsonl_with_llama2(
    predictor,
    "test_data_735/test.jsonl",
    "300724-7lewis-predict_llama2_7b_f_test.jsonl",
    "300724-7lewis-predict_llama2_7b_f_test.csv",
)

INFO:root:Predicted data has been saved to 300724-7lewis-predict_llama2_7b_f_test.jsonl.
INFO:root:CSV file has been saved to 300724-7lewis-predict_llama2_7b_f_test.csv


In [76]:
analyze_predict_data(test_bleu_score_list)

Number of scores greater than 100: 46, proportion of all data: 0.32857142857142857
Number of scores greater than 70: 56, proportion of all data: 0.4
Average BLEU score: 50.374449038232484
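
The definition of `analyze_predict_data` does not appear in the cells shown here. The sketch below is a hypothetical stand-in, not the author's function, that computes the same kind of summary: the count and proportion of scores at or above each threshold, plus the mean. Using `>=` rather than `>` is an assumption, chosen because a perfect BLEU score can come out as 100.0 or marginally above it in floating point.

```python
def summarize_bleu_scores(scores, thresholds=(100, 70)):
    """Return the mean score and, per threshold, (count, proportion) of scores >= it."""
    total = len(scores)
    summary = {"mean": sum(scores) / total if total else 0.0}
    for threshold in thresholds:
        count = sum(1 for score in scores if score >= threshold)
        proportion = count / total if total else 0.0
        summary[f">={threshold}"] = (count, proportion)
    return summary


# Made-up scores, only to show the shape of the result.
example_summary = summarize_bleu_scores([100.0, 80.0, 50.0, 0.0])
print(example_summary)
```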


In [77]:
logging.basicConfig(level=logging.INFO)

def replace_output_prefix(input_str):
    result = re.sub(r'(?i)^Output:\s*', '', input_str)
    return result

def extract_referral_content(tmp_str):
    # Refined primary extraction using common patterns
    patterns = [
        r'referral_content\s*:\s*"(.*?)"',
        r'The reason for this referral is\s*(.*?)(?<!\s)\.',
        r'The referral is being made because\s*(.*?)(?<!\s)\.',
        r'Referral Reason:\s*(.*?)(?<!\s)\.',
        r'The reason for referral is\s*(.*?)(?<!\s)\.',
        r'for referral is\s*(.*?)(?<!\s)\.',
        r'need referral because\s*(.*?)(?<!\s)\.',
        r'This referral is made to\s*(.*?)(?<!\s)\.',
        r'We are referring this patient because\s*(.*?)(?<!\s)\.',
        r'The purpose of this referral is\s*(.*?)(?<!\s)\.',
        r'Our reason for referral is\s*(.*?)(?<!\s)\.',
        r'Due to\s*(.*?)(?<!\s)\.',
    ]
    
    for pattern in patterns:
        match = re.search(pattern, tmp_str)
        if match:
            return match.group(1).strip()

    # Fallback extraction logic
    sentences = tmp_str.split('. ')
    for sentence in sentences:
        if 'referral' in sentence.lower():
            return sentence.strip()
    
    return "extract failure"

def clean_extracted_text(text):
    # Further clean and format the extracted text
    text = text.replace('\n', ' ').strip()
    text = re.sub(r'\s+', ' ', text)
    return text

def evaluate_jsonl_with_llama2(predictor, path, output_jsonl_path, csv_file_path):
    test_data_json = []
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            test_data_json.append(json.loads(line.strip()))
    
    bleu_score_list = []
    evaluate_list = []

    for single_test in test_data_json:
        instruction = single_test.get("instruction", "")
        whole_letter = single_test.get("whole_letter", "")
        referral_content = single_test.get("referral_content", "")
        prompt = f"{instruction}\n\n###\n\n{whole_letter}\n\n###"
        response = predictor.predict({'inputs': prompt, 'parameters': {'max_new_tokens': 512}})
        
        reference_text = referral_content if referral_content is not None else ""
        candidate_text = "extract failure"
        try:
            tmp_str = replace_output_prefix(response[0]["generated_text"].strip())
            candidate_text = extract_referral_content(tmp_str)
            candidate_text = clean_extracted_text(candidate_text)
        except Exception as err:
            logging.error(f"Error processing ID: {single_test.get('id', 'unknown')}")
            # Log the raw response; indexing into it here could raise a second exception.
            logging.error(f"Response: {response!r}")
            logging.error(err)
        finally:
            evaluate_list.append(candidate_text)
            bleu = sacrebleu.corpus_bleu([candidate_text], [[reference_text]])
            bleu_score_list.append(bleu.score)
            single_test["bleu"] = bleu.score
            single_test["predict_referral_content"] = candidate_text
    
    with open(output_jsonl_path, mode='w', encoding='utf-8') as f:
        for single_test in test_data_json:
            f.write(json.dumps(single_test, ensure_ascii=False) + '\n')
    logging.info(f"Predicted data has been saved to {output_jsonl_path}.")
    
    # Create and save CSV file
    csv_data = []

    for single_test in test_data_json:
        csv_data.append({
            "id": single_test.get("id", ""),
            "name": single_test.get("name", ""),
            "instruction": single_test.get("instruction", ""),
            "whole_letter": single_test.get("whole_letter", ""),
            "referral_content": single_test.get("referral_content", ""),
            "predict_referral_content": single_test.get("predict_referral_content", ""),
            "bleu": single_test.get("bleu", 0),
        })

    df = pd.DataFrame(csv_data)
    df.to_csv(csv_file_path, index=False, encoding='utf-8')
    logging.info(f"CSV file has been saved to {csv_file_path}")
    
    return evaluate_list, bleu_score_list

In [78]:
test_evaluate_list, test_bleu_score_list = evaluate_jsonl_with_llama2(
    predictor,
    "test_data_735/test.jsonl",
    "300724-8lewis-predict_llama2_7b_f_test.jsonl",
    "300724-8lewis-predict_llama2_7b_f_test.csv",
)

INFO:root:Predicted data has been saved to 300724-8lewis-predict_llama2_7b_f_test.jsonl.
INFO:root:CSV file has been saved to 300724-8lewis-predict_llama2_7b_f_test.csv
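
The refined patterns above add `(?<!\s)` so that a period preceded by whitespace no longer ends the capture. A remaining limitation, as far as we can tell, is that the capture still stops at the first period inside a number such as `3.5`. The sketch below compares the earlier pattern, the refined one, and a possible variant that only accepts a period followed by whitespace or the end of the text. The variant and both sample sentences are assumptions made for illustration; the variant would still stop early at abbreviations such as "Dr. ".

```python
import re

EARLIER = r'for referral is\s*(.*?)\.'
REFINED = r'for referral is\s*(.*?)(?<!\s)\.'
# Hypothetical variant: the period must be followed by whitespace or end of text.
VARIANT = r'for referral is\s*(.*?)\.(?=\s|$)'

spaced = "The reason for referral is worsening vision . Please advise."
decimal = "The reason for referral is a 3.5 cm liver mass."


def capture(pattern, text):
    """Return the stripped first capture group, or None when nothing matches."""
    match = re.search(pattern, text)
    return match.group(1).strip() if match else None


for name, pattern in [("earlier", EARLIER), ("refined", REFINED), ("variant", VARIANT)]:
    print(name, "|", capture(pattern, spaced), "|", capture(pattern, decimal))
```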


In [79]:
logging.basicConfig(level=logging.INFO)
def replace_output_prefix(input_str):
    result = re.sub(r'(?i)^Output:\s*', '', input_str)
    return result

def extract_referral_content(tmp_str):
    # Refined primary extraction using common patterns
    patterns = [
        r'referral_content\s*:\s*"(.*?)"',
        r'The reason for this referral is\s*(.*?)(?<!\s)\.',
        r'The referral is being made because\s*(.*?)(?<!\s)\.',
        r'Referral Reason:\s*(.*?)(?<!\s)\.',
        r'The reason for referral is\s*(.*?)(?<!\s)\.',
        r'for referral is\s*(.*?)(?<!\s)\.',
        r'need referral because\s*(.*?)(?<!\s)\.',
        r'This referral is made to\s*(.*?)(?<!\s)\.',
        r'We are referring this patient because\s*(.*?)(?<!\s)\.',
        r'The purpose of this referral is\s*(.*?)(?<!\s)\.',
        r'Our reason for referral is\s*(.*?)(?<!\s)\.',
        r'Due to\s*(.*?)(?<!\s)\.',
        r'because of\s*(.*?)(?<!\s)\.',
        r'As a result of\s*(.*?)(?<!\s)\.',
        r'The intent of this referral is\s*(.*?)(?<!\s)\.',
    ]
    
    for pattern in patterns:
        match = re.search(pattern, tmp_str)
        if match:
            return match.group(1).strip()

    # Fallback extraction logic
    sentences = tmp_str.split('. ')
    for sentence in sentences:
        if 'referral' in sentence.lower():
            return sentence.strip()
    
    return "extract failure"

def clean_extracted_text(text):
    # Further clean and format the extracted text
    text = text.replace('\n', ' ').strip()
    text = re.sub(r'\s+', ' ', text)
    return text

def evaluate_jsonl_with_llama2(predictor, path, output_jsonl_path, csv_file_path):
    test_data_json = []
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            test_data_json.append(json.loads(line.strip()))
    
    bleu_score_list = []
    evaluate_list = []

    for single_test in test_data_json:
        instruction = single_test.get("instruction", "")
        whole_letter = single_test.get("whole_letter", "")
        referral_content = single_test.get("referral_content", "")
        prompt = f"{instruction}\n\n###\n\n{whole_letter}\n\n###"
        response = predictor.predict({'inputs': prompt, 'parameters': {'max_new_tokens': 512}})
        
        reference_text = referral_content if referral_content is not None else ""
        candidate_text = "extract failure"
        try:
            tmp_str = replace_output_prefix(response[0]["generated_text"].strip())
            candidate_text = extract_referral_content(tmp_str)
            candidate_text = clean_extracted_text(candidate_text)
        except Exception as err:
            logging.error(f"Error processing ID: {single_test.get('id', 'unknown')}")
            logging.error(f"Response: {response[0]['generated_text'].strip() if response else 'No response'}")
            logging.error(err)
        finally:
            evaluate_list.append(candidate_text)
            bleu = sacrebleu.corpus_bleu([candidate_text], [[reference_text]])
            bleu_score_list.append(bleu.score)
            single_test["bleu"] = bleu.score
            single_test["predict_referral_content"] = candidate_text
    
    with open(output_jsonl_path, mode='w', encoding='utf-8') as f:
        for single_test in test_data_json:
            f.write(json.dumps(single_test, ensure_ascii=False) + '\n')
    logging.info(f"Predicted data has been saved to {output_jsonl_path}.")
    
    # Create and save CSV file
    csv_data = []

    for single_test in test_data_json:
        csv_data.append({
            "id": single_test.get("id", ""),
            "name": single_test.get("name", ""),
            "instruction": single_test.get("instruction", ""),
            "whole_letter": single_test.get("whole_letter", ""),
            "referral_content": single_test.get("referral_content", ""),
            "predict_referral_content": single_test.get("predict_referral_content", ""),
            "bleu": single_test.get("bleu", 0),
        })

    df = pd.DataFrame(csv_data)
    df.to_csv(csv_file_path, index=False, encoding='utf-8')
    logging.info(f"CSV file has been saved to {csv_file_path}")
    
    return evaluate_list, bleu_score_list

In [80]:
test_evaluate_list, test_bleu_score_list = evaluate_jsonl_with_llama2(predictor, 
                                        "test_data_735/test.jsonl", "300724-9lewis-predict_llama2_7b_f_test.jsonl",
                                                                "300724-9lewis-predict_llama2_7b_f_test.csv")

INFO:root:Predicted data has been saved to 300724-9lewis-predict_llama2_7b_f_test.jsonl.
INFO:root:CSV file has been saved to 300724-9lewis-predict_llama2_7b_f_test.csv
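
The extraction step above tries each pattern in list order and returns the first one that matches, so a pattern earlier in the list wins even when a later pattern matches earlier in the text. The sketch below illustrates this with two of the patterns from the list. The sample string and the helper name `first_match` are assumptions made for illustration and are not taken from the test data.

```python
import re

# Two of the patterns from the list above, in the same relative order.
patterns = [
    r'The reason for this referral is\s*(.*?)(?<!\s)\.',
    r'Due to\s*(.*?)(?<!\s)\.',
]

def first_match(text):
    """Return the capture of the first pattern (in list order) that matches."""
    for pattern in patterns:
        match = re.search(pattern, text)
        if match:
            return match.group(1).strip()
    return "extract failure"

# Hypothetical model output, invented for this example.
sample = "Due to knee pain. The reason for this referral is suspected meniscal tear."
print(first_match(sample))
```

Because the capture is lazy and stops at the first period that is not preceded by whitespace, abbreviations such as "Dr." cut the extracted text short.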


In [81]:
logging.basicConfig(level=logging.INFO)

def replace_output_prefix(input_str):
    result = re.sub(r'(?i)^Output:\s*', '', input_str)
    return result

def extract_referral_content(tmp_str):
    # Refined primary extraction using common patterns
    patterns = [
        r'referral_content\s*:\s*"(.*?)"',
        r'The reason for this referral is\s*(.*?)(?<!\s)\.',
        r'The referral is being made because\s*(.*?)(?<!\s)\.',
        r'Referral Reason:\s*(.*?)(?<!\s)\.',
        r'The reason for referral is\s*(.*?)(?<!\s)\.',
        r'for referral is\s*(.*?)(?<!\s)\.',
        r'need referral because\s*(.*?)(?<!\s)\.',
        r'This referral is made to\s*(.*?)(?<!\s)\.',
        r'We are referring this patient because\s*(.*?)(?<!\s)\.',
        r'The purpose of this referral is\s*(.*?)(?<!\s)\.',
        r'Our reason for referral is\s*(.*?)(?<!\s)\.',
        r'Due to\s*(.*?)(?<!\s)\.',
        r'because of\s*(.*?)(?<!\s)\.',
        r'As a result of\s*(.*?)(?<!\s)\.',
        r'The intent of this referral is\s*(.*?)(?<!\s)\.',
        # Specific patterns for provided examples
        r'Given the imaging and laboratory findings,\s*(.*?)(?<!\s)\.',
        r'Given the persistent and progressive nature of the swelling,\s*(.*?)(?<!\s)\.',
        r'Mr\. Doe requires a specialist evaluation for\s*(.*?)(?<!\s)\.',
        r'Given the complexity and progression of her condition,\s*(.*?)(?<!\s)\.',
        r'The imaging studies confirm the diagnosis of\s*(.*?)(?<!\s)\.',
    ]
    
    for pattern in patterns:
        match = re.search(pattern, tmp_str)
        if match:
            return match.group(1).strip()

    # Fallback extraction logic
    sentences = tmp_str.split('. ')
    for sentence in sentences:
        if 'referral' in sentence.lower():
            return sentence.strip()
    
    return "extract failure"

def clean_extracted_text(text):
    # Further clean and format the extracted text
    text = text.replace('\n', ' ').strip()
    text = re.sub(r'\s+', ' ', text)
    return text

def evaluate_jsonl_with_llama2(predictor, path, output_jsonl_path, csv_file_path):
    test_data_json = []
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            test_data_json.append(json.loads(line.strip()))
    
    bleu_score_list = []
    evaluate_list = []

    for single_test in test_data_json:
        instruction = single_test.get("instruction", "")
        whole_letter = single_test.get("whole_letter", "")
        referral_content = single_test.get("referral_content", "")
        prompt = f"{instruction}\n\n###\n\n{whole_letter}\n\n###"
        response = predictor.predict({'inputs': prompt, 'parameters': {'max_new_tokens': 512}})
        
        reference_text = referral_content if referral_content is not None else ""
        candidate_text = "extract failure"
        try:
            tmp_str = replace_output_prefix(response[0]["generated_text"].strip())
            candidate_text = extract_referral_content(tmp_str)
            candidate_text = clean_extracted_text(candidate_text)
        except Exception as err:
            logging.error(f"Error processing ID: {single_test.get('id', 'unknown')}")
            logging.error(f"Response: {response[0]['generated_text'].strip() if response else 'No response'}")
            logging.error(err)
        finally:
            evaluate_list.append(candidate_text)
            bleu = sacrebleu.corpus_bleu([candidate_text], [[reference_text]])
            bleu_score_list.append(bleu.score)
            single_test["bleu"] = bleu.score
            single_test["predict_referral_content"] = candidate_text
    
    with open(output_jsonl_path, mode='w', encoding='utf-8') as f:
        for single_test in test_data_json:
            f.write(json.dumps(single_test, ensure_ascii=False) + '\n')
    logging.info(f"Predicted data has been saved to {output_jsonl_path}.")
    
    # Create and save CSV file
    csv_data = []

    for single_test in test_data_json:
        csv_data.append({
            "id": single_test.get("id", ""),
            "name": single_test.get("name", ""),
            "instruction": single_test.get("instruction", ""),
            "whole_letter": single_test.get("whole_letter", ""),
            "referral_content": single_test.get("referral_content", ""),
            "predict_referral_content": single_test.get("predict_referral_content", ""),
            "bleu": single_test.get("bleu", 0),
        })

    df = pd.DataFrame(csv_data)
    df.to_csv(csv_file_path, index=False, encoding='utf-8')
    logging.info(f"CSV file has been saved to {csv_file_path}")
    
    return evaluate_list, bleu_score_list

In [82]:
test_evaluate_list, test_bleu_score_list = evaluate_jsonl_with_llama2(predictor, 
                                        "test_data_735/test.jsonl", "300724-10lewis-predict_llama2_7b_f_test.jsonl",
                                                                "300724-10lewis-predict_llama2_7b_f_test.csv")

INFO:root:Predicted data has been saved to 300724-10lewis-predict_llama2_7b_f_test.jsonl.
INFO:root:CSV file has been saved to 300724-10lewis-predict_llama2_7b_f_test.csv


In [83]:
analyze_predict_data(test_bleu_score_list)

Scores >= 100: 46, fraction of all data: 0.32857142857142857
Scores > 70: 56, fraction of all data: 0.4
Mean BLEU score: 50.374449038232484


In [None]:
#FINAL TEST SCRIPT

In [87]:
logging.basicConfig(level=logging.INFO)

def replace_output_prefix(input_str):
    result = re.sub(r'(?i)^Output:\s*', '', input_str)
    return result

def extract_referral_content(tmp_str):
    # Refined primary extraction using common patterns
    patterns = [
        r'\s*referral_content\s*:\s*"(.*?)"',
        r'\s*The reason for this referral is\s*(.*?)(?<!\s)\.',
        r'\s*The referral is being made because\s*(.*?)(?<!\s)\.',
        r'\s*Referral Reason:\s*(.*?)(?<!\s)\.',
        r'\s*The reason for referral is\s*(.*?)(?<!\s)\.',
        r'\s*for referral is\s*(.*?)(?<!\s)\.',
        r'\s*need referral because\s*(.*?)(?<!\s)\.',
        r'\s*This referral is made to\s*(.*?)(?<!\s)\.',
        r'\s*We are referring this patient because\s*(.*?)(?<!\s)\.',
        r'\s*The purpose of this referral is\s*(.*?)(?<!\s)\.',
        r'\s*Our reason for referral is\s*(.*?)(?<!\s)\.',
        r'\s*Due to\s*(.*?)(?<!\s)\.',
        r'\s*because of\s*(.*?)(?<!\s)\.',
        r'\s*As a result of\s*(.*?)(?<!\s)\.',
        r'\s*The intent of this referral is\s*(.*?)(?<!\s)\.',
        # Specific patterns for provided examples
        r'\s*Given the imaging and laboratory findings,\s*(.*?)(?<!\s)\.',
        r'\s*Given the persistent and progressive nature of the swelling,\s*(.*?)(?<!\s)\.',
        r'\s*Mr\. Doe requires a specialist evaluation for\s*(.*?)(?<!\s)\.',
        r'\s*Given the complexity and progression of her condition,\s*(.*?)(?<!\s)\.',
        r'\s*The imaging studies confirm the diagnosis of\s*(.*?)(?<!\s)\.',
        r'\s*He would benefit from\s*(.*?)(?<!\s)\.',
        r'\s*Mr\. Doe requires\s*(.*?)(?<!\s)\.',
        r'\s*specialist evaluation for\s*(.*?)(?<!\s)\.',
        r'\s*comprehensive evaluation and appropriate management by\s*(.*?)(?<!\s)\.',
        r'\s*to determine the nature of\s*(.*?)(?<!\s)\.',
        r'\s*Given its persistence and unresponsiveness to initial therapies,\s*(.*?)(?<!\s)\.',
    ]
    
    for pattern in patterns:
        match = re.search(pattern, tmp_str)
        if match:
            return match.group(1).strip()

    # Fallback extraction logic
    sentences = tmp_str.split('. ')
    for sentence in sentences:
        if 'referral' in sentence.lower():
            return sentence.strip()
    
    return "extract failure"

def clean_extracted_text(text):
    # Further clean and format the extracted text
    text = text.replace('\n', ' ').strip()
    text = re.sub(r'\s+', ' ', text)
    return text

def evaluate_jsonl_with_llama2(predictor, path, output_jsonl_path, csv_file_path):
    test_data_json = []
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            test_data_json.append(json.loads(line.strip()))
    
    bleu_score_list = []
    evaluate_list = []

    for single_test in test_data_json:
        instruction = single_test.get("instruction", "")
        whole_letter = single_test.get("whole_letter", "")
        referral_content = single_test.get("referral_content", "")
        prompt = f"{instruction}\n\n###\n\n{whole_letter}\n\n###"
        response = predictor.predict({'inputs': prompt, 'parameters': {'max_new_tokens': 512}})
        
        reference_text = referral_content if referral_content is not None else ""
        candidate_text = "extract failure"
        try:
            tmp_str = replace_output_prefix(response[0]["generated_text"].strip())
            candidate_text = extract_referral_content(tmp_str)
            candidate_text = clean_extracted_text(candidate_text)
        except Exception as err:
            logging.error(f"Error processing ID: {single_test.get('id', 'unknown')}")
            logging.error(f"Response: {response[0]['generated_text'].strip() if response else 'No response'}")
            logging.error(err)
        finally:
            evaluate_list.append(candidate_text)
            bleu = sacrebleu.corpus_bleu([candidate_text], [[reference_text]])
            bleu_score_list.append(bleu.score)
            single_test["bleu"] = bleu.score
            single_test["predict_referral_content"] = candidate_text
    
    with open(output_jsonl_path, mode='w', encoding='utf-8') as f:
        for single_test in test_data_json:
            f.write(json.dumps(single_test, ensure_ascii=False) + '\n')
    logging.info(f"Predicted data has been saved to {output_jsonl_path}.")
    
    # Create and save CSV file
    csv_data = []

    for single_test in test_data_json:
        csv_data.append({
            "id": single_test.get("id", ""),
            "name": single_test.get("name", ""),
            "instruction": single_test.get("instruction", ""),
            "whole_letter": single_test.get("whole_letter", ""),
            "referral_content": single_test.get("referral_content", ""),
            "predict_referral_content": single_test.get("predict_referral_content", ""),
            "bleu": single_test.get("bleu", 0),
        })

    df = pd.DataFrame(csv_data)
    df.to_csv(csv_file_path, index=False, encoding='utf-8')
    logging.info(f"CSV file has been saved to {csv_file_path}")
    
    return evaluate_list, bleu_score_list

In [88]:
test_evaluate_list, test_bleu_score_list = evaluate_jsonl_with_llama2(predictor, 
                                        "test_data_735/test.jsonl", "300724-finallewis-predict_llama2_7b_f_test.jsonl",
                                                                "300724-finallewis-predict_llama2_7b_f_test.csv")

INFO:root:Predicted data has been saved to 300724-finallewis-predict_llama2_7b_f_test.jsonl.
INFO:root:CSV file has been saved to 300724-finallewis-predict_llama2_7b_f_test.csv


In [89]:
analyze_predict_data(test_bleu_score_list)

Scores >= 100: 46, fraction of all data: 0.32857142857142857
Scores > 70: 56, fraction of all data: 0.4
Mean BLEU score: 50.374449038232484
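
The patterns in the final script add a leading `\s*`. Since `re.search` already scans the whole string and only group 1 is returned, that prefix should not change the extracted text. A minimal check of this, using an invented sample string:

```python
import re

plain = r'Referral Reason:\s*(.*?)(?<!\s)\.'
prefixed = r'\s*Referral Reason:\s*(.*?)(?<!\s)\.'

# Hypothetical text, invented for this example.
sample = "Summary follows.   Referral Reason: chronic lower back pain."

plain_capture = re.search(plain, sample).group(1)
prefixed_capture = re.search(prefixed, sample).group(1)

# The overall match (group 0) differs by leading whitespace; group 1 does not.
print(plain_capture == prefixed_capture)
```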


## Evaluate the full test set (test.jsonl) with Llama 2

In [14]:
import sacrebleu
import json
import pandas as pd
import re

# Helper functions to extract referral_content from the model output

def replace_output_prefix(input_str):
    # use regex to replace the prefix, ignoring case
    result = re.sub(r'(?i)^Output:\s*', '', input_str)
    return result

def evaluate_jsonl_with_llama2(predictor, path, output_jsonl_path, csv_file_path):
    test_data_json = []
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            test_data_json.append(json.loads(line.strip()))
    
    bleu_score_list = []
    evaluate_list = []

    for single_test in test_data_json:
        instruction = single_test["instruction"]
        whole_letter = single_test["whole_letter"]
        referral_content = single_test["referral_content"]
        prompt = f"{instruction}\n\n###\n\n{whole_letter}\n\n###"
        response = predictor.predict({'inputs': prompt,
                                 'parameters': {'max_new_tokens': 256}})
        # print(prompt)
        reference_text = referral_content
        try:
            tmp_str = replace_output_prefix(response[0]["generated_text"].strip())
            # Extract referral_content using regular expressions
            match = re.search(r'referral_content:\s*"([^"]+)"', tmp_str)
            if match:
                candidate_text = match.group(1)
            else:
                candidate_text = "extract failure"
        except Exception as err:
            print(single_test["id"])
            print(response[0]["generated_text"].strip())
            print()
            candidate_text = "extract failure"
        finally:

            evaluate_list.append(candidate_text)
            # print("predict: " + candidate_text)
            # print("real: " + reference_text)
            bleu = sacrebleu.corpus_bleu([candidate_text], [[reference_text]])
            bleu_score_list.append(bleu.score)
            # print(bleu.score)
            single_test["bleu"] = bleu.score
            single_test["predict_referral_content"] = candidate_text
            
    with open(output_jsonl_path, mode='w', encoding='utf-8') as f:
         for single_test in test_data_json:
             f.write(json.dumps(single_test, ensure_ascii=False) + '\n')
    print(f"Predicted data has been saved to {output_jsonl_path}.")
    
    # Create and save CSV file
    csv_data = []

    for single_test in test_data_json:
        csv_data.append({
            "id": single_test["id"],
            "name": single_test["name"],
            "instruction": single_test["instruction"],
            "whole_letter": single_test["whole_letter"],
            "referral_content": single_test["referral_content"],
            "predict_referral_content": single_test["predict_referral_content"],
            "bleu": single_test["bleu"],
        })

    # create DataFrame
    df = pd.DataFrame(csv_data)

    # save to csv file
    df.to_csv(csv_file_path, index=False, encoding='utf-8')

    print(f"CSV file has been saved to {csv_file_path}")
    
    return evaluate_list, bleu_score_list
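
The function above reads the test set as JSON Lines (one JSON object per line) and writes the augmented records back in the same format. A minimal sketch of that round trip follows, using a temporary file and invented records. The helper names `read_jsonl` and `write_jsonl` are hypothetical, and skipping blank lines is an added safeguard that the original loop does not have.

```python
import json
import os
import tempfile

def read_jsonl(path):
    """Load one JSON object per non-empty line."""
    records = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # assumption: tolerate blank lines in the file
                records.append(json.loads(line))
    return records

def write_jsonl(records, path):
    """Write one JSON object per line, keeping non-ASCII characters readable."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

# Invented records, shaped like the fields used in this notebook.
records = [
    {"id": 1, "referral_content": "knee pain", "bleu": 100.0},
    {"id": 2, "referral_content": "persistent cough", "bleu": 42.5},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.jsonl")
    write_jsonl(records, path)
    loaded = read_jsonl(path)

print(loaded == records)
```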



In [17]:
test_evaluate_list, test_bleu_score_list = evaluate_jsonl_with_llama2(predictor, 
                                        "test_data_735/test.jsonl", "290724lewis-predict_llama2_7b_f_test.jsonl",
                                                                "290724lewis-predict_llama2_7b_f_test.csv")

KeyboardInterrupt: 

In [37]:
train_evaluate_list, train_bleu_score_list = evaluate_jsonl_with_llama2(predictor,
                                                                "../train_test_data/train.jsonl", "../train_test_data/predict_llama2_7b_f_train.jsonl",
                                                                "../train_test_data/predict_llama2_7b_f_train.csv")

CSV file has been saved to ../train_test_data/predict_llama2_7b_f_train.csv


In [71]:
def analyze_predict_data(bleu_score_list):
    # Count scores of at least 100 (the maximum BLEU score)
    count_gt_100 = sum(1 for score in bleu_score_list if score >= 100)

    # Count scores above 70
    count_gt_70 = sum(1 for score in bleu_score_list if score > 70)

    prob_gt_100 = count_gt_100 / len(bleu_score_list)
    prob_gt_70 = count_gt_70 / len(bleu_score_list)
    average_score = sum(bleu_score_list) / float(len(bleu_score_list))

    print(f"Scores >= 100: {count_gt_100}, fraction of all data: {prob_gt_100}")
    print(f"Scores > 70: {count_gt_70}, fraction of all data: {prob_gt_70}")
    print(f"Mean BLEU score: {average_score}")

In [73]:
analyze_predict_data(test_bleu_score_list)

Scores >= 100: 46, fraction of all data: 0.32857142857142857
Scores > 70: 56, fraction of all data: 0.4
Mean BLEU score: 50.374449038232484


In [53]:
analyze_predict_data(test_bleu_score_list)

Scores >= 100: 10, fraction of all data: 0.47619047619047616
Scores > 70: 11, fraction of all data: 0.5238095238095238
Mean BLEU score: 62.56950311883414
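
`analyze_predict_data` divides by `len(bleu_score_list)`, so it raises `ZeroDivisionError` when the list is empty. The sketch below is one possible variant that returns the statistics instead of printing them and guards against that case. The name `summarize_bleu_scores` and the dictionary keys are hypothetical, and the scores are invented.

```python
def summarize_bleu_scores(bleu_score_list):
    """Return counts, fractions, and the mean for a list of BLEU scores (0-100)."""
    total = len(bleu_score_list)
    if total == 0:
        # Nothing to summarize; avoid dividing by zero.
        return {"count": 0}
    perfect = sum(1 for score in bleu_score_list if score >= 100)
    above_70 = sum(1 for score in bleu_score_list if score > 70)
    return {
        "count": total,
        "perfect": perfect,
        "perfect_fraction": perfect / total,
        "above_70": above_70,
        "above_70_fraction": above_70 / total,
        "mean": sum(bleu_score_list) / total,
    }

# Invented scores for illustration.
summary = summarize_bleu_scores([100.0, 80.0, 20.0, 0.0])
print(summary)
```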


***
While not used in the previously provided example payloads, you can format your own messages to the Llama-2 model with the following utility function.
***

In [None]:
from typing import Dict, List


def format_messages(messages: List[Dict[str, str]]) -> str:
    """Format messages for Llama-2 chat models.
    
    The model only supports 'system', 'user' and 'assistant' roles, starting with 'system', then 'user' and 
    alternating (u/a/u/a/u...). The last message must be from 'user'.
    """
    prompt: List[str] = []

    if messages[0]["role"] == "system":
        content = "".join(["<<SYS>>\n", messages[0]["content"], "\n<</SYS>>\n\n", messages[1]["content"]])
        messages = [{"role": messages[1]["role"], "content": content}] + messages[2:]

    for user, answer in zip(messages[::2], messages[1::2]):
        prompt.extend(["<s>", "[INST] ", (user["content"]).strip(), " [/INST] ", (answer["content"]).strip(), "</s>"])

    prompt.extend(["<s>", "[INST] ", (messages[-1]["content"]).strip(), " [/INST] "])

    return "".join(prompt)


dialog = [
    {"role": "system", "content": "Always answer with Haiku"},
    {"role": "user", "content": "I am going to Paris, what should I see?"},
]

prompt = format_messages(dialog)
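
The docstring of `format_messages` describes the expected role order, but the function itself does not check it. The sketch below shows one way such a check could run before formatting. `validate_dialog` is a hypothetical helper written for this example and is not part of the SageMaker SDK.

```python
from typing import Dict, List

def validate_dialog(messages: List[Dict[str, str]]) -> None:
    """Raise ValueError unless roles follow [system,] user, assistant, ..., user."""
    if not messages:
        raise ValueError("dialog is empty")
    # An optional leading system message is allowed.
    turns = messages[1:] if messages[0]["role"] == "system" else messages
    if not turns:
        raise ValueError("a system message must be followed by a user message")
    for index, message in enumerate(turns):
        expected = "user" if index % 2 == 0 else "assistant"
        if message["role"] != expected:
            raise ValueError(
                f"turn {index}: expected '{expected}', got '{message['role']}'"
            )
    if turns[-1]["role"] != "user":
        raise ValueError("the last message must be from 'user'")

dialog = [
    {"role": "system", "content": "Always answer with Haiku"},
    {"role": "user", "content": "I am going to Paris, what should I see?"},
]
validate_dialog(dialog)  # passes silently for a well-formed dialog
```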

## Clean up the endpoint

In [45]:
# Delete the SageMaker endpoint
predictor.delete_model()
predictor.delete_endpoint()