# Run Llama 2 Models in SageMaker JumpStart

---
In this notebook, I use the SageMaker Python SDK to deploy a JumpStart text-generation model: the Llama 2 model fine-tuned for dialogue use cases. I then send a single prompt, followed by a sequence of prompts about mRNA drug development.

---

## Setup

***

In [2]:
%pip install --upgrade --quiet sagemaker

Note: you may need to restart the kernel to use updated packages.


***
You can continue with the default model or choose a different one. This notebook runs with the following model IDs:
- `meta-textgeneration-llama-2-7b-f`
- `meta-textgeneration-llama-2-13b-f`
- `meta-textgeneration-llama-2-70b-f`
***

In [3]:
model_id = "meta-textgeneration-llama-2-7b-f"

In [4]:
model_version = "3.*"

## Deploy model

***
You can now deploy the model using SageMaker JumpStart. For successful deployment, you must manually change the `accept_eula` argument in the model's deploy method to `True`.
***

In [5]:
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id=model_id, model_version=model_version)

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /root/.config/sagemaker/config.yaml


Using vulnerable JumpStart model 'meta-textgeneration-llama-2-7b-f' and version '3.2.0'.
Using model 'meta-textgeneration-llama-2-7b-f' with wildcard version identifier '3.*'. You can pin to version '3.2.0' for more stable results. Note that models may have different input/output signatures after a major version upgrade.


In [11]:
predictor = model.deploy(accept_eula=True)

-----------!

## Invoke the endpoint

***
### Supported Parameters

***
This model supports many parameters while performing inference. They include:

* **max_length:** Model generates text until the output length (which includes the input context length) reaches `max_length`. If specified, it must be a positive integer.
* **max_new_tokens:** Model generates text until the output length (excluding the input context length) reaches `max_new_tokens`. If specified, it must be a positive integer.
* **num_beams:** Number of beams used in beam search. If specified, it must be an integer greater than or equal to `num_return_sequences`.
* **no_repeat_ngram_size:** Model ensures that a sequence of words of `no_repeat_ngram_size` is not repeated in the output sequence. If specified, it must be a positive integer greater than 1.
* **temperature:** Controls the randomness in the output. Higher temperature results in output sequence with low-probability words and lower temperature results in output sequence with high-probability words. If `temperature` -> 0, it results in greedy decoding. If specified, it must be a positive float.
* **early_stopping:** If True, text generation is finished when all beam hypotheses reach the end of sentence token. If specified, it must be boolean.
* **do_sample:** If True, sample the next word as per the likelihood. If specified, it must be boolean.
* **top_k:** In each step of text generation, sample from only the `top_k` most likely words. If specified, it must be a positive integer.
* **top_p:** In each step of text generation, sample from the smallest possible set of words with cumulative probability `top_p`. If specified, it must be a float between 0 and 1.
* **return_full_text:** If True, input text will be part of the output generated text. If specified, it must be boolean. The default value for it is False.
* **stop**: If specified, it must be a list of strings. Text generation stops if any one of the specified strings is generated.

You may specify any subset of the parameters above when invoking an endpoint. The examples later in this notebook show how to invoke the endpoint with these arguments.
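The constraints listed above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the SageMaker SDK: the function name `validate_generation_params` is hypothetical, it covers only some of the listed parameters, and treating the `top_p` bounds as inclusive is an assumption.

```python
def validate_generation_params(params):
    """Return a list of problems found in a generation-parameter dict.

    Hypothetical helper: it mirrors a subset of the constraints documented
    above and is not part of the SageMaker SDK.
    """
    problems = []

    def is_int(value):
        # bool is a subclass of int, so exclude it explicitly
        return isinstance(value, int) and not isinstance(value, bool)

    def is_number(value):
        return isinstance(value, (int, float)) and not isinstance(value, bool)

    for name in ("max_length", "max_new_tokens", "top_k"):
        if name in params and not (is_int(params[name]) and params[name] > 0):
            problems.append(f"{name} must be a positive integer")

    if "temperature" in params:
        value = params["temperature"]
        if not (is_number(value) and value > 0):
            problems.append("temperature must be a positive float")

    if "top_p" in params:
        value = params["top_p"]
        # Assumption: the bounds 0 and 1 are treated as inclusive here.
        if not (is_number(value) and 0 <= value <= 1):
            problems.append("top_p must be a float between 0 and 1")

    for name in ("early_stopping", "do_sample", "return_full_text"):
        if name in params and not isinstance(params[name], bool):
            problems.append(f"{name} must be boolean")

    if "stop" in params:
        stop = params["stop"]
        if not (isinstance(stop, list) and all(isinstance(s, str) for s in stop)):
            problems.append("stop must be a list of strings")

    return problems
```

An empty list means none of the checked constraints were violated. The endpoint remains the final authority on what it accepts.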

**NOTE**: If `max_new_tokens` is not defined, the model may generate up to the maximum total tokens allowed, which is 4K for these models. This may result in endpoint query timeout errors, so set `max_new_tokens` whenever possible. For the 7B, 13B, and 70B models, we recommend setting `max_new_tokens` no greater than 1500, 1000, and 500 respectively, while keeping the total number of tokens below 4K.
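One way to apply the per-model recommendation is to clamp the requested value before building the payload. This is a small sketch under the assumption that the three model IDs listed in the Setup section map to the 7B, 13B, and 70B limits. The names `RECOMMENDED_MAX_NEW_TOKENS` and `cap_max_new_tokens` are hypothetical. The sketch does not check the 4K total, which would require counting the prompt's tokens.

```python
# Recommended upper bounds from the note above, keyed by JumpStart model ID.
# The mapping and the function name are illustrative, not an SDK feature.
RECOMMENDED_MAX_NEW_TOKENS = {
    "meta-textgeneration-llama-2-7b-f": 1500,
    "meta-textgeneration-llama-2-13b-f": 1000,
    "meta-textgeneration-llama-2-70b-f": 500,
}


def cap_max_new_tokens(model_id, requested):
    """Return `requested`, lowered to the recommended cap for `model_id`."""
    if requested <= 0:
        raise ValueError("max_new_tokens must be a positive integer")
    # Raises KeyError for a model ID that has no recommendation in the note.
    cap = RECOMMENDED_MAX_NEW_TOKENS[model_id]
    return min(requested, cap)
```

For example, `cap_max_new_tokens(model_id, 2000)` could be used as the `max_new_tokens` value in a payload's `parameters`.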

**NOTE**: This model only supports 'system', 'user' and 'assistant' roles, starting with 'system', then 'user' and alternating (u/a/u/a/u...).

***

### Example prompts
***
The examples in this section demonstrate how to perform text generation with conversational dialog as prompt inputs. The payloads below are written by hand as Python dictionaries.

Input messages for Llama-2 chat models should exhibit the following format. The model only supports 'system', 'user' and 'assistant' roles, starting with 'system', then 'user' and alternating (u/a/u/a/u...). The last message must be from 'user'. A simple user prompt may look like the following:
```
<s>[INST] {user_prompt} [/INST]
```
You may also add a system prompt with the following syntax:
```
<s>[INST] <<SYS>>
{system_prompt}
<</SYS>>

{user_prompt} [/INST]
```
Finally, you can have a conversational interaction with the model by including all previous user prompts and assistant responses in the input:
```
<s>[INST] <<SYS>>
{system_prompt}
<</SYS>>

{user_prompt_1} [/INST] {assistant_response_1} </s><s>[INST] {user_prompt_2} [/INST]
```
***
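The three templates above can be produced by one function that takes a list of role-tagged messages. The sketch below is an illustration only: `build_llama2_prompt` and the `{'role', 'content'}` message shape are assumptions made for this example, not part of the SageMaker SDK. It enforces the role order described above (optional 'system', then alternating 'user' and 'assistant', ending on 'user').

```python
def build_llama2_prompt(messages):
    """Render a list of {'role', 'content'} dicts as a Llama 2 chat prompt."""
    if not messages:
        raise ValueError("messages must not be empty")

    # An optional system message may only appear first.
    system_prompt = None
    if messages[0]["role"] == "system":
        system_prompt = messages[0]["content"]
        messages = messages[1:]

    # The remaining messages must alternate user/assistant, starting with user.
    for index, message in enumerate(messages):
        expected = "user" if index % 2 == 0 else "assistant"
        if message["role"] != expected:
            raise ValueError(f"message {index} must have role '{expected}'")

    # An odd count after the alternation check means the last role is 'user'.
    if len(messages) % 2 == 0:
        raise ValueError("the last message must be from 'user'")

    turns = []
    for index in range(0, len(messages), 2):
        user_text = messages[index]["content"]
        if index == 0 and system_prompt is not None:
            user_text = f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_text}"
        turn = f"<s>[INST] {user_text} [/INST]"
        if index + 1 < len(messages):
            # A completed turn carries the assistant reply and a closing </s>.
            turn += f" {messages[index + 1]['content']} </s>"
        turns.append(turn)
    return "".join(turns)
```

The returned string can be used as the `inputs` value of a payload.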

In [27]:
prompt = {
    'inputs': '<s>[INST] "What are the main principles of quantum mechanics? Please describe it in less than 100 tokens." [/INST]', 
    'parameters': {
        'max_new_tokens': 100, 
        'top_k': 50, 
        'top_p': 0.9, 
        'temperature': 0.6, 
        'decoder_input_details': True, 
        'details': True
    }
}

# Send the payload dictionary to the endpoint
response = predictor.predict(prompt)

# Output the response
print("\n\nInput\n", prompt['inputs'], "\n\nOutput\n", response[0]["generated_text"], "\n\n===============")



Input
 <s>[INST] "What are the main principles of quantum mechanics? Please describe it in less than 100 tokens." [/INST] 

Output
   Quantum mechanics is a fundamental theory in physics that describes the behavior of matter and energy at the smallest scales, such as atoms and subatomic particles. Here are some of the main principles of quantum mechanics:

1. Wave-particle duality: Quantum objects, such as electrons, can exhibit both wave-like and particle-like behavior depending on how they are observed.
2. Uncertainty principle: It is impossible to know certain properties of a 



### Sequence of prompts

In [30]:
prompts_sequence = [
    {
        'inputs': '<s>[INST] "What companies work on mRNA drugs? Please describe it in under 256 tokens." [/INST]',
        'parameters': {
            'max_new_tokens': 256,
            'top_k': 50,
            'top_p': 0.9,
            'temperature': 0.6,
            'decoder_input_details': True,
            'details': False,
        },
    },
    {
        'inputs': '<s>[INST] "What stage is each drug at? Please describe it in under 500 tokens." [/INST]',
        'parameters': {
            'max_new_tokens': 500,
            'top_k': 50,
            'top_p': 0.9,
            'temperature': 0.6,
            'decoder_input_details': True,
            'details': True,
        },
    },
]

for prompt in prompts_sequence:
    response = predictor.predict(prompt)
    print("\nInput\n", prompt['inputs'], "\n\nOutput\n", response[0]["generated_text"], "\n\n===============")


Input
 <s>[INST] "What companies work on mRNA drugs? Please describe it in under 256 tokens." [/INST] 

Output
   There are several companies working on mRNA drugs, here are a few notable ones:

1. Moderna Therapeutics: A biotech company that specializes in mRNA-based drugs for a variety of diseases, including cancer, infectious diseases, and rare genetic disorders.
2. BioNTech: A German biotech company that develops mRNA-based vaccines and therapies, including the first FDA-approved mRNA vaccine for COVID-19.
3. Pfizer: A pharmaceutical company that has entered the mRNA drug space through its acquisition of Biohaven, which is developing mRNA-based therapies for diseases such as migraine and epilepsy.
4. Gilead Sciences: A biopharmaceutical company that is developing mRNA-based therapies for a range of diseases, including HIV, hepatitis, and inflammatory disorders.
5. RNA Biology: A biotech company that focuses on the discovery and development of mRNA-based therapies for 


Input
 <s>
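Note that each payload in the sequence above is sent on its own, so the second question ("What stage is each drug at?") reaches the model without the first exchange. Following the conversational format described earlier, earlier turns can be replayed in each new prompt. The sketch below is one possible way to do that. `run_conversation` is a hypothetical helper, `predict` stands for any callable with the same contract as `predictor.predict`, and the code assumes `return_full_text` is left at its default of False, so `generated_text` holds only the new completion.

```python
def run_conversation(predict, questions, parameters):
    """Ask `questions` in order, replaying earlier turns in each new prompt."""
    history = ""  # completed turns, already rendered in Llama 2 chat format
    answers = []
    for question in questions:
        prompt = f"{history}<s>[INST] {question} [/INST]"
        response = predict({"inputs": prompt, "parameters": parameters})
        # Response shape taken from the cells above: a list with one dict.
        answer = response[0]["generated_text"].strip()
        answers.append(answer)
        # Close the turn so the next prompt contains this exchange.
        history = f"{prompt} {answer} </s>"
    return answers


# Possible usage against the deployed endpoint (not executed here):
# answers = run_conversation(
#     predictor.predict,
#     ["What companies work on mRNA drugs?", "What stage is each drug at?"],
#     {"max_new_tokens": 256, "top_k": 50, "top_p": 0.9, "temperature": 0.6},
# )
```

Because the prompt grows with every turn, the total number of tokens still has to stay below the 4K limit mentioned earlier.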

## Clean up the endpoint

In [32]:
# Delete the SageMaker endpoint
predictor.delete_model()
predictor.delete_endpoint()

### Link SageMaker notebook to GitHub

In [7]:
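# `update-notebook-instance` is an AWS CLI subcommand, not Python, which is why
# running it as a bare cell raised `SyntaxError: invalid syntax`.
# A possible form, prefixed with `!` so the notebook runs it in a shell.
# NOTEBOOK_INSTANCE_NAME is a placeholder, and treating App-AWS-SakeMaker-mRNA-1
# as the repository name is an assumption based on the original command.
!aws sagemaker update-notebook-instance --notebook-instance-name NOTEBOOK_INSTANCE_NAME --default-code-repository App-AWS-SakeMaker-mRNA-1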