#### LLM Model Evaluation

In this notebook, we'll deploy the Meta Llama 2 7B model and evaluate its text generation capabilities and domain knowledge. We'll use the SageMaker Python SDK for Foundation Models to deploy the model for inference.

The Llama 2 7B foundation model performs text generation: it takes a text string as input and predicts the next words in the sequence.

#### Set Up
There are some initial steps required for setup.

In [1]:
!pip install ipywidgets==7.0.0 --quiet
!pip install --upgrade sagemaker datasets --quiet

To deploy the model on Amazon SageMaker, we need to set up and authenticate access to AWS services. We'll use the execution role associated with the current notebook instance as the AWS account role with SageMaker access.

# Authenticating with AWS services

In [2]:
import boto3
import sagemaker

# A single session is enough; it resolves the role ARN of the notebook's identity.
sess = sagemaker.Session()
aws_role = sess.get_caller_identity_arn()
aws_region = boto3.Session().region_name
print(aws_role)
print(aws_region)
print(sess)

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml
arn:aws:iam::674482793186:role/service-role/SageMaker-udacitySagemakerRole
us-west-2
<sagemaker.session.Session object at 0x7f86a485fa90>


# Selecting The Text Generation Model Meta Llama 2 7B

In [3]:
model_id, model_version = "meta-textgeneration-llama-2-7b", "2.*"

Running the next cell deploys the model using Amazon SageMaker JumpStart. The code does the following:

1. Import the `JumpStartModel` class from the `sagemaker.jumpstart.model` module.

2. Create an instance of the `JumpStartModel` class using the `model_id` and `model_version` variables created in the previous cell. This object represents the machine learning model you want to deploy.

3. Call the `deploy` method on the `JumpStartModel` instance. This method deploys the model on Amazon SageMaker and returns a `Predictor` object.

The `Predictor` object (`predictor`) can be used to make predictions with the deployed model. Unless told otherwise, `deploy` chooses an endpoint name, instance type, and other deployment parameters automatically. To set them yourself, pass them as arguments; the cell below, for example, passes `instance_type` to the `JumpStartModel` constructor.

**The next cell will take some time to run.**  It is deploying a large language model, and that takes time.  You'll see dashes (--) while it is being deployed.  Please be patient! You'll see an exclamation point at the end of the dashes (---!) when the model is deployed and then you can continue running the next cells.  

You might see the warning "For forward compatibility, pin to model_version...". You can ignore it and wait for the model to deploy.


# Deploying the Llama 2 Model on Amazon SageMaker

In [4]:
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id=model_id, model_version=model_version, instance_type="ml.g5.2xlarge")
predictor = model.deploy()


For forward compatibility, pin to model_version='2.*' in your JumpStartModel or JumpStartEstimator definitions. Note that major version upgrades may have different EULA acceptance terms and input/output signatures.
Using vulnerable JumpStart model 'meta-textgeneration-llama-2-7b' and version '2.1.8'.
Using model 'meta-textgeneration-llama-2-7b' with wildcard version identifier '2.*'. You can pin to version '2.1.8' for more stable results. Note that models may have different input/output signatures after a major version upgrade.


----------------!
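
The deployment cell above lets `deploy` pick the endpoint name. If you would rather set the deployment parameters explicitly, a minimal sketch could look like the one below. The helper `build_deploy_kwargs` and the `llama-2-7b-eval` name prefix are hypothetical, introduced only for illustration; `initial_instance_count`, `instance_type`, and `endpoint_name` are keyword arguments of the SDK's `deploy` method, but check them against the SDK version you have installed.

```python
import time


def build_deploy_kwargs(instance_type="ml.g5.2xlarge", name_prefix="llama-2-7b-eval"):
    """Collect explicit deployment settings (hypothetical helper, not part of the SDK)."""
    # A timestamp suffix keeps the endpoint name unique across notebook runs.
    endpoint_name = f"{name_prefix}-{time.strftime('%Y%m%d-%H%M%S')}"
    return {
        "initial_instance_count": 1,
        "instance_type": instance_type,
        "endpoint_name": endpoint_name,
    }


deploy_kwargs = build_deploy_kwargs()
print(deploy_kwargs)

# With the `model` object from the deployment cell, the call would then be:
# predictor = model.deploy(**deploy_kwargs)
```

Naming the endpoint yourself makes it easier to find in the SageMaker console and to delete at the end of the notebook.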

#### Invoke the endpoint, query and parse response
The next step is to invoke the model endpoint, send a query to the endpoint, and receive a response from the model.

Running the next cell defines a function that will be used to parse and print the response from the model. 

In [5]:
def print_response(payload, response):
    print(payload["inputs"])
    print(f"> {response[0]['generation']}")
    print("\n==================================\n")
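
`print_response` assumes the response is a list whose first element has a `generation` key, which matches the model version used in this notebook. Other model versions may return a different shape, so a slightly more defensive parser can be useful. The sketch below is an illustration only: the helper name `extract_generation` is hypothetical, and the `generated_text` key is an assumption about other versions, not something this notebook shows.

```python
def extract_generation(response):
    """Return the generated text from a few plausible response shapes (assumed)."""
    # Accept either a list of results or a single result dict.
    item = response[0] if isinstance(response, list) else response
    # "generation" is what this notebook sees; "generated_text" is an assumption.
    for key in ("generation", "generated_text"):
        if key in item:
            return item[key]
    raise KeyError(f"No generated text found; keys were {sorted(item)}")


# Mock responses standing in for real endpoint output.
print(extract_generation([{"generation": " sample text"}]))
print(extract_generation({"generated_text": " sample text"}))
```

Raising a `KeyError` that lists the available keys makes an unexpected response format easier to diagnose than a bare lookup failure.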

The model takes a text string as input and predicts the next words in the sequence. The input we send it is the prompt.

The prompt we send the model should relate to the domain we'd like to fine-tune the model on.  This way we'll identify the model's domain knowledge before it's fine-tuned, and then we can run the same prompts on the fine-tuned model.   

**Replace "inputs"** in the next cell with the input to send the model based on the domain you've chosen. 

**For financial domain:**

  "inputs": "Replace with sentence below"  
- "The  investment  tests  performed  indicate"
- "the  relative  volume  for  the  long  out  of  the  money  options, indicates"
- "The  results  for  the  short  in  the  money  options"
- "The  results  are  encouraging  for  aggressive  investors"
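
Instead of editing `"inputs"` by hand for each sentence, the four financial prompts listed above can be turned into payloads in one pass. This is a minimal sketch: `PROMPTS` and `build_payload` are hypothetical names, and the default parameter values are copied from the evaluation cell in this notebook.

```python
# The four financial-domain prompts from the list above (spacing kept as given).
PROMPTS = [
    "The  investment  tests  performed  indicate",
    "the  relative  volume  for  the  long  out  of  the  money  options, indicates",
    "The  results  for  the  short  in  the  money  options",
    "The  results  are  encouraging  for  aggressive  investors",
]


def build_payload(prompt, max_new_tokens=64, top_p=0.9, temperature=0.6):
    """Build one request payload (hypothetical helper mirroring the notebook's cell)."""
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "top_p": top_p,
            "temperature": temperature,
            "return_full_text": False,
        },
    }


payloads = [build_payload(prompt) for prompt in PROMPTS]
print(len(payloads))

# With the deployed `predictor` and `print_response` from this notebook:
# for payload in payloads:
#     response = predictor.predict(payload, custom_attributes="accept_eula=true")
#     print_response(payload, response)
```

Keeping the parameters identical across prompts means any difference in the outputs comes from the prompt, and later from fine-tuning, rather than from the sampling settings.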

# Evaluating the Pre-trained Llama 2 Text Generation Large Language Model for Domain Knowledge

In [10]:
def print_response(payload, response):
    print(payload["inputs"])
    print(f"> {response[0]['generation']}")
    print("\n==================================\n")
payload = {
    "inputs": "The  results  are  encouraging  for  aggressive  investors",
    "parameters": {
        "max_new_tokens": 64,
        "top_p": 0.9,
        "temperature": 0.6,
        "return_full_text": False,
    },
}
try:
    response = predictor.predict(payload, custom_attributes="accept_eula=true")
    print_response(payload, response)
except Exception as e:
    print(e)

The  results  are  encouraging  for  aggressive  investors
> 
in  the  near-term  and  for  long-term  investors  as  well.
The  results  are  encouraging  for  aggressive  investors  in
the  near-term  and  for  long-term  investors  as 




### Example - 1:

  Input prompt: "The results for the short in the money options"

  Output text generated: The stock is down 24% from its 52-week high. However, it’s still up 22% from its 52-week low.
The company is facing some headwinds in the near term, but it has a long-term growth plan that should

### Example - 2:

  Input prompt: "the relative volume for the long out of the money options, indicates"

  Output text generated: that the options are trading at a 16.9% premium to the at the money options.
The implied volatility for the long out of the money options is 29.2%.
The implied volatility for the short out of the money options is 31
  
### Example - 3:

  Input prompt: "The  investment  tests  performed  indicate"

  Output text generated: that  the  Company  has  adequate  liquidity  to  fund  its  operations
through  the  foreseeable  future.
In  addition,  the  Company  has  a  $250,000,000 revolving credit

### Example - 4:
 
  Input prompt: "The  results  are  encouraging  for  aggressive  investors"

  Output text generated: in  the  near-term  and  for  long-term  investors  as  well.
The  results  are  encouraging  for  aggressive  investors  in
the  near-term  and  for  long-term  investors  as 


Each prompt relates to the domain you want to fine-tune the model on. As the examples show, without fine-tuning the model's outputs offer little insightful or domain-relevant content; in Example 4 the model largely repeats the prompt.
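
To run the same prompts against the fine-tuned model later, it can help to keep these baseline outputs on disk. The sketch below is one possible way to do that, not part of the original workflow: `save_baseline`, `load_baseline`, and the `baseline_outputs.json` file name are hypothetical, and the sample entry is an excerpt of Example 4.

```python
import json
import tempfile
from pathlib import Path


def save_baseline(results, path):
    """Write a {prompt: generated_text} mapping to a JSON file (hypothetical helper)."""
    Path(path).write_text(json.dumps(results, indent=2), encoding="utf-8")


def load_baseline(path):
    """Read the mapping back, for example after fine-tuning, for side-by-side review."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Excerpt of one prompt/output pair from the examples above.
baseline = {
    "The  results  are  encouraging  for  aggressive  investors":
        "in  the  near-term  and  for  long-term  investors  as  well.",
}

# A temporary directory keeps this demo from leaving files behind;
# in the notebook you would pick a persistent path instead.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "baseline_outputs.json"
    save_baseline(baseline, path)
    restored = load_baseline(path)

print(restored == baseline)
```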

In [11]:
# Delete the SageMaker endpoint and the attached resources
predictor.delete_model()
predictor.delete_endpoint()