#### LLM Model Evaluation

In this notebook, I deploy the Meta Llama 2 7B model and evaluate its text generation capabilities and financial domain knowledge. This notebook is intended to be run on AWS SageMaker.

The Llama 2 7B foundation model performs text generation: it takes a text string as input and predicts the next words in the sequence.
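To make "predicts the next words" concrete, the sketch below runs a greedy generation loop over a tiny, made-up table of next-word scores (`TOY_SCORES` and `greedy_generate` are hypothetical names used only for illustration). Llama 2 works on subword tokens and scores its whole vocabulary with a neural network, and the payload used later in this notebook sets sampling parameters (`top_p`, `temperature`) instead of always taking the top word, but the append-and-repeat loop is the same idea.

```python
from typing import Dict, List

# Hypothetical next-word scores, keyed by the previous word (illustration only).
TOY_SCORES: Dict[str, Dict[str, float]] = {
    "the": {"results": 0.6, "market": 0.4},
    "results": {"are": 0.9, "were": 0.1},
    "are": {"encouraging": 0.7, "mixed": 0.3},
}


def greedy_generate(prompt: str, max_new_words: int) -> str:
    """Append up to max_new_words words, each time picking the highest-scoring next word."""
    words: List[str] = prompt.lower().split()
    for _ in range(max_new_words):
        if not words:
            break
        candidates = TOY_SCORES.get(words[-1])
        if not candidates:  # no known continuation for the last word: stop early
            break
        words.append(max(candidates, key=candidates.get))
    return " ".join(words)


print(greedy_generate("The", 5))  # → the results are encouraging
```

Generation stops after "encouraging" because the toy table has no continuation for it, even though five new words were allowed.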

#### Set Up
A few initial setup steps are required on SageMaker. After running the cell below, restart the kernel before running any subsequent cells so that the upgraded packages are the ones imported.

In [1]:
!pip install ipywidgets==7.0.0 --quiet
!pip install --upgrade sagemaker datasets --quiet

To deploy the model on SageMaker, you need an authenticated AWS session and an execution role with SageMaker access. The cell below creates the session and prints the execution role ARN, the AWS Region, and the session object, so you can confirm that the notebook is running under an IAM role with SageMaker access.

In [1]:
import boto3
from sagemaker.session import Session

# Create a SageMaker session, then look up the execution role and AWS Region.
sagemaker_session = Session()
aws_role = sagemaker_session.get_caller_identity_arn()
aws_region = boto3.Session().region_name
print(aws_role)
print(aws_region)
print(sagemaker_session)

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml
arn:aws:iam::082876160679:role/service-role/SageMaker-ProjectSagemakerRole
us-west-2
<sagemaker.session.Session object at 0x7fbaf4f04550>
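The cell above prints the execution role ARN but does not itself check it. For a quick format check, the hypothetical helper below confirms that an ARN string names an IAM role (ARNs have the form `arn:partition:service:region:account-id:resource`). It only inspects the string; whether the role actually has SageMaker permissions depends on its IAM policies, which you can review in the IAM console.

```python
def is_iam_role_arn(arn: str) -> bool:
    """Hypothetical helper: True if the ARN's service is 'iam' and its resource is a role."""
    # Split into at most 6 fields: arn, partition, service, region, account-id, resource.
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        return False
    service, resource = parts[2], parts[5]
    return service == "iam" and resource.startswith("role/")


print(is_iam_role_arn("arn:aws:iam::082876160679:role/service-role/SageMaker-ProjectSagemakerRole"))  # → True
```

In the notebook, you would pass `aws_role` instead of the literal string.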


#### Select Text Generation Model: Meta Llama 2 7B
Run the next cell to set the ID and version of the model to load.

In [2]:
model_id, model_version = "meta-textgeneration-llama-2-7b", "2.*"
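The `"2.*"` version set above is a wildcard: at deploy time it resolves to the newest 2.x release of the model, which the deployment log below reports as 2.1.8. The sketch below illustrates that kind of resolution over a made-up list of versions; `resolve_version` is a hypothetical helper, not JumpStart's actual resolver. Pinning `model_version` to an exact version avoids silently picking up a later release.

```python
from typing import Iterable, Optional, Tuple


def _version_key(version: str) -> Tuple[int, ...]:
    """Turn '2.1.8' into (2, 1, 8) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))


def resolve_version(spec: str, available: Iterable[str]) -> Optional[str]:
    """Return the highest version in `available` matching '*', 'X.*', 'X.Y.*', or an exact version."""
    versions = list(available)
    if spec == "*":
        candidates = versions
    elif spec.endswith(".*"):
        prefix = _version_key(spec[:-2])
        candidates = [v for v in versions if _version_key(v)[: len(prefix)] == prefix]
    else:
        candidates = [v for v in versions if v == spec]
    return max(candidates, key=_version_key, default=None)


# Hypothetical list of published versions, for illustration only.
print(resolve_version("2.*", ["1.4.0", "2.0.1", "2.1.8", "3.0.0"]))  # → 2.1.8
```

Comparing integer tuples rather than strings matters: as strings, "2.10.0" would sort below "2.9.0".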

Running the next cell deploys the model to a SageMaker endpoint using the `JumpStartModel` class from the SageMaker Python SDK's JumpStart module. Before running it, make sure your account has a service quota for the `ml.g5.2xlarge` GPU instance type for endpoint usage; you can request an increase through the AWS Service Quotas console if needed.

**The next cell will take some time to run.** It is deploying a large language model, and that takes time. The cell output will show dashes (--) while it is being deployed. Once the model has been deployed, you will see an exclamation point at the end of the dashes (---!), at which point you can continue running the next cells. 

You might see a warning that begins "For forward compatibility, pin to model_version...". You can ignore it; just wait for the model to finish deploying.


In [3]:
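from sagemaker.jumpstart.model import JumpStartModel

# Create the JumpStart model and deploy it to a real-time endpoint on an ml.g5.2xlarge instance.
# deploy() blocks until the endpoint is in service.
model = JumpStartModel(model_id=model_id, model_version=model_version, instance_type="ml.g5.2xlarge")
predictor = model.deploy()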


For forward compatibility, pin to model_version='2.*' in your JumpStartModel or JumpStartEstimator definitions. Note that major version upgrades may have different EULA acceptance terms and input/output signatures.
Using vulnerable JumpStart model 'meta-textgeneration-llama-2-7b' and version '2.1.8'.
Using model 'meta-textgeneration-llama-2-7b' with wildcard version identifier '2.*'. You can pin to version '2.1.8' for more stable results. Note that models may have different input/output signatures after a major version upgrade.


-------------!
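`model.deploy()` prints this progress indicator itself, so no extra code is needed here. If you want to check on an endpoint from another notebook or script, the sketch below shows the same idea using the SageMaker `DescribeEndpoint` API: it polls the endpoint status, prints a dash per poll while the endpoint is not ready, and prints `!` once the status is `InService`. `wait_for_endpoint` is a hypothetical helper that takes the describe function as a parameter so it can be tried without AWS; the set of statuses it treats as failures is this sketch's own assumption.

```python
import time
from typing import Callable, Dict

# Statuses this sketch treats as deployment failures (an assumption, not an exhaustive list).
FAILED_STATUSES = {"Failed", "UpdateRollbackFailed"}


def wait_for_endpoint(
    describe: Callable[..., Dict],
    endpoint_name: str,
    poll_seconds: float = 30.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the endpoint is InService, printing '-' per poll and '!' when done."""
    for _ in range(max_polls):
        status = describe(EndpointName=endpoint_name)["EndpointStatus"]
        if status == "InService":
            print("!")
            return status
        if status in FAILED_STATUSES:
            print()
            raise RuntimeError(f"Endpoint {endpoint_name!r} reached status {status}")
        print("-", end="", flush=True)
        sleep(poll_seconds)
    raise TimeoutError(f"Endpoint {endpoint_name!r} not InService after {max_polls} polls")
```

With a real endpoint, you would call `wait_for_endpoint(boto3.client("sagemaker").describe_endpoint, predictor.endpoint_name)`.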

#### Invoke the endpoint, query and parse response
The following cells send a prompt to the model endpoint and receive the model's response. The cell below defines a helper function that prints the prompt followed by the generated text.

In [4]:
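def print_response(payload, response):
    """Print the prompt, then the generated continuation from the endpoint's response."""
    print(payload["inputs"])
    # The response is a list of dicts; the first element's 'generation' key holds the generated text.
    print(f"> {response[0]['generation']}")
    print("\n==================================\n")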

The model takes a text string, the prompt, as input and predicts the next words in the sequence. The following prompts are all related to the financial domain, because we want to gauge the model's financial domain knowledge before fine-tuning it.

Replace the value of **"inputs"** in the next cell with the prompt to send to the model. You can choose one of the prompts below or write your own finance-related prompt.

**Financial Domain Prompts** (use one as the value of `"inputs"`):

- "The investment tests performed indicate"
- "the relative volume for the long out of the money options, indicates"
- "The results for the short in the money options"
- "The results are encouraging for aggressive investors"
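Instead of editing the payload by hand for each prompt, you can build one payload per prompt. The helper below is a hypothetical sketch that produces a request body with the same shape and default parameter values as the payload cell below; the range checks on `top_p` and `temperature` are this sketch's own assumptions, not limits taken from the endpoint's documentation.

```python
from typing import Any, Dict, List

# The prompts listed above, with spacing normalized.
FINANCE_PROMPTS: List[str] = [
    "The investment tests performed indicate",
    "the relative volume for the long out of the money options, indicates",
    "The results for the short in the money options",
    "The results are encouraging for aggressive investors",
]


def build_payload(
    prompt: str, max_new_tokens: int = 64, top_p: float = 0.9, temperature: float = 0.6
) -> Dict[str, Any]:
    """Return a request body shaped like the notebook's payload (range checks are assumptions)."""
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")
    if temperature <= 0.0:
        raise ValueError("temperature must be positive")
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "top_p": top_p,
            "temperature": temperature,
            "return_full_text": False,
        },
    }


payloads = [build_payload(prompt) for prompt in FINANCE_PROMPTS]
```

Each payload can then be sent with `predictor.predict(payload, custom_attributes="accept_eula=true")` and printed with `print_response`, inside the same `try`/`except` used in the payload cell.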

In [10]:
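payload = {
    "inputs": "The results are encouraging for aggressive investors",
    "parameters": {
        "max_new_tokens": 64,       # upper bound on the number of generated tokens
        "top_p": 0.9,               # nucleus (top-p) sampling threshold
        "temperature": 0.6,         # values below 1.0 make sampling less random
        "return_full_text": False,  # return only the continuation, not the prompt
    },
}
try:
    # Llama 2 requires accepting Meta's EULA, passed here as a custom attribute.
    response = predictor.predict(payload, custom_attributes="accept_eula=true")
    print_response(payload, response)
except Exception as e:
    print(e)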

The results are encouraging for aggressive investors
> .
The following table shows the return of a $10,000 investment in TSLA over the last 10 years.
TSLA 10-Year Return
TSLA Price Chart
TSLA Revenue and Net Income History
TSLA Earnings Estimates
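The payload above sets `temperature` to 0.6 and `top_p` to 0.9, so the model samples its continuation instead of always picking the single most likely token; this is why rerunning the cell can produce different text. The sketch below shows standard temperature scaling followed by top-p (nucleus) sampling on made-up scores for four candidate tokens. The serving container behind the endpoint has its own implementation, so treat this as an illustration of the technique, not of that code.

```python
import math
import random
from typing import Dict, List, Optional


def softmax_with_temperature(logits: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Convert raw scores to probabilities; temperature < 1 sharpens, > 1 flattens."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(value - top) for tok, value in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def nucleus(probs: Dict[str, float], top_p: float) -> List[str]:
    """Smallest set of most likely tokens whose cumulative probability reaches top_p."""
    kept: List[str] = []
    cumulative = 0.0
    for tok, p in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept.append(tok)
        cumulative += p
        if cumulative >= top_p:
            break
    return kept


def sample_next(
    logits: Dict[str, float],
    temperature: float = 0.6,
    top_p: float = 0.9,
    rng: Optional[random.Random] = None,
) -> str:
    """Sample one token from the top-p set, weighted by its temperature-scaled probability."""
    rng = rng or random.Random()
    probs = softmax_with_temperature(logits, temperature)
    kept = nucleus(probs, top_p)
    return rng.choices(kept, weights=[probs[tok] for tok in kept], k=1)[0]


# Made-up scores for candidate next tokens.
toy_logits = {"growth": 2.0, "risk": 1.0, "losses": 0.0, "banana": -3.0}
print(nucleus(softmax_with_temperature(toy_logits, 0.6), 0.9))  # → ['growth', 'risk']
print(nucleus(softmax_with_temperature(toy_logits, 2.0), 0.9))  # → ['growth', 'risk', 'losses']
```

Lower temperature concentrates probability on the top tokens, so fewer candidates survive the top-p cut: with the same `top_p=0.9`, `temperature=0.6` keeps two candidates while `temperature=2.0` keeps three.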




Finally, we delete the model and the SageMaker endpoint to end the deployment.

**IF YOU DO NOT RUN THE CELL BELOW, THE ENDPOINT WILL KEEP RUNNING AND INCUR CHARGES.**

In [None]:
# Delete the SageMaker endpoint and the attached resources
predictor.delete_model()
predictor.delete_endpoint()

Verify that your model endpoint was deleted by opening the SageMaker console and choosing **Endpoints** under **Inference** in the left navigation menu. If the endpoint is still listed, select it, and then choose **Delete** under **Actions**.
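You can also check from code. The hypothetical helper below pages through the SageMaker `ListEndpoints` API (filtered with `NameContains`) and reports whether an endpoint with exactly the given name is still listed. An endpoint that is still being deleted is listed with status `Deleting`, so a `True` result right after deletion may simply mean deletion is in progress.

```python
from typing import Callable, Dict


def endpoint_still_listed(list_endpoints: Callable[..., Dict], endpoint_name: str) -> bool:
    """Return True if an endpoint with exactly this name appears in any page of results.

    `list_endpoints` is expected to behave like boto3's SageMaker client method of the
    same name: it accepts NameContains/NextToken and returns {'Endpoints': [...], 'NextToken': ...}.
    """
    kwargs = {"NameContains": endpoint_name}
    while True:
        page = list_endpoints(**kwargs)
        if any(ep["EndpointName"] == endpoint_name for ep in page.get("Endpoints", [])):
            return True
        token = page.get("NextToken")
        if not token:  # no more pages to fetch
            return False
        kwargs["NextToken"] = token
```

With real AWS credentials, you would call `endpoint_still_listed(boto3.client("sagemaker").list_endpoints, predictor.endpoint_name)`.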