# Comparing the Pre-trained Model with the Fine-tuned Model

Once a model is deployed as a SageMaker endpoint, you can test inference with the `sagemaker.Predictor` class: you pass it the input, and the `Predictor` does the heavy lifting of serializing the request and calling the endpoint.

In this notebook we use both the pre-trained model (deployed in lab-01) and the fine-tuned model (deployed in lab-04).

In [120]:
import json
import pandas as pd
import sagemaker
from datasets import load_dataset
from random import randrange
from sagemaker import serializers, deserializers
from IPython.display import display, HTML

In [101]:
sess = sagemaker.Session()

## Sample Dataset

We need a sample dataset to test model inference.

In [102]:
def format_dolly(sample, incl_answer=True):
    instruction = f"### Instruction\n{sample['instruction']}"
    context = f"### Context\n{sample['context']}" if len(sample["context"]) > 0 else None
    response = f"### Answer\n{sample['response']}" if incl_answer else None
    # join all the parts together
    prompt = "\n\n".join([i for i in [instruction, context, response] if i is not None])

    if not incl_answer:
        return prompt, sample['response']
    else:
        return prompt
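
To make the two modes of `format_dolly` concrete, here is a small self-contained sketch. The function is repeated so the block runs on its own, and the `sample` record is made up for illustration rather than taken from the Dolly dataset. It shows the prompt that includes the answer and the inference prompt that ends with an empty `### Answer` header for the model to complete.

```python
def format_dolly(sample, incl_answer=True):
    """Same prompt layout as the notebook's format_dolly."""
    instruction = f"### Instruction\n{sample['instruction']}"
    context = f"### Context\n{sample['context']}" if len(sample["context"]) > 0 else None
    response = f"### Answer\n{sample['response']}" if incl_answer else None
    # Join the parts that are present, separated by blank lines
    prompt = "\n\n".join(p for p in [instruction, context, response] if p is not None)
    return prompt if incl_answer else (prompt, sample["response"])


# Hypothetical record, invented for this example (not from the dataset).
sample = {
    "instruction": "Summarize the passage in one sentence.",
    "context": "SageMaker endpoints serve models over HTTPS.",
    "response": "SageMaker endpoints expose models through HTTPS.",
    "category": "summarization",
}

# Full prompt, answer included
training_prompt = format_dolly(sample)

# Inference prompt: answer withheld, then an empty answer header is appended
inference_prompt, ground_truth = format_dolly(sample, incl_answer=False)
inference_prompt += "\n\n### Answer"

print(inference_prompt)
```

When `context` is an empty string, the `### Context` section is left out of the prompt entirely.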

In [122]:
inference_dataset = load_dataset("databricks/databricks-dolly-15k", split="train[15%:30%]")

# To evaluate question answering or information extraction instead, change the filter condition on the next line to example["category"] == "closed_qa" or example["category"] == "information_extraction".
summarization_dataset = inference_dataset.filter(lambda example: example["category"] == "summarization")
#summarization_dataset = summarization_dataset.remove_columns("category")

# Split the dataset in two; the test split is used for the evaluation at the end.
train_and_test_dataset = summarization_dataset.train_test_split(test_size=0.1)

Using the latest cached version of the dataset since databricks/databricks-dolly-15k couldn't be found on the Hugging Face Hub
Found the latest cached dataset configuration 'default' at /Users/marcasbr/.cache/huggingface/datasets/databricks___databricks-dolly-15k/default/0.0.0/bdd27f4d94b9c1f951818a7da7fd7aeea5dbff1a (last modified on Thu Feb 22 17:18:56 2024).
Creating json from Arrow format: 100%|██████████| 1/1 [00:00<00:00, 136.04ba/s]
Creating json from Arrow format: 100%|██████████| 1/1 [00:00<00:00, 501.77ba/s]


46373

In [105]:
sample_query, gt_answer = format_dolly(inference_dataset[10], False) 
sample_query = sample_query + "\n\n### Answer"

In [106]:
print(sample_query)

### Instruction
what are the rules of cricket ?

### Context
Cricket is a bat-and-ball game played between two teams of eleven players on a field at the centre of which is a 22-yard (20-metre) pitch with a wicket at each end, each comprising two bails balanced on three stumps. The batting side scores runs by striking the ball bowled at one of the wickets with the bat and then running between the wickets, while the bowling and fielding side tries to prevent this (by preventing the ball from leaving the field, and getting the ball to either wicket) and dismiss each batter (so they are "out"). Means of dismissal include being bowled, when the ball hits the stumps and dislodges the bails, and by the fielding side either catching the ball after it is hit by the bat, but before it hits the ground, or hitting a wicket with the ball before a batter can cross the crease in front of the wicket. When ten batters have been dismissed, the innings ends and the teams swap roles. The game is adjudicat

## Instantiate Endpoints

To run inference, we instantiate a `sagemaker.Predictor` object for each endpoint.

In [121]:
finetuned_predictor = sagemaker.Predictor(
    endpoint_name="ft-meta-llama2-7b-chat-tg-ep",
    sagemaker_session=sess,
    serializer=serializers.JSONSerializer(),
    deserializer=deserializers.JSONDeserializer(),
)

pretrained_predictor = sagemaker.Predictor(
    endpoint_name="meta-llama2-7b-chat-tg-ep",
    sagemaker_session=sess,
    serializer=serializers.JSONSerializer(),
    deserializer=deserializers.JSONDeserializer(),
)
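
Both predictors are configured with a JSON serializer, so the Python payload is sent to the endpoint as a JSON request body. The sketch below uses only the standard library to show roughly what that body looks like. `build_payload` is a hypothetical helper, not part of the SageMaker SDK, and its default parameter values simply mirror the ones this notebook passes at inference time.

```python
import json


def build_payload(prompt, temperature=0.6, max_new_tokens=256):
    """Hypothetical helper: assemble the request dictionary used in this notebook."""
    return {
        "inputs": prompt,
        "parameters": {"temperature": temperature, "max_new_tokens": max_new_tokens},
    }


payload = build_payload("### Instruction\nSay hello.\n\n### Answer")

# Approximation of the request body a JSON serializer would produce
request_body = json.dumps(payload)
print(request_body)
```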

In [123]:
# Getting test dataset for endpoint evaluations
test_dataset = train_and_test_dataset["test"]

# Accumulators filled in by predict_and_print
inputs = []
ground_truth_responses = []
responses_before_finetuning = []
responses_after_finetuning = []

In [124]:
def predict_and_print(datapoint):
    sample_query, gt_answer = format_dolly(datapoint, False) 
    sample_query = sample_query + "\n\n### Answer"
    
    payload = {
        "inputs": sample_query,
        "parameters": {"temperature": 0.6, "max_new_tokens": 256}
    }
    
    inputs.append(payload["inputs"])
    ground_truth_responses.append(datapoint["response"])

    # Llama 2 endpoints require explicit EULA acceptance via custom_attributes
    pretrained_response = pretrained_predictor.predict(
        payload, custom_attributes="accept_eula=True"
    )
    responses_before_finetuning.append(pretrained_response[0]["generated_text"])
    #print(f'responses_before_finetuning: {pretrained_response[0]["generated_text"]}')
    
    # EULA acceptance is required here as well. Note that this endpoint's response is
    # indexed as a single dict, whereas the pre-trained endpoint's response is a list.
    finetuned_response = finetuned_predictor.predict(
        payload, custom_attributes="accept_eula=True"
    )
    responses_after_finetuning.append(finetuned_response["generated_text"])
    #print(f'responses_after_finetuning: {finetuned_response["generated_text"]}')
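
The function above reads the pre-trained response as a list (`pretrained_response[0]["generated_text"]`) and the fine-tuned response as a dict (`finetuned_response["generated_text"]`). A small normalizing helper can hide that difference. The sketch below assumes only those two shapes; `extract_generated_text` is a hypothetical name, not an SDK function.

```python
def extract_generated_text(response):
    """Return the generated text from either response shape seen in this notebook.

    Assumed shapes (taken from how the notebook indexes the responses):
      - [{"generated_text": "..."}]  list of generations, first one is used
      - {"generated_text": "..."}    single dict
    """
    if isinstance(response, list):
        if not response:
            raise ValueError("Endpoint returned an empty list")
        response = response[0]
    if isinstance(response, dict) and "generated_text" in response:
        return response["generated_text"]
    raise ValueError(f"Unexpected response shape: {type(response).__name__}")


# Both shapes yield the same string
print(extract_generated_text([{"generated_text": "from a list"}]))
print(extract_generated_text({"generated_text": "from a dict"}))
```

With such a helper, both `append` calls could use the same expression regardless of which endpoint produced the response.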


In [125]:
try:
    # Run both endpoints on the first two test examples
    for datapoint in test_dataset.select(range(2)):
        predict_and_print(datapoint)

    df = pd.DataFrame(
        {
            "Inputs": inputs,
            "Ground Truth": ground_truth_responses,
            "Response from non-finetuned model": responses_before_finetuning,
            "Response from fine-tuned model": responses_after_finetuning,
        }
    )
    #print(df.head())
    display(HTML(df.to_html()))
except Exception as e:
    print(e)

Index,Inputs,Ground Truth,Response from non-finetuned model,Response from fine-tuned model
0,"### Instruction\nBased on this passage, give a bulleted overview of the components of the lean startup methodology.\n\n### Context\nLean startup is a methodology for developing businesses and products that aims to shorten product development cycles and rapidly discover if a proposed business model is viable; this is achieved by adopting a combination of business-hypothesis-driven experimentation, iterative product releases, and validated learning. Lean startup emphasizes customer feedback over intuition and flexibility over planning. This methodology enables recovery from failures more often than traditional ways of product development. \n\n### Answer",Lean startup is a business and product development methodology\nThe aim is to shorten product development cycles\nThe approach is to test business hypotheses\nThe experiments involve iterative product releases and learning from customer feedback,\n\n• Business-hypothesis-driven experimentation\n• Iterative product releases\n• Validated learning\n• Customer feedback over intuition\n• Flexibility over planning,"\nHere is a bulleted overview of the components of the lean startup methodology:\n\n1. Customer development: identifying the target market, validating customer needs, and gaining a deep understanding of the problem the product is intended to solve.\n\n2. Validated learning: building a minimum viable product (MVP) and testing it with a small group of customers to validate assumptions and gather feedback.\n\n3. Business model experimentation: testing different business models to determine which is most viable, and iterating based on customer feedback.\n\n4. Constant iteration: continuously improving the product based on customer feedback and data-driven decision making.\n\n5. Rapid prototyping: creating a prototype of the product to validate the business model and gather feedback quickly.\n\n6. 
Lean team: a cross-functional team that includes designers, developers, and business experts, who work together to quickly validate assumptions and build a successful product.\n\n7. Lean canvas: a tool used for visualizing the key components of a business's value proposition, customers, channels, and costs.\n\n8. Customer segmentation: identifying and targeting specific customer segments to focus"
1,"### Instruction\nGive me a bulleted list of the main responses to ""Bad Blood"" listed in this text, including whether they were positive or negative.\n\n### Context\n""Bad Blood"" is a song by American singer-songwriter Taylor Swift, taken from her fifth studio album 1989 (2014). Swift wrote the song with its producers Max Martin and Shellback. The album track is a pop song with stomping drums. A hip hop remix of ""Bad Blood"", featuring American rapper Kendrick Lamar and additional production by Ilya, was released as the fourth single from 1989 on May 17, 2015, by Big Machine and Republic Records.\n\nThe lyrics are about feelings of betrayal by a close friend. Upon the album's release, critics expressed mixed opinions about ""Bad Blood"", with some complimenting Swift's defiant attitude and dubbing it an album highlight, while others criticized its production and lyrics. The remixed single was praised for the reworked instrumentation and Lamar's verses, which others considered to be out of place on a pop song. Media outlets speculated that American singer Katy Perry is the subject of the song. The remix received a Grammy nomination for Best Pop Duo/Group Performance.\n\n""Bad Blood"" was supported by a high-budget music video directed by Joseph Kahn and produced by Swift. It featured an ensemble cast consisting of many singers, actresses and fashion models, which received wide media coverage. Critics praised the video for its cinematic and futuristic visuals inspired by neo-noir styles. It won a Grammy Award for Best Music Video, and two MTV Video Music Awards for the Video of the Year and Best Collaboration. Commercially, ""Bad Blood"" reached number one in Australia, Canada, New Zealand, and Scotland, as well as the United States, where it topped the Billboard Hot 100, Adult Top 40 and Mainstream Top 40 charts. 
It has been certified triple platinum in Australia and Canada, and 6× platinum in the US.\n\n### Answer",* positive about Taylor Swift's defiant attitude\n* negative about the song's production and lyrics\n* positive about the instrumentation and verses\n* positive about the accompanying music video's visuals\n* positive in terms of topping several charts,"\n\nHere are the main responses to ""Bad Blood"":\n\n* Positive:\n\t+ Critics praised Swift's defiant attitude and dubbed the song an album highlight.\n\t+ The remixed single was praised for the reworked instrumentation and Lamar's verses.\n\t+ The music video was praised for its cinematic and futuristic visuals.\n\t+ The video won a Grammy Award for Best Music Video and two MTV Video Music Awards.\n\t+ The song reached number one in several countries, including the US, and was certified multi-platinum in several countries.\n* Negative:\n\t+ Some critics expressed mixed opinions about the song, criticizing its production and lyrics.\n\t+ Media outlets speculated that the song was about Katy Perry, which Swift denied.\n\t+ Some critics found Lamar's verses to be out of place on a pop song.","\n\nPositive responses:\n- The song's production and sound were praised for their catchiness and energy.\n- The lyrics were seen as empowering and relatable, with many fans connecting to the song's themes of female empowerment and standing up for oneself.\n- The music video was also praised for its creativity and visuals, and was seen as a showcase of Swift's creative and artistic vision.\n\nNegative responses:\n- Some critics felt that the song's lyrics were too simplistic and lacked depth, and that its message was overly generic.\n- The song's production was also criticized for being overly simplistic and lacking variety.\n- Some fans felt that the song's lyrics were too personal and intrusive, and that Swift was crossing the line by writing about a specific person.\n- The music video was also criticized for being overly 
extravagant and excessive, with some feeling that it was a waste of resources.\n\nHowever, overall, the response to ""Bad Blood"" was mixed, with both positive and negative comments being made.### Source: https://en."
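
In the table above, the second fine-tuned response runs past its answer and starts a new `### Source: https://en.` section. One possible post-processing step, sketched here under the assumption that `###` never appears inside a legitimate answer, is to cut the generated text at the first `###` marker. `trim_generation` is a hypothetical helper, not part of the notebook or of the SageMaker SDK.

```python
def trim_generation(text, stop_marker="###"):
    """Keep only the text before the first stop marker and strip outer whitespace.

    Assumption: the marker does not occur inside a valid answer.
    """
    head, _, _ = text.partition(stop_marker)
    return head.strip()


raw = "\n\nOverall, the response was mixed.### Source: https://en."
print(trim_generation(raw))
```

Responses without the marker pass through unchanged apart from whitespace stripping, so the helper can be applied to both models' outputs before they are placed in the DataFrame.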
