<a href="https://colab.research.google.com/github/scorecard-ai/scorecard-cookbook/blob/main/Scorecard_Multi_Message_Prompt_(ChatML)_Example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Demo: Scorecard Multi-Message Prompt (ChatML)
## 🧙‍♂️ Instructions

1. Create an account and [log in to Scorecard](https://app.getscorecard.ai/). Copy your [API key](https://app.getscorecard.ai/api-key).
1. Add your Scorecard and OpenAI API keys below.
1. Go to `Runtime` -> `Run all`. Enjoy!

In [None]:
#@title 👉 API Keys

OPENAI_API_KEY = "" #@param { type: "string" }
SCORECARD_API_KEY = "" #@param { type: "string" }
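Pasting keys into the form works in Colab. However, an empty field typically surfaces only as an authentication error on the first API request. Below is a minimal sketch that assumes you may also keep keys in environment variables. The hypothetical helper `resolve_api_key` prefers the form value, falls back to an environment variable, and fails early with a clear message. `OPENAI_API_KEY` is the variable the OpenAI Python SDK itself reads. The `SCORECARD_API_KEY` environment variable name is an assumption.

```python
import os

def resolve_api_key(value: str, env_var: str) -> str:
    """Return the key typed into the form, or fall back to the environment variable `env_var`."""
    key = value.strip() or os.environ.get(env_var, "").strip()
    if not key:
        raise ValueError(
            f"No API key found: fill in the form above or set the {env_var} environment variable."
        )
    return key

# Usage (the SCORECARD_API_KEY env var name is an assumption, not a Scorecard convention):
# OPENAI_API_KEY = resolve_api_key(OPENAI_API_KEY, "OPENAI_API_KEY")
# SCORECARD_API_KEY = resolve_api_key(SCORECARD_API_KEY, "SCORECARD_API_KEY")
```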

# Setup

In [None]:
#@title Install dependencies
#@markdown To keep the notebook working for future users, we pin the dependency versions.

!pip install scorecard-ai==0.1.10
!pip install openai==1.11.1

Collecting scorecard-ai==0.1.10
  Downloading scorecard_ai-0.1.10-py3-none-any.whl (41 kB)
Collecting httpx>=0.21.2 (from scorecard-ai==0.1.10)
  Downloading httpx-0.26.0-py3-none-any.whl (75 kB)
Collecting pydantic<2.5.0,>=1.9.2 (from scorecard-ai==0.1.10)
  Downloading pydantic-2.4.2-py3-none-any.whl (395 kB)
Collecting httpcore==1.* (from httpx>=0.21.2->scorecard-ai==0.1.10)
  Downloading httpcore-1.0.3-py3-none-any.whl (77 kB)
Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx>=0.21.2->scorec

In [None]:
#@title Imports

from openai import OpenAI
from scorecard.client import Scorecard

# Build your LLM system

Now, let's define the system under test. For this demo, we'll set up an LLM call that generates the opening line of a story on a topic chosen by the user.

In [None]:
#@title Define our multi-message prompt template

PROMPT_TEMPLATE_1 = "You are a helpful assistant." #@param { type:"string" }

PROMPT_TEMPLATE_2 = "Assist the user in crafting a story about {user_topic}." #@param { type:"string" }

PROMPT_TEMPLATE_3 = "I need a good opening line for my story. Please generate only the opening line." #@param { type:"string" }
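The three templates above become a list of role-tagged messages, which is the Chat Completions counterpart of a ChatML prompt. The following is a minimal sketch. The hypothetical helper `build_messages` assembles the same list that `generate_story` sends. The hypothetical `to_chatml` renders that list in the `<|im_start|>role ... <|im_end|>` text form from OpenAI's original ChatML description. The rendering is only illustrative. The API accepts the message list, and you should not rely on how the service serializes it internally.

```python
from typing import Dict, List

# Local copies of the templates above so this block runs on its own.
SYSTEM_PROMPT = "You are a helpful assistant."
TOPIC_PROMPT = "Assist the user in crafting a story about {user_topic}."
USER_PROMPT = "I need a good opening line for my story. Please generate only the opening line."

def build_messages(user_topic: str) -> List[Dict[str, str]]:
    """Return the same system/system/user message list that generate_story sends."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "system", "content": TOPIC_PROMPT.format(user_topic=user_topic)},
        {"role": "user", "content": USER_PROMPT},
    ]

def to_chatml(messages: List[Dict[str, str]]) -> str:
    """Render messages as ChatML text, ending with an open assistant turn for the model to complete."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

print(to_chatml(build_messages("the story of two royal sisters")))
```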

In [None]:
#@title Call OpenAI to generate a story
#@markdown Here we'll define an example of a multi-message prompt sent to OpenAI.

def generate_story(user_topic: str) -> str:
  """Return a model-generated opening line for a story about `user_topic`."""
  client = OpenAI(api_key=OPENAI_API_KEY)
  response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # or "gpt-4", depending on your access and requirements
    messages=[
        # Two system messages: the assistant persona, then the story topic
        {"role": "system", "content": PROMPT_TEMPLATE_1},
        {"role": "system", "content": PROMPT_TEMPLATE_2.format(user_topic=user_topic)},
        # The user message asks for the opening line itself
        {"role": "user", "content": PROMPT_TEMPLATE_3}
    ]
  )

  return response.choices[0].message.content
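`generate_story` builds its own OpenAI client, so every call is a live API request. To check the prompt wiring without spending tokens, you can pass the client in. The sketch below uses a hypothetical `generate_story_with` helper and a fake client. The fake mimics only the attributes this notebook touches: `chat.completions.create` and `response.choices[0].message.content`. Neither the helper nor the fake is part of the OpenAI or Scorecard SDKs.

```python
from types import SimpleNamespace

def generate_story_with(client, messages, model: str = "gpt-3.5-turbo") -> str:
    """Same call as generate_story, but with the client and messages injected."""
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content

class FakeCompletions:
    """Records each create() call and returns a canned reply shaped like an OpenAI response."""
    def __init__(self, reply: str):
        self.reply = reply
        self.calls = []

    def create(self, **kwargs):
        self.calls.append(kwargs)
        message = SimpleNamespace(content=self.reply)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])

class FakeOpenAI:
    """Stands in for openai.OpenAI; exposes only .chat.completions."""
    def __init__(self, reply: str):
        self.chat = SimpleNamespace(completions=FakeCompletions(reply))

# Offline check with a canned reply (no network access):
fake = FakeOpenAI("The ice remembered every secret the sisters ever told.")
print(generate_story_with(fake, [{"role": "user", "content": "Give me an opening line."}]))

# Live call with the real client:
# generate_story_with(OpenAI(api_key=OPENAI_API_KEY), messages)
```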

# Evaluate your system

### Pre-req: Create Metrics

First, create your metrics and a scoring config in the Scorecard app. For this example, a simple Helpfulness metric that checks whether the generation follows the user's request is enough.

Once you have created your scoring config, copy its ID and enter it below:

In [None]:
#@title Configure Metrics
SCORING_CONFIG_ID = 1  #@param { type: "number" }
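Scorecard scores the Helpfulness metric. Before you kick off scoring, a cheap local check can catch obviously malformed outputs, such as a multi-line reply when the prompt asked for only the opening line. The sketch below is a hypothetical heuristic, not Scorecard's metric. Its 60-word limit is an arbitrary assumption.

```python
def looks_like_opening_line(response: str, max_words: int = 60) -> bool:
    """Hypothetical local check: the reply is non-empty, a single line, and reasonably short."""
    text = response.strip()
    return bool(text) and "\n" not in text and len(text.split()) <= max_words

# Example usage, once generate_story is defined:
# line = generate_story("the story of two royal sisters")
# if not looks_like_opening_line(line):
#     print("Suspicious output:", line)
```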

In [None]:
#@title 1. Create a basic Testset
#@markdown Here we'll create a basic Testset that gets stored in Scorecard.

client = Scorecard(
    api_key=SCORECARD_API_KEY
)

# Create a Testset
testset = client.testset.create(
    name="Story Opening Lines",
    description="Demo of a testset created via Scorecard Python SDK",
    using_retrieval=False
)

# Add three testcases
client.testcase.create(
    testset_id=testset.id,
    user_query="magical powers to control ice and snow"
)
client.testcase.create(
    testset_id=testset.id,
    user_query="a journey with a rugged iceman, his loyal reindeer, and a naive snowman"
)
client.testcase.create(
    testset_id=testset.id,
    user_query="the story of two royal sisters"
)

print("Visit the Scorecard app to view your Testset:")
print(f"https://app.getscorecard.ai/view-dataset/{testset.id}")

Visit the Scorecard app to view your Testset:
https://app.getscorecard.ai/view-dataset/786
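The three `client.testcase.create` calls above follow the same pattern, so a growing Testset is easier to maintain as data. Below is a minimal sketch built around the hypothetical helper `add_testcases`. It uses only the `client.testcase.create(testset_id=..., user_query=...)` call that the cell above already makes. It also skips duplicate queries, which is a design choice of this sketch.

```python
from typing import Iterable

# The same three topics used above, kept in one list so new cases are easy to add.
STORY_TOPICS = [
    "magical powers to control ice and snow",
    "a journey with a rugged iceman, his loyal reindeer, and a naive snowman",
    "the story of two royal sisters",
]

def add_testcases(client, testset_id, queries: Iterable[str]) -> int:
    """Create one Testcase per distinct query (in order) and return how many were created."""
    created = 0
    for query in dict.fromkeys(queries):  # dict.fromkeys drops duplicates but keeps order
        client.testcase.create(testset_id=testset_id, user_query=query)
        created += 1
    return created

# Usage, in place of the three explicit create calls:
# add_testcases(client, testset.id, STORY_TOPICS)
```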


In [None]:
#@title 2. Run Tests
#@markdown Now we'll create a new Run and execute the LLM system defined above on each test case in the Testset.

from scorecard.types import RunStatus

run = client.run.create(testset_id=testset.id)
client.run.update_status(run_id=run.id, status=RunStatus.RUNNING_EXECUTION)

for testcase in client.testset.get_testcases(testset_id=testset.id).results:
  model_response = generate_story(user_topic=testcase.user_query)
  client.testrecord.create(run_id=run.id,
                           testset_id=testset.id,
                           testcase_id=testcase.id,
                           user_query=testcase.user_query,
                           response=model_response)

client.run.update_status(run_id=run.id, status=RunStatus.AWAITING_SCORING)

print("Visit the Scorecard app to view your Run:")
print(f"https://app.getscorecard.ai/view-records/{run.id}")

Visit the Scorecard app to view your Run:
https://app.getscorecard.ai/view-records/2610
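Each test case in the loop above makes one OpenAI call. A single transient failure, such as a rate limit or a dropped connection, raises out of the loop before the Run is marked `AWAITING_SCORING`. The OpenAI client already retries some failures on its own (see its `max_retries` option). The hypothetical `with_retries` helper below sketches an optional outer layer with exponential backoff, assuming a few seconds of waiting per failure is acceptable.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")

def with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on `retry_on` errors with delays of base_delay, 2*base_delay, 4*base_delay, ..."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")

# Usage inside the Run loop. Consider narrowing retry_on to the errors you expect,
# e.g. (openai.RateLimitError, openai.APIConnectionError):
# model_response = with_retries(lambda: generate_story(user_topic=testcase.user_query))
```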


In [None]:
#@title 3. Kick off Scoring
#@markdown Once the run above has finished executing, click the "Run Scoring" button in the Scorecard app to score it. When scoring is done, visit the Results page:

print("Visit the Scorecard app to view your Results:")
print(f"https://app.getscorecard.ai/view-grades/{run.id}")

Visit the Scorecard app to view your Results:
https://app.getscorecard.ai/view-grades/2610
