# Initial Prompt Engineering and Evals
Let's go through the first steps of AI app building.
What we'll be doing is a building an LLM that can classify text into two categories; American Football Team or Australian Football Team

## Step 1: Vibes
This one is easy and my wager is most of you have already done this.
You get get a model, some data, try some prompts, and see what happens.

Though it feels informal this currently just a really great way to get staretd.

### Task: Initial Prompt and Output
Get a model to output whether a team name is of an Australian Football Team or an American football team.
Since you're the app designer you need to design the format.

However once we feel things out we want to quickly move to the next step.

## Step 2: Repeatable evals
This is where most people drop off. 
Vibes are good but its hard to repeatably scale random vibe checking.

### Task: Eval Harness
Write a repeatable eval script that can test all American and Australian team names.
We included a couple of tasks
 1. Full city and team name provided
 2. Only team name provided
 3. An article about the team

**Note**
The goal here is NOT to build a perfect classifier. Rather we want you to think hard about how to build a repeatable eval, eval metrics, and pitfalls of evals. Don't stress about not getting a 100%, but do make tweaks to see how high of a score you can get!

## Functionality Test
A quick code block to check all the wiring is correct with python and Ollama.

In [51]:
from ollama import chat
from ollama import ChatResponse

model = 'gemma2:2b'

def single_turn(prompt):
    response: ChatResponse = chat(model=model, messages=[
      {
        'role': 'user',
        'content': prompt,
      },
    ])
    return response.message.content

prompt = "Say hello to the class"
single_turn(prompt)

'Hello everyone! 😄 \n'

## Single Team

In [56]:
afl_team = "Carlton Blues"
american_team = "Tennessee Titans"

In [54]:
prompt = "Output if this is an australian or american team, only print australian or american no other output: " + f"{afl_team}"
single_turn(prompt)

'Australian \n'

In [57]:
prompt = "Output if this is an australian or american team, only print australian or american no other output: " + f"{american_team}"
single_turn(prompt)

'American \n'

## Extending evals
Here are the list of AFL and NFL team names.
Check if the model gets things right.
This means you'll need to create a scoring and aggregation function.


In [47]:
afl_clubs = [
    "Adelaide Crows",
    "Brisbane Lions",
    "Carlton Blues",
    "Collingwood Magpies",
    "Essendon Bombers",
    "Fremantle Dockers",
    "Geelong Cats",
    "Gold Coast Suns",
    "Greater Western Sydney (GWS) Giants",
    "Hawthorn Hawks",
    "Melbourne Demons",
    "North Melbourne Kangaroos",
    "Port Adelaide Power",
    "Richmond Tigers",
    "St Kilda Saints",
    "Sydney Swans",
    "West Coast Eagles",
    "Western Bulldogs"
]

nfl_teams = [
    "Arizona Cardinals",
    "Atlanta Falcons",
    "Baltimore Ravens",
    "Buffalo Bills",
    "Carolina Panthers",
    "Chicago Bears",
    "Cincinnati Bengals",
    "Cleveland Browns",
    "Dallas Cowboys",
    "Denver Broncos",
    "Detroit Lions",
    "Green Bay Packers",
    "Houston Texans",
    "Indianapolis Colts",
    "Jacksonville Jaguars",
    "Kansas City Chiefs",
    "Las Vegas Raiders",
    "Los Angeles Chargers",
    "Los Angeles Rams",
    "Miami Dolphins",
    "Minnesota Vikings",
    "New England Patriots",
    "New Orleans Saints",
    "New York Giants",
    "New York Jets",
    "Philadelphia Eagles",
    "Pittsburgh Steelers",
    "San Francisco 49ers",
    "Seattle Seahawks",
    "Tampa Bay Buccaneers",
    "Tennessee Titans",
    "Washington Commanders"
]

In [48]:
import numpy as np
eval_map = {"australian": afl_name, "american": nfl_names}

# Remove key let students code themselves
score = []
for nationality, teams in eval_map.items():
    for team in teams:
        prompt = "Output if this is an australian or american team, only print australian or american no other output: " + f"{team}"
        #print(prompt)
        response = single_turn(prompt).strip()
        score.append(response.lower() == nationality)
        print(f"{team}: {response}")

print(np.array(score).mean())

Adelaide Crows: Australian
Brisbane Lions: Australian
Carlton Blues: Australian
Collingwood Magpies: Australian
Essendon Bombers: Australian
Fremantle Dockers: Australian
Geelong Cats: Australian
Gold Coast Suns: Australian
Greater Western Sydney (GWS) Giants: Australian
Hawthorn Hawks: Australian
Melbourne Demons: Australian
North Melbourne Kangaroos: Australian
Port Adelaide Power: Australian
Richmond Tigers: Australian
St Kilda Saints: Australian
Sydney Swans: Australian
West Coast Eagles: Australian
Western Bulldogs: Australian
Arizona Cardinals: American
Atlanta Falcons: American
Baltimore Ravens: American
Buffalo Bills: American
Carolina Panthers: American
Chicago Bears: American
Cincinnati Bengals: American
Cleveland Browns: American
Dallas Cowboys: American
Denver Broncos: American
Detroit Lions: American
Green Bay Packers: American
Houston Texans: American
Indianapolis Colts: American
Jacksonville Jaguars: American
Kansas City Chiefs: American
Las Vegas Raiders: American
Los A

## Maing things harder. Taking just the team name
What happens if we don't provide the full context. What happens to our score then?

In [63]:
afl_names = ['Crows',
 'Lions',
 'Blues',
 'Magpies',
 'Bombers',
 'Dockers',
 'Cats',
 'Suns',
 'Giants',
 'Hawks',
 'Demons',
 'Kangaroos',
 'Power',
 'Tigers',
 'Saints',
 'Swans',
 'Eagles',
 'Bulldogs']

In [64]:
nfl_teams = ['Cardinals',
 'Falcons',
 'Ravens',
 'Bills',
 'Panthers',
 'Bears',
 'Bengals',
 'Browns',
 'Cowboys',
 'Broncos',
 'Lions',
 'Packers',
 'Texans',
 'Colts',
 'Jaguars',
 'Chiefs',
 'Raiders',
 'Chargers',
 'Rams',
 'Dolphins',
 'Vikings',
 'Patriots',
 'Saints',
 'Giants',
 'Jets',
 'Eagles',
 'Steelers',
 '49ers',
 'Seahawks',
 'Buccaneers',
 'Titans',
 'Commanders']

In [65]:
eval_map = {"american": nfl_teams, "australian": afl_names}

core = []

for nationality, teams in eval_map.items():
    for team in teams:
        team = team.rsplit(maxsplit =1)[-1]
        prompt = "Output if this is an australian or american team, only print australian or american no other output: " + f"{team}"
        response = single_turn(prompt).strip()
        score.append(response.lower() == nationality)
        print(f"{team}: {response}")
print(np.array(score).mean())

Cardinals: American
Falcons: Australian
Ravens: American
Bills: American
Panthers: Australian
Bears: Australian
Bengals: American
Browns: American
Cowboys: Australian
Broncos: Australian
Lions: Australian
Packers: American
Texans: American
Colts: Australian
Jaguars: Australian
Chiefs: American
Raiders: Australian
Chargers: American
Rams: American
Dolphins: Australian
Vikings: Australian
Patriots: American
Saints: Australian
Giants: Australian
Jets: American
Eagles: Australian
Steelers: American
49ers: American
Seahawks: American
Buccaneers: American
Titans: Australian
Commanders: American
Crows: Australian
Lions: Australian
Blues: Australian
Magpies: Australian
Bombers: Australian
Dockers: Australian
Cats: Australian
Suns: Australian
Giants: American
Hawks: American
Demons: Australian
Kangaroos: Australian
Power: Australian
Tigers: Australian
Saints: Australian
Swans: Australian
Eagles: Australian
Bulldogs: Australian
0.67


## News Articles
Now try this for some long form articles. This time we won't give you an answer key, we'll let you figure things out.


TODO: Ravin will fill this in. But its basically the same as above. Instead of a team name it'll take in what is a synthetically generated news article and classify it