### Steps to connect to an already running vLLM server
The vLLM server runs on a different machine, so set up local port forwarding from the machine running this notebook to the machine running vLLM:
1. In a terminal, log in to Alvis.
2. From there, `ssh` to the machine where this notebook is running, e.g. `ssh alvis10-11`.
3. On that machine, forward a local port to the machine serving vLLM (say `alvis5-09`):  
   `ssh -L <vllm_api_port>:localhost:<vllm_api_port> alvis5-09`
4. Run the cells below.
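
Before running the cells, it can help to confirm that the forwarded port accepts connections. The following is a minimal sketch using only the standard library. `port_is_open` is a hypothetical helper, and the default port 8000 is assumed only because that is the value hard-coded as `VLLM_PORT` in the setup cell.

```python
import socket

def port_is_open(host="localhost", port=8000, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds (hypothetical helper)."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `port_is_open()` in a cell returns `True` when something is listening on the local end of the tunnel. It does not prove that the vLLM server behind the tunnel is healthy, so a failing model call can still occur afterwards.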

For the teacher: run a vLLM server like so:
```
apptainer exec /apps/containers/vLLM/vllm-0.11.0.sif vllm serve /mimer/NOBACKUP/Datasets/LLM/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775 --port=$(find_ports) --host 0.0.0.0 > vllm.out 2> vllm.err &
```

In [None]:
# Setup: imports and model wrapper
import os, json, re, textwrap
from openai import OpenAI

def call_model(prompt, temperature=0.0, max_tokens=300):
    """Get a response from a vLLM-hosted model that is already running on a port."""

    # Placeholder key: the OpenAI client requires a value.
    openai_api_key = "EMPTY"
    # Must match the <vllm_api_port> used in the ssh port-forwarding step.
    VLLM_PORT = 8000
    openai_api_base = f"http://localhost:{VLLM_PORT}/v1"

    client = OpenAI(
        api_key=openai_api_key,
        base_url=openai_api_base,
    )

    # Use the first model the server reports.
    models = client.models.list()
    model = models.data[0].id
    STREAM = False

    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=STREAM,
        temperature=temperature,
        max_tokens=max_tokens,
    )
    return resp.choices[0].message.content.strip()


def score_response(response, expected_keywords):
    """Simple scoring: count how many expected keywords appear in the response (case-insensitive substring match)."""

    text = response.lower()
    hits = sum(1 for k in expected_keywords if k.lower() in text)
    total = len(expected_keywords)
    # An empty keyword list counts as a perfect score and avoids division by zero.
    fraction = hits / total if total else 1.0
    return hits, total, fraction
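
To make the scoring rule concrete, here is a small offline sketch that needs no model. It re-declares the same logic under the hypothetical name `keyword_score` so the block runs on its own, and the response strings are invented for illustration. Matching is by substring, so a keyword such as `'date'` also matches inside `'updated'`.

```python
def keyword_score(response, expected_keywords):
    """Same rule as score_response: case-insensitive substring matching."""
    text = response.lower()
    hits = sum(1 for k in expected_keywords if k.lower() in text)
    total = len(expected_keywords)
    return hits, total, (hits / total if total else 1.0)

sample = "The Long S and the script style suggest a late date."
# 'script' and 'long s' are found, 'parchment' is not: 2 of 3 keywords.
print(keyword_score(sample, ["script", "parchment", "long s"]))
# Substring matching counts 'date' inside 'updated' as a hit.
print(keyword_score("Last updated yesterday.", ["date"]))
# An empty keyword list yields a fraction of 1.0 by convention.
print(keyword_score("anything", []))
```

The score therefore measures whether certain strings occur, not whether the answer is correct, which is worth keeping in mind when comparing naive and engineered prompts.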


In [None]:
def run_example(field, technique_name, naive_prompt, engineered_prompt, expected_keywords, temperature=0.0):
    print('\n' + '='*80)
    print(f"Field: {field} - Technique: {technique_name}\n")
    print('Naive prompt:\n', naive_prompt, '\n')
    r1 = call_model(naive_prompt, temperature=temperature)
    print('Naive response:\n', r1, '\n')
    s1 = score_response(r1, expected_keywords)
    print(f"Naive score: {s1[0]}/{s1[1]} ({s1[2]*100:.0f}%)\n")
    print('Engineered prompt:\n', engineered_prompt, '\n')
    r2 = call_model(engineered_prompt, temperature=temperature)
    print('Engineered response:\n', r2, '\n')
    s2 = score_response(r2, expected_keywords)
    print(f"Engineered score: {s2[0]}/{s2[1]} ({s2[2]*100:.0f}%)")
    improvement = (s2[2] - s1[2]) if s1[1]>0 else s2[2]
    print('\nImprovement (fractional):', round(improvement, 3))
    return {'naive':(r1,s1), 'engineered':(r2,s2)}


### Few shot prompting

In [None]:
# Philology example (few-shot)
naive = "Given this excerpt, what clues tell you about its date? Excerpt: '...incipit missae... long s...'."
engineered = textwrap.dedent('''
    Below are two examples of how to extract dating clues from short manuscript excerpts.
    Example 1:
    Excerpt: "...in nomine..." -> Clues: script = Caroline minuscule; material = parchment; probable date = 9th century.
    Example 2:
    Excerpt: "...long s..." -> Clues: long s indicates later medieval scribal practice; probable date = 14th-16th c.
    Now follow the same pattern for this excerpt: '...incipit missae... long s...'
    Output: list 'Clues' and 'Probable date range'.
    ''')
expected = ['script', 'parchment', 'probable date', 'long s']
run_example('Philology', 'Few-shot prompting', naive, engineered, expected)
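
Few-shot prompts like the one above follow a regular pattern, so they can also be assembled from a list of worked examples. The sketch below shows one way to do that. `build_few_shot_prompt` and its parameters are hypothetical names, not part of this notebook's helpers.

```python
def build_few_shot_prompt(instruction, examples, query, output_hint):
    """Assemble a few-shot prompt from (excerpt, clues) example pairs."""
    lines = [instruction]
    for i, (excerpt, clues) in enumerate(examples, start=1):
        lines.append(f"Example {i}:")
        lines.append(f'Excerpt: "{excerpt}" -> Clues: {clues}')
    # The final query repeats the example layout so the model continues the pattern.
    lines.append(f"Now follow the same pattern for this excerpt: '{query}'")
    lines.append(output_hint)
    return "\n".join(lines)

examples = [
    ("...in nomine...",
     "script = Caroline minuscule; material = parchment; probable date = 9th century."),
    ("...long s...",
     "long s indicates later medieval scribal practice; probable date = 14th-16th c."),
]
prompt = build_few_shot_prompt(
    "Below are two examples of how to extract dating clues from short manuscript excerpts.",
    examples,
    "...incipit missae... long s...",
    "Output: list 'Clues' and 'Probable date range'.",
)
print(prompt)
```

Keeping the examples in a list makes it easy to test how the number or choice of examples changes the response.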


### Assigning roles

<details>
    <summary>Hint</summary>
    
```python

engineered = textwrap.dedent('''
    You are a physics professor teaching an introductory quantum mechanics course to school students who have basic knowledge of classical physics but no prior quantum mechanics background.

    Explain quantum entanglement in a way that:
    1. Uses an accessible analogy
    2. Addresses common misconceptions
    3. Explains why it's significant for quantum computing
    4. Keeps the explanation under 200 words
    ''')
    
```
    
</details>    

In [None]:
# Physics example (assign role)
naive = textwrap.dedent('''
    Explain quantum entanglement.
    ''')
engineered = textwrap.dedent('''
    ___
    ''') #FILL_HERE

expected = ['imagine', 'candy', 'friend', 'computers']
run_example('Physics', 'Assigning roles', naive, engineered, expected)
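
A role can also be assigned through a separate `system` message instead of being written into the user prompt, since the chat message format used in `call_model` accepts both roles. The sketch below only builds the message list, so it runs without a server. `build_role_messages` is a hypothetical helper, and the role text is a shortened placeholder, not a solution to the exercise. Whether a particular model's chat template honors system messages is an assumption worth checking.

```python
def build_role_messages(role_description, user_prompt):
    """Put the role in a system message and the task in a user message."""
    return [
        {"role": "system", "content": role_description},
        {"role": "user", "content": user_prompt},
    ]

messages = build_role_messages(
    "You are a physics professor teaching school students.",
    "Explain quantum entanglement.",
)
for m in messages:
    print(f"{m['role']}: {m['content']}")
```

Passing such a list as `messages=` to `client.chat.completions.create(...)` would take the place of the single user message built inside `call_model`.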


### Separating data and instructions

In [None]:
# Peace & Conflict example (separating data and instructions)
naive = "Read this case and list drivers of conflict: 'In Village X, new dam reduced water, unemployment rose, local militia formed.'"
engineered = textwrap.dedent('''
    Instruction: Identify and rank up to 5 primary drivers of conflict in the provided case. For each driver, give a one-sentence rationale and suggest one short mitigation action.
    ---DATA---
    Case description: "In Village X, construction of a new upstream dam reduced local water access; seasonal crops failed; unemployment rose from 12% to 28% in two years; a local militia formed and there were two clashes with police last year."
    ---END DATA---
    Output format (markdown numbered list):
    1. <Driver>: <one-sentence rationale>. Mitigation: <action>.
    ''')
expected = ['water', 'unemployment', 'militia', 'mitigation', 'rationale']
run_example('Peace & Conflict', 'Separating data and instructions', naive, engineered, expected)
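
The delimiter pattern in the engineered prompt can be wrapped in a small helper so that instructions and data are never mixed by accident. This is a sketch under assumptions: `wrap_with_delimiters` is a hypothetical name, and the markers simply reuse the `---DATA---` and `---END DATA---` strings from the prompt above.

```python
def wrap_with_delimiters(instruction, data, output_format,
                         start="---DATA---", end="---END DATA---"):
    """Build a prompt that keeps the instruction and the data in separate blocks."""
    # If the data itself contains a marker, the model could misread where the data ends.
    if start in data or end in data:
        raise ValueError("data contains a delimiter; choose different markers")
    return "\n".join([f"Instruction: {instruction}", start, data, end, output_format])

prompt = wrap_with_delimiters(
    "Identify and rank up to 5 primary drivers of conflict in the provided case.",
    'Case description: "In Village X, a new dam reduced water, unemployment rose, local militia formed."',
    "Output format (markdown numbered list):\n1. <Driver>: <one-sentence rationale>. Mitigation: <action>.",
)
print(prompt)
```

The delimiter check matters most when the data comes from an untrusted source, because text inside the data block could otherwise pose as a new instruction.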


### Formatting output

In [None]:
# Sociology example (formatted JSON output)
naive = "Give me variables and why they're important for studying neighborhood cohesion."
engineered = textwrap.dedent('''
    Return a JSON array where each element is an object with keys:
    - "variable": short variable name
    - "why": one-sentence justification
    - "measurement": suggested question and response type
    Provide exactly 5 variables. Do not include additional text outside the JSON.
    Field context: sociology research on neighborhood cohesion.
    ''')
expected = ['json', 'variable', 'why', 'measurement']
run_example('Sociology', 'Formatting output', naive, engineered, expected)
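
Requesting JSON is most useful when the response is then parsed and checked rather than only read. The sketch below validates a response against the format requested in the engineered prompt. `parse_variables` is a hypothetical helper, the sample response is generated locally instead of coming from the model, and the code-fence stripping is an assumption about how some models wrap their JSON.

```python
import json

REQUIRED_KEYS = {"variable", "why", "measurement"}

def parse_variables(response_text, expected_count=5):
    """Parse a model response and check it against the requested JSON format."""
    text = response_text.strip()
    fence = "`" * 3
    # Assumption: a model may wrap the JSON in a Markdown code fence; remove it.
    if text.startswith(fence):
        text = text.strip("`")
        if text.lower().startswith("json"):
            text = text[4:]
    items = json.loads(text)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(items, list) or len(items) != expected_count:
        raise ValueError(f"expected a JSON array of {expected_count} objects")
    for item in items:
        if not isinstance(item, dict) or REQUIRED_KEYS - set(item):
            raise ValueError(f"entry does not have the required keys: {item!r}")
    return items

# Locally generated stand-in for a well-formed model response.
sample = json.dumps(
    [{"variable": f"v{i}", "why": "reason", "measurement": "Likert 1-5"} for i in range(5)]
)
items = parse_variables(sample)
print(len(items), "valid entries")
```

A structural check like this complements the keyword score, which only tests whether the key names occur somewhere in the text.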
