# Test setup for local inference with the Phi-3 Mini model

In [1]:
# Hugging Face Hub ID of the model to load
model_id = "microsoft/Phi-3-mini-4k-instruct"
model_name = model_id

## Download and instantiate model

In [2]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

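# Seed the RNG so that sampled generations are repeatable
torch.random.manual_seed(0)
# Load the weights onto the GPU; torch_dtype="auto" uses the dtype stored in the checkpoint
model = AutoModelForCausalLM.from_pretrained(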
    model_id,
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
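
The cell above hard-codes `device_map="cuda"`, which assumes a CUDA GPU is present. A minimal sketch of a fallback, assuming a slower CPU run is acceptable for testing; `choose_device` is a hypothetical helper and not part of the original notebook:

```python
def choose_device() -> str:
    """Return "cuda" when a CUDA GPU is usable, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the helper also works without torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = choose_device()
print(device)
# The result could then be passed as device_map=device to from_pretrained.
```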

## Example 1: Outfit builder

In [3]:
SYSTEM_PROMPT = """You are a fashion designer helping a client choose an outfit. You will be provided a prompt by the user and you will respond with some options for outfit combinations.

Below is an example exchange between you and a user:<|end|>
<|user|>
I'm a short, curvy man with a muscular build. I opt for relaxed, comfortable clothing like stretch jeans and loose shirts that accommodate my frame.<|end|>
<|assistant|>
Outfit Combination 1:
- Top: Fitted white linen shirt
- Bottom: Slim-fit beige chinos
- Shoe: Brown leather loafers
- Accessories: Brown woven belt, aviator sunglasses

Outfit Combination 2:
- Top: Light blue oxford button-down shirt
- Bottom: Navy blue tailored trousers
- Shoe: Tan leather brogues
- Accessories: Navy blue patterned pocket square, silver wristwatch

Outfit Combination 3:
- Top: Light gray tailored blazer
- Bottom: Dark wash denim jeans
- Shoe: White canvas sneakers
- Accessories: Black leather belt, silver pendant necklace

Outfit Combination 4:
- Top: Navy blue polo shirt
- Bottom: Khaki shorts
- Shoe: Brown leather sandals
- Accessories: Navy blue baseball cap, woven brown bracelet

Outfit Combination 5:
- Top: Light pink dress shirt (with rolled-up sleeves)
- Bottom: Charcoal gray dress pants
- Shoe: Black monk strap shoes
- Accessories: Black leather belt, silver tie clip<|end|>

Please provide a response to the user prompt below:"""

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "I'm a petite, slender man with a youthful appearance. I gravitate towards smart-casual wear, preferring fitted jeans and stylish bomber jackets."},
]

generation_args = {
    "max_new_tokens": 500,
    "return_full_text": True,
    "temperature": 1,
    "do_sample": True,
}

output = pipe(messages, **generation_args)

The `seen_tokens` attribute is deprecated and will be removed in v4.41. Use the `cache_position` model input instead.
You are not running the flash-attention implementation, expect numerical differences.
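
The system prompt above embeds the worked example by writing the `<|user|>`, `<|assistant|>`, and `<|end|>` markers by hand. An alternative, sketched here under the assumption that the pipeline applies the tokenizer's chat template to each message, is to pass the example as its own user and assistant turns. `build_few_shot_messages` and the shortened strings are hypothetical and for illustration only:

```python
from typing import Dict, List, Tuple

Message = Dict[str, str]


def build_few_shot_messages(
    system_prompt: str,
    examples: List[Tuple[str, str]],
    user_prompt: str,
) -> List[Message]:
    """Build a chat history: system turn, example exchanges, then the real request."""
    messages: List[Message] = [{"role": "system", "content": system_prompt}]
    for example_user, example_assistant in examples:
        messages.append({"role": "user", "content": example_user})
        messages.append({"role": "assistant", "content": example_assistant})
    messages.append({"role": "user", "content": user_prompt})
    return messages


few_shot = build_few_shot_messages(
    "You are a fashion designer helping a client choose an outfit.",
    [("I'm a short, curvy man with a muscular build.",
      "Outfit Combination 1:\n- Top: Fitted white linen shirt")],
    "I'm a petite, slender man with a youthful appearance.",
)
print([m["role"] for m in few_shot])
```

The resulting list could be passed to `pipe` in place of `messages`; whether it produces better outfits than the inline version has not been tested here.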


In [4]:
output

[{'generated_text': [{'role': 'system',
    'content': "You are a fashion designer helping a client choose an outfit. You will be provided a prompt by the user and you will respond with some options for outfit combinations.\n\nBelow is an example exchange between you and a user:<|end|>\n<|user|>\nI'm a short, curvy man with a muscular build. I opt for relaxed, comfortable clothing like stretch jeans and loose shirts that accommodate my frame.<|end|>\n<|assistant|>\nOutfit Combination 1:\n- Top: Fitted white linen shirt\n- Bottom: Slim-fit beige chinos\n- Shoe: Brown leather loafers\n- Accessories: Brown woven belt, aviator sunglasses\n\nOutfit Combination 2:\n- Top: Light blue oxford button-down shirt\n- Bottom: Navy blue tailored trousers\n- Shoe: Tan leather brogues\n- Accessories: Navy blue patterned pocket square, silver wristwatch\n\nOutfit Combination 3:\n- Top: Light gray tailored blazer\n- Bottom: Dark wash denim jeans\n- Shoe: White canvas sneakers\n- Accessories: Black leathe
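
With `return_full_text` set to `True`, the pipeline returns the whole conversation, so the reply has to be picked out of the message list. A small sketch, assuming the output keeps the shape shown above (a list with one dict whose `generated_text` is a list of role/content messages); `last_assistant_reply` is a hypothetical helper and the sample data is made up:

```python
from typing import Any, Dict, List, Optional


def last_assistant_reply(output: List[Dict[str, Any]]) -> Optional[str]:
    """Return the content of the final assistant message, or None if there is none."""
    for message in reversed(output[0]["generated_text"]):
        if message["role"] == "assistant":
            return message["content"]
    return None


# Made-up stand-in for a real pipeline result
sample_output = [{"generated_text": [
    {"role": "system", "content": "You are a fashion designer."},
    {"role": "user", "content": "I prefer fitted jeans."},
    {"role": "assistant", "content": "Outfit Combination 1: ..."},
]}]
print(last_assistant_reply(sample_output))
```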

## Example 2: Simple tool call

### Inference

In [5]:
SYSTEM_PROMPT = """You are an agent in a resource optimization game. Your goal is to maximize the total rewards generated over your lifetime.

At each time step, you will be provided with a state description and a set of actions to choose from.

The state description consists of the following information:
- cash: the number of tokens you currently have
- investment: the number of tokens you have invested
- rewards: the total rewards you have accumulated
- previous actions: the actions you have taken so far

You will be asked to select an action to take at each time step. The action space consists of the following options:
- <work>: Work to generate new tokens
- <invest>: Invest tokens to generate tokens passively
- <collect>: Collect tokens by cashing out investment
- <spend>: Spend token to generate rewards

Below is an example:

State:
cash=100
investment=100
rewards=20
previous actions=<work>, <invest>, <spend>, <invest>, <work>

What action would you like to take?
<|Action|>: <work>
<|Reasoning|>: ...


Let's get started with a new game:

State:
cash=0
investment=0
rewards=0
previous actions=

What action would you like to take?"""

messages = [{"role": "system", "content": SYSTEM_PROMPT}]

generation_args = {
    "max_new_tokens": 50,
    "return_full_text": True,
    # Greedy decoding: temperature has no effect when do_sample is False
    # "temperature": 1,
    "do_sample": False,
}

output = pipe(messages, **generation_args)
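
The prompt fixes both the state layout and the reply format (`<|Action|>: <work>`), so each game step can be driven by two small helpers. A sketch, assuming the model follows the reply format from the example; `format_state` and `parse_action` are hypothetical names, and the game's reward rules are not modeled because the prompt does not specify them:

```python
import re
from typing import List, Optional

VALID_ACTIONS = ("work", "invest", "collect", "spend")
ACTION_PATTERN = re.compile(r"<\|Action\|>:\s*<(\w+)>")


def format_state(cash: int, investment: int, rewards: int,
                 previous_actions: List[str]) -> str:
    """Render a state block in the layout used by the system prompt."""
    history = ", ".join(f"<{a}>" for a in previous_actions)
    return (
        "State:\n"
        f"cash={cash}\n"
        f"investment={investment}\n"
        f"rewards={rewards}\n"
        f"previous actions={history}\n\n"
        "What action would you like to take?"
    )


def parse_action(reply: str) -> Optional[str]:
    """Return the chosen action name, or None if the reply is malformed."""
    match = ACTION_PATTERN.search(reply)
    if match is None or match.group(1) not in VALID_ACTIONS:
        return None
    return match.group(1)


print(format_state(100, 100, 20, ["work", "invest"]))
print(parse_action("<|Action|>: <work>\n<|Reasoning|>: ..."))
```

Returning `None` for an unknown or missing action lets the caller decide whether to retry the generation or stop the game.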

In [6]:
for x in output[0]["generated_text"]:
    print(x["content"])

You are an agent in a resource optimization game. Your goal is to maximize the total rewards generated over your lifetime.

At each time step, you will be provided with a state description and a set of actions to choose from.

The state description consists of the following information:
- cash: the number of tokens you currently have
- investment: the number of tokens you have invested
- rewards: the total rewards you have accumulated
- previous actions: the actions you have taken so far

You will be asked to select an action to take at each time step. The action space consists of the following options:
- <work>: Work to generate new tokens
- <invest>: Invest tokens to generate tokens passively
- <collect>: Collect tokens by cashing out investment
- <spend>: Spend token to generate rewards

Below is an example:

State:
cash=100
investment=100
rewards=20
previous actions=<work>, <invest>, <spend>, <invest>, <work>

What action would you like to take?
<|Action|>: <work>
<|Reasoning|>: ..