The `client.chat.completions.create` function can accept several key parameters when making a request:

1. **model**: Specifies which OpenAI model to use, like `"gpt-4"`.
2. **messages**: A list of messages that form the conversation history. Each message is an object with a `"role"` (e.g., `"user"`, `"assistant"`, `"system"`) and `"content"` which contains the text.
3. **temperature**: Controls the randomness of the model's output. A lower value makes the output more deterministic, while a higher value increases diversity.
4. **max_tokens**: Limits the number of tokens in the generated response.
5. **top_p**: Implements nucleus sampling; it considers only the tokens that make up the top `p` probability mass.
6. **n**: Specifies how many completion choices to generate.
7. **stream**: If set to `true`, it streams back partial progress as the output is generated.
8. **stop**: Specifies a stopping sequence, where the model stops generating further tokens.
9. **presence_penalty** and **frequency_penalty**: These parameters penalize new versus repeated tokens respectively to control the diversity and repetitiveness of the output.

These parameters offer flexibility in shaping the responses generated by the model, allowing you to fine-tune the output to your needs.

For more detailed information, you can refer to the official [OpenAI documentation](https://platform.openai.com/docs/models/chat/completions/create)【6†source】【7†source】【8†source】.

In [1]:
import os
import openai
import yaml
from dotenv import load_dotenv

# Load environment variables from the .env file
load_dotenv()

# Load API key from a YAML configuration file
with open("config.yaml") as f:
    config_yaml = yaml.load(f, Loader=yaml.FullLoader)

# Initialize the OpenAI client with the API key
client = openai.OpenAI(api_key=config_yaml['token'])

# Set the model name
MODEL = "gpt-4o-mini"

def llm(prompt, stop=["\n"]):
    # Prepare the dialog for the API request
    dialogs = [{"role": "user", "content": prompt}]
    
    # Call OpenAI API to generate the completion
    completion = client.chat.completions.create(
        model=MODEL,         
        messages=dialogs,                   
        temperature=0,                   # Controls randomness, 0 means more deterministic output
        max_tokens=100,                  # Limits the generated text length to 100 tokens
        top_p=1,                         # Controls diversity via nucleus sampling, 1 means using the full distribution
        frequency_penalty=0.0,           # Reduces repeated words in the output, 0 means no penalty
        presence_penalty=0.0,            # Reduces likelihood of word repetition, 0 means no penalty
        stop=stop                        # Stop sequence for generation, here it's a newline "\n"
    )
    
    # Return the generated content
    return completion.choices[0].message.content


In [97]:
import sys

# # 定义 Logger 类
# class Logger(object):
#     def __init__(self, filename="alfworldresult.txt"):
#         self.terminal = sys.stdout
#         self.log = open(filename, "w")
# 
#     def write(self, message):
#         self.terminal.write(message)  # 将消息写入到控制台
#         self.log.write(message)  # 将消息写入到文件
# 
#     def flush(self):
#         # 这里无需实现，但需要保留 flush 方法，以避免报错
#         pass
# 
# # 将 sys.stdout 替换为 Logger 实例
# sys.stdout = Logger()



In [6]:
# # Set the environment variable
%env ALFWORLD_DATA=/Users/huangziheng/PycharmProjects/ALFWORLD_DATA
# # Verify that it's set
# !echo $MY_VARIABLE

env: ALFWORLD_DATA=/Users/huangziheng/PycharmProjects/ALFWORLD_DATA


In [7]:
import yaml
import alfworld
import alfworld.agents.environment

with open('base_config.yaml') as reader:
    config = yaml.safe_load(reader)

# Replace environment variables in paths
def expand_env_vars(config):
    if isinstance(config, dict):
        return {k: expand_env_vars(v) for k, v in config.items()}
    elif isinstance(config, list):
        return [expand_env_vars(v) for v in config]
    elif isinstance(config, str):
        return os.path.expandvars(config)
    else:
        return config

config = expand_env_vars(config)
    
split = "eval_out_of_distribution"

config

{'environment': {'ALFWORLD_DATA': '/Users/huangziheng/PycharmProjects/ALFWORLD_DATA'},
 'dataset': {'data_path': '/Users/huangziheng/PycharmProjects/ALFWORLD_DATA/json_2.1.1/train',
  'eval_id_data_path': '/Users/huangziheng/PycharmProjects/ALFWORLD_DATA/json_2.1.1/valid_seen',
  'eval_ood_data_path': '/Users/huangziheng/PycharmProjects/ALFWORLD_DATA/json_2.1.1/valid_unseen',
  'num_train_games': -1,
  'num_eval_games': -1},
 'logic': {'domain': '/Users/huangziheng/PycharmProjects/ALFWORLD_DATA/logic/alfred.pddl',
  'grammar': '/Users/huangziheng/PycharmProjects/ALFWORLD_DATA/logic/alfred.twl2'},
 'env': {'type': 'AlfredTWEnv',
  'regen_game_files': False,
  'domain_randomization': False,
  'task_types': [1, 2, 3, 4, 5, 6],
  'expert_timeout_steps': 150,
  'expert_type': 'handcoded',
  'goal_desc_human_anns_prob': 0.0,
  'hybrid': {'start_eps': 100000, 'thor_prob': 0.5, 'eval_mode': 'tw'},
  'thor': {'screen_width': 300,
   'screen_height': 300,
   'smooth_nav': False,
   'save_frames_

In [8]:
env = getattr(alfworld.agents.environment, config["env"]["type"])(config, train_eval=split)
#<alfworld.agents.environment.alfred_tw_env.AlfredTWEnv at 0x7fb9a0211e20>

env = env.init_env(batch_size=1)
#<textworld.gym.envs.textworld_batch.TextworldBatchGymEnv at 0x7fb9b2f95f40>

def process_ob(ob):
    if ob.startswith('You arrive at loc '):
        ob = ob[ob.find('. ')+2:]    
    return ob
 
#  ```python
# ob = "You arrive at loc 5. You see a beautiful park."
# ```
# 
# - 由于 `ob` 以 `'You arrive at loc '` 开头，`ob.find('. ')` 返回 `22`（`'. '` 的位置）。
# - `ob[22+2:]` 从索引 `24` 开始，结果是 `'You see a beautiful park.'`。
# - 函数将返回 `'You see a beautiful park.'`。


Initializing AlfredTWEnv...


100%|██████████| 341/341 [00:00<00:00, 739.64it/s] 

Overall we have 134 games in split=eval_out_of_distribution
Evaluating with 134 games





In [9]:
import json
folder = './prompts/'
prompt_file = 'alfworld_3prompts.json'
with open(folder + prompt_file, 'r') as f:
    d = json.load(f)


In [10]:
import sys  

def alfworld_run(prompt, to_print=True, ob=''):
    init_prompt = prompt + ob + '\n>'  # 初始化提示符，将 prompt 和 ob 拼接起来，并添加一个提示符
    prompt = ''  # 清空 prompt 变量
    if to_print:
        print(ob)  # 如果 to_print 为 True，打印 ob
        sys.stdout.flush()  # 刷新标准输出，确保 ob 立刻显示
        # sys.stdout.flush() 是一种控制输出显示时机的手段，确保所有缓冲的数据被立即写入到标准输出设备上。这在需要实时输出或调试程序时尤其有用。

    for i in range(1, 50):
        print("===============")
        print("i", i)
        action = llm(init_prompt + prompt, stop=['\n']).strip()
        print("action", action)
        observation, reward, done, info = env.step([action])
        print("observation, reward, done, info", observation, reward, done, info)

        observation, reward, done = process_ob(observation[0]), info['won'][0], done[0]
        print("observation, reward, done", observation, reward, done)

        if action.startswith('think:'):
            observation = 'OK.'

        if to_print:
            print(f'Act {i}: {action}\nObs {i}: {observation}')
            sys.stdout.flush()

        prompt += f' {action}\n{observation}\n>'
        print("prompt", prompt)


        if done:
            return reward

    return 0


# def alfworld_run(prompt, to_print=True, ob=''):
#     init_prompt = prompt + ob + '\n>'  # 初始化提示符，将 prompt 和 ob 拼接起来，并添加一个提示符
#     prompt = ''  # 清空 prompt 变量
#     if to_print:
#         print(ob)  # 如果 to_print 为 True，打印 ob
#         sys.stdout.flush()  # 刷新标准输出，确保 ob 立刻显示
#         
#     for i in range(1, 50):
#         print("===============")
#         print("i", i)
#         action = llm(init_prompt + prompt, stop=['\n']).strip()
#         print("action", action)
#         observation, reward, done, info = env.step([action])
#         print("observation, reward, done, info", observation, reward, done, info)
#      
#         observation, reward, done = process_ob(observation[0]), info['won'][0], done[0]
#         print("observation, reward, done", observation, reward, done)
#     
#         if action.startswith('think:'):
#             observation = 'OK.'
#     
#         if to_print:
#             print(f'Act {i}: {action}\nObs {i}: {observation}')
#             sys.stdout.flush()
#     
#         prompt += f' {action}\n{observation}\n>'
#         print("prompt", prompt)
#     
#         if done:
#             return reward
# 
#     return 0

# 在使用 Logger 之后，所有的 print 输出将会同时显示在控制台和 alfworldresult.txt 文件中。

In [11]:
prefixes = {
    'pick_and_place': 'put',
    'pick_clean_then_place': 'clean',
    'pick_heat_then_place': 'heat',
    'pick_cool_then_place': 'cool',
    'look_at_obj': 'examine',
    'pick_two_obj': 'puttwo'
}
cnts = [0] * 6 #counts
rs = [0] * 6 #Results

In [12]:
ob, info = env.reset()

ob

('-= Welcome to TextWorld, ALFRED! =-\n\nYou are in the middle of a room. Looking quickly around you, you see a cabinet 6, a cabinet 5, a cabinet 4, a cabinet 3, a cabinet 2, a cabinet 1, a coffeemachine 1, a countertop 3, a countertop 2, a countertop 1, a drawer 3, a drawer 2, a drawer 1, a fridge 1, a garbagecan 1, a microwave 1, a shelf 3, a shelf 2, a shelf 1, a sinkbasin 1, a stoveburner 4, a stoveburner 3, a stoveburner 2, a stoveburner 1, and a toaster 1.\n\nYour task is to: put a cool tomato in microwave.',)

In [13]:
info

{'admissible_commands': [['go to cabinet 1',
   'go to cabinet 2',
   'go to cabinet 3',
   'go to cabinet 4',
   'go to cabinet 5',
   'go to cabinet 6',
   'go to coffeemachine 1',
   'go to countertop 1',
   'go to countertop 2',
   'go to countertop 3',
   'go to drawer 1',
   'go to drawer 2',
   'go to drawer 3',
   'go to fridge 1',
   'go to garbagecan 1',
   'go to microwave 1',
   'go to shelf 1',
   'go to shelf 2',
   'go to shelf 3',
   'go to sinkbasin 1',
   'go to stoveburner 1',
   'go to stoveburner 2',
   'go to stoveburner 3',
   'go to stoveburner 4',
   'go to toaster 1',
   'inventory',
   'look']],
 'won': [False],
 'extra.gamefile': ['/Users/huangziheng/PycharmProjects/ALFWORLD_DATA/json_2.1.1/valid_unseen/pick_cool_then_place_in_recep-Tomato-None-Microwave-10/trial_T20190909_102644_926781/game.tw-pddl']}

In [14]:
ob = '\n'.join(ob[0].split('\n\n')[1:])
ob

'You are in the middle of a room. Looking quickly around you, you see a cabinet 6, a cabinet 5, a cabinet 4, a cabinet 3, a cabinet 2, a cabinet 1, a coffeemachine 1, a countertop 3, a countertop 2, a countertop 1, a drawer 3, a drawer 2, a drawer 1, a fridge 1, a garbagecan 1, a microwave 1, a shelf 3, a shelf 2, a shelf 1, a sinkbasin 1, a stoveburner 4, a stoveburner 3, a stoveburner 2, a stoveburner 1, and a toaster 1.\nYour task is to: put a cool tomato in microwave.'

In [15]:
name = '/'.join(info['extra.gamefile'][0].split('/')[-3:-1])
name

'pick_cool_then_place_in_recep-Tomato-None-Microwave-10/trial_T20190909_102644_926781'

In [16]:
# ob, info = env.reset()
# ob = '\n'.join(ob[0].split('\n\n')[1:])
# name = '/'.join(info['extra.gamefile'][0].split('/')[-3:-1])
# print(name)

In [17]:
# Open the file in write mode
# with open('alfworldresult.txt', 'w') as f:
#     for i, (k, v) in enumerate(prefixes.items()):
#         if name.startswith(k):
#             prompt = 'Interact with a household to solve a task. Here are two examples.\n' + d[f'react_{v}_1'] + d[f'react_{v}_0'] + '\nHere is the task.\n'
#             f.write(f'{k} {v}\n')
#             f.write(f'ob: {ob}\n')
#             f.write(f'prompt: {prompt}\n')
#             r = alfworld_run(prompt, ob=ob)
#             rs[i] += r
#             cnts[i] += 1
#             break
#     f.write(f'{_+1} r: {r} rs: {rs} cnts: {cnts} sum(rs)/sum(cnts): {sum(rs) / sum(cnts)}\n')
#     f.write('------------\n')

for i, (k, v) in enumerate(prefixes.items()):
    if name.startswith(k):
        prompt = 'Interact with a household to solve a task. Here are two examples.\n' + d[f'react_{v}_1'] + d[f'react_{v}_0'] + '\nHere is the task.\n'
        print(k, v)
        r = alfworld_run(prompt, ob=ob)
        rs[i] += r
        cnts[i] += 1
        break
print(_+1, 'r', r, 'rs', rs, 'cnts', cnts, 'sum(rs)/sum(cnts)', sum(rs) / sum(cnts))
print('------------\n')

# # 结束后恢复 sys.stdout
# sys.stdout.log.close()
# sys.stdout = sys.stdout.terminal



pick_cool_then_place cool
You are in the middle of a room. Looking quickly around you, you see a cabinet 6, a cabinet 5, a cabinet 4, a cabinet 3, a cabinet 2, a cabinet 1, a coffeemachine 1, a countertop 3, a countertop 2, a countertop 1, a drawer 3, a drawer 2, a drawer 1, a fridge 1, a garbagecan 1, a microwave 1, a shelf 3, a shelf 2, a shelf 1, a sinkbasin 1, a stoveburner 4, a stoveburner 3, a stoveburner 2, a stoveburner 1, and a toaster 1.
Your task is to: put a cool tomato in microwave.
i 1
action To solve the task, I need to find a tomato, cool it with the fridge, and then put it in the microwave.
observation, reward, done, info ('Nothing happens.',) (0,) (False,) {'admissible_commands': [['go to cabinet 1', 'go to cabinet 2', 'go to cabinet 3', 'go to cabinet 4', 'go to cabinet 5', 'go to cabinet 6', 'go to coffeemachine 1', 'go to countertop 1', 'go to countertop 2', 'go to countertop 3', 'go to drawer 1', 'go to drawer 2', 'go to drawer 3', 'go to fridge 1', 'go to garbage

TypeError: can only concatenate str (not "int") to str

In [25]:

for _ in range(134):
    ob, info = env.reset()
    ob = '\n'.join(ob[0].split('\n\n')[1:])
    name = '/'.join(info['extra.gamefile'][0].split('/')[-3:-1])
    print(name)
    for i, (k, v) in enumerate(prefixes.items()):
        if name.startswith(k):
            prompt = 'Interact with a household to solve a task. Here are two examples.\n' + d[f'react_{v}_1'] + d[f'react_{v}_0'] + '\nHere is the task.\n'
            print(k, v)
            r = alfworld_run(prompt, ob=ob)
            rs[i] += r
            cnts[i] += 1
            break
    print(_+1, 'r', r, 'rs', rs, 'cnts', cnts, 'sum(rs)/sum(cnts)', sum(rs) / sum(cnts))
    print('------------\n')


KeyboardInterrupt: 