In [1]:
# This is a basic prompt template containing all the necessary onboarding information to solve AppWorld tasks. It explains the role of the agent and the supervisor, how to explore the API documentation, how to operate the interactive coding environment and call APIs via a simple task, and provides key instructions and disclaimers.

# You can adapt it as needed for your agent. You can also choose to bypass the API docs app and build your own API retrieval, e.g., for FullCodeRefl, IPFunCall, etc., we asked an LLM to predict the relevant APIs separately and put their documentation directly in the prompt.
PROMPT_TEMPLATE = """
USER:
I am your supervisor and you are a super intelligent AI Assistant whose job is to achieve my day-to-day tasks completely autonomously.

To do this, you will need to interact with app/s (e.g., spotify, venmo, etc) using their associated APIs on my behalf. For this you will undertake a *multi-step conversation* using a python REPL environment. That is, you will write the python code and the environment will execute it and show you the result, based on which, you will write python code for the next step and so on, until you've achieved the goal. This environment will let you interact with app/s using their associated APIs on my behalf.

Here are three key APIs that you need to know to get more information

# To get a list of apps that are available to you.
print(apis.api_docs.show_app_descriptions())

# To get the list of apis under any app listed above, e.g. supervisor
print(apis.api_docs.show_api_descriptions(app_name='supervisor'))

# To get the specification of a particular api, e.g. supervisor app's show_account_passwords
print(apis.api_docs.show_api_doc(app_name='supervisor', api_name='show_account_passwords'))

Each code execution will produce an output that you can use in subsequent calls. Using these APIs, you can now generate code, that the environment will execute, to solve the task.

For example, consider the task:

My name is: {{ supervisor.first_name }} {{ supervisor.last_name }}. My personal email is {{ supervisor.email }} and phone number is {{ supervisor.phone_number }}.

Task:

What is the password for my Spotify account?

ASSISTANT:
# Okay. Let's first find which apps are available to get the password by looking at the app descriptions.
print(apis.api_docs.show_app_descriptions())

USER:
[
  {
    "name": "api_docs",
    "description": "An app to search and explore API documentation."
  },
  {
    "name": "supervisor",
    "description": "An app to access supervisor's personal information, account credentials, addresses, payment cards, and manage the assigned task."
  },
  ...
  {
    "name": "spotify",
    "description": "A music streaming app to stream songs and manage song, album and playlist libraries."
  },
  {
    "name": "venmo",
    "description": "A social payment app to send, receive and request money to and from others."
  },
  ...
]


ASSISTANT:
# Looks like the supervisor app could help me with that. Let's see what apis are available under this app.
print(apis.api_docs.show_api_descriptions(app_name='supervisor'))


USER:
[
  ...
  "show_account_passwords : Show your supervisor's account passwords."
  ...
]


ASSISTANT:
# I can use `show_account_passwords` to get the passwords. Let me see its detailed specification to understand its arguments and output structure.
print(apis.api_docs.show_api_doc(app_name='supervisor', api_name='show_account_passwords'))

USER:
{
  'app_name': 'supervisor',
  'api_name': 'show_account_passwords',
  'path': '/account_passwords',
  'method': 'GET',
  'description': "Show your supervisor's app account passwords.",
  'parameters': [],
  'response_schemas': {
    'success': [{'account_name': 'string', 'password': 'string'}],
    'failure': {'message': 'string'}
  }
}

ASSISTANT:
# Okay, it requires no arguments. So I can just call it directly.
print(apis.supervisor.show_account_passwords())

USER:
[
  {
    "account_name": "spotify",
    "password": "dummy_spotify_pass"
  },
  {
    "account_name": "file_system",
    "password": "dummy_fs_pass"
  },
  ...
]


ASSISTANT:
# So the Spotify password is the entry in the `passwords` list with account_name == "spotify".
passwords = apis.supervisor.show_account_passwords()
spotify_password = [account_password for account_password in passwords if account_password["account_name"] == "spotify"][0]["password"]
print(spotify_password)


USER:
dummy_spotify_pass

ASSISTANT:
# When the task is completed, I need to call apis.supervisor.complete_task(). If there is an answer, I need to pass it as an argument `answer`. I will pass the spotify_password as an answer.
apis.supervisor.complete_task(answer=spotify_password)


USER:
Marked the active task complete.


----------------------------------------------

USER:
**Key instructions and disclaimers**:

1. The email addresses, access tokens and variables (e.g. spotify_password) in the example above were only for demonstration. Obtain the correct information by calling relevant APIs yourself.
2. Only generate valid code blocks, i.e., do not put them in ```...``` or add any extra formatting. Any thoughts should be put as code comments.
3. You can use the variables from the previous code blocks in the subsequent code blocks.
4. Write small chunks of code and only one chunk of code in every step. Make sure everything is working correctly before making any irreversible change.
5. The provided Python environment has access to its standard library. But modules and functions that have a risk of affecting the underlying OS, file system or process are disabled. You will get an error if you call them.
6. Any reference to a file system in the task instructions means the file system *app*, operable via given APIs, and not the actual file system the code is running on. So do not write code making calls to os-level modules and functions.
7. To interact with apps, only use the provided APIs, and not the corresponding Python packages. E.g., do NOT use `spotipy` for Spotify. Remember, the environment only has the standard library.
8. The provided API documentation has both the input arguments and the output JSON schemas. All calls to APIs and parsing its outputs must be as per this documentation.
9. For APIs that return results in "pages", make sure to consider all pages.
10. To obtain the current date or time, use Python functions like `datetime.now()` or obtain it from the phone app. Do not rely on your existing knowledge of what the current date or time is.
11. For all temporal requests, use proper time boundaries, e.g., if I ask for something that happened yesterday, make sure to consider the time between 00:00:00 and 23:59:59. All requests are concerning a single, default (no) time zone.
12. Any reference to my friends, family or any other person or relation refers to the people in my phone's contacts list.
13. All my personal information, and information about my app account credentials, physical addresses and owned payment cards are stored in the "supervisor" app. You can access them via the APIs provided by the supervisor app.
14. Once you have completed the task, call `apis.supervisor.complete_task()`. If the task asks for some information, return it as the answer argument, i.e. call `apis.supervisor.complete_task(answer=<answer>)`. For tasks that do not require an answer, just skip the answer argument or pass it as None.
15. The answers, when given, should be just entity or number, not full sentences, e.g., `answer=10` for "How many songs are in the Spotify queue?". When an answer is a number, it should be in numbers, not in words, e.g., "10" and not "ten".
16. You can also pass `status="fail"` in the complete_task API if you are sure you cannot solve it and want to exit.
17. You must make all decisions completely autonomously and not ask for any clarifications or confirmations from me or anyone else.

USER:
Using these APIs, now generate code to solve the actual task:

My name is: {{ supervisor.first_name }} {{ supervisor.last_name }}. My personal email is {{ supervisor.email }} and phone number is {{ supervisor.phone_number }}.

Task:

{{ instruction }}
"""
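Instructions 9 and 11 in the template above (consider all pages, use full-day time boundaries) are easy to get wrong in generated code. The following is a minimal sketch of both, not part of AppWorld itself. The helper names `collect_all_pages` and `yesterday_bounds` are hypothetical, and the `page_index` / `page_limit` keyword names are an assumption about how a paginated API is called; check the actual parameters with `show_api_doc` first.

```python
from datetime import datetime, time, timedelta


def collect_all_pages(fetch_page, page_limit=20):
    """Call fetch_page(page_index=..., page_limit=...) until it returns an empty page.

    `fetch_page` is any callable wrapping a paginated API (hypothetical here).
    """
    results = []
    page_index = 0
    while True:
        page = fetch_page(page_index=page_index, page_limit=page_limit)
        if not page:
            return results
        results.extend(page)
        page_index += 1


def yesterday_bounds(now):
    """Return (start, end) datetimes covering yesterday from 00:00:00 to 23:59:59."""
    day = (now - timedelta(days=1)).date()
    return datetime.combine(day, time.min), datetime.combine(day, time(23, 59, 59))
```

Passing `now` in explicitly keeps the helper independent of where the current time comes from (`datetime.now()` or the phone app, as instruction 10 suggests).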

In [2]:
# import json
# import requests

# def call_llm(messages: list[dict]):
#     model_name = 'mistral-7b-inst-2252b'
#     url = "https://aiplatform.ccg24-hrzana-edk8s.ccg24.lvs.paypalinc.com/seldon/seldon/" + model_name + "/v2/models/" + model_name + "/infer"
#     payload = json.dumps({
#       "messages": messages,
#       # "max_tokens": 2000,
#       # "temperature": 0.0,
#       # "frequency_penalty": 0,
#       # "presence_penalty": 0,
#       # "top_p": 0.0,
#       # "stop": None
#     })
#     headers = {
#         'Content-Type': 'application/json'
#     }

#     response = requests.request("POST", url, headers=headers, data=payload, verify=False)
#     # print(type(response))
#     try:
#         resp_data = response.json()
#         print(f"resp_data -> {resp_data}")
#         return(resp_data["detail"][0]["input"]["messages"][1]["content"])
#     except:
#         resp_data = response.text
#         if resp_data.lstrip().startswith("data:"):
#             resp_data = resp_data.lstrip()[5:].strip()
#         response_json = json.loads(resp_data)
#         # print("\n\n\n\n\n")
#         # print(f"response_json - {response_json}")
#         return (response_json["detail"][0]["input"]["messages"][1]["content"])
 

In [7]:
def create_input(user_query, thought_list, last_obs):
    print(f"user query is {user_query}")
    prompt_template = """User Instruction : 
```{}```
The following list contains the ordered set of steps already taken, the last element of the list is the current state of the task : 
```{}```
Current Observation : 
```{}```

Based on User instruction, history of steps taken and current observation, decide your next step.
Respond with the action to take and the thought behind it.

Respond in the following json format:
{{"thought" : "...",
"action" : "..."}}
""".format(user_query, thought_list, last_obs)
    
    return prompt_template

In [9]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

base_model = AutoModelForCausalLM.from_pretrained('/projects/llm-repo/models/Qwen/Qwen2.5-14B-Instruct', device_map='auto', torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained('/projects/llm-repo/models/Qwen/Qwen2.5-14B-Instruct')

query = "Send $1 privately to 2134567890?"


text = create_input(query, thought_list = '', last_obs = '')
inputs = tokenizer(text, return_tensors="pt").to('cuda')
outputs = base_model.generate(input_ids=inputs["input_ids"].to("cuda"), attention_mask=inputs["attention_mask"], max_new_tokens=1000, pad_token_id=tokenizer.eos_token_id)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

loading configuration file /projects/llm-repo/models/Qwen/Qwen2.5-14B-Instruct/config.json
Model config Qwen2Config {
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 13824,
  "max_position_embeddings": 32768,
  "max_window_layers": 70,
  "model_type": "qwen2",
  "num_attention_heads": 40,
  "num_hidden_layers": 48,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": 131072,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.51.3",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}

loading weights file /projects/llm-repo/models/Qwen/Qwen2.5-14B-Instruct/model.safetensors.index.json
Instantiating Qwen2ForCausalLM model under default dtype torch.bfloat16.
Generate config Gener

user query is Send $1 privately to 2134567890?
User Instruction : 
```Send $1 privately to 2134567890?```
The following list contains the ordered set of steps already taken, the last element of the list is the current state of the task : 
``````
Current Observation : 
``````

Based on User instruction, history of steps taken and current observation, decide your next step.
Respond with the action to take and the thought behind it.

Respond in the following json format:
{"thought" : "...",
"action" : "..."}
```json
{"thought": "The user wants to send a private message with a dollar amount to a specific phone number. However, sending money via SMS or text messages isn't typically supported by most messaging platforms. I need more context about what service or app the user is referring to.",
"action": "ask_for_more_details"}
```
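The decoded output above echoes the whole prompt and then wraps the answer in a json fence, so the reply cannot be passed to `json.loads` directly. A minimal sketch of one way to recover the `thought` / `action` object follows; `parse_step` is a hypothetical helper, and it assumes the reply contains a flat JSON object, either fenced or as the last `{...}` in the text.

```python
import json
import re

# Matches a flat JSON object inside a ```json fenced block.
FENCED_JSON = re.compile(r"```json\s*(\{.*?\})\s*```", re.DOTALL)


def parse_step(reply: str) -> dict:
    """Extract the {"thought": ..., "action": ...} object from a model reply."""
    fenced = FENCED_JSON.findall(reply)
    if fenced:
        # Use the last fenced block, since earlier text may be the echoed prompt.
        candidate = fenced[-1]
    else:
        # Fall back to the text starting at the last opening brace.
        candidate = reply[reply.rfind("{"):]
    step = json.loads(candidate)
    missing = {"thought", "action"} - step.keys()
    if missing:
        raise ValueError(f"reply is missing keys: {sorted(missing)}")
    return step
```

A `json.JSONDecodeError` (a subclass of `ValueError`) is raised when no parseable object is found, so callers can catch `ValueError` for both failure modes.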


In [3]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

base_model = AutoModelForCausalLM.from_pretrained('last_results/checkpoint-4000', device_map='auto', torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained('last_results/checkpoint-4000')
    
def call_llm(messages: list[dict]) -> str:
    """
    Call an LLM with a history of messages and return the response.
    """
    
    formatted_prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    
    # Move the tokenized prompt to the device the loaded model lives on.
    inputs = tokenizer([formatted_prompt], return_tensors="pt").to(base_model.device)
    outputs = base_model.generate(input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"], max_new_tokens=1000, pad_token_id=tokenizer.eos_token_id)
    
    response_ids = outputs[:, inputs.input_ids.shape[1]:]
    
    response_text = tokenizer.decode(response_ids[0], skip_special_tokens=True)
    
    return response_text
    
    
    # text = create_input(messages, thought_list = '', last_obs = '')
#     inputs = tokenizer(text, return_tensors="pt").to('cuda')
#     outputs = base_model.generate(input_ids=inputs["input_ids"].to("cuda"), attention_mask=inputs["attention_mask"], max_new_tokens=1000, pad_token_id=tokenizer.eos_token_id)

#     print(tokenizer.decode(outputs[0], skip_special_tokens=True))
#     return tokenizer.decode(outputs[0], skip_special_tokens=True)

  from .autonotebook import tqdm as notebook_tqdm
2025-06-19 17:10:49.095707: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-06-19 17:10:49.107054: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1750353049.121047    1439 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1750353049.125100    1439 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-06-19 17:10:49.139943: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorF

[2025-06-19 17:11:08,387] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)


/opt/conda/envs/py311/compiler_compat/ld: cannot find -laio: No such file or directory
collect2: error: ld returned 1 exit status
/opt/conda/envs/py311/compiler_compat/ld: /usr/local/cuda/lib64/libcufile.so: undefined reference to `std::runtime_error::~runtime_error()@GLIBCXX_3.4'
/opt/conda/envs/py311/compiler_compat/ld: /usr/local/cuda/lib64/libcufile.so: undefined reference to `__gxx_personality_v0@CXXABI_1.3'
/opt/conda/envs/py311/compiler_compat/ld: /usr/local/cuda/lib64/libcufile.so: undefined reference to `std::ostream::tellp()@GLIBCXX_3.4'
/opt/conda/envs/py311/compiler_compat/ld: /usr/local/cuda/lib64/libcufile.so: undefined reference to `std::chrono::_V2::steady_clock::now()@GLIBCXX_3.4.19'
/opt/conda/envs/py311/compiler_compat/ld: /usr/local/cuda/lib64/libcufile.so: undefined reference to `std::string::_M_replace_aux(unsigned long, unsigned long, unsigned long, char)@GLIBCXX_3.4'
/opt/conda/envs/py311/compiler_compat/ld: /opt/conda/envs/py311/bin/../x86_64-conda-linux-gnu/sy

[2025-06-19 17:11:09,465] [INFO] [logging.py:107:log_dist] [Rank -1] [TorchCheckpointEngine] Initialized with serialization = False


Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.


In [4]:
import re

from jinja2 import Template

from appworld.task import Task


class MinimalReactAgent:
    """A minimal ReAct Agent for AppWorld tasks."""

    def __init__(self, task: Task):
        self.task = task
        self.history: list[dict] = self.prompt_messages()

    def prompt_messages(self) -> list[dict]:
        """Builds prompt messages for the agent to solve self.task.instruction"""
        # Populate the fields of the prompt template with the task details
        dictionary = {"supervisor": self.task.supervisor, "instruction": self.task.instruction}
        prompt = Template(PROMPT_TEMPLATE.lstrip()).render(dictionary)
        # Extract and return the OpenAI JSON formatted messages from the prompt
        messages: list[dict] = []
        last_start = 0
        for match in re.finditer("(USER|ASSISTANT|SYSTEM):\n", prompt):
            last_end = match.span()[0]
            if len(messages) == 0:
                if last_end != 0:
                    raise ValueError(
                        f"Start of the prompt has no assigned role: {prompt[:last_end]}"
                    )
            else:
                messages[-1]["content"] = prompt[last_start:last_end]
            mesg_type = match.group(1).lower()
            messages.append({"role": mesg_type, "content": None})
            last_start = match.span()[1]
        messages[-1]["content"] = prompt[last_start:]
        return messages

    def next_code_block(self, last_execution_output: str | None = None) -> str:
        """
        Asks Agent to generate next code block given last_execution_output and history.
        """
        # Add the last execution output as the user response to the history
        if last_execution_output is not None:
            self.history.append({"role": "user", "content": last_execution_output})
        # Get the next code block based on the history.
        code = call_llm(self.history)
        # Add this code block to history as the assistant response
        self.history.append({"role": "assistant", "content": code})
        return code
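To make the role-splitting step in `prompt_messages` easier to follow, here is a standalone sketch of the same idea applied to a toy prompt. `split_roles` is a hypothetical name used only for illustration; it uses the same `USER:` / `ASSISTANT:` / `SYSTEM:` markers as the class above and needs neither `jinja2` nor `appworld`.

```python
import re

ROLE_MARKER = re.compile(r"(USER|ASSISTANT|SYSTEM):\n")


def split_roles(prompt: str) -> list[dict]:
    """Split a prompt with role markers into OpenAI-style chat messages."""
    matches = list(ROLE_MARKER.finditer(prompt))
    if not matches or matches[0].start() != 0:
        raise ValueError("Start of the prompt has no assigned role")
    messages = []
    for current, following in zip(matches, matches[1:] + [None]):
        # A message's content runs up to the next marker, or to the end of the prompt.
        end = following.start() if following else len(prompt)
        messages.append(
            {"role": current.group(1).lower(), "content": prompt[current.end():end]}
        )
    return messages


toy_prompt = "USER:\nWhat is 1 + 1?\nASSISTANT:\nprint(1 + 1)\nUSER:\n2\n"
for message in split_roles(toy_prompt):
    print(message)
```

Each marker starts a new message, which is why the few-shot transcript in `PROMPT_TEMPLATE` becomes an alternating user/assistant history.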

In [5]:
import os

from appworld import AppWorld, load_task_ids

os.environ['APPWORLD_ROOT'] = '/tmp/appworld'


# Split to evaluate on.
dataset_name = "test_normal"  # Or train, dev, test_challenge

# Experiment name. Experiment outputs are stored in
# experiments/outputs/{experiment_name} relative to root ("." by default)
experiment_name = "minimal_react_agent"

# Max number of environment interactions per task
max_interactions = 10

# For each task in the dataset split
task_ids = load_task_ids(dataset_name)
print(len(task_ids))

168


In [6]:
for index, task_id in enumerate(task_ids[:1]):
    # Load the appworld environment for the task
    with AppWorld(
        task_id=task_id,
        experiment_name=experiment_name,
    ) as world:
        # Load the agent with the task to solve
        print("\n\n" + "*" * 20 + f" Task {index+1}/{len(task_ids)} ({task_id})  " + "*" * 20)
        print(world.task.instruction)
        agent = MinimalReactAgent(world.task)
        output: str | None = None
        # Until the task is completed or max_interactions is reached
        for _ in range(max_interactions):
            # ask the agent to generate the code block based on the history.
            code = agent.next_code_block(output)
            print("\n\n" + "%" * 20 + " CODE " + "%" * 20 + "\n" + code)
            # execute the code in the world environment
            output = world.execute(code)
            print("\n\n" + "=" * 20 + " OUTPUT " + "=" * 20 + "\n" + output)
            # stop if agent has committed the task to be complete.
            if world.task_completed():
                break

RuntimeError: Failed to import transformers.integrations.vptq because of the following error (look up to see its traceback):
No module named 'vptq'

In [12]:
!ls -lrt /tmp/appworld/

total 136
drwxrwxrwx 1 root root  4096 Jun 18 04:16 src
drwxrwxrwx 2 root root  4096 Jun 18 04:16 scripts
-rwxrwxrwx 1 root root   133 Jun 18 04:16 pytest.ini
-rwxrwxrwx 1 root root  7268 Jun 18 04:16 pyproject.toml
drwxrwxrwx 2 root root  4096 Jun 18 04:16 notebooks
drwxrwxrwx 2 root root  4096 Jun 18 04:16 images
-rwxrwxrwx 1 root root  1421 Jun 18 04:16 dockerfile
-rwxrwxrwx 1 root root   553 Jun 18 04:16 README.pypi.md
-rwxrwxrwx 1 root root 65338 Jun 18 04:16 README.md
-rwxrwxrwx 1 root root 11357 Jun 18 04:16 LICENSE
drwxrwxrwx 4 root root  4096 Jun 18 04:16 tests
drwxrwxrwx 6 root root  4096 Jun 18 04:16 generate
drwxrwxrwx 1 root root  4096 Jun 18 04:16 data
drwxrwxrwx 1 root root  4096 Jun 19 16:27 experiments


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
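The warning above names its own remedy: set `TOKENIZERS_PARALLELISM` explicitly. A minimal sketch, assuming it runs in a notebook cell before any tokenizer is used and before any `!` shell command forks the process:

```python
import os

# Disable tokenizers' internal parallelism so that forking the process
# (e.g. via a `!` shell command in a notebook) does not trigger the warning.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```

Setting it to `"true"` instead also silences the message, but the warning itself says parallelism is disabled after a fork to avoid deadlocks, so `"false"` is the more conservative choice here.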


In [10]:
!appworld run ci_react --override '{"config": {"agent": {"model_config": {"completion_method": "openai", "name": "Qwen/Qwen2.5-7B-Instruct", "base_url": "http://localhost:8000/v1"}, "logger_config": {"verbose": true}}}}'



[31m╭─[0m[31m────────────────────[0m[31m [0m[1;31mTraceback [0m[1;2;31m(most recent call last)[0m[31m [0m[31m─────────────────────[0m[31m─╮[0m
[31m│[0m [2;33m/tmp/appworld/src/appworld/common/[0m[1;33mutils.py[0m:[94m1924[0m in [92mensure_package_installed[0m  [31m│[0m
[31m│[0m                                                                              [31m│[0m
[31m│[0m   [2m1921 [0m                                                                      [31m│[0m
[31m│[0m   [2m1922 [0m[94mdef[0m[90m [0m[92mensure_package_installed[0m(module_name: [96mstr[0m) -> [94mNone[0m:               [31m│[0m
[31m│[0m   [2m1923 [0m[2m│   [0m[94mtry[0m:                                                              [31m│[0m
[31m│[0m [31m❱ [0m1924 [2m│   │   [0mimportlib.import_module(module_name)                          [31m│[0m
[31m│[0m   [2m1925 [0m[2m│   [0m[94mexcept[0m [96mModuleNotFoundError[0m [94mas[0m exception



In [1]:
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Step 1: Load the base model with standard HuggingFace class
base_model = AutoModelForCausalLM.from_pretrained(
    "/projects/llm-repo/models/Qwen/Qwen2.5-14B-Instruct",
    torch_dtype="auto",
    device_map="auto"  # Optional: only if needed
)

# Step 2: Load and merge LoRA adapter
model = PeftModel.from_pretrained(base_model, "/projects/neural-alchemists-ftf-hackathon/14b_results/checkpoint-3500")
model = model.merge_and_unload()

# Step 3: Save merged model
model.save_pretrained("merged_qwen25_14B")
tokenizer = AutoTokenizer.from_pretrained("/projects/llm-repo/models/Qwen/Qwen2.5-14B-Instruct")
tokenizer.save_pretrained("merged_qwen25_14B")

  from .autonotebook import tqdm as notebook_tqdm
2025-06-20 09:14:00.751462: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-06-20 09:14:00.762949: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1750410840.777252    3219 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1750410840.781358    3219 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-06-20 09:14:00.796669: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorF

[2025-06-20 09:14:20,259] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)


/opt/conda/envs/py311/compiler_compat/ld: cannot find -laio: No such file or directory
collect2: error: ld returned 1 exit status
/opt/conda/envs/py311/compiler_compat/ld: /usr/local/cuda/lib64/libcufile.so: undefined reference to `std::runtime_error::~runtime_error()@GLIBCXX_3.4'
/opt/conda/envs/py311/compiler_compat/ld: /usr/local/cuda/lib64/libcufile.so: undefined reference to `__gxx_personality_v0@CXXABI_1.3'
/opt/conda/envs/py311/compiler_compat/ld: /usr/local/cuda/lib64/libcufile.so: undefined reference to `std::ostream::tellp()@GLIBCXX_3.4'
/opt/conda/envs/py311/compiler_compat/ld: /usr/local/cuda/lib64/libcufile.so: undefined reference to `std::chrono::_V2::steady_clock::now()@GLIBCXX_3.4.19'
/opt/conda/envs/py311/compiler_compat/ld: /usr/local/cuda/lib64/libcufile.so: undefined reference to `std::string::_M_replace_aux(unsigned long, unsigned long, unsigned long, char)@GLIBCXX_3.4'
/opt/conda/envs/py311/compiler_compat/ld: /opt/conda/envs/py311/bin/../x86_64-conda-linux-gnu/sy

[2025-06-20 09:14:21,479] [INFO] [logging.py:107:log_dist] [Rank -1] [TorchCheckpointEngine] Initialized with serialization = False


Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.
Loading checkpoint shards: 100%|██████████| 8/8 [00:05<00:00,  1.37it/s]


('merged_qwen25_14B/tokenizer_config.json',
 'merged_qwen25_14B/special_tokens_map.json',
 'merged_qwen25_14B/vocab.json',
 'merged_qwen25_14B/merges.txt',
 'merged_qwen25_14B/added_tokens.json',
 'merged_qwen25_14B/tokenizer.json')