Conversation

@gabrielhuang (Collaborator) commented Jun 23, 2025

Changes:

  • default to the gpt-4 tokenizer if any exception is raised while loading the requested tokenizer (a minimal sketch of the resulting behavior follows)
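
For context, a minimal sketch of the patched helper, assuming the fallback returns tiktoken's gpt-4 encoding (the actual get_tokenizer in llm_utils.py may differ in its details):

import logging

import tiktoken
from transformers import AutoTokenizer

def get_tokenizer(model_name: str):
    try:
        return AutoTokenizer.from_pretrained(model_name)
    except Exception as e:
        # Any failure (missing repo, invalid repo id, network error, ...) now
        # falls back to the gpt-4 tokenizer instead of crashing the run.
        logging.info(f"Could not find a tokenizer for model {model_name}: {e} Defaulting to gpt-4.")
        return tiktoken.encoding_for_model("gpt-4")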

How to reproduce the error

from agentlab.llm.llm_utils import count_tokens
count_tokens("this is a test", model="anthropic/claude-3.5-sonnet:beta")

Error that this PR addresses. The model id 'anthropic/claude-3.5-sonnet:beta' is not a valid Hugging Face repo id (validate_repo_id rejects the ':'), so loading raises an HFValidationError, which the previous except OSError handler did not catch:

  File "/Users/gabriel.huang/code/BrowserGym/browsergym/experiments/src/browsergym/experiments/loop.py", line 417, in run
    action = step_info.from_action(agent)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gabriel.huang/code/BrowserGym/browsergym/experiments/src/browsergym/experiments/loop.py", line 205, in from_action
    self.action, self.agent_info = agent.get_action(self.obs.copy())
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gabriel.huang/code/AgentLab/src/agentlab/llm/tracking.py", line 61, in wrapper
    action, agent_info = get_action(self, obs)
                         ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gabriel.huang/code/AgentLab/src/agentlab/agents/generic_agent/generic_agent.py", line 115, in get_action
    human_prompt = dp.fit_tokens(
                   ^^^^^^^^^^^^^^
  File "/Users/gabriel.huang/code/AgentLab/src/agentlab/agents/dynamic_prompting.py", line 254, in fit_tokens
    max_prompt_tokens -= count_tokens(prompt, model=model_name) + 1  # +1 because why not ?
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gabriel.huang/code/AgentLab/src/agentlab/llm/llm_utils.py", line 197, in count_tokens
    enc = get_tokenizer(model)
          ^^^^^^^^^^^^^^^^^^^^
  File "/Users/gabriel.huang/code/AgentLab/src/agentlab/llm/llm_utils.py", line 190, in get_tokenizer
    return AutoTokenizer.from_pretrained(model_name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gabriel.huang/mamba/envs/raa6/lib/python3.12/site-packages/transformers/models/auto/tokenization_auto.py", line 950, in from_pretrained
    tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gabriel.huang/mamba/envs/raa6/lib/python3.12/site-packages/transformers/models/auto/tokenization_auto.py", line 782, in get_tokenizer_config
    resolved_config_file = cached_file(
                           ^^^^^^^^^^^^
  File "/Users/gabriel.huang/mamba/envs/raa6/lib/python3.12/site-packages/transformers/utils/hub.py", line 312, in cached_file
    file = cached_files(path_or_repo_id=path_or_repo_id, filenames=[filename], **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gabriel.huang/mamba/envs/raa6/lib/python3.12/site-packages/transformers/utils/hub.py", line 523, in cached_files
    _get_cache_file_to_return(path_or_repo_id, filename, cache_dir, revision) for filename in full_filenames
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gabriel.huang/mamba/envs/raa6/lib/python3.12/site-packages/transformers/utils/hub.py", line 140, in _get_cache_file_to_return
    resolved_file = try_to_load_from_cache(path_or_repo_id, full_filename, cache_dir=cache_dir, revision=revision)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gabriel.huang/mamba/envs/raa6/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py", line 106, in _inner_fn
    validate_repo_id(arg_value)
  File "/Users/gabriel.huang/mamba/envs/raa6/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py", line 160, in validate_repo_id
    raise HFValidationError(
huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: 'anthropic/claude-3.5-sonnet:beta'.

Description by Korbit AI

What change is being made?

Update the exception handling in get_tokenizer to catch all exceptions and log the specific error message, then default to "gpt-4".

Why are these changes being made?

Catching all exceptions, rather than only OSError, makes tokenizer loading robust to unexpected failures such as the HFValidationError above, and logging the specific exception preserves the detail needed for troubleshooting. The function then falls back to the known gpt-4 default.

@gabrielhuang self-assigned this Jun 23, 2025
@korbit-ai bot left a comment

Review by Korbit AI

Issue: Over-broad Exception Handling (category: Error Handling)

Files scanned: src/agentlab/llm/llm_utils.py


Comment on lines 189 to +192
     try:
         return AutoTokenizer.from_pretrained(model_name)
-    except OSError:
-        logging.info(f"Could not find a tokenizer for model {model_name}. Defaulting to gpt-4.")
+    except Exception as e:
+        logging.info(f"Could not find a tokenizer for model {model_name}: {e} Defaulting to gpt-4.")

Over-broad Exception Handling (category: Error Handling)

What is the issue?

Using a bare Exception catch is too broad and could mask critical errors that should be handled differently.

Why this matters

This could catch and ignore serious issues like memory errors or import errors that require different handling, potentially making debugging more difficult.

Suggested change

Catch specific exceptions that are expected in tokenizer loading:

try:
    return AutoTokenizer.from_pretrained(model_name)
except (OSError, ValueError) as e:
    logging.info(f"Could not find a tokenizer for model {model_name}: {e} Defaulting to gpt-4.")

@amanjaiswal73892 (Collaborator) left a comment

Thanks!

@amanjaiswal73892 merged commit 69e7216 into main Jun 27, 2025
6 of 7 checks passed
@amanjaiswal73892 deleted the fix_count_tokens branch June 27, 2025 21:00