Updates in type-checking specifications have broken transformers' types #37928

Open
4 tasks done
thfrkielikone opened this issue May 2, 2025 · 1 comment

@thfrkielikone

System Info

  • transformers version: 4.52.0.dev0 (commit fa3c3f9; head of main branch as of writing)
  • Platform: macOS-15.4.1-arm64-arm-64bit-Mach-O
  • Python version: 3.13.2
  • Huggingface_hub version: 0.30.2
  • Safetensors version: 0.5.3
  • Accelerate version: not installed
  • Accelerate config: not found
  • DeepSpeed version: not installed
  • PyTorch version (GPU?): 2.7.0 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: <not relevant; type checking issue>

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

The following code runs correctly in the above environment (an empty venv), i.e. it exits without producing output:

from transformers import PreTrainedTokenizer, GenerationMixin, StoppingCriteriaList, StoppingCriteria, pipeline, Pipeline

When type checked with the newest pyright (1.1.400, installed from npm), it produces the following errors:

<snip>/repro.py
  <snip>/repro.py:1:58 - error: "PreTrainedTokenizer" is not exported from module "transformers"
    Import from "transformers.tokenization_utils" instead (reportPrivateImportUsage)
  <snip>/repro.py:1:79 - error: "GenerationMixin" is not exported from module "transformers"
    Import from "transformers.generation.utils" instead (reportPrivateImportUsage)
  <snip>/repro.py:1:96 - error: "StoppingCriteriaList" is not exported from module "transformers"
    Import from "transformers.generation.stopping_criteria" instead (reportPrivateImportUsage)
  <snip>/repro.py:1:118 - error: "StoppingCriteria" is not exported from module "transformers"
    Import from "transformers.generation.stopping_criteria" instead (reportPrivateImportUsage)
  <snip>/repro.py:1:136 - error: "pipeline" is not exported from module "transformers"
    Import from "transformers.pipelines" instead (reportPrivateImportUsage)
  <snip>/repro.py:1:146 - error: "Pipeline" is not exported from module "transformers"
    Import from "transformers.pipelines.base" instead (reportPrivateImportUsage)
6 errors, 0 warnings, 0 informations

This is due to pyright updating its definition of what counts as a publicly re-exported symbol; see the reportPrivateLocalImportUsage section of https://docs.basedpyright.com/dev/benefits-over-pyright/new-diagnostic-rules/. That page also links to the relevant PEP (https://peps.python.org/pep-0484/#stub-files), which specifies this. To quote the PEP:

Modules and variables imported into the stub are not considered exported from the stub unless the import uses the import ... as ... form or the equivalent from ... import ... as ... form. (UPDATE: To clarify, the intention here is that only names imported using the form X as X will be exported, i.e. the name before and after as must be the same.)
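
To make the quoted rule concrete, here is a minimal sketch that lists which imported names in a piece of source use the redundant X as X form. The function name explicitly_reexported is hypothetical, and the check covers only the rule quoted above; it is an assumption-laden simplification that ignores `__all__` and any other rules a type checker may apply.

```python
import ast


def explicitly_reexported(source: str) -> set[str]:
    """Return imported names that use the redundant `X as X` alias form.

    Hypothetical helper: it models only the PEP 484 stub-file rule quoted
    above and ignores `__all__` and other re-export conventions.
    """
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # `asname` is None when the import has no `as` clause.
                if alias.asname is not None and alias.asname == alias.name:
                    names.add(alias.name)
    return names


example = (
    "from .pipelines import Pipeline, pipeline as pipeline\n"
    "from .utils import foo as bar\n"
)
print(explicitly_reexported(example))
```

In this example only pipeline counts as re-exported: Pipeline has no alias, and foo as bar renames the symbol rather than re-exporting it under its own name.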

Looking at the documentation for e.g. the pipeline objects (https://huggingface.co/docs/transformers/v4.51.3/en/main_classes/pipelines#transformers.pipeline), we find this example code, which uses the same from transformers import pipeline form; I conclude that this is the expected usage:

import datasets
from transformers import pipeline
from transformers.pipelines.pt_utils import KeyDataset
from tqdm.auto import tqdm

pipe = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h", device=0)
dataset = datasets.load_dataset("superb", name="asr", split="test")

# KeyDataset (only *pt*) will simply return the item in the dict returned by the dataset item
# as we're not interested in the *target* part of the dataset. For sentence pair use KeyPairDataset
for out in tqdm(pipe(KeyDataset(dataset, "file"))):
    print(out)
    # {"text": "NUMBER TEN FRESH NELLY IS WAITING ON YOU GOOD NIGHT HUSBAND"}
    # {"text": ....}
    # ....

In my environment I get the same kind of error (plus an unrelated one, because I don't have datasets installed):

<snip>/pipeline.py
  <snip>/pipeline.py:1:8 - error: Import "datasets" could not be resolved (reportMissingImports)
  <snip>/pipeline.py:2:26 - error: "pipeline" is not exported from module "transformers"
    Import from "transformers.pipelines" instead (reportPrivateImportUsage)
2 errors, 0 warnings, 0 informations

Fixing this issue requires changing the imports in src/transformers/__init__.py. It looks like a tedious change that could be mostly automated. To resolve the pipeline error, the minimal change was something like this:

diff --git a/src/transformers/__init__.py b/src/transformers/__init__.py
index 691f8aad00..2cea8c0078 100644
--- a/src/transformers/__init__.py
+++ b/src/transformers/__init__.py
@@ -668,8 +668,8 @@ if TYPE_CHECKING:
         ZeroShotClassificationPipeline,
         ZeroShotImageClassificationPipeline,
         ZeroShotObjectDetectionPipeline,
-        pipeline,
     )
+    from .pipelines import pipeline as pipeline
     from .processing_utils import ProcessorMixin
 
     # Tokenization
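
The automation mentioned above could be sketched roughly as follows. This is not a tested patch for transformers: the function name add_redundant_aliases is hypothetical, it touches only relative imports (assuming those are the re-exports), and because it round-trips through ast.unparse (Python 3.9+) it discards comments and original formatting. A real change to __init__.py would need a formatting-preserving approach and would have to pass the project's own consistency checks.

```python
import ast


def add_redundant_aliases(source: str) -> str:
    """Rewrite relative `from .mod import X` into `from .mod import X as X`.

    Hypothetical helper. Absolute imports, star imports, and names that
    already carry an alias are left unchanged. Comments and formatting are
    lost because the result is regenerated with ast.unparse.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        # level > 0 means a relative import such as `from .pipelines import ...`
        if isinstance(node, ast.ImportFrom) and node.level > 0:
            for alias in node.names:
                if alias.name != "*" and alias.asname is None:
                    alias.asname = alias.name
    return ast.unparse(tree)


before = (
    "from typing import TYPE_CHECKING\n"
    "if TYPE_CHECKING:\n"
    "    from .pipelines import (\n"
    "        Pipeline,\n"
    "        pipeline,\n"
    "    )\n"
)
print(add_redundant_aliases(before))
```

The absolute import of TYPE_CHECKING is left alone on purpose, since only the package's own symbols are meant to be re-exported.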

Expected behavior

The above code, which runs correctly, should also type check cleanly.

@Rocketknight1
Member

Hi @thfrkielikone, we probably won't make any breaking changes just to fix type checking - our type checking philosophy is that it's "nice to have", but other factors are more important.

If you or anyone else wants to try making a PR to fix type checking, you can! However, we'll probably only accept it if it doesn't have any side-effects, and doesn't interact badly with any of our tooling / consistency checks.
