Insights: huggingface/transformers
Overview
62 Pull requests merged by 38 people
-
[FIX] Save speed metrics to logs
#38136 merged
May 15, 2025 -
Omit creation of positional IDs within ESM if applicable
#38089 merged
May 15, 2025 -
disable deepspeed when setting up fake trainer
#38101 merged
May 15, 2025 -
enable trainer test cases on xpu
#38138 merged
May 15, 2025 -
Hotfix: Flash Attention 2 support in Pixtral
#38146 merged
May 15, 2025 -
[generate] Run custom generation code from the Hub
#36405 merged
May 15, 2025 -
🔴 Remove head mask in generative models
#35786 merged
May 15, 2025 -
enable csm integration cases on xpu, all passed
#38140 merged
May 15, 2025 -
[Qwen3] Qwen3 MoE add tp plan for expert mlps
#38135 merged
May 15, 2025 -
Fix incorrect attention mask truncate in WhisperFlashAttention2
#36477 merged
May 14, 2025 -
enable d_fine finetuning properly
#37962 merged
May 14, 2025 -
Add my username to `run_slow` whitelist
#38126 merged
May 14, 2025 -
[docs] add uv installation instructions for source builds
#37968 merged
May 14, 2025 -
Update trainer.md
#38113 merged
May 14, 2025 -
Add config validation and style tweaks
#37589 merged
May 14, 2025 -
Fix auto batch size finder test
#38125 merged
May 14, 2025 -
[video processor] fix tests
#38104 merged
May 14, 2025 -
enable finegrained_fp8 and granite_speech cases on XPU
#38036 merged
May 14, 2025 -
Fix description and formatting errors in code docs
#38074 merged
May 13, 2025 -
Add style bot
#38102 merged
May 13, 2025 -
[CSM] update test for t4 runners
#38110 merged
May 13, 2025 -
Add Fast Image Processor for vilt
#37304 merged
May 13, 2025 -
Fix InternVL interpolate_pos_encoding and add to video_processing_auto
#38092 merged
May 13, 2025 -
fix `check_bad_commit.py` gives wrong results
#38107 merged
May 13, 2025 -
[bug] fix llava processor to calculate unpadding size correctly
#37988 merged
May 13, 2025 -
Fix `past_key_values` type hint in model output types
#37953 merged
May 13, 2025 -
Fix bug in prefill_chunk_size that ignores disable_compile flag
#38067 merged
May 13, 2025 -
[smolvlm] skip the test
#38099 merged
May 13, 2025 -
Disable report callbacks for certain training tests
#38088 merged
May 13, 2025 -
fix: Propagate `lr_scheduler_kwargs` options to create LR Scheduler when LayerWiseDummyOptimizer is used
#34559 merged
May 13, 2025 -
add timeout for downloading the `librispeech_asr` dataset
#38073 merged
May 13, 2025 -
update `require_read_token`
#38093 merged
May 13, 2025 -
Refactor image processor phi4
#36976 merged
May 12, 2025 -
uninstall `kernels` from docker images
#38083 merged
May 12, 2025 -
update seed_worker to set seed based on worker_id and rank
#37980 merged
May 12, 2025 -
Fix tot update in trainer
#37923 merged
May 12, 2025 -
fix the inconsistent docstring in apply_chat_template
#38069 merged
May 12, 2025 -
chore(qwen2): display warning log only when sliding window attention …
#36316 merged
May 12, 2025 -
Fix mt5 test on AMD devices
#38081 merged
May 12, 2025 -
Add cuda graphs
#38059 merged
May 12, 2025 -
docs: fix md style
#38057 merged
May 12, 2025 -
Add AMD expectation to test_gpt2_sample
#38079 merged
May 12, 2025 -
Fix OneFormer integration test
#38016 merged
May 12, 2025 -
[`chat`] generate parameterization powered by `GenerationConfig` and UX-related changes
#38047 merged
May 12, 2025 -
[VLM] fix loading issues
#38051 merged
May 12, 2025 -
🔴 Video processors as a separate class
#35206 merged
May 12, 2025 -
fix(conversion): Fix size mismatch error during TF->PT model loading
#38014 merged
May 10, 2025 -
enable generation fsdp/utils cases on XPU
#38009 merged
May 9, 2025 -
Fix linalg.norm for ConvNextV2
#38015 merged
May 9, 2025 -
Fix cache update!
#38046 merged
May 9, 2025 -
Fix reduce-labels in BEIT Fast Image Processor
#38042 merged
May 9, 2025 -
Re-Enable "Trigger CircleCI via GitHub Actions when ready for review" (#37885)
#38041 merged
May 9, 2025 -
Support for version spec in requires & arbitrary mismatching depths across folders
#37854 merged
May 9, 2025 -
Do not erase a cache_position passed explicitly to generate(), if there is one
#37986 merged
May 9, 2025 -
Disable "Trigger CircleCI via GitHub Actions when ready for review"
#38038 merged
May 9, 2025 -
Trigger CircleCI via GitHub Actions when ready for review
#37885 merged
May 9, 2025 -
[Temporary] Log some information in some pytest/pluggy internal places
#37996 merged
May 9, 2025 -
enable utils test cases on XPU
#38005 merged
May 9, 2025 -
make mistral3 pass on xpu
#37882 merged
May 9, 2025 -
fix document masking for chunked attention
#37429 merged
May 9, 2025 -
[`AutoDocstring`] Based on inspect parsing of the signature
#33771 merged
May 8, 2025
49 Pull requests opened by 38 people
-
Update Loss Functions to Accept Tensor num_items_in_batch
#38029 opened
May 8, 2025 -
Add `TemplateConstraint` and `OrdredConstraint` features (#27706)
#38030 opened
May 8, 2025 -
check github actions 3
#38044 opened
May 9, 2025 -
[fix] sliding window attention mask
#38045 opened
May 9, 2025 -
Better pipeline type hints ✨
#38049 opened
May 9, 2025 -
Handling Overlapping Annotations in Mask2Former by A Small Trick
#38054 opened
May 9, 2025 -
SQuat cache implementation
#38055 opened
May 9, 2025 -
[SAM-HQ] Update names in the docs
#38058 opened
May 10, 2025 -
Improved cache docs
#38060 opened
May 10, 2025 -
Fix broken example generation script for Llama3
#38062 opened
May 10, 2025 -
Added scores in the streamer classes based on generation flag
#38064 opened
May 10, 2025 -
Updated the Model docs - for the ALIGN model
#38072 opened
May 11, 2025 -
Cache System Refactor: Layered Architecture
#38077 opened
May 12, 2025 -
[gemma3] fix bidirectional attention mask
#38080 opened
May 12, 2025 -
fix multi-image case for llava-onevision
#38084 opened
May 12, 2025 -
Add CB
#38085 opened
May 12, 2025 -
Refactor `MambaCache` to `modeling_mamba.py` (parity with Zamba)
#38086 opened
May 12, 2025 -
Add optional RMSNorm support to BitNet quantization (config + layers)
#38087 opened
May 12, 2025 -
Refactor `get_XXX_dataloader` from Trainer
#38090 opened
May 12, 2025 -
In Llama4 fix wrongly inverted causal attention mask when using SDPA implementation
#38094 opened
May 12, 2025 -
Fix amp deprecation issue
#38100 opened
May 13, 2025 -
Fix incorrect batching audio index calculation for Phi-4-Multimodal
#38103 opened
May 13, 2025 -
[video processors] support frame sampling within processors
#38105 opened
May 13, 2025 -
Better typing in src/transformers/training_args.py
#38106 opened
May 13, 2025 -
[`Attention`] Refactor Attention Interface for Bart-based Models and Enable Flex Attention
#38108 opened
May 13, 2025 -
Support TP for save_pretrained()
#38111 opened
May 13, 2025 -
Force real tensors and clone state_dict in src/transformers/modeling_utils.py
#38114 opened
May 13, 2025 -
Minor llama4 fixes
#38123 opened
May 14, 2025 -
[phi-4] add processor tests
#38124 opened
May 14, 2025 -
[`compile`] re-enable for Qwen-VL models
#38127 opened
May 14, 2025 -
🚨🚨🚨 [pipelines] update defaults in pipelines that can `generate`
#38129 opened
May 14, 2025 -
Make HF implementation match original OLMo 2 models for lower precisions
#38131 opened
May 14, 2025 -
Add AMD MI300 CI caller leveraging self-hosted runner scale set workflow in hf-workflows
#38132 opened
May 14, 2025 -
Skip non-selected experts for qwen3_moe
#38133 opened
May 15, 2025 -
Fix error in calculating `cache_position` with past_length for Chatglm and Mamba model
#38134 opened
May 15, 2025 -
Fix FSDP + llava-next/llava-onevision
#38141 opened
May 15, 2025 -
[omni modality] support composite processor config
#38142 opened
May 15, 2025 -
add dots1
#38143 opened
May 15, 2025 -
[VLMs] add helpers for get/set embedding
#38144 opened
May 15, 2025 -
remove unhandled parameter
#38145 opened
May 15, 2025 -
[omni modality] support composite preprocessor config
#38149 opened
May 15, 2025 -
Allow qwen/emu3 to process low res images
#38150 opened
May 15, 2025 -
Fix Qwen2.5 Omni `SinusoidsPositionEmbedding` precision
#38151 opened
May 15, 2025 -
Support for transformers explicit filename
#38152 opened
May 15, 2025 -
[core] support tensor-valued _extra_state values in `from_pretrained`
#38155 opened
May 15, 2025 -
Avoid incorrect generations for KV caches containing more than sliding_window tokens
#38156 opened
May 15, 2025 -
Add Idefics2/3 and SmolVLM Fast image processors + improvements for fast image processors
#38157 opened
May 15, 2025 -
[Examples] Add Comprehensive GPT2 vs DistilGPT2 Comparison with Perplexity and Benchmarks
#38158 opened
May 15, 2025 -
Fix handling of slow/fast image processors in image_processing_auto.py
#38161 opened
May 15, 2025
48 Issues closed by 17 people
-
Multiple processor classes have input side-effects
#36865 closed
May 15, 2025 -
transformers has no attribute TFFlorence2ForConditionalGeneration
#37235 closed
May 15, 2025 -
Llama4TextExperts module implementation
#37325 closed
May 15, 2025 -
BatchEncoding.to(device, dtype) could be worked!!
#38096 closed
May 15, 2025 -
Tensor Parallelism with Quantized Models
#38122 closed
May 14, 2025 -
Inconsistency in installation instructions for `venv` and `uv`
#37956 closed
May 14, 2025 -
Potential bug in Qwen 2/2.5 VL Image Preprocessor
#38003 closed
May 14, 2025 -
`output_hidden_states` only return part of hidden_state when setting `device_map="auto"`
#36636 closed
May 14, 2025 -
Unable to load google/siglip2-so400m-patch14-384/
#36845 closed
May 14, 2025 -
OSError: meta-llama/Llama-4-Scout-17B-16E-Instruct does not appear to have a file named X
#37314 closed
May 14, 2025 -
Versions greater than 4.49 are not compatible with Ascend NPU
#37992 closed
May 12, 2025 -
Different DataLoader worker share the same seed and lost randomness
#37932 closed
May 12, 2025 -
[Trainer] tot update steps is incorrect
#37777 closed
May 12, 2025 -
transformers require torch >= 2.1.0 to run fp8 model, but im using 2.7.0
#38034 closed
May 12, 2025 -
Add GPT-2-climate
#20747 closed
May 12, 2025 -
Is there any plan to add kosmos-2 to the transformers.
#24671 closed
May 12, 2025 -
Add MobileViT v2
#22570 closed
May 12, 2025 -
[New model] RT-DETR
#26742 closed
May 12, 2025 -
Typo in modeling_utils.py causing checkpoint loading error with Qwen2.5-VL
#38070 closed
May 12, 2025 -
Qwen/Qwen2.5-VL-7B-Instruct not work [2025-05-10]
#38056 closed
May 12, 2025 -
Video Processor as a separate class
#33504 closed
May 12, 2025 -
Jitter Noise added to input being passed to experts in Switch Transformers
#33969 closed
May 12, 2025 -
opencv imshow stuck forever when importing transformer
#37239 closed
May 12, 2025 -
ed_video = input_tokens.index(video_token_id, st) ValueError: 151656 is not in list
#37240 closed
May 12, 2025 -
TypeError: 'NoneType' object cannot be interpreted as an integer
#37242 closed
May 12, 2025 -
Inconsistent results between torch and jax versions of DINOv2
#37246 closed
May 12, 2025 -
Incorrect word timestamps and word repetitions with Whisper-Large-v3-turbo model
#37248 closed
May 12, 2025 -
RuntimeError when loading InternVL3-14B model: Embedding size mismatch
#38033 closed
May 12, 2025 -
XLA FSDP V2 + TPU + T5 Family Models doesn't work
#35142 closed
May 11, 2025 -
LayerDrop broken in various Flax models (Whisper/BART/more...)
#35468 closed
May 11, 2025 -
llama code break with torch compile
#36484 closed
May 11, 2025 -
a logic error in _preprocess function of Qwen2VLImageProcessor Class
#37064 closed
May 11, 2025 -
Whether transformers Trainer support pipeline parallelism?
#37129 closed
May 11, 2025 -
Qwen FSDP model training hangs when some batches do not contain images
#37186 closed
May 11, 2025 -
Bug when using StaticCache in Qwen2.5 Inference with custom inputs_embeds and attention_masks
#37189 closed
May 11, 2025 -
Gemma3 Gradient Accumulation loss
#37197 closed
May 11, 2025 -
torch.compile graph break when tuning llama with FA2
#37199 closed
May 11, 2025 -
RWKV6-Finch-7B-HF crashes during inference
#37221 closed
May 11, 2025 -
Why does `transformers` load FA2 when it's not asked to do so?
#37227 closed
May 11, 2025 -
Request to add D-FINE
#35283 closed
May 11, 2025 -
Loading a Pytorch model from a Tensorflow saved model doesn't work
#37786 closed
May 10, 2025 -
Removing GenerateMixin inheritance from PreTrainedModel class results in Phi4 load fail
#38050 closed
May 10, 2025 -
Performance degradation on certain vision models from v4.51.*
#37748 closed
May 9, 2025 -
Swinv2Model reports an error when using the parameter use_absolute_embeddings
#37161 closed
May 9, 2025 -
qwen3-moe attention module is defined repeatedly.
#37813 closed
May 9, 2025
28 Issues opened by 25 people
-
`tie_word_embeddings` not saved on customized model
#38160 opened
May 15, 2025 -
Support `extra_state` attributes in from_pretrained
#38154 opened
May 15, 2025 -
Have to import cv2 and pop up a window first, or else it is stuck forever
#38139 opened
May 15, 2025 -
Speed metrics are not logged
#38137 opened
May 15, 2025 -
Emu3 precision regression
#38121 opened
May 14, 2025 -
phi-4-mm HF format
#38120 opened
May 14, 2025 -
Unable to quantize a pretrained SegFormer-B0 to int8 using Quanto
#38119 opened
May 14, 2025 -
Llama4 inference encounter unsupported op in dynamo ?
#38118 opened
May 14, 2025 -
OLMo and OLMo 2 models do not match original models for low precisions
#38117 opened
May 14, 2025 -
support static kv cache with torch.compile for qwen2vl
#38115 opened
May 13, 2025 -
[Bug] Phi-4-multimodal audio processor failed to process multiple audios with close length
#38098 opened
May 13, 2025 -
ImportError: cannot import name 'amp' from 'apex'
#38095 opened
May 13, 2025 -
transformers showing decoder model architecture detected so padding should be left
#38071 opened
May 11, 2025 -
Adding native support to load GGUF models using transformers
#38063 opened
May 10, 2025 -
Weights not initialized correctly when instantiating model with a pretrained backbone
#38061 opened
May 10, 2025 -
Attention mask for multi-image input in gemma3
#38053 opened
May 9, 2025 -
Modernbert 3D attention mask
#38040 opened
May 9, 2025 -
Trainer API doesn't stop after the training has been completed
#38039 opened
May 9, 2025 -
Removing the modification of loss value due to rounding off to 4 digits
#38032 opened
May 9, 2025 -
bug in new prefill_chunk_size implementation
#38028 opened
May 8, 2025
112 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Add Magma Agentic Model from Microsoft
#37267 commented on
May 13, 2025 • 64 new comments -
New cache tests and modular Hybrid Cache
#37972 commented on
May 14, 2025 • 25 new comments -
[core] Completely rewrite the masking logic for all attentions
#37866 commented on
May 15, 2025 • 18 new comments -
Add ColQwen2 to 🤗 transformers
#35778 commented on
May 13, 2025 • 14 new comments -
Bart: new cache format
#35314 commented on
May 15, 2025 • 12 new comments -
Superpoint fast image processor
#37804 commented on
May 15, 2025 • 8 new comments -
Add Fast Image Processor for mobileViT
#37143 commented on
May 12, 2025 • 8 new comments -
36978 | Fast image processor for DPT model
#37481 commented on
May 15, 2025 • 7 new comments -
[Validation] First implementation of `@strict` from `huggingface_hub`
#36534 commented on
May 15, 2025 • 6 new comments -
add profiler to trainer
#37889 commented on
May 15, 2025 • 6 new comments -
Enhance Model Loading By Providing Parallelism, Uses Optional Env Flag
#36835 commented on
May 15, 2025 • 4 new comments -
Support Kosmos-2.5
#31711 commented on
May 15, 2025 • 4 new comments -
Add args support for fast image processors
#37018 commented on
May 13, 2025 • 3 new comments -
Feat: save_pretrained for tensor parallel (and other parallelisms) models
#37919 commented on
May 15, 2025 • 3 new comments -
Fix Float64 RuntimeError on Integrated Graphics when using DirectML
#37735 commented on
May 12, 2025 • 2 new comments -
parallelism goes brrr
#37877 commented on
May 15, 2025 • 2 new comments -
Feat: add warnings for unused keys and rules in tensor parallel
#37893 commented on
May 15, 2025 • 2 new comments -
Adds use_repr to model_addition_debugger_context
#37984 commented on
May 12, 2025 • 2 new comments -
feat: support indivisible shards for TP model loading and TPlizing.
#37220 commented on
May 12, 2025 • 1 new comment -
fix: support grad clipping for TP through replicating non-sharded modules
#36132 commented on
May 15, 2025 • 1 new comment -
add fast image processor nougat
#37661 commented on
May 13, 2025 • 1 new comment -
Translating model_doc/bert.md to Chinese
#37806 commented on
May 14, 2025 • 1 new comment -
make Llama4TextMoe forward more readable
#37529 commented on
May 12, 2025 • 0 new comments -
internalize build_inputs_with_special_tokens and prepare_for_model
#37522 commented on
May 15, 2025 • 0 new comments -
Docs: fix docstrings for Gemma3 modeling
#37534 commented on
May 9, 2025 • 0 new comments -
Add callback to monitor progress in whisper transcription
#37483 commented on
May 14, 2025 • 0 new comments -
Mllama fast image processor
#37539 commented on
May 15, 2025 • 0 new comments -
Inherited CausalLM Tests
#37590 commented on
May 15, 2025 • 0 new comments -
Fix interpolation of convnext image processor
#37460 commented on
May 14, 2025 • 0 new comments -
[Cache] Support compilable cache reuse with smaller batch sizes
#37394 commented on
May 15, 2025 • 0 new comments -
Add `segmentation_maps` support to MobileNetV2ImageProcessor
#37312 commented on
May 9, 2025 • 0 new comments -
Add Fast Image Processor for Chameleon
#37140 commented on
May 13, 2025 • 0 new comments -
Improve typing in TrainingArgument
#36944 commented on
May 13, 2025 • 0 new comments -
fix unexpected kws of input_ids when setup no speech detection of whisper
#36809 commented on
May 13, 2025 • 0 new comments -
RuntimeError when converting and saving Flax ViT model to PyTorch
#37999 commented on
May 12, 2025 • 0 new comments -
Pass `eps` to `Mistral3RMSNorm`
#38026 commented on
May 13, 2025 • 0 new comments -
[ESM] Add flash-attention-2 backend for ESM-2
#38023 commented on
May 15, 2025 • 0 new comments -
Qwen2.5-Omni: Update modeling_qwen2_5_omni.py to fix error when loading quantized weights with AutoAWQ.
#38013 commented on
May 12, 2025 • 0 new comments -
proof of concept for using dataset of test cases for tokenizer tests
#37994 commented on
May 13, 2025 • 0 new comments -
update loss computation in modeling code
#37993 commented on
May 15, 2025 • 0 new comments -
CI result inspector util
#37976 commented on
May 14, 2025 • 0 new comments -
Include output embedding as well with `include_embedding` flag
#37935 commented on
May 13, 2025 • 0 new comments -
Fix wrong example in grounding dino
#37921 commented on
May 10, 2025 • 0 new comments -
support MiniCPM-o2.6
#37917 commented on
May 15, 2025 • 0 new comments -
Feat: Add class_proba option to semantic segmentation post-processing
#37904 commented on
May 13, 2025 • 0 new comments -
Get our efficiency back
#37884 commented on
May 9, 2025 • 0 new comments -
[WIP] Perception lm
#37878 commented on
May 14, 2025 • 0 new comments -
New bart model card
#37858 commented on
May 14, 2025 • 0 new comments -
Add z-loss to Bamba for v2
#37842 commented on
May 9, 2025 • 0 new comments -
Added False case implementation for config.do_stable_layer_norm in FlaxWav2vec2Models
#37822 commented on
May 15, 2025 • 0 new comments -
general spm converter
#37763 commented on
May 15, 2025 • 0 new comments -
Stop autoconverting custom code checkpoints
#37751 commented on
May 9, 2025 • 0 new comments -
[VLMs] add helpers to get multimodal encodings
#37743 commented on
May 9, 2025 • 0 new comments -
refactor can_save_slow_tokenizer
#37722 commented on
May 9, 2025 • 0 new comments -
:rotating_light: :rotating_light: Fix custom code saving
#37716 commented on
May 15, 2025 • 0 new comments -
Add support for manually setting `head_dim` in Qwen2 MoE
#37643 commented on
May 9, 2025 • 0 new comments -
Add time-based evaluation strategy to Trainer
#37642 commented on
May 9, 2025 • 0 new comments -
Add PLM Model
#37634 commented on
May 9, 2025 • 0 new comments -
[WiP] Add EoMT Model
#37610 commented on
May 12, 2025 • 0 new comments -
Trainer Stuck at 0% Progress during Training on Multi-GPU Setup
#38008 commented on
May 12, 2025 • 0 new comments -
Trainer.training_step incorrectly normalizes mean token loss when n_gpu > 1
#37474 commented on
May 12, 2025 • 0 new comments -
Community contribution: Adding GGUF support for more architectures
#33260 commented on
May 12, 2025 • 0 new comments -
How to solve the error of converting Qwen onnx_model to tensorRT_model?
#37408 commented on
May 12, 2025 • 0 new comments -
Loading HQQ quantized models is broken since #35926
#37263 commented on
May 12, 2025 • 0 new comments -
Support multimodal models in vLLM with transformers backend
#37780 commented on
May 12, 2025 • 0 new comments -
Model implementation with Transformers and Hugging face hub.
#27532 commented on
May 12, 2025 • 0 new comments -
how to fine tune TrOCR on specifique langage guide.
#33106 commented on
May 12, 2025 • 0 new comments -
Patches for different modalities
#34585 commented on
May 12, 2025 • 0 new comments -
Refactor bert-based models to use global attention function
#37495 commented on
May 12, 2025 • 0 new comments -
FileNotFoundError when using SentenceTransformerTrainingArguments(load_best_model_at_end=True) and Peft
#34747 commented on
May 12, 2025 • 0 new comments -
Inconsistent Documentation for `dataset_index` Requirement Across ViTPose Models
#36773 commented on
May 12, 2025 • 0 new comments -
Since 4.50.0, saving and loading a Whisper model causes an error
#37172 commented on
May 12, 2025 • 0 new comments -
Issue: Unexpected Shape of logits: When Using generate() with num_return_sequences > 1
#37378 commented on
May 11, 2025 • 0 new comments -
ImportError: cannot import name '_flash_supports_window_size' from 'transformers.modeling_flash_attention_utils'
#37428 commented on
May 11, 2025 • 0 new comments -
facebook/opt-30b Cuda Allocation Error with version >= 4.50.0 code
#37436 commented on
May 11, 2025 • 0 new comments -
Processor multiprocessing error when load custom processor
#37637 commented on
May 10, 2025 • 0 new comments -
Make `argmax` in `post_process_semantic_segmentation` optional
#37715 commented on
May 10, 2025 • 0 new comments -
FP8 tensors not saved correctly
#37250 commented on
May 10, 2025 • 0 new comments -
TimeSformer assumes a fixed number of frames in its layers even though it interpolates temporal embeddings based on the input
#38027 commented on
May 10, 2025 • 0 new comments -
clarify the label shifting behavior of llama models when `labels` is given.
#32944 commented on
May 10, 2025 • 0 new comments -
A shallow copy in groundingdino
#37333 commented on
May 9, 2025 • 0 new comments -
Image Processor fails to process void segmentation maps
#30064 commented on
May 9, 2025 • 0 new comments -
Are there any plans to provide some performance analysis tools for transformers?
#36360 commented on
May 9, 2025 • 0 new comments -
Can't load Llama4 Processor
#37375 commented on
May 9, 2025 • 0 new comments -
Does Qwen_2_5_VL support variable length attention computation?
#38007 commented on
May 9, 2025 • 0 new comments -
[WIP] Add DINO DETR Model to HuggingFace Transformers
#36711 commented on
May 11, 2025 • 0 new comments -
Add Aimv2 model
#36625 commented on
May 13, 2025 • 0 new comments -
Add evolla rebase main
#36232 commented on
May 12, 2025 • 0 new comments -
[WIP] Add a dedicated tokenizer for byte level transformers
#36216 commented on
May 12, 2025 • 0 new comments -
[ModernBERT] Add CausalLM functionality to ModernBERT
#35946 commented on
May 14, 2025 • 0 new comments -
Add padding-free to bamba
#35861 commented on
May 12, 2025 • 0 new comments -
[Whisper] Pipeline: handle long form generation
#35750 commented on
May 9, 2025 • 0 new comments -
Integrate xlstm cleanly.
#35377 commented on
May 15, 2025 • 0 new comments -
[`ESM`] Add support for sdpa.
#34954 commented on
May 13, 2025 • 0 new comments -
Add Molmo (7B-D, 7B-O, 70B)
#33962 commented on
May 12, 2025 • 0 new comments -
Custom beam search scorer argument in generate function
#32097 commented on
May 14, 2025 • 0 new comments -
[Contributions Welcome] Add Fast Image Processors
#36978 commented on
May 15, 2025 • 0 new comments -
Cannot run backward with tensor parallel
#36657 commented on
May 15, 2025 • 0 new comments -
Why can't InternVL3-8B start vLLM after being converted to the Hugging Face format? It shows the error: `ValueError: 'limit_mm_per_prompt' is only supported for multimodal models.'
#38000 commented on
May 15, 2025 • 0 new comments -
Incorrect installation instructions
#37476 commented on
May 15, 2025 • 0 new comments -
4.51.3 is much faster than prevous version - do you see the same?
#37504 commented on
May 15, 2025 • 0 new comments -
Trainer num_tokens() function seem to be outdated and not correct
#37510 commented on
May 15, 2025 • 0 new comments -
Wrong KV cache update for sliding-window attention (SWA) layers when total sequence length reaches window size
#37574 commented on
May 15, 2025 • 0 new comments -
[Community contributions] Model cards
#36979 commented on
May 15, 2025 • 0 new comments -
Qwen2vl support for GGUF
#35282 commented on
May 14, 2025 • 0 new comments -
The "force_words_ids" does not seem to be available on llama4
#37478 commented on
May 14, 2025 • 0 new comments -
Convnext image preprocessor raises an AssertionError when comparing logits
#37461 commented on
May 13, 2025 • 0 new comments -
`last_cache_position` definition issue in hybrid SWA models
#37706 commented on
May 13, 2025 • 0 new comments -
FSDP Torch XLA vs. FSDPv2 (SMPD) Torch XLA checkpoint saving bug
#36004 commented on
May 13, 2025 • 0 new comments -
Whisper word-level timestamp extraction fails with beam search
#36093 commented on
May 13, 2025 • 0 new comments -
pytorch_utils.py > isin_mps_friendly > RuntimeError: Expected elements.dtype() == test_elements.dtype() to be true, but got false.
#37423 commented on
May 13, 2025 • 0 new comments -
Broken phi4 model
#37464 commented on
May 13, 2025 • 0 new comments