This repository has been archived by the owner on Jul 19, 2024. It is now read-only.

Update dependency transformers to ~=4.42.4 #11

Open
wants to merge 1 commit into main from renovate/transformers-4.x

Conversation

renovate[bot]
Contributor

@renovate renovate bot commented Feb 24, 2024

Mend Renovate

This PR contains the following updates:

| Package | Change |
| --- | --- |
| transformers | `~=4.35.0` -> `~=4.42.4` |

Release Notes

huggingface/transformers (transformers)

v4.42.4: Patch release v4.42.4

Compare Source

Mostly Gemma2 support for FA2 softcapping,

but also fixes for the sliding window with long contexts, and other typos.

I was off last week and could not get this out; thanks all for your patience 🥳

v4.42.3: Patch release v4.42.3

Compare Source

Make sure we have attention softcapping for the "eager" Gemma2 model

After experimenting, we noticed that softcapping is a must, mostly for the 27b model. So we're adding it back (it should have been there, but an error on my side made it disappear). Sorry all! 😭

  • Gemma capping is a must for big models (#31698)

v4.42.2: Patch release v4.42.2

Compare Source

Patch release

Thanks to our 2 contributors for their prompt fixes; this mostly applies to training and FA2!

v4.42.1: Patch release

Compare Source

Patch release for commit:

  • [HybridCache] Fix get_seq_length method (#31661)

v4.42.0: Gemma 2, RTDETR, InstructBLIP, LLaVa NeXT, New Model Adder

Compare Source

New model additions

Gemma-2

The Gemma2 model was proposed in Gemma2: Open Models Based on Gemini Technology and Research by Gemma2 Team, Google.
Gemma2 models are trained on 6T tokens and released in two versions, 2b and 7b.

The abstract from the paper is the following:

This work introduces Gemma2, a new family of open language models demonstrating strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Gemma2 outperforms similarly sized open models on 11 out of 18 text-based tasks, and we present comprehensive evaluations of safety and responsibility aspects of the models, alongside a detailed description of our model development. We believe the responsible release of LLMs is critical for improving the safety of frontier models, and for enabling the next wave of LLM innovations
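For illustration, here is a minimal sketch of loading a Gemma2 checkpoint through the standard auto classes; the repo id below is an assumption, not taken from these notes:

```python
# Minimal sketch: loading a Gemma2 checkpoint via the auto classes.
# The checkpoint id is a placeholder; substitute the released Gemma2 repo you use.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "google/gemma-2-2b"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```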


RTDETR

The RT-DETR model was proposed in DETRs Beat YOLOs on Real-time Object Detection by Wenyu Lv, Yian Zhao, Shangliang Xu, Jinman Wei, Guanzhong Wang, Cheng Cui, Yuning Du, Qingqing Dang, Yi Liu.

RT-DETR is an object detection model that stands for “Real-Time DEtection Transformer.” This model is designed to perform object detection tasks with a focus on achieving real-time performance while maintaining high accuracy. Leveraging the transformer architecture, which has gained significant popularity in various fields of deep learning, RT-DETR processes images to identify and locate multiple objects within them.
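As a rough sketch of how RT-DETR might be run for detection (the checkpoint id below is an assumption):

```python
# Hedged sketch of RT-DETR inference; the repo id and threshold are placeholders.
import torch
from PIL import Image
from transformers import RTDetrForObjectDetection, RTDetrImageProcessor

checkpoint = "PekingU/rtdetr_r50vd"  # assumed repo id
image = Image.open("example.jpg")

processor = RTDetrImageProcessor.from_pretrained(checkpoint)
model = RTDetrForObjectDetection.from_pretrained(checkpoint)

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert raw logits/boxes into (score, label, box) triplets above a threshold.
results = processor.post_process_object_detection(
    outputs, target_sizes=torch.tensor([image.size[::-1]]), threshold=0.5
)
```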


InstructBlip

The InstructBLIP model was proposed in InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning by Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi. InstructBLIP leverages the BLIP-2 architecture for visual instruction tuning.

InstructBLIP uses the same architecture as BLIP-2 with a tiny but important difference: it also feeds the text prompt (instruction) to the Q-Former.


LLaVa-NeXT-Video

The LLaVa-NeXT-Video model was proposed in LLaVA-NeXT: A Strong Zero-shot Video Understanding Model by Yuanhan Zhang, Bo Li, Haotian Liu, Yong Jae Lee, Liangke Gui, Di Fu, Jiashi Feng, Ziwei Liu, Chunyuan Li. LLaVa-NeXT-Video improves upon LLaVa-NeXT by fine-tuning on a mix of video and image data, thus increasing the model's performance on videos.

LLaVA-NeXT has surprisingly strong performance in understanding video content in a zero-shot fashion thanks to the AnyRes technique it uses. AnyRes naturally represents a high-resolution image as multiple images, and this generalizes to videos, since a video can be treated as a set of frames (similar to a set of images in LLaVa-NeXT). The current version applies AnyRes and is trained with supervised fine-tuning (SFT) on top of LLaVA-NeXT on video data to achieve better video understanding. The model is currently SOTA among open-source models on the VideoMME benchmark.
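A minimal sketch of how the LLaVa-NeXT-Video classes could be used; the checkpoint id, prompt format, and frame handling below are assumptions:

```python
# Hedged sketch of video inference with LLaVa-NeXT-Video; repo id and prompt
# template are placeholders, and real frames would come from a decoded video.
import numpy as np
from transformers import LlavaNextVideoForConditionalGeneration, LlavaNextVideoProcessor

checkpoint = "llava-hf/LLaVA-NeXT-Video-7B-hf"  # assumed repo id
processor = LlavaNextVideoProcessor.from_pretrained(checkpoint)
model = LlavaNextVideoForConditionalGeneration.from_pretrained(checkpoint)

# A video is passed as a stack of sampled frames (here: 8 dummy RGB frames).
video = np.zeros((8, 224, 224, 3), dtype=np.uint8)
prompt = "USER: <video>\nWhat is happening in this video? ASSISTANT:"

inputs = processor(text=prompt, videos=video, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=50)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```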

New model adder

A very significant change makes its way within the transformers codebase, introducing a new way to add models to transformers. We recommend reading the description of the PR below, but here is the gist of it:

The diff_converter tool is here to replace our old Copied from statements, while keeping our core transformers philosophy:

  • single model single file
  • explicit code
  • standardization of modeling code
  • readable and educative code
  • simple code
  • least amount of modularity

This additionally unlocks the ability to very quickly see the differences between new architectures that get developed. While many architectures are similar, the "single model, single file" policy can obfuscate the changes. With this diff converter, we want to make the changes between architectures very explicit.

Tool-use and RAG model support

We've made major updates to our support for tool-use and RAG models. We can now automatically generate JSON schema descriptions for Python functions which are suitable for passing to tool models, and we've defined a standard API for tool models which should allow the same tool inputs to be used with many different models. Models will need updates to their chat templates to support the new API, and we're targeting the Nous-Hermes, Command-R and Mistral/Mixtral model families for support in the very near future. Please see the updated chat template docs for more information.

If you are the owner of a model that supports tool use, but you're not sure how to update its chat template to support the new API, feel free to reach out to us for assistance with the update, for example on the Hugging Face Discord server. Ping Matt and yell key phrases like "chat templates" and "Jinja" and your issue will probably get resolved.
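For illustration, a hedged sketch of the new tool-use API: a plain Python function with type hints and a docstring is passed to the chat template, which turns it into a JSON schema tool description. The checkpoint id below is an assumption:

```python
# Sketch of tool-use chat templating, assuming a tool-capable chat model.
from transformers import AutoTokenizer

def get_current_temperature(location: str) -> float:
    """
    Get the current temperature at a location.

    Args:
        location: The city and country, e.g. "Paris, France"
    """
    return 22.0  # dummy implementation for the sketch

checkpoint = "NousResearch/Hermes-2-Pro-Llama-3-8B"  # assumed tool-capable checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

messages = [{"role": "user", "content": "What's the temperature in Paris right now?"}]

# Plain Python functions are passed via `tools=`; their signatures and docstrings
# are converted into JSON schema descriptions for the model.
prompt = tokenizer.apply_chat_template(
    messages,
    tools=[get_current_temperature],
    add_generation_prompt=True,
    tokenize=False,
)
```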

GGUF support

We extend support for GGUF files to offer fine-tuning within the Python/HF ecosystem, before converting models back to the GGUF/GGML/llama.cpp libraries.
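A minimal sketch of loading a GGUF file into the Python/HF ecosystem; the repo id and file name below are assumptions:

```python
# Sketch: loading GGUF quantized weights as a regular transformers model,
# which can then be fine-tuned and converted back to GGUF externally.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"   # assumed repo
gguf_file = "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"   # assumed file name

# The GGUF weights are dequantized into an in-memory transformers model.
tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(repo_id, gguf_file=gguf_file)
```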

Trainer improvements

A new optimizer has been added to the Trainer.
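The notes do not name the new optimizer here; for context, optimizers in the Trainer are selected through the optim field of TrainingArguments, as in this sketch:

```python
# Sketch of optimizer selection in the Trainer; "adamw_torch" is a standard value,
# swap in the newly added optimizer's key once you know it.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    optim="adamw_torch",
    learning_rate=2e-5,
)
```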

Quantization improvements

Several improvements have been made to quantization: a new cache (the quantized KV cache) is added, offering the ability to quantize the key/value cache of generative models and further reduce memory requirements.
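A hedged sketch of enabling the quantized KV cache during generation; the checkpoint id and exact config keys are assumptions and may vary by backend:

```python
# Sketch: generation with a quantized key/value cache, assuming the `quanto`
# backend is installed; config keys are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")

inputs = tokenizer("Quantized caches reduce memory because", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    cache_implementation="quantized",              # store past key/values in low precision
    cache_config={"backend": "quanto", "nbits": 4},
)
```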

Additionally, the documentation related to quantization has been entirely redone with the aim of helping users choose the best quantization method.

Examples

New instance segmentation examples have been added by @qubvel.

Notable improvements

As a notable improvement to the HF vision models that leverage backbones, we enable leveraging HF pretrained model weights as backbones, with the following API:

from transformers import MaskFormerConfig, MaskFormerForInstanceSegmentation

config = MaskFormerConfig(backbone="microsoft/resnet-50", use_pretrained_backbone=True)
model = MaskFormerForInstanceSegmentation(config)

Additionally, we thank @Cyrilvallez for diving into our generate method and greatly reducing its memory requirements.

Breaking changes

Remove ConversationalPipeline and Conversation object

Both the ConversationalPipeline and the Conversation object have been deprecated for a while and are now removed in 4.42.

The TextGenerationPipeline is recommended for this use case, and it now accepts chat inputs in the OpenAI API format.
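A minimal sketch of passing OpenAI-style chat messages directly to the text-generation pipeline; the checkpoint id is an assumption:

```python
# Sketch: chat-format input to the text-generation pipeline.
from transformers import pipeline

chat = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Tell me a joke about transformers."},
]

generator = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta")  # assumed checkpoint
result = generator(chat, max_new_tokens=64)
# The returned generated_text is the full conversation; the last entry is the reply.
print(result[0]["generated_text"][-1]["content"])
```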

Remove an accidental duplicate softmax application in FLAVA's attention

The duplicate softmax application in FLAVA's attention has been removed. This is likely to change the outputs slightly, so it is flagged with 🚨.

Idefics2's ignore_index attribute for the loss is updated to -100

out_indices from timm is updated

Recent updates to timm changed the type of the attribute model.feature_info.out_indices. Previously, out_indices would reflect the type passed on the create_model call, i.e. either tuple or list. Now, this value is always a tuple.

As lists are more useful and consistent for us -- we cannot save tuples in configs, they must be converted to lists first -- we instead choose to always cast out_indices to a list.

This could be a slight breaking change for users who create models and rely on out_indices being a tuple. As this only happens when a new model is created, and not when it is saved and reloaded (because of the config), it has a low chance of having much of an impact.
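For downstream code that relied on the old behaviour, a trivial defensive sketch is to normalize the value explicitly:

```python
# Defensive sketch: code that assumed out_indices was a tuple can normalize
# the value to a list, matching the new transformers behaviour.
out_indices = (2, 3, 4)          # whatever the backbone's feature_info reports
out_indices = list(out_indices)  # always work with a list, as configs require
```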

Datasets referenced in the quantization config are updated to remove references to datasets with restrictive licenses.

Bugfixes and improvements


Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR has been generated by Mend Renovate. View repository job log here.

@renovate renovate bot force-pushed the renovate/transformers-4.x branch from 29feddd to 761aa05 Compare March 1, 2024 03:42
@renovate renovate bot changed the title Update dependency transformers to ~=4.38.1 Update dependency transformers to ~=4.38.2 Mar 1, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from 761aa05 to 6ecc8a9 Compare March 21, 2024 02:25
@renovate renovate bot changed the title Update dependency transformers to ~=4.38.2 Update dependency transformers to ~=4.39.0 Mar 21, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from 6ecc8a9 to bbbcb5b Compare March 22, 2024 20:28
@renovate renovate bot changed the title Update dependency transformers to ~=4.39.0 Update dependency transformers to ~=4.39.1 Mar 22, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from bbbcb5b to 46c11b0 Compare March 28, 2024 19:17
@renovate renovate bot changed the title Update dependency transformers to ~=4.39.1 Update dependency transformers to ~=4.39.2 Mar 28, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from 46c11b0 to 6fa3fd1 Compare April 2, 2024 13:27
@renovate renovate bot changed the title Update dependency transformers to ~=4.39.2 Update dependency transformers to ~=4.39.3 Apr 2, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from 6fa3fd1 to 18e9cbf Compare April 18, 2024 16:42
@renovate renovate bot changed the title Update dependency transformers to ~=4.39.3 Update dependency transformers to ~=4.40.0 Apr 18, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from 18e9cbf to 7ce5fce Compare April 24, 2024 02:05
@renovate renovate bot changed the title Update dependency transformers to ~=4.40.0 Update dependency transformers to ~=4.40.1 Apr 24, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from 7ce5fce to ffb9bda Compare May 6, 2024 17:23
@renovate renovate bot changed the title Update dependency transformers to ~=4.40.1 Update dependency transformers to ~=4.40.2 May 6, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from ffb9bda to e03dd48 Compare May 17, 2024 23:03
@renovate renovate bot changed the title Update dependency transformers to ~=4.40.2 Update dependency transformers to ~=4.41.0 May 17, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from e03dd48 to bd69e0f Compare May 22, 2024 22:23
@renovate renovate bot changed the title Update dependency transformers to ~=4.41.0 Update dependency transformers to ~=4.41.1 May 22, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from bd69e0f to 3cd6d4e Compare May 30, 2024 18:36
@renovate renovate bot changed the title Update dependency transformers to ~=4.41.1 Update dependency transformers to ~=4.41.2 May 30, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from 3cd6d4e to 47b1293 Compare June 27, 2024 19:29
@renovate renovate bot changed the title Update dependency transformers to ~=4.41.2 Update dependency transformers to ~=4.42.1 Jun 27, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from 47b1293 to 448a14c Compare June 28, 2024 10:13
@renovate renovate bot changed the title Update dependency transformers to ~=4.42.1 Update dependency transformers to ~=4.42.2 Jun 28, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from 448a14c to c74abeb Compare June 28, 2024 19:34
@renovate renovate bot changed the title Update dependency transformers to ~=4.42.2 Update dependency transformers to ~=4.42.3 Jun 28, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from c74abeb to c1b8b77 Compare July 11, 2024 19:27
@renovate renovate bot changed the title Update dependency transformers to ~=4.42.3 Update dependency transformers to ~=4.42.4 Jul 11, 2024