Files

.ipynb_checkpoints
AST
BERT
BEiT
BLIP-2
CANINE
CLIPSeg
Conditional DETR
ConvNeXT
DETA
DETR
DINO
DINOv2
DPT
Deformable-DETR
Depth Anything
DiT
Donut
Flux
GIT
GLPN
GPT-J-6B
Grounding DINO
GroupViT
Idefics2
- Fine_tune_Idefics2_for_JSON_extraction_use_cases_(PyTorch_Lightning).ipynb
- Fine_tune_Idefics2_for_multi_page_PDF_question_answering_on_DUDE.ipynb
- README.md
ImageGPT
InstructBLIP
KOSMOS-2
LLaVA-NeXT-Video
LLaVa-NeXT
LLaVa
LUKE
LayoutLM
LayoutLMv2
LayoutLMv3
LayoutXLM
LiLT
MarkupLM
Mask2Former
MaskFormer
Mistral
Nougat
OWLv2
OneFormer
PaliGemma
PerSAM
Perceiver
Pix2Struct
RT-DETR
SAM
SegFormer
SegGPT
SigLIP
SuperPoint
Swin2SR
T5
TAPAS
Table Transformer
TrOCR
UDOP
UPerNet
ViLT
ViP-LLaVa
ViTMAE
ViTMatte
VideoLLaVa
VideoMAE
VisionTransformer
X-CLIP
YOLOS
ZoeDepth
.DS_Store
.gitignore
CITATION.cff
HuggingFace_vision_ecosystem_overview_(June_2022).ipynb
LICENSE
README.md

Idefics2

Name		Name	Last commit message	Last commit date
parent directory ..
Fine_tune_Idefics2_for_JSON_extraction_use_cases_(PyTorch_Lightning).ipynb		Fine_tune_Idefics2_for_JSON_extraction_use_cases_(PyTorch_Lightning).ipynb
Fine_tune_Idefics2_for_multi_page_PDF_question_answering_on_DUDE.ipynb		Fine_tune_Idefics2_for_multi_page_PDF_question_answering_on_DUDE.ipynb
README.md		README.md

This folder contains notebooks regarding Idefics2, a powerful vision-language model developed by Hugging Face.

I just uploaded a similar notebook for LLaVa: it works just as well, and I removed the addition of special tokens to make the logic simpler. Can be done for Idefics2, too.
The notebook I currently include here is aimed for extraction use cases (image->text or JSON).

If you have a chatbot use case, I'd recommend taking a look at the experimental support for VLMs in the TRL library: