ViLT

- Fine_tuning_ViLT_for_VQA.ipynb
- Inference_with_ViLT_(visual_question_answering).ipynb
- Masked_language_modeling_with_ViLT.ipynb
- README.md
- Using_ViLT_for_image_text_retrieval.ipynb
- ViLT_for_natural_language_visual_reasoning.ipynb

README.md

ViLT notebooks

This directory contains several notebooks that illustrate how to use NAVER AI Lab's ViLT, both for fine-tuning on custom data and for inference. It currently includes the following notebooks:

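- fine-tuning ViLT for visual question answering (VQA), based on the VQAv2 dataset (Fine_tuning_ViLT_for_VQA.ipynb)
- performing inference with ViLT for visual question answering (Inference_with_ViLT_(visual_question_answering).ipynb)
- masked language modeling (MLM) with a pre-trained ViLT model (Masked_language_modeling_with_ViLT.ipynb)
- performing inference with ViLT for image-text retrieval (Using_ViLT_for_image_text_retrieval.ipynb)
- performing inference with ViLT for natural language for visual reasoning (NLVR), based on the NLVR2 dataset (ViLT_for_natural_language_visual_reasoning.ipynb)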
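
As a quick orientation before opening the notebooks, the VQA inference item listed above can be sketched roughly as follows. This is a minimal sketch, not the notebook's code: it assumes the `transformers` library's `ViltProcessor` and `ViltForQuestionAnswering` classes and the `dandelin/vilt-b32-finetuned-vqa` checkpoint, neither of which this README names. The helper names `top_answer` and `answer_question` are hypothetical.

```python
def top_answer(logits, id2label):
    """Map a flat list of per-answer scores to the highest-scoring answer string."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return id2label[best]


def answer_question(image, question, checkpoint="dandelin/vilt-b32-finetuned-vqa"):
    """Answer a question about a PIL image with a VQA-fine-tuned ViLT model.

    The default checkpoint name is an assumption; swap in whichever
    VQA-fine-tuned ViLT checkpoint you actually use.
    """
    # Imported lazily so top_answer() works without torch/transformers installed.
    import torch
    from transformers import ViltForQuestionAnswering, ViltProcessor

    processor = ViltProcessor.from_pretrained(checkpoint)
    model = ViltForQuestionAnswering.from_pretrained(checkpoint)
    model.eval()

    # The processor tokenizes the question and resizes/normalizes the image.
    encoding = processor(image, question, return_tensors="pt")
    with torch.no_grad():
        logits = model(**encoding).logits  # shape: (1, number of answers)

    # VQA is treated as classification over a fixed answer vocabulary.
    return top_answer(logits[0].tolist(), model.config.id2label)
```

Calling `answer_question(Image.open("example.jpg"), "How many cats are there?")` with a PIL image (the file name here is a placeholder) would download the checkpoint on first use and return a single answer string from the model's answer vocabulary.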

All models can be found on the Hugging Face Hub.
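
To make the link between tasks and Hub models more concrete, here is a hedged sketch of a task-to-checkpoint lookup. The checkpoint names and class pairings below are assumptions based on the publicly listed `dandelin/vilt-b32-*` models and the `transformers` ViLT classes; this README does not state which checkpoints the notebooks use. `VILT_CHECKPOINTS` and `load_vilt` are hypothetical names.

```python
# Assumed mapping: task -> (transformers class name, Hub checkpoint id).
VILT_CHECKPOINTS = {
    "mlm": ("ViltForMaskedLM", "dandelin/vilt-b32-mlm"),
    "vqa": ("ViltForQuestionAnswering", "dandelin/vilt-b32-finetuned-vqa"),
    "retrieval": ("ViltForImageAndTextRetrieval", "dandelin/vilt-b32-finetuned-coco"),
    "nlvr2": ("ViltForImagesAndTextClassification", "dandelin/vilt-b32-finetuned-nlvr2"),
}


def load_vilt(task):
    """Download the processor and model for one of the tasks above."""
    # Imported lazily so the mapping can be inspected without transformers installed.
    import transformers

    class_name, checkpoint = VILT_CHECKPOINTS[task]
    model_cls = getattr(transformers, class_name)
    processor = transformers.ViltProcessor.from_pretrained(checkpoint)
    model = model_cls.from_pretrained(checkpoint)
    return processor, model
```

Each task uses a different model head on top of the same ViLT backbone, which is why the lookup stores the class name alongside the checkpoint id.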
