Skip to content

This repository demonstrates the data preparation and fine-tuning the IDEFICS Vision Language Model.

License

Notifications You must be signed in to change notification settings

NSTiwari/Fine-tune-IDEFICS-Vision-Language-Model

Repository files navigation

Fine tune Idefics2-8B Vision Language Model

This repository demonstrates the data preparation and fine-tuning the Idefics2-8B Vision Language Model.

Vision Language Model

Vision Language Models are multimodal models that learn from images and text, generating text outputs from image and text inputs. They excel in zero-shot capabilities, generalization, and various tasks like image recognition, question answering, and document understanding.

Dataset

Inference

Question: What the location address of NSDA?

Answer: ['1128 SIXTEENTH ST., N. W., WASHINGTON, D. C. 20036', '1128 sixteenth st., N. W., washington, D. C. 20036']

References & Resources:

About

This repository demonstrates the data preparation and fine-tuning the IDEFICS Vision Language Model.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published