
PixLore

Welcome to the repository of our state-of-the-art image captioning model. We have combined the strengths of the BLIP-2 (Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models) model with LoRA (Low-Rank Adaptation of Large Language Models) to create an effective and precise image captioning tool. Our dataset, rich in image descriptions, has been automatically labeled using a combination of multi-modal models.

Results

Check out our GitHub Page! https://diegobonilla98.github.io/PixLore/

Project Overview

The main objective of this project is to generate detailed and accurate captions for a wide range of images. By combining the power of BLIP-2 and LoRA, we aim to make a significant leap in the image captioning domain.

Dataset Overview

Our dataset consists of 11,000 examples that have been automatically labeled using an assortment of multi-modal models.
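As a rough illustration of what "automatic labeling with a multi-modal model" can look like (this is not the repository's actual labeling pipeline, and the checkpoint and file path below are assumptions), a single COCO image can be captioned with an off-the-shelf vision-language model:

```python
# Hedged illustration: auto-captioning one image with an off-the-shelf
# multi-modal model. Checkpoint and image path are illustrative assumptions.
from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor

checkpoint = "Salesforce/blip-image-captioning-large"  # assumed labeling model
processor = BlipProcessor.from_pretrained(checkpoint)
model = BlipForConditionalGeneration.from_pretrained(checkpoint)

image = Image.open("coco_sample.jpg").convert("RGB")  # illustrative path
inputs = processor(images=image, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=60)
caption = processor.decode(output_ids[0], skip_special_tokens=True)
print(caption)
```

In practice, several such models can be run over the same image and their outputs combined into a single rich description.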

Data Structure

Images: High-quality images randomly sampled from the COCO dataset.
Captions: Rich and detailed descriptions automatically generated for each image.
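A minimal sketch of how such (image, caption) pairs could be wrapped for training follows; the field names and processor call are assumptions for illustration, not the repository's actual loader.

```python
# Hypothetical dataset wrapper around {"image": path, "caption": text} records.
from PIL import Image
from torch.utils.data import Dataset


class CaptionDataset(Dataset):
    def __init__(self, records, processor):
        self.records = records      # list of {"image": path, "caption": text}
        self.processor = processor  # e.g. a Blip2Processor

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        rec = self.records[idx]
        image = Image.open(rec["image"]).convert("RGB")
        # The processor prepares pixel values for the vision encoder and
        # tokenizes the caption for the language model.
        inputs = self.processor(
            images=image,
            text=rec["caption"],
            return_tensors="pt",
            padding="max_length",
            max_length=64,
            truncation=True,
        )
        return {k: v.squeeze(0) for k, v in inputs.items()}
```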

Model Architecture

We employ the BLIP-2 model architecture, known for effectively combining the strengths of frozen image encoders and large language models. To further enhance its performance, we fine-tune the model using LoRA, a technique that provides low-rank adaptation for large language models.
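The sketch below shows how LoRA adapters can be attached to a BLIP-2 checkpoint with Hugging Face transformers and peft. It is a minimal example under assumed settings; the base checkpoint, target modules, and LoRA hyperparameters are illustrative and may differ from those used in this project.

```python
# Minimal sketch: attaching LoRA adapters to a BLIP-2 model.
# Checkpoint name and LoRA hyperparameters are illustrative assumptions.
import torch
from transformers import Blip2ForConditionalGeneration, Blip2Processor
from peft import LoraConfig, get_peft_model

model_name = "Salesforce/blip2-opt-2.7b"  # assumed base checkpoint
processor = Blip2Processor.from_pretrained(model_name)
model = Blip2ForConditionalGeneration.from_pretrained(
    model_name, torch_dtype=torch.float16
)

# Low-rank adapters on the language model's attention projections;
# the frozen image encoder is left untouched, as in BLIP-2.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    target_modules=["q_proj", "v_proj"],  # assumed target layers
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA weights are trainable
```

Because only the low-rank adapter weights are updated, the memory and compute cost of fine-tuning stays far below that of full fine-tuning while the pre-trained BLIP-2 weights remain frozen.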
