Sghosh1999/Computer-Vision-OCR-Florence2

Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

Model Summary

Welcome to the Florence-2 repository! This repository contains a Hugging Face transformers implementation of the Florence-2 model, developed by Microsoft.

Florence-2 is a vision foundation model that handles a diverse array of vision and vision-language tasks through a prompt-based approach. Given an image and a simple text prompt, it performs tasks such as captioning, object detection, and segmentation. It is trained on the FLD-5B dataset, which contains 5.4 billion annotations across 126 million images, and this scale is what supports its multi-task learning. Because every task is cast as sequence-to-sequence generation, the same model performs well in both zero-shot and fine-tuned settings, making it a competitive and versatile vision foundation model.
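The prompt-based approach can be illustrated with a minimal sketch. It assumes the `microsoft/Florence-2-large` checkpoint on the Hugging Face Hub and follows the usage pattern from that model card; it is not necessarily how this repository's application is written. The helper names `TASK_PROMPTS`, `build_prompt`, and `run_task` are hypothetical and exist only for this example, and the task tokens shown are a subset of those listed on the model card.

```python
from typing import Optional

# Subset of task prompts from the Florence-2 model card.
# The dictionary keys are hypothetical, chosen for readability.
TASK_PROMPTS = {
    "caption": "<CAPTION>",
    "detailed_caption": "<DETAILED_CAPTION>",
    "object_detection": "<OD>",
    "ocr": "<OCR>",
    "ocr_with_region": "<OCR_WITH_REGION>",
    "referring_segmentation": "<REFERRING_EXPRESSION_SEGMENTATION>",
}


def build_prompt(task: str, text_input: Optional[str] = None) -> str:
    """Return the task token, optionally followed by extra text input."""
    if task not in TASK_PROMPTS:
        raise ValueError(f"unknown task: {task!r}")
    token = TASK_PROMPTS[task]
    return token if text_input is None else token + text_input


def run_task(image_path: str, task: str = "ocr",
             text_input: Optional[str] = None,
             model_id: str = "microsoft/Florence-2-large"):
    """Run one task on one image and return the parsed result."""
    # Heavy imports stay local so build_prompt works without them installed.
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    # Assumption: the checkpoint ships custom code, hence trust_remote_code.
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=build_prompt(task, text_input),
                       images=image, return_tensors="pt")
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3,
    )
    # Keep special tokens: the post-processor needs them to parse regions.
    raw = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    return processor.post_process_generation(
        raw, task=TASK_PROMPTS[task], image_size=(image.width, image.height)
    )
```

Calling `run_task("receipt.png", task="ocr")` (with a hypothetical image path) would download the checkpoint on first use and return the recognized text; switching `task` to `"object_detection"` reuses the same model with a different prompt.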

For more details, you can read the technical paper.

Features

  • Prompt-based Approach: Handles a wide range of vision tasks with simple text prompts.
  • Multi-task Learning: Leverages the extensive FLD-5B dataset to master multiple tasks.
  • Sequence-to-Sequence Architecture: Excels in zero-shot and fine-tuned settings.
  • Vision and Vision-Language Tasks: Capable of captioning, object detection, segmentation, and more.
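To show how a sequence-to-sequence model can express region outputs such as detections, here is a small sketch of decoding boxes from generated text. It assumes coordinates are emitted as `<loc_N>` tokens quantized into 1000 bins, as described in the Florence-2 paper; the function `parse_boxes`, the constant `LOC_BINS`, and the simple scaling rule are illustrative assumptions. In practice the processor's own post-processing should be used instead.

```python
import re
from typing import List, Tuple

LOC_BINS = 1000  # assumption: coordinates are quantized into 1000 bins

# A label followed by exactly four location tokens (x1, y1, x2, y2).
_BOX_PATTERN = re.compile(r"([^<]+)((?:<loc_\d+>){4})")

Box = Tuple[float, float, float, float]


def parse_boxes(text: str, width: int, height: int) -> List[Tuple[str, Box]]:
    """Convert 'label<loc_..>x4' spans into pixel-space boxes.

    Simplification: each label is assumed to carry a single box.
    """
    results = []
    for label, locs in _BOX_PATTERN.findall(text):
        x1, y1, x2, y2 = (int(n) for n in re.findall(r"<loc_(\d+)>", locs))
        # Scale bin indices back to pixel coordinates.
        box = (
            x1 * width / LOC_BINS,
            y1 * height / LOC_BINS,
            x2 * width / LOC_BINS,
            y2 * height / LOC_BINS,
        )
        results.append((label.strip(), box))
    return results
```

Because boxes are just tokens in the output sequence, detection, grounding, and OCR with regions can share one decoder and one training objective with captioning.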

About

- OCR application using Microsoft Florence-2
