KerasFormers 🚀

📖 Introduction

KerasFormers is a collection of models with pretrained weights, built entirely with Keras 3. It supports a range of tasks, including classification, object detection (DETR, RT-DETR, RT-DETRv2, RF-DETR, D-FINE, OWL-ViT, OWLv2), segmentation (SAM, SAM2, SAM3, SegFormer, DeepLabV3, EoMT, MaskFormer, Mask2Former, MobileViT-DeepLabV3), monocular depth estimation (Depth Anything V1, Depth Anything V2), feature extraction (DINO, DINOv2, DINOv3), vision-language modeling (CLIP, SigLIP, SigLIP2, MetaCLIP 2), speech recognition (Whisper, Speech2Text), and more. It includes hybrid architectures such as MaxViT alongside traditional CNNs and pure transformers. KerasFormers also provides custom layers and backbone support. Backbones are available in many pretrained weight variants, such as in1k, in21k, fb_dist_in1k, ms_in22k, fb_in22k_ft_in1k, ns_jft_in1k, aa_in1k, cvnets_in1k, augreg_in21k_ft_in1k, augreg_in21k, and many more.
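Because the library is built on Keras 3, the compute backend is chosen through Keras rather than through KerasFormers. The sketch below shows one way to do that, assuming the standard Keras 3 `KERAS_BACKEND` environment variable, which has to be set before `keras` is first imported. The `select_backend` helper and the `SUPPORTED_BACKENDS` tuple are hypothetical names introduced for illustration; they are not part of KerasFormers.

```python
import os

# Backends commonly documented for Keras 3. This tuple is an assumption of
# the sketch, not a list taken from KerasFormers itself.
SUPPORTED_BACKENDS = ("tensorflow", "jax", "torch")


def select_backend(name: str) -> str:
    """Set KERAS_BACKEND for this process (hypothetical helper).

    Call this before the first `import keras`; Keras reads the variable
    at import time.
    """
    if name not in SUPPORTED_BACKENDS:
        raise ValueError(
            f"unknown backend {name!r}; expected one of {SUPPORTED_BACKENDS}"
        )
    os.environ["KERAS_BACKEND"] = name
    return name


select_backend("jax")
# import keras  # uncomment once Keras 3 and the chosen backend are installed
```

Validating the name up front is a design choice of this sketch: it turns a typo into an immediate error instead of a failure at import time.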

⚡ Installation

From PyPI (recommended)

pip install -U kerasformers

From Source

pip install -U git+https://github.com/IMvision12/KerasFormers
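After either install route, a quick check can confirm that the package is visible to the current interpreter. The sketch below uses only the standard library's `importlib.metadata` and assumes the distribution name matches the one in the `pip install` command above (`kerasformers`). `installed_version` is a hypothetical helper, not a KerasFormers function.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Assumes the distribution is named as in the pip command above.
found = installed_version("kerasformers")
print(f"kerasformers: {found or 'not installed'}")
```

Querying the package metadata avoids importing the library itself, so the check works even before a Keras backend has been installed.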

📑 Documentation

Per-model guides, with architecture notes, usage examples, and available pretrained weights, live in the docs/ folder: one page per model across every supported task (classification, object detection, segmentation, depth estimation, feature extraction, vision-language, and speech recognition). Classification backbones share a single page because they all follow the same XModel / XImageClassify two-class structure; every other model has its own page. Browse docs/ for the complete, up-to-date list.

📑 Models

📜 License

This project leverages timm and transformers for converting pretrained weights from PyTorch to Keras. For licensing details, please refer to the respective repositories.

🌟 Credits

  • The Keras team for their powerful and user-friendly deep learning framework
  • The Transformers library for its robust tools for loading and adapting pretrained models
  • The pytorch-image-models (timm) project for pioneering many computer vision model implementations
  • All contributors to the original papers and architectures implemented in this library

Citing

BibTeX

@misc{gc2025kerasformers,
  author = {Gitesh Chawda},
  title = {KerasFormers},
  year = {2025},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/IMvision12/KerasFormers}}
}
