Stars
Integrate the DeepSeek API into popular software
Use the Moondream 2 model to detect faces and their gaze directions in videos.
[CVPR 2025] Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
Industry-leading face manipulation platform
Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, images, and video data.
Official implementation of the paper "Watermark Anything with Localized Messages"
🎬 ScreenToGif allows you to record a selected area of your screen, edit and save it as a gif or video.
Python tool for converting files and office documents to Markdown.
[CVPR 2025] DEIM: DETR with Improved Matching for Fast Convergence
Segment Anything Model 2 CPP Wrapper for macOS and Ubuntu CPU/GPU
Retrieval and Retrieval-augmented LLMs
axinc-ai / segment-anything-2
Forked from facebookresearch/sam2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
High-performance C++ headers for real-time object detection and segmentation using YOLO models, leveraging ONNX Runtime and OpenCV for seamless integration. Supports multiple YOLO (v5, v7, v8, v9…
Implementation of the "Learn No to Say Yes Better" paper.
Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"
A Lightweight Face Recognition and Facial Attribute Analysis (Age, Gender, Emotion and Race) Library for Python
📄 A curated list of awesome .cursorrules files
Let your Claude be able to think
[TMLR 2025🔥] A survey of autoregressive models in vision.
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)
ModelScope: bring the notion of Model-as-a-Service to life.
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Effortless data labeling with AI support from Segment Anything and other awesome models.
Official code repo for the O'Reilly Book - "Hands-On Large Language Models"
CVHub520 / segment-anything-2
Forked from facebookresearch/sam2
The repository provides code for running image, video, and camera inference using SAM 2.