Lists (24)
3d
build
communication
datasets
docs
framework
homeautomation
infrastructure
location
map
ml
model
monitoring
music
nlp
ocr
photo
security
speech
stt / ttstools
user
video
web
youngreader
Stars
Open-Sora: Democratizing Efficient Video Production for All
Toolkit for linearizing PDFs for LLM datasets/training
YOLOX is a high-performance anchor-free YOLO, exceeding yolov3~v5 with MegEngine, ONNX, TensorRT, ncnn, and OpenVINO supported. Documentation: https://yolox.readthedocs.io/
NVIDIA Ingest is an early access set of microservices for parsing hundreds of thousands of complex, messy unstructured PDFs and other enterprise documents into metadata and text to embed into retri…
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
Wyoming protocol server for the faster-whisper speech-to-text system
This project is a REST API wrapper for the whatsapp-web.js library, providing an easy-to-use interface to interact with the WhatsApp Web platform.
bridge between mattermost, IRC, gitter, xmpp, slack, discord, telegram, rocketchat, twitch, ssh-chat, zulip, whatsapp, keybase, matrix, microsoft teams, nextcloud, mumble, vk and more with REST API…
A text extraction library supporting PDFs, images, office documents and more
Open-source platform for extracting structured data from documents using AI.
Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-of-use, backed by research.
Use late-interaction multi-modal models such as ColPali in just a few lines of code.
An AI-powered research assistant that performs iterative, deep research on any topic by combining search engines, web scraping, and large language models. The goal of this repo is to provide the si…
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
The Free Software Media System - Server Backend & API
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second.
Vision Document Retrieval (ViDoRe): Benchmark. Evaluation code for the ColPali paper.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
SF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement
LLM-based autonomous agent that conducts deep local and web research on any topic and generates a long report with citations.
Convenience Docker images for Apache Tika Server