- Quebec City, Canada
- http://themlbook.com
Stars
⚡ TabPFN: Foundation Model for Tabular Data ⚡
A Simplified PyTorch Version of the Dreamer Algorithm
A completely customizable framework for building rich text editors. (Currently in beta.)
RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". So it's combining the best of RN…
A fast library for AutoML and tuning. Join our Discord: https://discord.gg/Cppx2vSPVP.
🔥Open Source No Code Web Data Extraction Platform. Turn Websites To APIs & Spreadsheets With No-Code Robots In Minutes🔥
Autonomous coding agent right in your IDE, capable of creating/editing files, executing commands, using the browser, and more with your permission every step of the way.
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
Real-time face swap and one-click video deepfake with only a single image
A Bulletproof Way to Generate Structured JSON from Language Models
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…
An implementation of Shazam's song recognition algorithm.
A vector search SQLite extension that runs anywhere!
Code for 'LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders'
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
Apps Script samples for Google Workspace products.
Use Large Language Models (LLM) in Google Sheets
🔥Highlighting the top ML papers every week.
Data validation using Python type hints
AI's query engine - Platform for building AI that can learn and answer questions over large scale federated data.
📰 Newspaper4k a fork of the beloved Newspaper3k. Extraction of articles, titles, and metadata from news websites.
⚡FlashRAG: A Python Toolkit for Efficient RAG Research (WWW2025 Resource)
[CCS'24] A dataset consisting of 15,140 ChatGPT prompts from Reddit, Discord, websites, and open-source datasets (including 1,405 jailbreak prompts).
Efficient few-shot learning with Sentence Transformers
Curated list of datasets and tools for post-training.
A fast inference library for running LLMs locally on modern consumer-class GPUs