PDF craft can convert PDF files into various other formats. This project will focus on processing PDF files of scanned books. The project has just started.

Python 948 63 Updated Mar 28, 2025

PaddlePaddle / PaddleMIX

Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high …

Python 603 193 Updated Mar 28, 2025

jina-ai / reader

Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/

TypeScript 8,414 658 Updated Mar 27, 2025

allenai / olmocr

Toolkit for linearizing PDFs for LLM datasets/training

Python 10,641 712 Updated Mar 28, 2025

wwbin2017 / bailing

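Bailing (百聆) is a GPT-4o-style voice chat bot built on ASR + LLM + TTS. It integrates strong large models such as DeepSeek R1, achieves latency as low as 800 ms, runs on low-spec machines such as Macs, and supports interruption.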

Python 1,002 175 Updated Mar 15, 2025

gabrielchua / open-notebooklm

Forked from knowsuchagency/pdf-to-podcast

Convert any PDF into a podcast episode!

Python 2,169 242 Updated Dec 7, 2024

microsoft / markitdown

Python tool for converting files and office documents to Markdown.

Python 41,623 1,975 Updated Mar 28, 2025

getomni-ai / zerox

OCR & Document Extraction using vision models

TypeScript 10,726 702 Updated Mar 28, 2025

open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Python 8,868 693 Updated Mar 3, 2025

comfyanonymous / ComfyUI

The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.

Python 72,701 7,881 Updated Mar 28, 2025

FunAudioLLM / CosyVoice

Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.

Python 12,494 1,252 Updated Mar 25, 2025

6drf21e / ChatTTS_Speaker

ChatTTS 2K Speaker Stability Score 🥇 & Categorized by Gender and Age 👧 & Audio Preview 🔈

Python 646 36 Updated Jul 2, 2024

lenML / Speech-AI-Forge

🍦 Speech-AI-Forge is a project developed around TTS generation model, implementing an API Server and a Gradio-based WebUI.

Python 1,142 149 Updated Mar 26, 2025

unslothai / unsloth

Finetune Llama 3.3, DeepSeek-R1, Gemma 3 & Reasoning LLMs 2x faster with 70% less memory! 🦥

Python 36,088 2,777 Updated Mar 27, 2025

coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Python 38,910 4,892 Updated Aug 16, 2024

OpenTalker / SadTalker

[CVPR 2023] SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

Python 12,523 2,334 Updated Jun 26, 2024

antgroup / echomimic_v2

[CVPR 2025] EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation

Python 3,413 396 Updated Feb 27, 2025

RVC-Boss / GPT-SoVITS

As little as 1 minute of voice data can be used to train a good TTS model! (few-shot voice cloning)

Python 43,310 4,815 Updated Mar 26, 2025

docling-project / docling

Get your documents ready for gen AI

Python 25,586 1,528 Updated Mar 29, 2025

2noise / ChatTTS

A generative speech model for daily dialogue.

Python 35,451 3,838 Updated Mar 14, 2025

SWivid / F5-TTS

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 10,824 1,496 Updated Mar 28, 2025

VikParuchuri / tabled

Detect and extract tables to markdown and csv

Python 734 49 Updated Jan 24, 2025

QuivrHQ / MegaParse

File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs.

Python 5,915 294 Updated Feb 21, 2025

HumanSignal / label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format

JavaScript 21,380 2,634 Updated Mar 29, 2025

littletomatodonkey / Augment-XY-CUT

An unofficial implementation of Augment-XY-CUT from XYLayoutLM

Python 28 4 Updated Jul 12, 2022

ppaanngggg / layoutreader

A faster LayoutReader model based on LayoutLMv3; sorts OCR bounding boxes into reading order.

Python 200 13 Updated May 23, 2024

TeamWiseFlow / wiseflow

Use LLMs to dig out what you care about from massive amounts of information and a variety of sources daily.

Python 7,199 1,295 Updated Mar 26, 2025

AlibabaResearch / AdvancedLiterateMachinery

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

C++ 1,673 190 Updated Dec 27, 2024

JaidedAI / EasyOCR

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, and Cyrillic.

Python 26,133 3,287 Updated Sep 24, 2024

VikParuchuri / surya

OCR, layout analysis, reading order, table recognition in 90+ languages

Python 16,986 1,108 Updated Mar 28, 2025

Nico nico-zck

