Starred repositories

PDF craft can convert PDF files into various other formats. This project will focus on processing PDF files of scanned books. The project has just started.

Python 948 63 Updated Mar 28, 2025

Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high …

Python 603 193 Updated Mar 28, 2025

Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/

TypeScript 8,414 658 Updated Mar 27, 2025

Toolkit for linearizing PDFs for LLM datasets/training

Python 10,641 712 Updated Mar 28, 2025

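百聆 is a GPT-4o-style voice chat bot built on an ASR+LLM+TTS pipeline. It integrates strong LLMs such as DeepSeek R1, reaches latency as low as 800 ms, runs on low-spec machines such as Macs, and supports interruption.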

Python 1,002 175 Updated Mar 15, 2025

Convert any PDF into a podcast episode!

Python 2,169 242 Updated Dec 7, 2024

Python tool for converting files and office documents to Markdown.

Python 41,623 1,975 Updated Mar 28, 2025

OCR & Document Extraction using vision models

TypeScript 10,726 702 Updated Mar 28, 2025

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Python 8,868 693 Updated Mar 3, 2025

The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.

Python 72,701 7,881 Updated Mar 28, 2025

Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.

Python 12,494 1,252 Updated Mar 25, 2025

ChatTTS 2K Speaker Stability Score 🥇 & Categorized by Gender and Age 👧 & Online Audio Preview 🔈

Python 646 36 Updated Jul 2, 2024

🍦 Speech-AI-Forge is a project developed around TTS generation model, implementing an API Server and a Gradio-based WebUI.

Python 1,142 149 Updated Mar 26, 2025

Finetune Llama 3.3, DeepSeek-R1, Gemma 3 & Reasoning LLMs 2x faster with 70% less memory! 🦥

Python 36,088 2,777 Updated Mar 27, 2025

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Python 38,910 4,892 Updated Aug 16, 2024

[CVPR 2023] SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

Python 12,523 2,334 Updated Jun 26, 2024

[CVPR 2025] EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation

Python 3,413 396 Updated Feb 27, 2025

Just 1 minute of voice data is enough to train a good TTS model! (few-shot voice cloning)

Python 43,310 4,815 Updated Mar 26, 2025

Get your documents ready for gen AI

Python 25,586 1,528 Updated Mar 29, 2025

A generative speech model for daily dialogue.

Python 35,451 3,838 Updated Mar 14, 2025

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 10,824 1,496 Updated Mar 28, 2025

Detect and extract tables to markdown and csv

Python 734 49 Updated Jan 24, 2025

File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs.

Python 5,915 294 Updated Feb 21, 2025

Label Studio is a multi-type data labeling and annotation tool with standardized output format

JavaScript 21,380 2,634 Updated Mar 29, 2025

Unofficial code for augment-XY-CUT in XYLayoutLM

Python 28 4 Updated Jul 12, 2022

A Faster LayoutReader Model based on LayoutLMv3, Sort OCR bboxes to reading order.

Python 200 13 Updated May 23, 2024

Use LLMs to dig out what you care about from massive amounts of information and a variety of sources daily.

Python 7,199 1,295 Updated Mar 26, 2025

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

C++ 1,673 190 Updated Dec 27, 2024

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

Python 26,133 3,287 Updated Sep 24, 2024

OCR, layout analysis, reading order, table recognition in 90+ languages

Python 16,986 1,108 Updated Mar 28, 2025