
Add GGUFReader #1

Merged
michalharakal merged 2 commits into main from gguf-reader
Feb 3, 2025

Conversation

@mylesieong
Collaborator

No description provided.

@michalharakal
Contributor

@mylesieong thank you very much

@michalharakal michalharakal merged commit 8415003 into main Feb 3, 2025
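The merged GGUFReader parses the GGUF container format. As a rough illustration only (not the PR's actual implementation, which is Kotlin), the fixed-size GGUF header consists of the magic bytes "GGUF", a little-endian u32 version, and u64 tensor and metadata-KV counts, and can be read like this:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class GgufHeaderSketch {
    // Parsed fields of the fixed GGUF header (spec v3 layout).
    public record Header(int version, long tensorCount, long metadataKvCount) {}

    public static Header readHeader(ByteBuffer buf) {
        buf.order(ByteOrder.LITTLE_ENDIAN); // GGUF is little-endian throughout
        byte[] magic = new byte[4];
        buf.get(magic);
        // Every GGUF file starts with the ASCII bytes 'G' 'G' 'U' 'F'.
        if (magic[0] != 'G' || magic[1] != 'G' || magic[2] != 'U' || magic[3] != 'F') {
            throw new IllegalArgumentException("not a GGUF file");
        }
        int version = buf.getInt();           // u32
        long tensorCount = buf.getLong();     // u64
        long metadataKvCount = buf.getLong(); // u64
        return new Header(version, tensorCount, metadataKvCount);
    }
}
```

After the header come the metadata key-value pairs and tensor descriptors, whose counts the header supplies; a full reader iterates those before reaching the tensor data.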
michalharakal pushed a commit that referenced this pull request May 22, 2025
@michalharakal michalharakal deleted the gguf-reader branch October 4, 2025 18:13
michalharakal added a commit that referenced this pull request Jan 18, 2026
Outlines the path to making KLlama the #1 choice for multiplatform
LLM inference, competing with jlama while offering unique KMP capabilities.

Key phases:
- Performance: mmap loading, quantized kernels, SIMD, KV-cache
- Developer Experience: Chat API, streaming Flow, templates, tools
- Model Support: Mistral, Phi, Gemma, Qwen, MoE architectures
- Platform Acceleration: Metal, WebGPU, NNAPI
- Distribution: Maven Central, CocoaPods, npm, documentation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
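Of the performance items above, memory-mapped loading is the most self-contained on the JVM. A minimal sketch of the idea using `java.nio` (illustrative only, not the repository's actual loader):

```java
import java.io.IOException;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapSketch {
    // Map a model file read-only: the OS pages tensor data in lazily,
    // so multi-gigabyte GGUF files are never copied onto the Java heap.
    public static MappedByteBuffer mapReadOnly(Path path) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            buf.order(ByteOrder.LITTLE_ENDIAN); // GGUF data is little-endian
            return buf;
        }
    }
}
```

Note that a single `MappedByteBuffer` is capped at 2 GiB, so a real loader must map a large model file as a sequence of segments.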
michalharakal added a commit that referenced this pull request Jan 18, 2026
- Update competitive comparison table (quantized inference, SIMD: ✅)
- Mark key gaps #1 and #2 as resolved
- Update sections 1.3, 1.4, 1.5 with implementation details
- Add Phase 1 Summary table
- Update Q1 timeline with completed items

Phase 1 "Performance Foundation" is now complete:
- Memory-mapped GGUF loading (JVM)
- BitNet/Ternary TQ1_0/TQ2_0 support
- Q8_0/Q4_K quantized inference kernels
- JVM Vector API SIMD kernels
- Off-heap KV cache with platform abstractions

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
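The Q8_0 kernel work above hinges on that format's block layout: groups of 32 weights share one fp16 scale, with each weight stored as a signed byte. A hedged sketch of dequantizing a single block (illustrative only; the repository's actual kernels are Kotlin and SIMD-accelerated):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class Q80Sketch {
    public static final int BLOCK_SIZE = 32; // weights per Q8_0 block

    // Decode an IEEE-754 half-precision value from its 16 raw bits.
    static float fp16ToFloat(short h) {
        int bits = h & 0xFFFF;
        int exp = (bits >>> 10) & 0x1F;
        int mant = bits & 0x3FF;
        float val;
        if (exp == 0) {
            val = mant * 0x1p-24f;                                  // subnormal
        } else if (exp == 31) {
            val = mant == 0 ? Float.POSITIVE_INFINITY : Float.NaN;  // inf / NaN
        } else {
            val = Math.scalb((float) (mant | 0x400), exp - 25);     // normal
        }
        return (bits & 0x8000) != 0 ? -val : val;
    }

    // One Q8_0 block = fp16 scale d followed by 32 signed int8 quants;
    // each dequantized weight is simply d * q[i].
    public static float[] dequantizeBlock(ByteBuffer block) {
        block.order(ByteOrder.LITTLE_ENDIAN);
        float d = fp16ToFloat(block.getShort());
        float[] out = new float[BLOCK_SIZE];
        for (int i = 0; i < BLOCK_SIZE; i++) {
            out[i] = d * block.get();
        }
        return out;
    }
}
```

Q4_K is considerably more involved (256-weight super-blocks with 6-bit sub-scales), which is presumably why the quantized-kernel milestone calls it out separately from Q8_0.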