Altersieg

Working on LLM inference and GPU modeling.

Popular repositories

myRAG (Public)

Python
AtomFlow (Public)

A lightweight LLM inference framework focused on NVIDIA Blackwell GPUs.

C++
CS5491 (Public)

Python
cuLA (Public)

Forked from inclusionAI/cuLA

CUDA kernels for linear attention variants, written in CuTe DSL and CUTLASS C++.

Python
llama.cpp (Public)

Forked from ggml-org/llama.cpp

LLM inference in C/C++

C++
flashinfer (Public)

Forked from flashinfer-ai/flashinfer

FlashInfer: Kernel Library for LLM Serving

Python