
Learning LLM Systems by Building

Introduction

This organization is a collection of projects created during my journey of learning Large Language Models (LLMs), retrieval systems, and AI infrastructure.

Instead of only studying theory, I try to learn by building real systems — from inference servers to RAG pipelines and evaluation frameworks.


What I'm Exploring

  • How to serve models efficiently (GPU / TensorRT / batching)
  • How to route and manage multiple LLMs
  • How retrieval works (dense / sparse / hybrid / multi-vector)
  • How to evaluate LLM outputs and reduce hallucination
  • How to design practical LLM applications

Project Overview

These projects are not perfect or production-ready — they reflect my learning process and experiments.

Inference & Serving

Routing

  • LLM Router Server
    Learning how to route requests across multiple models with load balancing.
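    The core of such a router can be reduced to picking the next backend for each request. The sketch below is a minimal round-robin picker and is only illustrative; the `RoundRobinRouter` class and backend URLs are assumptions, not the LLM Router Server's actual API.

    ```python
    import itertools

    class RoundRobinRouter:
        """Distribute requests across model backends in round-robin order."""

        def __init__(self, backends):
            # Cycle endlessly through the configured backend URLs.
            self._cycle = itertools.cycle(backends)

        def pick(self):
            # Return the next backend that should receive a request.
            return next(self._cycle)

    router = RoundRobinRouter(["http://llm-a:8000", "http://llm-b:8000"])
    picks = [router.pick() for _ in range(4)]
    # picks alternates: llm-a, llm-b, llm-a, llm-b
    ```

    A real router would also track in-flight load and backend health, but round-robin is a common starting point for load balancing.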

Retrieval & RAG

  • Tiny-RAGFlow
    A lightweight RAG framework to understand hybrid retrieval and reranking.
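    Hybrid retrieval in its simplest form is a weighted blend of dense (embedding) and sparse (keyword) scores. This sketch shows the score-fusion step only; the function name and the linear-blend formula are assumptions for illustration, not Tiny-RAGFlow's implementation.

    ```python
    def hybrid_score(dense, sparse, alpha=0.5):
        """Blend dense and sparse relevance scores per document id.

        alpha=1.0 means pure dense retrieval; alpha=0.0 means pure sparse.
        Missing scores default to 0.0 so each retriever can return
        a different candidate set.
        """
        ids = set(dense) | set(sparse)
        return {
            i: alpha * dense.get(i, 0.0) + (1 - alpha) * sparse.get(i, 0.0)
            for i in ids
        }

    dense = {"doc1": 0.9, "doc2": 0.4}
    sparse = {"doc2": 0.8, "doc3": 0.6}
    merged = hybrid_score(dense, sparse, alpha=0.5)
    top = max(merged, key=merged.get)  # doc2 wins: strong in both retrievers
    ```

    A reranker would then rescore only the top few merged candidates with a cross-encoder, which is too expensive to run over the whole corpus.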

Tools

  • LLM Tools
    A unified interface for interacting with LLMs, embeddings, and rerankers.
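    The value of a unified interface is that caller code stays identical across providers. A minimal sketch of that idea using structural typing, where `EmbeddingClient` and `FakeEmbedder` are hypothetical names and not LLM Tools' real API:

    ```python
    from typing import Protocol

    class EmbeddingClient(Protocol):
        """Any object with this embed() shape can serve as a backend."""
        def embed(self, texts: list[str]) -> list[list[float]]: ...

    class FakeEmbedder:
        """Stand-in backend: 'embeds' each text as its character count."""
        def embed(self, texts):
            return [[float(len(t))] for t in texts]

    def embed_corpus(client: EmbeddingClient, texts: list[str]):
        # Caller code never knows which provider sits behind the client.
        return client.embed(texts)

    vectors = embed_corpus(FakeEmbedder(), ["hi", "hello"])
    ```

    Swapping in a real provider then only requires another class with the same `embed` signature.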

Data Processing

  • file2md
    Converting different file formats into Markdown for downstream LLM usage.
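    The essence of such a converter is dispatching on file type and emitting Markdown that downstream LLM pipelines can consume. The sketch below handles only three toy cases; file2md's actual converters and API are richer, so treat the function and its supported formats as illustrative assumptions.

    ```python
    from pathlib import Path

    def to_markdown(path: str, text: str) -> str:
        """Convert a file's text to Markdown based on its extension."""
        suffix = Path(path).suffix.lower()
        if suffix in {".md", ".markdown"}:
            return text                      # already Markdown
        if suffix in {".txt", ".log"}:
            return f"```\n{text}\n```"       # wrap plain text in a code fence
        if suffix == ".csv":
            # Rebuild a CSV as a Markdown pipe table.
            rows = [r.split(",") for r in text.strip().splitlines()]
            header = "| " + " | ".join(rows[0]) + " |"
            sep = "|" + "---|" * len(rows[0])
            body = ["| " + " | ".join(r) + " |" for r in rows[1:]]
            return "\n".join([header, sep, *body])
        raise ValueError(f"unsupported format: {suffix}")

    md = to_markdown("scores.csv", "name,score\nana,0.9")
    ```

    Real formats like PDF or DOCX need dedicated parsers, but the dispatch-then-emit structure stays the same.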

Evaluation

  • llm-evals
    Experimenting with LLM evaluation and LLM-as-a-judge approaches.
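    In an LLM-as-a-judge setup, one model grades another model's answer against a rubric, and the grader's free-text reply must be parsed into a score. This sketch shows only the prompt-and-parse half, with the judge model call stubbed out; the 1-to-5 rubric and `Score: <n>` reply format are assumptions, not llm-evals' actual protocol.

    ```python
    import re

    JUDGE_PROMPT = (
        "Rate the answer below from 1 (wrong) to 5 (fully correct).\n"
        "Reply with 'Score: <n>' only.\n\n"
        "Question: {q}\nAnswer: {a}"
    )

    def parse_score(judge_reply: str) -> int:
        """Extract the numeric score from a judge model's reply."""
        match = re.search(r"Score:\s*([1-5])", judge_reply)
        if match is None:
            # Judges sometimes ignore the format; surface that loudly.
            raise ValueError(f"unparseable judge reply: {judge_reply!r}")
        return int(match.group(1))

    # In practice judge_reply would come from an LLM call with JUDGE_PROMPT.
    score = parse_score("Score: 4")
    ```

    Constraining the judge's output format up front makes the parsing step far more reliable than scraping free-form prose.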

Research Exploration

ML + Database

  • ML2SQL
    Exploring how ML models can run directly inside databases using SQL.
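    The core trick is that a decision tree's splits map directly onto SQL `CASE WHEN` expressions, so prediction becomes an ordinary query. This is a toy compiler for a single-split stump; ML2SQL itself targets full tree ensembles, so the function below only illustrates the idea.

    ```python
    def tree_to_sql(feature, threshold, left_value, right_value):
        """Emit a SQL CASE expression equivalent to a one-split decision tree."""
        return (
            f"CASE WHEN {feature} < {threshold} "
            f"THEN {left_value} ELSE {right_value} END"
        )

    sql = tree_to_sql("age", 30, 0.2, 0.8)
    # Usable as, e.g.:
    #   SELECT CASE WHEN age < 30 THEN 0.2 ELSE 0.8 END AS prediction FROM users;
    ```

    Deeper trees nest these expressions, and ensembles sum them, so the whole model runs inside the database with no external inference service.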

Why This Exists

I believe the best way to understand LLM systems is to:

Build them piece by piece.

Each repository focuses on a different part of the stack, and together they form a rough picture of how modern LLM systems work.


Still Learning

This is an ongoing journey.
Many things are incomplete, naive, or experimental — and that’s intentional.

If you’re also learning, feel free to explore, use, or build on top of these projects.

Contact

If you have any questions, ideas, or just want to chat about LLMs, feel free to:

  • Open an issue
  • Or reach out via email

milk333445@gmail.com

Pinned Repositories

  1. file2md
     A versatile tool for converting multiple file formats to Markdown.

  2. TensorrtServer
     A high-performance deep learning model inference server based on TensorRT, supporting fast inference for Embedding, Reranker, and NLI models.

  3. ML2SQL
     Compiles tree-based machine learning models into SQL inference queries, enabling model predictions to run directly inside the database.

  4. LLM-Router-Server
     A high-performance routing service for multi-model deployment scenarios, used to uniformly manage and orchestrate multiple local Large Language Model (LLM) services…
