Stars
This code accompanies the Bilibili video https://www.bilibili.com/video/BV18L41197Uz/?spm_id_from=333.788&vd_source=eefa4b6e337f16d87d87c2c357db8ca7.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
🎨 Light & Dark Vim color schemes inspired by Google's Material Design
This is the top-level repository for the Accel-Sim framework.
GPU programming related news and material links
Distribute and run LLMs with a single file.
This project records the process of optimizing SGEMM (single-precision floating-point General Matrix Multiplication) on the RISC-V platform.
A lightweight memory allocator for hardware-accelerated machine learning
An Agile RISC-V SoC Design Framework with in-order cores, out-of-order cores, accelerators, and more
The CORE-V CVA6 is an Application class 6-stage RISC-V CPU capable of booting Linux
The PULP Ara is a 64-bit Vector Unit, compatible with the RISC-V Vector Extension Version 1.0, working as a coprocessor to CORE-V's CVA6 core
Stepwise optimizations of DGEMM on CPU, eventually outperforming Intel MKL, even under multithreading.
GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as…
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba. Full multimodal LLM Android App:[MNN-LLM-Android](./apps/Android/MnnLlmChat/READ…
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Multi-Threaded FP32 Matrix Multiplication on x86 CPUs
OpenAI GPT-2 pre-training and sequence prediction implementation in TensorFlow 2.0
Code for the paper "Language Models are Unsupervised Multitask Learners"
CPU inference for the DeepSeek family of large language models in pure C++