
This code accompanies the following Bilibili video: https://www.bilibili.com/video/BV18L41197Uz/?spm_id_from=333.788&vd_source=eefa4b6e337f16d87d87c2c357db8ca7

Python 67 3 Updated Oct 7, 2023

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,104 535 Updated Mar 28, 2025

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Python 2,891 241 Updated Mar 25, 2025

🎨 Light & Dark Vim color schemes inspired by Google's Material Design

Vim Script 2,766 241 Updated Jul 26, 2024

PTX-EMU is a simple emulator for CUDA programs.

C++ 30 3 Updated Jan 10, 2024

This is the top-level repository for the Accel-Sim framework.

Python 376 135 Updated Mar 18, 2025

GPU programming related news and material links

1,434 84 Updated Jan 6, 2025

Distribute and run LLMs with a single file.

C++ 22,061 1,156 Updated Mar 24, 2025

BLISlab: A Sandbox for Optimizing GEMM

C 511 109 Updated Jun 17, 2021

This project records the process of optimizing SGEMM (single-precision floating-point General Matrix Multiplication) on the RISC-V platform.

C 20 1 Updated Dec 11, 2024
C 6 Updated Jan 9, 2025

A lightweight memory allocator for hardware-accelerated machine learning

C++ 145 11 Updated Mar 21, 2025

An Agile RISC-V SoC Design Framework with in-order cores, out-of-order cores, accelerators, and more

Scala 1,799 690 Updated Mar 24, 2025

Implements kernels with RISC-V Vector

Assembly 22 2 Updated Mar 24, 2023

The CORE-V CVA6 is an Application class 6-stage RISC-V CPU capable of booting Linux

Assembly 2,415 743 Updated Mar 28, 2025

Example of RISC-V Vector programming

C 13 2 Updated Feb 9, 2025

The PULP Ara is a 64-bit Vector Unit, compatible with the RISC-V Vector Extension Version 1.0, working as a coprocessor to CORE-V's CVA6 core

C 414 143 Updated Mar 20, 2025

Stepwise optimizations of DGEMM on CPU, eventually outperforming Intel MKL, even under multithreading.

C 137 26 Updated Feb 3, 2022

ATen: A TENsor library for C++11

C++ 694 127 Updated Nov 20, 2019

GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as…

C++ 1,267 544 Updated Feb 15, 2025

Awesome Mobile LLMs

156 12 Updated Mar 23, 2025

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.

Python 8,349 520 Updated May 3, 2024

MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba. Full multimodal LLM Android App:[MNN-LLM-Android](./apps/Android/MnnLlmChat/READ…

C++ 10,139 1,796 Updated Mar 28, 2025

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Python 142,078 28,450 Updated Mar 28, 2025

Multi-Threaded FP32 Matrix Multiplication on x86 CPUs

C 343 21 Updated Feb 20, 2025

OpenAI GPT2 pre-training and sequence prediction implementation in TensorFlow 2.0

Python 262 85 Updated Mar 25, 2023

Code for the paper "Language Models are Unsupervised Multitask Learners"

Python 23,250 5,635 Updated Aug 14, 2024

CPU inference for the DeepSeek family of large language models in pure C++

C++ 281 29 Updated Feb 11, 2025