- Intel
- Shanghai
- https://read.cv/hym
Stars
Let your Claude think
Advanced Quantization Algorithm for LLMs/VLMs.
Intel® NPU Acceleration Library
An innovative library for efficient LLM inference via low-bit quantization
How to optimize algorithms in CUDA.
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel platforms ⚡
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
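As a rough illustration of what the low-bit quantization above involves, here is a minimal C++ sketch of generic symmetric per-tensor INT8 quantization (scale = max|x| / 127). It is an assumption-based example of the technique, not code from the library itself.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

// Symmetric per-tensor INT8 quantization: q = round(x / scale), scale = max|x| / 127.
std::vector<int8_t> quantize_int8(const std::vector<float>& x, float& scale) {
    float amax = 0.0f;
    for (float v : x) amax = std::max(amax, std::fabs(v));
    scale = amax > 0.0f ? amax / 127.0f : 1.0f;

    std::vector<int8_t> q(x.size());
    for (size_t i = 0; i < x.size(); ++i)
        q[i] = static_cast<int8_t>(std::lround(x[i] / scale));
    return q;
}

int main() {
    std::vector<float> w = {0.12f, -1.7f, 0.03f, 0.9f};  // toy weights
    float scale = 0.0f;
    auto q = quantize_int8(w, scale);
    for (size_t i = 0; i < q.size(); ++i)
        std::printf("w=%+.3f  q=%4d  dequant=%+.3f\n", w[i], (int)q[i], q[i] * scale);
}
```

Lower-bit formats (INT4/FP4/NF4) follow the same quantize/dequantize idea with smaller codebooks and usually per-group scales.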
Writing a minimal x86-64 JIT compiler in C++
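A minimal, hedged sketch of the core JIT idea on Linux/x86-64: copy machine-code bytes into an mmap'd page, mark it executable, and call it through a function pointer. It illustrates the technique only and is not taken from that project.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <sys/mman.h>

int main() {
    // x86-64 machine code for: mov eax, 42 ; ret
    unsigned char code[] = {0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3};

    // Allocate a writable page, copy the code in, then flip it to executable (W^X).
    void* mem = mmap(nullptr, 4096, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return 1;
    std::memcpy(mem, code, sizeof(code));
    mprotect(mem, 4096, PROT_READ | PROT_EXEC);

    auto fn = reinterpret_cast<int (*)()>(mem);
    std::printf("jit returned %d\n", fn());  // prints 42

    munmap(mem, 4096);
}
```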
🤗 Optimum Intel: Accelerate inference with Intel optimization tools
Intel® Extension for TensorFlow*
MLIRX is now defunct. Please see PolyBlocks - https://docs.polymagelabs.com
An implementation of sgemm_kernel tuned for the L1d cache.
An educational compiler intermediate representation
A list of awesome compiler projects and papers for tensor computation and deep learning.
LLVM Optimization to extract a function, embedded in its intermediate representation in the binary, and execute it using the LLVM Just-In-Time compiler.
Transform an ONNX model into a PyTorch representation
Optimize GEMM: with AVX-512 and AVX-512 BF16, an 800x improvement.
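For context, here is a small C++ sketch of the kind of AVX-512 FMA inner loop such an optimization builds on. The 1x16 micro-kernel shape is an assumption for illustration, not that project's actual kernel; compile with -mavx512f and run on AVX-512 hardware.

```cpp
#include <immintrin.h>
#include <cstdio>

// Generic 1x16 micro-kernel: C[0:16] += sum_k A[k] * B[k][0:16],
// with B stored row-major with a leading dimension of 16.
void gemm_1x16(const float* A, const float* B, float* C, int K) {
    __m512 acc = _mm512_loadu_ps(C);             // 16 running partial sums
    for (int k = 0; k < K; ++k) {
        __m512 b = _mm512_loadu_ps(B + k * 16);  // row k of B (16 floats)
        __m512 a = _mm512_set1_ps(A[k]);         // broadcast A[k]
        acc = _mm512_fmadd_ps(a, b, acc);        // acc += A[k] * B[k][:]
    }
    _mm512_storeu_ps(C, acc);
}

int main() {
    const int K = 4;
    float A[K], B[K * 16], C[16] = {0};
    for (int k = 0; k < K; ++k) A[k] = 1.0f;
    for (int i = 0; i < K * 16; ++i) B[i] = 1.0f;
    gemm_1x16(A, B, C, K);
    std::printf("C[0] = %.1f (expect %d)\n", C[0], K);  // every C[j] equals K
}
```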
Intel Data Parallel C++ (and SYCL 2020) Tutorial.
LightSeq: A High Performance Library for Sequence Processing and Generation
Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure
A Python framework for sparse neural networks