CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs. CUDA was created by NVIDIA and first released on June 23, 2007.
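To make the model concrete, here is a minimal sketch of a CUDA program that offloads an element-wise vector addition to the GPU; the kernel name, sizes, and launch configuration are illustrative choices, not taken from any project listed below.

#include <cuda_runtime.h>
#include <stdio.h>

/* Each thread adds one element; the grid is sized to cover the array. */
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    float *a, *b, *c;

    /* Unified memory keeps the sketch short; real code often manages
       host/device copies explicitly with cudaMemcpy. */
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    const int block = 256;
    const int grid = (n + block - 1) / block;
    vecAdd<<<grid, block>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);  /* expect 3.0 */
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}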
A portable toolkit for deploying Edge AI and HPC (OpenCL, Vulkan, SIMD, task scheduling)
Sandbox for graphics paper implementations
A high-throughput and memory-efficient inference and serving engine for LLMs
OptiX 8 Lightweight Wrapper Library
(GPU-accelerated) Multi-arch (linux/amd64, linux/arm64/v8) JupyterLab Julia Docker images. Please submit pull requests to the GitLab repository. Mirror of
A fast, scalable, high-performance gradient boosting on decision trees library, used for ranking, classification, regression and other machine learning tasks, for Python, R, Java, and C++. Supports computation on CPU and GPU.
(GPU-accelerated) Multi-arch (linux/amd64, linux/arm64/v8) Julia Docker images. Please submit pull requests to the GitLab repository. Mirror of
Robotics with GPU computing
CUDA-based path-tracing renderer, offline and real-time
FlashInfer: Kernel Library for LLM Serving
My personal attempt at creating a relatively fast iterative mergesort that runs on CUDA GPUs
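For readers unfamiliar with the approach, an iterative (bottom-up) mergesort maps naturally onto a GPU: the host doubles the sorted-run width each pass, and each thread merges one pair of adjacent runs. The sketch below is a generic illustration of that pattern, not the code from this repository; all names are hypothetical, and indices are assumed to fit in int.

#include <cuda_runtime.h>

/* Merge two adjacent sorted runs [lo, mid) and [mid, hi) from src into dst.
   One thread handles one pair of runs. */
__global__ void mergePass(const int *src, int *dst, int n, int width) {
    int pair = blockIdx.x * blockDim.x + threadIdx.x;
    int lo = pair * 2 * width;
    if (lo >= n) return;
    int mid = min(lo + width, n);
    int hi  = min(lo + 2 * width, n);
    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        dst[k++] = (src[i] <= src[j]) ? src[i++] : src[j++];
    while (i < mid) dst[k++] = src[i++];
    while (j < hi)  dst[k++] = src[j++];
}

/* Host driver: ping-pong between two device buffers, doubling run width. */
void gpuMergesort(int *d_data, int *d_scratch, int n) {
    int *src = d_data, *dst = d_scratch;
    for (int width = 1; width < n; width *= 2) {
        int pairs = (n + 2 * width - 1) / (2 * width);
        int block = 256;
        int grid = (pairs + block - 1) / block;
        mergePass<<<grid, block>>>(src, dst, n, width);
        cudaDeviceSynchronize();
        int *tmp = src; src = dst; dst = tmp;
    }
    if (src != d_data)  /* sorted result ended up in the scratch buffer */
        cudaMemcpy(d_data, src, n * sizeof(int), cudaMemcpyDeviceToDevice);
}

A production implementation would split large merges across many threads (for example, with merge-path partitioning) rather than assigning one thread per pair, but the pass structure is the same.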
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
A retargetable MLIR-based machine learning compiler and runtime toolkit.
CUDA C++ Core Libraries
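CCCL bundles Thrust, CUB, and libcu++ into one package; a few lines of standard Thrust show the style of GPU programming it enables (a minimal sketch using documented Thrust calls).

#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/reduce.h>
#include <thrust/sort.h>
#include <thrust/functional.h>
#include <cstdio>

int main() {
    thrust::device_vector<int> v(1000);
    thrust::sequence(v.begin(), v.end());          // fill with 0, 1, ..., 999 on the GPU
    int sum = thrust::reduce(v.begin(), v.end());  // parallel reduction
    thrust::sort(v.begin(), v.end(), thrust::greater<int>());  // descending GPU sort
    std::printf("sum = %d, largest = %d\n", sum, (int)v[0]);   // 499500, 999
    return 0;
}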
Template library for floating-point operations
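The repository's API is not reproduced here; as a rough idea of what a floating-point template library looks like, the snippet below sketches a hypothetical type-generic kernel that the compiler instantiates once per precision. Names and signatures are illustrative only.

#include <cuda_runtime.h>

/* Hypothetical type-generic axpy (y = a*x + y), one instantiation per
   floating-point width. Not this library's actual API. */
template <typename T>
__global__ void axpy(int n, T a, const T *x, T *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

/* Explicit instantiations for the usual IEEE-754 widths. */
template __global__ void axpy<float>(int, float, const float *, float *);
template __global__ void axpy<double>(int, double, const double *, double *);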
A library for accelerating Transformer models on NVIDIA GPUs, including 8-bit floating-point (FP8) precision on Hopper and Ada GPUs, delivering better performance with lower memory utilization in both training and inference.
✨ Zero-code distributed tracing and profiling, observability via eBPF 🚀