High-efficiency floating-point neural network inference operators for mobile, server, and Web
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
The Tensor Algebra SuperOptimizer for Deep Learning
Batch normalization fusion for PyTorch
[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration
A set of tools that make your life easier with TensorRT and ONNX Runtime. This repo is designed for YOLOv3
Optimize layers structure of Keras model to reduce computation time
A cross-platform, modular neural network inference library; small and efficient
MIVisionX Python Inference Analyzer uses pre-trained ONNX/NNEF/Caffe models to analyze inference results and summarize individual image results
MLP-Rank: a graph-theoretical approach to structured pruning of deep neural networks based on weighted PageRank centrality, as introduced in the related thesis
PyTorch Mobile: Android examples of usage in applications
Batch estimation on Lie groups
A simple tool that applies structure-level optimizations (e.g., quantization) to a TensorFlow model
🤖️ Optimized CUDA Kernels for Fast MobileNetV2 Inference
Batch Partitioning for Multi-PE Inference with TVM (2020)
A constrained expectation-maximization algorithm for feasible graph inference.
Modified inference engine for quantized convolution using product quantization
ncnn is a high-performance neural network inference framework optimized for the mobile platform
Improving Natural Language Processing tasks using BERT-based models
PyTorch Mobile: iOS examples
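Several of the projects above rely on operator fusion, such as folding batch normalization into a preceding convolution so that inference runs a single fused layer. As a minimal sketch of the underlying arithmetic (using NumPy and a 1x1 convolution expressed as a matrix product, rather than any specific library's fusion API), the BN scale and shift can be folded into the convolution's weights and bias per output channel:

```python
import numpy as np

def fuse_conv_bn(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold batch-norm parameters into conv weights/bias.

    W: (out_channels, in_channels) weight of a 1x1 conv (flattened),
    b: (out_channels,) conv bias; gamma/beta/mean/var are the BN
    scale, shift, running mean, and running variance per channel.
    """
    s = gamma / np.sqrt(var + eps)        # per-channel BN scale factor
    W_fused = W * s.reshape(-1, 1)        # scale each output channel's weights
    b_fused = (b - mean) * s + beta       # fold mean/shift into the bias
    return W_fused, b_fused

# Check: BN(conv(x)) matches the single fused conv on random parameters.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3)); b = rng.normal(size=4)
gamma = rng.normal(size=4); beta = rng.normal(size=4)
mean = rng.normal(size=4); var = rng.uniform(0.5, 2.0, size=4)
x = rng.normal(size=3)

y_ref = (W @ x + b - mean) / np.sqrt(var + 1e-5) * gamma + beta
W_f, b_f = fuse_conv_bn(W, b, gamma, beta, mean, var)
assert np.allclose(y_ref, W_f @ x + b_f)
```

The same per-channel folding generalizes to full KxK convolutions by scaling each output-channel filter slice; production frameworks (e.g., PyTorch's fusion utilities, ncnn, TensorRT) apply this transformation automatically at export or load time.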