PyTorch 2.0 TTNN Compiler

The PyTorch 2.0 TT-NN Compiler runs PyTorch models on Tenstorrent AI accelerators. It converts PyTorch operations to TT-NN operations, so you keep PyTorch's familiar API while targeting Tenstorrent hardware; measured throughput for each validated model is listed under Model Support.

🚀 Quick Start

Installation

Install from the repo:

pip install git+https://github.com/tenstorrent/pytorch2.0_ttnn

or as an editable package from source:

git clone https://github.com/tenstorrent/pytorch2.0_ttnn.git
cd pytorch2.0_ttnn
pip install -e .
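
After installing, one quick check is to confirm that the package is visible to the current interpreter without actually importing it (importing may try to reach hardware). The sketch below uses only the standard library; the helper name `is_installed` is illustrative and not part of `torch_ttnn`. It assumes the importable module is named `torch_ttnn`, as in the usage examples.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


# Report whether the modules used in the usage examples can be located.
for name in ("torch", "torch_ttnn"):
    print(f"{name}: {'found' if is_installed(name) else 'missing'}")
```

If `torch_ttnn` is reported as missing, the install most likely went into a different Python environment than the one running the check.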

✨ Basic Usage

Option 1: Eager Mode: get your model running by switching to a TT device

import torch
import ttnn
import torch_ttnn

model = YourModel()  # any torch.nn.Module

# Open a single Tenstorrent device and move the model to it
device = ttnn.open_device(device_id=0)
model.to(torch_ttnn.ttnn_device_as_torch_device(device))

output = model(input_data)

Option 2: Compilation Mode (Recommended): get more perf with a JIT compiler

import torch
import ttnn
import torch_ttnn

model = YourModel()  # any torch.nn.Module

device = ttnn.open_mesh_device(ttnn.MeshShape(1, 2))  # 1x2 device grid
option = torch_ttnn.TorchTtnnOption(device=device, data_parallel=2)

# Compile with the TT-NN backend; the first call triggers compilation
model = torch.compile(model, backend=torch_ttnn.backend, options=option)
output = model(input_data)
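
Because `torch.compile` compiles lazily, the first call of a compiled model is much slower than later calls. A minimal sketch for separating the two follows; it is not the project's benchmarking code. The helper `time_first_and_warm` and the stand-in workload are hypothetical, and the sketch assumes the callable returns only once its result is ready.

```python
import time
from typing import Any, Callable, Tuple


def time_first_and_warm(fn: Callable[[], Any], warm_runs: int = 10) -> Tuple[float, float]:
    """Return (first_call_ms, warm_calls_per_second) for a zero-argument callable."""
    start = time.perf_counter()
    fn()  # first call: for a compiled model this includes compilation
    first_ms = (time.perf_counter() - start) * 1000.0

    start = time.perf_counter()
    for _ in range(warm_runs):
        fn()
    elapsed = time.perf_counter() - start
    # Guard against a zero reading from a very coarse timer.
    per_second = warm_runs / elapsed if elapsed > 0 else float("inf")
    return first_ms, per_second


def stand_in_model() -> int:
    # Placeholder workload; replace with `lambda: model(input_data)`.
    return sum(i * i for i in range(10_000))


first_ms, per_second = time_first_and_warm(stand_in_model)
print(f"first call: {first_ms:.2f} ms, warm throughput: {per_second:.1f} calls/s")
```

With a real compiled model, pass `lambda: model(input_data)` as `fn`; the first number then covers compilation plus one run, and the second reflects steady-state calls.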

📊 Model Support

We've extensively tested the compiler across a diverse range of model architectures. Here's a summary of our validation results:

| Model | Status | Batch | Compiled First Run (ms) | Original Throughput (Inferences Per Second) | Compiled Throughput (Inferences Per Second) | Accuracy (%) | Torch Ops Before (Unique Ops) | Torch Ops Remain (Unique Ops) | To/From Device Ops |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Autoencoder (linear) | ✅ | 1 | 380.4 | 0.466888 | 526.3157894736842 | 100.0 | 22 (3) | 0 (0) | 0 |
| BERT | ✅ | 8 | 45896.43 | 0.0107214 | 39.95205753096284 | 99.69 | 1465 (22) | 0 (0) | 0 |
| DPR | ✅ | 1 | 18587.31 | 0.354789 | 72.30657989877079 | 99.38 | 720 (22) | 0 (0) | 1 |
| HardNet | ✅ | 1 | 168441.12 | 0.196658 | 19.98001998001998 | 98.45 | 245 (10) | 0 (0) | 124 |
| MLPMixer | ✅ | 1 | 18600.14 | 0.201063 | 79.1139240506329 | 99.99 | 253 (11) | 0 (0) | 0 |
| Mnist | ✅ | 1 | 5871.58 | 30.1296 | 408.1632653061224 | 99.42 | 14 (8) | 0 (0) | 1 |
| MobileNetV2 | ✅ | 1 | 94348.17 | 1.12857 | 38.37298541826554 | 99.09 | 154 (9) | 0 (0) | 0 |
| OpenPose V2 | ✅ | 1 | 19618.0 | 0.334242 | 35.67606136282554 | 91.49 | 155 (7) | 0 (0) | 6 |
| Perceiver IO | ✅ | 1 | 51594.21 | 0.0204124 | 19.227071716977502 | 99.95 | 1531 (20) | 0 (0) | 1 |
| ResNet18 | ✅ | 1 | 50784.1 | 0.420679 | 73.20644216691069 | 99.27 | 70 (9) | 0 (0) | 1 |
| ResNet50 | ✅ | 4 | 77262.36 | 0.752556 | 46.576618537494184 | 98.61 | 176 (9) | 0 (0) | 1 |
| RoBERTa | ✅ | 1 | 35202.52 | 0.0768288 | 22.10921954455008 | 28.56 | 719 (21) | 0 (0) | 3 |
| U-Net | ✅ | 1 | 74530.51 | 0.0159431 | 67.52194463200539 | 100.0 | 68 (6) | 0 (0) | 12 |
| Unet-brain | ✅ | 1 | 3374.58 | 0.0164509 | 55.2791597567717 | N/A | 68 (6) | 0 (0) | 12 |
| Unet-carvana | ✅ | 1 | 32995.1 | 0.0118069 | 30.59039461609055 | 99.69 | 67 (5) | 0 (0) | 12 |
| albert/albert-base-v2 | ✅ | 1 | 26818.2 | 0.734258 | 42.15851602023609 | 98.82 | 791 (21) | 0 (0) | 3 |
| albert/albert-base-v2-classification | ✅ | 1 | 10319.82 | 0.740883 | 46.66355576294914 | 99.97 | 779 (21) | 0 (0) | 2 |
| albert/albert-large-v2 | ✅ | 1 | 20157.95 | 0.392332 | 24.236548715462916 | 98.95 | 1547 (21) | 0 (0) | 3 |
| albert/albert-xlarge-v2 | ✅ | 1 | 44869.53 | 0.105447 | 12.891581797086504 | 97.36 | 1547 (21) | 0 (0) | 3 |
| densenet121 | ✅ | 1 | 143878.54 | 0.260364 | 13.027618551328816 | 99.74 | 432 (10) | 0 (0) | 597 |
| densenet161 | ✅ | 1 | 193868.05 | 0.102238 | 9.669309611293752 | 99.49 | 572 (10) | 0 (0) | 1147 |
| densenet169 | ✅ | 1 | 252974.82 | 0.258978 | 9.337940050424876 | 99.58 | 600 (10) | 0 (0) | 1241 |
| densenet201 | ✅ | 1 | 87818.07 | 0.209147 | 7.605141075366947 | 99.39 | 712 (10) | 0 (0) | 1905 |
| distilbert-base-uncased | ✅ | 1 | 27401.45 | 0.730663 | 85.39709649871904 | 72.37 | 361 (16) | 0 (0) | 1 |
| dla34.in1k | ✅ | 1 | 75717.13 | 0.270115 | 37.579857196542655 | 99.48 | 135 (9) | 0 (0) | 23 |
| ese_vovnet19b_dw.ra_in1k | ✅ | 1 | 64235.77 | 0.541729 | 51.09862033725089 | 99.44 | 111 (12) | 0 (0) | 19 |
| ghostnet_100.in1k | ✅ | 1 | 191671.49 | 0.662032 | 18.195050946142647 | 99.6 | 515 (14) | 0 (0) | 64 |
| mobilenet_v2 | ✅ | 1 | 82737.52 | 1.0306 | 32.44646333549643 | 99.09 | 154 (9) | 0 (0) | 0 |
| mobilenet_v3_large | ✅ | 1 | 88933.92 | 1.25044 | 32.66906239790919 | 99.15 | 188 (11) | 0 (0) | 0 |
| mobilenet_v3_small | ✅ | 1 | 118022.69 | 1.78333 | 34.81894150417828 | 99.09 | 158 (11) | 0 (0) | 0 |
| mobilenetv1_100.ra4_e3600_r224_in1k | ✅ | 1 | 66679.44 | 0.857074 | 59.98800239952009 | 96.04 | 85 (7) | 0 (0) | 0 |
| regnet_x_16gf | ✅ | 1 | 63865.16 | 0.0664586 | 15.651901706057284 | 99.56 | 235 (8) | 0 (0) | 0 |
| regnet_x_1_6gf | ✅ | 1 | 56965.52 | 0.41886 | 27.540622418066647 | 99.47 | 195 (8) | 0 (0) | 0 |
| regnet_x_32gf | ✅ | 1 | 107482.93 | 0.0281106 | 8.057368463459834 | 99.27 | 245 (8) | 0 (0) | 0 |
| regnet_x_3_2gf | ✅ | 1 | 91311.5 | 0.25224 | 22.629554197782305 | 99.5 | 265 (8) | 0 (0) | 0 |
| regnet_x_400mf | ✅ | 1 | 82798.78 | 0.968251 | 26.96144513345915 | 99.66 | 235 (8) | 0 (0) | 0 |
| regnet_x_800mf | ✅ | 1 | 54691.23 | 0.671384 | 34.17634996582365 | 99.44 | 175 (8) | 0 (0) | 0 |
| regnet_x_8gf | ✅ | 1 | 56008.59 | 0.127511 | 18.086453246518357 | 98.99 | 245 (8) | 0 (0) | 0 |
| regnet_y_16gf | ✅ | 1 | 148402.5 | 0.065168 | 11.763321962122102 | 99.71 | 303 (10) | 0 (0) | 0 |
| regnet_y_1_6gf | ✅ | 1 | 132566.84 | 0.435114 | 15.586034912718205 | 99.65 | 447 (10) | 0 (0) | 0 |
| regnet_y_32gf | ✅ | 1 | 128920.33 | 0.0341463 | 7.935878104912309 | 99.72 | 335 (10) | 0 (0) | 0 |
| regnet_y_3_2gf | ✅ | 1 | 90827.03 | 0.27943 | 18.90359168241966 | 99.82 | 351 (10) | 0 (0) | 0 |
| regnet_y_400mf | ✅ | 1 | 95965.88 | 1.12252 | 25.64102564102564 | 99.64 | 271 (10) | 0 (0) | 0 |
| regnet_y_800mf | ✅ | 1 | 72225.13 | 0.636991 | 28.481913984619766 | 99.59 | 239 (10) | 0 (0) | 0 |
| regnet_y_8gf | ✅ | 1 | 84200.02 | 0.1165 | 17.46419839329375 | 99.82 | 287 (10) | 0 (0) | 0 |
| resnet101 | ✅ | 1 | 16838.34 | 0.105869 | 18.057060310581434 | 99.28 | 346 (9) | 0 (0) | 1 |
| resnet152 | ✅ | 1 | 100397.93 | 0.0872946 | 10.925379656943079 | 99.14 | 516 (9) | 0 (0) | 1 |
| resnet18 | ✅ | 1 | 24661.18 | 0.421944 | 76.68711656441718 | 99.63 | 70 (9) | 0 (0) | 1 |
| resnet34 | ✅ | 1 | 53904.96 | 0.208159 | 44.662795891022775 | 98.9 | 126 (9) | 0 (0) | 1 |
| resnet50 | ✅ | 1 | 54402.09 | 0.194155 | 33.82949932341001 | 98.61 | 176 (9) | 0 (0) | 1 |
| resnext101_32x8d | ✅ | 1 | 53438.11 | 0.0634725 | 8.774238834781084 | 99.57 | 346 (9) | 0 (0) | 1 |
| resnext101_64x4d | ✅ | 1 | 97216.09 | 0.0642695 | 8.939746111210441 | 99.65 | 346 (9) | 0 (0) | 1 |
| resnext50_32x4d | ✅ | 1 | 32819.5 | 0.218632 | 29.7000297000297 | 99.44 | 176 (9) | 0 (0) | 1 |
| textattack/albert-base-v2-imdb | ✅ | 1 | 35501.15 | 0.735051 | 45.41326067211626 | 100.0 | 782 (22) | 0 (0) | 2 |
| tf_efficientnet_lite0.in1k | ✅ | 1 | 122760.64 | 0.459021 | 24.764735017335312 | 99.3 | 149 (9) | 0 (0) | 5 |
| tf_efficientnet_lite1.in1k | ✅ | 1 | 77887.44 | 0.564844 | 18.695083193120208 | 99.56 | 194 (9) | 0 (0) | 5 |
| tf_efficientnet_lite2.in1k | ✅ | 1 | 147601.07 | 0.43213 | 13.222266296443212 | 99.21 | 194 (9) | 0 (0) | 5 |
| twmkn9/albert-base-v2-squad2 | ✅ | 1 | 22227.95 | 0.0400934 | 39.635354736424894 | 99.86 | 783 (23) | 0 (0) | 2 |
| vgg11 | ✅ | 1 | 40769.16 | 0.0835439 | 98.42519685039369 | 99.65 | 33 (8) | 0 (0) | 5 |
| vgg11_bn | ✅ | 1 | 2696.0 | 0.0778108 | 81.9000819000819 | 98.93 | 41 (9) | 0 (0) | 5 |
| vgg13 | ✅ | 1 | 59941.63 | 0.0518468 | 81.43322475570034 | 99.35 | 37 (8) | 0 (0) | 5 |
| vgg13_bn | ✅ | 1 | 6718.52 | 0.0608414 | 72.72727272727272 | 97.31 | 47 (9) | 0 (0) | 5 |
| vgg16 | ✅ | 1 | 2328.64 | 0.0386412 | 70.57163020465774 | 99.44 | 43 (8) | 0 (0) | 5 |
| vgg16_bn | ✅ | 1 | 66344.2 | 0.0388076 | 62.18905472636817 | 98.37 | 56 (9) | 0 (0) | 5 |
| vgg19 | ✅ | 1 | 2638.23 | 0.0312266 | 62.26650062266501 | 99.24 | 49 (8) | 0 (0) | 5 |
| vgg19_bn | ✅ | 1 | 4137.37 | 0.0351847 | 54.975261132490374 | 96.97 | 65 (9) | 0 (0) | 5 |
| wide_resnet101_2 | ✅ | 1 | 92672.72 | 0.0437005 | 16.960651289009498 | 99.2 | 346 (9) | 0 (0) | 1 |
| wide_resnet50_2 | ✅ | 1 | 83865.69 | 0.0805556 | 32.67973856209151 | 98.8 | 176 (9) | 0 (0) | 1 |
| xception71.tf_in1k | ✅ | 1 | 118354.65 | 0.059471 | 4.079468037367928 | 99.21 | 393 (9) | 0 (0) | 0 |
| Autoencoder (conv) | 🚧 | 1 | 3813.67 | 0.491205 | 335.5704697986577 | 100.0 | 9 (3) | 1 (1) | 1 |
| Autoencoder (conv)-train | 🚧 | 1 | 14060.64 | 0.336507 | 155.27950310559004 | 100.0 | 24 (7) | 11 (4) | 0 |
| Autoencoder (linear)-train | 🚧 | 1 | 14020.84 | 0.322749 | 76.62835249042145 | 100.0 | 104 (8) | 14 (2) | 0 |
| Bloom | 🚧 | 1 | 46306.88 | 0.0334381 | 1.453361625439642 | 98.86 | 1405 (27) | 2 (2) | 0 |
| CLIP | 🚧 | 1 | 59223.2 | 0.0273692 | 5.929439667951378 | 99.56 | 1397 (30) | 7 (6) | 2 |
| CLIP-train | 🚧 | 1 | 84239.57 | 0.0427298 | 0.672888643658361 | 100.0 | 3944 (44) | 265 (16) | 5 |
| DETR | 🚧 | 1 | 155683.9 | 0.00907839 | 0.1873753953620842 | 94.02 | 1663 (42) | 9 (6) | 3 |
| DINOv2 | 🚧 | 1 | 32610.64 | 0.0517643 | 14.684287812041118 | 98.99 | 928 (25) | 16 (1) | 2 |
| GLPN-KITTI | 🚧 | 1 | 272913.25 | 0.00816913 | 0.016969772253777514 | 99.77 | 2959 (26) | 22 (2) | 6 |
| GPT-2 | 🚧 | 1 | 28662.68 | 0.391319 | 32.31017770597738 | 99.98 | 745 (29) | 2 (2) | 2 |
| HardNet-train | 🚧 | 1 | 142217.49 | 0.0778058 | 0.12227375381646953 | 100.0 | 867 (21) | 412 (9) | 120 |
| MLPMixer-train | 🚧 | 1 | 39609.41 | 0.0566745 | 0.10479917334412067 | 100.0 | 616 (19) | 100 (5) | 0 |
| Mnist-train | 🚧 | 1 | 17872.95 | 0.289311 | 35.448422545196735 | 100.0 | 46 (15) | 10 (6) | 0 |
| MobileNetSSD | 🚧 | 1 | 246416.16 | 1.41758 | 0.4738078993252976 | 43.63 | 522 (31) | 7 (4) | 32 |
| OpenPose V2-train | 🚧 | 1 | 78649.12 | 0.0949146 | 0.12672761420058953 | 100.0 | 523 (14) | 246 (7) | 6 |
| ResNet18-train | 🚧 | 1 | 43716.08 | 0.172851 | 0.24379897311872523 | 100.0 | 241 (19) | 121 (9) | 0 |
| ResNet50-train | 🚧 | 1 | 75338.18 | 0.0588952 | 0.08341640216715814 | 100.0 | 616 (19) | 318 (9) | 0 |
| SegFormer | 🚧 | 1 | 27513.11 | 0.0242198 | 3.8147554741741057 | 99.86 | 676 (22) | 16 (1) | 4 |
| SegFormer-train | 🚧 | 1 | 176857.68 | 0.01241 | 0.02995324896900917 | 100.0 | 1794 (36) | 156 (12) | 4 |
| U-Net-train | 🚧 | 1 | 92426.66 | 0.0124864 | 0.02529950822815906 | 100.0 | 236 (15) | 122 (8) | 8 |
| Unet-brain-train | 🚧 | 1 | 61166.68 | 0.00956468 | 0.021203808034292494 | 100.0 | 236 (15) | 122 (8) | 8 |
| Unet-carvana-train | 🚧 | 1 | 151978.47 | 0.00584343 | 0.011400970655839697 | 100.0 | 232 (13) | 121 (7) | 8 |
| YOLOS | 🚧 | 1 | 41291.88 | 0.0496856 | 3.5066802258302063 | 98.46 | 952 (27) | 17 (2) | 2 |
| YOLOv3 | 🚧 | 1 | 80197.6 | 0.00506433 | 17.7367860943597 | 98.63 | 250 (7) | 2 (1) | 4 |
| albert/albert-xxlarge-v2 | 🚧 | 1 | 19281.55 | 0.0532404 | 6.837606837606837 | 98.54 | 791 (21) | 24 (1) | 3 |
| dla34.in1k-train | 🚧 | 1 | 51271.09 | 0.104225 | 0.15712696487269573 | 100.0 | 469 (18) | 230 (8) | 17 |
| ese_vovnet19b_dw.ra_in1k-train | 🚧 | 1 | 98798.79 | 0.207539 | 0.31701448439179186 | 100.0 | 383 (25) | 176 (10) | 16 |
| facebook/deit-base-patch16-224 | 🚧 | 1 | 28152.33 | 0.0645301 | 8.766546857192951 | 98.34 | 685 (17) | 1 (1) | 1 |
| facebook/deit-base-patch16-224-train | 🚧 | 1 | 34051.86 | 0.0135963 | 0.8779862507353134 | 100.0 | 1854 (27) | 127 (8) | 2 |
| ghostnet_100.in1k-train | 🚧 | 1 | 164724.83 | 0.665053 | 0.5503092738118822 | 100.0 | 1469 (33) | 562 (12) | 64 |
| ghostnetv2_100.in1k | 🚧 | 1 | 81170.24 | 0.694666 | 8.968609865470851 | 99.65 | 683 (18) | 24 (2) | 68 |
| ghostnetv2_100.in1k-train | 🚧 | 1 | 72813.84 | 0.451241 | 0.2580358824698163 | 100.0 | 2001 (39) | 852 (17) | 68 |
| googlenet | 🚧 | 1 | 147537.26 | 0.449198 | 24.078979051288226 | 99.67 | 214 (15) | 1 (1) | 51 |
| hrnet_w18.ms_aug_in1k | 🚧 | 1 | 122150.89 | 0.203783 | 5.1652892561983474 | 99.65 | 1209 (11) | 31 (1) | 0 |
| hrnet_w18.ms_aug_in1k-train | 🚧 | 1 | 220129.59 | 0.068379 | 0.09654159059993841 | 100.0 | 3998 (21) | 1973 (9) | 0 |
| inception_v4.tf_in1k | 🚧 | 1 | 134730.62 | 0.0824491 | 6.294454585510165 | 99.09 | 495 (11) | 14 (1) | 84 |
| inception_v4.tf_in1k-train | 🚧 | 1 | 194641.97 | 0.0222046 | 0.03174487280623088 | 100.0 | 1851 (24) | 932 (11) | 80 |
| mixer_b16_224.goog_in21k | 🚧 | 1 | 17043.65 | 0.0919227 | 9.143275121148394 | 3.65 | 356 (11) | 1 (1) | 0 |
| mixer_b16_224.goog_in21k-train | 🚧 | 1 | 36026.15 | 0.0168269 | 0.8453299745555678 | 100.0 | 959 (18) | 101 (6) | 0 |
| mobilenetv1_100.ra4_e3600_r224_in1k-train | 🚧 | 1 | 60787.42 | 0.270605 | 0.34602555052665085 | 100.0 | 258 (16) | 164 (7) | 0 |
| regnet_y_128gf | 🚧 | 1 | 344402.67 | 0.00192155 | 0.01659929841405323 | 98.91 | 447 (10) | 3 (1) | 0 |
| ssd300_vgg16 | 🚧 | 1 | 157675.17 | 0.277711 | 0.6674631727194452 | N/A | 332 (30) | 8 (5) | 37 |
| ssdlite320_mobilenet_v3_large | 🚧 | 1 | 187960.71 | 1.06538 | 0.41905528177277146 | 41.24 | 522 (31) | 7 (4) | 32 |
| swin_b | 🚧 | 1 | 131255.31 | 0.0782158 | 3.903810118675827 | 99.54 | 2492 (32) | 110 (2) | 479 |
| swin_s | 🚧 | 1 | 30085.54 | 0.0933425 | 3.5318217136398955 | 99.68 | 2492 (32) | 110 (2) | 479 |
| swin_t | 🚧 | 1 | 127891.92 | 0.175615 | 7.341065922771986 | 99.76 | 1238 (32) | 50 (2) | 227 |
| swin_v2_b | 🚧 | 1 | 155984.62 | 0.0539692 | 2.893518518518518 | 23.56 | 3140 (40) | 158 (3) | 473 |
| swin_v2_s | 🚧 | 1 | 29427.15 | 0.108186 | 3.4775351231047433 | 36.71 | 3140 (40) | 158 (3) | 473 |
| swin_v2_t | 🚧 | 1 | 97601.96 | 0.201151 | 6.237135907191417 | 54.34 | 1562 (40) | 74 (3) | 221 |
| tf_efficientnet_lite0.in1k-train | 🚧 | 1 | 136011.23 | 0.292251 | 0.13154917571286498 | 100.0 | 452 (17) | 285 (8) | 5 |
| tf_efficientnet_lite1.in1k-train | 🚧 | 1 | 142142.64 | 0.230875 | 0.10081011004431613 | 100.0 | 587 (17) | 370 (8) | 5 |
| tf_efficientnet_lite2.in1k-train | 🚧 | 1 | 124647.59 | 0.192633 | 0.07623084223646046 | 100.0 | 587 (17) | 370 (8) | 5 |
| tf_efficientnet_lite3.in1k | 🚧 | 1 | 108309.6 | 0.353552 | 3.7418147801683816 | 99.15 | 221 (9) | 5 (1) | 5 |
| tf_efficientnet_lite3.in1k-train | 🚧 | 1 | 114089.02 | 0.120481 | 0.05423225832343143 | 100.0 | 668 (17) | 426 (9) | 5 |
| tf_efficientnet_lite4.in1k | 🚧 | 1 | 155553.34 | 0.188086 | 2.314814814814815 | 99.21 | 275 (9) | 6 (1) | 5 |
| tf_efficientnet_lite4.in1k-train | 🚧 | 1 | 140577.17 | 0.0684103 | 0.03741863784932637 | 100.0 | 830 (17) | 529 (9) | 5 |
| vit_b_16 | 🚧 | 1 | 33951.28 | 0.064918 | 6.692992436918547 | 99.52 | 552 (17) | 1 (1) | 1 |
| vit_b_32 | 🚧 | 1 | 18068.67 | 0.185095 | 6.989097008666479 | 98.73 | 552 (17) | 1 (1) | 1 |
| vit_h_14 | 🚧 | 1 | 695117.12 | 0.00130755 | 0.3776506353971941 | 98.14 | 1452 (17) | 1 (1) | 1 |
| vit_l_16 | 🚧 | 1 | 56903.86 | 0.0196752 | 3.623844899438304 | 99.73 | 1092 (17) | 1 (1) | 1 |
| vit_l_32 | 🚧 | 1 | 38114.73 | 0.0561787 | 4.78423117405033 | 99.06 | 1092 (17) | 1 (1) | 1 |
| xception71.tf_in1k-train | 🚧 | 1 | 150427.21 | 0.0179414 | 0.01836345950682726 | 100.0 | 1378 (18) | 806 (7) | 0 |
| FLAN-T5 | ❌ | N/A | N/A | 0.231409 | N/A | N/A | 20020 (38) | N/A | N/A |
| Falcon-7B | ❌ | N/A | N/A | 0.0159877 | N/A | N/A | 2600 (27) | N/A | N/A |
| GPTNeo | ❌ | N/A | N/A | 0.0948277 | N/A | N/A | 2733 (35) | N/A | N/A |
| Llama | ❌ | N/A | N/A | 0.00554626 | N/A | N/A | 3690 (35) | N/A | N/A |
| OPT | ❌ | N/A | N/A | 0.0400944 | N/A | N/A | 4003 (32) | N/A | N/A |
| Stable Diffusion V2 | ❌ | N/A | N/A | 0.00134308 | N/A | N/A | 1870 (29) | N/A | N/A |
| ViLT | ❌ | N/A | N/A | 0.0625589 | N/A | N/A | 766 (29) | N/A | N/A |
| Whisper | ❌ | N/A | N/A | 0.00319721 | N/A | N/A | 4310 (21) | N/A | N/A |
| YOLOv5 | ❌ | N/A | N/A | 0.0541112 | N/A | N/A | 236 (13) | N/A | N/A |
| codegen | ❌ | N/A | N/A | 0.141725 | N/A | N/A | 9183 (37) | N/A | N/A |
| speecht5-tts | ❌ | N/A | N/A | 0.0190955 | N/A | N/A | 6942 (40) | N/A | N/A |
| t5-base | ❌ | N/A | N/A | 0.161917 | N/A | N/A | 14681 (38) | N/A | N/A |
| t5-large | ❌ | N/A | N/A | 0.0217367 | N/A | N/A | 22696 (38) | N/A | N/A |
| t5-small | ❌ | N/A | N/A | 0.392642 | N/A | N/A | 6118 (38) | N/A | N/A |
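
To compare the two throughput columns for a given model, divide the compiled value by the original one. The short sketch below does this for three rows copied from the table (all batch size 1); the `speedup` helper is illustrative only. A ratio below 1 means the compiled model was slower in that run.

```python
# (model, original inferences/s, compiled inferences/s), copied from the table above
rows = [
    ("ResNet18", 0.420679, 73.20644216691069),
    ("vgg16", 0.0386412, 70.57163020465774),
    ("MobileNetSSD", 1.41758, 0.4738078993252976),
]


def speedup(original_ips: float, compiled_ips: float) -> float:
    """Ratio of compiled to original throughput; values below 1 mean a slowdown."""
    return compiled_ips / original_ips


for name, original, compiled in rows:
    print(f"{name}: {speedup(original, compiled):.2f}x")
```

The ratio only relates the two columns as reported; it says nothing about first-run latency, which is listed separately.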

Explanation of Metrics

Model: Name of the model.
Status: Indicates whether the model is:

  • ✅ End-to-end on device: All PyTorch operations have been converted to TT-NN operations.
  • 🚧 Compiled: The converted model runs, but some operations still fall back to PyTorch. This may be due to an unsupported operation or configuration.
  • ❌ Traced: The model does not run, but its PyTorch operations are traced for future development. This may indicate a temporary incompatibility with a compiler pass.

Batch: Batch size used for inference.
Compiled First Run (ms): Time until the first compiled run finishes (ms), including compilation time and cache warm-up.
Original Throughput (Inferences Per Second): Execution throughput (in inferences per second) of the model before conversion.
Compiled Throughput (Inferences Per Second): Execution throughput (in inferences per second) of the model after conversion, once caches are warm.
Accuracy (%): Model accuracy on a predefined test dataset after conversion.
Torch Ops Before (Unique Ops): The total number of operations used by the model in the original Torch implementation. The number in parentheses is the number of unique ops.
Torch Ops Remain (Unique Ops): The total number of Torch operations that remain unconverted after conversion to TT-NN. The number in parentheses is the number of unique ops.
To/From Device Ops: The number of to/from_device operations (data transfers to or from the device).
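
The relationship between the op-count columns and the status can be sketched in a few lines. The helpers below (`parse_ops`, `status_from_remaining`, `converted_fraction`) are hypothetical and not part of the project; they assume cells formatted like `1465 (22)` and treat `N/A` in the remaining-ops column as a traced-only model, which is how the table reads.

```python
import re
from typing import Optional, Tuple

# Matches cells such as "1465 (22)": total ops, then unique ops in parentheses.
OPS_PATTERN = re.compile(r"^\s*(\d+)\s*\((\d+)\)\s*$")


def parse_ops(cell: str) -> Optional[Tuple[int, int]]:
    """Parse 'total (unique)' into a tuple; return None for cells such as 'N/A'."""
    match = OPS_PATTERN.match(cell)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def status_from_remaining(remain_cell: str) -> str:
    """Map the 'Torch Ops Remain' cell to a status label (assumed mapping)."""
    ops = parse_ops(remain_cell)
    if ops is None:
        return "traced"
    total, _unique = ops
    return "end-to-end on device" if total == 0 else "compiled"


def converted_fraction(before_cell: str, remain_cell: str) -> Optional[float]:
    """Fraction of Torch ops converted, or None when a count is unavailable."""
    before, remain = parse_ops(before_cell), parse_ops(remain_cell)
    if before is None or remain is None or before[0] == 0:
        return None
    return 1.0 - remain[0] / before[0]


# Example with the CLIP-train row: 3944 (44) ops before, 265 (16) remaining.
print(status_from_remaining("265 (16)"))
print(f"{converted_fraction('3944 (44)', '265 (16)'):.1%} of ops converted")
```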

Contributing

Whether you are new to Tenstorrent hardware or an experienced developer, there are many ways to contribute.

Getting Started

Start with our high-level Contribution guide. You can find more information here:

We encourage contributions and offer 🤑 Bounties for some issues.

Development Environment

To get started with development, you'll need a Wormhole or Blackhole Tenstorrent accelerator card.

Install the development dependencies:

pip install -r requirements-dev.txt
pip install -e .

You can build the wheel file with:

python -m build

Project Structure

  • torch_ttnn/: Main package directory containing the core implementation
  • tests/: Test files for the project including model suites. We use pytest as our testing framework.
  • tools/: Development and utility scripts
  • docs/: Project documentation and reports
  • demo/: Example code and usage demonstrations

Questions and Support

If you have questions or need help getting started, please:

  1. Review the existing documentation
  2. Ask PyTorch TT-NN DeepWiki or TT-Metal DeepWiki
  3. Ask on Discord
  4. Open an issue on GitHub
