## CPU Threading and TorchScript Inference

PyTorch allows using multiple CPU threads during `TorchScript` model inference. The following figure shows the different levels of parallelism found in a typical application.

<img src="https://pytorch.org/docs/stable/_images/cpu_threading_torchscript_inference.svg"/>

One or more inference threads execute a model's forward pass on the given inputs. Each inference thread invokes a `JIT` interpreter that executes the ops of a model inline, one by one. A model can use the `fork` TorchScript primitive to launch an asynchronous task. Forking several operations at once results in tasks that are executed in parallel. The `fork` operator returns a `Future` object, which can be used to synchronize on the result later, for example:

```python
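import torch

# Weights are passed as arguments: a free scripted function has no `self`.
@torch.jit.script
def compute_z(x: torch.Tensor, w_z: torch.Tensor) -> torch.Tensor:
    return torch.mm(x, w_z)

@torch.jit.script
def forward(x: torch.Tensor, w_y: torch.Tensor, w_z: torch.Tensor) -> torch.Tensor:
    # launch compute_z asynchronously:
    fut = torch.jit.fork(compute_z, x, w_z)
    # execute the next operation in parallel to compute_z:
    y = torch.mm(x, w_y)
    # wait for the result of compute_z:
    z = torch.jit.wait(fut)
    return y + z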
```

## Tuning the Number of Threads

The following simple script shows how the runtime of a matrix multiplication changes with the number of threads:

```python
import torch
import timeit

runtimes = []
threads = [1] + [t for t in range(2, 49, 2)]
for t in threads:
    torch.set_num_threads(t)
    r = timeit.timeit(
        setup="import torch; x = torch.randn(1024, 1024); y = torch.randn(1024, 1024)",
        stmt="torch.mm(x, y)",
        number=100,
    )
    runtimes.append(r)
```
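
To turn such measurements into a concrete setting, one option is to pick the thread count with the lowest measured runtime. The sketch below is only a rough illustration: the helper `time_mm`, the candidate list, and the small matrix size and repeat count (chosen so it runs quickly) are assumptions, and the best value will vary by machine and workload.

```python
import timeit

import torch


def time_mm(num_threads: int, size: int = 256, repeats: int = 10) -> float:
    """Time `repeats` matrix multiplications using `num_threads` threads."""
    previous = torch.get_num_threads()
    torch.set_num_threads(num_threads)
    try:
        x = torch.randn(size, size)
        y = torch.randn(size, size)
        torch.mm(x, y)  # warm-up call, excluded from the timing
        return timeit.timeit(lambda: torch.mm(x, y), number=repeats)
    finally:
        # Restore the thread count that was active before the measurement.
        torch.set_num_threads(previous)


candidates = [1, 2, 4]  # hypothetical thread counts to compare
timings = {t: time_mm(t) for t in candidates}
best = min(timings, key=timings.get)
print(f"fastest thread count among {candidates}: {best}")
```

Timings this short are noisy, so repeating the measurement a few times before settling on a value is a reasonable precaution.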