# PyTorch Inference Using Only the CPU

## Setup

The requirements for this notebook are minimal. Activate your Python virtual environment of choice (Python 3.9+) and run the two `pip install` commands below to install the CPU-only build of PyTorch, Transformers (with its PyTorch extras rather than TensorFlow), and Jupyter.

```console
pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install jupyter 'transformers[torch]'
```

## Load model and tokenizer

In [1]:
# Load model directly
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path="meta-llama/Meta-Llama-3.1-8B-Instruct",
    device_map=torch.device("cpu"),
    torch_dtype=torch.bfloat16
)

# Initial time taken: 30 minutes (downloading from Internet)
# Time taken (after downloading): ~3.25 seconds
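
The model size and the `torch.bfloat16` dtype chosen above give a rough idea of how much RAM the weights need. The sketch below is back-of-the-envelope arithmetic only. It assumes roughly 8 billion parameters (inferred from the "8B" in the model name, not measured) and 2 bytes per bfloat16 value, and it ignores activations, the KV cache, and any loader overhead.

```python
# Rough estimate of the RAM needed just for the model weights.
# ASSUMPTION: ~8 billion parameters, inferred from "8B" in the model name.
approx_params = 8_000_000_000
bytes_per_param = 2  # bfloat16 stores each value in 16 bits

weight_bytes = approx_params * bytes_per_param
weight_gb = weight_bytes / 1e9      # decimal gigabytes
weight_gib = weight_bytes / 2**30   # binary gibibytes

print(f"Weights: ~{weight_gb:.0f} GB (~{weight_gib:.1f} GiB)")
```

Under these assumptions the weights alone take about 16 GB, and loading in `torch.float32` instead would double that figure.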


## Generate tokenized input

In [20]:
model_input = tokenizer(["Come up with a 20 letter English word."], return_tensors="pt")

# Time taken: < 10 ms
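
The notebook does not say how the "Time taken" figures were measured. One way to reproduce such per-cell timings is a small context manager built on `time.perf_counter`. The helper name `timed` is hypothetical and not part of the notebook, and the workload below is a stand-in for a real cell body such as the tokenizer call.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label):
    """Hypothetical helper: time the enclosed block and print the result."""
    stats = {}
    start = time.perf_counter()
    try:
        yield stats
    finally:
        # Record the elapsed wall-clock time even if the block raises.
        stats["seconds"] = time.perf_counter() - start
        print(f"{label}: {stats['seconds'] * 1000:.1f} ms")


# Stand-in workload; in the notebook this would wrap a cell's code.
with timed("sum of squares") as t:
    total = sum(i * i for i in range(100_000))
```

After the block exits, `t["seconds"]` holds the elapsed time, so it can be logged or compared across runs.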

## Generate tokenized output

In [26]:
# Set pad_token_id explicitly to suppress an informational message. If the option is left out,
# generate() reports that it is setting pad_token_id to eos_token_id (128001) for open-ended generation.
model_tokenized_output = model.generate(
    **model_input,
    pad_token_id=128001,
    max_new_tokens=100
)

# Time taken: 13 minutes and 17 seconds
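
The timing in the comment above can be turned into an approximate throughput figure. This is a sketch under one assumption: that generation ran all the way to the `max_new_tokens=100` limit. If the model stopped early, the real per-token time would be higher than computed here.

```python
# Rough throughput implied by the reported generation time.
# ASSUMPTION: generation produced the full max_new_tokens=100 tokens.
elapsed_seconds = 13 * 60 + 17   # 13 minutes and 17 seconds
new_tokens = 100                 # upper bound set by max_new_tokens

seconds_per_token = elapsed_seconds / new_tokens
tokens_per_second = new_tokens / elapsed_seconds

print(f"{seconds_per_token:.2f} s/token, {tokens_per_second:.3f} tokens/s")
```

Under that assumption the run works out to roughly 8 seconds per generated token.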

## Decode the tokenized output and print

In [28]:
outputs = tokenizer.batch_decode(model_tokenized_output, skip_special_tokens=True)
print(outputs[0])

# Time taken: < 10 ms

Come up with a 20 letter English word. Here is a 20 letter word: "pneumonoultramicroscopicsilicovolcanoconiosis".
This is a type of lung disease. It is the longest English word in the Oxford English Dictionary. It was coined by Everett M. Smith, the president of the National Puzzlers' League, in 1935. It refers to a type of lung disease caused by inhaling very fine silica particles.
The word is often used to illustrate the extremes of the English language
