### Practice: Parameter Efficient Fine-Tuning
In this notebook, you're gonna fine-tune large language models within limited GPU memory.

In [1]:
%pip install --quiet transformers==4.34.1 accelerate==0.24.0 sentencepiece==0.1.99 optimum==1.13.2 peft==0.5.0 bitsandbytes==0.41.2.post2

import torch
import torch.nn as nn
import torch.nn.functional as F

import transformers
from tqdm.auto import tqdm, trange
assert torch.cuda.is_available(), "you need cuda for this part"
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')


In [2]:
!pip install protobuf==3.20.* safetensors


In [None]:
model_name = 'Enoch/llama-7b-hf'

# loading Llama tokenizer ...
tokenizer = transformers.LlamaTokenizer.from_pretrained(model_name)  # tokenizers run on CPU; no device_map needed
tokenizer.pad_token_id = tokenizer.eos_token_id

# ... and the model itself
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map='auto',
    low_cpu_mem_usage=True,
    offload_state_dict=True,
    load_in_4bit=True,
    torch_dtype=torch.float32,  # weights are 4-bit; layernorms and activations are fp32
)
for param in model.parameters():
    param.requires_grad=False

model.gradient_checkpointing_enable()  # only store a small subset of activations, re-compute the rest.
model.enable_input_require_grads()     # override an implementation quirk in gradient checkpoints that disables backprop unless inputs require grad
# more on gradient checkpointing: https://pytorch.org/docs/stable/checkpoint.html https://arxiv.org/abs/1604.06174



### Prompt tuning: the story of a fox (2 pts)

![img](https://i.imgur.com/Ux3qQAu.png) (source: theodd1souts.fandom.com)

In [None]:
prompt = 'A quick brown fox'
batch = tokenizer(prompt, return_tensors='pt', return_token_type_ids=False).to(device)

for i in range(10):
    next_token = model(**batch).logits[0, -1].argmax(-1).reshape(1, 1)
    batch['input_ids'] = torch.cat([batch['input_ids'], next_token], dim=-1)
    batch['attention_mask'] = torch.cat([batch['attention_mask'], torch.ones_like(next_token)], dim=-1)

print("\nOutput:", tokenizer.decode(batch['input_ids'][0].cpu().numpy().tolist()))


Output: <s>A quick brown fox jumps over the lazy dog.
A quick


What a blatant lie! This particular fox assures you that it didn't in fact jump over the lazy dog. No, sir! The fox was just minding its own business. __Your task is to train the model to tell the truth: no dog was jumped over today.__

In [None]:
the_truth = "A quick brown fox did not jump over the lazy dog. Besides, that dog deserved it anyway!"
batch = tokenizer(the_truth, return_tensors='pt', return_token_type_ids=False).to(device)
outputs = model(**batch)

next_word_logits = outputs.logits[:, :-1]
true_next_tokens = batch['input_ids'][:, 1:]
loss = F.cross_entropy(next_word_logits.flatten(0, 1), true_next_tokens.flatten(0, 1))

print("Loss:", loss)

Loss: tensor(3.0725, device='cuda:0', grad_fn=<NllLossBackward0>)


Except, we can't train the entire model - the gradients alone would take about 28GB in float32 (roughly 7B parameters x 4 bytes each). Instead, let's run [prompt tuning](https://arxiv.org/abs/2104.08691).

![img](https://i.imgur.com/VwNNKnb.png)
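
The memory argument can be checked with a few lines of arithmetic. This is a rough sketch, assuming a rounded count of 7 billion parameters, the Llama-7B hidden size of 4096, and the 16 prompts used in this notebook; `gradient_bytes` is a hypothetical helper.

```python
BYTES_PER_FLOAT32 = 4


def gradient_bytes(num_trainable_params: int) -> int:
    """Memory needed to hold one float32 gradient per trainable parameter."""
    return num_trainable_params * BYTES_PER_FLOAT32


full_model_params = 7_000_000_000    # rounded Llama-7B parameter count (assumption)
num_prompts, hidden_size = 16, 4096  # 16 learnable prompts, each as wide as the hidden size
prompt_params = num_prompts * hidden_size

full_gb = gradient_bytes(full_model_params) / 1e9
prompt_kib = gradient_bytes(prompt_params) / 1024

print(f"full fine-tuning gradients: {full_gb:.0f} GB")
print(f"prompt tuning gradients:    {prompt_kib:.0f} KiB")
```

The gap only covers gradients; optimizer state (two extra float32 tensors per parameter for Adam) would widen it further.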


In [None]:
class WordEmbeddingsWithLearnedPrompts(nn.Module):
    """
    To perform prompt tuning, you will need to replace the model's original word embeddings with a layer - THIS layer
    - that inserts trainable prompts instead of the first N token embeddings.
    """

    def __init__(self, word_embeddings: nn.Embedding, num_prompts: int):
        super().__init__()
        self.original_word_embeddings = word_embeddings
        self.num_prompts = num_prompts
        self.learnable_prompts = nn.Parameter(
            torch.randn(1, num_prompts, word_embeddings.embedding_dim), requires_grad=True
        )

    def forward(self, input_ids: torch.LongTensor):
        # input_ids shape: [batch_size, seq_length]
        assert input_ids.dtype == torch.int64
        assert input_ids.shape[1] > self.num_prompts
        assert torch.all(input_ids[:, :self.num_prompts] == tokenizer.pad_token_id).item(), (
            "Don't forget to prepend num_prompts PAD tokens to input_ids"
        )

        # Embed the input_ids using the original word embeddings
        input_embeddings = self.original_word_embeddings(input_ids)  # Shape: [batch_size, seq_length, embedding_dim]

        # Replace the first num_prompts token embeddings with the learnable prompts
        batch_size = input_ids.shape[0]
        learnable_prompts_expanded = self.learnable_prompts.expand(batch_size, -1, -1)  # Shape: [batch_size, num_prompts, embedding_dim]
        remaining_embeddings = input_embeddings[:, self.num_prompts:, :]  # Shape: [batch_size, seq_length - num_prompts, embedding_dim]

        # Concatenate learnable prompts with the embeddings of the remaining tokens
        output_embeddings = torch.cat([learnable_prompts_expanded, remaining_embeddings], dim=1)

        return output_embeddings


In [None]:
num_prompts = 16
test_emb_layer = WordEmbeddingsWithLearnedPrompts(model.model.embed_tokens, num_prompts=num_prompts).to(device)
test_input_ids = tokenizer("a cat sat on a mat", return_tensors='pt')['input_ids'].to(device)

space_for_prompts = torch.full([len(test_input_ids), num_prompts], fill_value=tokenizer.pad_token_id,
                               dtype=torch.int64, device=device)
test_inputs_with_prompts = torch.cat([space_for_prompts, test_input_ids], dim=1)

with torch.cuda.amp.autocast():
  test_prompt_embeddings = test_emb_layer(test_inputs_with_prompts)

assert test_prompt_embeddings.shape[:2] == test_inputs_with_prompts.shape
assert test_prompt_embeddings.shape[-1] == model.config.hidden_size
assert torch.allclose(test_prompt_embeddings[:, :num_prompts], test_emb_layer.learnable_prompts.float())
assert torch.allclose(test_prompt_embeddings[:, num_prompts:], model.model.embed_tokens(test_input_ids).float())
print("Looks legit!")

Looks legit!




__Now that it works,__ let's inject learnable prompts into the main model and teach it about foxes.

In [None]:
assert isinstance(model.model.embed_tokens, nn.Embedding), "you have already replaced the embedding layer. If the replacement is broken, please reload the model"

model.model.embed_tokens = WordEmbeddingsWithLearnedPrompts(model.model.embed_tokens, num_prompts=num_prompts).to(device)

opt = torch.optim.Adam([model.model.embed_tokens.learnable_prompts], lr=0.01)

In [None]:
the_truth = "A quick brown fox did not jump over the lazy dog. Besides, that dog deserved it anyway!"
batch = tokenizer(the_truth, return_tensors='pt', return_token_type_ids=False).to(device)
space_for_prompts = torch.full([len(batch['input_ids']), num_prompts], fill_value=tokenizer.pad_token_id,
                               dtype=torch.int64, device=device)
batch['input_ids'] = torch.cat([space_for_prompts, batch['input_ids']], dim=1)
batch['attention_mask'] = torch.cat([torch.ones_like(space_for_prompts), batch['attention_mask']], dim=1)

# Define optimizer for the learnable prompts
opt = torch.optim.Adam([model.model.embed_tokens.learnable_prompts], lr=0.01)

# Training loop
num_epochs = 100  # Maximum number of epochs to train
loss_threshold = 0.1  # Desired loss value

for epoch in range(num_epochs):
    # Forward pass
    outputs = model(**batch)
    next_word_logits = outputs.logits[:, num_prompts : -1, :]  # Exclude prompt logits and last position
    true_next_tokens = batch['input_ids'][:, num_prompts + 1:]  # Exclude prompt tokens and shift by one

    # Compute loss
    loss = F.cross_entropy(next_word_logits.flatten(0, 1), true_next_tokens.flatten(0, 1))

    # Backward pass and optimization
    opt.zero_grad()
    loss.backward()
    opt.step()

    # Print loss for monitoring
    print(f"Epoch {epoch + 1}/{num_epochs}, Loss: {loss.item()}")

    # Early stopping condition
    if loss.item() <= loss_threshold:
        print("Loss threshold reached. Stopping training.")
        break
else:
    print("Maximum epochs reached without meeting loss threshold.")


Epoch 1/100, Loss: 7.437390327453613
Epoch 2/100, Loss: 6.6080098152160645
Epoch 3/100, Loss: 6.18166971206665
Epoch 4/100, Loss: 5.830872058868408
Epoch 5/100, Loss: 5.530222415924072
Epoch 6/100, Loss: 5.272552967071533
Epoch 7/100, Loss: 5.024826526641846
Epoch 8/100, Loss: 4.767119884490967
Epoch 9/100, Loss: 4.50115442276001
Epoch 10/100, Loss: 4.23823881149292
Epoch 11/100, Loss: 3.9841089248657227
Epoch 12/100, Loss: 3.739558458328247
Epoch 13/100, Loss: 3.503448486328125
Epoch 14/100, Loss: 3.2667531967163086
Epoch 15/100, Loss: 3.0289087295532227
Epoch 16/100, Loss: 2.799907684326172
Epoch 17/100, Loss: 2.5797531604766846
Epoch 18/100, Loss: 2.3631536960601807
Epoch 19/100, Loss: 2.155905246734619
Epoch 20/100, Loss: 1.9620521068572998
Epoch 21/100, Loss: 1.7785578966140747
Epoch 22/100, Loss: 1.6021579504013062
Epoch 23/100, Loss: 1.4293497800827026
Epoch 24/100, Loss: 1.2622328996658325
Epoch 25/100, Loss: 1.105930209159851
Epoch 26/100, Loss: 0.9591464400291443
Epoch 27/100

In [None]:
# Final loss assertion
assert loss.item() <= loss_threshold, "Training did not reduce loss to the desired threshold."
print("Good job!")

Good job!


In [None]:
prompt = 'A quick brown fox'
batch = tokenizer(prompt, return_tensors='pt', return_token_type_ids=False).to(device)
batch['input_ids'] = torch.cat([space_for_prompts, batch['input_ids']], dim=1)
batch['attention_mask'] = torch.cat([torch.ones_like(space_for_prompts), batch['attention_mask']], dim=1)


for i in range(15):
    next_token = model(**batch).logits[0, -1].argmax(-1).reshape(1, 1)
    batch['input_ids'] = torch.cat([batch['input_ids'], next_token], dim=-1)
    batch['attention_mask'] = torch.cat([batch['attention_mask'], torch.ones_like(next_token)], dim=-1)

print("\nOutput:", tokenizer.decode(batch['input_ids'][0, num_prompts:].cpu().numpy().tolist()))

# if you did everything right, the model will deny that the fox jumped over the lazy dog


Output: <s>A quick brown fox did not jump over the lazy dog. Besides, that dog deserved it


### Using HuggingFace PEFT (2 points)

[`peft`](https://huggingface.co/docs/peft/index) is a sister library to `transformers` that lets you apply various __p__arameter __e__fficient __f__ine-__t__uning methods to pre-trained transformers. The library implements prompt tuning, prefix tuning, and several adapter-based techniques under a common interface:



In [None]:
import peft
assert isinstance(model.model.embed_tokens, nn.Embedding), "please reload the model"

peft_config = peft.PromptTuningConfig(task_type=peft.TaskType.CAUSAL_LM, num_virtual_tokens=16)
model = peft.get_peft_model(model, peft_config)  # note: for most peft methods, this line also modifies model in-place
print("Trainable parameters:", sum(p.numel() for p in model.parameters() if p.requires_grad))
print("Total parameters (excluding quantization):", sum(p.numel() for p in model.parameters()))

Trainable parameters: 65536
Total parameters (excluding quantization): 3500478464
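
The total above is about half of what a 7B model should report. A plausible explanation, sketched below, is that each stored element of a 4-bit layer packs two weights, so `numel()` sees only half of them, while the embeddings, `lm_head` and norm weights stay unquantized. The architecture constants are assumed from the public Llama-7B configuration; they are not printed anywhere in this notebook.

```python
# Llama-7B architecture constants (assumed from the public model config)
vocab_size, hidden, intermediate, num_layers = 32000, 4096, 11008, 32
num_virtual_tokens = 16

attention_weights = 4 * hidden * hidden   # q_proj, k_proj, v_proj, o_proj
mlp_weights = 3 * hidden * intermediate   # gate_proj, up_proj, down_proj
quantized_weights = num_layers * (attention_weights + mlp_weights)

# Assumption: two 4-bit weights are packed into each stored element.
packed_elements = quantized_weights // 2

embeddings = 2 * vocab_size * hidden      # embed_tokens and lm_head, assumed unquantized
norms = num_layers * 2 * hidden + hidden  # two norm weights per layer plus the final norm
prompt = num_virtual_tokens * hidden      # trainable prompt embeddings added by PEFT

reported_total = packed_elements + embeddings + norms + prompt
true_total = quantized_weights + embeddings + norms

print("reported by numel():     ", reported_total)
print("actual base-model weights:", true_total)
```

Under these assumptions the sum reproduces the printed total exactly, and the trainable count matches 16 x 4096 = 65536.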


In [None]:
# Your task: optimize the PEFT-wrapped model to achieve next token prediction loss < 0.1, but this time using PEFT
# Please note: you no longer need to prepend PAD tokens, but you still need to skip :num_virtual_tokens: first logits.
# Finally, generate the sentence to make sure that the model learned the truth.
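
The index bookkeeping in the comment above can be sketched without a model. This is a minimal illustration, assuming (as the slicing used in this notebook implies) that the PEFT-wrapped model returns logits for the virtual tokens followed by the real tokens; `aligned_pairs` is a hypothetical helper, not part of `peft`.

```python
def aligned_pairs(num_virtual_tokens: int, num_real_tokens: int):
    """Return (logit_position, target_index) pairs for the next-token loss.

    logit_position indexes the model output, which is num_virtual_tokens longer than input_ids;
    target_index indexes the original input_ids.
    """
    pairs = []
    for t in range(num_real_tokens - 1):
        logit_position = num_virtual_tokens + t  # output at real token t ...
        target_index = t + 1                     # ... predicts real token t + 1
        pairs.append((logit_position, target_index))
    return pairs


pairs = aligned_pairs(num_virtual_tokens=16, num_real_tokens=5)
print(pairs)
```

These pairs correspond to the slices `logits[:, num_virtual_tokens:-1]` and `input_ids[:, 1:]`: both have `num_real_tokens - 1` entries, and the last logit is dropped because it has no target.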

In [None]:
# Training Configuration
num_epochs = 100  # Max number of epochs
loss_threshold = 0.1  # Desired loss threshold
learning_rate = 0.01  # Learning rate

# Define the optimizer over the trainable parameters only (the PEFT prompts)
optimizer = torch.optim.AdamW([p for p in model.parameters() if p.requires_grad], lr=learning_rate)

# Define the ground truth
the_truth = "A quick brown fox did not jump over the lazy dog. Besides, that dog deserved it anyway!"
batch = tokenizer(the_truth, return_tensors="pt", return_token_type_ids=False).to(device)

# Training Loop
for epoch in range(num_epochs):
    # Forward pass
    outputs = model(**batch)

    # Skip logits for virtual tokens and the last token
    next_word_logits = outputs.logits[:, peft_config.num_virtual_tokens:-1, :]  # Skip virtual tokens
    true_next_tokens = batch['input_ids'][:, 1:]  # Shift ground truth tokens by one

    # Compute the loss
    loss = F.cross_entropy(
        next_word_logits.reshape(-1, next_word_logits.size(-1)),  # Flatten logits
        true_next_tokens.reshape(-1)  # Flatten ground truth tokens
    )

    # Backpropagation
    optimizer.zero_grad()  # Reset gradients
    loss.backward()  # Compute gradients
    optimizer.step()  # Update trainable parameters (PEFT prompts)

    # Print loss for tracking
    print(f"Epoch {epoch + 1}/{num_epochs}, Loss: {loss.item()}")

    # Stop training if loss is below threshold
    if loss.item() < loss_threshold:
        print("Loss threshold reached. Stopping training.")
        break
else:
    print("Maximum epochs reached without meeting the loss threshold.")

Epoch 1/100, Loss: 7.5819621086120605
Epoch 2/100, Loss: 6.866239547729492
Epoch 3/100, Loss: 6.3946990966796875
Epoch 4/100, Loss: 5.9803466796875
Epoch 5/100, Loss: 5.610335826873779
Epoch 6/100, Loss: 5.287990093231201
Epoch 7/100, Loss: 5.007410526275635
Epoch 8/100, Loss: 4.752322196960449
Epoch 9/100, Loss: 4.5118513107299805
Epoch 10/100, Loss: 4.2819504737854
Epoch 11/100, Loss: 4.060739994049072
Epoch 12/100, Loss: 3.846289873123169
Epoch 13/100, Loss: 3.6368141174316406
Epoch 14/100, Loss: 3.43117356300354
Epoch 15/100, Loss: 3.228698968887329
Epoch 16/100, Loss: 3.028930425643921
Epoch 17/100, Loss: 2.831965684890747
Epoch 18/100, Loss: 2.638991355895996
Epoch 19/100, Loss: 2.4516098499298096
Epoch 20/100, Loss: 2.2695798873901367
Epoch 21/100, Loss: 2.0901618003845215
Epoch 22/100, Loss: 1.911406397819519
Epoch 23/100, Loss: 1.7351828813552856
Epoch 24/100, Loss: 1.5655841827392578
Epoch 25/100, Loss: 1.405889630317688
Epoch 26/100, Loss: 1.2578959465026855
Epoch 27/100, Lo

In [None]:
# Final assertion to ensure loss is below threshold
assert loss.item() < loss_threshold, "Training failed to reduce loss below threshold."
print("Training successful! Loss is below 0.1.")

Training successful! Loss is below 0.1.


In [None]:
prompt = "A quick brown fox"
batch = tokenizer(prompt, return_tensors="pt", return_token_type_ids=False).to(device)

# Generate 15 tokens
for i in range(15):
    # Forward pass to get the logits
    outputs = model(**batch)
    next_token = outputs.logits[0, -1].argmax(-1).reshape(1, 1)

    # Append the next token to input_ids
    batch["input_ids"] = torch.cat([batch["input_ids"], next_token], dim=-1)

    # Update the attention_mask to match the new input_ids length
    new_attention_mask = torch.ones_like(next_token, dtype=batch["attention_mask"].dtype).to(device)
    batch["attention_mask"] = torch.cat([batch["attention_mask"], new_attention_mask], dim=-1)

# Decode the generated sequence
# PEFT adds the virtual tokens internally, so `input_ids` holds only real tokens and needs no slicing
decoded_output = tokenizer.decode(batch["input_ids"][0].cpu().numpy().tolist(), skip_special_tokens=True)
print("\nOutput:", decoded_output)



Output: A quick brown fox did not jump over the lazy dog. Besides, that dog deserved it


### Parameter-efficient finetuning with LoRA (2 points)

When training on more serious tasks, you can use low-rank adapters based on the [LoRA paper](https://arxiv.org/pdf/2106.09685.pdf).

The core idea is to add low-rank adapters __in parallel with existing linear layers,__ like this:
<center><img src="https://i.imgur.com/6bQLNiG.png" width=240px></center>
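
In symbols, and assuming the row-vector convention used by `nn.Linear`, an adapted layer with frozen weight $W$ and bias $b$ could be written as:

$$
y = x W^\top + b + x A B, \qquad A \in \mathbb{R}^{d_\text{in} \times r}, \quad B \in \mathbb{R}^{r \times d_\text{out}}, \quad r \ll \min(d_\text{in}, d_\text{out})
$$

Only $A$ and $B$ are trained. Initializing $B$ to zero makes the adapter contribute nothing at the start of training, so the wrapped layer initially behaves exactly like the original one. The LoRA paper additionally scales the low-rank term by $\alpha / r$; the layer implemented in this notebook leaves that factor out.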

In the original LoRA paper, the adapters were only added to attention projection matrices. However, [subsequent works](https://arxiv.org/abs/2305.14314) show that it is useful to adapt FFNs as well. But before we do any training, we need to implement the basic LoRA layer.
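
To get a feel for how small these adapters are, here is a quick count. It is a sketch under assumed sizes: hidden size 4096 for Llama-7B, plus the rank 8 and the Q/K/V-only placement used later in this notebook. `lora_param_count` is a hypothetical helper.

```python
def lora_param_count(in_features: int, out_features: int, rank: int) -> int:
    """Parameters in one adapter: A is (in_features x rank), B is (rank x out_features)."""
    return in_features * rank + rank * out_features


hidden_size = 4096     # Llama-7B hidden size (assumed)
rank = 8
num_adapters = 32 * 3  # 32 decoder layers x (q_proj, k_proj, v_proj)

full_matrix = hidden_size * hidden_size
one_adapter = lora_param_count(hidden_size, hidden_size, rank)
all_adapters = num_adapters * one_adapter

print(f"one frozen projection: {full_matrix:,} weights")
print(f"one rank-{rank} adapter:    {one_adapter:,} weights ({full_matrix // one_adapter}x fewer)")
print(f"all {num_adapters} adapters:       {all_adapters:,} weights")
```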

In [None]:
# re-load the model to remove any previous PEFT tuners
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_name, device_map='auto', low_cpu_mem_usage=True, offload_state_dict=True,
    load_in_4bit=True, torch_dtype=torch.float32,  # weights are 4-bit; layernorms and activations are fp32
)
for param in model.parameters():
    param.requires_grad=False
model.gradient_checkpointing_enable()
model.enable_input_require_grads()


In [10]:
class LoRALayer(nn.Module):
    """Wraps an existing (frozen) linear layer with a LoRA-like adapter."""
    def __init__(self, module: nn.Linear, rank: int):
        super().__init__()
        self.module = module  # pre-trained (frozen) linear layer
        self.adapter_A = nn.Parameter(torch.empty(module.in_features, rank, device=module.weight.device))
        nn.init.kaiming_uniform_(self.adapter_A, a=5 ** 0.5)
        self.adapter_B = nn.Parameter(torch.zeros(rank, module.out_features, device=module.weight.device))

    def forward(self, input):
        # Apply self.module and LoRA adapter, return the sum (self.module outputs + adapter outputs)
        original_output = self.module(input)
        lora_output = input @ self.adapter_A @ self.adapter_B

        return original_output + lora_output

In [None]:
# test your implementation
test_linear = nn.Linear(128, 128)
test_linear.weight.data[...] = torch.eye(128)
test_adapter = LoRALayer(test_linear, rank=8)

assert torch.allclose(test_adapter(torch.ones(1, 1, 128)), test_linear.bias + 1), "please check your forward pass"

test_adapter.adapter_A.data[...] = torch.linspace(0.1, -0.5, 128 * 8).view(128, 8)
test_adapter.adapter_B.data[...] = torch.linspace(0.5, -0.1, 128 * 8).view(8, 128)
test_linear.bias.data[...] = torch.linspace(1., -1., 128)

dummy_loss = F.mse_loss(test_adapter(torch.ones(1, 128) / 128).squeeze(), torch.linspace(-1, 1, 128))
assert torch.allclose(dummy_loss, torch.tensor(1.3711389), rtol=0, atol=1e-4)
dummy_loss.backward()
assert all(w.grad is not None for w in [test_adapter.adapter_A, test_adapter.adapter_B]), "some adapter weights have no grad"
assert torch.allclose(test_adapter.adapter_A.grad.sum(), torch.tensor(-0.60158), rtol=0, atol=1e-4), "bad grad w.r.t. A"
assert torch.allclose(test_adapter.adapter_B.grad.sum(), torch.tensor(0.9931), rtol=0, atol=1e-4), "bad grad w.r.t. B"
# note: bad grad means that your code is different from LoRA paper OR that your code is not autograd-friendly (e.g. no_grad)
del dummy_loss, test_linear, test_adapter
print("All tests passed!")

All tests passed!


### Apply LoRA to the model

The code below applies LoRA adapters on top of Q/K/V linear layers in Llama attention. You may also choose to modify other layers:
* self_attn.o_proj - attention output projection
* mlp.up_proj, mlp.gate_proj, mlp.down_proj - transformer feedforward layers
* lm_head - output LM head

__Note:__ please scroll down for the homework task

In [None]:
lora_rank = 8

for name, module in model.model.layers.named_modules():
    if 'LlamaDecoderLayer' in repr(type(module)):
        module.self_attn.q_proj = LoRALayer(module.self_attn.q_proj, rank=lora_rank).to(device)
        module.self_attn.k_proj = LoRALayer(module.self_attn.k_proj, rank=lora_rank).to(device)
        module.self_attn.v_proj = LoRALayer(module.self_attn.v_proj, rank=lora_rank).to(device)

assert sum(isinstance(module, LoRALayer) for module in model.modules()) == 96  # for Llama-7B
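
To adapt other layers from the list above, the same replacement can be driven by attribute names. The sketch below is one possible approach, not part of the assignment code: `wrap_linear_children` and `ToyMLP` are hypothetical names, and the toy module only mimics the attribute names of a Llama feed-forward block so that the example runs without loading the model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRALayer(nn.Module):
    """Same adapter as in this notebook: a frozen linear layer plus a trainable low-rank branch."""

    def __init__(self, module: nn.Linear, rank: int):
        super().__init__()
        self.module = module
        self.adapter_A = nn.Parameter(torch.empty(module.in_features, rank, device=module.weight.device))
        nn.init.kaiming_uniform_(self.adapter_A, a=5 ** 0.5)
        self.adapter_B = nn.Parameter(torch.zeros(rank, module.out_features, device=module.weight.device))

    def forward(self, input):
        return self.module(input) + input @ self.adapter_A @ self.adapter_B


def wrap_linear_children(root: nn.Module, target_names, rank: int) -> int:
    """Wrap every nn.Linear child whose attribute name is in target_names; return how many were wrapped."""
    # Collect first, then mutate, so the module tree is never modified while it is being traversed.
    to_wrap = [
        (parent, child_name, child)
        for parent in root.modules()
        for child_name, child in parent.named_children()
        if child_name in target_names and isinstance(child, nn.Linear)
    ]
    for parent, child_name, child in to_wrap:
        setattr(parent, child_name, LoRALayer(child, rank))
    return len(to_wrap)


class ToyMLP(nn.Module):
    """Hypothetical stand-in for a Llama feed-forward block (same attribute names, tiny sizes)."""

    def __init__(self, hidden: int = 16, inner: int = 32):
        super().__init__()
        self.gate_proj = nn.Linear(hidden, inner, bias=False)
        self.up_proj = nn.Linear(hidden, inner, bias=False)
        self.down_proj = nn.Linear(inner, hidden, bias=False)

    def forward(self, x):
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))


torch.manual_seed(0)
toy = nn.Sequential(ToyMLP(), ToyMLP())
x = torch.randn(2, 16)
before = toy(x)
num_wrapped = wrap_linear_children(toy, {"gate_proj", "up_proj", "down_proj"}, rank=4)
after = toy(x)
print("wrapped layers:", num_wrapped)
```

On the real model, the analogous call would be something like `wrap_linear_children(model.model.layers, {'gate_proj', 'up_proj', 'down_proj'}, rank=lora_rank)`, assuming the quantized layers are still `nn.Linear` subclasses. The adapter count checked by the assert above would grow accordingly.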

In [None]:
batch = tokenizer("This model wants to share its greatest secret:", return_tensors='pt', return_token_type_ids=False).to(device)
# test a single training step, make sure we get meaningful gradients
with torch.cuda.amp.autocast(dtype=torch.float32):
    out = model.forward(**batch)
    (out.logits.norm() / 100).backward()

for i, module in enumerate(model.modules()):
    if isinstance(module, LoRALayer):
        assert module.adapter_B.grad is not None
        assert module.adapter_B.grad.norm().item() > 0

model.zero_grad(set_to_none=True)
print("Grad check successful, well done!")



Grad check successful, well done!




### (example) How to train your model

The example below shows how to train the LoRA adapters on a dummy dataset. You will need to run a _similar_ training task later.

__Note:__ please scroll down for the homework task

In [None]:
import os
os.environ["WANDB_DISABLED"] = "true"

In [None]:
# checking if the model can learn. Change max_steps for proper training
import datasets
data = datasets.load_dataset("Abirate/english_quotes", split="train[:32]") # 32 lines
data = data.map(lambda samples: tokenizer(samples['quote']), batched=True)
model._hf_peft_config_loaded = True  # silence a warning from HF trainer

trainer = transformers.Trainer(
    model=model, train_dataset=data,
    args=transformers.TrainingArguments(
        per_device_train_batch_size=2, gradient_accumulation_steps=1,
        # note: if you want larger batch size, increase gradient_accumulation_steps
        warmup_steps=250, max_steps=100, learning_rate=2e-4, fp16=True,
        logging_steps=1, output_dir='outputs', report_to='none'),  # the string 'none' disables logging integrations
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
)
# if you see cache warnings, set `model.config.use_cache = False` to silence them. Please re-enable for inference!

trainer.train()

# NOTE: this is just an example! you do not have to wait for this progressbar to finish :)

Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...


| Step | Training Loss |
|------|---------------|
| 1 | 1.8912 |
| 2 | 1.696 |
| 3 | 0.897 |
| 4 | 1.7458 |
| 5 | 1.1678 |
| 6 | 0.7296 |
| 7 | 1.5255 |
| 8 | 1.0632 |
| 9 | 0.6671 |
| 10 | 1.421 |


TrainOutput(global_step=100, training_loss=0.5414391223713756, metrics={'train_runtime': 152.6785, 'train_samples_per_second': 1.31, 'train_steps_per_second': 0.655, 'total_flos': 621258424123392.0, 'train_loss': 0.5414391223713756, 'epoch': 6.25})
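
One detail of the example configuration is worth checking by hand: `warmup_steps=250` is larger than `max_steps=100`. Assuming the Trainer's default linear schedule with warmup, the run ends while the learning rate is still ramping up. The sketch below re-implements that schedule in plain Python; `linear_schedule_lr` is a hypothetical helper, not a `transformers` function.

```python
def linear_schedule_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


base_lr, warmup_steps, max_steps = 2e-4, 250, 100  # values from the example above
lrs = [linear_schedule_lr(step, base_lr, warmup_steps, max_steps) for step in range(max_steps)]
peak_lr = max(lrs)

print(f"peak learning rate reached: {peak_lr:.2e} (target was {base_lr:.0e})")
```

For a longer run such as the 500-step final task, choosing a warmup shorter than `max_steps` would let the schedule actually reach the target rate; the exact value is a tuning choice.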

### Final task: *actually* train the model (10 points)

Your task is to fine-tune the model to _generate python code_. Please use the above examples for inspiration. More specifically,

* __dataset:__ use [codeparrot-clean](https://huggingface.co/datasets/codeparrot/codeparrot-clean) or any other data containing python code. Since you do not need much data for this exercise, it is enough to use a small subset of the `codeparrot-clean` train split
* __preprocessing:__ select python code based on file extensions (.py)  (may skip in case of codeparrot - it is 100% python)
* __short lines:__ please take the first 512 characters of each line
* __adapter type:__ please use LoRA as defined above __plus at least one of:__
   - extra adapter on lm_head
   - extra adapter on MLP components (mlp.*)
   - trainable input embeddings (requires tweaking memory usage)

* __training:__ you do not have to train to convergence. If all goes well, your model should `.generate` code after 500 steps. Please use batch size of at least 4 (4 x 1 x 512 tokens) using `gradient_accumulation_steps=4`.


Note: the peft library also has a LoRA implementation. However, we ask that for this assignment you show at least one complete training run with your own LoRA code.

__Alternative assignment:__ Instead of doing python code, feel free to substitute the task with any other dataset, e.g. your favorite artist or podcast, as long as it's ethical. If you choose your own task, please show examples of what your model learned - or did not learn, akin to the code examples below.

In [3]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoTokenizer
import transformers
from tqdm.auto import tqdm, trange
assert torch.cuda.is_available(), "you need cuda for this part"
import os

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
os.environ["WANDB_DISABLED"] = "true"

In [5]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [6]:
!unzip /content/drive/MyDrive/codeparrot.zip -d ./codeparrot

Archive:  /content/drive/MyDrive/codeparrot.zip
   creating: ./codeparrot/codeparrot/
  inflating: ./codeparrot/codeparrot/data-00000-of-00012.arrow  
  inflating: ./codeparrot/codeparrot/data-00001-of-00012.arrow  
  inflating: ./codeparrot/codeparrot/data-00002-of-00012.arrow  
  inflating: ./codeparrot/codeparrot/data-00003-of-00012.arrow  
  inflating: ./codeparrot/codeparrot/data-00004-of-00012.arrow  
  inflating: ./codeparrot/codeparrot/data-00005-of-00012.arrow  
  inflating: ./codeparrot/codeparrot/data-00006-of-00012.arrow  
  inflating: ./codeparrot/codeparrot/data-00007-of-00012.arrow  
  inflating: ./codeparrot/codeparrot/data-00008-of-00012.arrow  
  inflating: ./codeparrot/codeparrot/data-00009-of-00012.arrow  
  inflating: ./codeparrot/codeparrot/data-00010-of-00012.arrow  
  inflating: ./codeparrot/codeparrot/data-00011-of-00012.arrow  
  inflating: ./codeparrot/codeparrot/dataset_info.json  
  inflating: ./codeparrot/codeparrot/state.json  


In [7]:
from datasets import load_from_disk

data = load_from_disk('/content/codeparrot/codeparrot')

model_name = 'Enoch/llama-7b-hf'

tokenizer = AutoTokenizer.from_pretrained(model_name)  # tokenizers run on CPU; no device_map needed
tokenizer.pad_token = tokenizer.eos_token

# Preprocess: Truncate lines to 512 characters
def preprocess(sample):
    sample['input_text'] = sample['content'][:512]
    return sample

data = data.map(preprocess)
data = data.map(lambda samples: tokenizer(samples['input_text']), batched=True)


In [8]:
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map='auto',
    low_cpu_mem_usage=True,
    load_in_4bit=True,
    torch_dtype=torch.float32,
)
for param in model.parameters():
    param.requires_grad=False

model.gradient_checkpointing_enable()
model.enable_input_require_grads()


In [11]:
class LoRALayer(nn.Module):
    """Wraps an existing (frozen) pre-trained linear layer with a LoRA-like low-rank adapter."""
    def __init__(self, module: nn.Linear, rank: int):
        super().__init__()
        self.module = module  # pre-trained (frozen) linear layer
        self.adapter_A = nn.Parameter(torch.empty(module.in_features, rank, device=module.weight.device))
        nn.init.kaiming_uniform_(self.adapter_A, a=5 ** 0.5)
        self.adapter_B = nn.Parameter(torch.zeros(rank, module.out_features, device=module.weight.device))

    def forward(self, input):
        original_output = self.module(input)
        lora_output = input @ self.adapter_A @ self.adapter_B

        return original_output + lora_output
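
A quick sanity check of the adapter idea, as a minimal sketch on a small CPU-only layer. Because `adapter_B` starts at zero, the wrapped layer should initially reproduce the frozen layer's output exactly, and only the two adapter matrices should be trainable. `TinyLoRALayer` and the 16-to-32 layer size are hypothetical stand-ins for illustration; they are not part of the notebook.

```python
import torch
import torch.nn as nn

class TinyLoRALayer(nn.Module):
    """Hypothetical stand-in that mirrors the LoRALayer cell above (CPU only)."""
    def __init__(self, module: nn.Linear, rank: int):
        super().__init__()
        self.module = module  # frozen base layer
        self.adapter_A = nn.Parameter(torch.empty(module.in_features, rank))
        nn.init.kaiming_uniform_(self.adapter_A, a=5 ** 0.5)
        self.adapter_B = nn.Parameter(torch.zeros(rank, module.out_features))

    def forward(self, input):
        return self.module(input) + input @ self.adapter_A @ self.adapter_B

torch.manual_seed(0)
base = nn.Linear(16, 32)
for p in base.parameters():
    p.requires_grad = False  # freeze the base layer, as the notebook does

wrapped = TinyLoRALayer(base, rank=4)
x = torch.randn(2, 16)

# With adapter_B == 0 the low-rank update contributes nothing at initialization.
same_at_init = torch.allclose(wrapped(x), base(x))

trainable = sum(p.numel() for p in wrapped.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in wrapped.parameters() if not p.requires_grad)

# On the first backward pass only adapter_B receives a non-zero gradient,
# because the gradient reaching adapter_A is multiplied by adapter_B (all zeros).
wrapped(x).sum().backward()
a_grad_is_zero = bool((wrapped.adapter_A.grad == 0).all())
b_grad_nonzero = bool((wrapped.adapter_B.grad != 0).any())

print(same_at_init, trainable, frozen, a_grad_is_zero, b_grad_nonzero)
```

The zero initialization means fine-tuning starts from the pre-trained model's behavior, and `adapter_A` only begins to move once `adapter_B` has left zero.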

In [12]:
lora_rank = 8
for name, module in model.named_modules():
    if 'LlamaDecoderLayer' in repr(type(module)):
        # Apply LoRA to attention projection layers
        module.self_attn.q_proj = LoRALayer(module.self_attn.q_proj, rank=lora_rank).to(model.device)
        module.self_attn.k_proj = LoRALayer(module.self_attn.k_proj, rank=lora_rank).to(model.device)
        module.self_attn.v_proj = LoRALayer(module.self_attn.v_proj, rank=lora_rank).to(model.device)

        # MLP components
        module.mlp.up_proj = LoRALayer(module.mlp.up_proj, rank=lora_rank).to(model.device)
        module.mlp.down_proj = LoRALayer(module.mlp.down_proj, rank=lora_rank).to(model.device)
        module.mlp.gate_proj = LoRALayer(module.mlp.gate_proj, rank=lora_rank).to(model.device)

# LM head
if hasattr(model, "lm_head") and isinstance(model.lm_head, torch.nn.Linear):
    model.lm_head = LoRALayer(model.lm_head, rank=lora_rank).to(model.device)
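
To get a feel for how few parameters these adapters add, here is a back-of-the-envelope count. It is only a sketch: the layer shapes below are assumed LLaMA-7B-like values (hidden size 4096, MLP size 11008, vocabulary 32000, 32 decoder layers) and are not stated in this part of the notebook, so check them against `model.config` before relying on the numbers.

```python
def lora_param_count(in_features: int, out_features: int, rank: int) -> int:
    """Parameters added by one adapter: A is (in, rank) and B is (rank, out)."""
    return rank * (in_features + out_features)

# Assumed LLaMA-7B-like shapes (hypothetical here; verify with model.config).
hidden, intermediate, vocab, num_layers, rank = 4096, 11008, 32000, 32, 8

per_layer = (
    3 * lora_param_count(hidden, hidden, rank)           # q_proj, k_proj, v_proj
    + 2 * lora_param_count(hidden, intermediate, rank)   # up_proj, gate_proj
    + lora_param_count(intermediate, hidden, rank)       # down_proj
)
total = num_layers * per_layer + lora_param_count(hidden, vocab, rank)  # plus lm_head

print(f"{per_layer:,} adapter parameters per decoder layer, {total:,} in total")
```

Under these assumptions the adapters add roughly 18 million trainable parameters, a small fraction of a model with about 7 billion frozen weights.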


In [13]:
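model.config.use_cache = False  # the KV cache is not needed for training and conflicts with gradient checkpointing
model._hf_peft_config_loaded = True  # workaround: Trainer refuses to train a purely quantized model unless it believes adapters are attached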

trainer = transformers.Trainer(
    model=model, train_dataset=data,
    args=transformers.TrainingArguments(
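        per_device_train_batch_size=4, gradient_accumulation_steps=4,
        # note: warmup_steps exceeds max_steps, so the learning rate is still warming up when training stops
        warmup_steps=250, max_steps=100, learning_rate=2e-4, fp16=True,
        logging_steps=1, output_dir='outputs', report_to='none'),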
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
)

trainer.train()


Step,Training Loss
1,1.1611
2,1.3702
3,1.3018
4,1.3452
5,1.4974
6,1.2684
7,1.4107
8,1.3705
9,1.3281
10,1.2019


TrainOutput(global_step=100, training_loss=1.1375883358716965, metrics={'train_runtime': 1251.6833, 'train_samples_per_second': 1.278, 'train_steps_per_second': 0.08, 'total_flos': 1.189255004209152e+16, 'train_loss': 1.1375883358716965, 'epoch': 0.0})
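
The training arguments above interact in a way that is easy to miss, so here is a small sketch of the arithmetic. It assumes the Trainer's default linear schedule (linear warmup followed by linear decay); `linear_schedule_lr` is a hypothetical helper modeled on that schedule, not a function from `transformers`.

```python
def linear_schedule_lr(step: int, base_lr: float, warmup_steps: int, max_steps: int) -> float:
    """Linear warmup to base_lr, then linear decay to zero at max_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (max_steps - step) / max(1, max_steps - warmup_steps))

# Values taken from the TrainingArguments in the cell above.
base_lr, warmup_steps, max_steps = 2e-4, 250, 100
per_device_batch, grad_accum = 4, 4

sequences_per_step = per_device_batch * grad_accum   # sequences per optimizer update (single GPU)
sequences_seen = sequences_per_step * max_steps      # sequences consumed over the whole run
final_lr = linear_schedule_lr(max_steps, base_lr, warmup_steps, max_steps)

print(sequences_per_step, sequences_seen, final_lr)
```

With `warmup_steps=250` and `max_steps=100`, training ends while the learning rate is still ramping up, at about 40% of the nominal `2e-4`. The sequence count also agrees with the reported metrics: 1.278 samples per second over roughly 1252 seconds is about 1600 sequences.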

In [14]:
def generate(prompt_text, model, tokenizer, max_length=128):

    device = next(model.parameters()).device
    model.eval()
    input_ids = tokenizer(prompt_text, return_tensors="pt").input_ids.to(device)
    with torch.no_grad():
        generated_ids = model.generate(input_ids, max_length=max_length)
    generated = tokenizer.decode(generated_ids[0], skip_special_tokens=True)

    return generated

In [15]:
default_model = transformers.AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map='auto',
    low_cpu_mem_usage=True,
    load_in_4bit=True,
    torch_dtype=torch.float32,
)
for param in default_model.parameters():
    param.requires_grad=False



In [16]:
prompts = ['', 'import', 'from', 'while', 'try', 'if', 'for', 'torch']  # feel free to add a few more that are not 100% associated with Python

# <A WHOLE LOT OF YOUR CODE>
# generate baseline samples with the selected prompts before finetuning
# please feel free to use transformers.Trainer (as above) or your custom training code
# after the training concludes, please show examples of text generated by your model. It is expected to look like Python code fragments
# print the generation examples nicely (suggestion: use pandas or HTML) for easier comparison
# note: your LoRA-enhanced model can run generation the same way as the non-trained model (above)

In [17]:
# This template helps to compare generated code samples in pretty table form
# feel free to present your work in other forms

from IPython.display import HTML, display
table_template = """<table style="border:1px solid black" >
  <tr>
    <th style="text-align: center; border:1px solid black">PROMPT</th>
    <th style="text-align: center; border:1px solid black">BEFORE</th>
    <th style="text-align: center; border:1px solid black">AFTER</th>
  </tr>
{}
</table>"""

row_template = '''  <tr>
    <td style="width:20%; border:1px solid black"><pre align="left">`{}`</pre></td>
    <td style="width:40%; border:1px solid black"><pre align="left">{}</pre></td>
    <td style="width:40%; border:1px solid black"><pre align="left">{}</pre></td>
  </tr>'''

rows = []

max_length = 512
for prompt in prompts:
    default_model_generation = generate(prompt, default_model, tokenizer, max_length=max_length)
    finetuned_model_generation = generate(prompt, model, tokenizer, max_length=max_length)
    rows.append(row_template.format(prompt, default_model_generation, finetuned_model_generation))

display(HTML(table_template.format('\n'.join(rows))))



PROMPT,BEFORE,AFTER
``,▶▶ 2019-2020 School Year The 2019-2020 school year is here! We are so excited to welcome our new students and families to the school. We are also excited to welcome back our returning families. We are looking forward to another great year at the school. We are also looking forward to another great year of learning and growing together. We are also looking forward to another great year of learning and growing together. We are also looking forward to another great year of learning and growing together. We are also looking forward to another great year of learning and growing together. We are also looking forward to another great year of learning and growing together. We are also looking forward to another great year of learning and growing together. We are also looking forward to another great year of learning and growing together. We are also looking forward to another great year of learning and growing together. We are also looking forward to another great year of learning and growing together. We are also looking forward to another great year of learning and growing together. We are also looking forward to another great year of learning and growing together. We are also looking forward to another great year of learning and growing together. We are also looking forward to another great year of learning and growing together. We are also looking forward to another great year of learning and growing together. We are also looking forward to another great year of learning and growing together. We are also looking forward to another great year of learning and growing together. We are also looking forward to another great year of learning and growing together. We are also looking forward to another great year of learning and growing together. We are also looking forward to another great year of learning and growing together. We are also looking forward to another great year of learning and growing together. 
We are also looking forward to another great year of learning and growing together. We are also looking forward to another great year of learning and growing together. We are also looking forward to another great year of learning and growing together. We are also looking forward to another great year of learning and growing together. We are also looking forward to another great year of learning and growing together. We are also looking forward to another great year of learning and growing together. We are also looking forward to another great year of learning and growing together. We are also looking forward to another great year of learning and growing together. We are also looking forward to another great year of learning and growing together. We are,"# -*- coding: utf-8 -*- # # Copyright (c) 2015, 2016, 2017, 2018, 2019, 2020, 2021 The OpenStack Foundation. # All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the ""License""); you may # not use this file except in compliance with the License. You may obtain # a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an ""AS IS"" BASIS, WITHOUT # WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the # License for the specific language go into the License. # # See the AUTHORS file for a list of copyright holders. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions are # met: # # * Redistributions of source code must retain the above copyright # notice, this list of conditions and the following disclaimer. 
# * Redistributions in binary form must reproduce the above # copyright notice, this list of conditions and the following # disclaimer in the documentation and/or the source distribution # * Neither the name of the OpenStack Foundation nor the # names of its contributors may be used to endorse or promote # products derived from this software without specific prior # written permission. # # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ""AS IS"" # AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE # IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A"
`import`,import Foundation public extension NSURL {  public var absoluteString: String {  return String(cString: CFBundleGetBundleWithURL(self).UTF8String)  } },import os import sys import time import threading import traceback import logging import logging.handlers import logging.config import logging.config_file import logging.rootlogger import logging.handlers import logging.handlers import logging.config import logging.config_file import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers 
import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers import logging.handlers
`from`,from __future__ import absolute_import from __future__ import division from __future__ import print_function import os import sys from absl import flags from tensorflow.python import pywrap_tensorflow from tensorflow.python.eager import context from tensorflow.python.eager import function from tensorflow.python.eager import test from tensorflow.python.eager import backprop from tensorflow.python.eager import backprop_ops from tensorflow.python.eager import backprop_util from tensorflow.python.eager import backprop_ops_util from tensorflow.python.eager import backprop_ops_util_v2 from tensorflow.python.eager import backprop_util_v2 from tensorflow.python.eager import backprop_util_v3 from tensorflow.python.eager import backprop_util_v4 from tensorflow.python.eager import backprop_util_v5 from tensorflow.python.eager import backprop_util_v6 from tensorflow.python.eager import backprop_util_v7 from tensorflow.python.eager import backprop_util_v8 from tensorflow.python.eager import backprop_util_v9 from tensorflow.python.eager import backprop_util_v10 from tensorflow.python.eager import backprop_util_v11 from tensorflow.python.eager import backprop_util_v12 from tensorflow.python.eager import backprop_util_v13 from tensorflow.python.eager import backprop_util_v14 from tensorflow.python.eager import backprop_util_v15 from tensorflow.python.eager import backprop_util_v16 from tensorflow.python.eager import backprop_util_v17 from tensorflow.python.eager import backprop_util_v18 from tensorflow.python.eager import backprop_util_v19 from tensorflow.python.eager import backprop_util_v20 from tensorflow.python.eager import backprop_util_v21 from tensorflow.python.eager import backprop_util_v22 from tensorflow.python.eager import,from __future__ import absolute_import import logging import os import sys import tempfile import unittest from django.core.management import call_command from django.core.management.base import CommandError from django.core.management.base 
import NoArgsCommand from django.core.management.base import OutputCapture from django.core.management.base import SubCommand from django.core.management.base import UserManager from django.core.management.commands.check import Command as Check from django.core.management.commands.check import CommandError from django.core.management.commands.check import CommandResult from django.core.management.commands.check import DEFAULT_CHECKS from django.core.management.commands.check import DEFAULT_IGNORE_CHECKS from django.core.management.commands.check import DEFAULT_IGNORE_WARNINGS from django.core.management.commands.check import DEFAULT_IGNORE_WARNINGS_IN_SETTINGS from django.core.management.commands.check import DEFAULT_IGNORE_WARNINGS_IN_TEMPLATES from django.core.management.commands.check import DEFAULT_IGNORE_WARNINGS_IN_TEMPLATES_IN_SETTINGS from django.core.management.commands.check import DEFAULT_IGNORE_WARNINGS_IN_TEMPLATES_IN_TEMPLATES from django.core.management.commands.check import DEFAULT_IGNORE_WARNINGS_IN_TEMPLATES_IN_TEMPLATES_IN_SETTINGS from django.core.management.commands.check import DEFAULT_IGNORE_WARNINGS_IN_TEMPLATES_IN_TEMPLATES_IN_TEMPLATES from django.core.management.commands.check import DEFAULT_IGNORE_WARNINGS_IN_TEMPLATES_IN_TEMPLATES_IN_TEMPLATES_IN_SETTINGS from django.core.management.commands.check import DEFAULT_IGNORE_WARNINGS_IN_TEMPLATES_IN_TEMPLATES
`while`,"while(1) while(1) {  // do something } \end{code} Comment: This is not the same as the OP's code. Comment: @Jeffrey: It's the same as the OP's code, except that it's not a function. Comment: @Jeffrey: The OP's code is a function, but it's not a function declaration. Comment: @Jeffrey: The OP's code is a function declaration, but it's not a function. Comment: @Jeffrey: The OP's code is a function declaration, but it's not a function. Comment: @Jeffrey: The OP's code is a function declaration, but it's not a function. Comment: @Jeffrey: The OP's code is a function declaration, but it's not a function. Comment: @Jeffrey: The OP's code is a function declaration, but it's not a function. Comment: @Jeffrey: The OP's code is a function declaration, but it's not a function. Comment: @Jeffrey: The OP's code is a function declaration, but it's not a function. Comment: @Jeffrey: The OP's code is a function declaration, but it's not a function. Comment: @Jeffrey: The OP's code is a function declaration, but it's not a function. Comment: @Jeffrey: The OP's code is a function declaration, but it's not a function. Comment: @Jeffrey: The OP's code is a function declaration, but it's not a function. Comment: @Jeffrey: The OP's code is a function declaration, but it's not a function. Comment: @Jeffrey: The OP's code is a function declaration, but it's not a function. Comment: @Jeffrey: The OP's code is a function declaration, but it's not a function. Comment: @Jeffrey: The OP's code is a function declaration, but it's not a function. Comment: @Jeffrey: The","while (true) {  if (x == 0) {  break;  }  x = x - 1; } \end{code} Comment: I'm not sure what you mean by ""the loop is not executed"". The loop is executed, and the break statement is executed. The loop is not executed again. Comment: @JimGarrison: I'm not sure what you mean by ""the loop is not executed"". The loop is executed, and the break statement is executed. The loop is not executed again. 
Comment: @JimGarrison: I'm not sure what you mean by ""the loop is not executed"". The loop is executed, and the break statement is executed. The loop is not executed again. Comment: @JimGarrison: I'm not sure what you mean by ""the loop is not executed"". The loop is executed, and the break statement is executed. The loop is not executed again. Comment: @JimGarrison: I'm not sure what you mean by ""the loop is not executed"". The loop is executed, and the break statement is executed. The loop is not executed again. Comment: @JimGarrison: I'm not sure what you mean by ""the loop is not executed"". The loop is executed, and the break statement is executed. The loop is not executed again. Comment: @JimGarrison: I'm not sure what you mean by ""the loop is not executed"". The loop is executed, and the break statement is executed. The loop is not executed again. Comment: @JimGarrison: I'm not sure what you mean by ""the loop is not executed"". The loop is executed, and the break statement is executed. The loop is not executed again. Comment: @JimGarrison: I'm not sure what you mean by ""the loop is not executed"". The loop is executed, and the break statement is executed. The loop is not executed again. Comment: @JimGarrison: I'm not sure what you mean by ""the loop is not executed"". The loop is executed, and the break statement is executed. The loop is not executed again. Comment: @JimGarrison: I'm not sure what you mean by ""the loop is not executed""."
`try`,try to find the best solution for your needs. We are a team of professionals with a long experience in the field of web development. We are a team of professionals with a long experience in the field of web development. We are a team of professionals with a long experience in the field of web development. We are a team of professionals with a long experience in the field of web development. We are a team of professionals with a long experience in the field of web development. We are a team of professionals with a long experience in the field of web development. We are a team of professionals with a long experience in the field of web development. We are a team of professionals with a long experience in the field of web development. We are a team of professionals with a long experience in the field of web development. We are a team of professionals with a long experience in the field of web development. We are a team of professionals with a long experience in the field of web development. We are a team of professionals with a long experience in the field of web development. We are a team of professionals with a long experience in the field of web development. We are a team of professionals with a long experience in the field of web development. We are a team of professionals with a long experience in the field of web development. We are a team of professionals with a long experience in the field of web development. We are a team of professionals with a long experience in the field of web development. We are a team of professionals with a long experience in the field of web development. We are a team of professionals with a long experience in the field of web development. We are a team of professionals with a long experience in the field of web development. We are a team of professionals with a long experience in the field of web development. We are a team of professionals with a long experience in the field of web development. 
We are a team of professionals with a long experience in the field of web development. We are a team of professionals with a long experience in the field of web development. We are a team of professionals with a long experience in the field of web development. We are a team of professionals with a long experience in the field of web development. We are a team of professionals with a long experience in the field of web development. We are a team of professionals with a long experience in the,"try:  import unittest2 as unittest except ImportError:  import unittest from . import utils class TestUtils(unittest.TestCase):  def test_get_file_contents(self):  self.assertEqual(utils.get_file_contents('file.txt'), 'file.txt')  self.assertEqual(utils.get_file_contents('file.txt', 'utf-8'), 'file.txt')  self.assertEqual(utils.get_file_contents('file.txt', 'utf-8', 'utf-8'), 'file.txt')  self.assertEqual(utils.get_file_contents('file.txt', 'utf-8', 'utf-8', 'utf-8'), 'file.txt')  self.assertEqual(utils.get_file_contents('file.txt', 'utf-8', 'utf-8', 'utf-8', 'utf-8'), 'file.txt')  self.assertEqual(utils.get_file_contents('file.txt', 'utf-8', 'utf-8', 'utf-8', 'utf-8', 'utf-8'), 'file.txt')  self.assertEqual(utils.get_file_contents('file.txt', 'utf-8', 'utf-8', 'utf-8', 'utf-8', 'utf-8', 'utf-8'), 'file.txt')  self.assertEqual(utils.get_file_contents('file.txt', 'utf-8', 'utf-8', 'utf-8', 'utf-8', 'utf-8', 'utf-8', 'utf-8'), 'file.txt')  self.assertEqual(utils.get_file_contents('file.txt', 'utf-8', 'utf-8', 'utf-8', 'utf-8', 'utf-8', 'utf-8', 'utf-8', 'utf-8'), 'file.txt')  self.assertEqual(utils.get_file_contents('file.txt', 'utf-8', 'utf-8', 'utf-8', 'utf-8', 'utf-8', 'utf-8', 'utf-8', 'utf-8', '"
`if`,"if ( !window.atmosphere ) {  window.atmosphere = {}; } (function () {  var o = atmosphere.util,  atmosphere = atmosphere.atmosphere = function () {  var _isClosed = false,  _isOpening = false,  _isOpen = false,  _isClosing = false,  _isError = false,  _isReady = false,  _isReadyCalled = false,  _isClosedByClient = false,  _isClosedByServer = false,  _isShuttingDown = false,  _isShutdownInProgress = false,  _isShutdownComplete = false,  _webSocket = null,  _webSockets = null,  _isWebSocketOpen = false,  _isWebSocketClosed = false,  _webSocketConnectParams = null,  _webSocketReceiveParams = null,  _webSocketSendParams = null,  _webSocketSendResize = null,  _webSocketReceiveResize = null,  _webSocketSend = null,  _webSocketReceive = null,  _webSocketSendToAll = null,  _webSocketReceiveToAll = null,  _webSocketSendToGroup = null,  _webSocketReceiveToGroup = null,  _webSocketSendToRoom = null,  _webSocketReceiveToRoom = null,  _webSocketSendToUser = null,  _webSocketReceiveToUser = null,  _webSocketSendToUserList = null,  _webSocketReceiveToUserList = null,  _webSocketSendToUserListOfRoom = null,  _webSocketReceiveToUserListOfRoom = null,  _webSocketSendToUserListOfRoomOfUser = null,  _webSocketReceiveToUserListOfRoomOfUser = null,  _webSocketSendToUserListOfRoomOfUserOfUser = null,  _webSocketReceiveToUserListOfRoomOfUserOfUser = null,  _webSocketSendToUserListOfRoomOfUserOfUserOf",if ( ! 
defined( 'ABSPATH' ) ) { 	exit; } /**  * @package WPSEO_Helpers  */ /**  * Class WPSEO_Helpers_Taxonomy_Helper  */ class WPSEO_Helpers_Taxonomy_Helper { 	/**  * @var string  */ 	protected $taxonomy; 	/**  * @var string  */ 	protected $taxonomy_singular; 	/**  * @var string  */ 	protected $taxonomy_plural; 	/**  * @var string  */ 	protected $taxonomy_labels; 	/**  * @var string  */ 	protected $taxonomy_labels_singular; 	/**  * @var string  */ 	protected $taxonomy_labels_plural; 	/**  * @var string  */ 	protected $taxonomy_labels_singular_no_post_type; 	/**  * @var string  */ 	protected $taxonomy_labels_plural_no_post_type; 	/**  * @var string  */ 	protected $taxonomy_labels_singular_no_post_type_no_taxonomy; 	/**  * @var string  */ 	protected $taxonomy_labels_plural_no_post_type_no_taxonomy; 	/**  * @var string  */ 	protected $taxonomy_labels_singular_no_post_type_no_taxonomy_no_labels; 	/**  * @var string  */ 	protected $taxonomy_labels_plural_no_post_type_no_taxonomy_no_labels; 	/**  * @var string  */ 	protected $taxonomy_labels_singular_no_post_type_no_taxonomy_no_labels_no_labels; 	/**  * @var string  */ 	protected $taxonomy_labels_plural_no_post_type_no_taxonomy_no_labels_no_labels
`for`,for the 2019-2020 school year. The application process for the 2019-2020 school year is now open. The application process for the 2019-2020 school year is now open. Please click here to apply. The application process for the 2019-2020 school year is now open. Please click here to apply. The application process for the 2019-2020 school year is now open. Please click here to apply. The application process for the 2019-2020 school year is now open. Please click here to apply. The application process for the 2019-2020 school year is now open. Please click here to apply. The application process for the 2019-2020 school year is now open. Please click here to apply. The application process for the 2019-2020 school year is now open. Please click here to apply. The application process for the 2019-2020 school year is now open. Please click here to apply. The application process for the 2019-2020 school year is now open. Please click here to apply. The application process for the 2019-2020 school year is now open. Please click here to apply. The application process for the 2019-2020 school year is now open. Please click here to apply. The application process for the 2019-2020 school year is now open. Please click here to apply. The application process for the 2019-2020 school year is now open. Please click here to apply. The application process for the 2019-2020 school year is now open. Please click here to apply. The application process for the 2019-2020 school year is now open. Please click here to apply. The application process for the 2019-2020 school year is now open. Please click here to apply. The application process for the 2019-2020 school year is now open. 
Please click here to,"for (var i = 0; i < 10; i++) {  var a = new Array(i);  for (var j = 0; j < i; j++) {  a[j] = j;  }  console.log(a); } // 10 // [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ] // 100 // [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100 ]"
`torch`,"torchbearer 2017-05-18 19:55:25 UTC #1 I’m a newbie to the world of RPGs, and I’m looking for a game that I can play with my wife. We’re both in our 30s, and we’re looking for a game that we can play together. We’re both new to the world of RPGs, and we’re looking for a game that we can play together. We’re both new to the world of RPGs, and we’re looking for a game that we can play together. We’re both new to the world of RPGs, and we’re looking for a game that we can play together. We’re both new to the world of RPGs, and we’re looking for a game that we can play together. We’re both new to the world of RPGs, and we’re looking for a game that we can play together. We’re both new to the world of RPGs, and we’re looking for a game that we can play together. We’re both new to the world of RPGs, and we’re looking for a game that we can play together. We’re both new to the world of RPGs, and we’re looking for a game that we can play together. We’re both new to the world of RPGs, and we’re looking for a game that we can play together. We’re both new to the world of RPGs, and we’re looking for a game that we can play together. We’re both new to the world of RPGs, and we’re looking for a game that we can play together. We’re both new to the world of RPGs, and we’re looking for a game that we can play together. We’re both new to the world of RPGs, and we’re looking for a game that we can play together. We’re both new to the world of RPGs, and we’re looking for a game that we can play together. We’re both new to the world of RPGs, and we’re looking for a game that we can play together. 
We’re both new to the world of RPGs, and we’re looking for a game that","torch.math.Tensor = torch.class('torch.math.Tensor', function(torch) {  function Tensor(size, dtype) {  if (size) {  this.size = size;  this.dtype = torch.getDtype(dtype);  } else {  this.size = torch.Size(1, 1);  this.dtype = torch.FloatTensor;  }  this.data = torch.FloatTensor(this.size);  }  Tensor.prototype.resize = function(size) {  this.size = size;  this.data = this.data.resize(size);  };  Tensor.prototype.resizeAs = function(size) {  this.size = size;  this.data = this.data.resizeAs(size);  };  Tensor.prototype.resizeTo = function(size) {  this.size = size;  this.data = this.data.resizeTo(size);  };  Tensor.prototype.resizeToOrig = function(size) {  this.size = size;  this.data = this.data.resizeToOrig(size);  };  Tensor.prototype.resizeToOrigAndCrop = function(size, offset) {  this.size = size;  this.data = this.data.resizeToOrigAndCrop(size, offset);  };  Tensor.prototype.resizeToOrigAndCropAndReshape = function(size, offset, dims) {  this.size = size;  this.data = this.data.resizeToOrigAndCropAndReshape(size, offset, dims);  };  Tensor.prototype.resizeToOrigAndCropAndReshapeAndTranspose = function(size, offset, dims) {  this.size = size;  this.data = this.data.resizeToOrigAndCropAndReshapeAndTranspose(size, offset, dims);  };  Tensor.prototype.resizeToOrigAndCrop"


If you reach this: congratulations! You've completed everything in this practice session.

If you want to dig deeper, try to implement prompt-tuning (for bonus points!).
You can read more about prompt tuning variants in paper [1](https://arxiv.org/abs/2104.08691) or paper [2](https://arxiv.org/abs/2101.00190). Both versions can be implemented by passing trainable prompts as `model.forward(..., past_key_values=your_prompts)`.
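
As a starting point for the bonus task, here is a minimal sketch of the simpler input-embedding variant (closer to paper [1]): trainable prompt vectors are prepended to the token embeddings, rather than injected through `past_key_values`. `SoftPromptEmbedding` and the toy sizes are hypothetical names and values chosen for illustration.

```python
import torch
import torch.nn as nn

class SoftPromptEmbedding(nn.Module):
    """Hypothetical wrapper: prepends trainable prompt vectors to frozen token embeddings."""
    def __init__(self, word_embeddings: nn.Embedding, num_prompts: int):
        super().__init__()
        self.word_embeddings = word_embeddings
        self.word_embeddings.weight.requires_grad = False  # keep the vocabulary embeddings frozen
        self.prompts = nn.Parameter(
            torch.randn(num_prompts, word_embeddings.embedding_dim) * 0.02
        )

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        token_embeds = self.word_embeddings(input_ids)                         # (batch, seq, dim)
        prompts = self.prompts.unsqueeze(0).expand(input_ids.shape[0], -1, -1)  # (batch, num_prompts, dim)
        return torch.cat([prompts, token_embeds], dim=1)                       # (batch, num_prompts + seq, dim)

torch.manual_seed(0)
embed = nn.Embedding(100, 16)           # toy vocabulary of 100 tokens, 16-dim embeddings
soft = SoftPromptEmbedding(embed, num_prompts=5)

input_ids = torch.randint(0, 100, (2, 7))
inputs_embeds = soft(input_ids)
trainable = [name for name, p in soft.named_parameters() if p.requires_grad]

print(tuple(inputs_embeds.shape), trainable)
```

With a real model, the result would typically be passed as `inputs_embeds`, with the attention mask and labels extended by `num_prompts` positions (labels set to `-100` there so the prompt positions are ignored by the loss).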



### Read more

* How post-training quantization works: https://arxiv.org/abs/2208.07339
* An overview of running large models: https://huggingface.co/docs/accelerate/package_reference/big_modeling
* A general library for different adapter types: https://adapterhub.ml/


### [extra info] Running other models.

This notebook's code can run with other models of similar size, such as [Falcon-7B](https://huggingface.co/tiiuae/falcon-7b), [OPT-6.7B](https://huggingface.co/facebook/opt-6.7b) or [BLOOM-7.1B](https://huggingface.co/bigscience/bloom-7b1). However, they will require minor code tweaks:
1. Change the model name in both `AutoModelForCausalLM.from_pretrained()` __and__ `AutoTokenizer`.
2. In the prompt tuning code, change `model.model.embed_tokens` to refer to the target model's word embeddings. Simply `print(model)` to navigate to them.
3. Change the code that adds LoRA layers, specifically where it picks the transformer block components to wrap, since those components have different names in other models.
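
For the last step, reading through `print(model)` by eye works, but the names can also be collected programmatically. The helper below is a sketch: `linear_leaf_names` and `ToyBlock` are hypothetical, and the toy attribute names are placeholders rather than a claim about any particular architecture.

```python
import torch.nn as nn

def linear_leaf_names(model: nn.Module) -> list:
    """Return the sorted, distinct attribute names of all nn.Linear submodules."""
    return sorted({
        name.split('.')[-1]
        for name, module in model.named_modules()
        if isinstance(module, nn.Linear)
    })

class ToyBlock(nn.Module):
    """Hypothetical stand-in for one transformer block of some other model."""
    def __init__(self):
        super().__init__()
        self.query_key_value = nn.Linear(8, 24)
        self.dense = nn.Linear(8, 8)
        self.norm = nn.LayerNorm(8)  # not a linear layer, so it is not listed

toy = nn.ModuleList([ToyBlock(), ToyBlock()])
print(linear_leaf_names(toy))
```

Running the same helper on a loaded model lists the candidate names to wrap with `LoRALayer` in place of `q_proj`, `k_proj`, and the other LLaMA-specific attributes.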