Copyright (c) Meta Platforms, Inc. and affiliates.
This software may be used and distributed according to the terms of the Llama 2 Community License Agreement.

## Quick Start Notebook

This notebook shows how to fine-tune a Llama 2 model on a single GPU (e.g. A10 with 24GB) using 4-bit (NF4) quantization and LoRA.
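To see why quantization matters on a single card, the weight footprint can be estimated from the parameter count alone. The sketch below is back-of-the-envelope arithmetic, assuming roughly 7 billion parameters for the 7B model and ignoring activations, optimizer state, and quantization metadata. `weight_memory_gib` is a hypothetical helper, not part of any library.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3

# Assumption: about 7e9 parameters for the 7B checkpoint
NUM_PARAMS = 7_000_000_000

for label, bits in [("float16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>8}: {weight_memory_gib(NUM_PARAMS, bits):.2f} GiB")
```

Actual usage during training is higher than these figures, since gradients for the LoRA weights, optimizer state, and activations also live on the GPU.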

### Step 0: Install prerequisites and convert checkpoint

The example uses the Hugging Face trainer and model, which means that the checkpoint has to be converted from its original format into the dedicated Hugging Face format.
The conversion can be done by running the `convert_llama_weights_to_hf.py` script provided with the `transformers` package.
Given that the original checkpoint resides under `models/7B`, we can install all requirements and convert the checkpoint with:

In [1]:
# %%bash
# pip install llama-recipes transformers datasets accelerate sentencepiece protobuf==3.20 py7zr scipy peft bitsandbytes fire torch_tb_profiler ipywidgets
# TRANSFORM=`python -c "import transformers;print('/'.join(transformers.__file__.split('/')[:-1])+'/models/llama/convert_llama_weights_to_hf.py')"`
# python ${TRANSFORM} --input_dir models --model_size 7B --output_dir models_hf/7B

In [5]:
%pip install --upgrade --quiet  "unstructured[all-docs]"

Note: you may need to restart the kernel to use updated packages.


### Step 1: Load the model

Point `base_model` to the folder containing the converted model weights.

%%capture
%pip install accelerate peft bitsandbytes transformers trl

In [1]:
import os
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig
from trl import SFTTrainer

In [2]:
# torch and the transformers classes are already imported in the previous cell
import pyarrow as pa
import pandas as pd
from datasets import Dataset

logging.set_verbosity(logging.ERROR)
base_model = "./models_hf/7B"
finetuned_model = './finetuned/policy-llama2-7b'


In [3]:
# Function to tokenize pairs of questions and answers
def tokenize_pairs(examples):
    # Tokenize inputs and outputs
    tokenized_inputs = tokenizer(examples['input'], padding='max_length', truncation=True, max_length=512)
    tokenized_outputs = tokenizer(examples['output'], padding='max_length', truncation=True, max_length=512)
    # Return a dictionary combining both
    return {**tokenized_inputs, **{'labels': tokenized_outputs['input_ids']}}

In [21]:
import json
import pandas as pd

# Paths to the JSON files containing the input/output pairs from all 5 teams
train_file_path = 'merged_file_train.json'
val_file_path = 'merged_file_val.json'
test_file_path = 'merged_file_test.json'

In [22]:
# Read and handle UTF-8 BOM if present
with open(train_file_path, 'r', encoding='utf-8-sig') as file:
    json_data = json.load(file)  # Load the file content as JSON
    
# Convert the JSON data into a pandas DataFrame
train_dataframe = pd.DataFrame(json_data)

# Create a Hugging Face dataset from the pandas DataFrame
train_dataset = Dataset(pa.Table.from_pandas(train_dataframe))

In [23]:
# Read and handle UTF-8 BOM if present
with open(val_file_path, 'r', encoding='utf-8-sig') as file:
    json_data = json.load(file)  # Load the file content as JSON
    
# Convert the JSON data into a pandas DataFrame
val_dataframe = pd.DataFrame(json_data)

# Create a Hugging Face dataset from the pandas DataFrame
val_dataset = Dataset(pa.Table.from_pandas(val_dataframe))

In [24]:
# Read and handle UTF-8 BOM if present
with open(test_file_path, 'r', encoding='utf-8-sig') as file:
    json_data = json.load(file)  # Load the file content as JSON
    
# Convert the JSON data into a pandas DataFrame
test_dataframe = pd.DataFrame(json_data)

# Create a Hugging Face dataset from the pandas DataFrame
test_dataset = Dataset(pa.Table.from_pandas(test_dataframe))
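The three loading cells above repeat the same steps for each split. A minimal sketch of a shared helper follows, assuming each file holds a JSON array of objects with `input` and `output` keys (as the printed samples suggest). `load_qa_records` and the throwaway demo file are hypothetical names introduced for illustration.

```python
import json
import os
import tempfile

def load_qa_records(path, required_keys=("input", "output")):
    """Load a JSON array of records, tolerating a UTF-8 BOM, and check each record's keys."""
    with open(path, "r", encoding="utf-8-sig") as file:
        records = json.load(file)
    for index, record in enumerate(records):
        missing = [key for key in required_keys if key not in record]
        if missing:
            raise ValueError(f"record {index} in {path} is missing keys: {missing}")
    return records

# Demo with a throwaway file that is written with a leading BOM
demo_path = os.path.join(tempfile.mkdtemp(), "demo.json")
with open(demo_path, "w", encoding="utf-8-sig") as file:
    json.dump([{"input": "Q?", "output": "A."}], file)

demo_records = load_qa_records(demo_path)
print(demo_records)
```

Each returned list can be passed to `pd.DataFrame(...)` and then wrapped in a `Dataset` exactly as in the cells above; the key check surfaces malformed records before tokenization rather than during training.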

In [25]:
print("Train dataset : ", train_dataset[0])

Train dataset :  {'input': " How do China's AI regulations, particularly on recommendation algorithms, deep synthesis, and generative AI, focus on information control?", 'output': 'China\'s AI regulations prioritize information control through measures like barring excessive price discrimination in recommendation algorithms, requiring labels on synthetically generated content in deep synthesis, and demanding "true and accurate" data and outputs in generative AI.'}


In [26]:
print("Validation dataset : ", val_dataset[0])

Validation dataset :  {'input': 'How does China formulate AI governance regulations, and what are the key layers in the policy formulation process?', 'output': 'China formulates AI governance regulations through a four-layered policy funnel involving real-world roots, Xi Jinping and CCP ideology, the "world of ideas," and party and state bureaucracies, with regulations often pinballing through these layers in a non-linear fashion.'}


In [27]:
print("Test dataset : ", test_dataset[0])

Test dataset :  {'input': 'What is the trajectory of Chinese AI governance, and what milestone is it approaching?', 'output': 'Chinese AI governance is heading towards drafting a comprehensive national AI law, mirroring the evolution of internet governance regulations, with a potential draft release by late 2023 or 2024 and subsequent revisions involving key stakeholders.'}


In [28]:
compute_dtype = torch.float16

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=False,
)
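The configuration above asks bitsandbytes to store weights as 4-bit codes and to dequantize them to `compute_dtype` for the matrix multiplications. The sketch below illustrates the general idea of block-wise absmax quantization against a 16-entry codebook. It is not the bitsandbytes implementation: NF4 uses a non-uniform codebook, while the uniform codebook here is a stand-in chosen to keep the example short, and all function names are hypothetical.

```python
def make_uniform_codebook(levels=16):
    """16 evenly spaced values in [-1, 1]; a stand-in for the real NF4 codebook."""
    return [-1.0 + 2.0 * i / (levels - 1) for i in range(levels)]

def quantize_block(values, codebook):
    """Scale a block by its absolute maximum, then map each value to the nearest code index."""
    absmax = max(abs(v) for v in values)
    scale = absmax if absmax > 0 else 1.0
    indices = []
    for v in values:
        normalized = v / scale
        nearest = min(range(len(codebook)), key=lambda i: abs(codebook[i] - normalized))
        indices.append(nearest)
    return indices, scale

def dequantize_block(indices, scale, codebook):
    """Recover approximate values from 4-bit indices and the per-block scale."""
    return [codebook[i] * scale for i in indices]

codebook = make_uniform_codebook()
block = [0.12, -0.5, 0.33, 0.04, -0.27, 0.5, 0.0, -0.11]
indices, scale = quantize_block(block, codebook)
restored = dequantize_block(indices, scale, codebook)
max_error = max(abs(a - b) for a, b in zip(block, restored))
print(indices, scale, round(max_error, 4))
```

Each weight is stored as an index that fits in 4 bits plus one shared scale per block, which is where the memory saving comes from; the reconstruction error is bounded by half a codebook step times the block scale.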

In [29]:
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=quant_config,
    #device_map={"": 0, "":1}
    device_map='auto'
)
model.config.use_cache = False
model.config.pretraining_tp = 1

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

### Step 2: Load Auto Tokenizer


In [30]:
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

In [31]:
eval_prompt = """
Answer the following question:
Question: How do China's AI regulations, particularly on recommendation algorithms, deep synthesis, and generative AI, focus on information control?
---
Answer:
"""

model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

model.eval()
with torch.no_grad():
    print(tokenizer.decode(model.generate(**model_input, max_new_tokens=100)[0], skip_special_tokens=True))


Answer the following question:
Question: How do China's AI regulations, particularly on recommendation algorithms, deep synthesis, and generative AI, focus on information control?
---
Answer:

China's AI regulations, particularly on recommendation algorithms, deep synthesis, and generative AI, focus on information control by requiring companies to obtain approval from the government before using these technologies. This is done to ensure that the information being used is accurate and not misleading, and to prevent the spread of false or harmful information.

For example, the Chinese government has implemented regulations that require companies to obtain approval before using recommendation algorithms to suggest products


In [32]:
#With Context
eval_prompt = """
Answer the following question with the given context with succinct summary:
Question: How do China's AI regulations, particularly on recommendation algorithms, deep synthesis, and generative AI, focus on information control?
Context: China\u2019s three most concrete and impactful regulations on algorithms and AI are its 2021\nregulation on recommendation algorithms, the 2022 rules for deep synthesis (synthetically generated content), and the 2023 draft rules on generative AI. Information control is a central goal of all three measures, but they also contain many other notable provisions. The rules for recommendation algorithms bar excessive price discrimination and protect the rights of workers subject to algorithmic scheduling. \nThe deep synthesis regulation requires conspicuous labels be placed on synthetically generated content. And the draft generative AI regulation requires both the training data and model outputs to be \u201ctrue and accurate,\u201d Summary: China is in the midst of rolling out some of the world\u2019s earliest and most detailed regulations governing artificial intelligence (AI). These include measures governing recommendation algorithms\u2014the most omnipresent form of AI deployed on the internet\u2014as well as new rules for synthetically generated images and chatbots in the mold of ChatGPT. \nChina\u2019s emerging AI governance framework will reshape how the technology is built and deployed within China and internationally, impacting both Chinese technology exports and global AI research networks.\nBut in the West, China\u2019s regulations are often dismissed as irrelevant or seen purely through the lens of a geopolitical competition to write the rules for AI. These extremely demanding requirements for generative AI systems have kicked off a particularly active public debate on the draft regulation. At the time of writing, Chinese scholars, companies, and policymakers are actively discussing how to maintain effective content controls without squashing China\u2019s nascent generative AI industry. 
The third paper in this series will dive deep into how this policy debate is playing out in public workshops, academic writing, and corporate lobbying.\nCountries and cultures may differ on the specific content of AI regulations, but they can learn from the content-agnostic structure of the regulations themselves. The above Chinese regulations share three structural similarities: the choice of algorithms as a point of entry; the building of regulatory tools and bureaucratic know-how; and the vertical and iterative approach that is laying the groundwork for a capstone AI law. Three regulations require the deepest analysis: recommendation algorithms, \u201cdeep synthesis,\u201d and generative AI. These interconnected documents contain the most targeted and impactful regulations to date, creating concrete requirements for how algorithms and AI are built and deployed in China. Below is a brief overview of each regulation. The remainder of this paper and subsequent papers will expand on the intellectual roots and key bureaucratic actors behind these regulations.\nProvisions on the Management of Algorithmic Recommendations in Internet Information Services
---
Answer:
"""

model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

model.eval()
with torch.no_grad():
    print(tokenizer.decode(model.generate(**model_input, max_new_tokens=200)[0], skip_special_tokens=True))


Answer the following question with the given context with succinct summary:
Question: How do China's AI regulations, particularly on recommendation algorithms, deep synthesis, and generative AI, focus on information control?
Context: China’s three most concrete and impactful regulations on algorithms and AI are its 2021
regulation on recommendation algorithms, the 2022 rules for deep synthesis (synthetically generated content), and the 2023 draft rules on generative AI. Information control is a central goal of all three measures, but they also contain many other notable provisions. The rules for recommendation algorithms bar excessive price discrimination and protect the rights of workers subject to algorithmic scheduling. 
The deep synthesis regulation requires conspicuous labels be placed on synthetically generated content. And the draft generative AI regulation requires both the training data and model outputs to be “true and accurate,” Summary: China is in the midst of rolling out

### Step 3: Load LoRA configuration

In [33]:
peft_params = LoraConfig(
    lora_alpha=32,
    lora_dropout=0.05,
    r=8,
    bias="none",
    target_modules = ["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
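The number of trainable parameters implied by this LoRA configuration can be worked out by hand. The sketch below assumes the Llama 2 7B shapes (hidden size 4096, 32 decoder layers, square `q_proj` and `v_proj` matrices); `lora_param_count` is a hypothetical helper and the constants should be adjusted for other model sizes.

```python
def lora_param_count(in_features, out_features, rank):
    """LoRA adds A (rank x in_features) and B (out_features x rank) per adapted layer."""
    return rank * in_features + out_features * rank

# Assumptions: Llama 2 7B architecture
HIDDEN_SIZE = 4096
NUM_LAYERS = 32

# Values taken from the LoraConfig above
RANK = 8
LORA_ALPHA = 32
TARGET_MODULES = ["q_proj", "v_proj"]

per_module = lora_param_count(HIDDEN_SIZE, HIDDEN_SIZE, RANK)
trainable = per_module * len(TARGET_MODULES) * NUM_LAYERS
scaling = LORA_ALPHA / RANK  # factor applied to the low-rank update

print(f"trainable LoRA parameters: {trainable:,}")
print(f"LoRA scaling (alpha / r): {scaling}")
```

Under these assumptions only a few million parameters receive gradients, a small fraction of the roughly 7 billion frozen base weights.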

### Step 4: Set training parameters

In [34]:
training_params = TrainingArguments(
    output_dir="./results",
    num_train_epochs=10,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=1,
    optim="paged_adamw_32bit",
    save_steps=25,
    logging_steps=25,
    learning_rate=2e-4,
    weight_decay=0.001,
    fp16=False,
    bf16=False,
    max_grad_norm=0.3,
    max_steps=-1,
    warmup_ratio=0.03,
    group_by_length=True,
    lr_scheduler_type="constant",
    report_to="tensorboard"
)

### Step 5: Set supervised fine-tuning parameters

In [35]:
def formatting_prompts_func(example) -> list:
    output_texts = []
    for i in range(len(example['input'])):
        text = f"### Question: {example['input'][i]}\n ### Answer: {example['output'][i]}"
        output_texts.append(text)
    return output_texts
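To make the batched behaviour of `formatting_prompts_func` concrete, the sketch below repeats the function and runs it on a made-up two-example batch. It also adds `build_inference_prompt`, a hypothetical helper that reuses the same template at generation time; whether matching the training template improves the generations in this notebook is not something the source shows.

```python
def formatting_prompts_func(example) -> list:
    """Turn a batch (dict of lists) into one training string per example."""
    output_texts = []
    for i in range(len(example['input'])):
        text = f"### Question: {example['input'][i]}\n ### Answer: {example['output'][i]}"
        output_texts.append(text)
    return output_texts

def build_inference_prompt(question):
    """Hypothetical helper: the training template with the answer left open."""
    return f"### Question: {question}\n ### Answer:"

# Made-up batch in the dict-of-lists layout the trainer passes in
batch = {
    "input": ["What is LoRA?", "What is NF4?"],
    "output": ["A low-rank adapter method.", "A 4-bit data type."],
}

texts = formatting_prompts_func(batch)
for text in texts:
    print(repr(text))
print(repr(build_inference_prompt("What is LoRA?")))
```

The function receives whole batches, so it indexes into the `input` and `output` lists rather than reading a single record.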

In [36]:
trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset = val_dataset,
    peft_config=peft_params,
    #dataset_text_field="input",
    formatting_func=formatting_prompts_func,
    max_seq_length=512,
    tokenizer=tokenizer,
    args=training_params,
    packing=False,
)

Map:   0%|          | 0/262 [00:00<?, ? examples/s]

Map:   0%|          | 0/87 [00:00<?, ? examples/s]

Detected kernel version 4.18.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.


In [37]:
!nvidia-smi

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Sun Mar  3 14:40:38 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla V100-SXM2-32GB           On  | 00000000:3B:00.0 Off |                    0 |
| N/A   33C    P0              66W / 300W |  11545MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

### Step 6: Train Model


In [38]:
torch.cuda.empty_cache() 
trainer.train()

{'loss': 2.0454, 'grad_norm': 4.866884231567383, 'learning_rate': 0.0002, 'epoch': 0.38}
{'loss': 1.7407, 'grad_norm': 0.9562987089157104, 'learning_rate': 0.0002, 'epoch': 0.76}
{'loss': 1.7319, 'grad_norm': 1.0722901821136475, 'learning_rate': 0.0002, 'epoch': 1.14}
{'loss': 1.6181, 'grad_norm': 1.4156594276428223, 'learning_rate': 0.0002, 'epoch': 1.52}
{'loss': 1.5626, 'grad_norm': 1.6375254392623901, 'learning_rate': 0.0002, 'epoch': 1.89}
{'loss': 1.4347, 'grad_norm': 1.6036258935928345, 'learning_rate': 0.0002, 'epoch': 2.27}
{'loss': 1.3944, 'grad_norm': 2.262648582458496, 'learning_rate': 0.0002, 'epoch': 2.65}
{'loss': 1.3478, 'grad_norm': 1.625264048576355, 'learning_rate': 0.0002, 'epoch': 3.03}
{'loss': 1.1216, 'grad_norm': 2.6727144718170166, 'learning_rate': 0.0002, 'epoch': 3.41}
{'loss': 1.1386, 'grad_norm': 2.9006996154785156, 'learning_rate': 0.0002, 'epoch': 3.79}
{'loss': 1.0036, 'grad_norm': 3.8524985313415527, 'learning_rate': 0.0002, 'epoch': 4.17}
{'loss': 0.889, 'grad_norm': 3.9919826984405518, 'learning_rate': 0.0002, 'epoch': 4.55}
{'loss': 0.8606, 'grad_norm': 3.282957077026367, 'learning_rate': 0.0002, 'epoch': 4.92}
{'loss': 0.7265, 'grad_norm': 3.824230194091797, 'learning_rate': 0.0002, 'epoch': 5.3}
{'loss': 0.6115, 'grad_norm': 4.654026031494141, 'learning_rate': 0.0002, 'epoch': 5.68}
{'loss': 0.6558, 'grad_norm': 3.509364604949951, 'learning_rate': 0.0002, 'epoch': 6.06}
{'loss': 0.4435, 'grad_norm': 5.369958877563477, 'learning_rate': 0.0002, 'epoch': 6.44}
{'loss': 0.5573, 'grad_norm': 4.624128341674805, 'learning_rate': 0.0002, 'epoch': 6.82}
{'loss': 0.3885, 'grad_norm': 3.7444238662719727, 'learning_rate': 0.0002, 'epoch': 7.2}
{'loss': 0.4302, 'grad_norm': 6.467524528503418, 'learning_rate': 0.0002, 'epoch': 7.58}
{'loss': 0.3475, 'grad_norm': 3.0319416522979736, 'learning_rate': 0.0002, 'epoch': 7.95}
{'loss': 0.3218, 'grad_norm': 3.226414203643799, 'learning_rate': 0.0002, 'epoch': 8.33}
{'loss': 0.2791, 'grad_norm': 3.6557064056396484, 'learning_rate': 0.0002, 'epoch': 8.71}
{'loss': 0.3145, 'grad_norm': 4.68241548538208, 'learning_rate': 0.0002, 'epoch': 9.09}
{'loss': 0.2544, 'grad_norm': 2.916290521621704, 'learning_rate': 0.0002, 'epoch': 9.47}
{'loss': 0.2744, 'grad_norm': 4.491607666015625, 'learning_rate': 0.0002, 'epoch': 9.85}
{'train_runtime': 308.6271, 'train_samples_per_second': 8.489, 'train_steps_per_second': 2.139, 'train_loss': 0.8933620879144379, 'epoch': 10.0}


TrainOutput(global_step=660, training_loss=0.8933620879144379, metrics={'train_runtime': 308.6271, 'train_samples_per_second': 8.489, 'train_steps_per_second': 2.139, 'train_loss': 0.8933620879144379, 'epoch': 10.0})
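The step counts in the training output can be cross-checked against the training arguments. The sketch below assumes 262 training examples (the count shown when the trainer mapped the training set) and a single GPU; the variable names are illustrative only.

```python
import math

NUM_TRAIN_EXAMPLES = 262   # from the trainer's mapping output
PER_DEVICE_BATCH = 4       # per_device_train_batch_size
GRAD_ACCUM = 1             # gradient_accumulation_steps
NUM_EPOCHS = 10            # num_train_epochs
LOGGING_STEPS = 25         # logging_steps
NUM_DEVICES = 1            # assumption: one GPU

effective_batch = PER_DEVICE_BATCH * NUM_DEVICES * GRAD_ACCUM
steps_per_epoch = math.ceil(NUM_TRAIN_EXAMPLES / effective_batch)
total_steps = steps_per_epoch * NUM_EPOCHS
num_log_entries = total_steps // LOGGING_STEPS
first_logged_epoch = round(LOGGING_STEPS / steps_per_epoch, 2)

print(steps_per_epoch, total_steps, num_log_entries, first_logged_epoch)
```

Under these assumptions the total matches the reported `global_step=660`, and the first log entry lands at epoch 0.38, as in the output above.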

In [39]:
!nvidia-smi

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Sun Mar  3 14:45:47 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla V100-SXM2-32GB           On  | 00000000:3B:00.0 Off |                    0 |
| N/A   48C    P0              77W / 300W |  32301MiB / 32768MiB |      3%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

In [40]:
# Save trained model
trainer.model.save_pretrained(finetuned_model)
trainer.tokenizer.save_pretrained(finetuned_model)



('./finetuned/policy-llama2-7b/tokenizer_config.json',
 './finetuned/policy-llama2-7b/special_tokens_map.json',
 './finetuned/policy-llama2-7b/tokenizer.model',
 './finetuned/policy-llama2-7b/added_tokens.json',
 './finetuned/policy-llama2-7b/tokenizer.json')

In [42]:
eval_prompt = """
Answer the following question:
Question: What are the cross-sectoral principles of the UK regulatory framework, and how will they be implemented?
---
Answer:
"""

model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

model.eval()
with torch.no_grad():
    print(tokenizer.decode(model.generate(**model_input, max_new_tokens=100)[0], skip_special_tokens=True))


Answer the following question:
Question: What are the cross-sectoral principles of the UK regulatory framework, and how will they be implemented?
---
Answer:
The cross-sectoral principles of the UK regulatory framework are: protecting health, safety, and the environment; promoting innovation and growth; and supporting global trade and investment. These principles will be implemented through a combination of binding legal obligations and non-binding aspirational targets and goals, with enforcement mechanisms ensuring compliance.

---
Question: What is the focus of regulatory policy in the 2020s, and how will


In [27]:
%load_ext tensorboard
from tensorboard import notebook
log_dir = "results/runs"
notebook.start("--logdir {} --port 4000".format(log_dir))

In [43]:
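# Free the GPU memory held by the model and trainer
del model
#del pipe
del trainer
import gc
gc.collect()
torch.cuda.empty_cache()  # return cached CUDA blocks to the driver
gc.collect()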

0