# Fine-tuning the Falcon-7B Large Language Model

Install the libraries, then restart the runtime.

In [16]:
!pip install -q -U trl transformers accelerate peft
!pip install -q datasets bitsandbytes einops

# 1. Loading the dataset

In [1]:
from datasets import load_dataset

# dataset_name = "nlpai-lab/openassistant-guanaco-ko"
dataset_name = "timdettmers/openassistant-guanaco"

dataset = load_dataset(dataset_name, split="train[:100]")
dataset





Dataset({
    features: ['text'],
    num_rows: 100
})

In [2]:
dataset[0]['text']

'### Human: Can you write a short introduction about the relevance of the term "monopsony" in economics? Please use examples related to potential monopsonies in the labour market and cite relevant research.### Assistant: "Monopsony" refers to a market structure where there is only one buyer for a particular good or service. In economics, this term is particularly relevant in the labor market, where a monopsony employer has significant power over the wages and working conditions of their employees. The presence of a monopsony can result in lower wages and reduced employment opportunities for workers, as the employer has little incentive to increase wages or provide better working conditions.\n\nRecent research has identified potential monopsonies in industries such as retail and fast food, where a few large companies control a significant portion of the market (Bivens & Mishel, 2013). In these industries, workers often face low wages, limited benefits, and reduced bargaining power, lead
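
Each record packs a whole conversation into a single `text` string, with turns introduced by `### Human:` and `### Assistant:`. As a minimal sketch, assuming those two markers are the only role tags in the data (the helper name `split_turns` is hypothetical and not part of `datasets`), the string can be split into (role, message) pairs:

```python
import re

# Hypothetical helper: split a guanaco-style record into (role, message) pairs.
# Assumes turns are introduced only by "### Human:" or "### Assistant:".
TURN_PATTERN = re.compile(r"### (Human|Assistant):")

def split_turns(text):
    parts = TURN_PATTERN.split(text)
    # parts[0] is whatever precedes the first marker (usually empty);
    # after that, role names and messages alternate.
    roles = parts[1::2]
    messages = [message.strip() for message in parts[2::2]]
    return list(zip(roles, messages))

sample = "### Human: What is a monopsony?### Assistant: A market with a single buyer."
for role, message in split_turns(sample):
    print(f"{role}: {message}")
```

The same markers are what a prompt would need at inference time if the fine-tuned model is expected to answer in the assistant role.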

# 2. Loading the tokenizer and model

In [3]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "ybelkada/falcon-7b-sharded-bf16"

# Quantize the model: load the weights in 4-bit
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    trust_remote_code=True
)

# Disable the generation cache during training to reduce memory use
model.config.use_cache = False

`low_cpu_mem_usage` was None, now set to True since model is quantized.


Loading checkpoint shards:   0%|          | 0/8 [00:00<?, ?it/s]
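
The reason for loading in 4-bit is the size of the weights. A rough back-of-the-envelope estimate, assuming about 7 billion parameters and ignoring quantization constants, activations, and any layers kept in higher precision:

```python
# Rough weight-memory estimate; assumes ~7e9 parameters (an approximation).
NUM_PARAMS = 7e9

def weight_gib(num_params, bits_per_param):
    """Return the size of the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30

for label, bits in [("bf16/fp16", 16), ("nf4 (4-bit)", 4)]:
    print(f"{label:>12}: {weight_gib(NUM_PARAMS, bits):.1f} GiB")
```

Under these assumptions the 16-bit weights alone take about 13 GiB, while the 4-bit weights take a little over 3 GiB, which leaves room on a 16 GiB GPU for activations, gradients of the adapters, and optimizer state.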

In [4]:
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True
)

tokenizer.pad_token = tokenizer.eos_token

# 3. Training the model

## 3.1. LoRA configuration

In [5]:
from peft import LoraConfig

lora_alpha = 8
lora_dropout = 0.1
lora_r = 16

# LoRA: a technique that makes training lightweight
peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "query_key_value",
        "dense",
        "dense_h_to_4h",
        "dense_4h_to_h",
    ]
)
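
LoRA freezes each targeted weight matrix (shape `d_out x d_in`) and trains two small matrices of shapes `d_out x r` and `r x d_in`, whose product is scaled by `lora_alpha / r` before being added to the frozen layer's output. A minimal sketch of that arithmetic follows; the layer dimensions are illustrative assumptions and are not read from the Falcon checkpoint.

```python
def lora_param_count(d_in, d_out, r):
    """Trainable parameters LoRA adds to one d_out x d_in linear layer."""
    return r * (d_in + d_out)

lora_r, lora_alpha = 16, 8   # values from the config above
d_in = d_out = 4096          # illustrative dimensions (assumption)

added = lora_param_count(d_in, d_out, lora_r)
frozen = d_in * d_out
print(f"scaling factor alpha/r = {lora_alpha / lora_r}")
print(f"LoRA adds {added:,} params vs {frozen:,} frozen ({added / frozen:.2%})")
```

With `lora_alpha = 8` and `r = 16` the adapter update is scaled by 0.5; raising `lora_alpha` relative to `r` makes the adapters contribute more strongly.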

## 3.2. Train

In [6]:
from transformers import TrainingArguments

output_dir = "./results"
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
optim = "paged_adamw_32bit"
save_steps = 10


training_arguments = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=1,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    fp16=True,
    group_by_length=True,
)
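
With gradient accumulation, the optimizer steps once every `gradient_accumulation_steps` forward/backward passes, so the effective batch size is the product of the two settings, multiplied by the number of devices. A quick check, assuming a single GPU:

```python
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
num_devices = 1  # assumption: a single GPU

effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)
print(f"Sequences per optimizer step: {effective_batch_size}")
```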

### Data preprocessing and training setup

In [7]:
from trl import SFTTrainer  # Supervised Fine-tuning Trainer

max_seq_length = 512

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
)




In [8]:
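for name, module in trainer.model.named_modules():
    # print('name', name)  # uncomment to inspect the module names
    # Cast every LayerNorm module to float32; the rest is handled by LoRA.
    if "norm" in name:
        module.to(torch.float32)  # nn.Module.to() converts the module in place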

In [9]:
trainer.train()

OutOfMemoryError: CUDA out of memory. Tried to allocate 316.00 MiB. GPU 0 has a total capacity of 15.77 GiB of which 64.38 MiB is free. Process 51023 has 15.71 GiB memory in use. Of the allocated memory 14.82 GiB is allocated by PyTorch, and 533.41 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
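
The run above ran out of memory on a GPU with roughly 16 GiB. Two things that could be tried, sketched here under assumptions: set the allocator option that the error message itself suggests (it must be in the environment before PyTorch initializes CUDA, so typically before `import torch` in a fresh runtime), and trade per-device batch size for accumulation steps so that the effective batch size stays the same. The helper `shrink_batch` is hypothetical, and neither step is guaranteed to be sufficient; lowering `max_seq_length` is another knob.

```python
import os

# Suggested by the error message; set before PyTorch touches the GPU.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

def shrink_batch(per_device_batch, accumulation_steps):
    """Halve the per-device batch and double accumulation (hypothetical helper).

    Keeps per_device_batch * accumulation_steps constant while lowering
    the activation memory needed for each forward/backward pass.
    """
    if per_device_batch < 2 or per_device_batch % 2:
        raise ValueError("per-device batch size must be an even number >= 2")
    return per_device_batch // 2, accumulation_steps * 2

batch, accum = shrink_batch(4, 4)
print(batch, accum, batch * accum)
```

The resulting values would then be passed as `per_device_train_batch_size` and `gradient_accumulation_steps` when building `TrainingArguments`.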

# 4. Inference

In [None]:
# 1) Query text
input_text = "what is deep learning?"

# 2) Tokenize the text and convert it to a tensor.
input_ids = tokenizer.encode(input_text, return_tensors="pt")

# 3) Set the generation options and generate text.
max_length = 100
sample_outputs = model.generate(input_ids, do_sample=True,
                                max_length=max_length,
                                temperature=0.75)

# 4) Decode the generated text.
print(tokenizer.decode(sample_outputs[0], skip_special_tokens=True))



The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.


what is deep learning?
Deep learning involves designing neural networks to learn from data and make predictions/decisions.
Some of the most popular deep learning techniques include convolutional neural networks, recurrent neural networks, and long short-term memory networks.
Deep learning also includes the use of machine learning or artificial intelligence to perform tasks like text classification, facial recognition, speech recognition, content moderation, fraud detection, and more.
Deep learning is a relatively new machine learning technique. It is considered to
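
The warning in the output above appears because `generate` received only `input_ids`. Below is a hedged sketch of a variant that passes the attention mask and `pad_token_id` explicitly. It assumes `model` and `tokenizer` are the objects loaded in section 2, and the wrapper name `generate_reply` is hypothetical. It also uses `max_new_tokens`, which bounds only the generated part, whereas `max_length` counts the prompt tokens as well.

```python
def generate_reply(model, tokenizer, prompt, max_new_tokens=100, temperature=0.75):
    """Sample a continuation for `prompt` (hypothetical convenience wrapper)."""
    # Calling the tokenizer (instead of .encode) also returns the attention mask.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        pad_token_id=tokenizer.eos_token_id,  # matches pad_token = eos_token above
        do_sample=True,
        max_new_tokens=max_new_tokens,
        temperature=temperature,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Usage, once the model and tokenizer are loaded:
# print(generate_reply(model, tokenizer, "what is deep learning?"))
```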
