## PaliGemma Fine-tuning

Fine-tune the pretrained PaliGemma model to build a classifier for images generated with deepfake techniques.


### Environment Setup

In [1]:
!pip install torch
!pip install transformers
!pip install peft
!pip install trl
!pip install -U bitsandbytes
!pip install datasets
!pip install accelerate

Hugging Face login  
- Obtain read access to PaliGemma
- Obtain permission to upload the fine-tuned model under the logged-in account

In [2]:
from huggingface_hub import notebook_login
notebook_login()

### Training Preparation

Load the training dataset

In [3]:
from datasets import load_dataset

ds = load_dataset("JamieWithofs/Deepfake-and-real-images")

The download output reports three splits: train (140,002 examples), test (10,905 examples), and validation (39,428 examples).

In [4]:
ds['train']

Dataset({
    features: ['image', 'label'],
    num_rows: 140002
})

In [5]:
train_ds = ds['test'] # Use the smaller test split as training data to fit the training environment

In [6]:
train_ds

Dataset({
    features: ['image', 'label'],
    num_rows: 10905
})
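
The cell above trains on the `test` split because it is small. One alternative, sketched here under assumptions, is to draw a random subset of the same size from the `train` split so that the `test` split stays available for evaluation. The helper name `sample_indices` and the seed value are hypothetical; the resulting indices would be passed to `Dataset.select`, which is not executed in this sketch.

```python
import random

def sample_indices(num_rows, subset_size, seed=42):
    """Return sorted, unique row indices for a reproducible random subset."""
    if subset_size > num_rows:
        raise ValueError("subset_size cannot exceed num_rows")
    rng = random.Random(seed)  # local RNG so the global random state is untouched
    return sorted(rng.sample(range(num_rows), subset_size))

# Sizes taken from the dataset outputs above: 140,002 train rows, 10,905 test rows.
indices = sample_indices(num_rows=140_002, subset_size=10_905)
print(len(indices), indices[0] >= 0, indices[-1] < 140_002)

# Hypothetical usage with the loaded dataset (not executed in this sketch):
# train_ds = ds["train"].select(indices)
```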

In [7]:
question_make = [ '''Is this image made by AI? The features of the AI-generated image are as follows.
1. Lack of detail
2. an unrealistic element
3. Certain textures appear less realistic than other elements
4. Symmetry Error
Use this to classify the AI-generated image.
''' for _ in range(len(train_ds))] # Build the question column needed for PaliGemma training
train_ds = train_ds.add_column("question", question_make) # Add the question column to train_ds

In [8]:
train_ds

Dataset({
    features: ['image', 'label', 'question'],
    num_rows: 10905
})

PaliGemmaProcessor is the processor used together with the PaliGemma model; it preprocesses the model's input data and post-processes the model's output.

In [9]:
from transformers import PaliGemmaProcessor
model_id = "google/paligemma-3b-pt-224"
processor = PaliGemmaProcessor.from_pretrained(model_id)

In [10]:
import torch
device = "cuda"

image_token = processor.tokenizer.convert_tokens_to_ids("<image>")

def collate_fn(examples):
    """Collate several data samples into one batch.

    The data loader calls this function to preprocess the input data.
    """
    # Prefix each question with "answer " to form the text prompt
    texts = ["answer " + example["question"] for example in examples]
    # The label, as a string, is the target (suffix) the model learns to generate
    labels = [str(example['label']) for example in examples]
    images = [example["image"].convert("RGB") for example in examples]
    tokens = processor(text=texts, images=images, suffix=labels,
                    return_tensors="pt", padding="longest",
                    tokenize_newline_separately=False)

    # Cast floating-point tensors to bfloat16 and move the batch to the GPU
    tokens = tokens.to(torch.bfloat16).to(device)
    return tokens
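
To make the batch layout concrete, the sketch below reproduces only the text side of `collate_fn` in plain Python: the prompt (prefix) is the question with "answer " prepended, and the target (suffix) is the label converted to a string. The helper `format_example` and the sample record are hypothetical, and image handling and tokenization are left out.

```python
def format_example(example):
    """Build the text prompt and the target string for one sample."""
    return {
        "prefix": "answer " + example["question"],  # text the model is conditioned on
        "suffix": str(example["label"]),            # text the model learns to generate
    }

# Hypothetical record; real records also carry an "image" field.
sample = {"question": "Is this image made by AI?", "label": 1}
formatted = format_example(sample)
print(formatted)
```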


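According to the referenced GitHub code, the dataset used to train PaliGemma is similar to the VQA dataset used in that code, so the image encoder is left untouched and only the text decoder is fine-tuned.  
For that reason, gradient updates are disabled (`requires_grad = False`) for the parameters below.  

※ The deepfake image classification task here also uses a dataset made up of ordinary face photos, so the same configuration is applied.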

In [11]:
from transformers import PaliGemmaForConditionalGeneration
import torch

model = PaliGemmaForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.bfloat16).to(device)

# Freeze these parts of the model (no parameter updates)
for param in model.vision_tower.parameters():
    param.requires_grad = False

for param in model.multi_modal_projector.parameters():
    param.requires_grad = False


Apply QLoRA

In [12]:
from transformers import BitsAndBytesConfig
from peft import get_peft_model, LoraConfig
import gc
# Apply QLoRA (chosen to fit the infrastructure used for training)

# 4-bit quantization
bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16
    )

torch.cuda.empty_cache()
gc.collect()

# LoRA configuration
lora_config = LoraConfig(
    r=8, # LoRA rank
    target_modules=["q_proj", "o_proj", "k_proj", "v_proj", "gate_proj", "up_proj", "down_proj"], # modules LoRA is applied to
    task_type="CAUSAL_LM",
)
# Reload the model with 4-bit quantization; this replaces the bf16 model loaded above
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id, quantization_config=bnb_config, device_map="auto")
gc.collect()

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
#trainable params: 11,298,816 || all params: 2,934,634,224 || trainable%: 0.38501616002417344


trainable params: 11,298,816 || all params: 2,934,765,296 || trainable%: 0.3850
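
The trainable-parameter count can be cross-checked by hand. A LoRA adapter of rank r on a linear layer with input size d_in and output size d_out adds r * (d_in + d_out) parameters. The sketch below assumes layer sizes that are not stated in this notebook (believed to match the published Gemma-2B and SigLIP-So400m configurations), and it assumes that the `target_modules` names `q_proj`, `k_proj`, and `v_proj` also match the attention projections inside the vision tower. Under those assumptions the total agrees with the printed 11,298,816.

```python
def lora_params(d_in, d_out, r=8):
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)

# Assumed Gemma-2B decoder sizes (not stated in this notebook).
hidden, kv_dim, mlp, text_layers = 2048, 256, 16384, 18
per_text_layer = (
    lora_params(hidden, hidden)      # q_proj
    + lora_params(hidden, kv_dim)    # k_proj
    + lora_params(hidden, kv_dim)    # v_proj
    + lora_params(hidden, hidden)    # o_proj
    + lora_params(hidden, mlp)       # gate_proj
    + lora_params(hidden, mlp)       # up_proj
    + lora_params(mlp, hidden)       # down_proj
)

# Assumed SigLIP vision tower sizes; only q_proj, k_proj, v_proj match by name.
vis_hidden, vis_layers = 1152, 27
per_vision_layer = 3 * lora_params(vis_hidden, vis_hidden)

total = text_layers * per_text_layer + vis_layers * per_vision_layer
print(total)  # 11298816 under these assumptions
print(round(100 * total / 2_934_765_296, 4))  # share of all parameters, in percent
```

If this breakdown is right, about 1.49M of the trainable parameters sit in the vision tower, so the image encoder would not be entirely frozen in the QLoRA run, since the model is reloaded before LoRA is applied.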


Use the TrainingArguments class to set the hyperparameters for model training

In [13]:
from transformers import TrainingArguments
args=TrainingArguments(
            num_train_epochs=2,
            remove_unused_columns=False,
            per_device_train_batch_size=4,
            gradient_accumulation_steps=4,
            warmup_steps=2,
            learning_rate=2e-5,
            weight_decay=1e-6,
            adam_beta2=0.999,
            logging_steps=100,
            optim="adamw_hf",
            save_strategy="steps",
            push_to_hub=True,
            save_steps=1000,
            save_total_limit=1,
            output_dir="paligemma_deepfake_2024_COT",
            bf16=True,
            dataloader_pin_memory=False
        )


Run training

In [14]:
from transformers import Trainer

trainer = Trainer(
        model=model,
        train_dataset=train_ds,
        data_collator=collate_fn,
        args=args
        )


In [15]:
trainer.train()



| Step | Training Loss |
|------|---------------|
| 100  | 0.663         |
| 200  | 0.3319        |
| 300  | 0.3107        |
| 400  | 0.2944        |
| 500  | 0.2557        |
| 600  | 0.2259        |
| 700  | 0.2042        |
| 800  | 0.1823        |
| 900  | 0.1571        |
| 1000 | 0.1664        |




TrainOutput(global_step=1362, training_loss=0.24415006301476566, metrics={'train_runtime': 6242.8358, 'train_samples_per_second': 3.494, 'train_steps_per_second': 0.218, 'total_flos': 1.0072550656631808e+17, 'train_loss': 0.24415006301476566, 'epoch': 1.9977997799779978})
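
The reported `global_step=1362` follows from the batch settings. The arithmetic below assumes a single GPU and that the Trainer drops the incomplete accumulation group at the end of each epoch (floor division), which is consistent with the reported epoch value of about 1.998.

```python
import math

num_samples = 10_905          # rows in train_ds
per_device_batch = 4          # per_device_train_batch_size
grad_accum = 4                # gradient_accumulation_steps
epochs = 2                    # num_train_epochs

effective_batch = per_device_batch * grad_accum            # samples per optimizer step
batches_per_epoch = math.ceil(num_samples / per_device_batch)
steps_per_epoch = batches_per_epoch // grad_accum          # optimizer steps per epoch
total_steps = steps_per_epoch * epochs
epochs_completed = total_steps * grad_accum / batches_per_epoch

print(effective_batch, batches_per_epoch, steps_per_epoch, total_steps)
print(round(epochs_completed, 4))
```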

Upload the model to Hugging Face so it can be used for inference

In [16]:
trainer.push_to_hub()



CommitInfo(commit_url='https://huggingface.co/Donguri-b/paligemma_deepfake_2024_COT/commit/7f8f042f494d36ce23676b8af8050911bed24205', commit_message='End of training', commit_description='', oid='7f8f042f494d36ce23676b8af8050911bed24205', pr_url=None, pr_revision=None, pr_num=None)

In [17]:
# model_id = "paligemma_deepfake_2024"
# model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)
# processor = AutoProcessor.from_pretrained("google/paligemma-3b-pt-224")
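
For inference, the generated text has to be mapped back to a class label. Because training used `str(label)` as the target, a fine-tuned model is expected to emit "0" or "1". The helper below is a hypothetical sketch of that last step only. It assumes `generated_text` is the already-decoded answer with the prompt removed, and it does not decide which label means fake, because the notebook does not state the mapping.

```python
def parse_label(generated_text):
    """Map decoded model output to an integer label, or None if unrecognized."""
    cleaned = generated_text.strip()
    if cleaned in ("0", "1"):
        return int(cleaned)
    return None  # unexpected output; the caller decides how to handle it

print(parse_label(" 1\n"), parse_label("0"), parse_label("maybe"))
```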