해당 코드에서 진행한 것:

1. log likelihood 함수 구현
2. 모델별로 test dataset에 대해 log likelihood를 어떻게 평가하는지 알아봄
3. wpr / lpr 구하는 함수 구현 (fasttext 이용) -> 아직 작동은 잘 안됨..
4. orpo fine tuning 함수 구현 (fine tuning 전 후 결과 평가)\
-> log likelihood로 평가했을 땐 유의미한 개선점이 있었음 !!! 


log likelihood 계산 함수 (orpo 학습지표)

In [11]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
import torch.nn.functional as F
import json

def compute_log_likelihood(prompt, response, model, tokenizer, device):
    
    prompt_ids = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).input_ids.to(device)
    response_ids = tokenizer(response, return_tensors="pt", add_special_tokens=False).input_ids.to(device)

    input_ids = torch.cat([prompt_ids, response_ids], dim=1)

    with torch.no_grad():
        outputs = model(input_ids=input_ids)
        logits = outputs.logits

    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = input_ids[:, 1:].contiguous()

    # 📌 mask
    mask = torch.zeros_like(shift_labels)
    mask[:, prompt_ids.size(1) - 1:] = 1

    # 📌 loss
    loss = F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        reduction='none'
    ).view(shift_labels.size())

    log_prob = -(loss * mask).sum().item()
    return log_prob

In [7]:
import pandas as pd

def evaluate_orpo_loglikelihood(data, model, tokenizer, device):
    """
    Returns:
        pd.DataFrame: task, lang, log-likelihood, prefer 결과를 담은 DataFrame
    """

    results = []

    for row in data:
        prompt = row["input"]
        chosen = row["chosen"]
        rejected = row["rejected"]

        chosen_ll = compute_log_likelihood(prompt, chosen, model, tokenizer, device)
        rejected_ll = compute_log_likelihood(prompt, rejected, model, tokenizer, device)
        delta_ll = chosen_ll - rejected_ll
        prefer = 1 if delta_ll > 0 else 0

        results.append({
            "task": row["task"],
            "input_lang": row["input_lang"],
            "expected_lang": row["expected_lang"],
            "chosen_ll": chosen_ll,
            "rejected_ll": rejected_ll,
            "delta_ll": delta_ll,
            "prefer": prefer
        })

    return pd.DataFrame(results)

### 모델 (baseline)

In [8]:
# 모델 로딩
checkpoint = "HuggingFaceTB/SmolLM2-135M-intermediate-checkpoints"
revision = "step-240000"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = AutoTokenizer.from_pretrained(checkpoint, revision=revision)
base_model = AutoModelForCausalLM.from_pretrained(checkpoint, revision=revision).to(device)

#### baseline 평가 (log probability)

사용 데이터셋: chatgpt로 수작업 생성한 데이터셋. \
컬럼명: input, chosen, rejected, preferred, input_lang, expected_lang, task

In [20]:
with open("orpo_train_data.json", "r", encoding="utf-8") as f:
    orpo_data = json.load(f)
orpo_data

[{'input': "Was bedeutet 'Realität' in der Philosophie?",
  'chosen': 'Realität ist das, was unabhängig von Wahrnehmung oder Meinungen existiert.',
  'rejected': 'Reality ist was 존재해 regardless of perception or 의견.',
  'preferred': 1,
  'input_lang': 'de',
  'expected_lang': 'de',
  'task': 'single_language_qa'},
 {'input': "Summarize this passage: 'Plants produce oxygen through photosynthesis. This process is vital for all aerobic life.'",
  'chosen': 'Plants make oxygen via photosynthesis, which is essential for life that needs air.',
  'rejected': '식물은 광합성을 통해 oxygen을 만들고, 이것은 생명체에 중요해.',
  'preferred': 1,
  'input_lang': 'en',
  'expected_lang': 'en',
  'task': 'summary'},
 {'input': "请翻译：'Time is money.'",
  'chosen': '时间就是金钱。',
  'rejected': 'Time 就是 money.',
  'preferred': 1,
  'input_lang': 'zh',
  'expected_lang': 'zh',
  'task': 'translation'},
 {'input': "Traduire en français : 'Honesty is the best policy.'",
  'chosen': "L'honnêteté est la meilleure politique.",
  'rejected

##### 데이터셋 평가 (log likelihood 기반)

> **결과**\
> prefer\
> 1    401\
> 0    299

In [21]:
# 평가 수행
results = evaluate_orpo_loglikelihood(orpo_data, base_model, tokenizer, device)

In [22]:
df_results = pd.DataFrame(results)
df_results

Unnamed: 0,task,input_lang,expected_lang,chosen_ll,rejected_ll,delta_ll,prefer
0,single_language_qa,de,de,-82.994560,-109.435364,26.440804,1
1,summary,en,en,-44.881851,-132.947449,88.065598,1
2,translation,zh,zh,-23.871082,-25.237799,1.366716,1
3,translation,fr,fr,-46.450325,-42.712379,-3.737946,0
4,summary,en,en,-49.928787,-136.437988,86.509201,1
...,...,...,...,...,...,...,...
695,summary,zh,zh,-88.713860,-96.719276,8.005417,1
696,summary,ko,ko,-74.384171,-102.150925,27.766754,1
697,single_language_qa,en,en,-56.980434,-108.466019,51.485584,1
698,translation,en,id,-189.460419,-127.818497,-61.641922,0


In [23]:
df_results.prefer.value_counts()

prefer
1    401
0    299
Name: count, dtype: int64

In [24]:
# 그룹별 통계 요약 (expected_lang 기준)
summary_by_expected = df_results.groupby("expected_lang")[["chosen_ll", "rejected_ll", "delta_ll"]].agg(["mean", "std"])
summary_by_expected

Unnamed: 0_level_0,chosen_ll,chosen_ll,rejected_ll,rejected_ll,delta_ll,delta_ll
Unnamed: 0_level_1,mean,std,mean,std,mean,std
expected_lang,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
ar,-106.069562,66.018063,-87.706348,47.808618,-18.363214,21.194358
de,-98.745988,27.583184,-86.019151,18.315147,-12.726837,24.440663
en,-41.774258,21.423889,-106.252156,58.979052,64.477897,47.550126
es,-100.666308,56.655363,-74.265056,20.614815,-26.401253,46.976406
fr,-83.350216,30.227851,-80.592276,31.842492,-2.75794,24.49932
hi,-245.108856,77.773503,-139.147285,59.396614,-105.961571,18.376889
id,-124.735542,45.212881,-111.470629,12.96724,-13.264914,33.018319
it,-78.87828,23.033197,-79.779305,21.617062,0.901026,9.76441
ja,-133.231836,80.334812,-99.530015,20.556431,-33.701821,62.972828
ko,-82.395309,29.652362,-86.614032,20.949382,4.218724,32.437045


In [28]:
# 그룹별 통계 요약 (input_lang 기준)
summary_by_input = df_results.groupby("input_lang")[["chosen_ll", "rejected_ll", "delta_ll"]].agg(["mean", "std"])
summary_by_input

Unnamed: 0_level_0,chosen_ll,chosen_ll,rejected_ll,rejected_ll,delta_ll,delta_ll
Unnamed: 0_level_1,mean,std,mean,std,mean,std
input_lang,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
de,-90.900349,31.911419,-80.55537,19.391928,-10.344979,26.312395
en,-66.229406,46.045878,-111.610799,52.22271,45.381393,60.418943
es,-96.69877,57.210978,-72.746087,21.295301,-23.952683,47.282847
fr,-78.075051,26.832456,-73.982068,18.500621,-4.092983,18.103967
id,-103.233231,23.384523,-107.496662,11.621163,4.263432,11.763359
ko,-75.115501,32.497512,-82.961574,22.657281,7.846073,32.444459
pt,-67.984831,25.268219,-80.883988,25.766968,12.899156,15.913153
zh,-84.272043,53.173346,-72.147475,23.510175,-12.124568,47.912832


In [26]:
# 태스크별 통계 요약 (input_lang 기준)
summary_by_task = df_results.groupby("task")[["chosen_ll", "rejected_ll", "delta_ll"]].agg(["mean", "std"])
summary_by_task

Unnamed: 0_level_0,chosen_ll,chosen_ll,rejected_ll,rejected_ll,delta_ll,delta_ll
Unnamed: 0_level_1,mean,std,mean,std,mean,std
task,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
single_language_qa,-93.801919,48.309954,-95.97161,41.137382,2.169691,64.404249
summary,-63.320049,26.224291,-93.766776,38.132429,30.446727,48.043316
translation,-68.557183,49.714016,-71.973543,31.786073,3.41636,30.562196


### 모델 (체크포인트 480000)

결과에 아주 약간의 변동



> **결과**\
> prefer\
> 1    409\
> 0    291

In [31]:
# 모델 로딩
checkpoint = "HuggingFaceTB/SmolLM2-135M-intermediate-checkpoints"
revision = "step-480000"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained(checkpoint, revision=revision)
model_cp2 = AutoModelForCausalLM.from_pretrained(checkpoint, revision=revision).to(device)

##### 위와 같은 데이터셋으로 평가

In [32]:
# 평가 수행
results = evaluate_orpo_loglikelihood(orpo_data, model_cp2, tokenizer, device)

In [None]:
df_results = pd.DataFrame(results)
df_results

# prefer 1이면 chosen prefer한 것.

Unnamed: 0,task,input_lang,expected_lang,chosen_ll,rejected_ll,delta_ll,prefer
0,single_language_qa,de,de,-90.066147,-103.525848,13.459702,1
1,summary,en,en,-50.758705,-115.004005,64.245300,1
2,translation,zh,zh,-20.733707,-25.993589,5.259882,1
3,translation,fr,fr,-52.497452,-48.331543,-4.165909,0
4,summary,en,en,-48.395866,-125.646713,77.250847,1
...,...,...,...,...,...,...,...
695,summary,zh,zh,-79.738617,-100.218742,20.480125,1
696,summary,ko,ko,-74.688324,-99.701248,25.012924,1
697,single_language_qa,en,en,-58.298721,-115.333366,57.034645,1
698,translation,en,id,-189.516403,-127.304939,-62.211464,0


In [34]:
df_results.prefer.value_counts()

prefer
1    409
0    291
Name: count, dtype: int64

### 모델 (체크포인트 960000)

아주 미미하게 나아짐

> **결과**\
> prefer\
> 1    419\
> 0    281

In [36]:
# 모델 로딩
checkpoint = "HuggingFaceTB/SmolLM2-135M-intermediate-checkpoints"
revision = "step-960000"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained(checkpoint, revision=revision)
model_96 = AutoModelForCausalLM.from_pretrained(checkpoint, revision=revision).to(device)

같은 데이터셋으로 평가

In [37]:
results = evaluate_orpo_loglikelihood(orpo_data, model_96, tokenizer, device)

In [38]:
df_results = pd.DataFrame(results)
df_results

# prefer 1이면 chosen prefer한 것.

Unnamed: 0,task,input_lang,expected_lang,chosen_ll,rejected_ll,delta_ll,prefer
0,single_language_qa,de,de,-89.039795,-108.472519,19.432724,1
1,summary,en,en,-47.942074,-120.021729,72.079655,1
2,translation,zh,zh,-21.435986,-27.517677,6.081692,1
3,translation,fr,fr,-49.453323,-49.571625,0.118301,1
4,summary,en,en,-48.441071,-120.264587,71.823517,1
...,...,...,...,...,...,...,...
695,summary,zh,zh,-81.701996,-97.206123,15.504128,1
696,summary,ko,ko,-70.480644,-108.663483,38.182838,1
697,single_language_qa,en,en,-59.765793,-114.258507,54.492714,1
698,translation,en,id,-185.064423,-122.169479,-62.894943,0


In [39]:
df_results.prefer.value_counts()

prefer
1    419
0    281
Name: count, dtype: int64

### 모델 (360M)

instruction 완료된 새로운 모델 (360m)로 테스트 및 train 수행

오... 훨씬 잘 작동함 


> **결과**\
> prefer\
> 1    476\
> 0    224

In [1]:
from transformers import AutoModelForCausalLM, AutoTokenizer
checkpoint = "HuggingFaceTB/SmolLM2-360M-Instruct"

device = "cuda" # for GPU usage or "cpu" for CPU usage
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# for multiple GPUs install accelerate and do `model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")
model_360m = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

같은 데이터셋으로 평가

In [2]:
results = evaluate_orpo_loglikelihood(orpo_data, model_360m, tokenizer, device)

NameError: name 'evaluate_orpo_loglikelihood' is not defined

In [None]:
df_results = pd.DataFrame(results)
df_results

# prefer 1이면 chosen prefer한 것.

Unnamed: 0,task,input_lang,expected_lang,chosen_ll,rejected_ll,delta_ll,prefer
0,single_language_qa,de,de,-64.876671,-112.482338,47.605667,1
1,summary,en,en,-51.787868,-114.740585,62.952717,1
2,translation,zh,zh,-13.677735,-30.340347,16.662612,1
3,translation,fr,fr,-24.268188,-37.116154,12.847965,1
4,summary,en,en,-46.356712,-108.177017,61.820305,1
...,...,...,...,...,...,...,...
695,summary,zh,zh,-72.483459,-94.100983,21.617523,1
696,summary,ko,ko,-67.127640,-93.554352,26.426712,1
697,single_language_qa,en,en,-58.218525,-107.994461,49.775936,1
698,translation,en,id,-140.564713,-127.996048,-12.568665,0


In [None]:
df_results.prefer.value_counts()

prefer
1    476
0    224
Name: count, dtype: int64

##### 함수 정의(wpr, lpr)

구현할 때 참고한 코드: [Language confusion github](https://github.com/for-ai/language-confusion/blob/main/compute_metrics.py)
- 언어 감지 함수
- wpr과 lpr 생성 함수
- text generator 

In [None]:
import fasttext
import re
import string
import jieba
from fugashi import Tagger
from collections import Counter

# fasttext 모델 로드
fasttext_model = fasttext.load_model("lid.176.bin")

# 영어 단어 사전 (길이 >= 4, 소문자만)
EN_WORDS_PATH = "words"  
with open(EN_WORDS_PATH, "r", encoding="utf-8") as f:
    en_words = {word.strip() for word in f if word.strip().islower() and len(word.strip()) > 3}

# 전처리 (문장부호들 빼기~)
def normalize(text):
    text = text.split('\nQ:')[0].strip()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = text.replace("—", " ").replace("،", "")
    return text.strip()

# 언어 감지 (fasttext)
def langid(text):
    """
    Fasttext 기반 언어 감지하는 함수
    """
    text = text.replace("\n", " ").strip()  # 줄바꿈 제거
    label, score = fasttext_model.predict(text)
    return label[0].replace("__label__", "") if score[0] > 0.3 else "unknown"

ja_tokenizer = Tagger("-O wakati -b 50000")
jieba.initialize()

def tokenize(text, lang):
    if lang == "ja":
        return ja_tokenizer.parse(text).strip().split()
    elif lang == "zh":
        return list(jieba.cut(text))
    else:
        return text.strip().split()

# LPR / WPR 계산 함수
def compute_lpr_wpr(text: str, target_lang: str):
    text = normalize(text)
    lines = text.split("\n")
    tokenized_lines = [tokenize(line, target_lang) for line in lines]
    
    # 너무 짧은 줄 제거 (3단어 미만 제외)
    valid_indices = [i for i, tokens in enumerate(tokenized_lines) if len(tokens) >= 3]
    lines = [lines[i] for i in valid_indices]
    tokenized_lines = [tokenized_lines[i] for i in valid_indices]

    if not lines:
        return {"lpr": 0.0, "wpr": 0.0, "acc": 0.0}

    total_lines = len(lines)
    line_errors = sum(langid(line) != target_lang for line in lines)
    line_pass = (total_lines - line_errors) / total_lines

    # WPR은 line 오류가 없는 경우에만 계산 
    word_error = 0
    if line_errors == 0 and target_lang in ("ar", "hi", "ja", "es", "ko", "ru", "en","zh", "fr", "de", "hi", "id", "it", "pt", "vi"):
        for tokens in tokenized_lines:
            if any(token in en_words for token in tokens):
                word_error += 1
        wpr = 1 - (word_error / len(tokenized_lines)) if len(tokenized_lines) > 0 else 0.0
    else:
        wpr = None

    return {
        "lpr": 1 - (line_errors / total_lines),
        "wpr": wpr if wpr is not None else "N/A",
        "acc": line_pass
    }

from tqdm import tqdm

def clean_generated_text(text: str) -> str:
    # 생성된 텍스트에서 줄바꿈, 숫자 리스트, 기호 등등 제거
    text = text.replace("\n", " ")
    text = re.sub(r"\b\d+\.\b", "", text)         # 1. 2. 3. 형식 제거
    text = re.sub(r"\n+", " ", text)              # 여러 줄바꿈 → 공백
    text = re.sub(r"[•\-–—]", " ", text)          # 리스트 기호 제거
    text = re.sub(r"\s+", " ", text)              # 다중 공백 → 하나
    return text.strip()


def evaluate_generation_language_confusion(data, model, tokenizer, compute_lpr_wpr_fn, device, max_new_tokens=150):
    """
    주어진 프롬프트에 대해 모델이 직접 생성한 응답의 언어 일치율을 평가 (lpr, wpr 지표 이용)

    Args:
        data (list of dict): 각 entry에 'input' (prompt), 'expected_lang' 필드 포함
        compute_lpr_wpr_fn (func): WPR, LPR 계산 함수
        max_new_tokens: 응답 최대 길이

    Returns:
        데이터프레임으로 평가 결과 반환
    """
    results = []

    for entry in tqdm(data):    
        prompt = entry["input"]
        target_lang = entry["expected_lang"]
        task = entry.get("task", "")
        reference = entry.get("reference", "")

        # 응답 생성
        messages = [{"role": "user", "content": prompt}]
        input_text = tokenizer.apply_chat_template(messages, tokenize=False)
        inputs = tokenizer(input_text, return_tensors="pt").to(device)

        # generate 실행!!!
        with torch.no_grad():
            outputs = model.generate(
                inputs["input_ids"],
                max_new_tokens=max_new_tokens,
                temperature=0.7,
                top_p=0.9,
                do_sample=True
            )
        decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)

        # 어시스턴트 응답 부분만 추출
        if "<|assistant|>" in decoded:
            generated = decoded.split("<|assistant|>")[-1].strip()
        elif "assistant\n" in decoded:
            generated = decoded.split("assistant\n", 1)[-1].strip()
        else:
            generated = decoded.strip()

        # if len(generated.split()) < 2:
        #     generated = "[TOO SHORT] " + generated
        
        generated = clean_generated_text(generated)

        # 평가
        wpr, lpr = compute_lpr_wpr_fn(generated, target_lang)
        detected_lang = langid(generated) 

        results.append({
            "task": task,
            "prompt": prompt,
            "reference": reference,
            "generated": generated,
            "target_lang": target_lang,
            "detected_lang": detected_lang,
            "wpr": round(wpr, 4),
            "lpr": lpr
        })

    return pd.DataFrame(results)


wpr, lpr 평가 함수 잘 작동하는지 테스트~ 

정확도 왜저럼... 미치겟네....
- 문제 1: lpr이 너무 부정확함
- 문제 2: 뭘 감지하고 있는건진 모르겠는데 wpr 수치 왔다갔다..
- 문제 3: 중국어는 아예 감지를 못하는디? 말고도 특정 언어들 감지 잘 못하는듯

In [None]:
# 데이터셋 로드
with open("orpo_test_data.json", "r", encoding="utf-8") as f:
    orpo_test_data = json.load(f)

In [None]:
df = evaluate_generation_language_confusion(
    data=orpo_test_data[:40],  # orpo 형식 데이터
    model=model_360m,
    tokenizer=tokenizer,
    compute_lpr_wpr_fn=compute_wpr_lpr, 
    device=device
)

100%|██████████| 40/40 [01:39<00:00,  2.50s/it]


In [None]:
df

Unnamed: 0,task,prompt,reference,generated,target_lang,detected_lang,wpr,lpr
0,summary,Summarize this: 'Regular exposure to natural l...,,Regular exposure to natural light can improve ...,en,en,1.0,1
1,translation,Traduce esta frase al francés: 'A balanced die...,,"Este tipo de dieta, que contiene los nutriente...",fr,es,0.0526,0
2,summary,요약해줘: 재택근무는 유연성을 높이고 통근 시간을 줄여 업무 만족도를 높일 수 있지...,,재택근무는 유연성을 높이고 통근 시간을 줄여 업무 만족도를 높인다는 점에요. 이상한...,ko,ko,1.0,1
3,translation,Translate into English: 'Leer regularmente mej...,,'Read regularly improves concentration and exp...,en,en,1.0,1
4,summary,Summarize this text: 'Frequent handwashing wit...,,Handwashing with soap significantly reduces th...,en,en,1.0,1
5,single_language_qa,¿Cuál es el principal beneficio de la energía ...,,"La energía solar, que tiene mucho beneficio en...",es,es,0.7619,0
6,translation,Translate into Korean: 'Exercising regularly c...,,유지하기 적용하기 위해 좋아하는 지부는 천만 안 이해하거나 안 고망하게 됩니다.,ko,ko,1.0,1
7,summary,Summarize: 'Vaccination has been proven to be ...,,Vaccination has been proven to be one of the m...,en,en,0.9333,0
8,translation,Translate this sentence into Chinese: 'Reducin...,,'消化水基金及时代的荧光水碳性质传播可以保持世界水浪的平民生活平台并发挥为保护了水基金的水碳性质',zh,zh,0.0,0
9,single_language_qa,Was ist der Zweck von Recycling?,,"Der Zweck der Recycling ist, dass die Gesundhe...",de,de,0.9412,0


#### model fine tuning 수행

train 계획: 앞서, instruct가 완료된 350m 모델을 이용해서 600rows에 대한 log likelihood를 계산해보았을 때, 200개 가량을 잘못 선호하고 있었음. 

- step1 : test dataset (현재 약 150개 rows로 구성, 보충필요)로 모델 학습 전 성능 평가
- step2 : orpo train dataset (아까 평가에 사용한 데이터, 약 700rows로 구성, 보충필요)로 orpo 방식 fine tuning 수행
- step3 : fine tuned 된 model로 test dataset에 대해서 재평가

*근데 평가 지표는 다시 생각해 보아야 함 . . . (현재 wpr lpr이 제대로 작동하지 않기 때문에, 아직은 앞서 내가 수행했던 것 처럼 log likelihood를 계산해서 비교하는 수 밖에 없을 듯?...)*

##### Model train - (with 360m)

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer

# 모델 다시 로드

checkpoint = "HuggingFaceTB/SmolLM2-360M-Instruct"
device = "cuda" # for GPU usage or "cpu" for CPU usage
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model_360m = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

Total parameters: 361821120
Trainable parameters: 361821120


##### train 전 성능 확인

orpo_test_data에 관한 360m-instruct-model의 prefer 결과: \
1    120\
0     43

In [None]:
# 데이터셋 로드
with open("orpo_test_data.json", "r", encoding="utf-8") as f:
    orpo_test_data = json.load(f)

results = evaluate_orpo_loglikelihood(orpo_test_data, model_360m, tokenizer, device)

df_results = pd.DataFrame(results)
df_results

# prefer 1이면 chosen prefer한 것.

Unnamed: 0,task,input_lang,expected_lang,chosen_ll,rejected_ll,delta_ll,prefer
0,summary,en,en,-40.229523,-105.618828,65.389305,1
1,translation,es,fr,-54.397270,-53.397953,-0.999317,0
2,summary,ko,ko,-51.461407,-89.669876,38.208469,1
3,translation,es,en,-21.614311,-69.535065,47.920753,1
4,summary,en,en,-28.484894,-115.656044,87.171150,1
...,...,...,...,...,...,...,...
158,translation,en,ja,-47.487366,-38.843826,-8.643539,0
159,summary,ko,ko,-46.220173,-40.308128,-5.912045,0
160,single_language_qa,en,en,-30.917070,-89.036697,58.119627,1
161,translation,en,es,-26.265469,-31.918995,5.653526,1


In [None]:
df_results.prefer.value_counts()

prefer
1    120
0     43
Name: count, dtype: int64

##### Fine tuning

fine tuning 함수 구현 (orpo 방식) - 아니 코드를 조금 바꿨더니 다시 문제가 생겨서 학습이 이상해졌슴다 ...

함수 구현 시 참고한 코드 : [orpo github](https://github.com/xfactlab/orpo/blob/main/trl/test_orpo_trainer_demo.py)

In [3]:
from transformers import AutoModelForCausalLM, AutoTokenizer

# 모델 다시 로드

checkpoint = "HuggingFaceTB/SmolLM2-360M-Instruct"
device = "cuda" # for GPU usage or "cpu" for CPU usage
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model_360m = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

# 파라미터 수 확인
print("Total parameters:", sum(p.numel() for p in model_360m.parameters()))
print("Trainable parameters:", sum(p.numel() for p in model_360m.parameters() if p.requires_grad))

# pad_token_id 보정
if tokenizer.pad_token_id is None:
    tokenizer.pad_token_id = tokenizer.eos_token_id

Total parameters: 361821120
Trainable parameters: 361821120


In [None]:
from datasets import Dataset
from transformers import AutoTokenizer
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader
from torch.nn import functional as F
from transformers import AutoModelForCausalLM
import json

# 모델과 토크나이저는 로드되었다고 가정.

# 데이터 로드
with open("orpo_train_data.json", "r", encoding="utf-8") as f:
    orpo_train_data = json.load(f)

dataset = Dataset.from_list(orpo_train_data)

# Tokenization 함수
def tokenize_for_orpo(example):
    prompt_ids = tokenizer.encode(example["input"], add_special_tokens=False, truncation=True, max_length=1024)
    chosen_ids = tokenizer.encode(example["chosen"], add_special_tokens=False, truncation=True, max_length=512)
    rejected_ids = tokenizer.encode(example["rejected"], add_special_tokens=False, truncation=True, max_length=512)
    return {
        "prompt_ids": prompt_ids,
        "chosen_ids": chosen_ids,
        "rejected_ids": rejected_ids,
    }

tokenized_dataset = dataset.map(tokenize_for_orpo)

# Collator for ORPO
class ORPODataCollator:
    def __init__(self, tokenizer):
        self.pad_token_id = getattr(tokenizer, "pad_token_id", tokenizer.eos_token_id or 0)

    def __call__(self, features):
        def pad(batch):
            return pad_sequence([torch.tensor(x) for x in batch], batch_first=True, padding_value=self.pad_token_id)

        chosen_concat = [f["prompt_ids"] + f["chosen_ids"] for f in features]
        rejected_concat = [f["prompt_ids"] + f["rejected_ids"] for f in features]

        return {
            "chosen_concat": pad(chosen_concat),
            "rejected_concat": pad(rejected_concat),
        }

collator = ORPODataCollator(tokenizer)

# original 컬럼 제거
tokenized_dataset = tokenized_dataset.remove_columns(dataset.column_names)

# DataLoader
train_loader = DataLoader(
    tokenized_dataset,
    batch_size=2,
    shuffle=True,
    collate_fn=collator
)

# loss function
def compute_orpo_loss(model, chosen_concat, rejected_concat):
    def get_logps(seq_ids):
        attention_mask = (seq_ids != tokenizer.pad_token_id).long().to(model.device)
        outputs = model(input_ids=seq_ids.to(model.device), attention_mask=attention_mask)
        logits = outputs.logits[:, :-1, :]
        labels = seq_ids[:, 1:].to(model.device)
        log_probs = F.log_softmax(logits, dim=-1)
        logp = torch.gather(log_probs, 2, labels.unsqueeze(-1)).squeeze(-1)
        loss_mask = (labels != tokenizer.pad_token_id).float()
        return (logp * loss_mask).sum(dim=-1)

    chosen_logp = get_logps(chosen_concat)
    rejected_logp = get_logps(rejected_concat)
    loss = -torch.log(torch.sigmoid(chosen_logp - rejected_logp)).mean()
    return loss


Map:   0%|          | 0/700 [00:00<?, ? examples/s]

In [None]:
from tqdm import tqdm

# 모델을 학습 모드로 전환
model_360m.train()

# 에폭 수 설정
num_epochs = 3

# 전체 학습 반복
for epoch in range(num_epochs):
    print(f"📚 Epoch {epoch + 1}/{num_epochs}")
    total_loss = 0.0

    # 배치 단위로 반복
    for batch in tqdm(train_loader, desc=f"Training epoch {epoch+1}"):
        optimizer.zero_grad()  # 이전 gradient 초기화

        # 손실(loss) 계산
        loss = compute_orpo_loss(model_360m, batch["chosen_concat"], batch["rejected_concat"])

        # 역전파: gradient 계산
        loss.backward()

        # 옵티마이저로 파라미터 업데이트
        optimizer.step()

        # 손실 누적
        total_loss += loss.item()

    avg_loss = total_loss / len(train_loader)
    print(f"✅ Epoch {epoch + 1} average loss: {avg_loss:.4f}")


📚 Epoch 1/3


Training epoch 1: 100%|██████████| 350/350 [01:44<00:00,  3.36it/s]


✅ Epoch 1 average loss: inf
📚 Epoch 2/3


Training epoch 2: 100%|██████████| 350/350 [01:43<00:00,  3.39it/s]


✅ Epoch 2 average loss: inf
📚 Epoch 3/3


Training epoch 3: 100%|██████████| 350/350 [01:44<00:00,  3.34it/s]

✅ Epoch 3 average loss: inf





모델 저장

In [None]:
# 저장
output_dir = "./orpo-trained-smollm"
model_360m.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)

('./orpo-trained-smollm/tokenizer_config.json',
 './orpo-trained-smollm/special_tokens_map.json',
 './orpo-trained-smollm/vocab.json',
 './orpo-trained-smollm/merges.txt',
 './orpo-trained-smollm/added_tokens.json',
 './orpo-trained-smollm/tokenizer.json')

#### 동일한 테스트 데이터로 재평가

prefer \
1    139 \
0     24

In [None]:
# ORPO JSON 불러오기
with open("orpo_test_data.json", "r", encoding="utf-8") as f:
    orpo_test_data = json.load(f)


# 모델 로드
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./orpo-trained-smollm"
model_ft = AutoModelForCausalLM.from_pretrained(model_path).cuda()
tokenizer_ft = AutoTokenizer.from_pretrained(model_path)
model_ft.eval()


# 평가 실행
df_results = evaluate_orpo_loglikelihood(orpo_test_data, model_ft, tokenizer_ft, device)
df_results.head()

Unnamed: 0,task,input_lang,expected_lang,chosen_ll,rejected_ll,delta_ll,prefer
0,summary,en,en,,,,0
1,translation,es,fr,,,,0
2,summary,ko,ko,,,,0
3,translation,es,en,,,,0
4,summary,en,en,,,,0


In [None]:
df_results.prefer.value_counts()

prefer
0    163
Name: count, dtype: int64

dh!!!!!!!!!! 더 나아졋슴!!!!!!!!!!!!!!!!!!!!!

### Model train 수행 2 (with 가장 작은 모델)

In [None]:
# 모델 로딩
checkpoint = "HuggingFaceTB/SmolLM2-135M-intermediate-checkpoints"
revision = "step-240000"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained(checkpoint, revision=revision)
base_model = AutoModelForCausalLM.from_pretrained(checkpoint, revision=revision).to(device)

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(49152, 576)
    (layers): ModuleList(
      (0-29): 30 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear(in_features=576, out_features=576, bias=False)
          (k_proj): Linear(in_features=576, out_features=192, bias=False)
          (v_proj): Linear(in_features=576, out_features=192, bias=False)
          (o_proj): Linear(in_features=576, out_features=576, bias=False)
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=576, out_features=1536, bias=False)
          (up_proj): Linear(in_features=576, out_features=1536, bias=False)
          (down_proj): Linear(in_features=1536, out_features=576, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm((576,), eps=1e-05)
        (post_attention_layernorm): LlamaRMSNorm((576,), eps=1e-05)
      )
    )
    (norm): LlamaRMSNorm((576,), eps=1e-05)
    (rotary_emb): LlamaRotaryEm

In [28]:
print("Total parameters:", sum(p.numel() for p in base_model.parameters()))
print("Trainable parameters:", sum(p.numel() for p in base_model.parameters() if p.requires_grad))

# ✅ pad_token_id 보정
if tokenizer.pad_token_id is None:
    tokenizer.pad_token_id = tokenizer.eos_token_id

Total parameters: 134515008
Trainable parameters: 134515008


#### Train 전 성능 확인

prefer\
1    104\
0     59

In [None]:
# ORPO JSON 불러오기
with open("orpo_test_data.json", "r", encoding="utf-8") as f:
    orpo_test_data = json.load(f)

# 평가 실행
base_model_results = evaluate_orpo_loglikelihood(orpo_test_data, base_model, tokenizer, device)
base_model_results.head()

Unnamed: 0,task,input_lang,expected_lang,chosen_ll,rejected_ll,delta_ll,prefer
0,summary,en,en,-129.415405,-192.958115,63.542709,1
1,translation,es,fr,-205.192169,-174.897873,-30.294296,0
2,summary,ko,ko,-318.202789,-366.273132,48.070343,1
3,translation,es,en,-149.312378,-166.028336,16.715958,1
4,summary,en,en,-112.111847,-214.953918,102.842072,1
...,...,...,...,...,...,...,...
158,translation,en,ja,-124.010056,-105.323196,-18.686859,0
159,summary,ko,ko,-214.033997,-210.766876,-3.267120,0
160,single_language_qa,en,en,-53.687931,-141.002289,87.314358,1
161,translation,en,es,-114.859642,-129.893188,15.033546,1


In [33]:
base_model_results.prefer.value_counts()

prefer
1    104
0     59
Name: count, dtype: int64

#### Train

In [None]:
from datasets import Dataset
from transformers import AutoTokenizer
import torch
from torch.nn.utils.rnn import pad_sequence
import torch
from torch.utils.data import DataLoader
from torch.nn import functional as F
from transformers import AutoModelForCausalLM
from torch.optim import AdamW 
from tqdm import tqdm
import json

# Tokenization function
def tokenize_orpo(example, tokenizer):
    prompt_ids = tokenizer.encode(example["input"], add_special_tokens=False)
    chosen_ids = tokenizer.encode(example["chosen"], add_special_tokens=False)
    rejected_ids = tokenizer.encode(example["rejected"], add_special_tokens=False)
    return {
        "prompt_ids": prompt_ids,
        "chosen_ids": chosen_ids,
        "rejected_ids": rejected_ids,
    }

# 데이터셋 전처리
def prepare_tokenized_dataset(dataset, tokenizer):
    tokenized = dataset.map(lambda x: tokenize_orpo(x, tokenizer))
    return tokenized.remove_columns(dataset.column_names)

 
class ORPODataCollator:
    def __init__(self, tokenizer):
        self.pad_token_id = tokenizer.pad_token_id or tokenizer.eos_token_id

    def __call__(self, features):
        def pad(batch):
            return pad_sequence([torch.tensor(x) for x in batch], batch_first=True, padding_value=self.pad_token_id)
        
        chosen_concat = [f["prompt_ids"] + f["chosen_ids"] for f in features]
        rejected_concat = [f["prompt_ids"] + f["rejected_ids"] for f in features]

        return {
            "chosen_concat": pad(chosen_concat),
            "rejected_concat": pad(rejected_concat),
        }

# ORPO Loss function
def compute_orpo_loss(model, chosen_concat, rejected_concat, tokenizer):
    def get_logps(seq_ids):
        seq_ids = seq_ids.to(model.device)
        attention_mask = (seq_ids != tokenizer.pad_token_id).long()
        attention_mask = attention_mask.to(model.device)

        outputs = model(input_ids=seq_ids, attention_mask=attention_mask)
        logits = outputs.logits[:, :-1, :]
        labels = seq_ids[:, 1:]
        labels = labels.to(model.device)

        log_probs = F.log_softmax(logits, dim=-1)
        logp = torch.gather(log_probs, 2, labels.unsqueeze(-1)).squeeze(-1)

        loss_mask = (labels != tokenizer.pad_token_id).float().to(model.device)
        return (logp * loss_mask).sum(dim=-1)

    chosen_logp = get_logps(chosen_concat)
    rejected_logp = get_logps(rejected_concat)
    loss = -torch.log(torch.sigmoid(chosen_logp - rejected_logp)).mean()
    return loss


#  Dataloader 생성 함수
def create_dataloader(tokenized_dataset, tokenizer, batch_size=2):
    collator = ORPODataCollator(tokenizer)
    return DataLoader(
        tokenized_dataset,
        batch_size=batch_size,
        shuffle=True,
        collate_fn=collator
    )


def prepare_tokenized_dataset(data, tokenizer):
    if isinstance(data, list):
        data = Dataset.from_list(data)
    tokenized = data.map(lambda x: tokenize_orpo(x, tokenizer))
    return tokenized.remove_columns(data.column_names)


In [None]:
from datasets import Dataset

with open("orpo_finetuning_data.json", "r", encoding="utf-8") as f:
    orpo_train_data = json.load(f)

# 리스트 → Dataset 변환
dataset = Dataset.from_list(orpo_train_data)

# 토크나이징 및 컬럼 정리
tokenized_dataset = prepare_tokenized_dataset(dataset, tokenizer)

# 데이터로더 준비
train_loader = create_dataloader(tokenized_dataset, tokenizer, batch_size=2)

# train
optimizer = AdamW(base_model.parameters(), lr=5e-5)

num_epochs = 3 
for epoch in range(num_epochs):
    print(f"Epoch {epoch+1}/{num_epochs}")
    for batch in train_loader:
        optimizer.zero_grad()
        loss = compute_orpo_loss(base_model, batch["chosen_concat"], batch["rejected_concat"], tokenizer)
        loss.backward()
        optimizer.step()
        print(f"Loss: {loss.item():.4f}")


Map:   0%|          | 0/579 [00:00<?, ? examples/s]

In [None]:
# 저장
output_dir = "./orpo-trained-smollm-base"
base_model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)

('./orpo-trained-smollm-base/tokenizer_config.json',
 './orpo-trained-smollm-base/special_tokens_map.json',
 './orpo-trained-smollm-base/vocab.json',
 './orpo-trained-smollm-base/merges.txt',
 './orpo-trained-smollm-base/added_tokens.json',
 './orpo-trained-smollm-base/tokenizer.json')

동일한 데이터셋으로 성능 평가

In [None]:
# ORPO 테스트 데이터셋 불러오기
with open("orpo_test_data.json", "r", encoding="utf-8") as f:
    orpo_test_data = json.load(f)

# 평가 실행
base_model_results = evaluate_orpo_loglikelihood(orpo_test_data, base_model, tokenizer, device)
base_model_results.head()

Unnamed: 0,task,input_lang,expected_lang,chosen_ll,rejected_ll,delta_ll,prefer
0,summary,en,en,-129.707504,-195.440765,65.733261,1
1,translation,es,fr,-167.368362,-227.213257,59.844894,1
2,summary,ko,ko,-342.033936,-415.856567,73.822632,1
3,translation,es,en,-172.799515,-203.408218,30.608704,1
4,summary,en,en,-115.482971,-226.102814,110.619843,1


In [40]:
base_model_results.prefer.value_counts()

prefer
1    118
0     45
Name: count, dtype: int64

In [43]:
with open("orpo_finetuning_data.json", "r", encoding="utf-8") as f:
    orpo_test_data = json.load(f)

# 평가 실행
base_model_results = evaluate_orpo_loglikelihood(orpo_test_data, base_model, tokenizer, device)
base_model_results.prefer.value_counts()

prefer
1    579
Name: count, dtype: int64


학습 데이터에 관해서는 전부다 맞게 분류하는 것을 알 수 잇음 !!!!!!!!!


---

오 !!! 나아졋슴

-> 아마 데이터셋 사이즈를 늘려야 할 것 같음

내일 할 일 : 데이터셋 5000개 구축. 화이팅