# 0. Requirements

### (1) install
- The evaluation function needs two packages: Hugging Face's `evaluate` and `sacrebleu`, which performs the SacreBLEU computation. Install them with the commands below.
- `pip install evaluate`
- `pip install sacrebleu`

### (2) import

```python
import evaluate
import numpy as np

metric = evaluate.load("sacrebleu")
```

Load the metric as shown above to use it.

### (3) ETC

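- Currently, the evaluation counts the term occurrences across the entire validation set and the number of English terms that appear in the translated text, and reports both.
- The counting method can be changed if a different approach is needed.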
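The term counting described above can be sketched in plain Python. This mirrors the case-insensitive `str.count` logic used in `compute_metrics` in section 7, assuming `terms` is stored as a comma-separated string. `count_terms` and the sample sentences are hypothetical, and unlike the notebook the helper strips whitespace around each term and skips empty terms.

```python
def count_terms(text: str, terms: str) -> int:
    """Count case-insensitive occurrences of each comma-separated term in text.

    Hypothetical helper mirroring the counting loop in compute_metrics.
    Differs from the notebook by stripping spaces and skipping empty terms.
    """
    total = 0
    for term in terms.split(","):
        term = term.strip().lower()
        if term:  # "".count("") would otherwise count every position
            total += text.lower().count(term)
    return total


source = "Group sparsity is useful for factor graphs. Group sparsity helps."
translation = "그룹 희소성(group sparsity)은 factor graph에 유용하다."
terms = "group sparsity,factor graphs"

print(count_terms(source, terms))       # occurrences in the English source
print(count_terms(translation, terms))  # English terms left untranslated in the output
```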

# 1. Set Environments

In [1]:
import wandb
import os
os.environ["WANDB_PROJECT"]="Machin Translator_01"

wandb.login()

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: Currently logged in as: aeolian83. Use `wandb login --relogin` to force relogin


True

In [2]:
from huggingface_hub import login
from dotenv import load_dotenv

load_dotenv()

login(token=os.environ["HF_TOKEN"])

Token has not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /home/aeolian83/.cache/huggingface/token
Login successful


# 2. Set Datasets

In [3]:
from datasets import load_dataset, Dataset, DatasetDict
import pickle

In [4]:
with open('./data/train_data_300.pkl', 'rb') as file:
    train_data = pickle.load(file)
len(train_data)

with open('./data/validation_data_28.pkl', 'rb') as file:
    test_data = pickle.load(file)
len(test_data)

28

In [5]:
train_dataset = Dataset.from_list(train_data)
test_dataset = Dataset.from_list(test_data)

# Combine the "train" and "test" datasets into a DatasetDict
dataset_dict = DatasetDict({
    'train': train_dataset,
    'test': test_dataset,
})

In [6]:
dataset_dict

DatasetDict({
    train: Dataset({
        features: ['english', 'korean', 'terms'],
        num_rows: 300
    })
    test: Dataset({
        features: ['english', 'korean', 'terms'],
        num_rows: 28
    })
})

# 3. Prepare Model

In [7]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, TrainerCallback, TrainerState, TrainerControl
from transformers import TrainingArguments
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM

model_id = "beomi/Llama-3-KoEn-8B-Instruct-preview"
device_map = {"": 0}
cache_model_dir="/mnt/t7/.cache/huggingface/models"

In [8]:
# Settings for 4-bit QLoRA training
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # GPUs from NVIDIA's Ampere architecture onward can speed up computation with bf16
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

# bnb_4bit_quant_type: the default is "fp4", but according to the Hugging Face authors,
# "nf4" gave better results empirically. https://huggingface.co/blog/4bit-transformers-bitsandbytes
# bnb_4bit_use_double_quant=True reportedly saves an additional ~0.4 bits per parameter.
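The ~0.4-bit saving from double quantization can be reproduced with back-of-the-envelope arithmetic. This sketch assumes the block sizes reported in the QLoRA paper (one 32-bit absmax constant per 64 weights without double quantization; 8-bit constants per 64 weights plus one 32-bit constant per 256 of those with it) and treats the model as roughly 8 billion quantized parameters, ignoring layers that stay unquantized.

```python
# Average bits per parameter spent on quantization constants (QLoRA paper block sizes).
BLOCK = 64     # weights per first-level block
BLOCK2 = 256   # first-level constants per second-level block

without_dq = 32 / BLOCK                        # one fp32 absmax per block
with_dq = 8 / BLOCK + 32 / (BLOCK * BLOCK2)    # fp8 absmax + shared fp32 second-level constant
saved_bits = without_dq - with_dq

n_params = 8e9  # assumption: ~8B parameters, all quantized
saved_mb = saved_bits * n_params / 8 / 1e6

print(f"without double quant: {without_dq:.3f} bits/param")
print(f"with double quant:    {with_dq:.3f} bits/param")
print(f"saved:                {saved_bits:.3f} bits/param (~{saved_mb:.0f} MB)")
```

The result, about 0.373 bits per parameter, is where the rounded "0.4 bits" figure comes from.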

In [9]:
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quantization_config, device_map=device_map, cache_dir=cache_model_dir, trust_remote_code=True)
model.config.use_cache = False

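# model.config.pretraining_tp = 1
# This line often appears in QLoRA code and seems to relate to parallel training:
# pretraining_tp is the tensor-parallelism degree used during Llama pretraining (1 means the linear layers are not sliced).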

Loading checkpoint shards:   0%|          | 0/6 [00:00<?, ?it/s]

In [10]:
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True, cache_dir=cache_model_dir)
tokenizer.add_special_tokens({'pad_token': '<PAD>'})

# Reportedly, without this line (relevant only when a separate padding token is used) the loss can drop to 0
tokenizer.padding_side = "left"

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [11]:
model.resize_token_embeddings(len(tokenizer))  # pad_token was added, so resize the embedding and the language modeling head

Embedding(128257, 4096)
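The resized embedding, `Embedding(128257, 4096)`, is one row larger than the base vocabulary (presumably 128256 tokens for Llama 3) because `<PAD>` was added. With `padding_side = "left"`, shorter sequences are padded at the front so every example's real tokens end at the last position. The sketch below shows that behavior on plain Python lists; `left_pad` and the token ids are hypothetical, with 128256 standing in for the new `<PAD>` id.

```python
from typing import List, Tuple


def left_pad(batch: List[List[int]], pad_id: int) -> Tuple[List[List[int]], List[List[int]]]:
    """Left-pad token id sequences to equal length and build attention masks.

    Hypothetical helper illustrating tokenizer.padding_side = "left".
    """
    max_len = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = max_len - len(seq)
        input_ids.append([pad_id] * n_pad + seq)             # pads go in front
        attention_mask.append([0] * n_pad + [1] * len(seq))  # 0 marks pad positions
    return input_ids, attention_mask


PAD_ID = 128256  # assumed id of the added <PAD> token
ids, mask = left_pad([[11, 12, 13], [21]], PAD_ID)
print(ids)   # [[11, 12, 13], [128256, 128256, 21]]
print(mask)  # [[1, 1, 1], [0, 0, 1]]
```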

# 4. Set LoRA

In [12]:
from peft import LoraConfig, get_peft_model

lora_alpha = 16
lora_dropout = 0.1
lora_r = 64

In [13]:
peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM"
)
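With these values the LoRA update is scaled by `lora_alpha / r = 16 / 64 = 0.25`, and each adapted weight matrix of shape `d_out × d_in` gains `r × (d_in + d_out)` trainable parameters. Since `target_modules` is not set in this `LoraConfig`, PEFT falls back to a per-architecture default, presumably `q_proj` and `v_proj` for Llama. The sketch estimates the adapter size under that assumption and assumed Llama-3-8B shapes (hidden size 4096, 32 layers, 8 key/value heads of dimension 128); `model.print_trainable_parameters()` on the PEFT-wrapped model is the way to confirm.

```python
# Rough LoRA adapter size. Assumptions: PEFT's default Llama targets (q_proj, v_proj)
# and Llama-3-8B shapes; none of these constants are read from the model.
hidden = 4096
n_layers = 32
kv_dim = 8 * 128   # 8 key/value heads x head_dim 128 (grouped-query attention)
r, alpha = 64, 16


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    # A has shape (rank, d_in), B has shape (d_out, rank)
    return rank * (d_in + d_out)


per_layer = lora_params(hidden, hidden, r) + lora_params(hidden, kv_dim, r)  # q_proj + v_proj
total = per_layer * n_layers

print(f"scaling alpha/r = {alpha / r}")
print(f"trainable LoRA params ~ {total:,}")
```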

# 5. Set DataCollator

In [14]:
# Formatting function
def formatting_func(example):
    output_texts = []
    for i in range(len(example["english"])):
        text = f"Translate input sentence to Korean \n### Input: {example['english'][i]} \n### Translated: {example['korean'][i]}" + tokenizer.eos_token
        output_texts.append(text)

    return output_texts


response_template = " \n### Translated:"
collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer)
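`DataCollatorForCompletionOnlyLM` trains only on the text after `response_template`: it looks for the template's token ids in each tokenized example and sets every label up to the end of the template to -100, so the loss ignores the prompt. The pure-Python sketch below reproduces that idea with made-up token ids; `mask_prompt` is hypothetical and a simplification, not TRL's implementation. A practical caveat from the TRL docs is that a template tokenized on its own may not match its tokenization in context, in which case the collator cannot find it; TRL suggests passing the template's token ids taken from an in-context encoding.

```python
from typing import List

IGNORE_INDEX = -100


def mask_prompt(input_ids: List[int], template_ids: List[int]) -> List[int]:
    """Return labels with everything up to the end of the response template masked.

    Simplified illustration of completion-only training; not TRL's actual code.
    """
    labels = list(input_ids)
    n = len(template_ids)
    for start in range(len(input_ids) - n + 1):
        if input_ids[start:start + n] == template_ids:
            end = start + n
            labels[:end] = [IGNORE_INDEX] * end
            return labels
    # Template not found: the whole example is excluded from the loss
    return [IGNORE_INDEX] * len(input_ids)


# Hypothetical ids: [prompt..., template (7, 8), answer..., eos]
ids = [1, 2, 3, 7, 8, 40, 41, 99]
print(mask_prompt(ids, [7, 8]))  # [-100, -100, -100, -100, -100, 40, 41, 99]
print(mask_prompt(ids, [5, 6]))  # template missing: all -100
```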

# 6. Set Train Arguments

In [15]:
checkpoint_dir = "./checkpoint/translate_machine_llama3ko_intsuct_origindata300_01"

In [16]:
output_dir = checkpoint_dir
per_device_train_batch_size = 1
gradient_accumulation_steps = 2
optim = "paged_adamw_32bit"
evaluation_strategy = "steps"
eval_steps = 10
report_to = "wandb"
save_steps = 10
save_total_limit = 5
num_train_epochs = 2
logging_steps = 10
learning_rate = 2e-4
max_grad_norm = 0.3
warmup_ratio = 0.03
lr_scheduler_type = "constant"  # "constant" applies no warmup; "constant_with_warmup" would use warmup_ratio

In [17]:
training_arguments = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    num_train_epochs=num_train_epochs,
    save_steps=save_steps,
    save_total_limit=save_total_limit,
    logging_steps=logging_steps,
    evaluation_strategy=evaluation_strategy,
    eval_steps=eval_steps,
    report_to=report_to,
    learning_rate=learning_rate,
    bf16=True,
    max_grad_norm=max_grad_norm,
    warmup_ratio=warmup_ratio,
    group_by_length=True,
    lr_scheduler_type=lr_scheduler_type,
)
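With one GPU, 300 training examples, a per-device batch size of 1 and 2 gradient-accumulation steps, each optimizer step consumes 2 examples, so the run should have 150 steps per epoch and 300 in total, with an evaluation and a checkpoint every 10 steps. The sketch recomputes this, assuming a single device and no packing. The warmup length is computed the way Transformers derives it from `warmup_ratio` (rounding up), but it only takes effect with a warmup-aware scheduler such as `"constant_with_warmup"`.

```python
import math

n_train = 300
per_device_batch, grad_accum, n_devices = 1, 2, 1  # assumption: single GPU
num_epochs, eval_steps, warmup_ratio = 2, 10, 0.03

examples_per_step = per_device_batch * grad_accum * n_devices
steps_per_epoch = math.ceil(n_train / examples_per_step)
max_steps = steps_per_epoch * num_epochs
n_evals = max_steps // eval_steps
warmup_steps = math.ceil(max_steps * warmup_ratio)  # ignored by the "constant" scheduler

print(steps_per_epoch, max_steps, n_evals, warmup_steps)  # 150 300 30 9
```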

# 7. Set Evaluation Metric

In [18]:
import evaluate
import numpy as np

metric = evaluate.load("sacrebleu")

In [19]:
def compute_metrics(eval_preds):
    logits, labels = eval_preds
    # NOTE: for a causal LM the logit at position t predicts token t+1, and prompt
    # positions (label -100) are not excluded here, so these predictions are shifted
    # by one relative to the labels and also cover the prompt.
    predictions = np.argmax(logits, axis=-1)

    # Decode predictions and labels (replace the ignore index -100 with pad_token_id first)
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)

    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    decoded_preds = tokenizer.batch_decode(predictions, skip_special_tokens=True)

    # Print the first 7 source sentences with their decoded predictions for inspection
    for i, (src, pred) in enumerate(zip(dataset_dict['test']['english'], decoded_preds)):
        print(src)
        print("-" * 30)
        print(pred)
        print("#" * 50)
        if i > 5:
            break

    # Count term occurrences in each source, prediction, and label to compute a term ratio
    total_input_terms = 0
    correct_terms = 0
    label_terms = 0

    for src, pred, label, terms in zip(dataset_dict['test']['english'], decoded_preds, decoded_labels, dataset_dict['test']['terms']):
        terms = terms.split(',')  # assumes 'terms' is a comma-separated string

        for term in terms:
            total_input_terms += src.lower().count(term.lower())
            correct_terms += pred.lower().count(term.lower())
            label_terms += label.lower().count(term.lower())

    result = metric.compute(predictions=decoded_preds, references=decoded_labels)
    # Negative whenever the predictions contain fewer term occurrences than the sources
    term_weight = (correct_terms - total_input_terms) / label_terms if label_terms else 0.0

    result["weighted_score"] = result["score"] * term_weight
    result["term_weight"] = term_weight

    return result
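In `compute_metrics`, `np.argmax(logits, axis=-1)` is decoded directly, but for a causal LM the logit at position t is the prediction for token t+1, and the prompt positions are not excluded. That is consistent with the logged predictions repeating prompt-like text ("to Spanish.The 입력 Sentence ...") and with `Sys Len` being more than twice `Ref Len` in the results table. Below is a minimal NumPy sketch of one way to align the two, assuming `logits` of shape (batch, seq, vocab) and `labels` of shape (batch, seq) with -100 on ignored positions. `align_predictions` is a hypothetical helper; its id lists would then go to `tokenizer.decode`.

```python
import numpy as np

IGNORE_INDEX = -100


def align_predictions(logits: np.ndarray, labels: np.ndarray):
    """Return (pred_ids, label_ids) per example, restricted to supervised positions.

    logits: (batch, seq, vocab); labels: (batch, seq) with IGNORE_INDEX on masked positions.
    """
    pred_ids = np.argmax(logits, axis=-1)  # (batch, seq)
    # The logit at position t predicts token t+1: drop the last prediction and the first label.
    pred_ids = pred_ids[:, :-1]
    shifted_labels = labels[:, 1:]
    keep = shifted_labels != IGNORE_INDEX  # only positions that carry a label
    preds = [p[k].tolist() for p, k in zip(pred_ids, keep)]
    refs = [lab[k].tolist() for lab, k in zip(shifted_labels, keep)]
    return preds, refs


# Toy example: vocab of 5, one sequence of 4 tokens; only the last two are supervised.
labels = np.array([[IGNORE_INDEX, IGNORE_INDEX, 3, 4]])
logits = np.zeros((1, 4, 5))
logits[0, 1, 3] = 1.0  # position 1 predicts token 3 (the label at position 2)
logits[0, 2, 4] = 1.0  # position 2 predicts token 4 (the label at position 3)
preds, refs = align_predictions(logits, labels)
print(preds, refs)  # [[3, 4]] [[3, 4]]
```

To avoid holding full-vocabulary logits (128257 values per position) in memory during evaluation, the Trainer's `preprocess_logits_for_metrics(logits, labels)` hook, presumably also accepted by `SFTTrainer`, could return `logits.argmax(-1)` instead; only the shift and the mask would then remain.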

In [20]:
from trl import SFTTrainer

max_seq_length = 1024

In [21]:
# Custom callback to print step every 10 steps
class PrintStepCallback(TrainerCallback):
    def on_step_end(self, args: TrainingArguments, state: TrainerState, control: TrainerControl, **kwargs):
        if state.global_step % 10 == 0:
            print("#" * 80)
            print(f"Step: {state.global_step}")
            print("#" * 80)

In [22]:
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset_dict["train"],
    eval_dataset=dataset_dict["test"],
    peft_config=peft_config,
    tokenizer=tokenizer,
    args=training_arguments,
    formatting_func=formatting_func,
    data_collator=collator,
    compute_metrics=compute_metrics,
    callbacks=[PrintStepCallback()],
    max_seq_length=max_seq_length,  # 1024-token limit defined in the previous cell
)



Map:   0%|          | 0/300 [00:00<?, ? examples/s]

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


Map:   0%|          | 0/28 [00:00<?, ? examples/s]

dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False, even_batches=True, use_seedable_sampler=True)


In [23]:
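# Upcast normalization layers to float32, a common step for numerical stability in QLoRA training
# (nn.Module.to() converts in place, so no reassignment is needed)
for name, module in trainer.model.named_modules():
    if "norm" in name:
        module.to(torch.float32)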

In [24]:
trainer.train()

| Step | Training Loss | Validation Loss | Score | Counts | Totals | Precisions | BP | Sys Len | Ref Len | Weighted Score | Term Weight |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 0.4652 | 0.366306 | 25.709825 | [2730, 2169, 1757, 1439] | [7693, 7665, 7637, 7609] | [35.48680618744313, 28.297455968688844, 23.006416131989003, 18.91181495597319] | 1.0 | 7693 | 3413 | -31.353445 | -1.219512 |
| 20 | 0.327 | 0.287172 | 28.562756 | [2853, 2364, 1986, 1677] | [7664, 7636, 7608, 7580] | [37.22599164926931, 30.958617077003666, 26.104100946372238, 22.12401055408971] | 1.0 | 7664 | 3413 | -25.776145 | -0.902439 |
| 30 | 0.2721 | 0.262421 | 29.942242 | [2889, 2415, 2054, 1752] | [7518, 7490, 7462, 7434] | [38.42777334397446, 32.242990654205606, 27.526132404181183, 23.567393058918483] | 1.0 | 7518 | 3413 | -26.655899 | -0.890244 |
| 40 | 0.2547 | 0.250791 | 31.464293 | [2923, 2497, 2165, 1877] | [7459, 7431, 7403, 7375] | [39.18755865397507, 33.602476113578255, 29.244900715925976, 25.45084745762712] | 1.0 | 7459 | 3413 | -26.09234 | -0.829268 |
| 50 | 0.2563 | 0.246274 | 31.729861 | [2929, 2511, 2182, 1897] | [7445, 7417, 7389, 7361] | [39.341840161182, 33.8546582176082, 29.530383001759372, 25.77095503328352] | 1.0 | 7445 | 3413 | -26.699517 | -0.841463 |
| 60 | 0.2301 | 0.242651 | 31.888173 | [2944, 2526, 2200, 1925] | [7471, 7443, 7415, 7387] | [39.405702047918616, 33.93792825473599, 29.6695886716116, 26.059293353188032] | 1.0 | 7471 | 3413 | -26.832731 | -0.841463 |
| 70 | 0.2429 | 0.234834 | 32.607254 | [2965, 2563, 2240, 1955] | [7408, 7380, 7352, 7324] | [40.02429805615551, 34.7289972899729, 30.46789989118607, 26.693063899508466] | 1.0 | 7408 | 3413 | -26.642513 | -0.817073 |
| 80 | 0.2147 | 0.228237 | 32.906402 | [2978, 2588, 2273, 1994] | [7430, 7402, 7374, 7346] | [40.08075370121131, 34.9635233720616, 30.824518578790343, 27.144023958616934] | 1.0 | 7430 | 3413 | -26.084343 | -0.792683 |
| 90 | 0.2213 | 0.224795 | 32.763037 | [2970, 2575, 2257, 1979] | [7421, 7393, 7365, 7337] | [40.02156043659884, 34.830244826186934, 30.644942294636795, 26.972877197764753] | 1.0 | 7421 | 3413 | -28.367996 | -0.865854 |
| 100 | 0.2275 | 0.2242 | 32.593235 | [2971, 2569, 2251, 1976] | [7448, 7420, 7392, 7364] | [39.889903329752954, 34.62264150943396, 30.451839826839826, 26.83324280282455] | 1.0 | 7448 | 3413 | -27.426015 | -0.841463 |
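The Score column is SacreBLEU's corpus BLEU: the brevity penalty times the geometric mean of the four n-gram precisions (`Counts / Totals`), scaled to 0-100. BP is 1.0 in every row because the system output is longer than the reference. The sketch recomputes the step-10 score from its logged counts and checks that Weighted Score equals Score × Term Weight, as `compute_metrics` defines it. It assumes no smoothing is triggered, which holds here because every n-gram count is non-zero.

```python
import math

# Logged values for step 10, taken from the results table
counts = [2730, 2169, 1757, 1439]
totals = [7693, 7665, 7637, 7609]
sys_len, ref_len = 7693, 3413
score, term_weight, weighted = 25.709825, -1.219512, -31.353445

precisions = [c / t for c, t in zip(counts, totals)]
bp = 1.0 if sys_len > ref_len else math.exp(1 - ref_len / sys_len)
bleu = 100 * bp * math.exp(sum(math.log(p) for p in precisions) / 4)

print(f"recomputed BLEU: {bleu:.4f} (logged {score})")
print(f"score * term_weight: {score * term_weight:.4f} (logged {weighted})")
```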


################################################################################
Step: 10
################################################################################
Group sparsity is a concept in multilinear algebra that promotes sparsity patterns within groups of variables. This technique is particularly useful in applications involving high-dimensional data, where it helps to identify relevant groups of features. In the context of factor graphs, group sparsity can enhance the efficiency of inference algorithms by reducing the complexity of the graph structure. Multilinear algebra provides the mathematical foundation for understanding and manipulating the interactions between these groups. By leveraging group sparsity and multilinear algebra, factor graphs can be optimized to handle large-scale problems more effectively.
------------------------------
 hindsight hindsight hindsight hindsight hindsight hindsight hindsight hindsight hindsight hind



################################################################################
Step: 20
################################################################################
    ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ hmac hmac hmac hmac hindsight hindsight hindsight이
 text:




################################################################################
Step: 30
################################################################################
                     hmac hmac hmac hmac hmac hmac hmac이
 text ( Spanish.The 입력 sentence The Aarring is a technique in machineiva

Checkpoint destination directory ./checkpoint/translate_machine_llama3ko_intsuct_origindata300_01/checkpoint-30 already exists and is non-empty. Saving will proceed but saved results may be invalid.


################################################################################
Step: 40
################################################################################
이
 text to Spanish.The 입력 sentence The arring is a technique in machineivariate algebra that describes the



################################################################################
Step: 50
################################################################################
 ▲이
 text to Spanish.Input 입력 Sentence The arring is a technique in machineivariate algebra that describes thearsi



################################################################################
Step: 60
################################################################################
 ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲이
 text to Spanish.The 입력 Sentence The arring is a technique in machine



################################################################################
Step: 70
################################################################################
이

 to Spanish.The 입력 Sentence The arring is a technique in machineivariate algebra that describes thearsit



################################################################################
Step: 80
################################################################################
이

 to Spanish.The 입력 Sentence The arsity is a technique in computerivariate algebra that describes thearsi



################################################################################
Step: 90
################################################################################
이

 to Spanish.The 입력 Sentence The Aarsity is a technique in computerivariate algebra that describes thearsity 



################################################################################
Step: 100
################################################################################
이 a to to Spanish.Input 입력 Sentence The Aarsity를 a technique in computerivariate algebra that describes thearsity 



################################################################################
Step: 110
################################################################################
      이 a to: Spanish.The 입력 Sentence The Aarsity를 a technique in computerivariate algebra that describes thearsity i



################################################################################
Step: 120
################################################################################
 ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲이
 to to Spanish.The 입력 Sentence The arring is a technique in



################################################################################
Step: 130
################################################################################
 ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲이
 to to Spanish.Input Input sentence ### arring is a technique in machineivariate alg



################################################################################
Step: 140
################################################################################
이
 to
 Spanish:Input Input Sentence ### arsity를 a technique in machineivariate algebra that describes t



################################################################################
Step: 150
################################################################################
 ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲이
 to:
 Spanish.Input 입력 Sentence ### arsity를 a technique in machineivar



################################################################################
Step: 160
################################################################################
 ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲I
 to:
 Spanish.Input 입력 Sentence ### arsity를 a technique in machinei



################################################################################
Step: 170
################################################################################
 ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲I
 to:
 Spanish."The 입력 Sentence ### arsity를 a technique in machin



################################################################################
Step: 180
################################################################################
 ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲I a to:
 Spanish.Input 입력 Sentence ### arsity를 a technique in machineiv



################################################################################
Step: 190
################################################################################
 ▲ ▲ ▲ ▲I a to: Spanish"The input Sentence The arsity를 a phenomenon in computerivariate algebr



################################################################################
Step: 200
################################################################################
 ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲I a to:
 Spanish"The 입력 Sentence The arsity is a phenomenon in computeri



################################################################################
Step: 210
################################################################################
 ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲I WordPress to:
 Spanish.The Input Sentence The arsity is a technique in co



################################################################################
Step: 220
################################################################################
Group sparsity is a concept in multilinear algebra that promotes sparsity patterns within groups of variables. This technique is particularly useful in applications involving high-dimensional data, where it helps to identify relevant groups of features. In the context of factor graphs, group sparsity can enhance the efficiency of inference algorithms by reducing the complexity of the graph structure. Multilinear algebra provides the mathematical foundation for understanding and manipulating the interactions between these groups. By leveraging group sparsity and multilinear algebra, factor graphs can be optimized to handle large-scale problems more effectively.
------------------------------
 ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲I WordPress
:
 Spanish.The input Sentence The arsity is a phenomenon in 



################################################################################
Step: 230
################################################################################
Group sparsity is a concept in multilinear algebra that promotes sparsity patterns within groups of variables. This technique is particularly useful in applications involving high-dimensional data, where it helps to identify relevant groups of features. In the context of factor graphs, group sparsity can enhance the efficiency of inference algorithms by reducing the complexity of the graph structure. Multilinear algebra provides the mathematical foundation for understanding and manipulating the interactions between these groups. By leveraging group sparsity and multilinear algebra, factor graphs can be optimized to handle large-scale problems more effectively.
------------------------------
 ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲I WordPress to:
 Spanish.The 입력 Sentence The arsity is a phe



################################################################################
Step: 240
################################################################################
Group sparsity is a concept in multilinear algebra that promotes sparsity patterns within groups of variables. This technique is particularly useful in applications involving high-dimensional data, where it helps to identify relevant groups of features. In the context of factor graphs, group sparsity can enhance the efficiency of inference algorithms by reducing the complexity of the graph structure. Multilinear algebra provides the mathematical foundation for understanding and manipulating the interactions between these groups. By leveraging group sparsity and multilinear algebra, factor graphs can be optimized to handle large-scale problems more effectively.
------------------------------
 ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲I WordPress to to Spanish.The Input Sentence The



################################################################################
Step: 250
################################################################################
Group sparsity is a concept in multilinear algebra that promotes sparsity patterns within groups of variables. This technique is particularly useful in applications involving high-dimensional data, where it helps to identify relevant groups of features. In the context of factor graphs, group sparsity can enhance the efficiency of inference algorithms by reducing the complexity of the graph structure. Multilinear algebra provides the mathematical foundation for understanding and manipulating the interactions between these groups. By leveraging group sparsity and multilinear algebra, factor graphs can be optimized to handle large-scale problems more effectively.
------------------------------
 ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲I WordPress to to Spanish.The 입력 Sentence T



################################################################################
Step: 260
################################################################################
Group sparsity is a concept in multilinear algebra that promotes sparsity patterns within groups of variables. This technique is particularly useful in applications involving high-dimensional data, where it helps to identify relevant groups of features. In the context of factor graphs, group sparsity can enhance the efficiency of inference algorithms by reducing the complexity of the graph structure. Multilinear algebra provides the mathematical foundation for understanding and manipulating the interactions between these groups. By leveraging group sparsity and multilinear algebra, factor graphs can be optimized to handle large-scale problems more effectively.
------------------------------
 ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲I a to to Spanish.The 입력 Sentence The arsit



################################################################################
Step: 270
################################################################################
Group sparsity is a concept in multilinear algebra that promotes sparsity patterns within groups of variables. This technique is particularly useful in applications involving high-dimensional data, where it helps to identify relevant groups of features. In the context of factor graphs, group sparsity can enhance the efficiency of inference algorithms by reducing the complexity of the graph structure. Multilinear algebra provides the mathematical foundation for understanding and manipulating the interactions between these groups. By leveraging group sparsity and multilinear algebra, factor graphs can be optimized to handle large-scale problems more effectively.
------------------------------
 ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲I a to to Spanish.The 입력 sentence The Aarsity를 a technique in machi

Checkpoint destination directory ./checkpoint/translate_machine_llama3ko_intsuct_origindata300_01/checkpoint-270 already exists and is non-empty. Saving will proceed but saved results may be invalid.


################################################################################
Step: 280
################################################################################
Group sparsity is a concept in multilinear algebra that promotes sparsity patterns within groups of variables. This technique is particularly useful in applications involving high-dimensional data, where it helps to identify relevant groups of features. In the context of factor graphs, group sparsity can enhance the efficiency of inference algorithms by reducing the complexity of the graph structure. Multilinear algebra provides the mathematical foundation for understanding and manipulating the interactions between these groups. By leveraging group sparsity and multilinear algebra, factor graphs can be optimized to handle large-scale problems more effectively.
------------------------------
 ▲ ▲ ▲ ▲I a to to Spanish.This 입력 sentence The Aarsity를 a technique in machineivariate algebra that desc

Checkpoint destination directory ./checkpoint/translate_machine_llama3ko_intsuct_origindata300_01/checkpoint-280 already exists and is non-empty. Saving will proceed but saved results may be invalid.


################################################################################
Step: 290
################################################################################
Group sparsity is a concept in multilinear algebra that promotes sparsity patterns within groups of variables. This technique is particularly useful in applications involving high-dimensional data, where it helps to identify relevant groups of features. In the context of factor graphs, group sparsity can enhance the efficiency of inference algorithms by reducing the complexity of the graph structure. Multilinear algebra provides the mathematical foundation for understanding and manipulating the interactions between these groups. By leveraging group sparsity and multilinear algebra, factor graphs can be optimized to handle large-scale problems more effectively.
------------------------------
이 a to to Spanish.This 입력 sentence The Aarsity를 a technique in machineivariate algebra that describes thearsity

Checkpoint destination directory ./checkpoint/translate_machine_llama3ko_intsuct_origindata300_01/checkpoint-290 already exists and is non-empty. Saving will proceed but saved results may be invalid.


################################################################################
Step: 300
################################################################################
Group sparsity is a concept in multilinear algebra that promotes sparsity patterns within groups of variables. This technique is particularly useful in applications involving high-dimensional data, where it helps to identify relevant groups of features. In the context of factor graphs, group sparsity can enhance the efficiency of inference algorithms by reducing the complexity of the graph structure. Multilinear algebra provides the mathematical foundation for understanding and manipulating the interactions between these groups. By leveraging group sparsity and multilinear algebra, factor graphs can be optimized to handle large-scale problems more effectively.
------------------------------
I a to to Spanish.The 입력 sentence The Aarsity를 a technique in machineivariate algebra that describes the

Checkpoint destination directory ./checkpoint/translate_machine_llama3ko_intsuct_origindata300_01/checkpoint-300 already exists and is non-empty. Saving will proceed but saved results may be invalid.


TrainOutput(global_step=300, training_loss=0.2014644459883372, metrics={'train_runtime': 897.235, 'train_samples_per_second': 0.669, 'train_steps_per_second': 0.334, 'total_flos': 9466790000295936.0, 'train_loss': 0.2014644459883372, 'epoch': 2.0})
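The speed metrics in `TrainOutput` can be cross-checked with a little arithmetic. The sketch below only reuses the numbers printed above plus the 300 rows of `dataset_dict['train']`. The conclusion that each optimizer step consumed about 2 samples is an inference; whether that came from the per-device batch size or from gradient accumulation depends on the `TrainingArguments` set earlier in the notebook.

```python
# Metrics copied from the TrainOutput printed above
train_runtime = 897.235            # seconds
train_samples_per_second = 0.669
train_steps_per_second = 0.334
global_step = 300
num_train_rows = 300               # len(dataset_dict['train'])
epochs = 2.0

samples_seen = round(train_runtime * train_samples_per_second)  # total samples processed
steps_seen = round(train_runtime * train_steps_per_second)      # should match global_step
effective_batch = samples_seen / global_step                    # samples per optimizer step

print(samples_seen, steps_seen, effective_batch)  # 600 300 2.0
print(num_train_rows * epochs)                    # 600.0, i.e. every row seen twice
```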

In [25]:
lora_model_save_dir = "./results/translate_machine_llama3ko_intsuct_origindata300_01"

In [26]:
trainer.save_model(lora_model_save_dir)
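For a PEFT-wrapped model, `trainer.save_model` typically writes only the LoRA adapter rather than the full base weights. A quick sanity check on the output directory is sketched below. The file names are the ones recent PEFT versions usually write (`adapter_config.json` plus `adapter_model.safetensors`, or `adapter_model.bin` in older versions), which is an assumption here, and `looks_like_lora_adapter` is a hypothetical helper.

```python
from pathlib import Path

# Assumed PEFT adapter file names; older PEFT versions save .bin weights instead of .safetensors
ADAPTER_CONFIG = "adapter_config.json"
ADAPTER_WEIGHTS = ("adapter_model.safetensors", "adapter_model.bin")


def looks_like_lora_adapter(save_dir):
    """True if save_dir holds an adapter config plus adapter weights in either format."""
    root = Path(save_dir)
    has_config = (root / ADAPTER_CONFIG).is_file()
    has_weights = any((root / name).is_file() for name in ADAPTER_WEIGHTS)
    return has_config and has_weights


# Usage: looks_like_lora_adapter(lora_model_save_dir)
```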



In [27]:
lora_model_save_dir_01 = "/mnt/t7/dnn/llm_practicing/04_paper_practicing/01_translate_machine/checkpoint/translate_machine_llama3ko_intsuct_origindata300_01/checkpoint-270"
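The training log warned that several `checkpoint-<step>` directories already existed and were non-empty, so before picking one such as `checkpoint-270` it can help to see which checkpoints are actually on disk. A minimal standard-library sketch follows; `list_checkpoints` is a hypothetical helper and assumes only the `checkpoint-<step>` naming visible in the log.

```python
import re
from pathlib import Path


def list_checkpoints(output_dir):
    """Return (step, path) pairs for each checkpoint-<step> subdirectory, sorted by step."""
    root = Path(output_dir)
    if not root.is_dir():
        return []
    pattern = re.compile(r"checkpoint-(\d+)")
    found = []
    for child in root.iterdir():
        match = pattern.fullmatch(child.name)
        if match and child.is_dir():
            found.append((int(match.group(1)), child))
    return sorted(found)


# Usage (output directory taken from the training log above):
# for step, path in list_checkpoints("./checkpoint/translate_machine_llama3ko_intsuct_origindata300_01"):
#     print(step, path)
```

Sorting by the integer step (rather than the folder name) keeps `checkpoint-90` before `checkpoint-270`, which a plain string sort would not.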

In [28]:
torch.cuda.empty_cache()

In [29]:
from peft import PeftModel

In [30]:
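# Attach the LoRA adapter saved at checkpoint-270 to `model`
# (is_trainable defaults to False, so the adapter is loaded for inference only)
loaded_model = PeftModel.from_pretrained(
    model=model,
    model_id=lora_model_save_dir_01
)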
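`PeftModel.from_pretrained` keeps the base weights frozen and applies the low-rank adapter alongside them. Under the standard LoRA formulation with scaling `alpha / r`, an adapted layer computes `W x + (alpha / r) * B (A x)`, which is why PEFT can later fold the adapter into the base weight with `merge_and_unload()`. The NumPy sketch below checks that identity on random matrices; the sizes are illustrative only, and the real `r` and `alpha` are whatever the `LoraConfig` earlier in the notebook used.

```python
import numpy as np


def merge_lora(W, A, B, alpha, r):
    """Fold a LoRA update into a frozen weight matrix: W + (alpha / r) * B @ A."""
    return W + (alpha / r) * (B @ A)


rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 6, 4, 2, 16     # hypothetical sizes, not the notebook's LoRA config
W = rng.normal(size=(d_out, d_in))      # frozen base weight
A = rng.normal(size=(r, d_in))          # LoRA down-projection
B = rng.normal(size=(d_out, r))         # LoRA up-projection (trained; LoRA initialises it to zero)
x = rng.normal(size=d_in)

# Adapter applied on the side vs. merged into W: the two outputs should match
h_separate = W @ x + (alpha / r) * (B @ (A @ x))
h_merged = merge_lora(W, A, B, alpha, r) @ x
print(np.allclose(h_separate, h_merged))  # True
```

Because `B` starts at zero, an untrained adapter leaves the base model's output unchanged.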

In [31]:
# Build prompts for the first three test sentences from one template
prompt_template = '''
Translate input sentence to Korean
### Input: {}
### Translated:
'''
examples = [prompt_template.format(sentence) for sentence in dataset_dict['test'][:3]['english']]

In [32]:
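# Decoder-only models should be left-padded for batched generation,
# so every prompt ends right where the new tokens start
tokenizer.padding_side = "left"
example_batch = tokenizer(examples, return_tensors="pt", padding=True).to(loaded_model.device)

In [33]:
with torch.autocast(device_type="cuda"):
    output_tokens = loaded_model.generate(
        input_ids=example_batch["input_ids"],
        attention_mask=example_batch["attention_mask"],  # lets generate() ignore padding positions
        max_new_tokens=1024,
        pad_token_id=tokenizer.pad_token_id,
    )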
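Why the padding side matters for batched generation can be seen with plain token-id lists. A decoder-only model continues from the last position of each row, so with right padding a shorter prompt ends on a pad token instead of on its own text. The sketch below uses made-up token ids and a hypothetical `pad_batch` helper that mimics what the tokenizer does with `padding=True`, including an attention mask that marks real tokens with 1.

```python
def pad_batch(seqs, pad_id, side="left"):
    """Pad token-id lists to equal length on the chosen side and build the matching attention mask."""
    width = max(len(s) for s in seqs)
    ids, mask = [], []
    for s in seqs:
        pad = [pad_id] * (width - len(s))
        if side == "left":
            ids.append(pad + s)
            mask.append([0] * len(pad) + [1] * len(s))
        else:
            ids.append(s + pad)
            mask.append([1] * len(s) + [0] * len(pad))
    return ids, mask


prompts = [[11, 12, 13, 14], [21, 22]]  # hypothetical token ids for a long and a short prompt
left_ids, left_mask = pad_batch(prompts, pad_id=0, side="left")
right_ids, right_mask = pad_batch(prompts, pad_id=0, side="right")
print([row[-1] for row in left_ids])   # [14, 22]: every row ends on a real prompt token
print([row[-1] for row in right_ids])  # [14, 0]: the short row ends on padding
```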

In [34]:
# Decode each generated sequence (prompt + continuation) and print them separated by a rule
outputs = [tokenizer.decode(t, skip_special_tokens=True) for t in output_tokens]
for o in outputs:
    print(o)
    print('#'*100)


Translate input sentence to Korean
### Input: Group sparsity is a concept in multilinear algebra that promotes sparsity patterns within groups of variables. This technique is particularly useful in applications involving high-dimensional data, where it helps to identify relevant groups of features. In the context of factor graphs, group sparsity can enhance the efficiency of inference algorithms by reducing the complexity of the graph structure. Multilinear algebra provides the mathematical foundation for understanding and manipulating the interactions between these groups. By leveraging group sparsity and multilinear algebra, factor graphs can be optimized to handle large-scale problems more effectively.
### Translated:
그룹 희소성(group sparsity)은 변수 집합 내에서 희소성 패턴을 촉진하는 다변수 대수(multilinear algebra)의 개념입니다. 이 기술은 고차원 데이터를 포함하는 응용 프로그램에서 특히 유용하며, 관련된 특징 집합을 식별하는 데 도움이 됩니다. 인자 그래프(factor graphs)에서 그룹 희소성(group sparsity)은 그래프 구조의 복잡성을 줄여 추론 알고리즘의 효율성을 향상시킬 수 있습니다. 다변수 대수(multilinear algebra)는
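The decoded outputs still contain the prompt, and the translation keeps English terms in parentheses, e.g. `그룹 희소성(group sparsity)`. A minimal post-processing sketch is below: it keeps only the text after `### Translated:` and counts how many English terms survive verbatim. Both helpers are hypothetical, the counting assumes the `terms` column holds a list of English term strings, and the notebook's own term-counting code may use a different rule.

```python
def extract_translation(decoded, marker="### Translated:", stop="\n###"):
    """Return only the generated translation: text after the first marker, cut at the next '###' header."""
    _, sep, tail = decoded.partition(marker)
    if not sep:  # marker missing: fall back to the whole decoded string
        return decoded.strip()
    return tail.split(stop, 1)[0].strip()


def count_terms_in_translation(translation, terms):
    """Count how many English terms appear verbatim (case-insensitive) in the translation."""
    lowered = translation.lower()
    return sum(1 for term in terms if term.lower() in lowered)


# Usage with a shortened version of the first output above
decoded = (
    "\nTranslate input sentence to Korean\n"
    "### Input: Group sparsity is a concept in multilinear algebra ...\n"
    "### Translated:\n"
    "그룹 희소성(group sparsity)은 다변수 대수(multilinear algebra)의 개념입니다. "
    "인자 그래프(factor graphs)에서 ...\n"
)
translation = extract_translation(decoded)
terms = ["group sparsity", "multilinear algebra", "factor graph", "inference algorithm"]
print(count_terms_in_translation(translation, terms))  # 3
```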