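### Monthly Dacon: Court Judgment Prediction AI Competition <font size = 4><a href='https://dacon.io/competitions/official/236112/overview/description'>Details</a></font>

The provided dataset contains the case identifier and the case description for U.S. Supreme Court cases.

The task is to build an AI model that, for a given case between a first and a second party, predicts whether the first party wins.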


#### Data Description

    train.csv [file]
    ID : case sample ID
    first_party : first party to the case
    second_party : second party to the case
    facts : case facts
    first_party_winner : whether the first party won (0 : lost, 1 : won)
    
    test.csv [file]
    ID : case sample ID
    first_party : first party to the case
    second_party : second party to the case
    facts : case facts
    
    sample_submission.csv [file] - submission format
    ID : case sample ID
    first_party_winner : predicted outcome for the first party (0 : lost, 1 : won)

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


### Loading the Data

In [None]:
import pandas as pd
import numpy as np

In [None]:
base_path = '/content/drive/MyDrive/Colab Notebooks/Data Project/Dacon-Project'
train = pd.read_csv(base_path + '/all_project/data/train.csv')
test = pd.read_csv(base_path + '/all_project/data/test.csv')

train.head()

|   | ID | first_party | second_party | facts | first_party_winner |
|---|---|---|---|---|---|
| 0 | TRAIN_0000 | Phil A. St. Amant | Herman A. Thompson | On June 27, 1962, Phil St. Amant, a candidate ... | 1 |
| 1 | TRAIN_0001 | Stephen Duncan | Lawrence Owens | Ramon Nelson was riding his bike when he suffe... | 0 |
| 2 | TRAIN_0002 | Billy Joe Magwood | Tony Patterson, Warden, et al. | An Alabama state court convicted Billy Joe Mag... | 1 |
| 3 | TRAIN_0003 | Linkletter | Walker | Victor Linkletter was convicted in state court... | 0 |
| 4 | TRAIN_0004 | William Earl Fikes | Alabama | On April 24, 1953 in Selma, Alabama, an intrud... | 1 |


In [None]:
test.head()

|   | ID | first_party | second_party | facts |
|---|---|---|---|---|
| 0 | TEST_0000 | Salerno | United States | The 1984 Bail Reform Act allowed the federal c... |
| 1 | TEST_0001 | Milberg Weiss Bershad Hynes and Lerach | Lexecon, Inc. | Lexecon Inc. was a defendant in a class action... |
| 2 | TEST_0002 | No. 07-582\t Title: \t Federal Communications ... | Fox Television Stations, Inc., et al. | In 2002 and 2003, Fox Television Stations broa... |
| 3 | TEST_0003 | Harold Kaufman | United States | During his trial for armed robbery of a federa... |
| 4 | TEST_0004 | Berger | Hanlon | In 1993, a magistrate judge issued a warrant a... |


#### Data Preprocessing

In [None]:
# Add a party_info_facts column that prepends the first- and second-party names to the case facts
party_info = 'First party is ' + train.first_party + ' and Second party is ' + train.second_party + '. ' + train.facts
train['party_info_facts'] = party_info
train.head()

|   | ID | first_party | second_party | facts | first_party_winner | party_info_facts |
|---|---|---|---|---|---|---|
| 0 | TRAIN_0000 | Phil A. St. Amant | Herman A. Thompson | On June 27, 1962, Phil St. Amant, a candidate ... | 1 | First party is Phil A. St. Amant and Second pa... |
| 1 | TRAIN_0001 | Stephen Duncan | Lawrence Owens | Ramon Nelson was riding his bike when he suffe... | 0 | First party is Stephen Duncan and Second party... |
| 2 | TRAIN_0002 | Billy Joe Magwood | Tony Patterson, Warden, et al. | An Alabama state court convicted Billy Joe Mag... | 1 | First party is Billy Joe Magwood and Second pa... |
| 3 | TRAIN_0003 | Linkletter | Walker | Victor Linkletter was convicted in state court... | 0 | First party is Linkletter and Second party is ... |
| 4 | TRAIN_0004 | William Earl Fikes | Alabama | On April 24, 1953 in Selma, Alabama, an intrud... | 1 | First party is William Earl Fikes and Second p... |


### BERT

In [None]:
!pip install transformers



In [None]:
!pip3 install adamp
!pip install torch_optimizer




In [None]:
from transformers import BertTokenizer
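from transformers import BertForSequenceClassification, BertConfig
from torch.optim import AdamW  # transformers' own AdamW is deprecated; torch's version takes the same lr/eps/weight_decay arguments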
from transformers import get_linear_schedule_with_warmup
from torch.utils.data import TensorDataset, DataLoader, RandomSampler, SequentialSampler
from keras.utils import pad_sequences
from sklearn.model_selection import train_test_split
from adamp import AdamP
import torch_optimizer as optim
from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score
from transformers import TrainingArguments, Trainer
from transformers import EarlyStoppingCallback


import torch
import random
import time
import datetime

### Checking the GPU

In [None]:
import os

n_devices = torch.cuda.device_count()
print(n_devices)

for i in range(n_devices):
    print(torch.cuda.get_device_name(i))

1
NVIDIA A100-SXM4-40GB


In [None]:

print(train.shape)
print(test.shape)


(2478, 6)
(1240, 4)


### Input Format

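BERT requires its input in a specific format.

- The special token [SEP] marks the end of a sentence or separates two sentences.
- The special token [CLS] marks the start of the input. Its output is what classification tasks use, but it must be included whatever the task.

Each input is built from:
- Tokens from the vocabulary BERT uses
- Token IDs assigned to those tokens by the BERT tokenizer
- A mask ID (attention mask) showing which elements of the sequence are real tokens and which are padding
- Segment IDs that distinguish different sentences
- Positional embeddings that mark each token's position in the sequence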

<br>

#### Special Tokens
- [CLS] : marks the start of every input
- [SEP] : separates two sentences

<br>

BERT can take one or two sentences as input, separated by the special token [SEP].

The [CLS] token always appears at the start of the text. Only classification tasks use it, but it must be present whatever the task.
<br>


**Two-sentence input**

> [CLS] The man went to the store. [SEP] He bought a gallon of milk. [SEP]



**Single-sentence input**

> [CLS] The man went to the store. [SEP]
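A minimal sketch of how the two layouts above are assembled, assuming whitespace-split tokens for readability (the real tokenizer produces WordPiece tokens). `build_bert_input` is a hypothetical helper; besides the token list it returns the segment IDs (`token_type_ids`) that mark which sentence each token belongs to.

```python
def build_bert_input(tokens_a, tokens_b=None):
    """Return ([CLS] A [SEP] (B [SEP]) tokens, segment IDs) for one or two sentences."""
    tokens = ["[CLS]"] + list(tokens_a) + ["[SEP]"]
    segment_ids = [0] * len(tokens)                # sentence A, including [CLS] and its [SEP]
    if tokens_b is not None:
        tokens += list(tokens_b) + ["[SEP]"]
        segment_ids += [1] * (len(tokens_b) + 1)   # sentence B and its closing [SEP]
    return tokens, segment_ids


pair_tokens, pair_segments = build_bert_input(
    "The man went to the store .".split(),
    "He bought a gallon of milk .".split(),
)
single_tokens, single_segments = build_bert_input("The man went to the store .".split())

print(" ".join(pair_tokens))
print(pair_segments)
print(" ".join(single_tokens))
```

This notebook uses the single-sentence layout and later passes `token_type_ids=None` to the model, in which case BERT treats every position as segment 0.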


In [None]:
bert_sentences = ["[CLS] " + str(s) + " [SEP]" for s in train.party_info_facts]
bert_sentences[:2]

['[CLS] First party is Phil A. St. Amant and Second party is Herman A. Thompson. On June 27, 1962, Phil St. Amant, a candidate for public office, made a television speech in Baton Rouge, Louisiana.  During this speech, St. Amant accused his political opponent of being a Communist and of being involved in criminal activities with the head of the local Teamsters Union.  Finally, St. Amant implicated Herman Thompson, an East Baton Rouge deputy sheriff, in a scheme to move money between the Teamsters Union and St. Amant’s political opponent. \nThompson successfully sued St. Amant for defamation.  Louisiana’s First Circuit Court of Appeals reversed, holding that Thompson did not show St. Amant acted with “malice.”  Thompson then appealed to the Supreme Court of Louisiana.  That court held that, although public figures forfeit some of their First Amendment protection from defamation, St. Amant accused Thompson of a crime with utter disregard of whether the remarks were true.  Finally, that c

In [None]:
# Store the case-outcome label column ('0' or '1')

labels = train['first_party_winner'].values
labels


array([1, 0, 1, ..., 0, 0, 0])

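#### Tokenization

- Original words are split into subwords.

- "##ant" is a subword that continues a preceding piece of a word; the "##" prefix distinguishes it from the standalone word "ant".

- If a whole word is not in the BERT vocabulary, it is split into subwords that are.

- 'OOV' : Out Of Vocabulary

- 'UNK' : UNKnown (the [UNK] token replaces a word that cannot be built from vocabulary pieces at all)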
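To illustrate how a word becomes subwords, here is a sketch of greedy longest-match-first WordPiece splitting over a tiny hypothetical vocabulary. The real `bert-base-cased` vocabulary has 28,996 entries (see `Embedding(28996, 768)` in the model summary further down), and the real tokenizer also splits off punctuation before this step; the toy vocabulary is chosen to reproduce pieces such as 'Am', '##ant', 'Teams', '##ters' that appear in the tokenized output below.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, shrinking it from the right
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry the ## prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk_token]  # no piece fits: the whole word becomes [UNK]
        pieces.append(match)
        start = end
    return pieces


toy_vocab = {"Am", "##ant", "Teams", "##ters", "Union", "ant"}
for w in ["Amant", "Teamsters", "ant", "xyz"]:
    print(w, "->", wordpiece_tokenize(w, toy_vocab))
```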


In [None]:
tokenizer = BertTokenizer.from_pretrained('bert-base-cased', do_lower_case=False)
tokenized_texts = [tokenizer.tokenize(s) for s in bert_sentences]
print(bert_sentences[0])
print(tokenized_texts[0])

[CLS] First party is Phil A. St. Amant and Second party is Herman A. Thompson. On June 27, 1962, Phil St. Amant, a candidate for public office, made a television speech in Baton Rouge, Louisiana.  During this speech, St. Amant accused his political opponent of being a Communist and of being involved in criminal activities with the head of the local Teamsters Union.  Finally, St. Amant implicated Herman Thompson, an East Baton Rouge deputy sheriff, in a scheme to move money between the Teamsters Union and St. Amant’s political opponent. 
Thompson successfully sued St. Amant for defamation.  Louisiana’s First Circuit Court of Appeals reversed, holding that Thompson did not show St. Amant acted with “malice.”  Thompson then appealed to the Supreme Court of Louisiana.  That court held that, although public figures forfeit some of their First Amendment protection from defamation, St. Amant accused Thompson of a crime with utter disregard of whether the remarks were true.  Finally, that cour

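#### Padding
Ideally, MAX_LEN is set larger than the longest token sequence.

Positions beyond a sequence's length, up to MAX_LEN, are filled with 0.

Next, the maximum sequence length is fixed, the tokens are converted to integer IDs, and zero padding is applied.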
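`pad_sequences(..., truncating="post", padding="post")` is used on the token IDs in the cells below; its behavior on lists of IDs can be mimicked in plain Python as follows. `pad_post` is a hypothetical helper for illustration only, and the pad value 0 matches the [PAD] ID (`padding_idx=0` in the model summary).

```python
def pad_post(sequences, maxlen, value=0):
    """Cut each sequence after maxlen IDs, then append `value` until it has length maxlen."""
    padded = []
    for seq in sequences:
        seq = list(seq[:maxlen])               # truncating="post": keep the first maxlen IDs
        seq += [value] * (maxlen - len(seq))   # padding="post": pad at the end
        padded.append(seq)
    return padded


# 101 = [CLS] and 102 = [SEP] in the bert-base-cased vocabulary
batch = [[101, 1752, 102], [101, 1752, 1710, 1110, 8067, 102]]
print(pad_post(batch, maxlen=5))
```

The second sequence loses its closing [SEP] to truncation. The same happens to any test sample longer than MAX_LEN, since those are truncated rather than dropped.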

In [None]:
# Find the maximum token length
len_list = [len(token) for token in tokenized_texts]
max_idx = int(np.argmax(len_list))
print(f'Max sequence length: {max(len_list)}')  # 1228, which exceeds BERT's 512-token limit

# Remove sequences longer than 512 tokens
indices = {i for i, num in enumerate(len_list) if num > 512}
new_tokenized_texts = [text for i, text in enumerate(tokenized_texts) if i not in indices]

# Apply the same filter to the labels
new_labels = [label for i, label in enumerate(labels) if i not in indices]

print(len(tokenized_texts))
print(len(new_tokenized_texts))
print(len(new_labels))


Max sequence length: 1228
2478
2428
2428


#### Limiting the Maximum Sequence Length to 512
Samples whose tokenized_texts length exceeds 512 are removed.

In [None]:
# !pip install nltk
!python -m nltk.downloader all


In [None]:
from nltk import sent_tokenize

# train.party_info_facts = train.party_info_facts.apply(lambda x : str(sent_tokenize(x)[0])+''.join(sent_tokenize(x)[2:]))[0]
# train.tail()



In [None]:
# bert_sentences = ["[CLS] " + str(s) + " [SEP]" for s in train.party_info_facts]
# tokenized_texts = [tokenizer.tokenize(s) for s in bert_sentences]

# Find the maximum token length after filtering
len_list = [len(token) for token in new_tokenized_texts]
max_idx = int(np.argmax(len_list))
print(new_tokenized_texts[0])
print(f'Max sequence length: {max(len_list)}')  # 512

['[CLS]', 'First', 'party', 'is', 'Phil', 'A', '.', 'St', '.', 'Am', '##ant', 'and', 'Second', 'party', 'is', 'Herman', 'A', '.', 'Thompson', '.', 'On', 'June', '27', ',', '1962', ',', 'Phil', 'St', '.', 'Am', '##ant', ',', 'a', 'candidate', 'for', 'public', 'office', ',', 'made', 'a', 'television', 'speech', 'in', 'Baton', 'Rouge', ',', 'Louisiana', '.', 'During', 'this', 'speech', ',', 'St', '.', 'Am', '##ant', 'accused', 'his', 'political', 'opponent', 'of', 'being', 'a', 'Communist', 'and', 'of', 'being', 'involved', 'in', 'criminal', 'activities', 'with', 'the', 'head', 'of', 'the', 'local', 'Teams', '##ters', 'Union', '.', 'Finally', ',', 'St', '.', 'Am', '##ant', 'implicated', 'Herman', 'Thompson', ',', 'an', 'East', 'Baton', 'Rouge', 'deputy', 'sheriff', ',', 'in', 'a', 'scheme', 'to', 'move', 'money', 'between', 'the', 'Teams', '##ters', 'Union', 'and', 'St', '.', 'Am', '##ant', '’', 's', 'political', 'opponent', '.', 'Thompson', 'successfully', 'sued', 'St', '.', 'Am', '##ant

In [None]:
MAX_LEN = 512  # maximum sequence length
input_ids = [tokenizer.convert_tokens_to_ids(x) for x in new_tokenized_texts]
input_ids = pad_sequences(input_ids, maxlen=MAX_LEN, dtype="long", truncating="post", padding="post")
input_ids[max_idx]

array([  101,  1752,  1710,  1110,  8067, 22686,   117,  3084,   119,
        2393,   119,  1105,  2307,  1710,  1110,  4769,  2250,  1104,
        3398,   119,  1130,  1347,  1816,   117,  1210, 24574,  5680,
       12555,  8390,  2310,  1146,  1107,  1126,   170, 11090,  1298,
        1107,  6167,   119,  3841,  1103,  4475,  1127,  2022,   158,
         119,   156,   119,  4037,   117,  1150, 14007,  5770,   170,
        9680,  1222,  1103,  4769,  2250,  1104,  3398,  1111,  1157,
        1648,  1107,  3558,  2578,  1619,  1106,  1103, 19450,   119,
        1130,  1704,   117, 14611,  6670,  1132, 11650,  1121,  9680,
         117,  1133,  1103,  4201, 24600,   146,  6262, 19782,  4338,
        2173,   113,   107,   143, 13882,  1592,   107,   114,  2790,
        1126,  5856,  1106,  1115, 17523,  1107,  2740,  1104,  1352,
         118,  5988,  1104, 12010,   119,   138,  1629,  3942,  1107,
         141,   119,   140,   119,  2242,   170,   109,  5729,   119,
         126,  1550,

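#### Attention Mask
An attention mask tells the model which positions hold real tokens and which hold padding tokens (ID 0), so that attention is not computed over padding unnecessarily.

Padded positions get the value 0, and real tokens get the value 1.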
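Conceptually, the mask acts inside self-attention by adding a large negative number to the scores of padded positions before the softmax, so those positions receive essentially zero attention weight. A plain-Python sketch of that idea for a single query, assuming an additive constant of -1e9 (the constant used inside the library may differ); `masked_softmax` is a hypothetical helper.

```python
import math


def masked_softmax(scores, mask, neg=-1e9):
    """Softmax over attention scores where positions with mask 0 get (near) zero weight."""
    adjusted = [s if m > 0 else s + neg for s, m in zip(scores, mask)]
    top = max(adjusted)                          # subtract the max for numerical stability
    exps = [math.exp(a - top) for a in adjusted]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.5, 3.0]   # the last position is padding but has a high raw score
mask = [1.0, 1.0, 1.0, 0.0]
weights = masked_softmax(scores, mask)
print([round(w, 4) for w in weights])
```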


In [None]:
attention_masks = []

for seq in input_ids:
    seq_mask = [float(i>0) for i in seq]
    attention_masks.append(seq_mask)

attention_masks[0]


[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, ...

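### Splitting into Training and Validation Sets

The attention masks are split into training and validation sets along with the inputs, and all data is then converted to PyTorch tensors.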


In [None]:
# Split inputs, masks, and labels in a single call so they share the same shuffled indices
train_X, val_X, train_masks, val_masks, train_y, val_y = train_test_split(
    input_ids, attention_masks, new_labels, random_state=42, test_size=0.2)

# Convert to PyTorch tensors
train_inputs = torch.tensor(train_X)
train_labels = torch.tensor(train_y)
train_masks = torch.tensor(train_masks)
validation_inputs = torch.tensor(val_X)
validation_labels = torch.tensor(val_y)
validation_masks = torch.tensor(val_masks)


#### DataLoader Setup
The input IDs, attention masks, and labels are bundled into one dataset, from which train_dataloader and validation_dataloader are created.
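The training loader draws samples in random order (`RandomSampler`), while the validation loader reads them in order (`SequentialSampler`). A plain-Python sketch of what one pass over such a loader yields, assuming `DataLoader`'s default `drop_last=False` so the last batch may be smaller; `iterate_batches` is a hypothetical helper that only mirrors the idea.

```python
import math
import random


def iterate_batches(input_ids, masks, labels, batch_size, shuffle=True, seed=42):
    """Yield aligned (ids, masks, labels) batches, shuffling sample order if requested."""
    order = list(range(len(labels)))
    if shuffle:
        random.Random(seed).shuffle(order)   # like RandomSampler (training)
    # shuffle=False keeps the original order, like SequentialSampler (validation)
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield ([input_ids[i] for i in idx],
               [masks[i] for i in idx],
               [labels[i] for i in idx])


n = 23
ids = [[i] for i in range(n)]
masks = [[1.0] for _ in range(n)]
labels = list(range(n))
batches = list(iterate_batches(ids, masks, labels, batch_size=10))
print(len(batches), math.ceil(n / 10), [len(b[2]) for b in batches])
```

Because every batch indexes the three lists with the same `idx`, each input stays paired with its own mask and label.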

In [None]:
!pip install wandb --upgrade


Collecting wandb
  Downloading wandb-0.15.4-py3-none-any.whl (2.1 MB)
Collecting GitPython!=3.1.29,>=1.0.0 (from wandb)
  Downloading GitPython-3.1.31-py3-none-any.whl (184 kB)
Collecting sentry-sdk>=1.0.0 (from wandb)
  Downloading sentry_sdk-1.26.0-py2.py3-none-any.whl (209 kB)
Collecting docker-pycreds>=0.4.0 (from wandb)
  Downloading docker_pycreds-0.4.0-py2.py3-none-any.whl (9.0 kB)

In [None]:
import wandb

wandb.login()

[34m[1mwandb[0m: Currently logged in as: [33mjyunxxxxx[0m. Use [1m`wandb login --relogin`[0m to force relogin


True

In [None]:
config = {
    "learning_rate": 2e-5,
    "architecture": "BERT",
    "epochs": 5,
    "weight_decay": 1e-2,
    "batch_size": 10,
    "seed": 42
}

wandb.init(config=config)

In [None]:
def get_train_validation_dataloader(batch_size, train_inputs, train_masks, train_labels, validation_inputs, validation_masks, validation_labels ):
  train_data = TensorDataset(train_inputs, train_masks, train_labels)
  train_sampler = RandomSampler(train_data)
  train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)

  validation_data = TensorDataset(validation_inputs, validation_masks, validation_labels)
  validation_sampler = SequentialSampler(validation_data)
  validation_dataloader = DataLoader(validation_data, sampler=validation_sampler, batch_size=batch_size)

  return train_dataloader, validation_dataloader


batch_size = wandb.config.batch_size
train_dataloader, validation_dataloader =  get_train_validation_dataloader(batch_size, train_inputs, train_masks, train_labels, validation_inputs, validation_masks, validation_labels )


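### Test Set Preprocessing
The test data is preprocessed the same way as the training data, with one difference: test samples longer than MAX_LEN cannot be dropped, since every test case needs a prediction, so they are truncated to MAX_LEN instead.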
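Because the test set must go through exactly the same text construction as the training set, one option is to keep that construction in one place. A sketch with hypothetical helper names, taking plain sequences (pandas columns such as `test.first_party` are iterable, so they could be passed directly); it reproduces the `party_info_facts` layout and the `[CLS] ... [SEP]` wrapping used in this notebook.

```python
def build_party_info(first_party, second_party, facts):
    """Same text layout as the party_info_facts column: party names first, then the facts."""
    return ('First party is ' + str(first_party) + ' and Second party is '
            + str(second_party) + '. ' + str(facts))


def to_bert_sentences(first_parties, second_parties, facts_list):
    """Wrap every case text as [CLS] ... [SEP], for either the train or the test split."""
    return ["[CLS] " + build_party_info(f, s, x) + " [SEP]"
            for f, s, x in zip(first_parties, second_parties, facts_list)]


sentences = to_bert_sentences(["Salerno"], ["United States"], ["Example facts."])
print(sentences[0])
```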

In [None]:
test.tail()

|   | ID | first_party | second_party | facts |
|---|---|---|---|---|
| 1235 | TEST_1235 | Haitian Centers Council, Inc., et al. | Chris Sale, Acting Commissioner, Immigration A... | According to Executive Order No. 12807 signed ... |
| 1236 | TEST_1236 | Whitman | American Trucking Associations, Inc. | Section 109(a) of the Clean Air Act (CAA) requ... |
| 1237 | TEST_1237 | Linda A. Matteo and John J. Madigan | William G. Barr | Linda Matteo and John Madigan created a plan f... |
| 1238 | TEST_1238 | Washington State Apple Advertising Commission | Hunt | In 1972, the North Carolina Board of Agricultu... |
| 1239 | TEST_1239 | Theodore Stovall | Wilfred Denno, Warden | On August 23, 1961, Dr. Paul Berheldt was stab... |


In [None]:
# Add a party_info_facts column that prepends the first- and second-party names to the case facts
party_info = 'First party is ' + test.first_party + ' and Second party is ' + test.second_party + '. ' + test.facts
test['party_info_facts'] = party_info


# Wrap each text as [CLS] + text + [SEP]
bert_sentences = ["[CLS] " + str(s) + " [SEP]" for s in test.party_info_facts]


# Tokenize with the same WordPiece tokenizer used for training
tokenizer = BertTokenizer.from_pretrained('bert-base-cased', do_lower_case=False)
tokenized_texts_test = [tokenizer.tokenize(sent) for sent in bert_sentences]


print('tokenized_texts_test size : ', len(tokenized_texts_test))



tokenized_texts_test size :  1240


In [None]:
# Convert tokens to IDs, then pad/truncate to MAX_LEN
input_ids = [tokenizer.convert_tokens_to_ids(x) for x in tokenized_texts_test]
input_ids = pad_sequences(input_ids, maxlen=MAX_LEN, dtype="long", truncating="post", padding="post")

# Attention masks: 1.0 for real tokens, 0.0 for padding
attention_masks = []
for seq in input_ids:
    seq_mask = [float(i>0) for i in seq]
    attention_masks.append(seq_mask)

# Convert to PyTorch tensors
test_inputs = torch.tensor(input_ids)
test_masks = torch.tensor(attention_masks)



### Model Training

In [None]:
# Select the device (GPU if available)
if torch.cuda.is_available():
    device = torch.device("cuda")
    print('There are %d GPU(s) available.' % torch.cuda.device_count())
    print('We will use the GPU:', torch.cuda.get_device_name(0))
else:
    device = torch.device("cpu")
    print('No GPU available, using the CPU instead.')


There are 1 GPU(s) available.
We will use the GPU: NVIDIA A100-SXM4-40GB


### Creating the BERT Model

In [None]:
model = BertForSequenceClassification.from_pretrained("bert-base-cased", num_labels=2, output_attentions=False, output_hidden_states=False)  # binary classification
model.cuda()

Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initi

BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(28996, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12,


#### Optimizer and Scheduler Setup

- AdamW
- AdamP
- RAdam


In [None]:
# Optimizers
optimizer_AdamW = AdamW(model.parameters(),
                  lr = wandb.config.learning_rate,  # learning rate
                  eps = 1e-8,
                  weight_decay=wandb.config.weight_decay  # weight decay
                )
optimizer_AdamP = AdamP(model.parameters(),
                  lr = wandb.config.learning_rate,  # learning rate
                  betas=(0.9, 0.999),
                  weight_decay=wandb.config.weight_decay,
                  eps = 1e-8
                )
# RAdam from torch_optimizer (imported as `optim`)
optimizer_RAdam = optim.RAdam(model.parameters(),
                  lr = wandb.config.learning_rate,  # learning rate
                  betas=(0.9, 0.999),
                  weight_decay=wandb.config.weight_decay,
                  eps = 1e-8
                )

epochs = wandb.config.epochs

# Total number of training steps
total_steps = len(train_dataloader) * epochs

# Schedulers: linear learning-rate decay from the initial LR to 0, no warmup
scheduler_AdamW = get_linear_schedule_with_warmup(optimizer_AdamW,
                                            num_warmup_steps = 0,
                                            num_training_steps = total_steps)
scheduler_AdamP = get_linear_schedule_with_warmup(optimizer_AdamP,
                                            num_warmup_steps = 0,
                                            num_training_steps = total_steps)
scheduler_RAdam = get_linear_schedule_with_warmup(optimizer_RAdam,
                                            num_warmup_steps = 0,
                                            num_training_steps = total_steps)
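`get_linear_schedule_with_warmup` multiplies the base learning rate by a factor that rises linearly during warmup and then falls linearly to 0 at `num_training_steps`. The sketch below re-implements that factor in plain Python (a hypothetical helper, not the transformers source) and works out `total_steps` for this notebook's numbers, assuming the 2,428 filtered samples, `test_size=0.2`, batch size 10, and 5 epochs.

```python
import math


def linear_schedule_multiplier(step, num_warmup_steps, num_training_steps):
    """LR multiplier: linear warmup to 1, then linear decay to 0 at num_training_steps."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    return max(0.0, (num_training_steps - step) / max(1, num_training_steps - num_warmup_steps))


n_samples = 2428                           # samples left after dropping > 512 tokens
n_val = math.ceil(0.2 * n_samples)         # train_test_split rounds the test share up
n_train = n_samples - n_val
steps_per_epoch = math.ceil(n_train / 10)  # DataLoader keeps the last partial batch
total_steps = steps_per_epoch * 5          # epochs = 5

base_lr = 2e-5
for step in [0, steps_per_epoch, total_steps, total_steps + 100]:
    print(step, base_lr * linear_schedule_multiplier(step, 0, total_steps))
```

Two consequences follow. Once a scheduler has taken `total_steps` steps its factor stays at 0, so handing an already-used scheduler to another training run leaves the learning rate at 0. And `total_steps` depends on the batch size (98 steps per epoch at batch size 20, 389 at batch size 5), so the scheduler has to be rebuilt whenever the data loaders are.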




### Model Training

### Hyperparameter Tuning

batch_size = 10 gave the best performance.

#### 1. Adjusting batch_size

#### batch_size : 10

- Validation Precision: 0.6684
- Validation Recall: 0.9024
- Validation Specificity: NaN
- Validation F1: 0.7585

#### batch_size : 16
- Validation Accuracy: 0.5927
- Validation Precision: 0.6642
- Validation Recall: 0.7800
- Validation Specificity: 0.2625
- Validation F1: 0.7074

#### batch_size : 5
- Validation Accuracy: 0.5857
- Validation Precision: 0.6466
- Validation Recall: 0.7689
- Validation Specificity: 0.2349
- Validation F1: nan


#### Helper Functions

In [None]:
# Accuracy: fraction of argmax predictions that match the labels
def accuracy_measure(preds, labels):

    pred_flat = np.argmax(preds, axis=1).flatten()
    labels_flat = labels.flatten()

    return np.sum(pred_flat == labels_flat) / len(labels_flat)

# Format elapsed time
def time_elapsed(elapsed):

    # Round to the nearest second
    elapsed_rounded = int(round((elapsed)))

    # Convert to hh:mm:ss
    return str(datetime.timedelta(seconds=elapsed_rounded))

In [None]:
def calc_tp(preds, labels):
  '''Returns True Positives (TP): predicted 1 and actually 1'''
  return sum([p == y and p == 1 for p, y in zip(preds, labels)])

def calc_fp(preds, labels):
  '''Returns False Positives (FP): predicted 1 but actually 0'''
  return sum([p != y and p == 1 for p, y in zip(preds, labels)])

def calc_tn(preds, labels):
  '''Returns True Negatives (TN): predicted 0 and actually 0'''
  return sum([p == y and p == 0 for p, y in zip(preds, labels)])

def calc_fn(preds, labels):
  '''Returns False Negatives (FN): predicted 0 but actually 1'''
  return sum([p != y and p == 0 for p, y in zip(preds, labels)])

def get_metrics(preds, labels):
  '''
  Returns the following metrics ('nan' when the denominator is 0):
    - accuracy    = (TP + TN) / N
    - precision   = TP / (TP + FP)
    - recall      = TP / (TP + FN)
    - specificity = TN / (TN + FP)
    - f1          = 2TP / (2TP + FP + FN)
  '''
  preds = np.argmax(preds, axis = 1).flatten()
  labels = labels.flatten()
  tp = calc_tp(preds, labels)
  tn = calc_tn(preds, labels)
  fp = calc_fp(preds, labels)
  fn = calc_fn(preds, labels)
  b_accuracy = (tp + tn) / len(labels)
  b_precision = tp / (tp + fp) if (tp + fp) > 0 else 'nan'
  b_recall = tp / (tp + fn) if (tp + fn) > 0 else 'nan'
  b_specificity = tn / (tn + fp) if (tn + fp) > 0 else 'nan'
  # Equals 2PR / (P + R) whenever precision and recall are both defined, but avoids
  # the 0/0 (and the resulting nan) when they are both 0
  b_f1 = 2*tp / (2*tp + fp + fn) if (2*tp + fp + fn) > 0 else 'nan'

  return b_accuracy, b_precision, b_recall, b_specificity, b_f1
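`get_metrics` is applied per batch and the per-batch values are then averaged, which is not the same as the metrics over the whole validation set: a batch with no predicted positives drops out of the precision average entirely, and the unweighted mean gives the last, smaller batch as much weight as a full one. A sketch that computes each metric once from all predictions, with explicit zero-division guards; the helper names are hypothetical and 0/1 class IDs are assumed.

```python
def confusion_counts(preds, labels):
    """Count TP, FP, TN, FN for binary 0/1 predictions and labels."""
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    return tp, fp, tn, fn


def safe_div(num, den):
    """Return num / den, or nan when the denominator is 0."""
    return num / den if den > 0 else float("nan")


def binary_metrics(preds, labels):
    """Metrics computed once over the whole set instead of averaged per batch."""
    tp, fp, tn, fn = confusion_counts(preds, labels)
    return {
        "accuracy": safe_div(tp + tn, len(labels)),
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),
        "specificity": safe_div(tn, tn + fp),
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),
    }


# In the notebook, preds would be np.argmax(logits, axis=1) collected over all validation batches
m = binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(m)
```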

In [None]:
def model_train(model_case, optimizer, scheduler, train_dataloader, validation_dataloader):
  # Note: this trains the global `model` in place, so consecutive calls continue from
  # the weights left by the previous call unless the model is re-created in between.

  # Fix random seeds for reproducibility
  seed_val = wandb.config.seed
  random.seed(seed_val)
  np.random.seed(seed_val)
  torch.manual_seed(seed_val)
  torch.cuda.manual_seed_all(seed_val)

  # Mean of per-batch metric values; nan when no batch produced a value
  def _mean(values):
    return sum(values) / len(values) if len(values) > 0 else float('nan')

  # Reset gradients
  model.zero_grad()

  # Training
  for epoch_i in range(0, epochs):

      print("")
      print('======== Train Epoch {:} / {:} ========'.format(epoch_i + 1, epochs))
      print('Training...')

      # Record the start time
      t0 = time.time()

      total_loss = 0

      # Switch to training mode (enables dropout)
      model.train()

      # Iterate over the data loader one batch at a time
      for step, batch in enumerate(train_dataloader):
          # Report progress every 500 batches
          if step % 500 == 0 and not step == 0:
              elapsed = time_elapsed(time.time() - t0)
              print('  Batch {:>5,}  of  {:>5,}.    Elapsed: {:}.'.format(step, len(train_dataloader), elapsed))

          # Move the batch to the GPU
          batch = tuple(t.to(device) for t in batch)

          # Unpack the batch
          b_input_ids, b_input_mask, b_labels = batch

          # Forward pass
          outputs = model(b_input_ids,
                          token_type_ids=None,
                          attention_mask=b_input_mask,
                          labels=b_labels)

          # With labels given, the first output is the loss
          loss = outputs[0]

          # Accumulate the loss
          total_loss += loss.item()

          # Backward pass to compute gradients
          loss.backward()

          # Clip gradients to a maximum norm of 1.0
          torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)

          # Update the weights using the gradients
          optimizer.step()

          # Decay the learning rate according to the scheduler
          scheduler.step()

          # Reset gradients
          model.zero_grad()

      # Average training loss over the epoch
      avg_train_loss = total_loss / len(train_dataloader)

      print("")
      print("  Average training loss: {0:.2f}".format(avg_train_loss))
      print("  Training epoch took: {:}".format(time_elapsed(time.time() - t0)))

      print()
      print("Validation...")

      # Record the start time
      t0 = time.time()

      # Switch to evaluation mode (disables dropout)
      model.eval()

      # Initialize counters
      eval_loss, eval_accuracy = 0, 0
      nb_eval_steps, nb_eval_examples = 0, 0

      # Per-batch metric trackers
      val_accuracy = []
      val_precision = []
      val_recall = []
      val_specificity = []
      val_f1 = []

      # Iterate over the validation loader one batch at a time
      for batch in validation_dataloader:
          # Move the batch to the GPU
          batch = tuple(t.to(device) for t in batch)

          # Unpack the batch
          b_input_ids, b_input_mask, b_labels = batch
          # Skip gradient computation
          with torch.no_grad():
              # Forward pass
              outputs = model(b_input_ids,
                              token_type_ids=None,
                              attention_mask=b_input_mask)

          # Without labels, the first output is the logits
          logits = outputs[0]

          # Move data to the CPU
          logits = logits.detach().cpu().numpy()
          label_ids = b_labels.to('cpu').numpy()

          # Compare the predicted classes with the labels to get batch accuracy
          tmp_eval_accuracy = accuracy_measure(logits, label_ids)
          eval_accuracy += tmp_eval_accuracy
          nb_eval_steps += 1

          b_accuracy, b_precision, b_recall, b_specificity, b_f1 = get_metrics(logits, label_ids)
          val_accuracy.append(b_accuracy)
          # Update precision only when (tp + fp) != 0; ignore nan
          if b_precision != 'nan': val_precision.append(b_precision)
          # Update recall only when (tp + fn) != 0; ignore nan
          if b_recall != 'nan': val_recall.append(b_recall)
          # Update specificity only when (tn + fp) != 0; ignore nan
          if b_specificity != 'nan': val_specificity.append(b_specificity)
          # Update F1 only when it is defined; ignore nan
          if b_f1 != 'nan': val_f1.append(b_f1)

      print("  Accuracy: {0:.2f}".format(eval_accuracy/nb_eval_steps))
      print("  Validation took: {:}".format(time_elapsed(time.time() - t0)))

      print('\t - Validation Accuracy: {:.4f}'.format(_mean(val_accuracy)))
      print('\t - Validation Precision: {:.4f}'.format(_mean(val_precision)) if len(val_precision) > 0 else '\t - Validation Precision: NaN')
      print('\t - Validation Recall: {:.4f}'.format(_mean(val_recall)) if len(val_recall) > 0 else '\t - Validation Recall: NaN')
      print('\t - Validation Specificity: {:.4f}'.format(_mean(val_specificity)) if len(val_specificity) > 0 else '\t - Validation Specificity: NaN')
      print('\t - Validation F1: {:.4f}\n'.format(_mean(val_f1)) if len(val_f1) > 0 else '\t - Validation F1: NaN')

  print()
  print("======== COMPLETE ========")

  # Record the last epoch's batch-averaged validation metrics (not just the last batch's values)
  add_result(model_case, _mean(val_accuracy), _mean(val_precision), _mean(val_recall), _mean(val_f1), batch_size, epochs)

In [None]:
!pip install accelerate -U



In [None]:
result_df = pd.DataFrame({'case' : [],
              'Accuracy' : [],
              'Precision' : [],
              'Recall' : [],
              'F1' : [],
              'batch_size' : [],
              'epochs' : []
              })

In [None]:
def add_result(model_type, accuracy, precision, recall, f1, batch_size, epochs):
  result_df.loc[len(result_df)] = [model_type, accuracy, precision, recall, f1, batch_size, epochs]

#### Model Training
Optimizer/scheduler pairs to compare:
- optimizer_AdamW, scheduler_AdamW
- optimizer_AdamP, scheduler_AdamP
- optimizer_RAdam, scheduler_RAdam

In [None]:
model_train('optimizer_AdamW', optimizer_AdamW, scheduler_AdamW,  train_dataloader, validation_dataloader)
model_train('optimizer_AdamP', optimizer_AdamP, scheduler_AdamP,  train_dataloader, validation_dataloader)
model_train('optimizer_RAdam', optimizer_RAdam, scheduler_RAdam,  train_dataloader, validation_dataloader)



Training...

  Average training loss: 0.10
  Training epcoh took: 0:00:46

Validation...
  Accuracy: 0.65
  Validation took: 0:00:03
	 - Validation Accuracy: 0.6456
	 - Validation Precision: 0.6684
	 - Validation Recall: 0.9024
	 - Validation Specificity: NaN
	 - Validation F1: 0.7585


Training...

  Average training loss: 0.27
  Training epcoh took: 0:00:46

Validation...
  Accuracy: 0.65
  Validation took: 0:00:03
	 - Validation Accuracy: 0.6456
	 - Validation Precision: 0.6684
	 - Validation Recall: 0.9024
	 - Validation Specificity: NaN
	 - Validation F1: 0.7585


Training...

  Average training loss: 0.40
  Training epcoh took: 0:00:46

Validation...
  Accuracy: 0.65
  Validation took: 0:00:03
	 - Validation Accuracy: 0.6456
	 - Validation Precision: 0.6684
	 - Validation Recall: 0.9024
	 - Validation Specificity: NaN
	 - Validation F1: 0.7585


Training...

  Average training loss: 0.41
  Training epcoh took: 0:00:46

Validation...
  Accuracy: 0.65
  Validation took: 0:00:03
	 

  b_f1 = 2*((b_precision*b_recall)/(b_precision+b_recall))


  Accuracy: 0.52
  Validation took: 0:00:03
	 - Validation Accuracy: 0.5245
	 - Validation Precision: 0.6758
	 - Validation Recall: 0.5387
	 - Validation Specificity: NaN
	 - Validation F1: nan


Training...

  Average training loss: 0.03
  Training epcoh took: 0:00:46

Validation...
  Accuracy: 0.62
  Validation took: 0:00:03
	 - Validation Accuracy: 0.6177
	 - Validation Precision: 0.6571
	 - Validation Recall: 0.8661
	 - Validation Specificity: NaN
	 - Validation F1: 0.7355


Training...

  Average training loss: 0.07
  Training epcoh took: 0:00:46

Validation...
  Accuracy: 0.64
  Validation took: 0:00:03
	 - Validation Accuracy: 0.6422
	 - Validation Precision: 0.6635
	 - Validation Recall: 0.9184
	 - Validation Specificity: NaN
	 - Validation F1: 0.7600


Training...

  Average training loss: 0.04
  Training epcoh took: 0:00:46

Validation...
  Accuracy: 0.63
  Validation took: 0:00:03
	 - Validation Accuracy: 0.6320
	 - Validation Precision: 0.6735
	 - Validation Recall: 0.8592
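Several runs report `Validation F1: nan` (and `Specificity: NaN`). The `b_f1 = 2*((b_precision*b_recall)/(b_precision+b_recall))` line echoed in the warning suggests that F1 is computed per batch. If so, a batch where precision and recall are both 0 produces 0/0, and that NaN then propagates through the average. The sketch below takes a different approach: it accumulates predictions over the whole validation set and guards every division. It assumes 0/1 labels with 1 (first party wins) as the positive class. Reporting undefined ratios as 0.0 is a convention chosen here, not something the notebook does.

```python
import numpy as np


def binary_metrics(y_true, y_pred):
    """Dataset-level metrics for 0/1 labels; undefined ratios are reported as 0.0."""
    y_true = np.asarray(y_true).astype(int)
    y_pred = np.asarray(y_pred).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))

    def safe_div(num, den):
        # Avoid 0/0 (which yields NaN) when a class is absent or never predicted
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, len(y_true)),
        "precision": precision,
        "recall": recall,
        "specificity": safe_div(tn, tn + fp),
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Usage: collect argmax predictions and labels from every validation batch first, e.g.
#   all_preds.extend(np.argmax(logits, axis=1)); all_labels.extend(label_ids)
# then call binary_metrics(all_labels, all_preds) once per epoch.
print(binary_metrics([1, 0, 1, 1], [1, 0, 0, 1]))
```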


### Performance evaluation



In [None]:
batch_size = 20 #10 -> 20
train_dataloader, validation_dataloader =  get_train_validation_dataloader(batch_size, train_inputs, train_masks, train_labels, validation_inputs, validation_masks, validation_labels )
model_train('optimizer_AdamW', optimizer_AdamW, scheduler_AdamW,  train_dataloader, validation_dataloader)


In [None]:
batch_size = 5 #10 -> 20 -> 5
train_dataloader, validation_dataloader =  get_train_validation_dataloader(batch_size, train_inputs, train_masks, train_labels, validation_inputs, validation_masks, validation_labels )
model_train('optimizer_AdamW', optimizer_AdamW, scheduler_AdamW,  train_dataloader, validation_dataloader)



Training...

  Average training loss: 0.00
  Training epcoh took: 0:00:51

Validation...


  b_f1 = 2*((b_precision*b_recall)/(b_precision+b_recall))


  Accuracy: 0.59
  Validation took: 0:00:04
	 - Validation Accuracy: 0.5857
	 - Validation Precision: 0.6466
	 - Validation Recall: 0.7689
	 - Validation Specificity: 0.2349
	 - Validation F1: nan


Training...

  Average training loss: 0.03
  Training epcoh took: 0:00:51

Validation...
  Accuracy: 0.59
  Validation took: 0:00:04
	 - Validation Accuracy: 0.5857
	 - Validation Precision: 0.6466
	 - Validation Recall: 0.7689
	 - Validation Specificity: 0.2349
	 - Validation F1: nan


Training...

  Average training loss: 0.01
  Training epcoh took: 0:00:51

Validation...
  Accuracy: 0.59
  Validation took: 0:00:04
	 - Validation Accuracy: 0.5857
	 - Validation Precision: 0.6466
	 - Validation Recall: 0.7689
	 - Validation Specificity: 0.2349
	 - Validation F1: nan


Training...

  Average training loss: 0.03
  Training epcoh took: 0:00:51

Validation...
  Accuracy: 0.59
  Validation took: 0:00:04
	 - Validation Accuracy: 0.5857
	 - Validation Precision: 0.6466
	 - Validation Recall: 0.76
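Changing `batch_size` also changes how many optimizer (and scheduler) steps run per epoch, because `len(train_dataloader)` is the number of batches. For a PyTorch `DataLoader` with the default `drop_last=False`, that count is ceil(N / batch_size). The sketch below computes the total; the training-set size is a hypothetical placeholder, not a value taken from the notebook.

```python
import math


def num_training_steps(num_samples: int, batch_size: int, epochs: int, drop_last: bool = False) -> int:
    """Total optimizer steps = batches per epoch * epochs, mirroring len(dataloader) * epochs."""
    if drop_last:
        batches_per_epoch = num_samples // batch_size
    else:
        batches_per_epoch = math.ceil(num_samples / batch_size)
    return batches_per_epoch * epochs


n_train = 2000  # hypothetical training-set size
for bs in (5, 10, 20):
    print(bs, num_training_steps(n_train, bs, epochs=2))
```

In the batch-size cells above, `scheduler_AdamW` appears to be reused without being rebuilt, so its `num_training_steps` still reflects the batch size it was originally created with. A schedule sized for fewer steps reaches a learning rate of zero partway through training.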

#### 2. Adjusting the learning rate

In [None]:
optimizer_AdamW = AdamW(model.parameters(),
                  lr = 1e-5, # learning rate
                  eps = 1e-8,
                  weight_decay=wandb.config.weight_decay  # weight decay (decoupled from the gradient in AdamW)
                )
# Number of epochs
epochs = 5

# Build the DataLoaders first so that total_steps matches the batch size actually used
train_dataloader, validation_dataloader =  get_train_validation_dataloader(wandb.config.batch_size , train_inputs, train_masks, train_labels, validation_inputs, validation_masks, validation_labels )

# Total training steps
total_steps = len(train_dataloader) * epochs

scheduler_AdamW = get_linear_schedule_with_warmup(optimizer_AdamW,
                                            num_warmup_steps = 0,
                                            num_training_steps = total_steps)

model_train('optimizer_AdamW', optimizer_AdamW, scheduler_AdamW,  train_dataloader, validation_dataloader)



Training...

  Average training loss: 0.01
  Training epcoh took: 0:00:46

Validation...
  Accuracy: 0.55
  Validation took: 0:00:03
	 - Validation Accuracy: 0.5463
	 - Validation Precision: 0.6522
	 - Validation Recall: 0.6555
	 - Validation Specificity: 0.3584
	 - Validation F1: 0.6354


Training...

  Average training loss: 0.03
  Training epcoh took: 0:00:46

Validation...
  Accuracy: 0.60
  Validation took: 0:00:03
	 - Validation Accuracy: 0.5986
	 - Validation Precision: 0.6532
	 - Validation Recall: 0.8106
	 - Validation Specificity: 0.1849
	 - Validation F1: 0.7118


Training...

  Average training loss: 0.03
  Training epcoh took: 0:00:46

Validation...
  Accuracy: 0.59
  Validation took: 0:00:03
	 - Validation Accuracy: 0.5918
	 - Validation Precision: 0.6524
	 - Validation Recall: 0.8171
	 - Validation Specificity: 0.1716
	 - Validation F1: 0.7127


Training...

  Average training loss: 0.02
  Training epcoh took: 0:00:46

Validation...
  Accuracy: 0.60
  Validation took: 0
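The optimizer comments originally described `weight_decay` as L2 regularization. In AdamW, however, the decay is *decoupled* from the gradient. L2 regularization adds $\lambda\theta$ to the gradient, which then gets rescaled by the adaptive denominator. AdamW instead shrinks the weights directly. With learning rate $\gamma$, gradient $g_t = \nabla_\theta L(\theta_{t-1})$ and decay coefficient $\lambda$, one step looks roughly as follows. This follows the form documented for `torch.optim.AdamW`; the `transformers` implementation differs in details such as where $\epsilon$ enters.

$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2 \\
\hat m_t &= \frac{m_t}{1-\beta_1^t}, \qquad \hat v_t = \frac{v_t}{1-\beta_2^t} \\
\theta_t &= \theta_{t-1} - \gamma\left(\frac{\hat m_t}{\sqrt{\hat v_t}+\epsilon} + \lambda\, \theta_{t-1}\right)
\end{aligned}
$$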

#### 3. Increasing the number of epochs

In [None]:
optimizer_AdamW = AdamW(model.parameters(),
                  lr = wandb.config.learning_rate, # learning rate
                  eps = 1e-8,
                  weight_decay=wandb.config.weight_decay  # weight decay (decoupled from the gradient in AdamW)
                )
# Number of epochs
epochs = 10

# Build the DataLoaders first so that total_steps matches the batch size actually used
train_dataloader, validation_dataloader =  get_train_validation_dataloader(wandb.config.batch_size , train_inputs, train_masks, train_labels, validation_inputs, validation_masks, validation_labels )

# Total training steps
total_steps = len(train_dataloader) * epochs

scheduler_AdamW = get_linear_schedule_with_warmup(optimizer_AdamW,
                                            num_warmup_steps = 0,
                                            num_training_steps = total_steps)

model_train('optimizer_AdamW', optimizer_AdamW, scheduler_AdamW,  train_dataloader, validation_dataloader)




Training...

  Average training loss: 0.02
  Training epcoh took: 0:00:46

Validation...
  Accuracy: 0.57
  Validation took: 0:00:03
	 - Validation Accuracy: 0.5714
	 - Validation Precision: 0.6613
	 - Validation Recall: 0.7208
	 - Validation Specificity: 0.3000
	 - Validation F1: 0.6708


Training...

  Average training loss: 0.13
  Training epcoh took: 0:00:46

Validation...
  Accuracy: 0.63
  Validation took: 0:00:03
	 - Validation Accuracy: 0.6299
	 - Validation Precision: 0.6654
	 - Validation Recall: 0.8939
	 - Validation Specificity: 0.1492
	 - Validation F1: 0.7492


Training...

  Average training loss: 0.03
  Training epcoh took: 0:00:46

Validation...
  Accuracy: 0.62
  Validation took: 0:00:03
	 - Validation Accuracy: 0.6211
	 - Validation Precision: 0.6500
	 - Validation Recall: 0.8915
	 - Validation Specificity: 0.1070
	 - Validation F1: 0.7431


Training...

  Average training loss: 0.02
  Training epcoh took: 0:00:46

Validation...
  Accuracy: 0.58
  Validation took: 0
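In the `epochs = 10` run, validation F1 peaks after the second epoch (0.6708, 0.7492, 0.7431 for epochs 1 to 3), and accuracy falls again by epoch 4. Meanwhile, the training loss is already near zero, which looks like overfitting. The sketch below picks the epoch with the best validation score from a recorded history. It assumes the training loop appends one metrics dict per epoch to a `history` list, which `model_train` does not currently do; the name `best_epoch` is hypothetical.

```python
def best_epoch(history, metric="f1"):
    """Return (1-based epoch, score) of the epoch with the highest validation metric."""
    if not history:
        raise ValueError("history is empty")
    # max() returns the first index on ties, i.e. the earliest best epoch
    best_idx = max(range(len(history)), key=lambda i: history[i][metric])
    return best_idx + 1, history[best_idx][metric]


# Validation F1 for epochs 1-3 of the epochs = 10 run above
history = [{"f1": 0.6708}, {"f1": 0.7492}, {"f1": 0.7431}]
print(best_epoch(history))  # (2, 0.7492)
```

A common extension is to save the weights (for example with `model.save_pretrained`) whenever the score improves, so that the best checkpoint can be restored instead of retrained.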

In [None]:
### Model selection
batch_size = 10
epochs = 2

In [None]:
model = BertForSequenceClassification.from_pretrained("bert-base-cased",
                                                      num_labels=2,  # binary classification
                                                      output_attentions=False,
                                                      output_hidden_states=False)
model.cuda()

# Optimizers
optimizer_AdamW = AdamW(model.parameters(),
                  lr = wandb.config.learning_rate, # learning rate
                  eps = 1e-8,
                  weight_decay=wandb.config.weight_decay  # weight decay (decoupled from the gradient in AdamW)
                )
optimizer_AdamP = AdamP(model.parameters(),
                  lr = wandb.config.learning_rate, # learning rate
                  betas=(0.9, 0.999),
                  weight_decay=wandb.config.weight_decay,
                  eps = 1e-8
                )

optimizer_RAdam = optim.RAdam(model.parameters(),
                  lr = wandb.config.learning_rate, # learning rate
                  betas=(0.9, 0.999),
                  weight_decay=wandb.config.weight_decay,
                  eps = 1e-8,
                )

epochs =  2

# Build the DataLoaders first so that total_steps matches the batch size actually used
train_dataloader, validation_dataloader =  get_train_validation_dataloader(wandb.config.batch_size , train_inputs, train_masks, train_labels, validation_inputs, validation_masks, validation_labels )

# Total training steps
total_steps = len(train_dataloader) * epochs

# Create schedulers: linear learning-rate decay
scheduler_AdamW = get_linear_schedule_with_warmup(optimizer_AdamW,
                                            num_warmup_steps = 0,
                                            num_training_steps = total_steps)
scheduler_AdamP = get_linear_schedule_with_warmup(optimizer_AdamP,
                                            num_warmup_steps = 0,
                                            num_training_steps = total_steps)
scheduler_RAdam = get_linear_schedule_with_warmup(optimizer_RAdam,
                                            num_warmup_steps = 0,
                                            num_training_steps = total_steps)

model_train('optimizer_AdamW', optimizer_AdamW, scheduler_AdamW,  train_dataloader, validation_dataloader)


Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initi


Training...

  Average training loss: 0.64
  Training epcoh took: 0:00:46

Validation...
  Accuracy: 0.65
  Validation took: 0:00:03
	 - Validation Accuracy: 0.6517
	 - Validation Precision: 0.6517
	 - Validation Recall: 1.0000
	 - Validation Specificity: 0.0000
	 - Validation F1: 0.7808


Training...

  Average training loss: 0.64
  Training epcoh took: 0:00:45

Validation...
  Accuracy: 0.65
  Validation took: 0:00:03
	 - Validation Accuracy: 0.6517
	 - Validation Precision: 0.6517
	 - Validation Recall: 1.0000
	 - Validation Specificity: 0.0000
	 - Validation F1: 0.7808
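After retraining from the `bert-base-cased` checkpoint, both epochs report Recall 1.0000 and Specificity 0.0000. In other words, the model predicts class 1 for every validation case. Precision equals Accuracy (0.6517), which is (approximately) just the share of first-party wins in the validation split. Any candidate model should therefore beat this majority-class baseline. The sketch below computes the baseline for 0/1 labels; `majority_baseline` is a hypothetical helper, not part of the notebook.

```python
import numpy as np


def majority_baseline(y_true):
    """Metrics of a classifier that always predicts the most frequent class (0/1 labels)."""
    y_true = np.asarray(y_true).astype(int)
    majority = int(np.mean(y_true) >= 0.5)
    accuracy = float(np.mean(y_true == majority))
    if majority == 1:
        # Always predicting 1: precision equals the positive rate, recall is 1
        precision, recall = accuracy, 1.0
        f1 = 2 * precision * recall / (precision + recall)
    else:
        # Always predicting 0: no positive predictions, so precision/recall/F1 are 0 by convention
        precision = recall = f1 = 0.0
    return {"majority_class": majority, "accuracy": accuracy,
            "precision": precision, "recall": recall, "f1": f1}


print(majority_baseline([1, 1, 0, 1]))
```

The `sub5` submission below is this baseline applied to the test set: `value_counts()` shows all 1240 predictions are class 1.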




In [None]:
# Run inference on the test inputs
def set_eval(test_input, test_masks):

    # Switch to evaluation mode (disables dropout)
    model.eval()

    b_input_ids = test_input.to(device)
    b_input_mask = test_masks.to(device)

    with torch.no_grad():
        # Forward pass
        outputs = model(b_input_ids,
                        token_type_ids=None,
                        attention_mask=b_input_mask)

    # Move the logits to the CPU as a NumPy array
    logits = outputs[0].detach().cpu().numpy()

    return logits

In [None]:
# Run inference in batches to avoid out-of-memory errors

batch_size = 20

# Evaluation mode: disable dropout so that predictions are deterministic
model.eval()

# Split the input data into batches
input_batches = [test_inputs[i:i+batch_size] for i in range(0, len(test_inputs), batch_size)]
mask_batches = [test_masks[i:i+batch_size] for i in range(0, len(test_masks), batch_size)]

logits_list = []

# Run the forward pass on each batch in turn
for input_batch, mask_batch in zip(input_batches, mask_batches):
    b_input_ids = input_batch.to(device)
    b_input_mask = mask_batch.to(device)

    with torch.no_grad():
        outputs = model(b_input_ids, token_type_ids=None, attention_mask=b_input_mask)

    logits = outputs[0].detach().cpu().numpy()
    logits_list.append(logits)

# Concatenate the logits from all batches
logits = np.concatenate(logits_list, axis=0)
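The loop above follows a generic chunked-inference pattern: run the model on fixed-size slices so that only one slice sits on the GPU at a time, then concatenate the outputs in their original order. Below is a framework-agnostic NumPy sketch. `predict_fn` is a hypothetical stand-in for the model's forward pass, and `toy_predict` is a made-up function used only to check that batching does not change the result.

```python
import numpy as np


def predict_in_batches(predict_fn, inputs, masks, batch_size=20):
    """Apply predict_fn to consecutive slices and stack the results in the original order."""
    outputs = []
    for start in range(0, len(inputs), batch_size):
        end = start + batch_size
        outputs.append(predict_fn(inputs[start:end], masks[start:end]))
    return np.concatenate(outputs, axis=0)


# Hypothetical "model": two logits per row, derived from the masked token sum
def toy_predict(ids, mask):
    s = (ids * mask).sum(axis=1, keepdims=True).astype(float)
    return np.hstack([-s, s])


ids = np.arange(50 * 4).reshape(50, 4)
mask = np.ones_like(ids)
batched = predict_in_batches(toy_predict, ids, mask, batch_size=20)
assert np.array_equal(batched, toy_predict(ids, mask))  # same result as one big batch
print(batched.shape)  # (50, 2)
```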

In [None]:
predict_list = [ np.argmax(logit) for logit in logits]
print(len(predict_list))
predict_list[-20:]

1240


[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

In [None]:
test.ID.values

array(['TEST_0000', 'TEST_0001', 'TEST_0002', ..., 'TEST_1237',
       'TEST_1238', 'TEST_1239'], dtype=object)

In [None]:
sub5 = pd.DataFrame({
    'ID' : test.ID.values,
    'first_party_winner' : predict_list
})

sub5

Unnamed: 0,ID,first_party_winner
0,TEST_0000,1
1,TEST_0001,1
2,TEST_0002,1
3,TEST_0003,1
4,TEST_0004,1
...,...,...
1235,TEST_1235,1
1236,TEST_1236,1
1237,TEST_1237,1
1238,TEST_1238,1


In [None]:
sub5['first_party_winner'].value_counts()

1    1240
Name: first_party_winner, dtype: int64

In [None]:
sub5.to_csv('sub5.csv', index = False)
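The submission must be a CSV file with the same `ID` and `first_party_winner` columns as `sample_submission.csv`. The stdlib-only sketch below writes the predictions and then re-reads the file as a sanity check before upload. `write_submission` and its single-class warning are illustrative assumptions, not part of the competition tooling.

```python
import csv


def write_submission(path, ids, predictions):
    """Write ID/first_party_winner rows, re-read the file, and return per-label counts."""
    if len(ids) != len(predictions):
        raise ValueError(f"{len(ids)} IDs but {len(predictions)} predictions")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["ID", "first_party_winner"])
        for id_, pred in zip(ids, predictions):
            writer.writerow([id_, int(pred)])  # int() also handles NumPy integer types

    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    if rows[0] != ["ID", "first_party_winner"] or len(rows) - 1 != len(ids):
        raise RuntimeError("submission file does not match the expected format")

    counts = {}
    for _, label in rows[1:]:
        counts[label] = counts.get(label, 0) + 1
    if len(counts) == 1:
        # Every prediction is the same class, as happened with sub5
        print("warning: all predictions are class", next(iter(counts)))
    return counts


# e.g. write_submission("sub5.csv", test.ID.values, predict_list)
```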