IBDB 영화평 데이터 > 감성분류를 위한 트랜스포머 아키택쳐 모델 구축

1. 정수 토큰 시퀀스(길이 80)입력

2. 토큰 임베딩 + 위치 임베딩

3. 멀티 헤드 어텐션 (3head)

4. concate + 정규화

5. FFN (Dense + Dense)

6. concate + 정규화

7. 분류기 (Dense)

# 1. 정수 토큰 시퀀스(길이 80)입력

In [1]:
import tensorflow as tf
from tensorflow.keras import Model, layers

In [4]:
inputs = layers.Input(shape=(80, ))
input_embedding = layers.Embedding(input_dim = 1000, output_dim = 32)(inputs)


I0000 00:00:1757040717.474825   21728 gpu_device.cc:2020] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 1916 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 4060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.9


# 2. 토큰 임베딩 + 위치 임베딩

In [5]:
# 위치 임베딩
positions = tf.range(start = 0, limit = 80)
pos_embedding = layers.Embedding(input_dim = 80, output_dim = 32)(positions)
pos_enc_output = pos_embedding + input_embedding

합치면, 모델은 **“어떤 단어가 어디에 있는지”**를 학습 가능

# 3. 멀티 헤드 어텐션 (3head)

In [9]:
attention_output = layers.MultiHeadAttention(num_heads = 3, key_dim = 32)(pos_enc_output,pos_enc_output)

어텐션을 여러 번 동시에 수행해서, 모델이 입력 시퀀스의 다양한 관계와 패턴을 학습할 수 있도록한다.
약간 컨볼루샨의 필터같은 느낌.

문장: “The cat sat on the mat”

Head 1: “cat ↔ sat” 관계 집중 → 주어-동사 관계 파악

Head 2: “cat ↔ mat” 관계 집중 → 장소 관계 파악

Head 3: “sat ↔ mat” 관계 집중 → 동작-장소 관계 파악

→ 여러 헤드가 합쳐져 문장의 다양한 의미와 관계를 풍부하게 표현

# 4. concate + 정규화

In [14]:
x = layers.add([pos_enc_output, attention_output])
x = layers.BatchNormalization()(x)

단어+위치 임베딩 합치면 값이 들쭉날쭉 → 모델이 어느 정보에 집중할지 헷갈림

정규화하면 값 범위를 일정하게 맞춰서, 멀티헤드 어텐션이 각 단어 관계를 정확하게 학습 가능


# 5. FFN (Dense + Dense)

# 6. concate + 정규화

In [17]:
from tensorflow.keras.models import Sequential
ffnn = Sequential([
    layers.Dense(64, activation = 'relu'),
    layers.Dense(32, activation ='relu')
    ])(x)
x = layers.add([ffnn, x])
x = layers.BatchNormalization()(x)

어텐션/FFN 후 정규화 모두 같은 목적: 값 분포를 일정하게 유지해서 학습 안정화

# 7. 분류기 (Dense)

In [20]:
x = layers.GlobalAveragePooling1D()(x)
x = layers.Dropout(0.1)(x)
x = layers.Dense(64, activation = 'relu')(x)
x = layers.Dropout(0.1)(x)
outputs = layers.Dense(2, activation ='softmax')(x)

In [22]:
# 모델구성
model = Model(inputs = inputs, outputs= outputs)

In [23]:
model.compile(loss ='sparse_categorical_crossentropy'
              , optimizer = 'adam'
              , metrics = ['accuracy'])

In [None]:
# 데이터가지고오기
from tensorflow.keras.datasets import imdb
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words = 10000)

In [25]:
# 텍스트 데이터의 전처리
from tensorflow.keras.preprocessing.sequence import pad_sequences
# 영화평안의 사이즈를 맞추기 위해서 패딩을 사용
X_train_pad = pad_sequences(X_train, maxlen = 80, padding = 'post', truncating = 'post')
X_test_pad = pad_sequences(X_test, maxlen = 80, padding = 'post', truncating = 'post') 

In [26]:
model.fit(X_train_pad, y_train, epochs = 10, batch_size = 200)

Epoch 1/10


2025-09-05 12:26:05.847238: I external/local_xla/xla/service/service.cc:163] XLA service 0x75a72c009170 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2025-09-05 12:26:05.847268: I external/local_xla/xla/service/service.cc:171]   StreamExecutor device (0): NVIDIA GeForce RTX 4060 Laptop GPU, Compute Capability 8.9
2025-09-05 12:26:05.910896: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2025-09-05 12:26:06.229771: I external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:473] Loaded cuDNN version 91200
2025-09-05 12:26:06.414132: I external/local_xla/xla/service/gpu/autotuning/dot_search_space.cc:208] All configs were filtered out because none of them sufficiently match the hints. Maybe the hints set does not contain a good representative set of valid configs? Working around this by using the full hints set instead.
2025-09-05 12:26:06.

[1m  7/125[0m [32m━[0m[37m━━━━━━━━━━━━━━━━━━━[0m [1m2s[0m 23ms/step - accuracy: 0.4988 - loss: 0.6945

I0000 00:00:1757042773.867101   22960 device_compiler.h:196] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.


[1m125/125[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 6ms/step - accuracy: 0.7210 - loss: 0.5347
Epoch 2/10
[1m125/125[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 4ms/step - accuracy: 0.7852 - loss: 0.4522
Epoch 3/10
[1m125/125[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 4ms/step - accuracy: 0.7912 - loss: 0.4409
Epoch 4/10
[1m125/125[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 4ms/step - accuracy: 0.7954 - loss: 0.4298
Epoch 5/10
[1m125/125[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 4ms/step - accuracy: 0.8007 - loss: 0.4209
Epoch 6/10
[1m125/125[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 4ms/step - accuracy: 0.8076 - loss: 0.4110
Epoch 7/10
[1m125/125[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 4ms/step - accuracy: 0.8136 - loss: 0.3974
Epoch 8/10
[1m125/125[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 4ms/step - accuracy: 0.8200 - loss: 0.3876
Epoch 9/10
[1m125/125[0m [32m━━━━━━━━━━━━━━━━━━

<keras.src.callbacks.history.History at 0x75a80f70b820>

In [27]:
# 모델 평가하기
model.evaluate(X_test_pad, y_test)

2025-09-05 12:27:40.001841: I external/local_xla/xla/service/gpu/autotuning/dot_search_space.cc:208] All configs were filtered out because none of them sufficiently match the hints. Maybe the hints set does not contain a good representative set of valid configs? Working around this by using the full hints set instead.
2025-09-05 12:27:40.001893: I external/local_xla/xla/service/gpu/autotuning/dot_search_space.cc:208] All configs were filtered out because none of them sufficiently match the hints. Maybe the hints set does not contain a good representative set of valid configs? Working around this by using the full hints set instead.
2025-09-05 12:27:40.001939: I external/local_xla/xla/service/gpu/autotuning/dot_search_space.cc:208] All configs were filtered out because none of them sufficiently match the hints. Maybe the hints set does not contain a good representative set of valid configs? Working around this by using the full hints set instead.








[1m767/782[0m [32m━━━━━━━━━━━━━━━━━━━[0m[37m━[0m [1m0s[0m 2ms/step - accuracy: 0.7701 - loss: 0.5301

2025-09-05 12:27:43.711206: I external/local_xla/xla/service/gpu/autotuning/dot_search_space.cc:208] All configs were filtered out because none of them sufficiently match the hints. Maybe the hints set does not contain a good representative set of valid configs? Working around this by using the full hints set instead.
2025-09-05 12:27:43.711261: I external/local_xla/xla/service/gpu/autotuning/dot_search_space.cc:208] All configs were filtered out because none of them sufficiently match the hints. Maybe the hints set does not contain a good representative set of valid configs? Working around this by using the full hints set instead.
2025-09-05 12:27:43.711305: I external/local_xla/xla/service/gpu/autotuning/dot_search_space.cc:208] All configs were filtered out because none of them sufficiently match the hints. Maybe the hints set does not contain a good representative set of valid configs? Working around this by using the full hints set instead.








[1m782/782[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m6s[0m 5ms/step - accuracy: 0.7669 - loss: 0.5346


[0.5346301794052124, 0.7668799757957458]

In [31]:
import numpy as np
pred = model.predict(X_test_pad)
pred = np.argmax(pred, axis = 1)

[1m782/782[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 2ms/step


In [32]:
from sklearn.metrics import confusion_matrix
confusion_matrix(y_test, pred)

array([[ 9009,  3491],
       [ 2337, 10163]])