IMDB 영화평데이터 > 감성분류를 위한 트랜스포머 아키텍처 모델 구축

1. 정수토큰 시퀀스(길이80)입력
2. 토큰임베딩 + 위치임베딩
3. 멀티헤드어텐션
4. concat+정규화
5. FFN (Dense+Dense)
6. concat+정규화
7. 분류기 (Dense)

# 1. 정수토큰 시퀀스(길이80)입력

# 2. 토큰임베딩 + 위치임베딩

In [1]:
import tensorflow as tf
from tensorflow.keras import Model, layers

2025-09-05 12:47:05.282008: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1757044025.291656   67880 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1757044025.294574   67880 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-09-05 12:47:05.304559: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


In [2]:
# 토큰임베딩
inputs = layers.Input(shape=(100,))
input_embedding = layers.Embedding(input_dim=1000, output_dim=32)(inputs)

I0000 00:00:1757044026.881276   67880 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 5563 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 4060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.9


In [3]:
# 위치임베딩
positions = tf.range(start=0, limit=100)
pos_embedding = layers.Embedding(input_dim=100, output_dim=32)(positions)
pos_enc_output = pos_embedding + input_embedding

# 3. 멀티헤드어텐션

In [4]:
attention_output = layers.MultiHeadAttention(num_heads=3, key_dim=32)(pos_enc_output, pos_enc_output) #K, V

# 4. concat+정규화


In [5]:
x = layers.add([pos_enc_output, attention_output])
x = layers.BatchNormalization()(x)

# 5. FFN (Dense+Dense)

In [6]:
from tensorflow.keras.models import Sequential
ffnn = Sequential([
    layers.Dense(64, activation='relu'),
    layers.Dense(32, activation='relu')
])(x)

# 6. concat+정규화

In [7]:
x = layers.add([ffnn, x])
x = layers.BatchNormalization()(x)

# 7. 분류기 (Dense)

In [8]:
x = layers.GlobalAveragePooling1D()(x)
# x = layers.Dropout(0.1)(x)
x = layers.Dense(64, activation='relu')(x)
# x = layers.Dropout(0.1)(x)
outputs = layers.Dense(2, activation='softmax')(x)

# 모델 구성

In [9]:
model = Model(inputs=inputs, outputs=outputs)
model.summary()

In [10]:
# 손실함수와 옵티마이저 지정
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# imdb data load

In [11]:
from tensorflow.keras.datasets import imdb
(X_train, y_train), (X_test, y_test)=imdb.load_data(nim_words=10000)

In [12]:
from tensorflow.keras.preprocessing.sequence import pad_sequences
X_train_pad = pad_sequences(X_train, maxlen=100, padding='pre', truncating='pre')
X_test_pad = pad_sequences(X_test, maxlen=100, padding='pre', truncating='pre')

In [13]:
model.fit(X_train_pad, y_train, epochs=12, batch_size=200)

Epoch 1/12


I0000 00:00:1757044031.158795   67968 service.cc:148] XLA service 0x77609c0066a0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1757044031.158824   67968 service.cc:156]   StreamExecutor device (0): NVIDIA GeForce RTX 4060 Laptop GPU, Compute Capability 8.9
2025-09-05 12:47:11.192386: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
I0000 00:00:1757044031.359945   67968 cuda_dnn.cc:529] Loaded cuDNN version 90101



[1m 16/125[0m [32m━━[0m[37m━━━━━━━━━━━━━━━━━━[0m [1m1s[0m 11ms/step - accuracy: 0.5675 - loss: 0.6805

I0000 00:00:1757044037.302268   67968 device_compiler.h:188] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.


[1m125/125[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m9s[0m 8ms/step - accuracy: 0.7690 - loss: 0.4777
Epoch 2/12
[1m125/125[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 5ms/step - accuracy: 0.8276 - loss: 0.3818
Epoch 3/12
[1m125/125[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 5ms/step - accuracy: 0.8307 - loss: 0.3772
Epoch 4/12
[1m125/125[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 5ms/step - accuracy: 0.8413 - loss: 0.3582
Epoch 5/12
[1m125/125[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 5ms/step - accuracy: 0.8440 - loss: 0.3485
Epoch 6/12
[1m125/125[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 5ms/step - accuracy: 0.8478 - loss: 0.3413
Epoch 7/12
[1m125/125[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 5ms/step - accuracy: 0.8524 - loss: 0.3324
Epoch 8/12
[1m125/125[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 5ms/step - accuracy: 0.8564 - loss: 0.3274
Epoch 9/12
[1m125/125[0m [32m━━━━━━━━━━━━━━━━━━━

<keras.src.callbacks.history.History at 0x77626f53d960>

In [14]:
model.evaluate(X_test_pad,y_test)

[1m782/782[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m4s[0m 3ms/step - accuracy: 0.8094 - loss: 0.4806


[0.48064449429512024, 0.809440016746521]

In [15]:
import numpy as np
pred = model.predict(X_test_pad)
pred = np.argmax(pred, axis=1)

[1m782/782[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 2ms/step


In [16]:
from sklearn.metrics import confusion_matrix
confusion_matrix(y_test, pred)

array([[ 9522,  2978],
       [ 1786, 10714]])