## Pre-trained Word Embeddings

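- Using pre-trained word embedding vectors
  - For NLP tasks, embeddings are sometimes trained from scratch on the task's own training data, but
  - word embeddings already trained on massive corpora such as Wikipedia (pre-trained word embedding vectors) can also be loaded and used
- Example: a sentiment analysis task with too little training data
  - Load embedding vectors pre-trained on a large corpus with Word2Vec, GloVe, etc.
  - and use them as the model's input
  - This may yield better performance
- Using Google's pre-trained Word2Vec model
  - Google provides 3 million pre-trained Word2Vec word vectors
  - Each embedding vector has 300 dimensions
  - Download the model file and load it with gensim by passing its file path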

- Compare training a Keras embedding layer from scratch with loading pre-trained word embeddings
  - Pre-trained GloVe
  - Pre-trained Word2Vec

- With little training data, it is hard to learn well-optimized embedding vectors
  - Reusing embedding vectors already trained on a large dataset can therefore improve performance

In [1]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity="all"

### (1) Keras Embedding layer

- Implement an embedding layer and learn the embedding vectors during training

In [2]:
# Positive/negative sentiment data
sentences = ['nice great best amazing', 'stop lies', 'pitiful nerd', 'excellent work', 'supreme quality', 'bad', 'highly respectable']
y_train = [1, 0, 0, 1, 1, 0, 1]
# positive: 1
# negative: 0

In [3]:
# Integer encoding
from tensorflow.keras.preprocessing.text import Tokenizer

tokenizer = Tokenizer()
tokenizer.fit_on_texts(sentences)
vocab_size = len(tokenizer.word_index) + 1
print(vocab_size)

16


In [4]:
X_encoded = tokenizer.texts_to_sequences(sentences)
print(X_encoded)

[[1, 2, 3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13], [14, 15]]
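To make the index assignment concrete, here is a minimal pure-Python sketch of what `Tokenizer.fit_on_texts` and `texts_to_sequences` do for this data. It is an approximation, not Keras's code: it lowercases and splits on whitespace but skips Keras's punctuation filtering, and the helper names `build_word_index` and `texts_to_sequences` are hypothetical. Words are ranked by frequency with ties kept in first-seen order, and numbering starts at 1 because index 0 is reserved for padding, which is why `vocab_size` is `len(word_index) + 1`.

```python
from collections import Counter

def build_word_index(texts):
    """Hypothetical re-implementation: rank words by frequency, ties keep first-seen order."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    # Counter keeps insertion order and sorted() is stable,
    # so words with equal counts stay in the order they first appeared
    ranked = sorted(counts, key=lambda w: -counts[w])
    # Index 0 is left free for padding, so numbering starts at 1
    return {word: i for i, word in enumerate(ranked, start=1)}

def texts_to_sequences(texts, word_index):
    """Map each text to the list of indices of its known words."""
    return [[word_index[w] for w in t.lower().split() if w in word_index] for t in texts]

sentences = ['nice great best amazing', 'stop lies', 'pitiful nerd', 'excellent work',
             'supreme quality', 'bad', 'highly respectable']
word_index = build_word_index(sentences)
vocab_size = len(word_index) + 1
print(vocab_size)
print(texts_to_sequences(sentences, word_index))
```

Because every word in this toy corpus appears exactly once, the frequency ranking degenerates to first-occurrence order, which matches the indices printed above.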


In [5]:
max_len = max(len(seq) for seq in X_encoded)
max_len

4

In [6]:
# Padding
from tensorflow.keras.preprocessing.sequence import pad_sequences
import numpy as np

X_train = pad_sequences(X_encoded, maxlen=max_len, padding='post')
y_train = np.array(y_train)
print(X_train)

[[ 1  2  3  4]
 [ 5  6  0  0]
 [ 7  8  0  0]
 [ 9 10  0  0]
 [11 12  0  0]
 [13  0  0  0]
 [14 15  0  0]]
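`pad_sequences(..., padding='post')` appends zeros after each sequence until it reaches `maxlen`. A minimal sketch of that behaviour follows; `pad_post` is a hypothetical helper, and its truncation rule (keep the last `maxlen` tokens) is an assumption modelled on Keras's default `truncating='pre'`.

```python
def pad_post(sequences, maxlen, value=0):
    """Hypothetical helper: post-pad each sequence to maxlen, keeping its last maxlen tokens."""
    if maxlen < 1:
        raise ValueError("maxlen must be at least 1")
    padded = []
    for seq in sequences:
        kept = list(seq)[-maxlen:]                              # truncate from the front if too long
        padded.append(kept + [value] * (maxlen - len(kept)))    # then append padding values
    return padded

X_encoded = [[1, 2, 3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13], [14, 15]]
max_len = max(len(seq) for seq in X_encoded)   # 4, as in the notebook
for row in pad_post(X_encoded, max_len):
    print(row)
```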


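#### Binary classification model for positive/negative sentiment

vocab_size = size of the entire vocabulary of the text data (the layer's input)  
output_dim = dimension of the embedding vector produced for each word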
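The Embedding layer is essentially a lookup table of shape `(vocab_size, output_dim)`: each integer index selects one row. The sketch below, in plain Python with made-up random weights (drawn from ±0.05, like Keras's default `uniform` initializer for this layer), traces one padded sentence through the lookup and `Flatten`, and counts the parameters of the model built below with `embedding_dim = 4` and `max_len = 4`. The padding index 0 is looked up like any other row here, since the model does not set `mask_zero`.

```python
import random

vocab_size, embedding_dim, max_len = 16, 4, 4

random.seed(0)
# Hypothetical stand-in for the Embedding layer's weight matrix: one row per word index
embedding_table = [[random.uniform(-0.05, 0.05) for _ in range(embedding_dim)]
                   for _ in range(vocab_size)]

def embed(sequence):
    """Embedding lookup: index i -> row i of the table."""
    return [embedding_table[i] for i in sequence]

def flatten(matrix):
    """Flatten a (max_len, embedding_dim) matrix into one feature vector."""
    return [x for row in matrix for x in row]

sample = [5, 6, 0, 0]          # 'stop lies' after post-padding
embedded = embed(sample)       # shape (max_len, embedding_dim) = (4, 4)
features = flatten(embedded)   # shape (max_len * embedding_dim,) = (16,)

embedding_params = vocab_size * embedding_dim   # 16 * 4 = 64
dense_params = max_len * embedding_dim + 1      # 16 weights + 1 bias = 17
print(len(embedded), len(embedded[0]), len(features))
print(embedding_params, dense_params)
```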

In [7]:
# Embedding vector size: 4
# Embedding layer
# Flatten layer
# Dense layer

In [8]:
vocab_size

16

In [9]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding, Flatten

embedding_dim = 4

model = Sequential()
model.add(Embedding(vocab_size, embedding_dim))
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))  

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=100, verbose=2)

Epoch 1/100
1/1 - 1s - 767ms/step - accuracy: 0.5714 - loss: 0.6901
Epoch 2/100
1/1 - 0s - 49ms/step - accuracy: 0.5714 - loss: 0.6884
Epoch 3/100
1/1 - 0s - 42ms/step - accuracy: 0.5714 - loss: 0.6868
Epoch 4/100
1/1 - 0s - 40ms/step - accuracy: 0.7143 - loss: 0.6851
Epoch 5/100
1/1 - 0s - 37ms/step - accuracy: 0.7143 - loss: 0.6835
Epoch 6/100
1/1 - 0s - 40ms/step - accuracy: 0.7143 - loss: 0.6818
Epoch 7/100
1/1 - 0s - 46ms/step - accuracy: 0.7143 - loss: 0.6802
Epoch 8/100
1/1 - 0s - 41ms/step - accuracy: 0.7143 - loss: 0.6785
Epoch 9/100
1/1 - 0s - 41ms/step - accuracy: 0.7143 - loss: 0.6769
Epoch 10/100
1/1 - 0s - 38ms/step - accuracy: 0.7143 - loss: 0.6752
Epoch 11/100
1/1 - 0s - 38ms/step - accuracy: 0.8571 - loss: 0.6736
Epoch 12/100
1/1 - 0s - 40ms/step - accuracy: 0.8571 - loss: 0.6719
Epoch 13/100
1/1 - 0s - 43ms/step - accuracy: 0.8571 - loss: 0.6702
Epoch 14/100
1/1 - 0s - 49ms/step - accuracy: 0.8571 - loss: 0.6686
Epoch 15/100
1/1 - 0s - 44ms/step - accuracy: 0.8571 - l

<keras.src.callbacks.history.History at 0x165e977cec0>

In [10]:
model.evaluate(X_train, y_train)

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 157ms/step - accuracy: 1.0000 - loss: 0.5012


[0.5011909604072571, 1.0]

### (2) Using pre-trained GloVe

In [11]:
from urllib.request import urlretrieve
import zipfile
# urlretrieve("http://nlp.stanford.edu/data/glove.6B.zip", filename="./data/glove.6B.zip")
# zf = zipfile.ZipFile('./data/glove.6B.zip', 'r')
# zf.extractall('./data/') 
# zf.close()
# extract the zip file

In [12]:
# Reuse the same data as above
print(X_train)

[[ 1  2  3  4]
 [ 5  6  0  0]
 [ 7  8  0  0]
 [ 9 10  0  0]
 [11 12  0  0]
 [13  0  0  0]
 [14 15  0  0]]


In [13]:
print(y_train)

[1 0 0 1 1 0 1]


In [14]:
# Load every embedding vector in glove.6B.100d.txt

embedding_dict = dict()

# 'with' closes the file automatically, even if an error occurs while reading
with open('./data/0307/glove.6B.100d.txt', encoding="utf8") as f:
    for line in f:
        word_vector = line.split()
        word = word_vector[0]
        word_vector_arr = np.array(word_vector[1:], dtype="float32")
        embedding_dict[word] = word_vector_arr

print(f'Number of embedding vectors: {len(embedding_dict)}')

Number of embedding vectors: 400000
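Each line of a GloVe text file is a word followed by its vector components, separated by spaces. The sketch below, a hypothetical `load_glove` helper, parses that format from any iterable of lines and skips lines whose length does not match the expected dimension. It is exercised on a tiny in-memory sample with made-up 3-dimensional vectors rather than the real file, and returns plain float lists instead of NumPy arrays to stay dependency-free.

```python
import io

def load_glove(stream, dim):
    """Hypothetical helper: read 'word v1 ... v_dim' lines into {word: [floats]}."""
    embeddings, skipped = {}, 0
    for line in stream:
        parts = line.split()
        if len(parts) != dim + 1:   # malformed or unexpected line: skip it and count it
            skipped += 1
            continue
        embeddings[parts[0]] = [float(v) for v in parts[1:]]
    return embeddings, skipped

# Tiny in-memory stand-in for glove.6B.100d.txt (made-up 3-d vectors, one broken line)
sample = io.StringIO("great 0.1 0.2 0.3\nbad -0.1 -0.2 -0.3\nbroken 0.5\n")
vectors, skipped = load_glove(sample, dim=3)
print(len(vectors), skipped)
```

With the real file one would pass the open file handle instead, e.g. `with open('./data/0307/glove.6B.100d.txt', encoding='utf8') as f: vectors, skipped = load_glove(f, dim=100)`.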


In [23]:
sentences = ["nice great best amazing", "stop lies", "pitiful nerd", "excellent work", ]

In [15]:
# Print the embedding vector of an arbitrary word, 'respectable'
len(embedding_dict["respectable"])
embedding_dict["respectable"]

100

array([-0.049773 ,  0.19903  ,  0.10585  ,  0.1391   , -0.32395  ,
        0.44053  ,  0.3947   , -0.22805  , -0.25793  ,  0.49768  ,
        0.15384  , -0.08831  ,  0.0782   , -0.8299   , -0.037788 ,
        0.16772  , -0.45197  , -0.17085  ,  0.74756  ,  0.98256  ,
        0.81872  ,  0.28507  ,  0.16178  , -0.48626  , -0.006265 ,
       -0.92469  , -0.30625  , -0.067318 , -0.046762 , -0.76291  ,
       -0.0025264, -0.018795 ,  0.12882  , -0.52457  ,  0.3586   ,
        0.43119  , -0.89477  , -0.057421 , -0.53724  ,  0.25587  ,
        0.55195  ,  0.44698  , -0.24252  ,  0.29946  ,  0.25776  ,
       -0.8717   ,  0.68426  , -0.05688  , -0.1848   , -0.59352  ,
       -0.11227  , -0.57692  , -0.013593 ,  0.18488  , -0.32507  ,
       -0.90171  ,  0.17672  ,  0.075601 ,  0.54896  , -0.21488  ,
       -0.54018  , -0.45882  , -0.79536  ,  0.26331  ,  0.18879  ,
       -0.16363  ,  0.3975   ,  0.1099   ,  0.1164   , -0.083499 ,
        0.50159  ,  0.35802  ,  0.25677  ,  0.088546 ,  0.4210

In [16]:
# Create a matrix with one row per vocabulary entry and 100 columns, initialized with zeros
embedding_matrix = np.zeros((vocab_size, 100))

In [19]:
np.shape(embedding_matrix)
embedding_matrix[0]

(16, 100)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [20]:
# Check the integer index mapped to each word in the existing data
print(tokenizer.word_index.items())

dict_items([('nice', 1), ('great', 2), ('best', 3), ('amazing', 4), ('stop', 5), ('lies', 6), ('pitiful', 7), ('nerd', 8), ('excellent', 9), ('work', 10), ('supreme', 11), ('quality', 12), ('bad', 13), ('highly', 14), ('respectable', 15)])


In [21]:
print(tokenizer.word_index)

{'nice': 1, 'great': 2, 'best': 3, 'amazing': 4, 'stop': 5, 'lies': 6, 'pitiful': 7, 'nerd': 8, 'excellent': 9, 'work': 10, 'supreme': 11, 'quality': 12, 'bad': 13, 'highly': 14, 'respectable': 15}


In [24]:
tokenizer.word_index["great"]

2

In [27]:
tokenizer.word_index
embedding_dict["great"]

{'nice': 1,
 'great': 2,
 'best': 3,
 'amazing': 4,
 'stop': 5,
 'lies': 6,
 'pitiful': 7,
 'nerd': 8,
 'excellent': 9,
 'work': 10,
 'supreme': 11,
 'quality': 12,
 'bad': 13,
 'highly': 14,
 'respectable': 15}

array([-0.013786 ,  0.38216  ,  0.53236  ,  0.15261  , -0.29694  ,
       -0.20558  , -0.41846  , -0.58437  , -0.77355  , -0.87866  ,
       -0.37858  , -0.18516  , -0.128    , -0.20584  , -0.22925  ,
       -0.42599  ,  0.3725   ,  0.26077  , -1.0702   ,  0.62916  ,
       -0.091469 ,  0.70348  , -0.4973   , -0.77691  ,  0.66045  ,
        0.09465  , -0.44893  ,  0.018917 ,  0.33146  , -0.35022  ,
       -0.35789  ,  0.030313 ,  0.22253  , -0.23236  , -0.19719  ,
       -0.0053125, -0.25848  ,  0.58081  , -0.10705  , -0.17845  ,
       -0.16206  ,  0.087086 ,  0.63029  , -0.76649  ,  0.51619  ,
        0.14073  ,  1.019    , -0.43136  ,  0.46138  , -0.43585  ,
       -0.47568  ,  0.19226  ,  0.36065  ,  0.78987  ,  0.088945 ,
       -2.7814   , -0.15366  ,  0.01015  ,  1.1798   ,  0.15168  ,
       -0.050112 ,  1.2626   , -0.77527  ,  0.36031  ,  0.95761  ,
       -0.11385  ,  0.28035  , -0.02591  ,  0.31246  , -0.15424  ,
        0.3778   , -0.13599  ,  0.2946   , -0.31579  ,  0.4294

### Mapping the vocabulary words to pre-trained GloVe embedding vectors

In [28]:

for word, index in tokenizer.word_index.items():
    vector_value = embedding_dict.get(word)
    if vector_value is not None:
        embedding_matrix[index] = vector_value
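The mapping loop above leaves a row at zero whenever a word is missing from GloVe, and row 0 (padding) always stays zero. A small variant that also reports which words were not found can help check vocabulary coverage. `build_embedding_matrix` is a hypothetical helper, shown with made-up 2-dimensional vectors and plain Python lists instead of the NumPy matrix.

```python
def build_embedding_matrix(word_index, lookup, dim):
    """Hypothetical helper: row i holds the vector of the word with index i.

    Rows for unknown words, and row 0 (padding), stay all-zero.
    Returns the matrix and the list of words missing from `lookup`.
    """
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    missing = []
    for word, index in word_index.items():
        vector = lookup.get(word)
        if vector is None:
            missing.append(word)
        else:
            matrix[index] = list(vector)
    return matrix, missing

word_index = {'nice': 1, 'great': 2, 'nerd': 3}
toy_vectors = {'nice': [0.1, 0.2], 'great': [0.3, 0.4]}   # made-up 2-d vectors
matrix, missing = build_embedding_matrix(word_index, toy_vectors, dim=2)
print(matrix)
print(missing)
```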


In [30]:
embedding_matrix[2]
embedding_matrix.shape

array([-0.013786  ,  0.38216001,  0.53236002,  0.15261   , -0.29694   ,
       -0.20558   , -0.41846001, -0.58437002, -0.77354997, -0.87866002,
       -0.37858   , -0.18516   , -0.12800001, -0.20584001, -0.22925   ,
       -0.42598999,  0.3725    ,  0.26076999, -1.07019997,  0.62915999,
       -0.091469  ,  0.70348001, -0.4973    , -0.77691001,  0.66044998,
        0.09465   , -0.44893   ,  0.018917  ,  0.33146   , -0.35021999,
       -0.35789001,  0.030313  ,  0.22253001, -0.23236001, -0.19719   ,
       -0.0053125 , -0.25848001,  0.58081001, -0.10705   , -0.17845   ,
       -0.16205999,  0.087086  ,  0.63028997, -0.76648998,  0.51618999,
        0.14072999,  1.01900005, -0.43136001,  0.46138   , -0.43584999,
       -0.47567999,  0.19226   ,  0.36065   ,  0.78987002,  0.088945  ,
       -2.78139997, -0.15366   ,  0.01015   ,  1.17980003,  0.15167999,
       -0.050112  ,  1.26259995, -0.77526999,  0.36030999,  0.95761001,
       -0.11385   ,  0.28035   , -0.02591   ,  0.31246001, -0.15

(16, 100)
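The GloVe vector for 'great' prints as `0.38216` when read from `embedding_dict` but as `0.38216001` from `embedding_matrix[2]`. The vector is the same: `np.zeros` creates a float64 matrix by default, so the float32 GloVe values are widened on assignment, and float64 printing exposes the float32 rounding error. The Word2Vec sums later (`1.836944580078125` vs `1.8369446`) stem from the same float32/float64 difference. The sketch below reproduces the effect with the standard `struct` module; `to_float32` is a hypothetical helper. Passing `dtype='float32'` to `np.zeros` would keep the matrix in single precision if that is preferred.

```python
import struct

def to_float32(x):
    """Round a Python float (float64) to the nearest float32, then widen it back."""
    return struct.unpack('f', struct.pack('f', x))[0]

# Values as stored in the float32 GloVe file, viewed at float64 precision
for value in (0.38216, -1.0702):
    print(f'{value} -> {to_float32(value):.8f}')
```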

In [None]:
# Build the positive/negative sentiment model

In [35]:
embedding_matrix.shape[1]

100

In [36]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding, Flatten

model = Sequential()
model.add(Embedding(vocab_size, output_dim=100, weights=[embedding_matrix], trainable=True))  # trainable: whether the embedding weights are updated during training (True/False)
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))  

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=100, verbose=2)

Epoch 1/100
1/1 - 1s - 624ms/step - accuracy: 0.4286 - loss: 0.7505
Epoch 2/100
1/1 - 0s - 44ms/step - accuracy: 0.4286 - loss: 0.7259
Epoch 3/100
1/1 - 0s - 39ms/step - accuracy: 0.4286 - loss: 0.7021
Epoch 4/100
1/1 - 0s - 40ms/step - accuracy: 0.4286 - loss: 0.6791
Epoch 5/100
1/1 - 0s - 45ms/step - accuracy: 0.4286 - loss: 0.6569
Epoch 6/100
1/1 - 0s - 47ms/step - accuracy: 0.4286 - loss: 0.6354
Epoch 7/100
1/1 - 0s - 39ms/step - accuracy: 0.5714 - loss: 0.6147
Epoch 8/100
1/1 - 0s - 44ms/step - accuracy: 0.7143 - loss: 0.5947
Epoch 9/100
1/1 - 0s - 43ms/step - accuracy: 0.7143 - loss: 0.5754
Epoch 10/100
1/1 - 0s - 39ms/step - accuracy: 0.7143 - loss: 0.5567
Epoch 11/100
1/1 - 0s - 35ms/step - accuracy: 0.7143 - loss: 0.5387
Epoch 12/100
1/1 - 0s - 40ms/step - accuracy: 0.7143 - loss: 0.5213
Epoch 13/100
1/1 - 0s - 36ms/step - accuracy: 0.8571 - loss: 0.5046
Epoch 14/100
1/1 - 0s - 38ms/step - accuracy: 0.8571 - loss: 0.4884
Epoch 15/100
1/1 - 0s - 35ms/step - accuracy: 0.8571 - l

<keras.src.callbacks.history.History at 0x165fd0a6ba0>

In [37]:
model.evaluate(X_train, y_train)

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 147ms/step - accuracy: 1.0000 - loss: 0.0532


[0.05318016931414604, 1.0]

### (3) Using pre-trained Word2Vec

In [38]:
%%time
import gensim
word2ve_embedding_dict = gensim.models.KeyedVectors.load_word2vec_format('./data/0307/GoogleNews-vectors-negative300.bin.gz', binary=True)

CPU times: total: 36.4 s
Wall time: 37 s


In [39]:
word2ve_embedding_dict

<gensim.models.keyedvectors.KeyedVectors at 0x165fe3fd6d0>

In [40]:
print(word2ve_embedding_dict.vectors.shape)

(3000000, 300)
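The shape `(3000000, 300)` also helps explain the long load time and the memory footprint. gensim stores these vectors as float32 (4 bytes per value) by default, so a quick estimate is rows × columns × 4 bytes. If memory is tight, gensim's `KeyedVectors.load_word2vec_format` accepts a `limit` argument that reads only the first N vectors; the 500,000 figure below is just an illustrative choice.

```python
num_words, dim, bytes_per_value = 3_000_000, 300, 4   # float32 = 4 bytes

full_bytes = num_words * dim * bytes_per_value
print(full_bytes, round(full_bytes / 2**30, 2))        # bytes, GiB

# Same estimate if only the first 500,000 words were loaded (e.g. via limit=500000)
limited_bytes = 500_000 * dim * bytes_per_value
print(limited_bytes, round(limited_bytes / 2**30, 2))
```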


In [42]:
word2ve_embedding_dict['king']

array([ 1.25976562e-01,  2.97851562e-02,  8.60595703e-03,  1.39648438e-01,
       -2.56347656e-02, -3.61328125e-02,  1.11816406e-01, -1.98242188e-01,
        5.12695312e-02,  3.63281250e-01, -2.42187500e-01, -3.02734375e-01,
       -1.77734375e-01, -2.49023438e-02, -1.67968750e-01, -1.69921875e-01,
        3.46679688e-02,  5.21850586e-03,  4.63867188e-02,  1.28906250e-01,
        1.36718750e-01,  1.12792969e-01,  5.95703125e-02,  1.36718750e-01,
        1.01074219e-01, -1.76757812e-01, -2.51953125e-01,  5.98144531e-02,
        3.41796875e-01, -3.11279297e-02,  1.04492188e-01,  6.17675781e-02,
        1.24511719e-01,  4.00390625e-01, -3.22265625e-01,  8.39843750e-02,
        3.90625000e-02,  5.85937500e-03,  7.03125000e-02,  1.72851562e-01,
        1.38671875e-01, -2.31445312e-01,  2.83203125e-01,  1.42578125e-01,
        3.41796875e-01, -2.39257812e-02, -1.09863281e-01,  3.32031250e-02,
       -5.46875000e-02,  1.53198242e-02, -1.62109375e-01,  1.58203125e-01,
       -2.59765625e-01,  

In [41]:
embedding_matrix = np.zeros((vocab_size, 300))
np.shape(embedding_matrix)

(16, 300)

In [43]:
print(tokenizer.word_index.items())

dict_items([('nice', 1), ('great', 2), ('best', 3), ('amazing', 4), ('stop', 5), ('lies', 6), ('pitiful', 7), ('nerd', 8), ('excellent', 9), ('work', 10), ('supreme', 11), ('quality', 12), ('bad', 13), ('highly', 14), ('respectable', 15)])


In [44]:
# Return the embedding vector of a word from word2ve_embedding_dict,
# or None if word2ve_embedding_dict has no embedding vector for that word
def get_vector(word) :
    if word in word2ve_embedding_dict :
        return word2ve_embedding_dict[word]
    else :
        return None

In [49]:
for word, index in tokenizer.word_index.items() :
    vector_value = get_vector(word)
    if vector_value is not None :
        embedding_matrix[index] = vector_value
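Keras's `Tokenizer` lowercases every word, whereas the GoogleNews vectors are case-sensitive, so a lowercased word may be absent even when a capitalized form exists. One possible, hypothetical extension of `get_vector` tries a few capitalizations before giving up. It relies only on `in` and `[]`, which `KeyedVectors` supports, and is demonstrated here on a plain dict with made-up vectors. Whether a capitalized vector (often a proper noun) is a good substitute is a judgment call for the task.

```python
def get_vector_with_fallback(vectors, word):
    """Hypothetical lookup: try the word as-is, then Capitalized and UPPER forms.

    Works with any mapping that supports `in` and `[]`; returns None if no form is found.
    """
    for candidate in (word, word.capitalize(), word.upper()):
        if candidate in vectors:
            return vectors[candidate]
    return None

toy_vectors = {'nice': [0.1], 'Nerd': [0.2], 'NASA': [0.3]}   # made-up stand-in for KeyedVectors
for w in ('nice', 'nerd', 'nasa', 'pitiful'):
    print(w, get_vector_with_fallback(toy_vectors, w))
```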

In [51]:
embedding_matrix[1]
embedding_matrix[1].sum()

array([ 0.15820312,  0.10595703, -0.18945312,  0.38671875,  0.08349609,
       -0.26757812,  0.08349609,  0.11328125, -0.10400391,  0.17871094,
       -0.12353516, -0.22265625, -0.01806641, -0.25390625,  0.13183594,
        0.0859375 ,  0.16113281,  0.11083984, -0.11083984, -0.0859375 ,
        0.0267334 ,  0.34570312,  0.15136719, -0.00415039,  0.10498047,
        0.04907227, -0.06982422,  0.08642578,  0.03198242, -0.02844238,
       -0.15722656,  0.11865234,  0.36132812,  0.00173187,  0.05297852,
       -0.234375  ,  0.11767578,  0.08642578, -0.01123047,  0.25976562,
        0.28515625, -0.11669922,  0.38476562,  0.07275391,  0.01147461,
        0.03466797,  0.18164062, -0.03955078,  0.04199219,  0.01013184,
       -0.06054688,  0.09765625,  0.06689453,  0.14648438, -0.12011719,
        0.08447266, -0.06152344,  0.06347656,  0.3046875 , -0.35546875,
       -0.2890625 ,  0.19628906, -0.33203125, -0.07128906,  0.12792969,
        0.09619141, -0.12158203, -0.08691406, -0.12890625,  0.27

1.836944580078125

In [55]:
word2ve_embedding_dict["nice"]
word2ve_embedding_dict["nice"].sum()

array([ 0.15820312,  0.10595703, -0.18945312,  0.38671875,  0.08349609,
       -0.26757812,  0.08349609,  0.11328125, -0.10400391,  0.17871094,
       -0.12353516, -0.22265625, -0.01806641, -0.25390625,  0.13183594,
        0.0859375 ,  0.16113281,  0.11083984, -0.11083984, -0.0859375 ,
        0.0267334 ,  0.34570312,  0.15136719, -0.00415039,  0.10498047,
        0.04907227, -0.06982422,  0.08642578,  0.03198242, -0.02844238,
       -0.15722656,  0.11865234,  0.36132812,  0.00173187,  0.05297852,
       -0.234375  ,  0.11767578,  0.08642578, -0.01123047,  0.25976562,
        0.28515625, -0.11669922,  0.38476562,  0.07275391,  0.01147461,
        0.03466797,  0.18164062, -0.03955078,  0.04199219,  0.01013184,
       -0.06054688,  0.09765625,  0.06689453,  0.14648438, -0.12011719,
        0.08447266, -0.06152344,  0.06347656,  0.3046875 , -0.35546875,
       -0.2890625 ,  0.19628906, -0.33203125, -0.07128906,  0.12792969,
        0.09619141, -0.12158203, -0.08691406, -0.12890625,  0.27

1.8369446

In [53]:
print(X_train)
print(y_train)

[[ 1  2  3  4]
 [ 5  6  0  0]
 [ 7  8  0  0]
 [ 9 10  0  0]
 [11 12  0  0]
 [13  0  0  0]
 [14 15  0  0]]
[1 0 0 1 1 0 1]


In [54]:
vocab_size

16

#### Feed the pre-trained embedding_matrix into the Embedding layer and train the model

In [58]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding, Flatten, Input

model = Sequential()
model.add(Embedding(vocab_size, 300, weights=[embedding_matrix], trainable=False))
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
model.fit(X_train, y_train, epochs=100, verbose=2)

Epoch 1/100
1/1 - 1s - 849ms/step - acc: 0.5714 - loss: 0.6770
Epoch 2/100
1/1 - 0s - 40ms/step - acc: 0.7143 - loss: 0.6587
Epoch 3/100
1/1 - 0s - 35ms/step - acc: 1.0000 - loss: 0.6409
Epoch 4/100
1/1 - 0s - 40ms/step - acc: 1.0000 - loss: 0.6237
Epoch 5/100
1/1 - 0s - 47ms/step - acc: 1.0000 - loss: 0.6071
Epoch 6/100
1/1 - 0s - 43ms/step - acc: 1.0000 - loss: 0.5909
Epoch 7/100
1/1 - 0s - 44ms/step - acc: 1.0000 - loss: 0.5754
Epoch 8/100
1/1 - 0s - 51ms/step - acc: 1.0000 - loss: 0.5603
Epoch 9/100
1/1 - 0s - 39ms/step - acc: 1.0000 - loss: 0.5457
Epoch 10/100
1/1 - 0s - 38ms/step - acc: 1.0000 - loss: 0.5317
Epoch 11/100
1/1 - 0s - 35ms/step - acc: 1.0000 - loss: 0.5181
Epoch 12/100
1/1 - 0s - 49ms/step - acc: 1.0000 - loss: 0.5050
Epoch 13/100
1/1 - 0s - 45ms/step - acc: 1.0000 - loss: 0.4924
Epoch 14/100
1/1 - 0s - 54ms/step - acc: 1.0000 - loss: 0.4802
Epoch 15/100
1/1 - 0s - 48ms/step - acc: 1.0000 - loss: 0.4684
Epoch 16/100
1/1 - 0s - 38ms/step - acc: 1.0000 - loss: 0.4571


<keras.src.callbacks.history.History at 0x1658cef7140>

In [59]:
model.evaluate(X_train, y_train)

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 132ms/step - acc: 1.0000 - loss: 0.1072


[0.10720305144786835, 1.0]
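The three models differ in more than the embedding source: the embedding size (4, 100, 300) and the `trainable` flag also change how many weights are learned. The sketch below counts parameters for the `Embedding → Flatten → Dense(1)` architecture used here, assuming inputs of length 4; with `trainable=False` the Word2Vec model updates only the Dense layer's 1,201 weights. These figures are expected to match `model.summary()`, but they are computed here, not taken from this notebook's output. `count_params` is a hypothetical helper.

```python
def count_params(vocab_size, dim, max_len, embedding_trainable):
    """Hypothetical helper: (trainable, non_trainable) counts for Embedding -> Flatten -> Dense(1)."""
    embedding = vocab_size * dim
    dense = max_len * dim + 1   # one weight per flattened feature plus a bias
    trainable = dense + (embedding if embedding_trainable else 0)
    non_trainable = 0 if embedding_trainable else embedding
    return trainable, non_trainable

print(count_params(16, 4, 4, True))     # (1) Keras Embedding trained from scratch
print(count_params(16, 100, 4, True))   # (2) GloVe, trainable=True
print(count_params(16, 300, 4, False))  # (3) Word2Vec, trainable=False
```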

In [60]:
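# Embedding vectors learned from scratch: loss ≈ 0.50
# Pre-trained GloVe embedding vectors (trainable=True): loss ≈ 0.05
# Pre-trained Word2Vec embedding vectors (trainable=False): loss ≈ 0.11
# With pre-trained embedding vectors the loss dropped much further in the same 100 epochs.
# These losses are measured on the 7 training sentences, so they show faster fitting, not proven better generalization.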