## **2. Word2Vec**
1. Convert the given words into a form that can be fed to a word2vec model.
2. Implement the CBOW and Skip-gram models.
3. Train the models and inspect the results.

### **Importing the required packages**

In [1]:
!pip install konlpy


In [2]:
from tqdm import tqdm
from konlpy.tag import Okt
from torch import nn
from torch.nn import functional as F
from torch.utils.data import Dataset, DataLoader
from collections import defaultdict

import torch
import copy
import numpy as np

### **Data preprocessing**

Inspect the data and preprocess it into the format Word2Vec expects.  
The training data is the same as in exercise 1, and the test words are assumed to be the following.

In [3]:
train_data = [
  "정말 맛있습니다. 추천합니다.",
  "기대했던 것보단 별로였네요.",
  "다 좋은데 가격이 너무 비싸서 다시 가고 싶다는 생각이 안 드네요.",
  "완전 최고입니다! 재방문 의사 있습니다.",
  "음식도 서비스도 다 만족스러웠습니다.",
  "위생 상태가 좀 별로였습니다. 좀 더 개선되기를 바랍니다.",
  "맛도 좋았고 직원분들 서비스도 너무 친절했습니다.",
  "기념일에 방문했는데 음식도 분위기도 서비스도 다 좋았습니다.",
  "전반적으로 음식이 너무 짰습니다. 저는 별로였네요.",
  "위생에 조금 더 신경 썼으면 좋겠습니다. 조금 불쾌했습니다."       
]

test_words = ["음식", "맛", "서비스", "위생", "가격"]

Tokenization and vocabulary construction are similar to the previous exercise.

In [4]:
tokenizer = Okt()

In [5]:
def make_tokenized(data):
  tokenized = []
  for sent in tqdm(data):
    tokens = tokenizer.morphs(sent, stem=True)
    tokenized.append(tokens)

  return tokenized

In [6]:
train_tokenized = make_tokenized(train_data)

100%|██████████| 10/10 [00:05<00:00,  1.74it/s]


In [7]:
word_count = defaultdict(int)

for tokens in tqdm(train_tokenized):
  for token in tokens:
    word_count[token] += 1

100%|██████████| 10/10 [00:00<00:00, 43509.38it/s]


In [8]:
word_count = sorted(word_count.items(), key=lambda x: x[1], reverse=True)
print(list(word_count))

[('.', 14), ('도', 7), ('이다', 4), ('좋다', 4), ('별로', 3), ('다', 3), ('이', 3), ('너무', 3), ('음식', 3), ('서비스', 3), ('하다', 2), ('방문', 2), ('위생', 2), ('좀', 2), ('더', 2), ('에', 2), ('조금', 2), ('정말', 1), ('맛있다', 1), ('추천', 1), ('기대하다', 1), ('것', 1), ('보단', 1), ('가격', 1), ('비싸다', 1), ('다시', 1), ('가다', 1), ('싶다', 1), ('생각', 1), ('안', 1), ('드네', 1), ('요', 1), ('완전', 1), ('최고', 1), ('!', 1), ('재', 1), ('의사', 1), ('있다', 1), ('만족스럽다', 1), ('상태', 1), ('가', 1), ('개선', 1), ('되다', 1), ('기르다', 1), ('바라다', 1), ('맛', 1), ('직원', 1), ('분들', 1), ('친절하다', 1), ('기념일', 1), ('분위기', 1), ('전반', 1), ('적', 1), ('으로', 1), ('짜다', 1), ('저', 1), ('는', 1), ('신경', 1), ('써다', 1), ('불쾌하다', 1)]


In [9]:
w2i = {}
for pair in tqdm(word_count):
  if pair[0] not in w2i:
    w2i[pair[0]] = len(w2i)

100%|██████████| 60/60 [00:00<00:00, 258907.65it/s]


In [10]:
print(train_tokenized)
print(w2i)

[['정말', '맛있다', '.', '추천', '하다', '.'], ['기대하다', '것', '보단', '별로', '이다', '.'], ['다', '좋다', '가격', '이', '너무', '비싸다', '다시', '가다', '싶다', '생각', '이', '안', '드네', '요', '.'], ['완전', '최고', '이다', '!', '재', '방문', '의사', '있다', '.'], ['음식', '도', '서비스', '도', '다', '만족스럽다', '.'], ['위생', '상태', '가', '좀', '별로', '이다', '.', '좀', '더', '개선', '되다', '기르다', '바라다', '.'], ['맛', '도', '좋다', '직원', '분들', '서비스', '도', '너무', '친절하다', '.'], ['기념일', '에', '방문', '하다', '음식', '도', '분위기', '도', '서비스', '도', '다', '좋다', '.'], ['전반', '적', '으로', '음식', '이', '너무', '짜다', '.', '저', '는', '별로', '이다', '.'], ['위생', '에', '조금', '더', '신경', '써다', '좋다', '.', '조금', '불쾌하다', '.']]
{'.': 0, '도': 1, '이다': 2, '좋다': 3, '별로': 4, '다': 5, '이': 6, '너무': 7, '음식': 8, '서비스': 9, '하다': 10, '방문': 11, '위생': 12, '좀': 13, '더': 14, '에': 15, '조금': 16, '정말': 17, '맛있다': 18, '추천': 19, '기대하다': 20, '것': 21, '보단': 22, '가격': 23, '비싸다': 24, '다시': 25, '가다': 26, '싶다': 27, '생각': 28, '안': 29, '드네': 30, '요': 31, '완전': 32, '최고': 33, '!': 34, '재': 35, '의사': 36, '있다': 37, '만족스럽다': 38, '상태

To build the actual model inputs, we define `Dataset` classes.

In [11]:
class CBOWDataset(Dataset):  # context words are the input, the center word is the output
  def __init__(self, train_tokenized, window_size=2):
    self.x = []
    self.y = []

    for tokens in tqdm(train_tokenized):
      token_ids = [w2i[token] for token in tokens]
      for i, token_id in enumerate(token_ids):
        # Keep only positions that have a full window on both sides.
        if i-window_size >= 0 and i+window_size < len(token_ids):
          self.x.append(token_ids[i-window_size:i] + token_ids[i+1:i+window_size+1])
          self.y.append(token_id)

    self.x = torch.LongTensor(self.x)  # (number of examples, 2 * window_size)
    self.y = torch.LongTensor(self.y)  # (number of examples)

  def __len__(self):
    return self.x.shape[0]

  def __getitem__(self, idx):
    return self.x[idx], self.y[idx]

In [12]:
class SkipGramDataset(Dataset):  # the center word is the input, a context word is the output
  def __init__(self, train_tokenized, window_size=2):
    self.x = []
    self.y = []

    for tokens in tqdm(train_tokenized):
      token_ids = [w2i[token] for token in tokens]
      for i, token_id in enumerate(token_ids):
        # Keep only positions that have a full window on both sides.
        if i-window_size >= 0 and i+window_size < len(token_ids):
          self.y += (token_ids[i-window_size:i] + token_ids[i+1:i+window_size+1])
          self.x += [token_id] * 2 * window_size  # repeat the center once per context word

    self.x = torch.LongTensor(self.x)  # (number of examples)
    self.y = torch.LongTensor(self.y)  # (number of examples)

  def __len__(self):
    return self.x.shape[0]

  def __getitem__(self, idx):
    return self.x[idx], self.y[idx]

Create a `Dataset` object for each model.

In [13]:
cbow_set = CBOWDataset(train_tokenized)
skipgram_set = SkipGramDataset(train_tokenized)
print(list(skipgram_set))

100%|██████████| 10/10 [00:00<00:00, 20301.57it/s]
100%|██████████| 10/10 [00:00<00:00, 3433.45it/s]

[(tensor(0), tensor(17)), (tensor(0), tensor(18)), (tensor(0), tensor(19)), (tensor(0), tensor(10)), (tensor(19), tensor(18)), (tensor(19), tensor(0)), (tensor(19), tensor(10)), (tensor(19), tensor(0)), (tensor(22), tensor(20)), (tensor(22), tensor(21)), (tensor(22), tensor(4)), (tensor(22), tensor(2)), (tensor(4), tensor(21)), (tensor(4), tensor(22)), (tensor(4), tensor(2)), (tensor(4), tensor(0)), (tensor(23), tensor(5)), (tensor(23), tensor(3)), (tensor(23), tensor(6)), (tensor(23), tensor(7)), (tensor(6), tensor(3)), (tensor(6), tensor(23)), (tensor(6), tensor(7)), (tensor(6), tensor(24)), (tensor(7), tensor(23)), (tensor(7), tensor(6)), (tensor(7), tensor(24)), (tensor(7), tensor(25)), (tensor(24), tensor(6)), (tensor(24), tensor(7)), (tensor(24), tensor(25)), (tensor(24), tensor(26)), (tensor(25), tensor(7)), (tensor(25), tensor(24)), (tensor(25), tensor(26)), (tensor(25), tensor(27)), (tensor(26), tensor(24)), (tensor(26), tensor(25)), (tensor(26), tensor(27)), (tensor(26), tens




### **Implementing the model classes**

We implement the two Word2Vec models in turn.  


*   `self.embedding`: a layer that embeds a one-hot vector of size `vocab_size` into a `dim`-dimensional vector.
*   `self.linear`: a layer that maps the embedding vector back to size `vocab_size`.
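
The description of `self.embedding` as a projection of a one-hot vector can be checked by hand: multiplying a one-hot row vector by a weight matrix selects a single row of that matrix, which is what an embedding lookup returns. The sketch below uses a toy matrix `W` and helper functions `one_hot` and `vec_mat`; all three are illustrative assumptions and do not appear in the notebook.

```python
# Toy embedding matrix: vocab_size = 4, dim = 3 (values chosen arbitrarily).
W = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
]

def one_hot(index, size):
    """Return a one-hot list of length `size` with 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def vec_mat(vec, mat):
    """Multiply a row vector of length V by a V x d matrix."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]

word_id = 2
projected = vec_mat(one_hot(word_id, len(W)), W)  # one-hot vector times matrix
looked_up = W[word_id]                            # direct row lookup

print(projected == looked_up)  # True
```

This is why the models can take integer word ids as input instead of materializing one-hot vectors.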


In [14]:
class CBOW(nn.Module):
  def __init__(self, vocab_size, dim):
    super(CBOW, self).__init__()
    self.embedding = nn.Embedding(vocab_size, dim, sparse=True)
    self.linear = nn.Linear(dim, vocab_size)

  # B: batch size, W: window size, d_w: word embedding size, V: vocab size.
  # Tracking tensor shapes like this while reading the code is helpful.
  def forward(self, x):  # x: (B, 2W)
    embeddings = self.embedding(x)  # (B, 2W, d_w)
    embeddings = torch.sum(embeddings, dim=1)  # (B, d_w)
    output = self.linear(embeddings)  # (B, V)
    return output

In [15]:
class SkipGram(nn.Module):
  def __init__(self, vocab_size, dim):
    super(SkipGram, self).__init__()
    self.embedding = nn.Embedding(vocab_size, dim, sparse=True)
    self.linear = nn.Linear(dim, vocab_size)

  # B: batch size, W: window size, d_w: word embedding size, V: vocab size
  def forward(self, x): # x: (B)
    embeddings = self.embedding(x)  # (B, d_w)
    output = self.linear(embeddings)  # (B, V)
    return output

Create both models.

In [16]:
cbow = CBOW(vocab_size=len(w2i), dim=256)
skipgram = SkipGram(vocab_size=len(w2i), dim=256)

### **Training the models**

Set the hyperparameters as follows and create the `DataLoader` objects.

In [17]:
batch_size=4
learning_rate = 5e-4
num_epochs = 5
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

cbow_loader = DataLoader(cbow_set, batch_size=batch_size)
skipgram_loader = DataLoader(skipgram_set, batch_size=batch_size)
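
As a rough sanity check on the loaders, the number of batches per epoch can be worked out from the sentence lengths alone. The sketch below assumes the token counts read off the printed `train_tokenized` output and the same full-window rule the `Dataset` classes use; the variable names are illustrative.

```python
import math

# Token counts per sentence, read off the printed `train_tokenized` output.
sentence_lengths = [6, 6, 15, 9, 7, 14, 10, 13, 13, 11]
window_size = 2
batch_size = 4

# A position is kept only if it has `window_size` tokens on both sides.
cbow_examples = sum(max(n - 2 * window_size, 0) for n in sentence_lengths)
# Skip-gram emits one (center, context) pair per context word.
skipgram_examples = cbow_examples * 2 * window_size

cbow_batches = math.ceil(cbow_examples / batch_size)
skipgram_batches = math.ceil(skipgram_examples / batch_size)

print(cbow_examples, cbow_batches)          # 64 16
print(skipgram_examples, skipgram_batches)  # 256 64
```

These counts agree with the `16/16` and `64/64` totals that the progress bars report during training.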

First, we train the CBOW model.

**CBOW (Continuous Bag-of-Words)**

- The model learns by predicting the center word from its context words.
- Each context word's one-hot vector is projected through the embedding layer to obtain its embedding. These embeddings are combined by element-wise addition, and a linear transformation then maps the result to a vector of the same size as the one-hot vector of the center word. The loss is computed against the center word's one-hot vector.
- Example: "A cute puppy is walking in the park." with window size 2
  - Input (context words): "A", "cute", "is", "walking"
  - Output (center word): "puppy"
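
The example above can be reproduced with a few lines of plain Python. This is a minimal sketch that mirrors the full-window rule of `CBOWDataset` but works on word strings rather than ids; the function name `cbow_pairs` and the whitespace tokenization (with the final period dropped) are assumptions made for illustration.

```python
def cbow_pairs(tokens, window_size=2):
    """Return (context_words, center_word) pairs for positions with a full window."""
    pairs = []
    for i, center in enumerate(tokens):
        if i - window_size >= 0 and i + window_size < len(tokens):
            context = tokens[i - window_size:i] + tokens[i + 1:i + window_size + 1]
            pairs.append((context, center))
    return pairs

sentence = "A cute puppy is walking in the park".split()
pairs = cbow_pairs(sentence)

print(pairs[0])    # (['A', 'cute', 'is', 'walking'], 'puppy')
print(len(pairs))  # 4
```

Words within `window_size` of either sentence boundary never become a center word under this rule, which is why an eight-word sentence yields only four examples.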

In [18]:
cbow.train()
cbow = cbow.to(device)
optim = torch.optim.SGD(cbow.parameters(), lr=learning_rate)
loss_function = nn.CrossEntropyLoss()

for e in range(1, num_epochs+1):
  print("#" * 50)
  print(f"Epoch: {e}")
  for batch in tqdm(cbow_loader):
    x, y = batch
    x, y = x.to(device), y.to(device)  # (B, 2W), (B)
    output = cbow(x)  # (B, V)
 
    optim.zero_grad()
    loss = loss_function(output, y)
    loss.backward()
    optim.step()

    print(f"Train loss: {loss.item()}")

print("Finished.")

100%|██████████| 16/16 [00:00<00:00, 83.44it/s]
  0%|          | 0/16 [00:00<?, ?it/s]

##################################################
Epoch: 1
Train loss: 5.400465965270996
Train loss: 5.497161388397217
Train loss: 4.7446818351745605
Train loss: 5.193964958190918
Train loss: 4.940471649169922
Train loss: 5.513838291168213
Train loss: 5.023492336273193
Train loss: 5.240934371948242
Train loss: 5.6840972900390625
Train loss: 5.390736103057861
Train loss: 4.35294771194458
Train loss: 5.906587600708008
Train loss: 3.830533981323242
Train loss: 4.302803039550781
Train loss: 4.920823097229004
Train loss: 4.524956703186035
##################################################
Epoch: 2
Train loss: 5.193465232849121


100%|██████████| 16/16 [00:00<00:00, 603.00it/s]
100%|██████████| 16/16 [00:00<00:00, 591.18it/s]
100%|██████████| 16/16 [00:00<00:00, 647.11it/s]
100%|██████████| 16/16 [00:00<00:00, 649.85it/s]

Train loss: 5.369214057922363
Train loss: 4.618618488311768
Train loss: 5.058065891265869
Train loss: 4.813621520996094
Train loss: 5.214047431945801
Train loss: 4.822259426116943
Train loss: 5.1097002029418945
Train loss: 5.546432971954346
Train loss: 5.211984157562256
Train loss: 4.168797492980957
Train loss: 5.491884231567383
Train loss: 3.6983656883239746
Train loss: 4.194274425506592
Train loss: 4.74608850479126
Train loss: 4.399409294128418
##################################################
Epoch: 3
Train loss: 4.991948127746582
Train loss: 5.242608547210693
Train loss: 4.494771957397461
Train loss: 4.923862457275391
Train loss: 4.688762664794922
Train loss: 4.9228363037109375
Train loss: 4.624922275543213
Train loss: 4.980766296386719
Train loss: 5.412385940551758
Train loss: 5.038084983825684
Train loss: 3.991915464401245
Train loss: 5.091802597045898
Train loss: 3.568585157394409
Train loss: 4.088517189025879
Train loss: 4.574139595031738
Train loss: 4.2777934074401855
#######
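
To put the logged loss values in context, it may help to compute what `nn.CrossEntropyLoss` would return for a model that assigns equal probability to every word. The sketch below re-implements softmax cross-entropy for a single example in plain Python; the helper `cross_entropy` is an illustrative stand-in, not the PyTorch implementation.

```python
import math

def cross_entropy(logits, target):
    """Softmax cross-entropy for one example: log-sum-exp(logits) - logits[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

vocab_size = 60  # number of entries in `w2i` for this notebook
uniform_logits = [0.0] * vocab_size
baseline = cross_entropy(uniform_logits, target=3)

print(round(baseline, 4))  # 4.0943
```

A uniform guess over 60 words scores ln 60, about 4.09, so losses in the range of 4 to 5 suggest that five epochs of SGD at this learning rate have not yet moved the model far from chance level.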




Next, we train the Skip-gram model.

**Skip-gram**

- The model learns by predicting the context words from the center word.
- The center word's one-hot vector is projected through the embedding layer to obtain its embedding, which a linear transformation maps to a vector of the same size as the one-hot vectors of the context words. The loss is then computed against each context word's one-hot vector separately.
- Example: "A cute puppy is walking in the park." with window size 2
  - Input (center word): "puppy"
  - Output (context words): "A", "cute", "is", "walking"
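
The same toy sentence shows how Skip-gram turns one center word into several training examples. This sketch mirrors the full-window rule of `SkipGramDataset` on word strings; the function name `skipgram_pairs` and the whitespace tokenization are assumptions made for illustration.

```python
def skipgram_pairs(tokens, window_size=2):
    """Return one (center_word, context_word) pair per context word."""
    pairs = []
    for i, center in enumerate(tokens):
        if i - window_size >= 0 and i + window_size < len(tokens):
            context = tokens[i - window_size:i] + tokens[i + 1:i + window_size + 1]
            pairs.extend((center, word) for word in context)
    return pairs

sentence = "A cute puppy is walking in the park".split()
pairs = skipgram_pairs(sentence)

print(pairs[:4])   # [('puppy', 'A'), ('puppy', 'cute'), ('puppy', 'is'), ('puppy', 'walking')]
print(len(pairs))  # 16
```

Each center word contributes `2 * window_size` pairs, so Skip-gram produces four times as many examples as CBOW from the same text when the window size is 2.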

In [19]:
skipgram.train()
skipgram = skipgram.to(device)
optim = torch.optim.SGD(skipgram.parameters(), lr=learning_rate)
loss_function = nn.CrossEntropyLoss()

for e in range(1, num_epochs+1):
  print("#" * 50)
  print(f"Epoch: {e}")
  for batch in tqdm(skipgram_loader):
    x, y = batch
    x, y = x.to(device), y.to(device)  # (B), (B)
    output = skipgram(x)  # (B, V)

    optim.zero_grad()
    loss = loss_function(output, y)
    loss.backward()
    optim.step()

    print(f"Train loss: {loss.item()}")

print("Finished.")

  0%|          | 0/64 [00:00<?, ?it/s]

##################################################
Epoch: 1
Train loss: 3.957571268081665
Train loss: 3.998509645462036
Train loss: 4.128431797027588
Train loss: 4.149399280548096
Train loss: 4.745832443237305
Train loss: 4.339951515197754
Train loss: 4.190417766571045
Train loss: 4.260259628295898
Train loss: 4.578275680541992
Train loss: 4.0970330238342285
Train loss: 4.349617004394531
Train loss: 4.4171342849731445
Train loss: 4.195819854736328
Train loss: 4.793526649475098
Train loss: 4.422263145446777
Train loss: 4.181345462799072
Train loss: 4.199154853820801
Train loss: 4.504922866821289
Train loss: 4.373044013977051
Train loss: 4.181743621826172
Train loss: 4.497072696685791
Train loss: 4.043231964111328
Train loss: 3.951425313949585
Train loss: 4.185670852661133
Train loss: 4.201718330383301
Train loss: 4.35379695892334
Train loss: 4.145749568939209
Train loss: 4.365073204040527
Train loss: 4.257458686828613
Train loss: 4.244597434997559
Train loss: 4.429289817810059
Train los

100%|██████████| 64/64 [00:00<00:00, 687.56it/s]
100%|██████████| 64/64 [00:00<00:00, 727.87it/s]
  0%|          | 0/64 [00:00<?, ?it/s]

Train loss: 4.503276824951172
Train loss: 4.5812177658081055
Train loss: 4.650382041931152
Train loss: 4.342751502990723
Train loss: 4.27174711227417
Train loss: 3.9923622608184814
Train loss: 4.660881996154785
Train loss: 3.926513671875
Train loss: 4.542387962341309
Train loss: 4.584965705871582
Train loss: 4.633040428161621
Train loss: 3.7982826232910156
##################################################
Epoch: 2
Train loss: 3.9361531734466553
Train loss: 3.9479012489318848
Train loss: 4.103466987609863
Train loss: 4.08169412612915
Train loss: 4.706428527832031
Train loss: 4.302283763885498
Train loss: 4.157321929931641
Train loss: 4.227934837341309
Train loss: 4.545372486114502
Train loss: 4.062566757202148
Train loss: 4.320302486419678
Train loss: 4.3782148361206055
Train loss: 4.165563583374023
Train loss: 4.762080192565918
Train loss: 4.3911590576171875
Train loss: 4.1500020027160645
Train loss: 4.1692214012146
Train loss: 4.480230331420898
Train loss: 4.338691234588623
Train los

100%|██████████| 64/64 [00:00<00:00, 743.65it/s]

Train loss: 4.667239189147949
Train loss: 4.265108108520508
Train loss: 4.124675750732422
Train loss: 4.195780277252197
Train loss: 4.512603759765625
Train loss: 4.028378963470459
Train loss: 4.291165828704834
Train loss: 4.339540481567383
Train loss: 4.135803699493408
Train loss: 4.73074197769165
Train loss: 4.360186576843262
Train loss: 4.119035720825195
Train loss: 4.139466762542725
Train loss: 4.455650329589844
Train loss: 4.304670333862305
Train loss: 4.120651721954346
Train loss: 4.2594804763793945
Train loss: 3.845550298690796
Train loss: 3.833815097808838
Train loss: 4.142523288726807
Train loss: 4.121757984161377
Train loss: 4.214048385620117
Train loss: 4.055435657501221
Train loss: 4.2957234382629395
Train loss: 4.165124893188477
Train loss: 4.165287494659424
Train loss: 4.370656490325928
Train loss: 3.831766128540039
Train loss: 4.063387870788574
Train loss: 4.252038478851318
Train loss: 4.5069899559021
Train loss: 3.9863619804382324
Train loss: 4.026553153991699
Train loss


100%|██████████| 64/64 [00:00<00:00, 664.46it/s]
  0%|          | 0/64 [00:00<?, ?it/s]

##################################################
Epoch: 4
Train loss: 3.894866704940796
Train loss: 3.8476405143737793
Train loss: 4.053837776184082
Train loss: 3.9493579864501953
Train loss: 4.628266334533691
Train loss: 4.228429794311523
Train loss: 4.092483043670654
Train loss: 4.163799285888672
Train loss: 4.479972839355469
Train loss: 3.9944710731506348
Train loss: 4.262206077575684
Train loss: 4.301114559173584
Train loss: 4.106539726257324
Train loss: 4.699512481689453
Train loss: 4.329346656799316
Train loss: 4.088451385498047
Train loss: 4.109890937805176
Train loss: 4.431180477142334
Train loss: 4.270984649658203
Train loss: 4.090366363525391
Train loss: 4.142650604248047
Train loss: 3.751007080078125
Train loss: 3.776102304458618
Train loss: 4.1210737228393555
Train loss: 4.082438945770264
Train loss: 4.1457319259643555
Train loss: 4.01092529296875
Train loss: 4.261929512023926
Train loss: 4.119708061218262
Train loss: 4.126271724700928
Train loss: 4.341537952423096
Train 

100%|██████████| 64/64 [00:00<00:00, 754.95it/s]

Train loss: 4.233422756195068
Train loss: 4.262939929962158
Train loss: 4.0777716636657715
Train loss: 4.668393135070801
Train loss: 4.298642635345459
Train loss: 4.058252811431885
Train loss: 4.080495834350586
Train loss: 4.406820297241211
Train loss: 4.237636566162109
Train loss: 4.060253620147705
Train loss: 4.027334213256836
Train loss: 3.659658670425415
Train loss: 3.719160318374634
Train loss: 4.099705696105957
Train loss: 4.043569564819336
Train loss: 4.07851505279541
Train loss: 3.9668571949005127
Train loss: 4.228720664978027
Train loss: 4.07480525970459
Train loss: 4.087691783905029
Train loss: 4.312553405761719
Train loss: 3.7734408378601074
Train loss: 3.9977571964263916
Train loss: 4.195476055145264
Train loss: 4.444230556488037
Train loss: 3.8828835487365723
Train loss: 3.8977320194244385
Train loss: 4.160377502441406
Train loss: 4.3050737380981445
Train loss: 3.6826171875
Train loss: 4.368112087249756
Train loss: 4.052628517150879
Train loss: 4.0044941902160645
Train los




### **Testing**

Use each trained model to inspect the word embeddings of the test words.

In [20]:
for word in test_words:
  input_id = torch.LongTensor([w2i[word]]).to(device)
  emb = cbow.embedding(input_id)

  print(f"Word: {word}")
  print(emb.squeeze(0))

Word: 음식
tensor([ 0.6851,  0.4828, -0.3156,  0.2014,  0.0701, -1.5939, -1.4075, -0.0692,
         1.0438, -0.9316, -0.2956,  0.0892, -0.1724,  0.8037,  1.1360, -0.5427,
         1.3115,  0.5798,  0.3898,  0.7903,  0.0683, -2.6461, -0.5415, -0.4043,
        -0.5969, -0.8820, -1.3398,  1.2012, -0.8177, -1.8458, -0.2342, -0.8089,
        -0.1872, -0.0527,  1.1812, -2.2949,  0.9804,  0.2554,  2.3396, -1.2162,
        -0.8000,  0.2590,  0.3042,  0.6546,  0.4970,  0.6004, -2.1689,  1.6548,
        -0.6053,  1.2208,  1.2702, -1.0061,  0.0506, -0.6449, -1.0355, -0.5834,
         0.4870,  0.3367,  0.4290,  0.5942,  1.4122,  0.4739, -0.0477, -0.6388,
        -0.5289, -0.4685, -0.0247,  0.6007, -1.2811,  1.9487, -0.7122,  0.6895,
         1.5356, -0.4739,  0.5500, -0.7482, -1.0250, -1.0849,  0.5584, -0.8335,
         1.1975, -0.2579,  0.5554,  0.1727,  1.7219,  0.8122, -0.7458,  1.0250,
        -0.7357, -1.5008,  0.1059,  0.9300,  1.0316,  0.1732,  0.0671,  1.2842,
        -0.2883, -0.8419, -1.30

In [21]:
for word in test_words:
  input_id = torch.LongTensor([w2i[word]]).to(device)
  emb = skipgram.embedding(input_id)

  print(f"Word: {word}")
  print(max(emb.squeeze(0)))

Word: 음식
tensor(2.7137, device='cuda:0', grad_fn=<UnbindBackward>)
Word: 맛
tensor(2.8000, device='cuda:0', grad_fn=<UnbindBackward>)
Word: 서비스
tensor(2.5724, device='cuda:0', grad_fn=<UnbindBackward>)
Word: 위생
tensor(2.8582, device='cuda:0', grad_fn=<UnbindBackward>)
Word: 가격
tensor(3.4897, device='cuda:0', grad_fn=<UnbindBackward>)
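
Raw embedding values are hard to interpret on their own. One common way to inspect them, which the notebook itself does not do, is to compare pairs of vectors by cosine similarity. The sketch below uses plain Python lists and made-up three-dimensional vectors; `cosine_similarity` and the vectors are illustrative assumptions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical vectors standing in for learned embeddings.
vec_a = [1.0, 2.0, 3.0]
vec_b = [2.0, 4.0, 6.0]   # same direction as vec_a
vec_c = [-3.0, 0.0, 1.0]  # orthogonal to vec_a

sim_ab = cosine_similarity(vec_a, vec_b)
sim_ac = cosine_similarity(vec_a, vec_c)
print(round(sim_ab, 6), round(sim_ac, 6))
```

With the notebook's models, a row could be converted to a list with something like `cbow.embedding.weight[w2i[word]].detach().cpu().tolist()` before being passed to this function. Given how small the corpus and the number of epochs are, any similarities obtained this way should be read with caution.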

