# Doc2Vec
- Created: 2018-06-08
- Last modified: 2018-06-08
- Author: 부현경 (hyunkyung.boo@gmail.com)

## 3. Building the Doc2Vec model
- Load the previously generated corpus and build a Doc2Vec model from it.

#### 3.1 Loading the data (corpus results) from the DB

In [1]:
import mysql.connector
import pandas as pd

In [2]:
table_config = {
    'user': 'root',
    'password': '1234',
    'host': 'localhost',
    'port': 3306,
    'database': 'db_test',
    'raise_on_warnings': True,
    'charset' : 'utf8'
}

conn = None
try:
    conn = mysql.connector.connect(**table_config)

# Backup (optional): export the DataFrame to an Excel file.
#     save_path = "D:\\Train_defined_naver_movie_information_add_corpus_result.xlsx"
#     writer = pd.ExcelWriter(save_path)
#     df.to_excel(writer, 'Sheet1', header=True, index=False)
#     writer.save()

    # Select only rows whose data_type is 'Train' and whose tokenized_user_review is not empty.
    sql_select_data = "select idx, tokenized_user_review from naver_movie_info where data_type = 'Train' and tokenized_user_review != ''"
    df = pd.read_sql(sql_select_data, con=conn)

except Exception as e:
    print(e)

finally:
    # Close the connection only if it was actually opened.
    if conn is not None:
        conn.close()
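
The filter used in the query can be tried without a MySQL server. The following is a minimal sketch using Python's built-in `sqlite3` with an in-memory table. The table and column names mirror the ones used in the cell above; the sample rows are invented for illustration, and SQLite stands in for MySQL only for this check.

```python
import sqlite3

# In-memory stand-in for the MySQL table; the rows below are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table naver_movie_info "
    "(idx integer, data_type text, tokenized_user_review text)"
)
conn.executemany(
    "insert into naver_movie_info values (?, ?, ?)",
    [
        (1, "Train", "번, 보다, 볼때, 재다"),
        (2, "Train", ""),           # empty after morphological analysis
        (3, "Test", "감동, 좋다"),   # not part of the training split
    ],
)

# Same filter as in the notebook: training rows with a non-empty token string.
sql_select_data = (
    "select idx, tokenized_user_review from naver_movie_info "
    "where data_type = 'Train' and tokenized_user_review != ''"
)
rows = conn.execute(sql_select_data).fetchall()
conn.close()

print(rows)
```

Only the first row should survive: the second is empty and the third belongs to a different split.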

In [3]:
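# English letters and digits are dropped by the morphological analysis, so a review can be empty afterwards.
# Therefore only rows with a non-empty tokenized_user_review are used to train the model.
print(df.head(1))

# 81,343 rows in total.
# print(df.shape)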

   idx tokenized_user_review
0    1         번, 보다, 볼때, 재다


In [4]:
# import nltk
# from collections import Counter

# words_list = []
# review_word = []
# tmp = ""
# for r in df['tokenized_user_review']:
#     review_word.append(" ".join(r))
    
# for r in review_word:
#     words_list.append(" ".join(r))
    
# counts = Counter(words_list)
# print(counts,)

#### 3.2 Building the Doc2Vec model

In [5]:
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
import multiprocessing
import logging
import time

# Helper functions used when building the Doc2Vec model.

# Tag each document with its document ID.
def TaggedDocumentation(corpus_df, tag_df):
    custom_taggedDocument = []
    for corpus, tag in zip(corpus_df, tag_df):
        # Each review is stored as a ", "-separated string of tokens.
        splited_corpus = corpus.split(", ")
        custom_taggedDocument.append(TaggedDocument(splited_corpus, tags=[tag]))
    return custom_taggedDocument


# Build and train the model; the model settings are passed in through config.
def create_doc2vec_model(TaggedDocumentation, config):
    model = Doc2Vec(**config)
    model.build_vocab(TaggedDocumentation)

    # Train with a manually decreased learning rate.
    # Note: train() is called model.epochs times and each call itself runs
    # model.epochs passes, so the corpus is seen model.epochs ** 2 times in total.
    start = time.time()
    for epoch in range(model.epochs):
        model.train(TaggedDocumentation, total_examples=len(TaggedDocumentation), epochs=model.epochs)
        model.alpha -= 0.002  # decrease the learning rate
        model.min_alpha = model.alpha  # fix the learning rate, no decay

    end = time.time()
    print("During Time: {}".format(end - start))
    return model


# Save the trained model.
# e.g. save_path = 'model/Doc2vec(dbow+w,d300,n10,hs,w8,mc20,s0.001,t24).model'
save_root = 'D:\\Doc2Vec_model_20180608\\'
def save_doc2vec_model(model, save_path):
    model.save(save_path)
    print(save_path + "에 저장 완료")
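
To make the training schedule in `create_doc2vec_model` concrete, the sketch below replays the loop's arithmetic without gensim. It assumes `epochs=10` and `alpha=0.025` (the values used in the configs later in this notebook), and it assumes that each `train()` call picks up the current `model.alpha` and `model.min_alpha` and runs `epochs` passes at that constant rate. The variable names are illustrative only.

```python
# Replays the schedule of the training loop with plain arithmetic (no gensim needed).
epochs = 10        # assumed: 'epochs' from the config
alpha = 0.025      # assumed: 'alpha' (and 'min_alpha') from the config
step = 0.002       # decrement applied after each train() call

schedule = []      # (call number, learning rate used, passes in that call)
for call in range(1, epochs + 1):
    schedule.append((call, round(alpha, 3), epochs))
    alpha -= step

total_passes = sum(passes for _, _, passes in schedule)
for call, rate, passes in schedule:
    print(f"train() call {call:2d}: alpha={rate:.3f}, passes={passes}")
print("total passes over the corpus:", total_passes)
```

Under these assumptions the corpus is traversed `epochs * epochs` times. If a single run of `epochs` passes with gensim's built-in linear decay is preferred, a common alternative is one call such as `model.train(docs, total_examples=model.corpus_count, epochs=model.epochs)` with `min_alpha` set below `alpha`.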



In [6]:
# idx is used as the tag; TaggedDocumentation() maps each document to its tag.
TaggedDocuments = TaggedDocumentation(df['tokenized_user_review'], df['idx'])
print(TaggedDocuments[10:11])
# print(len(TaggedDocuments))

[TaggedDocument(words=['눈물', '난', '할머니', '힘내다', '감동', '케미', '좋다'], tags=[11])]
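
As a quick check of the tagging step, the sketch below reproduces the split-and-tag logic on a single row. To stay runnable without gensim it uses a `namedtuple` stand-in with the same two fields (`words`, `tags`) as gensim's `TaggedDocument`. `tag_documents` is a hypothetical name, and the sample row is the one shown in the `df.head(1)` output earlier.

```python
from collections import namedtuple

# Stand-in with the same fields as gensim's TaggedDocument (assumption: used here
# only so that the example runs without gensim installed).
TaggedDocument = namedtuple("TaggedDocument", "words tags")


def tag_documents(corpus, tags):
    """Split each ', '-separated review into tokens and attach its idx as the tag."""
    return [
        TaggedDocument(words=review.split(", "), tags=[tag])
        for review, tag in zip(corpus, tags)
    ]


docs = tag_documents(["번, 보다, 볼때, 재다"], [1])
print(docs[0].words)
print(docs[0].tags)
```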


#### When building Doc2Vec models
- A for loop can be used to build several models that differ only in their config.
- See '(Appendix) Config settings for model creation'.
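
One way to organize such a loop is sketched below: a shared base config plus a list of overrides, with a file name derived from each resulting config. `base_config`, `overrides`, and `model_name` are hypothetical names, and the naming scheme is an assumption rather than the one used for the saved models in this notebook. The training and saving calls are left as comments so the sketch runs without gensim.

```python
# Shared settings; only the keys listed in `overrides` change between runs.
base_config = {
    'dm': 0,
    'dbow_words': 1,
    'window': 5,
    'vector_size': 300,
    'sample': 1e-4,
    'min_count': 20,
    'negative': 20,
    'epochs': 10,
}
overrides = [{'sample': 1e-4}, {'sample': 1e-2}, {'window': 8}]


def model_name(config):
    """Build a file name that encodes the main settings (hypothetical scheme)."""
    template = "Doc2vec(d{vector_size},n{negative},w{window},mc{min_count},s{sample},e{epochs}).model"
    return template.format(**config)


# Merge each override into a fresh copy of the base config.
configs = [{**base_config, **override} for override in overrides]
for config in configs:
    # model = create_doc2vec_model(TaggedDocuments, config)
    # save_doc2vec_model(model, save_root + model_name(config))
    print(model_name(config))
```

Deriving the file name from the config keeps the name and the actual settings from drifting apart when a value is changed.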

In [8]:
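# ** 300 dimensions, window of 5 words on each side, learning rate 0.025, 10 epochs, etc.; 'sample': 1e-4 added
# 1) Build the model
# Set the number of CPU cores to use and the logging emitted during training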
cores = multiprocessing.cpu_count()
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

# Doc2Vec parameter settings
config = {
    'dm': 0,  # PV-DBOW / default 1
    'dbow_words': 1,  # w2v simultaneous with DBOW d2v / default 0
    'window': 5,  # distance between the predicted word and context words
    'vector_size': 300,  # vector size
    'alpha': 0.025,  # learning-rate
    'sample': 1e-4,
    'seed': 1234,
    'min_count': 20,  # ignore with freq lower
    'min_alpha': 0.025,  # min learning-rate
    'workers': cores,  # multi cpu
    'hs': 1,  # hierarchical softmax / default 0
    'negative': 20,  # negative sampling / default 5
    'epochs': 10,  # passes over the corpus per train() call, similar to an epoch in deep learning; create_doc2vec_model also calls train() this many times
}

kyobo_model = create_doc2vec_model(TaggedDocuments, config)

# 2) Save the model
doc2vec_save_path = save_root+'0407_Doc2vec(dbow+w,d300,n20,hs,w5,mc20,s0.001,e10).model'
save_doc2vec_model(kyobo_model, doc2vec_save_path)

2018-06-08 17:44:25,848 : INFO : collecting all words and their counts
2018-06-08 17:44:25,851 : INFO : PROGRESS: at example #0, processed 0 words (0/s), 0 word types, 0 tags
2018-06-08 17:44:25,910 : INFO : PROGRESS: at example #10000, processed 117703 words (2160910/s), 12350 word types, 10048 tags
2018-06-08 17:44:25,950 : INFO : PROGRESS: at example #20000, processed 237491 words (2993363/s), 17288 word types, 20088 tags
2018-06-08 17:44:25,990 : INFO : PROGRESS: at example #30000, processed 357715 words (3122096/s), 20929 word types, 30128 tags
2018-06-08 17:44:26,030 : INFO : PROGRESS: at example #40000, processed 476622 words (2975405/s), 23854 word types, 40168 tags
2018-06-08 17:44:26,063 : INFO : PROGRESS: at example #50000, processed 594769 words (3722464/s), 26250 word types, 50216 tags
2018-06-08 17:44:26,103 : INFO : PROGRESS: at example #60000, processed 714605 words (3049773/s), 28391 word types, 60252 tags
2018-06-08 17:44:26,147 : INFO : PROGRESS: at example #70000, p

2018-06-08 17:45:06,225 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-06-08 17:45:06,228 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-06-08 17:45:06,229 : INFO : EPOCH - 5 : training on 967424 raw words (530549 effective words) took 7.2s, 73242 effective words/s
2018-06-08 17:45:07,379 : INFO : EPOCH 6 - PROGRESS: at 13.53% examples, 63055 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:45:08,529 : INFO : EPOCH 6 - PROGRESS: at 29.90% examples, 69586 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:45:09,705 : INFO : EPOCH 6 - PROGRESS: at 46.34% examples, 71230 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:45:10,877 : INFO : EPOCH 6 - PROGRESS: at 62.98% examples, 72152 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:45:11,988 : INFO : EPOCH 6 - PROGRESS: at 79.51% examples, 73488 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:45:13,161 : INFO : EPOCH 6 - PROGRESS: at 96.05% examples, 73638 words/s, in_qsize 4, out_qsize 0
2018-06-08 1

2018-06-08 17:45:56,199 : INFO : EPOCH 2 - PROGRESS: at 92.92% examples, 64023 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:45:56,495 : INFO : worker thread finished; awaiting finish of 3 more threads
2018-06-08 17:45:56,602 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-06-08 17:45:56,611 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-06-08 17:45:56,622 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-06-08 17:45:56,623 : INFO : EPOCH - 2 : training on 967424 raw words (530739 effective words) took 8.1s, 65270 effective words/s
2018-06-08 17:45:57,651 : INFO : EPOCH 3 - PROGRESS: at 11.54% examples, 59614 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:45:58,755 : INFO : EPOCH 3 - PROGRESS: at 23.79% examples, 59625 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:45:59,818 : INFO : EPOCH 3 - PROGRESS: at 38.13% examples, 63711 words/s, in_qsize 8, out_qsize 0
2018-06-08 17:46:00,851 : INFO : EPOCH 3 - PROGRESS: a

2018-06-08 17:46:48,197 : INFO : EPOCH - 8 : training on 967424 raw words (529808 effective words) took 7.7s, 68900 effective words/s
2018-06-08 17:46:49,414 : INFO : EPOCH 9 - PROGRESS: at 13.53% examples, 59411 words/s, in_qsize 8, out_qsize 0
2018-06-08 17:46:50,669 : INFO : EPOCH 9 - PROGRESS: at 24.81% examples, 53498 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:46:51,674 : INFO : EPOCH 9 - PROGRESS: at 32.96% examples, 50590 words/s, in_qsize 8, out_qsize 0
2018-06-08 17:46:52,842 : INFO : EPOCH 9 - PROGRESS: at 43.22% examples, 49686 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:46:53,862 : INFO : EPOCH 9 - PROGRESS: at 52.63% examples, 49416 words/s, in_qsize 8, out_qsize 0
2018-06-08 17:46:54,872 : INFO : EPOCH 9 - PROGRESS: at 66.09% examples, 52576 words/s, in_qsize 8, out_qsize 0
2018-06-08 17:46:55,881 : INFO : EPOCH 9 - PROGRESS: at 77.45% examples, 53544 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:46:56,943 : INFO : EPOCH 9 - PROGRESS: at 89.83% examples, 54539 wor

2018-06-08 17:47:38,898 : INFO : EPOCH - 4 : training on 967424 raw words (530704 effective words) took 8.7s, 61268 effective words/s
2018-06-08 17:47:40,031 : INFO : EPOCH 5 - PROGRESS: at 9.45% examples, 44346 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:47:41,065 : INFO : EPOCH 5 - PROGRESS: at 21.73% examples, 53554 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:47:42,066 : INFO : EPOCH 5 - PROGRESS: at 32.96% examples, 55697 words/s, in_qsize 8, out_qsize 0
2018-06-08 17:47:43,133 : INFO : EPOCH 5 - PROGRESS: at 41.18% examples, 52012 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:47:44,225 : INFO : EPOCH 5 - PROGRESS: at 54.74% examples, 54728 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:47:45,243 : INFO : EPOCH 5 - PROGRESS: at 65.03% examples, 54526 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:47:46,258 : INFO : EPOCH 5 - PROGRESS: at 74.25% examples, 53733 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:47:47,260 : INFO : EPOCH 5 - PROGRESS: at 85.74% examples, 54496 word

2018-06-08 17:48:29,497 : INFO : training on a 9674240 raw words (5304569 effective words) took 84.5s, 62805 effective words/s
2018-06-08 17:48:29,498 : INFO : training model with 4 workers on 4325 vocabulary and 300 features, using sg=1 hs=1 sample=0.0001 negative=20 window=5
2018-06-08 17:48:30,589 : INFO : EPOCH 1 - PROGRESS: at 13.53% examples, 66380 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:48:31,725 : INFO : EPOCH 1 - PROGRESS: at 29.90% examples, 71834 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:48:32,814 : INFO : EPOCH 1 - PROGRESS: at 46.34% examples, 74665 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:48:33,973 : INFO : EPOCH 1 - PROGRESS: at 62.98% examples, 74860 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:48:35,005 : INFO : EPOCH 1 - PROGRESS: at 78.50% examples, 75793 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:48:36,025 : INFO : EPOCH 1 - PROGRESS: at 91.88% examples, 74800 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:48:36,400 : INFO : worker thread fin

2018-06-08 17:49:20,602 : INFO : worker thread finished; awaiting finish of 3 more threads
2018-06-08 17:49:20,661 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-06-08 17:49:20,714 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-06-08 17:49:20,762 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-06-08 17:49:20,763 : INFO : EPOCH - 7 : training on 967424 raw words (530282 effective words) took 7.2s, 73903 effective words/s
2018-06-08 17:49:21,991 : INFO : EPOCH 8 - PROGRESS: at 13.53% examples, 58905 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:49:23,076 : INFO : EPOCH 8 - PROGRESS: at 29.90% examples, 69071 words/s, in_qsize 8, out_qsize 0
2018-06-08 17:49:24,130 : INFO : EPOCH 8 - PROGRESS: at 44.24% examples, 70241 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:49:25,208 : INFO : EPOCH 8 - PROGRESS: at 58.85% examples, 70456 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:49:26,284 : INFO : EPOCH 8 - PROGRESS: a

2018-06-08 17:50:09,168 : INFO : EPOCH 4 - PROGRESS: at 58.85% examples, 68354 words/s, in_qsize 8, out_qsize 0
2018-06-08 17:50:10,217 : INFO : EPOCH 4 - PROGRESS: at 73.22% examples, 69227 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:50:11,218 : INFO : EPOCH 4 - PROGRESS: at 86.78% examples, 69511 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:50:11,856 : INFO : worker thread finished; awaiting finish of 3 more threads
2018-06-08 17:50:11,987 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-06-08 17:50:11,993 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-06-08 17:50:12,029 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-06-08 17:50:12,030 : INFO : EPOCH - 4 : training on 967424 raw words (530565 effective words) took 7.4s, 71321 effective words/s
2018-06-08 17:50:13,181 : INFO : EPOCH 5 - PROGRESS: at 13.53% examples, 62974 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:50:14,335 : INFO : EPOCH 5 - PROGRESS: a

2018-06-08 17:50:57,423 : INFO : EPOCH 1 - PROGRESS: at 29.90% examples, 65438 words/s, in_qsize 8, out_qsize 0
2018-06-08 17:50:58,425 : INFO : EPOCH 1 - PROGRESS: at 45.29% examples, 70211 words/s, in_qsize 8, out_qsize 0
2018-06-08 17:50:59,534 : INFO : EPOCH 1 - PROGRESS: at 60.96% examples, 71154 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:51:00,544 : INFO : EPOCH 1 - PROGRESS: at 75.34% examples, 72021 words/s, in_qsize 8, out_qsize 0
2018-06-08 17:51:01,702 : INFO : EPOCH 1 - PROGRESS: at 89.83% examples, 71010 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:51:02,125 : INFO : worker thread finished; awaiting finish of 3 more threads
2018-06-08 17:51:02,244 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-06-08 17:51:02,308 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-06-08 17:51:02,345 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-06-08 17:51:02,346 : INFO : EPOCH - 1 : training on 967424 raw words (5

2018-06-08 17:51:47,783 : INFO : EPOCH 8 - PROGRESS: at 27.87% examples, 69414 words/s, in_qsize 8, out_qsize 0
2018-06-08 17:51:48,838 : INFO : EPOCH 8 - PROGRESS: at 42.21% examples, 70458 words/s, in_qsize 8, out_qsize 0
2018-06-08 17:51:49,864 : INFO : EPOCH 8 - PROGRESS: at 56.76% examples, 71529 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:51:50,914 : INFO : EPOCH 8 - PROGRESS: at 69.14% examples, 69727 words/s, in_qsize 8, out_qsize 0
2018-06-08 17:51:51,916 : INFO : EPOCH 8 - PROGRESS: at 84.71% examples, 71715 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:51:52,797 : INFO : worker thread finished; awaiting finish of 3 more threads
2018-06-08 17:51:52,975 : INFO : EPOCH 8 - PROGRESS: at 98.17% examples, 71035 words/s, in_qsize 2, out_qsize 1
2018-06-08 17:51:52,976 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-06-08 17:51:52,989 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-06-08 17:51:53,008 : INFO : worker thread finished

2018-06-08 17:52:35,813 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-06-08 17:52:35,829 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-06-08 17:52:35,830 : INFO : EPOCH - 4 : training on 967424 raw words (530190 effective words) took 7.2s, 73360 effective words/s
2018-06-08 17:52:36,999 : INFO : EPOCH 5 - PROGRESS: at 13.53% examples, 61857 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:52:38,029 : INFO : EPOCH 5 - PROGRESS: at 28.86% examples, 70066 words/s, in_qsize 8, out_qsize 0
2018-06-08 17:52:39,148 : INFO : EPOCH 5 - PROGRESS: at 42.21% examples, 67950 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:52:40,196 : INFO : EPOCH 5 - PROGRESS: at 57.78% examples, 70470 words/s, in_qsize 8, out_qsize 0
2018-06-08 17:52:41,206 : INFO : EPOCH 5 - PROGRESS: at 72.17% examples, 71491 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:52:42,233 : INFO : EPOCH 5 - PROGRESS: at 86.78% examples, 72015 words/s, in_qsize 7, out_qsize 0
2018-06-08 1

2018-06-08 17:53:26,375 : INFO : worker thread finished; awaiting finish of 3 more threads
2018-06-08 17:53:26,509 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-06-08 17:53:26,540 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-06-08 17:53:26,563 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-06-08 17:53:26,564 : INFO : EPOCH - 1 : training on 967424 raw words (530528 effective words) took 7.2s, 73852 effective words/s
2018-06-08 17:53:27,601 : INFO : EPOCH 2 - PROGRESS: at 9.45% examples, 48224 words/s, in_qsize 8, out_qsize 0
2018-06-08 17:53:28,644 : INFO : EPOCH 2 - PROGRESS: at 23.79% examples, 61070 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:53:29,659 : INFO : EPOCH 2 - PROGRESS: at 37.08% examples, 63990 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:53:30,770 : INFO : EPOCH 2 - PROGRESS: at 52.63% examples, 66621 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:53:31,800 : INFO : EPOCH 2 - PROGRESS: at

2018-06-08 17:54:16,210 : INFO : EPOCH 8 - PROGRESS: at 75.33% examples, 73038 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:54:17,267 : INFO : EPOCH 8 - PROGRESS: at 89.83% examples, 72956 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:54:17,698 : INFO : worker thread finished; awaiting finish of 3 more threads
2018-06-08 17:54:17,897 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-06-08 17:54:17,916 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-06-08 17:54:17,918 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-06-08 17:54:17,919 : INFO : EPOCH - 8 : training on 967424 raw words (530273 effective words) took 7.2s, 73784 effective words/s
2018-06-08 17:54:19,079 : INFO : EPOCH 9 - PROGRESS: at 13.53% examples, 62625 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:54:20,093 : INFO : EPOCH 9 - PROGRESS: at 28.86% examples, 71100 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:54:21,118 : INFO : EPOCH 9 - PROGRESS: a

2018-06-08 17:55:07,021 : INFO : worker thread finished; awaiting finish of 3 more threads
2018-06-08 17:55:07,091 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-06-08 17:55:07,299 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-06-08 17:55:07,344 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-06-08 17:55:07,345 : INFO : EPOCH - 4 : training on 967424 raw words (530167 effective words) took 10.2s, 52001 effective words/s
2018-06-08 17:55:08,414 : INFO : EPOCH 5 - PROGRESS: at 8.43% examples, 41763 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:55:09,501 : INFO : EPOCH 5 - PROGRESS: at 19.69% examples, 48748 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:55:10,553 : INFO : EPOCH 5 - PROGRESS: at 31.96% examples, 53261 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:55:11,559 : INFO : EPOCH 5 - PROGRESS: at 43.20% examples, 54900 words/s, in_qsize 8, out_qsize 0
2018-06-08 17:55:12,560 : INFO : EPOCH 5 - PROGRESS: a

2018-06-08 17:55:54,954 : INFO : EPOCH 1 - PROGRESS: at 29.90% examples, 74729 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:55:55,974 : INFO : EPOCH 1 - PROGRESS: at 45.29% examples, 76671 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:55:56,976 : INFO : EPOCH 1 - PROGRESS: at 58.85% examples, 75328 words/s, in_qsize 8, out_qsize 0
2018-06-08 17:55:58,024 : INFO : EPOCH 1 - PROGRESS: at 74.25% examples, 75953 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:55:59,107 : INFO : EPOCH 1 - PROGRESS: at 90.93% examples, 76814 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:55:59,455 : INFO : worker thread finished; awaiting finish of 3 more threads
2018-06-08 17:55:59,512 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-06-08 17:55:59,598 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-06-08 17:55:59,612 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-06-08 17:55:59,613 : INFO : EPOCH - 1 : training on 967424 raw words (5

2018-06-08 17:56:43,923 : INFO : EPOCH 8 - PROGRESS: at 13.53% examples, 66883 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:56:45,026 : INFO : EPOCH 8 - PROGRESS: at 29.90% examples, 73135 words/s, in_qsize 8, out_qsize 0
2018-06-08 17:56:46,072 : INFO : EPOCH 8 - PROGRESS: at 42.21% examples, 69836 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:56:47,096 : INFO : EPOCH 8 - PROGRESS: at 57.78% examples, 72379 words/s, in_qsize 8, out_qsize 0
2018-06-08 17:56:48,205 : INFO : EPOCH 8 - PROGRESS: at 74.25% examples, 73725 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:56:49,308 : INFO : EPOCH 8 - PROGRESS: at 90.93% examples, 74691 words/s, in_qsize 7, out_qsize 0
2018-06-08 17:56:49,597 : INFO : worker thread finished; awaiting finish of 3 more threads
2018-06-08 17:56:49,688 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-06-08 17:56:49,742 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-06-08 17:56:49,789 : INFO : worker thread finished

During Time: 760.9719998836517


2018-06-08 17:57:09,209 : INFO : saved D:\Doc2Vec_model_20180608\0407_Doc2vec(dbow+w,d300,n20,hs,w5,mc20,s0.001,e10).model


D:\Doc2Vec_model_20180608\0407_Doc2vec(dbow+w,d300,n20,hs,w5,mc20,s0.001,e10).model에 저장 완료
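
The epoch summaries in the log above report roughly 530,000 effective words out of 967,424 raw words per pass. Part of that gap comes from `min_count` dropping rare words, and part from `sample`, which randomly skips occurrences of very frequent words. The sketch below illustrates the keep-probability rule from the original word2vec implementation; that gensim applies exactly this formula is an assumption here, and the corpus size and word counts are invented.

```python
import math


def keep_probability(count, total, sample):
    """Probability that one occurrence of a word survives frequent-word downsampling."""
    threshold = sample * total
    p = (math.sqrt(count / threshold) + 1) * (threshold / count)
    return min(p, 1.0)


total = 1_000_000                                          # invented corpus size
counts = {'common': 50_000, 'medium': 1_000, 'rare': 50}   # invented word counts

# Compare a strict and a lenient downsampling threshold.
for sample in (1e-4, 1e-2):
    kept = {w: round(keep_probability(c, total, sample), 3) for w, c in counts.items()}
    print(sample, kept)

# Share of raw words reported as effective in one epoch summary of the log above.
effective_share = 530549 / 967424
print(round(effective_share, 3))
```

A larger `sample` value keeps more occurrences of frequent words, so more of the raw words count as effective.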


In [9]:
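# ** 300 dimensions, window of 5 words on each side, learning rate 0.025, 10 epochs, etc.; 'sample' set to 1e-2
# 1) Build the model
# Set the number of CPU cores to use and the logging emitted during training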
cores = multiprocessing.cpu_count()
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

# Doc2Vec parameter settings
config = {
    'dm': 0,  # PV-DBOW / default 1
    'dbow_words': 1,  # w2v simultaneous with DBOW d2v / default 0
    'window': 5,  # distance between the predicted word and context words
    'vector_size': 300,  # vector size
    'alpha': 0.025,  # learning-rate
    'sample': 1e-2,
    'seed': 1234,
    'min_count': 20,  # ignore with freq lower
    'min_alpha': 0.025,  # min learning-rate
    'workers': cores,  # multi cpu
    'hs': 1,  # hierarchical softmax / default 0
    'negative': 20,  # negative sampling / default 5
    'epochs': 10,  # passes over the corpus per train() call, similar to an epoch in deep learning; create_doc2vec_model also calls train() this many times
}

kyobo_model = create_doc2vec_model(TaggedDocuments, config)

# 2) Save the model
doc2vec_save_path = save_root+'0408_Doc2vec(dbow+w,d300,n20,hs,w5,mc20,s0.001,e10).model'
save_doc2vec_model(kyobo_model, doc2vec_save_path)

2018-06-08 18:01:46,219 : INFO : collecting all words and their counts
2018-06-08 18:01:46,224 : INFO : PROGRESS: at example #0, processed 0 words (0/s), 0 word types, 0 tags
2018-06-08 18:01:46,276 : INFO : PROGRESS: at example #10000, processed 117703 words (2325212/s), 12350 word types, 10048 tags
2018-06-08 18:01:46,319 : INFO : PROGRESS: at example #20000, processed 237491 words (2996329/s), 17288 word types, 20088 tags
2018-06-08 18:01:46,350 : INFO : PROGRESS: at example #30000, processed 357715 words (3991805/s), 20929 word types, 30128 tags
2018-06-08 18:01:46,390 : INFO : PROGRESS: at example #40000, processed 476622 words (3066188/s), 23854 word types, 40168 tags
2018-06-08 18:01:46,425 : INFO : PROGRESS: at example #50000, processed 594769 words (3403819/s), 26250 word types, 50216 tags
2018-06-08 18:01:46,462 : INFO : PROGRESS: at example #60000, processed 714605 words (3320625/s), 28391 word types, 60252 tags
2018-06-08 18:01:46,500 : INFO : PROGRESS: at example #70000, p

2018-06-08 18:02:40,258 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-06-08 18:02:40,327 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-06-08 18:02:40,356 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-06-08 18:02:40,357 : INFO : EPOCH - 3 : training on 967424 raw words (941353 effective words) took 17.7s, 53044 effective words/s
2018-06-08 18:02:41,836 : INFO : EPOCH 4 - PROGRESS: at 5.24% examples, 33014 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:02:42,919 : INFO : EPOCH 4 - PROGRESS: at 11.54% examples, 41869 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:02:44,149 : INFO : EPOCH 4 - PROGRESS: at 17.64% examples, 43587 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:02:45,273 : INFO : EPOCH 4 - PROGRESS: at 24.81% examples, 47483 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:02:46,521 : INFO : EPOCH 4 - PROGRESS: at 31.96% examples, 48887 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:02:47,820 : INFO : 

2018-06-08 18:03:46,669 : INFO : EPOCH 8 - PROGRESS: at 5.32% examples, 36624 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:03:47,711 : INFO : EPOCH 8 - PROGRESS: at 12.52% examples, 49165 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:03:48,761 : INFO : EPOCH 8 - PROGRESS: at 19.69% examples, 53903 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:03:49,797 : INFO : EPOCH 8 - PROGRESS: at 26.82% examples, 56618 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:03:50,889 : INFO : EPOCH 8 - PROGRESS: at 33.98% examples, 57739 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:03:52,110 : INFO : EPOCH 8 - PROGRESS: at 42.21% examples, 58813 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:03:53,219 : INFO : EPOCH 8 - PROGRESS: at 50.51% examples, 60399 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:03:54,313 : INFO : EPOCH 8 - PROGRESS: at 58.85% examples, 61709 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:03:55,395 : INFO : EPOCH 8 - PROGRESS: at 67.10% examples, 62815 words/s, in_qsize 7, out_q

2018-06-08 18:04:47,060 : INFO : EPOCH 2 - PROGRESS: at 38.13% examples, 64457 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:04:48,125 : INFO : EPOCH 2 - PROGRESS: at 46.34% examples, 65825 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:04:49,201 : INFO : EPOCH 2 - PROGRESS: at 54.74% examples, 66732 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:04:50,400 : INFO : EPOCH 2 - PROGRESS: at 62.98% examples, 66489 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:04:51,467 : INFO : EPOCH 2 - PROGRESS: at 71.17% examples, 67182 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:04:52,477 : INFO : EPOCH 2 - PROGRESS: at 78.50% examples, 67223 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:04:53,593 : INFO : EPOCH 2 - PROGRESS: at 86.78% examples, 67475 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:04:54,675 : INFO : EPOCH 2 - PROGRESS: at 95.01% examples, 67848 words/s, in_qsize 5, out_qsize 0
2018-06-08 18:04:54,915 : INFO : worker thread finished; awaiting finish of 3 more threads
2018-06-08 18

2018-06-08 18:05:50,185 : INFO : worker thread finished; awaiting finish of 3 more threads
2018-06-08 18:05:50,187 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-06-08 18:05:50,332 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-06-08 18:05:50,450 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-06-08 18:05:50,451 : INFO : EPOCH - 6 : training on 967424 raw words (941553 effective words) took 13.7s, 68640 effective words/s
2018-06-08 18:05:51,556 : INFO : EPOCH 7 - PROGRESS: at 5.32% examples, 44419 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:05:52,680 : INFO : EPOCH 7 - PROGRESS: at 13.53% examples, 56846 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:05:53,767 : INFO : EPOCH 7 - PROGRESS: at 21.73% examples, 61592 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:05:54,923 : INFO : EPOCH 7 - PROGRESS: at 29.90% examples, 63020 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:05:56,015 : INFO : EPOCH 7 - PROGRESS: a

2018-06-08 18:06:47,501 : INFO : EPOCH 1 - PROGRESS: at 13.54% examples, 57722 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:06:48,605 : INFO : EPOCH 1 - PROGRESS: at 21.73% examples, 61953 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:06:49,727 : INFO : EPOCH 1 - PROGRESS: at 29.90% examples, 63776 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:06:50,892 : INFO : EPOCH 1 - PROGRESS: at 38.13% examples, 64406 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:06:51,938 : INFO : EPOCH 1 - PROGRESS: at 46.34% examples, 65973 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:06:53,021 : INFO : EPOCH 1 - PROGRESS: at 54.74% examples, 66786 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:06:54,152 : INFO : EPOCH 1 - PROGRESS: at 62.98% examples, 67061 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:06:55,269 : INFO : EPOCH 1 - PROGRESS: at 71.17% examples, 67354 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:06:56,411 : INFO : EPOCH 1 - PROGRESS: at 79.51% examples, 67455 words/s, in_qsize 7, out_

2018-06-08 18:07:51,118 : INFO : EPOCH 5 - PROGRESS: at 79.51% examples, 68315 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:07:52,162 : INFO : EPOCH 5 - PROGRESS: at 87.78% examples, 68866 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:07:53,226 : INFO : EPOCH 5 - PROGRESS: at 96.05% examples, 69214 words/s, in_qsize 4, out_qsize 0
2018-06-08 18:07:53,288 : INFO : worker thread finished; awaiting finish of 3 more threads
2018-06-08 18:07:53,390 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-06-08 18:07:53,601 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-06-08 18:07:53,667 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-06-08 18:07:53,668 : INFO : EPOCH - 5 : training on 967424 raw words (941491 effective words) took 13.5s, 69700 effective words/s
2018-06-08 18:07:54,829 : INFO : EPOCH 6 - PROGRESS: at 5.32% examples, 42340 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:07:55,962 : INFO : EPOCH 6 - PROGRESS: a

2018-06-08 18:08:49,243 : INFO : EPOCH 10 - PROGRESS: at 5.32% examples, 47366 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:08:50,366 : INFO : EPOCH 10 - PROGRESS: at 13.53% examples, 58677 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:08:51,499 : INFO : EPOCH 10 - PROGRESS: at 21.73% examples, 62058 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:08:52,613 : INFO : EPOCH 10 - PROGRESS: at 29.90% examples, 63994 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:08:53,693 : INFO : EPOCH 10 - PROGRESS: at 38.13% examples, 65583 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:08:54,729 : INFO : EPOCH 10 - PROGRESS: at 46.34% examples, 67087 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:08:55,840 : INFO : EPOCH 10 - PROGRESS: at 54.74% examples, 67497 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:08:57,016 : INFO : EPOCH 10 - PROGRESS: at 62.98% examples, 67332 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:08:58,181 : INFO : EPOCH 10 - PROGRESS: at 71.17% examples, 67275 words/s, in_qsize

2018-06-08 18:09:49,090 : INFO : EPOCH 4 - PROGRESS: at 46.34% examples, 66819 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:09:50,142 : INFO : EPOCH 4 - PROGRESS: at 54.68% examples, 67788 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:09:51,181 : INFO : EPOCH 4 - PROGRESS: at 62.98% examples, 68652 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:09:52,277 : INFO : EPOCH 4 - PROGRESS: at 71.17% examples, 68929 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:09:53,384 : INFO : EPOCH 4 - PROGRESS: at 79.51% examples, 69082 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:09:54,624 : INFO : EPOCH 4 - PROGRESS: at 87.78% examples, 68455 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:09:55,670 : INFO : EPOCH 4 - PROGRESS: at 96.05% examples, 68935 words/s, in_qsize 4, out_qsize 0
2018-06-08 18:09:55,680 : INFO : worker thread finished; awaiting finish of 3 more threads
2018-06-08 18:09:55,687 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-06-08 18:09:55,873 : INFO : w

2018-06-08 18:10:50,028 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-06-08 18:10:50,320 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-06-08 18:10:50,322 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-06-08 18:10:50,323 : INFO : EPOCH - 8 : training on 967424 raw words (941408 effective words) took 13.4s, 70072 effective words/s
2018-06-08 18:10:51,360 : INFO : EPOCH 9 - PROGRESS: at 5.32% examples, 47419 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:10:52,483 : INFO : EPOCH 9 - PROGRESS: at 13.53% examples, 58752 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:10:53,622 : INFO : EPOCH 9 - PROGRESS: at 21.73% examples, 61979 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:10:54,745 : INFO : EPOCH 9 - PROGRESS: at 29.90% examples, 63796 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:10:55,807 : INFO : EPOCH 9 - PROGRESS: at 38.13% examples, 65631 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:10:56,849 : INFO : 

2018-06-08 18:11:47,483 : INFO : EPOCH 3 - PROGRESS: at 21.73% examples, 63376 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:11:48,520 : INFO : EPOCH 3 - PROGRESS: at 29.90% examples, 66181 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:11:49,612 : INFO : EPOCH 3 - PROGRESS: at 38.14% examples, 67199 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:11:50,823 : INFO : EPOCH 3 - PROGRESS: at 46.34% examples, 66654 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:11:51,871 : INFO : EPOCH 3 - PROGRESS: at 54.74% examples, 67693 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:11:52,917 : INFO : EPOCH 3 - PROGRESS: at 62.98% examples, 68513 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:11:54,002 : INFO : EPOCH 3 - PROGRESS: at 71.17% examples, 68870 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:11:55,051 : INFO : EPOCH 3 - PROGRESS: at 79.51% examples, 69406 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:11:56,268 : INFO : EPOCH 3 - PROGRESS: at 87.80% examples, 68872 words/s, in_qsize 7, out_

2018-06-08 18:12:50,003 : INFO : EPOCH 7 - PROGRESS: at 87.78% examples, 68965 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:12:51,052 : INFO : EPOCH 7 - PROGRESS: at 96.05% examples, 69388 words/s, in_qsize 4, out_qsize 0
2018-06-08 18:12:51,088 : INFO : worker thread finished; awaiting finish of 3 more threads
2018-06-08 18:12:51,107 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-06-08 18:12:51,308 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-06-08 18:12:51,368 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-06-08 18:12:51,369 : INFO : EPOCH - 7 : training on 967424 raw words (941410 effective words) took 13.3s, 70523 effective words/s
2018-06-08 18:12:52,478 : INFO : EPOCH 8 - PROGRESS: at 5.32% examples, 44170 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:12:53,555 : INFO : EPOCH 8 - PROGRESS: at 13.54% examples, 57906 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:12:54,727 : INFO : EPOCH 8 - PROGRESS: a

2018-06-08 18:13:45,515 : INFO : EPOCH - 1 : training on 967424 raw words (941390 effective words) took 13.4s, 70470 effective words/s
2018-06-08 18:13:46,551 : INFO : EPOCH 2 - PROGRESS: at 5.32% examples, 47320 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:13:47,718 : INFO : EPOCH 2 - PROGRESS: at 13.53% examples, 57484 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:13:48,850 : INFO : EPOCH 2 - PROGRESS: at 21.73% examples, 61241 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:13:49,902 : INFO : EPOCH 2 - PROGRESS: at 29.90% examples, 64253 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:13:50,957 : INFO : EPOCH 2 - PROGRESS: at 38.13% examples, 66087 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:13:51,999 : INFO : EPOCH 2 - PROGRESS: at 46.34% examples, 67460 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:13:53,165 : INFO : EPOCH 2 - PROGRESS: at 54.74% examples, 67334 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:13:54,361 : INFO : EPOCH 2 - PROGRESS: at 62.98% examples, 67040 wor

2018-06-08 18:14:48,134 : INFO : EPOCH 6 - PROGRESS: at 62.98% examples, 67802 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:14:49,181 : INFO : EPOCH 6 - PROGRESS: at 71.17% examples, 68498 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:14:50,368 : INFO : EPOCH 6 - PROGRESS: at 79.51% examples, 68201 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:14:51,513 : INFO : EPOCH 6 - PROGRESS: at 87.78% examples, 68193 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:14:52,624 : INFO : EPOCH 6 - PROGRESS: at 96.05% examples, 68348 words/s, in_qsize 4, out_qsize 0
2018-06-08 18:14:52,666 : INFO : worker thread finished; awaiting finish of 3 more threads
2018-06-08 18:14:52,761 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-06-08 18:14:52,908 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-06-08 18:14:52,945 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-06-08 18:14:52,946 : INFO : EPOCH - 6 : training on 967424 raw words (9

2018-06-08 18:15:46,953 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-06-08 18:15:46,954 : INFO : EPOCH - 10 : training on 967424 raw words (941474 effective words) took 13.4s, 70380 effective words/s
2018-06-08 18:15:46,955 : INFO : training on a 9674240 raw words (9413984 effective words) took 134.8s, 69838 effective words/s
2018-06-08 18:15:46,956 : INFO : training model with 4 workers on 4325 vocabulary and 300 features, using sg=1 hs=1 sample=0.01 negative=20 window=5
2018-06-08 18:15:48,017 : INFO : EPOCH 1 - PROGRESS: at 5.32% examples, 46246 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:15:49,215 : INFO : EPOCH 1 - PROGRESS: at 13.53% examples, 56095 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:15:50,259 : INFO : EPOCH 1 - PROGRESS: at 21.73% examples, 61854 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:15:51,311 : INFO : EPOCH 1 - PROGRESS: at 29.90% examples, 64744 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:15:52,427 : INFO : EPOCH 1 - PROGRE

2018-06-08 18:16:45,219 : INFO : EPOCH 5 - PROGRESS: at 29.90% examples, 65404 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:16:46,330 : INFO : EPOCH 5 - PROGRESS: at 38.13% examples, 66360 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:16:47,451 : INFO : EPOCH 5 - PROGRESS: at 46.34% examples, 66882 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:16:48,505 : INFO : EPOCH 5 - PROGRESS: at 54.74% examples, 67834 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:16:49,620 : INFO : EPOCH 5 - PROGRESS: at 62.98% examples, 68091 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:16:50,675 : INFO : EPOCH 5 - PROGRESS: at 71.17% examples, 68707 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:16:51,781 : INFO : EPOCH 5 - PROGRESS: at 79.51% examples, 68896 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:16:52,914 : INFO : EPOCH 5 - PROGRESS: at 87.78% examples, 68895 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:16:53,979 : INFO : EPOCH 5 - PROGRESS: at 96.05% examples, 69238 words/s, in_qsize 4, out_

2018-06-08 18:17:48,166 : INFO : EPOCH 9 - PROGRESS: at 96.05% examples, 69023 words/s, in_qsize 4, out_qsize 0
2018-06-08 18:17:48,259 : INFO : worker thread finished; awaiting finish of 3 more threads
2018-06-08 18:17:48,341 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-06-08 18:17:48,499 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-06-08 18:17:48,556 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-06-08 18:17:48,557 : INFO : EPOCH - 9 : training on 967424 raw words (941435 effective words) took 13.5s, 69775 effective words/s
2018-06-08 18:17:49,578 : INFO : EPOCH 10 - PROGRESS: at 5.32% examples, 48037 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:17:50,625 : INFO : EPOCH 10 - PROGRESS: at 13.53% examples, 61273 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:17:51,739 : INFO : EPOCH 10 - PROGRESS: at 21.73% examples, 64219 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:17:52,775 : INFO : EPOCH 10 - PROGRES

2018-06-08 18:18:42,697 : INFO : EPOCH - 3 : training on 967424 raw words (941402 effective words) took 13.5s, 69693 effective words/s
2018-06-08 18:18:43,712 : INFO : EPOCH 4 - PROGRESS: at 5.32% examples, 48277 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:18:44,771 : INFO : EPOCH 4 - PROGRESS: at 13.53% examples, 61065 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:18:45,890 : INFO : EPOCH 4 - PROGRESS: at 21.73% examples, 63960 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:18:46,950 : INFO : EPOCH 4 - PROGRESS: at 29.90% examples, 66272 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:18:48,118 : INFO : EPOCH 4 - PROGRESS: at 38.13% examples, 66341 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:18:49,167 : INFO : EPOCH 4 - PROGRESS: at 46.34% examples, 67610 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:18:50,204 : INFO : EPOCH 4 - PROGRESS: at 54.74% examples, 68641 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:18:51,299 : INFO : EPOCH 4 - PROGRESS: at 62.98% examples, 68954 wor

2018-06-08 18:19:46,788 : INFO : EPOCH 8 - PROGRESS: at 48.47% examples, 56420 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:19:47,948 : INFO : EPOCH 8 - PROGRESS: at 56.76% examples, 57747 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:19:49,104 : INFO : EPOCH 8 - PROGRESS: at 65.08% examples, 58830 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:19:50,460 : INFO : EPOCH 8 - PROGRESS: at 73.20% examples, 58661 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:19:51,629 : INFO : EPOCH 8 - PROGRESS: at 80.54% examples, 58645 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:19:52,650 : INFO : EPOCH 8 - PROGRESS: at 85.74% examples, 57850 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:19:53,713 : INFO : EPOCH 8 - PROGRESS: at 90.93% examples, 57001 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:19:54,603 : INFO : worker thread finished; awaiting finish of 3 more threads
2018-06-08 18:19:54,652 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-06-08 18:19:54,789 : INFO : E

2018-06-08 18:20:46,697 : INFO : EPOCH 2 - PROGRESS: at 37.08% examples, 59562 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:20:47,778 : INFO : EPOCH 2 - PROGRESS: at 44.24% examples, 60085 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:20:48,904 : INFO : EPOCH 2 - PROGRESS: at 49.50% examples, 57717 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:20:50,300 : INFO : EPOCH 2 - PROGRESS: at 56.77% examples, 56392 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:20:51,336 : INFO : EPOCH 2 - PROGRESS: at 62.98% examples, 56397 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:20:52,546 : INFO : EPOCH 2 - PROGRESS: at 68.15% examples, 54726 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:20:53,885 : INFO : EPOCH 2 - PROGRESS: at 76.42% examples, 55084 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:20:54,956 : INFO : EPOCH 2 - PROGRESS: at 81.64% examples, 54358 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:20:56,018 : INFO : EPOCH 2 - PROGRESS: at 87.78% examples, 54409 words/s, in_qsize 8, out_

2018-06-08 18:21:54,012 : INFO : EPOCH 6 - PROGRESS: at 78.50% examples, 60937 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:21:55,115 : INFO : EPOCH 6 - PROGRESS: at 86.78% examples, 61753 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:21:56,263 : INFO : EPOCH 6 - PROGRESS: at 95.01% examples, 62243 words/s, in_qsize 5, out_qsize 0
2018-06-08 18:21:56,654 : INFO : worker thread finished; awaiting finish of 3 more threads
2018-06-08 18:21:56,709 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-06-08 18:21:56,831 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-06-08 18:21:56,965 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-06-08 18:21:56,967 : INFO : EPOCH - 6 : training on 967424 raw words (941390 effective words) took 15.1s, 62443 effective words/s
2018-06-08 18:21:58,301 : INFO : EPOCH 7 - PROGRESS: at 5.32% examples, 36754 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:21:59,523 : INFO : EPOCH 7 - PROGRESS: a

2018-06-08 18:22:58,151 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-06-08 18:22:58,266 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-06-08 18:22:58,287 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-06-08 18:22:58,288 : INFO : EPOCH - 10 : training on 967424 raw words (941310 effective words) took 15.5s, 60583 effective words/s
2018-06-08 18:22:58,289 : INFO : training on a 9674240 raw words (9413442 effective words) took 153.6s, 61270 effective words/s
2018-06-08 18:22:58,289 : INFO : training model with 4 workers on 4325 vocabulary and 300 features, using sg=1 hs=1 sample=0.01 negative=20 window=5
2018-06-08 18:22:59,470 : INFO : EPOCH 1 - PROGRESS: at 5.32% examples, 41682 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:23:00,662 : INFO : EPOCH 1 - PROGRESS: at 13.53% examples, 53475 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:23:01,690 : INFO : EPOCH 1 - PROGRESS: at 20.71% examples, 57294 words/s, in_

2018-06-08 18:24:00,473 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-06-08 18:24:00,474 : INFO : EPOCH - 4 : training on 967424 raw words (941552 effective words) took 16.7s, 56462 effective words/s
2018-06-08 18:24:01,630 : INFO : EPOCH 5 - PROGRESS: at 5.32% examples, 42384 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:24:03,071 : INFO : EPOCH 5 - PROGRESS: at 13.53% examples, 48740 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:24:04,256 : INFO : EPOCH 5 - PROGRESS: at 21.73% examples, 53995 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:24:05,494 : INFO : EPOCH 5 - PROGRESS: at 29.90% examples, 56153 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:24:06,868 : INFO : EPOCH 5 - PROGRESS: at 37.08% examples, 54723 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:24:08,012 : INFO : EPOCH 5 - PROGRESS: at 45.29% examples, 56745 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:24:09,031 : INFO : EPOCH 5 - PROGRESS: at 52.63% examples, 57928 words/s, in_qsize 7, out

2018-06-08 18:25:09,167 : INFO : EPOCH 8 - PROGRESS: at 100.00% examples, 43832 words/s, in_qsize 0, out_qsize 1
2018-06-08 18:25:09,168 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-06-08 18:25:09,170 : INFO : EPOCH - 8 : training on 967424 raw words (941416 effective words) took 21.5s, 43825 effective words/s
2018-06-08 18:25:10,645 : INFO : EPOCH 9 - PROGRESS: at 5.32% examples, 33262 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:25:12,181 : INFO : EPOCH 9 - PROGRESS: at 12.52% examples, 38849 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:25:13,448 : INFO : EPOCH 9 - PROGRESS: at 18.67% examples, 40935 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:25:14,489 : INFO : EPOCH 9 - PROGRESS: at 23.79% examples, 42061 words/s, in_qsize 8, out_qsize 0
2018-06-08 18:25:15,709 : INFO : EPOCH 9 - PROGRESS: at 28.86% examples, 41616 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:25:16,876 : INFO : EPOCH 9 - PROGRESS: at 32.96% examples, 40355 words/s, in_qsize 8, ou

During Time: 1445.2309997081757


2018-06-08 18:25:53,763 : INFO : saved D:\Doc2Vec_model_20180608\0408_Doc2vec(dbow+w,d300,n20,hs,w5,mc20,s0.001,e10).model


Saved to D:\Doc2Vec_model_20180608\0408_Doc2vec(dbow+w,d300,n20,hs,w5,mc20,s0.001,e10).model
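
The training log above is long, and the per-epoch summary lines (`EPOCH - N : training on ... took ...s, ... effective words/s`) are the parts that matter when comparing runs. The following is a minimal sketch, using only the standard library, of how those summary lines could be pulled out of captured log text. The names `EPOCH_SUMMARY`, `summarize_epochs`, and `sample_log` are hypothetical and not part of the original notebook. The regular expression is an assumption based on the line format shown in this log, and may need adjusting for other gensim versions. Lines that do not fully match the pattern, such as progress lines or truncated lines, are skipped.

```python
import re
from statistics import mean

# Assumed format of gensim's per-epoch summary line, copied from the log above:
# "... : INFO : EPOCH - 8 : training on 967424 raw words
#  (941408 effective words) took 13.4s, 70072 effective words/s"
EPOCH_SUMMARY = re.compile(
    r"EPOCH - (?P<epoch>\d+) : training on (?P<raw>\d+) raw words "
    r"\((?P<effective>\d+) effective words\) took (?P<seconds>[\d.]+)s, "
    r"(?P<rate>\d+) effective words/s"
)


def summarize_epochs(log_text):
    """Return one dict per completed-epoch summary line found in log_text."""
    rows = []
    for line in log_text.splitlines():
        match = EPOCH_SUMMARY.search(line)
        if match is None:
            # Progress lines, worker-thread lines, and truncated lines land here.
            continue
        rows.append({
            "epoch": int(match["epoch"]),
            "raw_words": int(match["raw"]),
            "effective_words": int(match["effective"]),
            "seconds": float(match["seconds"]),
            "words_per_sec": int(match["rate"]),
        })
    return rows


# Three lines copied from the log above: two summaries and one progress line.
sample_log = """\
2018-06-08 18:10:50,323 : INFO : EPOCH - 8 : training on 967424 raw words (941408 effective words) took 13.4s, 70072 effective words/s
2018-06-08 18:10:51,360 : INFO : EPOCH 9 - PROGRESS: at 5.32% examples, 47419 words/s, in_qsize 7, out_qsize 0
2018-06-08 18:25:09,170 : INFO : EPOCH - 8 : training on 967424 raw words (941416 effective words) took 21.5s, 43825 effective words/s
"""

rows = summarize_epochs(sample_log)
for row in rows:
    print(row["epoch"], row["seconds"], row["words_per_sec"])
print("mean words/s:", mean(r["words_per_sec"] for r in rows))
```

In the two sampled summaries, the later pass took 21.5s per epoch against 13.4s for the earlier one, which matches the gradual slowdown visible in the progress lines.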


In [11]:
print('finished!')

finished!
