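# **Mini-Project 4: 1:1 Inquiry Type Classifier**
# Step 3: Text classification

### Problem definition
> Classifying the content of 1:1 inquiries<br>
> 1. Analyze the inquiry content
> 2. Evaluate the performance of inquiry classification models
### Training data
> * 1:1 inquiry data: train.csv

### Variables
> * text: inquiry content
> * label: inquiry type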

### References
> * Machine Learning
>> * [sklearn-tutorial](https://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html)
> * Deep Learning
>> * [Google Tutorial](https://developers.google.com/machine-learning/guides/text-classification)
>> * [Tensorflow Tutorial](https://www.tensorflow.org/tutorials/keras/text_classification)
>> * [Keras-tutorial](https://keras.io/examples/nlp/text_classification_from_scratch/)
>> * [BERT-tutorial](https://www.tensorflow.org/text/guide/bert_preprocessing_guide)

## 1. Development environment setup

### 1-1. Install libraries

In [None]:
# Install the required libraries first.
!pip install konlpy pandas seaborn gensim wordcloud python-mecab-ko wget

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


### 1-2. Import libraries

In [None]:
from mecab import MeCab
import os
import pickle

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.font_manager as fm
import seaborn as sns
import tensorflow as tf
import nltk
import wget
from IPython.display import display

### 1-3. Korean font setup (Windows)

In [None]:
# Restart the runtime after this cell finishes!
# !sudo apt-get install -y fonts-nanum
# !sudo fc-cache -fv
# !rm ~/.cache/matplotlib -rf

In [None]:
import matplotlib.pyplot as plt
import matplotlib.font_manager as fm

fontpath = '/usr/share/fonts/truetype/nanum/NanumBarunGothic.ttf'
font = fm.FontProperties(fname=fontpath, size=9)
plt.rc('font', family='NanumBarunGothic')

### 1-4. Java path setup (Windows)

In [None]:
# Raw string so the backslashes in the Windows path are not read as escape sequences
os.environ['JAVA_HOME'] = r"C:\Program Files\Java\jdk-19"

### 1-3. Korean font setup (Colab)

In [None]:
!sudo apt-get install -y fonts-nanum

Reading package lists... Done
Building dependency tree       
Reading state information... Done
fonts-nanum is already the newest version (20180306-3).
0 upgraded, 0 newly installed, 0 to remove and 24 not upgraded.


In [None]:
FONT_PATH = '/usr/share/fonts/truetype/nanum/NanumGothic.ttf'
font_name = fm.FontProperties(fname=FONT_PATH, size=10).get_name()
print(font_name)
plt.rcParams['font.family']=font_name
assert plt.rcParams['font.family'] == [font_name], "한글 폰트가 설정되지 않았습니다."

NanumGothic


### 1-4. Mount Google Drive (Colab)

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## 2. Load the preprocessed data
* Load the data preprocessed on days 1 and 2.
* For sparse data, use scipy.sparse.load_npz.
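A minimal sketch of the sparse save/load round trip, assuming the sparse matrices (for example n-gram count matrices) were written with `scipy.sparse.save_npz` on the earlier days. The matrix and the file name `x_train_ngram.npz` are hypothetical stand-ins.

```python
import os
import tempfile

import numpy as np
import scipy.sparse

# Hypothetical mostly-zero matrix standing in for an n-gram count matrix
dense = np.array([[0, 0, 3],
                  [1, 0, 0],
                  [0, 2, 0]])
sparse = scipy.sparse.csr_matrix(dense)

# save_npz writes only the sparse structure (indices + nonzero values) to a .npz file
path = os.path.join(tempfile.mkdtemp(), "x_train_ngram.npz")  # hypothetical file name
scipy.sparse.save_npz(path, sparse)

# load_npz reads it back as a sparse matrix, without densifying it
loaded = scipy.sparse.load_npz(path)
print(loaded.shape, loaded.nnz)  # (3, 3) 3
```

Calling `.toarray()` is only needed when a model requires a dense array; with a large vocabulary the dense version can be far larger than the sparse one.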

In [None]:
train = pd.read_csv("/content/drive/MyDrive/AIVLE/4월/4차 미니프로젝트/for_train.csv")

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer, TfidfTransformer
import numpy as np

In [None]:
x_train = pd.read_csv("/content/drive/MyDrive/AIVLE/4월/4차 미니프로젝트/real/x_train.csv")
x_val = pd.read_csv("/content/drive/MyDrive/AIVLE/4월/4차 미니프로젝트/real/x_val.csv")
y_train = pd.read_csv("/content/drive/MyDrive/AIVLE/4월/4차 미니프로젝트/real/y_train.csv")
y_val = pd.read_csv("/content/drive/MyDrive/AIVLE/4월/4차 미니프로젝트/real/y_val.csv")

In [None]:
x_train.head()

|   | Unnamed: 0 | text | text_length | mecab_nouns | mecab_pos | mecab_morphs | TTR | tk_words1 | tk_words2 | tk_words3 | tk_pos |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2622 | 기존에 있던 파일을 삭제해서 윈도우10.ova 가져오기가 안되는데 혹시 다시 파... | 60 | ['기존', '파일', '삭제', '윈도우', '파일', '수'] | [('기존', 'NNG'), ('에', 'JKB'), ('있', 'VV'), ('던... | ['기존', '에', '있', '던', '파일', '을', '삭제', '해서', '... | 0.000168 | ['기존에', '있던', '파일을', '삭제해서', '윈도우10.ova', '가져오... | ['기존에', '있던', '파일을', '삭제해서', '윈도우10', '.', 'ov... | ['기존에', '있던', '파일을', '삭제해서', '윈도우10', 'ova', '... | [('기존', 'NNG'), ('에', 'JKB')] |
| 1 | 949 | 2. 출입문 구조를 교안 38쪽처럼 미는 쪽은 손잡이가 없게, 당기는 쪽은 손잡이가... | 352 | ['출입문', '구조', '교안', '쪽', '쪽', '손잡이', '쪽', '손잡이... | [('2', 'SN'), ('.', 'SF'), ('출입문', 'NNG'), ('구... | ['2', '.', '출입문', '구조', '를', '교안', '38', '쪽', ... | 0.001097 | ['2', '.', '출입문', '구조를', '교안', '38쪽처럼', '미는', ... | ['2', '.', '출입문', '구조를', '교안', '38쪽처럼', '미는', ... | ['2', '출입문', '구조를', '교안', '38쪽처럼', '미는', '쪽은',... | [('2', 'SN')] |
| 2 | 3575 | 안녕하세요! 실습하다가 궁금한 점이 생겨서 질문드립니다.\n실습에서 머신러닝은 n-... | 189 | ['안녕', '실습', '점', '질문', '실습', '머신', '러닝', '벡터'... | [('안녕', 'NNG'), ('하', 'XSV'), ('세요', 'EP+EF'),... | ['안녕', '하', '세요', '!', '실습', '하', '다가', '궁금', ... | 0.000546 | ['안녕하세요', '!', '실습하다가', '궁금한', '점이', '생겨서', '질... | ['안녕하세요', '!', '실습하다가', '궁금한', '점이', '생겨서', '질... | ['안녕하세요', '실습하다가', '궁금한', '점이', '생겨서', '질문드립니다... | [('안녕', 'NNG'), ('하', 'XSV'), ('세요', 'EP+EF')] |
| 3 | 3653 | Q. AI 분류 모델 만들기\n전처리한 데이터셋을 활용해 악성사이트 여부를 판별하는... | 114 | ['분류', '모델', '전처리', '데이터', '셋', '활용', '악성', '사... | [('Q', 'SL'), ('.', 'SY'), ('AI', 'SL'), ('분류'... | ['Q', '.', 'AI', '분류', '모델', '만들', '기', '전처리',... | 0.000324 | ['Q.', 'AI', '분류', '모델', '만들기', '전처리한', '데이터셋을... | ['Q', '.', 'AI', '분류', '모델', '만들기', '전처리한', '데... | ['q', 'ai', '분류', '모델', '만들기', '전처리한', '데이터셋을'... | [('Q', 'SL'), ('.', 'SF')] |
| 4 | 3511 | /(이전 문의/\n저 분명히 제출 시간 전(2:50분 이전)에 제출을 완료하고 \n... | 515 | ['이전', '문의', '제출', '시간', '전', '분', '이전', '제출',... | [('/', 'SC'), ('(', 'SSO'), ('이전', 'NNG'), ('문... | ['/', '(', '이전', '문의', '/', '저', '분명히', '제출', ... | 0.001385 | ['/', '(', '이전', '문의/', '저', '분명히', '제출', '시간'... | ['/(', '이전', '문의', '/', '저', '분명히', '제출', '시간'... | ['이전', '문의', '저', '분명히', '제출', '시간', '전', '2',... | [('/', 'SC')] |


In [None]:
# x_train_m_nouns_seq = 
# x_train_m_pos_seq = 
# x_train_m_mor_seq = 
# x_train_tk_w1_seq = 
# x_train_tk_w2_seq = 
# x_train_tk_w3_seq = 

# x_var_m_nouns_seq = 
# x_var_m_pos_seq = 
# x_var_m_mor_seq = 
# x_var_tk_w1_seq = 
# x_var_tk_w2_seq = 
# x_var_tk_w3_seq = 

In [None]:
with open('/content/drive/MyDrive/AIVLE/4월/4차 미니프로젝트/train/trainx_train_m_nouns_seq.p','rb') as f: x_train_m_nouns_seq = pickle.load(f)
with open('/content/drive/MyDrive/AIVLE/4월/4차 미니프로젝트/train/trainx_train_m_pos_seq.p','rb') as f: x_train_m_pos_seq = pickle.load(f)
with open('/content/drive/MyDrive/AIVLE/4월/4차 미니프로젝트/train/trainx_train_m_mor_seq.p','rb') as f: x_train_m_mor_seq = pickle.load(f)
with open('/content/drive/MyDrive/AIVLE/4월/4차 미니프로젝트/train/trainx_train_tk_w1_seq.p','rb') as f: x_train_tk_w1_seq = pickle.load(f)
with open('/content/drive/MyDrive/AIVLE/4월/4차 미니프로젝트/train/trainx_train_tk_w2_seq.p','rb') as f: x_train_tk_w2_seq = pickle.load(f)
with open('/content/drive/MyDrive/AIVLE/4월/4차 미니프로젝트/train/trainx_train_tk_w3_seq.p','rb') as f: x_train_tk_w3_seq = pickle.load(f)

with open('/content/drive/MyDrive/AIVLE/4월/4차 미니프로젝트/valid/validx_var_m_nouns_seq.p','rb') as f: x_var_m_nouns_seq = pickle.load(f)
with open('/content/drive/MyDrive/AIVLE/4월/4차 미니프로젝트/valid/validx_var_m_pos_seq.p','rb') as f: x_var_m_pos_seq = pickle.load(f)
with open('/content/drive/MyDrive/AIVLE/4월/4차 미니프로젝트/valid/validx_var_m_mor_seq.p','rb') as f: x_var_m_mor_seq = pickle.load(f)
with open('/content/drive/MyDrive/AIVLE/4월/4차 미니프로젝트/valid/validx_var_tk_w1_seq.p','rb') as f: x_var_tk_w1_seq = pickle.load(f)
with open('/content/drive/MyDrive/AIVLE/4월/4차 미니프로젝트/valid/validx_var_tk_w2_seq.p','rb') as f: x_var_tk_w2_seq = pickle.load(f)
with open('/content/drive/MyDrive/AIVLE/4월/4차 미니프로젝트/valid/validx_var_tk_w3_seq.p','rb') as f: x_var_tk_w3_seq = pickle.load(f)

x_train_tk_w1_seq.shape, x_var_tk_w1_seq.shape

((2779, 213), (927, 213))
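The twelve `pickle.load` lines above follow a single naming pattern, so they could be collapsed into a loop. A minimal sketch, assuming the file layout stays exactly as in the cell above; `load_sequences` and `NAMES` are hypothetical helpers, and the self-check at the end writes dummy arrays to a temporary directory instead of touching Google Drive.

```python
import os
import pickle
import tempfile

import numpy as np

# Representation suffixes used in the file names above
NAMES = ["m_nouns", "m_pos", "m_mor", "tk_w1", "tk_w2", "tk_w3"]


def load_sequences(base_dir):
    """Load pickled train/validation sequences into two dicts keyed by representation.

    Assumes the layout used above:
      <base_dir>/train/trainx_train_<name>_seq.p
      <base_dir>/valid/validx_var_<name>_seq.p
    """
    train, val = {}, {}
    for name in NAMES:
        with open(os.path.join(base_dir, "train", f"trainx_train_{name}_seq.p"), "rb") as f:
            train[name] = pickle.load(f)
        with open(os.path.join(base_dir, "valid", f"validx_var_{name}_seq.p"), "rb") as f:
            val[name] = pickle.load(f)
    return train, val


# Self-check against a temporary directory filled with dummy arrays
tmp = tempfile.mkdtemp()
for sub, prefix in [("train", "trainx_train"), ("valid", "validx_var")]:
    os.makedirs(os.path.join(tmp, sub), exist_ok=True)
    for name in NAMES:
        with open(os.path.join(tmp, sub, f"{prefix}_{name}_seq.p"), "wb") as f:
            pickle.dump(np.zeros((3, 213), dtype=np.int32), f)

train_seqs, val_seqs = load_sequences(tmp)
print(train_seqs["tk_w1"].shape, len(val_seqs))  # (3, 213) 6
```

In the notebook the call would point at `/content/drive/MyDrive/AIVLE/4월/4차 미니프로젝트`, after which `train_seqs['tk_w1']` plays the role of `x_train_tk_w1_seq`.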

In [None]:
y_train.head()

|   | Unnamed: 0 | label |
|---|---|---|
| 0 | 2622 | 시스템 운영 |
| 1 | 949 | 이론 |
| 2 | 3575 | 이론 |
| 3 | 3653 | 시스템 운영 |
| 4 | 3511 | 시스템 운영 |


In [None]:
y_train.drop  # without parentheses this only displays the bound method; nothing is dropped

<bound method DataFrame.drop of       Unnamed: 0   label
0           2622  시스템 운영
1            949      이론
2           3575      이론
3           3653  시스템 운영
4           3511  시스템 운영
...          ...     ...
2774        1130      이론
2775        1294      이론
2776         860  시스템 운영
2777        3507     코드2
2778        3174  시스템 운영

[2779 rows x 2 columns]>

In [None]:
y_train.drop("Unnamed: 0", axis=1, inplace=True)
y_train.head()

|   | label |
|---|---|
| 0 | 시스템 운영 |
| 1 | 이론 |
| 2 | 이론 |
| 3 | 시스템 운영 |
| 4 | 시스템 운영 |


In [None]:
label_dict = {
    '코드1': 0,
    '코드2': 0,
    '웹': 1,
    '이론': 2,
    '시스템 운영': 3,
    '원격': 4}
y_train = y_train.replace({'label': label_dict}).copy()

y_train.head()

|   | label |
|---|---|
| 0 | 3 |
| 1 | 2 |
| 2 | 2 |
| 3 | 3 |
| 4 | 3 |


In [None]:
y_val.drop("Unnamed: 0", axis=1, inplace=True)
label_dict = {
    '코드1': 0,
    '코드2': 0,
    '웹': 1,
    '이론': 2,
    '시스템 운영': 3,
    '원격': 4}
y_val = y_val.replace({'label': label_dict}).copy()

y_val.head()

|   | label |
|---|---|
| 0 | 1 |
| 1 | 0 |
| 2 | 2 |
| 3 | 2 |
| 4 | 0 |
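`label_dict` maps six raw labels onto five class ids ('코드1' and '코드2' both become 0), which is what the 5-unit output layer further down relies on. A minimal sketch of a stricter encoding step, assuming the same dictionary; the label Series is a hypothetical stand-in for `y_train['label']`. It uses `Series.map` instead of `replace` so that an unexpected label shows up as NaN instead of silently staying a string.

```python
import pandas as pd

label_dict = {'코드1': 0, '코드2': 0, '웹': 1, '이론': 2, '시스템 운영': 3, '원격': 4}

# Hypothetical labels standing in for y_train['label']
labels = pd.Series(['시스템 운영', '이론', '코드2', '코드1', '웹', '원격'])

# map() turns any label missing from label_dict into NaN, so typos surface immediately;
# replace() would leave such a label unchanged
encoded = labels.map(label_dict)
assert encoded.notna().all(), "some labels are not in label_dict"

n_classes = encoded.nunique()
print(encoded.tolist(), n_classes)  # [3, 2, 0, 0, 1, 4] 5
```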


In [None]:
# x_train_m_nouns_seq = 
# x_train_m_pos_seq = 
# x_train_m_mor_seq = 
# x_train_tk_w1_seq = 
# x_train_tk_w2_seq = 
# x_train_tk_w3_seq = 

# x_var_m_nouns_seq = 
# x_var_m_pos_seq = 
# x_var_m_mor_seq = 
# x_var_tk_w1_seq = 
# x_var_tk_w2_seq = 
# x_var_tk_w3_seq = 

x_train_tk_w1_seq.shape, x_var_tk_w1_seq.shape

((2779, 213), (927, 213))

In [None]:
from sklearn.ensemble import RandomForestClassifier

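RFC = RandomForestClassifier(max_depth=5)
# Pass the label column as a 1-D Series; a one-column DataFrame triggers sklearn's DataConversionWarning
RFC.fit(x_train_m_nouns_seq, y_train['label'])
# RFC.fit(x_train_m_pos_seq, y_train['label'])
# RFC.fit(x_train_m_mor_seq, y_train['label'])
# RFC.fit(x_train_tk_w1_seq, y_train['label'])
# RFC.fit(x_train_tk_w2_seq, y_train['label'])
# RFC.fit(x_train_tk_w3_seq, y_train['label'])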



In [None]:
from sklearn.metrics import accuracy_score
pred = RFC.predict(x_var_m_nouns_seq)  # validation features must use the same representation RFC was fit on
print(accuracy_score(y_val, pred))

0.4412081984897519


In [None]:
from sklearn.metrics import accuracy_score
pred = RFC.predict(x_var_tk_w2_seq)
print(accuracy_score(y_val, pred))

0.43473570658036675
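A model is only meaningful on the same representation it was fitted on, so one way to compare the six representations is to refit and evaluate each train/validation pair inside a loop. A minimal sketch on synthetic data; `representations` is a hypothetical dict that in the notebook would hold pairs such as `(x_train_m_nouns_seq, x_var_m_nouns_seq)`. Like the cells above, it treats token ids as numeric features, and on random data the accuracies sit near chance.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n_train, n_val, seq_len = 200, 60, 20

# Synthetic stand-ins: labels 0..4 shared by all representations, plus one
# (train, validation) pair of integer sequences per representation
y_tr = rng.integers(0, 5, size=n_train)
y_va = rng.integers(0, 5, size=n_val)
representations = {
    name: (rng.integers(0, 50, size=(n_train, seq_len)),
           rng.integers(0, 50, size=(n_val, seq_len)))
    for name in ["m_nouns", "tk_w2"]
}

scores = {}
for name, (x_tr, x_va) in representations.items():
    clf = RandomForestClassifier(max_depth=5, random_state=0)  # random_state: reproducible runs
    clf.fit(x_tr, y_tr)                 # 1-D label array
    pred = clf.predict(x_va)            # same representation as in fit()
    scores[name] = accuracy_score(y_va, pred)

for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```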


In [None]:
import scipy.sparse
import numpy as np
# Round trip through a sparse CSC matrix; toarray() gives back the same dense values
sparse_matrix = scipy.sparse.csc_matrix(x_var_m_mor_seq)

x_var_m_mor_seq = sparse_matrix.toarray()

In [None]:
x_var_m_mor_seq

array([[    0,     0,     0, ...,   146,     1,  1340],
       [  922,     2,  1063, ...,    56, 16014,   259],
       [    0,     0,     0, ...,   325,   144,     7],
       ...,
       [    0,     0,     0, ...,   360,   249,    84],
       [    0,     0,     0, ...,    21,   146,     1],
       [    0,     0,     0, ...,    16,    74,     7]], dtype=int32)

n-gram

In [None]:
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_extraction.text import CountVectorizer

## 3. Machine Learning(N-grams)
* Train at least three machine learning models on the N-gram-preprocessed data and analyze their performance
> * [sklearn-tutorial](https://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html)

### 3-1. Model 1

In [None]:
from sklearn.ensemble import RandomForestClassifier

RF = RandomForestClassifier()


### 3-2. Model 2

### 3-3. Model 3


In [None]:
x_train_m_nouns_seq

array([[   0,    0,    0, ..., 2199,   63,   20],
       [   0,    0,    0, ..., 7008,   20,  307],
       [   0,    0,    0, ...,  110,   27,   74],
       ...,
       [   0,    0,    0, ..., 1345,   20,   27],
       [   0,    0,    0, ...,  124,  110,   20],
       [   0,    0,    0, ...,  252, 3293,   63]], dtype=int32)

### 3-4. Hyperparameter Tuning(Optional) 
* Manual Search, Grid search, Bayesian Optimization, TPE...
> * [grid search tutorial sklearn](https://scikit-learn.org/stable/modules/grid_search.html)
> * [optuna tutorial](https://optuna.org/#code_examples)
> * [ray-tune tutorial](https://docs.ray.io/en/latest/tune/examples/tune-sklearn.html)
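A minimal grid-search sketch with `GridSearchCV`, using the random forest from above as the estimator. The data come from `make_classification` as a synthetic 5-class stand-in for a feature matrix such as `x_train_m_nouns_seq`, and the grid values are arbitrary examples, not tuned recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic 5-class stand-in for a real feature matrix
X, y = make_classification(n_samples=300, n_features=20, n_informative=8,
                           n_classes=5, random_state=0)

# Every combination (3 x 2 = 6) is scored with 3-fold cross-validation
param_grid = {
    "max_depth": [5, 10, None],
    "n_estimators": [50, 100],
}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                      cv=3, scoring="accuracy")
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

With the default `refit=True`, `search.best_estimator_` is refit on all of `X` with the best parameters and can then be scored on the validation split.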

## 4. Deep Learning(Sequence)
* Train at least three deep learning models, such as a DNN, a 1-D CNN, and an LSTM, on the sequence-preprocessed data and analyze their performance
> * [Google Tutorial](https://developers.google.com/machine-learning/guides/text-classification)
> * [Tensorflow Tutorial](https://www.tensorflow.org/tutorials/keras/text_classification)
> * [Keras-tutorial](https://keras.io/examples/nlp/text_classification_from_scratch/)
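The three architectures named above can share one builder, so each is trained from fresh weights on the same padded sequences. A minimal sketch, not the notebook's final models: `build_model`, the layer sizes, the dropout rate and the synthetic data are all assumptions. The vocabulary size is computed from the data because `Embedding` only accepts token indices below `input_dim`.

```python
import numpy as np
from tensorflow import keras

SEQ_LEN, N_CLASSES = 213, 5  # sequence length and class count used in this notebook


def build_model(kind, vocab_size, embed_dim=64):
    """Return a compiled text classifier; kind is 'dnn', 'cnn' or 'lstm'."""
    inputs = keras.Input(shape=(SEQ_LEN,), dtype="int32")
    x = keras.layers.Embedding(input_dim=vocab_size, output_dim=embed_dim)(inputs)
    if kind == "dnn":
        x = keras.layers.GlobalAveragePooling1D()(x)   # mean of the token embeddings
        x = keras.layers.Dense(64, activation="relu")(x)
    elif kind == "cnn":
        x = keras.layers.Conv1D(64, kernel_size=5, activation="relu")(x)  # 5-token windows
        x = keras.layers.GlobalMaxPooling1D()(x)
    elif kind == "lstm":
        x = keras.layers.Bidirectional(keras.layers.LSTM(32))(x)  # final states, both directions
    else:
        raise ValueError(f"unknown kind: {kind}")
    x = keras.layers.Dropout(0.3)(x)
    # softmax outputs are class probabilities, as sparse_categorical_crossentropy expects
    outputs = keras.layers.Dense(N_CLASSES, activation="softmax")(x)
    model = keras.Model(inputs, outputs)
    model.compile(loss="sparse_categorical_crossentropy", optimizer="adam",
                  metrics=["accuracy"])
    return model


# Synthetic stand-ins for e.g. x_train_tk_w3_seq / x_var_tk_w3_seq and labels 0..4
rng = np.random.default_rng(0)
x_tr = rng.integers(0, 300, size=(64, SEQ_LEN)).astype("int32")
y_tr = rng.integers(0, N_CLASSES, size=64)
x_va = rng.integers(0, 300, size=(16, SEQ_LEN)).astype("int32")
y_va = rng.integers(0, N_CLASSES, size=16)

# input_dim must exceed the largest token index seen in either split
vocab_size = int(max(x_tr.max(), x_va.max())) + 1

results = {}
for kind in ["dnn", "cnn", "lstm"]:
    keras.backend.clear_session()  # start each architecture from scratch
    model = build_model(kind, vocab_size)
    es = keras.callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                       restore_best_weights=True)
    model.fit(x_tr, y_tr, validation_data=(x_va, y_va), epochs=2,
              batch_size=16, callbacks=[es], verbose=0)
    pred = model.predict(x_va, verbose=0).argmax(axis=1)
    results[kind] = float(np.mean(pred == y_va))

print(results)
```

Rebuilding the model inside the loop also avoids carrying weights learned on one tokenization over to another.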

In [None]:
# x_train_m_nouns_seq = np.expand_dims(x_train_m_nouns_seq, axis=-1)
# x_train_m_pos_seq = np.expand_dims(x_train_m_pos_seq, axis=-1)
# x_train_m_mor_seq = np.expand_dims(x_train_m_mor_seq, axis=-1)
# x_train_tk_w1_seq = np.expand_dims(x_train_tk_w1_seq, axis=-1)
# x_train_tk_w2_seq = np.expand_dims(x_train_tk_w2_seq, axis=-1)
# x_train_tk_w3_seq = np.expand_dims(x_train_tk_w3_seq, axis=-1)

# x_var_m_nouns_seq = np.expand_dims(x_var_m_nouns_seq, axis=-1)
# x_var_m_pos_seq = np.expand_dims(x_var_m_pos_seq, axis=-1)
# x_var_m_mor_seq = np.expand_dims(x_var_m_mor_seq, axis=-1)
# x_var_tk_w1_seq = np.expand_dims(x_var_tk_w1_seq, axis=-1)
# x_var_tk_w2_seq = np.expand_dims(x_var_tk_w2_seq, axis=-1)
# x_var_tk_w3_seq = np.expand_dims(x_var_tk_w3_seq, axis=-1)
 
x_train_tk_w3_seq.shape, x_var_tk_w3_seq.shape

((2779, 213), (927, 213))

In [None]:
[x_train_m_nouns_seq, x_train_m_pos_seq, x_train_m_mor_seq, x_train_tk_w1_seq, x_train_tk_w2_seq, x_train_tk_w3_seq]
[x_var_m_nouns_seq, x_var_m_pos_seq, x_var_m_mor_seq, x_var_tk_w1_seq, x_var_tk_w2_seq, x_var_tk_w3_seq]

In [None]:
print(x_train_m_nouns_seq.shape, x_train_m_pos_seq.shape, x_train_m_mor_seq.shape, x_train_tk_w1_seq.shape, x_train_tk_w2_seq.shape, x_train_tk_w3_seq.shape)
print(x_var_m_nouns_seq.shape, x_var_m_pos_seq.shape, x_var_m_mor_seq.shape, x_var_tk_w1_seq.shape, x_var_tk_w2_seq.shape, x_var_tk_w3_seq.shape)

(2779, 213) (2779, 213) (2779, 213) (2779, 213) (2779, 213) (2779, 213)
(927, 213) (927, 213) (927, 213) (927, 213) (927, 213) (927, 213)


In [None]:
import os
import numpy as np
import tensorflow as tf
from tensorflow import keras

from tensorflow.keras.backend import clear_session
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Input, Dense, Embedding, Flatten, Dropout, BatchNormalization
from tensorflow.keras.layers import SimpleRNN, LSTM, GRU, Bidirectional, Concatenate
from tensorflow.keras.layers import Conv1D, MaxPool1D, Conv2D, MaxPool2D
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.utils import plot_model

In [None]:
# ####################
# ## Your Code Here ##
# ####################
from tensorflow import keras
# 1. session clear
keras.backend.clear_session()

# 2. model declaration
model1 = keras.models.Sequential()

# 3. model stacking
# input_dim must be larger than the largest token index in the input sequences
# (x_var_m_mor_seq, for example, contains the index 16014, which is out of range for 10000)
model1.add(keras.layers.Embedding(input_dim = 10000,
                                  output_dim = 128,
                                  input_length = 213))
model1.add(keras.layers.LSTM(16, activation='tanh', return_sequences=True))
# model1.add(keras.layers.LSTM(1024, activation='tanh', return_sequences=True))
# model1.add(keras.layers.LSTM(2048, activation='tanh', return_sequences=True))
model1.add(keras.layers.LSTM(512, activation='tanh', return_sequences=True))
model1.add(keras.layers.GRU(32, activation='tanh', return_sequences=True))

model1.add(keras.layers.Flatten())
# softmax so the 5 outputs are class probabilities, as sparse_categorical_crossentropy expects
model1.add(keras.layers.Dense(5, activation='softmax'))

# 4. model compile
model1.compile(loss='sparse_categorical_crossentropy',
              metrics=['accuracy'],
              optimizer='adam')

# 5. summary
model1.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, 213, 128)          1280000   
                                                                 
 lstm (LSTM)                 (None, 213, 16)           9280      
                                                                 
 lstm_1 (LSTM)               (None, 213, 512)          1083392   
                                                                 
 gru (GRU)                   (None, 213, 32)           52416     
                                                                 
 flatten (Flatten)           (None, 6816)              0         
                                                                 
 dense (Dense)               (None, 5)                 34085     
                                                                 
Total params: 2,459,173
Trainable params: 2,459,173
Non-trainable params: 0

In [None]:
# Train the model
es = EarlyStopping(monitor='val_loss',
                   min_delta=0,
                   patience=10,
                   restore_best_weights=True,
                   verbose=1)

model1.fit(x_train_tk_w3_seq, y_train,
           epochs=1000,
           verbose=1,
           validation_data=(x_var_tk_w3_seq, y_val),
           callbacks=[es])

Epoch 1/1000
Epoch 2/1000
Epoch 3/1000
Epoch 4/1000
Epoch 5/1000
Epoch 6/1000
Epoch 7/1000
Epoch 8/1000
Epoch 9/1000
Epoch 10/1000
Epoch 11/1000
Epoch 11: early stopping


<keras.callbacks.History at 0x7fad186084c0>

In [None]:
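# Train on the morpheme sequences. model1 is not rebuilt here, so training
# continues from the weights already learned on x_train_tk_w3_seq
es = EarlyStopping(monitor='val_loss',
                   min_delta=0,
                   patience=10,
                   restore_best_weights=True,
                   verbose=1)

model1.fit(x_train_m_mor_seq, y_train,
           epochs=1000,
           verbose=1,
           validation_data=(x_var_m_mor_seq, y_val),
           callbacks=[es])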

Epoch 1/1000
Epoch 2/1000
Epoch 3/1000
Epoch 4/1000
Epoch 5/1000
Epoch 6/1000
Epoch 7/1000
Epoch 8/1000

KeyboardInterrupt: ignored

### 4-1. DNN

### 4-2. 1-D CNN

### 4-3. LSTM

## 5. Using pre-trained model(Optional)
* Fine-tune a Korean pre-trained model and analyze its performance
> * [BERT-tutorial](https://www.tensorflow.org/text/guide/bert_preprocessing_guide)
> * [HuggingFace-Korean](https://huggingface.co/models?language=korean)