<span style='font-size:150%'>Deep learning (10 points) - Deep learning with TensorFlow and Keras</span>   
  
Submit all code written to run deep learning in various ways.    
   
File name: 4_Deep learning_1.ipynb   
   
(If multiple files were used: Deep learning_2.ipynb, Deep learning_3.ipynb...)   

In [1]:
import numpy as np
import pandas as pd

import os
import matplotlib.pyplot as plt
import matplotlib.font_manager

import tensorflow as tf
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

seed=0                 # random_state passed to train_test_split
np.random.seed(3)      # NumPy RNG seed
tf.random.set_seed(3)  # TensorFlow RNG seed

In [2]:
import os
os.getcwd()

'D:\\ai\\Final'

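# Data
Data_set_1: contains the close, open, high, low, volume, and change data. The class labels are still strings.  
Data_set_2: the original attributes are replaced by differences between attributes. Scaled. The class labels are numeric.  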
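
As a rough illustration of how Data_set_2 could be derived from raw prices, the sketch below builds the seven difference features and rescales each column to [0, 1]. The raw values are invented, and min-max scaling is an assumption: the notebook only states that the features are differences and that they are scaled.

```python
import numpy as np

# Hypothetical raw daily prices (not from the original dataset):
# columns are start (open), high, low, close, volume.
raw = np.array([
    [100.0, 105.0,  98.0, 103.0, 1200.0],
    [103.0, 104.0,  99.0, 100.0, 1500.0],
    [100.0, 108.0, 100.0, 107.0, 1800.0],
])
start, high, low, close, volume = raw.T

# Difference features, named as in Data_set_2.
features = np.column_stack([
    close - start,   # close-start
    high - low,      # high-low
    high - close,    # high-close
    high - start,    # high-start
    close - low,     # close-low
    start - low,     # start-low
    volume,          # volume
])

def min_max_scale(x):
    """Rescale each column to [0, 1]; the scaling method is an assumption."""
    col_min = x.min(axis=0)
    col_range = x.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # avoid division by zero for constant columns
    return (x - col_min) / col_range

scaled = min_max_scale(features)
print(scaled.shape)  # (3, 7)
```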

In [3]:
# Load the data

# Base dataset
data_set=pd.read_csv("D:/ai/Final/Data set_2.csv")

In [4]:
data_set

Unnamed: 0,close-start,high-low,high-close,high-start,close-low,start-low,volume,Diff
0,0.541985,0.033708,0.067568,0.084337,0.051282,0.024390,0.111197,0
1,0.465649,0.168539,0.121622,0.012048,0.153846,0.243902,0.123828,1
2,0.465649,0.112360,0.202703,0.084337,0.012821,0.109756,0.134799,1
3,0.633588,0.101124,0.000000,0.168675,0.192308,0.012195,0.129628,0
4,0.412214,0.112360,0.202703,0.000000,0.012821,0.195122,0.059433,1
...,...,...,...,...,...,...,...,...
594,0.603053,0.292135,0.162162,0.265060,0.256410,0.121951,0.437442,0
595,0.450382,0.224719,0.324324,0.168675,0.025641,0.146341,0.374049,1
596,0.801527,0.359551,0.000000,0.433735,0.487179,0.024390,0.357297,0
597,0.557252,0.224719,0.243243,0.265060,0.102564,0.048780,0.644365,1


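# Dataset setup

Predicting stock prices requires time-series data, so the existing data has to be reshaped into a time-series format.  
The number of days of data used for each prediction is read into window_size, and the time-series data is generated from that number.  
Because the deep learning results can vary with window_size, the time-series data is not built in '2_...' or '3_...' but generated on the fly in '4_...'.

Functions are used so that the whole process can be run several times.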

In [5]:
# Function that reads the window size and computes the train dataset size

def set_window_train_size(data):
    
    # Split into 70% training data and 30% test data. (fixed)
    train_size=int(len(data)*0.7) 
    
    # Number of rows used to build each time-series sample.
    # Example: window_size=30 => each sample holds 30 days of data
    window_size=int(input("window size : "))

    return window_size, train_size

In [6]:
# Separate the feature data from the class data

def split_feature_class(data):
    
    #feature_cols = ['close-start', 'high-low', 'high-close', 'high-start','close-low','start-low','volume']
    #close_cols = ['close']
    
    feature_cols = ['close-start', 'high-low', 'high-close', 'high-start','close-low','start-low','volume']
    close_cols = ['Diff']
    
    data_feature = data[feature_cols]
    data_close = data[close_cols]
    
    return data_feature, data_close

In [7]:
# Group the data into windows of window_size rows and return the new dataset
def make_window_data(feature_data, class_data, window_size):
    
    feature_list=[]
    class_list=[]
    
    for i in range(len(feature_data)-window_size):
        feature_list.append(np.array(feature_data.iloc[i:i+window_size]))
        class_list.append(np.array(class_data.iloc[i+window_size]))
    
    return np.array(feature_list), np.array(class_list)
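
To make the resulting shapes concrete, the sketch below applies the same slicing rule to a small NumPy array. The toy data and the function name `make_windows` are invented for illustration; only the windowing rule (rows i to i+window_size-1 as input, row i+window_size as target) is taken from make_window_data.

```python
import numpy as np

def make_windows(features, labels, window_size):
    """NumPy-only version of the windowing rule used in the notebook."""
    x, y = [], []
    for i in range(len(features) - window_size):
        x.append(features[i:i + window_size])   # window_size consecutive rows
        y.append(labels[i + window_size])       # label of the row right after
    return np.array(x), np.array(y)

# Toy data: 10 days, 2 features per day, one binary label per day.
toy_features = np.arange(20, dtype=float).reshape(10, 2)
toy_labels = np.array([0, 1, 0, 1, 1, 0, 0, 1, 0, 1]).reshape(10, 1)

x, y = make_windows(toy_features, toy_labels, window_size=3)
print(x.shape, y.shape)  # (7, 3, 2) (7, 1)
```

A series of n rows yields n - window_size samples, each shaped (window_size, n_features), which is the 3-D layout an LSTM layer expects.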

In [8]:
def make_dataset(data):
    
    # Time-series window length (window_size) and training set size (train_size)
    window_size, train_size=set_window_train_size(data)
    
    # Split into training and test datasets
    train = data[:train_size] 
    test = data[train_size:] 
    
    train_feature, train_class = split_feature_class(train)
    test_feature, test_class = split_feature_class(test)
    
    train_feature, train_class = make_window_data(train_feature, train_class, window_size)
    test_feature, test_class = make_window_data(test_feature, test_class, window_size)
    
    return train_feature, train_class, test_feature, test_class, window_size

In [9]:
# Split the training dataset into train and valid

def train_valid(train_feature,train_class):
    split_size= float(input("Fraction of the training data to use as test size (validation data size) (e.g. 0.2) : "))
    X_train, X_valid, Y_train, Y_valid = train_test_split(train_feature, train_class, test_size = split_size, random_state=seed)
    return X_train, X_valid, Y_train, Y_valid

# Deep learning on the base data

In [10]:
train_feature, train_class, test_feature, test_class, window_size = make_dataset(data_set)

X_train, X_valid, Y_train, Y_valid = train_valid(train_feature, train_class)

window size : 5
Fraction of the training data to use as test size (validation data size) (e.g. 0.2) : 0.2
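
The sample counts that follow from these inputs can be checked by hand. The sketch below assumes the dataset has 599 rows; the displayed table is truncated, so this figure is inferred from the sample counts in the training log rather than stated in the notebook. It also assumes that train_test_split rounds the validation share up when given a float test_size.

```python
import math

n_rows = 599          # assumption: inferred, not shown in the notebook
window_size = 5       # value entered at the prompt
valid_ratio = 0.2     # value entered at the prompt

train_size = int(n_rows * 0.7)                 # rows in the training slice
n_train_windows = train_size - window_size     # windows built from the training slice
n_test_windows = (n_rows - train_size) - window_size

n_valid = math.ceil(n_train_windows * valid_ratio)  # assumed rounding rule
n_fit = n_train_windows - n_valid

print(train_size, n_train_windows, n_test_windows)  # 419 414 175
print(n_fit, n_valid)                               # 331 83
```

Under these assumptions the result agrees with the "Train on 331 samples, validate on 83 samples" line printed by model.fit.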


<span style='font-size:150%'>[LSTM]</span>  
  
Because the data is a time series, an LSTM is used.  
Because the class is binary, 'sigmoid' is used as the output activation function and 'binary_crossentropy' as the loss function.
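
For reference, the two functions named above can be written out directly. The sketch below is plain Python rather than the Keras implementation, and the example inputs are chosen for illustration.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_crossentropy(y_true, p):
    """Loss for one sample with label y_true in {0, 1} and predicted probability p."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

p = sigmoid(0.0)                            # an uninformative score gives p = 0.5
print(round(p, 4))                          # 0.5
print(round(binary_crossentropy(1, p), 4))  # 0.6931
print(round(binary_crossentropy(0, p), 4))  # 0.6931
```

A model that always outputs 0.5 has a loss of ln 2 ≈ 0.6931 regardless of the label, which is a useful reference point when reading loss values for a binary classifier.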

In [210]:
model = Sequential()
model.add(LSTM(20, 
               input_shape=(train_feature.shape[1], train_feature.shape[2]), # shape of one time-series sample: (window_size, n_features)
               activation='relu',return_sequences=False))
model.add(Dense(1,activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='rmsprop',metrics=['accuracy'])

#optimizer='rmsprop'
#optimizer='adam'

model.summary()

Model: "sequential_20"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_20 (LSTM)               (None, 20)                2240      
_________________________________________________________________
dense_20 (Dense)             (None, 1)                 21        
Total params: 2,261
Trainable params: 2,261
Non-trainable params: 0
_________________________________________________________________
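
The parameter counts in the summary can be reproduced by hand. The sketch below assumes the standard LSTM layout (four gates, each with input weights, recurrent weights, and a bias); the helper names are hypothetical.

```python
def lstm_param_count(units, input_dim):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (input_dim + units) + units)

def dense_param_count(units, input_dim):
    """One weight per input plus one bias, per output unit."""
    return units * (input_dim + 1)

lstm_params = lstm_param_count(units=20, input_dim=7)    # 7 features per time step
dense_params = dense_param_count(units=1, input_dim=20)  # fed by the 20 LSTM outputs

print(lstm_params, dense_params, lstm_params + dense_params)  # 2240 21 2261
```

The window size does not appear in either formula, so changing window_size changes the input shape but not the number of trainable parameters.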


# Environment setup for saving model updates during training

In [211]:
# Create the folder for saved models
model_dir='./model_save'
if not os.path.exists(model_dir):
    os.mkdir(model_dir)
    
# Checkpoint file path
model_path='./model_save/3_'+str(window_size)+'_{epoch:d}_{val_loss:.4f}_{val_accuracy:.4f}.h5'

In [212]:
model_path

'./model_save/3_5_{epoch:d}_{val_loss:.4f}_{val_accuracy:.4f}.h5'
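
The placeholders in this path are filled in with Python string formatting each time a checkpoint is written. The sketch below applies `str.format` by hand to show the resulting file name; the metric values are illustrative.

```python
template = './model_save/3_5_{epoch:d}_{val_loss:.4f}_{val_accuracy:.4f}.h5'

# Illustrative metric values for one epoch.
path = template.format(epoch=1, val_loss=0.69348, val_accuracy=0.4578)
print(path)  # ./model_save/3_5_1_0.6935_0.4578.h5
```

Each saved file name therefore records the epoch number, the validation loss, and the validation accuracy, all rounded to four decimal places.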

In [213]:
# Early stopping
# Stop training if val_loss does not improve for 20 consecutive epochs
early_stop = EarlyStopping(monitor='val_loss', patience=20)
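
The patience rule can be illustrated with a few lines of plain Python. This is a simplified sketch of the idea, not the Keras implementation; the function name and the loss values are made up.

```python
def stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss   # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1     # no improvement this epoch
            if wait >= patience:
                return epoch
    return None

# Made-up losses: best value at epoch 3, then no further improvement.
losses = [0.70, 0.69, 0.68, 0.69, 0.70, 0.69, 0.71]
print(stopping_epoch(losses, patience=3))  # 6
```

With patience=20, training continues for 20 epochs past the best val_loss before it halts, so the last epoch is not necessarily the best one.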

In [214]:
# Save the model whenever val_loss improves
checkpoint = ModelCheckpoint(filepath=model_path, monitor='val_loss', verbose=1, save_best_only=True)

In [215]:
history = model.fit(X_train, Y_train,                     
                    validation_data=(X_valid, Y_valid), 
                    epochs=200, 
                    batch_size=15, 
                    callbacks=[early_stop, checkpoint])

Train on 331 samples, validate on 83 samples
Epoch 1/200
Epoch 00001: val_loss improved from inf to 0.69348, saving model to ./model_save/3_5_1_0.6935_0.4578.h5
Epoch 2/200
Epoch 00002: val_loss did not improve from 0.69348
Epoch 3/200
Epoch 00003: val_loss improved from 0.69348 to 0.69313, saving model to ./model_save/3_5_3_0.6931_0.4578.h5
Epoch 4/200
Epoch 00004: val_loss improved from 0.69313 to 0.69208, saving model to ./model_save/3_5_4_0.6921_0.4940.h5
Epoch 5/200
Epoch 00005: val_loss improved from 0.69208 to 0.69169, saving model to ./model_save/3_5_5_0.6917_0.5181.h5
Epoch 6/200
Epoch 00006: val_loss improved from 0.69169 to 0.69113, saving model to ./model_save/3_5_6_0.6911_0.5422.h5
Epoch 7/200
Epoch 00007: val_loss did not improve from 0.69113
Epoch 8/200
Epoch 00008: val_loss improved from 0.69113 to 0.69102, saving model to ./model_save/3_5_8_0.6910_0.5060.h5
Epoch 9/200
Epoch 00009: val_loss improved from 0.69102 to 0.69038, saving model to ./model_save/3_5_9_0.6904_0.5

Epoch 28/200
Epoch 00028: val_loss did not improve from 0.68854
Epoch 29/200
Epoch 00029: val_loss did not improve from 0.68854
Epoch 30/200
Epoch 00030: val_loss did not improve from 0.68854
Epoch 31/200
Epoch 00031: val_loss did not improve from 0.68854
Epoch 32/200
Epoch 00032: val_loss did not improve from 0.68854
Epoch 33/200
Epoch 00033: val_loss improved from 0.68854 to 0.68820, saving model to ./model_save/3_5_33_0.6882_0.4940.h5
Epoch 34/200
Epoch 00034: val_loss improved from 0.68820 to 0.68710, saving model to ./model_save/3_5_34_0.6871_0.5060.h5
Epoch 35/200
Epoch 00035: val_loss improved from 0.68710 to 0.68640, saving model to ./model_save/3_5_35_0.6864_0.5301.h5
Epoch 36/200
Epoch 00036: val_loss did not improve from 0.68640
Epoch 37/200
Epoch 00037: val_loss did not improve from 0.68640
Epoch 38/200
Epoch 00038: val_loss improved from 0.68640 to 0.68495, saving model to ./model_save/3_5_38_0.6849_0.5663.h5
Epoch 39/200
Epoch 00039: val_loss did not improve from 0.68495


Epoch 56/200
Epoch 00056: val_loss did not improve from 0.68434
Epoch 57/200
Epoch 00057: val_loss did not improve from 0.68434
Epoch 58/200
Epoch 00058: val_loss did not improve from 0.68434
Epoch 59/200
Epoch 00059: val_loss did not improve from 0.68434
Epoch 60/200
Epoch 00060: val_loss did not improve from 0.68434
Epoch 61/200
Epoch 00061: val_loss did not improve from 0.68434
Epoch 62/200
Epoch 00062: val_loss improved from 0.68434 to 0.68177, saving model to ./model_save/3_5_62_0.6818_0.5783.h5
Epoch 63/200
Epoch 00063: val_loss did not improve from 0.68177
Epoch 64/200
Epoch 00064: val_loss did not improve from 0.68177
Epoch 65/200
Epoch 00065: val_loss did not improve from 0.68177
Epoch 66/200
Epoch 00066: val_loss improved from 0.68177 to 0.68176, saving model to ./model_save/3_5_66_0.6818_0.5542.h5
Epoch 67/200
Epoch 00067: val_loss did not improve from 0.68176
Epoch 68/200
Epoch 00068: val_loss did not improve from 0.68176
Epoch 69/200
Epoch 00069: val_loss did not improve f

Epoch 84/200
Epoch 00084: val_loss did not improve from 0.68136
Epoch 85/200
Epoch 00085: val_loss improved from 0.68136 to 0.67985, saving model to ./model_save/3_5_85_0.6799_0.5422.h5
Epoch 86/200
Epoch 00086: val_loss did not improve from 0.67985
Epoch 87/200
Epoch 00087: val_loss improved from 0.67985 to 0.67974, saving model to ./model_save/3_5_87_0.6797_0.5422.h5
Epoch 88/200
Epoch 00088: val_loss improved from 0.67974 to 0.67857, saving model to ./model_save/3_5_88_0.6786_0.5301.h5
Epoch 89/200
Epoch 00089: val_loss did not improve from 0.67857
Epoch 90/200
Epoch 00090: val_loss did not improve from 0.67857
Epoch 91/200
Epoch 00091: val_loss did not improve from 0.67857
Epoch 92/200
Epoch 00092: val_loss did not improve from 0.67857
Epoch 93/200
Epoch 00093: val_loss did not improve from 0.67857
Epoch 94/200
Epoch 00094: val_loss did not improve from 0.67857
Epoch 95/200
Epoch 00095: val_loss did not improve from 0.67857
Epoch 96/200
Epoch 00096: val_loss did not improve from 0.

In [216]:
history

<tensorflow.python.keras.callbacks.History at 0x17513f39b00>

In [217]:
print("\nAccuracy : %.4f"%(model.evaluate(train_feature,train_class)[1]))


Accuracy : 0.5749


In [218]:
print("\nAccuracy : %.4f"%(model.evaluate(test_feature,test_class)[1]))


Accuracy : 0.4857
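
The two evaluate calls above score the weights from the last training epoch, because EarlyStopping is created without restore_best_weights. If the best saved checkpoint is to be evaluated instead, its file can be picked out by the val_loss encoded in the file name. The sketch below covers only that parsing step; the helper name is hypothetical, and the file names are copied from the training log.

```python
import os

def best_checkpoint(file_names):
    """Return the file name whose encoded val_loss is lowest.

    Expects names like '3_5_88_0.6786_0.5301.h5':
    prefix, window size, epoch, val_loss, val_accuracy.
    """
    def val_loss(name):
        stem = os.path.splitext(name)[0]   # drop '.h5'
        return float(stem.split('_')[3])   # fourth field is val_loss
    return min(file_names, key=val_loss)

saved = [
    '3_5_62_0.6818_0.5783.h5',
    '3_5_85_0.6799_0.5422.h5',
    '3_5_88_0.6786_0.5301.h5',
]
print(best_checkpoint(saved))  # 3_5_88_0.6786_0.5301.h5
```

The selected file could then be loaded with `tensorflow.keras.models.load_model` and passed to evaluate in the same way as the in-memory model.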


Accuracy on the test dataset is below 50%. Rejected.