-----
<h1><font color="#f37626">[Experiment]</font> tensorflow-autolog example</h1>

- For more detailed usage of the Accuinsight Python package, see the [Accuinsight guide homepage](https://accuinsight.cloudz.co.kr/#/intro) or the [Accuinsight Youtube channel](https://www.youtube.com/channel/UChFs-FAVxgG4C00h8C1MqoA).
- Analysis code that uses the Accuinsight package is available at [Accuinsight-github](https://github.com/AccuInsight/accuinsight_Lifecycle_example).

###  # Boston housing price prediction
----

### 1. Import modules

In [1]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

import pandas as pd
import numpy as np

### 2. Enter the storage information for the dataset

> __(Case 1) Data preprocessed in the Accuinsight+ Pipeline and stored in HDFS__

In [None]:
from Accuinsight.Lifecycle.tensorflow import accuinsight

accu = accuinsight()

accu.set_storage(hdfs_uri = 'HDFS URI',
                 target = 'MEDV',          # target variable name for Boston housing price prediction: <MEDV>
                 save_json = True)         # True: save the entered storage connection info to <runs/storage-info-json>

> __(Case 2) Data stored in AWS S3__

In [2]:
from Accuinsight.Lifecycle.tensorflow import accuinsight

accu = accuinsight()

accu.set_storage(access_key = 'your access_key',
                 secret_key = 'your secret_key',
                 region = 's3 region info',
                 bucket_name = 'bucket name',
                 file_path = 'data file path',
                 target = 'MEDV',                 # target variable name for Boston housing price prediction: <MEDV>
                 save_json = True)                # True: save the entered storage connection info to <runs/storage-info-json>
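
Hard-coding keys in a notebook makes them easy to leak when the notebook is shared. As a minimal sketch, the keyword arguments for `set_storage()` could be assembled from environment variables instead. The variable names (`ACCU_S3_ACCESS_KEY` and so on) and the helper `s3_storage_kwargs` are hypothetical and not part of the Accuinsight package; only the keyword names come from the cell above.

```python
import os

# Hypothetical environment variable names; choose whatever fits your setup.
ENV_NAMES = {
    "access_key": "ACCU_S3_ACCESS_KEY",
    "secret_key": "ACCU_S3_SECRET_KEY",
    "region": "ACCU_S3_REGION",
    "bucket_name": "ACCU_S3_BUCKET",
    "file_path": "ACCU_S3_FILE_PATH",
}


def s3_storage_kwargs(env=os.environ, target="MEDV", save_json=True):
    """Build the keyword arguments used in the set_storage() call above."""
    missing = [var for var in ENV_NAMES.values() if var not in env]
    if missing:
        raise KeyError(f"missing environment variables: {missing}")
    kwargs = {name: env[var] for name, var in ENV_NAMES.items()}
    kwargs["target"] = target
    kwargs["save_json"] = save_json
    return kwargs


# Usage in the notebook (requires the variables to be set beforehand):
# accu.set_storage(**s3_storage_kwargs())
```
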

### 3. Download the data

In [3]:
accu.get_file()

Downloading file... boston_data.csv 

/home/work/data_from_aws/boston_data_20210305.csv


In [4]:
boston = pd.read_csv('file_path', index_col = 0)

In [5]:
boston.head()

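| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | MEDV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.09 | 1.0 | 296.0 | 15.3 | 396.9 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.9 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.9 | 5.33 | 36.2 |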


### 4. Split the data (train, validation, test)

In [6]:
from sklearn.model_selection import train_test_split

boston_train, boston_valid = train_test_split(boston, test_size = 0.4)
boston_valid, boston_test = train_test_split(boston_valid, test_size = 0.5) 
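
The two calls above hold out 40% of the rows and then halve that hold-out set, which gives roughly a 60/20/20 split. The sketch below works through the arithmetic, assuming the standard 506-row Boston dataset and assuming that scikit-learn rounds the test portion up and gives the remainder to the other side. The helper `split_sizes` is illustrative only.

```python
import math


def split_sizes(n_rows, first_test_size=0.4, second_test_size=0.5):
    """Row counts (train, valid, test) produced by two chained splits."""
    n_holdout = math.ceil(first_test_size * n_rows)   # rows set aside by the first split
    n_train = n_rows - n_holdout
    n_test = math.ceil(second_test_size * n_holdout)  # second split halves the hold-out set
    n_valid = n_holdout - n_test
    return n_train, n_valid, n_test


print(split_sizes(506))  # (303, 101, 102)
```

The 303 training rows agree with the `X_train.info()` output further down. Because no `random_state` is passed, the rows that land in each subset differ from run to run.
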

In [7]:
y_train = boston_train.loc[:, 'MEDV']
y_valid = boston_valid.loc[:, 'MEDV']
y_test = boston_test.loc[:, 'MEDV']

In [8]:
X_train = boston_train.drop(['MEDV'], axis = 1)
X_valid = boston_valid.drop(['MEDV'], axis = 1)
X_test = boston_test.drop(['MEDV'], axis = 1)

In [9]:
X_train.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 303 entries, 73 to 305
Data columns (total 13 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   CRIM     303 non-null    float64
 1   ZN       303 non-null    float64
 2   INDUS    303 non-null    float64
 3   CHAS     303 non-null    float64
 4   NOX      303 non-null    float64
 5   RM       303 non-null    float64
 6   AGE      303 non-null    float64
 7   DIS      303 non-null    float64
 8   RAD      303 non-null    float64
 9   TAX      303 non-null    float64
 10  PTRATIO  303 non-null    float64
 11  B        303 non-null    float64
 12  LSTAT    303 non-null    float64
dtypes: float64(13)
memory usage: 33.1 KB


### 5. Normalization

In [10]:
train_stats = X_train.describe().transpose()
train_stats

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| CRIM | 303.0 | 3.819962 | 8.568206 | 0.00632 | 0.08339 | 0.26169 | 4.548895 | 88.9762 |
| ZN | 303.0 | 11.768977 | 24.073379 | 0.0 | 0.0 | 0.0 | 12.5 | 95.0 |
| INDUS | 303.0 | 11.261452 | 6.812454 | 0.46 | 5.19 | 9.9 | 18.1 | 27.74 |
| CHAS | 303.0 | 0.075908 | 0.265288 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| NOX | 303.0 | 0.559094 | 0.118229 | 0.389 | 0.451 | 0.538 | 0.647 | 0.871 |
| RM | 303.0 | 6.258977 | 0.690129 | 3.561 | 5.8715 | 6.211 | 6.63 | 8.78 |
| AGE | 303.0 | 67.814851 | 29.105241 | 2.9 | 42.0 | 79.2 | 93.55 | 100.0 |
| DIS | 303.0 | 3.712902 | 2.040878 | 1.1296 | 2.0643 | 3.2157 | 5.10855 | 10.7103 |
| RAD | 303.0 | 10.207921 | 9.082386 | 1.0 | 4.0 | 5.0 | 24.0 | 24.0 |
| TAX | 303.0 | 423.247525 | 173.533099 | 188.0 | 282.5 | 348.0 | 666.0 | 711.0 |


In [11]:
def norm(x):
    return (x - train_stats['mean']) / train_stats['std']
normed_train = norm(X_train)
normed_valid = norm(X_valid)
normed_test = norm(X_test)

NumExpr defaulting to 8 threads.
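
Note that `norm()` scales the validation and test sets with the mean and standard deviation of the training set, so no information from the held-out data leaks into preprocessing. The following is a minimal pure-Python sketch of the same idea on a single made-up column. Like `describe()`, `statistics.stdev` computes the sample standard deviation.

```python
from statistics import mean, stdev


def fit_stats(train_values):
    """Mean and sample standard deviation, computed on the training data only."""
    return mean(train_values), stdev(train_values)


def normalize(values, mu, sigma):
    """Z-score each value with statistics that were fitted elsewhere."""
    return [(v - mu) / sigma for v in values]


# Made-up numbers for illustration
train = [1.0, 2.0, 3.0, 4.0, 5.0]
valid = [3.0, 6.0]

mu, sigma = fit_stats(train)
normed_train = normalize(train, mu, sigma)
normed_valid = normalize(valid, mu, sigma)  # reuses the training statistics
```

After this step the training column has mean 0 and standard deviation 1, while the validation column generally does not, which is expected.
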


### 6. Build the model (+ set hyperparameters)

> __Specify the hyperparameters to be optimized by autoDL__

In [12]:
learning_rate = 0.01
num_nodes = 100

In [13]:
def build_model():
    model = keras.Sequential([
        layers.Dense(64, activation='relu', input_shape=[len(X_train.keys())]),
        layers.Dense(num_nodes, activation='relu'),
        layers.Dense(1)
    ])

    optimizer = tf.keras.optimizers.RMSprop(learning_rate)

    model.compile(loss='mse',
                  optimizer=optimizer,
                  metrics=['mae', 'mse'])
    return model

In [14]:
model = build_model()
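
To get a feel for the size of this network, the number of trainable parameters can be worked out by hand: each `Dense` layer holds one weight per input-output pair plus one bias per unit. The sketch below assumes the 13 feature columns of `X_train` and `num_nodes = 100`. The helper `dense_params` is illustrative only.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_units + n_units


n_features = 13             # columns in X_train
layer_units = [64, 100, 1]  # Dense(64), Dense(num_nodes), Dense(1)

counts = []
fan_in = n_features
for units in layer_units:
    counts.append(dense_params(fan_in, units))
    fan_in = units  # the next layer takes this layer's outputs as inputs

total_params = sum(counts)
print(counts, total_params)  # [896, 6500, 101] 7497
```

`model.summary()` should report the same per-layer counts. The first layer's kernel therefore has shape (13, 64).
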

### 7. (optional) Slack 

In [None]:
token = 'your slack token'
cid = 'your slack channel id'

accu.set_slack(token = token, channel_id = cid)

##accu.send_message(theresholds = 0.1)
accu.send_message(message = 'AccuInsight+ model training complete')

### 8. Run autolog()
- You can pass `autolog()` a short tag that describes the model being trained.
- `autolog()` must be called __before__ model training (model.fit()).

> __Once model training finishes, `autolog()` is released automatically.__  
To record an additional training run, call `autolog()` again and then train the model.

In [15]:
### model_monitor = False
#accu.autolog('boston-house-pricing', best_weights = True)  

### model_monitor = True
accu.autolog('boston-house-pricing', best_weights = True, model_monitor = True)  

### 9. Train the model
- To use autolog(), you must specify `validation_data`.

In [16]:
model.fit(normed_train, y_train,
          epochs=10,
          validation_data = (normed_valid, y_valid))

Using autolog(best_weights=True, model_monitor=True)


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10

Using epoch 00010 with val_mae: 2.84556


<tensorflow.python.keras.callbacks.History at 0x7f6bb42b2cd0>
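
The log line above reports which epoch was kept by `best_weights = True`. How `autolog()` selects it internally is not shown in this notebook. The sketch below assumes the rule is simply "lowest validation MAE wins". The `val_mae` values are made up for illustration, apart from the last one, which is taken from the log, and `best_epoch` is a hypothetical helper.

```python
def best_epoch(metric_per_epoch, mode="min"):
    """Return (1-based epoch, value) of the best entry in a per-epoch metric list."""
    pick = min if mode == "min" else max
    index = pick(range(len(metric_per_epoch)), key=metric_per_epoch.__getitem__)
    return index + 1, metric_per_epoch[index]


# Illustrative values; with a real run, use history.history["val_mae"],
# where history is the object returned by model.fit().
val_mae = [9.1, 5.4, 4.2, 3.9, 3.5, 3.3, 3.1, 3.0, 2.9, 2.84556]

epoch, value = best_epoch(val_mae)
print(f"Using epoch {epoch:05d} with val_mae: {value:.5f}")
```

For metrics where larger is better, such as accuracy, pass `mode="max"`.
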

### 10. Load the saved model
- When you record a model's training history in Lifecycle with `autolog()`, the model weights from the epoch with the best metric during training are saved automatically.
- You can therefore load the saved model to share it with collaborators or to retrain it.

    1. Open the Accuinsight+ workspace list or the model's detail page, and copy the __Run name__ of the model you want to load from _Experiment_.
    2. Call the `load_model()` function from ___utils___ to load the model.

In [17]:
from Accuinsight.Lifecycle.utils import load_model

saved_model = load_model('tf.keras-A759BECB75334FEFB51D6CE9165F120B_209')

In [18]:
saved_model.get_weights()[0]

array([[-1.16477117e-01, -1.07975872e-02,  4.38557714e-02,
        -2.41911143e-01, -2.23871455e-01,  4.92228493e-02,
        -1.14615306e-01, -3.80154967e-01, -3.94537956e-01,
        -2.02209339e-01,  1.79499562e-03, -1.61164969e-01,
        -2.86987364e-01, -2.31358826e-01,  7.93363824e-02,
         1.22414276e-01,  5.14739659e-04, -2.62934476e-01,
         2.18358606e-01,  1.07690334e-01, -4.87192310e-02,
         1.92605034e-02, -9.25729647e-02,  3.53003480e-02,
         4.83748727e-02,  1.48784474e-01, -7.69519247e-03,
        -3.23701471e-01, -7.90839046e-02, -4.88079563e-02,
        -2.89710552e-01, -6.31012022e-02, -3.78027827e-01,
        -2.53585428e-01, -2.62530565e-01, -1.27450684e-02,
        -7.39632994e-02, -1.85526371e-01, -1.72656313e-01,
        -7.82156587e-02, -1.68233395e-01,  2.40602950e-03,
        -1.24297708e-01,  7.47181848e-02, -2.32554942e-01,
        -1.33196503e-01,  2.57764775e-02, -2.58293986e-01,
         7.90107399e-02,  5.59200607e-02,  1.33516803e-0

-------
### Save the data for AutoDL in CSV format
- The files must be saved under `filestorage`.

In [19]:
X_train.to_csv('filestorage/X_train.csv')
y_train.to_csv('filestorage/y_train.csv')
X_valid.to_csv('filestorage/X_valid.csv')
y_valid.to_csv('filestorage/y_valid.csv')
X_test.to_csv('filestorage/X_test.csv')
y_test.to_csv('filestorage/y_test.csv')
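
Because `to_csv()` writes the index as the first column, the files need `index_col = 0` when they are read back, as in the `pd.read_csv` call in step 3. The following is a minimal round-trip sketch that uses a temporary directory in place of the real `filestorage` folder and two made-up rows.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Two made-up rows with a non-default index, as produced by train_test_split
X = pd.DataFrame({"CRIM": [0.5, 0.25], "RM": [6.5, 6.25]}, index=[73, 305])
y = pd.Series([24.0, 21.5], index=[73, 305], name="MEDV")

with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp) / "filestorage"  # stand-in for the real folder
    out_dir.mkdir()

    X.to_csv(out_dir / "X_train.csv")
    y.to_csv(out_dir / "y_train.csv")

    # index_col=0 restores the original row labels instead of adding a new column
    X_back = pd.read_csv(out_dir / "X_train.csv", index_col=0)
    y_back = pd.read_csv(out_dir / "y_train.csv", index_col=0)["MEDV"]
```

Without `index_col = 0`, the saved index comes back as an extra `Unnamed: 0` feature column.
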