## ANN Model

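ANN stands for Artificial Neural Network. Here, the model is built from fully-connected layers, the most basic form of neural network. It takes several kinds of features as input and is trained so that the result of combining them with the weights in each layer comes as close as possible to the target value.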
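The idea of combining inputs with layer weights can be sketched with a tiny NumPy forward pass. This is only an illustration: the layer sizes, the random weights, and the target values below are made up and are not the ones used in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b, relu=True):
    """One fully-connected layer: affine map followed by an optional ReLU."""
    z = x @ w + b
    return np.maximum(z, 0.0) if relu else z

# Hypothetical sizes: 4 samples, 3 input features, 5 hidden units, 1 output.
x = rng.normal(size=(4, 3))
w1, b1 = rng.normal(size=(3, 5)), np.zeros(5)
w2, b2 = rng.normal(size=(5, 1)), np.zeros(1)

hidden = dense(x, w1, b1)                   # hidden activations, shape (4, 5)
y_hat = dense(hidden, w2, b2, relu=False)   # linear output, shape (4, 1)

y = np.array([[0.5], [0.4], [0.6], [0.3]])  # made-up targets
mse = float(np.mean((y_hat - y) ** 2))      # the loss that training would minimize
print(y_hat.shape, mse)
```

Training adjusts `w1`, `b1`, `w2`, and `b2` so that this loss decreases; the sketch only shows the forward computation.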

### Purpose of the model

- The goal is to predict AVG, ERA, and PCT from the input data.


### Why this model

- It is a prediction method that has recently been used frequently in sports prediction.
- An ANN can be trained even on data that follows complex rules.
- It can estimate the target by adjusting the weights of multiple layers toward an optimum.

### Model results
<p>
[MSE]<br>
Win rate (PCT): 0.011329503946875074<br>
Batting average (AVG): 0.0005160643474193386<br>
ERA: 1.1528867054761496<br>
</p>
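Because MSE is expressed in squared units, taking its square root (RMSE) gives an error on the same scale as each target, which may be easier to read. A minimal sketch using only the MSE values reported above:

```python
import math

# MSE values copied from the results above.
mse = {
    "PCT": 0.011329503946875074,
    "AVG": 0.0005160643474193386,
    "ERA": 1.1528867054761496,
}

# RMSE is the square root of MSE and has the same unit as the target.
rmse = {name: math.sqrt(value) for name, value in mse.items()}

for name, value in rmse.items():
    print(f"{name}: RMSE = {value:.4f}")
```

This gives roughly 0.106 for PCT, 0.023 for AVG, and 1.07 for ERA.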

### Limitations of the model

We tried training on the dataset as-is, preprocessing it based on correlations, and changing the network size, but none of these attempts made a meaningful difference in performance.

## 0. Loading packages

In [1]:
import pandas as pd
import numpy as np

import tensorflow as tf

from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.utils import to_categorical

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Dropout
import tensorflow.keras.backend as K

from tensorflow.keras.optimizers import RMSprop, Adam
from tensorflow.keras.callbacks import EarlyStopping

from sklearn.metrics import accuracy_score
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
from sklearn.preprocessing import MinMaxScaler

## 1. Data preprocessing
<p>Input features whose correlation with the target data is too low (absolute value below 0.15) are not used. <br>
Features that show a negative correlation are multiplied by -1.
</p>

In [2]:
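def data_preprocess(x_train, y_train, x_test, y_test, target):
    # Drop the identifier columns; they are not used as features.
    x_train=x_train.drop(columns=["T_ID", "YEAR"])
    x_test=x_test.drop(columns=["T_ID", "YEAR"])

    # Temporarily attach the target to compute feature-target correlations.
    x_train['target']=y_train[target]

    corr_x_train=x_train.corr()

    # 0 marks a feature with |correlation| < 0.15; otherwise keep the sign (+1 or -1).
    sign=[0 if abs(e)<0.15 else np.sign(e) for e in corr_x_train['target'].to_list()]
    sign=sign[:-1]  # the last entry is the target's correlation with itself

    remove_idx=[idx for idx,i in enumerate(sign) if (i==0)]

    # Remove the weakly correlated features by position; this relies on
    # x_train and x_test having the same column order.
    x_train=x_train.drop(columns=['target'])
    x_train=x_train.drop(columns=[i for idx,i in enumerate(x_train.columns) if idx in remove_idx])
    x_test=x_test.drop(columns=[i for idx,i in enumerate(x_test.columns) if idx in remove_idx])

    # Flip negatively correlated features by multiplying each column by its sign.
    sign=[i for i in sign if i!=0]
    x_train=x_train*sign
    x_test=x_test*sign
    return x_train, x_test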
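To see what the filtering and sign flip do, here is a small self-contained sketch on made-up data. `select_and_flip`, the column names, and the values are hypothetical; it is a compact variant of the same idea, not the notebook's function.

```python
import numpy as np
import pandas as pd

def select_and_flip(features, target, threshold=0.15):
    """Keep columns with |corr| >= threshold and flip negatively correlated ones."""
    corr = features.corrwith(target)      # one correlation per feature column
    keep = corr.abs() >= threshold        # boolean mask over the columns
    signs = np.sign(corr[keep])           # +1.0 or -1.0 for each kept column
    # Column-wise multiplication: pandas aligns `signs` with the column labels.
    return features.loc[:, keep] * signs

# Made-up toy data: one positively correlated, one negatively correlated,
# and one uncorrelated feature.
toy_x = pd.DataFrame({
    "hits":   [1.0, 2.0, 3.0, 4.0, 5.0],
    "errors": [5.0, 4.0, 3.0, 2.0, 1.0],
    "noise":  [1.0, -1.0, 0.0, -1.0, 1.0],
})
toy_y = pd.Series([0.1, 0.2, 0.3, 0.4, 0.5])

flipped = select_and_flip(toy_x, toy_y)
print(flipped)
```

In this toy case `noise` is dropped, `hits` is kept unchanged, and `errors` is negated so that every remaining feature is positively correlated with the target.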

## 2. Model definition
<p>The input layer and the hidden layers are all set to a size of 1024. </p>

In [3]:
def build_model(trainset):
  model = keras.Sequential([
    layers.Dense(1024, activation='relu', input_shape=[len(trainset.keys())]),
    layers.Dense(1024, activation='relu'),
    layers.Dense(1024, activation='relu'),
    layers.Dense(1)
  ])

  optimizer = tf.keras.optimizers.RMSprop(0.001)

  model.compile(loss='mse',
                optimizer=optimizer,
                metrics=['mae', 'mse'])
  return model
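The parameter counts that `model.summary()` reports in section 3 can be checked by hand: a Dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The sketch below assumes feature counts of 28, 27, and 22, which are inferred from the first-layer counts in the summaries (for example 29696 = (28 + 1) * 1024) and are not stated in the text.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

def total_params(n_features, hidden=1024, n_hidden_layers=3):
    """Total parameters of n_hidden_layers Dense(hidden) layers plus Dense(1)."""
    sizes = [n_features] + [hidden] * n_hidden_layers + [1]
    return sum(dense_params(a, b) for a, b in zip(sizes, sizes[1:]))

# Feature counts inferred from the summaries, not given in the notebook text.
totals = {name: total_params(n) for name, n in [("PCT", 28), ("AVG", 27), ("ERA", 22)]}
for name, total in totals.items():
    print(name, total)
```

Under these assumptions the totals come out to 2,129,921, 2,128,897, and 2,123,777, matching the three summaries.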

## 3. Predicting win rate, batting average, and ERA

### 3.1 Win rate prediction

In [4]:
PCT_x_train=pd.read_csv("PCT/PCT_train_x.csv")
PCT_y_train=pd.read_csv("PCT/PCT_train_y.csv")
PCT_x_test=pd.read_csv("PCT/PCT_test_x.csv")
PCT_y_test=pd.read_csv("PCT/PCT_test_y.csv")

In [5]:
PCT_x_train, PCT_x_test=data_preprocess(PCT_x_train, PCT_y_train, PCT_x_test, PCT_y_test, "PCT")

In [6]:
PCT_y_train=PCT_y_train.drop(columns=["T_ID", "YEAR"])

In [7]:
model = build_model(PCT_x_train)
model.summary()

EPOCHS=500
early_stop = EarlyStopping(monitor='loss', mode = 'min',patience=2, verbose=1)

history = model.fit(
  PCT_x_train, PCT_y_train,
  epochs=EPOCHS, validation_split = 0.2, verbose=2,
  callbacks=[early_stop])    

W0927 19:43:24.849918 21432 training.py:504] Falling back from v2 loop because of error: Failed to find data adapter that can handle input: <class 'pandas.core.frame.DataFrame'>, <class 'NoneType'>


Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 1024)              29696     
_________________________________________________________________
dense_1 (Dense)              (None, 1024)              1049600   
_________________________________________________________________
dense_2 (Dense)              (None, 1024)              1049600   
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 1025      
Total params: 2,129,921
Trainable params: 2,129,921
Non-trainable params: 0
_________________________________________________________________
Train on 2336 samples, validate on 584 samples
Epoch 1/500
2336/2336 - 3s - loss: 11.5632 - mae: 0.6810 - mse: 11.5632 - val_loss: 0.0200 - val_mae: 0.1127 - val_mse: 0.0200
Epoch 2/500
2336/2336 - 2s - loss: 0.0582 - mae: 0.1959 - mse: 0.0582 
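The `EarlyStopping` callback used here monitors the training loss (`monitor='loss'`) with `patience=2`, so training ends once the loss has failed to improve for two epochs in a row. The following is a simplified pure-Python model of that patience rule, not Keras's actual implementation; `stop_epoch` and the loss values are illustrative only.

```python
def stop_epoch(losses, patience):
    """Return the 1-based epoch at which training stops, or None if it never does."""
    best = float("inf")
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Made-up losses: the best value is reached at epoch 3, then two epochs fail to beat it.
illustrative_losses = [1.0, 0.5, 0.4, 0.45, 0.41, 0.3]
stopped_at = stop_epoch(illustrative_losses, patience=2)
print(stopped_at)
```

With these numbers training stops at epoch 5 and never reaches the lower loss at epoch 6. Since `validation_split=0.2` is already set, monitoring `val_loss` instead would be a common alternative, but that is a suggestion and not what this notebook does.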

In [8]:
y_pred = model.predict(PCT_x_test)
PCT_y_test['y_pred']=y_pred
PCT_y_test

W0927 19:44:39.087781 21432 training.py:504] Falling back from v2 loop because of error: Failed to find data adapter that can handle input: <class 'pandas.core.frame.DataFrame'>, <class 'NoneType'>


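| Unnamed: 0 | T_ID | YEAR | PCT | y_pred |
|---|---|---|---|---|
| 0 | HH | 2016 | 0.5 | 0.406899 |
| 1 | HT | 2016 | 0.458333 | 0.531205 |
| 2 | KT | 2016 | 0.291667 | 0.431979 |
| 3 | LG | 2016 | 0.608696 | 0.536605 |
| 4 | LT | 2016 | 0.5 | 0.403614 |
| 5 | NC | 2016 | 0.565217 | 0.5029 |
| 6 | OB | 2016 | 0.666667 | 0.519808 |
| 7 | SK | 2016 | 0.458333 | 0.498745 |
| 8 | SS | 2016 | 0.5 | 0.433768 |
| 9 | WO | 2016 | 0.375 | 0.526162 |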


In [9]:
mse = mean_squared_error(PCT_y_test['PCT'], y_pred)
mse

0.011329503946875074

### 3.2 Batting average prediction

In [10]:
AVG_x_train=pd.read_csv("AVG/AVG_train_x.csv")
AVG_y_train=pd.read_csv("AVG/AVG_train_y.csv")
AVG_x_test=pd.read_csv("AVG/AVG_test_x.csv")
AVG_y_test=pd.read_csv("AVG/AVG_test_y.csv")

In [11]:
AVG_x_train, AVG_x_test=data_preprocess(AVG_x_train, AVG_y_train, AVG_x_test, AVG_y_test, "AVG")

In [12]:
AVG_y_train=AVG_y_train.drop(columns=["T_ID", "YEAR"])

In [13]:
model = build_model(AVG_x_train)
model.summary()

EPOCHS=500
early_stop = EarlyStopping(monitor='loss', mode = 'min',patience=2, verbose=1)

history = model.fit(
  AVG_x_train, AVG_y_train,
  epochs=EPOCHS, validation_split = 0.2, verbose=2,
  callbacks=[early_stop])    

W0927 19:44:40.319487 21432 training.py:504] Falling back from v2 loop because of error: Failed to find data adapter that can handle input: <class 'pandas.core.frame.DataFrame'>, <class 'NoneType'>


Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_4 (Dense)              (None, 1024)              28672     
_________________________________________________________________
dense_5 (Dense)              (None, 1024)              1049600   
_________________________________________________________________
dense_6 (Dense)              (None, 1024)              1049600   
_________________________________________________________________
dense_7 (Dense)              (None, 1)                 1025      
Total params: 2,128,897
Trainable params: 2,128,897
Non-trainable params: 0
_________________________________________________________________
Train on 2336 samples, validate on 584 samples
Epoch 1/500
2336/2336 - 4s - loss: 5.8654 - mae: 0.4877 - mse: 5.8654 - val_loss: 0.0023 - val_mae: 0.0413 - val_mse: 0.0023
Epoch 2/500
2336/2336 - 3s - loss: 0.0220 - mae: 0.1090 - mse: 0.0220 

In [14]:
y_pred = model.predict(AVG_x_test)
AVG_y_test['y_pred']=y_pred
AVG_y_test

W0927 19:45:08.109154 21432 training.py:504] Falling back from v2 loop because of error: Failed to find data adapter that can handle input: <class 'pandas.core.frame.DataFrame'>, <class 'NoneType'>


| Unnamed: 0 | T_ID | YEAR | AVG | y_pred |
|---|---|---|---|---|
| 0 | HH | 2016 | 0.288575 | 0.291614 |
| 1 | HT | 2016 | 0.256739 | 0.290338 |
| 2 | KT | 2016 | 0.295455 | 0.289613 |
| 3 | LG | 2016 | 0.296069 | 0.290718 |
| 4 | LT | 2016 | 0.309893 | 0.289685 |
| 5 | NC | 2016 | 0.28744 | 0.291122 |
| 6 | OB | 2016 | 0.298225 | 0.29121 |
| 7 | SK | 2016 | 0.305263 | 0.290506 |
| 8 | SS | 2016 | 0.283863 | 0.292028 |
| 9 | WO | 2016 | 0.289941 | 0.290544 |


In [15]:
mse = mean_squared_error(AVG_y_test['AVG'], y_pred)
mse

0.0005160643474193386

### 3.3 ERA prediction

In [16]:
ERA_x_train=pd.read_csv("ERA/ERA_train_x.csv")
ERA_y_train=pd.read_csv("ERA/ERA_train_y.csv")
ERA_x_test=pd.read_csv("ERA/ERA_test_x.csv")
ERA_y_test=pd.read_csv("ERA/ERA_test_y.csv")

In [17]:
ERA_x_train, ERA_x_test=data_preprocess(ERA_x_train, ERA_y_train, ERA_x_test, ERA_y_test, "ERA")

In [18]:
ERA_y_train=ERA_y_train.drop(columns=["T_ID", "YEAR"])

In [19]:
model = build_model(ERA_x_train)
model.summary()

EPOCHS=500
early_stop = EarlyStopping(monitor='loss', mode = 'min',patience=2, verbose=1)

history = model.fit(
  ERA_x_train, ERA_y_train,
  epochs=EPOCHS, validation_split = 0.2, verbose=2,
  callbacks=[early_stop])    

W0927 19:45:08.721516 21432 training.py:504] Falling back from v2 loop because of error: Failed to find data adapter that can handle input: <class 'pandas.core.frame.DataFrame'>, <class 'NoneType'>


Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_8 (Dense)              (None, 1024)              23552     
_________________________________________________________________
dense_9 (Dense)              (None, 1024)              1049600   
_________________________________________________________________
dense_10 (Dense)             (None, 1024)              1049600   
_________________________________________________________________
dense_11 (Dense)             (None, 1)                 1025      
Total params: 2,123,777
Trainable params: 2,123,777
Non-trainable params: 0
_________________________________________________________________
Train on 2336 samples, validate on 584 samples
Epoch 1/500
2336/2336 - 3s - loss: 5.1150 - mae: 1.3860 - mse: 5.1150 - val_loss: 0.8727 - val_mae: 0.7664 - val_mse: 0.8727
Epoch 2/500
2336/2336 - 2s - loss: 1.6414 - mae: 1.0246 - mse: 1.6414 

In [20]:
y_pred = model.predict(ERA_x_test)
ERA_y_test['y_pred']=y_pred
ERA_y_test

W0927 19:45:53.893691 21432 training.py:504] Falling back from v2 loop because of error: Failed to find data adapter that can handle input: <class 'pandas.core.frame.DataFrame'>, <class 'NoneType'>


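| Unnamed: 0 | T_ID | YEAR | ERA | y_pred |
|---|---|---|---|---|
| 0 | HH | 2016 | 5.258114 | 5.591334 |
| 1 | HT | 2016 | 4.120827 | 5.424034 |
| 2 | KT | 2016 | 6.314516 | 5.666447 |
| 3 | LG | 2016 | 3.64977 | 5.223759 |
| 4 | LT | 2016 | 5.849294 | 5.727253 |
| 5 | NC | 2016 | 3.575342 | 5.250371 |
| 6 | OB | 2016 | 4.71028 | 5.413857 |
| 7 | SK | 2016 | 5.144882 | 5.364096 |
| 8 | SS | 2016 | 5.4 | 5.706434 |
| 9 | WO | 2016 | 5.704839 | 5.47819 |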


In [21]:
mse = mean_squared_error(ERA_y_test['ERA'], y_pred)
mse

1.1528867054761496