# HW3 - Stock Movement Prediction

作業檔案：
- hw3.ipynb: 作業＆報告
- hw3-fb-data-ipynb: 其它資料(facebook)測試 (資料來源：https://www.sharecast.com/index/Nasdaq_100/prices/download)
- hw3-nasdaq-data-ipynb: 其它資料(nasdaq100)測試 (資料來源：https://www.sharecast.com/equity/Facebook_Inc/share-prices/download)

資料：
- train.csv: sp100 訓練資料(2009-2017)
- test.csv: sp100 測試資料(2018)
- fb-train.csv: facebook inc 訓練資料(2009-2017)
- fb-test.csv: facebook inc 測試資料(2018)
- nasdaq-train.csv: nasdaq100 訓練資料(2009-2017)
- nasdaq-test.csv: nasdaq100 測試資料(2018)


In [151]:
# Read data

import pandas as pd
import numpy as np

train_data_path = './train.csv'
test_data_path = './test.csv'

train_df = pd.read_csv(train_data_path)
test_df = pd.read_csv(test_data_path)

print(train_df.shape)
print(train_df.head())

(2264, 6)
          Date  Open Price  Close Price  High Price  Low Price      Volume
0  02-Jan-2009      902.99       931.80      934.73     899.35  4048270080
1  05-Jan-2009      929.17       927.45      936.63     919.53  5413910016
2  06-Jan-2009      931.17       934.70      943.85     927.28  5392620032
3  07-Jan-2009      927.45       906.65      927.45     902.37  4704940032
4  08-Jan-2009      905.73       909.73      910.00     896.81  4991549952


In [152]:
# Drop unnecessary columns

train_df.drop(columns=['Date', 'Volume'], inplace=True) # , 'Volume', 'High Price', 'Low Price'
test_df.drop(columns=['Date', 'Volume'], inplace=True) # , 'Volume', 'High Price', 'Low Price'

print(train_df.shape)
print(train_df.head())

(2264, 4)
   Open Price  Close Price  High Price  Low Price
0      902.99       931.80      934.73     899.35
1      929.17       927.45      936.63     919.53
2      931.17       934.70      943.85     927.28
3      927.45       906.65      927.45     902.37
4      905.73       909.73      910.00     896.81


In [153]:
# Add the column `Tomorrow Movement` by comparing the `Close Price` with previous days as the training target
# Add the column `Tomorrow Open` by shifting the column `Open Price` as one of the new features

train_df['Tomorrow Movement'] = np.where(train_df['Close Price'].diff() >= 0, 1, 0)
test_df['Tomorrow Movement'] = np.where(test_df['Close Price'].diff() >= 0, 1, 0)

train_df['Tomorrow Movement'] = train_df['Tomorrow Movement'].shift(-1)
test_df['Tomorrow Movement'] = test_df['Tomorrow Movement'].shift(-1)

train_df['Tomorrow Open'] = train_df['Open Price'].shift(-1)
test_df['Tomorrow Open'] = test_df['Open Price'].shift(-1)

print(train_df.head())
print(train_df.tail())

   Open Price  Close Price  High Price  Low Price  Tomorrow Movement  \
0      902.99       931.80      934.73     899.35                0.0   
1      929.17       927.45      936.63     919.53                1.0   
2      931.17       934.70      943.85     927.28                0.0   
3      927.45       906.65      927.45     902.37                1.0   
4      905.73       909.73      910.00     896.81                0.0   

   Tomorrow Open  
0         929.17  
1         931.17  
2         927.45  
3         905.73  
4         909.91  
      Open Price  Close Price  High Price  Low Price  Tomorrow Movement  \
2259     2684.22      2683.34     2685.35    2678.13                0.0   
2260     2679.09      2680.50     2682.74    2677.96                1.0   
2261     2682.10      2682.62     2685.64    2678.91                1.0   
2262     2686.10      2687.54     2687.66    2682.69                0.0   
2263     2689.15      2673.61     2692.12    2673.61                NaN   

  

In [154]:
# Add other new features `S_10`, `Corr`, `Open-Close`, `Open-Open` (explanation is described below)

train_df['S_10'] = train_df['Close Price'].rolling(window=10).mean()
train_df['Corr'] = train_df['Close Price'].rolling(window=10).corr(train_df['S_10'])
train_df['Open-Close'] = train_df['Open Price'] - train_df['Close Price'].shift(1)
train_df['Open-Open'] = train_df['Open Price'] - train_df['Open Price'].shift(1)
train_df = train_df.dropna()
new_train_df = train_df.iloc[:,:10]

print(new_train_df.shape)
print(new_train_df.head())

test_df['S_10'] = test_df['Close Price'].rolling(window=10).mean()
test_df['Corr'] = test_df['Close Price'].rolling(window=10).corr(test_df['S_10'])
test_df['Open-Close'] = test_df['Open Price'] - test_df['Close Price'].shift(1)
test_df['Open-Open'] = test_df['Open Price'] - test_df['Open Price'].shift(1)
test_df = test_df.dropna()
new_test_df = test_df.iloc[:,:10]

print(new_test_df.shape)
print(new_test_df.head())

(2245, 10)
    Open Price  Close Price  High Price  Low Price  Tomorrow Movement  \
18      868.89       845.14      868.89     844.15                0.0   
19      845.69       825.88      851.66     821.67                0.0   
20      823.09       825.44      830.78     812.87                1.0   
21      825.69       838.51      842.60     821.98                0.0   
22      837.77       832.23      851.85     829.18                1.0   

    Tomorrow Open     S_10      Corr  Open-Close  Open-Open  
18         845.69  840.028 -0.237592       -5.20      23.16  
19         823.09  838.242 -0.264066        0.55     -23.20  
20         825.69  835.774 -0.501164       -2.79     -22.60  
21         837.77  839.103 -0.121338        0.25       2.60  
22         831.75  838.302 -0.187328       -0.74      12.08  
(233, 10)
    Open Price  Close Price  High Price  Low Price  Tomorrow Movement  \
18     2867.23      2853.53     2870.62    2851.48                0.0   
19     2832.74      28

In [155]:
# Divide x and y data

data_train_x = new_train_df.drop(columns=['Tomorrow Movement'])
data_train_y = new_train_df['Tomorrow Movement']

data_test_x = new_test_df.drop(columns=['Tomorrow Movement'])
data_test_y = new_test_df['Tomorrow Movement']

print(data_train_x.shape)
print(data_train_x.head())
print(data_train_y.shape)
print(data_train_y.head())
print('-----')
print(data_test_x.shape)
print(data_test_x.head())
print(data_test_y.shape)
print(data_test_y.head())

(2245, 9)
    Open Price  Close Price  High Price  Low Price  Tomorrow Open     S_10  \
18      868.89       845.14      868.89     844.15         845.69  840.028   
19      845.69       825.88      851.66     821.67         823.09  838.242   
20      823.09       825.44      830.78     812.87         825.69  835.774   
21      825.69       838.51      842.60     821.98         837.77  839.103   
22      837.77       832.23      851.85     829.18         831.75  838.302   

        Corr  Open-Close  Open-Open  
18 -0.237592       -5.20      23.16  
19 -0.264066        0.55     -23.20  
20 -0.501164       -2.79     -22.60  
21 -0.121338        0.25       2.60  
22 -0.187328       -0.74      12.08  
(2245,)
18    0.0
19    0.0
20    1.0
21    0.0
22    1.0
Name: Tomorrow Movement, dtype: float64
-----
(233, 9)
    Open Price  Close Price  High Price  Low Price  Tomorrow Open      S_10  \
18     2867.23      2853.53     2870.62    2851.48        2832.74  2826.260   
19     2832.74      28

# Logistic Regression

In [156]:
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.metrics import accuracy_score

lr_model = LogisticRegression(max_iter=500)
#lr_model = SGDClassifier(loss='log',  max_iter=800)
lr_model.fit(data_train_x, data_train_y)

predict_train_y = lr_model.predict(data_train_x)
print('training accuracy:')
print(accuracy_score(data_train_y, predict_train_y))

lr_predict_test_y = lr_model.predict(data_test_x)
print('\ntesting accuracy:')
print(accuracy_score(data_test_y, lr_predict_test_y))

print('\ntesting result prob:')
print(lr_model.predict_proba(data_test_x))

print('\npredicted testing labels:')
print(lr_predict_test_y)

training accuracy:
0.6806236080178174

testing accuracy:
0.6995708154506438

testing result prob:
[[9.99646050e-01 3.53949537e-04]
 [1.30871704e-02 9.86912830e-01]
 [9.55789009e-01 4.42109907e-02]
 [9.93249012e-01 6.75098796e-03]
 [9.99653096e-01 3.46903675e-04]
 [9.99996968e-01 3.03189316e-06]
 [8.31329046e-01 1.68670954e-01]
 [7.14412426e-02 9.28558757e-01]
 [1.27243879e-04 9.99872756e-01]
 [2.83113759e-03 9.97168862e-01]
 [9.77539345e-01 2.24606553e-02]
 [9.86851773e-01 1.31482267e-02]
 [2.02623266e-03 9.97973767e-01]
 [8.71146376e-01 1.28853624e-01]
 [9.68646596e-01 3.13534038e-02]
 [1.37960699e-01 8.62039301e-01]
 [2.18477456e-02 9.78152254e-01]
 [1.08120745e-02 9.89187925e-01]
 [2.77545360e-02 9.72245464e-01]
 [4.53745154e-01 5.46254846e-01]
 [1.57663168e-02 9.84233683e-01]
 [3.71201144e-01 6.28798856e-01]
 [9.99420736e-01 5.79263970e-04]
 [9.82233922e-01 1.77660781e-02]
 [1.77200870e-02 9.82279913e-01]
 [9.99279116e-01 7.20884439e-04]
 [7.41259764e-02 9.25874024e-01]
 [4.7700822

In [157]:
# Print precision, recall, fbeta-score and confusion matrix

from sklearn.metrics import precision_recall_fscore_support
from sklearn.metrics import confusion_matrix

print('precision, recall, fbeta-score:')
print(precision_recall_fscore_support(data_test_y, lr_predict_test_y, average='weighted'))
print('\nconfusion matrix(tn, fp, fn, tp):')
tn, fp, fn, tp = confusion_matrix(data_test_y, lr_predict_test_y).ravel()
print((tn, fp, fn, tp))

precision, recall, fbeta-score:
(0.7017212559295234, 0.6995708154506438, 0.6980889477179236, None)

confusion matrix(tn, fp, fn, tp):
(72, 42, 28, 91)


# SVM

In [158]:
# Normalize data

from sklearn.preprocessing import MinMaxScaler, RobustScaler

#scaler = MinMaxScaler()
scaler = RobustScaler()
scaler.fit(data_train_x) #scaler.fit(train_df.append(test_df, ignore_index=True))

train_normalize = scaler.transform(data_train_x)
train_normalize = np.transpose(train_normalize)

normalize_train_x = pd.DataFrame({
    'Open Price': train_normalize[0],
    'Close Price': train_normalize[1],
    'High Price': train_normalize[2],
    'Low Price': train_normalize[3],
    'Tomorrow Open': train_normalize[4],
    'S_10': train_normalize[5],
    'Corr': train_normalize[6],
    'Open-Close': train_normalize[7],
    'Open-Open': train_normalize[8],
})

test_normalize = scaler.transform(data_test_x)
test_normalize = np.transpose(test_normalize)
normalize_test_x = pd.DataFrame({
    'Open Price': test_normalize[0],
    'Close Price': test_normalize[1],
    'High Price': test_normalize[2],
    'Low Price': test_normalize[3],
    'Tomorrow Open': test_normalize[4],
    'S_10': test_normalize[5],
    'Corr': test_normalize[6],
    'Open-Close': test_normalize[7],
    'Open-Open': test_normalize[8],
})

data_train_y = np.where(data_train_y == 0, -1, 1)
data_test_y = np.where(data_test_y == 0, -1, 1)

print(normalize_train_x.head())
print(data_train_y[:5])

   Close Price      Corr  High Price  Low Price  Open Price  Open-Close  \
0    -0.984993 -1.076844   -0.963346  -0.977122   -0.952458   -2.492823   
1    -1.008416 -1.113485   -0.984314  -1.004493   -0.980653    0.258373   
2    -1.008951 -1.441628   -1.009723  -1.015207   -1.008118   -1.339713   
3    -0.993056 -0.915948   -0.995339  -1.004115   -1.004958    0.114833   
4    -1.000693 -1.007278   -0.984083  -0.995349   -0.990278   -0.358852   

   Open-Open      S_10  Tomorrow Open  
0   1.572977 -0.979571      -0.981240  
1  -1.776734 -0.981731      -1.008700  
2  -1.733382 -0.984715      -1.005541  
3   0.087428 -0.980690      -0.990863  
4   0.772399 -0.981658      -0.998177  
[-1 -1  1 -1  1]


In [159]:
from sklearn.svm import SVC

svc_model = SVC(kernel='linear', C=3000, tol=1e-5)
svc_model.fit(normalize_train_x, data_train_y)

predict_train_y = svc_model.predict(normalize_train_x)
print('training accuracy:')
print(accuracy_score(data_train_y, predict_train_y))

svc_predict_test_y = svc_model.predict(normalize_test_x)
print('\ntesting accuracy:')
print(accuracy_score(data_test_y, svc_predict_test_y))
print(svc_predict_test_y)

training accuracy:
0.6561247216035635

testing accuracy:
0.703862660944206
[-1  1 -1 -1 -1 -1 -1  1  1  1 -1 -1  1 -1 -1  1  1  1  1 -1  1  1 -1 -1
  1 -1  1  1  1  1  1  1  1 -1  1 -1 -1  1  1  1 -1  1 -1  1 -1  1 -1  1
  1 -1  1  1  1  1  1 -1 -1  1  1 -1  1  1  1 -1 -1 -1 -1  1 -1  1  1 -1
  1 -1  1 -1 -1  1  1 -1 -1 -1 -1  1 -1  1  1  1  1  1 -1  1  1  1  1 -1
 -1 -1  1  1  1 -1  1  1 -1  1 -1  1  1 -1  1  1 -1  1 -1 -1 -1  1 -1  1
 -1  1 -1 -1  1 -1  1  1 -1  1 -1  1 -1 -1 -1  1  1 -1  1 -1  1  1 -1 -1
  1  1  1  1 -1 -1 -1 -1 -1 -1  1 -1 -1  1  1 -1  1  1  1  1 -1  1  1  1
 -1  1 -1  1 -1  1 -1 -1 -1 -1  1 -1  1 -1 -1  1  1 -1 -1  1 -1  1 -1  1
  1  1  1 -1  1 -1 -1 -1  1  1 -1 -1 -1 -1  1 -1  1 -1  1 -1  1  1 -1  1
 -1 -1 -1  1  1  1 -1 -1  1 -1 -1 -1 -1  1 -1  1  1]


In [160]:
# Print precision, recall, fbeta-score and confusion matrix

from sklearn.metrics import precision_recall_fscore_support
from sklearn.metrics import confusion_matrix

print('precision, recall, fbeta-score:')
print(precision_recall_fscore_support(data_test_y, svc_predict_test_y, average='weighted'))
print('\nconfusion matrix(tn, fp, fn, tp):')
tn, fp, fn, tp = confusion_matrix(data_test_y, svc_predict_test_y).ravel()
print((tn, fp, fn, tp))

precision, recall, fbeta-score:
(0.7038341373808904, 0.703862660944206, 0.703731590476021, None)

confusion matrix(tn, fp, fn, tp):
(78, 36, 33, 86)


# Neural Network

In [161]:
left_col = pd.DataFrame(data=np.where(data_train_y == -1, 1, 0)[:])
data_train_y = pd.DataFrame(data=np.where(data_train_y == -1, 0, 1)[:])
data_train_y = pd.concat( [ left_col, data_train_y ], axis=1, ignore_index=True )

left_col = pd.DataFrame(data=np.where(data_test_y == -1, 1, 0)[:])
data_test_y = pd.DataFrame(data=np.where(data_test_y == -1, 0, 1)[:])
data_test_y = pd.concat( [ left_col, data_test_y ], axis=1, ignore_index=True )

print(normalize_train_x.shape)
print(normalize_train_x.head())

print(data_train_y.shape)
print(data_train_y.head())

(2245, 9)
   Close Price      Corr  High Price  Low Price  Open Price  Open-Close  \
0    -0.984993 -1.076844   -0.963346  -0.977122   -0.952458   -2.492823   
1    -1.008416 -1.113485   -0.984314  -1.004493   -0.980653    0.258373   
2    -1.008951 -1.441628   -1.009723  -1.015207   -1.008118   -1.339713   
3    -0.993056 -0.915948   -0.995339  -1.004115   -1.004958    0.114833   
4    -1.000693 -1.007278   -0.984083  -0.995349   -0.990278   -0.358852   

   Open-Open      S_10  Tomorrow Open  
0   1.572977 -0.979571      -0.981240  
1  -1.776734 -0.981731      -1.008700  
2  -1.733382 -0.984715      -1.005541  
3   0.087428 -0.980690      -0.990863  
4   0.772399 -0.981658      -0.998177  
(2245, 2)
   0  1
0  1  0
1  1  0
2  0  1
3  1  0
4  0  1


In [166]:
import torch
import torch.nn.functional as F
from sklearn.metrics import accuracy_score

class M_NN(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super(M_NN, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        h = self.linear1(x)
        acti_out = F.relu(h)
        y_pred = self.linear2(h) #.clamp(0,1)
        return y_pred


# N is batch size
N, D_in, H, D_out = 300, 9, 100, 2

model = M_NN(D_in, H, D_out)
criterion = torch.nn.BCEWithLogitsLoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

for t in range(20000):
    for batch_num in range(N, len(normalize_train_x), N):
        
        y_pred = model(torch.FloatTensor(normalize_train_x[batch_num-N:batch_num].values.tolist()))

        loss = criterion(y_pred, torch.FloatTensor(data_train_y[batch_num-N:batch_num].values.tolist()))

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
    if (t%100 == 0):
        print(t, loss.item())

0 418.63055419921875
100 415.57806396484375
200 414.8939514160156
300 414.54803466796875
400 414.3395690917969
500 414.1991882324219
600 414.096923828125
700 414.0181579589844
800 413.9545593261719
900 413.9014587402344
1000 413.8555908203125
1100 413.81439208984375
1200 413.7757568359375
1300 413.7384338378906
1400 413.7011413574219
1500 413.66290283203125
1600 413.62298583984375
1700 413.5809326171875
1800 413.5362854003906
1900 413.4889831542969
2000 413.43878173828125
2100 413.38543701171875
2200 413.3291015625
2300 413.26959228515625
2400 413.2067565917969
2500 413.1407165527344
2600 413.0715637207031
2700 412.9991149902344
2800 412.92333984375
2900 412.8443908691406
3000 412.7621154785156
3100 412.6766357421875
3200 412.5877990722656
3300 412.4957275390625
3400 412.40045166015625
3500 412.3016357421875
3600 412.1995849609375
3700 412.09423828125
3800 411.98553466796875
3900 411.8735046386719
4000 411.7581481933594
4100 411.6396179199219
4200 411.51788330078125
4300 411.3930053710

In [167]:
nn_predict_train_y = model.forward( torch.FloatTensor(normalize_train_x.values.tolist()))
result_train = np.where(nn_predict_train_y[:, 0] > nn_predict_train_y[:, 1], 1, 0)
print('training accuracy:')
print(accuracy_score(data_train_y[0], result_train))

nn_predict_y = model.forward( torch.FloatTensor(normalize_test_x.values.tolist()))
result = np.where(nn_predict_y[:, 0] > nn_predict_y[:, 1], 1, 0)
print('\ntesting accuracy:')
print(accuracy_score(data_test_y[0], result))

print('predicted testing prob:')
print(nn_predict_y)
print('predicted testing labels:')
print(result)

# Print precision, recall, fbeta-score and confusion matrix

from sklearn.metrics import precision_recall_fscore_support
from sklearn.metrics import confusion_matrix

print('precision, recall, fbeta-score:')
print(precision_recall_fscore_support(data_test_y[0], result, average='weighted'))
print('\nconfusion matrix(tn, fp, fn, tp):')
tn, fp, fn, tp = confusion_matrix(data_test_y[0], result).ravel()
print((tn, fp, fn, tp))

training accuracy:
0.5826280623608018

testing accuracy:
0.5879828326180258
predicted testing prob:
tensor([[ 8.6088e-01, -8.6018e-01],
        [-1.6110e+00,  1.6121e+00],
        [-3.6443e-01,  3.6419e-01],
        [ 3.7257e-01, -3.7143e-01],
        [ 8.9495e-01, -8.9405e-01],
        [ 2.2214e+00, -2.2190e+00],
        [-4.3752e-01,  4.3911e-01],
        [-1.4441e+00,  1.4464e+00],
        [-3.0387e+00,  3.0390e+00],
        [-2.8699e+00,  2.8675e+00],
        [-3.8798e-01,  3.8795e-01],
        [ 3.9464e-03, -3.2797e-03],
        [-2.2361e+00,  2.2372e+00],
        [-9.6622e-01,  9.6512e-01],
        [ 2.1817e-01, -2.1661e-01],
        [-1.0329e+00,  1.0339e+00],
        [-1.3235e+00,  1.3250e+00],
        [-1.6270e+00,  1.6276e+00],
        [-1.7154e+00,  1.7152e+00],
        [-1.0837e+00,  1.0836e+00],
        [-1.7026e+00,  1.7034e+00],
        [-8.5092e-01,  8.5128e-01],
        [ 8.2784e-01, -8.2737e-01],
        [ 1.5480e-01, -1.5382e-01],
        [-1.8293e+00,  1.8304e+00],


# Discussion
## How did you preprocess this dataset ?
欄位說明：
- data x:
    - Open Price: 前一天的開盤價
    - Close Price: 前一天的收盤價
    - High Price: 前一天的最高價
    - Low Price: 前一天的最低價
    - Tomorrow Open: 當天的開盤價
    - S_10: 前十天的平均收盤價
    - Corr: 前一天收盤價與前十天平均收盤價的相關度
    - Open-Close: 前一天的開盤價與更前一天的收盤價差
    - Open-Open: 前一天的開盤價與更前一天的開盤價差
- data y:
    - Tomorrow Movement: 隔天的收盤價漲跌(1為漲, 0或-1為跌)

三種分類器中，只有 SVM 與 NN 使用 normalize 過的 `data x`

## Which classifier reaches the highest classification accuracy in this dataset ?
SVM 跑測試資料的 accuracy 最高（可以從上面跑的結果看到各個分類器跑測試資料的 precision, recall, fbeta-score 和 confusion matrix），但其實跑出來的結果也跟 logistic regression 的 accuracy 很接近，可能因為這次使用的 SVM kernel 也是 linear 的。

nn 跑出來的結果從 confusion matrix 的 true-negative 跟 false-negative 的數量都很多，就可以知道它大部分都預測收盤價下跌，precision 也接近 0.7 只是因為它都預測 negative 所以 fp 也很少。

svm 可以表現比較好可能也因為資料的 feature 很少，不到 10 個，svm 就可以比較容易找到 hyperplane。

三個同樣的分類器跑其它股價資料的 accuracy 也都跟跑 sp100 資料的 accuracy 差不多（執行的過程可以看另外兩個檔案： `hw3-nasdaq-data.ipynb` 和 `hw3-fb-data.ipynb`）都是 svm 跟 logistic regression 表現差不多，accuracy 都接近 0.7，而 nn 的 accuracy 最低。

- 跑fb資料結果（訓練資料1400筆左右）：
    - logistic regression
        - training accuracy:
        0.6437275985663082

        - testing accuracy:
        0.6824034334763949
        
        - precision, recall, fbeta-score:
        (0.6849727015269627, 0.6824034334763949, 0.6804216988464749, None)

        - confusion matrix(tn, fp, fn, tp):
        (69, 45, 29, 90)
    - svm
        - training accuracy:
        0.6379928315412187
        
        - testing accuracy:
        0.6824034334763949
        
        - precision, recall, fbeta-score:
        (0.6831712054788209, 0.6824034334763949, 0.6815222651392658, None)

        - confusion matrix(tn, fp, fn, tp):
        (72, 42, 32, 87)
    - nn
        - training accuracy:
        0.617921146953405

        - testing accuracy:
        0.6437768240343348

        - precision, recall, fbeta-score:
        (0.6145076659593196, 0.6094420600858369, 0.6021943961339545, None)

        - confusion matrix(tn, fp, fn, tp):
        (88, 31, 60, 54)
- 跑nasdaq100資料結果（訓練資料2250筆左右）：
    - logistic regression
        - training accuracy:
        0.6898395721925134

        - testing accuracy:
        0.6952789699570815
        
        - precision, recall, fbeta-score:
        (0.6953338947871519, 0.6952789699570815, 0.6942611409862763, None)

        - confusion matrix(tn, fp, fn, tp):
        (71, 40, 31, 91)
    - svm
        - training accuracy:
        0.6836007130124777

        - testing accuracy:
        0.6952789699570815
        
        - precision, recall, fbeta-score:
        (0.695036719122556, 0.6952789699570815, 0.6950423721854327, None)

        - confusion matrix(tn, fp, fn, tp):
        (74, 37, 34, 88)
    - nn
        - training accuracy:
        0.625222816399287

        - testing accuracy:
        0.6137339055793991

        - precision, recall, fbeta-score:
        (0.6714440817799076, 0.6137339055793991, 0.5650350410170044, None)

        - confusion matrix(tn, fp, fn, tp):
        (113, 9, 81, 30)
        
## How did you improve your classifiers ?
- logistic regression：
    把參數`max_iter`調大，讓它把訓練資料多訓練幾次，accuracy 會有比較明顯的增加。

- svm：
    試過各個 kernel function，發現 `linear` kernel 表現較好，將`C`值調大，讓它分錯資料時會有較大的 error，可以找到較適合區分資料的 hyperplane。

- nn：
    一開始會一直都預測 0，試過一些 activation layer，最後加了一層 relu，再調整 batch_size, hidden_size 跟調大 iteration 次數，它預測1的次數才有多一點點但是 accuracy 還是跟 svm 或 logistic regression 差很多。
