# Artificial Intelligence Nanodegree
## Deep Neural Networks
----

## Perceptrons
![dnn_0.png](images/dnn_0.png)

## Perceptrons as Logical Operators

#### AND
![dnn_1.png](images/dnn_1.png)

In [1]:
import pandas as pd

# TODO: Set weight1, weight2, and bias
weight1 = 0.5
weight2 = 0.5
bias = -0.6


# DON'T CHANGE ANYTHING BELOW
# Inputs and outputs
test_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
correct_outputs = [False, False, False, True]
outputs = []

# Generate and check output
for test_input, correct_output in zip(test_inputs, correct_outputs):
    linear_combination = weight1 * test_input[0] + weight2 * test_input[1] + bias
    output = int(linear_combination >= 0)
    is_correct_string = 'Yes' if output == correct_output else 'No'
    outputs.append([test_input[0], test_input[1], linear_combination, output, is_correct_string])

# Print output
num_wrong = len([output[4] for output in outputs if output[4] == 'No'])
output_frame = pd.DataFrame(outputs, columns=['Input 1', '  Input 2', '  Linear Combination', '  Activation Output', '  Is Correct'])
if not num_wrong:
    print('Nice!  You got it all correct.\n')
else:
    print('You got {} wrong.  Keep trying!\n'.format(num_wrong))
print(output_frame.to_string(index=False))

Nice!  You got it all correct.

Input 1    Input 2    Linear Combination    Activation Output   Is Correct
      0          0                  -0.6                    0          Yes
      0          1                  -0.1                    0          Yes
      1          0                  -0.1                    0          Yes
      1          1                   0.4                    1          Yes


#### OR
![dnn_2.png](images/dnn_2.png)

#### NOT

In [2]:
import pandas as pd

# TODO: Set weight1, weight2, and bias
weight1 = 0.0
weight2 = -0.5
bias = 0.4


# DON'T CHANGE ANYTHING BELOW
# Inputs and outputs
test_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
correct_outputs = [True, False, True, False]
outputs = []

# Generate and check output
for test_input, correct_output in zip(test_inputs, correct_outputs):
    linear_combination = weight1 * test_input[0] + weight2 * test_input[1] + bias
    output = int(linear_combination >= 0)
    is_correct_string = 'Yes' if output == correct_output else 'No'
    outputs.append([test_input[0], test_input[1], linear_combination, output, is_correct_string])

# Print output
num_wrong = len([output[4] for output in outputs if output[4] == 'No'])
output_frame = pd.DataFrame(outputs, columns=['Input 1', '  Input 2', '  Linear Combination', '  Activation Output', '  Is Correct'])
if not num_wrong:
    print('Nice!  You got it all correct.\n')
else:
    print('You got {} wrong.  Keep trying!\n'.format(num_wrong))
print(output_frame.to_string(index=False))

Nice!  You got it all correct.

Input 1    Input 2    Linear Combination    Activation Output   Is Correct
      0          0                   0.4                    1          Yes
      0          1                  -0.1                    0          Yes
      1          0                   0.4                    1          Yes
      1          1                  -0.1                    0          Yes


#### XOR
![dnn_3.png](images/dnn_3.png)
![dnn_4.png](images/dnn_4.png)
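
A single perceptron cannot represent XOR, but a small network of perceptrons can. The sketch below assumes one common construction, XOR(a, b) = AND(OR(a, b), NAND(a, b)). The weights and biases are illustrative choices, not values read from the figures.

```python
def perceptron(inputs, weights, bias):
    """Step-activated perceptron: 1 if the weighted sum plus bias is >= 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return int(total >= 0)

def AND(a, b):
    return perceptron((a, b), (1.0, 1.0), -1.5)

def OR(a, b):
    return perceptron((a, b), (1.0, 1.0), -0.5)

def NAND(a, b):
    return perceptron((a, b), (-1.0, -1.0), 1.5)

def XOR(a, b):
    # Hidden layer: OR and NAND. Output layer: AND of the two results.
    return AND(OR(a, b), NAND(a, b))

xor_table = {(a, b): XOR(a, b) for a in (0, 1) for b in (0, 1)}
print(xor_table)
```

The two-layer structure is the point here: no single choice of `weights` and `bias` passed to `perceptron` reproduces `xor_table`.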

## Perceptron Trick

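Start from a randomly drawn line, then move it a little at a time, with the step size set by the learning rate, until it becomes a good separating line.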
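
As a minimal sketch of one such move, assume a hypothetical line 3*x1 + 4*x2 - 10 = 0, a misclassified point (1, 1) whose true label is 1, and a learning rate of 0.1. All of these values, and the helper name `perceptron_trick`, are illustrative. Adding the learning rate times the point to the weights pulls the line toward the point:

```python
import numpy as np

def perceptron_trick(w, b, point, label, learn_rate=0.1):
    """Nudge the line w.x + b = 0 toward one misclassified point."""
    score = np.dot(w, point) + b
    predicted = 1 if score >= 0 else 0
    if predicted == label:
        return w, b                      # correctly classified: leave the line alone
    # Add for a missed positive point, subtract for a missed negative point.
    sign = 1 if label == 1 else -1
    return w + sign * learn_rate * point, b + sign * learn_rate

w = np.array([3.0, 4.0])
b = -10.0
point = np.array([1.0, 1.0])

before = np.dot(w, point) + b
w_new, b_new = perceptron_trick(w, b, point, label=1)
after = np.dot(w_new, point) + b_new
print(before, after)
```

The score of the point rises from `before` to `after`, so repeating the step eventually puts the point on the positive side of the line.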

## Perceptron Algorithm

In [3]:
import numpy as np
# Setting the random seed, feel free to change it and see different solutions.
np.random.seed(42)

def stepFunction(t):
    if t >= 0:
        return 1
    return 0

def prediction(X, W, b):
    return stepFunction((np.matmul(X,W)+b)[0])

# TODO: Fill in the code below to implement the perceptron trick.
# The function should receive as inputs the data X, the labels y,
# the weights W (as an array), and the bias b,
# update the weights and bias W, b, according to the perceptron algorithm,
# and return W and b.
def perceptronStep(X, y, W, b, learn_rate = 0.01):
    for i in range(len(X)):
        y_hat = prediction(X[i],W,b)
        if y[i]-y_hat == 1:
            W[0] += X[i][0]*learn_rate
            W[1] += X[i][1]*learn_rate
            b += learn_rate
        elif y[i]-y_hat == -1:
            W[0] -= X[i][0]*learn_rate
            W[1] -= X[i][1]*learn_rate
            b -= learn_rate
    return W, b
    
# This function runs the perceptron algorithm repeatedly on the dataset,
# and returns a few of the boundary lines obtained in the iterations,
# for plotting purposes.
# Feel free to play with the learning rate and the num_epochs,
# and see your results plotted below.
def trainPerceptronAlgorithm(X, y, learn_rate = 0.01, num_epochs = 25):
    x_min, x_max = min(X.T[0]), max(X.T[0])
    y_min, y_max = min(X.T[1]), max(X.T[1])
    W = np.array(np.random.rand(2,1))
    b = np.random.rand(1)[0] + x_max
    # These are the solution lines that get plotted below.
    boundary_lines = []
    for i in range(num_epochs):
        # In each epoch, we apply the perceptron step.
        W, b = perceptronStep(X, y, W, b, learn_rate)
        boundary_lines.append((-W[0]/W[1], -b/W[1]))
    return boundary_lines


## Discrete vs Continuous

![dnn_5.png](images/dnn_5.png)

## Softmax

Applying `exp` to each score makes every value positive.

![dnn_6.png](images/dnn_6.png)

In [4]:
import numpy as np

# Write a function that takes as input a list of numbers, and returns
# the list of values given by the softmax function.
def softmax(L):
    expL = np.exp(L)
    sumExpL = sum(expL)
    result = []
    for i in expL:
        result.append(i*1.0/sumExpL)
    return result
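
For very large scores, `np.exp` can overflow. A common remedy, sketched here as an optional variant rather than the course solution, is to subtract the largest score before exponentiating, which leaves the result mathematically unchanged. The name `stable_softmax` is hypothetical.

```python
import numpy as np

def stable_softmax(scores):
    """Softmax that subtracts the max score first to avoid overflow in exp."""
    scores = np.asarray(scores, dtype=float)
    shifted = scores - np.max(scores)    # largest entry becomes 0, so exp(...) <= 1
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores)

probs = stable_softmax([2.0, 1.0, 0.0])
print(probs, probs.sum())

# Scores this large would overflow a naive exp, but work here.
big = stable_softmax([1000.0, 1001.0, 1002.0])
print(big)
```

Because only the differences between scores matter, `big` holds the same probabilities as `probs`, in reverse order.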

## Cross-Entropy

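Cross-entropy is a loss function, computed by applying -ln (the negative natural logarithm) to the predicted probabilities. The lower the cross-entropy, the better the model.  
Goal: minimize the cross-entropy.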

![dnn_7.png](images/dnn_7.png)

In [5]:
import numpy as np

# Write a function that takes as input two lists Y, P,
# and returns the float corresponding to their cross-entropy.
def cross_entropy(Y, P):
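    # np.float_ was removed in NumPy 2.0; np.asarray works on old and new versions.
    Y = np.asarray(Y, dtype=float)
    P = np.asarray(P, dtype=float)

    return -np.sum(Y * np.log(P) + (1 - Y) * np.log(1 - P))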
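
To see that a lower cross-entropy means a better model, the sketch below scores two hypothetical sets of predictions against the same labels. The function name `binary_cross_entropy`, the `eps` clipping (added so that `log(0)` never occurs), and the sample numbers are all illustrative assumptions.

```python
import numpy as np

def binary_cross_entropy(Y, P, eps=1e-12):
    """Sum of -ln(probability assigned to the true label) over all points."""
    Y = np.asarray(Y, dtype=float)
    # Clip probabilities away from 0 and 1 so the logarithms stay finite.
    P = np.clip(np.asarray(P, dtype=float), eps, 1 - eps)
    return -np.sum(Y * np.log(P) + (1 - Y) * np.log(1 - P))

labels = [1, 0, 1, 1]
good_model = [0.9, 0.1, 0.8, 0.7]   # high probability on the true labels
bad_model = [0.4, 0.6, 0.1, 0.5]    # low probability on the true labels

good_loss = binary_cross_entropy(labels, good_model)
bad_loss = binary_cross_entropy(labels, bad_model)
print(good_loss, bad_loss)
```

`good_loss` comes out at roughly 0.79 and `bad_loss` at roughly 4.83, so the model that assigns more probability to the true labels has the lower cross-entropy.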

## Logistic Regression

![dnn_8.png](images/dnn_8.png)
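
The update rule in the algorithm that follows comes from differentiating the cross-entropy error of a single point. As a short derivation, assuming two inputs and writing $\alpha$ for the learning rate (`learn_rate` in the code):

$$
\hat{y} = \sigma(w_1 x_1 + w_2 x_2 + b), \qquad
E = -y \ln \hat{y} - (1 - y) \ln(1 - \hat{y})
$$

Using $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$, the chain rule gives

$$
\frac{\partial E}{\partial w_j} = -(y - \hat{y})\, x_j, \qquad
\frac{\partial E}{\partial b} = -(y - \hat{y})
$$

Gradient descent moves against the gradient, so each step is

$$
w_j \leftarrow w_j + \alpha\,(y - \hat{y})\, x_j, \qquad
b \leftarrow b + \alpha\,(y - \hat{y})
$$

This is why the code adds `(y - y_hat) * x` to the weights rather than subtracting it.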

## Logistic Regression Algorithm

In [6]:
import numpy as np
# Setting the random seed, feel free to change it and see different solutions.
np.random.seed(42)

def sigmoid(x):
    return 1/(1+np.exp(-x))
def sigmoid_prime(x):
    return sigmoid(x)*(1-sigmoid(x))
def prediction(X, W, b):
    return sigmoid(np.matmul(X,W)+b)
def error_vector(y, y_hat):
    return [-y[i]*np.log(y_hat[i]) - (1-y[i])*np.log(1-y_hat[i]) for i in range(len(y))]
def error(y, y_hat):
    ev = error_vector(y, y_hat)
    return sum(ev)/len(ev)

# TODO: Fill in the code below to calculate the gradient of the error function.
# The result should be a list of three lists:
# The first list should contain the gradient (partial derivatives) with respect to w1
# The second list should contain the gradient (partial derivatives) with respect to w2
# The third list should contain the gradient (partial derivatives) with respect to b
def dErrors(X, y, y_hat):
    DErrorsDx1 = [X[i][0]*(y[i]-y_hat[i]) for i in range(len(y))]
    DErrorsDx2 = [X[i][1]*(y[i]-y_hat[i]) for i in range(len(y))]
    DErrorsDb = [y[i]-y_hat[i] for i in range(len(y))]
    return DErrorsDx1, DErrorsDx2, DErrorsDb

# TODO: Fill in the code below to implement the gradient descent step.
# The function should receive as inputs the data X, the labels y,
# the weights W (as an array), and the bias b.
# It should calculate the prediction, the gradients, and use them to
# update the weights and bias W, b. Then return W and b.
# The error e will be calculated and returned for you, for plotting purposes.
def gradientDescentStep(X, y, W, b, learn_rate = 0.01):
    y_hat = prediction(X,W,b)
    errors = error_vector(y, y_hat)
    derivErrors = dErrors(X, y, y_hat)
    W[0] += sum(derivErrors[0])*learn_rate
    W[1] += sum(derivErrors[1])*learn_rate
    b += sum(derivErrors[2])*learn_rate
    return W, b, sum(errors)

# This function runs the gradient descent step repeatedly on the dataset,
# and returns a few of the boundary lines obtained in the iterations,
# for plotting purposes.
# Feel free to play with the learning rate and the num_epochs,
# and see your results plotted below.
def trainLR(X, y, learn_rate = 0.01, num_epochs = 100):
    x_min, x_max = min(X.T[0]), max(X.T[0])
    y_min, y_max = min(X.T[1]), max(X.T[1])
    # Initialize the weights randomly
    W = np.array(np.random.rand(2,1))*2 -1
    b = np.random.rand(1)[0]*2 - 1
    # These are the solution lines that get plotted below.
    boundary_lines = []
    errors = []
    for i in range(num_epochs):
        # In each epoch, we apply the gradient descent step.
        W, b, error = gradientDescentStep(X, y, W, b, learn_rate)
        boundary_lines.append((-W[0]/W[1], -b/W[1]))
        errors.append(error)
    return boundary_lines, errors

## Neural Network Architecture

In practice, many problems cannot be handled by a linear model. A neural network built by connecting perceptrons can implement a non-linear model.

![dnn_9.png](images/dnn_9.png)
![dnn_10.png](images/dnn_10.png)

A network consists of an input layer, hidden layers, and an output layer. The output layer can have several nodes, which makes multi-class classification models possible; in that case softmax is the usual activation function.
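
As a rough sketch of how data flows through such a network, the NumPy code below runs one forward pass. The layer sizes (2 inputs, 3 hidden nodes, 4 output classes), the random weights, and the sigmoid hidden activation are assumptions made for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    exp_z = np.exp(z - np.max(z))
    return exp_z / np.sum(exp_z)

rng = np.random.default_rng(0)

# Hypothetical sizes: 2 input features, 3 hidden nodes, 4 output classes.
W_hidden = rng.normal(size=(2, 3))
b_hidden = np.zeros(3)
W_output = rng.normal(size=(3, 4))
b_output = np.zeros(4)

def forward(x):
    """Input layer -> hidden layer (sigmoid) -> output layer (softmax)."""
    hidden = sigmoid(x @ W_hidden + b_hidden)
    return softmax(hidden @ W_output + b_output)

class_probs = forward(np.array([0.5, -1.2]))
print(class_probs, class_probs.sum())
```

The output holds one probability per class and sums to 1, which is what makes softmax a natural fit for a multi-node output layer.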

## Keras

### Sequential Model

In [7]:
from keras.models import Sequential

model = Sequential()  # Create a Sequential model (a linear stack of layers)

### Layers

In [8]:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation

# X has shape (num_rows, num_cols), where the training data are stored
# as row vectors
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)  # inputs

# y must have an output vector for each input vector
y = np.array([[0], [0], [0], [1]], dtype=np.float32)  # target labels

# Create the Sequential model
model = Sequential()

# 1st Layer - Add an input layer of 32 nodes with the same input shape as
# the training samples in X
model.add(Dense(32, input_dim=X.shape[1]))  # Only the first layer needs an input shape; later layers infer theirs.
# Hidden layer: 32 nodes, each taking a 2-element input vector

# Add a softmax activation layer
model.add(Activation('softmax'))  # softmax as the activation function
# The activation can also be set in the layer itself, e.g. model.add(Dense(128, activation="softmax"))

# 2nd Layer - Add a fully connected output layer
model.add(Dense(1))  # Output layer with a single node

# Add a sigmoid activation layer
model.add(Activation('sigmoid'))  # sigmoid is the usual output activation for binary classification

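# model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
# # Compile after building the model and before training. This configures the backend
# # (TensorFlow, etc.) and sets parameters such as the optimizer and the loss function.
# # Loss: binary_crossentropy matches a single sigmoid output (two classes);
# # use categorical_crossentropy for one-hot labels with more than two classes.
# # Optimizer: adam. Evaluation metric: accuracy.

# model.summary()  # Prints the model architecture

# model.fit(X, y, epochs=1000, verbose=0)  # Train the model (Keras 1.x named this argument nb_epoch)

# model.evaluate(X, y)  # Evaluate the model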

In [11]:
# XOR example

import numpy as np
# The original run used TensorFlow 1.0.0, which needed the workaround
# `tf.python.control_flow_ops = tf`; it is not needed on current versions and is omitted here.

# Set random seed
np.random.seed(42)

# Our data
X = np.array([[0,0],[0,1],[1,0],[1,1]]).astype('float32')
y = np.array([[0],[1],[1],[0]]).astype('float32')

# Initial Setup for Keras
from keras.models import Sequential
from keras.layers import Dense, Activation

# Building the model
xor = Sequential()

# Add required layers
xor.add(Dense(8, input_dim=X.shape[1]))
xor.add(Activation("tanh"))
xor.add(Dense(1))
xor.add(Activation("sigmoid"))

# Specify loss as "binary_crossentropy", optimizer as "adam",
# and add the accuracy metric
xor.compile(loss="binary_crossentropy", optimizer="adam", metrics = ['accuracy'])

# Uncomment this line to print the model architecture
# xor.summary()

# Fitting the model
# More epochs mean more passes over the data, which usually raises training accuracy.
# (Keras 1.x named this argument nb_epoch.)
history = xor.fit(X, y, epochs=50, verbose=0)

# Scoring the model
score = xor.evaluate(X, y)
print("\nAccuracy: ", score[-1])

# Checking the predictions
# (predict_proba was removed from newer Keras; with a sigmoid output, predict returns probabilities.)
print("\nPredictions:")
print(xor.predict(X))




Accuracy:  0.5

Predictions:
[[ 0.51922798]
 [ 0.54183304]
 [ 0.46536183]
 [ 0.47800067]]


## Batch vs Stochastic Gradient Descent

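Split the data into batches and repeat the weight update once per batch. For example, 20 data points can be split into 5 batches of 4, giving 5 updates per pass over the data.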
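
The 20-point example can be sketched as follows. The random data and the helper name `iterate_minibatches` are hypothetical; the point is only how one pass over the data is cut into batches.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(20, 2))                 # 20 hypothetical data points
y = (X[:, 0] + X[:, 1] > 0).astype(float)    # made-up labels

def iterate_minibatches(X, y, batch_size, rng):
    """Yield shuffled (X_batch, y_batch) pairs that together cover the data once."""
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]

batches = list(iterate_minibatches(X, y, batch_size=4, rng=rng))
print(len(batches), [len(xb) for xb, _ in batches])
```

A training loop would run one gradient descent step per yielded batch, so each epoch performs five small updates instead of a single update over all 20 points.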

## Overfitting and Underfitting

![dnn_11.png](images/dnn_11.png)

## Early Stopping

![dnn_12.png](images/dnn_12.png)
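
Early stopping is commonly implemented by watching the validation loss and stopping once it has not improved for a set number of epochs. The sketch below assumes a hypothetical list of per-epoch validation losses and a hypothetical `patience` of 2; the function name is also made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the epoch with the best validation loss, scanning until the
    loss has failed to improve for `patience` epochs in a row."""
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break                    # no improvement for `patience` epochs
    return best_epoch

# Hypothetical validation losses: they fall, then rise as the model overfits.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.58, 0.66]
stop_at = early_stopping_epoch(val_losses)
print(stop_at)
```

Here the scan stops after epoch 5 and reports epoch 3 as the best one. In Keras, the `EarlyStopping` callback with its `monitor` and `patience` arguments serves the same purpose.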

## Dropout

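During training, some randomly chosen nodes are left out on each pass (in Keras, on every training batch), and the result is computed without them. At prediction time, as opposed to training, all nodes must be included.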

In [None]:
from keras.layers import Dense, Dropout

model.add(Dense(32, activation='sigmoid'))
model.add(Dropout(0.2))  # Add dropout: drop 20% of this layer's outputs during training
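
To make the difference between training and prediction concrete, here is a minimal NumPy sketch. It assumes the "inverted dropout" formulation, in which surviving activations are scaled up during training so that nothing needs rescaling at prediction time. The function name and values are illustrative, and this is not a description of Keras internals.

```python
import numpy as np

def dropout(activations, rate, training, rng):
    """Inverted dropout: zero a random fraction `rate` of nodes during training only."""
    if not training:
        return activations               # prediction: every node is used
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    # Scale the survivors so the expected activation is unchanged.
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
hidden = np.ones(10)

train_out = dropout(hidden, rate=0.2, training=True, rng=rng)
test_out = dropout(hidden, rate=0.2, training=False, rng=rng)
print(train_out)
print(test_out)
```

In `train_out` each entry is either 0 (dropped) or 1.25 (kept and rescaled), while `test_out` is identical to `hidden`.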