# Neural Networks

* Classification & Regression
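* A representative black-box model
  + cf. Decision Tree
* Brain-like
  + Network of neurons
  + The model is determined by the connection strengths (weights) between neurons
  + Handles hard, complex problems well
      - More complex problem &rarr; more neurons &rarr; higher computational cost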

## Perceptron
* The basic building block of a neural network
* Simple, single neuron (corresponds to a single brain cell)
* $y = f(\sum_i x_i w_i)$
    + $f$ : Step function
    + $x$ : Inputs
    + $w$ : Weights ($w_0$ : weight for bias)

![Perceptron](images/perceptron.PNG)

In [2]:
import sys
sys.path.append('..')

from scratch.linear_algebra import Vector, dot

In [3]:
def step_function(x: float) -> float:
    return 1.0 if x >= 0 else 0.0

def perceptron_output(weights: Vector, bias: float, x: Vector) -> float:
    """Returns 1 if the perceptron 'fires', 0 if not"""
    calculation = dot(weights, x) + bias
    return step_function(calculation)

## Solving Simple Problems with a Perceptron
* Logic Gates
 
|x|y|AND(x,y)|OR(x,y)|NOT(x)|XOR(x,y)|
|---|---|---|---|---|---|
|0|0|0|0|1|0|
|0|1|0|1|1|1|
|1|0|0|1|0|1|
|1|1|1|1|0|0|

* Solving the AND, OR, and NOT gates with a perceptron
* Solving the problem = finding the weights

<img src="images/simple_gates.png" alt="Simple Gates" style="width:600px;" align="left"/>

In [4]:
and_weights = [2., 2]
and_bias = -3.

assert perceptron_output(and_weights, and_bias, [1, 1]) == 1
assert perceptron_output(and_weights, and_bias, [0, 1]) == 0
assert perceptron_output(and_weights, and_bias, [1, 0]) == 0
assert perceptron_output(and_weights, and_bias, [0, 0]) == 0

or_weights = [2., 2]
or_bias = -1.

assert perceptron_output(or_weights, or_bias, [1, 1]) == 1
assert perceptron_output(or_weights, or_bias, [0, 1]) == 1
assert perceptron_output(or_weights, or_bias, [1, 0]) == 1
assert perceptron_output(or_weights, or_bias, [0, 0]) == 0

not_weights = [-2.]
not_bias = 1.

assert perceptron_output(not_weights, not_bias, [0]) == 1
assert perceptron_output(not_weights, not_bias, [1]) == 0
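
Since solving a gate amounts to finding weights, one naive way to find them is to brute-force a small integer grid. The sketch below is only an illustration, not the method used in this notebook: `find_gate_weights` and the grid range -3..3 are hypothetical choices made for this example.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

def step_function(x: float) -> float:
    return 1.0 if x >= 0 else 0.0

def find_gate_weights(
    truth_table: Dict[Tuple[int, int], int]
) -> Optional[Tuple[List[int], int]]:
    """Hypothetical helper: try every (w1, w2, bias) in a small integer grid
    and return the first combination that reproduces the truth table."""
    grid = range(-3, 4)
    for w1, w2, bias in product(grid, repeat=3):
        if all(step_function(w1 * x + w2 * y + bias) == out
               for (x, y), out in truth_table.items()):
            return [w1, w2], bias
    return None  # no single perceptron in this grid fits the table

AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
OR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

print("AND:", find_gate_weights(AND))
print("OR: ", find_gate_weights(OR))
print("XOR:", find_gate_weights(XOR))
```

The search finds weights for AND and OR but returns `None` for the XOR column of the truth table, because no single line separates its outputs regardless of the grid searched.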

## XOR gate?
* (0,1), (1,0) &rarr; 1
* (0,0), (1,1) &rarr; 0

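Not as easy as it looks
* In the 2D plane, no single straight line (decision boundary) separates the XOR outputs
* In other words, a single perceptron cannot compute XOR
* It can be solved analytically by composing gates, though...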
    + XOR(x,y) = OR(x,y) AND (NOT(AND(x,y)))

![XOR Gate](images/xor_gate_naive.png)
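
The decomposition XOR(x,y) = OR(x,y) AND (NOT(AND(x,y))) can be sketched by chaining the hand-picked AND, OR, and NOT perceptrons from the previous section. This is a minimal sketch: `xor_from_gates` is a hypothetical name, and `dot` is a stand-in assumed to behave like `scratch.linear_algebra.dot`.

```python
from typing import List

Vector = List[float]

def dot(v: Vector, w: Vector) -> float:
    """Stand-in (assumed equivalent) for scratch.linear_algebra.dot."""
    return sum(v_i * w_i for v_i, w_i in zip(v, w))

def step_function(x: float) -> float:
    return 1.0 if x >= 0 else 0.0

def perceptron_output(weights: Vector, bias: float, x: Vector) -> float:
    return step_function(dot(weights, x) + bias)

def xor_from_gates(x: float, y: float) -> float:
    """XOR(x, y) = OR(x, y) AND NOT(AND(x, y)), one perceptron per gate."""
    and_out = perceptron_output([2., 2.], -3., [x, y])     # AND gate
    or_out = perceptron_output([2., 2.], -1., [x, y])      # OR gate
    not_and = perceptron_output([-2.], 1., [and_out])      # NOT gate
    return perceptron_output([2., 2.], -3., [or_out, not_and])  # final AND

for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x, y), xor_from_gates(x, y))
```

It takes four perceptrons wired by hand to do what one could not, which is the motivation for a more systematic layered structure.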


## Feed-Forward Neural Network
* A systematic model that connects multiple perceptrons
* Multi-layer Perceptron (Layered Network of Neurons, "Neural Network")
  + Input Layer
  + Hidden Layers (one or more)
  + Output Layer
  + Each layer consists of multiple perceptrons
* Activation function : Sigmoid
  + An S-shaped function
  + $f(x) = \frac{1}{1 + e^{-x}}$
  + Roughly linear in the middle range, but converges to a fixed value (0 or 1) as the input becomes very small or very large
    - This transforms the model's raw score so that larger values push the output toward 1 and smaller values push it toward 0
  + Wiki: [Sigmoid function](https://ko.wikipedia.org/wiki/%EC%8B%9C%EA%B7%B8%EB%AA%A8%EC%9D%B4%EB%93%9C_%ED%95%A8%EC%88%98)

![Feed-Forward Neural Network](images/feed_forward_nn.png)


In [5]:
import math

def sigmoid(t: float) -> float:
    return 1 / (1 + math.exp(-t))

def neuron_output(weights: Vector, inputs: Vector) -> float:
    # weights includes the bias term, inputs includes a 1
    return sigmoid(dot(weights, inputs))

![Step Function vs. Sigmoid Function](images/step_vs_sigmoid.png)
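
To make the comparison in the figure concrete, the following sketch evaluates both activation functions at a few points. The sample points are arbitrary choices for illustration.

```python
import math

def step_function(x: float) -> float:
    return 1.0 if x >= 0 else 0.0

def sigmoid(t: float) -> float:
    return 1 / (1 + math.exp(-t))

# The step function jumps from 0 to 1 at zero; the sigmoid moves smoothly
# between the same two limits and saturates for large |t|.
for t in [-10, -1, 0, 1, 10]:
    print(f"{t:>4}  step={step_function(t):.0f}  sigmoid={sigmoid(t):.4f}")
```

The sigmoid equals 0.5 at zero and is differentiable everywhere, which is what makes the gradient-based training described later possible.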

In [6]:
from typing import List

def feed_forward(neural_network: List[List[Vector]],
                 input_vector: Vector) -> List[Vector]:
    """
    Feeds the input vector through the neural network.
    Returns the outputs of all layers (not just the last one).
    """
    outputs: List[Vector] = []

    for layer in neural_network:
        input_with_bias = input_vector + [1]              # Add a constant.
        output = [neuron_output(neuron, input_with_bias)  # Compute the output
                  for neuron in layer]                    # for each neuron.
        outputs.append(output)                            # Add to results.

        # Then the input to the next layer is the output of this one
        input_vector = output

    return outputs

### Building a Feed-Forward Network - XOR Gate
* Input Layer : 2 neurons (one per input feature) + bias
* Hidden Layer : 2 neurons
    + Why two, and how the weights were found, is skipped here (assume they have already been found!)
* Output Layer : 1 neuron (to produce a value of 0 or 1)

#### Feed-Forward
* The process of computing the output once the weights are known

![Neural Network for XOR](images/xor_network.png)

In [7]:
xor_network = [ # hidden layer
            [[20, 20, -30], # 'and' neuron
             [20, 20, -10]], # 'or' neuron
            # output layer
            [[-60, 60, -30]]] # '2nd input but not 1st input' neuron

# feed_forward returns the outputs of all layers, so the [-1] gets the
# final output, and the [0] gets the value out of the resulting vector
assert 0.000 < feed_forward(xor_network, [0, 0])[-1][0] < 0.001
assert 0.999 < feed_forward(xor_network, [1, 0])[-1][0] < 1.000
assert 0.999 < feed_forward(xor_network, [0, 1])[-1][0] < 1.000
assert 0.000 < feed_forward(xor_network, [1, 1])[-1][0] < 0.001

inputs = [[0, 0], [0, 1], [1, 0], [1, 1]]
for input in inputs:
    print( input, round(feed_forward(xor_network, input)[-1][0],2) )

[0, 0] 0.0
[0, 1] 1.0
[1, 0] 1.0
[1, 1] 0.0


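### Role of the Hidden Layer
* Acts like a "processing stage"
    + Each stage narrows down the number of cases that remain to be distinguished
* The more complex the problem &rarr; more hidden layers, more neurons &rarr; how do we find the weights?
    + Finding the weights is hard even for the simple XOR gate...

![Role of the Hidden Layer](images/hidden_layer_effect.png)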
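
One way to see the "processing stage" effect is to inspect the hidden-layer outputs of the hand-built XOR network. The sketch below restates `sigmoid` and `feed_forward` so it runs on its own, copies the `xor_network` weights from the earlier cell, and uses a hypothetical helper `hidden_and_final` that rounds the activations for readability.

```python
import math
from typing import List, Tuple

Vector = List[float]

def sigmoid(t: float) -> float:
    return 1 / (1 + math.exp(-t))

def feed_forward(neural_network: List[List[Vector]],
                 input_vector: Vector) -> List[Vector]:
    """Returns the outputs of every layer, as in the cell above."""
    outputs: List[Vector] = []
    for layer in neural_network:
        input_with_bias = input_vector + [1]
        output = [sigmoid(sum(w * x for w, x in zip(neuron, input_with_bias)))
                  for neuron in layer]
        outputs.append(output)
        input_vector = output
    return outputs

xor_network = [[[20, 20, -30],    # 'and' neuron
                [20, 20, -10]],   # 'or' neuron
               [[-60, 60, -30]]]  # output neuron

def hidden_and_final(x: Vector) -> Tuple[List[int], int]:
    """Hypothetical helper: rounded hidden activations and final output."""
    hidden, final = feed_forward(xor_network, x)
    return [round(h) for h in hidden], round(final[0])

for x in [[0, 0], [0, 1], [1, 0], [1, 1]]:
    print(x, "->", hidden_and_final(x))
```

The four input cases collapse to three hidden states ([0, 0], [0, 1], [1, 1]), and those three are separable by a single output neuron.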

## Backpropagation
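* A machine-learning method for finding (optimizing) the weights of a neural network
* Weights : the trained model
* Finding the weights : the learning process (iterate, compare predictions against observed targets, and adjust using Gradient Descent)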

<ol>
<li> Run feed_forward on an input vector to produce the outputs of all the neurons in the network.
<li> We know the target output, so we can compute a <i>loss</i> that's the sum of the squared errors.
<li> Compute the gradient of this loss as a function of the output neurons' weights.
<li> “Propagate” the gradients and errors backward to compute the gradients with respect to the hidden neurons' weights.
<li> Take a gradient descent step.
</ol>
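
For a network with one hidden layer and sigmoid activations, the steps above can be written out as follows. The notation is an assumption introduced here: $x_i$ are the inputs (with a constant 1 appended), $h_j$ the hidden outputs (likewise with a constant 1 appended), $o_k$ the final outputs, $t_k$ the targets, and $z$ a neuron's weighted sum before the sigmoid $\sigma$. The loss is taken as $L = \frac{1}{2}\sum_k (o_k - t_k)^2$; using the plain sum of squared errors only multiplies every gradient by 2.

$$
\begin{aligned}
\sigma'(z) &= \sigma(z)\,\bigl(1 - \sigma(z)\bigr) \\
\delta^{\text{out}}_k &= \frac{\partial L}{\partial z^{\text{out}}_k} = (o_k - t_k)\, o_k\,(1 - o_k) \\
\frac{\partial L}{\partial w^{\text{out}}_{kj}} &= \delta^{\text{out}}_k\, h_j \\
\delta^{\text{hid}}_j &= \frac{\partial L}{\partial z^{\text{hid}}_j} = h_j\,(1 - h_j) \sum_k \delta^{\text{out}}_k\, w^{\text{out}}_{kj} \\
\frac{\partial L}{\partial w^{\text{hid}}_{ji}} &= \delta^{\text{hid}}_j\, x_i
\end{aligned}
$$

The sum over $k$ in the fourth line is the "propagation" step: each hidden neuron collects the output errors, weighted by its outgoing connections.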

<img align="left"  src="images/backprop.png">

### Example
<img align="left"  src="images/backprop_ex.png">

In [8]:
def sqerror_gradients(network: List[List[Vector]],
                      input_vector: Vector,
                      target_vector: Vector) -> List[List[Vector]]:
    """
    Given a neural network, an input vector, and a target vector,
    make a prediction and compute the gradient of the squared error
    loss with respect to the neuron weights.
    """
    # forward pass
    hidden_outputs, outputs = feed_forward(network, input_vector)

    # gradients with respect to output neuron pre-activation outputs
    output_deltas = [output * (1 - output) * (output - target)
                     for output, target in zip(outputs, target_vector)]

    # gradients with respect to output neuron weights
    output_grads = [[output_deltas[i] * hidden_output
                     for hidden_output in hidden_outputs + [1]]
                    for i, output_neuron in enumerate(network[-1])]

    # gradients with respect to hidden neuron pre-activation outputs
    hidden_deltas = [hidden_output * (1 - hidden_output) *
                         dot(output_deltas, [n[i] for n in network[-1]])
                     for i, hidden_output in enumerate(hidden_outputs)]

    # gradients with respect to hidden neuron weights
    hidden_grads = [[hidden_deltas[i] * input for input in input_vector + [1]]
                    for i, hidden_neuron in enumerate(network[0])]

    return [hidden_grads, output_grads]
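
A common sanity check for a hand-written backpropagation routine is to compare it with finite-difference estimates. The sketch below restates the functions compactly so it runs on its own. `half_squared_error`, `numeric_gradients`, and the test weights are hypothetical, and `dot` is a stand-in for `scratch.linear_algebra.dot`. The formulas in `sqerror_gradients` correspond to half the sum of squared errors, so that is the loss differentiated numerically here.

```python
import math
from typing import List

Vector = List[float]

def dot(v: Vector, w: Vector) -> float:
    return sum(v_i * w_i for v_i, w_i in zip(v, w))

def sigmoid(t: float) -> float:
    return 1 / (1 + math.exp(-t))

def feed_forward(network: List[List[Vector]], input_vector: Vector) -> List[Vector]:
    outputs: List[Vector] = []
    for layer in network:
        with_bias = input_vector + [1]
        input_vector = [sigmoid(dot(neuron, with_bias)) for neuron in layer]
        outputs.append(input_vector)
    return outputs

def sqerror_gradients(network, input_vector, target_vector):
    # Same computation as the cell above, restated so this block runs alone.
    hidden_outputs, outputs = feed_forward(network, input_vector)
    output_deltas = [o * (1 - o) * (o - t) for o, t in zip(outputs, target_vector)]
    output_grads = [[d * h for h in hidden_outputs + [1]] for d in output_deltas]
    hidden_deltas = [h * (1 - h) * dot(output_deltas, [n[i] for n in network[-1]])
                     for i, h in enumerate(hidden_outputs)]
    hidden_grads = [[d * x for x in input_vector + [1]] for d in hidden_deltas]
    return [hidden_grads, output_grads]

def half_squared_error(network, x, target) -> float:
    prediction = feed_forward(network, x)[-1]
    return 0.5 * sum((p - t) ** 2 for p, t in zip(prediction, target))

def numeric_gradients(network, x, target, h: float = 1e-6):
    """Central-difference estimate for every weight, one at a time."""
    grads = []
    for layer in network:
        layer_grads = []
        for neuron in layer:
            neuron_grads = []
            for k in range(len(neuron)):
                original = neuron[k]
                neuron[k] = original + h
                loss_up = half_squared_error(network, x, target)
                neuron[k] = original - h
                loss_down = half_squared_error(network, x, target)
                neuron[k] = original  # restore the weight
                neuron_grads.append((loss_up - loss_down) / (2 * h))
            layer_grads.append(neuron_grads)
        grads.append(layer_grads)
    return grads

network = [[[0.5, -0.3, 0.1], [0.2, 0.8, -0.4]],  # hidden layer (arbitrary weights)
           [[0.7, -0.6, 0.05]]]                    # output layer
x, target = [1.0, 0.0], [1.0]

analytic = sqerror_gradients(network, x, target)
numeric = numeric_gradients(network, x, target)
max_diff = max(abs(a - n)
               for a_layer, n_layer in zip(analytic, numeric)
               for a_neuron, n_neuron in zip(a_layer, n_layer)
               for a, n in zip(a_neuron, n_neuron))
print("largest analytic-vs-numeric difference:", max_diff)
```

If the two disagree by more than rounding noise, the backward pass most likely has an indexing or sign error.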

### XOR - Solving It with a Neural Network

In [117]:
import random
random.seed(0)

# training data
xs = [[0., 0], [0., 1], [1., 0], [1., 1]]
ys = [[0.], [1.], [1.], [0.]]

# start with random weights
network = [ # hidden layer: 2 inputs -> 2 outputs
            [[random.random() for _ in range(2 + 1)],   # 1st hidden neuron
             [random.random() for _ in range(2 + 1)]],  # 2nd hidden neuron
            # output layer: 2 inputs -> 1 output
            [[random.random() for _ in range(2 + 1)]]   # 1st output neuron
          ]

from scratch.gradient_descent import gradient_step
import tqdm

learning_rate = 1.0

for epoch in tqdm.trange(20000, desc="neural net for xor"):
    for x, y in zip(xs, ys):
        gradients = sqerror_gradients(network, x, y)

        # Take a gradient step for each neuron in each layer
        network = [[gradient_step(neuron, grad, -learning_rate)
                    for neuron, grad in zip(layer, layer_grad)]
                   for layer, layer_grad in zip(network, gradients)]

# check that it learned XOR
assert feed_forward(network, [0, 0])[-1][0] < 0.01
assert feed_forward(network, [0, 1])[-1][0] > 0.99
assert feed_forward(network, [1, 0])[-1][0] > 0.99
assert feed_forward(network, [1, 1])[-1][0] < 0.01


neural net for xor: 100%|██████████████████████████████████████████████████████| 20000/20000 [00:02<00:00, 6941.32it/s]
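
The training loop relies on `gradient_step` from `scratch.gradient_descent`, whose source is not shown in this notebook. The sketch below is an assumption about its behavior (move `step_size` along `gradient` starting from `v`), inferred from how it is called with `-learning_rate`. The one-dimensional function being minimized is a made-up example.

```python
from typing import List

Vector = List[float]

def gradient_step(v: Vector, gradient: Vector, step_size: float) -> Vector:
    """Assumed behavior: return v + step_size * gradient, element-wise."""
    assert len(v) == len(gradient)
    return [v_i + step_size * g_i for v_i, g_i in zip(v, gradient)]

# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
# A negative step size moves against the gradient, as in the training loop.
w = [0.0]
for _ in range(100):
    grad = [2 * (w[0] - 3)]
    w = gradient_step(w, grad, -0.1)

print(w)  # close to [3.0]
```

Passing `-learning_rate` as the step size is what turns a generic "step along the gradient" into gradient descent.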


In [118]:
def predict(network, input):
    return feed_forward(network, input)[-1]

In [119]:
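# Note: this evaluates the hand-built xor_network defined earlier;
# pass `network` instead to inspect the weights learned by training.
for input in xs:
    print( input, round(predict(xor_network, input)[0],2) )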

[0.0, 0] 0.0
[0.0, 1] 1.0
[1.0, 0] 1.0
[1.0, 1] 0.0


# Example: Fizz Buzz
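* Multiples of 3: "fizz"
* Multiples of 5: "buzz"
* Multiples of 15: "fizzbuzz"
* Model design
  + Input: a binary-encoded vector
    - e.g., 1 &rarr; [1, 0, 0, 0, 0, 0, 0, 0,  0,  0] (binary digits in reversed order, least significant bit first)
    - The reversed bit order does not matter (each number is simply represented by its own unique vector)
  + Output: a FizzBuzz-encoded vector
    - e.g., "fizz" &rarr; [0, 1, 0, 0]
    - e.g., "buzz" &rarr; [0, 0, 1, 0]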

In [12]:
def fizz_buzz_encode(x: int) -> Vector:
    if x % 15 == 0:
        return [0, 0, 0, 1]
    elif x % 5 == 0:
        return [0, 0, 1, 0]
    elif x % 3 == 0:
        return [0, 1, 0, 0]
    else:
        return [1, 0, 0, 0]

assert fizz_buzz_encode(2) == [1, 0, 0, 0]
assert fizz_buzz_encode(6) == [0, 1, 0, 0]
assert fizz_buzz_encode(10) == [0, 0, 1, 0]
assert fizz_buzz_encode(30) == [0, 0, 0, 1]

In [13]:
def binary_encode(x: int) -> Vector:
    binary: List[float] = []

    for i in range(10):
        binary.append(x % 2)
        x = x // 2

    return binary

#                             1  2  4  8 16 32 64 128 256 512
assert binary_encode(0)   == [0, 0, 0, 0, 0, 0, 0, 0,  0,  0]
assert binary_encode(1)   == [1, 0, 0, 0, 0, 0, 0, 0,  0,  0]
assert binary_encode(10)  == [0, 1, 0, 1, 0, 0, 0, 0,  0,  0]
assert binary_encode(101) == [1, 0, 1, 0, 0, 1, 1, 0,  0,  0]
assert binary_encode(999) == [1, 1, 1, 0, 0, 1, 1, 1,  1,  1]

### Building the Training Data

In [14]:
xs = [binary_encode(n) for n in range(101, 1024)]
ys = [fizz_buzz_encode(n) for n in range(101, 1024)]

### Training
* Numbers from 101 to 1023 (inputs) and their FizzBuzz values (outputs/labels)

In [15]:
NUM_HIDDEN = 25

network = [
    # hidden layer: 10 inputs -> NUM_HIDDEN outputs
    [[random.random() for _ in range(10 + 1)] for _ in range(NUM_HIDDEN)],

    # output_layer: NUM_HIDDEN inputs -> 4 outputs
    [[random.random() for _ in range(NUM_HIDDEN + 1)] for _ in range(4)]
]

from scratch.linear_algebra import squared_distance

learning_rate = 1.0

with tqdm.trange(500) as t:
    for epoch in t:
        epoch_loss = 0.0

        for x, y in zip(xs, ys):
            predicted = feed_forward(network, x)[-1]
            epoch_loss += squared_distance(predicted, y)
            gradients = sqerror_gradients(network, x, y)

            # Take a gradient step for each neuron in each layer
            network = [[gradient_step(neuron, grad, -learning_rate)
                        for neuron, grad in zip(layer, layer_grad)]
                    for layer, layer_grad in zip(network, gradients)]

        t.set_description(f"fizz buzz (loss: {epoch_loss:.2f})")

fizz buzz (loss: 29.53): 100%|███████████████████████████████████████████████████████| 500/500 [05:26<00:00,  1.53it/s]


### Prediction
* Predict FizzBuzz for the numbers 1 to 100
  + How accurately does the model predict numbers it never saw during training?

In [16]:
def argmax(xs: list) -> int:
    """Returns the index of the largest value"""
    return max(range(len(xs)), key=lambda i: xs[i])

assert argmax([0, -1]) == 0               # xs[0] is largest
assert argmax([-1, 0]) == 1               # xs[1] is largest
assert argmax([-1, 10, 5, 20, -3]) == 3   # xs[3] is largest

In [17]:
num_correct = 0

for n in range(1, 101):
    x = binary_encode(n)
    predicted = argmax(feed_forward(network, x)[-1])
    actual = argmax(fizz_buzz_encode(n))
    labels = [str(n), "fizz", "buzz", "fizzbuzz"]
    print(n, labels[predicted], labels[actual])

    if predicted == actual:
        num_correct += 1

print(num_correct, "/", 100)

1 1 1
2 2 2
3 fizz fizz
4 4 4
5 buzz buzz
6 fizz fizz
7 7 7
8 8 8
9 fizz fizz
10 buzz buzz
11 11 11
12 fizz fizz
13 13 13
14 14 14
15 fizzbuzz fizzbuzz
16 16 16
17 17 17
18 fizz fizz
19 19 19
20 20 buzz
21 fizz fizz
22 22 22
23 23 23
24 fizz fizz
25 buzz buzz
26 26 26
27 fizz fizz
28 28 28
29 29 29
30 fizzbuzz fizzbuzz
31 31 31
32 32 32
33 fizz fizz
34 34 34
35 buzz buzz
36 fizz fizz
37 37 37
38 38 38
39 fizz fizz
40 buzz buzz
41 41 41
42 fizz fizz
43 43 43
44 44 44
45 fizzbuzz fizzbuzz
46 46 46
47 47 47
48 fizz fizz
49 49 49
50 buzz buzz
51 fizz fizz
52 52 52
53 53 53
54 fizz fizz
55 buzz buzz
56 56 56
57 fizz fizz
58 58 58
59 59 59
60 fizzbuzz fizzbuzz
61 61 61
62 62 62
63 fizz fizz
64 64 64
65 buzz buzz
66 fizz fizz
67 67 67
68 68 68
69 fizz fizz
70 buzz buzz
71 71 71
72 fizz fizz
73 73 73
74 74 74
75 fizzbuzz fizzbuzz
76 76 76
77 77 77
78 fizz fizz
79 79 79
80 80 buzz
81 fizz fizz
82 82 82
83 83 83
84 fizz fizz
85 fizz buzz
86 86 86
87 fizz fizz
88 88 88
89 89 89
90 fizzbuzz fizzbu

## Deep Learning
* Deep Neural Networks (DNN)
* Large input size, large output size, multiple hidden layers, and many neurons
* Large network &rarr; requires substantial computing resources
* Advances in hardware + computing techniques + ML techniques &rarr; deep learning became feasible &rarr; complex, hard problems became solvable
* Various advanced models
    + Convolutional Deep Neural Networks (CDNN)
    + Recurrent Neural Networks (RNN)
    + Deep Belief Networks (DBN)
* Good implementation tools
    + TensorFlow, Keras