# Neural Networks

* Classification & Regression
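* A representative black-box model
  + cf. Decision Tree
* Brain-like
  + Network of neurons
  + The model is determined by the connection strengths (weights) between neurons
  + Handles hard, complex problems well
      - More complex problem &rarr; more neurons &rarr; higher computational cost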

## Perceptron
* The basic building block of a neural network
* Simple, single neuron (corresponds to a single brain cell)
* $y = f(\sum_i x_i w_i)$
    + $f$ : Step function
    + $x$ : Inputs
    + $w$ : Weights ($w_0$ : weight for the bias input)

![Perceptron](perceptron.PNG)

In [1]:
from collections import Counter
from functools import partial
import math, random
import matplotlib
import matplotlib.pyplot as plt

import sys
sys.path.append('../common')
from linear_algebra import dot

In [2]:
def step_function(x):
    return 1 if x >= 0 else 0

def perceptron_output(weights, bias, x):
    """returns 1 if the perceptron 'fires', 0 if not"""
    return step_function(dot(weights, x) + bias)

## Solving Simple Problems with a Perceptron
* Logic Gates
 
|x|y|AND(x,y)|OR(x,y)|NOT(x)|XOR(x,y)|
|---|---|---|---|---|---|
|0|0|0|0|1|0|
|0|1|0|1|1|1|
|1|0|0|1|0|1|
|1|1|1|1|0|0|

* Solving the AND, OR, and NOT gates with a perceptron
* Solving the problem = finding the weights

<img src="simple_gates.png" alt="Simple Gates" style="width:600px;" align="left"/>

In [3]:
# test AND gate
inputs = [[0,0],[0,1],[1,0], [1,1]]
for i in inputs:
    print(i, perceptron_output([2,2], -3, i))

[0, 0] 0
[0, 1] 0
[1, 0] 0
[1, 1] 1
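
The cell above only tests the AND gate. As a minimal sketch, the OR and NOT gates can be checked the same way. The weights `[2, 2]` with bias `-1` (OR) and `[-2]` with bias `1` (NOT) are assumed values chosen for illustration; any weights that put the decision boundary in the right place would do. The local `dot` stands in for `linear_algebra.dot` so the block runs on its own.

```python
def dot(v, w):
    """Dot product; a stand-in for linear_algebra.dot."""
    return sum(v_i * w_i for v_i, w_i in zip(v, w))

def step_function(x):
    return 1 if x >= 0 else 0

def perceptron_output(weights, bias, x):
    return step_function(dot(weights, x) + bias)

# Assumed weights (illustrative, not the only valid choice)
def or_gate(x, y):
    return perceptron_output([2, 2], -1, [x, y])

def not_gate(x):
    return perceptron_output([-2], 1, [x])

for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x, y), "OR ->", or_gate(x, y))
for x in (0, 1):
    print(x, "NOT ->", not_gate(x))
```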


## XOR gate?
* (0,1), (1,0) &rarr; 1
* (0,0), (1,1) &rarr; 0

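Not as easy as it looks
* On the 2D plane, no single straight line (decision boundary) separates the XOR outputs
* In other words, a single perceptron cannot compute XOR
* It can be solved analytically by combining gates, though...
    + XOR(x,y) = OR(x,y) AND (NOT(AND(x,y)))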
![XOR Gate](xor_gate_naive.png)
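
The identity above can be sketched directly by wiring three single perceptrons together. The gate weights are assumed for illustration (AND uses the `[2, 2]`, `-3` values from the earlier test; OR and NOT are hypothetical choices), and `perceptron` is a compact local helper rather than the notebook's `perceptron_output`.

```python
def perceptron(weights, bias, inputs):
    """Single step-function neuron: fires (1) when the weighted sum is >= 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total >= 0 else 0

# Assumed gate weights (illustrative)
def and_gate(x, y):
    return perceptron([2, 2], -3, [x, y])

def or_gate(x, y):
    return perceptron([2, 2], -1, [x, y])

def not_gate(x):
    return perceptron([-2], 1, [x])

def xor_gate(x, y):
    # XOR(x,y) = OR(x,y) AND (NOT(AND(x,y)))
    return and_gate(or_gate(x, y), not_gate(and_gate(x, y)))

for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x, y), xor_gate(x, y))
```

No single call to `perceptron` does the job here; the result only comes out right because the outputs of some perceptrons feed into another one.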


## Feed-Forward Neural Network
* A more systematic approach: a general structure that connects multiple perceptrons
* Multi-layer Perceptron (Layered Network of neurons, "Neural Network")
    + Input Layer
    + Hidden Layers (one or more)
    + Output Layer
* Activation function : Sigmoid

![Feed-Forward Neural Network](feed_forward_nn.png)


In [4]:
def sigmoid(t):
    return 1 / (1 + math.exp(-t))

def neuron_output(weights, inputs):
    return sigmoid(dot(weights, inputs))

![Step Function vs. Sigmoid Function](step_vs_sigmoid.png)
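
One reason the sigmoid is convenient for gradient-based training is that it is smooth, with derivative $\sigma'(t) = \sigma(t)(1 - \sigma(t))$, whereas the step function is flat everywhere except at the jump. The sketch below compares the two functions and checks the derivative formula against a central-difference estimate. `sigmoid_derivative` and `numeric_derivative` are helper names introduced here for illustration.

```python
import math

def step_function(x):
    return 1 if x >= 0 else 0

def sigmoid(t):
    return 1 / (1 + math.exp(-t))

def sigmoid_derivative(t):
    """Analytic derivative: sigmoid(t) * (1 - sigmoid(t))."""
    s = sigmoid(t)
    return s * (1 - s)

def numeric_derivative(f, t, h=1e-6):
    """Central-difference estimate of f'(t)."""
    return (f(t + h) - f(t - h)) / (2 * h)

for t in [-2.0, -0.1, 0.0, 0.1, 2.0]:
    print(t,
          step_function(t),
          round(sigmoid(t), 4),
          round(sigmoid_derivative(t), 4),
          round(numeric_derivative(sigmoid, t), 4))
```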

In [5]:
def feed_forward(neural_network, input_vector):
    """takes in a neural network (represented as a list of lists of lists of weights)
    and returns the output from forward-propagating the input"""

    outputs = []

    for layer in neural_network:

        input_with_bias = input_vector + [1]             # add a bias input
        output = [neuron_output(neuron, input_with_bias) # compute the output
                  for neuron in layer]                   # for this layer
        outputs.append(output)                           # and remember it

        # the input to the next layer is the output of this one
        input_vector = output

    return outputs

### Building a Feed-Forward Network - XOR Gate
* Input Layer : 2 neurons (one per input feature) + bias
* Hidden Layer : 2 neurons
    + Why 2, and how the weights were found, is skipped here (assume they are given!)
* Output Layer : 1 neuron (to produce a value of 0 or 1)

#### Feed-Forward
* The process of computing the output once the weights are known

![Neural Network for XOR](xor_network.png)

In [9]:
xor_network = [ # hidden layer
            [[20, 20, -30], # 'and' neuron
             [20, 20, -10]], # 'or' neuron
            # output layer
            [[-60, 60, -30]]] # '2nd input but not 1st input' neuron

for i in inputs:
    print(i, feed_forward(xor_network, i))

[0, 0] [[9.357622968839299e-14, 4.5397868702434395e-05], [9.38314668300676e-14]]
[0, 1] [[4.5397868702434395e-05, 0.9999546021312976], [0.9999999999999059]]
[1, 0] [[4.5397868702434395e-05, 0.9999546021312976], [0.9999999999999059]]
[1, 1] [[0.9999546021312976, 0.9999999999999065], [9.383146683006828e-14]]
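
To see where these numbers come from, the same forward pass can be traced by hand, one neuron at a time. This is a sketch that hard-codes the `xor_network` weights shown above. `neuron` and `xor_by_hand` are helper names introduced here, and rounding the final output to 0 or 1 is an added convenience, not part of `feed_forward`.

```python
import math

def sigmoid(t):
    return 1 / (1 + math.exp(-t))

def neuron(weights, inputs):
    # The last weight is the bias weight, paired with a constant input of 1
    return sigmoid(sum(w * x for w, x in zip(weights, inputs + [1])))

def xor_by_hand(x, y):
    and_out = neuron([20, 20, -30], [x, y])          # 'and' neuron
    or_out = neuron([20, 20, -10], [x, y])           # 'or' neuron
    out = neuron([-60, 60, -30], [and_out, or_out])  # 'or but not and'
    return and_out, or_out, out

for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    and_out, or_out, out = xor_by_hand(x, y)
    print((x, y), round(and_out), round(or_out), round(out))
```

For input `[1, 1]`, the 'and' neuron sees 20 + 20 - 30 = 10 and the 'or' neuron sees 20 + 20 - 10 = 30, so both are close to 1; the output neuron then sees roughly -60 + 60 - 30 = -30 and lands near 0.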


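## The Role of Hidden Layers
* Act like a series of "processing stages"
    + Each stage narrows down the number of cases to consider
* The more complex the problem &rarr; more hidden layers, more neurons &rarr; how do we find the weights?
    + Finding the weights is hard even for the simple XOR gate problem...

![The role of hidden layers](hidden_layer_effect.png)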

## Backpropagation
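A machine-learning method for finding the weights
* Weights : the learned model
* Finding the weights : the learning process (iterate, compare predictions against actual values, and adjust using gradient descent)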

<ol>
<li> Run feed_forward on an input vector to produce the outputs of all the neurons in the network.
<li> This results in an error for each output neuron—the difference between its output and its target.
<li> Compute the gradient of this error as a function of the neuron’s weights, and adjust its weights in the direction that most decreases the error.
<li> “Propagate” these output errors backward to infer errors for the hidden layer.
<li> Compute the gradients of these errors and adjust the hidden layer’s weights in the same manner.
</ol>
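
The update rule behind these steps can be sketched as a short derivation. It assumes, beyond what the text states, that the error is half the squared difference between output and target, that every neuron uses the sigmoid $\sigma$, and that the learning rate is 1. The notation is introduced here: $x_i$ are the inputs, $h_j$ the hidden outputs, $o_k$ the network outputs, $t_k$ the targets, $v_{ji}$ the hidden-layer weights, and $w_{kj}$ the output-layer weights (with a constant 1 appended to $x$ and $h$ for the bias).

$$E = \tfrac{1}{2}\sum_k (o_k - t_k)^2, \qquad o_k = \sigma\Big(\sum_j w_{kj} h_j\Big), \qquad \sigma'(z) = \sigma(z)\,(1 - \sigma(z))$$

$$\delta_k = o_k (1 - o_k)(o_k - t_k), \qquad w_{kj} \leftarrow w_{kj} - \delta_k h_j$$

$$\delta_j = h_j (1 - h_j) \sum_k \delta_k w_{kj}, \qquad v_{ji} \leftarrow v_{ji} - \delta_j x_i$$

The factor $o_k(1 - o_k)$ is the sigmoid derivative, and $\delta_j$ is the output error "propagated" back through the output-layer weights.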

<img align="left"  src="backprop.png">

### Example
<img align="left"  src="backprop_ex.png">

In [10]:
def backpropagate(network, input_vector, target):

    hidden_outputs, outputs = feed_forward(network, input_vector)

    # the output * (1 - output) is from the derivative of sigmoid
    output_deltas = [output * (1 - output) * (output - target[i])
                     for i, output in enumerate(outputs)]

    # adjust weights for output layer (network[-1])
    for i, output_neuron in enumerate(network[-1]):
        for j, hidden_output in enumerate(hidden_outputs + [1]):
            output_neuron[j] -= output_deltas[i] * hidden_output

    # back-propagate errors to hidden layer
    hidden_deltas = [hidden_output * (1 - hidden_output) *
                      dot(output_deltas, [n[i] for n in network[-1]])
                     for i, hidden_output in enumerate(hidden_outputs)]

    # adjust weights for hidden layer (network[0])
    for i, hidden_neuron in enumerate(network[0]):
        for j, input in enumerate(input_vector + [1]):
            hidden_neuron[j] -= hidden_deltas[i] * input
            

In [11]:
def predict(input):
    return feed_forward(network, input)[-1]

## Example (Difficulty: Easy) - Solving XOR Again

In [13]:
inputs = [[0,0],[0,1],[1,0], [1,1]]
targets = [[0], [1], [1], [0]]

# initialize the network (random weights)
hidden_layer = [[random.random(),random.random(),random.random()],[random.random(),random.random(),random.random()]] # 2 neurons + bias
output_layer = [[random.random(),random.random(), random.random()]] # 1 output neuron + bias 
network = [hidden_layer, output_layer]

for __ in range(10000):
    for input_vector, target_vector in zip(inputs, targets):
        backpropagate(network, input_vector, target_vector)

for i, input in enumerate(inputs):
    outputs = predict(input)
    print(input, [round(p,2) for p in outputs])

[0, 0] [0.01]
[0, 1] [0.99]
[1, 0] [0.99]
[1, 1] [0.01]
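
One way to put a number on how well the trained network fits is the sum of squared errors between predictions and targets. The sketch below applies it to the rounded predictions printed above. `sum_squared_error` is a helper introduced here for illustration, and the values are copied from that output rather than recomputed.

```python
def sum_squared_error(predictions, targets):
    """Sum of (prediction - target)^2 over every output of every example."""
    return sum((p - t) ** 2
               for pred, targ in zip(predictions, targets)
               for p, t in zip(pred, targ))

# Rounded predictions copied from the printed output above
predictions = [[0.01], [0.99], [0.99], [0.01]]
targets = [[0], [1], [1], [0]]

print(sum_squared_error(predictions, targets))
```

Each prediction is off by about 0.01, so the total comes to roughly 4 x 0.0001 = 0.0004.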


## Example (Difficulty: Medium) - Two times one is two, two times two is four, two times three is six, ...

In [20]:
input_size = 2  # [2,0], [2,1], [2,2], ...
num_hidden = 5   # we'll have 5 neurons in the hidden layer
output_size = 20 # one output per possible product, 0~19

# each hidden neuron has one weight per input, plus a bias weight
hidden_layer = [[random.random() for __ in range(input_size + 1)]
                for __ in range(num_hidden)]

# each output neuron has one weight per hidden neuron, plus a bias weight
output_layer = [[random.random() for __ in range(num_hidden + 1)]
                for __ in range(output_size)]

# the network starts out with random weights
network = [hidden_layer, output_layer]

# [2,0], [2,1], [2,2], ...
inputs = [[2,i] for i in range(10)]

# [2,0] ==> [1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
# [2,1] ==> [0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
targets = [[1 if i == j*2 else 0 for i in range(20)] for j in range(10)]


for __ in range(10000):
    for input_vector, target_vector in zip(inputs, targets):
        backpropagate(network, input_vector, target_vector)

for i, input in enumerate(inputs):
    outputs = predict(input)
    print(input, [round(p,2) for p in outputs])

[2, 0] [0.99, 0.01, 0.04, 0.01, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.01, 0.0, 0.0, 0.0, 0.0]
[2, 1] [0.02, 0.01, 0.94, 0.01, 0.04, 0.01, 0.0, 0.01, 0.0, 0.01, 0.0, 0.01, 0.0, 0.01, 0.0, 0.0, 0.0, 0.01, 0.0, 0.01]
[2, 2] [0.0, 0.0, 0.04, 0.0, 0.94, 0.0, 0.03, 0.0, 0.01, 0.0, 0.01, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
[2, 3] [0.0, 0.0, 0.0, 0.0, 0.04, 0.0, 0.87, 0.0, 0.38, 0.0, 0.01, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
[2, 4] [0.0, 0.0, 0.0, 0.0, 0.01, 0.0, 0.04, 0.0, 0.33, 0.0, 0.04, 0.0, 0.0, 0.0, 0.01, 0.0, 0.0, 0.0, 0.0, 0.0]
[2, 5] [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.22, 0.0, 0.74, 0.0, 0.37, 0.0, 0.04, 0.0, 0.0, 0.0, 0.0, 0.0]
[2, 6] [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.13, 0.0, 0.62, 0.0, 0.38, 0.0, 0.05, 0.0, 0.0, 0.0, 0.0, 0.0]
[2, 7] [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.11, 0.0, 0.18, 0.0, 0.16, 0.0, 0.01, 0.0]
[2, 8] [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.09, 0.0, 0.22, 0.0, 0.2

In [28]:
print([round(p,2) for p in predict([2, 2.5])])

# [2, 2.1], [2, 2.7], .... (try other input values as well)

[0.0, 0.0, 0.0, 0.0, 0.07, 0.0, 0.81, 0.0, 0.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
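
Because each output position stands for one possible product, a prediction can be decoded by taking the position with the highest score. The sketch below does this for the `[2, 2.5]` output printed above. `decode_product` is a hypothetical helper, and the scores are copied from that output.

```python
def decode_product(outputs):
    """Return (index of the highest score, that score)."""
    best_index = max(range(len(outputs)), key=lambda i: outputs[i])
    return best_index, outputs[best_index]

# Scores copied from the printed output for input [2, 2.5]
outputs_2_25 = [0.0, 0.0, 0.0, 0.0, 0.07, 0.0, 0.81, 0.0, 0.3, 0.0,
                0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

print(decode_product(outputs_2_25))
```

The network picks 6, treating 2.5 much like 3. Position 5 gets no score at all: the training targets only ever marked even products, so the network has no basis for interpolating to 2 x 2.5 = 5.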


## Example (Difficulty: Hard) - Recognizing Digit Images
* 5x5 image &rarr; digit
![CAPTCHA](capcha.png)

In [29]:
raw_digits = [
          """11111
             1...1
             1...1
             1...1
             11111""",

          """..1..
             ..1..
             ..1..
             ..1..
             ..1..""",

          """11111
             ....1
             11111
             1....
             11111""",

          """11111
             ....1
             11111
             ....1
             11111""",

          """1...1
             1...1
             11111
             ....1
             ....1""",

          """11111
             1....
             11111
             ....1
             11111""",

          """11111
             1....
             11111
             1...1
             11111""",

          """11111
             ....1
             ....1
             ....1
             ....1""",

          """11111
             1...1
             11111
             1...1
             11111""",

          """11111
             1...1
             11111
             ....1
             11111"""]


In [38]:
raw_digits[0]

'11111\n         1...1\n         1...1\n         1...1\n         11111'

### Simple Preprocessing
* Image &rarr; vector

In [39]:
def make_digit(raw_digit):
    return [1 if c == '1' else 0
            for row in raw_digit.split("\n")
            for c in row.strip()]

inputs = list(map(make_digit, raw_digits))
targets = [[1 if i == j else 0 for i in range(10)]
           for j in range(10)]

In [40]:
inputs[0]

[1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1]

In [41]:
targets

[[1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]]

In [42]:
# pretty-print as a 5x5 grid
for i in range(5):
    at = i*5
    print(inputs[0][at:at+5])

[1, 1, 1, 1, 1]
[1, 0, 0, 0, 1]
[1, 0, 0, 0, 1]
[1, 0, 0, 0, 1]
[1, 1, 1, 1, 1]
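
Going the other way, from vector back to picture, is handy for eyeballing inputs. The sketch below is a hypothetical helper, `render_digit`, that draws a 25-element vector as a 5x5 grid of `@` and `.` characters. The example vector is the digit 0 from `inputs[0]`, written out literally so the block runs on its own.

```python
def render_digit(vector, width=5):
    """Render a flat 0/1 vector as rows of '@' (1) and '.' (0)."""
    rows = [vector[i:i + width] for i in range(0, len(vector), width)]
    return "\n".join("".join("@" if v else "." for v in row) for row in rows)

# The digit 0, same values as inputs[0]
zero = [1, 1, 1, 1, 1,
        1, 0, 0, 0, 1,
        1, 0, 0, 0, 1,
        1, 0, 0, 0, 1,
        1, 1, 1, 1, 1]

print(render_digit(zero))
```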


### Building the Neural Net &rarr; Training It
* input_size : 25  (5x5)
* num_hidden : 5   (5 neurons)
* output_size : 10 (0~9)
![CAPTCHA - neural net structure](capcha_network.png)
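
As a quick sanity check on the size of this network, the number of weights to be learned can be counted from the layer sizes, with one extra bias weight per neuron. `count_weights` is a helper introduced here for illustration.

```python
def count_weights(layer_sizes):
    """layer_sizes = [inputs, hidden..., outputs]; each neuron also has one bias weight."""
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

print(count_weights([25, 5, 10]))  # 5*(25+1) + 10*(5+1) = 190
print(count_weights([2, 2, 1]))    # XOR network: 2*(2+1) + 1*(2+1) = 9
```

Even this small digit recognizer has 190 weights, compared with 9 for the XOR network, which gives a sense of why finding weights by hand does not scale.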

In [43]:
random.seed(0)   # to get repeatable results
input_size = 25  # each input is a vector of length 25
num_hidden = 5   # we'll have 5 neurons in the hidden layer
output_size = 10 # we need 10 outputs for each input

# each hidden neuron has one weight per input, plus a bias weight
hidden_layer = [[random.random() for __ in range(input_size + 1)]
                for __ in range(num_hidden)]

# each output neuron has one weight per hidden neuron, plus a bias weight
output_layer = [[random.random() for __ in range(num_hidden + 1)]
                for __ in range(output_size)]

# the network starts out with random weights
network = [hidden_layer, output_layer]

In [44]:
for __ in range(10000):
    for input_vector, target_vector in zip(inputs, targets):
        backpropagate(network, input_vector, target_vector)

### Checking the Results (1) - Original Images

In [45]:
for i, input in enumerate(inputs):
    outputs = predict(input)
    print(i, [round(p,2) for p in outputs])

0 [0.96, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.02, 0.03, 0.0]
1 [0.0, 0.96, 0.03, 0.02, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
2 [0.0, 0.02, 0.96, 0.0, 0.0, 0.03, 0.0, 0.0, 0.0, 0.0]
3 [0.0, 0.03, 0.0, 0.97, 0.0, 0.0, 0.0, 0.02, 0.0, 0.03]
4 [0.0, 0.02, 0.0, 0.0, 0.99, 0.0, 0.0, 0.01, 0.0, 0.0]
5 [0.0, 0.0, 0.02, 0.0, 0.0, 0.96, 0.01, 0.0, 0.02, 0.02]
6 [0.0, 0.0, 0.01, 0.0, 0.01, 0.01, 0.99, 0.0, 0.0, 0.0]
7 [0.02, 0.0, 0.0, 0.02, 0.0, 0.0, 0.0, 0.97, 0.0, 0.0]
8 [0.03, 0.0, 0.0, 0.0, 0.0, 0.02, 0.0, 0.0, 0.96, 0.03]
9 [0.0, 0.0, 0.0, 0.01, 0.0, 0.02, 0.0, 0.0, 0.03, 0.95]


### Checking the Results (2) - Corrupted Images, Handwriting
![Corrupted image](ugly_image.png)

In [46]:
print([round(x, 2) for x in
      predict(  [0,1,1,1,0,    # .@@@.
                 0,0,0,1,1,    # ...@@
                 0,0,1,1,0,    # ..@@.
                 0,0,0,1,1,    # ...@@
                 0,1,1,1,0])]) # .@@@.

[0.0, 0.0, 0.0, 0.94, 0.0, 0.0, 0.0, 0.01, 0.0, 0.13]


In [47]:
print([round(x, 2) for x in
      predict(  [0,1,1,1,0,    # .@@@.
                 1,0,0,1,1,    # @..@@
                 0,1,1,1,0,    # .@@@.
                 1,0,0,1,1,    # @..@@
                 0,1,1,1,0])]) # .@@@.

[0.0, 0.0, 0.0, 0.0, 0.0, 0.59, 0.0, 0.0, 0.95, 1.0]
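
When several outputs score high at once, as in the result just above, ranking the candidates is more informative than reading off a single answer. The sketch below ranks the digits by score. `top_k` is a hypothetical helper, and the scores are copied from that printed output.

```python
def top_k(outputs, k=3):
    """Return the k (digit, score) pairs with the highest scores."""
    ranked = sorted(enumerate(outputs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Scores copied from the printed output above
scores = [0.0, 0.0, 0.0, 0.0, 0.0, 0.59, 0.0, 0.0, 0.95, 1.0]

print(top_k(scores))
```

The ranking puts 9 first with 8 close behind and 5 a distant third, so the network leans toward 9 but is far from certain. The outputs are independent sigmoids, so they need not sum to 1 and are not probabilities.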


## Deep Learning
* Deep Neural Networks (DNN)
* Large input size, large output size, many hidden layers, many neurons
* Large network &rarr; requires a lot of computing resources
* Advances in hardware + computing techniques + ML techniques &rarr; deep learning becomes feasible &rarr; complex, hard problems become solvable
* Various more advanced models
    + Convolutional Deep Neural Networks(CDNN)
    + Recurrent Neural Networks (RNN)
    + Deep Belief Networks (DBN)
* Good implementation tools
    + TensorFlow, Keras