# Hands-On Deep Learning with Theano


- This material is based on the sessions of the DeepBio study group.
- https://github.com/biospin/DeepBio



## 01. Preparation
- [Theano installation guide](https://github.com/biospin/DeepBio/blob/master/part01/Week1_151103/BioPython%EA%B3%BCTheano%EC%84%A4%EC%B9%98.txt)
- [Installing the NVIDIA CUDA driver on Ubuntu 14.04](https://github.com/biospin/DeepBio/blob/master/reference/00_cuda_install.ipynb)
- [Theano Tutorial](http://www.deeplearning.net/tutorial/)
- [Theano basics](http://deeplearning.net/software/theano/tutorial/)
- [Python basics - Python for Informatics Education](http://python.xwmooc.org/)
- [Basic concepts of tensors](https://github.com/biospin/DeepBio/tree/master/part03/Week4_160126)

## 02. Basic Algebraic Expressions

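- Theano represents all data as tensors, from 0-dimensional scalars up to multi-dimensional arrays.
- Mathematical operations are defined with `function`; when a function is defined, Theano compiles C code that performs the operation.
- In function():
    - The first argument is a list of variables, which defines the types of the inputs.
    - The second argument defines the expression to compute (the output).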

In [1]:
import theano.tensor as T

In [2]:
from theano import function

In [3]:
x = T.dscalar('x')

In [4]:
y = T.dscalar('y')

In [5]:
z = x + y

In [6]:
# The C code is compiled at this step.
f = function([x, y], z)

In [7]:
f(2, 3)

array(5.0)

In [8]:
f(16.3, 12.1)

array(28.4)

### Adding Two Matrices

In [9]:
x = T.dmatrix('x')

In [10]:
y = T.dmatrix('y')

In [11]:
z = x + y

In [12]:
f = function([x, y], z)

In [13]:
f([[1, 2], [3, 4]], [[10, 20], [30, 40]])

array([[ 11.,  22.],
       [ 33.,  44.]])

### Logistic Function

\\( s(x) = \frac{1}{1 + e^{-x}} \\)

![image](http://deeplearning.net/software/theano/_images/logistic.png)

In [14]:
import theano
import theano.tensor as T
x = T.dmatrix('x')
s = 1 / (1 + T.exp(-x))

In [15]:
logistic = theano.function([x], s)

In [16]:
logistic([[0, 1], [-1, -2]])

array([[ 0.5       ,  0.73105858],
       [ 0.26894142,  0.11920292]])

- \\( s(x) = \frac{1}{1 + e^{-x}} = \frac{1 + \tanh(x/2)}{2} \\)

In [17]:
s2 = (1 + T.tanh(x / 2)) / 2
logistic2 = theano.function([x], s2)
logistic2([[0, 1], [-1, -2]])

array([[ 0.5       ,  0.73105858],
       [ 0.26894142,  0.11920292]])
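
The two forms agree because multiplying the numerator and denominator of \\( 1/(1+e^{-x}) \\) by \\( e^{x/2} \\) gives \\( e^{x/2} / (e^{x/2} + e^{-x/2}) \\), which is exactly \\( (1 + \tanh(x/2))/2 \\). The following is a minimal sketch that checks the identity numerically with the standard library only; the function names `logistic` and `logistic_tanh` are chosen here for illustration and are not part of Theano.

```python
import math


def logistic(x):
    """Logistic function s(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def logistic_tanh(x):
    """The same function written as (1 + tanh(x / 2)) / 2."""
    return (1.0 + math.tanh(x / 2.0)) / 2.0


for value in [0.0, 1.0, -1.0, -2.0]:
    # Both forms should agree up to floating-point rounding.
    assert abs(logistic(value) - logistic_tanh(value)) < 1e-12
    print(value, logistic(value))
```

The inputs are the same four values passed to the Theano functions above, so the printed numbers can be compared with those outputs.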

### Shared Variables

In [18]:
from theano import shared
state = shared(0)
inc = T.iscalar('inc')
accumulator = function([inc], state, updates=[(state, state+inc)])

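- Two new concepts are introduced here
- 1) The shared() function
    - Creates shared variables.
    - (**Important**) A shared variable holds data that persists between function calls, and it is what lets the CPU and the GPU share data.
- 2) The updates argument of function
    - updates takes a list of (shared variable, expression) pairs.
    - Each call performs the equivalent of state = state + inc.
    - (**Important**) It is mainly used to update the weights (W) and biases (b).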

In [19]:
state.get_value()

array(0)

In [20]:
accumulator(1)

array(0)

In [21]:
state.get_value()

array(1)

In [22]:
accumulator(300)

array(1)

In [23]:
state.get_value()

array(301)
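
The outputs above show that each call returns the value `state` held before the update and only then applies the update. The sketch below mimics that ordering in plain Python; `SharedValue` and `make_accumulator` are hypothetical stand-ins written for this illustration and are not Theano APIs.

```python
class SharedValue:
    """Hypothetical stand-in for a Theano shared variable."""

    def __init__(self, value):
        self.value = value

    def get_value(self):
        return self.value


def make_accumulator(state):
    """Mimic function([inc], state, updates=[(state, state + inc)])."""

    def accumulator(inc):
        output = state.value              # the output uses the value before the update
        state.value = state.value + inc   # the update is applied afterwards
        return output

    return accumulator


state = SharedValue(0)
accumulator = make_accumulator(state)
first = accumulator(1)      # returns the old value, then state becomes 1
second = accumulator(300)   # returns the old value, then state becomes 301
print(first, second, state.get_value())
```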

## 03. A Real Example: Logistic Regression

- Linear Classification : http://vision.stanford.edu/teaching/cs231n/slides/lecture3.pdf
- Gradient_descent Wiki : https://en.wikipedia.org/wiki/Gradient_descent
- Gradient_descent : http://vision.stanford.edu/teaching/cs231n/slides/lecture4.pdf
- Cross entropy : https://en.wikipedia.org/wiki/Cross_entropy
- (**Important**) This is almost identical to how a single hidden node is trained.

In [25]:
import numpy
import theano
import theano.tensor as T
rng = numpy.random

N = 400
feats = 784  # 28 * 28 
D = (rng.randn(N, feats), rng.randint(size=N, low=0, high=2))
training_steps = 100

# Declare Theano symbolic variables
x = T.matrix("x")
y = T.vector("y")
w = theano.shared(rng.randn(feats), name="w")
b = theano.shared(0., name="b")
print("Initial model:")
print(w.get_value())
print(b.get_value())

# Construct Theano expression graph
p_1 = 1 / (1 + T.exp(-T.dot(x, w) - b))   # Probability that target = 1
prediction = p_1 > 0.5                    # The prediction thresholded
xent = -y * T.log(p_1) - (1-y) * T.log(1-p_1) # Cross-entropy loss function
cost = xent.mean() + 0.01 * (w ** 2).sum() # The cost to minimize; the L2 penalty term penalizes large values of w
gw, gb = T.grad(cost, [w, b])             # Compute the gradient of the cost
                                          # (we shall return to this in a
                                          # following section of this tutorial)

# Compile
train = theano.function(
          inputs=[x,y],
          outputs=[prediction, xent],
          updates=((w, w - 0.1 * gw), (b, b - 0.1 * gb)))

# Train
for i in range(training_steps):
    pred, err = train(D[0], D[1])
    
print("Final model:")
print(w.get_value())
print(b.get_value())
print("target values for D:")
print(D[1])
print("prediction on D:")

predict = theano.function(inputs=[x], outputs=prediction)
print(predict(D[0]))

Initial model:
[ -6.77449187e-01   5.47081152e-01  -2.34836208e-01   5.21317813e-01
  -5.86799207e-01  -7.07146666e-01  -1.75981028e+00  -7.30686494e-02
   5.66477644e-02  -8.30942080e-01  -9.77407925e-01   3.28184992e-01
   1.39370973e-01  -1.32262792e+00  -1.42769207e-01  -2.63195079e-01
   4.78356674e-01  -1.77731525e+00  -4.77840078e-01  -1.39931062e+00
  -1.40203317e+00   8.75485570e-01  -3.20898281e-03  -2.70248716e-01
   1.42925425e+00   6.24620425e-02  -8.65753970e-01  -6.18968907e-01
   9.74791498e-02   7.55109752e-01   1.26003443e+00   6.38074318e-01
   3.05825841e-01  -6.86204884e-02   3.35365893e-01  -9.07185082e-01
  -4.74808889e-01   1.46689576e+00  -1.20283675e+00   3.58041442e-01
   6.77345554e-01   1.73860905e-01   2.68100408e+00  -2.44122426e-01
   4.40656248e-01  -1.09178371e+00   3.49501699e-01   1.03367414e-01
  -1.52809160e+00  -7.02897929e-01  -3.67008300e-01   2.33173092e-01
  -5.87439566e-02  -3.15317409e-01   5.96989054e-01  -4.09534196e-01
   1.71534291e+00  
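
To make explicit what `T.grad` computes for this model, here is a minimal NumPy sketch of the same cost with the gradients derived by hand. It is an illustration under stated assumptions, not Theano's implementation: the helper name `cost_and_gradients`, the small 20 x 5 dataset, and the zero initialization are all chosen here to keep the example fast and numerically tame.

```python
import numpy as np


def cost_and_gradients(w, b, X, y, l2=0.01):
    """Cost of the logistic regression model and its hand-derived gradients."""
    p_1 = 1.0 / (1.0 + np.exp(-(X.dot(w) + b)))           # probability that target = 1
    xent = -y * np.log(p_1) - (1 - y) * np.log(1 - p_1)   # cross-entropy per example
    cost = xent.mean() + l2 * (w ** 2).sum()
    gw = X.T.dot(p_1 - y) / len(y) + 2 * l2 * w           # d cost / d w
    gb = (p_1 - y).mean()                                 # d cost / d b
    return cost, gw, gb


rng = np.random.RandomState(0)
X = rng.randn(20, 5)                          # small hypothetical dataset
y = rng.randint(0, 2, size=20).astype(float)
w, b = np.zeros(5), 0.0

initial_cost, _, _ = cost_and_gradients(w, b, X, y)
for _ in range(100):
    cost, gw, gb = cost_and_gradients(w, b, X, y)
    w, b = w - 0.1 * gw, b - 0.1 * gb         # same update rule as the Theano example
final_cost, _, _ = cost_and_gradients(w, b, X, y)
print(initial_cost, final_cost)
```

With `w` and `b` starting at zero, every predicted probability is 0.5, so the initial cost equals log 2; the loop should then lower it.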

## 04. Graph Structures

In [None]:
import theano.tensor as T

x = T.dmatrix('x')
y = T.dmatrix('y')
z = x + y

![image](http://deeplearning.net/software/theano/_images/apply.png)

In [27]:
import theano
a = theano.tensor.vector("a")      # declare symbolic variable
b = a + a ** 10                    # build symbolic expression
f = theano.function([a], b)        # compile function
print(f([0, 1, 2]))                # prints `array([0,2,1026])`

[    0.     2.  1026.]


In [None]:
theano.printing.pydotprint(b, outfile="./pics/symbolic_graph_unopt.png", var_with_name_simple=True)  
theano.printing.pydotprint(f, outfile="./pics/symbolic_graph_opt.png", var_with_name_simple=True)  

- [How to fix the errors raised by the Theano Graph Structures example](https://github.com/biospin/DeepBio/blob/master/reference/pydot_error.txt)
![image](symbolic_graph_opt.png)

## 05. Using the GPU

In [11]:
from theano import function, config, shared, sandbox
import theano.tensor as T
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
if numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')

[Elemwise{exp,no_inplace}(<TensorType(float64, vector)>)]
Looping 1000 times took 6.445618 seconds
Result is [ 1.23178032  1.61879341  1.52278065 ...,  2.20771815  2.29967753
  1.62323285]
Used the cpu


- Configure it in the .theanorc file in your home directory (**recommended**)
```
[global]
floatX=float32
device=gpu
[nvcc]
fastmath=True
```

- Configure it at run time
    - THEANO_FLAGS=mode=FAST_RUN,device=cpu                python check1.py
    - THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python check1.py
    
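- Configure it in code
```
import theano

GPU = False
if GPU:
    print("Using GPU")
    # Theano normally refuses to change the device once it has been
    # imported, which is why the assignment is wrapped in try/except.
    try:
        theano.config.device = 'gpu'
    except Exception:
        pass
    theano.config.floatX = 'float32'
else:
    print("Using CPU")
```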

- Verifying the setting

```
[deeplearning@deep01 ~]$ THEANO_FLAGS=mode=FAST_RUN,device=cpu     python check1.py
[Elemwise{exp,no_inplace}(<TensorType(float32, vector)>)]
Looping 1000 times took 3.190209 seconds
Result is [ 1.23178029  1.61879337  1.52278066 ...,  2.20771813  2.29967761
  1.62323284]
Used the cpu
[deeplearning@deep01 ~]$
[deeplearning@deep01 ~]$ THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python check1.py
Using gpu device 0: GeForce GTX 960
[GpuElemwise{exp,no_inplace}(<CudaNdarrayType(float32, vector)>), HostFromGpu(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 0.565735 seconds
Result is [ 1.23178029  1.61879349  1.52278066 ...,  2.20771813  2.29967761
  1.62323296]
Used the gpu
[deeplearning@deep01 ~]$

```
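
The GPU run above operates on float32 data, while the first CPU run used float64, which is why the last printed digits differ slightly. The following NumPy-only sketch illustrates the trade-off that `floatX=float32` implies: half the memory, at the cost of a small rounding difference. The array size of 1000 is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.RandomState(22)
x64 = rng.rand(1000)                     # NumPy defaults to float64
x32 = np.asarray(x64, dtype="float32")   # the dtype that floatX=float32 asks for

print(x64.dtype, x64.nbytes)             # bytes used by the float64 copy
print(x32.dtype, x32.nbytes)             # the float32 copy uses half as many

# exp() on the float32 copy agrees with float64 only to single precision.
max_diff = np.max(np.abs(np.exp(x32) - np.exp(x64)))
print(max_diff)
```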

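- **theano.shared(value=W_values, name="W", borrow=True)**
```
By default, a GPU computation runs only after the data has been moved from main memory to VRAM, the GPU's own memory. To inspect the result, it has to be copied back from VRAM to main memory. These transfers take a lot of time. theano.shared is therefore used to place the data on the GPU so that transfers between the two memories are kept to a minimum.
```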

## 06. Classifying MNIST digits using Logistic Regression

### MNIST Dataset
- mnist.pkl.gz - http://deeplearning.net/data/mnist/mnist.pkl.gz
- The Theano example code downloads the file itself, so there is no need to fetch it separately.
- The download code is in the load_data() function of logistic_sgd.py at http://www.deeplearning.net/tutorial/code/
- The data is split into three parts: a training set of 50,000 examples, a validation set of 10,000, and a test set of 10,000.
- Each example is a fixed-size image of 28 x 28 pixels
- ![image](http://nbviewer.jupyter.org/github/songorithm/ML/blob/master/part3/pkg/theano/ch03/figures/cap3.1.png)

### Loading the mnist.pkl.gz Data

In [None]:
import cPickle, gzip, numpy

# Load the dataset
f = gzip.open('mnist.pkl.gz', 'rb')
train_set, valid_set, test_set = cPickle.load(f)
f.close()

### Handling the Data as Shared Variables

In [13]:
def shared_dataset(data_xy):
    """ Function that loads the dataset into shared variables
    
    The reason we store our dataset in shared variables is to allow
    Theano to copy it into the GPU memory (when code is run on GPU).
    Since copying data into the GPU is slow, copying a minibatch everytime
    is needed (the default behaviour if the data is not in a shared
    variable) would lead to a large decrease in performance.
    """
    data_x, data_y = data_xy
    shared_x = theano.shared(numpy.asarray(data_x, dtype=theano.config.floatX)) 
    shared_y = theano.shared(numpy.asarray(data_y, dtype=theano.config.floatX))
    # When storing data on the GPU it has to be stored as floats
    # therefore we will store the labels as ``floatX`` as well
    # (``shared_y`` does exactly that). But during our computations
    # we need them as ints (we use labels as index, and if they are
    # floats it doesn't make sense) therefore instead of returning
    # ``shared_y`` we will have to cast it to int. This little hack
    # lets us get around this issue
    return shared_x, T.cast(shared_y, 'int32')

In [None]:
import theano
import theano.tensor as T

test_set_x, test_set_y = shared_dataset(test_set)
valid_set_x, valid_set_y = shared_dataset(valid_set)
train_set_x, train_set_y = shared_dataset(train_set)

batch_size = 500 # size of the minibatch

# accessing the third minibatch of the training set
data  = train_set_x[2 * 500: 3 * 500]
label = train_set_y[2 * 500: 3 * 500]
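
The slicing above generalizes to any minibatch index: minibatch `i` covers rows `i * batch_size` up to `(i + 1) * batch_size`. Below is a minimal NumPy sketch of that indexing together with the float-to-int label cast described in `shared_dataset`. The random arrays are a small hypothetical stand-in for MNIST, and `get_minibatch` is a helper name introduced here.

```python
import numpy as np

rng = np.random.RandomState(0)
n_examples, n_features, batch_size = 2000, 784, 500   # hypothetical stand-in sizes

data_x = rng.rand(n_examples, n_features).astype("float32")
data_y = rng.randint(0, 10, size=n_examples).astype("float32")  # labels stored as floats
labels = data_y.astype("int32")                                  # cast back for indexing


def get_minibatch(index):
    """Return the rows belonging to minibatch number `index` (0-based)."""
    start, stop = index * batch_size, (index + 1) * batch_size
    return data_x[start:stop], labels[start:stop]


n_batches = n_examples // batch_size
third_x, third_y = get_minibatch(2)       # the third minibatch, as in the cell above
print(n_batches, third_x.shape, third_y.dtype)
```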

### Optimization Basics for Deep Learning
- Learning a Classifier: loss functions
- Stochastic Gradient Descent: optimization
- Regularization: preventing overfitting
- Recap

#### Learning a Classifier
- Zero-One Loss
- Negative Log-Likelihood Loss

##### Zero-One Loss
![image](http://fa.bianp.net/talks/trento_may_2015/img/logistic.svg)
![image](http://nbviewer.jupyter.org/github/songorithm/ML/blob/master/part3/pkg/theano/ch03/figures/cap3.6.png)

In [None]:
# zero_one_loss is a Theano variable representing a symbolic
# expression of the zero one loss ; to get the actual value this
# symbolic expression has to be compiled into a Theano function
# (see the Theano tutorial for more details).
# axis=1 takes the argmax over the classes of each example (row).
zero_one_loss = T.sum(T.neq(T.argmax(p_y_given_x, axis=1), y))
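
As a concrete illustration, the same count can be computed in NumPy by taking the argmax along the class axis and counting mismatches. The probabilities and labels below are made-up values for four examples and three classes.

```python
import numpy as np

# Hypothetical class probabilities: one row per example, one column per class.
p_y_given_x = np.array([[0.7, 0.2, 0.1],
                        [0.1, 0.8, 0.1],
                        [0.3, 0.3, 0.4],
                        [0.6, 0.3, 0.1]])
y = np.array([0, 1, 1, 2])                       # true labels

predictions = np.argmax(p_y_given_x, axis=1)     # most probable class per example
zero_one_loss = int(np.sum(predictions != y))    # number of misclassified examples
print(predictions, zero_one_loss)
```

The first two examples are classified correctly and the last two are not, so the loss is 2.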

##### Negative Log-Likelihood Loss
- The zero-one loss function is not differentiable.
- log-likelihood
- ![image](http://nbviewer.jupyter.org/github/songorithm/ML/blob/master/part3/pkg/theano/ch03/figures/cap3.7.png)
- negative log-likelihood
- ![image](http://nbviewer.jupyter.org/github/songorithm/ML/blob/master/part3/pkg/theano/ch03/figures/cap3.8.png)

In [None]:
# NLL is a symbolic variable ; to get the actual value of NLL, this symbolic
# expression has to be compiled into a Theano function (see the Theano
# tutorial for more details)
NLL = -T.sum(T.log(p_y_given_x)[T.arange(y.shape[0]), y])
# note on syntax: T.arange(y.shape[0]) is a vector of integers [0,1,2,...,len(y)-1].
# Indexing a matrix M by the two vectors [0,1,...,K], [a,b,...,k] returns the
# elements M[0,a], M[1,b], ..., M[K,k] as a vector.  Here, we use this
# syntax to retrieve the log-probability of the correct labels, y.
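
The indexing trick in the note works the same way in NumPy, which makes it easy to inspect. This is a small sketch with made-up probabilities for three examples; the name `picked` is introduced here for the selected log-probabilities.

```python
import numpy as np

# Hypothetical class probabilities: one row per example, one column per class.
p_y_given_x = np.array([[0.7, 0.2, 0.1],
                        [0.1, 0.8, 0.1],
                        [0.3, 0.3, 0.4]])
y = np.array([0, 1, 2])                        # correct class of each example

log_p = np.log(p_y_given_x)
# Pair row i with column y[i]: selects log_p[0, 0], log_p[1, 1], log_p[2, 2].
picked = log_p[np.arange(y.shape[0]), y]
nll = -np.sum(picked)
print(picked, nll)
```

Here `nll` equals `-(log 0.7 + log 0.8 + log 0.4)`, the negative log-probability assigned to the correct labels.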

#### Stochastic Gradient Descent
- Gradient descent: each update is computed on the entire dataset at once
- Stochastic gradient descent: each update is computed on a single example
- **Minibatch SGD: each update is computed on a small group of examples (e.g. 500 at a time)**

##### Gradient descent

In [None]:
# GRADIENT DESCENT
while True:
    loss = f(params)
    d_loss_wrt_params = ... # compute gradient 
    params -= learning_rate * d_loss_wrt_params 
    if <stopping condition is met>:
        return params
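
The pseudocode above can be turned into a runnable sketch once a concrete loss is chosen. The example below assumes a simple quadratic loss, the squared distance to a `target` vector, and stops when the gradient norm falls below a tolerance; the function name, the loss, and all constants are illustrative assumptions.

```python
import numpy as np


def gradient_descent(grad_fn, params, learning_rate=0.1,
                     tolerance=1e-6, max_steps=1000):
    """Plain gradient descent; stops when the gradient is nearly zero."""
    params = np.asarray(params, dtype=float).copy()
    for _ in range(max_steps):
        d_loss_wrt_params = grad_fn(params)
        if np.linalg.norm(d_loss_wrt_params) < tolerance:   # stopping condition
            break
        params -= learning_rate * d_loss_wrt_params
    return params


target = np.array([3.0, -2.0])    # minimizer of the hypothetical loss


def loss_gradient(p):
    """Gradient of loss(p) = sum((p - target) ** 2)."""
    return 2.0 * (p - target)


solution = gradient_descent(loss_gradient, params=[0.0, 0.0])
print(solution)
```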

##### Stochastic gradient descent

In [None]:
# STOCHASTIC GRADIENT DESCENT
for (x_i,y_i) in training_set:
                            # imagine an infinite generator
                            # that may repeat examples (if there is only a finite training set)
    loss = f(params, x_i, y_i) 
    d_loss_wrt_params = ... # compute gradient 
    params -= learning_rate * d_loss_wrt_params 
    if <stopping condition is met>:
        return params

##### minibatch SGD

In [None]:
for (x_batch,y_batch) in train_batches:
                            # imagine an infinite generator
                            # that may repeat examples
    loss = f(params, x_batch, y_batch)
    d_loss_wrt_params = ... # compute gradient using theano 
    params -= learning_rate * d_loss_wrt_params
    if <stopping condition is met>: 
        return params
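
For comparison, here is a runnable minibatch SGD sketch in NumPy. It assumes a noise-free linear regression problem with a mean-squared-error loss, so the result can be checked against the known weights; `true_w`, the data sizes, the learning rate, and the epoch count are all hypothetical choices made for this example.

```python
import numpy as np

rng = np.random.RandomState(0)
n_examples, n_features, batch_size = 500, 3, 50
true_w = np.array([1.5, -2.0, 0.5])      # hypothetical ground-truth weights
X = rng.randn(n_examples, n_features)
y = X.dot(true_w)                        # noise-free targets

w = np.zeros(n_features)
learning_rate = 0.05
for epoch in range(50):
    order = rng.permutation(n_examples)  # reshuffle the examples every epoch
    for start in range(0, n_examples, batch_size):
        idx = order[start:start + batch_size]
        x_batch, y_batch = X[idx], y[idx]
        residual = x_batch.dot(w) - y_batch
        # Gradient of the mean squared error over this minibatch only.
        d_loss_wrt_w = 2.0 * x_batch.T.dot(residual) / len(idx)
        w -= learning_rate * d_loss_wrt_w
print(w)
```

Each update sees only 50 of the 500 examples, which is the difference from full gradient descent; setting `batch_size = 1` would give plain stochastic gradient descent.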

##### Theano Implementation

In [None]:
# Minibatch Stochastic Gradient Descent

# assume loss is a symbolic description of the loss function given
# the symbolic variables params (shared variable), x_batch, y_batch;

# compute gradient of loss with respect to params
d_loss_wrt_params = T.grad(loss, params)

# compile the MSGD step into a theano function
updates = [(params, params - learning_rate * d_loss_wrt_params)]
MSGD = theano.function([x_batch,y_batch], loss, updates=updates)

for (x_batch, y_batch) in train_batches:
    # here x_batch and y_batch are elements of train_batches and
    # therefore numpy arrays; function MSGD also updates the params 
    print('Current loss is ', MSGD(x_batch, y_batch))
    if stopping_condition_is_met:  # placeholder flag, assumed to be set elsewhere
        break  # params (a shared variable) now holds the trained values

#### Regularization
- L1 and L2 regularization
- Early-Stopping

##### L1 and L2 regularization
![image](http://nbviewer.jupyter.org/github/songorithm/ML/blob/master/part3/pkg/theano/ch03/figures/cap3.9.png)
![image](https://github.com/songorithm/ML/raw/f9f2c631f14613f5051eed28f70ffeb130d9c219/part2/study04/dml07/figures/cap7.21.png)
![image](https://camo.githubusercontent.com/0403ea65eb50f1635201e850814948aa886780bd/687474703a2f2f796f73696e736b692e636f6d2f6d6c737331322f6d656469612f736c696465732f4d4c53532d323031322d46756b756d697a752d4b65726e656c2d4d6574686f64732d666f722d537461746973746963616c2d4c6561726e696e675f3035302e706e67)

In [None]:
# symbolic Theano variable that represents the L1 regularization term
L1 = T.sum(abs(param))

# symbolic Theano variable that represents the squared L2 term
L2_sqr = T.sum(param ** 2)

# the loss
loss = NLL + lambda_1 * L1 + lambda_2 * L2_sqr
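
A small numeric example may help to see how much each term contributes. The sketch below evaluates both penalties with NumPy; the parameter vector, the data loss `NLL`, and the coefficients `lambda_1` and `lambda_2` are made-up values for illustration.

```python
import numpy as np

param = np.array([1.0, -2.0, 3.0])    # hypothetical parameter vector
NLL = 0.5                             # hypothetical data loss
lambda_1, lambda_2 = 0.001, 0.0001    # hypothetical regularization strengths

L1 = np.sum(np.abs(param))            # sum of absolute values: 1 + 2 + 3
L2_sqr = np.sum(param ** 2)           # sum of squares: 1 + 4 + 9
loss = NLL + lambda_1 * L1 + lambda_2 * L2_sqr
print(L1, L2_sqr, loss)
```

The L2 term grows quadratically with each weight, so it punishes a few large weights more than the L1 term does.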

##### Early-Stopping
![image](https://github.com/songorithm/ML/raw/f9f2c631f14613f5051eed28f70ffeb130d9c219/part2/study04/dml07/figures/cap7.44.png)
![image](https://github.com/songorithm/ML/raw/f9f2c631f14613f5051eed28f70ffeb130d9c219/part2/study04/dml07/figures/cap7.48.png)

In [None]:
import copy
import time

import numpy

# early-stopping parameters
patience = 5000 # look at this many examples regardless
patience_increase = 2 # wait this much longer when a new best is
                        # found
improvement_threshold = 0.995 # a relative improvement of this much is
                               # considered significant
validation_frequency = min(n_train_batches, patience/2) # go through this many
                              # minibatches before checking the network
                              # on the validation set; in this case we
                              # check every epoch
best_params = None
best_validation_loss = numpy.inf
test_score = 0.
start_time = time.clock()

done_looping = False
epoch = 0

while (epoch < n_epochs) and (not done_looping):
    # Report "1" for first epoch, "n_epochs" for last epoch
    epoch = epoch + 1
    for minibatch_index in xrange(n_train_batches):

        d_loss_wrt_params = ... # compute gradient
        params -= learning_rate * d_loss_wrt_params # gradient descent
        
        # iteration number. We want it to start at 0.
        iter = (epoch - 1) * n_train_batches + minibatch_index
        # note that if we do `iter % validation_frequency` it will be
        # true for iter = 0 which we do not want. We want it true for
        # iter = validation_frequency - 1.
        if (iter + 1) % validation_frequency == 0:

            this_validation_loss = ... # compute zero-one loss on validation set 
            
            if this_validation_loss < best_validation_loss:
                                       
                # improve patience if loss improvement is good enough
                if this_validation_loss < best_validation_loss * improvement_threshold:
                    patience = max(patience, iter * patience_increase)
                
                best_params = copy.deepcopy(params)
                best_validation_loss = this_validation_loss

        if patience <= iter:
            done_looping = True
            break

# POSTCONDITION :
# best_params refers to the best out-of-sample parameters observed during the optimization
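
The patience logic can be isolated from the training loop and run on a recorded list of validation losses. The sketch below is a simplification that assumes the validation loss is checked at every iteration (that is, `validation_frequency = 1`); the function name `early_stopping` and the loss values are hypothetical.

```python
def early_stopping(validation_losses, patience=4, patience_increase=2,
                   improvement_threshold=0.995):
    """Replay the patience rule on a sequence of validation losses.

    Returns (best iteration, best loss, iteration at which the loop stopped).
    """
    best_loss = float("inf")
    best_iter = None
    last_iter = None
    for it, loss in enumerate(validation_losses):
        last_iter = it
        if loss < best_loss:
            # Extend patience only when the improvement is significant.
            if loss < best_loss * improvement_threshold:
                patience = max(patience, it * patience_increase)
            best_loss, best_iter = loss, it
        if patience <= it:
            break
    return best_iter, best_loss, last_iter


# Hypothetical losses: improvement until iteration 3, then slow degradation.
losses = [1.0, 0.8, 0.6, 0.5, 0.55, 0.56, 0.57, 0.58, 0.59, 0.60]
result = early_stopping(losses)
print(result)
```

The best loss is found at iteration 3, which extends patience to 6; no later loss improves on it, so the loop stops at iteration 6 instead of consuming the whole list.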

### Loading and Saving Models

In [None]:
import cPickle
save_file = open('path', 'wb') # this will overwrite current contents
cPickle.dump(w.get_value(borrow=True), save_file, -1) # the -1 is for HIGHEST_PROTOCOL
cPickle.dump(v.get_value(borrow=True), save_file, -1) # .. and it triggers much more efficient
cPickle.dump(u.get_value(borrow=True), save_file, -1) # .. storage than numpy's default
save_file.close()

In [None]:
save_file = open('path', 'rb') # binary mode, matching the 'wb' used when saving
w.set_value(cPickle.load(save_file), borrow=True)
v.set_value(cPickle.load(save_file), borrow=True)
u.set_value(cPickle.load(save_file), borrow=True)
save_file.close()
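
The cells above use Python 2's `cPickle`. For readers on Python 3, the same save-and-load pattern can be sketched with the standard `pickle` module and plain NumPy arrays standing in for the values of the shared variables. The arrays `w` and `b` and the temporary file location are assumptions made for this example.

```python
import os
import pickle
import tempfile

import numpy as np

# Hypothetical parameter values, standing in for w.get_value() and so on.
w = np.arange(6, dtype="float32").reshape(2, 3)
b = np.zeros(3, dtype="float32")

path = os.path.join(tempfile.mkdtemp(), "model.pkl")   # hypothetical location

# Save: several objects can be dumped one after another into the same file.
with open(path, "wb") as save_file:
    pickle.dump(w, save_file, pickle.HIGHEST_PROTOCOL)
    pickle.dump(b, save_file, pickle.HIGHEST_PROTOCOL)

# Load: the objects come back in the order in which they were dumped.
with open(path, "rb") as save_file:
    w_loaded = pickle.load(save_file)
    b_loaded = pickle.load(save_file)

print(w_loaded.shape, b_loaded.shape)
```

Using `with` closes the file even if a dump or load fails, which the explicit `open`/`close` calls above do not guarantee.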

### Running the Full Code
- THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32   python logistic_sgd.py

```
epoch 68, minibatch 83/83, validation error 7.531250 %
     epoch 68, minibatch 83/83, test error of best model 7.520833 %
epoch 69, minibatch 83/83, validation error 7.531250 %
epoch 70, minibatch 83/83, validation error 7.510417 %
     epoch 70, minibatch 83/83, test error of best model 7.500000 %
epoch 71, minibatch 83/83, validation error 7.520833 %
epoch 72, minibatch 83/83, validation error 7.510417 %
     epoch 72, minibatch 83/83, test error of best model 7.510417 %
epoch 73, minibatch 83/83, validation error 7.500000 %
     epoch 73, minibatch 83/83, test error of best model 7.489583 %
Optimization complete with best validation score of 7.500000 %,with test performance 7.489583 %
The code run for 74 epochs, with 12.146231 epochs/sec
The code for file logistic_sgd.py ran for 6.1s
```

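## 07. Building a Cancer Prediction Model from Cancer Patient RNA Data: Homework
- https://github.com/biospin/DeepBio
- [Building training, validation, and test data from cancer patient mRNA](https://github.com/biospin/DeepBio/blob/master/exercise01/mRNA_make_feature.ipynb)
- Data files: https://drive.google.com/drive/folders/0B6bSLTlVnagfN2dIZ0p1OTFTYzg
- The data needed for training is split across several files => each file has to be read and the results merged into one.
- The training data contains simple mistakes of the kind beginners make => add code that reads the files and handles these mistakes.
- The Y values do not span 10 classes => adjust the range of the Y values.
- Adjust the other hyperparameters as appropriate.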
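
Two of the steps above, merging the parts and adjusting the label range, can be sketched with NumPy. The arrays below are tiny made-up stand-ins for data read from the separate files, and remapping with `np.unique` is only one possible approach, not necessarily the one the homework intends. The nature of the mistakes in the data is not described here, so that step is not sketched.

```python
import numpy as np

# Hypothetical stand-ins for the arrays read from the separate files.
part_x = [np.ones((2, 4)), np.zeros((3, 4))]
part_y = [np.array([3, 7]), np.array([7, 12, 3])]

# Merge the parts into a single feature matrix and a single label vector.
X = np.concatenate(part_x, axis=0)
raw_y = np.concatenate(part_y, axis=0)

# Remap arbitrary label values onto 0 .. n_out-1 so they can index the outputs.
classes, y = np.unique(raw_y, return_inverse=True)
n_out = len(classes)       # number of output units the classifier would need
print(X.shape, classes, y, n_out)
```

`classes[y]` recovers the original labels, so predictions can be mapped back after training.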