### This document was written by a beginner, so it may contain errors.
- https://github.com/bage79/nlp4kor
- https://www.youtube.com/playlist?list=PLE_yleP-KQefhFSNh16hJKnq6stIG05fu

# Real Tensorflow Coding
- Large-scale input data
- Saving/loading trained models
- Choosing the best model
- Repeated training (graph reuse)

### Reference
- https://www.buzzvil.com/2017/02/22/buzzvil-techblog-tensorflow-deeplearning/
- https://jasdeep06.github.io/posts/variable-sharing-in-tensorflow/

# Queue Runner (input pipeline)

### (Question 1) Has your Python program ever stopped while training with Tensorflow?
<img src="img/gpu_out_of_memory.png">
- https://3.bp.blogspot.com/-PSbzKHlUR9U/VxQ659ilhPI/AAAAAAAABuQ/CB33z-iJxMMrVLVECnIqe5xUUvKBKyE3ACKgB/s1600/step04.png
- standard output (out of GPU memory)

### (Question 2) Has your Python program ever stopped while loading data for Tensorflow?
<img src="img/out_of_memory_syslog.png">
- /var/log/syslog (out of RAM)


### (Question 3) Have you noticed irregular GPU utilization while training with Tensorflow?
<img src="img/nvidia-smi.png">
- watch -n 1 nvidia-smi

### Why you need Tensorflow Queue Runner
- for better performance
    - parallel data loading in a separate Python thread

<img src="img/why_need_queue_runner.png">
- https://www.quora.com/In-TensorFlow-what-are-queue-runners-and-why-are-they-useful
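The idea behind a queue runner can be sketched in plain Python, without Tensorflow: a background thread fills a bounded queue while the training loop consumes from it. This is only an illustration of the concept. `load_batch`, `producer`, `consume_all` and the batch contents are hypothetical names made up for this sketch, not Tensorflow APIs.

```python
import queue
import threading

def load_batch(i):
    # hypothetical loader: stands in for slow disk reads and parsing
    return [i * 10 + k for k in range(3)]

def producer(q, n_batches):
    for i in range(n_batches):
        q.put(load_batch(i))   # blocks while the queue is full (bounded memory)
    q.put(None)                # sentinel: no more data

def consume_all(n_batches, capacity=2):
    q = queue.Queue(maxsize=capacity)
    t = threading.Thread(target=producer, args=(q, n_batches), daemon=True)
    t.start()                  # loading now runs in parallel with the loop below
    batches = []
    while True:
        batch = q.get()        # blocks until a batch is ready
        if batch is None:
            break
        batches.append(batch)  # a real training step would run here
    t.join()
    return batches

batches = consume_all(n_batches=4)
print(len(batches), batches[0])
```

The bounded `capacity` is what keeps memory use flat: the loader can only run a few batches ahead of the consumer.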

### Methods of reading data
- https://www.tensorflow.org/programmers_guide/reading_data
- 1.Preloaded data: 
    - a constant or variable in the TensorFlow graph holds all the data (for small data sets).
- 2.Feeding: 
    - Python code provides the data when running each step.
    - tf.placeholder(shape)
- 3.Reading from files: (for large data sets).
    - an input pipeline reads the data from files at the beginning of a TensorFlow graph.
    - tf.train.string_input_producer(filenames)


### 3. Reading from files (Queue Runner)
<img src="img/queue_runner_process.gif">
https://www.tensorflow.org/programmers_guide/reading_data

<img src="img/queue_runner_bcho.png">
http://bcho.tistory.com/1165

### Very Fast with placeholder & feed_dict (preloaded small data)
```python
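import tensorflow as tf  # TensorFlow 1.x

def next_batch_in_memory(filenames, batch_size):
    # read_all_data_into_memory() stands for your own loader: it reads every file once
    # and returns a list of (features_batch, labels_batch) pairs held in memory
    all_data = read_all_data_into_memory(filenames, batch_size)
    for features_batch, labels_batch in all_data:
        yield features_batch, labels_batch

def create_graph():
    x = tf.placeholder(tf.float32, shape=[None, 2])  # features (2 per row in the add example)
    y = tf.placeholder(tf.float32, shape=[None, 1])  # labels
    y_hat = tf.layers.dense(x, 1)  # linear model
    train_cost = tf.reduce_mean(tf.square(y - y_hat))
    train_step = tf.train.AdamOptimizer().minimize(train_cost)
    return x, y, train_cost, train_step

x, y, train_cost, train_step = create_graph()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for features_batch, labels_batch in next_batch_in_memory(filenames, batch_size):
        _train_cost, _ = sess.run([train_cost, train_step],
                                  feed_dict={x: features_batch, y: labels_batch})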
```

### Fast with placeholder & feed_dict (for big data)
```python
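import tensorflow as tf  # TensorFlow 1.x

def next_batch(filenames, batch_size):
    features_batch, labels_batch = [], []
    for filename in filenames:
        with open(filename) as f:  # stream line by line; never load the whole file
            for line in f:
                tokens = [float(t) for t in line.split()]
                features_batch.append(tokens[:-1])  # all columns but the last
                labels_batch.append(tokens[-1:])    # last column is the label
                if len(features_batch) >= batch_size:
                    yield features_batch, labels_batch
                    features_batch, labels_batch = [], []
    if features_batch:  # last, partially filled batch
        yield features_batch, labels_batch

def create_graph():
    x = tf.placeholder(tf.float32, shape=[None, 2])  # features (2 per row in the add example)
    y = tf.placeholder(tf.float32, shape=[None, 1])  # labels
    y_hat = tf.layers.dense(x, 1)  # linear model
    train_cost = tf.reduce_mean(tf.square(y - y_hat))
    train_step = tf.train.AdamOptimizer().minimize(train_cost)
    return x, y, train_cost, train_step

x, y, train_cost, train_step = create_graph()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for features_batch, labels_batch in next_batch(filenames, batch_size):
        _train_cost, _ = sess.run([train_cost, train_step],
                                  feed_dict={x: features_batch, y: labels_batch})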
```

# But...
### My Queue Runner is much slower than placeholder. Why?
### (Reason) using feed_dict for the data
- Very Slow with Queue Runner & feed_dict (incorrect way)
```python
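import tensorflow as tf  # TensorFlow 1.x

def input_pipeline(filenames, batch_size):
    filename_queue = tf.train.string_input_producer(filenames, shuffle=False)
    reader = tf.TextLineReader()
    _, line = reader.read(filename_queue)
    # three tab-separated float columns: two features and one label
    tokens = tf.decode_csv(line, record_defaults=[[0.0], [0.0], [0.0]], field_delim='\t')
    feature, label = tf.stack(tokens[:-1]), tf.stack(tokens[-1:])
    features_batch, labels_batch = tf.train.batch([feature, label], batch_size=batch_size)
    return features_batch, labels_batch

def create_graph():
    x = tf.placeholder(tf.float32, shape=[None, 2])
    y = tf.placeholder(tf.float32, shape=[None, 1])
    y_hat = tf.layers.dense(x, 1)  # linear model
    train_cost = tf.reduce_mean(tf.square(y - y_hat))
    train_step = tf.train.AdamOptimizer().minimize(train_cost)
    return x, y, train_cost, train_step

x, y, train_cost, train_step = create_graph()
features_batch, labels_batch = input_pipeline(filenames, batch_size)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    for nth_batch in range(n_train // batch_size):
        _features_batch, _labels_batch = sess.run([features_batch, labels_batch])  # read from queue into Python
        _train_cost, _ = sess.run([train_cost, train_step],
                                  feed_dict={x: _features_batch, y: _labels_batch})  # copy back into placeholder
    coord.request_stop()
    coord.join(threads)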
```

### Very Fast with Queue Runner (for big data)
```python
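import tensorflow as tf  # TensorFlow 1.x
def input_pipeline(filenames, batch_size):
    # same as above
    return features_batch, labels_batch
def create_graph(filenames, batch_size):
    x, y = input_pipeline(filenames, batch_size)  # queue output is wired straight into the model
    train_cost = tf.reduce_mean(tf.square(y - tf.layers.dense(x, 1)))  # linear model + MSE
    train_step = tf.train.AdamOptimizer().minimize(train_cost)
    return train_cost, train_step
train_cost, train_step = create_graph(filenames, batch_size)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    for nth_batch in range(n_train // batch_size):
        _train_cost, _ = sess.run([train_cost, train_step])  # no feed_dict
    coord.request_stop()
    coord.join(threads)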
```

### Benchmark Queue Runner & placeholder
##### with placeholder & preload (Very Fast, small data)
<img src="img/learn_add_with_placeholder_preload.png">
##### with placeholder (Fast)
<img src="img/learn_add_with_placeholder.png">
##### with Queue Runner & feed_dict (Very Slow)
- incorrect way
<img src="img/learn_add_with_queue_feed_dict.png">
##### with Queue Runner (Very Fast)
<img src="img/learn_add_with_queue.png">

# Model Save & Load
- tf.train.Saver

```python
import os

model_name = os.path.basename(__file__).replace('.py', '')
# one directory per hyper-parameter combination; 'model' is the checkpoint file prefix
model_file = os.path.join(MODELS_DIR, '%s.n_train_%s.batch_size_%s.total_train_time_%s/model' % (model_name, n_train, batch_size, total_train_time))
model_dir = os.path.dirname(model_file)
```
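The naming scheme above puts the hyper-parameters into the directory name, so every run gets its own checkpoint directory. A minimal standalone sketch of the same idea follows; `build_model_file` and the `/tmp/models` path are hypothetical, chosen only for illustration.

```python
import os

def build_model_file(models_dir, model_name, n_train, batch_size, total_train_time):
    # one directory per hyper-parameter combination; 'model' is the checkpoint prefix
    dir_name = '%s.n_train_%s.batch_size_%s.total_train_time_%s' % (
        model_name, n_train, batch_size, total_train_time)
    return os.path.join(models_dir, dir_name, 'model')

model_file = build_model_file('/tmp/models', 'learn_add_with_queue', 1000, 10, 5)
model_dir = os.path.dirname(model_file)
print(model_file)
```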

In [1]:
!ls -al /home/bage/workspace/nlp4kor/models/

total 68
drwxrwxr-x 17 bage bage 4096 Jul 29 14:10 .
drwxrwxr-x 10 bage bage 4096 Jul 24 16:33 ..
drwxrwxr-x  2 bage bage 4096 Jul 29 22:14 learn_add_with_placeholder.n_train_1000.batch_size_1.total_train_time_5
drwxrwxr-x  2 bage bage 4096 Jul 29 22:14 learn_add_with_placeholder.n_train_1000.batch_size_10.total_train_time_5
drwxrwxr-x  2 bage bage 4096 Jul 29 22:14 learn_add_with_placeholder.n_train_1000.batch_size_100.total_train_time_5
drwxrwxr-x  2 bage bage 4096 Jul 29 13:03 learn_add_with_placeholder_preload.n_train_1000.batch_size_1.total_train_time_5
drwxrwxr-x  2 bage bage 4096 Jul 29 13:03 learn_add_with_placeholder_preload.n_train_1000.batch_size_10.total_train_time_5
drwxrwxr-x  2 bage bage 4096 Jul 29 13:03 learn_add_with_placeholder_preload.n_train_1000.batch_size_100.total_train_time_5
drwxrwxr-x  2 bage bage 4096 Jul 29 22:21 learn_add_with_queue.n_train_1000.batch_size_1.total_train_time_5
drwxrwxr-x  2 bage bage 4096 Jul 29 22:21 learn_add_with_queue.n_train_10

```python
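saver = tf.train.Saver(max_to_keep=None)  # keep every checkpoint
min_valid_cost, min_valid_epoch = float('inf'), -1
for epoch in range(total_epochs):
    # ... run the train steps and compute _valid_cost for this epoch ...
    saver.save(sess, model_file, global_step=epoch)  # (optional) save model on each epoch
    if _valid_cost < min_valid_cost:  # this epoch is the best so far
        min_valid_cost, min_valid_epoch = _valid_cost, epoch
        saver.save(sess, model_file)  # save the latest best model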
```

```python
checkpoint = tf.train.get_checkpoint_state(model_dir)
if not training_mode and checkpoint: # this is test mode and model exists
    saver = tf.train.Saver()
    saver.restore(sess, model_file) # restore
```

### (Caution) Cases where the method above fails to find the best model
- See the presentation video
- Therefore, you need to check visually in Tensorboard that the min valid cost really comes from properly trained weights.
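Picking the epoch with the minimum validation cost can be sketched in plain Python. The validation costs below are made-up numbers and `best_epoch` is a hypothetical helper. Printing the chosen epoch next to the full cost list is only a rough text substitute for the visual check in Tensorboard.

```python
def best_epoch(valid_costs):
    # returns (epoch, cost) of the minimum validation cost; ties go to the earliest epoch
    epoch = min(range(len(valid_costs)), key=lambda e: valid_costs[e])
    return epoch, valid_costs[epoch]

valid_costs = [9.1, 4.0, 1.2, 0.7, 0.9, 1.5]  # made-up values, one per epoch
epoch, cost = best_epoch(valid_costs)
print(epoch, cost, valid_costs)
```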

### Is the current line of code running on the CPU or the GPU? (see the presentation video)
- x = tf.placeholder(...)
- sess.run(tf.global_variables_initializer())
- _cost = sess.run([cost], feed_dict={x: x_data})
- saver.save(sess, model_file)
- saver.restore(sess, model_file)
- http://haanjack.github.io/cuda/2016-02-16-CUDA/
<img src="img/cuda_processing_flow.jpg">

# Tensorboard
- https://github.com/tensorflow/tensorflow/blob/r1.2/tensorflow/tensorboard/README.md
- https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/4_Utils/tensorboard_basic.py

### Start & stop Tensorboard in the background

In [2]:
!grep tensorboard ~/.bash_profile

alias tensorboard-clear='rm -rf ~/tensorboard_log/*'
alias tensorboard-start='nohup tensorboard --reload_interval=5 --logdir=~/tensorboard_log/ --port=6006 >/dev/null 2>&1 &'
alias tensorboard-stop='pkill -f tensorboard && sleep 1 && ps aux | grep tensorboard | grep -v grep'


In [3]:
!ps -efww | grep tensorboard | grep -v grep

bage      4049     1  0 10:18 ?        00:00:09 /home/bage/anaconda3/bin/python /home/bage/anaconda3/bin/tensorboard --reload_interval=5 --logdir=~/tensorboard_log/ --port=6006


### Cost Graph by different Batch-sizes
<img src="img/learn_add_with_placeholder_tensorboard.png">

# Reuse Graph
- 1.train & valid graph
- 2.graph with different hyper-parameters
- https://tensorflowkorea.gitbooks.io/tensorflow-kr/content/g3doc/how_tos/variable_scope/
```python
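import tensorflow as tf  # TensorFlow 1.x

def create_graph(x, y, reuse=None):  # x, y: tensors coming from the input pipeline
    with tf.variable_scope('model', reuse=reuse) as scope:  # reuse variables when entering this scope
        # tf.Variable() always creates a new variable, so use tf.get_variable() for sharing
        w = tf.get_variable("w1", [2, 1], initializer=tf.random_normal_initializer())  # name=model/w1
        scope.reuse_variables()  # reuse already defined variables in this scope
        w = tf.get_variable("w1", [2, 1], initializer=tf.random_normal_initializer())  # same model/w1
    y_hat = tf.matmul(x, w)
    cost = tf.reduce_mean(tf.square(y - y_hat))
    # assumed metric for this sketch: exact match after rounding
    accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.round(y_hat), y), tf.float32))
    return cost, accuracy, y_hat, w

train_cost, _, _, _ = create_graph(train_x, train_y, reuse=None)  # creates model/w1
train_step = tf.train.AdamOptimizer().minimize(train_cost)  # built once, outside the reused scope
valid_cost, _, _, _ = create_graph(valid_x, valid_y, reuse=True)  # shares model/w1
test_cost, accuracy, y_hat, w = create_graph(test_x, test_y, reuse=True)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())  # or saver.restore(...) in test mode
    # start the queue runners here when x, y come from an input pipeline
    if is_training:
        _train_cost, _ = sess.run([train_cost, train_step])  # x, y are fed from the pipeline
        _valid_cost = sess.run(valid_cost)  # x, y are fed from the pipeline
    else:
        _test_cost, _accuracy, _y_hat, _w = sess.run([test_cost, accuracy, y_hat, w])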
```

### (Tip) Multiple sessions on one GPU
```python
config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # do not use entire memory for this session
with tf.Session(config=config) as sess:
    pass
```

### (Tip) Tensor's Flow (classification)
- http://machinethink.net/blog/tensorflow-on-ios/
<img src="img/tensorflow_classifitation_process.png">
- cost (loss) function
    - binary classification: sigmoid with cross entropy
    - multi-class classification: softmax with cross entropy
- y_pred (y_hat): for calculating cost
- inference (predicted label): for accuracy
##### train vs valid vs test
- train: optimizer(minimize or maximize)
- valid: cost
- test: cost & accuracy & w, b, y_hat ...
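The difference between y_pred (used for the cost) and inference (used for accuracy) can be sketched in plain Python. This is a hand-written illustration, not Tensorflow code; the logits are made-up values and the helper names are chosen only for this sketch.

```python
import math

def sigmoid(z):
    # squashes one logit into a probability (binary classification)
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label_index):
    # negative log-probability assigned to the true class
    return -math.log(probs[label_index])

logits = [2.0, 1.0, 0.1]               # made-up model outputs for 3 classes
y_pred = softmax(logits)               # y_hat: probabilities, used for the cost
cost = cross_entropy(y_pred, label_index=0)
inference = y_pred.index(max(y_pred))  # predicted label, used for accuracy
print(y_pred, cost, inference)
```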

# Real Tensorflow Coding
- Large-scale input data: Queue Runner (Input Pipeline)
- Saving/loading trained models: Saver
- Choosing the best model: Tensorboard (check which epoch has the minimum valid cost)
- Repeated training (graph reuse): tf.variable_scope(reuse=True) & tf.get_variable(name='') & scope.reuse_variables()

### (Example) Learn Add function (Linear Regression)
- https://github.com/bage79/nlp4kor/blob/master/nlp4kor/examples/learn_add_with_placeholder.py
    - with placeholder & feed_dict
- https://github.com/bage79/nlp4kor/blob/master/nlp4kor/examples/learn_add_with_queue.py
    - with Queue Runner

In [4]:
!wc -l /home/bage/workspace/nlp4kor/data/add.train.tsv
!wc -l /home/bage/workspace/nlp4kor/data/add.valid.tsv
!wc -l /home/bage/workspace/nlp4kor/data/add.test.tsv

1000 /home/bage/workspace/nlp4kor/data/add.train.tsv
100 /home/bage/workspace/nlp4kor/data/add.valid.tsv
100 /home/bage/workspace/nlp4kor/data/add.test.tsv


In [5]:
!head /home/bage/workspace/nlp4kor/data/add.train.tsv

45	26	71
50	14	64
75	78	153
44	0	44
45	64	109
40	65	105
90	27	117
92	79	171
49	84	133
78	5	83
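Each row shown above holds two input numbers and their sum, separated by tabs. A minimal sketch of parsing one such row into features and a label (`parse_line` is a hypothetical helper, not taken from the repository):

```python
def parse_line(line):
    # last column is the label, the rest are features
    tokens = [float(t) for t in line.split('\t')]
    return tokens[:-1], tokens[-1]

features, label = parse_line('45\t26\t71')
print(features, label)
```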
