<a href="https://colab.research.google.com/github/BrotherKim/Colab/blob/main/SEP592/BK_Breast_Data_PreProcessing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

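# JavaScript for keeping a Colab session alive

Google Colab <u>terminates a session after roughly 30 minutes of inactivity (idle)</u>, so the session is kept alive by periodically triggering a click event from Chrome's "Developer Tools (DevTools)".

1. In Chrome, press F12 to open the developer tools.
2. In the Console tab at the bottom of the developer tools, enter the following JavaScript to start the trigger: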
```javascript
function KeepClicking(){
  console.log("Clicking:  " + new Date().toString());
  document.querySelector("colab-connect-button").click();
}
var trigger = setInterval(KeepClicking, 60000);  // keep the handle so the timer can be stopped later
```

3. To stop the trigger, enter the following JavaScript:
```javascript
console.log("Stopped");
clearInterval(trigger);
```

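### When the session is terminated (disconnected) during training

* Google Colab supports <u>sessions of up to 12 hours</u>, so a session may be terminated if background training runs too long.
* Depending on usage traffic, a session <u>may occasionally be terminated</u> even before the maximum time is reached.
  * Excerpt from Google's FAQ: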
  ```
“Colaboratory is intended for interactive use. 
 Long-running background computations, particularly on GPUs, may be stopped.”
  ```

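# Mounting Google Drive

Re-authentication is required for every session.

*   Open the URL printed after running the code → grant access permission → copy the generated authorization code and enter it
*   Complete the authentication process (Google Cloud SDK, Google Drive File Stream)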
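
Because the mount has to be repeated in every session, it can help to check first whether the drive is already visible. The following is a minimal sketch, assuming that a mounted drive exposes a `MyDrive` folder under the mount point (as the paths in this notebook suggest); `drive_is_available` and `mount_drive_if_needed` are hypothetical helper names.

```python
import os

MOUNT_POINT = '/content/gdrive'  # mount point used in this notebook


def drive_is_available(mount_point=MOUNT_POINT):
    """Return True if a MyDrive folder is visible under the mount point."""
    return os.path.isdir(os.path.join(mount_point, 'MyDrive'))


def mount_drive_if_needed(mount_point=MOUNT_POINT):
    """Mount Google Drive only when it is not visible yet (works only in Colab)."""
    if drive_is_available(mount_point):
        return False  # nothing to do
    from google.colab import drive  # importable only inside Colab
    drive.mount(mount_point)
    return True
```

Calling `mount_drive_if_needed()` at the top of a session skips the authentication prompt when the drive is already mounted.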



In [None]:
# If using Google Cloud Storage
#from google.colab import auth
#auth.authenticate_user()

# If using Google Drive from a personal account
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [None]:
!cd "/content/gdrive/MyDrive/Colab_Notebooks/2021-1_SEP592_DeepPATH"; ls -al;

total 1171
drwx------ 2 root root   4096 May 18 04:21 CMap_output
-rw------- 1 root root  20409 May 23 11:50 CMap_output.tar.gz
drwx------ 2 root root   4096 May 18 04:21 HeatMap_output
-rw------- 1 root root    173 May 23 11:49 HeatMap_output.tar.gz
drwx------ 2 root root   4096 May 21 13:27 .ipynb_checkpoints
drwx------ 2 root root   4096 May 18 04:21 output_eval_test
-rw------- 1 root root 976715 May 23 11:48 output_eval_test.tar.gz
drwx------ 2 root root   4096 May 18 04:21 output_eval_valid
drwx------ 2 root root   4096 May 18 04:21 output_eval_valid_checkpoints
-rw------- 1 root root    505 May 23 11:39 output_eval_valid_checkpoints.tar.gz
-rw------- 1 root root    261 May 23 11:38 output_eval_valid.tar.gz
drwx------ 2 root root   4096 May 23 11:34 output_model_transfer
drwx------ 2 root root   4096 May 18 03:42 preproc_images
drwx------ 2 root root   4096 May 21 13:27 resnet_v1_50
drwx------ 2 root root   4096 May 18 04:21 ROC_output
-rw------- 1 root root 159023 May 23 11:54 RO

# Selecting the dataset to preprocess

The drive currently holds a lym set and a necrosis set covering 13 cancer types.
The data needed here are the directories listed in {ROOT}/training_data/*/train_list_brca.txt, so these are extracted.
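
As a rough illustration of this extraction step, the list file can also be read directly in Python. This is only a sketch: it assumes the entries in `train_list_brca.txt` are separated by whitespace (spaces or newlines), and `read_train_list` and `label_file_paths` are hypothetical helper names.

```python
import os


def read_train_list(list_path):
    """Return the directory names stored in a train_list file."""
    with open(list_path, encoding='utf-8') as f:
        # split() with no argument handles spaces, tabs and newlines alike
        return f.read().split()


def label_file_paths(root, list_name='train_list_brca.txt'):
    """Return the label.txt path of every directory named in the list file."""
    names = read_train_list(os.path.join(root, list_name))
    return [os.path.join(root, name, 'label.txt') for name in names]
```

For example, `label_file_paths(LYM_ROOT)` would give one `label.txt` path per listed batch directory.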

In [None]:
# Dataset storage paths taken from the Google Drive wiki
ROOT = '/content/gdrive/MyDrive/TCGA-BRCA/training_data'
LYM_ROOT = '%s/lym_cnn_training_data' % ROOT
NEC_ROOT = '%s/necrosis_cnn_training_data' % ROOT

In [None]:
!ls -l {LYM_ROOT}/train_list_brca.txt
!ls -l {NEC_ROOT}/train_list_brca.txt

-rw------- 1 root root 744 Oct  8  2017 /content/gdrive/MyDrive/TCGA-BRCA/training_data/lym_cnn_training_data/train_list_brca.txt
-rw------- 1 root root 13 Oct  8  2017 /content/gdrive/MyDrive/TCGA-BRCA/training_data/necrosis_cnn_training_data/train_list_brca.txt


In [None]:
# Helper that runs a shell command and returns its stdout as a string

import subprocess

def GetShellCmdStdOut(command):
  cmd = ['sh', '-c', command]
  fd_popen = subprocess.Popen(cmd, stdout=subprocess.PIPE).stdout 
  data = fd_popen.read().strip() 
  fd_popen.close()

  retval = data.decode('utf-8') 
  return retval
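
The helper above silently returns an empty string when the command fails. A possible variant, sketched here with the standard `subprocess.run` call, raises an error on a non-zero exit code instead; `run_shell` is a hypothetical name and not part of the original notebook.

```python
import subprocess


def run_shell(command):
    """Run a command through sh and return its stripped stdout as text.

    Raises subprocess.CalledProcessError if the command exits with a
    non-zero status, so failures are not mistaken for empty output.
    """
    result = subprocess.run(
        ['sh', '-c', command],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode('utf-8').strip()
```

With this variant, a mistyped path in a `cat` command surfaces as an exception rather than as a blank result.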

In [None]:
!cat {ROOT}/../README.txt

The files in this folder make up a single zip archive. Please run 
        cat training_data_multi.z* > training_data.zip 
to merge the files into a full zip file. To extract the contents of
the full zip file, please run
        unzip training_data.zip 
 

The training dataset is organized into two folders: 
    * lym_cnn_training_data contains the training data for the lymphocyte CNN. 
    * necrosis_cnn_training_data contains the training data for the necrosis CNN. 

Training Dataset for Lymphocyte CNN: 
------------------------------------

lym_cnn_training_data has several subfolders and files. Each subfolder has 
a set of png images. These images are patches extracted from whole slide 
tissue images and assigned a label. Each folder also contains a label.txt file. 
This file stores the labels of the images in the folder. The file is space 
separated; i.e., each column is separated by space.

The training data for the lymphocyte CNN was collected over time using different tools 
an

PATCH Name TCGA_Name tissue_X tissue_y predicted infiltration probability
0.png 0 TCGA-05-4396-01Z-00-DX1 35152 8062 0.264

1.png 0 TCGA-05-4396-01Z-00-DX1 38592 8062 0.244

2.png 0 TCGA-05-4396-01Z-00-DX1 41172 8277 0.280




The first column is the image patch file and the second column is the label. The other 
columns may vary across the label.txt files. In the above example, the third column is 
the image id of the TCGA whole slide tissue image, the fourth and fifth columns are the 
coordinates of the patch in the whole slide tissue image, and the last column is the 
predicted lymphocyte infiltration probability by a CNN model of the patch. The last 
column was used for debugging during development and should be ignored. 

The first two columns of the label.txt files should be used for training deep learning 
models. 

If the label column (the 2nd column) in a label.txt file contains a 0 or -1, the 
patch is considered a non-lymphocyte patch (i.e., a patch not containing lymphocytes 
or the number of lymphocytes in the patch is below the threshold). If the label 
column contains a 1, the patch is considered a lymphocyte patch (i.e., a patch
containing lymphocytes).


In [None]:
print(GetShellCmdStdOut('cat %s/../README.txt' % ROOT))  # pass the string itself; {ROOT} would be a Python set




In [None]:
necrosisRaw = GetShellCmdStdOut('cat %s/train_list_brca.txt' % NEC_ROOT)
print(necrosisRaw)

luad_batch_1


In [None]:
!cat /content/gdrive/MyDrive/TCGA-BRCA/training_data/lym_cnn_training_data/luad_batch_1/label.txt

0.png 1 TCGA-55-5899-01Z-00-DX1.faa65b08-150c-4c74-95aa-1e8743f0152c.svs
1.png 1 TCGA-55-5899-01Z-00-DX1.faa65b08-150c-4c74-95aa-1e8743f0152c.svs
2.png 1 TCGA-55-5899-01Z-00-DX1.faa65b08-150c-4c74-95aa-1e8743f0152c.svs
3.png 1 TCGA-55-5899-01Z-00-DX1.faa65b08-150c-4c74-95aa-1e8743f0152c.svs
4.png 1 TCGA-55-5899-01Z-00-DX1.faa65b08-150c-4c74-95aa-1e8743f0152c.svs
5.png 1 TCGA-55-5899-01Z-00-DX1.faa65b08-150c-4c74-95aa-1e8743f0152c.svs
6.png 1 TCGA-55-5899-01Z-00-DX1.faa65b08-150c-4c74-95aa-1e8743f0152c.svs
7.png 1 TCGA-55-5899-01Z-00-DX1.faa65b08-150c-4c74-95aa-1e8743f0152c.svs
8.png 1 TCGA-55-5899-01Z-00-DX1.faa65b08-150c-4c74-95aa-1e8743f0152c.svs
9.png 1 TCGA-55-5899-01Z-00-DX1.faa65b08-150c-4c74-95aa-1e8743f0152c.svs
10.png 1 TCGA-55-5899-01Z-00-DX1.faa65b08-150c-4c74-95aa-1e8743f0152c.svs
11.png 1 TCGA-55-5899-01Z-00-DX1.faa65b08-150c-4c74-95aa-1e8743f0152c.svs
12.png 1 TCGA-55-5899-01Z-00-DX1.faa65b08-150c-4c74-95aa-1e8743f0152c.svs
13.png 1 TCGA-55-5899-01Z-00-DX1.faa65b08-150c-4

In [None]:
from PIL import Image

def GenerateNewResolutionImage(w, h, srcPath, destPath):
  img = Image.open(srcPath)
  img_resize = img.resize((w, h))
  img_resize.save(destPath)

In [None]:
!mkdir -p /content/dataset/train_set/lym
!mkdir -p /content/dataset/test_set/lym
!mkdir -p /content/dataset/train_set/normal
!mkdir -p /content/dataset/test_set/normal

In [None]:
import random

lymSrcPathList = []
lymTgtFileList = []
nrmSrcPathList = []
nrmTgtFileList = []

lymRaw = GetShellCmdStdOut('cat %s/train_list_brca.txt' % LYM_ROOT)
ll = lymRaw.split()  # split on any whitespace (spaces or newlines)
for l in ll:
  labelRawList = GetShellCmdStdOut('cat %s/%s/label.txt' % (LYM_ROOT, l)).split('\n')
  #random.shuffle(labelRawList)  

  for labelRaw in labelRawList:
    labelInfo = labelRaw.split()
    if len(labelInfo) < 3:
      continue  # skip blank or malformed lines
    pngName = labelInfo[0]
    label = labelInfo[1]
    svsName = labelInfo[2]
    srcPath = '%s/%s/%s' % (LYM_ROOT, l, pngName)
    if label == '1':
      lymSrcPathList.append(srcPath)
      lymTgtFileList.append('%s_%s' % (svsName, pngName))
    else:
      nrmSrcPathList.append(srcPath)
      nrmTgtFileList.append('%s_%s' % (svsName, pngName))
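
The label rule from the README (a label of 1 marks a lymphocyte patch, 0 or -1 a non-lymphocyte patch) can be isolated in a small function. This is a sketch only; `parse_label_line` is a hypothetical helper, and treating the third column as the slide name follows the `label.txt` sample shown earlier.

```python
def parse_label_line(line):
    """Parse one label.txt line into (png_name, is_lymphocyte, slide_name).

    Returns None for blank or malformed lines. Only the first two columns
    are required; the slide name is empty when the third column is missing.
    """
    fields = line.split()
    if len(fields) < 2:
        return None
    png_name, label = fields[0], fields[1]
    slide_name = fields[2] if len(fields) > 2 else ''
    # Per the README: 1 is a lymphocyte patch, 0 or -1 is not.
    return png_name, label == '1', slide_name
```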

In [None]:
import random

lymShuffle = list(range(0, len(lymSrcPathList)))
nrmShuffle = list(range(0, len(nrmSrcPathList)))

random.shuffle(lymShuffle)
random.shuffle(nrmShuffle)

IMAGE_WIDTH = 224
IMAGE_HEIGHT = 224

LYM_DIV = round(len(lymShuffle) * 0.7)
NRM_DIV = round(len(nrmShuffle) * 0.7)

print('lym : %d / %d' % (LYM_DIV, len(lymShuffle)))
print('nrm : %d / %d' % (NRM_DIV, len(nrmShuffle)))

for idx, i in enumerate(lymShuffle[:LYM_DIV]):
  GenerateNewResolutionImage(IMAGE_WIDTH, IMAGE_HEIGHT, lymSrcPathList[i], '/content/dataset/train_set/lym/%s' % (lymTgtFileList[i]))
  print('%d / %d' % (idx, LYM_DIV))
for idx, i in enumerate(lymShuffle[LYM_DIV:]):
  GenerateNewResolutionImage(IMAGE_WIDTH, IMAGE_HEIGHT, lymSrcPathList[i], '/content/dataset/test_set/lym/%s' % (lymTgtFileList[i]))
  print('%d / %d' % (idx + LYM_DIV, len(lymShuffle)))
for idx, i in enumerate(nrmShuffle[:NRM_DIV]):
  GenerateNewResolutionImage(IMAGE_WIDTH, IMAGE_HEIGHT, nrmSrcPathList[i], '/content/dataset/train_set/normal/%s' % (nrmTgtFileList[i]))
  print('%d / %d' % (idx, NRM_DIV))
for idx, i in enumerate(nrmShuffle[NRM_DIV:]):
  GenerateNewResolutionImage(IMAGE_WIDTH, IMAGE_HEIGHT, nrmSrcPathList[i], '/content/dataset/test_set/normal/%s' % (nrmTgtFileList[i]))
  print('%d / %d' % (idx + NRM_DIV, len(nrmShuffle)))


Streaming output truncated to the last 5000 lines.
3015 / 26716
3016 / 26716
3017 / 26716
3018 / 26716
3019 / 26716
3020 / 26716
3021 / 26716
3022 / 26716
3023 / 26716
3024 / 26716
3025 / 26716
3026 / 26716
3027 / 26716
3028 / 26716
3029 / 26716
3030 / 26716
3031 / 26716
3032 / 26716
3033 / 26716
3034 / 26716
3035 / 26716
3036 / 26716
3037 / 26716
3038 / 26716
3039 / 26716
3040 / 26716
3041 / 26716
3042 / 26716
3043 / 26716
3044 / 26716
3045 / 26716
3046 / 26716
3047 / 26716
3048 / 26716
3049 / 26716
3050 / 26716
3051 / 26716
3052 / 26716
3053 / 26716
3054 / 26716
3055 / 26716
3056 / 26716
3057 / 26716
3058 / 26716
3059 / 26716
3060 / 26716
3061 / 26716
3062 / 26716
3063 / 26716
3064 / 26716
3065 / 26716
3066 / 26716
3067 / 26716
3068 / 26716
3069 / 26716
3070 / 26716
3071 / 26716
3072 / 26716
3073 / 26716
3074 / 26716
3075 / 26716
3076 / 26716
3077 / 26716
3078 / 26716
3079 / 26716
3080 / 26716
3081 / 26716
3082 / 26716
3083 / 26716
3084 / 26716
3085 / 26716
3086 / 26716
3087 / 26716
3

In [None]:
print('%d / %d' % (LYM_DIV, len(lymShuffle)))
print('%d / %d' % (NRM_DIV, len(nrmShuffle)))

4264 / 6091
18701 / 26716
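
The 70/30 split above depends on an unseeded `random.shuffle`, so it changes from run to run. A reproducible variant could look like the sketch below; the `split_indices` name is hypothetical, and the seed value 40 is an assumption borrowed from the `SEED_FIX` constant used later in this notebook.

```python
import random


def split_indices(n_items, train_ratio=0.7, seed=40):
    """Shuffle range(n_items) with a fixed seed and split it into train/test."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # private RNG, global state untouched
    cut = round(n_items * train_ratio)
    return indices[:cut], indices[cut:]
```

With 6091 lymphocyte patches and 26716 normal patches this yields the same train sizes as printed above (4264 and 18701), while the selected indices stay identical across sessions.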


In [None]:
!cp -r /content/dataset /content/gdrive/MyDrive/KAIST/datasets

In [None]:
!ls -l /content/gdrive/MyDrive/KAIST/datasets

total 8
drwx------ 4 root root 4096 May 30 05:32 test_set
drwx------ 4 root root 4096 May 30 05:29 train_set


# Converting the CSV back into a dataset

In [None]:
import subprocess

cmd = ['sh', '-c', 'cat /content/breast_label.csv']
fd_popen = subprocess.Popen(cmd, stdout=subprocess.PIPE).stdout 
data = fd_popen.read().strip() 
fd_popen.close()

arr = data.splitlines()
for d in arr:
  d = d.decode('utf-8').split(" ")
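  # The first column is the png name and the svs name is the third column (the second is the label); copy as <svs name>_<png name>
  # NOTE: the rows printed below are comma-separated, so split(" ") leaves each row as a single field
  print(d)

Streaming output truncated to the last 5000 lines.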
['930.png,0,TCGA-05-4396-01Z-00-DX1,18167,47623,0.208']
['931.png,0,TCGA-05-4396-01Z-00-DX1,19887,47623,0.267']
['932.png,0,TCGA-05-4396-01Z-00-DX1,19457,48053,0.350']
['933.png,0,TCGA-05-4396-01Z-00-DX1,19887,48053,0.312']
['934.png,0,TCGA-05-4396-01Z-00-DX1,18597,48268,0.241']
['935.png,0,TCGA-05-4396-01Z-00-DX1,19242,48268,0.316']
['936.png,0,TCGA-05-4396-01Z-00-DX1,20962,48483,0.305']
['937.png,0,TCGA-05-4396-01Z-00-DX1,20747,49343,0.368']
['3.png,0,TCGA-05-4397-01Z-00-DX1,61383,1612,0.232']
['7.png,0,TCGA-05-4397-01Z-00-DX1,58373,1827,0.259']
['11.png,0,TCGA-05-4397-01Z-00-DX1,62888,1827,0.531']
['15.png,0,TCGA-05-4397-01Z-00-DX1,55793,2042,0.243']
['19.png,0,TCGA-05-4397-01Z-00-DX1,56653,2042,0.215']
['23.png,0,TCGA-05-4397-01Z-00-DX1,60953,2042,0.222']
['27.png,0,TCGA-05-4397-01Z-00-DX1,63318,2042,0.370']
['31.png,0,TCGA-05-4397-01Z-00-DX1,62243,2257,0.438']
['35.png,0,TCGA-05-4397-01Z-00-DX1,73208,2257,0.312']
['39.png,0,TCGA-05

KeyboardInterrupt: ignored
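
Since the printed rows are comma-separated, the standard `csv` module is one way to split them into fields. The sketch below builds the `<svs name>_<png name>` target names described in the comment; `target_names` is a hypothetical helper, and the column order (png, label, slide) is inferred from the output above.

```python
import csv
import io


def target_names(csv_text):
    """Return '<slide>_<png>' names for every complete row of the CSV text."""
    names = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 3:
            continue  # skip blank or truncated rows
        png_name, _label, slide_name = row[0], row[1], row[2]
        names.append('%s_%s' % (slide_name, png_name))
    return names
```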

In [None]:
for row_index, row in fileMeta.iterrows():
    print(row_index)
    print(row.loc[0])
    print(row.loc[1])

0
0.png 1 TCGA-55-5899-01Z-00-DX1.faa65b08-150c-4c74-95aa-1e8743f0152c.svs


KeyError: ignored

In [None]:
#20x magnification
#jpeg_label_dir/TCGA-(disease name)/valid_TCGA_
#preproce/tcga.123123_files/20.0/

## DeepPath

https://github.com/ncoudray/DeepPATH/tree/master/DeepPATH_code

Instructions: clone this repo to your local machine using:

In [None]:
!git clone https://github.com/ncoudray/DeepPATH.git

Cloning into 'DeepPATH'...
remote: Enumerating objects: 2483, done.
remote: Counting objects: 100% (157/157), done.
remote: Compressing objects: 100% (105/105), done.
remote: Total 2483 (delta 69), reused 132 (delta 51), pack-reused 2326
Receiving objects: 100% (2483/2483), 10.74 MiB | 16.84 MiB/s, done.
Resolving deltas: 100% (1434/1434), done.


Major dependencies are:

* python 3.6.5
* tensorflow-gpu 1.9.0
* numpy 1.14.3
* matplotlib 2.1.2
* sklearn
* scipy 1.1.0
* openslide-python 1.1.1
* Pillow 5.1.0


In [None]:
!apt-get install openslide-tools

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following package was automatically installed and is no longer required:
  libnvidia-common-460
Use 'apt autoremove' to remove it.
The following additional packages will be installed:
  libopenslide0
Suggested packages:
  libtiff-tools
The following NEW packages will be installed:
  libopenslide0 openslide-tools
0 upgraded, 2 newly installed, 0 to remove and 34 not upgraded.
Need to get 92.5 kB of archives.
After this operation, 268 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 libopenslide0 amd64 3.4.1+dfsg-2 [79.8 kB]
Get:2 http://archive.ubuntu.com/ubuntu bionic/universe amd64 openslide-tools amd64 3.4.1+dfsg-2 [12.7 kB]
Fetched 92.5 kB in 0s (1,271 kB/s)
Selecting previously unselected package libopenslide0.
(Reading database ... 160706 files and directories currently installed.)
Preparing to unpack .../libopenslide0_3.4.1+dfsg

In [None]:


!apt-get install python3-openslide
!pip install openslide-python

!pip install pydicom==0.9.9
!pip install spams

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following package was automatically installed and is no longer required:
  libnvidia-common-460
Use 'apt autoremove' to remove it.
The following additional packages will be installed:
  javascript-common libjs-jquery python-asn1crypto python-blinker
  python-cffi-backend python-click python-colorama python-cryptography
  python-enum34 python-flask python-idna python-ipaddress python-itsdangerous
  python-jinja2 python-markupsafe python-openslide-examples python-openssl
  python-pkg-resources python-pyinotify python-simplejson python-six
  python-werkzeug python3-olefile python3-pil
Suggested packages:
  apache2 | lighttpd | httpd python-blinker-doc python-cryptography-doc
  python-cryptography-vectors python-enum34-doc python-flask-doc
  python-jinja2-doc python-openssl-doc python-openssl-dbg python-setuptools
  python-pyinotify-doc ipython python-genshi python-lxml python-greenlet
  pyt

In [None]:
# Check the current TensorFlow version
!pip list | grep tensorflow

tensorflow                    2.4.1         
tensorflow-datasets           4.0.1         
tensorflow-estimator          2.4.0         
tensorflow-gcs-config         2.4.0         
tensorflow-hub                0.12.0        
tensorflow-metadata           0.30.0        
tensorflow-probability        0.12.1        


In [None]:
#   >> Running DeepPath on TensorFlow v2 raises an error, so an older version is installed instead (AttributeError: module 'tensorflow' has no attribute 'app')
!pip uninstall -y tensorflow
!pip install tensorflow==1.13.1
!pip install tensorflow-gpu==1.13.1

#  Install older versions of the other modules as well
!pip install 'numpy<1.17'
!pip install scipy==1.2.0

Uninstalling tensorflow-2.4.1:
  Successfully uninstalled tensorflow-2.4.1
Collecting tensorflow==1.13.1
  Downloading https://files.pythonhosted.org/packages/d4/29/6b4f1e02417c3a1ccc85380f093556ffd0b35dc354078074c5195c8447f2/tensorflow-1.13.1-cp37-cp37m-manylinux1_x86_64.whl (92.6MB)
     |████████████████████████████████| 92.6MB 58kB/s 
Collecting tensorflow-estimator<1.14.0rc0,>=1.13.0
  Downloading https://files.pythonhosted.org/packages/bb/48/13f49fc3fa0fdf916aa1419013bb8f2ad09674c275b4046d5ee669a46873/tensorflow_estimator-1.13.0-py2.py3-none-any.whl (367kB)
     |████████████████████████████████| 368kB 47.3MB/s 
Collecting tensorboard<1.14.0,>=1.13.0
  Downloading https://files.pythonhosted.org/packages/0f/39/bdd75b08a6fba41f098b6cb091b9e8c7a80e1b4d679a581a0ccd17b10373/tensorboard-1.13.1-py3-none-any.whl (3.2MB)
     |████████████████████████████████| 3.2MB 46.0MB/s 
Collecting keras-applications>=1.0.6
  Downloading https://files.pythonhost

Collecting scipy==1.2.0
  Downloading https://files.pythonhosted.org/packages/80/39/066ecde98f373430bf7a39a02d91c7075b01ef4fc928456e8e31577342d6/scipy-1.2.0-cp37-cp37m-manylinux1_x86_64.whl (26.6MB)
     |████████████████████████████████| 26.6MB 1.2MB/s 
ERROR: albumentations 0.1.12 has requirement imgaug<0.2.7,>=0.2.5, but you'll have imgaug 0.2.9 which is incompatible.
Installing collected packages: scipy
  Found existing installation: scipy 1.4.1
    Uninstalling scipy-1.4.1:
      Successfully uninstalled scipy-1.4.1
Successfully installed scipy-1.2.0


In [None]:
# Fix the random seeds
SEED_FIX = 40

import tensorflow as tf
tf.set_random_seed(SEED_FIX)

import random
random.seed(SEED_FIX)

import numpy as np
np.random.seed(SEED_FIX)

  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
  np_resource = np.dtype([("resource", np.ubyte, 1)])


## 0 - Prepare the images.


### 0.1 Tile the svs slide images

`
python /path_to/0b_tileLoop_deepzoom4.py  -s 299 -e 0 -j 32 -B 25 -o <full_path_to_output_folder> "full_path_to_input_slides/*/*svs"  
`

In [None]:
#cd /content/gdrive/MyDrive/Colab_Notebooks/

In [None]:
!ls -l

total 12
drwxr-xr-x 4 root root 4096 May 24 05:39 DeepPATH
drwx------ 5 root root 4096 May 24 05:39 gdrive
drwxr-xr-x 1 root root 4096 May  6 13:44 sample_data


In [None]:
!ls -l "{ROOT}"

total 39
drwx------ 2 root root 4096 May 18 04:21 CMap_output
-rw------- 1 root root  170 May 21 12:48 CMap_output.tar.gz
drwx------ 2 root root 4096 May 18 04:21 HeatMap_output
-rw------- 1 root root  173 May 21 12:47 HeatMap_output.tar.gz
drwx------ 2 root root 4096 May 18 04:21 output_eval_test
-rw------- 1 root root  174 May 21 12:47 output_eval_test.tar.gz
drwx------ 2 root root 4096 May 18 04:21 output_eval_valid
drwx------ 2 root root 4096 May 18 04:21 output_eval_valid_checkpoints
-rw------- 1 root root  387 May 21 12:46 output_eval_valid_checkpoints.tar.gz
-rw------- 1 root root  175 May 21 12:46 output_eval_valid.tar.gz
drwx------ 2 root root 4096 May 18 04:21 output_model_transfer
drwx------ 7 root root 4096 May 18 03:42 preproc_images
drwx------ 2 root root 4096 May 21 13:27 resnet_v1_50
drwx------ 2 root root 4096 May 18 04:21 ROC_output
-rw------- 1 root root  170 May 21 12:48 ROC_output.tar.gz


In [None]:
import subprocess

cmd = ['sh', '-c', 'ls /content/gdrive/MyDrive/Team5']
fd_popen = subprocess.Popen(cmd, stdout=subprocess.PIPE).stdout 
data = fd_popen.read().strip() 
fd_popen.close()

arr = data.splitlines()
rawDataList = []
for d in arr:
  rawDataList.append(d)

# Only entries that are not in the list below (already processed) are used as input
import subprocess

cmd = ['sh', '-c', 'ls /content/gdrive/MyDrive/preproc_images | grep dzi']
fd_popen = subprocess.Popen(cmd, stdout=subprocess.PIPE).stdout 
data = fd_popen.read().strip() 
fd_popen.close()

arr = data.splitlines()
processedDataList = []
for d in arr:
  processedDataList.append(d[24:-4])

# Build a new list instead of aliasing rawDataList, so that dropping
# processed entries does not also modify rawDataList.
remainedRawDataList = [d for d in rawDataList if d not in processedDataList]


dl = rawDataList
for idx in range(len(dl)):
    print("[%d] : [%s]" % (idx, dl[idx]))

dl = processedDataList
for idx in range(len(dl)):
    print("[%d] : [%s]" % (idx, dl[idx]))

dl = remainedRawDataList
for idx in range(len(dl)):
    print("[%d] : [%s]" % (idx, dl[idx]))


[0] : [b'01aac488-6e2f-423b-8743-77967544a544']
[1] : [b'02b4f982-ba00-42db-ac23-ca8c325981ff']
[2] : [b'061aea9f-7b32-4bed-8511-02c2a196fa46']
[3] : [b'06ed56c1-d611-4743-9655-af68ffd4d949']
[4] : [b'0ac15491-7981-4c44-bc36-15997ad9c567']
[5] : [b'0b368773-1908-4785-b39e-d04b97a8faab']
[6] : [b'0bcadf49-6569-4d02-a259-ec645b2add5a']
[7] : [b'0c6592bb-9551-4c67-9fb2-9528c45917da']
[8] : [b'0d0ef874-bae8-4ec3-99ea-36f6a5eb57f9']
[9] : [b'0e1392fa-bb69-44d0-84c3-e000aaa40a3a']
[10] : [b'16249084-802b-4a82-8175-d8fa7d2ba2d6']
[11] : [b'17f19d02-019a-42a1-b4e2-b96961807928']
[12] : [b'1966abea-4cf2-4f39-9ee3-c1321e309e1a']
[13] : [b'1ac5a933-c755-48ba-8ce5-b3a85396633e']
[14] : [b'1b946b69-1314-4edf-a92f-40eee1b85c1a']
[15] : [b'1f10965c-025e-419d-a2ce-79f0e88b138a']
[16] : [b'235db291-fb91-4dfe-98e5-1240c78649d4']
[17] : [b'35c7e608-deb5-4cd2-950c-f985db0e53f5']
[18] : [b'3c855a77-6309-429d-9b9c-7bb69a5d82d7']
[19] : [b'3f3cfc1f-9152-4265-a142-2501b7a5f4a1']
[20] : [b'40888806-13ce-4801-b

In [None]:
# Commands that need to be run
PROCESS_CMD = 'python DeepPATH/DeepPATH_code/00_preprocessing/0b_tileLoop_deepzoom4.py -s 299 -e 0 -j 32 -B 25 -o "/content/gdrive/MyDrive/preproc_images" '

for d in remainedRawDataList:
  TARGET = '/content/gdrive/MyDrive/Team5/%s/*svs' % d.decode('utf-8') 
  print('!%s "%s"' % (PROCESS_CMD, TARGET))
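
When command lines are assembled from strings, quoting becomes fragile once a path contains spaces. A sketch of the same command construction with `shlex.quote` is shown here; `build_tile_command` and its parameter names are hypothetical, and the reading of the flags (`-s` tile size, `-e` overlap, `-j` number of jobs, `-B` background threshold) is an assumption based on how they are used in this notebook.

```python
import shlex

# Script path as used in this notebook
TILER = 'DeepPATH/DeepPATH_code/00_preprocessing/0b_tileLoop_deepzoom4.py'


def build_tile_command(slide_dir, output_dir, tile_size=299, overlap=0,
                       jobs=32, background=25):
    """Return a shell-safe command line for tiling the slides in slide_dir."""
    pattern = '%s/*svs' % slide_dir  # quoted, so the script receives the glob itself
    args = ['python', TILER,
            '-s', str(tile_size), '-e', str(overlap),
            '-j', str(jobs), '-B', str(background),
            '-o', output_dir, pattern]
    return ' '.join(shlex.quote(arg) for arg in args)
```

Each entry of `remainedRawDataList` could then be turned into a command with `build_tile_command('/content/gdrive/MyDrive/Team5/%s' % d.decode('utf-8'), '/content/gdrive/MyDrive/preproc_images')`.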


### 0.2a Sort the tiles into train/valid/test sets according to the classes defined

`
python /full_path_to/0d_SortTiles.py --SourceFolder=<tiled images path> --JsonFile=<JsonFilePath> --Magnification=<Magnification To copy>  --MagDiffAllowed=<Difference Allowed on Magnification> --SortingOption=<Sorting option> --PercentTest=15 --PercentValid=15 --PatientID=12 --nSplit 0
`

* --SortingOption 
  * 3 Sort according to type of cancer (LUSC, LUAD, or Normal Tissue)
  * 4 Sort according to type of cancer (LUSC, LUAD)

* (optional) outputtype: Type of output: list source/destination in a file (File), do symlink (Symlink, default) or both (Both)
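
To make the role of `--PercentTest`, `--PercentValid` and `--PatientID` more concrete, the sketch below splits slides at the patient level so that all slides of one patient land in the same set. It is an illustration only and not the code of `0d_SortTiles.py`: it assumes that `--PatientID=12` means the first 12 characters of a slide name identify the patient (for example `TCGA-05-4396`), and `split_by_patient` and the seed are hypothetical.

```python
import random


def split_by_patient(slide_names, percent_test=15, percent_valid=15,
                     id_length=12, seed=40):
    """Assign every slide to 'train', 'valid' or 'test' by its patient ID."""
    patients = sorted({name[:id_length] for name in slide_names})
    random.Random(seed).shuffle(patients)
    n_test = round(len(patients) * percent_test / 100)
    n_valid = round(len(patients) * percent_valid / 100)
    assignment = {}
    for position, patient in enumerate(patients):
        if position < n_test:
            assignment[patient] = 'test'
        elif position < n_test + n_valid:
            assignment[patient] = 'valid'
        else:
            assignment[patient] = 'train'
    # Every slide inherits the set of its patient.
    return {name: assignment[name[:id_length]] for name in slide_names}
```

Splitting by patient rather than by tile keeps tiles from the same patient out of both the training and the evaluation sets.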


In [None]:
# Delete the files produced by the sort step
#!rm -R Solid_Tissue_Normal/ TCGA-LUAD/ TCGA-LUSC/ img_list.txt log_sort_tiles.log jpeg_label_dir/

In [None]:
%%time
# Run the sort step on the tile images by class (LUAD, LUSC cancer types)
# (0d_SortTiles.py)
#   >> The console output is too large and causes an error here,
#      so the result is saved to a separate log file (IOPub data rate exceeded.)
!python /content/gdrive/MyDrive/Colab_Notebooks/DeepPATH/DeepPATH_code/00_preprocessing/0d_SortTiles.py \
        --SourceFolder="{ROOT}/preproc_images" \
        --JsonFile='/content/gdrive/MyDrive/Colab_Notebooks/DeepPATH/DeepPATH_code/example_TCGA_lung/metadata.cart.2017-03-02T00_36_30.276824.json' \
        --Magnification=20.0 \
        --MagDiffAllowed=0 \
        --SortingOption=3 \
        --outputtype=Both \
        --PercentTest=15 \
        --PercentValid=15 \
        --PatientID=12 \
        --nSplit 0 \
> log_sort_tiles.log

Traceback (most recent call last):
  File "/content/gdrive/MyDrive/Colab_Notebooks/DeepPATH/DeepPATH_code/00_preprocessing/0d_SortTiles.py", line 579, in <module>
    if max(AvailMags) < 0:
ValueError: max() arg is an empty sequence
CPU times: user 96.6 ms, sys: 23.4 ms, total: 120 ms
Wall time: 15.3 s


In [None]:
# List of image folders
TEMP_DIRS = ['Solid_Tissue_Normal/', 'TCGA-LUAD/', 'TCGA-LUSC/']
# TEMP_DIRS = ['jpeg_label_dir/Solid_Tissue_Normal/', 'jpeg_label_dir/TCGA-LUAD/', 'jpeg_label_dir/TCGA-LUSC/']

for d in TEMP_DIRS:
  # Count the data using Linux commands
  %env TEMP_DIR_COUNT={d}
  nTotal = !ls -l $TEMP_DIR_COUNT | awk '{print $9}' | grep '^[tv]' | wc -l
  nTrain = !ls -l $TEMP_DIR_COUNT | awk '{print $9}' | grep '^train_' | wc -l
  nValid = !ls -l $TEMP_DIR_COUNT | awk '{print $9}' | grep '^valid_' | wc -l
  nTest  = !ls -l $TEMP_DIR_COUNT | awk '{print $9}' | grep '^test_' | wc -l

  print(' - Total Num.= {}\n - Num. of Train= {}\n - Num. of Valid= {}\n - Num. of Test= {}\n'
          .format(nTotal[-1], nTrain[-1], nValid[-1], nTest[-1]))

env: TEMP_DIR_COUNT=Solid_Tissue_Normal/
 - Total Num.= 0
 - Num. of Train= 0
 - Num. of Valid= 0
 - Num. of Test= 0

env: TEMP_DIR_COUNT=TCGA-LUAD/
 - Total Num.= 2067
 - Num. of Train= 0
 - Num. of Valid= 1267
 - Num. of Test= 800

env: TEMP_DIR_COUNT=TCGA-LUSC/
 - Total Num.= 0
 - Num. of Train= 0
 - Num. of Valid= 0
 - Num. of Test= 0
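
The same tally can be computed without shell pipelines. The following sketch counts directory entries by their `train_`, `valid_` and `test_` prefixes, mirroring the `grep` patterns used above; `count_by_prefix` is a hypothetical helper, and a missing directory is reported as all zeros.

```python
import os


def count_by_prefix(directory, prefixes=('train_', 'valid_', 'test_')):
    """Count the entries of a directory that start with each prefix."""
    counts = {prefix: 0 for prefix in prefixes}
    if not os.path.isdir(directory):
        return counts  # a missing folder simply yields zeros
    for name in os.listdir(directory):
        for prefix in prefixes:
            if name.startswith(prefix):
                counts[prefix] += 1
                break
    return counts
```

For instance, `count_by_prefix('TCGA-LUAD/')` returns a dictionary with one count per prefix, and the total is the sum of its values.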



### 0.3a Convert the JPEG tiles into TFRecord format for 2 or 3 classes jobs

`
python build_image_data.py --directory='jpeg_label_directory' --output_directory='outputfolder' --train_shards=1024  --validation_shards=128 --num_threads=4
`

In [None]:
# Create the `jpeg_label_dir` folder, then move the images generated in the previous step into it
!mkdir -p 'jpeg_label_dir'
!mv Solid_Tissue_Normal/ TCGA-LUAD/ TCGA-LUSC/ 'jpeg_label_dir'

mv: cannot stat 'TCGA-LUSC/': No such file or directory


The jpeg must not be directly inside 'jpeg_label_directory' but in subfolders

(for example as `jpeg_label_directory/TCGA-LUAD/...jpeg` and `jpeg_label_directory/TCGA-LUSC/...jpeg`)
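
Before running the conversion it may be worth confirming that each label subfolder actually contains JPEG files. The sketch below is one way to do that; `jpeg_counts_per_label` is a hypothetical helper, and matching on the `.jpeg` and `.jpg` extensions is an assumption about how the tiles are named.

```python
import os


def jpeg_counts_per_label(label_root):
    """Return {label folder: number of JPEG files directly inside it}."""
    counts = {}
    for label in sorted(os.listdir(label_root)):
        label_dir = os.path.join(label_root, label)
        if not os.path.isdir(label_dir):
            continue  # files placed directly in label_root are not counted
        counts[label] = sum(
            1 for name in os.listdir(label_dir)
            if name.lower().endswith(('.jpeg', '.jpg'))
        )
    return counts
```

A label whose count is 0 points to a problem in the earlier sort step, since the converter would then find no images for that class.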

In [None]:
!ls jpeg_label_dir/

Solid_Tissue_Normal  TCGA-LUAD


In [None]:
#!zip jpeg_label_dir.zip -r jpeg_label_dir/*

In [None]:
%%time
# Folder for the TFRecord images (train)
!mkdir -p 'tf_images/train'

# Run the step that converts the images to TFRecord format for training
# (TFRecord_2or3_Classes/build_image_data.py)
!cd /content/gdrive/MyDrive/Colab_Notebooks/DeepPATH/DeepPATH_code/00_preprocessing/TFRecord_2or3_Classes/ \
    && python build_image_data.py \
            --directory='/content/jpeg_label_dir' \
            --output_directory='/content/tf_images/train' \
            --train_shards=1024  \
            --validation_shards=128 \
            --num_threads=4

Saving results to /content/tf_images/train
Determining list of input files and labels from /content/jpeg_label_dir.
unique_labels:
['Solid_Tissue_Normal', 'TCGA-LUAD']
Found 0 JPEG files across 2 labels inside /content/jpeg_label_dir.
DONE***********************************************************
train 0 0 0 1024 0
CPU times: user 33.1 ms, sys: 25.8 ms, total: 58.9 ms
Wall time: 3.55 s


Known bug: 

On many systems, it is better to always use --num_threads=1. Corrupted TFRecords can be generated when multi-threading is used.

In [None]:
%%time
# Folders for the TFRecord images (valid, test)
!mkdir -p 'tf_images/valid' 'tf_images/test'

# Run the conversion step for the remaining datasets (valid, test)
!cd /content/gdrive/MyDrive/Colab_Notebooks/DeepPATH/DeepPATH_code/00_preprocessing/TFRecord_2or3_Classes/ \
    && python build_TF_test.py \
            --directory='/content/gdrive/MyDrive/Colab_Notebooks/2021-1_SEP592_DeepPATH/jpeg_label_dir' \
            --output_directory='/content/gdrive/MyDrive/Colab_Notebooks/2021-1_SEP592_DeepPATH/test' \
            --num_threads=1 \
            --one_FT_per_Tile=False \
            --ImageSet_basename='test'

!cd /content/gdrive/MyDrive/Colab_Notebooks/DeepPATH/DeepPATH_code/00_preprocessing/TFRecord_2or3_Classes/ \
    && python build_TF_test.py \
            --directory='/content/gdrive/MyDrive/Colab_Notebooks/2021-1_SEP592_DeepPATH/jpeg_label_dir' \
            --output_directory='/content/gdrive/MyDrive/Colab_Notebooks/2021-1_SEP592_DeepPATH/tf_images/valid' \
            --num_threads=1 \
            --one_FT_per_Tile=False \
            --ImageSet_basename='valid'

Saving results to /content/gdrive/MyDrive/Colab_Notebooks/2021-1_SEP592_DeepPATH/test
test
Determining list of input files and labels from /content/gdrive/MyDrive/Colab_Notebooks/2021-1_SEP592_DeepPATH/jpeg_label_dir.
Traceback (most recent call last):
  File "build_TF_test.py", line 487, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "build_TF_test.py", line 482, in main
    FLAGS.train_shards)
  File "build_TF_test.py", line 463, in _process_dataset
    filenames, texts, labels = _find_image_files(name, directory)
  File "build_TF_test.py", line 409, in _find_image_files
    for item in os.listdir(data_dir):
FileNotFoundError: [Errno 2] No such file or directory: '/content/gdrive/MyDrive/Colab_Notebooks/2021-1_SEP592_DeepPATH/jpeg_label_dir'
Saving results to /content/gdrive/MyDrive/Colab_Notebooks/2021-1_SEP592_DeepPATH/tf_images/valid
valid
Determining list of input fi

In [None]:
# List of image folders
TEMP_DIRS = ['/content/gdrive/MyDrive/Colab_Notebooks/2021-1_SEP592_DeepPATH/tf_images/train', '/content/gdrive/MyDrive/Colab_Notebooks/2021-1_SEP592_DeepPATH/tf_images/valid', '/content/gdrive/MyDrive/Colab_Notebooks/2021-1_SEP592_DeepPATH/tf_images/test']

for d in TEMP_DIRS:
  # Count the data using Linux commands
  %env TEMP_DIR_COUNT={d}
  nTotal = !ls -l $TEMP_DIR_COUNT | awk '{print $9}' | grep '^[tv]' | wc -l
  nTrain = !ls -l $TEMP_DIR_COUNT | awk '{print $9}' | grep '^train' | wc -l
  nValid = !ls -l $TEMP_DIR_COUNT | awk '{print $9}' | grep '^valid' | wc -l
  nTest  = !ls -l $TEMP_DIR_COUNT | awk '{print $9}' | grep '^test' | wc -l

  print(' - Total Num.= {}\n - Num. of Train= {}\n - Num. of Valid= {}\n - Num. of Test= {}\n'
          .format(nTotal[-1], nTrain[-1], nValid[-1], nTest[-1]))

env: TEMP_DIR_COUNT=/content/gdrive/MyDrive/Colab_Notebooks/2021-1_SEP592_DeepPATH/tf_images/train
 - Total Num.= 0
 - Num. of Train= 0
 - Num. of Valid= 0
 - Num. of Test= 0

env: TEMP_DIR_COUNT=/content/gdrive/MyDrive/Colab_Notebooks/2021-1_SEP592_DeepPATH/tf_images/valid
 - Total Num.= 0
 - Num. of Train= 0
 - Num. of Valid= 0
 - Num. of Test= 0

env: TEMP_DIR_COUNT=/content/gdrive/MyDrive/Colab_Notebooks/2021-1_SEP592_DeepPATH/tf_images/test
 - Total Num.= 0
 - Num. of Train= 0
 - Num. of Valid= 0
 - Num. of Test= 0



In [None]:

!python DeepPATH/DeepPATH_code/00_preprocessing/0b_tileLoop_deepzoom4.py -s 299 -e 0 -j 32 -B 25 -o "/content/gdrive/MyDrive/preproc_images"  "/content/gdrive/MyDrive/Team5/6fa6c06e-2f7d-461a-877d-31185676e9bd/*svs"
!python DeepPATH/DeepPATH_code/00_preprocessing/0b_tileLoop_deepzoom4.py -s 299 -e 0 -j 32 -B 25 -o "/content/gdrive/MyDrive/preproc_images"  "/content/gdrive/MyDrive/Team5/73b89e11-c565-4f3b-bc59-7a631f6c480d/*svs"
!python DeepPATH/DeepPATH_code/00_preprocessing/0b_tileLoop_deepzoom4.py -s 299 -e 0 -j 32 -B 25 -o "/content/gdrive/MyDrive/preproc_images"  "/content/gdrive/MyDrive/Team5/765abc33-e39b-46cc-8426-78ba4033f3a2/*svs"
!python DeepPATH/DeepPATH_code/00_preprocessing/0b_tileLoop_deepzoom4.py -s 299 -e 0 -j 32 -B 25 -o "/content/gdrive/MyDrive/preproc_images"  "/content/gdrive/MyDrive/Team5/78ed11e6-2c6f-4264-8cff-fbdd5e82653c/*svs"
!python DeepPATH/DeepPATH_code/00_preprocessing/0b_tileLoop_deepzoom4.py -s 299 -e 0 -j 32 -B 25 -o "/content/gdrive/MyDrive/preproc_images"  "/content/gdrive/MyDrive/Team5/7a4bb271-3942-49f2-8b11-a328d33dad7d/*svs"
!python DeepPATH/DeepPATH_code/00_preprocessing/0b_tileLoop_deepzoom4.py -s 299 -e 0 -j 32 -B 25 -o "/content/gdrive/MyDrive/preproc_images"  "/content/gdrive/MyDrive/Team5/7aaf3a37-7b87-4693-b916-1c0c09eab834/*svs"
!python DeepPATH/DeepPATH_code/00_preprocessing/0b_tileLoop_deepzoom4.py -s 299 -e 0 -j 32 -B 25 -o "/content/gdrive/MyDrive/preproc_images"  "/content/gdrive/MyDrive/Team5/8022baa3-a434-4d90-9337-d83ef9eaabd4/*svs"
!python DeepPATH/DeepPATH_code/00_preprocessing/0b_tileLoop_deepzoom4.py -s 299 -e 0 -j 32 -B 25 -o "/content/gdrive/MyDrive/preproc_images"  "/content/gdrive/MyDrive/Team5/818a3efb-f566-4d51-be28-79439f90adff/*svs"
!python DeepPATH/DeepPATH_code/00_preprocessing/0b_tileLoop_deepzoom4.py -s 299 -e 0 -j 32 -B 25 -o "/content/gdrive/MyDrive/preproc_images"  "/content/gdrive/MyDrive/Team5/81d1260b-ff2f-45da-93af-e0a900a58e3f/*svs"
!python DeepPATH/DeepPATH_code/00_preprocessing/0b_tileLoop_deepzoom4.py -s 299 -e 0 -j 32 -B 25 -o "/content/gdrive/MyDrive/preproc_images"  "/content/gdrive/MyDrive/Team5/880d4396-72e5-474a-af12-740ea6c102c1/*svs"
!python DeepPATH/DeepPATH_code/00_preprocessing/0b_tileLoop_deepzoom4.py -s 299 -e 0 -j 32 -B 25 -o "/content/gdrive/MyDrive/preproc_images"  "/content/gdrive/MyDrive/Team5/88f06abf-481b-4738-9539-97ffc6baffc8/*svs"
!python DeepPATH/DeepPATH_code/00_preprocessing/0b_tileLoop_deepzoom4.py -s 299 -e 0 -j 32 -B 25 -o "/content/gdrive/MyDrive/preproc_images"  "/content/gdrive/MyDrive/Team5/8afa0b40-74e0-4554-a978-4de597879677/*svs"
!python DeepPATH/DeepPATH_code/00_preprocessing/0b_tileLoop_deepzoom4.py -s 299 -e 0 -j 32 -B 25 -o "/content/gdrive/MyDrive/preproc_images"  "/content/gdrive/MyDrive/Team5/8b5c1d33-75fd-4450-8b5a-bbbf1460c488/*svs"
!python DeepPATH/DeepPATH_code/00_preprocessing/0b_tileLoop_deepzoom4.py -s 299 -e 0 -j 32 -B 25 -o "/content/gdrive/MyDrive/preproc_images"  "/content/gdrive/MyDrive/Team5/9249b379-05e3-404f-a3ff-77f2fb852622/*svs"
!python DeepPATH/DeepPATH_code/00_preprocessing/0b_tileLoop_deepzoom4.py -s 299 -e 0 -j 32 -B 25 -o "/content/gdrive/MyDrive/preproc_images"  "/content/gdrive/MyDrive/Team5/92c09b68-5187-42e0-a418-9647a54e202e/*svs"
!python DeepPATH/DeepPATH_code/00_preprocessing/0b_tileLoop_deepzoom4.py -s 299 -e 0 -j 32 -B 25 -o "/content/gdrive/MyDrive/preproc_images"  "/content/gdrive/MyDrive/Team5/95057780-94f2-455d-9e4e-d1fb249d300f/*svs"
!python DeepPATH/DeepPATH_code/00_preprocessing/0b_tileLoop_deepzoom4.py -s 299 -e 0 -j 32 -B 25 -o "/content/gdrive/MyDrive/preproc_images"  "/content/gdrive/MyDrive/Team5/99f1c0e3-0112-4c86-9e08-aa6d49ca48f2/*svs"
!python DeepPATH/DeepPATH_code/00_preprocessing/0b_tileLoop_deepzoom4.py -s 299 -e 0 -j 32 -B 25 -o "/content/gdrive/MyDrive/preproc_images"  "/content/gdrive/MyDrive/Team5/9f2a5ce9-7ec9-45e0-946e-1aaab4f40de2/*svs"
!python DeepPATH/DeepPATH_code/00_preprocessing/0b_tileLoop_deepzoom4.py -s 299 -e 0 -j 32 -B 25 -o "/content/gdrive/MyDrive/preproc_images"  "/content/gdrive/MyDrive/Team5/a08bd1a7-2581-46b8-ba61-2d4a7755891d/*svs"
!python DeepPATH/DeepPATH_code/00_preprocessing/0b_tileLoop_deepzoom4.py -s 299 -e 0 -j 32 -B 25 -o "/content/gdrive/MyDrive/preproc_images"  "/content/gdrive/MyDrive/Team5/a11a7657-453b-45f9-90f1-b66adcfdf746/*svs"
!python DeepPATH/DeepPATH_code/00_preprocessing/0b_tileLoop_deepzoom4.py -s 299 -e 0 -j 32 -B 25 -o "/content/gdrive/MyDrive/preproc_images"  "/content/gdrive/MyDrive/Team5/a523353a-9a3f-47b2-8a81-bbccea8b2062/*svs"
!python DeepPATH/DeepPATH_code/00_preprocessing/0b_tileLoop_deepzoom4.py -s 299 -e 0 -j 32 -B 25 -o "/content/gdrive/MyDrive/preproc_images"  "/content/gdrive/MyDrive/Team5/a686ba32-368e-4eec-b41c-058cefa271fe/*svs"
!python DeepPATH/DeepPATH_code/00_preprocessing/0b_tileLoop_deepzoom4.py -s 299 -e 0 -j 32 -B 25 -o "/content/gdrive/MyDrive/preproc_images"  "/content/gdrive/MyDrive/Team5/a7756c44-2e66-4abe-b104-fbe5a9e77989/*svs"
!python DeepPATH/DeepPATH_code/00_preprocessing/0b_tileLoop_deepzoom4.py -s 299 -e 0 -j 32 -B 25 -o "/content/gdrive/MyDrive/preproc_images"  "/content/gdrive/MyDrive/Team5/a8af4ed5-c6d6-47f8-9f1c-3d47e4e98e28/*svs"
!python DeepPATH/DeepPATH_code/00_preprocessing/0b_tileLoop_deepzoom4.py -s 299 -e 0 -j 32 -B 25 -o "/content/gdrive/MyDrive/preproc_images"  "/content/gdrive/MyDrive/Team5/a8c5c7b4-8036-446d-a5b1-3365dadb65f2/*svs"
!python DeepPATH/DeepPATH_code/00_preprocessing/0b_tileLoop_deepzoom4.py -s 299 -e 0 -j 32 -B 25 -o "/content/gdrive/MyDrive/preproc_images"  "/content/gdrive/MyDrive/Team5/af9d91eb-71ca-44b5-9838-5a5d6baa8ad1/*svs"
!python DeepPATH/DeepPATH_code/00_preprocessing/0b_tileLoop_deepzoom4.py -s 299 -e 0 -j 32 -B 25 -o "/content/gdrive/MyDrive/preproc_images"  "/content/gdrive/MyDrive/Team5/b051aa5c-f9d9-4ef4-ba8f-d09df992b832/*svs"
!python DeepPATH/DeepPATH_code/00_preprocessing/0b_tileLoop_deepzoom4.py -s 299 -e 0 -j 32 -B 25 -o "/content/gdrive/MyDrive/preproc_images"  "/content/gdrive/MyDrive/Team5/b1abafd0-252e-46b2-86ca-f1f9702e6a1b/*svs"
!python DeepPATH/DeepPATH_code/00_preprocessing/0b_tileLoop_deepzoom4.py -s 299 -e 0 -j 32 -B 25 -o "/content/gdrive/MyDrive/preproc_images"  "/content/gdrive/MyDrive/Team5/b4dcd0a6-41dc-4d1d-b44e-79e2dac438c8/*svs"
!python DeepPATH/DeepPATH_code/00_preprocessing/0b_tileLoop_deepzoom4.py -s 299 -e 0 -j 32 -B 25 -o "/content/gdrive/MyDrive/preproc_images"  "/content/gdrive/MyDrive/Team5/b7afa49a-6bcf-4d38-aefe-5b94782e30de/*svs"
!python DeepPATH/DeepPATH_code/00_preprocessing/0b_tileLoop_deepzoom4.py -s 299 -e 0 -j 32 -B 25 -o "/content/gdrive/MyDrive/preproc_images"  "/content/gdrive/MyDrive/Team5/b9d97500-35c4-4263-9a44-c704e93b61fb/*svs"
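
The cell above repeats one tiling command per slide directory. As a minimal sketch, the same commands could be generated from a list of directory names. The names `TILER`, `build_tile_commands`, `slide_dirs` and `root` are hypothetical and not part of DeepPATH; the flags are copied unchanged from the cell above.

```python
import shlex

# Script path and flags copied verbatim from the cell above.
TILER = "DeepPATH/DeepPATH_code/00_preprocessing/0b_tileLoop_deepzoom4.py"
FLAGS = ["-s", "299", "-e", "0", "-j", "32", "-B", "25"]


def build_tile_commands(slide_dirs, out_dir, root="/content/gdrive/MyDrive/Team5"):
    """Return one argv list per slide directory (hypothetical helper)."""
    commands = []
    for slide_dir in slide_dirs:
        # The "*svs" pattern is passed literally, as in the quoted shell form.
        pattern = f"{root}/{slide_dir}/*svs"
        commands.append(["python", TILER, *FLAGS, "-o", out_dir, pattern])
    return commands


# Only two of the directories are listed here, for illustration.
cmds = build_tile_commands(
    [
        "6fa6c06e-2f7d-461a-877d-31185676e9bd",
        "73b89e11-c565-4f3b-bc59-7a631f6c480d",
    ],
    "/content/gdrive/MyDrive/preproc_images",
)

for argv in cmds:
    print(shlex.join(argv))
    # To actually run a command: subprocess.run(argv, check=True)
```

Passing an argv list to `subprocess.run` bypasses the shell, so the glob pattern reaches the script unexpanded, just as the double quotes ensure in the `!python` form.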

# Data augmentation

In [None]:
#!rm -rf /content/gdrive/MyDrive/KAIST/dataset_augmentation/

In [None]:
!cp -rf /content/gdrive/MyDrive/KAIST/dataset/ /content/gdrive/MyDrive/KAIST/dataset_augmentation/

^C


In [None]:
!find /content/gdrive/MyDrive/KAIST/dataset/train_set/lym -type f | wc -l

5006


In [None]:
!find /content/gdrive/MyDrive/KAIST/dataset/train_set/normal -type f | wc -l

18527


In [None]:
!find /content/gdrive/MyDrive/KAIST/dataset/test_set/lym -type f | wc -l

1826


In [None]:
!find /content/gdrive/MyDrive/KAIST/dataset/test_set/normal -type f | wc -l

7990


In [None]:
!find /content/gdrive/MyDrive/KAIST/dataset_augmentation/train_set/lym -type f | wc -l

5006


In [None]:
!find /content/gdrive/MyDrive/KAIST/dataset_augmentation/train_set/normal -type f | wc -l

18527


In [None]:
!find /content/gdrive/MyDrive/KAIST/dataset_augmentation/test_set/lym -type f | wc -l

1826


In [None]:
!find /content/gdrive/MyDrive/KAIST/dataset_augmentation/test_set/normal -type f | wc -l

6298
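
The counts above show 6298 files in `dataset_augmentation/test_set/normal` against 7990 in the source, which is consistent with the `cp` having been interrupted (`^C`). The following is a minimal sketch for checking a copy in one pass instead of running `find` per directory. The helpers `count_files` and `compare_trees` are hypothetical names, and the demo runs on a temporary directory rather than the Drive paths.

```python
import os
import tempfile


def count_files(root):
    """Count files under root, roughly like `find root -type f | wc -l`."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
    return total


def compare_trees(src, dst, subdirs):
    """Return {subdir: (src_count, dst_count)} for subdirs whose counts differ."""
    mismatches = {}
    for sub in subdirs:
        src_count = count_files(os.path.join(src, sub))
        dst_count = count_files(os.path.join(dst, sub))
        if src_count != dst_count:
            mismatches[sub] = (src_count, dst_count)
    return mismatches


# Demo on a throwaway tree: the copy is missing one file.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "dataset")
    dst = os.path.join(tmp, "dataset_augmentation")
    for root, n_files in ((src, 3), (dst, 2)):
        leaf = os.path.join(root, "test_set", "normal")
        os.makedirs(leaf)
        for i in range(n_files):
            open(os.path.join(leaf, f"{i}.png"), "w").close()
    demo_mismatches = compare_trees(src, dst, ["test_set/normal"])

print(demo_mismatches)
```

In the notebook, `src` and `dst` would be the `dataset` and `dataset_augmentation` directories, with `subdirs` set to the four `train_set`/`test_set` class folders.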


In [None]:
# Helper that runs a shell command and returns its standard output as a string

import subprocess

def GetShellCmdStdOut(command):
  # Run the command through sh and capture its stdout
  cmd = ['sh', '-c', command]
  completed = subprocess.run(cmd, stdout=subprocess.PIPE)

  # Decode the captured bytes and strip surrounding whitespace
  retval = completed.stdout.decode('utf-8').strip()
  return retval

In [None]:
!mkdir -p /content/gdrive/MyDrive/KAIST/lym_augmentation/

In [None]:

from keras.preprocessing import image

augLymListRaw = GetShellCmdStdOut('ls /content/gdrive/MyDrive/KAIST/dataset_augmentation/train_set/lym')
# Split the listing into file names, skipping empty entries
augLymList = [name for name in augLymListRaw.split('\n') if name]
for augLym in augLymList:
  augTarget = '%s/%s' % ('/content/gdrive/MyDrive/KAIST/dataset_augmentation/train_set/lym', augLym)
  #print(augTarget)
  img = image.load_img(augTarget)
  x = image.img_to_array(img)
  x = x.reshape((1,) + x.shape)
  i = 0
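  # flow() yields batches indefinitely, so count the iterations and break out once the desired number is reached.
  # NOTE: imageGenerator is assumed to be a Keras ImageDataGenerator instance created in an earlier cell.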
  for batch in imageGenerator.flow(x, batch_size=1, save_to_dir='/content/gdrive/MyDrive/KAIST/lym_augmentation', save_prefix='aug', save_format='png'):
      i += 1
      if i > 2: 
          break