# TRACER: Extreme Attention Guided Salient Object Tracing Network

This paper was accepted at the AAAI 2022 SA poster session. [[pdf]](https://arxiv.org/abs/2112.07380)  

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/tracer-extreme-attention-guided-salient/salient-object-detection-on-duts-te)](https://paperswithcode.com/sota/salient-object-detection-on-duts-te?p=tracer-extreme-attention-guided-salient)  
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/tracer-extreme-attention-guided-salient/salient-object-detection-on-dut-omron)](https://paperswithcode.com/sota/salient-object-detection-on-dut-omron?p=tracer-extreme-attention-guided-salient)  
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/tracer-extreme-attention-guided-salient/salient-object-detection-on-hku-is)](https://paperswithcode.com/sota/salient-object-detection-on-hku-is?p=tracer-extreme-attention-guided-salient)  
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/tracer-extreme-attention-guided-salient/salient-object-detection-on-ecssd)](https://paperswithcode.com/sota/salient-object-detection-on-ecssd?p=tracer-extreme-attention-guided-salient)  
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/tracer-extreme-attention-guided-salient/salient-object-detection-on-pascal-s)](https://paperswithcode.com/sota/salient-object-detection-on-pascal-s?p=tracer-extreme-attention-guided-salient)

## Configurations
--arch: EfficientNet backbone scale, TE0 to TE7.  
--frequency_radius: High-pass filter radius in the MEAM.  
--gamma: Channel confidence ratio \gamma in the UAM.  
--denoise: Denoising ratio d in the OAM.  
--RFB_aggregated_channel: Number of channels in the receptive field blocks.  
--multi_gpu: Whether to use multiple GPUs.  
--img_size: Input image resolution.  
--save_map: Whether to save the predicted mask.  
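
As a rough illustration of how the options above fit together, the sketch below declares them with `argparse`. Only the flag names come from the list; the types, the `nargs` setting, the use of `store_true` for the two boolean flags, and the default values (copied from the demo run printed later in this notebook) are assumptions, and `build_parser` is a hypothetical name rather than part of the TRACER code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Illustrative parser only: flag names follow the list above; types,
    # nargs and defaults are assumptions made for this sketch.
    p = argparse.ArgumentParser(description="TRACER options (sketch)")
    p.add_argument("--arch", type=str, default="7")
    p.add_argument("--frequency_radius", type=int, default=16)
    p.add_argument("--gamma", type=float, default=0.1)
    p.add_argument("--denoise", type=float, default=0.93)
    p.add_argument("--RFB_aggregated_channel", type=int, nargs="+",
                   default=[32, 64, 128])
    p.add_argument("--multi_gpu", action="store_true")
    p.add_argument("--img_size", type=int, default=640)
    p.add_argument("--save_map", action="store_true")
    return p


# Parse an explicit argument list so the example does not read sys.argv.
args = build_parser().parse_args(["--arch", "5", "--img_size", "512", "--save_map"])
print(args.arch, args.img_size, args.save_map)
```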

<table>
<thead>
  <tr>
    <th>Model</th>
    <th>Img size</th>
  </tr>
</thead>
<tbody>
    <tr>
        <td>TRACER-Efficient-0 ~ 1</td>
        <td>320</td>
    </tr>
    <tr>
        <td>TRACER-Efficient-2</td>
        <td>352</td>
    </tr>
    <tr>
        <td>TRACER-Efficient-3</td>
        <td>384</td>
    </tr>
    <tr>
        <td>TRACER-Efficient-4</td>
        <td>448</td>
    </tr>
    <tr>
        <td>TRACER-Efficient-5</td>
        <td>512</td>
    </tr>
    <tr>
        <td>TRACER-Efficient-6</td>
        <td>576</td>
    </tr>
    <tr>
        <td>TRACER-Efficient-7</td>
        <td>640</td>
    </tr>
</tbody>
</table>
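
The table above can be written as a small lookup so that `--img_size` stays consistent with `--arch`. This is a minimal sketch: `IMG_SIZE_BY_ARCH` and `img_size_for` are hypothetical names that do not appear in the TRACER code, and the keys assume the backbone scale is passed as a single digit.

```python
# Hypothetical helper: input resolution for each TRACER-Efficient scale,
# copied from the table above (keys are the digits passed to --arch).
IMG_SIZE_BY_ARCH = {
    "0": 320, "1": 320, "2": 352, "3": 384,
    "4": 448, "5": 512, "6": 576, "7": 640,
}


def img_size_for(arch) -> int:
    """Return the input resolution for a backbone scale such as '7'."""
    try:
        return IMG_SIZE_BY_ARCH[str(arch)]
    except KeyError:
        raise ValueError(f"unknown arch {arch!r}; expected 0 to 7") from None


print(img_size_for("7"))
```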

# Note that the demo should be run in a GPU session.

In [None]:
cd /content

/content


In [1]:
!git clone https://github.com/Karel911/TRACER.git

Cloning into 'TRACER'...
remote: Enumerating objects: 185, done.
remote: Counting objects: 100% (95/95), done.
remote: Compressing objects: 100% (55/55), done.
remote: Total 185 (delta 70), reused 40 (delta 40), pack-reused 90
Receiving objects: 100% (185/185), 9.89 MiB | 6.75 MiB/s, done.
Resolving deltas: 100% (82/82), done.


In [2]:
import os
import shutil
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from google.colab import files


os.makedirs('/content/TRACER/data/custom_dataset/', exist_ok=True)
filename = list(files.upload().keys())[0]
os.rename(filename,'/content/TRACER/data/custom_dataset/'+filename)

Saving don_1.png to don_1.png


In [3]:
cd TRACER

/content/TRACER


In [None]:
cd ..

/


In [4]:
!bash demo_run.sh

<---- Training Params ---->
Namespace(action='inference', exp_num=0, dataset='custom_dataset/', data_path='data/', arch='7', channels=[24, 40, 112, 320], RFB_aggregated_channel=[32, 64, 128], frequency_radius=16, denoise=0.93, gamma=0.1, img_size=640, batch_size=32, epochs=100, lr=5e-05, optimizer='Adam', weight_decay=0.0001, criterion='API', scheduler='Reduce', aug_ver=2, lr_factor=0.1, clipping=2, patience=5, model_path='results/', seed=42, save_map=True, multi_gpu=True, num_workers=4)
<----- Initializing inference mode ----->
Downloading: "https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/adv-efficientnet-b7-4652b6dd.pth" to /root/.cache/torch/hub/checkpoints/adv-efficientnet-b7-4652b6dd.pth
100% 255M/255M [00:00<00:00, 298MB/s]
Loaded pretrained weights for efficientnet-b7
Downloading: "https://github.com/Karel911/TRACER/releases/download/v1.0/TRACER-Efficient-7.pth" to /root/.cache/torch/hub/checkpoints/TRACER-Efficient-7.pth
100% 255M/255M [00:01<00:00, 191M

In [5]:
# Load the input image and the mask and object images produced by the demo run
pure_file_name = os.path.splitext(os.path.basename(filename))[0]
original_img = mpimg.imread('/content/TRACER/data/custom_dataset/' + filename)
result_alpha = mpimg.imread('/content/TRACER/mask/custom_dataset/' + pure_file_name + '.png')
result_color = mpimg.imread('/content/TRACER/object/custom_dataset/' + pure_file_name + '.png')

In [6]:
cd /content

/content


In [7]:
from google.colab import files

uploaded = files.upload()

Saving alignment.py to alignment.py


In [8]:
import cv2
from google.colab.patches import cv2_imshow

In [9]:
!pip install ocrd-fork-pylsd

Collecting ocrd-fork-pylsd
  Downloading ocrd_fork_pylsd-0.0.8-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (56 kB)
Installing collected packages: ocrd-fork-pylsd
Successfully installed ocrd-fork-pylsd-0.0.8


In [10]:
from alignment import translate, rotate, resize, order_points, four_point_transform, DocScanner

In [11]:
ori = cv2.imread('/content/TRACER/data/custom_dataset/' + filename, cv2.IMREAD_UNCHANGED)
# Saliency mask predicted by TRACER for the same image
image = cv2.imread('/content/TRACER/mask/custom_dataset/' + pure_file_name + '.png', cv2.IMREAD_UNCHANGED)
docscanner = DocScanner()
# Find the document contour on the mask, then rectify the original image with it
screenCnt, edges = docscanner.get_contour(image, '/content')
warped = four_point_transform(ori, screenCnt)
(h, w, d) = warped.shape
print("width={}, height={}, depth={}".format(w, h, d))
# Fall back to the original image when the rectified crop is implausibly small
if h < 600 or w < 450:
  warped = ori
# cv2.imwrite expects BGR, which is what cv2.imread returned, so no channel swap is needed
cv2.imwrite('/content/result.png', warped)

width=567, height=683, depth=3


True
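
`alignment.py` is uploaded separately and is not shown in this notebook, so the exact behaviour of `order_points` and `four_point_transform` is unknown here. The sketch below shows one common way to do the geometric part of such a step in plain Python: order the four corners and derive the size of the rectified image. The names `order_corners` and `warped_size` and the sample quadrilateral are hypothetical.

```python
import math


def order_corners(pts):
    """Order four (x, y) points as top-left, top-right, bottom-right, bottom-left."""
    tl = min(pts, key=lambda p: p[0] + p[1])  # smallest x + y
    br = max(pts, key=lambda p: p[0] + p[1])  # largest x + y
    tr = min(pts, key=lambda p: p[1] - p[0])  # smallest y - x
    bl = max(pts, key=lambda p: p[1] - p[0])  # largest y - x
    return [tl, tr, br, bl]


def warped_size(pts):
    """Target (width, height): the longer of each pair of opposite edges."""
    tl, tr, br, bl = order_corners(pts)
    width = max(math.dist(br, bl), math.dist(tr, tl))
    height = max(math.dist(tr, br), math.dist(tl, bl))
    return int(width), int(height)


# Made-up document corners in arbitrary order
quad = [(410, 30), (20, 40), (30, 600), (420, 590)]
print(order_corners(quad))
print(warped_size(quad))
```

With the thresholds used in the cell above (`h < 600 or w < 450`), a quadrilateral this small would be rejected and the original image kept.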

In [12]:
!pip install paddleocr
!pip install paddlepaddle
!pip uninstall -y Pillow
!pip install Pillow==9.5.0

Collecting paddleocr
  Downloading paddleocr-2.7.0.3-py3-none-any.whl (465 kB)
Collecting pyclipper (from paddleocr)
  Downloading pyclipper-1.3.0.post5-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (908 kB)
Collecting lmdb (from paddleocr)
  Downloading lmdb-1.4.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (299 kB)
Collecting visualdl (fro

Collecting paddlepaddle
  Downloading paddlepaddle-2.6.0-cp310-cp310-manylinux1_x86_64.whl (125.7 MB)
Collecting httpx (from paddlepaddle)
  Downloading httpx-0.27.0-py3-none-any.whl (75 kB)
Collecting astor (from paddlepaddle)
  Downloading astor-0.8.1-py2.py3-none-any.whl (27 kB)
Collecting httpcore==1.* (from httpx->paddlepaddle)
  Downloading httpcore-1.0.4-py3-none-any.whl (77 kB)
Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx->paddlepaddle)
  Downloading h11-0.14.0-py3-none-any.whl (58 kB)
In

In [13]:
from paddleocr import PaddleOCR, draw_ocr
# Download and extract the pre-trained English detection model
!wget https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar
!tar -xf en_PP-OCRv3_det_infer.tar
# PaddleOCR() with default arguments downloads its own default models (see the log below)
ocr = PaddleOCR()

--2024-03-14 16:45:58--  https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar
Resolving paddleocr.bj.bcebos.com (paddleocr.bj.bcebos.com)... 103.235.46.61, 2409:8c04:1001:1203:0:ff:b0bb:4f27
Connecting to paddleocr.bj.bcebos.com (paddleocr.bj.bcebos.com)|103.235.46.61|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4003840 (3.8M) [application/x-tar]
Saving to: ‘en_PP-OCRv3_det_infer.tar’


2024-03-14 16:46:00 (4.40 MB/s) - ‘en_PP-OCRv3_det_infer.tar’ saved [4003840/4003840]

download https://paddleocr.bj.bcebos.com/PP-OCRv4/chinese/ch_PP-OCRv4_det_infer.tar to /root/.paddleocr/whl/det/ch/ch_PP-OCRv4_det_infer/ch_PP-OCRv4_det_infer.tar


100%|██████████| 4.89M/4.89M [00:05<00:00, 903kiB/s] 


download https://paddleocr.bj.bcebos.com/PP-OCRv4/chinese/ch_PP-OCRv4_rec_infer.tar to /root/.paddleocr/whl/rec/ch/ch_PP-OCRv4_rec_infer/ch_PP-OCRv4_rec_infer.tar


100%|██████████| 11.0M/11.0M [00:00<00:00, 11.1MiB/s]


download https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar to /root/.paddleocr/whl/cls/ch_ppocr_mobile_v2.0_cls_infer/ch_ppocr_mobile_v2.0_cls_infer.tar


100%|██████████| 2.19M/2.19M [00:01<00:00, 1.86MiB/s]

[2024/03/14 16:46:09] ppocr DEBUG: Namespace(help='==SUPPRESS==', use_gpu=False, use_xpu=False, use_npu=False, ir_optim=True, use_tensorrt=False, min_subgraph_size=15, precision='fp32', gpu_mem=500, gpu_id=0, image_dir=None, page_num=0, det_algorithm='DB', det_model_dir='/root/.paddleocr/whl/det/ch/ch_PP-OCRv4_det_infer', det_limit_side_len=960, det_limit_type='max', det_box_type='quad', det_db_thresh=0.3, det_db_box_thresh=0.6, det_db_unclip_ratio=1.5, max_batch_size=10, use_dilation=False, det_db_score_mode='fast', det_east_score_thresh=0.8, det_east_cover_thresh=0.1, det_east_nms_thresh=0.2, det_sast_score_thresh=0.5, det_sast_nms_thresh=0.2, det_pse_thresh=0, det_pse_box_thresh=0.85, det_pse_min_area=16, det_pse_scale=1, scales=[8, 16, 32], alpha=1.0, beta=1.0, fourier_degree=5, rec_algorithm='SVTR_LCNet', rec_model_dir='/root/.paddleocr/whl/rec/ch/ch_PP-OCRv4_rec_infer', rec_image_inverse=True, rec_image_shape='3, 48, 320', rec_batch_num=6, max_text_length=25, rec_char_dict_path='




In [14]:
import matplotlib.pyplot as plt
import cv2
import numpy as np

# Perform text detection on an image
img_path = 'result.png'
result = ocr.ocr(img_path, cls=False)
# Extract bounding boxes from the result
boxes = [elements[0] for elements in result[0]]

# Visualize the detection results
image = cv2.imread(img_path)
for box in boxes:
    box = np.array(box, dtype=np.int32)
    box = box.reshape((-1, 1, 2))
    image = cv2.polylines(image, [box], isClosed=True, color=(255, 0, 0), thickness=2)
# for box in boxes:
#     print(box)
# plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
# plt.axis('off')
# plt.show()

[2024/03/14 16:46:13] ppocr DEBUG: dt_boxes num : 32, elapsed : 0.3002760410308838
[2024/03/14 16:46:15] ppocr DEBUG: rec_res num  : 32, elapsed : 1.792510747909546


In [15]:
from PIL import Image,ImageFilter
import numpy as np
from io import BytesIO
import requests
from google.colab import files

In [16]:
img = Image.open('result.png')
cropped_images_list = []
for i, box in enumerate(boxes):
    # Crop the image using the bounding-box coordinates, padded by 5 pixels
    cropped_img = img.crop((box[0][0]-5, box[0][1]-5, box[2][0]+5, box[2][1]+5))
    # Append the cropped image to the list
    cropped_images_list.append(cropped_img)

In [17]:
! pip install vietocr

Collecting vietocr
  Downloading vietocr-0.3.12-py3-none-any.whl (34 kB)
Collecting einops==0.2.0 (from vietocr)
  Downloading einops-0.2.0-py2.py3-none-any.whl (18 kB)
Collecting gdown==4.4.0 (from vietocr)
  Downloading gdown-4.4.0.tar.gz (14 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting prefetch-generator==1.0.1 (from vietocr)
  Downloading prefetch_generator-1.0.1.tar.gz (3.4 kB)
  Preparing metadata (setup.py) ... done
Collecting scikit-image>=0.21.0 (from vietocr)
  Downloading scikit_image-0.22.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (14.7 MB)
Building wheels for collected packages: gdown, prefetch-generator
  Building wheel for gdown (pyproject.toml) ... done
  Created wheel fo

In [18]:
from vietocr.tool.predictor import Predictor
from vietocr.tool.config import Cfg
from vietocr.model.trainer import Trainer

In [19]:
config = Cfg.load_config_from_name('vgg_transformer')

In [20]:
#config['weights'] = 'transformerocr.pth'
config['cnn']['pretrained']=False
config['device'] = 'cuda:0'

In [21]:
detector = Predictor(config)

18533it [00:00, 24641.58it/s]


In [22]:
texts = []
for j in cropped_images_list:
  s = detector.predict(j, return_prob=False)
  texts.append(s)

In [23]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [25]:
cd /content/drive/MyDrive/ColabNotebooks/kie_colab/doc2graph-master

/content/drive/MyDrive/ColabNotebooks/kie_colab/doc2graph-master


In [26]:
%env PYTHONPATH=/env/python:/content/drive/MyDrive/ColabNotebooks/kie_colab/doc2graph-master
!echo $PYTHONPATH

env: PYTHONPATH=/env/python:/content/drive/MyDrive/ColabNotebooks/kie_colab/doc2graph-master
/env/python:/content/drive/MyDrive/ColabNotebooks/kie_colab/doc2graph-master


In [27]:
cd /content/drive/MyDrive/ColabNotebooks/kie_colab/doc2graph-master

/content/drive/MyDrive/ColabNotebooks/kie_colab/doc2graph-master


In [28]:
!pip install attrdictionary
!pip install attrdict3
!pip install python-dotenv

Collecting attrdictionary
  Downloading attrdictionary-1.0.0-py2.py3-none-any.whl (8.0 kB)
Installing collected packages: attrdictionary
Successfully installed attrdictionary-1.0.0
Collecting attrdict3
  Downloading attrdict3-2.0.2-py2.py3-none-any.whl (10 kB)
Installing collected packages: attrdict3
Successfully installed attrdict3-2.0.2
Collecting python-dotenv
  Downloading python_dotenv-1.0.1-py3-none-any.whl (19 kB)
Installing collected packages: python-dotenv
Successfully installed python-dotenv-1.0.1


In [29]:
from src.paths import FUNSD_TEST
from PIL import Image, ImageDraw
import cv2

# Read the image
image = cv2.imread('/content/result.png')

# Convert the image to a single channel (grayscale)
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Save the single-channel image
cv2.imwrite('/content/result1.png', gray_image)
funsd_test_images = FUNSD_TEST / 'images'
image_name = '/content/result1.png' #! change this to see different outputs from FUNSD, or pass your own image!
# 82491256.png
# image_name is an absolute path here, so the pathlib join below resolves to it and ignores funsd_test_images
image_path = str(funsd_test_images / image_name)
image = Image.open(image_path).convert('RGB')

In [30]:
!pip install  dgl==1.1.3 -f https://data.dgl.ai/wheels/cu117/repo.html
!pip install  dglgo -f https://data.dgl.ai/wheels-test/repo.html
!sudo apt-get update
!sudo apt install -y nvidia-cuda-toolkit

Looking in links: https://data.dgl.ai/wheels/cu117/repo.html
Collecting dgl==1.1.3
  Downloading https://data.dgl.ai/wheels/cu117/dgl-1.1.3%2Bcu117-cp310-cp310-manylinux1_x86_64.whl (94.2 MB)
Installing collected packages: dgl
Successfully installed dgl-1.1.3+cu117
Looking in links: https://data.dgl.ai/wheels-test/repo.html
Collecting dglgo
  Downloading dglgo-0.0.2-py3-none-any.whl (63 kB)
Collecting isort>=5.10.1 (from dglgo)
  Downloading isort-5.13.2-py3-none-any.whl (92 kB)
Collecting autopep8>=1.6.0 (from dglgo)
  Downloading autopep8-2.0.4-py2.py3-none-any.whl (45 kB)

In [31]:
!pip install easyocr
!pip install pytesseract

Collecting easyocr
  Downloading easyocr-1.7.1-py3-none-any.whl (2.9 MB)
Collecting python-bidi (from easyocr)
  Downloading python_bidi-0.4.2-py2.py3-none-any.whl (30 kB)
Collecting ninja (from easyocr)
  Downloading ninja-1.11.1.1-py2.py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl (307 kB)
Installing collected packages: ninja, python-bidi, easyocr
Successfully installed easyocr-1.7.1 ninja-1.11.1.1 python-bidi-0.4.2
Collecting pytesseract
  Downloading pytesseract-0.3.10-py3-none-any.whl (14 kB)
Installing collected packages: pytesseract
Successfully installed pytesseract-0.3.10


In [32]:
boxes_modified = []

# Iterate over each box in the original list
for box in boxes:
    # Collect the x and y coordinates of all points in the box
    x_coordinates = [point[0] for point in box]
    y_coordinates = [point[1] for point in box]

    # Find the minimum and maximum x and y over all points
    x_min = min(x_coordinates)
    y_min = min(y_coordinates)
    x_max = max(x_coordinates)
    y_max = max(y_coordinates)

    # Append the converted box to the result list
    boxes_modified.append([int(x_min), int(y_min), int(x_max), int(y_max)])
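
The conversion above, and the padded crop used earlier for recognition, can be combined into one helper. The following is a minimal sketch under stated assumptions: `quad_to_xyxy` is a hypothetical name, and clamping to the image bounds is an addition that the notebook itself does not perform.

```python
def quad_to_xyxy(quad, pad=0, width=None, height=None):
    """Convert a four-point box to [x_min, y_min, x_max, y_max].

    `pad` enlarges the box on every side; when `width` and `height` are
    given, the result is clamped so it never leaves the image.
    """
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    x0, y0 = int(min(xs)) - pad, int(min(ys)) - pad
    x1, y1 = int(max(xs)) + pad, int(max(ys)) + pad
    x0, y0 = max(x0, 0), max(y0, 0)
    if width is not None:
        x1 = min(x1, width)
    if height is not None:
        y1 = min(y1, height)
    return [x0, y0, x1, y1]


# Made-up, slightly skewed detection box
quad = [[12.0, 8.0], [118.0, 10.0], [117.0, 42.0], [11.0, 40.0]]
print(quad_to_xyxy(quad))
print(quad_to_xyxy(quad, pad=5, width=120, height=200))
```

Using the min and max over all four points avoids relying on the first and third points being the top-left and bottom-right corners, which only holds for boxes that are not rotated.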

In [33]:
from src.data.graph_builder import GraphBuilder
gb = GraphBuilder()
graphs, _, _, features = gb.get_graph1([image_path],texts,boxes_modified, 'CUSTOM')
graph = graphs[0] # we have only one for this tutorial!
graph

DGL backend not selected or invalid.  Assuming PyTorch for now.


Setting the default backend to "pytorch". You can change it in the ~/.dgl/config.json file or export the DGLBACKEND environment variable.  Valid options are: pytorch, mxnet, tensorflow (all lowercase)


Graph(num_nodes=32, num_edges=992,
      ndata_schemes={}
      edata_schemes={})
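
The printed graph has 32 nodes and 992 edges, which matches a fully connected directed graph without self-loops (32 × 31 = 992). Whether `get_graph1` actually builds its edges this way is an assumption; the sketch below only checks the arithmetic.

```python
from itertools import permutations

num_nodes = 32  # one node per detected text box

# Every ordered pair (src, dst) with src != dst
edges = list(permutations(range(num_nodes), 2))

print(len(edges))
```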

In [34]:
!pip install segmentation-models-pytorch

Collecting segmentation-models-pytorch
  Downloading segmentation_models_pytorch-0.3.3-py3-none-any.whl (106 kB)
Collecting pretrainedmodels==0.7.4 (from segmentation-models-pytorch)
  Downloading pretrainedmodels-0.7.4.tar.gz (58 kB)
  Preparing metadata (setup.py) ... done
Collecting efficientnet-pytorch==0.7.1 (from segmentation-models-pytorch)
  Downloading efficientnet_pytorch-0.7.1.tar.gz (21 kB)
  Preparing metadata (setup.py) ... done
Collecting timm==0.9.2 (from segmentation-models-pytorch)
  Downloading timm-0.9.2-py3-none-any.whl (2.2 MB)

In [35]:
!python -m spacy download en_core_web_lg

Collecting en-core-web-lg==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.7.1/en_core_web_lg-3.7.1-py3-none-any.whl (587.7 MB)
Installing collected packages: en-core-web-lg
Successfully installed en-core-web-lg-3.7.1
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_lg')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


In [36]:
from src.data.feature_builder import FeatureBuilder

device = 'cuda:0' # change this to 'cpu' in case you cannot use hardware acceleration
fb = FeatureBuilder(d=device)
chunks, _ = fb.add_features(graphs, features) # chunks is used by the model to merge different embeddings together!
graph

adding features: 100%|██████████| 1/1 [00:00<00:00,  2.25it/s]


Graph(num_nodes=32, num_edges=992,
      ndata_schemes={'geom': Scheme(shape=(4,), dtype=torch.float32), 'feat': Scheme(shape=(1752,), dtype=torch.float32), 'norm': Scheme(shape=(1,), dtype=torch.float32)}
      edata_schemes={'feat': Scheme(shape=(6,), dtype=torch.float32), 'weights': Scheme(shape=(), dtype=torch.float32)})

In [37]:
import torch
from src.models.graphs import SetModel
from src.paths import CHECKPOINTS

sm = SetModel(name='e2e', device=device)
model = sm.get_model(4, 2, chunks, False) # 4 and 2 are the numbers of node and edge classes; see the paper for details
model.load_state_dict(torch.load(CHECKPOINTS / 'e2e-20240305-1238.pt', map_location=device)) # load the pretrained weights onto the selected device
model.eval() # set the model for inference only


### MODEL ###
-> Using E2E
-> Total params: 2700314
-> Device: True



E2E(
  (projector): InputProjector(
    (modalities): Sequential(
      (0): Sequential(
        (0): Linear(in_features=4, out_features=300, bias=True)
        (1): LayerNorm((300,), eps=1e-05, elementwise_affine=True)
        (2): ReLU()
      )
      (1): Sequential(
        (0): Linear(in_features=300, out_features=300, bias=True)
        (1): LayerNorm((300,), eps=1e-05, elementwise_affine=True)
        (2): ReLU()
      )
      (2): Sequential(
        (0): Linear(in_features=1448, out_features=300, bias=True)
        (1): LayerNorm((300,), eps=1e-05, elementwise_affine=True)
        (2): ReLU()
      )
    )
  )
  (message_passing): GcnSAGELayer(
    (linear): Linear(in_features=1800, out_features=900, bias=True)
    (lynorm): LayerNorm((900,), eps=1e-05, elementwise_affine=True)
  )
  (edge_pred): MLPPredictor_E2E(
    (W1): Linear(in_features=1814, out_features=300, bias=True)
    (norm): LayerNorm((300,), eps=1e-05, elementwise_affine=True)
    (W2): Linear(in_features=300, o
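
The three input layers of the projector take 4, 300 and 1448 features, and these add up to the 1752-wide `feat` vector shown for each node earlier. The sketch below splits a flat vector by such chunk sizes. It assumes that `chunks` holds exactly these sizes in this order, which the notebook does not print, and `split_by_chunks` is a hypothetical helper.

```python
def split_by_chunks(vec, sizes):
    """Split a flat feature vector into consecutive slices of the given sizes."""
    if sum(sizes) != len(vec):
        raise ValueError("chunk sizes must add up to the vector length")
    parts, start = [], 0
    for size in sizes:
        parts.append(vec[start:start + size])
        start += size
    return parts


chunks = [4, 300, 1448]          # assumed: the projector input sizes, in order
feat = list(range(sum(chunks)))  # stand-in for one node's feature vector

parts = split_by_chunks(feat, chunks)
print(len(feat), [len(p) for p in parts])
```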

In [38]:
def draw_results(img, boxs, links):
    draw = ImageDraw.Draw(img)

    for box in boxs:
        draw.rectangle(box, outline='blue', width=3)

    if links:
        for idx in range(len(links['src'])):
            key_center = center(boxs[links['src'][idx]])
            value_center = center(boxs[links['dst'][idx]])
            draw.line((key_center, value_center), fill='violet', width=3)

In [40]:
from torch.nn import functional as F
from src.data.preprocessing import center
import json
with torch.no_grad():
    n, e = model(graph.to(device), graph.ndata['feat'].to(device))
    _, epreds = torch.max(F.softmax(e, dim=1), dim=1)
    _, npreds = torch.max(F.softmax(n, dim=1), dim=1)

    # save results
    links = (epreds == 1).nonzero(as_tuple=True)[0].tolist()
    u, v = graph.edges()
    entities = features['boxs'][0]
    contents = features['texts'][0]

    inference = Image.open(image_path).convert('RGB')
    draw_results(inference, entities, [])
    draw = ImageDraw.Draw(inference)
    result=[]

    for i, idx in enumerate(links):
        pair = {'key': {'text': contents[u[idx]], 'box': entities[u[idx]]},
                        'value': {'text': contents[v[idx]], 'box': entities[v[idx]]}}
        result.append(pair)

        key_center = center(entities[u[idx]])
        value_center = center(entities[v[idx]])
        draw.line((key_center, value_center), fill='violet', width=3)
        draw.ellipse([(key_center[0]-4,key_center[1]-4), (key_center[0]+4,key_center[1]+4)], fill = 'green', outline='black')
        draw.ellipse([(value_center[0]-4,value_center[1]-4), (value_center[0]+4,value_center[1]+4)], fill = 'red', outline='black')
    inference.save('/content/drive/MyDrive/ColabNotebooks/kie_colab/doc2graph-master/inference/result.png')
    with open('/content/drive/MyDrive/ColabNotebooks/kie_colab/doc2graph-master/inference/result.json', "w", encoding="utf-8") as outfile:
        json.dump(result, outfile, ensure_ascii=False)
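
The pairing logic in the cell above can be followed on toy data without a model. This is a minimal sketch in plain Python: the logits, texts and boxes are made up, `extract_pairs` is a hypothetical name, and class 1 is taken to mean "linked", as in the cell above. Because softmax preserves the order of the scores, taking the argmax of the raw logits selects the same class.

```python
def argmax(row):
    """Index of the largest value in a list of scores."""
    return max(range(len(row)), key=row.__getitem__)


def extract_pairs(edge_logits, src, dst, texts, boxes, link_class=1):
    """Build key/value pairs from the edges predicted as links."""
    pairs = []
    for e, row in enumerate(edge_logits):
        if argmax(row) != link_class:
            continue
        u, v = src[e], dst[e]
        pairs.append({
            "key": {"text": texts[u], "box": boxes[u]},
            "value": {"text": texts[v], "box": boxes[v]},
        })
    return pairs


# Toy inputs (made up): three text boxes and four directed edges
texts = ["Name:", "Alice", "Date:"]
boxes = [[0, 0, 10, 5], [12, 0, 30, 5], [0, 8, 10, 13]]
src = [0, 0, 1, 2]
dst = [1, 2, 0, 1]
edge_logits = [[0.2, 1.5], [2.0, -1.0], [0.9, 0.1], [1.1, 0.3]]

pairs = extract_pairs(edge_logits, src, dst, texts, boxes)
print(pairs)
```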

In [41]:
from output import file_processing
file_processing('/content/drive/MyDrive/ColabNotebooks/kie_colab/doc2graph-master/inference/result.json')

Chẩn đoán:
Viêm kết mạc, không xác định [H10.9] (MP: VKM/2M: IOL
% Khô mắt); Chảy nước mắt [H04.2]; Sự có mặt của thâu kính
Thuốc điều trị:
1/Levofloxacin 5mg/ml (Melevo)
01 Lọ
cách dùng: Nhỏ mắt phải ngày 5 lần, mỗi lần 1 giọt.
2/Systane Ultra Drop 5ml
01 Lọ
cách dùng: Nhỏ mắt phải ngày 5 lần, mỗi lần 1 giọt
3/Vitamin A-D
20 Viên
ngày uống 02 viên; sáng: 01; chiều: 01;
Cộng khoản: 3
Lời dăn: Dùng thuốc theo toa
Ngày 12 tháng 12 năm 2017
