**Summary**
- Generates pseudo labels for `train_target_image` from the fine-tuned SegFormer and background modeling.

<br>

**Inputs:**
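- `dir_data`: directory containing the data
- `dir_save`: directory in which to save the pseudo labels for `train_target_image`
- `path_ckpt`: checkpoint of the fine-tuned SegFormer model used to generate the pseudo labels
- `outside_fname`: path to a pickle file storing a dictionary of binary masks, one per image in `train_target_image`, that mark the background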
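
As a rough illustration of the `outside_fname` input, the sketch below builds and round-trips a dictionary of the shape this notebook appears to expect: integer row indices as keys and binary masks as values. The mask size, the boolean dtype, and the in-memory buffer are assumptions made for the example; the real file may differ.

```python
import io
import pickle

import numpy as np

# Hypothetical 4x6 masks keyed by row index; real masks are presumably frame-sized.
outside_dict = {
    0: np.array([[1, 1, 0, 0, 1, 1]] * 4, dtype=bool),
    1: np.zeros((4, 6), dtype=bool),
}

# Round-trip through pickle (an in-memory buffer stands in for the file).
buffer = io.BytesIO()
pickle.dump(outside_dict, buffer)
buffer.seek(0)
loaded = pickle.load(buffer)

# Cast to uint8 so that 1 marks background and 0 marks scene content.
loaded = {k: v.astype(np.uint8) for k, v in loaded.items()}
print(loaded[0].dtype, int(loaded[0].sum()))
```

Casting to `uint8` matters later because `cv2.resize` does not accept boolean arrays.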

<br>

**Outputs:**
- `{dir_save}/TRAIN_TARGET_0000.png`: pseudo label for `train_target_image`
- `{dir_data}/full_pl.csv`: CSV file containing the metadata of `train_source`, `train_target`, and `val_source`
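
The saved pseudo labels store class indices 0 to 11 and use 255 for background, since this notebook remaps class 12 to 255 before writing. A minimal sketch of how such a label could be summarized while skipping the 255 pixels follows; the helper name `class_histogram` and the tiny array are hypothetical.

```python
import numpy as np

IGNORE_INDEX = 255           # value written for background pixels
NUM_FOREGROUND_CLASSES = 12  # classes 0..11; class 12 is remapped to 255


def class_histogram(label, num_classes=NUM_FOREGROUND_CLASSES, ignore_index=IGNORE_INDEX):
    """Count pixels per class, skipping pixels equal to `ignore_index`."""
    valid = label[label != ignore_index]
    return np.bincount(valid, minlength=num_classes)


# Hypothetical 2x4 pseudo label standing in for a decoded PNG.
label = np.array([[0, 0, 3, 255],
                  [11, 3, 255, 255]], dtype=np.uint8)
hist = class_histogram(label)
print(hist)
```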

In [1]:
dir_data = '../data'
dir_save = '../data/train_target_pl'
path_ckpt = '../ckpt/1695288341/last_ckpt.bin'

outside_fname = '/home/eunwoo/experiment/PSSC/Oneformer(instance)/result/background_dongjin/background_basesum.pickle'

In [2]:
import sys
sys.path.append('../')

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "2"

import cv2
import pickle
import numpy as np
import pandas as pd
import albumentations as A
from tqdm import tqdm

import torch
import torch.nn.functional as F

from segformers.utils import print_env
from segformers.networks import SegFormer
from segformers.inference import _slide_inference


Some weights of SegformerForSemanticSegmentation were not initialized from the model checkpoint at nvidia/segformer-b5-finetuned-cityscapes-1024-1024 and are newly initialized because the shapes did not match:
- decode_head.classifier.weight: found shape torch.Size([19, 768, 1, 1]) in the checkpoint and torch.Size([13, 768, 1, 1]) in the model instantiated
- decode_head.classifier.bias: found shape torch.Size([19]) in the checkpoint and torch.Size([13]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of Mask2FormerForUniversalSegmentation were not initialized from the model checkpoint at facebook/mask2former-swin-large-cityscapes-semantic and are newly initialized because the shapes did not match:
- class_predictor.bias: found shape torch.Size([20]) in the checkpoint and torch.Size([14]) in the model instantiated
- class_predictor.weigh

In [3]:
print_env()

DATE : 2023-10-04
Pyton Version : 3.8.17
PyTorch Version : 1.13.0
OS : Linux 5.4.0-155-generic
CPU spec : x86_64
RAM spec : 503.73 GB
Device 0:
Name: NVIDIA A100-SXM4-40GB
Total Memory: 40536.1875 MB
Driver Version: 470.199.02
Device 1:
Name: NVIDIA A100-SXM4-40GB
Total Memory: 40536.1875 MB
Driver Version: 470.199.02
Device 2:
Name: NVIDIA A100-SXM4-40GB
Total Memory: 40536.1875 MB
Driver Version: 470.199.02
Device 3:
Name: NVIDIA DGX Display
Total Memory: 3911.875 MB
Driver Version: 470.199.02
Device 4:
Name: NVIDIA A100-SXM4-40GB
Total Memory: 40536.1875 MB
Driver Version: 470.199.02


In [3]:
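device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load the checkpoint on the CPU first so it does not depend on the GPU it was saved from
state_dict = torch.load(path_ckpt, map_location='cpu')
# `SegFormer` appears to be a ready-made model object exported by segformers.networks
model = SegFormer
model.load_state_dict(state_dict['model_state_dict'])
model.to(device);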


In [7]:
os.makedirs(dir_save, exist_ok=True)

with open(outside_fname, 'rb') as f:
    outside_dict = pickle.load(f)
    
for k, v in outside_dict.items():
    outside_dict[k] = v.astype(np.uint8)


In [8]:
df = pd.read_csv(os.path.join(dir_data, 'train_target.csv'))

model.eval()

def predict_at_scale(original_image, size):
    # Resize to `size` (width, height), normalize, and run sliding-window inference
    image = cv2.resize(original_image, size)
    image = A.Normalize()(image=image)['image']
    images = torch.as_tensor(image, dtype=torch.float, device=device).permute(2, 0, 1).unsqueeze(0)
    return _slide_inference(images, model, num_classes=13, stride=(50, 50), crop_size=(512, 512))

for idx in tqdm(range(len(df))):
    img_path = os.path.join(dir_data, df.loc[idx, 'img_path'])
    original_image = cv2.imread(img_path)
    original_image = cv2.cvtColor(original_image, cv2.COLOR_BGR2RGB)

    with torch.no_grad():  # inference only, so no gradients are needed
        # Stage 1: base scale (960x540)
        preds, count_mat = predict_at_scale(original_image, (960, 540))

        # Stages 2 and 3: larger scales, resampled to the base resolution and accumulated
        for size in [(1200, 675), (1440, 810)]:
            cur_preds, cur_count_mat = predict_at_scale(original_image, size)
            preds += F.interpolate(cur_preds, size=(540, 960), mode="bilinear", align_corners=False)
            count_mat += F.interpolate(cur_count_mat, size=(540, 960), mode="bilinear", align_corners=False)

        # Average the accumulated logits and upsample to 1920x1080
        logits = preds / count_mat
        logits = F.interpolate(logits, size=(1080, 1920), mode="bilinear", align_corners=False)[0]
        # Per-pixel mean negative log-probability (computed but not used below)
        entropies = torch.mean(-torch.log_softmax(logits, dim=0), dim=0).cpu()
    mask = torch.argmax(logits, dim=0).cpu().numpy().astype(np.uint8)

    # Force pixels flagged by the background model to class 12 (background)
    outside = outside_dict[idx]
    outside = cv2.resize(outside, (1920, 1080), interpolation=cv2.INTER_NEAREST)
    mask[outside == 1] = 12

    # Store background as 255 in the saved pseudo label
    mask[mask == 12] = 255
    cv2.imwrite(os.path.join(dir_save, f"TRAIN_TARGET_{str(idx).zfill(4)}.png"), mask)


100%|██████████| 2923/2923 [8:46:01<00:00, 10.80s/it]  
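
The source of `_slide_inference` is not shown in this notebook. Judging by its arguments and its two return values, it presumably sums crop-wise logits over a sliding window and returns them together with a per-pixel count of how many crops covered each pixel. The NumPy sketch below illustrates that idea under this assumption; the names `slide_inference` and `constant_predict` are hypothetical and the real function operates on torch tensors.

```python
import numpy as np


def slide_inference(image, predict, num_classes, stride, crop_size):
    """Sum crop-wise logits over a sliding window.

    image:   (C, H, W) array.
    predict: callable mapping a (C, h, w) crop to (num_classes, h, w) logits.
    Returns the summed logits and the number of crops covering each pixel.
    """
    _, height, width = image.shape
    crop_h, crop_w = crop_size
    stride_h, stride_w = stride
    # Number of window positions needed to cover the image along each axis.
    rows = max(height - crop_h + stride_h - 1, 0) // stride_h + 1
    cols = max(width - crop_w + stride_w - 1, 0) // stride_w + 1

    preds = np.zeros((num_classes, height, width), dtype=np.float32)
    count = np.zeros((1, height, width), dtype=np.float32)
    for r in range(rows):
        for c in range(cols):
            # Clamp the last window so that it ends at the image border.
            y2 = min(r * stride_h + crop_h, height)
            x2 = min(c * stride_w + crop_w, width)
            y1 = max(y2 - crop_h, 0)
            x1 = max(x2 - crop_w, 0)
            preds[:, y1:y2, x1:x2] += predict(image[:, y1:y2, x1:x2])
            count[:, y1:y2, x1:x2] += 1
    return preds, count


def constant_predict(crop):
    # Toy stand-in for a model: the logit of class k is k everywhere.
    _, h, w = crop.shape
    per_class = np.arange(3, dtype=np.float32)[:, None, None]
    return per_class * np.ones((1, h, w), dtype=np.float32)


image = np.zeros((3, 10, 12), dtype=np.float32)
preds, count = slide_inference(image, constant_predict, num_classes=3,
                               stride=(3, 3), crop_size=(4, 4))
logits = preds / count            # average over overlapping windows
labels = logits.argmax(axis=0)    # per-pixel class index
```

Keeping the sum and the count separate, rather than averaging inside the function, is what lets the loop above add contributions from several image scales before dividing once.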


In [4]:
train_target_df = pd.read_csv(os.path.join(dir_data, 'train_target.csv'))
train_target_df['gt_path'] = train_target_df['id'].apply(lambda x: f'./train_target_pl/{x}.png')

train_df = pd.read_csv(os.path.join(dir_data, 'train_source.csv'))
valid_df = pd.read_csv(os.path.join(dir_data, 'val_source.csv'))
data = pd.concat([train_df, valid_df, train_target_df], axis=0, ignore_index=True)

data.to_csv(os.path.join(dir_data, 'full_pl.csv'), index=False)
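
A small in-memory sketch of the merge above, useful for checking that the target rows receive a `gt_path` before concatenation. The frames are hypothetical stand-ins for the CSV files: the column names `id`, `img_path`, and `gt_path` follow the notebook, while the source rows and all paths other than the `./train_target_pl/` pattern are assumed.

```python
import pandas as pd

# Hypothetical stand-in for train_source.csv (assumed to already have gt_path).
train_df = pd.DataFrame({
    "id": ["TRAIN_SOURCE_0000"],
    "img_path": ["./train_source_image/TRAIN_SOURCE_0000.png"],
    "gt_path": ["./train_source_gt/TRAIN_SOURCE_0000.png"],
})

# Hypothetical stand-in for train_target.csv (no ground truth column).
train_target_df = pd.DataFrame({
    "id": ["TRAIN_TARGET_0000"],
    "img_path": ["./train_target_image/TRAIN_TARGET_0000.png"],
})

# Point each target row at its pseudo label, as in the cell above.
train_target_df["gt_path"] = train_target_df["id"].apply(
    lambda x: f"./train_target_pl/{x}.png"
)

data = pd.concat([train_df, train_target_df], axis=0, ignore_index=True)

# Rows without a label path would show up as NaN after the concat.
missing = int(data["gt_path"].isna().sum())
print(len(data), missing)
```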