**Summary**
- Runs inference on the test images with a fine-tuned Mask2Former.

<br>

**Inputs:**
- `dir_data`: directory containing the data
- `dir_save`: directory where the prediction file is saved
- `path_ckpt`: path to the fine-tuned model checkpoint to load


<br>

**Outputs:**
- `{dir_save}/Mask2Former.csv`: prediction file with the run-length-encoded masks for the test images

In [1]:
dir_data = '../data'
dir_save = '../outputs/Mask2Former'
path_ckpt = '../ckpt/1696079822/last_ckpt.bin'

In [2]:
import sys
sys.path.append('../')

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "3"

import cv2
import numpy as np
import pandas as pd
import albumentations as A
from tqdm import tqdm

import torch
import torch.nn.functional as F

from transformers import Mask2FormerImageProcessor
from segformers.utils import rle_encode
from segformers.networks import Mask2Former



Some weights of SegformerForSemanticSegmentation were not initialized from the model checkpoint at nvidia/segformer-b5-finetuned-cityscapes-1024-1024 and are newly initialized because the shapes did not match:
- decode_head.classifier.weight: found shape torch.Size([19, 768, 1, 1]) in the checkpoint and torch.Size([13, 768, 1, 1]) in the model instantiated
- decode_head.classifier.bias: found shape torch.Size([19]) in the checkpoint and torch.Size([13]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of Mask2FormerForUniversalSegmentation were not initialized from the model checkpoint at facebook/mask2former-swin-large-cityscapes-semantic and are newly initialized because the shapes did not match:
- class_predictor.bias: found shape torch.Size([20]) in the checkpoint and torch.Size([14]) in the model instantiated
- class_predictor.weigh

In [3]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

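# map_location lets the checkpoint load even if it was saved on a different device
state_dict = torch.load(path_ckpt, map_location=device)
# Mask2Former is assumed to be a ready-made model instance exported by segformers.networks
model = Mask2Former
model.load_state_dict(state_dict['model_state_dict'])
model.to(device)
model.eval();  # inference mode: disables dropout and other train-only behavior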

image_processor = Mask2FormerImageProcessor.from_pretrained("facebook/mask2former-swin-large-cityscapes-semantic")
image_processor.do_resize = False

In [5]:
df = pd.read_csv(os.path.join(dir_data, 'test.csv'))

result = []
for idx in tqdm(range(len(df))):
    img_path = os.path.join(dir_data, df.loc[idx, 'img_path'])
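    image = cv2.imread(img_path)
    if image is None:  # cv2.imread returns None instead of raising on a bad path
        raise FileNotFoundError(img_path)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = cv2.resize(image, (1920, 1080))  # cv2 takes (width, height)
    normalized_image = A.Normalize()(image=image)['image']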

    images = torch.as_tensor(normalized_image, dtype=torch.float, device=device).permute(2, 0, 1).unsqueeze(0)
    with torch.no_grad():
        outputs = model(images)
        class_queries_logits = outputs.class_queries_logits  # (batch, queries, classes + 1)
        masks_queries_logits = outputs.masks_queries_logits  # (batch, queries, h, w)

        masks_queries_logits = F.interpolate(
            masks_queries_logits, size=(384, 384), mode="bilinear", align_corners=False
        )
        # Drop the trailing "no object" class, then weight each query's mask by its class probabilities
        masks_classes = class_queries_logits.softmax(dim=-1)[..., :-1]
        masks_probs = masks_queries_logits.sigmoid()
        segmentation = torch.einsum("bqc, bqhw -> bchw", masks_classes, masks_probs)
        # Upsample the per-class scores back to the 1920x1080 input resolution
        logits = F.interpolate(
            segmentation,
            size=(1080, 1920),
            mode="bilinear",
            align_corners=False,
        )

    # Per-pixel class ids; uint8 is enough for these labels and is a dtype cv2.resize accepts
    masks = torch.argmax(logits, dim=1).cpu().numpy()[0].astype(np.uint8)
    masks = cv2.resize(masks, (960, 540), interpolation=cv2.INTER_NEAREST)

    predictions = masks.astype(np.int32)
    for class_id in range(12):  # only class ids 0-11 are encoded
        class_mask = (predictions == class_id).astype(np.int32)
        if np.sum(class_mask) > 0:  # class is present: run-length encode its mask
            mask_rle = rle_encode(class_mask)
            result.append(mask_rle)
        else:  # class is absent: store -1
            result.append(-1)

100%|██████████| 1898/1898 [56:41<00:00,  1.79s/it]
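
The step in the loop that turns per-query outputs into a semantic map can be checked on a toy example. The sketch below is only an illustration: it uses NumPy instead of PyTorch, the probabilities are made up (two queries, two classes, a 1x2 image), and `combine_queries` is a hypothetical helper name that is not part of `segformers`.

```python
import numpy as np

def combine_queries(class_probs, mask_probs):
    """Sum over queries of class probability times mask probability.

    class_probs: (batch, queries, classes), "no object" column already dropped
    mask_probs:  (batch, queries, height, width)
    returns:     (batch, classes, height, width) per-class scores
    """
    return np.einsum("bqc,bqhw->bchw", class_probs, mask_probs)

# Made-up values: query 0 is mostly class 0, query 1 is mostly class 1.
class_probs = np.array([[[0.9, 0.1],
                         [0.2, 0.8]]])
# Query 0 covers the left pixel, query 1 covers the right pixel.
mask_probs = np.array([[[[1.0, 0.0]],
                        [[0.0, 1.0]]]])

scores = combine_queries(class_probs, mask_probs)
labels = scores.argmax(axis=1)  # per-pixel class id, shape (batch, height, width)
print(labels.tolist())  # [[[0, 1]]]
```

Each pixel takes the class of whichever query covers it most strongly, which is the same rule the `torch.einsum` call followed by `torch.argmax` applies at full resolution.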


In [6]:
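submit = pd.read_csv(os.path.join(dir_data, 'sample_submission.csv'))
submit['mask_rle'] = result
os.makedirs(dir_save, exist_ok=True)  # create the output directory if it is missing
submit.to_csv(os.path.join(dir_save, 'Mask2Former.csv'), index=False)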
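
The `mask_rle` values written above come from `rle_encode`, whose implementation in `segformers.utils` is not shown in this notebook. As a rough illustration only, the sketch below assumes the common convention of space-separated `start length` pairs with 1-indexed starts over the row-major flattened mask. `rle_encode_sketch` and `rle_decode_sketch` are hypothetical names, and the real helper may differ.

```python
import numpy as np

def rle_encode_sketch(mask):
    """Encode a binary mask as 'start length ...' (assumed: 1-indexed, row-major)."""
    pixels = np.concatenate([[0], np.asarray(mask).flatten(), [0]])
    # Positions where the value flips mark the start and end of each run of ones.
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    runs[1::2] -= runs[::2]  # turn end positions into run lengths
    return " ".join(str(x) for x in runs)

def rle_decode_sketch(rle, shape):
    """Inverse of rle_encode_sketch; handy for spot-checking a submission row."""
    flat = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    values = [int(x) for x in rle.split()]
    for start, length in zip(values[0::2], values[1::2]):
        flat[start - 1:start - 1 + length] = 1
    return flat.reshape(shape)

mask = np.array([[0, 1, 1],
                 [0, 0, 1]])
encoded = rle_encode_sketch(mask)
print(encoded)  # 2 2 6 1
assert (rle_decode_sketch(encoded, mask.shape) == mask).all()
```

Under this convention an all-zero mask would encode to an empty string. The notebook's loop never reaches that case, because it stores `-1` for absent classes before calling the encoder.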