#### Data preparation
I looked at the images on which the model makes the largest errors and found that `train` contains images with shifted annotations.

To detect them, the same model is trained on all images, including the *corrupted* ones. It then makes a prediction for every image. If the MSE between the prediction and the annotation exceeds 100 (an arbitrary constant), the image is treated as having invalid annotations.
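
The thresholding rule can be sketched on toy data as follows. This is a minimal illustration rather than the notebook's actual code: the helper `flag_bad_annotations` and the array shapes are assumptions, and the notebook itself computes the error per image with `calc_err` further down.

```python
import numpy as np

def flag_bad_annotations(pred, target, threshold=100.0):
    """Return indices of images whose per-image MSE exceeds the threshold.

    pred, target: arrays of shape (N, NUM_PTS, 2) in original image pixels.
    (Hypothetical helper, used here only for illustration.)
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    # average the squared error over all points and both coordinates
    per_image_mse = ((pred - target) ** 2).mean(axis=(1, 2))
    return np.flatnonzero(per_image_mse > threshold), per_image_mse

# Toy data: 3 images with 4 points each.
target = np.zeros((3, 4, 2))
pred = target.copy()
pred[1] += 20.0  # annotation shifted by 20 px on both axes -> MSE = 400
pred[2] += 1.0   # small, ordinary prediction error -> MSE = 1

bad_idx, mse = flag_bad_annotations(pred, target)
print(bad_idx, mse)
```

Only the second image is flagged, because a uniform 20-pixel shift gives an MSE of 400, well above the threshold of 100.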
 

In [None]:
import os
import pickle
import sys

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.models as models
import tqdm.notebook as tqdm

from torch.nn import functional as fnn
from torch.utils import data
from torchvision import transforms
import matplotlib.pyplot as plt
import cv2
import pandas as pd

np.random.seed(1234)
torch.manual_seed(1234)

torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
use_gpu = True
data_size = None

In [None]:
data_dir = "C:/_Data/full/"
learning_rate = 1e-3
batch_size = 192
epochs = 30
prj_name = "test"
# data_size = 40000
# data_size = 5000
# data_size = 300

In [None]:
# metric that accounts for image rescaling
class MseW(torch.nn.Module):
    def __init__(self):
        super(MseW,self).__init__()

    def setWeight(self, weight):
        self.w = weight
        
    def forward(self, outputs, labels):
        mse = torch.mul(outputs - labels,outputs - labels).mean(axis=1)        
        mse=torch.mul(mse,self.w).mean(axis=0)
        mse=mse.mean(axis=0)
        mse=2*mse
        return mse
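
Reading `forward` term by term, the metric appears to compute the expression below. Here $B$ is the batch size, $D = 2 \cdot \text{NUM\_PTS}$ is the number of predicted coordinates, and $w_b$ is the weight passed to `setWeight`; in `validate` it is set to $1/f_b^2$, where $f_b$ is the `scale_coef` of image $b$.

$$
\mathrm{MseW}(\hat{y}, y) = \frac{2}{B} \sum_{b=1}^{B} w_b \cdot \frac{1}{D} \sum_{d=1}^{D} \left(\hat{y}_{b,d} - y_{b,d}\right)^2,
\qquad w_b = \frac{1}{f_b^{2}}
$$

A coordinate error $e$ measured in the resized crop corresponds to $e / f_b$ in the original image, so the weight $1/f_b^2$ presumably converts the squared error back to original-pixel units. The factor 2 turns the average over $D = 2N$ coordinates into an average over $N$ points, that is, the mean squared Euclidean distance per point.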

In [None]:
def train(model, loader, loss_fn, optimizer, device):
    model.train()
    train_loss = []
    for batch in tqdm.tqdm(loader, total=len(loader), desc="training..."):        
        images = batch["image"].to(device)  # B x 3 x CROP_SIZE x CROP_SIZE
        landmarks = batch["landmarks"]  # B x (2 * NUM_PTS)
               
        pred_landmarks = model(images).cpu()  # B x (2 * NUM_PTS)
        loss = loss_fn(pred_landmarks, landmarks)
        train_loss.append(loss.item())

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return np.mean(train_loss)

def validate(model, loader, loss_fn, device):
    model.eval()
    val_loss = []
    for batch in tqdm.tqdm(loader, total=len(loader), desc="validation..."):
        images = batch["image"].to(device)
        landmarks = batch["landmarks"]
        coef = batch["scale_coef"].numpy()
        coef = 1/coef
        coef = coef*coef
        coef =torch.tensor(coef)
        loss_fn.setWeight(coef)
        with torch.no_grad():
            pred_landmarks = model(images).cpu()
        loss = loss_fn(pred_landmarks, landmarks)
        val_loss.append(loss.item())
    return np.mean(val_loss)


def predict(model, loader, device):
    model.eval()
    predictions = np.zeros((len(loader.dataset), NUM_PTS, 2))
    for i, batch in enumerate(tqdm.tqdm(loader, total=len(loader), desc="test prediction...")):
        images = batch["image"].to(device)

        with torch.no_grad():
            pred_landmarks = model(images).cpu()
        pred_landmarks = pred_landmarks.numpy().reshape((len(pred_landmarks), NUM_PTS, 2))  # B x NUM_PTS x 2

        fs = batch["scale_coef"].numpy()  # B
        margins_x = batch["crop_margin_x"].numpy()  # B
        margins_y = batch["crop_margin_y"].numpy()  # B
        if "dx" in batch.keys():
            dx = batch["dx"].numpy()  # B
            dy = batch["dy"].numpy()  # B
            prediction = restore_landmarks_batch_ex(pred_landmarks, fs, margins_x, margins_y,dx,dy)  # B x NUM_PTS x 2
        else:
            prediction = restore_landmarks_batch(pred_landmarks, fs, margins_x, margins_y)  # B x NUM_PTS x 2
        
        predictions[i * loader.batch_size: (i + 1) * loader.batch_size] = prediction

    return predictions


In [None]:
TRAIN_SIZE = 0.8
NUM_PTS = 971
CROP_SIZE = 128
SUBMISSION_HEADER = "file_name,Point_M0_X,Point_M0_Y,Point_M1_X,Point_M1_Y,Point_M2_X,Point_M2_Y,Point_M3_X,Point_M3_Y,Point_M4_X,Point_M4_Y,Point_M5_X,Point_M5_Y,Point_M6_X,Point_M6_Y,Point_M7_X,Point_M7_Y,Point_M8_X,Point_M8_Y,Point_M9_X,Point_M9_Y,Point_M10_X,Point_M10_Y,Point_M11_X,Point_M11_Y,Point_M12_X,Point_M12_Y,Point_M13_X,Point_M13_Y,Point_M14_X,Point_M14_Y,Point_M15_X,Point_M15_Y,Point_M16_X,Point_M16_Y,Point_M17_X,Point_M17_Y,Point_M18_X,Point_M18_Y,Point_M19_X,Point_M19_Y,Point_M20_X,Point_M20_Y,Point_M21_X,Point_M21_Y,Point_M22_X,Point_M22_Y,Point_M23_X,Point_M23_Y,Point_M24_X,Point_M24_Y,Point_M25_X,Point_M25_Y,Point_M26_X,Point_M26_Y,Point_M27_X,Point_M27_Y,Point_M28_X,Point_M28_Y,Point_M29_X,Point_M29_Y\n"
   
class CropRandom(object):
    def __init__(self, size=CROP_SIZE, elem_name='image'):
        self.size = torch.tensor(size, dtype=torch.float)
        self.elem_name = elem_name

    def __call__(self, sample):
        if 'landmarks' in sample:
            img = sample[self.elem_name] #.copy()
            landmarks = sample['landmarks'].reshape(-1, 2)
            bound=landmarks[:,0].min(),landmarks[:,1].min(),landmarks[:,0].max(),landmarks[:,1].max()                        
            h,w,_ = img.shape
            min_sq = max(bound[3]-bound[1],bound[2]-bound[0])
            max_sq = min(w,h)            
            if min_sq+1<max_sq-1:
                sq = np.random.randint(min_sq+1,max_sq-1)
            else:
                sq = max_sq-1                
            
            min_dx = max(bound[2]-sq,0)
            max_dx = min(w-sq,bound[0])
            if min_dx<max_dx:
                dx = np.random.randint(min_dx,max_dx)
            else:
                dx = int(min_dx)
            
            min_dy = max(bound[3]-sq,0)
            max_dy = min(h-sq,bound[1])
            if min_dy<max_dy:
                dy = np.random.randint(min_dy,max_dy)
            else:                
                dy = int(min_dy)
                
            landmarks -= torch.tensor((dx, dy), dtype=landmarks.dtype)[None, :]                        
            sample['landmarks'] = landmarks.reshape(-1)
            sample[self.elem_name] = img[dy:dy+sq, dx:dx+sq]
            sample['dx'] = torch.tensor(dx,dtype=torch.short)
            sample['dy'] = torch.tensor(dy,dtype=torch.short)
        else:
            # random cropping is only defined for samples that carry landmarks
            raise RuntimeError("CropRandom requires 'landmarks' in the sample")
        return sample    
    

class ScaleMinSideToSize(object):
    def __init__(self, size=(CROP_SIZE, CROP_SIZE), elem_name='image'):
        self.size = torch.tensor(size, dtype=torch.float)
        self.elem_name = elem_name

    def __call__(self, sample):
        h, w, _ = sample[self.elem_name].shape        
        if h > w:
            f = self.size[0] / w
        else:
            f = self.size[1] / h

        sample[self.elem_name] = cv2.resize(sample[self.elem_name], None, fx=f, fy=f, interpolation=cv2.INTER_AREA)
        sample["scale_coef"] = f

        if 'landmarks' in sample:
            landmarks = sample['landmarks'].reshape(-1, 2).float()
            landmarks = landmarks * f
            sample['landmarks'] = landmarks.reshape(-1)

        return sample


class CropCenter(object):
    def __init__(self, size=128, elem_name='image'):
        self.size = size
        self.elem_name = elem_name

    def __call__(self, sample):
        img = sample[self.elem_name]
        h, w, _ = img.shape
        margin_h = (h - self.size) // 2
        margin_w = (w - self.size) // 2
        sample[self.elem_name] = img[margin_h:margin_h + self.size, margin_w:margin_w + self.size]
        sample["crop_margin_x"] = margin_w
        sample["crop_margin_y"] = margin_h

        if 'landmarks' in sample:
            landmarks = sample['landmarks'].reshape(-1, 2)
            landmarks -= torch.tensor((margin_w, margin_h), dtype=landmarks.dtype)[None, :]
            sample['landmarks'] = landmarks.reshape(-1)

        return sample
    
# class RandomFlipV(object):
#     def __init__(self, size=128, elem_name='image'):
#         self.size = size
#         self.elem_name = elem_name

#     def __call__(self, sample):
#         if np.random.randint(0,10)>4:
#             sample['flip'] = True                    
#             img = sample[self.elem_name]            
#             sample[self.elem_name] = img[:,::-1,:]
#             if 'landmarks' in sample:
#                 landmarks = sample['flip_landmarks']                
#                 landmarks[:,0] = img.shape[1]-landmarks[:,0] 
#                 sample['landmarks'] = landmarks #torch.as_tensor(flip_lm_v(landmarks,img.shape))
#         else:
#             sample['flip'] = False
#         return sample 
    
class TransformByKeys(object):
    def __init__(self, transform, names):
        self.transform = transform
        self.names = set(names)

    def __call__(self, sample):
        for name in self.names:
            if name in sample:
                sample[name] = self.transform(sample[name])

        return sample

class ThousandLandmarksDataset(data.Dataset):
    def __init__(self, root, transforms, split="train", size=None, ignore_image=None): # , isFlip=False
        super(ThousandLandmarksDataset, self).__init__()
        self.root = root
        landmark_file_name = os.path.join(root, 'landmarks.csv') if split != "test" \
            else os.path.join(root, "test_points.csv")
        images_root = os.path.join(root, "images")

        self.image_names = []
        self.landmarks = []
#         self.flip_landmarks = []

        if size is None:            
            with open(landmark_file_name, "rt") as fp:
                num_lines = sum(1 for line in fp)
        else:
            num_lines = size-1

        with open(landmark_file_name, "rt") as fp:
            for i, line in tqdm.tqdm(enumerate(fp)):
                if i == 0:
                    continue  # skip header
                if split == "train" and i == int(TRAIN_SIZE * num_lines):
                    break  # reached end of train part of data
                elif split == "val" and i < int(TRAIN_SIZE * num_lines):
                    continue  # has not reached start of val part of data
                if i>=int(num_lines):
                    break  # allows loading fewer images
                elements = line.strip().split("\t")
                if ignore_image is not None and elements[0] in ignore_image:
                    print("ignore: ",elements[0])
                    continue
                    
                image_name = os.path.join(images_root, elements[0])                
                self.image_names.append(image_name)            
                
                if split in ("train", "val"):
                    landmarks = list(map(np.int16, elements[1:]))
                    landmarks = np.array(landmarks, dtype=np.int16).reshape((len(landmarks) // 2, 2))                                                            
                    self.landmarks.append(landmarks)
#                     flip_landmarks = flip_lm_v(landmarks)
#                     self.flip_landmarks.append(flip_landmarks)

        if split in ("train", "val"):
            self.landmarks = torch.as_tensor(self.landmarks)
            # flip_landmarks is not built (the flip code above is commented out),
            # so it must not be converted here
        else:
            self.landmarks = None

        self.transforms = transforms

    def __getitem__(self, idx):
        sample = {}

        image = cv2.imread(self.image_names[idx])
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)                
        sample["image"] = image
        
        if self.landmarks is not None:
            landmarks = self.landmarks[idx].clone()
#             flip_landmarks = self.flip_landmarks[idx].clone()
            
            sample["landmarks"] = landmarks
#             sample["flip_landmarks"] = flip_landmarks
        
        if self.transforms is not None:
            sample = self.transforms(sample)

        return sample

    def __len__(self):
        return len(self.image_names)

def restore_landmarks_batch_ex(landmarks, fs, margins_x, margins_y,dx,dy):
    landmarks[:, :, 0] += margins_x[:, None]
    landmarks[:, :, 1] += margins_y[:, None]
    landmarks /= fs[:, None, None]
    landmarks[:, :, 0] += dx[:, None]
    landmarks[:, :, 1] += dy[:, None]
    return landmarks
def restore_landmarks_batch(landmarks, fs, margins_x, margins_y):
    landmarks[:, :, 0] += margins_x[:, None]
    landmarks[:, :, 1] += margins_y[:, None]
    landmarks /= fs[:, None, None]    
    return landmarks


def create_submission(path_to_data, test_predictions, path_to_submission_file):
    test_dir = os.path.join(path_to_data, "test")
    mapping_path = os.path.join(test_dir, 'test_points.csv')
    mapping = pd.read_csv(mapping_path, delimiter='\t')

    # the context manager guarantees that the file is flushed and closed
    with open(path_to_submission_file, 'w') as wf:
        wf.write(SUBMISSION_HEADER)
        for i, row in mapping.iterrows():
            file_name = row.iloc[0]
            point_index_list = np.array(eval(row.iloc[1]))
            points_for_image = test_predictions[i]
            # np.int was removed in NumPy 1.24; the builtin int is equivalent here
            needed_points = points_for_image[point_index_list].astype(int)
            wf.write(file_name + ',' + ','.join(map(str, needed_points.reshape(2 * len(point_index_list)))) + '\n')


In [None]:
def draw_landmarks(image, landmarks):
    for point in landmarks:
        x, y = point.astype(int)  # np.int was removed in NumPy 1.24
        cv2.circle(image, (x, y), 1, (128, 0, 128), 1, -1)
    return image


In [None]:
train_loss_fn = fnn.mse_loss
valid_loss_fn = MseW()

In [None]:
device = torch.device("cuda:0") if use_gpu else torch.device("cpu")

In [None]:
# training pipeline
train_transforms = transforms.Compose([
#     RandomFlipV(),
    CropRandom(),    
    ScaleMinSideToSize((CROP_SIZE, CROP_SIZE)),
    CropCenter(CROP_SIZE),   
    TransformByKeys(transforms.ToPILImage(), ("image",)),
    TransformByKeys(transforms.ToTensor(), ("image",)),
    TransformByKeys(transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]), ("image",)),
])
# validation and prediction pipeline
val_transforms = transforms.Compose([
    ScaleMinSideToSize((CROP_SIZE, CROP_SIZE)),
    CropCenter(CROP_SIZE),
    TransformByKeys(transforms.ToPILImage(), ("image",)),
    TransformByKeys(transforms.ToTensor(), ("image",)),
    TransformByKeys(transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]), ("image",)),
])

In [None]:
%%time
print("Reading data...")
train_dataset = ThousandLandmarksDataset(os.path.join(data_dir, 'train'), train_transforms, split="train",size = data_size) 
train_dataloader = data.DataLoader(train_dataset, batch_size=batch_size, num_workers=0, pin_memory=True,drop_last=True,
                                   shuffle=True)
print(len(train_dataset))

In [None]:
val_dataset = ThousandLandmarksDataset(os.path.join(data_dir, 'train'), val_transforms, split="val",size = data_size)
val_dataloader = data.DataLoader(val_dataset, batch_size=batch_size, num_workers=0, pin_memory=True,drop_last=False,shuffle=False)
print(len(val_dataset))

In [None]:
print("Creating model...")
device = torch.device("cuda:0") if use_gpu else torch.device("cpu")
model = models.resnet50(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 2 * NUM_PTS, bias=True)
model.to(device)
optimizer = optim.AdamW(model.parameters(), lr=learning_rate, amsgrad=True)

In [None]:
# 2. train & validate
print("Ready for training...")
best_val_loss = np.inf
for epoch in range(0,epochs):    
    train_loss = train(model, train_dataloader, train_loss_fn, optimizer, device=device)
    val_loss = validate(model, val_dataloader, valid_loss_fn, device=device)
    print("Epoch #{:2}:\ttrain loss: {:5.4f}\tval loss: {:5.4f}".format(epoch, train_loss, val_loss))    
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        with open(f"{prj_name}_best.pth", "wb") as fp:
            torch.save(model.state_dict(), fp)
    with open(f"{prj_name}_"+str(epoch)+".pth", "wb") as fp:
            torch.save(model.state_dict(), fp)


Now predict the landmarks for the validation set.

In [None]:
def dataset_landmark_to_pred(ds):
    lm = ds['landmarks'].numpy().copy()
    dx = ds['crop_margin_x']
    dy = ds['crop_margin_y']    
    for ix in range(0,len(lm),2):
        lm[ix]+=dx
    for iy in range(1,len(lm),2):
        lm[iy]+=dy
    coef = ds['scale_coef'].numpy()    
    lm=lm/coef
    lm = lm.reshape(-1,2)
    return lm

def calc_err(idx,landmarks,val_dataset,loss_fn):
    return loss_fn(torch.tensor(landmarks[idx]), torch.tensor(dataset_landmark_to_pred(val_dataset[idx]))).numpy()

def show_dataset_image(fn,ds):    
    image = cv2.imread(fn)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)    
#     dx = ds['crop_margin_x']
#     dy = ds['crop_margin_y']
    lm = dataset_landmark_to_pred(ds)
    image = draw_landmarks(image, lm)    
    plt.imshow(image)

def show_predict_image(fn,lm):
    image = cv2.imread(fn)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = draw_landmarks(image, lm)
    plt.imshow(image)    

In [None]:
model = models.resnet50(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 2 * NUM_PTS, bias=True)
model.load_state_dict(torch.load(f"{prj_name}_best.pth"))

model.to(device)
model.eval()

In [None]:
%%time
print("Reading data...")
# switch to the validation transforms to remove randomness
train_dataset = ThousandLandmarksDataset(os.path.join(data_dir, 'train'), val_transforms, split="train",size = data_size) 
train_dataloader = data.DataLoader(train_dataset, batch_size=batch_size, num_workers=0, pin_memory=True,drop_last=False,shuffle=False)  # disable shuffling
print(len(train_dataset))
train_predictions = predict(model, train_dataloader, device)

In [None]:
val_predictions = predict(model, val_dataloader, device)

Next, compare the annotations with the model's predictions.

In [None]:
%%time
dataset = train_dataset
landmarks = train_predictions
max_len = len(dataset)
print(max_len)
max_err_idxs = []
for i in range(0,max_len):
    err = calc_err(i,landmarks,dataset,train_loss_fn)
    if err>100:
        max_err_idxs.append(i)
        print(i,err)
print(max_err_idxs)

In [None]:
%%time
dataset = val_dataset
landmarks = val_predictions
max_len = len(dataset)
print(max_len)
max_err_idxs_val = []
for i in range(0,max_len):
    err = calc_err(i,landmarks,dataset,train_loss_fn)
    if err>100:
        max_err_idxs_val.append(i)
        print(i,err)
print(max_err_idxs_val)

In [None]:
ignore_image = set()
for idx in max_err_idxs:
    ignore_image.add(os.path.basename(train_dataset.image_names[idx]))
    
for idx in max_err_idxs_val:    
    ignore_image.add(os.path.basename(val_dataset.image_names[idx]))
print(len(ignore_image))
with open("ignore_images_.lst", "wt") as fp:
    for s in ignore_image:    
        print(s, file = fp)

In [None]:
NUM_IMAGES_TO_SHOW = 16
NUM_COLS = 4
NUM_ROWS = NUM_IMAGES_TO_SHOW // NUM_COLS + int(NUM_IMAGES_TO_SHOW % NUM_COLS != 0)

plt.figure(figsize=(25, NUM_ROWS * 8))
for i, idx in enumerate(max_err_idxs_val[:16], 1):    
    plt.subplot(NUM_ROWS, NUM_COLS, i)
    show_predict_image(val_dataset.image_names[idx],val_predictions[idx])
plt.tight_layout()
plt.show()

In [None]:
plt.figure(figsize=(25, NUM_ROWS * 8))
for i, idx in enumerate(max_err_idxs_val[:16], 1):    
    plt.subplot(NUM_ROWS, NUM_COLS, i)
    show_dataset_image(val_dataset.image_names[idx],val_dataset[idx])
plt.tight_layout()
plt.show()

In [None]:
plt.figure(figsize=(25, NUM_ROWS * 8))
for i, idx in enumerate(max_err_idxs[:16], 1):    
    plt.subplot(NUM_ROWS, NUM_COLS, i)
    show_predict_image(train_dataset.image_names[idx],train_predictions[idx])
plt.tight_layout()
plt.show()

In [None]:
plt.figure(figsize=(25, NUM_ROWS * 8))
for i, idx in enumerate(max_err_idxs[:16], 1):    
    plt.subplot(NUM_ROWS, NUM_COLS, i)
    show_dataset_image(train_dataset.image_names[idx],train_dataset[idx])
plt.tight_layout()
plt.show()