# Object Detection - Mission 7
#### Ensemble
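Ensembling is one of the factors most directly tied to the quality of the final output, and it can deliver good results for the time invested.
Write ensemble code using the models you have trained so far, or the Sample Submission.
<br>For details on ensembling, see Lecture 09: Ready for Competition.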

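## Competition Dataset Structure
Implement a custom dataset and apply an ensemble method to the competition dataset. <br>
For a detailed overview of the dataset, see the data description on the [competition platform](https://next.stages.ai/competitions/).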
> Copyright: CC BY 2.0

### dataset
    ├── train.json
    ├── test.json
    ├── train
    └── test

In [1]:
!pip install ensemble_boxes

Collecting ensemble_boxes
  Downloading ensemble_boxes-1.0.9-py3-none-any.whl (23 kB)
Collecting numba (from ensemble_boxes)
  Downloading numba-0.58.1-cp310-cp310-macosx_11_0_arm64.whl.metadata (2.7 kB)
Collecting llvmlite<0.42,>=0.41.0dev0 (from numba->ensemble_boxes)
  Downloading llvmlite-0.41.1-cp310-cp310-macosx_11_0_arm64.whl.metadata (4.8 kB)
Downloading numba-0.58.1-cp310-cp310-macosx_11_0_arm64.whl (2.6 MB)
Downloading llvmlite-0.41.1-cp310-cp310-macosx_11_0_arm64.whl (28.8 MB)
Installing collected packages: llvmlite, numba, ensemble_boxes
Successfully installed ensemble_boxes-1.0.9 llvmlite-0.41.1 numba-0.58.1


In [47]:
import numpy as np
import pandas as pd
# Import only the function used below instead of a wildcard import
from ensemble_boxes import weighted_boxes_fusion
from pycocotools.coco import COCO

In [60]:
# ensemble csv files
submission_files = ['./universe.csv', './cascade_focal.csv', './retina101.csv', './swin_t.csv', './detr.csv', './yolo.csv']
submission_df = [pd.read_csv(file, index_col=False) for file in submission_files]

In [61]:
image_ids = submission_df[0]['image_id'].tolist()
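
Each row of a submission file stores every detection for one image in a single `PredictionString`. Judging from how the ensemble code later slices each group of six values, the fields appear to be `label score xmin ymin xmax ymax`. The following is a minimal sketch of parsing and normalizing such a string using only the standard library. The helper names `parse_prediction_string` and `normalize_box` and the 1024x1024 image size are assumptions made for illustration, not part of the original notebook.

```python
from typing import List, Tuple


def parse_prediction_string(pred: str) -> List[Tuple[int, float, List[float]]]:
    """Split a PredictionString into (label, score, [xmin, ymin, xmax, ymax]) tuples.

    The field order is inferred from the notebook's slicing, not from an official spec.
    """
    tokens = str(pred).split()
    if len(tokens) < 6:  # empty string, or a stringified NaN such as 'nan'
        return []
    detections = []
    # Ignore any incomplete trailing group of fewer than six values
    for start in range(0, len(tokens) - len(tokens) % 6, 6):
        label, score, *box = tokens[start:start + 6]
        detections.append((int(label), float(score), [float(v) for v in box]))
    return detections


def normalize_box(box: List[float], width: int, height: int) -> List[float]:
    """Scale pixel coordinates into the [0, 1] range."""
    xmin, ymin, xmax, ymax = box
    return [xmin / width, ymin / height, xmax / width, ymax / height]


# Hypothetical example string and image size (1024 x 1024)
example = "7 0.9 512.0 256.0 768.0 512.0 1 0.5 0.0 0.0 1024.0 1024.0"
parsed = parse_prediction_string(example)
normalized = [normalize_box(box, 1024, 1024) for _, _, box in parsed]
print(parsed[0])    # (7, 0.9, [512.0, 256.0, 768.0, 512.0])
print(normalized)   # [[0.5, 0.25, 0.75, 0.5], [0.0, 0.0, 1.0, 1.0]]
```

Normalizing matters because `ensemble_boxes` expects box coordinates in the `[0, 1]` range, so the fused boxes have to be scaled back to pixels before they are written out.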

In [62]:
# JSON annotation file used to load image info (width, height) for the files being ensembled
annotation = './test.json'
coco = COCO(annotation)

loading annotations into memory...
Done (t=0.02s)
creating index...
index created!


In [63]:
weights = [0.5868, 0.5514, 0.4473, 0.4979, 0.4950, 0.4548]
min_value = min(weights)
scaled_list = [value / min_value for value in weights]
scaled_list

[1.3118712273641853,
 1.2327297116029512,
 1.0,
 1.113123183545719,
 1.1066398390342054,
 1.016767270288397]

In [64]:
len(scaled_list)

6

In [65]:
prediction_strings = []
file_names = []
# IoU threshold used for the ensemble; try different values to suit the competition metric
iou_thr = 0.6
skip_box_thr = 0.001
weights = scaled_list

# For each image id, extract the box coordinates from every submission file
for i, image_id in enumerate(image_ids):
    boxes_list = []
    scores_list = []
    labels_list = []
    model_weights = []
    # Assumes the COCO image ids in test.json run 0..N-1 in the same order as the CSV rows
    image_info = coco.loadImgs(i)[0]
    width, height = image_info['width'], image_info['height']

    # Load the predicted boxes from each submission file
    for df, weight in zip(submission_df, weights):
        predict_string = df[df['image_id'] == image_id]['PredictionString'].tolist()[0]
        predict_list = str(predict_string).split()

        # Skip models with no predictions for this image (empty string or NaN)
        if len(predict_list) <= 1:
            continue

        predict_list = np.reshape(predict_list, (-1, 6))

        # Normalize pixel coordinates to the [0, 1] range
        box_list = [
            [float(box[0]) / width, float(box[1]) / height,
             float(box[2]) / width, float(box[3]) / height]
            for box in predict_list[:, 2:6].tolist()
        ]

        boxes_list.append(box_list)
        scores_list.append(list(map(float, predict_list[:, 1].tolist())))
        labels_list.append(list(map(int, predict_list[:, 0].tolist())))
        # Record this model's weight so the weights stay aligned with boxes_list
        # even when another model is skipped for this image
        model_weights.append(weight)

    # Run the ensemble if any predicted boxes exist
    prediction_string = ''
    if len(boxes_list):
        boxes, scores, labels = weighted_boxes_fusion(boxes_list, scores_list, labels_list, weights=model_weights, iou_thr=iou_thr, skip_box_thr=skip_box_thr)
        for box, score, label in zip(boxes, scores, labels):
            # Convert the fused boxes back to pixel coordinates
            fields = [int(label), score, box[0] * width, box[1] * height, box[2] * width, box[3] * height]
            prediction_string += ' '.join(map(str, fields)) + ' '

    prediction_strings.append(prediction_string)
    file_names.append(image_id)
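
To give some intuition for what `weighted_boxes_fusion` does with overlapping predictions, here is a deliberately simplified sketch of the core idea: boxes whose IoU exceeds the threshold are merged into one box whose coordinates are the confidence-weighted average of its members. The helpers `iou` and `fuse_cluster` are hypothetical names written for this illustration. The real library also handles per-model weights, per-label clustering, and other details that this sketch omits.

```python
from typing import List, Sequence, Tuple


def iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fuse_cluster(boxes: List[Sequence[float]], scores: List[float]) -> Tuple[List[float], float]:
    """Merge already-matched boxes: confidence-weighted coordinates, mean confidence."""
    total = sum(scores)
    fused = [sum(s * box[k] for box, s in zip(boxes, scores)) / total for k in range(4)]
    return fused, total / len(scores)


# Two hypothetical predictions of the same object from two models (normalized coordinates)
box_a, score_a = [0.0, 0.0, 0.5, 0.5], 0.75
box_b, score_b = [0.0, 0.0, 0.5, 0.625], 0.25

overlap = iou(box_a, box_b)
print(overlap)  # 0.8, above the iou_thr of 0.6 used in the notebook

if overlap > 0.6:
    fused_box, fused_score = fuse_cluster([box_a, box_b], [score_a, score_b])
    print(fused_box, fused_score)  # [0.0, 0.0, 0.5, 0.53125] 0.5
```

Unlike NMS, which keeps only the highest-scoring box and discards the rest, this averaging lets every matched prediction contribute to the final coordinates. The more confident box pulls the result toward itself.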



In [66]:
submission = pd.DataFrame()
submission['PredictionString'] = prediction_strings
submission['image_id'] = file_names
submission.to_csv('./ensemble9.csv', index=False)

submission.head()

| | PredictionString | image_id |
|---|---|---|
| 0 | 7 0.9758523728560446 603.2963256835938 515.428... | test/0000.jpg |
| 1 | 5 0.7109988580494588 133.72361755371094 0.3450... | test/0001.jpg |
| 2 | 1 0.8299846370247225 295.8479919433594 316.513... | test/0002.jpg |
| 3 | 9 0.5527059433879347 125.22518920898438 253.94... | test/0003.jpg |
| 4 | 0 0.3793897142860516 426.0456848144531 408.948... | test/0004.jpg |
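
Before uploading, it can be worth checking that every `PredictionString` contains a whole number of six-value groups. The sketch below does this with the standard library only. The function name `invalid_rows` and the sample CSV text are hypothetical, and the six-values-per-box layout is an assumption based on the code above.

```python
import csv
import io
from typing import List


def invalid_rows(csv_text: str) -> List[str]:
    """Return image_ids whose PredictionString is not a whole number of 6-value groups."""
    bad = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        tokens = row["PredictionString"].split()
        if len(tokens) % 6 != 0:
            bad.append(row["image_id"])
    return bad


# Hypothetical sample in the same column layout as the submission file
sample = (
    "PredictionString,image_id\n"
    "7 0.9 10.0 20.0 30.0 40.0 ,test/0000.jpg\n"
    "5 0.7 1.0 2.0 3.0,test/0001.jpg\n"   # one coordinate is missing
    ",test/0002.jpg\n"                     # an image with no detections is still valid
)
print(invalid_rows(sample))  # ['test/0001.jpg']
```

To run the same check on the real file, read its contents first, for example `invalid_rows(open('./ensemble9.csv').read())`.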


### Reference
https://github.com/ZFTurbo/Weighted-Boxes-Fusion

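### **Content License**

<font color='red'><b>**WARNING**</b></font> : **The intellectual property rights to this educational content belong to the NAVER Connect Foundation. Leaking or modifying this content externally by any means is strictly prohibited.** It may be used for non-profit educational and research activities only, and doing so requires the Foundation's permission. Violations may result in liability under the applicable laws.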
