<a href="https://colab.research.google.com/github/ProtossDragoon/CoMoLab/blob/master/CV/Yolov1Keras/Yolov1_Keras_Implementation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# YOLOv1 Model

In this notebook I implement YOLOv1 as described in the paper You Only Look Once. The goal is to replicate the model from the paper and, in the process, understand the nuances of using Keras on a complex problem.

- https://www.maskaravivek.com/post/yolov1/

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [None]:
import tensorflow as tf
import matplotlib.pyplot as plt

## Data Preprocessing
I use the VOC 2007 dataset because its size is manageable, which makes it easy to run on Google Colab.

First, I download and extract the dataset.

In [None]:
%cd /content/gdrive/"My Drive"/
!rm -rf temp # start from a clean directory; -f avoids an error when temp does not exist yet
!mkdir temp
%cd temp


!wget http://pjreddie.com/media/files/VOCtrainval_06-Nov-2007.tar
!wget http://pjreddie.com/media/files/VOCtest_06-Nov-2007.tar

!tar xvf VOCtrainval_06-Nov-2007.tar # extract the tar archive into the current directory
!tar xvf VOCtest_06-Nov-2007.tar

!rm VOCtrainval_06-Nov-2007.tar
!rm VOCtest_06-Nov-2007.tar

In [None]:
%cd /content/gdrive/"My Drive"/temp
!ls

VOCdevkit


*Once the data above has been downloaded, you do not need to download it again. The download takes a very long time, so take care not to rerun that cell needlessly.
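
To avoid repeating the slow download, a small guard can check whether the extracted folders are already in place. The following is a minimal sketch, assuming it is run from the `temp` working directory used above; the helper name `dataset_present` is hypothetical and not part of the original notebook.

```python
import os


def dataset_present(root="."):
    """Return True if the extracted VOC2007 folders already exist under root."""
    voc_dir = os.path.join(root, "VOCdevkit", "VOC2007")
    # These are the sub-folders the later cells in this notebook read from.
    required = ("Annotations", "ImageSets", "JPEGImages")
    return all(os.path.isdir(os.path.join(voc_dir, name)) for name in required)


if dataset_present():
    print("VOC2007 already extracted; the download cell can be skipped.")
else:
    print("VOC2007 not found; run the download cell above.")
```

The check only looks at folder names, so it does not detect a partially extracted archive.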

Next, we process the annotations and write the labels to a text file. A text file is easier to consume than XML.
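
For orientation, the sketch below parses a hand-written annotation in the same shape as the VOC XML files. The sample values, the helper name `extract_boxes`, and the reduced two-class mapping are assumptions made for illustration; real annotation files carry more fields than shown here.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; the coordinates are made up for illustration.
SAMPLE_XML = """
<annotation>
  <filename>example.jpg</filename>
  <object>
    <name>dog</name>
    <difficult>0</difficult>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <difficult>1</difficult>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>498</ymax></bndbox>
  </object>
</annotation>
"""


def extract_boxes(xml_text, class_to_id):
    """Return [xmin, ymin, xmax, ymax, class_id] for each usable object."""
    root = ET.fromstring(xml_text.strip())
    boxes = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        # Skip unknown classes and objects flagged as difficult.
        if name not in class_to_id or int(obj.find("difficult").text) == 1:
            continue
        bndbox = obj.find("bndbox")
        coords = [int(bndbox.find(key).text) for key in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append(coords + [class_to_id[name]])
    return boxes


boxes = extract_boxes(SAMPLE_XML, {"dog": 11, "person": 14})
print(boxes)  # [[48, 240, 195, 371, 11]]
```

Only the dog survives, because the person is flagged as difficult.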

In [None]:
%cd /content/gdrive/"My Drive"/temp

import argparse
import xml.etree.ElementTree as ET
import os

parser = argparse.ArgumentParser(description='Build Annotations.')
parser.add_argument('dir', default='..', help='Annotations.')

sets = [('2007', 'train'), ('2007', 'val'), ('2007', 'test')]

# Classes of interest and their corresponding indices
classes_num = {'aeroplane': 0, 'bicycle': 1, 'bird': 2, 'boat': 3, 'bottle': 4, 'bus': 5,
               'car': 6, 'cat': 7, 'chair': 8, 'cow': 9, 'diningtable': 10, 'dog': 11,
               'horse': 12, 'motorbike': 13, 'person': 14, 'pottedplant': 15, 'sheep': 16,
               'sofa': 17, 'train': 18, 'tvmonitor': 19}


def convert_annotation(year, image_id, f):
    """Append the bounding boxes of one image to the open label file f."""
    in_file = 'VOCdevkit/VOC%s/Annotations/%s.xml' % (year, image_id)
    tree = ET.parse(in_file)
    root = tree.getroot()

    for obj in root.iter('object'): # On Python iterators, see: https://python.bakyeono.net/chapter-7-4.html
                                    # On ElementTree iter(), see: https://docs.python.org/2/library/xml.etree.elementtree.html#finding-interesting-elements
        difficult = obj.find('difficult').text # VOC flag: 1 marks an object that is hard to recognize.
        cls = obj.find('name').text
        if cls not in classes_num or int(difficult) == 1: # Skip classes we are not interested in and objects flagged as difficult.
            continue
        cls_id = classes_num[cls]
        xmlbox = obj.find('bndbox')
        b = (int(xmlbox.find('xmin').text), int(xmlbox.find('ymin').text),
             int(xmlbox.find('xmax').text), int(xmlbox.find('ymax').text))
        f.write(' ' + ','.join([str(a) for a in b]) + ',' + str(cls_id)) # On str.join, see: https://zetawiki.com/wiki/%ED%8C%8C%EC%9D%B4%EC%8D%AC_join()
        # Format this function writes to the file:
        # " xmin,ymin,xmax,ymax,1 xmin,ymin,xmax,ymax,3 (... one entry per object)"


for year, image_set in sets:
    print(year, image_set)
    with open('VOCdevkit/VOC%s/ImageSets/Main/%s.txt' % (year, image_set), 'r') as f: # On Python context managers, see: https://sjquant.tistory.com/12
        image_ids = f.read().strip().split() # On file I/O, see: https://wikidocs.net/26
    with open(os.path.join("VOCdevkit", '%s_%s.txt' % (year, image_set)), 'w') as f:
        for image_id in image_ids:
            f.write('%s/VOC%s/JPEGImages/%s.jpg' % ("VOCdevkit", year, image_id))
            convert_annotation(year, image_id, f)
            f.write('\n')
            # Format written to the file on each loop iteration:
            # "VOCdevkit/VOC2007/JPEGImages/<image name>.jpg xmin,ymin,xmax,ymax,1 xmin,ymin,xmax,ymax,3 (... one entry per object)"
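
Reading the label file back is the reverse of the format described in the comments above. Here is a minimal sketch of parsing a single line; the helper name `parse_label_line` and the sample line are hypothetical, and the sketch assumes image paths contain no spaces, which holds for the relative `VOCdevkit/...` paths written above.

```python
def parse_label_line(line):
    """Split one label line into (image_path, [(xmin, ymin, xmax, ymax, class_id), ...])."""
    parts = line.strip().split()
    image_path, box_fields = parts[0], parts[1:]
    # Each remaining field is "xmin,ymin,xmax,ymax,class_id".
    boxes = [tuple(int(value) for value in field.split(",")) for field in box_fields]
    return image_path, boxes


# Made-up sample line in the format produced by the loop above.
sample = "VOCdevkit/VOC2007/JPEGImages/000001.jpg 48,240,195,371,11 8,12,352,498,14\n"
path, boxes = parse_label_line(sample)
print(path)
print(boxes)
```

An image whose objects were all skipped produces a line with only the path, in which case the box list is empty.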