# VOC
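The PASCAL VOC challenge comprises several tasks: `Object Classification`, `Object Detection`, `Object Segmentation`, `Human Layout`, and `Action Classification`. The widely used releases are the PASCAL VOC 2007 and 2012 datasets. Objects fall into 4 top-level categories (vehicle, indoor, animal, person) and 20 classes in total (21 when background is counted). The classes in each category are:<br>
- **Person**: person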
- **Animal**: bird, cat, cow, dog, horse, sheep
- **Vehicle**: aeroplane, bicycle, boat, bus, car, motorbike, train
- **Indoor**: bottle, chair, dining table, potted plant, sofa, tv/monitor
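
The grouping above can be written down as a small lookup table. This is only an illustrative sketch: the names `VOC_SUPERCLASSES` and `CLASS_TO_SUPERCLASS` are hypothetical, and the class strings use the spellings found in the annotation files (for example `diningtable` rather than "dining table").

```python
# Hypothetical lookup table: top-level category -> class names as spelled in the XML files.
VOC_SUPERCLASSES = {
    'person': ['person'],
    'animal': ['bird', 'cat', 'cow', 'dog', 'horse', 'sheep'],
    'vehicle': ['aeroplane', 'bicycle', 'boat', 'bus', 'car', 'motorbike', 'train'],
    'indoor': ['bottle', 'chair', 'diningtable', 'pottedplant', 'sofa', 'tvmonitor'],
}

# Reverse lookup: class name -> top-level category.
CLASS_TO_SUPERCLASS = {
    cls: group
    for group, classes in VOC_SUPERCLASSES.items()
    for cls in classes
}

print(len(CLASS_TO_SUPERCLASS))    # number of object classes, background excluded
print(CLASS_TO_SUPERCLASS['dog'])  # category of a single class
```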

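## Dataset file structure
After downloading the dataset, its file structure is as follows (using VOC2007 as an example):<br>
├─ Annotations: label files for the detection task, in XML format; file names correspond one-to-one to image names<br>
├─ ImageSets: contains three subfolders, Layout, Main, and Segmentation; Main holds the dataset split files for classification and detection<br>
├─ JPEGImages: the image files in .jpg format<br>
├─ SegmentationClass: segmentation images labeled by class<br>
└─ SegmentationObject: segmentation images labeled by object<br>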

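├─ Main<br>
│   ├─ train.txt: names of the images used for training, 2501 in total<br>
│   ├─ val.txt: names of the images used for validation, 2510 in total<br>
│   ├─ trainval.txt: the union of train and val, 5011 in total<br>
│   └─ test.txt: names of the images used for testing, 4952 in total<br>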
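
Each split file lists one image name per line, so loading a split takes only a few lines. A minimal sketch, assuming a plain text file with one entry per line; the function name `read_image_ids` is hypothetical. It keeps only the first whitespace-separated token, so a line that carries an extra column after the name is still handled.

```python
def read_image_ids(split_file):
    """Return the image names listed in a split file, one per line."""
    with open(split_file, encoding='utf-8') as f:
        # Keep the first token of each line and skip blank lines.
        return [line.split()[0] for line in f if line.strip()]

# Example call (the path is an assumption about your local layout):
# train_ids = read_image_ids('VOC2007/ImageSets/Main/train.txt')
```

The returned names carry no extension, so the matching image is `<name>.jpg` under JPEGImages and the matching label file is `<name>.xml` under Annotations.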


## XML parsing


```
<annotation>
	<folder>VOC2007</folder>
	<filename>000001.jpg</filename>  <!-- file name -->
	<source>
		<database>The VOC2007 Database</database>
		<annotation>PASCAL VOC2007</annotation>
		<image>flickr</image>
		<flickrid>341012865</flickrid>
	</source>
	<owner>
		<flickrid>Fried Camels</flickrid>
		<name>Jinky the Fruit Bat</name>
	</owner>
	<size>  <!-- image size; used to normalize the top-left and bottom-right bbox coordinates -->
		<width>353</width>
		<height>500</height>
		<depth>3</depth>
	</size>
	<segmented>0</segmented>  <!-- whether the image is used for segmentation -->
	<object>
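		<name>dog</name>  <!-- object class -->
		<pose>Left</pose>  <!-- viewpoint: Frontal, Rear, Left, Right, Unspecified -->
		<truncated>1</truncated>  <!-- whether the object is truncated: the box does not cover the full object, e.g. part of it lies outside the image -->
		<difficult>0</difficult>  <!-- whether the object is hard to detect, judged mainly from object size, lighting changes, and image quality -->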
		<bndbox>
			<xmin>48</xmin>
			<ymin>240</ymin>
			<xmax>195</xmax>
			<ymax>371</ymax>
		</bndbox>
	</object>
	<object>
		<name>person</name>
		<pose>Left</pose>
		<truncated>1</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>8</xmin>
			<ymin>12</ymin>
			<xmax>352</xmax>
			<ymax>498</ymax>
		</bndbox>
	</object>
</annotation>
```
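
The `<size>` element is what makes normalization possible: dividing x coordinates by the width and y coordinates by the height maps every box into the range [0, 1]. A minimal sketch of that step, using a trimmed copy of the annotation above; the function name `normalized_boxes` and the constant `SAMPLE_XML` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the annotation shown above (illustration only).
SAMPLE_XML = """
<annotation>
    <size><width>353</width><height>500</height><depth>3</depth></size>
    <object>
        <name>dog</name>
        <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
    </object>
</annotation>
"""

def normalized_boxes(xml_text):
    """Return (name, [xmin, ymin, xmax, ymax]) pairs scaled by the image size."""
    root = ET.fromstring(xml_text)
    size = root.find('size')
    width = float(size.find('width').text)
    height = float(size.find('height').text)
    results = []
    for obj in root.findall('object'):
        box = obj.find('bndbox')
        # x values are divided by the width, y values by the height.
        coords = [
            float(box.find('xmin').text) / width,
            float(box.find('ymin').text) / height,
            float(box.find('xmax').text) / width,
            float(box.find('ymax').text) / height,
        ]
        results.append((obj.find('name').text, coords))
    return results

print(normalized_boxes(SAMPLE_XML))
```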

## Parsing example

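```python
import os

# Root of the extracted VOC2007 trainval archive (adjust to your local path).
path_train_val = r'H:\deepLearning\dataset\voc\VOCtrainval_06-Nov-2007\VOC2007'

# os.listdir returns entries in arbitrary order, so sort both listings.
path_anno = os.path.join(path_train_val, 'Annotations')
files_anno = sorted(os.listdir(path_anno))

path_image = os.path.join(path_train_val, 'JPEGImages')
files_image = sorted(os.listdir(path_image))
```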

```python
files_anno[:2]
```

```
['000005.xml', '000007.xml']
```

```python
files_image[:2]
```

```
['000005.jpg', '000007.jpg']
```
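
Because annotation and image file names correspond one-to-one, the two listings can be paired by their shared stem instead of relying on list positions. A small sketch of that idea; the function name `pair_by_stem` is hypothetical and the file names in the example are made up for illustration.

```python
import os

def pair_by_stem(anno_files, image_files):
    """Pair annotation and image file names that share the same name without extension."""
    images = {os.path.splitext(name)[0]: name for name in image_files}
    pairs = []
    for anno in sorted(anno_files):
        stem = os.path.splitext(anno)[0]
        # Annotations without a matching image are skipped.
        if stem in images:
            pairs.append((anno, images[stem]))
    return pairs

print(pair_by_stem(['000007.xml', '000005.xml'],
                   ['000005.jpg', '000007.jpg', '000001.jpg']))
# -> [('000005.xml', '000005.jpg'), ('000007.xml', '000007.jpg')]
```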

```python
# The 20 object classes, spelled as in the annotation files (background is not included).
VOC_CLASSES = (
    'aeroplane', 'bicycle', 'bird', 'boat',
    'bottle', 'bus', 'car', 'cat', 'chair',
    'cow', 'diningtable', 'dog', 'horse',
    'motorbike', 'person', 'pottedplant',
    'sheep', 'sofa', 'train', 'tvmonitor')
```
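
A class tuple like this is usually turned into a name-to-index mapping so that labels can be stored as integers. A minimal sketch, repeating the tuple so the block runs on its own; `CLASS_TO_INDEX` and `CLASS_TO_INDEX_WITH_BG` are hypothetical names, and reserving index 0 for background is an assumption about the downstream model rather than something the dataset prescribes.

```python
VOC_CLASSES = (
    'aeroplane', 'bicycle', 'bird', 'boat',
    'bottle', 'bus', 'car', 'cat', 'chair',
    'cow', 'diningtable', 'dog', 'horse',
    'motorbike', 'person', 'pottedplant',
    'sheep', 'sofa', 'train', 'tvmonitor')

# Labels 0..19, one per object class.
CLASS_TO_INDEX = {name: i for i, name in enumerate(VOC_CLASSES)}

# Variant for models that reserve label 0 for background (assumption): labels 1..20.
CLASS_TO_INDEX_WITH_BG = {name: i + 1 for i, name in enumerate(VOC_CLASSES)}

print(CLASS_TO_INDEX['dog'], CLASS_TO_INDEX_WITH_BG['dog'])
# -> 11 12
```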

```python
import xml.etree.ElementTree as ET


def parse_rec(filename):
    """Parse a PASCAL VOC XML file and return its objects, skipping difficult ones."""
    tree = ET.parse(filename)
    objects = []
    for obj in tree.findall('object'):
        name = obj.find('name').text
        difficult = int(obj.find('difficult').text)
        # Objects flagged as difficult are reported and left out of the result.
        if difficult == 1:
            print(f'Warning: the -{name}- in -{filename}- is difficult to detect')
            continue
        bbox = obj.find('bndbox')
        obj_struct = {
            'name': name,
            # Coordinates are stored as text; float() tolerates decimal values.
            'bbox': [int(float(bbox.find(tag).text))
                     for tag in ('xmin', 'ymin', 'xmax', 'ymax')],
        }
        objects.append(obj_struct)
    return objects
```

```python
xml_file = os.path.join(path_anno, files_anno[2])
parse_rec(xml_file)
```

```
[{'name': 'horse', 'bbox': [69, 172, 270, 330]},
 {'name': 'person', 'bbox': [150, 141, 229, 284]},
 {'name': 'person', 'bbox': [285, 201, 327, 331]},
 {'name': 'person', 'bbox': [258, 198, 297, 329]}]
```
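
`parse_rec` drops every object flagged as difficult. An alternative is to keep those objects and carry the flags along, so that later code can decide how to treat them. The sketch below shows that variant under stated assumptions: `parse_objects_with_flags` and `SAMPLE_XML` are hypothetical names, and the second object in the sample is invented to exercise the difficult flag.

```python
import xml.etree.ElementTree as ET

# Illustrative annotation; the second object is made up to show a difficult entry.
SAMPLE_XML = """
<annotation>
    <object>
        <name>dog</name>
        <truncated>1</truncated>
        <difficult>0</difficult>
        <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
    </object>
    <object>
        <name>person</name>
        <truncated>0</truncated>
        <difficult>1</difficult>
        <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>498</ymax></bndbox>
    </object>
</annotation>
"""

def parse_objects_with_flags(xml_text):
    """Return every object with its bbox plus boolean 'difficult' and 'truncated' flags."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.findall('object'):
        box = obj.find('bndbox')
        objects.append({
            'name': obj.find('name').text,
            'difficult': int(obj.find('difficult').text) == 1,
            'truncated': int(obj.find('truncated').text) == 1,
            'bbox': [int(float(box.find(tag).text))
                     for tag in ('xmin', 'ymin', 'xmax', 'ymax')],
        })
    return objects

for obj in parse_objects_with_flags(SAMPLE_XML):
    print(obj)
```

Filtering afterwards, for example with `[o for o in objects if not o['difficult']]`, gives the same objects that `parse_rec` would return for the same annotation.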