# <font style="color:blue">Generate Sample Dataset for DensePose Training</font>

Here, we download the 2014 val images from the <a href="http://cocodataset.org/#download">COCO website</a>. We chose the val images (`6GB`) over the train images (`13GB`) because the archive is smaller.

**[Download the COCO val2014 Dataset](http://images.cocodataset.org/zips/val2014.zip)**

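After downloading the COCO val2014 dataset, unzip it in the current directory so that the images sit in a `val2014` folder, which is the source folder the dataset-preparation code reads from.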
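If you prefer to unzip from Python rather than from a shell, the standard-library `zipfile` module can do it. This is a minimal sketch; `extract_archive` is a hypothetical helper name, and it assumes the archive was saved as `val2014.zip` in the current directory.

```python
import zipfile

def extract_archive(zip_path, dest_dir="."):
    """Extract every member of `zip_path` into `dest_dir` and return the member names.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
        return zf.namelist()

# Assumed file name; uncomment once the download has finished.
# extract_archive("val2014.zip")
```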

The annotation files can be downloaded from <a href="https://github.com/facebookresearch/DensePose/blob/master/DensePoseData/get_DensePose_COCO.sh">here</a>. Run the following code cells to download the annotation file.

**Download function.**

In [1]:
import urllib.request

def download(url, filepath):
    # Fetch `url` and save it to `filepath`.
    # Returns a (local filename, HTTP headers) tuple.
    response = urllib.request.urlretrieve(url, filepath)
    return response

**Download the annotations file**

In [2]:
densepose_coco_2014_minival_url = 'https://dl.fbaipublicfiles.com/densepose/densepose_coco_2014_minival.json'
annotations_path = 'densepose_coco_2014_minival.json'

download(densepose_coco_2014_minival_url, annotations_path)

('densepose_coco_2014_minival.json',
 <http.client.HTTPMessage at 0x7f5394cf2050>)

According to the annotations file, 1500 images from the val set have annotations. From these, we will use 1000 images to create the train, val and test datasets.
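One way to check such a count yourself is to compare the image entries against the image ids referenced by the annotations. The sketch below assumes the usual COCO-style layout (`images` entries with an `id`, `annotations` entries with an `image_id`), which is the layout the code in this notebook relies on. `count_annotated_images` is a hypothetical helper, and the toy dictionary stands in for the real file.

```python
def count_annotated_images(data):
    """Return (number of image entries, number of those with at least one annotation)."""
    image_ids = {im["id"] for im in data["images"]}
    annotated_ids = {ann["image_id"] for ann in data["annotations"]}
    return len(image_ids), len(image_ids & annotated_ids)

# Toy stand-in for the loaded JSON: image 2 has no annotation.
toy = {
    "images": [{"id": 1}, {"id": 2}, {"id": 3}],
    "annotations": [{"image_id": 1}, {"image_id": 1}, {"image_id": 3}],
}
print(count_annotated_images(toy))  # (3, 2)
```

To run it on the real file, load `densepose_coco_2014_minival.json` with `json.load` and pass the resulting dictionary to the function.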

The train, val and test datasets will follow the directory structure given in the detectron2 training module.

```
datasets
|
|-->coco
       |
       |-->annotations
       |       |-->densepose_train2014.json
       |       |-->densepose_valminusminival2014.json
       |       |-->densepose_minival2014.json
       |
       |-->train2014
       |
       |-->val2014
```
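After the dataset-preparation cell further down has run, a quick check can confirm that this layout exists on disk. The following is a sketch only: `missing_paths` and `EXPECTED_PATHS` are hypothetical names, and the paths are copied from the tree above.

```python
import os

# Paths taken from the tree above, relative to the `datasets` root.
EXPECTED_PATHS = [
    os.path.join("coco", "annotations", "densepose_train2014.json"),
    os.path.join("coco", "annotations", "densepose_valminusminival2014.json"),
    os.path.join("coco", "annotations", "densepose_minival2014.json"),
    os.path.join("coco", "train2014"),
    os.path.join("coco", "val2014"),
]

def missing_paths(root="datasets"):
    """Return the expected paths that do not exist under `root`."""
    return [p for p in EXPECTED_PATHS if not os.path.exists(os.path.join(root, p))]

# An empty list means the layout is complete.
print(missing_paths())
```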

**The following code cell prepares a dataset of `1000` images. To experiment with more images, increase `num_images`.**

In [3]:
import os
import json
import random
import shutil


src_folder = 'val2014'
dest_folder = os.path.join('datasets', 'coco')
dest_annotations_folder = os.path.join(dest_folder, 'annotations')
train_dataset = os.path.join(dest_folder, 'train2014')
val_dataset = os.path.join(dest_folder, 'val2014')

# Number of images to select; increase this number to experiment with more images.
num_images = 1000

os.makedirs(dest_annotations_folder, exist_ok=True)
os.makedirs(train_dataset, exist_ok=True)
os.makedirs(val_dataset, exist_ok=True)

with open(annotations_path, "r") as f:
    data = json.load(f)

count = 0
train_image_ids = []
test_image_ids = []
val_image_ids = []

for im in random.sample(data["images"], num_images):
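    img_name = im['file_name']
    img_id = str(im['id'])
    # Build the source path for every image so each branch copies the right file.
    img_path = os.path.join(src_folder, img_name)

    if count % 5 == 0:
        # Every fifth image is copied to the val2014 folder (val and test splits).
        shutil.copy(img_path, val_dataset)

        if count % 10 == 0:
            test_image_ids.append(img_id)
        else:
            val_image_ids.append(img_id)
    else:
        # The remaining images form the train split.
        shutil.copy(img_path, train_dataset)
        train_image_ids.append(img_id)

    count = count + 1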

train_data = {
    "images": [],
    "annotations": [],
    "categories": []
}

val_data = {
    "images": [],
    "annotations": [],
    "categories": []
}

test_data = {
    "images": [],
    "annotations": [],
    "categories": []
}

# Sets make each membership test constant-time instead of a list scan.
train_ids = set(train_image_ids)
val_ids = set(val_image_ids)
test_ids = set(test_image_ids)

for im_obj in data["images"]:
    if str(im_obj["id"]) in train_ids:
        train_data["images"].append(im_obj)

    if str(im_obj["id"]) in val_ids:
        val_data["images"].append(im_obj)

    if str(im_obj["id"]) in test_ids:
        test_data["images"].append(im_obj)

for ann_obj in data["annotations"]:
    if str(ann_obj["image_id"]) in train_ids:
        train_data["annotations"].append(ann_obj)

    if str(ann_obj["image_id"]) in val_ids:
        val_data["annotations"].append(ann_obj)

    if str(ann_obj["image_id"]) in test_ids:
        test_data["annotations"].append(ann_obj)

train_data["categories"] = data["categories"]
test_data["categories"] = data["categories"]
val_data["categories"] = data["categories"]

with open(os.path.join(dest_annotations_folder,"densepose_train2014.json"), "w") as f:
    f.write(json.dumps(train_data))

with open(os.path.join(dest_annotations_folder,"densepose_valminusminival2014.json"), "w") as f:
    f.write(json.dumps(val_data))

with open(os.path.join(dest_annotations_folder,"densepose_minival2014.json"), "w") as f:
    f.write(json.dumps(test_data))

print(len(train_data["images"]), len(train_data["annotations"]))
print(len(val_data["images"]), len(val_data["annotations"]))
print(len(test_data["images"]), len(test_data["annotations"]))

800 3075
100 441
100 394
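The image counts above follow directly from the `count % 5` and `count % 10` rule in the cell. The annotation counts depend on which images `random.sample` picks, so they can differ between runs. The sketch below isolates the rule in a hypothetical helper, `assign_split`, and tallies the split sizes for 1000 images.

```python
from collections import Counter

def assign_split(index):
    """Mirror the rule in the cell above: multiples of 10 go to test,
    other multiples of 5 go to val, everything else goes to train."""
    if index % 10 == 0:
        return "test"
    if index % 5 == 0:
        return "val"
    return "train"

counts = Counter(assign_split(i) for i in range(1000))
print(counts["train"], counts["val"], counts["test"])  # 800 100 100
```

Because the rule depends only on the loop counter, the split is always 80% train, 10% val and 10% test of `num_images` whenever `num_images` is a multiple of 10.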
