<a href="https://colab.research.google.com/github//pylabel-project/samples/blob/main/yolo2coco.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Convert YOLOv5 Annotations (TXT Files) to COCO JSON Format
Converting from YOLO to another format is a little tricky because the YOLO format stores bounding boxes as fractions of the image size and does not record the image dimensions, which most other formats require. You therefore need to read each image file to get its height and width. The PyLabel package takes care of that. This notebook shows how to import YOLOv5 annotations and export them to another format, such as COCO.
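To make the role of the image dimensions concrete, here is a minimal sketch of the arithmetic involved. It is not PyLabel's implementation; the function name `yolo_to_coco_bbox` is hypothetical, and it assumes the usual YOLO layout of a normalized box center plus a normalized width and height.

```python
def yolo_to_coco_bbox(x_center, y_center, width, height, img_width, img_height):
    """Convert a normalized YOLO box to a COCO-style [x_min, y_min, width, height] box in pixels."""
    box_width = width * img_width
    box_height = height * img_height
    # YOLO stores the box center; COCO stores the top-left corner.
    x_min = x_center * img_width - box_width / 2
    y_min = y_center * img_height - box_height / 2
    return [x_min, y_min, box_width, box_height]


# A box centered in a 640x480 image that covers half of each dimension.
print(yolo_to_coco_bbox(0.5, 0.5, 0.5, 0.5, img_width=640, img_height=480))
# prints [160.0, 120.0, 320.0, 240.0]
```

Without `img_width` and `img_height` the pixel values cannot be computed, which is why the images have to be read during the import.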



In [None]:
import logging
logging.getLogger().setLevel(logging.CRITICAL)
!pip install pylabel > /dev/null

In [None]:
from pylabel import importer

### Load Dataset from Google Drive


In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
! ls
! ls drive/MyDrive/_ColabFiles/walmartai_veggievision
! unzip -qq drive/MyDrive/_ColabFiles/walmartai_veggievision/dataset_kavan_patel.zip -d data

data  drive  sample_data


## Import YOLO annotations
First we will import annotations stored in YOLOv5 format. (This is a sample dataset. You can edit this part to point to your own dataset.)


In [None]:
%%capture
import os, zipfile

#Download sample yolo dataset
os.makedirs("data", exist_ok=True)
!wget "https://github.com/pylabel-project/datasets_models/blob/main/coco128.zip?raw=true" -O data/coco128.zip
with zipfile.ZipFile("data/coco128.zip", 'r') as zip_ref:
   zip_ref.extractall("data")

There are two methods for importing YOLOv5 annotations. The method shown here, 'ImportYoloV5', reads the annotations, but you must also provide a list of class names that map to the class ids. The other method, 'ImportYoloV5WithYaml', can read the class names from a YAML file, as shown in this notebook: [yolo_with_yaml_importer.ipynb](https://github.com/pylabel-project/samples/blob/main/yolo_with_yaml_importer.ipynb)
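The sketch below illustrates why the class name list is needed: a YOLO label line carries only a numeric class id, so the name has to be looked up by position in the list. This is an illustration only, not how PyLabel parses files; `parse_yolo_line`, the three-class list, and the label line are all hypothetical.

```python
def parse_yolo_line(line, class_names):
    """Parse one YOLO label line: '<class_id> <x_center> <y_center> <width> <height>'."""
    fields = line.split()
    class_id = int(fields[0])
    x_center, y_center, width, height = (float(value) for value in fields[1:5])
    return {
        "cat_id": class_id,
        "cat_name": class_names[class_id],  # the list position doubles as the class id
        "bbox_normalized": (x_center, y_center, width, height),
    }


# Hypothetical class list and label line, for illustration only.
example_classes = ["banana-bag", "banana", "Blackberries"]
print(parse_yolo_line("1 0.5 0.5 0.25 0.4", example_classes))
```

If the list is in the wrong order, every annotation still imports but ends up with the wrong name, so it is worth checking a few rows after the import.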

In [None]:
# path_to_annotations = "data/dataset_kavan_patel/labels/train"
path_to_annotations = "data/dataset_kavan_patel/labels/val"

#Identify the path to get from the annotations to the images
# path_to_images = "../../images/train"
path_to_images = "../../images/val"

#Import the dataset into the PyLabel schema
#Class names must be listed in class id order. (The names for the coco128 sample are defined here https://github.com/ultralytics/yolov5/blob/master/data/coco128.yaml)
yoloclasses = ['banana-bag', 'banana', 'Blackberries', 'Raspberries', 'lemon-bag', 'lemon', 'grapes-bag', 'grapes', 'tomato-bag', 'tomato', 'apple-bag', 'apple', 'chili-bag', 'chili']
dataset = importer.ImportYoloV5(path=path_to_annotations, path_to_images=path_to_images, cat_names=yoloclasses,
    img_ext="jpg", name="instances_val")

dataset.df.head(5)


| id | img_folder | img_filename | img_path | img_id | img_width | img_height | img_depth | ann_segmented | ann_bbox_xmin | ann_bbox_ymin | ... | ann_iscrowd | ann_keypoints | ann_pose | ann_truncated | ann_difficult | cat_id | cat_name | cat_supercategory | split | annotated |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ../../images/val | 387_5_apple_wob_8.jpg | | 0 | 3024 | 4032 | 3 | | 1005.5 | 976.5 | ... | | | | | | 11 | apple | | | 1 |
| 1 | ../../images/val | 178_5_lemon_wob_2.jpg | | 1 | 3024 | 4032 | 3 | | 136.0 | 913.5 | ... | | | | | | 5 | lemon | | | 1 |
| 2 | ../../images/val | 529_5_lemon_wb_24.jpg | | 2 | 3024 | 4032 | 3 | | 98.5 | 2672.0 | ... | | | | | | 4 | lemon-bag | | | 1 |
| 3 | ../../images/val | 472_5_banana_wob_31.jpg | | 3 | 3024 | 4032 | 3 | | 562.0 | 739.5 | ... | | | | | | 1 | banana | | | 1 |
| 4 | ../../images/val | 104_3_lemon_wb_28.jpg | | 4 | 4032 | 3024 | 3 | | 1017.5 | 368.0 | ... | | | | | | 4 | lemon-bag | | | 1 |


## Analyze annotations
PyLabel can calculate basic summary statistics about the dataset, such as the number of images and the classes.
The dataset is stored as a pandas DataFrame, so you can do additional exploratory analysis on it.
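As a sketch of what such extra analysis might look like, the example below runs two common pandas summaries on a toy DataFrame. The rows are made up; only the column names `img_filename` and `cat_name` are taken from the imported dataset. With a real import you would apply the same calls to `dataset.df`.

```python
import pandas as pd

# Toy stand-in for dataset.df; the rows are invented for illustration.
toy_df = pd.DataFrame(
    {
        "img_filename": ["a.jpg", "a.jpg", "b.jpg", "c.jpg"],
        "cat_name": ["apple", "lemon", "apple", "banana"],
    }
)

# Annotations per class.
class_counts = toy_df["cat_name"].value_counts()

# Annotations per image, to spot crowded or sparse images.
boxes_per_image = toy_df.groupby("img_filename").size()

print(class_counts)
print(boxes_per_image)
```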

In [None]:
print(f"Number of images: {dataset.analyze.num_images}")
print(f"Number of classes: {dataset.analyze.num_classes}")
print(f"Classes:{dataset.analyze.classes}")
print(f"Class counts:\n{dataset.analyze.class_counts}")

Number of images: 3942
Number of classes: 14
Classes:['banana-bag', 'banana', 'Blackberries', 'Raspberries', 'lemon-bag', 'lemon', 'grapes-bag', 'grapes', 'tomato-bag', 'tomato', 'apple-bag', 'apple', 'chili-bag', 'chili']
Class counts:
cat_name
tomato          529
lemon           509
grapes          499
apple           477
banana          388
chili           359
chili-bag       319
tomato-bag      307
apple-bag       304
grapes-bag      301
banana-bag      292
lemon-bag       290
Raspberries     149
Blackberries    117
Name: count, dtype: int64


## Edit Annotations
All of the annotations are stored in a pandas DataFrame that you can access directly as 'dataset.df'. You can run your own custom queries against the dataset, and you can also manipulate it by removing rows, changing labels, etc.
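The three kinds of edits mentioned above can be sketched with plain pandas operations. This example uses a toy DataFrame with invented rows rather than a real PyLabel dataset, and merging 'apple-bag' into 'apple' is only a hypothetical relabeling.

```python
import pandas as pd

# Toy stand-in for dataset.df; the rows are invented for illustration.
toy_df = pd.DataFrame(
    {
        "img_filename": ["a.jpg", "a.jpg", "b.jpg", "c.jpg"],
        "cat_name": ["apple", "apple-bag", "lemon", "chili"],
    }
)

# Custom query: keep only the rows for one class.
apples = toy_df[toy_df["cat_name"] == "apple"]

# Change labels: merge the bagged variant into its base class.
relabeled = toy_df.copy()
relabeled["cat_name"] = relabeled["cat_name"].replace({"apple-bag": "apple"})

# Remove rows: drop every annotation of a class you do not need.
filtered = relabeled[relabeled["cat_name"] != "chili"].reset_index(drop=True)

print(filtered)
```

In a real dataset the 'cat_id' column would presumably need the same update so that ids and names stay consistent before exporting.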

PyLabel also includes a lightweight annotation tool that you can use to create and edit bounding box annotations within a Jupyter notebook. You can see an example of that tool here: [pylabeler.ipynb](https://github.com/pylabel-project/samples/blob/main/pylabeler.ipynb)

## Visualize Annotations
You can render the bounding boxes for your image to inspect them and confirm that they imported correctly.  

In [None]:
from IPython.display import Image, display
display(dataset.visualize.ShowBoundingBoxes(100))
display(dataset.visualize.ShowBoundingBoxes(30))



## Export to COCO JSON
The PyLabel exporter exports all of the annotations in the DataFrame to the desired target format.
All annotations are stored in a single JSON file.
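For orientation, here is a minimal sketch of the general shape of a COCO detection file: one list each for images, annotations, and categories, linked by ids. All values are made up, and the exact set of fields that PyLabel writes may differ from this sketch.

```python
import json

# Minimal COCO-style document; every value is invented for illustration.
coco = {
    "images": [
        {"id": 0, "file_name": "example.jpg", "width": 640, "height": 480},
    ],
    "annotations": [
        {
            "id": 0,
            "image_id": 0,  # links the box to an entry in "images"
            "category_id": 1,  # links the box to an entry in "categories"
            "bbox": [160.0, 120.0, 320.0, 240.0],  # [x_min, y_min, width, height] in pixels
            "area": 320.0 * 240.0,
            "iscrowd": 0,
        },
    ],
    "categories": [
        {"id": 1, "name": "banana"},
    ],
}

coco_json = json.dumps(coco, indent=2)
print(coco_json)
```

Because every annotation for every image lives in the same "annotations" list, a single file is enough to describe the whole dataset.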

In [None]:
! ls data/dataset_kavan_patel/labels/train/coco128.json
! cp data/dataset_kavan_patel/labels/train/coco128.json drive/MyDrive/_ColabFiles/walmartai_veggievision/

data/dataset_kavan_patel/labels/train/coco128.json


In [None]:
dataset.export.ExportToCoco(cat_id_index=1)

['data/dataset_kavan_patel/labels/train/coco128.json']
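After an export it can be worth checking that the file is internally consistent. The sketch below is one possible check, not part of PyLabel: the hypothetical helper `check_coco_references` lists annotations whose image or category id does not exist in the file. It is demonstrated on a tiny made-up file; to check a real export you would pass the path returned by the exporter.

```python
import json
import os
import tempfile


def check_coco_references(path):
    """Return the ids of annotations that point at a missing image or category."""
    with open(path, "r", encoding="utf-8") as f:
        coco = json.load(f)
    image_ids = {image["id"] for image in coco["images"]}
    category_ids = {category["id"] for category in coco["categories"]}
    return [
        annotation["id"]
        for annotation in coco["annotations"]
        if annotation["image_id"] not in image_ids
        or annotation["category_id"] not in category_ids
    ]


# Tiny made-up file; annotation 1 refers to a category that does not exist.
sample = {
    "images": [{"id": 0, "file_name": "example.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 1, "name": "banana"}],
    "annotations": [
        {"id": 0, "image_id": 0, "category_id": 1, "bbox": [10, 10, 50, 50]},
        {"id": 1, "image_id": 0, "category_id": 99, "bbox": [20, 20, 30, 30]},
    ],
}
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample_coco.json")
    with open(sample_path, "w", encoding="utf-8") as f:
        json.dump(sample, f)
    broken = check_coco_references(sample_path)

print(broken)  # prints [1]
```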

Thank you for trying PyLabel. If you had any issues running this notebook or have ideas for how to make it better, please submit an issue here https://github.com/pylabel-project/pylabel/issues.