# Combining Datasets

This notebook merges Roboflow datasets. Start by downloading the zip file of each Roboflow dataset you want to combine.

## Modify Classes in Train/Valid/Test Folders

Each text file in the *labels* folder stores one box per line: the first field is the class id, and the remaining fields describe the box. For example:

`0 0.46015625 0.409375 0.7890625 0.6875`

This is the full content of one text file in the labels folder. It has a single line, so the image contains only one box. The box has class id 0 (fire_extinguisher), followed by four box values. In the YOLO label format these are `x_center`, `y_center`, `width` and `height`, each normalized by the image size to the range 0 to 1.
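
To make the label layout concrete, the sketch below converts one label line into pixel corner coordinates. It assumes the YOLO layout (class id, then normalized `x_center`, `y_center`, `width`, `height`); `parse_yolo_line` is a hypothetical helper and the 640x640 image size is only an illustration.

```python
def parse_yolo_line(line, img_width, img_height):
    """Convert one label line to (class_id, xmin, ymin, xmax, ymax) in pixels.

    Assumes the layout: class_id x_center y_center width height,
    with the four box values normalized to the range 0..1.
    """
    fields = line.split()
    class_id = int(fields[0])
    x_center, y_center, width, height = (float(v) for v in fields[1:5])
    xmin = (x_center - width / 2) * img_width
    ymin = (y_center - height / 2) * img_height
    xmax = (x_center + width / 2) * img_width
    ymax = (y_center + height / 2) * img_height
    return class_id, xmin, ymin, xmax, ymax


# The example line from above, with an assumed 640x640 image
print(parse_yolo_line("0 0.46015625 0.409375 0.7890625 0.6875", 640, 640))
```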

First, set `dirname` to the *labels* folder to modify.

In [None]:
import os

dirname = "directory_name/valid/labels" # validation labels
filenames = os.listdir(dirname)

Then read each file into `lines` and write it back with the class id replaced.

In [None]:
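prev_class_id = '0' # class id to be replaced
new_class_id = '1' # new class id

for filename in filenames:
    path = os.path.join(dirname, filename)
    # Read all lines first, because the file is overwritten below
    with open(path, 'r') as file:
        lines = file.readlines()

    with open(path, 'w') as file:
        for line in lines:
            fields = line.split()
            # Compare the whole first field, so that '1' does not match '10'
            if fields and fields[0] == prev_class_id:
                line = ' '.join([new_class_id] + fields[1:]) + '\n'
            file.write(line)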


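Optionally, double-check the result. The loop below prints every box that still has the old class id, so it prints nothing if the replacement worked.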

In [None]:
for filename in filenames:
    path = os.path.join(dirname, filename)
    with open(path, 'r') as file:
        for line in file:
            fields = line.split()
            # Report any box that still has the old class id
            if fields and fields[0] == prev_class_id:
                print(filename, line.strip())
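
For a broader check, it can help to count how many boxes each class id has in a labels folder. This is a minimal sketch; `count_class_ids` is a hypothetical helper, and it assumes every `.txt` file in the folder is a label file.

```python
import os
from collections import Counter


def count_class_ids(labels_dir):
    """Return a Counter mapping class id (as a string) to its number of boxes."""
    counts = Counter()
    for filename in os.listdir(labels_dir):
        if not filename.endswith(".txt"):
            continue  # ignore anything that is not a label file
        with open(os.path.join(labels_dir, filename), "r") as file:
            for line in file:
                fields = line.split()
                if fields:  # skip blank lines
                    counts[fields[0]] += 1
    return counts
```

Calling `count_class_ids(dirname)` before and after the replacement shows whether the boxes moved from the old id to the new one.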

Repeat the steps above for each folder (train/valid/test), changing `dirname` each time.
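
As an alternative to rerunning the cells by hand, the three folders can be processed in one call. The sketch below also takes a mapping of several ids at once, which avoids collisions when ids are swapped (for example 0 to 1 and 1 to 0). `remap_labels` and its parameters are hypothetical names, and the `<split>/labels` layout is assumed from the directory structure used in this notebook.

```python
import os


def remap_labels(dataset_root, id_map, splits=("train", "valid", "test")):
    """Rewrite class ids in every label file under dataset_root/<split>/labels.

    id_map maps old class id -> new class id, both as strings.
    Ids that are not in id_map are left unchanged.
    """
    for split in splits:
        labels_dir = os.path.join(dataset_root, split, "labels")
        if not os.path.isdir(labels_dir):
            continue  # tolerate datasets that lack one of the splits
        for filename in os.listdir(labels_dir):
            if not filename.endswith(".txt"):
                continue
            path = os.path.join(labels_dir, filename)
            with open(path, "r") as file:
                lines = file.readlines()
            with open(path, "w") as file:
                for line in lines:
                    fields = line.split()
                    # Each line is looked up once, so swapped ids do not collide
                    if fields and fields[0] in id_map:
                        fields[0] = id_map[fields[0]]
                        line = " ".join(fields) + "\n"
                    file.write(line)
```

For example, `remap_labels("directory_name", {"0": "1"})` applies the same replacement as the cells above to all three folders. The files are overwritten in place, so it is safer to run this on a copy of the dataset.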

## Modify *data.yaml*

*data.yaml* tells the model how to process the dataset. For example:

```
train: ../train/images
val: ../valid/images
test: ../test/images

nc: 2
names: ['fire_extinguisher', 'babyface']
```

- The first three lines stay the same; they specify the locations of the train, val and test image folders.
- `nc` is the number of classes; valid class ids run from 0 to `nc - 1`.
- `names` lists the class names; the position of a name in the list is its class id.

Using the example above as a template, create a *data.yaml* file and set `nc` and `names` to match the merged dataset.
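
The file can also be generated from the class list, so that `nc` always equals the length of `names`. This is a minimal sketch that reproduces the template above as plain text; `build_data_yaml` is a hypothetical helper, and it assumes the class names contain no quote characters.

```python
def build_data_yaml(names):
    """Return the text of a data.yaml file for the given list of class names."""
    quoted = ", ".join(f"'{name}'" for name in names)
    return (
        "train: ../train/images\n"
        "val: ../valid/images\n"
        "test: ../test/images\n"
        "\n"
        f"nc: {len(names)}\n"  # nc is derived from names, so the two cannot drift apart
        f"names: [{quoted}]\n"
    )


# The order of the list defines the class ids: index 0, index 1, ...
print(build_data_yaml(["fire_extinguisher", "babyface"]))
```

The returned text can then be written to *data.yaml* in the dataset directory.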

## Expected Directory Structure
The dataset directory is expected to have the following structure:

```
.
└── Dataset/Directory/
    ├── README_files/
    │   └── ...
    ├── test/
    │   ├── images/
    │   │   ├── 38.png
    │   │   └── ...
    │   └── labels/
    │       ├── 38_png.txt
    │       └── ...
    ├── train/
    │   ├── images/
    │   │   ├── 1.png
    │   │   └── ...
    │   └── labels/
    │       ├── 1_png.txt
    │       └── ...
    ├── valid/
    │   ├── images/
    │   │   ├── 15.png
    │   │   └── ...
    │   └── labels/
    │       ├── 15_png.txt
    │       └── ...
    └── data.yaml
```

This structure is required for training the Ultralytics YOLO model.
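
Before training, the layout can be checked with a short script. This is a sketch under the assumption that only *data.yaml* and the `images` and `labels` folders of the three splits are required; `missing_dataset_paths` is a hypothetical helper.

```python
from pathlib import Path


def missing_dataset_paths(dataset_root):
    """Return the expected files and folders that do not exist under dataset_root."""
    root = Path(dataset_root)
    expected = [root / "data.yaml"]
    for split in ("train", "valid", "test"):
        expected.append(root / split / "images")
        expected.append(root / split / "labels")
    return [str(path) for path in expected if not path.exists()]
```

An empty list means every expected path is present; otherwise the list names what still has to be created.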