# How to create your own image segmentation dataset

Note: Tested on Windows with CARLA 0.9.5.

### Step 1: Start Carla by running CarlaUE4.exe

### Step 2: Open cmd in the repo folder and activate your conda environment

```activate carla```

### Step 3: Spawn some NPCs by running the following command

```python spawn_npc.py```

I used the following arguments.

```python spawn_npc.py -n 30 --safe```

`-n 30` spawns 30 random NPCs.

`--safe` avoids spawning vehicles that are prone to accidents.

There are more optional arguments which you can check out using the following command:

```python spawn_npc.py -h```

### Step 4: Open a second cmd window in the repo folder and activate your conda environment

```activate carla```

### Step 5: Collect your data by running the following command

```python collect_train_data.py```

I used the following arguments.

```python collect_train_data.py -n 30 -d 180```

`-n 30` collects data from 30 random spawn points.

`-d 180` lets the ego vehicle drive on autopilot for 180 seconds at each spawn point.

##### How this script works:

This script creates an ego vehicle with an RGBA camera and a semantic segmentation camera, both attached at the front centre.
<img src="doc_images/rgba.png">

The script spawns the ego vehicle at a unique, randomly selected position.
The vehicle then drives on autopilot and collects data for -d seconds. The RGBA images are saved to "./output/images" and the semantic segmentation masks to "./output/labels".
Afterwards, the ego vehicle respawns at another unique, randomly selected position. This is repeated -n times.

In total, the script therefore tries to collect -n * -d / 2 images (30 * 180 / 2 = 2700 with the arguments above). We divide by two because ego_camera_sensors.json sets the camera tick to 2 seconds, so each camera captures one image every 2 seconds. You can change this value if you want.
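The same calculation as a small helper, which is handy if you change the camera tick. `expected_image_count` is a hypothetical name and not part of the repo. Treat the result as a rough target, not a guarantee.

```python
def expected_image_count(n_runs: int, drive_seconds: float, camera_tick_seconds: float = 2.0) -> int:
    """Rough number of images per camera: one image per camera tick
    during each run, summed over all runs (hypothetical helper)."""
    images_per_run = int(drive_seconds // camera_tick_seconds)
    return n_runs * images_per_run


print(expected_image_count(30, 180))                           # 2700
print(expected_image_count(30, 180, camera_tick_seconds=1.0))  # 5400
```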

### Step 6: Clean up file names

The collected images and labels follow the naming convention run{}_fr_{}.png.
run is the run (spawn) id, counted from zero: images collected after the ego vehicle's second spawn are named run1_fr_{}.png. fr stands for frame; I used the frame number to generate unique file names.
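The naming convention can be parsed with a regular expression, for example when you want to group files by run. Below is a small sketch; `parse_filename` is a hypothetical helper and not part of the repo.

```python
import re
from typing import Tuple

# Matches names like "run1_fr_4021.png" (run id, then frame id).
FILENAME_PATTERN = re.compile(r"^run(\d+)_fr_(\d+)\.png$")


def parse_filename(name: str) -> Tuple[int, int]:
    """Return (run_id, frame_id) for a file named run{}_fr_{}.png."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    return int(match.group(1)), int(match.group(2))


print(parse_filename("run1_fr_4021.png"))  # (1, 4021)
```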

When we inspect the collected images and labels, we can see that their frame ids are slightly off. This is because we had to use two separate cameras to collect the data, so an image and its label are not captured at exactly the same frame. In most cases the offset is small enough to be acceptable. However, we now need to sync the image and label file names. You can do this by running the following helper function on both folders.

```python
import os
from clean_up_dataset import clean_up_file_names_preserving_run_id

# Renumber the frame ids in both folders so that image and label names match again
clean_up_file_names_preserving_run_id(os.path.join(".", "output", "images"))
clean_up_file_names_preserving_run_id(os.path.join(".", "output", "labels"))
```

The helper function replaces the frame id with an ascending id while keeping the run id, so that each image and its label end up with the same file name.
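The real `clean_up_file_names_preserving_run_id` lives in the repo's clean_up_dataset.py. What follows is only a hypothetical re-implementation of the idea, assuming that the ascending id restarts at 0 for every run and that the images and labels folders contain the same number of files per run. Under those assumptions, sorting by frame id and renumbering gives matching names even though the original frame ids differ slightly. `sequential_renames` and `apply_renames` are illustrative names.

```python
import os
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical re-implementation; the real helper lives in clean_up_dataset.py.
PATTERN = re.compile(r"^run(\d+)_fr_(\d+)\.png$")


def sequential_renames(filenames: Iterable[str]) -> Dict[str, str]:
    """Map run{r}_fr_{frame}.png to run{r}_fr_{i}.png, with i = 0, 1, 2, ...
    assigned in frame order separately for each run (assumption)."""
    by_run: Dict[int, List[Tuple[int, str]]] = defaultdict(list)
    for name in filenames:
        match = PATTERN.match(name)
        if match is None:
            continue  # skip files that don't follow the naming convention
        by_run[int(match.group(1))].append((int(match.group(2)), name))

    renames: Dict[str, str] = {}
    for run_id, frames in by_run.items():
        for new_id, (_, old_name) in enumerate(sorted(frames)):
            renames[old_name] = f"run{run_id}_fr_{new_id}.png"
    return renames


def apply_renames(folder: str, renames: Dict[str, str]) -> None:
    """Rename in two passes so a new name never overwrites a file that
    still has to be renamed (e.g. an original frame id of 1)."""
    for old in renames:
        os.rename(os.path.join(folder, old), os.path.join(folder, "tmp_" + old))
    for old, new in renames.items():
        os.rename(os.path.join(folder, "tmp_" + old), os.path.join(folder, new))


# Image and label frame ids differ slightly, but the new names line up.
images = ["run0_fr_103.png", "run0_fr_101.png", "run1_fr_250.png"]
labels = ["run0_fr_102.png", "run0_fr_104.png", "run1_fr_251.png"]
print(sorted(sequential_renames(images).values()) ==
      sorted(sequential_renames(labels).values()))  # True
```

If one folder has an extra or missing file in a run, every later name in that run shifts, so compare the file counts per run before renaming.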

### Step 7: Clean up the collected data

This is the annoying part. Because the autopilot tries to respect traffic rules, the vehicle stops at red lights and stop signs, which produces runs of nearly identical images. We need to remove them. To do so, look through all images in the "images" folder and delete the ones you consider invalid.

In my case, I had 1700 images and ended up with 1206.
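If going through every image by hand is too slow, you could pre-sort candidates automatically. The sketch below is not part of the repo. It assumes that the frames of one run are loaded as NumPy arrays in capture order, and it flags every frame whose mean absolute pixel difference to the last kept frame falls below a threshold. The default of 2.0 on the 0 to 255 scale is an arbitrary starting value. Treat the result as a list of candidates to review, not as an automatic delete list.

```python
from typing import List, Sequence

import numpy as np


def near_duplicate_indices(frames: Sequence[np.ndarray], threshold: float = 2.0) -> List[int]:
    """Return indices of frames that barely differ from the last kept frame.

    Difference = mean absolute pixel difference (0-255 scale).
    The threshold is a guess; tune it on your own data (hypothetical helper).
    """
    flagged: List[int] = []
    if len(frames) == 0:
        return flagged
    reference = frames[0].astype(np.float32)
    for i in range(1, len(frames)):
        current = frames[i].astype(np.float32)
        if np.mean(np.abs(current - reference)) < threshold:
            flagged.append(i)  # candidate: the vehicle was probably standing still
        else:
            reference = current  # scene changed; compare later frames against this one
    return flagged
```

To load the frames, something like `np.asarray(Image.open(path).convert("RGB"))` with Pillow works. Sort the files of a run by their numeric frame id rather than alphabetically, since "run0_fr_10.png" sorts before "run0_fr_2.png" as a string.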

### Step 8: Create the dataset

Next, we create the dataset. We copy all valid image/label pairs to a new folder called dataset, then select a number of runs to serve as the validation set and save the file names of those runs to a txt file. The following helper functions do all of this.

```python
from clean_up_dataset import copy_valid_image_files_to_dataset
from clean_up_dataset import create_validation_set_file

# Copy the valid images and their labels to the dataset folder
copy_valid_image_files_to_dataset("images")
copy_valid_image_files_to_dataset("labels")
# Use runs 2, 9, 13, 19 and 27 as the validation set
create_validation_set_file(["2", "9", "13", "19", "27"])
```
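For readers who want to see the idea behind these helpers, here is a hypothetical sketch. It is not the code in clean_up_dataset.py. It assumes that a valid pair is a file name present in both folders after the manual clean-up, that the dataset folder has images and labels subfolders, and that the validation file lists one file name per line. The run id is parsed instead of matched as a substring, so asking for run "2" does not also pick up run21.

```python
import os
import re
import shutil
from typing import Iterable, List, Sequence

# Hypothetical helpers illustrating the idea; not the repo's clean_up_dataset.py.
PATTERN = re.compile(r"^run(\d+)_fr_(\d+)\.png$")


def valid_pairs(image_names: Iterable[str], label_names: Iterable[str]) -> List[str]:
    """File names present in both folders, i.e. images kept during clean-up
    that still have a matching label."""
    return sorted(set(image_names) & set(label_names))


def copy_pairs(src_root: str, dst_root: str, names: Iterable[str]) -> None:
    """Copy each image/label pair into dst_root/images and dst_root/labels
    (assumed layout)."""
    name_list = list(names)
    for sub in ("images", "labels"):
        os.makedirs(os.path.join(dst_root, sub), exist_ok=True)
        for name in name_list:
            shutil.copy2(os.path.join(src_root, sub, name), os.path.join(dst_root, sub, name))


def validation_filenames(names: Iterable[str], val_runs: Sequence[str]) -> List[str]:
    """Names whose run id is in val_runs, compared as integers."""
    wanted = {int(run) for run in val_runs}
    selected = []
    for name in names:
        match = PATTERN.match(name)
        if match is not None and int(match.group(1)) in wanted:
            selected.append(name)
    return selected


def write_validation_file(path: str, names: Iterable[str]) -> None:
    """Write one file name per line (assumed txt format)."""
    with open(path, "w", encoding="utf-8") as f:
        for name in names:
            f.write(name + "\n")


pairs = valid_pairs(["run2_fr_0.png", "run21_fr_0.png", "run3_fr_0.png"],
                    ["run2_fr_0.png", "run21_fr_0.png"])
print(validation_filenames(pairs, ["2", "9"]))  # ['run2_fr_0.png']
```

With the folder layout from Step 5, a call sequence might look like `valid_pairs(os.listdir("output/images"), os.listdir("output/labels"))` followed by `copy_pairs("output", "dataset", pairs)`. These paths and names are assumptions for illustration.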

### Step 9: Done

That's it. We are done!