# Tutorial 01 - Data Preprocessing

The CommonRoad-RL package comes with functions to convert raw datasets into [CommonRoad scenarios](https://commonroad.in.tum.de/scenarios) in `.xml` format and `.pickle` format. Currently, [highD](https://www.highd-dataset.com) and [inD](https://www.ind-dataset.com) are supported. Respective conversion tools are found under `commonroad_rl/utils_highd` and `commonroad_rl/utils_ind`.  
This tutorial shows how to use these tools to prepare training and testing data from the highD dataset. The procedure for the inD dataset is similar.

## 0. Preparation
Please follow the [README.md](https://gitlab.lrz.de/ss20-mpfav-rl/commonroad-rl/-/blob/development/README.md) to install the CommonRoad-RL package and make sure of the following:
* the current path is the project root `commonroad-rl`, i.e. one level above the `tutorials` folder
* the interactive Python kernel is started from the correct environment

In [None]:
# Check current path
%cd ..
%pwd

# Check interactive python kernel
import sys
sys.executable

## 1. Acquire the dataset
Please go to [the highD home page](https://www.highd-dataset.com) or contact Xiao to download the raw highD dataset.  
To facilitate the following exercises, we have prepared sample data under `tutorials/data/highD/raw`, where you should see three `.csv` files recording the track information and one `.jpg` file showing the track background.

## 2. Convert raw .csv data to .xml files
After reading and following the instructions in the [commonroad_rl/utils_highd/README.rst](https://gitlab.lrz.de/ss20-mpfav-rl/commonroad-rl/-/blob/development/commonroad_rl/utils_highd/README.rst) file, call the provided Python module to perform the conversion. Note that this can take around 6 minutes.

In [None]:
!python -m commonroad_rl.utils_highd.highd_to_cr -i tutorials/data/highD/raw/ -o tutorials/data/highD/xmls/

Now there should be 50 `.xml` files in the output folder `tutorials/data/highD/xmls`.
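
To confirm the file count without leaving the notebook, a small helper like the one below can be used. This is a minimal sketch that relies only on the standard library; the function name `count_files` is hypothetical and not part of CommonRoad-RL.

```python
from pathlib import Path


def count_files(folder, pattern="*.xml"):
    """Return the number of regular files in `folder` matching the glob `pattern`."""
    folder = Path(folder)
    if not folder.is_dir():
        # A missing folder simply counts as zero files
        return 0
    return sum(1 for path in folder.glob(pattern) if path.is_file())


# Output folder used by the conversion command above
xml_dir = "tutorials/data/highD/xmls"
print(f"{count_files(xml_dir)} .xml files found in {xml_dir}")
```

If the number differs from 50, rerunning the conversion and checking its console output is a reasonable first step.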

## 3. Validate .xml files against CommonRoad .xsd specification

To check if the converted `.xml` files comply with the CommonRoad scenario format, use the validation tool in `commonroad_rl/tools`.

In [None]:
!python -m commonroad_rl.tools.validate_cr -s commonroad_rl/tools/XML_commonRoad_XSD_2020a.xsd tutorials/data/highD/xmls/*
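
If validation reports errors, it can help to first rule out truncated or otherwise malformed files. The following sketch only checks that each file parses as XML and has the expected root element; it does not replace validation against the `.xsd` schema. The helper name `find_malformed_xml` is hypothetical, and the root element name `commonRoad` is an assumption about the scenario files.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def find_malformed_xml(folder, expected_root="commonRoad"):
    """Return (file name, reason) pairs for files that fail a basic sanity check."""
    problems = []
    for path in sorted(Path(folder).glob("*.xml")):
        try:
            root = ET.parse(str(path)).getroot()
        except ET.ParseError as err:
            problems.append((path.name, f"not well-formed: {err}"))
            continue
        # Assumption: CommonRoad scenario files use <commonRoad> as the root element
        if root.tag != expected_root:
            problems.append((path.name, f"unexpected root element <{root.tag}>"))
    return problems


for name, reason in find_malformed_xml("tutorials/data/highD/xmls"):
    print(f"{name}: {reason}")
```

An empty result means only that the files are well-formed XML; schema compliance is still established by `validate_cr`.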

## 4. Visualize CommonRoad scenarios
There is a visualization tool in `commonroad_rl/tools`, which can be executed by a simple command at the terminal; for example,  
`python -m commonroad_rl.tools.visualize_cr tutorials/data/highD/xmls/DEU_LocationB-3_1_T-1.xml`. 

However, this script does not work in a Jupyter notebook because of a backend error. Therefore, we use the `commonroad-io` package here. Let's try it with a sample scenario.

In [None]:
%matplotlib inline
import os
import matplotlib.pyplot as plt
from IPython import display  # needed only if clear_output below is uncommented

from commonroad.common.file_reader import CommonRoadFileReader
from commonroad.visualization.draw_dispatch_cr import draw_object

file_path = "tutorials/data/highD/xmls/DEU_LocationB-3_1_T-1.xml"

# Read in the scenario and planning problem set
scenario, planning_problem_set = CommonRoadFileReader(file_path).open()

# Plot the scenario for 40 time steps; each time step corresponds to 0.1 seconds
for i in range(0, 40):
    # Uncomment to clear previous graph
    # display.clear_output(wait=True)
    
    plt.figure(figsize=(20, 10))
    
    # Plot the scenario at time step i
    draw_object(scenario, draw_params={'time_begin': i})
    
    # Plot the planning problem set
    draw_object(planning_problem_set)
    plt.gca().set_aspect('equal')
    plt.show()

## 5. Convert .xml files to .pickle data
Since an RL training/testing session involves tens of thousands of iterations and repeated accesses to the scenarios, it is a good idea to convert the `.xml` files to `.pickle` format so that they load more efficiently during training and testing. Furthermore, the conversion script separates road networks from obstacles, since many scenarios can share the same road network data. Road networks are stored in the `meta_scenario` folder, whereas obstacles are stored in the `problem` folder. This is done with a conversion tool in `commonroad_rl/tools/pickle_scenario`.

In [None]:
!python -m commonroad_rl.tools.pickle_scenario.xml_to_pickle -i tutorials/data/highD/xmls -o tutorials/data/highD/pickles

Now in the output folder `tutorials/data/highD/pickles`, there should be a `meta_scenario` folder containing meta information and a `problem` folder containing 50 problems pickled from the `.xml` files.
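
To take a quick look at what was written, one of the pickled problems can be loaded with the standard `pickle` module. This is a minimal sketch: the helper name `load_first_pickle` is hypothetical, the `*.pickle` file pattern is an assumption, and the structure of the loaded object is not described in this tutorial, so only its type is printed. Only unpickle files you created yourself or otherwise trust.

```python
import pickle
from pathlib import Path


def load_first_pickle(folder, pattern="*.pickle"):
    """Load the first matching pickle in `folder` (sorted by name).

    Returns a (path, object) tuple, or None if no file matches.
    """
    files = sorted(Path(folder).glob(pattern))
    if not files:
        return None
    with open(files[0], "rb") as f:
        return files[0], pickle.load(f)


result = load_first_pickle("tutorials/data/highD/pickles/problem")
if result is not None:
    path, obj = result
    print(f"{path.name}: {type(obj)}")
```

Unpickling needs the same Python environment that created the files, because the stored objects reference classes from the installed packages.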

## 6. Split .pickle data for training and testing
As a final step, let's randomly split the 50 problems into training and testing sets with a ratio of 7:3, again using a provided script, this time in `commonroad_rl/utils_run`.

In [None]:
!python -m commonroad_rl.utils_run.split_dataset -i tutorials/data/highD/pickles/problem -otrain tutorials/data/highD/pickles/problem_train -otest tutorials/data/highD/pickles/problem_test -tr_r 0.7

Now in `tutorials/data/highD/pickles`, there should be a `problem_train` folder containing 35 pickles and a `problem_test` folder containing 15 pickles.
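
The idea behind a random 7:3 split can be illustrated in a few lines. This is only a sketch of the general technique, not the implementation of `split_dataset`; the function name `split_train_test` and the `seed` parameter are hypothetical.

```python
import random


def split_train_test(items, train_ratio=0.7, seed=None):
    """Shuffle `items` and split them into (train, test) lists by `train_ratio`."""
    rng = random.Random(seed)  # a local generator keeps the global random state untouched
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_ratio)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical file names standing in for the 50 pickled problems
names = [f"problem_{i}.pickle" for i in range(50)]
train, test = split_train_test(names, train_ratio=0.7, seed=0)
print(len(train), len(test))
```

With 50 items and a ratio of 0.7, this yields 35 training and 15 testing items, matching the folder sizes stated above. Fixing the seed makes the split reproducible across runs.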