Data Organization for ML/DL

All project data is stored in the mounted directory /projects/ncdot, which members of the renci or storage-ncdot group can access. Whenever possible, the service account serv-ncdot in the storage-ncdot group is used for task operations in this project. We obtained primary road data first to get the project started, but the project focuses on 2-lane-only secondary road data.

Image naming and organization convention

Primary and secondary road images are named and organized consistently by set. A set is a subset of the data collection, identified by a unique 3-digit ID, that generally covers around 100 miles or roughly 2 hours of actual data collection. Each set folder contains up to 120 numbered folders of JPEG images, with each folder representing 1 minute of data collection. The name of each image indicates where the image is stored in the file structure. Specifically, the first 3 digits are the set number, followed by 2 digits each for the converted hour, minute, second, and frame values, and a final digit for the image view ID (1: front perspective, 2: right shoulder, 5: left shoulder, 6: rear).
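The naming convention above can be decoded mechanically. The following is a minimal sketch, not code from the project: it assumes the file name stem is exactly 12 digits (3 + 2 + 2 + 2 + 2 + 1), and the names `ImageName`, `parse_image_name`, and the sample file name `926005420241.jpg` are hypothetical. The "converted" time values are simply read as integers, since the conversion itself is not described here.

```python
import os
import re
from dataclasses import dataclass

# View IDs as listed in the naming convention.
VIEW_NAMES = {1: "front", 2: "right shoulder", 5: "left shoulder", 6: "rear"}

# Assumption: the stem is exactly 12 digits: set(3) hour(2) minute(2) second(2) frame(2) view(1).
NAME_PATTERN = re.compile(r"(\d{3})(\d{2})(\d{2})(\d{2})(\d{2})(\d)")


@dataclass(frozen=True)
class ImageName:
    set_id: str   # kept as a string so leading zeros survive
    hour: int     # "converted" hour value, read as-is
    minute: int
    second: int
    frame: int
    view_id: int

    @property
    def view(self) -> str:
        return VIEW_NAMES.get(self.view_id, "unknown")


def parse_image_name(path: str) -> ImageName:
    """Split an image file name into the fields of the naming convention."""
    stem = os.path.splitext(os.path.basename(path))[0]
    match = NAME_PATTERN.fullmatch(stem)
    if match is None:
        raise ValueError(f"not a 12-digit image name: {path!r}")
    set_id, hour, minute, second, frame, view_id = match.groups()
    return ImageName(set_id, int(hour), int(minute), int(second), int(frame), int(view_id))


if __name__ == "__main__":
    # Hypothetical file name, used only to illustrate the field layout.
    print(parse_image_name("926005420241.jpg"))
```

Keeping the set ID as a string avoids losing leading zeros if a set number starts with 0.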

Primary road data and associated guardrail survey data

The original primary road videolog data obtained from NCDOT is stored in /projects/ncdot/2018/NC_2018. NCDOT is interested in the 2-lane-only road images, which were extracted from the primary road videolog using the NCDOT linear referencing system, the 2-lane-only shapefile map, and metadata mapped from the videolog sensor output. The extracted images are stored in /projects/ncdot/2018/NC_2018_Images.

For the primary road videolog data, NCDOT also has guardrail survey data, which was used to extract labeled guardrail data for guardrail model training. The labeled guardrail data is in /projects/ncdot/2018/machine_learning. Specifically, the full labeled guardrail data is in /projects/ncdot/2018/machine_learning/data, which contains 1,370,773 images randomly split into 1,343,357 training, 13,708 validation, and 13,708 test images. The 2-lane-only labeled guardrail data is in /projects/ncdot/2018/machine_learning/data_2lanes, which contains 282,452 images randomly split into 254,206 training, 14,124 validation, and 14,122 test images.

The 2-lane-only labeled guardrail data is a subset (about 21%) of the full labeled guardrail data, extracted for training a guardrail model on 2-lane-only images. Its images are symbolic links to the corresponding images in /projects/ncdot/2018/machine_learning/data. We trained two guardrail models, one on the full data and one on the 2-lane-only data, in order to select the better-performing model. Each image in the labeled guardrail data corresponds to one time stamp in the videolog and was created by joining the left, front, and right view images for that time stamp.
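As a quick consistency check on the counts quoted above, the following sketch recomputes the split proportions and the share of the 2-lane-only subset. The dictionary layout and the function name `split_fractions` are illustrative only; the numbers are the ones stated in this document.

```python
# Image counts as stated above, keyed by the data directory name.
SPLITS = {
    "data": {"total": 1_370_773, "train": 1_343_357, "val": 13_708, "test": 13_708},
    "data_2lanes": {"total": 282_452, "train": 254_206, "val": 14_124, "test": 14_122},
}


def split_fractions(counts):
    """Return the train/val/test fractions, checking that the parts sum to the total."""
    parts = counts["train"] + counts["val"] + counts["test"]
    if parts != counts["total"]:
        raise ValueError(f"split sizes sum to {parts}, expected {counts['total']}")
    return {key: counts[key] / counts["total"] for key in ("train", "val", "test")}


# Share of the full labeled data covered by the 2-lane-only subset.
SUBSET_SHARE = SPLITS["data_2lanes"]["total"] / SPLITS["data"]["total"]

if __name__ == "__main__":
    for name, counts in SPLITS.items():
        fractions = split_fractions(counts)
        print(name, {key: f"{value:.1%}" for key, value in fractions.items()})
    print(f"2-lane-only share of full data: {SUBSET_SHARE:.1%}")
```

The counts work out to roughly a 98/1/1 split for the full data and a 90/5/5 split for the 2-lane-only data, with the subset covering a little under 21% of the full set.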

Secondary road data

The secondary road videolog data covers 14 divisions and is stored in /projects/ncdot/NC_2018_Secondary, with sub-directories d01 to d14 each corresponding to one of the 14 divisions. NCDOT has selected divisions d04, d08, d13, and d14 for us to work with initially. The data preparation process consists of exporting the sensor data using a viewer app; mapping the sensor data to images by closest time stamp or distance, so that each image is tagged with its geo-location and milepost in the DOT linear referencing system; using the 2-lane shapefile to select 2-lane-only images; and joining the left, front, and right view images to facilitate batch model inference. The prepared, ready-to-use images are located in /projects/ncdot/NC_2018_Secondary/images, with sub-directories d4, d8, d13, and d14 for the corresponding divisions.
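The step of mapping sensor data to images by closest time stamp can be sketched as a nearest-neighbor lookup over sorted sensor times. This is an illustration under stated assumptions, not the project's actual preparation code: the function names, the record fields (`time`, `lat`, `lon`, `milepost`), and the use of plain numeric time stamps are all hypothetical, and matching by distance is not shown.

```python
from bisect import bisect_left


def nearest_sensor_index(sensor_times, image_time):
    """Return the index of the sensor time closest to image_time.

    sensor_times must be sorted in ascending order and non-empty.
    On an exact tie, the earlier sensor record wins.
    """
    if not sensor_times:
        raise ValueError("sensor_times must not be empty")
    pos = bisect_left(sensor_times, image_time)
    if pos == 0:
        return 0
    if pos == len(sensor_times):
        return len(sensor_times) - 1
    before, after = sensor_times[pos - 1], sensor_times[pos]
    return pos if (after - image_time) < (image_time - before) else pos - 1


def tag_images(image_times, sensor_records):
    """Pair each image time with the sensor record nearest in time.

    sensor_records is a list of dicts sorted by their hypothetical "time" key.
    """
    times = [record["time"] for record in sensor_records]
    return [sensor_records[nearest_sensor_index(times, t)] for t in image_times]


if __name__ == "__main__":
    # Made-up sensor records, for illustration only.
    records = [
        {"time": 0.0, "lat": 35.00, "lon": -78.00, "milepost": 0.00},
        {"time": 1.0, "lat": 35.01, "lon": -78.01, "milepost": 0.02},
        {"time": 2.0, "lat": 35.02, "lon": -78.02, "milepost": 0.04},
    ]
    for image_time, record in zip([0.4, 1.7], tag_images([0.4, 1.7], records)):
        print(image_time, "->", record["milepost"])
```

Binary search keeps each lookup logarithmic in the number of sensor records, which matters when a division holds many images.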