How to parse csv files

Mirco Dilly edited this page May 21, 2021 · 1 revision

Parsing CSV with RDPlot

Basic Structure

CSV (comma-separated values; .csv) files are widely used for exchanging data between applications. Despite the name, the separator does not have to be a comma; the example below uses semicolons. Each line represents one data point, and the separator divides each data point into its individual data values. The first line (header) specifies the names of the data values (keys). An example file is given below:

Sequence;QP;Bitrate;Y-PSNR;U-PSNR;V-PSNR
RiverByBoat;20;251845.108;39.97341217;41.39445233;43.34404117
RiverByBoat;21;211400.5248;39.2529095;41.025804;42.9842895
RiverByBoat;22;176904.8104;38.57324717;40.6913925;42.63936983
RiverByBoat;23;134663.8776;37.58021617;40.341812;42.268973
WaterFallWide;20;296302.8344;39.85129667;43.190307;46.024819
WaterFallWide;21;260036.6816;39.15466433;42.90219233;45.8539285
WaterFallWide;22;228621.7864;38.48426617;42.62523283;45.69569167
WaterFallWide;23;187343.348;37.4559095;42.33932283;45.5458655
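
To make the header/data-point structure concrete, the following minimal sketch reads the first lines of the example with Python's standard csv module and a semicolon delimiter. It only illustrates the file layout and is not RDPlot's own parser; the names EXAMPLE and data_points are chosen here for illustration.

```python
import csv
import io

# The header and the first two data points of the example file above,
# embedded as a string so that the sketch runs without a file on disk.
EXAMPLE = (
    "Sequence;QP;Bitrate;Y-PSNR;U-PSNR;V-PSNR\n"
    "RiverByBoat;20;251845.108;39.97341217;41.39445233;43.34404117\n"
    "RiverByBoat;21;211400.5248;39.2529095;41.025804;42.9842895\n"
)

# DictReader takes the keys from the header line and maps every value of a
# data point to its key. All values are returned as strings.
reader = csv.DictReader(io.StringIO(EXAMPLE), delimiter=";")
data_points = list(reader)

print(reader.fieldnames)
print(data_points[0]["Sequence"], data_points[0]["QP"])
```

Numeric columns such as Bitrate would still need an explicit float() conversion before plotting.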

Parsing CSV files

RDPlot treats a file as a CSV file if its name ends with .csv. In this case the file, which consists of N lines, is stored line by line as strings in a variable lines, indexed from lines[0] to lines[N-1]. The first line, lines[0], is the header, whose columns hold the names of the data entries (keys); each remaining line is one data point. For the example above, the header looks like this:

print(lines[0])
> 'Sequence;QP;Bitrate;Y-PSNR;U-PSNR;V-PSNR\n'
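
Given such a lines variable, the keys can be recovered by splitting the header at the separator. The sketch below assumes the semicolon separator of the example; split_line is a hypothetical helper for illustration, not a function of RDPlot.

```python
# Two lines of the example file, as they might be held in `lines`.
lines = [
    "Sequence;QP;Bitrate;Y-PSNR;U-PSNR;V-PSNR\n",
    "RiverByBoat;20;251845.108;39.97341217;41.39445233;43.34404117\n",
]


def split_line(line, separator=";"):
    """Strip the trailing line break and split a line into its values."""
    return line.rstrip("\r\n").split(separator)


# The header yields the keys; pairing them with the values of a later line
# gives one data point as a dictionary of strings.
keys = split_line(lines[0])
first_point = dict(zip(keys, split_line(lines[1])))

print(keys)
print(first_point["Bitrate"])
```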

The config specification is assumed to be contained in the file name, so the parser tries to extract the config value from the path value. In our example, where the config is the file name without the .csv extension, the path and config values are:

print(path)
> '/PATH_TO_WORKSPACE/results/HDR-HLG_HM-16.22.csv'

print(config)
> 'HDR-HLG_HM-16.22'
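
One way to obtain this config value from the path is to drop the directory part and the extension. The sketch below uses Python's os.path for this; config_from_path is a hypothetical name, and RDPlot's actual extraction may differ in detail.

```python
import os


def config_from_path(path):
    """Return the file name without its directory and extension."""
    file_name = os.path.basename(path)
    # splitext splits at the last dot only, so the dots inside the version
    # number (e.g. "16.22") stay part of the config.
    config, _extension = os.path.splitext(file_name)
    return config


path = "/PATH_TO_WORKSPACE/results/HDR-HLG_HM-16.22.csv"
config = config_from_path(path)
print(config)
```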

After the parser has finished reading the file, RDPlot shows a tree view of the input data in the Sequences-Section, to the left of the Plot-Area. The tree has one level each for the sequence, the config, and the QP. For our example, the tree view would look similar to this:

CSVLog
├── RiverByBoat
│   └── HDR-HLG_HM-16.22
│       ├── 20
│       ├── 21
│       ├── 22
│       └── 23
└── WaterFallWide
    └── HDR-HLG_HM-16.22
        ├── 20
        ├── 21
        ├── 22
        └── 23

If more than one file is parsed, the structure of the tree view changes slightly. Because the file name carries the config information, a sequence that appears in several files gets one config entry per file. For example, if a second file (/PATH_TO_WORKSPACE/results/HDR-HLG_VTM-11.0.csv), which holds results for the same sequences under a different config, is parsed, the tree view in the Sequences-Section updates to this:

CSVLog
├── RiverByBoat
│   ├── HDR-HLG_HM-16.22
│   │   ├── 20
│   │   ├── 21
│   │   ├── 22
│   │   └── 23
│   └── HDR-HLG_VTM-11.0
│       ├── 20
│       ├── 21
│       ├── 22
│       └── 23
└── WaterFallWide
    ├── HDR-HLG_HM-16.22
    │   ├── 20
    │   ├── 21
    │   ├── 22
    │   └── 23
    └── HDR-HLG_VTM-11.0
        ├── 20
        ├── 21
        ├── 22
        └── 23
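
The grouping behind this tree view (sequence, then config, then QP) can be sketched with a nested dictionary. This is an assumption about the resulting structure only, not a description of how RDPlot builds its tree internally; add_to_tree is a hypothetical helper, and the input lines are shortened stand-ins that contain only the Sequence and QP columns.

```python
def add_to_tree(tree, config, lines, separator=";"):
    """Insert the data points of one parsed file into a nested dictionary."""
    keys = lines[0].strip().split(separator)
    for line in lines[1:]:
        if not line.strip():
            continue  # ignore empty lines, e.g. at the end of a file
        point = dict(zip(keys, line.strip().split(separator)))
        # Level 1: sequence, level 2: config, level 3: list of QP values.
        configs = tree.setdefault(point["Sequence"], {})
        configs.setdefault(config, []).append(point["QP"])
    return tree


# Stand-in content for the two files, reduced to the columns needed here.
lines = [
    "Sequence;QP\n",
    "RiverByBoat;20\n",
    "RiverByBoat;21\n",
    "WaterFallWide;20\n",
    "WaterFallWide;21\n",
]

tree = {}
add_to_tree(tree, "HDR-HLG_HM-16.22", lines)
add_to_tree(tree, "HDR-HLG_VTM-11.0", lines)

for sequence, configs in tree.items():
    print(sequence)
    for config, qps in configs.items():
        print("   ", config, qps)
```

Because the sequence is the outermost key, a second file with the same sequences adds a new config entry under each existing sequence instead of creating a new top-level branch.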