# CSV Parser Testing and Validation
---
This notebook tests the CSV parser on valid inputs (both weighted and unweighted) and on invalid inputs with issues such as inconsistent rows and incorrect row lengths.

The following cell imports the necessary modules and adds 'src' to the import path so that modules within the package can be accessed.

## Testing the parser's edge cases and error handling


In [1]:
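import sys
import os

# `from src import csv_parser` needs the directory that contains `src` on the
# import path, so the parent directory is added alongside `src` itself.
sys.path.append(os.path.abspath(".."))
sys.path.append(os.path.abspath("../src"))
from src import csv_parser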

The following cell parses a test file with comments scattered throughout. The parser should silently skip the comments and return only the intervals and a flag indicating whether the data set is weighted.

In [2]:
filename = "testFile_bug_comments.csv"
intervals, isWeighted = csv_parser.parse_csv(filename)
print(f"Intervals: {intervals}")
print(f"Is weighted: {isWeighted}")

Intervals: [(1, 2), (2, 3), (3, 4)]
Is weighted: False
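
For readers without the parser source at hand, the comment-skipping behaviour described above can be sketched as follows. This is only an illustration, not the code in `csv_parser`: the helper name `strip_comment_lines` is hypothetical, and the sketch assumes that a comment is a line starting with `#`, which the test file may or may not use.

```python
import csv
import io


def strip_comment_lines(lines, marker="#"):
    """Yield only the lines that are not comments.

    Assumption: a comment is a line whose first non-blank character is
    `marker`. The real parser may follow a different convention.
    """
    for line in lines:
        if not line.lstrip().startswith(marker):
            yield line


# In-memory stand-in for a CSV file with comments scattered throughout.
sample = io.StringIO("# header comment\n1,2\n# mid-file comment\n2,3\n3,4\n")

# csv.reader accepts any iterable of strings, so the filter can sit in front of it.
rows = [tuple(int(value) for value in row) for row in csv.reader(strip_comment_lines(sample))]
print(rows)  # [(1, 2), (2, 3), (3, 4)]
```

Filtering before the rows reach `csv.reader` keeps the comment rule separate from the row validation that follows.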


The following cell parses a test file with empty rows scattered throughout. The parser should silently skip the empty rows and return only the intervals and a flag indicating whether the data set is weighted.

In [3]:
filename = "testFile_bug_emptyRows.csv"
intervals, isWeighted = csv_parser.parse_csv(filename)
print(f"Intervals: {intervals}")
print(f"Is weighted: {isWeighted}")

Intervals: [(1, 2), (2, 3), (3, 4)]
Is weighted: False
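
A minimal sketch of how empty rows can be skipped with the standard `csv` module is shown here. It is an assumption about one reasonable approach, not a description of how `csv_parser` does it, and the in-memory sample is made up for illustration.

```python
import csv
import io

# In-memory stand-in for a CSV file with an empty line and a whitespace-only line.
sample = io.StringIO("1,2\n\n2,3\n   \n3,4\n")

intervals = []
for row in csv.reader(sample):
    # csv.reader yields [] for a completely empty line; a line holding only
    # whitespace comes back as a single blank field, so both cases are checked.
    if not any(field.strip() for field in row):
        continue
    intervals.append(tuple(int(field) for field in row))

print(intervals)  # [(1, 2), (2, 3), (3, 4)]
```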


The following cell parses a test file with inconsistent rows. A CSV file must contain either only start and end times for every interval, or start, end, and weight for every interval. The parser should raise a ValueError with the message "Inconsistency Detected: All rows must be either 2 (unweighted) or 3 (weighted).", which the cell catches and prints.


In [4]:
filename = "testFile_bug_inconsistentRows.csv"
try:
    intervals, isWeighted = csv_parser.parse_csv(filename)
    print("Intervals:", intervals)
    print("Is weighted:", isWeighted)
except ValueError as error:
    print("Caught ValueError:\n", error)

Caught ValueError:
 Inconsistency Detected: All rows must be either 2 (unweighted) or 3 (weighted).
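
The consistency rule can be sketched as a small check over the row widths. The helper `check_row_widths` is hypothetical and is not taken from `csv_parser`; only the error message is copied from the output above.

```python
def check_row_widths(rows):
    """Return True for weighted data, False for unweighted data.

    Hypothetical helper: raises ValueError when the rows are not all of
    width 2 or all of width 3. An empty input also raises in this sketch;
    the real parser may treat that case differently.
    """
    widths = {len(row) for row in rows}
    if widths == {2}:
        return False
    if widths == {3}:
        return True
    raise ValueError(
        "Inconsistency Detected: All rows must be either 2 (unweighted) or 3 (weighted)."
    )


print(check_row_widths([(1, 2), (2, 3)]))        # False
print(check_row_widths([(1, 2, 5), (2, 3, 1)]))  # True
try:
    check_row_widths([(1, 2), (2, 3, 1)])
except ValueError as error:
    print("Caught ValueError:", error)
```

Collecting the widths into a set makes the rule a single comparison, at the cost of not reporting which row caused the mismatch.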


The following cell parses a test file in which some rows have the wrong number of values. Each row should have either 2 columns (start and end times) or 3 columns (start, end, and weight). Rows that do not are skipped with a warning, and the parser should still return the remaining intervals and whether the data set is weighted.

In [5]:
filename = "testFile_bug_rowLength.csv"
intervals, isWeighted = csv_parser.parse_csv(filename)
print(f"Intervals: {intervals}")
print(f"Is weighted: {isWeighted}")

Intervals: [(1, 2), (3, 4), (5, 6)]
Is weighted: False
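
One way to skip malformed rows while still reporting them is the standard `warnings` module. The sketch below assumes that any row with a width other than 2 or 3 counts as malformed; the helper `drop_malformed_rows`, the warning text, and the sample rows are hypothetical and not taken from `csv_parser`.

```python
import warnings


def drop_malformed_rows(rows):
    """Keep rows with 2 or 3 values and warn about every other row."""
    kept = []
    for row_number, row in enumerate(rows, start=1):
        if len(row) in (2, 3):
            kept.append(row)
        else:
            warnings.warn(
                f"Skipping row {row_number}: expected 2 or 3 values, got {len(row)}"
            )
    return kept


rows = [(1, 2), (9,), (3, 4), (5, 6, 7, 8), (5, 6)]

# Record the warnings so they can be inspected instead of printed to stderr.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    kept = drop_malformed_rows(rows)

print(kept)         # [(1, 2), (3, 4), (5, 6)]
print(len(caught))  # 2
```

Using `warnings.warn` rather than `print` lets a caller silence, record, or escalate the message, which is also why a warning may not appear in a notebook's standard output.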


## Demonstrating weighted and unweighted data outputs

The following cell parses two test files, one weighted and one unweighted. The output should show the intervals, with the weight as the third value of each tuple for the weighted data, and correctly report whether each data set is weighted.

In [7]:
filename = "testFile_weighted.csv"
intervals, isWeighted = csv_parser.parse_csv(filename)
print(f"Intervals: {intervals}")
print(f"Is weighted: {isWeighted}")

filename = "testFile_unweighted.csv"
intervals, isWeighted = csv_parser.parse_csv(filename)
print(f"Intervals: {intervals}")
print(f"Is weighted: {isWeighted}")

Intervals: [(1, 2, 1), (2, 3, 2), (3, 4, 3), (5, 6, 4)]
Is weighted: True
Intervals: [(1, 2), (3, 4), (5, 6), (7, 8)]
Is weighted: False
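
The printed results above can also be checked programmatically rather than by eye. The following sketch uses a hypothetical helper, `result_is_consistent`, that verifies the weighted flag against the shape of the returned tuples; the sample data is copied from the output above.

```python
def result_is_consistent(intervals, is_weighted):
    """Return True if every interval has the width implied by the flag.

    Hypothetical check: weighted intervals are (start, end, weight),
    unweighted intervals are (start, end).
    """
    expected_width = 3 if is_weighted else 2
    return all(len(interval) == expected_width for interval in intervals)


weighted = [(1, 2, 1), (2, 3, 2), (3, 4, 3), (5, 6, 4)]
unweighted = [(1, 2), (3, 4), (5, 6), (7, 8)]

print(result_is_consistent(weighted, True))     # True
print(result_is_consistent(unweighted, False))  # True
print(result_is_consistent(weighted, False))    # False
```

In a notebook, wrapping such a call in `assert` makes a regression fail loudly instead of relying on a visual comparison of the output.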
