# Requirements

Loop through the files for trials 1, 2, and 6 in the `parsed_files` folder (i.e. files named xx_01, xx_02, and xx_06).
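
The file selection above can be sketched with the standard library alone. This is a minimal sketch, not the notebook's own method: it assumes names of the form `xx_01.csv` (the `.csv` extension is an assumption), and `trial_of` and `WANTED_TRIALS` are hypothetical names introduced only for illustration.

```python
import os

# Trials named in the requirements (hypothetical constant name).
WANTED_TRIALS = {"01", "02", "06"}

def trial_of(file_name):
    """Return the text after the last underscore in the file stem.

    Assumes names like 'xx_01.csv'; a name without an underscore
    returns the whole stem, which will not match any wanted trial.
    """
    stem, _ext = os.path.splitext(file_name)
    return stem.rsplit("_", 1)[-1]

# Made-up file names, used only to exercise the filter.
names = ["ab_01.csv", "ab_03.csv", "ab_06.csv", "notes.txt"]
selected = [name for name in names if trial_of(name) in WANTED_TRIALS]
```

Splitting on the underscore rather than slicing fixed character positions keeps the check working if the prefix is longer than two characters.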

For files where `Trial number = 01`, get the 1st intersection (i.e. the 1st run of consecutive rows where `inIntersection = True`), plus the rows from 4 seconds before its first row through 4 seconds after its last row.

For files where `Trial number = 02`, get the 2nd intersection (i.e. the 2nd run of consecutive rows where `inIntersection = True`), plus the rows from 4 seconds before its first row through 4 seconds after its last row.

For files where `Trial number = 06`, get the 3rd intersection (i.e. the 3rd run of consecutive rows where `inIntersection = True`), plus the rows from 4 seconds before its first row through 4 seconds after its last row.
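
To make the three rules concrete, here is a rough sketch on plain Python lists rather than a dataframe. It assumes `Time` is in seconds and increases down the file. `TRIAL_TO_INTERSECTION`, `true_runs`, and `select_window` are hypothetical names, and the toy data is invented for illustration.

```python
# 0-based position of the intersection to keep for each trial (hypothetical name).
TRIAL_TO_INTERSECTION = {"01": 0, "02": 1, "06": 2}

def true_runs(flags):
    """Return (first, last) row positions for each run of consecutive True values."""
    runs = []
    start = None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i                     # a new run begins here
        elif not flag and start is not None:
            runs.append((start, i - 1))   # the run ended on the previous row
            start = None
    if start is not None:                 # a run that reaches the last row
        runs.append((start, len(flags) - 1))
    return runs

def select_window(times, flags, trial, pad=4.0):
    """Return the row positions within `pad` seconds of the trial's intersection."""
    first, last = true_runs(flags)[TRIAL_TO_INTERSECTION[trial]]
    start_time = times[first] - pad
    end_time = times[last] + pad
    return [i for i, t in enumerate(times) if start_time <= t <= end_time]

# Toy data: one row per second, intersections at rows 5-6 and rows 12-13.
times = [float(i) for i in range(20)]
flags = [i in (5, 6, 12, 13) for i in range(20)]
rows_trial_01 = select_window(times, flags, "01")
rows_trial_02 = select_window(times, flags, "02")
```

With this toy data, trial 01 keeps rows 1 through 10 and trial 02 keeps rows 8 through 17, i.e. each intersection plus 4 seconds on either side.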

## Output location

Save each file's output to a CSV file with the same file name in a folder named `parsed_intersections`.

# Script

Import the `os`, `pandas`, and `more_itertools` modules.

In [None]:
import os
import pandas as pd
import more_itertools as mit

Define the paths of the source and destination directories.

In [None]:
src_path = "parsed_files/"

dst_path = "parsed_intersections/"
# create the destination folder if it does not already exist
os.makedirs(dst_path, exist_ok=True)

In [None]:
# loop through the files in parsed_files
for file in os.listdir(src_path):
    
    # check if the trial sequence of the file is either 01, 02, or 06
    if file[3:-4] in ["01", "02", "06"]:
        
        # log which file is being read
        print("------------------------------")
        print("Reading " + file)
        
        # extract the file data and save it to a dataframe
        df = pd.read_csv(src_path + file, engine="python")
        
        # get indices of rows where inIntersection = True
        indices = df.index[df["inIntersection"] == True]
        
        # identify 1st, 2nd, and 3rd intersections
        # by grouping consecutive indices
        indices = [list(group) for group in mit.consecutive_groups(indices)]
        
        # check if trial = 01
        if file[3:-4] == "01":
            # get 1st intersection
            indices = indices[0]
        
        # check if trial = 02
        elif file[3:-4] == "02":
            # get 2nd intersection
            indices = indices[1]
        
        # check if trial = 06
        elif file[3:-4] == "06":
            # get 3rd intersection
            indices = indices[2]
            
        # get the times 4 seconds before the 1st row and 4 seconds after the last row
        # (indices holds index labels, so use label-based .loc)
        start_time = df.loc[indices[0], "Time"] - 4
        end_time = df.loc[indices[-1], "Time"] + 4
        
        # get the rows between the start and end times
        parsed_data = df[(df["Time"] >= start_time) & (df["Time"] <= end_time)]
        
        # save the output
        parsed_data.to_csv(dst_path + file, index=False)
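
The loop above raises an `IndexError` if a file has fewer intersections than its trial expects. One possible guard, sketched here with only the standard library, groups consecutive indices with `itertools.groupby` and returns `None` when the requested group is missing. `group_consecutive` and `nth_group` are hypothetical helper names, not part of `more_itertools` or pandas.

```python
from itertools import groupby

def group_consecutive(indices):
    """Split sorted integer indices into lists of consecutive values."""
    groups = []
    # Within a consecutive run, (value - position) stays constant,
    # so it can serve as the grouping key.
    for _, run in groupby(enumerate(indices), key=lambda pair: pair[1] - pair[0]):
        groups.append([value for _, value in run])
    return groups

def nth_group(indices, n):
    """Return the n-th (0-based) consecutive group, or None if there are too few."""
    groups = group_consecutive(indices)
    return groups[n] if n < len(groups) else None

# Made-up row indices with three separate runs.
example = [3, 4, 5, 9, 10, 20]
groups = group_consecutive(example)
missing = nth_group(example, 3)  # there is no 4th run
```

In the loop, a `None` result could be used to log the file name and `continue` instead of stopping the whole run.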