# Requirements

Loop through the CSV files in the `raw_files` folder.

For each file:
* Create a folder named after the file.
* Find each pair of event markers, then save the rows from the first marker through the second in a separate file named after the event.
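
The pairing rule above can be sketched on a plain list of event strings, without pandas. This is only an illustration: the helper name `find_event_spans` is hypothetical, and the `_start`/`_stop` suffixes in the sample markers are an assumption about how the raw files label the two ends of an event.

```python
MARKER = "!E TRIAL_EVENT_VAR"

def find_event_spans(events):
    """Return (name, start_row, end_row) for each consecutive pair of marker rows."""
    # positions of every row whose event text contains the marker
    marker_rows = [i for i, e in enumerate(events)
                   if isinstance(e, str) and MARKER in e]
    spans = []
    # pair markers as (0th, 1st), (2nd, 3rd), ...; an unpaired trailing marker is dropped
    for start, end in zip(marker_rows[0::2], marker_rows[1::2]):
        # strip the marker prefix, then everything after the first underscore
        name = events[start].replace(MARKER, "").strip().split("_")[0]
        spans.append((name, start, end))
    return spans

# made-up sample data; the suffixes are assumed, not taken from a real file
events = [
    None,
    "!E TRIAL_EVENT_VAR normal 1_start",
    None,
    None,
    "!E TRIAL_EVENT_VAR normal 1_stop",
    None,
    "!E TRIAL_EVENT_VAR hazard 2_start",
    "!E TRIAL_EVENT_VAR hazard 2_stop",
]
print(find_event_spans(events))
```

Each tuple gives the event name plus the first and last row to save, both inclusive.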


## Output location

Save everything in a folder named `parsed_files`. The directory tree should look like this:
* raw_files
    * sample00.csv
    * sample01.csv
    * sample02.csv
    * ...
* parsed_files
    * sample00
        * event00.csv
        * event01.csv
        * event02.csv
        * ...
    * sample01
        * event00.csv
        * event01.csv
        * event02.csv
        * ...
    * ...
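
One way to derive an output path that follows this layout is with `pathlib`. The sketch below is illustrative only: `output_path` is a hypothetical helper, and it assumes the event name has already been extracted.

```python
from pathlib import Path

def output_path(src_file, event_name, dst_root="parsed_files"):
    """Build <dst_root>/<source file name without extension>/<event_name>.csv."""
    return Path(dst_root) / Path(src_file).stem / f"{event_name}.csv"

path = output_path("raw_files/sample00.csv", "event00")
print(path.as_posix())
```

`Path.stem` drops only the final extension, so a file name that happens to contain `.csv` elsewhere is left intact.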

# Script

Import the `os` and `pandas` modules.

In [1]:
import os
import pandas as pd

Define the paths of the source and destination directories.

In [2]:
SRC_PATH = "raw_files/"

DST_PATH = "parsed_files/"
# create the destination directory if it does not exist yet
os.makedirs(DST_PATH, exist_ok=True)

Loop through the raw files and save the rows of each event to its own CSV file.

In [3]:
# loop through the files in raw_files
for file in os.listdir(SRC_PATH):
    
    # skip non-CSV files
    if file.endswith(".csv"):

        # create a folder named after the file (without its extension)
        folder = DST_PATH + os.path.splitext(file)[0] + "/"
        os.makedirs(folder, exist_ok=True)

        # log which file is being read
        print("------------------------------")
        print("Reading " + file)

        # extract the file data and save it to a dataframe
        df = pd.read_csv(SRC_PATH + file)

        # get the indices of the rows where Event contains !E TRIAL_EVENT_VAR
        indices = df.index[df["Event"].str.contains("!E TRIAL_EVENT_VAR", na=False)].tolist()
        
        # loop through every other element of indices (i.e. 0, 2, 4, etc.);
        # stopping at len(indices) - 1 skips a trailing marker that has no partner
        for i in range(0, len(indices) - 1, 2):
            
            # get start and end indices
            start_index = indices[i]
            end_index = indices[i + 1]
            
            # extract file name from the event name on the start row
            filename = df["Event"][start_index]
            # remove !E TRIAL_EVENT_VAR from file name
            filename = filename.replace("!E TRIAL_EVENT_VAR ", "")
            # remove start/stop from file name
            filename = filename.split("_")[0]
            
            print(f"- Saving rows {start_index} to {end_index} to {filename}.csv")
            
            parsed_df = df[start_index:end_index + 1]
            parsed_df.to_csv(folder + filename + ".csv", index=False)

------------------------------
Reading sample02.csv
- Saving rows 27538 to 28027 to normal 1.csv
- Saving rows 38609 to 39138 to nonhazard 2.csv
- Saving rows 53319 to 53706 to hazard 3.csv
- Saving rows 66680 to 67318 to nonhazard 4.csv
- Saving rows 87903 to 88274 to hazard 5.csv
- Saving rows 100748 to 101267 to normal 6.csv
- Saving rows 115651 to 116022 to hazard 7.csv
- Saving rows 124820 to 125283 to normal 8.csv
- Saving rows 135281 to 135783 to nonhazard 9.csv
------------------------------
Reading sample01.csv
- Saving rows 19069 to 19590 to normal 1.csv
- Saving rows 31214 to 31833 to nonhazard 2.csv
- Saving rows 49877 to 50252 to hazard 3.csv
- Saving rows 67513 to 68124 to nonhazard 4.csv
- Saving rows 85214 to 85549 to hazard 5.csv
- Saving rows 109236 to 109647 to normal 6.csv
- Saving rows 125941 to 126311 to hazard 7.csv
- Saving rows 138739 to 139211 to normal 8.csv
- Saving rows 150974 to 151599 to nonhazard 9.csv
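
To check the slicing logic without touching the file system, the core of the loop can be pulled into a function and run on a small in-memory DataFrame. This is a sketch under assumptions, not the notebook's own code: `split_events` is a hypothetical name, the `_start`/`_stop` suffixes in the demo data are assumed, and rows are addressed by position with `iloc`.

```python
import pandas as pd

MARKER = "!E TRIAL_EVENT_VAR"

def split_events(df, column="Event"):
    """Map each event name to the rows from its first marker to its second, inclusive."""
    # boolean mask of marker rows; regex=False treats MARKER as a literal string
    is_marker = df[column].str.contains(MARKER, na=False, regex=False)
    positions = [i for i, flag in enumerate(is_marker) if flag]
    chunks = {}
    # pair consecutive markers; an unpaired trailing marker is ignored
    for start, end in zip(positions[0::2], positions[1::2]):
        name = df[column].iloc[start].replace(MARKER, "").strip().split("_")[0]
        chunks[name] = df.iloc[start:end + 1]
    return chunks

# made-up demo data; the suffixes are an assumption
demo = pd.DataFrame({
    "Time": range(6),
    "Event": [None, "!E TRIAL_EVENT_VAR normal 1_start", None,
              "!E TRIAL_EVENT_VAR normal 1_stop", None, None],
})
chunks = split_events(demo)
print({name: len(rows) for name, rows in chunks.items()})
```

Returning a dictionary keeps the parsing separate from the writing, so each chunk can then be saved with `to_csv` as in the loop above.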
