
### This notebook was designed to analyze data for Preferential Viewing Experiments.

### Instructions for Use:
1. Open the script in your Python environment (e.g., VS Code, Jupyter Notebook).
2. Make sure you have installed all the required libraries (listed in the first code cell) in your environment.
3. Run the script. It automatically processes all the participant files in the data folder, applies the necessary functions, and saves the results in the analysis_new folder.

### What it does:
1. Creates the output directory
The script creates the folder where the analysis results are saved. If the folder already exists, the script tells you so instead of creating it.
2. Finds the data for each participant
The script looks in the folder where the data files are stored, finds all subfolders (one per participant), and processes the file in each one.
3. Reads and prepares each participant's data
- The script reads the data file whose name ends with _record_extra.csv.
- It keeps only the rows where the event is target_on.
- It adds extra information to the data: the padding (the extra space around each area of interest), the participant ID, and the dimensions of the images used in the experiment.
4. Adds stimulus details
The script adds details about the images shown during the experiment: their paths, coordinates, and bounding boxes (which define the areas of interest on the screen).
Each image shown to participants is labeled as either "left" or "right".
5. Runs the data processing
- plot2d(): Draws a scatter plot of the raw gaze and fixation data for each condition. It can optionally draw the stimulus images as well; these must be in the folder specified in the dataframe. This call is commented out in the main cell by default.
- getFixationLatency(): Determines when each fixation started relative to the target event.
- handle_carryover_fixations_and_merge(): Fixations sometimes start before the event of interest or end after it. This function corrects the fixation latency in those cases.
- addAOI(): Assigns each fixation to one of the predefined areas of interest.
6. Combines and saves the output
Once the data for a participant is processed, it is added to a list.
After all participants are processed, the script combines the data into a single file, saves it as allSubjects_PV_Young.csv in the analysis_new folder, and prints a message indicating where the file was saved.
If no data was processed, the script prints "No data was processed."
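
The latency step described above can be pictured with a small sketch on invented data. This is not the implementation of getFixationLatency(); it only shows the idea of measuring each fixation's start time relative to target onset. The column name FixStartTime is hypothetical, and the timestamps are assumed to be in milliseconds.

```python
import pandas as pd

# Toy data: three fixations in one trial; the target appeared at t = 1000.
# 'FixStartTime' is a hypothetical column name used only for this sketch.
toy = pd.DataFrame({
    'FixStartTime': [950, 1200, 1800],
    'targSampTime': [1000, 1000, 1000],
})

# Latency of each fixation relative to target onset
toy['FixLatency'] = toy['FixStartTime'] - toy['targSampTime']

# A negative latency means the fixation began before the target appeared,
# i.e. it was carried over from the preceding event
toy['startedBeforeTarget'] = toy['FixLatency'] < 0
print(toy)
```

Fixations flagged this way are the kind of case that handle_carryover_fixations_and_merge() is meant to deal with; how the package actually corrects them is not shown here.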


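### The notebook saves the following in the analysis_new folder
1. Plots of all trials (only when the plot2d() call is uncommented and run with save=True)
2. Fixation dataframe for all subjects (allSubjects_PV_Young.csv)
3. Dataframe with the calculated novelty indices (allSubjects_NoveltyIndex.csv)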

### Adjusting the AOIs (bounding boxes) 
1. By default, the AOIs match the image dimensions.
2. The $padding$ variable (set through PADDING in the main cell, in pixels) expands each AOI on every side. This also affects plotting.
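
As a rough illustration of what padding does, the sketch below grows a bounding box by a fixed number of pixels on each side and tests whether a fixation lands inside it. The helper names expand_bbox and point_in_bbox are hypothetical, and the sketch assumes boxes are stored as [x, y, width, height] with (x, y) as the top-left corner. The actual addAOI() implementation may use a different convention (for example, center-based coordinates).

```python
def expand_bbox(bbox, padding):
    """Return a copy of bbox grown by `padding` pixels on every side.

    Assumes bbox = [x, y, width, height] with (x, y) the top-left corner.
    """
    x, y, w, h = bbox
    return [x - padding, y - padding, w + 2 * padding, h + 2 * padding]


def point_in_bbox(px, py, bbox):
    """Check whether the point (px, py) falls inside bbox (edges included)."""
    x, y, w, h = bbox
    return x <= px <= x + w and y <= py <= y + h


# A 480 x 480 image whose top-left corner is at (100, 200)
image_box = [100, 200, 480, 480]
padded_box = expand_bbox(image_box, padding=20)

# A fixation 10 px to the left of the image misses the unpadded AOI
# but lands inside the padded one
fixation = (90, 300)
hit_unpadded = point_in_bbox(*fixation, image_box)
hit_padded = point_in_bbox(*fixation, padded_box)
print(padded_box, hit_unpadded, hit_padded)
```

With a padding of 0 the box is unchanged, which corresponds to the default behavior described above.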

### Important Columns
1. $user\_pred\_px\_x, user\_pred\_px\_y$: raw gaze coordinates
2. $FixXPos, FixYPos$: x,y position of fixations
3. $FixStartEnd$: indicates whether the fixation was carried over the event boundaries or not
4. $DistFromPrevFix$: distance from the previous fixation in px (handy variable)
5. $PrevFixSampTime$: timestamp of the previous fixation (handy variable)
6. $PrevFixXPos, PrevFixYPos$: x,y position of the preceding fixation (handy variable)
7. $FixLatency$: the latency of the fixation relative to when the target was presented
8. $FixationOrder$: the order of the fixation within the event
9. $FixDur$: the duration of the fixation
10. $AOI\_bbox$: the number of the bounding box in which the fixation landed, or None
11. $AOI\_stim$: the stimulus on which the fixation landed
12. $event$: the event during which the data is analyzed, usually *target_on*
13. $targSampTime$: timestamp of when the target was presented

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
import os
import sys
from matplotlib.widgets import Button
import ast   
import re
import matplotlib.patches as patches
import matplotlib.image as mpimg  
import matplotlib.colors as mcolors

# import DeepEye analysis functions
from deepeye_analysis_package.getFixations import batch_extract_fixations
from deepeye_analysis_package.preprocessing import getFixationLatency, handle_carryover_fixations_and_merge, addAOI
from deepeye_analysis_package.plotting import plot2d

### Main part

In [None]:
# Determine the environment ('Home' or 'Office') and set the data path accordingly
WHERE = 'Office'  # 'Office' or 'Home'

if WHERE == 'Home':
    path = r'C:/Users/aby600/Dropbox/Appliedwork/CognitiveSolutions/Projects/DeepEye/TechnicalReports/TechnicalReport1/Test_PreferentialViewing/Pilot_PreferentialViewing/Young/Approved'
else:
    path = r'D:/Dropbox/Appliedwork/CognitiveSolutions/Projects/DeepEye/TechnicalReports/TechnicalReport1/Test_PreferentialViewing/Pilot_PreferentialViewing/Young/Approved'

# Define the AOI padding in pixels
PADDING = 0  # padding of AOI on each side, used in plot2d() and addAOI()

# Helper function to create a directory if it doesn't exist
def create_directory_if_not_exists(directory_path):
    if not os.path.exists(directory_path):
        os.makedirs(directory_path)
        print(f"Directory '{directory_path}' was created.")
    else:
        print(f"Directory '{directory_path}' already exists.")

# Define data analysis directories and create them if they don't exist yet
path_to_data = os.path.join(path, 'data')
path_to_analysis = os.path.join(path, 'analysis_new')
create_directory_if_not_exists(path_to_analysis)

# Initialize an empty list to hold the processed dataframes
output_dfs = []

# Get all folder names from the data directory
folder_names = [name for name in os.listdir(path_to_data) if os.path.isdir(os.path.join(path_to_data, name))]

# Process each participant's data
for fn in folder_names:
    path_to_file = os.path.join(path_to_data, fn, f'{fn}_record_extra.csv')

    print(f'Processing participant {fn}...')

    try:
        df = pd.read_csv(path_to_file)
    except FileNotFoundError:
        print(f'File does not exist: {path_to_file}')
        continue

    # Filter data to only include rows where the target was presented
    df1 = df[df['event'] == 'target_on'].copy()

    # Add padding, subject ID, and image dimensions to the dataframe
    df1['padding'] = PADDING
    df1['deepeye_id'] = fn
    df1['imageDims'] = [(480, 480)] * len(df1)

    # Add image paths and coordinates to the dataframe
    df1['image_paths'] = df1.apply(lambda row: [row.imageLeft, row.imageRight], axis=1)
    df1['image_coords'] = df1.apply(lambda row: [
        (row.leftX, row.Y, row.imageDims[0], row.imageDims[1]),
        (row.rightX, row.Y, row.imageDims[0], row.imageDims[1])
    ], axis=1)

    # Add bounding boxes and their names to the dataframe
    df1['bboxes'] = df1.apply(lambda row: [
        [row.leftX, row.Y, row.imageDims[0], row.imageDims[1]],
        [row.rightX, row.Y, row.imageDims[0], row.imageDims[1]]
    ], axis=1)
    df1['bboxesNames'] = df1.apply(lambda row: ['left', 'right'], axis=1)

    # Plot 2D fixations without saving the plot
    # plot2d(df1, fn, path_to_analysis, condition='locStudiedImage', save=False)

    # Process the data by applying preprocessing steps
    df1 = getFixationLatency(df1)
    df1 = handle_carryover_fixations_and_merge(df1, max_event_duration=4000)
    df1 = addAOI(df1)

    # Accumulate the processed dataframe for this participant
    output_dfs.append(df1)

# Concatenate all participants' data into one DataFrame
if output_dfs:
    output_df = pd.concat(output_dfs, ignore_index=True)
    output_file = os.path.join(path_to_analysis, 'allSubjects_PV_Young.csv')
    output_df.to_csv(output_file, index=False)
    print(f'Combined data saved to {output_file}')
else:
    print('No data was processed.')


## Calculate novelty index and make a new dataframe

In [None]:
# testPhase_df = pd.read_csv(os.path.join(path_to_analysis, 'allSubjects_PV_Young.csv'))

testPhase_df = output_df

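# Select only the test phase (copy to avoid a SettingWithCopyWarning on the assignment below)
testPhase_df = testPhase_df[testPhase_df.phase == 'test'].copy()

# Label each fixation as 'old' (on the studied image), 'novel' (on the other image),
# or 'None' (outside both AOIs). Missing values are also treated as 'None', because
# pandas may parse the string 'None' as NaN when the CSV is reloaded.
outside_aoi = testPhase_df.AOI_stim.isna() | (testPhase_df.AOI_stim == 'None')
testPhase_df['FixatedNovel'] = np.where(testPhase_df.AOI_stim == testPhase_df.locStudiedImage, 'old',
                                        np.where(outside_aoi, 'None', 'novel'))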

novelty_data = []

# Iterate through participants and trials
for (deepeye_id, trialNr), group in testPhase_df.groupby(['deepeye_id','trialNr']):
    
    # Safely compute the proportion of novel fixations (fixCountProp)
    fix_count_total = group.FixatedNovel.count()
    fix_count_novel = group.FixatedNovel[group.FixatedNovel == 'novel'].count()
    novelty_fix_count_prop = fix_count_novel / fix_count_total if fix_count_total != 0 else 0
    
    # Safely compute the proportion of fixation durations (fixDurProp)
    fix_dur_total = group.FixDur.sum()
    fix_dur_novel = group.FixDur[group.FixatedNovel == 'novel'].sum()
    novelty_fix_dur_prop = fix_dur_novel / fix_dur_total if fix_dur_total != 0 else 0
    
    # Append the results to a list
    novelty_data.append([deepeye_id, trialNr, novelty_fix_count_prop, novelty_fix_dur_prop])

# Convert list to DataFrame
novelty_df = pd.DataFrame(novelty_data, columns=['deepeye_id', 'trialNr', 'noveltyIdx_fixCountProp', 'noveltyIdx_fixDurProp'])

# Merge additional data into novelty_df
additional_columns = ['deepeye_id', 'trialNr', 'pp_id', 'imageLeft', 'imageRight', 'locStudiedImage']  # List the columns you want to keep

# Drop duplicates to avoid having repeated rows during merge
testPhase_unique_df = testPhase_df[additional_columns].drop_duplicates(subset=['deepeye_id', 'trialNr'])

# Merge the novelty dataframe with additional information
novelty_df = novelty_df.merge(testPhase_unique_df, on=['deepeye_id', 'trialNr'], how='left')

# Save the output file
novelty_df.to_csv(os.path.join(path_to_analysis, 'allSubjects_NoveltyIndex.csv'), index=False)
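
To make the two indices concrete, here is a small worked example on made-up data for a single trial. It uses the same column names as the cell above (FixatedNovel, FixDur); the values are invented for illustration only.

```python
import pandas as pd

# Made-up fixations for one trial: two on the novel image, one on the old
# image, and one outside both AOIs
trial = pd.DataFrame({
    'FixatedNovel': ['novel', 'old', 'novel', 'None'],
    'FixDur': [300, 200, 400, 100],
})

is_novel = trial['FixatedNovel'] == 'novel'

# Proportion of fixations on the novel image: 2 of 4
fix_count_prop = is_novel.sum() / len(trial)

# Proportion of fixation time on the novel image: 700 of 1000
fix_dur_prop = trial.loc[is_novel, 'FixDur'].sum() / trial['FixDur'].sum()

print(fix_count_prop, fix_dur_prop)
```

Note that fixations outside both AOIs still count toward the denominators, as they do in the loop above, so the proportion for the old image is not simply one minus the novelty index.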



## Summary of novelty index

In [None]:
# Filter out excluded participants:
# '2024_01_15_14_19_20' was the second time, '2024_01_26_17_16_28' had too few frames,
# '2024_01_15_11_44_18' and '2024_01_26_13_09_05' kept fixating the center
novelty_df = novelty_df[~novelty_df['deepeye_id'].isin(['2024_01_15_14_19_20', '2024_01_26_17_16_28', '2024_01_15_11_44_18', '2024_01_26_13_09_05'])]

total_count = novelty_df.groupby(['deepeye_id']).noveltyIdx_fixCountProp.count()
fixCountProp = novelty_df.groupby(['deepeye_id']).noveltyIdx_fixCountProp.mean()
fixDurProp = novelty_df.groupby(['deepeye_id']).noveltyIdx_fixDurProp.mean()

print(total_count)
print(fixCountProp)
print(fixDurProp)
print(fixCountProp.mean())
print(fixDurProp.mean())
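
The three per-participant summaries above could also be collected into one table with pandas named aggregation. The sketch below runs on invented data with the same column names as novelty_df; the output column names (n_trials, mean_fixCountProp, mean_fixDurProp) are only suggestions.

```python
import pandas as pd

# Invented novelty indices for two participants, two trials each
toy_novelty = pd.DataFrame({
    'deepeye_id': ['p1', 'p1', 'p2', 'p2'],
    'noveltyIdx_fixCountProp': [0.5, 0.7, 0.4, 0.6],
    'noveltyIdx_fixDurProp': [0.6, 0.8, 0.3, 0.5],
})

# One row per participant: number of trials and mean of each index
summary = toy_novelty.groupby('deepeye_id').agg(
    n_trials=('noveltyIdx_fixCountProp', 'count'),
    mean_fixCountProp=('noveltyIdx_fixCountProp', 'mean'),
    mean_fixDurProp=('noveltyIdx_fixDurProp', 'mean'),
)
print(summary)

# Grand means across participants (mean of the participant means)
print(summary[['mean_fixCountProp', 'mean_fixDurProp']].mean())
```

Keeping the trial count next to the means makes it easier to spot participants whose averages rest on very few trials.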
