# PLV Analysis

## Basic Setup
* Load modules from EEG_Corpus_Scripts (run `git pull` first so you are working with the latest versions of the files)
* Set working directory

In [1]:
import os 
os.chdir('/home/mernestus/ERC_2024/PLV_PILOT/EEG_Corpus_Scripts/')

In [8]:
import pandas as pd

In [16]:
import locations
import read_xml
import load_eeg
import load_subset
import tfa_plv
import phrase_search

## Test Utterances
* Load test utterances
* Load syllable-level data for test utterances

In [3]:
utts=["Jordi had niet verwacht dat Melissas ouders het goed zouden vinden dat ie meeging",
      "En dat Jordi van plan was de archeologen daar te vragen of ze mochten helpen bij de opgravingen",
     "Hoewel het voor hem niet gemakkelijk was deze dag te beleven zonder mama",
     "Soms wilde Lisa dat zijzelf ook zo gemakkelijk kon praten als Julia",
     "Maar na een bevolkingsonderzoek voor vrouwen was er borstkanker bij haar geconstateerd",
     "Maar door al dat gestress ga je straks over de kop"]

In [10]:
syll_data = pd.read_table('utt_syll_st_et.txt')

# Word-level Analysis

## Load sample data
* Load the first 10 participants (adjust with `n`)
* `eeg_data` is returned but not used by the scripts below (recommend deleting it)

Note: The files use the same format as in the CGN corpus.

In [None]:
participants, eeg_data = load_subset.get_data_sample(n=10)

## Filter data to get phrases

_Note that the audio files in the folder don't match the ones that I have locally. The folder has individual files, while the participant metadata (which matches my data) refers to files that combine multiple recordings and have been converted to 60 dB (e.g., `fn001124_fn001125_60db.wav`). We should check with __Martijn__ to find the Ponyland location for these files._

In [31]:
import importlib
importlib.reload(locations)
locations.cgn_audio

'/vol/bigdata/corpora2/CGN2/data/audio/wav/comp-o/nl/'

The function below generates a dictionary with the phrase as the key and the following values:
* __phrase_eeg__: The raw EEG data corresponding to the phrase
* __sound_sample__: The audio data corresponding to the phrase
* __bad_electrodes__: Any electrodes that were listed as bad when loading the participant data
* __\[wav,block,participant\]__: An array of metadata: the wav filename, the block name, and the participant id
* __\[phrase_st,phrase_et\]__: An array that indicates the start and end time of the phrase in the audio
* __\[eeg_s,eeg_e\]__: An array that indicates the start and end of the phrase in the EEG slice

In [None]:
phrase_data = phrase_search.find_all_phrases(utts,participants)

## Collect PLV data

This function takes as input the phrase_data returned by the word-level analysis above. It matches the phrase data with the audio data, resamples to 1 kHz, computes the PLV between the audio and each good electrode, and averages the PLV across those electrodes. The result is a dictionary with the phrase as the key and an array of the following values:
* The average PLV value across all channels
* The value of each channel's PLV

In [None]:
theta_band_data = tfa_plv.plv_audio_eeg_phrases(phrase_data,band=['Theta',4,8])
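
For readers unfamiliar with the measure, the computation described above can be sketched roughly as follows. This is a minimal illustration on synthetic signals, not the code in `tfa_plv`: the helper names (`band_phase`, `plv`, `plv_audio_vs_channels`), the Butterworth filter, and the Hilbert-based phase estimate are all assumptions, and the signals are assumed to be already resampled to a common rate of 1 kHz. Whether the actual scripts use the raw audio or its amplitude envelope is not stated here; the sketch simply takes a one-dimensional audio-derived signal.

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfiltfilt


def band_phase(signal, fs, low, high, order=4):
    """Band-pass filter a 1-D signal and return its instantaneous phase."""
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return np.angle(hilbert(sosfiltfilt(sos, signal)))


def plv(x, y, fs, low, high):
    """Phase-locking value: length of the mean unit phasor of the phase difference."""
    phase_diff = band_phase(x, fs, low, high) - band_phase(y, fs, low, high)
    return float(np.abs(np.mean(np.exp(1j * phase_diff))))


def plv_audio_vs_channels(audio, eeg, fs, low=4.0, high=8.0, bad_channels=()):
    """Return (mean PLV, {channel index: PLV}) for all channels not marked bad.

    eeg is assumed to have shape (n_channels, n_samples), aligned with audio.
    """
    per_channel = {
        ch: plv(audio, eeg[ch], fs, low, high)
        for ch in range(eeg.shape[0])
        if ch not in bad_channels
    }
    return float(np.mean(list(per_channel.values()))), per_channel


# Synthetic demo (hypothetical data): 10 s at 1 kHz, theta band 4-8 Hz.
fs = 1000
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(0)
audio = np.sin(2 * np.pi * 6 * t)
eeg = np.vstack([
    np.sin(2 * np.pi * 6 * t + np.pi / 4) + 0.1 * rng.standard_normal(t.size),  # phase-locked
    rng.standard_normal(t.size),                                                # unrelated noise
    np.zeros(t.size),                                                           # "bad" electrode
])
mean_plv, channel_plvs = plv_audio_vs_channels(audio, eeg, fs, bad_channels=(2,))
print(mean_plv, channel_plvs)
```

A PLV near 1 means the phase difference between the two signals stays nearly constant over time, while values near 0 indicate no consistent phase relationship. In the demo, the phase-shifted copy of the audio signal should score much higher than the noise channel.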

Then, we can plot the PLV data.

In [None]:
tfa_plv.plot_plvs_for_utts(theta_band_data)
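
Alongside the plot, it can help to tabulate the results. The sketch below assumes the structure described above, where each phrase maps to an array holding the average PLV followed by the per-channel values. The helper `summarize_plv` and the toy numbers are hypothetical and only illustrate the shape of the data; if the real per-channel entry is a dictionary or array of a different shape, the helper would need adjusting.

```python
import pandas as pd


def summarize_plv(results):
    """Build one row per phrase from {phrase: [mean_plv, per_channel_plvs]}."""
    rows = []
    for phrase, (mean_plv, channel_plvs) in results.items():
        rows.append({
            "phrase": phrase,
            "mean_plv": float(mean_plv),
            "n_channels": len(channel_plvs),
            "max_channel_plv": float(max(channel_plvs)),
        })
    # Sort so the phrases with the strongest phase locking come first.
    return (
        pd.DataFrame(rows)
        .sort_values("mean_plv", ascending=False)
        .reset_index(drop=True)
    )


# Toy input with made-up values, standing in for theta_band_data.
toy_results = {
    "phrase A": [0.30, [0.25, 0.35]],
    "phrase B": [0.50, [0.40, 0.60]],
}
summary = summarize_plv(toy_results)
print(summary)
```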

# Syllable-level Analysis

In [None]:
syll_phrase_data = phrase_search.find_all_phrases_syll(utts,participants,'sample_data/utt_syll_st_et.txt')