Trouble running large dataset #54

Closed
YValarieAnne opened this issue Mar 3, 2023 · 4 comments

@YValarieAnne

Hello,

Thank you for publishing this code in Python on GitHub and for the support!
I am completing my dissertation using scanpaths; the binary (pairwise) comparisons work, although the processing is long.
Rob Newport at https://github.com/robnewport/SoftMatch has been wonderful in supporting this large dataset, but the process is long using his system in MATLAB as well. I need this output to finish my analyses and defend in 30 days.

I would appreciate any suggestions you may have on running a very large dataset. Each binary comparison takes approximately 40 minutes, and I have 50 participants who need cross-comparisons between two conditions for 5 scenario runs.

Do you have any suggestions on how to process these comparisons in a shorter time?
Each participant file has over 30,000 rows of fixation x-y coordinates.

Thanks in advance,

Valarie

@adswa (Owner) commented Mar 3, 2023

Hi Valarie,

Could you parallelize the computations, e.g., by submitting compute jobs via a job scheduler on a compute cluster?
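For illustration, a minimal sketch of distributing the pairwise comparisons across local CPU cores with Python's multiprocessing could look like the following; `compare_scanpaths`, the file pattern, and the CSV layout are placeholders/assumptions, not part of this toolbox:

```python
# Sketch: run all pairwise scanpath comparisons in parallel on local CPU cores.
# compare_scanpaths() is a placeholder for the per-pair comparison you already
# run; the glob pattern and CSV layout are assumptions about your data.
from itertools import combinations
from multiprocessing import Pool
from pathlib import Path

import pandas as pd


def compare_scanpaths(path_a, path_b):
    """Placeholder: load two fixation files and return some similarity result."""
    scanpath_a = pd.read_csv(path_a)
    scanpath_b = pd.read_csv(path_b)
    # ... call the actual comparison on scanpath_a / scanpath_b here ...
    return (path_a.name, path_b.name, len(scanpath_a), len(scanpath_b))


def run_pair(pair):
    return compare_scanpaths(*pair)


if __name__ == "__main__":
    files = sorted(Path("data").glob("sub-*_run-*.csv"))  # hypothetical layout
    pairs = list(combinations(files, 2))
    with Pool() as pool:  # defaults to one worker per available CPU core
        results = pool.map(run_pair, pairs)
    pd.DataFrame(results, columns=["a", "b", "n_fix_a", "n_fix_b"]).to_csv(
        "comparisons.csv", index=False)
```

The same pairwise loop can also be split into batches and submitted as separate jobs to a cluster scheduler such as HTCondor or SLURM, which would scale beyond a single machine.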

@adswa (Owner) commented Mar 3, 2023

Depending on your experimental paradigm, it may also make sense to split the 30k rows into shorter time chunks. I haven't worked a lot on the topic of gaze paths and gaze path comparisons, and only you know what would work for your experiment, but if the scan paths you compare are 30k lines long, I would suspect that similarities between them get distorted/diminished as a side effect of the long vector length.
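If that is compatible with your paradigm, a minimal sketch of splitting one long fixation recording into fixed-duration time windows could look like this (the `onset` column name and the 30-second window are assumptions about your data, not a toolbox feature):

```python
# Sketch: split one long fixation recording into fixed-duration chunks so that
# each comparison operates on a much shorter scanpath. Column name is assumed.
import pandas as pd


def chunk_scanpath(path, window_s=30.0, onset_col="onset"):
    """Yield (chunk_index, DataFrame) pairs covering successive time windows."""
    fixations = pd.read_csv(path)
    t0 = fixations[onset_col].min()
    chunk_ids = ((fixations[onset_col] - t0) // window_s).astype(int)
    for idx, chunk in fixations.groupby(chunk_ids):
        yield idx, chunk.reset_index(drop=True)
```

You could then compare chunk i of one recording against chunk i of the other, rather than the full 30k-row scanpaths at once, which also shortens each individual comparison considerably.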

@YValarieAnne (Author)

Hello, thank you for your kind response! I am working on using the MATLAB parallel processes now. No, chunking the scanpath rows into sections would remove some of the difference/similarity analyses, as I cannot tell to the millisecond where each participant was in the scenario script; I can only view it as a high-level picture, which is what I am looking to achieve here.
I was hoping someone had experience with a dataset this large. I will update this with the solution that works, in case others are met with this conundrum in the future. Many thanks, Valarie

@adswa (Owner) commented Mar 3, 2023

I'm glad you seem to have found a solution for your problem :) I'll close this issue as there is nothing I can do in this toolbox at the moment, but do feel free to reopen this issue at a later point, or open a new one if something comes up. :)

adswa closed this as completed Mar 3, 2023