Trouble running large dataset #54
Comments
Hi Valarie, could you parallelize the computations, e.g., by submitting compute jobs via a job scheduler on a compute cluster?
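A minimal sketch of that idea on a single multi-core machine, using only Python's standard library; the `compare_pair` stub, the `scanpaths/` directory, and the `*.tsv` layout are hypothetical stand-ins for the actual binary comparison and data:

```python
from itertools import combinations
from multiprocessing import Pool
from pathlib import Path

def compare_pair(pair):
    """Hypothetical stand-in: run one binary comparison and return a score."""
    file_a, file_b = pair
    # Replace this stub with the actual scanpath comparison call.
    return (file_a.name, file_b.name, None)

if __name__ == "__main__":
    files = sorted(Path("scanpaths").glob("*.tsv"))  # hypothetical data layout
    pairs = list(combinations(files, 2))
    with Pool() as pool:  # defaults to one worker per CPU core
        results = pool.map(compare_pair, pairs)
    print(f"finished {len(results)} comparisons")
```

On a cluster, the same decomposition maps onto one scheduler job (or job-array task) per pair, since the pairwise comparisons are independent of each other.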
Depending on your experimental paradigm, it may also make sense to split the 30k rows into shorter time chunks. I haven't worked much on gaze paths and gaze path comparisons, and only you know what would work for your experiment, but if the scan paths you compare are 30k lines long, I would suspect that similarities between them get distorted or diminished as a side effect of the long vector length.
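As a sketch of the chunking idea, assuming each scan path is an `(n, 2)` NumPy array of x-y fixations; the window length is an arbitrary placeholder and would need to match meaningful segments of the experiment:

```python
import numpy as np

def split_into_windows(scanpath, window_len=1000):
    """Split an (n, 2) fixation array into equal-length windows,
    dropping the trailing remainder."""
    n_full = (len(scanpath) // window_len) * window_len
    return np.split(scanpath[:n_full], n_full // window_len)

# Comparing window i of one recording against window i of another keeps
# each compared vector short while still covering the whole recording.
```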
Hello, thank you for your kind response! I am working on using MATLAB's parallel processing now. No, chunking the scanpath rows into sections would remove some of the difference/similarity analyses, as I cannot pinpoint to the millisecond where each participant was in the scenario script; I can only view it from a high-level perspective, which is what I am looking to achieve here.
I'm glad you seem to have found a solution for your problem :) I'll close this issue, as there is nothing I can do in this toolbox at the moment, but do feel free to reopen it at a later point, or open a new one if something comes up. :)
Hello,
Thank you for publishing this code in Python on GitHub, and for the support!
I am working on completing my dissertation using scanpaths, and the binary comparison works, although the processing is long.
Rob Newport at https://github.com/robnewport/SoftMatch has been wonderful in supporting this large dataset, but the process is also long using his MATLAB system. I need this output to finish my analyses and defend in 30 days.
I would welcome any suggestions you may have on running a very large dataset. Each binary comparison takes approximately 40 minutes, and I have 50 participants who need cross-comparisons between two conditions for 5 scenario runs.
Do you have any suggestions on how to process these comparisons in a shorter time?
Each participant file has over 30,000 rows of fixations on x-y coordinates.
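For scale, a rough serial-runtime estimate; the exact pairing scheme isn't spelled out here, so this assumes one between-condition comparison per participant per scenario run:

```python
participants, runs, minutes_per_pair = 50, 5, 40
comparisons = participants * runs              # 250 comparisons
serial_hours = comparisons * minutes_per_pair / 60
print(serial_hours, serial_hours / 32)         # ~166.7 h serial; ~5.2 h across 32 workers
```

That scale of backlog is what parallelizing over pairs can cut from days to hours.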
Thanks in advance,
Valarie