Repository for Data Science exam at Aarhus University, CogSci, 2020.
This README outlines the role of the different scripts of the analysis.
The folder example_scripts contains a series of Python and R scripts as well as various data files. These are made to work right out of the box and run the analysis on a small version of the simulation presented in the paper: 2 LODs on task 1, where each LOD was run for half the length used in the paper, i.e. 30000 generations. What follows is a walk-through describing how to run this. However, all the data generated by the different scripts is already present in the data folders, so it is possible to skip any step in the walk-through.
-
The MABE_data folder contains the raw output from the simulation. This output is processed by the script IIT_analysis.py, which needs to be run in a Jupyter kernel (e.g. in Visual Studio Code). The path to the folder needs to be set at the top of the script. The script outputs a csv containing the fitness data (fitness.csv) and, for each run, a csv containing the IIT measures across all state transitions. To join these into one csv file, run the first 12 lines of the R script avg_data_script.R. This will generate the file trans_data.csv.
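In this repository the joining step is done in R (the first 12 lines of avg_data_script.R). Purely as an illustration of what such a step amounts to, here is a minimal Python sketch that stacks per-run csv files into one file. The function name, the `run_*.csv` file pattern and the added `source_file` column are hypothetical and not taken from the repository's scripts; the sketch also assumes that all per-run files share the same header.

```python
import csv
import glob
import os


def join_run_csvs(folder, pattern="run_*.csv", out_name="trans_data.csv"):
    """Stack all csv files in `folder` matching `pattern` into one csv.

    The file pattern and the 'source_file' column are hypothetical; adapt
    them to the actual file names. Returns the number of data rows written.
    """
    out_path = os.path.join(folder, out_name)
    # Never read the output file back in, even if it matches the pattern.
    paths = sorted(
        p for p in glob.glob(os.path.join(folder, pattern))
        if os.path.abspath(p) != os.path.abspath(out_path)
    )

    rows, fieldnames = [], None
    for path in paths:
        with open(path, newline="") as handle:
            reader = csv.DictReader(handle)
            if fieldnames is None:
                # Take the header from the first file (assumed shared by all).
                fieldnames = ["source_file"] + list(reader.fieldnames)
            for row in reader:
                # Record which per-run file each row came from.
                row["source_file"] = os.path.basename(path)
                rows.append(row)

    if fieldnames is None:
        return 0  # no matching files, nothing to write

    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Calling `join_run_csvs("MABE_data")` would write trans_data.csv into that folder, provided the per-run files match the assumed pattern.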
-
Next, the script CountStates.py adds the surprisal measures to each row in trans_data.csv. This script needs to be run 3 times, once for each restriction on the set of states that are counted (None, Run or Agent). This will generate 3 trans_data.csv files with the prefixes none_, run_ and agent_.
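This README does not spell out how the surprisal is computed. As a rough illustration only, assuming that surprisal is the negative base-2 logarithm of a state's relative frequency within the chosen restriction, the counting could be sketched as follows. The function and the field names `run`, `agent` and `state` are hypothetical and may differ from what CountStates.py actually does.

```python
import math
from collections import Counter


def add_surprisal(rows, restriction=None):
    """Return copies of `rows` with a 'surprisal' value added.

    rows: list of dicts with the hypothetical keys 'run', 'agent', 'state'.
    restriction: None (count states over all rows), 'run' (count within
    each run) or 'agent' (count within each agent of each run).
    """
    def group_key(row):
        if restriction is None:
            return None
        if restriction == "run":
            return row["run"]
        if restriction == "agent":
            return (row["run"], row["agent"])
        raise ValueError("restriction must be None, 'run' or 'agent'")

    # How often each state occurs within its group, and the group sizes.
    state_counts = Counter((group_key(r), r["state"]) for r in rows)
    group_sizes = Counter(group_key(r) for r in rows)

    result = []
    for row in rows:
        group = group_key(row)
        probability = state_counts[(group, row["state"])] / group_sizes[group]
        # Assumed definition: surprisal = -log2(relative frequency).
        result.append({**row, "surprisal": -math.log2(probability)})
    return result
```

Under this assumed definition, rare states get high surprisal and a state that is the only one observed in its group gets a surprisal of 0.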
-
Open the R script avg_data_script.R again. Remember to load the libraries and run setwd() again. Now run the part of the script below line 12. This will generate the averaged data used for the generation scale analysis, all_avg_data.csv. The script imports the function in the file avg_function.R by itself, so that file doesn't need to be opened.
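The averaging itself is done in R by avg_data_script.R and avg_function.R. To give an idea of the kind of operation involved, here is a minimal Python sketch of a grouped mean. The function and the column names in the usage example (`run`, `generation`, `phi`) are hypothetical, and the actual script may group and average differently.

```python
from collections import defaultdict
from statistics import mean


def average_by(rows, keys, value_fields):
    """Average `value_fields` over all rows that share the same `keys`.

    rows: list of dicts (e.g. read with csv.DictReader).
    keys: column names that define a group (hypothetical example:
          ['run', 'generation']).
    value_fields: numeric columns to average within each group.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)

    averaged = []
    for key, members in sorted(groups.items()):
        out = dict(zip(keys, key))
        for field in value_fields:
            # float() so that string values read from a csv also work.
            out[field] = mean(float(m[field]) for m in members)
        averaged.append(out)
    return averaged
```

For example, `average_by(rows, ["run", "generation"], ["phi"])` would return one row per run and generation holding the mean of the hypothetical `phi` column.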
-
The script avg_analysis.R uses the file all_avg_data.csv to produce the plots of the generation scale analysis. Note that this data contains only one task and generally a lot less data than the full simulation, so the plots will look somewhat different. In this script, further averaging is done to match the methods of the study we are trying to replicate.
-
The script TimeSeries.R uses the transition data files generated at step 2 (CountStates.py) to make the cross-correlation analysis. Similar to the procedure in step 2, this script needs to be run 3 times, once for each of the files generated at step 2. The file is selected by commenting out the unused values of the variable "type" at the start of the script. This will generate the csv files none_cordata.csv, agent_cordata.csv and run_cordata.csv.
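For readers unfamiliar with cross-correlation, the idea can be sketched in a few lines of Python: correlate one series with a shifted copy of the other, for a range of lags. This sketch assumes a plain Pearson correlation on the overlapping part of the two series and is not necessarily the estimator used in TimeSeries.R. The function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long sequences (nan if constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def cross_correlation(x, y, max_lag):
    """Return {lag: r}, where r correlates x[t] with y[t + lag].

    A positive lag means that y follows x by `lag` time steps.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:n - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:n + lag]
        # At least two overlapping points are needed for a correlation.
        result[lag] = pearson(a, b) if len(a) >= 2 else float("nan")
    return result
```

If `y` is simply `x` delayed by one time step, the correlation peaks at lag 1, which is the kind of lead-lag relationship a cross-correlation analysis is meant to reveal.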
-
Lastly, the plots of the cross-correlation analysis are made using the script CorAnalysis.R.
The folder actual_agency contains code used to transform the MABE data into the desired format for pyphi (the package used to do the IIT analysis in step 1). This code was not produced by us.