# Variable Choice

This notebook documents the reasoning behind the choice of variables for the factor analysis.

We are interested in numerical measures that relate to behavior. They should be at the level of the task or questionnaire, which excludes item-level data and composite scores.

In [70]:
import pandas as pd

In [71]:
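# Paths to the unrestricted and restricted behavioral tables and the data dictionary
path_unr = "../data/01_inputs/hcp_behavioral.csv"
path_res = "../data/01_inputs/hcp_behavioral_RESTRICTED.csv"
path_dict = "../data/01_inputs/HCP_S1200_DataDictionary.csv"
data_unr = pd.read_csv(path_unr)
data_res = pd.read_csv(path_res)
data_dict = pd.read_csv(path_dict)

# Names of the columns selected from the unrestricted and restricted tables
variables_unr = []
variables_res = []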

The variables of the HCP dataset are separated into the following categories:

In [72]:
categories = list(data_dict['category'].unique())
categories

['Subject Information',
 'Study Completion',
 'QC Issues',
 'MR Sessions',
 'MEG Sessions',
 'MEG Subjects',
 'Health and Family History',
 'Alertness',
 'Cognition',
 'Emotion',
 'FreeSurfer',
 'In-Scanner Task Performance',
 'Motor',
 'Personality',
 'Psychiatric and Life Function',
 'Sensory',
 'Substance Use',
 '7T Eye Tracker Metadata']

## Subject Information

In [73]:
data_dict[data_dict['category'] == 'Subject Information']

Unnamed: 0,fullDisplayName,category,assessment,columnHeader,description
0,Subject,Subject Information,Demographics,Subject,HCP Subject ID
1,Quarter Released,Subject Information,Demographics,Release,HCP data release in which this subject's data ...
2,Acquisition Quarter,Subject Information,Demographics,Acquisition,Quarter in which this subject's 3T and behavio...
3,Gender,Subject Information,Demographics,Gender,Gender of Subject
4,Age Range,Subject Information,Demographics,Age,"Age group of Participant, banded in five-year ..."
5,Age Range,Subject Information,Demographics,Age,Age group of Participant.
6,Age Range,Subject Information,Demographics,Age,Age group of Participant.
7,Age in Years,Subject Information,Demographics,Age_in_Yrs,Age of Participant in Years
8,HasGT,Subject Information,Demographics,HasGT,Genotyping data available from at least one of...
9,ZygositySR,Subject Information,Demographics,ZygositySR,Self-reported zygosity. Until the S1200 releas...


In the Subject Information category, Age and Gender are of interest as confounding variables. We also need Family_ID to make sure that siblings are sorted into the same split.
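
To illustrate why Family_ID matters, here is a minimal sketch of a family-aware split on toy data. The helper name `split_by_family`, the 25% test fraction, and the toy family labels are assumptions for illustration only; the actual splitting procedure is not part of this notebook.

```python
import numpy as np
import pandas as pd

def split_by_family(df, group_col="Family_ID", test_frac=0.2, seed=0):
    """Assign whole families to either the train or the test split."""
    rng = np.random.default_rng(seed)
    # Shuffle the unique family labels, not the individual subjects.
    families = rng.permutation(df[group_col].drop_duplicates().to_numpy())
    n_test = max(1, int(round(test_frac * len(families))))
    test_families = families[:n_test].tolist()
    is_test = df[group_col].isin(test_families)
    return df[~is_test], df[is_test]

# Toy data: six subjects from four hypothetical families.
toy = pd.DataFrame({
    "Subject": [1, 2, 3, 4, 5, 6],
    "Family_ID": ["F1", "F1", "F2", "F3", "F3", "F4"],
})
train, test = split_by_family(toy, test_frac=0.25)
```

Because the shuffle operates on family labels, no family can end up with members on both sides of the split.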

## Alertness

The next category relevant for us is Alertness.

In [74]:
data_dict[data_dict['category'] == 'Alertness']

Unnamed: 0,fullDisplayName,category,assessment,columnHeader,description
162,Mini Mental Status Exam Total Score,Alertness,Cognitive Status (Mini Mental Status Exam),MMSE_Score,Mini Mental Status Exam (MMSE) Total Score. Th...
163,Sleep (Pittsburgh Sleep Questionnaire) Total S...,Alertness,Sleep (Pittsburgh Sleep Questionnaire),PSQI_Score,The total score across all items on the Pittsb...
164,Sleep (Pittsburgh Sleep Questionnaire) Compone...,Alertness,Sleep (Pittsburgh Sleep Questionnaire),PSQI_Comp1,The Pittsburgh Sleep Quality Index (PSQI) Comp...
165,Sleep (Pittsburgh Sleep Questionnaire) Compone...,Alertness,Sleep (Pittsburgh Sleep Questionnaire),PSQI_Comp2,The Pittsburgh Sleep Quality Index (PSQI) Comp...
166,Sleep (Pittsburgh Sleep Questionnaire) Compone...,Alertness,Sleep (Pittsburgh Sleep Questionnaire),PSQI_Comp3,The Pittsburgh Sleep Quality Index (PSQI) Comp...
167,Sleep (Pittsburgh Sleep Questionnaire) Compone...,Alertness,Sleep (Pittsburgh Sleep Questionnaire),PSQI_Comp4,The Pittsburgh Sleep Quality Index (PSQI) Comp...
168,Sleep (Pittsburgh Sleep Questionnaire) Compone...,Alertness,Sleep (Pittsburgh Sleep Questionnaire),PSQI_Comp5,The Pittsburgh Sleep Quality Index (PSQI) Comp...
169,Sleep (Pittsburgh Sleep Questionnaire) Compone...,Alertness,Sleep (Pittsburgh Sleep Questionnaire),PSQI_Comp6,The Pittsburgh Sleep Quality Index (PSQI) Comp...
170,Sleep (Pittsburgh Sleep Questionnaire) Compone...,Alertness,Sleep (Pittsburgh Sleep Questionnaire),PSQI_Comp7,The Pittsburgh Sleep Quality Index (PSQI) Comp...
171,1. Usual bed time (past month),Alertness,Sleep (Pittsburgh Sleep Questionnaire),PSQI_BedTime,"PSQI 1. During the past month, when have you u..."


This category contains the Mini Mental Status Exam, a neuropsychological test usually used to screen for dementia, and the Pittsburgh Sleep Questionnaire. For the latter, the dataset provides a total score, component scores, and the individual items. The total score should suffice.
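
One way to see why the total score alone suffices: under the standard PSQI scoring, which we assume here, the total is the sum of the seven component scores (each 0 to 3). Including the total next to its components would therefore add a linearly dependent column. The sketch below shows this on simulated data, not on the HCP values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 simulated respondents, 7 component scores in 0..3.
components = rng.integers(0, 4, size=(200, 7))
total = components.sum(axis=1)  # assumed scoring: total = sum of components

with_total = np.column_stack([components, total])

# Rank of the correlation matrix without and with the total score.
rank_components = np.linalg.matrix_rank(np.corrcoef(components, rowvar=False))
rank_with_total = np.linalg.matrix_rank(np.corrcoef(with_total, rowvar=False))
```

If the rank with the total is lower than the number of columns, the correlation matrix is singular, which is exactly the situation a factor analysis should avoid.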

In [75]:
variables_unr.extend(['MMSE_Score', 'PSQI_Score'])

## Cognition

In [76]:
cog_vars = data_dict[data_dict['category'] == 'Cognition']
cog_vars

Unnamed: 0,fullDisplayName,category,assessment,columnHeader,description
190,NIH Toolbox Picture Sequence Memory Test: Unad...,Cognition,Episodic Memory (Picture Sequence Memory),PicSeq_Unadj,The Picture Sequence Memory Test is an assessm...
191,NIH Toolbox Picture Sequence Memory Test: Age-...,Cognition,Episodic Memory (Picture Sequence Memory),PicSeq_AgeAdj,The Picture Sequence Memory Test is an assessm...
192,NIH Toolbox Dimensional Change Card Sort Test:...,Cognition,Executive Function/Cognitive Flexibility (Dime...,CardSort_Unadj,The Dimensional Change Card Sort Test is a mea...
193,NIH Toolbox Dimensional Change Card Sort Test:...,Cognition,Executive Function/Cognitive Flexibility (Dime...,CardSort_AgeAdj,The Dimensional Change Card Sort Test is a mea...
194,NIH Toolbox Flanker Inhibitory Control and Att...,Cognition,Executive Function/Inhibition (Flanker Task),Flanker_Unadj,The Flanker is a measure of executive function...
195,NIH Toolbox Flanker Inhibitory Control and Att...,Cognition,Executive Function/Inhibition (Flanker Task),Flanker_AgeAdj,The Flanker is a measure of executive function...
196,Penn Progressive Matrices: Number of Correct R...,Cognition,Fluid Intelligence (Penn Progressive Matrices),PMAT24_A_CR,Penn Matrix Test (PMAT): Number of Correct Res...
197,Penn Progressive Matrices: Total Skipped Items...,Cognition,Fluid Intelligence (Penn Progressive Matrices),PMAT24_A_SI,Penn Matrix Test (PMAT): Total Skipped Items (...
198,Penn Progressive Matrices: Median Reaction Tim...,Cognition,Fluid Intelligence (Penn Progressive Matrices),PMAT24_A_RTCR,Penn Matrix Test (PMAT): Median Reaction Time ...
199,NIH Toolbox Oral Reading Recognition Test: Una...,Cognition,Language/Reading Decoding (Oral Reading Recogn...,ReadEng_Unadj,The Reading Test is a CAT format measure of re...


There is a whole battery of cognitive tasks in this dataset. For a lot of the measures there is an unadjusted and an age-adjusted version. As we want to regress out both age and gender, we will use the unadjusted versions.
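
As a rough illustration of what regressing out age and gender means, the following sketch residualizes a simulated score with ordinary least squares. The function name `residualize`, the 0/1 coding of gender, and the simulated numbers are assumptions; the actual confound regression is done elsewhere in the project.

```python
import numpy as np
import pandas as pd

def residualize(scores, confounds):
    """Return residuals of `scores` after OLS regression on `confounds`."""
    # Design matrix: intercept plus one column per confound.
    X = np.column_stack([np.ones(len(confounds)), confounds.to_numpy(dtype=float)])
    Y = scores.to_numpy(dtype=float)
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return pd.DataFrame(Y - X @ beta, index=scores.index, columns=scores.columns)

# Simulated data: a score that depends on age and gender plus noise.
rng = np.random.default_rng(0)
n = 100
age = rng.uniform(22, 36, size=n)
is_male = rng.integers(0, 2, size=n)  # hypothetical 0/1 coding of gender
confounds = pd.DataFrame({"age": age, "is_male": is_male})
scores = pd.DataFrame({"toy_score": 100 - 0.5 * age + 2.0 * is_male + rng.normal(size=n)})

residuals = residualize(scores, confounds)
```

The residuals are uncorrelated with both confounds by construction, so starting from an age-adjusted score would only apply a second, different age correction on top of this step.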

Also, there are some composite scores, which combine some of the tasks. We will only use task scores directly and exclude the composite scores, to only have this information enter the analysis once.
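
The two exclusion rules (no age-adjusted scores, no composites) could also be expressed as filters on the data dictionary. The sketch below uses a small hand-made excerpt with rows from the Cognition category. Matching on the `_AgeAdj` suffix and on the word "Composite" in the assessment name is an assumption that fits these rows and is not guaranteed to hold for every category.

```python
import pandas as pd

# Toy excerpt of the data dictionary (Cognition category).
toy_dict = pd.DataFrame({
    "columnHeader": ["PicSeq_Unadj", "PicSeq_AgeAdj", "ListSort_Unadj",
                     "CogFluidComp_Unadj", "CogTotalComp_AgeAdj"],
    "assessment": ["Episodic Memory (Picture Sequence Memory)",
                   "Episodic Memory (Picture Sequence Memory)",
                   "Working Memory (List Sorting)",
                   "Cognition Fluid Composite",
                   "Cognition Total Composite Score"],
})

is_age_adjusted = toy_dict["columnHeader"].str.endswith("_AgeAdj")
is_composite = toy_dict["assessment"].str.contains("Composite", regex=False)

# Keep only unadjusted, non-composite task scores.
candidates = toy_dict.loc[~is_age_adjusted & ~is_composite, "columnHeader"].tolist()
```

Such a filter only produces candidates. Tasks with several raw measures, like the ones discussed next, still need a manual decision.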

Let's look at the individual tasks in more detail.

In [77]:
cog_vars[:6]

Unnamed: 0,fullDisplayName,category,assessment,columnHeader,description
190,NIH Toolbox Picture Sequence Memory Test: Unad...,Cognition,Episodic Memory (Picture Sequence Memory),PicSeq_Unadj,The Picture Sequence Memory Test is an assessm...
191,NIH Toolbox Picture Sequence Memory Test: Age-...,Cognition,Episodic Memory (Picture Sequence Memory),PicSeq_AgeAdj,The Picture Sequence Memory Test is an assessm...
192,NIH Toolbox Dimensional Change Card Sort Test:...,Cognition,Executive Function/Cognitive Flexibility (Dime...,CardSort_Unadj,The Dimensional Change Card Sort Test is a mea...
193,NIH Toolbox Dimensional Change Card Sort Test:...,Cognition,Executive Function/Cognitive Flexibility (Dime...,CardSort_AgeAdj,The Dimensional Change Card Sort Test is a mea...
194,NIH Toolbox Flanker Inhibitory Control and Att...,Cognition,Executive Function/Inhibition (Flanker Task),Flanker_Unadj,The Flanker is a measure of executive function...
195,NIH Toolbox Flanker Inhibitory Control and Att...,Cognition,Executive Function/Inhibition (Flanker Task),Flanker_AgeAdj,The Flanker is a measure of executive function...


The first three tasks, testing memory and executive function, are pretty straightforward. We will use the unadjusted scores.

In [78]:
variables_unr.extend(['PicSeq_Unadj', 'CardSort_Unadj', 'Flanker_Unadj'])

In [79]:
cog_vars[6:9]

Unnamed: 0,fullDisplayName,category,assessment,columnHeader,description
196,Penn Progressive Matrices: Number of Correct R...,Cognition,Fluid Intelligence (Penn Progressive Matrices),PMAT24_A_CR,Penn Matrix Test (PMAT): Number of Correct Res...
197,Penn Progressive Matrices: Total Skipped Items...,Cognition,Fluid Intelligence (Penn Progressive Matrices),PMAT24_A_SI,Penn Matrix Test (PMAT): Total Skipped Items (...
198,Penn Progressive Matrices: Median Reaction Tim...,Cognition,Fluid Intelligence (Penn Progressive Matrices),PMAT24_A_RTCR,Penn Matrix Test (PMAT): Median Reaction Time ...


The next task is the Penn Progressive Matrices, a fluid intelligence task often used in IQ tests. We have three measures: the number of correct responses, total skipped items, and median reaction time.

In [80]:
data_unr[cog_vars['columnHeader'].iloc[6:9]].corr()

Unnamed: 0,PMAT24_A_CR,PMAT24_A_SI,PMAT24_A_RTCR
PMAT24_A_CR,1.0,-0.9713,0.721995
PMAT24_A_SI,-0.9713,1.0,-0.701853
PMAT24_A_RTCR,0.721995,-0.701853,1.0


In [81]:
cog_vars.iloc[7]['description']

"Penn Matrix Test (PMAT): Total Skipped Items (items not presented because maximum errors allowed reached [5 in a row]). The PMAT measures fluid intelligence via non-verbal reasoning using an abbreviated version of the Raven's Progressive Matrices Form A developed by Gur and colleagues (Bilker et al. 2012).  Participants are presented with patterns made up of 2x2, 3x3 or 1x5 arrangements of squares, with one of the squares missing.  The participant must pick one of five response choices that best fits the missing square on the pattern.  The task has 24 items and 3 bonus items, arranged in order of increasing difficulty.  However, the task discontinues if the participant makes 5 incorrect responses in a row."

We can see that correct responses and skipped items correlate almost perfectly. This is because the test is stopped after 5 consecutive errors, meaning that correct responses and skipped items directly depend on each other. They are thus collinear, which can cause problems in the factor estimation. Consequently, we will only use correct responses and reaction time for the PMAT.
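
The same check can be automated for any group of measures. The following sketch flags column pairs whose absolute correlation exceeds a cutoff. The helper name `highly_correlated_pairs` and the cutoff of 0.9 are assumptions chosen for illustration, and the data are simulated to mimic the dependency described above.

```python
import numpy as np
import pandas as pd

def highly_correlated_pairs(df, threshold=0.9):
    """List column pairs whose absolute Pearson correlation exceeds `threshold`."""
    corr = df.corr()
    cols = list(corr.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            r = corr.loc[a, b]
            if abs(r) > threshold:
                pairs.append((a, b, float(r)))
    return pairs

# Toy data: "skipped" is almost determined by "correct"; "rt" is independent noise.
rng = np.random.default_rng(0)
correct = rng.integers(5, 25, size=200)
toy = pd.DataFrame({
    "correct": correct,
    "skipped": np.maximum(0, 24 - correct - rng.integers(0, 2, size=200)),
    "rt": rng.normal(size=200),
})
flagged = highly_correlated_pairs(toy)
```

Each flagged pair is a candidate for dropping one of its two members, as done for PMAT24_A_SI here.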

In [82]:
variables_unr.extend(['PMAT24_A_CR', 'PMAT24_A_RTCR'])

In [83]:
cog_vars[9:15]

Unnamed: 0,fullDisplayName,category,assessment,columnHeader,description
199,NIH Toolbox Oral Reading Recognition Test: Una...,Cognition,Language/Reading Decoding (Oral Reading Recogn...,ReadEng_Unadj,The Reading Test is a CAT format measure of re...
200,NIH Toolbox Oral Reading Recognition Test: Age...,Cognition,Language/Reading Decoding (Oral Reading Recogn...,ReadEng_AgeAdj,The Reading Test is a CAT format measure of re...
201,NIH Toolbox Picture Vocabulary Test: Unadjuste...,Cognition,Language/Vocabulary Comprehension (Picture Voc...,PicVocab_Unadj,The Picture Vocabulary Test is a CAT format me...
202,NIH Toolbox Picture Vocabulary Test: Age-Adjus...,Cognition,Language/Vocabulary Comprehension (Picture Voc...,PicVocab_AgeAdj,The Picture Vocabulary Test is a CAT format me...
203,NIH Toolbox Pattern Comparison Processing Spee...,Cognition,Processing Speed (Pattern Completion Processin...,ProcSpeed_Unadj,The Pattern Comparison Processing Test is a me...
204,NIH Toolbox Pattern Comparison Processing Spee...,Cognition,Processing Speed (Pattern Completion Processin...,ProcSpeed_AgeAdj,The Pattern Comparison Processing Test is a me...


The next three tasks are straightforward again: there is only one unadjusted score to add per task.

In [84]:
variables_unr.extend(['ReadEng_Unadj', 'PicVocab_Unadj', 'ProcSpeed_Unadj'])

In [85]:
cog_vars[15:29]

Unnamed: 0,fullDisplayName,category,assessment,columnHeader,description
205,Delay Discounting: Subjective Value for $200 a...,Cognition,Self-regulation/Impulsivity (Delay Discounting),DDisc_SV_1mo_200,Delay Discounting: Subjective Value for $200 a...
206,Delay Discounting: Subjective Value for $200 a...,Cognition,Self-regulation/Impulsivity (Delay Discounting),DDisc_SV_6mo_200,Delay Discounting: Subjective Value for $200 a...
207,Delay Discounting: Subjective Value for $200 a...,Cognition,Self-regulation/Impulsivity (Delay Discounting),DDisc_SV_1yr_200,Delay Discounting: Subjective Value for $200 a...
208,Delay Discounting: Subjective Value for $200 a...,Cognition,Self-regulation/Impulsivity (Delay Discounting),DDisc_SV_3yr_200,Delay Discounting: Subjective Value for $200 a...
209,Delay Discounting: Subjective Value for $200 a...,Cognition,Self-regulation/Impulsivity (Delay Discounting),DDisc_SV_5yr_200,Delay Discounting: Subjective Value for $200 a...
210,Delay Discounting: Subjective Value for $200 a...,Cognition,Self-regulation/Impulsivity (Delay Discounting),DDisc_SV_10yr_200,Delay Discounting: Subjective Value for $200 a...
211,Delay Discounting: Subjective Value for $40K a...,Cognition,Self-regulation/Impulsivity (Delay Discounting),DDisc_SV_1mo_40K,Delay Discounting: Subjective Value for $40K a...
212,Delay Discounting: Subjective Value for $40K a...,Cognition,Self-regulation/Impulsivity (Delay Discounting),DDisc_SV_6mo_40K,Delay Discounting: Subjective Value for $40K a...
213,Delay Discounting: Subjective Value for $40K a...,Cognition,Self-regulation/Impulsivity (Delay Discounting),DDisc_SV_1yr_40K,Delay Discounting: Subjective Value for $40K a...
214,Delay Discounting: Subjective Value for $40K a...,Cognition,Self-regulation/Impulsivity (Delay Discounting),DDisc_SV_3yr_40K,Delay Discounting: Subjective Value for $40K a...


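For the Delay Discounting task, the last two measures in this block, DDisc_AUC_200 and DDisc_AUC_40K, give a task-level score, so we add them to our list. The subjective values at the individual delays are item-level data and are left out.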
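
For intuition on how a single area-under-the-curve (AUC) value can summarize the six subjective values per amount, here is a sketch of a normalized trapezoidal AUC. The delays match the column names (1 month to 10 years). Anchoring the curve at delay 0 with the full amount, and the exact normalization, are assumptions here and may differ from how the HCP scores were computed. The subjective values are invented.

```python
import numpy as np

def discounting_auc(delays, subjective_values, amount):
    """Normalized trapezoidal area under the discounting curve (between 0 and 1)."""
    # Assumption: the curve starts at delay 0 with the full amount.
    x = np.concatenate([[0.0], np.asarray(delays, dtype=float)]) / max(delays)
    y = np.concatenate([[amount], np.asarray(subjective_values, dtype=float)]) / amount
    return float(np.sum(np.diff(x) * (y[:-1] + y[1:]) / 2))

delays_months = [1, 6, 12, 36, 60, 120]

# Hypothetical participants: one who does not discount at all, one who discounts steeply.
auc_patient = discounting_auc(delays_months, [200] * 6, amount=200)
auc_impulsive = discounting_auc(delays_months, [150, 100, 60, 30, 20, 10], amount=200)
```

A smaller area corresponds to steeper discounting, so one number per amount captures the shape of the whole curve.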

In [86]:
variables_unr.extend(['DDisc_AUC_200', 'DDisc_AUC_40K'])

In [87]:
cog_vars[29:32]

Unnamed: 0,fullDisplayName,category,assessment,columnHeader,description
219,Variable Short Penn Line Orientation: Total Nu...,Cognition,Spatial Orientation (Variable Short Penn Line ...,VSPLOT_TC,Penn Line Orientation: Total Number Correct
220,Variable Short Penn Line Orientation: Median R...,Cognition,Spatial Orientation (Variable Short Penn Line ...,VSPLOT_CRTE,Penn Line Orientation: Median Reaction Time Di...
221,Variable Short Penn Line Orientation: Total Po...,Cognition,Spatial Orientation (Variable Short Penn Line ...,VSPLOT_OFF,Penn Line Orientation: Total Positions Off fo...


In [88]:
data_unr[cog_vars['columnHeader'].iloc[29:32]].corr()

Unnamed: 0,VSPLOT_TC,VSPLOT_CRTE,VSPLOT_OFF
VSPLOT_TC,1.0,0.142671,-0.786139
VSPLOT_CRTE,0.142671,1.0,-0.063169
VSPLOT_OFF,-0.786139,-0.063169,1.0


The Variable Short Penn Line Orientation test (VSPLOT) poses a similar issue to the PMAT. The correlation between the total number of correct answers and the number of positions off is weaker than the corresponding correlation in the PMAT, but the two measures still depend on each other. The VSPLOT is a visual task in which the participant has to match the orientation of a line shown on the screen. The number of positions off is thus the more fine-grained measurement, and we will add it as the accuracy score for the VSPLOT.

In [89]:
variables_unr.extend(['VSPLOT_CRTE', 'VSPLOT_OFF'])

In [90]:
cog_vars[32:40]

Unnamed: 0,fullDisplayName,category,assessment,columnHeader,description
222,Short Penn Continuous Performance Test: True P...,Cognition,Sustained Attention (Short Penn Continuous Per...,SCPT_TP,Short Penn CPT True Positives = Sum of CPN_TP ...
223,Short Penn Continuous Performance Test: True N...,Cognition,Sustained Attention (Short Penn Continuous Per...,SCPT_TN,Short Penn CPT True Negatives = Sum of CPN_TN ...
224,Short Penn Continuous Performance Test: False ...,Cognition,Sustained Attention (Short Penn Continuous Per...,SCPT_FP,Short Penn CPT False Positives = Sum of CPN_FP...
225,Short Penn Continuous Performance Test: False ...,Cognition,Sustained Attention (Short Penn Continuous Per...,SCPT_FN,Short Penn CPT False Negatives = Sum of CPN_FN...
226,Short Penn Continuous Performance Test: Median...,Cognition,Sustained Attention (Short Penn Continuous Per...,SCPT_TPRT,Short Penn CPT Median Response Time for True P...
227,Short Penn Continuous Performance Test: Sensit...,Cognition,Sustained Attention (Short Penn Continuous Per...,SCPT_SEN,Short Penn CPT Sensitivity = SCPT_TP/(SCPT_TP ...
228,Short Penn Continuous Performance Test: Specif...,Cognition,Sustained Attention (Short Penn Continuous Per...,SCPT_SPEC,Short Penn CPT Specificity = SCPT_TN/(SCPT_TN ...
229,Short Penn Continuous Performance Test: Longes...,Cognition,Sustained Attention (Short Penn Continuous Per...,SCPT_LRNR,Short Penn CPT Longest Run of Non-Responses)


For the Short Penn Continuous Performance Test there are several measures. In addition to the response time, we will use sensitivity and specificity, as they combine the information from the other variables without overlap.

In [91]:
cog_vars.iloc[37]['description']

'Short Penn CPT Sensitivity = SCPT_TP/(SCPT_TP + SCPT_FN)'

In [92]:
cog_vars.iloc[38]['description']

'Short Penn CPT Specificity = SCPT_TN/(SCPT_TN + SCPT_FP)'
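
The two formulas from the descriptions above can be written out directly. The counts below are invented for illustration and do not come from the dataset.

```python
def sensitivity(tp, fn):
    """Fraction of targets that were correctly responded to: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of non-targets that were correctly ignored: TN / (TN + FP)."""
    return tn / (tn + fp)

# Hypothetical counts for one participant.
tp, fn, tn, fp = 54, 6, 110, 10
sen = sensitivity(tp, fn)
spec = specificity(tn, fp)
```

Sensitivity uses only the target trials and specificity only the non-target trials, so together they cover all four counts without using any of them twice.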

In [93]:
variables_unr.extend(['SCPT_SEN', 'SCPT_SPEC', 'SCPT_TPRT'])

In [94]:
cog_vars[40:]

Unnamed: 0,fullDisplayName,category,assessment,columnHeader,description
230,Penn Word Memory Test: Total Number of Correct...,Cognition,Verbal Episodic Memory (Penn Word Memory Test),IWRD_TOT,Penn Word Memory: Total Number of Correct Res...
231,Penn Word Memory Test: Median Reaction Time fo...,Cognition,Verbal Episodic Memory (Penn Word Memory Test),IWRD_RTC,Penn Word Memory: Median Reaction Time for Co...
232,NIH Toolbox List Sorting Working Memory Test: ...,Cognition,Working Memory (List Sorting),ListSort_Unadj,This task assesses working memory and requires...
233,NIH Toolbox List Sorting Working Memory Test: ...,Cognition,Working Memory (List Sorting),ListSort_AgeAdj,This task assesses working memory and requires...
234,NIH Toolbox Cognition Fluid Composite: Unadjus...,Cognition,Cognition Fluid Composite,CogFluidComp_Unadj,The Fluid Cognition Composite score is derived...
235,NIH Toolbox Cognition Fluid Composite: Age Adj...,Cognition,Cognition Fluid Composite,CogFluidComp_AgeAdj,The Fluid Cognition Composite score is derived...
236,NIH Toolbox Cognition Early Childhood Composit...,Cognition,Cognition Early Childhood Composite,CogEarlyComp_Unadj,The Early Childhood Composite score is derived...
237,NIH Toolbox Cognition Early Childhood Composit...,Cognition,Cognition Early Childhood Composite,CogEarlyComp_AgeAdj,The Early Childhood Composite score is derived...
238,NIH Toolbox Cognition Total Composite Score: U...,Cognition,Cognition Total Composite Score,CogTotalComp_Unadj,The Cognitive Function Composite score is deri...
239,NIH Toolbox Cognition Total Composite Score: A...,Cognition,Cognition Total Composite Score,CogTotalComp_AgeAdj,The Cognitive Function Composite score is deri...


We can add the Penn Word Memory Test and the List Sorting Working Memory Test. What follows are only composite measures.

In [95]:
variables_unr.extend(cog_vars.iloc[40:43]['columnHeader'])

## Emotion

In [96]:
emo_vars = data_dict[data_dict['category'] == 'Emotion']
emo_vars

Unnamed: 0,fullDisplayName,category,assessment,columnHeader,description
242,Penn Emotion Recognition Test: Number of Corre...,Emotion,Emotion Recognition (Penn Emotion Recognition ...,ER40_CR,Penn Emotion Recognition: Number of Correct Re...
243,Penn Emotion Recognition Test: Correct Respons...,Emotion,Emotion Recognition (Penn Emotion Recognition ...,ER40_CRT,Penn Emotion Recognition: Correct Responses Me...
244,Penn Emotion Recognition Test: Number of Corre...,Emotion,Emotion Recognition (Penn Emotion Recognition ...,ER40ANG,Penn Emotion Recognition: Number of Correct An...
245,Penn Emotion Recognition Test: Number of Corre...,Emotion,Emotion Recognition (Penn Emotion Recognition ...,ER40FEAR,Penn Emotion Recognition: Number of Correct Fe...
246,Penn Emotion Recognition Test: Number of Corre...,Emotion,Emotion Recognition (Penn Emotion Recognition ...,ER40HAP,Penn Emotion Recognition: Number of Correct Ha...
247,Penn Emotion Recognition Test: Number of Corre...,Emotion,Emotion Recognition (Penn Emotion Recognition ...,ER40NOE,Penn Emotion Recognition: Number of Correct Ne...
248,Penn Emotion Recognition Test: Number of Corre...,Emotion,Emotion Recognition (Penn Emotion Recognition ...,ER40SAD,Penn Emotion Recognition: Number of Correct Sa...
249,NIH Toolbox Anger-Affect Survey: Unadjusted Sc...,Emotion,"Negative Affect (Sadness, Fear, Anger)",AngAffect_Unadj,This self-report measure assesses anger as an ...
250,NIH Toolbox Anger-Hostility Survey: Unadjusted...,Emotion,"Negative Affect (Sadness, Fear, Anger)",AngHostil_Unadj,This self-report measure assesses attitudes of...
251,NIH Toolbox Anger-Physical Aggression Survey: ...,Emotion,"Negative Affect (Sadness, Fear, Anger)",AngAggr_Unadj,This self-report measure assesses aggression a...


From the Emotion category we will add the accuracy and reaction time measures from the Penn Emotion Recognition Test, but not the per-emotion counts. We will also use all NIH Toolbox measures.

In [97]:
variables_unr.extend(emo_vars.iloc[0:2]['columnHeader'])
variables_unr.extend(emo_vars.iloc[7:]['columnHeader'])
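
Positional slices such as `iloc[0:2]` and `iloc[7:]` silently select different rows if the data dictionary changes. A possible safeguard, sketched here on a toy excerpt with only the ten column headers shown above, is to compare the positional selection with a name-based one. The rule "every ER40 column except ER40_CR and ER40_CRT is a per-emotion count" is an assumption derived from these rows.

```python
import pandas as pd

# Toy excerpt: the ten Emotion column headers visible in the truncated output.
toy_emo = pd.DataFrame({"columnHeader": [
    "ER40_CR", "ER40_CRT", "ER40ANG", "ER40FEAR", "ER40HAP", "ER40NOE", "ER40SAD",
    "AngAffect_Unadj", "AngHostil_Unadj", "AngAggr_Unadj",
]})
headers = toy_emo["columnHeader"]

# Selection by position, as in the cell above.
by_position = pd.concat([headers.iloc[0:2], headers.iloc[7:]]).tolist()

# Selection by name: drop the per-emotion counts, keep everything else.
is_per_emotion = headers.str.startswith("ER40") & ~headers.isin(["ER40_CR", "ER40_CRT"])
by_name = headers[~is_per_emotion].tolist()

assert by_position == by_name
```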

## In-Scanner Task Performance

In [98]:
task_vars = data_dict[data_dict['category'] == 'In-Scanner Task Performance']
task_vars[:55]

Unnamed: 0,fullDisplayName,category,assessment,columnHeader,description
467,OVERALL Emotion Task accuracy,In-Scanner Task Performance,Emotion,Emotion_Task_Acc,Average accuracy Percentage during EMOTION task
468,OVERALL Emotion Task Reaction Time,In-Scanner Task Performance,Emotion,Emotion_Task_Median_RT,Average of median Reaction Times from each con...
469,Emotion Task FACE accuracy,In-Scanner Task Performance,Emotion,Emotion_Task_Face_Acc,Accuracy Percentage during FACE blocks in EMOT...
470,Emotion Task FACE median Reaction Time,In-Scanner Task Performance,Emotion,Emotion_Task_Face_Median_RT,Median Reaction Time for correct trials during...
471,Emotion Task SHAPE accuracy,In-Scanner Task Performance,Emotion,Emotion_Task_Shape_Acc,Accuracy Percentage during SHAPE blocks in EMO...
472,Emotion Task SHAPE median Reaction Time,In-Scanner Task Performance,Emotion,Emotion_Task_Shape_Median_RT,Median Reaction Time for correct trials during...
473,Gambling Task Overall Percentage 'Larger',In-Scanner Task Performance,Gambling,Gambling_Task_Perc_Larger,Overall Percentage of Trials that received a '...
474,Gambling Task Overall Percentage 'Smaller',In-Scanner Task Performance,Gambling,Gambling_Task_Perc_Smaller,Overall Percentage of Trials that received a '...
475,Gambling Task Overall Percentage No Logged Res...,In-Scanner Task Performance,Gambling,Gambling_Task_Perc_NLR,Overall Percentage of Trials with no response ...
476,Gambling Task Overall Reaction Time 'Larger',In-Scanner Task Performance,Gambling,Gambling_Task_Median_RT_Larger,Average of Median Reaction Times from Trials t...


In [99]:
task_vars[:6]

Unnamed: 0,fullDisplayName,category,assessment,columnHeader,description
467,OVERALL Emotion Task accuracy,In-Scanner Task Performance,Emotion,Emotion_Task_Acc,Average accuracy Percentage during EMOTION task
468,OVERALL Emotion Task Reaction Time,In-Scanner Task Performance,Emotion,Emotion_Task_Median_RT,Average of median Reaction Times from each con...
469,Emotion Task FACE accuracy,In-Scanner Task Performance,Emotion,Emotion_Task_Face_Acc,Accuracy Percentage during FACE blocks in EMOT...
470,Emotion Task FACE median Reaction Time,In-Scanner Task Performance,Emotion,Emotion_Task_Face_Median_RT,Median Reaction Time for correct trials during...
471,Emotion Task SHAPE accuracy,In-Scanner Task Performance,Emotion,Emotion_Task_Shape_Acc,Accuracy Percentage during SHAPE blocks in EMO...
472,Emotion Task SHAPE median Reaction Time,In-Scanner Task Performance,Emotion,Emotion_Task_Shape_Median_RT,Median Reaction Time for correct trials during...


For the emotion task, we will use the accuracy and RT measures of the individual conditions (face and shape) rather than the overall scores, as the two conditions test different aspects of cognition.

In [100]:
variables_unr.extend(task_vars.iloc[2:6]['columnHeader'])

In [101]:
task_vars[6:21]

Unnamed: 0,fullDisplayName,category,assessment,columnHeader,description
473,Gambling Task Overall Percentage 'Larger',In-Scanner Task Performance,Gambling,Gambling_Task_Perc_Larger,Overall Percentage of Trials that received a '...
474,Gambling Task Overall Percentage 'Smaller',In-Scanner Task Performance,Gambling,Gambling_Task_Perc_Smaller,Overall Percentage of Trials that received a '...
475,Gambling Task Overall Percentage No Logged Res...,In-Scanner Task Performance,Gambling,Gambling_Task_Perc_NLR,Overall Percentage of Trials with no response ...
476,Gambling Task Overall Reaction Time 'Larger',In-Scanner Task Performance,Gambling,Gambling_Task_Median_RT_Larger,Average of Median Reaction Times from Trials t...
477,Gambling Task Overall Reaction Time 'Smaller',In-Scanner Task Performance,Gambling,Gambling_Task_Median_RT_Smaller,Average of Median Reaction Times from Trials t...
478,Gambling Task Percentage 'Larger' in Reward,In-Scanner Task Performance,Gambling,Gambling_Task_Reward_Perc_Larger,Percentage of Reward Trials that received a 'l...
479,Gambling Task Median Reaction Time 'Larger' in...,In-Scanner Task Performance,Gambling,Gambling_Task_Reward_Median_RT_Larger,Median Reaction Time for Reward Trials that re...
480,Gambling Task Percentage 'Smaller' in Reward,In-Scanner Task Performance,Gambling,Gambling_Task_Reward_Perc_Smaller,Percentage of Reward Trials that received a 's...
481,Gambling Task Median Reaction Time 'Smaller' i...,In-Scanner Task Performance,Gambling,Gambling_Task_Reward_Median_RT_Smaller,Median Reaction Time for Reward Trials that re...
482,Gambling Task Percentage No Logged response in...,In-Scanner Task Performance,Gambling,Gambling_Task_Reward_Perc_NLR,Percentage of Reward Trials with no response l...


In the gambling task, the subject's behavior is not related to performance (the task is only used to elicit a brain response), so we will skip it.

In [102]:
task_vars[21:29]

Unnamed: 0,fullDisplayName,category,assessment,columnHeader,description
488,Language Task OVERALL accuracy,In-Scanner Task Performance,Language,Language_Task_Acc,Average of accuracy from each condition in the...
489,Language Task OVERALL median Reaction Time,In-Scanner Task Performance,Language,Language_Task_Median_RT,Average of median correct Reaction Time from e...
490,Language Task STORY accuracy,In-Scanner Task Performance,Language,Language_Task_Story_Acc,Accuracy Percentage during STORY condition in ...
491,Language Task STORY median Reaction Time,In-Scanner Task Performance,Language,Language_Task_Story_Median_RT,Median Reaction Time for correct trials during...
492,Language Task STORY difficulty level,In-Scanner Task Performance,Language,Language_Task_Story_Avg_Difficulty_Level,Average difficulty level of stimuli presented ...
493,Language Task MATH accuracy,In-Scanner Task Performance,Language,Language_Task_Math_Acc,Accuracy Percentage during MATH condition in L...
494,Language Task MATH median Reaction Time,In-Scanner Task Performance,Language,Language_Task_Math_Median_RT,Median Reaction Time for correct trials during...
495,Language Task MATH difficulty level,In-Scanner Task Performance,Language,Language_Task_Math_Avg_Difficulty_Level,Average difficulty level of stimuli presented ...


As in the emotion task, the math and story conditions relate to different aspects of cognition. We will use only accuracy and reaction time, and leave out the average difficulty level to avoid collinearity between measures.

In [103]:
variables_unr.extend(['Language_Task_Story_Acc', 'Language_Task_Story_Median_RT', 'Language_Task_Math_Acc', 'Language_Task_Math_Median_RT'])

In [104]:
task_vars[29:35]

Unnamed: 0,fullDisplayName,category,assessment,columnHeader,description
496,Relational Task OVERALL accuracy,In-Scanner Task Performance,Relational,Relational_Task_Acc,Average accuracy Percentage during RELATIONAL ...
497,Relational Task OVERALL Reaction Time,In-Scanner Task Performance,Relational,Relational_Task_Median_RT,Average of median Reaction Times from each con...
498,Relational Task MATCH accuracy,In-Scanner Task Performance,Relational,Relational_Task_Match_Acc,Accuracy Percentage during MATCH blocks in REL...
499,Relational Task MATCH median Reaction Time,In-Scanner Task Performance,Relational,Relational_Task_Match_Median_RT,Median Reaction Time for correct trials during...
500,Relational Task RELATIONAL block (REL) accuracy,In-Scanner Task Performance,Relational,Relational_Task_Rel_Acc,Accuracy Percentage during RELATIONAL blocks i...
501,Relational Task RELATIONAL block (REL) median ...,In-Scanner Task Performance,Relational,Relational_Task_Rel_Median_RT,Median Reaction Time for correct trials during...


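The same applies to the relational task: we use the accuracy and reaction time of the individual conditions (match and relational) rather than the overall scores.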

In [105]:
variables_unr.extend(task_vars.iloc[31:35]['columnHeader'])

In [106]:
task_vars[35:56]

Unnamed: 0,fullDisplayName,category,assessment,columnHeader,description
502,Social Task Overall Percentage 'Random',In-Scanner Task Performance,Social,Social_Task_Perc_Random,Overall Percentage of stimuli that the subject...
503,Social Task Overall Percentage 'TOM',In-Scanner Task Performance,Social,Social_Task_Perc_TOM,Overall Percentage of stimuli that received a ...
504,Social Task Overall Percentage 'Unsure',In-Scanner Task Performance,Social,Social_Task_Perc_Unsure,Overall Percentage of stimuli that received a ...
505,Social Task Overall Percentage No Logged Response,In-Scanner Task Performance,Social,Social_Task_Perc_NLR,Overall Percentage of stimuli with no response...
506,Social Task Overall Reaction Time 'Random',In-Scanner Task Performance,Social,Social_Task_Median_RT_Random,Average of Median Reaction Times from stimuli ...
507,Social Task Overall Reaction Time 'TOM',In-Scanner Task Performance,Social,Social_Task_Median_RT_TOM,Average of Median Reaction Times from stimuli ...
508,Social Task Overall Reaction Time 'Unsure',In-Scanner Task Performance,Social,Social_Task_Median_RT_Unsure,Average of Median Reaction Times from stimuli ...
509,Social Task Percentage 'Random' in Random cond...,In-Scanner Task Performance,Social,Social_Task_Random_Perc_Random,Percentage of Random stimuli that the subject ...
510,Social Task Median Reaction Time 'Random' in R...,In-Scanner Task Performance,Social,Social_Task_Random_Median_RT_Random,Median Reaction Time for Random stimuli that t...
511,Social Task Percentage 'TOM' in Random condition,In-Scanner Task Performance,Social,Social_Task_Random_Perc_TOM,Percentage of stimuli that received a 'social'...


In the social task, the relevant performance measure is the percentage of ToM responses in the ToM condition, in other words, the fraction of correctly identified social interactions. We also use the corresponding reaction time, which is analogous to the reaction time for correct answers in other tasks.

In [107]:
variables_unr.extend(['Social_Task_TOM_Perc_TOM', 'Social_Task_TOM_Median_RT_TOM'])

In [108]:
task_vars[56:]

Unnamed: 0,fullDisplayName,category,assessment,columnHeader,description
523,Working Memory Task Overall Accuracy,In-Scanner Task Performance,Working Memory,WM_Task_Acc,Accuracy across all conditions in WM task
524,Working Memory Task Overall Reaction Time,In-Scanner Task Performance,Working Memory,WM_Task_Median_RT,Average of Median Reaction Time for all condit...
525,Working Memory Task Accuracy for 2-back,In-Scanner Task Performance,Working Memory,WM_Task_2bk_Acc,Accuracy across all conditions in 2-back
526,Working Memory Task Median Reaction Time for 2...,In-Scanner Task Performance,Working Memory,WM_Task_2bk_Median_RT,Average of Median Reaction Time for all condit...
527,Working Memory Task Accuracy for 0-back,In-Scanner Task Performance,Working Memory,WM_Task_0bk_Acc,Accuracy across all conditions in 0-back
528,Working Memory Task Median Reaction Time for 0...,In-Scanner Task Performance,Working Memory,WM_Task_0bk_Median_RT,Average of Median Reaction Time for all condit...
529,Working Memory Task Accuracy for 0-back Body,In-Scanner Task Performance,Working Memory,WM_Task_0bk_Body_Acc,Accuracy across all trials in 0-back body cond...
530,Working Memory Task Accuracy for 0-back Body T...,In-Scanner Task Performance,Working Memory,WM_Task_0bk_Body_Acc_Target,Accuracy across target trials in 0-back body c...
531,Working Memory Task Accuracy for 0-back Body N...,In-Scanner Task Performance,Working Memory,WM_Task_0bk_Body_Acc_Nontarget,Accuracy across nontarget trials in 0-back bod...
532,Working Memory Task Accuracy for 0-back Face,In-Scanner Task Performance,Working Memory,WM_Task_0bk_Face_Acc,Accuracy across all trials in 0-back face cond...


For the Working Memory Task we will use the overall accuracy and reaction time.

In [109]:
variables_unr.extend(['WM_Task_Acc', 'WM_Task_Median_RT'])

## Motor

In [110]:
data_dict[data_dict['category'] == 'Motor']

Unnamed: 0,fullDisplayName,category,assessment,columnHeader,description
577,NIH Toolbox 2-minute Walk Endurance Test : Una...,Motor,Endurance (2 minute walk test),Endurance_Unadj,This test measures sub-maximal cardiovascular ...
578,NIH Toolbox 2-minute Walk Endurance Test : Age...,Motor,Endurance (2 minute walk test),Endurance_AgeAdj,This test measures sub-maximal cardiovascular ...
579,NIH Toolbox 4-Meter Walk Gait Speed Test: Comp...,Motor,Locomotion (4-meter walk test),GaitSpeed_Comp,Participants are asked to walk a short distanc...
580,NIH Toolbox 9-hole Pegboard Dexterity Test : U...,Motor,Dexterity (9-hole Pegboard),Dexterity_Unadj,This test of manual dexterity records the time...
581,NIH Toolbox 9-hole Pegboard Dexterity Test : A...,Motor,Dexterity (9-hole Pegboard),Dexterity_AgeAdj,This test of manual dexterity records the time...
582,NIH Toolbox Grip Strength Test: Unadjusted Sca...,Motor,Strength (Grip Strength Dynamometry),Strength_Unadj,Grip strength for each hand is measured with t...
583,NIH Toolbox Grip Strength Test: Age-Adjusted-A...,Motor,Strength (Grip Strength Dynamometry),Strength_AgeAdj,Grip strength for each hand is measured with t...


We'll include the unadjusted measures, plus the computed gait speed score, which has no age-adjusted counterpart.

In [111]:
variables_unr.extend(['Endurance_Unadj', 'GaitSpeed_Comp', 'Dexterity_Unadj', 'Strength_Unadj'])

## Personality

In [112]:
data_dict[data_dict['category'] == 'Personality']

Unnamed: 0,fullDisplayName,category,assessment,columnHeader,description
584,NEO-FFI Agreeableness (NEOFAC_A),Personality,Five Factor Model (NEO-FFI) Factor Summary Scores,NEOFAC_A,Agreeableness Scale Score NEO-FFI (McCrae and ...
585,NEO-FFI Openness to Experience (NEOFAC_O),Personality,Five Factor Model (NEO-FFI) Factor Summary Scores,NEOFAC_O,Openness Scale Score NEO-FFI (McCrae and Costa...
586,NEO-FFI Conscientiousness (NEOFAC_C),Personality,Five Factor Model (NEO-FFI) Factor Summary Scores,NEOFAC_C,Conscientiousness Scale Score NEO-FFI (McCrae ...
587,NEO-FFI Neuroticism (NEOFAC_N),Personality,Five Factor Model (NEO-FFI) Factor Summary Scores,NEOFAC_N,Neuroticism Scale Score NEO-FFI (McCrae and Co...
588,NEO-FFI Extraversion (NEOFAC_E),Personality,Five Factor Model (NEO-FFI) Factor Summary Scores,NEOFAC_E,Extraversion Scale Score NEO-FFI (McCrae and C...
...,...,...,...,...,...
644,56. At times I have been so ashamed I just wan...,Personality,Five Factor Model (NEO-FFI) Raw Scores,NEORAW_56,56. At times I have been so ashamed I just wan...
645,57. I would rather go my own way than be a lea...,Personality,Five Factor Model (NEO-FFI) Raw Scores,NEORAW_57,57. I would rather go my own way than be a lea...
646,58. I often enjoy playing with theories or abs...,Personality,Five Factor Model (NEO-FFI) Raw Scores,NEORAW_58,58. I often enjoy playing with theories or abs...
647,"59. If necessary, I am willing to manipulate p...",Personality,Five Factor Model (NEO-FFI) Raw Scores,NEORAW_59,"59. If necessary, I am willing to manipulate p..."


We'll use the five classic personality factor scores: Agreeableness, Openness to Experience, Conscientiousness, Neuroticism, and Extraversion.

In [113]:
variables_unr.extend(['NEOFAC_A', 'NEOFAC_O', 'NEOFAC_C', 'NEOFAC_N', 'NEOFAC_E'])

## Psychiatric and Life Function

In [114]:
data_dict[data_dict['category'] == 'Psychiatric and Life Function']

Unnamed: 0,fullDisplayName,category,assessment,columnHeader,description
649,ASR Anxious/Depressed Raw Score (ASR_Anxd_Raw),Psychiatric and Life Function,"Life Function (Achenbach Adult Self-Report, Sy...",ASR_Anxd_Raw,ASR Anxious/Depressed (scale I) Raw Score
650,ASR Anxious/Depressed Gender and Age Adjusted ...,Psychiatric and Life Function,"Life Function (Achenbach Adult Self-Report, Sy...",ASR_Anxd_T,ASR Anxious/Depressed (scale I) Gender and Age...
651,ASR Withdrawn Raw Score (ASR_Witd_Raw),Psychiatric and Life Function,"Life Function (Achenbach Adult Self-Report, Sy...",ASR_Witd_Raw,ASR Withdrawn (scale II) Raw Score
652,ASR Withdrawn Gender and Age Adjusted T-score ...,Psychiatric and Life Function,"Life Function (Achenbach Adult Self-Report, Sy...",ASR_Witd_T,ASR Withdrawn (scale II) Gender and Age Adjust...
653,ASR Somatic Complaints Raw Score (ASR_Soma_Raw),Psychiatric and Life Function,"Life Function (Achenbach Adult Self-Report, Sy...",ASR_Soma_Raw,ASR Somatic Complaints (scale III) Raw Score
654,ASR Somatic Complaints Gender and Age Adjusted...,Psychiatric and Life Function,"Life Function (Achenbach Adult Self-Report, Sy...",ASR_Soma_T,ASR Somatic Complaints (scale III) Gender and ...
655,ASR Thought Problems Raw Score (ASR_Thot_Raw),Psychiatric and Life Function,"Life Function (Achenbach Adult Self-Report, Sy...",ASR_Thot_Raw,ASR Thought Problems (scale IV) Raw Score
656,ASR Thought Problems Gender and Age Adjusted T...,Psychiatric and Life Function,"Life Function (Achenbach Adult Self-Report, Sy...",ASR_Thot_T,ASR Thought Problems (scale IV) Gender and Age...
657,ASR Attention Problems Raw Score (ASR_Attn_Raw),Psychiatric and Life Function,"Life Function (Achenbach Adult Self-Report, Sy...",ASR_Attn_Raw,ASR Attention Problems (scale V) Raw Score
658,ASR Attention Problems Gender and Age Adjusted...,Psychiatric and Life Function,"Life Function (Achenbach Adult Self-Report, Sy...",ASR_Attn_T,ASR Attention Problems (scale V) Gender and Ag...


Most items in this category come from the Achenbach Adult Self-Report (ASR), which can be summarized with different sets of scales. We will use the DSM-oriented scales (1-6), as they are the most relevant for translational research. We will skip the psychiatric history items, as they do not relate to current behavior.

In [115]:
variables_res.extend(['DSM_Depr_Raw', 'DSM_Anxi_Raw', 'DSM_Somp_Raw', 'DSM_Avoid_Raw', 'DSM_Adh_Raw', 'DSM_Antis_Raw'])
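
Since the column names are typed by hand, a quick check against the available headers can catch typos and duplicates before they propagate. The following is a minimal sketch in plain Python; the helper `check_selection` and the example header list are hypothetical and not part of the original notebook.

```python
from collections import Counter


def check_selection(selected, available):
    """Return (unknown, duplicated) names for a hand-written selection.

    Hypothetical helper, for illustration only.
    """
    available = set(available)
    # Names that do not appear among the available headers.
    unknown = [name for name in selected if name not in available]
    # Names that were listed more than once.
    duplicated = [name for name, n in Counter(selected).items() if n > 1]
    return unknown, duplicated


# Example headers; in the notebook these could come from data_res.columns.
headers = ["DSM_Depr_Raw", "DSM_Anxi_Raw", "DSM_Somp_Raw"]
unknown, duplicated = check_selection(
    ["DSM_Depr_Raw", "DSM_Anxi_Raw", "DSM_Anxi_Raw", "DSM_Avoid_Raw"], headers
)
print(unknown, duplicated)
```

Both returned lists being empty would indicate that every selected name exists and is listed only once.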

## Sensory

In [116]:
data_dict[data_dict['category'] == 'Sensory']

Unnamed: 0,fullDisplayName,category,assessment,columnHeader,description
693,NIH Toolbox Words-In-Noise Age 6+: Computed Score,Sensory,Audition (Words in Noise),Noise_Comp,This test measures a person’s ability to recog...
694,NIH Toolbox Odor Identification Age 3+ Unadjus...,Sensory,Olfaction (Odor Identification Test),Odor_Unadj,This task assesses a person’s ability to ident...
695,NIH Toolbox Odor Identification Age 3+ Age-Adj...,Sensory,Olfaction (Odor Identification Test),Odor_AgeAdj,This task assesses a person’s ability to ident...
696,NIH Toolbox Pain Intensity Survey Age 18+: Raw...,Sensory,Pain (Pain Intensity and Interference Surveys),PainIntens_RawScore,This measure consists of a single item measuri...
697,NIH Toolbox Pain Interference Survey Age 18+: ...,Sensory,Pain (Pain Intensity and Interference Surveys),PainInterf_Tscore,"The pain interference survey measures, in CAT ..."
698,NIH Toolbox Regional Taste Intensity Age 12+ U...,Sensory,Taste (Taste Intensity Test),Taste_Unadj,This test measures the perceived intensity of ...
699,NIH Toolbox Regional Taste Intensity Age 12+ A...,Sensory,Taste (Taste Intensity Test),Taste_AgeAdj,This test measures the perceived intensity of ...
700,Color Vision Category,Sensory,Vision (EVA Scores and Farnsworth Test),Color_Vision,This is the type of color vision that the indi...
701,Eye Used For Color Vision Test,Sensory,Vision (EVA Scores and Farnsworth Test),Eye,Was right or left eye (dominant eye) used for ...
702,EVA score - Numerator,Sensory,Vision (EVA Scores and Farnsworth Test),EVA_Num,"Electronic Visual Acuity Numerator, distance i..."


The relevant items here are the ones for auditory, olfactory, pain, and taste processing.

In [117]:
variables_unr.extend(['Noise_Comp', 'Odor_Unadj', 'PainIntens_RawScore', 'PainInterf_Tscore', 'Taste_Unadj'])

## Substance Use

In [118]:
substance_vars = data_dict[data_dict['category'] == 'Substance Use']
substance_vars[:27]

Unnamed: 0,fullDisplayName,category,assessment,columnHeader,description
708,Any breathalyzer over .05,Substance Use,Breathalyzer and Drug Test Results,Breathalyzer_Over_05,Any Breathalyzer >= 0.05? Breathalyzers were a...
709,Any breathalyzer over .08,Substance Use,Breathalyzer and Drug Test Results,Breathalyzer_Over_08,Any Breathalyzer >= 0.08? Breathalyzers were a...
710,Positive test for Cocaine,Substance Use,Breathalyzer and Drug Test Results,Cocaine,Any positive test for Cocaine? Drug tests were...
711,Positive test for THC,Substance Use,Breathalyzer and Drug Test Results,THC,Any positive test for THC? Drug tests were adm...
712,Positive test for Opiates,Substance Use,Breathalyzer and Drug Test Results,Opiates,Any positive test for Opiates? Drug tests were...
713,Positive test for Amphetamines,Substance Use,Breathalyzer and Drug Test Results,Amphetamines,Any positive test for Amphetamines? Drug tests...
714,Positive test for MethAmphetamine,Substance Use,Breathalyzer and Drug Test Results,MethAmphetamine,Any positive test for MethAmphetamine? Drug te...
715,Positive test for Oxycontin,Substance Use,Breathalyzer and Drug Test Results,Oxycontin,Any positive test for Oxycontin? Drug tests we...
716,Total drinks in past 7 days,Substance Use,Alcohol Use 7-Day Retrospective,Total_Drinks_7days,Total drinks in past 7 days. Asked on last day...
717,Number days drank alcohol in past 7 days,Substance Use,Alcohol Use 7-Day Retrospective,Num_Days_Drank_7days,Number days drank alcohol in past 7 days. Aske...


We won't use any of the drug test results. From the Alcohol Use 7-Day Retrospective we'll take the single item that summarizes it best. Between total drinks and number of days drank in the last week, we choose the number of days drank, as it is probably easier for participants to remember and thus more reliable.

In [119]:
variables_res.append('Num_Days_Drank_7days')

In [120]:
substance_vars[27:42]

Unnamed: 0,fullDisplayName,category,assessment,columnHeader,description
735,Number of DSM4 Alcohol Dependence Criteria End...,Substance Use,Alcohol Use and Dependence,SSAGA_Alc_D4_Dp_Sx,Number of DSM4 ALC Dependence Criteria Met: If...
736,DSM4 Alcohol Abuse Criteria Met,Substance Use,Alcohol Use and Dependence,SSAGA_Alc_D4_Ab_Dx,Participant meets/met DSM4 criteria for Alcoho...
737,DSM4 Alcohol Abuse number of symptoms,Substance Use,Alcohol Use and Dependence,SSAGA_Alc_D4_Ab_Sx,Number of symptoms participant has/had of DSM4...
738,DSM4 Alcohol Dependence Criteria Met,Substance Use,Alcohol Use and Dependence,SSAGA_Alc_D4_Dp_Dx,Participant meets/met DSM4 criteria for DSM4 A...
739,Drinks per drinking day in past 12 months,Substance Use,Alcohol Use and Dependence,SSAGA_Alc_12_Drinks_Per_Day,Drinks consumed per drinking day in past 12 mo...
740,Frequency of any alcohol use in past 12 months,Substance Use,Alcohol Use and Dependence,SSAGA_Alc_12_Frq,Frequency of any alcohol use in past 12 months...
741,Frequency of drinking 5+ drinks in past 12 months,Substance Use,Alcohol Use and Dependence,SSAGA_Alc_12_Frq_5plus,Frequency of drinking 5+ drinks in past 12 mon...
742,Frequency drunk in past 12 months,Substance Use,Alcohol Use and Dependence,SSAGA_Alc_12_Frq_Drk,Frequency drunk in past 12 months: 1-7 days/we...
743,Max drinks in a single day in past 12 months,Substance Use,Alcohol Use and Dependence,SSAGA_Alc_12_Max_Drinks,Max drinks consumed in a single day in the pas...
744,Age at first alcohol use,Substance Use,Alcohol Use and Dependence,SSAGA_Alc_Age_1st_Use,"Age at first alcohol use: <15 = 1, 15-16 = 2, ..."


Next come the SSAGA items on alcohol abuse and dependence. We'll use the number of dependence symptoms and the number of abuse symptoms, as each summarizes one of the two aspects.

In [121]:
variables_res.extend(['SSAGA_Alc_D4_Dp_Sx', 'SSAGA_Alc_D4_Ab_Sx'])
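
The positional slices used in this section depend on the row order of the data dictionary. As an alternative, rows can be selected by their `assessment` label. The sketch below uses a small made-up excerpt of the dictionary (`toy_dict`) rather than the real file, so it only illustrates the idea.

```python
import pandas as pd

# Toy excerpt of the data dictionary, for illustration only.
toy_dict = pd.DataFrame({
    "category": ["Substance Use"] * 3,
    "assessment": [
        "Alcohol Use 7-Day Retrospective",
        "Alcohol Use and Dependence",
        "Alcohol Use and Dependence",
    ],
    "columnHeader": [
        "Num_Days_Drank_7days",
        "SSAGA_Alc_D4_Dp_Sx",
        "SSAGA_Alc_D4_Ab_Sx",
    ],
})

# Select rows by assessment label rather than by position.
mask = toy_dict["assessment"] == "Alcohol Use and Dependence"
alcohol_dep = toy_dict.loc[mask, "columnHeader"].tolist()
print(alcohol_dep)
```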

In [122]:
substance_vars[42:65]

Unnamed: 0,fullDisplayName,category,assessment,columnHeader,description
750,Total times used/smoked ANY TOBACCO in past 7 ...,Substance Use,Tobacco Use 7-Day Retrospective,Total_Any_Tobacco_7days,Total times used/smoked ANY TOBACCO in past 7 ...
751,Times used/smoked ANY TOBACCO TODAY,Substance Use,Tobacco Use 7-Day Retrospective,Times_Used_Any_Tobacco_Today,Times used/smoked ANY TOBACCO TODAY. Asked on ...
752,Number days smoked/used ANY TOBACCO in past 7 ...,Substance Use,Tobacco Use 7-Day Retrospective,Num_Days_Used_Any_Tobacco_7days,Number days smoked/used ANY TOBACCO in past 7 ...
753,Avg total weekday ANY TOBACCO per day in past ...,Substance Use,Tobacco Use 7-Day Retrospective,Avg_Weekday_Any_Tobacco_7days,Avg total weekday ANY TOBACCO per day in past ...
754,Avg total weekend ANY TOBACCO per day in past ...,Substance Use,Tobacco Use 7-Day Retrospective,Avg_Weekend_Any_Tobacco_7days,Avg total weekend ANY TOBACCO per day in past ...
755,Total # CIGARETTES in past 7 days,Substance Use,Tobacco Use 7-Day Retrospective,Total_Cigarettes_7days,Total # CIGARETTES in past 7 days. Asked on la...
756,Avg weekday CIGARETTES per day in past 7 days,Substance Use,Tobacco Use 7-Day Retrospective,Avg_Weekday_Cigarettes_7days,Avg weekday CIGARETTES per day in past 7 days....
757,Avg weekend CIGARETTES per day in past 7 days,Substance Use,Tobacco Use 7-Day Retrospective,Avg_Weekend_Cigarettes_7days,Avg weekend CIGARETTES per day in past 7 days....
758,Total # CIGARS in past 7 days,Substance Use,Tobacco Use 7-Day Retrospective,Total_Cigars_7days,Total # CIGARS in past 7 days. Asked on last d...
759,Avg weekday CIGARS per day in past 7 days,Substance Use,Tobacco Use 7-Day Retrospective,Avg_Weekday_Cigars_7days,Avg weekday CIGARS per day in past 7 days. Ask...


Analogous to alcohol, we use the number of days on which the subject used tobacco in the last 7 days.

In [123]:
variables_res.append('Num_Days_Used_Any_Tobacco_7days')

In [124]:
substance_vars[65:]

Unnamed: 0,fullDisplayName,category,assessment,columnHeader,description
773,Fagerstrom FTND Score,Substance Use,Tobacco Use and Dependence,SSAGA_FTND_Score,Fagerstrom FTND (test for nicotine dependence)...
774,Fagerstrom HSI Score: HSI measure of tobacco d...,Substance Use,Tobacco Use and Dependence,SSAGA_HSI_Score,Fagerstrom HSI (heavy smoking index) score: HS...
775,"For regular smokers, age first smoked a cigare...",Substance Use,Tobacco Use and Dependence,SSAGA_TB_Age_1st_Cig,"For regular smokers, age first smoked a cigare..."
776,DSM tobacco dependence - difficulty quitting,Substance Use,Tobacco Use and Dependence,SSAGA_TB_DSM_Difficulty_Quitting,Participant meets/met DSM criteria for tobacco...
777,DSM tobacco dependence - tolerance,Substance Use,Tobacco Use and Dependence,SSAGA_TB_DSM_Tolerance,Participant meets/met DSM criteria for tobacco...
778,DSM tobacco dependence - withdrawal,Substance Use,Tobacco Use and Dependence,SSAGA_TB_DSM_Withdrawal,Participant meets/met DSM criteria for tobacco...
779,Cigarettes per day during heaviest period,Substance Use,Tobacco Use and Dependence,SSAGA_TB_Hvy_CPD,Cigarettes per day during heaviest period of u...
780,Most cigarettes smoked in a day,Substance Use,Tobacco Use and Dependence,SSAGA_TB_Max_Cigs,"Most cigarettes smoked in a day (5, 10, 15, 20..."
781,Cigarettes per day when smoking regularly,Substance Use,Tobacco Use and Dependence,SSAGA_TB_Reg_CPD,Cigarettes per day when smoking regularly (1-5...
782,Smoking history,Substance Use,Tobacco Use and Dependence,SSAGA_TB_Smoking_History,"Smoking history: never smoked (0), experimente..."


In [125]:
data_res['SSAGA_FTND_Score'].isna().sum()

876

Some of the remaining measures have quite a lot of missing values:

In [126]:
for col in substance_vars.iloc[65:]['columnHeader']:
    # Count the missing entries for this column in the restricted data
    n_missing = data_res[col].isna().sum()
    print(f"{col}: {n_missing} missing values")

SSAGA_FTND_Score: 876 missing values
SSAGA_HSI_Score: 876 missing values
SSAGA_TB_Age_1st_Cig: 876 missing values
SSAGA_TB_DSM_Difficulty_Quitting: 876 missing values
SSAGA_TB_DSM_Tolerance: 876 missing values
SSAGA_TB_DSM_Withdrawal: 876 missing values
SSAGA_TB_Hvy_CPD: 876 missing values
SSAGA_TB_Max_Cigs: 876 missing values
SSAGA_TB_Reg_CPD: 876 missing values
SSAGA_TB_Smoking_History: 2 missing values
SSAGA_TB_Still_Smoking: 2 missing values
SSAGA_TB_Yrs_Since_Quit: 877 missing values
SSAGA_TB_Yrs_Smoked: 876 missing values
SSAGA_Times_Used_Illicits: 2 missing values
SSAGA_Times_Used_Cocaine: 2 missing values
SSAGA_Times_Used_Hallucinogens: 2 missing values
SSAGA_Times_Used_Opiates: 2 missing values
SSAGA_Times_Used_Sedatives: 2 missing values
SSAGA_Times_Used_Stimulants: 2 missing values
SSAGA_Mj_Use: 2 missing values
SSAGA_Mj_Ab_Dep: 2 missing values
SSAGA_Mj_Age_1st_Use: 552 missing values
SSAGA_Mj_Times_Used: 2 missing values


Among the SSAGA tobacco variables with few missing values, the only numerical item is the smoking history, so we include it. In addition, we add each individual illicit drug item and the number of times marijuana was used, but not the illicits summary measure, so that no answer is counted twice.

In [127]:
variables_res.extend(['SSAGA_TB_Smoking_History', 'SSAGA_Times_Used_Cocaine', 'SSAGA_Times_Used_Hallucinogens', 'SSAGA_Times_Used_Opiates',
                      'SSAGA_Times_Used_Sedatives', 'SSAGA_Times_Used_Stimulants', 'SSAGA_Mj_Times_Used'])
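
The missing-value screening above can also be expressed as a filter on the fraction of missing values per column. This is a sketch under stated assumptions: the 10% cutoff, the helper name `columns_below_missing`, and the toy data are illustrative and are not choices made in this notebook.

```python
import numpy as np
import pandas as pd


def columns_below_missing(df, cols, max_missing=0.10):
    """Return the columns from cols whose fraction of NaN values is <= max_missing.

    Hypothetical helper; the default threshold is an arbitrary example value.
    """
    frac_missing = df[cols].isna().mean()
    return frac_missing[frac_missing <= max_missing].index.tolist()


# Toy data, for illustration only.
toy = pd.DataFrame({
    "SSAGA_FTND_Score": [np.nan, np.nan, np.nan, 2.0],
    "SSAGA_TB_Smoking_History": [0.0, 1.0, 3.0, 0.0],
})
kept = columns_below_missing(toy, list(toy.columns))
print(kept)
```

A filter like this only addresses missingness; whether an item is numerical still has to be judged from the data dictionary.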

## Final items

The final lists of items are:

In [128]:
variables_unr

['MMSE_Score',
 'PSQI_Score',
 'PicSeq_Unadj',
 'CardSort_Unadj',
 'Flanker_Unadj',
 'PMAT24_A_CR',
 'PMAT24_A_RTCR',
 'ReadEng_Unadj',
 'PicVocab_Unadj',
 'ProcSpeed_Unadj',
 'DDisc_AUC_200',
 'DDisc_AUC_40K',
 'VSPLOT_CRTE',
 'VSPLOT_OFF',
 'SCPT_SEN',
 'SCPT_SPEC',
 'SCPT_TPRT',
 'IWRD_TOT',
 'IWRD_RTC',
 'ListSort_Unadj',
 'ER40_CR',
 'ER40_CRT',
 'AngAffect_Unadj',
 'AngHostil_Unadj',
 'AngAggr_Unadj',
 'FearAffect_Unadj',
 'FearSomat_Unadj',
 'Sadness_Unadj',
 'LifeSatisf_Unadj',
 'MeanPurp_Unadj',
 'PosAffect_Unadj',
 'Friendship_Unadj',
 'Loneliness_Unadj',
 'PercHostil_Unadj',
 'PercReject_Unadj',
 'EmotSupp_Unadj',
 'InstruSupp_Unadj',
 'PercStress_Unadj',
 'SelfEff_Unadj',
 'Emotion_Task_Face_Acc',
 'Emotion_Task_Face_Median_RT',
 'Emotion_Task_Shape_Acc',
 'Emotion_Task_Shape_Median_RT',
 'Language_Task_Story_Acc',
 'Language_Task_Story_Median_RT',
 'Language_Task_Math_Acc',
 'Language_Task_Math_Median_RT',
 'Relational_Task_Match_Acc',
 'Relational_Task_Match_Median_RT',
 

In [129]:
variables_res

['DSM_Depr_Raw',
 'DSM_Anxi_Raw',
 'DSM_Somp_Raw',
 'DSM_Avoid_Raw',
 'DSM_Adh_Raw',
 'DSM_Antis_Raw',
 'Num_Days_Drank_7days',
 'SSAGA_Alc_D4_Dp_Sx',
 'SSAGA_Alc_D4_Ab_Sx',
 'Num_Days_Used_Any_Tobacco_7days',
 'SSAGA_TB_Smoking_History',
 'SSAGA_Times_Used_Cocaine',
 'SSAGA_Times_Used_Hallucinogens',
 'SSAGA_Times_Used_Opiates',
 'SSAGA_Times_Used_Sedatives',
 'SSAGA_Times_Used_Stimulants',
 'SSAGA_Mj_Times_Used']

In [130]:
vars_unr_pd = pd.Series(variables_unr)
vars_res_pd = pd.Series(variables_res)

In [131]:
vars_unr_pd.to_csv("../data/02_intermediate/col_unr.csv", header=False, index=False)
vars_res_pd.to_csv("../data/02_intermediate/col_res.csv", header=False, index=False)
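
Because the lists are written without header or index, they can be read back as a single unnamed column. The following sketch shows how a later step might load such a list and use it to subset a data frame. The in-memory buffer stands in for the CSV file, and the toy data frame is made up for illustration.

```python
import io

import pandas as pd

# Write a short list the same way as above, but to an in-memory buffer.
buffer = io.StringIO()
pd.Series(["MMSE_Score", "PSQI_Score"]).to_csv(buffer, header=False, index=False)
buffer.seek(0)

# header=None stops pandas from treating the first name as a column header.
cols = pd.read_csv(buffer, header=None)[0].tolist()

# Toy stand-in for the behavioral data.
toy = pd.DataFrame({
    "Subject": [1, 2],
    "MMSE_Score": [29, 30],
    "PSQI_Score": [4, 6],
    "Gender": ["F", "M"],
})
subset = toy[["Subject"] + cols]
print(subset.columns.tolist())
```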

## Columns to invert

Some variables need to be inverted so that, for every task, higher values always mean better performance. This applies to all reaction time measures and to all measures that quantify some kind of error.

In [132]:
vars_to_invert = ['PMAT24_A_RTCR', 'VSPLOT_CRTE', 'VSPLOT_OFF', 'SCPT_TPRT', 'IWRD_RTC', 'ER40_CRT', 'Emotion_Task_Face_Median_RT', 'Emotion_Task_Shape_Median_RT',
                  'Language_Task_Story_Median_RT', 'Language_Task_Math_Median_RT', 'Relational_Task_Match_Median_RT', 'Relational_Task_Rel_Median_RT',
                  'Social_Task_TOM_Median_RT_TOM', 'WM_Task_Median_RT',]

In [134]:
vars_inv_pd = pd.Series(vars_to_invert)
vars_inv_pd.to_csv("../data/02_intermediate/rt_cols.csv", header=False, index=False)
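
This notebook only stores the names of the columns to invert; the inversion itself is not shown here. One possible way to apply it is a simple sign flip, sketched below. The helper name `invert_columns` and the toy data are hypothetical, and a sign flip is an assumption rather than the documented method of this project.

```python
import pandas as pd


def invert_columns(df, cols):
    """Return a copy of df with the listed columns multiplied by -1.

    Columns that are not present in df are skipped. Hypothetical helper.
    """
    out = df.copy()
    for col in cols:
        if col in out.columns:
            out[col] = -out[col]
    return out


# Toy data, for illustration only.
toy = pd.DataFrame({
    "WM_Task_Acc": [90.0, 75.0],
    "WM_Task_Median_RT": [800.0, 950.0],
})
inverted = invert_columns(toy, ["WM_Task_Median_RT", "ER40_CRT"])
print(inverted)
```

After the sign flip, the faster subject (800 ms) has the higher value, which matches the higher = better convention used for the accuracy measures.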