<img src="https://raw.githubusercontent.com/probcomp/notebook/e66a399561d069a5dfe19c7efd8dd62c09a80787/tutorials/resources/header.png"/>
<img src="https://www.yammer.com/api/v1/uploaded_files/61754420/preview/CMI_rgb_horizontal_logo.png?_t=1525896635778" style='height:75px'/>

# Exploratory data analysis with BayesDB for child psychology


Authored by: Ulrich Schaechtle, Veronica Sara Weiner of the MIT Probabilistic Computing Project (Probcomp) with Arno Klein and Jon Clucas of the Child Mind Institute MATTER Lab.

In this notebook, we will use BayesDB for exploratory data analysis of the questionnaire responses data set provided by the [Child Mind Institute](https://childmind.org/). The data set contains individual question-items of a [set of
questionnaires](http://fcon_1000.projects.nitrc.org/indi/cmi_healthy_brain_network/assessments/master-list.html) related to childhood mental health and a set of structural MRI features. 

This notebook will cover the following topics:

1. List of questionnaires and links to individual questions
2. Analysis with BayesDB
3. Exploring probable dependencies between variables and comparing CrossCat dependence probability to linear correlation
4. Taking a closer look at dependencies and the models built by CrossCat
5. Aggregating across an ensemble of CrossCat models to assess the similarity of subjects

## 1. List of questionnaires and links to individual questions

The following are the questionnaires from which the Child Mind Institute has collected data and which we analyze in this notebook. Each link leads to a set of questions. We analyze the data in the original questionnaire format (as categorical responses to individual questions). Compound/summary scores are not included, and the diagnoses have been transformed into individual binary variables.
   
- [ACE: Adverse Childhood Experience (ACE)](resources/html/ACE.html)
- [APQ_P: Alabama Parenting Questionnaire (APQ) Parent-Report](resources/html/APQ_P.html)
- [APQ_S: Alabama Parenting Questionnaire (APQ) Self-Report](resources/html/APQ_S.html)
- [ARI_P: Affective Reactivity Index (ARI) Parent-Report](resources/html/ARI_P.html)
- [ARI_S: Affective Reactivity Index (ARI) Self-Report](resources/html/ARI_S.html)
- [ASR: Adult Self Report](resources/html/ASR.html)
- [ASSQ: Autism Spectrum Screening Questionnaire (ASSQ)](resources/html/ASSQ.html)
- [C3SR (Conners-SR): Conners 3 - Self-Report (C3SR)](resources/html/C3SR.html)
- [CAARS: Conners' Adult ADHD Rating Scales (CAARS)](resources/html/CAARS.html)
- [CBCL: Child Behavior Checklist (CBCL) -- Age 6-18](resources/html/CBCL.html)
- [CBCL_Pre: Child Behavior Checklist (CBCL) Pre-School](resources/html/CBCL_Pre.html)
- [CCSC: Children's Coping Strategies Checklist-Revised](resources/html/CCSC.html)
- [CDI2SR: Children's Depression Index (CDI) Self Report](resources/html/CDI2SR.html)
- [CDI2_P: Children's Depression Index (CDI) Parent Report](resources/html/CDI2_P.html)
- [CIS_P: Columbia Impairment Scale (CIS) Parent Report](resources/html/CIS_P.html)
- [CIS_SR: Columbia Impairment Scale - Self-Report (CIS-SR)](resources/html/CIS_SR.html)
- [CPIC: Children's Perception of Interparental Conflict Scale (CPIC)](resources/html/CPIC.html)
- [DTS: Distress Tolerance Scale (DTS)](resources/html/DTS.html)
- ~~[IAT: Internet Addiction Test (IAT)](resources/html/IAT.html)~~
- [ICU_P: Inventory of Callous-Unemotional Traits (ICU) Parent Report](resources/html/ICU_P.html)
- [ICU_SR: Inventory of Callous-Unemotional Traits (ICU) Self Report](resources/html/ICU_SR.html)
- [MFQ_P: Mood and Feelings Questionnaire (MFQ) Parent Report](resources/html/MFQ_P.html)
- [MFQ_SR: Mood and Feelings Questionnaire (MFQ) Self Report](resources/html/MFQ_SR.html)
- ~~[PCIAT: Parent-Child Internet Addiction Test](resources/html/PCIAT.html)~~
- [PSI: Parenting Stress Index (PSI)](resources/html/PSI.html)
- [SAS: Social Aptitude Scale (SAS)](resources/html/SAS.html)
- [SCARED_P: Screen for Child Anxiety Related Disorders (SCARED) Parent Report](resources/html/SCARED_P.html)
- [SCARED_SR: Screen for Anxiety Related Disorders (SCARED) Self Report](resources/html/SCARED_SR.html)
- [SCQ: Social Communication Questionnaire](resources/html/SCQ.html)
- [SDQ: Strength and Difficulties Questionnaire](resources/html/SDQ.html)
- [SDS: Sleep Disturbance Scale (SDS)](resources/html/SDS.html)
- [SRS_Preschool: Social Responsiveness Scale (SRS) Preschool](resources/html/SRS_Preschool.html)
- [SRS_School: Social Responsiveness Scale (SRS) School Age](resources/html/SRS_School.html)
- [STAI: State-Trait Anxiety Inventory for Adults](resources/html/STAI.html)
- [SWAN : The SWAN Rating Scale for ADHD](resources/html/SWAN .html)
- [Symptom_Checklist: Symptom Checklist - Parent](resources/html/Symptom_Checklist.html)
- [TRF_Preschool_Age: Teacher Report Form (TRF) Preschool Age](resources/html/TRF_Preschool_Age.html)
- [TRF_School_Age: Teacher Report Form (TRF) School Age](resources/html/TRF_School_Age.html)
- [WHODAS_P: WHO Disability Assessment Schedule (WHODAS) Parent-Report](resources/html/WHODAS_P.html)
- [WHODAS_SR: WHO Disability Assessment Schedule (WHODAS) Self-Report](resources/html/WHODAS_SR.html)
- [YFAS: Yale Food Addiction Scale (YFAS)](resources/html/YFAS.html)
- [YSR: Youth Self Report (YSR)](resources/html/YSR.html)
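
As a rough illustration of what "transformed into individual binary variables" can look like for the diagnoses, here is a minimal Python sketch. The category names `Anxiety`, `ADHD`, `Other`, and `Autism` are the last four columns of the table shown in section 2d; the subject IDs, the input format, and the helper `to_binary_indicators` are hypothetical, and the actual preprocessing used for this data set may differ.

```python
# Category names taken from the last four columns of the table in section 2d.
CATEGORIES = ["Anxiety", "ADHD", "Other", "Autism"]

# Hypothetical input: each subject has a (possibly empty) list of diagnosis labels.
diagnoses = {
    "S1": ["ADHD", "Other"],
    "S2": ["Autism"],
    "S3": [],
}


def to_binary_indicators(labels, categories):
    """Return one True/False indicator per category for a single subject."""
    present = set(labels)
    return {category: category in present for category in categories}


# One row of binary variables per subject.
binary_rows = {
    subject: to_binary_indicators(labels, CATEGORIES)
    for subject, labels in diagnoses.items()
}

for subject in sorted(binary_rows):
    print(subject, binary_rows[subject])
```

Each diagnosis becomes its own column, so a subject with several diagnoses simply has several indicators set to `True`.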



## 2. Analysis with BayesDB

### 2a. Setting up the Jupyter environment

The first step is to load the `jupyter_probcomp.magics` library, which provides BayesDB hooks for data exploration, plotting, querying, and analysis through this Jupyter notebook environment. The second cell enables inline display of matplotlib and JavaScript plots.

In [1]:
%load_ext jupyter_probcomp.magics

session_id: jovyan@jclucas-notebook_2018-05-09T15:55:58.648197_4


In [2]:
%matplotlib inline
%vizgpm inline

<IPython.core.display.Javascript object>

### 2b. Creating a BayesDB `.bdb` file on disk

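We next use the `%bayesdb` magic to create a `.bdb` file on disk named `childmind_smri_questions.bdb` under `resources/bdb/`. This file will store all the data and models created in this session. The same cell first deletes any copy of the file left over from an earlier run, and it registers the Loom backend with its files kept under `loom_files/`.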

In [3]:
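# Start from a clean slate: delete any .bdb file left over from an earlier run.
!rm -f resources/bdb/childmind_smri_questions.bdb
%bayesdb resources/bdb/childmind_smri_questions.bdb
bdb = %get_bdb
# Set LOOM_VERBOSITY to '0' before the Loom backend is imported.
import os
os.environ['LOOM_VERBOSITY'] = '0'
import bayeslite
from bayeslite.backends.loom_backend import LoomBackend
from bayeslite import bayesdb_register_backend
# Register the Loom backend with this database; its files live under loom_files/.
bayesdb_register_backend(bdb, LoomBackend(os.path.abspath('loom_files/')))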

### 2c. Ingesting data from a `.csv` file into a BayesDB table

The questionnaire dataset is stored in the CSV file `resources/data/smri_questions.csv`. Each column of the CSV file is a variable, and each row is a record. We use the `CREATE TABLE` BQL query, with the pathname of the CSV file, to convert the CSV data into a database table named `raw_questionnaire_responses`.

In [4]:
%bql CREATE TABLE "raw_questionnaire_responses" FROM 'resources/data/smri_questions.csv'
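
To make the "one column per variable, one row per record" picture concrete, here is a minimal sketch of the same idea using only Python's standard library (`csv` and `sqlite3`) instead of BayesDB. The sample data, the helper name `create_table_from_csv`, and the choice to store every field as `TEXT` are assumptions for illustration; this is not how BQL's `CREATE TABLE ... FROM` is implemented.

```python
import csv
import io
import sqlite3

# Hypothetical miniature of the questionnaire CSV: one row per record,
# one column per variable; empty fields mark missing answers.
SAMPLE_CSV = """EID,APQ_P_01,APQ_P_02,Age
S1,4.0,,11.5
S2,,5.0,12.1
S3,3.0,2.0,
"""


def create_table_from_csv(conn, table, csv_file):
    """Create `table` with one TEXT column per header field and load all rows."""
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join('"{}" TEXT'.format(name) for name in header)
    conn.execute('CREATE TABLE "{}" ({})'.format(table, columns))
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(
        'INSERT INTO "{}" VALUES ({})'.format(table, placeholders), reader
    )
    return header


conn = sqlite3.connect(":memory:")
header = create_table_from_csv(
    conn, "raw_questionnaire_responses", io.StringIO(SAMPLE_CSV)
)
n_rows = conn.execute(
    'SELECT COUNT(*) FROM "raw_questionnaire_responses"'
).fetchone()[0]
print(header)
print(n_rows)
```

In this sketch the missing answers are loaded as empty strings, which is why a separate step is needed to turn them into SQL `NULL`.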

Almost all datasets have missing values, usually marked by special tokens such as `NaN` or `NA` that indicate a particular cell is missing. In the questionnaire data, empty strings are used. To tell BayesDB to treat empty strings as SQL `NULL`, we use the `.nullify` command, followed by the name of the table and the string `''`, which represents missing data. Over 250,000 cells are converted to `NULL`, illustrating that the data is quite sparse.

In [5]:
%bql .nullify raw_questionnaire_responses ''

Nullified 272350 cells
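
The effect of nullifying can be sketched in plain SQL through Python's `sqlite3` module: for every column, cells equal to the missing-data token are set to `NULL`, and the changed cells are counted. This is an illustrative approximation under stated assumptions, not the `.nullify` implementation; the table `responses`, its contents, and the helper `nullify` are hypothetical.

```python
import sqlite3


def nullify(conn, table, token):
    """Set every cell equal to `token` to NULL; return how many cells changed."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = [
        row[1] for row in conn.execute('PRAGMA table_info("{}")'.format(table))
    ]
    changed = 0
    for column in columns:
        cursor = conn.execute(
            'UPDATE "{t}" SET "{c}" = NULL WHERE "{c}" = ?'.format(
                t=table, c=column
            ),
            (token,),
        )
        changed += cursor.rowcount
    return changed


conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE responses ("EID" TEXT, "APQ_P_01" TEXT, "APQ_P_02" TEXT)'
)
conn.executemany(
    "INSERT INTO responses VALUES (?, ?, ?)",
    [("S1", "4.0", ""), ("S2", "", "5.0"), ("S3", "", "")],
)

n_nullified = nullify(conn, "responses", "")
print("Nullified {} cells".format(n_nullified))
```

After this step, standard SQL predicates such as `IS NULL` can be used to find the missing cells.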


### 2d. Running basic queries on the table using BQL and SQL

Now that the questionnaire dataset has been loaded into a table, and missing values converted to `NULL`, we can run standard SQL queries to explore the contents of the data. For example, we can select the first 5 records. Observe that each row in the table is a particular subject, and each column is a variable, such as a questionnaire item, a structural MRI feature, or a diagnosis indicator. Scroll through the names in the header of the table to get a sense of the variables in the dataset.

In [6]:
%bql SELECT * FROM "raw_questionnaire_responses" LIMIT 5;

Unnamed: 0,APQ_P_01,APQ_P_02,APQ_P_03,APQ_P_04,APQ_P_05,APQ_P_06,APQ_P_07,APQ_P_08,APQ_P_09,APQ_P_10,APQ_P_11,APQ_P_12,APQ_P_13,APQ_P_14,APQ_P_15,APQ_P_16,APQ_P_17,APQ_P_18,APQ_P_19,APQ_P_20,APQ_P_21,APQ_P_22,APQ_P_23,APQ_P_24,APQ_P_25,APQ_P_26,APQ_P_27,APQ_P_28,APQ_P_29,APQ_P_30,APQ_P_31,APQ_P_32,APQ_P_33,APQ_P_34,APQ_P_35,APQ_P_36,APQ_P_37,APQ_P_38,APQ_P_39,APQ_P_40,APQ_P_41,APQ_P_42,APQ_SR_01,APQ_SR_02,APQ_SR_03,APQ_SR_04,APQ_SR_05,APQ_SR_06,APQ_SR_07,APQ_SR_08,APQ_SR_09,APQ_SR_10,APQ_SR_11,APQ_SR_12,APQ_SR_13,APQ_SR_14,APQ_SR_15,APQ_SR_16,APQ_SR_17,APQ_SR_18,APQ_SR_19,APQ_SR_20,APQ_SR_21,APQ_SR_22,APQ_SR_23,APQ_SR_24,APQ_SR_25,APQ_SR_26,APQ_SR_27,APQ_SR_28,APQ_SR_29,APQ_SR_30,APQ_SR_31,APQ_SR_32,APQ_SR_33,APQ_SR_34,APQ_SR_35,APQ_SR_36,APQ_SR_37,APQ_SR_38,APQ_SR_39,APQ_SR_40,APQ_SR_41,APQ_SR_42,ARI_P_01,ARI_P_02,ARI_P_03,ARI_P_04,ARI_P_05,ARI_P_06,ARI_P_07,ARI_S_01,ARI_S_02,ARI_S_03,ARI_S_04,ARI_S_05,ARI_S_06,ARI_S_07,ASSQ_01,ASSQ_02,ASSQ_03,ASSQ_04,ASSQ_05,ASSQ_06,ASSQ_07,ASSQ_08,ASSQ_09,ASSQ_10,ASSQ_11,ASSQ_12,ASSQ_13,ASSQ_14,ASSQ_15,ASSQ_16,ASSQ_17,ASSQ_18,ASSQ_19,ASSQ_20,ASSQ_21,ASSQ_22,ASSQ_23,ASSQ_24,ASSQ_25,ASSQ_26,ASSQ_27,Age,C3SR_01,C3SR_02,C3SR_03,C3SR_04,C3SR_05,C3SR_06,C3SR_07,C3SR_08,C3SR_09,C3SR_10,C3SR_11,C3SR_12,C3SR_13,C3SR_14,C3SR_15,C3SR_16,C3SR_17,C3SR_18,C3SR_19,C3SR_20,C3SR_21,C3SR_22,C3SR_23,C3SR_24,C3SR_25,C3SR_26,C3SR_27,C3SR_28,C3SR_29,C3SR_30,C3SR_31,C3SR_32,C3SR_33,C3SR_34,C3SR_35,C3SR_36,C3SR_37,C3SR_38,C3SR_39,CCSC_01,CCSC_02,CCSC_03,CCSC_04,CCSC_05,CCSC_06,CCSC_07,CCSC_08,CCSC_09,CCSC_10,CCSC_11,CCSC_12,CCSC_13,CCSC_14,CCSC_15,CCSC_16,CCSC_17,CCSC_18,CCSC_19,CCSC_20,CCSC_21,CCSC_22,CCSC_23,CCSC_24,CCSC_25,CCSC_26,CCSC_27,CCSC_28,CCSC_29,CCSC_30,CCSC_31,CCSC_32,CCSC_33,CCSC_34,CCSC_35,CCSC_36,CCSC_37,CCSC_38,CCSC_39,CCSC_40,CCSC_41,CCSC_42,CCSC_43,CCSC_44,CCSC_45,CCSC_46,CCSC_47,CCSC_48,CCSC_49,CCSC_50,CCSC_51,CCSC_52,CCSC_53,CCSC_54,CCSC_55,CCSC_56,CPIC_01,CPIC_02,CPIC_03,CPIC_04,CPIC_05,CPIC_06,CPIC_07,CPIC_08,CPIC_09,CPIC_10,CPIC_
11,CPIC_12,CPIC_13,CPIC_14,CPIC_15,CPIC_16,CPIC_17,CPIC_18,CPIC_19,CPIC_20,CPIC_21,CPIC_22,CPIC_23,CPIC_24,CPIC_25,CPIC_26,CPIC_27,CPIC_28,CPIC_29,CPIC_30,CPIC_31,CPIC_32,CPIC_33,CPIC_34,CPIC_35,CPIC_36,CPIC_37,CPIC_38,CPIC_39,CPIC_40,CPIC_41,CPIC_42,CPIC_43,CPIC_44,CPIC_45,CPIC_46,CPIC_47,CPIC_48,CPIC_49,CPIC_50,CPIC_51,DTS_01,DTS_02,DTS_03,DTS_04,DTS_05,DTS_06,DTS_07,DTS_08,DTS_09,DTS_10,DTS_11,DTS_12,DTS_13,DTS_14,DTS_15,EHQ_01,EHQ_02,EHQ_03,EHQ_04,EHQ_05,EHQ_06,EHQ_07,EHQ_08,EHQ_09,EHQ_10,EHQ_11,EHQ_12,EHQ_13,EHQ_14,EHQ_15,EID,FSQ_01,FSQ_02,FSQ_03,FSQ_04,FSQ_06,FSQ_07,FSQ_08,MDD_4,MDD_5,MDD_6,MDD_7,MDD_9,MFQ_P_01,MFQ_P_02,MFQ_P_03,MFQ_P_04,MFQ_P_05,MFQ_P_06,MFQ_P_07,MFQ_P_08,MFQ_P_09,MFQ_P_10,MFQ_P_11,MFQ_P_12,MFQ_P_13,MFQ_P_14,MFQ_P_15,MFQ_P_16,MFQ_P_17,MFQ_P_18,MFQ_P_19,MFQ_P_20,MFQ_P_21,MFQ_P_22,MFQ_P_23,MFQ_P_24,MFQ_P_25,MFQ_P_26,MFQ_P_27,MFQ_P_28,MFQ_P_29,MFQ_P_30,MFQ_P_31,MFQ_P_32,MFQ_P_33,MFQ_P_34,MFQ_SR_01,MFQ_SR_02,MFQ_SR_03,MFQ_SR_04,MFQ_SR_05,MFQ_SR_06,MFQ_SR_07,MFQ_SR_08,MFQ_SR_09,MFQ_SR_10,MFQ_SR_11,MFQ_SR_12,MFQ_SR_13,MFQ_SR_14,MFQ_SR_15,MFQ_SR_16,MFQ_SR_17,MFQ_SR_18,MFQ_SR_19,MFQ_SR_20,MFQ_SR_21,MFQ_SR_22,MFQ_SR_23,MFQ_SR_24,MFQ_SR_25,MFQ_SR_26,MFQ_SR_27,MFQ_SR_28,MFQ_SR_29,MFQ_SR_30,MFQ_SR_31,MFQ_SR_32,MFQ_SR_33,PAQ_A_02,PAQ_A_03,PAQ_A_04,PAQ_A_05,PAQ_A_06,PAQ_A_07,PAQ_A_09,PBQ_01,PBQ_02,PBQ_03,PBQ_03B_1,PBQ_04,PBQ_05,PBQ_06,PBQ_07,PBQ_08,PBQ_09,PBQ_10,PBQ_11,PBQ_12,PBQ_13,PBQ_14,PBQ_15,PBQ_16,PBQ_17,PBQ_18,PBQ_19,PBQ_21,PBQ_22,PBQ_23,PBQ_24,PBQ_25,PBQ_26,PBQ_27,PPS_F_01,PPS_F_02,PPS_F_03,PPS_F_04,PPS_F_05,PPS_F_06,PPS_M_01,PPS_M_02,PPS_M_03,PPS_M_04,PPS_M_05,PPS_M_06,PSI_01,PSI_02,PSI_03,PSI_04,PSI_05,PSI_06,PSI_07,PSI_08,PSI_09,PSI_10,PSI_11,PSI_12,PSI_13,PSI_14,PSI_15,PSI_16,PSI_17,PSI_18,PSI_19,PSI_20,PSI_21,PSI_22,PSI_23,PSI_24,PSI_25,PSI_26,PSI_27,PSI_28,PSI_29,PSI_30,PSI_31,PSI_32,PSI_33,PSI_34,PSI_35,PSI_36,SCARED_P_01,SCARED_P_02,SCARED_P_03,SCARED_P_04,SCARED_P_05,SCARED_P_06,SCARED_P_07,SCARED_P_08,SCARED_P_09,SCARED_P_10,SCARED_P_11,S
CARED_P_12,SCARED_P_13,SCARED_P_14,SCARED_P_15,SCARED_P_16,SCARED_P_17,SCARED_P_18,SCARED_P_19,SCARED_P_20,SCARED_P_21,SCARED_P_22,SCARED_P_23,SCARED_P_24,SCARED_P_25,SCARED_P_26,SCARED_P_27,SCARED_P_28,SCARED_P_29,SCARED_P_30,SCARED_P_31,SCARED_P_32,SCARED_P_33,SCARED_P_34,SCARED_P_35,SCARED_P_36,SCARED_P_37,SCARED_P_38,SCARED_P_39,SCARED_P_40,SCARED_P_41,SCARED_SR_01,SCARED_SR_02,SCARED_SR_03,SCARED_SR_04,SCARED_SR_05,SCARED_SR_06,SCARED_SR_07,SCARED_SR_08,SCARED_SR_09,SCARED_SR_10,SCARED_SR_11,SCARED_SR_12,SCARED_SR_13,SCARED_SR_14,SCARED_SR_15,SCARED_SR_16,SCARED_SR_17,SCARED_SR_18,SCARED_SR_19,SCARED_SR_20,SCARED_SR_21,SCARED_SR_22,SCARED_SR_23,SCARED_SR_24,SCARED_SR_25,SCARED_SR_26,SCARED_SR_27,SCARED_SR_28,SCARED_SR_29,SCARED_SR_30,SCARED_SR_31,SCARED_SR_32,SCARED_SR_33,SCARED_SR_34,SCARED_SR_35,SCARED_SR_36,SCARED_SR_37,SCARED_SR_38,SCARED_SR_39,SCARED_SR_40,SCARED_SR_41,SCQ_01,SCQ_02,SCQ_03,SCQ_04,SCQ_05,SCQ_06,SCQ_07,SCQ_08,SCQ_09,SCQ_10,SCQ_11,SCQ_12,SCQ_13,SCQ_14,SCQ_15,SCQ_16,SCQ_17,SCQ_18,SCQ_19,SCQ_20,SCQ_21,SCQ_22,SCQ_23,SCQ_24,SCQ_25,SCQ_26,SCQ_27,SCQ_28,SCQ_29,SCQ_30,SCQ_31,SCQ_32,SCQ_33,SCQ_34,SCQ_35,SCQ_36,SCQ_37,SCQ_38,SCQ_39,SCQ_40,SDQ_01,SDQ_02,SDQ_03,SDQ_04,SDQ_05,SDQ_06,SDQ_07,SDQ_08,SDQ_09,SDQ_10,SDQ_11,SDQ_12,SDQ_13,SDQ_14,SDQ_15,SDQ_16,SDQ_17,SDQ_18,SDQ_19,SDQ_20,SDQ_21,SDQ_22,SDQ_23,SDQ_24,SDQ_25,SDQ_26,SDQ_27,SDQ_28,SDQ_30,SWAN_01,SWAN_02,SWAN_03,SWAN_04,SWAN_05,SWAN_06,SWAN_07,SWAN_08,SWAN_09,SWAN_10,SWAN_11,SWAN_12,SWAN_13,SWAN_14,SWAN_15,SWAN_16,SWAN_17,SWAN_18,Sex,SocAnx_01,SocAnx_02,SocAnx_03,SocAnx_05,csf_volume,left_cortical_grey_matter_volume,left_cortical_white_matter_volume,right_cortical_grey_matter_volume,right_cortical_white_matter_volume,whole_brain_volume,median_left_parsopercularis_freesurfer-thickness,median_left_entorhinal_freesurfer-thickness,median_left_lateralorbitofrontal_freesurfer-thickness,median_left_lingual_freesurfer-thickness,median_left_rostralanteriorcingulate_freesurfer-thickness,median_left_transversetem
poral_freesurfer-thickness,median_left_parahippocampal_freesurfer-thickness,median_left_paracentral_freesurfer-thickness,median_left_inferiorparietal_freesurfer-thickness,median_left_postcentral_freesurfer-thickness,median_left_posteriorcingulate_freesurfer-thickness,median_left_parsorbitalis_freesurfer-thickness,median_left_cuneus_freesurfer-thickness,median_left_pericalcarine_freesurfer-thickness,median_left_lateraloccipital_freesurfer-thickness,median_left_precuneus_freesurfer-thickness,median_left_medialorbitofrontal_freesurfer-thickness,median_left_parstriangularis_freesurfer-thickness,median_left_middletemporal_freesurfer-thickness,median_left_superiortemporal_freesurfer-thickness,median_left_fusiform_freesurfer-thickness,median_left_precentral_freesurfer-thickness,median_left_supramarginal_freesurfer-thickness,median_left_rostralmiddlefrontal_freesurfer-thickness,median_left_caudalanteriorcingulate_freesurfer-thickness,median_left_inferiortemporal_freesurfer-thickness,median_left_caudalmiddlefrontal_freesurfer-thickness,median_left_isthmuscingulate_freesurfer-thickness,median_left_superiorfrontal_freesurfer-thickness,median_left_superiorparietal_freesurfer-thickness,median_left_insula_freesurfer-thickness,median_right_parsopercularis_freesurfer-thickness,median_right_entorhinal_freesurfer-thickness,median_right_lateralorbitofrontal_freesurfer-thickness,median_right_lingual_freesurfer-thickness,median_right_rostralanteriorcingulate_freesurfer-thickness,median_right_transversetemporal_freesurfer-thickness,median_right_parahippocampal_freesurfer-thickness,median_right_paracentral_freesurfer-thickness,median_right_inferiorparietal_freesurfer-thickness,median_right_postcentral_freesurfer-thickness,median_right_posteriorcingulate_freesurfer-thickness,median_right_parsorbitalis_freesurfer-thickness,median_right_cuneus_freesurfer-thickness,median_right_pericalcarine_freesurfer-thickness,median_right_lateraloccipital_freesurfer-thickness,median_right_precuneus_freesur
fer-thickness,median_right_medialorbitofrontal_freesurfer-thickness,median_right_parstriangularis_freesurfer-thickness,median_right_middletemporal_freesurfer-thickness,median_right_superiortemporal_freesurfer-thickness,median_right_fusiform_freesurfer-thickness,median_right_precentral_freesurfer-thickness,median_right_supramarginal_freesurfer-thickness,median_right_rostralmiddlefrontal_freesurfer-thickness,median_right_caudalanteriorcingulate_freesurfer-thickness,median_right_inferiortemporal_freesurfer-thickness,median_right_caudalmiddlefrontal_freesurfer-thickness,median_right_isthmuscingulate_freesurfer-thickness,median_right_superiorfrontal_freesurfer-thickness,median_right_superiorparietal_freesurfer-thickness,median_right_insula_freesurfer-thickness,MAD_left_parsopercularis_freesurfer-thickness,MAD_left_entorhinal_freesurfer-thickness,MAD_left_lateralorbitofrontal_freesurfer-thickness,MAD_left_lingual_freesurfer-thickness,MAD_left_rostralanteriorcingulate_freesurfer-thickness,MAD_left_transversetemporal_freesurfer-thickness,MAD_left_parahippocampal_freesurfer-thickness,MAD_left_paracentral_freesurfer-thickness,MAD_left_inferiorparietal_freesurfer-thickness,MAD_left_postcentral_freesurfer-thickness,MAD_left_posteriorcingulate_freesurfer-thickness,MAD_left_parsorbitalis_freesurfer-thickness,MAD_left_cuneus_freesurfer-thickness,MAD_left_pericalcarine_freesurfer-thickness,MAD_left_lateraloccipital_freesurfer-thickness,MAD_left_precuneus_freesurfer-thickness,MAD_left_medialorbitofrontal_freesurfer-thickness,MAD_left_parstriangularis_freesurfer-thickness,MAD_left_middletemporal_freesurfer-thickness,MAD_left_superiortemporal_freesurfer-thickness,MAD_left_fusiform_freesurfer-thickness,MAD_left_precentral_freesurfer-thickness,MAD_left_supramarginal_freesurfer-thickness,MAD_left_rostralmiddlefrontal_freesurfer-thickness,MAD_left_caudalanteriorcingulate_freesurfer-thickness,MAD_left_inferiortemporal_freesurfer-thickness,MAD_left_caudalmiddlefrontal_freesurfer-thickness,M
AD_left_isthmuscingulate_freesurfer-thickness,MAD_left_superiorfrontal_freesurfer-thickness,MAD_left_superiorparietal_freesurfer-thickness,MAD_left_insula_freesurfer-thickness,MAD_right_parsopercularis_freesurfer-thickness,MAD_right_entorhinal_freesurfer-thickness,MAD_right_lateralorbitofrontal_freesurfer-thickness,MAD_right_lingual_freesurfer-thickness,MAD_right_rostralanteriorcingulate_freesurfer-thickness,MAD_right_transversetemporal_freesurfer-thickness,MAD_right_parahippocampal_freesurfer-thickness,MAD_right_paracentral_freesurfer-thickness,MAD_right_inferiorparietal_freesurfer-thickness,MAD_right_postcentral_freesurfer-thickness,MAD_right_posteriorcingulate_freesurfer-thickness,MAD_right_parsorbitalis_freesurfer-thickness,MAD_right_cuneus_freesurfer-thickness,MAD_right_pericalcarine_freesurfer-thickness,MAD_right_lateraloccipital_freesurfer-thickness,MAD_right_precuneus_freesurfer-thickness,MAD_right_medialorbitofrontal_freesurfer-thickness,MAD_right_parstriangularis_freesurfer-thickness,MAD_right_middletemporal_freesurfer-thickness,MAD_right_superiortemporal_freesurfer-thickness,MAD_right_fusiform_freesurfer-thickness,MAD_right_precentral_freesurfer-thickness,MAD_right_supramarginal_freesurfer-thickness,MAD_right_rostralmiddlefrontal_freesurfer-thickness,MAD_right_caudalanteriorcingulate_freesurfer-thickness,MAD_right_inferiortemporal_freesurfer-thickness,MAD_right_caudalmiddlefrontal_freesurfer-thickness,MAD_right_isthmuscingulate_freesurfer-thickness,MAD_right_superiorfrontal_freesurfer-thickness,MAD_right_superiorparietal_freesurfer-thickness,MAD_right_insula_freesurfer-thickness,median_left_parsopercularis_travel-depth,median_left_entorhinal_travel-depth,median_left_lateralorbitofrontal_travel-depth,median_left_lingual_travel-depth,median_left_rostralanteriorcingulate_travel-depth,median_left_transversetemporal_travel-depth,median_left_parahippocampal_travel-depth,median_left_paracentral_travel-depth,median_left_inferiorparietal_travel-depth,median_left_p
ostcentral_travel-depth,median_left_posteriorcingulate_travel-depth,median_left_parsorbitalis_travel-depth,median_left_cuneus_travel-depth,median_left_pericalcarine_travel-depth,median_left_lateraloccipital_travel-depth,median_left_precuneus_travel-depth,median_left_medialorbitofrontal_travel-depth,median_left_parstriangularis_travel-depth,median_left_middletemporal_travel-depth,median_left_superiortemporal_travel-depth,median_left_fusiform_travel-depth,median_left_precentral_travel-depth,median_left_supramarginal_travel-depth,median_left_rostralmiddlefrontal_travel-depth,median_left_caudalanteriorcingulate_travel-depth,median_left_inferiortemporal_travel-depth,median_left_caudalmiddlefrontal_travel-depth,median_left_isthmuscingulate_travel-depth,median_left_superiorfrontal_travel-depth,median_left_superiorparietal_travel-depth,median_left_insula_travel-depth,median_right_parsopercularis_travel-depth,median_right_entorhinal_travel-depth,median_right_lateralorbitofrontal_travel-depth,median_right_lingual_travel-depth,median_right_rostralanteriorcingulate_travel-depth,median_right_transversetemporal_travel-depth,median_right_parahippocampal_travel-depth,median_right_paracentral_travel-depth,median_right_inferiorparietal_travel-depth,median_right_postcentral_travel-depth,median_right_posteriorcingulate_travel-depth,median_right_parsorbitalis_travel-depth,median_right_cuneus_travel-depth,median_right_pericalcarine_travel-depth,median_right_lateraloccipital_travel-depth,median_right_precuneus_travel-depth,median_right_medialorbitofrontal_travel-depth,median_right_parstriangularis_travel-depth,median_right_middletemporal_travel-depth,median_right_superiortemporal_travel-depth,median_right_fusiform_travel-depth,median_right_precentral_travel-depth,median_right_supramarginal_travel-depth,median_right_rostralmiddlefrontal_travel-depth,median_right_caudalanteriorcingulate_travel-depth,median_right_inferiortemporal_travel-depth,median_right_caudalmiddlefrontal_travel-depth,med
ian_right_isthmuscingulate_travel-depth,median_right_superiorfrontal_travel-depth,median_right_superiorparietal_travel-depth,median_right_insula_travel-depth,MAD_left_parsopercularis_travel-depth,MAD_left_entorhinal_travel-depth,MAD_left_lateralorbitofrontal_travel-depth,MAD_left_lingual_travel-depth,MAD_left_rostralanteriorcingulate_travel-depth,MAD_left_transversetemporal_travel-depth,MAD_left_parahippocampal_travel-depth,MAD_left_paracentral_travel-depth,MAD_left_inferiorparietal_travel-depth,MAD_left_postcentral_travel-depth,MAD_left_posteriorcingulate_travel-depth,MAD_left_parsorbitalis_travel-depth,MAD_left_cuneus_travel-depth,MAD_left_pericalcarine_travel-depth,MAD_left_lateraloccipital_travel-depth,MAD_left_precuneus_travel-depth,MAD_left_medialorbitofrontal_travel-depth,MAD_left_parstriangularis_travel-depth,MAD_left_middletemporal_travel-depth,MAD_left_superiortemporal_travel-depth,MAD_left_fusiform_travel-depth,MAD_left_precentral_travel-depth,MAD_left_supramarginal_travel-depth,MAD_left_rostralmiddlefrontal_travel-depth,MAD_left_caudalanteriorcingulate_travel-depth,MAD_left_inferiortemporal_travel-depth,MAD_left_caudalmiddlefrontal_travel-depth,MAD_left_isthmuscingulate_travel-depth,MAD_left_superiorfrontal_travel-depth,MAD_left_superiorparietal_travel-depth,MAD_left_insula_travel-depth,MAD_right_parsopercularis_travel-depth,MAD_right_entorhinal_travel-depth,MAD_right_lateralorbitofrontal_travel-depth,MAD_right_lingual_travel-depth,MAD_right_rostralanteriorcingulate_travel-depth,MAD_right_transversetemporal_travel-depth,MAD_right_parahippocampal_travel-depth,MAD_right_paracentral_travel-depth,MAD_right_inferiorparietal_travel-depth,MAD_right_postcentral_travel-depth,MAD_right_posteriorcingulate_travel-depth,MAD_right_parsorbitalis_travel-depth,MAD_right_cuneus_travel-depth,MAD_right_pericalcarine_travel-depth,MAD_right_lateraloccipital_travel-depth,MAD_right_precuneus_travel-depth,MAD_right_medialorbitofrontal_travel-depth,MAD_right_parstriangularis_trav
el-depth,MAD_right_middletemporal_travel-depth,MAD_right_superiortemporal_travel-depth,MAD_right_fusiform_travel-depth,MAD_right_precentral_travel-depth,MAD_right_supramarginal_travel-depth,MAD_right_rostralmiddlefrontal_travel-depth,MAD_right_caudalanteriorcingulate_travel-depth,MAD_right_inferiortemporal_travel-depth,MAD_right_caudalmiddlefrontal_travel-depth,MAD_right_isthmuscingulate_travel-depth,MAD_right_superiorfrontal_travel-depth,MAD_right_superiorparietal_travel-depth,MAD_right_insula_travel-depth,left_parsopercularis_area,left_entorhinal_area,left_lateralorbitofrontal_area,left_lingual_area,left_rostralanteriorcingulate_area,left_transversetemporal_area,left_parahippocampal_area,left_paracentral_area,left_inferiorparietal_area,left_postcentral_area,left_posteriorcingulate_area,left_parsorbitalis_area,left_cuneus_area,left_pericalcarine_area,left_lateraloccipital_area,left_precuneus_area,left_medialorbitofrontal_area,left_parstriangularis_area,left_middletemporal_area,left_superiortemporal_area,left_fusiform_area,left_precentral_area,left_supramarginal_area,left_rostralmiddlefrontal_area,left_caudalanteriorcingulate_area,left_inferiortemporal_area,left_caudalmiddlefrontal_area,left_isthmuscingulate_area,left_superiorfrontal_area,left_superiorparietal_area,left_insula_area,right_parsopercularis_area,right_entorhinal_area,right_lateralorbitofrontal_area,right_lingual_area,right_rostralanteriorcingulate_area,right_transversetemporal_area,right_parahippocampal_area,right_paracentral_area,right_inferiorparietal_area,right_postcentral_area,right_posteriorcingulate_area,right_parsorbitalis_area,right_cuneus_area,right_pericalcarine_area,right_lateraloccipital_area,right_precuneus_area,right_medialorbitofrontal_area,right_parstriangularis_area,right_middletemporal_area,right_superiortemporal_area,right_fusiform_area,right_precentral_area,right_supramarginal_area,right_rostralmiddlefrontal_area,right_caudalanteriorcingulate_area,right_inferiortemporal_area,right_ca
udalmiddlefrontal_area,right_isthmuscingulate_area,right_superiorfrontal_area,right_superiorparietal_area,right_insula_area,left_amygdala_volume-per-freesurfer-label,right_amygdala_volume-per-freesurfer-label,left_caudate_volume-per-freesurfer-label,right_caudate_volume-per-freesurfer-label,left_cerebral-white-matter_volume-per-freesurfer-label,right_cerebral-white-matter_volume-per-freesurfer-label,left_hippocampus_volume-per-freesurfer-label,right_hippocampus_volume-per-freesurfer-label,left_pallidum_volume-per-freesurfer-label,right_pallidum_volume-per-freesurfer-label,left_putamen_volume-per-freesurfer-label,right_putamen_volume-per-freesurfer-label,left_thalamus_volume-per-freesurfer-label,right_thalamus_volume-per-freesurfer-label,left_unsegmentedwhitematter_volume-per-freesurfer-label,right_unsegmentedwhitematter_volume-per-freesurfer-label,Anxiety,ADHD,Other,Autism
0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3.0,4.0,3.0,3.0,3.0,3.0,3.0,3.0,1.0,2.0,4.0,3.0,3.0,3.0,1.0,3.0,2.0,1.0,3.0,2.0,2.0,3.0,3.0,2.0,5.0,2.0,2.0,2.0,1.0,3.0,2.0,1.0,1.0,4.0,1.0,3.0,3.0,1.0,2.0,2.0,1.0,2.0,,,,,,,,0.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,2.0,2.0,2.0,0.0,2.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,2.0,,2.0,1.0,0.0,0.0,0.0,11.5,1.0,0.0,1.0,1.0,1.0,3.0,1.0,1.0,2.0,0.0,3.0,1.0,0.0,1.0,0.0,2.0,1.0,1.0,1.0,1.0,3.0,2.0,3.0,2.0,2.0,0.0,2.0,1.0,3.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,3.0,3.0,2.0,2.0,4.0,2.0,3.0,2.0,3.0,2.0,4.0,4.0,2.0,3.0,2.0,4.0,1.0,2.0,2.0,1.0,2.0,3.0,2.0,3.0,2.0,3.0,2.0,1.0,2.0,3.0,2.0,2.0,2.0,3.0,3.0,1.0,2.0,2.0,3.0,2.0,3.0,2.0,2.0,3.0,2.0,3.0,3.0,2.0,2.0,3.0,2.0,3.0,2.0,1.0,2.0,2.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1.0,1.0,1.0,-1.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,0.0,1.0,NDARNN368BDH,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2.0,2.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,2.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,2.0,1.0,0.0,0.0,2.0,0.0,2.0,0.0,0.0,2.0,0.0,0.0,2.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,1.0,0.0,2.0,1.0,1.0,2.0,0.0,0.0,0.0,0.0,1.0,2.0,0.0,0.0,0.0,2.0,1.0,0.0,1.0,1.0,0.0,2.0,0.0,2.0,0.0,0.0,2.0,1.0,1.0,0.0,2.0,0.0,0.0,0.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0,2.0,2.0,1.0,2.0,0.0,2.0,1.0,0.0,2.0,0.0,1.0,1.0,0.0,2.0,1.0,2.0,1.0,0.0,2.0,2.0,2.0,0.0,1.0,2.0,3.0,3.0,2.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,1.0,3.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,True,True,False
1,4.0,5.0,2.0,4.0,4.0,,4.0,2.0,5.0,1.0,5.0,1.0,4.0,5.0,5.0,4.0,1.0,4.0,5.0,4.0,1.0,2.0,4.0,1.0,2.0,4.0,4.0,1.0,1.0,1.0,1.0,,1.0,3.0,2.0,4.0,2.0,1.0,2.0,4.0,1.0,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1.0,0.0,0.0,,1.0,1.0,0.0,0.0,,1.0,,1.0,1.0,0.0,,1.0,0.0,1.0,1.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,2.0,12.1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,NDARGM645PL4,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.0,0.0,2.0,1.0,1.0,0.0,2.0,2.0,1.0,1.0,0.0,0.0,1.0,2.0,0.0,2.0,0.0,1.0,1.0,1.0,2.0,2.0,2.0,2.0,2.0,2.0,0.0,2.0,2.0,0.0,2.0,2.0,2.0,0.0,2.0,0.0,1.0,1.0,1.0,2.0,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1.0,0.0,1.0,,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,,1.0,0.0,0.0,0.0,0.0,0.0,,0.0,1.0,0.0,,0.0,1.0,1.0,0.0,1.0,0.0,1.0,1.0,,1.0,0.0,0.0,2.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,2.0,1.0,2.0,2.0,1.0,2.0,1.0,0.0,-1.0,0.0,0.0,-1.0,-1.0,0.0,-1.0,1.0,0.0,1.0,0.0,0.0,-1.0,0.0,-2.0,0.0,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,False,False,True
2,4.0,5.0,2.0,3.0,5.0,2.0,4.0,1.0,5.0,1.0,4.0,1.0,5.0,5.0,4.0,5.0,1.0,4.0,4.0,4.0,1.0,2.0,3.0,2.0,2.0,5.0,5.0,1.0,1.0,1.0,1.0,2.0,1.0,3.0,1.0,5.0,3.0,1.0,2.0,4.0,1.0,3.0,3.0,3.0,2.0,4.0,4.0,3.0,3.0,3.0,3.0,3.0,4.0,1.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,1.0,3.0,1.0,2.0,4.0,4.0,4.0,4.0,4.0,1.0,2.0,2.0,3.0,1.0,5.0,5.0,1.0,5.0,5.0,5.0,5.0,,,,,,,,,,,,,,,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,2.0,2.0,1.0,2.0,2.0,2.0,2.0,2.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,16.0,3.0,1.0,1.0,0.0,0.0,2.0,2.0,0.0,2.0,1.0,3.0,1.0,1.0,1.0,1.0,0.0,0.0,1.0,3.0,2.0,,2.0,3.0,3.0,2.0,3.0,0.0,1.0,2.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,3.0,0.0,0.0,2.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,1.0,4.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,0.0,1.0,2.0,2.0,1.0,1.0,1.0,1.0,2.0,1.0,0.0,0.0,0.0,0.0,2.0,2.0,2.0,1.0,2.0,0.0,1.0,2.0,1.0,0.0,1.0,0.0,2.0,2.0,0.0,0.0,2.0,1.0,1.0,1.0,2.0,1.0,2.0,0.0,2.0,2.0,0.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,0.0,2.0,,,,,,,,,,,,,,,,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,NDARMA875ARE,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,5.0,4.0,3.0,2.0,4.0,,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,2.0,0.0,1.0,1.0,0.0,1.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,1.0,1.0,2.0,1.0,1.0,0.0,1.0,0.0,2.0,1.0,0.0,1.0,1.0,1.0,1.0,2.0,0.0,1.0,2.0,2.0,2.0,1.0,1.0,2.0,3.0,1.0,3.0,3.0,2.0,3.0,3.0,2.0,3.0,3.0,2.0,0.0,0.0,0.0,-1.0,1.0,1.0,2.0,3.0,2.0,3.0,1.0,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,True,True,False
3,5.0,5.0,3.0,2.0,3.0,3.0,3.0,2.0,5.0,1.0,4.0,2.0,5.0,5.0,5.0,5.0,2.0,4.0,4.0,5.0,1.0,2.0,2.0,1.0,1.0,5.0,5.0,1.0,1.0,1.0,1.0,3.0,2.0,2.0,1.0,5.0,1.0,1.0,4.0,4.0,1.0,3.0,5.0,3.0,1.0,5.0,1.0,5.0,5.0,1.0,5.0,1.0,5.0,1.0,5.0,5.0,5.0,5.0,1.0,5.0,1.0,5.0,4.0,1.0,4.0,2.0,1.0,1.0,5.0,1.0,3.0,3.0,2.0,5.0,1.0,3.0,3.0,3.0,3.0,,5.0,2.0,1.0,1.0,2.0,2.0,1.0,0.0,0.0,1.0,0.0,2.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,13.3,0.0,1.0,2.0,3.0,2.0,2.0,3.0,0.0,0.0,0.0,3.0,1.0,0.0,1.0,0.0,2.0,0.0,1.0,0.0,0.0,1.0,1.0,3.0,1.0,2.0,,,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,3.0,0.0,2.0,3.0,2.0,3.0,4.0,3.0,4.0,3.0,3.0,3.0,2.0,3.0,2.0,2.0,2.0,2.0,2.0,4.0,4.0,4.0,4.0,3.0,1.0,3.0,3.0,3.0,3.0,2.0,4.0,2.0,3.0,3.0,2.0,1.0,2.0,3.0,3.0,2.0,3.0,2.0,3.0,3.0,3.0,1.0,2.0,3.0,3.0,3.0,3.0,2.0,1.0,2.0,3.0,1.0,3.0,1.0,1.0,1.0,0.0,1.0,0.0,2.0,0.0,2.0,2.0,0.0,2.0,0.0,2.0,2.0,2.0,0.0,0.0,2.0,2.0,0.0,2.0,2.0,1.0,2.0,1.0,0.0,2.0,2.0,0.0,2.0,2.0,2.0,0.0,2.0,1.0,2.0,0.0,0.0,0.0,2.0,2.0,0.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,,,,,,,,,,,,,,,,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,NDARXC367LA4,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.0,1.0,2.0,0.0,2.0,0.0,2.0,1.0,1.0,1.0,0.0,0.0,1.0,2.0,0.0,0.0,2.0,2.0,0.0,0.0,2.0,0.0,2.0,0.0,0.0,2.0,0.0,2.0,0.0,0.0,1.0,2.0,2.0,0.0,2.0,1.0,2.0,0.0,2.0,2.0,2.0,2.0,1.0,2.0,0.0,2.0,0.0,0.0,1.0,0.0,2.0,0.0,0.0,1.0,2.0,2.0,1.0,2.0,1.0,1.0,1.0,2.0,0.0,1.0,2.0,1.0,1.0,0.0,1.0,1.0,1.0,2.0,1.0,1.0,0.0,2.0,0.0,1.0,0.0,1.0,2.0,2.0,1.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,2.0,0.0,1.0,2.0,2.0,1.0,2.0,1.0,2.0,0.0,0.0,1.0,1.0,2.0,2.0,2.0,1.0,2.0,1.0,2.0,0.0,2.0,2.0,2.0,2.0,3.0,2.0,3.0,3.0,2.0,2.0,3.0,3.0,-1.0,3.0,3.0,2.0,3.0,0.0,0.0,0.0,2.0,0.0,1.0,
1.0,1.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,True,False,True,False
4,3.0,5.0,3.0,5.0,5.0,1.0,5.0,3.0,5.0,1.0,3.0,2.0,5.0,5.0,5.0,5.0,2.0,5.0,1.0,5.0,1.0,3.0,5.0,1.0,1.0,5.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,3.0,1.0,3.0,3.0,1.0,2.0,5.0,5.0,2.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2.0,2.0,1.0,2.0,2.0,2.0,2.0,,,,,,,,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,2.0,1.0,0.0,1.0,2.0,2.0,2.0,2.0,2.0,1.0,2.0,0.0,1.0,2.0,1.0,1.0,8.8,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,,-1.0,1.0,1.0,0.0,-1.0,0.0,NDARZK659DWX,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.0,,,,,,,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,0.0,2.0,1.0,0.0,2.0,2.0,2.0,0.0,1.0,2.0,2.0,2.0,1.0,1.0,2.0,1.0,2.0,1.0,2.0,1.0,2.0,1.0,1.0,0.0,2.0,3.0,3.0,1.0,3.0,3.0,3.0,3.0,2.0,3.0,3.0,3.0,2.0,2.0,2.0,2.0,-3.0,2.0,2.0,2.0,2.0,2.0,2.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,True,True,False


We can also find the total number of records (i.e. subjects).

In [7]:
%bql SELECT COUNT(*) as N FROM "raw_questionnaire_responses";

Unnamed: 0,N
0,630.0


### 2e. Creating a BayesDB population for the questionnaire response data

The notion of a "population" is a central concept in BayesDB. For a standard database table, such as `raw_questionnaire_responses`, each column is associated with a [data type](https://sqlite.org/datatype3.html), which in sqlite3 are `TEXT`, `REAL`, `INTEGER`, and `BLOB`. For a BayesDB population, each variable is associated with a _statistical data type_. These statistical types, such as `NOMINAL`, `NUMERICAL`, `MAGNITUDE`, and `COUNTS`, specify the set of values and default probability distributions used for building probabilistic models of the data in the population. In this tutorial, we will use the `NUMERICAL` and `NOMINAL` statistical data types.

We can use the `GUESS SCHEMA FOR <table>` command from the Metamodeling Language (MML) in BayesDB to guess the statistical data types of variables in the table. The guesses use heuristics based on the contents in the cells. The `num_distinct` column shows the number of unique values for that variable, and the `reason` column explains which heuristic was used to make the guess.

In [8]:
%mml GUESS SCHEMA FOR "raw_questionnaire_responses"

Unnamed: 0,column,stattype,num_distinct,reason
0,APQ_P_01,nominal,5.0,There are fewer than 20 distinct numerical va...
1,APQ_P_02,nominal,5.0,There are fewer than 20 distinct numerical va...
2,APQ_P_03,nominal,6.0,There are fewer than 20 distinct numerical va...
3,APQ_P_04,nominal,6.0,There are fewer than 20 distinct numerical va...
4,APQ_P_05,nominal,6.0,There are fewer than 20 distinct numerical va...
5,APQ_P_06,nominal,6.0,There are fewer than 20 distinct numerical va...
6,APQ_P_07,nominal,6.0,There are fewer than 20 distinct numerical va...
7,APQ_P_08,nominal,6.0,There are fewer than 20 distinct numerical va...
8,APQ_P_09,nominal,4.0,There are fewer than 20 distinct numerical va...
9,APQ_P_10,nominal,6.0,There are fewer than 20 distinct numerical va...
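
The heuristic named in the `reason` column can be illustrated with a short Python sketch. This is a simplified stand-in, not BayesDB's actual implementation: the helper name `guess_stattype` and the way missing cells are skipped are assumptions made for illustration, and only the threshold of 20 distinct values is taken from the output above.

```python
def guess_stattype(values, max_nominal_distinct=20):
    """Guess a statistical type for one column (hypothetical helper).

    Missing cells (None) are skipped. A column with fewer than
    `max_nominal_distinct` distinct values is treated as nominal,
    anything else as numerical.
    """
    observed = [v for v in values if v is not None]
    num_distinct = len(set(observed))
    stattype = "nominal" if num_distinct < max_nominal_distinct else "numerical"
    return stattype, num_distinct


# A Likert-style item with five response options and one missing cell.
likert_item = [4.0, 5.0, 3.0, None, 1.0, 2.0, 5.0]
# An age-like column with many distinct values.
age_like = [5.0 + 0.1 * i for i in range(100)]

print(guess_stattype(likert_item))
print(guess_stattype(age_like))
```

A rule of this kind explains why a continuous variable that happens to have few distinct values in the sample may need a manual override.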


In [9]:
%%mml
CREATE POPULATION "questionnaire_responses_population" FOR "raw_questionnaire_responses" WITH SCHEMA (
    GUESS STATTYPES OF (*);
    -- stuff that the guess suggested to ignore:
    SET STATTYPES OF
         "SCQ_30",
         "SCQ_01",
         "SCQ_28",
         "ADHD",
         "Anxiety",
         "Autism",
         "Other"
    TO
        NOMINAL;
    SET STATTYPE OF 
         "Age" 
    TO
        NUMERICAL;
    IGNORE	 
         "EID";
);

### 2f. Creating initial multivariate models of the data

#### Turn on multi-core computing with BayesDB

In [10]:
%multiprocess on

Multiprocessing turned on from off.


This way, we can use all 64 cores of this machine in parallel. 

Now that we have created the `questionnaire_responses_population` population, the next step is to analyze the data by building probabilistic models which explain the data generating process. Probabilistic data analyses in BayesDB are specified using a `MODELING SCHEMA`. The default model discovery engine in BayesDB is Cross-Categorization [(CrossCat)](http://jmlr.org/papers/v17/11-392.html). CrossCat is a Bayesian factorial mixture model which learns a full joint distribution over all variables in the population, using a divide-and-conquer approach. We will explore CrossCat more in this notebook.

For now we use MML to declare an analysis schema named `questionnaire_responses_m` for the `questionnaire_responses_population` population.

In [12]:
%%mml
CREATE GENERATOR FOR "questionnaire_responses_population" USING loom;

After creating the generator, we now need to initialize `MODELS` for the schema. We can think of a `MODEL` as specifying a hypothesis space of explanations for the data generating process for the population, and each `ANALYSIS` is a candidate hypothesis. We start by creating only 50 models, which are initialized __randomly__.

In [13]:
%mml INITIALIZE 50 MODELS IF NOT EXISTS FOR "questionnaire_responses_population";

Next, we run analysis for four hours.

In [None]:
%mml ANALYZE "questionnaire_responses_m" FOR 240 MINUTES WAIT (OPTIMIZED);

## 3. Exploring probable dependencies between variables and comparing CrossCat dependence probability to linear correlation

As mentioned earlier, all BQL queries are aggregated across the 50 analyses in the ensemble. We will create a table named `dependencies` which contains the pairwise `DEPENDENCE PROBABILITY` values between the questionnaire variables. The value of a cell (between 0 and 1) is the fraction of analyses in the ensemble where those two variables are detected to be probably dependent (i.e. they are in the same view).

In [None]:
%%bql
CREATE TABLE dependencies AS
    ESTIMATE DEPENDENCE PROBABILITY AS "depprob"
        FROM PAIRWISE VARIABLES OF questionnaire_responses_population;
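
The definition above (the fraction of analyses in which two variables share a view) can be sketched in a few lines of Python. The view assignments below are made up for illustration, and the function `dependence_probability` is a hypothetical helper, not part of BayesDB.

```python
from itertools import combinations

# Each analysis assigns every variable to a view (toy, made-up assignments).
models = [
    {"SWAN_06": 0, "SDQ_25": 0, "Age": 1},
    {"SWAN_06": 0, "SDQ_25": 0, "Age": 0},
    {"SWAN_06": 1, "SDQ_25": 1, "Age": 0},
    {"SWAN_06": 0, "SDQ_25": 1, "Age": 1},
]


def dependence_probability(models, var_a, var_b):
    """Fraction of analyses in which var_a and var_b are in the same view."""
    same_view = sum(1 for m in models if m[var_a] == m[var_b])
    return same_view / len(models)


# Print the value for every unordered pair of variables.
for a, b in combinations(sorted(models[0]), 2):
    print(a, b, dependence_probability(models, a, b))
```

With only four analyses the value can take just five levels (0, 0.25, 0.5, 0.75, 1), which is one reason to aggregate over a larger ensemble.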

We again summarize the `dependencies` table using a heatmap. Study this dependence heatmap, and compare it to the heatmap produced when there was only 1 analysis. Which common-sense dependencies were missed by the single model, but identified by the ensemble as probably dependent?

Find a full list of questionnaire items [here.](questionnaires-items.ipynb)

In [None]:
%bql .interactive_heatmap SELECT name0, name1, depprob FROM dependencies;

Let us compare dependence probabilities from CrossCat to linear correlation values, a very common technique for finding predictive relationships. We can compute the Pearson correlation coefficient R (and its p-value) in BayesDB using the `CORRELATION` and `CORRELATION PVALUE` queries. The following cell creates a table named `correlations`, which contains the R and p-value for all pairs of variables.

In [None]:
%%bql
CREATE TABLE "correlations" AS
ESTIMATE
    CORRELATION AS "correlation",
    CORRELATION PVALUE AS "pvalue"
FROM PAIRWISE VARIABLES OF "questionnaire_responses_population"

__Emphasis__: There is a significant difference between `DEPENDENCE PROBABILITY`, `CORRELATION`, and `CORRELATION PVALUE`. We outline these differences below, which will help us make comparisons between predictive relationships detected by CrossCat versus Pearson correlation.

- `DEPENDENCE PROBABILITY`: Returns a value in [0,1] indicating the __probability there exists__ a predictive relationship (statistical dependence) between two variables.

- `CORRELATION`: Returns a value in [0,1] indicating the __strength__ of the linear relationship between two variables, where 0 means no linear correlation, and 1 means perfect linear correlation.

- `CORRELATION PVALUE`: Returns a value in (0, 1) indicating the tail probability of the observed correlation value between two variables, under the null hypothesis that the two variables have zero correlation.

Based on these distinctions, there is no immediate way to numerically compare `DEPENDENCE PROBABILITY` with `CORRELATION/CORRELATION PVALUE`. However, it is possible to compare the inferences about predictive relationships that each method gives rise to, which we do in the next section.
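
To see why the two quantities measure different things, consider a toy example in plain Python (not BayesDB): a variable that is a deterministic but nonlinear function of another has a Pearson correlation of zero, even though the two are clearly dependent. The helper `pearson_r` and the data are illustrative assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
linear = [2.0 * x + 1.0 for x in xs]   # perfectly linear in x
quadratic = [x ** 2 for x in xs]       # fully determined by x, but not linear

r_linear = pearson_r(xs, linear)
r_quadratic = pearson_r(xs, quadratic)
print(r_linear, r_quadratic)
```

The quadratic case is the kind of relationship that a correlation heatmap would leave blank, while a method that asks whether any predictive relationship exists could still flag it.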

Let us first produce a heatmap of the raw correlation values. The following query shows the raw correlation values (between 0 and 1) for all pairs of variables where the p-value is less than 0.01 (note that we are not accounting for multiple-testing using e.g. Bonferroni correction). Pairs of variables where the p-value exceeds 0.01 (and thus the null hypothesis of independence cannot be rejected) are shown in gray. The sparsity of the data makes it difficult to draw inferences about many variables.

In [None]:
%bql .interactive_heatmap SELECT name0, name1, "correlation" FROM "correlations" WHERE "pvalue" < 0.01
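
For reference, the Bonferroni correction mentioned above divides the significance level by the number of tests. The sketch below is only a back-of-the-envelope calculation: the variable count of 700 is a round number standing in for the actual number of modeled variables, and the names are illustrative.

```python
# Round number standing in for the actual number of variables (assumption).
n_variables = 700
alpha = 0.01

# Number of distinct unordered pairs of variables, i.e. number of tests.
n_pairs = n_variables * (n_variables - 1) // 2

# Bonferroni: each individual test must pass alpha / n_pairs.
bonferroni_threshold = alpha / n_pairs

print(n_pairs, bonferroni_threshold)
```

With this many pairs the corrected threshold is several orders of magnitude stricter than 0.01, so the uncorrected heatmap should be read as exploratory.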

Explore the heatmap, and compare it to the heatmap from `DEPENDENCE PROBABILITY`. The patterns of dependence relationships differ significantly. How?

We can use BQL to find variables which CrossCat believes are probably dependent, but correlation believes are independent (either the null hypothesis of independence cannot be rejected, or the correlation value is significant and near zero).

In [None]:
%%sql
SELECT
    "name0",
    "name1",
    "dependencies"."depprob",
    "correlations"."correlation",
    "correlations"."pvalue"
FROM
    "dependencies"
    JOIN "correlations"
    USING ("name0", "name1")
WHERE
    -- CrossCat: probably dependent.
    "dependencies"."depprob" > 0.85
    AND (
    -- Correlation: cannot reject null hypothesis of independence.
    "correlations"."pvalue" > 0.05
    OR (
    -- Correlation: linear relationship is significant and near zero.
    "correlations"."pvalue" < 0.05 AND "correlations"."correlation" < 0.05))

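The filter logic of this query can be tried out on a toy example with Python's built-in `sqlite3` module. The variable names `A` to `D` and all numbers below are made up; only the `WHERE` clause mirrors the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dependencies (name0 TEXT, name1 TEXT, depprob REAL);
    CREATE TABLE correlations (name0 TEXT, name1 TEXT,
                               correlation REAL, pvalue REAL);
    -- Toy values, chosen so that each branch of the filter is exercised.
    INSERT INTO dependencies VALUES
        ('A', 'B', 0.95), ('A', 'C', 0.90), ('B', 'C', 0.10), ('A', 'D', 0.99);
    INSERT INTO correlations VALUES
        ('A', 'B', 0.60, 0.001), ('A', 'C', 0.02, 0.40),
        ('B', 'C', 0.30, 0.01), ('A', 'D', 0.01, 0.001);
""")

rows = conn.execute("""
    SELECT name0, name1, dependencies.depprob,
           correlations.correlation, correlations.pvalue
    FROM dependencies JOIN correlations USING (name0, name1)
    WHERE dependencies.depprob > 0.85
      AND (correlations.pvalue > 0.05
           OR (correlations.pvalue < 0.05
               AND correlations.correlation < 0.05))
    ORDER BY name0, name1
""").fetchall()

for row in rows:
    print(row)
```

Here the pair `A`/`C` passes because its p-value is too large to reject independence, `A`/`D` passes because its correlation is significant but near zero, and `A`/`B` is dropped because correlation and CrossCat agree.
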
#### Questionnaire items
- SWAN_06: engages in tasks that require sustained mental effort.
- SDQ_25: Good attention span, sees chores or homework through to the end.
- IAT_13: How often do you snap, yell, or act annoyed if someone bothers you while you are online

Find a full list of questionnaire items [here.](questionnaires-items.ipynb)

#### Example of a high probability of dependence

In [None]:
%bql .show_histograms --table raw_questionnaire_responses --column1 SDQ_25 --column2 Sex

We can also use BQL to find variables which CrossCat believes are probably independent, but correlation believes are dependent (a statistically significant non-zero correlation value, where we are using an R cutoff of 0.15). The following query shows a list of such variables.

In [None]:
%%sql
SELECT
    "name0",
    "name1",
    "dependencies"."depprob",
    "correlations"."correlation",
    "correlations"."pvalue"
FROM
    "dependencies"
    JOIN "correlations"
    USING ("name0", "name1")
WHERE
    -- CrossCat: probably independent.
    "dependencies"."depprob" < 0.05
    AND (
    -- Correlation: statistically significant dependence.
    "correlations"."pvalue" < 0.05 AND "correlations"."correlation" > 0.15)
LIMIT 10

The top correlations with 0 probability of dependency are all taken from the
Alabama Parenting Questionnaire (parent report):

- APQ_P_11: You help your child with his/her homework
- APQ_P_18: You hug or kiss your child when he/she has done something well
- APQ_P_28: You don't check that your child comes home at the time he/she was supposed to
- APQ_P_30: Your child comes home from school more than an hour past the time you expect him/her to be home
- APQ_P_40: You calmly explain to your child why his/her behavior was wrong when he/she misbehaves
- ...


Find all the questions of all questionnaires [here.](questionnaires-items.ipynb)

#### Example of probability of dependence being 0

In [None]:
%bql .show_histograms --table raw_questionnaire_responses --column1 APQ_P_28 --column2 Sex

#### Questionnaire item
- APQ_P_28: You don't check that your child comes home at the time he/she was supposed to

## 4. Taking a closer look at dependencies and the CrossCat state

### 4a. Listing variables probably dependent with autism diagnosis, ADHD diagnosis, anxiety diagnosis and age


In [None]:
%%bql
SELECT * FROM "dependencies"
    WHERE name0 = 'Autism'
    ORDER BY depprob DESC
    LIMIT 50

In [None]:
%%bql
SELECT * FROM "dependencies"
    WHERE name0 = 'ADHD'
    ORDER BY depprob DESC
    LIMIT 50

In [None]:
%%bql
SELECT * FROM "dependencies"
    WHERE name0 = 'Anxiety'
    ORDER BY depprob DESC
    LIMIT 50

In [None]:
%%bql
SELECT * FROM "dependencies"
    WHERE name0 = 'Age'
    ORDER BY depprob DESC
    LIMIT 10

Alabama Parenting Questionnaire (parent report)

- apq_p_06:   Your child fails to leave a note to let you know where he/she is going
- apq_p_10:   Your child stays out in the evening past the time that he/she is supposed to be home
- apq_p_11:   You help your child with his/her homework
- apq_p_19:   Your child goes out with a set time to be home
- apq_p_21:   Your child goes out after dark without an adult
- apq_p_30:  Your child comes home from school more than an hour past the time you expect him/her to be home
- apq_p_32:  Your child is at home without adult supervision


Alabama Parenting Questionnaire (self report)

- apq_sr_06:  You fail to leave a note or let your parents know where you are going
- apq_sr_10:  You stay out in the evening past the time you are supposed to be home

### 4b. Visualizing CrossCat states
For simplicity, we now create a new, smaller population with the 110 variables above. We
analyze again (which is much quicker now, because we have only a fraction of the columns)
and plot the individual CrossCat states.

The full `raw_questionnaire_responses` table contains over 700 columns. In this section, our exploratory analysis will be based on a subsample of 110 columns. To create the subsample, we use the `.subsample_columns` magic. The `--keep` flag accepts a list of column names which should be kept. We will keep the columns we found in the tables above. Finally, `raw_questionnaire_responses_subsample` is the name of the new table, and 110 is the number of columns to downsample to.

In [None]:
%bql .subsample_columns  raw_questionnaire_responses0 raw_questionnaire_responses_subsample 110 --seed=1 \
--keep EID Autism_Spectrum_Disorder Attention_Deficit_Hyperactivity_Disorder assq_03 assq_04 assq_09 assq_10 assq_11 assq_13 assq_14 assq_15 assq_16 assq_18 assq_19 assq_20 assq_21 assq_22 assq_23 assq_24 assq_26 assq_27 scq_03 scq_04 scq_06 scq_07 scq_08 scq_11 scq_12 scq_13 scq_14 scq_15 scq_16 assq_06 assq_07 assq_08 scq_09 scq_38 assq_17 scq_05 scq_19 assq_02 assq_05 assq_25 scared_p_08 scq_18 scq_33 assq_12 sdq_23 sdq_11 scq_17 scq_10 assq_01 sex sdq_02 sdq_10 sdq_21 swan_03 swan_10 swan_11 swan_12 swan_13 swan_14 swan_15 swan_16 swan_17 swan_18 sdq_15 sdq_25 swan_01 swan_02 swan_04 swan_05 swan_06 swan_07 swan_08 swan_09 psi_32 sdq_26 mfq_p_21 mfq_p_07 sas_01 sas_06 sas_07 sds_03 psi_29 psi_36 sas_02 sas_04 sas_05 sas_10 apq_p_37 ari_p_01 ari_p_02 ari_p_03 ari_p_05 ari_p_06 ari_p_07 mfq_p_11 psi_17 psi_28 age apq_p_06 apq_p_10 apq_p_11 apq_p_19 apq_p_21 apq_p_30 apq_p_32 apq_sr_06 apq_sr_10

In [None]:
%%mml
CREATE POPULATION "questionnaire_responses_subsample_population" 
    FOR "raw_questionnaire_responses_subsample" WITH SCHEMA (
        GUESS STATTYPES FOR (*);
    );

In [None]:
%multiprocess on

In [None]:
%mml CREATE ANALYSIS SCHEMA "questionnaire_responses_subsample_m" FOR "questionnaire_responses_subsample_population" WITH BASELINE crosscat();
%mml INITIALIZE 60 MODELS FOR "questionnaire_responses_subsample_m";

Analysis is much quicker now since we are looking only at a fraction of all the columns.


In [None]:
%mml ANALYZE "questionnaire_responses_subsample_m" FOR 10 MINUTES WAIT (OPTIMIZED);

#### Visualizing a CrossCat hypothesis

As mentioned earlier, CrossCat learns the full joint distribution of all variables in the population using divide-and-conquer:

- First, CrossCat partitions the variables into a set of _views_; all the variables in a particular view are modeled jointly, and two variables in different views are independent of one another.
- Second, within each view, CrossCat clusters the rows using a non-parametric mixture model.

The name Cross-Categorization is derived from this two-step process: first categorize the variables into views, and then categorize the rows into clusters within each view of variables. It is important to note that two different views A and B are likely to induce different clusterings of the rows.
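
The two-step structure can be made concrete with a small Python sketch of one CrossCat hypothesis. The data structure, the variable-to-view assignments, and the cluster labels are all made up for illustration and do not reflect how BayesDB stores its models internally.

```python
# Step 1: every variable belongs to exactly one view (toy assignment).
view_of_variable = {"ASSQ_06": 0, "SCQ_09": 0, "SWAN_06": 1, "SDQ_25": 1}

# Step 2: within each view, every row (subject) belongs to one cluster.
# Five subjects; note that the two views cluster them differently.
cluster_of_row = {
    0: [0, 0, 1, 1, 1],  # clustering induced by view 0
    1: [0, 1, 0, 1, 0],  # clustering induced by view 1
}


def variables_in_view(view):
    """Names of the variables that are modeled jointly in `view`."""
    return sorted(v for v, w in view_of_variable.items() if w == view)


def same_cluster(view, row_a, row_b):
    """True if two rows fall in the same cluster within `view`."""
    return cluster_of_row[view][row_a] == cluster_of_row[view][row_b]


for view in sorted(cluster_of_row):
    print(view, variables_in_view(view), cluster_of_row[view])

# Subjects 0 and 1 are grouped together in view 0 but not in view 1.
print(same_cluster(0, 0, 1), same_cluster(1, 0, 1))
```

In this toy state, subjects 0 and 1 look alike with respect to the variables in view 0 and different with respect to those in view 1, which is what it means for different views to induce different clusterings.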

To get a sense of CrossCat's hypothesis space, we can render the hypothesis specified by a particular analysis using the `.render_crosscat [options] <analysis_schema> <analysis_identifier>` plotting command. The `--subsample=50` option says to only show a subsample of 50 rows in the rendering (even though `questionnaire_responses_subsample_m` is modeling all subjects in the `questionnaire_responses_subsample_population` population); `--rowlabels=EID` specifies which column in the table to use to label the rows. Finally, `questionnaire_responses_subsample_m 0` means to render the first analysis (index 0) in the `questionnaire_responses_subsample_m` analysis schema.

__To view a full-size image of the rendering, either double click the image, or right-click and select "Open image in new tab."__

In [None]:
%mml .render_crosscat \
    --subsample=50 --rowlabels=EID --xticklabelsize=small --yticklabelsize=xx-small --progress=True --width=64 \
    questionnaire_responses_subsample_m 0

## 5. Similarity of subjects in different contexts

In the heatmap below, each row and column is a subject, and the value of a cell (between 0
and 1) indicates the probability that those two subjects are relevant for formulating
predictions about each other. Do these clusters of subjects make sense?
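
One way to read this similarity value, assuming the usual CrossCat interpretation, is as the fraction of analyses in which two subjects fall in the same cluster of the view that contains the context variable. The sketch below uses made-up analyses and a hypothetical helper `similarity`; it is not BayesDB's implementation.

```python
# Three toy analyses; each has its own views and per-view row clusterings.
models = [
    {"view_of_variable": {"Autism": 0, "ADHD": 1},
     "cluster_of_row": {0: [0, 0, 1], 1: [0, 1, 1]}},
    {"view_of_variable": {"Autism": 0, "ADHD": 0},
     "cluster_of_row": {0: [0, 0, 1]}},
    {"view_of_variable": {"Autism": 1, "ADHD": 0},
     "cluster_of_row": {0: [0, 1, 1], 1: [0, 1, 0]}},
]


def similarity(models, row_a, row_b, context):
    """Fraction of analyses where both rows share a cluster in the
    view that contains the context variable."""
    hits = 0
    for m in models:
        view = m["view_of_variable"][context]
        clusters = m["cluster_of_row"][view]
        hits += clusters[row_a] == clusters[row_b]
    return hits / len(models)


print(similarity(models, 0, 1, "Autism"))
print(similarity(models, 0, 1, "ADHD"))
```

The same pair of subjects can be similar in one context and dissimilar in another, which is why the sections below compute a separate similarity table per context variable.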


### 5a. Similarity with respect to diagnosis: autism

In [None]:
%%bql
CREATE TABLE "similarity_autism" AS
    ESTIMATE SIMILARITY IN THE CONTEXT OF "Autism"
    FROM PAIRWISE "questionnaire_responses_population";

In [None]:
%bql .interactive_heatmap --label0=rowid --label1=EID --table=raw_questionnaire_responses \
    SELECT * FROM "similarity_autism"

### 5b. Similarity with respect to diagnosis: ADHD

In [None]:
%%bql
CREATE TABLE "similarity_adhd" AS
    ESTIMATE SIMILARITY IN THE CONTEXT OF "ADHD"
    FROM PAIRWISE "questionnaire_responses_population";

In [None]:
%bql .interactive_heatmap --label0=rowid --label1=EID --table=raw_questionnaire_responses \
    SELECT * FROM "similarity_adhd"

### 5c. Similarity with respect to a specific item in autism screening questionnaire
- ASSQ_06 (Autism Spectrum Screening Questionnaire): Child has a deviant style of communication with a formal, fussy, old-fashioned or robotlike language (i.e. talks differently than other children, in a formal or stilted way)

In [None]:
%%bql
CREATE TABLE "similarity_assq_06" AS
    ESTIMATE SIMILARITY IN THE CONTEXT OF "ASSQ_06"
    FROM PAIRWISE "questionnaire_responses_population";

In [None]:
%bql .interactive_heatmap --label0=rowid --label1=EID --table=raw_questionnaire_responses \
    SELECT * FROM "similarity_assq_06"