<img src="https://raw.githubusercontent.com/probcomp/notebook/e66a399561d069a5dfe19c7efd8dd62c09a80787/tutorials/resources/header.png"/>
<img src="https://www.yammer.com/api/v1/uploaded_files/61754420/preview/CMI_rgb_horizontal_logo.png" style='height:75px'/>

# Exploratory data analysis with BayesDB for child psychology


Authored by: Ulrich Schaechtle, Veronica Sara Weiner of the MIT Probabilistic Computing Project (Probcomp) with Arno Klein and Jon Clucas of the Child Mind Institute MATTER Lab.

In this notebook, we will use BayesDB for exploratory data analysis of the questionnaire responses data set provided by the [Child Mind Institute](https://childmind.org/). The data set contains individual question-items of a [set of
questionnaires](http://fcon_1000.projects.nitrc.org/indi/cmi_healthy_brain_network/assessments/master-list.html) related to childhood mental health and a set of structural MRI features. 

This notebook will cover the following topics:

1. List of questionnaires and links to individual questions
2. Analysis with BayesDB
3. Exploring probable dependencies between variables and comparing CrossCat dependence probability to linear correlation
4. Taking a closer look at dependencies and the models built by CrossCat
5. Aggregating across an ensemble of CrossCat models to assess the similarity of subjects

## 1. List of questionnaires and links to individual questions

The following are the questionnaires from which the Child Mind Institute has collected data and that we analyze in this notebook. Each link leads to a set of questions. We analyze the data in the original questionnaire format (as categorical responses to individual questions). Compound/summary scores are not included, and the diagnoses have been transformed into individual binary variables.
   
- [ACE: Adverse Childhood Experience (ACE)](resources/html/ACE.html)
- [APQ_P: Alabama Parenting Questionnaire (APQ) Parent-Report](resources/html/APQ_P.html)
- [APQ_S: Alabama Parenting Questionnaire (APQ) Self-Report](resources/html/APQ_S.html)
- [ARI_P: Affective Reactivity Index (ARI) Parent-Report](resources/html/ARI_P.html)
- [ARI_S: Affective Reactivity Index (ARI) Self-Report](resources/html/ARI_S.html)
- [ASR: Adult Self Report](resources/html/ASR.html)
- [ASSQ: Autism Spectrum Screening Questionnaire (ASSQ)](resources/html/ASSQ.html)
- [C3SR (Conners-SR): Conners 3 - Self-Report (C3SR)](resources/html/C3SR.html)
- [CAARS: Conners' Adult ADHD Rating Scales (CAARS)](resources/html/CAARS.html)
- [CBCL: Child Behavior Checklist (CBCL) -- Age 6-18](resources/html/CBCL.html)
- [CBCL_Pre: Child Behavior Checklist (CBCL) Pre-School](resources/html/CBCL_Pre.html)
- [CCSC: Children's Coping Strategies Checklist-Revised](resources/html/CCSC.html)
- [CDI2SR: Children's Depression Inventory (CDI) Self Report](resources/html/CDI2SR.html)
- [CDI2_P: Children's Depression Inventory (CDI) Parent Report](resources/html/CDI2_P.html)
- [CIS_P: Columbia Impairment Scale (CIS) Parent Report](resources/html/CIS_P.html)
- [CIS_SR: Columbia Impairment Scale - Self-Report (CIS-SR)](resources/html/CIS_SR.html)
- [CPIC: Children's Perception of Interparental Conflict Scale (CPIC)](resources/html/CPIC.html)
- [DTS: Distress Tolerance Scale (DTS)](resources/html/DTS.html)
- ~~[IAT: Internet Addiction Test (IAT)](resources/html/IAT.html)~~
- [ICU_P: Inventory of Callous-Unemotional Traits (ICU) Parent Report](resources/html/ICU_P.html)
- [ICU_SR: Inventory of Callous-Unemotional Traits (ICU) Self Report](resources/html/ICU_SR.html)
- [MFQ_P: Mood and Feelings Questionnaire (MFQ) Parent Report](resources/html/MFQ_P.html)
- [MFQ_SR: Mood and Feelings Questionnaire (MFQ) Self Report](resources/html/MFQ_SR.html)
- ~~[PCIAT: Parent-Child Internet Addiction Test](resources/html/PCIAT.html)~~
- [PSI: Parenting Stress Index (PSI)](resources/html/PSI.html)
- [SAS: Social Aptitude Scale (SAS)](resources/html/SAS.html)
- [SCARED_P: Screen for Child Anxiety Related Disorders (SCARED) Parent Report](resources/html/SCARED_P.html)
- [SCARED_SR: Screen for Child Anxiety Related Disorders (SCARED) Self Report](resources/html/SCARED_SR.html)
- [SCQ: Social Communication Questionnaire](resources/html/SCQ.html)
- [SDQ: Strength and Difficulties Questionnaire](resources/html/SDQ.html)
- [SDS: Sleep Disturbance Scale (SDS)](resources/html/SDS.html)
- [SRS_Preschool: Social Responsiveness Scale (SRS) Preschool](resources/html/SRS_Preschool.html)
- [SRS_School: Social Responsiveness Scale (SRS) School Age](resources/html/SRS_School.html)
- [STAI: State-Trait Anxiety Inventory for Adults](resources/html/STAI.html)
- [SWAN : The SWAN Rating Scale for ADHD](resources/html/SWAN .html)
- [Symptom_Checklist: Symptom Checklist - Parent](resources/html/Symptom_Checklist.html)
- [TRF_Preschool_Age: Teacher Report Form (TRF) Preschool Age](resources/html/TRF_Preschool_Age.html)
- [TRF_School_Age: Teacher Report Form (TRF) School Age](resources/html/TRF_School_Age.html)
- [WHODAS_P: WHO Disability Assessment Schedule (WHODAS) Parent-Report](resources/html/WHODAS_P.html)
- [WHODAS_SR: WHO Disability Assessment Schedule (WHODAS) Self-Report](resources/html/WHODAS_SR.html)
- [YFAS: Yale Food Addiction Scale (YFAS)](resources/html/YFAS.html)
- [YSR: Youth Self Report (YSR)](resources/html/YSR.html)



## 2. Analysis with BayesDB

### 2a. Setting up the Jupyter environment

The first step is to load the `jupyter_probcomp.magics` library, which provides BayesDB hooks for data exploration, plotting, querying, and analysis through this Jupyter notebook environment. The second cell allows plots from matplotlib and javascript to be shown inline.

In [1]:
%load_ext jupyter_probcomp.magics

session_id: jovyan@jclucas-notebook_2018-07-09T20:53:05.376703_7


In [2]:
%matplotlib inline
%vizgpm inline

<IPython.core.display.Javascript object>

In [3]:
import numpy as np

In [4]:
import pandas as pd

## Note:

In `questions_v3_new_targets.csv`, both the `Compulsions` and `UseDisorders` columns appear to be entirely `NaN`, which causes the program to crash at a later stage. I am thus removing those columns.

In [None]:
df = pd.read_csv('resources/data/questions_v3_new_targets.csv')

In [None]:
df.head()
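
The note above mentions removing columns that are entirely `NaN`. A minimal sketch of how such columns could be found and dropped with pandas follows. It uses a small made-up frame (`toy`) in place of the real CSV, and the column names are borrowed only for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the questionnaire data (all values are made up).
toy = pd.DataFrame({
    'EID': ['a', 'b', 'c'],
    'Anxiety': [True, False, np.nan],
    'Compulsions': [np.nan, np.nan, np.nan],
    'UseDisorders': [np.nan, np.nan, np.nan],
})

# Columns in which every single cell is missing.
all_missing = [col for col in toy.columns if toy[col].isna().all()]

# Drop them; toy.dropna(axis=1, how='all') would give the same result.
cleaned = toy.drop(columns=all_missing)
```

On the real data, the same two steps would be applied to `df` before writing the cleaned table back to disk.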

### 2b. Creating a BayesDB `.bdb` file on disk

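We next use the `%bayesdb` magic to open (or create) the `.bdb` file `resources/bdb/loom_repaired_cols.01.bdb` on disk. This file will store all the data and models created in this session. In the same cell we register the Loom backend, which is used later to build the models.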

In [5]:
%bayesdb resources/bdb/loom_repaired_cols.01.bdb
bdb = %get_bdb
import os
os.environ['LOOM_VERBOSITY'] = '0'
import bayeslite
from bayeslite.backends.loom_backend import LoomBackend
from bayeslite import bayesdb_register_backend
bayesdb_register_backend(bdb, LoomBackend(os.path.abspath('temp-loom-repaired-v2/')))

### 2c. Ingesting data from a `.csv` file into a BayesDB table

The questionnaire dataset is stored in the csv file `resources/data/questions_v3_new_targets.csv`. Each column of the csv file is a variable, and each row is a record. We use the `CREATE TABLE` BQL query, with the pathname of the csv file, to convert the csv data into a database table named `raw_questionnaire_responses`.

In [None]:
%bql CREATE TABLE "raw_questionnaire_responses" FROM 'resources/data/questions_v3_new_targets.csv'

Almost all datasets have missing values, and special tokens such as `NaN` or `NA` indicating a particular cell is missing. In the questionnaire data, empty strings are used. To tell BayesDB to treat empty strings as SQL `NULL` we use the `.nullify` command, followed by the name of the table and the string `''` which represents missing data. Over 250,000 cells have been converted to `NULL`, illustrating that the data is quite sparse.

In [None]:
%bql .nullify raw_questionnaire_responses ''
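
Conceptually, nullifying a table means replacing every cell that equals the missing-data token with SQL `NULL`. The sketch below illustrates that idea with the standard `sqlite3` module on a tiny in-memory table. It is an illustration under assumptions, not BayesDB's implementation: the `nullify` helper, the `responses` table, and its values are hypothetical.

```python
import sqlite3

# Tiny in-memory table with empty strings standing in for missing cells.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE responses (eid TEXT, scq_01 TEXT, scq_28 TEXT)')
conn.executemany(
    'INSERT INTO responses VALUES (?, ?, ?)',
    [('a', '1', ''), ('b', '', ''), ('c', '0', '1')],
)

def nullify(conn, table, token):
    """Set every cell equal to `token` to NULL; return the number of cells changed."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute('PRAGMA table_info("%s")' % table)]
    changed = 0
    for col in columns:
        cur = conn.execute(
            'UPDATE "%s" SET "%s" = NULL WHERE "%s" = ?' % (table, col, col),
            (token,),
        )
        changed += cur.rowcount
    return changed

n_changed = nullify(conn, 'responses', '')
n_null_scq_28 = conn.execute(
    'SELECT COUNT(*) FROM responses WHERE scq_28 IS NULL'
).fetchone()[0]
```

After this step, `COUNT(column)` skips the converted cells, which is why the later counting queries only see observed values.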

### 2d. Running basic queries on the table using BQL and SQL

Now that the questionnaire dataset has been loaded into a table, and missing values converted to `NULL`, we can run standard SQL queries to explore the contents of the data. For example, we can select the first 5 records. Observe that each row in the table is a particular subject, and each column is a questionnaire item or a diagnosis variable. Scroll through the names in the header of the table to get a sense of the variables in the dataset.

In [None]:
%bql SELECT * FROM "raw_questionnaire_responses" LIMIT 5;

We can also find the total number of records (i.e. subjects).

In [None]:
%bql SELECT COUNT(*) as N FROM "raw_questionnaire_responses";

### 2e. Creating a BayesDB population for the questionnaire response data

We can use the `GUESS SCHEMA FOR <table>` command from the Metamodeling Language (MML) in BayesDB to guess the statistical data types of variables in the table. The guesses use heuristics based on the contents in the cells. The `num_distinct` column shows the number of unique values for that variable, and the `reason` column explains which heuristic was used to make the guess.

In [None]:
%mml GUESS SCHEMA FOR "raw_questionnaire_responses"
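
To make the idea of a content-based guess concrete, here is a toy heuristic in plain Python. It is not BayesDB's actual rule set: the function `guess_stattype`, the threshold `max_nominal_distinct`, and the reason strings are all assumptions made for illustration. It only mirrors the shape of the output described above (a guessed type, the number of distinct values, and a reason).

```python
def guess_stattype(values, max_nominal_distinct=20):
    """Toy guess of a statistical type from the cells of one column.

    Returns a (stattype, num_distinct, reason) tuple. None marks a missing cell.
    """
    observed = [v for v in values if v is not None]
    distinct = set(observed)
    num_distinct = len(distinct)

    # A column with at most one observed value carries no information.
    if num_distinct <= 1:
        return ('ignore', num_distinct, 'constant or empty column')

    def is_number(v):
        try:
            float(v)
            return True
        except (TypeError, ValueError):
            return False

    # Any non-numeric value makes the column categorical.
    if not all(is_number(v) for v in distinct):
        return ('nominal', num_distinct, 'contains non-numeric values')

    # Numeric codes with few distinct values look like questionnaire answers.
    if num_distinct <= max_nominal_distinct:
        return ('nominal', num_distinct, 'few distinct numeric values')

    return ('numerical', num_distinct, 'many distinct numeric values')
```

For example, `guess_stattype(['0', '1', '2', '1', None])` is classified as nominal under this toy rule, which is the kind of guess one would expect for a Likert-style questionnaire item.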

# TODO: remove compulsions and usedisorders from ignore below.

In [None]:
%%mml
CREATE POPULATION "questionnaire_responses_population" FOR "raw_questionnaire_responses" WITH SCHEMA (
    GUESS STATTYPES OF (*);
    -- stuff that the guess suggested to ignore:
    SET STATTYPES OF
         "SCQ_30",
         "SCQ_01",
         "SCQ_28",
         "Anxiety",
         "Compulsions",
         "UseDisorders",
         "OtherDx"
    TO
        NOMINAL;
    SET STATTYPES OF 
         "Age",
         "SymptomsOfCruelty",
         "SymptomsOfSuicide"
    TO
        NUMERICAL;
    IGNORE
         "EID",
         "CBCL_15",
         "CBCL_16",
         "CBCL_26",
         "CDI2_08",
         "CBCL_91";
);

### 2f. Creating initial multivariate models of the data

#### Turn on multi-core computing with BayesDB

This way, we can use all 64 cores of this machine in parallel. 

Now that we have created the `questionnaire_responses_population` population, the next step is to analyze the data by building probabilistic models which explain the data generating process. Probabilistic data analyses in BayesDB are specified using a `MODELING SCHEMA`. The default model discovery engine in BayesDB is Cross-Categorization [(CrossCat)](http://jmlr.org/papers/v17/11-392.html). CrossCat is a Bayesian factorial mixture model which learns a full joint distribution over all variables in the population, using a divide-and-conquer approach. We will explore CrossCat more in this notebook.

For now we use MML to declare a generator for the `questionnaire_responses_population` population, using the Loom backend registered earlier.

In [None]:
%%mml
CREATE GENERATOR FOR "questionnaire_responses_population" using loom;

After creating the generator, we now need to initialize `MODELS` for it. We can think of the generator as specifying a hypothesis space of explanations for the data generating process for the population, and of each `MODEL` as a candidate hypothesis. We start by creating only 30 models, which are initialized __randomly__.

In [None]:
%mml INITIALIZE 30 MODELS IF NOT EXISTS FOR "questionnaire_responses_population";

Next, we run analysis for 50 iterations.

In [None]:
# Run 50 iterations of analysis.
%mml ANALYZE "questionnaire_responses_population" FOR 50 ITERATIONS;

## 3. Exploring probable dependencies between variables and comparing CrossCat dependence probability to linear correlation

All BQL queries are aggregated across the 30 models in the ensemble. We will create a table named `dependencies` which contains the pairwise `DEPENDENCE PROBABILITY` values between the questionnaire variables. The value of a cell (between 0 and 1) is the fraction of models in the ensemble where those two variables are detected to be probably dependent (i.e. they are in the same view).

In [None]:
%%bql
CREATE TABLE dependencies AS
    ESTIMATE DEPENDENCE PROBABILITY AS "depprob"
        FROM PAIRWISE VARIABLES OF questionnaire_responses_population;
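
The definition of dependence probability given above (the fraction of models in which two variables share a view) can be sketched in a few lines of plain Python. The view assignments below are made up, and the ensemble has only four hypothetical models; the variable names are borrowed from the data set purely for illustration.

```python
from itertools import combinations

# Each dict is one model: it maps a variable to the view it was assigned to.
models = [
    {'mfq_p_16': 0, 'cbcl_18': 0, 'sdq_12': 1},
    {'mfq_p_16': 0, 'cbcl_18': 0, 'sdq_12': 0},
    {'mfq_p_16': 2, 'cbcl_18': 1, 'sdq_12': 1},
    {'mfq_p_16': 1, 'cbcl_18': 1, 'sdq_12': 0},
]

def dependence_probability(models, var0, var1):
    """Fraction of models in which var0 and var1 are in the same view."""
    same_view = sum(1 for m in models if m[var0] == m[var1])
    return same_view / float(len(models))

# Pairwise table, analogous in spirit to the (name0, name1, depprob) rows.
depprob = {
    (a, b): dependence_probability(models, a, b)
    for a, b in combinations(sorted(models[0]), 2)
}
```

With more models in the ensemble, the fraction becomes a finer-grained estimate, which is one reason to aggregate over many models rather than trusting a single one.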

We summarize the `dependencies` table using an interactive heatmap. Study this dependence heatmap: which common-sense dependencies between questionnaire items does the ensemble identify as probably dependent?

Find a full list of questionnaire items [here.](questionnaires-items.ipynb)

In [None]:
%bql .interactive_heatmap SELECT name0, name1, depprob FROM dependencies;

In [None]:
%sql SELECT COUNT(symptomsofcruelty) FROM raw_questionnaire_responses WHERE symptomsofcruelty > 0;

In [None]:
%sql SELECT COUNT(symptomsofsuicide) FROM raw_questionnaire_responses WHERE symptomsofsuicide > 0;

In [None]:
%sql SELECT COUNT (anxiety) FROM raw_questionnaire_responses WHERE anxiety=='True';

In [None]:
%sql SELECT COUNT (usedisorders) FROM raw_questionnaire_responses WHERE usedisorders=='True';

In [None]:
%sql SELECT COUNT (compulsions) FROM raw_questionnaire_responses WHERE compulsions=='True';

---

In [6]:
mi_str ='''
ESTIMATE MUTUAL INFORMATION OF
    "{column}" WITH "{outcome}"
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"
'''

In [7]:
cmi_str ='''
ESTIMATE MUTUAL INFORMATION OF
    "{column}" WITH "{outcome}"
    GIVEN ({already_selected_columns})
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"
'''
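
The two query templates ask BayesDB to estimate mutual information, with and without conditioning, from the models. As a reminder of what that quantity measures, the sketch below computes mutual information exactly for a tiny discrete joint distribution. It is a textbook calculation in plain Python (natural logarithm, so the result is in nats), not the estimator BayesDB uses; the two example distributions are made up.

```python
import math
from collections import defaultdict

def mutual_information(joint):
    """Mutual information I(X;Y) in nats.

    `joint` maps (x, y) pairs to probabilities that sum to 1.
    """
    # Marginal distributions of X and Y.
    px = defaultdict(float)
    py = defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    # Sum p(x, y) * log(p(x, y) / (p(x) p(y))) over cells with p > 0.
    return sum(
        p * math.log(p / (px[x] * py[y]))
        for (x, y), p in joint.items()
        if p > 0
    )

# Two independent fair coins: knowing one says nothing about the other.
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
# Two identical fair coins: knowing one fully determines the other.
identical = {(0, 0): 0.5, (1, 1): 0.5}

mi_independent = mutual_information(independent)
mi_identical = mutual_information(identical)
```

The independent pair has zero mutual information, while the identical pair reaches the entropy of a fair coin, `log(2)` nats. Conditioning on already selected columns, as in `cmi_str`, asks how much of that information remains once those columns are known.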

In [8]:
def search_for_n_nonredundant_predictors(n, outcome, potential_predictors):
    """Return the n most relevant, non-redundant predictors for an outcome."""
    # Turn the dataframe with potential predictors into a list.
    columns = potential_predictors['column'].values.tolist()
    # Initialize the list of non-redundant predictors.
    nonredundant_predictors = []
    # Repeat until we have found n non-redundant predictors
    # (or until we run out of candidate columns).
    while len(nonredundant_predictors) < n and columns:
        # Initialize the mutual information values for this round.
        mi_values = []
        for column in columns:
            if not nonredundant_predictors:
                # No predictors found yet: run unconditional mutual information.
                current_query = mi_str.format(column=column, outcome=outcome)
            else:
                # Otherwise, condition the query on what we found previously.
                # Turn the list of found predictors into a string,
                # e.g. ['a', 'b', 'c'] => '"a","b","c"'.
                already_selected_columns = ','.join([
                    '"%s"' % col for col in nonredundant_predictors
                ])
                current_query = cmi_str.format(
                    column=column,
                    outcome=outcome,
                    already_selected_columns=already_selected_columns
                )
            # Print the query we are going to execute.
            print(current_query)
            print('')
            # Run query with BayesDB magics.
            df = %bql {current_query}
            # Turn the dataframe into an entry in a list.
            mi_values.append(df['mi'].values[0])
        # Find the maximally informative column and select it.
        selected_column = columns[np.argmax(mi_values)]
        # Add the selected column to the list of non-redundant predictors.
        nonredundant_predictors.append(selected_column)
        # Remove the selected column from the list of columns.
        columns.remove(selected_column)
    return nonredundant_predictors
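
The function above is a greedy forward selection: in every round it scores each remaining candidate given what has already been chosen, and keeps the best one. The sketch below isolates that control flow so it can be run without BayesDB. The scoring function is a made-up stand-in for conditional mutual information, and the names `greedy_select`, `toy_score`, `relevance`, and `group` are hypothetical.

```python
def greedy_select(n, candidates, score):
    """Pick up to n candidates, each maximizing score(candidate, selected)."""
    remaining = list(candidates)
    selected = []
    while len(selected) < n and remaining:
        best = max(remaining, key=lambda c: score(c, selected))
        selected.append(best)
        remaining.remove(best)
    return selected

# Made-up relevance of each item, and groups of mutually redundant items.
relevance = {'a1': 0.9, 'a2': 0.8, 'b1': 0.5, 'c1': 0.3}
group = {'a1': 'a', 'a2': 'a', 'b1': 'b', 'c1': 'c'}

def toy_score(candidate, selected):
    # Stand-in for conditional mutual information: an item whose group is
    # already represented among the selected items adds no new information.
    if any(group[s] == group[candidate] for s in selected):
        return 0.0
    return relevance[candidate]

picked = greedy_select(3, ['a1', 'a2', 'b1', 'c1'], toy_score)
```

Here `a2` is individually the second most relevant item, but it is skipped because `a1` already covers the same information. This is the behavior the conditional query is meant to produce for near-duplicate questionnaire items.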

In [9]:
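def get_predictions(outcome):
    """Print and return 4 non-redundant predictors for an outcome."""
    # Candidates: up to 20 variables whose dependence probability with the
    # outcome is above 0.5, ordered from most to least probably dependent.
    potential_predictors = %bql SELECT name0 AS "column" FROM dependencies \
        WHERE ((name1 = {outcome}) AND (depprob > 0.5) AND (name1 != name0)) ORDER BY depprob DESC LIMIT 20;
    predictors = search_for_n_nonredundant_predictors(
        4,
        outcome,
        potential_predictors
    )
    print('================')
    for pred in predictors:
        print(pred)
    return predictors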

In [10]:
predictors_suicide =  get_predictions('symptomsofsuicide')


ESTIMATE MUTUAL INFORMATION OF
    "mfq_p_16" WITH "symptomsofsuicide"
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "mfq_p_17" WITH "symptomsofsuicide"
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "mfq_p_19" WITH "symptomsofsuicide"
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "scq_17" WITH "symptomsofsuicide"
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "cbcl_18" WITH "symptomsofsuicide"
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "mfq_p_15" WITH "symptomsofsuicide"
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "mfq_p_18" WITH "symptomsofsuicide"
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_po


ESTIMATE MUTUAL INFORMATION OF
    "mfq_p_01" WITH "symptomsofsuicide"
    GIVEN ("mfq_p_16","cbcl_18")
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "mfq_p_02" WITH "symptomsofsuicide"
    GIVEN ("mfq_p_16","cbcl_18")
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "mfq_p_03" WITH "symptomsofsuicide"
    GIVEN ("mfq_p_16","cbcl_18")
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "mfq_p_08" WITH "symptomsofsuicide"
    GIVEN ("mfq_p_16","cbcl_18")
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "mfq_p_09" WITH "symptomsofsuicide"
    GIVEN ("mfq_p_16","cbcl_18")
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "mfq_p_12" WITH "symptomsofsuicide"
    GIVEN ("mfq_p_16","cbcl_18")
    U

In [11]:
predictors_cruelty =  get_predictions('symptomsofcruelty')


ESTIMATE MUTUAL INFORMATION OF
    "sdq_12" WITH "symptomsofcruelty"
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "sdq_22" WITH "symptomsofcruelty"
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "cbcl_81" WITH "symptomsofcruelty"
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "cbcl_82" WITH "symptomsofcruelty"
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "cbcl_39" WITH "symptomsofcruelty"
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "sdq_18" WITH "symptomsofcruelty"
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "cbcl_106" WITH "symptomsofcruelty"
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_populati


ESTIMATE MUTUAL INFORMATION OF
    "cbcl_63" WITH "symptomsofcruelty"
    GIVEN ("cbcl_28","cbcl_43")
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "sdq_07" WITH "symptomsofcruelty"
    GIVEN ("cbcl_28","cbcl_43")
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "cbcl_21" WITH "symptomsofcruelty"
    GIVEN ("cbcl_28","cbcl_43")
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "cbcl_22" WITH "symptomsofcruelty"
    GIVEN ("cbcl_28","cbcl_43")
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "cbcl_23" WITH "symptomsofcruelty"
    GIVEN ("cbcl_28","cbcl_43")
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "sdq_22" WITH "symptomsofcruelty"
    GIVEN ("cbcl_28","cbcl_43","sdq_12")
    USING 

In [12]:
predictors_anxiety =  get_predictions('anxiety')


ESTIMATE MUTUAL INFORMATION OF
    "scared_p_05" WITH "anxiety"
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "scared_p_14" WITH "anxiety"
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "scared_p_17" WITH "anxiety"
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "scared_p_21" WITH "anxiety"
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "scared_p_23" WITH "anxiety"
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "scared_p_28" WITH "anxiety"
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "scared_p_33" WITH "anxiety"
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
  


ESTIMATE MUTUAL INFORMATION OF
    "mfq_p_01" WITH "anxiety"
    GIVEN ("scared_p_21","sdq_08")
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "mfq_p_02" WITH "anxiety"
    GIVEN ("scared_p_21","sdq_08")
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "mfq_p_08" WITH "anxiety"
    GIVEN ("scared_p_21","sdq_08")
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "scared_p_05" WITH "anxiety"
    GIVEN ("scared_p_21","sdq_08","cbcl_112")
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "scared_p_14" WITH "anxiety"
    GIVEN ("scared_p_21","sdq_08","cbcl_112")
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "scared_p_17" WITH "anxiety"
    GIVEN ("scared_p_21","sdq_08","cbcl_112")
    USING 2

In [13]:
predictors_compulsions =  get_predictions('compulsions')


ESTIMATE MUTUAL INFORMATION OF
    "cdi2_03" WITH "compulsions"
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "cdi2_05" WITH "compulsions"
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "cdi2_22" WITH "compulsions"
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "cis_sr_07" WITH "compulsions"
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "cis_sr_10" WITH "compulsions"
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "apq_p_38" WITH "compulsions"
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "apq_sr_24" WITH "compulsions"
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATIO


ESTIMATE MUTUAL INFORMATION OF
    "ari_s_07" WITH "compulsions"
    GIVEN ("c3sr_01","cdi2_05")
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "c3sr_02" WITH "compulsions"
    GIVEN ("c3sr_01","cdi2_05")
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "c3sr_03" WITH "compulsions"
    GIVEN ("c3sr_01","cdi2_05")
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "c3sr_04" WITH "compulsions"
    GIVEN ("c3sr_01","cdi2_05")
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "cdi2_03" WITH "compulsions"
    GIVEN ("c3sr_01","cdi2_05","ari_s_07")
    USING 20 SAMPLES AS mi
    BY "questionnaire_responses_population"



ESTIMATE MUTUAL INFORMATION OF
    "cdi2_22" WITH "compulsions"
    GIVEN ("c3sr_01","cdi2_05","ari_s_07")
    USING 20 SAMPLES AS mi
   

In [None]:
predictors_usedisorders = get_predictions('usedisorders')