# Filter literature feature sets.

Candidate feature sets inspired by a literature review are filtered based on their entropy. Further details are in the parent notebook, "UNSEEN filter feature sets.ipynb".
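As a rough illustration of what an entropy-based filter can look like, the sketch below computes the Shannon entropy of a binary feature set and keeps only those above a cut-off. The threshold value, the toy data, and the names `shannon_entropy`, `ENTROPY_THRESHOLD`, and `toy_feature_sets` are assumptions made for this example; the criterion actually applied by `featuresetmi` is defined in the helper notebook and is not shown here.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical threshold, chosen only for this illustration.
ENTROPY_THRESHOLD = 0.05

# Hypothetical toy feature sets: one entry per patient, True if the feature is present.
toy_feature_sets = {
    "rare_feature": [True] + [False] * 999,
    "common_feature": [True] * 300 + [False] * 700,
}

# Keep only the feature sets whose entropy reaches the threshold.
kept = {name: values for name, values in toy_feature_sets.items()
        if shannon_entropy(values) >= ENTROPY_THRESHOLD}
print(sorted(kept))
```

A feature that is almost always absent (or almost always present) carries little information, so its entropy is close to zero and it is dropped under this kind of rule.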

## Imports and helper functions

In [106]:
%run 'UNSEEN_helper_functions.ipynb'
%store -r

## Load literature feature-set array

Here, we run the notebook that creates the literature feature-set array. We then assign the array to `my_featureSet_array` so that the remaining code in this notebook is the same for every feature-set source.

It is assumed that the caseness variables have already been created in the parent notebook.

In [107]:
%%capture
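# Build the literature feature-set array only if it is not already in memory,
# then store it so that other notebooks can reuse it.
if 'fs_literature' not in globals():
    %run ./"UNSEEN_create_literature_feature_sets.ipynb"
    %store fs_literature
# Use a source-agnostic name so the cells below work for any feature-set source.
my_featureSet_array = fs_literature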
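The assumption that the caseness variables already exist can be checked before any filtering is run. The sketch below is one possible guard; the helper name `missing_columns` and the constant `REQUIRED_CASENESS_COLUMNS` are hypothetical, and the column names are taken from the `featuresetmi` calls in this notebook. It also assumes that `caseness_array` exposes its column names as an iterable (for example, a pandas DataFrame's `.columns`).

```python
# Column names used by the featuresetmi calls in this notebook.
REQUIRED_CASENESS_COLUMNS = [
    "person_id", "CMHD", "CMHD_dx_and_rx", "CMHD_rx_not_dx", "CMHD_control",
]

def missing_columns(available, required=REQUIRED_CASENESS_COLUMNS):
    """Return the required column names that are absent from `available`."""
    available = set(available)
    return [name for name in required if name not in available]

# In the notebook this might be called as: missing_columns(caseness_array.columns)
example = missing_columns(["person_id", "CMHD", "CMHD_control"])
print(example)
```

An empty list means every caseness column is present; a non-empty list suggests the parent notebook has not yet been run.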

## Filter feature sets.

### 1. Mutual information of individual feature sets and the caseness variables.

In [108]:
# Set the order of the composite: 1 = individual, 2 = pair, 3 = triplet.
m = 1
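For orientation, the quantity being calculated for each feature set is its mutual information with a caseness variable. The following is a minimal sketch of that calculation for two discrete sequences, using only the standard library; the function name `mutual_information` and the toy vectors are assumptions for illustration, and this is not the implementation inside `featuresetmi`.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))   # joint counts of (x, y) pairs
    px = Counter(x)              # marginal counts of x
    py = Counter(y)              # marginal counts of y
    mi = 0.0
    for (xi, yi), c in joint.items():
        p_xy = c / n
        mi += p_xy * math.log2(p_xy / ((px[xi] / n) * (py[yi] / n)))
    return mi

# Hypothetical toy data: four patients, one binary feature, one binary caseness flag.
caseness = [1, 1, 0, 0]
informative_feature = [1, 1, 0, 0]    # matches caseness exactly
uninformative_feature = [1, 0, 1, 0]  # independent of caseness

print(mutual_information(informative_feature, caseness))
print(mutual_information(uninformative_feature, caseness))
```

A feature that perfectly tracks a balanced binary caseness variable shares one bit with it, while an independent feature shares none.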

#### 1.1. Multinomial caseness

##### 1.1.1. ALL representation.

In [109]:
featuresetmi(featureSet_array = my_featureSet_array,
             casenessVector = caseness_array[['person_id','CMHD']],
             m = m,
             representation = 'all',
             source = 'literature')


No save location provided.
...Defaulting to ~/Mutual information saves/Individuals



****************************************
Calculating mutual information values...


100%|██████████| 5/5 [00:00<00:00,  7.95it/s]

...

1 batch(es) of feature sets processed.
5 / 5 feature sets dropped due to low entropy.
****************************************





##### 1.1.2. MULTI representation.

In [110]:
featuresetmi(featureSet_array = my_featureSet_array,
             casenessVector = caseness_array[['person_id','CMHD']],
             m = m,
             representation = 'multi',
             source = 'literature')


No save location provided.
...Defaulting to ~/Mutual information saves/Individuals



****************************************
Calculating mutual information values...


100%|██████████| 5/5 [00:00<00:00,  5.03it/s]

...

1 batch(es) of feature sets processed.
5 / 5 feature sets dropped due to low entropy.
****************************************





#### 1.2. Definitive caseness

##### 1.2.1. ALL representation.

In [111]:
featuresetmi(featureSet_array = my_featureSet_array,
             casenessVector = caseness_array[['person_id','CMHD_dx_and_rx']],
             m = m,
             representation = 'all',
             source = 'literature')


No save location provided.
...Defaulting to ~/Mutual information saves/Individuals



****************************************
Calculating mutual information values...


100%|██████████| 5/5 [00:00<00:00,  9.24it/s]

/home/jupyter/UNSEEN/c-mcinerney-workspaceMutual information saves/Individuals
...

1 batch(es) of feature sets processed.
4 / 5 feature sets dropped due to low entropy.
****************************************





##### 1.2.2. MULTI representation.

In [112]:
featuresetmi(featureSet_array = my_featureSet_array,
             casenessVector = caseness_array[['person_id','CMHD_dx_and_rx']],
             m = m,
             representation = 'multi',
             source = 'literature')


No save location provided.
...Defaulting to ~/Mutual information saves/Individuals



****************************************
Calculating mutual information values...


100%|██████████| 5/5 [00:00<00:00,  5.92it/s]

/home/jupyter/UNSEEN/c-mcinerney-workspaceMutual information saves/Individuals
...

1 batch(es) of feature sets processed.
4 / 5 feature sets dropped due to low entropy.
****************************************





#### 1.3. Possible caseness

##### 1.3.1. ALL representation.

In [113]:
featuresetmi(featureSet_array = my_featureSet_array,
             casenessVector = caseness_array[['person_id','CMHD_rx_not_dx']],
             m = m,
             representation = 'all',
             source = 'literature')


No save location provided.
...Defaulting to ~/Mutual information saves/Individuals



****************************************
Calculating mutual information values...


100%|██████████| 5/5 [00:00<00:00,  8.70it/s]

...

1 batch(es) of feature sets processed.
5 / 5 feature sets dropped due to low entropy.
****************************************





##### 1.3.2. MULTI representation.

In [114]:
featuresetmi(featureSet_array = my_featureSet_array,
             casenessVector = caseness_array[['person_id','CMHD_rx_not_dx']],
             m = m,
             representation = 'multi',
             source = 'literature')


No save location provided.
...Defaulting to ~/Mutual information saves/Individuals



****************************************
Calculating mutual information values...


100%|██████████| 5/5 [00:00<00:00,  5.87it/s]

...

1 batch(es) of feature sets processed.
5 / 5 feature sets dropped due to low entropy.
****************************************





#### 1.4. No caseness (i.e. control group)

##### 1.4.1. ALL representation.

In [115]:
featuresetmi(featureSet_array = my_featureSet_array,
             casenessVector = caseness_array[['person_id','CMHD_control']],
             m = m,
             representation = 'all',
             source = 'literature')


No save location provided.
...Defaulting to ~/Mutual information saves/Individuals



****************************************
Calculating mutual information values...


100%|██████████| 5/5 [00:00<00:00,  9.25it/s]

...

1 batch(es) of feature sets processed.
5 / 5 feature sets dropped due to low entropy.
****************************************





##### 1.4.2. MULTI representation.

In [116]:
featuresetmi(featureSet_array = my_featureSet_array,
             casenessVector = caseness_array[['person_id','CMHD_control']],
             m = m,
             representation = 'multi',
             source = 'literature')


No save location provided.
...Defaulting to ~/Mutual information saves/Individuals



****************************************
Calculating mutual information values...


100%|██████████| 5/5 [00:00<00:00,  5.04it/s]

...

1 batch(es) of feature sets processed.
5 / 5 feature sets dropped due to low entropy.
****************************************





### 2. Mutual information of pair-composite feature sets and the caseness variables.

In [117]:
# Set the order of the composite: 1 = individual, 2 = pair, 3 = triplet.
m = 2
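The order `m` determines how many composite feature sets are evaluated. Assuming composites are formed as unordered combinations of the individual feature sets (an assumption about `featuresetmi`, not something shown in this notebook), five individual feature sets give 5, 10, and 10 composites for `m` = 1, 2, and 3, which agrees with the progress-bar totals in the outputs. The feature-set names below are hypothetical placeholders.

```python
from itertools import combinations

# Hypothetical placeholder names; the real names come from fs_literature.
feature_set_names = ["fs_a", "fs_b", "fs_c", "fs_d", "fs_e"]

# Number of unordered composites for each order m.
counts = {m: len(list(combinations(feature_set_names, m))) for m in (1, 2, 3)}
print(counts)

# The pair composites themselves, for inspection.
pairs = list(combinations(feature_set_names, 2))
print(pairs[:3])
```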

#### 2.1. Multinomial caseness

##### 2.1.1. ALL representation.

In [118]:
featuresetmi(featureSet_array = my_featureSet_array,
             casenessVector = caseness_array[['person_id','CMHD']],
             m = m,
             representation = 'all',
             source = 'literature')


No save location provided.
...Defaulting to ~/Mutual information saves/Pairs



****************************************
Calculating mutual information values...


100%|██████████| 10/10 [00:01<00:00,  7.48it/s]

...

1 batch(es) of feature sets processed.
10 / 10 feature sets dropped due to low entropy.
****************************************





##### 2.1.2. MULTI representation.

In [119]:
featuresetmi(featureSet_array = my_featureSet_array,
             casenessVector = caseness_array[['person_id','CMHD']],
             m = m,
             representation = 'multi',
             source = 'literature')


No save location provided.
...Defaulting to ~/Mutual information saves/Pairs



****************************************
Calculating mutual information values...


100%|██████████| 10/10 [00:02<00:00,  4.66it/s]

...

1 batch(es) of feature sets processed.
10 / 10 feature sets dropped due to low entropy.
****************************************





#### 2.2. Definitive caseness

##### 2.2.1. ALL representation.

In [120]:
featuresetmi(featureSet_array = my_featureSet_array,
             casenessVector = caseness_array[['person_id','CMHD_dx_and_rx']],
             m = m,
             representation = 'all',
             source = 'literature')


No save location provided.
...Defaulting to ~/Mutual information saves/Pairs



****************************************
Calculating mutual information values...


100%|██████████| 10/10 [00:01<00:00,  7.59it/s]

...

1 batch(es) of feature sets processed.
10 / 10 feature sets dropped due to low entropy.
****************************************





##### 2.2.2. MULTI representation.

In [121]:
featuresetmi(featureSet_array = my_featureSet_array,
             casenessVector = caseness_array[['person_id','CMHD_dx_and_rx']],
             m = m,
             representation = 'multi',
             source = 'literature')


No save location provided.
...Defaulting to ~/Mutual information saves/Pairs



****************************************
Calculating mutual information values...


100%|██████████| 10/10 [00:02<00:00,  4.92it/s]

/home/jupyter/UNSEEN/c-mcinerney-workspaceMutual information saves/Pairs
...

1 batch(es) of feature sets processed.
6 / 10 feature sets dropped due to low entropy.
****************************************





#### 2.3. Possible caseness

##### 2.3.1. ALL representation.

In [122]:
featuresetmi(featureSet_array = my_featureSet_array,
             casenessVector = caseness_array[['person_id','CMHD_rx_not_dx']],
             m = m,
             representation = 'all',
             source = 'literature')


No save location provided.
...Defaulting to ~/Mutual information saves/Pairs



****************************************
Calculating mutual information values...


100%|██████████| 10/10 [00:01<00:00,  7.79it/s]

...

1 batch(es) of feature sets processed.
10 / 10 feature sets dropped due to low entropy.
****************************************





##### 2.3.2. MULTI representation.

In [123]:
featuresetmi(featureSet_array = my_featureSet_array,
             casenessVector = caseness_array[['person_id','CMHD_rx_not_dx']],
             m = m,
             representation = 'multi',
             source = 'literature')


No save location provided.
...Defaulting to ~/Mutual information saves/Pairs



****************************************
Calculating mutual information values...


100%|██████████| 10/10 [00:02<00:00,  4.74it/s]

...

1 batch(es) of feature sets processed.
10 / 10 feature sets dropped due to low entropy.
****************************************





#### 2.4. No caseness (i.e. control group)

##### 2.4.1. ALL representation.

In [124]:
featuresetmi(featureSet_array = my_featureSet_array,
             casenessVector = caseness_array[['person_id','CMHD_control']],
             m = m,
             representation = 'all',
             source = 'literature')


No save location provided.
...Defaulting to ~/Mutual information saves/Pairs



****************************************
Calculating mutual information values...


100%|██████████| 10/10 [00:01<00:00,  7.86it/s]

...

1 batch(es) of feature sets processed.
10 / 10 feature sets dropped due to low entropy.
****************************************





##### 2.4.2. MULTI representation.

In [125]:
featuresetmi(featureSet_array = my_featureSet_array,
             casenessVector = caseness_array[['person_id','CMHD_control']],
             m = m,
             representation = 'multi',
             source = 'literature')


No save location provided.
...Defaulting to ~/Mutual information saves/Pairs



****************************************
Calculating mutual information values...


100%|██████████| 10/10 [00:02<00:00,  4.83it/s]

...

1 batch(es) of feature sets processed.
10 / 10 feature sets dropped due to low entropy.
****************************************





### 3. Mutual information of triplet-composite feature sets and the caseness variables.

In [126]:
# Set the order of the composite: 1 = individual, 2 = pair, 3 = triplet.
m = 3

#### 3.1. Multinomial caseness

##### 3.1.1. ALL representation.

In [127]:
featuresetmi(featureSet_array = my_featureSet_array,
             casenessVector = caseness_array[['person_id','CMHD']],
             m = m,
             representation = 'all',
             source = 'literature')


No save location provided.
...Defaulting to ~/Mutual information saves/Triplets



****************************************
Calculating mutual information values...


100%|██████████| 10/10 [00:01<00:00,  7.69it/s]

...

1 batch(es) of feature sets processed.
10 / 10 feature sets dropped due to low entropy.
****************************************





##### 3.1.2. MULTI representation.

In [128]:
featuresetmi(featureSet_array = my_featureSet_array,
             casenessVector = caseness_array[['person_id','CMHD']],
             m = m,
             representation = 'multi',
             source = 'literature')


No save location provided.
...Defaulting to ~/Mutual information saves/Triplets



****************************************
Calculating mutual information values...


100%|██████████| 10/10 [00:02<00:00,  4.25it/s]

...

1 batch(es) of feature sets processed.
10 / 10 feature sets dropped due to low entropy.
****************************************





#### 3.2. Definitive caseness

##### 3.2.1. ALL representation.

In [129]:
featuresetmi(featureSet_array = my_featureSet_array,
             casenessVector = caseness_array[['person_id','CMHD_dx_and_rx']],
             m = m,
             representation = 'all',
             source = 'literature')


No save location provided.
...Defaulting to ~/Mutual information saves/Triplets



****************************************
Calculating mutual information values...


100%|██████████| 10/10 [00:01<00:00,  7.71it/s]

...

1 batch(es) of feature sets processed.
10 / 10 feature sets dropped due to low entropy.
****************************************





##### 3.2.2. MULTI representation.

In [130]:
featuresetmi(featureSet_array = my_featureSet_array,
             casenessVector = caseness_array[['person_id','CMHD_dx_and_rx']],
             m = m,
             representation = 'multi',
             source = 'literature')


No save location provided.
...Defaulting to ~/Mutual information saves/Triplets



****************************************
Calculating mutual information values...


100%|██████████| 10/10 [00:02<00:00,  4.31it/s]

/home/jupyter/UNSEEN/c-mcinerney-workspaceMutual information saves/Triplets
...

1 batch(es) of feature sets processed.
4 / 10 feature sets dropped due to low entropy.
****************************************





#### 3.3. Possible caseness

##### 3.3.1. ALL representation.

In [131]:
featuresetmi(featureSet_array = my_featureSet_array,
             casenessVector = caseness_array[['person_id','CMHD_rx_not_dx']],
             m = m,
             representation = 'all',
             source = 'literature')


No save location provided.
...Defaulting to ~/Mutual information saves/Triplets



****************************************
Calculating mutual information values...


100%|██████████| 10/10 [00:01<00:00,  7.71it/s]

...

1 batch(es) of feature sets processed.
10 / 10 feature sets dropped due to low entropy.
****************************************





##### 3.3.2. MULTI representation.

In [132]:
featuresetmi(featureSet_array = my_featureSet_array,
             casenessVector = caseness_array[['person_id','CMHD_rx_not_dx']],
             m = m,
             representation = 'multi',
             source = 'literature')


No save location provided.
...Defaulting to ~/Mutual information saves/Triplets



****************************************
Calculating mutual information values...


100%|██████████| 10/10 [00:02<00:00,  4.08it/s]

...

1 batch(es) of feature sets processed.
10 / 10 feature sets dropped due to low entropy.
****************************************





#### 3.4. No caseness (i.e. control group)

##### 3.4.1. ALL representation.

In [133]:
featuresetmi(featureSet_array = my_featureSet_array,
             casenessVector = caseness_array[['person_id','CMHD_control']],
             m = m,
             representation = 'all',
             source = 'literature')


No save location provided.
...Defaulting to ~/Mutual information saves/Triplets



****************************************
Calculating mutual information values...


100%|██████████| 10/10 [00:01<00:00,  5.41it/s]

...

1 batch(es) of feature sets processed.
10 / 10 feature sets dropped due to low entropy.
****************************************





##### 3.4.2. MULTI representation.

In [134]:
featuresetmi(featureSet_array = my_featureSet_array,
             casenessVector = caseness_array[['person_id','CMHD_control']],
             m = m,
             representation = 'multi',
             source = 'literature')


No save location provided.
...Defaulting to ~/Mutual information saves/Triplets



****************************************
Calculating mutual information values...


100%|██████████| 10/10 [00:03<00:00,  3.15it/s]

...

1 batch(es) of feature sets processed.
10 / 10 feature sets dropped due to low entropy.
****************************************
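The 24 calls in this notebook differ only in the composite order, the caseness column, and the representation. One way to express that grid more compactly is sketched below; the names `ORDERS`, `CASENESS_COLUMNS`, `REPRESENTATIONS`, and `filter_grid` are hypothetical, and the commented-out loop assumes `featuresetmi`, `my_featureSet_array`, and `caseness_array` are available as they are in the cells above.

```python
from itertools import product

ORDERS = [1, 2, 3]  # 1 = individual, 2 = pair, 3 = triplet
CASENESS_COLUMNS = ["CMHD", "CMHD_dx_and_rx", "CMHD_rx_not_dx", "CMHD_control"]
REPRESENTATIONS = ["all", "multi"]

def filter_grid():
    """Yield (m, caseness_column, representation) in the order used in this notebook."""
    for m, column, representation in product(ORDERS, CASENESS_COLUMNS, REPRESENTATIONS):
        yield m, column, representation

grid = list(filter_grid())
print(len(grid))  # 3 orders x 4 caseness columns x 2 representations

# In the notebook, each combination could then be passed to featuresetmi:
# for m, column, representation in grid:
#     featuresetmi(featureSet_array = my_featureSet_array,
#                  casenessVector = caseness_array[['person_id', column]],
#                  m = m,
#                  representation = representation,
#                  source = 'literature')
```

Keeping the cells separate, as this notebook does, makes each output easier to inspect; the loop form trades that for less repetition.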



