This notebook shows how to run the LR, BMS and LSTM models from [Predicting pregnancy using large-scale data from a women's health tracking mobile application](https://arxiv.org/abs/1812.02222) on generated data.

Due to privacy concerns, we do not release the actual user data from the Clue app. Instead, we provide a synthetic data generator that assumes fertility changes over the course of a cycle and samples symptoms (sexual activity and emotions) to obtain a pregnancy probability. See `data_gen.py` for details.
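
To make the idea concrete, here is a minimal sketch of such a generator. It is not the code in `data_gen.py`: the bell-shaped fertility curve, the 28-day cycle, the per-event risk values and all function names are assumptions chosen for illustration only.

```python
import math
import random

# All constants below are hypothetical; data_gen.py may model this differently.
CYCLE_LENGTH = 28
PEAK_DAY = 14
PEAK_WIDTH = 2.0
SEX_RISK = {"unprotected_sex": 0.3, "withdrawl_sex": 0.1}
EMOTIONS = ["emotion_happy", "emotion_neutral", "emotion_sad"]


def fertility(day):
    """Assumed bell-shaped fertility in [0, 1], peaking at PEAK_DAY."""
    return math.exp(-((day - PEAK_DAY) ** 2) / (2 * PEAK_WIDTH ** 2))


def pregnancy_probability(symptoms):
    """Treat each sex event as an independent chance: 1 - prod(1 - f * risk)."""
    p_not_pregnant = 1.0
    for day, name in symptoms:
        risk = SEX_RISK.get(name, 0.0)  # emotions carry no risk in this sketch
        p_not_pregnant *= 1.0 - fertility(day) * risk
    return 1.0 - p_not_pregnant


def sample_cycle(rng, p_log=0.3):
    """Sample one (symptoms, label) pair; at most one symptom per day here."""
    names = list(SEX_RISK) + EMOTIONS
    symptoms = [
        [day, rng.choice(names)]
        for day in range(CYCLE_LENGTH)
        if rng.random() < p_log
    ]
    label = int(rng.random() < pregnancy_probability(symptoms))
    return symptoms, label


rng = random.Random(0)
cycles = [sample_cycle(rng) for _ in range(5)]
```

In this sketch, sex logged near the assumed fertility peak raises the pregnancy probability the most, while emotions are sampled but do not affect the label.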

In [1]:
from data_gen import genCycleData, convertToNdarray, splitAndSave

First, we generate 10000 cycles using the data generator.

In [2]:
data = genCycleData(10000)

Each data entry consists of an array of symptom tuples (day_in_cycle, symptom_name) and a binary label indicating whether the user becomes pregnant during this cycle. For example, the first data entry is printed as follows:

In [3]:
print(data[0])

([[0, 'emotion_sad'], [5, 'emotion_neutral'], [6, 'unprotected_sex'], [7, 'emotion_neutral'], [8, 'emotion_neutral'], [15, 'withdrawl_sex'], [16, 'emotion_happy'], [16, 'unprotected_sex'], [18, 'emotion_sad'], [23, 'emotion_sad']], 0)
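
One way to picture how such an entry could feed a model is a day-by-symptom multi-hot matrix. The sketch below is only an illustration: the real encoding is whatever `convertToNdarray` produces, and the vocabulary (limited to the names visible in the entry above), the 28-day length and the function name `to_multi_hot` are assumptions.

```python
# Hypothetical vocabulary: only the symptom names visible in the printed entry.
SYMPTOMS = [
    "emotion_happy",
    "emotion_neutral",
    "emotion_sad",
    "unprotected_sex",
    "withdrawl_sex",
]
INDEX = {name: i for i, name in enumerate(SYMPTOMS)}


def to_multi_hot(symptoms, num_days=28):
    """Return a num_days x len(SYMPTOMS) matrix of 0/1 flags."""
    matrix = [[0] * len(SYMPTOMS) for _ in range(num_days)]
    for day, name in symptoms:
        # Skip days beyond the assumed cycle length and unknown symptom names.
        if 0 <= day < num_days and name in INDEX:
            matrix[day][INDEX[name]] = 1
    return matrix


# The first data entry, copied from the output above.
entry = (
    [[0, 'emotion_sad'], [5, 'emotion_neutral'], [6, 'unprotected_sex'],
     [7, 'emotion_neutral'], [8, 'emotion_neutral'], [15, 'withdrawl_sex'],
     [16, 'emotion_happy'], [16, 'unprotected_sex'], [18, 'emotion_sad'],
     [23, 'emotion_sad']],
    0,
)
features = to_multi_hot(entry[0])
label = entry[1]
```

Day 16 has two logged symptoms, so its row has two flags set; days without any log stay all zeros.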


Next, we convert, split and store the generated data into train / dev / test files as required by the models.

In [4]:
dataNpy = convertToNdarray(data)
splitAndSave(dataNpy, 0.8, 0.1, 0.1, "./data/generated/")
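
For intuition, the call above asks for an 80% / 10% / 10% split. The sketch below shows one common way to produce such a split by shuffling indices; it is an assumption about the general technique, not a description of what `splitAndSave` does internally, and `split_indices` and its `seed` parameter are hypothetical names.

```python
import random


def split_indices(n, train_frac, dev_frac, test_frac, seed=0):
    """Shuffle range(n) and cut it into train / dev / test index lists."""
    if abs(train_frac + dev_frac + test_frac - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(n * train_frac)
    n_dev = round(n * dev_frac)
    train = indices[:n_train]
    dev = indices[n_train:n_train + n_dev]
    test = indices[n_train + n_dev:]  # the remainder goes to test
    return train, dev, test


train_idx, dev_idx, test_idx = split_indices(10000, 0.8, 0.1, 0.1)
```

With 10000 cycles this yields 8000 training, 1000 dev and 1000 test examples, with no index shared between the three sets.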

Finally, we can fit a model by running the entry-point script `main.py`.

As an example, we fit a logistic regression (LR) model for 50 epochs on our generated data:

In [5]:
! python main.py \
    --experiment_name='Demo' \
    --data_dir='./data/' \
    --folder_name='generated' \
    --tb_dir='./experiments/' \
    --model='LR' \
    --num_epochs=50 \
    --batch_size=500
    

  from ._conv import register_converters as _register_converters
INFO:root:Prediction on per-cycle basis, using data:
	training data: ./data/generated/train, 
	dev data: ./data/generated/dev/dev.npy, 
	test data: ./data/generated/test/test.npy.
2019-02-21 17:17:49.049790: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Summary name LogisticRegression/fully_connected/weights:0 is illegal; using LogisticRegression/fully_connected/weights_0 instead.
INFO:tensorflow:Summary name LogisticRegression/fully_connected/weights:0 is illegal; using LogisticRegression/fully_connected/weights_0 instead.
INFO:tensorflow:Summary name LogisticRegression/fully_connected/biases:0 is illegal; using LogisticRegression/fully_connected/biases_0 instead.
INFO:tensorflow:Summary name LogisticRegression

INFO:root:Epoch 13 took 0.12 seconds: 0.01 seconds for loading data
INFO:root:Epoch: 14, Iter: 230, (part) train loss: 0.3977, (part) train auc: 0.833, (part) train auprc: 0.497, dev loss: 0.3311, dev auc: 0.817, dev auprc: 0.458
INFO:root:	Best dev auc so far: 0.817! Saving best checkpoint to ./experiments/Demo/best_checkpoint/best.ckpt...
INFO:root:Epoch: 14, Iter: 240, (part) train loss: 0.3838, (part) train auc: 0.840, (part) train auprc: 0.511, dev loss: 0.3300, dev auc: 0.823, dev auprc: 0.469
INFO:root:	Best dev auc so far: 0.823! Saving best checkpoint to ./experiments/Demo/best_checkpoint/best.ckpt...
INFO:root:Epoch 14 took 0.12 seconds: 0.01 seconds for loading data
INFO:root:Epoch: 15, Iter: 250, (part) train loss: 0.3510, (part) train auc: 0.848, (part) train auprc: 0.526, dev loss: 0.3288, dev auc: 0.829, dev auprc: 0.479
INFO:root:	Best dev auc so far: 0.829! Saving best checkpoint to ./experiments/Demo/best_checkpoint/best.ckpt...
INFO:root:Epoch 15 took 0.09 seconds: 0

INFO:root:Epoch: 30, Iter: 490, (part) train loss: 0.2930, (part) train auc: 0.929, (part) train auprc: 0.725, dev loss: 0.3028, dev auc: 0.906, dev auprc: 0.640
INFO:root:	Best dev auc so far: 0.906! Saving best checkpoint to ./experiments/Demo/best_checkpoint/best.ckpt...
INFO:root:Epoch 30 took 0.07 seconds: 0.01 seconds for loading data
INFO:root:Epoch: 31, Iter: 500, (part) train loss: 0.3443, (part) train auc: 0.930, (part) train auprc: 0.729, dev loss: 0.3018, dev auc: 0.907, dev auprc: 0.644
INFO:root:	Best dev auc so far: 0.907! Saving best checkpoint to ./experiments/Demo/best_checkpoint/best.ckpt...
INFO:root:Epoch: 31, Iter: 510, (part) train loss: 0.3171, (part) train auc: 0.931, (part) train auprc: 0.732, dev loss: 0.3009, dev auc: 0.909, dev auprc: 0.648
INFO:root:	Best dev auc so far: 0.909! Saving best checkpoint to ./experiments/Demo/best_checkpoint/best.ckpt...
INFO:root:Epoch 31 took 0.11 seconds: 0.01 seconds for loading data
INFO:root:Epoch: 32, Iter: 520, (part) 

INFO:root:Epoch: 46, Iter: 750, (part) train loss: 0.2913, (part) train auc: 0.951, (part) train auprc: 0.796, dev loss: 0.2794, dev auc: 0.930, dev auprc: 0.700
INFO:root:	Best dev auc so far: 0.930! Saving best checkpoint to ./experiments/Demo/best_checkpoint/best.ckpt...
INFO:root:Epoch 46 took 0.11 seconds: 0.01 seconds for loading data
INFO:root:Epoch: 47, Iter: 760, (part) train loss: 0.3207, (part) train auc: 0.951, (part) train auprc: 0.798, dev loss: 0.2787, dev auc: 0.930, dev auprc: 0.701
INFO:root:	Best dev auc so far: 0.930! Saving best checkpoint to ./experiments/Demo/best_checkpoint/best.ckpt...
INFO:root:Epoch 47 took 0.08 seconds: 0.01 seconds for loading data
INFO:root:Epoch: 48, Iter: 770, (part) train loss: 0.3044, (part) train auc: 0.952, (part) train auprc: 0.799, dev loss: 0.2779, dev auc: 0.931, dev auprc: 0.703
INFO:root:	Best dev auc so far: 0.931! Saving best checkpoint to ./experiments/Demo/best_checkpoint/best.ckpt...
INFO:root:Epoch: 48, Iter: 780, (part) 
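
If you capture this output to a file, the evaluation lines can be parsed to track dev metrics over time. The sketch below is a small helper written against the line format visible in the log above; the names `EVAL_LINE` and `parse_eval_lines` are hypothetical, and the pattern may need adjusting if `main.py` logs differently in other configurations.

```python
import re

# Matches the periodic evaluation lines, e.g.
# "INFO:root:Epoch: 14, Iter: 230, ... dev loss: 0.3311, dev auc: 0.817, dev auprc: 0.458"
EVAL_LINE = re.compile(
    r"Epoch: (?P<epoch>\d+), Iter: (?P<iter>\d+), .*?"
    r"dev loss: (?P<dev_loss>[\d.]+), dev auc: (?P<dev_auc>[\d.]+), "
    r"dev auprc: (?P<dev_auprc>[\d.]+)"
)


def parse_eval_lines(log_text):
    """Return one dict per complete evaluation line; other lines are skipped."""
    records = []
    for line in log_text.splitlines():
        match = EVAL_LINE.search(line)
        if match:
            records.append({
                "epoch": int(match["epoch"]),
                "iter": int(match["iter"]),
                "dev_loss": float(match["dev_loss"]),
                "dev_auc": float(match["dev_auc"]),
                "dev_auprc": float(match["dev_auprc"]),
            })
    return records


# A few lines copied from the output above, including a truncated one.
sample_log = """\
INFO:root:Epoch: 46, Iter: 750, (part) train loss: 0.2913, (part) train auc: 0.951, (part) train auprc: 0.796, dev loss: 0.2794, dev auc: 0.930, dev auprc: 0.700
INFO:root:Epoch 46 took 0.11 seconds: 0.01 seconds for loading data
INFO:root:Epoch: 48, Iter: 770, (part) train loss: 0.3044, (part) train auc: 0.952, (part) train auprc: 0.799, dev loss: 0.2779, dev auc: 0.931, dev auprc: 0.703
INFO:root:Epoch: 48, Iter: 780, (part) 
"""
records = parse_eval_lines(sample_log)
best = max(records, key=lambda r: r["dev_auc"])
```

Timing lines and truncated lines do not match the pattern, so only complete evaluation records are returned.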

The BMS model:

In [6]:
! python main.py \
    --experiment_name='Demo' \
    --data_dir='./data/' \
    --folder_name='generated' \
    --tb_dir='./experiments/' \
    --model='BMS' \
    --bms_model='lstm' \
    --num_epochs=10 \
    --batch_size=500
    

  from ._conv import register_converters as _register_converters
INFO:root:Prediction on per-cycle basis, using data:
	training data: ./data/generated/train, 
	dev data: ./data/generated/dev/dev.npy, 
	test data: ./data/generated/test/test.npy.
2019-02-21 17:17:59.208093: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Summary name lstm_b:0 is illegal; using lstm_b_0 instead.
INFO:tensorflow:Summary name lstm_b:0 is illegal; using lstm_b_0 instead.
INFO:tensorflow:Summary name rnn/multi_rn

The LSTM model:

In [7]:
! python main.py \
    --experiment_name='Demo' \
    --data_dir='./data/' \
    --folder_name='generated' \
    --tb_dir='./experiments/' \
    --model='LSTM' \
    --num_epochs=10 \
    --batch_size=500 \
    --lstm_layer=2 \
    --lstm_size=100 \
    --fc_layer=2 \
    --fc_size=100
    

  from ._conv import register_converters as _register_converters
INFO:root:Prediction on per-cycle basis, using data:
	training data: ./data/generated/train, 
	dev data: ./data/generated/dev/dev.npy, 
	test data: ./data/generated/test/test.npy.
2019-02-21 17:19:00.521459: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Summary name LSTMModel/rnn/multi_rnn_cell/cell_0/lstm_cell/kernel:0 is illegal; using LSTMModel/rnn/multi_rnn_cell/cell_0/lstm_cell/kernel_0 instead.
INFO:tensorflow:Summary name LSTMModel/rnn/multi_rnn_cell/cell_0/lstm_cell/kernel:0 is illegal; using LSTMModel/rnn/multi

INFO:root:Epoch: 9, Iter: 160, (part) train loss: 0.1674, (part) train auc: 0.969, (part) train auprc: 0.854, dev loss: 0.1534, dev auc: 0.967, dev auprc: 0.786
INFO:root:Epoch 9 took 4.45 seconds: 0.01 seconds for loading data
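
The logs above report `auc` and `auprc`, which we read as the area under the ROC curve and under the precision-recall curve. As a reference for what these numbers mean, here is a small pure-Python sketch of both. It is not the computation used in `main.py` (which may rely on an approximation); `roc_auc` and `average_precision` are illustrative names, and average precision is used here as a standard estimate of the area under the precision-recall curve.

```python
def roc_auc(labels, scores):
    """Probability that a random positive scores higher than a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count pairwise wins; ties count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of precision@k taken at the rank k of each positive example."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive example")
    return total / hits


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)              # 3 of 4 positive/negative pairs ordered correctly
auprc = average_precision(labels, scores)  # (1/1 + 2/3) / 2
```

Because pregnancies are the minority class in this kind of data, the precision-recall metric is usually the stricter of the two, which is consistent with the `auprc` values in the logs being lower than the `auc` values.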


For parameter specifications, please refer to the flag definitions in `main.py`.
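
The three invocations above share most of their flags, so they can also be driven from a short script. The sketch below only reuses the flags and values shown in this notebook; the helper names `build_command` and `run_all` are hypothetical, and it assumes `main.py` is in the current working directory.

```python
import subprocess
import sys

# Flags common to all three runs in this notebook.
COMMON_FLAGS = {
    "experiment_name": "Demo",
    "data_dir": "./data/",
    "folder_name": "generated",
    "tb_dir": "./experiments/",
    "batch_size": 500,
}

# Model-specific flags, copied from the cells above.
MODEL_FLAGS = {
    "LR": {"num_epochs": 50},
    "BMS": {"bms_model": "lstm", "num_epochs": 10},
    "LSTM": {
        "num_epochs": 10,
        "lstm_layer": 2,
        "lstm_size": 100,
        "fc_layer": 2,
        "fc_size": 100,
    },
}


def build_command(model):
    """Build the argument list for one `python main.py ...` invocation."""
    flags = {**COMMON_FLAGS, "model": model, **MODEL_FLAGS[model]}
    return [sys.executable, "main.py"] + [f"--{k}={v}" for k, v in flags.items()]


def run_all():
    """Train the three models one after another; stops at the first failure."""
    for model in MODEL_FLAGS:
        subprocess.run(build_command(model), check=True)
```

Calling `run_all()` from the repository root would launch the same three training runs as cells 5 to 7.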