# Control Growth Data

In [1], the tumour growth inhibition (TGI) PKPD model of Erlotinib and Gefitinib was derived from two separate *in vivo* experiments. In particular, the growth of patient-derived tumour explants LXF A677 (adenocarcinoma of the lung) and cell line-derived tumour xenografts VXF A431 (vulva cancer) was monitored in mice. Each experiment comprised a control growth group and three groups that were treated with either Erlotinib or Gefitinib at one of three dose levels. Treatments were administered orally once a day.

In this notebook, we focus on understanding the untreated tumour growth. This will allow us to critically assess the modelling choices in [1] and to explore alternatives. It also allows us to derive posteriors for the growth parameters, which may inform the choice of priors for the full TGI-PKPD model inference.

We will now import the data sets and standardise their format for the inference.

## Raw LXF A677 control growth data

In [1]:
#
# Import raw LXF A677 data.
#

import os
import pandas as pd


# Import LXF A677 data
path = os.getcwd()  # make import independent of local path structure
lxf_data_raw = pd.read_csv(path + '/data_raw/Ctrl_Growth_LXF.csv')

# Display data
print('Raw LXF A677 Control Growth Data Set:')
lxf_data_raw

Raw LXF A677 Control Growth Data Set:


Unnamed: 0,#ID,TIME,DOSE,ADDL,II,Y,YTYPE,CENS,CELL LINE,DOSE GROUP,DRUG,DRUGCAT,EXPERIMENT,BW,YTV,KA,V,KE,w0
0,40,0,.,.,.,191.808,2,.,1,0,2,0,2,26.8,.,55,1.11,3.98,191.8080
1,94,0,.,.,.,77.2475,2,.,1,0,2,0,2,18.3,.,55,1.11,3.98,77.2475
2,95,0,.,.,.,186.2,2,.,1,0,2,0,2,22.3,.,55,1.11,3.98,186.2000
3,40,3,0,.,.,.,.,.,1,0,2,0,2,26.1,.,55,1.11,3.98,191.8080
4,40,4,0,2,1,.,.,.,1,0,2,0,2,26.5,.,55,1.11,3.98,191.8080
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
153,140,2,.,.,.,126.852,2,.,1,0,2,0,2,23.6,126.852,55,1.11,3.98,79.3305
154,94,4,.,.,.,125.316,2,.,1,0,2,0,2,18.5,125.316,55,1.11,3.98,77.2475
155,170,4,.,.,.,109.33,2,.,1,0,2,0,2,27.9,109.33,55,1.11,3.98,80.0565
156,170,2,.,.,.,94.221,2,.,1,0,2,0,2,27.7,94.221,55,1.11,3.98,80.0565


In [2]:
lxf_data_raw[lxf_data_raw['#ID']==40]

Unnamed: 0,#ID,TIME,DOSE,ADDL,II,Y,YTYPE,CENS,CELL LINE,DOSE GROUP,DRUG,DRUGCAT,EXPERIMENT,BW,YTV,KA,V,KE,w0
0,40,0,.,.,.,191.808,2,.,1,0,2,0,2,26.8,.,55,1.11,3.98,191.808
3,40,3,0,.,.,.,.,.,1,0,2,0,2,26.1,.,55,1.11,3.98,191.808
4,40,4,0,2,1,.,.,.,1,0,2,0,2,26.5,.,55,1.11,3.98,191.808
5,40,7,0,1,1,.,.,.,1,0,2,0,2,26.5,.,55,1.11,3.98,191.808
6,40,9,0,1,1,.,.,.,1,0,2,0,2,27.1,.,55,1.11,3.98,191.808
7,40,11,0,2,1,.,.,.,1,0,2,0,2,26.0,.,55,1.11,3.98,191.808
8,40,14,0,1,1,.,.,.,1,0,2,0,2,26.4,.,55,1.11,3.98,191.808
9,40,16,0,.,.,.,.,.,1,0,2,0,2,26.0,.,55,1.11,3.98,191.808
65,40,21,.,.,.,2276.736,2,.,1,0,2,0,2,26.2,2276.736,55,1.11,3.98,191.808
69,40,18,.,.,.,1752.192,2,.,1,0,2,0,2,25.4,1752.192,55,1.11,3.98,191.808


## Raw VXF A431 control growth data

In [3]:
#
# Import raw VXF A431 data.
#

import os
import pandas as pd


# Import VXF A431 data
path = os.getcwd()  # to make import independent of local path structure
vxf_data_raw = pd.read_csv(path + '/data_raw/Ctrl_Growth_VXF.csv', sep=';')

# Display data
print('Raw VXF A431 Control Growth Data Set:')
vxf_data_raw

Raw VXF A431 Control Growth Data Set:


Unnamed: 0,#ID,MICE,TIME,DOSE,ADDL,II,TUMOR,YTV,w0
0,1,708,0,.,.,.,92.5,.,92.5
1,1,708,3,.,.,.,115.2,115.2,92.5
2,1,708,3,0,3,1,.,.,92.5
3,1,708,7,.,.,.,133.2,134.2,92.5
4,1,708,7,0,2,1,.,.,92.5
...,...,...,...,...,...,...,...,...,...
128,7,743,30,.,.,.,1304.6875,1304.6875,207.4
129,7,743,32,.,.,.,1419.6,1419.6,207.4
130,7,743,35,.,.,.,1530.9,1530.9,207.4
131,7,743,37,.,.,.,1576.4625,1576.4625,207.4


## Cleaning the data

The first data set contains considerably more columns than the second, but much of this information is not relevant for us. Both data sets were processed with Monolix, so the meaning of most column keys can be looked up in the Monolix documentation. Some customised keys, however, have a meaning that is not immediately clear.

All we really need for our analysis is

- **#ID** indicating which mouse was measured,
- **TIME** indicating the time point of each measurement,
- **TUMOUR VOLUME** indicating the measured tumour volume.

While the columns **#ID** and **TIME** are easy to identify in the data sets, **TUMOUR VOLUME** is a name we chose deliberately so that it does not coincide with any existing column key. In both data sets, two keys may encode the measured tumour volumes: **Y** and **YTV** in LXF, and **TUMOR** and **YTV** in VXF. For the first experiment, the Monolix documentation settles the question: **Y** is generally used as the identifier for observations. For the second table it is less obvious which column contains the true data. By analogy with the first data set, however, one might suspect that **YTV** is not the actual data. Until this is clarified at a later stage, we use **TUMOR** as the true data.
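
One way to look at the difference between the two candidate columns is to list the rows on which they disagree. The following is a minimal sketch on a toy excerpt copied from the first rows of the raw VXF table shown above; the variable names (`toy_vxf`, `candidates`, `disagree`, `only_tumor`) are illustrative and not part of the original analysis.

```python
import pandas as pd

# Toy excerpt with the layout of the raw VXF table; '.' marks a missing entry
toy_vxf = pd.DataFrame({
    'MICE': [708, 708, 708, 708, 708],
    'TIME': [0, 3, 3, 7, 7],
    'TUMOR': ['92.5', '115.2', '.', '133.2', '.'],
    'YTV': ['.', '115.2', '.', '134.2', '.'],
})

# Convert the two candidate columns to numbers; '.' becomes NaN
candidates = toy_vxf[['TUMOR', 'YTV']].apply(pd.to_numeric, errors='coerce')

# Rows where both columns hold a value but the values differ
both_present = candidates['TUMOR'].notnull() & candidates['YTV'].notnull()
disagree = both_present & (candidates['TUMOR'] != candidates['YTV'])

# Rows where only TUMOR holds a value (here the baseline measurement)
only_tumor = candidates['TUMOR'].notnull() & candidates['YTV'].isnull()

print('Rows where TUMOR and YTV differ:')
print(toy_vxf[disagree])
print('Rows where only TUMOR is available:')
print(toy_vxf[only_tumor])
```

In this excerpt, **YTV** is empty at the baseline time point and differs from **TUMOR** in one row, so the two columns are not interchangeable. Running the same comparison on the full table would show how often this happens.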

Remarks on remaining column keys:

- **DOSE**: Seems to be a customised key encoding the applied dose. Since this is the control group, we should ensure that this column contains only NaN or zero entries.
- **ADDL**, **II**: According to Monolix, **ADDL** encodes the number of additional doses that follow the initial dose, and **II** the interval between them. Since this is supposed to be the control group, we should filter out any rows with non-null values for these keys.
- **YTYPE**, **CENS**: According to Monolix, these keys encode the data type (tumour volume in this case) and whether the measured values were subject to censoring. We should make sure that censored data is dealt with appropriately and that only one data type is present in the data set.
- **CELL LINE**, **DOSE GROUP**, **DRUG**, **EXPERIMENT**: These customised keys are largely self-explanatory. We should make sure that the data we use is uni-valued in these columns.
- **DRUGCAT**: The meaning of this key is less clear. It may refer to a drug category that encodes the route of administration. We should make sure that this column is also uni-valued. If it takes multiple values, we need to clarify what the column means.
- **BW**: Refers to the body weight of the mouse at the time of the measurement.
- **KA**, **V**, **KE**, **w0**: These are customised keys whose meaning is not immediately clear. They appear to be parameters of the PKPD model. Since we are interested in inferring parameters ourselves, previously obtained parameter values are of no use to us, and we ignore these columns.
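
The raw tables mark missing entries with `.`, and dosing records and observation records sit in separate rows. The sketch below, on a toy excerpt of mouse 40 taken from the raw LXF table above (subset of columns), illustrates why keeping only rows with a measured **Y** also removes the dosing records. The names `toy_lxf`, `numeric` and `observations` are illustrative.

```python
import pandas as pd

# Toy excerpt of mouse 40 from the raw LXF table (subset of columns)
toy_lxf = pd.DataFrame({
    '#ID': [40, 40, 40, 40],
    'TIME': [0, 3, 4, 21],
    'DOSE': ['.', '0', '0', '.'],
    'ADDL': ['.', '.', '2', '.'],
    'II': ['.', '.', '1', '.'],
    'Y': ['191.808', '.', '.', '2276.736'],
})

# '.' cannot be parsed as a number, so errors='coerce' maps it to NaN
numeric = toy_lxf.apply(pd.to_numeric, errors='coerce')

# Keep observation records only, i.e. rows with a measured tumour volume
observations = numeric[numeric['Y'].notnull()]

# On the remaining rows the dosing-related columns are empty
for column in ['DOSE', 'ADDL', 'II']:
    print(column, 'is empty:', observations[column].isnull().all())
```

Whether this holds for every row of the full data sets is exactly what the uni-valued checks in the cleaning cells verify.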

Remarks on units of relevant columns:

The raw data sets do not contain the units of the measured quantities. From reference [1], however, we may infer that
- **TIME**: was measured in $\text{day}$, and
- **TUMOUR VOLUME**: was measured in $\text{mm}^3$. 

For reasons that will become clear later, we will choose to measure the tumour volume in $\text{cm}^3$.
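
As a quick check of the conversion factor: 1 cm equals 10 mm, so $1\,\text{cm}^3 = 10^3\,\text{mm}^3$. A minimal sketch, where the helper `mm3_to_cm3` is a hypothetical name introduced for illustration:

```python
MM3_PER_CM3 = 10 ** 3  # 1 cm = 10 mm, hence 1 cm^3 = 1000 mm^3


def mm3_to_cm3(volume_mm3):
    """Convert a volume from cubic millimetres to cubic centimetres."""
    return volume_mm3 / MM3_PER_CM3


# Baseline measurement of mouse 40 in the raw LXF table, in mm^3
baseline_mm3 = 191.808
baseline_cm3 = mm3_to_cm3(baseline_mm3)
print(baseline_cm3)
```

The converted value corresponds to the first entry of the cleaned LXF table further down.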

## LXF A677 control growth data

In [4]:
#
# Create LXF A677 data from raw data set.
#

import os
import pandas as pd


# Import LXF A677 data
path = os.getcwd()  # to make import independent of local path structure
lxf_data_raw = pd.read_csv(path + '/data_raw/Ctrl_Growth_LXF.csv')

# Make sure that data is stored as numeric data
lxf_data = lxf_data_raw.apply(pd.to_numeric, errors='coerce')

# Mask data for non-null Y rows
lxf_data = lxf_data[lxf_data['Y'].notnull()]

# Rename Y to TUMOUR VOLUME in mm^3
lxf_data = lxf_data.rename(columns={'Y': 'TUMOUR VOLUME in mm^3'})

# Rename TIME to TIME in day
lxf_data = lxf_data.rename(columns={'TIME': 'TIME in day'})

# Raise error if DOSE, ADDL, II, YTYPE, CENS, CELL LINE, DOSE GROUP, DRUG, EXPERIMENT or DRUGCAT are not uni-valued
control_keys = [
    'DOSE', 'ADDL', 'II', 'YTYPE', 'CENS', 'CELL LINE', 'DOSE GROUP', 'DRUG',
    'EXPERIMENT', 'DRUGCAT']
for key in control_keys:
    # unique() counts NaN as one value, so an all-NaN column passes the check
    if len(lxf_data[key].unique()) > 1:
        raise ValueError('Column <' + key + '> is not uni-valued.')

# Keep only #ID, TIME and TUMOUR VOLUME column
lxf_data = lxf_data[['#ID', 'TIME in day', 'TUMOUR VOLUME in mm^3']]

# Sort data such that time is increasing (for later convenience).
# Assigning the sorted copy avoids modifying a slice of the raw data in place.
lxf_data = lxf_data.sort_values('TIME in day')

# Convert tumour measurements from mm^3 to cm^3
lxf_data['TUMOUR VOLUME in mm^3'] *= 1E-03
lxf_data = lxf_data.rename(columns={'TUMOUR VOLUME in mm^3': 'TUMOUR VOLUME in cm^3'})

# Delete raw data from memory
del lxf_data_raw

# Display cleaned data set
print('LXF A677 Control Growth:')
lxf_data

LXF A677 Control Growth:


Unnamed: 0,#ID,TIME in day,TUMOUR VOLUME in cm^3
0,40,0,0.191808
1,94,0,0.077248
2,95,0,0.186200
59,136,0,0.118588
60,140,0,0.079330
...,...,...,...
77,136,30,1.459342
103,94,30,0.576240
90,169,30,0.746986
67,140,30,2.122582


## VXF A431 control growth data

In [5]:
#
# Create VXF A431 data from raw data set.
#
import os
import pandas as pd


# Import VXF A431 data
path = os.getcwd()  # to make import independent of local path structure
vxf_data_raw = pd.read_csv(path + '/data_raw/Ctrl_Growth_VXF.csv', sep=';')

# Make sure that data is stored as numeric data
vxf_data = vxf_data_raw.apply(pd.to_numeric, errors='coerce')

# Mask data for non-null TUMOR rows
vxf_data = vxf_data[vxf_data['TUMOR'].notnull()]

# Rename TUMOR to TUMOUR VOLUME in mm^3
vxf_data = vxf_data.rename(columns={'TUMOR': 'TUMOUR VOLUME in mm^3'})

# Rename TIME to TIME in day
vxf_data = vxf_data.rename(columns={'TIME': 'TIME in day'})

# Raise error if DOSE, ADDL or II are not uni-valued
for key in ['DOSE', 'ADDL', 'II']:
    # unique() counts NaN as one value, so an all-NaN column passes the check
    if len(vxf_data[key].unique()) > 1:
        raise ValueError('Column <' + key + '> is not uni-valued.')

# Keep only MICE, TIME and TUMOUR VOLUME column
vxf_data = vxf_data[['MICE', 'TIME in day', 'TUMOUR VOLUME in mm^3']]

# Rename MICE to #ID
vxf_data = vxf_data.rename(columns={'MICE': '#ID'})

# Sort data such that time is increasing (for later convenience).
# Assigning the sorted copy avoids modifying a slice of the raw data in place.
vxf_data = vxf_data.sort_values('TIME in day')

# Convert tumour measurements from mm^3 to cm^3
vxf_data['TUMOUR VOLUME in mm^3'] *= 1E-03
vxf_data = vxf_data.rename(columns={'TUMOUR VOLUME in mm^3': 'TUMOUR VOLUME in cm^3'})

# Delete raw data from memory
del vxf_data_raw

# Display cleaned data set
print('VXF A431 Control Growth:')
vxf_data

VXF A431 Control Growth:


Unnamed: 0,#ID,TIME in day,TUMOUR VOLUME in cm^3
0,708,0,0.092500
19,713,0,0.072000
114,743,0,0.207400
95,741,0,0.119100
38,730,0,0.092500
...,...,...,...
56,730,39,0.686070
94,739,39,2.040690
37,713,39,0.799254
75,733,39,0.816480


## Illustrate control growth data

We use [plotly](https://plotly.com/python/) to create interactive visualisations of the time-series data.

In [6]:
#
# Visualise control growth data.
#

import plotly.graph_objects as go


# Create figure
fig = go.Figure()

# Scatter plot LXF A677 time-series data for each mouse
mouse_ids = lxf_data['#ID'].unique()
for id_m in mouse_ids:
    # Create mask for mouse
    mask = lxf_data['#ID'] == id_m

    # Get time points for mouse
    times = lxf_data['TIME in day'][mask]

    # Get observed tumour volumes for mouse
    observed_volumes = lxf_data['TUMOUR VOLUME in cm^3'][mask]

    # Plot data
    fig.add_trace(go.Scatter(
        x=times,
        y=observed_volumes,
        legendgroup="LXF A677",
        name="LXF A677",
        showlegend=True if id_m == mouse_ids[0] else False,
        hovertemplate=
            "<b>%s ID: %d</b><br>" % ("LXF A677", id_m) +
            "Time: %{x:} day<br>" +
            "Tumour volume: %{y:.02f} cm^3<br>" +
            "<extra></extra>",
        mode="markers",
        marker=dict(
            symbol='circle',
            opacity=0.7,
            line=dict(color='black', width=1))
    ))

# Scatter plot VXF A431 time-series data for each mouse
mouse_ids = vxf_data['#ID'].unique()
for id_m in mouse_ids:
    # Create mask for mouse
    mask = vxf_data['#ID'] == id_m

    # Get time points for mouse
    times = vxf_data['TIME in day'][mask]

    # Get observed tumour volumes for mouse
    observed_volumes = vxf_data['TUMOUR VOLUME in cm^3'][mask]

    # Plot data
    fig.add_trace(go.Scatter(
        x=times,
        y=observed_volumes,
        legendgroup="VXF A431",
        name="VXF A431",
        showlegend=True if id_m == mouse_ids[0] else False,
        hovertemplate=
            "<b>%s ID: %d</b><br>" % ("VXF A431", id_m) +
            "Time: %{x:} day<br>" +
            "Tumour volume: %{y:.02f} cm^3<br>" +
            "<extra></extra>",
        mode="markers",
        marker=dict(
            symbol='star',
            opacity=0.7, 
            line=dict(color='black', width=1))
    ))

# Set X, Y axis and figure size
fig.update_layout(
    autosize=True,
    xaxis_title=r'$\text{Time in day}$',
    yaxis_title=r'$\text{Tumour volume in cm}^3$',
    template="plotly_white")

# Add switch between linear and log y-scale
fig.update_layout(
    updatemenus=[
        dict(
            type = "buttons",
            direction = "left",
            buttons=list([
                dict(
                    args=[{"yaxis.type": "linear"}],
                    label="Linear y-scale",
                    method="relayout"
                ),
                dict(
                    args=[{"yaxis.type": "log"}],
                    label="Log y-scale",
                    method="relayout"
                )
            ]),
            pad={"r": 0, "t": -10},
            showactive=True,
            x=0.0,
            xanchor="left",
            y=1.15,
            yanchor="top"
        ),
    ]
)

# Show figure
fig.show()

**Figure 1:** Untreated tumour growth of patient-derived tumour explants LXF A677 (adenocarcinoma of the lung) and cell line-derived tumour xenografts VXF A431 (vulva cancer) in mice. Data points of the same colour belong to the same mouse. The mouse ID and further information can be explored by hovering over the data points.

## Export cleaned data

In [7]:
#
# Export cleaned data sets for inference in other notebooks.
#

import os
import pandas as pd


# Get path of current working directory
path = os.getcwd()

# Export cleaned LXF A677 control growth data
lxf_data.to_csv(path + '/data/lxf_control_growth.csv')

# Export cleaned VXF A431 control growth data
vxf_data.to_csv(path + '/data/vxf_control_growth.csv')
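
Because `to_csv` writes the DataFrame index as an unnamed first column by default, a notebook that reloads these files may want to pass `index_col=0` to `read_csv`. The following is a minimal sketch of that round trip on a toy stand-in for the cleaned data, written to a temporary directory; the file name `toy_control_growth.csv` and the variable names are illustrative only.

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for the cleaned data (first rows of the cleaned LXF table)
cleaned = pd.DataFrame({
    '#ID': [40, 94, 95],
    'TIME in day': [0, 0, 0],
    'TUMOUR VOLUME in cm^3': [0.191808, 0.077248, 0.186200],
})

with tempfile.TemporaryDirectory() as directory:
    file_path = os.path.join(directory, 'toy_control_growth.csv')

    # Export as in the cell above: the index is written as the first column
    cleaned.to_csv(file_path)

    # Without index_col the old index returns as a column named 'Unnamed: 0'
    naive = pd.read_csv(file_path)

    # With index_col=0 the first column is restored as the index
    reloaded = pd.read_csv(file_path, index_col=0)

print(list(naive.columns))
print(list(reloaded.columns))
```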

## Bibliography

- <a name="ref1"> [1] </a> Eigenmann et al., Combining Nonclinical Experiments with Translational PKPD Modeling to Differentiate Erlotinib and Gefitinib, Mol Cancer Ther (2016)

[Back to project overview](https://github.com/DavAug/ErlotinibGefitinib/blob/master/README.md) | [Forward to next notebook](https://github.com/DavAug/ErlotinibGefitinib/blob/master/notebooks/control_growth/pooled_model.ipynb)