# A cure for cancer?

## Preliminaries

In [1]:
# Run this cell to start.
import numpy as np

# Load the OKpy test library and tests.
from client.api.notebook import Notebook
ok = Notebook('lymphoma.ok')

Assignment: lymphoma
OK, version v1.18.1



The tests in this notebook do not test if you have the right answer, but only
if you have the *right sort* of answer.  *Be careful* -- the tests could pass, but your answer could still be wrong.

## Is there a cure?

At the time of writing, you can find the following on the [Wikipedia page for
Burkitt's
Lymphoma](https://en.wikipedia.org/wiki/Burkitt%27s_lymphoma#Prognosis).

> The overall cure rate for Burkitt's lymphoma in developed countries is
> about 90%, but worse in low-income countries. Burkitt's lymphoma is
> uncommon in adults, where it has a worse prognosis (Molyneux et al 2012).
>
> In 2006, treatment with dose-adjusted EPOCH with Rituximab showed
> promising initial results in a small series of patients (n=17), with
> a 100% response rate, and 100% overall survival and progression-free
> survival at 28 months (median follow-up) (Dunleavy et al 2006).

* Molyneux *et al* (2012). Burkitt's Lymphoma.  The Lancet, 379(9822),
  1234-1244.
* Dunleavy *et al* (2006). Novel Treatment of Burkitt Lymphoma with
  Dose-Adjusted EPOCH-Rituximab: Preliminary Results Showing Excellent Outcome.
  Blood, 108(11), 2736–2736.

How likely is it that the Dunleavy 2006 study results, or better, could have
come about by chance?

You can use the tools you already know like this:

* Your ideal (null) model is that the treatment in the EPOCH study was, in
  fact, no more effective than any other standard therapy.
* You are going to simulate 10000 trials, using this model.
* In each trial, you will make 17 simulated patients, each with a 90%
  chance of being cured.  Then count how many of the 17 patients were cured.
* At the end of your simulation, you should have 10000 counts of the number of
  simulated patients, out of 17, who were cured.  Store these counts in
  a variable `counts`.
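
The steps above can be sketched as follows. This is a minimal sketch, not the assignment's reference solution: the helper name `simulate_cured_counts`, its parameters, and the use of NumPy's `default_rng` generator are assumptions made for illustration. It draws every simulated patient at once and then counts the cured patients in each trial.

```python
import numpy as np


def simulate_cured_counts(n_trials=10000, n_patients=17, p_cure=0.9, seed=None):
    """Return one cured-patient count per simulated trial (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # One row per trial, one column per patient.
    draws = rng.uniform(0, 1, size=(n_trials, n_patients))
    # A draw below p_cure means that simulated patient was cured.
    return np.count_nonzero(draws < p_cure, axis=1)


example_counts = simulate_cured_counts(seed=1)
# Show the first five counts; each is an integer between 0 and 17.
print(example_counts[:5])
```

The result is a one-dimensional array with one integer per trial, which is the shape the instructions ask for in `counts`.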

In [2]:
np.random.seed(1)

In [3]:
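#- Simulate 10000 trials of 17 patients
# Each row holds the 17 uniform draws for one trial; a draw <= 0.9 means "cured".
counts = np.array([np.random.uniform(0, 1, 17) for i in range(10000)])
# Show the first five trials (rows of raw draws)
counts[:5]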

array([[4.17022005e-01, 7.20324493e-01, 1.14374817e-04, 3.02332573e-01,
        1.46755891e-01, 9.23385948e-02, 1.86260211e-01, 3.45560727e-01,
        3.96767474e-01, 5.38816734e-01, 4.19194514e-01, 6.85219500e-01,
        2.04452250e-01, 8.78117436e-01, 2.73875932e-02, 6.70467510e-01,
        4.17304802e-01],
       [5.58689828e-01, 1.40386939e-01, 1.98101489e-01, 8.00744569e-01,
        9.68261576e-01, 3.13424178e-01, 6.92322616e-01, 8.76389152e-01,
        8.94606664e-01, 8.50442114e-02, 3.90547832e-02, 1.69830420e-01,
        8.78142503e-01, 9.83468338e-02, 4.21107625e-01, 9.57889530e-01,
        5.33165285e-01],
       [6.91877114e-01, 3.15515631e-01, 6.86500928e-01, 8.34625672e-01,
        1.82882773e-02, 7.50144315e-01, 9.88861089e-01, 7.48165654e-01,
        2.80443992e-01, 7.89279328e-01, 1.03226007e-01, 4.47893526e-01,
        9.08595503e-01, 2.93614148e-01, 2.87775339e-01, 1.30028572e-01,
        1.93669579e-02],
       [6.78835533e-01, 2.11628116e-01, 2.65546659e-01, 4.915

In [4]:
# Test you are on the right track.
_ = ok.grade('q_01_counts')

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 3
    Failed: 0
[ooooooooook] 100.0% passed



Calculate the *proportion* `p_100` of `counts` that correspond to 100% response
rate (17 out of 17):

In [8]:
p_100 = 0
for sim in counts:
    # A trial has a 100% response rate when all 17 draws are at or below 0.9.
    if np.count_nonzero(sim <= 0.9) == len(sim):
        p_100 += 1
p_100 /= len(counts)
# Show the value
p_100

0.1597
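
As a cross-check on the simulated value, the null model also has an exact answer. If each of the 17 patients is cured independently with probability 0.9, the chance that all 17 are cured is 0.9 raised to the power 17. The variable names below are illustrative only.

```python
# Exact probability of a 17-out-of-17 response under the null model.
p_cure = 0.9
n_patients = 17
p_all_cured = p_cure ** n_patients
print(round(p_all_cured, 4))
```

The exact value is about 0.167, reasonably close to the simulated 0.1597, which will vary from run to run.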

In [9]:
# Test you are on the right track.
_ = ok.grade('q_02_p_100')

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 3
    Failed: 0
[ooooooooook] 100.0% passed



## The rush to publish

One big problem in medical research, as in other research, is the *file-drawer
effect*, also called [publication
bias](https://en.wikipedia.org/wiki/Publication_bias).

The problem is that there may be multiple labs testing the same treatment.
Labs that do not find a surprising result will probably not publish a paper.  Labs that do probably will publish a paper.

Imagine there were four labs all testing the same treatment as Dunleavy *et
al*.  Each lab also tested 17 patients, and looked at the number of patients
who were progression-free after about 28 months of follow-up, like Dunleavy.
Again, imagine that, in our ideal model of the world, the treatment is, in
fact, no more effective than average.

Now imagine that each of the four labs will publish a paper if it gets a 17 of 17 progression-free survival rate, and will not publish otherwise.

In this ideal world, what is the chance that at least one lab will publish a paper?

Here is a sketch of a simulation of one trial in that world:

In [10]:
def sim_count():
    return np.count_nonzero(np.random.uniform(0,1,17) <= 0.9)

In [11]:
# Simulate one trial: each of `n_labs` labs runs its own 17-patient study.
def sim_labs(n_labs=4):
    lab_counts = np.zeros(n_labs)
    for i in range(n_labs):
        lab_counts[i] = sim_count()
    # A lab publishes only if all 17 of its patients were cured.
    n_publications = np.count_nonzero(lab_counts == 17)
    return n_publications

In [12]:
sim_labs()

2

Now do a simulation of 10000 trials like this.  Count the number of publications for each trial.  Store the number of publications for each trial in an array `publications`.


In [13]:
#- Simulate 10000 trials of four labs, each studying 17 patients.
publications = np.array([sim_labs() for i in range(10000)])
# Show the first five publication counts
publications[:5]

array([0, 1, 0, 0, 2])

In [14]:
# Test you are on the right track.
_ = ok.grade('q_04_publications')

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 3
    Failed: 0
[ooooooooook] 100.0% passed



In this world, where each trial has four labs, each testing the same thing,
what proportion of trials give at least one publication?

In [15]:
p_at_least_one = np.count_nonzero(publications) / len(publications)
# Show the value
p_at_least_one

0.5211
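
The simulated proportion can be compared with an exact calculation. Assuming the four labs are independent and each has a 0.9 ** 17 chance of a 17 of 17 result, the chance that at least one publishes is one minus the chance that none does. The variable names below are illustrative only.

```python
# Chance that a single lab sees all 17 patients cured under the null model.
p_single_lab = 0.9 ** 17
n_labs = 4
# "At least one lab publishes" is the complement of "no lab publishes".
p_none = (1 - p_single_lab) ** n_labs
p_at_least_one_exact = 1 - p_none
print(round(p_at_least_one_exact, 3))
```

This gives roughly 0.518, in line with the simulated 0.5211.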

In [14]:
# Test you are on the right track.
_ = ok.grade('q_05_p_at_least_one')

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 3
    Failed: 0
[ooooooooook] 100.0% passed



With the evidence you have here, in the Wikipedia page, and from any extra
reading you would like to do, how likely is it that the treatment that Dunleavy
*et al* used is really more effective than other standard treatments for
Burkitt's lymphoma?  If you had Burkitt's lymphoma, would you insist on this
treatment?  Give your answer, with arguments, in the space below.

<mark> Based on our analysis, it is more likely than not (about 52% in our simulation) that at least one of four labs will report a result identical to that of Dunleavy *et al*, if we assume the overall cure rate is 90%. Since there are probably far more than four labs studying this disease, the real value of `p_at_least_one` should be higher. Further research by Dunleavy (Dunleavy et al., 2013) showed a high overall survival rate, but it was still based on a small sample (n=19). This method may well be effective, but it is hard to determine whether it is better than other methods. </mark>

<mark>If I had Burkitt's lymphoma, I would not assume that this method is certain to cure me. According to Saleh's review (Saleh et al., 2020), many other methods are also very promising, so I would not insist on this one. I would leave the decision to the medical team and let them choose the treatment most suitable for me.  </mark>
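
The point about the number of labs can be made concrete with a short calculation. This is a sketch under the same null model (independent labs, 17 patients each, 90% cure rate); the lab counts tried here and the name `p_any_by_labs` are arbitrary choices for illustration, not figures from any study.

```python
# Chance that one lab sees a 17 of 17 result under the null model.
p_single = 0.9 ** 17

# Chance that at least one of n_labs independent labs sees 17 of 17.
p_any_by_labs = {}
for n_labs in (1, 4, 10, 20):
    p_any_by_labs[n_labs] = 1 - (1 - p_single) ** n_labs
    print(n_labs, round(p_any_by_labs[n_labs], 3))
```

The probability rises steadily with the number of labs, so a single published 17 of 17 result becomes weaker evidence the more labs are assumed to be running the same study.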

## Done

You're finished with the assignment!  Be sure to...

- **run all the tests** (the next cell has a shortcut for that),
- **Save and Checkpoint** from the "File" menu.
- Finally, **restart** the kernel for this notebook, and **run all the cells**,
  to check that the notebook still works without errors.  Use the
  "Kernel" menu, and choose "Restart and Run All".  If you find any
  problems, go back and fix them, save the notebook, and restart / run
  all again, before submitting.

In [15]:
# For your convenience, you can run this cell to run all the tests at once!
import os
_ = [ok.grade(q[:-3]) for q in os.listdir("tests") if q.startswith('q')]

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 3
    Failed: 0
[ooooooooook] 100.0% passed

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 3
    Failed: 0
[ooooooooook] 100.0% passed

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 3
    Failed: 0
[ooooooooook] 100.0% passed

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 3
    Failed: 0
[ooooooooook] 100.0% passed


