# Cost needed to treat as a repeatable executable notebook: proof of concept

This project performs a rudimentary analysis of a dataset of potential costs. The notebook is not meant as a final version, but rather as a proof of concept.

Contents:

- <a href='#the_destination0'>Step 0: Install and import libraries</a>
- <a href='#the_destination1'>Step 1: Show environment for reproducibility</a>
- <a href='#the_destination2'>Step 2: Calculation example</a>
- <a href='#the_destination3'>Step 3: Repeat calculation with user inputs</a>
- <a href='#the_destination4'>Step 4: Further notes on this proof of concept</a>



<div class="alert alert-danger">Although I have provided a table of contents, if you do not run every cell in order the first time, you may create errors or hidden states that cause problems.</div>

<a id='the_destination0'></a>
## Step 0. Install and import libraries 

<div class="alert alert-danger">Although I have included pandas, numpy, seaborn and matplotlib, the notebook in its current state can run without them. I assume that if I expand the notebook fully I will use these libraries. 
    These libraries also let me show what the watermark package does, so I have left the imports in. If your computer does not have these libraries, do not worry: you can comment out the corresponding imports.</div>

In [1]:

##fundamental data analytics libraries and packages

import pandas as pd
import numpy as np
import matplotlib 
import matplotlib.pyplot as plt
from matplotlib import rc
import seaborn as sns

In [2]:
# ## math libraries and packages commented out, in case you want to add more sophisticated stats
#import math as mth
#from scipy import stats

In [3]:
# import watermark package for reproducibility
import watermark


<a id='the_destination1'></a>
## Step 1. Show environment for reproducibility

It is well documented that code-based calculations are often not reproducible over time, because software packages and environments change. The causes range from technicalities of floating-point arithmetic to differences in how random numbers are generated from one language or version to another. Therefore, to make this code reproducible, I document the environment with the watermark package. The first run is commented out, with its results recorded below; the second run lets users compare their own environment against it.

In [21]:
#%load_ext watermark

## python, ipython, packages, and machine characteristics
#%watermark -v -m -p pandas,numpy,matplotlib,seaborn,watermark 

## date
#print (" ")
#%watermark -u -n -t -z 

OK, so we got results on our environment as follows:

CPython 3.7.1 

IPython 7.2.0

pandas 1.1.3

numpy 1.15.4

matplotlib 3.0.2

seaborn 0.9.0

watermark 2.0.2

compiler   : MSC v.1915 64 bit (AMD64)

system     : Windows

release    : 10

machine    : AMD64

processor  : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel

CPU cores  : 12

interpreter: 64bit


last updated: Sun Nov 08 2020 09:07:23 Jerusalem Standard Time

<div class="alert alert-success"> <b>Running the cell below will show your environment.</b>  </div>

In [None]:
%load_ext watermark

# python, ipython, packages, and machine characteristics
%watermark -v -m -p pandas,numpy,matplotlib,seaborn,watermark  

# date
print (" ")
%watermark -u -n -t -z 

<div class="alert alert-warning"> <b>If there is a significant difference between the environments, please contact me, as the results may not be reproducible. </b></div>
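For readers who do not have the watermark package installed, a similar record can be produced with the standard library alone. The following is a minimal sketch, not part of the original notebook: the function name `describe_environment` is hypothetical, and it assumes Python 3.8 or later, where `importlib.metadata` is available (the environment recorded above is older than that).

```python
import platform
from importlib import metadata  # in the standard library from Python 3.8 onward


def describe_environment(package_names):
    """Return a dict with interpreter, operating system, and package versions.

    Hypothetical helper for illustration; it reads installed package
    metadata and does not import the packages themselves.
    """
    report = {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "system": platform.system(),
        "machine": platform.machine(),
    }
    for name in package_names:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


# Same package list as the watermark call above.
environment = describe_environment(
    ["pandas", "numpy", "matplotlib", "seaborn", "watermark"]
)
for key, value in environment.items():
    print(f"{key:<15}: {value}")
```

The printed lines can be compared by eye with the recorded environment, in the same way as the watermark output.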


<a id='the_destination2'></a>
## Step 2. Calculation example

This part uses the numbers supplied to reproduce the number in the paper. In the next part, users can change the cost of the drugs. In a final version I would put all the calculations behind these numbers in a code notebook, but to keep things simple I use some values that have already been calculated.

In [22]:
empagliflozin_aNNT= 190
semaglutide_aNNT = 141
print("The variables for annualized number needed to treat are set here at", empagliflozin_aNNT, "for empagliflozin and", semaglutide_aNNT, "for semaglutide")



The variables for annualized number needed to treat are set here at 190 for empagliflozin and 141 for semaglutide


Below we take the costs as given in the paper, assign them to variables, and then multiply them by our aNNT values.

In [23]:
cost_empagliflozin= 4572
cost_semaglutide= 6680

cnnt_empagliflozin=  cost_empagliflozin * empagliflozin_aNNT
cnnt_semaglutide= cost_semaglutide* semaglutide_aNNT 

print("Our costs needed to treat are:",cnnt_empagliflozin, "and", cnnt_semaglutide,  "for Empagliflozin and OralSemaglutide")

Our costs needed to treat are: 868680 and 941880 for Empagliflozin and OralSemaglutide


OK, this is not exactly formatted in a pretty way, but this is just a proof of concept. 
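As one possible way to tidy the output, the multiplication above can be wrapped in a small function and printed with thousands separators. This is only a sketch: the function name `cost_needed_to_treat` and the `drugs` dictionary are hypothetical, and the numbers are copied from the cells above.

```python
def cost_needed_to_treat(cost, annualized_nnt):
    """Cost needed to treat = drug cost x annualized number needed to treat."""
    return cost * annualized_nnt


# (cost, aNNT) pairs copied from the cells above.
drugs = {
    "empagliflozin": (4572, 190),
    "semaglutide": (6680, 141),
}

for drug, (cost, annualized_nnt) in drugs.items():
    cnnt = cost_needed_to_treat(cost, annualized_nnt)
    # The ',' format option inserts thousands separators.
    print(f"{drug:<15} cost needed to treat: {cnnt:>10,}")
```

Keeping the inputs in one dictionary would also make it easy to add another drug or update an aNNT later without touching the calculation itself.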


<a id='the_destination3'></a>
## Step 3. Repeat calculation with user inputs

Below, users can enter costs and derive a cost needed to treat based on prices in their own country or institution. Please do not restore the commented-out code, as it has a bug. It was meant to check whether the value typed in was a number within a certain range, but input() returns a string, so a membership test against a range of integers always fails. The code therefore needs to be rewritten. I can later add code to make sure the input is a number, not letters, and falls within a certain range.

In [24]:
#numbers_list= range(0,10000)
print("You must enter the cost you want to check for empagliflozin, round to nearest integer. Type it in and press enter") 
input_cost_empa = input()

#if input_cost_empa in numbers_list:
   # print('Price exists in possible prices, you can proceed to the next block')
#else: print('Your price as typed is not on our range of possible prices. Check the price and type exactly ')  

You must enter the cost you want to check for empagliflozin, round to nearest integer. Type it in and press enter
2345


In [12]:
#numbers_list= range(0,10000)
print("You must enter the cost you want to check for semaglutide, round to nearest integer. Type it in and press enter") 
input_cost_sema = input()

#if input_cost_sema in numbers_list:
   # print('Price exists in possible prices, you can proceed to the next block')
#else: print('Your price as typed is not on our range of possible prices. Check the price and type exactly ')  

You must enter the cost you want to check for semaglutide, round to nearest integer. Type it in and press enter
6680


In [26]:
# now we turn our strings into ints
input_cost_empa= int(input_cost_empa)
input_cost_sema= int(input_cost_sema)

In [27]:

input_cost_empagliflozin= input_cost_empa
input_cost_semaglutide= input_cost_sema

cnnt_empagliflozin=  input_cost_empagliflozin * empagliflozin_aNNT
cnnt_semaglutide= input_cost_semaglutide * semaglutide_aNNT 

print("Our costs needed to treat based on your input prices are:",cnnt_empagliflozin, "and", cnnt_semaglutide,  "for Empagliflozin and OralSemaglutide")

Our costs needed to treat based on your input prices are: 445550 and 941880 for Empagliflozin and OralSemaglutide
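The input check described at the start of this step could be rewritten roughly as follows. This is a sketch under assumptions, not the final code: the names `parse_cost` and `ask_for_cost` are hypothetical, and the default bounds simply reuse the 0 to 9999 range from the commented-out `numbers_list`. The key change is converting the typed string to an integer before testing the range.

```python
def parse_cost(raw_text, minimum=0, maximum=10000):
    """Convert typed text to an integer cost, or raise ValueError.

    The bounds mirror range(0, 10000) from the commented-out code:
    minimum is included, maximum is excluded.
    """
    try:
        cost = int(raw_text.strip())
    except ValueError:
        raise ValueError(f"{raw_text!r} is not a whole number") from None
    if not minimum <= cost < maximum:
        raise ValueError(f"{cost} is outside the range {minimum} to {maximum - 1}")
    return cost


def ask_for_cost(drug_name, read=input):
    """Keep prompting until the user types a valid cost.

    The read parameter defaults to input() and exists so the loop
    can be exercised without a keyboard.
    """
    while True:
        try:
            return parse_cost(read(f"Cost for {drug_name} (whole number): "))
        except ValueError as error:
            print(f"Invalid entry: {error}. Please try again.")
```

With these helpers, a call such as `input_cost_empa = ask_for_cost("empagliflozin")` would return an integer directly, so the separate conversion cell would no longer be needed.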



<a id='the_destination4'></a>
## Step 4. Further notes on this proof of concept


We can make an elegant, repeatable code notebook to add as an addendum. I think this would keep the paper of interest for much longer. An alternative way to let users adjust for local drug prices would be to build an application, a visualization, or even simple tables. But I feel it is much more scientific and reproducible to just add code. Most scientists will be able to use Jupyter, although the alternatives include R Markdown, workflowr, IHaskell notebooks, or even Emacs. 

Even this proof of concept by itself can make the paper more usable for people everywhere, and over time as prices vary. Please note that once the notebook is properly expanded, it would also allow new trials to be added to the analysis. For example, if a new trial changes the NNT for one drug, this can be updated so that the paper remains of interest. I timed myself, and making the notebook took me 46 minutes. I believe adding some formatting or better ways to handle input could add a few more hours. -- Dr. Candace Makeda H. Moore