# vantage6: Age Standardized Incidence Rate
Algorithm that calculates the crude rate and the adjusted rate. The returned (Python) dictionary contains:
* `local_crude_rate`
* `combined_crude_rate`
* `local_adjusted_rate`
* `combined_adjusted_rate`

Limits of the algorithm:
* The dataset at each data station (node) needs at least 10 records; otherwise no statistical analysis is performed.
* All datasets have to use the same format.
* The column names given in the input need to match the column names in the dataset.
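
These limits can be checked locally before a task is sent. The following is a minimal sketch, not part of the algorithm itself: the helper name `find_dataset_problems` is hypothetical, and the column names are the ones used in the examples later in this notebook.

```python
import pandas as pd

MIN_RECORDS = 10  # minimum number of records per node, as stated in the limits above


def find_dataset_problems(data: pd.DataFrame, column_names) -> list:
    """Return a list of human-readable problems; an empty list means no problem was found."""
    problems = []
    if len(data) < MIN_RECORDS:
        problems.append(f"dataset is too small: {len(data)} < {MIN_RECORDS}")
    missing = [name for name in column_names if name not in data.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    return problems


# Made-up example data: too few rows, and two expected columns are absent
example = pd.DataFrame({"incidence": [1, 0, 2], "pop": [100, 120, 90], "sex": [1, 2, 9]})
problems = find_dataset_problems(example, ["incidence", "pop", "sex", "agec", "pref"])
print(problems)
```

Running such a check at each data station beforehand avoids a round trip to the server that would only end in an error.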

The central part of the algorithm can be executed by the _master_ container, which takes care of combining the results from the individual containers. This is the preferred way of computing the ASR. However, it is also possible to request the distributed parts of the algorithm yourself and combine their results on your own machine. In this notebook we first show how to compute the ASR using the _master_ container; in the second part we show how to perform the central part of the algorithm yourself.

## 1 Option 1: ASR using a master container (preferred)
The central part of the algorithm can be executed by the master container. The master container computes the combined crude and adjusted age standardized incidence rates by combining the results reported by each individual node.

Things to keep in mind:
* If there are too few rows in a dataset, you will receive a `Dataset is too small: len(data) < 10` error.
* If the columns in the `input_` do not match the ones in the dataframe, you will receive a `KeyError: ` error.

In [1]:
from vantage6.client import Client
from pathlib import Path
import pandas as pd # pandas>=1.1.0 (!)

### 1.1 The client
First you have to configure the client. The client is the interface to the central server. It contains methods to post tasks and retrieve their results. Note that this client is still under heavy development, so it might change quite a bit in the future.

In [2]:
# Server information: url, port, api_path
client = Client("http://localhost", 5000, "")
# Authentication using username and password
client.authenticate("frank@iknl.nl", "qwerty123")
# when we start using encryption, we need to specify the path to the private key file here...
client.setup_encryption(None) 

### 1.2 Specifying the input
The input consists of three parts:
* `method`: The name of the method that is triggered within the Docker container. When using the ASR master method, this needs to be set to `master`.
* `master`: Boolean indicating whether we are using a master container. The method name does *not* determine this (master methods are not necessarily called `master`).
* `kwargs` or `args`: The input arguments for the `method`. You can specify them either as a dictionary (`kwargs`) or as a list (`args`). Note that the order matters when using `args`; therefore we prefer to use `kwargs`.
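
To illustrate why the order matters for `args` but not for `kwargs`, here is a small self-contained sketch. Both `dispatch` and `rate` are hypothetical stand-ins written for this illustration; they do not show how vantage6 actually forwards the input to the algorithm.

```python
def dispatch(method, input_):
    """Call `method` with the positional and keyword arguments found in `input_`."""
    return method(*input_.get("args", []), **input_.get("kwargs", {}))


def rate(incidence, population):
    # Hypothetical stand-in for an algorithm method; it only reports what it received
    return f"incidence={incidence}, population={population}"


# Keyword arguments can be given in any order
by_name = dispatch(rate, {"kwargs": {"population": "pop", "incidence": "incidence"}})
# Positional arguments must follow the order of the method signature
by_position = dispatch(rate, {"args": ["incidence", "pop"]})
# Swapping positional arguments silently swaps their meaning
swapped = dispatch(rate, {"args": ["pop", "incidence"]})

print(by_name)
print(swapped)
```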

The `kwargs` in the example below represent the following:
* The values of `incidence`, `population`, `gender`, `ageclass` and `prefacture` are the column names as they appear in the data files attached to the nodes. These need to be the same for all data stations (nodes)!
* `standard_population` must be a pandas DataFrame; it can be read from an `.xlsx` file via `pd.read_excel` or from a `.csv` file via `pd.read_csv`.


In [3]:
# Define algorithm input
input_ = {
    "method": "master",
    "master": True,
    "kwargs": {
        "incidence": "incidence",
        "population": "pop",
        "gender": "sex",
        "ageclass": "agec",
        "prefacture": "pref",
        "standard_population": pd.read_excel(r'..\v6-asr-py\local\std_pop.xls')
    }
}

### 1.3 Request computation

In [4]:
# Send the task to the central server
task = client.post_task(
    name="testing",
    image="harbor.vantage6.ai/algorithms/asr",
    collaboration_id=1,
    input_= input_,
    organization_ids=[1]
)

As shown above, the task sent to the central server consists of the name of the task (`testing`), the reference to the Docker image that contains the algorithm (`harbor.vantage6.ai/algorithms/asr`), the collaboration id (`1`), the input, and the ids of the organizations that should execute it, which in this case is only the first organization (`organization_ids=[1]`). Depending on how many nodes are connected, you can request the calculation from a different node by adjusting this value, e.g. by setting `organization_ids=[2]`. Because this is a master method, the task is first executed at the specified node and then distributed among the other nodes in the collaboration.

### 1.4 Retrieving the results 

This section shows how to retrieve the results once the server and nodes have completed the computation. Results are retrieved with `client.get_results(task_id=task.get("id"))`. Because the computation takes some time, we define a small polling function, `wait_for_results`, that checks whether the results are ready before retrieving them. The results are returned as a list of dictionaries, one per organization. Much of the information in each dictionary is not directly relevant to a researcher; the output of the algorithm is stored under the `result` key.

In [5]:
# this method becomes part of the client in a future release
import time
def wait_for_results(client, task):
    task_id = task.get("id")
    task = client.request(f"task/{task_id}")
    while not task.get("complete"):
        task = client.request(f"task/{task_id}")
        print("Waiting for results")
        time.sleep(1)
    res = client.get_results(task_id=task_id)
    return res
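
The polling function above waits indefinitely if a task never completes. A possible variant with a time limit is sketched below. It uses only the two client calls already used above (`client.request` and `client.get_results`); the function name and the `interval` and `timeout` parameters are assumptions made for this sketch.

```python
import time


def wait_for_results_with_timeout(client, task, interval=1.0, timeout=60.0):
    """Poll the task until it is complete; raise TimeoutError after `timeout` seconds."""
    task_id = task.get("id")
    deadline = time.monotonic() + timeout
    while not client.request(f"task/{task_id}").get("complete"):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} not complete after {timeout} s")
        print("Waiting for results")
        time.sleep(interval)
    return client.get_results(task_id=task_id)
```

It can be called in the same way as `wait_for_results`, for example `wait_for_results_with_timeout(client, task, timeout=120)`.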

In [6]:
# Retrieve the results, and since this is only a single organization we pop the first record in the list
res = wait_for_results(client, task)[0]
display(res.keys())
display(res['result'].keys())

Waiting for results
Waiting for results
Waiting for results
Waiting for results
Waiting for results


dict_keys(['task', 'id', 'started_at', 'log', 'assigned_at', 'result', 'input', 'finished_at', 'organization'])

dict_keys(['local_crude_rate', 'combined_crude_rate', 'local_adjusted_rate', 'combined_adjusted_rate'])

### 1.5 Display results

#### local crude rate

In [7]:
local_crude_rate = res['result']['local_crude_rate']
pd.concat(local_crude_rate).unstack()

| pref / sex | 1 | 2 | 9 |
|---|---|---|---|
| 0 | 25.583473 | 10.925984 | 16.115497 |
| 1 | 23.086337 | 16.318028 | 17.716605 |
| 2 | 35.697409 | 24.584043 | 55.478879 |


#### Combined crude rate & combined adjusted rate

In [8]:
combined_adjusted_rate = res['result']['combined_adjusted_rate']
combined_crude_rate = res['result']['combined_crude_rate']

# temporary fix: erroneous labels
combined_adjusted_rate = combined_adjusted_rate.rename(index={0:1, 1:2, 2:9})

pd.concat([combined_adjusted_rate, combined_crude_rate], keys = ["adjust. rate", "crude rate"]).unstack().transpose().droplevel(0)

|   | adjust. rate | crude rate |
|---|---|---|
| 1 | 23.689427 | 27.104122 |
| 2 | 17.201871 | 16.189863 |
| 9 | 22.664358 | 22.053807 |
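
The relabelling and reshaping in this cell can be traced on the displayed values. The sketch below rebuilds the two inputs by hand, assuming that both are single-column DataFrames and that only the crude rate carries the `sex` labels (as the outputs elsewhere in this notebook suggest); the variable names are chosen for this illustration only.

```python
import pandas as pd

# Hand-built stand-ins for the two results, using the values displayed in this notebook
adjusted = pd.DataFrame({0: [23.689427, 17.201871, 22.664358]}, index=[0, 1, 2])
crude = pd.DataFrame(
    {0: [27.104122, 16.189863, 22.053807]},
    index=pd.Index([1, 2, 9], name="sex"),
)

# Map the positional labels 0, 1, 2 onto the sex codes 1, 2, 9
adjusted = adjusted.rename(index={0: 1, 1: 2, 2: 9})

# Stack both results, then pivot so that each rate becomes a column
table = (
    pd.concat([adjusted, crude], keys=["adjust. rate", "crude rate"])
    .unstack()
    .transpose()
    .droplevel(0)
)
print(table)
```

Without the `rename` step the two inputs would not share row labels, and the combined table would contain missing values instead of aligned rows.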


In [9]:
local_adjusted_rate = res['result']['local_adjusted_rate']
# temporary fix: added the prefecture by hand
pd.concat(local_adjusted_rate, keys=[0, 1, 2]).unstack()

| sex | 1 | 2 | 9 |
|---|---|---|---|
| 0 | 22.816611 | 11.579615 | 16.421659 |
| 1 | 40.84268 | 34.519603 | 82.936761 |
| 2 | 21.72445 | 16.05038 | 17.193765 |


# 2 Option 2: Executing individual RPC methods and running the central part of the algorithm on your own machine

To run the examples below, the `v6-asr-py` package needs to be installed first. You can install it from our GitHub repository (there is no PyPI release):

```shell
pip install git+https://github.com/iknl/v6-asr-py
```

In [10]:
import importlib
asr = importlib.import_module('v6-asr-py')

## 2.1 RPC_preliminairy_results
This method calculates the crude rate and also returns the incidence population, the total local population and the total local incidence. Returning these together reduces computation time, as it requires less communication between nodes, server and client.
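
As an illustration of what a crude rate per prefecture and sex looks like, here is a minimal sketch on made-up data. It assumes the crude rate is the total incidence divided by the total population, expressed per 100,000, which is consistent with the outputs shown further down. The function name `crude_rate_sketch` is hypothetical and this is not the `v6-asr-py` implementation.

```python
import pandas as pd


def crude_rate_sketch(data, incidence, population, gender, prefacture, per=100_000):
    """Total incidence divided by total population per (prefecture, sex), scaled to `per`."""
    grouped = data.groupby([prefacture, gender])
    return grouped[incidence].sum() / grouped[population].sum() * per


# Made-up data: one prefecture, two sexes, several age classes
data = pd.DataFrame({
    "pref": [0, 0, 0],
    "sex": [1, 1, 2],
    "agec": [0, 5, 0],
    "incidence": [2, 3, 1],
    "pop": [1000, 4000, 2000],
})
rates = crude_rate_sketch(data, "incidence", "pop", "sex", "pref")
print(rates)
```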

### input
The main difference with the master method is that the `master` key has been set to `False` as this is no longer a master method. The method we want to trigger is `preliminairy_results` (note that we do not specify the `RPC_` part) and this method requires the `kwargs`: `incidence`, `population`, `gender`, `ageclass` and `prefacture`.

In [11]:
prem_res_input = {
        "method": "preliminairy_results",
        "master": False,
        "kwargs": {
            "incidence": "incidence",
            "population": "pop",
            "gender": "sex",
            "ageclass": "agec",
            "prefacture": "pref"
        }
    } 

### Task & Results

In [12]:
# Send the task to the central server
task = client.post_task(
    name="testing",
    image="harbor.vantage6.ai/algorithms/asr",
    collaboration_id=1,
    input_= prem_res_input,
    organization_ids=[1,2,3]
)

In [13]:
# Retrieve the results
preliminairy_results = wait_for_results(client, task)
preliminairy_results[0]['result'].keys()

Waiting for results
Waiting for results


dict_keys(['crude_rate', 'incidence_population', 'total_local_population', 'total_local_incidence'])

In [14]:
incidence_population_results = [res['result']["incidence_population"] for res in preliminairy_results]
incidence_population_results

[agec      0       5       10      15      20
 sex                                         
 1       4309  259642  252365  293159  398336
 2     247970  247926  242958  285445  385185
 9     507995  507568  495323  578604  783521,
 agec      0       5       10      15      20
 sex                                         
 1      84026  168907  135645  223611  166579
 2     184155  213469  186621  132161  145942
 9     195985  168251  242006  134654  142323,
 agec      0       5       10      15      20
 sex                                         
 1       5005  259642  252365  293159  398336
 2     247970  247926  242958  285445  385185
 9     507995  507568  495323  578604  783521]

In [15]:
crude_rate_results = [res['result']["crude_rate"] for res in preliminairy_results]
crude_rate_results

[pref  sex
 0     1      25.583473
       2      10.925984
       9      16.115497
 dtype: float64,
 pref  sex
 2     1      35.697409
       2      24.584043
       9      55.478879
 dtype: float64,
 pref  sex
 1     1      23.086337
       2      16.318028
       9      17.716605
 dtype: float64]

In [16]:
total_local_population_results = [res['result']["total_local_population"] for res in preliminairy_results]
total_local_population_results

[sex
 1    1207811
 2    1409484
 9    2873011
 dtype: int64,
 sex
 1    778768
 2    862348
 9    883219
 dtype: int64,
 sex
 1    1208507
 2    1409484
 9    2873011
 dtype: int64]

In [17]:
total_local_incidence_results = [res['result']["total_local_incidence"] for res in preliminairy_results]
total_local_incidence_results

[sex
 1    309
 2    154
 9    463
 dtype: int64,
 sex
 1    278
 2    212
 9    490
 dtype: int64,
 sex
 1    279
 2    230
 9    509
 dtype: int64]

## 2.2 RPC_adjusted_rate
This method calculates the adjusted rate. It relies on two helper methods: `relative_population`, which returns the standard population standardized to 100,000, and `people_at_risk`, which calculates the total number of people at risk across all nodes. The latter is used for calculating the combined adjusted rate over all nodes.
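
As an illustration of the idea behind these helpers, here is a minimal sketch of direct age standardization on made-up numbers. It assumes the standard population is rescaled so that its age classes sum to 100,000 and that each age-specific rate is weighted by its rescaled share. The functions with the `_sketch` suffix are hypothetical and are not the `v6-asr-py` implementation.

```python
import pandas as pd


def relative_population_sketch(std_pop: pd.Series, scale: float = 100_000) -> pd.Series:
    """Rescale a standard population so that its age classes sum to `scale`."""
    return std_pop / std_pop.sum() * scale


def adjusted_rate_sketch(cases: pd.Series, at_risk: pd.Series, rel_pop: pd.Series) -> float:
    """Weight each age-specific rate by the relative standard population and sum."""
    return float((cases / at_risk * rel_pop).sum())


# Made-up numbers for three age classes
age_classes = pd.Index([0, 5, 10], name="agec")
std_pop = pd.Series([50, 30, 20], index=age_classes)
cases = pd.Series([1, 2, 3], index=age_classes)
at_risk = pd.Series([1000, 2000, 1000], index=age_classes)

rel_pop_example = relative_population_sketch(std_pop)
asr_example = adjusted_rate_sketch(cases, at_risk, rel_pop_example)
print(rel_pop_example.tolist(), asr_example)
```

Because the weights sum to 100,000, the result reads as a rate per 100,000 people with the age structure of the standard population.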

In [18]:
pop_at_risk = asr.people_at_risk(incidence_population_results)
std_pop = pd.read_excel(r'..\v6-asr-py\local\std_pop.xls')
rel_pop = asr.relative_population(data=std_pop, population='pop', ageclass='agec')

In [19]:
adjusted_rate_input = {
        "method": "adjusted_rate",
        "master": False,
        "kwargs": {
            "population": "pop",
            "gender": "sex",
            "ageclass": "agec",
            "incidence": "incidence",
            "people_at_risk": pop_at_risk,
            "rel_pop": rel_pop
        }
    }

In [20]:
# Send the task to the central server
task = client.post_task(
    name="testing",
    image="harbor.vantage6.ai/algorithms/asr",
    collaboration_id=1,
    input_= adjusted_rate_input,
    organization_ids=[1,2,3]
)

In [21]:
adjusted_rate_results = wait_for_results(client, task)

Waiting for results
Waiting for results


In [22]:
adjusted_rate_glob = [res['result']['adj_rate_glob'] for res in adjusted_rate_results]
adjusted_rate_local = [res['result']['adj_rate_local'] for res in adjusted_rate_results]

In [23]:
adjusted_rate_glob

[sex
 1    8.540002
 2    4.652259
 9    7.370528
 dtype: float64,
 sex
 1    7.556692
 2    6.013214
 9    7.548449
 dtype: float64,
 sex
 1    7.592732
 2    6.536398
 9    7.745381
 dtype: float64]

In [24]:
pd.concat(adjusted_rate_glob, keys=['site1', 'site2', 'site3']).unstack()

| sex | 1 | 2 | 9 |
|---|---|---|---|
| site1 | 8.540002 | 4.652259 | 7.370528 |
| site2 | 7.556692 | 6.013214 | 7.548449 |
| site3 | 7.592732 | 6.536398 | 7.745381 |


## 2.3 Produce final results

### local_crude_rate

In [25]:
pd.concat(crude_rate_results).unstack()

| pref / sex | 1 | 2 | 9 |
|---|---|---|---|
| 0 | 25.583473 | 10.925984 | 16.115497 |
| 1 | 23.086337 | 16.318028 | 17.716605 |
| 2 | 35.697409 | 24.584043 | 55.478879 |


### combined_crude_rate

In [26]:
combined_crude_rate = asr.combined_crude_rate(total_local_incidence_results, total_local_population_results)
combined_crude_rate

| sex | 0 |
|---|---|
| 1 | 27.104122 |
| 2 | 16.189863 |
| 9 | 22.053807 |
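
These values can be reproduced from the per-site totals shown in section 2.1. The sketch below assumes the combined crude rate is the summed incidence divided by the summed population, per 100,000, which matches the numbers above. The function name is hypothetical and this is not necessarily how `asr.combined_crude_rate` is implemented.

```python
import pandas as pd

sexes = pd.Index([1, 2, 9], name="sex")

# Per-site totals copied from the outputs in section 2.1
incidence_per_site = [
    pd.Series([309, 154, 463], index=sexes),
    pd.Series([278, 212, 490], index=sexes),
    pd.Series([279, 230, 509], index=sexes),
]
population_per_site = [
    pd.Series([1207811, 1409484, 2873011], index=sexes),
    pd.Series([778768, 862348, 883219], index=sexes),
    pd.Series([1208507, 1409484, 2873011], index=sexes),
]


def combined_crude_rate_sketch(incidences, populations, per=100_000):
    """Sum incidence and population over all sites, then compute the rate per `per`."""
    return sum(incidences) / sum(populations) * per


combined_crude_check = combined_crude_rate_sketch(incidence_per_site, population_per_site)
print(combined_crude_check)
```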


### local_adjusted_rate

In [27]:
pd.concat(adjusted_rate_local, keys=['site1', 'site2', 'site3']).unstack()

| sex | 1 | 2 | 9 |
|---|---|---|---|
| site1 | 22.816611 | 11.579615 | 16.421659 |
| site2 | 40.84268 | 34.519603 | 82.936761 |
| site3 | 21.72445 | 16.05038 | 17.193765 |


### combined_adjusted_rate

In [29]:
combined_adjusted_rate = asr.combined_adjusted_rate(adjusted_rate_glob)
combined_adjusted_rate

|   | 0 |
|---|---|
| 0 | 23.689427 |
| 1 | 17.201871 |
| 2 | 22.664358 |
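
The rows are labelled 0, 1, 2 here rather than with the sex codes 1, 2, 9. The values themselves equal the sum of the per-site `adj_rate_glob` results shown in section 2.2, which the following sketch verifies while keeping the `sex` labels. It only checks the displayed numbers and makes no claim about how `asr.combined_adjusted_rate` is implemented.

```python
import pandas as pd

sexes = pd.Index([1, 2, 9], name="sex")

# Per-site `adj_rate_glob` values copied from the output in section 2.2
adj_rate_glob_per_site = [
    pd.Series([8.540002, 4.652259, 7.370528], index=sexes),
    pd.Series([7.556692, 6.013214, 7.548449], index=sexes),
    pd.Series([7.592732, 6.536398, 7.745381], index=sexes),
]

# Summing pandas Series aligns them on their index, so the sex labels are preserved
combined_adjusted_check = sum(adj_rate_glob_per_site)
print(combined_adjusted_check)
```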
