# Age Standardized Incidence Rate
The algorithm calculates the crude rate and the adjusted rate:
* Local crude rate `local_crude_rate`
* Combined crude rate `combined_crude_rate`
* Local adjusted rate `local_adjusted_rate`
* Combined adjusted rate `combined_adjusted_rate`

Limits of the algorithm:
* The entire dataset needs at least 10 records; otherwise no statistical analysis is performed
* The dataset has to match the requested formatting
* The column names given in the input need to match the column names in the dataset
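
The limits above can be checked locally before a task is sent. The following is a minimal sketch, not the algorithm's own validation code: `check_dataset` and `MIN_RECORDS` are hypothetical names, and the example data is made up.

```python
import pandas as pd

MIN_RECORDS = 10  # threshold taken from the limits listed above


def check_dataset(data: pd.DataFrame, columns: dict) -> None:
    """Hypothetical helper: raise if the dataset is too small or a column is missing."""
    if len(data) < MIN_RECORDS:
        raise ValueError(f"Dataset is too small: len(data) < {MIN_RECORDS}")
    # `columns` maps argument names (e.g. "gender") to dataset column names (e.g. "sex")
    missing = [name for name in columns.values() if name not in data.columns]
    if missing:
        raise KeyError(f"Columns not found in dataset: {missing}")


# Made-up example data with 12 records
data = pd.DataFrame({
    "incidence": [1] * 12,
    "pop": [100] * 12,
    "sex": [1, 2] * 6,
    "agec": list(range(6)) * 2,
    "pref": [0] * 12,
})
columns = {
    "incidence": "incidence",
    "population": "pop",
    "gender": "sex",
    "ageclass": "agec",
    "prefacture": "pref",
}
check_dataset(data, columns)  # passes silently when both limits are met
```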

## Master 
The master container computes the combined crude and adjusted age standardized incidence rates.

Things to keep in mind:
* If there are too few rows in the datasets, you will receive a `Dataset is too small: len(data) < 10` error
* If the columns in `input_` do not match the ones in the dataframe, you will receive a `KeyError`

In [13]:
from vantage6.client import Client
from pathlib import Path
import pandas as pd
import importlib
asr = importlib.import_module('v6-asr-py')

### input.txt
The `input.txt` file is mounted by the docker container and contains the input to the algorithm.

The input for this algorithm consists of the name of the method to call in the docker container (`master`) and a `dict` of keyword arguments holding the column names and the `standard_population`, which is used in the adjusted rate calculation. The `standard_population` must be a pandas DataFrame; it can be read from an `.xlsx` file via `pd.read_excel` or from a `.csv` file via `pd.read_csv`. It has to be provided to the node as an input parameter, because the master method takes a client as its first argument, followed by a single dataset.
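
Since the standard population may come from either file type, a small loader can pick the reader from the file extension. This is a sketch only: `load_standard_population` is a hypothetical helper, not part of the algorithm package, and reading `.xlsx` files assumes an Excel engine such as `openpyxl` is installed.

```python
from pathlib import Path

import pandas as pd


def load_standard_population(path) -> pd.DataFrame:
    """Hypothetical helper: read the standard population from .xlsx or .csv."""
    suffix = Path(path).suffix.lower()
    if suffix == ".xlsx":
        return pd.read_excel(path)  # needs an Excel engine, e.g. openpyxl
    if suffix == ".csv":
        return pd.read_csv(path)
    raise ValueError(f"Unsupported file type: {suffix!r}")
```

The returned DataFrame can then be passed as the `standard_population` entry in `kwargs`.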

In [17]:
client = Client("http://localhost", 5000, "/api")
client.authenticate("Hasan", "password1")
client.setup_encryption(None)

In [18]:
# Define algorithm input
input_ = {
        "method": "master",
        "master": True,
        "kwargs": {
            "incidence": "incidence",
            "population": "pop",
            "gender": "sex",
            "ageclass": "agec",
            "prefacture": "pref",
            "standard_population": pd.read_excel(r'C:\Users\hal2002.53340\Repositories\vantage6-repositories\v6-asr-py\v6-asr-py\local\std_pop.xlsx')
        }
    }

In [19]:
# Send the task to the central server (master method)
task = client.post_task(
    name="testing",
    image="harbor.vantage6.ai/algorithms/asr",
    collaboration_id=1,
    input_= input_,
    organization_ids=[1]
)

The task sent to the node consists of the task name (`testing`), the docker image of the algorithm (`harbor.vantage6.ai/algorithms/asr`), the collaboration id (`1`), the input (the contents of `input.txt`), and the organization id, set here to the first node with `organization_ids=[1]`. Depending on how many nodes are connected, you can request the calculation from a different node by adjusting this value, e.g. `organization_ids=[2]`. Because this is a master method, the task is first executed at the specified node and then distributed among the other nodes in the collaboration. The choice of node matters less for a distributed calculation, unless a specific node holds a dataset that, for instance, starts the calculation; it matters more when calling individual RPC methods on different nodes. An example is shown in the Individual RPC methods section below.
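
To request the same calculation from several nodes individually, the call can be repeated once per organization id. The sketch below assumes only the `client.post_task` arguments used in this notebook; `post_task_per_node` is a hypothetical helper, and the defaults simply repeat the values from the cell above.

```python
def post_task_per_node(client, input_, organization_ids,
                       name="testing",
                       image="harbor.vantage6.ai/algorithms/asr",
                       collaboration_id=1):
    """Hypothetical helper: post one task per organization id, return {org_id: task}."""
    tasks = {}
    for org_id in organization_ids:
        # Same keyword arguments as in the notebook, one node at a time
        tasks[org_id] = client.post_task(
            name=name,
            image=image,
            collaboration_id=collaboration_id,
            input_=input_,
            organization_ids=[org_id],
        )
    return tasks
```

Each returned task can then be passed to `client.get_results(task_id=task.get("id"))` as shown in the next section.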

### Interpreting the results

This section shows how to retrieve the results once the server and nodes have completed the computation. Much of what is sent back is not of interest to you as a researcher. To retrieve the results, use `client.get_results(task_id=task.get("id"))`. It is convenient to assign the result to a name, such as `res`, so that it is easier to access. The result is returned as a `list` with a dictionary nested inside it, which makes accessing the relevant information straightforward.

In [25]:
# Retrieve the results
res = client.get_results(task_id=task.get("id"))
res

[{'task': {'id': 71, 'link': '/api/task/71', 'methods': ['GET', 'DELETE']},
  'assigned_at': '2020-12-18T15:49:59.082888+00:00',
  'organization': {'id': 1,
   'link': '/api/organization/1',
   'methods': ['PATCH', 'GET']},
  'id': 133,
  'started_at': '2020-12-18T16:50:01.376516+00:00',
  'finished_at': '2020-12-18T16:51:17.995197+00:00',
  'result': {'local_crude_rate': [pref  sex
    0     1      25.583473
          2      10.925984
          9      16.115497
    dtype: float64,
    pref  sex
    1     1      23.086337
          2      16.318028
          9      17.716605
    dtype: float64,
    pref  sex
    2     1      35.697409
          2      24.584043
          9      55.478879
    dtype: float64],
   'combined_crude_rate':              0
   sex           
   1    27.104122
   2    16.189863
   9    22.053807,
   'local_adjusted_rate': [sex
    1    22.494108
    2    11.659402
    9    16.530740
    dtype: float64,
    sex
    1    20.910456
    2    16.133185
    9    17.28

Above we can see the results of the ASR calculation across three nodes, followed by a very long base64 encoded pickle containing the input as well as log and info messages. As a researcher you will rarely need the latter, so let us access the important parts. The results are returned in first-come, first-served order, so the first result here is that of the first node, as that is the node we asked the algorithm to start with by passing `organization_ids=[1]` to `client.post_task`.

As `res` is a `list`, we can access the first item (which holds the result) with `res[0]`. This gives us the nested dictionary; `res[0]['result']` then returns all the results from the nodes. To access a specific result such as `local_crude_rate`, write `res[0]['result']['local_crude_rate']`, and so on.
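
The access pattern described above can be wrapped in a small function. This is a sketch under stated assumptions: `get_result_entry` is a hypothetical helper, and `mock_res` is a hand-made stand-in that only mimics the list-of-dicts shape shown above (with the combined crude rates stored as a plain `dict` rather than a DataFrame).

```python
def get_result_entry(res, key, index=0):
    """Hypothetical helper: return res[index]['result'][key] with a clearer error."""
    result = res[index]["result"]
    if key not in result:
        raise KeyError(f"{key!r} not in result; available keys: {sorted(result)}")
    return result[key]


# Stand-in for the object returned by client.get_results (shape only)
mock_res = [{
    "id": 133,
    "organization": {"id": 1},
    "result": {"combined_crude_rate": {1: 27.104122, 2: 16.189863, 9: 22.053807}},
}]

combined = get_result_entry(mock_res, "combined_crude_rate")
print(combined[1])
```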

In [26]:
res[0]['result']['local_crude_rate']

[pref  sex
 0     1      25.583473
       2      10.925984
       9      16.115497
 dtype: float64,
 pref  sex
 1     1      23.086337
       2      16.318028
       9      17.716605
 dtype: float64,
 pref  sex
 2     1      35.697409
       2      24.584043
       9      55.478879
 dtype: float64]

We can tidy up the results a bit by wrapping them in a pandas DataFrame, as below:

In [72]:
pd.DataFrame(res[0]['result']['local_crude_rate'])[0][:1]

sex          1          2          9
0    25.583473  10.925984  16.115497
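
An alternative way to tidy the per-node results is to combine all of them into one table with one row per `pref` and one column per `sex`. The sketch below rebuilds the three Series by hand from the output shown above, so it runs without a server; `make_series` is a hypothetical helper used only to construct that stand-in data.

```python
import pandas as pd


def make_series(pref, values):
    """Hypothetical helper: build a Series indexed by (pref, sex) like the output above."""
    index = pd.MultiIndex.from_tuples(
        [(pref, sex) for sex in (1, 2, 9)], names=["pref", "sex"]
    )
    return pd.Series(values, index=index)


# Stand-in for res[0]['result']['local_crude_rate'], values copied from the output above
local_crude_rate = [
    make_series(0, [25.583473, 10.925984, 16.115497]),
    make_series(1, [23.086337, 16.318028, 17.716605]),
    make_series(2, [35.697409, 24.584043, 55.478879]),
]

# Stack the per-node Series and pivot the sex level into columns
table = pd.concat(local_crude_rate).unstack("sex")
print(table)
```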


# Individual RPC methods

## RPC_preliminairy_results
This method computes the local crude rate and also returns the incidence population and the total local population. Combining these in one method keeps the code compact and reduces the communication between server and node.
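
For orientation, a crude rate is the number of cases divided by the population, usually scaled to a fixed population size. The sketch below illustrates that textbook definition and is not the code that runs at the node: `crude_rate_sketch` is a hypothetical function, the scaling to 100000 is an assumption, and the data is made up.

```python
import pandas as pd


def crude_rate_sketch(data, incidence="incidence", population="pop",
                      by=("pref", "sex"), per=100_000):
    """Hypothetical sketch: cases / population * per, for each group in `by`."""
    grouped = data.groupby(list(by))[[incidence, population]].sum()
    return grouped[incidence] / grouped[population] * per


# Made-up data: one prefecture, two sexes, two age classes
data = pd.DataFrame({
    "pref": [0, 0, 0, 0],
    "sex": [1, 1, 2, 2],
    "agec": [1, 2, 1, 2],
    "incidence": [2, 3, 1, 1],
    "pop": [10000, 10000, 5000, 15000],
})
rates = crude_rate_sketch(data)
print(rates)
```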

### input.txt
The name of the input variable has changed here (this is only for demonstration; you can call it whatever you want). The `method` is now set to `preliminairy_results` and `master` is set to `False`, as this is no longer a master method. Because the standard population is not needed for this calculation, it can be omitted from the input arguments in `kwargs`.

In [27]:
prem_res_input = {
        "method": "preliminairy_results",
        "master": False,
        "kwargs": {
            "incidence": "incidence",
            "population": "pop",
            "gender": "sex",
            "ageclass": "agec",
            "prefacture": "pref"
        }
    } 

In [28]:
# Send the task to the central server
task = client.post_task(
    name="testing",
    image="harbor.vantage6.ai/algorithms/asr",
    collaboration_id=1,
    input_= prem_res_input,
    organization_ids=[2]
)

In [31]:
# Retrieve the results
preliminairy_results = client.get_results(task_id=task.get("id"))

In [34]:
incidence_pop = preliminairy_results[0]['result']['incidence_population']

## RPC_adjusted_rate
This method performs the adjusted rate calculation. It relies on two helper methods: `relative_population` (the standard population standardised to 100000) and `people_at_risk`, which calculates the total number of people at risk of catching the disease across all nodes; the latter is used for the combined adjusted rate calculation between all nodes. As this is a demonstration, and helper functions cannot be accessed by researchers, we compute these inputs manually here.
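
To make the idea concrete, the sketch below shows textbook direct standardisation: each age-specific rate is weighted by the standard population's share for that age class, scaled to 100000. It is an illustration under that assumption and not the package's implementation; `relative_population_sketch` and `adjusted_rate_sketch` are hypothetical names, and the data is made up.

```python
import pandas as pd


def relative_population_sketch(std_pop, population="pop", ageclass="agec", per=100_000):
    """Hypothetical sketch: scale the standard population so it sums to `per`."""
    totals = std_pop.groupby(ageclass)[population].sum()
    return totals / totals.sum() * per


def adjusted_rate_sketch(data, rel_pop, incidence="incidence", population="pop",
                         ageclass="agec", gender="sex"):
    """Hypothetical sketch: sum of age-specific rates weighted by rel_pop, per gender."""
    grouped = data.groupby([gender, ageclass])[[incidence, population]].sum()
    age_specific = grouped[incidence] / grouped[population]  # cases per person
    # Look up the standard weight for the age class of every (gender, ageclass) row
    weights = rel_pop.reindex(age_specific.index.get_level_values(ageclass)).to_numpy()
    return (age_specific * weights).groupby(level=gender).sum()


# Made-up standard population and local data
std_pop = pd.DataFrame({"agec": [1, 2], "pop": [60, 40]})
data = pd.DataFrame({
    "sex": [1, 1, 2, 2],
    "agec": [1, 2, 1, 2],
    "incidence": [2, 3, 1, 1],
    "pop": [10000, 10000, 5000, 15000],
})
rel_pop = relative_population_sketch(std_pop)
adjusted = adjusted_rate_sketch(data, rel_pop)
print(adjusted)
```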

In [36]:
pop_at_risk = asr.people_at_risk([incidence_pop])
std_pop = pd.read_excel(r'C:\Users\hal2002.53340\Repositories\vantage6-repositories\v6-asr-py\v6-asr-py\local\std_pop.xlsx')
rel_pop = asr.relative_population(data=std_pop, population='pop', ageclass='agec')

In [37]:
adjusted_rate_input = {
        "method": "adjusted_rate",
        "master": False,
        "kwargs": {
            "population": "pop",
            "gender": "sex",
            "ageclass": "agec",
            "incidence": "incidence",
            "people_at_risk": pop_at_risk,
            "rel_pop": rel_pop
        }
    }

In [38]:
# Send the task to the central server
task = client.post_task(
    name="testing",
    image="harbor.vantage6.ai/algorithms/asr",
    collaboration_id=1,
    input_= adjusted_rate_input,
    organization_ids=[1,2,3]
)

In [40]:
adjusted_rate = client.get_results(task_id=task.get("id"))
adjusted_rate

[{'task': {'id': 75, 'link': '/api/task/75', 'methods': ['GET', 'DELETE']},
  'assigned_at': '2020-12-18T15:58:33.374237+00:00',
  'organization': {'id': 1,
   'link': '/api/organization/1',
   'methods': ['PATCH', 'GET']},
  'id': 141,
  'started_at': '2020-12-18T16:58:35.728311+00:00',
  'finished_at': '2020-12-18T16:58:46.949443+00:00',
  'result': {'adj_rate_local': sex
   1    22.494108
   2    11.659402
   9    16.530740
   dtype: float64,
   'adj_rate_glob': sex
   1    22.316816
   2    11.659402
   9    16.530740
   dtype: float64},
  'input': b"\x80\x04\x95\xc1\x05\x00\x00\x00\x00\x00\x00}\x94(\x8c\x06method\x94\x8c\radjusted_rate\x94\x8c\x06master\x94\x89\x8c\x06kwargs\x94}\x94(\x8c\npopulation\x94\x8c\x03pop\x94\x8c\x06gender\x94\x8c\x03sex\x94\x8c\x08ageclass\x94\x8c\x04agec\x94\x8c\tincidence\x94h\x0c\x8c\x0epeople_at_risk\x94\x8c\x11pandas.core.frame\x94\x8c\tDataFrame\x94\x93\x94)\x81\x94}\x94(\x8c\x04_mgr\x94\x8c\x1epandas.core.internals.managers\x94\x8c\x0cBlockManage