# Step 1: Check available variables

In this notebook, we check which variables are available in our collaboration. Because we cannot see the data directly, we need another way to get basic information about it. Preferably, every node has FAIR data descriptions; if these are not available, this notebook is the first step towards identifying the local variable names.

Access to this collaboration and its connected nodes has been arranged through the central server. Using the provided username, password, and server URL, we can connect to our collaboration.

**Task: fill in the correct connection details in the cell below, and execute the first two cells**

In [None]:
vantage_broker_url = ""
vantage_broker_username = ""
vantage_broker_password = ""

vantage_broker_encryption = None
vantage_broker_port = 5000
vantage_broker_api_path = "/api"

In [None]:
# Setup client connection
from vantage6.client import Client
client = Client(vantage_broker_url, vantage_broker_port, vantage_broker_api_path, log_level='critical')
client.authenticate(vantage_broker_username, vantage_broker_password)
client.setup_encryption(vantage_broker_encryption)

We are now connected to the Vantage central server, and have access to several collaborations.

**Task: execute the cell below. To which collaboration(s) do we have access, and which nodes are available in this collaboration?**

In [None]:
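import json

# Retrieve all collaborations this user has access to
collaboration_list = client.collaboration.list()
collaboration_index = 0  # use the first collaboration in the list

# Collect the ids of all organizations in the selected collaboration
organization_ids_ = [
    organization['id']
    for organization in collaboration_list[collaboration_index]['organizations']
]

# Show the nodes that are visible to this user
print(json.dumps(client.node.list(), indent=2))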
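The cell above relies on the shape of the collaboration list: each entry is expected to carry an `id` and a list of `organizations`, each with its own `id`. The sketch below illustrates that extraction on a hand-written stand-in for the server response. The ids and the helper name `organization_ids_for` are hypothetical and only serve the illustration; the real response will contain additional fields.

```python
# Hypothetical stand-in for the value returned by client.collaboration.list().
# Only the keys used in this notebook ('id' and 'organizations') are shown,
# and the ids are made up.
example_collaborations = [
    {
        "id": 1,
        "organizations": [{"id": 10}, {"id": 11}, {"id": 12}],
    }
]


def organization_ids_for(collaborations, index=0):
    """Return the ids of all organizations in the collaboration at `index`."""
    return [org["id"] for org in collaborations[index]["organizations"]]


example_ids = organization_ids_for(example_collaborations)
print(example_ids)  # [10, 11, 12]
```

If the list is empty, or `collaboration_index` points past its end, the lookup raises an `IndexError`, which usually means the user has no access to the expected collaboration.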

Now that we know the collaboration, we can post a request to the central server. In this request, we ask to retrieve the variables available at every node. This is done by requesting execution of the Docker image `ghcr.io/maastrichtu-biss/v6-colnames-py:latest`.

**Task: execute the cell below. To which collaboration is this request being sent? Given the result of the previous cell, which nodes will be targeted to execute this variable retrieval?**

In [None]:
# Input for the algorithm: call its "master" method without arguments
input_ = {
    "master": "true",
    "method": "master",
    "args": [],
    "kwargs": {}
}

task = client.post_task(
    name="RetrieveVariables",
    image="ghcr.io/maastrichtu-biss/v6-colnames-py:latest",
    # Use the selected collaboration (index 0 is the first one associated with the user)
    collaboration_id=collaboration_list[collaboration_index]['id'],
    input_=input_,
    # Send the task only to the first organization in the collaboration
    organization_ids=[organization_ids_[0]]
)

print(json.dumps(task, indent=2))
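The request above has two parts: the input dictionary that tells the algorithm which method to run, and the list of organizations that receive the task. The sketch below separates these two decisions into small helpers so they can be inspected without a server connection. The helper names `build_master_input` and `pick_master_organization` are hypothetical, and the assumption that the master method itself distributes the work to the remaining nodes depends on how the image is implemented.

```python
def build_master_input(args=None, kwargs=None):
    """Build the input dictionary used in this notebook for a master call."""
    return {
        "master": "true",
        "method": "master",
        "args": list(args or []),
        "kwargs": dict(kwargs or {}),
    }


def pick_master_organization(organization_ids):
    """Return a one-element list with the organization that receives the task.

    This mirrors organization_ids=[organization_ids_[0]] in the cell above.
    """
    if not organization_ids:
        raise ValueError("the collaboration has no organizations")
    return organization_ids[:1]


example_input = build_master_input()
example_targets = pick_master_organization([10, 11, 12])  # made-up ids
print(example_input["method"], example_targets)  # master [10]
```

Only the first organization is addressed directly, which is worth keeping in mind when answering which nodes are targeted by the request.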

The request has been sent to the given collaboration. Now we can fetch the results. Because we do not know when the nodes will be finished, the cell below polls the server in a `while` loop: it checks the result up to 10 times, pausing 5 seconds between checks.
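The waiting procedure can also be written as a small reusable function. The following is a minimal sketch, not part of the vantage6 client: `wait_for_result` and `make_fake_fetch` are hypothetical names, and the fake fetcher only imitates a result object whose `finished_at` field stays `None` until the task is done.

```python
import time


def wait_for_result(fetch, max_attempts=10, delay_seconds=5.0):
    """Call `fetch` until the result is finished or the attempts run out."""
    result = fetch()
    attempts = 1
    while result["finished_at"] is None and attempts < max_attempts:
        time.sleep(delay_seconds)
        result = fetch()
        attempts += 1
    return result


def make_fake_fetch(ready_after):
    """Return a fetch function that reports 'finished' on call `ready_after`."""
    calls = {"n": 0}

    def fetch():
        calls["n"] += 1
        if calls["n"] >= ready_after:
            # Made-up timestamp and column names, for illustration only
            return {"finished_at": "2024-01-01T00:00:00", "result": ["age", "sex"]}
        return {"finished_at": None, "result": None}

    return fetch


finished = wait_for_result(make_fake_fetch(ready_after=3), delay_seconds=0)
print(finished["result"])  # ['age', 'sex']
```

In the notebook, `fetch` would correspond to `lambda: client.result.get(resultObjRef['id'])`. If the attempts run out first, the returned object still has `finished_at` set to `None`, so it is worth checking that field before using the result.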

**Task: execute the cell below. Which column names are available, and on how many nodes?**

In [None]:
import time
import json
resultObjRef = task.get("results")[0]
resultObj = client.result.get(resultObjRef['id'])
attempts = 1
while resultObj["finished_at"] is None and attempts < 10:
    print("waiting...")
    time.sleep(5)
    resultObj = client.result.get(resultObjRef['id'])
    attempts += 1
colnamesLists = resultObj['result']
colnamesLists
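To answer on how many nodes each column is available, the names can be counted across nodes. The sketch below assumes the result is a list containing one list of column names per node; the actual structure returned by the image may differ, and the column names used here are made up.

```python
from collections import Counter

# Hypothetical result: one list of column names per node
example_colnames = [
    ["age", "sex", "bmi"],
    ["age", "sex"],
    ["age", "bmi", "height"],
]


def count_columns(per_node_columns):
    """Count on how many nodes each column name occurs."""
    counts = Counter()
    for columns in per_node_columns:
        counts.update(set(columns))  # count each name once per node
    return counts


column_counts = count_columns(example_colnames)
# Columns that are present on every node
common_columns = sorted(
    name for name, n in column_counts.items() if n == len(example_colnames)
)

print(dict(sorted(column_counts.items())))  # {'age': 3, 'bmi': 2, 'height': 1, 'sex': 2}
print(common_columns)  # ['age']
```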