# Remote Data Science

The purpose of this notebook is to describe the end goal for how Syft and Grid will facilitate privacy-preserving data science. We hope it will drive discussion about the features and use cases of the tools we build. Note that at the time of writing, none of this functionality exists yet.

Scenario: you are conducting research to predict what it takes to get a good night's sleep. The project is split into two pieces:

- Federated Learning: train a classifier to predict whether someone will get a good night's sleep based on various input factors

- Federated Analytics: use the classifier to estimate the amount of sleep that the population of the USA is getting (importantly including folks who do not record their own sleep data).

# Step 1: Imports

To begin, we must import syft.

#### Development Notes: 
_From a client perspective, we want to maximize convenience and minimize the number of dependencies one needs to install to work with PyGrid. Ideally, users install only one Python package in order to work with all of PyGrid. I like the current design in Syft 0.2.x, where the grid clients live in a grid package inside Syft. What we definitely want to avoid is requiring PyGrid users to install all of the dependencies needed to run grid nodes (Flask, databases, etc.) just to interact with the grid. Putting grid inside Syft solves this as well._
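One way to keep node-only packages (Flask, database drivers) out of a client install is to import them lazily inside node code and fail with an actionable message when they are missing. The sketch below assumes a hypothetical `syft[node]` pip extra; neither that extra nor the helper name `require_node_dependency` is part of Syft.

```python
import importlib
from types import ModuleType


def require_node_dependency(module_name: str, extra: str = "node") -> ModuleType:
    """Import a server-side dependency only when a grid node actually needs it.

    The "syft[<extra>]" install hint is hypothetical; it stands in for however
    the node dependencies end up being packaged.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is only needed to run a grid node; "
            f"install it with: pip install 'syft[{extra}]'"
        ) from exc
```

Client-side code never calls this helper, so a data scientist who only talks to the grid never pays for Flask or a database driver.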

In [309]:
import syft as sy
from syft import grid as gr

# Step 2: View our Available Networks

In this step, we check which data networks we are connected to, so that we can search them for data relating to sleep. Conveniently, PySyft remembers the networks we've used in previous experiments. The list of "known networks" can be displayed by executing `gr.networks`, as below.

### Development Notes

_By default, it would be really great if we could support a combination of two lists of networks:_

- networks which all users of PySyft have by default (OpenGrid)
- a history of all networks previously accessed (stored in some local config file)

_We should be able to view these available networks by calling `gr.networks`, which should pretty-print information about them. Below, we show one way to pretty-print them using just a Pandas table._
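The two lists above could be merged with a few lines of code. This is a minimal sketch assuming the history is a JSON file mapping network names to URLs; the file format, the `known_networks` name, and the placeholder OpenGrid URL are all hypothetical. It also assumes built-in entries win when a name appears in both lists.

```python
import json
from pathlib import Path
from typing import Dict

# Hypothetical built-in networks shipped with the client (placeholder URL).
DEFAULT_NETWORKS: Dict[str, str] = {"OpenGrid": "https://opengrid.example/pygrid"}


def known_networks(history_file: Path) -> Dict[str, str]:
    """Merge the built-in networks with the locally saved history (name -> URL)."""
    merged = dict(DEFAULT_NETWORKS)
    if history_file.exists():
        history = json.loads(history_file.read_text())
        for name, url in history.items():
            merged.setdefault(name, url)  # built-in entries win on name clashes
    return merged
```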

In [320]:
gr.networks

| name | id | datasets | models | domains | online | registered | server-domains | mobile-domains |
|---|---|---|---|---|---|---|---|---|
| OpenGrid | 235252 | 235262 | 2352 | 2532 | 2352 | 23 | 100% | 0% |
| FitByte | 235252 | 34734 | 2352 | 2532 | 2352 | 23 | 2% | 98% |
| DeepMined | 235252 | 34734 | 2352 | 2532 | 2352 | 23 | 100% | 0% |
| OpanAI | 235252 | 0 | 2352 | 2532 | 2352 | 23 | 2% | 98% |
| Damonios Pizza | 235252 | 0 | 2352 | 2532 | 2352 | 23 | 99% | 1% |
| MyFitnessPal | 235252 | 0 | 2352 | 2532 | 2352 | 23 | 50% | 50% |
| TrackMyRun | 235252 | 0 | 2352 | 2532 | 2352 | 23 | 0% | 100% |
| Netflax | 235252 | 935685 | 7473 | 346 | 216 | 0 | 54% | 46% |
| AMA | 634252 | 2352 | 236622 | 53 | 52 | 23 | 100% | 0% |
| CDC | 745742 | 35 | 0 | 5 | 5 | 5 | 100% | 0% |


This table displays several useful pieces of information:

- Id: the unique id of the network
- Name: the name of the network
- Datasets: the number of datasets currently hosted on the network
- Models: the number of models currently hosted on the network (public and private)
- Domains: the number of domains registered to this network (e.g., the number of individual hospitals)
- Online: the number of domains which are currently online
- Registered: the number of domains for which this user already has an account
- Server-domains: the percentage of domains which are server based (cloud/on-prem compute cluster based grid nodes)
- Mobile-domains: the percentage of domains which are mobile based (smartphone based grid nodes)

From this list alone, we can already get some idea of what kinds of datasets may be available to us:
- FitByte seems like a good source of gold-standard "sleep data"
- Netflax seems like it might have some information on what people are doing before bed (watching TV)

# Step 3: Local Wallet


### Development Notes: 

_Somewhere in the local filesystem, we need to save all of the keys/logins this user has for various domains around the world, and we should be able to view them here. Note that this list is what determines the "Registered" count for each network._

In [318]:
gr.wallet.domain_keys

| | network | domain | pubkey | prikey |
|---|---|---|---|---|
| 0 | OpenGrid | PatrickCason | ee38c01ebb00ca7f81b9d750b6b4ab5a05bc7df17597e0... | 409eb63fc539b14519e52b211425bc867fb6de24937e19... |
| 1 | OpenGrid | AndrewTrask | 59ba81e62422cc1712da69395c89e1b219dec8ba62a9b9... | aa72728997d8e415d4fe3862e61c04008c2cc0b33289e6... |
| 2 | OpenGrid | TudorCebere | 474625512f1367e02725d10493dbd15d065e5faf70044b... | c4e613209e03267447d510e380b6362911520980ed6e81... |
| 3 | OpenGrid | JasonMancuso | d98cd930b320381e0daa9aeab846c18687f7b435021754... | 1d3091539f1385543c7b26da0caf7c75281a1220a55bab... |
| 4 | OpenGrid | BobbyWagner | 2afc0eb19a04985a3f2df9969a9990a8a77fc795b6f722... | e3178a15e70a03bb68881cee0007fa16235baa478d5eab... |
| 5 | AMA | UCSF | d0f69c67dec230d0d4d47a06d492e5748b0962b53fdbef... | e2f331e7f880a8387b6e58c3a4caa2e2fd623bc8fcb3fa... |
| 6 | AMA | Vanderbilt | 3a5f8bb2863d26ff8f7408db0a8bc01fd91f483345ab60... | f69db0604db1fbf66a98c09117bbc93b1f9ecd20258bc8... |
| 7 | AMA | MDAnderson | a618a77a8e4c2e66be63b2120aa78bd28fa3c5dd13c4d3... | d4548e6df59a4a49879e3bde8c7c88dbb7807aadc80134... |
| 8 | AMA | BostonGeneral | add88d6972c90b52c15daffefb18ef563c6576a0aee85b... | 9a6f10e774c7fe927da944b99b5665bc9c44cd75ca6e12... |
| 9 | AMA | HCA | 2759287a714fa232cac7b1cdf08828f36c30a4e121853e... | b5f07fdd8ecec1fea43d01bee21a0b4e9eb8423ac0428f... |
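Because the wallet holds private keys, the file behind it should be readable by its owner only. Here is a minimal sketch assuming a JSON list of `{"network", "domain", "pubkey", "prikey"}` records; the on-disk format, the `~/.syft/wallet.json` location, and the function names are hypothetical.

```python
import json
import os
from pathlib import Path
from typing import Dict, List

# Hypothetical default location for the wallet file.
DEFAULT_WALLET_PATH = Path.home() / ".syft" / "wallet.json"


def save_wallet(path: Path, entries: List[Dict[str, str]]) -> None:
    """Write wallet entries to a file that only the owner can read (on POSIX)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Mode 0o600 applies when the file is created; chmod also tightens an existing file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        json.dump(entries, f, indent=2)
    os.chmod(path, 0o600)


def load_wallet(path: Path) -> List[Dict[str, str]]:
    """Return the saved entries, or an empty wallet if none exists yet."""
    if not path.exists():
        return []
    return json.loads(path.read_text())
```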


# Step 4: Adding Another Network

We should be able to add another network by simply dropping in the URL of the network node (much like adding another PyPI or npm repository).
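Under the hood, adding a network needs at least a sanity check on the URL and a place to record it. This sketch assumes the display name is already known (in the notebook it would come back from the network itself) and accepts the `http(s)` and `ws(s)` schemes that appear in this notebook; `add_network` is a hypothetical helper.

```python
from typing import Dict
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https", "ws", "wss"}


def add_network(history: Dict[str, str], name: str, url: str) -> Dict[str, str]:
    """Return a copy of the history with `name -> url` added, rejecting malformed URLs."""
    parsed = urlparse(url)
    if parsed.scheme not in ALLOWED_SCHEMES or not parsed.netloc:
        raise ValueError(f"not a valid network URL: {url!r}")
    updated = dict(history)
    updated[name] = url.rstrip("/")  # normalize so trailing slashes don't create duplicates
    return updated
```

Returning a new dictionary instead of mutating the history keeps a failed validation from leaving a half-written entry behind.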

In [293]:
gr.save_network('http://nhs.co.uk/pygrid') # it's a network

Connecting... SUCCESS!


| name | id | datasets | models | domains | online | registered |
|---|---|---|---|---|---|---|
| NHS | 2352 | 86585 | 6585 | 5 | 5 | 0 |


In [294]:
gr.networks

| name | id | datasets | models | domains | online | registered |
|---|---|---|---|---|---|---|
| OpenGrid | 235252 | 235262 | 2352 | 2532 | 2352 | 23 |
| AMA | 634252 | 2352 | 236622 | 53 | 52 | 23 |
| CDC | 745742 | 35 | 0 | 5 | 5 | 5 |
| NHS | 2352 | 86585 | 6585 | 5 | 5 | 0 |


Before we continue, let's clarify what these rows represent. A "network", as you may have guessed, is a hosted service which exists to help you find data. More precisely, it helps you find just about any kind of data science object you might be looking for (data, compute, models, etc.), as long as that object exists within the network.

The members of a network are called "domains" (for example, the individual hospitals registered with a network like AMA).
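The per-network columns shown earlier (domains, online, registered, server/mobile share) can all be derived from the network's member domains plus the local wallet. A minimal sketch, assuming each domain reports a name, an online flag, and a kind of either `"server"` or `"mobile"`; the `Domain` class and `summarize_network` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Domain:
    name: str
    online: bool
    kind: str  # assumed to be either "server" or "mobile"


def summarize_network(domains: List[Domain], wallet_domains: Set[str]) -> Dict[str, object]:
    """Derive one row of the networks table from its member domains."""
    n = len(domains)
    servers = sum(d.kind == "server" for d in domains)
    server_pct = round(100 * servers / n) if n else 0
    mobile_pct = 100 - server_pct if n else 0  # every domain is server or mobile
    return {
        "domains": n,
        "online": sum(d.online for d in domains),
        # "Registered" = domains we already hold keys for in the local wallet.
        "registered": sum(d.name in wallet_domains for d in domains),
        "server-domains": f"{server_pct}%",
        "mobile-domains": f"{mobile_pct}%",
    }
```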

# Step 5: Pre-Search Background

Now that we are connected to a variety of domains across several networks, we want to begin doing some data science. First, we need to ask the question, "What are we actually looking for?" You might say, "Datasets, of course!" However, there's much more to finding data than meets the eye, and there are a few data structures we need to know about first.

To begin, let's consider the dataset we're looking for. Let's say we want to find a dataset about COVID and diabetes. Perhaps we're interested in studying the relationship between contracting COVID and having diabetes as a pre-existing condition.

To do so, we can perform a search for datasets. A search can return the following kinds of objects:

- Dataset: a dataset object existing within a single Domain. It consists of:
    - name (public - required): the name of the dataset
    - id (public - required): the uid of the dataset
    - frameworks (public - required): the available frameworks for this dataset (derived from the frameworks the worker supports). Grouped into train, dev, and test.
    - tensors (public - required): a name->tensor dictionary enumerating the dataset's tensors (stored by default as pandas DataFrames)
    - schema (public - required): the DatasetSchema of the dataset, i.e., the name->schema mapping for each TensorSchema. Identical across train, dev, and test.
    - tags (public - required): a list of tags affiliated with this dataset
    - description (public - required): a free-text description of the dataset
    - raw (private - optional): the raw version of the dataset (such as a CSV file, free-text file, etc.)
    - metadata (public - optional): additional metadata someone wants to attach to this dataset. We assume all of this data is public.
    - worst_case_user_budget: inferred values based on the worst-case user-budget parameters within the dataset's tensors (see Tensor.worst_case_user_budget)
    - private: does the dataset contain private tensors?
    - user_budget (public - required): the per-user privacy budget parameters for this dataset:
        - lifetime_train: the total epsilon from the training dataset which can be published to the greater public (i.e., when a data scientist intends to release a number openly)
        - lifetime_dev: the total epsilon from the dev dataset which can be published to the greater public (i.e., when a data scientist intends to release a number openly)
        - lifetime_test: the total epsilon from the test dataset which can be published to the greater public (i.e., when a data scientist intends to release a number openly)
        - user_lifetime_train: the total epsilon each data scientist gets when interacting with the training dataset
        - user_lifetime_dev: the total epsilon each data scientist gets when interacting with the dev dataset
        - user_lifetime_test: the total epsilon each data scientist gets when interacting with the test dataset
        - daily_auto_train: the amount of epsilon each data scientist gets per day on the training dataset for sample statistics that don't require compliance officer review
        - daily_auto_dev: the amount of epsilon each data scientist gets per day on the dev dataset for sample statistics that don't require compliance officer review
        - daily_auto_test: the amount of epsilon each data scientist gets per day on the test dataset for sample statistics that don't require compliance officer review
        - query_auto_train: the maximum amount of epsilon one query can return without officer review when interacting with the training dataset
        - query_auto_dev: the maximum amount of epsilon one query can return without officer review when interacting with the dev dataset
        - query_auto_test: the maximum amount of epsilon one query can return without officer review when interacting with the test dataset
    
- Tensor:
    - name: the name of the tensor
    - schema (required - public - TensorSchema object): the schema of the tensor (type, name, and description for each column)
    - mock (generated): a mock tensor generated from the TensorSchema
    - id: the uid of the tensor
    - data: the tensor's values
    - tags (optional): a list of tags affiliated with this tensor
    - description (optional): a free-text description of the tensor
    - shape (required - public): the shape of the tensor
    - value: the tensor itself
    - private: is the tensor a private tensor?
    - sensitivity (optional): the sensitivity metadata for the tensor
        - h (public - derived from schema): the maximum values the tensor can take on
        - l (public - derived from schema): the minimum values the tensor can take on
        - e^h (private): the maximum contributions from entities, initialized with the tensor
        - e^l (private): the minimum contributions from entities, initialized with the tensor
    - accountant (private): a reference to the global privacy accountant
    - worst_case_user_budget: inferred values based on the worst-case user-budget parameters across the entities in the tensor (see Entity.user_budget)

- Entity:
    - uid (required, randomly generated, public)
    - metadata (optional)
    - user_budget (public - required): the per-user privacy budget parameters for this entity:
        - lifetime_train: the total epsilon from the training dataset which can be published to the greater public (i.e., when a data scientist intends to release a number openly)
        - lifetime_dev: the total epsilon from the dev dataset which can be published to the greater public (i.e., when a data scientist intends to release a number openly)
        - lifetime_test: the total epsilon from the test dataset which can be published to the greater public (i.e., when a data scientist intends to release a number openly)
        - user_lifetime_train: the total epsilon each data scientist gets when interacting with the training dataset
        - user_lifetime_dev: the total epsilon each data scientist gets when interacting with the dev dataset
        - user_lifetime_test: the total epsilon each data scientist gets when interacting with the test dataset
        - daily_auto_train: the amount of epsilon each data scientist gets per day on the training dataset for sample statistics that don't require compliance officer review
        - daily_auto_dev: the amount of epsilon each data scientist gets per day on the dev dataset for sample statistics that don't require compliance officer review
        - daily_auto_test: the amount of epsilon each data scientist gets per day on the test dataset for sample statistics that don't require compliance officer review
        - query_auto_train: the maximum amount of epsilon one query can return without officer review when interacting with the training dataset
        - query_auto_dev: the maximum amount of epsilon one query can return without officer review when interacting with the dev dataset
        - query_auto_test: the maximum amount of epsilon one query can return without officer review when interacting with the test dataset


- TensorSchema:
    - name: the name of the schema 
    - columns: each column has a type, name, and description for the column

- SchemaColumn:
    - type
    - name
    - description
    - vocabulary (optional - for text datasets)
    
- DatasetSchema: the schema of a dataset. Importantly, we encourage datasets in multiple locations to intentionally subscribe to the same schema, so as to best facilitate Federated Learning.

- DistributedDataset: a virtual object which refers to a collection of Dataset objects that all subscribe to the same DatasetSchema. It is convenient because it gives you fast access to datasets at multiple institutions which are appropriate to train on together.
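To make the relationships above concrete, here is a minimal sketch of part of this data model using Python dataclasses. It fills in two things the notes leave open, both assumptions: a "worst case" budget is taken to be the field-wise minimum (the most restrictive limit wins), and a DistributedDataset is simply every Dataset sharing a DatasetSchema name. Most fields are omitted, and the names `UserBudget`, `worst_case`, and `group_distributed` are illustrative, not a Syft API.

```python
from collections import defaultdict
from dataclasses import dataclass, field, fields
from typing import Dict, List


@dataclass(frozen=True)
class UserBudget:
    """Per-user epsilon limits, one field per budget parameter listed above."""
    lifetime_train: float
    lifetime_dev: float
    lifetime_test: float
    user_lifetime_train: float
    user_lifetime_dev: float
    user_lifetime_test: float
    daily_auto_train: float
    daily_auto_dev: float
    daily_auto_test: float
    query_auto_train: float
    query_auto_dev: float
    query_auto_test: float


def worst_case(budgets: List[UserBudget]) -> UserBudget:
    """Assumed rule: take the field-wise minimum, i.e. the most restrictive limit."""
    return UserBudget(
        **{f.name: min(getattr(b, f.name) for b in budgets) for f in fields(UserBudget)}
    )


@dataclass
class Entity:
    uid: str
    user_budget: UserBudget


@dataclass
class Tensor:
    name: str
    entities: List[Entity]

    @property
    def worst_case_user_budget(self) -> UserBudget:
        return worst_case([e.user_budget for e in self.entities])


@dataclass
class Dataset:
    name: str
    domain: str
    schema: str  # name of the DatasetSchema, e.g. "COVID-MORT-2"
    tensors: Dict[str, Tensor] = field(default_factory=dict)

    @property
    def worst_case_user_budget(self) -> UserBudget:
        return worst_case([t.worst_case_user_budget for t in self.tensors.values()])


def group_distributed(datasets: List[Dataset]) -> Dict[str, List[Dataset]]:
    """Treat all datasets sharing a DatasetSchema as one DistributedDataset."""
    groups: Dict[str, List[Dataset]] = defaultdict(list)
    for ds in datasets:
        groups[ds.schema].append(ds)
    return dict(groups)
```

Computing the worst case bottom-up (entity, then tensor, then dataset) mirrors how the notes define `worst_case_user_budget` at each level.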

### Search

We can search for any of the objects mentioned above by performing a query like this.

In [295]:
# string search anywhere in an object's public metadata
result = gr.search(anywhere="diabetes")

# can pass in a list of strings as well
result = gr.search(anywhere=["diabetes"])
result = gr.search(anywhere=["diabetes", "dementia"], require_all=True)

# can also search for tags
result = gr.search(tags="diabetes")
result = gr.search(tags=["diabetes"], require_all=True)

# can also search on the names of objects
result = gr.search(name_includes="MNIST")
result = gr.search(name_exact="MNIST")

# can also search on the description of objects
result = gr.search(description="diabetes COVID mortality")
result = gr.search(description=["diabetes", "COVID", "mortality"])

# can also only return objects of a specific type
result = gr.search(description=["diabetes", "COVID", "mortality"], types=["dataset", "model"])

# search_regex takes the same arguments, with strings treated as regular expressions
result = gr.search_regex(anywhere="diabetes")

# can pass in a list of strings as well
result = gr.search_regex(anywhere=["diabetes"])
result = gr.search_regex(anywhere=["diabetes", "dementia"], require_all=True)

# can also search for tags
result = gr.search_regex(tags="diabetes")
result = gr.search_regex(tags=["diabetes"], require_all=True)

result = gr.search_regex(name_includes="MNIST")
result = gr.search_regex(name_exact="MNIST")

result = gr.search_regex(description="diabetes COVID mortality")
result = gr.search_regex(description=["diabetes", "COVID", "mortality"])

result()

| | distributed_datasets | datasets | tensors | dataset_schemas | tensor_schemas | models | model_schemas |
|---|---|---|---|---|---|---|---|
| 0 | 23 | 75474 | 947467 | 532 | 23 | 235 | 62 |
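The keyword arguments in the search cell suggest a simple filtering model. Here is a minimal in-memory sketch of one plausible interpretation: case-insensitive substring matching, OR across terms by default and AND when `require_all=True`, and `name_exact` compared case-sensitively. The `description` and `search_regex` variants are left out, and `CatalogEntry` and `search` are hypothetical stand-ins for whatever a network would actually index.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Sequence, Union

Terms = Union[str, Sequence[str]]


@dataclass
class CatalogEntry:
    """Public metadata for one searchable object (hypothetical shape)."""
    name: str
    kind: str  # e.g. "dataset", "model"
    description: str = ""
    tags: List[str] = field(default_factory=list)


def _as_list(terms: Terms) -> List[str]:
    # Accept a single string or a list of strings, like gr.search above.
    return [terms] if isinstance(terms, str) else list(terms)


def search(
    catalog: Iterable[CatalogEntry],
    anywhere: Optional[Terms] = None,
    tags: Optional[Terms] = None,
    name_includes: Optional[str] = None,
    name_exact: Optional[str] = None,
    require_all: bool = False,
    types: Optional[Sequence[str]] = None,
) -> List[CatalogEntry]:
    """Case-insensitive keyword filter over public metadata."""
    combine = all if require_all else any  # require_all=True means AND, else OR
    results = []
    for entry in catalog:
        if types is not None and entry.kind not in types:
            continue
        if name_exact is not None and entry.name != name_exact:
            continue
        if name_includes is not None and name_includes.lower() not in entry.name.lower():
            continue
        if tags is not None:
            entry_tags = {t.lower() for t in entry.tags}
            if not combine(t.lower() in entry_tags for t in _as_list(tags)):
                continue
        if anywhere is not None:
            text = " ".join([entry.name, entry.description, *entry.tags]).lower()
            if not combine(t.lower() in text for t in _as_list(anywhere)):
                continue
        results.append(entry)
    return results
```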


In [296]:
# show the individual dataset results
result.datasets(latest_version_only=True, ignore_duplicates=False, require_gpu=False)

| name | network | domain | id | upload-date | version | frameworks | train_rows | dev_rows | test_rows | schema | tags | description | private | metadata | gpu_available |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| COVID Mortality | UCSF | UCSF | 4aa8c46467e8fd8c17973d84ac9f2947df4e1320d33d9f... | 12/18/2019 | 1 | PT/TF/PD/NP/JX | 2626 | 353 | 366 | COVID-MORT-2 | #covid #or... | This is the official statistics for COVID deat... | True | {'collected':2019} | True |
| US COVID Deaths | CDC | Atlanta | 574b36c02ce19f32c0988fe234ac40e2555008a7d40f63... | 1/18/2020 | 23 | PT/TF/PD/NP/JX | 34632 | 355 | 0 | COVID-MORT-2 | #covid #or... | Nationally reported on a daily basis, this dat... | True | {'collected':2020} | False |
| US COVID Deaths | CDC | Chicago | 574b36c02ce19f32c0988fe234ac40e2555008a7d40f63... | 1/18/2020 | 23 | PT/TF/PD/NP/JX | 34632 | 355 | 0 | COVID-MORT-2 | #covid #or... | Nationally reported on a daily basis, this dat... | True | {'collected':2020} | True |
| COVID Deaths | AMA | Boston General | 4b69b2f79580a919b27cbd4cf31c111bc1cf6c5a4355e6... | 2/20/2020 | 2 | PT/TF/PD/NP/JX | 2352 | 335 | 0 | COVID-MORT-2 | #covid #or... | With attributes including risk factors like di... | True | {'collected':2020} | True |
| Diabetes Pump Trial Data | AMA | Boston General | fa0cca78d9f564923597f95f389b9dc7bceb253f15db65... | 1/3/2018 | 26 | PT/TF/PD/NP/JX | 23267 | 335 | 3463 | AMA-DIABETES-TRIAL-252 | #diabetes #or... | In 2018, the American Medical Association... | True | {'collected':2018} | True |


In [305]:
# latest_version_only=True by default

result.distributed()

| schema-name | networks | domains | popular_dataset_description | n_datasets | n_datasets_dedup | train_rows | dev_rows | test_rows | max_train_rows_per_domain | max_dev_rows_per_domain | max_test_rows_per_domain |
|---|---|---|---|---|---|---|---|---|---|---|---|
| COVID-MORT-2 | UCSF,CDC,AMA | UCSF,Atlanta,Chicago,Boston General | Nationally reported on a daily basis, this dat... | 4 | 3 | 39610 | 1063 | 366 | 34632 | 355 | 366 |
| AMA-DIABETES-TRIAL-252 | AMA | Boston General | In 2018, the American Medical Association... | 4 | 3 | 23267 | 335 | 3463 | 23267 | 335 | 3463 |
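The `distributed()` view rolls individual datasets up by schema. A minimal sketch of that roll-up, assuming two rows with the same `id` are one dataset mirrored at several domains (as with the CDC dataset at Atlanta and Chicago): row totals count each unique dataset once, while the per-domain maxima look at every domain. Applied to the four COVID-MORT-2 rows above, it gives 4 datasets, 3 after deduplication, 39610 train rows, 366 test rows, and per-domain maxima of 34632 (train) and 366 (test). `DatasetRow` and `summarize_by_schema` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class DatasetRow(NamedTuple):
    schema: str
    domain: str
    id: str
    train_rows: int
    dev_rows: int
    test_rows: int


def summarize_by_schema(rows: List[DatasetRow]) -> Dict[str, Dict[str, int]]:
    """Aggregate dataset rows into one summary per DatasetSchema."""
    groups: Dict[str, List[DatasetRow]] = defaultdict(list)
    for row in rows:
        groups[row.schema].append(row)

    summary: Dict[str, Dict[str, int]] = {}
    for schema, members in groups.items():
        # Same id at several domains = one dataset hosted in several places.
        unique = list({r.id: r for r in members}.values())
        out = {"n_datasets": len(members), "n_datasets_dedup": len(unique)}
        for split in ("train", "dev", "test"):
            col = f"{split}_rows"
            out[col] = sum(getattr(r, col) for r in unique)
            per_domain: Dict[str, int] = defaultdict(int)
            for r in members:
                per_domain[r.domain] += getattr(r, col)
            out[f"max_{split}_rows_per_domain"] = max(per_domain.values())
        summary[schema] = out
    return summary
```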


# Step 6: Select and Allocate Compute

Now that we have found an interesting distributed dataset, we need to set up some compute to do our analysis. Importantly, we can't use just any compute: we have to use compute that is co-located with each dataset we want to analyze. For example, part of the COVID-MORT-2 distributed dataset we found above lives in UCSF's datacenters, so we need to spin up some compute within UCSF's "Domain". "Domain" is the official term we use for "all the data and compute within the ownership and jurisdiction of a single entity, known as the domain owner".

Since the dataset we're most interested in, "COVID-MORT-2", is distributed across multiple domain owners, we need to set up compute within each one.
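A first planning step could be to compare the domains hosting the distributed dataset (the `domains` column of `result.distributed()`) against the domains already in the local wallet: where we hold keys we can request compute directly, and everywhere else we must sign up first. This is a sketch under that assumption; `plan_compute` is a hypothetical helper and matches domains by exact name.

```python
from typing import Dict, Iterable, List


def plan_compute(dataset_domains: Iterable[str], wallet_domains: Iterable[str]) -> Dict[str, List[str]]:
    """Split the domains hosting a distributed dataset by whether we hold credentials there."""
    have = set(wallet_domains)
    plan: Dict[str, List[str]] = {"ready": [], "needs_signup": []}
    for domain in dict.fromkeys(dataset_domains):  # de-duplicate while keeping order
        plan["ready" if domain in have else "needs_signup"].append(domain)
    return plan
```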

In [319]:
import pandas as pd

import os
from binascii import hexlify

def get_key():
    key = hexlify(os.urandom(32)).decode()
    return key

class Grid():
    ""
class Wallet():
    ""
gr = Grid()
gr.wallet = Wallet()

# Ignore this: it only exists to support the mock API (run this cell before the cells above)
columns=["network", "domain", "pubkey", "prikey"]
data = [["OpenGrid", "PatrickCason", get_key(), get_key()],
       ["OpenGrid", "AndrewTrask", get_key(), get_key()],
       ["OpenGrid", "TudorCebere", get_key(), get_key()],
       ["OpenGrid", "JasonMancuso", get_key(), get_key()],
       ["OpenGrid", "BobbyWagner", get_key(), get_key()],
       ["AMA", "UCSF", get_key(), get_key()],
       ["AMA", "Vanderbilt", get_key(), get_key()],
       ["AMA", "MDAnderson", get_key(), get_key()],
       ["AMA", "BostonGeneral", get_key(), get_key()],
       ["AMA", "HCA", get_key(), get_key()],
       ["CDC", "Atlanta", get_key(), get_key()],
       ["CDC", "New York", get_key(), get_key()],
       ]
domain_keys = pd.DataFrame(columns=columns, data=data)
gr.wallet.domain_keys = domain_keys

# Ignore this: mock data behind gr.networks
columns=["id", "name", "datasets", "models", "domains", "online", "registered", "server-domains", "mobile-domains"]
data = [[235252, "OpenGrid", 235262, 2352, 2532, 2352, 23, "100%", "0%"],
       [235252, "FitByte", 34734, 2352, 2532, 2352, 23, "2%", "98%"],
       [235252, "DeepMined", 34734, 2352, 2532, 2352, 23, "100%", "0%"], 
       [235252, "OpanAI", 0, 2352, 2532, 2352, 23, "2%", "98%"],  
       [235252, "Damonios Pizza", 0, 2352, 2532, 2352, 23, "99%", "1%"],   
       [235252, "MyFitnessPal", 0, 2352, 2532, 2352, 23, "50%", "50%"],           
       [235252, "TrackMyRun", 0, 2352, 2532, 2352, 23, "0%", "100%"],                   
       [235252, "Netflax", 935685, 7473, 346, 216, 0, "54%", "46%"],        
       [634252, "AMA", 2352, 236622, 53, 52, 23, "100%", "0%"],
       [745742, "CDC", 35, 0, 5, 5, 5, "100%", "0%"]]
networks = pd.DataFrame(columns=columns, data=data)
networks = networks.set_index("name")    
gr.networks = networks

def save_network(url):
    # Mock: ignores `url` and always "connects" to the NHS network.
    columns=["id", "name", "datasets", "models", "domains", "online", "registered"]
    data = [[2352, "NHS", 86585, 6585, 5, 5, 0]]
    network = pd.DataFrame(columns=columns, data=data)
    network = network.set_index("name")
    
    gr.networks = pd.concat([gr.networks, network])
    print("Connecting... SUCCESS!")
    return network

gr.save_network = save_network

def search_diabetes(*args, **kwargs):
    
    columns=["distributed_datasets", "datasets", "tensors", "dataset_schemas", "tensor_schemas", "models", "model_schemas"]
    data = [[23, 75474, 947467, 532, 23, 235, 62]]
    nets = pd.DataFrame(columns=columns, data=data)
    
    # Shared id: the same CDC dataset is hosted at two domains (Atlanta and Chicago).
    key2 = get_key()
    
    columns=["name", "network", "domain", "id", "upload-date", "version", "frameworks", "train_rows", "dev_rows", "test_rows", "schema", "tags", "description", "private", "metadata", "gpu_available"]
    data = [["COVID Mortality", "UCSF", "UCSF", get_key(), "12/18/2019", "1", "PT/TF/PD/NP/JX", 2626, 353, 366, "COVID-MORT-2", "#covid #or...", "This is the official statistics for COVID deaths within...", "True", "{'collected':2019}", "True"],
           ["US COVID Deaths", "CDC", "Atlanta", key2, "1/18/2020", "23", "PT/TF/PD/NP/JX", 34632, 355, 0, "COVID-MORT-2", "#covid #or...", "Nationally reported on a daily basis, this dataset includes", "True", "{'collected':2020}", "False"],
           ["US COVID Deaths", "CDC", "Chicago", key2, "1/18/2020", "23", "PT/TF/PD/NP/JX", 34632, 355, 0, "COVID-MORT-2", "#covid #or...", "Nationally reported on a daily basis, this dataset includes", "True", "{'collected':2020}", "True"],            
           ["COVID Deaths", "AMA", "Boston General", get_key(), "2/20/2020", "2", "PT/TF/PD/NP/JX", 2352, 335, 0, "COVID-MORT-2", "#covid #or...", "With attributes including risk factors like diabetes...", "True", "{'collected':2020}", "True"],
           ["Diabetes Pump Trial Data", "AMA", "Boston General", get_key(), "1/3/2018", "26", "PT/TF/PD/NP/JX", 23267, 335, 3463, "AMA-DIABETES-TRIAL-252", "#diabetes #or...", "In 2018, the American Medical Association...", "True", "{'collected':2018}", "True"]]
    datasets = pd.DataFrame(columns=columns, data=data)
    datasets = datasets.set_index("name")
    
    
    columns=["schema-name", "networks", "domains", "popular_dataset_description", "n_datasets", "n_datasets_dedup", "train_rows", "dev_rows", "test_rows", "max_train_rows_per_domain", "max_dev_rows_per_domain", "max_test_rows_per_domain"]
    data = [["COVID-MORT-2", "UCSF,CDC,AMA", "UCSF,Atlanta,Chicago,Boston General", "Nationally reported on a daily basis, this dataset includes", 4,3, 39610, 1063, 366, 34632, 355, 366],
           ["AMA-DIABETES-TRIAL-252", "AMA", "Boston General", "In 2018, the American Medical Association...", 4,3, 23267, 335, 3463, 23267, 335, 3463]]
    distributed = pd.DataFrame(columns=columns, data=data)
    distributed = distributed.set_index("schema-name")
    
    class Networks():
        def __call__(self):
            return nets
        
        def datasets(self, *args, **kwargs):
            return datasets
        
        def distributed(self, *args, **kwargs):
            return distributed
            
    networks = Networks()
    
    
    return networks
gr.search = search_diabetes
gr.search_regex = search_diabetes

In [None]:
# Proposed API sketch: `network`, `gr.connect`, and the client/user methods below
# do not exist yet; `network` stands for a connected network handle.

# diabetes_search = network.search("diabetes")  # search dataset name, description, and tags for 'diabetes'
diabetes_search = network.search(tags="diabetes")  # specifically search for datasets tagged 'diabetes'

print(diabetes_search)

"""
[
  {
    'id': 1,
    'name': 'Diabetes is terrible',
    'description': '',
    'node': 'ws://ucsf.com/pygrid',
    'tags': ['diabetes', 'california', 'ucsf'],
    'tensors': [
      {'id': '1a', 'name': 'data', 'schema': []},
      {'id': '1b', 'name': 'target', 'schema': []},
    ],
  },
  ...
]
"""

network.disconnect()

client = gr.connect(diabetes_search[0]["node"])  # 'ws://ucsf.com/pygrid'

user = client.signup("me@patrickcason.com", "password")
# user = client.login("me@patrickcason.com", "password")  # or, if you're already signed up

compute_types = client.get_compute_types()

"""
[
  {
    'id': 1,
    'name': 'EC2 P3',
    'provider': 'AWS',
    'cpu': {'type': 'Intel Xeon 3.4GHz', 'cores': 32},
    'gpu': {'type': 'Tesla V100', 'min': 0, 'max': 8},
    'ram': {'value': 64, 'ordinal': 'gb'},
  },
  ...
]
"""

# env = user.create_environment()  # creates the basic "default" environment for exploring

env = user.create_environment(compute_types[0]["id"], {
    "ram": gr.RAM(32, "gb"),
    "gpu": 3,
})

# Do stuff with "env"

# user.get_environments()
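Before provisioning, a domain would presumably check an environment request against the compute type it names. This sketch reads the `gpu` (`min`/`max`) and `ram` (`value`/`ordinal`) fields from the illustrative compute-type listing in the cell above; the `validate_request` helper and the unit table (including the `"tb"` label) are assumptions.

```python
from typing import Any, Dict

# Assumed labels for the "ordinal" field of a compute type's RAM entry.
GB_PER_UNIT = {"gb": 1, "tb": 1024}


def validate_request(compute_type: Dict[str, Any], ram_gb: float, gpus: int) -> None:
    """Raise ValueError if a requested environment exceeds what the compute type offers."""
    gpu = compute_type["gpu"]
    if not gpu["min"] <= gpus <= gpu["max"]:
        raise ValueError(
            f"{compute_type['name']} supports {gpu['min']}-{gpu['max']} GPUs, got {gpus}"
        )
    ram = compute_type["ram"]
    available_gb = ram["value"] * GB_PER_UNIT[ram["ordinal"]]
    if ram_gb > available_gb:
        raise ValueError(
            f"{compute_type['name']} has {available_gb} GB RAM, requested {ram_gb} GB"
        )
```

With the EC2 P3 entry shown above, a request for 32 GB of RAM and 3 GPUs passes, while 9 GPUs or 128 GB of RAM would be rejected.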