## Demonstrating a DataMined Economy

In this IPython notebook, we walk through the construction of a DataMined economy. Recall the basic construction of DataMined: "tokens" are issued for the addition of valid data to the network. Users who add data to the network are called "miners." Miners are typically assumed to be consumers who'd like to contribute and safeguard their individual data, but they may be non-profits or businesses as well. Data contributed to the network is stored on storage nodes within the network. For small datasets, miners can store data locally on their own machines.

Note that since miners are rewarded for contributing data to the network via token issuance, there is a strong incentive to attack the network by contributing fake data. Consequently, network security is critically dependent on `Validator`s which can detect fraudulent data. Another interesting aspect of network design is that direct download of data off the network can't be permitted due to excessive risk of data leakage. As a result, computation must be performed on the network itself.
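
To make the role of a validator concrete, here is a minimal sketch of a check that gates token issuance. It assumes, as in the genomic example later in this notebook, that submitted data is a DNA string. The names `is_plausible_genome` and `reward_for` are hypothetical and are not part of `datamined`; a real validator would need far stronger fraud detection than an alphabet check.

```python
# Hypothetical sketch: these helpers are NOT part of the datamined package.
VALID_BASES = frozenset("ACGT")


def is_plausible_genome(sequence, min_length=1):
    """Return True if `sequence` is a string of at least `min_length` A/C/G/T characters."""
    if not isinstance(sequence, str) or len(sequence) < min_length:
        return False
    # Every character must be one of the four bases.
    return set(sequence) <= VALID_BASES


def reward_for(sequence, reward=1):
    """Issue `reward` tokens only when the submission passes validation."""
    return reward if is_plausible_genome(sequence) else 0


print(reward_for("ACGTTGCA"))  # accepted submission
print(reward_for("hello"))     # rejected submission
```

A check like this only filters malformed input. Random but well-formed strings would still pass, which is why fraud detection remains the hard part of the design.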

The "tokens" used in this network are called `DataCoin`s. These tokens conform to the [ERC20](https://theethereum.wiki/w/index.php/ERC20_Token_Standard) standard for tokens issued on the Ethereum network. For those new to Ethereum development, the token is a form of currency whose fidelity is guaranteed by Ethereum network miners (note the distinction: Ethereum miners are *not* necessarily DataMined data miners). Customers who wish to perform computation on data in the network must pay both miners and compute nodes in tokens to request that computation be performed on interesting data.
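
For readers unfamiliar with ERC20, the sketch below models the bookkeeping such a token performs: a table of balances, a way to query them, and transfers that move tokens without changing the total supply. It is a plain in-memory Python stand-in, not a smart contract and not how `DataCoin` is implemented. The class `ToyToken` is hypothetical; its methods mirror the ERC20 functions `totalSupply`, `balanceOf` and `transfer`, while `mint` is an illustrative addition that is not part of the standard.

```python
class ToyToken:
    """In-memory stand-in for an ERC20-style balance table (illustrative only)."""

    def __init__(self):
        self.balances = {}

    def total_supply(self):
        return sum(self.balances.values())

    def balance_of(self, owner):
        return self.balances.get(owner, 0)

    def mint(self, owner, amount):
        # Issuance: new tokens enter circulation, e.g. as a reward for valid data.
        self.balances[owner] = self.balance_of(owner) + amount

    def transfer(self, sender, recipient, amount):
        # Payment: tokens move between accounts; the total supply is unchanged.
        if amount < 0 or self.balance_of(sender) < amount:
            return False
        self.balances[sender] = self.balance_of(sender) - amount
        self.balances[recipient] = self.balance_of(recipient) + amount
        return True


token = ToyToken()
token.mint("miner", 3)
supply_before = token.total_supply()
paid = token.transfer("miner", "compute_node", 2)
print(paid, token.balance_of("miner"), token.balance_of("compute_node"))
print(token.total_supply() == supply_before)
```

Checking that the supply is the same before and after a payment is a cheap invariant for catching accounting errors in which more is credited than debited.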

In this notebook, we introduce the core DataMined package `datamined` and demonstrate how to use it to construct an example data economy. Since this is an example economy, the code runs on a single local machine. In addition, tokens issued in this economy live on a tiny, local instance of the Ethereum protocol. (The same code can be deployed onto larger Ethereum networks, but would take much longer to run to completion.)

Let's start by defining the number of participants in our economy.

In [1]:
N_miners = 100
N_compute_nodes = 10
N_buyers = 5

In this economy, we have `100` data miners who will contribute their data to the network. Each of these miners should be rewarded for their trouble with `DataCoin` tokens. For a miner's data to be accepted, however, it must pass the gauntlet of the validator. The submodule `datamined.valid` provides implementations of validators for different data types.

Let's see how miners join the network.

In [2]:
import datamined as dm
# TODO(rbharath): The eth_utils package emits many unsavory warnings. 
# I haven't yet been able to figure out a good way to suppress them
# beyond simply redirecting stderr.
import sys
oldstderr = sys.stderr
sys.stderr = open('log.txt', 'w')

miners = []
for n in range(N_miners):
  print("Creating miner ", n)
  # Create a separate client wallet for each miner
  miner_wallet = dm.coins.LocalGethWallet()
  # TODO(rbharath): Modify example so that each miner has a separate private key
  miner_private_key = "ignored"
  # TODO(rbharath): The InsecureFileClient doesn't perform any at-rest encryption of data.
  # Swap out for a wallet that encrypts data that's on disk.
  miner = dm.data.InsecureFileClient(miner_private_key, miner_wallet)
  miners.append(miner)

Creating miner  0
Creating miner  1
Creating miner  2
Creating miner  3
Creating miner  4
Creating miner  5
Creating miner  6
Creating miner  7
Creating miner  8
Creating miner  9
Creating miner  10
Creating miner  11
Creating miner  12
Creating miner  13
Creating miner  14
Creating miner  15
Creating miner  16
Creating miner  17
Creating miner  18
Creating miner  19
Creating miner  20
Creating miner  21
Creating miner  22
Creating miner  23
Creating miner  24
Creating miner  25
Creating miner  26
Creating miner  27
Creating miner  28
Creating miner  29
Creating miner  30
Creating miner  31
Creating miner  32
Creating miner  33
Creating miner  34
Creating miner  35
Creating miner  36
Creating miner  37
Creating miner  38
Creating miner  39
Creating miner  40
Creating miner  41
Creating miner  42
Creating miner  43
Creating miner  44
Creating miner  45
Creating miner  46
Creating miner  47
Creating miner  48
Creating miner  49
Creating miner  50
Creating miner  51
Creating miner  52
Cre

Now that miners have been created, let's allow them to load their data and receive tokens in return. In this example, we assume that miners are loading personal genomic data onto the network. If the validator accepts a miner's genomic data, the miner is rewarded with `DataCoin` tokens.

In [3]:
import random

# This simple validator assumes that all provided data is valid.
# TODO(rbharath): Swap out this simple validator for a more 
validator = dm.valid.NaiveGenomicValidator()

# TODO(rbharath): The ledgers provide the location of the stored data. This might be an unsightly API
# in the long-run and should perhaps be refactored out.
ledgers = []
for n in range(N_miners):
  print("Adding data for miner ", n)
  miner = miners[n]
  miner_data = "".join([random.choice(["A", "C", "G", "T"]) for _ in range(100)])
  ledger = miner.store(miner_data, validator)
  coins_issued = miner.get_wallet().get_balance()
  print("  %d coins were issued" % coins_issued)
  assert coins_issued > 0
  ledgers.append(ledger)

Adding data for miner  0
  1 coins were issued
Adding data for miner  1
  1 coins were issued
Adding data for miner  2
  1 coins were issued
Adding data for miner  3
  1 coins were issued
Adding data for miner  4
  1 coins were issued
Adding data for miner  5
  1 coins were issued
Adding data for miner  6
  1 coins were issued
Adding data for miner  7
  1 coins were issued
Adding data for miner  8
  1 coins were issued
Adding data for miner  9
  1 coins were issued
Adding data for miner  10
  1 coins were issued
Adding data for miner  11
  1 coins were issued
Adding data for miner  12
  1 coins were issued
Adding data for miner  13
  1 coins were issued
Adding data for miner  14
  1 coins were issued
Adding data for miner  15
  1 coins were issued
Adding data for miner  16
  1 coins were issued
Adding data for miner  17
  1 coins were issued
Adding data for miner  18
  1 coins were issued
Adding data for miner  19
  1 coins were issued
Adding data for miner  20
  1 coins were issued
Ad

Now that miners have added data to the data economy, let's see how to add compute nodes. The submodule `datamined.compute` provides utilities for creating compute nodes and adding them to the network.

In [4]:
# At this point, there's currency floating around. This currency
# should incentivize the entrance of computational nodes into the
# economy.
compute_nodes = []
for m in range(N_compute_nodes):
  print("Adding compute node ", m)
  # Create a node wallet
  node_wallet = dm.coins.LocalGethWallet()
  assert node_wallet.get_balance() == 0

  # TODO(rbharath): Add a more complex computation here.
  node = dm.compute.GenomicCountInsecureFileNode(node_wallet)
  compute_nodes.append(node)

Adding compute node  0
Adding compute node  1
Adding compute node  2
Adding compute node  3
Adding compute node  4
Adding compute node  5
Adding compute node  6
Adding compute node  7
Adding compute node  8
Adding compute node  9


Now that both data and compute power are available on the network, let's see how to enable data customers to perform computation on the network.

In [5]:
import numpy as np

# TODO(rbharath): There isn't currently a way to exchange ethereum for DataCoins directly.
# Consequently, only the miners have money on the network and can actually pay for computation.
# Fix this so external buyers can join the network.
data_customer = miners[0]

# Performing computation on the network.
# Select a random compute node
node = compute_nodes[np.random.randint(0, N_compute_nodes)]
# Select a random miner (who is not the data_customer), together with
# the ledger for that same miner's data
data_index = np.random.randint(1, N_miners)
data = miners[data_index]
ledger = ledgers[data_index]
# Obtain ledger key
ledger_key = data.get_ledger_key(ledger)

result = node.compute(ledger, ledger_key, "count_basepairs")
print("result: ", result)

# TODO(rbharath): The economy has a hole here where more money is paid than deducted! Plug the hole.
final_balance = node.get_wallet().get_balance()
print("Node now has %d DataCoins after payment" % final_balance)

result:  {'A': 20, 'C': 20, 'G': 27, 'T': 33}
Node now has 4 DataCoins after payment
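
The `result` above is a per-base tally of a miner's 100-character sequence (the four counts sum to 100). How `GenomicCountInsecureFileNode` computes it is not shown in this notebook, but a computation with the same shape of output can be sketched in a few lines. The function `count_basepairs` below is a hypothetical stand-in that borrows the name of the requested computation; it is not the node's implementation.

```python
from collections import Counter


def count_basepairs(sequence):
    """Count occurrences of each base, always reporting A, C, G and T."""
    counts = Counter(sequence)
    # Report all four bases, including those that never occur.
    return {base: counts.get(base, 0) for base in "ACGT"}


print(count_basepairs("AACGT"))  # {'A': 2, 'C': 1, 'G': 1, 'T': 1}
```

Because only the aggregate counts leave the compute node, the customer learns a summary of the data without ever downloading the underlying sequence.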
