<a href="https://colab.research.google.com/github/bansal19/Deep_Learning/blob/main/Bittensor_training.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction to Bittensor Training


In [None]:
# First install bittensor onto our runtime.
! pip install bittensor==1.0.4

Collecting bittensor==1.0.4
  Downloading https://files.pythonhosted.org/packages/4f/fa/5a89ef19473f385b4195f88ebc5df7a4cfb1550ceac0abb42df772aa19f2/bittensor-1.0.4-py3-none-any.whl (127kB)

In [None]:
# Bittensor relies heavily on torch, both for payload encoding and as its machine learning toolkit.
# It also uses asyncio, so we must nest its event loop inside the outer Colab loop.
import bittensor
import torch
import nest_asyncio 
nest_asyncio.apply()

In [None]:
# Instantiating a wallet:

# Querying the Bittensor network is free. However, users who contribute to the network gain ownership through the distribution of Tao.
# Tao increases your bandwidth in the network, because miner-neurons (machines serving intelligence models) are more incentivized to respond to your queries.
# Tao also increases your learning potential, because miner-neurons apply gradients from nodes with network power.

# Your balance is held in a "wallet" that maintains your cryptographic keys: a "coldkey" that holds tokens and a "hotkey" that controls your miner.
# The following lines create a wallet's hotkey and coldkey. Don't worry about saving these keys: they won't be subscribed on the network or hold any tokens.
wallet = bittensor.wallet.Wallet(
    path = "~/.bittensor/wallets/",
    name = "test_wallet",
    hotkey = "test_hotkey"
)
wallet.create_new_coldkey(use_password=False)
wallet.create_new_hotkey()

IMPORTANT: Store this mnemonic in a secure (preferable offline place), as anyone who has possesion of this mnemonic can use it to regenerate the key and access your tokens. 

The mnemonic to the new key is:

outer crew neglect ceiling parrot aerobic album raise grid luggage action height

You can use the mnemonic to recreate the key in case it gets lost. The command to use to regenerate the key using this mnemonic is:
bittensor-cli regen --mnemonic outer crew neglect ceiling parrot aerobic album raise grid luggage action height

Writing key to /root/.bittensor/wallets//test_wallet/coldkey

IMPORTANT: Store this mnemonic in a secure (preferable offline place), as anyone who has possesion of this mnemonic can use it to regenerate the key and access your tokens. 

The mnemonic to the new key is:

guilt arrow minute novel base afford lizard friend room food work wear

You can use the mnemonic to recreate the key in case it gets lost. The command to use 

In [None]:
# Creating Bittensor components:

# The Bittensor API is built from plug-and-play components. This tutorial uses three of them:
#  1. Subtensor: An interface to the blockchain: allows us to query state and send transactions.
#
#  2. Metagraph: An object which maintains chain-state information (who is online, their weights, stake etc) as torch objects.
#
#  3. Dendrite: An object which maintains RPC connections to other peers in the system and allows us to make forward and backward queries.


# Create our Kusanagi blockchain connection.
subtensor = bittensor.subtensor.Subtensor(
    wallet = wallet,
    network = 'kusanagi'
)

# Create our Metagraph chain state object.
# The metagraph takes the subtensor connection as a parameter.
metagraph = bittensor.metagraph.Metagraph(
    wallet = wallet,
    subtensor = subtensor
)

# Create our dendrite RPC client.
# The dendrite needs the wallet and the metagraph.
dendrite_config = bittensor.dendrite.Dendrite.default_config()
dendrite_config.receptor.do_backoff = False
dendrite = bittensor.dendrite.Dendrite(
    config = dendrite_config,
    wallet = wallet,
    metagraph = metagraph,
)


In [None]:

# Syncing the metagraph:

# Weight and neuron information change continually as the blockchain progresses. The Metagraph sync
# command queries for new information and serves it to you as torch objects, which you can use in your
# training regimes.
metagraph.sync()
print (metagraph)

In [None]:
# Creating inputs:

# The Bittensor network is designed to be multi-modal, so it can be queried with multiple datatypes.
# However, the network was initially seeded only with TEXT, a modality where inputs must be tokenized sequences of
# natural language, for instance "the cat was big and bob was a builder".

# Bittensor comes with a pre-built GPT byte encoder.
# All messages should be encoded with this tokenizer.
tokenizer = bittensor.__tokenizer__()

# Example: Tokenizing text for a network query.
sentence = 'the quick brown fox jumped over the lazy dog\'s ectoplasm'
tokenized_sentence = tokenizer( [sentence] )['input_ids']
print ('tokenize( [\"', sentence, '\"]) =', tokenized_sentence)

Syncing metagraph:




tokenize( [" the quick brown fox jumped over the lazy dog's ectoplasm "]) = [[1169, 2068, 7586, 21831, 11687, 625, 262, 16931, 3290, 338, 46080, 20106, 8597]]


In [None]:
# Querying Neurons:

# Each miner has a unique endpoint, which we pulled from the chain during metagraph.sync().
# Normally we would look up the endpoint for Adam, the first miner with uid=0, from the metagraph:
#adam = metagraph.neurons[metagraph.state.index_for_uid[0]]
# Here we instead build an endpoint by hand that points at a local address.
adam = bittensor.proto.Neuron( 
    address = '127.0.0.1',
    port = 8091,
    public_key = wallet.hotkey.public_key
)
print("\"Adam\" or endpoint 0:", '\n\n', adam)

# To query a peer, we use the dendrite, our RPC tool. Below we send our previously tokenized text to this endpoint
# and receive the result.
print ('Make query ->')
response, codes = dendrite.forward_text( 
    neurons = [ adam ],
    x = [ torch.tensor(tokenized_sentence) ]
)

# NOTE: For consistency, all requests must follow the same shape constraints.
# TEXT: [batch_length, sequence_length]
# IMAGE: [batch_length, sequence_length, n_channels, x_size, y_size]
# TENSOR: [batch_length, bittensor.__network_dim__]
# Responses are always of shape [batch_length, sequence_length, bittensor.__network_dim__].
print ('\n')
print ('Adam\'s response: \n', response[0], '\n')
print ('Response shape: \n', response[0].shape, '\n')
print ('Return code: \n', codes, '\n')

"Adam" or endpoint 0: 

 public_key: "0x6c0a1103a68ea4f4af8cacc75bb53df61c291db8e9fd7d211ad601af2de34535"
address: "127.0.0.1"
port: 8091

Make query ->


Adam's response: 
 tensor([[[0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         ...,
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.]]], grad_fn=<_ReceptorCallBackward>) 

Response shape: 
 torch.Size([1, 13, 512]) 

Return code: 
 tensor([3]) 
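
The TEXT shape constraint can be illustrated without touching the network. The sketch below pads token-id lists of unequal length into a rectangular [batch_length, sequence_length] batch. The helpers `pad_batch` and `shape_of` and the pad id of 0 are illustrative assumptions, not part of the Bittensor API; in practice a tokenizer would usually do the padding.

```python
from typing import List, Tuple

def pad_batch(sequences: List[List[int]], pad_id: int = 0) -> List[List[int]]:
    """Right-pad every sequence to the length of the longest one."""
    longest = max(len(seq) for seq in sequences)
    return [seq + [pad_id] * (longest - len(seq)) for seq in sequences]

def shape_of(batch: List[List[int]]) -> Tuple[int, int]:
    """Return (batch_length, sequence_length) of a rectangular batch."""
    return len(batch), len(batch[0])

# Two tokenized sentences of different lengths (ids are arbitrary examples).
batch = pad_batch([[1169, 2068, 7586], [262, 16931]])
print(batch)            # [[1169, 2068, 7586], [262, 16931, 0]]
print(shape_of(batch))  # (2, 3)
```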



In [None]:
# Understanding queries:

# We just queried a single peer, Adam, and got a response. What happened?

# 1. Our tensor was serialized using a built-in serializer class, converting the tensor into bytes.
#
# e.g. serializer = bittensor.serialization.get_serializer(bittensor.proto.Serializer.MSGPACK)
#      serialized_tensor = serializer.serialize_from_torch( torch.tensor(tokenized_sentence), bittensor.proto.Modality.TEXT )


# 2. Our byte-encoded tensor was packaged into an RPC request and sent over the wire to our endpoint, in this case the endpoint we built for Adam: 127.0.0.1:8091
#
# e.g. adam_receptor = list(dendrite.receptors)[0]
#      adam_receptor.forward( torch.tensor(tokenized_sentence), bittensor.proto.Modality.TEXT)


# 3. Adam deserialized the request and used it as input to his transformer model.
#    NOTE: Adam is running a custom GPT2 model trained on the genesis dataset for language modelling.
#
# e.g. deserializer = bittensor.serialization.get_serializer(bittensor.proto.Serializer.MSGPACK)
#      deserialized_tensor = deserializer.deserialize_to_torch( serialized_tensor )


# 4. The output of Adam's transformer model is a sequence of representations, one for each token
# in the sentence, each of length bittensor.__network_dim__.
# These are the standard hidden units of a transformer model and encode the meaning (according to Adam)
# of each token in its position.
#
# e.g. response_tensor = AdamModel.forward_text( deserialized_tensor )
#      assert response_tensor.shape == (1, 13, bittensor.__network_dim__)


# 5. Adam's response tensor is serialized and returned to the sender. 
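
Steps 1 to 3 amount to a serialize, send, deserialize round trip. The sketch below imitates that round trip with Python's standard `struct` module rather than Bittensor's MSGPACK serializer. The wire layout used here (two unsigned 32-bit dimensions followed by 64-bit token ids) is invented purely for illustration and is not the format Bittensor uses.

```python
import struct
from typing import List

def serialize(batch: List[List[int]]) -> bytes:
    """Pack a rectangular batch of token ids as: rows, cols, then the ids."""
    rows, cols = len(batch), len(batch[0])
    flat = [token for row in batch for token in row]
    return struct.pack(f"<II{rows * cols}q", rows, cols, *flat)

def deserialize(payload: bytes) -> List[List[int]]:
    """Invert serialize(): read the header, then rebuild the rows."""
    rows, cols = struct.unpack_from("<II", payload, 0)
    flat = struct.unpack_from(f"<{rows * cols}q", payload, 8)
    return [list(flat[r * cols:(r + 1) * cols]) for r in range(rows)]

tokens = [[1169, 2068, 7586, 21831]]
payload = serialize(tokens)           # what would travel over the wire
restored = deserialize(payload)       # what the receiving peer would see
print(len(payload))                   # 8 header bytes + 4 ids * 8 bytes = 40
print(restored == tokens)             # True
```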

In [None]:
# Putting it together:
# Below, we will train a custom model for Poem Sentiment Classification by querying the network.


# Load our dataset.
import datasets
dataset = datasets.load_dataset('poem_sentiment')
print ('\n\nExample sentence: \"', dataset['train']['verse_text'][3], '\"\n\nlabel: ', dataset['train']['label'][3])


Using custom data configuration default



Downloading and preparing dataset poem_sentiment/default (download: 48.70 KiB, generated: 58.53 KiB, post-processed: Unknown size, total: 107.23 KiB) to /root/.cache/huggingface/datasets/poem_sentiment/default/1.0.0/f4990808f049126bcea572bba70613313212cd45f3b12a3e5586135e2de42f56...



Dataset poem_sentiment downloaded and prepared to /root/.cache/huggingface/datasets/poem_sentiment/default/1.0.0/f4990808f049126bcea572bba70613313212cd45f3b12a3e5586135e2de42f56. Subsequent calls will reuse this data.


Example sentence: " when i peruse the conquered fame of heroes, and the victories of mighty generals, i do not envy the generals, "

label:  3


In [None]:
# Building the model.
import torch.nn as nn
import torch.nn.functional as F

class Pooler(nn.Module):
    def __init__(self):
        super(Pooler, self).__init__()
        self.dense = nn.Linear(bittensor.__network_dim__, bittensor.__network_dim__)
        self.activation = nn.Tanh()

    def forward(self, x: torch.FloatTensor):
        # Take the last sequence encoding as the sentence's representation.
        last_representation = x[:, -1]
        pooled_output = self.dense(last_representation)
        pooled_output = self.activation(pooled_output)
        return pooled_output

class PoemSentimentClassifier(nn.Module):
    def __init__(self):
        super().__init__()

        # For projecting sequences of representations into a single representation.
        self.pooler = Pooler()

        # A Feedforward dense layer.
        self.hidden = nn.Linear(bittensor.__network_dim__, bittensor.__network_dim__)

        # For projecting our learned feature space onto the target dimension.
        self.target = nn.Linear(bittensor.__network_dim__, 4)

    def forward(self, x: torch.LongTensor):
        # Our model's forward call.

        # First, query every peer on kusanagi. (Slow for this tutorial)
        network_query = [ x for _ in metagraph.neurons]
        responses, _ = dendrite.forward_text( metagraph.neurons, network_query )

        # Average and pool responses.
        averaged_responses = torch.mean(torch.stack(responses, dim=2), dim=2)
        pooled_responses = self.pooler( averaged_responses )

        # Apply our dense layer and project it onto our target layer.
        hidden_layer = self.hidden( pooled_responses )
        logit_layer = self.target( hidden_layer )

        # Return raw logits: torch.nn.CrossEntropyLoss applies log-softmax itself.
        return logit_layer
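
The averaging and pooling steps are easiest to follow by tracking shapes. The sketch below uses random tensors as stand-ins for the per-peer responses, so it runs without a network connection. The value 512 for the network dimension is taken from the response shape printed earlier, and the batch, sequence, and peer counts are arbitrary assumptions.

```python
import torch

network_dim = 512   # assumed: matches the response shape printed earlier
batch_length, sequence_length, n_peers = 2, 5, 3

# Random tensors standing in for the per-peer responses from the dendrite.
responses = [
    torch.randn(batch_length, sequence_length, network_dim)
    for _ in range(n_peers)
]

# Stack along a new peer axis, then average that axis away again.
stacked = torch.stack(responses, dim=2)   # [batch, seq, peers, dim]
averaged = stacked.mean(dim=2)            # [batch, seq, dim]

# Last-token pooling: one vector per sentence.
pooled = averaged[:, -1]                  # [batch, dim]

print(stacked.shape, averaged.shape, pooled.shape)
```

Stacking on a new axis keeps each peer's response intact until the mean is taken, which would also make it simple to swap the plain mean for a weighted one.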


In [None]:
# Simple training architecture.
from typing import Tuple
import random

# Training params.
n_steps = 1000
batch_size = 100
learning_rate = 0.01
momentum = 0.99

# Model and optimizer.
tokenizer = bittensor.__tokenizer__()
model = PoemSentimentClassifier()
optimizer = torch.optim.SGD( model.parameters(), lr = learning_rate, momentum = momentum)
loss_function = torch.nn.CrossEntropyLoss(ignore_index=-1)

# Batch iterator: Produces random tokenized batches from the poem dataset.
def next_batch(batch_size: int, dataset, tokenizer) -> Tuple[torch.LongTensor, torch.LongTensor]:
  inputs = []
  targets = []
  for i in range(batch_size):
    random_index = random.randint(0, len(dataset)-1)
    inputs.append( dataset[random_index]['verse_text'] )
    targets.append( dataset[random_index]['label'] )
  inputs = tokenizer(inputs, return_tensors='pt', padding=True, truncation=True)['input_ids']
  targets = torch.tensor( targets, dtype=torch.int64 )
  return inputs, targets
  
# Training loop:
for batch_index in range(n_steps):
  inputs, targets = next_batch(batch_size, dataset['train'], tokenizer)
  logits = model( inputs )
  loss = loss_function( logits.view(-1, 4), targets )
  loss.backward()
  optimizer.step()
  optimizer.zero_grad()
  print ('step: ', batch_index, ' loss: ', loss.item())

IndexError: ignored
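
A note on the loss used in the loop: `torch.nn.CrossEntropyLoss` applies log-softmax internally, so it is meant to be fed raw logits. The sketch below uses made-up logits for the four poem sentiment classes to check that equivalence. It also shows that passing softmax probabilities instead keeps the loss from dropping below log(1 + 3/e), roughly 0.744, even for a confident and correct prediction.

```python
import math
import torch
import torch.nn.functional as F

loss_function = torch.nn.CrossEntropyLoss()

# One confident, correct prediction over 4 classes (values are made up).
logits = torch.tensor([[12.0, -3.0, -3.0, -3.0]])
target = torch.tensor([0])

# Cross-entropy on raw logits equals NLL on log-softmax.
loss_on_logits = loss_function(logits, target)
manual = F.nll_loss(F.log_softmax(logits, dim=1), target)

# Feeding probabilities instead squashes the inputs into [0, 1] first.
loss_on_probs = loss_function(F.softmax(logits, dim=1), target)
floor = math.log(1 + 3 / math.e)

print(loss_on_logits.item(), loss_on_probs.item(), floor)
```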

# **CSC413** Building BERT to train on Bittensor: SST-2

In [None]:
# Putting it together:
# Below, we load the SST-2 dataset from GLUE, the binary sentiment classification task this section targets.


# Load our dataset.
from datasets import load_dataset
dataset = load_dataset('glue', 'sst2')
print (dataset)

Reusing dataset glue (/root/.cache/huggingface/datasets/glue/sst2/1.0.0/7c99657241149a24692c402a5c3f34d4c9f1df5ac2e4c3759fadea38f6cb29c4)


DatasetDict({
    train: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 67349
    })
    validation: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 872
    })
    test: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 1821
    })
})


In [None]:
axon = bittensor.axon.Axon( wallet = wallet )


INFO    |bittensor.axon:check_config:216 - UPNPC: OFF
INFO    |bittensor.axon:check_config:219 - Using external endpoint: 34.83.81.213:8091
INFO    |bittensor.axon:check_config:220 - Using local endpoint: 127.0.0.1:8091


In [None]:
axon.start()


In [None]:
axon.forward_queue()
axon.backward_queue()

AttributeError: ignored

In [None]:
! pip install bittensor==1.0.4
import bittensor
import torch
import nest_asyncio 
nest_asyncio.apply()

In [None]:
# We import DistilBERT (arXiv:1910.01108) instead of BertSmall because the latter is unavailable on Hugging Face.
# DistilBERT performs better on MNLI than BertSmall (82.2 vs. 77.6), but still worse than BERT Base (82.2 vs. 86.7).
from transformers import DistilBertForSequenceClassification, DistilBertConfig

NUM_LABELS = 2

def get_configured_DistilBERT():
  configuration = DistilBertConfig(
      vocab_size=50257,                 # Vocab size of bittensor tokenizer
      dim = bittensor.__network_dim__,  # Bittensor's network dimensions
      n_heads = 8,
      n_layers = 1,
      num_labels = NUM_LABELS           # SST-2 is binary classification
  )
  model = DistilBertForSequenceClassification( configuration )

  return model

print(get_configured_DistilBERT())

DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(50257, 512, padding_idx=0)
      (position_embeddings): Embedding(512, 512)
      (LayerNorm): LayerNorm((512,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0): TransformerBlock(
          (attention): MultiHeadSelfAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=512, out_features=512, bias=True)
            (k_lin): Linear(in_features=512, out_features=512, bias=True)
            (v_lin): Linear(in_features=512, out_features=512, bias=True)
            (out_lin): Linear(in_features=512, out_features=512, bias=True)
          )
          (sa_layer_norm): LayerNorm((512,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)
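
The configuration can be sanity-checked with a little arithmetic. The sketch below assumes `bittensor.__network_dim__` is 512 and that the maximum number of positions is 512, as the embedding sizes in the printed model indicate. It confirms that the hidden size splits evenly across attention heads and counts the embedding parameters.

```python
vocab_size = 50257     # from the configuration above
dim = 512              # assumed value of bittensor.__network_dim__
n_heads = 8
max_positions = 512    # from position_embeddings: Embedding(512, 512)

# Multi-head attention needs the hidden size to divide evenly across heads.
assert dim % n_heads == 0
head_dim = dim // n_heads

word_embedding_params = vocab_size * dim
position_embedding_params = max_positions * dim

print(head_dim)                    # 64
print(word_embedding_params)       # 25731584
print(position_embedding_params)   # 262144
```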
       

In [None]:
# Try DistilBERT
tokenizer = bittensor.__tokenizer__()
sentence = 'the quick brown fox jumped over the lazy dog\'s ectoplasm'
tokenized_sentence = tokenizer( [sentence] )['input_ids']

model = get_configured_DistilBERT()
output = model(input_ids = torch.tensor( tokenized_sentence ))

expected_shape = torch.Size([1, model.num_labels])  # 1 x 2
real_shape = output.logits.size()                   # BS x NUM_LABELS

assert expected_shape == real_shape

In [None]:
# Building the model.
import torch.nn as nn
import torch.nn.functional as F

class Pooler(nn.Module):
    def __init__(self):
        super(Pooler, self).__init__()
        self.dense = nn.Linear(bittensor.__network_dim__, bittensor.__network_dim__)
        self.activation = nn.Tanh()

    def forward(self, x: torch.FloatTensor):
        # Take the last sequence encoding as the sentence's representation.
        last_representation = x[:, -1]
        pooled_output = self.dense(last_representation)
        pooled_output = self.activation(pooled_output)
        return pooled_output

class DistilBERTforBittensor(nn.Module):
    def __init__(self, DistilBERT):
        super().__init__()

        # For projecting sequences of representations into a single representation.
        self.pooler = Pooler()

        # Main layer: the DistilBERT encoder without its classification head.
        # Keeping the module itself (not its bound forward method) registers its weights with this model.
        self.main_network = DistilBERT.distilbert

        # For projecting our learned feature space onto the target dimension.
        self.target = nn.Linear(bittensor.__network_dim__, 4)

    def forward(self, x: torch.LongTensor):
        # Our model's forward call.

        # First, query every peer on kusanagi. (Slow for this tutorial)
        network_query = [ x for _ in metagraph.neurons]
        responses, _ = dendrite.forward_text( metagraph.neurons, network_query )

        # Average responses across peers: [batch_length, sequence_length, bittensor.__network_dim__].
        averaged_responses = torch.mean(torch.stack(responses, dim=2), dim=2)

        # TODO: Shapes below are untested against the live network; print and debug once Bittensor is reachable. (Efe)

        # The responses are already embeddings, so feed them as inputs_embeds rather than input_ids.
        DistilBertResponse = self.main_network(
            inputs_embeds = averaged_responses ).last_hidden_state

        # Pool the encoded sequence and project it onto our target layer.
        pooled_responses = self.pooler( DistilBertResponse )
        logit_layer = self.target( pooled_responses )

        # Return raw logits: torch.nn.CrossEntropyLoss applies log-softmax itself.
        return logit_layer
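
One detail worth checking when wrapping a pretrained network inside another `nn.Module` is whether its weights end up in `model.parameters()`, since only those are seen by the optimizer. The sketch below uses a small `nn.Linear` as a stand-in for DistilBERT. `HoldsMethod` and `HoldsModule` are hypothetical names introduced only for this comparison.

```python
import torch.nn as nn

class HoldsMethod(nn.Module):
    def __init__(self, inner: nn.Module):
        super().__init__()
        # A bound method is a plain attribute: no parameters are registered.
        self.inner_forward = inner.forward

class HoldsModule(nn.Module):
    def __init__(self, inner: nn.Module):
        super().__init__()
        # Assigning the module itself registers it as a submodule.
        self.inner = inner

inner = nn.Linear(4, 2)   # 4*2 weights + 2 biases = 10 parameters
n_method = sum(p.numel() for p in HoldsMethod(inner).parameters())
n_module = sum(p.numel() for p in HoldsModule(inner).parameters())
print(n_method, n_module)  # 0 10
```

With zero registered parameters, an optimizer built from `model.parameters()` would never update the wrapped network.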


In [None]:
# Get the poem sentiment dataset, following the previous example.
# TODO: Replace this with MNLI.

import datasets
dataset = datasets.load_dataset('poem_sentiment')
print ('\n\nExample sentence: \"', dataset['train']['verse_text'][3], '\"\n\nlabel: ', dataset['train']['label'][3])

Using custom data configuration default
Reusing dataset poem_sentiment (/root/.cache/huggingface/datasets/poem_sentiment/default/1.0.0/f4990808f049126bcea572bba70613313212cd45f3b12a3e5586135e2de42f56)




Example sentence: " when i peruse the conquered fame of heroes, and the victories of mighty generals, i do not envy the generals, "

label:  3


In [None]:
# Simple training architecture.
from typing import Tuple
import random

# Training params.
n_steps = 1000
batch_size = 100
learning_rate = 0.01
momentum = 0.99

# DistilBert
DistilBERT = get_configured_DistilBERT()

# Model and optimizer.
tokenizer = bittensor.__tokenizer__()
model = DistilBERTforBittensor(DistilBERT)
optimizer = torch.optim.SGD( model.parameters(), lr = learning_rate, momentum = momentum)
loss_function = torch.nn.CrossEntropyLoss(ignore_index=-1)

# Batch iterator: Produces random tokenized batches from the poem dataset.
def next_batch(batch_size: int, dataset, tokenizer) -> Tuple[torch.LongTensor, torch.LongTensor]:
  inputs = []
  targets = []
  for i in range(batch_size):
    random_index = random.randint(0, len(dataset)-1)
    inputs.append( dataset[random_index]['verse_text'] )
    targets.append( dataset[random_index]['label'] )
  inputs = tokenizer(inputs, return_tensors='pt', padding=True, truncation=True)['input_ids']
  targets = torch.tensor( targets, dtype=torch.int64 )
  return inputs, targets
  
# Training loop:
for batch_index in range(n_steps):
  inputs, targets = next_batch(batch_size, dataset['train'], tokenizer)
  logits = model( inputs )
  loss = loss_function( logits.view(-1, 4), targets )
  loss.backward()
  optimizer.step()
  optimizer.zero_grad()
  print ('step: ', batch_index, ' loss: ', loss.item())

NameError: ignored