# Using BOLT
## Basics
Let's learn the BOLT Python API through an exercise: a simple image classification task on the MNIST dataset. Given a 28 x 28 pixel image of a handwritten digit from 0 through 9, the goal is to predict which digit was written.

In [1]:
# TODO(Geordie): Add download scripts and change to relative path
from thirdai import dataset

mnist_train = dataset.load_bolt_svm_dataset(
    filename="datasets/mnist/mnist", 
    batch_size=256)

mnist_test = dataset.load_bolt_svm_dataset(
    filename="datasets/mnist/mnist.t", 
    batch_size=256)


Read 60000 vectors from datasets/mnist/mnist in 1 seconds
Read 10000 vectors from datasets/mnist/mnist.t in 0 seconds
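
The loader name suggests the files use the sparse SVM (LIBSVM) text format, where each line reads `label index:value index:value ...`. Assuming that format and 1-based feature indices (an assumption, since the file contents are not shown here), one line could be expanded into a dense 784-dimensional vector roughly as follows. `parse_svm_line` is a hypothetical helper for illustration, not part of the thirdai API.

```python
def parse_svm_line(line, dim=784, index_base=1):
    """Parse one 'label idx:val idx:val ...' line into (label, dense_vector)."""
    fields = line.split()
    label = int(fields[0])
    vector = [0.0] * dim
    for field in fields[1:]:
        index_text, value_text = field.split(":")
        # index_base=1 is an assumption about how the file numbers features.
        vector[int(index_text) - index_base] = float(value_text)
    return label, vector


# A made-up sample: the digit 7 with three non-zero pixels.
label, pixels = parse_svm_line("7 203:84 204:185 205:159")
nonzero = [(i, v) for i, v in enumerate(pixels) if v != 0.0]
print(label, nonzero)
```

Only the non-zero pixels are stored in the file, which is why this format suits sparse inputs.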


To perform this task, we want to build a simple neural network with these specifications:
* 784 (28 x 28) input dimension
* A single 1000-dim hidden layer with ReLU
* 10-dim output layer with Softmax
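
To get a sense of the size of this network, the sketch below counts the trainable parameters of a fully connected network from its layer dimensions. It assumes one bias per neuron, which is the Keras `Dense` default but is an assumption as far as BOLT layers are concerned. `dense_parameter_count` is a hypothetical helper.

```python
def dense_parameter_count(layer_dims):
    """Count weights and biases for a stack of fully connected layers.

    layer_dims lists the input dimension followed by each layer's dimension.
    """
    total = 0
    for fan_in, fan_out in zip(layer_dims, layer_dims[1:]):
        total += fan_in * fan_out  # weight matrix
        total += fan_out           # bias vector (assumed present)
    return total


mnist_params = dense_parameter_count([784, 1000, 10])
print(mnist_params)  # 795010
```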

In [None]:
# For comparison, the same architecture expressed in Keras.
from tensorflow import keras

keras_layers = [
    keras.layers.Dense(
        units=1000, 
        activation='relu', 
        input_shape=(784,)),
        
    keras.layers.Dense(
        units=10, 
        activation='softmax')
]

keras_model = keras.Sequential(layers=keras_layers)

In [2]:
# The same network defined with BOLT.
from thirdai import bolt

mnist_layers = [
    bolt.LayerConfig(
        dim=1000, 
        activation_function=bolt.ActivationFunctions.ReLU),
    
    bolt.LayerConfig(
        dim=10, 
        activation_function=bolt.ActivationFunctions.Softmax)
]

mnist_network = bolt.Network(
    layers=mnist_layers, 
    input_dim=784)

Layer: dim=1000, load_factor=1, act_func=ReLU
Layer: dim=10, load_factor=1, act_func=Softmax
Initialized Network in 0 seconds


We now train the network to minimize categorical cross entropy loss and measure our success with the categorical accuracy metric.
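
As a reminder of what is being minimized, here is a minimal sketch of softmax followed by categorical cross entropy for a single sample. It is a plain-Python illustration of the standard definitions, not BOLT's implementation, and the logits are made up.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(probabilities, true_label):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probabilities[true_label])


probs = softmax([2.0, 1.0, 0.1])
loss_if_label_0 = categorical_cross_entropy(probs, 0)
loss_if_label_2 = categorical_cross_entropy(probs, 2)
print(loss_if_label_0, loss_if_label_2)
```

The loss is small when the network puts most of its probability on the correct class and grows as that probability shrinks.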

In [3]:
mnist_network.train(
    train_data=mnist_train, 
    loss_fn=bolt.CategoricalCrossEntropyLoss(), 
    learning_rate=0.001, 
    epochs=1)

mnist_network.predict(
    test_data=mnist_test, 
    metrics=["categorical_accuracy"], 
    verbose=True)


Epoch 1:
Processed 235 training batches in 3 seconds
Processed 40 test batches in 497 milliseconds
Accuracy: 0.9535 (9535/10000)


({'test_time': [497.0], 'categorical_accuracy': [0.9535]},
 array([[1.6119853e-09, 4.9061071e-09, 9.0842747e-09, ..., 9.9999881e-01,
         7.4112470e-12, 7.1178043e-08],
        [7.8348904e-08, 4.7955339e-04, 7.8698450e-01, ..., 1.3695576e-07,
         5.3275298e-06, 2.8925069e-07],
        [9.4192680e-09, 9.9960941e-01, 9.1716443e-05, ..., 2.2237067e-04,
         1.5032618e-05, 3.1097348e-05],
        ...,
        [2.6579289e-10, 1.1534446e-09, 7.5483649e-07, ..., 8.0246173e-05,
         9.5714662e-05, 2.2119880e-03],
        [6.5210443e-05, 3.2664754e-08, 7.9705176e-09, ..., 8.9524441e-08,
         5.7288981e-04, 1.3805798e-08],
        [4.6801352e-07, 1.0962963e-11, 1.6620123e-06, ..., 2.0000725e-11,
         1.0412909e-09, 7.2178027e-11]], dtype=float32))
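
The output above suggests that `predict` returns a metrics dictionary together with one row of softmax activations per test sample. Assuming that layout, the predicted digit for each sample is the index of the largest value in its row, and categorical accuracy is the fraction of rows where that index matches the label. The sketch uses made-up activations and labels, with plain lists standing in for the NumPy array.

```python
def predicted_labels(activations):
    """Return the index of the largest activation in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in activations]


def categorical_accuracy(activations, true_labels):
    """Fraction of samples whose argmax matches the true label."""
    predictions = predicted_labels(activations)
    correct = sum(p == t for p, t in zip(predictions, true_labels))
    return correct / len(true_labels)


# Made-up activations for three samples over three classes.
activations = [
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
]
true_labels = [1, 0, 1]

print(predicted_labels(activations))  # [1, 0, 2]
print(categorical_accuracy(activations, true_labels))
```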

## What about bigger models?
Intent classification is an example of a more complex task that calls for a larger network. To demonstrate it, we use the CLINC150 dataset, a corpus of customer queries labeled with their intents. For example, the query "do I have to pay for carry-ons on delta?" is labeled with the intent "carry-on", and each intent has a unique id.
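
Text queries have to be turned into numeric vectors before a network can train on them. This notebook does not say how that was done for CLINC150; one common option is the hashing trick, sketched below purely as an illustration. The helper names, the whitespace tokenizer, the 0-based indices, and the intent id `42` are all assumptions.

```python
import zlib


def hash_features(text, dim):
    """Map a query to a sparse {index: count} vector via the hashing trick."""
    sparse = {}
    for token in text.lower().split():
        # crc32 is used because it is deterministic across Python processes.
        index = zlib.crc32(token.encode("utf-8")) % dim
        sparse[index] = sparse.get(index, 0) + 1
    return sparse


def to_svm_line(label, sparse):
    """Format a labeled sparse vector as 'label idx:val ...'."""
    pairs = " ".join(f"{i}:{v}" for i, v in sorted(sparse.items()))
    return f"{label} {pairs}"


query = "do I have to pay for carry-ons on delta?"
vector = hash_features(query, dim=5000)
print(to_svm_line(42, vector))  # 42 is a made-up intent id
```

Each query touches only a handful of the available dimensions, so the resulting vectors are very sparse.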

In [None]:
# TODO(Geordie): Add download scripts and change to relative path
intent_class_train = dataset.load_bolt_svm_dataset(
    filename="datasets/intent_classification/train_shuf.svm", 
    batch_size=256)

intent_class_test = dataset.load_bolt_svm_dataset(
    filename="datasets/intent_classification/test_shuf.svm", 
    batch_size=256)

We converted the samples in this dataset into sparse input vectors of roughly 5,000 dimensions (`input_dim=5512` in the network below), and we'll use a hidden layer with 10,000 neurons. That makes for a model with more than 50 million parameters, which is quite big. BOLT, however, introduces a unique capability: setting the load factor. The load factor defines the fraction of a layer's neurons used for each input sample. Here we use 0.05, which means 500 of the 10,000 hidden neurons per input.

And it's not just any 500 neurons. It's the 500 most important neurons for each input vector, so BOLT curates a small network for each input. This is the key to running powerful deep learning models cheaply.
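
The arithmetic behind the load factor can be sketched as follows. The selection step here treats "most important" as "largest pre-activation" and scores every neuron by brute force; both are assumptions made for illustration, and BOLT itself relies on hash functions rather than exhaustive scoring. The helper names and the toy values are hypothetical.

```python
def active_neuron_count(layer_dim, load_factor):
    """Number of neurons kept per input for a given load factor."""
    return max(1, round(layer_dim * load_factor))


def top_k_neurons(pre_activations, k):
    """Indices of the k largest pre-activations, largest first."""
    ranked = sorted(
        range(len(pre_activations)),
        key=lambda i: pre_activations[i],
        reverse=True,
    )
    return ranked[:k]


print(active_neuron_count(10000, 0.05))  # 500

# Toy layer with 8 neurons and a load factor of 0.25, so 2 neurons stay active.
toy_pre_activations = [0.2, -1.3, 4.1, 0.0, 2.7, -0.4, 3.3, 1.9]
k = active_neuron_count(len(toy_pre_activations), 0.25)
print(top_k_neurons(toy_pre_activations, k))  # [2, 6]
```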

In [None]:
bigger_layers = [
    bolt.LayerConfig(
        dim=10000, 
        load_factor=0.05, 
        activation_function=bolt.ActivationFunctions.ReLU),
    
    bolt.LayerConfig(
        dim=151, 
        activation_function=bolt.ActivationFunctions.Softmax)
]

bigger_network = bolt.Network(
    layers=bigger_layers, 
    input_dim=5512)

### Sparse inference
You can also use sparsity to accelerate inference. Simply call the `enable_sparse_inference()` method. Notice that we call the method before the last training epoch. This freezes the hash functions, effectively locking specialized subnetworks for each input vector, and then fine-tunes these subnetworks.
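
The text mentions hash functions without saying which family BOLT uses. As an illustration only, the sketch below uses signed random projections (SimHash), a well-known locality-sensitive hash, to group neurons into buckets by their weight vectors and to look up the bucket for an input. Building the table once and reusing it unchanged mimics the idea of frozen hash functions. All class and function names, sizes, and seeds are hypothetical.

```python
import random


class SignedRandomProjection:
    """Hash a vector to an integer bucket using signs of random projections."""

    def __init__(self, input_dim, num_bits, seed=0):
        rng = random.Random(seed)
        self.planes = [
            [rng.gauss(0.0, 1.0) for _ in range(input_dim)]
            for _ in range(num_bits)
        ]

    def bucket(self, vector):
        code = 0
        for plane in self.planes:
            dot = sum(p * x for p, x in zip(plane, vector))
            code = (code << 1) | (1 if dot >= 0 else 0)
        return code


def build_table(hasher, neuron_weights):
    """Group neuron ids by the bucket their weight vector hashes to."""
    table = {}
    for neuron_id, weights in enumerate(neuron_weights):
        table.setdefault(hasher.bucket(weights), []).append(neuron_id)
    return table


rng = random.Random(1)
neuron_weights = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(200)]
hasher = SignedRandomProjection(input_dim=16, num_bits=4)
table = build_table(hasher, neuron_weights)  # built once, then left frozen

# An input pointing in the same direction as neuron 7's weights hashes to
# neuron 7's bucket, so neuron 7 is among the selected neurons.
query = [2.0 * w for w in neuron_weights[7]]
selected = table.get(hasher.bucket(query), [])
print(len(selected), 7 in selected)
```

With a frozen table, the same input always selects the same neurons, which is what allows further training to fine-tune a fixed subnetwork per input.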

In [None]:
bigger_network.train(
    train_data=intent_class_train, 
    loss_fn=bolt.CategoricalCrossEntropyLoss(), 
    learning_rate=0.001, 
    epochs=2)

bigger_network.enable_sparse_inference()

bigger_network.train(
    train_data=intent_class_train, 
    loss_fn=bolt.CategoricalCrossEntropyLoss(), 
    learning_rate=0.001, 
    epochs=1)

bigger_network.predict(
    test_data=intent_class_test, 
    metrics=["categorical_accuracy"], 
    verbose=True)

## What does this enable?
We trained a 200 million parameter model on the public Yelp Reviews dataset. As a benchmark, we fine-tuned RoBERTa, a state-of-the-art NLP model, on the same dataset and reached an accuracy of 83%. Let's see how well BOLT does!

In [None]:
yelp_sentiment_analysis_layers = [
    bolt.LayerConfig(
        dim=2000, 
        load_factor=0.2, 
        activation_function=bolt.ActivationFunctions.ReLU),
    
    bolt.LayerConfig(
        dim=2, 
        load_factor=1.0, 
        activation_function=bolt.ActivationFunctions.Softmax)
]

yelp_sentiment_analysis_network = bolt.Network(
    layers=yelp_sentiment_analysis_layers, 
    input_dim=100000)

### Load & Save
BOLT supports saving trained networks and loading them back in later sessions.

To save, call the `save()` method on the trained network. 

In [None]:
# TODO(Geordie): Add download scripts and change to relative path
train_data = dataset.load_bolt_svm_dataset(
    filename="../sa_demo/text_data/yelp_review_full_2class_train.svm", 
    batch_size=1024)

yelp_sentiment_analysis_network.train(
    train_data=train_data,
    loss_fn=bolt.CategoricalCrossEntropyLoss(), 
    learning_rate=0.0001, 
    epochs=20, 
    rehash=6400, 
    rebuild=128000)

yelp_sentiment_analysis_network.save(filename="yelp_sentiment_analysis_cp")

To load a trained model, call the `bolt.Network.load()` static method.

In [4]:
# TODO(Geordie): Add download scripts and change to relative path
# Re-import so this cell also works in a fresh kernel session.
from thirdai import bolt, dataset

test_data = dataset.load_bolt_svm_dataset(
    filename="../sa_demo/text_data/yelp_review_full_2class_test.svm", 
    batch_size=256)

yelp_sentiment_analysis_network = bolt.Network.load(filename="yelp_sentiment_analysis_cp")

res = yelp_sentiment_analysis_network.predict(
    test_data=test_data, 
    metrics=["categorical_accuracy"], 
    verbose=True)

We also trained an even larger 2 billion parameter model on a larger text corpus to build an interactive sentiment analysis demo. We first load the trained model.

In [5]:
# TODO(Geordie): Add download scripts and change to relative path
sentiment_analysis_network = bolt.Network.load(filename="interactive_demo_cp")

Let's load the demo to get a feel for what this network can do!

In [6]:
import interactive_sentiment_analysis
interactive_sentiment_analysis.demo(sentiment_analysis_network, verbose=False)
# TODO(Geordie): Make the accuracy disappear when doing interactive demo 

Processed 1 test batches in 18 milliseconds
Accuracy: 0 (0/1)
positive!
Processed 1 test batches in 6 milliseconds
Accuracy: 0 (0/1)
negative!
Processed 1 test batches in 9 milliseconds
Accuracy: 0 (0/1)
negative!
Processed 1 test batches in 9 milliseconds
Accuracy: 0 (0/1)
positive!
Exiting demo...


### Let's talk speed
Now that we've seen how fast inference is on BOLT, let's compare it with RoBERTa by running the following cell.

In [None]:
import time
from transformers import pipeline

# Load the model first so that loading time is excluded from the measurement.
sentiment_analysis = pipeline(
    "sentiment-analysis",
    model="siebert/sentiment-roberta-large-english")

# Time a single inference call.
t1 = time.perf_counter()
out = sentiment_analysis("I love chocolate.")
t2 = time.perf_counter()

print(out, flush=True)
print(f"time elapsed: {t2 - t1} s", flush=True)
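
A single timed call can be noisy, and the first call to a model often carries one-off setup costs. A slightly more robust comparison runs a warm-up call and then reports the median over several repeats. The helper below is a hypothetical sketch; `toy_model` is a stand-in workload and should be replaced by whichever model call is being measured.

```python
import statistics
import time


def time_call(fn, *args, warmup=1, repeats=5):
    """Return per-call latencies in seconds, measured after warm-up calls."""
    for _ in range(warmup):
        fn(*args)
    latencies = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        latencies.append(time.perf_counter() - start)
    return latencies


# Stand-in workload; replace with the model call being measured.
def toy_model(text):
    return sum(ord(ch) for ch in text)


latencies = time_call(toy_model, "I love chocolate.", repeats=5)
print(f"median: {statistics.median(latencies) * 1000:.3f} ms")
```

Using the same helper for both models keeps the measurement procedure identical across the comparison.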