## **How to use BOLT,**<br> **fast training,**<br> **benefits of large sparse models,**<br> **and fast inference.**

### Learning the syntax with a simple exercise: MNIST

In [9]:
from thirdai import dataset

mnist_train = dataset.load_bolt_svm_dataset(
    filename="datasets/mnist/mnist_train.svm", 
    batch_size=256)

mnist_test = dataset.load_bolt_svm_dataset(
    filename="datasets/mnist/mnist_test.svm", 
    batch_size=256)


Read 60000 vectors from datasets/mnist/mnist_train.svm in 1 seconds
Read 10000 vectors from datasets/mnist/mnist_test.svm in 0 seconds
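The `.svm` extension suggests the common SVMlight/LIBSVM sparse text format, where each line is `label index:value index:value ...`. The notebook doesn't confirm this, so treat the sketch below as an assumption about the file layout. `parse_svm_line` is a hypothetical helper, not part of `thirdai`.

```python
def parse_svm_line(line):
    """Parse one 'label idx:val idx:val ...' line into (label, indices, values)."""
    parts = line.split()
    label = int(parts[0])
    indices, values = [], []
    for token in parts[1:]:
        idx, val = token.split(":")
        indices.append(int(idx))   # position of a non-zero feature
        values.append(float(val))  # value of that feature
    return label, indices, values


# A made-up example line: label 7 with three non-zero features.
label, indices, values = parse_svm_line("7 152:0.5 153:1.0 400:0.25")
print(label, indices, values)
```

Only non-zero features are stored, which is why this format suits sparse inputs such as bag-of-words text vectors.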


### We'll use a simple neural network with a 784-dim input, a 1000-dim hidden layer with ReLU, and a 10-dim output layer with Softmax.

#### **Keras**

In [None]:
from tensorflow import keras

keras_layers = [
    keras.layers.Dense(
        units=1000, 
        activation='relu', 
        input_shape=(784,)),
        
    keras.layers.Dense(
        units=10, 
        activation='softmax')
]

keras_model = keras.Sequential(layers=keras_layers)

#### **BOLT**

In [10]:
from thirdai import bolt

mnist_layers = [
    bolt.LayerConfig(
        dim=1000, 
        activation_function=bolt.ActivationFunctions.ReLU),
    
    bolt.LayerConfig(
        dim=10, 
        activation_function=bolt.ActivationFunctions.Softmax)
]

mnist_network = bolt.Network(
    layers=mnist_layers, 
    input_dim=784)

Layer: dim=1000, load_factor=1, act_func=ReLU
Layer: dim=10, load_factor=1, act_func=Softmax
Initialized Network in 0 seconds
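Both definitions describe the same 784 → 1000 → 10 architecture. As a quick sanity check on its size, here is a small sketch that counts parameters, assuming every fully connected layer has a weight matrix plus a bias vector (Keras `Dense` uses a bias by default; whether BOLT does is an assumption). `dense_param_count` is a hypothetical helper.

```python
def dense_param_count(layer_dims):
    """Count weights and biases for a stack of fully connected layers."""
    return sum(
        fan_in * fan_out + fan_out  # weight matrix + bias vector
        for fan_in, fan_out in zip(layer_dims, layer_dims[1:])
    )


mnist_params = dense_param_count([784, 1000, 10])
print(mnist_params)  # 795010
```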


### We now train the network with the categorical cross-entropy loss function. <br> We'll measure performance with the categorical accuracy metric.

In [11]:
mnist_network.train(
    train_data=mnist_train, 
    loss_fn=bolt.CategoricalCrossEntropyLoss(), 
    learning_rate=0.001, 
    epochs=1)

mnist_network.predict(
    test_data=mnist_test, 
    metrics=["categorical_accuracy"], 
    verbose=True)


Epoch 1:
Processed 235 training batches in 17 seconds
Processed 40 test batches in 1139 milliseconds
Accuracy: 0.957 (9570/10000)


({'test_time': [1139.0], 'categorical_accuracy': [0.957]},
 array([[2.0203961e-13, 6.6950145e-15, 8.7631264e-08, ..., 9.9999976e-01,
         7.3752809e-12, 1.9612491e-08],
        [6.2531448e-08, 1.0634946e-02, 9.8455715e-01, ..., 3.8345314e-13,
         9.0156909e-06, 1.3750822e-13],
        [9.8563596e-08, 9.9975282e-01, 1.5101575e-04, ..., 9.0950416e-05,
         1.5711990e-06, 5.5146881e-08],
        ...,
        [3.6730616e-10, 2.7345276e-10, 3.0686084e-07, ..., 2.9980873e-07,
         4.4744602e-06, 3.7992781e-05],
        [1.5296929e-07, 3.3367795e-10, 5.6572174e-09, ..., 3.7851692e-08,
         1.4965596e-05, 2.9628275e-10],
        [9.5305452e-08, 2.1538201e-08, 2.2893113e-07, ..., 1.3075585e-13,
         2.5810456e-09, 7.9025836e-10]], dtype=float32))
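The output above is a tuple: a metrics dictionary and an array with one row of class scores per test sample. Assuming each row holds the softmax probabilities for the 10 digits, categorical accuracy is the fraction of rows whose largest entry sits at the true label (9570 of 10000 here). A minimal NumPy sketch with made-up data, using a hypothetical `categorical_accuracy` helper:

```python
import numpy as np


def categorical_accuracy(probabilities, labels):
    """Fraction of rows whose arg-max class equals the true label."""
    predicted = np.argmax(probabilities, axis=1)
    return float(np.mean(predicted == np.asarray(labels)))


# Toy data: 4 samples, 3 classes (not taken from the notebook).
probs = np.array(
    [[0.1, 0.7, 0.2],
     [0.8, 0.1, 0.1],
     [0.3, 0.3, 0.4],
     [0.2, 0.5, 0.3]],
    dtype=np.float32,
)
labels = [1, 0, 1, 1]
print(categorical_accuracy(probs, labels))  # 0.75
```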

### A more challenging task: <br> **Intent classification with CLINC150.**

In [12]:
intent_class_train = dataset.load_bolt_svm_dataset(
    filename="datasets/intent_classification/intent_classification_train.svm", 
    batch_size=256)

intent_class_test = dataset.load_bolt_svm_dataset(
    filename="datasets/intent_classification/intent_classification_test.svm", 
    batch_size=256)

Read 18100 vectors from datasets/intent_classification/intent_classification_train.svm in 4 seconds
Read 5500 vectors from datasets/intent_classification/intent_classification_test.svm in 1 seconds


### Use `load_factor` to set the computational budget. <br> BOLT curates the best small network for each sample.

In [13]:
bigger_layers = [
    bolt.LayerConfig(
        dim=10000, 
        load_factor=0.05, 
        activation_function=bolt.ActivationFunctions.ReLU),
    
    bolt.LayerConfig(
        dim=151, 
        activation_function=bolt.ActivationFunctions.Softmax)
]

bigger_network = bolt.Network(
    layers=bigger_layers, 
    input_dim=5512)

Layer: dim=10000, load_factor=0.05, act_func=ReLU, sampling: {hashes_per_table=4, num_tables=256, range_pow=12, reservoir_size=9}
Layer: dim=151, load_factor=1, act_func=Softmax
Initialized Network in 3 seconds
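To get a feel for what a `load_factor` of 0.05 buys, here is a back-of-the-envelope sketch. It assumes `load_factor` is the fraction of a layer's neurons that are computed for each sample, and it counts multiply-accumulates (MACs) as if the input were dense. It ignores input sparsity and the cost of choosing the active neurons, so read the result as a rough ratio rather than a measured speedup. `dense_macs` is a hypothetical helper.

```python
def dense_macs(dims):
    """Multiply-accumulates for one forward pass through dense layers."""
    return sum(a * b for a, b in zip(dims, dims[1:]))


input_dim, hidden_dim, output_dim = 5512, 10000, 151
load_factor = 0.05

# Assumption: load_factor is the fraction of hidden neurons used per sample.
active_hidden = round(hidden_dim * load_factor)

full = dense_macs([input_dim, hidden_dim, output_dim])
budgeted = dense_macs([input_dim, active_hidden, output_dim])

print(active_hidden)    # 500
print(full)             # 56630000
print(budgeted)         # 2831500
print(full / budgeted)  # 20.0
```

Under these assumptions the hidden layer has 10,000 neurons available but only about 500 take part in any single sample.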


### You can also use sparsity to accelerate inference using `enable_sparse_inference()`.

In [21]:
bigger_network.train(
    train_data=intent_class_train, 
    loss_fn=bolt.CategoricalCrossEntropyLoss(), 
    learning_rate=0.001, 
    epochs=2)

bigger_network.enable_sparse_inference()

bigger_network.train(
    train_data=intent_class_train, 
    loss_fn=bolt.CategoricalCrossEntropyLoss(), 
    learning_rate=0.001, 
    epochs=1)

bigger_network.predict(
    test_data=intent_class_test, 
    metrics=["categorical_accuracy"], 
    verbose=True)


Epoch 7:
Processed 71 training batches in 12 seconds

Epoch 8:
Processed 71 training batches in 12 seconds

Epoch 9:
Processed 71 training batches in 12 seconds
Processed 22 test batches in 985 milliseconds
Accuracy: 0.854545 (4700/5500)


({'test_time': [985.0], 'categorical_accuracy': [0.8545454545454545]},
 array([[4.2461420e-06, 1.7440190e-05, 8.6290788e-07, ..., 2.0201784e-05,
         2.3369166e-06, 9.5637098e-08],
        [3.9852392e-11, 2.0612914e-13, 1.7484965e-11, ..., 5.9651194e-11,
         3.7956438e-09, 3.1947812e-07],
        [3.5286228e-06, 1.8471538e-06, 1.5664868e-05, ..., 6.5738773e-06,
         4.6795899e-06, 2.4628860e-05],
        ...,
        [5.7611720e-07, 2.4970962e-08, 2.9466128e-07, ..., 4.4146751e-07,
         5.1395896e-06, 1.5663216e-05],
        [2.2780972e-05, 5.7303455e-06, 6.1208934e-06, ..., 1.0568605e-05,
         6.6681046e-06, 6.5553064e-07],
        [6.4627606e-08, 2.2056440e-08, 7.1469043e-07, ..., 1.6240283e-07,
         5.9759479e-08, 8.3421163e-08]], dtype=float32))
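The sampling parameters printed when this network was initialized (`hashes_per_table`, `num_tables`) hint that active neurons are picked with hash tables. The toy below illustrates that general idea with signed random projections: neurons whose weight vectors land in the same bucket as the input in at least one table are computed, and the rest are skipped. This is an assumption about the approach and not BOLT's actual implementation; every name and size here is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim = 64, 1000
num_tables, hashes_per_table = 8, 4  # toy values, far smaller than the log above

W = rng.standard_normal((hidden_dim, input_dim))  # one weight row per neuron
planes = rng.standard_normal((num_tables, hashes_per_table, input_dim))


def bucket_ids(vectors):
    """Map (n, input_dim) vectors to (n, num_tables) integer bucket ids."""
    # One sign bit per random hyperplane, packed into an integer per table.
    bits = (np.einsum("thd,nd->nth", planes, vectors) > 0).astype(np.int64)
    weights = 1 << np.arange(hashes_per_table)
    return bits @ weights


# Hash every neuron's weight vector once, ahead of time.
neuron_buckets = bucket_ids(W)


def active_neurons(x):
    """Indices of neurons sharing a bucket with x in at least one table."""
    query = bucket_ids(x[None, :])[0]
    return np.flatnonzero((neuron_buckets == query).any(axis=1))


x = rng.standard_normal(input_dim)
active = active_neurons(x)

# Sparse forward pass: only the selected rows of W are multiplied.
sparse_out = np.maximum(W[active] @ x, 0.0)
print(f"computed {len(active)} of {hidden_dim} neurons")
```

In a real system the bucket lookup would use actual hash tables instead of comparing against every neuron, and the tables would need refreshing as the weights change during training.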

### An even **larger** model for sentiment classification with **state-of-the-art accuracy.**

In [16]:
train_data = dataset.load_bolt_svm_dataset(
    filename="datasets/yelp_review/yelp_review_train.svm", 
    batch_size=1024)

test_data = dataset.load_bolt_svm_dataset(
    filename="datasets/yelp_review/yelp_review_test.svm", 
    batch_size=256)

Read 520000 vectors from datasets/yelp_review/yelp_review_train.svm in 6 seconds
Read 40000 vectors from datasets/yelp_review/yelp_review_test.svm in 0 seconds


### Rich sparse features. <br> Configurable load factor. <br> **Train from scratch at any budget.**

In [None]:
yelp_sentiment_analysis_layers = [
    bolt.LayerConfig(
        dim=2000,
        load_factor=0.2,
        activation_function=bolt.ActivationFunctions.ReLU),

    bolt.LayerConfig(
        dim=2,
        load_factor=1.0,
        activation_function=bolt.ActivationFunctions.Softmax)
]

yelp_sentiment_analysis_network = bolt.Network(
    layers=yelp_sentiment_analysis_layers, 
    input_dim=100000)

### Train a model once and save it for later with `save()`.

In [None]:
yelp_sentiment_analysis_network.train(
    train_data=train_data,
    loss_fn=bolt.CategoricalCrossEntropyLoss(), 
    learning_rate=0.0001, 
    epochs=20, 
    rehash=6400, 
    rebuild=128000)

yelp_sentiment_analysis_network.save(filename="saved_models/yelp_sentiment_analysis_savefile_2")

### Use the saved model with `load()`.

In [17]:
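# Load a previously saved model (a different file from the `_savefile_2` written by the cell above).
yelp_sentiment_analysis_network = bolt.Network.load(filename="saved_models/yelp_sentiment_analysis_savefile")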

### RoBERTa: 83% accurate. Let's see how BOLT does!

In [18]:
res = yelp_sentiment_analysis_network.predict(
    test_data=test_data, 
    metrics=["categorical_accuracy"], 
    verbose=True)

Processed 157 test batches in 16130 milliseconds
Accuracy: 0.93095 (37238/40000)
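The numbers in this log line can be cross-checked with a little arithmetic. The sketch below assumes the final batch may be partially filled, so the batch count is a ceiling division, and it derives per-sample latency and throughput from the reported test time. `num_batches` is a hypothetical helper.

```python
import math


def num_batches(num_samples, batch_size):
    """Batches needed when the last batch may be partially filled."""
    return math.ceil(num_samples / batch_size)


# Values taken from the loader and predict logs above.
test_samples, test_batch_size = 40000, 256
elapsed_ms = 16130
correct = 37238

batches = num_batches(test_samples, test_batch_size)
accuracy = correct / test_samples
ms_per_sample = elapsed_ms / test_samples
samples_per_second = test_samples / (elapsed_ms / 1000)

print(batches)  # 157
print(accuracy, ms_per_sample, samples_per_second)
```

That works out to roughly 0.4 ms per review, or about 2,500 reviews per second for this run.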


### We also trained an even larger **1 billion parameter** model on a large text corpus to build an interactive sentiment analysis demo.

In [19]:
sentiment_analysis_network = bolt.Network.load(filename="saved_models/interactive_demo_savefile")

### Let's run this!

In [20]:
import interactive_sentiment_analysis
interactive_sentiment_analysis.demo(sentiment_analysis_network, verbose=False)

PREDICTION RESULT: POSITIVE
Predicted in 11.0 milliseconds.

PREDICTION RESULT: NEGATIVE
Predicted in 42.0 milliseconds.

PREDICTION RESULT: NEGATIVE
Predicted in 54.0 milliseconds.

PREDICTION RESULT: NEGATIVE
Predicted in 20.0 milliseconds.

PREDICTION RESULT: POSITIVE
Predicted in 19.0 milliseconds.

PREDICTION RESULT: POSITIVE
Predicted in 24.0 milliseconds.

Exiting demo...


### Let's talk speed. How much faster is BOLT compared to RoBERTa?

In [12]:
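import time
# Requires the Hugging Face `transformers` package (`pip install transformers`).
from transformers import pipeline

# Load the model once, outside the timed region.
sentiment_analysis = pipeline(
    "sentiment-analysis",
    model="siebert/sentiment-roberta-large-english")

# Time a single prediction with a monotonic, high-resolution clock.
t1 = time.perf_counter()
out = sentiment_analysis("I love chocolate.")
t2 = time.perf_counter()
print(out, flush=True)
print(f"time elapsed: {t2 - t1:.3f} s", flush=True)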

ModuleNotFoundError: No module named 'transformers'
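A single timed call can be skewed by one-off costs such as lazy initialization or cold caches. For a fairer speed comparison, one option is to warm up first and report the median over several runs. The sketch below is generic: `time_calls` and `dummy_model` are hypothetical stand-ins, and either model's predict function could be passed in place of the dummy.

```python
import statistics
import time


def time_calls(fn, *args, warmup=2, repeats=10):
    """Return (median, min, max) latency in milliseconds for fn(*args)."""
    for _ in range(warmup):
        fn(*args)  # untimed runs to absorb one-off start-up costs
    durations_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(durations_ms), min(durations_ms), max(durations_ms)


def dummy_model(text):
    """Stand-in for a real model call; returns a fake 0/1 label."""
    return sum(ord(ch) for ch in text) % 2


median_ms, best_ms, worst_ms = time_calls(dummy_model, "I love chocolate.")
print(f"median {median_ms:.4f} ms (min {best_ms:.4f}, max {worst_ms:.4f})")
```

Using the same input sentences and the same machine for both models keeps the comparison like for like.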