# **IMPORTANT**
For anything in this notebook to work, you need to install the ThirdAI package by signing up [here](https://www.thirdai.com/try-bolt/).

## **Sentiment analysis with BOLT**

We will walk through the entire process of building a sentiment analysis model with BOLT from data preprocessing all the way to inference. This notebook is structured as follows:
1. Selecting and preprocessing the dataset
2. Defining the BOLT network
3. Training the network
4. Inference

### **1. Selecting and preprocessing the dataset**
Let's start with the Yelp Reviews dataset. As the name suggests, it contains reviews from Yelp, each labeled with an integer from 0 to 4 (inclusive) that represents a rating of 1 to 5 stars.
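
The loading cell expects files that are already in SVM format. As a rough illustration of what a preprocessing step could look like, here is a minimal sketch that hashes the words of a review into a fixed number of buckets and writes one `label index:value ...` line. The `to_svm_line` helper, the tokenizer, the CRC32 hashing and the zero-based indices are all assumptions made for this sketch, not the preprocessing script used by ThirdAI.

```python
import re
import zlib

INPUT_DIM = 100_000  # assumed to match the input_dim passed to the network


def to_svm_line(label: int, text: str, dim: int = INPUT_DIM) -> str:
    """Hash the tokens of `text` into `dim` buckets and emit one SVM-format line.

    Hypothetical helper: the tokenizer and hash function are illustrative choices.
    """
    counts = {}
    for token in re.findall(r"[a-z0-9']+", text.lower()):
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        index = zlib.crc32(token.encode("utf-8")) % dim
        counts[index] = counts.get(index, 0) + 1
    # SVM-format readers generally expect feature indices in ascending order.
    features = " ".join(f"{i}:{c}" for i, c in sorted(counts.items()))
    return f"{label} {features}".rstrip()


print(to_svm_line(4, "Great food, great service!"))
```

Hashing keeps the feature space at a fixed size regardless of vocabulary, at the cost of occasional collisions between unrelated words.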

In [None]:
# TODO: Introduce the datasets
# TODO: Write the preprocessing script
# TODO: Include the preprocessing script
# TODO: WRITE INSTRUCTIONS FOR DOWNLOADING UNIVERSE / MAKE IT CLEAR THAT THEY NEED UNIVERSE
from thirdai import dataset

train_data = dataset.load_bolt_svm_dataset(
    filename="/path/to/preprocessed_data_train.svm", 
    batch_size=256)

test_data = dataset.load_bolt_svm_dataset(
    filename="/path/to/preprocessed_data_test.svm", 
    batch_size=256)


### **2. Defining the BOLT network**
We define a network with two layers:
* A 2000-dimensional hidden layer with a ReLU activation function to introduce non-linearity.
* A 2-dimensional output layer with a Softmax activation function to classify between "positive" and "negative" sentiments.


This demo version supports only three activation functions: `ReLU`, `Softmax` and `Linear`.
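
To make the architecture concrete, the following NumPy sketch spells out the three activation functions and a plain dense forward pass through a hidden ReLU layer and a Softmax output layer. It uses toy layer sizes and random weights, and it does not model `load_factor` or anything else BOLT does internally; treat it as a reference for the math only.

```python
import numpy as np


def relu(x):
    # Zero out negative pre-activations.
    return np.maximum(x, 0.0)


def softmax(x):
    # Subtract the row-wise max for numerical stability.
    shifted = x - x.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def linear(x):
    # Identity activation: the layer output is the pre-activation itself.
    return x


rng = np.random.default_rng(0)
# Toy sizes for illustration; the notebook uses 100000, 2000 and 2.
input_dim, hidden_dim, output_dim = 10, 8, 2

w_hidden = rng.normal(scale=0.1, size=(input_dim, hidden_dim))
b_hidden = np.zeros(hidden_dim)
w_output = rng.normal(scale=0.1, size=(hidden_dim, output_dim))
b_output = np.zeros(output_dim)

batch = rng.random((4, input_dim))  # four made-up input vectors
hidden = relu(batch @ w_hidden + b_hidden)
probabilities = softmax(hidden @ w_output + b_output)

print(probabilities.shape)
print(probabilities.sum(axis=1))  # each row sums to 1 up to rounding
```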

In [None]:
# TODO: Explain our choice of parameters in the markdown.
# TODO: Explain LayerConfig and bolt.Network in markdown.
from thirdai import bolt

layers = [
    # Hidden layer
    bolt.LayerConfig(
        dim=2000,
        load_factor=0.2,
        activation_function=bolt.ActivationFunctions.ReLU),
    # Output layer
    bolt.LayerConfig(
        dim=2,
        load_factor=1.0,
        activation_function=bolt.ActivationFunctions.Softmax),
]

network = bolt.Network(
    layers=layers, 
    input_dim=100000)

### **3. Training the network**
**The train() method**

Train the BOLT network by calling the `train()` method, which accepts the following arguments:
* `train_data`: BOLT dataset - The training dataset in a format returned by `dataset.load_bolt_svm_dataset()`.
* `loss_fn`: BOLT loss function - The loss function to minimize. In this demo version, we only support the `bolt.CategoricalCrossEntropyLoss()` loss function.
* `learning_rate`: Float - The learning rate for gradient descent. The default value is 0.0001.
* `epochs`: Int - The number of training epochs (a full cycle through the dataset).
* `verbose` (Optional): Boolean - Set to `True` to print a progress bar, accuracy, and elapsed time for each training epoch. Set to `False` otherwise. `True` by default.

`train()` returns a dictionary that contains the loss value and elapsed time for each training epoch.
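
For readers unfamiliar with the loss being minimized, categorical cross-entropy can be sketched in NumPy as the mean negative log-probability that the network assigns to the true class. This is the textbook definition, written as a hypothetical `categorical_cross_entropy` helper; whether BOLT averages or sums over a batch is not stated here, so the averaging is an assumption.

```python
import numpy as np


def categorical_cross_entropy(probabilities, labels):
    """Mean negative log-probability assigned to the true class of each example."""
    probabilities = np.asarray(probabilities, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Pick, for every row, the probability of that row's true class.
    true_class_probs = probabilities[np.arange(len(labels)), labels]
    # Clip to avoid log(0) when a true class receives zero probability.
    return float(-np.mean(np.log(np.clip(true_class_probs, 1e-12, 1.0))))


# Made-up Softmax outputs for three examples and two classes.
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
labels = np.array([0, 1, 0])
print(categorical_cross_entropy(probs, labels))
```

The loss shrinks toward 0 as the probability of the correct class approaches 1, which is why lower values across epochs indicate that training is progressing.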


**Training with sparse inference in mind**

Call the `enable_sparse_inference()` method if you plan to use sparse inference. We recommend training the network for at least one more epoch after calling this method.


**Saving a trained model**

Simply call the `save()` method, passing in the location of the save file.

In [None]:
# TODO: Explain enable sparse inference call.
network.train(
    train_data=train_data,
    loss_fn=bolt.CategoricalCrossEntropyLoss(), 
    learning_rate=0.0001, 
    epochs=20, 
    verbose=True)

network.enable_sparse_inference()

network.train(
    train_data=train_data,
    loss_fn=bolt.CategoricalCrossEntropyLoss(), 
    learning_rate=0.0001, 
    epochs=1,
    verbose=True)

network.save(filename="path/to/savefile") # Don't forget to change the filename parameter!

### **4. Inference**
**The predict() method**

You can do inference by calling the `predict()` method, which accepts the following arguments:
* `test_data`: BOLT dataset - The test dataset in a format returned by `dataset.load_bolt_svm_dataset()`.
* `metrics`: List of strings - The metrics used to evaluate the predictions. In this demo version, we only support the `"categorical_accuracy"` metric.
* `verbose` (Optional): Boolean - Set to `True` to print a progress bar, accuracy, and inference time. Set to `False` otherwise. `True` by default.

It then returns a tuple of `(predictions, metric_results)`:
* `predictions`: 2-dimensional NumPy array - The i-th row is the output of the network for the i-th example in the dataset.
* `metric_results`: Dictionary - A dictionary mapping each metric name in `metrics` to a list of values for that metric, one per epoch (a single entry when returned by `predict()`). An `"epoch_times"` metric is included by default.
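
Since each row of `predictions` holds one score per output class, a predicted class can be read off with `argmax`. The sketch below uses a small made-up array in place of real network output, plus hypothetical ground-truth labels, to show this and to compute accuracy by hand. Which output index corresponds to "positive" depends on how the labels were encoded during preprocessing.

```python
import numpy as np

# Made-up network outputs for four examples and two classes.
example_predictions = np.array([
    [0.91, 0.09],
    [0.30, 0.70],
    [0.55, 0.45],
    [0.08, 0.92],
])
example_labels = np.array([0, 1, 1, 1])  # hypothetical ground truth

# The predicted class is the index of the largest score in each row.
predicted_classes = example_predictions.argmax(axis=1)
# Accuracy is the fraction of examples whose predicted class matches the label.
accuracy = float((predicted_classes == example_labels).mean())

print(predicted_classes)  # [0 1 0 1]
print(accuracy)           # 0.75
```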

**Loading a saved model**

To load a saved model, call the `bolt.Network.load()` method. The call is commented out by default so that you can continue from the previous cell; uncomment it to skip retraining the model the next time you open this notebook.

In [None]:
# Uncomment the next line to load a saved model. Don't forget to change the filename parameter to the right path!
# network = bolt.Network.load(filename="path/to/savefile") 

predictions, metric_results = network.predict(
    test_data=test_data, 
    metrics=["categorical_accuracy"], 
    verbose=True)

print(predictions)
print(metric_results)

### **Congratulations! You just mastered BOLT.**
If you run into any issues while running this notebook, please let us know by posting on [GitHub Issues](https://github.com/ThirdAILabs/Demos/issues).