# Using BERT for next sentence prediction

_Ted Underwood wrote version 1.0 of this notebook, which I have lightly revised_

## Imports

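First we import `torch` and `transformers`, then load the pretrained tokenizer and model. Calling `model.eval()` puts the model in evaluation mode, which turns off dropout so the same input gives the same output every time.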

In [None]:
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
print('built tokenizer')
model = BertForNextSentencePrediction.from_pretrained('bert-base-uncased')
model.eval()
print('built model')

## Write function to predict next sentences

We write a function to do next sentence prediction. HuggingFace's [`tokenizer`](https://huggingface.co/transformers/main_classes/tokenizer.html?highlight=encode_plus#transformers.PreTrainedTokenizer.encode_plus) packs the two sentences into a single input, and PyTorch's [`LongTensor`](https://pytorch.org/docs/stable/tensors.html#torch.Tensor) holds the label the model needs in order to compute a loss.

NB: A `tensor` [is a multi-dimensional array](https://www.tensorflow.org/guide/tensor).

Don't worry if you can't wrap your head around the math underlying all this—it's beyond our pay grade in this course.

In [None]:
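def get_logits(firstsentence, secondsentence):
    # Pack both sentences into one input: [CLS] first [SEP] second [SEP]
    encoding = tokenizer(firstsentence, secondsentence, return_tensors='pt')

    # Label 0 means "second sentence follows the first"; label 1 means "random pair".
    # Older transformers releases call this argument next_sentence_label.
    with torch.no_grad():  # inference only, so skip gradient tracking
        outputs = model(**encoding, labels=torch.LongTensor([1]))

    # Read the fields by name; unpacking the output object directly yields its keys
    return outputs.loss, outputs.logits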
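
To see roughly what the tokenizer hands to the model, here is a toy sketch of BERT's sentence-pair layout. It is a deliberate simplification: `toy_pair_encoding` is a made-up helper, and it splits on whitespace, whereas the real tokenizer uses WordPiece and returns integer ids. What carries over is the shape of the input: `[CLS]`, the first sentence, `[SEP]`, the second sentence, `[SEP]`, plus segment ids (the tokenizer calls them `token_type_ids`) marking which sentence each token belongs to.

```python
def toy_pair_encoding(first, second):
    """Hypothetical helper: mimic the layout of a BERT sentence-pair input."""
    # Whitespace splitting stands in for WordPiece tokenization (a simplification).
    tokens_a = first.lower().split()
    tokens_b = second.lower().split()

    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # Segment 0 covers [CLS], the first sentence, and its [SEP];
    # segment 1 covers the second sentence and the final [SEP].
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, segment_ids


tokens, segment_ids = toy_pair_encoding("I walked to the store.", "I bought milk.")
for token, segment in zip(tokens, segment_ids):
    print(segment, token)
```

The segment ids are how the model can tell where the first sentence stops and the second begins.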

## Now we play!

We just need two sentences.

In [None]:
firstsentence = "I was walking to the store one day to buy groceries."
secondsentence = "At the store I bought bananas and milk."

get_logits(firstsentence, secondsentence)

### WTF?

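Okay. What the hell does that mean? The first value is the "loss," the second the "logits." The logits are two raw scores: the first for "the second sentence really follows the first," the second for "it doesn't." The relation between logits and probability makes my head hurt to explain, so I'm just going to [point at Wikipedia.](https://en.wikipedia.org/wiki/Logit)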
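
For the curious, the basic relation fits in a few lines of plain Python. A logit is the log of the odds, ln(p / (1 - p)), and the sigmoid function undoes it. The function names `logit` and `sigmoid` below are my own, not part of BERT or `transformers`.

```python
import math

def logit(p):
    """Turn a probability (strictly between 0 and 1) into log-odds."""
    return math.log(p / (1 - p))

def sigmoid(x):
    """Turn log-odds back into a probability."""
    return 1 / (1 + math.exp(-x))

for p in (0.1, 0.5, 0.9):
    x = logit(p)
    print(f"p = {p}  ->  logit = {x:+.3f}  ->  back to p = {sigmoid(x):.3f}")
```

A probability of 0.5 maps to a logit of 0; probabilities above 0.5 give positive logits and those below give negative ones.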

But for a quick and dirty approach, Ted Underwood wrote this function that *loosely* translates BERT's logits output into a probability for the sequence.

In [None]:
import math

def get_probability(firstsent, secondsent):
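    '''
    :param firstsent: the first sentence, as a string
    :param secondsent: the candidate next sentence, as a string
    :return: rough probability that secondsent follows firstsent
             (a softmax-like ratio that uses base 1.2 instead of e)
    '''
    loss, logits = get_logits(firstsent, secondsent)

    # Index 0 scores "is the next sentence"; index 1 scores "is not".
    # .item() converts each one-element tensor to a plain Python float.
    poslogit = logits[0, 0].item()
    neglogit = logits[0, 1].item()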

    pospart = math.pow(1.2, poslogit)
    negpart = math.pow(1.2, neglogit)

    posprob = pospart / (pospart + negpart)

    return posprob
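
For comparison, a standard softmax uses base *e* rather than 1.2. The sketch below runs both on made-up logits (the numbers are illustrative, not BERT output) to show that the base-1.2 version is pulled toward 0.5 while still ranking sentence pairs in the same order. `two_class_prob` is a hypothetical helper, not something from `transformers`.

```python
import math

def two_class_prob(poslogit, neglogit, base=math.e):
    """Probability of the first class from two logits, using the given base."""
    # Subtract the larger logit first so the powers cannot overflow.
    top = max(poslogit, neglogit)
    pospart = math.pow(base, poslogit - top)
    negpart = math.pow(base, neglogit - top)
    return pospart / (pospart + negpart)

# Made-up logits for illustration only.
poslogit, neglogit = 5.0, -4.0

print("base e:  ", two_class_prob(poslogit, neglogit))
print("base 1.2:", two_class_prob(poslogit, neglogit, base=1.2))
```

With base *e*, the result equals the sigmoid of the difference between the two logits, so that difference is the log-odds that the second sentence follows the first.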

### Let's try again with the new function to make the result comprehensible

In [None]:
firstsentence = "I was walking to the store one day to buy groceries."
secondsentence = "At the store I bought bananas and milk."

get_probability(firstsentence, secondsentence)

Ah, now we can see that BERT considers that a pretty probable sequence. Let's try a less probable sequence.

We'll use the same first sentence about walking to the store. For our second sentence, we'll take this one from Wikipedia's article on "psychedelic drug":

    Psychedelics are a hallucinogenic class of psychoactive drug whose primary effect is to trigger non-ordinary states of consciousness and psychedelic experiences via serotonin 2A receptor agonism.


In [None]:
firstsentence = "I was walking to the store one day to buy groceries."
secondsentence = "Psychedelics are a hallucinogenic class of psychoactive drug whose primary effect is to trigger non-ordinary states of consciousness and psychedelic experiences via serotonin 2A receptor agonism."
get_probability(firstsentence, secondsentence)

That's a much less probable sequence! Let's try a slightly weaker non-sequitur.

In [None]:
firstsentence = "I was walking to the store one day to buy groceries."
secondsentence = "Everything is closed due to the pandemic."
get_probability(firstsentence, secondsentence)

Okay, that probability is higher. Still unlikely. But not *totally* improbable.

## Your turn!

Try a variety of sentences in the code cell below:

In [None]:
firstsentence = "I was walking to the store one day to buy groceries."
secondsentence = "YOUR SENTENCE"
get_probability(firstsentence, secondsentence)

And now write both sentences yourself: 

In [None]:
firstsentence = "YOUR SENTENCE"
secondsentence = "YOUR SENTENCE"
get_probability(firstsentence, secondsentence)

### Analysis

Did the model perform as you expected? Did it surprise you? What uses can you imagine putting this model to? How could we fit it into a larger pipeline for further tasks or analyses?

#### YOUR ANALYSIS HERE