### Unigrams, Bigrams, Trigrams, and N-grams 

An N-gram is a sequence of N consecutive words. Let's take a look at the following examples.


- New Delhi (a 2-gram)
- The Three Musketeers (a 3-gram)
- She stood up slowly (a 4-gram)
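As a minimal sketch of the definition, the snippet below slides a window of size N over a list of words. The helper name `ngrams` and the whitespace tokenization are assumptions made for illustration, not part of the original text.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    # range() is empty when the text is shorter than n, so the result is [].
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Naive whitespace tokenization (an assumption; real text needs more care).
tokens = "She stood up slowly".split()

bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
fourgrams = ngrams(tokens, 4)

print(bigrams)    # [('She', 'stood'), ('stood', 'up'), ('up', 'slowly')]
print(trigrams)   # [('She', 'stood', 'up'), ('stood', 'up', 'slowly')]
print(fourgrams)  # [('She', 'stood', 'up', 'slowly')]
```

A four-word sentence therefore contains three bigrams, two trigrams, and exactly one 4-gram.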

It is very useful to assign a probability to the occurrence of an N-gram, or to a word occurring next in a sequence of words. Why?

First, it can help in deciding which N-grams can be chunked together to form single entities (e.g. “New Delhi” chunked together as one word, “high school” being chunked as one word).

It can also help make next-word predictions. Let's say you have the partial sentence “Please hand over your”. The next word is then more likely to be “test”, “assignment”, or “paper” than “school”.

It can also help correct spelling errors. For instance, “drink cofee” could be corrected to “drink coffee” if you knew that “coffee” has a high probability of occurring after “drink” and that the overlap of letters between “cofee” and “coffee” is high.

As you can see, assigning these probabilities has huge potential in the NLP domain.
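The spelling-correction idea can be sketched roughly as follows. Everything here is an assumption for illustration: the probabilities in `p_after_drink` are made up, the letter overlap is approximated with `difflib.SequenceMatcher`, and multiplying the two scores is just one simple way to combine them.

```python
from difflib import SequenceMatcher

# Made-up values of P(word | "drink"), for illustration only.
p_after_drink = {"coffee": 0.4, "tea": 0.3, "toffee": 0.001}

def letter_overlap(a, b):
    """Similarity between two strings in [0, 1], as computed by difflib."""
    return SequenceMatcher(None, a, b).ratio()

def correct(typed, candidates):
    """Pick the candidate with the highest probability-times-overlap score."""
    return max(candidates, key=lambda w: candidates[w] * letter_overlap(typed, w))

best = correct("cofee", p_after_drink)
print(best)  # coffee
```

“tea” is likely after “drink” but shares few letters with “cofee”, while “toffee” looks similar but is unlikely in that context; “coffee” scores well on both.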

Now that we understand this concept, we can build with it: that’s the N-gram model. Basically, an N-gram model predicts the occurrence of a word based on the occurrence of its N – 1 previous words. 

This answers the question of how far back in the history of a sequence of words we should go to predict the next word. For example, a bigram model (N = 2) predicts the occurrence of a word given only its previous word (N – 1 = 1). Similarly, a trigram model (N = 3) predicts the occurrence of a word based on its previous two words (N – 1 = 2).
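To make the "how far back" idea concrete, here is a small sketch that keeps only the last N – 1 words of the history. The function name `context` is hypothetical and chosen for this example.

```python
def context(history, n):
    """Return the last n - 1 words of history, the part an N-gram model looks at."""
    if n <= 1:
        return ()  # a unigram model ignores the history entirely
    return tuple(history[-(n - 1):])

history = "Please hand over your".split()

bigram_context = context(history, 2)
trigram_context = context(history, 3)

print(bigram_context)   # ('your',)
print(trigram_context)  # ('over', 'your')
```

Everything earlier in the sentence is discarded, which is the simplifying assumption an N-gram model makes.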

Let us see a way to assign a probability to a word occurring next in a sequence of words. First, we need a large sample of English sentences (corpus).

Let's say our corpus contains the following sentences:

- He said thank you.
- He said bye-bye as he walked through the door.
- He went to New Delhi.
- New Delhi has nice weather.
- It is raining in New Castle.

Let’s assume a bigram model, so we find the probability of a word based only on its previous word. In general, the probability of the word ‘wn’ given the previous word ‘wp’ is (the number of times ‘wp’ occurs immediately before ‘wn’) / (the total number of times ‘wp’ occurs in the corpus):

P (wn | wp) = Count (wp wn) / Count (wp)

To find the probability of the word “you” following the word “thank”, we can write this as P (you | thank) which is a conditional probability.
This becomes equal to:

= (No. of times “thank you” occurs) / (No. of times “thank” occurs) 
= 1/1 
= 1
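The same count-and-divide calculation can be sketched in Python. This is only an illustration under stated assumptions: the corpus is retyped here with “New Delhi” in the third sentence so that “New” occurs three times, words are lowercased, and the names `tokenize` and `bigram_probability` are invented for this example.

```python
import re
from collections import Counter

# The toy corpus, one sentence per string (retyped by hand; see the note above).
corpus = [
    "He said thank you.",
    "He said bye-bye as he walked through the door.",
    "He went to New Delhi.",
    "New Delhi has nice weather.",
    "It is raining in New Castle.",
]

def tokenize(sentence):
    """Lowercase a sentence and keep its words (hyphenated words stay whole)."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", sentence.lower())

unigram_counts = Counter()
bigram_counts = Counter()
for sentence in corpus:
    tokens = tokenize(sentence)
    unigram_counts.update(tokens)
    # Pair each word with its successor; pairs never cross sentence boundaries.
    bigram_counts.update(zip(tokens, tokens[1:]))

def bigram_probability(prev_word, word):
    """Estimate P(word | prev_word) as Count(prev_word word) / Count(prev_word)."""
    if unigram_counts[prev_word] == 0:
        return 0.0  # previous word never seen: report 0 rather than divide by zero
    return bigram_counts[(prev_word, word)] / unigram_counts[prev_word]

p_you_given_thank = bigram_probability("thank", "you")
print(p_you_given_thank)  # 1.0
```

Returning 0 for an unseen previous word is a simplification; practical models usually apply some form of smoothing instead.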

We can say that whenever “thank” occurs, it is followed by “you”. (This is because we have trained on only five sentences, and “thank” occurred only once, in the context of “thank you”.) Let’s see an example where the preceding word occurs in different contexts.

Let’s calculate the probability of the word “Delhi” coming after “New”. We want to find P (Delhi | New), that is, the probability that the next word will be “Delhi” given that the current word is “New”. We can do this by:

= (No. of times “New Delhi” occurs) / (No. of times “New” occurs) 
= 2/3 
= 0.67

This is because in our corpus, one of the three occurrences of “New” is followed by “Castle”. So P (Castle | New) = 1 / 3.
In our corpus, only “Delhi” and “Castle” occur after “New”, with probabilities 2 / 3 and 1 / 3 respectively. So if we want to build next-word prediction software based on our corpus and a user types in “New”, we will offer two options: “Delhi” ranked most likely and “Castle” ranked less likely.
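A next-word suggester along these lines might look like the sketch below. It assumes a pre-tokenized, lowercased version of the corpus in which “New” occurs three times (twice before “Delhi”, once before “Castle”); the `followers` table and the `suggest` function are hypothetical names used only here.

```python
from collections import Counter, defaultdict

# Hand-tokenized, lowercased corpus (an assumption for this sketch).
sentences = [
    ["he", "said", "thank", "you"],
    ["he", "said", "bye-bye", "as", "he", "walked", "through", "the", "door"],
    ["he", "went", "to", "new", "delhi"],
    ["new", "delhi", "has", "nice", "weather"],
    ["it", "is", "raining", "in", "new", "castle"],
]

# Map each word to a counter of the words that follow it.
followers = defaultdict(Counter)
for tokens in sentences:
    for prev_word, word in zip(tokens, tokens[1:]):
        followers[prev_word][word] += 1

def suggest(prev_word, k=2):
    """Return up to k (word, probability) pairs, most likely first."""
    counts = followers.get(prev_word.lower())
    if not counts:
        return []  # the word was never seen with a successor
    total = sum(counts.values())
    return [(word, count / total) for word, count in counts.most_common(k)]

suggestions = suggest("New")
print(suggestions)
```

Here the denominator is the number of times the previous word has a successor, which equals its total count whenever it never ends a sentence, as is the case for “new” in this corpus. With this data the suggestions come back as “delhi” first and “castle” second.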

As seen, N-grams are fixed-length sequences of N consecutive tokens occurring in the text. A unigram has one token, a bigram two, and a trigram three.

##### Reference
Rao, Delip, and McMahan, Brian. Natural Language Processing with PyTorch. O'Reilly Media.

An Introduction to N-grams: What Are They and Why Do We Need Them? https://blog.xrds.acm.org/2017/10/introduction-n-grams-need/