# BERT (Bidirectional Encoder Representations from Transformers)

BERT is a technique developed by Google for producing word embeddings in machine learning.
BERT stands for Bidirectional Encoder Representations from Transformers. It is a family of pre-trained language representation models that can be used to build models for NLP (Natural Language Processing) tasks.
We can either use these models to extract high-quality language features from text data, or fine-tune them on specific tasks such as classification, entity recognition, and question answering with our own data to produce state-of-the-art predictions.

### Why BERT Embeddings for NLP?
BERT embeddings are very useful for keyword expansion, semantic search, and other information retrieval tasks. For example, if you want to match customer questions or searches against previously answered questions or well-documented searches, these representations help you accurately retrieve results that match the customer's intent and contextual meaning, even when there are no overlapping keywords or phrases.
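
To make the retrieval idea concrete, here is a minimal sketch of matching a new question against previously answered ones by cosine similarity. The three-dimensional vectors are made up for illustration (real BERT vectors have hundreds of dimensions), and `cosine_similarity` is a helper defined here, not part of any BERT library.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical "embeddings" of previously answered questions.
# The numbers are invented; in practice they would come from BERT.
answered = {
    "How do I reset my password?": [0.9, 0.1, 0.0],
    "Where is my invoice?": [0.1, 0.8, 0.3],
    "How do I cancel my order?": [0.0, 0.3, 0.9],
}

# Invented embedding for a new question such as
# "I forgot my login credentials" (no keywords shared with the match).
query_vector = [0.8, 0.2, 0.1]

# Pick the answered question whose vector points in the most similar direction.
best_match = max(answered, key=lambda q: cosine_similarity(query_vector, answered[q]))
print(best_match)
```

The match is chosen purely by vector direction, which is why retrieval can succeed even when the two questions share no words.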

Perhaps the most important reason is that these vectors can be used as high-quality feature inputs to downstream models. NLP models such as LSTMs or CNNs require inputs in the form of numerical vectors, which typically means translating features such as vocabulary and parts of speech into numerical representations.

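# Implementing the BERT Embedding Algorithm
To implement BERT embeddings, we need to install the PyTorch library along with the `pytorch_pretrained_bert` package imported below. The PyTorch interface is preferred here because it strikes a good balance between high-level APIs and low-level TensorFlow code.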

In [1]:
# Importing necessary packages to get started:
import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel, BertForMaskedLM
import matplotlib.pyplot as plt
%matplotlib inline

# Load the pre-trained tokenizer that matches the uncased base model
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

Because BERT is a pre-trained model, it expects input data in a specific format:
- A special token, `[SEP]`, to mark the end of a sentence or the separation between two sentences.
- A special token, `[CLS]`, at the start of the text. This token is used for classification tasks, but BERT expects it regardless of your application.

In [4]:
string = 'This is the Sample statement for BERT word Embedding Algorithm'
marked_string = '[CLS] ' + string + ' [SEP]'
print(marked_string)

[CLS] This is the Sample statement for BERT word Embedding Algorithm [SEP]
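
The same marking step can be wrapped in a small helper that also covers a sentence pair. This is only a sketch: `add_special_tokens` is a hypothetical name introduced here, not a function of the library, and the pair layout assumed is `[CLS] A [SEP] B [SEP]`.

```python
def add_special_tokens(sentence_a, sentence_b=None):
    """Wrap one sentence, or a sentence pair, in BERT's special tokens."""
    marked = '[CLS] ' + sentence_a + ' [SEP]'
    if sentence_b is not None:
        # A second sentence follows the first [SEP] and gets its own [SEP].
        marked += ' ' + sentence_b + ' [SEP]'
    return marked

print(add_special_tokens('This is the Sample statement'))
print(add_special_tokens('Who released BERT?', 'Google released BERT.'))
```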


## Tokenization
BERT provides its own tokenizer, which we imported above. Let's see how it handles the sample text.

In [5]:
tokenized_string = tokenizer.tokenize(marked_string)
print(tokenized_string)

['[CLS]', 'this', 'is', 'the', 'sample', 'statement', 'for', 'bert', 'word', 'em', '##bed', '##ding', 'algorithm', '[SEP]']


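The original text has been split into smaller subwords and characters. The two hash signs that precede some of these subwords are how the tokenizer indicates that the subword or character is part of a larger word and is preceded by another subword. For example, "Embedding" was split into `em`, `##bed`, and `##ding`. The text has also been lowercased, because we loaded the uncased model.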
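
To illustrate the hash-sign convention, the sketch below glues the pieces from the output above back into whole words. `merge_wordpieces` is a hypothetical helper written for this example, not something the tokenizer provides.

```python
def merge_wordpieces(tokens):
    """Rejoin '##'-prefixed pieces with the token that precedes them."""
    words = []
    for token in tokens:
        if token.startswith('##') and words:
            # Continuation piece: strip the marker and append to the last word.
            words[-1] += token[2:]
        else:
            # A new word (or a special token such as [CLS] / [SEP]).
            words.append(token)
    return words

tokens = ['[CLS]', 'this', 'is', 'the', 'sample', 'statement', 'for', 'bert',
          'word', 'em', '##bed', '##ding', 'algorithm', '[SEP]']
print(merge_wordpieces(tokens))
```

Here `em`, `##bed`, and `##ding` collapse back into `embedding`, while tokens without the marker are left unchanged.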

In [6]:
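# Map each token to its integer index in BERT's vocabulary
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_string)

# Print each token next to its vocabulary index
for tup in zip(tokenized_string, indexed_tokens):
    print(tup)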

('[CLS]', 101)
('this', 2023)
('is', 2003)
('the', 1996)
('sample', 7099)
('statement', 4861)
('for', 2005)
('bert', 14324)
('word', 2773)
('em', 7861)
('##bed', 8270)
('##ding', 4667)
('algorithm', 9896)
('[SEP]', 102)
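
Conceptually, this conversion is a lookup in a fixed vocabulary. The sketch below mimics it with a plain dictionary built from the pairs printed above. It is a hand-copied slice for illustration, not the real vocabulary file, and the names `vocab_slice` and `id_to_token` are introduced here.

```python
# A hand-copied slice of the vocabulary, taken from the printed pairs.
vocab_slice = {
    '[CLS]': 101, 'this': 2023, 'is': 2003, 'the': 1996, 'sample': 7099,
    'statement': 4861, 'for': 2005, 'bert': 14324, 'word': 2773, 'em': 7861,
    '##bed': 8270, '##ding': 4667, 'algorithm': 9896, '[SEP]': 102,
}

# Invert the mapping so IDs can be turned back into tokens.
id_to_token = {token_id: token for token, token_id in vocab_slice.items()}

tokens = ['[CLS]', 'this', 'is', 'the', 'word', '[SEP]']
ids = [vocab_slice[token] for token in tokens]   # token -> ID lookup
print(ids)
print([id_to_token[i] for i in ids])             # ID -> token lookup
```

Because the mapping is one-to-one, the IDs can always be converted back to the same tokens. The tokenizer imported earlier should offer a matching `convert_ids_to_tokens` method for the full vocabulary.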


This is how we prepare text for the BERT model: the token IDs are the input from which BERT produces word embeddings for any NLP task.