**Initialization**
- I put these three lines of code at the top of every notebook. The first two reload imported modules automatically, which prevents problems when re-running the same project. The third renders visualizations inline within the notebook.

In [1]:
#@ INITIALIZATION: 
%reload_ext autoreload
%autoreload 2
%matplotlib inline

**Downloading Libraries and Dependencies**
- I install and import all the libraries and dependencies required for the project in a single cell.

In [10]:
#@ IMPORTING MODULES: UNCOMMENT BELOW:
# !pip install transformers
from transformers import pipeline
from transformers import AutoTokenizer
from transformers import AdamW, BertForSequenceClassification
from transformers import TFAutoModelForSequenceClassification
import tensorflow as tf

from tqdm import tqdm, trange
import pandas as pd
import os
import numpy as np
import matplotlib.pyplot as plt

#@ IGNORING WARNINGS: 
import warnings
warnings.filterwarnings("ignore")

**Transformers**
- Like humans, transformers acquire language understanding from a limited number of tasks. They detect connections between sequences through transduction and then generalize them through induction.

**Corpus of Linguistic Acceptability**
- The Corpus of Linguistic Acceptability (CoLA), a GLUE task, evaluates whether an NLP model can judge the linguistic acceptability of a sentence. The model is expected to classify each sentence as acceptable (label 1) or unacceptable (label 0).

In [4]:
#@ LOADING THE DATASET:
PATH = "/content/drive/MyDrive/Data/in_domain_train.tsv"                            # Path to dataset. 
df = pd.read_csv(PATH, delimiter="\t", header=None,
                 names=["sentence_source", "label", "label_notes", "sentence"])     # Reading the dataset.
df.shape                                                                            # Inspecting dataset.

(8551, 4)
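
To make the four-column layout concrete, here is a minimal sketch that parses two rows in the same tab-separated format and counts the labels. The rows and the name `SAMPLE_TSV` are made up for illustration and are not taken from the dataset. The sketch uses only the standard library rather than pandas, so it runs without the Drive file.

```python
import csv
import io
from collections import Counter

# Two made-up rows in the CoLA column layout (assumption: illustrative only).
SAMPLE_TSV = (
    "src1\t1\t\tThe cat sat on the mat.\n"
    "src1\t0\t*\tCat the mat on sat the.\n"
)

# Same column names as in the read_csv call above.
COLUMNS = ["sentence_source", "label", "label_notes", "sentence"]

reader = csv.DictReader(io.StringIO(SAMPLE_TSV), fieldnames=COLUMNS,
                        delimiter="\t", quoting=csv.QUOTE_NONE)
rows = list(reader)

# Count how many sentences carry each label (1 = acceptable, 0 = unacceptable).
label_counts = Counter(int(row["label"]) for row in rows)
print(len(rows), dict(label_counts))
```

On the real DataFrame, `df["label"].value_counts()` would give the same kind of summary.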

In [6]:
#@ LOADING PRETRAINED BERT MODEL:
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", 
                                                      num_labels=2)                 # Initializing pretrained model.

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

**Stanford Sentiment TreeBank**
- **SST-2**, a GLUE task, contains sentences from movie reviews labeled as positive or negative sentiment.

In [9]:
#@ SST-2 BINARY CLASSIFICATION:
nlp = pipeline("sentiment-analysis")                                    # Initialization.
print(nlp("If you sometimes like to go to the movies to have fun, "
          "Wasabi is a good place to start."))                          # Inspecting sentiment.

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)


[{'label': 'POSITIVE', 'score': 0.9998257756233215}]
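
The pipeline returns a list with one dictionary per input, each holding a `label` and a `score`. As a minimal sketch of consuming that structure, the hypothetical helper `signed_score` below (not part of transformers) maps a result to a value in [-1, 1]. The input is the output printed above, hard-coded so that the example runs without downloading a model.

```python
def signed_score(result):
    """Map one pipeline result dict to [-1, 1]: positive for POSITIVE, negative otherwise."""
    sign = 1.0 if result["label"] == "POSITIVE" else -1.0
    return sign * result["score"]

# Output copied from the cell above.
results = [{"label": "POSITIVE", "score": 0.9998257756233215}]

scores = [signed_score(r) for r in results]
print(scores)
```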


**Microsoft Research Paraphrase Corpus**
- The MRPC, a GLUE task, contains pairs of sentences extracted from news sources on the web. Each pair has been annotated by a human to indicate whether the sentences are equivalent based on two closely related properties: paraphrase equivalence and semantic equivalence.

In [12]:
#@ SEQUENCE OR PARAPHRASE CLASSIFICATION:
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased-finetuned-mrpc")                     # Initializing pretrained tokenizer. 
model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mrpc")  # Initializing pretrained model. 
classes = ["not paraphrase", "is paraphrase"]                                                   # Initialization.
sequence_A = "The DVD-CCA then appealed to the state Supreme Court."                            # Initialization. 
sequence_B = "The DVD CCA appealed that decision to the U.S. Supreme Court."                    # Initialization. 
paraphrase = tokenizer.encode_plus(sequence_A, sequence_B, return_tensors="tf")                 # Encoding the sentence pair. 
paraphrase_classification_logits = model(paraphrase)[0]                                         # Implementation of model. 
paraphrase_results = tf.nn.softmax(paraphrase_classification_logits, axis=1).numpy()[0]         # Converting logits to probabilities. 
for i in range(len(classes)):
    print(f"{classes[i]}: {round(paraphrase_results[i]*100)}%")

Some layers from the model checkpoint at bert-base-cased-finetuned-mrpc were not used when initializing TFBertForSequenceClassification: ['dropout_183']
- This IS expected if you are initializing TFBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertForSequenceClassification were initialized from the model checkpoint at bert-base-cased-finetuned-mrpc.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForSequenceClassification for predictions without further training.


not paraphrase: 8%
is paraphrase: 92%
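
The percentages come from applying softmax to the two logits the model returns. The following is a minimal pure-Python sketch of that step. The logit values are hypothetical, chosen only so that the rounded result resembles the output above; they are not the model's actual logits.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)                            # Subtract the max to avoid overflow.
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

classes = ["not paraphrase", "is paraphrase"]
logits = [-1.2, 1.2]  # Hypothetical values, not the model's actual logits.

probs = softmax(logits)
for name, p in zip(classes, probs):
    print(f"{name}: {round(p * 100)}%")
```

`tf.nn.softmax(..., axis=1)` does the same computation row by row over the batch dimension.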


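**Winograd Schemas**
- A Winograd schema tests whether a model can resolve an ambiguous pronoun. Translating the sentence into French forces the model to pick a gender for "it", which reveals whether it linked the pronoun to the car or to the garage.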

In [15]:
#@ WINOGRAD SCHEMAS:
translator = pipeline("translation_en_to_fr")                               # Initialization.
translator("The car could not go in the garage because it was too big.", 
           max_length=40)                                                   # Initializing translation.

No model was supplied, defaulted to t5-base (https://huggingface.co/t5-base)


[{'translation_text': "La voiture ne pouvait pas aller dans le garage parce qu'elle était trop grosse."}]
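
In French, "voiture" is feminine and "garage" is masculine, so the pronoun in the translation shows which noun the model chose as the referent of "it". Below is a minimal sketch of that check, assuming the translation printed above. The mapping `PRONOUN_TO_NOUN` and the regular expression are illustrative choices for this one sentence, not a general coreference test.

```python
import re

# Translation copied from the output above.
translation = ("La voiture ne pouvait pas aller dans le garage "
               "parce qu'elle était trop grosse.")

# Grammatical gender of the two candidate antecedents in French.
PRONOUN_TO_NOUN = {
    "elle": "la voiture (the car)",
    "il": "le garage (the garage)",
}

# Find the subject pronoun that follows "parce qu'".
match = re.search(r"parce qu'(elle|il)\b", translation)
antecedent = PRONOUN_TO_NOUN[match.group(1)] if match else None
print(antecedent)
```

The feminine pronoun "elle" (and the agreeing adjective "grosse") indicates that the model attached "it" to the car, which is the correct reading here.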