#Transformers
The Transformer is an NLP architecture designed to solve sequence-to-sequence tasks while handling long-range dependencies with ease. It was proposed in the paper [Attention is All You Need](https://arxiv.org/abs/1706.03762).

NLP-focused startup Hugging Face recently released a major update to their popular “PyTorch Transformers” library, which establishes compatibility between PyTorch and TensorFlow 2.0, enabling users to easily move from one framework to another during the life of a model for training and evaluation purposes.

The Transformers package contains over 30 pre-trained models in more than 100 languages, along with eight major architectures for natural language understanding (NLU) and natural language generation (NLG): **BERT (from Google)**, **GPT (from OpenAI)**, **GPT-2 (from OpenAI)**, **Transformer-XL (from Google/CMU)**, **XLNet (from Google/CMU)**, **XLM (from Facebook)**, **RoBERTa (from Facebook)**, and **DistilBERT (from Hugging Face)**.

The Transformers library no longer requires PyTorch to load models, is capable of training SOTA models in only three lines of code, and can pre-process a dataset with fewer than 10 lines of code. Sharing trained models also lowers computation costs and carbon emissions.

Installation (You don't explicitly need PyTorch)

In [12]:
!pip install transformers 

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


#Getting started on a task with a pipeline

The easiest way to use a pre-trained model on a given task is to use pipeline() 🤗

Pipelines encapsulate the overall process of every NLP task:

Tokenization: Split the initial input into multiple sub-entities with … properties (i.e., tokens)

Inference: Map every token into a more meaningful representation

Decoding: Use the above representation to generate and/or extract the final output for the underlying task
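
The three stages above can be sketched with a deliberately tiny stand-in for a real model. Everything in this example is an assumption made for illustration: the `TOY_LEXICON` scores, the function names, and the scoring rule are hypothetical and are not how `pipeline()` is implemented.

```python
# Toy three-stage pipeline: tokenization -> inference -> decoding.
# The lexicon and the scoring rule are made up; a real pipeline uses a trained model.

def tokenize(text):
    """Tokenization: split the raw input into lowercase word tokens."""
    stripped = (tok.strip(".,!?").lower() for tok in text.split())
    return [tok for tok in stripped if tok]

# Hypothetical "model": a hand-written lexicon mapping tokens to sentiment scores.
TOY_LEXICON = {"great": 1.0, "delicious": 1.0, "horrible": -1.0, "slowest": -1.0}

def infer(tokens):
    """Inference: map every token to a numeric representation (here, a single score)."""
    return [TOY_LEXICON.get(tok, 0.0) for tok in tokens]

def decode(scores):
    """Decoding: turn the representation into the final output of the task."""
    total = sum(scores)
    return {"label": "POSITIVE" if total >= 0 else "NEGATIVE", "score": total}

def toy_pipeline(text):
    return decode(infer(tokenize(text)))

print(toy_pipeline("The food was delicious!"))
```

A real pipeline replaces each stage with a learned component (a subword tokenizer, a neural network, and a task-specific post-processing step), but the flow of data is the same.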

In [None]:
from transformers import pipeline, set_seed

#GPT-2

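GPT-2 is a large transformer-based language model trained on a dataset of 8 million web pages; its largest version has 1.5 billion parameters. The `gpt2` checkpoint loaded below is the smallest released version, with roughly 124 million parameters. GPT-2 is trained with a simple objective: predict the next word, given all of the previous words within some text.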

Since the goal of GPT-2 is to make predictions, only the decoder mechanism is used. So GPT-2 is just transformer decoders stacked above each other.
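
To make the "predict the next word" objective concrete, here is a minimal sketch of greedy autoregressive generation. The `TOY_NEXT` table and its probabilities are invented for illustration. Unlike GPT-2, which conditions on all previous tokens, this toy looks only at the last token.

```python
# Hypothetical next-token table: for each token, candidate continuations with made-up probabilities.
TOY_NEXT = {
    "i": {"used": 0.6, "read": 0.4},
    "used": {"to": 0.9, "books": 0.1},
    "to": {"read": 0.7, "go": 0.3},
    "read": {"books": 0.8, "<eos>": 0.2},
    "books": {"<eos>": 1.0},
}

def generate(prompt_tokens, max_length=10):
    """Greedy decoding: repeatedly append the most probable next token."""
    tokens = list(prompt_tokens)
    while len(tokens) < max_length:
        # Unknown tokens fall back to ending the sequence.
        candidates = TOY_NEXT.get(tokens[-1], {"<eos>": 1.0})
        next_token = max(candidates, key=candidates.get)
        if next_token == "<eos>":
            break
        tokens.append(next_token)
    return tokens

print(generate(["i", "used"]))
```

The text-generation pipeline used below samples from the predicted distribution instead of always taking the most probable token, which is why it can return several different continuations for one prompt.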

GPT-2 displays a broad set of capabilities, including the ability to generate conditional synthetic text samples of unprecedented quality, where the model is comfortable with large input and can generate lengthy output.

In [33]:
import warnings
warnings.filterwarnings("ignore")

# Load the smallest GPT-2 checkpoint and fix the seed so that sampling is reproducible
generator = pipeline('text-generation', model='gpt2')
set_seed(42)
gen = generator("I used to read ", max_length=70, num_return_sequences=7)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In [32]:
for i, sample in enumerate(gen, start=1):
  print(f"[{i}]", sample["generated_text"], '\n')

[1] letter to the director  of policy/policy communications for both the U.S. Attorney and the U.S. Senators for Arizona.
As you can see, we had a very successful campaign. It took us out of the 4th Circuit in 2011 for them to go out and shut down the entire website. Now, we're actually 

[2] letter to the director "

So while we're working on the next script for 'The Bourne Identity' you've got three of your 'top-notch cast members and their personalities.'

[Laughs] There's only one of our cast members. He's doing an episode of the soap opera 'Stinkin' With 

[3] letter to the director  and send an e-mail t t o your name  if anyone needs it. When you receive your request the letter must include the name of your account holder and the date the e-mail was sent, the location of your account, the reason, and the email address.

If a photo 

[4] letter to the director ") 

[5] letter to the director  and we're looking forward to hearing from you. 

[6] letter to the director  (to send y

Sentiment analysis

In [13]:
# Allocate a pipeline for sentiment-analysis
classifier = pipeline('sentiment-analysis')


No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [14]:
classifier('The secret of getting ahead is getting started.')

[{'label': 'POSITIVE', 'score': 0.9970657229423523}]

In [None]:
# Allocate a pipeline for question-answering
question_answerer = pipeline('question-answering')


No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [None]:
question_answerer({
    'question': "What is Newton's third law of motion?",
    'context': "Newton's third law of motion states that, 'For every action there is equal and opposite reaction'"})

{'score': 0.38515737652778625,
 'start': 42,
 'end': 96,
 'answer': "'For every action there is equal and opposite reaction"}
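
The `start` and `end` fields in the output above are character offsets into the context string, so the answer can be recovered by slicing. The snippet below simply re-uses the values printed above and needs no model.

```python
context = "Newton's third law of motion states that, 'For every action there is equal and opposite reaction'"

# Values copied from the pipeline output shown above.
result = {
    "score": 0.38515737652778625,
    "start": 42,
    "end": 96,
    "answer": "'For every action there is equal and opposite reaction",
}

# Slicing the context with the returned offsets reproduces the extracted answer.
span = context[result["start"]:result["end"]]
print(span)
print(span == result["answer"])
```

Because the model extracts a span rather than generating text, the answer is always a substring of the context. Here it stops one character before the closing quote.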

In [15]:
nlp = pipeline("question-answering")
context = r"""
Microsoft was founded by Bill Gates and Paul Allen in 1975.
The property of being prime (or not) is called primality.
A simple but slow method of verifying the primality of a given number n is known as trial division.
It consists of testing whether n is a multiple of any integer between 2 and itself.
Algorithms much more efficient than trial division have been devised to test the primality of large numbers.
These include the Miller-Rabin primality test, which is fast but has a small probability of error, and the AKS primality test, which always produces the correct answer in polynomial time but is too slow to be practical.
Particularly fast methods are available for numbers of special forms, such as Mersenne numbers.
As of January 2016, the largest known prime number has 22,338,618 decimal digits.
"""

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [16]:
#Question 1
result = nlp(question="What is a simple method to verify primality?", context=context)

print(f"Answer 1: '{result['answer']}'")

Answer 1: 'trial division'


In [17]:
nlp = pipeline("question-answering")
# Use a separate variable so the earlier `context` stays available for the questions that follow
context_rit = r"""
Rajiv Gandhi Institute of Technology (RIT) is one of the nine engineering colleges fully
owned and run by the Government of Kerala. It offers Bachelor of Technology (B.Tech) in six disciplines,
 Master of Technology (M. Tech) in nine specializations,Bachelor of Architecture (B.Arch), Master of Computer Applications (MCA) and
 Research centre for various PhD Programmes. The college is affiliated to APJ Abdul Kalam Technological University and
 recognized by AICTE. Over its 30 years of existence, RIT has built a strong reputation as one of Kerala's leading technical institutes.
 """

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [28]:
result = nlp(question="What is KTU?", context=context_rit)

print(f"Answer 1: '{result['answer']}'")

Answer 1: 'Rajiv Gandhi Institute of Technology'


In [None]:
#Question 2
result = nlp(question="When did Bill gates founded Microsoft?", context=context)

print(f"Answer 2: '{result['answer']}'")

Answer 2: '1975'


In [None]:
#Question 3
result = nlp(question="What is the main drawback of Miller-Rabin primality test", context=context)

print(f"Answer 3: '{result['answer']}'")

Answer 3: 'has a small probability of error'


In [None]:
#Question 4

result = nlp(question="Who is Bill Gates?", context=context)


In [None]:
context = r'''Ukrainian climate activist Svitlana Romanko joins us after she was suspended from the U.N. climate conference 
in Sharm el-Sheikh, Egypt, when she accused Russian officials of war crimes and genocide at an event on Wednesday. 
Romanko is the founder and director of Razom We Stand, an organization demanding a total permanent embargo on Russian oil and gas.
 “It has been very clear that fossil fuels fund dictatorships all over the world,” 
says Romanko, who has since left Egypt for her own safety.
 “We wanted to use freedom of speaking and freedom of attending public gathering 
 to confront people who came from the country which is in open war and … destroying our people.” '''

In [None]:
result = nlp(question="why did Ukranian climate activist suspended from the U. N.?", context=context)
print(f"Answer : '{result['answer']}'")

Answer : 'she accused Russian officials of war crimes and genocide'


In [None]:
result = nlp(question="Who left Egypt? why", context=context)
print(f"Answer : '{result['answer']}'")

Answer : 'for her own safety'


In [None]:
result = nlp(question="What is very evident", context=context)
print(f"Answer : '{result['answer']}'")

Answer : 'fossil fuels fund dictatorships all over the world'


#BERT
BERT’s goal is to generate a language model and hence only the encoder mechanism is used. So BERT is just transformer encoders stacked above each other.

Text prediction

In [None]:
unmasker = pipeline('fill-mask', model='bert-base-cased')


Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [None]:
unmasker("Which [MASK] is the best in the world")

[{'score': 0.1369323432445526,
  'token': 1141,
  'token_str': 'one',
  'sequence': 'Which one is the best in the world'},
 {'score': 0.13646407425403595,
  'token': 17791,
  'token_str': '##ever',
  'sequence': 'Whichever is the best in the world'},
 {'score': 0.036026548594236374,
  'token': 1610,
  'token_str': 'car',
  'sequence': 'Which car is the best in the world'},
 {'score': 0.023104358464479446,
  'token': 1402,
  'token_str': 'house',
  'sequence': 'Which house is the best in the world'},
 {'score': 0.01855037920176983,
  'token': 1282,
  'token_str': 'place',
  'sequence': 'Which place is the best in the world'}]
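
The second prediction above has `token_str` equal to `'##ever'`. In BERT's WordPiece vocabulary the `##` prefix marks a piece that continues the previous token, which is why the resulting sequence reads "Whichever". A minimal sketch of that merging rule (the helper name `merge_wordpieces` is hypothetical, not a library function):

```python
def merge_wordpieces(tokens):
    """Join WordPiece tokens into text: pieces starting with '##' attach to the previous word."""
    words = []
    for token in tokens:
        if token.startswith("##") and words:
            words[-1] += token[2:]
        else:
            words.append(token)
    return " ".join(words)

tokens = ["Which", "##ever", "is", "the", "best", "in", "the", "world"]
print(merge_wordpieces(tokens))
```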

Text Summarization

In [None]:
#Summarization is currently supported by Bart and T5.
summarizer = pipeline("summarization")

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [None]:
ARTICLE = """And truly, while our treatment in expulsion and the event at COP27 that tried to give some legitimacy to the murderous Russian regime is appalling, we also think of other activists today, earlier, and also activists in Uganda and Tanzania, who were multiple times detained for their EACOP opposition, and also Mozambique and across African countries and in many other countries where dictatorships are alive and well, and they all are locked in prison for speaking out. And people must have the right to stand up and speak out for freedom, democracy and climate justice. Our thoughts now, and my thoughts personally, go today to those activists who can’t leave their insecure spaces, who are imprisoned, and their families and friends.

And I would also like to add that the victory over one petro-dictator can spell freedom from petrocolonialism for the entire planet if we seize the moment to move away from fossil fuels. And it has been very clear that fossil fuels fund dictatorships all over the world, in many countries. They also fund destruction in my own country, because while I’m just speaking, 40% or even more of citizens of my country, Ukrainians, lost access to electricity, to heating of their homes, and it’s minus-1 degrees Celsius right now outside.

And I am with my team, as well, who works from Ukraine, who supported climate talks greatly. And I just can say that we have to stay united. We have to think of those of us who lost access to their basic rights earlier, and we have to do everything to fight dictatorships across the world, because as soon as we can end the dictatorships in countries like Russia and many other countries, the African continent, Egypt included, the sooner we all can live in peace. And one of the ways that we expected from this climate summit to happen was a phasing out of oil, gas and coal, strong language in the statement, which will help us and enable this fossil fuel revenue-empowered regimes, undemocratic, autocratic, to stop existing and to stop waging wars and conflicts and imprisoning activists."""

In [None]:
summary=summarizer(ARTICLE, max_length=130, min_length=30, do_sample=False)[0]

print(summary['summary_text'])

 The victory over one petro-dictator can spell freedom from petrocolonialism for the entire planet if we seize the moment to move away from fossil fuels . And people must have the right to stand up and speak out for freedom, democracy and climate justice .


English to German and French translation

In [None]:
# English to German
translator_ger = pipeline("translation_en_to_de")
print("German: ",translator_ger("Joe Biden became the 46th president of U.S.A.", max_length=40)[0]['translation_text'])

# English to French
translator_fr = pipeline('translation_en_to_fr')
print("French: ",translator_fr("Joe Biden became the 46th president of U.S.A",  max_length=40)[0]['translation_text'])
 

No model was supplied, defaulted to t5-base and revision 686f1db (https://huggingface.co/t5-base).
Using a pipeline without specifying a model name and revision in production is not recommended.


For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
- Be aware that you SHOULD NOT rely on t5-base automatically truncating your input to 512 when padding/encoding.
- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.
No model was supplied, defaulted to t5-base and revision 686f1db (https://huggingface.co/t5-base).
Using a pipeline without specifying a model name and revision in production is not recommended.


German:  Joe Biden wurde der 46. Präsident der USA.
French:  Joe Biden est devenu le 46e président des États-Unis


Conversation (Chatbot)

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")



In [None]:
# Let's chat for 5 turns
for step in range(5):
   # encode the new user input, add the eos_token and return a PyTorch tensor
   new_user_input_ids = tokenizer.encode(input(">> User:") + tokenizer.eos_token, return_tensors='pt')

   # append the new user input tokens to the chat history
   bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1) if step > 0 else new_user_input_ids

   # generate a response while limiting the total chat history to 1000 tokens
   chat_history_ids = model.generate(bot_input_ids, max_length=1000, pad_token_id=tokenizer.eos_token_id)

   # pretty print the last output tokens from the bot
   print("DialoGPT: {}".format(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)))
   

>> User:hello


A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


DialoGPT: Hello! :D
>> User:long time no see
DialoGPT: Long time no see
>> User:who are you
DialoGPT: I am me
>> User:what is your name
DialoGPT: I am me
>> User:what are you
DialoGPT: I am me


Named Entity Recognition

In [None]:
from transformers import pipeline, set_seed
nlp_token_class = pipeline('ner')


No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [None]:
nlp_token_class('Ronaldo was born in 1985, he plays for Juventus and Portugal. ')

[{'entity': 'I-PER',
  'score': 0.9978648,
  'index': 1,
  'word': 'Ronald',
  'start': 0,
  'end': 6},
 {'entity': 'I-PER',
  'score': 0.9990381,
  'index': 2,
  'word': '##o',
  'start': 6,
  'end': 7},
 {'entity': 'I-ORG',
  'score': 0.9977496,
  'index': 11,
  'word': 'Juventus',
  'start': 39,
  'end': 47},
 {'entity': 'I-LOC',
  'score': 0.9991247,
  'index': 13,
  'word': 'Portugal',
  'start': 52,
  'end': 60}]
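
In the output above, "Ronaldo" is split into two subword tokens (`Ronald` and `##o`) that carry the same entity tag. One way to get one entry per entity is to merge tokens that share a tag and touch each other in the text, using the `start` and `end` character offsets. The helper below is a hand-written sketch under that assumption. It does not merge multi-word entities separated by a space.

```python
sentence = "Ronaldo was born in 1985, he plays for Juventus and Portugal. "

# Fields copied from the pipeline output above (scores and indices omitted).
tokens = [
    {"entity": "I-PER", "word": "Ronald", "start": 0, "end": 6},
    {"entity": "I-PER", "word": "##o", "start": 6, "end": 7},
    {"entity": "I-ORG", "word": "Juventus", "start": 39, "end": 47},
    {"entity": "I-LOC", "word": "Portugal", "start": 52, "end": 60},
]

def group_entities(sentence, tokens):
    """Merge adjacent tokens with the same tag and read the text back from the sentence."""
    groups = []
    for tok in tokens:
        last = groups[-1] if groups else None
        if last and last["entity"] == tok["entity"] and last["end"] == tok["start"]:
            last["end"] = tok["end"]  # extend the current entity span
        else:
            groups.append({"entity": tok["entity"], "start": tok["start"], "end": tok["end"]})
    return [(sentence[g["start"]:g["end"]], g["entity"]) for g in groups]

print(group_entities(sentence, tokens))
```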

Feature Extraction

In [None]:
import numpy as np
nlp_features = pipeline('feature-extraction')
output = nlp_features("Deep learning is a branch of Machine learning")
np.array(output).shape # (Samples, Tokens, Vector Size)

No model was supplied, defaulted to distilbert-base-cased and revision 935ac13 (https://huggingface.co/distilbert-base-cased).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at distilbert-base-cased were not used when initializing DistilBertModel: ['vocab_transform.weight', 'vocab_projector.weight', 'vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_projector.bias']
- This IS expected if you are initializing DistilBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


(1, 10, 768)
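
The output has one 768-dimensional vector per token. A common way to turn this into a single sentence vector is to average over the token axis (mean pooling). The sketch below uses random numbers with the same shape as a stand-in for the real pipeline output, so it runs without downloading a model.

```python
import numpy as np

# Stand-in for the pipeline output: random values with the shape printed above,
# (Samples, Tokens, Vector Size) = (1, 10, 768).
rng = np.random.default_rng(0)
output = rng.normal(size=(1, 10, 768))

# Mean pooling: average the token vectors to get one vector per sample.
sentence_vectors = output.mean(axis=1)
print(sentence_vectors.shape)
```

Mean pooling is only one option; which pooling works best depends on the model and the downstream task.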

Zero-shot Learning

Zero-shot learning aims to solve a task without seeing any examples of that task during training. For instance, recognizing an object in an image when no images of that object appeared in the training data is a zero-shot learning task.

In [None]:
classifier_zsl = pipeline("zero-shot-classification")

sequence_to_classify = "Bill gates founded a company called Microsoft in the year 1975"
candidate_labels = ["Europe", "Sports",'Leadership','business', "politics","startup"]
classifier_zsl(sequence_to_classify, candidate_labels)

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'sequence': 'Bill gates founded a company called Microsoft in the year 1975',
 'labels': ['business',
  'startup',
  'Leadership',
  'Europe',
  'Sports',
  'politics'],
 'scores': [0.614478349685669,
  0.1874542385339737,
  0.1822790801525116,
  0.006684561260044575,
  0.0063185603357851505,
  0.0027852877974510193]}
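
The `labels` and `scores` lists in the output above are aligned and sorted from most to least likely, so they can be paired up directly. The snippet re-uses the printed values. The 0.1 cut-off is an arbitrary threshold chosen for illustration.

```python
# Values copied from the zero-shot output above.
result = {
    "labels": ["business", "startup", "Leadership", "Europe", "Sports", "politics"],
    "scores": [0.614478349685669, 0.1874542385339737, 0.1822790801525116,
               0.006684561260044575, 0.0063185603357851505, 0.0027852877974510193],
}

ranked = list(zip(result["labels"], result["scores"]))
best_label, best_score = ranked[0]

# Keep only labels above an arbitrary threshold.
plausible = [label for label, score in ranked if score > 0.1]

print(best_label, round(best_score, 3))
print(plausible)
print(round(sum(result["scores"]), 3))
```

In this output the scores add up to roughly 1, so they can be read as a probability distribution over the candidate labels.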

Using transformers in Widgets

In [None]:
import ipywidgets as widgets
nlp_qaA = pipeline('question-answering')

context = widgets.Textarea(
   value='Einstein is famous for the general theory of relativity',
   placeholder='Enter something',
   description='Context:',
   disabled=False
)

query = widgets.Text(
   value='Why is Einstein famous for ?',
   placeholder='Enter something',
   description='Question:',
   disabled=False
)

def forward(_):
   if len(context.value) > 0 and len(query.value) > 0:
       output = nlp_qaA(question=query.value, context=context.value)           
       print(output)

query.on_submit(forward)
display(context, query)

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


Textarea(value='Einstein is famous for the general theory of relativity', description='Context:', placeholder=…

Text(value='Why is Einstein famous for ?', description='Question:', placeholder='Enter something')

#Want a lighter model?

One of the main concerns when using Transformer-based models is the computational power they require. Throughout this article we use the BERT model because it can run on common machines, but that’s not the case for all models.

For example, Google released T5, an encoder/decoder architecture based on the Transformer and available in transformers, with up to 11 billion parameters. Microsoft also entered the game with Turing-NLG, which uses 17 billion parameters. Models of this size need tens of gigabytes just to store the weights and a tremendous compute infrastructure to run, which makes them impractical for most users.

With the goal of making Transformer-based NLP accessible to everyone, Hugging Face developed models that take advantage of a training process called distillation, which drastically reduces the resources needed to run such models with only a small drop in performance.
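
The core idea of distillation is to train a small "student" model to match the output distribution of a large "teacher" model. The sketch below shows the generic soft-target loss (KL divergence between temperature-softened distributions) in plain Python. The logits and the temperature are made-up values, and this is not claimed to be the exact objective used to train DistilBERT.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature that flattens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between the softened teacher and student distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher = [4.0, 1.0, 0.5]        # hypothetical teacher logits
close_student = [3.8, 1.1, 0.4]  # student that nearly agrees with the teacher
far_student = [0.5, 1.0, 4.0]    # student that disagrees

print(distillation_loss(teacher, close_student))
print(distillation_loss(teacher, far_student))
```

The loss is close to zero when the student reproduces the teacher's distribution and grows as the two diverge, which is the signal used to train the smaller model.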

Classifying text with DistilBERT and Tensorflow

In [None]:
import tensorflow as tf
from tensorflow.keras import activations, optimizers, losses
from tensorflow.keras.optimizers import Adam
from transformers import DistilBertTokenizer, TFDistilBertForSequenceClassification
import pickle

Problem statement

Let's consider a small corpus of 10 Yelp reviews: 5 positive (class 1) and 5 negative (class 0). BERT (and its variants like DistilBERT) can be a great tool to use when you have a shortage of training data. That said, don't expect great results with just 10 reviews! Replacing x and y with your own dataset is recommended 🙂

Tasks: 
1.   Preprocessing the data
2.   Fine-tuning the model
3.   Testing the model
4.   Using the fine-tuned model to predict new samples
5.   Saving and loading the model for future use


In [None]:
x = [
     'Great customer service! The food was delicious! Definitely a come again.',
     'The VEGAN options are super fire!!! And the plates come in big portions. Very pleased with this spot, I\'ll definitely be ordering again',
     'Come on, this place is family owned and operated, they are super friendly, the tacos are bomb.',
     'This is such a great restaurant. Multiple times during days that we don\'t want to cook, we\'ve done takeout here and it\'s been amazing. It\'s fast and delicious.',
     'Staff is really nice. Food is way better than average. Good cost benefit.',
     'pricing for this, while relatively inexpensive for a Las Vegas attraction, is completely over the top.',
     'At such a *fine* institution, I find the lack of knowledge and respect for the art appalling',
     'If I could give one star I would...I walked out before my food arrived the customer service was horrible!',
     'Wow the slowest drive thru I\'ve ever been at WOWWWW. Horrible I won\'t be coming back here ever again',
     'Service: 1 out of 5 stars. They will mess up your order, not have it ready after 30 mins calling them before. Worst ran family business Ive ever seen.'
]

In [None]:
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]



###Preprocessing the data


In [None]:

MODEL_NAME = 'distilbert-base-uncased'
MAX_LEN = 20

review = x[0]

tkzr = DistilBertTokenizer.from_pretrained(MODEL_NAME)

inputs = tkzr(review, max_length=MAX_LEN, truncation=True, padding=True)

print(f'review: \'{review}\'')
print(f'input ids: {inputs["input_ids"]}')
print(f'attention mask: {inputs["attention_mask"]}')

review: 'Great customer service! The food was delicious! Definitely a come again.'
input ids: [101, 2307, 8013, 2326, 999, 1996, 2833, 2001, 12090, 999, 5791, 1037, 2272, 2153, 1012, 102]
attention mask: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
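
The review above produces 16 token ids, fewer than `MAX_LEN = 20`, and the attention mask is all ones: with `padding=True` the tokenizer pads only up to the longest sequence in the batch, and a single review needs no padding. The sketch below shows what padding to a fixed length would look like. The helper `pad_to_length` is hypothetical, and the padding id of 0 is an assumption about this vocabulary.

```python
MAX_LEN = 20
PAD_ID = 0  # assumption: id of the padding token in this vocabulary

# Token ids copied from the output above.
input_ids = [101, 2307, 8013, 2326, 999, 1996, 2833, 2001, 12090, 999, 5791, 1037, 2272, 2153, 1012, 102]

def pad_to_length(ids, max_len, pad_id=PAD_ID):
    """Cut ids to max_len, then right-pad; the mask is 1 for real tokens and 0 for padding.

    Simplification: a real tokenizer keeps the closing special token when truncating,
    while this slice simply cuts the list.
    """
    ids = ids[:max_len]
    n_pad = max_len - len(ids)
    return ids + [pad_id] * n_pad, [1] * len(ids) + [0] * n_pad

padded_ids, attention_mask = pad_to_length(input_ids, MAX_LEN)
print(padded_ids)
print(attention_mask)
```

The attention mask tells the model which positions hold real tokens, so the padded positions do not influence the prediction.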


Next, apply this transformation to each review in our corpus. To do this we define a function construct_encodings, which applies the tokenizer to every review and collects the results in encodings:

In [None]:
def construct_encodings(x, tkzr, max_len, truncation=True, padding=True):
    # Tokenize a list of texts, truncating to max_len and padding within the batch
    return tkzr(x, max_length=max_len, truncation=truncation, padding=padding)

encodings = construct_encodings(x, tkzr, max_len=MAX_LEN)

# The first stage of preprocessing is done! The second stage is converting our encodings
# and y (which holds the classes of the reviews) into a TensorFlow Dataset object.
# Below is a function to do this:
def construct_tfdataset(encodings, y=None):
    if y is not None:
        return tf.data.Dataset.from_tensor_slices((dict(encodings), y))
    else:
        # this case is used when making predictions on unseen samples after training
        return tf.data.Dataset.from_tensor_slices(dict(encodings))

tfdataset = construct_tfdataset(encodings, y)

The third and final preprocessing step is to create training and test sets:

In [None]:
TEST_SPLIT = 0.2
BATCH_SIZE = 2

train_size = int(len(x) * (1-TEST_SPLIT))

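# Shuffle once only: by default tf.data reshuffles on every pass, which would let
# test samples leak into the training set when take()/skip() are applied afterwards
tfdataset = tfdataset.shuffle(len(x), reshuffle_each_iteration=False)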
tfdataset_train = tfdataset.take(train_size)
tfdataset_test = tfdataset.skip(train_size)

tfdataset_train = tfdataset_train.batch(BATCH_SIZE)
tfdataset_test = tfdataset_test.batch(BATCH_SIZE)
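
With only 10 reviews, the split sizes are worth spelling out. The arithmetic below mirrors the constants used above and needs no TensorFlow.

```python
import math

n_samples = 10
TEST_SPLIT = 0.2
BATCH_SIZE = 2

train_size = int(n_samples * (1 - TEST_SPLIT))
test_size = n_samples - train_size

# Number of batches each dataset yields per pass.
train_batches = math.ceil(train_size / BATCH_SIZE)
test_batches = math.ceil(test_size / BATCH_SIZE)

print(train_size, test_size, train_batches, test_batches)
```

With just 2 test samples, the measured accuracy can only be 0, 0.5, or 1.0, so it says very little about how well the model generalizes.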

### Fine-tuning the model

In [None]:
N_EPOCHS = 2

model = TFDistilBertForSequenceClassification.from_pretrained(MODEL_NAME)
optimizer = optimizers.Adam(learning_rate=3e-5)
loss = losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])

# The dataset is already batched, so batch_size is not passed again here
model.fit(tfdataset_train, epochs=N_EPOCHS)

Some layers from the model checkpoint at distilbert-base-uncased were not used when initializing TFDistilBertForSequenceClassification: ['vocab_layer_norm', 'activation_13', 'vocab_transform', 'vocab_projector']
- This IS expected if you are initializing TFDistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFDistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some layers of TFDistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier', 'dropout_19', 'classifier']
You should probably TRAIN this model on a down-stream task to be able to use i

Epoch 1/2
Epoch 2/2


<keras.callbacks.History at 0x7f259f45ae10>

###Testing the model

Now we can use our test set to evaluate the performance of the model.

In [None]:
# The test dataset is already batched, so batch_size is not passed again here
benchmarks = model.evaluate(tfdataset_test, return_dict=True)
print(benchmarks)

{'loss': 0.5904490947723389, 'accuracy': 1.0}
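
The reported test loss of about 0.59 sits just below ln 2 ≈ 0.693, which is the cross-entropy of a two-class model that assigns equal probability to both classes. The sketch below is a plain-Python version of sparse categorical cross-entropy computed from logits (not the Keras implementation itself), with made-up logits, to show how a correct but barely confident prediction gives a loss in that range.

```python
import math

def sparse_categorical_crossentropy(logits, label):
    """Cross-entropy of one sample from raw logits: log-sum-exp minus the true-class logit."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]

# Hypothetical logits for a sample whose true class is 1.
undecided = sparse_categorical_crossentropy([0.0, 0.0], 1)   # equal probabilities
slightly_right = sparse_categorical_crossentropy([-0.1, 0.1], 1)
confident = sparse_categorical_crossentropy([-3.0, 3.0], 1)

print(undecided, slightly_right, confident)
```

An accuracy of 1.0 together with a loss this close to ln 2 therefore suggests predictions that land on the right side but with low confidence, which is plausible after two epochs on eight reviews.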


###Using the fine-tuned model to predict new samples

In [None]:
def create_predictor(model, model_name, max_len):
    tkzr = DistilBertTokenizer.from_pretrained(model_name)
    def predict_proba(text):
        x = [text]

        encodings = construct_encodings(x, tkzr, max_len=max_len)
        tfdataset = construct_tfdataset(encodings)
        tfdataset = tfdataset.batch(1)

        # model.predict returns a model-output object; the raw scores are in its logits field
        preds = model.predict(tfdataset).logits
        preds = activations.softmax(tf.convert_to_tensor(preds)).numpy()
        # probability of class 0 (negative) for the single input text
        return preds[0][0]
    
    return predict_proba

clf = create_predictor(model, MODEL_NAME, MAX_LEN)
print(clf('this restaurant has horrible food'))

###Saving and loading the model for future use

In [None]:
model.save_pretrained('./model/clf')
with open('./model/info.pkl', 'wb') as f:
    pickle.dump((MODEL_NAME, MAX_LEN), f)

In [None]:
new_model = TFDistilBertForSequenceClassification.from_pretrained('./model/clf')
with open('./model/info.pkl', 'rb') as f:
    model_name, max_len = pickle.load(f)

clf = create_predictor(new_model, model_name, max_len)
# clf returns the probability of class 0 (negative)
print('P(negative): ', clf('this restaurant has poor ambiance.'))

Some layers from the model checkpoint at ./model/clf were not used when initializing TFDistilBertForSequenceClassification: ['dropout_19']
- This IS expected if you are initializing TFDistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFDistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some layers of TFDistilBertForSequenceClassification were not initialized from the model checkpoint at ./model/clf and are newly initialized: ['dropout_39']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

In [None]:
!pip install openai

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting openai
  Downloading openai-0.25.0.tar.gz (44 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel metadata ... done
Collecting pandas-stubs>=1.1.0.11
  Downloading pandas_stubs-1.2.0.62-py3-none-any.whl (163 kB)
Building wheels for collected packages: openai
  Building wheel for openai (PEP 517) ... done
  Created wheel for openai: filename=openai-0.25.0-py3-none-any.whl size=55880 sha256=386d99cd67d048e29663ee7f8865c327644dd22706d4f217c251833915a8a0b3
  Stored in directory: /root/.cache/pip/wheels/19/de/db/e82770b480ec30fd4a6d67108744b9c52be167c04fcf4af7b5
Successfully built openai
Installing collected packages: pandas-stubs, openai
Successfully i

In [None]:
import os
import openai



In [None]:
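# Read the API key from the environment; if OPENAI_API_KEY is unset this is None and API calls fail with an authentication error
openai.api_key = os.getenv("OPENAI_API_KEY")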

In [None]:
response = openai.Completion.create(
  model="text-davinci-002",
  prompt="A two-column spreadsheet of top science fiction movies and the year of release:\n\nTitle|  Year of release",
  temperature=0.5,
  max_tokens=60,
  top_p=1.0,
  frequency_penalty=0.0,
  presence_penalty=0.0
)

AuthenticationError: ignored
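
An `AuthenticationError` like the one above is what the client raises when no valid key is available. Assuming the cause here was an unset `OPENAI_API_KEY` (the notebook does not show the full traceback), a small check before the request makes the problem visible earlier. `require_env` is a hypothetical helper, not part of the `openai` package.

```python
import os

def require_env(name):
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value

# Demonstration with a throwaway variable so the sketch runs anywhere.
os.environ["DEMO_API_KEY"] = "sk-demo"
print(require_env("DEMO_API_KEY"))
```

In the notebook this would be used as `openai.api_key = require_env("OPENAI_API_KEY")` before calling the API.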