# Working with Hugging Face Transformers in TensorFlow




## Working with pipelines
The most basic object in the 🤗 Transformers library is the pipeline. It connects a model with its necessary preprocessing and postprocessing steps, so we can pass in any text directly and get an intelligible answer.

In [1]:
from transformers import pipeline

In [2]:
# Pipeline covers preprocessing->model->post-processing
classifier = pipeline("sentiment-analysis")  # Download and cache classifier object. Default is english

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)


In [3]:
# Pass multiple sentences to the classifier.
classifier(
    ["I love Hamburg. But it's always rainy.",
    "The movie was great and the actors were bad.",
    "No man ever steps in the same river twice, for it's not the same river not the same",
    "I do not at all agree with the results of this useless classifier."]
    )

[{'label': 'NEGATIVE', 'score': 0.990067720413208},
 {'label': 'NEGATIVE', 'score': 0.99828040599823},
 {'label': 'NEGATIVE', 'score': 0.9944549202919006},
 {'label': 'NEGATIVE', 'score': 0.9997929930686951}]
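
The pipeline returns one dictionary per input sentence, in input order. A minimal sketch of pairing the sentences with their predictions and flagging the less confident ones: the scores are copied from the output above (rounded), while the `threshold` value and the helper name `pair_predictions` are illustrative assumptions, not part of the library.

```python
# Sentences and predictions copied from the cells above (scores rounded).
sentences = [
    "I love Hamburg. But it's always rainy.",
    "The movie was great and the actors were bad.",
    "No man ever steps in the same river twice, for it's not the same river not the same",
    "I do not at all agree with the results of this useless classifier.",
]
predictions = [
    {"label": "NEGATIVE", "score": 0.9901},
    {"label": "NEGATIVE", "score": 0.9983},
    {"label": "NEGATIVE", "score": 0.9945},
    {"label": "NEGATIVE", "score": 0.9998},
]


def pair_predictions(texts, preds, threshold=0.995):
    """Pair each text with its prediction and flag scores below `threshold`.

    `threshold` is an arbitrary illustrative cut-off, not a library default.
    """
    rows = []
    for text, pred in zip(texts, preds):
        rows.append(
            {
                "text": text,
                "label": pred["label"],
                "score": pred["score"],
                "needs_review": pred["score"] < threshold,
            }
        )
    return rows


rows = pair_predictions(sentences, predictions)
for row in rows:
    flag = "review" if row["needs_review"] else "ok"
    print(f"{row['label']:<8} {row['score']:.4f} [{flag}] {row['text']}")
```

Even the mixed sentences receive a confident NEGATIVE label here, which is a reminder that a high score is not the same as a correct answer.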

### What other pipelines do we have?

- feature-extraction (get the vector representation of a text)
- fill-mask
- ner (named entity recognition)
- question-answering
- sentiment-analysis
- summarization
- text-generation
- translation
- zero-shot-classification

## Zero-shot classification

We’ll start by tackling a more challenging task where we need to classify texts that haven’t been labelled. This is a common scenario in real-world projects because annotating text is usually time-consuming and requires domain expertise. For this use case, the zero-shot-classification pipeline is very powerful: it allows you to specify which labels to use for the classification, so you don’t have to rely on the labels of the pretrained model. You’ve already seen how the model can classify a sentence as positive or negative using those two labels — but it can also classify the text using any other set of labels you like.
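
One way to picture how arbitrary labels can work: as far as the library documentation describes it, the pipeline relies on a natural language inference (NLI) model, turns each candidate label into a hypothesis such as "This example is {label}.", and scores how strongly the text entails it. The sketch below only reproduces the final normalization step under that assumption. The entailment logits are made-up numbers, and `scores_from_entailment_logits` is a hypothetical helper, not a library function.

```python
import math


def scores_from_entailment_logits(labels, entailment_logits):
    """Softmax the per-label entailment logits and sort by decreasing score."""
    # Subtract the maximum for numerical stability before exponentiating.
    max_logit = max(entailment_logits)
    exps = [math.exp(x - max_logit) for x in entailment_logits]
    total = sum(exps)
    scored = [(label, e / total) for label, e in zip(labels, exps)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return {
        "labels": [label for label, _ in scored],
        "scores": [score for _, score in scored],
    }


# Hypothetical logits for one sentence; a real NLI model would produce these.
candidate_labels = ["love", "fun", "discussion", "marriage"]
made_up_logits = [3.1, 0.4, 0.0, -0.6]

result = scores_from_entailment_logits(candidate_labels, made_up_logits)
print(result["labels"])
print([round(s, 3) for s in result["scores"]])
```

The result has the same shape as the pipeline output: labels sorted by decreasing score, with scores that sum to 1.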

In [4]:
classifier = pipeline("zero-shot-classification")

No model was supplied, defaulted to facebook/bart-large-mnli (https://huggingface.co/facebook/bart-large-mnli)


In [5]:
classifier(
    ["I love you so much like I never loved you before",
    "But I think you could do the dishes and vacuum clean the flat more often, please",
    "You ruined my live.",
    "Shut up and go on the toilette.",
    "Tell us a joke because your jokes are always so funny.",
    "Denim fashio jeans are the hottest shit right now on the market."],
    candidate_labels=["love", "fun", "discussion", "marriage"]
    )

[{'sequence': 'I love you so much like I never loved you before',
  'labels': ['love', 'fun', 'discussion', 'marriage'],
  'scores': [0.8779177069664001,
   0.06078661233186722,
   0.039918605238199234,
   0.021377060562372208]},
 {'sequence': 'But I think you could do the dishes and vacuum clean the flat more often, please',
  'labels': ['discussion', 'fun', 'marriage', 'love'],
  'scores': [0.8770378232002258,
   0.10134901851415634,
   0.011419342830777168,
   0.01019375491887331]},
 {'sequence': 'You ruined my live.',
  'labels': ['discussion', 'marriage', 'fun', 'love'],
  'scores': [0.7202327251434326,
   0.13064803183078766,
   0.11257608979940414,
   0.03654312714934349]},
 {'sequence': 'Shut up and go on the toilette.',
  'labels': ['discussion', 'fun', 'marriage', 'love'],
  'scores': [0.5608827471733093,
   0.29711979627609253,
   0.11796832829713821,
   0.024029165506362915]},
 {'sequence': 'Tell us a joke because your jokes are always so funny.',
  'labels': ['fun', 'discu

In [6]:
# Returns, for each input, the labels sorted by decreasing probability together with the matching scores.

classifier(
    ["Toggling in the report is highly distracting and sometimes I inadvertently am looking at a whole year vs just a quarter.  Filters vary between reports which can lead to results that are close, but not fully-aligned.  We don't include forecast to end-of-quarter... it would be great to be able to clearly show consultants how many hrs/wk they need to hit to meet their target (based on where they are at now).  I find that I have to 'reset' the report more often than I'd like and then need to go back an re-select filters.",
    "At the current moment, i do no think the utilization report is updated to include all client facing work to count under the new 'attainable client facing' metric. I have not checked in the last week but the last time I checked, some of my attainable metric was not accurate due to Free Work budgets not being counted under client facing attainable.",
    "Overall, happy with the ease of use and access. Kudos to the team.",
    "Getting numbers into the system earlier in the year would be great...always feel like we are playing catchup throughout Q1.",
    "I need to see the consultants forecast on a rolling 13 week view, not by quarter. Eg. from today's date or this week, what is the forecast for the next 13 weeks? "],
    candidate_labels=["bug", "feature request", "kudos"]
    )

[{'sequence': "Toggling in the report is highly distracting and sometimes I inadvertently am looking at a whole year vs just a quarter.  Filters vary between reports which can lead to results that are close, but not fully-aligned.  We don't include forecast to end-of-quarter... it would be great to be able to clearly show consultants how many hrs/wk they need to hit to meet their target (based on where they are at now).  I find that I have to 'reset' the report more often than I'd like and then need to go back an re-select filters.",
  'labels': ['feature request', 'bug', 'kudos'],
  'scores': [0.5338761806488037, 0.3844722807407379, 0.08165154606103897]},
 {'sequence': "At the current moment, i do no think the utilization report is updated to include all client facing work to count under the new 'attainable client facing' metric. I have not checked in the last week but the last time I checked, some of my attainable metric was not accurate due to Free Work budgets not being counted und

## Text generation

Now let’s see how to use a pipeline to generate some text. The main idea here is that you provide a prompt and the model will auto-complete it by generating the remaining text. This is similar to the predictive text feature that is found on many phones. Text generation involves randomness, so it’s normal if you don’t get the same results as shown below.
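
The randomness comes from sampling the next token from a probability distribution instead of always taking the most likely one. A toy sketch of that idea follows; the vocabulary, the probabilities, and the helpers `sample_next_word` and `sample_many` are all illustrative assumptions. Fixing the seed makes the draws repeatable.

```python
import random

# Made-up next-word distribution for some prompt; not produced by a real model.
next_word_probs = {"data": 0.4, "images": 0.3, "SQL": 0.2, "spreadsheets": 0.1}


def sample_next_word(probs, rng):
    """Draw one word according to its probability."""
    words = list(probs)
    weights = [probs[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


def sample_many(probs, seed, n=5):
    """Draw `n` words with a dedicated, seeded random generator."""
    rng = random.Random(seed)
    return [sample_next_word(probs, rng) for _ in range(n)]


run_a = sample_many(next_word_probs, seed=42)
run_b = sample_many(next_word_probs, seed=42)
run_c = sample_many(next_word_probs, seed=7)

print(run_a)
print(run_b)  # identical to run_a because the seed is the same
print(run_c)  # may differ because the seed is different
```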

In [7]:
generator = pipeline("text-generation")

No model was supplied, defaulted to gpt2 (https://huggingface.co/gpt2)


In [8]:
generator("As a Data Scientist at Adobe I mostly work with")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'As a Data Scientist at Adobe I mostly work with SQL Server, and use my own tools, Excel Analyzer and Bounded File system (BNR). These tools are also used for the development of this code, which uses a very sophisticated technique for'}]

In [9]:
generator("As a Data Engineer at Adobe I", num_return_sequences=5, max_length=100)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "As a Data Engineer at Adobe I'd like to offer you my best understanding of Adobe's Data Analytics Platform. It is simple, easy, and completely free.\n\nWhat's more, it is completely flexible. Adobe has designed this platform so that you can add, change, or delete data without any delay and it has two very powerful features which include:\n\nData is stored on your Windows hard drive\n\nYou do not have to think big about where and where to add data or modify"},
 {'generated_text': "As a Data Engineer at Adobe I've been in this position for more than 6 years now. I was fortunate enough to have the opportunity to work through the last year of my career and learn the core competencies of an entire company while covering most of the most exciting and difficult topics in the data science field. I can't think of a more qualified position to be a part of!\n\nWhat has changed since you took this gig?\n\nI've been working in C++ for about 8"},
 {'generated_text': 'As a Data E

## Using a specific model from the Hugging Face Hub in a pipeline: GPT-2

To pick a model for a task, go to the Model Hub and click the corresponding task tag on the left; the page then lists only the models that support that task.

The cell below loads `gpt2` by name in the same pipeline as before; the commented-out line shows the equivalent call for the smaller `distilgpt2` model.

In [10]:
# from transformers import pipeline, set_seed
# generator = pipeline('text-generation', model='distilgpt2')
generator = pipeline('text-generation', model='gpt2')
# set_seed(42)

In [11]:
generator("Hello, I'm a language model,", max_length=30, num_return_sequences=5)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "Hello, I'm a language model, and a programming environment. I'm not a data scientist and I'm not a programmer. But all I do"},
 {'generated_text': "Hello, I'm a language model, not an interpreter.\n\nIt's not that I'm only using language, or that I want to rewrite"},
 {'generated_text': 'Hello, I\'m a language model, that\'s what it sounds like." "My name is Mika." "Is there any way you can make'},
 {'generated_text': "Hello, I'm a language model, not a theory. (You will get a nice little tutorial on my blog if you are interested in a bit"},
 {'generated_text': 'Hello, I\'m a language model, and it\'s a good thing. I think the problem I solve is, "Why are so many languages different'}]

### Limitations and bias

The training data used for this model has not been released as a dataset one can browse. We know it contains a lot of unfiltered content from the internet, which is far from neutral. As the OpenAI team themselves point out in their model card:

_Because large-scale language models like GPT-2 do not distinguish fact from fiction, we don’t support use-cases that require the generated text to be true._

_Additionally, language models like GPT-2 reflect the biases inherent to the systems they were trained on, so we do not recommend that they be deployed into systems that interact with humans unless the deployers first carry out a study of biases relevant to the intended use-case. We found no statistically significant difference in gender, race, and religious bias probes between 774M and 1.5B, implying all versions of GPT-2 should be approached with similar levels of caution around use cases that are sensitive to biases around human attributes._

In [12]:
from transformers import pipeline, set_seed

generator = pipeline('text-generation', model='gpt2')
set_seed(42)
generator("The White man worked as a", max_length=10, num_return_sequences=5)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'The White man worked as a clerk at the old'},
 {'generated_text': 'The White man worked as a salesman in Mexico and'},
 {'generated_text': 'The White man worked as a lawyer in the White'},
 {'generated_text': 'The White man worked as a clerk for the store'},
 {'generated_text': 'The White man worked as a barkeep and was'}]

In [13]:
set_seed(42)
generator("The Black man worked as a", max_length=10, num_return_sequences=5)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'The Black man worked as a clerk at the old'},
 {'generated_text': 'The Black man worked as a salesman in Mexico and'},
 {'generated_text': 'The Black man worked as a lawyer in the city'},
 {'generated_text': 'The Black man worked as a clerk for the store'},
 {'generated_text': 'The Black man worked as a barkeep and was'}]
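
To compare the two runs more directly, the prompt can be stripped from each completion so that only the continuation remains. A small sketch using the outputs copied from the two cells above; `continuations` is a hypothetical helper written for this comparison.

```python
def continuations(prompt, outputs):
    """Return each generated text with the leading prompt removed."""
    return [o["generated_text"][len(prompt):].strip() for o in outputs]


# Outputs copied from the two cells above.
white_outputs = [
    {"generated_text": "The White man worked as a clerk at the old"},
    {"generated_text": "The White man worked as a salesman in Mexico and"},
    {"generated_text": "The White man worked as a lawyer in the White"},
    {"generated_text": "The White man worked as a clerk for the store"},
    {"generated_text": "The White man worked as a barkeep and was"},
]
black_outputs = [
    {"generated_text": "The Black man worked as a clerk at the old"},
    {"generated_text": "The Black man worked as a salesman in Mexico and"},
    {"generated_text": "The Black man worked as a lawyer in the city"},
    {"generated_text": "The Black man worked as a clerk for the store"},
    {"generated_text": "The Black man worked as a barkeep and was"},
]

white = continuations("The White man worked as a", white_outputs)
black = continuations("The Black man worked as a", black_outputs)

for w, b in zip(white, black):
    marker = "same" if w == b else "DIFFERENT"
    print(f"{marker:<9} | {w!r} vs {b!r}")

# Indices of the completions whose continuation differs between the prompts.
differing = [i for i, (w, b) in enumerate(zip(white, black)) if w != b]
print(differing)
```

With this seed and such short completions, only one of the five continuations differs. Five samples are far too few to conclude anything about bias in either direction, which is why the model card asks for a proper study before deployment.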

In [14]:
# Testing another language
pipe = pipeline('text-generation', model="dbmdz/german-gpt2",
                 tokenizer="dbmdz/german-gpt2")

text = pipe("Der Sinn des Lebens ist es", max_length=100)[0]["generated_text"]
print(text)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Der Sinn des Lebens ist es, eine gute Nachricht für die Gesellschaft zu senden.
In der Welt gibt es einige Leute, die behaupten, dass "Gott zu einem der größten Sünder geworden" sei, und sich von solchen Dingen fernhalten möchten.
So kann man das verstehen.
Was aber ist dies?
Ich sage dir, Gott hat sich selbst für einen der größten Sünder geworden.
In den 70er Jahren kamen viele Leute auf diese Meinung, als ihr Verstand für das Gute in der Welt


## Mask filling

In [15]:
unmasker = pipeline("fill-mask")
unmasker("This course will teach you all about <mask> models.", top_k=2)

No model was supplied, defaulted to distilroberta-base (https://huggingface.co/distilroberta-base)


[{'sequence': 'This course will teach you all about mathematical models.',
  'score': 0.19619858264923096,
  'token': 30412,
  'token_str': ' mathematical'},
 {'sequence': 'This course will teach you all about computational models.',
  'score': 0.04052719101309776,
  'token': 38163,
  'token_str': ' computational'}]

The top_k argument controls how many possibilities you want to be displayed. Note that here the model fills in the special <mask> word, which is often referred to as a mask token. Other mask-filling models might have different mask tokens, so it’s always good to verify the proper mask word when exploring other models. One way to check it is by looking at the mask word used in the widget.
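
Because the mask token differs between model families, it can help to build the prompt from a template rather than hard-coding the token. A minimal sketch: `build_masked_prompt` and the `{mask}` placeholder are hypothetical conventions for this example. `<mask>` is the token used in the cell above, and `[MASK]` is the one used by BERT-style checkpoints.

```python
def build_masked_prompt(template, mask_token):
    """Insert the model-specific mask token into a template.

    The template must contain the placeholder "{mask}" exactly once.
    """
    if template.count("{mask}") != 1:
        raise ValueError("template must contain exactly one '{mask}' placeholder")
    return template.replace("{mask}", mask_token)


template = "This course will teach you all about {mask} models."

roberta_prompt = build_masked_prompt(template, "<mask>")
bert_prompt = build_masked_prompt(template, "[MASK]")

print(roberta_prompt)
print(bert_prompt)
```

With a loaded pipeline, the token can also be read from its tokenizer (for example `unmasker.tokenizer.mask_token`), which avoids guessing.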

## Translation

This example uses a model that translates from French to English.

In [None]:
# This usually works but somehow requires a fresh / new notebook. Not sure yet what the issue is. 
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")
translator("Ce cours est produit par Hugging Face.")

# What's inside a pipeline?

## How do Transformers work?

### General Architecture

**Encoder (left)** The encoder receives an input and builds a representation of its features, i.e. the model is optimized to acquire **understanding** from the input.

**Decoder (right)** The decoder uses the encoder's representation (features) along with other inputs to **generate** a target sequence. This means that the model is optimized for generating outputs.

Example Use Cases: 

* Encoder only: Good for tasks that require **understanding** of the input, such as sentence **classification** and **named entity recognition**.
* Decoder only: Good for **generative** tasks.
* Both, i.e. encoder-decoder / sequence-to-sequence models: Good for generative tasks that require an input, such as **translation** or **summarization**.

![Encoder Decoder](https://miro.medium.com/max/856/1*ZCFSvkKtppgew3cc7BIaug.png)

### Attention Layers

* A key feature of transformers is that they are built with special layers called **attention layers**, which tell the model to pay specific attention to certain words in the sentence (and more or less ignore the others) when dealing with the representation of each word.
* E.g. for grammatical reasons, translating a word correctly requires attention to a nearby word (You like...), but much less to another part of the sentence (...my books).
* A word by itself has a meaning, but that meaning is deeply affected by the **context**, which can be any other word (or words) before or after the word being studied. 
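
The idea of weighting context words can be sketched with scaled dot-product attention, the operation at the core of an attention layer. This is a toy, pure-Python version for a single query. The two-dimensional vectors are made-up numbers, not real embeddings, and real layers add learned projections and multiple heads.

```python
import math


def softmax(values):
    """Turn arbitrary scores into weights that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # The output is the weighted average of the value vectors.
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output


words = ["You", "like", "my", "books"]
# Made-up 2-d vectors; keys and values are the same here for simplicity.
vectors = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9]]

# Attend from the word "like": with these numbers, "You" and "like" receive
# larger weights than "my" and "books".
weights, output = attend(vectors[1], vectors, vectors)
for word, weight in zip(words, weights):
    print(f"{word:<6} {weight:.3f}")
```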


### The original architecture

* The transformer architecture was originally designed for translation. 
* During training, the encoder receives inputs (sentences) in a certain language, while the decoder receives the same sentences in the desired target language. 
* In the encoder, the attention layers can use all the words in a sentence (since the translation of a given word can depend on the words both before and after it in the sentence).
* The decoder, however, works sequentially and can only pay attention to the words in the sentence that it has already translated, i.e. only the words before the word that's currently being generated.
* To speed things up during training (when the model has access to target sentences), the decoder is fed the whole target, but it is not allowed to use future words.
* Note that the first attention layer ('masked multi-head attention') in a decoder block pays attention to all (past) inputs to the decoder, but the second attention layer (multi-head attention) uses the output of the encoder. It can thus access the whole input sentence to best predict the current word. This is very useful because different languages can have grammatical rules that put the words in different orders, or some context provided later in the sentence may help determine the best translation of a given word.
* The **attention mask** can also be used in the encoder/decoder to prevent the model from paying attention to some special words, e.g. `<oov>`.
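
The rule that the decoder may not use future words is usually implemented as a mask on the attention scores. A toy sketch, assuming raw scores are already available (here they are all zero, so every visible position gets equal weight); `causal_mask` and `masked_softmax` are names chosen for this example.

```python
import math


def causal_mask(size):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(size)] for i in range(size)]


def masked_softmax(scores, allowed):
    """Softmax over one row of scores, giving zero weight to masked positions."""
    kept = [s for s, ok in zip(scores, allowed) if ok]
    peak = max(kept)
    exps = [math.exp(s - peak) if ok else 0.0 for s, ok in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]


tokens = ["You", "like", "my", "books"]
mask = causal_mask(len(tokens))

# All-zero scores: every visible position gets the same weight.
scores = [[0.0] * len(tokens) for _ in tokens]
weights = [masked_softmax(row, allowed) for row, allowed in zip(scores, mask)]

for token, row in zip(tokens, weights):
    print(f"{token:<6}", [round(w, 2) for w in row])
```

Each row only has non-zero weights up to and including its own position, which is what lets the whole target be fed in at once during training without leaking future words.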

Finally, some definitions:

* **Architecture:** This is the skeleton of the model. The definition of each layer and each operation that happens within the model.
* **Checkpoints:** These are the weights that will be loaded in a given architecture. 
* **Model:** An umbrella term that is less precise and can refer to either of the above.
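
A toy illustration of the distinction, using a hand-written one-layer model rather than anything from the library: the function fixes the architecture (which operations run, and in which order), while each dictionary of weights plays the role of a checkpoint. All names and numbers are made up.

```python
def tiny_architecture(inputs, weights):
    """Architecture: one linear layer followed by a ReLU."""
    total = sum(w * x for w, x in zip(weights["w"], inputs)) + weights["b"]
    return max(0.0, total)


# Two "checkpoints": different weight values for the same architecture.
checkpoint_a = {"w": [1.0, -1.0], "b": 0.0}
checkpoint_b = {"w": [0.5, 0.5], "b": 1.0}

inputs = [2.0, 1.0]
out_a = tiny_architecture(inputs, checkpoint_a)
out_b = tiny_architecture(inputs, checkpoint_b)
print(out_a, out_b)
```

In the typo-detection cell further down, DistilBERT is the architecture and `m3hrdadfi/typo-detector-distilbert-en` names the checkpoint.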

#### Encoder Only Example: 

* Encoder models use only the encoder of a Transformer model. 
* **At each stage**, the attention layers can **access all the words** in the initial sentence. 
* These models are often characterized as having “bi-directional” attention, and are often called **auto-encoding models**.

Representatives of this family of models include:

* ALBERT
* BERT
* DistilBERT
* ELECTRA
* RoBERTa

In [17]:
import torch
from transformers import AutoConfig, AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline


model_name_or_path = "m3hrdadfi/typo-detector-distilbert-en"
config = AutoConfig.from_pretrained(model_name_or_path)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForTokenClassification.from_pretrained(model_name_or_path, config=config)
nlp = pipeline('token-classification', model=model, tokenizer=tokenizer, aggregation_strategy="average")
sentences = [
 "He had also stgruggled with addiction during his time in Congress .",
 "The review thoroughla assessed all aspects of JLENS SuR and CPG esign maturit and confidence .",
 "Letterma also apologized two his staff for the satyation .",
 "Vincent Jay had earlier won France 's first gold in gthe 10km biathlon sprint .",
 "It is left to the directors to figure out hpw to bring the stry across to tye audience .",
]

for sentence in sentences:
    typos = [sentence[r["start"]: r["end"]] for r in nlp(sentence)]

    detected = sentence
    for typo in typos:
        detected = detected.replace(typo, f'<i>{typo}</i>')

    print("   [Input]: ", sentence)
    print("[Detected]: ", detected)
    print("-" * 130)



   [Input]:  He had also stgruggled with addiction during his time in Congress .
[Detected]:  He had also <i>stgruggled</i> with addiction during his time in Congress .
----------------------------------------------------------------------------------------------------------------------------------
   [Input]:  The review thoroughla assessed all aspects of JLENS SuR and CPG esign maturit and confidence .
[Detected]:  The review <i>thoroughla</i> assessed all aspects of JLENS SuR and CPG <i>esign maturit</i> and confidence .
----------------------------------------------------------------------------------------------------------------------------------
   [Input]:  Letterma also apologized two his staff for the satyation .
[Detected]:  <i>Letterma</i> also apologized <i>two</i> his staff for the <i>satyation</i> .
----------------------------------------------------------------------------------------------------------------------------------
   [Input]:  Vincent Jay had earlier won Fr
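
Note that `str.replace` in the loop above marks every occurrence of a detected string, so a short typo that also appears inside another word would be wrapped there too. A possible alternative is to insert the tags using the `start` and `end` offsets directly. In this sketch the span dictionaries are hand-written stand-ins for the pipeline output (only the `start` and `end` keys, which the cell above also relies on), and `highlight_spans` is a hypothetical helper.

```python
def highlight_spans(sentence, spans, open_tag="<i>", close_tag="</i>"):
    """Wrap each (start, end) span of `sentence` in tags.

    Spans are processed right to left so that earlier offsets stay valid
    while tags are inserted.
    """
    marked = sentence
    for span in sorted(spans, key=lambda s: s["start"], reverse=True):
        start, end = span["start"], span["end"]
        marked = marked[:start] + open_tag + marked[start:end] + close_tag + marked[end:]
    return marked


sentence = "Letterma also apologized two his staff for the satyation ."
# Hand-written offsets standing in for the token-classification output.
spans = [
    {"start": 0, "end": 8},    # "Letterma"
    {"start": 25, "end": 28},  # "two"
    {"start": 47, "end": 56},  # "satyation"
]

detected = highlight_spans(sentence, spans)
print("   [Input]: ", sentence)
print("[Detected]: ", detected)
```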

## Decoder Only Examples

* Decoder models use only the decoder of a Transformer model. 
* At each stage, for a given word the attention layers can only access the words positioned before it in the sentence. 
* These models are often called auto-regressive models.
* The pretraining of decoder models usually revolves around predicting the next word in the sentence.
* These models are best suited for **tasks that involve generating sequences**, such as text generation.
* Causal language modeling: given the word "my", generate the most likely next word ("name"), then repeat the operation until satisfied.
* Key difference to the encoder: the self-attention mechanism is not bi-directional; it uses context from a single direction only (the left or the right of the word).
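
The "predict the next word, append it, repeat" loop can be sketched with a toy lookup table in place of a real model. The table `next_word_table` is made up for illustration, and always taking the single most likely word (greedy decoding) is just one possible decoding strategy.

```python
# Made-up table: for each word, candidate next words with probabilities.
# A real decoder conditions on all previous words; this toy only looks at the last one.
next_word_table = {
    "my": {"name": 0.7, "books": 0.3},
    "name": {"is": 0.9, "was": 0.1},
    "is": {"<eos>": 0.6, "my": 0.4},
}


def generate_greedy(prompt_words, table, max_new_words=10, eos="<eos>"):
    """Repeatedly append the most likely next word given the last word."""
    words = list(prompt_words)
    for _ in range(max_new_words):
        candidates = table.get(words[-1])
        if not candidates:
            break  # the toy table has no continuation for this word
        best = max(candidates, key=candidates.get)
        if best == eos:
            break  # the toy "model" signals that the sequence is finished
        words.append(best)
    return words


generated = generate_greedy(["my"], next_word_table)
print(" ".join(generated))
```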

Representatives of this family of models include:

* CTRL
* GPT
* GPT-2
* Transformer XL


## Sequence-to-Sequence

* Encoder-decoder models (also called sequence-to-sequence models) use both parts of the Transformer architecture. 
* At each stage, the attention layers of the encoder can access all the words in the initial sentence, whereas the attention layers of the decoder can only access the words positioned before a given word in the input.
* The pretraining of these models can be done using the objectives of encoder or decoder models, but usually involves something a bit more complex.
* Sequence-to-sequence models are best suited for **tasks revolving around generating new sentences depending on a given input**, such as **summarization**, **translation**, or **generative question answering**.
* Weights are not necessarily shared between the encoder and the decoder. This allows the encoder to do a different job from the decoder, e.g. understand the meaning of an English sentence and pass that meaning to the decoder, which then translates the meaning rather than the original text.

Representatives of this family of models include:

* BART
* mBART
* Marian
* T5

# Summary

This notebook includes: 

* how to approach different NLP tasks using the high-level 🤗 **Transformers pipeline API**.
* how Transformer models work at a high level, and the importance of transfer learning and fine-tuning.
* that you can use the full architecture or only the encoder or decoder, depending on what kind of task you aim to solve.

| Model           | Examples                                   | Tasks                                                                            |
| --------------- | ------------------------------------------ | -------------------------------------------------------------------------------- |
| Encoder         | ALBERT, BERT, DistilBERT, ELECTRA, RoBERTa | Sentence classification, named entity recognition, extractive question answering |
| Decoder         | CTRL, GPT, GPT-2, Transformer XL           | Text generation                                                                  |
| Encoder-Decoder | BART, T5, Marian, mBART                    | Summarization, translation, generative question answering                        |