
# Demo of [Huggingface Transformers](https://github.com/huggingface/transformers) pipelines

New in version `v2.3`: a `Pipeline` is a high-level object that automatically handles tokenization, runs your data through a Transformers model,
and returns the result as a structured object.

You can create `Pipeline` objects for the following downstream tasks:

 - `feature-extraction`: generates a tensor representation of the input sequence.
 - `ner`: generates a named-entity mapping for each word in the input sequence.
 - `sentiment-analysis`: gives the polarity (positive / negative) of the whole input sequence.
 - `question-answering`: given some context and a question referring to that context, extracts the answer to the question
 from the context.

 > Colab creator: [Manuel Romero](https://twitter.com/mrm8488)

In [None]:
%tensorflow_version 2.x

TensorFlow 2.x selected.


In [None]:
!pip install -q transformers


In [None]:
from transformers import pipeline

## 1. Sentiment Analysis

In [None]:
nlp_sentiment_analysis = pipeline("sentiment-analysis")
text_sentiment = "We are very happy to include pipeline into the transformers repository"
nlp_sentiment_analysis(text_sentiment)

[{'label': 'POSITIVE', 'score': 0.99687505}]
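
The pipeline returns a list with one dict per input, each holding a `label` and a `score`. As a minimal sketch of how that output might be used downstream, the snippet below folds each result into a single signed polarity value. `signed_polarity` is a hypothetical helper, not part of the library, and the result is hard-coded from the output above so the snippet runs without downloading a model.

```python
def signed_polarity(prediction):
    """Map a sentiment-analysis result dict to a value in [-1, 1].

    Hypothetical helper: positive labels keep their score,
    any other label gets the negated score.
    """
    score = prediction["score"]
    return score if prediction["label"] == "POSITIVE" else -score


# Output recorded in the cell above (one dict per input text)
result = [{'label': 'POSITIVE', 'score': 0.99687505}]

polarities = [signed_polarity(p) for p in result]
print(polarities)
```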

## 2. Question Answering

In [None]:
nlp_qa = pipeline("question-answering")
context = "Pipeline have been included in the huggingface/transformers repository"
question = "What is the name of the repository?"
nlp_qa({
    'question': question,
    'context': context
})

{'answer': 'huggingface/transformers',
 'end': 59,
 'score': 0.2875610410939,
 'start': 35}
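
In the output above, `start` and `end` appear to be character offsets into the context string, with `end` exclusive (59 - 35 equals the length of the answer). A minimal sketch that checks this against the recorded result; `extract_span` is a hypothetical helper, and the values are copied from the cell above rather than recomputed.

```python
def extract_span(context, prediction):
    """Slice the answer out of the context using the reported offsets."""
    return context[prediction["start"]:prediction["end"]]


context = "Pipeline have been included in the huggingface/transformers repository"

# Output recorded in the cell above
result = {'answer': 'huggingface/transformers',
          'end': 59,
          'score': 0.2875610410939,
          'start': 35}

span = extract_span(context, result)
print(span)
```

Slicing by offset is useful when you want to highlight the answer in the original text instead of only displaying the extracted string.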

## 3. NER

In [None]:
nlp_ner = pipeline("ner")
text_ner = "European authorities fined Google a record $5.1 billion on Wednesday for abusing its power in the mobile phone market and ordered the company to alter its practices"
nlp_ner(text_ner)

[{'entity': 'I-MISC', 'score': 0.9980936050415039, 'word': 'European'},
 {'entity': 'I-ORG', 'score': 0.9990612864494324, 'word': 'Google'}]
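
Each detected entity comes back as a dict with an `entity` tag, a `score`, and the matched `word`. A minimal sketch of grouping the words by tag, using the output recorded above; `group_by_entity` is a hypothetical helper, not a library function.

```python
from collections import defaultdict


def group_by_entity(predictions):
    """Collect the detected words under their entity tag."""
    grouped = defaultdict(list)
    for prediction in predictions:
        grouped[prediction["entity"]].append(prediction["word"])
    return dict(grouped)


# Output recorded in the cell above
result = [
    {'entity': 'I-MISC', 'score': 0.9980936050415039, 'word': 'European'},
    {'entity': 'I-ORG', 'score': 0.9990612864494324, 'word': 'Google'},
]

entities = group_by_entity(result)
print(entities)
```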

## 4. Feature Extraction

In [None]:
nlp_fe = pipeline("feature-extraction")
text_fe = "We are very happy to include pipeline into the transformers repository"
nlp_fe(text_fe)
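
No output is recorded for this cell. Assuming the pipeline returns a nested list indexed as `[input][token][hidden dimension]` (an assumption worth checking against your installed version), one common way to obtain a single sentence vector is to average over the token axis. The sketch below uses toy numbers in place of real model output, and `mean_pool` is a hypothetical helper.

```python
def mean_pool(token_vectors):
    """Average a list of equal-length token vectors into one vector."""
    n_tokens = len(token_vectors)
    # zip(*...) transposes tokens x dims into dims x tokens
    return [sum(dim_values) / n_tokens for dim_values in zip(*token_vectors)]


# Toy stand-in for nlp_fe(text_fe): 1 input, 3 tokens, 2 hidden dimensions
features = [[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]]

sentence_vector = mean_pool(features[0])
print(sentence_vector)  # [3.0, 4.0]
```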

## 5. Bonus Forms (sentiment-analysis, ner, feature-extraction)

In [None]:
#@title Choose a pipeline and write a text { run: "auto" }
task = 'sentiment-analysis' #@param ["sentiment-analysis", "ner", "feature-extraction"]
text = 'We are very happy to include pipeline into the transformers repository.' #@param {type:"string"}
nlp = pipeline(task)
nlp(text)

[{'label': 'POSITIVE', 'score': 0.99781936}]

## Bonus Forms (question-answering)

In [None]:
#@title Write a context and a question { run: "auto" }
context = 'Bitcoin[a] (\u20BF) is a cryptocurrency. It is a decentralized digital currency without a central bank or single administrator that can be sent from user to user on the peer-to-peer bitcoin network without the need for intermediaries.[8]  Transactions are verified by network nodes through cryptography and recorded in a public distributed ledger called a blockchain. Bitcoin was invented in 2008 by an unknown person or group of people using the name Satoshi Nakamoto[15] and started in 2009[16] when its source code was released as open-source software.[7]:ch. 1 Bitcoins are created as a reward for a process known as mining. They can be exchanged for other currencies, products, and services.[17] Research produced by University of Cambridge estimates that in 2017, there were 2.9 to 5.8 million unique users using a cryptocurrency wallet, most of them using bitcoin.[18]  Bitcoin has been criticized for its use in illegal transactions, its high electricity consumption, price volatility, and thefts from exchanges. Some economists, including several Nobel laureates, have characterized it as a speculative bubble. Bitcoin has also been used as an investment, although several regulatory agencies have issued investor alerts about bitcoin.[19][20]' #@param {type:"string"}
question = 'How are Bitcoins created?' #@param {type:"string"}
nlp_qa = pipeline("question-answering")
nlp_qa({
    'question': question,
    'context': context
})

{'answer': 'as a reward for a process known as mining.',
 'end': 623,
 'score': 0.5793338079010653,
 'start': 581}

## 6. Bonus (II): Mask filling with **umBERTo**, an Italian 🇮🇹 language model trained with Whole Word Masking

Source code: https://github.com/musixmatchresearch/umberto

In [None]:
nlp_fill_mask_ita = pipeline(
	"fill-mask",
	model="Musixmatch/umberto-commoncrawl-cased-v1",
	tokenizer="Musixmatch/umberto-commoncrawl-cased-v1"
)

nlp_fill_mask_ita("Umberto Eco è <mask> un grande scrittore")

[{'score': 0.1859990507364273,
  'sequence': '<s> Umberto Eco è considerato un grande scrittore</s>',
  'token': 5032},
 {'score': 0.17816734313964844,
  'sequence': '<s> Umberto Eco è stato un grande scrittore</s>',
  'token': 471},
 {'score': 0.16565516591072083,
  'sequence': '<s> Umberto Eco è sicuramente un grande scrittore</s>',
  'token': 2654},
 {'score': 0.09329013526439667,
  'sequence': '<s> Umberto Eco è indubbiamente un grande scrittore</s>',
  'token': 17908},
 {'score': 0.05470135435461998,
  'sequence': '<s> Umberto Eco è certamente un grande scrittore</s>',
  'token': 5269}]
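
Each candidate carries the filled-in `sequence` wrapped in the `<s>` and `</s>` markers seen above, plus a `score` and the predicted `token` id. A minimal sketch of picking the top candidate and stripping the markers, using the first three results recorded above; `clean_sequence` is a hypothetical helper, not a library function.

```python
def clean_sequence(sequence):
    """Strip the <s> and </s> markers and any surrounding whitespace."""
    return sequence.replace("<s>", "").replace("</s>", "").strip()


# First three candidates from the output recorded above
result = [
    {'score': 0.1859990507364273,
     'sequence': '<s> Umberto Eco è considerato un grande scrittore</s>',
     'token': 5032},
    {'score': 0.17816734313964844,
     'sequence': '<s> Umberto Eco è stato un grande scrittore</s>',
     'token': 471},
    {'score': 0.16565516591072083,
     'sequence': '<s> Umberto Eco è sicuramente un grande scrittore</s>',
     'token': 2654},
]

# Pick the highest-scoring candidate explicitly instead of relying on list order
best = max(result, key=lambda prediction: prediction["score"])
best_sentence = clean_sequence(best["sequence"])
print(best_sentence)
```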