In [None]:
# !pip install transformers

In [None]:
# !pip install xformers

## Text Classification

In [None]:
from transformers import pipeline

# Multilingual BERT checkpoint fine-tuned to rate text from 1 to 5 stars.
st = "I do not like horror movies"
seq = pipeline(task="text-classification", model="nlptown/bert-base-multilingual-uncased-sentiment")


In [None]:
print(f"Result: {seq(st)}")

Result: [{'label': '2 stars', 'score': 0.41392049193382263}]
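
The label in the result above is a string such as `'2 stars'`. A minimal sketch, assuming that label format, of turning one prediction into a numeric rating and a coarse sentiment. The helper name `stars_to_sentiment` and its thresholds are hypothetical choices for illustration, not part of `transformers`.

```python
def stars_to_sentiment(prediction):
    """Map a prediction dict like {'label': '2 stars', 'score': 0.41} to (stars, sentiment)."""
    # The first word of the label is the star count ('1 star', '2 stars', ...).
    stars = int(prediction["label"].split()[0])
    # Assumed thresholds: 1-2 negative, 3 neutral, 4-5 positive.
    if stars <= 2:
        sentiment = "negative"
    elif stars == 3:
        sentiment = "neutral"
    else:
        sentiment = "positive"
    return stars, sentiment


# Prediction copied from the cell output above.
result = [{'label': '2 stars', 'score': 0.41392049193382263}]
print(stars_to_sentiment(result[0]))  # (2, 'negative')
```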


## Question Answering

In [None]:
sentence = r"""
Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations 
in it, “and what is the use of a book,” thought Alice “without pictures or conversations?” So she was considering in her own mind (as well as she could, for the hot day made her feel very sleepy and 
stupid), whether the pleasure of making a daisy chain would be worth the trouble of getting up and picking the daisies, when suddenly a White Rabbit with pink eyes ran close by her.
"""
# Extractive QA: the model selects the answer span from the given context.
qa = pipeline("question-answering", model="csarron/roberta-base-squad-v1")
result = qa(question="Who was reading a book?", context=sentence)
print(f"Answer: {result['answer']}")

Answer: her sister


## Masked Language Modeling


In [None]:
nlp = pipeline("fill-mask", model="bert-base-uncased")

nlp(f"{nlp.tokenizer.mask_token} movies are often very scary to people")

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'score': 0.35369157791137695,
  'token': 5469,
  'token_str': 'horror',
  'sequence': 'horror movies are often very scary to people'},
 {'score': 0.11002423614263535,
  'token': 1996,
  'token_str': 'the',
  'sequence': 'the movies are often very scary to people'},
 {'score': 0.07524268329143524,
  'token': 2122,
  'token_str': 'these',
  'sequence': 'these movies are often very scary to people'},
 {'score': 0.0665750503540039,
  'token': 12459,
  'token_str': 'scary',
  'sequence': 'scary movies are often very scary to people'},
 {'score': 0.035217709839344025,
  'token': 2919,
  'token_str': 'bad',
  'sequence': 'bad movies are often very scary to people'}]
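
The fill-mask output is a list of dicts sorted by score. A small sketch, assuming that output shape, of keeping only the candidates above a score cutoff. The helper `top_tokens` and the `min_score` value are hypothetical and chosen only for illustration.

```python
def top_tokens(predictions, min_score=0.05):
    """Return (token, rounded score) pairs for predictions scoring at least min_score."""
    return [
        (p["token_str"], round(p["score"], 3))
        for p in predictions
        if p["score"] >= min_score
    ]


# Tokens and scores copied from the cell output above.
predictions = [
    {"token_str": "horror", "score": 0.35369157791137695},
    {"token_str": "the", "score": 0.11002423614263535},
    {"token_str": "these", "score": 0.07524268329143524},
    {"token_str": "scary", "score": 0.0665750503540039},
    {"token_str": "bad", "score": 0.035217709839344025},
]

for token, score in top_tokens(predictions):
    print(token, score)
```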

## Text Generation


In [None]:
nlp = pipeline(task='text-generation', model='gpt2')
nlp("My name is Fernando, I am from Mexico and", max_length=30, num_return_sequences=5)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'My name is Fernando, I am from Mexico and I am a professional wrestler, I am one of the best and only kids that I know. I'},
 {'generated_text': 'My name is Fernando, I am from Mexico and my parents were born in Tijuana. I was born in Mexico just down the road from home,'},
 {'generated_text': 'My name is Fernando, I am from Mexico and I love animals." He also said he is a naturalist who thinks the world should follow him wherever'},
 {'generated_text': 'My name is Fernando, I am from Mexico and live in the States of North America."\n\nWith the help of his wife Carla (born'},
 {'generated_text': 'My name is Fernando, I am from Mexico and I am the son of Sergio and Carmen (Gonzales)." So far, it\'s been an'}]
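
Each `generated_text` above repeats the prompt before the continuation. A minimal sketch, assuming that output shape, of keeping only the newly generated part. The helper `continuation_only` is hypothetical; the pipeline may also offer its own option for this, which is not used here.

```python
def continuation_only(prompt, outputs):
    """Strip the prompt from each generated_text, leaving only the continuation."""
    continuations = []
    for item in outputs:
        text = item["generated_text"]
        # Only strip when the text really starts with the prompt.
        if text.startswith(prompt):
            text = text[len(prompt):]
        continuations.append(text)
    return continuations


prompt = "My name is Fernando, I am from Mexico and"
# First sequence copied from the cell output above.
outputs = [
    {"generated_text": "My name is Fernando, I am from Mexico and I am a professional wrestler, "
                       "I am one of the best and only kids that I know. I"},
]
print(continuation_only(prompt, outputs)[0])
```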

## Named Entity Recognition


In [None]:
seq = r"""
I am Fernando, and I live in Mexico. I am a Machine Learning Engineer, and I work at Hitch.
"""
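# Note: bert-base-cased is a base checkpoint without a fine-tuned NER head, so the
# token-classification layer is randomly initialized and the generic LABEL_0/LABEL_1
# tags are not meaningful entity labels. A checkpoint fine-tuned for NER is needed
# to get real entity tags.
nlp = pipeline(task='ner', model="bert-base-cased")
for item in nlp(seq):
    print(f"{item['word'], item['entity']}")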

Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForTokenClassification: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-cas

('I', 'LABEL_0')
('am', 'LABEL_0')
('Fernando', 'LABEL_0')
(',', 'LABEL_0')
('and', 'LABEL_0')
('I', 'LABEL_0')
('live', 'LABEL_0')
('in', 'LABEL_1')
('Mexico', 'LABEL_0')
('.', 'LABEL_0')
('I', 'LABEL_0')
('am', 'LABEL_0')
('a', 'LABEL_0')
('Machine', 'LABEL_1')
('Learning', 'LABEL_0')
('Engineer', 'LABEL_1')
(',', 'LABEL_0')
('and', 'LABEL_0')
('I', 'LABEL_0')
('work', 'LABEL_0')
('at', 'LABEL_0')
('Hit', 'LABEL_0')
('##ch', 'LABEL_0')
('.', 'LABEL_0')
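
The output above shows "Hitch" split into the WordPiece tokens `Hit` and `##ch`. A minimal sketch, assuming the `##` continuation prefix seen in that output, of joining such pieces back into whole words. The helper `merge_wordpieces` is hypothetical and works on plain token strings only; the pipeline's own `aggregation_strategy` argument is not used here.

```python
def merge_wordpieces(tokens):
    """Join WordPiece continuation tokens (prefixed with '##') onto the preceding token."""
    words = []
    for token in tokens:
        if token.startswith("##") and words:
            # Continuation piece: append it to the previous word without the '##' marker.
            words[-1] += token[2:]
        else:
            words.append(token)
    return words


# Tokens copied from the end of the cell output above.
tokens = ["I", "work", "at", "Hit", "##ch", "."]
print(merge_wordpieces(tokens))  # ['I', 'work', 'at', 'Hitch', '.']
```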


## Text Summarization


In [None]:
txt = r'''
Machine learning is the study of computer algorithms that improve automatically through experience and by the use of data. It is seen as a part of artificial intelligence. Machine learning is an important component of the growing field of data 
science. Machine learning, deep learning, and neural networks are all sub-fields of artificial intelligence. As big data continues to grow, the market demand for data scientists will increase, requiring them to assist in the identification of 
the most relevant business questions. Machine learning is a method of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns and make 
decisions with minimal human intervention.
'''
nlp = pipeline(task='summarization', model="sshleifer/distilbart-cnn-12-6")
nlp(txt, max_length=130, min_length=30)

[{'summary_text': ' Machine learning is the study of computer algorithms that improve automatically through experience and by the use of data . Machine learning, deep learning, and neural networks are all sub-fields of artificial intelligence . As big data continues to grow, the market demand for data scientists will increase .'}]

## Translation

In [None]:
txt = r'''
Machine learning is a branch of artificial intelligence (AI) and computer science which focuses on the use of data and algorithms to imitate the way that humans learn, gradually improving its accuracy.
'''
nlp = pipeline(task='translation_en_to_fr',  model="t5-base")
print(f"{nlp(txt)[0]['translation_text']}")

L’apprentissage automatique est une branche de l’intelligence artificielle (AI) et de la science informatique qui se concentre sur l’utilisation de données et d’algorithmes pour imiter la façon dont les humains apprennent, en améliorant progressivement leur précision.
