## From https://pypi.org/project/transformers/

In [1]:
from transformers import pipeline

classifier = pipeline('sentiment-analysis')
classifier('We are very happy to introduce pipeline to the transformers repository.')

[{'label': 'POSITIVE', 'score': 0.9996980428695679}]

In [2]:
question_answerer = pipeline('question-answering')
question_answerer({
    'question': 'What is the name of the repository ?',
    'context': 'Pipeline has been included in the huggingface/transformers repository'})

{'score': 0.309702068567276,
 'start': 34,
 'end': 58,
 'answer': 'huggingface/transformers'}
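
The `start` and `end` values in the result above look like character offsets into `context`. A minimal sketch, assuming that interpretation, recovers the answer by slicing. The `result` dict is copied from the output above rather than recomputed, so the snippet runs without downloading a model.

```python
# Context string and pipeline result copied from the cell above.
context = 'Pipeline has been included in the huggingface/transformers repository'
result = {'score': 0.309702068567276, 'start': 34, 'end': 58,
          'answer': 'huggingface/transformers'}

# Assumption: 'start' is inclusive and 'end' is exclusive, as in Python slicing.
span = context[result['start']:result['end']]
print(span)
```

If the offsets follow this convention, `span` matches `result['answer']`, which makes it easy to highlight the answer inside the original text.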

In [3]:
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModel.from_pretrained('bert-base-uncased')

inputs = tokenizer('Hello world!', return_tensors='pt')
outputs = model(**inputs)
type(outputs)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


transformers.modeling_outputs.BaseModelOutputWithPoolingAndCrossAttentions

## From https://www.analyticsvidhya.com/blog/2021/05/implementing-transformers-in-nlp-under-5-lines-of-codes/
### Text classification

In [4]:
st = 'I do not like horror movies'
seq = pipeline(task='text-classification', model='nlptown/bert-base-multilingual-uncased-sentiment')
seq(st)

[{'label': '2 stars', 'score': 0.41392043232917786}]
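
The label returned above is a string such as `'2 stars'`. For downstream use it may be handier as an integer. The sketch below assumes every label starts with the star count followed by a space, which is inferred only from the single output shown here; `stars_from_label` is a hypothetical helper, not part of transformers.

```python
def stars_from_label(label: str) -> int:
    """Parse a label shaped like '2 stars' (or '1 star') into an integer.

    Assumption: the first whitespace-separated token is the star count.
    """
    return int(label.split()[0])


# Prediction copied from the output above.
prediction = [{'label': '2 stars', 'score': 0.41392043232917786}]

rating = stars_from_label(prediction[0]['label'])
print(rating, round(prediction[0]['score'], 3))
```

Note that the score of about 0.41 is fairly low, so the neighboring ratings probably received a good share of the probability mass as well.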

### Question answering

In [5]:
sentence = r'''
Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do:
once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations 
in it, “and what is the use of a book,” thought Alice “without pictures or conversations?” So she was
considering in her own mind (as well as she could, for the hot day made her feel very sleepy and stupid),
whether the pleasure of making a daisy chain would be worth the trouble of getting up and picking the daisies,
when suddenly a White Rabbit with pink eyes ran close by her.
'''
qa = pipeline('question-answering', model='csarron/roberta-base-squad-v1')
result = qa(question='Who was reading a book?', context=sentence)
result['answer']

'her sister'

### Masked word prediction (fill-mask)

In [6]:
nlp = pipeline('fill-mask')
nlp(f'{nlp.tokenizer.mask_token} movies are often very scary to people')

[{'sequence': ' Horror movies are often very scary to people',
  'score': 0.12314239889383316,
  'token': 28719,
  'token_str': ' Horror'},
 {'sequence': ' horror movies are often very scary to people',
  'score': 0.05246889963746071,
  'token': 8444,
  'token_str': ' horror'},
 {'sequence': 'Ghost movies are often very scary to people',
  'score': 0.05243458226323128,
  'token': 38856,
  'token_str': 'Ghost'},
 {'sequence': 'War movies are often very scary to people',
  'score': 0.03345341980457306,
  'token': 20096,
  'token_str': 'War'},
 {'sequence': 'Action movies are often very scary to people',
  'score': 0.029487790539860725,
  'token': 36082,
  'token_str': 'Action'}]
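
In the output above, `' Horror'` and `' horror'` are separate tokens that differ only in case, so the probability of the word itself is split across two entries. A minimal post-processing sketch, assuming it is acceptable to ignore case and leading whitespace, merges such candidates by summing their scores. The candidate list is copied from the output above (only the fields needed here).

```python
from collections import defaultdict

# Candidates copied from the fill-mask output above.
candidates = [
    {'score': 0.12314239889383316, 'token_str': ' Horror'},
    {'score': 0.05246889963746071, 'token_str': ' horror'},
    {'score': 0.05243458226323128, 'token_str': 'Ghost'},
    {'score': 0.03345341980457306, 'token_str': 'War'},
    {'score': 0.029487790539860725, 'token_str': 'Action'},
]

# Sum the scores of candidates that are equal after stripping and lowercasing.
merged = defaultdict(float)
for candidate in candidates:
    key = candidate['token_str'].strip().lower()
    merged[key] += candidate['score']

# Rank the merged words from most to least likely.
ranked = sorted(merged.items(), key=lambda item: item[1], reverse=True)
for word, score in ranked:
    print(f'{word}: {score:.4f}')
```

Merging leaves four distinct words, with "horror" clearly ahead of the rest.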

### Text generation

In [7]:
nlp = pipeline(task='text-generation', model='gpt2')
nlp('My name is Fernando, I am from Mexico and', max_length=30, num_return_sequences=5)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'My name is Fernando, I am from Mexico and I am just 18 in December. I have 2 years old, no car. My parents only have'},
 {'generated_text': "My name is Fernando, I am from Mexico and I have three daughters. My kids are both 3 years old and one of my sister's children,"},
 {'generated_text': "My name is Fernando, I am from Mexico and I was born at 15 degrees C, I'm a 12-year-old boy. As you"},
 {'generated_text': "My name is Fernando, I am from Mexico and am Brazilian. I was a 17 years old. I do not speak any Portuguese. I don't"},
 {'generated_text': 'My name is Fernando, I am from Mexico and I am from Sinaloa".\n\n"I have a good family and I just want to'}]
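
Each `generated_text` above repeats the prompt before the newly generated words. A small sketch, assuming the prompt appears verbatim at the start of each result (as it does in the output above), strips it so that only the continuation remains. `continuation` is a hypothetical helper, and only two of the five results are copied here to keep the example short.

```python
prompt = 'My name is Fernando, I am from Mexico and'

# Two results copied from the output above.
generated = [
    {'generated_text': 'My name is Fernando, I am from Mexico and I am just 18 in December. I have 2 years old, no car. My parents only have'},
    {'generated_text': "My name is Fernando, I am from Mexico and am Brazilian. I was a 17 years old. I do not speak any Portuguese. I don't"},
]


def continuation(text: str, prompt: str) -> str:
    """Return the part of `text` after `prompt`, or `text` unchanged if it does not start with `prompt`."""
    return text[len(prompt):] if text.startswith(prompt) else text


continuations = [continuation(g['generated_text'], prompt).strip() for g in generated]
for c in continuations:
    print(c)
```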

### Named Entity Recognition (NER)

In [8]:
seq = 'I am Fernando, and I live in Mexico. I am a Machine Learning Engineer, and I work at Hitch.'
nlp = pipeline(task='ner')
for item in nlp(seq):
    print(f"{item['word'], item['entity']}")

('Fernando', 'I-PER')
('Mexico', 'I-LOC')
('Learning', 'I-ORG')
('Engineer', 'I-MISC')
('Hit', 'I-ORG')
('##ch', 'I-ORG')
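
The last two rows above, `'Hit'` and `'##ch'`, are WordPiece fragments of the single word "Hitch". A minimal sketch of gluing such fragments back together is shown below. It assumes that a token starting with `##` always continues the token printed immediately before it, which holds for this output but need not hold in general (for example, when the preceding fragment was not tagged as an entity). `merge_wordpieces` is a hypothetical helper, and the token list is copied from the output above.

```python
# (word, entity) pairs copied from the NER output above.
tokens = [
    ('Fernando', 'I-PER'), ('Mexico', 'I-LOC'), ('Learning', 'I-ORG'),
    ('Engineer', 'I-MISC'), ('Hit', 'I-ORG'), ('##ch', 'I-ORG'),
]


def merge_wordpieces(pairs):
    """Append '##'-prefixed fragments to the preceding word, keeping that word's tag."""
    merged = []
    for word, entity in pairs:
        if word.startswith('##') and merged:
            prev_word, prev_entity = merged[-1]
            merged[-1] = (prev_word + word[2:], prev_entity)
        else:
            merged.append((word, entity))
    return merged


entities = merge_wordpieces(tokens)
for word, entity in entities:
    print(word, entity)
```

Recent transformers releases also accept an `aggregation_strategy` argument on the `ner` pipeline that groups tokens inside the library; the sketch above does not rely on it and only post-processes the printed pairs.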


### Text summarization

In [9]:
txt = r'''
Machine learning is the study of computer algorithms that improve automatically through experience and
by the use of data. It is seen as a part of artificial intelligence. Machine learning is an important
component of the growing field of data science . Machine learning, deep learning, and neural networks
are all sub-fields of artificial intelligence . As big data continues to grow, the market demand for
data scientists will increase, requiring them to assist in the identification of the most relevant
business questions. Machine learning is a method of data analysis that automates analytical model building.
It is a branch of artificial intelligence based on the idea that systems can learn from data, identify
patterns and make decisions with minimal human intervention.
'''
nlp = pipeline(task='summarization')
nlp(txt, max_length=130, min_length=30)

[{'summary_text': ' Machine learning is the study of computer algorithms that improve automatically through experience and the use of data . Machine learning, deep learning, and neural networks are all sub-fields of artificial intelligence . As big data continues to grow, demand for data scientists will increase .'}]

### Machine translation (English → French)

In [10]:
txt = r'''
Machine learning is a branch of artificial intelligence (AI) and computer sciencewhich focuses on the use
of data and algorithms to imitate the way that humans learn,gradually improving its accuracy
'''
nlp = pipeline(task='translation_en_to_fr')
nlp(txt)[0]['translation_text']

'L’apprentissage automatique est une branche de l’intelligence artificielle (AI) et de la science informatique qui se concentre sur l’utilisation de données et d’algorithmes pour imiter la façon dont les humains apprennent, en améliorant progressivement leur précision.'