# TRANSFORMERS
- https://huggingface.co/transformers/v3.1.0/task_summary.html
- All pretrained models are downloaded to C:\Users\lveys\.cache\torch
- For the PyTorch version, use the torch conda environment. For the TensorFlow version, use the nlp conda environment.

In [1]:
import transformers
import os

In [2]:
transformers.__version__

'3.5.1'

In [3]:
transformers.__file__

'C:\\Users\\lveys\\anaconda3\\envs\\torch\\lib\\site-packages\\transformers\\__init__.py'

In [4]:
dir(transformers)

['ALBERT_PRETRAINED_CONFIG_ARCHIVE_MAP',
 'ALBERT_PRETRAINED_MODEL_ARCHIVE_LIST',
 'ALL_PRETRAINED_CONFIG_ARCHIVE_MAP',
 'Adafactor',
 'AdamW',
 'AdamWeightDecay',
 'AdaptiveEmbedding',
 'AddedToken',
 'AlbertConfig',
 'AlbertForMaskedLM',
 'AlbertForMultipleChoice',
 'AlbertForPreTraining',
 'AlbertForQuestionAnswering',
 'AlbertForSequenceClassification',
 'AlbertForTokenClassification',
 'AlbertModel',
 'AlbertPreTrainedModel',
 'AlbertTokenizer',
 'AlbertTokenizerFast',
 'AutoConfig',
 'AutoModel',
 'AutoModelForCausalLM',
 'AutoModelForMaskedLM',
 'AutoModelForMultipleChoice',
 'AutoModelForNextSentencePrediction',
 'AutoModelForPreTraining',
 'AutoModelForQuestionAnswering',
 'AutoModelForSeq2SeqLM',
 'AutoModelForSequenceClassification',
 'AutoModelForTokenClassification',
 'AutoModelWithLMHead',
 'AutoTokenizer',
 'BART_PRETRAINED_MODEL_ARCHIVE_LIST',
 'BERT_PRETRAINED_CONFIG_ARCHIVE_MAP',
 'BERT_PRETRAINED_MODEL_ARCHIVE_LIST',
 'BLENDERBOT_PRETRAINED_CONFIG_ARCHIVE_MAP',
 'BLE

In [5]:
from transformers import BertTokenizer, BertForSequenceClassification

In [6]:
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

In [7]:
sequence = "A Titan RTX has 24GB of VRAM"
tokenized_sequence = tokenizer.tokenize(sequence)
tokenized_sequence

['a', 'titan', 'rt', '##x', 'has', '24', '##gb', 'of', 'vr', '##am']

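“VRAM” wasn’t in the model vocabulary, so it has been split into “vr” and “##am” (the uncased tokenizer lowercases the input first). To indicate that a token is not a separate word but the continuation of the previous one, a double-hash prefix is added, as in “##am”. “RTX” and “24GB” are split the same way, into “rt” + “##x” and “24” + “##gb”.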
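The split in the output above is what a greedy longest-match-first search over the vocabulary produces. Below is a minimal sketch of that idea. The function name `wordpiece_tokenize` and the ten-entry `toy_vocab` are assumptions made for illustration, not the real bert-base-uncased vocabulary or the library's implementation.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of a single lowercased word."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry the prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk_token]  # no piece fits, so the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


# Toy vocabulary (an assumption, built from the tokens printed above).
toy_vocab = {"a", "titan", "rt", "##x", "has", "24", "##gb", "of", "vr", "##am"}

vram_pieces = wordpiece_tokenize("vram", toy_vocab)
rtx_pieces = wordpiece_tokenize("rtx", toy_vocab)
print(vram_pieces, rtx_pieces)
```

A word for which no vocabulary piece matches at some position falls back to the unknown token in this sketch.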

In [8]:
inputs = tokenizer(sequence)
encoded_sequence = inputs["input_ids"]
encoded_sequence

[101, 1037, 16537, 19387, 2595, 2038, 2484, 18259, 1997, 27830, 3286, 102]

Tokens can then be converted into IDs that the model understands. This can be done by feeding the sentence directly to the tokenizer. The tokenizer automatically adds “special tokens” (if the associated model relies on them), which are reserved IDs with a special meaning for the model. Here 101 and 102 are the [CLS] and [SEP] tokens, as the decoded output in the next cell shows.
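To make the token-to-ID step concrete, here is a small sketch that reproduces the IDs printed above with a plain dictionary lookup. The `toy_vocab` mapping is copied from the two outputs above and the helper `encode` is a hypothetical name; the real tokenizer does more (normalization, splitting, truncation).

```python
# Token-to-ID pairs read off the outputs of cells 7 and 8 (illustrative subset only).
toy_vocab = {
    "[CLS]": 101, "[SEP]": 102,
    "a": 1037, "titan": 16537, "rt": 19387, "##x": 2595, "has": 2038,
    "24": 2484, "##gb": 18259, "of": 1997, "vr": 27830, "##am": 3286,
}


def encode(tokens, vocab):
    """Map tokens to IDs and wrap them in [CLS] ... [SEP]."""
    return [vocab["[CLS]"]] + [vocab[t] for t in tokens] + [vocab["[SEP]"]]


tokens = ["a", "titan", "rt", "##x", "has", "24", "##gb", "of", "vr", "##am"]
input_ids = encode(tokens, toy_vocab)
print(input_ids)
```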

In [9]:
decoded_sequence = tokenizer.decode(encoded_sequence)
decoded_sequence

'[CLS] a titan rtx has 24gb of vram [SEP]'
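Decoding reverses both steps: IDs go back to tokens, and pieces carrying the “##” prefix are glued onto the previous token. The sketch below covers only the gluing part; `merge_wordpieces` is a hypothetical helper, and the library's `decode` also tidies spacing around punctuation.

```python
def merge_wordpieces(tokens):
    """Join tokens with spaces, attaching '##' continuation pieces to the previous word."""
    words = []
    for token in tokens:
        if token.startswith("##") and words:
            words[-1] += token[2:]  # drop the prefix and extend the previous word
        else:
            words.append(token)
    return " ".join(words)


tokens = ["[CLS]", "a", "titan", "rt", "##x", "has", "24", "##gb", "of", "vr", "##am", "[SEP]"]
decoded = merge_wordpieces(tokens)
print(decoded)
```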

In [10]:
sequence_a = "This is a short sequence."
sequence_b = "This is a rather long sequence. It is at least longer than the sequence A."
encoded_sequence_a = tokenizer(sequence_a)["input_ids"]
encoded_sequence_b = tokenizer(sequence_b)["input_ids"]
len(encoded_sequence_a), len(encoded_sequence_b)

(8, 19)

The encoded versions have different lengths. To batch them in a single tensor, either the first sequence must be padded up to the length of the second one, or the second one must be truncated down to the length of the first one.
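The truncation option can be sketched in plain Python as follows. `truncate_ids` is a hypothetical helper that keeps the closing [SEP] ID; in the library the same effect is requested with the tokenizer's `truncation=True` and `max_length` arguments.

```python
def truncate_ids(ids, max_length):
    """Cut an encoded sequence to max_length while keeping the final [SEP] ID."""
    if len(ids) <= max_length:
        return list(ids)
    return ids[: max_length - 1] + [ids[-1]]


# The 19-ID encoding of sequence_b, copied from the padding output in this notebook.
encoded_b = [101, 2023, 2003, 1037, 2738, 2146, 5537, 1012, 2009, 2003,
             2012, 2560, 2936, 2084, 1996, 5537, 1037, 1012, 102]

truncated_b = truncate_ids(encoded_b, max_length=8)
print(truncated_b)
```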

In [11]:
padded_sequences = tokenizer([sequence_a, sequence_b], padding=True)
padded_sequences

{'input_ids': [[101, 2023, 2003, 1037, 2460, 5537, 1012, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [101, 2023, 2003, 1037, 2738, 2146, 5537, 1012, 2009, 2003, 2012, 2560, 2936, 2084, 1996, 5537, 1037, 1012, 102]], 'token_type_ids': [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], 'attention_mask': [[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]}

In [12]:
print(padded_sequences['input_ids'])

[[101, 2023, 2003, 1037, 2460, 5537, 1012, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [101, 2023, 2003, 1037, 2738, 2146, 5537, 1012, 2009, 2003, 2012, 2560, 2936, 2084, 1996, 5537, 1037, 1012, 102]]


The attention mask is a binary tensor marking which positions hold real tokens and which hold padding, so that the model does not attend to the padding. For the BertTokenizer, **1 indicates a value that should be attended to, while 0 indicates a padded value.** The tokenizer returns this mask in its output dictionary under the key “attention_mask”, as shown in the output of cell 11 above.
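As a rough illustration of how padding and the attention mask relate, the sketch below pads a batch by hand and derives the mask from the original lengths. `pad_batch` is a hypothetical helper, and the pad ID of 0 is taken from the output above.

```python
def pad_batch(batch, pad_id=0):
    """Right-pad every sequence to the longest one and build the matching mask."""
    max_len = max(len(ids) for ids in batch)
    input_ids = [ids + [pad_id] * (max_len - len(ids)) for ids in batch]
    # 1 marks a real token, 0 marks a padded position.
    attention_mask = [[1] * len(ids) + [0] * (max_len - len(ids)) for ids in batch]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


seq_a = [101, 2023, 2003, 1037, 2460, 5537, 1012, 102]
seq_b = [101, 2023, 2003, 1037, 2738, 2146, 5537, 1012, 2009, 2003,
         2012, 2560, 2936, 2084, 1996, 5537, 1037, 1012, 102]

padded = pad_batch([seq_a, seq_b])
print(padded["attention_mask"][0])
```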

Some models handle sequence-pair tasks such as sequence classification or question answering. These require two different sequences to be joined in a single “input_ids” entry, which is usually done with the help of special tokens, such as the classifier ([CLS]) and separator ([SEP]) tokens. The tokenizer generates such an input automatically when the two sequences are passed as two separate arguments (and not as a list, like before).

In [13]:
sequence_a = "HuggingFace is based in NYC"
sequence_b = "Where is HuggingFace based?"
encoded_dict = tokenizer(sequence_a, sequence_b)
decoded = tokenizer.decode(encoded_dict["input_ids"])
decoded

'[CLS] huggingface is based in nyc [SEP] where is huggingface based? [SEP]'

In [14]:
encoded_dict

{'input_ids': [101, 17662, 12172, 2003, 2241, 1999, 16392, 102, 2073, 2003, 17662, 12172, 2241, 1029, 102], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}

BERT also uses token type IDs (also called segment IDs). They are a binary mask identifying the two sequences in the input. The tokenizer returns this mask as the “token_type_ids” entry.
- The first sequence, the “context” used for the question, has all its tokens represented by a 0, whereas the second sequence, corresponding to the “question”, has all its tokens represented by a 1.
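The layout of a sequence pair can be reproduced by hand, which makes it clear where the 0s end and the 1s begin. In this sketch `build_pair_inputs` is a hypothetical helper, and the IDs are copied from the `encoded_dict` output above.

```python
def build_pair_inputs(ids_a, ids_b, cls_id=101, sep_id=102):
    """Assemble [CLS] A [SEP] B [SEP] and the matching segment IDs."""
    input_ids = [cls_id] + ids_a + [sep_id] + ids_b + [sep_id]
    # [CLS], sequence A and the first [SEP] are segment 0; sequence B and the last [SEP] are segment 1.
    token_type_ids = [0] * (len(ids_a) + 2) + [1] * (len(ids_b) + 1)
    return {"input_ids": input_ids, "token_type_ids": token_type_ids}


context_ids = [17662, 12172, 2003, 2241, 1999, 16392]   # "huggingface is based in nyc"
question_ids = [2073, 2003, 17662, 12172, 2241, 1029]   # "where is huggingface based?"

pair = build_pair_inputs(context_ids, question_ids)
print(pair["token_type_ids"])
```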

# Perform a Task

- Auto-models are classes that instantiate a model according to a given checkpoint, automatically selecting the correct model architecture.
- In order for a model to perform well on a task, it must be loaded from a checkpoint corresponding to that task. These checkpoints are usually pre-trained on a large corpus of data and fine-tuned on a specific task.
- **In many cases, the architecture you want to use can be guessed from the name or the path of the pretrained model you are supplying to the from_pretrained method.**
- The AutoClasses do this job for you: given the name or path of the pretrained weights/config/vocabulary, they automatically retrieve the relevant model.
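The idea of guessing an architecture from a checkpoint name can be sketched with a simple lookup. This is a toy illustration only: `ARCHITECTURE_BY_FRAGMENT` and `guess_architecture` are hypothetical names, the table is a tiny assumed subset, and the library's own resolution may also rely on information stored in the checkpoint's config file.

```python
# Hypothetical fragment table. More specific fragments come first,
# because "distilbert", "roberta" and "albert" all contain "bert".
ARCHITECTURE_BY_FRAGMENT = [
    ("distilbert", "DistilBertModel"),
    ("roberta", "RobertaModel"),
    ("albert", "AlbertModel"),
    ("bert", "BertModel"),
]


def guess_architecture(checkpoint_name):
    """Return the first architecture whose name fragment appears in the checkpoint name."""
    name = checkpoint_name.lower()
    for fragment, architecture in ARCHITECTURE_BY_FRAGMENT:
        if fragment in name:
            return architecture
    raise ValueError(f"Cannot guess an architecture from {checkpoint_name!r}")


guessed = guess_architecture("bert-base-cased-finetuned-mrpc")
print(guessed)
```

The ordering of the table matters in this sketch, since a plain substring test would otherwise match “bert” inside the longer names.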

# Sequence Classification
- Sequence classification is the task of classifying sequences according to a given number of classes. 

### Sentiment Analysis

In [15]:
from transformers import pipeline
nlp = pipeline("sentiment-analysis")
result = nlp("I hate you")[0]
print(f"label: {result['label']}, with score: {round(result['score'], 4)}")
result = nlp("I love you")[0]
print(f"label: {result['label']}, with score: {round(result['score'], 4)}")

label: NEGATIVE, with score: 0.9991
label: POSITIVE, with score: 0.9999


### FinBERT for financial sentiment analysis
- https://github.com/ProsusAI/finBERT
- https://github.com/neoyipeng2018/sgx-sent/blob/main/FinancialNLPstrat.ipynb
- https://medium.com/prosus-ai-tech-blog/finbert-financial-sentiment-analysis-with-bert-b277a3607101
    
- Create a local folder (here finbertProsus/sentiment) and put the config.json file and the FinBERT model weights (pytorch_model.bin) in it.

In [27]:
import torch
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('finbertProsus/sentiment/pytorch_model.bin',config='finbertProsus/sentiment/config.json',num_labels=3)

In [28]:
label_list=['positive','negative','neutral']

In [29]:
inputs = tokenizer("We did badly this quarter", return_tensors="pt")
outputs = model(**inputs)
label_list[torch.argmax(outputs[0])]

'negative'

In [30]:
inputs = tokenizer("We did well this quarter", return_tensors="pt")
outputs = model(**inputs)
label_list[torch.argmax(outputs[0])]

'positive'

In [31]:
inputs = tokenizer("We did meh this quarter", return_tensors="pt")
outputs = model(**inputs)
label_list[torch.argmax(outputs[0])]

'neutral'
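The cells above take the argmax of the raw logits, which gives the label but not a confidence. A softmax turns the logits into probabilities over the three classes. The sketch below uses plain Python and made-up logits; `example_logits` is an assumption for illustration, not an output of the FinBERT model.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


label_list = ["positive", "negative", "neutral"]
example_logits = [-1.2, 2.3, 0.1]  # made-up values, one logit per label

probs = softmax(example_logits)
best = max(range(len(probs)), key=probs.__getitem__)
print(label_list[best], [round(p, 4) for p in probs])
```

Because softmax preserves the ordering of the logits, the predicted label is the same as with argmax on the raw scores.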

### Paraphrase detection: paraphrase or not?
- The Microsoft Research Paraphrase Corpus (MRPC) is a paraphrase identification dataset, where systems aim to identify whether two sentences are paraphrases of each other.

In [69]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased-finetuned-mrpc")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mrpc")
classes = ["not paraphrase", "is paraphrase"]
sequence_0 = "The company HuggingFace is based in New York City"
sequence_1 = "Apples are especially bad for your health"
sequence_2 = "HuggingFace's headquarters are situated in Manhattan"
paraphrase = tokenizer(sequence_0, sequence_2, return_tensors="pt")
not_paraphrase = tokenizer(sequence_0, sequence_1, return_tensors="pt")
# Inference only: skip gradient tracking. The first [0] selects the logits, the second the single batch item.
with torch.no_grad():
    paraphrase_classification_logits = model(**paraphrase)[0][0]
    not_paraphrase_classification_logits = model(**not_paraphrase)[0][0]
paraphrase_results = torch.softmax(paraphrase_classification_logits, dim=0).tolist()
not_paraphrase_results = torch.softmax(not_paraphrase_classification_logits, dim=0).tolist()
# Should be paraphrase
print(sequence_0, '<---->', sequence_2)
for i in range(len(classes)):
    print(f"{classes[i]}: {int(round(paraphrase_results[i] * 100))}%")

The company HuggingFace is based in New York City <----> HuggingFace's headquarters are situated in Manhattan
not paraphrase: 10%
is paraphrase: 90%


In [70]:
# Should not be paraphrase
print(sequence_0, '<---->', sequence_1)
for i in range(len(classes)):
    print(f"{classes[i]}: {int(round(not_paraphrase_results[i] * 100))}%")

The company HuggingFace is based in New York City <----> Apples are especially bad for your health
not paraphrase: 94%
is paraphrase: 6%


#### Tensorflow version

In [22]:
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
import tensorflow as tf
# Instantiate a tokenizer and a model from the checkpoint name.
# The model is identified as a BERT model and loads it with the weights stored in the checkpoint.
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased-finetuned-mrpc")
model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mrpc")

Some weights of the model checkpoint at bert-base-cased-finetuned-mrpc were not used when initializing TFBertForSequenceClassification: ['dropout_183']
- This IS expected if you are initializing TFBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing TFBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased-finetuned-mrpc and are newly initialized: ['dropout_37']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [25]:
classes = ["not paraphrase", "is paraphrase"]
sequence_0 = "The company HuggingFace is based in New York City"
sequence_1 = "Apples are especially bad for your health"
sequence_2 = "HuggingFace's headquarters are situated in Manhattan"
# Build a sequence from the two sentences, with the correct model-specific separators, token type ids and attention masks
paraphrase = tokenizer(sequence_0, sequence_2, return_tensors="tf")
not_paraphrase = tokenizer(sequence_0, sequence_1, return_tensors="tf")
# Pass this sequence through the model so that it is classified in one of the two available classes: 0 (not a paraphrase) and 1 (is a paraphrase)
paraphrase_classification_logits = model(paraphrase)[0]
not_paraphrase_classification_logits = model(not_paraphrase)[0]
# Compute the softmax of the logits to get probabilities over the classes.
paraphrase_results = tf.nn.softmax(paraphrase_classification_logits, axis=1).numpy()[0]
not_paraphrase_results = tf.nn.softmax(not_paraphrase_classification_logits, axis=1).numpy()[0]
# Should be paraphrase
for i in range(len(classes)):
    print(f"{classes[i]}: {int(round(paraphrase_results[i] * 100))}%")

not paraphrase: 10%
is paraphrase: 90%


In [26]:
# Should not be paraphrase
for i in range(len(classes)):
    print(f"{classes[i]}: {int(round(not_paraphrase_results[i] * 100))}%")

not paraphrase: 94%
is paraphrase: 6%


### Extractive Question Answering
- Extractive Question Answering is the task of extracting an answer from a text given a question.
- An example of a question answering dataset is the Stanford Question Answering Dataset (SQuAD), a collection of 100k crowdsourced QA pairs.
- The question-answering pipeline returns an answer extracted from the text and a confidence score, alongside “start” and “end” values, which are the positions of the extracted answer in the text.

In [71]:
from transformers import pipeline
nlp = pipeline("question-answering")
context = r"""
Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
a model on a SQuAD task, you may leverage the examples/question-answering/run_squad.py script.
"""

In [72]:
result = nlp(question="What is extractive question answering?", context=context)
print(f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}")

Answer: 'the task of extracting an answer from a text given a question.', score: 0.6226, start: 34, end: 96
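The “start” and “end” values in this output are character offsets into the context string, so slicing the context with them gives back the answer. The sketch below checks this without running the pipeline; the `result` dictionary is copied from the output above (score included as printed).

```python
context = r"""
Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
a model on a SQuAD task, you may leverage the examples/question-answering/run_squad.py script.
"""

# Values copied from the pipeline output above.
result = {
    "answer": "the task of extracting an answer from a text given a question.",
    "score": 0.6226,
    "start": 34,
    "end": 96,
}

# The end offset is exclusive, as in ordinary Python slicing.
span = context[result["start"]:result["end"]]
print(span == result["answer"])
```

The offsets count the newline that opens the triple-quoted string, which is why the answer starts at 34 rather than 33.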


In [73]:
result = nlp(question="What is a good example of a question answering dataset?", context=context)
print(f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}")

Answer: 'SQuAD dataset,', score: 0.5053, start: 147, end: 161


In [74]:
with open('./data/(TEVA)Q3.2019.txt') as f:
    text = f.read()
    
context = r"""it be a pleasure to review the third quarter highlight . it revenue come in at bit more than $ 4 billion , very much in line with the last three quarter as it have be discuss before . it be see now a nice stable development of it revenue . it GAAP dilute loss per share be $ 0.29 in the third quarter . this be primarily affect by the accrual for the opioid litigation . on a non - gaap basis , it diluted earning per share be $ 0.58 . the primary change there be a change to the tax - estimate tax for the full - year and that reduce the EPS by around $ 0.04 . the non - gaap EBITDA be around a bit more than $ 1 billion . it be very stable , again , in the last three quarter . so really the take home message here be , it be see the operational stabilization it have be talk about . it can see that the run rate on the operating profit be stable and it also see a nice cash flow of some $ 550 million in the third quarter . so all in all it be very happy about the financial result . commercially , it will touch upon a few topic . it will touch upon north american generic . the nice growth it be see on AUSTEDO ; TRUXIMA , which be a new launch it be have in biosimilar ; and then of course also on the restructure program ; and then positive development of it net debt . but let it go first to the restructuring and take a status on that . so if it can take a look here at the actual spend base in 2017 , it be $ 16.3 billion . as some of it may recall , when it announce the restructure nearly two year ago , it promise it would bring this down by $ 3 billion to an absolute number of $ 13.3 billion in 2019 . it be perfectly on track to do that . it can see here that the MAT right now be $ 13.4 billion . but of course as it swap the fourth quarter last year with the fourth quarter this year and when it complete the year it can see from all indication that it will hit the $ 3 billion cost reduction . 
this have of course come through thousand of action and initiative around the world and it have see the number of FTEs go down by more than 11,500 . and it be also in the continued process of restructure it manufacturing network . and right now it have , in the period , close down or sell 11 site and it have five more site where it have announce that it be in the process of be close or divest by the end of 2019 . so it be approach just above 60 manufacturing site and that be of course a very complex and ongoing process , but that be the background also for the reduction in the spend level , which it be very , of course , satisfied with . the long - term target remain a reduction of it net debt - to - ebitda below three . there be no change there . and it continue to allocate , by far , most of it cash flow to the reduction in debt and it be happy to show it here that in the same restructuring period , it have so far be able to reduce the debt by $ 8.3 billion . if it look at the global generic sale , then it can also see a stabilization here . of course these sale will always swing little bit quarter - by - quarter , depend on the actual launch that it be see . in fact it have see a very high number of launch this year . it think year - to - date in the U.S. alone it have above 40 launch . so it see a very healthy business , basically drive by the fact that it have more generic project in the pipeline than anyone else and that naturally lead to a high level of launch activity both in the U.S. and in Europe and in the international market . so it be very satisfied with that stabilization and it go hand - in - hand with an overall stabilization of the pricing environment , both in North America and in Europe on generic . if it look at AUSTEDO , then the successful penetration of the market continue both for Huntington 's dyskinesia and for tardive dyskinesia . 
and if it look at the revenue , it be a little up and down per quarter , but that be more due to random element of shipment and so on . and it continue to see a strong growth and it expect the product to keep grow . it have tell it before that in tardive dyskinesia it have an estimate patient population , potentially of some 500,000 Americans suffer from tardive dyskinesia . and it have one competitor , but between U.S. and that one competitor , it have still only a very low level of patient receive treatment in the U.S. so it be quite convinced that this product can keep on grow for the foreseeable future . if it move to AJOVY , then AJOVY be off to a very good start . it see increase revenue . it have a normalized TRx share right now of around 19 % . it have see a weakening of the new - to - brand share . it contribute this to the lack of U.S. have an auto - injector and as the class be penetrate more and more it see patient decide to go for product that have an auto - injector . it be expect a positive clarification with FDA on the approval of it Auto - injector for the U.S. in the coming month and it have just a received positive opinion from CHMP in Europe . so it will be launch the Auto - injector in Europe also in the coming month . on COPAXONE , it be happy to share with it the sale number for the third quarter . it see a very stable development both in North America as well as in Europe . so this be of course very positive . it continue to see a slow erosion in the TRx count in North America , and it be optimistic that it will maintain a significant business in COPAXONE , both in North America and in Europe . one announcement it make today be the anticipated launch of TRUXIMA , the first approve rituximab biosimilar in the U.S. Will be launch on November 11 and this will be with the full oncology label . this be very exciting , because as it know , part of it strategy be leadership in biopharmaceutical , include biologic such as biosimilar . 
and so far it have see biosimilar penetrate less in the United States than it have be penetrate in Europe . it believe that there be several reason for that and one of the reason be that in order to penetrate it need of course competitive pricing , but it also need dedicate patient support and service and it also need a good commercial footprint in the area where it be penetrate . and due to it long experience and strong position in oncology , it believe that it know how to penetrate this market to the benefit of both patient and payor in the oncology space in the United States . so this be go to be very exciting and it will be share with it in three month how it actually end up perform . it be sure that one thing that be on everybody 's mind be the opioid litigation situation . it be happy to settle the Track 1 , but it be even more happy to see an agreement in principle with a group of Attorney Generals . it believe that the agreement in principle be the good way forward the patient or the people in the United States suffer from addiction . it believe it commitment to supply Suboxone generic for the next 10-years to all the people suffer from addiction who can use this product to get out of it addiction and it can be an element in that whole process . that be the good way forward . it hope that this framework will materialize and that it materialize together with all defendant , it will be able to help alleviate some of the burden from the misuse of opioid in the United States . if it look to the future focus and the present focus , then of course , it remain focused on maximize the profit from it exist core business . it remain focused on increase the sale of it new brand such as AUSTEDO and AJOVY and it should add that it be work on the launch . it be launch AUSTEDO and AJOVY in more country as it speak and also in the coming period . it be execute on it biopharmaceutical r&d strategy and it will be share more of that with it in February . 
and as well in February , it will share with it it manufacturing strategy , which will of course be focus on deliver efficiency and optimization . and all of this it do to secure strong free cash flow and of course secure the debt repayment . and before it turn over to Mike , it would like to add a few extra element . one be a warm  to Mike for the great collaboration it have have with it over the last two year and for everything it have do for Teva . as it know , Mike be leave the company for personal reason and it commit to stay on until today and it be very grateful for that . it have announce today also that it have appoint a new CFO Eli Kalif , who have a strong background in finance and manufacturing as well as other relevant element for it . it will be start on the 27th of December and until then , it will be it interim cfo . so with that it will hand it over to Mike . Michael McClellan  it , Kare , and  . as always , it start with the review of the GAAP performance on Slide 15 . Teva post a quarterly GAAP loss of $ 314 million and a loss per share on a GAAP basis of $ 0.29 for the third quarter of 2019 . as it will detail in the next slide , the GAAP result be impact mainly by an update to it legal provision associate with the ongoing opioid litigation . so turn to Slide 16 , in the third quarter of 2019 , non - gaap adjustment amount to $ 951 million impact on net income . the adjustment come primarily from three item ; $ 460 million provision for legal settlement generally relate to the opioid litigation ; amortization charge of $ 255 million , which be a normal quarterly run rate for it ; and $ 204 million impairment to intangible asset . it would like to take a minute and give it some insight into how it calculate the legal settlement provision as it relate to ongoing opioid litigation . 
as it recall that in Q2 , after consider the $ 85 million settlement it have with Oklahoma and it unique characteristic , it further evaluate the potential settlement scenario and outcome for the purpose of determine the size of the provision , it would take . and in accordance with accounting requirement as no single scenario be consider to be most probable at that time , it record the minimum of these estimate which in Q2 be approximately $ 500 million . since then , it have have two additional datum point , which be ; a , it track 1 settlement with the two county in Ohio ; and b , the not yet finalize agreement in principle on a nationwide settlement framework announce on October 21st . these datum point and other factor be take into consideration it ongoing evaluation of potential settlement scenario and the outcome which result in increase the provision by about $ 450 million to it current total of approximately $ 1 billion . as in accordance with the accounting requirement , when no scenario be consider most probable , it be require to record the minimum of these range of estimate . in addition , this quarter , it take an impairment of $ 204 million , which bring the year - to - date impairment number to approximately $ 1.2 billion on intangible asset . these be mostly comprise of intangible asset and product right as well as IP R&D asset relate to the Actavis Generics acquisition . now turn to it non - gaap performance on Slide 17 ; quarterly revenue be $ 4.3 billion , a decrease of $ 265 million or 6 % compare to the third quarter of 2018 . the decrease be mainly due to generic competition to COPAXONE , a decline in revenue from TREANDA and BENDEKA and low sale in Russia and Japan . this be partially offset by high revenue from the progress of it launch of AUSTEDO and AJOVY in the U.S. , a recovery in QVAR , and strong trend in it Anda business in the US . gross margin be 49.3 % compare to 49.9 % for the same period in 2018 . 
the change in gross margin be drive by the decline in COPAXONE and bendamustine revenue in the U.S. , which be partially offset by improve profitability of it north american generic business and grow sale of AUSTEDO and AJOVY . operate income in the quarter decline by 5 % compare to the same period of 2018 . the decrease be mainly attributable to the decline of COPAXONE and other specialty brand . these decline be partially offset by cost reduction in Europe as well as increase sale of AUSTEDO in the U.S. non - gaap earning per share in the quarter be $ 0.58 , $ 0.10 low than the same period of last year . the decrease be mainly due to operating profit and high tax expense , partially offset by low financial expense . it would like to take a minute , though , to describe what it be see in the development of it expect tax rate for 2019 . at the start of the year , it guide for an expect tax rate of approximately 16 % . it now expect it annual tax rate for 2019 to be close to 18 % . the increase which be mainly drive by U.S. loss , which do not have a tax benefit , interest expense disallowance come from the further development of the U.S. tax reform in it account and other change to tax position . the change in the tax rate in Q3 plus the catch - up for the first two quarter of 2019 reduce it Q3 EPS by approximately $ 0.04 , as Kare mention earlier . turn to slide 18 , it have be highlight for several quarter now , include in it 2019 guidance provide in February , the impact of the strong U.S. dollar on it result since approximately 50 % of it revenue come from sale denominate in non - U.S. dollar currency . it see that the exchange rate movement during the third quarter of ' 19 have a negative impact of $ 55 million on revenue , while the impact on operating profit be small at $ 22 million . the main currency relevant to it operation that decrease the most in value against the U.S. dollar where the euro at 4 % and the pound at about 5 % . it expect that the U.S. 
dollar will remain strong for the remainder of the year . turn to slide 19 , free cash flow for the quarter come in at $ 551 million , an increase of $ 383 million versus the second quarter of 2019 . the significant increase in free cash flow in the year be mainly attributable to the expect improvement in work capital that it previously guide to . it would remind it that the work capital be a drag on cash in the amount of $ 365 million and $ 345 million in the first and second quarter respectively . however , work capital be basically neutral in the third quarter of 2019 . so turn to slide 20 , it end the third quarter with a net debt of $ 25.7 billion and a net debt - to - ebitda ratio of 5.62 . it be especially pleased to see a reversal of the upward trend of the ratio from the previous four quarter , as this be the first time since the actavis acquisition that it have see a decline in this ratio . in the course of Q3 2019 , it do borrow $ 500 million under it revolving credit facility and it subsequently repay $ 400 million of such borrowing . during the month of October , it repay the remain $ 100 million , and as of today , it have no outstanding draw on it revolving credit facility . so turn to the financial outlook for 2019 . today , it be revise it five main financial target base on the performance of the first nine month and what it be see for the fourth quarter . as it can see on the update outlook , it have basically bring up the bottom end of the range of all of it parameter . where it end up in these range will be determine mainly by the penetration of the TRUXIMA launch in the U.S. , COPAXONE trend , foreign exchange effect , and it product mix in it generic business for the rest of the year . so lastly , on a personal note , as it know , today mark it final earning call as Teva 's CFO . it would like to say that it be be a real honor and privilege to serve in this position in the last two year . 
it be especially proud of the work that it talented and dedicated group of employee in this great company have accomplish . and it believe the company will only grow strong in the future . it wish Eli Kalif great success as it take over this important role , and to the member of the investment community , it have always appreciate it thoughtful question and helpful feedback and it investor relation goal have always be and will continue to be to communicate with the investor clearly and concisely as it possibly can .  it . and now it will open the floor up for question and answer . question - and - answer session  it . [ Operator Instructions ] it first question come from the line of Elliot Wilbur . please ask it question .  and  , and good wish to it Mike , and  for all it help over the past couple of year . question specifically with respect to gross margin performance in the quarter , a little bit light than expect and it know there have be quite a bit of focus on that this morning , sort of give how important that is , it ultimately retain it operate margin target . it guess drill down through the number a little bit , it look like everything in term of segment be essentially flat sequentially with the exception of the international business , that be down about 200 basis point . so it do not know if that be a function of exchange or just plant utilization , but maybe it could just drill down on those dynamic a little bit more and sort of talk about what account for the relative softness there . and then as a follow - up for it guess , Kare , and Brendan . just maybe some thought on kind of overall cgrp market dynamic in the U.S. it guess the positive be it still see 7,000 kind of new - to - brand rx every week , but that be basically be flat for eight month . so just spot on maybe sort of overall market growth trend oppose to just Teva 's relative share of the market .  .  it very much for those question . 
it think Mike will take the first one and then it will give it a go at the second one and then Brendan will add to that . so , Mike it go first . Michael McClellan yes . so it do have a sequential dip . if it look q2 to Q3 in the gross margin percentage , but it also have that last year . so it be a little bit of a normal pattern in the year . it do expect that the full - year and the Q4 will get back toward the 50 % . if it actually look at what drive that in the Q3 a couple of thing . it be right , the international market it see a little bit of a low gross profit percentage there , mainly relate to Japan . it have see some product mix there , that be a little bit light in the quarter . it also have a little bit high write - off in the quarter versus what it have in Q2 . that tend to happen in the summer month as it have some plant shutdown and it reevaluate the write - off of product . but it still feel good that it be on track for roughly 50 % for the year and it should see around that level in Q4 . and on the overall cgrp dynamic in the U.S. , it will just say that from an overall perspective , it be still very optimistic about this segment . it see very good reception in the marketplace in term of efficacy . it see a constant good flow in , as it mention it , or NBRx , which mean that the market continue to accumulate . so it be still very optimistic on this segment and also internationally . it be only just start to launch in Europe and in the rest of the world , but it believe this will be a strong worldwide segment . Brendan do it have any further comment ? Brendan oâ€™grady yes , it would just comment Kare that it think if it look at the segment as Kare mention , it be a very effective class of medication . and it think it see a lot of early pen - up demand . and it have say since the beginning , 2019 be go to be a bit of a roller coaster . 
it do think that the market level out a little bit and continue to grow and it think that it will play a significant part in that growth .  it , Brendan . it next question come from the line of Ken Cacciatore . please go ahead , it line be open .  so much . Kare it be do a great job try to resolve this litigation . but can it just give U.S. a sense of deal with all Attorney Generals versus some ? be it make any progress with those that be not part of this early agreement that it have . so be there any progress be make as it try to bring the rest of it under the tent . and then , also in term of the upcoming debt that it have due in the next couple of year , can it just talk about how this litigation may be impact it ability to refinance or work on those debt obligation ?  it .  it very much for that question . it be sure that it be on everybody 's mind , the litigation on opioid and what it would say on that be that the frame work have of course be develop together with the four ag that have be sort of be partie to this and it be start base on the Track 1 case that be come up , as it know , in Cleveland . and the way it be develop be basically that there be a framework that everybody agree to and that will serve it could say the american public and will serve the people suffer from addiction very well . it have make a 10-year commitment to supply a key component in the treatment pattern for people suffer from addiction . some of the [ indiscernible ] have make commitment to provide significant financial resource also over a long period of time . and it feel , this will be a very , very good way to move on , because at the end of the day what matter be really if it alleviate some of the burden from the people who have problem with addiction . now that be say , of course , it realize that this will only work if everybody come together . it very much hope that everybody will come together . that be the whole idea behind it . that be what the ag have be signal to it . 
it be a process that be ongoing . it be sure it be an interesting and dynamic process . but it have high hope that it will succeed in the end to the good of the american public , but also to the good of everybody involve . now , when it come to the ongoing it could say challenge it have that it need to secure refinancing in order to serve it debt , and it do not think it be a major issue . it think what matter here be that there be a willingness to look at a long period here to look at like a 10-year period in order to resolve the issue . and that of course mean that on a short - term basis give the fact that as it just discuss , it have a high debt and it have a high net debt - to - ebitda ratio , then the fact that it be look at a long - term solution be a positive for it ability to , on an ongoing basis , refinance it debt . and maybe , Mike , it have a comment . Michael McClellan yes , it regularly assess the market condition as it relate to refinance it debt . and at this point , it can not comment on specific refinance plan , but it think it have mention in the past that it would like to get out in front of the 21 maturity sometime late in the first half of next year . so it will continue to monitor the market condition and look at refinance when it make sense . it have be encourage by the recent move both in the broad market interest rate as well as in it own secondary rate . so there be some thing that it will look at and assess the market as the time come .  it . it will now take it next question from the line of Esther Rajavelu of Oppenheimer . please go ahead . Esther Rajavelu  . a couple of quick one , on AUSTEDO , can it update U.S. on the Tourette syndrome readout ? yes , sure . so , it expect to have the final result of the Tourettes in the first half of next year and of course as soon as it have the final result , it will communicate it to the market . 
at this point in time it do not really have any further information , but of course it very much hope for a positive outcome to the benefit of patient suffer from tourette . Esther Rajavelu got it . and then it mention in the press release on invest in some early stage r&d project . Can it help U.S. understand what it be and when it may be able to see some of the news flow on those ? yes . it have a strategy where it be pursue r&d in biopharmaceutical . now that be innovative biopharmaceutical and it be also biologic such as biosimilar and it will be communicate more in depth on the r&d strategy and the portfolio in February in connection with it full - year announcement . right now all it can say be that it have approximately 25 biopharmaceutical project and it be a very it think exciting portfolio that fit with it commercial footprint as well . but it do not have any further comment today . Esther Rajavelu okay . and then lastly , any update on the price - fix litigation ? there be no real update there . it be in ongoing dialog with the Department of Justice . it have of course share more than a million document with it . it have not find any evidence that it be in any way part of any structured collusion or price - fixing , but it remain of course in dialog with the Department of Justice . Esther Rajavelu  it . the next question come from the line of Gregg Gilbert from SunTrust . please ask it question .  it . Kare , it know it plan to update in February on this , but it do replace it Head of Global Ops a few week ago . so it be hope it could provide a little more color on that and update U.S. on it progress to streamline it global operation and reduce cost of good . it seem like cost of good be the next frontier in term of cost reduction at Teva , give the low hanging fruit , it probably already pick in the other line . it second question be for Brendan . when do it expect approval for generic version of Forteo and Nuvaring , and can it update U.S. 
on expected launch activity in general in the coming month .  . so ,  for that question . it be absolutely right . of course it be a key topic for U.S. to secure and in long - term improve it operate margin and it gross margin . and it could say that the recent change it have in the Head of it global manufacturing . in that change , it replace a very experienced and very , very competent person with not a very experienced , very , very competent person . and it also in it choice of new CFO , have secure a person with a very long and in - depth experience in global complex manufacturing and margin improvement . so it be convinced that the management team will be able to inform it about it manufacturing strategy and the positive effect it will have on it gross margin long - term and it will be do so with more color , more detail in February . but it be absolutely right , it be one of it key priority for the very simple reason , it gross margin be around 50 % and that mean basically every dollar it sell , it spend $ 0.50 by far the big cost element in it P&L on the manufacturing . so that be of course a key focus area for U.S. go forward . but more detail on it in February . and then , on to it Brendan . Brendan oâ€™grady yes . so , as Kare mention , it have launch 40 generic product year - to - date . it have another five to eight that it will complete by the end of the year . nuvaring , Forteo and Restasis ; none of those be in the 2019 plan . it likely will not launch Forteo before the second half of 2020 . nuvaring have be move out to 2020 and Restasis could be any day , it just do not really know kind of on that one . so whenever the FDA approve it , it be operationally ready to go .  for the color .  it . it will now take it next question from David Risinger from Morgan Stanley . please go ahead , it line be open .  very much . two question please . 
first , with respect to the propose opioid settlement , could it just explain how Teva account for the $ 23 billion in free Suboxone over 10-years from a financial exposure standpoint in reserve . so how it book that into reserve or do not book that into reserve . and then , with respect to AJOVY , could it talk about potential formulary change in 2020 ? any opportunity to improve it position that it should know about ?  it .  it very much . it be go to take the first one and then Mike will probably add something to it and then Brendan will address the AJOVY formulary question . so if it go back to the half year announcement and at that point in time as it know , it have have the first settlement in Oklahoma . and the way the accounting rule be that if it have a settlement , but it do not know really what the end result will be for the whole issue , it have the partial settlement . then what it be suppose to do be it be suppose to assess what the likely outcome be . and if there be not one outcome , which be the most likely then it will pick the low end of the range of outcome that be sort of within the likely scenario . so since then , of course , it have have two thing happen . it have be settle the Track 1 in Cleveland . and then it have the framework , which still have not be finalize , mean that it do not have it sort of in a final form , and it do not know exactly how it be go to play out . so it have take all these thing into account , include the commitment to - under the framework , which it hope very much will come to fruition under the framework to commit to at WACC pricing deliver $ 23 billion of Suboxone . and that , all those element have be take into account and that have then lead to a range of possible outcome , and it have then make an accrual , which match the low end of the range . but , Mike , it be sure it do not get it all right , but if it have some further comment please . Michael McClellan no , it think it have get the substance , right . 
let it just get some of the mechanic to expand on it . so , it expectation be that it will book a reserve for the future cost of this settlement , whether it would be the cash cost or the cost of good . some of those element will be discount use an appropriate discount rate back to a present value . over time what it will see be , as inventory be produce and release , it will be take against that reserve as well as cash settlement as part of future cash outflow . and over the year , the reserve will then of course be evaluate on an ongoing basis for change in cost of good , change in interest rate , or any other thing that may change that liability . but it expectation be to eventually , once there be a final settlement , with everyone that it will see a much more clear number . as Kare mention , it have get a range of estimate at this point . and as nothing be more probable than any other point on the range at this point , it have book the minimum of what it expect . Brendan oâ€™grady so as far AJOVY formulary access for 2020 , it do not expect any major negative change to it formulary position go into 2020 . it have currently about 70 % of what it would call acceptable access or acceptable coverage and it hope to continue to improve that , especially as it go into 2020 . it have one major blue Plan come on that it know of in January 1st , so that will help . and although there be not a lot of volume in Medicaid and Medicare Part D , it be look at those segment as well in improve it coverage there also .  it . the next question come from the line of Ronny Gal from Bernstein . please ask it question . yes .  everybody . congratulation on the nice quarter . and Michael , it will miss it . it always enjoy work with it . if it do not mind , it have really get three , but all the same topic , which be roughly price . it be wonder about TRUXIMA , if it can let U.S. know what pricing come with , be it same WACC as the be innovator same as the current ASP ? 
and now that it be launch this product commercially , it be wonder if it can share with U.S. a bit more about the margin that it will be make on sale from it partnership with Celltrion . then just follow up with a couple of question that just come in , on the opioid settlement , can it let U.S. know if - or share with U.S. roughly what will be a drag on cash flow if the agreement with the four ag will actually end up be the agreement that pass the entire country as be , just so it can kind of model the probability . and the question of the pricing for 2019 , AUSTEDO and AJOVY now that it have the contract for 2019 , can it give U.S. a feel for the pricing trend . be it go up moderately or be it more of a flat or step down give the contracting situation . okay .  it , Ronny , for those question . the TRUXIMA question and the AUSTEDO question , it will leave for Brendan . but it will just handle the opioid first . and with regard to the opioid framework , it be really too early to give it a firm answer to this on the cash flow . and that be simply because it have not really - it could say , get to the fine print on it and there be a lot of detail about how will the ramp up on volume be and how will it actually be execute . so it be too early for U.S. to give it a number for the actual cash flow . assume that the framework result in a firm agreement , which it very much hope . as it say , to benefit of the american people and people suffer from addiction . it will of course update it as soon as that have happen with a more precise number both on the accrual and on the effect on cash flow . and then , Brendan over to it . Brendan oâ€™grady sure . so Ronny , the press release on TRUXIMA have the WACC price list in there , so it be $ 845.55 for the 100 milligram vial . it be $ 4,227.75 for the 500 milligram vial , which represent it believe about a 10 % below the reference brand WACC . of course , it will likely sell it for something less than that . 
it will not get into the exact specific of how it be go to do that and the channel it be go to do that and so forth . but it think it be kind of aware of how this will go . as far as the margin , this be a profit split between U.S. and Celltrion . celltrion be the developer . it submit the BLA to the FDA and it deal on it profit split acknowledge the partnership that it have with Celltrion . and so that be all it will comment about that . in regard to AUSTEDO and AJOVY . it think as it convert patient off of coupon card , it have more patient in pay prescription and it continue to improve it formulary access , it will see the margin on AJOVY improve and it do not see any real significant change to the margin on AUSTEDO as it head into 2020 . and maybe just to add on AUSTEDO , if it take price adjustment on AUSTEDO , it will be modest .  it . it next question come from the line of Ami Fadia of SVB Leerink . please ask it question .  . this be Eason Lee on for Ami .  it for take it question . just a couple of sort of on the customer . it know it have see some different biosimilar , Neulasta or Remicade launch in the U.S. priorly and it have get off to different ramp and it acknowledge that be sort of it - it know , different patient population and duration . so just give these dynamic , how be it sort of think about the launch curve of biosimilar Rituxan versus some of these ? and then just on some of the other biosimilar in it pipeline , Herceptin . when be this expect to hit the market ? and then maybe a broad question long term , what be it appetite to sort of bring on additional biosimilar into the pipeline ?  it . so it think it will address the last one , the broad question and then it will leave two specific one for Brendan . so long - term , it actually have an appetite for bring specific biosimilar to the marketplace . 
it realize that it be a unique situation product - by - product and it firmly believe that in order to be successful with the biosimilar in the U.S. marketplace , it need to have the - let it say commercial footprint and commercial insight in order to penetrate the market . in this case , as it say before with TRUXIMA , it believe that due to it long experience in the oncology space in the U.S. with several product in the marketplace and long - stand relationship with all the different part of the commercial value chain that it have a very good chance of do so . and it also believe that in the future , it will be able to do the same with many different biosimilar . so that be a part of it - it would say biopharmaceutical r&d strategy that it both work on innovative new biologic , but also on biosimilar . but on the specific , over to it Brendan . Brendan oâ€™grady yes . as it can imagine , there be be a lot of interest in the biosimilar launch of Rituxan . so there be be a lot of interest in TRUXIMA and of course Pfizer will follow on after it . so there would be two in the market here in the not too distant future . it think it still remain to be determine how pricing shake out , and what the uptake be on share . but it be fairly optimistic that it have the right mix to take advantage of it . if it think about who Teva be , as an organization , it have an oncology business that it be very familiar with and it have a product - it have a product in that portfolio , GRANIX that very much act like a biosimilar . and then it have of course the generic business . and the biosimilar be somewhere between a brand and generic . so it think it have the right commercial structure and the right strategy to fully take advantage of this marketplace , may be uniquely well than most . so it will see where it all go . 
it look forward to show it the result when it get there to February , but it be optimistic that it will see fairly good uptake in this market and may be well than what it have see with some past biosimilar . in regard to Herceptin , it think it ask when it be plan to launch Herceptin and it will be late Q1 , it believe , be the date for Herceptin , which be it - oh , it product be HERZUMA .  it . it next question come from the line of Dana Flanders of Guggenheim . please ask it question . hi ,  it very much for the question . it first be , Kare , it know it have talk about the U.S. generic business be about $ 4 billion in annual sale . and it know it can be lumpy . it seem to be trend low this year and it push out some launch . so , just can it comment on how much wiggle room do it see to that $ 4 billion number and would it expect launch next year to take that U.S. number back to that annualize $ 4 billion run rate ? and then just it second quick follow - up , on AJOVY and it recognize the importance of have an Auto - injector . Can it just talk about the need or lack thereof , of a primary care presence to really help drive an nrx back to where it would like to see it go .  .  it for the question . it will take the first one and then it think Brendan and it will share the second one . so in term of the $ 4 billion , it be important just to remember what it have be say all the time . it be say in North America , so if it look into detailed number , it be not the United States alone , it be United States and Canada . and as it know , it have a very strong generic business in Canada as well . so the north american business have a run rate , which be very close to $ 4 billion and as it say correctly , it can be a bit up and down per quarter . it think this year it will be very close to $ 4 billion in total and it have the same rough expectation for next year . it will give it more insight into that of course , when it come out with the guidance in February for 2020 . 
but it do see a strong and sustainable business . and it be absolutely right . some of it launch will get delay . order will move up . some will do better than expect when it finally get there like EpiPen and EpiPen Junior and - or else it will be disappoint that it get delay . so that be just the name of the game in generic . and with regard to AJOVY , it think it will let it go with that one , Brendan . Brendan oâ€™grady sure . so it have always look at the cgrp market , specifically the AJOVY launch as a two phase launch for it . so it launch the pre - filled syringe into the market . it see early - a lot of early quick demand , and of course in the last several month , it have see a decline in the new - to - brand share . and largely , it believe that that be due to patient preference of the Auto - injector . so when it speak to physician and it talk to it about AJOVY it be certainly very happy with the clinical profile , the side effect profile , the way patient respond to it and it really do not see much of a downside in the prefille syringe . in fact , one of the thing it continue to ask be that , be it go to keep the pre - filled syringe on the market once it launch the Auto - injector and of course it be because it see a big benefit of that . but when it put the product all three in front of a patient , it seem to prefer at a very high rate the Auto - injector over the pre - filled syringe . so it think that be largely what it be see . and that be the reason for the decrease in new to brand share . so it think when it launch the Auto - injector here in the coming month , it will see a continued bump and kind of the second curve up in the launch of AJOVY . but in regard to it comment about the primary care sale force , it actually do have a primary care sale force sell AJOVY . it have two sale force sell AJOVY . 
it have it neurology salesforce , which be call on headache center and neurologist and then it have what it call it specialty salesforce that be call on high decile primary care writer as well as non - neurology high decile headache specialist . so it feel that it have get the right promotional mix from a sale rep standpoint , but certainly it do not have as deep relationship with primary care as some of it competitor . so it think it be go to take U.S. a little bit longer to penetrate that market .  it . the next question come from the line of Chris Schott from JPMorgan . please ask it question . Christopher Schott great ,  very much for the question . just a follow - up on a few topic from before . maybe the first on AJOVY , it be obviously highlight the Auto - injector be drive reacceleration for the franchise . just help U.S. understand a little bit , how quickly post the Auto - injector launch do it expect it will see that uptick . so be it be monitor how important that be go to be for the franchise , be that something happen almost immediately or do it need to give this a quarter or two to evaluate ? and it second question be on TRUXIMA and that opportunity . be this largely a new start opportunity or do it think there be the potential to convert exist patient as well . and just a really quick one on taxis , it step up the tax rate to 18 % . be that a decent run rate to think about for Teva on a go - forward basis .  very much .  it for the question . it think Brendan it will take the first two and then Mike , it will take the last one . Brendan oâ€™grady sure . so as far as the Auto - injector with AJOVY , it do not expect that it will launch the Auto - injector and all of a sudden it will pop up to 30 % , 40 % , 50 % in new - to - brand share . it do think that it will see a steepening of the curve and it think that it will continue to see growth in AJOVY kind of back to the 20 % , 25 % , 30 % that it be look for as far as new - to - brand share . 
how long that take ? it do not know . it do not expect it to be immediate , but it expect it to continue to grow and climb into that 20 % to 30 % new - to - brand share that it be look for . as far as TRUXIMA go , it would expect that it will not see any conversion of patient currently on therapy . it think that whether it be TRUXIMA or whether it be any biosimilar in an oncology setting , be probably go to be mostly drive by new patient start . Michael McClellan yes . so when it come to the tax rate , it do say in the past that it would see some pressure on the tax rate and it would eventually get toward the 18 % . it have get there a little quick than it think . it think it would be more in the 16 % range this year . but it think 18 % be not a bad range for the next couple of year . in the outer year of course , maybe it will be able to bring it back down a little bit . but give it business and give the rule that it be deal with , with no significant new change in tax legislation , 2018 be a reasonable run rate for the next couple of year . Christopher Schott  it . the next question come from the line of Akash Tewari from Wolfe Research . please ask it question . hey ,  so much . so if it look at it long - term guidance projection for it operating margin . Can it , and it be just say it put it in consensus top line revenue projection , there seem to be an embed OpEx cut that be bake in over the next few year . maybe to the order of $ 500 million to $ 1 billion Can it give U.S. a sense of how much cost can still be cut out of Teva 's current cost structure and give kind of the pricing war it be see on cgrp , be that kind of possible ? and maybe on the other line for Teva 's U.S. business , can it give U.S. a sense of what the growth trajectory of that line be over the next few year and what the margin be for those product ?  . 
so , it will try and handle the first part and then it do not think it have much comment on the margin progress , but it will see , Mike will comment on that . so , if it look at it long - term financial target , then it have an operate margin target of 27 % , which be of course high than where it be right now . now , it have to imagine that that improvement would basically come from , it would say three main source . one be the gross margin improvement that again will come from the source of optimize the manufacturing network , which basically take down the cost of manufacture the product and thereby improve gross margin and then of course there be also a mix effect on the gross margin . when COPAXONE go down and the generic business be stable and then of course it gross margin go down . when COPAXONE have sort of flattened out and AUSTEDO and AJOVY be increase and it generic be roughly flat , then it gross margin go up . so those two element of course it help it . and then of course it have the ongoing optimization of the rest of it operational cost . and it be right , it have just take out $ 3 billion of the spend base and it can not do that once more , but of course , it can keep on look for optimization and improvement and it will be do so go forward . on the other line in the U.S. , do it comment on that ? it do not think so , Mike , but ... Michael McClellan no , it think it can see the basic sale trend there . this be all the remain product , many of which be already face generic competition . so it will see it slowly decline . it tend to have good operating margin because it do not invest behind these product . so that be something , but it can see over the course of the last couple of year that number have get down to a reasonable amount and it be be pretty stable throughout the year . so it will see a slow drag , but it be not go to fall off the face of the earth . great ,  so much .  it . 
the next question come from the line of Jason Gerberry of Bank of America . please ask it question . okay .  for take it question . just first Kare , just curious if it can comment at all on a Wall Street Journal report that come out a few month ago about the possibility that opioid manufacturer be contemplate opt into produce bankruptcy proceeding while not file for bankruptcy it but leverage that legal proceeding , which would seem to U.S. to potentially offer it experience and a consolidated legal mechanism to work with it counterpartie . so , just curious what be the impediment to that ? and then just secondly , on November 22nd , it think there be a deadline for party - municipality to opt into a negotiate class . curious if it view that as a major milestone in term of it ability to strike a settlement that be global and all - encompass with the political subdivision ?  .  for that question . so it be absolutely right that there be a theoretical opportunity of see the Purdue bankruptcy sort of be expand to cover the whole - it could say the situation on opioid litigation . however , as of today , that be not what it be pursue . as of today , it strongly hope and believe that the framework it have develop together with other defendant and together with the state ag that that be the most likely and the good way forward . as it have say before in this call , for the american population and also for the people who suffer from addiction . this will be , in it mind , the most constructive way to move forward . in the event that this would not work out . of course it be right , there be another legal framework , which would be some kind of participation it could say from a legal point of view of all this defendant in some overall resolution under the bankruptcy proceeding of the deal . it think that be really not what it see as the good solution right now . it think there be more momentum behind the general framework that it have develop together with the state ag .  
it . the next question come from the line of David Amsellem of Piper Jaffray . please ask it question .  . so , on AJOVY just irrespective of the Auto - injector , do it think it need to contract more aggressively long term give how it competitor be contract , particularly Lilly with [ indiscernible ] be pretty aggressive in year one . so that be number one . and then number two , on AUSTEDO , it competitor be now go to be run a trial in Huntington 's chorea . so with that in mind , how do it see the competitive landscape , particularly in Huntington 's where it do have a unique label there . how do it see that evolve to the extent that INGREZZA get a label expansion for Huntington 's chorea long term ?  . okay . so it will take obviously the AJOVY question first . it think that if it look at it focus on AJOVY , it have be on profitability as well as access and share . and it think that it have take a little bit different approach in regard to access . but as it say earlier , it have 70 % acceptable access and it continue to - it hope to continue to grow that . so it do not think that it necessarily need a more aggressive contracting approach with AJOVY to be successful . it think again the Auto - injector be not go to be everything , but it do have kind of a revise commercial strategy and plan around AJJOVY , which include target physician and a whole host of thing . so it think that the gross to net be fine . and it think that it do not expect a new aggressive contracting play for access . as far as AUSTEDO go , it think that if it - it will see where INGREZZA go and it will cross that bridge when it get to it . but it be right , right now it be the only one with the hd indication . whether INGREZZA get that indication or not , it will see . if it do , AUSTEDO certainly have a foothold in that market position . seem to be fairly pleased with the way that AUSTEDO be work . 
so while it will increase competition and likely take some share , it have no idea how it will perform in that market .  it . next question come from the line of Umer Raffat from Evercore ISI . please go ahead , it line be now open . hi ,  so much for take it question . Kare , it want to ask a two - part question on opioid settlement framework . and it see two possible layer of alignment that still need to happen and would really appreciate if it could give it color on each of it . so the first one would be the rest of the 46 state ag and there be a lot of feedback that it be not fully align , it be not comfortable and it be curious if it could catch up on what exactly be the hold up there . second be the city and county , and it be particularly interested in it because it seem to it that state ag do appear to be focus on address the opioid crisis , so it be okay with take Suboxone supply whereas city and county be be represent by trial lawyer , who be primarily focused on it cut on the dollar settlement size . so in theory , those trial lawyer have no incentive to get the city and county to align unless there be dollar come it way . Would not that theoretically imply a deadlock . it be just try to understand be there a credible path toward resolution in the next six month or so ? yes . so that is , of course , very interesting question , which it can not completely answer in all detail , since it be not a party to all those discussion . but if it start from the overall situation , and as it say before , it believe that the framework that be on the table now that be the good possible way forward to serve the purpose of help the people suffer from addiction in the United States . it think the state ag see that . 
it think the other defendant and it , it see that , and there may be some subdivision who do not see it exactly that way , but hopefully at the end of the day , what will prevail will be what be good for the American Public and for the people suffer from addiction . and it could also say that if it be not resolve this way , just like the state ag outline it , it become a completely random game for which county , which city go first in the sort of sequence of sue and how much money do it actually get until potentially some people stop settle or stop pay . so it think it need a holistic solution here . it think it be to the benefit of everybody to do that way and it very much hope that that will be the case . next question come from the line of Gary Nachman from BMO Capital Markets . please go ahead .  . Kare it know it will give guidance in February . but with most of the year behind it , how be it think about 2019 as a potential trough year and an ability to return to growth next year , both in term of revenue and EBITDA . just give some of the major push and pull on that front . and then just one follow - up . part of that be that COPAXONE be hold up better than expect . so explain the dynamic there behind the scene and can that be maintain into next year ? what sort of decline should it be think about with that franchise .  .  , a very interesting question . it be absolutely right . as it have say , it guess since it join that it be go to do the restructuring and the decline of COPAXONE combine with the restructuring would actually , from a natural point of view , automatically result in more or less this year be the trough year . and as it know , a trough be flat at the bottom and that be what it be see right now in term of the development in revenue and the development in operating profit . it basically see the last quarter be very stable and it have the operating profit at the level of just above $ 1 billion and the revenue at the level of above $ 4 billion . 
now if it then think about next year , it be too early for U.S. to give guidance . and then it may ask , why can not it give guidance ? and one of the element be of course it second question COPAXONE , because it be right , it be see a stabilization . basically the last three quarter of COPAXONE have be very stable . it see a marginal decline in the TRx volume . it see a very stable development in Europe and there be a lot of move part in this . and if it take the U.S. first , then it can say that be the unknown factor of how be it go to have one more generic competitor in the 40 milligram COPAXONE . right now it do not have any evidence that it will have short - term , but it do not know when that will happen . so that be one swing factor . now that situation will also affect the contracting and the pricing . if there be no new competitor come in , then there be a high likelihood that the pricing environment will stay relatively stable . now that will have a positive effect on the outlook for COPAXONE from next year . if all of a sudden there be an approval of a third competitor , then of course that have a negative effect on the pricing environment . in term of share development , it look pretty steady . it do not expect any major upset there . in term of Europe , it have a situation where it have a patent that be be confirm in the european patent system , which basically mean that the 40 milligram be cover by european patent , as it speak . and that be of course a positive . on the other hand , it have a lot of dynamic on the actual country level in Europe . but all - in - all , it would say that there be some swing factor there . but right now it look positive . and then it have other element where it can speculate on the exact gross to net it will have on AJOVY , how will that whole thing develop , the exact progression path of AUSTEDO . it will . for sure , grow but exactly how much will it grow . and it have currency , how will it develop . 
but everything else be equal , unless it do not have a major negative happening , then it still firmly believe that it have the trough year and it will see a marginal improvement next year in it operating profit . it also have thing that could happen such as it have the orphan drug designation that Eagle get on BENDEKA which be protect both BENDEKA and TREANDA from generic competition . and that have be appeal and it have actually the court proceeding have happen and it be wait for the outcome of that litigation . it hope it will go Eagle 's way so that there would be no change to the situation that BENDEKA have an orphan drug designation . but that could also be a swing factor . so a lot of back and forth , but it would say , everything else be equal , it totally confirm that it expect this to be the trough year and that it will see a marginal improvement in the operating profit next year ."""

In [75]:
text

"it be a pleasure to review the third quarter highlight . it revenue come in at bit more than $ 4 billion , very much in line with the last three quarter as it have be discuss before . it be see now a nice stable development of it revenue . it GAAP dilute loss per share be $ 0.29 in the third quarter . this be primarily affect by the accrual for the opioid litigation . on a non - gaap basis , it diluted earning per share be $ 0.58 . the primary change there be a change to the tax - estimate tax for the full - year and that reduce the EPS by around $ 0.04 . the non - gaap EBITDA be around a bit more than $ 1 billion . it be very stable , again , in the last three quarter . so really the take home message here be , it be see the operational stabilization it have be talk about . it can see that the run rate on the operating profit be stable and it also see a nice cash flow of some $ 550 million in the third quarter . so all in all it be very happy about the financial result . commercially

In [76]:
result = nlp(question="What is the long term target?", context=context)

In [77]:
print(f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}")

Answer: '27 %', score: 0.9633, start: 44716, end: 44720


In [78]:
result = nlp(question="What is the situation on the opioid litigation?", context=context)
print(f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}")

Answer: 'it be sure that one thing that be on everybody 's mind', score: 0.2408, start: 6487, end: 6541


In [79]:
result = nlp(question="What result do you expect on the opioid litigation?", context=context)
print(f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}")

Answer: 'the final result of the Tourettes', score: 0.2999, start: 23078, end: 23111


In [80]:
result = nlp(question="Do you expect a provision for the opioid litigation?", context=context)
print(f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}")

Answer: '$ 460 million', score: 0.216, start: 9220, end: 9233
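
The `start` and `end` fields returned by the question-answering pipeline are character offsets into `context`, so slicing the context with them reproduces `answer`. With a context of tens of thousands of characters it helps to look at the text around a low-scoring answer. A minimal sketch, using a hand-built `result` dictionary as a stand-in for real pipeline output (the short `context`, the score value and the helper `show_in_context` are illustrative assumptions, not part of the library):

```python
# Hand-built stand-in for a pipeline result; no model is run here.
context = "the adjustment come primarily from three item ; $ 460 million provision for legal settlement"
answer = "$ 460 million"
start = context.index(answer)
result = {"answer": answer, "score": 0.5, "start": start, "end": start + len(answer)}

# start/end are character offsets, so slicing the context recovers the answer
span = context[result["start"]:result["end"]]


def show_in_context(context, start, end, width=30):
    """Return the answer in brackets with up to `width` characters on each side."""
    left = max(0, start - width)
    return context[left:start] + "[" + context[start:end] + "]" + context[end:end + width]


print(span)
print(show_in_context(context, result["start"], result["end"]))
```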


In [82]:
from transformers import AutoTokenizer, AutoModelForQuestionAnswering
import torch
tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
model = AutoModelForQuestionAnswering.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
text = r"""
🤗 Transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides general-purpose
architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet…) for Natural Language Understanding (NLU) and Natural
Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between
TensorFlow 2.0 and PyTorch.
"""
questions = [
    "How many pretrained models are available in 🤗 Transformers?",
    "What does 🤗 Transformers provide?",
    "🤗 Transformers provides interoperability between which frameworks?",
]
for question in questions:
    inputs = tokenizer(question, text, add_special_tokens=True, return_tensors="pt")
    input_ids = inputs["input_ids"].tolist()[0]
    text_tokens = tokenizer.convert_ids_to_tokens(input_ids)
    answer_start_scores, answer_end_scores = model(**inputs)
    answer_start = torch.argmax(
        answer_start_scores
    )  # Get the most likely beginning of answer with the argmax of the score
    answer_end = torch.argmax(answer_end_scores) + 1  # Get the most likely end of answer with the argmax of the score
    answer = tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end]))
    print(f"Question: {question}")
    print(f"Answer: {answer}")


Question: How many pretrained models are available in 🤗 Transformers?
Answer: over 32 +
Question: What does 🤗 Transformers provide?
Answer: general - purpose architectures
Question: 🤗 Transformers provides interoperability between which frameworks?
Answer: tensorflow 2 . 0 and pytorch
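
Taking the two argmaxes independently, as in the loop above, can return an end position that precedes the start position, which produces an empty answer. A common alternative is to score every (start, end) pair with start <= end and a bounded length, and keep the best one. A pure-Python sketch of that idea on made-up scores rather than model output; `best_span` and `max_answer_len` are hypothetical names introduced here:

```python
def best_span(start_scores, end_scores, max_answer_len=15):
    """Return (start, end) maximizing start_scores[start] + end_scores[end],
    subject to start <= end < start + max_answer_len (end is inclusive)."""
    best, best_score = None, float("-inf")
    for s, s_score in enumerate(start_scores):
        for e in range(s, min(s + max_answer_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


# Made-up scores: the independent argmaxes give start=3 and end=1, an invalid span
start_scores = [0.1, 0.2, 0.1, 2.0, 0.3]
end_scores = [0.0, 1.5, 0.2, 0.1, 1.2]
tokens = ["the", "answer", "is", "forty", "two"]

start, end = best_span(start_scores, end_scores)
print(tokens[start:end + 1])
```

The nested loop is quadratic in the sequence length, which is acceptable for a sketch; `max_answer_len` bounds the inner loop.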


#### Tensorflow version

In [51]:
from transformers import AutoTokenizer, TFAutoModelForQuestionAnswering
import tensorflow as tf
tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
model = TFAutoModelForQuestionAnswering.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
text = r"""
🤗 Transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides general-purpose
architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet…) for Natural Language Understanding (NLU) and Natural
Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between
TensorFlow 2.0 and PyTorch.
"""
questions = [
    "How many pretrained models are available in 🤗 Transformers?",
    "What does 🤗 Transformers provide?",
    "🤗 Transformers provides interoperability between which frameworks?",
]
for question in questions:
    inputs = tokenizer(question, text, add_special_tokens=True, return_tensors="tf")
    input_ids = inputs["input_ids"].numpy()[0]
    text_tokens = tokenizer.convert_ids_to_tokens(input_ids)
    answer_start_scores, answer_end_scores = model(inputs)
    answer_start = tf.argmax(answer_start_scores, axis=1).numpy()[0]  # Get the most likely beginning of answer with the argmax of the score
    answer_end = (tf.argmax(answer_end_scores, axis=1) + 1).numpy()[0]  # Get the most likely end of answer with the argmax of the score
    answer = tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end]))
    print(f"Question: {question}")
    print(f"Answer: {answer}")

All model checkpoint weights were used when initializing TFBertForQuestionAnswering.

All the weights of TFBertForQuestionAnswering were initialized from the model checkpoint at bert-large-uncased-whole-word-masking-finetuned-squad.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForQuestionAnswering for predictions without further training.


Question: How many pretrained models are available in 🤗 Transformers?
Answer: over 32 +
Question: What does 🤗 Transformers provide?
Answer: general - purpose architectures
Question: 🤗 Transformers provides interoperability between which frameworks?
Answer: tensorflow 2 . 0 and pytorch


In [83]:
from transformers import AutoTokenizer, AutoModelForQuestionAnswering
import torch
tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
model = AutoModelForQuestionAnswering.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
text = r"""
it be sure that one thing that be on everybody 's mind be the opioid litigation situation . it be happy to settle the Track 1 , but it be even more happy to see an agreement in principle with a group of Attorney Generals . it believe that the agreement in principle be the good way forward the patient or the people in the United States suffer from addiction . it believe it commitment to supply Suboxone generic for the next 10-years to all the people suffer from addiction who can use this product to get out of it addiction and it can be an element in that whole process . that be the good way forward . Teva post a quarterly GAAP loss of $ 314 million and a loss per share on a GAAP basis of $ 0.29 for the third quarter of 2019 . as it will detail in the next slide , the GAAP result be impact mainly by an update to it legal provision associate with the ongoing opioid litigation . so turn to Slide 16 , in the third quarter of 2019 , non - gaap adjustment amount to $ 951 million impact on net income . the adjustment come primarily from three item ; $ 460 million provision for legal settlement generally relate to the opioid litigation .
"""
questions = [
    "What is the situation on the opioid litigation?",
    "What result do you expect on the opioid litigation?",
    "Do you expect a provision for the opioid litigation?",
]
for question in questions:
    inputs = tokenizer(question, text, add_special_tokens=True, return_tensors="pt")
    input_ids = inputs["input_ids"].tolist()[0]
    text_tokens = tokenizer.convert_ids_to_tokens(input_ids)
    answer_start_scores, answer_end_scores = model(**inputs)
    answer_start = torch.argmax(answer_start_scores)  # Get the most likely beginning of answer with the argmax of the score
    answer_end = torch.argmax(answer_end_scores) + 1  # Get the most likely end of answer with the argmax of the score
    answer = tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end]))
    print(f"Question: {question}")
    print(f"Answer: {answer}")

Question: What is the situation on the opioid litigation?
Answer: one thing that be on everybody ' s mind
Question: What result do you expect on the opioid litigation?
Answer: the gaap result be impact mainly by an update to it legal provision associate with the ongoing opioid litigation
Question: Do you expect a provision for the opioid litigation?
Answer: $ 460 million provision for legal settlement generally relate to the opioid litigation


#### Tensorflow version

In [60]:
from transformers import AutoTokenizer, TFAutoModelForQuestionAnswering
import tensorflow as tf
tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
model = TFAutoModelForQuestionAnswering.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
text = r"""
it be sure that one thing that be on everybody 's mind be the opioid litigation situation . it be happy to settle the Track 1 , but it be even more happy to see an agreement in principle with a group of Attorney Generals . it believe that the agreement in principle be the good way forward the patient or the people in the United States suffer from addiction . it believe it commitment to supply Suboxone generic for the next 10-years to all the people suffer from addiction who can use this product to get out of it addiction and it can be an element in that whole process . that be the good way forward . Teva post a quarterly GAAP loss of $ 314 million and a loss per share on a GAAP basis of $ 0.29 for the third quarter of 2019 . as it will detail in the next slide , the GAAP result be impact mainly by an update to it legal provision associate with the ongoing opioid litigation . so turn to Slide 16 , in the third quarter of 2019 , non - gaap adjustment amount to $ 951 million impact on net income . the adjustment come primarily from three item ; $ 460 million provision for legal settlement generally relate to the opioid litigation .
"""
questions = [
    "What is the situation on the opioid litigation?",
    "What result do you expect on the opioid litigation?",
    "Do you expect a provision for the opioid litigation?",
]
for question in questions:
    inputs = tokenizer(question, text, add_special_tokens=True, return_tensors="tf")
    input_ids = inputs["input_ids"].numpy()[0]
    text_tokens = tokenizer.convert_ids_to_tokens(input_ids)
    answer_start_scores, answer_end_scores = model(inputs)
    answer_start = tf.argmax(answer_start_scores, axis=1).numpy()[0]  # Get the most likely beginning of answer with the argmax of the score
    answer_end = (tf.argmax(answer_end_scores, axis=1) + 1).numpy()[0]  # Get the most likely end of answer with the argmax of the score
    answer = tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end]))
    print(f"Question: {question}")
    print(f"Answer: {answer}")

All model checkpoint weights were used when initializing TFBertForQuestionAnswering.

All the weights of TFBertForQuestionAnswering were initialized from the model checkpoint at bert-large-uncased-whole-word-masking-finetuned-squad.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForQuestionAnswering for predictions without further training.


Question: What is the situation on the opioid litigation?
Answer: one thing that be on everybody ' s mind
Question: What result do you expect on the opioid litigation?
Answer: the gaap result be impact mainly by an update to it legal provision associate with the ongoing opioid litigation
Question: Do you expect a provision for the opioid litigation?
Answer: $ 460 million provision for legal settlement generally relate to the opioid litigation


# Language Modeling
- Language modeling is the task of fitting a model to a corpus, which can be domain specific.
- All popular transformer-based models are trained using a variant of language modeling, e.g. BERT with masked language modeling, GPT-2 with causal language modeling (predict next word).

### Masked Language Modeling
- Masked language modeling is the task of masking tokens in a sequence with a masking token, and prompting the model to fill that mask with an appropriate token.
- This allows the model to attend to both the right context (tokens on the right of the mask) and the left context (tokens on the left of the mask).
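
As a toy illustration of the masking step itself (no model involved), the sketch below replaces one token with a mask token and returns the left and right context that a masked language model can draw on. The helper `mask_position` and the whitespace tokenization are assumptions made for the example; real tokenizers split text into subwords and define their own mask string.

```python
MASK = "[MASK]"  # BERT-style mask string; other tokenizers use a different one


def mask_position(tokens, index, mask_token=MASK):
    """Replace tokens[index] with the mask token and return the masked
    sequence together with the left and right context."""
    masked = tokens[:index] + [mask_token] + tokens[index + 1:]
    return masked, tokens[:index], tokens[index + 1:]


tokens = "distilled models are smaller than the models they mimic".split()
masked, left, right = mask_position(tokens, 3)
print(" ".join(masked))
print("left context: ", left)
print("right context:", right)
```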

In [84]:
from transformers import pipeline
nlp = pipeline("fill-mask")


Some weights of RobertaForMaskedLM were not initialized from the model checkpoint at distilroberta-base and are newly initialized: ['lm_head.decoder.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [85]:
from pprint import pprint
pprint(nlp(f"HuggingFace is creating a {nlp.tokenizer.mask_token} that the community uses to solve NLP tasks."))

[{'score': 0.17927466332912445,
  'sequence': '<s>HuggingFace is creating a tool that the community uses to '
              'solve NLP tasks.</s>',
  'token': 3944,
  'token_str': 'Ġtool'},
 {'score': 0.11349395662546158,
  'sequence': '<s>HuggingFace is creating a framework that the community uses '
              'to solve NLP tasks.</s>',
  'token': 7208,
  'token_str': 'Ġframework'},
 {'score': 0.05243542045354843,
  'sequence': '<s>HuggingFace is creating a library that the community uses to '
              'solve NLP tasks.</s>',
  'token': 5560,
  'token_str': 'Ġlibrary'},
 {'score': 0.03493538498878479,
  'sequence': '<s>HuggingFace is creating a database that the community uses '
              'to solve NLP tasks.</s>',
  'token': 8503,
  'token_str': 'Ġdatabase'},
 {'score': 0.028602542355656624,
  'sequence': '<s>HuggingFace is creating a prototype that the community uses '
              'to solve NLP tasks.</s>',
  'token': 17715,
  'token_str': 'Ġprototype'}]


In [93]:
from transformers import AutoModelWithLMHead, AutoTokenizer
import torch
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased")
model = AutoModelWithLMHead.from_pretrained("distilbert-base-cased")
sequence = f"Distilled models are smaller than the models they mimic. Using them instead of the large versions would help {tokenizer.mask_token} our carbon footprint."
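input_ids = tokenizer.encode(sequence, return_tensors="pt")  # named input_ids to avoid shadowing the built-in input()
# Position of the mask token within the sequence
mask_token_index = torch.where(input_ids == tokenizer.mask_token_id)[1]
# model(...)[0] holds the logits, shape (batch, sequence_length, vocab_size)
token_logits = model(input_ids)[0].detach()
mask_token_logits = token_logits[0, mask_token_index, :]
# Ids of the five highest-scoring candidates for the masked position
top_5_tokens = torch.topk(mask_token_logits, 5, dim=1).indices[0].tolist()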

In [94]:
for token in top_5_tokens:
    print(sequence.replace(tokenizer.mask_token, tokenizer.decode([token])))

Distilled models are smaller than the models they mimic. Using them instead of the large versions would help reduce our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help increase our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help decrease our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help offset our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help improve our carbon footprint.


#### Tensorflow version

In [66]:
from transformers import TFAutoModelWithLMHead, AutoTokenizer
import tensorflow as tf
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased")
model = TFAutoModelWithLMHead.from_pretrained("distilbert-base-cased")
sequence = f"Distilled models are smaller than the models they mimic. Using them instead of the large versions would help {tokenizer.mask_token} our carbon footprint."
input_ids = tokenizer.encode(sequence, return_tensors="tf")  # named input_ids to avoid shadowing the built-in input()
mask_token_index = tf.where(input_ids == tokenizer.mask_token_id)[0, 1]
token_logits = model(input_ids)[0]
mask_token_logits = token_logits[0, mask_token_index, :]
top_5_tokens = tf.math.top_k(mask_token_logits, 5).indices.numpy()

Some weights of the model checkpoint at distilbert-base-cased were not used when initializing TFDistilBertForMaskedLM: ['activation_13']
- This IS expected if you are initializing TFDistilBertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing TFDistilBertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of TFDistilBertForMaskedLM were not initialized from the model checkpoint at distilbert-base-cased and are newly initialized: ['activation_23']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


### Causal Language Modeling
- Causal language modeling is the task of predicting the token following a sequence of tokens.
- In this setting, the model attends only to the left context (the tokens preceding the position being predicted).
- This kind of training is particularly useful for generation tasks.
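
One common way to express left-only attention is a lower-triangular mask: position `i` may look at position `j` only when `j <= i`. The sketch below builds such a mask on plain Python lists to show which tokens each position can see. It is an illustration of the idea, not the implementation used by any particular model, and `causal_mask` is a name introduced here.

```python
def causal_mask(seq_len):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


tokens = ["Hugging", "Face", "is", "based"]
for token, row in zip(tokens, causal_mask(len(tokens))):
    visible = [t for t, allowed in zip(tokens, row) if allowed]
    print(f"{token!r} attends to {visible}")
```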

In [122]:
from transformers import AutoModelWithLMHead, AutoTokenizer, top_k_top_p_filtering
import torch
from torch.nn import functional as F
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelWithLMHead.from_pretrained("gpt2")
sequence = "Hugging Face is based in DUMBO, New York City, and "
input_ids = tokenizer.encode(sequence, return_tensors="pt")
# model(...)[0] holds the logits; keep only those at the last position (the next-token prediction)
next_token_logits = model(input_ids)[0][:, -1, :]
# filter
filtered_next_token_logits = top_k_top_p_filtering(next_token_logits, top_k=50, top_p=1.0)
# sample
probs = F.softmax(filtered_next_token_logits, dim=-1)
next_token = torch.multinomial(probs, num_samples=1)
generated = torch.cat([input_ids, next_token], dim=-1)
resulting_string = tokenizer.decode(generated.tolist()[0])

In [123]:
print(resulting_string)

Hugging Face is based in DUMBO, New York City, and  
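
The filter-then-sample step in the cell above can be sketched without any library. With `top_p=1.0` nucleus filtering is effectively disabled, so only the top-k part matters: keep the `k` largest logits, set the rest to minus infinity, apply a softmax, and sample from the result. The functions below are a plain-Python approximation written for illustration (the logits are made up, and ties at the threshold keep more than `k` entries); they are not the library's `top_k_top_p_filtering`.

```python
import math
import random


def top_k_filter(logits, k):
    """Keep the k largest logits and set the rest to -inf."""
    threshold = sorted(logits, reverse=True)[k - 1]
    return [x if x >= threshold else float("-inf") for x in logits]


def softmax(logits):
    """Numerically stable softmax; -inf entries get probability 0."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5, -1.0, 3.0]  # made-up next-token logits
probs = softmax(top_k_filter(logits, k=2))
next_token = random.choices(range(len(logits)), weights=probs, k=1)[0]
print(probs, next_token)
```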


#### Tensorflow version

In [67]:
from transformers import TFAutoModelWithLMHead, AutoTokenizer, tf_top_k_top_p_filtering
import tensorflow as tf
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = TFAutoModelWithLMHead.from_pretrained("gpt2")
sequence = "Hugging Face is based in DUMBO, New York City, and "
input_ids = tokenizer.encode(sequence, return_tensors="tf")
# model(...)[0] holds the logits; keep only those at the last position (the next-token prediction)
next_token_logits = model(input_ids)[0][:, -1, :]
# filter
filtered_next_token_logits = tf_top_k_top_p_filtering(next_token_logits, top_k=50, top_p=1.0)
# sample
next_token = tf.random.categorical(filtered_next_token_logits, dtype=tf.int32, num_samples=1)
generated = tf.concat([input_ids, next_token], axis=1)
resulting_string = tokenizer.decode(generated.numpy().tolist()[0])


All model checkpoint weights were used when initializing TFGPT2LMHeadModel.

All the weights of TFGPT2LMHeadModel were initialized from the model checkpoint at gpt2.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.


In [68]:
print(resulting_string)

Hugging Face is based in DUMBO, New York City, and  


### Text Generation
- Text generation is currently possible with GPT-2, OpenAI GPT, CTRL, XLNet, Transfo-XL and Reformer in PyTorch, and for most models in TensorFlow as well.
- XLNet and Transfo-XL often need to be padded to work well. 
- GPT-2 is usually a good choice for open-ended text generation because it was trained on millions of webpages with a causal language modeling objective.

In [124]:
from transformers import pipeline
text_generator = pipeline("text-generation")
print(text_generator("As far as I am concerned, I will", max_length=50, do_sample=False))

Some weights of GPT2Model were not initialized from the model checkpoint at gpt2 and are newly initialized: ['h.0.attn.masked_bias', 'h.1.attn.masked_bias', 'h.2.attn.masked_bias', 'h.3.attn.masked_bias', 'h.4.attn.masked_bias', 'h.5.attn.masked_bias', 'h.6.attn.masked_bias', 'h.7.attn.masked_bias', 'h.8.attn.masked_bias', 'h.9.attn.masked_bias', 'h.10.attn.masked_bias', 'h.11.attn.masked_bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'As far as I am concerned, I will be the first to admit that I am not a fan of the idea of a "free market." I think that the idea of a free market is a bit of a stretch. I think that the idea'}]
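
With `do_sample=False` (and, as assumed here, a single beam) generation is greedy: at every step the single most likely next token is appended. Greedy decoding tends to fall into loops, which is consistent with the repeated "I think that the idea" in the output above. The toy below shows the mechanism on a hand-written next-token table; the table `NEXT` and its scores are invented for the example and have nothing to do with GPT-2's actual probabilities.

```python
# Toy next-token scores, made up for illustration
NEXT = {
    "I": {"think": 0.6, "will": 0.4},
    "think": {"that": 0.9, "so": 0.1},
    "that": {"the": 0.7, "I": 0.3},
    "the": {"idea": 0.8, "market": 0.2},
    "idea": {"that": 0.6, "is": 0.4},
}


def greedy_decode(start, max_length):
    """Always append the highest-scoring next token until max_length is reached
    or the last token has no entry in the table."""
    tokens = [start]
    while len(tokens) < max_length and tokens[-1] in NEXT:
        candidates = NEXT[tokens[-1]]
        tokens.append(max(candidates, key=candidates.get))
    return tokens


print(" ".join(greedy_decode("I", 10)))
```

Sampling, as in the `do_sample=True` calls used elsewhere in this notebook, is one way to break such loops, because lower-scoring continuations are occasionally chosen.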


In [125]:
from transformers import AutoModelWithLMHead, AutoTokenizer
model = AutoModelWithLMHead.from_pretrained("xlnet-base-cased")
tokenizer = AutoTokenizer.from_pretrained("xlnet-base-cased")
# Padding text helps XLNet with short prompts - proposed by Aman Rusia in https://github.com/rusiaaman/XLNet-gen#methodology
PADDING_TEXT = """In 1991, the remains of Russian Tsar Nicholas II and his family
(except for Alexei and Maria) are discovered.
The voice of Nicholas's young son, Tsarevich Alexei Nikolaevich, narrates the
remainder of the story. 1883 Western Siberia,
a young Grigori Rasputin is asked by his father and a group of men to perform magic.
Rasputin has a vision and denounces one of the men as a horse thief. Although his
father initially slaps him for making such an accusation, Rasputin watches as the
man is chased outside and beaten. Twenty years later, Rasputin sees a vision of
the Virgin Mary, prompting him to become a priest. Rasputin quickly becomes famous,
with people, even a bishop, begging for his blessing. <eod> </s> <eos>"""
prompt = "Today the weather is really nice and I am planning on "
inputs = tokenizer.encode(PADDING_TEXT + prompt, add_special_tokens=False, return_tensors="pt")
prompt_length = len(tokenizer.decode(inputs[0], skip_special_tokens=True, clean_up_tokenization_spaces=True))
outputs = model.generate(inputs, max_length=250, do_sample=True, top_p=0.95, top_k=60)
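# Decode with the same options used for prompt_length; otherwise the special tokens in PADDING_TEXT shift the offset and the end of the prompt is repeated
generated = prompt + tokenizer.decode(outputs[0], skip_special_tokens=True, clean_up_tokenization_spaces=True)[prompt_length:]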


In [126]:
print(generated)

Today the weather is really nice and I am planning on anning on hiking it out to the lake for a long hike, then going back to it, on time. I will definitely be back tomorrow and just plan to walk a little bit around the lake again. Then go out and do the two-mile trail. I am excited to see how it turned out. I will also be hiking around the lake this weekend. I will probably


#### Tensorflow version

In [73]:
from transformers import TFAutoModelWithLMHead, AutoTokenizer
model = TFAutoModelWithLMHead.from_pretrained("xlnet-base-cased")
tokenizer = AutoTokenizer.from_pretrained("xlnet-base-cased")
# Padding text helps XLNet with short prompts - proposed by Aman Rusia in https://github.com/rusiaaman/XLNet-gen#methodology
PADDING_TEXT = """In 1991, the remains of Russian Tsar Nicholas II and his family
(except for Alexei and Maria) are discovered.
The voice of Nicholas's young son, Tsarevich Alexei Nikolaevich, narrates the
remainder of the story. 1883 Western Siberia,
a young Grigori Rasputin is asked by his father and a group of men to perform magic.
Rasputin has a vision and denounces one of the men as a horse thief. Although his
father initially slaps him for making such an accusation, Rasputin watches as the
man is chased outside and beaten. Twenty years later, Rasputin sees a vision of
the Virgin Mary, prompting him to become a priest. Rasputin quickly becomes famous,
with people, even a bishop, begging for his blessing. <eod> </s> <eos>"""
prompt = "Today the weather is really nice and I am planning on "
inputs = tokenizer.encode(PADDING_TEXT + prompt, add_special_tokens=False, return_tensors="tf")
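# Caution: the prompt is decoded here with skip_special_tokens=True, while the generated text
# is decoded without it below, so this character offset is likely too short by the length of
# the skipped special tokens (which would explain the repeated "anning on" in the printed output).
prompt_length = len(tokenizer.decode(inputs[0], skip_special_tokens=True, clean_up_tokenization_spaces=True))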
outputs = model.generate(inputs, max_length=250, do_sample=True, top_p=0.95, top_k=60)
generated = prompt + tokenizer.decode(outputs[0])[prompt_length:]

All model checkpoint weights were used when initializing TFXLNetLMHeadModel.

All the weights of TFXLNetLMHeadModel were initialized from the model checkpoint at xlnet-base-cased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFXLNetLMHeadModel for predictions without further training.


In [74]:
print(generated)

Today the weather is really nice and I am planning on anning on<unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk>


# Named Entity Recognition
- Named Entity Recognition (NER) is the task of classifying each token into an entity class, for example identifying a token as a person, an organisation or a location.
- The example below uses a pipeline to perform NER, assigning each token to one of 9 classes:
    - O, Outside of a named entity
    - B-MISC, Beginning of a miscellaneous entity right after another miscellaneous entity
    - I-MISC, Miscellaneous entity
    - B-PER, Beginning of a person’s name right after another person’s name
    - I-PER, Person’s name
    - B-ORG, Beginning of an organisation right after another organisation
    - I-ORG, Organisation
    - B-LOC, Beginning of a location right after another location
    - I-LOC, Location

In [127]:
from pprint import pprint
from transformers import pipeline
nlp = pipeline("ner")
sequence = "Hugging Face Inc. is a company based in New York City. Its headquarters are in DUMBO, therefore very close to the Manhattan Bridge which is visible from the window."
pprint(nlp(sequence))


[{'entity': 'I-ORG', 'index': 1, 'score': 0.999578595161438, 'word': 'Hu'},
 {'entity': 'I-ORG',
  'index': 2,
  'score': 0.9909763932228088,
  'word': '##gging'},
 {'entity': 'I-ORG', 'index': 3, 'score': 0.9982224702835083, 'word': 'Face'},
 {'entity': 'I-ORG', 'index': 4, 'score': 0.9994880557060242, 'word': 'Inc'},
 {'entity': 'I-LOC', 'index': 11, 'score': 0.9994345307350159, 'word': 'New'},
 {'entity': 'I-LOC', 'index': 12, 'score': 0.9993196129798889, 'word': 'York'},
 {'entity': 'I-LOC', 'index': 13, 'score': 0.9993793964385986, 'word': 'City'},
 {'entity': 'I-LOC', 'index': 19, 'score': 0.9862582683563232, 'word': 'D'},
 {'entity': 'I-LOC', 'index': 20, 'score': 0.951427161693573, 'word': '##UM'},
 {'entity': 'I-LOC', 'index': 21, 'score': 0.9336591362953186, 'word': '##BO'},
 {'entity': 'I-LOC',
  'index': 28,
  'score': 0.9761654138565063,
  'word': 'Manhattan'},
 {'entity': 'I-LOC',
  'index': 29,
  'score': 0.9914628863334656,
  'word': 'Bridge'}]


In [132]:
from transformers import AutoModelForTokenClassification, AutoTokenizer
import torch
model = AutoModelForTokenClassification.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english")
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
label_list = [
    "O",       # Outside of a named entity
    "B-MISC",  # Beginning of a miscellaneous entity right after another miscellaneous entity
    "I-MISC",  # Miscellaneous entity
    "B-PER",   # Beginning of a person's name right after another person's name
    "I-PER",   # Person's name
    "B-ORG",   # Beginning of an organisation right after another organisation
    "I-ORG",   # Organisation
    "B-LOC",   # Beginning of a location right after another location
    "I-LOC"    # Location
]
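# Note: the two string literals below are joined without a space, so the model sees "veryclose";
# that is why the output contains the pieces "##c" and "##lose". Add a trailing space after "very" to avoid it.
sequence = "Hugging Face Inc. is a company based in New York City. Its headquarters are in DUMBO, therefore very" \
           "close to the Manhattan Bridge."
# Bit of a hack to get the tokens with the special tokens
tokens = tokenizer.tokenize(tokenizer.decode(tokenizer.encode(sequence)))
inputs = tokenizer.encode(sequence, return_tensors="pt")
outputs = model(inputs)[0]  # first element of the model output: the logits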
predictions = torch.argmax(outputs, dim=2)

pprint([(token, label_list[prediction]) for token, prediction in zip(tokens, predictions[0].numpy())])

[('[CLS]', 'O'),
 ('Hu', 'I-ORG'),
 ('##gging', 'I-ORG'),
 ('Face', 'I-ORG'),
 ('Inc', 'I-ORG'),
 ('.', 'O'),
 ('is', 'O'),
 ('a', 'O'),
 ('company', 'O'),
 ('based', 'O'),
 ('in', 'O'),
 ('New', 'I-LOC'),
 ('York', 'I-LOC'),
 ('City', 'I-LOC'),
 ('.', 'O'),
 ('Its', 'O'),
 ('headquarters', 'O'),
 ('are', 'O'),
 ('in', 'O'),
 ('D', 'I-LOC'),
 ('##UM', 'I-LOC'),
 ('##BO', 'I-LOC'),
 (',', 'O'),
 ('therefore', 'O'),
 ('very', 'O'),
 ('##c', 'O'),
 ('##lose', 'O'),
 ('to', 'O'),
 ('the', 'O'),
 ('Manhattan', 'I-LOC'),
 ('Bridge', 'I-LOC'),
 ('.', 'O'),
 ('[SEP]', 'O')]


#### TensorFlow version

In [82]:
from transformers import TFAutoModelForTokenClassification, AutoTokenizer
import tensorflow as tf
model = TFAutoModelForTokenClassification.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english")
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
label_list = [
    "O",       # Outside of a named entity
    "B-MISC",  # Beginning of a miscellaneous entity right after another miscellaneous entity
    "I-MISC",  # Miscellaneous entity
    "B-PER",   # Beginning of a person's name right after another person's name
    "I-PER",   # Person's name
    "B-ORG",   # Beginning of an organisation right after another organisation
    "I-ORG",   # Organisation
    "B-LOC",   # Beginning of a location right after another location
    "I-LOC"    # Location
]
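# Note: the two string literals below are joined without a space, so the model sees "veryclose";
# that is why the output contains the pieces "##c" and "##lose". Add a trailing space after "very" to avoid it.
sequence = "Hugging Face Inc. is a company based in New York City. Its headquarters are in DUMBO, therefore very" \
           "close to the Manhattan Bridge."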
# Bit of a hack to get the tokens with the special tokens
tokens = tokenizer.tokenize(tokenizer.decode(tokenizer.encode(sequence)))
inputs = tokenizer.encode(sequence, return_tensors="tf")
outputs = model(inputs)[0]
predictions = tf.argmax(outputs, axis=2)

pprint([(token, label_list[prediction]) for token, prediction in zip(tokens, predictions[0].numpy())])




Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing TFBertForTokenClassification: ['dropout_147']
- This IS expected if you are initializing TFBertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing TFBertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of TFBertForTokenClassification were not initialized from the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english and are newly initialized: ['dropout_1210']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


[('[CLS]', 'O'),
 ('Hu', 'I-ORG'),
 ('##gging', 'I-ORG'),
 ('Face', 'I-ORG'),
 ('Inc', 'I-ORG'),
 ('.', 'O'),
 ('is', 'O'),
 ('a', 'O'),
 ('company', 'O'),
 ('based', 'O'),
 ('in', 'O'),
 ('New', 'I-LOC'),
 ('York', 'I-LOC'),
 ('City', 'I-LOC'),
 ('.', 'O'),
 ('Its', 'O'),
 ('headquarters', 'O'),
 ('are', 'O'),
 ('in', 'O'),
 ('D', 'I-LOC'),
 ('##UM', 'I-LOC'),
 ('##BO', 'I-LOC'),
 (',', 'O'),
 ('therefore', 'O'),
 ('very', 'O'),
 ('##c', 'O'),
 ('##lose', 'O'),
 ('to', 'O'),
 ('the', 'O'),
 ('Manhattan', 'I-LOC'),
 ('Bridge', 'I-LOC'),
 ('.', 'O'),
 ('[SEP]', 'O')]


- The tokens of “Hugging Face Inc” have been identified as an organisation, and “New York City”, “DUMBO” and “Manhattan Bridge” as locations.
- Unlike the pipeline, every token gets a prediction here because we did not filter out class 0 (“O”), which means that no entity was found on that token.
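
To get from the per-token predictions above to whole entities, the “O” tokens can be dropped and the WordPiece fragments (the `##` pieces) glued back together. The following is a minimal sketch of that post-processing, not the pipeline's own implementation; `group_entities` is a hypothetical helper, and the sample is a shortened copy of the output above.

```python
def group_entities(token_labels):
    """Merge WordPiece fragments and consecutive same-type tokens into entity spans."""
    entities = []          # each item is [text, entity_type]
    previous_type = None   # entity type of the previous token, or None after a gap
    for token, label in token_labels:
        # Special tokens and "O" tokens carry no entity and close any open span.
        if token in ("[CLS]", "[SEP]") or label == "O":
            previous_type = None
            continue
        entity_type = label.split("-", 1)[1]      # "I-ORG" -> "ORG"
        is_piece = token.startswith("##")
        text = token[2:] if is_piece else token
        if previous_type == entity_type and is_piece:
            entities[-1][0] += text               # continue the same word
        elif previous_type == entity_type and label.startswith("I-"):
            entities[-1][0] += " " + text         # next word of the same entity
        else:
            entities.append([text, entity_type])  # new entity (including "B-" labels)
        previous_type = entity_type
    return [(text, entity_type) for text, entity_type in entities]


sample = [
    ("[CLS]", "O"), ("Hu", "I-ORG"), ("##gging", "I-ORG"), ("Face", "I-ORG"),
    ("Inc", "I-ORG"), (".", "O"), ("is", "O"), ("based", "O"), ("in", "O"),
    ("New", "I-LOC"), ("York", "I-LOC"), ("City", "I-LOC"), (".", "O"),
    ("D", "I-LOC"), ("##UM", "I-LOC"), ("##BO", "I-LOC"), (",", "O"), ("[SEP]", "O"),
]
print(group_entities(sample))
# [('Hugging Face Inc', 'ORG'), ('New York City', 'LOC'), ('DUMBO', 'LOC')]
```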

# Summarization
- Summarization is the task of condensing a document or an article into a shorter text.
- An example of a summarization dataset is the CNN / Daily Mail dataset, which consists of long news articles and was created for the task of summarization.

In [133]:
from transformers import pipeline
summarizer = pipeline("summarization")
ARTICLE = """ New York (CNN)When Liana Barrientos was 23 years old, she got married in Westchester County, New York.
A year later, she got married again in Westchester County, but to a different man and without divorcing her first husband.
Only 18 days after that marriage, she got hitched yet again. Then, Barrientos declared "I do" five more times, sometimes only within two weeks of each other.
In 2010, she married once more, this time in the Bronx. In an application for a marriage license, she stated it was her "first and only" marriage.
Barrientos, now 39, is facing two criminal counts of "offering a false instrument for filing in the first degree," referring to her false statements on the
2010 marriage license application, according to court documents.
Prosecutors said the marriages were part of an immigration scam.
On Friday, she pleaded not guilty at State Supreme Court in the Bronx, according to her attorney, Christopher Wright, who declined to comment further.
After leaving court, Barrientos was arrested and charged with theft of service and criminal trespass for allegedly sneaking into the New York subway through an emergency exit, said Detective
Annette Markowski, a police spokeswoman. In total, Barrientos has been married 10 times, with nine of her marriages occurring between 1999 and 2002.
All occurred either in Westchester County, Long Island, New Jersey or the Bronx. She is believed to still be married to four men, and at one time, she was married to eight men at once, prosecutors say.
Prosecutors said the immigration scam involved some of her husbands, who filed for permanent residence status shortly after the marriages.
Any divorces happened only after such filings were approved. It was unclear whether any of the men will be prosecuted.
The case was referred to the Bronx District Attorney\'s Office by Immigration and Customs Enforcement and the Department of Homeland Security\'s
Investigation Division. Seven of the men are from so-called "red-flagged" countries, including Egypt, Turkey, Georgia, Pakistan and Mali.
Her eighth husband, Rashid Rajput, was deported in 2006 to his native Pakistan after an investigation by the Joint Terrorism Task Force.
If convicted, Barrientos faces up to four years in prison.  Her next court appearance is scheduled for May 18.
"""
print(summarizer(ARTICLE, max_length=130, min_length=30, do_sample=False))


[{'summary_text': ' Liana Barrientos, 39, is charged with two counts of "offering a false instrument for filing in the first degree" In total, she has been married 10 times, with nine of her marriages occurring between 1999 and 2002 . At one time, she was married to eight men at once, prosecutors say .'}]


In [140]:
from transformers import AutoModelWithLMHead, AutoTokenizer
model = AutoModelWithLMHead.from_pretrained("t5-base")
tokenizer = AutoTokenizer.from_pretrained("t5-base")
# T5 uses a max_length of 512 so we cut the article to 512 tokens.
inputs = tokenizer.encode("summarize: " + ARTICLE, return_tensors="pt", max_length=512, truncation=True)
outputs = model.generate(inputs, max_length=150, min_length=40, length_penalty=2.0, num_beams=4, early_stopping=True)
print(tokenizer.decode(outputs[0]))
print(outputs)

prosecutors say the marriages were part of an immigration scam. if convicted, barrientos faces two criminal counts of "offering a false instrument for filing in the first degree" she has been married 10 times, nine of them between 1999 and 2002.
tensor([[    0,     3, 29905,   497,     8,  5281,     7,   130,   294,    13,
            46, 10653, 13236,     3,     5,     3,    99,     3, 21217,     6,
          1207,  3483,   235,     7,  8519,   192,  4336, 12052,    13,    96,
          1647,    49,    53,     3,     9,  6136,  5009,    21,  9479,    16,
             8,   166,  1952,   121,   255,    65,   118,  4464,   335,   648,
             6,  4169,    13,   135,   344,  5247,    11,  4407,     3,     5,
             1]])


#### TensorFlow version

In [138]:
from transformers import TFAutoModelWithLMHead, AutoTokenizer
model = TFAutoModelWithLMHead.from_pretrained("t5-base")
tokenizer = AutoTokenizer.from_pretrained("t5-base")
# T5 uses a max_length of 512 so we cut the article to 512 tokens.
inputs = tokenizer.encode("summarize: " + ARTICLE, return_tensors="tf", max_length=512, truncation=True)
outputs = model.generate(inputs, max_length=150, min_length=40, length_penalty=2.0, num_beams=4, early_stopping=True)
print(tokenizer.decode(outputs[0]))

ImportError: 
TFAutoModelWithLMHead requires the TensorFlow library but it was not found in your environment. Checkout the instructions on the
installation page: https://www.tensorflow.org/install and follow the ones that match your environment.


# Translation
- Translation is the task of translating a text from one language to another.
- An example of a translation dataset is the WMT English to German dataset, which has English sentences as the input data and the corresponding German sentences as the target data.
- The example below uses a T5 model that was only pre-trained on a multi-task mixture dataset (including WMT), yet yields impressive translation results.
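
Because T5 was trained on a mixture of tasks, the task is selected with a plain-text prefix on the input, such as the `summarize: ` and `translate English to French: ` prefixes used in this notebook. A minimal sketch of building such inputs follows; `T5_PREFIXES` and `build_t5_input` are hypothetical names introduced for illustration, not part of the library.

```python
# Task prefixes as plain strings; the first two appear in this notebook's cells.
T5_PREFIXES = {
    "summarize": "summarize: ",
    "en_to_fr": "translate English to French: ",
    "en_to_de": "translate English to German: ",
}


def build_t5_input(task, text):
    """Prepend the prefix that tells T5 which task to perform on `text`."""
    if task not in T5_PREFIXES:
        raise ValueError(f"unknown task: {task!r}")
    return T5_PREFIXES[task] + text.strip()


print(build_t5_input("en_to_fr", "Hugging Face is a technology company based in New York and Paris"))
# translate English to French: Hugging Face is a technology company based in New York and Paris
```

The resulting string is what would be passed to `tokenizer.encode(...)` before calling `model.generate(...)`.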

In [141]:
from transformers import pipeline
translator = pipeline("translation_en_to_fr")
print(translator("Hugging Face is a technology company based in New York and Paris", max_length=40))

Some weights of T5Model were not initialized from the model checkpoint at t5-base and are newly initialized: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



[{'translation_text': 'Hugging Face est une entreprise technologique basée à New York et à Paris.'}]


In [145]:
from transformers import AutoModelWithLMHead, AutoTokenizer
model = AutoModelWithLMHead.from_pretrained("t5-base")
tokenizer = AutoTokenizer.from_pretrained("t5-base")
inputs = tokenizer.encode("translate English to French: On the cost side, we've well monitored our programming cost with a particular decrease in the unit price of the movies and U.S", return_tensors="pt")
outputs = model.generate(inputs, max_length=40, num_beams=4, early_stopping=True)

In [146]:
print(tokenizer.decode(outputs[0]))

Du côté des coûts, nous avons bien surveillé nos coûts de programmation, avec une diminution particulière du prix unitaire des films et des films américains


#### TensorFlow version

In [144]:
from transformers import TFAutoModelWithLMHead, AutoTokenizer
model = TFAutoModelWithLMHead.from_pretrained("t5-base")
tokenizer = AutoTokenizer.from_pretrained("t5-base")
inputs = tokenizer.encode("translate English to French: Hugging Face is a technology company based in New York and Paris", return_tensors="tf")
outputs = model.generate(inputs, max_length=40, num_beams=4, early_stopping=True)

ImportError: 
TFAutoModelWithLMHead requires the TensorFlow library but it was not found in your environment. Checkout the instructions on the
installation page: https://www.tensorflow.org/install and follow the ones that match your environment.


# Summary of the models
- **Autoregressive** models are pretrained on the classic language modeling task: guess the next token having read all the previous ones. They correspond to the decoder of the original transformer model, and a mask is applied on top of the full sentence so that the attention heads can only see what came before in the text, not what comes after. Although these models can be fine-tuned and achieve great results on many tasks, their most natural application is text generation. Typical examples are GPT, GPT-2, CTRL, Transformer-XL and XLNet.
- **Autoencoding** models are pretrained by corrupting the input tokens in some way and trying to reconstruct the original sentence. They correspond to the encoder of the original transformer model in the sense that they get access to the full inputs without any mask. These models usually build a bidirectional representation of the whole sentence. They can be fine-tuned and achieve great results on many tasks such as text generation, but their most natural application is sentence classification or token classification. Typical examples are BERT, ALBERT, RoBERTa and DistilBERT.
- The only difference between autoregressive and autoencoding models is the way the model is pretrained. Therefore, the same architecture can be used for both.
- **Sequence-to-sequence** models use both the encoder and the decoder of the original transformer, either for translation tasks or by transforming other tasks into sequence-to-sequence problems. They can be fine-tuned on many tasks, but their most natural applications are translation, summarization and question answering. The original transformer model is an example of such a model (only for translation); T5 is an example that can be fine-tuned on other tasks, and BART is another.
- **Multimodal** models mix text inputs with other kinds of input (e.g. images) and are more specific to a given task. Example: MMBT.
- https://huggingface.co/transformers/v3.1.0/model_summary.html
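
The two pretraining styles above can be illustrated without any model. The sketch below builds the attention mask an autoregressive model would use (each position sees only itself and earlier positions) and applies a simplified token corruption of the kind autoencoding models learn to undo. The names `causal_mask`, `full_mask` and `corrupt` are hypothetical, and the uniform 15% masking is an assumption for illustration; real recipes such as BERT's are more involved.

```python
import random


def causal_mask(n):
    """Autoregressive: position i may attend to positions 0..i only."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def full_mask(n):
    """Autoencoding: every position may attend to every other position."""
    return [[1] * n for _ in range(n)]


def corrupt(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Replace a random subset of tokens with `mask_token`.

    Returns the corrupted sequence and the reconstruction targets
    (the original token where one was masked, None elsewhere).
    """
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    corrupted, targets = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            corrupted.append(mask_token)
            targets.append(token)
        else:
            corrupted.append(token)
            targets.append(None)
    return corrupted, targets


for row in causal_mask(4):
    print(row)
# [1, 0, 0, 0]
# [1, 1, 0, 0]
# [1, 1, 1, 0]
# [1, 1, 1, 1]

words = "a titan rtx has 24 gb of vram".split()
print(corrupt(words, mask_prob=0.3))
```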


- CTRL model: Conditional Transformer Language Model for Controllable Generation
- CTRL is a 1.6 billion-parameter language model for powerful and controllable artificial text generation. It can also predict which subset of the training data most influenced a generated text sequence.
- https://blog.einstein.ai/introducing-a-conditional-transformer-language-model-for-controllable-generation/
- https://github.com/salesforce/ctrl