Demystifying LLMs

- RoBERTa Model

<img src="https://www.researchgate.net/publication/358563215/figure/fig1/AS:1148929689305141@1650937592915/RoBERTa-masked-language-modeling-with-the-input-sentence-The-cat-is-eating-some-food.png" alt="MLM" width="700">

- It is a Transformer model pretrained on a large corpus of English data in a self-supervised fashion.
- It was pretrained on raw text only, with no human labelling: an automatic process generates the inputs and labels from the text itself.
- Masked Language Modeling
- It uses the masked language modeling (MLM) objective.
    - It randomly masks 15% of the tokens in the input, runs the entire masked sentence through the model, and has to predict the masked tokens.
    - This differs from autoregressive models like GPT, which internally mask future tokens. Seeing context on both sides of each mask lets the model learn a bidirectional representation of the sentence.
    - This way, the model learns an inner representation of the English language that can then be used to extract features for downstream tasks: if you have a dataset of labeled sentences, for instance, you can train a standard classifier using the features produced by the RoBERTa model as inputs.

- roberta-base-squad2 model
    - It is fine-tuned on the SQuAD2.0 dataset, which consists of question-answer pairs, including unanswerable questions, for the task of Question Answering.
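
The masking step of the MLM objective can be sketched in a few lines of plain Python. This is a simplified illustration, not RoBERTa's actual preprocessing: it splits on whitespace instead of using the byte-level BPE tokenizer, masks a fixed number of positions rather than drawing each token independently, and always substitutes `<mask>` (BERT-style pretraining also sometimes keeps or swaps the selected token). The function name `mask_tokens` and its parameters are hypothetical.

```python
import random

MASK_TOKEN = "<mask>"  # the mask string used in the fill-mask cell of this notebook

def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Mask roughly `mask_prob` of the tokens (at least one).

    Returns (masked_tokens, labels). `labels` holds the original token at
    each masked position and None everywhere else, which mirrors how the
    training loss is computed only on masked positions.
    """
    rng = random.Random(seed)
    n_to_mask = max(1, round(len(tokens) * mask_prob))
    positions = set(rng.sample(range(len(tokens)), n_to_mask))
    masked = [MASK_TOKEN if i in positions else tok for i, tok in enumerate(tokens)]
    labels = [tok if i in positions else None for i, tok in enumerate(tokens)]
    return masked, labels

# Whitespace "tokens" are an assumption made for readability.
tokens = "The cat is eating some food".split()
masked, labels = mask_tokens(tokens, seed=0)
print(masked)
print(labels)
```

The model only ever sees `masked`; `labels` is what it is asked to reconstruct, which is how raw text supplies both inputs and targets without human annotation.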


In [22]:
from transformers import pipeline
unmasker = pipeline('fill-mask', model='roberta-base')
unmasker("Hello I'm a <mask> model.")

[{'score': 0.33065298199653625,
  'token': 2943,
  'token_str': ' male',
  'sequence': "Hello I'm a male model."},
 {'score': 0.0465540736913681,
  'token': 2182,
  'token_str': ' female',
  'sequence': "Hello I'm a female model."},
 {'score': 0.042330000549554825,
  'token': 2038,
  'token_str': ' professional',
  'sequence': "Hello I'm a professional model."},
 {'score': 0.03721684217453003,
  'token': 2734,
  'token_str': ' fashion',
  'sequence': "Hello I'm a fashion model."},
 {'score': 0.03253638744354248,
  'token': 1083,
  'token_str': ' Russian',
  'sequence': "Hello I'm a Russian model."}]
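
Each prediction is a plain dictionary, so the list can be post-processed with ordinary Python. The sketch below copies the first two entries from the output above instead of calling the model again, then picks the highest-scoring one. Note the leading space in `token_str`: RoBERTa's byte-level BPE vocabulary folds the preceding space into the token, so it usually needs stripping before display. The names `predictions`, `best` and `best_word` are illustrative.

```python
# First two predictions copied verbatim from the fill-mask output above.
predictions = [
    {'score': 0.33065298199653625, 'token': 2943, 'token_str': ' male',
     'sequence': "Hello I'm a male model."},
    {'score': 0.0465540736913681, 'token': 2182, 'token_str': ' female',
     'sequence': "Hello I'm a female model."},
]

# Pick the candidate with the highest probability.
best = max(predictions, key=lambda p: p['score'])

# token_str carries a leading space from the tokenizer; strip it for display.
best_word = best['token_str'].strip()
print(best_word, round(best['score'], 3))
```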

In [23]:
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

model_name = "deepset/roberta-base-squad2"

# a) Get predictions
nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)
qa_input_1 = {
    'question': 'Why is model conversion important?',
    'context': 'The option to convert models between FARM and transformers gives freedom to the user and let people easily switch between frameworks.'
}
res = nlp(qa_input_1)
print(res)
# b) Load model & tokenizer (left commented out; the pipeline above already loads both)
# model = AutoModelForQuestionAnswering.from_pretrained(model_name)
# tokenizer = AutoTokenizer.from_pretrained(model_name)

{'score': 0.2117147445678711, 'start': 59, 'end': 84, 'answer': 'gives freedom to the user'}
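
The `start` and `end` fields are character offsets into `context`, so the answer can be recovered by slicing. A minimal check using the values printed above (the model is not called again here):

```python
# Context and result copied from the question-answering cell above.
context = ('The option to convert models between FARM and transformers gives freedom '
           'to the user and let people easily switch between frameworks.')
res = {'score': 0.2117147445678711, 'start': 59, 'end': 84,
       'answer': 'gives freedom to the user'}

# Slice the context with the reported offsets (end is exclusive).
span = context[res['start']:res['end']]
print(span)
```

The slice matches the `answer` field, which makes the offsets handy for highlighting the answer inside the original passage.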


In [24]:
qa_input_2 = {'context' : """
🤗 Transformers is backed by the three most popular deep learning libraries — Jax, PyTorch and TensorFlow — with a seamless integration
between them. It's straightforward to train your models with one before loading them for inference with the other.
""",
'question' : "Which deep learning libraries back 🤗 Transformers?"}

In [25]:
res = nlp(qa_input_2)
res


{'score': 0.9416548013687134,
 'start': 78,
 'end': 105,
 'answer': 'Jax, PyTorch and TensorFlow'}
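
The two answers above come with very different scores (about 0.21 versus 0.94). One possible way to use that score, sketched here as an assumption rather than anything the pipeline does for you, is to accept an answer only above a cutoff. The helper `accept_answer` and the 0.5 threshold are hypothetical and would need tuning on real data; the two result dictionaries are copied from the outputs above.

```python
# Results copied from the two question-answering outputs above.
res_1 = {'score': 0.2117147445678711, 'start': 59, 'end': 84,
         'answer': 'gives freedom to the user'}
res_2 = {'score': 0.9416548013687134, 'start': 78, 'end': 105,
         'answer': 'Jax, PyTorch and TensorFlow'}

def accept_answer(res, threshold=0.5):
    """Return the answer text if its score clears the (hypothetical) threshold, else None."""
    return res['answer'] if res['score'] >= threshold else None

print(accept_answer(res_1))
print(accept_answer(res_2))
```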

In [1]:
!pip install flash-attn --no-build-isolation


Collecting flash-attn
  Downloading flash_attn-2.5.5.tar.gz (2.5 MB)
  Preparing metadata (setup.py) ... done
Collecting einops (from flash-attn)
  Downloading einops-0.7.0-py3-none-any.whl.metadata (13 kB)
Downloading einops-0.7.0-py3-none-any.whl (44 kB)
Building wheels for collected packages: flash-attn
  Building wheel for flash-attn (setup.py) ... done
  Created wheel for flash-attn: filename=flash_attn-2.5.5-cp39-cp39-linux_x86_64.whl size=121949693 sha256=e44a2d0ca01cc68ce84596f869f70479c268f27ac078486c6c02cfcfb17c1697
  Stored in directory: /home/mayur/.cache/pip/wheels/34/2c/5c/5eead9cd104af1ced6f6fa8c3b3b54d5369d5e832e5d724681
Successfully built flash-attn
[33mDEPRECATION: pyodb