## <font color='#475468'> Pretrained Models:</font>
### <font color='#475468'> Can you speed up your efforts using pretrained models?</font>

# Transformers

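The Hugging Face `transformers` library provides pipelines. A pipeline wraps a pretrained model together with its preprocessing and postprocessing so that it can perform a specific task in a single call.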

## Initialize

In [1]:
!pip install transformers



In [2]:
# Use pipelines to access pre-trained models
from transformers import pipeline

## Sentiment Analysis

Let's test the code using real reviews from goodreads.com.

In [21]:
# Model
mdlSnt = pipeline('sentiment-analysis')

# Parameters
#prmStatement = 'We are happy to go on vacation this spring break.'
prmStatement = ['Fantastically Written? Ooooh yeah! Compelling? Yup! Super Quick Read? Most definitely! Original? No.',
        'this was a stupid fucking book and i never ever want to read it again. this ripped my heart out, tore it into little pieces, crushed them and threw them away.']
# Predict
mdlSnt(prmStatement)

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


[{'label': 'POSITIVE', 'score': 0.9986931681632996},
 {'label': 'NEGATIVE', 'score': 0.989534318447113}]
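
The warnings in the output above recommend naming the model and revision explicitly and passing a `device`. The sketch below shows one way to collect those arguments. The helper name `pinned_sentiment_kwargs` is hypothetical, the model name and revision are copied from the warning text, and the device values assume that `0` refers to the first GPU and `-1` to the CPU.

```python
def pinned_sentiment_kwargs(use_gpu: bool = False) -> dict:
    """Collect explicit pipeline arguments (hypothetical helper, not part of transformers)."""
    return {
        "task": "sentiment-analysis",
        # Model name and revision as reported in the warning above
        "model": "distilbert/distilbert-base-uncased-finetuned-sst-2-english",
        "revision": "af0f99b",
        # Assumption: 0 is the index of the first GPU; -1 keeps the model on the CPU
        "device": 0 if use_gpu else -1,
    }

kwargs = pinned_sentiment_kwargs(use_gpu=False)
print(kwargs["model"], kwargs["device"])

# In the notebook, the arguments would then be passed on to the pipeline:
# mdlSnt = pipeline(**kwargs)
```

Pinning the model and revision should keep results reproducible even if the library's default model for the task changes later.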

The pipeline can also perform sentiment analysis in other languages by passing `model = 'nlptown/bert-base-multilingual-uncased-sentiment'`.
The example below uses a real Turkish review from goodreads.com.

In [22]:
# Model
mdlSnt = pipeline('sentiment-analysis', model = 'nlptown/bert-base-multilingual-uncased-sentiment')

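# Parameters
# Turkish: "Although I liked the book at first, there were parts where I had a nervous breakdown as it went on."
prmStatement = 'Başlarda kitabı sevsem de ilerledikçe sinir krizi geçirdiğim yerler oldu.'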

# Predict
mdlSnt(prmStatement)


[{'label': '2 stars', 'score': 0.4358760416507721}]

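The label is a star rating (1 to 5 stars) rather than POSITIVE or NEGATIVE, because that is how this model defines its output classes. The `score` is the model's confidence in the predicted rating. See the model card on Hugging Face for details.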
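
To use the star rating numerically, the label string has to be parsed. A minimal sketch follows, assuming the labels always have the form `'<n> star'` or `'<n> stars'`. The helper name `stars_to_int` is hypothetical, and the `result` list is copied from the output above.

```python
def stars_to_int(label: str) -> int:
    """Turn a label such as '2 stars' or '1 star' into the integer 2 or 1."""
    return int(label.split()[0])

# Output copied from the cell above
result = [{'label': '2 stars', 'score': 0.4358760416507721}]

for item in result:
    rating = stars_to_int(item['label'])
    print(f"{rating}/5 stars (confidence {item['score']:.2f})")
# prints: 2/5 stars (confidence 0.44)
```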

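## Sentence Embeddings

The `sentence-transformers` library provides pretrained models that map each sentence to a fixed-length embedding vector.

In [25]:
!pip install -U sentence-transformers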

In [26]:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)


(3, 384)


We now have a NumPy array of embeddings with one 384-dimensional row per sentence. We can use these rows to compute pairwise similarities.

In [27]:
similarities = model.similarity(embeddings, embeddings)
print(similarities)

tensor([[1.0000, 0.6660, 0.1046],
        [0.6660, 1.0000, 0.1411],
        [0.1046, 0.1411, 1.0000]])
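
In the matrix above, the two weather sentences score 0.6660 against each other, while the stadium sentence scores only 0.1046 and 0.1411 against them. The sketch below reproduces the calculation by hand, assuming the model's similarity function is left at cosine similarity. The three short vectors are hypothetical stand-ins for the real 384-dimensional embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional vectors standing in for the real embeddings
toy_embeddings = [
    [1.0, 2.0, 0.0],
    [2.0, 3.0, 0.5],
    [0.0, 0.1, 4.0],
]

# Pairwise similarity matrix: symmetric, with ones on the diagonal
matrix = [[cosine_similarity(a, b) for b in toy_embeddings] for a in toy_embeddings]
for row in matrix:
    print([round(value, 4) for value in row])
```

As in the real output, each vector has similarity 1 with itself, and the matrix is symmetric because the order of the two vectors does not matter.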


## Question Answering

### Example 1

In [14]:
# Mount Google Drive so the notebook can read the context file (Colab only)
from google.colab import drive
drive.mount('/content/drive')

# Model
mdlQa = pipeline("question-answering")

# Parameters
# Read the context with a context manager so the file is closed afterwards
with open("/content/drive/MyDrive/Colab Notebooks/blueberry.txt", "r") as f:
    context = f.read()
question = "What are the ingredients?"

# Predict
mdlQa(question = question, context = context)

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


{'score': 0.9104288220405579,
 'start': 1258,
 'end': 1298,
 'answer': 'buttermilk, sour cream, vanilla and salt'}
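
The `start` and `end` values are character offsets into the context, so slicing the context with them should return the answer text. In the output above, `end - start` is 40, which matches the length of the answer string. The sketch below illustrates this with a hypothetical one-sentence context, since the contents of `blueberry.txt` are not shown here.

```python
# Hypothetical context; the real one is read from blueberry.txt
context = "Whisk together the buttermilk, sour cream, vanilla and salt in a bowl."
answer = "buttermilk, sour cream, vanilla and salt"

# Build a result dictionary shaped like the pipeline output (score omitted)
start = context.find(answer)
result = {"start": start, "end": start + len(answer), "answer": answer}

# Slicing the context with the offsets recovers the answer text
span = context[result["start"]:result["end"]]
print(span == result["answer"])

# A little surrounding text helps when checking an answer by eye
window = 10
print(context[max(0, result["start"] - window):result["end"] + window])
```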

### Example 2

In [15]:
# Model
mdlQa = pipeline("question-answering")

# Parameters
# Read the context with a context manager so the file is closed afterwards
with open("/content/drive/MyDrive/Colab Notebooks/avocado.txt", "r") as f:
    context = f.read()
question = "What to stir to the soup?"

# Predict
mdlQa(question = question, context = context)


{'score': 0.10655605792999268,
 'start': 104,
 'end': 131,
 'answer': 'onion, chile and the garlic'}
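
The score in Example 2 (about 0.11) is much lower than in Example 1 (about 0.91), so the second answer deserves less trust. One simple way to handle this is to reject answers below a cutoff. The sketch below assumes a cutoff of 0.5, which is an arbitrary illustrative value, and the helper name `accept_answer` is hypothetical.

```python
def accept_answer(result: dict, min_score: float = 0.5):
    """Return the answer text, or None when the score is below min_score."""
    return result["answer"] if result["score"] >= min_score else None

# Outputs copied from Example 1 and Example 2
example_1 = {'score': 0.9104288220405579, 'start': 1258, 'end': 1298,
             'answer': 'buttermilk, sour cream, vanilla and salt'}
example_2 = {'score': 0.10655605792999268, 'start': 104, 'end': 131,
             'answer': 'onion, chile and the garlic'}

print(accept_answer(example_1))  # buttermilk, sour cream, vanilla and salt
print(accept_answer(example_2))  # None
```

A suitable cutoff depends on the model and the texts, so it is worth checking a few answers by hand before settling on a value.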