[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1DbnHZivcrvqTdd0H6AmdFiFLfLAu_7ft?usp=sharing)

# Pretrained models in HuggingFace - Overview Notebook

This notebook is a self-contained way to start using transformers.

- https://github.com/nlp-with-transformers/notebooks/blob/main/01_introduction.ipynb

**Learning goals:** The goal of this tutorial is to learn how to:

1. Use pre-trained pipelines
2. Get embeddings
3. Use multimodal models

**Steps:** How to get the most out of this notebook:

1. Make a copy of this notebook so you can keep your changes.



In [None]:
%pip install --quiet transformers datasets sentence-transformers

## Pre-Trained Models with Pipelines -> ✨ Easy Mode ✨

The [pipeline()](https://huggingface.co/docs/transformers/main/en/main_classes/pipelines#transformers.pipeline) function supports more than 20 common tasks out of the box, including:

**Text**:
* Sentiment analysis: classify the polarity of a given text.
* Text generation (in English): generate text from a given input.
* Named entity recognition (NER): label each word with the entity it represents (person, date, location, etc.).
* Question answering: given a context and a question, extract the answer from the context.


**Audio**:
* Audio classification: assign a label to a given segment of audio.
* Automatic speech recognition (ASR): transcribe audio data into text.

**Multimodal**:
* Visual question answering (VQA): answer open-ended questions about an image.
* Image-to-text: predict a caption for a given image.
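Extractive question answering, as listed above, returns a span copied from the context. A common way to do this is for the model to score every token as a possible answer start and as a possible answer end, and then pick the highest-scoring valid span. The sketch below illustrates only that selection step, using made-up scores; the tokens, the logits, and the `best_span` helper are hypothetical, and the real `question-answering` pipeline adds further handling (for example, excluding question tokens and splitting long contexts).

```python
import numpy as np

def best_span(start_logits, end_logits, max_answer_len=15):
    """Return (start, end, score) of the best span with start <= end.

    A span's score is start_logits[start] + end_logits[end], the usual
    scoring for extractive QA heads. Hypothetical helper for illustration.
    """
    start_logits = np.asarray(start_logits, dtype=float)
    end_logits = np.asarray(end_logits, dtype=float)
    # scores[i, j] = score of a span starting at token i and ending at token j
    scores = start_logits[:, None] + end_logits[None, :]
    n = len(start_logits)
    i, j = np.indices((n, n))
    # Rule out spans that end before they start or are too long
    invalid = (j < i) | (j - i + 1 > max_answer_len)
    scores[invalid] = -np.inf
    start, end = np.unravel_index(np.argmax(scores), scores.shape)
    return int(start), int(end), float(scores[start, end])

# Hypothetical tokens and per-token scores
tokens = ["Bumblebee", "ordered", "an", "Optimus", "Prime", "figure"]
start_logits = [0.1, -1.0, 0.2, 3.0, 0.5, -0.5]
end_logits = [-0.2, -1.0, 0.0, 0.4, 2.5, 1.0]

s, e, score = best_span(start_logits, end_logits)
print(" ".join(tokens[s:e + 1]))  # Optimus Prime
```

Scoring start and end separately keeps the model output small (two numbers per token), while the span search itself is a cheap post-processing step.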

### Sentiment Analysis

In [None]:
from transformers import pipeline
sent_classifier = pipeline("sentiment-analysis")

In [None]:
sent_classifier("I am sad about today")

#### Using the tokenizer and model directly

In [None]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = 'nlptown/bert-base-multilingual-uncased-sentiment'

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

In [None]:
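import torch

# Spanish review: "The table is excellent, great value for money, highly recommended."
text = "La mesa buenisima relacion precio muy recomendable."

# Tokenize into PyTorch tensors (input_ids, attention_mask, ...)
inputs = tokenizer(text, padding=True, truncation=True, return_tensors="pt")
# Inference only, so skip gradient tracking
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits                          # shape: (batch_size, num_labels)
predicted_class = logits.argmax(dim=-1).item()  # index of the highest-scoring label

# Map the class index back to its human-readable label
print(model.config.id2label[predicted_class])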
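`argmax` picks the most likely class, but raw logits are not probabilities. The sentiment pipeline reports a `score` next to each label, and a softmax over the logits gives probabilities of that kind. A minimal NumPy sketch, assuming made-up logits for a five-class star-rating model; the `id2label` dict here is a hypothetical stand-in for `model.config.id2label`:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max to avoid overflow in exp
    exp = np.exp(z)
    return exp / exp.sum(axis=-1, keepdims=True)

# Hypothetical logits and label mapping, for illustration only
logits = np.array([-2.1, -1.5, 0.2, 1.3, 2.4])
id2label = {0: "1 star", 1: "2 stars", 2: "3 stars", 3: "4 stars", 4: "5 stars"}

probs = softmax(logits)
predicted_class = int(probs.argmax())  # softmax preserves the argmax of the logits
print(id2label[predicted_class], round(float(probs[predicted_class]), 3))
```

With the real model above, the equivalent would be `torch.softmax(outputs.logits, dim=-1)`.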

### Text Generation

If you want to see what other tasks are available, check out all the [pipeline tasks](https://huggingface.co/docs/transformers/main/en/main_classes/pipelines#the-task-specific-pipelines) in the docs.

In [None]:
from transformers import pipeline
generator = pipeline("text-generation")

In [None]:
generator("Once upon a time,")

In [None]:
generator("In this course, we will teach you how to", max_length=200, truncation=False)
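A text-generation model writes one token at a time: it predicts a distribution over the next token, appends a choice, and repeats until it hits a length limit or an end-of-sequence token. The toy sketch below shows greedy decoding (always pick the most likely next token), with a hypothetical hand-written bigram table standing in for the model. In `transformers`, `max_length` counts the prompt tokens as well as the newly generated ones, and the loop below mirrors that. Note that the pipeline's default model may sample instead of decoding greedily, depending on its generation settings.

```python
# Hypothetical next-token probabilities: token -> {next_token: probability}
BIGRAMS = {
    "Once": {"upon": 0.9, "more": 0.1},
    "upon": {"a": 0.95, "the": 0.05},
    "a": {"time": 0.7, "hill": 0.3},
    "time": {",": 0.6, "<eos>": 0.4},
    ",": {"there": 0.8, "a": 0.2},
    "there": {"was": 0.9, "<eos>": 0.1},
    "was": {"<eos>": 1.0},
}

def greedy_generate(prompt_tokens, max_length=10, eos="<eos>"):
    """Greedy decoding: repeatedly append the most probable next token.

    max_length counts prompt + generated tokens.
    """
    tokens = list(prompt_tokens)
    while len(tokens) < max_length:
        next_probs = BIGRAMS.get(tokens[-1])
        if not next_probs:
            break  # the toy "model" has no prediction for this token
        next_token = max(next_probs, key=next_probs.get)
        if next_token == eos:
            break  # stop at end-of-sequence
        tokens.append(next_token)
    return tokens

print(greedy_generate(["Once"]))
print(greedy_generate(["Once"], max_length=3))
```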

### Named Entity Recognition

In [None]:
sample_text = """Dear Amazon, last week I ordered an Optimus Prime action figure \
from your online store in Germany. Unfortunately, when I opened the package, \
I discovered to my horror that I had been sent an action figure of Megatron \
instead! As a lifelong enemy of the Decepticons, I hope you can understand my \
dilemma. To resolve the issue, I demand an exchange of Megatron for the \
Optimus Prime figure I ordered. Enclosed are copies of my records concerning \
this purchase. I expect to hear from you soon. Sincerely, Bumblebee."""

In [None]:
from transformers import pipeline
import pandas as pd

ner_tagger = pipeline("ner", aggregation_strategy="simple")
outputs = ner_tagger(sample_text)
pd.DataFrame(outputs)
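With `aggregation_strategy="simple"`, the NER pipeline groups consecutive token predictions into whole entities (the `entity_group` and `word` columns in the DataFrame) instead of returning one row per subword token. The sketch below illustrates the grouping idea on hypothetical token-level tags in BIO format (`B-` begins an entity, `I-` continues it, `O` is outside any entity). It is a simplified rule for illustration, not the library's implementation, which also combines scores and handles subword pieces.

```python
def group_bio(tokens, tags):
    """Merge token-level BIO tags into (entity_type, text) spans.

    Simplified rule: an I- tag continues the current entity only if its type
    matches; otherwise it starts a new entity. "O" closes any open entity.
    """
    entities = []
    current_type, current_words = None, []
    for token, tag in zip(tokens, tags):
        if tag == "O":
            if current_type:
                entities.append((current_type, " ".join(current_words)))
            current_type, current_words = None, []
            continue
        prefix, ent_type = tag.split("-", 1)
        if prefix == "I" and ent_type == current_type:
            current_words.append(token)  # continue the open entity
        else:
            if current_type:
                entities.append((current_type, " ".join(current_words)))
            current_type, current_words = ent_type, [token]  # start a new entity
    if current_type:
        entities.append((current_type, " ".join(current_words)))
    return entities

# Hypothetical tokens and tags, for illustration only
tokens = ["Dear", "Amazon", ",", "I", "ordered", "an", "Optimus", "Prime", "figure"]
tags = ["O", "B-ORG", "O", "O", "O", "O", "B-MISC", "I-MISC", "O"]
print(group_bio(tokens, tags))  # [('ORG', 'Amazon'), ('MISC', 'Optimus Prime')]
```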

### Summarization

The summarization cell below reuses the `sample_text` defined in the Named Entity Recognition section.

In [None]:
from transformers import pipeline

summarizer = pipeline("summarization")
outputs = summarizer(sample_text, max_length=45, clean_up_tokenization_spaces=True)
print(outputs[0]['summary_text'])
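`clean_up_tokenization_spaces=True` asks the tokenizer to tidy up spaces that decoding can leave before punctuation and English contractions (for example `Megatron .` or `did n't`). The following is a simplified re-implementation for illustration only, assuming a small hand-picked list of replacements; it is not the library's exact rule set.

```python
# Simplified, hypothetical subset of the clean-up rules
REPLACEMENTS = [
    (" .", "."), (" ,", ","), (" !", "!"), (" ?", "?"),
    (" n't", "n't"), (" 's", "'s"), (" 're", "'re"), (" 've", "'ve"), (" 'm", "'m"),
]

def clean_up_spaces(text):
    """Remove spaces that detokenization leaves before punctuation and contractions."""
    for old, new in REPLACEMENTS:
        text = text.replace(old, new)
    return text

print(clean_up_spaces("Bumblebee did n't get Optimus Prime , he got Megatron ."))
# Bumblebee didn't get Optimus Prime, he got Megatron.
```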

### Multimodal

In [None]:
from IPython.display import Image

# Display the image inline from its URL (IPython's Image widget, not a PIL image)
imagepic = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
Image(imagepic)

In [None]:
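from transformers import pipeline

# The pipeline picks a default VQA checkpoint; `image` can be a URL, a local path, or a PIL image
vqa_pipeline = pipeline("visual-question-answering")
vqa = vqa_pipeline(image=imagepic, question="What is the weather like?")
# Try another question: vqa_pipeline(image=imagepic, question="What color are the bushes?")
vqa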
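The visual-question-answering pipeline returns a list of candidate answers, each a dict with an `answer` and a `score`, so the result can be post-processed like any Python list. A small sketch using a hypothetical result (the answers and scores below are made up, not real model output) that keeps only answers above a chosen confidence threshold; the `confident_answers` helper and the `0.1` threshold are illustrative assumptions:

```python
def confident_answers(candidates, threshold=0.1):
    """Return answers whose score is at least `threshold`, best first."""
    ranked = sorted(candidates, key=lambda c: c["score"], reverse=True)
    return [c["answer"] for c in ranked if c["score"] >= threshold]

# Hypothetical pipeline output, in the same list-of-dicts shape
vqa_result = [
    {"score": 0.62, "answer": "sunny"},
    {"score": 0.21, "answer": "clear"},
    {"score": 0.05, "answer": "cloudy"},
]
print(confident_answers(vqa_result))  # ['sunny', 'clear']
```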

### Text Embeddings using Transformers

In [None]:
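from transformers import pipeline

checkpoint = "facebook/bart-base"
# Name the pipeline `extractor` so the `pipeline()` factory function is not shadowed
extractor = pipeline("feature-extraction", framework="pt", model=checkpoint)
text = "Transformers is an awesome library!"

In [None]:
# return_tensors=True returns a tensor of shape (1, num_tokens, hidden_size);
# averaging over the token axis (mean pooling) gives one vector for the whole text
embeddings = extractor(text, return_tensors=True)[0].numpy().mean(axis=0)
embeddings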
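Averaging over `axis=0` is mean pooling: one vector per text, built from the per-token vectors. A plain mean works for a single unpadded sentence, but when several texts are batched with `padding=True`, the padding positions should be excluded using the attention mask. A NumPy sketch, assuming hypothetical random token vectors of shape `(batch, num_tokens, hidden_size)` and a hand-written mask:

```python
import numpy as np

def masked_mean_pool(token_embeddings, attention_mask):
    """Average token vectors per sequence, ignoring positions where mask == 0.

    token_embeddings: (batch, num_tokens, hidden_size)
    attention_mask:   (batch, num_tokens), 1 for real tokens, 0 for padding
    """
    mask = np.asarray(attention_mask, dtype=float)[..., None]  # (batch, tokens, 1)
    summed = (token_embeddings * mask).sum(axis=1)              # (batch, hidden_size)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)              # avoid division by zero
    return summed / counts

rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(2, 4, 8))  # hypothetical: 2 texts, 4 tokens, 8 dims
attention_mask = np.array([[1, 1, 1, 1],
                           [1, 1, 0, 0]])      # second text ends with 2 padding tokens
pooled = masked_mean_pool(token_embeddings, attention_mask)
print(pooled.shape)  # (2, 8)
```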

### Text Embeddings using Sentence Transformers

There are many embedding models. The [all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) model is generally recommended as a good all-round model, and [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) is a more lightweight alternative. For a comprehensive comparison of embedding models, see the [Massive Text Embedding Benchmark (MTEB) leaderboard](https://huggingface.co/spaces/mteb/leaderboard).

In [None]:
from sentence_transformers import SentenceTransformer
modelst = SentenceTransformer('paraphrase-MiniLM-L6-v2')
sentence = ['It is a rainy and snowy day in Chicago']
embedding = modelst.encode(sentence)
embedding.shape

In [None]:
embedding
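Sentence embeddings are usually compared with cosine similarity: sentences with similar meaning should have vectors pointing in similar directions. A minimal NumPy sketch; the short vectors below are hypothetical stand-ins for rows returned by `modelst.encode(...)` (which, for this MiniLM model, have 384 dimensions). The `sentence_transformers` package also provides `util.cos_sim` for the same computation.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between every row of a and every row of b."""
    a = np.atleast_2d(np.asarray(a, dtype=float))
    b = np.atleast_2d(np.asarray(b, dtype=float))
    # Normalize rows to unit length, then a dot product gives the cosine
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

# Hypothetical 3-dimensional "embeddings", for illustration only
rainy_chicago = [0.9, 0.1, 0.0]
snowy_city = [0.8, 0.3, 0.1]
stock_prices = [0.0, 0.2, 0.9]

sims = cosine_similarity(rainy_chicago, [snowy_city, stock_prices])
print(sims)  # first value (similar topics) is much larger than the second
```

With real embeddings, the same function can be called as `cosine_similarity(modelst.encode(sentences_a), modelst.encode(sentences_b))`.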