<a href="https://colab.research.google.com/github/cagBRT/promptEngineering/blob/main/1_HuggingFace.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

https://github.com/huggingface/notebooks/blob/main/transformers_doc/quicktour.ipynb

# Quick tour<br>
Get up and running with 🤗 Transformers! Start using the pipeline() for rapid inference, and quickly load a pretrained model and tokenizer with an AutoClass to solve your text, vision or audio task.

In the online documentation, code examples have a toggle on the top left for PyTorch and TensorFlow. Where there is no toggle, the code is expected to work for both backends without any change.

In [None]:
# Transformers installation
! pip install transformers datasets
# To install from source instead of the latest release, comment out the command above and uncomment the following one.
# ! pip install git+https://github.com/huggingface/transformers.git
     

In [None]:
!pip install -U datasets

# Pipeline<br>
pipeline() is the easiest way to use a pretrained model for a given task.



The pipeline() supports many common tasks out-of-the-box:

**Text:**

Sentiment analysis: classify the polarity of a given text.<br>
Text generation (in English): generate text from a given input.<br>
Named entity recognition (NER): label each word with the entity it represents (person, date, location, etc.).<br>
Question answering: extract the answer from the context, given some context and a question.<br>
Fill-mask: fill in the blank given a text with masked words.<br>
Summarization: generate a summary of a long sequence of text or document.<br>
Translation: translate text into another language.<br>
Feature extraction: create a tensor representation of the text.<br><br>

**Image:**

Image classification: classify an image.<br>
Image segmentation: classify every pixel in an image.<br>
Object detection: detect objects within an image.<br><br>

**Audio:**

Audio classification: assign a label to a given segment of audio.<br>
Automatic speech recognition (ASR): transcribe audio data into text.<br>

For more details about the pipeline() and its supported tasks, refer to the 🤗 Transformers pipeline documentation.
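
Each task listed above is selected by passing a short task identifier string as the first argument to pipeline(). The lookup table below is a convenience sketch and not part of the library: the names `TASK_IDS` and `task_id` are hypothetical, and the identifiers are the ones we believe 🤗 Transformers accepts, so check them against the pipeline documentation for your installed version.

```python
# Hypothetical lookup table: task name from the list above -> pipeline() task identifier.
# The identifiers are assumed to match the installed 🤗 Transformers version.
TASK_IDS = {
    "sentiment analysis": "sentiment-analysis",
    "text generation": "text-generation",
    "named entity recognition": "ner",
    "question answering": "question-answering",
    "fill-mask": "fill-mask",
    "summarization": "summarization",
    "translation": "translation",
    "feature extraction": "feature-extraction",
    "image classification": "image-classification",
    "image segmentation": "image-segmentation",
    "object detection": "object-detection",
    "audio classification": "audio-classification",
    "automatic speech recognition": "automatic-speech-recognition",
}


def task_id(name: str) -> str:
    """Return the task identifier for a task name, ignoring case and extra spaces."""
    key = " ".join(name.lower().split())
    try:
        return TASK_IDS[key]
    except KeyError:
        raise ValueError(f"Unknown task name: {name!r}") from None


print(task_id("Named entity recognition"))
```

The returned string is what you would pass to pipeline(), for example `pipeline(task_id("Summarization"))`.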

# Pipeline usage<br>
In the following example, you will use the pipeline() for sentiment analysis.

In [None]:
!pip install torch
!pip install tensorflow

In [None]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis")

The pipeline downloads and caches a default pretrained model and tokenizer for sentiment analysis. Now you can use the classifier on your target text:



In [None]:
classifier("We are very happy to show you the 🤗 Transformers library.")

To classify more than one sentence, pass a list of sentences to the pipeline(), which returns a list of dictionaries:



In [None]:
results = classifier(["We are very happy to show you the 🤗 Transformers library.", "We hope you don't hate it."])
for result in results:
    print(f"label: {result['label']}, with score: {round(result['score'], 4)}")
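
Because each result is a plain dictionary with `label` and `score` keys, ordinary Python is enough to post-process the output. The sketch below pairs each input sentence with its prediction and flags low-confidence ones. The helper name `summarize`, the 0.9 threshold, and the sample scores are made up for illustration; in practice you would pass the list returned by the classifier.

```python
from typing import Dict, List, Union

Result = Dict[str, Union[str, float]]


def summarize(texts: List[str], results: List[Result], min_score: float = 0.9) -> List[str]:
    """Pair each text with its prediction and mark scores below min_score."""
    if len(texts) != len(results):
        raise ValueError("texts and results must have the same length")
    lines = []
    for text, result in zip(texts, results):
        flag = "" if result["score"] >= min_score else " (low confidence)"
        lines.append(f"{result['label']:<8} {result['score']:.4f}{flag}  {text}")
    return lines


# Placeholder data in the same shape as the pipeline output (scores are illustrative).
sample_texts = ["We are very happy to show you the library.", "We hope you don't hate it."]
sample_results = [
    {"label": "POSITIVE", "score": 0.98},
    {"label": "NEGATIVE", "score": 0.55},
]

for line in summarize(sample_texts, sample_results):
    print(line)
```

A threshold like this is a judgment call; a sensible value depends on the model and on how costly a wrong label is for your application.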

The pipeline() can also iterate over an entire dataset. 

Create a pipeline() with the task you want to solve for and the model you want to use.

In [None]:
import torch
from transformers import pipeline

speech_recognizer = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")

Next, load a dataset (see the 🤗 Datasets Quick Start for more details) you'd like to iterate over. For example, let's load the MInDS-14 dataset:

In [None]:
from datasets import load_dataset, Audio

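# download_mode="force_redownload" ignores any cached copy and downloads the dataset again on every run.
dataset = load_dataset("PolyAI/minds14", name="en-US", split="train", download_mode="force_redownload")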

We need to make sure that the sampling rate of the dataset matches the sampling rate facebook/wav2vec2-base-960h was trained on.

In [None]:
dataset = dataset.cast_column("audio", Audio(sampling_rate=speech_recognizer.feature_extractor.sampling_rate))
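
The cast above asks 🤗 Datasets to resample each clip to the rate the model expects. As a rough illustration of what resampling means, here is a naive linear-interpolation resampler in plain Python. It is a sketch only: the function name `resample_linear` is hypothetical, the 8000 and 16000 rates are example values, and real audio libraries use higher-quality band-limited methods rather than this approach.

```python
from typing import List, Sequence


def resample_linear(samples: Sequence[float], src_rate: int, dst_rate: int) -> List[float]:
    """Resample a waveform by linear interpolation (illustration only)."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sampling rates must be positive")
    if not samples:
        return []
    # The clip duration stays the same, so the sample count scales with the rate ratio.
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate  # position in the source waveform
        left = int(pos)
        if left >= len(samples) - 1:
            out.append(float(samples[-1]))  # hold the last sample at the end
        else:
            frac = pos - left
            out.append(samples[left] * (1 - frac) + samples[left + 1] * frac)
    return out


waveform = [0.0, 1.0, 2.0, 3.0]
upsampled = resample_linear(waveform, src_rate=8000, dst_rate=16000)
print(len(waveform), "->", len(upsampled))
```

Feeding a model audio at the wrong rate is equivalent to playing it too fast or too slow, which is why the cast is done before inference.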
  

Audio files are automatically loaded and resampled when you access the "audio" column. Let's extract the raw waveform arrays of the first 4 samples and pass them as a list to the pipeline:



In [None]:
result = speech_recognizer(dataset[:4]["audio"])
print([d["text"] for d in result])
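
The cell above handles only the first four samples. One simple way to cover more data is to slice it into small chunks and call the recognizer once per chunk. The sketch below shows only that slicing idea: `iter_chunks` and `transcribe_in_chunks` are hypothetical helpers, and `fake_recognizer` is a stand-in for any callable that, like the pipeline above, takes a list and returns a list of dictionaries with a `text` key.

```python
from typing import Callable, Dict, Iterator, List, Sequence


def iter_chunks(items: Sequence, size: int) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def transcribe_in_chunks(
    recognizer: Callable[[list], List[Dict[str, str]]],
    audio_items: Sequence,
    chunk_size: int = 4,
) -> List[str]:
    """Call the recognizer once per chunk and collect the transcribed text."""
    texts: List[str] = []
    for chunk in iter_chunks(audio_items, chunk_size):
        texts.extend(d["text"] for d in recognizer(chunk))
    return texts


def fake_recognizer(chunk: list) -> List[Dict[str, str]]:
    """Stand-in that mimics the list-in, list-of-dicts-out shape; it does no real ASR."""
    return [{"text": f"clip-{item}"} for item in chunk]


print(transcribe_in_chunks(fake_recognizer, list(range(10)), chunk_size=4))
```

Small chunks keep memory use bounded; the library has its own batching options as well, which are worth checking in the pipeline documentation before relying on a hand-rolled loop.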



---






# Use another model and tokenizer in the pipeline<br>

The pipeline() can accommodate any model from the Model Hub, making it easy to adapt the pipeline() for other use-cases. For example, if you'd like a model capable of handling French text, use the tags on the Model Hub to filter for an appropriate model. The top filtered result returns a multilingual BERT model fine-tuned for sentiment analysis. Great, let's use this model!

In [None]:
model_name = "nlptown/bert-base-multilingual-uncased-sentiment"

Use the AutoModelForSequenceClassification and AutoTokenizer to load the pretrained model and its associated tokenizer (more on an AutoClass below):

In [None]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)


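If you are working with TensorFlow instead of PyTorch, load the same checkpoint with TFAutoModelForSequenceClassification. This cell reassigns model and tokenizer, so whichever loading cell runs last is the one the pipeline uses:

In [None]:
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

model = TFAutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

Then you can specify the model and tokenizer in the pipeline(), and apply the classifier on your target text: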

In [None]:
classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
classifier("Nous sommes très heureux de vous présenter la bibliothèque 🤗 Transformers.")
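
Unlike the default English classifier, this model is understood to label text with a star rating rather than POSITIVE or NEGATIVE. The sketch below assumes the labels look like "1 star" through "5 stars"; confirm this against the model card or your own output before relying on it. The helper names `stars_from_label` and `sentiment_bucket`, the bucket boundaries, and the sample score are all hypothetical.

```python
import re


def stars_from_label(label: str) -> int:
    """Parse a label such as '4 stars' into the integer 4 (assumed label format)."""
    match = re.fullmatch(r"\s*([1-5])\s+stars?\s*", label, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"Unexpected label format: {label!r}")
    return int(match.group(1))


def sentiment_bucket(stars: int) -> str:
    """Collapse a 1-5 star rating into three coarse buckets (arbitrary boundaries)."""
    if stars <= 2:
        return "negative"
    if stars == 3:
        return "neutral"
    return "positive"


# Placeholder result in the shape of one pipeline prediction (score is illustrative).
sample = {"label": "5 stars", "score": 0.73}
stars = stars_from_label(sample["label"])
print(stars, sentiment_bucket(stars))
```

When called on a single string, the classifier returns a list with one dictionary, so you would pass `result[0]["label"]` to `stars_from_label`.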

If you can't find a model for your use-case, you will need to fine-tune a pretrained model on your data. Take a look at the fine-tuning tutorial to learn how. Finally, after you've fine-tuned your pretrained model, please consider sharing it with the community on the Model Hub (see the model sharing tutorial) to democratize NLP for everyone! 🤗