# Test your setup

Run the following simple notebook to test your setup. Each cell performs a different task: sentiment analysis, text summarization, image classification, and image captioning. Feel free to experiment with each of them.
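Before running the cells, it can help to confirm that the packages they import are installed. The following is a minimal sketch using only the standard library. The package list is an assumption inferred from the imports and outputs in this notebook (`transformers`, `torch`, and `PIL` for image loading), and `check_packages` is a hypothetical helper name.

```python
import importlib.util

# Assumed list of packages, inferred from the imports used in this notebook.
REQUIRED = ["transformers", "torch", "PIL"]


def check_packages(names):
    """Return a mapping of package name -> True if the package can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_packages(REQUIRED)
for name, found in status.items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

Any package reported as missing would need to be installed before the corresponding cells can run.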

In [1]:
from transformers import pipeline

# Sentiment analysis
text = "I like Saturdays"

sentiment_analysis = pipeline("sentiment-analysis")

result = sentiment_analysis(text)

print(result)

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0


[{'label': 'POSITIVE', 'score': 0.9990044236183167}]
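The log above warns against using a pipeline without naming a model and revision. A minimal sketch of pinning both, using the identifiers reported in that log, might look like the following. `build_sentiment_pipeline` and `top_label` are hypothetical helper names, and the import is deferred so that defining the helper does not load or download anything.

```python
# Model name and revision copied from the warning printed by the default pipeline.
MODEL_NAME = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
REVISION = "714eb0f"


def build_sentiment_pipeline():
    """Create a sentiment-analysis pipeline pinned to a model and revision.

    Hypothetical helper for illustration. The import happens inside the
    function so that nothing is loaded until the helper is called.
    """
    from transformers import pipeline

    return pipeline("sentiment-analysis", model=MODEL_NAME, revision=REVISION)


def top_label(result):
    """Return (label, score) from a result shaped like the one printed above."""
    first = result[0]
    return first["label"], first["score"]


# Works on the output shown above without re-running the model.
print(top_label([{"label": "POSITIVE", "score": 0.9990044236183167}]))
```

Calling `build_sentiment_pipeline()(text)` should return the same kind of list of dictionaries as the cell above, without the "No model was supplied" warning.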


In [2]:
# Text summarization

text = """ 
Marley was dead, to begin with. There is no doubt whatever about that. The register of his burial was signed by the clergyman, the clerk, the undertaker, 
and the chief mourner. Scrooge signed it. And Scrooge's name was good upon 'Change for anything he chose to put his hand to. Old Marley was as dead as a door-nail.
"""

summarization = pipeline('summarization')
result = summarization(text, min_length=5, max_length=40)

print(result)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


Device set to use cpu


[{'summary_text': " The register of his burial was signed by the clergyman, the clerk, the undertaker, and the chief mourner . Scrooge's name was good upon 'Change"}]


In [5]:
# Image classification 

image = 'https://images.ctfassets.net/i04syw39vv9p/4fCAgc3QtfScIGpQiC9xcY/f54afebdbb96b710f66be360c3d85889/Top-5-Favorite-Mom-and-Cub-Facts-02.jpeg'

image_classifier = pipeline('image-classification')

result = image_classifier(image)

print(result)

No model was supplied, defaulted to google/vit-base-patch16-224 and revision 3f49326 (https://huggingface.co/google/vit-base-patch16-224).
Using a pipeline without specifying a model name and revision in production is not recommended.
Fast image processor class <class 'transformers.models.vit.image_processing_vit_fast.ViTImageProcessorFast'> is available for this model. Using slow image processor class. To use the fast image processor class set `use_fast=True`.
Device set to use cuda:0


[{'label': 'ice bear, polar bear, Ursus Maritimus, Thalarctos maritimus', 'score': 0.9973016977310181}, {'label': 'white wolf, Arctic wolf, Canis lupus tundrarum', 'score': 0.0006416419055312872}, {'label': 'Arctic fox, white fox, Alopex lagopus', 'score': 0.0005438799853436649}, {'label': 'brown bear, bruin, Ursus arctos', 'score': 0.000208047466003336}, {'label': 'American black bear, black bear, Ursus americanus, Euarctos americanus', 'score': 0.00011899494711542502}]
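The classifier returns a list of dictionaries, each with a long comma-separated label and a score. As a small sketch of how such a result could be post-processed, the snippet below keeps only the predictions above a threshold and shortens each label to its first name. The `summarize` helper and the 0.5 threshold are illustrative assumptions, not part of the library.

```python
# First three entries copied from the output above.
result = [
    {"label": "ice bear, polar bear, Ursus Maritimus, Thalarctos maritimus",
     "score": 0.9973016977310181},
    {"label": "white wolf, Arctic wolf, Canis lupus tundrarum",
     "score": 0.0006416419055312872},
    {"label": "Arctic fox, white fox, Alopex lagopus",
     "score": 0.0005438799853436649},
]


def summarize(predictions, threshold=0.5):
    """Return (short_name, score) pairs for predictions scoring above threshold."""
    kept = []
    for prediction in predictions:
        if prediction["score"] > threshold:
            # Labels list several synonyms; keep only the first one.
            short_name = prediction["label"].split(",")[0].strip()
            kept.append((short_name, prediction["score"]))
    return kept


confident = summarize(result)
print(confident)
```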


In [19]:
# Image to text (captioning)

image_to_text = pipeline('image-to-text')

result = image_to_text(image)

print(result)

No model was supplied, defaulted to ydshieh/vit-gpt2-coco-en and revision 5bebf1e (https://huggingface.co/ydshieh/vit-gpt2-coco-en).
Using a pipeline without specifying a model name and revision in production is not recommended.
loading configuration file config.json from cache at /home/cmlee/.cache/huggingface/hub/models--ydshieh--vit-gpt2-coco-en/snapshots/5bebf1e9bb163535699a3c53fe47859fa088791c/config.json
Model config VisionEncoderDecoderConfig {
  "_name_or_path": "ydshieh/vit-gpt2-coco-en",
  "architectures": [
    "VisionEncoderDecoderModel"
  ],
  "bos_token_id": 50256,
  "decoder": {
    "_attn_implementation_autoset": false,
    "_name_or_path": "",
    "activation_function": "gelu_new",
    "add_cross_attention": true,
    "architectures": [
      "GPT2LMHeadModel"
    ],
    "attn_pdrop": 0.1,
    "bad_words_ids": null,
    "begin_suppress_tokens": null,
    "bos_token_id": 50256,
    "chunk_size_feed_forward": 0,
    "cross_attention_hidden_size": null,
    "decoder_start

[{'generated_text': 'a man with a beard and glasses '}]


In [None]:
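from transformers import AutoModelForVision2Seq, AutoImageProcessor, AutoTokenizer
from transformers.image_utils import load_image
import torch

# Same checkpoint the image-to-text pipeline defaulted to above
model_name = 'ydshieh/vit-gpt2-coco-en'

# Load the image preprocessor, the captioning model (bfloat16 weights), and the tokenizer
image_processor = AutoImageProcessor.from_pretrained(model_name)
model = AutoModelForVision2Seq.from_pretrained(model_name, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)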


loading configuration file preprocessor_config.json from cache at /home/cmlee/.cache/huggingface/hub/models--ydshieh--vit-gpt2-coco-en/snapshots/5bebf1e9bb163535699a3c53fe47859fa088791c/preprocessor_config.json
size should be a dictionary on of the following set of keys: ({'height', 'width'}, {'shortest_edge'}, {'shortest_edge', 'longest_edge'}, {'longest_edge'}, {'max_height', 'max_width'}), got 224. Converted to {'height': 224, 'width': 224}.
Image processor ViTImageProcessor {
  "do_convert_rgb": null,
  "do_normalize": true,
  "do_rescale": true,
  "do_resize": true,
  "image_mean": [
    0.5,
    0.5,
    0.5
  ],
  "image_processor_type": "ViTImageProcessor",
  "image_std": [
    0.5,
    0.5,
    0.5
  ],
  "resample": 2,
  "rescale_factor": 0.00392156862745098,
  "size": {
    "height": 224,
    "width": 224
  }
}

In [None]:
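# Load a local image. The second assignment overrides the first,
# so only the polar bear image is used in the cells below.
image = load_image('images/einstein.jpg')
image = load_image('images/polar_bears.jpg')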

print(image)

<PIL.Image.Image image mode=RGB size=660x495 at 0x7E8E16AB9460>


In [49]:

# Preprocess the image into a batch of PyTorch tensors
image_info = image_processor(images=[image], return_tensors='pt')

print(image_info)

{'pixel_values': tensor([[[[0.3333, 0.3490, 0.3569,  ..., 0.3725, 0.3647, 0.3569],
          [0.3333, 0.3490, 0.3569,  ..., 0.3725, 0.3725, 0.3647],
          [0.3333, 0.3490, 0.3569,  ..., 0.3725, 0.3725, 0.3725],
          ...,
          [0.6000, 0.6157, 0.6157,  ..., 0.2863, 0.3412, 0.4118],
          [0.6235, 0.6314, 0.6471,  ..., 0.2471, 0.2314, 0.2314],
          [0.6471, 0.6471, 0.6706,  ..., 0.3725, 0.3490, 0.3333]],

         [[0.4745, 0.4745, 0.4667,  ..., 0.4824, 0.4745, 0.4667],
          [0.4745, 0.4745, 0.4667,  ..., 0.4824, 0.4824, 0.4745],
          [0.4745, 0.4745, 0.4667,  ..., 0.4824, 0.4824, 0.4824],
          ...,
          [0.6000, 0.6157, 0.6157,  ..., 0.3412, 0.3961, 0.4667],
          [0.6235, 0.6314, 0.6392,  ..., 0.3176, 0.2941, 0.2941],
          [0.6471, 0.6471, 0.6627,  ..., 0.4118, 0.3882, 0.3804]],

         [[0.5529, 0.5529, 0.5529,  ..., 0.5686, 0.5608, 0.5529],
          [0.5529, 0.5529, 0.5529,  ..., 0.5686, 0.5686, 0.5608],
          [0.5529, 0.5529
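The printed pixel values follow from the `ViTImageProcessor` settings shown earlier: each channel value is multiplied by `rescale_factor` and then normalized with `image_mean` and `image_std`. The sketch below reproduces only that arithmetic for a single channel value. It skips the resize step, and `normalize_pixel` is a hypothetical helper, not a library function.

```python
# Values copied from the ViTImageProcessor config printed earlier.
RESCALE_FACTOR = 0.00392156862745098  # equal to 1 / 255
IMAGE_MEAN = 0.5
IMAGE_STD = 0.5


def normalize_pixel(value):
    """Map an 8-bit channel value (0-255) through rescaling and normalization."""
    rescaled = value * RESCALE_FACTOR            # do_rescale: 0..255 -> 0..1
    return (rescaled - IMAGE_MEAN) / IMAGE_STD   # do_normalize: 0..1 -> -1..1


for value in (0, 170, 255):
    print(value, round(normalize_pixel(value), 4))
```

Under these settings a channel value of 170 maps to about 0.3333, which is consistent with the first entries of the tensor printed above.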

In [50]:
# Generate caption token IDs from the preprocessed pixel values
result = model.generate(pixel_values=image_info.pixel_values)

print(result)

tensor([[50256, 11545, 13559, 13062,   389,  5586,   319,   257, 46742,  4417,
           220, 50256]])


In [51]:
# Convert the token IDs back to text, dropping special tokens
generated_text = tokenizer.decode(result[0], skip_special_tokens=True)

print(generated_text)

two polar bears are sitting on a snowy surface 
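To see what `skip_special_tokens=True` removes, compare the generated IDs with the decoded caption: the sequence starts and ends with 50256, which the model config printed earlier lists as `bos_token_id`. The snippet below is a conceptual sketch in plain Python, not the tokenizer's actual implementation. Treating 50256 as the only special token is an assumption, and `strip_special` is a hypothetical helper.

```python
# Token IDs copied from the generate() output above.
token_ids = [50256, 11545, 13559, 13062, 389, 5586, 319, 257, 46742, 4417, 220, 50256]

# Assumption for this sketch: 50256 (bos_token_id in the config) is the only special token.
SPECIAL_IDS = {50256}


def strip_special(ids, special_ids):
    """Return the IDs with any special-token IDs removed, preserving order."""
    return [token_id for token_id in ids if token_id not in special_ids]


content_ids = strip_special(token_ids, SPECIAL_IDS)
print(len(token_ids), "->", len(content_ids))
```

The remaining IDs are what the tokenizer turns into the caption. The final ID, 220, is likely the token responsible for the trailing space in the decoded text.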
