# IS4487 Week 14 - Practice Code

In this notebook, you will perform image processing in two ways:
1. Using Python libraries for image-to-text processing
2. Using an API to call a web service that processes the image (in this case, Gemini)

## Business Application
Image recognition is commonly used to:
- Flag offensive content
- Tag content for user search
- Catalog and sort images

<a href="https://colab.research.google.com/github/Stan-Pugsley/is_4487_base/blob/main/Reading-PracticeScripts/week14_api.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Approach #1: Image recognition with Python

Hugging Face, often called the "GitHub for machine learning", is a collaborative platform and community for machine learning (ML) that provides open-source AI models, datasets, and tools. It makes advanced AI, such as natural language processing (NLP) and computer vision, more accessible to developers by offering a central hub for sharing, discovering, and using pre-trained models and datasets with a few lines of code.

The Hugging Face ViTImageProcessor is a class in the Hugging Face Transformers library that prepares images for a Vision Transformer (ViT) model. It handles preprocessing steps such as resizing, normalizing, and formatting images to meet the model's input requirements. In this notebook, the ViT encoder is paired with a GPT-2 text decoder (the nlpconnect/vit-gpt2-image-captioning model), so the output is a short caption describing the image.
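
To make the rescale and normalize steps concrete, the sketch below applies them to a tiny hand-written image in plain Python. The per-channel mean and standard deviation of 0.5 are assumptions typical of ViT checkpoints, not values read from this model's configuration. The real processor also resizes the image to a fixed size, which is omitted here. The function name `preprocess_pixels` is hypothetical.

```python
# Illustrative only: mimics the rescale, normalize, and channels-first steps
# of an image processor on a tiny image. MEAN and STD are assumed values.
MEAN = (0.5, 0.5, 0.5)
STD = (0.5, 0.5, 0.5)

def preprocess_pixels(image_rows):
    """Convert rows of (R, G, B) tuples in 0-255 to channels-first floats."""
    channels = [[], [], []]  # one 2-D plane per colour channel
    for row in image_rows:
        for c in range(3):
            plane_row = []
            for pixel in row:
                scaled = pixel[c] / 255.0                      # rescale to [0, 1]
                plane_row.append((scaled - MEAN[c]) / STD[c])  # normalize
            channels[c].append(plane_row)
    return channels

# A 1x2 image: one black pixel and one white pixel
tiny_image = [[(0, 0, 0), (255, 255, 255)]]
normalized = preprocess_pixels(tiny_image)
print(normalized)  # black maps to -1.0 and white to 1.0 in every channel
```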

### Import libraries

In [None]:
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer
from PIL import Image
import requests
import torch

# Set up the model, image processor, and tokenizer
model = VisionEncoderDecoderModel.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
processor = ViTImageProcessor.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
tokenizer = AutoTokenizer.from_pretrained("nlpconnect/vit-gpt2-image-captioning")

### Select an image to be captioned
There are three examples below:
- Balloons
- A stone sculpture
- Abstract art

Un-comment the one you would like to use

In [None]:
IMAGE_URL = "https://target.scene7.com/is/image/Target/GUEST_1616af60-ab9a-44cb-93ce-2de272f1d252?wid=1200&hei=1200&qlt=80"
#IMAGE_URL = "https://images.squarespace-cdn.com/content/v1/6150da9bc04b0a138b3c0600/1634528500503-V7KPRTKGCRB73IY6IKB9/Stone-Circle.jpg"
#IMAGE_URL = "https://www.pacegallery.com/media/images/16_9-2.width-2000.png"

### Generate the caption

In [None]:
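# Download the image and stop early on HTTP errors.
# convert("RGB") guards against images with an alpha channel or grayscale mode.
resp = requests.get(IMAGE_URL, stream=True, timeout=10)
resp.raise_for_status()
image = Image.open(resp.raw).convert("RGB")

# Preprocess the image, generate caption token IDs, and decode them to text
pixel_values = processor(images=image, return_tensors="pt").pixel_values
output_ids = model.generate(pixel_values, max_length=50)
caption = tokenizer.decode(output_ids[0], skip_special_tokens=True)

print("Caption:", caption)
image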
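
One of the business applications listed earlier is tagging content for user search. As a rough sketch, a caption string can be reduced to keyword tags by dropping common filler words. The stop-word list and the function name `caption_to_tags` are illustrative assumptions, and the sample caption is made up rather than actual model output.

```python
import re

# Small, hand-picked stop-word list (an assumption; real systems use larger lists)
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "at", "with", "and", "is", "are", "to"}

def caption_to_tags(caption):
    """Return unique lowercase keywords from a caption, in order of appearance."""
    words = re.findall(r"[a-z]+", caption.lower())
    tags = []
    for word in words:
        if word not in STOP_WORDS and word not in tags:
            tags.append(word)
    return tags

# Made-up caption for illustration
print(caption_to_tags("a bunch of balloons floating in the air"))
```

In the notebook, the `caption` variable from the cell above could be passed in place of the made-up string.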

## Approach #2: Image recognition with an API to an AI service

### Import libraries

In [None]:
import requests, json, os
from google import generativeai as genai

### Get an API Key
- Go to https://aistudio.google.com/api-keys
- Click the Get API key link in the bottom left corner
- Copy the key and paste it into the code cell below

In [None]:
# Configure your API key.
# Note: configure() only stores the key; it is not checked until the first request.
API_KEY = '##Paste your API key here##'
genai.configure(api_key=API_KEY)
print("Gemini API key configured.")
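
Pasting a key directly into a notebook makes it easy to share the key by accident. A common alternative, sketched below, is to read it from an environment variable. The variable name `GEMINI_API_KEY` and the helper `load_api_key` are assumptions for illustration; nothing in this notebook sets that variable for you.

```python
import os

def load_api_key(env_var="GEMINI_API_KEY"):
    """Return the API key stored in an environment variable, or raise if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running this cell.")
    return key
```

The returned value could then be passed to `genai.configure(api_key=...)` in place of the pasted string.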

### Select an image to be classified
There are three examples below:
- Balloons
- A stone sculpture
- Abstract art

Un-comment the one you would like to use

In [None]:
IMAGE_URL = "https://target.scene7.com/is/image/Target/GUEST_1616af60-ab9a-44cb-93ce-2de272f1d252?wid=1200&hei=1200&qlt=80"
#IMAGE_URL = "https://images.squarespace-cdn.com/content/v1/6150da9bc04b0a138b3c0600/1634528500503-V7KPRTKGCRB73IY6IKB9/Stone-Circle.jpg"
#IMAGE_URL = "https://www.pacegallery.com/media/images/16_9-2.width-2000.png"

### Run the classification

In [None]:
# Get the image and its MIME type (the example URLs include both JPEG and PNG files)
resp = requests.get(IMAGE_URL, timeout=10)
resp.raise_for_status()
img_bytes = resp.content
mime_type = resp.headers.get("Content-Type", "image/jpeg").split(";")[0].strip()

# Call Gemini with the image and a classification prompt
response = genai.GenerativeModel(model_name="gemini-2.5-flash").generate_content(
    contents=[
        {"mime_type": mime_type, "data": img_bytes},
        "Identify the primary object in this image with a one-word label, and your confidence level between 0 and 1.",
    ]
)

print('This is my label for your photo, with the confidence level: ' + response.text)
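
Because `response.text` is free-form text, using the label and confidence in later code requires some parsing. The sketch below takes the first word as the label and the first number between 0 and 1 as the confidence. The helper name `parse_label_and_confidence` and the sample reply are hypothetical; Gemini's actual wording may differ, so a `None` result should be read as "could not parse".

```python
import re

def parse_label_and_confidence(text):
    """Extract (label, confidence) from a short reply; either may be None."""
    # Assumption: the reply starts with the label, so take the first alphabetic word
    label_match = re.search(r"[A-Za-z]+", text)
    label = label_match.group(0) if label_match else None

    # Take the first number that falls in the range 0 to 1
    confidence = None
    for token in re.findall(r"\d+(?:\.\d+)?", text):
        value = float(token)
        if 0.0 <= value <= 1.0:
            confidence = value
            break
    return label, confidence

# Made-up reply for illustration
print(parse_label_and_confidence("Balloons, 0.95"))
```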

## Other ideas
- Add new images
- Ask a true/false question like "Does this image contain violence?"
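
For the true/false idea, the same `generate_content` call could be reused with a different prompt, and the reply reduced to a Python boolean. The sketch below shows only the reply-handling step. The prompt wording and the helper `parse_true_false` are assumptions, and the sample replies are made up.

```python
# Hypothetical prompt; asking for a single word makes the reply easier to parse
QUESTION_PROMPT = "Does this image contain violence? Answer with only TRUE or FALSE."

def parse_true_false(text):
    """Map a reply to True or False, or None when it is neither."""
    answer = text.strip().strip(".!").lower()
    if answer in ("true", "yes"):
        return True
    if answer in ("false", "no"):
        return False
    return None  # ambiguous reply: review manually

# Made-up replies for illustration
print(parse_true_false("FALSE"))
print(parse_true_false("True."))
```

Returning `None` for anything unexpected keeps an unclear answer from being silently treated as "no", which matters when the question is about flagging content.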