# 🧥 Extracting Clothing Items from an Image

This notebook demonstrates how to use a pre-trained computer vision model to detect and extract clothing items from an image. 

We will use:
- `transformers` (from Hugging Face) to load a powerful pre-trained model.
- `Pillow` (PIL) to handle image manipulation, drawing boxes, and cropping.
- `requests` to download a sample image from the web.

The model we'll use is `vener-a/yolos-fashion-detection`, which is fine-tuned on the Fashionpedia dataset and can recognize dozens of clothing categories.

## 1. Installation

First, we need to install the required Python libraries. `torch` is the deep learning framework, `transformers` provides the model, `Pillow` handles image manipulation, and `requests` downloads the sample image.

In [None]:
!pip install transformers torch pillow requests

## 2. Load Model and Processor

Object detection models in the `transformers` library are used with two components:
1.  **Processor:** This formats the image (resizes, normalizes colors) so it's in the exact format the model expects.
2.  **Model:** This is the pre-trained neural network that performs the object detection.
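To make the "normalizes colors" step concrete, here is a minimal sketch of per-channel normalization in plain Python. It assumes the commonly used ImageNet mean and standard deviation; the values this checkpoint actually uses live in its preprocessor configuration and may differ. The name `normalize_pixel` is hypothetical and not part of `transformers`.

```python
# Assumed ImageNet channel statistics (an assumption for illustration;
# check the checkpoint's preprocessor config for the real values).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Rescale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple((value / 255.0 - m) / s for value, m, s in zip(rgb, mean, std))


# A mid-gray pixel maps to small positive values, a black pixel to negative ones.
normalized = normalize_pixel((128, 128, 128))
print([round(v, 3) for v in normalized])
print([round(v, 3) for v in normalize_pixel((0, 0, 0))])
```

The processor applies the same arithmetic to every pixel of the resized image, so we never have to do this by hand.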

In [None]:
from transformers import YolosImageProcessor, YolosForObjectDetection
from PIL import Image, ImageDraw, ImageFont
import requests
import torch
from io import BytesIO

# Define the model name from the Hugging Face Hub
model_name = "vener-a/yolos-fashion-detection"

# Load the processor and the model
print("Loading model and processor...")
processor = YolosImageProcessor.from_pretrained(model_name)
model = YolosForObjectDetection.from_pretrained(model_name)
print("Model and processor loaded successfully!")

## 3. Load a Sample Image

Let's grab a sample image from the web. To use your own photo instead, skip the download and open a local file directly (e.g., `image = Image.open('my_photo.jpg')`).

In [None]:
# URL for a sample image. Feel free to change this!
image_url = "https://images.pexels.com/photos/1036627/pexels-photo-1036627.jpeg?auto=compress&cs=tinysrgb&w=1260&h=750&dpr=1"

# Download and open the image
try:
    response = requests.get(image_url, timeout=30)
    response.raise_for_status()  # Raise an exception for bad status codes
    # Convert to RGB so the processor always receives three color channels
    image = Image.open(BytesIO(response.content)).convert("RGB")

    # Display a downscaled copy; thumbnail() preserves the aspect ratio
    print("Original Image:")
    preview = image.copy()
    preview.thumbnail((600, 600))
    display(preview)
except requests.exceptions.RequestException as e:
    print(f"Error downloading image: {e}")
    print("Please check the URL or try a different one.")

## 4. Run Detection

Now we pass the image through the processor and then the model to get the raw predictions (logits and bounding boxes).

In [None]:
# Process the image
inputs = processor(images=image, return_tensors="pt")

# Pass the processed image to the model.
# torch.no_grad() skips gradient tracking, which inference doesn't need.
print("Running detection...")
with torch.no_grad():
    outputs = model(**inputs)
print("Detection complete.")

# The model outputs raw logits and bounding boxes
# We'll process these in the next step

## 5. Post-Process and Visualize Results

The raw outputs from the model are tensors of class logits and normalized box coordinates. The `processor` has a built-in method (`post_process_object_detection`) that converts them into class labels, confidence scores, and pixel coordinates.

We'll set a `threshold` to filter out low-confidence detections.
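To show roughly what this post-processing involves, here is a simplified pure-Python sketch for a single prediction. It assumes the DETR-style convention that, to our understanding, YOLOS follows: a softmax over the logits, a trailing "no object" class that is ignored, and boxes given as normalized `(center_x, center_y, width, height)`. The helper names and the numbers are hypothetical; the library works on whole tensors at once.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def center_to_corners(box, image_width, image_height):
    """Convert a normalized (cx, cy, w, h) box to pixel (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * image_width,
        (cy - h / 2) * image_height,
        (cx + w / 2) * image_width,
        (cy + h / 2) * image_height,
    )


def decode_query(logits, box, image_width, image_height, threshold):
    """Return (score, label_id, pixel_box), or None if below the threshold."""
    probs = softmax(logits)[:-1]  # drop the trailing "no object" class
    score = max(probs)
    if score <= threshold:
        return None
    pixel_box = center_to_corners(box, image_width, image_height)
    return score, probs.index(score), pixel_box


# Hypothetical values: three real classes plus "no object", 1260x750 image.
detection = decode_query(
    logits=[0.1, 6.0, 0.2, 0.3],
    box=(0.5, 0.5, 0.2, 0.4),
    image_width=1260,
    image_height=750,
    threshold=0.9,
)
print(detection)
```

A prediction whose best real-class probability does not exceed the threshold is simply dropped, which is why raising the threshold shrinks the result list.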

In [None]:
# We need the original image size to scale the bounding boxes correctly.
# PIL reports (width, height); the processor expects (height, width), hence [::-1].
target_sizes = torch.tensor([image.size[::-1]])

# Post-process the outputs. We set a confidence threshold.
confidence_threshold = 0.9
results = processor.post_process_object_detection(outputs, threshold=confidence_threshold, target_sizes=target_sizes)[0]

# Create a copy of the image to draw on
image_with_boxes = image.copy()
draw = ImageDraw.Draw(image_with_boxes)

# Loop over all detected objects
for score, label_id, box in zip(results["scores"], results["labels"], results["boxes"]):
    # Get the bounding box coordinates
    box = [round(i, 2) for i in box.tolist()]
    
    # Get the human-readable label
    label = model.config.id2label[label_id.item()]
    
    # Format the score
    score_str = f"{score*100:.1f}%"
    
    print(f"Detected: {label} (Confidence: {score_str})")
    
    # Draw the bounding box
    draw.rectangle(box, outline="red", width=3)
    
    # Draw the label and score
    # Note: This doesn't use a specific font, so it will be basic.
    draw.text((box[0], box[1]), f"{label} ({score_str})", fill="red")

print("\nImage with Bounding Boxes:")
preview = image_with_boxes.copy()
preview.thumbnail((600, 600))  # Downscale for display, preserving aspect ratio
display(preview)

## 6. "Extract" (Crop) the Detected Items

Now that we have the coordinates (`box`) for each item, we can use Pillow's `crop()` function to extract each item into its own separate image.
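Predicted boxes can extend slightly past the image edges, and a tight crop sometimes cuts off part of a garment. One possible refinement, sketched below in plain Python, is to add a small margin and clip the box to the image bounds before cropping. The helper `clamp_box` and its `margin` parameter are hypothetical, not part of Pillow or `transformers`.

```python
def clamp_box(box, image_width, image_height, margin=0):
    """Expand a box by `margin` pixels, then clip it to the image bounds."""
    x0, y0, x1, y1 = box
    left = max(0, round(x0) - margin)
    top = max(0, round(y0) - margin)
    right = min(image_width, round(x1) + margin)
    bottom = min(image_height, round(y1) + margin)
    if right <= left or bottom <= top:
        return None  # degenerate box: nothing to crop
    return left, top, right, bottom


# Hypothetical box that spills past the top and right edges of a 1260x750 image.
safe_box = clamp_box((1200.4, -12.7, 1275.0, 300.2), 1260, 750, margin=10)
print(safe_box)
# With a PIL image in hand: cropped_item = image.crop(safe_box)
```

Without clipping, Pillow should still return a crop, but the area outside the image is filled with blank pixels, which is rarely what we want in an extracted item.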

In [None]:
print(f"Extracting {len(results['scores'])} items...\n")

for i, (score, label_id, box) in enumerate(zip(results["scores"], results["labels"], results["boxes"])):
    # Get the label name
    label = model.config.id2label[label_id.item()]
    
    # Get the coordinates
    box_coords = [round(c) for c in box.tolist()]
    
    # Crop the image using the bounding box
    cropped_item = image.crop(box_coords)
    
    print(f"--- Item {i+1}: {label} ---")
    cropped_item.thumbnail((150, 150))  # Downscale in place, preserving aspect ratio
    display(cropped_item)
    print("\n")

## 7. Conclusion

This notebook demonstrated the complete pipeline for clothing detection:
1.  **Installed** `transformers`, `torch`, `pillow`, and `requests`.
2.  **Loaded** a pre-trained fashion detection model (`YolosForObjectDetection`).
3.  **Processed** a sample image.
4.  **Ran** the model to get predictions.
5.  **Visualized** the results by drawing bounding boxes.
6.  **Extracted** each item by cropping it from the original image.

### Next Steps

- **Try your own images!** Change `image_url` to another image URL, or replace the download step with `Image.open('my_photo.jpg')` to load a local file.
- **Adjust the `confidence_threshold`:** A lower value (e.g., `0.7`) will show more detections, but more of them are likely to be false positives. A higher value (e.g., `0.95`) will show only very confident detections.
- **Explore segmentation:** This model performs *object detection* (boxes). More advanced models perform *instance segmentation*, which creates a pixel-perfect mask around each item. You can look for models like Mask R-CNN for that task.
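To get a feel for the threshold trade-off without re-running the model each time, one option is to post-process once with a low threshold and then filter the results in plain Python. The sketch below uses made-up labels and scores purely for illustration; `filter_detections` is a hypothetical helper.

```python
def filter_detections(detections, threshold):
    """Keep (label, score) pairs whose score exceeds the threshold."""
    return [(label, score) for label, score in detections if score > threshold]


# Hypothetical detections, as if post-processed once with a low threshold.
detections = [("dress", 0.97), ("shoe", 0.93), ("bag", 0.78), ("watch", 0.55)]

for threshold in (0.5, 0.7, 0.9, 0.95):
    kept = filter_detections(detections, threshold)
    labels = [label for label, _ in kept]
    print(f"threshold={threshold}: {len(kept)} kept -> {labels}")
```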