<a href="https://colab.research.google.com/github/amanzoni1/prove_varie/blob/main/bigP.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Project Overview

Instance segmentation is a core computer vision task with applications ranging from autonomous driving to augmented reality. This project builds an image processing pipeline that segments people from images and replaces the background with images of cities or tourist spots. Potential uses include photography, virtual backgrounds for video conferencing, and creative media.

We use the COCO (Common Objects in Context) dataset, restricted to the ‘person’ category, and a Mask R-CNN (Mask Region-based Convolutional Neural Network) model to segment the individuals in each image. The goal is output of high enough quality for practical use.

## Key Objectives

1. **Develop and Train a Neural Network Model:**
   - Utilize a pre-trained Mask R-CNN model and fine-tune it for our specific task.

2. **Implement Instance Segmentation:**
   - Accurately segment people from images using advanced deep learning techniques.

3. **Background Replacement:**
   - Replace the original background with selected images of cities or tourist spots while maintaining the integrity of the foreground subject.

4. **Utilize the COCO Dataset:**
   - Work with a substantial subset of the COCO dataset containing images of people to train and validate our model effectively.

5. **Create an Interactive Pipeline:**
   - Develop a user-friendly interface within Colab for testing the model with custom images and backgrounds.

# Project Details

## Dataset

**COCO (Common Objects in Context) Dataset:**

- **Description:** A large-scale object detection, segmentation, and captioning dataset with over 200,000 labeled images and 80 object categories.
- **Usage in Project:** We focus on images containing the ‘person’ category: roughly 64,000 images (64,115 in the train2017 annotations), which are downloaded and stored in Google Drive for this project.

## Task

Develop a pipeline that can:

- **Segment individuals** in images with high accuracy.
- **Replace the background** while preserving the foreground subject’s details.
- **Maintain realistic blending** between the foreground and new background.
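
The three requirements above come down to per-pixel compositing: given a person mask, each output pixel is a mix of the original image and the new background. The following is a minimal sketch, assuming a soft mask with values in [0, 1] and two images already resized to the same shape. `composite` is a hypothetical helper name, not something defined elsewhere in this notebook.

```python
import numpy as np

def composite(foreground, background, mask):
    """Blend `foreground` over `background` using a soft mask.

    foreground, background: uint8 arrays of shape (H, W, 3)
    mask: float array of shape (H, W); 1.0 = person, 0.0 = background
    """
    if foreground.shape != background.shape:
        raise ValueError("foreground and background must have the same shape")
    # Clamp the mask and add a channel axis so it broadcasts over RGB
    alpha = np.clip(mask.astype(np.float32), 0.0, 1.0)[..., None]
    blended = (alpha * foreground.astype(np.float32)
               + (1.0 - alpha) * background.astype(np.float32))
    return np.rint(blended).astype(np.uint8)
```

With a hard 0/1 mask this is a plain cut-and-paste. Softening the mask edge before blending (for example with a small blur) is one possible way to get a smoother transition between subject and background.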

## Approach

1. **Exploratory Data Analysis (EDA):**
   - Understand the dataset’s structure and contents.
   - Visualize sample images and annotations to gain insights.

2. **Data Preparation:**
   - Implement a custom dataset class to load images and annotations.
   - Apply data transformations and augmentations to enhance model robustness.

3. **Model Setup:**
   - Initialize a pre-trained Mask R-CNN model.
   - Modify the model to suit our specific segmentation task.

4. **Model Training:**
   - Fine-tune the model using the prepared dataset.
   - Monitor training progress and optimize performance.

5. **Evaluation:**
   - Assess the model’s performance using appropriate metrics.
   - Visualize predictions to qualitatively evaluate segmentation quality.

6. **Background Replacement Pipeline:**
   - Develop functions to replace the background of segmented images.
   - Ensure seamless integration between the foreground and new background.

7. **Interactive Testing:**
   - Create an interface in Colab for users to upload images and select backgrounds.
   - Allow interactive testing of the segmentation and background replacement on uploaded images.

## Implementation

In [6]:
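# Install the COCO API before importing it
!pip install pycocotools

# Standard library
import os
import random
import time

# Numerics, plotting, and image loading
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

# PyTorch and torchvision
import torch
import torchvision
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

# COCO API and Colab utilities
from pycocotools.coco import COCO
from google.colab import drive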

In [7]:
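# Mount Google Drive before creating anything under /content/drive.
# The mount fails (see the error below) when the mount point already contains
# files, e.g. a folder created locally by os.makedirs before mounting.
from google.colab import drive
drive.mount('/content/drive')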

ValueError: Mountpoint must not already contain files
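
The error message states that the mount point must be empty. A small pre-check can make that condition visible before calling `drive.mount`. This is a sketch only: `mount_state` is a hypothetical helper, it mirrors just the condition named in the error message rather than Colab's full validation, and it assumes a mounted Drive shows up as a regular mount point.

```python
import os

def mount_state(path):
    """Classify `path` as 'mounted', 'clean', or 'blocked'.

    'mounted': something is already mounted at `path`
    'clean':   `path` does not exist or is an empty directory
    'blocked': `path` holds local files that would block a mount
    """
    if os.path.ismount(path):
        return "mounted"
    if not os.path.exists(path):
        return "clean"
    if os.path.isdir(path) and not os.listdir(path):
        return "clean"
    return "blocked"

# Example use in Colab (assumes `drive` was imported from google.colab):
# if mount_state('/content/drive') == 'clean':
#     drive.mount('/content/drive')
```

A `'blocked'` result would point to local files created under the mount point before mounting. Checking for `'mounted'` first matters, because files under an active mount are the real Drive contents and should not be cleaned up.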

In [2]:
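import os

# Path to the dataset directory in Google Drive.
# Run this only after Drive is mounted: creating the folder first leaves
# local files under /content/drive, which makes drive.mount() fail.
dataset_dir = '/content/drive/MyDrive/COCO_person_dataset'
os.makedirs(dataset_dir, exist_ok=True)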

In [3]:
from pycocotools.coco import COCO

# Download the annotations if you haven't already
# (-nc skips the download when the zip exists; -n keeps already-extracted files)
!wget -nc http://images.cocodataset.org/annotations/annotations_trainval2017.zip
!unzip -q -n annotations_trainval2017.zip -d annotations

# Initialize COCO API for instance annotations
coco = COCO('annotations/annotations/instances_train2017.json')

--2024-09-17 04:38:19--  http://images.cocodataset.org/annotations/annotations_trainval2017.zip
Resolving images.cocodataset.org (images.cocodataset.org)... 54.231.205.9, 3.5.0.61, 16.182.66.49, ...
Connecting to images.cocodataset.org (images.cocodataset.org)|54.231.205.9|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 252907541 (241M) [application/zip]
Saving to: ‘annotations_trainval2017.zip’


2024-09-17 04:38:44 (9.73 MB/s) - ‘annotations_trainval2017.zip’ saved [252907541/252907541]

loading annotations into memory...
Done (t=18.98s)
creating index...
index created!


In [4]:
# Get all category IDs
cat_ids = coco.getCatIds()

# Get all image IDs
img_ids = coco.getImgIds()

print(f"Number of categories: {len(cat_ids)}")
print(f"Number of images: {len(img_ids)}")

Number of categories: 80
Number of images: 118287


In [8]:
# Get category names
cats = coco.loadCats(cat_ids)
cat_names = [cat['name'] for cat in cats]
print(f"COCO Categories:\n{cat_names}")

COCO Categories:
['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush']


In [9]:
# Get category ID for 'person'
person_cat_id = coco.getCatIds(catNms=['person'])[0]

# Get all image IDs containing the 'person' category
person_img_ids = coco.getImgIds(catIds=[person_cat_id])

print(f"Number of images containing 'person': {len(person_img_ids)}")

Number of images containing 'person': 64115
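
Conceptually, filtering by category is a lookup over the `annotations` list of the COCO JSON file: every annotation carries an `image_id` and a `category_id`. The sketch below performs the equivalent filter on a plain dictionary in COCO format, to show which fields are involved. `image_ids_with_category` and the `toy` dataset are illustrative only and are not part of pycocotools or of this notebook.

```python
def image_ids_with_category(dataset, category_name):
    """Return sorted IDs of images with at least one annotation of the category."""
    cat_ids = {c["id"] for c in dataset["categories"] if c["name"] == category_name}
    img_ids = {a["image_id"] for a in dataset["annotations"]
               if a["category_id"] in cat_ids}
    return sorted(img_ids)

# A toy dataset with the same top-level keys as instances_train2017.json
toy = {
    "categories": [{"id": 1, "name": "person"}, {"id": 18, "name": "dog"}],
    "images": [
        {"id": 10, "file_name": "a.jpg"},
        {"id": 11, "file_name": "b.jpg"},
        {"id": 12, "file_name": "c.jpg"},
    ],
    "annotations": [
        {"id": 100, "image_id": 10, "category_id": 1},
        {"id": 101, "image_id": 10, "category_id": 1},   # second person, same image
        {"id": 102, "image_id": 11, "category_id": 18},  # dog only
        {"id": 103, "image_id": 12, "category_id": 1},
    ],
}

print(image_ids_with_category(toy, "person"))  # [10, 12]
```

An image with several people is counted once, which is why the number of ‘person’ images is smaller than the number of ‘person’ annotations.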


In [10]:
import requests
from tqdm.notebook import tqdm

In [12]:
import random

# Shuffle the 'person' image IDs computed above (optional), so that any
# truncated subset is a random sample
random.shuffle(person_img_ids)

# Limit the number of images if necessary
# For example, to download only 30,000 images:
# person_img_ids = person_img_ids[:30000]

# Download images
for img_id in tqdm(person_img_ids, desc='Downloading images'):
    img_info = coco.loadImgs(img_id)[0]
    img_file_name = img_info['file_name']
    img_url = img_info['coco_url']
    img_path = os.path.join(dataset_dir, img_file_name)

    # Skip images that already exist to avoid re-downloading
    if os.path.exists(img_path):
        continue
    try:
        response = requests.get(img_url, timeout=5)
        # Fail on HTTP errors instead of saving an error page as an image
        response.raise_for_status()
        with open(img_path, 'wb') as handler:
            handler.write(response.content)
    except (requests.RequestException, OSError) as e:
        print(f"Error downloading image {img_file_name}: {e}")

Downloading images:   0%|          | 0/64115 [00:00<?, ?it/s]
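
Fetching tens of thousands of images one at a time is dominated by network latency, so a thread pool is one option for speeding it up. The sketch below is not the notebook's method: `download_all` and its `fetch` parameter are hypothetical names, and the fetch function is passed in so the logic can be tried without network access. Each file is written to a temporary `.part` name and renamed afterwards, so an interrupted run does not leave a truncated image that the existence check would later skip.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def download_all(items, dest_dir, fetch, max_workers=8):
    """Download (file_name, url) pairs into dest_dir with a thread pool.

    fetch: callable taking a URL and returning bytes; it should raise on failure.
    Returns a list of (file_name, exception) for the items that failed.
    """
    os.makedirs(dest_dir, exist_ok=True)

    def worker(item):
        file_name, url = item
        path = os.path.join(dest_dir, file_name)
        if os.path.exists(path):
            return None  # already downloaded
        tmp_path = path + ".part"
        try:
            data = fetch(url)
            with open(tmp_path, "wb") as f:
                f.write(data)
            os.replace(tmp_path, path)  # rename only after a complete write
            return None
        except Exception as exc:
            if os.path.exists(tmp_path):
                os.remove(tmp_path)
            return (file_name, exc)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(worker, items))
    return [r for r in results if r is not None]

# Possible use in this notebook (assumes requests, coco, person_img_ids, dataset_dir):
# def fetch(url):
#     r = requests.get(url, timeout=10)
#     r.raise_for_status()
#     return r.content
# items = [(i['file_name'], i['coco_url']) for i in coco.loadImgs(person_img_ids)]
# failures = download_all(items, dataset_dir, fetch)
```

Returning the failures instead of printing them makes it straightforward to retry only the files that did not arrive.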

Number of images downloaded: 0
First 5 image files: []
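
The two output lines above look like the result of a verification cell that is not shown here. A sketch of what such a check might look like, assuming it simply lists the contents of `dataset_dir`; `summarize_downloads` is a hypothetical name:

```python
import os

def summarize_downloads(directory, extensions=(".jpg", ".jpeg", ".png")):
    """Return (number of image files, first five file names) for `directory`."""
    files = sorted(
        f for f in os.listdir(directory) if f.lower().endswith(extensions)
    )
    return len(files), files[:5]

# Possible use in this notebook (assumes dataset_dir is defined):
# count, first_files = summarize_downloads(dataset_dir)
# print(f"Number of images downloaded: {count}")
# print(f"First 5 image files: {first_files}")
```

A count of 0 would be consistent with the download loop not having completed, or with Drive not being mounted when the check ran.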
