# 🗑️ Waste Detection Project – EDA (Local Version)

Welcome to the **Exploratory Data Analysis (EDA)** notebook for the Waste Detection AI Workshop project. This version is designed for running locally (not in Google Colab).

### 📌 Objectives of this Notebook:
- Load and preview the garbage dataset
- Understand the dataset structure, classes, and sample distribution
- Visualize bounding boxes
- Identify missing files (unlabeled images or labels without images)

> This is the first step in a 4-part series:
> 1. EDA of Garbage Dataset (you are here)
> 2. Data Augmentation
> 3. Model Training & Inference
> 4. GUI Deployment

---

## 📥 Step 1: Download & Annotate the Dataset

1. Download the Garbage Dataset from this shared Google Drive link:  
   👉 [Download Dataset](https://drive.google.com/drive/u/1/folders/13C1MaoC5YKQlX1mRIE2nGY23gR9tfGeH)

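2. Use [https://www.makesense.ai](https://www.makesense.ai) or Roboflow to annotate **only the `glass` class**; roughly 50 images are missing this label. Export the annotations in YOLO format (one `.txt` file per image), which is the format the cells below read.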

3. Once complete:
   - Place all **images** in: `garbage-dataset/images/`
   - Place all **annotations** in: `garbage-dataset/labels/`

🎯 This manual annotation step is crucial to make sure our model correctly recognizes glass waste. Good luck, and enjoy the process!
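
Before moving on, it can help to confirm that the folder layout from step 3 is in place. The snippet below is a minimal sketch, not part of the original workshop code; the helper name `check_layout` is hypothetical, and it assumes the dataset folder sits next to this notebook.

```python
from pathlib import Path


def check_layout(root):
    """Return the expected sub-folders that are missing under `root`."""
    root = Path(root)
    # The notebook expects exactly these two sub-folders (see Step 1).
    return [name for name in ("images", "labels") if not (root / name).is_dir()]


# Assumption: the dataset folder is in the current working directory.
missing = check_layout("garbage-dataset")
print("Missing folders:", missing if missing else "none")
```

If either folder is reported as missing, fix the layout before running the remaining cells, since every later step reads from these two paths.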

## 📊 Step 2: Dataset Overview

In [None]:
import os

from glob import glob

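# Folder name must match Step 1 exactly (paths are case-sensitive on Linux)
base_path = "garbage-dataset"
image_dir = os.path.join(base_path, "images")
label_dir = os.path.join(base_path, "labels")

# Sort so the file order (and any index into it) is reproducible.
# Note: only .jpg images are picked up here.
image_files = sorted(glob(os.path.join(image_dir, "*.jpg")))
label_files = sorted(glob(os.path.join(label_dir, "*.txt")))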

print(f"Total images: {len(image_files)}")
print(f"Total label files: {len(label_files)}")
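
The cell above only counts files ending in `.jpg`, so images saved with another extension would be silently left out of the totals. As a rough cross-check, the sketch below tallies every file extension found in a folder. The helper name `count_extensions` is hypothetical, and the folder path is an assumption based on Step 1.

```python
from collections import Counter
from pathlib import Path


def count_extensions(folder):
    """Count files in `folder` by lower-cased extension (empty if folder is absent)."""
    folder = Path(folder)
    if not folder.is_dir():
        return Counter()
    return Counter(p.suffix.lower() for p in folder.iterdir() if p.is_file())


# Assumption: images live in garbage-dataset/images as described in Step 1.
print(count_extensions("garbage-dataset/images"))
```

If anything other than `.jpg` shows up, either convert those files or extend the glob pattern so they are included.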

## 📈 Step 3: Check Class Distribution

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Define the class names
classes = ['biological', 'cardboard', 'glass', 'metal', 'paper', 'plastic', 'trash']
class_counts = {i: 0 for i in range(len(classes))}

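# Count class occurrences (one bounding box per non-empty line)
for label_file in label_files:
    with open(label_file, 'r') as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue  # skip blank lines, e.g. a trailing newline
            class_id = int(parts[0])
            if class_id in class_counts:
                class_counts[class_id] += 1
            else:
                print(f"Unknown class id {class_id} in {label_file}")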

# Convert to DataFrame for plotting
df = pd.DataFrame({
    'Class': [classes[i] for i in class_counts],
    'Count': list(class_counts.values())
})

# Plot

plt.figure(figsize=(10, 6))
plt.bar(df['Class'], df['Count'], color='teal')
plt.title('Number of Bounding Boxes per Class')
plt.xlabel('Waste Class')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.grid(True)
plt.show()
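
The bar chart shows raw counts; expressing them as percentages and as a max-to-min ratio makes class imbalance easier to judge. The following is a minimal sketch with hypothetical helper names (`class_shares`, `imbalance_ratio`), and the counts in the example are made up for illustration, not taken from the dataset.

```python
def class_shares(counts):
    """Return each class's share of all boxes, in percent."""
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: 100.0 * n / total for name, n in counts.items()}


def imbalance_ratio(counts):
    """Ratio of the largest to the smallest non-empty class (NaN if all are empty)."""
    nonzero = [n for n in counts.values() if n > 0]
    return max(nonzero) / min(nonzero) if nonzero else float("nan")


# Made-up counts, for illustration only.
example = {"glass": 50, "metal": 150, "paper": 300}
for name, share in class_shares(example).items():
    print(f"{name}: {share:.1f}%")
print(f"Imbalance ratio: {imbalance_ratio(example):.1f}")
```

With the notebook's own data you could pass `dict(zip(df['Class'], df['Count']))` instead of the example dictionary.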

## 🖼️ Step 4: Visualize Sample Annotations

In [None]:
import cv2

# Draw YOLO Boxes

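def draw_yolo_boxes(img_path, label_path):
    """Draw YOLO boxes (class x_center y_center width height, all normalized) on an image."""
    image = cv2.imread(img_path)
    if image is None:  # cv2.imread returns None instead of raising on failure
        raise FileNotFoundError(f"Could not read image: {img_path}")
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    h, w, _ = image.shape

    with open(label_path, 'r') as f:
        for line in f:
            parts = line.split()
            if len(parts) != 5:
                continue  # skip blank or malformed lines
            cls_id, x, y, bw, bh = map(float, parts)
            # Convert normalized center/size to pixel corner coordinates
            x1 = int((x - bw / 2) * w)
            y1 = int((y - bh / 2) * h)
            x2 = int((x + bw / 2) * w)
            y2 = int((y + bh / 2) * h)
            cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)
            # Keep the class name inside the image when the box touches the top edge
            cv2.putText(image, classes[int(cls_id)], (x1, max(y1 - 10, 15)),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.6, (255, 0, 0), 2)
    return image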


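# Sample image and its label (same file name, .txt extension, in the labels folder)

sample_img = image_files[min(75, len(image_files) - 1)]  # stay in range for small datasets
sample_name = os.path.splitext(os.path.basename(sample_img))[0]
sample_label = os.path.join(label_dir, sample_name + ".txt")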

# Draw
annotated = draw_yolo_boxes(sample_img, sample_label)
plt.figure(figsize=(8,6))
plt.imshow(annotated)
plt.axis('off')
plt.title('Sample Image with Bounding Boxes')
plt.show()
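
The drawing function relies on converting a YOLO label (normalized center and size) into pixel corner coordinates. The sketch below isolates that arithmetic so it can be checked by hand. The function name `yolo_to_pixels` is hypothetical, and clamping to the image bounds is an added assumption that the cell above does not perform.

```python
def yolo_to_pixels(x, y, bw, bh, img_w, img_h):
    """Convert a normalized YOLO box (center x, center y, width, height)
    to pixel corners (x1, y1, x2, y2), clamped to the image bounds."""
    x1 = int((x - bw / 2) * img_w)
    y1 = int((y - bh / 2) * img_h)
    x2 = int((x + bw / 2) * img_w)
    y2 = int((y + bh / 2) * img_h)

    # Clamp so boxes that spill over the edge still draw inside the image.
    x1 = max(0, min(x1, img_w - 1))
    y1 = max(0, min(y1, img_h - 1))
    x2 = max(0, min(x2, img_w - 1))
    y2 = max(0, min(y2, img_h - 1))
    return x1, y1, x2, y2


# A centered box covering half the width and height of a 640x480 image.
print(yolo_to_pixels(0.5, 0.5, 0.5, 0.5, 640, 480))  # (160, 120, 480, 360)
```

Working one label through by hand like this is a quick way to spot annotations exported in a different convention, such as pixel coordinates instead of normalized values.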

## ⚠️ Step 5: Check for Missing Files

In [None]:
# Compare file names without their extensions
image_base = set(os.path.splitext(os.path.basename(f))[0] for f in image_files)
label_base = set(os.path.splitext(os.path.basename(f))[0] for f in label_files)

missing_labels = image_base - label_base
missing_images = label_base - image_base

print(f"Images without labels: {len(missing_labels)}")
print(f"Labels without images: {len(missing_images)}")
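
The counts above say how many files are unpaired but not which ones. The sketch below returns the actual names so they can be annotated or removed. The helper names `stems` and `find_unpaired` are hypothetical, and the folder paths in the usage lines are assumptions based on Step 1.

```python
from pathlib import Path


def stems(folder, extensions):
    """Return file names without extension for files in `folder` matching `extensions`."""
    folder = Path(folder)
    if not folder.is_dir():
        return set()
    return {p.stem for p in folder.iterdir() if p.suffix.lower() in extensions}


def find_unpaired(image_dir, label_dir, image_exts=(".jpg",)):
    """Return (images without labels, labels without images) as sorted name lists."""
    images = stems(image_dir, image_exts)
    labels = stems(label_dir, (".txt",))
    return sorted(images - labels), sorted(labels - images)


# Assumption: folder layout from Step 1.
no_label, no_image = find_unpaired("garbage-dataset/images", "garbage-dataset/labels")
print("Images without labels:", no_label[:10])
print("Labels without images:", no_image[:10])
```

Only the first ten names of each list are printed here to keep the output short.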

## ✅ Next Step

Continue to the next notebook to apply image and label augmentations for training!

#### 📦 **Next Step: [2) Data Augmentation.ipynb](./2_Data_Augmentation.ipynb)**
---