<a href="https://colab.research.google.com/github/Bhavana0929/Detectron2_Object_Detection_Model/blob/main/explorer_notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import pandas as pd

In [None]:
# This is the subset of classes Airbnb is most concerned with
subset = ["Toilet",
          "Swimming_pool",
          "Bed",
          "Billiard_table",
          "Sink",
          "Fountain",
          "Oven",
          "Ceiling_fan",
          "Television",
          "Microwave_oven",
          "Gas_stove",
          "Refrigerator",
          "Kitchen_&_dining_room_table",
          "Washing_machine",
          "Bathtub",
          "Stairs",
          "Fireplace",
          "Pillow",
          "Mirror",
          "Shower",
          "Couch",
          "Countertop",
          "Coffeemaker",
          "Dishwasher",
          "Sofa_bed",
          "Tree_house",
          "Towel",
          "Porch",
          "Wine_rack",
          "Jacuzzi"]

In [None]:
len(subset)

30

# **Start exploring the class names in Open Images**

First, download the class descriptions from Open Images with `!wget https://storage.googleapis.com/openimages/2018_04/class-descriptions-boxable.csv`.

This file maps each class codename (ID) to its readable name, for every class that has bounding box labels in Open Images.

In [None]:
!wget https://storage.googleapis.com/openimages/2018_04/class-descriptions-boxable.csv

--2025-01-17 17:34:42--  https://storage.googleapis.com/openimages/2018_04/class-descriptions-boxable.csv
Resolving storage.googleapis.com (storage.googleapis.com)... 172.253.62.207, 142.251.163.207, 142.251.167.207, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|172.253.62.207|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11255 (11K) [text/csv]
Saving to: ‘class-descriptions-boxable.csv’


2025-01-17 17:34:42 (50.9 MB/s) - ‘class-descriptions-boxable.csv’ saved [11255/11255]



In [None]:
# All the classes in Open Images
classes = pd.read_csv("class-descriptions-boxable.csv",names=["ID","Names"])
classes

Unnamed: 0,ID,Names
0,/m/011k07,Tortoise
1,/m/011q46kg,Container
2,/m/012074,Magpie
3,/m/0120dh,Sea turtle
4,/m/01226z,Football
...,...,...
596,/m/0qmmr,Wheelchair
597,/m/0wdt60w,Rugby ball
598,/m/0xfy,Armadillo
599,/m/0xzly,Maracas


In [None]:
# Flag the rows whose name exactly matches one of the subset classes
classes["match"] = classes["Names"].isin(subset)
classes

Unnamed: 0,ID,Names,match
0,/m/011k07,Tortoise,False
1,/m/011q46kg,Container,False
2,/m/012074,Magpie,False
3,/m/0120dh,Sea turtle,False
4,/m/01226z,Football,False
...,...,...,...
596,/m/0qmmr,Wheelchair,False
597,/m/0wdt60w,Rugby ball,False
598,/m/0xfy,Armadillo,False
599,/m/0xzly,Maracas,False


In [None]:
classes["match"].value_counts()

Unnamed: 0_level_0,count
match,Unnamed: 1_level_1
False,581
True,20


In [None]:
# Where do they match up?
matches = classes[classes["match"]]["Names"].tolist()
matches

['Sink',
 'Towel',
 'Stairs',
 'Fountain',
 'Oven',
 'Couch',
 'Shower',
 'Pillow',
 'Bathtub',
 'Bed',
 'Fireplace',
 'Refrigerator',
 'Porch',
 'Mirror',
 'Jacuzzi',
 'Television',
 'Coffeemaker',
 'Toilet',
 'Countertop',
 'Dishwasher']

In [None]:
# Where are they different?
missing_classes = list(set(subset)-set(matches))
missing_classes

['Ceiling_fan',
 'Gas_stove',
 'Sofa_bed',
 'Swimming_pool',
 'Wine_rack',
 'Kitchen_&_dining_room_table',
 'Tree_house',
 'Washing_machine',
 'Microwave_oven',
 'Billiard_table']
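
Because a `set` difference does not preserve order, the list above can come out in a different order from run to run. A minimal sketch of an order-preserving alternative follows; the two lists are shortened, illustrative versions of `subset` and `matches`, not the full data.

```python
# Shortened stand-ins for the notebook's `subset` and `matches` lists.
subset = ["Toilet", "Swimming_pool", "Bed", "Billiard_table"]
matches = ["Toilet", "Bed"]

# Use a set for fast membership tests, but iterate over the list
# so the result keeps the original order of `subset`.
matched = set(matches)
missing_classes = [name for name in subset if name not in matched]

print(missing_classes)
```

The result is the same collection of names as the `set` version; only the ordering becomes stable.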

In [None]:
# Are there similar versions of these classes in the descriptions I could use?
classes[classes["Names"].str.contains("pool")]

Unnamed: 0,ID,Names,match
444,/m/0b_rs,Swimming pool,False


In [None]:
classes[classes["Names"].str.contains("stove")]

Unnamed: 0,ID,Names,match
197,/m/02wv84t,Gas stove,False
270,/m/04169hn,Wood-burning stove,False


In [None]:
classes[classes["Names"].str.contains("stove")]["Names"].tolist()

['Gas stove', 'Wood-burning stove']
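
By default `str.contains` is case-sensitive and treats the search term as a regular expression, so a lowercase term such as "pool" would miss a capitalised name. A small sketch of a literal, case-insensitive search, assuming a toy stand-in for the `Names` column ("Pool table" is a made-up entry, added only to show the difference):

```python
import pandas as pd

# Toy stand-in for classes["Names"]; "Pool table" is hypothetical.
names = pd.Series(["Swimming pool", "Pool table", "Gas stove", "Wood-burning stove"])

# Default behaviour: case-sensitive, pattern interpreted as a regex.
case_sensitive = names[names.str.contains("pool")].tolist()

# Literal (non-regex), case-insensitive search.
case_insensitive = names[names.str.contains("pool", case=False, regex=False)].tolist()

print(case_sensitive)
print(case_insensitive)
```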

In [None]:
# Get the individual words from each string of missing classes
strings = [x.split("_") for x in missing_classes]
strings = [item for sublist in strings for item in sublist]
strings

['Ceiling',
 'fan',
 'Gas',
 'stove',
 'Sofa',
 'bed',
 'Swimming',
 'pool',
 'Wine',
 'rack',
 'Kitchen',
 '&',
 'dining',
 'room',
 'table',
 'Tree',
 'house',
 'Washing',
 'machine',
 'Microwave',
 'oven',
 'Billiard',
 'table']

In [None]:
# Now find every Open Images class name that contains any of those words.
# Note: this is a case-sensitive substring search, so "room" also hits "Mushroom".
more_matches = []
for string in strings:
  more_matches.extend(classes[classes["Names"].str.contains(string, regex=False)]["Names"].tolist())
more_matches = list(set(more_matches))
more_matches

['Wine',
 'Tree house',
 'Bathroom accessory',
 'Kitchen appliance',
 'Lighthouse',
 'Swimming pool',
 'Kitchen & dining room table',
 'Washing machine',
 'Microwave oven',
 'Sofa bed',
 'Tennis racket',
 'Mushroom',
 'Vegetable',
 'Infant bed',
 'Kitchenware',
 'Wine glass',
 'Bathroom cabinet',
 'Billiard table',
 'Tree',
 'Coffee table',
 'Table tennis racket',
 'Spice rack',
 'Wood-burning stove',
 'Dog bed',
 'Sewing machine',
 'Wine rack',
 'Kitchen utensil',
 'Gas stove',
 'Ceiling fan',
 'Kitchen knife',
 'Mechanical fan']
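
The list above includes hits such as "Mushroom", "Lighthouse" and "Vegetable", because a plain substring search matches "room", "house" and "table" inside longer words. One possible way to cut these down is to match whole words only. The sketch below assumes a handful of names copied from the output above; `whole_word_matches` is a hypothetical helper, not part of the notebook.

```python
import re
import pandas as pd

# A few names taken from the output above.
names = pd.Series([
    "Mushroom", "Lighthouse", "Vegetable",
    "Coffee table", "Tree house", "Kitchen & dining room table",
])

def whole_word_matches(word):
    """Return the names that contain `word` as a whole word (case-sensitive)."""
    # re.escape keeps any regex metacharacters in `word` literal;
    # \b anchors the match at word boundaries.
    pattern = rf"\b{re.escape(word)}\b"
    return names[names.str.contains(pattern, regex=True)].tolist()

room_hits = whole_word_matches("room")
house_hits = whole_word_matches("house")
table_hits = whole_word_matches("table")
```

Whole-word matching still lets related but different classes through (for example "Coffee table"), so an exact comparison of the full names is still needed to settle which classes are really present.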

In [None]:
# Replace the underscores with spaces to match the Open Images naming
missing_classes_no_space = [x.replace("_"," ") for x in missing_classes]
missing_classes_no_space

['Ceiling fan',
 'Gas stove',
 'Sofa bed',
 'Swimming pool',
 'Wine rack',
 'Kitchen & dining room table',
 'Tree house',
 'Washing machine',
 'Microwave oven',
 'Billiard table']

In [None]:
# Find the actual missing classes
actual_missing_classes = list(set(missing_classes_no_space)-set(more_matches))
actual_missing_classes

[]

Turns out none of the classes are missing from the Open Images set! The only difference is the naming convention: Airbnb uses underscores ("_") in its class names, while Open Images uses spaces. This is a simple fix we can implement later.

Let's replace the underscores in our subset list with spaces and use that list to start downloading classes.

In [None]:
subset_no_underscore = [x.replace("_"," ") for x in subset]
subset_no_underscore

['Toilet',
 'Swimming pool',
 'Bed',
 'Billiard table',
 'Sink',
 'Fountain',
 'Oven',
 'Ceiling fan',
 'Television',
 'Microwave oven',
 'Gas stove',
 'Refrigerator',
 'Kitchen & dining room table',
 'Washing machine',
 'Bathtub',
 'Stairs',
 'Fireplace',
 'Pillow',
 'Mirror',
 'Shower',
 'Couch',
 'Countertop',
 'Coffeemaker',
 'Dishwasher',
 'Sofa bed',
 'Tree house',
 'Towel',
 'Porch',
 'Wine rack',
 'Jacuzzi']
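
With the names normalised, each one can be checked by exact match and mapped to its Open Images ID in one step. A minimal sketch, assuming a three-row stand-in for the class descriptions (the IDs and names are the ones shown in the search outputs earlier) and a two-item stand-in for the Airbnb list:

```python
import pandas as pd

# Small stand-in for class-descriptions-boxable.csv; the real file has many more rows.
classes = pd.DataFrame(
    {
        "ID": ["/m/0b_rs", "/m/02wv84t", "/m/04169hn"],
        "Names": ["Swimming pool", "Gas stove", "Wood-burning stove"],
    }
)

# Shortened stand-in for the Airbnb subset.
airbnb_names = ["Swimming_pool", "Gas_stove"]
normalized = [name.replace("_", " ") for name in airbnb_names]

# Build a name -> ID lookup, then split the names into found and not found.
name_to_id = classes.set_index("Names")["ID"].to_dict()
not_found = [name for name in normalized if name not in name_to_id]
subset_ids = {name: name_to_id[name] for name in normalized if name in name_to_id}

print(subset_ids)
print(not_found)
```

An empty `not_found` list confirms that every class in the subset exists under the normalised name, and `subset_ids` gives the codenames needed to filter the annotation files.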

Okay, we'll start with a small class (small in the sense that it likely has few examples): Jacuzzi.

Get all the files we need from Open Images (labels, annotations, descriptions, etc.).

In [None]:
!wget https://storage.googleapis.com/openimages/2018_04/class-descriptions-boxable.csv

!wget https://storage.googleapis.com/openimages/2018_04/train/train-annotations-bbox.csv

!wget https://storage.googleapis.com/openimages/2018_04/validation/validation-annotations-bbox.csv

!wget https://storage.googleapis.com/openimages/2018_04/test/test-annotations-bbox.csv

--2025-01-17 18:06:05--  https://storage.googleapis.com/openimages/2018_04/class-descriptions-boxable.csv
Resolving storage.googleapis.com (storage.googleapis.com)... 172.253.122.207, 172.253.63.207, 142.250.31.207, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|172.253.122.207|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11255 (11K) [text/csv]
Saving to: ‘class-descriptions-boxable.csv.1’


2025-01-17 18:06:05 (73.9 MB/s) - ‘class-descriptions-boxable.csv.1’ saved [11255/11255]

--2025-01-17 18:06:05--  https://storage.googleapis.com/openimages/2018_04/train/train-annotations-bbox.csv
Resolving storage.googleapis.com (storage.googleapis.com)... 172.253.122.207, 172.253.63.207, 142.250.31.207, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|172.253.122.207|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1194033454 (1.1G) [text/csv]
Saving to: ‘train-annotations-bbox.csv’


2025-01-17 18:06:14 (1
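
The training annotations file is about 1.1 GB, so loading it in one go may strain a small Colab runtime. A rough sketch of counting bounding boxes per class while reading the file in chunks is shown below. It assumes the annotations CSV has a `LabelName` column holding the class ID; `count_boxes` is a hypothetical helper, and the rows in `toy_csv` are made up for illustration.

```python
import io
from collections import Counter

import pandas as pd

def count_boxes(csv_source, label_ids, chunksize=1_000_000):
    """Count bounding-box rows per label ID, reading the CSV in chunks."""
    wanted = set(label_ids)
    counts = Counter()
    # Only the label column is needed for counting, which keeps memory use low.
    reader = pd.read_csv(csv_source, usecols=["LabelName"], chunksize=chunksize)
    for chunk in reader:
        hits = chunk.loc[chunk["LabelName"].isin(wanted), "LabelName"]
        counts.update(hits.tolist())
    return counts

# Made-up rows in the assumed column layout, standing in for the real file.
toy_csv = io.StringIO(
    "ImageID,Source,LabelName,Confidence,XMin,XMax,YMin,YMax\n"
    "img001,xclick,/m/0b_rs,1,0.1,0.5,0.2,0.6\n"
    "img001,xclick,/m/011k07,1,0.0,0.3,0.1,0.4\n"
    "img002,xclick,/m/0b_rs,1,0.2,0.9,0.3,0.8\n"
    "img003,xclick,/m/02wv84t,1,0.4,0.7,0.1,0.5\n"
    "img004,xclick,/m/011k07,1,0.5,0.6,0.5,0.6\n"
)

# A tiny chunksize is used here only to exercise the chunked path.
demo_counts = count_boxes(toy_csv, ["/m/0b_rs", "/m/02wv84t"], chunksize=2)
print(demo_counts)
```

On the real data the first argument would be the path to `train-annotations-bbox.csv` and the label IDs would come from the class descriptions file.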

In [None]:
!pip install awscli

Collecting awscli
  Downloading awscli-1.37.1-py3-none-any.whl.metadata (11 kB)
Collecting botocore==1.36.1 (from awscli)
  Downloading botocore-1.36.1-py3-none-any.whl.metadata (5.7 kB)
Collecting docutils<0.17,>=0.10 (from awscli)
  Downloading docutils-0.16-py2.py3-none-any.whl.metadata (2.7 kB)
Collecting s3transfer<0.12.0,>=0.11.0 (from awscli)
  Downloading s3transfer-0.11.1-py3-none-any.whl.metadata (1.7 kB)
Collecting colorama<0.4.7,>=0.2.5 (from awscli)
  Downloading colorama-0.4.6-py2.py3-none-any.whl.metadata (17 kB)
Collecting rsa<4.8,>=3.1.2 (from awscli)
  Downloading rsa-4.7.2-py3-none-any.whl.metadata (3.6 kB)
Collecting jmespath<2.0.0,>=0.7.1 (from botocore==1.36.1->awscli)
  Downloading jmespath-1.0.1-py3-none-any.whl.metadata (7.6 kB)
Downloading awscli-1.37.1-py3-none-any.whl (4.6 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.6/4.6 MB 61.0 MB/s eta 0:00:00
Downloading botocore-1.36.1-py3-none-any.whl (13.3 MB)

In [None]:
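# NOTE: this call fails as written. The script requires --dataset and has no --mode option (see the usage message below).
!python3 downloadOI.py --classes 'Toilet,Bathtub' --mode validation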

CPU count: 2
usage: downloadOI.py [-h] --dataset DATASET --classes CLASSES [--nthreads NTHREADS]
                     [--occluded OCCLUDED] [--truncated TRUNCATED] [--groupOf GROUPOF]
                     [--depiction DEPICTION] [--inside INSIDE]
downloadOI.py: error: the following arguments are required: --dataset
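
The usage message shows that the script requires `--dataset` and `--classes` and does not list a `--mode` option, which is why the call above is rejected. The sketch below is a hypothetical reconstruction of part of that interface with `argparse`, based only on the usage text. It is not the script's actual code, the remaining optional flags are left out, and passing `validation` as the `--dataset` value is an assumption.

```python
import argparse

def build_parser():
    """Hypothetical parser mirroring part of the usage message above."""
    parser = argparse.ArgumentParser(prog="downloadOI.py")
    parser.add_argument("--dataset", required=True)   # accepted values are not shown in the usage text
    parser.add_argument("--classes", required=True)   # comma-separated class names
    parser.add_argument("--nthreads")                 # optional, type unknown
    return parser

# Passing the split name through --dataset (assumed value) parses cleanly.
args = build_parser().parse_args(
    ["--dataset", "validation", "--classes", "Toilet,Bathtub"]
)
class_list = args.classes.split(",")
print(args.dataset, class_list)
```

If this reading of the usage message is right, the corrected notebook call would swap `--mode validation` for `--dataset validation`; the script itself should be checked for the dataset names it accepts.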
