Instance Image Segmentation with detectron2
This small tutorial is targeted at researchers who have basic machine learning and Python programming skills and want to implement instance image segmentation for further use in their models. detectron2 is still under heavy development and, as of January 2020, not usable on Windows without some code changes that are explained in this chapter. All software used is forked, so there should be no breaking API changes and this tutorial should continue to work as written. For training, a CUDA-capable Nvidia GPU is required; inferencing with a trained model can also be done on the CPU. If you do not have a local GPU available, you can also follow my tutorial on Google Colab.
Requirements for this tutorial
- Windows installation
- Nvidia GPU
- Python 3
- if the requirements are not met:
- detectron2 has a similar tutorial on Google Colab
- Google Colab provides a Python programming environment with GPU support for free
- for inferencing with a trained model on new data, you can also use a CPU, see Inferencing on CPU
Installation (Windows)
- install Python 3 and add it to PATH
- install CUDA
- install git and add it to PATH
- clone this repository with
git clone https://github.com/InformationSystemsFreiburg/imgage_segmentation_tokyo
- install Visual Studio Community Edition to get a working C++ compiler
# check in terminal if gcc is installed, if not, try restarting the computer
gcc --version
# move to the folder of the imgage_segmentation_tokyo repo
cd .\imgage_segmentation_tokyo\
# create a new Python virtual environment
pip install virtualenv
virtualenv detectron2-env
.\detectron2-env\Scripts\activate
# install the following Python packages
pip install numpy pandas tqdm matplotlib seaborn psutil cython opencv-python
pip install "git+https://github.com/MarkusRosen/fvcore"
pip install torch===1.3.1 torchvision===0.4.2 -f https://download.pytorch.org/whl/torch_stable.html
pip install "git+https://github.com/MarkusRosen/cocoapi.git#egg=pycocotools&subdirectory=PythonAPI"
- to check if the GPU is recognized within PyTorch, you can run the following code
- if the console output shows no GPU, you need to check your CUDA, Python and PyTorch installations
from utils import check_gpu
check_gpu()
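check_gpu is a small helper from the repository's utils module; if you prefer to check manually, a minimal equivalent (assuming only that PyTorch is installed) looks roughly like this:
import torch

def check_gpu():
    # True if a CUDA-capable GPU is visible to PyTorch
    print("CUDA available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        # name of the first detected GPU
        print("GPU:", torch.cuda.get_device_name(0))

check_gpu()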
Two PyTorch header files have to be changed on Windows so that detectron2 can be built:
Location of first file:
.\detectron2-env\Lib\site-packages\torch\include\torch\csrc\jit\argument_spec.h
Search for
static constexpr size_t DEPTH_LIMIT = 128;
change to →
static const size_t DEPTH_LIMIT = 128;
Location of second file:
.\detectron2-env\Lib\site-packages\torch\include\pybind11\cast.h
Search for
explicit operator type&() { return *(this->value); }
change to →
explicit operator type&() { return *((type*)this->value); }
# clone this repository
git clone https://github.com/MarkusRosen/detectron2
# move into the detectron2 folder
cd detectron2
# build the package (this will take a few minutes)
python setup.py build develop
# restart terminal/editor
# check if installation was successful with the following code:
.\detectron2-env\Scripts\activate
python
>>> from detectron2.utils.logger import setup_logger
>>> setup_logger()
<Logger detectron2 (DEBUG)>
>>> exit()
To check if everything is working with the models, run the following code:
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()
# import some common libraries
import numpy as np
import cv2
import random
# import some common detectron2 utilities
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog
import matplotlib.pyplot as plt
im = cv2.imread("./input.jpg")
plt.imshow(im)
cfg = get_cfg()
# you can choose alternative models as backbone here
cfg.merge_from_file(
    "./detectron2/detectron2/model_zoo/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"
)
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5
# if you changed the model above, you need to adapt the following line as well
cfg.MODEL.WEIGHTS = "detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl"
predictor = DefaultPredictor(cfg)
outputs = predictor(im)
print(outputs["instances"].pred_classes)
print(outputs["instances"].pred_boxes)
v = Visualizer(im[:, :, ::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]), scale=1.2)
v = v.draw_instance_predictions(outputs["instances"].to("cpu"))
plt.imshow(v.get_image()[:, :, ::-1])
# save image
plt.savefig("output.jpg")
The output.jpg should look like this:
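The outputs dictionary holds an Instances object; besides pred_classes and pred_boxes it also contains the confidence scores and the binary masks, which can be pulled out as plain NumPy arrays for further processing. A short sketch (the field names are the ones detectron2 uses for Mask R-CNN predictions):
instances = outputs["instances"].to("cpu")
print("number of detected instances:", len(instances))
print("scores:", instances.scores.numpy())
# pred_masks is a boolean array of shape (num_instances, height, width)
masks = instances.pred_masks.numpy()
print("mask shape:", masks.shape)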
Data Preparation
- Create a folder named buildings
- within this folder, create two folders: val and train
- open the VGG Annotator by opening its HTML file in the browser; it comes with short introductions on how to use the tool
- go to settings and specify the default path to where your train folder is located, for example: ../data/buildings/train/
- create a new attribute called class
- set this attribute to checkbox
- add building and window as options to class
- save the project
- copy images to the train and val folders
- import the images into the VGG Annotator
- zoom into an image with CTRL + Mousewheel
- select the polygon region shape tool and start with marking the windows
- after a polygon is finished, press s to save it
- after all window polygons are created, create the building polygons
- press Spacebar to open the annotations
- specify the correct class for each polygon
- after an image is done, save the project
- after all images are done, export the annotations of the train folder as a .json file and rename it to via_region_data.json
- do all of the above steps also for the validation data
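Before moving on to Python, it is worth opening the exported via_region_data.json once to confirm that its structure matches what the code in the next section expects. A quick sanity check (assuming the export lies in the train folder used later) could look like this:
import json

with open("./via-2.0.8/buildings/train/via_region_data.json") as f:
    annotations = json.load(f)

# one entry per annotated image, each with a list of polygon regions
for entry in annotations.values():
    classes = [region["region_attributes"]["class"] for region in entry["regions"]]
    print(entry["filename"], len(entry["regions"]), "regions", classes)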
Preparing the Dataset in Python
from detectron2.structures import BoxMode
from detectron2.utils.visualizer import Visualizer
from detectron2.engine import DefaultPredictor
from detectron2.data import DatasetCatalog, MetadataCatalog
from detectron2.utils.visualizer import ColorMode
from detectron2.engine import DefaultTrainer
from detectron2.config import get_cfg
from detectron2 import model_zoo
import os
import numpy as np
import json
import matplotlib.pyplot as plt
import cv2
import random
from datetime import datetime
def get_building_dicts(img_dir):
    """This function loads the JSON file created with the annotator and converts it to
    the detectron2 metadata specifications.
    """
    # load the JSON file
    json_file = os.path.join(img_dir, "via_region_data.json")
    with open(json_file) as f:
        imgs_anns = json.load(f)

    dataset_dicts = []
    # loop through the entries in the JSON file
    for idx, v in enumerate(imgs_anns.values()):
        record = {}
        # add file_name, image_id, height and width information to the records
        filename = os.path.join(img_dir, v["filename"])
        height, width = cv2.imread(filename).shape[:2]

        record["file_name"] = filename
        record["image_id"] = idx
        record["height"] = height
        record["width"] = width

        annos = v["regions"]

        objs = []
        # one image can have multiple annotations, therefore this loop is needed
        for annotation in annos:
            # reformat the polygon information to fit the specifications
            anno = annotation["shape_attributes"]
            px = anno["all_points_x"]
            py = anno["all_points_y"]
            poly = [(x + 0.5, y + 0.5) for x, y in zip(px, py)]
            poly = [p for x in poly for p in x]

            region_attributes = annotation["region_attributes"]["class"]

            # specify the category_id to match with the class
            if "building" in region_attributes:
                category_id = 1
            elif "window" in region_attributes:
                category_id = 0

            obj = {
                "bbox": [np.min(px), np.min(py), np.max(px), np.max(py)],
                "bbox_mode": BoxMode.XYXY_ABS,
                "segmentation": [poly],
                "category_id": category_id,
                "iscrowd": 0,
            }
            objs.append(obj)
        record["annotations"] = objs
        dataset_dicts.append(record)
    return dataset_dicts
- On Windows, all PyTorch code needs to be run within an if __name__ == '__main__': block
- load the data and draw a few input images to check if the annotations are correct:
if __name__ == "__main__":
    # the data has to be registered within detectron2, once for the train and once for
    # the val data
    for d in ["train", "val"]:
        DatasetCatalog.register(
            "buildings_" + d,
            lambda d=d: get_building_dicts("./via-2.0.8/buildings/" + d),
        )

    building_metadata = MetadataCatalog.get("buildings_train")

    dataset_dicts = get_building_dicts("./via-2.0.8/buildings/train")

    for i, d in enumerate(random.sample(dataset_dicts, 5)):
        img = cv2.imread(d["file_name"])
        visualizer = Visualizer(img[:, :, ::-1], metadata=building_metadata, scale=0.5)
        vis = visualizer.draw_dataset_dict(d)
        plt.imshow(vis.get_image()[:, :, ::-1])
        # the folder inputs has to be created first
        plt.savefig(f"./inputs/input_{i}.jpg")
Training the Model
- depending on the data size, the hardware used and the number of iterations, the training can take from a few minutes to a few hours.
cfg = get_cfg()
# you can choose alternative models as backbone here
cfg.merge_from_file(
    "./detectron2/detectron2/model_zoo/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"
)
cfg.DATASETS.TRAIN = ("buildings_train",)
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 2
# if you changed the model above, you need to adapt the following line as well
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"
)  # Let training initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00025 # pick a good LR, 0.00025 seems a good start
cfg.SOLVER.MAX_ITER = 5000  # 300 iterations are enough for a first test; for better accuracy, increase this value
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = (
    512  # (default: 512), select smaller if faster training is needed
)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 2 # for the two classes window and building
start = datetime.now()
# for inferencing, the following 4 lines of code should be commented out
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
print("Time needed for training:", datetime.now() - start)
Inferencing on new Data
- the trained model is saved in /output/model_final.pth and can now be loaded for inferencing.
- compared to training, inferencing should be very fast.
# load the trained weights from the output folder
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = (
    0.50  # set the testing threshold for this model
)
cfg.DATASETS.TEST = ("buildings_val",)
predictor = DefaultPredictor(cfg)
# load the validation data
dataset_dicts = get_building_dicts("./via-2.0.8/buildings/val")
# save the results of the validation predictions as pictures in the outputs folder
for i, dataset in enumerate(dataset_dicts):
    im = cv2.imread(dataset["file_name"])
    outputs = predictor(im)
    v = Visualizer(
        im[:, :, ::-1],
        metadata=building_metadata,
        scale=0.8,
        instance_mode=ColorMode.IMAGE_BW,  # remove the colors of unsegmented pixels
    )
    v = v.draw_instance_predictions(outputs["instances"].to("cpu"))
    plt.imshow(v.get_image()[:, :, ::-1])
    # the outputs folder has to be created before running this line
    plt.savefig(f"./outputs/output_{i}.jpg")
Inferencing on CPU
- Inferencing on the CPU is possible by setting cfg.MODEL.DEVICE = "cpu"
- CPU inferencing will be a lot slower than on the GPU (on my device about 8x slower)
# set device to CPU
cfg.MODEL.DEVICE = "cpu"
# load the trained weights from the output folder
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = (
    0.70  # set the testing threshold for this model
)
# load the validation data
cfg.DATASETS.TEST = ("buildings_val",)
# create a predictor
predictor = DefaultPredictor(cfg)
print("Time needed for training:", datetime.now() - start)
start = datetime.now()
dataset_dicts = get_building_dicts("./via-2.0.8/buildings/val")
# save the results of the validation predictions as pictures in the outputs folder
for i, dataset in enumerate(dataset_dicts):
    im = cv2.imread(dataset["file_name"])
    outputs = predictor(im)
    v = Visualizer(
        im[:, :, ::-1],
        metadata=building_metadata,
        scale=0.8,
        instance_mode=ColorMode.IMAGE_BW,  # remove the colors of unsegmented pixels
    )
    v = v.draw_instance_predictions(outputs["instances"].to("cpu"))
    plt.imshow(v.get_image()[:, :, ::-1])
    plt.savefig(f"./outputs/output_{i}.jpg")
print("Time needed for inferencing:", datetime.now() - start)