Ikomia HUB

State-of-the-art Computer Vision as ready-to-use algorithms

Introduction

At Ikomia, we deeply believe that sharing scientific knowledge is the key to success. That's why we make research-based algorithms ready to use for developers.

The main goal of Ikomia is to take existing Python code and wrap it as a ready-to-use algorithm for Ikomia API (our Python library) and Ikomia STUDIO (our desktop software). With this approach, we can easily integrate individual repos from researchers or labs, as well as awesome frameworks like OpenCV, Detectron2, OpenMMLab or Hugging Face, so that developers can benefit from state-of-the-art algorithms in a single framework.
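The wrapping idea described above can be sketched in plain Python. This is not the Ikomia API: `AlgorithmRegistry`, `register`, `run`, and the `invert_colors` function are hypothetical names, used only to illustrate how an existing function might be exposed under a catalog-style name and then called in a uniform way.

```python
from typing import Callable, Dict, List

# Toy grayscale image: rows of 0-255 pixel values (an assumption for this sketch).
Image = List[List[int]]


class AlgorithmRegistry:
    """Hypothetical registry mapping catalog-style names to callables."""

    def __init__(self) -> None:
        self._algorithms: Dict[str, Callable[..., Image]] = {}

    def register(self, name: str) -> Callable:
        """Return a decorator that stores the decorated function under `name`."""
        def decorator(func: Callable[..., Image]) -> Callable[..., Image]:
            if name in self._algorithms:
                raise ValueError(f"algorithm already registered: {name}")
            self._algorithms[name] = func
            return func
        return decorator

    def names(self) -> List[str]:
        """List registered algorithm names in alphabetical order."""
        return sorted(self._algorithms)

    def run(self, name: str, image: Image, **params) -> Image:
        """Look up an algorithm by name and apply it to the image."""
        if name not in self._algorithms:
            raise KeyError(f"unknown algorithm: {name}")
        return self._algorithms[name](image, **params)


registry = AlgorithmRegistry()


# Stand-in for "existing Python code" taken from a research repository.
@registry.register("infer_invert_colors")
def invert_colors(image: Image, max_value: int = 255) -> Image:
    return [[max_value - px for px in row] for row in image]


if __name__ == "__main__":
    print(registry.names())
    print(registry.run("infer_invert_colors", [[0, 255], [10, 245]]))
```

The original function stays untouched and callable on its own; the registry only adds a name-based entry point, which is the part a catalog like the one below relies on.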

Table of Contents

Dataset loader
Classification
Colorization
Image captioning
Image generation
Image matting
Inpainting
Instance segmentation
Keypoints detection
Object Detection
Object tracking
OCR
Optical flow
Other
Panoptic segmentation
Semantic segmentation
Super resolution

Dataset loader

Name	Description	Original repository
auto_annotate	Auto-annotate images with GroundingDINO and SAM models	Made by Ikomia
dataset_classification	Load classification dataset	Made by Ikomia
dataset_coco	Load COCO 2017 dataset	Made by Ikomia
dataset_cwfid	Load Crop/Weed Field Image Dataset (CWFID) for semantic segmentation	Link
dataset_pascal_voc	Load PascalVOC dataset	Made by Ikomia
dataset_via	Load VGG Image Annotator dataset	Made by Ikomia
dataset_wgisd	Load Wine Grape Instance Segmentation Dataset (WGISD)	Link
dataset_wildreceipt	Load Wildreceipt dataset	Made by Ikomia
dataset_yolo	Load YOLO dataset	Made by Ikomia

Classification

Name	Description	Original repository
infer_covidnet	A tailored Deep Convolutional Neural Network Design for detection of COVID-19 cases from chest radiography images.	Link
infer_emotion_fer_plus	Facial emotion recognition using DNN trained from crowd-sourced label distribution.	Link
infer_resnet_action_recognition	Human action recognition with spatio-temporal 3D CNNs.	Link
infer_timm_image_classification	Infer timm image classification models	Link
infer_torchvision_mnasnet	MnasNet inference model for image classification.	Link
infer_torchvision_resnet	ResNet inference model for image classification.	Link
infer_torchvision_resnext	ResNeXt inference model for image classification.	Link
infer_yolo_v8_classification	Inference with YOLOv8 image classification models	Made by Ikomia
train_timm_image_classification	Train timm image classification models	Link
train_torchvision_mnasnet	Training process for MnasNet convolutional network.	Link
train_torchvision_resnet	Training process for ResNet convolutional network.	Link
train_torchvision_resnext	Training process for ResNeXt convolutional network.	Link
train_yolo_v8_classification	Train YOLOv8 classification models.	Made by Ikomia

Image generation

Name	Description	Original repository
infer_hf_stable_diffusion	Stable diffusion models from Hugging Face.	Link
infer_kandinsky_2	Kandinsky 2.2 text2image diffusion model.	Link
infer_kandinsky_2_controlnet_depth	Kandinsky 2.2 controlnet depth diffusion model.	Link
infer_kandinsky_2_image_mixing	Kandinsky 2.2 image mixing diffusion model.	Link
infer_kandinsky_2_img2img	Kandinsky 2.2 image-to-image diffusion model.	Link
infer_neural_style_transfer	Neural network method to paint given image in the style of the reference image.	Link
infer_pulid	Pure and Lightning ID customization (PuLID) is a novel tuning-free ID customization method for text-to-image generation.	Link
infer_stable_cascade	Stable Cascade is a diffusion model trained to generate images given a text prompt.	Link

Image matting

Name	Description	Original repository
infer_background_matting	Real-Time High-Resolution Background Matting	Link
infer_modnet_portrait_matting	Inference of MODNet Portrait Matting.	Link
infer_p3m_portrait_matting	Inference of Privacy-Preserving Portrait Matting (P3M)	Link

Inpainting

Name	Description	Original repository
infer_face_inpainting	Face inpainting using Segformer for segmentation and RealVisXL for inpainting.	Made by Ikomia
infer_hf_stable_diffusion_inpaint	Stable diffusion inpainting models from Hugging Face.	Link
infer_kandinsky_2_inpaint	Kandinsky 2.2 inpainting diffusion model.	Link

Instance segmentation

Name	Description	Original repository
infer_detectron2_instance_segmentation	Infer Detectron2 instance segmentation models	Link
infer_detectron2_pointrend	PointRend inference model of Detectron2 for instance segmentation.	Link
infer_hf_instance_seg	Instance segmentation using models from Hugging Face.	Link
infer_sparseinst	Infer Sparseinst instance segmentation models	Link
infer_torchvision_mask_rcnn	Mask R-CNN inference model for object detection and segmentation.	Link
infer_yolact	A simple, fully convolutional model for real-time instance segmentation.	Link
infer_yolo_v7_instance_segmentation	Inference for YOLO v7 instance segmentation models	Link
infer_yolo_v8_seg	Inference with YOLOv8 segmentation models	Link
infer_yolop_v2	Panoptic driving Perception using YoloPv2	Link
train_detectron2_instance_segmentation	Train Detectron2 instance segmentation models	Link
train_mmlab_segmentation	Train for MMLAB segmentation models	Link
train_sparseinst	Train Sparseinst instance segmentation models	Link
train_torchvision_mask_rcnn	Training process for Mask R-CNN convolutional network.	Link
train_yolo_v7_instance_segmentation	Train for YOLO v7 instance segmentation models	Link
train_yolo_v8_seg	Train YOLOv8 instance segmentation models.	Link

Keypoints detection

Name	Description	Original repository
infer_detectron2_densepose	Detectron2 inference model for human pose detection.	Link
infer_detectron2_keypoints	Inference for Detectron2 keypoint models	Link
infer_mmlab_pose_estimation	Inference for pose estimation models from mmpose	Link
infer_yolo_v7_keypoints	YOLOv7 pose estimation models.	Link
infer_yolo_v8_pose_estimation	Inference with YOLOv8 pose estimation models	Link

Object Detection

Name	Description	Original repository
infer_detectron2_detection	Inference for Detectron2 detection models	Link
infer_detectron2_retinanet	RetinaNet inference model of Detectron2 for object detection.	Link
infer_detectron2_tridentnet	TridentNet inference model of Detectron2 for object detection.	Link
infer_face_detection_kornia	Face detection using the Kornia API	Link
infer_google_vision_face_detection	Face detection using Google cloud vision API.	Link
infer_google_vision_landmark_detection	Landmark Detection detects popular natural and human-made structures within an image.	Link
infer_google_vision_logo_detection	Logo Detection detects popular product logos within an image using the Google cloud vision API.	Link
infer_google_vision_object_localization	The Vision API can detect and extract multiple objects in an image with Object Localization.	Link
infer_grounding_dino	Inference of the Grounding DINO model	Link
infer_mmlab_detection	Inference for MMDET from MMLAB detection models	Link
infer_torchvision_faster_rcnn	Faster R-CNN inference model for object detection.	Link
infer_yolo_v5	Ultralytics YoloV5 object detection models.	Made by Ikomia
infer_yolo_v7	YOLOv7 object detection models.	Link
infer_yolo_v8	Inference with YOLOv8 models	Link
infer_yolo_v9	Object detection with YOLOv9 models	Link
infer_yolo_v10	Run inference with YOLOv10 models	Link
infer_yolo_world	YOLO-World is a real-time zero-shot object detection model that leverages the power of open-vocabulary learning to recognize and localize a wide range of objects in images.	Link
infer_yolop_v2	Panoptic driving Perception using YoloPv2	Link
infer_yolor	Inference for YoloR object detection models	Link
train_detectron2_detection	Train for Detectron2 detection models	Link
train_mmlab_detection	Train for MMLAB detection models	Link
train_torchvision_faster_rcnn	Training process for Faster R-CNN convolutional network.	Link
train_yolo_v5	Train Ultralytics YoloV5 object detection models.	Link
train_yolo_v7	Train YOLOv7 object detection models.	Link
train_yolo_v8	Train YOLOv8 object detection models.	Link
train_yolo_v9	Train YOLOv9 models	Link
train_yolo_v10	Train YOLOv10 object detection models.	Link
train_yolor	Train YoloR object detection models	Link

Object tracking

Name	Description	Original repository
infer_bytetrack	Infer ByteTrack for object tracking	Link
infer_deepsort	Multiple Object Tracking (MOT) algorithm combining a deep association metric with the well-known SORT algorithm for better performance.	Link

OCR

Name	Description	Original repository
infer_google_vision_ocr	Detects and extracts text from any image.	Link
infer_mmlab_text_detection	Inference for MMOCR from MMLAB text detection models	Link
infer_mmlab_text_recognition	Inference for MMOCR from MMLAB text recognition models	Link
train_mmlab_kie	Train for MMOCR from MMLAB KIE models	Link
train_mmlab_text_detection	Training process for MMOCR from MMLAB in text detection	Link
train_mmlab_text_recognition	Training process for MMOCR from MMLAB in text recognition	Link

Other

Name	Description	Original repository
infer_depth_anything	Depth Anything is a highly practical solution for robust monocular depth estimation	Link
infer_google_vision_image_properties	Image Properties feature detects general attributes of the image, such as dominant color.	Link
infer_google_vision_label_detection	Detect and extract information about entities in an image, across a broad group of categories.	Link
infer_google_vision_safe_search	Safe Search detects explicit content such as adult content or violent content within an image.	Link
infer_google_vision_web_detection	Web Detection detects Web references to an image.	Link

Panoptic segmentation

Name	Description	Original repository
infer_detectron2_panoptic_segmentation	Infer Detectron2 panoptic segmentation models	Made by Ikomia
infer_hf_image_seg	Panoptic segmentation using models from Hugging Face.	Made by Ikomia
infer_mmlab_segmentation	Inference for MMLAB segmentation models	Link
train_mmlab_segmentation	Train for MMLAB segmentation models	Link

Semantic segmentation

Name	Description	Original repository
infer_detectron2_deeplabv3plus	DeepLabv3+ inference model of Detectron2 for semantic segmentation.	Link
infer_hf_semantic_seg	Semantic segmentation using models from Hugging Face.	Link
infer_mmlab_segmentation	Inference for MMLAB segmentation models	Link
infer_mobile_segment_anything	Inference for Mobile Segment Anything Model (SAM).	Link
infer_segment_anything	Inference for Segment Anything Model (SAM).	Link
infer_transunet	TransUNet inference for semantic segmentation	Link
infer_unet	Multi-class semantic segmentation using Unet; the default model was trained on Kaggle's Carvana Images dataset	Link
infer_yolop_v2	Panoptic driving Perception using YoloPv2	Link
train_detectron2_deeplabv3plus	Training process for DeepLabv3+ model of Detectron2.	Link
train_hf_semantic_seg	Train models for semantic segmentation with transformers from Hugging Face.	Link
train_mmlab_segmentation	Train for MMLAB segmentation models	Link
train_transunet	Training process for TransUNet model.	Link
train_unet	Multi-class semantic segmentation using Unet	Link
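
Most names in the tables above start with `dataset_`, `infer_`, or `train_`. Assuming that prefix convention holds (it is inferred from the listing, not stated as a rule here), a short sketch can group names by role and find models that have both a training and an inference algorithm. The helper names `group_by_prefix` and `trainable_and_inferable` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Prefixes observed in the catalog; treating them as a convention is an assumption.
KNOWN_PREFIXES = ("dataset", "infer", "train")


def group_by_prefix(names: List[str]) -> Dict[str, List[str]]:
    """Group algorithm names by leading prefix; anything else goes to 'other'."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        prefix, _, rest = name.partition("_")
        if prefix in KNOWN_PREFIXES and rest:
            groups[prefix].append(rest)
        else:
            groups["other"].append(name)
    return dict(groups)


def trainable_and_inferable(names: List[str]) -> List[str]:
    """Return model names that appear with both 'train_' and 'infer_' prefixes."""
    groups = group_by_prefix(names)
    return sorted(set(groups.get("infer", [])) & set(groups.get("train", [])))


if __name__ == "__main__":
    # A few names copied from the tables above.
    sample = [
        "auto_annotate",
        "dataset_coco",
        "infer_yolo_v8",
        "train_yolo_v8",
        "infer_yolo_world",
        "train_mmlab_kie",
    ]
    print(group_by_prefix(sample))
    print(trainable_and_inferable(sample))
```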
