Year · Title · Authors · Abstract · 中文摘要 (Chinese abstract) · Link
2020 IntrA: 3D Intracranial Aneurysm Dataset for Deep Learning Xi Yang, Ding Xia, Taichi Kin, Takeo Igarashi Medicine is an important application area for deep learning models. Research in this field is a combination of medical expertise and data science knowledge. In this paper, instead of 2D medical images, we introduce an open-access 3D intracranial aneurysm dataset, IntrA, that makes the application of points-based and mesh-based classification and segmentation models available. Our dataset can be used to diagnose intracranial aneurysms and to extract the neck for a clipping operation in medicine and other areas of deep learning, such as normal estimation and surface reconstruction. We provide a large-scale benchmark of classification and part segmentation by testing state-of-the-art networks. We also discuss the performance of each method and demonstrate the challenges of our dataset. The published dataset can be accessed here: https://github.com/intra2d2019/IntrA. 医学是深度学习模型的重要应用领域。这一领域的研究是医学专业知识和数据科学知识的结合。本文介绍了一个开放获取的3D颅内动脉瘤数据集IntrA,它使基于点和基于网格的分类与分割模型的应用成为可能,而不再局限于传统的2D医学图像。我们的数据集既可用于医学中的颅内动脉瘤诊断和夹闭手术前的瘤颈提取,也可用于法线估计、表面重建等其他深度学习任务。我们通过测试最先进的网络提供了大规模的分类和部件分割基准。我们还讨论了每种方法的性能,并展示了我们数据集的挑战。已发布的数据集可以在此处访问:https://github.com/intra2d2019/IntrA。 link
2020 Fast Symmetric Diffeomorphic Image Registration with Convolutional Neural Networks Tony C.W. Mok, Albert C.S. Chung Diffeomorphic deformable image registration is crucial in many medical image studies, as it offers unique, special features including topology preservation and invertibility of the transformation. Recent deep learning-based deformable image registration methods achieve fast image registration by leveraging a convolutional neural network (CNN) to learn the spatial transformation from the synthetic ground truth or the similarity metric. However, these approaches often ignore the topology preservation of the transformation and the smoothness of the transformation which is enforced by a global smoothing energy function alone. Moreover, deep learning-based approaches often estimate the displacement field directly, which cannot guarantee the existence of the inverse transformation. In this paper, we present a novel, efficient unsupervised symmetric image registration method which maximizes the similarity between images within the space of diffeomorphic maps and estimates both forward and inverse transformations simultaneously. We evaluate our method on 3D image registration with a large scale brain image dataset. Our method achieves state-of-the-art registration accuracy and running time while maintaining desirable diffeomorphic properties. 在许多医学图像研究中,微分同胚形变图像配准至关重要,因为它具有拓扑保持和变换可逆等独特性质。最近基于深度学习的形变图像配准方法通过利用卷积神经网络(CNN)从合成真值或相似性度量中学习空间变换,实现了快速图像配准。然而,这些方法通常忽略了变换的拓扑保持以及仅靠全局平滑能量函数来保证的变换平滑性。此外,基于深度学习的方法通常直接估计位移场,无法保证逆变换的存在。在本文中,我们提出了一种新颖、高效的无监督对称图像配准方法,该方法在微分同胚映射空间内最大化图像之间的相似性,并同时估计正向和逆向变换。我们在大规模脑部图像数据集上对3D图像配准进行了评估。我们的方法在保持理想的微分同胚特性的同时,实现了最先进的配准精度和运行时间。 link
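
A minimal sketch (not the authors' code) of the standard way a diffeomorphic warp and its approximate inverse can be obtained from a single stationary velocity field by scaling-and-squaring; the field layout, step count, and helper names below are illustrative assumptions.

```python
# Sketch only: integrating a stationary velocity field v by scaling-and-squaring
# gives a diffeomorphic deformation; integrating -v gives (approximately) its
# inverse, which is one common way to obtain forward and inverse transforms together.
import torch
import torch.nn.functional as F

def identity_grid(shape):
    """Normalized identity sampling grid for F.grid_sample; shape = (D, H, W)."""
    axes = [torch.linspace(-1.0, 1.0, s) for s in shape]
    grid = torch.stack(torch.meshgrid(*axes, indexing="ij")[::-1], dim=-1)
    return grid.unsqueeze(0)  # (1, D, H, W, 3), last dim ordered (x, y, z)

def integrate_velocity(v, steps=7):
    """v: (1, 3, D, H, W) velocity in normalized coords -> displacement field."""
    disp = v / (2 ** steps)                               # small initial displacement
    grid = identity_grid(v.shape[2:]).to(v.device)
    for _ in range(steps):                                # repeated self-composition
        offset = disp.permute(0, 2, 3, 4, 1)              # (1, D, H, W, 3)
        disp = disp + F.grid_sample(disp, grid + offset, align_corners=True)
    return disp

def warp(volume, disp):
    grid = identity_grid(volume.shape[2:]).to(volume.device)
    return F.grid_sample(volume, grid + disp.permute(0, 2, 3, 4, 1), align_corners=True)

# forward_disp = integrate_velocity(v); inverse_disp = integrate_velocity(-v)
```
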
2020 Explorable Super Resolution Yuval Bahat, Tomer Michaeli Single image super resolution (SR) has seen major performance leaps in recent years. However, existing methods do not allow exploring the infinitely many plausible reconstructions that might have given rise to the observed low-resolution (LR) image. These different explanations to the LR image may dramatically vary in their textures and fine details, and may often encode completely different semantic information. In this paper, we introduce the task of explorable super resolution. We propose a framework comprising a graphical user interface with a neural network backend, allowing editing the SR output so as to explore the abundance of plausible HR explanations to the LR input. At the heart of our method is a novel module that can wrap any existing SR network, analytically guaranteeing that its SR outputs would precisely match the LR input, when down-sampled. Besides its importance in our setting, this module is guaranteed to decrease the reconstruction error of any SR network it wraps, and can be used to cope with blur kernels that are different from the one the network was trained for. We illustrate our approach in a variety of use cases, ranging from medical imaging and forensics, to graphics. 单图像超分辨率(SR)近年来取得了重大的性能提升。然而,现有方法并不允许探索可能导致观察到的低分辨率(LR)图像的无限多合理重建。对LR图像的不同解释在纹理和细节上可能有显著差异,并且通常会编码完全不同的语义信息。在本文中,我们引入了可探索超分辨率这一任务。我们提出了一个框架,包括一个带有神经网络后端的图形用户界面,允许编辑SR输出,以探索LR输入的大量合理HR解释。我们方法的核心是一个新颖的模块,可以包装任何现有的SR网络,并在解析上保证其SR输出在下采样后精确匹配LR输入。除了在我们的设置中的重要性外,该模块还保证降低它所包装的任何SR网络的重建误差,并且可以用来处理与网络训练时不同的模糊核。我们在医学成像、取证和图形等多种用例中演示了我们的方法。 link
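
A toy illustration of the LR-consistency idea under a simple s × s box-average downsampling assumption (not the paper's general consistency-enforcing module): correct the SR output so that downsampling it reproduces the LR input exactly.

```python
# Toy version of the consistency idea: add back the per-block residual so that
# average-pooling the corrected SR output reproduces the observed LR image exactly.
import torch.nn.functional as F

def enforce_lr_consistency(sr, lr, scale):
    """sr: (N, C, s*h, s*w) output of any SR network; lr: (N, C, h, w) observed input."""
    down = F.avg_pool2d(sr, kernel_size=scale)                     # LR view of the SR output
    residual = lr - down                                           # per-block mismatch
    corrected = sr + F.interpolate(residual, scale_factor=scale, mode="nearest")
    return corrected                                               # avg_pool2d(corrected) == lr
```

Under this box-downsampling model the correction is an orthogonal projection onto the set of HR images consistent with the LR input, so it cannot increase the distance to any consistent ground truth.
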
2020 Boosting the Transferability of Adversarial Samples via Attention Weibin Wu, Yuxin Su, Xixian Chen, Shenglin Zhao, Irwin King, Michael R. Lyu, Yu-Wing Tai The widespread deployment of deep models necessitates the assessment of model vulnerability in practice, especially for safety- and security-sensitive domains such as autonomous driving and medical diagnosis. Transfer-based attacks against image classifiers thus elicit mounting interest, where attackers are required to craft adversarial images based on local proxy models without the feedback information from remote target ones. However, under such a challenging but practical setup, the synthesized adversarial samples often achieve limited success due to overfitting to the local model employed. In this work, we propose a novel mechanism to alleviate the overfitting issue. It computes model attention over extracted features to regularize the search of adversarial examples, which prioritizes the corruption of critical features that are likely to be adopted by diverse architectures. Consequently, it can promote the transferability of resultant adversarial instances. Extensive experiments on ImageNet classifiers confirm the effectiveness of our strategy and its superiority to state-of-the-art benchmarks in both white-box and black-box settings. 深度模型的广泛部署使得在实践中评估模型的脆弱性变得必要,特别是对于安全和安全敏感领域,如自动驾驶和医学诊断。基于转移的图像分类器攻击引起了越来越多的关注,攻击者需要根据本地代理模型制作对抗性图像,而无需来自远程目标模型的反馈信息。然而,在这种具有挑战性但实际的设置下,合成的对抗样本往往由于过度拟合于本地模型而取得有限的成功。在这项工作中,我们提出了一种新颖的机制来缓解过拟合问题。它通过对提取的特征计算模型的注意力来规范对对抗性示例的搜索,优先考虑可能被不同架构采用的关键特征的破坏。因此,它可以促进对抗实例的可转移性。对ImageNet分类器的大量实验验证了我们的策略的有效性,并证实其在白盒和黑盒设置下优于最先进的基准。 link
2020 Single-Step Adversarial Training With Dropout Scheduling Vivek B.S., R. Venkatesh Babu Deep learning models have shown impressive performance across a spectrum of computer vision applications including medical diagnosis and autonomous driving. One of the major concerns that these models face is their susceptibility to adversarial attacks. Realizing the importance of this issue, more researchers are working towards developing robust models that are less affected by adversarial attacks. Adversarial training method shows promising results in this direction. In adversarial training regime, models are trained with mini-batches augmented with adversarial samples. Fast and simple methods (e.g., single-step gradient ascent) are used for generating adversarial samples, in order to reduce computational complexity. It is shown that models trained using single-step adversarial training method (adversarial samples are generated using non-iterative method) are pseudo robust. Further, this pseudo robustness of models is attributed to the gradient masking effect. However, existing works fail to explain when and why gradient masking effect occurs during single-step adversarial training. In this work, (i) we show that models trained using single-step adversarial training method learn to prevent the generation of single-step adversaries, and this is due to over-fitting of the model during the initial stages of training, and (ii) to mitigate this effect, we propose a single-step adversarial training method with dropout scheduling. Unlike models trained using existing single-step adversarial training methods, models trained using the proposed single-step adversarial training method are robust against both single-step and multi-step adversarial attacks, and the performance is on par with models trained using computationally expensive multi-step adversarial training methods, in white-box and black-box settings. 深度学习模型在包括医学诊断和自动驾驶在内的计算机视觉应用中表现出令人印象深刻的性能。这些模型面临的主要问题之一是它们容易受到对抗性攻击的影响。认识到这个问题的重要性,越来越多的研究人员致力于开发不太受对抗性攻击影响的强健模型。对抗训练方法在这方面表现出有希望的结果。在对抗训练制度中,模型使用带有对抗样本的小批量进行训练。为了减少计算复杂性,快速简单的方法(例如单步梯度上升)用于生成对抗样本。研究表明,使用单步对抗训练方法训练的模型(对抗样本是使用非迭代方法生成的)是伪强健的。此外,模型的这种伪强健性归因于梯度掩盖效应。然而,现有的研究未能解释在单步对抗训练过程中何时以及为何梯度掩盖效应会发生。在这项工作中,(i)我们表明,使用单步对抗训练方法训练的模型学会防止生成单步对手,这是由于在训练的初始阶段模型过拟合的结果,以及(ii)为了减轻这种效应,我们提出了一种带有dropout调度的单步对抗训练方法。与使用现有单步对抗训练方法训练的模型不同,使用提出的单步对抗训练方法训练的模型对单步和多步对抗攻击都具有强健性,并且在白盒和黑盒设置中的性能与使用计算昂贵的多步对抗训练方法训练的模型相当。 link
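
A hedged sketch of what single-step (FGSM) adversarial training with a decaying dropout rate could look like; the particular schedule, epsilon, and helper names are illustrative assumptions, not the paper's exact recipe.

```python
# Illustrative only: FGSM adversarial training where the model's dropout
# probability starts high and is decayed as training progresses.
import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad, = torch.autograd.grad(loss, x_adv)
    return (x_adv + eps * grad.sign()).clamp(0.0, 1.0).detach()    # single-step adversary

def set_dropout(model, p):
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.p = p

def train_epoch(model, loader, optimizer, epoch, total_epochs, eps=8 / 255):
    # assumed schedule: dropout decays linearly to zero over the first half of training
    set_dropout(model, 0.5 * max(0.0, 1.0 - 2.0 * epoch / total_epochs))
    model.train()
    for x, y in loader:
        x_adv = fgsm(model, x, y, eps)
        optimizer.zero_grad()
        F.cross_entropy(model(x_adv), y).backward()                # train on adversarial batch
        optimizer.step()
```
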
2020 Structure Boundary Preserving Segmentation for Medical Image With Ambiguous Boundary Hong Joo Lee, Jung Uk Kim, Sangmin Lee, Hak Gu Kim, Yong Man Ro In this paper, we propose a novel image segmentation method to tackle two critical problems of medical image, which are (i) ambiguity of structure boundary in the medical image domain and (ii) uncertainty of the segmented region without specialized domain knowledge. To solve those two problems in automatic medical segmentation, we propose a novel structure boundary preserving segmentation framework. To this end, the boundary key point selection algorithm is proposed. In the proposed algorithm, the key points on the structural boundary of the target object are estimated. Then, a boundary preserving block (BPB) with the boundary key point map is applied for predicting the structure boundary of the target object. Further, for embedding experts' knowledge in the fully automatic segmentation, we propose a novel shape boundary-aware evaluator (SBE) with the ground-truth structure information indicated by experts. The proposed SBE could give feedback to the segmentation network based on the structure boundary key point. The proposed method is general and flexible enough to be built on top of any deep learning-based segmentation network. We demonstrate that the proposed method could surpass the state-of-the-art segmentation network and improve the accuracy of three different segmentation network models on different types of medical image datasets. 在这篇论文中,我们提出了一种新颖的图像分割方法,以解决医学图像领域中的两个关键问题,即(i)医学图像领域中结构边界的模糊性和(ii)在没有专业领域知识的情况下分割区域的不确定性。为了解决自动医学分割中的这两个问题,我们提出了一种新颖的保留结构边界的分割框架。为此,提出了边界关键点选择算法。在提出的算法中,估计了目标对象的结构边界上的关键点。然后,应用带有边界关键点图的边界保持块(BPB)来预测目标对象的结构边界。此外,为了在完全自动分割中嵌入专家知识,我们提出了一种新颖的形状边界感知评估器(SBE),其中包含专家指示的地面真实结构信息。提出的SBE可以根据结构边界关键点向分割网络提供反馈。该方法通用且灵活,足以构建在任何基于深度学习的分割网络之上。我们证明了该方法能够超越最先进的分割网络,并提高三种不同类型医学图像数据集上不同分割网络模型的准确性。 link
2020 DeepFLASH: An Efficient Network for Learning-Based Medical Image Registration Jian Wang, Miaomiao Zhang This paper presents DeepFLASH, a novel network with efficient training and inference for learning-based medical image registration. In contrast to existing approaches that learn spatial transformations from training data in the high dimensional imaging space, we develop a new registration network entirely in a low dimensional bandlimited space. This dramatically reduces the computational cost and memory footprint of an expensive training and inference. To achieve this goal, we first introduce complex-valued operations and representations of neural architectures that provide key components for learning-based registration models. We then construct an explicit loss function of transformation fields fully characterized in a bandlimited space with much fewer parameterizations. Experimental results show that our method is significantly faster than the state-of-the-art deep learning based image registration methods, while producing equally accurate alignment. We demonstrate our algorithm in two different applications of image registration: 2D synthetic data and 3D real brain magnetic resonance (MR) images. 本文介绍了DeepFLASH,一种用于基于学习的医学图像配准、具有高效训练和推理的新型网络。与在高维成像空间中从训练数据学习空间变换的现有方法不同,我们完全在低维带限空间中构建了一个新的配准网络。这显著降低了昂贵的训练和推理的计算成本与内存占用。为了实现这一目标,我们首先引入了神经架构的复值运算和表示,为基于学习的配准模型提供了关键组件。然后,我们构建了一个显式的损失函数,以少得多的参数在带限空间中完整刻画变换场。实验结果表明,我们的方法比现有最先进的基于深度学习的图像配准方法快得多,同时产生同样精确的对齐结果。我们在图像配准的两个不同应用中演示了我们的算法:2D合成数据和3D真实脑磁共振(MR)图像。 link
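
To make the "low dimensional bandlimited space" idea concrete, here is a small, assumption-laden sketch of truncating a field to its low-frequency Fourier coefficients and mapping back; it illustrates only the representation, not the paper's complex-valued network layers, and the truncation size is an arbitrary choice.

```python
# Sketch of the bandlimited representation only: keep the lowest-frequency Fourier
# coefficients of a 3D field, which drastically reduces the number of values a
# network must predict. The truncation size `keep` is an arbitrary choice here.
import torch

def to_bandlimited(field, keep=16):
    """field: (..., D, H, W) real-valued -> centered, truncated complex spectrum."""
    spec = torch.fft.fftshift(torch.fft.fftn(field, dim=(-3, -2, -1)), dim=(-3, -2, -1))
    c = [s // 2 for s in field.shape[-3:]]
    k = keep // 2
    return spec[..., c[0] - k:c[0] + k, c[1] - k:c[1] + k, c[2] - k:c[2] + k]

def from_bandlimited(spec, shape):
    """Zero-pad the truncated spectrum back to `shape` and invert the FFT."""
    full = torch.zeros(*spec.shape[:-3], *shape, dtype=spec.dtype)
    c = [s // 2 for s in shape]
    k = spec.shape[-1] // 2
    full[..., c[0] - k:c[0] + k, c[1] - k:c[1] + k, c[2] - k:c[2] + k] = spec
    return torch.fft.ifftn(torch.fft.ifftshift(full, dim=(-3, -2, -1)), dim=(-3, -2, -1)).real
```
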
2020 FocalMix: Semi-Supervised Learning for 3D Medical Image Detection Dong Wang, Yuan Zhang, Kexin Zhang, Liwei Wang Applying artificial intelligence techniques in medical imaging is one of the most promising areas in medicine. However, most of the recent success in this area highly relies on large amounts of carefully annotated data, whereas annotating medical images is a costly process. In this paper, we propose a novel method, called FocalMix, which, to the best of our knowledge, is the first to leverage recent advances in semi-supervised learning (SSL) for 3D medical image detection. We conducted extensive experiments on two widely used datasets for lung nodule detection, LUNA16 and NLST. Results show that our proposed SSL methods can achieve a substantial improvement of up to 17.3% over state-of-the-art supervised learning approaches with 400 unlabeled CT scans. 在医学影像领域应用人工智能技术是医学中最具前景的领域之一。然而,最近在这一领域取得的许多成功很大程度上依赖于大量精心注释的数据,而标注医学图像是一个昂贵的过程。在本文中,我们提出了一种新颖的方法,称为FocalMix,据我们所知,这是首个利用半监督学习(SSL)最新进展进行三维医学图像检测的方法。我们在肺结节检测的两个广泛使用的数据集LUNA16和NLST上进行了大量实验。结果显示,我们提出的SSL方法可以实现比现有最先进的监督学习方法高出多达17.3%的显著改进,仅仅使用了400个未标记的CT扫描。 link
2020 Deep Distance Transform for Tubular Structure Segmentation in CT Scans Yan Wang, Xu Wei, Fengze Liu, Jieneng Chen, Yuyin Zhou, Wei Shen, Elliot K. Fishman, Alan L. Yuille Tubular structure segmentation in medical images, e.g., segmenting vessels in CT scans, serves as a vital step in the use of computers to aid in screening early stages of related diseases. But automatic tubular structure segmentation in CT scans is a challenging problem, due to issues such as poor contrast, noise and complicated background. A tubular structure usually has a cylinder-like shape which can be well represented by its skeleton and cross-sectional radii (scales). Inspired by this, we propose a geometry-aware tubular structure segmentation method, Deep Distance Transform (DDT), which combines intuitions from the classical distance transform for skeletonization and modern deep segmentation networks. DDT first learns a multi-task network to predict a segmentation mask for a tubular structure and a distance map. Each value in the map represents the distance from each tubular structure voxel to the tubular structure surface. Then the segmentation mask is refined by leveraging the shape prior reconstructed from the distance map. We apply our DDT on six medical image datasets. Results show that (1) DDT can boost tubular structure segmentation performance significantly (e.g., over 13% DSC improvement for pancreatic duct segmentation), and (2) DDT additionally provides a geometrical measurement for a tubular structure, which is important for clinical diagnosis (e.g., the cross-sectional scale of a pancreatic duct can be an indicator for pancreatic cancer). 在医学图像中的管状结构分割,例如在CT扫描中分割血管,是计算机辅助筛查相关疾病早期阶段的重要步骤。但是,在CT扫描中进行自动管状结构分割是一个具有挑战性的问题,因为存在诸如对比度低、噪声和复杂背景等问题。管状结构通常具有类似圆柱形的形状,可以通过其骨架和横截面半径(尺度)来很好地表示。受此启发,我们提出了一种几何感知的管状结构分割方法Deep Distance Transform(DDT),结合了经典距离变换用于骨架化和现代深度分割网络的直觉。DDT首先学习一个多任务网络,用于预测管状结构的分割掩模和距离图。图中的每个值表示从每个管状结构体素到管状结构表面的距离。然后通过利用从距离图中重建的形状先验来优化分割掩模。我们将DDT应用于六个医学图像数据集。结果表明,(1)DDT可以显著提高管状结构分割性能(例如,胰腺导管分割的DSC改善超过13%),(2)DDT还为管状结构提供了几何测量,这对临床诊断非常重要(例如,胰腺导管的横截面尺度可以成为胰腺癌的指标)。 link
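
The distance-map regression target described above can be produced from a binary mask with an off-the-shelf Euclidean distance transform; this is a small sketch of the target construction (the function name and optional quantization are assumptions), not the authors' training pipeline.

```python
# Sketch: for every foreground voxel of a binary tubular-structure mask, compute its
# distance to the nearest background voxel (i.e. to the structure surface), optionally
# quantized into bins so it can be predicted as a classification target.
import numpy as np
from scipy.ndimage import distance_transform_edt

def distance_map_target(mask, spacing=(1.0, 1.0, 1.0), num_bins=None):
    """mask: (D, H, W) binary array; spacing: voxel size, e.g. in millimetres."""
    dist = distance_transform_edt(mask, sampling=spacing)      # 0 on background voxels
    if num_bins is not None:
        dist = np.digitize(dist, np.linspace(0.0, dist.max() + 1e-6, num_bins))
    return dist
```
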
2020 What Can Be Transferred: Unsupervised Domain Adaptation for Endoscopic Lesions Segmentation Jiahua Dong, Yang Cong, Gan Sun, Bineng Zhong, Xiaowei Xu Unsupervised domain adaptation has attracted growing research attention on semantic segmentation. However, 1) most existing models cannot be directly applied into lesions transfer of medical images, due to the diverse appearances of same lesion among different datasets; 2) equal attention has been paid into all semantic representations instead of neglecting irrelevant knowledge, which leads to negative transfer of untransferable knowledge. To address these challenges, we develop a new unsupervised semantic transfer model including two complementary modules (i.e., T_D and T_F ) for endoscopic lesions segmentation, which can alternatively determine where and how to explore transferable domain-invariant knowledge between labeled source lesions dataset (e.g., gastroscope) and unlabeled target diseases dataset (e.g., enteroscopy). Specifically, T_D focuses on where to translate transferable visual information of medical lesions via residual transferability-aware bottleneck, while neglecting untransferable visual characterizations. Furthermore, T_F highlights how to augment transferable semantic features of various lesions and automatically ignore untransferable representations, which explores domain-invariant knowledge and in return improves the performance of T_D. To the end, theoretical analysis and extensive experiments on medical endoscopic dataset and several non-medical public datasets well demonstrate the superiority of our proposed model. 无监督领域自适应在语义分割方面吸引了越来越多的研究关注。然而,1)大多数现有模型不能直接应用于医学图像病变转移,因为不同数据集中相同病变的外观各异;2)对所有语义表示都给予了同等关注,而非忽略无关知识,导致无法转移的知识的负面转移。为了解决这些挑战,我们开发了一个新的无监督语义转移模型,包括两个互补模块(即T_D和T_F),用于内窥镜病变分割,可以交替确定在标记的源病变数据集(例如胃镜)和未标记的目标疾病数据集(例如肠镜)之间如何探索可转移的域不变知识。具体而言,T_D关注于通过残差可转移性感知瓶颈将医学病变的可转移视觉信息转化到哪里,同时忽略不可转移的视觉特征。此外,T_F强调如何增强各种病变的可转移语义特征,并自动忽略不可转移的表示,从而探索域不变知识,并提高T_D的性能。最后,对医学内窥镜数据集和几个非医学公共数据集的理论分析和广泛实验证明了我们提出的模型的优越性。 link
2020 A Spatiotemporal Volumetric Interpolation Network for 4D Dynamic Medical Image Yuyu Guo, Lei Bi, Euijoon Ahn, Dagan Feng, Qian Wang, Jinman Kim Dynamic medical images are often limited in its application due to the large radiation doses and longer image scanning and reconstruction times. Existing methods attempt to reduce the volume samples in the dynamic sequence by interpolating the volumes between the acquired samples. However, these methods are limited to either 2D images and/or are unable to support large but periodic variations in the functional motion between the image volume samples. In this paper, we present a spatiotemporal volumetric interpolation network (SVIN) designed for 4D dynamic medical images. SVIN introduces dual networks: the first is the spatiotemporal motion network that leverages the 3D convolutional neural network (CNN) for unsupervised parametric volumetric registration to derive spatiotemporal motion field from a pair of image volumes; the second is the sequential volumetric interpolation network, which uses the derived motion field to interpolate image volumes, together with a new regression-based module to characterize the periodic motion cycles in functional organ structures. We also introduce an adaptive multi-scale architecture to capture the volumetric large anatomy motions. Experimental results demonstrated that our SVIN outperformed state-of-the-art temporal medical interpolation methods and natural video interpolation method that has been extended to support volumetric images. Code is available at [1]. 动态医学图像由于辐射剂量大和图像扫描以及重建时间长而在应用中常受限制。现有方法尝试通过在获取的样本之间插值来减少动态序列中的体积样本。然而,这些方法限于2D图像,且无法支持图像体积样本之间的功能运动存在大而周期性变化。本文提出了一种专为4D动态医学图像设计的时空体积插值网络(SVIN)。SVIN引入了双网络:第一个是时空运动网络,利用3D卷积神经网络(CNN)进行无监督参数体积配准,从一对图像体积中推导出时空运动场;第二个是序列体积插值网络,利用推导出的运动场插值图像体积,同时使用基于回归的新模块来表征功能器官结构中的周期性运动周期。我们还引入了自适应多尺度架构来捕获体积大解剖运动。实验结果表明,我们的SVIN优于最先进的时间医学插值方法和已扩展以支持体积图像的自然视频插值方法。源代码可在[1]处获得。 link
2020 C2FNAS: Coarse-to-Fine Neural Architecture Search for 3D Medical Image Segmentation Qihang Yu, Dong Yang, Holger Roth, Yutong Bai, Yixiao Zhang, Alan L. Yuille, Daguang Xu 3D convolution neural networks (CNN) have been proved very successful in parsing organs or tumours in 3D medical images, but it remains sophisticated and time-consuming to choose or design proper 3D networks given different task contexts. Recently, Neural Architecture Search (NAS) is proposed to solve this problem by searching for the best network architecture automatically. However, the inconsistency between search stage and deployment stage often exists in NAS algorithms due to memory constraints and large search space, which could become more serious when applying NAS to some memory and time-consuming tasks, such as 3D medical image segmentation. In this paper, we propose a coarse-to-fine neural architecture search (C2FNAS) to automatically search a 3D segmentation network from scratch without inconsistency on network size or input size. Specifically, we divide the search procedure into two stages: 1) the coarse stage, where we search the macro-level topology of the network, i.e. how each convolution module is connected to other modules; 2) the fine stage, where we search at micro-level for operations in each cell based on previous searched macro-level topology. The coarse-to-fine manner divides the search procedure into two consecutive stages and meanwhile resolves the inconsistency. We evaluate our method on 10 public datasets from Medical Segmentation Decalthon (MSD) challenge, and achieve state-of-the-art performance with the network searched using one dataset, which demonstrates the effectiveness and generalization of our searched models. 3D卷积神经网络(CNN)在解析3D医学图像中的器官或肿瘤方面已被证明非常成功,但在不同任务背景下选择或设计适当的3D网络仍然复杂且耗时。最近,提出了神经结构搜索(NAS)来自动搜索最佳网络架构以解决这一问题。然而,由于内存限制和庞大的搜索空间,NAS算法中搜索阶段和部署阶段之间的不一致性经常存在,当将NAS应用于一些内存和时间消耗任务,如3D医学图像分割时,这种不一致性可能变得更加严重。在本文中,我们提出了一种粗到细的神经结构搜索(C2FNAS),以在网络大小或输入大小上自动搜索3D分割网络而没有不一致性。具体来说,我们将搜索过程分为两个阶段:1)粗略阶段,我们在此阶段搜索网络的宏观拓扑结构,即每个卷积模块如何与其他模块连接;2)细化阶段,在此阶段,我们根据先前搜索到的宏观拓扑结构在微观层面搜索每个单元的操作。粗到细的方式将搜索过程分为两个连续阶段,并同时解决了不一致性问题。我们在Medical Segmentation Decalthon(MSD)挑战中对10个公共数据集上评估了我们的方法,并且使用一个数据集搜索到的网络实现了最先进的性能,这证明了我们搜索模型的有效性和泛化性。 link
2020 Learning Weighted Submanifolds With Variational Autoencoders and Riemannian Variational Autoencoders Nina Miolane, Susan Holmes Manifold-valued data naturally arises in medical imaging. In cognitive neuroscience for instance, brain connectomes base the analysis of coactivation patterns between different brain regions on the analysis of the correlations of their functional Magnetic Resonance Imaging (fMRI) time series - an object thus constrained by construction to belong to the manifold of symmetric positive definite matrices. One of the challenges that naturally arises in these studies consists in finding a lower-dimensional subspace for representing such manifold-valued and typically high-dimensional data. Traditional techniques, like principal component analysis, are ill-adapted to tackle non-Euclidean spaces and may fail to achieve a lower-dimensional representation of the data - thus potentially pointing to the absence of lower-dimensional representation of the data. However, these techniques are restricted in that: (i) they do not leverage the assumption that the connectomes belong on a pre-specified manifold, therefore discarding information; (ii) they can only fit a linear subspace to the data. In this paper, we are interested in variants to learn potentially highly curved submanifolds of manifold-valued data. Motivated by the brain connectomes example, we investigate a latent variable generative model, which has the added benefit of providing us with uncertainty estimates - a crucial quantity in the medical applications we are considering. While latent variable models have been proposed to learn linear and nonlinear spaces for Euclidean data, or geodesic subspaces for manifold data, no intrinsic latent variable model exists to learn non-geodesic subspaces for manifold data. This paper fills this gap and formulates a Riemannian variational autoencoder with an intrinsic generative model of manifold-valued data. We evaluate its performances on synthetic and real datasets, by introducing the formalism of weighted Riemannian submanifolds. 流形值数据在医学成像中自然出现。例如在认知神经科学中,脑连接组将不同脑区之间协同激活模式的分析建立在其功能磁共振成像(fMRI)时间序列相关性的分析之上,这一对象因此在构造上被约束于对称正定矩阵流形。这类研究中自然出现的挑战之一,是找到一个用于表示这种流形值且通常高维的数据的低维子空间。传统技术(如主成分分析)不适用于处理非欧几里得空间,可能无法得到数据的低维表示,从而可能被误认为数据不存在低维表示。此外,这些技术的局限在于:(i)它们没有利用连接组位于预先指定流形上的假设,因此丢失了信息;(ii)它们只能对数据拟合线性子空间。在本文中,我们关注能够学习流形值数据的潜在高度弯曲子流形的变体方法。受脑连接组示例的启发,我们研究了一种潜变量生成模型,其附加优势是能为我们提供不确定性估计,这在我们所考虑的医学应用中是一个关键量。虽然已有潜变量模型被提出用于学习欧几里得数据的线性和非线性空间,或流形数据的测地子空间,但尚不存在用于学习流形数据非测地子空间的内在潜变量模型。本文填补了这一空白,提出了一个带有流形值数据内在生成模型的黎曼变分自编码器。我们通过引入加权黎曼子流形的形式化体系,在合成和真实数据集上评估了其性能。 link
2020 CARP: Compression Through Adaptive Recursive Partitioning for Multi-Dimensional Images Rongjie Liu, Meng Li, Li Ma Fast and effective image compression for multi-dimensional images has become increasingly important for efficient storage and transfer of massive amounts of high resolution images and videos. Desirable properties in compression methods include (1) high reconstruction quality at a wide range of compression rates while preserving key local details, (2) computational scalability, (3) applicability to a variety of different image/video types and of different dimensions, and (4) ease of tuning. We present such a method for multi-dimensional image compression called Compression via Adaptive Recursive Partitioning (CARP). CARP uses an optimal permutation of the image pixels inferred from a Bayesian probabilistic model on recursive partitions of the image to reduce its effective dimensionality, achieving a parsimonious representation that preserves information. CARP uses a multi-layer Bayesian hierarchical model to achieve self-tuning and regularization to avoid overfitting-- resulting in one single parameter to be specified by the user to achieve the desired compression rate. Extensive numerical experiments using a variety of datasets including 2D ImageNet, 3D medical image, and real-life YouTube and surveillance videos show that CARP dominates the state-of-the-art compression approaches-- including JPEG, JPEG2000, MPEG4, and a neural network-based method--for all of these different image types and often on nearly all of the individual images. 多维图像的快速有效压缩对于高效存储和传输大量高分辨率图像和视频变得日益重要。压缩方法中理想的性质包括(1)在广泛的压缩率范围内具有高重建质量,同时保留关键的局部细节,(2)计算可扩展性,(3)适用于各种不同类型和不同维度的图像/视频,(4)易于调整。我们提出了一种名为压缩自适应递归分区(CARP)的多维图像压缩方法。CARP使用从图像递归分区的贝叶斯概率模型推断出的图像像素的最佳排列,以减少其有效维度,实现保留信息的简洁表示。CARP使用多层贝叶斯层次模型实现自调整和正则化,以避免过拟合-结果是用户只需指定一个单一参数即可实现所需的压缩率。广泛的数值实验使用各种数据集,包括2D ImageNet,3D 医学图像,以及真实的 YouTube 和监控视频,表明CARP在所有这些不同图像类型上均优于最先进的压缩方法-包括JPEG,JPEG2000,MPEG4和基于神经网络的方法-通常在几乎所有个别图像上都表现优异。 link
2020 Interactive Object Segmentation With Inside-Outside Guidance Shiyin Zhang, Jun Hao Liew, Yunchao Wei, Shikui Wei, Yao Zhao This paper explores how to harvest precise object segmentation masks while minimizing the human interaction cost. To achieve this, we propose an Inside-Outside Guidance (IOG) approach in this work. Concretely, we leverage an inside point that is clicked near the object center and two outside points at the symmetrical corner locations (top-left and bottom-right or top-right and bottom-left) of a tight bounding box that encloses the target object. This results in a total of one foreground click and four background clicks for segmentation. The advantages of our IOG is four-fold: 1) the two outside points can help to remove distractions from other objects or background; 2) the inside point can help to eliminate the unrelated regions inside the bounding box; 3) the inside and outside points are easily identified, reducing the confusion raised by the state-of-the-art DEXTR in labeling some extreme samples; 4) our approach naturally supports additional clicks annotations for further correction. Despite its simplicity, our IOG not only achieves state-of-the-art performance on several popular benchmarks, but also demonstrates strong generalization capability across different domains such as street scenes, aerial imagery and medical images, without fine-tuning. In addition, we also propose a simple two-stage solution that enables our IOG to produce high quality instance segmentation masks from existing datasets with off-the-shelf bounding boxes such as ImageNet and Open Images, demonstrating the superiority of our IOG as an annotation tool. 本文探讨了如何在最小化人力交互成本的同时获取精确的对象分割掩模。为了实现这一目标,我们在这项工作中提出了一种内外引导(IOG)方法。具体而言,我们利用一个靠近对象中心的内部点和两个位于密闭边界框的对称角位置(左上和右下或右上和左下)的外部点,这个边界框围绕着目标对象。这导致了总共一个前景点击和四个用于分割的背景点击。我们的IOG具有四个优点:1)两个外部点可以帮助消除其他对象或背景的干扰;2)内部点可以帮助消除边界框内的不相关区域;3)内部和外部点容易识别,减少了DEXTR在标记一些极端样本时引起的混乱;4)我们的方法自然地支持额外的点击注释以进行进一步的校正。尽管其简单性,我们的IOG不仅在几个流行的基准测试中取得了最先进的性能,而且在不需要微调的情况下展示了跨不同领域(如街景、航空图像和医学图像)的强大泛化能力。此外,我们还提出了一个简单的两阶段解决方案,使我们的IOG能够从现有数据集(如ImageNet和Open Images)中生成高质量的实例分割掩模,并展示了我们的IOG作为注释工具的优越性。 link
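
One common way to feed such clicks to a segmentation network is sketched below under assumed encodings (Gaussian heatmaps concatenated as extra input channels); the paper's own encoding may differ.

```python
# Assumed encoding for illustration: the inside click and the two outside corner
# clicks become Gaussian heatmap channels appended to the RGB image.
import numpy as np

def click_heatmap(points, shape, sigma=10.0):
    """points: iterable of (row, col); returns an (H, W) map with one Gaussian per click."""
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    heat = np.zeros(shape, dtype=np.float32)
    for r, c in points:
        heat = np.maximum(heat, np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2 * sigma ** 2)))
    return heat

def iog_input(image, inside_point, outside_points):
    """image: (H, W, 3) float array -> (H, W, 5) network input with guidance channels."""
    fg = click_heatmap([inside_point], image.shape[:2])
    bg = click_heatmap(outside_points, image.shape[:2])
    return np.concatenate([image, fg[..., None], bg[..., None]], axis=-1)
```
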
2020 Pathological Retinal Region Segmentation From OCT Images Using Geometric Relation Based Augmentation Dwarikanath Mahapatra, Behzad Bozorgtabar, Ling Shao Medical image segmentation is important for computer aided diagnosis. Pixelwise manual annotations of large datasets require high expertise and is time consuming. Conventional data augmentations have limited benefit by not fully representing the underlying distribution of the training set, thus affecting model robustness when tested on images captured from different sources. Prior work leverages synthetic images for data augmentation ignoring the interleaved geometric relationship between different anatomical labels. We propose improvements over previous GAN-based medical image synthesis methods by jointly encoding the intrinsic relationship of geometry and shape. Latent space variable sampling results in diverse generated images from a base image and improves robustness. Augmented datasets using our method for automatic segmentation of retinal optical coherence tomography (OCT) images outperform existing methods on the public RETOUCH dataset having images captured from different acquisition procedures. Ablation studies and visual analysis also demonstrate benefits of integrating geometry and diversity. 医学图像分割对于计算机辅助诊断至关重要。对大型数据集进行像素级手动标注需要高超的专业知识,并且非常耗时。传统的数据增强方法通过未能充分代表训练集的基本分布而受到限制,因此在测试来自不同来源的图像时会影响模型的稳健性。先前的工作利用合成图像进行数据增强,忽略了不同解剖标签之间的交错几何关系。我们提出了改进前面基于GAN的医学图像合成方法的方法,通过共同编码几何和形状的内在关系。潜在空间变量采样导致从基础图像生成多样化的图像,并提高了稳健性。使用我们的方法增强的数据集,用于自动分割视网膜光学相干断层扫描(OCT)图像,在公共RETOUCH数据集上表现出优于现有方法的效果,该数据集包含来自不同获取过程的图像。消融研究和视觉分析也证明了集成几何和多样性的好处。 link
2020 Total Deep Variation for Linear Inverse Problems Erich Kobler, Alexander Effland, Karl Kunisch, Thomas Pock Diverse inverse problems in imaging can be cast as variational problems composed of a task-specific data fidelity term and a regularization term. In this paper, we propose a novel learnable general-purpose regularizer exploiting recent architectural design patterns from deep learning. We cast the learning problem as a discrete sampled optimal control problem, for which we derive the adjoint state equations and an optimality condition. By exploiting the variational structure of our approach, we perform a sensitivity analysis with respect to the learned parameters obtained from different training datasets. Moreover, we carry out a nonlinear eigenfunction analysis, which reveals interesting properties of the learned regularizer. We show state-of-the-art performance for classical image restoration and medical image reconstruction problems. 在成像中的多样化反问题可以被视为由特定任务数据保真度项和正则化项组成的变分问题。本文提出了一种新颖的可学习的通用正则化器,利用了深度学习的最新架构设计模式。我们将学习问题构建为一个离散采样最优控制问题,通过推导伴随状态方程和最优性条件来解决该问题。通过利用我们方法的变分结构,我们对来自不同训练数据集的学习参数进行了敏感性分析。此外,我们进行了非线性特征函数分析,揭示了学习正则化器的有趣性质。我们展示了在经典图像恢复和医学图像重建问题上的最先进性能。 link
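
For reference, the generic form of the variational problem the abstract refers to, written for a linear inverse problem with forward operator A, observation y, and a learned regularizer; the quadratic data-fidelity term is a typical choice assumed here, not necessarily the paper's exact formulation.

```latex
\min_{x} \; \tfrac{1}{2}\,\lVert A x - y \rVert_2^2 \;+\; \mathcal{R}_{\theta}(x)
```
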
2020 Iteratively-Refined Interactive 3D Medical Image Segmentation With Multi-Agent Reinforcement Learning Xuan Liao, Wenhao Li, Qisen Xu, Xiangfeng Wang, Bo Jin, Xiaoyun Zhang, Yanfeng Wang, Ya Zhang Existing automatic 3D image segmentation methods usually fail to meet the clinic use. Many studies have explored an interactive strategy to improve the image segmentation performance by iteratively incorporating user hints. However, the dynamic process for successive interactions is largely ignored. We here propose to model the dynamic process of iterative interactive image segmentation as a Markov decision process (MDP) and solve it with reinforcement learning (RL). Unfortunately, it is intractable to use single-agent RL for voxel-wise prediction due to the large exploration space. To reduce the exploration space to a tractable size, we treat each voxel as an agent with a shared voxel-level behavior strategy so that it can be solved with multi-agent reinforcement learning. An additional advantage of this multi-agent model is to capture the dependency among voxels for segmentation task. Meanwhile, to enrich the information of previous segmentations, we reserve the prediction uncertainty in the state space of MDP and derive an adjustment action space leading to a more precise and finer segmentation. In addition, to improve the efficiency of exploration, we design a relative cross-entropy gain-based reward to update the policy in a constrained direction. Experimental results on various medical datasets have shown that our method significantly outperforms existing state-of-the-art methods, with the advantage of less interactions and a faster convergence. 现有的自动3D图像分割方法通常无法满足临床使用的需求。许多研究探讨了通过迭代地引入用户提示来改善图像分割性能的交互式策略。然而,连续交互的动态过程往往被忽视。我们在这里提出将迭代交互式图像分割的动态过程建模为马尔可夫决策过程(MDP),并利用强化学习(RL)来解决它。不幸的是,由于探索空间巨大,单一代理RL无法用于体素级预测。为了将探索空间减少到可处理的大小,我们将每个体素视为具有共享体素级行为策略的代理,以便可以使用多代理强化学习来解决它。这种多代理模型的另一个优势是捕获体素之间的依赖关系以进行分割任务。同时,为了丰富先前分割的信息,我们在MDP的状态空间中保留了预测不确定性,并导出了一个调整动作空间,从而实现更精确和更精细的分割。此外,为了提高探索的效率,我们设计了基于相对交叉熵增益的奖励,以在受限方向上更新策略。在各种医学数据集上的实验结果表明,我们的方法明显优于现有的最先进方法,具有较少的交互和更快的收敛速度。 link
2020 The Knowledge Within: Methods for Data-Free Model Compression Matan Haroush, Itay Hubara, Elad Hoffer, Daniel Soudry Background: Recently, an extensive amount of research has been focused on compressing and accelerating Deep Neural Networks (DNN). So far, high compression rate algorithms require part of the training dataset for a low precision calibration, or a fine-tuning process. However, this requirement is unacceptable when the data is unavailable or contains sensitive information, as in medical and biometric use-cases. Contributions: We present three methods for generating synthetic samples from trained models. Then, we demonstrate how these samples can be used to calibrate and fine-tune quantized models without using any real data in the process. Our best performing method has a negligible accuracy degradation compared to the original training set. This method, which leverages intrinsic batch normalization layers' statistics of the trained model, can be used to evaluate data similarity. Our approach opens a path towards genuine data-free model compression, alleviating the need for training data during model deployment. 背景:最近,大量的研究集中在压缩和加速深度神经网络(DNN)上。到目前为止,高压缩率算法需要部分训练数据集进行低精度校准,或进行微调过程。然而,当数据不可用或包含敏感信息时,如在医疗和生物识别用例中,这种要求是不可接受的。贡献:我们提出了三种从训练模型中生成合成样本的方法。然后,我们演示了如何利用这些样本来校准和微调量化模型,而在整个过程中不使用任何真实数据。我们表现最佳的方法与原始训练集相比几乎没有准确性下降。这种方法利用了训练模型的内在批量归一化层的统计数据,可用于评估数据的相似性。我们的方法为真正的无数据模型压缩打开了一条道路,减轻了在模型部署期间训练数据的需求。 link
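
A hedged sketch of the batch-norm-statistics idea in its generic form: optimize random inputs so the activations they induce match each BatchNorm layer's stored running statistics, then use the synthetic batch for calibration. Loss weights, step counts, and names are assumptions, and any additional losses the paper uses are omitted.

```python
# Generic sketch only: synthesize a calibration batch by matching the per-channel
# mean/variance entering every BatchNorm2d layer to that layer's stored running stats.
import torch
import torch.nn as nn

def synthesize_batch(model, shape=(32, 3, 224, 224), steps=500, lr=0.05):
    model.eval()                                           # BN uses running stats during the forward
    x = torch.randn(shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    bn_layers = [m for m in model.modules() if isinstance(m, nn.BatchNorm2d)]
    feats = {}

    def save_input(module, inputs, output):
        feats[module] = inputs[0]                          # activation entering this BN layer
    handles = [m.register_forward_hook(save_input) for m in bn_layers]

    for _ in range(steps):
        opt.zero_grad()
        model(x)
        loss = x.new_zeros(())
        for m in bn_layers:                                # match per-channel mean and variance
            a = feats[m]
            loss = loss + (a.mean(dim=(0, 2, 3)) - m.running_mean).pow(2).mean()
            loss = loss + (a.var(dim=(0, 2, 3), unbiased=False) - m.running_var).pow(2).mean()
        loss.backward()
        opt.step()

    for h in handles:
        h.remove()
    return x.detach()                                      # data-free calibration/fine-tuning batch
```
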
2020 Computing Valid P-Values for Image Segmentation by Selective Inference Kosuke Tanizaki, Noriaki Hashimoto, Yu Inatsu, Hidekata Hontani, Ichiro Takeuchi Image segmentation is one of the most fundamental tasks in computer vision. In many practical applications, it is essential to properly evaluate the reliability of individual segmentation results. In this study, we propose a novel framework for quantifying the statistical significance of individual segmentation results in the form of p-values by statistically testing the difference between the object region and the background region. This seemingly simple problem is actually quite challenging because the difference --- called segmentation bias --- can be deceptively large due to the adaptation of the segmentation algorithm to the data. To overcome this difficulty, we introduce a statistical approach called selective inference, and develop a framework for computing valid p-values in which segmentation bias is properly accounted for. Although the proposed framework is potentially applicable to various segmentation algorithms, we focus in this paper on graph-cut- and threshold-based segmentation algorithms, and develop two specific methods for computing valid p-values for the segmentation results obtained by these algorithms. We prove the theoretical validity of these two methods and demonstrate their practicality by applying them to the segmentation of medical images. 图像分割是计算机视觉中最基本的任务之一。在许多实际应用中,正确评估单个分割结果的可靠性至关重要。本研究提出了一种新颖的框架,通过对对象区域和背景区域之间的差异进行统计测试,以形式化p值的形式量化单个分割结果的统计显著性。这个看似简单的问题实际上非常具有挑战性,因为差异---称为分割偏差---可能会由于分割算法对数据的适应而具有欺骗性的较大。为了克服这一困难,我们引入了一种称为选择性推断的统计方法,并开发了一个计算有效p值的框架,在其中适当考虑了分割偏差。尽管提出的框架可能适用于各种分割算法,但我们在本文中专注于基于图割和阈值的分割算法,并开发了两种针对这些算法获得的分割结果计算有效p值的特定方法。我们证明了这两种方法的理论有效性,并通过将它们应用于医学图像的分割来展示它们的实用性。 link
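
To make the "segmentation bias" problem concrete, the naive baseline below compares object and background intensities with an ordinary two-sample test; because the mask was adapted to the same image, this p-value is anti-conservative, which is exactly what the paper's selective-inference framework corrects (not shown here).

```python
# Naive baseline, NOT the paper's method: a plain Welch two-sample t-test between the
# intensities inside and outside a data-driven mask. Because the mask was fit to this
# same image, the resulting p-value is biased toward significance.
import numpy as np
from scipy import stats

def naive_segmentation_pvalue(image, mask):
    """image: (H, W) intensity array; mask: (H, W) boolean segmentation result."""
    obj = image[mask].ravel()
    bg = image[~mask].ravel()
    _, p_value = stats.ttest_ind(obj, bg, equal_var=False)
    return p_value
```
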
2020 Unsupervised Magnification of Posture Deviations Across Subjects Michael Dorkenwald, Uta Buchler, Bjorn Ommer Analyzing human posture and precisely comparing it across different subjects is essential for accurate understanding of behavior and numerous vision applications such as medical diagnostics, sports, or surveillance. Motion magnification techniques help to see even small deviations in posture that are invisible to the naked eye. However, they fail when comparing subtle posture differences across individuals with diverse appearance. Keypoint-based posture estimation and classification techniques can handle large variations in appearance, but are invariant to subtle deviations in posture. We present an approach to unsupervised magnification of posture differences across individuals despite large deviations in appearance. We do not require keypoint annotation and visualize deviations on a sub-bodypart level. To transfer appearance across subjects onto a magnified posture, we propose a novel loss for disentangling appearance and posture in an autoencoder. Posture magnification yields exaggerated images that are different from the training set. Therefore, we incorporate magnification already into the training of the disentangled autoencoder and learn on real data and synthesized magnifications without supervision. Experiments confirm that our approach improves upon the state-of-the-art in magnification and on the application of discovering posture deviations due to impairment. 分析人类姿势并在不同个体之间进行精确比较对于准确理解行为和许多视觉应用(如医学诊断、运动或监视)至关重要。运动放大技术有助于看到肉眼看不见的姿势微小偏差。然而,在比较外观各异的个体之间的微小姿势差异时,它们会失败。基于关键点的姿势估计和分类技术可以处理外观上的大变化,但对于微小姿势偏差是不变的。我们提出了一种方法,可以在不需要关键点注释的情况下放大不同个体之间的姿势差异,尽管外观存在较大差异。为了将外观转移到放大的姿势上,我们提出了一种新颖的损失函数,用于在自动编码器中分离外观和姿势。姿势放大会产生与训练集不同的夸张图像。因此,我们将放大已经融入到分离自动编码器的训练中,并在真实数据和合成放大数据上进行无监督学习。实验证实我们的方法在放大方面超越了现有技术,并在发现由于损伤导致的姿势偏差的应用上取得了进展。 link
2020 SAINT: Spatially Aware Interpolation NeTwork for Medical Slice Synthesis Cheng Peng, Wei-An Lin, Haofu Liao, Rama Chellappa, S. Kevin Zhou Deep learning-based single image super-resolution (SISR) methods face various challenges when applied to 3D medical volumetric data (i.e., CT and MR images) due to the high memory cost and anisotropic resolution, which adversely affect their performance. Furthermore, mainstream SISR methods are designed to work over specific upsampling factors, which makes them ineffective in clinical practice. In this paper, we introduce a Spatially Aware Interpolation NeTwork (SAINT) for medical slice synthesis to alleviate the memory constraint that volumetric data poses. Compared to other super-resolution methods, SAINT utilizes voxel spacing information to provide desirable levels of details, and allows for the upsampling factor to be determined on the fly. Our evaluations based on 853 CT scans from four datasets that contain liver, colon, hepatic vessels, and kidneys show that SAINT consistently outperforms other SISR methods in terms of medical slice synthesis quality, while using only a single model to deal with different upsampling factors 基于深度学习的单幅图像超分辨率(SISR)方法在应用于3D医学体积数据(即CT和MR图像)时面临各种挑战,主要是由于高内存成本和各向异性分辨率,这会对它们的性能产生不利影响。此外,主流的SISR方法设计用于特定的上采样因子,这使它们在临床实践中表现不佳。在本文中,我们介绍了一种用于医学切片合成的空间感知插值网络(SAINT),以缓解体积数据带来的内存约束。与其他超分辨率方法相比,SAINT利用体素间距信息提供理想的细节水平,并允许实时确定上采样因子。我们基于四个包含肝脏、结肠、肝血管和肾脏的数据集的853个CT扫描的评估结果表明,SAINT在医学切片合成质量方面始终优于其他SISR方法,同时只使用一个模型来处理不同的上采样因子。 link
2020 LT-Net: Label Transfer by Learning Reversible Voxel-Wise Correspondence for One-Shot Medical Image Segmentation Shuxin Wang, Shilei Cao, Dong Wei, Renzhen Wang, Kai Ma, Liansheng Wang, Deyu Meng, Yefeng Zheng We introduce a one-shot segmentation method to alleviate the burden of manual annotation for medical images. The main idea is to treat one-shot segmentation as a classical atlas-based segmentation problem, where voxel-wise correspondence from the atlas to the unlabelled data is learned. Subsequently, segmentation label of the atlas can be transferred to the unlabelled data with the learned correspondence. However, since ground truth correspondence between images is usually unavailable, the learning system must be well-supervised to avoid mode collapse and convergence failure. To overcome this difficulty, we resort to the forward-backward consistency, which is widely used in correspondence problems, and additionally learn the backward correspondences from the warped atlases back to the original atlas. This cycle-correspondence learning design enables a variety of extra, cycle-consistency-based supervision signals to make the training process stable, while also boost the performance. We demonstrate the superiority of our method over both deep learning-based one-shot segmentation methods and a classical multi-atlas segmentation method via thorough experiments. 我们介绍了一种一次性分割方法,以减轻医学图像手动标注的负担。其主要思想是将一次性分割视为经典的基于图谱的分割问题,其中学习从图谱到未标记数据的体素对应关系。随后,可以通过学习的对应关系将图谱的分割标签转移到未标记数据中。然而,由于图像之间的地面实况对应通常不可用,学习系统必须接受良好的监督,以避免模式崩溃和收敛失败。为了克服这一困难,我们借鉴了前后一致性,这在对应问题中被广泛使用,并另外学习了从变形图谱回到原始图谱的后向对应。这种循环对应学习设计能够生成各种额外的、基于循环一致性的监督信号,使训练过程稳定,同时提高性能。我们通过彻底的实验证明了我们的方法优于基于深度学习的一次性分割方法和经典的多图谱分割方法。 link
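
A small sketch, under assumed shapes and a plain MSE similarity, of the atlas-to-target label transfer and the forward-backward (cycle) consistency term described above; `reg_net`, the loss choices, and the warping convention are illustrative assumptions, not the authors' implementation.

```python
# Sketch (names, shapes and losses assumed): warp the atlas with a predicted forward
# displacement to transfer its label, and ask a backward displacement to carry the
# warped atlas back to the original atlas (forward-backward consistency).
import torch
import torch.nn.functional as F

def warp3d(vol, disp):
    """vol: (N, C, D, H, W); disp: (N, D, H, W, 3) offsets in normalized (x, y, z) coords."""
    axes = [torch.linspace(-1.0, 1.0, s) for s in vol.shape[2:]]
    base = torch.stack(torch.meshgrid(*axes, indexing="ij")[::-1], dim=-1)   # identity grid
    return F.grid_sample(vol, base.unsqueeze(0).to(vol.device) + disp, align_corners=True)

def one_shot_step(reg_net, atlas_img, atlas_label, target_img):
    fwd = reg_net(atlas_img, target_img)                  # atlas -> target displacement
    warped = warp3d(atlas_img, fwd)
    bwd = reg_net(warped, atlas_img)                      # warped atlas -> atlas displacement
    similarity = F.mse_loss(warped, target_img)           # drives the forward correspondence
    cycle = F.mse_loss(warp3d(warped, bwd), atlas_img)    # cycle-consistency supervision
    pseudo_label = warp3d(atlas_label, fwd)               # transferred segmentation label
    return similarity, cycle, pseudo_label
```
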