ICCV2017.md

| Year | Title | Authors | Abstract | Chinese Abstract | Link |
| --- | --- | --- | --- | --- | --- |
| 2017 | Intrinsic 3D Dynamic Surface Tracking Based on Dynamic Ricci Flow and Teichmuller Map | Xiaokang Yu, Na Lei, Yalin Wang, Xianfeng Gu | 3D dynamic surface tracking is an important research problem and plays a vital role in many computer vision and medical imaging applications. However, it is still challenging to efficiently register surface sequences that have large deformations and strong noise. In this paper, we propose a novel automatic method for non-rigid 3D dynamic surface tracking with surface Ricci flow and Teichmuller map methods. According to quasi-conformal Teichmuller theory, the Teichmuller map minimizes the maximal dilation so that our method is able to automatically register surfaces with large deformations. Besides, the adoption of Delaunay triangulation and quadrilateral meshes makes our method applicable to low quality meshes. In our work, the 3D dynamic surfaces are acquired by a high speed 3D scanner. We first identify sparse surface features using machine learning methods in the texture space. Then we assign landmark features with different curvature settings and the Riemannian metric of the surface is computed by the dynamic Ricci flow method, such that all the curvatures are concentrated on the feature points and the surface is flat everywhere else. The registration among frames is computed by the Teichmuller mappings, which align the feature points with the least angle distortion. We apply our new method to multiple sequences of 3D facial surfaces with large expression deformations and compare it with two other state-of-the-art tracking methods. The effectiveness of our method is demonstrated by the clearly improved accuracy and efficiency. | 3D动态表面跟踪是一个重要的研究问题,在许多计算机视觉和医学成像应用中发挥着至关重要的作用。然而,有效地配准具有大变形和强噪声的表面序列仍然具有挑战性。本文提出了一种新颖的自动非刚性3D动态表面跟踪方法,采用表面里奇流和Teichmuller映射方法。根据准共形Teichmuller理论,Teichmuller映射将最大膨胀最小化,因此我们的方法能够自动配准具有大变形的表面。此外,采用Delaunay三角剖分和四边形网格使我们的方法适用于质量较低的网格。在我们的工作中,3D动态表面是通过高速3D扫描仪获取的。我们首先在纹理空间中使用机器学习方法识别稀疏表面特征。然后我们为地标特征分配不同的曲率设置,并通过动态里奇流方法计算表面的黎曼度量,使所有曲率集中在特征点上,而其他地方表面是平坦的。帧间的配准是通过Teichmuller映射计算的,它以最小的角度畸变对齐特征点。我们将新方法应用于具有大表情变形的多个3D面部表面序列,并将其与其他两种最先进的跟踪方法进行比较。我们的方法的有效性通过明显提高的准确性和效率得到了证实。 | link |
| 2017 | Rotation Equivariant Vector Field Networks | Diego Marcos, Michele Volpi, Nikos Komodakis, Devis Tuia | In many computer vision tasks, we expect a particular behavior of the output with respect to rotations of the input image. If this relationship is explicitly encoded, instead of treated as any other variation, the complexity of the problem is decreased, leading to a reduction in the size of the required model. We propose Rotation Equivariant vector field Networks (RotEqNet) to encode rotation equivariance and invariance into Convolutional Neural Networks (CNNs). Each convolutional filter is applied at multiple orientations and returns a vector field that represents the magnitude and angle of the highest scoring orientation at every spatial location. A modified convolution operator using vector fields as inputs and filters can then be applied to obtain deep architectures. We test RotEqNet on several problems requiring different responses with respect to the inputs' rotation: image classification, biomedical image segmentation, orientation estimation and patch matching. In all cases, we show that RotEqNet offers very compact models in terms of number of parameters and provides results in line with those of networks orders of magnitude larger. | 在许多计算机视觉任务中,我们期望输出与输入图像的旋转具有特定关系。如果这种关系被明确编码,而不是视为任何其他变化,问题的复杂性将降低,从而减少所需模型的规模。我们提出了旋转等变向量场网络(RotEqNet),将旋转等变性和不变性编码到卷积神经网络(CNNs)中。每个卷积滤波器在多个方向上应用,并返回表示每个空间位置上最高得分方向的幅度和角度的向量场。然后可以应用使用向量场作为输入和滤波器的修改后卷积运算符来获得深度架构。我们在几个需要对输入旋转具有不同响应的问题上测试了RotEqNet:图像分类、生物医学图像分割、方向估计和图像块匹配。在所有情况下,我们展示了RotEqNet在参数数量方面提供非常紧凑的模型,并提供了与大若干数量级的网络相当的结果。 | link |
| 2017 | Interpretable Explanations of Black Boxes by Meaningful Perturbation | Ruth C. Fong, Andrea Vedaldi | As machine learning algorithms are increasingly applied to high impact yet high risk tasks, such as medical diagnosis or autonomous driving, it is critical that researchers can explain how such algorithms arrived at their predictions. In recent years, a number of image saliency methods have been developed to summarize where highly complex neural networks "look" in an image for evidence for their predictions. However, these techniques are limited by their heuristic nature and architectural constraints. In this paper, we make two main contributions: First, we propose a general framework for learning different kinds of explanations for any black box algorithm. Second, we specialise the framework to find the part of an image most responsible for a classifier decision. Unlike previous works, our method is model-agnostic and testable because it is grounded in explicit and interpretable image perturbations. | 随着机器学习算法越来越多地应用于高影响力但高风险的任务,如医学诊断或自动驾驶,研究人员能够解释这些算法如何得出预测是至关重要的。近年来,已开发了许多图像显著性方法,用于总结高度复杂的神经网络在图像中“查找”证据以支持其预测。然而,这些技术受到其启发式特性和架构约束的限制。在本文中,我们提出了两个主要贡献:首先,我们提出了一个通用框架,用于学习任何黑匣子算法的不同类型的解释。其次,我们将该框架专门化,以找到图像中对分类器决策负有最大责任的部分。与先前的研究不同,我们的方法是与模型无关的,并且可测试,因为它基于明确且可解释的图像扰动。 | link |
| 2017 | Point Set Registration With Global-Local Correspondence and Transformation Estimation | Su Zhang, Yang Yang, Kun Yang, Yi Luo, Sim-Heng Ong | We present a new point set registration method with global-local correspondence and transformation estimation (GL-CATE). The geometric structures of point sets are exploited by combining the global feature, the point-to-point Euclidean distance, with the local feature, the shape distance (SD), which is based on the histograms generated by an elliptical Gaussian soft count strategy. By using a bi-directional deterministic annealing scheme to directly control the searching ranges of the two features, the mixture-feature Gaussian mixture model (MGMM) is constructed to recover the correspondences of point sets. A new vector based structure constraint term is formulated to regularize the transformation. The accuracy of transformation updating is improved by constraining spatial structure at both global and local scales. An annealing scheme is applied to progressively decrease the strength of the regularization and to achieve the maximum overlap. Both of the aforementioned processes are incorporated in the EM algorithm, a unified optimization framework. We test the performance of GL-CATE on contour registration, sequence images, real images, medical images, fingerprint images and remote sensing images, and compare it with eight state-of-the-art methods; our method performs favorably in most scenarios. | 我们提出了一种具有全局-局部对应和变换估计(GL-CATE)的新的点集配准方法。通过将点集的几何结构利用全局特征(点对点欧氏距离)和局部特征(基于椭圆高斯软计数策略生成的直方图的形状距离(SD))相结合,来实现点集的对应。通过使用双向确定性退火方案直接控制两个特征的搜索范围,构建混合特征高斯混合模型(MGMM)来恢复点集的对应关系。制定了一种新的基于向量的结构约束项来规范转换。通过在全局和局部尺度上约束空间结构,提高了转换更新的准确性。应用一个退火方案逐渐降低正则化的强度,实现最大重叠。这两个过程都纳入了EM算法,这是一个统一的优化框架。我们在轮廓配准、序列图像、真实图像、医学图像、指纹图像和遥感图像中测试了我们的GL-CATE的性能,并与八种最先进的方法进行比较,结果显示我们的方法在大多数场景中表现良好。 | link |
| 2017 | Curriculum Domain Adaptation for Semantic Segmentation of Urban Scenes | Yang Zhang, Philip David, Boqing Gong | During the last half decade, convolutional neural networks (CNNs) have triumphed over semantic segmentation, which is a core task of various emerging industrial applications such as autonomous driving and medical imaging. However, training CNNs requires a huge amount of data, which is difficult to collect and laborious to annotate. Recent advances in computer graphics make it possible to train CNN models on photo-realistic synthetic data with computer-generated annotations. Despite this, the domain mismatch between the real images and the synthetic data significantly decreases the models' performance. Hence we propose a curriculum-style learning approach to minimize the domain gap in semantic segmentation. The curriculum domain adaptation solves easy tasks first in order to infer some necessary properties about the target domain; in particular, the first task is to learn global label distributions over images and local distributions over landmark superpixels. These are easy to estimate because images of urban traffic scenes have strong idiosyncrasies (e.g., the size and spatial relations of buildings, streets, cars, etc.). We then train the segmentation network in such a way that the network predictions in the target domain follow those inferred properties. In experiments, our method significantly outperforms the baselines as well as the only known existing approach to the same problem. | 在过去五年中,卷积神经网络(CNNs)在语义分割领域取得了胜利,这是各种新兴工业应用的核心任务,如自动驾驶和医学成像。然而,训练CNNs需要大量数据,这些数据难以收集并且需要大量标注工作。最近计算机图形学的进展使得在具有计算机生成标注的逼真合成数据上训练CNN模型成为可能。尽管如此,真实图像与合成数据之间的域差异显著降低了模型的性能。因此,我们提出了一种课程式学习方法来最小化语义分割中的域差异。课程域自适应首先解决简单的任务,以推断目标域的一些必要属性;特别是,第一个任务是学习图像上的全局标签分布和地标超像素上的局部分布。这些易于估计,因为城市交通场景的图像具有强烈的特质(例如建筑物、街道、汽车等的大小和空间关系)。然后,我们以这种方式训练分割网络,使得网络在目标域中的预测遵循这些推断的属性。在实验中,我们的方法明显优于基线以及已知的解决同一问题的唯一方法。 | link |
| 2017 | A Geometric Framework for Statistical Analysis of Trajectories With Distinct Temporal Spans | Rudrasis Chakraborty, Vikas Singh, Nagesh Adluru, Baba C. Vemuri | Analyzing data representing multifarious trajectories is central to many fields in science and engineering; for example, trajectories representing a tennis serve, a gymnast's parallel bar routine, progression/remission of disease and so on. We present a novel geometric algorithm for performing statistical analysis of trajectories with distinct numbers of samples representing longitudinal (or temporal) data. A key feature of our proposal is that unlike existing schemes, our model is deployable in regimes where each participant provides a different number of acquisitions (trajectories have different numbers of sample points). To achieve this, we develop a novel method involving the parallel transport of the tangent vectors along each given trajectory to the starting point of the respective trajectories and then use the span of the matrix whose columns consist of these vectors, to construct a linear subspace in R^m. We then map these linear subspaces of R^m onto a single high dimensional hypersphere. This enables computing group statistics over trajectories by instead performing statistics on the hypersphere (equipped with a simpler geometry). Given a point on the hypersphere representing a trajectory, we also provide a "reverse mapping" algorithm to uniquely (under certain assumptions) reconstruct the subspace that corresponds to this point. Finally, by using existing algorithms for recursive Frechet mean and exact principal geodesic analysis on the hypersphere, we present several experiments on synthetic and real (vision and medical) data sets showing how group testing on such diversely sampled longitudinal data is possible by analyzing the reconstructed data in the subspace spanned by the first few PGs. | 分析代表多种轨迹的数据在科学和工程的许多领域中至关重要;例如,代表网球发球、体操双杠例行动作、疾病进展/缓解等的轨迹。我们提出了一种新颖的几何算法,用于对具有不同采样数的轨迹进行统计分析,这些轨迹代表着纵向(或时间)数据。我们提案的一个关键特点是,与现有方案不同的是,我们的模型可在每个参与者提供不同数量的采集时使用(轨迹具有不同数量的采样点)。为实现这一目标,我们开发了一种新颖的方法,涉及沿着每个给定轨迹的切向量的平行传输到各自轨迹的起点,然后使用由这些向量组成的矩阵的张成空间,构建R^m中的线性子空间。然后,我们将R^m中的这些线性子空间映射到单个高维超球面上。这使得能够通过在超球面上执行统计学而不是在轨迹上执行统计学来计算轨迹的群统计信息(具有更简单的几何形状)。鉴于超球面上表示轨迹的一个点,我们还提供了一个“逆映射”算法,以唯一地(在某些假设下)重构与该点对应的子空间。最后,通过在超球面上使用现有的递归Frechet均值和精确主测地分析算法,我们展示了在合成和真实(视觉和医学)数据集上进行实验的多种可能性,展示了通过在由前几个主测地张成的子空间中分析重建数据来进行对这种多样采样的纵向数据的群体测试是可能的。 | link |
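
The RotEqNet abstract above describes applying each filter at multiple orientations and keeping, per spatial location, the magnitude and angle of the highest scoring orientation. The following is only a minimal NumPy sketch of that pooling step, not the authors' implementation. It makes several assumptions for illustration: only four orientations are used (multiples of 90 degrees via `np.rot90`, so no interpolation is needed), the magnitude is clamped at zero, and the names `correlate_valid` and `orientation_pool` are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def correlate_valid(image, filt):
    """'Valid' 2D cross-correlation of a single-channel image with one filter."""
    windows = sliding_window_view(image, filt.shape)  # (H-k+1, W-k+1, k, k)
    return np.einsum("ijkl,kl->ij", windows, filt)


def orientation_pool(image, filt):
    """Keep the best-scoring filter orientation at every location.

    Assumption: only 0/90/180/270 degree rotations of a square filter are
    tried; finer angles would require interpolating the rotated filter.
    Returns (magnitude, angle in radians, vector field of shape (2, H', W')).
    """
    assert filt.shape[0] == filt.shape[1], "square filter expected"
    angles = np.deg2rad([0.0, 90.0, 180.0, 270.0])
    # One response map per orientation: shape (4, H', W').
    responses = np.stack(
        [correlate_valid(image, np.rot90(filt, k)) for k in range(4)]
    )
    best = responses.argmax(axis=0)          # index of the winning orientation
    # Assumption: negative responses are clamped so the magnitude is >= 0.
    magnitude = np.maximum(responses.max(axis=0), 0.0)
    angle = angles[best]
    # Encode the result as a 2D vector field (u, v) per location.
    field = np.stack([magnitude * np.cos(angle), magnitude * np.sin(angle)])
    return magnitude, angle, field


if __name__ == "__main__":
    image = np.zeros((5, 5))
    image[:2, :] = 1.0                       # bright band on top
    edge = np.array([[1, 1, 1], [0, 0, 0], [-1, -1, -1]], dtype=float)
    mag, ang, field = orientation_pool(image, edge)
    print(mag)
    print(np.rad2deg(ang))
```

Rotating the input by 90 degrees shifts the winning angle by 90 degrees while leaving the peak magnitude unchanged, which is the equivariant behavior the abstract refers to, here restricted to the four sampled orientations.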