
Google Summer of Code: 2019


For a general overview of Google Summer of Code (GSoC) and Boost's related activities, please read GSoC Overview

This year's GSoC will be housed at https://github.com/BoostGSoC19. Students may find it useful to examine past GSoC source code and commit histories.

Suggested GSoC project proposals

Potential mentors: please add your projects here.

Boost.GIL: Generic Image Library

Potential mentors: Stefan Seefeld, Mateusz Loskot

All projects with Boost.GIL require knowledge of C++11 and at least a basic understanding of image processing algorithms.

Project 1 :

Boost.GIL provides the core features for representing images, but it currently lacks many of the mainstream image processing algorithms that are used regularly. This project proposes implementing a selection of such algorithms for Boost.GIL, building on the existing features of the library or its extensions. More details can be found here.

Project 2 :

Programming competency test

Using the latest Boost.GIL, implement a simple convolution filter and demonstrate its use in a test application that detects edges in a grayscale input image.

Alternatively, choose a simple algorithm from any topic of image processing, preferably related to your interests regarding the GSoC project, and propose its implementation using the latest Boost.GIL.
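
For orientation, here is a minimal sketch of what the first option could look like: a synthetic grayscale image built in memory and convolved by hand with a 3x3 Laplacian kernel. The kernel choice, image size and the umbrella header <boost/gil.hpp> are illustrative assumptions, not part of the task statement.

    #include <boost/gil.hpp>
    #include <algorithm>
    #include <cstdlib>

    // Hedged sketch: a hand-rolled 3x3 convolution over a grayscale view,
    // using a Laplacian kernel as a crude edge detector.
    int main()
    {
        namespace gil = boost::gil;

        gil::gray8_image_t src(64, 64), dst(64, 64);
        auto sv = gil::view(src);
        auto dv = gil::view(dst);

        // Fill the source with a synthetic pattern (a bright square).
        for (int y = 0; y < 64; ++y)
            for (int x = 0; x < 64; ++x)
                sv(x, y)[0] = (x > 16 && x < 48 && y > 16 && y < 48) ? 200 : 20;

        // 3x3 Laplacian kernel.
        const int k[3][3] = { {0, 1, 0}, {1, -4, 1}, {0, 1, 0} };

        for (int y = 1; y < 63; ++y)
            for (int x = 1; x < 63; ++x)
            {
                int acc = 0;
                for (int j = -1; j <= 1; ++j)
                    for (int i = -1; i <= 1; ++i)
                        acc += k[j + 1][i + 1] * sv(x + i, y + j)[0];
                dv(x, y)[0] = static_cast<unsigned char>(std::min(std::abs(acc), 255));
            }
        return 0;
    }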

Boost.Python: C++ - Python integration

Potential mentor: Stefan Seefeld

All projects with Boost.Python require knowledge of Python and C++ (C++11 or later).

Project 1 :

Project 2 :

Programming competency test

Implement a C++ interface for Python's set type and demonstrate its use in a Boost.Python test.
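
Boost.Python ships wrappers for dict, list and tuple but not for set, which is what makes this exercise interesting. A minimal sketch of one possible direction, using an embedded interpreter, is shown below; the class py_set and its methods are hypothetical, not an existing Boost.Python API.

    #include <boost/python.hpp>
    #include <iostream>

    namespace py = boost::python;

    // Hedged sketch: a thin C++ wrapper around a generic py::object that
    // holds a Python set obtained from the interpreter.
    class py_set
    {
    public:
        // "builtins" is the Python 3 module name; use "__builtin__" for Python 2.
        py_set() : set_(py::import("builtins").attr("set")()) {}

        void add(py::object const& item)      { set_.attr("add")(item); }
        bool contains(py::object const& item) { return py::extract<bool>(set_.attr("__contains__")(item)); }
        std::size_t size()                    { return py::len(set_); }

    private:
        py::object set_;
    };

    int main()
    {
        Py_Initialize();                       // embed the interpreter for this demo
        py_set s;
        s.add(py::object(1));
        s.add(py::object(1));                  // duplicates are ignored by a set
        std::cout << s.size() << " " << s.contains(py::object(1)) << "\n"; // prints: 1 1
        return 0;
    }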

Boost.uBLAS: Linear and Multilinear Algebra with Matrix and Tensor Computations

Potential mentors: Cem Bassoy, Stefan Seefeld, David Bellot

All projects with Boost.uBLAS require knowledge of C++11/14/17.

Background

uBLAS has been designed to support users with matrix computations for linear algebra applications. Its design and implementation unify mathematical notation via operator overloading with efficient code generation via expression templates. We would like to improve this library by adding new algorithms and features, especially in the field of data analysis and machine learning using matrices and tensors. The main focus of the following proposed projects is to extend and modify the new tensor extension with C++14 and C++17 language features in order to support algorithms for, e.g., machine learning and quantum computing applications. A very good introduction is given by Tamara Kolda on YouTube.

The tensor extension is a result of a GSoC 2018 project, see Github and GSoC Archive. The following project proposals shall complete the GSoC 2018 project by adding core features to the tensor library, following the C++ Core Guidelines and using the Clang-Tidy tool. Students should also carefully read the GSoC student guide. Students may adjust our proposals and, if they want, present and discuss their own ideas with their potential mentors.

Project 1 : Extending and modifying boost::numeric::ublas::tensor

We want boost::numeric::ublas::tensor to support a static order (number of dimensions) and, eventually, static dimensions as well. To include these as template parameters, the tensor framework needs to be generalized to support different sets of template parameters. We suggest a policy-based design with engines that decide, e.g., how memory is allocated, how elements are accessed, and which devices (CPU, GPU, etc.) are used. For further information on policy-based design, please have a look at Wikipedia and YouTube.
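
As a rough illustration of the kind of policy-based design meant here (all names below, such as host_engine and first_order_layout, are hypothetical and not part of the existing tensor extension), an engine policy could bundle storage and access decisions while a layout policy maps multi-indices to linear offsets:

    #include <cstddef>
    #include <memory>
    #include <vector>

    // Hypothetical engine policy: owns the storage and defines element access.
    template <class T, class Allocator = std::allocator<T>>
    struct host_engine
    {
        using value_type   = T;
        using storage_type = std::vector<T, Allocator>;

        storage_type data;

        explicit host_engine(std::size_t n) : data(n) {}
        T&       at(std::size_t i)       { return data[i]; }
        T const& at(std::size_t i) const { return data[i]; }
    };

    // Hypothetical layout policy: maps a multi-index to a linear offset (first-order).
    struct first_order_layout
    {
        template <class Extents, class Index>
        static std::size_t offset(Extents const& n, Index const& i)
        {
            std::size_t off = 0, stride = 1;
            for (std::size_t r = 0; r < n.size(); ++r) { off += stride * i[r]; stride *= n[r]; }
            return off;
        }
    };

    // A tensor templated on its policies; the order stays dynamic in this sketch.
    template <class T, class Engine = host_engine<T>, class Layout = first_order_layout>
    class tensor
    {
    public:
        explicit tensor(std::vector<std::size_t> extents)
            : extents_(std::move(extents)), engine_(size(extents_)) {}

        T& operator()(std::vector<std::size_t> const& idx)
        { return engine_.at(Layout::offset(extents_, idx)); }

    private:
        static std::size_t size(std::vector<std::size_t> const& n)
        { std::size_t s = 1; for (auto e : n) s *= e; return s; }

        std::vector<std::size_t> extents_;
        Engine engine_;
    };

A caller would then write, e.g., tensor<double> t({3, 4, 5}); t({0, 1, 2}) = 1.0; and could swap engines or layouts without touching that code.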

Project 2 : Adding new numerical linear and multilinear algebra algorithms

We also want boost::numeric::ublas::tensor to support fundamental tensor operations for large-scale data analysis and basic tensor decomposition algorithms. The current implementation only supports conventional tensor contraction operations such as the tensor-times-tensor, tensor-times-matrix and tensor-times-vector multiplications. However, we also need special products such as the Hadamard or Kronecker product in order to implement alternating least squares algorithms, higher-order singular value decomposition or higher-order power methods for, e.g., large-scale data analysis and quantum computing applications.
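
For the matrix case, uBLAS already provides element_prod for the Hadamard product; the Kronecker product is not provided. A naive, hedged sketch of it on dense ublas::matrix objects (the function name kron is our own) could look like:

    #include <boost/numeric/ublas/matrix.hpp>
    #include <cstddef>

    namespace ublas = boost::numeric::ublas;

    // Illustrative Kronecker product C = A (x) B for dense matrices:
    // C(i*p + k, j*q + l) = A(i, j) * B(k, l), where B is p-by-q.
    template <class T>
    ublas::matrix<T> kron(ublas::matrix<T> const& A, ublas::matrix<T> const& B)
    {
        ublas::matrix<T> C(A.size1() * B.size1(), A.size2() * B.size2());
        for (std::size_t i = 0; i < A.size1(); ++i)
            for (std::size_t j = 0; j < A.size2(); ++j)
                for (std::size_t k = 0; k < B.size1(); ++k)
                    for (std::size_t l = 0; l < B.size2(); ++l)
                        C(i * B.size1() + k, j * B.size2() + l) = A(i, j) * B(k, l);
        return C;
    }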

Project 3 : Modifying (smart) expression templates

We want boost::numeric::ublas::tensor to provide smart expression templates. Traditional expression templates were designed as an efficient performance optimization tool for C++ code. However, Klaus Iglberger showed in his work, arxiv, that traditional ETs do not automatically result in faster execution of the expression. Smart expression templates first analyze the expression tree and perform optimizations on it before executing it. uBLAS needs a smart expression template evaluator. Students can use the existing Boost.YAP or extend my initial implementation.

Project 4 : Non-negative matrix factorization

We want to implement non-negative matrix factorization in uBLAS. Non-negative matrix factorization (NMF for short) is a group of algorithms in multivariate analysis and linear algebra in which a matrix V is factorized into (usually) two matrices W and H, with the property that all three matrices have no negative elements.

This non-negativity makes the resulting matrices easier to inspect. Since the problem is not exactly solvable in general, it is commonly approximated numerically.

NMF algorithms are also a very important family of algorithms in machine learning because they have an inherent clustering property: they automatically cluster the columns of the input data.

We recommend that students first read about the topic. This project requires a good level of C++ as well as of linear algebra and mathematics.
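
As a baseline, the classic multiplicative-update scheme of Lee and Seung can be expressed with existing dense uBLAS operations. The sketch below is illustrative only: the function nmf and the epsilon regularisation are our own, and a real implementation would add convergence checks and support the tensor extension as well.

    #include <boost/numeric/ublas/matrix.hpp>
    #include <cstddef>

    namespace ublas = boost::numeric::ublas;

    // Illustrative multiplicative updates (Lee & Seung). V is m-by-n, W is m-by-k,
    // H is k-by-n; W and H must be pre-initialised with positive values by the
    // caller. eps guards against division by zero.
    template <class T>
    void nmf(ublas::matrix<T> const& V, ublas::matrix<T>& W, ublas::matrix<T>& H,
             std::size_t iterations, T eps = T(1e-9))
    {
        for (std::size_t it = 0; it < iterations; ++it)
        {
            // H <- H .* (W^T V) ./ (W^T W H + eps)
            ublas::matrix<T> WtV  = ublas::prod(ublas::trans(W), V);
            ublas::matrix<T> WtW  = ublas::prod(ublas::trans(W), W);
            ublas::matrix<T> WtWH = ublas::prod(WtW, H);
            H = ublas::element_prod(
                    H, ublas::element_div(WtV, WtWH + ublas::scalar_matrix<T>(WtWH.size1(), WtWH.size2(), eps)));

            // W <- W .* (V H^T) ./ (W H H^T + eps)
            ublas::matrix<T> VHt  = ublas::prod(V, ublas::trans(H));
            ublas::matrix<T> HHt  = ublas::prod(H, ublas::trans(H));
            ublas::matrix<T> WHHt = ublas::prod(W, HHt);
            W = ublas::element_prod(
                    W, ublas::element_div(VHt, WHHt + ublas::scalar_matrix<T>(WHHt.size1(), WHHt.size2(), eps)));
        }
    }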

Project 5 : Benchmarking uBLAS

We want to compare the performance of uBLAS in many configurations against other libraries, on several architectures, with several compiler optimization settings, and to do so in a systematic way. We want to see how uBLAS fares against Eigen, Armadillo, Intel MKL, OpenBLAS and so on. This project aims at putting in place a benchmark suite we can run again and again to test everything related to speed for most of the algorithms in uBLAS, provide a nicely presented report and, if possible, store the results. It will help us in the future to track every speed regression and to enhance the performance of uBLAS.
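
A hedged sketch of the smallest possible starting point, timing one dense matrix product with std::chrono, is given below; everything else (size sweeps, competing libraries, report format) is deliberately left open.

    #include <boost/numeric/ublas/matrix.hpp>
    #include <chrono>
    #include <cstddef>
    #include <iostream>

    namespace ublas = boost::numeric::ublas;

    // Time one kernel (dense matrix product) for one size; a real suite would
    // sweep sizes, kernels and libraries and emit a machine-readable report.
    int main()
    {
        const std::size_t n = 256;
        ublas::matrix<double> a = ublas::scalar_matrix<double>(n, n, 1.0);
        ublas::matrix<double> b = ublas::scalar_matrix<double>(n, n, 2.0);

        auto t0 = std::chrono::steady_clock::now();
        ublas::matrix<double> c = ublas::prod(a, b);
        auto t1 = std::chrono::steady_clock::now();

        std::chrono::duration<double> dt = t1 - t0;
        std::cout << "ublas prod " << n << "x" << n << ": " << dt.count() << " s, "
                  << "c(0,0) = " << c(0, 0) << "\n";  // print a value so the product is not optimized away
        return 0;
    }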

Programming competency test

Implement a small library in C++ using expression templates and modern C++11/14/17 features, such as generic lambdas, to (a minimal starting-point sketch appears after this list):

  • represent a matrix of numerical types (int, long, float, double, complex,....)
  • compute algebraic expressions including
    • elementwise addition, subtraction and scalar multiplication using the operators +, -, +=, -=, etc.
    • and the standard matrix multiplication, using the simplest algorithm, with an operator of your choice.
  • fit in one header file
  • provide a test program which measures the execution speed of each example provided
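
The following is a minimal sketch of the expression-template idea only, not a complete solution: a dense matrix plus a lazy elementwise-sum node. All names are our own; a submission would add the remaining operators, the matrix product and the timing program.

    #include <cstddef>
    #include <vector>

    template <class E>
    struct expr                               // CRTP base identifying expressions
    {
        const E& self() const { return static_cast<const E&>(*this); }
    };

    template <class T>
    class matrix : public expr<matrix<T>>
    {
    public:
        matrix(std::size_t r, std::size_t c, T v = T()) : r_(r), c_(c), d_(r * c, v) {}

        template <class E>
        matrix(const expr<E>& e)              // evaluate a lazy expression in one pass
            : r_(e.self().rows()), c_(e.self().cols()), d_(r_ * c_)
        {
            for (std::size_t i = 0; i < r_; ++i)
                for (std::size_t j = 0; j < c_; ++j)
                    (*this)(i, j) = e.self()(i, j);
        }

        std::size_t rows() const { return r_; }
        std::size_t cols() const { return c_; }
        T  operator()(std::size_t i, std::size_t j) const { return d_[i * c_ + j]; }
        T& operator()(std::size_t i, std::size_t j)       { return d_[i * c_ + j]; }

    private:
        std::size_t r_, c_;
        std::vector<T> d_;
    };

    template <class L, class R>
    struct sum_expr : expr<sum_expr<L, R>>    // lazy elementwise-sum node
    {
        const L& l; const R& r;
        sum_expr(const L& l, const R& r) : l(l), r(r) {}
        std::size_t rows() const { return l.rows(); }
        std::size_t cols() const { return l.cols(); }
        auto operator()(std::size_t i, std::size_t j) const { return l(i, j) + r(i, j); }
    };

    template <class L, class R>
    sum_expr<L, R> operator+(const expr<L>& l, const expr<R>& r)
    { return sum_expr<L, R>(l.self(), r.self()); }

    int main()
    {
        matrix<double> a(2, 2, 1.0), b(2, 2, 2.0);
        matrix<double> c = a + b + a;         // evaluated lazily, no intermediate matrices
        return c(0, 0) == 4.0 ? 0 : 1;
    }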

Boost.Astronomy

Potential mentor: Ruchika Joshi

Background

Development of this library started in GSoC 2018. The library aims to provide multiple features for astronomy in one place. At this point, it targets support for astronomical coordinate systems and FITS files; the base coordinate system as well as reading of a FITS file's primary data unit and image extensions have been developed.

Project 1 : Bringing Boost.Astronomy to review-ready state.

Integration of Boost.Geometry and Boost.Units is not yet supported, and without units, scientific calculations are not error-proof. As a result, the current coordinate systems cannot be used in large projects, as there is a high possibility of error. So this year, one task in this project will be to integrate Boost.Units with boost::astronomy::coordinate to provide a robust astronomical coordinate system, and to add new transformation classes for converting between coordinate systems.

Apart from this, completing the remaining FITS parser so that a complete file, including all types of extensions, can be read would be another task in this project.

In the end, the goal of this project is to make the library ready for its first release, with a fully functional coordinate system, FITS file handling and documentation.

Programming competency test

Implement a geometric vector class. Each coordinate must store its value together with its unit (use Boost.Units). Implement two functions, one for the sum of two vectors and one for the cross product. You may also use Boost.Geometry to implement these functions.
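
A minimal sketch of one possible direction is given below, using Boost.Units SI quantities so that the cross product naturally changes dimension from length to area. The type names vector3 and vector3_area and the free functions are illustrative only.

    #include <boost/units/quantity.hpp>
    #include <boost/units/systems/si/length.hpp>
    #include <boost/units/systems/si/area.hpp>

    namespace bu = boost::units;
    namespace si = boost::units::si;

    using length = bu::quantity<si::length>;   // metres
    using area   = bu::quantity<si::area>;     // square metres (cross product of lengths)

    // A 3D vector whose components carry SI length units.
    struct vector3      { length x, y, z; };
    struct vector3_area { area   x, y, z; };   // result type of the cross product

    vector3 operator+(vector3 const& a, vector3 const& b)
    {
        return { a.x + b.x, a.y + b.y, a.z + b.z };
    }

    vector3_area cross(vector3 const& a, vector3 const& b)
    {
        return { a.y * b.z - a.z * b.y,
                 a.z * b.x - a.x * b.z,
                 a.x * b.y - a.y * b.x };
    }

    int main()
    {
        vector3 a{ 1.0 * si::meter, 0.0 * si::meter, 0.0 * si::meter };
        vector3 b{ 0.0 * si::meter, 1.0 * si::meter, 0.0 * si::meter };
        vector3 s = a + b;                     // component-wise, stays in metres
        vector3_area c = cross(a, b);          // units become square metres
        return (c.z == 1.0 * si::meter * si::meter && s.x == a.x) ? 0 : 1;
    }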

Boost.SafeFloat

Potential mentor: Damian Vicino

Floating-point arithmetic operations in C++ are NOT guaranteed to yield a correct mathematical result. For instance, an addition may overflow and produce an infinite value when using float. SafeFloat proposes implementing a drop-in replacement for floating-point numeric data types guaranteeing that, when used in expressions in a C++ program, no incorrect arithmetic results will be produced without developer awareness.

In addition to the arithmetic problems, others can turn into security concerns related to the usage of floating point. In this regard, the CERT Division of the Software Engineering Institute at Carnegie Mellon University publishes guidelines for secure programming when using floating-point data types [1].

A survey of arithmetic problems was written by David Goldberg in 1991 [2], and the limitations of the Intel architecture for implementing floating point are described in the IA-32 [3] and x87 [4] manuals.

Problems with floating point can be categorised as:

  • Domain/Pole/Range errors
  • Approximation errors and narrowing on casting
  • Unexpected input (e.g. reading a NaN from a stream when expecting a number)
  • Ill-defined values (e.g. initialising with or reading the input “0.1”, which is not exactly representable)
  • Unexpected operation results (e.g. undetected denormalisation, rounding, underflow, overflow to infinity or NaN operands)

Since the C++11 standard [5], a mechanism is provided for access to the floating-point environment, giving the possibility of tackling the problem efficiently in a platform-independent way. Using the floating-point environment, most problems can be detected by introducing a few lines of code after each operation to check the status flags. Requiring the programmer to write all these checks in every place a floating-point operation is executed is error-prone (but would be required for safe execution of the math).
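
For concreteness, the per-operation flag checking described above might look like the following, using the C++11 <cfenv> facilities; this is exactly the boilerplate safe_float aims to hide.

    #include <cfenv>
    #include <iostream>
    #include <limits>

    // A fully conforming program also needs "#pragma STDC FENV_ACCESS ON",
    // which many compilers support only partially.
    int main()
    {
        std::feclearexcept(FE_ALL_EXCEPT);          // start from a clean state

        volatile float a = std::numeric_limits<float>::max();
        volatile float b = a + a;                   // overflows to +infinity

        if (std::fetestexcept(FE_OVERFLOW))         // check the flag right after the operation
            std::cout << "overflow detected, b = " << b << "\n";
        return 0;
    }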

In general, the problems mentioned lead to obscure, hard-to-detect bugs for anyone who is not an expert in numerical methods. However, some algorithms are designed taking into account that approximations and errors will happen, and such errors can be silently discarded. Thus, our approach in safe-float is to allow declaring the concerns to be checked for each declared variable (e.g. we can define safe_float<float, check_division_by_zero>, which only checks for division-by-zero problems).

PROJECT 1. Bringing Boost.SafeFloat to review-ready state.

During GSoC 2015, safe-float was partially implemented, exploring different designs. Now, with a stable design, we want to focus on finishing up the details to start the review process.

  • Implement new policies for casting and error handling of the data type.
  • Implement SafeFloat literals for IEEE754 standard compliance.
  • Extend the test suite.
  • Define concepts where required.
  • Check that the correct exception guarantees are implemented.
  • Extend current documentation [6].

Programming competency test

  • Define a user-defined literal assignable to float that fails compilation when the value provided cannot be expressed as a positive integer power of 0.5 (e.g. 0.5, 0.25, 0.125).
  • Provide a function receiving an integer (X) and a tuple (std::tuple<type1, type2, type3, …>) and returning a tuple of vectors (std::tuple<std::vector<type1>, std::vector<type2>, std::vector<type3>, …>) where each vector holds X copies of the corresponding tuple element. E.g. for X=2 and the tuple {1, 1.0, 'a'}, the result type is std::tuple<std::vector<int>, std::vector<double>, std::vector<char>> and the values are {{1, 1}, {1.0, 1.0}, {'a', 'a'}}.
  • Provide a template function “get_vector”, similar to std::get, that, given the type X as a parameter, returns editable access to the vector of elements of type X in a tuple of vectors (see the sketch after this list).
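
A hedged C++17 sketch of the second and third items follows; the names replicate and get_vector mirror the wording above, but any interface satisfying the description is acceptable.

    #include <cassert>
    #include <tuple>
    #include <type_traits>
    #include <vector>

    // Item 2: turn {e1, e2, ...} into {vector{e1 repeated x times}, vector{e2 ...}, ...}.
    template <class... Ts>
    auto replicate(std::size_t x, std::tuple<Ts...> const& t)
    {
        return std::apply(
            [x](auto const&... elems)
            { return std::make_tuple(std::vector<std::decay_t<decltype(elems)>>(x, elems)...); },
            t);
    }

    // Item 3: editable access to the vector holding elements of type X.
    template <class X, class... Vs>
    std::vector<X>& get_vector(std::tuple<Vs...>& tv)
    {
        return std::get<std::vector<X>>(tv);
    }

    int main()
    {
        auto tv = replicate(2, std::make_tuple(1, 1.0, 'a'));
        get_vector<char>(tv).push_back('b');     // mutate the char vector in place
        assert(get_vector<int>(tv).size() == 2 && get_vector<char>(tv).size() == 3);
        return 0;
    }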

References

[1] SEI CERT. The CERT C++ secure coding standard. Pearson Education, 2008.

[2] Goldberg, David. "What every computer scientist should know about floating-point arithmetic." ACM Computing Surveys (CSUR) 23.1 (1991): 5-48.

[3] Intel® 64 and IA-32 Architectures Software Developer’s Manual - Volume 2 (specially sections about FP flags in FADD, FSUB, FMUL and FDIV)

[4] x87 and SSE Floating Point Assists in IA-32: Flush-To-Zero (FTZ) and Denormals-Are-Zero (DAZ)

[5] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf

[6] http://sdavtaker.github.io/safefloat/doc/html/index.html

Boost.Real

Potential mentor: Damian Vicino

Background

In 1936, Alan Turing introduced computable real numbers in his foundational paper ‘On Computable Numbers, with an Application to the Entscheidungsproblem’ [1]. He defines: ‘The computable numbers may be described briefly as the real numbers whose expressions as a decimal are calculable by finite means’. Different authors have provided several equivalent definitions; for example, ‘a number is computable if there exists an algorithm to produce each digit of its decimal expansion and, for any digit requested, the algorithm finishes its execution successfully’ [2].

Around these ideas, a new area of applied mathematics, called Computable Calculus, was developed based on the theory of computable real numbers [2]. In Computable Calculus, it is common to represent numbers by algorithms that can generate their digits, and to define arithmetic functions that lazily operate on them [2].

Using this approach, it is possible to calculate solutions for arithmetic problems that require accurate results, for example in the construction of generic Discrete Event Simulation timelines [3].

PROJECT 1. Bringing Boost.Real to review-ready state.

During GSoC 2018, Real was partially implemented, exploring different designs. Now, with a stable design, we want to focus on finishing up the details to start the review process.

Current status is described here: https://medium.com/@laobelloli/boost-real-9e2dfbfbed5b

Programming competency test

  • Describe an optimal representation for Real number "digits".
  • Explain how you could provide a "Small String Optimization"-like approach for the Real number representation at compile time.

References

[1] Alan Mathison Turing. On computable numbers, with an application to the Entscheidungsproblem. J. of Math, 58(345-363):5, 1936.

[2] Oliver Aberth. Computable Calculus. Academic Press, 2001.

[3] Vicino, Damián, Olivier Dalle, and Gabriel Wainer. An advanced data type with irrational numbers to implement time in DEVS simulators. Proceedings of the Symposium on Theory of Modeling & Simulation. Society for Computer Simulation International, 2016.

Boost.Geometry: Generic Geometry Library

Potential mentors: Vissarion Fysikopoulos, Adam Wulkiewicz

All projects require knowledge of C++ template metaprogramming techniques.

Background

Boost.Geometry, part of the collection of the Boost C++ Libraries, defines concepts, primitives and algorithms for solving geometry problems.

See http://www.boost.org/libs/geometry

PROJECT 1.

Implement algorithms for the concave hull problem (e.g. from [1], [2], etc.). This corresponds to the PostGIS function ST_ConcaveHull: https://postgis.net/docs/manual-2.5/ST_ConcaveHull.html See also: http://boost-geometry.203548.n3.nabble.com/concave-hull-td4026717.html

PROJECT 2.

Implement support for non-cartesian geometries in convex_hull(), either by adapting the existing algorithm and strategies (preferable) or by implementing it as a different algorithm. See: https://en.wikipedia.org/wiki/Convex_hull_algorithms

PROJECT 3.

Implement algorithms for computing the Delaunay triangulation and/or the Voronoi diagram of a set of points. This corresponds to the PostGIS function ST_DelaunayTriangles: https://postgis.net/docs/manual-2.5/ST_DelaunayTriangles.html

PROJECT 4.

Implement algorithms for random point generation (according to some given distribution) in a geometry. See, for example, https://postgis.net/docs/manual-2.5/ST_GeneratePoints.html

Programming competency test

As mentioned above, you may choose either to provide links to an existing library you have developed or to take the competency test. In the latter case, the requirements are listed below.

PROJECT 1, 2. Implement and test the “gift wrapping” convex hull algorithm for types adapted to the Boost.Geometry MultiPoint concept.
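
A hedged sketch of gift wrapping (Jarvis march) over the multi_point model is shown below. The helper gift_wrap and the double-precision cross-product orientation test are simplifications of our own; a real submission should work for any type adapted to the MultiPoint concept and handle degenerate inputs (collinear or duplicate points), which are ignored here.

    #include <boost/geometry.hpp>
    #include <boost/geometry/geometries/point_xy.hpp>
    #include <boost/geometry/geometries/multi_point.hpp>
    #include <vector>

    namespace bg = boost::geometry;
    using point_t  = bg::model::d2::point_xy<double>;
    using mpoint_t = bg::model::multi_point<point_t>;

    std::vector<point_t> gift_wrap(mpoint_t const& pts)
    {
        std::vector<point_t> hull;
        if (pts.size() < 3) return hull;

        // Start from the leftmost point, which is always on the hull.
        std::size_t start = 0;
        for (std::size_t i = 1; i < pts.size(); ++i)
            if (bg::get<0>(pts[i]) < bg::get<0>(pts[start])) start = i;

        std::size_t p = start;
        do
        {
            hull.push_back(pts[p]);
            std::size_t q = (p + 1) % pts.size();
            for (std::size_t r = 0; r < pts.size(); ++r)
            {
                // If r lies to the left of the segment p->q, it becomes the new candidate.
                double cross = (bg::get<0>(pts[q]) - bg::get<0>(pts[p])) * (bg::get<1>(pts[r]) - bg::get<1>(pts[p]))
                             - (bg::get<1>(pts[q]) - bg::get<1>(pts[p])) * (bg::get<0>(pts[r]) - bg::get<0>(pts[p]));
                if (cross > 0) q = r;
            }
            p = q;
        } while (p != start);

        return hull;
    }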

PROJECT 3. Implement and test an algorithm that computes a triangulation (not necessarily the Delaunay triangulation) of a set of points.

PROJECT 4. Implement and test an algorithm that generates random points on a convex polygon.

References

[1] Moreira, Adriano J. C. and Maribel Yasmina Santos. “Concave hull: A k-nearest neighbours approach for the computation of the region occupied by a set of points.” GRAPP (2007).

[2] https://www.iis.sinica.edu.tw/page/jise/2012/201205_10.pdf
