Skip to content

Google Summer of Code: 2018

Stefan Seefeld edited this page Jan 12, 2019 · 2 revisions

For a general overview of Google Summer of Code (GSoC) and Boost's related activities, please read GSoC Overview

This year's GSoC will be housed at https://github.com/BoostGSoC18 Students may find examining past GSoC source code and commit histories of use.

Suggested GSoC project proposals

To any potential mentors, please add your projects here

1. Boost.uBLAS: linear algebra and matrix computations

Potential mentor: David Bellot, Mohd Sharique

All projects with Boost.uBLAS requires knowledge of C++11/14/17.

Background

uBLAS is a library for linear algebra and matrix computations. Using recursive templates, it allows the compiler to optimize any complex linear algebra expressions as if it were written by hand by the programmer. Basic classes are matrix and vector. The library has all the basic functionalities and a few standard algorithms. We would like to improve the functionality of this library by adding new algorithms and functionality especially in the field of data analysis and machine learning.

Project 1 : Add statistics and machine learning functions

Boost.uBLAS has many basic functions for linear algebra but it lacks common algorithms in statistics and machine learning. The goal of this project is to lay out the fundation of uBLAS to support the programming of Machine Learning algorithms. More specifically, we are interested in implementing some fundamental components like:

  • basic stats functions: mean, median, standard deviation, variance, covariance, correlation
  • many sort ofdistributions like histograms, etc...
  • PCA, ICA and LDA factorizations
  • running stats: moving average, moving variance and anything else
  • K-means clustering algorithms
  • Gaussian Mixture Models algorithms
  • anything else the student can propose and have time to implement during the summer

A knowledge of basic statistics and Machine Learning algorithm is required for this project, on top of excellent C++ skils of course

Project 2 : Add support for tensors

Boost.uBLAS can deal with vectors and matrices but not tensors. Not yet. We want this feature. It's a hard project, highly experimental but it is a welcome feature in such a library. This will open the doors for Boost.uBLAS to be used in applications like neural networks and deep learning. It's a fundamental feature in those domains.

The student needs to have good math skills on top of excellent C++ skills for this project.

Project 3 : Add Multicore and GPU computations to uBLAS

The project description is simple: add support of multicore parallel and GPU computations to uBlas ! The realization is not straightforward though. Boost.uBlas is CPU only. If the compilers is able to vectorize, uBlas can take benefit of it. Here we want to extend Boost to the support of parallel architecture and GPU computations to enable it to do big data or deep learning computations.

The student will have to first understand how ublas works and how it generates and optimizes code with the expression template mechanism and then start adding options to enable the use of Boost.Compute. Test will be done on multicore systems and graphics card or computers which support Boost.Compute (through OpenCL for example).

We expect to see the basic matrix operations to be implemented like this. The code will have to be thoroughly documented and a tutorial document provided. We prefer quality of the implementation to exhaustivity.

For exceptionally good and fast students, extensions to support other library will be considered, like nVidia CUDA for example.

In other words it will have to be clean and super fast !

Programming competency test

Implement a small library in C++ using expression templates and modern C++11/14/17 features like generic lambdas to:

  • represent a matrix of numerical types (int, long, float, double, complex,....)
  • compute algebraic expressions including + and *, += and *=. Matrix multiplication can be implemented with the most simple algorithm
  • fit in one header file
  • provides a test program which can measure its speed of execution for each examples provided

2. Boost.Geometry

Potential mentors: Vissarion Fysikopoulos, Adam Wulkiewicz

All projects requires knowledge of C++ template metaprogramming techniques.

Background

Boost.Geometry part of collection of the Boost C++ Libraries, defines concepts, primitives and algorithms for solving geometry problems.

See http://www.boost.org/libs/geometry

PROJECT 1. Similarity between geometries

Implement algorithms for computing similarity between geometries. We start with linestrings and extend to rings or polygons depending on the student's performance. Standard measures are the Hausdorff distance, the Frechet distance and its variations---most notable the discrete Frechet distance.

The student will start by studying existing algorithms and implementations, implement a prototype with tests and benchmarks and finally integrate into Boost.Geometry.

PROJECT 2. Nearly antipodal points distance accuracy improvement

Compute accurately the distance between two nearly antipodal points on the ellipsoid of revolution. Currently Boost.Geometry does not handle this case correctly. A solution is proposed in [1] as a solution of a so called astroid problem. It is also implemented in C++ in [2].

PROJECT 3. R-tree serialization.

The goal is to implement serialization of Boost.Geometry R-tree with Boost.Serialization library. It should be done in a form of optional extension, i.e. Boost.Geometry shouldn't depend on Boost.Serialization by default. The example result could look like this:

#include <boost/archive/binary_oarchive.hpp>
#include <boost/geometry.hpp>
#include <boost/geometry/index/rtree.hpp>
#include <boost/geometry/serialization/index/rtree.hpp>

int main() {
    namespace bg = boost::geometry;
    namespace bgi = bg::index; 
    typedef bg::model::point<double, 2, bg::cs::cartesian> point_type;
    typedef bg::model::box<point_type> box_type;

    bgi::rtree<box_type, bgi::rstar<16> > rtree;

    // fill rtree with data

    std::ofstream file("serialized_rtree.bin",
                       std::ios::binary | std::ios::trunc);
    boost::archive::binary_oarchive archive(file);
    archive << rtree;
}

The project will require the student to understand parts of Boost.Geometry, in particular primitives and the internals of R-tree and the interface of Boost.Serialization library.

Programming competency test

As mentioned above you may choose to either provide links to existing library you developed or take the competency test. In case of the latter the requirements are listed below.

PROJECT 1. Implement an addition for Boost.Geometry that implements an algorithm that computes exactly the Frechet distance between two linestrings.

PROJECT 2. Implement a library that provides:

  1. a representation of points on Earth's surface
  2. a function to compute the shortest distance between two given points
  3. a function LinestringPointInterpolation(L,f) that takes as parameters a linestring L and a double f between 0 and 1 and returns a point p interpolated along L, where f is the fraction of the length of L the point p is located.

PROJECT 3 Implement an addition for Boost.Geometry that provides:

  1. serialization of Boost.Geometry models in arbitrary dimension:
    • model::point, model::point_xy,
    • model::box,
    • model::segment.
  2. handling of errors related to saving and loading of incompatible types

In all cases the code should be as generic as possible and fit in one header file.

Also provide main.cpp file that demonstrates the use of your code, i.e. (1) compute the Frechet distance between several linestrings (2) usage of distance function and comparison with distance functions in Boost.Geometry.

3. Boost.SafeFloat

Potential mentor: Damian Vicino

Float arithmetic operations in C++ are NOT guaranteed to yield a correct mathematical result. For instance, the overflow on addition operation may produce an infinite value when using float. SafeFloat proposes implementing a drop-in replacement for floating point numeric data types guaranteeing that, when used in expressions in a C++ program, no incorrect arithmetic results will be produced without developer awareness.

In addition to the arithmetic problems, some others can result in security concerns related to usage of floating point. In these regards, the CERT Division of Software Engineering Institute at Carnegie Mellon University publishes guidelines for Secure Programming when using FP datatypes [1].

A survey of arithmetic problems was written by David Goldberg in 1991 [2]. And the limitations of Intel architecture for implementing floating point are described in the IA-32 [3] and x87 [4] manuals.

Problems with floating-point can be categorised in:

  • Domain/Pole/Range errors
  • Approximation errors and narrowing on casting
  • Unexpected input (e.g. reading a NaN from a stream when expecting a number)
  • Ill defined (e.g. initialise or read as input “0.1”)
  • Unexpected operation results (e.g. undetected denormalisation, rounding, underflow, overflow to infinity or NaN operands)

Since C++11 standard [5], a mechanism is provided for access to the floating point environment, giving the possibility of tackle the problem efficiently in a platform independent way. Using the floating-point environment, most problems can be detected by introducing multiple lines of code after each operation for checking flags. Requiring the programmer to write down all these checks in every place a floating point operation is executed is error-prone (but would be required for safe execution of the math).

In general, the problems mentioned lead to obscure and hard to detect bugs for the non-expert in numerical methods. However, some algorithms are designed taking in account approximations and errors to happen and could be silently discarded. Thus, our approach in safe-float is allowing the declaration of concerns to be considered for each declared variable (e.g. we can define safe_float<float, check_division_by_zero> which only checks for division by zero problems).

PROJECT 1. Bringing Boost.SafeFloat to review-ready state.

During GSoC 2015, safe-float was partially implemented exploring different designs. Now, with a stable design, we want to focus in finishing up the details to start the review process.

  • Implement new policies for cast and error handling of the data type.
  • Implement SafeFloat literals for IEEE754 standard compliance.
  • Extend the test suite.
  • Define Concepts were required.
  • Check correct exception guarantees are implemented.
  • Extend current documentation [6].

Programming competency test

  • Define a user defined literal assignable to float that fails compilation when the value provided cannot be expressed as an positive integer power of 0.5 (e.g. 0.5, 0.25, 0.125).
  • Provide a function receiving an integer (X) and a tuple (std::tuple<type1, type2, type3…>) into a tuple of vectors (std::tuple<std::vector,std::vector, std::vector, …> where each vector has X elements of each originally received in each tuple_element. E.g. for X=2 and the tuple {1, 1.0, ‘a’} , the result type is std::tuple<std::vector, std::vector, std::vector> and the values are: {{1, 1},{1.0, 1.0},{‘a’, ‘a’}}
  • Provide a template function “get_vector”, similar to std::get, that given the type X as parameter, returns editable access to the vector of type X in a tuple of vectors.

References

[1] SEI CERT. The CERT C++ secure coding standard. Pearson Education, 2008.

[2] Goldberg, David. "What every computer scientist should know about floating-point arithmetic." ACM Computing Surveys (CSUR) 23.1 (1991): 5-48.

[3] Intel® 64 and IA-32 Architectures Software Developer’s Manual - Volume 2 (specially sections about FP flags in FADD, FSUB, FMUL and FDIV)

[4] x87 and SSE Floating Point Assists in IA-32: Flush-To-Zero (FTZ) and Denormals-Are-Zero (DAZ)

[5] ​http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf

[6] ​http://sdavtaker.github.io/safefloat/doc/html/index.html

4. New Astronomy Library

Potential mentor: Vinícius dos Santos Oliveira

Background

Even after so many years, C++ is not having a potential astronomical library which can cover major topics of astronomy. With this project, boost community can help astronomers to make complex astronomical calculations faster using C++. This library will have functionality aimed at professional and amateur astronomers, astrophysicists, but may be useful to anyone developing astronomy software. This library will help astronomers to analyse and gather information from observation data.

GSoC Project Proposal

This library will provide some fundamental class required in astronomy.

  • Basic astronomical functions
  • Classes for celestial coordinate system
  • Classes for light curves of variable stars
  • Documentation

Expansion of the library

Functionalities mentioned below can be implemented

  • Various functionality for spectroscopy.
  • Read and plot catalogs from text files.
  • Astrometry
  • Photometry

5. Enhancing Boost.Intrusive Library

Potential Mentor: Rishabh Arora E-mail: rishabharora1997@gmail.com

Background

This project aims at enhancing the Boost.Intrusive library, by adding complex data structures/containers/algorithms such as Segment Tree, Fenwick Tree, Suffix-Tree and Suffix-Automata, a generic Graph data structure (other suggestions are welcome) and various algorithms atop it like Heavy-Light Decomposition, Centroid Decomposition etc. The project involves implementing the above in accordance with the Boost.Intrusive codebase and building APIs on top of it, solving several different practical use-cases efficiently and writing rigorous test suites for them.

Programming Competency Test:

Implement a small C++ library (a mini version of the proposed project) using C++ 11/14/17. The details of the library are as follows:

  • Implement a generic Segment Tree (where merging at each node should cover a general case and not specific to certain operations)
  • Use of expression templates & generic lambda is expected
  • Fit in a single header file.
  • Test programs for ensuring accuracy and performance benchmark tests against various type of solutions (brute-force, recursive, iterative)

6. Modernize Boost.Python

Potential Mentor: Stefan Seefeld

Background

the Boost.Python project is one of the oldest Boost components. Since its initial development there have been important additions to C++ as well as Python. That, unfortunately, isn't reflected in Boost.Python, which hasn't received as much attention in recent years as it should. Aa central goal of a GSoC Boost.Python project might thus be a modernization of the Boost.Python codebase, including additional support for newer language (and standard library) features, as well as work on open issues.

Programming Competency Test:

Write code to add C++ bindings to a previously unsupported Python built-in type (e.g., set), including a simple test for it.

GSoC Projects which have been prearranged by students with mentors for 2018

1. Bringing Boost.StaticViews to review-ready state.

Potential mentors: Darren Garvey

Background

During last year's GSoC a new library was developed -- Boost.StaticViews. It is a small standalone library to ease the working with homogeneous constexpr data. It fills the niche between Boost.Hana (Hana works on hetero-geneous data) and the Ranges TS (Ranges focus on run-time).

The library consists of two layers: low-level views and high-level data data structures implemented on top of them. You use views if you want to, say, convert a bitmap from 8-bit to 24-bit. Previously, one would write a Python script, convert the data, and copy-paste the results into the code. Now, we can let the compiler do the job for us which greatly simplifies the maintenance. As for the data structures, StaticViews currently implements a compile-time non-owning hash table (i.e. static_map as described here).

Consider, for example, the recent active discussion of std::error_code on the Boost-dev mailing list. Creating your own error_category is all about mapping your custom error codes to some "common" ones (i.e. std::errc). When the number of error codes becomes large, using switch becomes unfeasable, and that's when static_map comes into play. For an example of usage of StaticViews in such a setting, see Niall Douglas' ntkernel-error-category.

GSoC project proposal

Although the last GSoC lay some good foundation, some work still needs to be done before StaticViews may be considered for inclusion into Boost. As such, at least the following needs to be done:

  • Write benchmarks that prove static_map to be faster than std::unordered_map.
  • Finish the documentation.
  • Add more tests.
  • Add a few more algorithms for working with views.

Potential project extensions

  • Currently, GCC is unable to compile some of StaticViews' code due to this bug. It would be nice to fix this as it would greatly improve the portability.
  • Seeing as Boost is in the process of transitioning to CMake, adding support for it to StaticViews seems like a good idea.
  • Adding more data structures like a static_multimap, static_set, static_multiset etc.

2. Boost.Real

Potential mentor: Damian Vicino

Background

In 1936, Alan Turing introduced computable real numbers in his foundational paper ‘On Computable Numbers, with an Application to the Entscheidungsproblem’ [1]. He defines: ‘The computable numbers may be described briefly as the real numbers whose expressions as a decimal are calculable by finite means’. Different authors provided several equivalent definitions, for example, an alternative equivalent definition is 'a number is computable if there exists an algorithm to produce each digit of its decimal expansion and for any digit requested it finishes the execution successfully' [2].

Around these ideas, a new area of applied mathematics was developed based on the theory of computable real numbers, called Computable Calculus [2]. In Computable Calculus, it is common to represent numbers using algorithms that can generate their digits and, define arithmetic functions that lazily operate them [2].

Using this approach is possible to calculate solutions for arithmetic problems that require accurate results. For example, in the construction of generic Discrete Event Simulation timelines [3].

PROJECT 1. Boost.Real basic operators

Following the algorithms defined in Chapter 2 of [2] implement the following operations between Reals, and Between Reals and Integers:

  • operator+
  • operator-
  • operator*
  • operator/
  • operator!=
  • operator<
  • operator=

Programming competency test

  • Write the Real type interface
  • Explain how to deal with real problems like identity equality, pi==pi2.
  • Explain how could you provide a "Small String Optimization"-like approach for Real numbers representation at compile time.

References

[1] Alan Mathison Turing. On computable numbers, with an application to the Entscheidungsproblem. J. of Math, 58(345-363):5, 1936.

[2] Oliver Aberth. Computable Calculus. Academic Press, 2001.

[3] Vicino, Damián, Olivier Dalle, and Gabriel Wainer. An advanced data type with irrational numbers to implement time in DEVS simulators. Proceedings of the Symposium on Theory of Modeling & Simulation. Society for Computer Simulation International, 2016.

Clone this wiki locally