
Google Summer of Code: 2021


For a general overview of Google Summer of Code (GSoC) and Boost's related activities, please read the GSoC Overview.

This year's GSoC will be housed at a dedicated GitHub organization (TBD: link for 2021, when available); see https://github.com/BoostGSoC20 for last year's organization. Students may find it useful to examine past GSoC source code and commit histories.

Suggested GSoC project proposals

To any potential mentors: please add your projects here.

To any potential students: in addition to the list of project ideas, we encourage original ideas proposed by students. If you have one, please post your idea to the mailing list and ask mentors for comments. See Finding the Right Project in the Google Summer of Code Guides.

Boost.uBlas: Matrix and Tensor Computations

Potential mentor: David Bellot, Cem Bassoy

Requirements

  • hands on experience with modern C++ template metaprogramming techniques
  • basic knowledge in matrix and tensor computations
  • strong mathematical background

Background

Boost.uBlas was initially designed to support users with matrix computations for linear algebra applications. The design and implementation unify mathematical notation via operator overloading and efficient code generation using expression templates. Starting with the GSoC 2018 project, uBlas has been extended by a flexible tensor data type and basic tensor operations supporting general tensor contractions and the Einstein notation. The goal of the new tensor extension is to support the implementation of algorithms for, e.g., machine learning and quantum computing applications. A very good introduction is given by Tamara Kolda on YouTube. GSoC students generalized and extended the tensor type in GSoC 2019. Last year, GSoC students added GitHub Actions as well as Clang analyzers and sanitizers as part of GSoC 2020.

We propose the following projects from our project list that will further increase Boost.uBlas's capabilities:

Project 1: Finalize and improve the subtensor type

Potential mentor: Cem Bassoy, David Bellot, (Amit Singh)

Subtensors, i.e. views of tensors, are used in many numerical algorithms in which specified array sections are accessed. Subtensors, however, do not own the tensor data but refer to it. An extension for subtensors is already available in the subtensor feature branch, see subtensor.hpp. A simple example of tensor and subtensor creation and usage:

using tensor = boost::numeric::ublas::dynamic_tensor<float>;
using slice  = boost::numeric::ublas::sliced_span;
using end    = boost::numeric::ublas::span::end;

// Create tensor t of order 3 and dimensions 4,3 and 5
auto t   = tensor{4,3,5};

// Create subtensor ts where ts references the domain specified by the slices
// Subtensor ts has the same order 3 but the dimensions 3,2,1
// Tensor elements are not copied
auto ts  = t ( slice(1,3), slice(1,end), end-1 ); 

// Create a copy tss of subtensor ts
// Both instances have the same order and dimensions
// Tensor elements are not copied
auto tss = ts;

The student shall improve the existing subtensor implementation and, during GSoC, create multiple pull requests with well-tested code. A possible improvement is the use of a separate tag such as ublas::subtensor_tag in order to simplify the overall design and usage. The student should also experiment with the auxiliary types and functions to enhance maintainability. This is an important and challenging project. We expect the student to be motivated and to have a strong background in mathematics and algorithm design using generic programming techniques with C++17.

Project 2: Create new matrix and vector types

Potential mentor: David Bellot, Cem Bassoy, (Amit Singh)

uBlas was written many years ago against the C++03 standard, using older template metaprogramming techniques. With the new uBlas tensor extension, we want to simplify the existing ublas::matrix and ublas::vector templates by using the tensor extension as a base implementation.

We expect the student to implement experimental ublas::experimental::matrix and ublas::experimental::vector templates by specializing the ublas::tensor_core declared in tensor_core.hpp and to implement the existing matrix and vector operations using the C++17 standard, see operation overview. Depending on the progress of the first project, we want the student to also implement submatrix and subvector types and to provide a simple QR decomposition. We expect the student to be motivated and to have a strong background in mathematics and algorithm design using generic programming techniques with C++17.
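
The following self-contained sketch is purely illustrative of the design idea (a matrix is an order-2 tensor, a vector an order-1 tensor); the class and alias names are placeholders and not the actual uBlas API:

#include <array>
#include <cstddef>
#include <vector>

// Minimal stand-in for a tensor core: dense storage plus an extents array.
template <class T, std::size_t Order>
class toy_tensor
{
public:
    explicit toy_tensor(std::array<std::size_t, Order> extents)
        : extents_{extents}
    {
        std::size_t n = 1;
        for (auto e : extents_) n *= e;
        data_.resize(n);
    }
    // Elementwise operations, contraction, etc. would be implemented once here.
private:
    std::array<std::size_t, Order> extents_;
    std::vector<T> data_;
};

// Matrix and vector become thin, rank-fixed aliases of the same machinery,
// which is the role ublas::experimental::matrix/vector would play on top of
// ublas::tensor_core.
template <class T> using toy_matrix = toy_tensor<T, 2>;
template <class T> using toy_vector = toy_tensor<T, 1>;

int main()
{
    toy_matrix<float> A({3, 2});
    toy_vector<float> x({2});
    (void)A; (void)x;
}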

Project 3: Finalize Dataframe Implementation

Potential mentor: David Bellot

In the field of data science, R, Julia and Python are presumably the most used languages to perform data analysis. They all come with many desirable features. In GSoC 2019, a successful student implemented data frames for uBlas. We want to take his project to another level and:

  1. finalize and productionize his implementation of data frames for an immediate release in the next version of Boost
  2. add a collection of free-form functions and modern data analysis procedures (as found in the R tidyverse, in Python pandas/numpy, and in many other packages) based on the use of vectors and matrices

The goal of this project is to lay the foundation of uBlas as a new data science library and to see production-ready code published by the end of the summer of code.

The student will first have to study the code base of uBlas and then build on top of the existing API. We are in favor of promoting a functional programming style in order to get all the benefits of lazy evaluation and to optimize the code as much as possible. Documentation must also be written for the project to be successful and production-ready by the end of the summer.

Competency test

  • Fork uBlas and use the develop branch for your implementation
  • Create a matrix folder in
    • include/boost/numeric/ublas for your implementation
    • test for the unit-tests
    • examples for the QR decomposition example
  • Provide a new matrix and vector C++17 implementation with functionality similar to the following (a rough interface sketch is shown after the Matlab code below):
    • A = zeros(3,2);
    • C = A+B, C = 2*A, etc. for elementwise matrix operations
    • C = A*B for matrix multiplication
    • C = A' for matrix transposition
    • A==B for elementwise comparison
    • c = A[3] for accessing elements with a single zero-based index
    • c = A(3,2) or A(3,5) = c for accessing elements with two zero-based indices
    • C = A(1:2,1:3) to generate a matrix instance that contains data of A referenced by the ranges 1:2 and 1:3
  • Use the C++ standard library for your matrix implementation wherever possible (e.g. use std::tuple and std::tie)
  • Implement auxiliary types such as range/span and helper functions such as zeros, size, norm to offer maximum readability
  • Modify the README.md for Github Actions usage
  • Implement a C++ qr function that QR-decomposes a matrix with a minimal number of code lines, based on the following Matlab code
  • Implement a C++ main function showing with A == Q*R that your QR decomposition works correctly
  • Once you are finished, place the link of your repository inside your GSoC proposal. DO NOT POST/SEND your repo link anywhere else
function [Q,R] = qr(A)
  [m, n] = size(A);
  Q = zeros(m,n);
  R = zeros(n,n);
  for k = 1:n
    R(1:k-1,k) = Q(:,1:k-1)' * A(:,k);
    v = A(:,k) - Q(:,1:k-1) * R(1:k-1,k);
    R(k,k) = norm(v);
    Q(:,k) = v / R(k,k);
  end
end
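
For orientation only (not the required solution, and far from complete), a rough standard-library sketch of the kind of interface the list above asks for:

#include <cassert>
#include <cstddef>
#include <vector>

class matrix
{
public:
    matrix(std::size_t rows, std::size_t cols)
        : rows_{rows}, cols_{cols}, data_(rows * cols, 0.0) {}

    static matrix zeros(std::size_t rows, std::size_t cols) { return matrix{rows, cols}; }

    // Two zero-based indices, row-major storage.
    double& operator()(std::size_t r, std::size_t c)       { return data_[r * cols_ + c]; }
    double  operator()(std::size_t r, std::size_t c) const { return data_[r * cols_ + c]; }

    // Single zero-based index into the underlying storage.
    double& operator[](std::size_t i) { return data_[i]; }

    // Elementwise addition; multiplication, transposition, comparison and
    // slicing would follow the same pattern.
    friend matrix operator+(const matrix& a, const matrix& b)
    {
        assert(a.rows_ == b.rows_ && a.cols_ == b.cols_);
        matrix c{a.rows_, a.cols_};
        for (std::size_t i = 0; i < c.data_.size(); ++i)
            c.data_[i] = a.data_[i] + b.data_[i];
        return c;
    }

private:
    std::size_t rows_, cols_;
    std::vector<double> data_;
};

int main()
{
    auto A = matrix::zeros(3, 2);
    A(0, 1) = 1.5;
    const auto C = A + A;
    assert(C(0, 1) == 3.0);
}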

Boost.Multiprecision: New Quad-Double Backend Type

Potential mentor: Christopher Kormanyos

Project requires knowledge of modern C++ template metaprogramming techniques.

Background

Boost.Multiprecision http://www.boost.org/libs/multiprecision is a mathematical library that offers high-performance multiple precision integer, rational and floating-point types with precision vastly exceeding that of the built-in float, double and long double types.

We are looking for a student to assist with implementing and testing a multiprecision backend intended to wrap the so-called double-double/quad-double (QD) library. This GSoC project will include implementing, testing, benchmarking and checking compatibility with Boost.Math.

This challenging project with large visibility will strengthen mathematical and algorithmic programming skills. It is best suited for students whose studies and interests include modern C++, high-performance mathematical programming, and advanced template and generic programming. The implementations of double-double and quad-double use existing hardware floating-point processor operations. Compared with traditional extended precision methods such as those of GMP, this can potentially offer improved performance in low-to-medium digit ranges of less than 100 decimal digits. Multiple precision backends such as libbf offer other modern and tangible advantages in particular fascinating domains of research and development. The resulting work can be of essential use for high-performance calculations such as many-body simulations, lattice calculations and other state-of-the-art areas branching into physics, climatology, communication, transportation, and others.
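
For background, the building blocks of double-double and quad-double arithmetic are error-free floating-point transformations such as Knuth's two-sum. A minimal sketch of that building block (written here from scratch, not taken from the QD library):

#include <iostream>
#include <utility>

// Returns (s, e) such that s = fl(a + b) and a + b = s + e exactly.
std::pair<double, double> two_sum(double a, double b)
{
    const double s = a + b;
    const double v = s - a;
    const double e = (a - (s - v)) + (b - v);
    return {s, e};
}

int main()
{
    // The rounding error of 1.0 + 1e-20 is recovered exactly in e.
    const auto [s, e] = two_sum(1.0, 1e-20);
    std::cout << "s = " << s << ", e = " << e << '\n';   // prints s = 1, e = 1e-20
}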

PROJECT 1.

Implement a feasible realization of quad-double. This project is based on an existing, well-researched, modern community request for quad-double.

Tasks include:

  • Wrap an existing implementation of QD.
  • Test the performance of this new type compared to some of Boost's existing floating-point backends.
  • Get a test system ready for strongly exercising these types in the relevant precision range.
  • Make specific math tests for this type and verify numerical correctness and proper C++ behavior.

If time permits:

  • Extend quad-double down to double-double or up (in precision) to a generic multiple-quad-double of even higher precision.
  • Extract the algorithms of double-double and quad-double from the docs of the libs and bring these to portable header-only Boost.Multiprecision.

Programming competency test

Use the latest Boost.Multiprecision. Select a type such as cpp_bin_float or cpp_dec_float and perform any non-trivial extended-precision floating point calculation you like with, say, 50 or 100 decimal digits of precision. This could be any kind of well-known, exemplary linear combination of elementary transcendental functions, a series expansion, a numerical derivative, a numerical integration or similar. Use your imagination to create and describe your pretty example. Show your mastery of looping and handling iterative calculations. In this sense, it would be good to select a computation that has enough depth to require an iterative loop such as a for-loop.
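
One possible shape for such a submission, shown only as a sketch (the choice of series is arbitrary): an iterative Taylor series for exp(1) evaluated with cpp_dec_float_50.

#include <boost/multiprecision/cpp_dec_float.hpp>
#include <iomanip>
#include <iostream>
#include <limits>

int main()
{
    using boost::multiprecision::cpp_dec_float_50;

    // Sum the Taylor series of exp(1): 1 + 1/1! + 1/2! + ...
    cpp_dec_float_50 sum  = 1;
    cpp_dec_float_50 term = 1;
    for (unsigned k = 1; k < 60; ++k) {
        term /= k;          // term now holds 1/k!
        sum  += term;
    }

    std::cout << std::setprecision(std::numeric_limits<cpp_dec_float_50>::digits10)
              << sum << '\n';   // prints e = 2.71828182845904523536...
}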

Boost.GIL: Generic Image Library

Potential mentor: Pranam Lashkari

All projects with Boost.GIL require knowledge of C++11 and at least a basic understanding of image processing algorithms.

This proposal is a continuation of the Image Processing Algorithms project submitted and developed by Miral Shah during GSoC 2019.

Boost.GIL provides core features for images and, thanks to Miral Shah's work, a collection of basic image processing algorithms. However, there are still more algorithms and basic structures to be added to the collection. Since many of these algorithms rely on convolution or correlation at their core, those become necessary building blocks to have.

The GSoC 2019 project implemented lots of improvements and new features in the kernels and convolutions, but this area may still benefit from further development, optimisations and documenting.

This project has 2 main goals:

  1. Align the implementation of the 2D convolution/correlation with the original GIL 1D convolution/correlation implementation and provide consistent API

  2. Optimise the implementation of the 2D convolution (ideally, aligning the 2D implementation with the 1D one should bring the optimisations along automatically)

PROJECT 2: Image Processing Algorithms

Check GIL's wiki page Image Processing Algorithms with a table presenting the algorithms' implementation status and the wish-list.

You can take on any algorithms you like from the list or suggest new algorithms which should be implemented and included in GIL. (Note: we do not expect all the algorithms to be implemented during GSoC; the table is just for reference.)

Programming competency test

Choose a simple algorithm from any topic of image processing, preferably related to your interests regarding the GSoC project, and propose its implementation using the latest Boost.GIL.

If you get stuck, don't hesitate to submit even a partial solution.
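
For a sense of the expected scale, here is a minimal sketch using Boost.GIL: a fixed threshold applied to an 8-bit grayscale image. The algorithm chosen here is only an example, not a suggestion of what to submit.

#include <boost/gil.hpp>
#include <cstddef>

int main()
{
    namespace gil = boost::gil;

    // A tiny 4x4 grayscale image filled with a horizontal gradient.
    gil::gray8_image_t img(4, 4);
    auto v = gil::view(img);
    for (std::ptrdiff_t y = 0; y < v.height(); ++y)
        for (std::ptrdiff_t x = 0; x < v.width(); ++x)
            v(x, y) = gil::gray8_pixel_t(static_cast<unsigned char>(x * 64));

    // Binarize around a fixed threshold of 128.
    gil::for_each_pixel(v, [](gil::gray8_pixel_t& p) {
        gil::at_c<0>(p) = gil::at_c<0>(p) < 128 ? 0 : 255;
    });
}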

Boost.Python: Write an asyncio event loop implementation using Boost.Asio

Potential mentor: Vinícius dos Santos Oliveira

Background

Python 3 comes with its own event loop abstractions. Developers who wish to use the Python 3 async IO ecosystem within Boost.Asio-based applications can benefit from such an implementation.

From the Python documentation:

"The event loop is the core of every asyncio application. Event loops run asynchronous tasks and callbacks, perform network IO operations, and run subprocesses."

Project

Programming competency test

The competency test for this project has two parts.

  • Read Kohlhoff's Executors and Asynchronous Operations paper.
  • Implement the MyEventLoop interface defined below. A Boost.Asio strand must be created such that every callback is scheduled through this strand, even though MyEventLoop consumers have no knowledge of this strand. Send an email to the Boost mailing list or to the potential mentors listed for this project if you have questions.
#include <functional>

class MyEventLoop
{
public:
    virtual ~MyEventLoop() = default;

    // Schedule f to run as soon as possible.
    virtual void call_soon(std::function<void(MyEventLoop&)> f) = 0;

    // Schedule f to run once the file descriptor fd becomes ready.
    virtual void call_when_fd_ready(
        int fd, std::function<void(MyEventLoop&)> f) = 0;
};
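
For orientation, one way call_soon() could be routed through a strand is sketched below, assuming Boost.Asio 1.70 or later; the class name MyAsioEventLoop is made up here and not part of the assignment.

#include <boost/asio/io_context.hpp>
#include <boost/asio/post.hpp>
#include <boost/asio/strand.hpp>
#include <functional>
#include <utility>

// Implements the MyEventLoop interface above; every callback is serialized
// through one strand even though consumers never see the strand itself.
class MyAsioEventLoop : public MyEventLoop
{
public:
    explicit MyAsioEventLoop(boost::asio::io_context& ioc)
        : strand_{boost::asio::make_strand(ioc)}
    {}

    void call_soon(std::function<void(MyEventLoop&)> f) override
    {
        boost::asio::post(strand_, [this, f = std::move(f)] { f(*this); });
    }

    void call_when_fd_ready(int fd, std::function<void(MyEventLoop&)> f) override
    {
        // Left out of this sketch; posix::stream_descriptor::async_wait is one option.
        (void)fd; (void)f;
    }

private:
    boost::asio::strand<boost::asio::io_context::executor_type> strand_;
};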

The second part of this project consists of using Boost.Python to call the function hello() from some Python file when the signal SIGUSR1 is received by the process. The signal should be watched using Boost.Asio async interfaces.
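
A rough sketch of how the second part could fit together (assuming a module script.py defining hello() is importable; error handling and Py_Finalize are omitted):

#include <boost/asio/io_context.hpp>
#include <boost/asio/signal_set.hpp>
#include <boost/python.hpp>
#include <csignal>

int main()
{
    namespace py = boost::python;
    Py_Initialize();                              // embed the Python interpreter

    boost::asio::io_context ioc;
    boost::asio::signal_set signals{ioc, SIGUSR1};
    signals.async_wait([](const boost::system::error_code& ec, int /*signo*/) {
        if (ec) return;
        try {
            py::import("script").attr("hello")(); // call hello() from script.py
        } catch (const py::error_already_set&) {
            PyErr_Print();
        }
    });

    ioc.run();                                    // waits until SIGUSR1 arrives
}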

Boost.XML

Potential mentor: Vinícius dos Santos Oliveira

Write a new library from the ground up to parse and generate XML streams. There is no need for high-level features in this project; parsing and generating low-level events is good enough.

Background

XML became commonplace enough to dismiss any need for an introduction. This section instead focuses on the project goals:

  • Boost lacks a good library for XML handling.
  • By the end of the project, a basic XML facility along the lines of https://www.copperspice.com/docs/cs_api/xml-streaming.html should be available: classes to read and write XML streams. There is no need for a DOM or any other high-level components in this project.

Programming competency test

Write a pull parser for CSV files. It's not very useful in itself, but it's a simple task that shouldn't take much of your time. You'll be judged on how you store the parser state and on how easily the parser can be reused in other scenarios.
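
For illustration only (not a reference solution), a minimal pull-style CSV tokenizer that keeps its parsing state explicit and lets the caller drive the loop:

#include <cstddef>
#include <iostream>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <variant>

// Events the consumer pulls one at a time. Quoting and escaping are
// deliberately ignored in this sketch.
struct field      { std::string value; };
struct end_of_row {};
using csv_event = std::variant<field, end_of_row>;

class csv_pull_parser
{
public:
    explicit csv_pull_parser(std::string_view input) : input_{input} {}

    // Returns the next event, or std::nullopt once the input is exhausted.
    std::optional<csv_event> next()
    {
        if (pending_row_end_) {
            pending_row_end_ = false;
            return csv_event{end_of_row{}};
        }
        if (pos_ >= input_.size()) return std::nullopt;

        std::string value;
        while (pos_ < input_.size() && input_[pos_] != ',' && input_[pos_] != '\n')
            value.push_back(input_[pos_++]);

        // Emit end_of_row on the next pull when the field ends a line (or the input).
        if (pos_ >= input_.size() || input_[pos_] == '\n')
            pending_row_end_ = true;
        ++pos_;                       // step over the delimiter (or past the end)
        return csv_event{field{std::move(value)}};
    }

private:
    std::string_view input_;
    std::size_t pos_ = 0;
    bool pending_row_end_ = false;
};

int main()
{
    csv_pull_parser parser{"name,score\nada,42\n"};
    while (auto event = parser.next()) {
        if (auto* f = std::get_if<field>(&*event))
            std::cout << "field: " << f->value << '\n';
        else
            std::cout << "end of row\n";
    }
}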

Boost.Lua: zero-overhead Boost binding

Potential mentor: Vinícius dos Santos Oliveira

There are plenty of Lua wrappers for C++ out there. All of them will add some overhead over plain Lua API calls. Furthermore, none of them will reach a point where API stabilization makes sense, which makes them unsuitable for Boost. We can, however, focus on making the Lua API safer to use.

The way to achieve that is to capture not a "wrapper object" but the program structure that manipulates the Lua VM as a whole, and then perform the Lua calls for this program. This can be done using Boost.Hana: describe the structure using Boost.Hana data structures and write a metaprogram to translate that into Lua API calls. For instance:

int my_lua_callback(lua_State *L)
{
    return boost::lua::program(
        L,
        boost::lua::return_, 1234
    );
}

That would translate into:

int my_lua_callback(lua_State *L)
{
    lua_pushinteger(L, 1234);
    return 1;
}

This AST-like layout makes it possible to perform some of the traditional compiler optimizations and also to ensure safety with respect to Lua specifics (e.g. lua_checkstack()).
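
A very rough sketch of the translation idea, assuming Boost.Hana and the Lua C API are available; the boost::lua names above are hypothetical, so the sketch uses its own namespace and only handles integers:

#include <boost/hana.hpp>
#include <lua.hpp>
#include <utility>

namespace toy_lua {

struct return_tag {};
inline constexpr return_tag return_{};

// Describe "return these values" as data, then walk the description and
// emit the corresponding Lua API calls.
template <class... Args>
int program(lua_State* L, return_tag, Args&&... args)
{
    auto values = boost::hana::make_tuple(std::forward<Args>(args)...);
    boost::hana::for_each(values, [&](auto&& v) {
        lua_pushinteger(L, static_cast<lua_Integer>(v));
    });
    return static_cast<int>(sizeof...(Args));
}

} // namespace toy_lua

// Usage mirroring the my_lua_callback example above.
int my_lua_callback(lua_State* L)
{
    return toy_lua::program(L, toy_lua::return_, 1234);
}

int main()
{
    lua_State* L = luaL_newstate();
    luaL_openlibs(L);
    lua_pushcfunction(L, &my_lua_callback);
    lua_setglobal(L, "answer");
    luaL_dostring(L, "print(answer())");   // prints 1234
    lua_close(L);
}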

There is no way the whole project could be finished in just one GSoC. Therefore, this first project has humble goals: a hello world with the basic infrastructure. It will also be a research project, so be ready to explore and experiment.

Programming competency test

You should implement some basic metaprogram (you choose the algorithm) to prove you're very familiar with Boost.Hana. Be prepared to make adjustments as we judge your solution.

Boost.Math: FFT Utilities

Potential mentor: Christopher Kormanyos

Project requires knowledge of modern C++ template metaprogramming techniques.

Background

Boost.Math http://www.boost.org/libs/math is a large mathematical library that offers high-performance functions of pure and applied mathematics such as floating-point utilities, special functions, statistical functions, iterative equation-solving tools, integration routines and more.

Boost.Math has needed FFT utilities for a while now. Accordingly, we are looking for a student to assist with implementing and testing FFT utilities. This GSoC project will include implementing, testing, benchmarking and checking compatibility with Boost.Math.

This challenging project will strengthen mathematical and algorithmic programming skills in template-based C++. It is best suited for students whose studies have included numerical transformations and, in particular, for those who have previously dabbled in programming them. Abilities in modern C++, high-performance mathematical programming, and advanced template and generic programming are mandatory. FFTs are enormously useful for transformations, signal and image processing, convolutions, modelling and numerous other fascinating domains of research and development.

PROJECT 1.

Implement a feasible realization of FFT. This project is based on an existing, modern community request for FFT here, with further discussion.

Tasks include:

  • Define the FFT template interface.
  • Start with a simple first step such as one-dimensional real-to-half-complex.
  • Get a test system ready for this first step.

If time permits:

  • Extend FFT to two dimensions.
  • Consider threading aspects and potential multi-threading distribution of computational complexity.

Programming competency test

Show familiarity with both C++ template programming as well as programming numerical transformations such as FFT. Select a transformation with, say, one dimension. Use your imagination to create a naive one-way floating-point FFT, Laplace or similar transformation working for built-in types such as float and double. Show your mastery of this kind of programming.
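
As an illustration of the expected scale, a naive O(n²) one-way discrete Fourier transform over built-in types (a sketch of one possible submission, not a prescribed solution):

#include <cmath>
#include <complex>
#include <cstddef>
#include <iostream>
#include <vector>

// Naive forward DFT: X[k] = sum_j x[j] * exp(-2*pi*i*j*k/n).
template <class Real>
std::vector<std::complex<Real>> naive_dft(const std::vector<Real>& x)
{
    const std::size_t n = x.size();
    const Real two_pi = static_cast<Real>(2.0L * 3.141592653589793238L);
    std::vector<std::complex<Real>> result(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<Real> sum{};
        for (std::size_t j = 0; j < n; ++j) {
            const Real angle = -two_pi * static_cast<Real>(k * j) / static_cast<Real>(n);
            sum += x[j] * std::complex<Real>(std::cos(angle), std::sin(angle));
        }
        result[k] = sum;
    }
    return result;
}

int main()
{
    // A single cosine of frequency 1: bins 1 and n-1 should dominate.
    std::vector<double> signal(8);
    for (std::size_t i = 0; i < signal.size(); ++i)
        signal[i] = std::cos(2.0 * 3.141592653589793 * static_cast<double>(i) / 8.0);

    for (const auto& c : naive_dft(signal))
        std::cout << c << '\n';
}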
