Skip to content

Google Summer of Code: 2020

Cem Bassoy edited this page Jan 4, 2021 · 48 revisions
Clone this wiki locally

For a general overview of Google Summer of Code (GSoC) and Boost's related activities, please read GSoC Overview

This year's GSoC will be housed at https://github.com/BoostGSoC20 Students may find examining past GSoC source code and commit histories of use.

Suggested GSoC project proposals

To any potential mentors, please add your projects here

To any potential students, in addition to the list of project ideas, we encourage original ideas proposed by students. If you have got one, please post your idea to the mailing list and ask mentors for comments. See Finding the Right Project in the Google Summer of Code Guides.

Boost.uBlas: Matrix and Tensor Computations

Potential mentor: David Bellot, Cem Bassoy

Requirements

  • hands on experience with modern C++ template metaprogramming techniques
  • basic knowledge in matrix and tensor computations
  • strong mathematical background

Background

Boost.uBlas has been initially designed to support users with matrix computations for linear algebra applications. The design and implementation unify mathematical notation via operator overloading and efficient code generation using expression templates. Starting with the GSoC 2018 project, uBlas has been extended by a flexible tensor data type and basic tensor operations supporting general tensor contractions and the Einstein notation. The goal of the new tensor extension is to support the implementation of algorithms for e.g. machine learning and quantum computing applications. A very good introduction is given by Tamara Kolda on Youtube. GSoC students genaralized and extended the tensor typein GSoC 2019. Last year, GSoC students added Github Actions and Clang analyzers and sanitizers as part of GSoC 2020 .

We propose two projects from project list that will further increase Boost.uBlas capabilities:

Project 1: Finalize and improve the subtensor type

Potential mentor: Cem Bassoy, David Bellot, (Amit Singh)

Subtensors, i.e. views of tensors, are used in many numerical algorithms in which specified array sections are accessed. Subtensors, however, do not own but refer to the tensor data. An extensions for subtensors is already available in the subtensor feature branch such as subtensor.hpp. A simple example of tensor and subtensor creation and usage is for instance:

using tensor = boost::numeric::ublas::dynamic_tensor<float>;
using slice  = boost::numeric::ublas::sliced_span;
using end    = boost::numeric::ublas::span::end;

// Create tensor t of order 3 and dimensions 4,3 and 5
auto t   = tensor{4,3,5};

// Create subtensor ts where ts references the domain specified by the slices
// Subtensor ts has the same order 3 but the dimensions 3,2,1
// Tensor elements are not copied
auto ts  = t ( slice(1,3), slice(1,end), end-1 ); 

// Create a copy tss of subtensor ts
// Both instances have the same order and dimensions
// Tensor elements are not copied
auto tss = ts;

The student shall improve the existing subtensor implementation and create during GSoC multiple pull requests with well tested code. Possible improvements could be the usage of a separate tag such as ublas::subtensor_tag in order to simplify the overall design and usage. The student should also experiment with the auxiliary types and functions to enhance maintainability. This is an important and a challenging project. We expect the student to be motivated and to have a strong background in mathematics and algorithm design using generic programming techniques with C++17.

Project 2: Create new matrix and vector types

Potential mentor: David Bellot, Cem Bassoy, (Amit Singh)

uBlas has been written many years ago using the C++03 standard using older template metaprogramming techniques. With the new uBlas tensor extension, we want to simplify the existing ublas::matrix and ublas::vector templates by using the tensor extension as a base implementation.

We expect the student to generate experimental ublas::experimental::matrix and ublas::experimental::vector templates by specializing the ublas::tensor_core declared in tensor_core.hpp and implement the existing matrix and vector operations using the C++17 standard, see operation overview. Depending on the progress of the first project, we want the student to also implement submatrix and subvector types and provide a simple qr-decomposition. We expect the student to be motivated and to have a strong background in mathematics and algorithm design using generic programming techniques with C++17.

Project 3: Finalize Dataframe Implementation

Potential mentor: David Bellot

In the field of data science, R, Julia and Python are presumably the most used languages to perform data analysis. They all come with many desirable features. Last year, a sucessful GSoC'19 student implemented data frames for uBlas. We want to take his project to another level and:

  1. finalize and productionize his implementation of data.frames for an immediate release in the next version of Boost
  2. add a collection of free form functions, and modern data analysis procedures (as found in the R tidyverse, in Python pandas/numpy, and in many other packages) based on the use of vector and matrices The goal of this project is to lay the foundation of uBlas as a new data science library and to see production-ready code being published by the end of the summer of code.

The student will have to study the code base of uBlas first and build on top of the existing API. We are in favor of promoting a functional programming style in order to get all the benefits of laze evaluation to optimize the code as much as possible. Documentation must be written too for the project to be succesfull and production-ready by the end of the summer.

Competency test

  • Fork uBlas and use the develop branch for your implementation
  • Create a matrix folder in
    • include/boost/numeric/ublas for your implementation
    • test for the unit-tests
    • examples for the QR decomposition example
  • Provide a new matrix and vector C++17 implementation with a functionality similar to:
    • A = zeros(3,2);
    • C = A+B, C = 2*A, etc. for elementwise matrix operations
    • C = A*B for matrix multiplication
    • C = A' for matrix transposition
    • A==B for elementwise comparision
    • c = A[3] for accessing elements with a single zero-based index
    • c = A(3,2) or A(3,5) = c for accessing elements with a two zero-based indices
    • C = A(1:2,1:3) to generate a matrix instance that contains data of A referenced by the ranges 1:2 and 1:3
  • Use the C++ standard library for your matrix implementation wherever possible (e.g. use std::tuple and std::tie)
  • Implement auxiliary types such as range/span and helper functions such as zeros, size, norm to offer maximum readibility
  • Modify the README.md for Github Actions usage
  • Implement a C++ qr function QR-decomposing a matrix with a minimal number of code lines based on the following Matlab code
  • Implement a C++ main function showing withA == Q*R that your qr decomposition works correctly
  • Once you are finished, place the link of your repository inside your GSoC proposal. DO NOT POST/SEND your repo link
function [Q,R] = qr(A)
  [m, n] = size(A);
  Q = zeros(m,n);
  R = zeros(n,n);
  for k = 1:n
    R(1:k-1,k) = Q(:,1:k-1)' * A(:,k);
    v = A(:,k) - Q(:,1:k-1) * R(1:k-1,k);
    R(k,k) = norm(v);
    Q(:,k) = v / R(k,k);
  end
end

Boost.Multiprecision: Multiprecision Big Number Types

Potential mentor: Christopher Kormanyos

Project requires knowledge of modern C++ template metaprogramming techniques.

Background

Boost.Multiprecision http://www.boost.org/libs/multiprecision is a mathematical library that offers high-performance multiple precision integer, rational and floating-point types with precision vastly exceeding those of built-in float, double and long double.

We are looking for a student to assist with extending and optimizing Boost.Multiprecision to higher precision of many thousands of bits or more.

This is a challenging project with relatively large visibility. It will strengthen mathematical and algorithmic programming skills and is suited for students whose studies and interests include modern C++, algorithms, mathematical programming, and advanced template and generic programming. High-precision is essential in numerous areas, for instance to identify and correct numerical instability. It is also an important tool used extensively in climate and oceanic simulations, transportation, navigation, signal and image processing and in AI research.

PROJECT 1.

Extend and optimize Boost.Multiprecision to higher precision of thousands of bits or more.

Tasks include:

  • We will begin by getting the test system ready for higher precision.
  • Adapt Boost.Multiprecision Arithmetic Tests to higher precision of thousands of bits.
  • Adapt Boost.Math.Constants to higher precision of 1,000 to 10,000 decimal digits.
  • Adapt some elementary transcendental function tests to higher precision.
  • Test an existing Karatsuba multiplication prototype in this environment.
  • Optionally optimize Karatsuba multiplication.

If time permits:

  • Extend special function calculations for high-precision.
  • Implement successive iterative and AGM methods.
  • Begin with elementary functions such as log, exp and progress to selected special functions.
  • Optionally implement and test the PSQL algorithm to search for mathematical constants.

Programming competency test

Using latest Boost.Multiprecision, calculate and print the real-valued, positive integral square root of an unsigned integer having 1024 bits or more. Print the result to std::cout in both decimal as well as hexadecimal representation.

Use the existing Boost.Multiprecision cpp_bin_float type to perform an extended-precision (say 50 decimal digits of precision) floating-point calculation of any function you like. This could be a linear combination of elementary transcendental functions, a numerical derivative, a numerical integration or similar. Show your mastery of looping and handling iterative calculations. In this sense, it would be good to select a computation that has enough depth to require an iterative loop such as a for-loop. A Taylor series approximation or another kind of asymptotic approximation of your choice will be a good choice.

Boost.GIL: Generic Image Library

Potential mentor: Mateusz Loskot, Pranam Lashkari

All projects with Boost.GIL require knowledge of C++11 and at least a basic understanding of image processing algorithms.

PROJECT 1: Image Processing Algorithms

This proposal is continuation of Image Processing Algorithms project submitted and developed by Miral Shah during GSoC 2019.

Boost.GIL provides core features for images and, thanks to Miral Shah's work, a collection of basic image processing algorithms. However, there still is more algorithms to be added to the collection.

Check GIL's wiki page Image Processing Algorithms with table presenting the algorithms implementation status and the wish-list.

1. Mathematical Morphology

  • Dilation
  • Erosion
  • Blurring / Smoothing
  • Sharpening

2. Normalizing Channels

3. De-Noising

  • Wiener filter
  • Average filter
  • Median filter

4. Kernels and Convolutions

  • The GSoC 2019 project implemented lots of improvements and new features in the kernels and convolutions, but this area may still benefit from further development, optimisations and documenting.

5. Your favourite image processing algorithms

6. Image Processing Documentation

  • We need a beautifully written and presented documentation of the image processing features in GIL
  • For example, https://github.com/boostorg/gil/issues/396, describes an idea of the docs structure based on the "Principles of Digital Image Processing" book

PROJECT 2: Histogram

Histogram is one of essential tools in the image processing techniques. Although it belongs to the image processing algorithms, we propose this as a separate self-contained project.

Boost.GIL presents how to compute histogram in the documentation, Tutorial: Histogram, as well some existing algorithms already compute histogram as in case of the Otsu's thresholding, but there is no functionality in GIL available via public interfaces for histogram computation and operations.

This project proposal is about making histogram a first-class feature in GIL.

List of suggested topics [1]:

  • Computing histograms
    • 1D histogram of single image (8-bit grayscale image)
    • Histograms of images with more than 8 bits (Binning technique)
    • Histogram of color images
      • Intensity histogram (Luminance)
      • Individual color channel histogram
      • Combined color histogram
      • 1D histogram of individual components of any color space of single image (HSV, RGB, XYZ)
      • 2D histogram [2] projections [3] of single image (e.g. H and S of HSV, R and B of RGB, etc.)
      • 3D histogram of single image [4]
    • Cumulative histogram
  • Point operations - histogram transformations or histogram-based image enhancements
    • Histogram Normalization
    • Histogram Equalization
    • Histogram Specification - adjusting an image to a given reference histogram
  • Segmentation
    • Thresholding - use of new histogram computation in the GIL thresholding algorithms
    • Extras - optional ideas for when the histogram core features are implemented
      • Histogram Backprojection [5] - Where in the image are the colors that belong to the object being looked for? [6]
      • 2D histogram scheme for colour image segmentation [7]
  • Integration with Boost.Histogram
  • Histogram visualization - for testing and debugging purposes
    • If there is support for Boost.Histogram, we can rely on its ASCII 1D plots
    • GIL extension for basic plotting of histogram SVG (e.g. see SVG in Boost.Geometry)
    • GIL extension based on a third-party plotting library (e.g. minimal integration with https://plot.ly API)

References:

Programming competency test

Using latest Boost.GIL, implement a simple convolution filter and show its use in a test application to detect edges in a grayscale input image.

Alternatively, choose a simple algorithm from any topic of image processing, preferably related to your interests regarding the GSoC project, and propose its implementation using latest Boost.GIL.

Alternatively, bonus project that allows you to show your C++ skills beyond usage of Boost.GIL: Using C++11 type traits and Boost.MP11, define promote metafunction that tries to promote (find) fundamental integral type T (e.g. char, short, etc.) to another integral type with size roughly twice the bit size of T. For example, for using P = promote<char>::type should assert the following static_assert(std::is_same<P, short>::value) or possibly static_assert(std::is_same<P, int>::value).

If you get stuck, don't hesitate to submit even partial solution.

Boost.Real

Potential mentor: Damian Vicino, Laouen Belloli

Background

In 1936, Alan Turing introduced computable real numbers in his foundational paper ‘On Computable Numbers, with an Application to the Entscheidungsproblem’ [1]. He defines: ‘The computable numbers may be described briefly as the real numbers whose expressions as a decimal are calculable by finite means’. Different authors provided several equivalent definitions, for example, an alternative equivalent definition is 'a number is computable if there exists an algorithm to produce each digit of its decimal expansion and for any digit requested it finishes the execution successfully' [2].

Around these ideas, a new area of applied mathematics was developed based on the theory of computable real numbers, called Computable Calculus [2]. In Computable Calculus, it is common to represent numbers using algorithms that can generate their digits and, define arithmetic functions that lazily operate them [2].

Using this approach is possible to calculate solutions for arithmetic problems that require accurate results. For example, in the construction of generic Discrete Event Simulation timelines [3].

PROJECT 1. Bringing Boost.Real to review-ready state.

During GSoC 2018 and 2019, Real was partially implemented exploring different designs. Now, with a stable design, we want to focus in finishing up the details to start the review process. Some topics being discussed are the integration with Boost.Math, implementation of good examples of usage, improvements to the division and multiplication algorithms.

Current status is described here:

Programming competency test

  • Describe an optimal representation for Real number "digits".
  • Explain how could you provide a "Small String Optimization"-like approach for Real numbers representation at compile time.

References

[1] Alan Mathison Turing. On computable numbers, with an application to the Entscheidungsproblem. J. of Math, 58(345-363):5, 1936.

[2] Oliver Aberth. Computable Calculus. Academic Press, 2001.

[3] Vicino, Damián, Olivier Dalle, and Gabriel Wainer. An advanced data type with irrational numbers to implement time in DEVS simulators. Proceedings of the Symposium on Theory of Modeling & Simulation. Society for Computer Simulation International, 2016.

Boost.Astronomy

Potential mentor: Pranam Lashkari, Sarthak Singhal

Background

This project was first introduced in GSoC 2018 and then continued in 2019. This library is not yet released with boost and still in the development phase. All the project listed below targets to bring this library to a review ready state. This library tries to mitigate the problems with the astronomical coordinate system and FITS file which is the most commonly used file type to store the astronomical data.

Project 1: Coordinate transformation/conversion

Currently, the skeleton for storing data of the different coordinate system is developed which involves ICRS, CIRS, Galactic, Supergalactic, Heliocentric, Geocentric and Alt-Az. This project involves the development of a generic system to convert these coordinate systems from one to another.

Programming Competency Test

Develope a class for affine transformation which should take an affine matrix(3x3) as a parameter. Using this class object we must be able to transform cartesian_representation vector. You can use any boost library if required. (Plus points if all representations can be transformed)

Note: A good fit for this project will be a student who has prior knowledge of astronomy, geometry and algebra apart from programming in C++

Project 2: FITS file handling

This project involves developing user APIs. Basic parser[1] to read FITS[2] file has been already developed but there are no hight level APIs available which can make this FITS module easy for the user. (i.e: By calling a single function user should be able to read the file. Currently, the parser is divided into several parts and each part reads a particular type of data in file)

Programming Competency Test

Develope a small class/function which will be able to read the primary header[3] of the FITS file. Also, provide a way to extract the desired header value from the primary header. You can use any boost library if required. You can find sample FITS file here (Plus points for developing ways to enter the header-value in the primary header and writing into the file)

This project do not require any knowledge of astronomy or math but a good knowledge of file storage and file handling with C++

[1] https://github.com/BoostGSoC19/astronomy/tree/develop/include/boost/astronomy/io

[2] http://archive.stsci.edu/fits/users_guide/

[3] http://archive.stsci.edu/fits/users_guide/node19.html#SECTION00511000000000000000

Boost.Multiprecision: New Big-Double Backend Types

Potential mentor: Christopher Kormanyos

Project requires knowledge of modern C++ template metaprogramming techniques.

Background

Boost.Multiprecision http://www.boost.org/libs/multiprecision is a mathematical library that offers high-performance multiple precision integer, rational and floating-point types with precision vastly exceeding those of built-in float, double and long double.

We are looking for a student to assist with a feasibility study for realizing big-double backend types. There are several existing interesting backend options, including the so-called double-double and quad-double, libbf, or extending libquadmath to a portable form. This GSoC project will include implementing, testing, banchmarking and checking compatibility with Boost.Math.

This is a challenging project with relatively large visibility. It will strengthen mathematical and algorithmic programming skills and is suited for students whose studies and interests include modern C++, high-performance mathematical programming, and advanced template and generic programming. The implementations of double-double and quad-double use existing hardware floating-point processor operations. Compared with traditional extended precision methods such as those of GMP, this can potentially offer improved performance in low-to-medium digit ranges of less than 100 decimal digits. Other potential backends such as libbf offer other advantages in particular domains. The resulting work can be of essential use for high-performance calculations such as many-body simulations, lattice calculations and other state-of-the-art areas branching into physics, climatology, communication, transportation, and others.

PROJECT 2.

Implement a feasible realization of double-double and quad-double or libbf. This project is based on existing, well-researched, modern user requests such as this one for quad-double. and this one for libbf.

Tasks include:

  • Wrap an existing implementation, such as QD or libbf.
  • Test the performance of this new type compared to Boost's existing wrapped version of MPFR.
  • Get a test system ready for strongly exercising these types in the relevant precision range.
  • Make specific math tests for this type and verify numerical correctness and proper C++ behavior.

If time permits:

  • Extend double-double and quad-double to a generic multiple-quad-double of even higher precision.

Programming competency test

See PROJECT 1 above.

Boost.Geometry: Generic Geometry Library

Potential mentors: Vissarion Fysikopoulos, Adam Wulkiewicz

All projects requires knowledge of C++ template metaprogramming techniques.

Background

Boost.Geometry part of collection of the Boost C++ Libraries, defines concepts, primitives and algorithms for solving geometry problems.

See http://www.boost.org/libs/geometry

PROJECT 1.

Implement algorithms for the concave hull problem (e.g. from [1][2] etc.). This is the POSTGIS related function https://postgis.net/docs/manual-2.5/ST_ConcaveHull.html See also: http://boost-geometry.203548.n3.nabble.com/concave-hull-td4026717.html

PROJECT 2.

Implement support for non-cartesian geometries in convex_hull() either by adapting existing algorithm and strategies (preferable) or implementing it as different algorithm. See: https://en.wikipedia.org/wiki/Convex_hull_algorithms

Programming competency test

As mentioned above you may choose to either provide links to existing library you developed or take the competency test. In case of the latter the requirements are listed below.

PROJECT 1, 2. Implement and test “Gift wrapping” convex hull algorithm for types adapted to Boost.Geometry MultiPoint concept.

References

[1] Moreira, Adriano J. C. and Maribel Yasmina Santos. “Concave hull: A k-nearest neighbours approach for the computation of the region occupied by a set of points.” GRAPP (2007).

[2] https://www.iis.sinica.edu.tw/page/jise/2012/201205_10.pdf