-
Notifications
You must be signed in to change notification settings - Fork 45
Google Summer of Code: 2018
For a general overview of Google Summer of Code (GSoC) and Boost's related activities, please read GSoC Overview
This year's GSoC will be housed at https://github.com/BoostGSoC18 Students may find examining past GSoC source code and commit histories of use.
To any potential mentors, please add your projects here
Potential mentor: David Bellot, Mohd Sharique
All projects with Boost.uBLAS requires knowledge of C++11/14/17.
uBLAS is a library for linear algebra and matrix computations. Using recursive templates, it allows the compiler to optimize any complex linear algebra expressions as if it were written by hand by the programmer. Basic classes are matrix and vector. The library has all the basic functionalities and a few standard algorithms. We would like to improve the functionality of this library by adding new algorithms and functionality especially in the field of data analysis and machine learning.
Boost.uBLAS has many basic functions for linear algebra but it lacks common algorithms in statistics and machine learning. The goal of this project is to lay out the fundation of uBLAS to support the programming of Machine Learning algorithms. More specifically, we are interested in implementing some fundamental components like:
- basic stats functions: mean, median, standard deviation, variance, covariance, correlation
- many sort ofdistributions like histograms, etc...
- PCA, ICA and LDA factorizations
- running stats: moving average, moving variance and anything else
- K-means clustering algorithms
- Gaussian Mixture Models algorithms
- anything else the student can propose and have time to implement during the summer
A knowledge of basic statistics and Machine Learning algorithm is required for this project, on top of excellent C++ skils of course
Boost.uBLAS can deal with vectors and matrices but not tensors. Not yet. We want this feature. It's a hard project, highly experimental but it is a welcome feature in such a library. This will open the doors for Boost.uBLAS to be used in applications like neural networks and deep learning. It's a fundamental feature in those domains.
The student needs to have good math skills on top of excellent C++ skills for this project.
The project description is simple: add support of multicore parallel and GPU computations to uBlas ! The realization is not straightforward though. Boost.uBlas is CPU only. If the compilers is able to vectorize, uBlas can take benefit of it. Here we want to extend Boost to the support of parallel architecture and GPU computations to enable it to do big data or deep learning computations.
The student will have to first understand how ublas works and how it generates and optimizes code with the expression template mechanism and then start adding options to enable the use of Boost.Compute. Test will be done on multicore systems and graphics card or computers which support Boost.Compute (through OpenCL for example).
We expect to see the basic matrix operations to be implemented like this. The code will have to be thoroughly documented and a tutorial document provided. We prefer quality of the implementation to exhaustivity.
For exceptionally good and fast students, extensions to support other library will be considered, like nVidia CUDA for example.
In other words it will have to be clean and super fast !
Implement a small library in C++ using expression templates and modern C++11/14/17 features like generic lambdas to:
- represent a matrix of numerical types (int, long, float, double, complex,....)
- compute algebraic expressions including + and *, += and *=. Matrix multiplication can be implemented with the most simple algorithm
- fit in one header file
- provides a test program which can measure its speed of execution for each examples provided
Potential mentors: Vissarion Fysikopoulos, Adam Wulkiewicz
All projects requires knowledge of C++ template metaprogramming techniques.
Boost.Geometry part of collection of the Boost C++ Libraries, defines concepts, primitives and algorithms for solving geometry problems.
See http://www.boost.org/libs/geometry
Implement algorithms for computing similarity between geometries. We start with linestrings and extend to rings or polygons depending on the student's performance. Standard measures are the Hausdorff distance, the Frechet distance and its variations---most notable the discrete Frechet distance.
The student will start by studying existing algorithms and implementations, implement a prototype with tests and benchmarks and finally integrate into Boost.Geometry.
Compute accurately the distance between two nearly antipodal points on the ellipsoid of revolution. Currently Boost.Geometry does not handle this case correctly. A solution is proposed in [1] as a solution of a so called astroid problem. It is also implemented in C++ in [2].
- [1] Karney - Algorithms for geodesics (https://arxiv.org/pdf/1109.4448.pdf)
- [2] http://geographiclib.sourceforge.net
The goal is to implement serialization of Boost.Geometry R-tree with Boost.Serialization library. It should be done in a form of optional extension, i.e. Boost.Geometry shouldn't depend on Boost.Serialization by default. The example result could look like this:
#include <boost/archive/binary_oarchive.hpp>
#include <boost/geometry.hpp>
#include <boost/geometry/index/rtree.hpp>
#include <boost/geometry/serialization/index/rtree.hpp>
int main() {
namespace bg = boost::geometry;
namespace bgi = bg::index;
typedef bg::model::point<double, 2, bg::cs::cartesian> point_type;
typedef bg::model::box<point_type> box_type;
bgi::rtree<box_type, bgi::rstar<16> > rtree;
// fill rtree with data
std::ofstream file("serialized_rtree.bin",
std::ios::binary | std::ios::trunc);
boost::archive::binary_oarchive archive(file);
archive << rtree;
}
The project will require the student to understand parts of Boost.Geometry, in particular primitives and the internals of R-tree and the interface of Boost.Serialization library.
As mentioned above you may choose to either provide links to existing library you developed or take the competency test. In case of the latter the requirements are listed below.
PROJECT 1. Implement an addition for Boost.Geometry that implements an algorithm that computes exactly the Frechet distance between two linestrings.
PROJECT 2. Implement a library that provides:
- a representation of points on Earth's surface
- a function to compute the shortest distance between two given points
- a function LinestringPointInterpolation(L,f) that takes as parameters a linestring L and a double f between 0 and 1 and returns a point p interpolated along L, where f is the fraction of the length of L the point p is located.
PROJECT 3 Implement an addition for Boost.Geometry that provides:
- serialization of Boost.Geometry models in arbitrary dimension:
- model::point, model::point_xy,
- model::box,
- model::segment.
- handling of errors related to saving and loading of incompatible types
In all cases the code should be as generic as possible and fit in one header file.
Also provide main.cpp file that demonstrates the use of your code, i.e. (1) compute the Frechet distance between several linestrings (2) usage of distance function and comparison with distance functions in Boost.Geometry.
Potential mentor: Damian Vicino
Float arithmetic operations in C++ are NOT guaranteed to yield a correct mathematical result. For instance, the overflow on addition operation may produce an infinite value when using float. SafeFloat proposes implementing a drop-in replacement for floating point numeric data types guaranteeing that, when used in expressions in a C++ program, no incorrect arithmetic results will be produced without developer awareness.
In addition to the arithmetic problems, some others can result in security concerns related to usage of floating point. In these regards, the CERT Division of Software Engineering Institute at Carnegie Mellon University publishes guidelines for Secure Programming when using FP datatypes [1].
A survey of arithmetic problems was written by David Goldberg in 1991 [2]. And the limitations of Intel architecture for implementing floating point are described in the IA-32 [3] and x87 [4] manuals.
Problems with floating-point can be categorised in:
- Domain/Pole/Range errors
- Approximation errors and narrowing on casting
- Unexpected input (e.g. reading a NaN from a stream when expecting a number)
- Ill defined (e.g. initialise or read as input “0.1”)
- Unexpected operation results (e.g. undetected denormalisation, rounding, underflow, overflow to infinity or NaN operands)
Since C++11 standard [5], a mechanism is provided for access to the floating point environment, giving the possibility of tackle the problem efficiently in a platform independent way. Using the floating-point environment, most problems can be detected by introducing multiple lines of code after each operation for checking flags. Requiring the programmer to write down all these checks in every place a floating point operation is executed is error-prone (but would be required for safe execution of the math).
In general, the problems mentioned lead to obscure and hard to detect bugs for the non-expert in numerical methods. However, some algorithms are designed taking in account approximations and errors to happen and could be silently discarded. Thus, our approach in safe-float is allowing the declaration of concerns to be considered for each declared variable (e.g. we can define safe_float<float, check_division_by_zero> which only checks for division by zero problems).
During GSoC 2015, safe-float was partially implemented exploring different designs. Now, with a stable design, we want to focus in finishing up the details to start the review process.
- Implement new policies for cast and error handling of the data type.
- Implement SafeFloat literals for IEEE754 standard compliance.
- Extend the test suite.
- Define Concepts were required.
- Check correct exception guarantees are implemented.
- Extend current documentation [6].
- Define a user defined literal assignable to float that fails compilation when the value provided cannot be expressed as an positive integer power of 0.5 (e.g. 0.5, 0.25, 0.125).
- Provide a function receiving an integer (X) and a tuple (std::tuple<type1, type2, type3…>) into a tuple of vectors (std::tuple<std::vector,std::vector, std::vector, …> where each vector has X elements of each originally received in each tuple_element. E.g. for X=2 and the tuple {1, 1.0, ‘a’} , the result type is std::tuple<std::vector, std::vector, std::vector> and the values are: {{1, 1},{1.0, 1.0},{‘a’, ‘a’}}
- Provide a template function “get_vector”, similar to std::get, that given the type X as parameter, returns editable access to the vector of type X in a tuple of vectors.
[1] SEI CERT. The CERT C++ secure coding standard. Pearson Education, 2008.
[2] Goldberg, David. "What every computer scientist should know about floating-point arithmetic." ACM Computing Surveys (CSUR) 23.1 (1991): 5-48.
[3] Intel® 64 and IA-32 Architectures Software Developer’s Manual - Volume 2 (specially sections about FP flags in FADD, FSUB, FMUL and FDIV)
[4] x87 and SSE Floating Point Assists in IA-32: Flush-To-Zero (FTZ) and Denormals-Are-Zero (DAZ)
[5] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf
[6] http://sdavtaker.github.io/safefloat/doc/html/index.html
Potential mentor: Vinícius dos Santos Oliveira
Even after so many years, C++ is not having a potential astronomical library which can cover major topics of astronomy. With this project, boost community can help astronomers to make complex astronomical calculations faster using C++. This library will have functionality aimed at professional and amateur astronomers, astrophysicists, but may be useful to anyone developing astronomy software. This library will help astronomers to analyse and gather information from observation data.
This library will provide some fundamental class required in astronomy.
- Basic astronomical functions
- Classes for celestial coordinate system
- Classes for light curves of variable stars
- Documentation
Functionalities mentioned below can be implemented
- Various functionality for spectroscopy.
- Read and plot catalogs from text files.
- Astrometry
- Photometry
Potential Mentor: Rishabh Arora E-mail: rishabharora1997@gmail.com
This project aims at enhancing the Boost.Intrusive library, by adding complex data structures/containers/algorithms such as Segment Tree, Fenwick Tree, Suffix-Tree and Suffix-Automata, a generic Graph data structure (other suggestions are welcome) and various algorithms atop it like Heavy-Light Decomposition, Centroid Decomposition etc. The project involves implementing the above in accordance with the Boost.Intrusive codebase and building APIs on top of it, solving several different practical use-cases efficiently and writing rigorous test suites for them.
Implement a small C++ library (a mini version of the proposed project) using C++ 11/14/17. The details of the library are as follows:
- Implement a generic Segment Tree (where merging at each node should cover a general case and not specific to certain operations)
- Use of expression templates & generic lambda is expected
- Fit in a single header file.
- Test programs for ensuring accuracy and performance benchmark tests against various type of solutions (brute-force, recursive, iterative)
Potential Mentor: Stefan Seefeld
the Boost.Python project is one of the oldest Boost components. Since its initial development there have been important additions to C++ as well as Python. That, unfortunately, isn't reflected in Boost.Python, which hasn't received as much attention in recent years as it should. Aa central goal of a GSoC Boost.Python project might thus be a modernization of the Boost.Python codebase, including additional support for newer language (and standard library) features, as well as work on open issues.
Write code to add C++ bindings to a previously unsupported Python built-in type (e.g., set
), including a simple test for it.
Potential mentors: Darren Garvey
During last year's GSoC a new library was developed --
Boost.StaticViews. It is a small
standalone library to ease the working with homogeneous constexpr
data. It
fills the niche between
Boost.Hana
(Hana works on hetero-geneous data) and the Ranges
TS (Ranges
focus on run-time).
The library consists of two layers: low-level views and high-level data
data structures implemented on top of them. You use views if you want to, say,
convert a bitmap from 8-bit to 24-bit. Previously, one would write a Python
script, convert the data, and copy-paste the results into the code. Now, we can
let the compiler do the job for us which greatly simplifies the maintenance. As
for the data structures, StaticViews currently implements a compile-time
non-owning hash table (i.e. static_map
as described
here).
Consider, for example, the recent active
discussion of
std::error_code
on the
Boost-dev mailing list.
Creating your own error_category
is all about mapping your custom error codes
to some "common" ones (i.e. std::errc
). When the number of error codes becomes
large, using switch
becomes unfeasable, and that's when static_map
comes
into play. For an example of usage of StaticViews in such a setting, see Niall
Douglas'
ntkernel-error-category.
Although the last GSoC lay some good foundation, some work still needs to be done before StaticViews may be considered for inclusion into Boost. As such, at least the following needs to be done:
- Write benchmarks that prove
static_map
to be faster thanstd::unordered_map
. - Finish the documentation.
- Add more tests.
- Add a few more algorithms for working with views.
- Currently, GCC is unable to compile some of StaticViews' code due to this bug. It would be nice to fix this as it would greatly improve the portability.
- Seeing as Boost is in the process of transitioning to CMake, adding support for it to StaticViews seems like a good idea.
- Adding more data structures like a
static_multimap
,static_set
,static_multiset
etc.
Potential mentor: Damian Vicino
In 1936, Alan Turing introduced computable real numbers in his foundational paper ‘On Computable Numbers, with an Application to the Entscheidungsproblem’ [1]. He defines: ‘The computable numbers may be described briefly as the real numbers whose expressions as a decimal are calculable by finite means’. Different authors provided several equivalent definitions, for example, an alternative equivalent definition is 'a number is computable if there exists an algorithm to produce each digit of its decimal expansion and for any digit requested it finishes the execution successfully' [2].
Around these ideas, a new area of applied mathematics was developed based on the theory of computable real numbers, called Computable Calculus [2]. In Computable Calculus, it is common to represent numbers using algorithms that can generate their digits and, define arithmetic functions that lazily operate them [2].
Using this approach is possible to calculate solutions for arithmetic problems that require accurate results. For example, in the construction of generic Discrete Event Simulation timelines [3].
Following the algorithms defined in Chapter 2 of [2] implement the following operations between Reals, and Between Reals and Integers:
- operator+
- operator-
- operator*
- operator/
- operator!=
- operator<
- operator=
- Write the Real type interface
- Explain how to deal with real problems like identity equality, pi==pi2.
- Explain how could you provide a "Small String Optimization"-like approach for Real numbers representation at compile time.
[1] Alan Mathison Turing. On computable numbers, with an application to the Entscheidungsproblem. J. of Math, 58(345-363):5, 1936.
[2] Oliver Aberth. Computable Calculus. Academic Press, 2001.
[3] Vicino, Damián, Olivier Dalle, and Gabriel Wainer. An advanced data type with irrational numbers to implement time in DEVS simulators. Proceedings of the Symposium on Theory of Modeling & Simulation. Society for Computer Simulation International, 2016.
- Home
- Getting Started
- Development Guidelines
- Releases
- Community
- Google Summer of Code
- Google Season of Docs