Google Summer of Code 2016 Ideas

Sameer Deshmukh edited this page Mar 6, 2016 · 36 revisions

Contact

Feel free to reach us by joining #sciruby on chat.freenode.net or via our mailing list.

IMPORTANT NOTICE: SciRuby encourages diversity. Scientific progress in general benefits from diversity and software development for science is no exception. We are really happy that the number of people from Asia, Africa and South America applying for GSoC projects is increasing. Our org admin this year is from India, our previous org admin was from Brazil. We have had students from Japan, India, Sri Lanka, Russia, etc. We have women software developers in our programme. We are happy to hear from you all!

Instructions for students

We strongly recommend that you pick one of the ideas listed below. We value contributions in advance of GSoC, even if they're just little ones. Go pick out something in one of our trackers and work on it, talk to folks on the listserv, and get an idea for what features are needed.

You don't need to know a lot about Ruby to work on a project: depending on how much you already know, it'll be pretty easy to learn enough to be able to contribute. However, you may need some familiarity with scientific computation. If you don't have any, take a look at "Numerical Recipes in C", which you'll probably find in your university's library.

In any case, if you feel your skills aren't enough for some project, please ask us on our IRC channel (see contact section above) or our Google Group (see sciruby.com to sign up) and we can help you.

Our number-one priority right now as an organization is most likely visualization. If we write good visualization software, SciRuby will become much more accessible to people.

Read this before you commit your first patches

SciRuby's main landing page on GitHub mostly holds the stable versions of the SciRuby gems, but developers and contributors should work on the very latest (bleeding-edge) repositories so that changes can be committed without conflicts arising.

Try reading Finding The SciRuby Development Repositories on Github if you would like a brief introduction on finding the latest development gems to work on from Github. Also go through the coding guidelines before sending your first patch.

How to submit a patch ("pull request")

Here's a great tutorial: http://www.thinkful.com/learn/github-pull-request-tutorial/

Have a look and feel free to ask if you have any questions.

Note about "recommended skills"

We used to say "required skills," but realized there may exist cultural as well as gender differences in how people interpret this phrase. We would like you to have at least one of the listed skills. More is better. Remember that GSoC is a learning experience, and we expect that you'll be lacking in some areas of knowledge.

One of the most important skills in science and engineering is knowing how to say, "I don't know." If you don't know something, look it up, try to understand it, and then feel free to ask for help on our listserv or in IRC.

Project ideas

NMatrix projects

NMatrix is SciRuby's numerical matrix core, implementing dense matrices as well as two types of sparse matrices (linked-list-based and Yale/CSR). NMatrix is a fairly well-established project which has received Summer-of-Code-like grants from both Brighter Planet and the Ruby Association (in other words, from Matz, who created Ruby). Those who contribute to NMatrix will likely eventually become authors of a jointly-published peer-reviewed science article on the library. Additionally, NMatrix is a good place to gain practical C and C++ experience, while also working to improve Ruby.
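As a rough illustration of the Yale/CSR idea mentioned above, the following plain-Ruby sketch stores only the non-zero entries of a matrix, together with their column indices and per-row offsets. It is not NMatrix's implementation or API: the names CSR, dense_to_csr and csr_matvec are made up for this example, and NMatrix's own Yale storage may differ in detail.

```ruby
# Plain CSR (compressed sparse row) storage. All names here are hypothetical.
CSR = Struct.new(:values, :col_idx, :row_ptr, :cols)

# Build a CSR structure from a dense array of rows, skipping zeros.
def dense_to_csr(rows)
  values = []
  col_idx = []
  row_ptr = [0]
  rows.each do |row|
    row.each_with_index do |x, j|
      next if x.zero?
      values << x
      col_idx << j
    end
    # row_ptr[i + 1] marks where row i ends inside `values`.
    row_ptr << values.size
  end
  CSR.new(values, col_idx, row_ptr, rows.first.size)
end

# Multiply a CSR matrix by a dense vector, touching only stored entries.
def csr_matvec(csr, vec)
  (0...(csr.row_ptr.size - 1)).map do |i|
    (csr.row_ptr[i]...csr.row_ptr[i + 1]).sum do |k|
      csr.values[k] * vec[csr.col_idx[k]]
    end
  end
end

dense = [
  [5, 0, 0],
  [0, 0, 2],
  [0, 3, 0]
]
csr = dense_to_csr(dense)
p csr.values                     # [5, 2, 3]
p csr.row_ptr                    # [0, 1, 2, 3]
p csr_matvec(csr, [1, 2, 3])     # [5, 6, 6]
```

The trade-off is the usual one for sparse formats: memory and multiplication cost scale with the number of stored entries rather than with rows times columns, at the price of slower random insertion.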

NMatrix currently relies on ATLAS/CBLAS/CLAPACK and standard LAPACK for several of its linear algebra operations. In some cases, native versions of the functions are implemented, so that the libraries are not required. There are quite a number of areas for growth in terms of the capabilities of NMatrix here.

Port NMatrix to JRuby

  • Mentors: Pjotr Prins (@pjotrp), Francesco Strozzi (@fstrozzi)
  • JRuby keeps getting faster, thanks in part to work by IBM and Oracle, and outperforms MRI by 4-10x. Unfortunately, NMatrix does not run on JRuby because it depends on C libraries. We are looking for a student who can replace the C bindings with JVM libraries (of which there are many good alternatives) and port NMatrix, with its tests, to the JVM. This would also open up true multi-threading for NMatrix.
  • Recommended skills: You should be comfortable with both Ruby and the JVM.

Data and Statistics Projects

A colossal amount of data is being generated every minute, and having good tools to analyse this data has become an essential feature of any modern language. These projects deal with making Ruby a viable language for data analysis and statistics tasks.

If you choose to contribute to any of these, you will be exposed to the inner workings of some very useful tools. In addition to this, most of the Ruby community as of now is still waking up to the endless possibilities that Ruby might hold for data analysis, so you will feel an immediate impact of any work that you do in this field. Ruby conferences around the world are opening up to host talks about data tools in Ruby, which will give you a great platform to showcase your work and derive long term career benefits from it.

Following are the ideas under this domain:

Categorical data support for daru, statsample and statsample-glm

  • Mentors - Alexej (@agisga) | Sameer (@v0dro) | Victor (@zverok)

  • A categorical variable is currently treated like a nominal/ordinal variable in daru, statsample and statsample-glm and thus calculations involving categorical data are not performed accurately.

  • Support for categorical data is very important, and the Ruby community strongly feels its absence. This mailing list discussion and the issues open here, here and here provide more information.

  • This sub-project will involve two things: implementing a new index called CategoricalIndex for daru, similar to the one in pandas, to support categorical data; and changing the regression methods in statsample and statsample-glm so that they support categorical data supplied by daru.

  • This project can be subdivided into 2 major components:

    • Support categorical data with a new :categorical data type and CategoricalIndex index class in daru.
    • Support operations on categorical data from daru on statsample and statsample-glm.
  • Recommended skills: Proficiency with Ruby and a good understanding of designing Ruby APIs; preferably, prior work with data analysis and statistics.
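To make the problem concrete: if the levels red, green and blue are simply numbered 0, 1 and 2, a regression treats them as ordered and evenly spaced. A common remedy is dummy (indicator) coding, sketched below in plain Ruby. This is only an illustration of the idea; dummy_code is a hypothetical helper, not part of daru, statsample or statsample-glm.

```ruby
# Hypothetical helper: expand a categorical column into 0/1 indicator columns.
# The first level acts as the reference category and gets no column, which
# avoids perfect collinearity with an intercept term.
def dummy_code(values, levels: values.uniq)
  _reference, *rest = levels
  rest.each_with_object({}) do |level, columns|
    columns[level] = values.map { |v| v == level ? 1 : 0 }
  end
end

color = %i[red green blue green red]
p dummy_code(color)
# {:green=>[0, 1, 0, 1, 0], :blue=>[0, 0, 1, 0, 0]}  (hash formatting varies by Ruby version)
```

Passing levels: explicitly lets the caller choose the reference category, for example levels: %i[blue red green] to compare everything against blue.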

Improving the capabilities of daru

  • Mentors - Sameer (@v0dro) | Victor (@zverok)

  • daru is a Ruby gem for data analysis, visualization and manipulation. It already has enough functionality to make it a viable framework for data analysis in Ruby, but much remains to be done to make it a robust solution. This sub-project will involve equipping daru with various useful tools that are regularly used in many data analysis tasks, or improving existing tools so that more people find daru useful.

  • A few of the areas where daru can see improvements are listed below:

    • Implementing functions for directly importing stock market data from sources like Yahoo Finance and Google Finance and loading into a properly time-indexed DataFrame (or maybe even from multiple remote data sources like in pandas). See more details here.
    • Better support for 'wild' data:
      • More methods for handling missing data: fill_missing, drop_missing, etc.
    • Support for window functions like Hamming window, Hanning window, etc.
    • Time series resampling.
    • Binary rolling moments (cov and corr).
    • Exponentially weighted moment functions.
    • Generic rolling 'apply' functions.
    • Better support for handling of time zones.
  • Students are free to suggest their own ideas for improvements and new functionality. Brownie points for something that will help make daru more useful for web developers (who constitute a majority of the Ruby community).

  • Recommended skills: Proficiency with Ruby and designing Ruby APIs. Knowledge of data analysis tools like pandas in Python or data.frame in R is a bonus.
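For orientation, here is a minimal plain-Ruby sketch of what a few of the listed tools compute, working on ordinary arrays. The method names echo the list above but the signatures are assumptions made for this example; they are not daru's existing or planned API.

```ruby
# Replace nil entries with a constant (one possible fill strategy).
def fill_missing(values, with:)
  values.map { |v| v.nil? ? with : v }
end

# Drop nil entries entirely.
def drop_missing(values)
  values.compact
end

# Generic rolling apply: yield each full window of `size` consecutive values.
def rolling_apply(values, size)
  values.each_cons(size).map { |window| yield window }
end

# Exponentially weighted moving average with smoothing factor alpha in (0, 1].
def ewma(values, alpha)
  values.drop(1).each_with_object([values.first.to_f]) do |x, out|
    out << alpha * x + (1 - alpha) * out.last
  end
end

# Hamming window coefficients of length n (assumes n >= 2).
def hamming_window(n)
  Array.new(n) { |i| 0.54 - 0.46 * Math.cos(2 * Math::PI * i / (n - 1)) }
end

prices = [10, nil, 12, 14, nil, 16]
p fill_missing(prices, with: 0)                        # [10, 0, 12, 14, 0, 16]
kept = drop_missing(prices)                            # [10, 12, 14, 16]
p rolling_apply(kept, 2) { |w| w.sum / w.size.to_f }   # [11.0, 13.0, 15.0]
p ewma([1, 2, 3], 0.5)                                 # [1.0, 1.5, 2.25]
p hamming_window(5).map { |w| w.round(2) }
```

A real implementation would also have to decide how these operations interact with the index (for example, whether a rolling result keeps the label of the last element in each window), which is where most of the API design work lies.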

Supervised machine learning

  • Mentors - Alexej (@agisga), Will (@wlevine)
  • We live in a time of obsession over so-called "big data". Many classical statistical methods do not apply well to today's data. Novel methods have emerged for that reason, but they are not yet readily available to Rubyists.
  • Build a supervised machine learning gem on top of NMatrix and daru. For example, the gem can be called slearn (for supervised learning). It should use NMatrix for computation, daru for data handling, and possibly one of the SciRuby visualization gems for visualization of results.
  • Some methods of statistical learning that can be included are (this list is neither obligatory nor exhaustive):
    • LASSO and variants (such as Group LASSO or logistic regression LASSO)
    • Ridge regression
    • Elastic Net
    • Cross-validation and other model assessment methods
    • Smoothing Splines
    • Kernel Smoothing Methods
    • Neural Networks
    • Support Vector Machines
    • K-means, K-nearest-neighbor algorithms
  • References: There are many books and other publications, for example The Elements of Statistical Learning.
  • Recommended skills: A strong background in statistics and/or optimization, and proficiency with Ruby. Knowledge of statistics or machine learning libraries with similar functionality is a plus.
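To give a feel for the scale of the simplest items on that list, here is a plain-Ruby sketch of a K-nearest-neighbor classifier with leave-one-out cross-validation. It uses ordinary arrays rather than NMatrix or daru, and every name in it (knn_predict, loo_accuracy and so on) is hypothetical; a real gem would need a proper API, vectorized distance computation and tie-breaking rules.

```ruby
# Squared Euclidean distance between two equal-length numeric vectors.
def squared_distance(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Predict a label for `point` by majority vote among its k nearest rows.
# `training` is an array of [features, label] pairs.
def knn_predict(training, point, k: 3)
  nearest = training.min_by(k) { |features, _label| squared_distance(features, point) }
  nearest.map { |_features, label| label }.tally.max_by { |_label, count| count }.first
end

# Leave-one-out cross-validation: classify each row using all the others.
def loo_accuracy(data, k: 3)
  hits = data.each_index.count do |i|
    rest = data[0...i] + data[(i + 1)..]
    knn_predict(rest, data[i][0], k: k) == data[i][1]
  end
  hits.to_f / data.size
end

training = [
  [[1.0, 1.0], :a], [[1.2, 0.8], :a], [[0.9, 1.1], :a],
  [[5.0, 5.0], :b], [[5.2, 4.9], :b], [[4.8, 5.1], :b]
]
p knn_predict(training, [1.1, 1.0])  # :a
p knn_predict(training, [5.1, 5.0])  # :b
p loo_accuracy(training)             # 1.0
```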

API and Language Projects

TensorFlow API

  • Mentors: John Woods (@mohawkjohn) | Assistant Mentors: Khor (@neth_6)
  • TensorFlow, an excellent new library for machine learning, has Python and C++ APIs, but no Ruby API. This project involves investigating the best design for a Ruby API, executing on the design plan, and demonstrating use of TensorFlow in Ruby.
  • Recommended skills: TensorFlow is written in C++, and its developers recommend SWIG; but you should be comfortable, at the very least, with Ruby and C++. You need not use SWIG, but that may be a viable solution.

Create a reproducible deployment system for Ruby/JRuby

GNU Guix has become a viable alternative to RVM, rbenv and bundler, with full reproducibility built in (unlike those Ruby tools). At this point GNU Guix supports three versions of MRI (1.8.7, 2.1.8, 2.3.0), which is already great for testing. The promise of GNU Guix is that it can support JRuby and Rubinius too, at the press of a button.

The project idea for GSoC is:

  • Add JRuby support to GNU Guix
  • Add rbx support to Guix
  • Add all important Ruby testing frameworks to Guix (e.g. cucumber)
  • Add all important web development frameworks to Guix (e.g. RoR)
  • Add Sciruby and related modules to Guix
  • Provide support to Travis-CI (probably through Docker)
  • Provide general Docker support
  • Integrate with IRuby notebook and nyaplot

In all, this could mean that if one wanted to deploy SciRuby with Rails, IRuby and nyaplot on jruby, all one has to do is

guix package -i jruby sciruby ruby-rails iruby nyaplot

and on MRI 2.1.8

guix package -i ruby-2.1.8 sciruby ruby-rails iruby nyaplot

and it would run anywhere, including Travis-CI. Or install the latest Ruby with sciruby-openblas:

guix package -i ruby sciruby-openblas ruby-rails iruby nyaplot

Guix, btw, has support for containers built-in, so we can even live without Docker. Finally we get a handle on complex deployments!

  • Recommended skills: interest in reproducible deployment systems

Xgboost binding

  • Mentors: Kenta Murata (@mrkn)
  • XGBoost is an optimized distributed gradient boosting library. Gradient boosting is a boosting method that uses gradient descent for parameter optimization. Boosting is an ensemble machine learning method that builds weak learners one by one and combines them into a stronger model.
  • Many participants in last year's Kaggle Higgs Challenge competition, including the winner, used this library. The problem in that competition was to classify observed events into two categories: "tau tau decay of a Higgs boson" versus "background noise".
  • This project involves investigating the best design for a Ruby API, executing on the design plan, and demonstrating XGBoost in Ruby.
  • Recommended skills: You need C and C++ skills to read the XGBoost implementation and write the Ruby binding. As XGBoost has Python, R, Java, and Julia bindings, the ability to read any of those languages is helpful for referring to the existing bindings.
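For readers new to the technique, the sketch below shows the core loop of gradient boosting for squared loss on a single feature, using decision stumps as the weak learners. It is a teaching example in plain Ruby with hypothetical names (Stump, fit_stump, boost); XGBoost itself is far more elaborate, and nothing here reflects its API or internals.

```ruby
# A decision stump on one feature: predicts `left` when x <= threshold,
# otherwise `right`.
Stump = Struct.new(:threshold, :left, :right) do
  def predict(x)
    x <= threshold ? left : right
  end
end

def mean(values)
  values.sum.to_f / values.size
end

# Fit the stump that minimizes squared error against the targets.
# Assumes xs contains at least two distinct values.
def fit_stump(xs, targets)
  candidates = xs.uniq.sort[0...-1] # splitting at the max would empty the right side
  scored = candidates.map do |t|
    left = xs.each_index.select { |i| xs[i] <= t }.map { |i| targets[i] }
    right = xs.each_index.select { |i| xs[i] > t }.map { |i| targets[i] }
    stump = Stump.new(t, mean(left), mean(right))
    error = xs.each_index.sum { |i| (targets[i] - stump.predict(xs[i]))**2 }
    [error, stump]
  end
  scored.min_by(&:first).last
end

# Each round fits a stump to the current residuals (the negative gradient of
# squared loss) and adds a shrunken copy of it to the ensemble.
def boost(xs, ys, rounds: 20, learning_rate: 0.5)
  base = mean(ys)
  stumps = []
  predictions = Array.new(xs.size, base)
  rounds.times do
    residuals = ys.zip(predictions).map { |y, pred| y - pred }
    stump = fit_stump(xs, residuals)
    stumps << stump
    predictions = xs.each_index.map do |i|
      predictions[i] + learning_rate * stump.predict(xs[i])
    end
  end
  ->(x) { base + stumps.sum { |s| learning_rate * s.predict(x) } }
end

xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
model = boost(xs, ys)
puts model.call(2).round(3)  # close to 1.0
puts model.call(5).round(3)  # close to 5.0
```

With a learning rate of 0.5 the residuals on this toy data halve every round, which is why a handful of weak stumps ends up reproducing the step in the data.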

MXNet binding

  • Mentors: Kenta Murata (@mrkn)
  • MXNet is another deep learning library. It is designed for both efficiency and flexibility by mixing symbolic and imperative programming styles.
  • Recommended skills:
    • C/C++ for reading MXNet and writing Ruby binding.
    • Python, R, and Julia for reading existing bindings.
    • Knowledge of deep learning to understand the library's application and to demonstrate it.

Visualization projects

Ruby matplotlib

  • Mentors: Pjotr Prins (@pjotrp), John Woods (@mohawkjohn)
  • Matplotlib is the Python tool of choice for plotting. While it's not the most robust, it's easy to use, and based on MATLAB plotting — such that it's incredibly easy to port code from MATLAB (or from Python). It'd probably be better not to reinvent the wheel and write a whole new Matplotlib, but perhaps there could be a Ruby API for Matplotlib. Some research would need to be done to demonstrate that the concept is viable, but this would be an outstanding project to get working.
  • Recommended skills: You should be comfortable with Ruby metaprogramming concepts, or should be prepared to learn them during the application process. Some familiarity with Python or MATLAB might be helpful, but isn't necessary.

Improvements on Nyaplot

  • Mentors: Naoki Nishida (@domitry)
  • Nyaplot is a new plotting library for Ruby. It lets users make many types of interactive plots in the IRuby notebook. Nyaplot is young and there is plenty of room for improvement (e.g. it doesn't have a native interface like the GTK back-end of matplotlib). Any proposal is welcome, so just try it first.
  • Recommended skills: You should have some knowledge of or experience with Ruby and JavaScript. Experience with modern web development tools like npm or gulp is a plus.

User Interface: IRuby notebook and integration with other scientific tools

  • Mentors: Daniel Mendler (@minad)
  • Project page: https://github.com/SciRuby/iruby
  • Issues/Overview of current development: https://github.com/SciRuby/iruby/issues
  • Most important: IPython 3 (Jupyter) has been released recently, and this is a major change for language kernels like IRuby. Integration for other languages has been greatly improved in Jupyter, as reflected in the name change, which underlines that Jupyter is more agnostic to the underlying kernel language. However, there are breaking protocol changes. You can start right now on making IRuby compatible with Jupyter. Start with http://ipython.org/ipython-doc/3/whatsnew/version3.html
  • The IRuby system needs to be improved in stability, ease of installation and integration with the other scientific Ruby tools (e.g. plotting).
  • The goal of this project is to get IRuby from the current state to something which is ready for production use! I consider this a very important project since IRuby acts (or can act in the future) as a central component of the SciRuby framework which allows you to access all the numerical and plotting functionality in a very beginner friendly way.
  • This project will also require a fair amount of communication with the other SciRuby projects to help them integrate better with IRuby and with each other.
  • Recommended skills: You should be comfortable with common Ruby programming concepts. It would be helpful if you are interested in other technologies and languages too, e.g. for digging into the IPython code or the 0mq-protocol.
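As background on the kernel protocol mentioned above: a Jupyter message travels as a series of frames that include a JSON header, parent header, metadata and content, preceded by an HMAC-SHA256 signature computed over those four JSON strings. The sketch below builds and verifies such frames using only Ruby's standard library. It is an assumption-laden illustration, not IRuby's code: the method names are hypothetical, the 0mq transport, routing identities and frame delimiter are left out, and the header fields and the "5.0" version string should be checked against the Jupyter messaging documentation.

```ruby
require 'json'
require 'openssl'
require 'securerandom'
require 'time'

# HMAC-SHA256 over the concatenated JSON parts, hex-encoded.
def sign(parts, key)
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('SHA256'), key, parts.join)
end

# Build [signature, header, parent_header, metadata, content] frames.
# Field names follow the messaging spec as the editor understands it.
def build_message(msg_type, content, key:, session: SecureRandom.uuid)
  header = {
    msg_id: SecureRandom.uuid,
    username: 'kernel',
    session: session,
    date: Time.now.utc.iso8601,
    msg_type: msg_type,
    version: '5.0' # assumed protocol version for IPython 3 / Jupyter
  }
  parts = [header, {}, {}, content].map { |part| JSON.generate(part) }
  [sign(parts, key), *parts]
end

# Recompute the signature on receipt and compare it with the one sent.
def valid_signature?(frames, key)
  signature, *parts = frames
  signature == sign(parts, key)
end

frames = build_message('execute_request', { code: '1 + 1' }, key: 'secret')
puts valid_signature?(frames, 'secret')  # true
puts valid_signature?(frames, 'wrong')   # false
```

A production kernel would use a constant-time comparison for the signature check and take the key from the connection file that the notebook front end passes to the kernel.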

Math API projects

Ruby needs efficient tools in scientific domains aside from linear algebra: graph algorithms, mathematical programming, etc. For efficiency, these tools should either be new code written from scratch in C/C++ (which would need several years of work) or bind to existing stable libraries.

LEMON graph library API

  • Mentors: Pjotr Prins (@pjotrp), Maurice Diamantini
  • The LEMON C++ graph library (Library for Efficient Modeling and Optimization in Networks) is a good candidate for a Ruby binding because it has a clean C++ interface. It also provides a general, solver-independent MIP (Mixed Integer Programming) interface to various free or commercial mathematical solvers (GLPK, Clp, CPLEX, Gurobi, and so on). It is well maintained, and its integration in the COIN-OR set of tools is a gauge of its quality. Such a binding would be a great advance for the operations research and combinatorial optimization Ruby community.
  • Recommended skills: You should be comfortable in C++ and familiar with Ruby.

Ruby Wrappers for SymEngine

  • Mentors: Ondřej Čertík (@certik), Abinash Meher (@abinashmeher999)
  • A project started by the SymPy organisation, SymEngine is a fast, standalone C++ symbolic manipulation library. It solves mathematical problems the same way a human does, but far more quickly and precisely. The motivation for SymEngine is to develop the Computer Algebra System once in C++ and then use it from other languages, rather than doing the same thing all over again for each language that needs it. The project for Ruby bindings has already been set up at symengine.rb. A few of the things the project involves are:
    • Extending the C interface of SymEngine library.
    • Wrapping up the C interface for Ruby using Ruby C API, including error handling.
    • Designing the Ruby interface.
    • Integrating IRuby with symengine gem for better printing and writing IRuby notebooks.
    • Integrating the gem with existing gems like gmp, mpfr and mpc.
    • Making the installation of symengine gem easier.
  • Recommended skills: You should be comfortable with C/C++ and familiar with Ruby. Refer to the wiki to get started.

Space Projects

Ruby SPICE

  • Mentors: John Woods (@mohawkjohn)
  • The SPICE Toolkit is a NASA library which allows researchers, astrophysicists, and spacecraft navigators to query information on ephemerides (locations, velocities, attitudes, etc.) of planetary bodies and spacecraft. There is at least one Python wrapper, SpicePy, and a very old Ruby wrapper with extremely limited capabilities (which @mohawkjohn has put on Github, though he's not the original author). It'd be great to either update the existing Ruby wrapper or write a totally new one — or possibly even write an auto-generator using FFI to produce Ruby API for this important C library.
  • Recommended skills: You should be familiar with both C and Ruby.
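To show the general shape of wrapping a C function from Ruby without writing a C extension, here is a small sketch using Fiddle from Ruby's standard library. Note the substitutions: Fiddle stands in for the ffi gem that the text has in mind, and libc's strlen stands in for a SPICE routine, since the SPICE Toolkit is not assumed to be installed. The wrapper name c_strlen is hypothetical, and the example assumes a typical Linux or macOS Ruby where libc symbols are visible in the running process.

```ruby
require 'fiddle'

# Open the symbols already loaded into the current process.
libc = Fiddle.dlopen(nil)

# Describe the C signature: size_t strlen(const char *s);
C_STRLEN = Fiddle::Function.new(
  libc['strlen'],
  [Fiddle::TYPE_VOIDP],
  Fiddle::TYPE_SIZE_T
)

# Thin Ruby wrapper so callers never touch Fiddle directly.
def c_strlen(string)
  C_STRLEN.call(string)
end

puts c_strlen('ephemeris')  # 9
```

An auto-generator along the lines the project describes would produce one such signature description and wrapper per C function, typically by parsing the library's header files.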

Ruby API for NASA's Trick Simulation Environment

  • Mentors: John Woods (@mohawkjohn)
  • Trick is a NASA simulation library, which can be — among other things — used to test flight software for spacecraft. Think of it as kind of like Kerbal Space Program, but much more realistic. It uses SWIG to generate Python interfaces for simulated devices such as sensors, and for specifying simulation run parameters. This proposed project involves the creation of a Ruby interface to parallel the Python interface.
  • The Trick maintainers aren't thrilled about having to support another interface, so options are: (1) creating a fork, or (2) creating an extension or plugin of some kind. There are details in this Trick issue tracker thread.
  • Note: While this project does not require you to be a U.S. person, there is possibly additional work that could be done if you are. Namely, you might have the opportunity to help us write the Ruby interface for the simulator for a spacecraft being built by Intuitive Machines.
  • Recommended skills: You should be familiar with C, C++, and Ruby. Basic knowledge of Python and SWIG would also be beneficial.