Google Summer of Code 2020 Ideas

Shekhar Prasad Rajak edited this page Apr 9, 2020 · 12 revisions

Note: We didn't apply for GSoC 2020. These ideas will be used next year.

Ideas for Google Summer of Code 2020.

Contact

Feel free to reach us by joining #sciruby on chat.freenode.net or via our mailing list.

IMPORTANT NOTICE: SciRuby encourages diversity. Scientific progress in general benefits from diversity and software development for science is no exception. We are really happy that the number of people from Asia, Africa and South America applying for GSoC projects is increasing. Our org admin this year is from India, our previous org admin was from Brazil. We have had students from Japan, India, Sri Lanka, Russia, etc. We have women software developers in our programme. We are happy to hear from you all!

Instructions for students

We strongly recommend that you pick one of the ideas listed below. We value contributions in advance of GSoC, even if they're just little ones. Go pick out something in one of our trackers and work on it, talk to folks on the listserv, and get an idea for what features are needed.

You don't need to know a lot about Ruby to work on a project: depending on how much you already know, it'll be pretty easy to learn enough to be able to contribute. However, you may need some familiarity with scientific computation. If you don't have any, take a look at "Numerical Recipes in C", which you'll probably find in your university's library.

In any case, if you feel your skills aren't enough for some project, please ask us on our IRC channel (see contact section above) or our Google Group (see sciruby.com to sign up) and we can help you.

See also:

Read this before you commit your first patches

The main SciRuby landing page on GitHub mostly holds the stable versions of SciRuby gems, but developers and contributors should work on the very latest (bleeding-edge) repositories so that changes can be committed without conflicts.

Try reading Finding The SciRuby Development Repositories on Github if you would like a brief introduction on finding the latest development gems to work on from Github. Also go through the coding guidelines before sending your first patch.

How to submit a patch ("pull request")

Here's a great tutorial: http://www.thinkful.com/learn/github-pull-request-tutorial/

Have a look and feel free to ask if you have any questions.

Instructions for mentors

Guidelines for mentors to submit projects:

  • Specify the name of your project as a heading.
  • Write a paragraph or two with further details.
  • Write a small 'Skills' section detailing the skills that the student must possess to complete the project.
  • Write down your own GitHub handle and contact details in a 'Mentor Details' section so that the student can contact you.
  • If anyone else wants to co-mentor a project, please specify your details along with the mentor's details.

NumRuby projects

NumRuby is a successor of NMatrix. NumRuby is a linear algebra library for Ruby that is highly performance oriented.

Improving NumRuby

  • Add serialization support.
  • Make slicing return views instead of copying data.
  • Fix broadcasting.
  • Implement a random engine.
  • Release the NumRuby gem.
  • Mentors: Prasun Anand (@prasunanand)
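
To make the slicing item concrete, here is a minimal pure-Ruby sketch of the difference between a copying slice and a view. The `SliceView` class is hypothetical and is not part of NumRuby's API; it only illustrates the idea that a view shares storage with the original data, so no elements are copied and writes go through to the source.

```ruby
# Hypothetical illustration only: SliceView is not part of NumRuby.
class SliceView
  attr_reader :length

  def initialize(buffer, offset, length)
    @buffer = buffer   # shared storage, never duplicated
    @offset = offset   # index of the first visible element
    @length = length   # number of visible elements
  end

  def [](i)
    raise IndexError, "index #{i} out of bounds" unless (0...@length).cover?(i)
    @buffer[@offset + i]
  end

  def []=(i, value)
    raise IndexError, "index #{i} out of bounds" unless (0...@length).cover?(i)
    @buffer[@offset + i] = value
  end

  # Materialises the visible elements as a new Array (this does copy).
  def to_a
    @buffer[@offset, @length]
  end
end

data = [10, 20, 30, 40, 50]
copy = data[1, 3]                 # Array#[] allocates a new array
view = SliceView.new(data, 1, 3)  # no element is copied

copy[0] = -1   # changes only the copy
view[1] = 99   # writes through to data[2]
```

After these two writes, `data` has changed only at the position written through the view, while the write to `copy` left it untouched. A real implementation would also need strides and multi-dimensional shapes, which this sketch leaves out.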

Making daru-view independent

Learn the basics of daru-view from sciruby/blog or daru-view/wiki.

Daru (Data Analysis in RUby) is a library for analysis, manipulation and visualization of data. daru-view is a plugin gem for Daru that provides easy, interactive plotting in web applications and IRuby notebooks. It works in frameworks like Rails, Sinatra and Nanoc, and hopefully in others too.

Currently daru-view depends on lazy_high_charts and googlevisualr, over which SciRuby has no control. With daru-view we have mainly solved these problems:

  • a plotting gem that is compatible with daru dataframes and vectors.
  • a gem that works smoothly in any Ruby web application framework, in IRuby notebooks and in the terminal.

So now it is time for daru-view to become independent, because:

  • we don't have much control over these gems, and we will keep adding new features directly from the official HighCharts and Google Charts sites.

  • we have extended (overloaded and overridden) most of the methods from lazy_high_charts and googlevisualr, either to make them compatible with IRuby notebooks and all Ruby frameworks or to add chart features already present in HighCharts and Google Charts.

  • daru-view should be able to handle future chart types as well, with little or no modification to the codebase.

You can find more details in the wiki page 'Making daru-view independent'. Along with this, we also want to consider the new ideas written in the Idea wiki page.

Related links

About project

  • Skills: Basic knowledge of Ruby, Design pattern and Design Principles, Javascript and Ruby web application frameworks.
  • Mentors: Shekhar (@Shekharrajak), Sameer (@v0dro), Athitya Kumar (@athityakumar)
  • Difficulty: Moderate.

daru (Data Analysis in RUby) is a library for storage, analysis, manipulation and visualization of data in Ruby. It has various features, such as:

  • Flexible and intuitive API for manipulation and analysis of data.
  • Easy plotting, statistics and arithmetic.
  • Easy splitting, aggregation and grouping of data.
  • Quick data summaries with pivot tables, and so on.

You can find most of the examples here.

While it has many methods for data wrangling, it is slow for a lot of use cases (check out these benchmarks). This task will involve identifying the slow areas of daru and porting them either to Rubex, a language for writing C extensions for Ruby, or to a plain Ruby C extension.

  • The student needs to benchmark various daru methods and check how much of a performance boost Ruby C bindings can provide.
  • List out features that are essential for data science and not present in daru currently.
  • How can we improve the performance using parallel programming in Ruby?
  • How can we remove visualization and I/O APIs from daru and use the daru-view and daru-io plugin gems instead?
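
As a starting point for the benchmarking task, here is a minimal sketch using Ruby's standard `benchmark` library. It does not use daru: the `rows` data and the two `sum_with_*` methods are hypothetical stand-ins, chosen only to show the pattern of checking that two implementations agree before timing them.

```ruby
require 'benchmark'

# Hypothetical stand-in data: an array of row hashes, not a Daru::DataFrame.
rows = Array.new(50_000) { |i| { id: i, value: i % 100 } }

# Two candidate implementations of the same column sum.
def sum_with_map(rows)
  rows.map { |r| r[:value] }.sum   # builds an intermediate array first
end

def sum_with_each(rows)
  total = 0
  rows.each { |r| total += r[:value] }  # accumulates without an extra array
  total
end

# Always confirm the candidates agree before comparing their speed.
raise 'implementations disagree' unless sum_with_map(rows) == sum_with_each(rows)

# bmbm runs a rehearsal pass first, which reduces the effect of
# warm-up and garbage collection on the reported timings.
Benchmark.bmbm do |x|
  x.report('map + sum') { 20.times { sum_with_map(rows) } }
  x.report('each')      { 20.times { sum_with_each(rows) } }
end
```

The same pattern applies when one of the candidates is a C or Rubex extension: keep the input fixed, verify the results match, and only then compare timings.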

Why this project is important:

  • SciRuby is planning a powerful and fast machine learning gem that will be completely compatible with the daru and nmatrix gems. So we have to make daru faster and more powerful accordingly. We need to find a solution that uses nmatrix as well.

  • If we want to improve Ruby for data science usage, we have to keep daru's features and its API up to date with current needs.

  • We already have plugin gems for visualization and I/O operations, which are stable and functional. So we may now think about removing that functionality from daru and using daru-io and daru-view instead.

Other tasks

  • Better error handling. Refer to #479.
  • Follow-up of GSoC'17: remove obsolete parts from main gem #405

Related links

More about daru

Skills: Experience in data analysis | Experience in Ruby and C | General understanding of how compilers work | Understanding of good benchmarking practices

Difficulty: Advanced

Mentor: @v0dro, Shekhar (@Shekharrajak)


Usually C-extensions are written for speed. D is a language with C-like syntax that compiles to similar runtime speeds. D is safer than C and provides high-level OOP and FP support to boot. Here we propose to bind Ruby against D extensions. For this we will take some existing D projects, such as MIR, a high performance math library, and make good use of them in Ruby. For Python automatic wrappers exist and we may need to replicate those for Ruby.

The student will bind a number of pre-agreed functionalities, optimize them, document them and provide a path for similar exercises that can be done by others. Software deployment of mixed languages often proves difficult. For software deployment we will use Bioconda and/or GNU Guix to make sure others can use the setup.

Skills: Interest in multi-language programming, high performance computing, C, D etc.

Difficulty: Advanced (indeed)

Mentor: @pjotrp, @george-githinji, members of @biod


The backend for the Journal of Open Source Software (JOSS) is written in Ruby and makes full use of the GitHub API. The full workflow is based on the GitHub issue tracker. In this project we want to refactor the source code and make it flexible so that it can target multiple backends and be built on a fully free software stack. Ruby is ideal for web programming, and this work is embedded in development happening for JOSS. With this publication-oriented software we also target other journals, such as BiohackrXiv. For the existing code base, see the Whedon and whedon-api repositories at https://github.com/openjournals/
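
One possible shape for "targeting multiple backends" is to have the review workflow talk to a small tracker interface rather than to the GitHub API directly. The sketch below is purely illustrative: `InMemoryTracker`, `ReviewWorkflow` and their methods are hypothetical names and do not come from Whedon or whedon-api.

```ruby
# Hypothetical sketch: none of these names come from Whedon or whedon-api.
class InMemoryTracker
  def initialize
    @issues = []
  end

  # Opens a review thread and returns its id.
  def open_review(title)
    @issues << { title: title, comments: [] }
    @issues.length - 1
  end

  def comment(id, body)
    @issues.fetch(id)[:comments] << body
  end

  def comments(id)
    @issues.fetch(id)[:comments].dup
  end
end

# The workflow depends only on the three tracker methods above, so a
# GitHub-backed or other tracker class could be swapped in unchanged.
class ReviewWorkflow
  def initialize(tracker)
    @tracker = tracker
  end

  def start(paper_title)
    id = @tracker.open_review("Review: #{paper_title}")
    @tracker.comment(id, 'Review started.')
    id
  end
end

tracker = InMemoryTracker.new
review_id = ReviewWorkflow.new(tracker).start('Example paper')
```

An in-memory tracker like this one is also convenient for testing the workflow without network access.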

Skills: Interest in Ruby, web programming, Github API and scientific publishing

Difficulty: Moderate

Mentor: @pjotrp, @ktym, @arfon, members of @openjournals


A project started by the SymPy organisation, SymEngine is a standalone fast C++ symbolic manipulation library.

It solves mathematical problems the same way a human does, but way more quickly and precisely. The motivation for SymEngine is to develop the Computer Algebra System once in C++ and then use it from other languages rather than doing the same thing all over again for each language that it is required in.

The project for Ruby bindings has already been set up at symengine.rb. A few things that the project involves are:

  • Extending the C interface of SymEngine library.
  • Wrapping up the C interface for Ruby using Ruby C API, including error handling.
  • Designing the Ruby interface.
  • Integrating IRuby with symengine gem for better printing and writing IRuby notebooks.
  • Integrating the gem with existing gems like gmp, mpfr and mpc.
  • Making the installation of symengine gem easier.

You can find the same idea in the SymPy idea list here.

Important links:

  • GSoC 2016 report
  • GSoC 2015 work

Recommended skills: You should be comfortable with C/C++ and familiar with Ruby. Refer to the wiki to get started.

Mentors: Co-mentor @Shekharrajak


Shogun is an open-source machine learning library that offers a wide range of efficient and unified machine learning methods. It is written in C++ and provides a Ruby wrapper as well.

We plan to make it compatible with SciRuby's data science gems, such as daru, daru-io, daru-view, nmatrix, rubyplot, distribution, statsample and any others that are useful for data science projects.

Discussion is ongoing here: #4814.

The SciRuby and Shogun teams will collaborate to make it happen.

Potential mentor: Co-mentor: @shekharrajak


If you have a completely different idea in mind, first start a discussion thread about it on the mailing list. The SciRuby team will look into it, and the idea may be improved during the discussion so that it can be selected for the GSoC period.

The best project for you is one you are interested in and are knowledgeable about. That way, you will be the most successful and productive in your project and have the most fun doing it, while we will be the most confident in your commitment and your ability to complete it.

Please use the idea template below to describe your idea:

Title

Idea

(project idea, how it will help Ruby community and future of the project)

Current status of the idea

(Describe the work that has been done and timeline)

Involved Software and technology

Difficulty

(Advanced, Intermediate, or Beginner and any specific comments on the difficulty)

Skills and Knowledge required

(Any prerequisite knowledge or approach needed)
