
GSoC 2022 Project Ideas

Micaela Matta edited this page Feb 15, 2022 · 39 revisions
Google Summer of Code 2022

The MDAnalysis organisation has NOT YET been accepted for GSoC 2022.

Please see our Google Summer of Code wiki page for some general information.

To prospective applicants: if you are interested in taking part, please do get in touch on the developer list. Given this year's changes to the GSoC program structure (medium and long projects), letting us know of your intention to apply and getting acquainted with the project early will be very helpful.

To prospective mentors: MDAnalysis welcomes new mentors; please do get in touch on the developer list if you are interested in taking part. We typically expect mentors to be familiar with our development process, as evidenced by contributions to the code base and interactions on the developer mailing list.

Overview

A list of project ideas for Google Summer of Code 2022.

The currently proposed projects are:

  1. Molecular volume and surface analysis
  2. Generalise groups
  3. Adding type hint support to the MDAnalysis library
  4. Extend MDAnalysis interoperability
  5. Benchmarking and performance optimization
  6. Context-aware guessers

Or work on your own idea! Get in contact with us to propose an idea and we will work with you to flesh it out into a full project. Raise an issue in the Issue Tracker or contact us via the developer Google group.

You can find the list of all available mentors for MDAnalysis here.


Project summary

See below for long descriptions. The difficulty is a somewhat subjective ranking, where "easy" means that we know pretty much what needs to be done, "medium" requires some additional research into best solutions as part of the project, and "challenging" is high risk/high reward where we think a solution exists but we will have to work with the student to find it and implement it.

| project | name | difficulty | description | skills | mentors |
|---|---|---|---|---|---|
| 1 | Generalise Groups | medium | Generalise concept of groups | Python, NetworkX, Molecular modelling | @lilyminium, @fiona-naughton, @richardjgowers, @IAlibay, @micaela-matta |
| 2 | Type hinting | medium | Add type hints to the MDAnalysis library | Python | @IAlibay, @jbarnoud |
| 3 | Extend MDAnalysis Interoperability | medium | Extend converters module to other relevant packages | Python | @lilyminium, @IAlibay, @fiona-naughton, @hmacdope |
| 4 | Benchmarking and performance optimization | medium | Write benchmarks for automated performance analysis and address performance bottlenecks | Python | @hmacdope, @orbeckst, @jbarnoud |
| 5 | Context-aware guessers | medium | Extend how the library guesses properties such as bonds, masses or atom symbols; and write guessers that know about the context of the system (database of origin, force field...) | Python, Molecular modelling | @jbarnoud |

Project descriptions

Project 1: Generalise Groups

It is common to want to consider a group of atoms as a single site/particle, for example defining the position of a water molecule (or a larger solvent) as its center of mass. It then follows that it is useful to consider many such groupings as an array of quasi-particles, leading to something like an AtomGroup-Group.
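To make the idea concrete, here is a minimal sketch in plain Python of collapsing groups of atoms into one site each by taking the center of mass. The function names (`center_of_mass`, `bead_positions`), the coordinates, and the rounded masses are illustrative assumptions, not part of the MDAnalysis API or of any real water model.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, float, float]


def center_of_mass(positions: Sequence[Vector], masses: Sequence[float]) -> Vector:
    """Mass-weighted average position of one group of atoms."""
    total = sum(masses)
    if total == 0:
        raise ValueError("total mass of the group must be non-zero")
    x = sum(m * p[0] for p, m in zip(positions, masses)) / total
    y = sum(m * p[1] for p, m in zip(positions, masses)) / total
    z = sum(m * p[2] for p, m in zip(positions, masses)) / total
    return (x, y, z)


def bead_positions(
    positions: Sequence[Vector],
    masses: Sequence[float],
    groups: Sequence[Sequence[int]],
) -> List[Vector]:
    """One site per group; each group is a list of atom indices."""
    return [
        center_of_mass([positions[i] for i in group], [masses[i] for i in group])
        for group in groups
    ]


# Two three-atom "molecules" with made-up coordinates and rounded masses.
positions = [
    (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
    (5.0, 5.0, 5.0), (6.0, 5.0, 5.0), (5.0, 6.0, 5.0),
]
masses = [16.0, 1.0, 1.0, 16.0, 1.0, 1.0]
groups = [[0, 1, 2], [3, 4, 5]]

beads = bead_positions(positions, masses, groups)
```

The result is an array-like collection with one position per group, which is the kind of object an AtomGroup-Group would expose.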

The goal of this project is to generalise the concept of groups of Atoms to define AtomGroupGroups, specifically by implementing two new classes: RingGroup and BeadGroup.

  • BeadGroup: groups of atoms that can be represented as a single site/particle. This could be used for analysis purposes, as well as to define coarse-grained beads.

  • RingGroup: aromatic rings (e.g. benzene, nucleobases etc.) can be defined by their position (the geometric center of the ring) and their normal vector (the direction they are facing). This class could be implemented as a special case of AtomGroupGroup which also defines a directionality.
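As a rough illustration of the two quantities a RingGroup would carry, the sketch below computes a ring's geometric center and a unit normal from its atom positions, given in order around the ring. The normal uses Newell's method, which is one possible choice (it tolerates slightly non-planar rings) and not necessarily what the project would adopt. All function names here are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vector = Tuple[float, float, float]


def ring_center(vertices: Sequence[Vector]) -> Vector:
    """Geometric (unweighted) center of the ring atoms."""
    n = len(vertices)
    return (
        sum(v[0] for v in vertices) / n,
        sum(v[1] for v in vertices) / n,
        sum(v[2] for v in vertices) / n,
    )


def ring_normal(vertices: Sequence[Vector]) -> Vector:
    """Unit normal of a ring via Newell's method (atoms listed in ring order)."""
    nx = ny = nz = 0.0
    for i, (x1, y1, z1) in enumerate(vertices):
        x2, y2, z2 = vertices[(i + 1) % len(vertices)]
        nx += (y1 - y2) * (z1 + z2)
        ny += (z1 - z2) * (x1 + x2)
        nz += (x1 - x2) * (y1 + y2)
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    if length == 0.0:
        raise ValueError("degenerate ring: cannot define a normal")
    return (nx / length, ny / length, nz / length)


def ring_angle(normal_a: Vector, normal_b: Vector) -> float:
    """Angle in degrees between two ring planes (sign of the normals ignored)."""
    dot = abs(sum(a * b for a, b in zip(normal_a, normal_b)))
    return math.degrees(math.acos(min(1.0, dot)))


# Idealised hexagons: one lying in the xy-plane, one standing in the xz-plane.
flat = [(math.cos(k * math.pi / 3), math.sin(k * math.pi / 3), 0.0) for k in range(6)]
upright = [(math.cos(k * math.pi / 3), 0.0, math.sin(k * math.pi / 3)) for k in range(6)]

flat_normal = ring_normal(flat)
angle = ring_angle(flat_normal, ring_normal(upright))
```

Because the sign of a ring normal depends on the order in which the atoms are listed, the angle helper takes the absolute value of the dot product and so reports angles between 0 and 90 degrees.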

Objectives

  • Design and implement an AtomGroupGroup class to represent these new groups
  • Generalise existing methods (e.g. center_of_mass) to AtomGroupGroups
  • Implement BeadGroup as an AtomGroupGroup
  • Implement RingGroup, as a special case of AtomGroupGroups
  • Implement ring finding functions to quickly define these groups
  • Basic RingGroup-based analysis, e.g. angle between rings, pi-stacking identification.
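Ring finding can be framed as cycle detection in the bond graph. The sketch below is a dependency-free illustration: it builds a breadth-first spanning tree and closes one cycle for every bond that is not in the tree. This yields a fundamental cycle basis, which for fused ring systems is not guaranteed to be the set of smallest rings; a real implementation would likely build on a graph library (NetworkX offers `cycle_basis`, for instance). The function names are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple


def _tree_path(a: int, b: int, parent: Dict[int, Optional[int]], depth: Dict[int, int]) -> List[int]:
    """Path from a to b through the spanning tree, via their common ancestor."""
    left, right = [a], [b]
    while a != b:
        if depth[a] >= depth[b]:
            a = parent[a]
            left.append(a)
        else:
            b = parent[b]
            right.append(b)
    # Both lists end at the common ancestor; keep it only once.
    return left + right[-2::-1]


def find_rings(bonds: Iterable[Tuple[int, int]]) -> List[List[int]]:
    """Return one list of atom indices per independent cycle in the bond graph."""
    edges = {(min(a, b), max(a, b)) for a, b in bonds if a != b}
    adjacency: Dict[int, Set[int]] = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    parent: Dict[int, Optional[int]] = {}
    depth: Dict[int, int] = {}
    for root in sorted(adjacency):  # one spanning tree per molecule/fragment
        if root in parent:
            continue
        parent[root], depth[root] = None, 0
        queue = deque([root])
        while queue:
            node = queue.popleft()
            for neighbour in sorted(adjacency[node]):
                if neighbour not in parent:
                    parent[neighbour] = node
                    depth[neighbour] = depth[node] + 1
                    queue.append(neighbour)

    rings = []
    for a, b in sorted(edges):
        if parent[a] == b or parent[b] == a:
            continue  # tree edge, closes no cycle
        rings.append(_tree_path(a, b, parent, depth))
    return rings


# Heavy atoms of a toluene-like molecule: a six-membered ring (0-5) plus atom 6.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6)]
rings = find_rings(bonds)
```

Each returned ring lists its atoms in order around the cycle, so it can be passed directly to geometry routines that need ordered vertices.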

Relevant skills

  • Python
  • Graph theory (e.g. the NetworkX package)
  • Chemistry

Related issues:

Mentors

  • @richardjgowers
  • @lilyminium
  • @fiona-naughton
  • @IAlibay
  • @micaela-matta

Project 2: Type hinting

While Python is a dynamically typed language, it allows annotating the types of variables and function signatures. Such annotations serve as documentation and help developers using IDEs by enabling better completion and error detection. Most importantly, they allow static code analysis to detect possible errors before runtime.
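For readers new to annotations, here is a small self-contained example of what annotated code looks like. The functions are hypothetical and not taken from MDAnalysis; the commented-out calls show the kind of mistake a static checker such as mypy is designed to report without running the code.

```python
from typing import Optional, Sequence


def total_mass(masses: Sequence[float], scale: float = 1.0) -> float:
    """Sum of the masses, optionally rescaled."""
    return scale * sum(masses)


def find_index(names: Sequence[str], target: str) -> Optional[int]:
    """Index of the first atom called `target`, or None if absent."""
    for i, name in enumerate(names):
        if name == target:
            return i
    return None


names = ["O", "H1", "H2"]
index = find_index(names, "H1")

# Calls a type checker would flag before runtime:
# find_index(names, 3)         # 3 is an int, but `target` is annotated as str
# find_index(names, "H1") + 1  # the result may be None, so `+ 1` is unsafe
```

The annotations do not change runtime behaviour; they are metadata that tools and readers can use.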

With this project, we aim to annotate as much of the library as possible. This will let MDAnalysis benefit from these annotations, but also let downstream projects use annotations when using MDAnalysis.

Objectives

  • Set up type analysis in the continuous integration pipeline
  • Design a best-in-class annotation scheme that is informative, easy to use, and catches the most errors
  • Annotate as much of the code as possible
  • Document the type system for MDAnalysis contributors and for downstream users

Relevant skills

  • Python

Mentors

  • @IAlibay
  • @jbarnoud

Project 3: Extend interoperability

MDAnalysis has been pushing towards interoperability objectives. In pursuit of this aim, we have already added converters to the ParmEd and RDKit libraries. We aim to continue in this direction by focusing on other relevant packages such as MDTraj, pytraj, OpenBabel, and Psi4.

Objectives

  • Create converter classes that convert between MDAnalysis and your chosen package, in both directions

Relevant skills

  • Python
  • Any other language relevant to your chosen package (likely C++)

Mentors

  • @IAlibay
  • @lilyminium
  • @fiona-naughton

Project 4: Benchmarking and performance optimization

The performance of the MDAnalysis library is assessed by automated benchmarks with ASV. The benchmarks are publicly available and are updated every night.

The goal of this project is to increase the performance assessment coverage and identify code that should be improved.

Objectives

  1. Write benchmark cases.
  2. Analyze the performance history to identify code that needs to be improved.
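For orientation, a benchmark case in ASV is typically a class whose `setup` method prepares data and whose `time_*` methods contain the code to be timed, optionally parametrised through `params` and `param_names`. The sketch below follows that convention with a hypothetical pure-Python workload; it is not taken from the MDAnalysis benchmark suite, and the class and function names are made up for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Tuple[float, float, float]


def pairwise_distances(points: Sequence[Vector]) -> List[float]:
    """All unique pair distances (a stand-in for the code under test)."""
    return [
        math.dist(points[i], points[j])
        for i in range(len(points))
        for j in range(i + 1, len(points))
    ]


class PairwiseDistanceBench:
    """ASV-style benchmark: timed once per value in `params`."""

    params = [10, 100]
    param_names = ["n_points"]

    def setup(self, n_points: int) -> None:
        # Runs before timing, so data generation is excluded from the measurement.
        rng = random.Random(0)  # fixed seed keeps runs comparable
        self.points = [
            (rng.random(), rng.random(), rng.random()) for _ in range(n_points)
        ]

    def time_pairwise_distances(self, n_points: int) -> None:
        pairwise_distances(self.points)


# ASV discovers and runs benchmarks itself; this manual call is only a smoke test.
bench = PairwiseDistanceBench()
bench.setup(10)
bench.time_pairwise_distances(10)
```

Keeping data preparation in `setup` and fixing the random seed helps make timings comparable across commits, which matters when analysing the performance history.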

Relevant skills

  • Python

Mentors

  • @orbeckst
  • @hmacdope
  • @jbarnoud

Project 5: Context-aware guessers

Most topology file formats do not contain all the information known about the system, because some of it is implicit. However, the assumptions made by the file are only valid in a given context. Traditionally, MDAnalysis guesses atomic bonding, atom elements, and masses. Such guesses assume the system contains atoms simulated with their natural mass and named according to some conventions. This breaks for coarse-grained systems but also with some atomistic models.

This project aims to write guessers that are aware of the context of the system. This will require adapting the universe creation so that a user can provide a context, and writing a series of guessers for various contexts such as the PDB or the Martini force field.
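The sketch below illustrates why context matters, using element guessing from atom names. Everything in it is hypothetical (the context names, the registry, and the function names are not MDAnalysis API). The "pdb" guesser assumes the fixed-width four-character atom-name field of the PDB format, where the element symbol is right-justified in the first two columns, and deliberately ignores exceptions such as four-character hydrogen names. The "martini" guesser assumes bead names do not encode elements at all.

```python
from typing import Callable, Dict, Optional


def guess_element_default(name: str) -> Optional[str]:
    """Naive guess: the first letter of the atom name."""
    letters = [c for c in name if c.isalpha()]
    return letters[0].upper() if letters else None


def guess_element_pdb(name: str) -> Optional[str]:
    """Strict guess from a 4-character PDB atom-name field (simplified)."""
    if len(name) != 4:
        raise ValueError("expected the raw 4-character atom-name field")
    symbol = name[:2].strip()
    return symbol.capitalize() if symbol.isalpha() else None


def guess_element_martini(name: str) -> Optional[str]:
    """Coarse-grained beads are not atoms, so no element is returned."""
    return None


# Hypothetical registry mapping a user-provided context to a guesser.
ELEMENT_GUESSERS: Dict[str, Callable[[str], Optional[str]]] = {
    "default": guess_element_default,
    "pdb": guess_element_pdb,
    "martini": guess_element_martini,
}


def guess_element(name: str, context: str = "default") -> Optional[str]:
    try:
        guesser = ELEMENT_GUESSERS[context]
    except KeyError:
        raise ValueError(f"unknown guessing context: {context!r}") from None
    return guesser(name)


alpha_carbon = guess_element(" CA ", context="pdb")    # alpha carbon
calcium = guess_element("CA  ", context="pdb")         # calcium ion
naive_bead = guess_element("BB", context="default")    # wrongly reads as boron
martini_bead = guess_element("BB", context="martini")  # no element for a bead
```

The same name, "CA", resolves to carbon or calcium depending on its column alignment in the PDB context, while a backbone bead named "BB" would be misread as boron by a guesser that is unaware it is looking at a coarse-grained system.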

Objectives

  1. Design a way to provide a guessing context
  2. Adapt the Universe creation to account for user-provided context
  3. Write, test, and document a strict PDB guesser
  4. Write, test, and document a Martini guesser

Relevant skills

  • Python
  • Molecular modelling to understand the properties being guessed

Mentors

  • @jbarnoud