Separate class docstrings and __init__ docstrings in API #2282

Open
opened this issue Jun 24, 2019 · 1 comment

lilyminium commented Jun 24, 2019

 There are a few open issues on documentation, but it seems to mostly focus on tutorials rather than API reference structure -- do let me know if I've missed somewhere this would be better located. Is your feature request related to a problem? Please describe. There's a lot of documentation in the code itself that isn't necessarily where it belongs, mostly in Class.__init__ vs Class docstrings. You've noted before that you have different kinds of users, who can be mostly grouped into MD-applications people and developer people. Applications people will be looking for examples in the tutorials, while developers will be looking at the API to understand how the code itself works. Having examples and notes in __init__(), therefore, is unexpected for both and clutters up the code. On the flip side, the class docstring is not where I would first go to look for parameters to pass to __init__. Describe the solution you'd like Move long Examples etc. to tutorial .rst pages. Keep method docstrings limited to parameters that the methods take, and what the method returns. Keep class docstrings limited to description, short notes/examples, references, and attributes. Have self-documenting code with clear names and explicit arguments instead of kwargs (although I see that PR #1190 is already addressing that). Splitting up big methods into multiple smaller ones would also help interpretability. For example, Universe.__init__ could be turned into several defined class methods: Universe.from_files, Universe.from_streams, etc. Additional context Here's an example in the RMSF.__init__ docstring: r"""Parameters ---------- atomgroup : AtomGroup Atoms for which RMSF is calculated start : int (optional) starting frame, default None becomes 0. stop : int (optional) Frame index to stop analysis. Default: None becomes n_frames. Iteration stops *before* this frame number, which means that the trajectory would be read until the end. step : int (optional) step between frames, default None becomes 1. verbose : bool (optional) Show detailed progress of the calculation if set to True; the default is False. Raises ------ ValueError raised if negative values are calculated, which indicates that a numerical overflow or underflow occured Notes ----- The root mean square fluctuation of an atom :math:i is computed as the time average .. math:: \rho_i = \sqrt{\left\langle (\mathbf{x}_i - \langle\mathbf{x}_i\rangle)^2 \right\rangle} No mass weighting is performed. This method implements an algorithm for computing sums of squares while avoiding overflows and underflows [Welford1962]_. Examples -------- In this example we calculate the residue RMSF fluctuations by analyzing the :math:\text{C}_\alpha atoms. First we need to fit the trajectory to the average structure as a reference. That requires calculating the average structure first. Because we need to analyze and manipulate the same trajectory multiple times, we are going to load it into memory using the :mod:~MDAnalysis.coordinates.MemoryReader. (If your trajectory does not fit into memory, you will need to :ref:write out intermediate trajectories  to disk or :ref:generate an in-memory universe  that only contains, say, the protein):: import MDAnalysis as mda from MDAnalysis.analysis import align from MDAnalysis.tests.datafiles import TPR, XTC u = mda.Universe(TPR, XTC, in_memory=True) protein = u.select_atoms("protein") # 1) need a step to center and make whole: this trajectory # contains the protein being split across periodic boundaries # # TODO # 2) fit to the initial frame to get a better average structure # (the trajectory is changed in memory) prealigner = align.AlignTraj(u, select="protein and name CA", in_memory=True).run() # 3) reference = average structure reference_coordinates = u.trajectory.timeseries(asel=protein).mean(axis=1) # make a reference structure (need to reshape into a 1-frame "trajectory") reference = mda.Merge(protein).load_new( reference_coordinates[:, None, :], order="afc") We created a new universe reference that contains a single frame with the averaged coordinates of the protein. Now we need to fit the whole trajectory to the reference by minimizing the RMSD. We use :class:MDAnalysis.analysis.align.AlignTraj:: aligner = align.AlignTraj(u, reference, select="protein and name CA", in_memory=True).run() The trajectory is now fitted to the reference (the RMSD is stored as aligner.rmsd for further inspection). Now we can calculate the RMSF:: from MDAnalysis.analysis.rms import RMSF calphas = protein.select_atoms("name CA") rmsfer = RMSF(calphas, verbose=True).run() and plot:: import matplotlib.pyplot as plt plt.plot(calphas.resnums, rmsfer.rmsf) References ---------- .. [Welford1962] B. P. Welford (1962). "Note on a Method for Calculating Corrected Sums of Squares and Products." Technometrics 4(3):419-420. .. versionadded:: 0.11.0 .. versionchanged:: 0.16.0 refactored to fit with AnalysisBase API .. deprecated:: 0.16.0 the keyword argument quiet is deprecated in favor of verbose. .. versionchanged:: 0.17.0 removed unused keyword weights """  The method itself is two lines. In my opinion, everything after the Raises section (except the version notes) belongs in the class docstring, or even better, in a separate tutorial page somewhere. Plotting with matplotlib doesn't have anything to do with RMSF. Actually figuring out how to initialise the class (because the arguments are hidden away in kwargs) becomes quite a chore. The Universe docstring is almost the exact opposite format (and these two systems are inconsistent). The __init__ method has no docstring. This is the class documentation: """The MDAnalysis Universe contains all the information describing the system. The system always requires a *topology* file --- in the simplest case just a list of atoms. This can be a CHARMM/NAMD PSF file or a simple coordinate file with atom informations such as XYZ, PDB, Gromacs GRO, or CHARMM CRD. See :ref:Supported topology formats for what kind of topologies can be read. A trajectory provides coordinates; the coordinates have to be ordered in the same way as the list of atoms in the topology. A trajectory can be a single frame such as a PDB, CRD, or GRO file, or it can be a MD trajectory (in CHARMM/NAMD/LAMMPS DCD, Gromacs XTC/TRR, or generic XYZ format). See :ref:Supported coordinate formats for what can be read as a "trajectory". As a special case, when the topology is a file that contains atom information *and* coordinates (such as XYZ, PDB, GRO or CRD, see :ref:Supported coordinate formats) then the coordinates are immediately loaded from the "topology" file unless a trajectory is supplied. Examples for setting up a universe:: u = Universe(topology, trajectory) # read system from file(s) u = Universe(pdbfile) # read atoms and coordinates from PDB or GRO u = Universe(topology, [traj1, traj2, ...]) # read from a list of trajectories u = Universe(topology, traj1, traj2, ...) # read from multiple trajectories Load new data into a universe (replaces old trajectory and does *not* append):: u.load_new(trajectory) # read from a new trajectory file Select atoms, with syntax similar to CHARMM (see :class:~Universe.select_atoms for details):: u.select_atoms(...) Parameters ---------- topology : str, Topology object or stream A CHARMM/XPLOR PSF topology file, PDB file or Gromacs GRO file; used to define the list of atoms. If the file includes bond information, partial charges, atom masses, ... then these data will be available to MDAnalysis. A "structure" file (PSF, PDB or GRO, in the sense of a topology) is always required. Alternatively, an existing :class:MDAnalysis.core.topology.Topology instance may also be given. topology_format Provide the file format of the topology file; None guesses it from the file extension [None] Can also pass a subclass of :class:MDAnalysis.topology.base.TopologyReaderBase to define a custom reader to be used on the topology file. format Provide the file format of the coordinate or trajectory file; None guesses it from the file extension. Note that this keyword has no effect if a list of file names is supplied because the "chained" reader has to guess the file format for each individual list member. [None] Can also pass a subclass of :class:MDAnalysis.coordinates.base.ProtoReader to define a custom reader to be used on the trajectory file. all_coordinates : bool If set to True specifies that if more than one filename is passed they are all to be used, if possible, as coordinate files (employing a :class:MDAnalysis.coordinates.chain.ChainReader). [False] The default behavior is to take the first file as a topology and the remaining as coordinates. The first argument will always always be used to infer a topology regardless of *all_coordinates*. This parameter is ignored if only one argument is passed. guess_bonds : bool, optional Once Universe has been loaded, attempt to guess the connectivity between atoms. This will populate the .bonds .angles and .dihedrals attributes of the Universe. vdwradii : dict, optional For use with *guess_bonds*. Supply a dict giving a vdwradii for each atom type which are used in guessing bonds. is_anchor : bool, optional When unpickling instances of :class:MDAnalysis.core.groups.AtomGroup existing Universes are searched for one where to anchor those atoms. Set to False to prevent this Universe from being considered. [True] anchor_name : str, optional Setting to other than None will cause :class:MDAnalysis.core.groups.AtomGroup instances pickled from the Universe to only unpickle if a compatible Universe with matching *anchor_name* is found. Even if *anchor_name* is set *is_anchor* will still be honored when unpickling. transformations: function or list, optional Provide a list of transformations that you wish to apply to the trajectory upon reading. Transformations can be found in :mod:MDAnalysis.transformations, or can be user-created. in_memory After reading in the trajectory, transfer it to an in-memory representations, which allow for manipulation of coordinates. in_memory_step Only read every nth frame into in-memory representation. continuous : bool, optional The continuous option is used by the :mod:ChainReader, which contains the functionality to treat independent trajectory files as a single virtual trajectory. Attributes ---------- trajectory currently loaded trajectory reader; dimensions current system dimensions (simulation unit cell, if set in the trajectory) atoms, residues, segments master Groups for each topology level bonds, angles, dihedrals master ConnectivityGroups for each connectivity type """  The Parameters section is much clearer with the __init__ method. Another issue here is that the parameters specified don't line up with the examples; they go straight from topology to topology_format, but the examples show that you can pass in coordinate files too:  u = Universe(topology, trajectory) # read system from file(s) u = Universe(pdbfile) # read atoms and coordinates from PDB or GRO u = Universe(topology, [traj1, traj2, ...]) # read from a list of trajectories u = Universe(topology, traj1, traj2, ...) # read from multiple trajectories 
Member

richardjgowers commented Jun 24, 2019

 These are very valid criticisms, and good suggestions. If you want to open a few PRs for these changes these would be very welcome! I would recommend however: branch from develop not master, master is the state of the last release, whereas develop is the current cutting edge (which becomes master when I make releases). try and only include one major change per pull request, this will make reviewing easier & faster. I would rather review 5x 200 line changes than 1x 1,000 line change :p

Open