Coding style conventions

Nick Hale edited this page Sep 10, 2015 · 5 revisions

Audience

This document is mainly intended for Chebfun developers. The aim is to provide guidelines for coding style in Chebfun. While it is true that there are concrete programming principles which are regarded as useful by a vast majority of experienced programmers, it is also true that a great number of programming “principles” are a matter of taste. We hope that, with the passage of time, these guidelines will evolve into a list of principles derived from reasons rather than tastes, and that a consensus on these principles will be reached, at least among the current Chebfun team.

A good Chebfun developer has to be a good Matlab programmer. Therefore, while some guiding principles may be peculiar to Chebfun, others apply generally to Matlab.

Disclaimer: No claim of originality is being made in this document. Most principles and maxims are adopted from various books and on-line resources. A substantial but incomplete list of references is provided at the end.

Terminology: The terms matrix and array are used interchangeably in this document.

Formatting

The purpose of formatting is to assist readability of the code.

Indentation

Each level of code will be indented by 4 spaces. In some editors, a single tab character should do this, but make sure that your editor converts a single tab character to 4 spaces. A typical Matlab editor provides the option of converting a tab to a specified number of spaces, which by default is set to 4 spaces. This is to make sure that the code is displayed uniformly in various text editors which display the tab character differently.

The 80 Column Rule

Lines with more than 80 characters should be avoided. This improves readability and portability of the code.
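
As a minimal sketch of how a long statement can be kept within 80 columns, the ellipsis (...) continues a statement on the next line. The variable names and values below are hypothetical and serve only as an illustration.

```matlab
% Hypothetical values, used only to illustrate breaking a long line.
nRows = 3;
nCols = 4;
scaleFactor = 2;

% The ellipsis (...) continues the statement on the next line, so a long
% expression can be split and still stay within 80 columns.
totalEntries = scaleFactor*nRows*nCols + scaleFactor*nRows + ...
    scaleFactor*nCols;
```

The continuation line is indented by one extra level so that it is visually distinct from the start of a new statement.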

White Space and Punctuation

  • Logical operators and the equality sign: The = sign should have a single space before and after it. Similarly, a single space should surround binary logical and relational operators such as &, &&, ==, ~= etc.

  • Comma & semicolon: For putting a space after a comma, the following conventions are adopted. If a comma is being used to separate arguments of a method, then it should be followed by a space. If, on the other hand, a comma is being used to separate indices of a matrix, no space should follow. For example,

    A(iRow,jCol)
    

    should be used when A is a matrix and

    fun(m, n)
    

    should be used for a method fun.

    Another instance where commas and semicolons are frequently used is the assignment or initialization of vectors and matrices. Elements of vectors, matrices or cell arrays will always be separated by commas (or semicolons) and the commas (semicolons) will be followed by a single space. For example:

    rowV = [ 1, 2, 3 ];
    colV = [ a; b; c ];
    matA = [ colV, rowV' ];
    matB = [ rowV; colV' ];
    

    Commas and semicolons will never be preceded by a space.

    The use of white space padding at the start and end of the vector is left to the discretion of the author. For example, both of the following are valid:

    rowVa = [ 1, 2, 3 ];
    rowVb = [1, 2, 3];
    
  • Terminating statements: Semicolons are frequently used to terminate a statement. However, if a statement is silent, giving no output on the command line, then it should not be terminated by a semicolon. For example instead of

    plot(x, y); 
    hold on;
    shg;
    

    one should write

    plot(x, y)
    hold on
    shg
    
  • Brackets: For spacing around brackets, we adopt the convention of putting no space before or after a bracket if the brackets are being used to enclose the indices of a matrix or the arguments of a method. If, on the other hand, brackets are being used to group together the various tests of a conditional, the contents of the brackets should be padded with single spaces. For example,

    A{iRow,jCol} + B(iRow,jCol)
    

    should be used for arrays A and B. Similarly,

    fun(m, n)
    

    should be used for a method fun. However, for a conditional statement, we write:

    if ( a == b || a == c )
    

    Similarly, for matrix, vector or cell array assignment/initialization, the opening bracket will be followed by a single space while the closing bracket will be preceded by a single space. Here is an example:

    A = { A1; A2; A3 };
    
  • Spacing for binary operators: The binary operators + and - should be surrounded by single spaces. For example, an expression may look like

    c = a + b - c - d + e
    

    Spaces around operators *, .*, /, ./, \, and so on, are optional and are left for the programmer to decide. For example:

    c = a*b + 2./x + nRows - nCols*nRows./(y + 1);
    A = [ 1, 3, 4 ] * c/2;
    

Trailing Spaces

There should be no blank lines at the end of a method. [TODO] More to be added by Anthony regarding trailing spaces at the end of a line.

One Statement per Line

Only one statement should be written on a single line.
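
A small illustration of this rule, with hypothetical variables; the avoided form is shown in a comment so that the snippet itself follows the guideline.

```matlab
% Avoided: several statements on one line.
% a = 1; b = 2; c = a + b;

% Preferred: one statement per line.
a = 1;
b = 2;
c = a + b;
```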

Formatting Control Structures

These mainly involve branching and looping. Statements like if, switch, for, while, etc., all fall into this category. Any control structure should be written with proper indentation, placing a single statement on each line. This in particular means that one-line if statements or one-line loops should be avoided. It is true that there are situations when this rule will seem too verbose, but to keep clarity and uniformity in the code we have agreed to adopt this principle. Here is an example:

for counter = 1:5    
    if ( a == b )
        a = 2*b;
    else
        a = 3*b;
    end
end

It is also suggested that the conditions tested by if and while are always bracketed, even when there is only a single condition, as in the example above.

If there are multiple conditionals within an if statement, then conditionals involving a binary relational operator such as == or >= will be enclosed in an extra set of parentheses. Here is an example. We write

if ( (a == b) || (a > c) && all(x) ) 

instead of

if a == b || a > c && all(x)

Similarly, we write

if ( (nargin < 3) || isempty(x) )

instead of

if ( nargin < 3 || isempty(x) )

This again adds clarity and avoids possible bugs in some situations.
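
One kind of bug that explicit grouping helps to avoid comes from operator precedence: in Matlab, && binds more tightly than ||. The following sketch, with hypothetical values, shows two conditions that look similar but evaluate differently.

```matlab
% Hypothetical values chosen so that (a == b) is TRUE and (a > c) is FALSE.
a = 1;
b = 1;
c = 2;

% Without outer grouping, && is applied first, so this is read as
% (a == b) || ( (a > c) && false ).
isFirst = (a == b) || (a > c) && false;

% Explicit grouping applies || first and changes the result.
isSecond = ((a == b) || (a > c)) && false;
```

Here isFirst is TRUE while isSecond is FALSE, which is why making the intended grouping visible is worthwhile.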

Logical Assignment Format

While assigning the logical outcome of multiple conditionals to a variable, we use the format

isHappy = (a == b) && all(c)

instead of

isHappy = a == b && all(c)

Making Blocks of Code

The %% sign breaks a Matlab file into blocks which are called cells. When the cursor is moved within a single cell in the Matlab editor, the cell is highlighted, creating a visual aid to help concentrate on a particular part of the code. Chebfun programmers are encouraged to divide their code into cells (typically less than 20 lines). This improves modularization and readability within a single file, not only at the visual level but also at the logical level.

File Layouts

Methods

Here is a crude template for a general method file in Chebfun Version 5.

function y = foo(x)
%FOO   FOO computes the foo of a chebfun X and returns the result
%   in chebfun Y. Notice the spacing. There are three spaces after 
%   the function name in the first line and three spaces before 
%   the start of each line in the help text. A blank line is then
%   inserted before the "See also" section. The "See also" section
%   has only one space after the % symbol. Same is true for the 
%   copyright section which follows the "See also" section.
%   
% See also FOOBAR, WHATNOT.

% Copyright 2013 by The University of Oxford and The Chebfun Developers.
% See http://www.chebfun.org/ for Chebfun information.


%%
% First block of code with comments.
y = 0;
w = x+y;

%%
% Second block of code with comments.
z = y;
w = w+z;

end

Classdef Files

Hale and Austin have written a detailed layout for classdef files. This can be accessed through our shared Dropbox folder.

Naming

Good naming conventions are very useful in making the code self-explanatory. Appropriately chosen names also act as metadata for the code and allow the reader to extract useful information about the context, data type, and the actual data associated with a variable.

Within the Chebfun team, there is a consensus on at least one principle for naming variables: descriptive but at the same time not too verbose. These apparently contradictory demands already give us a hint that choosing an appropriate name for a variable might prove to be trickier than it seems. There are mainly three types of items that we need to name:

  • Variables or objects
  • Functions or methods
  • Classes

Let us start with variable names first.

Variable Names

CamelCase Notation

This notation is not widely used by Matlab toolbox programmers, but it is growing more popular. CamelCase notation is the practice of choosing compound words or phrases as variable names, where parts of the compound word are joined without spaces and are capitalized within the compound according to a certain rule. Examples are getThisValue(), MatSize and funVal.

In Chebfun, we make use of this notation at various levels. For example, the number of Chebyshev points used by a fun is accessed by fun.n in Version 4. In Chebfun Version 5, this changes to fun.nPts.

Linear Algebra

Chebfun relies heavily on matrices, and this requires a lot of indexing variables and variables determining row and column sizes etc.

  • Matrices vs. Vectors: A matrix should be named with a capital letter and a vector with a small letter. This fits naturally with Ax = b. However, this also entails the loss of distinction between scalars and vectors.

  • Operators: Linear operators should be denoted by L explicitly or by using L as a prefix, and non-linear operators by using N. For operator arithmetic, we may always use capital letters, e.g. A, B, C, and define methods like C = plus(A, B).

  • Index Variables: In Matlab, i and j are used to represent the imaginary unit sqrt(-1). An expression of the sort i+j can accordingly be confusing (for some people at least). Also, when there is a series of loops within a single file, using i and j can be even more confusing, because if i or j appear later in the code, they might already have been defined with some unwanted values.

    We should also avoid the variable name 'idx' as a shortened version of the word 'index'. This usage appears in many places in Chebfun Version 4, but it can be confusing: people working on automatic differentiation, in particular, may read it as the ID of some `x`.

    The preferred solution is to use i, j, and k as prefixes of an indicative name. Here is an example:

    for iRow = 1:nRows
        for jCol = 1:nCols
            A(iRow,jCol) = iRow + jCol;
        end
    end
    
  • Row-Column Sizes: Variables determining the total number of rows and columns of a matrix are extensively used in Chebfun. One should use the names nRows and nCols in cases when a single matrix is involved. When multiple matrices are involved, nRows or nCols can be used as prefixes. For example

    if ( nRowsCheb == nColsDiff )
        display('Good to go!');
    else
        display('Multiplication not defined');
    end
    
  • Imaginary Numbers: The imaginary number z = sqrt(-1) will be written as z = 1i. Other examples include z = exp(2i*pi/n) etc.
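
The following sketch illustrates why the 1i form is preferred: the literals 1i and 2i keep their meaning even when a variable called i exists in the workspace. The assignment to i below is a deliberate, hypothetical violation of the indexing guideline, included only to show the effect; the value of n is also an assumption.

```matlab
n = 4;

% Hypothetical shadowing of the built-in imaginary unit, for illustration.
i = 3;

% The literals 1i and 2i still denote imaginary numbers.
z = 1i;
w = exp(2i*pi/n);   % A fourth root of unity, approximately equal to 1i.
```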

Method Names

Chebfun extends many Matlab functions designed for discrete vectors to their continuous analogues. For example, the Matlab function (method) min gives the minimum of a vector, while Chebfun's min command gives the minimum of a function. Since method names in Matlab always start with small letters and do not use camel-case notation, it seems that Chebfun does not really have a choice when it comes to naming methods. Since our target audience is Matlab users, we want to make sure that the users are able to guess the correct name of the corresponding Chebfun method most of the time. Therefore, we should use lower-case, simple, short, easy to remember names for functions.

Class Names

Classes should also have names that are short but meaningful.

Comments

Comments are perhaps the single most important feature for improving the readability of any piece of code. We all agree that comments should be there. In fact, we should have two tiers of comments intertwined within the code: one tier explaining what a particular piece of code does, and the other, also known as documentation comment blocks, explaining the code from an object-oriented, class hierarchy and input–output point of view. In this section, we discuss only the former kind of commenting, i.e. the one which explains what the code is supposed to do. The latter will be dealt with in the “Documentation” sections.

Comments should be descriptive in nature and one of the aims of the code review process is to ensure this.

All comments will use the English alphabet and other characters normally available on a standard keyboard. No accents or other special characters are allowed. Comments are allowed in both British and US spellings.

All comment lines will be either English sentences, starting with a capital letter and ending with a full stop, or English phrases ending with a colon.
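
A short sketch of the two accepted comment forms, using hypothetical variables: a phrase ending with a colon, and a full sentence ending with a full stop.

```matlab
% Sizes of the matrix:
nRows = 2;
nCols = 3;

% The matrix A is initialised with ones.
A = ones(nRows, nCols);
```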

Commenting Control Structures

  • Branching Statements. This is how our standard if statement should look:

    % This if statement does this and that.
    if ( a == b )
        % Do this if that.
        a = b;
    else
        % Do that if this.
        a = c;
    end % End of if.
    
  • Loops. A simple for loop:

    % This for loop loops and loops.
    for iRow = 1:nRows
        % Loop through rows and do X.
        A(iRow,:) = (iRow^2) * ones(1, nCols);
    end % End of for.
    

It is not necessary to comment every single line of an if statement or a for loop. In particular, if the variable names are chosen wisely, it will often be obvious what the code is doing and such comments would seem redundant.

Referencing Variables and Functions in Comments

Help Text

Any occurrence of a variable name or a method name in the introductory help text of a method or any other file will be fully capitalized. An exception to this rule is made when an example code snippet is provided in the help text to explain the usage of the method. For example

function yOut = foo(xIn)
%FOO   The function FOO takes XIN as the input and
%   gives YOUT as the output.
%   Example:
%         yOut = foo(xIn);
%
% See also MYFOO, YOURFOO. 

Comments within the Code

When a local variable is referenced in comments explaining the code, the identifier will be used exactly and no extra capitalization will be done. However, method names, whether local or non-local, will be capitalized and followed by an empty pair of parentheses in order to differentiate them from variable names. It is important to remember that, following MATLAB conventions, no such distinction is made in the help text. Here is an example:

function yOut = foo(xIn)
%FOO   The function FOO takes XIN as the input and
%   gives YOUT as the output.
%   Example:
%         yOut = foo(xIn);
%
% See also MYFOO, YOURFOO. 

% The code begins here.
yOut = xIn;       % yOut is the same as xIn.
yOut = max(xIn);  % MAX() is used to assign the maximum of xIn to yOut.

end

When referring to Inf or NaN in comments, full capitalization will never be used. On the other hand, within comments the keywords true and false will always be referred to as TRUE and FALSE.
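
A minimal sketch of these conventions in a comment; the variable names are hypothetical.

```matlab
x = [ 1, NaN, Inf ];

% isFinite is TRUE for the entries of x that are neither NaN nor Inf.
isFinite = isfinite(x);
```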

Referring to an Interval in Comments

When we're referring to a mathematical interval in a comment, we write [a,b], not [a, b].
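
For instance, in the following sketch (with hypothetical variable names) the interval in the comment is written without a space, while the function call on the last line keeps the usual space after each comma.

```matlab
% Sample the interval [a,b] at nPts equally spaced points.
a = -1;
b = 1;
nPts = 5;
x = linspace(a, b, nPts);
```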

Warnings in Comments

We sometimes come across situations where we find a partial fix or an ingenious but obscure solution to a problem. The fragility or unreliability of such constructs should be made clear by using TODO or FIXME phrases in nearby comment lines. Here is an example:

% [FIXME] The following is a kludge and should be improved.
x = xx.^(x(end-1:-1:1))./y;

An Example Code File

We now illustrate how the guidelines will affect existing Chebfun code. We take a very simple Chebfun file, @chebfun/sin.m. This is how the file looks in Chebfun Version 4:

function Fout = sin(F)
% SIN   Sine of a chebfun.

% Copyright 2011 by The University of Oxford and The Chebfun Developers. 
% See http://www.maths.ox.ac.uk/chebfun/ for Chebfun information.

for k = 1:numel(F)
    if any(get(F(k),'exps')<0), error('CHEBFUN:sin:inf',...
        'SIN is not defined for functions which diverge to infinity'); end
end

Fout = comp(F, @(x) sin(x));
for k = 1:numel(F)
    Fout(k).jacobian = anon(['diag1 = diag(cos(F)); der2 = diff(F,u,''linop'');' ...
        'der = diag1*der2; nonConst = ~der2.iszero;'],{'F'},{F(k)},1,'sin');
    Fout(k).ID = newIDnum;
end

This is how the same code would look in Chebfun Version 5:

function Fout = sin(F)
%SIN   Sine of a chebfun. F is a quasimatrix and
%   FOUT is a quasimatrix of the same dimension. Each chebfun in the
%   quasimatrix FOUT is the sine of the corresponding chebfun in F.
%
% See also COS, TAN.

% Copyright 2013 by The University of Oxford and The Chebfun Developers.
% See http://www.chebfun.org/ for Chebfun information.

%%
% Loop through the columns of the quasimatrix and rule out singularities.
for k = 1:numel(F)
    % If the current chebfun has singularities, report an error.
    if ( any(get(F(k), 'exps') < 0) )
        error('CHEBFUN:sin:inf', ...
            'SIN is not defined for functions which diverge to infinity');
    end % End of if.
end % End of for.

%%
% The output function is a composition of the input function with the
% sine function.
Fout = comp(F, @(x) sin(x));

%%
% Update the Jacobian info in each chebfun within the quasimatrix Fout.
for k = 1:numel(F)
    % Update the current chebfun with the Jacobian of sin.
    Fout(k).jacobian = anon(...
        ['diag1 = diag(cos(F)); der2 = diff(F, u, ''linop'');' ...
        'der = diag1*der2; nonConst = ~der2.iszero;'], ...
        {'F'}, {F(k)}, 1, 'sin');
    % Update the ID of the current chebfun.
    Fout(k).ID = newIDnum;
end % End of for loop.

end