Coding style
As project size increases, consistency increases in importance. Unit testing and a consistent style are critical to having trusted code to integrate. Also, guesses about names and interfaces will be correct more often.
We aim to adhere, to a large extent, to PEP8.
- Choose the name that people will most likely guess. Make it descriptive, but not too long: `curr_record` is better than `c`, or `curr`, or `current_genbank_record_from_database`.
- Good names are hard to find. Don't be afraid to change names except when they are part of interfaces that other people are also using. It may take some time working with the code to come up with reasonable names for everything: if you have unit tests, it's easy to change them, especially with global search and replace.
- Use singular names for individual things, plural names for collections. For example, you'd expect `self.name` to hold something like a single string, but `self.names` to hold something that you could loop through like a list or dict. Sometimes the decision can be tricky: is `self.index` an int holding a position, or a dict holding records keyed by name for easy lookup? If you find yourself wondering these things, the name should probably be changed to avoid the problem: try `self.position` or `self.look_up`.
- Don't make the type part of the name. You might want to change the implementation later. Use `Records` rather than `RecordDict` or `RecordList`, etc. Don't prefix the name with the type (i.e. Hungarian Notation).
- Make the name as precise as possible. If the variable is the name of the input file, call it `infile_name`, not `input` or `file` (which you shouldn't use anyway, since `input` is a Python built-in and `file` was one in Python 2), and not `infile` (because that looks like it should be a file object, not just its name).
- Use `result` to store the value that will be returned from a method or function. Use `data` for input in cases where the function or method acts on arbitrary data (e.g. sequence data, or a list of numbers, etc.) unless a more descriptive name is appropriate.
- One-letter variable names should only occur in math functions or as loop iterators with limited scope. Limited scope covers things like `for k in keys: print(k)`, where `k` survives only a line or two. Loop iterators should refer to the variable that they're looping through: `for k in keys`, `for i in items`, or `for key in keys`, `for item in items`. If the loop is long or there are several 1-letter variables active in the same scope, rename them.
- Limit your use of abbreviations. A few well-known abbreviations are OK (see below), but you don't want to come back to your code in 6 months and have to figure out what `sptxck2` is. It's worth it to spend the extra time typing `species_taxon_check_2`, but that's still a horrible name: what's check number 1? Far better to go with something like `taxon_is_species_rank` that needs no explanation, especially if the variable is only used once or twice.
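
The naming guidelines above can be illustrated with a short sketch. Every name in it (`count_long_records`, `records`, `min_length`, `infile_name`, `num_long`) is hypothetical and chosen only for illustration; none is taken from the cogent3 code base.

```python
def count_long_records(records, min_length):
    """Return how many records are at least min_length long."""
    # `records` is plural because it holds a collection. The name does not
    # say whether it is a list, tuple or dict, so the type can change later.
    result = 0  # `result` holds the value that will be returned
    for record in records:  # the iterator is named after the collection
        if len(record) >= min_length:
            result += 1
    return result


# `infile_name` is a string holding a file name, not an open file object.
infile_name = "example.fasta"

records = ["ACGT", "AC", "ACGTTGCA"]
num_long = count_long_records(records, min_length=4)
print(num_long)  # 2
```

Note that `min_length` and `num_long` use abbreviations only as part of a longer name.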
The following list of abbreviations can be considered well-known and used with impunity within mixed name variables, but some should not be used by themselves, as they would conflict with common functions or Python built-ins, or raise an exception. Do not use the following by themselves as variable names: `dir`, `exp` (a common `math` module function), `in`, `max`, and `min`. They can, however, be used as part of a name, e.g. `matrix_exp`.
Full | Abbreviated |
---|---|
alignment | aln |
archaeal | arch |
auxiliary | aux |
bacterial | bact |
citation | cite |
current | curr |
dictionary | dict |
directory | dir |
end of file | eof |
eukaryotic | euk |
frequency | freq |
expected | exp |
index | idx |
input | in |
maximum | max |
minimum | min |
mitochondrial | mt |
number | num |
observed | obs |
original | orig |
output | out |
parameter | param |
phylogeny | phylo |
previous | prev |
probability | prob |
protein | prot |
record | rec |
reference | ref |
sequence | seq |
standard deviation | stdev |
statistics | stats |
string | str |
structure | struct |
temporary | temp |
taxonomic | tax |
variance | var |
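
A minimal sketch of why these abbreviations are safe inside a longer name but not by themselves. The names `seq_freqs`, `max_freq`, `min_freq` and `broken` are hypothetical and used only for illustration.

```python
import keyword

seq_freqs = {"A": 0.4, "C": 0.1, "G": 0.2, "T": 0.3}

# Abbreviations combined into a longer name leave the built-ins untouched.
max_freq = max(seq_freqs.values())
min_freq = min(seq_freqs.values())


def broken(freqs):
    # Using `max` by itself as a variable name hides the built-in function
    # inside this function, so the call below raises TypeError.
    max = 1.0
    return max(freqs)


try:
    broken(seq_freqs.values())
except TypeError as error:
    shadowing_error = str(error)

# `in` is a reserved keyword, so `in = 1` would not even compile.
in_is_keyword = keyword.iskeyword("in")
```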
TODO: Refer to the numpy way of commenting
- Always update the docstring when the code changes. Like outdated comments, outdated docstrings can waste a lot of time. "Correct examples are priceless, but incorrect examples are worse than worthless." Jim Fulton
See the numpy guidelines
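
As an illustration, here is a sketch of a docstring laid out in numpydoc-style sections (Parameters, Returns, Examples). The function `gc_fraction` is hypothetical and is not part of cogent3. Because the docstring example is written as a doctest, the standard library `doctest` module can check that it still matches the code.

```python
import doctest


def gc_fraction(seq):
    """Return the fraction of G and C characters in a sequence.

    Parameters
    ----------
    seq : str
        Nucleotide sequence; upper and lower case are treated alike.

    Returns
    -------
    float
        Number of G or C characters divided by the length of ``seq``,
        or 0.0 for an empty sequence.

    Examples
    --------
    >>> gc_fraction("GGCA")
    0.75
    """
    if not seq:
        return 0.0
    seq = seq.upper()
    result = (seq.count("G") + seq.count("C")) / len(seq)
    return result


# Parse the docstring and run its example against the current code.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    gc_fraction.__doc__, {"gc_fraction": gc_fraction}, "gc_fraction", None, 0
)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)  # results.failed is 0 while the docstring is current
```

If the function's behaviour changes without the docstring being updated, `results.failed` becomes non-zero, which is one way to catch an outdated example early.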
TODO: Update to `pytest`.
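
Until this section is written, here is a sketch of what a pytest-style test can look like: pytest collects functions whose names start with `test_` and reports failures from plain `assert` statements. The function under test, `reverse_complement`, and both test names are hypothetical and are not taken from the cogent3 test suite.

```python
def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    result = "".join(complement[base] for base in reversed(seq))
    return result


def test_reverse_complement():
    # "AACG" reversed is "GCAA", whose complement is "CGTT"
    assert reverse_complement("AACG") == "CGTT"


def test_reverse_complement_empty():
    assert reverse_complement("") == ""
```

Saved in a file whose name starts with `test_`, these functions would be found and run by the `pytest` command.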
The directory `tests/data` contains a number of sample files that are useful for demonstration purposes.
>>> from cogent3 import load_aligned_seqs, load_tree
>>> aln = load_aligned_seqs('<path/to/Cogent3>/tests/data/brca1.fasta', moltype='dna')
>>> tree = load_tree('<path/to/Cogent3>/tests/data/murphy.tree')