# Documenting code as a Data Scientist

## Why write documentation?

* For you:
    * You will be using your code in 6 months
    * You want people to use your code and give you credit
    * You want to learn self-determination
    * Others would be encouraged to contribute to your code
* For others: 
    * Others can easily use your code and build upon it
* For science:
    * Advance the science
    * Encourage open science 
    * Allow reproducibility and transparency

Code that you wrote 6 months ago is often indistinguishable from code that someone else has written. You will look upon a file with a fond sense of remembrance. Then a sneaking feeling of foreboding, knowing that someone less experienced, less wise, had written it.

As you go through this selfless act of untangling things that were obvious or clever months ago, you will start to empathize with your users. If only I had written down why I had done this. Life would be so much simpler. Documentation allows you to transfer the why behind code. Much in the same way code comments explain the why, and not the how, documentation serves the same purpose.

You have written a piece of code, and released it into the world. You have done this because you think that others might find it useful. However, people need to understand why your code might be useful for them, before they decide to use it. Documentation tells people that this project is for them.

* If people don’t know why your project exists, they won’t use it.
* If people can’t figure out how to install your code, they won’t use it.
* If people can’t figure out how to use your code, they won’t use it.

There are a small number of people who will source dive and use any code out there. That is a vanishingly small number of people, compared to people who will use your code when properly documented. If you really love your project, document it, and let other people use it.

## Version controlled plain text

As programmers we live in a world of plain text. Our documentation tooling should be no exception. We want tools that turn plain text into pretty HTML. We also have some of the best tooling available for tracking changes to files. Why would we forgo using those tools when writing documentation? This workflow is powerful, and familiar to developers.

### Basic Example

    Resources
    ---------

    * Online documentation: http://docs.writethedocs.org/
    * Conference: http://conf.writethedocs.org/

This will render into a header, with a list underneath it. The URLs will be hyperlinked automatically. It’s easy to write, still makes sense as plain text, and renders nicely into HTML.

## README

Your first steps in documentation should go into your README. Code hosting services will render your README into HTML automatically if you provide the proper extension. It is also the first interaction that most users will have with your project. So having a solid README will serve your project well.

Some people even go as far as to start their project by writing the README.

### Template

Here is a simple template to start with for your README. Name the file README.md if you want to use Markdown, or README.rst if you want to use reStructuredText.

    $project
    ========

    $project will solve your problem of where to start with documentation,
    by providing a basic explanation of how to do it easily.

    Look how easy it is to use:

        import project
        # Get your stuff done
        project.do_stuff()

    Features
    --------

    - Be awesome
    - Make things faster

    Installation
    ------------

    Install $project by running:

        install project

    Contribute
    ----------

    - Issue Tracker: github.com/$project/$project/issues
    - Source Code: github.com/$project/$project

    Support
    -------

    If you are having issues, please let us know.
    We have a mailing list located at: project@google-groups.com

    License
    -------

    The project is licensed under the BSD license.
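
The `$project` placeholder in the template happens to match the syntax of Python's standard `string.Template`, so filling it in can be scripted. The following is only a minimal sketch: the project name `mylib`, the helper `render_readme`, and the shortened copy of the template are assumptions made for illustration.

```python
from string import Template

# Shortened copy of the README template; $project is the placeholder.
README_TEMPLATE = Template(
    "$project\n"
    "========\n"
    "\n"
    "$project will solve your problem of where to start with documentation.\n"
    "\n"
    "Contribute\n"
    "----------\n"
    "\n"
    "- Source Code: github.com/$project/$project\n"
)


def render_readme(project: str) -> str:
    """Return the README text with every $project placeholder filled in."""
    return README_TEMPLATE.substitute(project=project)


# Hypothetical project name, used only for illustration.
readme_text = render_readme("mylib")
```

Writing `readme_text` to `README.rst` would give a starting point that you then edit by hand. Note that a reStructuredText title underline must be at least as long as the title, so a longer project name needs a longer underline.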

The point is that commenting is only one of the tools available for making code more understandable. Sometimes it is the most effective one, and sometimes it is not.

## When would you prefer comments over other tools?
Comments shine mostly when:

* the reader needs context that cannot be expressed using the existing entities
* the code has to make special assumptions about its environment; in that case, the “special” part of the environment should eventually be normalized and the comment removed
* the code “discusses difficult or subtle algorithms and data structures”
* other tools cannot be applied (due to limitations of size or time); in that case, the comments are explicitly temporary
* heavy performance optimizations make the code ugly
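
As an illustration of a comment that supplies context the code cannot express by itself, here is a small sketch. The function `sample_std` is a hypothetical example, not taken from any particular library; the comment explains why the divisor is `n - 1`, which no variable name could convey.

```python
import math


def sample_std(values: list[float]) -> float:
    """Return the sample standard deviation of `values`."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    # Why n - 1 and not n: the values are a sample, not the whole population,
    # so we apply Bessel's correction to avoid underestimating the spread.
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    return math.sqrt(variance)
```

A comment such as `# divide by n - 1` would only repeat the how; the version above records the why.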

## The cost of commenting
With that in mind, let’s consider the cost of using comments:

* Comments need maintenance: when you refactor code, you have to refactor the comments as well.
* Good comments are hard to write: they should be precise and relevant. Precision requires stability, well-defined requirements, and an environment that doesn’t change, and a lack of change is a luxury in modern software projects.
* A temptation or culture that promotes using comments keeps developers from writing cleaner, self-explanatory code.
* Comments hide design issues: they often signal that the design of the system cannot be expressed effectively with clear, understandable entities and abstractions.

While there is a place for comments here and there to mitigate temporary issues, most commonly a comment is evidence of an inability to use other tools (renaming, decoupling, encapsulation) to express the intention of the code.
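
To make the trade-off concrete, here is a hedged before-and-after sketch. All names (`process`, `is_adult`, `keep_adults`, `ADULT_AGE`) and the record layout are invented for illustration; the point is only that renaming and extracting a helper can carry the meaning that the comment carried before.

```python
# Before: the comment carries the meaning that the names do not.
def process(d):
    # drop rows where age is missing or the person is a minor
    return [r for r in d if r.get("age") is not None and r["age"] >= 18]


# After: a named constant, a small helper, and clearer names replace the comment.
ADULT_AGE = 18


def is_adult(record: dict) -> bool:
    """Return True if the record has an age and it is at least ADULT_AGE."""
    age = record.get("age")
    return age is not None and age >= ADULT_AGE


def keep_adults(records: list[dict]) -> list[dict]:
    """Return only the records that describe adults."""
    return [record for record in records if is_adult(record)]
```

Both versions return the same rows, but the second one cannot drift out of sync with a comment, because there is no comment left to maintain.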

## Documentation
Written code has two main consumers: maintainers and external users.

### Maintainers are mainly concerned with the following questions:

* What the code does
* How the code does it

### External users want to know:

* How to use the code


### Documentation For External Users

Traditionally, in-code documentation describes how to use the code. That’s why it is common to document the API provided by a module or package. What makes having documentation in the code so convenient is:

* Proximity to the code, which makes the documentation portable
* The ability of tools and IDEs to populate some parts of the documentation automatically
* The ability of IDEs to parse the documentation and show inline hints

Now, when working on a project, consider whether you have any external users: people who would use your code and need to understand how to use it properly.

If you do, go ahead and write good documentation: describe the arguments, their types, and your assumptions, and enjoy the automated tools that can easily create nice HTML and PDF output.
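
As a sketch of what such documentation can look like in Python, here is a function with a docstring laid out in the NumPy docstring style. The function `rescale` and its parameters are hypothetical, chosen only to show where arguments, types, and assumptions go.

```python
def rescale(values: list[float], lower: float = 0.0, upper: float = 1.0) -> list[float]:
    """Linearly rescale values to the range [lower, upper].

    Parameters
    ----------
    values : list of float
        Input data. Assumed to contain at least two distinct values.
    lower, upper : float, optional
        Bounds of the output range; ``lower`` must be smaller than ``upper``.

    Returns
    -------
    list of float
        Rescaled values, in the same order as the input.

    Raises
    ------
    ValueError
        If ``values`` is empty, all values are equal, or ``lower >= upper``.
    """
    if lower >= upper:
        raise ValueError("lower must be smaller than upper")
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("values must contain at least two distinct numbers")
    scale = (upper - lower) / (hi - lo)
    return [lower + (v - lo) * scale for v in values]
```

Calling `help(rescale)` in an interactive session prints this text, and IDEs typically show it as an inline hint. Documentation generators such as Sphinx can collect docstrings like this into HTML or PDF.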

Note, though, that often the purpose of adding a formatted comment is not to explain how to use the code but to include the method in the listing of the exposed API; that is, the code is self-explanatory and the documentation is only added for completeness.

### Documentation For Maintainers

Okay, we have a project and there’s a team of developers that work on it.

You want new and other developers to engage effectively and start developing. For that, they need to understand the overall design of the system, the terms and abstractions used, and what the code does and how.

So, most likely, your project has:

* Design documentation
* API spec
* Readme
* Contribution guide
* FAQ
* Wiki

You will want to keep all of these up-to-date — this is your project’s documentation.

Not the code.

The information you would want to keep documented inline in the code, but not in one of the items above, is very limited. Anything else is duplication of information, which is bad for obvious reasons.

You may want to keep the relevant information close to the source code for the sake of convenience, but keep in mind that keeping it up-to-date will require effort. Moreover, outdated documentation can be harmful.
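
One way to reduce the risk of inline documentation going stale is to make its examples executable. The sketch below uses Python's standard `doctest` module; the function `celsius_to_fahrenheit` and the helper `failed_examples` are illustrative assumptions, not part of any existing project.

```python
import doctest


def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit.

    >>> celsius_to_fahrenheit(100)
    212.0
    >>> celsius_to_fahrenheit(-40)
    -40.0
    """
    return celsius * 9 / 5 + 32


def failed_examples(func) -> int:
    """Run the examples in `func`'s docstring and return how many failed."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # Give the examples access to the function under its own name.
    globs = {func.__name__: func}
    for test in finder.find(func, name=func.__name__, globs=globs):
        runner.run(test)
    return runner.summarize(verbose=False).failed
```

If the implementation changes and the docstring does not, `failed_examples(celsius_to_fahrenheit)` becomes non-zero, so the drift shows up in a test run instead of in a confused reader. In everyday use, `python -m doctest your_module.py` performs the same check for a whole file.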

You should document responsibly.