# Reading *An Introduction to Applied Bioinformatics*

**Bioinformatics, as I see it, is the application of the tools of computer science (things like programming languages, algorithms, and databases) to address biological problems (for example, inferring the evolutionary relationship between a group of organisms based on fragments of their genomes, or understanding if or how the community of microorganisms that live in my gut changes if I modify my diet).** Bioinformatics is a rapidly growing field, largely in response to the vast increase in the quantity of data that biologists now grapple with. Students from varied disciplines (e.g., biology, computer science, statistics, and biochemistry) and stages of their educational careers (undergraduate, graduate, or postdoctoral) are becoming interested in bioinformatics.

*An **I**ntroduction to **A**pplied **B**ioinformatics*, or **IAB**, is an open source, interactive bioinformatics text. **It introduces readers to the core concepts of bioinformatics in the context of their implementation and application to real-world problems and data.** IAB is closely tied to the [scikit-bio](http://scikit-bio.org) Python package, which provides production-ready implementations of core bioinformatics algorithms and data structures. Readers therefore learn the concepts in the context of tools they can use to develop their own bioinformatics software and pipelines, enabling them to rapidly get started on their own projects. While some theory is discussed, the focus of IAB is on what readers need to know to be effective, practicing bioinformaticians.

IAB is interactive, being **based on IPython Notebooks** which can be installed on a reader’s computer or viewed statically online. As readers are learning a concept, for example, pairwise sequence alignment, they are presented with its scikit-bio implementation directly in the text. scikit-bio code is well annotated (adhering to the [pep8](https://www.python.org/dev/peps/pep-0008/) and [numpydoc](https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt) conventions), so readers can use it to assist with their understanding of the concept. And, because IAB is presented as an IPython Notebook, readers can execute the code directly in the text. For example, when learning pairwise alignment, users can align sequences provided in IAB (or their own sequences) and modify parameters (or even the algorithm itself) to see how changes affect the resulting alignments. 

IAB is **completely open access**, with all software being BSD-licensed, and all text being licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (i.e., CC BY-NC-SA 4.0). All development and publication is coordinated under [public revision control on GitHub](https://github.com/gregcaporaso/An-Introduction-To-Applied-Bioinformatics).

IAB is also an **electronic-only resource**. There are currently no plans to commercialize it or to create a print version. This means that, unlike printed bioinformatics texts which are generally out of date before the ink dries, IAB can be updated as the field changes. 

**The life cycle of IAB is more like a software package than a book.** There will be development and release versions of IAB, where the release versions are more polished but won't always contain the latest content, and the development versions will contain all of the latest materials, but won't necessarily be copy-edited and polished.

We are in the process of developing a **project status page** that will detail the plans for IAB. This will include the full table of contents and the stage you can expect each chapter to have reached at different times. You can track progress on this at [IAB #97](https://github.com/gregcaporaso/An-Introduction-To-Applied-Bioinformatics/issues/97).

[My](http://github.com/gregcaporaso) goal for IAB is for it to make bioinformatics as accessible as possible to students from varied backgrounds, and to get more people interested in this hugely exciting field. I'm very interested in hearing from readers and instructors who are using IAB, so get in touch if you have corrections, suggestions for how to improve the content, or any other thoughts or comments on the text. In the spirit of openness, I'd prefer to be contacted via the [IAB issue tracker](https://github.com/gregcaporaso/An-Introduction-To-Applied-Bioinformatics/issues/). I'll respond to direct e-mail as well, but I'm always backlogged (just ask my students), so responses are likely to be slower.

I hope you find IAB useful, and that you enjoy reading it!

## Who should read IAB?

## Installation

See the [project website](http://caporasolab.us/An-Introduction-To-Applied-Bioinformatics/) for instructions on how to install and use *An Introduction To Applied Bioinformatics*.

## Using the IPython Notebook

These materials are based on the IPython Notebook, an interactive HTML-based Python computing environment. The main sources of information about the IPython Notebook are the [IPython Notebook website](http://ipython.org/notebook) and the [IPython Notebook example gallery](https://github.com/ipython/ipython/tree/master/examples/notebooks#a-collection-of-notebooks-for-using-ipython-effectively).

Below I illustrate some examples of how I use the notebooks in the context of this project.

Much of the code that is used for educational purposes in this notebook is either included in this repository or in the [scikit-bio](http://scikit-bio.org) package. You can access these as follows:

In [1]:
from __future__ import print_function

import skbio
from IPython.core import page

# Send pager output (e.g., from %psource) to print so it appears inline.
page.page = print

We can then access functions, variables, and classes from these modules.

In [2]:
print(skbio.title)
print(skbio.art)


*                                                    *
               _ _    _ _          _     _
              (_) |  (_) |        | |   (_)
      ___  ___ _| | ___| |_ ______| |__  _  ___
     / __|/ __| | |/ / | __|______| '_ \| |/ _ \
     \__ \ (__| |   <| | |_       | |_) | | (_) |
     |___/\___|_|_|\_\_|\__|      |_.__/|_|\___/

*                                                    *



           Opisthokonta
                   \  Amoebozoa
                    \ /
                     *    Euryarchaeota
                      \     |_ Crenarchaeota
                       \   *
                        \ /
                         *
                        /
                       /
                      /
                     *
                    / \
                   /   \
        Proteobacteria  \
                       Cyanobacteria



We'll inspect a lot of source code throughout these notebooks to study core algorithms and objects used in bioinformatics. For example, if you're interested in a function in one of these packages, you can view the source code for that function as follows.

In [3]:
from skbio.alignment import Alignment

%psource Alignment.position_entropies

    def position_entropies(self, base=None,
                           nan_on_non_standard_chars=True):
        """Return Shannon entropy of positions in Alignment

        Parameters
        ----------
        base : float, optional
            log base for entropy calculation. If not passed, default will be e
            (i.e., natural log will be computed).
        nan_on_non_standard_chars : bool, optional
            if True, the entropy at positions containing characters outside of
            the first sequence's `iupac_standard_characters` will be `np.nan`.
            This is useful, and the default behavior, as it's not clear how a
            gap or degenerate character sh
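
The docstring above describes a per-position Shannon entropy with an optional log base and a NaN result for positions containing non-standard characters. As a rough, standalone illustration of that idea, here is a minimal sketch in plain Python 3. It is not scikit-bio's implementation: `column_entropies`, its `standard_chars` parameter, and the toy alignment are all hypothetical names and data chosen for this example.

```python
import math
from collections import Counter


def column_entropies(sequences, base=None, standard_chars="ACGT"):
    """Shannon entropy of each column of equal-length sequences.

    Hypothetical helper for illustration only; not scikit-bio's code.
    """
    if len({len(s) for s in sequences}) > 1:
        raise ValueError("all sequences must have the same length")
    entropies = []
    for column in zip(*sequences):
        # Assumption mirroring the documented default: the entropy is NaN
        # if the column contains a gap or other non-standard character.
        if any(c not in standard_chars for c in column):
            entropies.append(float("nan"))
            continue
        total = len(column)
        h = 0.0
        for n in Counter(column).values():
            p = n / total
            # Natural log unless a base is given.
            h -= p * (math.log(p) if base is None else math.log(p, base))
        entropies.append(h)
    return entropies


# Toy alignment (made up for this example); the third column has a gap.
aln = ["ACGT", "ACGA", "AC-T", "ATGC"]
result = column_entropies(aln, base=2)
print(result)
```

A fully conserved column gives an entropy of zero, and more variable columns give larger values, which is why this kind of score is often used as a quick measure of conservation along an alignment.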

The documentation for scikit-bio is also very extensive (though the package itself is still in early development). You can view the documentation for the `Alignment` object, for example, [here](http://scikit-bio.org/generated/skbio.core.alignment.Alignment.html#skbio.core.alignment.Alignment). These documents will be invaluable for learning how to use the objects.
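
If you are working outside the notebook, Python's standard `inspect` module offers a similar way to look at source code and docstrings. The sketch below uses `textwrap.dedent` purely as a stand-in target so that it runs without scikit-bio installed; in practice you would substitute whichever function or method you are studying.

```python
import inspect
import textwrap

# Stand-in target (an assumption for this example): any pure-Python
# function or method can be used here instead.
target = textwrap.dedent

source = inspect.getsource(target)  # source text, similar to what %psource shows
doc = inspect.getdoc(target)        # docstring with indentation cleaned up

# Show just the first line of each to keep the output short.
print(source.splitlines()[0])
print(doc.splitlines()[0])
```

Note that `inspect.getsource` only works for code written in Python; functions implemented in C have no Python source to display.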

## Need help?

If you're having issues getting *An Introduction to Applied Bioinformatics* running on your computer, or you have corrections or suggestions on the content, you should get in touch through our [GitHub issue tracker](https://github.com/gregcaporaso/An-Introduction-To-Applied-Bioinformatics/issues). This will generally be much faster than e-mailing the author directly, as there are multiple people who monitor the issue tracker. It also helps us manage our technical support load if we can consolidate all requests and responses in one place.

## About the author

<div style="float: right; margin-left: 30px; width: 200px"><img title="Logo by @gregcaporaso." style="float: right;margin-left: 30px;" src="https://raw.github.com/gregcaporaso/An-Introduction-To-Applied-Bioinformatics/master/images/ponytail.png" align=right height=250/></div>

I teach bioinformatics at the undergraduate and graduate levels at Northern Arizona University. 