In [None]:
(*** hide ***)

(*** condition: prepare ***)
#r "nuget: Plotly.NET, 4.2.0"
#r "nuget: FSharpAux, 2.0.0"
#r "nuget: FSharpAux.IO, 2.0.0"
#r "nuget: FSharp.Stats, 0.4.11"
#r "../src/BioFSharp/bin/Release/netstandard2.0/BioFSharp.dll"
#r "../src/BioFSharp.IO/bin/Release/netstandard2.0/BioFSharp.IO.dll"
#r "../src/BioFSharp.BioContainers/bin/Release/netstandard2.0/BioFSharp.BioContainers.dll"
#r "../src/BioFSharp.ML/bin/Release/netstandard2.0/BioFSharp.ML.dll"
#r "../src/BioFSharp.Stats/bin/Release/netstandard2.0/BioFSharp.Stats.dll"

# BioFSharp

BioFSharp aims to be a user-friendly functional library for bioinformatics written in F#. It contains the basic data structures for common biological objects like amino acids and nucleotides based on chemical formulas and chemical elements.

BioFSharp facilitates working with sequences in a strongly typed way and is designed to work well with F# Interactive.
It provides parsers for many biological file formats and a range of algorithms suited to bioinformatic workflows.

The core data model implements, in ascending hierarchical order:

- Chemical elements and [formulas](https://csbiology.github.io/BioFSharp/Formula.html), which are collections of elements
- Amino acids, nucleotides and modifications, which all implement the common [IBioItem interface](https://csbiology.github.io/BioFSharp/BioItem.html#Basics)
- [BioCollections](https://csbiology.github.io/BioFSharp/BioCollections.html) (BioArray, BioList, BioSeq) as representations of biological sequences
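
The three layers listed above can be sketched in a few lines. This is a minimal illustration rather than a complete tour: it assumes that `Formula.parseFormulaString`, `Formula.toString` and `BioItem.formula` behave as described on the linked Formula and BioItem pages, and the bindings `water`, `alanineFormula` and `peptide` are names chosen only for this example.

```fsharp
#r "nuget: BioFSharp"
open BioFSharp

// Layer 1: a formula is a collection of chemical elements.
let water = Formula.parseFormulaString "H2O"

// Layer 2: a single amino acid is a BioItem that carries its own formula.
let alanineFormula = AminoAcids.Ala |> BioItem.formula

// Layer 3: a BioCollection (here a BioArray) groups BioItems into a sequence.
let peptide = "PEPTIDE" |> BioArray.ofAminoAcidString

printfn "%s" (Formula.toString water)
printfn "%s" (Formula.toString alanineFormula)
printfn "%d residues" peptide.Length
```

Each layer builds on the one below it: the peptide is a collection of amino acids, and every amino acid exposes a formula made of elements.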

<br>

![Data model](https://i.imgur.com/LXBvhmi.png)

<br>

---

## Installation

### For applications and libraries

You can find all available package versions on [NuGet](https://www.nuget.org/packages?q=BioFSharp).

 - dotnet CLI

    ```shell
    dotnet add package BioFSharp
    ```

 - paket CLI

    ```shell
    paket add BioFSharp
    ```

 - package manager

    ```shell
    Install-Package BioFSharp -Version {{fsdocs-package-version}}
    ```

    Or add the package reference directly to your `.*proj` file:

    ```
    <PackageReference Include="BioFSharp" Version="{{fsdocs-package-version}}" />
    ```

### For scripting and interactive notebooks
You can include the package via an inline package reference:

```
#r "nuget: BioFSharp"
```

---

## Example

The following example shows how easy it is to start working with sequences:

Create a peptide sequence:

In [2]:
open BioFSharp

"PEPTIDE" |> BioArray.ofAminoAcidString

Create a nucleotide sequence:

In [3]:
"ATGC" |> BioArray.ofNucleotideString

BioFSharp provides a broad range of functions for working with individual amino acids and nucleotides, for example to look up the complementary base of a nucleotide or the mass of an amino acid.

In [4]:
// Returns the corresponding nucleotide of the complementary strand
Nucleotides.G |> Nucleotides.complement
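
The same function can be applied to a whole sequence. The sketch below assumes that a `BioArray` is an ordinary F# array of BioItems, so the standard `Array` functions work on it; `strand`, `complementStrand` and `reverseComplement` are names introduced only for this example.

```fsharp
#r "nuget: BioFSharp"
open BioFSharp

let strand = "ATGC" |> BioArray.ofNucleotideString

// Complement every base of the strand, position by position.
let complementStrand = strand |> Array.map Nucleotides.complement

// Reverse the complement to read the partner strand in 5'->3' direction.
let reverseComplement = complementStrand |> Array.rev
```

Using plain `Array.map` keeps the example limited to functions already shown here; the BioCollections documentation describes dedicated helpers for sequence-level operations.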

In [5]:
// Returns the monoisotopic mass of Arginine (minus H2O)
AminoAcids.Arg |> AminoAcids.monoisoMass
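
Because each amino acid mass is reported without H2O, the mass of an intact peptide can be estimated by summing the residue masses and adding one water back. This is a minimal sketch under that assumption; `residueMassSum`, `waterMass` and `peptideMass` are illustrative names, and the element masses for hydrogen and oxygen are written out by hand instead of being taken from the library.

```fsharp
#r "nuget: BioFSharp"
open BioFSharp

let peptide = "PEPTIDE" |> BioArray.ofAminoAcidString

// Each residue mass excludes one H2O, so the sum is the mass of the
// peptide chain without its terminal H and OH.
let residueMassSum = peptide |> Array.sumBy AminoAcids.monoisoMass

// Monoisotopic mass of water: 2 * 1.00782503 (H) + 15.99491462 (O).
let waterMass = 2.0 * 1.00782503 + 15.99491462

// Neutral monoisotopic mass of the intact peptide.
let peptideMass = residueMassSum + waterMass
printfn "%.4f" peptideMass
```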

The file readers and writers in BioFSharp make it easy to read and write common biological file formats, for example FastA:

In [8]:
open BioFSharp.IO

let filepathFastaA = (__SOURCE_DIRECTORY__ + "/data/Chlamy_Cp.fastA")

// Reads the file into a sequence of FastaItems, converting each
// sequence string into a BioArray of amino acids.
let fastaItems = FastA.fromFile BioArray.ofAminoAcidString filepathFastaA

This returns a sequence of `FastaItem`s, so you can start working directly with the individual sequences, each represented as a `BioArray` of amino acids.

In [9]:
fastaItems |> Seq.item 0

| | |
|---|---|
| Header | sp\|P19528\| cytochrome b6/f complex subunit 4 GN=petD PE=petD.p01 |
| Sequence | 1 MSVTKKPDLS DPVLKAKLAK GMGHNTYGEP AWPNDLLYMF PVVILGTFAC VIGLSVLDPA<br>61 AMGEPANPFA TPLEILPEWY FYPVFQILRV VPNKLLGVLL MAAVPAGLIT VPFIESINKF<br>121 QNPYRRPIAT ILFLLGTLVA VWLGIGSTFP IDISLTLGLF * |
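
Since the reader returns an ordinary F# sequence, the usual `Seq` functions can be used to summarize the records. The sketch below is self-contained: it writes a hypothetical two-record file (`biofsharp_demo.fasta`, not part of the repository) to the temporary directory and assumes that each `FastaItem` exposes the `Header` and `Sequence` fields shown in the output above.

```fsharp
#r "nuget: BioFSharp"
#r "nuget: BioFSharp.IO"
open System.IO
open BioFSharp
open BioFSharp.IO

// Hypothetical two-record FastA file written to a temporary location.
let demoPath = Path.Combine(Path.GetTempPath(), "biofsharp_demo.fasta")
File.WriteAllLines(demoPath, [| ">seqA"; "PEPTIDE"; ">seqB"; "MSVT" |])

// Same reader call as above, pointed at the demo file.
let demoItems = FastA.fromFile BioArray.ofAminoAcidString demoPath

// Pair each header with the number of residues in its sequence.
let summary =
    demoItems
    |> Seq.map (fun item -> item.Header, item.Sequence.Length)
    |> Seq.toList

printfn "%A" summary
```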


For more detailed examples, continue exploring the BioFSharp documentation.
In the near future, we will start to provide a cookbook-like tutorial in the [CSBlog](https://csbiology.github.io/CSBlog/).

## Contributing and copyright

The project is hosted on [GitHub][gh], where you can [report issues][issues], fork
the project and submit pull requests. If you're adding a new public API, please also
consider adding [samples][docs] that can be turned into documentation.

The library is available under the OSI-approved MIT license. For more information see the 
[License file][license] in the GitHub repository. 

  [docs]: https://github.com/CSBiology/BioFSharp/tree/developer/docs
  [gh]: https://github.com/CSBiology/BioFSharp
  [issues]: https://github.com/CSBiology/BioFSharp/issues
  [license]: https://github.com/CSBiology/BioFSharp/blob/developer/LICENSE