# Use cases

## Introduction

The pybiouml library provides a set of methods for interacting with a [BioUML](https://www.biouml.org/) server from Python. BioUML is an open-source integrated Java platform for analyzing data from omics research and other areas of advanced computational biology, and for building the virtual cell and the virtual physiological human. It covers a comprehensive range of capabilities, including access to databases with experimental data, tools for the formalized description of the structure and functioning of biological systems, and tools for their visualization, simulation, parameter fitting and analysis.

## Getting started

### Connecting to BioUML server

The first thing you need to do is load the package and log into the BioUML server. As an example, we will connect to the free public [BioUML server](https://ict.biouml.org).

Install the library with pip:
```bash
pip install pybiouml
```

The next step is to import the library and create an object for your future connection:

In [1]:
from pybiouml import pybiouml


After importing the library, we create an object using its `PyBiouml` class:

In [2]:
my_work = pybiouml.PyBiouml()


Now we are ready to work. First, we need to log into the server. This can be done with the `login` method of the `my_work` object created earlier.  
The `login` method also accepts `user` and `password`, but we leave them empty in the example below for an anonymous login. Alternatively, you can install BioUML on your local computer and connect to it in the same way:  
```python
my_work.login('localhost:8080')
```
See [BioUML installation](http://wiki.biouml.org/index.php/BioUML_server_installation) for details on BioUML server installation.


In [3]:
my_work.login(url='https://ict.biouml.org')
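
For a non-anonymous login it can be convenient to keep credentials out of the notebook. The sketch below is one possible way to do that and is not part of pybiouml: the helper `login_from_env` and the environment variable names `BIOUML_USER` and `BIOUML_PASSWORD` are hypothetical, and it assumes that `login` takes `user` and `password` as keyword arguments, as described above.

```python
import os


def login_from_env(client, url, environ=None):
    """Log in with credentials from the environment, else anonymously.

    Hypothetical helper: `client` is expected to be a pybiouml.PyBiouml
    object; BIOUML_USER and BIOUML_PASSWORD are made-up variable names.
    """
    environ = os.environ if environ is None else environ
    user = environ.get("BIOUML_USER")
    password = environ.get("BIOUML_PASSWORD")
    if user and password:
        # Assumption: `user` and `password` are keyword arguments of login().
        client.login(url=url, user=user, password=password)
        return "user"
    # Same call as in the cell above: anonymous login.
    client.login(url=url)
    return "anonymous"


# Usage with the object created earlier:
# login_from_env(my_work, "https://ict.biouml.org")
```

The return value only reports which branch was taken, so a notebook can show whether it is running anonymously.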

### Querying BioUML repository

The BioUML repository (or simply repository) is the central BioUML data storage place. Basically, all the data you work with in BioUML is stored in the repository. The repository has a hierarchical structure similar to file systems. On the top level the repository consists of several root folders. The most common ones are:  
- `databases` contains preinstalled or user-defined modules. 
- `data` contains user projects and public examples.

The `ls` method lists the contents of a given folder in the repository. The list of databases available on the BioUML server:

In [5]:
import requests
my_work.ls('databases')

Unnamed: 0,name,hasChildren,class
0,Biomodels,True,0
1,EnsemblArabidopsisThaliana91,True,1
2,EnsemblFruitfly91,True,1
3,EnsemblHuman85_38,True,1
4,EnsemblMouse81_38,True,1
5,EnsemblNematoda91,True,1
6,EnsemblRat91,True,1
7,EnsemblSaccharomycesCerevisiae91,True,1
8,EnsemblSchizosaccharomycesPombe91,True,1
9,EnsemblZebrafish92,True,1


The list of data elements available in the BioUML examples folder:

In [6]:
my_work.ls('data/Examples/Optimization/Data/Experiments')

Unnamed: 0,name,hasChildren,class
0,exp_data_1,False,0
1,exp_data_2,False,0
2,exp_data_3,False,0
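
Because `ls` reports a `hasChildren` flag for every element, the hierarchy can also be walked programmatically. The following is a minimal sketch rather than a pybiouml feature: `iter_repository` is a hypothetical helper, and it assumes that every listing carries the `name` and `hasChildren` columns shown in the outputs above.

```python
def iter_repository(client, path, max_depth=1):
    """Yield (path, has_children) pairs below `path`, depth-first.

    Hypothetical helper built only on client.ls(); assumes the listing
    has 'name' and 'hasChildren' columns, as in the outputs above.
    """
    listing = client.ls(path)
    for name, has_children in zip(listing["name"], listing["hasChildren"]):
        child = f"{path}/{name}"
        yield child, bool(has_children)
        # Descend only into folders, and only while depth remains.
        if has_children and max_depth > 1:
            yield from iter_repository(client, child, max_depth - 1)


# for path, is_folder in iter_repository(my_work, 'data/Examples/Optimization', max_depth=2):
#     print(path, is_folder)
```

Keeping `max_depth` small is a deliberate choice here: each level costs one `ls` request per folder.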


The pybiouml `get` method fetches a table from the BioUML repository as a `pandas.DataFrame`:

In [7]:
exp_1 = my_work.get('data/Examples/Optimization/Data/Experiments/exp_data_1')
exp_1.head()

Unnamed: 0,time,p43p41,pro8,casp8
0,0,0.057725,59.963164,0.0
1,10,0.268144,57.564637,0.041075
2,20,4.760481,58.589814,0.316117
3,30,8.251935,59.421561,1.397356
4,45,16.144483,48.189751,3.520371


This method can fetch not only true BioUML tables, but any data element that has a tabular representation, including profiles, user-uploaded tracks and so on.  
To store a `pandas.DataFrame` as a table in the BioUML repository, use the `put` method:

In [8]:
exp_1['sum_column'] = exp_1[['pro8', 'casp8']].sum(axis=1)
exp_1

Unnamed: 0,time,p43p41,pro8,casp8,sum_column
0,0,0.057725,59.963164,0.0,59.963164
1,10,0.268144,57.564637,0.041075,57.605712
2,20,4.760481,58.589814,0.316117,58.90593
3,30,8.251935,59.421561,1.397356,60.818917
4,45,16.144483,48.189751,3.520371,51.710122
5,60,17.020606,38.950266,3.947229,42.897495
6,90,15.269292,23.501692,4.871417,28.373108
7,120,12.53013,13.127419,4.87786,18.00528
8,150,10.334704,10.703102,4.228311,14.931413


In [14]:
my_work.put('data/Collaboration/Demo/Data/pybiouml_test/exp_1_pybiouml', exp_1)

In [15]:
my_work.ls('data/Collaboration/Demo/Data/pybiouml_test')

Unnamed: 0,name,hasChildren,class
0,exp_1_pybiouml,False,0
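
The `put` and `ls` calls above can be combined into a single step that confirms the table actually arrived. This is only a sketch: the helper name `put_and_verify` is hypothetical, and it assumes that `ls` on the target folder returns a `name` column as in the output above.

```python
def put_and_verify(client, folder, name, table):
    """Store `table` under folder/name and confirm that `ls` lists it.

    Hypothetical helper; uses only the put() and ls() calls shown above.
    """
    path = f"{folder}/{name}"
    client.put(path, table)
    # list(...) matters for pandas: `in` on a Series tests the index, not the values.
    if name not in list(client.ls(folder)["name"]):
        raise RuntimeError(f"{path} was not found after put")
    return path


# put_and_verify(my_work, 'data/Collaboration/Demo/Data/pybiouml_test',
#                'exp_1_pybiouml', exp_1)
```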


### Using BioUML analyses

BioUML provides a set of analyses organized in groups. The list of analyses available on the current server can be fetched with the `analysis_list` method.

In [16]:
a_l = my_work.analysis_list()
a_l

Unnamed: 0,Group,Name
0,ChIP-seq,ChIP-seq Quality control analysis
1,ChIP-seq,ChIP-seq peak profile
2,ChIP-seq,Quality control analysis
3,ChIP-seq,Report generator for quality control analysis
4,ChIP-seq,Run MACS 1_3_7 on ChiP-Seq
...,...,...
229,Workflow utils,Check Workflow consistency
230,Workflow utils,Copy data element
231,Workflow utils,Copy folder
232,Workflow utils,Create folder


In [17]:
a_l['Group'].unique()

array(['ChIP-seq', 'Composite module analyses',
       'Differential algebraic equations', 'GATK', 'GTRD',
       'Gene set analysis', 'Import', 'Match sites and genes', 'MicroRNA',
       'Molecular networks', 'Motif discovery', 'Mutations',
       'NGS alignment', 'NGS color-space', 'NGS utils',
       'Operations with genomic tracks', 'Parameter fitting',
       'Plots and charts', 'RNA-seq', 'Statistics',
       'TF binding site search', 'Table manipulation', 'Workflow utils'],
      dtype=object)

In [47]:
a_l[a_l['Group'] == 'Table manipulation']

Unnamed: 0,Group,Name
213,Table manipulation,Add calculated column
214,Table manipulation,Annotate table
215,Table manipulation,Convert table
216,Table manipulation,Convert table via homology
217,Table manipulation,Filter duplicate rows
218,Table manipulation,Filter table
219,Table manipulation,Group table rows
220,Table manipulation,Join several tables
221,Table manipulation,Join two tables
222,Table manipulation,Merge table columns
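
When the group is not known in advance, searching the analysis names by keyword is a quick alternative. The helper below is a hypothetical convenience, not a pybiouml method; it relies only on the `Group` and `Name` columns visible in the output above.

```python
def find_analyses(analyses, keyword):
    """Return (group, name) pairs whose name contains `keyword`, ignoring case.

    Hypothetical helper; `analyses` is the table returned by analysis_list().
    """
    needle = keyword.lower()
    return [
        (group, name)
        for group, name in zip(analyses["Group"], analyses["Name"])
        if needle in name.lower()
    ]


# find_analyses(a_l, 'filter')
```

Returning plain tuples keeps the result easy to print or loop over; the names can then be passed on to `analysis_parameters`.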


Each BioUML analysis has a set of parameters. The `analysis_parameters` method returns a `pandas.DataFrame` with one row per parameter, giving its name and description.

In [49]:
my_work.analysis_parameters('Filter table')

Unnamed: 0,Name,Description
0,inputPath,Table to filter
1,filterExpression,Expression in JavaScript like 'ColumnName1 > 5...
2,filteringMode,Which rows to select
3,outputPath,Path to the filtered table


The `analysis` method launches an analysis with the given parameters.

In [19]:
my_work.analysis('Filter table', 
                 parameters={
                     'inputPath': 'data/Examples/Optimization/Data/Experiments/exp_data_1', 
                     'filterExpression': 'time < 40',
                     'outputPath': 'data/Collaboration/Demo/Data/pybiouml_test/exp_data_1 filtered'
                 }
                )

INFO - Analysis 'Filter table' added to queue
INFO - Analysis 'Filter table' started
INFO - Filtering...



INFO - Writing result...

INFO - Analysis 'Filter table' finished (3.968 s)

RJOB202202131652562
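
A misspelled parameter name is easy to miss in a dictionary like the one above. One way to catch it before the job is queued is to compare the keys with the names reported by `analysis_parameters`. This is a sketch under assumptions: `run_checked` is a hypothetical wrapper, and it assumes that the `Name` column shown earlier lists every accepted parameter.

```python
def run_checked(client, analysis_name, parameters):
    """Launch an analysis only if every parameter name is known.

    Hypothetical wrapper around analysis_parameters() and analysis().
    """
    known = set(client.analysis_parameters(analysis_name)["Name"])
    unknown = sorted(set(parameters) - known)
    if unknown:
        raise ValueError(f"unknown parameters for {analysis_name!r}: {unknown}")
    # Omitted parameters (such as filteringMode above) are left to the server.
    return client.analysis(analysis_name, parameters=parameters)


# run_checked(my_work, 'Filter table', {'inputPath': '...', 'outputPath': '...'})
```

The check is deliberately one-sided: it rejects unknown names but does not require every parameter to be present, since the example above runs without `filteringMode`.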


### Importing and exporting files in BioUML

As described previously, a `pandas.DataFrame` can be fetched from and stored to the BioUML repository using the pybiouml `get` and `put` methods. In addition, data can be imported from files and exported to files in various formats. The list of importers can be obtained with the `importers` method.

In [58]:
my_work.importers()[:10]

['BioUML format(*.dml)',
 'BioUML Simulation result',
 'ZIP-archive (*.zip)',
 'Generic file',
 'Image file (*.png, *jpeg, *.gif etc)',
 'Text file (*.txt)',
 'HTML file (*.html, *.htm)',
 'SBML',
 'SBML(CellDesigner)',
 'BioPAX file (*.owl, *.xml)']

As an example, we will import a FASTA file to BioUML. This FASTA file can be downloaded from our [GitHub repository](https://github.com/Biosoft-ru/pybiouml).

In [20]:
fasta = 'hiv1.fna'
out = 'data/Collaboration/Demo/Data/pybiouml_test'
my_work.to_import(fasta, out, importer='Fasta format (*.fasta)')
    



data/Collaboration/Demo/Data/pybiouml_test/hiv1


'data/Collaboration/Demo/Data/pybiouml_test/hiv1'

In [21]:
my_work.ls(out)

Unnamed: 0,name,hasChildren,class
0,exp_1_pybiouml,False,0
1,exp_data_1 filtered,False,0
2,hiv1,True,1
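
To import several files in one go, the `to_import` call can be wrapped in a loop. The sketch below is not part of pybiouml: `import_files` is a hypothetical helper, and it assumes that `to_import` returns the repository path of the new element, as the output above suggests.

```python
from pathlib import Path


def import_files(client, files, target_folder, importer):
    """Import local files one by one; return {local file: repository path}.

    Hypothetical helper; assumes to_import() returns the new element's path.
    """
    paths = [Path(f) for f in files]
    missing = [str(p) for p in paths if not p.is_file()]
    if missing:
        # Fail before anything is uploaded.
        raise FileNotFoundError(f"not found: {missing}")
    return {
        str(p): client.to_import(str(p), target_folder, importer=importer)
        for p in paths
    }


# import_files(my_work, ['hiv1.fna'], out, 'Fasta format (*.fasta)')
```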


Similarly, we can use the `export` method to export data from the BioUML repository. The list of exporters can be obtained with the `exporters` method.

In [22]:
my_work.exporters()

['JPEG file (*.jpg)',
 'Bitmap file(*.bmp)',
 'Portable Network Graphics (*.png)',
 'BioUML format(*.dml)',
 'BioUML state (*.xml)',
 'Pair graph file(*.txt)',
 'Archive containing exported elements (*.zip)',
 'Generic file',
 'Zipped HTML file',
 'SBML',
 'BioPAX (*.owl)',
 'FASTA sequences (*.fasta)',
 'BED format (*.bed)',
 'Interval format (*.interval)',
 'General Feature Format (*.gff)',
 'Gene Transfer Format (*.gtf)',
 'Match format (*.match)',
 'Variant Call Format (*.vcf)',
 'Wiggle format (*.wig)',
 'SAM or BAM alignment file (*.sam, *.bam)',
 'ZHTML document (*.zhtml)',
 'SDF file (*.sdf)',
 'GraphML(*.graphml)',
 'Scalable Vector Graphics(*.svg)',
 'SBGN-ML',
 'COMBINE archive (*.omex)',
 'BioNetGen language format (*.bngl)',
 'Cytoscape (*.cx)',
 'Antimony',
 'Tab-separated text (*.txt)',
 'Comma-separated values (*.csv)',
 'HTML document (*.html)']
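
With this many formats, looking up a name by file extension can be handy. The following sketch is a hypothetical helper that works on the plain list of names returned by `exporters` (or `importers`); it only inspects the `*.ext` patterns that most names carry in parentheses, so formats without such a pattern (for example `SBML`) are never matched.

```python
import re


def formats_for_extension(format_names, extension):
    """Return the format names that list `extension` among their patterns.

    Hypothetical helper; matching is exact and case-insensitive.
    """
    wanted = extension.lstrip(".").lower()
    return [
        name
        for name in format_names
        # Collects 'sam' and 'bam' from '(*.sam, *.bam)', and 'jpeg' from '*jpeg'.
        if wanted in (ext.lower() for ext in re.findall(r"\*\.?(\w+)", name))
    ]


# formats_for_extension(my_work.exporters(), 'fasta')
```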

In [24]:
my_work.export('data/Collaboration/Demo/Data/pybiouml_test/hiv1.fa', 
               exporter='Fasta format (*.fasta)', 
               target_file='downloaded_hiv1.fa'
              )

In [1]:
import os
os.listdir()

['.ipynb_checkpoints', 'downloaded_hiv1.fa', 'hiv1.fna', 'Use_cases.ipynb']
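
The `os.listdir()` check above can also be folded into the export step. A minimal sketch, assuming that `export` writes the result to `target_file` on the local disk as in the example; `export_and_check` is a hypothetical helper, not a pybiouml method.

```python
import os


def export_and_check(client, path, exporter, target_file):
    """Export a repository element and return the local file size in bytes.

    Hypothetical helper; assumes export() writes to `target_file` locally.
    """
    client.export(path, exporter=exporter, target_file=target_file)
    if not os.path.isfile(target_file):
        raise RuntimeError(f"{target_file} was not created")
    return os.path.getsize(target_file)


# export_and_check(my_work, '<repository path>', '<exporter name>', 'downloaded_hiv1.fa')
```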