<h1 style="color:purple"  align="center"><b><u>kProcessor: GenericDecoder Tutorial</u></b></h1>

### Data Download

In [1]:
#!wget https://github.com/drtamermansour/nu-ngs02/raw/master/kProcessor/data/data.zip
#!unzip data.zip
#!ls data/
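
If `wget` and `unzip` are not available, the same download can be sketched with the Python standard library. The helper names `download` and `extract` below are illustrative and are not part of kProcessor; the URL is the one used in the cell above.

```python
import urllib.request
import zipfile
from pathlib import Path

# Same archive as in the commented-out wget call above.
DATA_URL = "https://github.com/drtamermansour/nu-ngs02/raw/master/kProcessor/data/data.zip"


def download(url, dest):
    """Download url to dest unless dest already exists; return dest as a Path."""
    dest = Path(dest)
    if not dest.exists():
        urllib.request.urlretrieve(url, dest)
    return dest


def extract(zip_path, out_dir="."):
    """Extract the archive into out_dir and return the sorted member names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out_dir)
        return sorted(archive.namelist())
```

Calling `extract(download(DATA_URL, "data.zip"))` should leave the archive contents in the working directory, assuming the URL is still reachable.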

### Import the kProcessor Python package

In [2]:
import kProcessor as kp

<hr style="height:2px;">

<h2 align="center"> <u> Main Functions </u> </h2>
<hr>


<hr style="height:1px;">

<h3 style="color:green">1- Parsing GenericDecoder object</h3>
<ul>
<li><b>kp.parseSequences(GenericDecoder, kDataFrame): </b>Parse the database and insert the children into the kDataFrame along with their counts</li>
</ul>

<hr style="height:1px;">

<h3 style="color:green">2- Loading and saving hashed values of children</h3>
<ul>
<li><b>kp.initialize_MAPhasher(ifilePath): </b>load the saved hash files that share the ifilePath prefix</li>
</ul>
<ul>
<li><b>kp.save_MAPhasher(hashMethod, ifilePath): </b>save the hashed values held in hashMethod to files with the ifilePath prefix</li>
</ul>
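
The reason for saving and reloading hashed values is that the same child should map to the same value across different decoders. The sketch below illustrates that idea in plain Python only. It does not use kProcessor, and the function names, the JSON format, and the `.hashes.json` suffix are all hypothetical; the actual MAPhasher file format is not described in this tutorial.

```python
import json
from pathlib import Path


def hash_child(table, child):
    """Return the integer id of child, assigning the next free id if it is new."""
    return table.setdefault(child, len(table))


def save_table(table, prefix):
    """Write the table to '<prefix>.hashes.json' (hypothetical file name)."""
    path = Path(f"{prefix}.hashes.json")
    path.write_text(json.dumps(table))
    return path


def load_table(prefix):
    """Read back a table written by save_table."""
    return json.loads(Path(f"{prefix}.hashes.json").read_text())
```

A table loaded with `load_table` keeps the ids assigned in the earlier run, so a child seen in both databases receives the same id in each.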

<hr style="height:1px;">

<h3 style="color:green"><b>3- Initialization </b></h3>
<ul>
<li><b>kp.initialize_genericDecoder(ofilePath, ifilePath, source): </b> prepare a GenericDecoder to parse ifilePath and extract association lists according to the given source</li>
</ul>
<ul>
<li><b>kp.initialize_genericDecoder(ofilePath, source): </b> prepare a GenericDecoder to fetch the latest version of the given database (source) so that it can be parsed</li>
</ul>
<ul>
<li><b>kp.initialize_genericDecoder(aListFilePath): </b> prepare a GenericDecoder to parse a JSON file that contains a list of association lists</li>
</ul>
<ul>
<li><b>kp.initialize_genericDecoder(ofilePath, ifilePath, source, hashMethod): </b> initialize a GenericDecoder that reuses previously hashed values of children</li>
</ul>
    
<hr style="height:1px;">

<h3 style="color:green"><b>4- Adding options </b></h3>
<ul>
<li><b>kp.set_hashMethod(GD, hashMethod): </b> set the GenericDecoder to reuse previously hashed values of children</li>
</ul>
<ul>
<li><b>kp.set_minList(GD, minList): </b> set the minimum length of association lists, so that only parents with at least that many children are decoded</li>
</ul>
<ul>
<li><b>kp.set_filterPath(GD, filterPath): </b> add a filter that controls which parents are added, based on the values provided by the database</li>
</ul>
<ul>
<li><b>kp.set_dictionaryPath(GD, dictionaryPath): </b> translate children using a dictionary file that lists each word and its meaning in tab-separated format</li>
</ul>
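
To make the effect of the minimum list length and the dictionary concrete, here is a plain-Python sketch that applies both to a small set of association lists. It does not call kProcessor. The function names are hypothetical, and it assumes that the minimum is inclusive and that children missing from the dictionary are kept unchanged.

```python
import csv
import io


def load_dictionary(handle):
    """Read 'word<TAB>meaning' rows into a dict, skipping malformed rows."""
    reader = csv.reader(handle, delimiter="\t")
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def apply_options(assoc, dictionary=None, min_list=0):
    """Translate children and drop parents with fewer than min_list children."""
    dictionary = dictionary or {}
    result = {}
    for parent, children in assoc.items():
        # Assumption: a child without a dictionary entry is kept as-is.
        translated = [dictionary.get(child, child) for child in children]
        if len(translated) >= min_list:
            result[parent] = translated
    return result


# Illustrative data only; not taken from the tutorial's data files.
dictionary = load_dictionary(io.StringIO("g1\tBRCA1\ng2\tTP53\n"))
assoc = {"diseaseA": ["g1", "g2", "g3"], "diseaseB": ["g2"]}
filtered = apply_options(assoc, dictionary, min_list=2)
```

With these inputs, `diseaseB` is dropped because it has a single child, and `g1` and `g2` are replaced by their dictionary meanings.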
    
<hr style="height:1px;">

<h3 style="color:green"><b>5- Decoding </b></h3>
<ul>
<li><b>kp.decode(GD): </b> decode the GenericDecoder object to create the association lists, so that the GenericDecoder is ready to be indexed</li>
</ul>
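
An association list pairs one parent with its children. As a rough illustration of what decoding produces, the sketch below groups the rows of a tab-separated table by a parent column. It is not kProcessor's implementation. The column names and the choice of disease as parent and gene as child are assumptions made for this example.

```python
import csv
import io
from collections import defaultdict


def build_association_lists(handle, parent_col, child_col):
    """Group children by parent, keeping first-seen order and dropping repeats."""
    reader = csv.DictReader(handle, delimiter="\t")
    assoc = defaultdict(list)
    for row in reader:
        parent, child = row[parent_col], row[child_col]
        if child not in assoc[parent]:
            assoc[parent].append(child)
    return dict(assoc)


# Illustrative rows only; the real input file has more columns.
sample = io.StringIO(
    "geneSymbol\tdiseaseName\n"
    "BRCA1\tBreast cancer\n"
    "TP53\tBreast cancer\n"
    "TP53\tLi-Fraumeni syndrome\n"
)
print(build_association_lists(sample, "diseaseName", "geneSymbol"))
```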
    
<hr style="height:1px;">
    
<h3 style="color:green"><b>6- Indexing </b></h3>
<ul>
<li><b>kp.index(GenericDecoder, namesFile, kDataFrame): </b>Index the children of the decoded GenericDecoder with respect to namesFile, fill the passed kDataFrame, and return a coloredKDataFrame</li>
</ul>
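
Conceptually, a colored index assigns each child a color that stands for the set of parents containing it. The following plain-Python sketch shows that idea on a toy input. It is only an illustration under that assumption and does not reflect how kProcessor stores colors internally; `color_children` is a hypothetical name.

```python
def color_children(assoc):
    """Return (child -> color id, color id -> sorted parents).

    Children that belong to exactly the same set of parents share a color.
    """
    parents_of = {}
    for parent, children in assoc.items():
        for child in children:
            parents_of.setdefault(child, set()).add(parent)

    color_of_set = {}
    child_color = {}
    for child in sorted(parents_of):
        key = frozenset(parents_of[child])
        # Colors are numbered from 1 in order of first appearance.
        child_color[child] = color_of_set.setdefault(key, len(color_of_set) + 1)

    colors = {color: sorted(key) for key, color in color_of_set.items()}
    return child_color, colors


# Illustrative association lists only.
child_color, colors = color_children({"D1": ["A", "B", "E"], "D2": ["B", "C"]})
```

Here `A` and `E` share a color because both appear only under `D1`, while `B` gets its own color for the pair `D1`, `D2`.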
    
<hr style="height:1px;">

<hr style="height:2px;">

<h2  align="center"> Example 1 (Initialize GenericDecoder from offline data) </h2>
<hr>
<h3><b>Description</b></h3>
<ol>
    <li> Initialize GenericDecoder </li> 
    <li> Decode input file</li>
</ol>

### Initialize GenericDecoder

In [2]:
GD = kp.initialize_genericDecoder("data/", "data/all_gene_disease_associations", "DisGeNET")

### Decode input file

In [3]:
GD = kp.decode(GD)

<hr style="height:5px">

<h2  align="center"> Example 2 (Loading hashed values for use in a different GenericDecoder object) </h2>
<hr>
<h3><b>Description</b></h3>
<ol>
    <li> Save hashed values of a decoded GenericDecoder (saved by default after decoding with the same prefix as ifilePath) </li> 
    <li> Initialize MAPhasher from the saved file</li>
    <li> Pass the MAPhasher to a new GenericDecoder object</li>
    <li> Save the MAPhasher on disk</li>
</ol>

### Save hashed values of a decoded GenericDecoder (saved by default after decoding with the same prefix as ifilePath)

In [4]:
GD = kp.decode(GD)

### Initialize MAPhasher from the saved file

In [5]:
hashMethod = kp.initialize_MAPhasher("data/all_gene_disease_associations")

### Pass the MAPhasher to a new GenericDecoder object

In [6]:
GD1 = kp.initialize_genericDecoder("data/", "data/CPDB_pathways_genes", "consensusPathDB", hashMethod)
GD1 = kp.decode(GD1)

### Save the MAPhasher on disk

In [6]:
kp.save_MAPhasher(hashMethod, "data/all_gene_disease_associations")

<hr style="height:5px">

<h2  align="center"> Example 3 (Initialize GenericDecoder to download the latest version of a database) </h2>
<hr>
<h3><b>Description</b></h3>
<ol>
    <li> Initialize GenericDecoder </li> 
    <li> Decode downloaded file</li>
</ol>

### Initialize GenericDecoder

In [7]:
GD = kp.initialize_genericDecoder("data/", "DisGeNET")

### Decode downloaded file

In [8]:
GD = kp.decode(GD)

<hr style="height:5px">

<h2  align="center"> Example 4 (Adding features to GenericDecoder before decoding) </h2>
<hr>
<h3><b>Description</b></h3>
<ol>
    <li> Add filter </li> 
    <li> Set minimum number of children in association lists </li>
    <li> Use an initialized MAPhasher </li>
</ol>

### Add filter

In [9]:
GD = kp.set_filterPath(GD, "data/all_gene_disease_associations_filter")

### Set minimum number of children in association lists

In [10]:
GD = kp.set_minList(GD, 7)

### Use an initialized MAPhasher

In [11]:
GD = kp.set_hashMethod(GD, hashMethod)

<hr style="height:5px">

<h2  align="center"> Example 5 (Concatenate two GenericDecoder objects) </h2>
<hr>
<h3><b>Description</b></h3>
<ol>
    <li>Initialize and decode a GenericDecoder object</li> 
    <li>Initialize and decode another GenericDecoder object</li>
</ol>

In [12]:
# Initialize and decode a GenericDecoder object
GD = kp.initialize_genericDecoder("data/", "DisGeNET")
GD = kp.decode(GD)

In [13]:
# Initialize another GenericDecoder object to be appended to GD
GD1 = kp.initialize_genericDecoder(GD, "data/", "consensusPathDB")
# Decode the concatenated GenericDecoder objects
GD = kp.decode(GD)

<h2  align="center"> Example 6 (Indexing a GenericDecoder object) </h2>
<hr>
<h3><b>Description</b></h3>
<ol>
    <li>Initialize and decode a GenericDecoder object.</li> 
    <li>Initialize kDataFramePHMAP object.</li> 
    <li>Index GenericDecoder object into kDataFramePHMAP object.</li> 
</ol>

In [14]:
# Initialize and decode a GenericDecoder object
GD = kp.initialize_genericDecoder("data/", "DisGeNET")
GD = kp.set_filterPath(GD, "data/all_gene_disease_associations_filter")
GD = kp.decode(GD)

In [15]:
# Initialize kDataFramePHMAP object
kf = kp.kDataFramePHMAP(21)

In [16]:
# Perform indexing (pass the kDataFrame created above as kf)
cfk = kp.index(GD, "data/all_gene_disease_associations.tsv.names", kf)

# Save the coloredKDataFrame to disk
cfk.save("data/all_gene_disease_associations_indexed")