The unsupervised biclustering strategy works on both interaction data and expression data. It first converts expression data into binary data using a mixture of left-truncated Gaussian distributions model (LTMG), then finds biclusters using a novel encoding and template-searching strategy, and finally reports the biclusters in two modes, base and flex. In base mode RUBic generates maximal biclusters (green borders); in flex mode it produces fewer, biologically significant biclusters (red borders). Coloured cells within the clusters indicate the selected row and column positions.
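To make the notion of a bicluster in binarised data concrete, here is a minimal Python sketch. It assumes that a bicluster is a set of rows and columns whose cells are all 1 in the binary matrix, and that "maximal" means no further row or column can be added without breaking that property. The function names are hypothetical, and this is only a checker for illustration, not RUBic's encoding or template-searching procedure.

```python
def is_bicluster(matrix, rows, cols):
    """Return True if every selected cell of the binary matrix is 1."""
    return all(matrix[r][c] == 1 for r in rows for c in cols)


def is_maximal(matrix, rows, cols):
    """Return True if no extra row or column can be added while keeping all cells 1."""
    if not is_bicluster(matrix, rows, cols):
        return False
    other_rows = [r for r in range(len(matrix)) if r not in rows]
    other_cols = [c for c in range(len(matrix[0])) if c not in cols]
    # A row can be added if it is 1 in every selected column (and likewise for columns).
    can_add_row = any(is_bicluster(matrix, [r], cols) for r in other_rows)
    can_add_col = any(is_bicluster(matrix, rows, [c]) for c in other_cols)
    return not (can_add_row or can_add_col)


# Toy binary matrix (3 rows, 4 columns), made up for this example.
toy = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
]

print(is_bicluster(toy, [0, 1], [0, 1]))  # True: all four cells are 1
print(is_maximal(toy, [0, 1], [0, 1]))    # True: no row or column can be added
print(is_maximal(toy, [0], [0]))          # False: row 1 could still be added
```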
If you have used RUBic in your research, please cite the following publication:
Sriwastava, B.K., Halder, A.K., Basu, S., Chakraborti, T. RUBic: rapid unsupervised biclustering. BMC Bioinformatics 24, 435 (2023). DOI https://doi.org/10.1186/s12859-023-05534-3
The data directory contains the dummy data, five expression datasets and a PPI data matrix.
The dummy data includes two files:
a) SBMat.txt: sample input binary data,
b) resultRB.txt: corresponding output file generated by RUBIC on the dummy input data.
Five different experimental datasets, along with a PPI data matrix, are also included in the data directory:
a) Expression+KEGG : contains the expression matrix, binary matrix and KEGG annotation for each of 4 sets (ecoli_colombos, ecoli_dream5, yeast_dream5 and yeast_gpl2529)
b) Match_score_csv
c) raw datset_CNS
d) Performance_test_csv
e) Match_score_density_200x200_csv and
f) PPI
The RUBIC directory contains the biclustering script (RUBIC.c), installation scripts (P1-installandCompile.sh, P2-runwargs.sh) and a Jupyter notebook (RUBIC-Result-Analysis.ipynb) with auxiliary Python scripts (load_matrix_data.py, ParseCluster.py, plotHeatmap.py).
In any Linux environment, open a terminal, navigate to the RUBIC directory and execute the following commands:
chmod +x P1-installandCompile.sh
./P1-installandCompile.sh RUBIC.c
chmod +x P2-runwargs.sh
./P2-runwargs.sh RUBIC inputdata.txt output.txt 2 2 1
Compilation :
a) GCC compiler and/or C++11-compatible compiler
Result Processing and visualisation :
a) python >= 3
b) seaborn
RUBic accepts input in two formats:
- Binary matrix [interaction data, e.g. protein-protein interactions, drug-drug interactions]
- Non-binary matrix [gene expression data of m rows (genes) and n columns (conditions)]
The data file should be comma delimited. A sample data file is provided in the RUBIC directory.
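As an illustration of the comma-delimited input format, the following Python sketch binarises a small expression matrix and writes it to a text file. It uses a plain fixed cutoff as a stand-in for binarisation; it is not the LTMG model. The function names, the cutoff value and the toy values are assumptions made for this example, and the sketch assumes the file holds bare numeric rows without row or column labels, which should be checked against the sample file in the RUBIC directory.

```python
import csv
import os
import tempfile


def binarize(expression, cutoff):
    """Map each expression value to 1 if it exceeds the cutoff, else 0 (not LTMG)."""
    return [[1 if value > cutoff else 0 for value in row] for row in expression]


def write_matrix(path, matrix):
    """Write the matrix as comma-delimited text, one matrix row per line."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle, lineterminator="\n").writerows(matrix)


def read_matrix(path):
    """Read a comma-delimited integer matrix back into a list of lists."""
    with open(path, newline="") as handle:
        return [[int(cell) for cell in row] for row in csv.reader(handle) if row]


# Toy expression values (2 genes x 3 conditions), made up for this example.
expression = [
    [0.2, 3.1, 2.7],
    [4.0, 0.1, 2.9],
]
binary = binarize(expression, cutoff=2.0)

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "inputdata.txt")
    write_matrix(path, binary)
    print(read_matrix(path))  # [[0, 1, 1], [1, 0, 1]]
```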
Step 1 : Compile the RUBIC.c file with the GCC compiler to produce the executable RUBIC.o.
Step 2 : If the input is expression data rather than a binary matrix, convert it into a binary matrix.
Step 3 : Keep the input file in the same directory as the executable <Example: inputdata.txt>
Step 4 : Execute RUBIC with the command:
./RUBIC.o <inputfile> <outputfile> <mnc> <mnr> <threshold>
inputfile: input file name
outputfile: output file name
mnc: minimum number of columns in a bicluster
mnr: minimum number of rows in a bicluster
threshold: threshold value; use 1 for binary data.
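For users who prefer to drive the run from Python, the steps above can be wrapped in a small helper. This is a sketch under stated assumptions: the argument order follows the command shown above, the executable is assumed to be ./RUBIC.o in the current directory, and the helper names build_rubic_command and run_rubic are hypothetical (they are not part of the repository).

```python
import subprocess


def build_rubic_command(executable, input_file, output_file, mnc, mnr, threshold):
    """Assemble the argument list in the order: input, output, mnc, mnr, threshold."""
    return [executable, input_file, output_file, str(mnc), str(mnr), str(threshold)]


def run_rubic(executable, input_file, output_file, mnc=2, mnr=2, threshold=1):
    """Run the compiled program and raise CalledProcessError on a non-zero exit."""
    command = build_rubic_command(executable, input_file, output_file, mnc, mnr, threshold)
    return subprocess.run(command, check=True)


# Show the command line without executing it.
command = build_rubic_command("./RUBIC.o", "inputdata.txt", "output.txt", 2, 2, 1)
print(" ".join(command))  # ./RUBIC.o inputdata.txt output.txt 2 2 1
```

Calling run_rubic("./RUBIC.o", "inputdata.txt", "output.txt") would then launch the program, provided the executable and the input file exist in the working directory.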
To visualise the results we have created a Jupyter notebook. You can find the details at (https://github.com/CMATERJU-BIOINFO/RUBic/blob/main/RUBIC/RUBIC-Result-Analysis.ipynb). The notebook first shows the input binary matrix as a visual representation and marks the row and column positions of the biclusters identified by RUBic on the dummy data. The later section of the notebook describes the figure preparation: expression-level heatmaps and plots.
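The idea of marking bicluster positions on a binary matrix can be sketched independently of the notebook. The example below is an assumption-laden illustration: it is not the repository's plotHeatmap.py or ParseCluster.py, the function names are hypothetical, and the row and column indices are supplied by hand rather than parsed from a RUBic output file. The plotting function requires seaborn and matplotlib and is only defined here, not called.

```python
def bicluster_mask(n_rows, n_cols, rows, cols):
    """Return a boolean grid that is True at the cells selected by the bicluster."""
    row_set, col_set = set(rows), set(cols)
    return [[r in row_set and c in col_set for c in range(n_cols)] for r in range(n_rows)]


def plot_bicluster(matrix, rows, cols, out_path):
    """Save a heatmap in which bicluster cells get the value 2 (darkest shade)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    mask = bicluster_mask(len(matrix), len(matrix[0]), rows, cols)
    # Keep the original 0/1 values outside the bicluster; use 2 for selected cells.
    shaded = [
        [2 if mask[r][c] else matrix[r][c] for c in range(len(matrix[0]))]
        for r in range(len(matrix))
    ]
    fig = plt.figure()
    ax = sns.heatmap(shaded, cmap="Greens", cbar=False, linewidths=0.5, linecolor="grey")
    ax.set_xlabel("column")
    ax.set_ylabel("row")
    fig.savefig(out_path, bbox_inches="tight")
    plt.close(fig)


# Toy example: a 3 x 4 grid with rows 0-1 and columns 0-1 selected.
mask = bicluster_mask(3, 4, rows=[0, 1], cols=[0, 1])
for line in mask:
    print(line)
```

To produce an image, call for example plot_bicluster(binary_matrix, [0, 1], [0, 1], "bicluster.png") with a binary matrix given as a list of lists.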