## Tutorial - VHIP Network Analysis
### This package can be used to visualize the output of VHIP 2.0. Options include prediction heatmaps, probability heatmaps, and network graphs. There are also functions to calculate statistics for the networks, such as centrality. The steps below describe how to achieve each of these.
### All of the functionalities are run on a small subset of the data (only 6 viruses and 3 hosts) to help users gain understanding, and then a larger subset is used at the end.

In [None]:
%reload_ext autoreload
%autoreload 2

#### Import libraries

In [1]:
from VirusHostNetworkAnalysis.prediction_matrix import PredictionMatrix
from VirusHostNetworkAnalysis.visualize import Graph

#### Create a matrix object by passing in the data file that is output by VHIP 2.0.
#### If VHIP 2.0 was run correctly, there should be 7 columns in the dataset.
#### This tutorial first uses the data_subset.tsv file from the Sample_Input folder, and later the larger Aug4_predictions.tsv file from the same folder.

In [None]:
subset = PredictionMatrix("Sample_Input/data_subset.tsv")
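
#### Before loading a file, it can help to confirm that it really has 7 tab-separated columns. The check below is a minimal sketch using only the standard library. The helper name count_columns and the placeholder column names c1..c7 are hypothetical and are not part of the package or of the real VHIP 2.0 output.

```python
import csv
import io

def count_columns(tsv_text):
    """Return the set of distinct column counts found across non-empty rows."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return {len(row) for row in reader if row}

# Hypothetical two-line file; real VHIP 2.0 column names will differ.
sample = (
    "c1\tc2\tc3\tc4\tc5\tc6\tc7\n"
    "v1\th1\t0.1\t0.2\t0.3\t1\t0.9\n"
)
widths = count_columns(sample)
print(widths)  # a single value means every row has the same width
```

#### For a real file, the same helper can be applied to the text returned by open(path).read().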

#### The function make_rectangular_matrix() can be called with the argument "prediction" to make a prediction matrix or with the argument "probability" to make a probability matrix. The tool transforms the input data, where each row represents a virus:host pair, into a matrix where hosts are columns and viruses are rows. The resulting matrix will likely be very tall, as it is common for an area to have many more viruses than hosts.
#### The examples below build the prediction matrix first and then the probability matrix.

In [None]:
subset.make_rectangular_matrix('prediction')

In [None]:
subset.make_rectangular_matrix('probability')
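
#### The pivot from virus:host pairs to a viruses-by-hosts matrix can be sketched in plain Python as below. This is only an illustration of the idea, not the package's implementation: the function name make_rectangular, the record layout (virus, host, value), and the alphabetical ordering are all assumptions.

```python
def make_rectangular(pairs):
    """Pivot (virus, host, value) records into a viruses-by-hosts matrix."""
    viruses = sorted({virus for virus, _, _ in pairs})
    hosts = sorted({host for _, host, _ in pairs})
    row_of = {virus: i for i, virus in enumerate(viruses)}
    col_of = {host: j for j, host in enumerate(hosts)}
    # Pairs that never appear in the input stay at 0.
    matrix = [[0] * len(hosts) for _ in viruses]
    for virus, host, value in pairs:
        matrix[row_of[virus]][col_of[host]] = value
    return viruses, hosts, matrix

# Made-up predictions for 3 viruses and 2 hosts.
pairs = [
    ("v1", "h1", 1), ("v1", "h2", 0),
    ("v2", "h1", 0), ("v2", "h2", 1),
    ("v3", "h1", 1), ("v3", "h2", 1),
]
viruses, hosts, matrix = make_rectangular(pairs)
for name, row in zip(viruses, matrix):
    print(name, row)
```

#### Passing probabilities instead of 0/1 predictions as the third element gives the probability version of the same matrix.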

#### Because the numbers of viruses and hosts are imbalanced, it is possible to convert the matrix to a square by expanding the rows and columns. To do this, viruses are added as columns and hosts are added as rows, and it is assumed for the purpose of this project that virus-virus and host-host interactions are non-existent. Although these relationships might exist in the wild, this project only looks at virus-host interactions. The image below shows how the matrix is expanded.

![image.png](attachment:image.png)

#### The code below shows how to transform your matrix into a square using the make_square_matrix() function. Here we used "prediction", but as with the rectangular matrix, it is possible to fill it with probabilities.

In [None]:
subset.make_square_matrix('prediction')
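
#### The expansion can be sketched as follows. The layout is an assumption: rows and columns are both ordered as all viruses followed by all hosts, and the added host rows mirror the virus-host entries so that the result is symmetric. The package may order or fill the square differently; only the zero virus-virus and host-host blocks come from the description above.

```python
def make_square(rect):
    """Expand a viruses-by-hosts matrix into a (viruses+hosts) square matrix."""
    n_viruses, n_hosts = len(rect), len(rect[0])
    size = n_viruses + n_hosts
    # Start from all zeros: virus-virus and host-host blocks stay empty.
    square = [[0] * size for _ in range(size)]
    for i in range(n_viruses):
        for j in range(n_hosts):
            square[i][n_viruses + j] = rect[i][j]  # virus row, host column
            square[n_viruses + j][i] = rect[i][j]  # mirrored host row, virus column
    return square

rect = [[1, 0], [0, 1], [1, 1]]  # 3 viruses x 2 hosts, made-up values
square = make_square(rect)
for row in square:
    print(row)
```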

#### Now, a heatmap can be created from the rectangular matrix. Heatmaps are useful for this virus-host data, because they can tell us more information about the infection relationships in a community. Our tool sorts the columns and rows, so that patterns in the matrices are easier to observe. The image below shows examples of patterns that we might see in a virus-host matrix for a particular population.

![image.png](attachment:image.png)

#### The plot_heatmap function takes one argument, where 'prediction' or 'probability' can be passed.

In [None]:
subset.plot_heatmap('prediction')
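
#### The sorting criterion used by the tool is not stated in this tutorial. One common choice, shown here purely as an assumption, is to order rows and columns by how many interactions they have, which tends to push filled cells toward one corner and makes nested patterns easier to see. The function name sort_by_degree is hypothetical.

```python
def sort_by_degree(matrix):
    """Reorder rows and columns by descending number of filled cells."""
    row_order = sorted(range(len(matrix)), key=lambda i: -sum(matrix[i]))
    col_sums = [sum(col) for col in zip(*matrix)]
    col_order = sorted(range(len(col_sums)), key=lambda j: -col_sums[j])
    sorted_matrix = [[matrix[i][j] for j in col_order] for i in row_order]
    return row_order, col_order, sorted_matrix

# Made-up 3 x 3 prediction matrix.
unsorted = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
row_order, col_order, sorted_matrix = sort_by_degree(unsorted)
for row in sorted_matrix:
    print(row)
```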


In [None]:
print(subset.virus_host_array)
# Get the second column of the virus-host array
subset.virus_host_array[:, 1]


#### This tool can also be used to build networks for the predicted virus-host pairs. First, we will show the actual graph for the data.

In [None]:
subset_graph = Graph(subset.virus_host_array, subset.rows, subset.columns)
subset_graph.draw_graph(False)

#### Next, we will create a random graph for the subset data using an Erdős–Rényi (ER) model. For each virus-host pair, the ER model draws a random number p, and if p is over a preset cutoff, an edge is drawn between the two. The purpose of this random graph is to serve as a null model for the actual network.

In [None]:
from VirusHostNetworkAnalysis.null_model import ER
subset_er = ER(len(subset.rows), len(subset.columns), 0.1).fill_ER_graph()
subset_rand_graph = Graph(subset_er, subset.rows, subset.columns)
subset_rand_graph.draw_graph(False)
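
#### A bipartite ER-style matrix can be sketched with the standard library as below. This uses the textbook form of the model, where each virus-host pair gets an edge independently with probability p. The function name er_bipartite is hypothetical, and the direction of the comparison inside the package's ER class may differ from this sketch.

```python
import random

def er_bipartite(n_viruses, n_hosts, p, seed=None):
    """Fill a viruses-by-hosts 0/1 matrix, each cell set to 1 with probability p."""
    rng = random.Random(seed)
    return [
        [1 if rng.random() < p else 0 for _ in range(n_hosts)]
        for _ in range(n_viruses)
    ]

# 6 viruses and 3 hosts, matching the size of the small subset.
random_matrix = er_bipartite(6, 3, 0.1, seed=0)
for row in random_matrix:
    print(row)
```

#### Fixing the seed makes the random matrix reproducible, which is useful when comparing the null model against the observed network.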

#### Finally, a configuration model is constructed and displayed for the subset data.

In [None]:
from VirusHostNetworkAnalysis.null_model import CM
subset_config = CM(subset.virus_host_array)
subset_config.bootstrap_stats(1000)
subset_plot_config = Graph(subset_config.matrix_vhip, subset.rows, subset.columns)
subset_plot_config.draw_graph(False)
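
#### A configuration model keeps the number of interactions of every virus and every host fixed while shuffling who interacts with whom. The sketch below uses checkerboard edge swaps, a standard technique for this, and is not necessarily what the CM class does internally. The function name swap_randomize and its parameters are hypothetical.

```python
import random

def swap_randomize(matrix, n_attempts, seed=None):
    """Shuffle a 0/1 matrix with edge swaps that preserve row and column sums."""
    rng = random.Random(seed)
    m = [row[:] for row in matrix]  # work on a copy
    n_rows, n_cols = len(m), len(m[0])
    for _ in range(n_attempts):
        r1, r2 = rng.randrange(n_rows), rng.randrange(n_rows)
        c1, c2 = rng.randrange(n_cols), rng.randrange(n_cols)
        # Swap only when the 2 x 2 submatrix is a checkerboard (1 0 / 0 1).
        if m[r1][c1] == 1 and m[r2][c2] == 1 and m[r1][c2] == 0 and m[r2][c1] == 0:
            m[r1][c1] = m[r2][c2] = 0
            m[r1][c2] = m[r2][c1] = 1
    return m

observed = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]
shuffled = swap_randomize(observed, n_attempts=1000, seed=42)
print([sum(row) for row in shuffled])        # per-virus degrees, unchanged
print([sum(col) for col in zip(*shuffled)])  # per-host degrees, unchanged
```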

### Now, we will show all of the above steps on a larger subset of the total data.

In [None]:
aug4_data = PredictionMatrix("Sample_Input/Aug4_predictions.tsv")
aug4_data.make_rectangular_matrix('prediction')
aug4_data.make_square_matrix('prediction')
aug4_data.plot_heatmap('prediction')

In [None]:
aug4_data.make_rectangular_matrix('probability')
aug4_data.plot_heatmap('probability')

In [None]:
from VirusHostNetworkAnalysis.visualize import Graph
ER_matrix = ER(len(aug4_data.rows), len(aug4_data.columns), 0.5).fill_ER_graph()
rand_graph = Graph(ER_matrix, aug4_data.rows, aug4_data.columns)
rand_graph.draw_graph(False)

### Centrality measurements

In [None]:
subset_config = CM(subset.virus_host_array)
subset_config.bootstrap_stats(1000)
subset_plot_config = Graph(subset_config.matrix_vhip, subset.rows, subset.columns)
subset_plot_config.initialize_graph()
subset_plot_config.draw_graph(True)
subset_plot_config.calculate_centrality(100)
subset_plot_config.plot_eigenvectors()
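
#### Eigenvector centrality is one common centrality measure, and it can be sketched with power iteration on a square adjacency matrix as below. This is an illustration under assumptions, not the package's calculate_centrality: the function name, max_iter, and tol are hypothetical, and the meaning of the argument 100 above is not stated in this tutorial. The sketch multiplies by A + I rather than A, because plain power iteration can oscillate on bipartite graphs such as virus-host networks.

```python
def eigenvector_centrality(adj, max_iter=100, tol=1e-10):
    """Approximate eigenvector centrality by power iteration on A + I."""
    n = len(adj)
    x = [1.0 / n] * n
    for _ in range(max_iter):
        # (A + I) x: the identity shift keeps the iteration from oscillating.
        new = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sum(value * value for value in new) ** 0.5
        new = [value / norm for value in new]
        if sum(abs(a - b) for a, b in zip(new, x)) < tol:
            return new
        x = new
    return x

# Star graph: node 0 is connected to nodes 1, 2 and 3.
star = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
scores = eigenvector_centrality(star)
print(scores)  # node 0 scores highest, the three leaves score equally
```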

In [2]:
from VirusHostNetworkAnalysis.prediction_matrix import Calculations
test_matrix = [[1, 0, 1, 1, 1], [1, 1, 1, 0, 0], [0, 1, 1, 1, 0], [1, 1, 0, 0, 0], [1, 1, 0, 0, 0]]
cal = Calculations(test_matrix, True)
cal.nestedness_rows(cal.pairs(0))
cal.nestedness_cols(cal.pairs(1))

[[1 0 1 1 1]
 [1 1 1 0 0]] [[1 0 1 1 1]
 [0 1 1 1 0]]
[[1 0 1 1 1]
 [1 1 1 0 0]]


ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

In [3]:
#cal.run_parallel(8)
print(cal.run_parallel(4))

[1 0 1 1 1] [0 1 1 1 0]
[1 0 1 1 1]
compare ran
[1 1 1 0 0] [0 1 1 1 0]
[1 0 1 1 1] [1 1 1 0 0]
compare ran
[1 1 0 0 0]
[1 0 1 1 1]
[1 1 1 0 0] compare ran
[1 1 0 0 0]
[1 1 1 0 0]
compare ran
[1 0 1 1 1] [1 1 1 0 0][1 1 1 0 0]
 [1 1 0 0 0]
[1 0 1 1 1]
compare ran
[1 1 1 0 0]
compare ran
[0 1 1 1 0] [1 1 0 0 0]
[0 1 1 1 0]
[1 1 0 0 0] compare ran
[1 1 0 0 0]
[1 1 0 0 0][0 1 1 1 0]
 compare ran
[1 1 0 0 0]
[0 1 1 1 0]
compare ran
[1 0 1 1 1] [1 1 0 0 0]
[1 0 1 1 1]
compare ran
compare ran
compare ran
compare ran
compare ran
compare ran
compare ran
compare ran
compare ran
compare ran
compare ran
53.33333333333333 63.33333333333333
58.33333333333333
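
#### The three numbers at the end of the output above are consistent with the NODF nestedness metric (row score, column score, and overall score) applied to test_matrix. The sketch below is an independent plain-Python version of NODF, not the Calculations class itself, and the function names are hypothetical. It assumes the matrix has at least two rows and two columns.

```python
def pair_sum(vectors):
    """Sum the NODF paired-overlap percentages over all pairs of vectors."""
    total, n_pairs = 0.0, 0
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            n_pairs += 1
            deg_i, deg_j = sum(vectors[i]), sum(vectors[j])
            # Pairs with equal fill, or with an empty vector, contribute 0.
            if deg_i == deg_j or min(deg_i, deg_j) == 0:
                continue
            if deg_i > deg_j:
                big, small = vectors[i], vectors[j]
            else:
                big, small = vectors[j], vectors[i]
            shared = sum(1 for a, b in zip(big, small) if a and b)
            total += 100.0 * shared / sum(small)
    return total, n_pairs

def nodf(matrix):
    """Return (row NODF, column NODF, overall NODF) for a 0/1 matrix."""
    row_total, row_pairs = pair_sum(matrix)
    col_total, col_pairs = pair_sum([list(col) for col in zip(*matrix)])
    overall = (row_total + col_total) / (row_pairs + col_pairs)
    return row_total / row_pairs, col_total / col_pairs, overall

test_matrix = [[1, 0, 1, 1, 1], [1, 1, 1, 0, 0], [0, 1, 1, 1, 0],
               [1, 1, 0, 0, 0], [1, 1, 0, 0, 0]]
row_score, col_score, overall_score = nodf(test_matrix)
print(row_score, col_score, overall_score)  # roughly 53.33, 63.33, 58.33
```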
