# Infer network

Infer a network from a gene expression dataset and plot it with gene labels.

## More details
Plot networks inferred from datasets simulated from realistic network structures. Several datasets are available, covering different organisms, different numbers of genes in the network and different numbers of cells in the dataset. To infer a network from a different dataset, make sure the data file is in the same format as the simulated datasets, and change `dataset_name` to the relevant path.

For very large datasets (tens of thousands of cells, thousands of genes), network inference may take a long time. Benchmarking scripts are included to indicate how the number of cells, the number of genes and the choice of algorithm affect the time taken to infer a network.

The network inference algorithms rank all edges, that is, the edges between every possible pair of genes. To get from this ranked list to a network, a threshold must be set, indicating what fraction of the highest-ranked edges to include. For example, a threshold of 0.15 keeps the top 15% of edges.
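The thresholding step described above can be sketched in plain Julia. This is only an illustration under stated assumptions: `count_pairs`, `keep_top_edges` and `toy_edges` are hypothetical names that are not part of NetworkInference, and rounding to the nearest whole edge is an assumption of the sketch rather than a statement about how the package computes the cutoff.

```julia
# Illustration only: none of these names come from NetworkInference.

# Number of undirected gene pairs (candidate edges) among n genes
count_pairs(n) = n * (n - 1) ÷ 2

# Keep the top `threshold` fraction of edges, ranked by descending score.
# Rounding to the nearest whole edge is an assumption of this sketch.
function keep_top_edges(edges, threshold)
    ranked = sort(edges, by = edge -> edge[3], rev = true)
    number_to_keep = round(Int, length(ranked) * threshold)
    return ranked[1:number_to_keep]
end

# Toy edge list for four genes: (gene_a, gene_b, score)
toy_edges = [("G1", "G2", 0.9), ("G1", "G3", 0.2), ("G1", "G4", 0.5),
             ("G2", "G3", 0.7), ("G2", "G4", 0.1), ("G3", "G4", 0.4)]

# Keep the top half of the six candidate edges
top_edges = keep_top_edges(toy_edges, 0.5)

println(count_pairs(50))
println(top_edges)
```

Under these assumptions, a 50-gene dataset has 1225 candidate edges, so a threshold of 0.15 would keep roughly 184 of them.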

In [None]:
# Include packages

using NetworkInference
using LightGraphs
using GraphPlot

In [None]:
# Customize the dataset, algorithm and edge threshold (used for plotting)

# Use these options for datasets generated from GeneNetWeaver...
# 50 or 100
number_of_genes = 50
# "ecoli1", "ecoli2", "yeast1", "yeast2" or "yeast3"
organism = "yeast1"
# "large", "medium", or "small"
dataset_size = "large"

# ...Or override dataset_name to point to your data file:
dataset_name = string("../simulated_datasets/", number_of_genes, "_", organism, "_", dataset_size, ".txt")

# Choose an algorithm
# PIDCNetworkInference(), PUCNetworkInference(), CLRNetworkInference() or MINetworkInference()
algorithm = PIDCNetworkInference()

# Keep the top fraction of highest-scoring edges (0.15 keeps the top 15%)
# 0.0 < threshold < 1.0
threshold = 0.15

In [None]:
# Get the genes and discretize the expression levels

@time genes = get_nodes(dataset_name);

# Troubleshooting: the default discretizer is "bayesian_blocks"
# If this doesn't work, try the "uniform_width" discretizer:
# @time genes = get_nodes(dataset_name, discretizer = "uniform_width");

In [None]:
# Infer the network

@time network = InferredNetwork(algorithm, genes);

In [None]:
# Get the adjacency matrix, then make a LightGraphs.SimpleGraphs.SimpleGraph

adjacency_matrix, labels_to_ids, ids_to_labels = get_adjacency_matrix(network, threshold)
graph = LightGraphs.SimpleGraphs.SimpleGraph(adjacency_matrix)

# Get the node labels, in order of node index

number_of_nodes = size(adjacency_matrix, 1)
nodelabels = [ids_to_labels[i] for i in 1:number_of_nodes]

In [None]:
# Plot the network at the given threshold

display(gplot(graph, nodelabel = nodelabels))