# Creating Networks from TSV Tutorial

In this tutorial, we will show you how to create an NDEx network from a TSV file and a loading plan, a JSON file that we will refer to as the Loading Plan. We will also cover some important files such as delim2cx.py. This tutorial requires Python 2.7.9 and the latest version of the pip package manager for installation. For further details on installing and using the NDEx module, see the NDEx Client Tutorial.

## Importing Packages

First, make sure that the NDEx and gspread modules are installed using the pip package manager.
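
One way to confirm the installation before going further is to try importing each module. The sketch below is only a convenience check and assumes the import names are `ndex` and `gspread`; adjust them if your installation uses different names. The helper `missing_modules` is a hypothetical name introduced for this example.

```python
import importlib


def missing_modules(names):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumption: the modules are imported as "ndex" and "gspread".
    not_installed = missing_modules(["ndex", "gspread"])
    if not_installed:
        print("Install these with pip first: " + ", ".join(not_installed))
    else:
        print("All required modules are importable.")
```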

## Important Documents

You need two documents to create a network; both are covered in detail below. In this example, we will use **idekerlab-1.txt** and **idekerlab-1-plan.json**. For your own purposes, you may replace these with your own documents.

The TSV file, in this case idekerlab-1.txt, must be tab-delimited. The first row contains the column titles, which cover the source node, source attributes, target node, target attributes, edge, and edge attributes. A file in this format can also be converted to a tab-delimited Excel file. Below is a segment of idekerlab-1.txt as an example:

In [8]:
# This cell only displays the first rows of the file as a table; it is not part of the loading process
import pandas as pd
formatted = pd.read_csv('idekerlab-1.txt', sep='\t')
formatted[:10]

| | BAIT_GENE_ID | BAIT_OFFICIAL_SYMBOL | PREY_GENE_ID | PREY_OFFICIAL_SYMBOL | EXPERIMENTAL_SYSTEM |
|---|---|---|---|---|---|
| 0 | 1432 | MAPK14 | 166 | AES | Two-Hybrid |
| 1 | 1432 | MAPK14 | 1843 | DUSP1 | Two-Hybrid |
| 2 | 5879 | RAC1 | 5058 | PAK1 | Two-Hybrid |
| 3 | 998 | CDC42 | 396 | ARHGDIA | Two-Hybrid |
| 4 | 1432 | MAPK14 | 2316 | FLNA | Two-Hybrid |
| 5 | 5058 | PAK1 | 8874 | ARHGEF7 | Two-Hybrid |
| 6 | 4802 | NFYC | 4800 | NFYA | Two-Hybrid |
| 7 | 998 | CDC42 | 5058 | PAK1 | Two-Hybrid |
| 8 | 3265 | HRAS | 5900 | RALGDS | Two-Hybrid |
| 9 | 5879 | RAC1 | 5062 | PAK2 | Two-Hybrid |
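
If you prefer not to depend on pandas, the same layout can be read with Python's standard `csv` module. The sketch below parses three rows copied from the excerpt above, assuming the leading row-number column is only the pandas display index and is not stored in the file. It was written against Python 3; `read_tsv` is a hypothetical helper name.

```python
import csv
import io

# Three rows copied from the excerpt above; columns are separated by tabs.
SAMPLE_TSV = (
    "BAIT_GENE_ID\tBAIT_OFFICIAL_SYMBOL\tPREY_GENE_ID\t"
    "PREY_OFFICIAL_SYMBOL\tEXPERIMENTAL_SYSTEM\n"
    "1432\tMAPK14\t166\tAES\tTwo-Hybrid\n"
    "1432\tMAPK14\t1843\tDUSP1\tTwo-Hybrid\n"
    "5879\tRAC1\t5058\tPAK1\tTwo-Hybrid\n"
)


def read_tsv(text):
    """Return (header, rows); each row is a dict keyed by column title."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = list(reader)
    return reader.fieldnames, rows


header, rows = read_tsv(SAMPLE_TSV)
print(header)
print(rows[0]["BAIT_OFFICIAL_SYMBOL"], "->", rows[0]["PREY_OFFICIAL_SYMBOL"])
```

Every value comes back as a string, so numeric columns such as BAIT_GENE_ID need an explicit conversion if you want integers.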


The loading plan file, in this case idekerlab-1-plan.json, is a JSON file that must contain at least a source plan, a target plan, and an edge plan. Within each plan, clearly label what each column in the TSV file represents, for example a node name or a node attribute. Below is a segment of idekerlab-1-plan.json as an example:

In [None]:
{
    "source_plan": 
    {
        "id_column": "BAIT_OFFICIAL_SYMBOL",
        "node_name_column": "BAIT_OFFICIAL_SYMBOL",
        "property_columns": ["BAIT_GENE_ID"]
    },
    "target_plan": 
    {
        "id_column": "PREY_OFFICIAL_SYMBOL",
        "node_name_column": "PREY_OFFICIAL_SYMBOL",
        "property_columns": ["PREY_GENE_ID"]
    },
    "edge_plan": 
    {
        "default_predicate": "binds to",
        "property_columns": ["EXPERIMENTAL_SYSTEM"]
    }
}

## delim2cx.py

The delim2cx module has two main classes: TSVLoadingPlan and TSV2CXConverter.

### TSVLoadingPlan

The primary purpose of TSVLoadingPlan is to check that the Loading Plan meets all the requirements needed to load the network properly. To do so, it compares the Loading Plan against another JSON file, loading_plan_schema.json. Your plan must include the properties *source_plan*, *target_plan*, and *edge_plan*. The first two are both defined as a *Nodemapping*, which is outlined in an earlier section of the code. A *Nodemapping* requires at least an *id_column* or a *node_name_column*, and it may also contain the properties *id_prefix* and *property_columns*. Each entry in *property_columns* must in turn contain at least a *column_name* or an *attribute_name*, and may include properties such as *value_prefix*, *data_type*, and *default_value*.
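
To make these requirements concrete, here is a minimal hand-written check in plain Python. It is not the TSVLoadingPlan implementation and does not read loading_plan_schema.json; it only mirrors the rules described in the paragraph above. The function name `plan_problems` is hypothetical, and treating plain strings in *property_columns* as column titles is an assumption based on the example plan.

```python
REQUIRED_PLANS = ("source_plan", "target_plan", "edge_plan")


def plan_problems(plan):
    """Return a list of problems found in `plan`; an empty list means it passed."""
    problems = [key + " is missing" for key in REQUIRED_PLANS if key not in plan]

    # Source and target plans share the same node-mapping rules.
    for key in ("source_plan", "target_plan"):
        node_plan = plan.get(key)
        if node_plan is None:
            continue  # already reported as missing
        if "id_column" not in node_plan and "node_name_column" not in node_plan:
            problems.append(key + " needs id_column or node_name_column")

    for key in REQUIRED_PLANS:
        for prop in plan.get(key, {}).get("property_columns", []):
            # Assumption: plain strings (as in idekerlab-1-plan.json) are column
            # titles; objects must name a column or an attribute.
            if isinstance(prop, dict) and not (
                "column_name" in prop or "attribute_name" in prop
            ):
                problems.append(
                    key + " has a property without column_name or attribute_name"
                )
    return problems


example_plan = {
    "source_plan": {"id_column": "BAIT_OFFICIAL_SYMBOL",
                    "property_columns": ["BAIT_GENE_ID"]},
    "target_plan": {"id_column": "PREY_OFFICIAL_SYMBOL",
                    "property_columns": ["PREY_GENE_ID"]},
    "edge_plan": {"default_predicate": "binds to",
                  "property_columns": ["EXPERIMENTAL_SYSTEM"]},
}
print(plan_problems(example_plan))
```

Returning a list of messages instead of raising on the first failure lets you see every problem in a plan at once.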

### TSV2CXConverter

If TSVLoadingPlan does not bring up any errors in the Loading Plan, a TSV2CXConverter object will be created from the TSV file and the Loading Plan.

This class has two main methods that work together to convert the TSV file: *check_header_vs_plan()* and *convert_tsv_to_cx()*. *check_header_vs_plan()* makes sure that the column headers in the TSV file match the Loading Plan that you loaded. If this check passes, *convert_tsv_to_cx()* runs: it goes through every row in the TSV file, creates any necessary nodes, creates edges between the nodes specified in that row, and updates the network that you are creating.
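
The two steps can be illustrated with a simplified stand-in written in plain Python. This is not the TSV2CXConverter code and it does not produce CX; it builds an ordinary dictionary of nodes and a list of edges so that the row-by-row logic is visible. All function names here are hypothetical, and the sketch assumes each node plan has both *id_column* and *node_name_column* and that *property_columns* holds plain column titles, as in the example plan.

```python
import csv
import io

PLAN = {
    "source_plan": {"id_column": "BAIT_OFFICIAL_SYMBOL",
                    "node_name_column": "BAIT_OFFICIAL_SYMBOL",
                    "property_columns": ["BAIT_GENE_ID"]},
    "target_plan": {"id_column": "PREY_OFFICIAL_SYMBOL",
                    "node_name_column": "PREY_OFFICIAL_SYMBOL",
                    "property_columns": ["PREY_GENE_ID"]},
    "edge_plan": {"default_predicate": "binds to",
                  "property_columns": ["EXPERIMENTAL_SYSTEM"]},
}

SAMPLE_TSV = (
    "BAIT_GENE_ID\tBAIT_OFFICIAL_SYMBOL\tPREY_GENE_ID\t"
    "PREY_OFFICIAL_SYMBOL\tEXPERIMENTAL_SYSTEM\n"
    "1432\tMAPK14\t166\tAES\tTwo-Hybrid\n"
    "1432\tMAPK14\t1843\tDUSP1\tTwo-Hybrid\n"
    "5879\tRAC1\t5058\tPAK1\tTwo-Hybrid\n"
)


def plan_columns(plan):
    """Collect every TSV column title that the plan refers to."""
    columns = set()
    for part in plan.values():
        for key in ("id_column", "node_name_column"):
            if key in part:
                columns.add(part[key])
        columns.update(part.get("property_columns", []))
    return columns


def check_header(header, plan):
    """Raise ValueError if the plan names a column the TSV header lacks."""
    missing = sorted(plan_columns(plan) - set(header))
    if missing:
        raise ValueError("TSV header lacks: " + ", ".join(missing))


def rows_to_network(tsv_text, plan):
    """Return (nodes, edges) built row by row from the TSV text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    check_header(reader.fieldnames, plan)
    nodes, edges = {}, []
    edge_plan = plan["edge_plan"]
    for row in reader:
        endpoint_ids = []
        for node_plan in (plan["source_plan"], plan["target_plan"]):
            node_id = row[node_plan["id_column"]]
            # Create the node on first sight, otherwise reuse it.
            attrs = nodes.setdefault(
                node_id, {"name": row[node_plan["node_name_column"]]})
            for column in node_plan.get("property_columns", []):
                attrs[column] = row[column]
            endpoint_ids.append(node_id)
        edge_attrs = {c: row[c] for c in edge_plan.get("property_columns", [])}
        edges.append((endpoint_ids[0], edge_plan["default_predicate"],
                      endpoint_ids[1], edge_attrs))
    return nodes, edges


nodes, edges = rows_to_network(SAMPLE_TSV, PLAN)
print(len(nodes), "nodes,", len(edges), "edges")
```

Because nodes are keyed by their id, a gene that appears in several rows (MAPK14 in this sample) is created once and reused for every edge that touches it.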

## create_network_from_tsv.py

This tutorial uses a Python script, create_network_from_tsv.py, as an example of how to use the delim2cx module. The script uploads the network to the dev2.ndexbio.org server.

### Parameters

To upload a network using this script, seven parameters are required:
* username
* password
* server
* tsv
* plan
* name of network
* description of network

Optional parameters are:
* template id
* layout
* uuid of network to update
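
As an illustration of how a script could accept these parameters, here is a sketch using `argparse` from the standard library. It is not taken from create_network_from_tsv.py. The positional order follows the required list above, and the option names `--template_id`, `--layout`, and `--update_uuid` are hypothetical; check the script itself for the names it actually uses.

```python
import argparse


def build_parser():
    """Build a parser with seven required positionals and three optional flags."""
    parser = argparse.ArgumentParser(
        description="Illustrative parameter layout for a TSV upload script")
    # Required parameters, in the order listed above.
    for name in ("username", "password", "server", "tsv", "plan", "name", "desc"):
        parser.add_argument(name)
    # Hypothetical option names for the optional parameters.
    parser.add_argument("--template_id", default=None)
    parser.add_argument("--layout", default=None)
    parser.add_argument("--update_uuid", default=None)
    return parser


args = build_parser().parse_args([
    "janedoe", "janedoepass", "dev2.ndexbio.org",
    "idekerlab-1.txt", "idekerlab-1-plan.json",
    "Title_of_Network", "Description of network",
])
print(args.server, args.tsv, args.template_id)
```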

### Overview

Once the parameters are passed to the script, it first parses the loading plan and then parses the TSV file using that plan. How the script does this is described in more detail in the section titled delim2cx.py. If you specified a template in the script parameters, the script loads that template; otherwise, it uses a default template. Finally, it uploads the network to the server using your username and password.

### Loading Networks

This section explains the syntax required to run this program in Jupyter Notebook. Running it from the command line is very similar.

#### Jupyter Notebook

The specific syntax is:

In [2]:
%run ./create_network_from_tsv.py janedoe janedoepass dev2.ndexbio.org idekerlab-1.txt idekerlab-1-plan.json Title_of_Network "Description of network"

loading plan from: idekerlab-1-plan.json
parsing tsv file using loading plan ...
Done.


Note that the parameters follow the script's file name in the order listed above and are separated by spaces. A parameter that itself contains spaces, such as the description, must be enclosed in quotes.
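
To see how the spaces and quotes divide the line into parameters, you can split an equivalent command with `shlex` from the standard library, which follows shell-style quoting rules. The `python create_network_from_tsv.py ...` form used here is an assumption about how the script would be launched from a command line.

```python
import shlex

# Assumed command-line form of the notebook invocation shown above.
command = (
    'python create_network_from_tsv.py janedoe janedoepass dev2.ndexbio.org '
    'idekerlab-1.txt idekerlab-1-plan.json Title_of_Network "Description of network"'
)

argv = shlex.split(command)
script_args = argv[2:]  # everything after "python" and the script name

print(len(script_args))   # 7
print(script_args[-1])    # Description of network
```

Without the quotes, the description would be split into three separate parameters and the script would receive nine values instead of seven.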