# Creating Networks from TSV Tutorial

In this tutorial, we show how to create an NDEx network from a TSV file and a loading plan file in JSON, which we refer to as the Loading Plan. We also cover some important files, such as delim2cx.py.

## Requirements

This tutorial requires:

* Python 2.7.9
* The latest version of the PIP Python package manager
* These packages for TSV Loader:
    * ndex
    * gspread
* Packages specific to this tutorial:
    * pandas

In addition to these programs and packages, this tutorial requires an account on one of the NDEx servers. In this tutorial, we use the dev2.ndex server with an account called "Jane Doe". Jane Doe uses the username "janedoe" and the password "janedoepass".

## Documents Used in this Tutorial

You need two documents to create a network; both are covered in detail below. This example uses two documents: **idekerlab-1.txt and idekerlab-1-plan.json**. For your own networks, you can substitute your own files.

The TSV file, in this case idekerlab-1.txt, must have its fields separated by tabs. The first row contains the column titles, covering the source node, source attributes, target node, target attributes, edge, and edge attributes. A file in this format can then be converted to a tab-delimited Excel file. Below is a segment of idekerlab-1.txt as an example:

In [24]:
# Load the tab-delimited file into a DataFrame and show the first 10 rows
import pandas as pd
formatted = pd.read_csv('idekerModified2.txt', sep='\t')
formatted[:10]

| Unnamed: 0 | BAIT_GENE_ID | BAIT_OFFICIAL_SYMBOL | PREY_GENE_ID | PREY_OFFICIAL_SYMBOL | EXPERIMENTAL_SYSTEM | Float | Boolean | Char | Integer | ListString | ListFloat |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1432 | MAPK14 | 166 | AES | Two-Hybrid | 1 | True | a | 1 | ["Hi", "Bye"] | [1.2, 32.4] |
| 1 | 1432 | MAPK14 | 1843 | DUSP1 | Two-Hybrid | 1 | False | b | 2 | ["Hi", "Bye"] | [1.2, 32.4] |
| 2 | 5879 | RAC1 | 5058 | PAK1 | Two-Hybrid | 1 | True | c | 3 | ["Hi", "Bye"] | [1.2, 32.4] |
| 3 | 998 | CDC42 | 396 | ARHGDIA | Two-Hybrid | 1 | False | a | 4 | ["Hi", "Bye"] | [1.2, 32.4] |
| 4 | 1432 | MAPK14 | 2316 | FLNA | Two-Hybrid | 1 | True | b | 5 | ["Hi", "Bye"] | [1.2, 32.4] |
| 5 | 5058 | PAK1 | 8874 | ARHGEF7 | Two-Hybrid | 1 | False | c | 6 | ["Hi", "Bye"] | [1.2, 32.4] |
| 6 | 4802 | NFYC | 4800 | NFYA | Two-Hybrid | 1 | True | a | 7 | ["Hi", "Bye"] | [1.2, 32.4] |
| 7 | 998 | CDC42 | 5058 | PAK1 | Two-Hybrid | 1 | False | b | 8 | ["Hi", "Bye"] | [1.2, 32.4] |
| 8 | 3265 | HRAS | 5900 | RALGDS | Two-Hybrid | 1 | True | c | 9 | ["Hi", "Bye"] | [1.2, 32.4] |
| 9 | 5879 | RAC1 | 5062 | PAK2 | Two-Hybrid | 1 | False | a | 0 | ["Hi", "Bye"] | [1.2, 32.4] |
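
To make the expected layout concrete, here is a minimal sketch that parses a few tab-separated rows with Python's standard `csv` module rather than pandas. The inline sample reuses column names and values from the table above; the variable names (`tsv_text`, `rows`) are illustrative only and are not part of the TSV Loader.

```python
import csv

# A small inline sample using a subset of the columns shown above.
# The first row holds the column titles; fields are separated by tabs.
tsv_text = (
    "BAIT_OFFICIAL_SYMBOL\tPREY_OFFICIAL_SYMBOL\tEXPERIMENTAL_SYSTEM\n"
    "MAPK14\tAES\tTwo-Hybrid\n"
    "MAPK14\tDUSP1\tTwo-Hybrid\n"
    "RAC1\tPAK1\tTwo-Hybrid\n"
)

# csv.DictReader uses the header row as the keys for each data row.
rows = list(csv.DictReader(tsv_text.splitlines(), delimiter="\t"))

for row in rows:
    print(row["BAIT_OFFICIAL_SYMBOL"] + " -> " + row["PREY_OFFICIAL_SYMBOL"])
```

Each parsed row is keyed by column title, which is the same information the loading plan refers to when it names a column.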


The loading plan file, in this case idekerlab-1-plan.json, is a JSON file that must contain at least a source plan, a target plan, and an edge plan. These plans specify the interpretation of each column in the TSV file. Columns without a specified interpretation will be ignored.

Your plan must include the properties *source_plan*, *target_plan*, and *edge_plan*. A node mapping (*Nodemapping*) requires at least an *id_column* or a *node_name_column*, and may also contain the properties *id_prefix* and *property_columns*. Each entry within *property_columns* must in turn contain at least a *column_name* or an *attribute_name*, and may also contain *value_prefix*, *data_type*, and *default_value*.

Source and Target Plan:
* **id_column**: unique identifier for nodes in the network
* **node_name_column**: name for the node. If no identifier is specified, the name will be used as the identifier.
* **property_columns**: column names to map to the specified node. Unspecified columns will be ignored.

Edge Plan:
* **default_predicate**: specifies the predicate (edge type) for edges unless one is explicitly specified
* **predicate**: specifies the predicate (edge type) for the specified edge
* **property_columns**: column names to map to the specified edge. Unspecified columns will be ignored.
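
The requirements above can be sketched as a small hand-written check. This is only an illustration of the rules described in this section: `check_plan` is a hypothetical helper, not a function of the TSV Loader, and the loader performs its own validation.

```python
def check_plan(plan):
    """Return a list of problems found in a loading-plan dict (empty if none)."""
    problems = []
    # The three top-level plans are all required.
    for key in ("source_plan", "target_plan", "edge_plan"):
        if key not in plan:
            problems.append("missing " + key)
    # Each node plan needs an id_column or a node_name_column.
    for key in ("source_plan", "target_plan"):
        if key not in plan:
            continue
        node_plan = plan[key]
        if "id_column" not in node_plan and "node_name_column" not in node_plan:
            problems.append(key + " needs id_column or node_name_column")
    return problems


good_plan = {
    "source_plan": {"id_column": "BAIT_OFFICIAL_SYMBOL"},
    "target_plan": {"node_name_column": "PREY_OFFICIAL_SYMBOL"},
    "edge_plan": {"default_predicate": "binds to"},
}
bad_plan = {"source_plan": {"id_prefix": "genecards"}, "edge_plan": {}}

print(check_plan(good_plan))
print(check_plan(bad_plan))
```

The first plan passes with no problems, while the second is reported as missing a target plan and as having a source plan with no identifying column.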

Below is idekerlab-1-plan.json as an example:

In [None]:
{
    "context": {
        "genecards": "http://www.genecards.org/cgi-bin/carddisp.pl?gene=",
        "kegg": "http://identifiers.org/keggpathway/",
        "GO": "http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:"
    },
    "source_plan": 
    {
        "id_prefix":"genecards",
        "id_column": "BAIT_OFFICIAL_SYMBOL",
        "node_name_column": "BAIT_OFFICIAL_SYMBOL",
        "property_columns": ["BAIT_GENE_ID", {"attribute_name":"molecule_type", "default_value": "unknown"}]
    },
    "target_plan": 
    {
        "id_prefix":"genecards",
        "id_column": "PREY_OFFICIAL_SYMBOL",
        "node_name_column": "PREY_OFFICIAL_SYMBOL",
        "property_columns": ["PREY_GENE_ID", {"attribute_name":"molecule_type", "default_value": "unknown"}]
    },
    "edge_plan": 
    {
        "default_predicate": "binds to",
        "property_columns": ["EXPERIMENTAL_SYSTEM", "Float::float", "Boolean::boolean", "Char::char", "Integer::integer", "ListString::list_of_string", "ListFloat::list_of_float"]
    }
}
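
The edge plan above mixes plain column names with entries such as `Float::float`. Assuming that `::` separates a column name from its data type (an interpretation of the example, not something confirmed by the loader's documentation), the entries could be normalized as in the following sketch. `parse_property_column` is a hypothetical helper used only for illustration.

```python
def parse_property_column(entry):
    """Normalize one property_columns entry to a dict (illustrative helper)."""
    # Entries that are already objects, such as
    # {"attribute_name": "molecule_type", "default_value": "unknown"}, are copied as-is.
    if isinstance(entry, dict):
        return dict(entry)
    # Assumption: "Name::type" means column "Name" with data type "type".
    if "::" in entry:
        column_name, data_type = entry.split("::", 1)
        return {"column_name": column_name, "data_type": data_type}
    # A bare string is treated as a column name with no explicit data type.
    return {"column_name": entry}


edge_columns = ["EXPERIMENTAL_SYSTEM", "Float::float", "ListFloat::list_of_float"]
parsed = [parse_property_column(c) for c in edge_columns]
for p in parsed:
    print(p)
```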

## delim2cx.py

The delim2cx module has two main classes: TSVLoadingPlan and TSV2CXConverter.

### TSVLoadingPlan

The TSVLoadingPlan object contains the structure loaded from the loading plan file and validates it against a JSON schema (from the internal file loading_plan_schema.json) to verify that it fulfills all requirements.

### TSV2CXConverter

A TSV2CXConverter object is then created from the TSVLoadingPlan object. In the cells below, the TSV file itself is passed to the converter's convert_tsv_to_cx method, which processes it into a network.

In [16]:
import delim2cx as d2c
import ndex.client as nc

import ndex.beta.toolbox as toolbox
import ndex.beta.layouts as layouts
import ndex.networkn as networkn
import requests

In [17]:
# Connect to the NDEx server with the account's username and password
my_ndex = nc.Ndex("http://dev2.ndexbio.org", "janedoe", "janedoepass")

In [18]:
loading_plan_name = "idekerlab-1-plan-modified.json"
print("loading plan from: " + loading_plan_name)
# Load the plan file and validate it against the loading plan schema
import_plan = d2c.TSVLoadingPlan(loading_plan_name)

loading plan from: idekerlab-1-plan-modified.json


In [19]:
print("parsing tsv file using loading plan ...")
# Create the converter from the validated loading plan
tsv_converter = d2c.TSV2CXConverter(import_plan)

parsing tsv file using loading plan ...


In [20]:
# Fetch the template network from the server as CX and wrap it in an NdexGraph
response = my_ndex.get_network_as_cx_stream("2b06a9e9-6724-11e7-8945-0660b7976219")
template_cx = response.json()
template_network = networkn.NdexGraph(template_cx)




In [21]:
# Convert the TSV file into a network, then apply the template to it
tsv_network = tsv_converter.convert_tsv_to_cx("idekerModified2.txt", name="TestName", description="My description")
toolbox.apply_network_as_template(tsv_network, template_network)

In [22]:
# Apply the directed flow layout when the chosen layout is "df_simple"
layout = "df_simple"
if layout == "df_simple":
    layouts.apply_directed_flow_layout(tsv_network)

14 disconnected subgraphs: adding centerpoint attractor with edges to one of the least connected nodes in each subgraph


In [23]:

my_ndex.save_cx_stream_as_new_network(tsv_network.to_cx_stream())

u'http://dev2.ndexbio.org/v2/network/f7ba139d-68ac-11e7-8ac9-0660b7976219'
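
To make the conversion step above more concrete, the following sketch shows how a single TSV row might be interpreted under a loading plan. It is plain Python and does not use delim2cx. `row_to_edge` is a hypothetical helper, and joining `id_prefix` to the identifier with a colon is an assumption made for illustration rather than the converter's documented behavior.

```python
def row_to_edge(row, plan):
    """Build (source_id, predicate, target_id, edge_attributes) from one TSV row."""
    def node_id(node_plan):
        value = row[node_plan["id_column"]]
        prefix = node_plan.get("id_prefix")
        # Assumption: a prefix is joined to the value with a colon.
        return prefix + ":" + value if prefix else value

    edge_plan = plan["edge_plan"]
    # Only columns named in the edge plan become edge attributes.
    attributes = {}
    for column in edge_plan.get("property_columns", []):
        attributes[column] = row[column]
    return (node_id(plan["source_plan"]), edge_plan["default_predicate"],
            node_id(plan["target_plan"]), attributes)


plan = {
    "source_plan": {"id_prefix": "genecards", "id_column": "BAIT_OFFICIAL_SYMBOL"},
    "target_plan": {"id_prefix": "genecards", "id_column": "PREY_OFFICIAL_SYMBOL"},
    "edge_plan": {"default_predicate": "binds to",
                  "property_columns": ["EXPERIMENTAL_SYSTEM"]},
}
row = {"BAIT_OFFICIAL_SYMBOL": "MAPK14", "PREY_OFFICIAL_SYMBOL": "AES",
       "EXPERIMENTAL_SYSTEM": "Two-Hybrid", "Float": "1"}

edge = row_to_edge(row, plan)
print(edge)
```

The `Float` column is not named in this sketch's plan, so it is left out of the result, mirroring the rule that columns without a specified interpretation are ignored.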

## create_network_from_tsv.py

This tutorial uses a Python script, create_network_from_tsv.py, as an example of how to use the delim2cx module. The script uploads the resulting network to the dev2.ndexbio.org server.

### Parameters

To upload a network using this script, seven parameters are required:
* username
* password
* server
* tsv
* plan
* name of network
* description of network

Optional parameters are:
* template id
* layout
* uuid of network to update
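
The parameter lists above could be declared with Python's standard `argparse` module, as in the sketch below. This is not the actual argument handling of create_network_from_tsv.py, which is not shown in this tutorial. The positional order follows the lists above, and the optional flag names (`--template-id`, `--layout`, `--update-uuid`) are hypothetical.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        description="Create an NDEx network from a TSV file")
    # Required, positional parameters, in the order listed above.
    for name in ("username", "password", "server", "tsv", "plan",
                 "name", "description"):
        parser.add_argument(name)
    # Optional parameters; these flag names are hypothetical.
    parser.add_argument("--template-id", default=None)
    parser.add_argument("--layout", default=None)
    parser.add_argument("--update-uuid", default=None)
    return parser


args = build_parser().parse_args([
    "janedoe", "janedoepass", "dev2.ndexbio.org",
    "idekerlab-1.txt", "idekerlab-1-plan.json",
    "Title_of_Network", "Description of network",
])
print(args.server)
print(args.layout)
```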

### Overview

Once the parameters are loaded, the script first parses the loading plan and then parses the TSV file using that plan. How it does this is outlined in more detail in the section titled delim2cx.py. If you specified a template in the script parameters, the script loads that template. Otherwise, it uses a default template. Finally, it uploads the network to the server using your username and password.

### Loading Networks

This section will explain the syntax required to run this program in Jupyter Notebook. Running this program from the command line is very similar.

### Jupyter Notebook

The specific syntax is:

In [None]:
%run ./create_network_from_tsv.py janedoe janedoepass dev2.ndexbio.org idekerlab-1.txt idekerlab-1-plan.json Title_of_Network "Description of network"

Note that the required parameters follow the script's file name and are separated from one another by spaces. A parameter that itself contains spaces, such as the description, must be enclosed in quotes.
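
As a quick check of how such a line breaks into parameters, the sketch below splits the example command with the standard `shlex` module. It assumes the arguments are split the way a POSIX shell would split them; the variable names are illustrative.

```python
import shlex

command = ('create_network_from_tsv.py janedoe janedoepass dev2.ndexbio.org '
           'idekerlab-1.txt idekerlab-1-plan.json Title_of_Network '
           '"Description of network"')

# The first item is the script name; the rest are its parameters.
arguments = shlex.split(command)
print(len(arguments) - 1)
print(arguments[-1])
```

The quoted description is kept together as a single parameter, so the script receives seven parameters in total.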