# Creating Networks from TSV Tutorial

In this tutorial, we will show you how to create an NDEx network from a TSV file and a JSON plan file. We will also cover important supporting files such as delim2cx.py. This tutorial requires Python 2.7.9 and the latest version of the pip package manager. For further details on installing and using the NDEx module, see the NDEx Client Tutorial.

## Importing Packages

First, make sure that the NDEx and gspread modules are installed, using the pip package manager.
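
A quick way to confirm that both modules are available is to try importing them. The sketch below is only illustrative: the import names `ndex` and `gspread` are assumptions about how the packages are exposed on your system, and the helper `unimportable_modules` is a hypothetical name, not part of the NDEx module.

```python
import importlib


def unimportable_modules(names):
    """Return the module names from `names` that fail to import."""
    failed = []
    for name in names:
        try:
            importlib.import_module(name)
        except Exception:
            # Any failure (missing package, incompatible Python version, ...)
            # means the module is not usable in this environment.
            failed.append(name)
    return failed


# "ndex" and "gspread" are assumed import names; adjust them if yours differ.
print(unimportable_modules(["ndex", "gspread"]))
```

An empty list means both modules were imported successfully.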

## Important Documents

You need two documents to create a network; both are covered in detail below. In this example, we will use two sample documents: **idekerlab-1.txt and idekerlab-1-plan.json**. For your own networks, substitute your own files.

The TSV file, in this case idekerlab-1.txt, should be organized so that its fields are separated by tabs. The first row contains the column titles, which cover the source node, source attributes, target node, target attributes, edge, and edge attributes. A file in this format can also be handled in Excel as tab-delimited text. Below is a segment of idekerlab-1.txt as an example:

In [8]:
# The code below only displays an excerpt of the TSV file;
# it is not part of the network-loading workflow.
import pandas as pd
formatted = pd.read_csv('idekerlab-1.txt', sep='\t')
formatted.head(10)

|   | BAIT_GENE_ID | BAIT_OFFICIAL_SYMBOL | PREY_GENE_ID | PREY_OFFICIAL_SYMBOL | EXPERIMENTAL_SYSTEM |
|---|---|---|---|---|---|
| 0 | 1432 | MAPK14 | 166 | AES | Two-Hybrid |
| 1 | 1432 | MAPK14 | 1843 | DUSP1 | Two-Hybrid |
| 2 | 5879 | RAC1 | 5058 | PAK1 | Two-Hybrid |
| 3 | 998 | CDC42 | 396 | ARHGDIA | Two-Hybrid |
| 4 | 1432 | MAPK14 | 2316 | FLNA | Two-Hybrid |
| 5 | 5058 | PAK1 | 8874 | ARHGEF7 | Two-Hybrid |
| 6 | 4802 | NFYC | 4800 | NFYA | Two-Hybrid |
| 7 | 998 | CDC42 | 5058 | PAK1 | Two-Hybrid |
| 8 | 3265 | HRAS | 5900 | RALGDS | Two-Hybrid |
| 9 | 5879 | RAC1 | 5062 | PAK2 | Two-Hybrid |
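
If pandas is not available, the same layout can be read with the standard `csv` module. This is a minimal sketch that embeds three rows from the excerpt above as a string instead of opening idekerlab-1.txt; the helper name `read_tsv_rows` is hypothetical.

```python
import csv

# Three rows copied from the idekerlab-1.txt excerpt (tab-separated).
SAMPLE_TSV = (
    "BAIT_GENE_ID\tBAIT_OFFICIAL_SYMBOL\tPREY_GENE_ID\t"
    "PREY_OFFICIAL_SYMBOL\tEXPERIMENTAL_SYSTEM\n"
    "1432\tMAPK14\t166\tAES\tTwo-Hybrid\n"
    "1432\tMAPK14\t1843\tDUSP1\tTwo-Hybrid\n"
    "5879\tRAC1\t5058\tPAK1\tTwo-Hybrid\n"
)


def read_tsv_rows(text):
    """Return (header, rows), where each row is a dict keyed by column title."""
    reader = csv.DictReader(text.splitlines(), delimiter="\t")
    header = reader.fieldnames  # reading fieldnames consumes the first line
    return header, list(reader)


header, rows = read_tsv_rows(SAMPLE_TSV)
print(header)
print("{} -> {}".format(rows[0]["BAIT_OFFICIAL_SYMBOL"],
                        rows[0]["PREY_OFFICIAL_SYMBOL"]))
```

Each row becomes a dictionary keyed by the titles in the first line, which is the form a plan file refers to when it names columns.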


The plan file, in this case idekerlab-1-plan.json, is a JSON file that must contain at least a source plan, a target plan, and an edge plan, in that order. Within each plan, state clearly what each column of the TSV file represents, for example whether it holds a node name or a node attribute. Below is a segment of idekerlab-1-plan.json as an example:

In [None]:
{
    "source_plan": 
    {
        "id_column": "BAIT_OFFICIAL_SYMBOL",
        "node_name_column": "BAIT_OFFICIAL_SYMBOL",
        "property_columns": ["BAIT_GENE_ID"]
    },
    "target_plan": 
    {
        "id_column": "PREY_OFFICIAL_SYMBOL",
        "node_name_column": "PREY_OFFICIAL_SYMBOL",
        "property_columns": ["PREY_GENE_ID"]
    },
    "edge_plan": 
    {
        "default_predicate": "binds to",
        "property_columns": ["EXPERIMENTAL_SYSTEM"]
    }
}
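
To see which TSV columns a plan depends on, the plan can be parsed with the standard `json` module and walked section by section. The sketch below embeds the example plan as a string; `columns_used` is a hypothetical helper, and it assumes that `property_columns` entries are plain column names, as they are in this example.

```python
import json

PLAN_TEXT = """
{
    "source_plan": {
        "id_column": "BAIT_OFFICIAL_SYMBOL",
        "node_name_column": "BAIT_OFFICIAL_SYMBOL",
        "property_columns": ["BAIT_GENE_ID"]
    },
    "target_plan": {
        "id_column": "PREY_OFFICIAL_SYMBOL",
        "node_name_column": "PREY_OFFICIAL_SYMBOL",
        "property_columns": ["PREY_GENE_ID"]
    },
    "edge_plan": {
        "default_predicate": "binds to",
        "property_columns": ["EXPERIMENTAL_SYSTEM"]
    }
}
"""


def columns_used(plan):
    """List every TSV column the plan mentions, without duplicates."""
    used = []
    for section in ("source_plan", "target_plan", "edge_plan"):
        sub_plan = plan.get(section, {})
        candidates = [sub_plan.get("id_column"), sub_plan.get("node_name_column")]
        candidates += sub_plan.get("property_columns", [])
        for column in candidates:
            if column is not None and column not in used:
                used.append(column)
    return used


plan = json.loads(PLAN_TEXT)
print(columns_used(plan))
```

For the example plan, this lists the five column titles that appear in the header of idekerlab-1.txt.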

## delim2cx.py

This file is built around two main classes: TSVLoadingPlan and TSV2CXConverter.

### TSVLoadingPlan

The primary purpose of TSVLoadingPlan is to check that your JSON plan file meets all the requirements for loading the network properly. To do so, it compares the plan against another JSON file called loading_plan_schema.json. Your plan must include the properties *source_plan, target_plan, and edge_plan*, all three of which are defined as *Nodemapping*, which is outlined in an earlier section of that file. *Nodemapping* itself requires at least an *id_column or node_name_column*, and it may also contain the properties *id_prefix and property_columns*. Each entry in *property_columns* must in turn contain at least a *column_name or attribute_name*, and may include properties such as *value_prefix, data_type, and default_value*.
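
The rules above can be sketched as a small hand-written check. This is not the validation code in TSVLoadingPlan and does not read loading_plan_schema.json; it only restates the requirements listed in this section. Two assumptions are made to stay consistent with the example plan: the id or name requirement is applied only to the two node plans, and a `property_columns` entry may be a plain column name as well as an object. The function name `validate_plan` is hypothetical.

```python
REQUIRED_SECTIONS = ("source_plan", "target_plan", "edge_plan")


def validate_plan(plan):
    """Return a list of problems; an empty list means the plan passed."""
    problems = []
    for section in REQUIRED_SECTIONS:
        if section not in plan:
            problems.append("missing required section: " + section)

    # Assumption: only the node plans need an id or a name column.
    for section in ("source_plan", "target_plan"):
        mapping = plan.get(section)
        if mapping is None:
            continue
        if "id_column" not in mapping and "node_name_column" not in mapping:
            problems.append(section + " needs id_column or node_name_column")

    for section in REQUIRED_SECTIONS:
        for entry in plan.get(section, {}).get("property_columns", []):
            # Assumption: plain strings are accepted as column names.
            if isinstance(entry, dict):
                if "column_name" not in entry and "attribute_name" not in entry:
                    problems.append(
                        section + " property needs column_name or attribute_name")
    return problems


good_plan = {
    "source_plan": {"id_column": "BAIT_OFFICIAL_SYMBOL",
                    "property_columns": ["BAIT_GENE_ID"]},
    "target_plan": {"node_name_column": "PREY_OFFICIAL_SYMBOL"},
    "edge_plan": {"default_predicate": "binds to"},
}
bad_plan = {
    "source_plan": {"property_columns": [{"data_type": "string"}]},
    "target_plan": {"id_column": "PREY_OFFICIAL_SYMBOL"},
}
print(validate_plan(good_plan))
for problem in validate_plan(bad_plan):
    print(problem)
```

Collecting every problem in a list, rather than stopping at the first one, lets you fix a plan file in a single pass.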

### TSV2CXConverter

This class has two main parts that work together to convert the TSV file: the checker, *check_header_vs_plan()*, and a converter, either *convert_tsv_to_cx()* or *convert_google_worksheet_to_cx()*.

#### Checker

*check_header_vs_plan()* uses three main functions: *check_column, check_plan_property_columns, and check_property_columns*. *check_column* confirms that the header of the TSV file contains each required column title listed in the JSON plan file. *check_property_columns* ensures that the property columns are all present and correctly formatted. *check_plan_property_columns* uses both of these functions to make sure that the TSV file matches the JSON plan file.
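
The idea behind the checker can be illustrated with three small functions that loosely parallel the ones described above. They are simplified stand-ins with hypothetical names and signatures, not the code in delim2cx.py, and they return the missing column titles instead of raising errors.

```python
def column_in_header(header, column):
    """Loosely parallels check_column: is this column title in the header?"""
    return column in header


def missing_property_columns(header, property_columns):
    """Loosely parallels check_property_columns: list absent property columns."""
    missing = []
    for entry in property_columns:
        # An entry may be a plain column name or an object with column_name.
        name = entry.get("column_name") if isinstance(entry, dict) else entry
        if name is not None and not column_in_header(header, name):
            missing.append(name)
    return missing


def missing_plan_columns(header, sub_plan):
    """Combine both checks for one sub-plan (source, target, or edge)."""
    missing = [sub_plan[key]
               for key in ("id_column", "node_name_column")
               if key in sub_plan and not column_in_header(header, sub_plan[key])]
    missing += missing_property_columns(header, sub_plan.get("property_columns", []))
    return missing


header = ["BAIT_GENE_ID", "BAIT_OFFICIAL_SYMBOL", "PREY_GENE_ID",
          "PREY_OFFICIAL_SYMBOL", "EXPERIMENTAL_SYSTEM"]
source_plan = {"id_column": "BAIT_OFFICIAL_SYMBOL",
               "node_name_column": "BAIT_OFFICIAL_SYMBOL",
               "property_columns": ["BAIT_GENE_ID"]}
misspelled_plan = {"id_column": "BAIT_SYMBOL",
                   "property_columns": ["BAIT_GENE_ID", "SCORE"]}

print(missing_plan_columns(header, source_plan))      # → []
print(missing_plan_columns(header, misspelled_plan))  # → ['BAIT_SYMBOL', 'SCORE']
```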

#### Converters

Both converters use *process_row()*, which goes through each row of the TSV file and adds it to the network. It uses the functions *create_node(), create_attr_obj(), and create_edge()*. *create_node()* uses the node plan from the JSON plan file, together with *create_attr_obj()*, to create a node if it has not already been created, and *create_edge()* links the two nodes of a row together according to the edge plan in the JSON plan file.
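
The row-by-row flow can be sketched with plain dictionaries and lists. The functions below borrow the names used in this section for readability, but their signatures and the data structures are assumptions made for illustration; the real converter produces CX, which this sketch does not attempt.

```python
def create_attr_obj(row, property_columns):
    """Collect the listed property columns of one row into a dict."""
    return {column: row[column] for column in property_columns}


def create_node(nodes, row, node_plan):
    """Add the node described by node_plan unless it exists; return its id."""
    node_id = row[node_plan["id_column"]]
    if node_id not in nodes:
        nodes[node_id] = {
            "name": row[node_plan["node_name_column"]],
            "attributes": create_attr_obj(row, node_plan.get("property_columns", [])),
        }
    return node_id


def create_edge(edges, source_id, target_id, row, edge_plan):
    """Append an edge linking the two nodes of this row."""
    edges.append({
        "source": source_id,
        "target": target_id,
        "interaction": edge_plan["default_predicate"],
        "attributes": create_attr_obj(row, edge_plan.get("property_columns", [])),
    })


def process_row(nodes, edges, row, plan):
    """Turn one TSV row into (at most) two new nodes and one edge."""
    source_id = create_node(nodes, row, plan["source_plan"])
    target_id = create_node(nodes, row, plan["target_plan"])
    create_edge(edges, source_id, target_id, row, plan["edge_plan"])


plan = {
    "source_plan": {"id_column": "BAIT_OFFICIAL_SYMBOL",
                    "node_name_column": "BAIT_OFFICIAL_SYMBOL",
                    "property_columns": ["BAIT_GENE_ID"]},
    "target_plan": {"id_column": "PREY_OFFICIAL_SYMBOL",
                    "node_name_column": "PREY_OFFICIAL_SYMBOL",
                    "property_columns": ["PREY_GENE_ID"]},
    "edge_plan": {"default_predicate": "binds to",
                  "property_columns": ["EXPERIMENTAL_SYSTEM"]},
}
columns = ["BAIT_GENE_ID", "BAIT_OFFICIAL_SYMBOL", "PREY_GENE_ID",
           "PREY_OFFICIAL_SYMBOL", "EXPERIMENTAL_SYSTEM"]
rows = [dict(zip(columns, values)) for values in [
    ("1432", "MAPK14", "166", "AES", "Two-Hybrid"),
    ("1432", "MAPK14", "1843", "DUSP1", "Two-Hybrid"),
    ("5879", "RAC1", "5058", "PAK1", "Two-Hybrid"),
]]

nodes, edges = {}, []
for row in rows:
    process_row(nodes, edges, row, plan)

print(sorted(nodes))  # → ['AES', 'DUSP1', 'MAPK14', 'PAK1', 'RAC1']
print(len(edges))     # → 3
```

MAPK14 appears as the source in two rows but is created only once, which is the "if it has not already been created" behavior described above.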

## Parameters

Uploading a network requires at least seven script parameters: username, password, server, TSV file, plan file, network name, and network description. The optional parameters are the template ID, the layout, and the UUID of an existing network to update.
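
One way to model these parameters is with `argparse`: seven positional arguments followed by three optional ones. This is a sketch only. The option names `--template-id`, `--layout`, and `--update-uuid` are hypothetical and are not taken from create_network_from_tsv.py, so check the script itself for the flags it actually accepts.

```python
import argparse


def build_parser():
    """Seven required positional parameters plus three optional ones."""
    parser = argparse.ArgumentParser(
        description="Sketch of the parameters for creating a network from TSV")
    for name in ("username", "password", "server", "tsv", "plan",
                 "name", "description"):
        parser.add_argument(name)
    # Hypothetical option names, chosen only for this illustration.
    parser.add_argument("--template-id", default=None)
    parser.add_argument("--layout", default=None)
    parser.add_argument("--update-uuid", default=None)
    return parser


args = build_parser().parse_args([
    "janedoe", "janedoepass", "dev2.ndexbio.org",
    "idekerlab-1.txt", "idekerlab-1-plan.json",
    "Title_of_Network", "Description of network",
])
print(args.server)       # → dev2.ndexbio.org
print(args.template_id)  # → None
```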

## Loading Networks

This section explains the syntax required to run this program. We will show two different methods: using Jupyter Notebook and using PyCharm. For both methods, remember to keep all necessary files in the same folder, or at least make sure the program points to the correct folders when importing or running, and change the code where necessary.

### Jupyter Notebook

The specific syntax is:

In [2]:
%run ./create_network_from_tsv.py janedoe janedoepass dev2.ndexbio.org idekerlab-1.txt idekerlab-1-plan.json Title_of_Network "Description of network"

loading plan from: idekerlab-1-plan.json
parsing tsv file using loading plan ...
Done.


Note that the parameters required for the upload follow the script name and are separated from one another by spaces. A parameter that itself contains spaces, such as the description, must be enclosed in quotes.
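
The effect of the spaces and quotes can be checked with the standard `shlex` module, which splits a string using shell-style rules. This is only an approximation of how the command line above is divided into parameters, assuming shell-like splitting.

```python
import shlex

command = ('create_network_from_tsv.py janedoe janedoepass dev2.ndexbio.org '
           'idekerlab-1.txt idekerlab-1-plan.json Title_of_Network '
           '"Description of network"')

parts = shlex.split(command)
parameters = parts[1:]  # everything after the script name

print(len(parameters))  # → 7
print(parameters[-1])   # → Description of network
```

Without the quotes, the description would be split into three separate parameters, giving nine in total instead of the expected seven.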