# Goals

1. Read gene co-expression data from a CSV file
2. Process the data into gene nodes and co-expression relationships
3. Load the nodes and relationships into a `neo4j` graph

## Imports and Parameters

This notebook uses `py2neo` to interface with `neo4j`.

In [1]:
import pandas as pd
import numpy as np
from py2neo import Graph, Node, Relationship

data_path = "~/Desktop/toy.csv"
neo_gene_class = "Gene"
neo_connection = "TO"
neo_url = "http://localhost:7474/db/data/"

## Read

The co-expression data are read into a `pandas` DataFrame.

In [2]:
df = pd.read_csv(data_path)
df.head()

| | gene_a | gene_b | correlation |
|---|---|---|---|
| 0 | M | t | 0.763 |
| 1 | k | n | 0.956 |
| 2 | r | x | 0.248 |
| 3 | I | q | 0.97 |
| 4 | H | a | 0.269 |


## Process

The unique genes across both gene columns are identified, and one `py2neo` node is instantiated per gene.

In [3]:
# Column names, in file order: gene_a, gene_b, correlation
c1, c2, c3 = df.columns

In [4]:
# Union of both gene columns, so each gene appears exactly once
genes = set(df[c1]) | set(df[c2])

# Map each gene name to its own node
gene_map = {gene: Node(neo_gene_class, name=gene) for gene in genes}

len(genes)

38
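The union step can be tried on a few rows without `pandas` or a database. This is a minimal sketch with plain tuples standing in for the DataFrame. The rows and the names `toy_rows` and `toy_genes` are hypothetical and are not taken from `toy.csv`.

```python
# Hypothetical rows in the (gene_a, gene_b, correlation) layout.
toy_rows = [
    ("A", "B", 0.9),
    ("B", "C", 0.5),
    ("A", "C", 0.1),
]

# Union of both gene columns: a gene is kept once, whichever column it came from.
toy_genes = {a for a, _, _ in toy_rows} | {b for _, b, _ in toy_rows}

print(sorted(toy_genes))  # ['A', 'B', 'C']
```

Six gene mentions collapse to three unique genes, which is why the gene count can be much smaller than the row count.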

Each row of the table becomes one `py2neo` relationship between its two gene nodes, with the correlation stored as a property.

In [5]:
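rels = []
# Each row unpacks to (gene_a, gene_b, correlation)
for gene_a, gene_b, correlation in np.array(df):
    # Reuse the nodes from gene_map so every gene keeps a single node
    rel = Relationship(gene_map[gene_a], neo_connection, gene_map[gene_b], correlation=correlation)
    rels.append(rel)

len(rels)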

138
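Looking nodes up in `gene_map` lets relationships that mention the same gene share one node object instead of each holding its own copy. The sketch below shows that with plain Python objects. `ToyNode` is a hypothetical stand-in for `py2neo`'s `Node`, and the rows are made up for illustration.

```python
class ToyNode:
    """Hypothetical stand-in for py2neo's Node, used only for this illustration."""

    def __init__(self, label, name):
        self.label = label
        self.name = name


# Hypothetical rows in the (gene_a, gene_b, correlation) layout.
toy_rows = [("A", "B", 0.9), ("B", "C", 0.5)]

# One node object per gene, as in gene_map.
toy_map = {name: ToyNode("Gene", name) for name in ("A", "B", "C")}

# Each row becomes (start node, type, end node, properties).
toy_rels = [
    (toy_map[a], "TO", toy_map[b], {"correlation": corr})
    for a, b, corr in toy_rows
]

# Gene "B" ends the first relationship and starts the second:
# both positions refer to the very same object, not to two copies.
shared = toy_rels[0][2] is toy_rels[1][0]
print(shared)  # True
```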

## Load

All nodes and relationships are loaded into the `neo4j` graph at `neo_url`.

In [6]:
graph = Graph(neo_url)

# Create the gene nodes first
for node in gene_map.values():
    graph.create(node)

# Then create the relationships that connect them
for rel in rels:
    graph.create(rel)
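One way to confirm the load is to count what arrived. The sketch below assumes the database was empty before the load. `count_queries` is a hypothetical helper, not part of `py2neo`. It only builds Cypher strings, so it runs without a server.

```python
# Parameters copied from the notebook's first cell.
neo_gene_class = "Gene"
neo_connection = "TO"


def count_queries(label, rel_type):
    """Build Cypher strings that count nodes with `label` and relationships of `rel_type`.

    Backticks quote the label and type so that names colliding with
    Cypher keywords still parse.
    """
    node_query = "MATCH (g:`{0}`) RETURN count(g) AS n".format(label)
    rel_query = "MATCH (:`{0}`)-[r:`{1}`]->(:`{0}`) RETURN count(r) AS n".format(
        label, rel_type
    )
    return node_query, rel_query


node_query, rel_query = count_queries(neo_gene_class, neo_connection)
print(node_query)
print(rel_query)
```

In `py2neo` versions that provide `Graph.run`, these strings could be passed to it. On an initially empty database the two counts should match the 38 genes and 138 relationships reported above.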