# Load/Update Data in Neo4j Graph Database
If this is the first time loading, make sure to create the constraints and indexes first, then load the data sources in the following sequence:
- GO
- ChEBI
- Mesh
- Enzyme
- NCBI Taxonomy
- NCBI Gene
- BioCyc
- RegulonDB
- UniProt
- StringDB
- Kegg
- PubMed
- Literature

In [4]:
import os, sys
from neo4j import GraphDatabase
import pandas as pd
import importlib

module_dir = os.getcwd().replace("notebook", "src")
if module_dir not in sys.path:
    sys.path.append(module_dir)

In [2]:
from common.database import *

#### Database connection
Below is the code for the preset database connections (the second parameter is the database name):

```
local_database = get_database(Neo4jInstance.LOCAL, 'neo4j')
dtu_database = get_database(Neo4jInstance.DTU, 'lifelike')
google_stg_database = get_database(Neo4jInstance.GOOGLE_STG, 'neo4j')
google_prod_database = get_database(Neo4jInstance.GOOGLE_PROD, 'neo4j')
```

#### Environment variable
Set BASE_DATA_DIR to the parent directory of the downloaded and processed files:
```
export BASE_DATA_DIR={your_data_dir}
```

The directory structure is as follows:

- BASE_DATA_DIR
    - download
        - biocyc
        - gene
        - taxonomy
        - uniprot
        - kegg
        - stringdb
    - processed
        - biocyc
        - gene
        - taxonomy
        - uniprot
        - kegg
        - stringdb
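
The layout above can be created in one step. The following is a minimal sketch, not part of the project code: `create_data_dirs` and `SOURCES` are hypothetical names introduced here, and the subfolder names are copied from the list above.

```python
from pathlib import Path

# Subfolder names copied from the directory layout above.
SOURCES = ["biocyc", "gene", "taxonomy", "uniprot", "kegg", "stringdb"]


def create_data_dirs(base_dir):
    """Create download/<source> and processed/<source> under base_dir.

    Hypothetical helper for illustration; returns the list of folder paths.
    """
    base = Path(base_dir)
    created = []
    for stage in ("download", "processed"):
        for source in SOURCES:
            path = base / stage / source
            # exist_ok=True makes the call safe to repeat on an existing tree.
            path.mkdir(parents=True, exist_ok=True)
            created.append(path)
    return created
```

A typical call would be `create_data_dirs(os.environ["BASE_DATA_DIR"])` after `import os`, assuming the variable has been exported as shown earlier.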
        




## Load/Update GO (Gene Ontology)
Download http://current.geneontology.org/ontology/go.obo to the {your base data dir}/download/go folder:
```
curl -o "$BASE_DATA_DIR/download/go/go.obo" http://current.geneontology.org/ontology/go.obo
```
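
If you prefer to download from within the notebook, the same step can be sketched in Python with the standard library. This is an illustrative alternative to the curl command, not the project's own downloader: `go_obo_path` and `download_go_obo` are hypothetical helper names, and the target path assumes the download/go folder described above.

```python
import urllib.request
from pathlib import Path

GO_OBO_URL = "http://current.geneontology.org/ontology/go.obo"


def go_obo_path(base_dir):
    """Return the expected location of go.obo under the base data dir."""
    return Path(base_dir) / "download" / "go" / "go.obo"


def download_go_obo(base_dir, url=GO_OBO_URL):
    """Download go.obo into <base_dir>/download/go and return the file path.

    Hypothetical helper for illustration only.
    """
    target = go_obo_path(base_dir)
    # Create the download/go folder if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, str(target))
    return target
```

For example, `download_go_obo(os.environ["BASE_DATA_DIR"])` (after `import os`) would fetch the file to the same location the curl command writes to.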

#### Load GO data into Neo4j

In [6]:
from go.go_parser import GoOboParser
database = get_database(Neo4jInstance.LOCAL, 'neo4j')
parser = GoOboParser()
parser.create_indexes(database)
parser.load_data_to_neo4j(database)
database.close()

#### Update GO data

In [8]:
from go.go_parser import GoOboParser
database = get_database(Neo4jInstance.LOCAL, 'neo4j')
parser = GoOboParser()
parser.load_data_to_neo4j(database)
database.close()

## Load/Update NCBI Taxonomy
URL: https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/new_taxdump  
Download the new_taxdump.zip file to {your base dir}/download/taxonomy/, then unzip it.  
The unzipped files (*.dmp) will be in {your base dir}/download/taxonomy/new_taxdump.
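
The unzip step can also be done from Python. The sketch below assumes new_taxdump.zip has already been downloaded into the taxonomy folder and that the archive stores its *.dmp files at the top level; `unzip_taxdump` is a hypothetical helper name, not part of the project code.

```python
import zipfile
from pathlib import Path


def unzip_taxdump(base_dir, zip_name="new_taxdump.zip"):
    """Extract the taxonomy archive into download/taxonomy/new_taxdump.

    Hypothetical helper for illustration; returns the extracted *.dmp paths.
    """
    taxonomy_dir = Path(base_dir) / "download" / "taxonomy"
    target_dir = taxonomy_dir / "new_taxdump"
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(taxonomy_dir / zip_name) as archive:
        archive.extractall(target_dir)
    # Listing the *.dmp files is a quick check that extraction worked.
    return sorted(target_dir.glob("*.dmp"))
```

Checking that the returned list is not empty is a cheap way to confirm the files are in place before running the taxonomy loader.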
    

## Load/Update NCBI Genes
Download the NCBI gene data from ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/, then parse and load the data from the gene_info and gene2go files.

##### Run the following Python script

In [4]:
from ncbi.ncbi_gene_parser import GeneParser
# Make sure to use the right database connection
database = get_database(Neo4jInstance.LOCAL, 'neo4j')
parser = GeneParser()
parser.load_data_to_neo4j(database)
database.close()
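
Before loading, it can help to inspect the gene_info file directly. The sketch below is not how `GeneParser` works internally (that is not shown here); it only illustrates streaming a tab-delimited file row by row. It assumes the first line is a header whose first column is named `#tax_id`, and `iter_gene_rows` is a hypothetical helper name.

```python
import csv
import gzip


def iter_gene_rows(path, tax_ids=None):
    """Yield rows of a tab-delimited gene_info-style file as dicts.

    Hypothetical helper for illustration. If tax_ids is given (a set of
    strings), only rows whose "#tax_id" value is in the set are yielded.
    """
    # Read .gz files transparently; plain text files are opened as-is.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            # Assumption: the header names the first column "#tax_id".
            if tax_ids is None or row["#tax_id"] in tax_ids:
                yield row
```

Because rows are yielded one at a time, the whole file never has to fit in memory, which matters for a file of this size.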