## NeoSchema library - Tutorial 2: set up a simple Schema (Classes, Properties) and perform a data import (Data Nodes)

#### CAUTION: running this tutorial will clear out the database

#### [Article](https://julianspolymathexplorations.blogspot.com/2022/11/schema-graph-databases-neo4j.html) to accompany this tutorial

In [1]:
import set_path      # Importing this module will add the project's home directory to sys.path

Added 'D:\Docs\- MY CODE\Brain Annex\BA-Win7' to sys.path


In [2]:
import os
import sys
import getpass
import pandas as pd

from neoaccess import NeoAccess
from BrainAnnex.modules.neo_schema.neo_schema import NeoSchema

# Connect to the database, 
#### using the `NeoAccess` library
#### You can use a free local install of the Neo4j database, a remote one on a virtual machine under your control, a hosted solution, or simply the FREE "Sandbox": [instructions here](https://julianspolymathexplorations.blogspot.com/2023/03/neo4j-sandbox-tutorial-cypher.html)
NOTE: This tutorial was tested on version 4 of the Neo4j database; it will probably also work on version 5.

In [3]:
# Save your credentials here (and skip the next cell!) - or use the prompts given by the next cell
host = ""
password = ""
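
If you would rather not store credentials in the notebook, one option is to prompt only for the values left blank above. This is a minimal sketch built on the standard-library `getpass` module imported earlier; the helper name `resolve_credentials` and its parameters are hypothetical and not part of NeoAccess.

```python
import getpass


def resolve_credentials(host, password, ask=input, ask_secret=getpass.getpass):
    """Return (host, password), prompting only for the values that are empty.

    `ask` and `ask_secret` are injectable (hypothetical parameters), so the
    helper can be exercised without an interactive console.
    """
    if not host:
        host = ask("Enter the host (IP address or name) of the Neo4j server: ")
    if not password:
        # getpass hides the typed characters
        password = ask_secret("Enter the database password: ")
    return host, password
```

Calling `host, password = resolve_credentials(host, password)` before connecting leaves saved values untouched and asks for anything missing.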

In [4]:
db = NeoAccess(host=host,
               credentials=("neo4j", password), debug=False)   # Notice the debug option being OFF

Connection to Neo4j database established.


In [5]:
print("Version of the Neo4j driver: ", db.version())

Version of the Neo4j driver:  4.4.11


# Import of data from a Pandas data frame using the `NeoSchema` library

### Initial setup

In [12]:
#db.empty_dbase()       # ******  WARNING: USE WITH CAUTION!!!  ******

In [13]:
NeoSchema.set_database(db)

### Create the Schema

In [14]:
# Create an "Images" Class node - together with its Properties

NeoSchema.create_class_with_properties(name="Images", property_list=["uri", "basename", "suffix", "caption", "width", "height", "date_created"])

(20, 1)

In [15]:
# Raw string, so that the backslashes in the Windows path are not treated as escape sequences
df = pd.read_csv(r'D:\Docs\Brain Annex\- DATA TRANSFER IP\BA_1_i_images_old_neuro_notes.csv', encoding="ISO-8859-1")
df

Unnamed: 0,uri,basename,suffix,caption,date_created
0,i-12,old-neuro-notes-1,jpg,,2015-09-07 01:31:07
1,i-13,old-neuro-notes-2,jpg,,2015-09-07 01:41:32
2,i-14,old-neuro-notes-3,jpg,,2015-09-07 01:41:32
3,i-15,old-neuro-notes-4,jpg,,2015-09-07 02:00:23
4,i-16,old-neuro-notes-5,jpg,,2015-09-07 02:00:23
5,i-17,old-neuro-notes-6,jpg,,2015-09-07 02:01:24
6,i-18,old-neuro-notes-7,jpg,,2015-09-07 02:01:24
7,i-19,old-neuro-notes-8,jpg,,2015-09-07 02:01:24
8,i-20,old-neuro-notes-9,jpg,,2015-09-07 02:01:25
9,i-21,old-neuro-notes-9b,jpg,,2015-09-07 02:01:25
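
Before importing, it can help to compare the data frame's columns with the Properties declared for the Class. This is a minimal sketch in plain pandas; the helper `compare_columns_to_properties` is hypothetical, and whether `import_pandas_nodes` itself performs such a check is not shown in this tutorial. The sample row is copied from the display above.

```python
import pandas as pd


def compare_columns_to_properties(df, property_list):
    """Return (unexpected, missing): columns that are not declared Properties,
    and declared Properties that have no column in the data frame."""
    unexpected = [col for col in df.columns if col not in property_list]
    missing = [prop for prop in property_list if prop not in df.columns]
    return unexpected, missing


# Properties declared for the "Images" Class earlier in the tutorial
image_properties = ["uri", "basename", "suffix", "caption",
                    "width", "height", "date_created"]

# One-row stand-in with the same columns as the first CSV file
df_demo = pd.DataFrame({
    "uri": ["i-12"],
    "basename": ["old-neuro-notes-1"],
    "suffix": ["jpg"],
    "caption": [None],
    "date_created": ["2015-09-07 01:31:07"],
})

unexpected, missing = compare_columns_to_properties(df_demo, image_properties)
print("Unexpected columns:", unexpected)
print("Properties without a column:", missing)
```

For this first file, the check reports that `width` and `height` have no matching column.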


In [20]:
new_ids = NeoSchema.import_pandas_nodes(df=df, class_node="Images", extra_labels="BA", schema_code="i", datetime_cols="date_created")
len(new_ids)

37
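
The `date_created` column arrives from the CSV file as plain text, which is presumably why it is named in `datetime_cols`. As an illustration of that kind of conversion in plain pandas (an assumption for clarity, not NeoSchema's actual implementation), with values copied from the rows above:

```python
import pandas as pd

# Two-row stand-in for the imported data frame
df_dates = pd.DataFrame({
    "uri": ["i-12", "i-13"],
    "date_created": ["2015-09-07 01:31:07", "2015-09-07 01:41:32"],
})

# Parse the strings into pandas timestamps, using an explicit format
df_dates["date_created"] = pd.to_datetime(df_dates["date_created"],
                                          format="%Y-%m-%d %H:%M:%S")

print(df_dates.dtypes)
```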

In [16]:
# Raw string, so that the backslashes in the Windows path are not treated as escape sequences
df_2 = pd.read_csv(r'D:\Docs\Brain Annex\- DATA TRANSFER IP\BA_1_i_images.csv', encoding="ISO-8859-1")
df_2

Unnamed: 0,uri,basename,suffix,caption,width,height,date_created
0,i-1,1-Duck Capture Hot Keys,png,Duck Capture Hot Keys,,,2015-07-10 00:00:00
1,i-10,10-49,jpg,Use this with Outlook2007 & Gmail when non-Gma...,,,2015-08-28 01:30:13
2,i-100,100-LunarPages,JPG,LunarPages,,,2016-08-03 00:00:31
3,i-1000,1000-Blow-up,PNG,"Blow-up, in spite of reasonable-sounding learn...",262.0,858.0,2018-11-05 21:41:45
4,i-1001,1001-y_predicted with a zeo value,PNG,The root cause of the problem: y_predicted wit...,662.0,371.0,2018-11-05 21:41:46
...,...,...,...,...,...,...,...
5263,i-994,994-width 1 is a disaster zone,PNG,width 1 is a disaster zone,1269.0,622.0,2018-11-03 21:42:48
5264,i-995,995-width 2 gives mediocre results,PNG,width 2 gives mediocre results,1255.0,611.0,2018-11-03 21:42:48
5265,i-996,996-widths 3-100 all pretty good,PNG,widths 3-100 all pretty good,1211.0,1015.0,2018-11-03 21:42:53
5266,i-997,997-Different types of non-relational databases,png,Different types of non-relational databases an...,1946.0,1020.0,2018-11-05 12:22:40


In [17]:
df_2_small = df_2[0:10]
df_2_small

Unnamed: 0,uri,basename,suffix,caption,width,height,date_created
0,i-1,1-Duck Capture Hot Keys,png,Duck Capture Hot Keys,,,2015-07-10 00:00:00
1,i-10,10-49,jpg,Use this with Outlook2007 & Gmail when non-Gma...,,,2015-08-28 01:30:13
2,i-100,100-LunarPages,JPG,LunarPages,,,2016-08-03 00:00:31
3,i-1000,1000-Blow-up,PNG,"Blow-up, in spite of reasonable-sounding learn...",262.0,858.0,2018-11-05 21:41:45
4,i-1001,1001-y_predicted with a zeo value,PNG,The root cause of the problem: y_predicted wit...,662.0,371.0,2018-11-05 21:41:46
5,i-1002,1002-Tiny corrective quantities,PNG,Tiny corrective quantities,1262.0,1009.0,2018-11-05 21:41:48
6,i-1004,1004-PICC server architecture,svg,PICC server architecture,,,2018-11-06 16:53:03
7,i-1005,1005-PICC workflow,svg,PICC workflow,,,2018-11-06 22:33:22
8,i-1006,1006-Exploring various initial conditions,PNG,Exploring various initial conditions,156.0,303.0,2018-11-06 23:59:10
9,i-1007,1007-Mediocre local minimum,PNG,Mediocre local minimum,1250.0,614.0,2018-11-06 23:59:12


In [19]:
new_ids_2 = NeoSchema.import_pandas_nodes(df=df_2, class_node="Images", extra_labels="BA", schema_code="i",
                                          datetime_cols="date_created", int_cols=["width", "height"])
len(new_ids_2)

5268
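
The `width` and `height` columns are displayed as floats (`262.0`, `858.0`) because pandas stores a numeric column with missing values as `float64`; this is the likely reason they are listed in `int_cols`. A minimal pandas-only sketch of the issue, using the nullable `Int64` dtype (an illustration, not necessarily how NeoSchema handles it internally):

```python
import pandas as pd

# Two-row stand-in: one image without dimensions, one with
df_dims = pd.DataFrame({
    "uri": ["i-1", "i-1000"],
    "width": [None, 262.0],
    "height": [None, 858.0],
})

print(df_dims.dtypes)      # width and height are float64, because of the missing values

# The nullable integer dtype keeps whole numbers as integers and missing values as <NA>
df_dims_int = df_dims.astype({"width": "Int64", "height": "Int64"})

print(df_dims_int)
```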

In [23]:
# Raw string, so that the backslashes in the Windows path are not treated as escape sequences
df_rel = pd.read_csv(r'D:\Docs\Brain Annex\- DATA TRANSFER IP\BA_images_cat_.csv', encoding="ISO-8859-1")
df_rel

Unnamed: 0,item_uri,old_ba_id,pos
0,i-1,54,20
1,i-2,52,30
2,i-3,80,20
3,i-4,100,10
4,i-6,107,20
...,...,...,...
5478,i-6145,1055,-40
5479,i-6146,1055,-50
5480,i-6147,1085,20
5481,i-6148,67,308


In [24]:
NeoSchema.create_class_with_properties(name="Categories", property_list=["name", "old_ba_id"])

(5316, 9)

In [25]:
NeoSchema.create_data_node(class_node="Categories", properties={'name': 'Duck Capture', 'old_ba_id': 54})

5319

In [26]:
NeoSchema.create_data_node(class_node="Categories", properties={'name': 'RAID', 'old_ba_id': 52})

5320

In [27]:
NeoSchema.create_data_node(class_node="Categories", properties={'name': 'Dreamweaver', 'old_ba_id': 80})

5321

In [28]:
new_link = NeoSchema.import_pandas_links(df=df_rel, class_from="Images", class_to="Categories",
                                         col_from="item_uri", col_to="old_ba_id",
                                         link_name="BA_in_category", col_link_props="pos",
                                         name_map={"item_uri": "uri"}, skip_errors=True)

import_pandas_links(): failed to create a new relationship for Pandas row: {'item_uri': 'i-4', 'old_ba_id': 100, 'pos': 10}
import_pandas_links(): failed to create a new relationship for Pandas row: {'item_uri': 'i-6', 'old_ba_id': 107, 'pos': 20}
import_pandas_links(): failed to create a new relationship for Pandas row: {'item_uri': 'i-7', 'old_ba_id': 49, 'pos': 90}
import_pandas_links(): failed to create a new relationship for Pandas row: {'item_uri': 'i-10', 'old_ba_id': 49, 'pos': 100}
import_pandas_links(): failed to create a new relationship for Pandas row: {'item_uri': 'i-11', 'old_ba_id': 49, 'pos': 110}
import_pandas_links(): failed to create a new relationship for Pandas row: {'item_uri': 'i-12', 'old_ba_id': 185, 'pos': 10}
import_pandas_links(): failed to create a new relationship for Pandas row: {'item_uri': 'i-13', 'old_ba_id': 185, 'pos': 20}
import_pandas_links(): failed to create a new relationship for Pandas row: {'item_uri': 'i-14', 'old_ba_id': 185, 'pos': 30}
impo

In [29]:
len(new_link)

11
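
The failures reported by `import_pandas_links()` are consistent with only three "Categories" data nodes having been created (with `old_ba_id` 54, 52 and 80): rows that point to any other `old_ba_id` have nothing to link to. A minimal pandas-only sketch of how such rows could be spotted ahead of time; the variable names are hypothetical, and the sample rows are copied from the first lines of `df_rel` shown earlier. A link also needs a matching "Images" node, which this sketch does not check.

```python
import pandas as pd

# First five rows of the relationships table, as displayed earlier
df_rel_demo = pd.DataFrame({
    "item_uri": ["i-1", "i-2", "i-3", "i-4", "i-6"],
    "old_ba_id": [54, 52, 80, 100, 107],
    "pos": [20, 30, 20, 10, 20],
})

# The old_ba_id values of the "Categories" data nodes created in this tutorial
known_category_ids = [54, 52, 80]

# Boolean mask: True where the row refers to an existing category
has_category = df_rel_demo["old_ba_id"].isin(known_category_ids)

linkable = df_rel_demo[has_category]     # rows whose category exists
orphans = df_rel_demo[~has_category]     # rows that would fail to link

print(f"{len(linkable)} linkable row(s), {len(orphans)} without a matching category")
```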