## NeoSchema library - Tutorial 2: set up a simple Schema (Classes, Properties) and perform a data import (Data Nodes)

#### CAUTION: running this tutorial will clear out the database

#### [Article](https://julianspolymathexplorations.blogspot.com/2022/11/schema-graph-databases-neo4j.html) to accompany this tutorial

In [1]:
import set_path      # Importing this module will add the project's home directory to sys.path

Added 'D:\Docs\- MY CODE\Brain Annex\BA-Win7' to sys.path


In [2]:
import os
import sys
import getpass
import pandas as pd

from datetime import datetime 
from neo4j.time import DateTime

from neoaccess import NeoAccess
from BrainAnnex.modules.neo_schema.neo_schema import NeoSchema

# Connect to the database
#### using the `NeoAccess` library
#### You can use a free local install of the Neo4j database, a remote one on a virtual machine under your control, a hosted solution, or simply the FREE "Sandbox": [instructions here](https://julianspolymathexplorations.blogspot.com/2023/03/neo4j-sandbox-tutorial-cypher.html)
NOTE: This tutorial was tested on version 4 of the Neo4j database, but will probably also work on version 5.

In [3]:
# Save your credentials here, or prompt for them at run time instead (for example with `getpass`, imported above)
host = "bolt://YOUR_OWN"
password = "YOUR_PASS"
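If you would rather not store credentials in the notebook, prompting for them is one alternative. The following is a minimal sketch, not part of the original tutorial: `ask_credentials` is a hypothetical helper, the default host `bolt://localhost:7687` is an assumption (a local install on Neo4j's default Bolt port), and the `input_fn`/`secret_fn` parameters exist only so that the prompts can be swapped out, for example in automated tests.

```python
import getpass

def ask_credentials(default_host="bolt://localhost:7687",
                    input_fn=input, secret_fn=getpass.getpass):
    """
    Prompt for the database host and password (hypothetical helper).
    Pressing ENTER at the host prompt keeps the default host.
    The password is read with getpass, so it is not echoed on screen.
    """
    host = input_fn(f"Enter the host [{default_host}]: ").strip() or default_host
    password = secret_fn("Enter the database password: ")
    return host, password

# Interactive use, in place of the hardcoded values:
# host, password = ask_credentials()
```

The returned `host` and `password` can then be passed to `NeoAccess` exactly as in the next cell.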

In [4]:
db = NeoAccess(host=host,
               credentials=("neo4j", password), debug=False)   # Notice the debug option being OFF

Connection to Neo4j database established.


In [5]:
print("Version of the Neo4j driver: ", db.version())

Version of the Neo4j driver:  4.4.11


# Import of data from a Pandas data frame using the `NeoSchema` library

### Initial setup

In [6]:
db.empty_dbase()       # ******  WARNING: USE WITH CAUTION!!!  ******

In [7]:
NeoSchema.set_database(db)

### Create the Schema

In [8]:
# Create a "Site Link" Class node, together with its Properties

NeoSchema.create_class_with_properties(name="Site Link",
                                       property_list=["uri", "url", "name", "comments",
                                                      "rating", "read", "date_created"])

(852, 1)

In [9]:
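# Raw string (r'...'), so that the backslashes in the Windows path are not treated as escape sequences
df = pd.read_csv(r'D:\Docs\Brain Annex\- DATA TRANSFER IP\BA_siteLinks.csv', encoding="ISO-8859-1")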
df

Unnamed: 0,uri,url,name,comments,rating,read,date_created
0,sl-1,https://www.coursera.org/,Coursera,taken several courses. But they hide some past...,4.0,,2015-08-01 00:00:00
1,sl-2,https://www.edx.org/,edX,taken course from,4.0,,2015-08-01 00:00:00
2,sl-3,https://www.khanacademy.org/,Khan Academy,taken course from,4.0,,2015-08-01 00:00:00
3,sl-4,https://www.youtube.com/user/MIT/playlists,MIT (YouTube playlists),taken courses from,4.0,,2015-08-01 00:00:00
4,sl-5,http://ocw.mit.edu/index.htm,MIT (own site),,4.0,,2015-08-01 00:00:00
...,...,...,...,...,...,...,...
5174,sl-5548,https://www.facebook.com/groups/31768815836776...,Friends looking for SF housing! -Anjou,12/2023 - much-revived; essentially 100% SF focus,3.0,y,2023-12-12 20:19:35
5175,sl-5549,https://www.facebook.com/groups/21767166258966...,"San Francisco Housing, Rooms, Apartments, Sublets","12/2023 - trashy, spammy, commercial slant, st...",3.0,no,2023-12-12 20:27:16
5176,sl-5550,https://www.facebook.com/groups/CaliforniaHous...,Housing @ San Francisco Bay Area Apartments/Ro...,"12/2023 - somewhat spammy, strong SF focus",3.0,no,2023-12-12 20:35:46
5177,sl-5551,https://www.facebook.com/groups/30324133972548...,Bay Area Conscious Community Housing Board,"12/2023 - maybe the best group, all around",3.0,no,2023-12-13 20:14:00
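A side note on Windows file paths such as the one passed to `read_csv`: in an ordinary Python string literal, a backslash starts an escape sequence, so some paths are silently altered. The short sketch below illustrates the difference with a raw string, and shows `pathlib` as an alternative way to handle such paths. The path `D:\temp\new.csv` is a made-up example chosen to trigger the problem; it does not appear in the tutorial.

```python
from pathlib import PureWindowsPath

# Hypothetical path, chosen because it contains \t and \n
plain = 'D:\temp\new.csv'    # '\t' and '\n' are turned into TAB and NEWLINE characters
raw = r'D:\temp\new.csv'     # raw string: backslashes are kept literally

print(len(plain), len(raw))  # prints: 13 15

# PureWindowsPath parses Windows-style paths on any operating system
path = PureWindowsPath(r'D:\Docs\Brain Annex\- DATA TRANSFER IP\BA_siteLinks.csv')
print(path.name)             # prints: BA_siteLinks.csv
print(path.suffix)           # prints: .csv
```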


In [10]:
df_small = df.loc[0:35]     # .loc slices by label and includes BOTH endpoints: rows 0 through 35, i.e. 36 rows
df_small

Unnamed: 0,uri,url,name,comments,rating,read,date_created
0,sl-1,https://www.coursera.org/,Coursera,taken several courses. But they hide some past...,4.0,,2015-08-01 00:00:00
1,sl-2,https://www.edx.org/,edX,taken course from,4.0,,2015-08-01 00:00:00
2,sl-3,https://www.khanacademy.org/,Khan Academy,taken course from,4.0,,2015-08-01 00:00:00
3,sl-4,https://www.youtube.com/user/MIT/playlists,MIT (YouTube playlists),taken courses from,4.0,,2015-08-01 00:00:00
4,sl-5,http://ocw.mit.edu/index.htm,MIT (own site),,4.0,,2015-08-01 00:00:00
5,sl-6,http://sciconcilium.com/,"Sciconcilium - Connecting Science, Technology ...",,3.0,,2015-08-01 00:00:00
6,sl-7,https://chrome.google.com/webstore/detail/sess...,Session Manager Chrome Extension,recommended on vox.com,4.0,,2015-08-01 00:00:00
7,sl-8,http://www.vox.com/2015/6/1/8695555/browser-tabs,"I have 227 browser tabs open, and my computer ...",Management of large # of browser tabs,4.0,,2015-08-01 00:00:00
8,sl-9,https://www.yahoo.com/tech/s/roku-vs-apple-tv-...,Roku vs. Chromecast vs. Amazon Fire TV vs. Goo...,Comparison of video streamers,3.0,,2015-08-01 00:00:00
9,sl-10,https://www.quantamagazine.org/20150514-the-pa...,Ultrahigh-energy cosmic rays,,3.0,,2015-08-01 00:00:00


In [11]:
new_ids = NeoSchema.import_pandas_nodes(df=df_small, class_node="Site Link",
                                        extra_labels="BA", schema_code="sl",
                                        datetime_cols="date_created")
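The `datetime_cols` argument in the call above presumably marks `date_created` as a column of date/time values. How `NeoSchema` converts them internally is not shown in this tutorial; purely as an illustration, timestamp strings in the format seen in that column can be parsed with the standard library as sketched below. `parse_timestamp` is a hypothetical helper, not part of `NeoSchema`.

```python
from datetime import datetime

def parse_timestamp(text):
    """Parse a 'YYYY-MM-DD HH:MM:SS' string into a datetime object (hypothetical helper)."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")

# Two values taken from the `date_created` column of the data frame shown earlier
first = parse_timestamp("2015-08-01 00:00:00")
last = parse_timestamp("2023-12-13 20:14:00")

print(first.year, last.hour)    # prints: 2015 20
print((last - first).days)      # whole days elapsed between the two entries
```

Once parsed, the values support date arithmetic and comparisons, which plain strings do not.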

In [12]:
len(new_ids)

36

In [13]:
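# Raw string (r'...'), so that the backslashes in the Windows path are not treated as escape sequences
df_rel = pd.read_csv(r'D:\Docs\Brain Annex\- DATA TRANSFER IP\BA_headers_cat.csv', encoding="ISO-8859-1")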
df_rel

Unnamed: 0,item_uri,old_ba_id,pos
0,h-4,1195,0
1,h-5,1195,60
2,h-6,756,0
3,h-7,756,60
4,h-9,823,0
...,...,...,...
874,h-1186,67,305
875,h-1187,1154,50
876,h-1188,423,87
877,h-1189,431,95


In [14]:
NeoSchema.create_class_with_properties(name="Categories", property_list=["name", "old_ba_id"])

(356, 4)

In [15]:
NeoSchema.create_data_node(class_node="Categories", properties={'name': 'SciPy & Scikit-Learn', 'old_ba_id': 1195})

359

In [16]:
NeoSchema.create_data_node(class_node="Categories", properties={'name': 'Tensor Flow', 'old_ba_id': 756})

360

In [17]:
NeoSchema.create_data_node(class_node="Categories", properties={'name': 'Melissa Data Products', 'old_ba_id': 823})

361

In [18]:
df_rel_small = df_rel  # Use all the rows; switch to df_rel[0:8] for a quick trial run
df_rel_small

Unnamed: 0,item_uri,old_ba_id,pos
0,h-4,1195,0
1,h-5,1195,60
2,h-6,756,0
3,h-7,756,60
4,h-9,823,0
...,...,...,...
874,h-1186,67,305
875,h-1187,1154,50
876,h-1188,423,87
877,h-1189,431,95


In [19]:
'''
NeoSchema.import_pandas_links(df=df_rel_small, class_from="Headers", class_to="Categories",
                             col_from="item_uri", col_to="old_ba_id",                            
                             link_name="BA_in_category", col_link_props="pos",
                             name_map={"item_uri": "uri"}, skip_errors=True)
'''

import_pandas_links(): failed to create a new relationship for Pandas row: {'item_uri': 'h-10', 'old_ba_id': 819, 'pos': 0}
import_pandas_links(): failed to create a new relationship for Pandas row: {'item_uri': 'h-11', 'old_ba_id': 819, 'pos': 90}
import_pandas_links(): failed to create a new relationship for Pandas row: {'item_uri': 'h-13', 'old_ba_id': 819, 'pos': 350}
import_pandas_links(): failed to create a new relationship for Pandas row: {'item_uri': 'h-15', 'old_ba_id': 819, 'pos': 480}
import_pandas_links(): failed to create a new relationship for Pandas row: {'item_uri': 'h-16', 'old_ba_id': 819, 'pos': 300}
import_pandas_links(): failed to create a new relationship for Pandas row: {'item_uri': 'h-18', 'old_ba_id': 636, 'pos': 120}
import_pandas_links(): failed to create a new relationship for Pandas row: {'item_uri': 'h-21', 'old_ba_id': 819, 'pos': 670}
import_pandas_links(): failed to create a new relationship for Pandas row: {'item_uri': 'h-23', 'old_ba_id': 728, 'pos': 

[1603, 1604, 1605, 1606, 1607, 1608, 1609, 1610, 1611, 1612]

In [20]:
NeoSchema.import_pandas_links(df=df_rel_small, class_from="Headers", class_to="Categories",
                             col_from="item_uri", col_to="old_ba_id",                            
                             link_name="BA_in_category", col_link_props="pos",
                             name_map={"item_uri": "uri"}, skip_errors=True)

import_pandas_links(): failed to create a new relationship for Pandas row: {'item_uri': 'h-4', 'old_ba_id': 1195, 'pos': 0}
import_pandas_links(): failed to create a new relationship for Pandas row: {'item_uri': 'h-5', 'old_ba_id': 1195, 'pos': 60}
import_pandas_links(): failed to create a new relationship for Pandas row: {'item_uri': 'h-6', 'old_ba_id': 756, 'pos': 0}
import_pandas_links(): failed to create a new relationship for Pandas row: {'item_uri': 'h-7', 'old_ba_id': 756, 'pos': 60}
import_pandas_links(): failed to create a new relationship for Pandas row: {'item_uri': 'h-9', 'old_ba_id': 823, 'pos': 0}
import_pandas_links(): failed to create a new relationship for Pandas row: {'item_uri': 'h-10', 'old_ba_id': 819, 'pos': 0}
import_pandas_links(): failed to create a new relationship for Pandas row: {'item_uri': 'h-11', 'old_ba_id': 819, 'pos': 90}
import_pandas_links(): failed to create a new relationship for Pandas row: {'item_uri': 'h-13', 'old_ba_id': 819, 'pos': 350}
import

[]
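Every row in the run above failed, and an empty list was returned. One thing worth checking is whether both endpoints of each row actually exist: this notebook creates only three "Categories" data nodes, and never sets up a "Headers" Class. The sketch below is plain Python, independent of `NeoSchema`; `split_linkable` is a hypothetical helper, and the sets of known keys are assumed to have been collected beforehand. It shows one way to separate the rows whose endpoints are both known from the rows that cannot be linked.

```python
def split_linkable(rows, known_from, known_to, col_from="item_uri", col_to="old_ba_id"):
    """
    Split rows (dicts) into those whose two endpoints are both known, and the rest.
    Hypothetical helper: it only inspects the data, and does not touch the database.
    """
    linkable, unlinkable = [], []
    for row in rows:
        if row[col_from] in known_from and row[col_to] in known_to:
            linkable.append(row)
        else:
            unlinkable.append(row)
    return linkable, unlinkable

# A few rows as they appear in the log above
# (a full list could be obtained with df_rel_small.to_dict("records"))
rows = [
    {"item_uri": "h-4", "old_ba_id": 1195, "pos": 0},
    {"item_uri": "h-5", "old_ba_id": 1195, "pos": 60},
    {"item_uri": "h-6", "old_ba_id": 756, "pos": 0},
    {"item_uri": "h-10", "old_ba_id": 819, "pos": 0},
]

known_categories = {1195, 756, 823}   # `old_ba_id` values of the 3 Categories created earlier

# With no "Headers" data nodes present, no row can be linked
ok, bad = split_linkable(rows, known_from=set(), known_to=known_categories)
print(len(ok), len(bad))              # prints: 0 4

# ASSUMPTION, for illustration only: suppose these Headers had been imported first
ok, bad = split_linkable(rows, known_from={"h-4", "h-5", "h-6", "h-10"},
                         known_to=known_categories)
print([r["item_uri"] for r in bad])   # prints: ['h-10']
```

Running such a check before the import makes it easier to tell missing data nodes apart from other causes of failure.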