## NeoSchema library - Tutorial 2 : set up a simple Schema (Classes, Properties) 
## and perform a data import (Data Nodes and relationships among them)

If you're new to Graph Databases, you can think of "Classes" and "Properties" along the lines of, respectively, "Table names" and "Table field lists".
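To make the analogy concrete, here is a minimal sketch in plain Python (it does not use the NeoSchema API). It treats each Class as a name mapped to its list of allowed Properties, much like a table name mapped to its column list. The `schema` dictionary and the `undeclared_fields` helper are hypothetical names, introduced only for illustration.

```python
# Illustration only: a plain-Python stand-in for a Schema.
# NeoSchema keeps its Schema inside the graph database, not in a dictionary.
schema = {
    "City": ["City ID", "name"],
    "State": ["State ID", "name", "2-letter abbr"],
}

def undeclared_fields(class_name: str, record: dict) -> list:
    """Return the fields of `record` that are not declared as Properties of the Class."""
    allowed = schema[class_name]
    return [field for field in record if field not in allowed]

ok_record = {"City ID": 1, "name": "Berkeley"}
bad_record = {"City ID": 2, "name": "Chicago", "population": 2700000}

print(undeclared_fields("City", ok_record))    # no undeclared fields
print(undeclared_fields("City", bad_record))   # "population" is not a declared Property
```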

If you first need to clear out your test database, one of the cells below (currently commented out) will let you do it.

#### [Background Article on Schema in Graph Databases](https://julianspolymathexplorations.blogspot.com/2022/11/schema-graph-databases-neo4j.html)

In [1]:
import set_path      # Importing this module will add the project's home directory to sys.path

Added 'D:\Docs\- MY CODE\Brain Annex\BA-Win7' to sys.path


In [2]:
import os
import sys
import getpass
import pandas as pd

from neoaccess import NeoAccess
from BrainAnnex.modules.neo_schema.neo_schema import NeoSchema

# Connect to the database, 
#### using the `NeoAccess` library
#### You can use a free local install of the Neo4j database, a remote one on a virtual machine under your control, a hosted solution, or simply the FREE "Sandbox": [instructions here](https://julianspolymathexplorations.blogspot.com/2023/03/neo4j-sandbox-tutorial-cypher.html)
NOTE: This tutorial was tested on version 4.4 of the Neo4j database, but will probably also work on the newer version 5.

In [3]:
# Save your credentials here (and skip the next cell!) - or use the prompts given by the next cell
#host = ""             # EXAMPLES:  bolt://123.456.789.012   OR   neo4j://localhost
#password = ""

In [2]:
print("To create a database connection, enter the host IP, but leave out the port number: (EXAMPLES:  bolt://1.2.3.4  OR  neo4j://localhost )\n")

host = input("Enter host IP WITHOUT the port number.  EXAMPLE: bolt://123.456.789.012 ")
host += ":7687"    # EXAMPLE of host value:  "bolt://123.456.789.012:7687"

password = getpass.getpass("Enter the database password:")

print(f"\n=> Will be using: host='{host}', username='neo4j', password=**********")

To create a database connection, enter the host IP, but leave out the port number: (EXAMPLES:  bolt://1.2.3.4  OR  neo4j://localhost )



Enter host IP WITHOUT the port number.  EXAMPLE: bolt://123.456.789.012  bolt://123.456.789.012
Enter the database password: ········



=> Will be using: host='bolt://123.456.789.012:7687', username='neo4j', password=**********
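The cell above always appends `:7687`, so a host that was entered with its port would end up with the port twice. As a hedged alternative, the sketch below appends the port only when none is present. The helper name `with_default_port` is hypothetical and not part of NeoAccess; 7687 is the default Bolt port, the same one used above.

```python
from urllib.parse import urlsplit

DEFAULT_BOLT_PORT = 7687    # the same port that the cell above appends

def with_default_port(host: str, port: int = DEFAULT_BOLT_PORT) -> str:
    """Append `port` to a URI such as 'bolt://1.2.3.4', unless it already has one."""
    host = host.strip().rstrip("/")
    if urlsplit(host).port is None:     # no explicit port in the URI
        return f"{host}:{port}"
    return host

print(with_default_port("bolt://1.2.3.4"))
print(with_default_port("neo4j://localhost:7687"))
```

With such a helper, the prompt could accept the host with or without the port number.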


In [4]:
db = NeoAccess(host=host,
               credentials=("neo4j", password), debug=False)   # Notice the debug option being OFF

Connection to Neo4j database established.


In [5]:
print("Version of the Neo4j driver: ", db.version())

Version of the Neo4j driver:  4.4.11


# Import of data from a Pandas data frame using the `NeoSchema` library

### Initial setup

In [6]:
#db.empty_dbase()       # ******  Recommended for use with test databases.   WARNING: USE WITH CAUTION!!!  ******

In [7]:
NeoSchema.set_database(db)

### Create the Schema

In [8]:
# Create a "City" Class node - together with its Properties, based on the data to import

NeoSchema.create_class_with_properties(name="City", property_list=["City ID", "name"])

(530, 1)

In [9]:
# Likewise for a "State" Class node - together with its Properties, based on the data to import

NeoSchema.create_class_with_properties(name="State", property_list=["State ID", "name", "2-letter abbr"])  

(533, 4)

In [10]:
# Now add a relationship named "IS_IN", from the "City" Class to the "State" Class

NeoSchema.create_class_relationship(from_class="City", to_class="State", rel_name="IS_IN")

### Now import some data
We'll pass our data as Pandas data frames; those could easily be read in from CSV files, for example.

In [11]:
city_df = pd.DataFrame({"City ID": [1, 2, 3, 4], "name": ["Berkeley", "Chicago", "San Francisco", "New York City"]})
city_df

| | City ID | name |
|---|---|---|
| 0 | 1 | Berkeley |
| 1 | 2 | Chicago |
| 2 | 3 | San Francisco |
| 3 | 4 | New York City |


In [12]:
state_df = pd.DataFrame({"State ID": [1, 2, 3], "name": ["California", "Illinois", "New York"], "2-letter abbr": ["CA", "IL", "NY"]})
state_df

| | State ID | name | 2-letter abbr |
|---|---|---|---|
| 0 | 1 | California | CA |
| 1 | 2 | Illinois | IL |
| 2 | 3 | New York | NY |


In [13]:
# In this example, we assume a separate table ("join table") with the data about the relationships;
# this would always be the case for many-to-many relationships; 
# 1-to-many relationships, like we have here, could also be stored differently
state_city_links_df = pd.DataFrame({"State ID": [1, 1, 2, 3], "City ID": [1, 3, 2, 4]})
state_city_links_df

| | State ID | City ID |
|---|---|---|
| 0 | 1 | 1 |
| 1 | 1 | 3 |
| 2 | 2 | 2 |
| 3 | 3 | 4 |
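Each row of this join table pairs a state with a city, and so corresponds to one "IS_IN" relationship from a City to a State, as declared in the Schema earlier. The sketch below spells that out in plain Python, without touching the database; the dictionaries simply mirror the three data frames above, and the variable names are chosen only for illustration.

```python
# Illustration only: plain-Python data mirroring the three data frames above
cities = {1: "Berkeley", 2: "Chicago", 3: "San Francisco", 4: "New York City"}
states = {1: "California", 2: "Illinois", 3: "New York"}
links = [(1, 1), (1, 3), (2, 2), (3, 4)]    # (State ID, City ID) pairs

# Each row of the join table becomes one City -[IS_IN]-> State relationship
relationships = [(cities[city_id], "IS_IN", states[state_id])
                 for state_id, city_id in links]

for city, rel, state in relationships:
    print(f"({city})-[{rel}]->({state})")
```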


#### Note: those data frames would often be read in from CSV files, with an instruction such as
#### `city_df = pd.read_csv(r'D:\my_path\city_data_file.csv', encoding="ISO-8859-1")`
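To try out CSV reading without a file on disk, the following sketch feeds `pd.read_csv` an in-memory buffer instead of a path. The CSV text is made-up sample data matching `city_df` above, and the name `city_df_from_csv` is used only for this illustration.

```python
import io
import pandas as pd

# Stand-in for the contents of a CSV file (sample data matching city_df above)
csv_text = """City ID,name
1,Berkeley
2,Chicago
3,San Francisco
4,New York City
"""

# pd.read_csv accepts a file path or any file-like object
city_df_from_csv = pd.read_csv(io.StringIO(csv_text))
print(city_df_from_csv)
```

When reading from an actual Windows path, a raw string (`r'...'`) avoids having backslashes interpreted as escape sequences.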

# Ingesting the Data Frames into the graph database is quite easy:

In [14]:
NeoSchema.import_pandas_nodes(df=city_df, class_node="City")

Getting ready to import 4 records...
    FINISHED importing a total of 4 records


[537, 538, 539, 540]

In [15]:
NeoSchema.import_pandas_nodes(df=state_df, class_node="State")

Getting ready to import 3 records...
    FINISHED importing a total of 3 records


[541, 542, 543]

In [16]:
NeoSchema.import_pandas_links(df=state_city_links_df,
                              col_from="City ID", col_to="State ID",                            
                              link_name="IS_IN")

Getting ready to import 4 links...
    FINISHED importing a total of 4 links


[647, 648, 649, 650]

_This is what we have created with our import:_

![Schema](../BrainAnnex/docs/schema_tutorial_2_import.jpg)

#### Notice that the data from the "join table" (with "State ID" and "City ID") used in the import to link up states and cities is now stored as RELATIONSHIPS, natively represented in the graph database.