<a href="https://colab.research.google.com/github/SwapnaKasula/GenerativeAI/blob/master/Knowledge_Graph/KnowledgeGraph_WithNeo4j.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Hands-On: Working with Neo4j

In this hands-on session we will use a free Neo4j Sandbox database and explore a very basic movies dataset to better understand the property graph data model and the Cypher query language.

For a more comprehensive guide to Cypher, see the following resources:

* [Neo4j Cheat Sheet](https://quickref.me/neo4j)
* [Cypher Reference Card](https://neo4j.com/docs/cypher-cheat-sheet/5/auradb-enterprise/)

## Create a Neo4j Sandbox database instance

To create an instance, go to this [link](https://sandbox.neo4j.com/), log in, and click "New Project". Then select the Movies graph and click "Create".

NOTE: This instance will be read-only; we won't be able to add/edit data


## Connect to the database

To connect to the instance we need the Bolt URL, the username, and the password. These are listed under the "Connection Details" tab.

In [None]:
# Values copied from the sandbox's "Connection Details" tab; yours will differ.
bolt_url = "bolt://98.80.182.68:7687"
username = "neo4j"
pwd = "amperages-soldiers-warranties"
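
Hard-coding the password in a notebook makes it easy to leak when the notebook is shared. A minimal alternative to the cell above, assuming you export the values as environment variables before starting the notebook (the names `NEO4J_URI`, `NEO4J_USERNAME` and `NEO4J_PASSWORD` are our own choice, not something the sandbox sets for you), is to read them with `os.environ`:

```python
import os

# Hypothetical environment variable names: set them yourself before running this cell.
bolt_url = os.environ.get("NEO4J_URI", "bolt://localhost:7687")
username = os.environ.get("NEO4J_USERNAME", "neo4j")

# No default for the password: report early if it has not been provided.
pwd = os.environ.get("NEO4J_PASSWORD")
if pwd is None:
    print("NEO4J_PASSWORD is not set; copy it from the sandbox's Connection Details tab.")
```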

Next, we install the py2neo library and use it to connect to the database instance.

In [None]:
! pip install py2neo

Collecting py2neo
  Downloading py2neo-2021.2.4-py2.py3-none-any.whl.metadata (9.9 kB)
Collecting interchange~=2021.0.4 (from py2neo)
  Downloading interchange-2021.0.4-py2.py3-none-any.whl.metadata (1.9 kB)
Collecting monotonic (from py2neo)
  Downloading monotonic-1.6-py2.py3-none-any.whl.metadata (1.5 kB)
Collecting pansi>=2020.7.3 (from py2neo)
  Downloading pansi-2024.11.0-py2.py3-none-any.whl.metadata (3.1 kB)
Downloading py2neo-2021.2.4-py2.py3-none-any.whl (177 kB)
Downloading interchange-2021.0.4-py2.py3-none-any.whl (28 kB)
Downloading pansi-2024.11.0-py2.py3-none-any.whl (26 kB)
Downloading monotonic-1.6-py2.py3-none-any.whl (8.2 kB)
Installing collected packages: monotonic, pansi, interchange, py2neo
Successfully installed interchange-2021.0.4 monotonic-1.6 pansi-2024.11.0 py2neo-2021.2.4


In [None]:
from py2neo import Graph
conn = Graph(bolt_url, auth=(username, pwd))
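
Creating the `Graph` object does not by itself prove that the credentials work, and sandbox instances are temporary. A small sketch of a connection check, assuming a py2neo `Graph` like `conn`; `can_connect` is a hypothetical helper name, and it simply runs a query that touches no data:

```python
def can_connect(graph):
    """Return True if a trivial read query succeeds on `graph` (e.g. a py2neo Graph)."""
    try:
        # RETURN 1 reads no nodes, so it only exercises the connection and credentials.
        return graph.query("RETURN 1 AS ok").data() == [{"ok": 1}]
    except Exception as exc:  # network and authentication failures raise different errors
        print(f"Connection failed: {exc}")
        return False

# Usage against the sandbox:
# can_connect(conn)
```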

## Investigate the graph schema

### Get all labels and their node count

In [None]:
query = """MATCH (n) RETURN distinct labels(n), count(n)"""
result = conn.query(query)
result



| labels(n) | count(n) |
|-----------|----------|
| ['Movie'] | 38 |
| ['Person'] | 133 |
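
The displayed table is easy to read, but for further processing a plain dictionary is handier. A minimal sketch, assuming you call `.data()` on the result as the later cells do; because a node can carry several labels, the label list is joined with `:` (our own convention). `label_counts` is a hypothetical helper, and the sample rows reuse the counts shown above:

```python
def label_counts(rows):
    """Map each label combination to its node count, e.g. {'Movie': 38}."""
    return {":".join(row["labels(n)"]): row["count(n)"] for row in rows}

# Rows in the shape returned by conn.query(query).data(), using the counts shown above.
sample_rows = [
    {"labels(n)": ["Movie"], "count(n)": 38},
    {"labels(n)": ["Person"], "count(n)": 133},
]
counts = label_counts(sample_rows)
print(counts)  # {'Movie': 38, 'Person': 133}
```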


### Get outgoing relations of "Person" nodes

In [None]:
outgoing_relations_query = """MATCH (:Person)-[r]->(n) RETURN distinct type(r), labels(n)"""
result = conn.query(outgoing_relations_query).data()
result

[{'type(r)': 'ACTED_IN', 'labels(n)': ['Movie']},
 {'type(r)': 'DIRECTED', 'labels(n)': ['Movie']},
 {'type(r)': 'PRODUCED', 'labels(n)': ['Movie']},
 {'type(r)': 'WROTE', 'labels(n)': ['Movie']},
 {'type(r)': 'FOLLOWS', 'labels(n)': ['Person']},
 {'type(r)': 'REVIEWED', 'labels(n)': ['Movie']}]
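
Each row pairs a relationship type with the labels of the node it points to, which is exactly the information in a Cypher path pattern. A small sketch that renders the rows above in that notation; `describe_outgoing` is a hypothetical helper, and the sample rows are copied from the output:

```python
def describe_outgoing(source_label, rows):
    """Render rows of (type(r), labels(n)) as Cypher-style path patterns."""
    patterns = []
    for row in rows:
        target = ":".join(row["labels(n)"])  # join in case the target has several labels
        patterns.append(f"(:{source_label})-[:{row['type(r)']}]->(:{target})")
    return patterns

# Rows copied from the output above.
outgoing_rows = [
    {"type(r)": "ACTED_IN", "labels(n)": ["Movie"]},
    {"type(r)": "DIRECTED", "labels(n)": ["Movie"]},
    {"type(r)": "PRODUCED", "labels(n)": ["Movie"]},
    {"type(r)": "WROTE", "labels(n)": ["Movie"]},
    {"type(r)": "FOLLOWS", "labels(n)": ["Person"]},
    {"type(r)": "REVIEWED", "labels(n)": ["Movie"]},
]
for pattern in describe_outgoing("Person", outgoing_rows):
    print(pattern)
```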

### *Question*: How would you identify the incoming relations of "Movie" nodes?

In [None]:
# insert your code here
query = """MATCH (:Movie)<-[r]-(n) RETURN distinct type(r), labels(n)"""
result = conn.query(query).data()
result

[{'type(r)': 'ACTED_IN', 'labels(n)': ['Person']},
 {'type(r)': 'PRODUCED', 'labels(n)': ['Person']},
 {'type(r)': 'DIRECTED', 'labels(n)': ['Person']},
 {'type(r)': 'WROTE', 'labels(n)': ['Person']},
 {'type(r)': 'REVIEWED', 'labels(n)': ['Person']}]
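
The same query works for any label, but in classic Cypher a label cannot be supplied as an ordinary query parameter (`$label`), so a reusable helper has to splice the label into the query text. A minimal sketch that first checks the label is a plain identifier, so arbitrary Cypher cannot be injected; the helper name and the regular expression are our own conservative choices (labels that would need backtick quoting are rejected):

```python
import re

# Conservative check: letters, digits and underscores, not starting with a digit.
_LABEL_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def incoming_relations_query(label):
    """Build the 'incoming relationships' query for an arbitrary node label."""
    if not _LABEL_RE.fullmatch(label):
        raise ValueError(f"Unsupported label: {label!r}")
    return f"MATCH (:{label})<-[r]-(n) RETURN distinct type(r), labels(n)"

print(incoming_relations_query("Movie"))
# Usage against the sandbox:
# conn.query(incoming_relations_query("Person")).data()
```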

### Get node properties per label

In [None]:
query = """call db.schema.nodeTypeProperties()"""
result = conn.query(query).data()
result



[{'nodeType': ':`Movie`',
  'nodeLabels': ['Movie'],
  'propertyName': 'title',
  'propertyTypes': ['String'],
  'mandatory': True},
 {'nodeType': ':`Movie`',
  'nodeLabels': ['Movie'],
  'propertyName': 'released',
  'propertyTypes': ['Long'],
  'mandatory': True},
 {'nodeType': ':`Movie`',
  'nodeLabels': ['Movie'],
  'propertyName': 'tagline',
  'propertyTypes': ['String'],
  'mandatory': False},
 {'nodeType': ':`Person`',
  'nodeLabels': ['Person'],
  'propertyName': 'name',
  'propertyTypes': ['String'],
  'mandatory': True},
 {'nodeType': ':`Person`',
  'nodeLabels': ['Person'],
  'propertyName': 'born',
  'propertyTypes': ['Long'],
  'mandatory': False}]
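
The procedure returns one row per (label, property) pair. Grouping the rows by label gives a compact schema overview; as we understand it, `mandatory` reports whether every stored node of that type has the property, not an enforced constraint. A sketch using the rows above (the `nodeType` key is omitted); `properties_by_label` is a hypothetical helper:

```python
from collections import defaultdict

def properties_by_label(rows):
    """Group db.schema.nodeTypeProperties() rows into {label: {property: info}}."""
    schema = defaultdict(dict)
    for row in rows:
        label = ":".join(row["nodeLabels"])  # join in case a node type has several labels
        schema[label][row["propertyName"]] = {
            "types": row["propertyTypes"],
            "mandatory": row["mandatory"],
        }
    return dict(schema)

# A subset of the keys from the output above.
sample_rows = [
    {"nodeLabels": ["Movie"], "propertyName": "title", "propertyTypes": ["String"], "mandatory": True},
    {"nodeLabels": ["Movie"], "propertyName": "released", "propertyTypes": ["Long"], "mandatory": True},
    {"nodeLabels": ["Movie"], "propertyName": "tagline", "propertyTypes": ["String"], "mandatory": False},
    {"nodeLabels": ["Person"], "propertyName": "name", "propertyTypes": ["String"], "mandatory": True},
    {"nodeLabels": ["Person"], "propertyName": "born", "propertyTypes": ["Long"], "mandatory": False},
]

schema = properties_by_label(sample_rows)
required = {
    label: [name for name, info in props.items() if info["mandatory"]]
    for label, props in schema.items()
}
print(required)  # {'Movie': ['title', 'released'], 'Person': ['name']}
```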

## Querying the data

### Find all the movies Tom Hanks acted in

In [None]:
query = """MATCH (n:Person {name:"Tom Hanks"})-[r:ACTED_IN]->(m:Movie) RETURN m.title"""
result = conn.query(query).data()
result

[{'m.title': 'Apollo 13'},
 {'m.title': "You've Got Mail"},
 {'m.title': 'A League of Their Own'},
 {'m.title': 'Joe Versus the Volcano'},
 {'m.title': 'That Thing You Do'},
 {'m.title': 'The Da Vinci Code'},
 {'m.title': 'Cloud Atlas'},
 {'m.title': 'Cast Away'},
 {'m.title': 'The Green Mile'},
 {'m.title': 'Sleepless in Seattle'},
 {'m.title': 'The Polar Express'},
 {'m.title': "Charlie Wilson's War"}]
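
Rather than writing the actor's name into the query string, it can be passed as a query parameter (`$name`), which avoids quoting problems and lets the same query serve any person. A minimal sketch, assuming py2neo's `Graph.query` accepts parameters as keyword arguments; `movies_acted_in` is a hypothetical helper, and the `ORDER BY` is added only to make the result order stable:

```python
def movies_acted_in(graph, actor_name):
    """Return the sorted titles of the movies `actor_name` acted in."""
    query = """
    MATCH (:Person {name: $name})-[:ACTED_IN]->(m:Movie)
    RETURN m.title AS title
    ORDER BY title
    """
    # The name travels as a parameter instead of being pasted into the query text.
    return [row["title"] for row in graph.query(query, name=actor_name).data()]

# Usage against the sandbox:
# movies_acted_in(conn, "Tom Hanks")
```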

### Find all the movies Tom Hanks acted in AND directed

In [None]:
query = """MATCH (n:Person {name:"Tom Hanks"})-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(n) RETURN m.title"""
result = conn.query(query).data()
result

[{'m.title': 'That Thing You Do'}]
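
The key detail is that `n` appears twice in the pattern, so the `ACTED_IN` and `DIRECTED` relationships must start at the same person node. In set terms this is an intersection, which a short plain-Python sketch can mimic over hand-written toy edges (a small illustrative subset, not a dump of the database; `acted_and_directed` is a hypothetical helper):

```python
def acted_and_directed(edges, person):
    """Titles where `person` has both an ACTED_IN and a DIRECTED edge.

    Mirrors (n)-[:ACTED_IN]->(m)<-[:DIRECTED]-(n): reusing n forces both
    relationships to start at the same person.
    """
    acted = {movie for who, rel, movie in edges if who == person and rel == "ACTED_IN"}
    directed = {movie for who, rel, movie in edges if who == person and rel == "DIRECTED"}
    return sorted(acted & directed)

# Hand-written toy edges: (person, relationship type, movie).
toy_edges = [
    ("Tom Hanks", "ACTED_IN", "That Thing You Do"),
    ("Tom Hanks", "DIRECTED", "That Thing You Do"),
    ("Tom Hanks", "ACTED_IN", "Cast Away"),
    ("Robert Zemeckis", "DIRECTED", "Cast Away"),
]
print(acted_and_directed(toy_edges, "Tom Hanks"))  # ['That Thing You Do']
```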

### Find the people who have not directed a movie

In [None]:
query = """MATCH (n:Person) WHERE NOT (n)-[:DIRECTED]->() return n.name"""
result = conn.query(query).data()
result

[{'n.name': 'Keanu Reeves'},
 {'n.name': 'Carrie-Anne Moss'},
 {'n.name': 'Laurence Fishburne'},
 {'n.name': 'Hugo Weaving'},
 {'n.name': 'Joel Silver'},
 {'n.name': 'Emil Eifrem'},
 {'n.name': 'Charlize Theron'},
 {'n.name': 'Al Pacino'},
 {'n.name': 'Tom Cruise'},
 {'n.name': 'Jack Nicholson'},
 {'n.name': 'Demi Moore'},
 {'n.name': 'Kevin Bacon'},
 {'n.name': 'Kiefer Sutherland'},
 {'n.name': 'Noah Wyle'},
 {'n.name': 'Cuba Gooding Jr.'},
 {'n.name': 'Kevin Pollak'},
 {'n.name': 'J.T. Walsh'},
 {'n.name': 'Christopher Guest'},
 {'n.name': 'Aaron Sorkin'},
 {'n.name': 'Kelly McGillis'},
 {'n.name': 'Val Kilmer'},
 {'n.name': 'Anthony Edwards'},
 {'n.name': 'Tom Skerritt'},
 {'n.name': 'Meg Ryan'},
 {'n.name': 'Jim Cash'},
 {'n.name': 'Renee Zellweger'},
 {'n.name': 'Kelly Preston'},
 {'n.name': "Jerry O'Connell"},
 {'n.name': 'Jay Mohr'},
 {'n.name': 'Bonnie Hunt'},
 {'n.name': 'Regina King'},
 {'n.name': 'Jonathan Lipnicki'},
 {'n.name': 'River Phoenix'},
 {'n.name': 'Corey Feldman'},
 ...]
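
When a query returns many rows, one option is to sort them and fetch them in pages with `SKIP` and `LIMIT`, which accept query parameters. A minimal sketch, assuming the base query ends with a `RETURN` clause and `order_by` names one of the returned expressions; `fetch_in_pages` and the default page size are our own choices:

```python
def fetch_in_pages(graph, base_query, order_by, page_size=25, **params):
    """Run `base_query` page by page and return all rows as a list of dicts."""
    # A stable ORDER BY is needed so consecutive pages neither overlap nor skip rows.
    cypher = f"{base_query} ORDER BY {order_by} SKIP $skip LIMIT $limit"
    rows, skip = [], 0
    while True:
        page = graph.query(cypher, skip=skip, limit=page_size, **params).data()
        rows.extend(page)
        if len(page) < page_size:  # a short (or empty) page means we reached the end
            return rows
        skip += page_size

# Usage against the sandbox:
# non_directors = fetch_in_pages(
#     conn,
#     "MATCH (n:Person) WHERE NOT (n)-[:DIRECTED]->() RETURN n.name",
#     order_by="n.name",
# )
```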

### *Question*: How would you find the movies that have been reviewed?

In [None]:
# insert your code here
query = """MATCH (n:Movie) WHERE ()-[:REVIEWED]->(n) return n.title"""
result = conn.query(query).data()
result

[{'n.title': 'Jerry Maguire'},
 {'n.title': 'The Replacements'},
 {'n.title': 'The Birdcage'},
 {'n.title': 'Unforgiven'},
 {'n.title': 'Cloud Atlas'},
 {'n.title': 'The Da Vinci Code'}]
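
Putting the pattern in `WHERE` returns each movie once. If you match the pattern directly instead, e.g. `MATCH ()-[:REVIEWED]->(n:Movie) RETURN n.title`, Cypher returns one row per matching relationship, so a movie with several reviews appears several times unless you add `DISTINCT`. The difference can be mimicked in plain Python over hand-written toy review edges (hypothetical data, not taken from the sandbox):

```python
# Hypothetical (reviewer, movie) pairs standing in for REVIEWED relationships.
toy_reviews = [
    ("Reviewer A", "The Replacements"),
    ("Reviewer B", "The Replacements"),
    ("Reviewer A", "Unforgiven"),
]

# One row per matched relationship, like MATCH ()-[:REVIEWED]->(n:Movie) RETURN n.title
per_match = [movie for _, movie in toy_reviews]

# One row per movie, like adding DISTINCT (dict.fromkeys keeps first-seen order).
distinct = list(dict.fromkeys(per_match))

print(per_match)  # ['The Replacements', 'The Replacements', 'Unforgiven']
print(distinct)   # ['The Replacements', 'Unforgiven']
```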