<div>
<img src=https://www.institutedata.com/wp-content/uploads/2019/10/iod_h_tp_primary_c.svg width="300">
</div>

# Lab 3.1.5 
# *Neo4j and Python*

## Introduction

Neo4j is the most popular graph database. Its free versions include the Desktop (Developer) edition and the Community Server edition, both of which can be driven from Python.

We will begin this lab by working through the tutorials embedded in the Neo4j *start* page to learn about graph database structures and the Cypher query language. We will then see how to integrate a Neo4j database into a Python program.

The Community Server version can be downloaded here: https://neo4j.com/download-center/#releases 


- Go through the *Concepts* tutorial.
- At the end, click *Intro* under the *Keep getting started* heading and go through that tutorial.
- At the end, click *Cypher* under the *Keep getting started* heading and go through that tutorial.
- At the end, click *The Movie Graph* under the *Jump into code* heading and go through that tutorial.


## Driving Neo4j from Python

There are a variety of Python libraries for Neo4j, some of which provide more compact (and simpler) ways of executing commands. To avoid having to learn too many different ways of doing the same thing, however, we will use the official driver, which sends queries written in the Cypher query language to the database.

The ***Neo4j Bolt Driver for Python*** is documented at https://neo4j.com/docs/api/python-driver/current/.

First, install the driver from a terminal:

```
pip install neo4j
```
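To confirm from Python that the installation succeeded, a small check along these lines should work. It is a sketch that relies only on the standard library's `importlib.metadata` (Python 3.8+); the helper name `installed_version` is our own and is not part of the driver.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="neo4j"):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("The neo4j driver is not installed; run: pip install neo4j")
    else:
        print(f"neo4j driver version {found} is installed")
```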

```python
# Establish a connection to a Neo4j graph database using the official Neo4j Python driver
from neo4j import GraphDatabase

# Define the database URI (Uniform Resource Identifier), which specifies the address and port at which the Neo4j database is hosted.
# Bolt is the protocol Neo4j uses to communicate with clients; 7687 is its default port.
uri = "bolt://localhost:7687"
```

```python
# After executing this cell, `driver` holds an instance of the Neo4j database driver.
# The driver is used to create sessions, which run queries and transactions against the database.
# Replace the password below with the one you set for the `neo4j` user.
driver = GraphDatabase.driver(uri, auth=("neo4j", "your_password"))
```
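Hard-coding a password in a notebook makes it easy to leak when the notebook is shared. One common alternative is to read the connection settings from environment variables. The sketch below assumes the variable names `NEO4J_URI`, `NEO4J_USER` and `NEO4J_PASSWORD`; these names are our own convention, not something the driver reads automatically.

```python
import os


def neo4j_settings():
    """Read Neo4j connection settings from (hypothetically named) environment variables.

    Returns a (uri, auth) pair suitable for GraphDatabase.driver(uri, auth=auth).
    """
    uri = os.environ.get("NEO4J_URI", "bolt://localhost:7687")  # default local Bolt address
    user = os.environ.get("NEO4J_USER", "neo4j")  # default user name of a fresh install
    password = os.environ.get("NEO4J_PASSWORD")
    if not password:
        # Fail early with a clear message rather than a later authentication error
        raise RuntimeError("Set the NEO4J_PASSWORD environment variable first")
    return uri, (user, password)
```

With these settings, the driver would be created with `uri, auth = neo4j_settings()` followed by `driver = GraphDatabase.driver(uri, auth=auth)`.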

To execute a query against the database with this driver, we wrap the Cypher query in a function (a *transaction function*) and pass that function to the `execute_read` method of a `session` object (older versions of the driver call this method `read_transaction`). The driver then calls our function, supplying a transaction object, `tx`, as its first argument.
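To make this calling convention concrete without a running database, here is a toy stand-in. `ToySession` and `ToyTransaction` are hypothetical classes written only for illustration: they mimic how `execute_read` calls a transaction function with `tx` first and forwards any extra arguments, but they do not reproduce the real driver (which, for example, also retries the function on transient errors).

```python
class ToyTransaction:
    """Hypothetical stand-in for the driver's transaction: records queries instead of running them."""

    def __init__(self):
        self.queries = []

    def run(self, query, parameters=None, **kwparameters):
        # Merge both ways of passing parameters, as the real tx.run() allows
        params = {**(parameters or {}), **kwparameters}
        self.queries.append((query, params))
        return []  # the real tx.run() returns a Result that yields records


class ToySession:
    """Hypothetical stand-in showing only the calling convention of execute_read."""

    def execute_read(self, transaction_function, *args, **kwargs):
        tx = ToyTransaction()
        # The session creates the transaction and passes it in as the first argument
        return transaction_function(tx, *args, **kwargs)


def show_query(tx, name):
    tx.run("MATCH (a:Person {name: $name}) RETURN a", name=name)
    return tx.queries


print(ToySession().execute_read(show_query, "Tom Hanks"))
# → [('MATCH (a:Person {name: $name}) RETURN a', {'name': 'Tom Hanks'})]
```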

Here is a function that finds all the movies that the requested `Person` acted in:

```python
# Transaction function:
# `tx` is a Neo4j transaction supplied by the driver; `name` is the name of the person.
# tx.run() executes the Cypher query, passing `name` as the query parameter $name.
def print_movies_by(tx, name):
    for record in tx.run("MATCH (a:Person)-[:ACTED_IN]->(anyMovies) "
                         "WHERE a.name = $name "
                         "RETURN anyMovies", name=name):
        print(record["anyMovies"])
```

The Cypher query itself consists of the following components:

- `MATCH (a:Person)-[:ACTED_IN]->(anyMovies)`: matches a pattern in the graph. It looks for a node with the label `Person`, bound to the variable `a`, that is connected by an outgoing `ACTED_IN` relationship to another node, bound to the variable `anyMovies`. Note that `anyMovies` is a variable name, not a label; in the Movie Graph, the nodes it matches carry the `Movie` label.

- `WHERE a.name = $name`: filters the matches, keeping only those in which the `name` property of the `Person` node equals the `$name` parameter supplied by the Python code.

- `RETURN anyMovies`: returns the node bound to `anyMovies` for every match.
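Printing whole nodes is rarely what we want. A variant of the query can return only the properties we need, and the transaction function can return rows instead of printing them. The records should be consumed inside the transaction function, because a result cannot be read after its transaction has ended. The name `get_movie_titles` is our own; the query assumes the Movie Graph's `Movie` label and its `title` and `released` properties (as the Tom Hanks output in this lab shows), and the inline `{name: $name}` filter is equivalent to the `WHERE` clause above.

```python
def get_movie_titles(tx, name):
    """Return (title, released) pairs for every movie the named person acted in, oldest first."""
    query = (
        "MATCH (a:Person {name: $name})-[:ACTED_IN]->(m:Movie) "
        "RETURN m.title AS title, m.released AS released "
        "ORDER BY released"
    )
    # Build the list inside the transaction function so the result is fully consumed here
    return [(record["title"], record["released"]) for record in tx.run(query, name=name)]
```

With the driver created earlier, it would be called as `session.execute_read(get_movie_titles, "Tom Hanks")` inside a `with driver.session() as session:` block; the returned list can then be printed or processed like any other Python list.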

Here is how to use it to list Tom Hanks' movies:

```python
with driver.session() as session:
    session.execute_read(print_movies_by, "Tom Hanks")
```

```
<Node element_id='4:e254e499-a0f3-43b3-a5e0-0b024dbbfe4e:67' labels=frozenset({'Movie'}) properties={'tagline': 'At odds in life... in love on-line.', 'title': "You've Got Mail", 'released': 1998}>
<Node element_id='4:e254e499-a0f3-43b3-a5e0-0b024dbbfe4e:142' labels=frozenset({'Movie'}) properties={'tagline': 'Houston, we have a problem.', 'title': 'Apollo 13', 'released': 1995}>
<Node element_id='4:e254e499-a0f3-43b3-a5e0-0b024dbbfe4e:78' labels=frozenset({'Movie'}) properties={'tagline': 'A story of love, lava and burning desire.', 'title': 'Joe Versus the Volcano', 'released': 1990}>
<Node element_id='4:e254e499-a0f3-43b3-a5e0-0b024dbbfe4e:85' labels=frozenset({'Movie'}) properties={'tagline': 'In every life there comes a time when that thing you dream becomes that thing you do', 'title': 'That Thing You Do', 'released': 1996}>
<Node element_id='4:e254e499-a0f3-43b3-a5e0-0b024dbbfe4e:105' labels=frozenset({'Movie'}) properties={'tagline': 'Everything is connected', 'title': 'Cloud A
```

Code explanation:
- `driver.session()` opens a new session for interacting with the Neo4j database. The `with` statement ensures that the session is closed once the block has finished.
- `session.execute_read()` runs a read transaction within the open session. Read transactions are intended for read-only queries; the driver also retries the transaction function automatically if a transient error occurs.
- `print_movies_by` is the transaction function executed as part of this read transaction. It takes two arguments:
  - `tx`: the transaction object, supplied automatically by the Neo4j driver.
  - `"Tom Hanks"`: the value passed as the `name` argument; `execute_read` forwards any extra arguments to the transaction function.

Clearly, some further wrangling is required to produce neat output. (Read the documentation before you attempt this.) 
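As one possible starting point: the `Node` objects returned by the driver can be read like dictionaries (for example `node["title"]` or `node.get("released")`), so a small formatting helper can turn each node into a single readable line. `format_movie` is a hypothetical helper written for this lab; below it is exercised with a plain dict standing in for a node.

```python
def format_movie(movie):
    """Format a movie node (or any mapping with the same keys) as 'Title (year): tagline'."""
    title = movie.get("title", "<untitled>")
    released = movie.get("released")
    tagline = movie.get("tagline")
    line = f"{title} ({released})" if released is not None else title
    if tagline:
        line += f": {tagline}"
    return line


# A plain dict stands in for a Node here, using properties from the output above
print(format_movie({"title": "Apollo 13", "released": 1995,
                    "tagline": "Houston, we have a problem."}))
# → Apollo 13 (1995): Houston, we have a problem.
```

Inside `print_movies_by`, replacing `print(record["anyMovies"])` with `print(format_movie(record["anyMovies"]))` would apply it to each movie.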

In fact, both the way the Neo4j Bolt Driver is used and the data it returns are unwieldy. This is typical of low-level drivers.
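One way to hide some of that unwieldiness is a thin wrapper that manages the session and returns plain Python dictionaries, using `Record.data()` to convert each record. `run_read` is a hypothetical helper, not part of the driver; recent 5.x versions of the driver also provide `driver.execute_query()`, which serves a similar purpose and is worth reading about in the documentation.

```python
def run_read(driver, query, params=None):
    """Run a read-only Cypher query and return its rows as a list of plain dicts.

    `driver` is a Neo4j driver such as the one created with GraphDatabase.driver().
    """
    def work(tx):
        # Consume the result inside the transaction; Record.data() converts a record to a dict
        return [record.data() for record in tx.run(query, params or {})]

    with driver.session() as session:
        return session.execute_read(work)
```

For example, `run_read(driver, "MATCH (m:Movie) RETURN m.title AS title LIMIT 3")` would return a list of dictionaries, each with a single `"title"` key.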

Try building and running more queries based on the example queries in *The Movie Graph* tutorial.
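As one example of a further query, in the spirit of those in the tutorial, the pattern below follows an `ACTED_IN` relationship out to a movie and another back in to find a person's co-actors. The function name `get_co_actors` is our own and the query is a sketch; `DISTINCT` ensures that each co-actor is returned only once, even if they shared several movies with the person.

```python
def get_co_actors(tx, name):
    """Return the sorted names of everyone who acted in a movie with the named person."""
    query = (
        "MATCH (p:Person {name: $name})-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(co:Person) "
        "RETURN DISTINCT co.name AS name "
        "ORDER BY name"
    )
    # Collect the names inside the transaction function so the result is fully consumed
    return [record["name"] for record in tx.run(query, name=name)]
```

Called as `session.execute_read(get_co_actors, "Tom Hanks")`, it returns a plain list of names.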

## - END -



---

© 2023 Institute of Data