Bite-Sized Neo4j for Data Scientists

Written by: Dr. Clair J. Sullivan, Data Science Advocate, Neo4j


Twitter: @CJLovesData1

Last updated: April 13, 2022

All notebooks can be found in notebooks/. Some videos are based strictly on Cypher queries, which can be found in cypher/.


Stay tuned to the Neo4j YouTube channel for new episodes coming soon!


The notebooks in this repository are not meant to stand alone and thus are not commented. They accompany the videos, so you are encouraged to watch the videos first and then consult the notebooks should you wish to look at the actual code in depth.


Find this video series as its own page on the Neo4j website!

Complete YouTube playlist of full series

Part 1: Connect from Jupyter to a Neo4j Sandbox

Part 2: Using the py2neo Python Driver

Part 3: Using the Neo4j Python Driver

Part 4: Basic Cypher Queries (and with Google Colab)

  • This video uses a Google Colab notebook, which can be found here

Part 5: Populating the Database from Pandas

  • This video refers to a YouTube video on how to create efficient Cypher queries, which is linked in the references below.

Part 6: Populating the Database with LOAD CSV

Part 7: Populating the Database with the neo4j-admin tool

  • This video works from the command line using Docker. The shell commands are provided in GitHub gists, which can be found here.
  • The data for this part can be found in data/ (the files are got-s1-nodes.csv and got-s1-edges.csv).

Part 8: Populating the Database from a JSON file

  • This video references a JSON file I created for my NODES 2021 tutorial, "Creating a Knowledge Graph with Neo4j: A Simple Machine Learning Approach."

Part 9: Cypher Queries 2

Part 10: Creating In-Memory Graphs with Cypher Projections

Part 11: Import RDF Data from Wikidata

  • To query Wikidata, it is helpful to know how to use SPARQL. The query builder that I showed (which has several great example queries) can be found here. Wikidata also provides a good SPARQL tutorial.
  • This video shows the use of Neosemantics for importing the RDF data. See below in the References for docs on how to use it.
  • This video also very quickly demonstrates Neo4j Bloom for visualization and queries. For an in-depth look at how to use Bloom, see this video.

Part 12: Creating In-Memory Graphs with Native Projections

  • This is the sister video for Part 10, which explored the other method for creating in-memory graphs.

Part 13: Calculating Centrality

Part 14: Community Detection with the Louvain Method

Part 15: Community Detection via Weakly Connected Components

Part 16: Using Strongly Connected Components to Detect Communities

Part 17: Creating FastRP Graph Embeddings

Part 18: Putting Graph Embeddings into a Machine Learning Model

  • This video moves quickly! It will be important to read this blog post, particularly for understanding how to get the embeddings into a format for the machine learning model.

Part 19: Starting with a SQL table...

  • This video is the start of a series looking at why we might want to go from SQL to a graph database
  • It is based on the graph data that can be found here
  • I use PostgreSQL for my demonstrations, but you can use your SQL of choice
  • All queries to populate your database are in ./sql_queries/part19

Part 20: ...And compare it to a graph... (2/n)

  • This video builds off of Part 19, using the same data imported into Neo4j
  • To create the CSV files used for this graph, I exported each of the tables in Part 19 directly from Postgres via pgAdmin
    • I made some tweaks of the headers to get them into Neo4j via LOAD CSV easily
    • The data files can be found in ./data

Part 21: An example of when querying a graph can be easier than SQL (3/n)

  • This video builds off of Parts 19 and 20 of this series
  • If you do not already have a Neo4j database populated with this data, follow the instructions in Part 20 or run the script ./cypher_queries/part20.cql to populate the database

Part 22: A side-by-side calculation of degree using SQL and Neo4j (4/n)

  • This video builds off of Parts 19-21 of this series
  • If you do not already have a SQL database populated with this data, use the queries in ./sql_queries/part19/
  • If you do not already have a Neo4j database populated with this data, follow the instructions in Part 20 or run the script ./cypher_queries/part20.cql to populate the database
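
As a rough illustration of the kind of side-by-side degree calculation this part discusses, here is a minimal sketch. It is not the code from the video: the edge list, the table and column names, and the `name` property in the Cypher string are all hypothetical, and it uses Python's built-in SQLite rather than PostgreSQL so that it runs without any setup. The Cypher query is shown only as a string for comparison and is not executed.

```python
import sqlite3

# Hypothetical edge list for illustration; the series uses its own data set.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]


def degree_via_sql(edges):
    """Return {node: degree}, treating every edge as undirected."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edges (source TEXT, target TEXT)")
    conn.executemany("INSERT INTO edges VALUES (?, ?)", edges)
    # Stack both endpoints of every edge into one column, then count
    # how often each node appears.
    rows = conn.execute(
        """
        SELECT node, COUNT(*) AS degree
        FROM (
            SELECT source AS node FROM edges
            UNION ALL
            SELECT target AS node FROM edges
        ) AS endpoints
        GROUP BY node
        ORDER BY node
        """
    ).fetchall()
    conn.close()
    return dict(rows)


# A comparable Cypher query, assuming nodes carry a hypothetical `name` property.
CYPHER_DEGREE = (
    "MATCH (n)-[r]-() "
    "RETURN n.name AS node, count(r) AS degree "
    "ORDER BY node"
)

degrees = degree_via_sql(EDGES)
print(degrees)
```

In the SQL version the work is in unioning the two endpoint columns before grouping, while the Cypher pattern expresses "any relationship touching this node" directly.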

Part 23: PageRank done two ways (5/n)

  • This video builds off of Parts 19-22 of this series
  • We will be using a very simplistic graph for this demonstration
  • The PageRank SQL query was taken from this Stack Overflow post, which was originally written for T-SQL and has been modified in this repo to work in PostgreSQL
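
For readers who want to see the idea behind PageRank before comparing the SQL and Neo4j versions, here is a minimal sketch of the textbook power-iteration formulation in plain Python. It is neither the SQL query nor the Neo4j implementation used in the video; the function name, the example edges, and the default parameters are assumptions made for illustration, and scores from a library may be scaled differently.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration over a directed edge list."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly
        # over all nodes so that the scores keep summing to 1.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        # Every node passes an equal share of its rank to each node it links to.
        for node in nodes:
            for target in out_links[node]:
                new_rank[target] += damping * rank[node] / len(out_links[node])
        rank = new_rank
    return rank


# Hypothetical four-node graph: a cycle a -> b -> c -> a, plus d -> a.
SIMPLE_EDGES = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")]
scores = pagerank(SIMPLE_EDGES)
for node, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{node}: {score:.4f}")
```

Node `d` has no incoming links, so it ends up with only the baseline score, while `a` collects rank from both `c` and `d`.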

Part 24: Why graphs? (6/6)

  • This video builds off of Parts 19-23 of this series
  • This is the final video in the mini series-within-a-series for the SQL vs. Neo4j comparisons

Part 25: Creating a graph for a Kaggle competition

Part 26: Creating a graph model of the Kaggle competition (2/n)

Part 27: Node similarity of Kaggle competition graph (3/n)

Part 28: Using KNN to identify similar items of Kaggle competition graph (4/n)

  • This video is based off of Parts 25-27
  • If you need a refresher on how to create an in-memory graph projection as is done in this video, please consult Part 12
  • In this video we will do some very basic feature engineering to explore the K-Nearest Neighbors for each article of clothing to obtain similar articles
  • (The next video will also do KNN, but using some much more sophisticated features!)
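
To make the K-Nearest Neighbors idea concrete, here is a minimal brute-force sketch in plain Python using cosine similarity. It is not the procedure used in the video, which runs inside Neo4j; the article names, the three-element feature vectors, and the function names are all hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_neighbors(features, k=2):
    """Return {item: [(neighbor, similarity), ...]} holding the k most similar items."""
    result = {}
    for item, vector in features.items():
        # Compare this item against every other item (brute force).
        scored = [
            (other, cosine_similarity(vector, other_vector))
            for other, other_vector in features.items()
            if other != item
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        result[item] = scored[:k]
    return result


# Hypothetical feature vectors for a few articles of clothing.
ARTICLES = {
    "tshirt_red": [1.0, 0.0, 1.0],
    "tshirt_blue": [1.0, 0.0, 0.9],
    "jeans": [0.0, 1.0, 0.2],
    "shorts": [0.1, 1.0, 0.0],
}

neighbors = top_k_neighbors(ARTICLES, k=1)
for article, matches in neighbors.items():
    print(article, "->", matches[0][0])
```

Comparing every pair like this scales quadratically with the number of items, so it is only practical for small examples.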

Part 29: Using KNN with more sophisticated feature vectors (5/n)

  • This video is based off of Parts 25-28

Part 30: Introducing GDS 2.0!

  • This video just scratches the surface of the new offerings within GDS 2.0, focusing on the new GDS Python Client
