Neo4j Project: Relational vs. Graph Databases

  1. Project Description
  2. Execution
  3. Benchmarking
  4. Team
  5. External Resources

This project represents a single dataset in both the relational and the graph model, runs the same queries against each, and compares their performance. The steps are outlined below:

1. Go to the Stanford Large Network Dataset Collection.

2. Use the Pokec dataset in the Social Networks category.

3. For each user, keep only the 'user_id', 'age', and 'gender' attributes.

4. Load the data into MySQL and Neo4j systems.

5. Author queries in SQL and Cypher for the following tasks and benchmark the performance on both systems:

  • For each user, count his/her friends.
  • For each user, count his/her friends of friends.
  • For each user, count his/her friends that are over 30.
  • For each male user, count how many male and how many female friends he has.
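The four tasks above can be sketched in SQL as follows. This is a minimal illustration and not the repository's actual queries: the table names `users` and `friendships`, their columns, and the coding of gender as 1 = male and 0 = female are assumptions, and Python's built-in `sqlite3` stands in for MySQL so that the sketch runs without a server.

```python
import sqlite3

# Assumed (hypothetical) schema: users(user_id, age, gender) and
# friendships(user_id, friend_id); gender is assumed to be 1 = male, 0 = female.
SQL_QUERIES = {
    "friends": """
        SELECT u.user_id, COUNT(f.friend_id) AS friends
        FROM users u
        LEFT JOIN friendships f ON f.user_id = u.user_id
        GROUP BY u.user_id
        ORDER BY u.user_id
    """,
    # Counts distinct users two hops away, excluding the user themselves.
    "friends_of_friends": """
        SELECT u.user_id, COUNT(DISTINCT f2.friend_id) AS friends_of_friends
        FROM users u
        LEFT JOIN friendships f1 ON f1.user_id = u.user_id
        LEFT JOIN friendships f2 ON f2.user_id = f1.friend_id
                                AND f2.friend_id <> u.user_id
        GROUP BY u.user_id
        ORDER BY u.user_id
    """,
    "friends_over_30": """
        SELECT u.user_id, COUNT(p.user_id) AS friends_over_30
        FROM users u
        LEFT JOIN friendships f ON f.user_id = u.user_id
        LEFT JOIN users p ON p.user_id = f.friend_id AND p.age > 30
        GROUP BY u.user_id
        ORDER BY u.user_id
    """,
    "friends_by_gender": """
        SELECT u.user_id,
               SUM(CASE WHEN p.gender = 1 THEN 1 ELSE 0 END) AS male_friends,
               SUM(CASE WHEN p.gender = 0 THEN 1 ELSE 0 END) AS female_friends
        FROM users u
        LEFT JOIN friendships f ON f.user_id = u.user_id
        LEFT JOIN users p ON p.user_id = f.friend_id
        WHERE u.gender = 1
        GROUP BY u.user_id
        ORDER BY u.user_id
    """,
}


def demo():
    """Run every query on a four-user toy graph and return the result rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, age INTEGER, gender INTEGER);
        CREATE TABLE friendships (user_id INTEGER, friend_id INTEGER);
    """)
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                     [(1, 25, 1), (2, 35, 0), (3, 40, 1), (4, 20, 0)])
    conn.executemany("INSERT INTO friendships VALUES (?, ?)",
                     [(1, 2), (1, 3), (2, 3), (3, 4), (3, 1)])
    return {name: conn.execute(sql).fetchall() for name, sql in SQL_QUERIES.items()}
```

The `LEFT JOIN`s keep users without friends in the output with a count of zero. Whether friends of friends should also exclude direct friends is a design choice the task statement leaves open; this sketch does not exclude them.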

The project was implemented in the context of the course "Big Data Management Systems" taught by Prof. Damianos Chatziantoniou. A detailed description of the assignment can be found here.


1. We assume that MySQL and Neo4j are already installed and configured on the system.

2. Clone this repository:

$ git clone https://github.com/ChryssaNab/BDMS-AUEB.git
$ cd BDMS-AUEB/neo4j_project/

3. Download the dataset:

neo4j_project$ mkdir dataset
neo4j_project$ cd dataset
neo4j_project/dataset$ wget "https://snap.stanford.edu/data/soc-pokec-profiles.txt.gz"
neo4j_project/dataset$ wget "https://snap.stanford.edu/data/soc-pokec-relationships.txt.gz"

4. Execute the convert_tsv_to_csv.py script to convert the files from tab-separated (TSV) format to comma-separated (CSV) format, while retaining only the user_id, age, and gender attributes of the profiles:

 $ python convert_tsv_to_csv.py
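The conversion step might look roughly like the sketch below. It is not the repository's `convert_tsv_to_csv.py`. The function name `tsv_to_csv` is hypothetical, and the column positions (0 for user_id, 3 for gender, 7 for age) and the `null` marker for missing values are assumptions that should be checked against the dataset's readme.

```python
def tsv_to_csv(src, dst, keep_columns=(0, 7, 3), null_token="null"):
    """Copy selected tab-separated columns from src to dst as comma-separated rows.

    keep_columns lists the assumed positions of user_id, age, and gender,
    in output order. Fields equal to null_token are written as empty strings.
    The kept fields are assumed to be plain numbers, so no CSV quoting is needed.
    """
    for line in src:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= max(keep_columns):
            continue  # skip blank or truncated lines
        kept = ["" if fields[i] == null_token else fields[i] for i in keep_columns]
        dst.write(",".join(kept) + "\n")
```

For the real files, `src` could be opened with `gzip.open(path, "rt", encoding="utf-8")` and `dst` with the built-in `open(path, "w")`, so the archive never needs to be fully decompressed on disk.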

5. Execute the write_soc_pokec_mysql.py script to import the data into MySQL:

 $ python write_soc_pokec_mysql.py
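A bulk load of this size is usually done in batches rather than one row at a time. The sketch below shows that idea against any Python DB-API connection. It is an illustration, not the contents of `write_soc_pokec_mysql.py`, and the helper names `parse_user` and `load_rows` are hypothetical.

```python
from itertools import islice


def parse_user(row):
    """Convert a CSV row [user_id, age, gender] to integers; blanks become None (NULL)."""
    return tuple(int(value) if value != "" else None for value in row)


def load_rows(conn, insert_sql, rows, batch_size=10_000):
    """Insert rows through a DB-API connection in batches and return the row count.

    insert_sql must use the placeholder style of the driver in use.
    """
    cursor = conn.cursor()
    rows = iter(rows)
    total = 0
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            break
        cursor.executemany(insert_sql, batch)
        conn.commit()  # commit per batch so each transaction stays small
        total += len(batch)
    return total
```

With a MySQL driver such as `mysql-connector-python`, the statement would use `%s` placeholders, for example `INSERT INTO users VALUES (%s, %s, %s)`, where `users` is an assumed table name.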

6. Run the following command in the command line to import the data into Neo4j:

neo4j_home$ bin/neo4j-admin import \
    --id-type INTEGER \
    --nodes:User ../import/soc-pokec-profiles-new.csv \
    --relationships:HAS_FRIEND ../import/soc-pokec-relationships-new.csv
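The `neo4j-admin import` tool reads the meaning of each column from a header row: `:ID` marks the node identifier, and `:START_ID` and `:END_ID` mark the two ends of a relationship. The sketch below prepends such headers to header-less CSV data. The property names `user_id`, `age`, and `gender`, the `:int` typing, and the helper `with_header` are assumptions; the repository's CSV files may already contain suitable headers.

```python
# Assumed header rows for the bulk importer; the property names are illustrative.
NODE_HEADER = "user_id:ID,age:int,gender:int"
REL_HEADER = ":START_ID,:END_ID"


def with_header(header, src, dst):
    """Write the header line to dst, then copy every line of src unchanged."""
    dst.write(header + "\n")
    for line in src:
        dst.write(line)
```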

Two examples of the Neo4j graph model are depicted below:

To access Neo4j from a regular browser window, open http://localhost:7474 and log in with the username neo4j and the password you set for your installation.

7. Execute the run_queries_mysql.py script to run the queries in MySQL:

 $ python run_queries_mysql.py

8. Execute the run_queries_neo4j.py script to run the queries in Neo4j:

 $ python run_queries_neo4j.py
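For orientation, the four tasks could be written in Cypher roughly as below and sent through the official `neo4j` Python driver. These are not necessarily the queries in `run_queries_neo4j.py`. The property names, the coding of gender as 1 = male and 0 = female, the helper `run_cypher`, and the placeholder password are assumptions. The `User` label and the `HAS_FRIEND` relationship type come from the import command.

```python
# Assumed property names: user_id, age, gender (1 = male, 0 = female).
CYPHER_QUERIES = {
    "friends": """
        MATCH (u:User)-[:HAS_FRIEND]->(f:User)
        RETURN u.user_id AS user_id, count(f) AS friends
    """,
    "friends_of_friends": """
        MATCH (u:User)-[:HAS_FRIEND]->(:User)-[:HAS_FRIEND]->(fof:User)
        WHERE fof <> u
        RETURN u.user_id AS user_id, count(DISTINCT fof) AS friends_of_friends
    """,
    "friends_over_30": """
        MATCH (u:User)-[:HAS_FRIEND]->(f:User)
        WHERE f.age > 30
        RETURN u.user_id AS user_id, count(f) AS friends_over_30
    """,
    "friends_by_gender": """
        MATCH (u:User {gender: 1})-[:HAS_FRIEND]->(f:User)
        RETURN u.user_id AS user_id,
               sum(CASE WHEN f.gender = 1 THEN 1 ELSE 0 END) AS male_friends,
               sum(CASE WHEN f.gender = 0 THEN 1 ELSE 0 END) AS female_friends
    """,
}


def run_cypher(query, uri="bolt://localhost:7687", user="neo4j", password="your-password"):
    """Run one query and return its records as a list of dictionaries."""
    # Imported here so the query definitions can be used without the driver installed.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            return [record.data() for record in session.run(query)]
    finally:
        driver.close()
```

Unlike a SQL `LEFT JOIN`, a plain `MATCH` drops users for whom the pattern has no match, so users with zero friends do not appear in these results. `OPTIONAL MATCH` would keep them.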

Each query is executed for result sizes of 100, 1,000, 10,000, and 100,000, and finally on the entire dataset, so that the performance of MySQL and Neo4j can be compared across data sizes.
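A timing loop over these result sizes can be sketched as follows. Restricting the result size with a `LIMIT` clause is an assumption about how the scripts work; `LIMIT` is valid in both SQL and Cypher. The helper `time_query` and the `execute` callback are hypothetical names.

```python
import time

# None stands for "no limit", i.e. the entire dataset.
RESULT_SIZES = (100, 1_000, 10_000, 100_000, None)


def time_query(execute, query, sizes=RESULT_SIZES):
    """Time query at each result size.

    execute is any callable that takes a query string and returns the fetched rows,
    for example a thin wrapper around a MySQL cursor or a Neo4j session.
    Returns {size: (row_count, elapsed_seconds)}.
    """
    timings = {}
    for size in sizes:
        limited = query if size is None else f"{query} LIMIT {size}"
        start = time.perf_counter()
        rows = execute(limited)
        elapsed = time.perf_counter() - start
        timings[size] = (len(rows), elapsed)
    return timings
```

For aggregate queries, a `LIMIT` on the output does not necessarily reduce the work the engine does, so timings at small sizes may still reflect a scan of the full data.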