
Neo4j Project: Relational vs. Graph Databases

In this project, a social-network dataset is represented in both the relational and the graph model. The same set of queries is executed on each system and their performance is compared. The steps are outlined below:

1. Go to the Stanford Large Network Dataset Collection (SNAP).

2. Use the Pokec dataset from the Social Networks category.

3. For each user, keep only the 'user_id', 'age', and 'gender' attributes.

4. Load the data into both MySQL and Neo4j.

5. Write queries in SQL and Cypher for the following tasks and benchmark their performance on both systems:

For each user, count his/her friends.
For each user, count his/her friends of friends.
For each user, count his/her friends that are over 30.
For each male user, count how many male and female friends he has.
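Before writing SQL or Cypher, it can help to pin down what each task should return. The following is a minimal pure-Python sketch that computes all four counts on a small in-memory sample, so that query results from either system can be cross-checked. It rests on assumptions that are not stated in the project: relationships are treated as directed (user -> friend), gender is encoded as 1 = male and 0 = female, and "friends of friends" means distinct users two hops away, excluding the user themselves. The function name `reference_counts` is hypothetical.

```python
from collections import defaultdict

# ASSUMPTIONS (not taken from the project): relationships are directed
# (user -> friend); gender 1 = male, 0 = female; "friends of friends" =
# distinct users two hops away, excluding the user.
MALE, FEMALE = 1, 0


def reference_counts(users, edges):
    """users: {user_id: (age, gender)}; edges: iterable of (user_id, friend_id).

    Returns one dict per task: friends, friends of friends, friends over 30,
    and (male users only) a (male_friends, female_friends) tuple.
    """
    adj = defaultdict(set)
    for user, friend in edges:
        adj[user].add(friend)

    friends, fof, over_30, by_gender = {}, {}, {}, {}
    for user, (_, gender) in users.items():
        direct = adj.get(user, set())
        friends[user] = len(direct)

        # Union of the friends' own friend sets, without the user itself.
        two_hop = set()
        for friend in direct:
            two_hop |= adj.get(friend, set())
        two_hop.discard(user)
        fof[user] = len(two_hop)

        # Only friends that have a profile (and a known age) are considered.
        known = [users[f] for f in direct if f in users]
        over_30[user] = sum(1 for age, _ in known if age is not None and age > 30)

        if gender == MALE:
            genders = [g for _, g in known]
            by_gender[user] = (genders.count(MALE), genders.count(FEMALE))

    return friends, fof, over_30, by_gender
```

This in-memory version is only practical for a sample of the data; its purpose is to serve as an oracle for the database queries, not to replace them.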

The project was carried out for the course "Big Data Management Systems" taught by Prof. Damianos Chatziantoniou. A detailed description of the assignment is given in Proj3_Neo4j_Description.pdf.

Execution

1. We assume that MySQL and Neo4j are already installed and configured on the system.

2. Clone this repository:

$ git clone https://github.com/ChryssaNab/BDMS-AUEB.git
$ cd BDMS-AUEB/neo4j_project/

3. Download the dataset:

neo4j_project$ mkdir dataset
neo4j_project$ cd dataset
neo4j_project/dataset$ wget "https://snap.stanford.edu/data/soc-pokec-profiles.txt.gz"
neo4j_project/dataset$ wget "https://snap.stanford.edu/data/soc-pokec-relationships.txt.gz"

4. Run the convert_tsv_to_csv.py script to convert both files from tab-separated (TSV) to comma-separated (CSV) format, keeping only the user_id, age, and gender attributes of each profile:

 $ python convert_tsv_to_csv.py
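The conversion step can be sketched roughly as follows. This is a hypothetical illustration, not the contents of convert_tsv_to_csv.py. It assumes that user_id, gender and AGE sit at 0-based columns 0, 3 and 7 of the SNAP profiles file, and that missing values are written as the literal string "null"; both should be checked against the dataset description. The output headers use the neo4j-admin import conventions (`:ID` for the node key, `:START_ID`/`:END_ID` for relationship endpoints), and the `-new.csv` file names match the ones passed to neo4j-admin below.

```python
import csv
import gzip
from typing import TextIO

# ASSUMPTION: 0-based positions of user_id, gender and AGE in the SNAP
# soc-pokec-profiles TSV; verify against the dataset description.
USER_ID_COL, GENDER_COL, AGE_COL = 0, 3, 7


def _clean(value: str) -> str:
    # ASSUMPTION: missing values appear as the literal string "null"; write an
    # empty field instead so numeric columns never receive a non-numeric token.
    return "" if value == "null" else value


def convert_profiles(src: TextIO, dst: TextIO) -> int:
    """Copy user_id, age and gender from TSV to CSV; return rows written."""
    writer = csv.writer(dst)
    # neo4j-admin import header: ":ID" marks the node key, ":int" a typed property.
    writer.writerow(["user_id:ID", "age:int", "gender:int"])
    rows = 0
    for line in src:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) <= AGE_COL:
            continue  # skip blank or truncated lines
        writer.writerow([fields[USER_ID_COL],
                         _clean(fields[AGE_COL]),
                         _clean(fields[GENDER_COL])])
        rows += 1
    return rows


def convert_relationships(src: TextIO, dst: TextIO) -> int:
    """Copy (user_id, friend_id) pairs from TSV to CSV; return rows written."""
    writer = csv.writer(dst)
    writer.writerow([":START_ID", ":END_ID"])
    rows = 0
    for line in src:
        fields = line.split()
        if len(fields) == 2:
            writer.writerow(fields)
            rows += 1
    return rows


def convert_all(dataset_dir: str = "dataset") -> None:
    """Hypothetical driver: reads the .gz downloads, writes *-new.csv files."""
    jobs = [("soc-pokec-profiles", convert_profiles),
            ("soc-pokec-relationships", convert_relationships)]
    for stem, convert in jobs:
        with gzip.open(f"{dataset_dir}/{stem}.txt.gz", "rt",
                       encoding="utf-8", errors="replace") as src, \
             open(f"{dataset_dir}/{stem}-new.csv", "w",
                  newline="", encoding="utf-8") as dst:
            print(stem, convert(src, dst), "rows")
```

Reading the .gz files directly avoids a separate decompression step. The resulting CSV files would still need to be placed where the neo4j-admin command expects them (../import/ relative to the Neo4j home directory).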

5. Execute the write_soc_pokec_mysql.py script to import the data into MySQL:

 $ python write_soc_pokec_mysql.py
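One way to picture the MySQL import is the batched loader below. It is a sketch that uses Python's built-in sqlite3 as a stand-in so it runs without a server; the table names `users` and `relationships`, the schema, and the helpers `batches`, `to_int` and `load` are hypothetical and may not match write_soc_pokec_mysql.py. The index on `relationships(user_id)` reflects that every friend query joins on that column.

```python
import sqlite3
from itertools import islice

# Hypothetical schema; the tables created by write_soc_pokec_mysql.py may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    user_id INTEGER PRIMARY KEY,
    age     INTEGER,
    gender  INTEGER
);
CREATE TABLE IF NOT EXISTS relationships (
    user_id   INTEGER NOT NULL,
    friend_id INTEGER NOT NULL
);
-- Index the join column used by every friend query.
CREATE INDEX IF NOT EXISTS idx_relationships_user ON relationships (user_id);
"""


def batches(rows, size):
    """Yield lists of at most `size` rows without materialising the input."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def to_int(value):
    # Empty CSV fields (missing age/gender) become SQL NULL.
    return int(value) if value != "" else None


def load(conn, profile_rows, relationship_rows, batch_size=10_000):
    """Create the tables and bulk-insert CSV rows (lists of strings) in batches."""
    conn.executescript(SCHEMA)
    for batch in batches(profile_rows, batch_size):
        conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                         [tuple(to_int(v) for v in row) for row in batch])
    for batch in batches(relationship_rows, batch_size):
        conn.executemany("INSERT INTO relationships VALUES (?, ?)",
                         [(int(a), int(b)) for a, b in batch])
    conn.commit()
```

The rows can come straight from `csv.reader` after skipping the header line with `next(reader)`. Against MySQL with mysql-connector-python, the placeholders would be `%s` instead of `?`, and `LOAD DATA LOCAL INFILE` is usually considerably faster than row inserts for files of this size.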

6. Run the following command from the Neo4j home directory to import the data into Neo4j:

neo4j_home$ bin/neo4j-admin import \
    --id-type INTEGER \
    --nodes:User ../import/soc-pokec-profiles-new.csv \
    --relationships:HAS_FRIEND ../import/soc-pokec-relationships-new.csv

Two examples of the Neo4j graph model are depicted below:

To access Neo4j Browser, open http://localhost:7474 in a web browser and log in with the username neo4j and the password you configured.

7. Execute the run_queries_mysql.py script to run the queries in MySQL:

 $ python run_queries_mysql.py
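The four tasks could be expressed in SQL roughly as below, together with a small timing harness. This is a sketch, not the queries in run_queries_mysql.py: it assumes hypothetical tables `users(user_id, age, gender)` and `relationships(user_id, friend_id)` with gender encoded as 1 = male and 0 = female, and it is exercised here with sqlite3 as a stand-in for MySQL. The LEFT JOINs keep users without friends in the output with a count of 0.

```python
import sqlite3
import time

# Hypothetical table names (users, relationships) and gender encoding
# (1 = male, 0 = female); adapt to the project's actual MySQL schema.
SQL_QUERIES = {
    "friends": """
        SELECT u.user_id, COUNT(r.friend_id) AS friends
        FROM users u
        LEFT JOIN relationships r ON r.user_id = u.user_id
        GROUP BY u.user_id""",
    "friends_of_friends": """
        SELECT u.user_id, COUNT(DISTINCT r2.friend_id) AS fof
        FROM users u
        LEFT JOIN relationships r1 ON r1.user_id = u.user_id
        LEFT JOIN relationships r2
               ON r2.user_id = r1.friend_id AND r2.friend_id <> u.user_id
        GROUP BY u.user_id""",
    "friends_over_30": """
        SELECT u.user_id, COUNT(f.user_id) AS friends_over_30
        FROM users u
        LEFT JOIN relationships r ON r.user_id = u.user_id
        LEFT JOIN users f ON f.user_id = r.friend_id AND f.age > 30
        GROUP BY u.user_id""",
    "male_friends_by_gender": """
        SELECT u.user_id,
               SUM(CASE WHEN f.gender = 1 THEN 1 ELSE 0 END) AS male_friends,
               SUM(CASE WHEN f.gender = 0 THEN 1 ELSE 0 END) AS female_friends
        FROM users u
        LEFT JOIN relationships r ON r.user_id = u.user_id
        LEFT JOIN users f ON f.user_id = r.friend_id
        WHERE u.gender = 1
        GROUP BY u.user_id""",
}


def run_sql_timed(conn, queries=SQL_QUERIES):
    """Run each query on a DB-API connection; return {name: (rows, seconds)}."""
    results = {}
    for name, sql in queries.items():
        start = time.perf_counter()
        # Fetch every row so the timing covers the full result, not just the cursor.
        rows = conn.cursor().execute(sql).fetchall() if isinstance(conn, sqlite3.Connection) else _fetch(conn, sql)
        results[name] = (rows, time.perf_counter() - start)
    return results


def _fetch(conn, sql):
    # Generic DB-API path (e.g. a mysql-connector-python connection).
    cur = conn.cursor()
    cur.execute(sql)
    return cur.fetchall()
```

Fetching all rows inside the timed region matters for a fair comparison: otherwise the measurement may cover only query submission rather than the work of producing results.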

8. Execute the run_queries_neo4j.py script to run the queries in Neo4j:

 $ python run_queries_neo4j.py
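The Cypher counterparts might look like the sketch below; again these are illustrative queries, not the ones in run_queries_neo4j.py. They assume `User` nodes with numeric `user_id`, `age` and `gender` properties (1 = male, 0 = female) and `HAS_FRIEND` relationships as created by the import command above. If age and gender were imported as strings, the numeric comparisons would not match. The harness accepts any object with a `run(query)` method, so it can be tested without a server; the docstring shows how a session from the official neo4j Python driver would be passed in.

```python
import time

# Hypothetical property names (user_id, age, gender) and gender encoding
# (1 = male, 0 = female). Numeric comparisons only work if age and gender
# were imported as numbers (for example via typed CSV headers).
CYPHER_QUERIES = {
    "friends": """
        MATCH (u:User)
        OPTIONAL MATCH (u)-[:HAS_FRIEND]->(f:User)
        RETURN u.user_id AS user_id, count(f) AS friends""",
    "friends_of_friends": """
        MATCH (u:User)
        OPTIONAL MATCH (u)-[:HAS_FRIEND]->(:User)-[:HAS_FRIEND]->(fof:User)
        WHERE fof <> u
        RETURN u.user_id AS user_id, count(DISTINCT fof) AS fof""",
    "friends_over_30": """
        MATCH (u:User)
        OPTIONAL MATCH (u)-[:HAS_FRIEND]->(f:User)
        WHERE f.age > 30
        RETURN u.user_id AS user_id, count(f) AS friends_over_30""",
    "male_friends_by_gender": """
        MATCH (u:User)
        WHERE u.gender = 1
        OPTIONAL MATCH (u)-[:HAS_FRIEND]->(f:User)
        RETURN u.user_id AS user_id,
               count(CASE WHEN f.gender = 1 THEN 1 END) AS male_friends,
               count(CASE WHEN f.gender = 0 THEN 1 END) AS female_friends""",
}


def run_cypher_timed(session, queries=CYPHER_QUERIES):
    """Run each query on `session` (anything with a run(query) method) and time it.

    With the official driver (pip install neo4j), a session could come from:
        from neo4j import GraphDatabase
        driver = GraphDatabase.driver("bolt://localhost:7687",
                                      auth=("neo4j", "<password>"))
        with driver.session() as session:
            results = run_cypher_timed(session)
    """
    results = {}
    for name, cypher in queries.items():
        start = time.perf_counter()
        # Consume the whole stream so the timing includes result transfer.
        records = list(session.run(cypher))
        results[name] = (records, time.perf_counter() - start)
    return results
```

`OPTIONAL MATCH` plays the role of the SQL LEFT JOIN here: users without a matching friend produce a null, which `count()` ignores, so they are reported with a count of 0.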

External Resources

Neo4j-admin Import command line tool
