This README outlines the steps needed to set up and run the project environment, including installing necessary libraries, processing data, and loading it into a Neo4j database.
Ensure you have Python 3.9 installed on your system. You can check your Python version by running:
python --version
First, install the required libraries listed in requirements.txt by running the following command:
pip install -r requirements.txt
Download the dataset from the provided hyperlink and save it in the project's root directory.
Run the notebooks/etl.ipynb Jupyter notebook to create the necessary files from the dataset. Initially, all_data.csv will be used.
The keyword creation part of the ETL process is time-consuming. If you prefer to skip this step, use all_data_with_keywords.csv in the notebook instead of all_data.csv, as shown in the sketch below.
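As a rough illustration, assuming the notebook reads its input file into a pandas DataFrame, switching between the two files could look like this (the variable names here are hypothetical, not taken from the notebook):

import pandas as pd

# Hypothetical input selection near the top of etl.ipynb.
# Set USE_PRECOMPUTED_KEYWORDS to True to skip the slow keyword-creation step.
USE_PRECOMPUTED_KEYWORDS = True

input_csv = "all_data_with_keywords.csv" if USE_PRECOMPUTED_KEYWORDS else "all_data.csv"
df = pd.read_csv(input_csv)
print(f"Loaded {len(df)} rows from {input_csv}")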
After processing the data, set up a Neo4j DBMS and obtain the path to its DBMS directory. On macOS, the path looks like this:
/Users/user/Library/Application Support/Neo4j Desktop/Application/relate-data/dbmss/<dbms-id>/import/csv_path.txt
Replace <dbms-id> with the appropriate folder name for your DBMS.
Move all the CSV files generated by etl.ipynb to the directory path you obtained in the previous step. If you prefer to script this step, see the sketch below.
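A minimal sketch using Python's standard library is shown here; both paths are placeholders and should be adjusted to your project layout and your DBMS import directory:

import shutil
from pathlib import Path

# Placeholder paths: adjust to where etl.ipynb wrote its CSV files
# and to your own Neo4j DBMS import directory.
source_dir = Path(".")
import_dir = Path("/Users/user/Library/Application Support/Neo4j Desktop/Application/relate-data/dbmss/<dbms-id>/import")

for csv_file in source_dir.glob("*.csv"):
    shutil.copy(csv_file, import_dir / csv_file.name)
    print(f"Copied {csv_file.name} -> {import_dir}")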
Execute the data pipeline with the following bash script command. Replace --config with your configuration file path, if necessary. Also, update the username, password, and database in the config file.
bash run_loader.sh --config config.ini
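Before running the loader, you can optionally verify that the credentials in the config file work. The sketch below assumes config.ini has a [neo4j] section with uri, username, password, and database keys; the actual section and key names used by this project may differ, so adapt accordingly:

import configparser
from neo4j import GraphDatabase  # pip install neo4j

config = configparser.ConfigParser()
config.read("config.ini")
neo4j_cfg = config["neo4j"]  # assumed section name

# Open a driver with the configured credentials and check the connection.
driver = GraphDatabase.driver(
    neo4j_cfg["uri"],  # e.g. bolt://localhost:7687
    auth=(neo4j_cfg["username"], neo4j_cfg["password"]),
)
driver.verify_connectivity()
print("Connected to Neo4j successfully.")
driver.close()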
Below are the sample results we obtained from running the pipeline: