Dataset_Recommendation_using_ensembel_approach-CoAuthor-

Introduction

This repository is the required dataset and python implementation of paper "Recommending Scientific Datasets Using Author Networks in Ensemble Methods" with authors Xu Wang, Frank van Harmelen and Zhisheng Huang.

Requirement before running experiment

Make sure your python version >= 3.6. You should "pip" install followling library in your python environment:

Or simply use pip install -r requirements.txt to install all needed library.

Dataset

The dataset you needed for our ensembel datset recommendation algorithm:

MAKG coauthor RDF/HDT file download link
MAKG paper/dataset title RDF/HDT file download link
MAKG paper/dataset abstract RDF/HDT file download link
MAKG pretrained author-entity embedding download link
Seed dataset/paper txt file one dataset per line
Candidate dataset/paper txt file one dataset per line
Gold standard link between seeds and candidates RDF/HDT file

Python Implementation of Dataset Recommendation with Co-author network in ensemble methods

The algorithm in paper is implemented in Recommend_walk_embed_bm.py:

Graph walk implementation
- graphwalk function in line 47
- line 217-220 of step function
Author entity embedding similarity
- clean_candidate_with_ent_embed in line 107
- line 221-222 of step function
BM25
- line 253-260 of step function

Usage

usage: Recommend_walk_embed_bm.py [-h] -th THRESHOLD -bth BM25_THRESHOLD -hp HOP -sd SEED -cd CANDIDATE -gd STANDARD [-d DIR]

optional arguments:
  -h, --help            show this help message and exit
  -th THRESHOLD, --threshold THRESHOLD
                        Threshold for similarity between entity(author) embedding
  -bth BM25_THRESHOLD, --bm25_threshold BM25_THRESHOLD
                        Threshold for BM25 ranking
  -hp HOP, --hop HOP    Hop number for graph walk
  -sd SEED, --seed SEED
                        Path to [seed file].txt
  -cd CANDIDATE, --candidate CANDIDATE
                        Path to [candidate file].txt
  -gd STANDARD, --standard STANDARD
                        Path to [standard file].hdt
  -d DIR, --dir DIR     Directory to read all needed files and to store all results. Default is directory of this python file

Sample experiment

41 seed datasets and 116 candidate datasets, with 117 gold standard link.

python Recommend_walk_embed_bm.py -th [threshold of embedding similarity] -bth [threshold of bm25] -hp [hop number of graph walk] -sd [path_to_seed] -cd [path_to_candidate] -gd [path_to_standard.hdt] -d [path_to_dir_of_all_datasets]

After running python file, it will return result file in directory with format per line:

seed_dataset_id[Tab Separated]Correct_Count[Tab Separated]Standard_Count[Tab Separated]Recommended_Count[Tab Separated]Recall[Tab Separated]Precision

where Standard_Count is the number of standard linked datasets for seed dataset; Recommended_Count is the number of datasets returned by recommendation alogrithm for seed dataset; Correct_Count is the number of intersection between standard linked datasets and datasets returned by recommendation alogrithm for seed dataset; Recall is Correct_Count divided by Standard_Count; Precision is Correct_Count divided by Recommended_Count.

License

This repository is licensed under GNU General Public License v3.0.

The Microsoft Academic Knowledge Graph, the linked data description files, and the ontology are licensed under the Open Data Commons Attribution License (ODC-By) v1.0.

Citation of Data

Wang, Xu, 2022, "Data For "Recommending Scientiﬁc Datasets Using Author Networks in Ensemble Methods"", https://doi.org/10.34894/W6C7P7, DataverseNL, V1

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
CITATION.cff		CITATION.cff
LICENSE		LICENSE
README.md		README.md
Recommend_walk_embed_bm.py		Recommend_walk_embed_bm.py
Standard_sample.hdt		Standard_sample.hdt
Standard_sample.hdt.index.v1-1		Standard_sample.hdt.index.v1-1
cands_sample.txt		cands_sample.txt
requirements.txt		requirements.txt
seeds_sample.txt		seeds_sample.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CITATION.cff

CITATION.cff

LICENSE

LICENSE

README.md

README.md

Recommend_walk_embed_bm.py

Recommend_walk_embed_bm.py

Standard_sample.hdt

Standard_sample.hdt

Standard_sample.hdt.index.v1-1

Standard_sample.hdt.index.v1-1

cands_sample.txt

cands_sample.txt

requirements.txt

requirements.txt

seeds_sample.txt

seeds_sample.txt

Repository files navigation

Dataset_Recommendation_using_ensembel_approach-CoAuthor-

Introduction

Requirement before running experiment

Dataset

Python Implementation of Dataset Recommendation with Co-author network in ensemble methods

Usage

Sample experiment

License

Citation of Data

About

Releases

Packages

Languages

License

xuwang0010/Dataset_Recommendation_using_ensemble_approach-CoAuthor-

Folders and files

Latest commit

History

Repository files navigation

Dataset_Recommendation_using_ensembel_approach-CoAuthor-

Introduction

Requirement before running experiment

Dataset

Python Implementation of Dataset Recommendation with Co-author network in ensemble methods

Usage

Sample experiment

License

Citation of Data

About

Resources

License

Stars

Watchers

Forks

Languages