### Research Paper Recommendation System using SBERT
The recommendation system suggests a list of the most similar papers for a given research paper.

In [None]:
!pip install sentence-transformers

In [None]:
from sentence_transformers import SentenceTransformer, util
import os
import json
import requests

#### Load Dataset

In [4]:
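# Download the EMNLP 2016-2018 paper collection as JSON
response = requests.get('https://sbert.net/datasets/emnlp2016-2018.json')
response.raise_for_status()  # fail early if the download did not succeed
papers = json.loads(response.text)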

In [5]:
papers[0]

{'title': 'Rule Extraction for Tree-to-Tree Transducers by Cost Minimization',
 'abstract': 'Finite-state transducers give efficient representations of many Natural Language phenomena. They allow to account for complex lexicon restrictions encountered, without involving the use of a large set of complex rules difficult to analyze. We here show that these representations can be made very compact, indicate how to perform the corresponding minimization, and point out interesting linguistic side-effects of this operation.',
 'url': 'http://aclweb.org/anthology/D16-1002',
 'venue': 'EMNLP',
 'year': '2016'}

In [6]:
len(papers)

974
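
Each record shown above carries `title`, `abstract`, `url`, `venue`, and `year`, but only the title and abstract are encoded later. Before encoding, it can help to confirm that no record is missing either field. The following is a minimal sketch, not part of the original notebook: `find_incomplete` and `REQUIRED_KEYS` are hypothetical names, and the sample records are made up for illustration.

```python
# Hypothetical helper: the names below are illustrative, not from the notebook.
REQUIRED_KEYS = ("title", "abstract")  # assumption: the only fields the encoder needs


def find_incomplete(records, required=REQUIRED_KEYS):
    """Return the indices of records with a missing, non-string, or blank required field."""
    bad = []
    for i, rec in enumerate(records):
        for key in required:
            value = rec.get(key)
            if not isinstance(value, str) or not value.strip():
                bad.append(i)
                break  # one bad field is enough to flag the record
    return bad


# Made-up sample records standing in for `papers`.
sample = [
    {"title": "A", "abstract": "Some text."},
    {"title": "B", "abstract": ""},
    {"abstract": "No title."},
]
incomplete = find_incomplete(sample)
```

On the real data, `find_incomplete(papers)` would return an empty list if every record is usable as-is.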

#### SBERT Model

In [None]:
# here we use the SPECTER model (https://arxiv.org/pdf/2004.07180.pdf)
model = SentenceTransformer('allenai-specter')

In [None]:
# Encode each paper as a single string: title + '[SEP]' + abstract
paper_texts = [paper['title'] + '[SEP]' + paper['abstract'] for paper in papers]
corpus_embeddings = model.encode(paper_texts, convert_to_tensor=True, show_progress_bar=True)
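
Embeddings like these are typically compared with cosine similarity, which is also the default scoring function of `util.semantic_search`. The sketch below shows that ranking step in plain Python on toy 2-D vectors. It is an illustration of the idea, not the library's implementation. `cosine` and `top_k_cosine` are hypothetical names, and the vectors are made up (real SPECTER embeddings have many more dimensions).

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k_cosine(query, corpus, k=3):
    """Return the k (index, score) pairs with the highest cosine similarity to query."""
    scored = [(i, cosine(query, vec)) for i, vec in enumerate(corpus)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for real embeddings (assumption for illustration).
toy_corpus = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ranked = top_k_cosine([1.0, 0.1], toy_corpus, k=2)
```

Here `ranked` lists corpus vector 0 first and vector 2 second, because the query points almost exactly along the first axis.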

In [9]:
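def search(title, abstract):
    """Print the three corpus papers most similar to the given title and abstract."""
    # Encode the query the same way as the corpus: title + '[SEP]' + abstract
    query_embedding = model.encode(title + '[SEP]' + abstract, convert_to_tensor=True)

    # semantic_search returns one hit list per query; take the list for our single query
    search_hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=3)[0]

    print("Most Similar Papers\n")
    for hit in search_hits:
        related_paper = papers[hit['corpus_id']]
        print(related_paper['title'])
        print(related_paper['abstract'])
        print('\n\n')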
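
The `search` function prints its results and discards the similarity scores. If the recommendations need to be reused (for example, sorted, filtered, or shown with links), a variant can return structured records instead. The sketch below assumes hits in the shape `util.semantic_search` returns, a list of dicts with `corpus_id` and `score` keys. `hits_to_records` is a hypothetical helper, and the toy papers and scores are made up.

```python
def hits_to_records(hits, papers):
    """Pair each hit with its paper's title and URL, keeping the similarity score."""
    records = []
    for hit in hits:
        paper = papers[hit["corpus_id"]]
        records.append({
            "title": paper["title"],
            "url": paper.get("url", ""),  # tolerate records without a URL
            "score": round(float(hit["score"]), 4),
        })
    return records


# Made-up data mimicking the shapes used in this notebook.
toy_papers = [
    {"title": "Paper A", "url": "http://example.org/a"},
    {"title": "Paper B", "url": "http://example.org/b"},
]
toy_hits = [{"corpus_id": 1, "score": 0.91}, {"corpus_id": 0, "score": 0.42}]
records = hits_to_records(toy_hits, toy_papers)
```

Inside `search`, the same helper could be applied to `search_hits` and `papers`, returning the records rather than printing them.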

In [10]:
# make recommendations
title = 'Applications of big data in emerging management disciplines: A literature review using text mining'
abstract = 'The importance of data-driven decisions and support is increasing day by day in every management area. The constant access to volume, variety, and veracity of data has made big data an integral part of management studies. New sub-management areas are emerging day by day with the support of big data to drive businesses. This study takes a systematic literature review approach to uncover the emerging management areas supported by big data in contemporary times. For this, we have analyzed the research papers published in the reputed management journals in the last ten years, fir using network analysis followed by natural language processing summarization techniques to find the emerging new management areas which are yet to get much attention. Furthermore, we ran the same exercise in each of these management areas to uncover these areas better. This research will act as a reference for future information systems (IS) scholars who want to perform analysis that is deep-dive in nature on each of these management areas, which in the coming times will get all the due attention to become dedicated research domains in the management area. We finally conclude the study by identifying the scope of future research in each of these management areas, which will be a true value addition for IS researchers.'
search(title, abstract)

Most Similar Papers

Challenges of Using Text Classifiers for Causal Inference
Causal understanding is essential for many kinds of decision-making, but causal inference from observational data has typically only been applied to structured, low-dimensional datasets. While text classifiers produce low-dimensional outputs, their use in causal inference has not previously been studied. To facilitate causal analyses based on language data, we consider the role that text classifiers can play in causal inference through established modeling mechanisms from the causality literature on missing data and measurement error. We demonstrate how to conduct causal analyses using text classifiers on simulated and Yelp data, and discuss the opportunities and challenges of future work that uses text data in causal inference.



PubSE: A Hierarchical Model for Publication Extraction from Academic Homepages
Despite recent evidence that Microsoft Academic is an extensive source of citation counts for journa