beyzayaman/NER-assessment-for-springer-nature-abstracts

 
 

NER Assessment for Springer Nature Abstracts

You can use this repository to produce named entity links or candidate links with the DBpedia Spotlight tool by mapping multiple N-Triples (or N-Quads) files. It was built to produce named entity links for the Springer-Nature datasets, but it can be used for any dataset.
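To make "named entity links" and "candidate links" concrete, here is a minimal sketch of a request to the DBpedia Spotlight REST API. It assumes the public endpoint at api.dbpedia-spotlight.org and a confidence value of 0.5. Whether this repository calls the public endpoint or a local Spotlight server is not stated here, and the function name `spotlight_request` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: queries a DBpedia Spotlight REST endpoint.
# The default URL and the confidence value are assumptions, not settings
# taken from this repository.
SPOTLIGHT_URL="${SPOTLIGHT_URL:-https://api.dbpedia-spotlight.org/en}"

# spotlight_request MODE TEXT
#   MODE is "annotate" (entity links) or "candidates" (candidate links).
spotlight_request() {
  local mode="$1" text="$2"
  curl -s "${SPOTLIGHT_URL}/${mode}" \
    --data-urlencode "text=${text}" \
    --data "confidence=0.5" \
    -H "Accept: application/json"
}

# Example usage (needs network access):
# spotlight_request annotate "Berlin is the capital of Germany."
# spotlight_request candidates "Berlin is the capital of Germany."
```

Setting the `SPOTLIGHT_URL` environment variable points the same helper at a different server, for example a locally hosted one.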

The steps below are explained for book chapters, but you can use any dataset from SciGraph Explorer. To produce the data, follow these steps:

Preparing the dataset:

  1. Download the Springer-Nature book chapters dataset from this link and the book chapter abstracts from this link.

  2. Use bash commands to extract the portions of the book chapters dataset for the properties you need (field of research and language).

  3. Compress the output files:
  • bzip2 outputFieldOfResearch.ttl
  • bzip2 outputLanguage.ttl
  4. Sort the compressed output files and the book chapter abstracts. Run the following for each file, with $file as the input and $newFile as the sorted output:
  • bzip2 -cd "$file" | cat | sort --parallel=8 --batch-size=512 --buffer-size=50% | parallel --pipe --recend '' -k bzip2 > "$newFile" ;
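The extraction step is not spelled out above, so the following is only a sketch of how it could look. It assumes the dataset is in N-Triples form with one statement per line. The function names, the input file name, and the predicate IRIs are hypothetical placeholders; substitute the real IRIs from the dataset. The second helper checks that a stream is already sorted, which can be useful before handing files to the framework.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only the statements whose predicate equals the
# given IRI. Reads N-Triples from stdin and writes matches to stdout.
filter_by_predicate() {
  local predicate="$1"
  # Each line has the form: <subject> <predicate> <object> .
  # awk compares the second whitespace-separated field with <predicate>.
  awk -v p="<${predicate}>" '$2 == p'
}

# Hypothetical helper: succeeds only if the lines on stdin are already in
# the order that sort would produce under the current locale.
is_sorted() {
  sort -c 2>/dev/null
}

# Example usage (file name and predicate IRIs are assumptions):
# bzip2 -cd bookchapters.nt.bz2 \
#   | filter_by_predicate "http://example.org/hasFieldOfResearch" \
#   > outputFieldOfResearch.ttl
# bzip2 -cd bookchapters.nt.bz2 \
#   | filter_by_predicate "http://example.org/language" \
#   > outputLanguage.ttl
# bzip2 -cd outputLanguage.sorted.ttl.bz2 | is_sorted && echo "sorted"
```

Run the sortedness check under the same locale as the sort command itself, since the locale determines the collation order.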

Setting up the framework (for Linux):

  1. Install Scala
  2. Install IntelliJ
  3. Import the repository from GitHub into IntelliJ as a Scala project
  4. Edit default.properties so that it points to the datasets
  • Set base-dir to the path of your datasets
  • Set primary-input-dataset to the book chapter abstracts
  • Set input-datasets to the field of research portion and the language portion
  5. Execute the main class
  • Run the SortedQuadTraversal class as the main class
  • Under Run -> Edit Configurations -> Program Arguments, enter default.properties
  • Run the main class again
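As an illustration of the configuration step, the sketch below prints a default.properties body with the three keys named above. Only the key names come from the steps. The comma separator for input-datasets, the paths, and the file names are assumptions, and `write_properties` is a hypothetical helper, so check the default.properties shipped with the repository for the exact value format.

```shell
#!/usr/bin/env bash
# Hypothetical helper that prints a default.properties body.
# Usage: write_properties BASE_DIR PRIMARY_DATASET INPUT_DATASET...
# The comma separator for input-datasets is an assumption.
write_properties() {
  local base_dir="$1" primary="$2"
  shift 2
  local inputs
  # Join the remaining arguments with commas.
  inputs="$(IFS=,; printf '%s' "$*")"
  printf 'base-dir=%s\n' "$base_dir"
  printf 'primary-input-dataset=%s\n' "$primary"
  printf 'input-datasets=%s\n' "$inputs"
}

# Example usage (all paths and names are placeholders):
# write_properties /data/scigraph abstracts outputFieldOfResearch outputLanguage > default.properties
```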

As a result, you will get two output datasets.
