NER Assessment for Springer Nature Abstracts

This repository produces named entity links (or candidate links) with the DBpedia Spotlight tool by mapping multiple N-Triple (or N-Quad) files. It was built to produce named entity links for Springer-Nature datasets, but it can be used for any dataset.

The steps are explained for book chapters, but you can use any of the datasets from SciGraph Explorer. To produce the data, follow the steps below:

Preparing the dataset:

Download the Springer-Nature book chapters dataset from this link and the book chapter abstracts from this link.
Use bash commands to extract the statements for the required properties (field of research and language) from the book chapters dataset:

bzcat FILE | grep 'http://scigraph.springernature.com/ontologies/core/hasFieldOfResearchCode' > outputFieldOfResearch.ttl
bzcat FILE | grep 'http://scigraph.springernature.com/ontologies/core/language' > outputLanguage.ttl

Compress the output files:

bzip2 outputFieldOfResearch.ttl
bzip2 outputLanguage.ttl

Sort the compressed output files and the book chapter abstracts:

bzip2 -cd "$file" | sort --parallel=8 --batch-size=512 --buffer-size=50% | parallel --pipe --recend '' -k bzip2 > "$newFile"
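
The preparation steps above can be sketched as one script. This is a minimal sketch under stated assumptions, not the repository's own tooling: the input name bookchapters.nt.bz2, the helper functions extract_property and sort_bz2, and the *.sorted.ttl.bz2 output names are all placeholders. It uses grep -F so that the dots in the property URI are matched literally, pipes the filtered lines straight into bzip2, and compresses with a single bzip2 process instead of GNU parallel.

```shell
# Base URI of the SciGraph core ontology, as used in the grep commands above.
CORE='http://scigraph.springernature.com/ontologies/core'

# Keep only the lines that mention one property URI and compress them.
# -F treats the URI as a fixed string, so '.' is not a regex wildcard.
extract_property() {
  local input="$1" property="$2" output="$3"
  bzcat "$input" | grep -F "$property" | bzip2 > "$output"
}

# Decompress, sort line by line, and recompress into a new file.
# --parallel and --buffer-size are GNU sort options.
sort_bz2() {
  local file="$1" newFile="$2"
  bzip2 -cd "$file" | sort --parallel=8 --buffer-size=50% | bzip2 > "$newFile"
}

# Hypothetical input name; the block does nothing if the file is absent.
input=bookchapters.nt.bz2
if [ -f "$input" ]; then
  extract_property "$input" "$CORE/hasFieldOfResearchCode" outputFieldOfResearch.ttl.bz2
  extract_property "$input" "$CORE/language" outputLanguage.ttl.bz2
  for file in outputFieldOfResearch.ttl.bz2 outputLanguage.ttl.bz2; do
    sort_bz2 "$file" "${file%.ttl.bz2}.sorted.ttl.bz2"
  done
fi
```

The abstracts file needs the same sort_bz2 call. The --batch-size flag from the command above is left out of this sketch; add it back if the sort needs it on large inputs.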

Setting up the framework (for Linux):

Install Scala
Install IntelliJ
Import the repository from GitHub into IntelliJ
Edit default.properties so that it points to the datasets:

Set base-dir to the path of your datasets
Set primary-input-dataset to the book chapter abstracts
Set input-datasets to the field of research portion and the language portion

Execute the main class:

Run the SortedQuadTraversal class as the main class
In Run -> Edit Configurations, set Program Arguments to default.properties
Run the main class again
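
The default.properties settings described above might look like the following. This is only a sketch: the three keys come from the steps above, but the path, the file names, and the comma used to separate the two input datasets are assumptions, so check the default.properties shipped with the repository for the exact value format. The example is written to a separate file so that the real default.properties is not overwritten.

```shell
# Write an example next to the real file instead of overwriting default.properties.
cat > default.properties.example <<'EOF'
# Directory that contains all dataset files (hypothetical path)
base-dir=/data/scigraph
# Sorted book chapter abstracts (hypothetical file name)
primary-input-dataset=abstracts.sorted.ttl.bz2
# Sorted field-of-research and language subsets (comma separator is an assumption)
input-datasets=outputFieldOfResearch.sorted.ttl.bz2,outputLanguage.sorted.ttl.bz2
EOF

# Show the result for review before copying values into default.properties.
cat default.properties.example
```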

As a result, you get two output datasets.
