An algorithm to extract keywords from any sentence using Stanford's Natural Language Processing Log-linear POS Tagger

I used Stanford's Log-linear Part-of-Speech (POS) Tagger, from the Stanford Natural Language Processing group, in Java to process .xml files and extract the sentences inside <title>...</title> tags.
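
The title-extraction step could look roughly like the sketch below. It is an assumption, not the repository's code: it uses a regular expression instead of a full XML parser, so it expects non-nested <title> elements and does not decode entities such as &amp;. The class and method names (TitleExtractor, extractTitles) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Minimal sketch (hypothetical class): pulls the text between <title> and </title>.
 * Assumes titles are not nested; a title may span several lines.
 */
class TitleExtractor {
    // (?s) lets '.' match newlines; '*?' is lazy so each title is matched separately.
    private static final Pattern TITLE = Pattern.compile("(?s)<title>(.*?)</title>");

    static List<String> extractTitles(String xml) {
        List<String> titles = new ArrayList<>();
        Matcher m = TITLE.matcher(xml);
        while (m.find()) {
            // Trim and collapse internal whitespace so each title becomes one sentence.
            titles.add(m.group(1).trim().replaceAll("\\s+", " "));
        }
        return titles;
    }

    public static void main(String[] args) {
        String xml = "<query><title>Mercedes and its cars</title>"
                   + "<title>\n  Electric   vehicles\n</title></query>";
        System.out.println(extractTitles(xml)); // [Mercedes and its cars, Electric vehicles]
    }
}
```

A DOM parser (javax.xml.parsers.DocumentBuilder) would be the stricter choice if the input is guaranteed to be well-formed XML.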

The extracted sentences were then tagged with the POS tagger, whose library is available on the Stanford NLP website. You can download the library and find information about its usage here.
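
By default, the Stanford tagger's MaxentTagger.tagString(String) returns the sentence with each token written as word_TAG. A minimal sketch of turning such a line back into (word, tag) pairs follows. The class name TaggedTokenParser is hypothetical, the input string is hand-written rather than real tagger output, and the sketch assumes the default "_" separator. Splitting on the last underscore keeps words that themselves contain underscores intact.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal sketch (hypothetical class): splits "word_TAG word_TAG ..." output into pairs. */
class TaggedTokenParser {
    /** One token of tagger output, e.g. word "cars" with tag "NNS". */
    record TaggedWord(String word, String tag) {}

    static List<TaggedWord> parse(String tagged) {
        List<TaggedWord> result = new ArrayList<>();
        for (String token : tagged.trim().split("\\s+")) {
            // Split on the LAST underscore so words containing "_" survive.
            int sep = token.lastIndexOf('_');
            if (sep <= 0 || sep == token.length() - 1) {
                continue; // not in word_TAG form (or empty input): skip it
            }
            result.add(new TaggedWord(token.substring(0, sep), token.substring(sep + 1)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hand-written line in the tagger's default format; with the library on the
        // classpath it could come from new MaxentTagger(modelPath).tagString(...).
        String tagged = "Mercedes_NNP and_CC its_PRP$ cars_NNS";
        for (TaggedWord tw : parse(tagged)) {
            System.out.println(tw.word() + " -> " + tw.tag());
        }
    }
}
```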

In the Java code, I mainly treat nouns, adjectives and verbs as keywords, because these are the kinds of words that actually carry meaning in a query. For example, in a sentence like "Mercedes and its cars", the words of interest are "Mercedes" and "cars", which turn out to be nouns. Details about the POS tags can be found in "POS tagging terms meanings.txt" or here.
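
The noun/adjective/verb filter can be sketched with Penn Treebank tag prefixes, the tag set used by the English Stanford models: NN* covers nouns, JJ* adjectives and VB* verbs. This is an illustrative sketch, not the repository's code; KeywordFilter and its methods are hypothetical names, and the input is a hand-written line in word_TAG format.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal sketch (hypothetical class): keeps nouns, adjectives and verbs. */
class KeywordFilter {
    // Penn Treebank prefixes: NN* = nouns, JJ* = adjectives, VB* = verbs.
    private static final String[] KEYWORD_TAG_PREFIXES = {"NN", "JJ", "VB"};

    static boolean isKeywordTag(String tag) {
        for (String prefix : KEYWORD_TAG_PREFIXES) {
            if (tag.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    /** Returns the words of a "word_TAG ..." line whose tag is a noun, adjective or verb. */
    static List<String> extractKeywords(String tagged) {
        List<String> keywords = new ArrayList<>();
        for (String token : tagged.trim().split("\\s+")) {
            int sep = token.lastIndexOf('_');
            if (sep > 0 && isKeywordTag(token.substring(sep + 1))) {
                keywords.add(token.substring(0, sep));
            }
        }
        return keywords;
    }

    public static void main(String[] args) {
        // Hand-written example in the tagger's default word_TAG format.
        System.out.println(extractKeywords("Mercedes_NNP and_CC its_PRP$ cars_NNS")); // [Mercedes, cars]
    }
}
```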

To run the code, create a folder, copy all the files into it (except "title.txt", "query.txt", "reqfile.txt" and "POS tagging terms meanings.txt"), and place the folder in your Java IDE's workspace. Import the entire project and then point your IDE to the POS tagger library. The POS tagger used here is dated 09-06-2017; you can download the latest version from the link provided above. As for the .xml input, "query.txt" is the query file that the code reads and processes to create "title.txt", from which it then creates "reqfile.txt", the file containing the keywords.
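
The file flow described above (query.txt to title.txt to reqfile.txt) might be wired together as in this sketch. It is an assumption about the structure, not the author's implementation: KeywordPipeline and run are hypothetical names, each title's keywords are written space-separated on one line (the real reqfile.txt format may differ), and the tagger is passed in as a function so the flow can be read without the Stanford jar.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Minimal sketch (hypothetical class) of query.txt -> title.txt -> reqfile.txt. */
class KeywordPipeline {
    private static final Pattern TITLE = Pattern.compile("(?s)<title>(.*?)</title>");

    /** tagger maps a sentence to "word_TAG word_TAG ..." output. */
    static void run(Path queryFile, Path titleFile, Path keywordFile,
                    UnaryOperator<String> tagger) throws IOException {
        // Step 1: query.txt -> title.txt (one extracted title per line).
        List<String> titles = new ArrayList<>();
        Matcher m = TITLE.matcher(Files.readString(queryFile));
        while (m.find()) {
            titles.add(m.group(1).trim().replaceAll("\\s+", " "));
        }
        Files.write(titleFile, titles);

        // Step 2: title.txt -> reqfile.txt (keywords of each title on one line).
        List<String> keywordLines = new ArrayList<>();
        for (String title : Files.readAllLines(titleFile)) {
            List<String> keywords = new ArrayList<>();
            for (String token : tagger.apply(title).trim().split("\\s+")) {
                int sep = token.lastIndexOf('_');
                String tag = sep > 0 ? token.substring(sep + 1) : "";
                // Keep nouns (NN*), adjectives (JJ*) and verbs (VB*).
                if (tag.startsWith("NN") || tag.startsWith("JJ") || tag.startsWith("VB")) {
                    keywords.add(token.substring(0, sep));
                }
            }
            keywordLines.add(String.join(" ", keywords));
        }
        Files.write(keywordFile, keywordLines);
    }
}
```

With the Stanford library on the classpath, the tagger argument could be new MaxentTagger(modelPath)::tagString, where modelPath points to one of the model files in the taggers folder (the exact file name is an assumption).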

Download basic English Stanford Tagger version 3.9.1
Download full Stanford Tagger version 3.9.1

Repository contents:
META-INF
src/brillTaggerStanford
taggers
.classpath
.gitattributes
.gitignore
.project
POS tagging terms meanings.txt
README.md
query.txt
reqfile.txt
title.txt

Swapneel01/POS-Tagging-for-KeyWord-Extraction

Folders and files

Latest commit

History

Repository files navigation

An algorithm to extract keywords from any sentence using Stanford's Natural Language Processing Log-linear POS Tagger

About

Topics

Resources

Stars

Watchers

Forks

Languages