# Keyword Extraction with Graph-of-Words

This notebook reproduces the example from the following paper:

[A Graph Degeneracy-based Approach to Keyword Extraction](https://www.aclweb.org/anthology/D16-1191/).
*Antoine Tixier, Fragkiskos Malliaros, and Michalis Vazirgiannis*.
*Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing* (EMNLP 2016).

It shows the output of the following keyword extraction methods:
- Batch keyword extraction based on k-core
   + main core
   + k-core + density selection method
   + k-core + inflexion selection method
- Word-level keyword extraction
   + CoreRank + elbow method
   + CoreRank + top 33%

In [1]:
from gowpy.summarization.unsupervised import KcoreKeywordExtractor
from gowpy.summarization.unsupervised import CoreRankKeywordExtractor

## Example

In [2]:
# Original text, before preprocessing (kept for reference; not used below).
"""
Mathematical aspects of computer-aided share trading. We consider
problems of statistical analysis of share prices and propose
probabilistic characteristics to describe the price series.
We discuss three methods of mathematical modelling of price 
series with given probabilistic characteristics.
"""

preprocessed_text = """
Mathemat aspect computer-aid  share trade  problem 
statist analysi share price probabilist characterist price  
seri method mathemat model  price   seri probabilist
characterist
""".strip().lower()
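
The extractors below operate on a graph-of-words built from this token sequence. As a rough illustration, and not gowpy's actual implementation, the following sketch builds an undirected, weighted co-occurrence graph with the standard library only. It assumes that a window of size 8 links each token to the 7 tokens that follow it and that an edge weight counts co-occurrences; `build_graph_of_words` is a hypothetical helper name.

```python
from collections import Counter


def build_graph_of_words(tokens, window_size):
    """Count co-occurrences of distinct terms within a sliding window.

    Assumption: each token is linked to the (window_size - 1) tokens after it.
    Returns a Counter mapping frozenset({term_a, term_b}) to an edge weight.
    """
    edges = Counter()
    for i, source in enumerate(tokens):
        for target in tokens[i + 1 : i + window_size]:
            if source != target:  # skip self-loops
                edges[frozenset((source, target))] += 1
    return edges


text = """
mathemat aspect computer-aid  share trade  problem
statist analysi share price probabilist characterist price
seri method mathemat model  price   seri probabilist
characterist
"""
tokens = text.split()  # whitespace tokenization collapses the repeated spaces
edges = build_graph_of_words(tokens, window_size=8)
nodes = {term for edge in edges for term in edge}
print(len(tokens), "tokens,", len(nodes), "distinct terms,", len(edges), "edges")
```

The example text yields 21 tokens and 14 distinct terms, which become the nodes of the graph.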

## Batch keyword extraction based on k-core

### Main Core

In [3]:
extractor_kw = KcoreKeywordExtractor(directed=False, weighted=True, window_size=8)

In [4]:
extractor_kw.extract(preprocessed_text)

[('mathemat', 11),
 ('method', 11),
 ('model', 11),
 ('probabilist', 11),
 ('price', 11),
 ('characterist', 11),
 ('seri', 11)]
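
The second element of each pair appears to be the core number of the term, and the main core is the set of terms sharing the highest core number. The sketch below shows one common way to compute weighted core numbers: repeatedly remove the node with the smallest weighted degree and record the running maximum of those degrees. It is a minimal illustration on a toy graph, not gowpy's implementation; `weighted_core_numbers` and the toy adjacency are hypothetical.

```python
def weighted_core_numbers(adjacency):
    """Generalized k-core decomposition using weighted degrees.

    adjacency: {node: {neighbor: weight}} for an undirected graph.
    Returns {node: core number}.
    """
    degree = {node: sum(nbrs.values()) for node, nbrs in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    current = 0
    while remaining:
        # Peel off the node with the smallest remaining weighted degree.
        node = min(remaining, key=lambda n: degree[n])
        current = max(current, degree[node])
        core[node] = current
        remaining.remove(node)
        for nbr, weight in adjacency[node].items():
            if nbr in remaining:
                degree[nbr] -= weight
    return core


# Toy graph: a triangle (a, b, c) with a pendant node d attached to a.
toy = {
    "a": {"b": 1, "c": 1, "d": 1},
    "b": {"a": 1, "c": 1},
    "c": {"a": 1, "b": 1},
    "d": {"a": 1},
}
core = weighted_core_numbers(toy)
top = max(core.values())
main_core = sorted(node for node, k in core.items() if k == top)
print(core, main_core)
```

On the toy graph the triangle forms the main core with core number 2, while the pendant node has core number 1.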

### Density

In [5]:
extractor_kw = KcoreKeywordExtractor(directed=False, weighted=True, window_size=8,
                                     selection_method='density')

In [6]:
extractor_kw.extract(preprocessed_text)

[('mathemat', 11),
 ('price', 11),
 ('probabilist', 11),
 ('characterist', 11),
 ('seri', 11),
 ('method', 11),
 ('model', 11),
 ('share', 10)]

### Inflexion

In [7]:
extractor_kw = KcoreKeywordExtractor(directed=False, weighted=True, window_size=8,
                                     selection_method='inflexion')

In [8]:
extractor_kw.extract(preprocessed_text)

[('mathemat', 11),
 ('price', 11),
 ('probabilist', 11),
 ('characterist', 11),
 ('seri', 11),
 ('method', 11),
 ('model', 11),
 ('share', 10),
 ('trade', 9),
 ('problem', 9),
 ('statist', 9),
 ('analysi', 9)]

## Word-level keyword extraction

### CoreRank + elbow

In [9]:
extractor_kw_cr = CoreRankKeywordExtractor(directed=False, weighted=True, window_size=8)

In [10]:
extractor_kw_cr.extract(preprocessed_text)

[('mathemat', 128),
 ('price', 120),
 ('analysi', 119),
 ('share', 118),
 ('probabilist', 112),
 ('characterist', 112),
 ('statist', 108),
 ('trade', 97),
 ('problem', 97),
 ('seri', 94)]
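
In the paper, CoreRank scores each node by summing the core numbers of its neighbors, which gives a finer, word-level ranking than the core numbers themselves. The sketch below illustrates only that scoring step on a toy graph with hand-written core numbers; the elbow selection is not shown, and `core_rank` is a hypothetical helper, not gowpy's API.

```python
def core_rank(adjacency, core_numbers):
    """Score each node by the sum of its neighbors' core numbers."""
    return {
        node: sum(core_numbers[nbr] for nbr in neighbors)
        for node, neighbors in adjacency.items()
    }


# Toy graph: a triangle (a, b, c) with a pendant node d attached to a.
toy_adjacency = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b"],
    "d": ["a"],
}
toy_core_numbers = {"a": 2, "b": 2, "c": 2, "d": 1}  # assumed given

scores = core_rank(toy_adjacency, toy_core_numbers)
ranking = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
print(ranking)
```

Nodes a, b, and c share the same core number in the toy graph, yet a ranks first because it has one more neighbor.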

### CoreRank + top 33%

In [11]:
extractor_kw_cr = CoreRankKeywordExtractor(directed=False, weighted=True, window_size=8, n=0.33)

In [12]:
extractor_kw_cr.extract(preprocessed_text)

[('mathemat', 128),
 ('price', 120),
 ('analysi', 119),
 ('share', 118),
 ('probabilist', 112)]
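
The example text contains 14 distinct terms, and 33% of 14 is 4.62, so five keywords are consistent with a cut-off rounded to the nearest integer. The sketch below makes that arithmetic explicit. It assumes the fraction applies to the number of distinct terms and is rounded; how gowpy actually interprets `n` is not confirmed here, and `top_fraction` is a hypothetical helper.

```python
def top_fraction(scored_terms, fraction, vocabulary_size):
    """Keep the best-scoring terms, up to round(fraction * vocabulary_size)."""
    keep = round(fraction * vocabulary_size)
    ranked = sorted(scored_terms, key=lambda pair: pair[1], reverse=True)
    return ranked[:keep]


# Scores copied from the CoreRank + elbow output above.
scored = [
    ("mathemat", 128), ("price", 120), ("analysi", 119), ("share", 118),
    ("probabilist", 112), ("characterist", 112), ("statist", 108),
    ("trade", 97), ("problem", 97), ("seri", 94),
]
selected = top_fraction(scored, fraction=0.33, vocabulary_size=14)
print(selected)
```

Python's `sorted` is stable, so tied scores such as `probabilist` and `characterist` keep their input order, and the cut falls between them.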