# Background key phrase matching

This example shows how to match background key phrases to the key phrases of a text
using the [EmbedRank implementation](https://github.com/swisscom/ai-research-keyphrase-extraction).

EmbedRank embeds both the document and background phrases into the same embedding space.
Current background phrases:

**"hill", "beach", "mountain", "valley", "city"**

A suitable tag is selected using [Maximal Marginal Relevance](https://medium.com/tech-that-works/maximal-marginal-relevance-to-rerank-results-in-unsupervised-keyphrase-extraction-22d95015c7c5#:~:text=Maximal%20Marginal%20Relevance%20a.k.a.%20MMR,already%20ranked%20documents%2Fphrases%20etc.) (MMR).
The cosine similarity between each background tag and the document
models informativeness, and the cosine similarity between
the tags themselves models diversity.
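The trade-off described above can be illustrated with a minimal sketch of MMR selection. This is not the code used by the EmbedRank implementation; the function names `cosine_similarity` and `mmr_rank`, the trade-off parameter `lam`, and the toy 2-D vectors are all assumptions made for illustration.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def mmr_rank(doc_vec, phrase_vecs, top_n, lam=0.5):
    """Return phrase indices in MMR selection order.

    lam (hypothetical parameter name) weights informativeness (similarity
    to the document) against diversity (dissimilarity to phrases that
    were already selected).
    """
    relevance = [cosine_similarity(v, doc_vec) for v in phrase_vecs]
    selected = []
    remaining = list(range(len(phrase_vecs)))

    while remaining and len(selected) < top_n:

        def score(i):
            # Redundancy: highest similarity to any phrase picked so far
            redundancy = max(
                (cosine_similarity(phrase_vecs[i], phrase_vecs[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance[i] - (1.0 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)

    return selected


# Toy embeddings, made up for illustration only
doc = np.array([1.0, 0.0])
phrases = np.array([
    [1.0, 0.10],   # 0: very close to the document
    [1.0, 0.12],   # 1: almost a duplicate of phrase 0
    [0.6, -0.80],  # 2: less relevant, but points in a different direction
])

order = mmr_rank(doc, phrases, top_n=3, lam=0.5)
relevance_only = mmr_rank(doc, phrases, top_n=3, lam=1.0)
print(order, relevance_only)
```

With `lam=0.5` the near-duplicate phrase 1 is pushed behind the more distinct phrase 2, while `lam=1.0` ignores diversity and ranks purely by similarity to the document.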

[Example text](text.txt)

### Libraries

In [1]:
import sys
import pandas as pd

sys.path.append("..")

from swisscom import launch

### Reading text

In [2]:
with open("text.txt") as file:  # the context manager closes the file automatically
    raw_text = file.read()

### Creating embedding distributor and part-of-speech tagger

In [3]:
embedding_distributor = launch.load_local_embedding_distributor()
pos_tagger = launch.load_local_corenlp_pos_tagger()

### Matching key phrases

In [4]:
kp = launch.extract_keyphrases(embedding_distributor, pos_tagger, raw_text, 5, 'en')  # extract 5 keyphrases

phrases, relevances, aliases = kp

In [15]:
data = { 'Phrase': phrases,
         'Relevance': relevances,
         'Aliases': aliases
         }

df = pd.DataFrame(data, columns=['Phrase', 'Relevance', 'Aliases'])
df.style.hide(axis="index")  # on pandas < 1.4, use df.style.hide_index() instead


| Phrase | Relevance | Aliases |
|---|---|---|
| mountain | 1.0 | [] |
| city | 0.568412 | [] |
| valley | 0.819151 | [] |
| hill | 0.707898 | [] |
| beach | 0.371686 | [] |
