![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/legal-nlp/14.0.Legal_ChunkKeyPhraseExtraction.ipynb.ipynb)

#🎬 Installation

In [None]:
! pip install -q johnsnowlabs

##🔗 Automatic Installation
Using my.johnsnowlabs.com SSO

In [None]:
from johnsnowlabs import nlp, finance, legal

nlp.install(force_browser=True)

##🔗 Manual downloading
If you are not registered at my.johnsnowlabs.com, you received a license via email, or you are using Safari, you may need to upload the license manually.

- Go to my.johnsnowlabs.com
- Download your license
- Upload it using the following command

In [None]:
from google.colab import files
print('Please Upload your John Snow Labs License using the button below')
license_keys = files.upload()

- Install it

In [None]:
nlp.install()

#📌 Starting

In [None]:
spark = nlp.start()

⏳ Load sample text

In [None]:
text = """
INTELLECTUAL PROPERTY AGREEMENT

This INTELLECTUAL PROPERTY AGREEMENT (this "Agreement"), dated as of December 31, 2018 (the "Effective Date") is entered into by and between Armstrong Flooring, Inc., a Delaware corporation ("Seller") and AFI Licensing LLC, a Delaware limited liability company ("Licensing" and together with Seller, "Arizona") and AHF Holding, Inc. (formerly known as Tarzan HoldCo, Inc.), a Delaware corporation ("Buyer") and Armstrong Hardwood Flooring Company, a Tennessee corporation (the "Company" and together with Buyer the "Buyer Entities") (each of Arizona on the one hand and the Buyer Entities on the other hand, a "Party" and collectively, the "Parties").
"""

In [None]:
empty_data = spark.createDataFrame([[""]]).toDF("text")
textDF = spark.createDataFrame([[text]]).toDF("text")

## 🔎 **Chunk Key Phrase Extraction**


📜Explanation:

Chunk Key Phrase Extraction is a technique used in natural language processing (NLP) to identify and extract key phrases or important chunks of text from a given document or text corpus. Key phrases are typically defined as meaningful and informative phrases that capture the essence of the content.

The process of Chunk Key Phrase Extraction involves several steps:

- **Tokenization:** The input text is divided into smaller units called tokens, which can be words, phrases, or even characters. Tokenization helps in breaking down the text into meaningful components that can be further analyzed.

- **Part-of-Speech (POS) Tagging:** Each token is assigned a part-of-speech tag, which indicates the grammatical category or role of the word in the sentence (e.g., noun, verb, adjective). POS tagging helps in understanding the syntactic structure of the text.

- **Chunking:** Chunking groups tokens into larger units based on specific patterns or rules. The resulting chunks are typically noun phrases or verb phrases that carry the important information in a sentence.

- **Key Phrase Extraction:** From the extracted chunks, the algorithm selects and ranks key phrases by their importance or relevance to the overall content. Ranking can be frequency-based or rely on statistical models that take the context of each phrase into account. In this notebook, each selected phrase is reported with its cosine similarity to the document and a Maximal Marginal Relevance (MMR) score.

Chunk Key Phrase Extraction is often used in applications such as information retrieval, document summarization, sentiment analysis, and text classification. It helps in identifying the most significant and informative phrases in a text, enabling better understanding and analysis of the content.
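The ranking step can be illustrated with Maximal Marginal Relevance (MMR), the score reported in the `MMRScore` metadata of the outputs in this notebook. The sketch below is a plain-Python illustration, not the library's implementation: the function names, the toy 2-dimensional vectors, and the exact way `divergence` trades relevance against redundancy are assumptions. One observation supports the general form: in the outputs, the top-ranked phrase has an `MMRScore` of about 0.6 times its `DocumentSimilarity`, which matches `(1 - divergence) * similarity` for `divergence = 0.4` when nothing has been selected yet.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr_select(doc_vec, candidates, top_n=2, divergence=0.4):
    """Greedy MMR selection (assumed form, for illustration only).

    candidates maps phrase -> embedding vector.
    Returns a list of (phrase, document_similarity, mmr_score).
    """
    doc_sim = {p: cosine(v, doc_vec) for p, v in candidates.items()}
    selected = []
    remaining = set(candidates)
    while remaining and len(selected) < top_n:
        best, best_score = None, -math.inf
        for phrase in sorted(remaining):  # sorted for deterministic ties
            # Redundancy: highest similarity to anything already selected.
            redundancy = max(
                (cosine(candidates[phrase], candidates[s]) for s, _, _ in selected),
                default=0.0,
            )
            score = (1 - divergence) * doc_sim[phrase] - divergence * redundancy
            if score > best_score:
                best, best_score = phrase, score
        selected.append((best, doc_sim[best], best_score))
        remaining.remove(best)
    return selected


# Toy, hand-made vectors (hypothetical; real embeddings have hundreds of dimensions).
doc = [1.0, 0.0]
cands = {
    "Licensing LLC": [1.0, 0.0],
    "AFI Licensing LLC": [0.9, 0.1],   # nearly the same direction as the first
    "Buyer Entities": [0.6, 0.8],      # less relevant, but different
}

for phrase, sim, score in mmr_select(doc, cands, top_n=2, divergence=0.4):
    print(phrase, round(sim, 3), round(score, 3))
```

With `divergence=0.4` the near-duplicate "AFI Licensing LLC" is still picked second; raising the value to 0.7 makes the less similar "Buyer Entities" win the second slot, which is the diversity effect MMR is designed to produce.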

In [None]:
documenter = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentencer = nlp.SentenceDetector() \
    .setInputCols(["document"]) \
    .setOutputCol("sentences")

tokenizer = nlp.Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("tokens") \
    .setSplitChars([r'\[', r'\]'])

stop_words_cleaner = nlp.StopWordsCleaner.pretrained()\
    .setInputCols("tokens")\
    .setOutputCol("clean_tokens")\
    .setCaseSensitive(False)

ngram_generator = nlp.NGramGenerator()\
    .setInputCols(["clean_tokens"])\
    .setOutputCol("ngrams")\
    .setN(3)

ngram_key_phrase_extractor = legal.ChunkKeyPhraseExtraction.pretrained()\
    .setTopN(10) \
    .setDivergence(0.4)\
    .setInputCols(["sentences", "ngrams"])\
    .setOutputCol("ngram_key_phrases")

ngram_pipeline = nlp.Pipeline(stages=[
    documenter, 
    sentencer, 
    tokenizer, 
    stop_words_cleaner,
    ngram_generator,
    ngram_key_phrase_extractor
])

In [None]:
ngram_results = ngram_pipeline.fit(empty_data).transform(textDF)

**Let's show the N-gram results.**

In [None]:
ngram_results.selectExpr("explode(ngrams) AS key_phrase_candidate").show(30,truncate=False)

+---------------------------------------------------------------------------------+
|key_phrase_candidate                                                             |
+---------------------------------------------------------------------------------+
|{chunk, 1, 31, INTELLECTUAL PROPERTY AGREEMENT, {sentence -> 0, chunk -> 0}, []} |
|{chunk, 14, 50, PROPERTY AGREEMENT INTELLECTUAL, {sentence -> 0, chunk -> 1}, []}|
|{chunk, 23, 59, AGREEMENT INTELLECTUAL PROPERTY, {sentence -> 0, chunk -> 2}, []}|
|{chunk, 39, 69, INTELLECTUAL PROPERTY AGREEMENT, {sentence -> 0, chunk -> 3}, []}|
|{chunk, 52, 71, PROPERTY AGREEMENT (, {sentence -> 0, chunk -> 4}, []}           |
|{chunk, 61, 77, AGREEMENT ( ", {sentence -> 0, chunk -> 5}, []}                  |
|{chunk, 71, 86, ( " Agreement, {sentence -> 0, chunk -> 6}, []}                  |
|{chunk, 77, 89, " Agreement "),, {sentence -> 0, chunk -> 7}, []}                |
|{chunk, 78, 95, Agreement "), dated, {sentence -> 0, chunk -> 8}, []}            |

**Check the key phrases from N-Gram results.**

In [None]:
ngram_results.selectExpr("explode(ngram_key_phrases) AS ngram_key_phrases").show(truncate=170)

+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                                                                                                         ngram_key_phrases|
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|{chunk, 253, 267, LLC Delaware, {sentence -> 0, chunk -> 32, DocumentSimilarity -> 0.67082428827937, MMRScore -> 0.4024945889613194}, [-0.054024786, 0.8892173, -0.0607...|
|{chunk, 243, 256, Licensing LLC, {sentence -> 0, chunk -> 31, DocumentSimilarity -> 0.623435274176818, MMRScore -> 0.07391284062011122}, [0.3169803, 1.18685, -0.122759...|
|{chunk, 297, 331, Licensing " Seller, {sentence -> 0, chunk -> 39, DocumentSimilarity -> 0.5620123028023943, MMRScore -> 0.10101117303

**Show the selected key phrases, their cosine similarity to the document, their Maximal Marginal Relevance score, and the sentence in which each key phrase was found.**

In [None]:
import pyspark.sql.functions as F

ngram_results.select(F.explode(F.arrays_zip(ngram_results.ngram_key_phrases.result,
                                            ngram_results.ngram_key_phrases.metadata)).alias("cols"))\
              .select(F.expr("cols['0']").alias("key_phrase"),
                      F.expr("cols['1']['DocumentSimilarity']").alias("DocumentSimilarity"),
                      F.expr("cols['1']['MMRScore']").alias("MMRScore"),
                      F.expr("cols['1']['sentence']").alias("sentence")).show(truncate=False)

+-------------------------------+-------------------+--------------------+--------+
|key_phrase                     |DocumentSimilarity |MMRScore            |sentence|
+-------------------------------+-------------------+--------------------+--------+
|LLC Delaware                   |0.67082428827937   |0.4024945889613194  |0       |
|Licensing LLC                  |0.623435274176818  |0.07391284062011122 |0       |
|Licensing " Seller             |0.5620123028023943 |0.10101117303174226 |0       |
|corporation " Seller           |0.5601213486826419 |0.03941356508847493 |0       |
|Arizona hand Buyer             |0.5298374343984847 |0.11960417979726506 |0       |
|INTELLECTUAL PROPERTY AGREEMENT|0.5139882459942415 |0.11717650664539808 |0       |
|Company Tennessee              |0.4410448906986666 |0.021020910594832964|0       |
|limited liability company      |0.4326145252236725 |0.03342918943901377 |0       |
|Tarzan HoldCo                  |0.36380691282041117|0.05453937275093976 |0       |

# With NER Model

Now we will show how to get key phrases from NER chunks by feeding `ChunkKeyPhraseExtraction` with the output of `NerConverterInternal`.
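To make the converter step concrete, here is a conceptual sketch of how token-level NER tags can be grouped into chunks. It assumes the tagger emits IOB-style tags (`B-`, `I-`, `O`); the helper `iob_to_chunks` and the example tags are hypothetical, and the real annotator also tracks character offsets and confidence scores that are left out here.

```python
def iob_to_chunks(tokens, tags):
    """Group tokens with IOB tags into (text, label) chunks (illustrative only)."""
    chunks, current, label = [], [], None
    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != label
        )
        if starts_new:
            # Close the chunk in progress, then open a new one.
            if current:
                chunks.append((" ".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-"):
            # Continue the current chunk.
            current.append(token)
        else:
            # "O" tag: outside any entity, so close the chunk in progress.
            if current:
                chunks.append((" ".join(current), label))
            current, label = [], None
    if current:
        chunks.append((" ".join(current), label))
    return chunks


# Hypothetical tags for a fragment of the sample agreement.
tokens = ["AFI", "Licensing", "LLC", "(", "Licensing", ")"]
tags = ["B-PARTY", "I-PARTY", "I-PARTY", "O", "B-ALIAS", "O"]

print(iob_to_chunks(tokens, tags))
```

Each resulting chunk carries its entity label, which is why the key phrases extracted from NER chunks can be shown together with a `label` column.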

In [None]:
documenter = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentencer = nlp.SentenceDetector() \
    .setInputCols(["document"]) \
    .setOutputCol("sentences")

tokenizer = nlp.Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("tokens") \
    .setSplitChars([r'\[', r'\]'])

embeddings = nlp.RoBertaEmbeddings.pretrained("roberta_embeddings_legal_roberta_base", "en") \
    .setInputCols("sentences", "tokens") \
    .setOutputCol("embeddings")\
    .setMaxSentenceLength(512)\
    .setCaseSensitive(True)

ner_tagger = legal.NerModel.pretrained('legner_contract_doc_parties_lg', 'en', 'legal/models')\
    .setInputCols(["sentences", "tokens", "embeddings"]) \
    .setOutputCol("ner_tags")

ner_converter = legal.NerConverterInternal()\
    .setInputCols("sentences", "tokens", "ner_tags")\
    .setOutputCol("ner_chunks")

ner_key_phrase_extractor = legal.ChunkKeyPhraseExtraction.pretrained()\
    .setTopN(10) \
    .setDivergence(0.4)\
    .setInputCols(["sentences", "ner_chunks"])\
    .setOutputCol("ner_key_phrases")

ner_pipeline = nlp.Pipeline(stages=[
    documenter, 
    sentencer, 
    tokenizer, 
    embeddings, 
    ner_tagger, 
    ner_converter, 
    ner_key_phrase_extractor
])

In [None]:
ner_results = ner_pipeline.fit(empty_data).transform(textDF)

In [None]:
# ner_chunk results
ner_results.select(F.explode(F.arrays_zip(ner_results.ner_chunks.result,
                                          ner_results.ner_chunks.metadata)).alias("cols"))\
           .select(F.expr("cols['0']").alias("ner_chunk"),
                   F.expr("cols['1']['entity']").alias("label")).show(50, truncate=False)

+-----------------------------------+-----------+
|ner_chunk                          |label      |
+-----------------------------------+-----------+
|INTELLECTUAL PROPERTY AGREEMENT    |DOC        |
|INTELLECTUAL PROPERTY AGREEMENT    |DOC        |
|December 31, 2018                  |EFFDATE    |
|Armstrong Flooring, Inc            |PARTY      |
|Seller                             |ALIAS      |
|AFI Licensing LLC                  |PARTY      |
|Licensing                          |ALIAS      |
|Seller                             |PARTY      |
|Arizona                            |ALIAS      |
|AHF Holding, Inc                   |PARTY      |
|Tarzan HoldCo, Inc                 |FORMER_NAME|
|Buyer                              |ALIAS      |
|Armstrong Hardwood Flooring Company|PARTY      |
|Company                            |ALIAS      |
|Buyer                              |PARTY      |
|Buyer Entities                     |ALIAS      |
|Arizona                            |PARTY      |


In [None]:
ner_results.select(F.explode(F.arrays_zip(ner_results.ner_key_phrases.result, 
                                          ner_results.ner_key_phrases.metadata)).alias("cols"))\
           .select(F.expr("cols['0']").alias("key_phrase"),
                   F.expr("cols['1']['entity']").alias("label"),
                   F.expr("cols['1']['DocumentSimilarity']").alias("DocumentSimilarity"),
                   F.expr("cols['1']['MMRScore']").alias("MMRScore"),
                   F.expr("cols['1']['sentence']").alias("sentence")).show(truncate=False)

+-----------------------------------+-----------+-------------------+---------------------+--------+
|key_phrase                         |label      |DocumentSimilarity |MMRScore             |sentence|
+-----------------------------------+-----------+-------------------+---------------------+--------+
|Buyer Entities                     |ALIAS      |0.5784693799173213 |0.34708164174217754  |1       |
|AFI Licensing LLC                  |PARTY      |0.565162563524993  |0.09504163803345092  |0       |
|Tarzan HoldCo, Inc                 |FORMER_NAME|0.5367663102209614 |0.12806869955230965  |1       |
|AHF Holding, Inc                   |PARTY      |0.5268001425682903 |0.0650575684927151   |0       |
|INTELLECTUAL PROPERTY AGREEMENT    |DOC        |0.5139882459942415 |0.11127398683991158  |0       |
|Armstrong Flooring, Inc            |PARTY      |0.47952994842282104|-0.01620789491482061 |0       |
|Armstrong Hardwood Flooring Company|PARTY      |0.43693383662177004|0.07879576837459534  |

# With NGramGenerator and NER Model

`NGramGenerator` and an NER (Named Entity Recognition) model can be used together with Chunk Key Phrase Extraction to broaden the set of candidate phrases it selects from.

- NGramGenerator: An n-gram is a contiguous sequence of n items from a given text, where an item can be a word, a character, or any other linguistic unit. `NGramGenerator` is a component that generates n-grams from the input tokens. N-grams of different lengths (unigrams, bigrams, trigrams, etc.) capture both single words and multi-word expressions, either of which can be a valuable key phrase.

For example, if the input text is "I love to play soccer", the unigrams are "I", "love", "to", "play", and "soccer", and the bigrams are "I love", "love to", "to play", and "play soccer". Longer n-grams carry more context, which helps in extracting meaningful key phrases.

- NER model (Named Entity Recognition): Named Entity Recognition is a subtask of NLP that aims to identify and classify named entities, such as person names, locations, organizations, and dates, in text. Here, the chunks recognized by the NER model are passed to Chunk Key Phrase Extraction as key phrase candidates.

Using NER chunks as candidates focuses the extraction on key phrases that represent named entities, which are typically highly informative and relevant in many applications. For instance, in a news article, named entities like "Barack Obama", "New York City", or "Apple Inc." are important key phrases that convey crucial information.

Combining `NGramGenerator` and an NER model with Chunk Key Phrase Extraction can yield a more comprehensive set of key phrases. The candidates then include single words, multi-word expressions, and named entities, which together give a fuller picture of the content.
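The n-gram examples above can be reproduced with a few lines of plain Python. This is only an illustration of the idea, not how `NGramGenerator` is implemented; the helper `ngrams` and the one-word stop list are hypothetical. The second part shows why candidates such as "PROPERTY AGREEMENT INTELLECTUAL" appear in this notebook's outputs: stop words are removed before the n-grams are built, so an n-gram can join words that were not adjacent in the original text.

```python
def ngrams(tokens, n):
    """Return all contiguous n-token sequences, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Part 1: the soccer sentence, split on whitespace for simplicity.
tokens = "I love to play soccer".split()
unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
print(bigrams)
print(trigrams)

# Part 2: stop-word removal before n-gram generation.
STOP_WORDS = {"this"}  # hypothetical one-word stop list
title = "INTELLECTUAL PROPERTY AGREEMENT This INTELLECTUAL PROPERTY AGREEMENT".split()
clean = [t for t in title if t.lower() not in STOP_WORDS]
trigrams_clean = ngrams(clean, 3)
print(trigrams_clean)
```

Because "This" is dropped, the heading and the first sentence run together, and trigrams are formed across that boundary.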

In [None]:
documenter = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentencer = nlp.SentenceDetector() \
    .setInputCols(["document"]) \
    .setOutputCol("sentences")

tokenizer = nlp.Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("tokens") \
    .setSplitChars([r'\[', r'\]'])

stop_words_cleaner = nlp.StopWordsCleaner.pretrained()\
    .setInputCols("tokens")\
    .setOutputCol("clean_tokens")\
    .setCaseSensitive(False)

ngram_generator = nlp.NGramGenerator()\
    .setInputCols(["clean_tokens"])\
    .setOutputCol("ngrams")\
    .setN(3)
        
embeddings = nlp.RoBertaEmbeddings.pretrained("roberta_embeddings_legal_roberta_base", "en") \
    .setInputCols("sentences", "tokens") \
    .setOutputCol("embeddings")\
    .setMaxSentenceLength(512)\
    .setCaseSensitive(True)

ner_tagger = legal.NerModel.pretrained('legner_contract_doc_parties_lg', 'en', 'legal/models')\
    .setInputCols(["sentences", "tokens", "embeddings"]) \
    .setOutputCol("ner_tags")

ner_converter = legal.NerConverterInternal()\
    .setInputCols("sentences", "tokens", "ner_tags")\
    .setOutputCol("ner_chunks")

chunk_merger = legal.ChunkMergeApproach()\
    .setInputCols("ngrams", "ner_chunks")\
    .setOutputCol("merged_chunks")\
    .setMergeOverlapping(False)

ngram_ner_key_phrase_extractor = legal.ChunkKeyPhraseExtraction.pretrained()\
    .setTopN(10) \
    .setDivergence(0.4)\
    .setInputCols(["sentences", "merged_chunks"])\
    .setOutputCol("key_phrases")

ngram_ner_pipeline = nlp.Pipeline(stages=[
    documenter, 
    sentencer, 
    tokenizer, 
    stop_words_cleaner,
    ngram_generator,
    embeddings, 
    ner_tagger, 
    ner_converter, 
    chunk_merger,
    ngram_ner_key_phrase_extractor
])

In [None]:
ngram_ner_results = ngram_ner_pipeline.fit(empty_data).transform(textDF)

**Show the merged key phrase candidates. The `UNK` ones come from `NGramGenerator` and the others from the NER model (`legner_contract_doc_parties_lg`).**

In [None]:
ngram_ner_results.selectExpr("explode(merged_chunks) AS key_phrase_candidate").show(30,truncate=False)

+---------------------------------------------------------------------------------------------------------------------------------------------------+
|key_phrase_candidate                                                                                                                               |
+---------------------------------------------------------------------------------------------------------------------------------------------------+
|{chunk, 1, 31, INTELLECTUAL PROPERTY AGREEMENT, {entity -> UNK, chunk -> 0, sentence -> 0}, []}                                                    |
|{chunk, 1, 31, INTELLECTUAL PROPERTY AGREEMENT, {chunk -> 1, confidence -> 0.83703333, ner_source -> ner_chunks, entity -> DOC, sentence -> 0}, []}|
|{chunk, 14, 50, PROPERTY AGREEMENT INTELLECTUAL, {entity -> UNK, chunk -> 2, sentence -> 0}, []}                                                   |
|{chunk, 23, 59, AGREEMENT INTELLECTUAL PROPERTY, {entity -> UNK, chunk -> 3, sentence -> 0}, []}   

In [None]:
# NER chunk results
ngram_ner_results.select(F.explode(F.arrays_zip(ngram_ner_results.merged_chunks.result,
                                                ngram_ner_results.merged_chunks.metadata)).alias("cols"))\
                 .select(F.expr("cols['0']").alias("key_phrase_candidate"),
                         F.expr("cols['1']['entity']").alias("label")).filter("label != 'UNK'").show(50, truncate=False)

+-----------------------------------+-----------+
|key_phrase_candidate               |label      |
+-----------------------------------+-----------+
|INTELLECTUAL PROPERTY AGREEMENT    |DOC        |
|INTELLECTUAL PROPERTY AGREEMENT    |DOC        |
|December 31, 2018                  |EFFDATE    |
|Armstrong Flooring, Inc            |PARTY      |
|Seller                             |ALIAS      |
|AFI Licensing LLC                  |PARTY      |
|Licensing                          |ALIAS      |
|Seller                             |PARTY      |
|Arizona                            |ALIAS      |
|AHF Holding, Inc                   |PARTY      |
|Tarzan HoldCo, Inc                 |FORMER_NAME|
|Buyer                              |ALIAS      |
|Armstrong Hardwood Flooring Company|PARTY      |
|Company                            |ALIAS      |
|Buyer                              |PARTY      |
|Buyer Entities                     |ALIAS      |
|Arizona                            |PARTY      |


In [None]:
# ngram results
ngram_ner_results.select(F.explode(F.arrays_zip(ngram_ner_results.merged_chunks.result,
                                                ngram_ner_results.merged_chunks.metadata)).alias("cols"))\
                 .select(F.expr("cols['0']").alias("key_phrase_candidate"),
                         F.expr("cols['1']['entity']").alias("label")).filter("label == 'UNK'").show(50, truncate=False)

+-------------------------------+-----+
|key_phrase_candidate           |label|
+-------------------------------+-----+
|INTELLECTUAL PROPERTY AGREEMENT|UNK  |
|PROPERTY AGREEMENT INTELLECTUAL|UNK  |
|AGREEMENT INTELLECTUAL PROPERTY|UNK  |
|INTELLECTUAL PROPERTY AGREEMENT|UNK  |
|PROPERTY AGREEMENT (           |UNK  |
|AGREEMENT ( "                  |UNK  |
|( " Agreement                  |UNK  |
|" Agreement "),                |UNK  |
|Agreement "), dated            |UNK  |
|"), dated December             |UNK  |
|dated December 31              |UNK  |
|December 31 ,                  |UNK  |
|31 , 2018                      |UNK  |
|, 2018 (                       |UNK  |
|2018 ( "                       |UNK  |
|( " Effective                  |UNK  |
|" Effective Date               |UNK  |
|Effective Date ")              |UNK  |
|Date ") entered                |UNK  |
|") entered Armstrong           |UNK  |
|entered Armstrong Flooring     |UNK  |
|Armstrong Flooring ,           |UNK  |


In [None]:
# merged (NER chunk + ngram) results
ngram_ner_results.select(F.explode(F.arrays_zip(ngram_ner_results.merged_chunks.result,
                                                ngram_ner_results.merged_chunks.metadata)).alias("cols"))\
                 .select(F.expr("cols['0']").alias("key_phrase_candidate"),
                         F.expr("cols['1']['entity']").alias("label")).show(50, truncate=False)

+-------------------------------+-------+
|key_phrase_candidate           |label  |
+-------------------------------+-------+
|INTELLECTUAL PROPERTY AGREEMENT|UNK    |
|INTELLECTUAL PROPERTY AGREEMENT|DOC    |
|PROPERTY AGREEMENT INTELLECTUAL|UNK    |
|AGREEMENT INTELLECTUAL PROPERTY|UNK    |
|INTELLECTUAL PROPERTY AGREEMENT|UNK    |
|INTELLECTUAL PROPERTY AGREEMENT|DOC    |
|PROPERTY AGREEMENT (           |UNK    |
|AGREEMENT ( "                  |UNK    |
|( " Agreement                  |UNK    |
|" Agreement "),                |UNK    |
|Agreement "), dated            |UNK    |
|"), dated December             |UNK    |
|dated December 31              |UNK    |
|December 31 ,                  |UNK    |
|December 31, 2018              |EFFDATE|
|31 , 2018                      |UNK    |
|, 2018 (                       |UNK    |
|2018 ( "                       |UNK    |
|( " Effective                  |UNK    |
|" Effective Date               |UNK    |
|Effective Date ")              |U

In [None]:
ngram_ner_results.selectExpr("explode(merged_chunks) AS key_phrase_candidate")\
                 .selectExpr("key_phrase_candidate.result AS key_phrase_candidate",
                             "IF(key_phrase_candidate.metadata.entity = 'UNK', 'ngram', 'NER') AS source",
                             "key_phrase_candidate.metadata.sentence")\
                 .show(50, truncate=False)

+-------------------------------+------+--------+
|key_phrase_candidate           |source|sentence|
+-------------------------------+------+--------+
|INTELLECTUAL PROPERTY AGREEMENT|ngram |0       |
|INTELLECTUAL PROPERTY AGREEMENT|NER   |0       |
|PROPERTY AGREEMENT INTELLECTUAL|ngram |0       |
|AGREEMENT INTELLECTUAL PROPERTY|ngram |0       |
|INTELLECTUAL PROPERTY AGREEMENT|ngram |0       |
|INTELLECTUAL PROPERTY AGREEMENT|NER   |0       |
|PROPERTY AGREEMENT (           |ngram |0       |
|AGREEMENT ( "                  |ngram |0       |
|( " Agreement                  |ngram |0       |
|" Agreement "),                |ngram |0       |
|Agreement "), dated            |ngram |0       |
|"), dated December             |ngram |0       |
|dated December 31              |ngram |0       |
|December 31 ,                  |ngram |0       |
|December 31, 2018              |NER   |0       |
|31 , 2018                      |ngram |0       |
|, 2018 (                       |ngram |0       |


In [None]:
ngram_ner_results.select(F.explode(F.arrays_zip(ngram_ner_results.key_phrases.result,
                                                ngram_ner_results.key_phrases.metadata)).alias("cols"))\
                 .select(F.expr("cols['0']").alias("key_phrase"),
                         F.expr("cols['1']['entity']").alias("label"),
                         F.expr("cols['1']['DocumentSimilarity']").alias("DocumentSimilarity"),
                         F.expr("cols['1']['MMRScore']").alias("MMRScore"),
                         F.expr("cols['1']['sentence']").alias("sentence")).show(truncate=False)

+-----------------------------------+-----------+-------------------+--------------------+--------+
|key_phrase                         |label      |DocumentSimilarity |MMRScore            |sentence|
+-----------------------------------+-----------+-------------------+--------------------+--------+
|LLC Delaware                       |UNK        |0.67082428827937   |0.4024945889613194  |0       |
|Licensing LLC                      |UNK        |0.623435274176818  |0.07391284062011122 |0       |
|Licensing " Seller                 |UNK        |0.5620123028023943 |0.10101117303174226 |0       |
|corporation " Seller               |UNK        |0.5601213486826419 |0.03941356508847493 |0       |
|Tarzan HoldCo, Inc                 |FORMER_NAME|0.5367663102209614 |0.14132001185836535 |1       |
|Arizona hand Buyer                 |UNK        |0.5298374343984847 |0.11960417979726506 |0       |
|AHF Holding, Inc                   |PARTY      |0.5268001425682903 |0.07314865925079808 |0       |
