

![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/CONTEXTUAL_WORD_MEANING.ipynb)



# **Infer word meaning from context**

Compare the meaning of a word across two sentences and resolve ambiguous pronouns.

## 1. Colab Setup

In [1]:
!wget http://setup.johnsnowlabs.com/colab.sh -O - | bash
# !bash colab.sh
# -p is for pyspark
# -s is for spark-nlp
# !bash colab.sh -p 3.1.1 -s 3.0.1
# by default they are set to the latest

openjdk version "11.0.10" 2021-01-19
OpenJDK Runtime Environment (build 11.0.10+9-Ubuntu-0ubuntu1.18.04)
OpenJDK 64-Bit Server VM (build 11.0.10+9-Ubuntu-0ubuntu1.18.04, mixed mode, sharing)
setup Colab for PySpark 3.1.1 and Spark NLP 3.0.0
  Building wheel for pyspark (setup.py) ... done


In [2]:
import pandas as pd
import numpy as np
import json
from pyspark.ml import Pipeline
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
from sparknlp.annotator import *
from sparknlp.base import *
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

## 2. Start Spark Session

In [3]:
spark = sparknlp.start()

## 3. Select the model to use

In [4]:
#MODEL_NAME = 't5_small'
MODEL_NAME = 't5_base'

### 3.1 Select the task

The `T5 Transformer` model can perform 18 different tasks (see [the T5 paper](https://arxiv.org/abs/1910.10683)). To infer word meaning from context, use one of the following tasks:

- `wic`: Given a pair of sentences and an ambiguous word, classify whether the word has the same meaning in both sentences.
- `wsc-dpr`: Given a sentence with an ambiguous pronoun, predict what the pronoun refers to.
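
The input layout for the two tasks can be sketched with a pair of small helpers. `make_wic_input` and `make_wsc_input` are hypothetical names, not part of Spark NLP. The field names (`pos:`, `sentence1:`, `sentence2:`, `word:`) and the asterisks around the pronoun are taken from the example texts in this notebook; whether the model needs exactly this layout is an assumption.

```python
import re


def make_wic_input(sentence1: str, sentence2: str, word: str, pos: str = "") -> str:
    """Lay out a word-in-context example using this notebook's field names."""
    lines = [
        f"pos: {pos}".rstrip(),  # left empty in the notebook's example
        f"sentence1: {sentence1}",
        f"sentence2: {sentence2}",
        f"word: {word}",
    ]
    return "\n".join(lines)


def make_wsc_input(sentence: str, pronoun: str) -> str:
    """Wrap the first whole-word occurrence of `pronoun` in asterisks."""
    pattern = re.compile(rf"\b{re.escape(pronoun)}\b")
    marked, count = pattern.subn(lambda m: f"*{m.group(0)}*", sentence, count=1)
    if count == 0:
        raise ValueError(f"pronoun {pronoun!r} not found in sentence")
    return marked


print(make_wic_input(
    "The expanded window will give us time to catch the thieves.",
    "You have a two-hour window of turning in your homework.",
    "window",
))
print(make_wsc_input("A large window opened into the yard, which made it pleasant.", "it"))
```

The match is case-sensitive and only the first occurrence is marked, so a sentence with several candidate pronouns needs the marker placed by hand.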

In [5]:
#TASK = 'wic'
TASK = 'wsc-dpr'

In [6]:
# Prefix passed to T5Transformer().setTask(<prefix>) for each task
task_prefix = {
                'wic': 'wic pos::', 
                'wsc-dpr': 'wsc:',
            }
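
T5 treats every task as text-to-text: a task prefix is placed in front of the input text and the model generates its answer as a string. The sketch below only illustrates that idea. `with_prefix` is a hypothetical helper, and joining prefix and text with a single space is an assumption, not a description of what `T5Transformer` does internally.

```python
# Same mapping as in the notebook cell above.
task_prefix = {
    "wic": "wic pos::",
    "wsc-dpr": "wsc:",
}


def with_prefix(task: str, text: str) -> str:
    """Return the conceptual model input: task prefix followed by the text."""
    if task not in task_prefix:
        supported = ", ".join(sorted(task_prefix))
        raise ValueError(f"unknown task {task!r}; expected one of: {supported}")
    # Assumption: prefix and text are separated by one space.
    return f"{task_prefix[task]} {text.strip()}"


print(with_prefix("wsc-dpr", "A large window opened into the yard, which made *it* pleasant."))
```

Failing early on an unknown task name gives a clearer message than the bare `KeyError` that `task_prefix[TASK]` would raise.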

## 4. Examples to try on the model

In [7]:
text_lists = {
            'wic':      ["""
                        pos:
                        sentence1: The expanded window will give us time to catch the thieves.
                        sentence2: You have a two-hour window of turning in your homework.
                        word: window
                        """],
            'wsc-dpr':  ["""The stable was very roomy, with four good stalls; a large swinging window opened into the yard , which made *it* pleasant and airy."""]
            }

## 5. Define the Spark NLP pipeline

In [8]:
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("documents")

t5 = T5Transformer() \
    .pretrained(MODEL_NAME) \
    .setTask(task_prefix[TASK])\
    .setMaxOutputLength(200)\
    .setInputCols(["documents"]) \
    .setOutputCol("T5")

pipeline = Pipeline(stages=[document_assembler, t5])

t5_base download started this may take some time.
Approximate size to download 446 MB
[OK!]


## 6. Run the pipeline

In [9]:
# Fit on empty data frame (model is pretrained)
empty_df = spark.createDataFrame([['']]).toDF('text')
pipeline_model = pipeline.fit(empty_df)

# Load the example texts into a Spark DataFrame
text_df = spark.createDataFrame(pd.DataFrame({'text': text_lists[TASK]}))

# Predict with the Pipeline model
result = pipeline_model.transform(text_df)

# Create Light Pipeline
lmodel = LightPipeline(pipeline_model)

# Predict with the Light Pipeline model
res = lmodel.fullAnnotate(text_lists[TASK])

## 7. Visualize the results

Using Light Pipeline:

In [10]:
for r in res:
    print(f"{r['documents'][0].result} => {r['T5'][0].result}")

The stable was very roomy, with four good stalls; a large swinging window opened into the yard , which made *it* pleasant and airy. => True


Using pipeline model:

In [11]:
result.select('text', 'T5.result').show(truncate=150)

+-----------------------------------------------------------------------------------------------------------------------------------+------+
|                                                                                                                               text|result|
+-----------------------------------------------------------------------------------------------------------------------------------+------+
|The stable was very roomy, with four good stalls; a large swinging window opened into the yard , which made *it* pleasant and airy.|[True]|
+-----------------------------------------------------------------------------------------------------------------------------------+------+
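
The model returns its answer as text, so a prediction such as the one above arrives as the string `True` rather than a Python boolean. A minimal sketch of converting such strings follows. `parse_label` is a hypothetical helper, and the `rows` list is stand-in data shaped like the `text` and `result` columns shown above, not real model output.

```python
from typing import Optional


def parse_label(prediction: str) -> Optional[bool]:
    """Map 'True'/'False' text to a bool; return None for any other answer."""
    normalized = prediction.strip().lower()
    if normalized == "true":
        return True
    if normalized == "false":
        return False
    return None  # e.g. a free-text answer instead of a label


# Stand-in rows: (text, list of predicted strings), one list entry per document.
rows = [
    ("first example", ["True"]),
    ("second example", ["False"]),
    ("third example", ["the stable"]),
]

for text, predictions in rows:
    labels = [parse_label(p) for p in predictions]
    print(text, labels)
```

Returning `None` instead of raising keeps free-text answers visible, so they can be inspected separately from the boolean labels.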

