

![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/CLASSIFICATION_TR_NEWS.ipynb)




# **Classify Question Pairs**

## 1. Colab Setup

In [None]:
# Install PySpark and Spark NLP
! pip install -q pyspark==3.3.0 spark-nlp==4.2.1

In [None]:
import numpy as np
import json
from pyspark.ml import Pipeline
from pyspark.sql.types import StringType, IntegerType
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
from sparknlp.annotator import *
from sparknlp.base import *
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

## 2. Start Spark Session

In [None]:
spark = sparknlp.start()

## 3. Sample question pairs

In [None]:
q1_list = ["How is your studies going?",
           "If the Universe was born at the Big Bang, what existed before then?",
           "After Obama finishes his presidency, does he still receive Secret Service protection?",
           "Am looking for motivational books to read?",
           "Antonio has a deep prejudice against Shylock. Is Shylocks anger towards Antonio justified? Why or why not?"]

q2_list = ["How is your days going?",
           "What actually existed before the Big Bang?",
           "Does President Obama and his family have secret service protection for the rest of their lives?",
           "What motivational books one should read?",
           "Who is the hero of The Merchant of Venice?"]

## 4. Define Spark NLP pipeline

In [None]:
pipeline = PretrainedPipeline("classifierdl_electra_questionpair_pipeline", "en")

classifierdl_electra_questionpair_pipeline download started this may take some time.
Approx size to download 1.2 GB
[OK!]


## 5. Run the pipeline

In [None]:
# Classify a single question pair

res = pipeline.fullAnnotate("q1: How is your studies going q2: How is your days going?")

print(res[0]['class'][0].result)

almost_same
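
The pipeline input in this notebook is a single string of the form `q1: ... q2: ...`. A small helper can keep that format in one place. This is only a sketch: `format_question_pair` is a hypothetical name, not part of Spark NLP, and stripping surrounding whitespace is an assumption about what is convenient here.

```python
def format_question_pair(q1: str, q2: str) -> str:
    """Build the 'q1: ... q2: ...' input string used in this notebook.

    Hypothetical helper, not a Spark NLP function.
    """
    # Strip stray whitespace so the prefixes stay cleanly separated
    return f"q1: {q1.strip()} q2: {q2.strip()}"


text = format_question_pair("How is your studies going?", "How is your days going?")
print(text)
```

The resulting string can be passed to `pipeline.fullAnnotate(text)` in the same way as the literal string in the cell above.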


In [None]:
# Classify every (q1, q2) combination and store the labels in a dict.
# Keys join the two list indices, e.g. '01' is q1_list[0] paired with q2_list[1].

results = {}
for i, q1 in enumerate(q1_list):
    for j, q2 in enumerate(q2_list):
        result = pipeline.fullAnnotate(f"q1: {q1} q2: {q2}")
        results[f'{i}{j}'] = result[0]['class'][0].result
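
The loop above builds each key by concatenating two indices, which is unambiguous for five questions but collides once a list has ten or more items (`'111'` could mean `(1, 11)` or `(11, 1)`). One possible variant, sketched below, uses `(i, j)` tuples as keys. `classify_all_pairs` and its `classify` argument are hypothetical names introduced for illustration; the stub classifier stands in for the real pipeline call so the snippet runs without Spark.

```python
from itertools import product


def classify_all_pairs(q1_list, q2_list, classify):
    """Return {(i, j): label} for every combination of q1_list[i] and q2_list[j].

    `classify` is any callable that maps the 'q1: ... q2: ...' string to a label.
    """
    results = {}
    for (i, q1), (j, q2) in product(enumerate(q1_list), enumerate(q2_list)):
        results[(i, j)] = classify(f"q1: {q1} q2: {q2}")
    return results


# Stub classifier for demonstration only: it echoes its input instead of
# calling the model, so no real predictions are produced here.
demo = classify_all_pairs(["a", "b"], ["c", "d"], classify=lambda text: text)
print(demo[(0, 1)])
```

With the pipeline from this notebook, the callable could be `lambda text: pipeline.fullAnnotate(text)[0]['class'][0].result`, which mirrors the expression used in the cell above.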

## 6. Visualize results

In [None]:
print(results)

{'00': 'almost_same', '01': 'not_same', '02': 'not_same', '03': 'not_same', '04': 'not_same', '10': 'not_same', '11': 'almost_same', '12': 'not_same', '13': 'not_same', '14': 'not_same', '20': 'not_same', '21': 'not_same', '22': 'almost_same', '23': 'not_same', '24': 'not_same', '30': 'not_same', '31': 'not_same', '32': 'not_same', '33': 'almost_same', '34': 'not_same', '40': 'not_same', '41': 'not_same', '42': 'not_same', '43': 'not_same', '44': 'almost_same'}
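
A flat dictionary is hard to scan, so it may help to arrange the labels as a grid with one row per first question and one column per second question. The sketch below assumes the `f'{i}{j}'` key format from this notebook with single-digit indices; `results_to_grid` is a hypothetical helper, and the small `sample` dictionary is illustrative data in the same format, not model output.

```python
from collections import Counter


def results_to_grid(results, n_rows, n_cols):
    """Arrange labels keyed by f'{i}{j}' into a list of rows (single-digit indices only)."""
    return [[results[f"{i}{j}"] for j in range(n_cols)] for i in range(n_rows)]


# Illustrative values only, using the same key format and label names
sample = {"00": "almost_same", "01": "not_same",
          "10": "not_same", "11": "almost_same"}

grid = results_to_grid(sample, n_rows=2, n_cols=2)
for row in grid:
    # Left-align each label in a fixed-width column
    print("  ".join(f"{label:<12}" for label in row))

# Tally how often each label occurs
counts = Counter(sample.values())
print(dict(counts))
```

For the dictionary printed above, the call would be `results_to_grid(results, len(q1_list), len(q2_list))`.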
