![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)



[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/spark_nlp_utilities/NLU_utils_for_Spark_NLP.ipynb)
# NLU utilities for Spark NLP
This notebook showcases the utilities that NLU provides for working with Spark NLP pipelines.

## Install and Authorize

In [None]:
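%%capture
# Install first, then import: importing nlu before installation fails on a fresh runtime
! pip install nlu pyspark==3.1.1
import nlu

# Replace the placeholders with your own John Snow Labs credentials
SPARK_NLP_LICENSE = "YOUR SECRETS HERE"
AWS_ACCESS_KEY_ID = "YOUR SECRETS HERE"
AWS_SECRET_ACCESS_KEY = "YOUR SECRETS HERE"
JSL_SECRET = "YOUR SECRETS HERE"
nlu.auth(SPARK_NLP_LICENSE, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, JSL_SECRET)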

## nlu.viz(pipe,data) 

Visualize input data with an already configured Spark NLP pipeline  
for annotators of type NER, Assertion, Relation, Resolution, or Dependency,  
using [Spark NLP Display](https://nlp.johnsnowlabs.com/docs/en/display).  
The applicable viz type and the output columns to visualize are inferred automatically.

If a pipeline contains multiple models that could be visualized,  
the first vizzable annotator is used to create the viz.  
You can specify which type of viz to create with the `viz_type` parameter.

The output columns for the viz are deduced automatically from the pipeline by using the  
first annotator that provides the correct output type for that viz.  
You can override the columns with the corresponding  
`ner_col`, `pos_col`, `dep_untyped_col`, `dep_typed_col`, `resolution_col`, `relation_col`, and `assertion_col` parameters.
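
The "first vizzable annotator wins" rule described above can be sketched in plain Python. This is only an illustration, not NLU's implementation: the `VIZ_TYPE_BY_ANNOTATOR` mapping, the `infer_viz` helper, and the `(class_name, output_col)` stage tuples are all assumptions made up for this example.

```python
# Hypothetical sketch: this mapping and helper are NOT part of NLU.
# Maps an annotator class name to the kind of viz it could feed (assumed).
VIZ_TYPE_BY_ANNOTATOR = {
    "NerConverter": "ner",
    "DependencyParserModel": "dep",
    "AssertionDLModel": "assert",
    "RelationExtractionModel": "relation",
    "SentenceEntityResolverModel": "resolution",
}


def infer_viz(stages, viz_type=None):
    """Return (viz_type, output_col) for the first stage that can be visualized.

    stages:   list of (annotator_class_name, output_col) in pipeline order.
    viz_type: optional override; when given, only stages of that type match.
    """
    for class_name, output_col in stages:
        candidate = VIZ_TYPE_BY_ANNOTATOR.get(class_name)
        if candidate is None:
            continue  # this stage cannot be visualized at all
        if viz_type is None or candidate == viz_type:
            return candidate, output_col
    raise ValueError("no stage in the pipeline supports the requested viz")


stages = [
    ("DocumentAssembler", "document"),
    ("Tokenizer", "token"),
    ("NerConverter", "ner_chunk"),
    ("AssertionDLModel", "assertion"),
]
print(infer_viz(stages))                     # first vizzable stage
print(infer_viz(stages, viz_type="assert"))  # explicit override
```

Without an override the NER stage is picked because it comes first in the pipeline; passing a viz type skips ahead to the first stage of that type.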


In [26]:
# works with Pipeline, LightPipeline, PipelineModel, List[Annotator] 
from sparknlp.pretrained import PretrainedPipeline, LightPipeline

ade_pipeline = PretrainedPipeline('explain_clinical_doc_ade', 'en', 'clinical/models')
text = """I have an allergic reaction to vancomycin. 
    My skin has be itchy, sore throat/burning/itchy, and numbness in tongue and gums. 
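    I would not recommend this drug to anyone, especially since I have never had such an adverse reaction to any other medication."""
nlu.viz(ade_pipeline, text)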


explain_clinical_doc_ade download started this may take some time.
Approx size to download 462.2 MB
[OK!]


## nlu.to_pretty_df(pipe,data) 

Annotates a Pandas DataFrame, Pandas Series, NumPy array, Spark DataFrame, Python list of strings, or Python string  
with the given Spark NLP pipeline, which is assumed to be complete and runnable, and returns the result as a Pandas DataFrame.


Annotators are grouped internally by NLU into the output levels `token`, `sentence`, `document`, `chunk`, and `relation`.
The output columns of annotators at the same level are zipped and exploded together to create the final output DataFrame.
Additionally, most keys from the metadata dictionary of the result annotations are collected and expanded into their own columns in the resulting DataFrame, with special handling for annotators that encode multiple metadata fields in a single value, separated by strings like `|||` or `:::`.
Some metadata columns are omitted to reduce the total number of output columns; they can be re-enabled by setting `metadata=True`.
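
To make "zipped and exploded together" concrete, here is a minimal pure-Python sketch. It does not reproduce NLU's internals: the helper names `zip_and_explode` and `split_metadata`, the column names, and the packed string `"ADE|||0.998"` are assumptions chosen for illustration.

```python
from itertools import zip_longest


def split_metadata(value, separators=("|||", ":::")):
    """Split one packed metadata string into its individual fields."""
    for sep in separators:
        if sep in value:
            return [part.strip() for part in value.split(sep)]
    return [value]  # nothing packed, keep the value as a single field


def zip_and_explode(row, level_cols):
    """Turn one row of list-valued columns into one output row per list item.

    Columns on the same output level are aligned by position; shorter
    columns are padded with None.
    """
    columns = [row[col] for col in level_cols]
    return [dict(zip(level_cols, values)) for values in zip_longest(*columns)]


# One annotated document with two chunk-level results (made-up values)
row = {
    "chunk": ["allergic reaction", "itchy"],
    "chunk_class": ["ADE", "ADE"],
    "chunk_confidence": ["0.998", "0.8414"],
}
for exploded in zip_and_explode(row, list(row)):
    print(exploded)

print(split_metadata("ADE|||0.998"))
```

Each chunk ends up on its own row with its class and confidence beside it, which is the shape of the output tables in this notebook.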

By default, the output level for a given pipeline is set to the output level of its last annotator.
This can be changed by calling `to_pretty_df(pipe, text, output_level='my_level')` with one of the levels `token`, `sentence`, `document`, `chunk`, or `relation`.

In [66]:
# works with Pipeline, LightPipeline, PipelineModel, List[Annotator] 

text = """I have an allergic reaction to vancomycin. 
    My skin has be itchy, sore throat/burning/itchy, and numbness in tongue and gums. 
    I would not recommend this drug to anyone, especially since I have never had such an adverse reaction to any other medication."""
ade_assert_cols = ['assertion', 'entities_ner_chunks_ade_assertion', 'entities_ner_chunks_ade_assertion_class', 'assertion_confidence']
df = nlu.to_pretty_df(ade_pipeline,text)
df[ade_assert_cols]


| | assertion | entities_ner_chunks_ade_assertion | entities_ner_chunks_ade_assertion_class | assertion_confidence |
|---|---|---|---|---|
| 0 | present | allergic reaction | ADE | 0.998 |
| 0 | present | itchy | ADE | 0.8414 |
| 0 | present | sore throat/burning/itchy | ADE | 0.9019 |
| 0 | present | numbness in tongue and gums | ADE | 0.9991 |
| 0 | | | | |


## nlu.to_nlu_pipe(pipe)

Convert a pipeline or list of annotators into an NLU pipeline, making `.predict()` and `.viz()` available for every Spark NLP pipeline.
Assumes the pipeline is already runnable.


In [69]:

text = """I have an allergic reaction to vancomycin. 
    My skin has be itchy, sore throat/burning/itchy, and numbness in tongue and gums. 
    I would not recommend this drug to anyone, especially since I have never had such an adverse reaction to any other medication."""
nlu_pipe = nlu.to_nlu_pipe(ade_pipeline)
nlu_pipe.viz(text)
nlu_pipe.predict(text)[ade_assert_cols]



| | assertion | entities_ner_chunks_ade_assertion | entities_ner_chunks_ade_assertion_class | assertion_confidence |
|---|---|---|---|---|
| 0 | present | allergic reaction | ADE | 0.998 |
| 0 | present | itchy | ADE | 0.8414 |
| 0 | present | sore throat/burning/itchy | ADE | 0.9019 |
| 0 | present | numbness in tongue and gums | ADE | 0.9991 |
| 0 | | | | |


## nlu.autocomplete_pipeline(pipe)

Auto-complete a pipeline or a single annotator into a runnable pipeline using NLU's DAG autocompletion algorithm; the result is returned as an NLU pipeline.
The standard Spark pipeline is available on the `.vanilla_transformer_pipe` attribute of the returned NLU pipe.

Every annotator and every pipeline of annotators defines a `DAG` of tasks with dependencies that must be satisfied in `topological order`.
NLU completes an incomplete DAG by finding or creating a path between
the very first input node, which is almost always a `DocumentAssembler/MultiDocumentAssembler`,
and the very last node(s), which are given by the `topological sorting` of the annotators passed in the iterable parameter.
Paths are created by resolving the input features of annotators to the corresponding providers with matching storage references.
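
The idea of completing a DAG and then ordering it can be sketched with the standard library's `graphlib` (Python 3.9+). This is a simplified illustration under stated assumptions, not NLU's algorithm: the `PROVIDERS` and `REQUIRES` tables and the `autocomplete` helper are hypothetical, and storage-reference matching is left out entirely.

```python
from graphlib import TopologicalSorter

# Hypothetical catalogue: which default component provides each feature.
PROVIDERS = {
    "document": "DocumentAssembler",
    "sentence": "SentenceDetector",
    "token": "Tokenizer",
    "embeddings": "WordEmbeddingsModel",
}

# Hypothetical input requirements of each component.
REQUIRES = {
    "DocumentAssembler": [],
    "SentenceDetector": ["document"],
    "Tokenizer": ["sentence"],
    "WordEmbeddingsModel": ["sentence", "token"],
    "NerDLModel": ["sentence", "token", "embeddings"],
}


def autocomplete(components):
    """Add a provider for every unsatisfied input, then return a runnable order."""
    graph = {}  # component -> set of components it depends on
    pending = list(components)
    while pending:
        component = pending.pop()
        if component in graph:
            continue  # already resolved
        dependencies = {PROVIDERS[feature] for feature in REQUIRES[component]}
        graph[component] = dependencies
        pending.extend(dependencies)  # providers may have inputs of their own
    # static_order() yields every node after all of its dependencies
    return list(TopologicalSorter(graph).static_order())


order = autocomplete(["NerDLModel"])
print(order)
```

Starting from a single model, the helper walks backwards through the required inputs until it reaches the document assembler, then emits the stages in an order in which each one finds its inputs already computed.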


In [73]:
from sparknlp_jsl.annotator import RelationExtractionModel
text = """I have an allergic reaction to vancomycin. 
    My skin has be itchy, sore throat/burning/itchy, and numbness in tongue and gums. 
    I would not recommend this drug to anyone, especially since I have never had such an adverse reaction to any other medication."""

# pretrained() can be called directly on the class; no instance is needed
re_model = RelationExtractionModel.pretrained("re_ade_clinical", "en", 'clinical/models')

nlu_pipe = nlu.autocomplete_pipeline(re_model)
df = nlu_pipe.predict(text)
cols = [
'relation_RelationExtractionModel_1fb1dfa024c7',
'relation_RelationExtractionModel_1fb1dfa024c7_confidence',
'relation_RelationExtractionModel_1fb1dfa024c7_entity1',
'relation_RelationExtractionModel_1fb1dfa024c7_entity2',
'relation_RelationExtractionModel_1fb1dfa024c7_entity2_class',
]

df[cols]

re_ade_clinical download started this may take some time.
Approximate size to download 10.9 MB
[OK!]
embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
pos_anc download started this may take some time.
Approximate size to download 3.9 MB
[OK!]
dependency_conllu download started this may take some time.
Approximate size to download 16.7 MB
[OK!]
ner_jsl download started this may take some time.
[OK!]
sentence_detector_dl download started this may take some time.
Approximate size to download 354.6 KB
[OK!]


| | relation_RelationExtractionModel_1fb1dfa024c7 | relation_RelationExtractionModel_1fb1dfa024c7_confidence | relation_RelationExtractionModel_1fb1dfa024c7_entity1 | relation_RelationExtractionModel_1fb1dfa024c7_entity2 | relation_RelationExtractionModel_1fb1dfa024c7_entity2_class |
|---|---|---|---|---|---|
| 0 | 1 | 1.0 | allergic reaction | vancomycin | Drug_Ingredient |
| 0 | 1 | 0.9999999 | skin | itchy | Symptom |
| 0 | 1 | 0.99998033 | skin | sore throat/burning/itchy | Symptom |
| 0 | 1 | 0.9562254 | skin | numbness | Symptom |
| 0 | 1 | 0.9990915 | skin | tongue | External_body_part_or_region |
| 0 | 0 | 0.94292736 | skin | gums | External_body_part_or_region |
| 0 | 1 | 0.80632734 | itchy | sore throat/burning/itchy | Symptom |
| 0 | 1 | 0.52616316 | itchy | numbness | Symptom |
| 0 | 1 | 0.9999474 | itchy | tongue | External_body_part_or_region |
| 0 | 0 | 0.9946185 | itchy | gums | External_body_part_or_region |
