

![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/NER_PT.ipynb)




# **Detect tumor morphology in Spanish text**

## 1. Colab Setup

In [None]:
import os
import json

from google.colab import files

license_keys = files.upload()

with open(list(license_keys.keys())[0]) as f:
    license_keys = json.load(f)

sparknlp_version = license_keys["PUBLIC_VERSION"]
jsl_version = license_keys["JSL_VERSION"]

In [2]:
print('SparkNLP Version:', sparknlp_version)
print('SparkNLP-JSL Version:', jsl_version)

SparkNLP Version: 3.1.2
SparkNLP-JSL Version: 3.1.2
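
If the uploaded license file lacks one of the keys this notebook reads (`PUBLIC_VERSION`, `JSL_VERSION`, `SECRET`), the failure only shows up later as a `KeyError`. Below is a minimal sketch of an early check. The helper name `missing_license_keys` is hypothetical and not part of Spark NLP, and the values in the demo dictionary are placeholders.

```python
# Keys that this notebook reads from the license JSON.
REQUIRED_KEYS = ("PUBLIC_VERSION", "JSL_VERSION", "SECRET")


def missing_license_keys(keys, required=REQUIRED_KEYS):
    """Return the required keys that are absent from `keys`, in order.

    Hypothetical helper for illustration; not a Spark NLP function.
    """
    return [name for name in required if name not in keys]


# Demo with placeholder values (not real credentials).
demo_keys = {"PUBLIC_VERSION": "3.1.2", "JSL_VERSION": "3.1.2"}
missing = missing_license_keys(demo_keys)
if missing:
    print("License file is missing:", ", ".join(missing))
```

In the notebook, the same check could be run as `missing_license_keys(license_keys)` right after the JSON file is loaded.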


Install dependencies

In [3]:
%%capture
for k,v in license_keys.items(): 
    %set_env $k=$v

!wget https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp-workshop/master/jsl_colab_setup.sh
!bash jsl_colab_setup.sh

# Install Spark NLP Display for visualization
!pip install --ignore-installed spark-nlp-display

## 2. Start the Spark session

In [4]:
import json
import pandas as pd
import numpy as np

from pyspark.ml import Pipeline
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
from sparknlp.annotator import *
from sparknlp.base import *
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

import sparknlp_jsl
from sparknlp_jsl.annotator import *

spark = sparknlp_jsl.start(license_keys['SECRET'])

spark

## 3. Select the DL model

In [5]:
# If you change the model, re-run all the cells below.
# Applicable models: cantemist_scielowiki
MODEL_NAME = "cantemist_scielowiki"

## 4. Sample text

In [6]:
# Enter examples to be transformed as strings in this list
text_list = [
    """Anamnesis Paciente de 37 años de edad sin antecedentes patológicos ni quirúrgicos de interés. 
    En diciembre de 2012 consultó al Servicio de Urgencias por un cuadro de cefalea aguda e hipostesia del hemicuerpo izquierdo de 15 días de evolución 
    refractario a tratamiento. Exploración neurológica sin focalidad; fondo de ojo: papiledema unilateral. 
    Se solicitaron una TC del SNC, que objetiva una LOE frontal derecha con afectación aparente del cuerpo calloso, y una RM del SNC, 
    que muestra un extenso proceso expansivo intraparenquimatoso frontal derecho que infiltra la rodilla del cuerpo calloso, mal delimitada y sin componente necrótico. 
    Tras la administración de contraste se apreciaban diferentes realces parcheados en la lesión, 
    pero sin definirse una cápsula con aumento del flujo sanguíneo en la lesión, características compatibles con linfoma o astrocitoma anaplásico . 
    El 3 de enero de 2013 se efectúa biopsia intraoperatoria, con diagnóstico histológico de astrocitoma anaplásico GIII"""
]

## 5. Define Spark NLP pipeline

In [7]:
document_assembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The embeddings must be the same ones that were used to train the NER model
embeddings = WordEmbeddingsModel.pretrained("embeddings_scielowiki_300d", "es", "clinical/models") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

ner_model = MedicalNerModel.pretrained(MODEL_NAME, 'es', 'clinical/models') \
    .setInputCols(['document', 'token', 'embeddings']) \
    .setOutputCol('ner')

ner_converter = NerConverter() \
    .setInputCols(['document', 'token', 'ner']) \
    .setOutputCol('ner_chunk')

nlp_pipeline = Pipeline(stages=[
    document_assembler, 
    tokenizer,
    embeddings,
    ner_model,
    ner_converter
])

embeddings_scielowiki_300d download started this may take some time.
Approximate size to download 351.2 MB
[OK!]
cantemist_scielowiki download started this may take some time.
Approximate size to download 21.4 MB
[OK!]


## 6. Run the pipeline

In [8]:
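# Every stage is pretrained, so fitting on an empty DataFrame only assembles the PipelineModel
empty_df = spark.createDataFrame([['']]).toDF('text')
pipeline_model = nlp_pipeline.fit(empty_df)
# Annotate the sample texts
df = spark.createDataFrame(pd.DataFrame({'text': text_list}))
result = pipeline_model.transform(df)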

## 7. Visualize results

In [9]:
from sparknlp_display import NerVisualizer

NerVisualizer().display(
    result=result.collect()[0],
    label_col='ner_chunk',
    document_col='document'
)
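
Besides the visual rendering, the detected chunks can be read out as plain records. The sketch below assumes each collected annotation exposes `result`, `begin`, `end` and a `metadata` mapping with an `entity` key, which is how `NerConverter` chunks usually appear after `collect()`. The helper name `chunks_to_records` is hypothetical, and the demo row is a stand-in with placeholder values so the block runs without a Spark session.

```python
from types import SimpleNamespace


def chunks_to_records(row, chunk_col="ner_chunk"):
    """Flatten the chunk annotations of one collected row into dictionaries.

    Hypothetical helper for illustration; not part of Spark NLP.
    """
    records = []
    for ann in row[chunk_col]:
        records.append({
            "chunk": ann.result,                  # matched text span
            "begin": ann.begin,                   # start character offset
            "end": ann.end,                       # end character offset
            "label": ann.metadata.get("entity"),  # entity label, if present
        })
    return records


# Stand-in for one collected row; the offsets and label are placeholders.
example_row = {
    "ner_chunk": [
        SimpleNamespace(
            result="astrocitoma anaplásico",
            begin=0,
            end=21,
            metadata={"entity": "EXAMPLE_LABEL"},
        )
    ]
}

records = chunks_to_records(example_row)
```

With the real pipeline output, `pd.DataFrame(chunks_to_records(result.collect()[0]))` would give one table row per detected chunk.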