

![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://github.com/JohnSnowLabs/spark-nlp-workshop/edit/master/tutorials/streamlit_notebooks/healthcare_jsl/NER_PROFESSIONS_ES.ipynb)




# **Detect Professions and Occupations in Spanish text**

## 1. Colab Setup

In [None]:
# Install the johnsnowlabs library to access Spark-OCR and Spark-NLP for Healthcare, Finance, and Legal.
! pip install johnsnowlabs

In [None]:
from google.colab import files
print("Please Upload your John Snow Labs License using the button below")
license_keys = files.upload()

In [None]:
from johnsnowlabs import *

# After uploading your license, run this to install all licensed Python wheels and pre-download the jars for the Spark session JVM.
# Restart your notebook afterwards for the changes to take effect.

jsl.install()

## 2. Start Session

In [None]:
from johnsnowlabs import *
# Automatically load the license data and start a session with all the jars the user has access to
spark = jsl.start()

In [None]:
spark

## 3. Select the DL model

In [None]:
# If you change the model, re-run all the cells below.
# Applicable models: meddroprof_scielowiki
MODEL_NAME = "meddroprof_scielowiki"

## 4. Sample examples

In [None]:
# Enter examples to be transformed as strings in this list
text_list = [
    """Paciente varón de 42 años que acude al servicio de Urgencias de su hospital acompañado de las Fuerzas del Orden Público por presentar 
    una actitud hostil y desconfiada hacia su padre, permaneciendo aislado en su habitación desde hace un mes. 
    Se decide ingreso hospitalario en Salud Mental. 
    ANTECEDENTES Antecedentes personales y familiares: niega antecedentes personales y familiares de interés. 
    No reacciones alérgicas medicamentosas. 
    Nunca ha estado en tratamiento psiquiátrico, aunque, al parecer, según los datos aportados por familiares, viene presentando síntomas psicóticos desde hace al menos cinco años. 
    Se trata de un varón divorciado, con una hija de 13 años con la que refiere contacto esporádico. 
    Actualmente en desempleo . 
    Sus familiares verbalizan que desde hace unos cinco años –coincidiendo con la ruptura matrimonial, la pérdida de empleo Fuerzas del Orden Público."""]

## 5. Define Spark NLP pipeline

In [None]:
document_assembler = nlp.DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = nlp.Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The NER model consumes the embeddings_scielowiki_300d word embeddings,
# so we need to load them before the NER stage.
embeddings = nlp.WordEmbeddingsModel.pretrained("embeddings_scielowiki_300d", "es", "clinical/models")\
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

ner_model = medical.NerModel.pretrained(MODEL_NAME, "es", "clinical/models") \
    .setInputCols(["document", "token", "embeddings"]) \
    .setOutputCol('ner')

ner_converter = nlp.NerConverter() \
    .setInputCols(['document', 'token', 'ner']) \
    .setOutputCol('ner_chunk')

nlp_pipeline = nlp.Pipeline(stages=[
    document_assembler,
    tokenizer,
    embeddings,
    ner_model,
    ner_converter,
])

embeddings_scielowiki_300d download started this may take some time.
Approximate size to download 351.2 MB
[OK!]
meddroprof_scielowiki download started this may take some time.
[OK!]
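
The last stage of the pipeline turns token-level tags into entity chunks. As a rough illustration of that idea, here is a minimal pure-Python sketch that merges IOB tags into chunks. It is not Spark NLP's implementation: the function name `iob_to_chunks` is hypothetical, the tags are hand-written rather than model output, and the label names `PROFESION` and `SITUACION_LABORAL` are assumed for the example.

```python
from typing import List, Optional, Tuple


def iob_to_chunks(tokens: List[str], tags: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Merge token-level IOB tags into (chunk_text, label) pairs."""
    chunks: List[Tuple[str, Optional[str]]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    def flush() -> None:
        # Close the chunk that is currently open, if there is one.
        nonlocal current_tokens, current_label
        if current_tokens:
            chunks.append((" ".join(current_tokens), current_label))
        current_tokens, current_label = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new chunk.
            flush()
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # An I- tag with the same label extends the open chunk.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open chunk:
            # close whatever is open and skip this token.
            flush()
    flush()
    return chunks


# Hand-written tokens and tags (illustrative, not produced by the model).
tokens = ["acompañado", "de", "las", "Fuerzas", "del", "Orden", "Público", "en", "desempleo"]
tags = ["O", "O", "O", "B-PROFESION", "I-PROFESION", "I-PROFESION", "I-PROFESION",
        "O", "B-SITUACION_LABORAL"]

chunks = iob_to_chunks(tokens, tags)
print(chunks)
```

In this sketch a stray `I-` tag with no matching open chunk is dropped; other conventions (for example, treating it as a chunk start) are equally plausible.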


## 6. Run the pipeline

In [None]:
from pyspark.sql.types import StringType

df = spark.createDataFrame(text_list, StringType()).toDF('text')
result = nlp_pipeline.fit(df).transform(df)

result.show(truncate=100)

+----------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------+
|                                                                                                text|                                                                                            document|                                                                                               token|                                                                                        
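
The wide `show()` output is hard to read, so it can help to flatten the `ner_chunk` column into plain records. The sketch below assumes each annotation exposes `result`, `begin`, `end` and a `metadata` mapping with an `entity` key, which is how Spark NLP annotations are usually laid out; the helper name `chunks_to_records` is hypothetical, and the sample annotation is a hand-made stand-in rather than real model output.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List


def chunks_to_records(annotations: Iterable[Any]) -> List[Dict[str, Any]]:
    """Flatten annotation-like objects into plain dictionaries."""
    records = []
    for ann in annotations:
        records.append({
            "chunk": ann.result,                  # text of the detected chunk
            "begin": ann.begin,                   # start offset in the document
            "end": ann.end,                       # end offset (assumed inclusive)
            "label": ann.metadata.get("entity"),  # entity label, None if missing
        })
    return records


# Hand-made stand-in for one collected row's "ner_chunk" value.
sample_text = "Actualmente en desempleo ."
fake_chunks = [
    SimpleNamespace(result="desempleo", begin=15, end=23,
                    metadata={"entity": "SITUACION_LABORAL"}),
]

records = chunks_to_records(fake_chunks)
for record in records:
    print(record)
```

With a real session, the same helper could presumably be called as `chunks_to_records(result.collect()[0]["ner_chunk"])`, since collected rows allow attribute access on annotation fields.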

## 7. Visualize results

In [None]:
from sparknlp_display import NerVisualizer

NerVisualizer().display(
    result = result.collect()[0],
    label_col = 'ner_chunk',
    document_col = 'document'
)
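
Outside a notebook, where the HTML visualizer cannot render, a plain-text rendering of the same information can be useful. The following is a minimal sketch, not part of `sparknlp_display`: the function `mark_entities` is hypothetical, it assumes non-overlapping spans with inclusive `end` offsets, and the example span and label are hand-written.

```python
from typing import List, Tuple


def mark_entities(text: str, spans: List[Tuple[int, int, str]]) -> str:
    """Wrap each (begin, end, label) span as [chunk|LABEL].

    Assumes spans do not overlap and that `end` is inclusive.
    """
    pieces = []
    cursor = 0
    for begin, end, label in sorted(spans):
        pieces.append(text[cursor:begin])                    # text before the entity
        pieces.append(f"[{text[begin:end + 1]}|{label}]")    # the marked entity
        cursor = end + 1
    pieces.append(text[cursor:])                             # text after the last entity
    return "".join(pieces)


# Hand-written example span (illustrative, not model output).
example_text = "Actualmente en desempleo ."
example_spans = [(15, 23, "SITUACION_LABORAL")]

marked = mark_entities(example_text, example_spans)
print(marked)
```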