![JohnSnowLabs](https://sparknlp.org/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/training/english/dl-ner/ner_logs_Azure.ipynb)


# Exporting Logs in Azure with NER training

In Spark NLP you can configure where the logs produced while training NER models are stored. Starting with Spark NLP 5.1.0, this location can be a GCP Storage URI, an Azure Storage URI, or a distributed file system path such as HDFS or DBFS (Databricks File System).

In this notebook, we walk through the steps required to use an external Azure Storage URI to store the logs of training an NER model.

To do this, we need to configure the Spark session with the required settings for Spark NLP and Spark ML.

In [None]:
# Only run this cell when you are using Spark NLP on Google Colab
!wget http://setup.johnsnowlabs.com/colab.sh -O - | bash

### Spark NLP Settings

`output_logs_path`: Defines the Azure Storage path where the logs are written while training NER

### Spark ML Settings

Spark ML requires the following configuration to work with Azure Storage:


1. Azure connector: You need to identify your Hadoop version and set the required dependency in `spark.jars.packages`
2. Hadoop File System: You also need to set up the Hadoop file system to use Azure Storage as a file system. This is defined in `spark.hadoop.fs.azure`

To integrate with Azure, we need to define the STORAGE_ACCOUNT and AZURE_ACCOUNT_KEY variables:
1. STORAGE_ACCOUNT: This can be found in the Microsoft Azure portal. Under Resources, look for the entry of type storage account; its name is your storage account.
2. AZURE_ACCOUNT_KEY: See "View account access keys" in the official [Azure documentation](https://learn.microsoft.com/en-us/azure/storage/common/storage-account-keys-manage?tabs=azure-portal)

Then you can define these two properties as variables and pass them in when creating the Spark session:

In [None]:
print("Enter your Storage Account:")
STORAGE_ACCOUNT = input()

In [None]:
print("Enter your Azure Account Key:")
AZURE_ACCOUNT_KEY = input()
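
Plain `input()` echoes the account key on screen and leaves it in the notebook output. As a hedged alternative, the sketch below reads the secret from an environment variable when one is set and otherwise prompts with the standard library's `getpass`, which hides what is typed. The helper name `read_secret` and the environment variable name are assumptions made for illustration; they are not part of Spark NLP.

```python
import os
from getpass import getpass


def read_secret(env_var, prompt):
    """Return a secret from the environment, or prompt for it without echo.

    `env_var` is a hypothetical variable name chosen by the caller.
    """
    value = os.environ.get(env_var)
    if value:
        # A non-empty environment variable takes precedence over prompting.
        return value
    # getpass hides the typed characters, unlike input().
    return getpass(prompt)
```

In this notebook, `AZURE_ACCOUNT_KEY = read_secret("AZURE_ACCOUNT_KEY", "Enter your Azure Account Key: ")` could stand in for the `input()` call above.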

In [None]:
import sparknlp
import pyspark

azure_hadoop_config = "spark.hadoop.fs.azure.account.key." + STORAGE_ACCOUNT + ".blob.core.windows.net"

hadoop_azure_pkg = "org.apache.hadoop:hadoop-azure:3.3.4"
azure_storage_pkg = "com.microsoft.azure:azure-storage:8.6.6"
azure_identity_pkg = "com.azure:azure-identity:1.9.1"
azure_storage_blob_pkg = "com.azure:azure-storage-blob:12.22.2"
azure_pkgs = ",".join([hadoop_azure_pkg, azure_storage_pkg, azure_identity_pkg, azure_storage_blob_pkg])

# Azure Storage configuration
azure_params = {
    "spark.jars.packages": azure_pkgs,
    azure_hadoop_config: AZURE_ACCOUNT_KEY
}

spark = sparknlp.start(params=azure_params)

print("Apache Spark version: {}".format(spark.version))

Apache Spark version: 3.4.0
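
The Azure connector step asks you to identify your Hadoop version before choosing the `hadoop-azure` dependency. A minimal sketch of that step follows, assuming the connector version should match the Hadoop version bundled with your Spark distribution. The helper `hadoop_azure_coordinate` is a hypothetical name introduced here, and the commented line relies on PySpark's internal `_jvm` gateway, so treat it as a convenience rather than a stable API.

```python
def hadoop_azure_coordinate(hadoop_version):
    """Build the Maven coordinate of hadoop-azure for a given Hadoop version."""
    version = hadoop_version.strip()
    if not version:
        raise ValueError("hadoop_version must not be empty")
    return "org.apache.hadoop:hadoop-azure:" + version


# With a running session, the bundled Hadoop version can be read like this
# (assumes the `spark` variable created in the cell above):
# hadoop_version = spark.sparkContext._jvm.org.apache.hadoop.util.VersionInfo.getVersion()

# Example with the version used in this notebook.
print(hadoop_azure_coordinate("3.3.4"))
```

If the reported version differs from the one hard-coded in `hadoop_azure_pkg`, the session would need to be restarted with the adjusted coordinate, since `spark.jars.packages` is read at session creation.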


In [None]:
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp.training import CoNLL

In [None]:
training_data = CoNLL().readDataset(spark, './test_ner_dataset.txt')
training_data.show(3)

embeddings = WordEmbeddingsModel.pretrained("glove_100d")
ready_data = embeddings.transform(training_data).cache()

output_logs_path = "https://" + STORAGE_ACCOUNT + ".blob.core.windows.net/test/logs"

ner_tagger = NerDLApproach() \
    .setInputCols("sentence", "token", "embeddings") \
    .setLabelColumn("label") \
    .setOutputCol("ner") \
    .setMaxEpochs(5) \
    .setRandomSeed(0) \
    .setVerbose(2) \
    .setDropout(0.8) \
    .setBatchSize(18) \
    .setEnableOutputLogs(True) \
    .setOutputLogsPath(output_logs_path)

ner_model = ner_tagger.fit(ready_data)

+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+
|                text|            document|            sentence|               token|                 pos|               label|
+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+
|John Smith works ...|[{document, 0, 35...|[{document, 0, 35...|[{token, 0, 3, Jo...|[{pos, 0, 3, NNP,...|[{named_entity, 0...|
+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+

glove_100d download started this may take some time.
Approximate size to download 145.3 MB
[OK!]
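
The `output_logs_path` used above is built by string concatenation, so a mistyped storage account name only surfaces once training tries to write the logs. The sketch below builds the same URL shape and checks the account name first, assuming the Azure naming rule for storage accounts (3 to 24 characters, lowercase letters and digits only). The helper `build_logs_path` is a hypothetical name, and the defaults `test` and `logs` simply mirror the container and folder used in this notebook.

```python
import re

# Azure storage account names: 3-24 characters, lowercase letters and digits.
_ACCOUNT_RE = re.compile(r"[a-z0-9]{3,24}")


def build_logs_path(storage_account, container="test", folder="logs"):
    """Return the Blob Storage URL used as the NER training logs path."""
    if not _ACCOUNT_RE.fullmatch(storage_account):
        raise ValueError("invalid storage account name: {!r}".format(storage_account))
    # Strip stray slashes so the pieces join with exactly one separator.
    return "https://{}.blob.core.windows.net/{}/{}".format(
        storage_account, container.strip("/"), folder.strip("/")
    )
```

Calling `build_logs_path(STORAGE_ACCOUNT)` would produce the same value as the concatenation in the training cell, with an early error if the account name is malformed.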
