![JohnSnowLabs](https://sparknlp.org/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/training/english/dl-ner/ner_dl_graph_checker.ipynb)

# Checking TF Graph Availability for NerDLApproach

This notebook shows how to use the `NerDLGraphChecker` annotator, introduced in Spark NLP 6.1.3, to verify that a suitable TF graph for `NerDLApproach` is available before training starts. This is useful for custom training cases where specialized graphs are needed.

Note that you can create your own graphs with `TFNerDLGraphBuilder`. Please see its [example notebook](https://github.com/JohnSnowLabs/spark-nlp/blob/master/examples/python/training/english/dl-ner/ner_graph_builder.ipynb).

In [None]:
# Only run this cell when you are using Spark NLP on Google Colab
!wget http://setup.johnsnowlabs.com/colab.sh -O - | bash

In [None]:
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline, PipelineModel
from pyspark.sql import functions as F
from pyspark.sql import types as T

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *

from sparknlp.training import CoNLL

### Prerequisites for `NerDLGraphChecker`

This annotator requires the same inputs as `NerDLApproach`:

1. `DOCUMENT` and `TOKEN` annotation type columns
2. A label column
3. The embedding dimension of the embeddings model. We can pass the embeddings model with `setEmbeddingsModel`, which sets the dimension automatically.
4. (Optional) As with `NerDLApproach`, you can provide your own graph folder with `setGraphFolder`.
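
To make the idea of a "compatible graph" more concrete, here is a minimal pure-Python sketch. It is not the annotator's actual implementation. It assumes that graph files follow the `blstm_{tags}_{embeddings_dim}_{lstm_size}_{n_chars}.pb` naming pattern, and that a graph is suitable when its embedding dimension matches exactly while its tag and character capacities are at least as large as the data requires. The names `GraphSpec`, `parse_graph_name`, and `find_suitable_graph`, as well as the example file names, are hypothetical and used only for illustration.

```python
import re
from typing import NamedTuple, Optional


class GraphSpec(NamedTuple):
    """Capacities encoded in a graph file name (hypothetical helper type)."""
    tags: int
    embeddings_dim: int
    lstm_size: int
    n_chars: int


# Assumed naming pattern: blstm_{tags}_{embeddings_dim}_{lstm_size}_{n_chars}.pb
_GRAPH_NAME = re.compile(r"^blstm_(\d+)_(\d+)_(\d+)_(\d+)\.pb$")


def parse_graph_name(file_name: str) -> Optional[GraphSpec]:
    """Return the spec encoded in file_name, or None if the name does not match."""
    match = _GRAPH_NAME.match(file_name)
    if match is None:
        return None
    return GraphSpec(*(int(group) for group in match.groups()))


def find_suitable_graph(file_names, embeddings_dim, tags, n_chars):
    """Pick the smallest graph that fits, or None if no graph is suitable.

    Assumed rule: the embedding dimension must match exactly, while the tag
    and character capacities only need to be large enough.
    """
    candidates = []
    for name in file_names:
        spec = parse_graph_name(name)
        if spec is None:
            continue  # Not a graph file, skip it
        if (spec.embeddings_dim == embeddings_dim
                and spec.tags >= tags
                and spec.n_chars >= n_chars):
            candidates.append((spec.tags, spec.n_chars, name))
    return min(candidates)[2] if candidates else None


# Example file names, made up for illustration
available = ["blstm_10_100_128_100.pb", "blstm_10_200_128_100.pb", "README.md"]

print(find_suitable_graph(available, embeddings_dim=100, tags=9, n_chars=79))
print(find_suitable_graph(available, embeddings_dim=120, tags=9, n_chars=79))
```

Under these assumptions, a 100-dimensional embeddings model finds a match, while a 120-dimensional one finds none. That second case is the kind of situation `NerDLGraphChecker` is meant to report before training begins.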

In [None]:
spark = sparknlp.start()

print("Spark NLP version:", sparknlp.version())

Spark NLP version: 6.1.3


### Prepare NER test data

In [None]:
conll = CoNLL()

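# A small CoNLL-2003 sample from the Spark NLP test resources is used for both splits;
# this is enough to demonstrate the graph check.
test_conll = "../../../../../src/test/resources/conll2003/eng.testa"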
train_data = conll.readDataset(spark=spark, path=test_conll).limit(1000)
test_data = conll.readDataset(spark=spark, path=test_conll).limit(1000)

### Pipeline with `NerDLGraphChecker`

We define a pretrained embeddings model and a `NerDLGraphChecker` that checks for a compatible graph before NER training starts and before any embeddings are computed. Note that this annotator must run before the embeddings, whether you apply the stages manually or in a pipeline.

For this example, we assume that the embeddings have an uncommon dimension that requires a special graph.

In [None]:
embeddings = (
    WordEmbeddingsModel.pretrained()
    .setInputCols(["sentence", "token"])
    .setOutputCol("embeddings")
    # Manual override for demonstration purposes; don't do this with an actual pretrained model
    .setDimension(120)
)

ner_dl_graph_checker = (
    NerDLGraphChecker()
    .setInputCols(["sentence", "token", "embeddings"])
    .setLabelColumn("label")
    .setEmbeddingsModel(embeddings)
)

glove_100d download started this may take some time.


Approximate size to download 145.3 MB
[OK!]


Then we combine these with `NerDLApproach` in a pipeline where the checker comes before the embeddings and `NerDLApproach`.

In [None]:
ner_dl = (
    NerDLApproach()
    .setInputCols(["sentence", "token", "embeddings"])
    .setLabelColumn("label")
    .setOutputCol("ner")
    .setMaxEpochs(1)
    .setLr(0.003)
    .setBatchSize(8)
    .setRandomSeed(0)
    .setVerbose(1)
)


ner_pipeline = Pipeline().setStages([ner_dl_graph_checker, embeddings, ner_dl])

If we now start the training with `fit`, we should see an exception before anything is evaluated.

In [None]:
ner_pipeline.fit(train_data)

IllegalArgumentException: NerDLGraphChecker: requirement failed: Graph dimensions should be 120: Could not find a suitable tensorflow graph for embeddings dim: 120 tags: 9 nChars: 79. Check https://sparknlp.org/docs/en/graph for instructions to generate the required graph.
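
The exception text already lists the values a matching graph would need. As a small sketch, the pure-Python snippet below pulls those numbers out of the message shown above so they could be passed on to a graph-building step. The helper `extract_graph_requirements` is hypothetical, and the message wording is not a stable API: the regular expression assumes the exact format printed in this notebook and may need adjusting for other Spark NLP versions.

```python
import re

# Error text copied from the exception shown above
message = (
    "NerDLGraphChecker: requirement failed: Graph dimensions should be 120: "
    "Could not find a suitable tensorflow graph for embeddings dim: 120 "
    "tags: 9 nChars: 79. Check https://sparknlp.org/docs/en/graph for "
    "instructions to generate the required graph."
)

# Assumes the "embeddings dim: X tags: Y nChars: Z" wording seen in this notebook
_REQUIREMENTS = re.compile(
    r"embeddings dim: (?P<embeddings_dim>\d+) "
    r"tags: (?P<tags>\d+) "
    r"nChars: (?P<n_chars>\d+)"
)


def extract_graph_requirements(error_message):
    """Return the required graph sizes as a dict, or None if not found."""
    match = _REQUIREMENTS.search(error_message)
    if match is None:
        return None
    return {key: int(value) for key, value in match.groupdict().items()}


requirements = extract_graph_requirements(message)
print(requirements)
```

In a real run, the message would come from the caught exception, for example via `str(e)` in an `except` block around `ner_pipeline.fit(train_data)`.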