![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/open-source-nlp/04.1.NerDL_Graph.ipynb)

# Graph Generation for NerDL Model

---



In [None]:
!pip install -q pyspark==3.4.1  spark-nlp==5.1.2
!pip install -q tensorflow==2.12.0
!pip install -q tensorflow_addons

In [None]:
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *

spark = sparknlp.start()

print("Spark NLP version: ", sparknlp.version())
print("Apache Spark version: ", spark.version)

spark

Spark NLP version:  5.1.2
Apache Spark version:  3.4.1


# TF Graph Builder

The `TFNerDLGraphBuilder` annotator can be used to create the graph as part of the model training pipeline. It inspects the data and creates a suitable graph if a compatible version of TensorFlow (<= 2.7) is available. The graph is stored in the specified folder and loaded by the approach (`NerDLApproach`).

**NOTE:** This annotator is available in `sparknlp` version `v4.1.0` and later.

**ATTENTION:** **Experiment with the parameters of this annotator; they can affect the performance of the model you train.**


In [None]:
!mkdir ner_logs
!mkdir ner_graphs

graph_folder = "/content/ner_graphs"
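The cell above relies on shell commands and on Colab's `/content` working directory. A minimal sketch of a Python alternative follows, assuming you want the folders created under a base directory of your choice; `ensure_dirs` is a hypothetical helper, not part of Spark NLP, and the demo uses a temporary directory so that it does not touch the notebook's own folders.

```python
import os
import tempfile


def ensure_dirs(base_dir, names):
    """Create each folder under base_dir if it is missing.

    Returns a dict mapping each folder name to its absolute path.
    Hypothetical helper for illustration only.
    """
    paths = {}
    for name in names:
        path = os.path.abspath(os.path.join(base_dir, name))
        os.makedirs(path, exist_ok=True)  # no error if the folder already exists
        paths[name] = path
    return paths


# Demo in a throwaway directory; in the notebook, base_dir could be os.getcwd().
demo_base = tempfile.mkdtemp()
demo_dirs = ensure_dirs(demo_base, ["ner_logs", "ner_graphs"])
demo_graph_folder = demo_dirs["ner_graphs"]
print(demo_graph_folder)
```

Because the helper returns absolute paths, the value for `ner_graphs` could be passed to `.setGraphFolder(...)` regardless of the environment the notebook runs in.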

In [None]:
graph_builder = TFNerDLGraphBuilder()\
                      .setInputCols(["sentence", "token", "embeddings"])\
                      .setLabelColumn("label")\
                      .setGraphFile("auto")\
                      .setGraphFolder(graph_folder)\
                      .setHiddenUnitsNumber(20)

*Train the model with `NerDLApproach` and let it use the graph generated by the builder.*

You can find an example in [NERDL Training Notebook](https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Public/4.NERDL_Training.ipynb).

```python
# Pipeline comes from Spark ML
from pyspark.ml import Pipeline

# You can use any word embeddings you want (GloVe, ELMo, BERT, custom, etc.)
glove_embeddings = WordEmbeddingsModel.pretrained('glove_100d')\
              .setInputCols(["document", "token"])\
              .setOutputCol("embeddings")

nerTagger = NerDLApproach()\
              .setInputCols(["sentence", "token", "embeddings"])\
              .setLabelColumn("label")\
              .setOutputCol("ner")\
              .setMaxEpochs(3)\
              .setLr(0.003)\
              .setBatchSize(32)\
              .setRandomSeed(0)\
              .setVerbose(1)\
              .setValidationSplit(0.2)\
              .setEvaluationLogExtended(True)\
              .setEnableOutputLogs(True)\
              .setIncludeConfidence(True)\
              .setGraphFolder(graph_folder)\
              .setOutputLogsPath('ner_logs') # if not set, logs will be written to ~/annotator_logs

# The graph builder runs before NerDLApproach so the graph exists when training starts
ner_pipeline = Pipeline(stages=[glove_embeddings,
                                graph_builder,
                                nerTagger])
```


# Custom Graph

In [None]:
!wget https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp-workshop/master/tutorials/Certification_Trainings/Public/utils/graph_utils/nerdl/nerdl-graph/create_graph.py
!wget https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp-workshop/master/tutorials/Certification_Trainings/Public/utils/graph_utils/nerdl/nerdl-graph/dataset_encoder.py
!wget https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp-workshop/master/tutorials/Certification_Trainings/Public/utils/graph_utils/nerdl/nerdl-graph/ner_model.py
!wget https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp-workshop/master/tutorials/Certification_Trainings/Public/utils/graph_utils/nerdl/nerdl-graph/ner_model_saver.py
!wget https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp-workshop/master/tutorials/Certification_Trainings/Public/utils/graph_utils/nerdl/nerdl-graph/sentence_grouper.py

--2023-10-04 05:26:27--  https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp-workshop/master/tutorials/Certification_Trainings/Public/utils/graph_utils/nerdl/nerdl-graph/create_graph.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1545 (1.5K) [text/plain]
Saving to: ‘create_graph.py’


2023-10-04 05:26:27 (30.5 MB/s) - ‘create_graph.py’ saved [1545/1545]

--2023-10-04 05:26:27--  https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp-workshop/master/tutorials/Certification_Trainings/Public/utils/graph_utils/nerdl/nerdl-graph/dataset_encoder.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.110.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:4

In [None]:
import create_graph

ntags = 19            # number of labels
embeddings_dim = 100  # dimension of the word embeddings (e.g. 100 for glove_100d)
nchars = 100          # number of characters

create_graph.create_graph(ntags, embeddings_dim, nchars)

# Then place the generated graph file (.pb) in a folder and pass that folder
# to NerDLApproach with .setGraphFolder('folder')

2.12.0
Spark NLP is compiled with TensorFlow 1.15.0, Please use such version.
Current TensorFlow version:  2.12.0


  assert(self._word_embeddings_added or self._char_cnn_added or self._char_bilstm_added,
  assert(self._context_added,
  assert(self._inference_added,
  assert(self._training_added, "Add training layer by method add_training_op before running training")
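
The `ntags` value passed to `create_graph` is the number of labels in the training data. A minimal sketch of how that count could be derived follows, assuming the data is a CoNLL-style text file in which each non-empty line holds whitespace-separated columns and the label is the last column. `count_labels` and the sample text are hypothetical and are not part of Spark NLP or the downloaded scripts.

```python
def count_labels(conll_text):
    """Return the sorted distinct labels found in CoNLL-style text.

    Assumption: the label is the last whitespace-separated column.
    """
    labels = set()
    for line in conll_text.splitlines():
        line = line.strip()
        # Skip blank sentence separators and document markers.
        if not line or line.startswith("-DOCSTART-"):
            continue
        labels.add(line.split()[-1])
    return sorted(labels)


# Tiny made-up sample in CoNLL-2003 column order: token, POS, chunk, label.
sample = """-DOCSTART- -X- -X- O

John NNP B-NP B-PER
Snow NNP I-NP I-PER
lives VBZ B-VP O
in IN B-PP O
London NNP B-NP B-LOC
"""

labels = count_labels(sample)
ntags_estimate = len(labels)
print(labels, ntags_estimate)
```

For a real dataset, the text would be read from the training file, and the resulting count gives a starting point for `ntags`.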
