# Building Transformer-Based Natural Language Processing Applications
### Self-Supervision, BERT, and Beyond

The goal of this lab is to build two example NLP (Natural Language Processing) applications, *text classification* and *named entity recognition (NER)*, by transferring knowledge from large, high-performing pre-trained transformer-based language models. <br>

1. [Explore the Data](010_ExploreData.ipynb)<br>
    The datasets are drawn from the [NCBI-disease corpus](https://www.ncbi.nlm.nih.gov/CBBresearch/Dogan/DISEASE/), which consists of 793 annotated PubMed abstracts. You'll explore the details of these datasets to gain insight into how you can adapt _your own datasets_ to these types of applications.<br>
    You'll explore:
    - The Corpus Annotated Data
    - The Text Classification Dataset
    - The NER Dataset
<br><br>
1. [Build a Text Classifier](020_TextClassification.ipynb)<br>
    Build a BERT-based multi-class text classification project using the [NVIDIA NeMo](https://developer.nvidia.com/nvidia-nemo) open-source toolkit. NeMo is built on [PyTorch Lightning](https://www.pytorchlightning.ai/).<br>
    - Build a Text Classification Project
    - Quickly Run Experiments from the Command Line
    - Train and Test with PyTorch Lightning    
    - Select a Pretrained BERT model
    - Visualize the Model Accuracy
<br><br>
1. [Build a Named Entity Recognizer](030_NamedEntityRecognition.ipynb)<br>
   Build a domain-specific NER (named entity recognition) project with NVIDIA NeMo.<br>
    - Build a Token Classification (NER task) Project
    - Train a Token Classifier from the Command Line
    - Apply a Domain-Specific Model
    - Test the NER Model from a Saved Checkpoint
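
To make the token classification idea concrete, here is a minimal sketch of how annotated entity spans can be turned into one label per token. The example sentence, the `B-Disease`/`I-Disease`/`O` tag names, and the `iob_tags` helper are illustrative assumptions, not the lab's actual data format; the notebooks show the format NeMo expects.

```python
def iob_tags(tokens, spans, label="Disease"):
    """Return one IOB tag per token.

    tokens: list of words
    spans:  list of (start, end) token-index pairs, end exclusive,
            marking hypothetical annotated entity mentions
    """
    tags = ["O"] * len(tokens)              # default: outside any entity
    for start, end in spans:
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the entity
    return tags


# Made-up sentence for illustration only (not taken from the corpus)
tokens = "The patient was diagnosed with colorectal cancer .".split()
spans = [(5, 7)]                            # "colorectal cancer"

for token, tag in zip(tokens, iob_tags(tokens, spans)):
    print(f"{token}\t{tag}")
```

A token classifier is then trained to predict the tag column from the token column, which is why NER is framed as a token classification task.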

### Production Deployment

Deploy an NLP model to a production inference server.

The deployment uses NVIDIA Triton Inference Server. The results you'll get from production inference are the same as when using the model's framework directly, but using Triton has additional benefits:
* Concurrent model execution (can run multiple models simultaneously)
* Dynamic batching (better throughput)
* Model hot replacement (can update while server is running)
* Docker container available (portable)
* Multiple framework support (TensorRT, TensorFlow, PyTorch, ONNX)
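
As a rough intuition for why dynamic batching improves throughput, the toy sketch below groups individually arriving requests into batches, closing a batch when it is full or when the oldest request in it has waited too long. This is a simplified illustration only, not Triton's scheduler; the `dynamic_batches` function and its `max_batch_size` and `max_delay` parameters are hypothetical names chosen for this example.

```python
def dynamic_batches(arrivals, max_batch_size, max_delay):
    """Group requests into batches.

    arrivals: list of (arrival_time_ms, request_id), sorted by time
    A batch is closed when it is full, or when a new request arrives
    more than max_delay ms after the batch was opened.
    """
    batches, current, opened_at = [], [], None
    for t, request_id in arrivals:
        batch_full = len(current) == max_batch_size
        waited_too_long = bool(current) and (t - opened_at > max_delay)
        if batch_full or waited_too_long:
            batches.append(current)         # send the batch to the model
            current = []
        if not current:
            opened_at = t                   # this request opens a new batch
        current.append(request_id)
    if current:
        batches.append(current)             # flush whatever is left
    return batches


arrivals = [(0, "a"), (1, "b"), (2, "c"), (50, "d"), (51, "e")]
batches = dynamic_batches(arrivals, max_batch_size=2, max_delay=10)
print(batches)
print(f"{len(arrivals)} requests served in {len(batches)} model executions")
```

Fewer, larger model executions generally make better use of a GPU, at the cost of a small added wait for some requests.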

1. [Exporting the Model](010_ExportingTheModel.ipynb)<br/>
    - Convert a model trained in PyTorch into a server-efficient format<br/>
    - Apply reduced precision and TensorRT model optimizations <br/>
2. [Hosting the Model](020_HostingTheModel.ipynb)<br/>
    - Deploy the model to production using an NVIDIA Triton Inference Server<br/>
    - Control some of NVIDIA Triton's basic features via the model configuration<br/>
    - Evaluate the impact of export format and configuration choices on performance and cost<br/>
3. [Server Performance](030_ServerPerformance.ipynb)<br/>
    - Evaluate the impact of different Triton configuration options on serving performance<br/>
    - Monitor the performance of inference in production <br/>
4. [Using the Model](040_UsingTheModel.ipynb)<br/>
    - Build a simple application that can take advantage of the API exposed by Triton<br/>
    - Discuss the options for more complex application and model pipeline deployments<br/>
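
To give a sense of what a client application sends to the server, the sketch below builds (but does not send) a JSON inference request in the shape used by Triton's HTTP/REST inference protocol. The model name `bert_classifier`, the input name `input_ids`, the `INT64` datatype, the token ids, and the `localhost:8000` address are all placeholder assumptions; the real values depend on how the model is exported and configured in the notebooks.

```python
import json


def build_infer_request(input_name, token_ids):
    """Build a request body with a single [1, sequence_length] input tensor."""
    return {
        "inputs": [
            {
                "name": input_name,              # must match the model's input name
                "shape": [1, len(token_ids)],    # batch of one sequence
                "datatype": "INT64",             # assumed input type
                "data": token_ids,               # flattened tensor contents
            }
        ]
    }


MODEL_NAME = "bert_classifier"                   # hypothetical model name
url = f"http://localhost:8000/v2/models/{MODEL_NAME}/infer"
body = build_infer_request("input_ids", [101, 2023, 2003, 102])  # placeholder ids

print("POST", url)
print(json.dumps(body, indent=2))
```

In practice an application would POST this body to the server, or use a Triton client library that assembles the request for you, and then decode the returned output tensors into labels.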
