<a href="https://www.nvidia.com/dli"> <img src="images/DLI_Header.png" alt="Header" style="width: 400px;"/> </a>

# Building Transformer-Based Natural Language Processing Applications
### Part 2: Self-Supervision, BERT, and Beyond

The goal of this lab is to build two example natural language processing (NLP) applications, *text classification* and *named entity recognition (NER)*, by transferring knowledge from large, high-performing, pre-trained transformer-based language models.  
<br><br><br>
<img src="images/bert_pretrain_workflow.png" width=800>


## Table of Contents

1. [Explore the Data](010_ExploreData.ipynb)<br>
    For the two class projects, you'll use datasets drawn from the [NCBI-disease corpus](https://www.ncbi.nlm.nih.gov/CBBresearch/Dogan/DISEASE/), which consists of 793 annotated PubMed abstracts. You'll explore the details of these datasets to gain insight into how you can adapt _your own datasets_ to these types of applications.<br>
    You'll view and explore:
    - The Corpus Annotated Data
    - The Text Classification Dataset
    - The NER Dataset
<br><br>
1. [Build a Text Classifier](020_TextClassification.ipynb)<br>
    You'll build a BERT-based multi-class text classification project using the [NVIDIA NeMo](https://developer.nvidia.com/nvidia-nemo) open-source toolkit. NeMo is built on [PyTorch Lightning](https://www.pytorchlightning.ai/).<br>
    You'll learn how to:
    - Build a Text Classification Project
    - Quickly Run Experiments from the Command Line
    - Train and Test with PyTorch Lightning    
    - Select a Pretrained BERT model
    - Visualize the Model Accuracy
<br><br>
1. [Build a Named Entity Recognizer](030_NamedEntityRecognition.ipynb)<br>
    You'll build a domain-specific NER (named entity recognition) project with NVIDIA NeMo.<br>
    You'll learn how to:
    - Build a Token Classification (NER task) Project
    - Train a Token Classifier from the Command Line
    - Apply a Domain-Specific Model
    - Test the NER Model from a Saved Checkpoint
<br><br>
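To make the difference between the two projects concrete, here is a minimal sketch in plain Python. It assumes simple whitespace tokenization and the commonly used IOB tagging scheme (`B-`, `I-`, `O`); the sentence, the labels, and the helper `iob_tags` are hypothetical illustrations, not the lab's actual data format or a NeMo API.

```python
# Hypothetical illustration: text classification assigns one label per text,
# while NER (token classification) assigns one tag per token.

def iob_tags(tokens, entity_spans):
    """Return one IOB tag per token.

    entity_spans is a list of (start, end, label) token-index triples,
    with `end` exclusive. Tokens outside every span are tagged "O".
    """
    tags = ["O"] * len(tokens)
    for start, end, label in entity_spans:
        tags[start] = f"B-{label}"      # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"      # remaining tokens of the entity
    return tags


sentence = "Mutations in BRCA1 are linked to breast cancer ."
tokens = sentence.split()  # assumption: simple whitespace tokenization

# Text classification: a single (made-up) label for the whole text.
text_label = "cancer"

# NER: a tag for every token; "breast cancer" covers tokens 6 and 7.
spans = [(6, 8, "Disease")]
tags = iob_tags(tokens, spans)

for token, tag in zip(tokens, tags):
    print(f"{token:10s} {tag}")
```

The classifier's target is a single value per abstract, whereas the recognizer's target is a sequence the same length as the token list. This is why the NER project is described as token classification.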

## Supplemental Material
- [Retail Case Studies](case_studies/Retail.ipynb)
- [Healthcare Case Studies](case_studies/Healthcare.ipynb)
<br><br>

## JupyterLab
For this hands-on lab, we use [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/) to manage our environment. The [JupyterLab Interface](https://jupyterlab.readthedocs.io/en/stable/user/interface.html) is a dashboard that provides access to interactive Jupyter notebooks, the folder structure of our environment, and a terminal window into the Ubuntu operating system. The first view you'll see includes a **menu bar** at the top, a **file browser** in the **left sidebar**, and a **main work area** that initially opens to the "Launcher" page.

<img src="images/jl_launcher.png">

The file browser can be navigated just like any other file explorer. Double-clicking an item opens its content in a new tab.

The main work area includes tabbed views of open files that can be closed, moved, and edited as needed. 

The notebooks, including this one, consist of a series of content and code **cells**. To execute a code cell, select it and press `Shift+Enter`, or click the "Run" button in the toolbar at the top of the notebook. If a content cell gets switched to editing mode (for example, by double-clicking it), pressing `Shift+Enter` switches it back to its readable form.

Try executing the simple print statement in the cell below.

```python
# Select this cell and press Shift+Enter to execute it
print('This is just a simple print statement')
```

```
This is just a simple print statement
```


<h2 style="color:green;">Congratulations!</h2>

You've reviewed the information about this section of the course and are ready to begin.<br>
Move on to [1.0 Explore the Data](010_ExploreData.ipynb).

