# **State-of-the-art NLP Made Easy with [AdaptNLP](https://www.github.com/novetta/adaptnlp)**

# 1. Today's Objective: Generate Enriched Data from Unstructured Text

### *Prerequisite: Install AdaptNLP*

In [None]:
!pip install adaptnlp

### *Prerequisite: Show Unstructured Text Example*

In [None]:
from IPython.display import HTML, display

example_text = """
The history (and prehistory) of the United States, started with the arrival of Native Americans before 15,000 B.C. Numerous indigenous cultures formed, and many disappeared before 1500. The arrival of Christopher Columbus in the year 1492 started the European colonization of the Americas. Most colonies were formed after 1600, and the early records and writings of John Winthrop make the United States the first nation whose most distant origins are fully recorded.[1] By the 1760s, the thirteen British colonies contained 2.5 million people along the Atlantic Coast east of the Appalachian Mountains. After defeating France, the British government imposed a series of taxes, including the Stamp Act of 1765, rejecting the colonists' constitutional argument that new taxes needed their approval. Resistance to these taxes, especially the Boston Tea Party in 1773, led to Parliament issuing punitive laws designed to end self-government in Massachusetts.

Armed conflict began in 1775. In 1776, in Philadelphia, the Second Continental Congress declared the independence of the colonies as the United States. Led by General George Washington, it won the Revolutionary War with large support from France. The peace treaty of 1783 gave the land east of the Mississippi River (except Canada and Florida) to the new nation. The Articles of Confederation established a central government, but it was ineffectual at providing stability as it could not collect taxes and had no executive officer. A convention in 1787 wrote a new Constitution that was adopted in 1789. In 1791, a Bill of Rights was added to guarantee inalienable rights. With Washington as the first president and Alexander Hamilton his chief adviser, a strong central government was created. Purchase of the Louisiana Territory from France in 1803 doubled the size of the United States. A second and final war with Britain was fought in 1812, which solidified national pride.

Encouraged by the notion of manifest destiny, U.S. territory expanded all the way to the Pacific Coast. While the United States was large in terms of area, by 1790 its population was only 4 million. It grew rapidly, however, reaching 7.2 million in 1810, 32 million in 1860, 76 million in 1900, 132 million in 1940, and 321 million in 2015. Economic growth in terms of overall GDP was even greater. Compared to European powers, though, the nation's military strength was relatively limited in peacetime before 1940. Westward expansion was driven by a quest for inexpensive land for yeoman farmers and slave owners. The expansion of slavery was increasingly controversial and fueled political and constitutional battles, which were resolved by compromises. Slavery was abolished in all states north of the Mason–Dixon line by 1804, but the South continued to profit from the institution, mostly from the production of cotton. Republican Abraham Lincoln was elected president in 1860 on a platform of halting the expansion of slavery.

Seven Southern slave states rebelled and created the foundation of the Confederacy. Its attack of Fort Sumter against the Union forces there in 1861 started the Civil War. Defeat of the Confederates in 1865 led to the impoverishment of the South and the abolition of slavery. In the Reconstruction era following the war, legal and voting rights were extended to freed slaves. The national government emerged much stronger, and because of the Fourteenth Amendment in 1868, it gained explicit duty to protect individual rights. However, when white Democrats regained their power in the South in 1877, often by paramilitary suppression of voting, they passed Jim Crow laws to maintain white supremacy, as well as new disenfranchising state constitutions that prevented most African Americans and many poor whites from voting. This continued until the gains of the civil rights movement in the 1960s and the passage of federal legislation to enforce uniform constitutional rights for all citizens.

The United States became the world's leading industrial power at the turn of the 20th century, due to an outburst of entrepreneurship and industrialization in the Northeast and Midwest and the arrival of millions of immigrant workers and farmers from Europe. A national railroad network was completed and large-scale mines and factories were established. Mass dissatisfaction with corruption, inefficiency, and traditional politics stimulated the Progressive movement, from the 1890s to 1920s. This era led to many reforms, including the Sixteenth to Nineteenth constitutional amendments, which brought the federal income tax, direct election of Senators, prohibition, and women's suffrage. Initially neutral during World War I, the United States declared war on Germany in 1917 and funded the Allied victory the following year. Women obtained the right to vote in 1920, with Native Americans obtaining citizenship and the right to vote in 1924.

After a prosperous decade in the 1920s, the Wall Street Crash of 1929 marked the onset of the decade-long worldwide Great Depression. Democratic President Franklin D. Roosevelt ended the Republican dominance of the White House and implemented his New Deal programs, which included relief for the unemployed, support for farmers, Social Security and a minimum wage. The New Deal defined modern American liberalism. After the Japanese attack on Pearl Harbor in 1941, the United States entered World War II and financed the Allied war effort and helped defeat Nazi Germany in the European theater. Its involvement culminated in using newly-invented nuclear weapons on two Japanese cities to defeat Imperial Japan in the Pacific theater.

The United States and the Soviet Union emerged as rival superpowers in the aftermath of World War II. During the Cold War, the two countries confronted each other indirectly in the arms race, the Space Race, proxy wars, and propaganda campaigns. The goal of the United States in this was to stop the spread of communism. In the 1960s, in large part due to the strength of the civil rights movement, another wave of social reforms was enacted which enforced the constitutional rights of voting and freedom of movement to African Americans and other racial minorities. The Cold War ended when the Soviet Union was officially dissolved in 1991, leaving the United States as the world's only superpower.

After the Cold War, the United States's foreign policy has focused on modern conflicts in the Middle East. The beginning of the 21st century saw the September 11 attacks carried out by Al-Qaeda in 2001, which was later followed by wars in Afghanistan and Iraq. In 2007, the United States entered its worst economic crisis since the Great Depression, which was followed by slower-than-usual rates of economic growth during the early 2010s. Economic growth and unemployment rates recovered by the late 2010s, however new economic disruption began in 2020 due to the 2019-20 coronavirus pandemic.
"""
example_text_html = f"""
<!DOCTYPE html>
<html>
<head>
<meta name="viewport" content="width=device-width, initial-scale=1">
<style>
.collapsible {{
  background-color: #777;
  color: white;
  cursor: pointer;
  padding: 18px;
  width: 100%;
  border: none;
  text-align: left;
  outline: none;
  font-size: 15px;
}}

.active, .collapsible:hover {{
  background-color: #555;
}}

.content {{
  padding: 0 18px;
  display: none;
  overflow: hidden;
  background-color: #f1f1f1;
}}
</style>
</head>
<body>

<button type="button" class="collapsible">Example Unstructured Text</button>
<div class="content">
  <p>{example_text}</p>
</div>

<script>
var coll = document.getElementsByClassName("collapsible");
var i;

for (i = 0; i < coll.length; i++) {{
  coll[i].addEventListener("click", function() {{
    this.classList.toggle("active");
    var content = this.nextElementSibling;
    if (content.style.display === "block") {{
      content.style.display = "none";
    }} else {{
      content.style.display = "block";
    }}
  }});
}}
</script>

</body>
</html>
"""

display(HTML(example_text_html))


### *Prerequisite: Download Models and Generate Final Timeline*

In [None]:
from adaptnlp import (
    EasyTokenTagger,
    EasySequenceClassifier,
    EasyQuestionAnswering,
    EasySummarizer,
    EasyTranslator,
    EasyDocumentEmbeddings,
)
from dateutil.parser import parse
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.dates as mdates
import pprint

# Summary
summarizer = EasySummarizer()
summary = summarizer.summarize(text=example_text, model_name_or_path="t5-base", mini_batch_size=1, num_beams=4, min_length=100, max_length=200)
summary = summary[0]

# Translation of Summary
translator = EasyTranslator()
translated_summary = translator.translate(text=summary.split(" . "), model_name_or_path="t5-base", t5_prefix="translate English to French", mini_batch_size=3, min_length=0, max_length=200)
translated_summary = " . ".join(translated_summary)

# NER
nl = "\n"  # f-string expressions cannot contain a backslash before Python 3.12, so bind the newline to a name
tagger = EasyTokenTagger()
sentences = tagger.tag_text(text=example_text, model_name_or_path="ner-ontonotes-fast", mini_batch_size=32)
ner_dict = sentences[0].to_dict("ner")
ner_dict = [f"<b>{i+1}.</b> {pprint.pformat(ent).replace(nl,'<br>')}" for i, ent in enumerate(ner_dict["entities"][:6])]
ner_html = "<br>" + "<br>".join(ner_dict)

# QA
qa = EasyQuestionAnswering()
_, top_n = qa.predict_qa(query="What happened in 1776?", context=example_text, model_name_or_path="bert-large-cased-whole-word-masking-finetuned-squad", n_best_size=5, mini_batch_size=1)
top_n = [f"<b>{i+1}.</b> {pprint.pformat(dict(ans)).replace(nl,'<br>')}" for i, ans in enumerate(top_n)]
top_n_html = "<br>" + "<br>".join(top_n)    
    
# Timeline
dates = []
for span in sentences[0].get_spans("ner"):
  if span.tag == "DATE":
    dates.append(span.text)
dates = sorted(dates)

dates_map = {}
for d in dates:
  try:
    dates_map[d] = parse(d, fuzzy=True)
  except (ValueError, OverflowError):
    # Skip DATE entities that dateutil cannot turn into a datetime
    pass

# Ask "What happened in <date>" once per parsed date, each against the full text
queries = [f"What happened in {t}" for t in dates_map]
answer, _ = qa.predict_qa(query=queries, context=[example_text] * len(queries), model_name_or_path="bert-large-cased-whole-word-masking-finetuned-squad", n_best_size=7, mini_batch_size=10)

def generate_timeline(names_mat: list, dates_mat: list):
  # Choose levels
  levels = np.tile([-30, 30, -20, 20, -12, 12, -7, 7, -1, 1], int(np.ceil(len(dates_mat)/10)))[:len(dates_mat)]

  # Create figure and plot a stem plot with the date
  fig, ax = plt.subplots(figsize=(20, 6), constrained_layout=True)
  ax.set_title("Timeline of Significant Events in U.S. History", fontsize=30, fontweight='bold')
  markerline, stemline, baseline = ax.stem(dates_mat, levels, linefmt="C3-", basefmt="k-")
  plt.setp(markerline, mec="k", mfc="w", zorder=3)

  # Shift the markers to the baseline by replacing the y-data by zeros.
  markerline.set_ydata(np.zeros(len(dates_mat)))

  # Annotate lines
  vert = np.array(['top', 'bottom'])[(levels > 0).astype(int)]
  for d, l, r, va in zip(dates_mat, levels, names_mat, vert):
    ax.annotate(r, xy=(d, l), xytext=(-3, np.sign(l)*3), textcoords="offset points", va=va, ha="right")

  # Format xaxis with AutoDateLocator
  ax.get_xaxis().set_major_locator(mdates.AutoDateLocator())
  ax.get_xaxis().set_major_formatter(mdates.DateFormatter("%b %Y"))
  plt.setp(ax.get_xticklabels(), rotation=30, ha="right")

  # Remove y axis and spines
  ax.get_yaxis().set_visible(False)
  for spine in ["left", "top", "right"]:
    ax.spines[spine].set_visible(False)

  ax.margins(y=0.1)
  plt.show()

names_mat = list(answer.values())[:30]
dates_mat = list(dates_map.values())[:30]

generate_timeline(names_mat=names_mat, dates_mat=dates_mat)
    
html = f"""
<!DOCTYPE html>
<html>
<head>
<style>
.item0 {{ grid-area: timeline; }}
.item1 {{ grid-area: header; }}
.item2 {{ grid-area: menu; }}
.item3 {{ grid-area: main; }}
.item4 {{ grid-area: right; }}

.grid-container {{
  display: grid;
  grid-template:
    'timeline timeline timeline timeline timeline timeline'
    'header header main main right right'
    'menu menu main main right right';

  grid-gap: 5px;
  background-color: #777;
  padding: 5px;
}}

.grid-container > div {{
  background-color: rgba(255, 255, 255, .9);
  text-align: center;
  padding: 20px;
  font-size: 12px;
}}
</style>
</head>
<body>

<div class="grid-container">
  <div class="item0">
    <h2>Extracted Metadata using AdaptNLP</h2>
  </div>
  <div class="item1">
    <h3>Summary: </h3>
    <p style="text-align: center">{summary}</p>
  </div>
  <div class="item2">
    <h3>Translated French Summary: </h3>
    <p style="text-align: center">{translated_summary}</p>
  </div>
  <div class="item3">
    <h3>Extracted Entities: </h3>
    <p style="text-align: left">{ner_html}</p>
  </div>  
  <div class="item4">
    <h3>Top Answers to the Question: <br><em>"What happened in 1776?"</em></h3>
    <p style="text-align: left">{top_n_html}</p>
  </div>
</div>

</body>
</html>
"""
display(HTML(html))

# 2. Run NLP Tasks: Summarization, Translation, Named Entity Recognition (NER), and Question Answering (QA)

### [Documentation and Guides](https://novetta.github.io/adaptnlp)

### *Import "Easy" NLP Task Modules with AdaptNLP*

### *Set Example Text*

### *Summarize*

### *Translate*

### *Named Entity Recognition (NER)*

### *Question Answering*

# 3. Generate the Timeline: NER and QA

### *Run NER Task to Extract "Date" Tagged Entities*
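
A minimal, dependency-free sketch of this step. The `Span` tuple is a hypothetical stand-in for the spans returned by `sentences[0].get_spans("ner")`, and the sample values are hand-written from the example text, not real tagger output. Instead of the `dateutil` fuzzy parser used in the prerequisite cell, this sketch pulls a four-digit year with a regular expression and pins it to January 1. That avoids `dateutil` filling a missing month and day from today's date.

```python
import re
from collections import namedtuple
from datetime import datetime

# Hypothetical stand-in for an NER span; only the two attributes used here.
Span = namedtuple("Span", ["text", "tag"])

# Hand-written sample spans (not real tagger output).
spans = [
    Span("Christopher Columbus", "PERSON"),
    Span("1492", "DATE"),
    Span("the 1760s", "DATE"),
    Span("Philadelphia", "GPE"),
    Span("1776", "DATE"),
    Span("the following year", "DATE"),
]

# A four-digit year (1000-9999) that is not part of a longer number.
YEAR = re.compile(r"\b([1-9]\d{3})(?!\d)")


def to_datetime(text):
    """Return January 1 of the first four-digit year in `text`, or None."""
    match = YEAR.search(text)
    return datetime(int(match.group(1)), 1, 1) if match else None


# Keep only DATE spans, then drop the ones with no recoverable year.
dates_map = {}
for span in spans:
    if span.tag != "DATE":
        continue
    parsed = to_datetime(span.text)
    if parsed is not None:
        dates_map[span.text] = parsed

for text, when in dates_map.items():
    print(f"{text!r} -> {when.date()}")
```

Relative expressions such as "the following year" carry no year of their own, so this sketch drops them.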


### *Run QA Task to Extract Information on "What happened in..." Extracted Dates*
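
The data flow of this step can be sketched without downloading a model. The prerequisite cell passes `predict_qa` two parallel lists: one question per date and one copy of the context per question. Here `naive_answer` is a toy stand-in for the QA model (an assumption for illustration, not AdaptNLP's behavior): it returns the first sentence that mentions the year. The context is a short excerpt of the example text.

```python
import re

# Excerpt of the example text, used as the QA context.
context = (
    "Armed conflict began in 1775. In 1776, in Philadelphia, the Second "
    "Continental Congress declared the independence of the colonies as the "
    "United States. The peace treaty of 1783 gave the land east of the "
    "Mississippi River (except Canada and Florida) to the new nation."
)
dates = ["1775", "1776", "1783"]

# One question per date, each paired with the same full context.
queries = [f"What happened in {d}" for d in dates]
contexts = [context] * len(queries)


def naive_answer(query, ctx):
    """Toy stand-in for a QA model: first sentence that mentions the year."""
    year = re.search(r"\d{4}", query).group()
    for sentence in re.split(r"(?<=\.)\s+", ctx):
        if year in sentence:
            return sentence
    return ""


# Dicts preserve insertion order, so the answers stay aligned with `dates`.
answers = {q: naive_answer(q, c) for q, c in zip(queries, contexts)}
names_mat = list(answers.values())

for date, name in zip(dates, names_mat):
    print(date, "->", name)
```

Keeping the questions in the same order as the dates is what lets the answers be zipped with the parsed dates when plotting.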

### *Generate Text Timeline*
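
A minimal sketch of a plain-text timeline. The `events` list is hand-written from the example text and stands in for the parsed dates and QA answers; `text_timeline` is a hypothetical helper, not part of AdaptNLP. It sorts on the `datetime` values, not the date strings, so entries come out in chronological order.

```python
from datetime import datetime

# Hand-written (date, event) pairs standing in for parsed dates and QA answers.
events = [
    (datetime(1861, 1, 1), "Attack on Fort Sumter started the Civil War"),
    (datetime(1492, 1, 1), "Arrival of Christopher Columbus"),
    (datetime(1929, 1, 1), "Wall Street Crash"),
    (datetime(1776, 1, 1), "Second Continental Congress declared independence"),
]


def text_timeline(pairs):
    """Return one 'YYYY  event' line per entry, oldest first."""
    ordered = sorted(pairs, key=lambda pair: pair[0])
    return "\n".join(f"{when.year:>4}  {what}" for when, what in ordered)


print(text_timeline(events))
```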

### *Generate Stem Timeline with Matplotlib*

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.dates as mdates
from datetime import datetime

def generate_timeline(names_mat: list, dates_mat: list):
  # Choose levels
  levels = np.tile([-30, 30, -20, 20, -12, 12, -7, 7, -1, 1], int(np.ceil(len(dates_mat)/10)))[:len(dates_mat)]

  # Create figure and plot a stem plot with the date
  fig, ax = plt.subplots(figsize=(20, 6), constrained_layout=True)
  ax.set_title("Timeline of Significant Events in U.S. History", fontsize=30, fontweight='bold')
  markerline, stemline, baseline = ax.stem(dates_mat, levels, linefmt="C3-", basefmt="k-")
  plt.setp(markerline, mec="k", mfc="w", zorder=3)

  # Shift the markers to the baseline by replacing the y-data by zeros.
  markerline.set_ydata(np.zeros(len(dates_mat)))

  # Annotate lines
  vert = np.array(['top', 'bottom'])[(levels > 0).astype(int)]
  for d, l, r, va in zip(dates_mat, levels, names_mat, vert):
    ax.annotate(r, xy=(d, l), xytext=(-3, np.sign(l)*3), textcoords="offset points", va=va, ha="right")

  # Format xaxis with AutoDateLocator
  ax.get_xaxis().set_major_locator(mdates.AutoDateLocator())
  ax.get_xaxis().set_major_formatter(mdates.DateFormatter("%b %Y"))
  plt.setp(ax.get_xticklabels(), rotation=30, ha="right")

  # Remove y axis and spines
  ax.get_yaxis().set_visible(False)
  for spine in ["left", "top", "right"]:
    ax.spines[spine].set_visible(False)

  ax.margins(y=0.1)
  plt.show()
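
The `levels` line in `generate_timeline` repeats a fixed pattern of ten stem heights so that neighboring labels alternate above and below the baseline at different distances. A dependency-free sketch of the same idea follows; `stem_levels` is a hypothetical helper that uses `itertools` in place of `np.tile`.

```python
from itertools import cycle, islice

# Same ten-value pattern as in generate_timeline.
PATTERN = [-30, 30, -20, 20, -12, 12, -7, 7, -1, 1]


def stem_levels(n):
    """Return the first n stem heights, repeating PATTERN as needed."""
    return list(islice(cycle(PATTERN), n))


levels = stem_levels(13)

# Labels above the baseline anchor at their bottom edge, labels below at their top edge.
alignments = ["bottom" if level > 0 else "top" for level in levels]

print(levels)
print(alignments)
```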

# Additional AdaptNLP Resources (All Open and Publicly Available)

### *Tutorials for NLP Tasks with AdaptNLP*

  1. Token Classification: NER, POS, Chunk, and Frame Tagging
      - [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Novetta/adaptnlp/blob/master/tutorials/1.%20Token%20Classification/token_tagging.ipynb)
  2. Sequence Classification: Sentiment
      - [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Novetta/adaptnlp/blob/master/tutorials/2.%20Sequence%20Classification/sequence_classification.ipynb)
  3. Embeddings: Transformer Embeddings e.g. BERT, XLM, GPT2, XLNet, RoBERTa, ALBERT
      - [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Novetta/adaptnlp/blob/master/tutorials/3.%20Embeddings/embeddings.ipynb)
  4. Question Answering: Span-based Question Answering Model
      - [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Novetta/adaptnlp/blob/master/tutorials/4.%20Question%20Answering/question_answering.ipynb)
  5. Summarization: Abstractive and Extractive
      - [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Novetta/adaptnlp/blob/master/tutorials/5.%20Summarization/summarization.ipynb)
  6. Translation: Seq2Seq
      - [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Novetta/adaptnlp/blob/master/tutorials/6.%20Translation/translation.ipynb)

### *Tutorials for Fine-tuning and Training Custom Models with AdaptNLP*

 1. Training a Sequence Classifier
   - [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Novetta/adaptnlp/blob/master/tutorials/Finetuning%20and%20Training%20(Advanced)/sequence_classification_training.ipynb)
 2. Fine-tuning a Transformers Language Model
   - [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Novetta/adaptnlp/blob/master/tutorials/Finetuning%20and%20Training%20(Advanced)/fine_tuning.ipynb)
  
Check out the [documentation](https://novetta.github.io/adaptnlp) for more information.
  

## *NVIDIA Docker and Configurable AdaptNLP REST Microservices*

  1. Official AdaptNLP Docker images are available on [Docker Hub](https://hub.docker.com/r/achangnovetta/adaptnlp).
  2. REST microservices built with AdaptNLP and FastAPI are also available on [Docker Hub](https://hub.docker.com/r/achangnovetta/adaptnlp-rest).

All images can be built with GPU support if NVIDIA-Docker is correctly installed.