# BUILDING NLP WEB APPS WITH GRADIO AND HUGGING FACE TRANSFORMERS

After loading two summarization models in parallel for comparison, let's now load two models in *series* for two distinct tasks: first, to translate Chinese text into English, and then to summarize the English output.

The results for "chain linking" transformer models are frankly quite patchy, but as model quality improves over time, this will be an interesting area to watch.

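```python
import gradio as gr
import warnings
import re

# gradio.mix (Series, Parallel) and gr.inputs / gr.outputs belong to the older
# Gradio API used in this notebook; newer Gradio releases no longer provide them
from gradio.mix import Series
from transformers import pipeline, MarianMTModel, MarianTokenizer

# Silence library warnings to keep the notebook output tidy
warnings.filterwarnings("ignore")
```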


# 1. DEFINE TEXT CLEANING, TRANSLATION AND SUMMARIZATION FUNCTIONS 

```python
# Tweak the text cleaning function further if you wish

def clean_text(text):
    # Replace carriage returns, newlines and tabs with spaces
    # (a separate "\n\n" substitution is redundant once single newlines are replaced)
    text = re.sub(r"[\r\n\t]", " ", text)
    # Collapse runs of spaces into a single space and trim both ends
    text = re.sub(" +", " ", text).strip()
    return text
```
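Chinese text pasted from web pages often contains full-width (ideographic) spaces, U+3000, and `clean_text` leaves those untouched inside the text because it only collapses ASCII spaces. A minimal alternative, assuming you are happy to treat every Unicode whitespace character as a separator, is a single `\s+` substitution. `clean_text_compact` is a hypothetical name for illustration, not part of the original notebook.

```python
import re


def clean_text_compact(text: str) -> str:
    # Hypothetical variant of clean_text: on str patterns, Python's \s matches
    # Unicode whitespace, including newlines, tabs and the full-width space U+3000
    return re.sub(r"\s+", " ", text).strip()


print(clean_text_compact("今天\n\n天气\t很好\u3000 。 "))  # 今天 天气 很好 。
```

The trade-off is that this version is less explicit about which characters it touches, so keep the original if you need finer control.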

## 1.1 SUMMARIZATION VIA HUGGING FACE PIPELINE

I'm re-using the code from notebook2.0. Feel free to use a different summarization model if you wish.

```python
pipeline_summ = pipeline(
    "summarization",
    model="facebook/bart-large-cnn",  # switch out to "t5-small" etc. if you wish
    tokenizer="facebook/bart-large-cnn",  # as above
    framework="pt",
)


# Second step in the series: summarize the English translation
def fb_summarizer(text):
    input_text = clean_text(text)
    # truncation=True cuts inputs longer than the model's maximum length
    # (1,024 tokens for BART) instead of failing on long translations
    results = pipeline_summ(input_text, truncation=True)
    return results[0]["summary_text"]


# Second of the 2 Gradio apps that we'll put in "series"
summary = gr.Interface(
    fn=fb_summarizer, inputs=gr.inputs.Textbox(), outputs=gr.outputs.Textbox()
)
```


## 1.2 TRANSLATION VIA MARIANMT

There are several translation models to choose from on Hugging Face's model hub. I've tried out the MarianMT models in [previous repos](https://github.com/chuachinhon/practical_nlp), and will stick with them here.
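The Helsinki-NLP checkpoints on the Hub follow the naming pattern `Helsinki-NLP/opus-mt-{source}-{target}`, as in the `opus-mt-zh-en` model used in this notebook. A small helper can make switching language pairs less error-prone; `opus_mt_model_id` is hypothetical, and it only builds a name, so check that the checkpoint actually exists on the Hub before relying on it.

```python
def opus_mt_model_id(source: str, target: str) -> str:
    # Hypothetical helper: build a Helsinki-NLP OPUS-MT checkpoint name from
    # language codes. It does NOT verify that the checkpoint exists on the Hub.
    return f"Helsinki-NLP/opus-mt-{source}-{target}"


print(opus_mt_model_id("zh", "en"))  # Helsinki-NLP/opus-mt-zh-en
```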

```python
# Switch out the MarianMT model as required for your use case
model_name = "Helsinki-NLP/opus-mt-zh-en"

# Load the model and tokenizer once, rather than on every request
model = MarianMTModel.from_pretrained(model_name)
tokenizer = MarianTokenizer.from_pretrained(model_name)


# First step in the series: translate Chinese text into English
def cn_to_eng(text):
    input_text = clean_text(text)

    # Calling the tokenizer directly replaces the deprecated prepare_seq2seq_batch();
    # truncation keeps the input within the model's maximum length
    batch = tokenizer([input_text], return_tensors="pt", padding=True, truncation=True)

    output = model.generate(**batch)

    translated = tokenizer.batch_decode(output, skip_special_tokens=True)

    return translated[0]


# First of the 2 Gradio apps that we'll put in "series"
translation = gr.Interface(
    fn=cn_to_eng, inputs=gr.inputs.Textbox(), outputs=gr.outputs.Textbox()
)
```
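Because `clean_text` flattens the whole paste into a single line, `cn_to_eng` translates it as one long sequence, while OPUS-MT checkpoints were trained on sentence-aligned data and have a fixed maximum input length. One possible workaround, sketched here under the assumption that sentences end with the full-width marks 。！？, is to split the Chinese text into sentences and translate them one at a time. `split_zh_sentences` is a hypothetical helper, not part of the original notebook.

```python
import re
from typing import List


def split_zh_sentences(text: str) -> List[str]:
    # Hypothetical helper: split on full-width sentence-ending punctuation,
    # keeping any run of marks (e.g. "？！") attached to its sentence
    pieces = re.findall(r"[^。！？]+[。！？]*", text)
    # Drop whitespace-only fragments such as line breaks between sentences
    return [p.strip() for p in pieces if p.strip()]


print(split_zh_sentences("今天天气很好。我们去公园吧！你去吗？"))
# ['今天天气很好。', '我们去公园吧！', '你去吗？']
```

Each sentence could then go through `cn_to_eng` and the results be joined, e.g. `" ".join(cn_to_eng(s) for s in split_zh_sentences(text))`, trading one long `generate` call for several shorter ones.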


# 2. LAUNCH GRADIO

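Note the order of the two interfaces: `Series` feeds the output of each interface into the next as its input, so the translation app must come before the summarization app.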
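Conceptually, chaining interfaces this way behaves like function composition. The stdlib sketch below mimics it with stand-in functions to show why the order matters: `chain`, `fake_translate` and `fake_summarize` are hypothetical placeholders, not the real models. In newer Gradio releases that no longer ship `gradio.mix`, a composed function like this could be wrapped in a single `gr.Interface` instead.

```python
from functools import reduce
from typing import Callable


def chain(*steps: Callable[[str], str]) -> Callable[[str], str]:
    # Hypothetical helper: pass the output of each step to the next, in order
    def run(text: str) -> str:
        return reduce(lambda value, step: step(value), steps, text)

    return run


def fake_translate(text: str) -> str:
    # Stand-in for cn_to_eng: "translates" one known Chinese phrase
    return text.replace("你好世界", "hello world")


def fake_summarize(text: str) -> str:
    # Stand-in for an English-only summarizer: keeps the first two ASCII words
    words = [w for w in text.split() if w.isascii()]
    return " ".join(words[:2])


translate_then_summarize = chain(fake_translate, fake_summarize)
summarize_then_translate = chain(fake_summarize, fake_translate)

print(translate_then_summarize("你好世界 今天"))        # hello world
print(repr(summarize_then_translate("你好世界 今天")))  # ''
```

Summarizing first leaves the English-only stand-in with nothing to work on. With the real functions, `chain(cn_to_eng, fb_summarizer)` would be the equivalent single function.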

```python
# Order matters: the English output of `translation` becomes the input of `summary`
Series(
    translation,
    summary,
    title="Translate And Summarize Chinese Text Into English",
    inputs=gr.inputs.Textbox(lines=20, label="Paste some Chinese text here"),
    outputs=gr.outputs.Textbox(label="English Summary"),
).launch()
```


```text
Running locally at: http://127.0.0.1:7861/
To create a public link, set `share=True` in `launch()`.
Interface loading below...

(<Flask 'gradio.networking'>, 'http://127.0.0.1:7861/', None)

Your max_length is set to 142, but you input_length is only 114. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=50)
```
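The last warning comes from the summarization step: `facebook/bart-large-cnn` was asked to generate up to 142 tokens from a 114-token input. As the message suggests, generation lengths can be passed when calling the pipeline. A minimal sketch, assuming you want the summary length to scale with the input, is a hypothetical helper that derives `max_length` and `min_length` from the input token count; the ratio, floor and ceiling are arbitrary illustrative values.

```python
def summary_length_kwargs(input_length: int, ratio: float = 0.5,
                          floor: int = 10, ceiling: int = 142) -> dict:
    # Hypothetical helper: scale max_length to a fraction of the input token
    # count, clamped to [floor, ceiling], and keep min_length well below it
    max_length = max(floor, min(ceiling, int(input_length * ratio)))
    min_length = max(1, max_length // 4)
    return {"max_length": max_length, "min_length": min_length}


print(summary_length_kwargs(114))   # {'max_length': 57, 'min_length': 14}
print(summary_length_kwargs(1000))  # {'max_length': 142, 'min_length': 35}
```

Inside `fb_summarizer`, the token count could come from the pipeline's own tokenizer, e.g. `n = len(pipeline_summ.tokenizer(input_text)["input_ids"])`, followed by `pipeline_summ(input_text, **summary_length_kwargs(n))`. Passing `min_length` alongside `max_length` avoids a model default minimum that could otherwise exceed a reduced maximum.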
