# NLP tasks with a simple interface ✨

## Introduction
This notebook demonstrates how to perform various Natural Language Processing (NLP) tasks using different models from Hugging Face, integrated with Gradio for creating simple web interfaces.

### Install and Import Libraries
Here is a brief description of the required libraries:
- The `python-dotenv` library loads environment variables from a `.env` file into your application's environment. This keeps sensitive configuration details such as API keys and database credentials out of the code.

- Gradio is a library for building web-based interfaces for machine learning models and data pipelines. It lets you create interactive demos with minimal code.

In [1]:
## Install and update the necessary libraries
%pip install python-dotenv 
%pip install gradio

Collecting python-dotenv
  Downloading python_dotenv-1.0.1-py3-none-any.whl.metadata (23 kB)
Downloading python_dotenv-1.0.1-py3-none-any.whl (19 kB)
Installing collected packages: python-dotenv
Successfully installed python-dotenv-1.0.1
Note: you may need to restart the kernel to use updated packages.


- Loading API Keys and Libraries

In [1]:
# Import the required libraries
import os  # access to environment variables and other OS-dependent functionality
import io  # core tools for working with streams of data
import base64  # encodes and decodes data in base64 format
import json  # serializes the request payload
import textwrap  # wraps long summaries for display

import requests  # sends HTTP requests to the Hugging Face API
import gradio as gr  # builds the web interfaces
from dotenv import load_dotenv, find_dotenv  # loads variables from a .env file
from IPython.display import Image, display, HTML  # displays rich content (e.g., images, HTML) in Jupyter Notebooks
from PIL import Image as PILImage  # Pillow, for opening, manipulating, and saving images; aliased so it does not shadow IPython's Image

# Load environment variables from .env file
load_dotenv(find_dotenv())
hf_api_key = os.getenv('HF_API_KEY')
endpoint_url = os.getenv('HF_API_SUMMARY_BASE')

# Uncomment the following lines to print the HF API key and endpoint URL
# (the key is a secret: clear the cell output before sharing the notebook)
#print("HF API Key:", hf_api_key)
#print("Endpoint URL:", endpoint_url)

HF API Key: hf_********
Endpoint URL: https://api-inference.huggingface.co/models/facebook/bart-large-cnn
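
Because the API key and the endpoint URLs all come from the environment, a missing entry only surfaces later as a failed request. The following is a minimal sketch of an early check using only the standard library. The helper name `missing_env_vars` is hypothetical; the variable names are the ones this notebook reads.

```python
import os
from typing import Iterable, List, Mapping, Optional


def missing_env_vars(names: Iterable[str],
                     env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Variable names read elsewhere in this notebook.
REQUIRED = ["HF_API_KEY", "HF_API_SUMMARY_BASE", "HF_API_NER_BASE"]

missing = missing_env_vars(REQUIRED)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Running this right after `load_dotenv(find_dotenv())` reports which entries are absent from the `.env` file without printing any secret values.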


### Helper Function for Summarization
We'll define a helper function to interact with the Hugging Face API for text summarization.

In [2]:
# Send a request to the Hugging Face API and return the parsed JSON response (None if the request fails)
def get_completion(inputs, parameters=None, endpoint_url=None):
    if not endpoint_url:
        endpoint_url = os.getenv('HF_API_SUMMARY_BASE')
    headers = {
        "Authorization": f"Bearer {hf_api_key}",
        "Content-Type": "application/json"
    }
    data = {"inputs": inputs}
    if parameters:
        data.update({"parameters": parameters})
    try:
        response = requests.post(endpoint_url, headers=headers, data=json.dumps(data))
        response.raise_for_status()
        return response.json()
    except requests.exceptions.RequestException as e:
        print(f"Error: {e}")
        return None
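
The optional `parameters` argument is placed in the JSON body next to `inputs`. The sketch below builds the same payload without sending it, so its shape can be inspected. `build_payload` is a hypothetical helper, and `min_length` and `max_length` are assumed to be accepted by the summarization endpoint; check the model's API documentation before relying on them.

```python
import json
from typing import Any, Dict, Optional


def build_payload(inputs: str,
                  parameters: Optional[Dict[str, Any]] = None) -> str:
    """Serialize the request body the same way get_completion does."""
    data: Dict[str, Any] = {"inputs": inputs}
    if parameters:
        data["parameters"] = parameters
    return json.dumps(data)


# Assumed parameter names, shown for illustration only.
print(build_payload("Some long text.", {"min_length": 10, "max_length": 60}))
```

With the real helper, the equivalent call would be `get_completion(text, parameters={"min_length": 10, "max_length": 60})`.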

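**Note** Running the summarization locally: if you prefer to run the model on your own machine instead of calling the API, you can use the Transformers library. The snippet below replaces the API-based `get_completion` with a local pipeline.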
```py
from transformers import pipeline

get_completion = pipeline("summarization", model="shleifer/distilbart-cnn-12-6")

def summarize(input):
    output = get_completion(input)
    return output[0]['summary_text']

```

## Building a text summarization app
We'll create a simple text summarization app using Gradio.

- Example 1 Text Summarization

In [3]:
text = ('''The tower is 324 metres (1,063 ft) tall, about the same height
        as an 81-storey building, and the tallest structure in Paris. 
        Its base is square, measuring 125 metres (410 ft) on each side. 
        During its construction, the Eiffel Tower surpassed the Washington 
        Monument to become the tallest man-made structure in the world,
        a title it held for 41 years until the Chrysler Building
        in New York City was finished in 1930. It was the first structure 
        to reach a height of 300 metres. Due to the addition of a broadcasting 
        aerial at the top of the tower in 1957, it is now taller than the 
        Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the 
        Eiffel Tower is the second tallest free-standing structure in France 
        after the Millau Viaduct.''')

get_completion(text)

[{'summary_text': 'The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building. Its base is square, measuring 125 metres (410 ft) on each side. It is the second tallest free-standing structure in France after the Millau Viaduct.'}]

In [4]:
output = get_completion(text)
# Extract and print the summary text
if output and 'summary_text' in output[0]:
    summary = output[0]['summary_text']
    formatted_text = textwrap.fill(summary, width=80)
    # Uncomment the following line to print the formatted_text
    #print(formatted_text)
    print(f"The summary of the given text is:\n{formatted_text}")


The summary of the given text is:
The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey
building. Its base is square, measuring 125 metres (410 ft) on each side. It is
the second tallest free-standing structure in France after the Millau Viaduct.
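
The cell above assumes the response is a non-empty list of dictionaries. A slightly more defensive sketch is shown below: it returns `None` for any other shape, such as the `None` that `get_completion` returns on a failed request. The helper name `extract_summary` and the error dictionary in the last call are hypothetical.

```python
import textwrap
from typing import Any, Optional


def extract_summary(output: Any, width: int = 80) -> Optional[str]:
    """Return the wrapped summary text, or None if the response has another shape."""
    if not isinstance(output, list) or not output:
        return None
    first = output[0]
    if not isinstance(first, dict) or "summary_text" not in first:
        return None
    return textwrap.fill(first["summary_text"], width=width)


sample = [{"summary_text": "The tower is 324 metres (1,063 ft) tall."}]
print(extract_summary(sample))
print(extract_summary(None))                 # failed request
print(extract_summary({"error": "example"}))  # hypothetical error body
```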


## Creating the Gradio Interface

In [5]:
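def summarize(input):
    output = get_completion(input)
    # get_completion returns None when the request fails
    if not output:
        return "Summarization failed; check the API key and endpoint URL."
    return output[0]['summary_text']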

gr.close_all()
demo = gr.Interface(fn=summarize, 
                    inputs=[gr.Textbox(label="Text to summarize", lines=6)], 
                    outputs=[gr.Textbox(label="Result", lines=3)], 
                    title="Text Summarization with distilbart-cnn",
                    description="Summarize any text using the `shleifer/distilbart-cnn-12-6` model under the hood!")
demo.launch(share=True, server_port=int(os.getenv('PORT1', 7860)))

* Running on local URL:  http://127.0.0.1:7860
* Running on public URL: https://190761eae4f2846584.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




In [6]:
# Close the demo
gr.close_all()

Closing server running on port: 7860
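
The demos in this notebook read their port from an environment variable with `int(os.getenv(...))`, which raises a `ValueError` if the variable holds a non-numeric value. A minimal sketch of a more forgiving lookup follows; `port_from_env` is a hypothetical helper, not part of Gradio.

```python
import os


def port_from_env(name: str, default: int) -> int:
    """Read a TCP port from an environment variable, falling back to a default."""
    raw = os.getenv(name)
    if raw is None or not raw.strip():
        return default
    try:
        port = int(raw)
    except ValueError:
        return default  # not a number
    # Valid TCP ports are 1 to 65535.
    return port if 1 <= port <= 65535 else default


print(port_from_env("PORT1", 7860))
```

The result could then be passed as `server_port=port_from_env("PORT1", 7860)` when launching a demo.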


## Building a Named Entity Recognition App
**Note** This section uses an [Inference Endpoint](https://huggingface.co/inference-endpoints) for `dslim/bert-base-NER` (its URL is read from `HF_API_NER_BASE`), a 108M-parameter BERT model fine-tuned for the NER task.

**Note** Running NER locally: if you prefer to run NER on your own machine, you can use the Transformers library.
```py
from transformers import pipeline

get_completion = pipeline("ner", model="dslim/bert-base-NER")

def ner(input):
    output = get_completion(input)
    return {"text": input, "entities": output}
    
```

- Performing Named Entity Recognition (NER) Using an API

 The code below sends a text string to the NER endpoint and prints the parsed response. It reuses the `get_completion` function, passing the endpoint URL read from the `HF_API_NER_BASE` environment variable. The sample sentence contains a name, an organization, and a location for the model to extract.

In [7]:
API_URL = os.getenv('HF_API_NER_BASE')
text = "My name is Michela, I'm learning from DeepLearningAI and I live in Italy"
output = get_completion(text, parameters=None, endpoint_url=API_URL)
print(output)

[{'entity_group': 'PER', 'score': 0.8131653666496277, 'word': 'Michela', 'start': 11, 'end': 18}, {'entity_group': 'ORG', 'score': 0.9298146963119507, 'word': 'DeepLearningA', 'start': 38, 'end': 51}, {'entity_group': 'LOC', 'score': 0.9996592998504639, 'word': 'Italy', 'start': 67, 'end': 72}]


**Explanation of the output:** The output above is the result of running NER on the input text: the model identifies words or phrases and classifies them into predefined categories. Each entry contains:
- `entity_group`: the category (`PER` for a person, `ORG` for an organization, `LOC` for a location).
- `score`: the model's confidence that the classification is correct.
- `word`: the word or phrase identified as an entity in the input text.
- `start` and `end`: the character offsets of the entity in the input text.

Three entities were detected: "Michela" as a person (81.3% confidence), "DeepLearningA" as an organization (93.0% confidence), and "Italy" as a location (99.97% confidence). Note that the model cut the organization name short, returning "DeepLearningA" rather than "DeepLearningAI".
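
To make these fields concrete, the sketch below reuses the response printed above, slices the input text with each entity's `start` and `end` offsets, and keeps only entities above a confidence threshold. The helper `confident_entities` and the threshold of 0.9 are illustrative assumptions, not part of the API.

```python
text = "My name is Michela, I'm learning from DeepLearningAI and I live in Italy"

# Response copied from the cell output above.
output = [
    {'entity_group': 'PER', 'score': 0.8131653666496277, 'word': 'Michela', 'start': 11, 'end': 18},
    {'entity_group': 'ORG', 'score': 0.9298146963119507, 'word': 'DeepLearningA', 'start': 38, 'end': 51},
    {'entity_group': 'LOC', 'score': 0.9996592998504639, 'word': 'Italy', 'start': 67, 'end': 72},
]


def confident_entities(entities, threshold=0.9):
    """Keep only the entities whose score reaches the threshold."""
    return [e for e in entities if e["score"] >= threshold]


for e in output:
    span = text[e["start"]:e["end"]]  # the offsets index into the original text
    print(f"{e['entity_group']:<4} {span:<15} {e['score']:.3f}")

print([e["word"] for e in confident_entities(output)])  # ['DeepLearningA', 'Italy']
```

Slicing with the offsets reproduces each `word` exactly, which is what lets Gradio highlight the entities in place.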

## Creating the Gradio Interface for NER

In [8]:
def ner(input):
    output = get_completion(input, parameters=None, endpoint_url=API_URL)
    return {"text": input, "entities": output}

gr.close_all()
demo = gr.Interface(fn=ner,
                    inputs=[gr.Textbox(label="Text to find entities", lines=2)],
                    outputs=[gr.HighlightedText(label="Text with entities")],
                    title="NER with dslim/bert-base-NER",
                    description="Find entities using the `dslim/bert-base-NER` model under the hood!",
                    allow_flagging="never",
                    examples=["My name is Michela and I live in Italy", "My name is Andrew and work at HuggingFace"])
demo.launch(share=True, server_port=int(os.getenv('PORT2', 7870)))

Closing server running on port: 7860




* Running on local URL:  http://127.0.0.1:7870
* Running on public URL: https://3f32dc6402b1b332fe.gradio.live





In [9]:
# Close the Demo
gr.close_all()

Closing server running on port: 7870
Closing server running on port: 7860


### Adding a helper function to merge tokens

In [12]:
# Mock function simulating a token-level API response
# (this redefines get_completion, replacing the API-based helper defined earlier)
def get_completion(input, parameters=None, ENDPOINT_URL=None):
    # Simulating tokens, some of which might be missing the 'entity' key
    return [
        {"word": "My", "start": 0, "end": 2, "score": 0.95, "entity": "B-PER"},
        {"word": "name", "start": 3, "end": 7, "score": 0.98, "entity": "I-PER"},
        {"word": "is", "start": 8, "end": 10, "score": 0.90},
        {"word": "Andrew", "start": 11, "end": 17, "score": 0.99, "entity": "I-PER"},
        {"word": "and", "start": 18, "end": 21, "score": 0.88},
        {"word": "I", "start": 22, "end": 23, "score": 0.93, "entity": "B-PER"},
        {"word": "live", "start": 24, "end": 28, "score": 0.97, "entity": "O"},
        {"word": "in", "start": 29, "end": 31, "score": 0.89, "entity": "O"},
        {"word": "California", "start": 32, "end": 42, "score": 0.96, "entity": "B-LOC"},
    ]

# Helper function: merge consecutive tokens that belong to the same entity
def merge_tokens(tokens):
    merged_tokens = []
    for token in tokens:
        if 'entity' not in token:
            continue  # Skip tokens without 'entity'
        # An 'I-' token continues the previous entity when the entity types match
        if merged_tokens and token['entity'].startswith('I-') and merged_tokens[-1]['entity'].endswith(token['entity'][2:]):
            last_token = merged_tokens[-1]
            last_token['word'] += token['word'].replace('##', '')  # drop the WordPiece continuation marker
            last_token['end'] = token['end']
            last_token['score'] = (last_token['score'] + token['score']) / 2  # running pairwise average
        else:
            merged_tokens.append(dict(token))  # copy so the input tokens are not mutated
    return merged_tokens

# Main NER function
def ner(input):
    try:
        output = get_completion(input, parameters=None, ENDPOINT_URL=None)
        merged_tokens = merge_tokens(output)
        return {"text": input, "entities": merged_tokens}
    except Exception as e:
        print(f"Error: {e}")
        return {"text": input, "entities": []}

# Gradio Interface
gr.close_all()
demo = gr.Interface(
    fn=ner,
    inputs=[gr.Textbox(label="Text to find entities", lines=2)],
    outputs=[gr.HighlightedText(label="Text with entities")],
    title="NER with dslim/bert-base-NER",
    description="Find entities using the `dslim/bert-base-NER` model under the hood!",
    allow_flagging="never",
    examples=[
        "My name is Andrew, I'm building DeeplearningAI and I live in California",
        "My name is Michela, I live in Italy and learn from HuggingFace"
    ]
)

# Launch the app
demo.launch(share=True, server_port=int(os.environ.get('PORT4', 7860)))


Closing server running on port: 7870
Closing server running on port: 7860
Closing server running on port: 7880




* Running on local URL:  http://127.0.0.1:7890
* Running on public URL: https://199c0bb68dc1f208a1.gradio.live





In [13]:
# Close the demo
gr.close_all()

Closing server running on port: 7870
Closing server running on port: 7860
Closing server running on port: 7890
Closing server running on port: 7880
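
The `merge_tokens` helper above joins every continuation token without a space, keeps tokens tagged `O`, and averages scores pairwise, which weights later tokens more heavily. The sketch below is one possible variant, not the notebook's function: it skips `O` tokens, inserts a space unless the token starts with the `##` subword marker, and keeps a true mean of the scores. The name `merge_entity_tokens` and the sample tokens are hypothetical.

```python
from typing import Dict, List


def merge_entity_tokens(tokens: List[Dict]) -> List[Dict]:
    """Merge B-/I- tagged tokens into whole entities (a variant sketch)."""
    merged: List[Dict] = []
    counts: List[int] = []  # how many tokens were folded into each merged entity
    for token in tokens:
        label = token.get("entity")
        if not label or label == "O":
            continue  # skip untagged tokens and tokens outside any entity
        continues_previous = (
            bool(merged)
            and label.startswith("I-")
            and merged[-1]["entity"][2:] == label[2:]
        )
        if continues_previous:
            last = merged[-1]
            word = token["word"]
            # "##" marks a subword continuation; other tokens are separate words
            last["word"] += word[2:] if word.startswith("##") else " " + word
            last["end"] = token["end"]
            n = counts[-1]
            last["score"] = (last["score"] * n + token["score"]) / (n + 1)
            counts[-1] = n + 1
        else:
            merged.append(dict(token))  # copy so the caller's tokens stay untouched
            counts.append(1)
    return merged


# Hypothetical token-level output, for illustration only.
tokens = [
    {"entity": "B-PER", "word": "Mich", "start": 11, "end": 15, "score": 1.0},
    {"entity": "I-PER", "word": "##ela", "start": 15, "end": 18, "score": 0.5},
    {"entity": "O", "word": "in", "start": 26, "end": 28, "score": 0.9},
    {"entity": "B-LOC", "word": "New", "start": 29, "end": 32, "score": 0.8},
    {"entity": "I-LOC", "word": "York", "start": 33, "end": 37, "score": 0.9},
]

print([t["word"] for t in merge_entity_tokens(tokens)])  # ['Michela', 'New York']
```

Tracking a per-entity token count is what allows the mean to stay exact as more tokens are merged.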


## Conclusion
In this notebook, we demonstrated how to perform text summarization and named entity recognition using models from Hugging Face. We also showed how to create interactive web interfaces using Gradio. For more advanced applications, consider exploring additional models and features provided by:
- [Gradio Documentation](https://gradio.app)
- [Hugging Face API](https://huggingface.co/docs)




## Next Steps

- Experiment with Different Prompts: Try using other prompts!
- Experiment with Different Models: Try using other text summarization and named entity recognition models available on Hugging Face to see how their performance compares!
- Deploy the App: Deploy your Gradio app to Hugging Face Spaces to make it accessible to others!