<div align="center">
<img src="https://poorit.in/image.png" alt="Poorit" width="40" style="vertical-align: middle;"> <b>AI SYSTEMS ENGINEERING 1</b>

## Unit 3: Build a Text Analyzer App

**CV Raman Global University, Bhubaneswar**  
*AI Center of Excellence*

---

</div>


### What You'll Learn

In this notebook, you will:

1. **Understand local models vs APIs** — free, private, no API keys needed
2. **Use new HuggingFace pipelines** — named entity recognition and zero-shot classification
3. **Build flexible UIs with `gr.Blocks`** — rows, columns, and tabs
4. **Create a multi-tool text analyzer app** with Gradio

**Prerequisites:** Notebook 01 (HuggingFace Pipelines & Tokenization), Unit 2 Gradio basics

---

## 1. Environment Setup

In [None]:
!pip install -q transformers torch gradio

In [None]:
import torch
import gradio as gr
from transformers import pipeline, AutoTokenizer

device = 0 if torch.cuda.is_available() else -1
print(f"Using: {'GPU' if device == 0 else 'CPU'}")

---

## 2. Local Models vs APIs

In Unit 2, we built Gradio apps powered by **API-based models** (OpenAI, Gemini). In this notebook, we'll use **local HuggingFace models** instead.

| | API Models (Unit 2) | Local Models (This Notebook) |
|---|---|---|
| **Cost** | Pay per token | No per-token fees (you supply the compute) |
| **Setup** | Need API keys | No keys needed |
| **Privacy** | Data sent to provider | Data stays on your machine |
| **Capability** | Very powerful (GPT-4, Gemini) | Smaller, task-specific |
| **Speed** | Fast (provider hardware) | Depends on your hardware |

**Key insight:** Local models are great for well-defined tasks (classification, NER, summarization) where you don't need a massive general-purpose model.

---

## 3. Loading Our Pipelines

We'll load three pipelines for our text analyzer app. You already used `sentiment-analysis` in Notebook 01 — now let's add two more.

### Sentiment Analysis

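Classifies text as positive or negative. This is the same pipeline as in Notebook 01. Because no model name is passed, `transformers` falls back to its default checkpoint for the task and prints a warning saying so.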

In [None]:
sentiment = pipeline("sentiment-analysis", device=device)

result = sentiment("I really enjoyed this course on AI!")
print(result)

### Named Entity Recognition (NER)

Identifies entities in text — people, organizations, locations, etc.

In [None]:
ner = pipeline("ner", aggregation_strategy="simple", device=device)

result = ner("Sundar Pichai is the CEO of Google, headquartered in Mountain View, California.")

for entity in result:
    print(f"{entity['word']:20s} → {entity['entity_group']} (confidence: {entity['score']:.2f})")
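With `aggregation_strategy="simple"`, the pipeline merges word pieces into whole entities and returns a list of dictionaries, each with `entity_group`, `score`, `word`, `start`, and `end` keys. The sketch below groups such a list by entity type. The `sample` list is hand-written for illustration (it is not real model output, and the scores are placeholders), and `group_by_type` is a hypothetical helper, not part of `transformers`.

```python
from collections import defaultdict

text = "Sundar Pichai is the CEO of Google, headquartered in Mountain View, California."

# Hand-written stand-in for `ner(text)` output; scores are placeholders.
sample = [
    {"entity_group": "PER", "score": 0.99, "word": "Sundar Pichai", "start": 0, "end": 13},
    {"entity_group": "ORG", "score": 0.99, "word": "Google", "start": 28, "end": 34},
    {"entity_group": "LOC", "score": 0.99, "word": "Mountain View", "start": 53, "end": 66},
    {"entity_group": "LOC", "score": 0.99, "word": "California", "start": 68, "end": 78},
]


def group_by_type(entities, min_score=0.0):
    """Map each entity type to the list of entity strings found for it."""
    grouped = defaultdict(list)
    for entity in entities:
        # Skip entities the model was not confident about.
        if entity["score"] >= min_score:
            grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)


print(group_by_type(sample))
```

The `start` and `end` values are character offsets into the input, so `text[start:end]` recovers the entity string. That is handy if you later want to highlight entities in the original text.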

### Zero-Shot Classification

Classifies text into categories **you define at runtime** — no retraining needed. The model decides which label fits best.

In [None]:
classifier = pipeline("zero-shot-classification", device=device)

text = "The stock market rallied today after the Federal Reserve announced lower interest rates."
labels = ["politics", "sports", "finance", "technology", "health"]

result = classifier(text, candidate_labels=labels)

for label, score in zip(result["labels"], result["scores"]):
    print(f"  {label:12s} → {score:.2f}")
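The classifier returns a dictionary with `sequence`, `labels`, and `scores`, where the labels are sorted from most to least likely. In the default single-label mode the scores sum to 1, so a low top score suggests that none of the labels fits well. A minimal sketch of turning that into a decision, assuming a hand-written `sample_result` (placeholder scores, not real output) and a hypothetical helper `top_label` with an arbitrary threshold:

```python
# Hand-written stand-in for `classifier(text, candidate_labels=labels)`.
# The scores are placeholders chosen for illustration.
sample_result = {
    "sequence": "The stock market rallied today.",
    "labels": ["finance", "politics", "technology", "health", "sports"],
    "scores": [0.80, 0.10, 0.05, 0.03, 0.02],
}


def top_label(result, threshold=0.5):
    """Return the best label, or None when its score is below `threshold`."""
    # Labels arrive sorted by score, so index 0 is the best match.
    best_label, best_score = result["labels"][0], result["scores"][0]
    return best_label if best_score >= threshold else None


print(top_label(sample_result))
print(top_label(sample_result, threshold=0.9))
```

Passing `multi_label=True` to the classifier scores each label independently, in which case the scores no longer need to sum to 1.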

We'll also reuse the **GPT-2 tokenizer** from Notebook 01 for a token-counting feature.

In [None]:
tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Hello, I am studying AI at CV Raman University!"
tokens = tokenizer.encode(text)
print(f"Text: {text}")
print(f"Token count: {len(tokens)}")

---

## 4. Quick Recap — `gr.Interface`

In Unit 2, you learned the basic Gradio pattern:

```python
gr.Interface(fn=your_function, inputs="textbox", outputs="textbox").launch()
```

Let's wrap our sentiment pipeline in `gr.Interface` — same pattern as Unit 2, but now with a **local model** instead of an API.

In [None]:
def analyze_sentiment(text):
    result = sentiment(text)[0]
    return f"{result['label']} (confidence: {result['score']:.2f})"

In [None]:
gr.Interface(
    fn=analyze_sentiment,
    inputs=gr.Textbox(label="Enter text:", lines=3),
    outputs=gr.Textbox(label="Sentiment:"),
    title="Sentiment Analyzer",
    flagging_mode="never"
).launch()

This works, but `gr.Interface` gives you a fixed layout. What if we want more control — side-by-side panels, tabs, or multiple tools in one app?

---

## 5. Introducing `gr.Blocks`

`gr.Blocks` gives you full control over layout. Instead of the fixed input → output structure of `gr.Interface`, you can arrange components however you want.

| Feature | `gr.Interface` | `gr.Blocks` |
|---|---|---|
| Layout | Fixed (inputs → outputs) | Fully customizable |
| Complexity | Simple, one function | Multiple functions, interactions |
| Use case | Quick demos | Full applications |

### Side-by-Side Layout with `gr.Row`

Let's rebuild the sentiment analyzer with a side-by-side layout.

In [None]:
with gr.Blocks() as demo:
    gr.Markdown("## Sentiment Analyzer")

    with gr.Row():
        text_input = gr.Textbox(label="Enter text:", lines=3)
        result_output = gr.Textbox(label="Sentiment:")

    btn = gr.Button("Analyze")
    btn.click(fn=analyze_sentiment, inputs=text_input, outputs=result_output)

demo.launch()

Key differences from `gr.Interface`:

- **`gr.Row()`** places components side by side
- **`gr.Button()`** gives explicit submit control
- **`.click()`** connects the button to the function
- You choose where each component goes

---

## 6. Build the Text Analyzer App

Now let's combine all our pipelines into a single multi-tool app using **`gr.Tab`**. We'll build it in two steps: define one function per tab, then lay out the tabs.

### Step 1: Define the functions

Each tab needs a function. Let's define them all.

In [None]:
def analyze_sentiment(text):
    """Return sentiment label and confidence."""
    result = sentiment(text)[0]
    return f"{result['label']} (confidence: {result['score']:.2f})"


def extract_entities(text):
    """Return named entities as formatted text."""
    results = ner(text)
    if not results:
        return "No entities found."

    output = ""
    for entity in results:
        output += f"**{entity['word']}** → {entity['entity_group']} ({entity['score']:.2f})\n\n"
    return output


def classify_text(text, labels):
    """Classify text into user-defined categories."""
    if not text.strip():
        # The zero-shot pipeline rejects an empty sequence, so handle it here.
        return "Please enter some text to classify."
    label_list = [part.strip() for part in labels.split(",") if part.strip()]
    if not label_list:
        return "Please enter at least one label (comma-separated)."
    result = classifier(text, candidate_labels=label_list)

    output = ""
    for label, score in zip(result["labels"], result["scores"]):
        bar = "█" * int(score * 20)  # a full bar (20 blocks) means a score of 1.0
        output += f"**{label}:** {score:.2f} {bar}\n\n"
    return output


def count_tokens(text):
    """Return token count and breakdown."""
    tokens = tokenizer.encode(text)
    if not tokens:
        # Empty input produces no tokens; return early to avoid dividing by zero.
        return "Please enter some text."
    # GPT-2 token strings use "Ġ" to mark a leading space.
    token_strings = tokenizer.convert_ids_to_tokens(tokens)

    output = f"**Total tokens:** {len(tokens)}\n\n"
    output += f"**Characters per token:** {len(text)/len(tokens):.1f}\n\n"
    output += "**Token breakdown:**\n\n"
    for tid, ts in zip(tokens, token_strings):
        output += f"- `{ts}` (ID: {tid})\n"
    return output
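The label parsing and bar rendering inside `classify_text` are plain Python, so they can be checked without loading a model. The sketch below pulls them into two hypothetical helpers, `parse_labels` and `score_bar` (names chosen here for illustration; they are not defined elsewhere in this notebook).

```python
def parse_labels(raw):
    """Split a comma-separated string into clean, non-empty labels."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def score_bar(score, width=20):
    """Render a score in [0, 1] as a bar of up to `width` block characters."""
    # int() truncates, so the bar never overstates the score.
    return "█" * int(score * width)


print(parse_labels("politics, sports, , finance ,"))
print(score_bar(0.5))
```

Because `int()` truncates, very small scores (anything under 0.05 at the default width) render as an empty bar, which is why the numeric score is printed next to it.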

### Step 2: Build the app with tabs

We use `gr.Tab` inside `gr.Blocks` to create a tabbed interface. Each tab is its own mini-app.

In [None]:
with gr.Blocks() as app:
    gr.Markdown("# Text Analyzer")
    gr.Markdown("Analyze text using local AI models — no API keys needed.")

    with gr.Tab("Sentiment"):
        sent_input = gr.Textbox(label="Enter text:", lines=3)
        sent_output = gr.Textbox(label="Result:")
        sent_btn = gr.Button("Analyze Sentiment")
        sent_btn.click(fn=analyze_sentiment, inputs=sent_input, outputs=sent_output)

    with gr.Tab("Named Entities"):
        ner_input = gr.Textbox(label="Enter text:", lines=3)
        ner_output = gr.Markdown(label="Entities:")
        ner_btn = gr.Button("Extract Entities")
        ner_btn.click(fn=extract_entities, inputs=ner_input, outputs=ner_output)

    with gr.Tab("Classifier"):
        cls_input = gr.Textbox(label="Enter text:", lines=3)
        cls_labels = gr.Textbox(label="Categories (comma-separated):", value="politics, sports, finance, technology, health")
        cls_output = gr.Markdown(label="Classification:")
        cls_btn = gr.Button("Classify")
        cls_btn.click(fn=classify_text, inputs=[cls_input, cls_labels], outputs=cls_output)

    with gr.Tab("Token Counter"):
        tok_input = gr.Textbox(label="Enter text:", lines=3)
        tok_output = gr.Markdown(label="Token Info:")
        tok_btn = gr.Button("Count Tokens")
        tok_btn.click(fn=count_tokens, inputs=tok_input, outputs=tok_output)

app.launch()

That's the full app — four NLP tools in one interface, all running locally. Try each tab with different inputs.

**Sample texts to try:**

- **Sentiment:** `"This movie was absolutely brilliant, I loved every minute of it!"`
- **NER:** `"Elon Musk founded SpaceX in Hawthorne, California in 2002."`
- **Classifier:** `"NASA launched a new telescope to study distant galaxies."` with labels `science, politics, sports, entertainment`
- **Token Counter:** `"Tokenization splits text into smaller pieces called tokens."`

---

## 7. Exercise

**Add a 5th tab** to the text analyzer app. Choose one:

- **Text Generation** — use `pipeline("text-generation", model="gpt2")` to complete a user's prompt
- **Question Answering** — use `pipeline("question-answering")` with a context + question input

Starter code is provided below. Fill in the function and add your tab to the app.

In [None]:
# Exercise: Add a 5th tab

# Step 1: Load your pipeline
# generator = pipeline("text-generation", model="gpt2", device=device)


# Step 2: Define the function
def generate_text(prompt):
    # Use the pipeline to generate text from the prompt
    # Return the generated text
    pass


# Step 3: Build the full app with 5 tabs
# Copy the app code from above and add your new tab

# with gr.Blocks() as app:
#     gr.Markdown("# Text Analyzer")
#     ... (keep the 4 existing tabs)
#
#     with gr.Tab("Text Generator"):
#         gen_input = gr.Textbox(label="Enter a prompt:", lines=3)
#         gen_output = gr.Textbox(label="Generated text:", lines=5)
#         gen_btn = gr.Button("Generate")
#         gen_btn.click(fn=generate_text, inputs=gen_input, outputs=gen_output)
#
# app.launch()

---

## 8. Key Takeaways

1. **`gr.Blocks`** gives you full layout control — rows, columns, tabs

2. **`gr.Tab`** lets you group several tools into a single tabbed interface

3. **HuggingFace pipelines** make it easy to add NLP features — sentiment, NER, classification, and more

4. **Local models are free and private** — no API keys, no data leaving your machine

5. **Same Gradio skills, different backend** — Unit 2 used APIs, this notebook used local models

**Stretch goal:** Deploy your app to [HuggingFace Spaces](https://huggingface.co/spaces) for free hosting.

---

## Additional Resources

- [Gradio Blocks Guide](https://www.gradio.app/guides/blocks-and-event-listeners)
- [Gradio Tabbed Interfaces](https://www.gradio.app/guides/controlling-layout#tabs)
- [HuggingFace Pipelines](https://huggingface.co/docs/transformers/pipeline_tutorial)
- [HuggingFace Spaces](https://huggingface.co/spaces)

---

**Course Information:**
- **Institution:** CV Raman Global University, Bhubaneswar
- **Program:** AI Center of Excellence
- **Course:** AI Systems Engineering 1
- **Developed by:** [Poorit Technologies](https://poorit.in) — *Transform Graduates into Industry-Ready Professionals*

---