# Conversation Management & Classification using Groq API

**Deliverables in this notebook:**

- Task 1: Conversation history management with truncation and periodic summarization.
- Task 2: JSON Schema classification & information extraction (name, email, phone, location, age).

**Usage:** The notebook supports `MOCK_MODE=True` so you can run all demos offline. To run live with Groq, set `MOCK_MODE=False` and provide your `GROQ_API_KEY` through Colab environment variables or secrets (do NOT commit keys to GitHub).


## Setup

Install lightweight dependencies and configure flags.

In [None]:
# Install dependencies
!pip install --quiet jsonschema requests nbformat

# Config
MOCK_MODE = True   # Set to False to call Groq's OpenAI-compatible endpoint
GROQ_API_KEY = ''  # Leave empty in public repos; the key is read from the environment below.
GROQ_BASE_URL = 'https://api.groq.com/openai/v1'  # OpenAI-compatible base path for Groq
MODEL = 'llama-3.3-70b-versatile'  # Example only; pick a model ID from Groq's current model list

import os
if not GROQ_API_KEY:
    GROQ_API_KEY = os.environ.get('GROQ_API_KEY', '')

print('MOCK_MODE =', MOCK_MODE)
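
The key lookup described in the usage note can be sketched as a small helper. This is a minimal sketch, not part of the notebook: `load_groq_api_key` is a hypothetical name, and the Colab branch assumes the secret is stored under the same name as the environment variable.

```python
import os


def load_groq_api_key(env_var: str = "GROQ_API_KEY") -> str:
    """Return the key from the environment, else from Colab secrets, else ''."""
    key = os.environ.get(env_var, "")
    if key:
        return key
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import userdata
    except ImportError:
        return ""
    try:
        return userdata.get(env_var) or ""
    except Exception:
        # The secret is missing or notebook access to it has not been granted.
        return ""


api_key = load_groq_api_key()
print("Key found:", bool(api_key))
```

Only whether a key was found is printed, never the key itself, so the value does not end up in saved cell output.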

## Task 1 — Conversation History Management with Summarization

We keep `conversation_history` as a list of messages: `[{'role': 'user'|'assistant'|'system', 'content': '...'}]`.

Features:
- truncate to the last n turns (`truncate_by_turns`)
- truncate by character length (`truncate_by_chars`)
- periodic summarization after every k-th user message (configurable)

A MOCK summarizer is included so the notebook runs without API keys.

In [None]:
from typing import List, Dict, Optional
def append_message(history: List[Dict], role: str, content: str):
    history.append({'role': role, 'content': content})

def truncate_by_turns(history: List[Dict], last_n_turns: int) -> List[Dict]:
    # One turn is a user+assistant pair, so keep roughly the last 2*last_n_turns messages.
    keep = max(0, last_n_turns * 2)
    # Guard keep == 0 explicitly: history[-0:] would return the whole list.
    return history[-keep:] if keep else []

def truncate_by_chars(history: List[Dict], max_chars: int) -> List[Dict]:
    # Walk backwards from the newest message and stop at the first one that would exceed the budget.
    out = []
    total = 0
    for msg in reversed(history):
        msg_len = len(msg['content'])
        if total + msg_len > max_chars:
            break
        out.append(msg)
        total += msg_len
    return list(reversed(out))

def history_to_text(history: List[Dict]) -> str:
    return '\n'.join([f"[{m['role']}] {m['content']}" for m in history])
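
Truncation can also be budgeted in words instead of characters. The following is a minimal sketch that mirrors the character-based helper. `truncate_by_words` is a hypothetical name, and "word" here simply means a whitespace-separated token.

```python
from typing import Dict, List


def truncate_by_words(history: List[Dict], max_words: int) -> List[Dict]:
    """Keep the most recent messages whose combined word count fits max_words."""
    kept: List[Dict] = []
    total = 0
    for msg in reversed(history):
        n_words = len(msg["content"].split())
        if total + n_words > max_words:
            break  # this message and everything older is dropped
        kept.append(msg)
        total += n_words
    return list(reversed(kept))


demo = [
    {"role": "user", "content": "My name is Rahul"},       # 4 words
    {"role": "assistant", "content": "Nice to meet you"},  # 4 words
    {"role": "user", "content": "I live in Pune"},         # 4 words
]
recent = truncate_by_words(demo, max_words=8)
print([m["content"] for m in recent])  # ['Nice to meet you', 'I live in Pune']
```

As with the character budget, a newest message that alone exceeds the budget yields an empty result, so callers may want a floor of one message.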

In [None]:
import requests, json
def mock_summarize(text: str, max_tokens: int = 120) -> str:
    # Simple deterministic summary for offline demo
    first = text.strip().split('\n')[0][:200]
    return f"Summary: {first}... (orig_chars={len(text)})"

def groq_summarize(text: str, api_key: str, model: Optional[str] = None, max_tokens: int = 200) -> str:
    # Example OpenAI-compatible call to Groq's chat/completions endpoint.
    # NOTE: ConversationManager calls this only when mock_mode=False; it needs a valid GROQ_API_KEY.
    url = f"{GROQ_BASE_URL}/chat/completions"
    headers = {'Authorization': f'Bearer {api_key}', 'Content-Type': 'application/json'}
    payload = {
        'model': model or MODEL,  # fall back to the MODEL set in the Setup cell
        'messages': [
            {'role':'system', 'content':'You are a concise summarizer.'},
            {'role':'user', 'content': f'Summarize the following conversation concisely:\n\n{text}'}
        ],
        'max_tokens': max_tokens  # chat/completions takes max_tokens, not max_output_tokens
    }
    resp = requests.post(url, headers=headers, json=payload, timeout=30)
    resp.raise_for_status()
    data = resp.json()
    # Try to extract content.
    if isinstance(data, dict):
        if 'choices' in data and len(data['choices'])>0 and 'message' in data['choices'][0]:
            return data['choices'][0]['message'].get('content','').strip()
        if 'output' in data and isinstance(data['output'], list):
            parts = []
            for o in data['output']:
                if isinstance(o, dict) and 'content' in o:
                    parts.append(o['content'])
            return ' '.join(parts).strip()
    return json.dumps(data)[:1000]
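
The response handling above can be exercised offline against a hand-written response. This is a minimal sketch assuming the usual OpenAI-compatible `choices[0].message.content` shape. `first_choice_text` and the sample payload are hypothetical, not captured from a real Groq call.

```python
from typing import Any, Dict


def first_choice_text(data: Dict[str, Any]) -> str:
    """Return the first choice's message text, or '' if it is absent."""
    choices = data.get("choices") or []
    if not choices:
        return ""
    message = choices[0].get("message") or {}
    # content can be None (for example when the model returns a tool call instead of text)
    return (message.get("content") or "").strip()


# Hand-written stand-in for a chat/completions response body.
sample_response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": " Rahul is applying for an internship. "},
            "finish_reason": "stop",
        }
    ]
}
print(first_choice_text(sample_response))
```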

In [None]:
class ConversationManager:
    def __init__(self, mock_mode=True, api_key='', summarize_every_k=3, keep_tail=2):
        self.history = []
        self.run_count = 0
        self.mock_mode = mock_mode
        self.api_key = api_key
        self.summarize_every_k = summarize_every_k
        self.keep_tail = keep_tail  # number of recent messages to keep after summarization

    def user_message(self, text: str):
        append_message(self.history, 'user', text)
        self.run_count += 1
        if self.summarize_every_k>0 and self.run_count % self.summarize_every_k == 0:
            self.perform_summarization()

    def assistant_message(self, text: str):
        append_message(self.history, 'assistant', text)

    def perform_summarization(self):
        full = history_to_text(self.history)
        if self.mock_mode:
            summary = mock_summarize(full)
        else:
            summary = groq_summarize(full, api_key=self.api_key)
        # Keep the last keep_tail messages verbatim; guard 0 because history[-0:] is the whole list.
        tail = self.history[-self.keep_tail:] if self.keep_tail > 0 else []
        self.history = [{'role':'system','content': summary}] + tail

    def get_history(self):
        return self.history

    def truncate(self, by_turns: Optional[int]=None, by_chars: Optional[int]=None):
        h = self.history
        if by_turns is not None:
            h = truncate_by_turns(h, by_turns)
        if by_chars is not None:
            h = truncate_by_chars(h, by_chars)
        return h
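
`perform_summarization` summarizes the full history, including the tail that is then kept verbatim. One possible variant summarizes only the older messages so the tail is not represented twice. The sketch below is an alternative design, not the notebook's method; `compact_history` and `fake_summarize` are hypothetical names.

```python
from typing import Callable, Dict, List


def compact_history(
    history: List[Dict],
    summarize: Callable[[str], str],
    keep_tail: int = 2,
) -> List[Dict]:
    """Replace everything except the last keep_tail messages with one summary message."""
    if keep_tail < 0:
        raise ValueError("keep_tail must be >= 0")
    split = max(0, len(history) - keep_tail)
    head, tail = history[:split], history[split:]
    if not head:
        return list(history)  # nothing old enough to summarize
    text = "\n".join(f"[{m['role']}] {m['content']}" for m in head)
    return [{"role": "system", "content": summarize(text)}] + tail


def fake_summarize(text: str) -> str:
    # Stand-in summarizer so the sketch runs offline.
    return f"Summary of {len(text.splitlines())} earlier messages"


history = [
    {"role": "user", "content": "Hi, my name is Rahul."},
    {"role": "assistant", "content": "Hello Rahul."},
    {"role": "user", "content": "I live in Pune."},
    {"role": "assistant", "content": "Noted."},
]
for m in compact_history(history, fake_summarize, keep_tail=2):
    print(f"[{m['role']}] {m['content']}")
```

Passing the summarizer as a callable keeps the mock and live paths interchangeable without a mode flag inside the function.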

### Demo: Task 1 — Periodic summarization & truncation

We feed sample messages and print the history after each run. Summarization triggers on every k-th user message (here k=3), before the assistant reply for that run is appended.

In [None]:
mgr = ConversationManager(mock_mode=MOCK_MODE, api_key=GROQ_API_KEY, summarize_every_k=3, keep_tail=2)
samples = [
    'Hi, I want to apply for the internship. My name is Rahul.',
    'I live in Pune. My email is rahul@example.com.',
    'My phone number is 9123456789 and I am 22 years old.',
    'I can start from June.',
    'I have experience with Python and HTML.',
    'Please confirm interview schedule.'
]
for s in samples:
    mgr.user_message(s)
    mgr.assistant_message('Auto-reply (mock)')
    print('\n--- After run', mgr.run_count)
    for m in mgr.get_history():
        print(f"[{m['role']}]", m['content'][:200].replace('\n',' '))
    print('\nTruncate by last 2 turns:') 
    print(history_to_text(mgr.truncate(by_turns=2)))
    print('\nTruncate by 150 chars:')
    print(history_to_text(mgr.truncate(by_chars=150)))

## Task 2 — JSON Schema Classification & Information Extraction

We define a JSON schema to extract: name, email, phone, location, age. A heuristic (regex) extractor handles MOCK mode, and a template cell shows how to build a function-calling request for Groq's OpenAI-compatible API when `MOCK_MODE=False`.

In [None]:
from jsonschema import validate, ValidationError
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {"type": "string", "format": "email"},
        "phone": {"type": "string"},
        "location": {"type": "string"},
        "age": {"type": "integer"}
    },
    "required": ["name","email","phone","location"],
    "additionalProperties": False
}

sample_chats = [
    "Hello, I'm Priya Singh. My email is priya.singh@mail.com and phone 9876543210. I'm from Bangalore and I'm 24.",
    "Hey, this is Aman. Contact: aman1990@gmail.com, +91-9123456789. Mumbai-based.",
    "Name: Kavita; Email: kk@example.com; Phone: 7012345678; Location: Jaipur; Age: 28."
]

import re
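def heuristic_extract(text: str) -> dict:
    # Regex heuristics for the offline demo; they only cover phrasings like those in sample_chats.
    out = {}
    cap_words = r"([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)"  # one or more capitalized words
    # name: "I'm X", "I am X", "this is X", or "Name: X"
    m = re.search(r"(?:I'?m|I am|this is)\s+" + cap_words, text)
    if not m:
        m = re.search(r"Name:\s*" + cap_words, text)
    if m: out['name'] = m.group(1).strip()
    # email
    me = re.search(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}", text)
    if me: out['email'] = me.group(0)
    # phone: keep digits only, so "+91-9123456789" becomes "919123456789"
    mp = re.search(r"(\+?\d[\d\- ]{7,}\d)", text)
    if mp: out['phone'] = re.sub(r"[^0-9]", "", mp.group(0))
    # location: "from X", "based in X", "Location: X", or "X-based"
    ml = re.search(r"(?:\bfrom|\bbased in|[Ll]ocation:)\s*" + cap_words, text)
    if not ml:
        ml = re.search(r"\b([A-Z][a-z]+)-based\b", text)
    if ml: out['location'] = ml.group(1).strip()
    # age: require a cue ("I'm 24", "I am 22", "Age: 28") so digits inside a phone number are not picked up
    ma = re.search(r"(?:\bI'?m|\bI am|\b[Aa]ge:?)\s*(\d{1,3})\b", text)
    if ma: out['age'] = int(ma.group(1))
    return out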

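from jsonschema import FormatChecker  # without a format checker, "format": "email" is not enforced

def validate_output(obj: dict):
    try:
        validate(instance=obj, schema=schema, format_checker=FormatChecker())
        return True, None
    except ValidationError as e:
        return False, str(e)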

# Demo parse
for c in sample_chats:
    print('\nChat:', c)
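    if MOCK_MODE:
        parsed = heuristic_extract(c)
    else:
        # A live run would send build_function_call_payload(c) to Groq and parse the returned arguments.
        # That call is not wired up in this demo, so this branch also falls back to the heuristic extractor.
        parsed = heuristic_extract(c)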
    ok, err = validate_output(parsed)
    print('Parsed:', parsed, 'Valid:', ok, 'Error:', err)
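
When validation fails, it is often more useful to know which required fields are missing than to read the full validator message. The sketch below is a minimal standard-library helper for that. `missing_required` is a hypothetical name, and the partial record is made up for illustration.

```python
from typing import Any, Dict, List


def missing_required(obj: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """List required keys that are absent from obj or hold an empty value."""
    return [
        key
        for key in schema.get("required", [])
        if key not in obj or obj[key] in ("", None)
    ]


# Same required list as the notebook's schema; the record is illustrative.
contact_schema = {"required": ["name", "email", "phone", "location"]}
partial = {"name": "Aman", "email": "aman1990@gmail.com", "phone": "919123456789"}
print(missing_required(partial, contact_schema))  # ['location']
```

The returned list could drive a follow-up question to the user, such as asking only for the location.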

### Example: Function-calling payload (template)

This cell builds a function-calling request body for an OpenAI-compatible chat completions endpoint such as Groq's. It only constructs the payload. To run live, POST it to `{GROQ_BASE_URL}/chat/completions` with `MOCK_MODE=False` and `GROQ_API_KEY` set.

In [None]:
def build_function_call_payload(chat_text: str):
    # Example function schema (as would be sent in function-calling scenarios)
    function_schema = {
        "name": "extract_contact_info",
        "description": "Extract contact details from user chat",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type":"string"},
                "email": {"type":"string"},
                "phone": {"type":"string"},
                "location": {"type":"string"},
                "age": {"type":"integer"}
            },
            "required": ["name","email","phone","location"]
        }
    }
    messages = [
        {"role":"system","content":"You are a JSON extractor. Return valid JSON."},
        {"role":"user","content": chat_text}
    ]
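    payload = {
        "model": MODEL,
        "messages": messages,
        # `functions`/`function_call` are the legacy OpenAI fields; `tools`/`tool_choice` replace them.
        "tools": [{"type": "function", "function": function_schema}],
        "tool_choice": {"type": "function", "function": {"name": "extract_contact_info"}}
    }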
    return payload

# Example usage
print(build_function_call_payload('Hi, I am Rohit...'))
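
Once such a request is sent, the extracted fields come back as a JSON string inside the assistant message. The sketch below reads them, assuming the OpenAI-compatible response shape: `message.tool_calls[0].function.arguments` for `tools` requests, or `message.function_call.arguments` for the legacy `functions` form. `extract_function_arguments` and the sample response are hypothetical.

```python
import json
from typing import Any, Dict


def extract_function_arguments(data: Dict[str, Any]) -> Dict[str, Any]:
    """Parse the JSON arguments of the first function call in a chat response."""
    choices = data.get("choices") or [{}]
    message = choices[0].get("message") or {}
    tool_calls = message.get("tool_calls") or []
    if tool_calls:
        raw = tool_calls[0]["function"]["arguments"]
    elif message.get("function_call"):
        raw = message["function_call"]["arguments"]  # legacy response shape
    else:
        raise ValueError("response contains no function call")
    return json.loads(raw)  # arguments arrive as a JSON-encoded string


# Hand-written stand-in for a response; the values are illustrative only.
sample_response = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_0",
                "type": "function",
                "function": {
                    "name": "extract_contact_info",
                    "arguments": json.dumps({"name": "Rohit", "location": "Delhi"}),
                },
            }],
        }
    }]
}
print(extract_function_arguments(sample_response))
```

The parsed dictionary can then go through `validate_output`, since the model is not guaranteed to return every required field.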

## How to push to GitHub (recommended)

1. Create a new repository on GitHub.
2. In Colab, configure your git identity and push using a Personal Access Token (PAT). Example:

```bash
!git config --global user.email "you@example.com"
!git config --global user.name "Your Name"
!git init
!git remote add origin https://github.com/yourusername/yourrepo.git
!git add Conversation_Management_and_Classification.ipynb
!git commit -m "Add assignment notebook"
!git branch -M main
!git push https://<TOKEN>@github.com/yourusername/yourrepo.git main
```

Store tokens in Colab secrets or environment variables rather than typing them into notebook cells. Do NOT push API keys.


### Deliverables
- Conversation_Management_and_Classification.ipynb
- README.md (instructions included)
