<a href="https://colab.research.google.com/github/abdul9870/abdul9870/blob/main/project%204_Structured_JSON_Formatting_Phi2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Day 5: Structured Output & Prompt Chaining with Phi-2
**Duration:** ~2 hours of teaching + hands‑on exercises

This notebook uses the open-source Phi-2 model (2.7 billion parameters) to explore structured output and prompt chaining. Note: Phi-2 is a base model, not an instruction-tuned one, so reliable JSON extraction and prompt chaining depend on careful prompting.
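One common form of careful prompting for a base model is to show it a worked example before the real input. The sketch below is only an illustration: `build_few_shot_prompt` and the sample texts are hypothetical names and data, not part of the notebook, and it builds a string without calling the model.

```python
import json

def build_few_shot_prompt(instruction, examples, text):
    """Assemble an instruction, worked examples, and the new input into one prompt."""
    parts = [instruction.strip(), ""]
    for example_text, example_output in examples:
        parts.append(f"Text: {example_text}")
        # json.dumps guarantees that each demonstration is itself valid JSON.
        parts.append(f"Output: {json.dumps(example_output)}")
        parts.append("")
    parts.append(f"Text: {text}")
    parts.append("Output:")
    return "\n".join(parts)

# Hypothetical demonstration pair and input text.
examples = [
    ("Call Bob Lee at bob@example.com.",
     {"fullName": "Bob Lee", "email": "bob@example.com"}),
]
prompt = build_few_shot_prompt(
    "Extract fullName and email as JSON.",
    examples,
    "Write to Ana Ruiz at ana@example.org.",
)
print(prompt)
```

Ending the prompt with `Output:` mirrors the prompts used later in this notebook, so the model's most likely continuation is a JSON value in the same shape as the demonstration.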


## 1. Setup & Installation

In [None]:
# Authenticate with the Hugging Face Hub. Provide an access token when prompted.
!huggingface-cli login

# Install the Python packages used in this notebook.
# bitsandbytes: 8-bit quantization
# accelerate: efficient model loading and hardware management
# transformers: Hugging Face library for models and tokenizers
# sentencepiece: tokenizer often used with models like Mistral
# einops: tensor-manipulation helper used by some model implementations
# pandas: data manipulation, used for displaying results
!pip install bitsandbytes accelerate transformers sentencepiece einops pandas




    To log in, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) y
Token is valid (permission: read).
The token `Read` has been saved to /root/.cache/huggingface/stored_tokens
Cannot authenticate through git-credential as no helper is defined on your machine.
You might have to re-authenticate w

## 2. Load Phi-2 (8‑bit)

In [None]:
import json, re
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import pandas as pd

# Model selection and quantization
model_name = 'microsoft/phi-2'
# Load the Phi-2 tokenizer.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=False)
# Load Phi-2 with 8-bit quantization for memory efficiency.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map='auto',  # automatically place model parts on the available devices (GPU/CPU)
    load_in_8bit=True   # 8-bit quantization; newer transformers releases prefer quantization_config=BitsAndBytesConfig(load_in_8bit=True)
)
# Set the model to evaluation mode (disables dropout, etc.)
model.eval()
device = next(model.parameters()).device
print(f'Loaded {model_name} on {device}')


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.



Loaded microsoft/phi-2 on cuda:0


## 3. JSON Extraction Utility

In [None]:
# Utility function: generate text from a prompt with the loaded Phi-2 model and parse JSON from the output.
def extract_json(prompt: str, max_tokens: int = 200) -> dict:
    '''Generate JSON from a prompt using Phi-2 and parse it.'''
    # Tokenize the prompt and move the tensors to the model's device.
    inputs = tokenizer(prompt, return_tensors='pt').to(device)
    # Generate a continuation of the prompt.
    outputs = model.generate(**inputs, max_new_tokens=max_tokens)
    # Decode only the newly generated tokens, skipping special tokens and stripping whitespace.
    text = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True).strip()
    print('Raw output:', text)
    # Greedy regex: matches from the first '{' to the last '}' (DOTALL lets it span multiple lines).
    m = re.search(r'{.*}', text, re.DOTALL)
    if not m:
        raise ValueError(f'No JSON found in:\n{text}')
    # Parse the extracted JSON string into a Python dictionary.
    return json.loads(m.group())
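A greedy `{.*}` match runs from the first `{` to the last `}`, so it breaks when the model keeps generating text that contains more braces after the JSON. As a hedged alternative, the sketch below scans for the first position where the standard library's `json.JSONDecoder.raw_decode` succeeds. `first_json_value` is a hypothetical helper name, not something defined elsewhere in this notebook.

```python
import json

def first_json_value(text):
    """Return the first JSON object or array embedded in text, or raise ValueError."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(text):
        if char not in "{[":
            continue
        try:
            # raw_decode parses one JSON value starting at `start` and ignores what follows.
            value, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue  # not valid JSON from this position; try the next opener
        return value
    raise ValueError(f"No JSON found in:\n{text}")

# Made-up model output with trailing text that contains extra braces.
noisy = 'Sure! {"name": "Alice"}\nText: another {example}'
print(first_json_value(noisy))
```

On the `noisy` string, the greedy regex would capture everything up to the final `}` and `json.loads` would fail, whereas this scan stops at the end of the first valid value.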


## 4. Example: Contact Information Extraction

In [None]:
text = 'Reach out to Dr. Alice Nguyen at alice.nguyen@univ.edu or +44-20-7946-0958.'
# Construct the prompt for the LLM, instructing it to return JSON.
prompt = (
    'You are a JSON extractor. Respond ONLY with valid JSON containing keys:'
    ' fullName (string), email (string), phone (string)\n'
    f'Text: {text}\nOutput:'
)
# Call the extraction/processing function and print/display the result.
result = extract_json(prompt)
print('Parsed result:', result)


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Raw output: {"fullName": "Alice Nguyen", "email": "alice.nguyen@univ.edu", "phone": "+44-20-7946-0958"}
Parsed result: {'fullName': 'Alice Nguyen', 'email': 'alice.nguyen@univ.edu', 'phone': '+44-20-7946-0958'}
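Parsing succeeds here, but valid JSON is not necessarily the JSON that was asked for. A minimal sketch of a post-parse check follows, assuming the three keys named in the prompt. `validate_contact` and the deliberately loose email pattern are illustrative assumptions, not a complete validator.

```python
import re

REQUIRED_KEYS = {"fullName": str, "email": str, "phone": str}
# Loose shape check only: something@something.something, no whitespace.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate_contact(record):
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    for key, expected_type in REQUIRED_KEYS.items():
        if key not in record:
            problems.append(f"missing key: {key}")
        elif not isinstance(record[key], expected_type):
            problems.append(f"{key} should be {expected_type.__name__}")
    email = record.get("email")
    if isinstance(email, str) and not EMAIL_RE.match(email):
        problems.append("email does not look like an address")
    return problems

# The parsed result shown above.
parsed = {"fullName": "Alice Nguyen", "email": "alice.nguyen@univ.edu",
          "phone": "+44-20-7946-0958"}
print(validate_contact(parsed))
```

Returning a list of problems instead of raising on the first one makes it easier to log everything that went wrong with a single model response.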


## 5. Example: Invoice Parsing

In [None]:
def parse_invoice(text: str) -> dict:
    # Construct the prompt for the LLM, instructing it to return JSON.
    prompt = (
        'Parse the invoice into JSON with keys: invoice_number (string), date (YYYY-MM-DD), '
        'items (array of {description, qty, unit_price}), total_amount (string).\n'
        f'Invoice:\n{text}\nOutput:'
    )
    # Run the prompt through the model and return the parsed JSON.
    return extract_json(prompt)

invoice_text = '''
Invoice No: INV-2025-0401
Date: 2025-04-01
1x Widget A @ $10.00
2x Widget B @ $15.50
Total: $41.00
'''
# Call the extraction/processing function and print/display the result.
print(parse_invoice(invoice_text))


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Raw output: {
  "invoice_number": "INV-2025-0401",
  "date": "2025-04-01",
  "items": [
    {
      "description": "Widget A",
      "qty": "1",
      "unit_price": "$10.00"
    },
    {
      "description": "Widget B",
      "qty": "2",
      "unit_price": "$15.50"
    }
  ],
  "total_amount": "$41.00"
}
{'invoice_number': 'INV-2025-0401', 'date': '2025-04-01', 'items': [{'description': 'Widget A', 'qty': '1', 'unit_price': '$10.00'}, {'description': 'Widget B', 'qty': '2', 'unit_price': '$15.50'}], 'total_amount': '$41.00'}
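The model returns quantities and prices as strings (`"1"`, `"$10.00"`), so arithmetic needs a conversion step. The sketch below converts the amounts with `decimal.Decimal` and checks that the line items add up to the stated total. `to_decimal` and `check_invoice_total` are hypothetical helpers, and the parsing assumes US-style amounts like the ones in this example.

```python
from decimal import Decimal

def to_decimal(amount):
    """Convert a string such as '$1,234.50' to Decimal."""
    return Decimal(str(amount).replace("$", "").replace(",", "").strip())

def check_invoice_total(invoice):
    """Return (computed_total, matches_stated_total)."""
    computed = sum(
        (to_decimal(item["unit_price"]) * int(item["qty"]) for item in invoice["items"]),
        Decimal("0"),
    )
    return computed, computed == to_decimal(invoice["total_amount"])

# The parsed invoice shown above.
invoice = {
    "invoice_number": "INV-2025-0401",
    "date": "2025-04-01",
    "items": [
        {"description": "Widget A", "qty": "1", "unit_price": "$10.00"},
        {"description": "Widget B", "qty": "2", "unit_price": "$15.50"},
    ],
    "total_amount": "$41.00",
}
total, matches = check_invoice_total(invoice)
print(total, matches)  # 41.00 True
```

`Decimal` is used rather than `float` so that currency amounts compare exactly.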


## 6. Example: Resume Parsing

In [None]:
def parse_resume(text: str) -> dict:
    # Construct the prompt for the LLM, instructing it to return JSON.
    prompt = (
        'Extract resume details as JSON with keys: name, email, phone, '
        'education (array of {degree, institution, year}), skills (array of strings).\n'
        f'Resume:\n{text}\nOutput:'
    )
    # Run the prompt through the model and return the parsed JSON.
    return extract_json(prompt)

resume = ('John Doe\nEmail: john.doe@gmail.com\nPhone: 555-1234\n'
          'B.Sc. Computer Science, MIT, 2018\nSkills: Python, ML, Docker')
# Call the extraction/processing function and print/display the result.
print(parse_resume(resume))


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Raw output: {
  "name": "John Doe",
  "email": "john.doe@gmail.com",
  "phone": "555-1234",
  "education": [
    {
      "degree": "B.Sc.",
      "institution": "MIT",
      "year": "2018"
    }
  ],
  "skills": ["Python", "ML", "Docker"]
}
{'name': 'John Doe', 'email': 'john.doe@gmail.com', 'phone': '555-1234', 'education': [{'degree': 'B.Sc.', 'institution': 'MIT', 'year': '2018'}], 'skills': ['Python', 'ML', 'Docker']}
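Downstream code is usually easier to write against typed objects than nested dictionaries. As a sketch, the parsed resume can be loaded into dataclasses, which also turns the string `"2018"` into an integer. The class names and `resume_from_dict` are assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Education:
    degree: str
    institution: str
    year: int

@dataclass
class Resume:
    name: str
    email: str
    phone: str
    education: list = field(default_factory=list)
    skills: list = field(default_factory=list)

def resume_from_dict(data):
    """Build a Resume from the parsed JSON; raises KeyError or ValueError on bad input."""
    education = [
        Education(e["degree"], e["institution"], int(e["year"]))
        for e in data.get("education", [])
    ]
    return Resume(
        name=data["name"],
        email=data["email"],
        phone=data["phone"],
        education=education,
        skills=[str(s) for s in data.get("skills", [])],
    )

# The parsed resume shown above.
parsed_resume = {
    "name": "John Doe", "email": "john.doe@gmail.com", "phone": "555-1234",
    "education": [{"degree": "B.Sc.", "institution": "MIT", "year": "2018"}],
    "skills": ["Python", "ML", "Docker"],
}
resume_obj = resume_from_dict(parsed_resume)
print(resume_obj.education[0].year + 1)
```

A missing required key or a non-numeric year fails loudly at this step instead of propagating into later processing.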


## 7. Prompt Chaining: Customer Review Analysis

In [None]:
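# These imports repeat the ones from Section 2 so that this cell can run on its own.
import pandas as pd
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
import json
import re

# The model and tokenizer loaded in Section 2 are reused here.
# To reload Phi-2 in float16 without quantization instead, uncomment the lines below.
# model_name = "microsoft/phi-2"
# tokenizer = AutoTokenizer.from_pretrained(model_name)
# model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")
# device = model.device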

In [None]:
# Helper: generate text from a prompt and return only the newly generated part
def generate_response(prompt, max_new_tokens=100):
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Slice off the prompt tokens so the prompt is not echoed back in the result.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

# Helper: extract a JSON object or array from LLM output, returning {} on failure.
# Named differently from extract_json (Section 3) so that the stricter version,
# which raises ValueError, stays available for Section 8.
def safe_extract_json(prompt):
    response = generate_response(prompt)
    # Try an object first, then an array.
    for pattern in (r'{.*}', r'\[.*\]'):
        match = re.search(pattern, response, re.DOTALL)
        if match:
            try:
                return json.loads(match.group())
            except json.JSONDecodeError:
                continue
    return {}

# Steps
def step1_overview(text):
    prompt = f"""Extract JSON with the following keys: product_name, sentiment (Positive/Neutral/Negative), rating (1–5).
Review: {text}
Output:"""
    return safe_extract_json(prompt)

def step2_praised(text):
    prompt = f"""List the features praised in this review as a JSON array of strings.
Review: {text}
Output:"""
    return safe_extract_json(prompt)

def step3_criticized(text):
    prompt = f"""List the aspects criticized in this review as a JSON array of strings.
Review: {text}
Output:"""
    return safe_extract_json(prompt)

# Chaining step: the outputs of steps 2 and 3 become the input of step 4.
def step4_summary(p, c):
    prompt = f"""Summarize the following in one sentence. Praised: {p}. Criticized: {c}.
Summary:"""
    return generate_response(prompt, max_new_tokens=60)

def analyze(review):
    o = step1_overview(review['text'])
    p = step2_praised(review['text'])
    c = step3_criticized(review['text'])
    s = step4_summary(p, c)
    return {**review, **o, 'praised_features': p, 'criticized_aspects': c, 'summary': s}

# Input reviews
reviews = [
    {'id': 'R001', 'text': 'I love my new SuperPhone X! Battery life is great, but the camera is slow. 4/5 stars.'},
    {'id': 'R002', 'text': "CoffeeMax 5000 brews slowly and leaks occasionally. I'd give it 2 stars."},
    {'id': 'R003', 'text': 'SwiftBook Pro is fast and sleek. Keyboard feels amazing. 5 stars!'}
]

# Run analysis
df = pd.DataFrame([analyze(r) for r in reviews])
df


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


|   | id | text | product_name | sentiment | rating | praised_features | criticized_aspects | summary |
|---|----|------|--------------|-----------|--------|------------------|--------------------|---------|
| 0 | R001 | I love my new SuperPhone X! Battery life is gr... | SuperPhone X | Neutral | 4 | [great battery life, slow camera] | [slow camera] | Summarize the following in one sentence. Prais... |
| 1 | R002 | CoffeeMax 5000 brews slowly and leaks occasion... | CoffeeMax 5000 | Negative | 2 | [slow, leaks] | [slowly, leaks occasionally] | Summarize the following in one sentence. Prais... |
| 2 | R003 | SwiftBook Pro is fast and sleek. Keyboard feel... | SwiftBook Pro | Positive | 5 | [fast, sleek, amazing keyboard] | [fast, sleek, keyboard feels amazing] | Summarize the following in one sentence. Prais... |
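In the results above, some entries appear in both the praised and the criticized lists (for example `slow camera` for R001). Because each chain step runs independently, a cheap cross-step check can flag such rows for review before they reach the summary step. The sketch below is an assumption-laden illustration: `overlapping_items` is a hypothetical helper and it only catches entries that match exactly after lowercasing and whitespace normalization.

```python
def normalize(phrase):
    """Lowercase and collapse whitespace so trivially different strings compare equal."""
    return " ".join(str(phrase).lower().split())

def overlapping_items(praised, criticized):
    """Return praised entries that also appear among the criticized ones."""
    criticized_set = {normalize(c) for c in criticized}
    return [p for p in praised if normalize(p) in criticized_set]

# Lists copied from the results table above.
rows = {
    "R001": (["great battery life", "slow camera"], ["slow camera"]),
    "R002": (["slow", "leaks"], ["slowly", "leaks occasionally"]),
    "R003": (["fast", "sleek", "amazing keyboard"],
             ["fast", "sleek", "keyboard feels amazing"]),
}
for review_id, (praised, criticized) in rows.items():
    print(review_id, overlapping_items(praised, criticized))
```

R002 shows the limit of exact matching: `slow` and `slowly` are not flagged even though the praised list is clearly wrong, so a check like this complements manual inspection and does not replace it.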


## 8. Error Handling & Fallback

In [None]:
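try:
    bad = extract_json('No JSON here')
    # A helper that returns an empty fallback instead of raising is handled here.
    if not bad:
        print('No JSON parsed; fallback value:', bad)
except ValueError as e:
    print('Error caught:', e)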

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
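Catching the error is only half of a fallback strategy; the other half is deciding what to do next. One option, sketched below under stated assumptions, is to retry with an amended prompt and return a default value if every attempt fails. `parse_with_retries` is a hypothetical helper, and `generate` stands for any callable that maps a prompt string to model text. A stub is used here so that the example runs without a model.

```python
import json

def parse_with_retries(generate, prompt, max_attempts=3, fallback=None):
    """Call generate(prompt) until its output parses as JSON, else return fallback."""
    last_error = None
    for _attempt in range(max_attempts):
        raw = generate(prompt)
        try:
            return json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = exc
            # With greedy decoding the same prompt yields the same text,
            # so the prompt is changed before trying again.
            prompt = f"{prompt}\n(Previous reply was not valid JSON. Reply with JSON only.)"
    print(f"Giving up after {max_attempts} attempts: {last_error}")
    return fallback

# Stub generator: fails once, then returns valid JSON.
replies = iter(["Sorry, here you go:", '{"status": "ok"}'])
result = parse_with_retries(lambda p: next(replies), "Return a status as JSON.\nOutput:")
print(result)
```

Passing the generator in as an argument keeps the retry logic independent of any particular model, which also makes it straightforward to test.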


## 9. Next Steps
- Extend invoice parser with supplier info
- Add multi-language prompts
- Deploy as API using FastAPI
- Experiment with other LLMs or fine-tuning