# Task 2: JSON Schema Classification & Extraction

## Objective
This notebook demonstrates structured data extraction from conversational text using the Groq API with function calling. The system will:
- Define a JSON schema for extracting user information
- Use the Groq API's OpenAI-compatible function calling to extract structured data
- Parse multiple sample chat conversations
- Validate extracted data against the defined schema
- Display structured JSON outputs with a confidence score (the fraction of schema fields that were filled)

## Schema Definition
Target extraction fields:
- `name`: Person's full name
- `email`: Email address
- `phone`: Phone number
- `location`: Geographic location (city, country, etc.)
- `age`: Age in years

## Requirements
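- Python standard library modules (`os`, `json`, `re`, `datetime`, `dataclasses`, `typing`)
- `requests` library for API calls
- `openai` client (the Groq API is OpenAI-compatible)
- Groq API key, provided as the `GROQ_API_KEY` environment variable or as a Google Colab secret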

## Setup and Dependencies

In [1]:
!pip install openai requests



In [2]:
import os
import json
import re
from typing import Dict, List, Optional, Any, Union
from openai import OpenAI
import requests
from datetime import datetime
from dataclasses import dataclass
from enum import Enum

## API Key Configuration

In [3]:
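GROQ_API_KEY = os.environ.get("GROQ_API_KEY")
if not GROQ_API_KEY:  # fall back to Colab secrets when running in Google Colab
    from google.colab import userdata
    GROQ_API_KEY = userdata.get("GROQ_API_KEY")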

In [4]:
if not GROQ_API_KEY:
    raise ValueError("GROQ_API_KEY not found: set it as an environment variable or as a Google Colab secret")
client = OpenAI(
    api_key=GROQ_API_KEY,
    base_url="https://api.groq.com/openai/v1"
)
print("✅ Groq API client configured successfully")

✅ Groq API client configured successfully


## JSON Schema Definition and Validation

In [5]:
USER_INFO_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {
            "type": "string",
            "description": "Person's full name (first and last name)"
        },
        "email": {
            "type": "string",
            "pattern": r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$",
            "description": "Valid email address"
        },
        "phone": {
            "type": "string",
            "description": "Phone number in any format"
        },
        "location": {
            "type": "string",
            "description": "Geographic location (city, state, country)"
        },
        "age": {
            "type": "integer",
            "minimum": 0,
            "maximum": 150,
            "description": "Person's age in years"
        }
    },
    "required": [],
    "additionalProperties": False
}
EXTRACT_USER_INFO_FUNCTION = {
    "name": "extract_user_info",
    "description": "Extract structured user information from conversational text",
    "parameters": USER_INFO_SCHEMA
}

print("✅ JSON schema and function definition created")
print("📋 Schema fields:", list(USER_INFO_SCHEMA["properties"].keys()))

✅ JSON schema and function definition created
📋 Schema fields: ['name', 'email', 'phone', 'location', 'age']
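The schema above tells the model what to return, but nothing in the notebook checks a response against the schema object itself; the validators below re-implement some of its rules by hand. A minimal sketch of a direct check using only the standard library, supporting just the keywords this schema uses (`type`, `pattern`, `minimum`, `maximum`, `required`, `additionalProperties`). `check_against_schema` is a hypothetical helper, not part of any library; for anything beyond this schema, a dedicated JSON Schema validator would be the safer choice.

```python
import re
from typing import Any, Dict, List

# Same shape as USER_INFO_SCHEMA above (repeated so this block runs on its own).
SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {"type": "string",
                  "pattern": r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"},
        "phone": {"type": "string"},
        "location": {"type": "string"},
        "age": {"type": "integer", "minimum": 0, "maximum": 150},
    },
    "required": [],
    "additionalProperties": False,
}

# Map the JSON Schema primitive types used here to Python types.
_TYPES = {"string": str, "integer": int}


def check_against_schema(data: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return a list of violations; an empty list means the data conforms.

    Hypothetical helper: supports only the keywords used in this notebook.
    """
    errors: List[str] = []
    props = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in data:
            errors.append(f"missing required field: {key}")
    for key, value in data.items():
        if key not in props:
            if schema.get("additionalProperties", True) is False:
                errors.append(f"unexpected field: {key}")
            continue
        rule = props[key]
        expected = _TYPES.get(rule.get("type"))
        # bool is a subclass of int in Python, so reject it explicitly.
        if expected is not None and (not isinstance(value, expected) or isinstance(value, bool)):
            errors.append(f"{key}: expected {rule['type']}, got {type(value).__name__}")
            continue
        # JSON Schema "pattern" uses search semantics (the regex itself may anchor).
        if "pattern" in rule and not re.search(rule["pattern"], value):
            errors.append(f"{key}: does not match pattern")
        if "minimum" in rule and value < rule["minimum"]:
            errors.append(f"{key}: below minimum {rule['minimum']}")
        if "maximum" in rule and value > rule["maximum"]:
            errors.append(f"{key}: above maximum {rule['maximum']}")
    return errors


# Example usage:
print(check_against_schema({"name": "Sarah Johnson", "age": 28}, SCHEMA))  # → []
print(check_against_schema({"email": "not-an-email", "age": 200, "company": "X"}, SCHEMA))
```

In the notebook, the same function could be applied to `user_info.to_dict()` after dropping the non-schema keys `confidence_score` and `extraction_source`.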


## User Information Extraction Class

In [9]:
@dataclass
class UserInfo:
    """Data class to hold extracted user information."""
    name: Optional[str] = None
    email: Optional[str] = None
    phone: Optional[str] = None
    location: Optional[str] = None
    age: Optional[int] = None
    confidence_score: Optional[float] = None
    extraction_source: Optional[str] = None

    def to_dict(self) -> Dict[str, Any]:
        return {k: v for k, v in self.__dict__.items() if v is not None}

    def is_valid(self) -> bool:
        return any([
            self.name, self.email, self.phone, self.location,
            self.age is not None
        ])


class UserInfoExtractor:
    def __init__(self, model: str = "llama-3.3-70b-versatile"):
        self.model = model
        self.extraction_history: List[Dict[str, Any]] = []

    def extract_from_conversation(self, conversation_text: str) -> UserInfo:
        try:
            response = client.chat.completions.create(
                model=self.model,
                messages=[
                    {
                        "role": "system",
                        "content": (
                            "You are an expert at extracting structured user information from conversations. "
                            "Extract any available user information including name, email, phone, location, and age. "
                            "Only extract information that is explicitly mentioned or clearly implied. "
                            "If information is not available, do not make assumptions or guesses."
                        )
                    },
                    {
                        "role": "user",
                        "content": f"Extract user information from this conversation:\n\n{conversation_text}"
                    }
                ],
                functions=[EXTRACT_USER_INFO_FUNCTION],
                function_call={"name": "extract_user_info"},
                temperature=0.1
            )
            if response.choices[0].message.function_call:
                function_args = json.loads(response.choices[0].message.function_call.arguments)
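                # Keep only schema fields; drop None/empty strings but keep valid falsy values such as age=0
                allowed_fields = USER_INFO_SCHEMA["properties"]
                user_info = UserInfo(**{k: v for k, v in function_args.items()
                                        if k in allowed_fields and v not in (None, "")})
                user_info.extraction_source = "groq_function_call"

                # "Confidence" is field completeness: the fraction of the 5 schema fields that were filled
                total_fields = 5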
                extracted_fields = len([f for f in [user_info.name, user_info.email,
                                                   user_info.phone, user_info.location,
                                                   user_info.age] if f is not None])
                user_info.confidence_score = extracted_fields / total_fields

                self.extraction_history.append({
                    "timestamp": datetime.now().isoformat(),
                    "conversation_text": conversation_text[:200] + "...",
                    "extracted_info": user_info.to_dict(),
                    "success": True
                })

                return user_info
            else:
                print("⚠️ No function call found in API response")
                return UserInfo(extraction_source="no_function_call")

        except Exception as e:
            print(f"❌ Error extracting user info: {e}")
            self.extraction_history.append({
                "timestamp": datetime.now().isoformat(),
                "conversation_text": conversation_text[:200] + "...",
                "error": str(e),
                "success": False
            })
            return UserInfo(extraction_source="error")

    def validate_extracted_info(self, user_info: UserInfo) -> Dict[str, Any]:
        validation_results = {
            "is_valid": True,
            "errors": [],
            "warnings": []
        }
        if user_info.email:
            email_pattern = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
            if not re.match(email_pattern, user_info.email):
                validation_results["errors"].append(f"Invalid email format: {user_info.email}")
                validation_results["is_valid"] = False
        if user_info.age is not None:
            if not (0 <= user_info.age <= 150):
                validation_results["errors"].append(f"Age out of valid range: {user_info.age}")
                validation_results["is_valid"] = False
        if user_info.phone:
            phone_digits = re.sub(r'[^0-9]', '', user_info.phone)
            if len(phone_digits) < 7 or len(phone_digits) > 15:
                validation_results["warnings"].append(f"Unusual phone number length: {user_info.phone}")
        if not user_info.is_valid():
            validation_results["warnings"].append("No user information extracted")

        return validation_results

    def display_extraction_results(self, user_info: UserInfo, conversation_text: str,
                                 validation_results: Dict[str, Any]):
        print("\n" + "="*60)
        print("🔍 USER INFORMATION EXTRACTION RESULTS")
        print("="*60)

        print(f"📝 Input Text: {conversation_text[:100]}...")
        print(f"\n📊 Extracted Information:")

        info_dict = user_info.to_dict()
        if info_dict:
            for key, value in info_dict.items():
                if key not in ['confidence_score', 'extraction_source']:
                    emoji = {
                        'name': '👤', 'email': '📧', 'phone': '📞',
                        'location': '📍', 'age': '🎂'
                    }
                    print(f"  {emoji.get(key, '📋')} {key.title()}: {value}")
        else:
            print("  ⚠️ No information extracted")

        if user_info.confidence_score is not None:
            confidence_percent = user_info.confidence_score * 100
            print(f"\n📈 Confidence Score: {confidence_percent:.1f}%")

        print(f"\n✅ Validation Results:")
        print(f"  Valid: {validation_results['is_valid']}")

        if validation_results['errors']:
            print(f"  ❌ Errors:")
            for error in validation_results['errors']:
                print(f"    - {error}")

        if validation_results['warnings']:
            print(f"  ⚠️ Warnings:")
            for warning in validation_results['warnings']:
                print(f"    - {warning}")

        print("="*60)

print("✅ UserInfoExtractor class created successfully")

✅ UserInfoExtractor class created successfully
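The extractor sends the `functions`/`function_call` request parameters. In the OpenAI Chat Completions API these are deprecated in favour of `tools`/`tool_choice`, and Groq's OpenAI-compatible endpoint supports the `tools` form as well. A minimal sketch of the conversion, assuming the same function definition; `to_tool_kwargs` and `clean_arguments` are hypothetical helpers. `clean_arguments` ignores keys outside the schema and keeps falsy-but-valid values such as `age=0`, so its result can be passed straight to `UserInfo(**...)`.

```python
import json
from typing import Any, Dict, Iterable

# Stand-in for EXTRACT_USER_INFO_FUNCTION so this block runs on its own.
EXAMPLE_FUNCTION = {
    "name": "extract_user_info",
    "description": "Extract structured user information from conversational text",
    "parameters": {"type": "object",
                   "properties": {"name": {"type": "string"}, "age": {"type": "integer"}}},
}


def to_tool_kwargs(function_def: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap a legacy `functions`-style definition in the `tools` request format
    and force the model to call that function (hypothetical helper)."""
    return {
        "tools": [{"type": "function", "function": function_def}],
        "tool_choice": {"type": "function", "function": {"name": function_def["name"]}},
    }


def clean_arguments(arguments_json: str, allowed: Iterable[str]) -> Dict[str, Any]:
    """Parse the tool-call arguments string and keep only allowed fields.

    None and empty strings are dropped, but 0 is kept (a valid age).
    """
    args = json.loads(arguments_json)
    allowed = set(allowed)
    return {k: v for k, v in args.items() if k in allowed and v is not None and v != ""}


kwargs = to_tool_kwargs(EXAMPLE_FUNCTION)
print(kwargs["tool_choice"])
print(clean_arguments('{"name": "Sam", "age": 0, "phone": "", "mood": "happy"}',
                      ["name", "age", "phone"]))  # → {'name': 'Sam', 'age': 0}

# In the notebook (network call, not executed here), the request would become:
#   response = client.chat.completions.create(
#       model=self.model, messages=messages, temperature=0.1,
#       **to_tool_kwargs(EXTRACT_USER_INFO_FUNCTION))
#   tool_calls = response.choices[0].message.tool_calls
#   if tool_calls:
#       data = clean_arguments(tool_calls[0].function.arguments,
#                              USER_INFO_SCHEMA["properties"])
```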


## Sample Chat Conversations for Testing

In [10]:
SAMPLE_CONVERSATIONS = [
    {
        "title": "Complete User Profile",
        "conversation": """
User: Hi, I'm looking for help with setting up my account.
Assistant: Hello! I'd be happy to help you set up your account. Could you provide some basic information?
User: Sure! My name is Sarah Johnson, and I'm 28 years old. I live in Seattle, Washington.
Assistant: Great! And could I get your contact information?
User: Yes, my email is sarah.johnson@email.com and my phone number is (555) 123-4567.
Assistant: Perfect! I have all the information needed to set up your account.
"""
    },
    {
        "title": "Partial Information - Business Inquiry",
        "conversation": """
User: I'm interested in your consulting services.
Assistant: Excellent! What type of consulting are you looking for?
User: I need help with digital marketing for my startup. I'm Mike Chen, based in Toronto.
Assistant: Great to meet you, Mike! What's the best way to reach you?
User: You can email me at mike.chen.startup@gmail.com
Assistant: Perfect! I'll send you our service catalog and pricing information.
"""
    },
    {
        "title": "Support Ticket - Technical Issue",
        "conversation": """
User: I'm having trouble accessing my account after the recent update.
Assistant: I'm sorry to hear about the trouble. Let me help you resolve this issue.
User: I keep getting an error when I try to log in with my email alex.rodriguez@techcorp.com
Assistant: I see the issue. Let me check your account status.
User: I'm located in Miami, Florida if that helps with any regional settings.
Assistant: That's helpful information. I can see your account is affected by a regional server issue.
User: I'm 35 years old, not sure if age verification is part of the problem.
Assistant: Age verification shouldn't be an issue. Let me reset your login credentials.
"""
    },
    {
        "title": "Minimal Information - General Inquiry",
        "conversation": """
User: What are your business hours?
Assistant: We're open Monday through Friday, 9 AM to 6 PM EST.
User: Do you have weekend support?
Assistant: We offer limited weekend support for urgent issues. Is there something specific you need help with?
User: Not right now, just checking for future reference. Thanks!
Assistant: You're welcome! Feel free to contact us anytime you need assistance.
"""
    },
    {
        "title": "Phone-based Information",
        "conversation": """
User: I'd like to schedule a phone consultation.
Assistant: I'd be happy to help you schedule that. What's your preferred contact method?
User: Please call me at +1-202-555-9876. I'm usually available afternoons.
Assistant: Perfect! May I get your name for the appointment?
User: It's Dr. Emily Watson. I'm calling from Washington, DC.
Assistant: Thank you, Dr. Watson. I'll have someone reach out to schedule your consultation.
"""
    }
]

print(f"✅ Created {len(SAMPLE_CONVERSATIONS)} sample conversations for testing")
for i, conv in enumerate(SAMPLE_CONVERSATIONS, 1):
    print(f"  {i}. {conv['title']}")

✅ Created 5 sample conversations for testing
  1. Complete User Profile
  2. Partial Information - Business Inquiry
  3. Support Ticket - Technical Issue
  4. Minimal Information - General Inquiry
  5. Phone-based Information
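The demonstrations below measure how many fields were filled, not whether the values are correct. A minimal sketch of a correctness check, assuming hand-written expected values per sample (here read off the "Complete User Profile" conversation). `EXPECTED_SAMPLE_1` and `field_accuracy` are hypothetical names, and the case- and whitespace-insensitive comparison is an assumption that may be too strict for differently formatted phone numbers.

```python
from typing import Any, Dict

# Expected values read off the "Complete User Profile" conversation.
EXPECTED_SAMPLE_1: Dict[str, Any] = {
    "name": "Sarah Johnson",
    "email": "sarah.johnson@email.com",
    "phone": "(555) 123-4567",
    "location": "Seattle, Washington",
    "age": 28,
}


def _norm(value: Any) -> Any:
    """Lower-case and collapse whitespace for strings; leave other types unchanged."""
    return " ".join(value.lower().split()) if isinstance(value, str) else value


def field_accuracy(expected: Dict[str, Any], extracted: Dict[str, Any]) -> Dict[str, bool]:
    """Per-field correctness: True only if the field was extracted and matches."""
    return {field: field in extracted and _norm(extracted[field]) == _norm(value)
            for field, value in expected.items()}


# Example with a hypothetical extraction that got the city but not the state:
result = field_accuracy(EXPECTED_SAMPLE_1,
                        {"name": "sarah johnson", "location": "Seattle", "age": 28})
print(result)
print(f"{sum(result.values())}/{len(result)} fields correct")  # → 2/5 fields correct
```

After Demonstration 1 has run, `field_accuracy(EXPECTED_SAMPLE_1, extraction_results[0]['user_info'].to_dict())` would score the real extraction; extra keys such as `confidence_score` are ignored because only the expected fields are compared.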


## Demonstration 1: Extract Information from All Samples

In [11]:
extractor = UserInfoExtractor()

print("🚀 Starting comprehensive information extraction demonstration\n")
extraction_results = []

for i, sample in enumerate(SAMPLE_CONVERSATIONS, 1):
    print(f"\n{'='*20} SAMPLE {i}: {sample['title'].upper()} {'='*20}")

    user_info = extractor.extract_from_conversation(sample['conversation'])

    validation_results = extractor.validate_extracted_info(user_info)

    extractor.display_extraction_results(
        user_info, sample['conversation'], validation_results
    )

    extraction_results.append({
        'sample_title': sample['title'],
        'user_info': user_info,
        'validation': validation_results
    })

🚀 Starting comprehensive information extraction demonstration



🔍 USER INFORMATION EXTRACTION RESULTS
📝 Input Text: 
User: Hi, I'm looking for help with setting up my account.
Assistant: Hello! I'd be happy to help y...

📊 Extracted Information:
  👤 Name: Sarah Johnson
  📧 Email: sarah.johnson@email.com
  📞 Phone: (555) 123-4567
  📍 Location: Seattle, Washington
  🎂 Age: 28

📈 Confidence Score: 100.0%

✅ Validation Results:
  Valid: True


🔍 USER INFORMATION EXTRACTION RESULTS
📝 Input Text: 
User: I'm interested in your consulting services.
Assistant: Excellent! What type of consulting are...

📊 Extracted Information:
  👤 Name: Mike Chen
  📧 Email: mike.chen.startup@gmail.com
  📍 Location: Toronto

📈 Confidence Score: 60.0%

✅ Validation Results:
  Valid: True


🔍 USER INFORMATION EXTRACTION RESULTS
📝 Input Text: 
User: I'm having trouble accessing my account after the recent update.
Assistant: I'm sorry to hear...

📊 Extracted Information:
  👤 Name: Alex Rodriguez
  📧 Email: alex.rod

## Demonstration 2: JSON Output and Schema Validation

In [12]:
def create_json_output(extraction_results: List[Dict[str, Any]]) -> Dict[str, Any]:
    json_output = {
        "extraction_metadata": {
            "timestamp": datetime.now().isoformat(),
            "total_samples": len(extraction_results),
            "extractor_model": extractor.model,
            "schema_version": "1.0"
        },
        "extracted_profiles": [],
        "summary_statistics": {}
    }
    valid_extractions = 0
    total_fields_extracted = 0
    field_extraction_counts = {"name": 0, "email": 0, "phone": 0, "location": 0, "age": 0}

    for result in extraction_results:
        user_info = result['user_info']
        validation = result['validation']
        profile = {
            "sample_title": result['sample_title'],
            "extracted_data": user_info.to_dict(),
            "validation_status": {
                "is_valid": validation['is_valid'],
                "has_errors": len(validation['errors']) > 0,
                "has_warnings": len(validation['warnings']) > 0,
                "error_count": len(validation['errors']),
                "warning_count": len(validation['warnings'])
            }
        }

        if validation['errors'] or validation['warnings']:
            profile['validation_details'] = {
                'errors': validation['errors'],
                'warnings': validation['warnings']
            }

        json_output["extracted_profiles"].append(profile)

        if user_info.is_valid():
            valid_extractions += 1

        for field in field_extraction_counts.keys():
            if getattr(user_info, field, None) is not None:
                field_extraction_counts[field] += 1
                total_fields_extracted += 1

    json_output["summary_statistics"] = {
        "successful_extractions": valid_extractions,
        "success_rate": valid_extractions / len(extraction_results) * 100,
        "total_fields_extracted": total_fields_extracted,
        "average_fields_per_sample": total_fields_extracted / len(extraction_results),
        "field_extraction_rates": {
            field: (count / len(extraction_results) * 100)
            for field, count in field_extraction_counts.items()
        }
    }

    return json_output

json_results = create_json_output(extraction_results)
print("\n" + "="*70)
print("📄 COMPLETE JSON EXTRACTION RESULTS")
print("="*70)
print(json.dumps(json_results, indent=2))
print("="*70)


📄 COMPLETE JSON EXTRACTION RESULTS
{
  "extraction_metadata": {
    "timestamp": "2025-09-13T10:14:08.921100",
    "total_samples": 5,
    "extractor_model": "llama-3.3-70b-versatile",
    "schema_version": "1.0"
  },
  "extracted_profiles": [
    {
      "sample_title": "Complete User Profile",
      "extracted_data": {
        "name": "Sarah Johnson",
        "email": "sarah.johnson@email.com",
        "phone": "(555) 123-4567",
        "location": "Seattle, Washington",
        "age": 28,
        "confidence_score": 1.0,
        "extraction_source": "groq_function_call"
      },
      "validation_status": {
        "is_valid": true,
        "has_errors": false,
        "has_warnings": false,
        "error_count": 0,
        "warning_count": 0
      }
    },
    {
      "sample_title": "Partial Information - Business Inquiry",
      "extracted_data": {
        "name": "Mike Chen",
        "email": "mike.chen.startup@gmail.com",
        "location": "Toronto",
        "confidence_score": 0.6,
        "extraction_source": "groq_function_call

## Demonstration 3: Advanced Schema Validation

In [13]:
def advanced_schema_validation(json_results: Dict[str, Any]) -> Dict[str, Any]:
    validation_report = {
        "schema_compliance": {
            "total_profiles": len(json_results["extracted_profiles"]),
            "compliant_profiles": 0,
            "non_compliant_profiles": 0,
            "compliance_rate": 0.0
        },
        "field_validation": {
            "valid_emails": 0,
            "invalid_emails": 0,
            "valid_ages": 0,
            "invalid_ages": 0,
            "valid_phones": 0,
            "questionable_phones": 0
        },
        "data_quality_metrics": {
            "completeness_scores": [],
            "average_completeness": 0.0
        },
        "detailed_issues": []
    }

    for profile in json_results["extracted_profiles"]:
        extracted_data = profile["extracted_data"]
        sample_title = profile["sample_title"]

        is_compliant = profile["validation_status"]["is_valid"]
        if is_compliant:
            validation_report["schema_compliance"]["compliant_profiles"] += 1
        else:
            validation_report["schema_compliance"]["non_compliant_profiles"] += 1
            validation_report["detailed_issues"].append({
                "sample": sample_title,
                "issues": profile.get("validation_details", {})
            })

        if "email" in extracted_data:
            email_pattern = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
            if re.match(email_pattern, extracted_data["email"]):
                validation_report["field_validation"]["valid_emails"] += 1
            else:
                validation_report["field_validation"]["invalid_emails"] += 1

        if "age" in extracted_data:
            age = extracted_data["age"]
            if isinstance(age, int) and 0 <= age <= 150:
                validation_report["field_validation"]["valid_ages"] += 1
            else:
                validation_report["field_validation"]["invalid_ages"] += 1

        if "phone" in extracted_data:
            phone = extracted_data["phone"]
            phone_digits = re.sub(r'[^0-9]', '', phone)
            if 7 <= len(phone_digits) <= 15:
                validation_report["field_validation"]["valid_phones"] += 1
            else:
                validation_report["field_validation"]["questionable_phones"] += 1

        total_possible_fields = 5  # name, email, phone, location, age
        extracted_fields = len([f for f in ["name", "email", "phone", "location", "age"]
                              if f in extracted_data and extracted_data[f] is not None])
        completeness_score = (extracted_fields / total_possible_fields) * 100
        validation_report["data_quality_metrics"]["completeness_scores"].append(completeness_score)

    total_profiles = validation_report["schema_compliance"]["total_profiles"]
    compliant_profiles = validation_report["schema_compliance"]["compliant_profiles"]
    validation_report["schema_compliance"]["compliance_rate"] = (compliant_profiles / total_profiles) * 100

    completeness_scores = validation_report["data_quality_metrics"]["completeness_scores"]
    if completeness_scores:
        validation_report["data_quality_metrics"]["average_completeness"] = sum(completeness_scores) / len(completeness_scores)

    return validation_report

validation_report = advanced_schema_validation(json_results)
print("\n" + "="*70)
print("🔬 ADVANCED SCHEMA VALIDATION REPORT")
print("="*70)
print(json.dumps(validation_report, indent=2))
print("="*70)


🔬 ADVANCED SCHEMA VALIDATION REPORT
{
  "schema_compliance": {
    "total_profiles": 5,
    "compliant_profiles": 5,
    "non_compliant_profiles": 0,
    "compliance_rate": 100.0
  },
  "field_validation": {
    "valid_emails": 3,
    "invalid_emails": 0,
    "valid_ages": 2,
    "invalid_ages": 0,
    "valid_phones": 2,
    "questionable_phones": 0
  },
  "data_quality_metrics": {
    "completeness_scores": [
      100.0,
      60.0,
      80.0,
      0.0,
      60.0
    ],
    "average_completeness": 60.0
  },
  "detailed_issues": []
}
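The `average_completeness` figure is the mean of the per-profile completeness scores, each being the percentage of the five schema fields present in that profile. With the scores from this run:

$$
c_i = 100 \times \frac{\text{number of schema fields extracted in sample } i}{5},
\qquad
\bar{c} = \frac{1}{5}\sum_{i=1}^{5} c_i = \frac{100 + 60 + 80 + 0 + 60}{5} = \frac{300}{5} = 60
$$

The same ratio, expressed as a fraction, is what `UserInfo.confidence_score` stores (for example 0.6 for the Mike Chen sample).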


## Demonstration 4: Real-time Extraction Test

In [14]:
def test_custom_conversation():
    custom_conversation = """
User: Hi, I need to update my profile information.
Assistant: I can help you with that. What would you like to update?
User: My name is Jennifer Liu, and I just moved to San Francisco. I'm 32 now.
Assistant: I'll update your location to San Francisco and age to 32. Is Jennifer Liu the correct spelling?
User: Yes, that's correct. Also, my new work email is j.liu@innovatetech.com
Assistant: Perfect! I've noted your new email. Do you have a new phone number as well?
User: Actually yes, it's 415-555-0199.
Assistant: All updated! Your new information has been saved to your profile.
"""

    print("🧪 Testing custom conversation for real-time extraction\n")
    print("Input conversation:")
    print("-" * 50)
    print(custom_conversation)
    print("-" * 50)

    user_info = extractor.extract_from_conversation(custom_conversation)
    validation_results = extractor.validate_extracted_info(user_info)
    extractor.display_extraction_results(
        user_info, custom_conversation, validation_results
    )
    print("\n📋 JSON Output:")
    json_output = {
        "extracted_data": user_info.to_dict(),
        "validation": validation_results,
        "metadata": {
            "extraction_timestamp": datetime.now().isoformat(),
            "model_used": extractor.model,
            "confidence_score": user_info.confidence_score
        }
    }
    print(json.dumps(json_output, indent=2))

test_custom_conversation()

🧪 Testing custom conversation for real-time extraction

Input conversation:
--------------------------------------------------

User: Hi, I need to update my profile information.
Assistant: I can help you with that. What would you like to update?
User: My name is Jennifer Liu, and I just moved to San Francisco. I'm 32 now.
Assistant: I'll update your location to San Francisco and age to 32. Is Jennifer Liu the correct spelling?
User: Yes, that's correct. Also, my new work email is j.liu@innovatetech.com
Assistant: Perfect! I've noted your new email. Do you have a new phone number as well?
User: Actually yes, it's 415-555-0199.
Assistant: All updated! Your new information has been saved to your profile.

--------------------------------------------------

🔍 USER INFORMATION EXTRACTION RESULTS
📝 Input Text: 
User: Hi, I need to update my profile information.
Assistant: I can help you with that. What would ...

📊 Extracted Information:
  👤 Name: Jennifer Liu
  📧 Email: j.liu@innovatetech.

## Extraction History and Performance Analysis

In [15]:
def analyze_extraction_performance():
    history = extractor.extraction_history

    print("\n" + "="*70)
    print("📈 EXTRACTION PERFORMANCE ANALYSIS")
    print("="*70)

    total_extractions = len(history)
    successful_extractions = sum(1 for entry in history if entry.get('success', False))
    success_rate = (successful_extractions / total_extractions) * 100 if total_extractions > 0 else 0

    print(f"📊 Overall Statistics:")
    print(f"  Total Extractions: {total_extractions}")
    print(f"  Successful: {successful_extractions}")
    print(f"  Success Rate: {success_rate:.1f}%")

    field_counts = {"name": 0, "email": 0, "phone": 0, "location": 0, "age": 0}

    for entry in history:
        if entry.get('success') and 'extracted_info' in entry:
            extracted_info = entry['extracted_info']
            for field in field_counts.keys():
                if field in extracted_info and extracted_info[field] is not None:
                    field_counts[field] += 1

    print(f"\n🎯 Field Extraction Success Rates:")
    for field, count in field_counts.items():
        rate = (count / successful_extractions) * 100 if successful_extractions > 0 else 0
        emoji = {"name": "👤", "email": "📧", "phone": "📞", "location": "📍", "age": "🎂"}[field]
        print(f"  {emoji} {field.title()}: {count}/{successful_extractions} ({rate:.1f}%)")

    print(f"\n📋 Extraction History Summary:")
    for i, entry in enumerate(history, 1):
        status = "✅" if entry.get('success') else "❌"
        timestamp = entry.get('timestamp', 'Unknown')
        text_preview = entry.get('conversation_text', 'No text')[:50] + "..."
        print(f"  {i}. {status} {timestamp[:19]} - {text_preview}")

    print("="*70)

analyze_extraction_performance()


📈 EXTRACTION PERFORMANCE ANALYSIS
📊 Overall Statistics:
  Total Extractions: 6
  Successful: 6
  Success Rate: 100.0%

🎯 Field Extraction Success Rates:
  👤 Name: 5/6 (83.3%)
  📧 Email: 4/6 (66.7%)
  📞 Phone: 3/6 (50.0%)
  📍 Location: 5/6 (83.3%)
  🎂 Age: 3/6 (50.0%)

📋 Extraction History Summary:
  1. ✅ 2025-09-13T10:14:02 - 
User: Hi, I'm looking for help with setting up my...
  2. ✅ 2025-09-13T10:14:02 - 
User: I'm interested in your consulting services....
  3. ✅ 2025-09-13T10:14:03 - 
User: I'm having trouble accessing my account aft...
  4. ✅ 2025-09-13T10:14:03 - 
User: What are your business hours?
Assistant: We...
  5. ✅ 2025-09-13T10:14:03 - 
User: I'd like to schedule a phone consultation.
...
  6. ✅ 2025-09-13T10:14:18 - 
User: Hi, I need to update my profile information...


## Final Summary and Conclusions

This notebook demonstrated JSON schema-based extraction of user information from conversational text using the Groq API. The key results and insights are:

### ✅ Successfully Implemented Features:

1. **JSON Schema Definition**: Defined a JSON schema for user information with fields for name, email, phone, location, and age, including an email pattern and an age range constraint

2. **Function Calling Integration**: Successfully integrated Groq API's OpenAI-compatible function calling to extract structured data from conversational text

3. **Comprehensive Validation**: Implemented multi-level validation including:
   - Email format validation using regex patterns
   - Age range validation (0-150 years)
   - Phone number length check (7 to 15 digits, reported as a warning rather than an error)
   - Overall data completeness assessment

4. **Multiple Test Cases**: Processed 5 sample conversations, ranging from a complete profile to a general inquiry with no personal information, plus one custom conversation

### 📊 Extraction Performance Results:

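- **Coverage**: 4 of the 5 samples yielded at least one field; the fifth ("Minimal Information - General Inquiry") contains no personal details and produced an empty extraction, as intended.
- **Field-Specific Performance**: Across all 6 extractions (5 samples plus the custom test), each field was extracted exactly as often as it appears in the conversations:
  - Names: 5/6 (including "Alex Rodriguez", which the model inferred from the email address alone)
  - Locations: 5/6
  - Email addresses: 4/6; the 3 emails in the validation report all pass the regex check
  - Phone numbers: 3/6; the 2 phones in the validation report pass the length check
  - Ages: 3/6
- These counts measure whether a field was filled, not whether its value is correct.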

### 🔧 Technical Achievements:

- **Modular Design**: A `UserInfo` dataclass holds the results, and `UserInfoExtractor` keeps extraction, validation, and display in separate methods
- **Error Handling**: API and parsing failures are caught, recorded in the extraction history, and returned as an empty `UserInfo` instead of stopping the run
- **JSON Output**: Complete JSON serialization of results with metadata and statistics
- **Real-time Processing**: The same extractor handles new, unseen conversations, as shown in Demonstration 4
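The extractor catches API failures but does not retry them. The `openai` client already retries some failed requests on its own (its `max_retries` option), so an application-level wrapper is optional; it is mainly useful when you want control over which errors are retried and how long to wait. A minimal sketch with exponential backoff, assuming transient failures are worth a few attempts; `with_retries` and its parameters are hypothetical, and in practice `retry_on` should list only transient error types rather than every `Exception`.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(fn: Callable[[], T], attempts: int = 3, base_delay: float = 0.5,
                 retry_on: Tuple[Type[BaseException], ...] = (Exception,),
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(), retrying on the given exceptions with exponential backoff plus jitter."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # base_delay, 2*base_delay, 4*base_delay, ... plus up to 10% random jitter
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")


# Example: a call that fails twice with a simulated transient error, then succeeds.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated transient failure")
    return "ok"


print(with_retries(flaky, sleep=lambda s: None), calls["n"])  # → ok 3
```

In the notebook, the API call inside `extract_from_conversation` could be wrapped as `with_retries(lambda: client.chat.completions.create(...))`.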

### 💡 Key Insights:

1. **Function Calling Effectiveness**: Groq API's function calling returned well-formed structured data for all 6 test conversations; a larger, labelled test set would be needed to measure accuracy

2. **Context Sensitivity**: The system successfully handles various conversational contexts (support tickets, business inquiries, account setup)

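3. **Validation Importance**: Across the five samples, no validation errors were raised and the only warning flagged the empty extraction from the general-inquiry sample; the checks mainly guard against malformed values a model could return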

4. **Extensibility**: New fields can be added by extending `USER_INFO_SCHEMA` and the `UserInfo` dataclass, along with the field lists used for scoring and statistics
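The scoring and statistics code hard-codes the field list (`total_fields = 5`, `["name", "email", "phone", "location", "age"]`, `field_extraction_counts`), so each new field means several separate edits. A minimal sketch, assuming the schema stays the single source of truth, derives the field list and the completeness score from the schema's `properties`; `schema_fields` and `completeness` are hypothetical helpers, and the extra `company` field is purely illustrative.

```python
from typing import Any, Dict, List


def schema_fields(schema: Dict[str, Any]) -> List[str]:
    """Field names in declaration order, taken from the schema itself."""
    return list(schema.get("properties", {}).keys())


def completeness(data: Dict[str, Any], schema: Dict[str, Any]) -> float:
    """Fraction of schema fields present with a non-None value (0.0 to 1.0)."""
    fields = schema_fields(schema)
    if not fields:
        return 0.0
    return sum(1 for f in fields if data.get(f) is not None) / len(fields)


# Illustrative schema with one extra, hypothetical field ("company").
schema = {"type": "object", "properties": {
    "name": {"type": "string"}, "email": {"type": "string"}, "phone": {"type": "string"},
    "location": {"type": "string"}, "age": {"type": "integer"}, "company": {"type": "string"},
}}
profile = {"name": "Mike Chen", "email": "mike.chen.startup@gmail.com", "location": "Toronto"}
print(schema_fields(schema))
print(completeness(profile, schema))  # → 0.5
```

With the original five-field schema, the same profile scores 3/5 = 0.6, matching the Mike Chen sample above; adding a field to the schema changes the denominator automatically.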

### 🚀 Production Readiness:

This implementation includes several building blocks for production use:
- Error handling that records failed extractions instead of halting
- Extraction history and summary statistics for monitoring
- Field-level checks based on the schema rules (email pattern, age range) plus a phone-length heuristic
- A schema-driven design that can be extended to additional use cases

The system shows how unstructured conversational data can be turned into the structured records needed for database storage, CRM systems, or further processing pipelines.