This notebook demonstrates the complete production pipeline of the patent novelty assessment system, from loading the models to generating an LLM explanation.

**Pipeline Overview:**
1. Load models and data
2. Process user input (patent title, abstract, claims)
3. Retrieve similar patents (FAISS + online search)
4. Extract features for each patent pair
5. Score similarity using a PyTorch Neural Network
6. Generate novelty assessment
7. Create LLM explanation

**Usage:** Run all cells sequentially to see the full pipeline in action.
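Step 3 of the pipeline retrieves candidate prior art with FAISS. The sketch below shows the underlying idea, exact top-k search by cosine similarity, using plain NumPy as a stand-in. It assumes embeddings are dense float vectors (such as PatentSBERTa outputs) and does not reproduce the project's index. In FAISS, the equivalent exact search is an `IndexFlatIP` over vectors normalized with `faiss.normalize_L2`; whether the project uses that index type is not stated here. The function name `top_k_cosine` is hypothetical.

```python
import numpy as np

def top_k_cosine(query: np.ndarray, corpus: np.ndarray, k: int = 5):
    """Return (indices, scores) of the k corpus rows most cosine-similar to query."""
    # L2-normalize so that the inner product equals cosine similarity
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = c @ q                      # one score per corpus row
    k = min(k, len(scores))
    idx = np.argsort(-scores)[:k]       # highest scores first
    return idx, scores[idx]

# Toy 3-dimensional "embeddings"; real sentence embeddings are much larger
corpus = np.array([[1.0, 0.0, 0.0],
                   [0.7, 0.7, 0.0],
                   [0.0, 0.0, 1.0]])
query = np.array([1.0, 0.1, 0.0])
idx, scores = top_k_cosine(query, corpus, k=2)
print(idx, np.round(scores, 3))
```

Exact search is simple and deterministic; for very large patent collections an approximate FAISS index trades a little recall for speed.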


In [None]:
import sys
from pathlib import Path
import numpy as np
import json
from typing import Dict, List, Optional

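# Add the project root to sys.path so the `src` package is importable
# (assumes the notebook runs from a direct subdirectory of the project)
project_root = Path().resolve().parent
sys.path.insert(0, str(project_root))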

# Import project modules
from src.app.patent_analyzer import PatentAnalyzer
from src.models.pytorch_classifier import PyTorchPatentClassifier
from src.features.feature_extract import FeatureExtractor
from src.embeddings.patent_sberta import PatentEmbedder

print("Imports successful!")
print(f"Project root: {project_root}")


## Step 1: Initialize the Patent Analyzer

Load all required models and data.


In [None]:
# Create the analyzer; analyzer.load() below loads the models and data
analyzer = PatentAnalyzer(
    use_full_phi3=True,
    use_online_search=False,  # Set to True if you have a SerpAPI key
    use_llm_keywords=False
)

# Load all components (models and patent database)
print("Loading models and data...")
analyzer.load()

print("\nPatent Analyzer initialized successfully!")
print("Embedding model: PatentSBERTa")
print("Classification model: PyTorch Neural Network")
print("LLM explainer: Phi-3 (Ollama)")
print(f"Patent database: {len(analyzer.patents)} patents")


## Step 2: Define Example Patent

Define the patent application whose novelty will be assessed.

In [None]:
# Sample patent application (input)
example_patent = {
    "title": "Smart water bottle with hydration tracking sensors and Bluetooth connectivity",
    "abstract": "A smart water bottle system that tracks user hydration levels using embedded sensors. The bottle communicates with a mobile application via Bluetooth to provide real-time hydration monitoring and reminders. The system includes a capacitive sensor array to measure water level and an accelerometer to detect drinking events.",
    "claims": [
        "1. A smart water bottle comprising: a container for holding liquid; a sensor array configured to measure the volume of liquid in the container; a wireless communication module; and a processor configured to transmit hydration data to an external device.",
        "2. The smart water bottle of claim 1, wherein the sensor array includes capacitive sensors arranged around the container perimeter.",
        "3. The smart water bottle of claim 1, further comprising an accelerometer configured to detect drinking events."
    ]
}

print("Example Patent Application:")
print(f"Title: {example_patent['title']}")
print(f"Abstract: {example_patent['abstract'][:100]}...")
print(f"Claims: {len(example_patent['claims'])} claims")
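The input is a plain dictionary with `title`, `abstract`, and `claims` keys. Before running the full pipeline, it can help to catch malformed input early. The following is a minimal sketch of such a check; `validate_patent_input` and `REQUIRED_FIELDS` are hypothetical names, not part of the project's API.

```python
from typing import Dict, List

# Hypothetical schema: the three fields used by example_patent above
REQUIRED_FIELDS = ("title", "abstract", "claims")

def validate_patent_input(patent: Dict) -> List[str]:
    """Return a list of problems; an empty list means the input looks usable."""
    missing = [field for field in REQUIRED_FIELDS if field not in patent]
    if missing:
        return [f"missing field: {field}" for field in missing]
    problems = []
    if not str(patent["title"]).strip():
        problems.append("title is empty")
    claims = patent["claims"]
    if not isinstance(claims, list) or not claims:
        problems.append("claims must be a non-empty list of strings")
    elif not all(isinstance(c, str) and c.strip() for c in claims):
        problems.append("every claim must be a non-empty string")
    return problems

print(validate_patent_input({"title": "Bottle", "abstract": "A bottle.", "claims": ["1. A bottle."]}))  # []
```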


## Step 3: Run Novelty Assessment

Format the application as a single text block and run it through the complete pipeline.


In [None]:
# Format input: title, abstract, then a "Claims:" header with one claim per line
input_text = (
    f"{example_patent['title']}\n\n"
    f"{example_patent['abstract']}\n\n"
    "Claims:\n" + "\n".join(example_patent['claims'])
)

# Run analysis
print("Running novelty assessment...")
print("="*60)

result = analyzer.analyze(input_text)

print("\n" + "="*60)
print("ANALYSIS COMPLETE")
print("="*60)
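The input text above follows a fixed layout: title, blank line, abstract, blank line, `Claims:`, then one claim per line. Step 5 shows the analyzer's input handler recovering those fields. A hypothetical `parse_patent_text` helper for exactly this layout might look like the sketch below; it is not the project's `input_handler.parse`, which also reports a `mode` that this sketch does not model.

```python
from typing import List, NamedTuple

class ParsedPatent(NamedTuple):
    title: str
    abstract: str
    claims: List[str]

def parse_patent_text(text: str) -> ParsedPatent:
    """Split 'title', blank line, 'abstract', blank line, 'Claims:', one claim per line."""
    # Everything before the claims header is title + abstract
    body, _, claims_block = text.partition("\n\nClaims:\n")
    title, _, abstract = body.partition("\n\n")
    # Keep non-blank lines of the claims block as individual claims
    claims = [line.strip() for line in claims_block.splitlines() if line.strip()]
    return ParsedPatent(title.strip(), abstract.strip(), claims)

sample = "Smart bottle\n\nA bottle with sensors.\n\nClaims:\n1. A bottle.\n2. The bottle of claim 1."
parsed = parse_patent_text(sample)
print(parsed.title, len(parsed.claims))  # Smart bottle 2
```

If the `Claims:` header is absent, the sketch returns an empty claims list rather than failing, so title-and-abstract-only input still parses.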


## Step 4: Display Results

Show the novelty assessment results.


In [None]:
if result.success:
    print("\nNOVELTY ASSESSMENT")
    print("-" * 60)
    print(f"Novelty Score: {result.novelty_score:.3f}")
    print(f"Assessment: {result.assessment}")
    
    if result.similar_patents:
        # Show at most the first five retrieved patents
        print(f"\nFound {len(result.similar_patents)} similar patents:")
        for i, patent in enumerate(result.similar_patents[:5], 1):
            print(f"\n  {i}. {patent.get('title', 'N/A')}")
            print(f"     Patent ID: {patent.get('patent_id', 'N/A')}")
            print(f"     Similarity: {patent.get('similarity_score', 0):.3f}")
            print(f"     Year: {patent.get('year', 'N/A')}")
    
    if result.explanation:
        print("\nEXPLANATION")
        print("-" * 60)
        print(result.explanation)
else:
    print(f"Analysis failed: {result.error}")
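The display loop takes the first five entries of `result.similar_patents`, which assumes the list is already sorted by `similarity_score` in descending order; the notebook does not state that it is. If that ordering is not guaranteed, the top entries can be selected explicitly. A minimal sketch using the standard-library `heapq.nlargest`, with made-up patent IDs:

```python
import heapq
from typing import Dict, List

def top_similar(patents: List[Dict], n: int = 5) -> List[Dict]:
    """Return the n entries with the highest 'similarity_score' (missing scores count as 0)."""
    return heapq.nlargest(n, patents, key=lambda p: p.get("similarity_score", 0))

hits = [
    {"patent_id": "A", "similarity_score": 0.42},
    {"patent_id": "B", "similarity_score": 0.91},
    {"patent_id": "C"},  # no score: treated as 0
]
print([p["patent_id"] for p in top_similar(hits, n=2)])  # ['B', 'A']
```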


## Step 5: Pipeline Breakdown

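Inspect each stage of the pipeline. Input parsing is re-run directly; the later stages are summarized from `result` and the model configuration.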


In [None]:
# 1. Input parsing
print("\n1. Input Parsing")
parsed = analyzer.input_handler.parse(input_text)
print(f"   Mode: {parsed.mode}")
print(f"   Title: {parsed.title[:50]}...")
print(f"   Abstract length: {len(parsed.abstract)} chars")
print(f"   Claims: {len(parsed.claims)} claims")

# 2. Similarity search
print("\n2. Similarity Search")
if result.similar_patents:
    print(f"   Retrieved {len(result.similar_patents)} similar patents")
    # Take the maximum explicitly rather than assuming the list is sorted
    top_score = max(p.get('similarity_score', 0) for p in result.similar_patents)
    print(f"   Top similarity: {top_score:.3f}")

# 3. Feature extraction
print("\n3. Feature Extraction")
print(f"   Extracted 13 features for each patent pair")
print(f"   Features include: BM25, cosine similarity, CPC overlap, etc.")

# 4. Classification
print("\n4. Classification Step")
print(f"   Model: PyTorch Neural Network")
print(f"   Similarity probability: {1 - result.novelty_score:.3f}")
print(f"   Novelty score: {result.novelty_score:.3f}")

# 5. Explanation
print("\n5. LLM Explanation")
print(f"   Model: Phi-3 (Ollama)")
print(f"   Generated detailed explanation")
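The feature extraction stage reports 13 features per patent pair and names BM25, cosine similarity, and CPC overlap among them, but the full list is not given here. The sketch below computes two features of that kind for a single pair: cosine similarity between two embedding vectors, and Jaccard overlap of CPC codes truncated to the subclass level (first four characters, e.g. `A47G`). The helper names, the subclass truncation, and the example codes are illustrative assumptions, not the project's `FeatureExtractor`.

```python
import math
from typing import Dict, Iterable, Sequence

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def cpc_jaccard(codes_a: Iterable[str], codes_b: Iterable[str], level: int = 4) -> float:
    """Jaccard overlap of CPC codes truncated to `level` characters (4 = subclass)."""
    a = {c[:level] for c in codes_a}
    b = {c[:level] for c in codes_b}
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pair_features(emb_a, emb_b, cpc_a, cpc_b) -> Dict[str, float]:
    # Hypothetical two-feature subset of a pairwise feature vector
    return {
        "embedding_cosine": cosine(emb_a, emb_b),
        "cpc_subclass_jaccard": cpc_jaccard(cpc_a, cpc_b),
    }

# Illustrative embeddings and CPC codes
f = pair_features([1.0, 0.0], [1.0, 1.0], ["A47G19/22", "G01F23/26"], ["A47G23/02", "H04W4/80"])
print({k: round(v, 3) for k, v in f.items()})
```

Truncating to the subclass makes the overlap tolerant of different main groups within the same technology area; comparing full codes would be stricter.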


## Summary

1. **Input Processing**: Parse patent title, abstract, and claims
2. **Retrieval**: Find similar patents using FAISS (local) and optionally online search
3. **Feature Extraction**: Compute 13 features for each patent pair
4. **Classification**: Score similarity using PyTorch Neural Network (91.82% accuracy)
5. **Novelty Assessment**: Convert similarity to novelty score (1 - similarity)
6. **Explanation**: Generate detailed explanation using Phi-3 LLM
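The summary defines novelty as 1 minus similarity, and Step 5 prints a single similarity probability, so per-pair probabilities must be aggregated somehow; the notebook does not say how. One plausible choice, shown here purely as an assumption, is to take the closest match: novelty = 1 − max similarity. The function name is hypothetical.

```python
from typing import Sequence

def novelty_from_similarities(probs: Sequence[float]) -> float:
    """Novelty = 1 - the highest pairwise similarity probability (assumed aggregation)."""
    if any(not 0.0 <= p <= 1.0 for p in probs):
        raise ValueError("similarity probabilities must lie in [0, 1]")
    if len(probs) == 0:
        return 1.0  # assumption: no retrieved prior art counts as fully novel
    return 1.0 - max(probs)

print(round(novelty_from_similarities([0.12, 0.81, 0.40]), 2))  # 0.19
```

Taking the maximum means one close prior-art match is enough to lower the score, which mirrors the idea that novelty is usually assessed against individual references; averaging would dilute a close match among unrelated hits.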

**Key Components:**
- PatentSBERTa embeddings for semantic similarity
- PyTorch Neural Network for classification
- Phi-3 (Ollama) for natural language explanations
- Feature engineering (BM25, cosine similarity, CPC, etc.)
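BM25 is listed as one of the engineered features, but the variant and parameters are not specified. The sketch below is a textbook Okapi BM25 scorer with the common defaults `k1=1.5` and `b=0.75` and the non-negative IDF form `ln((N - n + 0.5) / (n + 0.5) + 1)`; whitespace tokenization is a simplification. For a pairwise feature, one could score each candidate patent's text against the query application's terms.

```python
import math
from collections import Counter
from typing import List

def bm25_scores(query: List[str], docs: List[List[str]], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of each tokenized doc against a tokenized query."""
    if not docs:
        return []
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of docs containing each term
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation (k1) with document-length normalization (b)
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "smart water bottle with hydration sensor".split(),
    "bluetooth speaker with water resistant housing".split(),
    "garden hose nozzle".split(),
]
scores = bm25_scores("water bottle sensor".split(), docs)
print(max(range(len(docs)), key=scores.__getitem__))  # 0
```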

**Output:**
- Novelty score (0-1, higher = more novel)
- Assessment category (NOVEL / MODERATELY NOVEL / LOW NOVELTY)
- List of similar prior art patents
- Detailed explanation of the assessment
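The assessment category is derived from the novelty score, but the cut-off values are not documented in this notebook. A minimal sketch of such a mapping follows; the thresholds `high=0.7` and `low=0.4` are hypothetical placeholders, not the system's actual values.

```python
def novelty_category(score: float, high: float = 0.7, low: float = 0.4) -> str:
    """Map a novelty score in [0, 1] to a label. Thresholds are hypothetical."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("novelty score must lie in [0, 1]")
    if score >= high:
        return "NOVEL"
    if score >= low:
        return "MODERATELY NOVEL"
    return "LOW NOVELTY"

for s in (0.85, 0.55, 0.19):
    print(s, novelty_category(s))
```

Keeping the thresholds as parameters makes it easy to recalibrate them against labeled examples without touching the scoring model.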
