# UNSPSC Product Classification System
## Representative Code Excerpt & Analysis
 
 **Author:** Parikshith  
 **Date:** October 2025  
**Purpose:** Demonstrate the core classification workflow, which combines hybrid retrieval, hierarchical routing, and multi-stage scoring to predict UNSPSC commodity codes.


 ## System Overview
 
This system maps industrial product descriptions to UNSPSC (United Nations 
Standard Products and Services Code) commodity codes using a 6-stage pipeline:
 
 1. **Hybrid Retrieval** - Semantic (FAISS) + Lexical (BM25)
 2. **Hierarchical Routing** - Extract hierarchy constraints from top results
 3. **Cross-Encoder Reranking** - Rescore candidates by jointly encoding the description and each candidate with a transformer
 4. **Score Merging** - Combine retrieval + reranking scores
 5. **Hierarchy Boost** - Boost candidates within constrained paths
 6. **Feature Similarity** - Domain-specific feature matching
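The score flow through these stages can be sketched as plain arithmetic over candidate codes. The snippet below is a minimal, hypothetical sketch, not the author's implementation: it assumes min-max normalization, linear weighted fusion (`alpha`, `beta`), a majority vote over code prefixes for routing, and a multiplicative `boost`. All function names, weights, codes, and scores are illustrative. Stage 3 is represented by a precomputed dict of cross-encoder scores rather than a real model, and Stage 6 is omitted. The routing step relies on the UNSPSC code structure: in an 8-digit code, the first 2, 4, and 6 digits identify the segment, family, and class.

```python
from collections import Counter
from typing import Dict, List, Tuple

Scores = Dict[str, float]  # UNSPSC code (8-digit string) -> score


def min_max(scores: Scores) -> Scores:
    """Rescale scores to [0, 1] so different scorers are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {code: 1.0 for code in scores}
    return {code: (s - lo) / (hi - lo) for code, s in scores.items()}


def hybrid_fuse(dense: Scores, lexical: Scores, alpha: float = 0.6) -> Scores:
    """Stage 1 (assumed fusion rule): weighted sum of normalized dense and
    BM25 scores. A code returned by only one retriever gets 0.0 from the other."""
    d, s = min_max(dense), min_max(lexical)
    return {c: alpha * d.get(c, 0.0) + (1 - alpha) * s.get(c, 0.0)
            for c in sorted(set(d) | set(s))}


def route(fused: Scores, top_n: int = 3, prefix_len: int = 6) -> str:
    """Stage 2 (assumed rule): majority vote on the hierarchy prefix of the
    top-N codes. prefix_len 2/4/6 = segment/family/class level."""
    top = sorted(fused, key=lambda c: (-fused[c], c))[:top_n]
    return Counter(c[:prefix_len] for c in top).most_common(1)[0][0]


def merge_and_boost(fused: Scores, rerank: Scores, prefix: str,
                    beta: float = 0.5, boost: float = 1.2) -> Scores:
    """Stages 4-5: blend retrieval and reranker scores, then multiply the
    score of every code under the routed prefix by `boost`."""
    r = min_max(rerank)
    merged = {c: beta * fused[c] + (1 - beta) * r.get(c, 0.0) for c in fused}
    return {c: s * boost if c.startswith(prefix) else s
            for c, s in merged.items()}


def top_k(scores: Scores, k: int = 10) -> List[Tuple[str, float]]:
    """Return the k best (code, score) pairs, ties broken by code."""
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Illustrative inputs: placeholder codes and made-up scores.
dense = {"31161501": 0.82, "31161502": 0.78, "31162801": 0.60, "40141700": 0.40}
bm25 = {"31161501": 12.0, "31162801": 9.0, "40141700": 3.0}
cross_encoder = {"31161501": 3.0, "31161502": 1.0, "31162801": 1.4, "40141700": -1.0}

fused = hybrid_fuse(dense, bm25)
prefix = route(fused)  # class-level vote over the top 3 fused codes
final = merge_and_boost(fused, cross_encoder, prefix)
for code, score in top_k(final, k=3):
    print(code, round(score, 3))
```

With these numbers the vote selects class `311615`. Before boosting, `31162801` has a higher merged score than `31161502`; the 1.2x boost on the routed class moves `31161502` ahead of it. A multiplicative boost like this only reorders candidates whose merged scores are already close, which keeps a wrong routing decision from overriding strong retrieval and reranking evidence.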

## Core Classification Pipeline


import numpy as np
from typing import List, Dict


class UNSPSCClassifier:
    """
    Main classification pipeline that orchestrates all components
    to predict UNSPSC commodity codes for product descriptions
    """
    
    def __init__(self, retriever, reranker, merger, hierarchical_router, 
                 feature_extractor):
        self.retriever = retriever
        self.reranker = reranker
        self.merger = merger
        self.hierarchical_router = hierarchical_router
        self.feature_extractor = feature_extractor
        self.commodity_features = {}  # commodity code -> cached domain features
    
    def classify_product(self, description: str, top_k: int = 10) -> List[Dict]:
        """
        Core classification workflow - maps product description to UNSPSC codes
        
        Pipeline stages:
        ---------------
        1. Hybrid Retrieval: Combine semantic (FAISS) + lexical (BM25) search
        2. Hierarchical Routing: Extract hierarchy constraints from top results
        3. Cross-Encoder Reranking: Deep semantic matching with transformer
        4. Score Merging: Combine retrieval + reranking scores
        5. Hierarchy Boost: Boost candidates within constrained hierarchy paths
        6. Feature Similarity: Add domain-specific feature matching
        
        Args:
            description: Product description text
            top_k: Number of top predictions to return
            
        Returns:
            List of top-k predictions with scores and metadata
        """


        # STAGE 1: HYBRID RETRIEVAL (Semantic + Lexical)