<a href="https://colab.research.google.com/github/ashwin-yedte/visual-intelligence-travel-finance/blob/main/VLM_Pipeline_DualScore.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# VLM Pipeline - DUAL SCORE (Visual + Semantic)


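System Overview

A complete pipeline that takes user images and recommends visually and thematically similar travel destinations.

Pipeline steps:

1. Image Analysis - Validate and preprocess images, extract CLIP embeddings, and build a semantic profile for each image
2. Destination Matching - Score every destination (visual + semantic) and keep the top-10 per image
3. Theme Aggregation - Merge matches across images, re-rank by score, frequency, and consistency, and pick the dominant theme by majority vote
4. Recommended Destinations - Enrich with metadata and images for display in a gallery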
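The majority vote behind the dominant theme can be sketched in a few lines. This is a minimal illustration, assuming each per-image match carries a `theme` string as the matching step produces; the helper name `dominant_theme` and the theme votes are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def dominant_theme(themes: List[str]) -> Tuple[str, float]:
    """Return the most frequent theme and its share of all votes, in percent.

    Ties go to the theme seen first (Counter.most_common keeps insertion order).
    """
    if not themes:
        return "Unknown", 0.0
    theme, count = Counter(themes).most_common(1)[0]
    return theme, round(count / len(themes) * 100, 2)


# Hypothetical votes: one theme per top-10 match, pooled across 5 images
votes = ["Temple"] * 20 + ["Nature"] * 30
print(dominant_theme(votes))  # ('Nature', 60.0)
```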

Step 1: Install Dependencies

In [2]:
print("Installing dependencies...")
!pip install -q transformers torch pillow fastapi uvicorn pyngrok scikit-learn
print("Dependencies installed!")

Installing dependencies...
Dependencies installed!


Step 2: Mount Google Drive

In [3]:
from google.colab import drive
drive.mount('/content/drive')
print("Google Drive mounted!")

Mounted at /content/drive
Google Drive mounted!


Step 3: Import Libraries

In [4]:
import io
import os
import json
import numpy as np
from PIL import Image
from typing import Dict, Any, List, Tuple
from collections import defaultdict, Counter
import traceback

import torch
from transformers import CLIPModel, CLIPProcessor

from fastapi import FastAPI, File, UploadFile
from fastapi.responses import JSONResponse, HTMLResponse
from fastapi.middleware.cors import CORSMiddleware
import uvicorn
from threading import Thread
from pyngrok import ngrok

print("All imports successful!")

All imports successful!


Step 4: Configuration

In [5]:


class Config:
    CLIP_MODEL_NAME = "openai/clip-vit-base-patch32"

    # Image settings
    MAX_IMAGE_SIZE_MB = 10.0
    SUPPORTED_FORMATS = ['jpg', 'jpeg', 'png']
    MIN_DIMENSION = 100
    MAX_DIMENSION = 4000
    MIN_IMAGES = 1
    MAX_IMAGES = 5

    # Matching settings
    TOP_K_PER_IMAGE = 10
    MIN_SIMILARITY_SCORE = 0.20

    # DUAL SCORING WEIGHTS (Visual + Semantic)
    VISUAL_WEIGHT = 0.6      # 60% visual similarity
    SEMANTIC_WEIGHT = 0.4    # 40% semantic alignment

    # Re-ranking weights
    AVG_VISUAL_WEIGHT = 0.25
    AVG_SEMANTIC_WEIGHT = 0.25
    FREQUENCY_WEIGHT = 0.25
    CONSISTENCY_WEIGHT = 0.25

    # Semantic profiling
    TOP_K_FOR_SEMANTIC_PROFILE = 20

print("Configuration loaded with DUAL SCORING enabled")
print(f"  Visual Weight: {Config.VISUAL_WEIGHT}")
print(f"  Semantic Weight: {Config.SEMANTIC_WEIGHT}")

Configuration loaded with DUAL SCORING enabled
  Visual Weight: 0.6
  Semantic Weight: 0.4
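A small sanity check can catch weight drift when these values are tuned. This sketch assumes the intent is for each weight group to sum to 1, so combined and final scores stay on the same scale as their inputs; `check_weights` and `ScoringWeights` are hypothetical and not part of the pipeline.

```python
import math


class ScoringWeights:
    # Hypothetical stand-in mirroring the weight values in Config;
    # in the notebook you would pass Config itself
    VISUAL_WEIGHT = 0.6
    SEMANTIC_WEIGHT = 0.4
    AVG_VISUAL_WEIGHT = 0.25
    AVG_SEMANTIC_WEIGHT = 0.25
    FREQUENCY_WEIGHT = 0.25
    CONSISTENCY_WEIGHT = 0.25


def check_weights(cfg) -> None:
    """Raise ValueError if either weight group does not sum to 1."""
    groups = {
        "dual score": [cfg.VISUAL_WEIGHT, cfg.SEMANTIC_WEIGHT],
        "re-ranking": [cfg.AVG_VISUAL_WEIGHT, cfg.AVG_SEMANTIC_WEIGHT,
                       cfg.FREQUENCY_WEIGHT, cfg.CONSISTENCY_WEIGHT],
    }
    for name, weights in groups.items():
        total = sum(weights)
        # isclose absorbs float rounding (e.g. 0.1 + 0.2 is not exactly 0.3)
        if not math.isclose(total, 1.0):
            raise ValueError(f"{name} weights sum to {total}, expected 1.0")


check_weights(ScoringWeights)  # passes silently for the values above
```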


Step 5: Load VL Encoding Data

In [6]:
class RealDataLoader:
    def __init__(self, base_path: str):
        self.base_path = base_path
        self.embeddings_path = os.path.join(base_path, 'vl_encoding/embeddings')
        self.prompts_path = os.path.join(base_path, 'vl_encoding/prompts')
        self.metadata_path = os.path.join(base_path, 'landmarks/metadata.json')
        self.destination_ids = None
        self.destination_embeddings = None
        self.destination_prompts = None
        self.metadata = None
        self.destination_lookup = None

    def load_all(self):
        print("\nLoading VL Encoding Database...")

        # Load embeddings
        embeddings_file = os.path.join(self.embeddings_path, 'all_embeddings.npz')
        vl_data = np.load(embeddings_file)
        self.destination_ids = vl_data['destination_ids']
        self.destination_embeddings = vl_data['destination_embeddings']
        print(f"  Loaded {len(self.destination_ids)} embeddings")

        # Load prompts
        prompts_file = os.path.join(self.prompts_path, 'destination_prompts.json')
        with open(prompts_file, 'r') as f:
            self.destination_prompts = json.load(f)
        print(f"  Loaded {len(self.destination_prompts)} prompts")

        # Load metadata
        with open(self.metadata_path, 'r') as f:
            self.metadata = json.load(f)

        # Build destination lookup from metadata
        self.destination_lookup = {}
        for theme in self.metadata['themes']:
            for state in theme['states']:
                for dest in state['destinations']:
                    dest_id = dest['destination_id']
                    self.destination_lookup[dest_id] = {
                        'destination_name': dest['destination_name'],
                        'state': state['state_name'],
                        'theme': theme['theme_name']
                    }

        print(f"  Built lookup for {len(self.destination_lookup)} destinations")
        print("Database loaded\n")
        return self

    def get_destination_prompts(self, dest_id: str) -> Dict:
        """Get aggregated prompts for a destination"""
        dest_id_str = str(dest_id)
        if dest_id_str in self.destination_prompts:
            return self.destination_prompts[dest_id_str].get('aggregated_prompts', {})
        return {}

    def get_destination_info(self, dest_id: str) -> Dict:
        """Get complete destination information"""
        dest_id_str = str(dest_id)

        # Get name/state/theme from metadata lookup
        if dest_id_str in self.destination_lookup:
            lookup_data = self.destination_lookup[dest_id_str]
            destination_name = lookup_data['destination_name']
            state = lookup_data['state']
            theme = lookup_data['theme']
        else:
            # Fallback: parse from dest_id
            parts = dest_id_str.split('_')
            destination_name = ' '.join(parts[2:]).title() if len(parts) >= 3 else 'Unknown'
            state = parts[1].replace('-', ' ').title() if len(parts) >= 2 else 'India'
            theme = parts[0].title() if len(parts) >= 1 else 'Unknown'

        # Get prompts from destination_prompts
        prompts = {}
        if dest_id_str in self.destination_prompts:
            prompts = self.destination_prompts[dest_id_str].get('aggregated_prompts', {})

        return {
            'destination_id': dest_id_str,
            'destination_name': destination_name,
            'state': state,
            'theme': theme,
            'prompts': prompts
        }

real_data_loader = None
print("RealDataLoader defined")

RealDataLoader defined
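The lookup construction and the ID-parsing fallback in `RealDataLoader` can be tried without the real files. This sketch assumes the nested `themes → states → destinations` layout that `load_all` reads from `metadata.json`, and infers the `theme_state_name` ID format from the fallback parser; the sample metadata and IDs are hypothetical. Unlike `load_all`, it converts keys with `str()` so they match the `str(dest_id)` lookups used elsewhere.

```python
from typing import Dict


def build_lookup(metadata: dict) -> Dict[str, dict]:
    """Flatten themes -> states -> destinations into {destination_id: info}."""
    lookup = {}
    for theme in metadata["themes"]:
        for state in theme["states"]:
            for dest in state["destinations"]:
                lookup[str(dest["destination_id"])] = {
                    "destination_name": dest["destination_name"],
                    "state": state["state_name"],
                    "theme": theme["theme_name"],
                }
    return lookup


def parse_dest_id(dest_id: str) -> dict:
    """Fallback for IDs missing from metadata, assuming theme_state_name."""
    parts = dest_id.split("_")
    return {
        "destination_name": " ".join(parts[2:]).title() if len(parts) >= 3 else "Unknown",
        "state": parts[1].replace("-", " ").title() if len(parts) >= 2 else "India",
        "theme": parts[0].title(),
    }


# Hypothetical metadata with a single destination
metadata = {"themes": [{"theme_name": "Temple", "states": [{
    "state_name": "Tamil Nadu",
    "destinations": [{"destination_id": "temple_tamil-nadu_meenakshi_temple",
                      "destination_name": "Meenakshi Temple"}]}]}]}

print(build_lookup(metadata)["temple_tamil-nadu_meenakshi_temple"]["state"])  # Tamil Nadu
print(parse_dest_id("temple_tamil-nadu_meenakshi_temple"))
```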


# Image Analysis

Comprehensive image validation, preprocessing, and CLIP embedding extraction.


**Image Validation:** Validates uploaded images (1-5 images) for format (JPG/PNG), size (max 10MB), and dimensions (100-4000px) to ensure quality input for processing.

**Visual Embedding Extraction:** Uses the CLIP model to convert each image (after conversion to RGB) into a 512-dimensional, L2-normalized embedding vector that captures its visual content.

**Semantic Profile Building:** For each image, identifies the top-20 visually similar destinations in the database and extracts their textual prompt characteristics (categories such as visual theme, activities, and accommodation style).

**Weighted Aggregation:** Aggregates semantic prompts from similar destinations, weighted by visual similarity scores and prompt confidence, creating a multi-category user preference profile.

**Output Generation**: Returns validated embeddings and semantic profiles for each processed image, ready for destination matching with dual-scoring (visual + semantic alignment).
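The weighted aggregation behind `build_semantic_profile` reduces to: for each neighbouring destination, add `similarity × weighted_score` to each of its prompts, then normalize every category to sum to 1. A pure-Python sketch with hypothetical neighbours and prompt texts; the notebook version takes the top-20 neighbours from the embedding database and also falls back to `avg_score` (or 0.5) when `weighted_score` is missing, which is omitted here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (visual similarity, {category: [(prompt_text, weighted_score), ...]})
Neighbour = Tuple[float, Dict[str, List[Tuple[str, float]]]]


def aggregate_profile(neighbours: List[Neighbour]) -> Dict[str, Dict[str, float]]:
    """Similarity-weighted prompt histogram, normalized per category."""
    raw = defaultdict(lambda: defaultdict(float))
    for similarity, prompts_by_category in neighbours:
        for category, prompts in prompts_by_category.items():
            for text, score in prompts:
                raw[category][text] += similarity * score
    profile = {}
    for category, scores in raw.items():
        total = sum(scores.values())
        if total > 0:
            profile[category] = {text: s / total for text, s in scores.items()}
    return profile


neighbours = [
    (0.8, {"activities": [("trekking", 0.5), ("temple visit", 0.25)]}),
    (0.4, {"activities": [("trekking", 0.5)]}),
]
# trekking: 0.8*0.5 + 0.4*0.5 = 0.6, temple visit: 0.8*0.25 = 0.2 -> shares 0.75 / 0.25
print(aggregate_profile(neighbours))
```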

In [7]:
class ImageValidator:
    def __init__(self):
        self.max_size_mb = Config.MAX_IMAGE_SIZE_MB
        self.supported_formats = Config.SUPPORTED_FORMATS
        self.min_dimension = Config.MIN_DIMENSION
        self.max_dimension = Config.MAX_DIMENSION

    def validate_image(self, image_bytes: bytes, filename: str) -> Dict[str, Any]:
        """Check file size, format, and pixel dimensions of one uploaded image."""
        size_mb = len(image_bytes) / (1024 * 1024)

        if size_mb == 0:
            return {'valid': False, 'error': "Empty file"}
        if size_mb > self.max_size_mb:
            return {'valid': False, 'error': f"File exceeds {self.max_size_mb} MB"}

        try:
            img = Image.open(io.BytesIO(image_bytes))
            img_format = img.format.lower() if img.format else 'unknown'

            if img_format not in self.supported_formats:
                return {'valid': False, 'error': f"Unsupported format: {img_format}"}

            width, height = img.size
            if width < self.min_dimension or height < self.min_dimension:
                return {'valid': False, 'error': f"Image smaller than {self.min_dimension}px"}
            if width > self.max_dimension or height > self.max_dimension:
                return {'valid': False, 'error': f"Image larger than {self.max_dimension}px"}

            # Image.open() is lazy and reads only the header; verify() adds an
            # integrity check whose depth depends on the format plugin
            img.verify()
            return {'valid': True}
        except Exception as e:
            return {'valid': False, 'error': f"Corrupted image: {str(e)}"}

print("ImageValidator defined!")

ImageValidator defined!


In [8]:

class ImageAnalyzer:
    def __init__(self, model, processor, device='cpu'):
        self.model = model
        self.processor = processor
        self.device = device

    def extract_clip_features(self, outputs):
        """Extract a feature tensor from CLIP outputs across transformers versions (same logic as the VL encoding step)."""
        # PRIORITY 1: Direct tensor
        if torch.is_tensor(outputs):
            return outputs

        # PRIORITY 2: pooler_output (CLIP's projected features)
        if hasattr(outputs, 'pooler_output') and outputs.pooler_output is not None:
            return outputs.pooler_output

        # PRIORITY 3: text_embeds or image_embeds (newer CLIP versions)
        if hasattr(outputs, 'text_embeds') and outputs.text_embeds is not None:
            return outputs.text_embeds
        if hasattr(outputs, 'image_embeds') and outputs.image_embeds is not None:
            return outputs.image_embeds

        # PRIORITY 4: last_hidden_state (take CLS token)
        if hasattr(outputs, 'last_hidden_state') and outputs.last_hidden_state is not None:
            return outputs.last_hidden_state[:, 0, :]

        raise ValueError(f"Could not extract features from: {type(outputs)}")

    def extract_embedding(self, image_bytes: bytes) -> np.ndarray:
        img = Image.open(io.BytesIO(image_bytes))
        if img.mode != 'RGB':
            img = img.convert('RGB')

        inputs = self.processor(images=img, return_tensors="pt", padding=True)
        inputs = {k: v.to(self.device) for k, v in inputs.items()}

        with torch.no_grad():
            outputs = self.model.get_image_features(**inputs)
            # Use the EXACT SAME extraction as VL Encoding
            image_features = self.extract_clip_features(outputs)
            # Normalize
            image_features = image_features / image_features.norm(dim=-1, keepdim=True)

        return image_features.cpu().numpy()[0]

In [9]:
def build_semantic_profile(image_embedding: np.ndarray) -> Dict[str, Dict[str, float]]:
    """
    Build semantic profile by finding visually similar destinations
    and extracting their prompt characteristics.

    This is the KEY: We use visual similarity to find relevant destinations,
    then extract THEIR prompt features to build user's semantic profile.
    """

    # Normalize
    image_embedding = image_embedding / np.linalg.norm(image_embedding)

    # Find top K visually similar destinations
    similarities = np.dot(real_data_loader.destination_embeddings, image_embedding)
    top_indices = np.argsort(similarities)[::-1][:Config.TOP_K_FOR_SEMANTIC_PROFILE]

    # Aggregate prompts from similar destinations
    semantic_profile = defaultdict(lambda: defaultdict(float))

    for idx in top_indices:
        dest_id = str(real_data_loader.destination_ids[idx])
        similarity = float(similarities[idx])

        # Get destination's aggregated prompts
        dest_prompts = real_data_loader.get_destination_prompts(dest_id)

        # Aggregate weighted by visual similarity
        for category, prompts in dest_prompts.items():
            if isinstance(prompts, list):
                for prompt_data in prompts:
                    if isinstance(prompt_data, dict) and 'text' in prompt_data:
                        prompt_text = prompt_data['text']
                        weighted_score = prompt_data.get('weighted_score',
                                                        prompt_data.get('avg_score', 0.5))

                        # Weight by: visual similarity * prompt's weighted score
                        weight = similarity * weighted_score
                        semantic_profile[category][prompt_text] += weight

    # Normalize each category
    normalized_profile = {}
    for category, prompts in semantic_profile.items():
        total = sum(prompts.values())
        if total > 0:
            normalized_profile[category] = {
                prompt: score / total
                for prompt, score in prompts.items()
            }

    return normalized_profile

print("Semantic Profile Builder defined!")

Semantic Profile Builder defined!


In [10]:
def compute_semantic_alignment(user_semantic: Dict, dest_id: str) -> float:
    """
    Compute semantic alignment between user profile and destination prompts.
    Returns score between 0 and 1.
    """

    dest_prompts = real_data_loader.get_destination_prompts(dest_id)

    if not dest_prompts or not user_semantic:
        return 0.0

    # Compute alignment for each category
    category_alignments = []

    for category in user_semantic.keys():
        if category not in dest_prompts:
            continue

        # Get user's distribution
        user_dist = user_semantic[category]

        # Get destination's distribution
        dest_dist = {}
        if isinstance(dest_prompts[category], list):
            for p in dest_prompts[category]:
                if isinstance(p, dict) and 'text' in p:
                    dest_dist[p['text']] = p.get('weighted_score', p.get('avg_score', 0.5))

        # Compute overlap
        overlap = 0.0
        total_weight = sum(user_dist.values())

        for prompt_text, user_score in user_dist.items():
            if prompt_text in dest_dist:
                overlap += min(user_score, dest_dist[prompt_text])

        if total_weight > 0:
            category_alignments.append(overlap / total_weight)

    if category_alignments:
        return float(np.mean(category_alignments))
    else:
        return 0.0

print("Semantic Alignment function defined!")

Semantic Alignment function defined!
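`compute_semantic_alignment` is a per-category overlap score: for each category both sides share, it sums `min(user_share, destination_score)` over common prompts, divides by the user's total (1 after normalization), and averages across those categories. A stdlib-only sketch on hypothetical distributions, with plain dicts standing in for the destination's prompt lists:

```python
from statistics import mean
from typing import Dict

Profile = Dict[str, Dict[str, float]]


def semantic_alignment(user: Profile, dest: Profile) -> float:
    """Mean per-category overlap between a user profile and destination scores."""
    per_category = []
    for category, user_dist in user.items():
        if category not in dest:
            continue  # categories the destination lacks are skipped, not scored 0
        total = sum(user_dist.values())
        if total <= 0:
            continue
        overlap = sum(min(u, dest[category][text])
                      for text, u in user_dist.items() if text in dest[category])
        per_category.append(overlap / total)
    return mean(per_category) if per_category else 0.0


user = {"activities": {"trekking": 0.75, "temple visit": 0.25},
        "atmosphere": {"serene": 1.0}}
dest = {"activities": {"trekking": 0.5, "boating": 0.9}}
print(semantic_alignment(user, dest))  # 0.5
```

Because the destination side holds raw `weighted_score` values rather than a normalized distribution, `min()` often just returns the user's share; normalizing both sides first would turn this into a standard histogram intersection, if that is the intent.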


In [11]:
def analyze_images(files_dict: Dict[str, bytes]) -> Dict[str, Any]:
    print(f"\nAnalyzing {len(files_dict)} images...")

    # Enforce the upload limits defined in Config (1-5 images by default)
    if not (Config.MIN_IMAGES <= len(files_dict) <= Config.MAX_IMAGES):
        return {'status': 'error',
                'error': f'Upload between {Config.MIN_IMAGES} and {Config.MAX_IMAGES} images'}

    validator = ImageValidator()
    valid_images = []

    for filename, image_bytes in files_dict.items():
        result = validator.validate_image(image_bytes, filename)
        if result['valid']:
            valid_images.append({'filename': filename, 'bytes': image_bytes})
            print(f"  Valid: {filename}")
        else:
            print(f"  Skipped: {filename} ({result['error']})")

    if len(valid_images) == 0:
        return {'status': 'error', 'error': 'No valid images'}

    embeddings = []
    semantic_profiles = []

    for img_data in valid_images:
        try:
            # Extract the normalized CLIP visual embedding
            embedding = image_analyzer.extract_embedding(img_data['bytes'])

            # Build the semantic profile from visually similar destinations
            semantic_profile = build_semantic_profile(embedding)

            # Append only after both steps succeed so the two lists stay index-aligned
            embeddings.append(embedding)
            semantic_profiles.append(semantic_profile)

            print(f"  Semantic categories: {len(semantic_profile)}")

        except Exception as e:
            print(f"  Error processing {img_data['filename']}: {str(e)}")
            continue

    if len(embeddings) == 0:
        return {'status': 'error', 'error': 'Processing failed'}

    return {
        'status': 'success',
        'num_processed': len(embeddings),
        'embeddings': embeddings,
        'semantic_profiles': semantic_profiles
    }

print("Enhanced analyze_images defined!")

Enhanced analyze_images defined!


# Destination Matching (Dual Scoring)

**Visual Similarity Computation**: Calculates cosine similarity between user image embeddings and all destination embeddings in the database using normalized dot product operations.

**Semantic Alignment Scoring**: Computes semantic alignment by comparing the user's preference profile (built during image analysis) with each destination's aggregated textual prompts across multiple categories (e.g., theme, activities, atmosphere).

**Dual Score Combination**: Combines visual similarity (60% weight) and semantic alignment (40% weight) into a unified matching score for each user image-destination pair.

**Top-K Selection:** For each uploaded image, selects the top-10 destinations by combined dual score, then drops any whose combined score falls below the minimum threshold (`MIN_SIMILARITY_SCORE = 0.20`).

**Per-Image Results:** Returns structured matches for each image with destination metadata (ID, name, state, theme) and three separate scores (visual, semantic, combined) for transparency and debugging.
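The matching steps above can be sketched for a single image in pure Python: cosine similarity as the dot product of L2-normalized vectors, the 0.6/0.4 weighted combination, then top-K with the combined-score threshold. The toy 3-dimensional vectors and fixed semantic scores are hypothetical stand-ins for 512-dimensional CLIP embeddings and `compute_semantic_alignment`.

```python
import math
from typing import Dict, List

VISUAL_WEIGHT, SEMANTIC_WEIGHT = 0.6, 0.4
TOP_K, MIN_SCORE = 10, 0.20


def normalize(v: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dual_score(visual: float, semantic: float) -> float:
    return VISUAL_WEIGHT * visual + SEMANTIC_WEIGHT * semantic


def match_one_image(user_vec: List[float], dest_vecs: Dict[str, List[float]],
                    semantic: Dict[str, float]) -> List[dict]:
    u = normalize(user_vec)
    scored = []
    for dest_id, vec in dest_vecs.items():
        # Cosine similarity equals the dot product of unit vectors
        visual = sum(a * b for a, b in zip(u, normalize(vec)))
        sem = semantic.get(dest_id, 0.0)
        scored.append({"dest_id": dest_id, "visual": visual,
                       "semantic": sem, "combined": dual_score(visual, sem)})
    scored.sort(key=lambda d: d["combined"], reverse=True)
    # Threshold is applied to the combined score, after taking the top K
    return [d for d in scored[:TOP_K] if d["combined"] >= MIN_SCORE]


dests = {"valley": [1.0, 0.1, 0.0], "temple": [0.0, 1.0, 0.2], "beach": [-1.0, 0.0, 0.0]}
matches = match_one_image([0.9, 0.2, 0.0], dests, {"valley": 0.12, "temple": 0.38})
print([m["dest_id"] for m in matches])  # ['valley', 'temple']
print(round(dual_score(0.830, 0.301), 3))  # 0.618, matching the Image 2 line in the server log
```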

In [12]:
def match_destinations(user_embeddings: np.ndarray, semantic_profiles: List[Dict]) -> Dict[str, Any]:
    """
    Match using DUAL SCORING: Visual (60%) + Semantic (40%)
    """

    print(f"\nMatching with DUAL SCORING to {len(real_data_loader.destination_ids)} destinations...")

    per_image_matches = []
    all_matched = set()

    for img_idx, (user_emb, user_semantic) in enumerate(zip(user_embeddings, semantic_profiles)):
        # Normalize
        user_emb = user_emb / np.linalg.norm(user_emb)

        # Compute visual similarities
        visual_similarities = np.dot(real_data_loader.destination_embeddings, user_emb)

        # Compute combined scores
        combined_scores = []
        for idx in range(len(real_data_loader.destination_ids)):
            dest_id = str(real_data_loader.destination_ids[idx])

            visual_score = float(visual_similarities[idx])
            semantic_score = compute_semantic_alignment(user_semantic, dest_id)

            # DUAL SCORE: Weighted combination
            combined_score = (
                Config.VISUAL_WEIGHT * visual_score +
                Config.SEMANTIC_WEIGHT * semantic_score
            )

            combined_scores.append({
                'idx': idx,
                'dest_id': dest_id,
                'visual': visual_score,
                'semantic': semantic_score,
                'combined': combined_score
            })

        # Sort by combined score
        combined_scores.sort(key=lambda x: x['combined'], reverse=True)

        # Get top-K
        matches = []
        for rank, item in enumerate(combined_scores[:Config.TOP_K_PER_IMAGE], 1):
            if item['combined'] < Config.MIN_SIMILARITY_SCORE:
                continue

            dest_info = real_data_loader.get_destination_info(item['dest_id'])

            matches.append({
                'destination_id': item['dest_id'],
                'destination_name': dest_info['destination_name'],
                'visual_score': item['visual'],
                'semantic_score': item['semantic'],
                'similarity_score': item['combined'],  # For compatibility
                'theme': dest_info['theme'],
                'state': dest_info['state']
            })

            all_matched.add(item['dest_id'])

        per_image_matches.append({
            'image_index': img_idx,
            'top_destinations': matches
        })

        if matches:
            print(f"  Image {img_idx + 1}: {len(matches)} matches")
            print(f"    Top: {matches[0]['destination_name']}")
            print(f"      Visual: {matches[0]['visual_score']:.3f}, Semantic: {matches[0]['semantic_score']:.3f}, Combined: {matches[0]['similarity_score']:.3f}")

    return {
        'per_image_matches': per_image_matches,
        'total_unique_destinations': len(all_matched)
    }

print("Enhanced match_destinations with DUAL SCORING defined!")

Enhanced match_destinations with DUAL SCORING defined!


# Aggregation & Ranking

**Cross-Image Aggregation**:
Collects all destination matches across multiple user images, grouping by destination ID and tracking visual scores, semantic scores, combined scores, and appearance frequency.

**Statistical Metrics Calculation**: Computes average visual score, average semantic score, average combined score, and consistency score (1 - standard deviation) for each destination across all images where it appeared.

**Multi-Factor Ranking Score**: Generates final ranking using weighted combination of four factors: average visual similarity (25%), average semantic alignment (25%), appearance frequency (25%), and score consistency (25%).

**Theme & Metadata Extraction**: Identifies the most common theme for each destination across its matches, looks up its name and state in the metadata, and adds empty geo-location fields as placeholders for future integration.

**Top-10 Selection:** Sorts all matched destinations by final ranking score in descending order, assigns rank positions (1-10), and returns top-10 personalized recommendations with detailed scoring breakdown for user transparency.
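The ranking formula can be checked by hand with a stdlib-only sketch. It mirrors `aggregate_and_rank`: population standard deviation (what `np.std` computes by default) for consistency, frequency normalized by the most frequent destination, and four equal weights. The scores below are hypothetical.

```python
from statistics import mean, pstdev
from typing import List

W_VISUAL = W_SEMANTIC = W_FREQ = W_CONSISTENCY = 0.25


def rank_score(visual: List[float], semantic: List[float],
               combined: List[float], max_freq: int) -> float:
    """Final ranking score for one destination, on a 0-1 scale."""
    consistency = 1.0 - min(pstdev(combined), 1.0) if len(combined) > 1 else 1.0
    freq_norm = len(combined) / max_freq
    return (W_VISUAL * mean(visual) + W_SEMANTIC * mean(semantic)
            + W_FREQ * freq_norm + W_CONSISTENCY * consistency)


# Destination seen in 2 of 5 images vs. one seen in all 5 (max_freq = 5)
rare = rank_score([0.90, 0.90], [0.10, 0.10], [0.58, 0.58], max_freq=5)
common = rank_score([0.70] * 5, [0.30] * 5, [0.54] * 5, max_freq=5)
print(round(rare * 100, 2), round(common * 100, 2))  # 60.0 75.0
```

With equal weights, frequency and consistency together carry half the score, so a destination that appears in every image can outrank one with a higher visual score; note also that a destination seen only once gets a consistency of 1.0 by default.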

In [13]:
def aggregate_and_rank(per_image_matches: List[Dict]) -> List[Dict]:
    print("\nAggregating and ranking...")

    try:
        dest_data = defaultdict(lambda: {
            'visual_scores': [],
            'semantic_scores': [],
            'combined_scores': [],
            'frequency': 0,
            'themes': [],
            'names': [],
            'states': []
        })

        for img_match in per_image_matches:
            for dest in img_match['top_destinations']:
                dest_id = dest['destination_id']
                dest_data[dest_id]['visual_scores'].append(dest['visual_score'])
                dest_data[dest_id]['semantic_scores'].append(dest['semantic_score'])
                dest_data[dest_id]['combined_scores'].append(dest['similarity_score'])
                dest_data[dest_id]['frequency'] += 1
                dest_data[dest_id]['themes'].append(dest.get('theme', 'Unknown'))
                dest_data[dest_id]['names'].append(dest.get('destination_name', 'Unknown'))
                dest_data[dest_id]['states'].append(dest.get('state', 'India'))

        if not dest_data:
            return []

        ranked = []
        max_freq = max(data['frequency'] for data in dest_data.values())

        for dest_id, data in dest_data.items():
            try:
                avg_visual = float(np.mean(data['visual_scores']))
                avg_semantic = float(np.mean(data['semantic_scores']))
                avg_combined = float(np.mean(data['combined_scores']))

                if len(data['combined_scores']) > 1:
                    consistency = 1.0 - min(float(np.std(data['combined_scores'])), 1.0)
                else:
                    consistency = 1.0

                freq_norm = float(data['frequency']) / max_freq

                # Enhanced ranking with semantic scores
                rank_score = (
                    Config.AVG_VISUAL_WEIGHT * avg_visual +
                    Config.AVG_SEMANTIC_WEIGHT * avg_semantic +
                    Config.FREQUENCY_WEIGHT * freq_norm +
                    Config.CONSISTENCY_WEIGHT * consistency
                )

                dest_info = real_data_loader.get_destination_info(dest_id)
                theme_counts = Counter(data['themes'])
                most_common_theme = theme_counts.most_common(1)[0]

                ranked.append({
                    'rank': 0,
                    'destination_id': dest_id,
                    'destination_name': dest_info.get('destination_name', data['names'][0]),
                    'state': dest_info.get('state', data['states'][0]),
                    'similarity_score': round(avg_combined * 100, 2),
                    'visual_score': round(avg_visual * 100, 2),
                    'semantic_score': round(avg_semantic * 100, 2),
                    'appearances': data['frequency'],
                    'final_score': round(rank_score * 100, 2),
                    'theme': most_common_theme[0],
                    'characteristics': [],
                    'geo_location': {'latitude': None, 'longitude': None},
                    'images': [],
                    'image_count': 0,
                    'offers': {'hotels': [], 'activities': []}
                })

            except Exception as e:
                print(f"  Error processing {dest_id}: {str(e)}")
                continue

        ranked.sort(key=lambda x: x['final_score'], reverse=True)

        for i, dest in enumerate(ranked, 1):
            dest['rank'] = i

        print(f"  Ranked {len(ranked)} destinations")
        return ranked[:10]

    except Exception as e:
        print(f"  ERROR: {str(e)}")
        print(traceback.format_exc())
        return []

print("Enhanced aggregate_and_rank defined!")

Enhanced aggregate_and_rank defined!


In [16]:
import base64
from typing import Optional

def convert_image_to_base64(image_path: str) -> Optional[str]:
    """Convert an image file to a JPEG data URI for frontend display, or None on failure."""
    try:
        if not os.path.exists(image_path):
            return None

        # The context manager closes the file handle once the thumbnail is encoded
        with Image.open(image_path) as img:
            img.thumbnail((800, 600), Image.Resampling.LANCZOS)

            if img.mode != 'RGB':
                img = img.convert('RGB')

            buffer = io.BytesIO()
            img.save(buffer, format='JPEG', quality=85)

        img_base64 = base64.b64encode(buffer.getvalue()).decode('utf-8')
        return f"data:image/jpeg;base64,{img_base64}"

    except Exception as e:
        print(f"  Error converting image: {str(e)}")
        return None


def enrich_with_images(ranked: List[Dict], base_landmarks_path: str, max_images: int = 3) -> List[Dict]:
    """Add actual images to recommendations."""

    print("\nEnriching with images...")

    for dest in ranked:
        dest_id = dest['destination_id']

        # Get destination info from metadata
        if dest_id in real_data_loader.destination_lookup:
            lookup = real_data_loader.destination_lookup[dest_id]

            # Build image folder path
            # Format: landmarks/theme/state/destination_folder/
            theme = lookup['theme'].lower().replace(' ', '_')
            state = lookup['state'].lower().replace(' ', '-')

            # Parse folder name from dest_id
            parts = dest_id.split('_')
            folder = '_'.join(parts[2:]) if len(parts) >= 3 else dest_id

            image_folder = os.path.join(
                base_landmarks_path,
                theme,
                state,
                folder
            )

            # Load images
            images = []
            if os.path.exists(image_folder):
                image_files = sorted([
                    f for f in os.listdir(image_folder)
                    if f.lower().endswith(('.jpg', '.jpeg', '.png'))
                ])[:max_images]

                for img_file in image_files:
                    img_path = os.path.join(image_folder, img_file)
                    base64_img = convert_image_to_base64(img_path)
                    if base64_img:
                        images.append(base64_img)

            dest['images'] = images
            dest['image_count'] = len(images)

            if images:
                print(f"  {dest['destination_name']}: {len(images)} images")

    return ranked

print("Image enrichment functions defined")

Image enrichment functions defined
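The folder convention that `enrich_with_images` relies on, and the data-URI format returned by `convert_image_to_base64`, can be sketched without Pillow. The layout follows the code's own `landmarks/theme/state/destination_folder/` comment; the sample IDs are hypothetical, the raw bytes stand in for a re-encoded JPEG thumbnail, and `posixpath` is used instead of `os.path` only so the result is the same on every OS.

```python
import base64
import posixpath


def image_folder(landmarks_root: str, theme: str, state: str, dest_id: str) -> str:
    """landmarks/<theme, spaces -> '_'>/<state, spaces -> '-'>/<dest_id minus theme and state>."""
    parts = dest_id.split("_")
    folder = "_".join(parts[2:]) if len(parts) >= 3 else dest_id
    return posixpath.join(landmarks_root,
                          theme.lower().replace(" ", "_"),
                          state.lower().replace(" ", "-"),
                          folder)


def to_data_uri(jpeg_bytes: bytes) -> str:
    """Embed JPEG bytes in a data URI usable directly as an <img> src."""
    return "data:image/jpeg;base64," + base64.b64encode(jpeg_bytes).decode("ascii")


print(image_folder("/data/landmarks", "Temple", "Tamil Nadu",
                   "temple_tamil-nadu_meenakshi_temple"))
# /data/landmarks/temple/tamil-nadu/meenakshi_temple
print(to_data_uri(b"\xff\xd8\xff"))  # data:image/jpeg;base64,/9j/
```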


In [17]:
# CELL 13: FastAPI App

app = FastAPI(title="VLM Pipeline - Dual Score")

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"]
)

@app.get("/")
async def root():
    return HTMLResponse(content="<h1>VLM Pipeline - Dual Score (Visual + Semantic)</h1>")

@app.post("/api/recommend-destinations")
async def recommend(files: List[UploadFile] = File(...)):
    try:
        print("\n" + "="*80)
        print("DUAL-SCORE PIPELINE STARTED")
        print("="*80)

        files_dict = {}
        for file in files:
            files_dict[file.filename] = await file.read()

        step1 = analyze_images(files_dict)
        if step1['status'] == 'error':
            return JSONResponse(status_code=400, content={"status": "error", "error": step1['error']})

        user_embeddings = np.array(step1['embeddings'])
        semantic_profiles = step1['semantic_profiles']

        step2 = match_destinations(user_embeddings, semantic_profiles)

        ranked = aggregate_and_rank(step2['per_image_matches'])

        if not ranked:
            return JSONResponse(status_code=500, content={"status": "error", "error": "No destinations ranked"})

        all_themes = []
        for img_match in step2['per_image_matches']:
            for dest in img_match['top_destinations']:
                all_themes.append(dest.get('theme', 'Unknown'))

        theme_counts = Counter(all_themes)
        dominant_theme = theme_counts.most_common(1)[0][0] if theme_counts else "Unknown"
        theme_confidence = theme_counts.most_common(1)[0][1] / len(all_themes) if all_themes else 0.0
        LANDMARKS_PATH = os.path.join(DATA_PATH, 'landmarks')
        ranked = enrich_with_images(ranked, LANDMARKS_PATH, max_images=3)
        response = {
            "status": "success",
            "pipeline_summary": {
                "images_processed": step1['num_processed'],
                "destinations_matched": step2['total_unique_destinations'],
                "dominant_theme": dominant_theme,
                "theme_confidence": round(theme_confidence * 100, 2),
                "dual_scoring_enabled": True
            },
            "user_profile": {
                "dominant_theme": dominant_theme,
                "theme_confidence": round(theme_confidence * 100, 2)
            },
            "recommendations": ranked
        }

        print("\n" + "="*80)
        print("DUAL-SCORE PIPELINE COMPLETE")
        print("="*80)

        return JSONResponse(content=response)

    except Exception as e:
        print(f"\nERROR: {str(e)}")
        print(traceback.format_exc())
        return JSONResponse(status_code=500, content={"status": "error", "error": str(e)})

print("FastAPI app configured!")

FastAPI app configured!
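A client only needs to send a `multipart/form-data` POST whose file parts are all named `files`, matching the `files: List[UploadFile]` parameter of the endpoint. The sketch below uses only the standard library; `build_multipart` and `recommend` are hypothetical helpers, and the URL is whatever `start_server` prints (the ngrok address changes between sessions).

```python
import json
import urllib.request
import uuid
from typing import Dict, Optional, Tuple


def build_multipart(files: Dict[str, bytes],
                    boundary: Optional[str] = None) -> Tuple[bytes, str]:
    """Encode {filename: bytes} as multipart parts that all use the field name 'files'."""
    boundary = boundary or uuid.uuid4().hex
    body = b""
    for filename, data in files.items():
        body += (f"--{boundary}\r\n"
                 f'Content-Disposition: form-data; name="files"; filename="{filename}"\r\n'
                 "Content-Type: application/octet-stream\r\n\r\n").encode() + data + b"\r\n"
    body += f"--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"


def recommend(url: str, files: Dict[str, bytes]) -> dict:
    """POST the images and return the parsed JSON response."""
    body, content_type = build_multipart(files)
    req = urllib.request.Request(url, data=body, method="POST",
                                 headers={"Content-Type": content_type})
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())


# Usage (needs the server running):
# with open("yumthang_valley_001.jpg", "rb") as f:
#     result = recommend("https://<your-ngrok-host>/api/recommend-destinations",
#                        {"yumthang_valley_001.jpg": f.read()})
# print(result["pipeline_summary"]["dominant_theme"])
```

`urlopen` raises `urllib.error.HTTPError` for the 400 and 500 responses the endpoint returns on failure; the JSON error body can still be read from the exception with `read()`.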


In [18]:
# CELL 14: Initialize and Start

def initialize_pipeline(data_base_path: str):
    global real_data_loader, image_analyzer

    print("\n" + "="*80)
    print("INITIALIZING DUAL-SCORE PIPELINE")
    print("="*80)

    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Device: {device}")

    model = CLIPModel.from_pretrained(Config.CLIP_MODEL_NAME)
    processor = CLIPProcessor.from_pretrained(Config.CLIP_MODEL_NAME)
    model.to(device)
    model.eval()
    print("CLIP model loaded")

    real_data_loader = RealDataLoader(data_base_path)
    real_data_loader.load_all()

    image_analyzer = ImageAnalyzer(model, processor, device)

    print("="*80)
    print("DUAL-SCORE PIPELINE READY")
    print("="*80)

DATA_PATH = None
def start_server(data_base_path: str, ngrok_token: str):
    global DATA_PATH
    DATA_PATH = data_base_path

    initialize_pipeline(data_base_path)

    ngrok.set_auth_token(ngrok_token)
    ngrok_tunnel = ngrok.connect(8000)
    public_url = ngrok_tunnel.public_url

    print("\n" + "="*80)
    print("SERVER RUNNING")
    print("="*80)
    print(f"URL: {public_url}/api/recommend-destinations")
    print("="*80)

    def run_uvicorn():
        uvicorn.run(app, host="0.0.0.0", port=8000, log_level="info")

    thread = Thread(target=run_uvicorn, daemon=True)
    thread.start()

    try:
        thread.join()
    except KeyboardInterrupt:
        print("\nServer stopped")

print("Functions ready!")

Functions ready!


In [None]:
# CELL 15: START SERVER

DATA_PATH = '/content/drive/MyDrive/visual-intelligence-travel-finance/data'
# Read the ngrok authtoken from Colab secrets instead of hard-coding it in the notebook
from google.colab import userdata
NGROK_TOKEN = userdata.get('NGROK_TOKEN')

start_server(DATA_PATH, NGROK_TOKEN)


INITIALIZING DUAL-SCORE PIPELINE
Device: cpu


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



CLIPModel LOAD REPORT from: openai/clip-vit-base-patch32
Key                                  | Status     |  | 
-------------------------------------+------------+--+-
vision_model.embeddings.position_ids | UNEXPECTED |  | 
text_model.embeddings.position_ids   | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.



The image processor of type `CLIPImageProcessor` is now loaded as a fast processor by default, even if the model checkpoint was saved with a slow processor. This is a breaking change and may produce slightly different outputs. To continue using the slow processor, instantiate this class with `use_fast=False`. 



CLIP model loaded

Loading VL Encoding Database...
  Loaded 168 embeddings
  Loaded 168 prompts
  Built lookup for 168 destinations
Database loaded

DUAL-SCORE PIPELINE READY


INFO:     Started server process [140]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)



SERVER RUNNING
URL: https://dextrously-nonaddicting-alaya.ngrok-free.dev/api/recommend-destinations

DUAL-SCORE PIPELINE STARTED

Analyzing 5 images...
  Valid: tirupati_balaji_temple_001.jpg
  Valid: tirupati_balaji_temple_005.jpg
  Valid: yumthang_valley_001.jpg
  Valid: yumthang_valley_002.jpg
  Valid: yumthang_valley_003.jpg
  Semantic categories: 11
  Semantic categories: 11
  Semantic categories: 11
  Semantic categories: 11
  Semantic categories: 11

Matching with DUAL SCORING to 168 destinations...
  Image 1: 10 matches
    Top: Ramanathaswamy Temple Rameswaram
      Visual: 0.785, Semantic: 0.381, Combined: 0.624
  Image 2: 10 matches
    Top: Ranganathaswamy Temple Srirangam
      Visual: 0.830, Semantic: 0.301, Combined: 0.618
  Image 3: 10 matches
    Top: Yumthang Valley
      Visual: 0.919, Semantic: 0.124, Combined: 0.601
  Image 4: 10 matches
    Top: Yumthang Valley
      Visual: 0.883, Semantic: 0.125, Combined: 0.580
  Image 5: 10 matches
    Top: Auli
      Visual: