# AlphaEarth Semantic Search System for Coastal Analysis

This notebook demonstrates a complete pipeline for using AlphaEarth embeddings to create a semantic search system for coastal documents. The system consists of three main components:

1. **Offline Indexing Pipeline**: Generate embeddings for coastal documents and store them in a vector database
2. **Real-Time Query Pipeline**: Process user queries and perform similarity search
3. **LLM Augmentation**: Synthesize responses using retrieved similar coastal data

## System Architecture Overview

1. **Offline Indexing Pipeline:**
   - Each document in your database has a (document_id, location, date) record
   - Use the location (and optionally the date) to generate a semantic embedding with the AlphaEarth model
   - Store this embedding in a vector database, creating a mapping: embedding -> document_id

2. **Real-Time Query Pipeline:**
   - User submits a query, which you geocode to get a query_location
   - Generate a new AlphaEarth embedding for this query_location on the fly
   - Use this embedding to perform similarity search against your index
   - Return the document_ids of the top 'k' most similar coastlines

3. **LLM Augmentation & Response:**
   - Retrieve the full text of those top 'k' documents
   - Pass this context-rich information to your LLM along with the original user query
   - LLM synthesizes a response grounded in data from functionally similar coastlines
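
The three stages above can be sketched with toy data before any of the real components exist. This is only an illustration under stated assumptions: the 3-dimensional vectors stand in for AlphaEarth embeddings, and `normalize`, `top_k`, `toy_index`, and `toy_texts` are hypothetical names that do not appear elsewhere in this notebook.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def top_k(index, query_vec, k):
    """Return the k (document_id, score) pairs with the highest dot product."""
    q = normalize(query_vec)
    scored = [(doc_id, sum(a * b for a, b in zip(q, vec))) for doc_id, vec in index]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Stage 1 (offline): store one unit-length vector per document_id.
toy_index = [
    ("doc_a", normalize([1.0, 0.0, 0.0])),
    ("doc_b", normalize([0.9, 0.1, 0.0])),
    ("doc_c", normalize([0.0, 0.0, 1.0])),
]
toy_texts = {"doc_a": "Rocky bay.", "doc_b": "Urban harbor.", "doc_c": "Sandy island."}

# Stage 2 (query time): embed the query location, then search the index.
hits = top_k(toy_index, [1.0, 0.05, 0.0], k=2)

# Stage 3: collect the retrieved text that would be passed to the LLM as context.
context = [toy_texts[doc_id] for doc_id, _ in hits]
```

The index only ever returns document IDs and scores; the full text is looked up afterwards, which mirrors the separation between the vector database and the document store described above.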

## 1. Environment Setup and Dependencies

In [1]:
# Import necessary libraries
import os
import numpy as np
import pandas as pd
import json
from datetime import datetime, timedelta
import matplotlib.pyplot as plt
import seaborn as sns
from typing import List, Dict, Tuple, Optional
import warnings
warnings.filterwarnings('ignore')

# For geocoding and location handling
from geopy.geocoders import Nominatim
from geopy.distance import geodesic

# For vector similarity search (simulating Vertex AI Vector Search)
from sklearn.metrics.pairwise import cosine_similarity
import faiss

# For LLM simulation (we'll use a mock LLM for demonstration)
import random

print("Environment setup complete!")

Environment setup complete!


## 2. Sample Coastal Documents Database

In [2]:
# Create sample coastal documents database
# Each document contains: document_id, location (lat, lon), date, and detailed description

coastal_documents = [
    {
        'document_id': 'doc_001',
        'location': (37.7749, -122.4194),  # San Francisco Bay
        'location_name': 'San Francisco Bay, California',
        'date': '2024-06-15',
        'description': """Rocky coastline with steep cliffs and limited sandy beaches. 
        Heavy urban development with ports and marinas. Strong tidal influences and 
        frequent fog. Vulnerable to sea level rise due to low-lying areas. 
        Complex ecosystem with salt marshes and mudflats.""",
        'characteristics': ['rocky', 'urban', 'tidal', 'cliffs', 'protected_bay']
    },
    {
        'document_id': 'doc_002', 
        'location': (25.7617, -80.1918),  # Miami Beach
        'location_name': 'Miami Beach, Florida',
        'date': '2024-07-20',
        'description': """Sandy barrier island with extensive beach nourishment projects. 
        High-rise development directly on coastline. Vulnerable to hurricanes and storm surge. 
        Coral reefs offshore provide some natural protection. Active erosion management 
        with seawalls and breakwaters.""",
        'characteristics': ['sandy', 'urban', 'barrier_island', 'hurricane_prone', 'engineered']
    },
    {
        'document_id': 'doc_003',
        'location': (64.1466, -21.9426),  # Reykjavik, Iceland  
        'location_name': 'Reykjavik Coast, Iceland',
        'date': '2024-05-10',
        'description': """Volcanic coastline with black sand beaches and basaltic rock formations.
        Extreme tidal range and harsh weather conditions. Limited vegetation due to climate.
        Geothermally active area with unique geological features. Low population density 
        with minimal coastal development.""",
        'characteristics': ['volcanic', 'harsh_climate', 'minimal_development', 'extreme_tides']
    },
    {
        'document_id': 'doc_004',
        'location': (-33.8688, 151.2093),  # Sydney, Australia
        'location_name': 'Sydney Harbour, Australia', 
        'date': '2024-08-03',
        'description': """Large natural harbor with numerous inlets and bays. Sandstone cliffs
        and headlands with pocket beaches. Significant urban development around harbor edges.
        Complex bathymetry with deep channels. Important recreational and commercial harbor
        with ferry systems.""",
        'characteristics': ['natural_harbor', 'sandstone', 'urban', 'deep_water', 'recreational']
    },
    {
        'document_id': 'doc_005',
        'location': (53.3498, -6.2603),  # Dublin Bay, Ireland
        'location_name': 'Dublin Bay, Ireland',
        'date': '2024-04-18',
        'description': """Shallow bay with extensive mudflats and salt marshes. Important
        bird habitat and nature reserves. Moderate tidal range with regular flooding of
        low-lying areas. Mixed development from urban to agricultural. Historical
        land reclamation projects have altered natural coastline.""",
        'characteristics': ['shallow_bay', 'mudflats', 'wildlife_habitat', 'reclaimed_land']
    },
    {
        'document_id': 'doc_006',
        'location': (35.6762, 139.6503),  # Tokyo Bay
        'location_name': 'Tokyo Bay, Japan',
        'date': '2024-09-12',
        'description': """Heavily industrialized bay with extensive port facilities and
        artificial islands. High seawall protection against tsunamis. Dense urban development
        with land reclamation. Complex shipping channels and maritime traffic. Vulnerable
        to earthquakes and associated coastal hazards.""",
        'characteristics': ['industrial', 'artificial_islands', 'seawalls', 'earthquake_prone', 'dense_urban']
    },
    {
        'document_id': 'doc_007',
        'location': (41.9028, 12.4964),  # Rome coastal area
        'location_name': 'Ostia Coast, Italy',
        'date': '2024-07-30',
        'description': """Mediterranean coastline with sandy beaches and dune systems.
        Ancient port ruins and archaeological significance. Tourism-focused development
        with seasonal population fluctuations. Pine forests behind coastal dunes.
        Moderate wave action and stable climate conditions.""",
        'characteristics': ['mediterranean', 'sandy', 'tourism', 'archaeological', 'stable_climate']
    },
    {
        'document_id': 'doc_008',
        'location': (59.9139, 10.7522),  # Oslo Fjord
        'location_name': 'Oslo Fjord, Norway', 
        'date': '2024-06-25',
        'description': """Deep fjord system with steep rocky shores and scattered islands.
        Cold water temperatures and seasonal ice formation. Extensive recreational boating
        and summer cabin culture. Pristine water quality in outer areas. Traditional
        fishing communities on outer islands.""",
        'characteristics': ['fjord', 'cold_water', 'recreational', 'pristine', 'traditional_communities']
    }
]

print(f"Created database with {len(coastal_documents)} coastal documents")
for doc in coastal_documents:
    print(f"- {doc['document_id']}: {doc['location_name']}")

Created database with 8 coastal documents
- doc_001: San Francisco Bay, California
- doc_002: Miami Beach, Florida
- doc_003: Reykjavik Coast, Iceland
- doc_004: Sydney Harbour, Australia
- doc_005: Dublin Bay, Ireland
- doc_006: Tokyo Bay, Japan
- doc_007: Ostia Coast, Italy
- doc_008: Oslo Fjord, Norway


## 3. AlphaEarth Model Simulation and Offline Indexing Pipeline

In [3]:
class AlphaEarthSimulator:
    """
    Simulates the AlphaEarth model for generating semantic embeddings of coastal locations.
    In reality, this would interface with the actual AlphaEarth model API.
    """
    
    def __init__(self, embedding_dim=512):
        self.embedding_dim = embedding_dim
        # In reality, this would load the actual AlphaEarth model
        np.random.seed(42)  # For reproducible results
        
    def generate_embedding(self, location, date=None, characteristics=None):
        """
        Generate a semantic embedding for a coastal location.
        
        Args:
            location: Tuple of (latitude, longitude)
            date: Date string (optional, could be used for temporal analysis)
            characteristics: List of coastal characteristics (optional)
            
        Returns:
            numpy array representing the semantic embedding
        """
        lat, lon = location
        
        # Simulate generating embeddings based on location and characteristics
        # In reality, this would use satellite imagery and the AlphaEarth model
        
        # Random noise component. It is drawn from NumPy's global RNG and does not
        # depend on the location, so repeated calls for the same location differ.
        base_embedding = np.random.normal(0, 0.1, self.embedding_dim)
        
        # Add geographic clustering - nearby locations should have similar embeddings
        geo_factor = np.array([
            np.sin(np.radians(lat)) * 0.5,
            np.cos(np.radians(lat)) * 0.5, 
            np.sin(np.radians(lon)) * 0.3,
            np.cos(np.radians(lon)) * 0.3
        ])
        
        # Extend geo_factor to match embedding dimension
        geo_embedding = np.tile(geo_factor, self.embedding_dim // 4 + 1)[:self.embedding_dim]
        
        # Add characteristic-based features
        char_embedding = np.zeros(self.embedding_dim)
        if characteristics:
            # Simulate how characteristics influence the embedding
            char_map = {
                'rocky': 0.8, 'sandy': -0.8, 'urban': 0.6, 'natural': -0.6,
                'tidal': 0.4, 'protected_bay': 0.3, 'hurricane_prone': 0.7,
                'volcanic': 0.9, 'harsh_climate': 0.5, 'industrial': 0.8,
                'mediterranean': -0.3, 'fjord': 0.7, 'barrier_island': 0.4
            }
            
            for i, char in enumerate(characteristics[:10]):  # Limit to first 10 characteristics
                if char in char_map:
                    start_idx = i * (self.embedding_dim // 10)
                    end_idx = (i + 1) * (self.embedding_dim // 10)
                    char_embedding[start_idx:end_idx] += char_map[char]
        
        # Combine all components
        final_embedding = base_embedding + 0.3 * geo_embedding + 0.2 * char_embedding
        
        # Normalize the embedding
        norm = np.linalg.norm(final_embedding)
        if norm > 0:
            final_embedding = final_embedding / norm
            
        return final_embedding

# Initialize the AlphaEarth simulator
alpha_earth = AlphaEarthSimulator()
print("AlphaEarth simulator initialized")

AlphaEarth simulator initialized
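
One property of the simulator worth keeping in mind: `np.random.seed(42)` runs once in `__init__`, and every later call to `generate_embedding` advances the global random state, so embedding the same location twice gives two different vectors. If repeatable per-location noise is wanted in the simulation, one possible approach is to derive the seed from the inputs. The sketch below is an assumption, not part of the notebook's simulator; `location_seed` and `stable_noise` are hypothetical helpers built only on the standard library.

```python
import hashlib
import random

def location_seed(location, date=None):
    """Derive a stable integer seed from a (lat, lon) pair and an optional date."""
    lat, lon = location
    key = f"{lat:.4f},{lon:.4f},{date or ''}".encode("utf-8")
    # The first 8 bytes of the SHA-256 digest give a reproducible 64-bit seed.
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")

def stable_noise(location, dim, date=None, scale=0.1):
    """Gaussian noise that is identical on every call for the same inputs."""
    rng = random.Random(location_seed(location, date))
    return [rng.gauss(0.0, scale) for _ in range(dim)]

# Same location twice: identical noise. Different location: different noise.
first = stable_noise((37.7749, -122.4194), dim=8)
second = stable_noise((37.7749, -122.4194), dim=8)
other = stable_noise((25.7617, -80.1918), dim=8)
```

Using a private `random.Random` instance also avoids touching the global random state that other cells may rely on.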


In [4]:
class VectorDatabase:
    """
    Simulates a vector database like Vertex AI Vector Search.
    Stores embeddings with document IDs and provides similarity search.
    """
    
    def __init__(self, embedding_dim=512):
        self.embedding_dim = embedding_dim
        self.embeddings = []
        self.document_ids = []
        self.metadata = []
        self.index = None
        
    def add_document(self, document_id, embedding, metadata=None):
        """Add a document embedding to the database."""
        self.document_ids.append(document_id)
        self.embeddings.append(embedding)
        self.metadata.append(metadata or {})
        
    def build_index(self):
        """Build the FAISS index for efficient similarity search."""
        if len(self.embeddings) == 0:
            raise ValueError("No embeddings to index")
            
        # Convert to numpy array
        embeddings_array = np.array(self.embeddings).astype('float32')
        
        # Create FAISS index (using cosine similarity)
        self.index = faiss.IndexFlatIP(self.embedding_dim)  # Inner product for normalized vectors
        self.index.add(embeddings_array)
        
        print(f"Built FAISS index with {len(self.embeddings)} documents")
        
    def search(self, query_embedding, k=5):
        """
        Search for similar documents.
        
        Args:
            query_embedding: Query embedding vector
            k: Number of top results to return
            
        Returns:
            List of (document_id, similarity_score, metadata) tuples
        """
        if self.index is None:
            raise ValueError("Index not built. Call build_index() first.")
            
        # Ensure query embedding is normalized
        query_embedding = query_embedding.astype('float32').reshape(1, -1)
        query_norm = np.linalg.norm(query_embedding)
        if query_norm > 0:
            query_embedding = query_embedding / query_norm
            
        # Search
        similarities, indices = self.index.search(query_embedding, k)
        
        results = []
        for similarity, idx in zip(similarities[0], indices[0]):
            # FAISS pads missing results with -1 when k exceeds the index size
            if 0 <= idx < len(self.document_ids):
                results.append((
                    self.document_ids[idx], 
                    float(similarity), 
                    self.metadata[idx]
                ))
                
        return results

# Initialize vector database
vector_db = VectorDatabase()
print("Vector database initialized")

Vector database initialized
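
The index above uses inner product, which matches cosine similarity only when both the stored vectors and the query have unit length. The following sketch, written in plain Python with made-up 2-dimensional vectors, shows how the ranking can change when normalization is skipped. The helper names `dot`, `unit`, and `cosine` are illustrative and not part of `VectorDatabase`.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    """Return v scaled to unit length."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def cosine(a, b):
    """Cosine similarity computed directly from its definition."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

query = [1.0, 1.0]
long_vec = [10.0, 0.0]   # large magnitude, 45 degrees away from the query
close_vec = [1.0, 0.9]   # small magnitude, nearly parallel to the query

# Raw inner product rewards magnitude, so the long vector ranks first.
raw_long = dot(query, long_vec)
raw_close = dot(query, close_vec)

# After normalization the inner product depends only on the angle.
unit_long = dot(unit(query), unit(long_vec))
unit_close = dot(unit(query), unit(close_vec))
```

This is why `generate_embedding` normalizes its output and `search` normalizes the query before calling the index.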


In [5]:
# Offline Indexing Pipeline: Generate embeddings for all coastal documents
print("Starting offline indexing pipeline...")
print("=" * 50)

for doc in coastal_documents:
    print(f"Processing {doc['document_id']}: {doc['location_name']}")
    
    # Generate embedding using AlphaEarth simulator
    embedding = alpha_earth.generate_embedding(
        location=doc['location'],
        date=doc['date'], 
        characteristics=doc['characteristics']
    )
    
    # Store in vector database
    vector_db.add_document(
        document_id=doc['document_id'],
        embedding=embedding,
        metadata={
            'location_name': doc['location_name'],
            'location': doc['location'],
            'date': doc['date'],
            'characteristics': doc['characteristics'],
            'description': doc['description']
        }
    )
    
    print(f"  -> Generated {len(embedding)}-dimensional embedding")

# Build the search index
print("\nBuilding vector search index...")
vector_db.build_index()
print("Offline indexing pipeline complete!")

Starting offline indexing pipeline...
Processing doc_001: San Francisco Bay, California
  -> Generated 512-dimensional embedding
Processing doc_002: Miami Beach, Florida
  -> Generated 512-dimensional embedding
Processing doc_003: Reykjavik Coast, Iceland
  -> Generated 512-dimensional embedding
Processing doc_004: Sydney Harbour, Australia
  -> Generated 512-dimensional embedding
Processing doc_005: Dublin Bay, Ireland
  -> Generated 512-dimensional embedding
Processing doc_006: Tokyo Bay, Japan
  -> Generated 512-dimensional embedding
Processing doc_007: Ostia Coast, Italy
  -> Generated 512-dimensional embedding
Processing doc_008: Oslo Fjord, Norway
  -> Generated 512-dimensional embedding

Building vector search index...
Built FAISS index with 8 documents
Offline indexing pipeline complete!


## 4. Real-Time Query Pipeline

In [6]:
class QueryProcessor:
    """
    Handles user query processing including geocoding and embedding generation.
    """
    
    def __init__(self, alpha_earth_model):
        self.geolocator = Nominatim(user_agent="coastal_analysis_system")
        self.alpha_earth = alpha_earth_model
        
    def geocode_query(self, location_query):
        """
        Convert a location query to coordinates.
        
        Args:
            location_query: String describing a location
            
        Returns:
            Tuple of (latitude, longitude) or None if not found
        """
        try:
            location = self.geolocator.geocode(location_query, timeout=10)
            if location:
                return (location.latitude, location.longitude)
            else:
                print(f"Could not geocode: {location_query}")
                return None
        except Exception as e:
            print(f"Geocoding error for '{location_query}': {e}")
            return None
    
    def infer_coastal_characteristics(self, location_query, coordinates=None):
        """
        Attempt to infer coastal characteristics from the query text.
        In a real system, this might use additional data sources or ML models.
        """
        query_lower = location_query.lower()
        characteristics = []
        
        # Simple keyword-based inference
        if any(word in query_lower for word in ['bay', 'harbor', 'harbour']):
            characteristics.append('protected_bay')
        if any(word in query_lower for word in ['beach', 'sand']):
            characteristics.append('sandy')
        if any(word in query_lower for word in ['city', 'urban', 'port']):
            characteristics.append('urban')
        if any(word in query_lower for word in ['fjord', 'inlet']):
            characteristics.append('fjord')
        if any(word in query_lower for word in ['mediterranean']):
            characteristics.append('mediterranean')
        if any(word in query_lower for word in ['atlantic', 'pacific', 'ocean']):
            characteristics.append('open_ocean')
            
        return characteristics
    
    def process_query(self, user_query):
        """
        Process a complete user query: geocode location and generate embedding.
        
        Args:
            user_query: Natural language query about a coastal location
            
        Returns:
            Dictionary with query results
        """
        print(f"Processing query: '{user_query}'")
        
        # Geocode the query
        coordinates = self.geocode_query(user_query)
        if coordinates is None:
            return {'error': f"Could not geocode location: {user_query}"}
        
        print(f"Geocoded to: {coordinates}")
        
        # Infer characteristics
        characteristics = self.infer_coastal_characteristics(user_query, coordinates)
        print(f"Inferred characteristics: {characteristics}")
        
        # Generate embedding
        embedding = self.alpha_earth.generate_embedding(
            location=coordinates,
            characteristics=characteristics
        )
        
        return {
            'query': user_query,
            'coordinates': coordinates,
            'characteristics': characteristics,
            'embedding': embedding
        }

# Initialize query processor
query_processor = QueryProcessor(alpha_earth)
print("Query processor initialized")

Query processor initialized
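
The keyword checks in `infer_coastal_characteristics` use substring matching, so a query such as "Newport, Oregon" is tagged `urban` because it contains "port". A possible alternative is to match whole words only. The sketch below is an assumption about how that could look, not the notebook's method; `KEYWORD_RULES` and `infer_by_whole_words` are hypothetical names, and the rule table copies only part of the original keyword list.

```python
import re

# Characteristic label -> keywords that must appear as whole words.
KEYWORD_RULES = {
    'protected_bay': ['bay', 'harbor', 'harbour'],
    'sandy': ['beach', 'sand'],
    'urban': ['city', 'urban', 'port'],
    'fjord': ['fjord', 'inlet'],
}

def infer_by_whole_words(location_query):
    """Return the labels whose keywords occur as complete words in the query."""
    words = set(re.findall(r"[a-z]+", location_query.lower()))
    return [label for label, keywords in KEYWORD_RULES.items()
            if words.intersection(keywords)]

harbor_tags = infer_by_whole_words("Boston Harbor, Massachusetts")
beach_tags = infer_by_whole_words("Venice Beach, California")
newport_tags = infer_by_whole_words("Newport, Oregon")
```

The trade-off is that whole-word matching misses inflected forms such as "beaches", so a real system would likely need stemming or a larger keyword list.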


### 4.1 Similarity Search System

In [7]:
class CoastalSimilaritySearcher:
    """
    Main class that orchestrates the similarity search for coastal locations.
    """
    
    def __init__(self, vector_db, query_processor):
        self.vector_db = vector_db
        self.query_processor = query_processor
        
    def search_similar_coastlines(self, user_query, k=3):
        """
        Find coastlines similar to the user's query location.
        
        Args:
            user_query: Natural language description of a coastal location
            k: Number of similar coastlines to return
            
        Returns:
            Dictionary with search results and metadata
        """
        print(f"\n🔍 Searching for coastlines similar to: '{user_query}'")
        print("=" * 60)
        
        # Process the query (geocoding + embedding generation)
        query_result = self.query_processor.process_query(user_query)
        
        if 'error' in query_result:
            return query_result
            
        # Perform similarity search
        similar_docs = self.vector_db.search(query_result['embedding'], k=k)
        
        # Format results
        results = {
            'query': query_result['query'],
            'query_coordinates': query_result['coordinates'],
            'query_characteristics': query_result['characteristics'],
            'similar_coastlines': []
        }
        
        print(f"\nFound {len(similar_docs)} similar coastlines:")
        print("-" * 40)
        
        for i, (doc_id, similarity, metadata) in enumerate(similar_docs, 1):
            coastline_info = {
                'rank': i,
                'document_id': doc_id,
                'location_name': metadata['location_name'],
                'similarity_score': similarity,
                'coordinates': metadata['location'],
                'characteristics': metadata['characteristics'],
                'description': metadata['description'],
                'date': metadata['date']
            }
            
            results['similar_coastlines'].append(coastline_info)
            
            # Calculate distance between query and result
            distance = geodesic(query_result['coordinates'], metadata['location']).kilometers
            
            print(f"{i}. {metadata['location_name']}")
            print(f"   Similarity: {similarity:.3f}")
            print(f"   Distance: {distance:.0f} km")
            print(f"   Characteristics: {', '.join(metadata['characteristics'])}")
            print(f"   Description preview: {metadata['description'][:100]}...")
            print()
            
        return results

# Initialize the similarity searcher
similarity_searcher = CoastalSimilaritySearcher(vector_db, query_processor)
print("Coastal similarity searcher initialized")

Coastal similarity searcher initialized


## 5. LLM Augmentation & Response System

In [9]:
class MockLLM:
    """
    Simulates an LLM (like Gemini) for generating responses based on retrieved context.
    In reality, this would interface with actual LLM APIs.
    """
    
    def __init__(self):
        self.response_templates = {
            'general_analysis': [
                "Based on the analysis of similar coastal environments, {query_location} shares characteristics with {similar_locations}. These areas typically experience {common_characteristics}.",
                "The coastal area you're asking about has functional similarities to {similar_locations}. Key patterns observed in these locations include {analysis_points}.",
                "From the coastal database analysis, {query_location} exhibits similarities to {similar_locations}. This suggests {conclusions}."
            ],
            'risk_assessment': [
                "Given the similarities to {similar_locations}, {query_location} may face comparable coastal risks including {risk_factors}.",
                "Based on functionally similar coastlines like {similar_locations}, potential concerns for {query_location} include {risk_assessment}."
            ],
            'recommendations': [
                "Drawing from successful approaches in similar coastal environments like {similar_locations}, recommendations for {query_location} include {recommendations}.",
                "Best practices observed in comparable coastal areas such as {similar_locations} suggest {management_strategies}."
            ]
        }
    
    def generate_response(self, user_query, search_results, response_type='general_analysis'):
        """
        Generate an LLM response based on the user query and retrieved similar coastlines.
        
        Args:
            user_query: Original user query
            search_results: Results from similarity search
            response_type: Type of response to generate
            
        Returns:
            Generated response string
        """
        if not search_results['similar_coastlines']:
            return "I couldn't find sufficient similar coastal data to provide a comprehensive analysis."
        
        # Extract information from similar coastlines
        similar_locations = [sc['location_name'] for sc in search_results['similar_coastlines'][:3]]
        all_characteristics = []
        for sc in search_results['similar_coastlines']:
            all_characteristics.extend(sc['characteristics'])
        
        # Find most common characteristics
        char_counts = {}
        for char in all_characteristics:
            char_counts[char] = char_counts.get(char, 0) + 1
        common_chars = sorted(char_counts.items(), key=lambda x: x[1], reverse=True)[:5]
        common_characteristics = [char for char, count in common_chars]
        
        # Generate specific analysis points
        analysis_points = self._generate_analysis_points(search_results)
        risk_factors = self._identify_risk_factors(search_results)
        recommendations = self._generate_recommendations(search_results)
        
        # Select response template
        templates = self.response_templates.get(response_type, self.response_templates['general_analysis'])
        template = random.choice(templates)
        
        # Fill in the template
        response = template.format(
            query_location=search_results['query'],
            similar_locations=', '.join(similar_locations),
            common_characteristics=', '.join(common_characteristics),
            analysis_points=analysis_points,
            risk_factors=risk_factors,
            recommendations=recommendations,
            conclusions=self._generate_conclusions(search_results),
            risk_assessment=risk_factors,
            management_strategies=recommendations
        )
        
        # Add specific details from top matches
        response += "\n\nDetailed Analysis:\n"
        for i, sc in enumerate(search_results['similar_coastlines'][:2], 1):
            response += f"\n{i}. {sc['location_name']} (Similarity: {sc['similarity_score']:.3f}):\n"
            response += f"   - {sc['description'][:200]}..."
        
        return response
    
    def _generate_analysis_points(self, search_results):
        """Generate analysis points based on retrieved data."""
        points = []
        
        # Analyze characteristics
        char_analysis = {}
        for sc in search_results['similar_coastlines']:
            for char in sc['characteristics']:
                if char not in char_analysis:
                    char_analysis[char] = []
                char_analysis[char].append(sc['location_name'])
        
        common_chars = {k: v for k, v in char_analysis.items() if len(v) >= 2}
        
        if 'urban' in common_chars:
            points.append("intensive urban development and associated coastal pressures")
        if 'sandy' in common_chars:
            points.append("sandy beach systems requiring active management")
        if 'rocky' in common_chars:
            points.append("rocky shoreline resilience but limited recreational access")
        if 'tidal' in common_chars:
            points.append("significant tidal influences affecting coastal dynamics")
            
        return '; '.join(points[:3]) if points else "diverse coastal characteristics"
    
    def _identify_risk_factors(self, search_results):
        """Identify potential risk factors based on similar locations."""
        risks = []
        
        for sc in search_results['similar_coastlines']:
            if 'hurricane_prone' in sc['characteristics']:
                risks.append("severe weather events and storm surge")
            if 'earthquake_prone' in sc['characteristics']:
                risks.append("seismic activity and associated coastal hazards")
            if 'urban' in sc['characteristics']:
                risks.append("development pressure and ecosystem degradation")
            if 'erosion' in sc['description'].lower():
                risks.append("coastal erosion and sediment management challenges")
        
        # Remove duplicates while keeping first-seen order (set() ordering is arbitrary)
        unique_risks = list(dict.fromkeys(risks))
        return '; '.join(unique_risks[:3]) if unique_risks else "standard coastal management considerations"
    
    def _generate_recommendations(self, search_results):
        """Generate management recommendations based on similar successful approaches."""
        recommendations = []
        
        for sc in search_results['similar_coastlines']:
            if 'engineered' in sc['characteristics']:
                recommendations.append("engineered coastal protection measures")
            if 'wildlife_habitat' in sc['characteristics']:
                recommendations.append("ecosystem-based management approaches")
            if 'tourism' in sc['characteristics']:
                recommendations.append("sustainable tourism development practices")
            if 'seawalls' in sc['characteristics']:
                recommendations.append("hard infrastructure protection systems")
                
        # Remove duplicates while keeping first-seen order (set() ordering is arbitrary)
        unique_recs = list(dict.fromkeys(recommendations))
        return '; '.join(unique_recs[:3]) if unique_recs else "integrated coastal zone management"
    
    def _generate_conclusions(self, search_results):
        """Generate conclusions based on the analysis."""
        conclusions = [
            "similar functional characteristics require comparable management strategies",
            "lessons learned from these coastal environments can inform local planning",
            "coordinated monitoring and adaptive management approaches are recommended",
            "stakeholder engagement following successful models in similar areas"
        ]
        return random.choice(conclusions)

# Initialize mock LLM
mock_llm = MockLLM()
print("Mock LLM system initialized")

Mock LLM system initialized
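
`MockLLM` fills fixed templates, whereas a real LLM call would receive the retrieved records as prompt context. The sketch below shows one way such a prompt might be assembled from the `similar_coastlines` entries produced by the similarity search. It is an illustration only: `build_prompt`, its instruction wording, and the `max_chars` limit are assumptions, and no LLM API is called.

```python
def build_prompt(user_query, similar_coastlines, max_chars=300):
    """Assemble a grounded prompt from the retrieved coastline records."""
    lines = [
        "Answer the question using only the coastal records below.",
        f"Question: {user_query}",
        "Records:",
    ]
    for sc in similar_coastlines:
        # Collapse the line breaks and indentation left by triple-quoted descriptions.
        text = " ".join(sc['description'].split())[:max_chars]
        lines.append(
            f"[{sc['document_id']}] {sc['location_name']} "
            f"(similarity {sc['similarity_score']:.3f}): {text}"
        )
    return "\n".join(lines)

# Example with a single made-up record in the same shape as the search results.
example_record = {
    'document_id': 'doc_x',
    'location_name': 'Example Bay',
    'similarity_score': 0.5,
    'description': "Line one.\n        Line two.",
}
example_prompt = build_prompt("What are the risks?", [example_record])
```

Tagging each record with its `document_id` lets the model's answer be traced back to the documents it was grounded in.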


In [10]:
class CoastalAnalysisSystem:
    """
    Complete end-to-end system that integrates all components.
    """
    
    def __init__(self, similarity_searcher, llm):
        self.similarity_searcher = similarity_searcher
        self.llm = llm
        
    def analyze_coastal_location(self, user_query, analysis_type='general_analysis', k=3):
        """
        Complete pipeline: query processing, similarity search, and LLM response generation.
        
        Args:
            user_query: Natural language query about a coastal location
            analysis_type: Type of analysis ('general_analysis', 'risk_assessment', 'recommendations')
            k: Number of similar coastlines to retrieve
            
        Returns:
            Dictionary with complete analysis results
        """
        print("🌊 COASTAL ANALYSIS SYSTEM")
        print("=" * 50)
        
        # Step 1: Similarity Search
        search_results = self.similarity_searcher.search_similar_coastlines(user_query, k=k)
        
        if 'error' in search_results:
            return search_results
            
        # Step 2: LLM Response Generation
        print(f"\n🤖 Generating {analysis_type.replace('_', ' ').title()}...")
        print("-" * 30)
        
        llm_response = self.llm.generate_response(user_query, search_results, analysis_type)
        
        # Combine all results
        complete_analysis = {
            'user_query': user_query,
            'search_results': search_results,
            'llm_response': llm_response,
            'analysis_type': analysis_type,
            'timestamp': datetime.now().isoformat()
        }
        
        print("\n📋 ANALYSIS COMPLETE")
        print("=" * 30)
        print(f"Query: {user_query}")
        print(f"Analysis Type: {analysis_type.replace('_', ' ').title()}")
        print(f"Similar Coastlines Found: {len(search_results['similar_coastlines'])}")
        print("\n🎯 LLM RESPONSE:")
        print("-" * 20)
        print(llm_response)
        
        return complete_analysis

# Initialize the complete system
coastal_system = CoastalAnalysisSystem(similarity_searcher, mock_llm)
print("\n✅ Complete Coastal Analysis System initialized and ready!")

✅ Complete Coastal Analysis System initialized and ready!


## 6. End-to-End Testing and Demonstrations

### Test 1: General Coastal Analysis

In [11]:
# Test 1: Analyze a major harbor city
test_query_1 = "Boston Harbor, Massachusetts"

result_1 = coastal_system.analyze_coastal_location(
    user_query=test_query_1,
    analysis_type='general_analysis',
    k=3
)

🌊 COASTAL ANALYSIS SYSTEM
==================================================

🔍 Searching for coastlines similar to: 'Boston Harbor, Massachusetts'
Processing query: 'Boston Harbor, Massachusetts'
Geocoded to: (42.3417649, -70.9661596)
Inferred characteristics: ['protected_bay']
\nFound 3 similar coastlines:
----------------------------------------
1. San Francisco Bay, California
   Similarity: 0.429
   Distance: 4352 km
   Characteristics: rocky, urban, tidal, cliffs, protected_bay
   Description preview: Rocky coastline with steep cliffs and limited sandy beaches. 
        Heavy urban development with p...

2. Reykjavik Coast, Iceland
   Similarity: 0.402
   Distance: 3919 km
   Characteristics: volcanic, harsh_climate, minimal_development, extreme_tides
   Description preview: Volcanic coastline with black sand beaches and basaltic rock formations.
        Extreme tidal range...

3. Dublin Bay, Ireland
   Similarity: 0.393
   Distance: 4817 km
   Characteristics: shallow_bay, mudflats, wildlife_habitat, reclaimed_land
   Descriptio
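The `Distance` column reports how far each match lies from the geocoded query point. The notebook imports `geopy.distance.geodesic`, which works on an ellipsoidal Earth model; whether that function produces this column is an assumption based on the imports. As a dependency-free approximation, the haversine formula on a spherical Earth gives values that typically agree with ellipsoidal distances to within about half a percent. The sketch below applies it to two coordinates that appear in this notebook's outputs (the Boston Harbor query above and the Venice Beach query in Test 2); the function name `haversine_km` is illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


boston_harbor = (42.3417649, -70.9661596)   # geocoded in Test 1
venice_beach = (33.9799601, -118.468771)    # geocoded in Test 2
print(f"{haversine_km(boston_harbor, venice_beach):.0f} km")
```

Note that distance and similarity are independent signals here: the top match for Boston Harbor is more than 4,000 km away, which is the point of matching on coastal function rather than proximity.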

### Test 2: Risk Assessment

In [12]:
# Test 2: Risk assessment for a coastal city
test_query_2 = "Venice Beach, California"

result_2 = coastal_system.analyze_coastal_location(
    user_query=test_query_2,
    analysis_type='risk_assessment',
    k=3
)

🌊 COASTAL ANALYSIS SYSTEM
==================================================

🔍 Searching for coastlines similar to: 'Venice Beach, California'
Processing query: 'Venice Beach, California'
Geocoded to: (33.9799601, -118.468771)
Inferred characteristics: ['sandy']
\nFound 3 similar coastlines:
----------------------------------------
1. Miami Beach, Florida
   Similarity: 0.413
   Distance: 3785 km
   Characteristics: sandy, urban, barrier_island, hurricane_prone, engineered
   Description preview: Sandy barrier island with extensive beach nourishment projects. 
        High-rise development direc...

2. San Francisco Bay, California
   Similarity: 0.302
   Distance: 552 km
   Characteristics: rocky, urban, tidal, cliffs, protected_bay
   Description preview: Rocky coastline with steep cliffs and limited sandy beaches. 
        Heavy urban development with p...

3. Ostia Coast, Italy
   Similarity: 0.235
   Distance: 10230 km
   Characteristics: mediterranean, sandy, tourism, archaeological, stable_climate
   Description preview: Medite
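The `Similarity` column is the score used to rank candidates. This section does not show how it is computed; the notebook imports scikit-learn's `cosine_similarity` and reports a FAISS index, so cosine similarity is a reasonable assumption. A minimal pure-Python sketch of cosine-based top-k ranking follows, using a toy three-dimensional index; `cosine`, `top_k`, and the `doc_*` identifiers are hypothetical names for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def top_k(query_vec, index, k=3):
    """Return the k (doc_id, score) pairs with the highest cosine similarity."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings; real ones would come from the embedding model.
toy_index = {
    'doc_a': [1.0, 0.0, 0.0],
    'doc_b': [0.6, 0.8, 0.0],
    'doc_c': [0.0, 0.0, 1.0],
}
results = top_k([1.0, 0.0, 0.0], toy_index, k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

With only a handful of documents, an exhaustive scan like this is exact and fast; approximate nearest-neighbor indexes matter once the collection grows large.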

### Test 3: Management Recommendations

In [14]:
# Test 3: Management recommendations for a fjord system
test_query_3 = "Bergen, Norway"

result_3 = coastal_system.analyze_coastal_location(
    user_query=test_query_3,
    analysis_type='recommendations',
    k=4
)

🌊 COASTAL ANALYSIS SYSTEM
==================================================

🔍 Searching for coastlines similar to: 'Bergen, Norway'
Processing query: 'Bergen, Norway'
Geocoded to: (60.3943055, 5.3259192)
Inferred characteristics: []
\nFound 4 similar coastlines:
----------------------------------------
1. Dublin Bay, Ireland
   Similarity: 0.478
   Distance: 1053 km
   Characteristics: shallow_bay, mudflats, wildlife_habitat, reclaimed_land
   Description preview: Shallow bay with extensive mudflats and salt marshes. Important
        bird habitat and nature rese...

2. Oslo Fjord, Norway
   Similarity: 0.472
   Distance: 306 km
   Characteristics: fjord, cold_water, recreational, pristine, traditional_communities
   Description preview: Deep fjord system with steep rocky shores and scattered islands.
        Cold water temperatures and...

3. Reykjavik Coast, Iceland
   Similarity: 0.443
   Distance: 1464 km
   Characteristics: volcanic, harsh_climate, minimal_development, extreme_tides
   Description preview: Volcanic coastline wit
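Across the three tests the `Inferred characteristics` line reads `['protected_bay']` for a query containing "Harbor", `['sandy']` for one containing "Beach", and `[]` for "Bergen, Norway". That pattern is consistent with keyword matching on the query text, although the actual inference rules are defined elsewhere in the notebook and are not shown in this section. The sketch below is one hypothetical way to get the same three results; the keyword table and the function name `infer_characteristics` are assumptions, not the notebook's implementation.

```python
import re

# Hypothetical keyword table; the notebook's actual rules are not shown here.
KEYWORD_TO_CHARACTERISTIC = {
    'harbor': 'protected_bay',
    'bay': 'protected_bay',
    'beach': 'sandy',
    'fjord': 'fjord',
}


def infer_characteristics(query):
    """Map keywords found in the query text to characteristic tags, in order."""
    found = []
    for word in re.findall(r"[a-z]+", query.lower()):
        tag = KEYWORD_TO_CHARACTERISTIC.get(word)
        if tag and tag not in found:
            found.append(tag)
    return found


for q in ("Boston Harbor, Massachusetts", "Venice Beach, California", "Bergen, Norway"):
    print(q, "->", infer_characteristics(q))
```

A place name with no descriptive keyword, such as "Bergen, Norway", yields an empty list under this kind of rule, so the query embedding carries no fjord signal. That may help explain why Dublin Bay ranks slightly above Oslo Fjord in Test 3.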

## 7. System Summary and Performance Analysis

In [15]:
# System Performance Summary
print("🏁 ALPHAEARTH SEMANTIC SEARCH SYSTEM - PERFORMANCE SUMMARY")
print("=" * 65)

print("📊 SYSTEM STATISTICS:")
print(f"  • Database Size: {len(coastal_documents)} coastal documents")
print(f"  • Embedding Dimension: {alpha_earth.embedding_dim}")
print(f"  • Vector Database: FAISS with {vector_db.index.ntotal} indexed vectors")
print("  • Geocoding Service: Nominatim (OpenStreetMap)")
print("  • LLM: Mock system simulating Gemini/GPT responses")

print("\n🔧 PIPELINE COMPONENTS:")
print("  ✅ Offline Indexing Pipeline - Complete")
print("  ✅ Real-Time Query Processing - Complete")
print("  ✅ Similarity Search (ANN) - Complete")
print("  ✅ LLM Response Generation - Complete")

print("\n📈 TESTED SCENARIOS:")
print("  • General coastal analysis")
print("  • Risk assessment")
print("  • Management recommendations")
print("  • Cross-continental similarity matching")

print("\n✨ KEY FEATURES DEMONSTRATED:")
print("  • Geographic location-based embedding generation")
print("  • Coastal characteristic inference from queries")
print("  • Semantic similarity matching across global coastlines")
print("  • Context-aware LLM response synthesis")
print("  • Multiple analysis modes (general, risk, recommendations)")

print("\n🌊 SYSTEM READY FOR PRODUCTION DEPLOYMENT!")
print("Replace mock components with:")
print("  • Real AlphaEarth model API")
print("  • Vertex AI Vector Search")
print("  • Gemini/GPT API integration")
print("  • Production coastal database")

🏁 ALPHAEARTH SEMANTIC SEARCH SYSTEM - PERFORMANCE SUMMARY
=================================================================
📊 SYSTEM STATISTICS:
  • Database Size: 8 coastal documents
  • Embedding Dimension: 512
  • Vector Database: FAISS with 8 indexed vectors
  • Geocoding Service: Nominatim (OpenStreetMap)
  • LLM: Mock system simulating Gemini/GPT responses

🔧 PIPELINE COMPONENTS:
  ✅ Offline Indexing Pipeline - Complete
  ✅ Real-Time Query Processing - Complete
  ✅ Similarity Search (ANN) - Complete
  ✅ LLM Response Generation - Complete

📈 TESTED SCENARIOS:
  • General coastal analysis
  • Risk assessment
  • Management recommendations
  • Cross-continental similarity matching

✨ KEY FEATURES DEMONSTRATED:
  • Geographic location-based embedding generation
  • Coastal characteristic inference from queries
  • Semantic similarity matching across global coastlines
  • Context-aware LLM response synthesis
  • Multiple analysis modes (general, risk, recommendations)

🌊 SYSTEM READY FOR PRODUCTION DEPLOYMENT!
Replace mock components with:
  • Real
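
Swapping the mock components for production services is easier if the pipeline depends on a small interface rather than on a concrete class. The sketch below shows one possible arrangement using `typing.Protocol`. Every name in it (`Embedder`, `MockEmbedder`, `build_index`, the `embed(lat, lon)` signature) is hypothetical; it is not the notebook's existing API, nor the API of any real AlphaEarth or Vertex AI client.

```python
import math
from typing import Dict, List, Protocol, Tuple


class Embedder(Protocol):
    """Anything that turns a (lat, lon) location into an embedding vector."""

    def embed(self, lat: float, lon: float) -> List[float]: ...


class MockEmbedder:
    """Deterministic placeholder; a production class would call a real model."""

    def __init__(self, dim: int = 4) -> None:
        self.dim = dim

    def embed(self, lat: float, lon: float) -> List[float]:
        # Arbitrary but repeatable function of the coordinates.
        return [math.sin(lat * (i + 1)) + math.cos(lon * (i + 1))
                for i in range(self.dim)]


def build_index(embedder: Embedder,
                locations: Dict[str, Tuple[float, float]]) -> Dict[str, List[float]]:
    """Offline indexing step: map each document id to its embedding."""
    return {doc_id: embedder.embed(lat, lon)
            for doc_id, (lat, lon) in locations.items()}


index = build_index(MockEmbedder(dim=4), {
    'boston_harbor': (42.3417649, -70.9661596),
    'bergen': (60.3943055, 5.3259192),
})
print({doc_id: len(vec) for doc_id, vec in index.items()})
```

Because `build_index` only relies on the `embed` method, a production embedder with the same method could be passed in without touching the indexing or query code. The same pattern would apply to the vector store and the LLM client.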