# Semantic Search with on2vec

This notebook demonstrates how to use on2vec embeddings for semantic search across ontologies. We'll show how to:

1. Find semantically similar concepts within an ontology
2. Search for concepts across different ontologies
3. Build a semantic search engine using cosine similarity
4. Implement query expansion and concept matching
5. Visualize semantic neighborhoods

## Use Case: Cross-Ontology Concept Discovery
Imagine you're working with multiple biomedical ontologies and need to find related concepts across them. Keyword search only matches shared strings, so it misses related concepts that are named differently. Embedding-based search ranks concepts by vector similarity instead, so it can surface conceptually similar terms even when they use different vocabulary.
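
To make the contrast concrete, here is a minimal sketch that compares substring matching with cosine-similarity ranking. The three vectors are invented for illustration and are not on2vec output; real embeddings have many more dimensions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-D vectors, invented for illustration (not on2vec output).
toy_embeddings = {
    "myocardial infarction": [0.9, 0.1, 0.0],
    "heart attack": [0.8, 0.2, 0.1],
    "file format": [0.0, 0.1, 0.9],
}

query = "heart attack"
others = {name: vec for name, vec in toy_embeddings.items() if name != query}

# Keyword search: a substring match on the name finds nothing.
keyword_hits = [name for name in others if query in name]

# Embedding search: rank the other concepts by cosine similarity to the query.
ranked = sorted(
    ((name, cosine(toy_embeddings[query], vec)) for name, vec in others.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

print("Keyword hits:", keyword_hits)
for name, score in ranked:
    print(f"{name:25} {score:.3f}")
```

The substring search returns no hits, while the similarity ranking places "myocardial infarction" well above "file format".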

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.manifold import TSNE
import umap
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# on2vec imports
from on2vec import (
    load_embeddings_as_dataframe,
    get_embedding_vector,
    train_ontology_embeddings,
    embed_ontology_with_model
)

plt.style.use('default')
sns.set_palette("husl")

## Step 1: Generate Embeddings for Multiple Ontologies

First, let's create embeddings for different ontologies that we'll use for cross-ontology search.
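
The cell below trains a model or writes embeddings only when the output file is missing. That "build once, reuse afterwards" pattern can be sketched on its own as follows. `build_if_missing` and `fake_train` are hypothetical helpers written for this illustration and are not part of on2vec.

```python
import tempfile
from pathlib import Path

def build_if_missing(path, build_fn):
    """Call build_fn(path) only when path is absent; return True if it ran."""
    path = Path(path)
    if path.exists():
        return False
    build_fn(path)
    return True

def fake_train(path):
    # Stand-in for an expensive training step: just writes a placeholder file.
    path.write_bytes(b"placeholder")

with tempfile.TemporaryDirectory() as tmp:
    model_file = Path(tmp) / "demo_model.pt"
    first_run = build_if_missing(model_file, fake_train)   # builds the file
    second_run = build_if_missing(model_file, fake_train)  # reuses it

print(first_run, second_run)
```

One consequence of this pattern is that stale files are reused silently, so delete the cached model and embedding files after changing training parameters.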

In [None]:
# Check if we have existing embeddings, otherwise generate them
import os
from pathlib import Path

# Example ontologies for demonstration - using verified working ontologies
ontologies = {
    'edam': 'EDAM.owl',  # Bioinformatics (if available)
    'cvdo': 'cvdo.owl',  # Cardiovascular disease (if available) 
    'duo': 'owl_files/duo.owl',  # Data Use Ontology (45 classes, tested working)
    'cro': 'owl_files/cro.owl',   # Contributor Role Ontology (105 classes, tested working)
    'fao': 'owl_files/fao.owl'    # FAIR* Reviews Ontology (116 classes, tested working)
}

# Filter to only use ontologies that actually exist
available_ontologies = {name: path for name, path in ontologies.items() if os.path.exists(path)}

if not available_ontologies:
    print("❌ No ontology files found. Please ensure at least one ontology is available:")
    for name, path in ontologies.items():
        print(f"  • {name}: {path}")

print(f"Found {len(available_ontologies)} available ontologies: {list(available_ontologies.keys())}")

embedding_files = {}
model_files = {}

# Generate embeddings for each available ontology
for name, owl_file in available_ontologies.items():
    model_file = f"{name}_model.pt"
    embedding_file = f"{name}_embeddings.parquet"
    
    print(f"\nProcessing {name} ontology: {owl_file}")
    
    try:
        # Train model if it doesn't exist
        if not os.path.exists(model_file):
            print(f"  Training model for {name}...")
            result = train_ontology_embeddings(
                owl_file=owl_file,
                model_output=model_file,
                model_type="gcn",
                hidden_dim=128,
                out_dim=64,
                epochs=50,  # Reduced for demo
                learning_rate=0.01
            )
            print(f"  ✓ Model trained successfully")
        else:
            print(f"  ✓ Using existing model: {model_file}")
        
        # Generate embeddings if they don't exist
        if not os.path.exists(embedding_file):
            print(f"  Generating embeddings for {name}...")
            embeddings = embed_ontology_with_model(
                model_path=model_file,
                owl_file=owl_file,
                output_file=embedding_file
            )
            print(f"  ✓ Embeddings generated successfully")
        else:
            print(f"  ✓ Using existing embeddings: {embedding_file}")
        
        embedding_files[name] = embedding_file
        model_files[name] = model_file
        print(f"✅ {name} ready for semantic search")
        
    except Exception as e:
        print(f"❌ Failed to process {name}: {str(e)[:100]}...")
        continue

print("\n🎯 SEMANTIC SEARCH STATUS")
print("=" * 40)
if len(embedding_files) == 0:
    print("❌ No ontologies processed successfully.")
    print("Please check that the ontology files exist and are valid.")
else:
    print(f"✅ Successfully loaded {len(embedding_files)} ontologies:")
    for name in embedding_files.keys():
        print(f"   • {name}")
    print(f"\nReady for cross-ontology semantic search!")

## Step 2: Build Semantic Search Engine

Let's create a semantic search engine that can find similar concepts across ontologies.
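
At its core, the engine scores every stored vector against the query and keeps the best matches above a threshold. Here is a minimal NumPy sketch of that ranking step, assuming no zero-length vectors. `top_k_cosine` is a hypothetical helper, and the 2-D vectors are toy values chosen so the similarities are easy to check by hand.

```python
import numpy as np

def top_k_cosine(query, matrix, k=3, min_similarity=0.5):
    """Return (row_index, similarity) pairs for the k most similar rows."""
    # Scale rows and query to unit length so a dot product equals cosine similarity.
    matrix_unit = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    query_unit = query / np.linalg.norm(query)
    sims = matrix_unit @ query_unit

    # Drop rows below the threshold, then sort the survivors by descending similarity.
    keep = np.flatnonzero(sims >= min_similarity)
    order = keep[np.argsort(-sims[keep])][:k]
    return [(int(i), float(sims[i])) for i in order]

# Toy 2-D vectors, invented for illustration.
matrix = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [-1.0, 0.0]])
hits = top_k_cosine(np.array([1.0, 0.0]), matrix, k=3)

for row, sim in hits:
    print(f"row {row}: {sim:.3f}")
```

Only rows 0 and 1 pass the 0.5 threshold here, so fewer than `k` results come back. The class below behaves the same way when few concepts are similar enough.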

In [None]:
class SemanticSearchEngine:
    def __init__(self, embedding_files):
        """Initialize search engine with multiple ontology embeddings."""
        self.ontologies = {}
        self.embeddings = {}
        self.concept_index = {}
        
        # Load all embeddings
        for name, file_path in embedding_files.items():
            print(f"Loading {name} embeddings...")
            df, metadata = load_embeddings_as_dataframe(file_path, return_metadata=True)
            
            # Store embeddings as numpy array
            embeddings_matrix = np.stack(df['embedding'].to_numpy())
            node_ids = df['node_id'].to_numpy()
            
            self.ontologies[name] = {
                'df': df,
                'metadata': metadata,
                'embeddings': embeddings_matrix,
                'node_ids': node_ids,
                'id_to_idx': {node_id: idx for idx, node_id in enumerate(node_ids)}
            }
            
            # Build concept index for text-based lookup
            for idx, node_id in enumerate(node_ids):
                # Extract concept name from IRI
                concept_name = node_id.split('/')[-1].split('#')[-1].replace('_', ' ').lower()
                if concept_name not in self.concept_index:
                    self.concept_index[concept_name] = []
                self.concept_index[concept_name].append({
                    'ontology': name,
                    'node_id': node_id,
                    'idx': idx
                })
            
            print(f"  ✓ Loaded {len(node_ids):,} concepts from {name}")
        
        print(f"\nSearch engine ready with {len(self.concept_index):,} unique concept names!")
    
    def find_concept(self, query):
        """Find concepts matching a text query."""
        query = query.lower()
        matches = []
        
        for concept_name, concept_info in self.concept_index.items():
            if query in concept_name:
                matches.extend(concept_info)
        
        return matches
    
    def get_embedding(self, ontology, node_id):
        """Get embedding vector for a specific concept."""
        if ontology not in self.ontologies:
            return None
        
        ont_data = self.ontologies[ontology]
        if node_id not in ont_data['id_to_idx']:
            return None
        
        idx = ont_data['id_to_idx'][node_id]
        return ont_data['embeddings'][idx]
    
    def semantic_search(self, query_concept, ontology=None, top_k=10, min_similarity=0.5):
        """Find semantically similar concepts across ontologies."""
        # Get query embedding
        if ontology and ontology in self.ontologies:
            query_embedding = self.get_embedding(ontology, query_concept)
            if query_embedding is None:
                return []
        else:
            # Try to find the concept in any ontology
            query_embedding = None
            for ont_name in self.ontologies:
                query_embedding = self.get_embedding(ont_name, query_concept)
                if query_embedding is not None:
                    ontology = ont_name
                    break
            
            if query_embedding is None:
                return []
        
        # Search across all ontologies
        results = []
        
        for ont_name, ont_data in self.ontologies.items():
            # Calculate similarities
            similarities = cosine_similarity(
                query_embedding.reshape(1, -1), 
                ont_data['embeddings']
            )[0]
            
            # Find top similar concepts
            for idx, similarity in enumerate(similarities):
                if similarity >= min_similarity:
                    node_id = ont_data['node_ids'][idx]
                    concept_name = node_id.split('/')[-1].split('#')[-1].replace('_', ' ')
                    
                    results.append({
                        'ontology': ont_name,
                        'node_id': node_id,
                        'concept_name': concept_name,
                        'similarity': float(similarity),
                        'is_query': ont_name == ontology and node_id == query_concept
                    })
        
        # Sort by similarity and return top_k
        results.sort(key=lambda x: x['similarity'], reverse=True)
        return results[:top_k]
    
    def cross_ontology_search(self, query_text, top_k=20):
        """Search for concepts across ontologies using text query."""
        # First find concepts matching the text
        concept_matches = self.find_concept(query_text)
        
        if not concept_matches:
            print(f"No concepts found matching '{query_text}'")
            return []
        
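        # Use the first match as query concept
        query_match = concept_matches[0]
        # Index entries store only ontology, node_id, and idx, so derive the display name from the IRI
        query_name = query_match['node_id'].split('/')[-1].split('#')[-1].replace('_', ' ')
        print(f"Using query concept: {query_name} from {query_match['ontology']}")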
        
        # Perform semantic search
        return self.semantic_search(
            query_match['node_id'], 
            query_match['ontology'], 
            top_k=top_k
        )

# Initialize search engine
if embedding_files:
    search_engine = SemanticSearchEngine(embedding_files)
else:
    print("No embedding files available for search engine initialization")

## Step 3: Semantic Search Examples

Let's demonstrate the semantic search capabilities with real examples.
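
Text queries are resolved by substring matching against a name derived from the last segment of each IRI, so it helps to see what those derived names look like. The sketch below applies the same string operations as the search engine to three IRIs. The first two follow the style used by EDAM and OBO ontologies, and the third is a made-up `example.org` IRI.

```python
def iri_to_label(iri):
    """Derive a lowercase search label from the last segment of an IRI."""
    return iri.split('/')[-1].split('#')[-1].replace('_', ' ').lower()

example_iris = [
    "http://edamontology.org/format_1929",
    "http://purl.obolibrary.org/obo/DUO_0000001",
    "http://example.org/onto#Protein_Structure",  # hypothetical IRI
]
labels = {iri: iri_to_label(iri) for iri in example_iris}

def matching(query):
    """Return the IRIs whose derived label contains the query text."""
    return [iri for iri, label in labels.items() if query.lower() in label]

for iri, label in labels.items():
    print(f"{label:20} <- {iri}")
print("protein:", matching("protein"))
print("data:", matching("data"))
```

Ontologies that use opaque numeric identifiers yield labels such as "duo 0000001", so a query like "data" finds nothing for them. If the examples below report no matches, this is the likely cause, and matching against human-readable labels (where available) would be the natural extension.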

In [None]:
# Example 1: Search for protein-related concepts
if 'search_engine' in locals():
    print("=" * 60)
    print("SEMANTIC SEARCH EXAMPLE 1: Protein-related concepts")
    print("=" * 60)
    
    results = search_engine.cross_ontology_search("protein", top_k=15)
    
    if results:
        results_df = pd.DataFrame(results)
        print(f"\nFound {len(results)} similar concepts:")
        print("-" * 80)
        
        for i, result in enumerate(results[:10]):
            marker = "🎯" if result['is_query'] else "🔍"
            print(f"{marker} {i+1:2d}. {result['concept_name']:40} | {result['ontology']:8} | {result['similarity']:.3f}")
        
        # Show distribution by ontology
        print("\nDistribution by ontology:")
        ont_counts = results_df['ontology'].value_counts()
        for ont, count in ont_counts.items():
            print(f"  {ont}: {count} concepts")
else:
    print("Search engine not available - please run previous cells first")

In [None]:
# Example 2: Search for disease-related concepts
if 'search_engine' in locals():
    print("=" * 60)
    print("SEMANTIC SEARCH EXAMPLE 2: Disease-related concepts")
    print("=" * 60)
    
    results = search_engine.cross_ontology_search("disease", top_k=15)
    
    if results:
        print(f"\nFound {len(results)} similar concepts:")
        print("-" * 80)
        
        for i, result in enumerate(results[:10]):
            marker = "🎯" if result['is_query'] else "🔍"
            print(f"{marker} {i+1:2d}. {result['concept_name']:40} | {result['ontology']:8} | {result['similarity']:.3f}")

In [None]:
# Example 3: Search for data/format concepts
if 'search_engine' in locals():
    print("=" * 60)
    print("SEMANTIC SEARCH EXAMPLE 3: Data format concepts")
    print("=" * 60)
    
    results = search_engine.cross_ontology_search("format", top_k=15)
    
    if results:
        print(f"\nFound {len(results)} similar concepts:")
        print("-" * 80)
        
        for i, result in enumerate(results[:10]):
            marker = "🎯" if result['is_query'] else "🔍"
            print(f"{marker} {i+1:2d}. {result['concept_name']:40} | {result['ontology']:8} | {result['similarity']:.3f}")

## Step 4: Visualize Semantic Neighborhoods

Let's create interactive visualizations to explore semantic neighborhoods around concepts.
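
Plotting a neighborhood requires projecting the embeddings to two dimensions. The cell below uses UMAP for this. As a dependency-light illustration of the same step, the sketch here swaps in plain PCA computed with NumPy's SVD, applied to random stand-in vectors rather than real embeddings. `project_2d` is a hypothetical helper.

```python
import numpy as np

def project_2d(embeddings):
    """Project rows onto their first two principal components (PCA via SVD)."""
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

rng = np.random.default_rng(42)
points = rng.normal(size=(8, 64))  # random stand-ins for 64-D embeddings
coords = project_2d(points)

print(coords.shape)
```

PCA is linear and deterministic, which makes it a useful sanity check for very small result sets. UMAP generally preserves local neighborhoods better once there are enough points.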

In [None]:
def visualize_semantic_neighborhood(search_engine, query_text, top_k=50):
    """Create an interactive visualization of semantic neighborhood."""
    results = search_engine.cross_ontology_search(query_text, top_k=top_k)
    
    if len(results) < 5:
        print(f"Not enough results to visualize (found {len(results)})")
        return None
    
    # Collect embeddings for visualization
    embeddings_viz = []
    labels = []
    ontologies = []
    similarities = []
    
    for result in results:
        embedding = search_engine.get_embedding(result['ontology'], result['node_id'])
        if embedding is not None:
            embeddings_viz.append(embedding)
            labels.append(result['concept_name'])
            ontologies.append(result['ontology'])
            similarities.append(result['similarity'])
    
    if len(embeddings_viz) < 5:
        print("Not enough valid embeddings for visualization")
        return None
    
    embeddings_matrix = np.array(embeddings_viz)
    
    # Reduce dimensionality for visualization
    print(f"Reducing dimensionality from {embeddings_matrix.shape[1]}D to 2D...")
    
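    # Use UMAP for better preservation of local structure.
    # n_neighbors should be smaller than the number of points, so cap it for small result sets.
    n_neighbors = min(15, len(embeddings_matrix) - 1)
    reducer = umap.UMAP(n_components=2, random_state=42, min_dist=0.1, n_neighbors=n_neighbors)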
    embeddings_2d = reducer.fit_transform(embeddings_matrix)
    
    # Create interactive plot
    colors = px.colors.qualitative.Set3
    ontology_colors = {ont: colors[i % len(colors)] for i, ont in enumerate(set(ontologies))}
    
    fig = go.Figure()
    
    # Add points for each ontology
    for ont in set(ontologies):
        mask = [o == ont for o in ontologies]
        x_vals = embeddings_2d[mask, 0]
        y_vals = embeddings_2d[mask, 1]
        
        ont_labels = [labels[i] for i, m in enumerate(mask) if m]
        ont_sims = [similarities[i] for i, m in enumerate(mask) if m]
        
        # Size points by similarity
        sizes = [max(8, sim * 20) for sim in ont_sims]
        
        fig.add_trace(go.Scatter(
            x=x_vals,
            y=y_vals,
            mode='markers+text',
            name=f'{ont} ({sum(mask)} concepts)',
            text=ont_labels,
            textposition='top center',
            marker=dict(
                size=sizes,
                color=ontology_colors[ont],
                opacity=0.7,
                line=dict(width=1, color='white')
            ),
            customdata=ont_sims,
            hovertemplate='<b>%{text}</b><br>' +
                         'Ontology: ' + ont + '<br>' +
                         'Similarity: %{customdata:.3f}<br>' +
                         '<extra></extra>'
        ))
    
    # Highlight query concept: locate it via the is_query flag rather than assuming it ranks first
    query_idx = next((i for i, r in enumerate(results) if r['is_query']), 0)
    fig.add_trace(go.Scatter(
        x=[embeddings_2d[query_idx, 0]],
        y=[embeddings_2d[query_idx, 1]],
        mode='markers',
        name='Query Concept',
        marker=dict(
            size=25,
            color='red',
            symbol='star',
            line=dict(width=2, color='darkred')
        ),
        hovertemplate=f'<b>QUERY: {labels[query_idx]}</b><br>' +
                     f'Ontology: {ontologies[query_idx]}<br>' +
                     '<extra></extra>'
    ))
    
    fig.update_layout(
        title=f'Semantic Neighborhood: "{query_text}"<br><sub>Concepts colored by ontology, sized by similarity</sub>',
        xaxis_title='UMAP Dimension 1',
        yaxis_title='UMAP Dimension 2',
        width=1000,
        height=700,
        hovermode='closest',
        legend=dict(x=1.02, y=1)
    )
    
    return fig

# Create visualizations for different queries
if 'search_engine' in locals():
    queries = ["protein", "disease", "format"]
    
    for query in queries:
        print(f"\nCreating visualization for '{query}'...")
        fig = visualize_semantic_neighborhood(search_engine, query, top_k=30)
        if fig:
            fig.show()
        else:
            print(f"Could not create visualization for '{query}'")

## Step 5: Advanced Search Features

Let's implement more advanced search capabilities like query expansion and multi-concept queries.
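
Multi-concept search combines several embeddings into one query vector, and how they are combined matters. The sketch below uses two toy vectors to show that a plain average is dominated by the longer vector, while scaling each vector to unit length first gives both concepts equal influence. The `normalize` option is an assumption added for this illustration and is not part of the class below.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D arrays."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def combine(vectors, normalize=False):
    """Average vectors, optionally scaling each to unit length first."""
    stacked = np.stack(vectors).astype(float)
    if normalize:
        stacked = stacked / np.linalg.norm(stacked, axis=1, keepdims=True)
    return stacked.mean(axis=0)

# Toy vectors with very different lengths, invented for illustration.
long_vec = np.array([10.0, 0.0])
short_vec = np.array([0.0, 1.0])

raw = combine([long_vec, short_vec])
balanced = combine([long_vec, short_vec], normalize=True)

print(f"raw mean vs short_vec:      {cosine(raw, short_vec):.3f}")
print(f"balanced mean vs short_vec: {cosine(balanced, short_vec):.3f}")
```

If the embeddings have similar norms, the two approaches give nearly the same direction and the plain average is fine.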

In [None]:
class AdvancedSemanticSearch:
    def __init__(self, search_engine):
        self.search_engine = search_engine
    
    def query_expansion(self, initial_query, expansion_threshold=0.8, max_expansions=5):
        """Expand query by finding highly similar concepts."""
        # Get initial results
        initial_results = self.search_engine.cross_ontology_search(initial_query, top_k=20)
        
        if not initial_results:
            return []
        
        # Find concepts with high similarity to expand the query
        expansion_concepts = []
        for result in initial_results[:max_expansions+1]:  # +1 for query itself
            if result['similarity'] >= expansion_threshold:
                expansion_concepts.append(result)
        
        added = [c for c in expansion_concepts if not c['is_query']]  # Exclude the query itself via its flag
        print(f"Expanding query '{initial_query}' with {len(added)} similar concepts:")
        for concept in added:
            print(f"  + {concept['concept_name']} (sim: {concept['similarity']:.3f})")
        
        # Collect all results from expanded queries
        all_results = []
        seen_concepts = set()
        
        for concept in expansion_concepts:
            results = self.search_engine.semantic_search(
                concept['node_id'], 
                concept['ontology'], 
                top_k=15,
                min_similarity=0.4
            )
            
            for result in results:
                concept_key = (result['ontology'], result['node_id'])
                if concept_key not in seen_concepts:
                    all_results.append(result)
                    seen_concepts.add(concept_key)
        
        # Sort by similarity and return (each score is relative to the concept that retrieved it)
        all_results.sort(key=lambda x: x['similarity'], reverse=True)
        return all_results
    
    def multi_concept_search(self, concept_queries, combination_method='average'):
        """Search using multiple concepts combined."""
        concept_embeddings = []
        concept_info = []
        
        # Get embeddings for each query concept
        for query in concept_queries:
            matches = self.search_engine.find_concept(query)
            if matches:
                match = matches[0]
                embedding = self.search_engine.get_embedding(match['ontology'], match['node_id'])
                if embedding is not None:
                    concept_embeddings.append(embedding)
                    concept_info.append(match)
        
        if len(concept_embeddings) < 2:
            print("Need at least 2 valid concepts for multi-concept search")
            return []
        
        # Combine embeddings
        if combination_method == 'average':
            combined_embedding = np.mean(concept_embeddings, axis=0)
        elif combination_method == 'max':
            combined_embedding = np.max(concept_embeddings, axis=0)
        else:
            # Weighted average (equal weights for now)
            weights = np.ones(len(concept_embeddings)) / len(concept_embeddings)
            combined_embedding = np.average(concept_embeddings, axis=0, weights=weights)
        
        print(f"Searching with combined concept: {' + '.join(concept_queries)}")
        print(f"Combination method: {combination_method}")
        
        # Search using combined embedding
        results = []
        for ont_name, ont_data in self.search_engine.ontologies.items():
            similarities = cosine_similarity(
                combined_embedding.reshape(1, -1),
                ont_data['embeddings']
            )[0]
            
            for idx, similarity in enumerate(similarities):
                if similarity >= 0.3:  # Lower threshold for combined queries
                    node_id = ont_data['node_ids'][idx]
                    concept_name = node_id.split('/')[-1].split('#')[-1].replace('_', ' ')
                    
                    results.append({
                        'ontology': ont_name,
                        'node_id': node_id,
                        'concept_name': concept_name,
                        'similarity': float(similarity),
                        'is_multi_concept': True
                    })
        
        results.sort(key=lambda x: x['similarity'], reverse=True)
        return results[:20]
    
    def semantic_clustering(self, concept_list, n_clusters=5):
        """Group concepts into semantic clusters."""
        from sklearn.cluster import KMeans
        
        embeddings = []
        valid_concepts = []
        
        for concept in concept_list:
            matches = self.search_engine.find_concept(concept)
            if matches:
                match = matches[0]
                embedding = self.search_engine.get_embedding(match['ontology'], match['node_id'])
                if embedding is not None:
                    embeddings.append(embedding)
                    valid_concepts.append({
                        'query': concept,
                        'concept_name': match['node_id'].split('/')[-1].split('#')[-1].replace('_', ' '),
                        'ontology': match['ontology']
                    })
        
        if len(embeddings) < n_clusters:
            print(f"Need at least {n_clusters} concepts for clustering")
            return {}
        
        # Perform clustering
        # Set n_init explicitly because its default differs between scikit-learn versions
        kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
        clusters = kmeans.fit_predict(embeddings)
        
        # Group concepts by cluster
        clustered_concepts = {}
        for i, cluster_id in enumerate(clusters):
            if cluster_id not in clustered_concepts:
                clustered_concepts[cluster_id] = []
            clustered_concepts[cluster_id].append(valid_concepts[i])
        
        return clustered_concepts

# Initialize advanced search
if 'search_engine' in locals():
    advanced_search = AdvancedSemanticSearch(search_engine)
    print("Advanced semantic search features ready!")

In [None]:
# Example: Query Expansion
if 'advanced_search' in locals():
    print("=" * 60)
    print("QUERY EXPANSION EXAMPLE")
    print("=" * 60)
    
    expanded_results = advanced_search.query_expansion("protein", expansion_threshold=0.7)
    
    if expanded_results:
        print(f"\nExpanded search results ({len(expanded_results)} concepts):")
        print("-" * 80)
        
        for i, result in enumerate(expanded_results[:15]):
            print(f"{i+1:2d}. {result['concept_name']:40} | {result['ontology']:8} | {result['similarity']:.3f}")

In [None]:
# Example: Multi-concept Search
if 'advanced_search' in locals():
    print("=" * 60)
    print("MULTI-CONCEPT SEARCH EXAMPLE")
    print("=" * 60)
    
    # Search for concepts that combine multiple ideas
    multi_results = advanced_search.multi_concept_search(["protein", "structure", "data"])
    
    if multi_results:
        print(f"\nMulti-concept search results ({len(multi_results)} concepts):")
        print("-" * 80)
        
        for i, result in enumerate(multi_results[:15]):
            print(f"{i+1:2d}. {result['concept_name']:40} | {result['ontology']:8} | {result['similarity']:.3f}")

## Step 6: Performance Analysis

Let's analyze the performance and coverage of our semantic search system.
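
Single timings of sub-millisecond calls are noisy, so one option is to repeat each call and report the median. The sketch below does this with `time.perf_counter`. `time_call` is a hypothetical helper, and it times the built-in `sum` purely as a stand-in for a search call.

```python
import statistics
import time

def time_call(fn, *args, repeats=5, **kwargs):
    """Return the median wall-clock time of fn(*args, **kwargs) in milliseconds."""
    durations_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        durations_ms.append((time.perf_counter() - start) * 1000)
    return statistics.median(durations_ms)

# Stand-in workload; in the notebook this would be a search call.
median_ms = time_call(sum, range(100_000), repeats=7)
print(f"median: {median_ms:.3f} ms")
```

The median is less sensitive than the mean to a single slow run, such as the first call after loading data.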

In [None]:
def analyze_search_performance(search_engine):
    """Analyze the performance and coverage of the search engine."""
    print("=" * 60)
    print("SEMANTIC SEARCH PERFORMANCE ANALYSIS")
    print("=" * 60)
    
    # Coverage statistics
    total_concepts = sum(len(ont_data['node_ids']) for ont_data in search_engine.ontologies.values())
    unique_concept_names = len(search_engine.concept_index)
    
    print(f"\n📊 Coverage Statistics:")
    print(f"  • Total concepts across all ontologies: {total_concepts:,}")
    print(f"  • Unique concept names (searchable): {unique_concept_names:,}")
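    # Share of concepts with a distinct derived name; guard against an empty index
    coverage = unique_concept_names / total_concepts if total_concepts else 0.0
    print(f"  • Coverage ratio: {coverage:.1%}")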
    
    # Per-ontology statistics
    print(f"\n🗂️ Per-ontology breakdown:")
    for ont_name, ont_data in search_engine.ontologies.items():
        n_concepts = len(ont_data['node_ids'])
        embedding_dim = ont_data['embeddings'].shape[1]
        print(f"  • {ont_name:10}: {n_concepts:,} concepts, {embedding_dim}D embeddings")
    
    # Test search performance with sample queries
    test_queries = ["protein", "disease", "cell", "gene", "data", "format", "structure"]
    
    print(f"\n🔍 Search performance test:")
    performance_stats = []
    
    import time
    
    for query in test_queries:
        start_time = time.perf_counter()  # Monotonic, high-resolution timer suited to short intervals
        results = search_engine.cross_ontology_search(query, top_k=20)
        search_time = time.perf_counter() - start_time
        
        n_results = len(results)
        n_ontologies = len(set(r['ontology'] for r in results)) if results else 0
        avg_similarity = np.mean([r['similarity'] for r in results]) if results else 0
        
        performance_stats.append({
            'query': query,
            'n_results': n_results,
            'n_ontologies': n_ontologies,
            'avg_similarity': avg_similarity,
            'search_time_ms': search_time * 1000
        })
        
        print(f"  • '{query:8}': {n_results:2d} results, {n_ontologies} ontologies, "
              f"avg_sim={avg_similarity:.3f}, {search_time*1000:.1f}ms")
    
    # Create performance visualization
    perf_df = pd.DataFrame(performance_stats)
    
    fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 10))
    
    # Number of results by query
    ax1.bar(perf_df['query'], perf_df['n_results'], color='skyblue')
    ax1.set_title('Number of Results by Query')
    ax1.set_ylabel('Number of Results')
    ax1.tick_params(axis='x', rotation=45)
    
    # Average similarity by query
    ax2.bar(perf_df['query'], perf_df['avg_similarity'], color='lightgreen')
    ax2.set_title('Average Similarity by Query')
    ax2.set_ylabel('Average Similarity')
    ax2.tick_params(axis='x', rotation=45)
    
    # Search time by query
    ax3.bar(perf_df['query'], perf_df['search_time_ms'], color='coral')
    ax3.set_title('Search Time by Query (ms)')
    ax3.set_ylabel('Time (milliseconds)')
    ax3.tick_params(axis='x', rotation=45)
    
    # Ontology coverage by query
    ax4.bar(perf_df['query'], perf_df['n_ontologies'], color='gold')
    ax4.set_title('Ontologies Covered by Query')
    ax4.set_ylabel('Number of Ontologies')
    ax4.tick_params(axis='x', rotation=45)
    
    plt.tight_layout()
    plt.show()
    
    return performance_stats

# Run performance analysis
if 'search_engine' in locals():
    perf_stats = analyze_search_performance(search_engine)
else:
    print("Search engine not available for performance analysis")

## Conclusion

This notebook demonstrated how on2vec embeddings support semantic search across ontologies:

### ✅ Key Capabilities Demonstrated:

1. **Cross-Ontology Search**: Find related concepts across different domain ontologies
2. **Semantic Similarity**: Discover conceptually similar terms even with different vocabulary
3. **Interactive Visualization**: Explore semantic neighborhoods in 2D space
4. **Query Expansion**: Automatically expand searches with similar concepts
5. **Multi-Concept Queries**: Combine multiple concepts for complex searches
6. **Performance Analysis**: Measure result counts, average similarity, ontology coverage, and search latency

### 🚀 Real-World Applications:

- **Literature Search**: Find relevant papers across domains using semantic concept matching
- **Data Discovery**: Locate datasets and resources using conceptual similarity
- **Knowledge Integration**: Bridge different domain vocabularies automatically
- **Recommendation Systems**: Suggest related concepts, tools, or methods
- **Ontology Harmonization**: Identify equivalent concepts across ontologies

### 🔄 Next Steps:

1. **Scale to Larger Ontologies**: Test with full-size ontologies (GO, ChEBI, etc.)
2. **Domain-Specific Tuning**: Fine-tune embeddings for specific application domains
3. **Multi-Modal Search**: Combine text, structure, and metadata for richer search
4. **Real-Time Applications**: Deploy as web service with caching and optimization
5. **User Interface**: Build interactive web interface for non-technical users

Together, these examples show how on2vec embeddings can support knowledge discovery and integration tasks across ontologies.