# 01 - Stage 1: Bi-Encoder Retrieval System

## Overview
This notebook implements the first stage of our multi-stage resume screening pipeline:
- **Stage 1: Fast Retrieval using Bi-Encoders**
  - Encode job descriptions and resumes independently into dense vectors
  - Build a FAISS index for efficient similarity search
  - Retrieve the top-K candidates (typically K=100) for re-ranking

**Key Advantages**:
- ⚡ Fast: Can search through millions of resumes in milliseconds
- 📦 Scalable: Vectors computed once, stored, and reused
- 🎯 Good recall: Captures semantic similarity effectively

**Runtime**: CPU is sufficient; a GPU speeds up the encoding step considerably (roughly 10x, depending on hardware)

**Estimated Time**: 10-20 minutes (depends on dataset size)

## 1. Environment Setup

In [None]:
# Check environment
import sys
import os

IN_COLAB = 'google.colab' in sys.modules
IN_KAGGLE = 'KAGGLE_KERNEL_RUN_TYPE' in os.environ

print(f"Running in Google Colab: {IN_COLAB}")
print(f"Running in Kaggle: {IN_KAGGLE}")

# Check for GPU
import torch
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"\nDevice: {device}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"Memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.2f} GB")

## 2. Install Required Packages

In [None]:
%%capture
# Install sentence-transformers and FAISS
!pip install -U sentence-transformers
!pip install faiss-cpu  # Use faiss-gpu if CUDA is available
# !pip install faiss-gpu  # Uncomment for GPU version

# Visualization
!pip install umap-learn plotly

# Utilities
!pip install pandas numpy scikit-learn tqdm

In [None]:
# Import libraries
import pandas as pd
import numpy as np
import json
import pickle
from pathlib import Path
from typing import List, Dict, Tuple
import time
import warnings

# Sentence Transformers
import sentence_transformers  # module import, needed for the __version__ check below
from sentence_transformers import SentenceTransformer

# FAISS
import faiss

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from sklearn.manifold import TSNE

# Progress bar
from tqdm.auto import tqdm

warnings.filterwarnings('ignore')
sns.set_style('whitegrid')

print(f"✅ sentence-transformers version: {sentence_transformers.__version__}")
print(f"✅ FAISS version: {faiss.__version__}")
print(f"✅ All libraries imported successfully")

## 3. Load Configuration and Data

In [None]:
# Load session configuration from previous notebook
try:
    if IN_COLAB:
        from google.colab import drive
        drive.mount('/content/drive')
        BASE_PATH = Path('/content/drive/MyDrive/resume_screening_project')
    elif IN_KAGGLE:
        BASE_PATH = Path('/kaggle/working/resume_screening_project')
    else:
        BASE_PATH = Path('./resume_screening_project')
    
    session_config_path = BASE_PATH / 'session_config.json'
    if session_config_path.exists():
        with open(session_config_path, 'r') as f:
            config = json.load(f)
        print("✅ Loaded session configuration")
    else:
        print("⚠️ Session config not found, using default paths")
        config = {}
except Exception as e:
    print(f"⚠️ Could not load session config: {e}")
    BASE_PATH = Path('./resume_screening_project')
    config = {}

# Setup paths
DATA_PATH = BASE_PATH / 'data'
PROCESSED_PATH = DATA_PATH / 'processed'
MODELS_PATH = BASE_PATH / 'models'
OUTPUTS_PATH = BASE_PATH / 'outputs'
STAGE1_PATH = MODELS_PATH / 'stage1_retriever'

STAGE1_PATH.mkdir(parents=True, exist_ok=True)
OUTPUTS_PATH.mkdir(parents=True, exist_ok=True)  # plots and HTML files are saved here later

print(f"\n📁 Working Directory: {BASE_PATH}")

In [None]:
# Load preprocessed data
print("Loading preprocessed datasets...")

df1_path = PROCESSED_PATH / 'resume_scores_anonymized.parquet'
df2_path = PROCESSED_PATH / 'jd_resume_match_anonymized.parquet'

if df1_path.exists():
    df_resumes = pd.read_parquet(df1_path)
    print(f"✅ Loaded resume scores: {len(df_resumes)} records")
else:
    print("⚠️ Resume scores not found, creating sample data")
    df_resumes = pd.DataFrame({
        'resume_text': [f'Sample resume {i} with skills in Python, ML, and data science' for i in range(1000)],
        'score': np.random.randint(60, 100, 1000)
    })

if df2_path.exists():
    df_jd_match = pd.read_parquet(df2_path)
    print(f"✅ Loaded JD-Resume pairs: {len(df_jd_match)} records")
else:
    print("⚠️ JD-Resume pairs not found, creating sample data")
    df_jd_match = pd.DataFrame({
        'job_description': [f'Job {i} requires Python, machine learning' for i in range(100)],
        'resume': [f'Candidate {i} with Python experience' for i in range(100)],
        'match_score': np.random.uniform(0, 1, 100)
    })

print(f"\nDatasets loaded:")
print(f"  - Resumes: {df_resumes.shape}")
print(f"  - JD-Resume pairs: {df_jd_match.shape}")

## 4. Load Bi-Encoder Model

We use `all-MiniLM-L6-v2` from sentence-transformers:
- **Size**: 80MB (very lightweight)
- **Speed**: ~14,000 sentences/sec on a V100 GPU (per the sentence-transformers model benchmarks); CPU throughput is considerably lower
- **Dimensions**: 384
- **Performance**: Excellent for semantic search tasks

Alternative models:
- `all-mpnet-base-v2` (higher quality, slower)
- `paraphrase-multilingual-MiniLM-L12-v2` (multilingual)

In [None]:
# Load pre-trained model
MODEL_NAME = 'all-MiniLM-L6-v2'

print(f"Loading model: {MODEL_NAME}...")
model = SentenceTransformer(MODEL_NAME)

# Move to GPU if available
if torch.cuda.is_available():
    model = model.to('cuda')
    print("✅ Model moved to GPU")

print(f"✅ Model loaded")
print(f"   - Embedding dimension: {model.get_sentence_embedding_dimension()}")
print(f"   - Max sequence length: {model.max_seq_length}")

## 5. Create Embeddings

### Research Note:
Bi-encoders compute representations independently for queries and documents.
This allows pre-computing and caching all document embeddings, making retrieval
extremely fast. The trade-off is less nuanced interaction modeling compared to
cross-encoders (addressed in Stage 2).
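
As a rough illustration of the pre-compute-and-cache idea, the sketch below uses a toy bag-of-words encoder in place of the real model. The names `toy_encode`, `VOCAB`, and `rank` are hypothetical and exist only for this example; the point is that documents are encoded once without seeing any query, and each query then costs one encoding plus cheap dot products.

```python
import math
from typing import Dict, List

# Toy corpus; in the notebook these would be the resume texts.
resumes = [
    "python machine learning pytorch",
    "react node javascript frontend",
    "statistics biostatistics r healthcare",
]

# Hypothetical fixed vocabulary built from the corpus (stand-in for a learned encoder).
VOCAB: Dict[str, int] = {
    tok: i for i, tok in enumerate(sorted({t for r in resumes for t in r.split()}))
}


def toy_encode(text: str) -> List[float]:
    """Encode text independently of any other text, then L2-normalize."""
    vec = [0.0] * len(VOCAB)
    for tok in text.lower().split():
        if tok in VOCAB:  # out-of-vocabulary tokens are ignored in this toy model
            vec[VOCAB[tok]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


# Document side: computed once and cached, no query needed.
resume_vectors = [toy_encode(r) for r in resumes]


def rank(query: str) -> List[int]:
    """Query side: one encoding, then a dot product per cached document."""
    q = toy_encode(query)
    scores = [dot(q, v) for v in resume_vectors]
    return sorted(range(len(resumes)), key=lambda i: scores[i], reverse=True)


ranking = rank("machine learning engineer with python")
print(ranking)
```

A cross-encoder, by contrast, would have to run the model on every (query, resume) pair at query time, which is why it is reserved for the smaller candidate set in Stage 2.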

In [None]:
# Identify text columns
resume_col = None
for col in df_resumes.columns:
    if 'resume' in col.lower() or 'text' in col.lower():
        resume_col = col
        break

if resume_col is None:
    resume_col = df_resumes.columns[0]
    print(f"⚠️ No obvious text column found, using: {resume_col}")
else:
    print(f"✅ Using resume text column: {resume_col}")

# Prepare resume texts
resume_texts = df_resumes[resume_col].astype(str).tolist()
print(f"\nPreparing to encode {len(resume_texts)} resumes...")

In [None]:
# Check if embeddings already exist
embeddings_path = STAGE1_PATH / 'resume_embeddings.npy'
metadata_path = STAGE1_PATH / 'embeddings_metadata.json'

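if embeddings_path.exists():
    print("Found existing embeddings. Loading...")
    resume_embeddings = np.load(embeddings_path)
    # Metadata is optional: fall back to an empty dict if the file is missing
    embed_metadata = {}
    if metadata_path.exists():
        with open(metadata_path, 'r') as f:
            embed_metadata = json.load(f)
    print(f"✅ Loaded embeddings: {resume_embeddings.shape}")
    print(f"   Created: {embed_metadata.get('creation_date', 'unknown')}")
    # A cache built from a different dataset would silently misalign indices
    if len(resume_embeddings) != len(resume_texts):
        print("⚠️ Cached embeddings do not match the current resume count; delete the cache to re-encode")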
else:
    print("Creating new embeddings...")
    
    # Encode with progress bar
    start_time = time.time()
    
    resume_embeddings = model.encode(
        resume_texts,
        batch_size=32,
        show_progress_bar=True,
        convert_to_numpy=True,
        normalize_embeddings=True  # L2 normalization for cosine similarity
    )
    
    encoding_time = time.time() - start_time
    
    print(f"\n✅ Encoding complete!")
    print(f"   Shape: {resume_embeddings.shape}")
    print(f"   Time: {encoding_time:.2f}s")
    print(f"   Speed: {len(resume_texts) / encoding_time:.0f} resumes/sec")
    
    # Save embeddings
    np.save(embeddings_path, resume_embeddings)
    
    embed_metadata = {
        'model': MODEL_NAME,
        'num_documents': len(resume_texts),
        'embedding_dim': resume_embeddings.shape[1],
        'creation_date': pd.Timestamp.now().isoformat(),
        'encoding_time_seconds': encoding_time,
    }
    
    with open(metadata_path, 'w') as f:
        json.dump(embed_metadata, f, indent=2)
    
    print(f"\n💾 Saved to: {embeddings_path}")

## 6. Build FAISS Index

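### FAISS Index Types:
- **IndexFlatIP**: Exact (brute-force) inner-product search; no training needed, but query time grows linearly with the number of vectors (a good fit for small datasets, < 1M)
- **IndexIVFFlat**: Inverted file index; needs a training step and searches only a subset of clusters, trading a little recall for speed (a common choice once the corpus reaches millions of vectors)
- **IndexHNSWFlat**: Hierarchical Navigable Small World graph; very fast approximate search with high recall, at the cost of extra memory per vector

We'll use **IndexFlatIP** for exact search since our dataset is relatively small.
For production with millions of resumes, switch to IndexIVFFlat or IndexHNSWFlat.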
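
To make the "inner product on normalized vectors equals cosine similarity" point concrete, here is a minimal pure-Python sketch of what an exact inner-product search computes conceptually. It is not FAISS code; `flat_ip_search` and the toy vectors are assumptions made for illustration only.

```python
import heapq
import math
from typing import List, Tuple


def l2_normalize(vec: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def inner_product(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: List[float], b: List[float]) -> float:
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )


def flat_ip_search(vectors: List[List[float]], query: List[float], k: int) -> List[Tuple[float, int]]:
    """Exhaustive search: score every stored vector, keep the k largest scores."""
    scored = ((inner_product(query, v), i) for i, v in enumerate(vectors))
    return heapq.nlargest(k, scored)


raw_docs = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0], [-1.0, -1.0]]
raw_query = [6.0, 8.0]

docs = [l2_normalize(v) for v in raw_docs]   # done once at indexing time
query = l2_normalize(raw_query)              # done once per query

top2 = flat_ip_search(docs, query, k=2)
print(top2)
```

Because every stored vector is scored, the cost per query is proportional to the number of vectors times the embedding dimension, which is the reason approximate indexes become attractive as the corpus grows.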

In [None]:
# Check if index already exists
index_path = STAGE1_PATH / 'faiss_index.bin'

if index_path.exists():
    print("Loading existing FAISS index...")
    index = faiss.read_index(str(index_path))
    print(f"✅ Index loaded: {index.ntotal} vectors")
else:
    print("Building FAISS index...")
    
    # Get embedding dimension
    d = resume_embeddings.shape[1]
    
    # Create index (using inner product for normalized vectors = cosine similarity)
    index = faiss.IndexFlatIP(d)
    
    # Add vectors
    index.add(resume_embeddings.astype('float32'))
    
    print(f"✅ Index built: {index.ntotal} vectors")
    
    # Save index
    faiss.write_index(index, str(index_path))
    print(f"💾 Saved to: {index_path}")

print(f"\nIndex statistics:")
print(f"  - Type: {type(index).__name__}")
print(f"  - Dimension: {index.d}")
print(f"  - Total vectors: {index.ntotal}")
print(f"  - Is trained: {index.is_trained}")

## 7. Implement Retrieval Functions
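
One detail worth keeping in mind when formatting results: FAISS `search` returns fixed-shape arrays of scores and ids, and fills unused slots with the id `-1` when fewer than `top_k` neighbours are available. The sketch below shows one way to drop that padding; `format_hits` and the sample rows are hypothetical, and the padding score shown is only a placeholder.

```python
from typing import Dict, List, Sequence


def format_hits(scores_row: Sequence[float], ids_row: Sequence[int], texts: List[str]) -> List[Dict]:
    """Turn one row of search output into result dicts, skipping -1 padding."""
    hits = []
    for score, idx in zip(scores_row, ids_row):
        if idx < 0:  # padding slot, not a real document
            continue
        hits.append({
            "index": int(idx),
            "score": float(score),
            "resume_text": texts[idx],
        })
    return hits


texts = ["resume A", "resume B"]

# Asking for k=3 from a 2-vector index would leave one padded slot.
scores_row = [0.91, 0.42, float("-inf")]  # placeholder score for the padded slot
ids_row = [1, 0, -1]

hits = format_hits(scores_row, ids_row, texts)
print(hits)
```

Without the `idx < 0` check, Python's negative indexing would quietly return the last resume for every padded slot.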

In [None]:
class BiEncoderRetriever:
    """Stage 1 retrieval using bi-encoder and FAISS."""
    
    def __init__(self, model, index, resume_data, resume_texts):
        self.model = model
        self.index = index
        self.resume_data = resume_data
        self.resume_texts = resume_texts
    
    def retrieve(self, query: str, top_k: int = 100) -> List[Dict]:
        """
        Retrieve top-k most similar resumes for a job description.
        
        Args:
            query: Job description text
            top_k: Number of candidates to retrieve
        
        Returns:
            List of dicts with resume info and similarity scores
        """
        # Encode query
        query_embedding = self.model.encode(
            [query], 
            convert_to_numpy=True,
            normalize_embeddings=True
        )
        
        # Search
        scores, indices = self.index.search(query_embedding.astype('float32'), top_k)
        
        # Format results
        results = []
        for score, idx in zip(scores[0], indices[0]):
            # FAISS pads with -1 when fewer than top_k vectors are available
            if 0 <= idx < len(self.resume_data):
                results.append({
                    'index': int(idx),
                    'score': float(score),
                    'resume_text': self.resume_texts[idx],
                    'resume_data': self.resume_data.iloc[idx].to_dict()
                })
        
        return results
    
    def batch_retrieve(self, queries: List[str], top_k: int = 100) -> List[List[Dict]]:
        """
        Retrieve for multiple queries in batch.
        """
        # Encode all queries
        query_embeddings = self.model.encode(
            queries,
            batch_size=32,
            convert_to_numpy=True,
            normalize_embeddings=True,
            show_progress_bar=True
        )
        
        # Batch search
        scores, indices = self.index.search(query_embeddings.astype('float32'), top_k)
        
        # Format results
        all_results = []
        for query_scores, query_indices in zip(scores, indices):
            results = []
            for score, idx in zip(query_scores, query_indices):
                # Skip the -1 padding FAISS returns when top_k exceeds the index size
                if 0 <= idx < len(self.resume_data):
                    results.append({
                        'index': int(idx),
                        'score': float(score),
                        'resume_text': self.resume_texts[idx],
                    })
            all_results.append(results)
        
        return all_results

# Initialize retriever
retriever = BiEncoderRetriever(model, index, df_resumes, resume_texts)
print("✅ Retriever initialized")

## 8. Test Retrieval with Sample Queries

In [None]:
# Sample job descriptions
sample_jds = [
    """
    Senior Machine Learning Engineer
    
    We are seeking an experienced ML engineer with strong Python skills,
    deep learning expertise (PyTorch/TensorFlow), and production deployment experience.
    Must have 5+ years experience building and deploying ML models at scale.
    Experience with transformers, NLP, and cloud platforms (AWS/GCP) required.
    """,
    
    """
    Full Stack Developer
    
    Looking for a full-stack developer proficient in React, Node.js, and databases.
    Should have experience with RESTful APIs, microservices architecture, and DevOps.
    Knowledge of Docker, Kubernetes, and CI/CD pipelines is a plus.
    3+ years of professional development experience required.
    """,
    
    """
    Data Scientist - Healthcare Analytics
    
    Join our healthcare analytics team to build predictive models for patient outcomes.
    Strong statistical background, experience with R/Python, and familiarity with
    healthcare data (HIPAA compliance) required. PhD in Statistics, Biostatistics,
    or related field preferred. Experience with causal inference and A/B testing.
    """
]

print("Testing retrieval with sample job descriptions...\n")

In [None]:
# Test single query
test_jd = sample_jds[0]
print("Query:")
print(test_jd[:200] + "...")
print("\n" + "="*80)

# Retrieve top 10 for display
start_time = time.time()
results = retriever.retrieve(test_jd, top_k=10)
query_time = time.time() - start_time

print(f"\nRetrieval time: {query_time*1000:.2f}ms")
print(f"\nTop 10 Results:\n")

for i, result in enumerate(results, 1):
    print(f"{i}. Score: {result['score']:.4f}")
    print(f"   Resume preview: {result['resume_text'][:150]}...")
    print()

## 9. Performance Benchmarks
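
The benchmark in this section times `retriever.retrieve` end to end, so each measurement includes query encoding as well as the index search. A possible refinement, sketched below under that assumption, is to time the two phases separately with a warm-up and a monotonic clock. The helper `time_call` and the stand-in functions `fake_encode` and `fake_search` are hypothetical and only illustrate the measurement pattern.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_call(fn: Callable, *args, repeats: int = 10, warmup: int = 2) -> Dict[str, float]:
    """Time fn(*args) in milliseconds, discarding a few warm-up runs."""
    for _ in range(warmup):
        fn(*args)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn(*args)
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "median_ms": statistics.median(samples),
        "stdev_ms": statistics.stdev(samples),
        "runs": len(samples),
    }


def fake_encode(text: str) -> List[float]:
    """Stand-in for model.encode: one number per word."""
    return [float(len(word)) for word in text.split()]


def fake_search(vec: List[float], k: int) -> List[float]:
    """Stand-in for index.search: top-k values."""
    return sorted(vec, reverse=True)[:k]


query_text = "senior machine learning engineer"
encode_stats = time_call(fake_encode, query_text)
search_stats = time_call(fake_search, fake_encode(query_text), 3)

print(f"encode: {encode_stats['median_ms']:.4f} ms (median)")
print(f"search: {search_stats['median_ms']:.4f} ms (median)")
```

Reporting the phases separately makes it clearer which part would grow with corpus size (the search) and which stays constant per query (the encoding).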

In [None]:
# Benchmark different retrieval sizes
print("Running performance benchmarks...\n")

k_values = [10, 50, 100, 200, 500]
benchmark_results = []

for k in k_values:
    times = []
    for _ in range(10):  # 10 runs per k
        start = time.perf_counter()  # monotonic, high-resolution timer
        _ = retriever.retrieve(sample_jds[0], top_k=k)  # includes query encoding
        times.append(time.perf_counter() - start)
    
    avg_time = np.mean(times) * 1000  # Convert to ms
    std_time = np.std(times) * 1000
    
    benchmark_results.append({
        'k': k,
        'avg_time_ms': avg_time,
        'std_time_ms': std_time
    })
    
    print(f"k={k:4d}: {avg_time:6.2f}ms ± {std_time:5.2f}ms")

df_benchmark = pd.DataFrame(benchmark_results)

In [None]:
# Visualize benchmark results
fig, ax = plt.subplots(figsize=(10, 6))

ax.plot(df_benchmark['k'], df_benchmark['avg_time_ms'], marker='o', linewidth=2, markersize=8)
ax.fill_between(
    df_benchmark['k'],
    df_benchmark['avg_time_ms'] - df_benchmark['std_time_ms'],
    df_benchmark['avg_time_ms'] + df_benchmark['std_time_ms'],
    alpha=0.3
)

ax.set_xlabel('Top-K', fontsize=12)
ax.set_ylabel('Query Time (ms)', fontsize=12)
ax.set_title('Stage 1 Retrieval Latency vs. K (encoding + FAISS search)', fontsize=14, fontweight='bold')
ax.grid(True, alpha=0.3)

plt.savefig(OUTPUTS_PATH / 'stage1_benchmark.png', dpi=150, bbox_inches='tight')
plt.show()

print(f"\n📊 Benchmark plot saved to: {OUTPUTS_PATH / 'stage1_benchmark.png'}")

In [None]:
# Scalability analysis
print("\n" + "="*80)
print("SCALABILITY ANALYSIS")
print("="*80)

current_size = len(resume_texts)
queries_per_second = 1000 / df_benchmark[df_benchmark['k'] == 100]['avg_time_ms'].values[0]

print(f"\nCurrent dataset: {current_size:,} resumes")
print(f"Retrieval speed (k=100): {queries_per_second:.1f} queries/second")
print(f"\nProjected performance at scale:")

for scale in [10_000, 100_000, 1_000_000, 10_000_000]:
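    # Rough upper bound: scales the whole measured latency linearly with corpus size.
    # Only the flat-index search grows with N; query encoding time stays constant.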
    scale_factor = scale / current_size if current_size > 0 else 1
    estimated_time = df_benchmark[df_benchmark['k'] == 100]['avg_time_ms'].values[0] * scale_factor
    estimated_qps = 1000 / estimated_time
    
    print(f"  {scale:>10,} resumes: {estimated_time:7.2f}ms/query ({estimated_qps:6.1f} QPS)")

print("\n💡 Note: For > 1M resumes, consider IndexIVFFlat or IndexHNSWFlat for better scaling")

## 10. Visualize Embeddings (UMAP/t-SNE)

Visualize the embedding space to understand how resumes cluster.

In [None]:
# Sample embeddings for visualization (too many points slow down plotting)
n_visualize = min(1000, len(resume_embeddings))
sample_indices = np.random.choice(len(resume_embeddings), n_visualize, replace=False)
sample_embeddings = resume_embeddings[sample_indices]

print(f"Visualizing {n_visualize} embeddings...")

In [None]:
# UMAP dimensionality reduction
try:
    import umap
    
    print("Running UMAP (this may take a minute)...")
    reducer = umap.UMAP(n_components=2, random_state=42, n_neighbors=15, min_dist=0.1)
    embedding_2d = reducer.fit_transform(sample_embeddings)
    
    # Create interactive plot
    fig = px.scatter(
        x=embedding_2d[:, 0],
        y=embedding_2d[:, 1],
        title='Resume Embeddings (UMAP Projection)',
        labels={'x': 'UMAP 1', 'y': 'UMAP 2'},
        opacity=0.6
    )
    
    fig.update_layout(
        width=900,
        height=700,
        template='plotly_white'
    )
    
    fig.write_html(OUTPUTS_PATH / 'embeddings_umap.html')
    fig.show()
    
    print(f"✅ UMAP visualization saved to: {OUTPUTS_PATH / 'embeddings_umap.html'}")
    
except ImportError:
    print("⚠️ UMAP not available, skipping visualization")
    print("   Install with: pip install umap-learn")

In [None]:
# Alternative: t-SNE visualization
print("\nRunning t-SNE (alternative visualization)...")

# Perplexity must be smaller than the number of samples
tsne = TSNE(n_components=2, random_state=42, perplexity=min(30, n_visualize - 1))
embedding_2d_tsne = tsne.fit_transform(sample_embeddings)

fig, ax = plt.subplots(figsize=(12, 10))
scatter = ax.scatter(
    embedding_2d_tsne[:, 0],
    embedding_2d_tsne[:, 1],
    alpha=0.5,
    s=20,
    c=range(len(embedding_2d_tsne)),
    cmap='viridis'
)

ax.set_xlabel('t-SNE 1', fontsize=12)
ax.set_ylabel('t-SNE 2', fontsize=12)
ax.set_title('Resume Embeddings (t-SNE Projection)', fontsize=14, fontweight='bold')
plt.colorbar(scatter, ax=ax, label='Resume Index')

plt.savefig(OUTPUTS_PATH / 'embeddings_tsne.png', dpi=150, bbox_inches='tight')
plt.show()

print(f"✅ t-SNE visualization saved to: {OUTPUTS_PATH / 'embeddings_tsne.png'}")

## 11. Batch Processing and Caching

In [None]:
# Process all sample JDs in batch
print("Running batch retrieval for all sample JDs...\n")

batch_results = retriever.batch_retrieve(sample_jds, top_k=100)

print(f"✅ Batch retrieval complete")
print(f"   Processed {len(sample_jds)} job descriptions")
print(f"   Retrieved {len(batch_results[0])} candidates per JD")

# Cache results for Stage 2
cache_data = {
    'job_descriptions': sample_jds,
    'retrieval_results': batch_results,
    'model': MODEL_NAME,
    'top_k': 100,
    'timestamp': pd.Timestamp.now().isoformat()
}

cache_path = STAGE1_PATH / 'retrieval_cache.pkl'
with open(cache_path, 'wb') as f:
    pickle.dump(cache_data, f)

print(f"\n💾 Results cached to: {cache_path}")

## 12. Evaluation Metrics (Recall@K)
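
For reference, Recall@K for one query is the fraction of its relevant items that appear in the top K retrieved results, averaged over queries. The sketch below works on document ids rather than text matching; the function names and the toy `runs` data are assumptions for illustration. With exactly one relevant resume per job description, as in the evaluation in this section, Recall@K reduces to a hit rate.

```python
from typing import Iterable, List, Sequence, Set, Tuple


def recall_at_k(retrieved_ids: Sequence[int], relevant_ids: Set[int], k: int) -> float:
    """Fraction of relevant ids found among the first k retrieved ids."""
    if not relevant_ids:
        return 0.0
    top_k = set(retrieved_ids[:k])
    return len(top_k & relevant_ids) / len(relevant_ids)


def mean_recall_at_k(runs: Iterable[Tuple[Sequence[int], Set[int]]], k: int) -> float:
    """Average Recall@K over (retrieved_ids, relevant_ids) pairs."""
    runs = list(runs)
    return sum(recall_at_k(ret, rel, k) for ret, rel in runs) / len(runs)


# Toy data: three queries, each with a ranked id list and one relevant id.
runs: List[Tuple[List[int], Set[int]]] = [
    ([7, 2, 9, 4], {2}),   # relevant id at rank 2
    ([1, 3, 5, 8], {8}),   # relevant id at rank 4
    ([6, 0, 2, 1], {5}),   # relevant id never retrieved
]

r1 = mean_recall_at_k(runs, 1)
r2 = mean_recall_at_k(runs, 2)
r4 = mean_recall_at_k(runs, 4)
print(f"Recall@1={r1:.3f}  Recall@2={r2:.3f}  Recall@4={r4:.3f}")
```

Comparing ids avoids the false matches and misses that prefix-based text comparison can produce, provided the evaluation pairs carry the same ids as the indexed corpus.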

In [None]:
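# If we have ground truth labels, calculate recall
# For demonstration, we'll use the match_score from df_jd_match as pseudo ground truth
# Note: recall is only meaningful if the paired resumes are also present in the
# indexed corpus (the index is built from df_resumes, not from df_jd_match)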

if 'match_score' in df_jd_match.columns and len(df_jd_match) > 0:
    print("Calculating Recall@K metrics...\n")
    
    # Take first few examples for evaluation
    eval_samples = min(20, len(df_jd_match))
    
    # Identify JD and Resume columns
    jd_col = [col for col in df_jd_match.columns if 'job' in col.lower() or 'jd' in col.lower()][0]
    resume_col_match = [col for col in df_jd_match.columns if 'resume' in col.lower()][0]
    
    recalls = {k: [] for k in [10, 50, 100]}
    
    for idx in range(eval_samples):
        jd = str(df_jd_match.iloc[idx][jd_col])
        true_resume = str(df_jd_match.iloc[idx][resume_col_match])
        
        # Retrieve once at the largest k; exact search returns results in score
        # order, so each smaller top-k list is a prefix of the top-100 list
        results = retriever.retrieve(jd, top_k=100)
        retrieved_texts = [r['resume_text'] for r in results]
        
        for k in [10, 50, 100]:
            # Check if the true resume is in the top-k (simple text-prefix matching)
            found = any(true_resume[:100] in text[:100] for text in retrieved_texts[:k])
            recalls[k].append(1 if found else 0)
    
    print("Recall@K Results:")
    for k in [10, 50, 100]:
        recall = np.mean(recalls[k]) * 100
        print(f"  Recall@{k:3d}: {recall:5.2f}%")
    
else:
    print("⚠️ No ground truth available for evaluation")
    print("   Skipping recall calculation")

## 13. Save Stage 1 Model and Artifacts

In [None]:
# Create comprehensive metadata
stage1_metadata = {
    'model_name': MODEL_NAME,
    'embedding_dimension': int(resume_embeddings.shape[1]),
    'num_documents': int(len(resume_texts)),
    'index_type': type(index).__name__,
    'creation_date': pd.Timestamp.now().isoformat(),
    'device': str(device),
    'performance': {
        'avg_query_time_ms': float(df_benchmark[df_benchmark['k'] == 100]['avg_time_ms'].values[0]),
        'queries_per_second': float(queries_per_second),
    },
    'paths': {
        'embeddings': str(embeddings_path),
        'index': str(index_path),
        'cache': str(cache_path),
    }
}

metadata_output_path = STAGE1_PATH / 'stage1_metadata.json'
with open(metadata_output_path, 'w') as f:
    json.dump(stage1_metadata, f, indent=2)

print("✅ Stage 1 metadata saved")

In [None]:
# Save model configuration
model_config = {
    'model_name': MODEL_NAME,
    'max_seq_length': model.max_seq_length,
    'embedding_dimension': model.get_sentence_embedding_dimension(),
    'normalization': True,
    'similarity_metric': 'cosine',
}

config_path = STAGE1_PATH / 'model_config.json'
with open(config_path, 'w') as f:
    json.dump(model_config, f, indent=2)

print(f"✅ Model configuration saved to: {config_path}")

## 14. Summary and Next Steps

In [None]:
print("="*80)
print(" " * 20 + "STAGE 1: BI-ENCODER RETRIEVAL COMPLETE")
print("="*80)

print("\n📊 Summary:")
print(f"   - Model: {MODEL_NAME}")
print(f"   - Documents indexed: {index.ntotal:,}")
print(f"   - Embedding dimension: {resume_embeddings.shape[1]}")
print(f"   - Index type: {type(index).__name__}")

print("\n⚡ Performance:")
print(f"   - Query time (k=100): {df_benchmark[df_benchmark['k'] == 100]['avg_time_ms'].values[0]:.2f}ms")
print(f"   - Throughput: {queries_per_second:.1f} queries/second")

print("\n💾 Saved Artifacts:")
print(f"   - Embeddings: {embeddings_path.name}")
print(f"   - FAISS index: {index_path.name}")
print(f"   - Retrieval cache: {cache_path.name}")
print(f"   - Metadata: {metadata_output_path.name}")

print("\n📈 Key Insights:")
print("   ✓ Fast retrieval enables real-time candidate screening")
print("   ✓ Bi-encoder captures semantic similarity effectively")
print("   ✓ Pre-computed embeddings allow scaling to millions of resumes")
print("   ✓ Top-100 candidates ready for Stage 2 re-ranking")

print("\n🔬 Research Notes:")
print("   - Bi-encoders trade interaction modeling for speed")
print("   - Optimal for first-stage retrieval in multi-stage systems")
print("   - Consider model fine-tuning on domain-specific data for better accuracy")

print("\n✅ Ready for Stage 2: Cross-Encoder Re-Ranking")
print("   👉 Open: 02_stage2_reranker_crossencoder.ipynb")
print("="*80)