# Advanced NLP Project Demo

This notebook demonstrates the key features of the Advanced NLP Project, including:
- RAG-based fact checking
- Multi-modal keyword extraction
- Advanced sentence ranking


In [None]:
# Install the package if not already installed
# !pip install -e ..

import sys
import os
sys.path.append('../src')

from nlp_project import RAGSystem, KeywordExtractor, SentenceRanker
from nlp_project.utils import Config, get_logger

# Setup logging
logger = get_logger(__name__, level="INFO")
print("Imports successful!")

## 1. Configuration Setup

First, let's set up the configuration. Make sure to set your Anthropic API key.
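
If you are curious what loading a config from the environment can look like, here is a minimal sketch. It is not the project's actual `Config` class: `DemoConfig`, the default values, and the `SENTENCE_MODEL`, `BATCH_SIZE`, and `DEVICE` variable names are assumptions made up for illustration. Only the field names and `ANTHROPIC_API_KEY` come from this notebook.

```python
import os
from dataclasses import dataclass


@dataclass
class DemoConfig:
    """Hypothetical stand-in for nlp_project.utils.Config (illustration only)."""
    anthropic_api_key: str
    sentence_model: str = "all-MiniLM-L6-v2"  # assumed default, not confirmed by the project
    batch_size: int = 32                       # assumed default
    device: str = "cpu"                        # assumed default

    @classmethod
    def from_env(cls, env=None):
        """Build a config from environment variables and fail early if the key is missing."""
        env = os.environ if env is None else env
        api_key = env.get("ANTHROPIC_API_KEY", "")
        if not api_key:
            raise ValueError("ANTHROPIC_API_KEY is not set")
        return cls(
            anthropic_api_key=api_key,
            # The three variable names below are assumptions for this sketch.
            sentence_model=env.get("SENTENCE_MODEL", cls.sentence_model),
            batch_size=int(env.get("BATCH_SIZE", cls.batch_size)),
            device=env.get("DEVICE", cls.device),
        )


# Passing a plain dict instead of os.environ keeps the example side-effect free.
demo = DemoConfig.from_env({"ANTHROPIC_API_KEY": "placeholder", "BATCH_SIZE": "64"})
print(demo.batch_size, demo.device)
```

Raising on a missing key at load time surfaces the problem here rather than in the first API call further down the notebook.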

In [None]:
# Set up configuration
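# Option 1: From environment (recommended)
# setdefault keeps a key that is already exported in your shell; the placeholder is only a fallback.
os.environ.setdefault('ANTHROPIC_API_KEY', 'your-api-key-here')  # Replace with your actual key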
config = Config.from_env()

# Option 2: Create custom config
# config = Config(
#     anthropic_api_key="your-api-key",
#     batch_size=64,
#     device="cpu"
# )

print("Configuration loaded:")
print(f"- Model: {config.sentence_model}")
print(f"- Device: {config.device}")
print(f"- Batch size: {config.batch_size}")

## 2. Keyword Extraction Demo

Let's demonstrate the multi-modal keyword extraction system.
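
To give a feel for what a "combined" method might do, the sketch below merges keywords from two sources and removes case-insensitive duplicates. This is an assumption about the general idea, not the project's implementation: `combine_keywords` is a hypothetical helper and both input lists are made up.

```python
def combine_keywords(ner_keywords, llm_keywords):
    """Merge two keyword lists case-insensitively, keeping first-seen order and casing."""
    seen = set()
    combined = []
    for keyword in list(ner_keywords) + list(llm_keywords):
        cleaned = keyword.strip()
        key = cleaned.lower()
        if key and key not in seen:
            seen.add(key)
            combined.append(cleaned)
    return combined


ner = ["Machine learning", "healthcare"]            # made-up NER output
llm = ["machine learning", "predictive analytics"]  # made-up LLM output
print(combine_keywords(ner, llm))
# ['Machine learning', 'healthcare', 'predictive analytics']
```

Putting the NER list first means its casing wins when both sources return the same term.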

In [None]:
# Initialize keyword extractor
keyword_extractor = KeywordExtractor(config=config)

# Test text
test_text = "Machine learning and artificial intelligence are transforming healthcare by enabling predictive analytics and personalized treatment plans."

print("Text:", test_text)
print("\n" + "="*50)

# Extract keywords using different methods
methods = ["ner", "llm", "combined"]

for method in methods:
    try:
        keywords = keyword_extractor.extract_keywords(test_text, method=method)
        print(f"\n{method.upper()} Keywords: {keywords}")
    except Exception as e:
        print(f"\n{method.upper()} Keywords: Error - {e}")

# Get keywords with scores
try:
    keyword_scores = keyword_extractor.extract_keywords_with_scores(test_text)
    print("\nKeywords with Scores:")
    for keyword, data in keyword_scores.items():
        print(f"  {keyword}: {data['confidence']:.2f} ({data.get('entity_type', 'UNKNOWN')})")
except Exception as e:
    print(f"\nKeywords with scores: Error - {e}")

## 3. Sentence Ranking Demo

Demonstrate advanced sentence similarity and ranking.
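
Ranking by relevance usually comes down to comparing embedding vectors. The sketch below shows that step with cosine similarity on toy 3-dimensional vectors, using only the standard library. It assumes the ranker works this way; the helper names and vectors are invented for illustration, and a real sentence-transformers model would produce much larger embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, candidate_vecs, sentences):
    """Return (sentence, similarity) pairs sorted from most to least similar."""
    scored = [
        (sentence, cosine_similarity(query_vec, vec))
        for sentence, vec in zip(sentences, candidate_vecs)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy "embeddings" chosen by hand so the ordering is easy to check.
query_vec = [1.0, 0.0, 1.0]
sentences = ["A", "B", "C"]
candidate_vecs = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]

ranked = rank_by_similarity(query_vec, candidate_vecs, sentences)
for sentence, sim in ranked:
    print(f"({sim:.3f}) {sentence}")
```

Here "B" points in the same direction as the query, "C" shares one component, and "A" is orthogonal, so the order is B, C, A.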

In [None]:
# Initialize sentence ranker
try:
    sentence_ranker = SentenceRanker(config=config)
    
    # Test query and candidate sentences
    query = "What are the benefits of renewable energy?"
    
    candidates = [
        "Solar panels convert sunlight directly into electricity using photovoltaic cells.",
        "Wind turbines generate clean electricity by harnessing wind power.",
        "Coal burning releases harmful pollutants into the atmosphere.",
        "Renewable energy sources reduce greenhouse gas emissions significantly.",
        "Nuclear power plants require uranium fuel for energy generation.",
        "Hydroelectric dams use flowing water to generate sustainable electricity.",
        "Fossil fuels are finite resources that will eventually be depleted."
    ]
    
    print("Query:", query)
    print("\nCandidate Sentences:")
    for i, sentence in enumerate(candidates, 1):
        print(f"{i}. {sentence}")
    
    # Rank sentences by relevance
    ranked_sentences, similarities = sentence_ranker.rank_sentences_by_relevance(
        query, candidates, return_similarities=True
    )
    
    print("\n" + "="*50)
    print("RANKED SENTENCES (by relevance):")
    print("="*50)
    
    for i, (sentence, sim) in enumerate(zip(ranked_sentences, similarities), 1):
        print(f"{i}. ({sim:.3f}) {sentence[:80]}{'...' if len(sentence) > 80 else ''}")
    
    # Get top-3 most similar
    top_similar = sentence_ranker.get_top_k_similar(query, candidates, k=3)
    
    print("\nTOP-3 MOST SIMILAR:")
    for i, (sentence, sim) in enumerate(top_similar, 1):
        print(f"{i}. ({sim:.3f}) {sentence}")
        
except Exception as e:
    print(f"Sentence ranking error: {e}")
    print("Note: This requires the sentence-transformers model to be downloaded.")

## 4. RAG System Demo

Demonstrate the complete RAG-based fact-checking system.
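
Before running the real system, it may help to see the retrieve-then-prompt shape of a RAG pipeline in miniature. The sketch below is an assumption about the general pattern, not a description of `RAGSystem`: retrieval is plain word overlap with no stop-word removal, the corpus is made up, and the model call is left out entirely. The `SUPPORTS` and `REFUTES` labels are the ones used in the sample data later in this notebook.

```python
def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    return {word.strip(".,!?").lower() for word in text.split()} - {""}


def retrieve(claim, corpus, k=2):
    """Return up to k passages sharing at least one word with the claim, most overlap first."""
    claim_words = tokenize(claim)
    scored = [(len(claim_words & tokenize(passage)), passage) for passage in corpus]
    scored = [(score, passage) for score, passage in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort: ties keep corpus order
    return [passage for _, passage in scored[:k]]


def build_prompt(claim, evidence):
    """Assemble the text that would be sent to the language model."""
    lines = [f"Claim: {claim}", "Evidence:"]
    lines += [f"- {passage}" for passage in evidence]
    lines.append("Answer with SUPPORTS or REFUTES, then explain briefly.")
    return "\n".join(lines)


# Made-up corpus for illustration only.
corpus = [
    "Mercury is the closest planet to the Sun.",
    "Earth orbits the Sun between Venus and Mars.",
    "Bananas are rich in potassium.",
]
claim = "The Earth is the third planet from the Sun."
evidence = retrieve(claim, corpus)
print(build_prompt(claim, evidence))
```

Word overlap is a crude retriever (common words such as "the" count as matches), which is one reason real systems use embeddings or a search index for this step.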

In [None]:
# Initialize RAG system
try:
    rag_system = RAGSystem(config=config)
    
    # Test claims
    test_claims = [
        "The Earth is the third planet from the Sun.",
        "Water boils at 100 degrees Celsius at sea level.",
        "The Great Wall of China is visible from space with the naked eye."
    ]
    
    print("RAG SYSTEM FACT-CHECKING DEMO")
    print("="*50)
    
    for i, claim in enumerate(test_claims, 1):
        print(f"\nClaim {i}: {claim}")
        print("-" * 40)
        
        try:
            # Verify the claim
            result = rag_system.verify_claim(
                claim,
                return_evidence=True,
                verbose=False
            )
            
            print(f"Verdict: {result['verdict']}")
            print(f"Confidence: {result['confidence']:.2f}")
            print(f"Keywords: {result.get('keywords', [])}")
            
            if 'reasoning' in result:
                reasoning = result['reasoning'][:200] + "..." if len(result['reasoning']) > 200 else result['reasoning']
                print(f"Reasoning: {reasoning}")
            
            if 'evidence' in result and result['evidence']:
                print("Evidence (first 2):")
                for j, evidence in enumerate(result['evidence'][:2], 1):
                    evidence_short = evidence[:100] + "..." if len(evidence) > 100 else evidence
                    print(f"  {j}. {evidence_short}")
                    
        except Exception as e:
            print(f"Error verifying claim: {e}")
            print("Note: This requires a valid Anthropic API key and internet connection.")

except Exception as e:
    print(f"RAG system initialization error: {e}")
    print("Note: Make sure all dependencies are installed and API key is set.")

## 5. System Information

Get information about the current system configuration.

In [None]:
try:
    # Get system information
    system_info = rag_system.get_system_info()
    
    print("SYSTEM INFORMATION")
    print("="*30)
    
    print("\nModel Information:")
    for key, value in system_info['model_info'].items():
        print(f"  {key}: {value}")
    
    print("\nConfiguration:")
    for key, value in system_info['config'].items():
        print(f"  {key}: {value}")
    
    print("\nComponents:")
    for key, value in system_info['components'].items():
        print(f"  {key}: {value}")
        
except Exception as e:
    print(f"Could not get system info: {e}")

## 6. Performance Testing (Optional)

Test the system with sample data.
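
The metrics printed in the next cell (`total`, `correct`, `accuracy`, `verdict_distribution`) can be computed from gold labels and predicted verdicts along the following lines. This is a sketch of the arithmetic, assuming accuracy means exact label match; `summarize_predictions` is a hypothetical helper and the predictions are made up, so it may differ from what `evaluate_performance` does internally.

```python
from collections import Counter


def summarize_predictions(test_data, predictions):
    """Compare predicted verdicts against gold labels and count each predicted verdict."""
    if len(test_data) != len(predictions):
        raise ValueError("test_data and predictions must have the same length")
    total = len(test_data)
    correct = sum(
        1 for item, predicted in zip(test_data, predictions) if item["label"] == predicted
    )
    return {
        "total": total,
        "correct": correct,
        "accuracy": correct / total if total else 0.0,
        "verdict_distribution": dict(Counter(predictions)),
    }


gold = [
    {"claim": "Water freezes at 0 degrees Celsius.", "label": "SUPPORTS"},
    {"claim": "The sun rises in the west.", "label": "REFUTES"},
    {"claim": "Quantum computers use quantum bits.", "label": "SUPPORTS"},
]
predicted = ["SUPPORTS", "SUPPORTS", "SUPPORTS"]  # made-up model output

summary = summarize_predictions(gold, predicted)
print(f"Accuracy: {summary['accuracy']:.3f}")
print(summary["verdict_distribution"])
```

With only three examples, a single wrong verdict moves accuracy by a third, so treat numbers from the sample data as a smoke test rather than a benchmark.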

In [None]:
# Sample test data for evaluation
sample_test_data = [
    {"claim": "Water freezes at 0 degrees Celsius.", "label": "SUPPORTS"},
    {"claim": "The sun rises in the west.", "label": "REFUTES"},
    {"claim": "Quantum computers use quantum bits.", "label": "SUPPORTS"}
]

try:
    if 'rag_system' in locals():
        print("PERFORMANCE EVALUATION")
        print("="*30)
        
        # Run evaluation on sample data
        metrics = rag_system.evaluate_performance(sample_test_data)
        
        print(f"Test Data Size: {metrics['total']}")
        print(f"Correct Predictions: {metrics['correct']}")
        print(f"Accuracy: {metrics['accuracy']:.3f}")
        
        print("\nVerdict Distribution:")
        for verdict, count in metrics['verdict_distribution'].items():
            print(f"  {verdict}: {count}")
    else:
        print("RAG system not initialized. Skipping performance evaluation.")
        
except Exception as e:
    print(f"Performance evaluation error: {e}")

## Conclusion

This demo showcased the key capabilities of the Advanced NLP Project:

1. **Multi-modal keyword extraction** combining NER and LLM approaches
2. **Advanced sentence ranking** using fine-tuned transformers
3. **Complete RAG pipeline** for fact-checking and claim verification
4. **Professional architecture** with proper error handling and logging

The system is designed for production use with comprehensive configuration management and extensible components.

### Next Steps

- Set up your Anthropic API key for full functionality
- Explore the evaluation scripts in `scripts/`
- Run the test suite with `pytest`
- Check out the API documentation in `docs/API.md`