# MoneyTaur Pipeline Coverage Analysis

This notebook runs smoke tests against each stage of the MoneyTaur data pipeline and reports which stages are available.

## Overview

- **Ingestion Coverage**: Analysis of data source coverage and completeness
- **ETL Coverage**: Validation of data transformation and normalization processes
- **Enrichment Coverage**: Testing of the OpenAI embedding pipeline
- **API Coverage**: Testing of FastAPI endpoints and functionality

## Setup

Set the OPENAI_API_KEY environment variable before running the enrichment test; the test is skipped when the variable is unset.
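
A minimal sketch of checking the key up front, so that a missing variable is reported once before any cell runs. The helper name `missing_env_vars` is hypothetical and not part of the MoneyTaur codebase.

```python
import os


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the environment.

    `missing_env_vars` is a hypothetical helper for illustration only.
    """
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(['OPENAI_API_KEY'])
if missing:
    print(f'Missing environment variables: {", ".join(missing)}')
else:
    print('All required environment variables are set')
```

Passing a plain dict as `environ` makes the check easy to exercise without touching the real environment.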

In [None]:
import sys
import os
from pathlib import Path

# Add parent directory to path for module imports
sys.path.append(str(Path('../').resolve()))

import pandas as pd
import sqlite3
import logging

# Set up logging
logging.basicConfig(level=logging.INFO)

## Ingestion Coverage

Test the weekly index ingestion functionality:

In [None]:
# Test ingestion module
try:
    from ingest.weekly_index_ingest import WeeklyIndexIngestor
    
    ingestor = WeeklyIndexIngestor()
    weekly_links = ingestor.process_year(2024)
    
    print(f'Ingestion test successful: Found {len(weekly_links)} weekly links')
    print('Sample weekly links:')
    for link in weekly_links[:3]:
        print(f'  {link}')
        
except Exception as e:
    print(f'Ingestion test failed: {e}')

## ETL Coverage

Test the SQLite normalization and data loading:

In [None]:
# Test ETL module
try:
    from etl.normalize import SQLiteNormalizer
    
    normalizer = SQLiteNormalizer('test_pipeline.db')
    schema_created = normalizer.create_schema()
    
    if schema_created:
        print('ETL test successful: Database schema created')
        
        # Create sample data
        sample_data = pd.DataFrame({
            'symbol': ['AAPL', 'GOOGL', 'MSFT'],
            'date': ['2024-08-01', '2024-08-01', '2024-08-01'],
            'close_price': [220.50, 2800.25, 415.75]
        })
        
        load_success = normalizer.load_financial_data(sample_data)
        if load_success:
            print('Sample data loaded successfully')
        else:
            print('Failed to load sample data')
    else:
        print('ETL test failed: Could not create database schema')
        
except Exception as e:
    print(f'ETL test failed: {e}')
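
One way to confirm that rows actually landed in the database is to query it directly with sqlite3, which the setup cell already imports. The sketch below uses an in-memory database and a hypothetical table name, `financial_data`; the schema that SQLiteNormalizer really creates may differ, so treat the table and column names as assumptions.

```python
import sqlite3


def count_rows(conn, table):
    """Return the number of rows in `table`.

    The table name is interpolated into the SQL, so only pass trusted names.
    """
    cursor = conn.execute(f'SELECT COUNT(*) FROM "{table}"')
    return cursor.fetchone()[0]


# Stand-in for test_pipeline.db; the table and columns are assumptions
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE financial_data (symbol TEXT, date TEXT, close_price REAL)'
)
rows = [
    ('AAPL', '2024-08-01', 220.50),
    ('GOOGL', '2024-08-01', 2800.25),
    ('MSFT', '2024-08-01', 415.75),
]
conn.executemany('INSERT INTO financial_data VALUES (?, ?, ?)', rows)
conn.commit()

loaded = count_rows(conn, 'financial_data')
print(f'Rows loaded: {loaded} (expected {len(rows)})')
```

Against the real pipeline, the same helper could be pointed at `sqlite3.connect('test_pipeline.db')` once the actual table name is known.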

## Enrichment Coverage

Test the OpenAI embedding pipeline (requires OPENAI_API_KEY):

In [None]:
# Test enrichment module
api_key = os.getenv('OPENAI_API_KEY')

if api_key:
    try:
        from enrich.embed import DataEmbedder
        
        embedder = DataEmbedder()
        
        # Test with sample data
        sample_df = pd.DataFrame({
            'symbol': ['AAPL', 'GOOGL'],
            'description': ['Apple Inc technology company', 'Google parent company Alphabet'],
            'price': [220.50, 2800.25]
        })
        
        enriched_df = embedder.create_embeddings_for_dataframe(sample_df, ['description'])
        
        if 'embedding' in enriched_df.columns:
            print('Enrichment test successful: Embeddings created')
            print(f'Embedding dimension: {len(enriched_df.iloc[0]["embedding"])}')
        else:
            print('Enrichment test failed: No embeddings created')
            
    except Exception as e:
        print(f'Enrichment test failed: {e}')
else:
    print('OPENAI_API_KEY not set - skipping enrichment test')
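
The cell above inspects only the first row's embedding. A slightly stricter check, sketched here with toy vectors, confirms that every row received a vector of the same non-zero length. Both helper names are hypothetical, and the sketch assumes each embedding is a plain list of floats.

```python
def embedding_dimensions(embeddings):
    """Return the set of distinct vector lengths found in `embeddings`."""
    return {len(vector) for vector in embeddings}


def embeddings_are_consistent(embeddings):
    """True when at least one embedding exists and all share one non-zero length."""
    dims = embedding_dimensions(embeddings)
    return len(dims) == 1 and 0 not in dims


# Toy vectors standing in for enriched_df['embedding'].tolist()
sample_embeddings = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
print(f'Consistent: {embeddings_are_consistent(sample_embeddings)}')
```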

## API Coverage

Verify that the FastAPI app loads and exposes a search endpoint:

In [None]:
# Test API module
try:
    # Import the app to verify it can be loaded
    from api.app import app
    
    if app is not None:
        print('API test successful: FastAPI app loaded')
        print(f'API title: {app.title}')
        print(f'API version: {app.version}')
        
        # Check if routes are defined
        route_paths = [route.path for route in app.routes if hasattr(route, 'path')]
        search_endpoint = any('/search' in path for path in route_paths)
        
        if search_endpoint:
            print('Search endpoint found')
        else:
            print('Search endpoint not found')
            
        print(f'Total routes: {len(route_paths)}')
    else:
        print('API test failed: FastAPI app is None')
        
except Exception as e:
    print(f'API test failed: {e}')
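
The substring check above also matches unrelated paths such as `/search_history`. A stricter alternative, sketched below, accepts only the endpoint itself or its sub-paths. The helper `has_endpoint` is hypothetical, and the sketch assumes the search route is mounted at the root rather than under a prefix such as `/api`.

```python
def has_endpoint(route_paths, endpoint):
    """True if `endpoint` itself or a sub-path of it appears in `route_paths`."""
    prefix = endpoint.rstrip('/')
    return any(
        path == prefix or path.startswith(prefix + '/')
        for path in route_paths
    )


# Example paths; the real list would come from app.routes
example_paths = ['/docs', '/search_history', '/search/{query}']
print(f"Search endpoint found: {has_endpoint(example_paths, '/search')}")
```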

## Coverage Summary

Check that each pipeline module can be imported and print a summary report:

In [None]:
def run_coverage_analysis():
    """Check that each MoneyTaur pipeline module can be imported (import-level smoke test only)."""
    
    results = {
        'ingestion': False,
        'etl': False,
        'enrichment': False,
        'api': False
    }
    
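    # Check that each module can be imported, reporting failures instead of hiding them
    try:
        from ingest.weekly_index_ingest import WeeklyIndexIngestor
        results['ingestion'] = True
    except Exception as e:
        print(f'Ingestion import failed: {e}')

    try:
        from etl.normalize import SQLiteNormalizer
        results['etl'] = True
    except Exception as e:
        print(f'ETL import failed: {e}')

    try:
        from enrich.embed import DataEmbedder
        results['enrichment'] = True
    except Exception as e:
        print(f'Enrichment import failed: {e}')

    try:
        from api.app import app
        results['api'] = True
    except Exception as e:
        print(f'API import failed: {e}')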
    
    # Print summary
    print('\n=== MONEYTAUR PIPELINE COVERAGE SUMMARY ===')
    total_modules = len(results)
    passed_modules = sum(results.values())
    coverage_pct = (passed_modules / total_modules) * 100
    
    for module, status in results.items():
        status_str = 'PASS' if status else 'FAIL'
        print(f'{module.upper():>12}: {status_str}')
    
    print(f'\nCoverage: {passed_modules}/{total_modules} ({coverage_pct:.1f}%)')
    
    if passed_modules == total_modules:
        print('✅ All pipeline modules are available!')
    else:
        print('⚠️  Some pipeline modules have issues')
        
    return results

# Run the coverage analysis
coverage_results = run_coverage_analysis()
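
The summary reduces each module to a boolean. A minimal sketch of a variant that keeps the failure reason per module, using the standard library's `importlib.import_module`, is shown below. The function name `check_imports` and the nonexistent module name are illustrative only. Unlike the `from ... import ...` statements above, this checks only that the module loads, not that a specific class exists in it.

```python
import importlib


def check_imports(module_names):
    """Map each module name to None on success or an error string on failure."""
    report = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            report[name] = None
        except Exception as exc:
            report[name] = f'{type(exc).__name__}: {exc}'
    return report


# 'json' is in the standard library; the second name is deliberately nonexistent
report = check_imports(['json', 'moneytaur_missing_module_example'])
for name, error in report.items():
    status = 'PASS' if error is None else f'FAIL ({error})'
    print(f'{name}: {status}')
```

For the pipeline itself, the list would hold the dotted module paths used above, such as `etl.normalize` and `api.app`.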