# CCYOE Analytics - Data Analysis & Yield Patterns

This notebook focuses on comprehensive data analysis including:
- Brazilian market data loading and validation
- Yield pattern analysis and seasonality
- Correlation and cross-asset analysis
- Data quality assessment

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
import warnings
# Silence library warnings to keep cell output readable (this also hides deprecation notices)
warnings.filterwarnings('ignore')

import sys
sys.path.append('..')

from cambi_analytics import (
    DataLoader, BrazilianDataLoader, YieldProcessor, DataValidator,
    YieldAnalyzer, PerformanceMetrics, RiskMetrics,
    get_config
)

# Set up plotting
plt.style.use('default')
plt.rcParams['figure.figsize'] = (14, 8)
sns.set_palette("husl")

config = get_config()
print("📊 Data Analysis Environment Ready")

## 1. Data Loading & Processing

In [None]:
# Load comprehensive sample data
data_loader = DataLoader()

# Load extended period for better analysis
raw_data = data_loader.load_sample_data(
    data_type='brazilian_market',
    start_date='2022-01-01',
    end_date='2024-01-01'
)

print("📊 Dataset Overview:")
print(f"   Rows: {len(raw_data):,}")
print(f"   Period: {raw_data['date'].min().date()} to {raw_data['date'].max().date()}")
print(f"   Assets: {[col for col in raw_data.columns if col != 'date']}")

# Basic statistics
print("\n📈 Basic Statistics (in basis points):")
stats = raw_data[['cmBTC', 'cmUSD', 'cmBRL', 'SELIC', 'CDI']].describe()
stats.round(0)
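
Before relying on the summary statistics, it can help to look at missing values and duplicate dates directly. The following is a minimal pandas sketch of such a check, not the `DataValidator` implementation; the helper `quality_summary` and the `demo` frame (with made-up values) are hypothetical and exist only for illustration.

```python
import numpy as np
import pandas as pd


def quality_summary(df, date_col="date"):
    """Return per-column missing-value stats and the number of duplicated dates.

    Hypothetical helper for illustration; not part of cambi_analytics.
    """
    value_cols = [c for c in df.columns if c != date_col]
    summary = pd.DataFrame({
        "missing": df[value_cols].isna().sum(),
        "missing_pct": df[value_cols].isna().mean() * 100,
    })
    duplicate_dates = int(df[date_col].duplicated().sum())
    return summary, duplicate_dates


# Synthetic stand-in for raw_data (values are made up, in basis points)
demo = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-03", "2022-01-04", "2022-01-04", "2022-01-05"]),
    "SELIC": [925.0, np.nan, 925.0, 1075.0],
    "CDI": [915.0, 915.0, 915.0, 1065.0],
})

summary, duplicate_dates = quality_summary(demo)
print(summary)
print(f"Duplicate dates: {duplicate_dates}")
```

Assuming `raw_data` has the same layout (a `date` column plus one column per asset), the same helper could be applied to it as `quality_summary(raw_data)`.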

In [None]:
# Process data with comprehensive validation
processor = YieldProcessor()
validator = DataValidator()

# Validate raw data first
validation_results = validator.validate_yield_data(
    raw_data,
    yield_columns=['cmBTC', 'cmUSD', 'cmBRL', 'SELIC', 'CDI']
)

print("🔍 Data Validation Results:")
print(f"   Valid: {validation_results['is_valid']}")
print(f"   Errors: {len(validation_results['errors'])}")
print(f"   Warnings: {len(validation_results['warnings'])}")

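# Errors are counted above; show the first few so they are not silently ignored
if validation_results['errors']:
    print("\n❌ Errors:")
    for error in validation_results['errors'][:3]:  # Show first 3
        print(f"   {error}")

if validation_results['warnings']:
    print("\n⚠️ Warnings:")
    for warning in validation_results['warnings'][:3]:  # Show first 3
        print(f"   {warning}")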

# Process the data
processed_data = processor.process_yield_data(
    raw_data,
    yield_columns=['cmBTC', 'cmUSD', 'cmBRL', 'SELIC', 'CDI'],
    handle_missing='interpolate',
    handle_outliers='cap',
    outlier_threshold=3.0
)

print("\n✅ Data processing completed")
print(f"   Final dataset: {len(processed_data):,} rows")
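
To make the `handle_missing='interpolate'` and `handle_outliers='cap'` options above more concrete, here is a rough pandas sketch of one plausible reading. It assumes that `outlier_threshold=3.0` means three standard deviations from the mean, which the `YieldProcessor` call itself does not confirm. The function `interpolate_and_cap` and the sample series are hypothetical.

```python
import numpy as np
import pandas as pd


def interpolate_and_cap(series, z_threshold=3.0):
    """Fill gaps by linear interpolation, then cap values beyond mean +/- z_threshold * std.

    Assumption: the threshold is a z-score. This is an illustration,
    not the YieldProcessor implementation.
    """
    filled = series.interpolate(method="linear", limit_direction="both")
    mean, std = filled.mean(), filled.std()
    lower = mean - z_threshold * std
    upper = mean + z_threshold * std
    return filled.clip(lower=lower, upper=upper)


# Synthetic yields in basis points: one gap and one obvious spike
values = [1000.0] * 20
values[5] = np.nan
values[12] = 5000.0
raw = pd.Series(values)

cleaned = interpolate_and_cap(raw)
print(cleaned.describe())
```

Note that the mean and standard deviation are computed with the spike still included, so the cap is looser than it would be with a robust estimator such as the median absolute deviation.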

## 2. Yield Analysis Summary

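This notebook loads the Brazilian market sample data for 2022-01-01 to 2024-01-01, validates the five yield series (cmBTC, cmUSD, cmBRL, SELIC, CDI), and processes them with the CCYOE Analytics package by interpolating missing values and capping outliers.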

### Key Insights:
- **Data Quality**: Validating the raw series before processing, then interpolating gaps and capping outliers, yields a cleaner dataset for the later steps
- **Yield Patterns**: The processed series support identification of trends, seasonality, and correlations
- **Risk Assessment**: Volatility and stability can be assessed for each yield series
- **Cross-Asset Effects**: Interactions between the assets inform the optimization step
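
The yield-pattern, risk, and cross-asset points above can be sketched with plain pandas. The example below uses synthetic series as a stand-in for `processed_data`; the column names mirror two of the notebook's assets, but every value is randomly generated, and the 30-day window is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic daily yields in basis points (random walk; not real market data)
rng = np.random.default_rng(0)
dates = pd.date_range("2022-01-01", "2023-12-31", freq="D")
base = rng.normal(0, 5, len(dates)).cumsum()
demo = pd.DataFrame(
    {
        "SELIC": 1300 + base,
        "CDI": 1290 + base + rng.normal(0, 1, len(dates)),
    },
    index=dates,
)

# Cross-asset view: correlate daily changes rather than levels,
# since trending levels tend to inflate correlation
changes = demo.diff().dropna()
correlation = changes.corr()

# Seasonality view: average yield level by calendar month
monthly_mean = demo.groupby(demo.index.month).mean()

# Risk view: 30-day rolling standard deviation of daily changes
rolling_vol = changes.rolling(window=30).std()

print(correlation.round(2))
print(monthly_mean.round(0))
```

With two years of data each calendar month is averaged over only two occurrences, so any seasonal pattern read from `monthly_mean` should be treated as indicative at best.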

### Next Steps:
Continue to the **03_backtesting.ipynb** notebook for detailed CCYOE strategy evaluation.