# Data Exploration - ARARA Universe

This notebook demonstrates how to load and explore data from the ARARA universe.

**Objectives:**
1. Load configuration and data via DataLoader
2. Compute descriptive statistics
3. Visualize returns and correlations
4. Detect outliers and extreme events
5. Analyze temporal characteristics

In [None]:
import sys
from pathlib import Path

# Add project root to path
project_root = Path.cwd().parent
sys.path.insert(0, str(project_root / 'src'))

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta

# Set plot style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
%matplotlib inline

print("✅ Imports successful")

## 1. Load Configuration and Data

In [None]:
from itau_quant.config import load_config, UniverseConfig

# Load universe configuration
universe_config = load_config(
    str(project_root / 'configs' / 'universe_arara.yaml'),
    UniverseConfig
)

print(f"Universe: {universe_config.name}")
print(f"Tickers: {len(universe_config.tickers)} assets")
print(f"\nAssets: {', '.join(universe_config.tickers[:10])}...")

In [None]:
import yfinance as yf

# Download data for analysis (last 3 years)
end_date = datetime.today()
start_date = end_date - timedelta(days=365 * 3)

print(f"Downloading data from {start_date.date()} to {end_date.date()}...")

data = yf.download(
    tickers=universe_config.tickers,
    start=start_date,
    end=end_date,
    progress=True,
    auto_adjust=True
)

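# Extract closing prices (already adjusted because auto_adjust=True)
if isinstance(data.columns, pd.MultiIndex):
    prices = data['Close']
else:
    # A single-ticker download may return flat OHLCV columns; keep only Close
    prices = data[['Close']]

# Drop all-NaN rows, then fill gaps (bfill copies the first valid price backwards)
prices = prices.dropna(how='all').ffill().bfill()

print(f"\n✅ Loaded {len(prices)} days of data for {len(prices.columns)} assets")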
print(f"Date range: {prices.index[0].date()} to {prices.index[-1].date()}")
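
Because the cell above forward- and back-fills gaps, an asset with a short history ends up with a flat price (and zero returns) before its first quote. A minimal sketch of a coverage check that could be run on the raw prices before filling is shown below. The helper name `missing_report` and the synthetic tickers `AAA` and `BBB` are hypothetical and not part of the project.

```python
import numpy as np
import pandas as pd


def missing_report(prices: pd.DataFrame) -> pd.DataFrame:
    """Summarize, per asset, the first valid date and the fraction of missing rows."""
    report = pd.DataFrame({
        "first_valid": prices.apply(lambda col: col.first_valid_index()),
        "missing_frac": prices.isna().mean(),
    })
    return report.sort_values("missing_frac", ascending=False)


# Synthetic prices (assumption: stand-in for the downloaded data)
rng = np.random.default_rng(0)
dates = pd.bdate_range("2023-01-02", periods=200)
demo_prices = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=(200, 2)), axis=0)),
    index=dates,
    columns=["AAA", "BBB"],
)
demo_prices.iloc[:50, 1] = np.nan  # BBB has no quotes for its first 50 days

report = missing_report(demo_prices)
print(report)
```

Assets with a large `missing_frac` may be better dropped or analyzed from their first valid date rather than back-filled.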

## 2. Compute Returns and Basic Statistics

In [None]:
# Calculate returns
returns = prices.pct_change().dropna()

print(f"Returns shape: {returns.shape}")
print("\nSample statistics (daily):")
print(f"  Mean:   {returns.mean().mean():.4%}")
print(f"  Median: {returns.median().median():.4%}")
print(f"  Std:    {returns.std().mean():.4%}")
print(f"  Min:    {returns.min().min():.4%}")
print(f"  Max:    {returns.max().max():.4%}")
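
The minimum and maximum above only show the single most extreme day. One simple way to approach the outlier-detection objective is to flag every observation whose z-score exceeds a threshold. The sketch below assumes a per-asset z-score with a threshold of 3. The function name `flag_outliers`, the threshold, and the synthetic returns are illustrative choices, not the project's method.

```python
import numpy as np
import pandas as pd


def flag_outliers(returns: pd.DataFrame, threshold: float = 3.0) -> pd.DataFrame:
    """Boolean mask of returns whose absolute per-asset z-score exceeds threshold."""
    zscores = (returns - returns.mean()) / returns.std()
    return zscores.abs() > threshold


# Synthetic daily returns (assumption: stand-in for the `returns` DataFrame)
rng = np.random.default_rng(42)
dates = pd.bdate_range("2023-01-02", periods=500)
demo_returns = pd.DataFrame(
    rng.normal(0.0, 0.01, size=(500, 3)),
    index=dates,
    columns=["AAA", "BBB", "CCC"],
)
demo_returns.iloc[100, 0] = -0.15  # inject one crash day into AAA

outlier_mask = flag_outliers(demo_returns)
print(outlier_mask.sum())                      # flagged days per asset
print(demo_returns[outlier_mask.any(axis=1)])  # rows with at least one flagged return
```

A z-score rule is sensitive to the very outliers it tries to detect, since they inflate the standard deviation; a median-based scale could be swapped in if that matters.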

In [None]:
# Annualized statistics (assuming 252 trading days per year)
mu_annual = returns.mean() * 252
vol_annual = returns.std() * np.sqrt(252)
# Sharpe ratio with the risk-free rate taken as zero
sharpe = mu_annual / vol_annual

stats_df = pd.DataFrame({
    'Return (ann)': mu_annual,
    'Vol (ann)': vol_annual,
    'Sharpe': sharpe
}).sort_values('Sharpe', ascending=False)

print("\nTop 10 assets by Sharpe Ratio:")
print(stats_df.head(10).to_string())
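
For the correlation objective, a possible starting point is to compute the pairwise correlation matrix of daily returns and list the most correlated pairs. The sketch below uses synthetic data with two assets driven by a shared factor. The helper `top_correlated_pairs` and the tickers are hypothetical.

```python
import numpy as np
import pandas as pd


def top_correlated_pairs(returns: pd.DataFrame, n: int = 5) -> pd.Series:
    """Return the n asset pairs with the highest Pearson correlation."""
    corr = returns.corr()
    # Keep only the upper triangle so each pair appears once and the diagonal is dropped
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack().dropna()
    return pairs.sort_values(ascending=False).head(n)


# Synthetic returns (assumption): AAA and BBB share a common factor, CCC is independent
rng = np.random.default_rng(0)
factor = rng.normal(0, 0.01, 500)
demo_returns = pd.DataFrame({
    "AAA": factor + rng.normal(0, 0.002, 500),
    "BBB": factor + rng.normal(0, 0.002, 500),
    "CCC": rng.normal(0, 0.01, 500),
})

pairs = top_correlated_pairs(demo_returns, n=3)
print(pairs)
```

On the real data, the same matrix (`returns.corr()`) could be passed to `sns.heatmap` for a visual overview.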

## Conclusions

This notebook demonstrated:

1. **Data loading** via YAML configuration and yfinance
2. **Descriptive statistics** of returns and volatility
3. **Visualizations** can be added as needed

**Next steps:**
- See notebook `02-model-prototyping.ipynb` to test estimators
- See notebook `03-results-analysis.ipynb` for backtest analysis