# Main Analysis: MacroMind (LNES) Experiment

This notebook demonstrates a complete end-to-end analysis of the Latent News Event Simulation (LNES) system using the small dataset.

## Table of Contents

1. [Setup & Configuration](#setup)
2. [Data Loading & Exploration](#data)
3. [News Embedding](#embedding)
4. [Clustering Analysis](#clustering)
5. [Agent Initialization](#agents)
6. [Market Simulation](#simulation)
7. [Performance Metrics](#metrics)
8. [Visualization](#visualization)
9. [Key Takeaways](#takeaways)

## Overview

This notebook walks through all stages of the LNES pipeline:
- Load news and price data
- Embed news text using TF-IDF or transformers
- Cluster news into latent event categories
- Initialize trading agents (rule-based and AI)
- Simulate market dynamics
- Evaluate performance with quantitative metrics
- Visualize results

**Dataset**: Small curated dataset (~100 rows)  
**Configuration**: `config/small_dataset.yaml`

<a id='setup'></a>
## 1. Setup & Configuration
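The cells in this section read a handful of keys from `config/small_dataset.yaml`. As a reading aid, the sketch below lists those keys as the Python dict that `load_config` is assumed to return. The key names come from the lookups used throughout this notebook; every value is an illustrative placeholder rather than the project's actual setting, and `example_config` and `REQUIRED_KEYS` are hypothetical names.

```python
# Assumed shape of the loaded config. Key names mirror the lookups in this
# notebook; all values are placeholders chosen only for illustration.
example_config = {
    "experiment": {"name": "small_dataset_demo"},
    "dataset": {"type": "small"},
    "embedder": {"backend": "tfidf"},
    "clustering": {"k": 3},
    "agents": {"enabled": ["Random", "Momentum", "Contrarian", "NewsReactive"]},
    "simulator": {"alpha": 0.01, "noise_std": 0.001},
    "metrics": {"advanced_metrics": True, "risk_free_rate": 0.0},
}

# (section, key) pairs that later cells index into.
REQUIRED_KEYS = [
    ("experiment", "name"),
    ("dataset", "type"),
    ("embedder", "backend"),
    ("clustering", "k"),
    ("agents", "enabled"),
    ("simulator", "alpha"),
    ("simulator", "noise_std"),
    ("metrics", "advanced_metrics"),
    ("metrics", "risk_free_rate"),
]

# Collect every missing key so the problem is reported once, up front.
missing = [
    f"{section}.{key}"
    for section, key in REQUIRED_KEYS
    if key not in example_config.get(section, {})
]
print("Missing keys:", missing or "none")
```

Running a check like this right after loading turns a late `KeyError` in a downstream cell into an early, readable message.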

In [None]:
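# Import notebook utilities (project classes and helpers)
from notebook_utils import *
# Explicit imports for names used directly in this notebook
import numpy as np
import matplotlib.pyplot as plt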

# Set random seed for reproducibility
np.random.seed(42)

print_section("MacroMind Main Analysis", "=")

In [None]:
# Load configuration
config = load_config("small_dataset")

print("Configuration loaded successfully!")
print(f"\nExperiment: {config['experiment']['name']}")
print(f"Dataset: {config['dataset']['type']}")
print(f"Embedder: {config['embedder']['backend']}")
print(f"Clustering K: {config['clustering']['k']}")
print(f"Agents: {', '.join(config['agents']['enabled'])}")

<a id='data'></a>
## 2. Data Loading & Exploration

In [None]:
# Load data
news_df, prices_df = load_smallset()

print(f"Loaded {len(news_df)} news items")
print(f"Loaded {len(prices_df)} price records")
print(f"\nDate range: {prices_df['date'].min()} to {prices_df['date'].max()}")

In [None]:
# Explore news data
print_subsection("News Data Sample")
display(news_df.head())

print(f"\nColumns: {list(news_df.columns)}")
text_lengths = news_df['text'].str.len()  # compute once, reuse below
print(f"Text length stats: min={text_lengths.min()}, "
      f"max={text_lengths.max()}, "
      f"mean={text_lengths.mean():.1f}")

In [None]:
# Explore price data
print_subsection("Price Data Sample")
display(prices_df.head())

# Plot price series
fig, ax = plt.subplots(figsize=FIGSIZE_WIDE)
ax.plot(prices_df['date'], prices_df['close'], label='Close Price', linewidth=2)
ax.set_title('Historical Price Series')
ax.set_xlabel('Date')
ax.set_ylabel('Price')
ax.legend()
ax.grid(True, alpha=0.3)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

<a id='embedding'></a>
## 3. News Embedding
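`NewsEmbedder` hides the embedding details. To give a feel for what a TF-IDF backend typically computes, here is a minimal NumPy sketch. It assumes lowercase whitespace tokenisation and a `log(N / df) + 1` weighting with L2-normalised rows, which may differ from what `NewsEmbedder` actually does; `tfidf_embed` is a hypothetical helper, not part of the project.

```python
import math
from collections import Counter

import numpy as np


def tfidf_embed(texts):
    """Return (matrix, vocab): L2-normalised TF-IDF rows, one per text."""
    docs = [text.lower().split() for text in texts]
    vocab = sorted({token for doc in docs for token in doc})
    column = {token: j for j, token in enumerate(vocab)}
    n_docs = len(docs)

    # Document frequency: in how many texts does each token appear?
    doc_freq = Counter(token for doc in docs for token in set(doc))

    matrix = np.zeros((n_docs, len(vocab)))
    for i, doc in enumerate(docs):
        for token, count in Counter(doc).items():
            # The +1 keeps tokens that occur in every text from vanishing.
            idf = math.log(n_docs / doc_freq[token]) + 1.0
            matrix[i, column[token]] = count * idf

    # Normalise each row to unit length; leave all-zero rows untouched.
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return matrix / norms, vocab


sample_texts = [
    "fed raises rates",
    "oil prices rise",
    "fed holds rates steady",
]
sample_embeddings, sample_vocab = tfidf_embed(sample_texts)
print(sample_embeddings.shape)
```

Rare tokens receive larger weights than tokens shared across many texts, which is why headlines about the same topic end up close together in this space.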

In [None]:
# Create embedder
embedder = NewsEmbedder(backend=config['embedder']['backend'])

print(f"Embedder backend: {embedder.backend}")
print("Embedding news text...")

# Embed news
embeddings = embedder.embed(news_df['text'].tolist())

print(f"✓ Generated embeddings of shape: {embeddings.shape}")
print(f"  Embedding dimension: {embeddings.shape[1]}")
print(f"  Number of embeddings: {embeddings.shape[0]}")

<a id='clustering'></a>
## 4. Clustering Analysis
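The cells in this section rely on `NewsClustering` and its `silhouette_score` method. The sketch below shows one plausible reading of those two steps: plain k-means (Lloyd's algorithm) followed by the mean silhouette coefficient. That `NewsClustering` uses k-means is an assumption; `kmeans`, `assign`, and `silhouette` are hypothetical helpers written only for illustration.

```python
import numpy as np


def assign(X, centers):
    """Index of the nearest center for every row of X."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=42):
    """Lloyd's algorithm with centers initialised from random data points."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(X, centers)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(X, centers), centers


def silhouette(X, labels):
    """Mean silhouette coefficient; requires at least two clusters."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    cluster_ids = np.unique(labels)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # Mean distance to own cluster (self-distance is 0, so divide by n-1).
        a = dists[i, same].sum() / (same.sum() - 1)
        # Mean distance to the closest other cluster.
        b = min(dists[i, labels == c].mean() for c in cluster_ids if c != labels[i])
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return float(np.mean(scores))


# Two well-separated synthetic blobs stand in for news embeddings.
demo_rng = np.random.default_rng(0)
demo_X = np.vstack([
    demo_rng.normal(0.0, 0.1, size=(5, 2)),
    demo_rng.normal(5.0, 0.1, size=(5, 2)),
])
demo_labels, demo_centers = kmeans(demo_X, k=2)
demo_score = silhouette(demo_X, demo_labels)
print(demo_labels, round(demo_score, 3))
```

The silhouette coefficient lies between -1 and 1; values near 1 mean points sit much closer to their own cluster than to the nearest other one.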

In [None]:
# Perform clustering
k = config['clustering']['k']
clustering = NewsClustering(k=k)
clusters = clustering.fit_predict(embeddings)

print(f"✓ Created {k} clusters")
print(f"  Silhouette score: {clustering.silhouette_score(embeddings):.3f}")

# Cluster distribution
unique, counts = np.unique(clusters, return_counts=True)
print("\nCluster distribution:")
for cluster_id, count in zip(unique, counts):
    print(f"  Cluster {cluster_id}: {count} items ({count/len(clusters)*100:.1f}%)")

In [None]:
# Visualize clusters
fig = plot_cluster_analysis(embeddings, clusters, method='tsne')
plt.show()

In [None]:
# Sample news from each cluster
print_subsection("Sample News from Each Cluster")

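for cluster_id in range(k):
    cluster_indices = np.where(clusters == cluster_id)[0]
    print(f"\n[Cluster {cluster_id}] ({len(cluster_indices)} items)")

    # Guard against empty clusters, which would make cluster_indices[0] fail
    if len(cluster_indices) == 0:
        print("  (empty cluster)")
        continue

    sample_text = news_df.iloc[cluster_indices[0]]['text']  # Take first item
    suffix = "..." if len(sample_text) > 200 else ""
    print(f"  {sample_text[:200]}{suffix}")  # First 200 chars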

<a id='agents'></a>
## 5. Agent Initialization
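The agent classes used in this section (`Random`, `Momentum`, `Contrarian`, `NewsReactive`) come from the project and their interfaces are not shown here. To illustrate the general idea of rule-based agents and name-based construction, the sketch below defines toy stand-ins with an assumed `act(prices)` method that returns -1 (sell), 0 (hold), or +1 (buy). All class names, the `act` signature, and the `REGISTRY` lookup are hypothetical.

```python
import numpy as np


class RandomSketch:
    """Picks sell/hold/buy uniformly at random."""

    def __init__(self, seed=42):
        self.rng = np.random.default_rng(seed)

    def act(self, prices):
        return int(self.rng.choice([-1, 0, 1]))


class MomentumSketch:
    """Trades in the direction of the most recent price change."""

    def act(self, prices):
        if len(prices) < 2:
            return 0  # not enough history to infer a trend
        return int(np.sign(prices[-1] - prices[-2]))


class ContrarianSketch:
    """Trades against the most recent price change."""

    def act(self, prices):
        return -MomentumSketch().act(prices)


# Mapping from config names to constructors replaces a long if/elif chain.
REGISTRY = {
    "Random": RandomSketch,
    "Momentum": MomentumSketch,
    "Contrarian": ContrarianSketch,
}


def build_agents(names):
    """Instantiate agents by name, failing loudly on unknown names."""
    unknown = [name for name in names if name not in REGISTRY]
    if unknown:
        raise ValueError(f"Unknown agent names: {unknown}")
    return [REGISTRY[name]() for name in names]


demo_agents = build_agents(["Momentum", "Contrarian"])
demo_actions = [agent.act([100.0, 101.0]) for agent in demo_agents]
print(demo_actions)
```

A registry keeps the set of supported agents in one place, so adding an agent means adding one dictionary entry rather than another branch.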

In [None]:
# Initialize agents
agents = []
agent_names = config['agents']['enabled']

for agent_name in agent_names:
    if agent_name == 'Random':
        agents.append(Random())
    elif agent_name == 'Momentum':
        agents.append(Momentum())
    elif agent_name == 'Contrarian':
        agents.append(Contrarian())
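    elif agent_name == 'NewsReactive':
        agents.append(NewsReactive(clusters=clusters, k=k))
    else:
        # Make skipped names visible instead of dropping them silently
        print(f"  ! Skipping agent not handled in this notebook: {agent_name}")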

print(f"✓ Initialized {len(agents)} agents:")
for i, agent in enumerate(agents):
    print(f"  {i+1}. {agent.__class__.__name__}")

<a id='simulation'></a>
## 6. Market Simulation
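`MarketSimulator` takes a price-impact coefficient `alpha` and a noise level `noise_std`. Its update rule is not shown in this notebook, so the sketch below is only one common way such parameters are used: each step, the price moves by `alpha` times the average agent order plus a Gaussian shock. The function `simulate_prices`, its arguments, and the multiplicative update are assumptions for illustration.

```python
import numpy as np


def simulate_prices(actions, p0=100.0, alpha=0.01, noise_std=0.0, seed=42):
    """Simulate a price path from agent orders.

    actions: array of shape (T, n_agents) with entries in {-1, 0, +1}.
    Returns an array of T + 1 prices starting at p0.
    """
    rng = np.random.default_rng(seed)
    actions = np.asarray(actions, dtype=float)
    prices = np.empty(len(actions) + 1)
    prices[0] = p0
    for t, orders in enumerate(actions):
        net_demand = orders.mean()           # average buy/sell pressure
        shock = rng.normal(0.0, noise_std)   # exogenous noise
        prices[t + 1] = prices[t] * (1.0 + alpha * net_demand + shock)
    return prices


# Two agents: both buy, then both sell, then both hold. Noise is disabled so
# the effect of alpha is easy to read off.
demo_path = simulate_prices(
    [[1, 1], [-1, -1], [0, 0]], p0=100.0, alpha=0.01, noise_std=0.0
)
print(demo_path)
```

Under this rule a larger `alpha` makes prices react more strongly to agreement among agents, while `noise_std` controls how much movement is unrelated to their orders.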

In [None]:
# Create simulator
simulator = MarketSimulator(
    agents=agents,
    alpha=config['simulator']['alpha'],
    noise_std=config['simulator']['noise_std']
)

print("Simulator configuration:")
print(f"  Alpha (price impact): {simulator.alpha}")
print(f"  Noise std: {simulator.noise_std}")
print(f"  Number of agents: {len(simulator.agents)}")

In [None]:
# Run simulation
print("\nRunning market simulation...")
sim_prices, action_log = simulator.simulate(
    news_df=news_df,
    prices_df=prices_df,
    clusters=clusters
)

print("✓ Simulation complete!")
print(f"  Simulated {len(sim_prices)} time steps")
print(f"  Price range: [{sim_prices.min():.2f}, {sim_prices.max():.2f}]")

<a id='metrics'></a>
## 7. Performance Metrics
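The `metrics` module is used as a black box in this section. For orientation, here is a minimal sketch of how two of the headline numbers are commonly defined: directional accuracy as the share of steps where both series move the same way, and volatility clustering as the lag-`k` autocorrelation of absolute returns. These definitions, and the function bodies below, are assumptions and may not match the project's implementation.

```python
import numpy as np


def directional_accuracy(ref_prices, sim_prices):
    """Share of steps where simulated and reference prices move the same way."""
    ref_dir = np.sign(np.diff(np.asarray(ref_prices, dtype=float)))
    sim_dir = np.sign(np.diff(np.asarray(sim_prices, dtype=float)))
    return float(np.mean(ref_dir == sim_dir))


def volatility_clustering(prices, lag=5):
    """Autocorrelation of absolute returns at the given lag."""
    prices = np.asarray(prices, dtype=float)
    abs_returns = np.abs(np.diff(prices) / prices[:-1])
    if len(abs_returns) <= lag + 1:
        return float("nan")  # too few observations for this lag
    early, late = abs_returns[:-lag], abs_returns[lag:]
    if early.std() == 0 or late.std() == 0:
        return float("nan")  # correlation is undefined for a constant series
    return float(np.corrcoef(early, late)[0, 1])


demo_ref = [100, 101, 100, 102, 103]
demo_sim = [50, 51, 52, 53, 52]
demo_dir_acc = directional_accuracy(demo_ref, demo_sim)
print(f"Directional accuracy: {demo_dir_acc:.2%}")

# A random walk stands in for simulated prices in the second metric.
walk_rng = np.random.default_rng(0)
demo_walk = 100.0 * np.cumprod(1.0 + walk_rng.normal(0.0, 0.01, size=200))
demo_vol_clust = volatility_clustering(demo_walk, lag=5)
print(f"Volatility clustering (lag=5): {demo_vol_clust:.3f}")
```

Positive values of the second metric suggest that large moves tend to follow large moves; for independent returns it should hover near zero.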

In [None]:
# Compute metrics
ref_prices = prices_df['close'].values

print_subsection("Performance Metrics")

# Directional accuracy
dir_acc = metrics.directional_accuracy(ref_prices, sim_prices)
print(f"Directional Accuracy: {dir_acc:.2%}")

# Volatility clustering
vol_clust = metrics.volatility_clustering(sim_prices)
print(f"Volatility Clustering (lag=5): {vol_clust:.3f}")

# Cluster-price correlation
cluster_price_corr = metrics.cluster_price_correlation(clusters, ref_prices)
print(f"Cluster-Price Correlation: {cluster_price_corr:.3f}")

In [None]:
# Advanced metrics (if enabled)
if config['metrics']['advanced_metrics']:
    print_subsection("Advanced Metrics")
    
    # Sharpe ratio
    sharpe = metrics.sharpe_ratio(sim_prices, 
                                  risk_free_rate=config['metrics']['risk_free_rate'])
    print(f"Sharpe Ratio: {sharpe:.3f}")
    
    # Maximum drawdown
    max_dd = metrics.maximum_drawdown(sim_prices)
    print(f"Maximum Drawdown: {max_dd:.2%}")
    
    # Sortino ratio
    sortino = metrics.sortino_ratio(sim_prices, 
                                   risk_free_rate=config['metrics']['risk_free_rate'])
    print(f"Sortino Ratio: {sortino:.3f}")
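
The advanced metrics above are standard risk measures. The sketch below shows textbook per-period versions, assuming simple returns, no annualisation, and a `risk_free_rate` expressed per period; the project's `metrics` functions may use different conventions. All function bodies here are illustrative assumptions.

```python
import numpy as np


def simple_returns(prices):
    """Period-over-period simple returns."""
    prices = np.asarray(prices, dtype=float)
    return np.diff(prices) / prices[:-1]


def sharpe_ratio(prices, risk_free_rate=0.0):
    """Mean excess return divided by its sample standard deviation."""
    excess = simple_returns(prices) - risk_free_rate
    spread = excess.std(ddof=1)
    return float(excess.mean() / spread) if spread > 0 else float("nan")


def sortino_ratio(prices, risk_free_rate=0.0):
    """Mean excess return divided by downside deviation."""
    excess = simple_returns(prices) - risk_free_rate
    downside = np.minimum(excess, 0.0)  # only losses contribute
    downside_dev = np.sqrt(np.mean(downside ** 2))
    return float(excess.mean() / downside_dev) if downside_dev > 0 else float("nan")


def maximum_drawdown(prices):
    """Largest peak-to-trough decline, as a fraction of the peak."""
    prices = np.asarray(prices, dtype=float)
    running_peak = np.maximum.accumulate(prices)
    return float(np.max((running_peak - prices) / running_peak))


demo_prices = [100.0, 120.0, 90.0, 110.0]
print(f"Sharpe:  {sharpe_ratio(demo_prices):.3f}")
print(f"Sortino: {sortino_ratio(demo_prices):.3f}")
print(f"Max DD:  {maximum_drawdown(demo_prices):.2%}")
```

In the demo series the peak is 120 and the following trough is 90, so the maximum drawdown is 25%. The Sortino ratio penalises only negative excess returns, so on the same series it differs from the Sharpe ratio whenever upside and downside volatility are unequal.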

In [None]:
# Agent performance
print_subsection("Agent Performance")

# PnL
pnl = metrics.agent_pnl(action_log, ref_prices)
print("\nAgent PnL (Naive):")
for agent_name, pnl_value in pnl.items():
    print(f"  {agent_name:15s}: {pnl_value:8.2f}")

# Win rates
win_rates = metrics.win_rate(action_log, ref_prices)
print("\nAgent Win Rates:")
for agent_name, wr in win_rates.items():
    print(f"  {agent_name:15s}: {wr:.2%}")

# Directional accuracy per agent
dir_acc_agents = metrics.per_agent_directional_accuracy(action_log, ref_prices)
print("\nAgent Directional Accuracy:")
for agent_name, acc in dir_acc_agents.items():
    print(f"  {agent_name:15s}: {acc:.2%}")

<a id='visualization'></a>
## 8. Visualization

In [None]:
# Plot price comparison
fig, ax = plt.subplots(figsize=FIGSIZE_WIDE)

ax.plot(ref_prices, label='Reference Prices', linewidth=2, alpha=0.7)
ax.plot(sim_prices, label='Simulated Prices', linewidth=2, alpha=0.7, linestyle='--')

ax.set_title('Price Comparison: Reference vs Simulated')
ax.set_xlabel('Time Step')
ax.set_ylabel('Price')
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

In [None]:
# Agent comparison
fig = plot_agent_comparison(action_log, ref_prices)
plt.show()

In [None]:
# Comprehensive dashboard (if visualizations module available)
try:
    fig = plot_comprehensive_dashboard(
        prices=sim_prices,
        action_log=action_log,
        reference_prices=ref_prices
    )
    plt.show()
except Exception as e:
    print(f"Could not create comprehensive dashboard: {e}")

<a id='takeaways'></a>
## 9. Key Takeaways

### Summary of Results

1. **Data**: Successfully loaded and processed small dataset
2. **Embedding**: Generated semantic embeddings for news text
3. **Clustering**: Identified distinct latent event categories
4. **Simulation**: Simulated multi-agent market dynamics
5. **Performance**: Evaluated agents using multiple metrics

### Key Findings

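- **Directional Accuracy**: The share of time steps in which the simulated price moves in the same direction as the reference price. Read it against the roughly 50% that random up/down guesses would achieve.
- **Agent Performance**: PnL, win rate, and per-agent directional accuracy (Section 7) show which strategies did better on this data. With only ~100 rows, treat differences between agents as indicative rather than conclusive.
- **Cluster Quality**: The silhouette score ranges from -1 to 1. Values near 1 indicate tight, well-separated clusters; values near 0 indicate overlapping clusters.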

### Next Steps

1. Try FNSPID dataset for real-world validation (`02_fnspid_analysis.ipynb`)
2. Compare different agent types (`03_agent_comparison.ipynb`)
3. Analyze AI agents in depth (`04_ai_agents_analysis.ipynb`)
4. Perform sensitivity analysis (`05_sensitivity_analysis.ipynb`)

In [None]:
# Save results (optional)
from src.result_cache import save_results

results = {
    'sim_prices': sim_prices,
    'ref_prices': ref_prices,
    'action_log': action_log,
    'clusters': clusters,
    'metrics': {
        'directional_accuracy': dir_acc,
        'volatility_clustering': vol_clust,
        'cluster_price_correlation': cluster_price_corr,
    }
}

# Uncomment to save:
# cache_key = save_results(results, config)
# print(f"Results saved with cache_key: {cache_key}")

print("\n" + "="*70)
print("Analysis Complete!")
print("="*70)