# 🔒 Safe Pokemon Data Exploration

This notebook demonstrates **safe** ways to explore and test your Pokemon analytics pipeline without breaking your production database.

## ⚠️ Important Safety Rules:
1. **READ-ONLY Operations**: Query data only; never run INSERT/UPDATE/DELETE statements from the notebook
2. **Small Test Batches**: Use `--limit` for testing collection scripts
3. **Isolated Testing**: Use separate test schemas when needed
4. **Backup First**: Always back up before major operations
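
As a rough illustration of rule 4, the sketch below builds a `pg_dump` command for a single schema. It assumes `pg_dump` is installed wherever the command runs and that the config dict has the same keys used later in this notebook (`host`, `port`, `user`, `password`, `database`). The helper names `build_backup_command` and `run_backup`, and the `/tmp` output directory, are hypothetical.

```python
import os
import subprocess
from datetime import datetime, timezone


def build_backup_command(db_config, schema, out_dir="/tmp"):
    """Return (command, output_path) for a custom-format pg_dump of one schema."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    out_path = f"{out_dir}/{db_config['database']}_{schema}_{stamp}.dump"
    command = [
        "pg_dump",
        "--host", str(db_config["host"]),
        "--port", str(db_config["port"]),
        "--username", str(db_config["user"]),
        "--dbname", str(db_config["database"]),
        "--schema", schema,      # dump only this schema
        "--format", "custom",    # compressed archive, restorable with pg_restore
        "--file", out_path,
    ]
    return command, out_path


def run_backup(db_config, schema, out_dir="/tmp"):
    """Run the dump; the password is passed through the PGPASSWORD environment variable."""
    command, out_path = build_backup_command(db_config, schema, out_dir)
    env = {**os.environ, "PGPASSWORD": str(db_config["password"])}
    subprocess.run(command, check=True, env=env)
    return out_path
```

Calling `run_backup(get_database_config(), "raw")` before a risky step would leave a timestamped dump of the `raw` schema that `pg_restore` can read.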

In [None]:
# Setup - Add src to path for importing our modules
import sys
import pandas as pd
import psycopg2
from sqlalchemy import create_engine
import logging

# Add our source code to Python path
sys.path.append('/home/jovyan/src')

# Import our utilities
from utils.docker_utils import get_database_config, log_environment_info

print("🚀 Setup complete!")

## 1. 🔍 Safe Database Exploration (READ-ONLY)

In [None]:
# Safe way to explore your database
log_environment_info()

# Get database connection
db_config = get_database_config()
conn_string = f"postgresql://{db_config['user']}:{db_config['password']}@{db_config['host']}:{db_config['port']}/{db_config['database']}"
engine = create_engine(conn_string)

# Note: create_engine() is lazy and does not enforce read-only access by itself
print("📊 Database engine created (this notebook only issues SELECT queries)")
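
The engine above is not read-only on its own; that depends on the queries you send and on the database role. One possible guardrail, sketched below, assumes PostgreSQL accessed through the psycopg2 driver: it passes `default_transaction_read_only=on` as a session option and percent-encodes the credentials so special characters in the password cannot break the URL. The helper names `build_conn_string` and `make_read_only_engine` are hypothetical.

```python
from urllib.parse import quote_plus

# Session option forwarded by psycopg2 to PostgreSQL: transactions start read-only.
# A session can still override this with SET, so treat it as a guardrail only.
READ_ONLY_CONNECT_ARGS = {"options": "-c default_transaction_read_only=on"}


def build_conn_string(cfg):
    """Build a PostgreSQL URL, percent-encoding the user name and password."""
    user = quote_plus(str(cfg["user"]))
    password = quote_plus(str(cfg["password"]))
    return f"postgresql://{user}:{password}@{cfg['host']}:{cfg['port']}/{cfg['database']}"


def make_read_only_engine(cfg):
    """Create an engine whose sessions default to read-only transactions."""
    # Imported here so the helpers above stay usable without SQLAlchemy installed.
    from sqlalchemy import create_engine

    return create_engine(build_conn_string(cfg), connect_args=READ_ONLY_CONNECT_ARGS)
```

With this in place, `engine = make_read_only_engine(get_database_config())` would make an accidental `DELETE` fail with a read-only transaction error. For a hard guarantee, a dedicated database role with only `SELECT` privileges is the stronger option.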

In [None]:
# Safe exploration: Check what data we currently have
from sqlalchemy import text  # SQLAlchemy 2.x rejects plain SQL strings; wrap them in text()

with engine.connect() as conn:
    # Check raw data
    pokemon_count = conn.execute(text("SELECT COUNT(*) FROM raw.pokemon")).scalar()
    usage_count = conn.execute(text("SELECT COUNT(*) FROM raw.usage_stats")).scalar()

    # Check analytics data
    analytics_pokemon = conn.execute(text("SELECT COUNT(*) FROM analytics.dim_pokemon")).scalar()
    analytics_usage = conn.execute(text("SELECT COUNT(*) FROM analytics.fact_usage")).scalar()

print("📊 Current Data Status:")
print(f"   Raw Pokemon: {pokemon_count}")
print(f"   Raw Usage Stats: {usage_count}")
print(f"   Analytics Pokemon: {analytics_pokemon}")
print(f"   Analytics Usage Facts: {analytics_usage}")

In [None]:
# Safe exploration: Look at sample data
sample_pokemon = pd.read_sql("""
    SELECT name, type1, type2, base_stat_total 
    FROM raw.pokemon 
    ORDER BY base_stat_total DESC 
    LIMIT 5
""", engine)

print("🏆 Top 5 Pokemon by Base Stats:")
display(sample_pokemon)

## 2. 🧪 Safe Testing of Collection Scripts

**✅ SAFE**: Use small limits to test without overwhelming the database

In [None]:
# Safe way to test Pokemon collection - SMALL BATCHES ONLY!
# Note: this is the one cell that writes to the database.
# It adds at most 3 new Pokemon, and only if they don't already exist.

import subprocess
import sys

# Test with a very small limit (safe)
print("🧪 Testing Pokemon collector with 3 Pokemon (safe test)...")
result = subprocess.run([
    sys.executable, 
    '/home/jovyan/src/collectors/pokemon_collector.py', 
    '--start-id', '151',  # Start from a different ID to avoid duplicates
    '--limit', '3'        # Very small test
], capture_output=True, text=True)

print("STDOUT:", result.stdout)
if result.stderr:
    print("STDERR:", result.stderr)
print(f"Return code: {result.returncode}")
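
To make the "small batches only" rule harder to break by accident, the command can be built by a helper that refuses large limits. This is a minimal sketch: `build_collector_command` and the `MAX_TEST_LIMIT` ceiling are hypothetical, while the script path and the `--start-id` / `--limit` flags are the ones used in the cell above.

```python
import sys

COLLECTOR = "/home/jovyan/src/collectors/pokemon_collector.py"
MAX_TEST_LIMIT = 10  # hypothetical ceiling for notebook test runs


def build_collector_command(start_id, limit, max_limit=MAX_TEST_LIMIT):
    """Return the collector command, rejecting limits that are too large for a test."""
    if start_id < 1:
        raise ValueError(f"start_id must be positive, got {start_id}")
    if not 1 <= limit <= max_limit:
        raise ValueError(f"limit must be between 1 and {max_limit}, got {limit}")
    return [
        sys.executable,
        COLLECTOR,
        "--start-id", str(start_id),
        "--limit", str(limit),
    ]


# Usage (not executed here):
# import subprocess
# result = subprocess.run(build_collector_command(151, 3),
#                         capture_output=True, text=True, timeout=120)
```

The `timeout` in the usage comment is an extra precaution so a stuck collector cannot block the notebook kernel indefinitely.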

## 3. 📈 Safe Analytics Exploration

In [None]:
# Safe way to explore analytics results
# Note: if rank is assigned per format, rank <= 10 can return more than 10 rows
usage_trends = pd.read_sql("""
    SELECT 
        pokemon_name,
        format,
        usage_percentage,
        rank
    FROM analytics.fact_usage
    WHERE rank <= 10
    ORDER BY usage_percentage DESC
""", engine)

print("🏆 Top 10 Most Used Pokemon:")
display(usage_trends)

In [None]:
# Safe way to visualize data without modifying it
import matplotlib.pyplot as plt
import seaborn as sns

# Create a simple usage chart
plt.figure(figsize=(12, 6))
top_10 = usage_trends.head(10)
plt.barh(top_10['pokemon_name'], top_10['usage_percentage'])
plt.xlabel('Usage Percentage')
plt.title('Top 10 Pokemon by Usage')
plt.gca().invert_yaxis()
plt.tight_layout()
plt.show()

print("📊 Visualization complete - no data modified!")

## 4. 🔒 Import and Use Production Functions Safely

In [None]:
# Safe way to import and use functions from your production scripts
from analytics.advanced_competitive_analytics import (
    connect_to_database,
    load_analytics_data,
    analyze_meta_health
)

# Use read-only functions safely
print("📊 Loading analytics data for exploration...")
conn = connect_to_database()
try:
    pokemon_df, usage_df, type_df, trends_df = load_analytics_data(conn)
finally:
    conn.close()  # Always release the connection, even if loading fails

print("✅ Loaded data:")
print(f"   Pokemon: {len(pokemon_df)} records")
print(f"   Usage: {len(usage_df)} records")
print(f"   Type effectiveness: {len(type_df)} records")
print(f"   Trends: {len(trends_df)} records")

## 5. ⚠️ What NOT to Do in Jupyter

```python
# ❌ DON'T DO THESE - They can break your production data:

# ❌ Don't run full collection without limits
# !python /home/jovyan/src/collectors/pokemon_collector.py --all

# ❌ Don't run ETL pipeline repeatedly
# !python /app/src/processors/pokemon_etl_pipeline.py

# ❌ Don't execute raw SQL that modifies data
# engine.execute("DELETE FROM raw.pokemon WHERE name = 'pikachu'")

# ❌ Don't run production scrapers without limits
# !python /home/jovyan/src/scrapers/showdown_scraper.py --all-formats
```
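
A lightweight pre-check can catch the raw-SQL mistake above before a statement reaches the database. The sketch below is a keyword heuristic only, not a SQL parser: it is deliberately conservative (it may reject harmless queries) and it cannot detect functions that write as a side effect. The name `looks_read_only` is hypothetical.

```python
import re

# Keywords that indicate a statement may modify data or schema.
_WRITE_KEYWORDS = re.compile(
    r"\b(insert|update|delete|drop|truncate|alter|create|grant|revoke|copy|merge|into)\b",
    re.IGNORECASE,
)


def looks_read_only(sql):
    """Heuristic: True only if the statement starts like a query and has no write keywords."""
    stripped = sql.strip().lower()
    starts_like_query = stripped.startswith(("select", "with", "explain", "show"))
    return starts_like_query and _WRITE_KEYWORDS.search(stripped) is None
```

A notebook cell could call `looks_read_only(query)` and raise before executing anything that fails the check. Database-level protection, such as a read-only role, remains the more reliable safeguard.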

## 6. ✅ Safe Production Commands

Run production operations from a terminal, inside the container that owns each job:

```bash
# ✅ Safe production commands (run in terminal, not Jupyter):

# Data collection (controlled)
docker-compose exec jupyter python /home/jovyan/src/collectors/pokemon_collector.py --limit 50

# Scraping (limited)
docker-compose exec jupyter python /home/jovyan/src/scrapers/showdown_scraper.py --format gen9ou --months 1

# ETL processing (in Spark container)
docker-compose exec spark-master python /app/src/processors/pokemon_etl_pipeline.py

# Analytics (read-only)
docker-compose exec jupyter python /home/jovyan/src/analytics/advanced_competitive_analytics.py
```

In [None]:
print("🎉 Safe exploration complete!")
print("\n💡 Remember:")
print("   ✅ Use Jupyter for: Exploration, visualization, analysis, small tests")
print("   ✅ Use Terminal for: Production data collection, ETL, large operations")
print("   🔒 Always: Use limits, read-only queries, back up before major changes")