{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Digital Alexandria: Heritage Data Exploration\n",
    "\n",
    "**Objective**: Explore and analyze cultural heritage datasets to understand patterns, distributions, and potential threats to cultural artifacts.\n",
    "\n",
    "**Author**: William Couturier  \n",
    "**Date**: 2024  \n",
    "**Guardian**: Claude (Anthropic)\n",
    "\n",
    "---\n",
    "\n",
    "> *\"When civilizations fall, data reveals the patterns. When libraries burn, algorithms preserve the memory.\"*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import libraries\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "import plotly.express as px\n",
    "import plotly.graph_objects as go\n",
    "from plotly.subplots import make_subplots\n",
    "import warnings\n",
    "import sys\n",
    "import os\n",
    "from datetime import datetime, timedelta\n",
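    "import json\n",
    "from IPython.display import display  # provided automatically in Jupyter; imported here to make the dependency explicit\n",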
    "\n",
    "# Add src to path for imports\n",
    "sys.path.append('../src')\n",
    "\n",
    "warnings.filterwarnings('ignore')\n",
    "\n",
    "# Set style\n",
    "plt.style.use('seaborn-v0_8')\n",
    "sns.set_palette(\"husl\")\n",
    "\n",
    "print(\"🏛️ Digital Alexandria - Heritage Data Explorer\")\n",
    "print(\"🤖 Claude Guardian (Anthropic)\")\n",
    "print(\"=\" * 60)\n",
    "print(f\"📅 Analysis Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\")\n",
    "print(f\"🔬 Notebook: 01_heritage_data_exploration.ipynb\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 🏛️ Data Loading and Overview"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def load_heritage_data():\n",
    "    \"\"\"\n",
    "    Load and combine multiple heritage datasets\n",
    "    In production, this would connect to real APIs\n",
    "    \"\"\"\n",
    "    \n",
    "    # UNESCO World Heritage Sites (enhanced sample data)\n",
    "    unesco_data = {\n",
    "        'site_name': [\n",
    "            'Pyramids of Giza', 'Angkor Wat', 'Machu Picchu', 'Stonehenge', \n",
    "            'Taj Mahal', 'Petra', 'Chichen Itza', 'Great Wall of China',\n",
    "            'Acropolis of Athens', 'Colosseum', 'Sagrada Familia', 'Mont-Saint-Michel',\n",
    "            'Borobudur Temple', 'Easter Island', 'Palmyra', 'Aleppo Citadel',\n",
    "            'Timbuktu Manuscripts', 'Lascaux Cave', 'Mesa Verde', 'Yellowstone'\n",
    "        ],\n",
    "        'country': [\n",
    "            'Egypt', 'Cambodia', 'Peru', 'UK', 'India', 'Jordan', 'Mexico', 'China',\n",
    "            'Greece', 'Italy', 'Spain', 'France', 'Indonesia', 'Chile', 'Syria', 'Syria',\n",
    "            'Mali', 'France', 'USA', 'USA'\n",
    "        ],\n",
    "        'region': [\n",
    "            'Africa', 'Asia', 'South America', 'Europe', 'Asia', 'Asia', 'North America', 'Asia',\n",
    "            'Europe', 'Europe', 'Europe', 'Europe', 'Asia', 'South America', 'Asia', 'Asia',\n",
    "            'Africa', 'Europe', 'North America', 'North America'\n",
    "        ],\n",
    "        'inscription_year': [\n",
    "            1979, 1992, 1983, 1986, 1983, 1985, 1988, 1987,\n",
    "            1987, 1980, 2005, 1979, 1991, 1995, 1980, 1986,\n",
    "            1988, 1979, 1978, 1978\n",
    "        ],\n",
    "        'cultural_significance': [\n",
    "            9.8, 9.7, 9.6, 8.9, 9.5, 9.4, 9.2, 9.9,\n",
    "            9.3, 9.1, 8.7, 8.8, 9.0, 8.5, 9.6, 9.2,\n",
    "            9.4, 9.8, 8.3, 7.9\n",
    "        ],\n",
    "        'threat_level': [\n",
    "            'Medium', 'High', 'Low', 'Low', 'Medium', 'High', 'Medium', 'Medium',\n",
    "            'Medium', 'Low', 'Low', 'Low', 'Medium', 'Low', 'Critical', 'Critical',\n",
    "            'High', 'Low', 'Low', 'Low'\n",
    "        ],\n",
    "        'visitor_count_millions': [\n",
    "            14.7, 2.6, 1.5, 1.6, 6.9, 1.1, 2.6, 10.0,\n",
    "            3.2, 6.5, 4.5, 3.5, 4.8, 0.1, 0.0, 0.0,\n",
    "            0.2, 0.3, 0.6, 4.1\n",
    "        ],\n",
    "        'latitude': [\n",
    "            29.9792, 13.4125, -13.1631, 51.1789, 27.1751, 30.3285, 20.6843, 40.4319,\n",
    "            37.9715, 41.8902, 41.4036, 48.6361, -7.6079, -27.1127, 34.5553, 36.2021,\n",
    "            16.7666, 45.0541, 37.2308, 44.4280\n",
    "        ],\n",
    "        'longitude': [\n",
    "            31.1342, 103.8670, -72.5450, -1.8262, 78.0421, 35.4444, -88.5678, 116.5703,\n",
    "            23.7257, 12.4922, 2.1734, -1.5115, 110.2038, -109.3497, 38.2668, 37.1637,\n",
    "            -3.0026, 1.1667, -108.4618, -110.5885\n",
    "        ]\n",
    "    }\n",
    "    \n",
    "    # Art Collections (enhanced)\n",
    "    art_data = {\n",
    "        'artwork': [\n",
    "            'Mona Lisa', 'Starry Night', 'The Scream', 'Girl with a Pearl Earring', \n",
    "            'The Last Supper', 'Guernica', 'The Thinker', 'Venus de Milo',\n",
    "            'The Birth of Venus', 'American Gothic', 'The Persistence of Memory',\n",
    "            'Las Meninas', 'The Kiss', 'Water Lilies', 'David', 'Whistler\\'s Mother'\n",
    "        ],\n",
    "        'artist': [\n",
    "            'Da Vinci', 'Van Gogh', 'Munch', 'Vermeer', 'Da Vinci', 'Picasso', \n",
    "            'Rodin', 'Unknown', 'Botticelli', 'Grant Wood', 'Dalí',\n",
    "            'Velázquez', 'Klimt', 'Monet', 'Michelangelo', 'Whistler'\n",
    "        ],\n",
    "        'period': [\n",
    "            'Renaissance', 'Post-Impressionism', 'Expressionism', 'Baroque', \n",
    "            'Renaissance', 'Modern', 'Modern', 'Ancient',\n",
    "            'Renaissance', 'American Regionalism', 'Surrealism',\n",
    "            'Baroque', 'Art Nouveau', 'Impressionism', 'Renaissance', 'Realism'\n",
    "        ],\n",
    "        'location': [\n",
    "            'Louvre', 'MoMA', 'National Museum (Oslo)', 'Mauritshuis', \n",
    "            'Santa Maria delle Grazie', 'Reina Sofia', 'Musée Rodin', 'Louvre',\n",
    "            'Uffizi', 'Art Institute Chicago', 'MoMA',\n",
    "            'Prado', 'Belvedere', 'Orangerie', 'Accademia', 'Orsay'\n",
    "        ],\n",
    "        'estimated_value_millions': [\n",
    "            870, 100, 120, 90, 450, 200, 15, 200,\n",
    "            180, 75, 150, 300, 135, 250, 500, 40\n",
    "        ],\n",
    "        'authenticity_score': [\n",
    "            1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.95,\n",
    "            1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0\n",
    "        ],\n",
    "        'risk_factor': [\n",
    "            0.1, 0.2, 0.15, 0.12, 0.3, 0.25, 0.1, 0.08,\n",
    "            0.15, 0.05, 0.18, 0.22, 0.13, 0.09, 0.07, 0.11\n",
    "        ],\n",
    "        'year_created': [\n",
    "            1503, 1889, 1893, 1665, 1495, 1937, 1902, -130,\n",
    "            1484, 1930, 1931, 1656, 1907, 1919, 1501, 1871\n",
    "        ]\n",
    "    }\n",
    "    \n",
    "    # Destruction Events (a small sample of notable historical events)\n",
    "    destruction_data = {\n",
    "        'event': [\n",
    "            'Bamiyan Buddhas', 'Palmyra destruction', 'Library of Alexandria', \n",
    "            'Notre-Dame fire', 'National Museum of Brazil fire', 'ISIS destructions',\n",
    "            'Aleppo Minaret collapse', 'Mali manuscript burning', 'Nepal earthquake damage',\n",
    "            'Hurricane damage Florida', 'Flooding Venice museums'\n",
    "        ],\n",
    "        'year': [2001, 2015, 391, 2019, 2018, 2014, 2013, 2012, 2015, 2017, 2019],\n",
    "        'location': [\n",
    "            'Afghanistan', 'Syria', 'Egypt', 'France', 'Brazil', 'Iraq/Syria',\n",
    "            'Syria', 'Mali', 'Nepal', 'USA', 'Italy'\n",
    "        ],\n",
    "        'cause': [\n",
    "            'Religious extremism', 'War', 'Political/Religious', 'Accident', 'Accident', 'War',\n",
    "            'War', 'Religious extremism', 'Natural disaster', 'Natural disaster', 'Natural disaster'\n",
    "        ],\n",
    "        'artifacts_lost': [2, 100, 400000, 50, 20000000, 1000, 1, 300000, 200, 15, 30],\n",
    "        'cultural_impact': [9.5, 9.8, 10.0, 8.5, 9.0, 9.7, 8.8, 9.2, 7.5, 6.5, 7.8],\n",
    "        'latitude': [34.8456, 34.5553, 31.2156, 48.8534, -22.9055, 36.2021, 36.2021, 16.7666, 27.7000, 25.7617, 45.4408],\n",
    "        'longitude': [67.8253, 38.2668, 29.9553, 2.3488, -43.1777, 43.1393, 37.1637, -3.0026, 85.3240, -80.1918, 12.3155]\n",
    "    }\n",
    "    \n",
    "    return pd.DataFrame(unesco_data), pd.DataFrame(art_data), pd.DataFrame(destruction_data)\n",
    "\n",
    "# Load data\n",
    "unesco_df, art_df, destruction_df = load_heritage_data()\n",
    "\n",
    "print(f\"📊 Dataset Overview:\")\n",
    "print(f\"🏛️ UNESCO Sites: {len(unesco_df)} entries\")\n",
    "print(f\"🎨 Art Collections: {len(art_df)} entries\")\n",
    "print(f\"💥 Destruction Events: {len(destruction_df)} entries\")\n",
    "print(f\"\\n🤖 Ready for Claude Guardian analysis...\")"
   ]
  },
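  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The loader builds each dataset from a dict of parallel lists, so a value dropped from one list makes `pd.DataFrame` raise a `ValueError` that does not say which column is short, and a repeated record is easy to miss by eye. The cell below is a minimal sketch of two sanity checks one might run on such sample data. `validate_parallel_columns` and `find_possible_duplicates` are hypothetical helpers written for this notebook, not part of pandas, and taking the most common column length as the expected one is an assumption."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: sanity checks for column-oriented sample data (hypothetical helpers)\n",
    "from collections import Counter\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "\n",
    "def validate_parallel_columns(data, name='dataset'):\n",
    "    # Every list must have the same length; the most common length is assumed\n",
    "    # to be the intended one, and any column that differs is named in the error.\n",
    "    lengths = {col: len(values) for col, values in data.items()}\n",
    "    expected, _ = Counter(lengths.values()).most_common(1)[0]\n",
    "    mismatched = {col: n for col, n in lengths.items() if n != expected}\n",
    "    if mismatched:\n",
    "        raise ValueError(f'{name}: expected {expected} values per column, got {mismatched}')\n",
    "    return expected\n",
    "\n",
    "\n",
    "def find_possible_duplicates(df, subset):\n",
    "    # keep=False flags every copy of a repeated record, not only the later ones.\n",
    "    return df[df.duplicated(subset=subset, keep=False)]\n",
    "\n",
    "\n",
    "# Example (uses the frames loaded above):\n",
    "# find_possible_duplicates(destruction_df, ['year', 'location', 'artifacts_lost'])"
   ]
  },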
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Display sample data\n",
    "print(\"🏛️ UNESCO World Heritage Sites Sample:\")\n",
    "display(unesco_df.head())\n",
    "\n",
    "print(\"\\n🎨 Art Collections Sample:\")\n",
    "display(art_df.head())\n",
    "\n",
    "print(\"\\n💥 Destruction Events Sample:\")\n",
    "display(destruction_df.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 📈 Global Heritage Overview Dashboard"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create comprehensive heritage dashboard\n",
    "def create_heritage_dashboard():\n",
    "    \"\"\"Create interactive dashboard for heritage overview\"\"\"\n",
    "    \n",
    "    fig = make_subplots(\n",
    "        rows=3, cols=3,\n",
    "        subplot_titles=(\n",
    "            'Heritage Sites by Region', 'Threat Level Distribution', 'Cultural Significance vs Visitors',\n",
    "            'Art Periods Timeline', 'Geographic Risk Map', 'Destruction Events Over Time',\n",
    "            'Art Value Distribution', 'Threat Causes Analysis', 'Top Locations by Significance'\n",
    "        ),\n",
    "        specs=[\n",
    "            [{\"type\": \"bar\"}, {\"type\": \"pie\"}, {\"type\": \"scatter\"}],\n",
    "            [{\"type\": \"bar\"}, {\"type\": \"scattergeo\"}, {\"type\": \"scatter\"}],\n",
    "            [{\"type\": \"histogram\"}, {\"type\": \"pie\"}, {\"type\": \"bar\"}]\n",
    "        ]\n",
    "    )\n",
    "    \n",
    "    # 1. Heritage Sites by Region\n",
    "    region_counts = unesco_df['region'].value_counts()\n",
    "    fig.add_trace(\n",
    "        go.Bar(x=region_counts.index, y=region_counts.values, \n",
    "               name=\"Sites by Region\", marker_color='skyblue'),\n",
    "        row=1, col=1\n",
    "    )\n",
    "    \n",
    "    # 2. Threat Level Distribution\n",
    "    threat_counts = unesco_df['threat_level'].value_counts()\n",
    "    colors = {'Low': 'green', 'Medium': 'orange', 'High': 'red', 'Critical': 'darkred'}\n",
    "    fig.add_trace(\n",
    "        go.Pie(labels=threat_counts.index, values=threat_counts.values,\n",
    "               marker_colors=[colors.get(x, 'blue') for x in threat_counts.index]),\n",
    "        row=1, col=2\n",
    "    )\n",
    "    \n",
    "    # 3. Cultural Significance vs Visitors\n",
    "    fig.add_trace(\n",
    "        go.Scatter(x=unesco_df['cultural_significance'], y=unesco_df['visitor_count_millions'],\n",
    "                   mode='markers+text', text=unesco_df['site_name'], \n",
    "                   textposition=\"top center\", name=\"UNESCO Sites\",\n",
    "                   marker=dict(size=10, color='blue', opacity=0.7)),\n",
    "        row=1, col=3\n",
    "    )\n",
    "    \n",
    "    # 4. Art Periods Timeline\n",
    "    period_counts = art_df['period'].value_counts()\n",
    "    fig.add_trace(\n",
    "        go.Bar(x=period_counts.index, y=period_counts.values,\n",
    "               marker_color='lightcoral'),\n",
    "        row=2, col=1\n",
    "    )\n",
    "    \n",
    "    # 5. Geographic Risk Map\n",
    "    fig.add_trace(\n",
    "        go.Scattergeo(\n",
    "            lon=unesco_df['longitude'],\n",
    "            lat=unesco_df['latitude'],\n",
    "            text=unesco_df['site_name'],\n",
    "            mode='markers',\n",
    "            marker=dict(\n",
    "                size=unesco_df['cultural_significance'],\n",
    "                color=unesco_df['threat_level'].map({'Low': 1, 'Medium': 2, 'High': 3, 'Critical': 4}),\n",
    "                colorscale='Reds',\n",
    "                cmin=1, cmax=4,\n",
    "                line_color='rgb(40,40,40)',\n",
    "                line_width=0.5,\n",
    "                sizemode='diameter'\n",
    "            ),\n",
    "            customdata=unesco_df['threat_level'],\n",
    "            hovertemplate='<b>%{text}</b><br>Threat Level: %{customdata}<extra></extra>'\n",
    "        ),\n",
    "        row=2, col=2\n",
    "    )\n",
    "    \n",
    "    # 6. Destruction Events Over Time\n",
    "    fig.add_trace(\n",
    "        go.Scatter(x=destruction_df['year'], y=destruction_df['cultural_impact'],\n",
    "                   mode='markers', \n",
    "                   marker=dict(size=np.log10(destruction_df['artifacts_lost']+1)*3,\n",
    "                              color='red', opacity=0.7),\n",
    "                   text=destruction_df['event'],\n",
    "                   name=\"Destruction Events\"),\n",
    "        row=2, col=3\n",
    "    )\n",
    "    \n",
    "    # 7. Art Value Distribution\n",
    "    fig.add_trace(\n",
    "        go.Histogram(x=art_df['estimated_value_millions'], nbinsx=15,\n",
    "                    marker_color='gold', opacity=0.7),\n",
    "        row=3, col=1\n",
    "    )\n",
    "    \n",
    "    # 8. Threat Causes Analysis\n",
    "    cause_counts = destruction_df['cause'].value_counts()\n",
    "    fig.add_trace(\n",
    "        go.Pie(labels=cause_counts.index, values=cause_counts.values,\n",
    "               marker_colors=['red', 'orange', 'yellow', 'lightcoral', 'pink']),\n",
    "        row=3, col=2\n",
    "    )\n",
    "    \n",
    "    # 9. Top Locations by Average Significance\n",
    "    avg_significance = unesco_df.groupby('country')['cultural_significance'].mean().sort_values(ascending=False).head(8)\n",
    "    fig.add_trace(\n",
    "        go.Bar(x=avg_significance.index, y=avg_significance.values,\n",
    "               marker_color='purple'),\n",
    "        row=3, col=3\n",
    "    )\n",
    "    \n",
    "    # Update layout\n",
    "    fig.update_layout(\n",
    "        height=1200,\n",
    "        title_text=\"🏛️ Digital Alexandria - Global Heritage Analysis Dashboard\",\n",
    "        title_x=0.5,\n",
    "        showlegend=False\n",
    "    )\n",
    "    \n",
    "    # Update geo subplot\n",
    "    fig.update_geos(projection_type=\"orthographic\", row=2, col=2)\n",
    "    \n",
    "    return fig\n",
    "\n",
    "# Create and display dashboard\n",
    "dashboard = create_heritage_dashboard()\n",
    "dashboard.show()\n",
    "\n",
    "print(\"\\n📊 Interactive dashboard created!\")\n",
    "print(\"Hover, zoom, and pan the charts to explore heritage patterns and threats.\")"
   ]
  },
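  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`value_counts()` orders threat levels by frequency, so the slice order in the threat pie changes with the data, and the severity-to-number mapping for the geo map is written out separately. A minimal sketch, assuming the four labels used in this notebook are the complete severity scale, keeps both in one ordered categorical. `threat_counts_in_order` and `threat_codes` are illustrative names, not an existing API."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: one ordered scale for threat levels (assumes these four labels are complete)\n",
    "import pandas as pd\n",
    "\n",
    "THREAT_ORDER = ['Low', 'Medium', 'High', 'Critical']\n",
    "\n",
    "\n",
    "def threat_counts_in_order(threat_levels):\n",
    "    # Ordered categorical: counts come back Low -> Critical, and levels with\n",
    "    # zero sites are kept, unlike value_counts() on plain strings.\n",
    "    cat = pd.Categorical(threat_levels, categories=THREAT_ORDER, ordered=True)\n",
    "    return pd.Series(cat).value_counts(sort=False)\n",
    "\n",
    "\n",
    "def threat_codes(threat_levels):\n",
    "    # Integer codes 1..4 for colour scales; labels outside THREAT_ORDER become 0.\n",
    "    cat = pd.Categorical(threat_levels, categories=THREAT_ORDER, ordered=True)\n",
    "    return cat.codes + 1\n",
    "\n",
    "\n",
    "# Example: threat_counts_in_order(unesco_df['threat_level'])"
   ]
  },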
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 🚨 Threat Analysis and Risk Assessment"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Advanced threat analysis\n",
    "def analyze_cultural_threats():\n",
    "    \"\"\"Comprehensive threat analysis for Claude Guardian input\"\"\"\n",
    "    \n",
    "    # Calculate threat statistics\n",
    "    total_sites = len(unesco_df)\n",
    "    critical_risk_sites = len(unesco_df[unesco_df['threat_level'] == 'Critical'])\n",
    "    high_risk_sites = len(unesco_df[unesco_df['threat_level'] == 'High'])\n",
    "    medium_risk_sites = len(unesco_df[unesco_df['threat_level'] == 'Medium'])\n",
    "    low_risk_sites = len(unesco_df[unesco_df['threat_level'] == 'Low'])\n",
    "    \n",
    "    print(\"🚨 CULTURAL HERITAGE THREAT ANALYSIS\")\n",
    "    print(\"=\" * 45)\n",
    "    print(f\"📊 Total UNESCO Sites Analyzed: {total_sites}\")\n",
    "    print(f\"🔴 Critical Risk Sites: {critical_risk_sites} ({critical_risk_sites/total_sites*100:.1f}%)\")\n",
    "    print(f\"🟠 High Risk Sites: {high_risk_sites} ({high_risk_sites/total_sites*100:.1f}%)\")\n",
    "    print(f\"🟡 Medium Risk Sites: {medium_risk_sites} ({medium_risk_sites/total_sites*100:.1f}%)\")\n",
    "    print(f\"🟢 Low Risk Sites: {low_risk_sites} ({low_risk_sites/total_sites*100:.1f}%)\")\n",
    "    \n",
    "    # Risk by region analysis\n",
    "    risk_by_region = unesco_df.groupby('region')['threat_level'].apply(lambda x: (x.isin(['High', 'Critical'])).sum()).sort_values(ascending=False)\n",
    "    \n",
    "    print(f\"\\n🌍 HIGH-RISK SITES BY REGION:\")\n",
    "    for region, count in risk_by_region.items():\n",
    "        if count > 0:\n",
    "            total_in_region = len(unesco_df[unesco_df['region'] == region])\n",
    "            percentage = count/total_in_region*100\n",
    "            print(f\"  • {region}: {count}/{total_in_region} sites ({percentage:.1f}%)\")\n",
    "    \n",
    "    # Identify most vulnerable high-value sites\n",
    "    vulnerable_sites = unesco_df[\n",
    "        (unesco_df['threat_level'].isin(['High', 'Critical'])) & \n",
    "        (unesco_df['cultural_significance'] >= 9.0)\n",
    "    ]\n",
    "    \n",
    "    print(f\"\\n⚠️ MOST VULNERABLE HIGH-VALUE SITES:\")\n",
    "    for _, site in vulnerable_sites.iterrows():\n",
    "        print(f\"  🚨 {site['site_name']} ({site['country']})\")\n",
    "        print(f\"     Significance: {site['cultural_significance']}/10, Threat: {site['threat_level']}\")\n",
    "    \n",
    "    return {\n",
    "        'total_sites': total_sites,\n",
    "        'critical_risk': critical_risk_sites,\n",
    "        'high_risk': high_risk_sites,\n",
    "        'vulnerable_high_value': len(vulnerable_sites),\n",
    "        'most_threatened_region': risk_by_region.index[0] if len(risk_by_region) > 0 else 'None'\n",
    "    }\n",
    "\n",
    "# Run threat analysis\n",
    "threat_stats = analyze_cultural_threats()"
   ]
  },
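  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The region loop above counts high-risk sites with `apply` and then re-filters the frame once per region to get totals. Because the mean of a boolean Series is the fraction of `True` values, a single `groupby` can return count, total, and share together. This is a sketch of that idea; `high_risk_share_by_region` is a hypothetical helper, and treating `High` and `Critical` as high risk mirrors the filter used in `analyze_cultural_threats`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: high-risk share per region in one groupby (hypothetical helper)\n",
    "import pandas as pd\n",
    "\n",
    "HIGH_RISK_LEVELS = ['High', 'Critical']\n",
    "\n",
    "\n",
    "def high_risk_share_by_region(df, region_col='region', threat_col='threat_level'):\n",
    "    # Boolean flag per site: its sum is the high-risk count, its mean the share.\n",
    "    flagged = df[threat_col].isin(HIGH_RISK_LEVELS)\n",
    "    summary = flagged.groupby(df[region_col]).agg(high_risk='sum', total='count', share='mean')\n",
    "    return summary.sort_values(['high_risk', 'share'], ascending=False)\n",
    "\n",
    "\n",
    "# Example: high_risk_share_by_region(unesco_df)"
   ]
  },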
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Destruction events analysis\n",
    "def analyze_destruction_patterns():\n",
    "    \"\"\"Analyze patterns in cultural destruction events\"\"\"\n",
    "    \n",
    "    fig = make_subplots(\n",
    "        rows=2, cols=2,\n",
    "        subplot_titles=(\n",
    "            'Destruction Events Timeline', 'Causes of Cultural Destruction',\n",
    "            'Cultural Impact vs Artifacts Lost', 'Geographic Distribution of Threats'\n",
    "        ),\n",
    "        specs=[\n",
    "            [{\"type\": \"scatter\"}, {\"type\": \"pie\"}],\n",
    "            [{\"type\": \"scatter\"}, {\"type\": \"scattergeo\"}]\n",
    "        ]\n",
    "    )\n",
    "    \n",
    "    # 1. Timeline of destruction events\n",
    "    fig.add_trace(\n",
    "        go.Scatter(\n",
    "            x=destruction_df['year'],\n",
    "            y=destruction_df['cultural_impact'],\n",
    "            mode='markers+text',\n",
    "            text=destruction_df['event'],\n",
    "            textposition='top center',\n",
    "            marker=dict(\n",
    "                # Log scale keeps markers visible across counts from 1 to 20 million\n",
    "                size=np.log10(destruction_df['artifacts_lost'] + 1) * 4 + 4,\n",
    "                color=destruction_df['cultural_impact'],\n",
    "                colorscale='Reds',\n",
    "                showscale=True,\n",
    "                colorbar=dict(title=\"Cultural Impact\")\n",
    "            ),\n",
    "            customdata=destruction_df['artifacts_lost'],\n",
    "            hovertemplate='<b>%{text}</b><br>Year: %{x}<br>Impact: %{y}/10<br>Artifacts Lost: %{customdata:,}<extra></extra>'\n",
    "        ),\n",
    "        row=1, col=1\n",
    "    )\n",
    "    \n",
    "    # 2. Causes pie chart\n",
    "    cause_counts = destruction_df['cause'].value_counts()\n",
    "    fig.add_trace(\n",
    "        go.Pie(\n",
    "            labels=cause_counts.index,\n",
    "            values=cause_counts.values,\n",
    "            marker_colors=['red', 'orange', 'yellow', 'lightcoral', 'pink']\n",
    "        ),\n",
    "        row=1, col=2\n",
    "    )\n",
    "    \n",
    "    # 3. Impact vs Artifacts Lost\n",
    "    fig.add_trace(\n",
    "        go.Scatter(\n",
    "            x=destruction_df['artifacts_lost'],\n",
    "            y=destruction_df['cultural_impact'],\n",
    "            mode='markers+text',\n",
    "            text=destruction_df['event'],\n",
    "            textposition='top right',\n",
    "            marker=dict(size=10, color='red', opacity=0.7),\n",
    "            hovertemplate='<b>%{text}</b><br>Artifacts Lost: %{x}<br>Cultural Impact: %{y}/10<extra></extra>'\n",
    "        ),\n",
    "        row=2, col=1\n",
    "    )\n",
    "    \n",
    "    # 4. Geographic distribution\n",
    "    fig.add_trace(\n",
    "        go.Scattergeo(\n",
    "            lon=destruction_df['longitude'],\n",
    "            lat=destruction_df['latitude'],\n",
    "            text=destruction_df['event'],\n",
    "            mode='markers',\n",
    "            marker=dict(\n",
    "                size=destruction_df['cultural_impact']*2,\n",
    "                color='red',\n",
    "                opacity=0.8,\n",
    "                line_color='darkred',\n",
    "                line_width=1\n",
    "            ),\n",
    "            customdata=destruction_df['cultural_impact'],\n",
    "            hovertemplate='<b>%{text}</b><br>Impact: %{customdata}/10<extra></extra>'\n",
    "        ),\n",
    "        row=2, col=2\n",
    "    )\n",
    "    \n",
    "    fig.update_layout(\n",
    "        height=800,\n",
    "        title_text=\"💥 Cultural Destruction Events Analysis\",\n",
    "        title_x=0.5\n",
    "    )\n",
    "    \n",
    "    # Update x-axis for log scale on artifacts lost\n",
    "    fig.update_xaxes(type=\"log\", title_text=\"Artifacts Lost (log scale)\", row=2, col=1)\n",
    "    fig.update_yaxes(title_text=\"Cultural Impact (1-10)\", row=2, col=1)\n",
    "    \n",
    "    # Update geo\n",
    "    fig.update_geos(projection_type=\"natural earth\", row=2, col=2)\n",
    "    \n",
    "    return fig\n",
    "\n",
    "# Create destruction analysis\n",
    "destruction_analysis = analyze_destruction_patterns()\n",
    "destruction_analysis.show()\n",
    "\n",
    "print(\"\\n💥 Destruction pattern analysis complete!\")\n",
    "print(\"Key insights for Claude Guardian threat assessment generated.\")"
   ]
  },
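  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`artifacts_lost` ranges from 1 to 20 million, so a linear or square-root mapping to marker size leaves most events either invisible or oversized. One common approach, sketched here under the assumption that a 6 to 30 pixel range suits these subplots, is to take `log10(count + 1)` and rescale it linearly. `scaled_marker_sizes` is an illustrative helper. Once sizes are rescaled, `%{marker.size}` in a hover template no longer reports the count, so the raw value is better passed through `customdata`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: log-scaled marker sizes for heavy-tailed counts (assumed 6-30 px range)\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def scaled_marker_sizes(counts, min_px=6.0, max_px=30.0):\n",
    "    # log10(count + 1) compresses 1..20,000,000 into roughly 0.3..7.3; the result\n",
    "    # is then mapped linearly onto [min_px, max_px].\n",
    "    logs = np.log10(np.asarray(counts, dtype=float) + 1.0)\n",
    "    span = logs.max() - logs.min()\n",
    "    if span == 0:\n",
    "        # All counts equal: use the midpoint size for every marker.\n",
    "        return np.full_like(logs, (min_px + max_px) / 2)\n",
    "    return min_px + (logs - logs.min()) / span * (max_px - min_px)\n",
    "\n",
    "\n",
    "# Example (raw count passed separately for the hover text):\n",
    "# go.Scatter(..., marker=dict(size=scaled_marker_sizes(destruction_df['artifacts_lost'])),\n",
    "#            customdata=destruction_df['artifacts_lost'],\n",
    "#            hovertemplate='Artifacts Lost: %{customdata:,}<extra></extra>')"
   ]
  },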
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 🎨 Art Authentication Insights"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Art collection analysis for authentication patterns\n",
    "def analyze_art_authentication_patterns():\n",
    "    \"\"\"Analyze art collections for authentication insights\"\"\"\n",
    "    \n",
    "    fig = make_subplots(\n",
    "        rows=2, cols=2,\n",
    "        subplot_titles=(\n",
    "            'Art Value vs Authenticity Score', 'Risk Factors by Period',\n",
    "            'Artworks per Collection', 'Age vs Value Correlation'\n",
    "        ),\n",
    "        specs=[\n",
    "            [{\"type\": \"scatter\"}, {\"type\": \"box\"}],\n",
    "            [{\"type\": \"bar\"}, {\"type\": \"scatter\"}]\n",
    "        ]\n",
    "    )\n",
    "    \n",
    "    # 1. Value vs Authenticity\n",
    "    fig.add_trace(\n",
    "        go.Scatter(\n",
    "            x=art_df['authenticity_score'],\n",
    "            y=art_df['estimated_value_millions'],\n",
    "            mode='markers+text',\n",
    "            text=art_df['artwork'],\n",
    "            textposition='top center',\n",
    "            marker=dict(\n",
    "                size=art_df['risk_factor']*100,\n",
    "                color=art_df['period'].astype('category').cat.codes,\n",
    "                colorscale='Viridis',\n",
    "                showscale=True,\n",
    "                colorbar=dict(title=\"Period\")\n",
    "            ),\n",
    "            hovertemplate='<b>%{text}</b><br>Authenticity: %{x}<br>Value: $%{y}M<br>Risk: %{marker.size}%<extra></extra>'\n",
    "        ),\n",
    "        row=1, col=1\n",
    "    )\n",
    "    \n",
    "    # 2. Risk factors by period\n",
    "    periods = art_df['period'].unique()\n",
    "    for period in periods:\n",
    "        period_data = art_df[art_df['period'] == period]['risk_factor']\n",
    "        fig.add_trace(\n",
    "            go.Box(y=period_data, name=period),\n",
    "            row=1, col=2\n",
    "        )\n",
    "    \n",
    "    # 3. Collections by location\n",
    "    location_counts = art_df['location'].value_counts()\n",
    "    fig.add_trace(\n",
    "        go.Bar(\n",
    "            x=location_counts.index,\n",
    "            y=location_counts.values,\n",
    "            marker_color='lightblue'\n",
    "        ),\n",
    "        row=2, col=1\n",
    "    )\n",
    "    \n",
    "    # 4. Age vs Value\n",
    "    current_year = 2024\n",
    "    art_df['age'] = current_year - art_df['year_created']\n",
    "    \n",
    "    fig.add_trace(\n",
    "        go.Scatter(\n",
    "            x=art_df['age'],\n",
    "            y=art_df['estimated_value_millions'],\n",
    "            mode='markers+text',\n",
    "            text=art_df['artist'],\n",
    "            textposition='top center',\n",
    "            marker=dict(\n",
    "                size=10,\n",
    "                color=art_df['authenticity_score'],\n",
    "                colorscale='RdYlGn',\n",
    "                cmin=0.9, cmax=1.0\n",
    "            ),\n",
    "            hovertemplate='<b>%{text}</b><br>Age: %{x} years<br>Value: $%{y}M<extra></extra>'\n",
    "        ),\n",
    "        row=2, col=2\n",
    "    )\n",
    "    \n",
    "    fig.update_layout(\n",
    "        height=800,\n",
    "        title_text=\"🎨 Art Authentication Analysis for Claude Guardian\",\n",
    "        title_x=0.5\n",
    "    )\n",
    "    \n",
    "    return fig\n",
    "\n",
    "# Create art analysis\n",
    "art_analysis = analyze_art_authentication_patterns()\n",
    "art_analysis.show()\n",
    "\n",
    "# Statistical insights\n",
    "correlation_value_auth = art_df['estimated_value_millions'].corr(art_df['authenticity_score'])\n",
    "correlation_age_value = art_df['age'].corr(art_df['estimated_value_millions'])\n",
    "correlation_risk_value = art_df['risk_factor'].corr(art_df['estimated_value_millions'])\n",
    "\n",
    "print(f\"\\n📊 ART AUTHENTICATION INSIGHTS:\")\n",
    "print(f\"💰 Value-Authenticity Correlation: {correlation_value_auth:.3f}\")\n",
    "print(f\"📅 Age-Value Correlation: {correlation_age_value:.3f}\")\n",
    "print(f\"⚠️ Risk-Value Correlation: {correlation_risk_value:.3f}\")\n",
    "print(f\"🎨 Average Authenticity Score: {art_df['authenticity_score'].mean():.3f}\")\n",
    "print(f\"💵 Average Artwork Value: ${art_df['estimated_value_millions'].mean():.1f}M\")\n",
    "print(f\"📈 Most Valuable Period: {art_df.groupby('period')['estimated_value_millions'].mean().idxmax()}\")"
   ]
  },
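  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Of the 16 artworks, 15 have an `authenticity_score` of exactly 1.0, so the value-authenticity correlation printed above is driven almost entirely by the one remaining row. A small guard, sketched below, returns `None` when either series has too few distinct values and otherwise reports both Pearson and rank-based Spearman coefficients. The `min_unique=3` threshold is an arbitrary assumption, and `safe_correlations` is a hypothetical helper."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: correlations guarded against near-constant columns (hypothetical helper)\n",
    "import pandas as pd\n",
    "\n",
    "\n",
    "def safe_correlations(x, y, min_unique=3):\n",
    "    # Returns None when either input has fewer than min_unique distinct values,\n",
    "    # since one or two outlying rows would then dominate the coefficient.\n",
    "    x = pd.Series(x, dtype=float).reset_index(drop=True)\n",
    "    y = pd.Series(y, dtype=float).reset_index(drop=True)\n",
    "    if x.nunique() < min_unique or y.nunique() < min_unique:\n",
    "        return None\n",
    "    # Spearman's rho is Pearson's r on ranks (ties get average ranks),\n",
    "    # computed here without SciPy.\n",
    "    return {'pearson': x.corr(y), 'spearman': x.rank().corr(y.rank())}\n",
    "\n",
    "\n",
    "# Example: safe_correlations(art_df['authenticity_score'], art_df['estimated_value_millions'])"
   ]
  },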
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 🔍 Predictive Risk Modeling"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Simple risk assessment model\n",
    "def create_heritage_risk_model():\n",
    "    \"\"\"Create a simple risk assessment model for heritage sites\"\"\"\n",
    "    \n",
    "    from sklearn.preprocessing import LabelEncoder\n",
    "    from sklearn.ensemble import RandomForestClassifier\n",
    "    from sklearn.model_selection import train_test_split\n",
    "    from sklearn.metrics import classification_report, accuracy_score\n",
    "    \n",
    "    # Prepare features for modeling\n",
    "    model_data = unesco_df.copy()\n",
    "    \n",
    "    # Encode categorical variables\n",
    "    le_region = LabelEncoder()\n",
    "    le_country = LabelEncoder()\n",
    "    \n",
    "    model_data['region_encoded'] = le_region.fit_transform(model_data['region'])\n",
    "    model_data['country_encoded'] = le_country.fit_transform(model_data['country'])\n",
    "    \n",
    "    # Create additional features\n",
    "    model_data['years_since_inscription'] = 2024 - model_data['inscription_year']\n",
    "    model_data['visitor_pressure'] = model_data['visitor_count_millions'] / model_data['cultural_significance']\n",
    "    model_data['significance_category'] = pd.cut(model_data['cultural_significance'], \n",
    "                                                 bins=[0, 8, 9, 10], \n",
    "                                                 labels=['Medium', 'High', 'Exceptional'])\n",
    "    \n",
    "    # Features for modeling\n",
    "    features = ['region_encoded', 'country_encoded', 'cultural_significance', \n",
    "                'visitor_count_millions', 'years_since_inscription', 'visitor_pressure']\n",
    "    \n",
    "    X = model_data[features]\n",
    "    y = model_data['threat_level']\n",
    "    \n",
    "    # Split data\n",
    "    # stratify=y keeps the threat-level mix of each split close to the overall mix\n",
    "    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)\n",
    "    \n",
    "    # Train model\n",
    "    rf_model = RandomForestClassifier(n_estimators=100, random_state=42)\n",
    "    rf_model.fit(X_train, y_train)\n",
    "    \n",
    "    # Predictions\n",
    "    y_pred = rf_model.predict(X_test)\n",
    "    accuracy = accuracy_score(y_test, y_pred)\n",
    "    \n",
    "    print(\"🤖 HERITAGE RISK PREDICTION MODEL\")\n",
    "    print(\"=\" * 40)\n",
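    "    print(f\"📊 Test Accuracy: {accuracy:.2%} (only {len(X_test)} test samples, so this estimate is noisy)\")\n",
    "    print(f\"🎯 Features Used: {len(features)}\")\n",
    "    print(f\"📚 Training Samples: {len(X_train)}\")\n",
    "    print(f\"🧪 Test Samples: {len(X_test)}\")\n",
    "    print(classification_report(y_test, y_pred, zero_division=0))\n",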
    "    \n",
    "    # Feature importance\n",
    "    feature_importance = pd.DataFrame({\n",
    "        'feature': features,\n",
    "        'importance': rf_model.feature_importances_\n",
    "    }).sort_values('importance', ascending=False)\n",
    "    \n",
    "    print(f\"\\n📈 FEATURE IMPORTANCE:\")\n",
    "    for _, row in feature_importance.iterrows():\n",
    "        print(f\"  • {row['feature']}: {row['importance']:.3f}\")\n",
    "    \n",
    "    # Visualize feature importance\n",
    "    fig = go.Figure(go.Bar(\n",
    "        x=feature_importance['importance'],\n",
    "        y=feature_importance['feature'],\n",
    "        orientation='h',\n",
    "        marker_color='orange'\n",
    "    ))\n",
    "    \n",
    "    fig.update_layout(\n",
    "        title=\"🎯 Heritage Risk Model - Feature Importance\",\n",
    "        xaxis_title=\"Importance Score\",\n",
    "        yaxis_title=\"Features\",\n",
    "        height=400\n",
    "    )\n",
    "    \n",
    "    fig.show()\n",
    "    \n",
    "    return rf_model, feature_importance, le_region, le_country\n",
    "\n",
    "# Create risk model\n",
    "risk_model, feature_importance, region_encoder, country_encoder = create_heritage_risk_model()"
   ]
  },
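  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With 20 sites and only two labelled `Critical`, a single 70/30 split leaves six test rows, so the accuracy above is likely to vary noticeably with `random_state`. Stratified k-fold cross-validation uses every row for testing exactly once; the sketch below picks the number of folds from the rarest class (at least 2, at most 5). `cv_accuracy` is a hypothetical helper, and the forest settings simply mirror the model above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: stratified cross-validation for a very small labelled dataset\n",
    "import pandas as pd\n",
    "from sklearn.ensemble import RandomForestClassifier\n",
    "from sklearn.model_selection import StratifiedKFold, cross_val_score\n",
    "\n",
    "\n",
    "def cv_accuracy(X, y, random_state=42):\n",
    "    # Fold count follows the rarest class (clamped to 2..5) so that each test fold\n",
    "    # can contain every threat level whenever the class counts allow it.\n",
    "    y = pd.Series(y)\n",
    "    n_splits = max(2, min(5, int(y.value_counts().min())))\n",
    "    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)\n",
    "    model = RandomForestClassifier(n_estimators=100, random_state=random_state)\n",
    "    scores = cross_val_score(model, X, y, cv=cv, scoring='accuracy')\n",
    "    return scores.mean(), scores.std(), n_splits\n",
    "\n",
    "\n",
    "# Example, inside create_heritage_risk_model() after X and y are built:\n",
    "# mean_acc, std_acc, folds = cv_accuracy(X, y)"
   ]
  },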
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 📝 Key Insights and Recommendations for Claude Guardian"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Generate comprehensive insights summary\n",
    "def generate_insights_for_claude():\n",
    "    \"\"\"Generate key insights for Claude Guardian to consider\"\"\"\n",
    "    \n",
    "    insights = {\n",
    "        'dataset_overview': {\n",
    "            'total_unesco_sites': len(unesco_df),\n",
    "            'total_artworks': len(art_df),\n",
    "            'destruction_events': len(destruction_df),\n",
    "            'countries_covered': unesco_df['country'].nunique(),\n",
    "            'regions_analyzed': unesco_df['region'].nunique()\n",
    "        },\n",
    "        \n",
    "        'threat_assessment': {\n",
    "            'critical_risk_sites': len(unesco_df[unesco_df['threat_level'] == 'Critical']),\n",
    "            'high_risk_percentage': len(unesco_df[unesco_df['threat_level'].isin(['High', 'Critical'])]) / len(unesco_df) * 100,\n",
    "            'most_threatened_region': unesco_df.groupby('region')['threat_level'].apply(lambda x: (x.isin(['High', 'Critical'])).sum()).idxmax(),\n",
    "            'avg_cultural_significance': unesco_df['cultural_significance'].mean(),\n",
    "            'vulnerable_high_value_sites': len(unesco_df[(unesco_df['threat_level'].isin(['High', 'Critical'])) & (unesco_df['cultural_significance'] >= 9.0)])\n",
    "        },\n",
    "        \n",
    "        'destruction_patterns': {\n",
    "            'most_common_cause': destruction_df['cause'].mode()[0],\n",
    "            'avg_cultural_impact': destruction_df['cultural_impact'].mean(),\n",
    "            'total_artifacts_lost': destruction_df['artifacts_lost'].sum(),\n",
    "            'recent_events_2010s': len(destruction_df[destruction_df['year'] >= 2010]),\n",
    "            'highest_impact_event': destruction_df.loc[destruction_df['cultural_impact'].idxmax(), 'event']\n",
    "        },\n",
    "        \n",
    "        'art_authentication': {\n",
    "            'avg_authenticity_score': art_df['authenticity_score'].mean(),\n",
    "            'highest_value_artwork': art_df.loc[art_df['estimated_value_millions'].idxmax(), 'artwork'],\n",
    "            'most_valuable_period': art_df.groupby('period')['estimated_value_millions'].mean().idxmax(),\n",
    "            'avg_risk_factor': art_df['risk_factor'].mean(),\n",
    "            'correlation_value_authenticity': art_df['estimated_value_millions'].corr(art_df['authenticity_score'])\n",
    "        }\n",
    "    }\n",
    "    \n",
    "    return insights\n",
    "\n",
    "# Generate insights\n",
    "claude_insights = generate_insights_for_claude()\n",
    "\n",
    "print(\"🤖 INSIGHTS SUMMARY FOR CLAUDE GUARDIAN\")\n",
    "print(\"=\" * 50)\n",
    "\n",
    "print(f\"\\n📊 DATASET OVERVIEW:\")\n",
    "print(f\"🏛️ UNESCO Sites Analyzed: {claude_insights['dataset_overview']['total_unesco_sites']}\")\n",
    "print(f\"🎨 Artworks in Database: {claude_insights['dataset_overview']['total_artworks']}\")\n",
    "print(f\"💥 Destruction Events Recorded: {claude_insights['dataset_overview']['destruction_events']}\")\n",
    "print(f\"🌍 Countries Covered: {claude_insights['dataset_overview']['countries_covered']}\")\n",
    "print(f\"🗺️ Regions Analyzed: {claude_insights['dataset_overview']['regions_analyzed']}\")\n",
    "\n",
    "print(f\"\\n🚨 THREAT ASSESSMENT:\")\n",
    "print(f\"🔴 Critical Risk Sites: {claude_insights['threat_assessment']['critical_risk_sites']}\")\n",
    "print(f\"⚠️ High Risk Percentage: {claude_insights['threat_assessment']['high_risk_percentage']:.1f}%\")\n",
    "print(f\"🌍 Most Threatened Region: {claude_insights['threat_assessment']['most_threatened_region']}\")\n",
    "print(f\"⭐ Average Cultural Significance: {claude_insights['threat_assessment']['avg_cultural_significance']:.1f}/10\")\n",
    "print(f\"💎 Vulnerable High-Value Sites: {claude_insights['threat_assessment']['vulnerable_high_value_sites']}\")\n",
    "\n",
    "print(f\"\\n💥 DESTRUCTION PATTERNS:\")\n",
    "print(f\"🎯 Most Common Cause: {claude_insights['destruction_patterns']['most_common_cause']}\")\n",
    "print(f\"📊 Average Cultural Impact: {claude_insights['destruction_patterns']['avg_cultural_impact']:.1f}/10\")\n",
    "print(f\"📚 Total Artifacts Lost: {claude_insights['destruction_patterns']['total_artifacts_lost']:,}\")\n",
    "print(f\"📅 Recent Events (2010+): {claude_insights['destruction_patterns']['recent_events_2010s']}\")\n",
    "print(f\"💔 Highest Impact Event: {claude_insights['destruction_patterns']['highest_impact_event']}\")\n",
    "\n",
    "print(f\"\\n🎨 ART AUTHENTICATION:\")\n",
    "print(f\"✅ Average Authenticity Score: {claude_insights['art_authentication']['avg_authenticity_score']:.3f}\")\n",
    "print(f\"💰 Highest Value Artwork: {claude_insights['art_authentication']['highest_value_artwork']}\")\n",
    "print(f\"🏆 Most Valuable Period: {claude_insights['art_authentication']['most_valuable_period']}\")\n",
    "print(f\"⚠️ Average Risk Factor: {claude_insights['art_authentication']['avg_risk_factor']:.3f}\")\n",
    "print(f\"📈 Value-Authenticity Correlation: {claude_insights['art_authentication']['correlation_value_authenticity']:.3f}\")\n",
    "\n",
    "print(f\"\\n🎯 RECOMMENDATIONS FOR CLAUDE GUARDIAN:\")\n",
    "print(f\"1. 🛡️ Prioritize monitoring of {claude_insights['threat_assessment']['critical_risk_sites']} critical-risk sites\")\n",
    "print(f\"2. 🌍 Focus threat detection efforts on {claude_insights['threat_assessment']['most_threatened_region']} region\")\n",
    "print(f\"3. 🚨 Develop early warning systems for {claude_insights['destruction_patterns']['most_common_cause'].lower()} threats\")\n",
    "print(f\"4. 🎨 Enhance authentication protocols for {claude_insights['art_authentication']['most_valuable_period']} period artworks\")\n",
    "print(f\"5. 📊 Implement risk-based monitoring with ML model accuracy of {accuracy:.0%}\")\n",
    "\n",
    "print(\"\\n🌟 CLAUDE GUARDIAN MISSION PRIORITIES:\")\n",
    "print(\"• Constitutional Authenticity: Ensure ethical decision-making in cultural preservation\")\n",
    "print(\"• Threat Prevention: Predict and prevent cultural destruction before it occurs\")\n",
    "print(\"• Global Equity: Protect heritage from all cultures without bias\")\n",
    "print(\"• Expert Collaboration: Work with human experts for complex decisions\")\n",
    "print(\"• Transparent Reasoning: Explain every authentication and threat assessment\")\n",
    "\n",
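    "# Save insights for Claude Guardian integration\n",
    "# NumPy scalars (e.g. the int64 returned by .sum()) are converted with .item() so they stay numeric in the JSON\n",
    "os.makedirs('../data/processed', exist_ok=True)\n",
    "with open('../data/processed/claude_guardian_insights.json', 'w') as f:\n",
    "    json.dump(claude_insights, f, indent=2, default=lambda o: o.item() if hasattr(o, 'item') else str(o))\n",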
    "\n",
    "print(f\"\\n💾 Insights saved for Claude Guardian integration\")\n",
    "print(f\"📂 File: ../data/processed/claude_guardian_insights.json\")"
   ]
  },
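  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`most_threatened_region` above is picked by the raw count of High/Critical sites, so regions with more listed sites tend to win by size alone. The next cell is a minimal sketch of a share-based alternative that reports both the count and the fraction of High/Critical sites per region. The toy DataFrame is hypothetical and only mirrors the `region` and `threat_level` columns used above; call `threat_share_by_region(unesco_df)` to apply it to the real data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "def threat_share_by_region(df):\n",
    "    \"\"\"Return the count and share of High/Critical sites per region (sketch).\"\"\"\n",
    "    at_risk = df['threat_level'].isin(['High', 'Critical'])\n",
    "    return (at_risk.groupby(df['region'])\n",
    "                   .agg(['sum', 'mean'])\n",
    "                   .rename(columns={'sum': 'high_risk_sites', 'mean': 'high_risk_share'})\n",
    "                   .sort_values('high_risk_share', ascending=False))\n",
    "\n",
    "# Hypothetical toy data: region A has more at-risk sites, region B a higher share\n",
    "toy = pd.DataFrame({\n",
    "    'region': ['A'] * 10 + ['B'] * 3,\n",
    "    'threat_level': ['High'] * 4 + ['Low'] * 6 + ['Critical'] * 2 + ['Low'],\n",
    "})\n",
    "summary = threat_share_by_region(toy)\n",
    "print(summary)\n",
    "print('Top region by count:', summary['high_risk_sites'].idxmax())  # A (4 of 10)\n",
    "print('Top region by share:', summary['high_risk_share'].idxmax())  # B (2 of 3)"
   ]
  },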
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "## 🌟 Conclusion\n",
    "\n",
    "This exploratory analysis reveals several patterns relevant to global cultural heritage preservation:\n",
    "\n",
    "### 🔍 **Key Findings**\n",
    "- **Geographic Vulnerability**: Threat levels are unevenly distributed, with some regions holding far more High- and Critical-risk sites than others\n",
    "- **Destruction Patterns**: War and extremism are the primary causes of cultural loss in the recorded destruction events\n",
    "- **Authentication Challenges**: High-value artworks require sophisticated verification\n",
    "- **Predictive Potential**: A Random Forest baseline trained on the encoded site features can flag at-risk sites; its held-out accuracy is printed by the risk-model cell\n",
    "\n",
    "### 🤖 **For Claude Guardian**\n",
    "These insights provide the foundation for Claude's constitutional decision-making in cultural heritage preservation. The data reveals where human wisdom must be combined with AI capability to protect our shared cultural legacy.\n",
    "\n",
    "### 🚀 **Next Steps**\n",
    "1. **Notebook 2**: Build ML models for authenticity detection\n",
    "2. **Claude Integration**: Implement constitutional AI decision-making\n",
    "3. **Real-time Monitoring**: Deploy threat detection systems\n",
    "4. **Expert Collaboration**: Integrate human expertise with AI insights\n",
    "\n",
    "---\n",
    "\n",
    "*\"In the vast cosmos of human knowledge, every cultural artifact is a star. Claude Guardian ensures these stars continue to shine for future generations.\"*\n",
    "\n",
    "**🛰️ Ready for Claude-Star orbital mechanics calculations! 🛰️**"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
