# OpenDota Premium Ingestion (Colab)

This notebook prepares a Colab runtime to ingest OpenDota premium matches, load them into Supabase, and export XML archives. It is designed to run end-to-end without modifying the repository source code.

## 1) Install dependencies

Run this cell first when starting a new Colab session. It installs the core dependencies for the ingestion CLI, Supabase access, and XML export utilities.

In [None]:
!pip -q install --upgrade pip
# google-colab ships with the Colab runtime, so it is not reinstalled here
!pip -q install supabase python-dotenv pandas lxml rich google-auth google-auth-oauthlib
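
Client libraries change between releases, so it can help to record which versions the runtime ended up with. The following is a minimal sketch using only the standard library; `installed_versions` is a hypothetical helper name, not part of the repository.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Report the packages this notebook relies on most heavily
for name, version in installed_versions(['supabase', 'pandas', 'lxml']).items():
    print(f'{name}: {version or "not installed"}')
```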

## 2) Configure secrets (OpenDota + Supabase)

The block below prompts for sensitive values at runtime so they are not stored in the notebook:
- `OPENDOTA_PREMIUM_KEY`: premium API token.
- `SUPABASE_URL` and `SUPABASE_SERVICE_ROLE_KEY`: Supabase project URL and service role key.
- `SUPABASE_STORAGE_BUCKET`: storage bucket for XML archives (create one beforehand in Supabase storage).

The values are stored in environment variables for the current session only, so downstream cells can read them. Rerun this cell after a runtime restart.

In [None]:
import os
from getpass import getpass

# Prompt once per session (rerun if you need to change values)
# .strip() removes whitespace that often sneaks in when pasting
OPENDOTA_PREMIUM_KEY = getpass('Enter your OpenDota PREMIUM API key: ').strip()
SUPABASE_URL = getpass('Enter your Supabase URL: ').strip()
SUPABASE_SERVICE_ROLE_KEY = getpass('Enter your Supabase service role key: ').strip()
SUPABASE_STORAGE_BUCKET = getpass('Enter the Supabase storage bucket for XML exports: ').strip()

os.environ['OPENDOTA_PREMIUM_KEY'] = OPENDOTA_PREMIUM_KEY
os.environ['SUPABASE_URL'] = SUPABASE_URL
os.environ['SUPABASE_SERVICE_ROLE_KEY'] = SUPABASE_SERVICE_ROLE_KEY
os.environ['SUPABASE_STORAGE_BUCKET'] = SUPABASE_STORAGE_BUCKET
print('✅ Secrets loaded into environment variables.')
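
Because `getpass` hides what is typed, a blank or malformed value is easy to miss. The sketch below checks that the four variables named above are set and that the URL at least parses as http(s). `check_secrets` is a hypothetical helper, and the URL check is a loose sanity test rather than a Supabase-specific validation.

```python
import os
from urllib.parse import urlparse

REQUIRED_VARS = (
    'OPENDOTA_PREMIUM_KEY',
    'SUPABASE_URL',
    'SUPABASE_SERVICE_ROLE_KEY',
    'SUPABASE_STORAGE_BUCKET',
)


def check_secrets(env=None):
    """Return a list of problems; an empty list means every check passed."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_VARS:
        if not env.get(name, '').strip():
            problems.append(f'{name} is missing or blank')
    url = env.get('SUPABASE_URL', '').strip()
    if url:
        parsed = urlparse(url)
        if parsed.scheme not in ('http', 'https') or not parsed.netloc:
            problems.append('SUPABASE_URL does not look like an http(s) URL')
    return problems


# Print any problems found in the current environment
for problem in check_secrets():
    print('Problem:', problem)
```

Only variable names are ever printed, so the check does not echo secret values into the notebook output.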

## 3) Run the ingestion CLI

This section assumes the repository has an ingestion CLI entry point capable of pulling premium matches from OpenDota and writing them into Supabase. Update `INGESTION_CLI` to the correct script/module if it differs in your fork.

The example below passes the premium key along with Supabase credentials so the CLI can write directly to your database and optionally push binary assets to storage.

In [None]:
import os
import subprocess
import sys
from pathlib import Path

# Adjust the path/module below to match your ingestion entry point
# Example: Path('/content/Msc-Prometheus/03_INFRAESTRUTURA/opendota_ingest.py')
INGESTION_CLI = Path('/content/Msc-Prometheus/opendota_ingest.py')

if not INGESTION_CLI.exists():
    raise FileNotFoundError(f'Update INGESTION_CLI to your ingestion script. Not found: {INGESTION_CLI}')

base_env = os.environ.copy()
base_env.update({
    'OPENDOTA_PREMIUM_KEY': os.environ['OPENDOTA_PREMIUM_KEY'],
    'SUPABASE_URL': os.environ['SUPABASE_URL'],
    'SUPABASE_SERVICE_ROLE_KEY': os.environ['SUPABASE_SERVICE_ROLE_KEY'],
    'SUPABASE_STORAGE_BUCKET': os.environ['SUPABASE_STORAGE_BUCKET'],
})

cli_cmd = [
    sys.executable,
    str(INGESTION_CLI),
    '--api-key', base_env['OPENDOTA_PREMIUM_KEY'],
    '--supabase-url', base_env['SUPABASE_URL'],
    '--supabase-key', base_env['SUPABASE_SERVICE_ROLE_KEY'],
    '--storage-bucket', base_env['SUPABASE_STORAGE_BUCKET'],
    '--max-matches', '500',            # tweak as needed
    '--min-match-id', '0',             # optionally resume from a specific match
]

print('Running ingestion CLI...')
subprocess.run(cli_cmd, check=True, env=base_env)
print('\n✅ Ingestion finished')
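
The command list above carries the API key and service role key as plain arguments, so printing it for debugging would leak them into the notebook output. A minimal sketch of masking those values before logging follows; `redact_command` is a hypothetical helper, and the set of sensitive flags is taken from the flags used in this cell.

```python
# Flags whose following argument should never be shown in logs
SENSITIVE_FLAGS = frozenset({'--api-key', '--supabase-key'})


def redact_command(cmd, sensitive_flags=SENSITIVE_FLAGS):
    """Return a copy of cmd with the value after each sensitive flag masked."""
    redacted = []
    mask_next = False
    for token in cmd:
        if mask_next:
            redacted.append('***')
            mask_next = False
        else:
            redacted.append(token)
            mask_next = token in sensitive_flags
    return redacted


example = ['python', 'opendota_ingest.py', '--api-key', 'secret', '--max-matches', '500']
print(' '.join(redact_command(example)))
```

In the notebook this would be called as `redact_command(cli_cmd)` before printing. It only protects log output; the unmasked arguments are still visible to the child process.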

## 4) Verify Supabase row counts

Use the Supabase Python client to confirm how many matches and players were ingested. Replace table names if your schema differs.

In [None]:
from supabase import create_client, Client

supabase: Client = create_client(os.environ['SUPABASE_URL'], os.environ['SUPABASE_SERVICE_ROLE_KEY'])

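def table_count(table_name: str) -> int:
    """Return the total row count for a table without downloading its rows."""
    # count='exact' requests the total; limit(1) keeps the payload to a single row
    response = supabase.table(table_name).select('*', count='exact').limit(1).execute()
    return response.count or 0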

match_count = table_count('matches')
player_count = table_count('players')

print(f"Matches in Supabase: {match_count}")
print(f"Players in Supabase: {player_count}")

## 5) Preview recent matches

Pull a small sample of the most recent matches to validate the ingestion contents without downloading the entire dataset.

In [None]:
import pandas as pd

sample = supabase.table('matches').select('*').order('match_id', desc=True).limit(5).execute()
frame = pd.DataFrame(sample.data)

print('Latest matches:')
frame.head()


## 6) Export XML archives

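Create an XML archive from the matches previewed in the previous section and store it in three places. The snippet writes `matches.xml` to `/content`, mirrors it to Google Drive if the mount succeeds, and uploads it to the configured Supabase storage bucket. To archive more than the preview sample, raise the `limit` in the preview query before running this cell.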

In [None]:
from io import BytesIO

# Optional: mount Google Drive in Colab for a persistent copy
try:
    from google.colab import drive
    drive.mount('/content/drive', force_remount=False)
    drive_path = Path('/content/drive/MyDrive/opendota')
    drive_path.mkdir(parents=True, exist_ok=True)
except ModuleNotFoundError:
    drive_path = None
    print('Running outside Colab; skipping Drive mount.')

xml_path = Path('/content/matches.xml')
xml_bucket_key = 'exports/matches.xml'

# Reuse the preview DataFrame from section 5; fall back to the raw query result if it is missing
xml_df = pd.DataFrame(frame) if 'frame' in globals() else pd.DataFrame(sample.data)
xml_bytes = xml_df.to_xml(index=False).encode('utf-8')
xml_path.write_bytes(xml_bytes)
print(f'Saved XML locally to: {xml_path}')

if drive_path:
    drive_file = drive_path / 'matches.xml'
    drive_file.write_bytes(xml_bytes)
    print(f'Copied XML to Drive: {drive_file}')

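# Upload to Supabase storage
# In current supabase-py releases `storage` is a property, not a method
storage = supabase.storage
upload_response = storage.from_(os.environ['SUPABASE_STORAGE_BUCKET']).upload(
    file=xml_bytes,  # pass raw bytes; the client does not accept a BytesIO wrapper
    path=xml_bucket_key,
    # upsert overwrites an existing object so the cell can be rerun
    file_options={'content-type': 'application/xml', 'upsert': 'true'}
)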
print('Supabase storage upload response:')
print(upload_response)
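
The export above covers only the rows already held in memory. For a larger archive the table would need to be read in pages. The sketch below shows one way to do that with a generic page iterator; `iter_rows` and `fake_fetch` are hypothetical names, and the in-memory list stands in for a real table so the example runs without network access.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]


def iter_rows(fetch_page: Callable[[int, int], List[Row]],
              page_size: int = 1000) -> Iterator[Row]:
    """Yield rows page by page until a short or empty page signals the end."""
    start = 0
    while True:
        # Both bounds are inclusive, e.g. 0-999 for the first page
        rows = fetch_page(start, start + page_size - 1)
        yield from rows
        if len(rows) < page_size:
            break
        start += page_size


# Stand-in for a real table (assumption: rows are dicts keyed by column name)
_fake_table = [{'match_id': i} for i in range(2500)]


def fake_fetch(lo: int, hi: int) -> List[Row]:
    return _fake_table[lo:hi + 1]


all_rows = list(iter_rows(fake_fetch, page_size=1000))
print(len(all_rows))
```

Against Supabase, `fetch_page` could wrap a query such as `supabase.table('matches').select('*').order('match_id').range(lo, hi).execute().data`, assuming the installed client treats both `range` bounds as inclusive. The collected rows can then be passed to `pd.DataFrame(...)` before calling `to_xml`.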

## 7) (Optional) Download XML archive from storage

If you need to verify the uploaded file, fetch it back from Supabase storage and inspect the first few rows.

In [None]:
download = storage.from_(os.environ['SUPABASE_STORAGE_BUCKET']).download(xml_bucket_key)
roundtrip_df = pd.read_xml(BytesIO(download))
roundtrip_df.head()
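
Inspecting the first rows confirms that the file parses, but not that it is identical to what was uploaded. A byte-level comparison is a cheap extra check. The sketch below assumes the download call returns raw bytes, as it is used above; `verify_roundtrip` is a hypothetical helper.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_roundtrip(uploaded: bytes, downloaded: bytes) -> bool:
    """True when both payloads have the same length and the same digest."""
    return (len(uploaded) == len(downloaded)
            and sha256_hex(uploaded) == sha256_hex(downloaded))


# Illustrative payloads; in the notebook these would be xml_bytes and download
print(verify_roundtrip(b'<data><row/></data>', b'<data><row/></data>'))
```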
