# BigQuery Connection Test

This notebook tests the connection to Google BigQuery and verifies access to the Bitcoin blockchain dataset.

## What this notebook does:
1. Loads environment variables
2. Tests BigQuery authentication
3. Queries the Bitcoin blockchain
4. Verifies data access

**Expected runtime**: < 1 minute

## Setup

In [None]:
# Import libraries
import os
from pathlib import Path
from google.cloud import bigquery
import pandas as pd
from dotenv import load_dotenv

print("✅ Libraries imported successfully")

In [None]:
# Load environment variables from .env file
load_dotenv()

# Set credentials path
credentials_path = os.getenv('GOOGLE_APPLICATION_CREDENTIALS')
project_id = os.getenv('GCP_PROJECT_ID')

print(f"Credentials path: {credentials_path}")
print(f"Project ID: {project_id}")

# Verify credentials file exists (the path is None when the variable is unset)
if credentials_path and os.path.exists(credentials_path):
    print("✅ Credentials file found")
else:
    print("❌ Credentials file NOT found")
    print(f"   Looking for: {credentials_path}")
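
The checks in this cell can also be gathered into one helper that reports every problem at once, including unset variables. This is a minimal sketch, not part of the original notebook: `check_environment` is a hypothetical name, and it only assumes the two variable names read above.

```python
import os
from typing import Mapping, Optional


def check_environment(env: Optional[Mapping[str, str]] = None) -> list:
    """Return a list of human-readable problems; an empty list means the setup looks usable."""
    env = os.environ if env is None else env
    problems = []

    # Both names are the variables this notebook reads from .env.
    for name in ("GOOGLE_APPLICATION_CREDENTIALS", "GCP_PROJECT_ID"):
        if not env.get(name):
            problems.append(f"{name} is not set")

    # Only test the path when the variable is present, so an unset
    # variable never reaches os.path.isfile() as None.
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if path and not os.path.isfile(path):
        problems.append(f"credentials file not found: {path}")

    return problems


# Check the real environment of the current process.
for problem in check_environment():
    print(f"Problem: {problem}")
```

Passing a plain dictionary instead of relying on `os.environ` makes it possible to try the helper without touching the real environment.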

## Test 1: BigQuery Client Connection

In [None]:
# Create BigQuery client
try:
    client = bigquery.Client()
    print("✅ BigQuery client created successfully")
    print(f"   Project: {client.project}")
except Exception as e:
    print(f"❌ Error creating BigQuery client: {e}")

## Test 2: Query Bitcoin Blockchain Data

In [None]:
# Query the last 10 Bitcoin blocks
query = """
SELECT 
    number as block_number,
    timestamp as block_timestamp,
    transaction_count,
    size as block_size_bytes
FROM `bigquery-public-data.crypto_bitcoin.blocks`
ORDER BY number DESC
LIMIT 10
"""

print("Executing query...")
try:
    df = client.query(query).to_dataframe()
    print("✅ Query executed successfully")
    print(f"   Retrieved {len(df)} blocks")
except Exception as e:
    print(f"❌ Query failed: {e}")
    df = None
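
Before running larger queries against the public dataset, it can help to ask BigQuery how much data a query would read. The sketch below uses the client library's dry-run mode (`QueryJobConfig(dry_run=True)`) and the job's `total_bytes_processed` attribute. The helper names `format_bytes` and `estimate_scanned_bytes` are hypothetical and not part of the original notebook.

```python
def format_bytes(num_bytes: int) -> str:
    """Render a byte count with decimal units (1 MB = 1,000,000 bytes)."""
    value = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if value < 1000:
            return f"{value:.2f} {unit}"
        value /= 1000
    return f"{value:.2f} TB"


def estimate_scanned_bytes(client, sql: str) -> int:
    """Dry-run `sql` and return the bytes BigQuery reports it would process."""
    # Imported here so format_bytes() stays usable without the client library.
    from google.cloud import bigquery

    job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    # A dry-run job validates the query and returns statistics without reading data.
    job = client.query(sql, job_config=job_config)
    return job.total_bytes_processed


print(format_bytes(1_500_000))  # prints "1.50 MB"
```

In a cell where `client` and `query` are already defined, `print(format_bytes(estimate_scanned_bytes(client, query)))` would show the estimate for the query above.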

In [None]:
# Display results
if df is not None:
    print("\n📊 Latest Bitcoin Blocks:")
    display(df)

    print("\n📈 Statistics:")
    print(f"   Average transactions per block: {df['transaction_count'].mean():.0f}")
    print(f"   Average block size: {df['block_size_bytes'].mean() / 1_000_000:.2f} MB")
    print(f"   Latest block: #{df['block_number'].iloc[0]:,}")

## Test 3: Check Dataset Access

In [None]:
# List all tables in the Bitcoin dataset
dataset_id = "bigquery-public-data.crypto_bitcoin"

try:
    dataset = client.get_dataset(dataset_id)
    tables = list(client.list_tables(dataset))
    
    print("✅ Dataset access successful")
    print(f"\n📚 Available tables in {dataset_id}:")
    for table in tables:
        print(f"   - {table.table_id}")
except Exception as e:
    print(f"❌ Cannot access dataset: {e}")
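
Listing table names does not show which columns each table has, and that matters when a query filters on time. A minimal sketch for finding the time columns, assuming a table object shaped like the one `client.get_table(...)` returns (a `schema` list whose fields expose `name` and `field_type`). The helper name `timestamp_columns` is hypothetical.

```python
def timestamp_columns(table) -> list:
    """Return the names of TIMESTAMP columns in a table's schema.

    `table` is expected to look like the object returned by
    client.get_table(...): it has a `schema` attribute whose items
    expose `name` and `field_type`.
    """
    return [field.name for field in table.schema if field.field_type == "TIMESTAMP"]


# Usage in this notebook (needs the `client` created in Test 1):
# for name in ["blocks", "transactions", "inputs", "outputs"]:
#     table = client.get_table(f"bigquery-public-data.crypto_bitcoin.{name}")
#     print(name, timestamp_columns(table))
```

Because the helper only reads attributes, it can be tried with simple stand-in objects before pointing it at BigQuery.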

## Test 4: Verify Key Tables for Project

In [None]:
# Count rows in each key table, restricted to the last 24 hours to keep the queries short.
# The blocks table names its time column `timestamp`; the other three use `block_timestamp`.
tables_to_check = {
    'blocks': 'timestamp',
    'transactions': 'block_timestamp',
    'inputs': 'block_timestamp',
    'outputs': 'block_timestamp',
}

print("🔍 Checking key tables...\n")

for table_name, time_column in tables_to_check.items():
    query = f"""
    SELECT COUNT(*) as row_count
    FROM `bigquery-public-data.crypto_bitcoin.{table_name}`
    WHERE {time_column} >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
    """

    try:
        result = client.query(query).to_dataframe()
        count = result['row_count'].iloc[0]
        print(f"✅ {table_name:15} - {count:,} rows (last 24h)")
    except Exception as e:
        print(f"❌ {table_name:15} - Error: {e}")
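
A count of 0 for the last 24 hours does not by itself indicate an access problem; it can also mean that the newest rows in the table are older than a day. A minimal sketch for telling the two apart, assuming the latest block timestamp is available from the Test 2 result. `data_lag` and `is_fresh` are hypothetical helper names, and the one-day threshold is an arbitrary choice.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def data_lag(latest_block_time: datetime, now: Optional[datetime] = None) -> timedelta:
    """Return how far the newest block timestamp trails the current UTC time."""
    now = datetime.now(timezone.utc) if now is None else now
    # Assumption: a timestamp without timezone information is in UTC.
    if latest_block_time.tzinfo is None:
        latest_block_time = latest_block_time.replace(tzinfo=timezone.utc)
    return now - latest_block_time


def is_fresh(latest_block_time: datetime,
             max_lag: timedelta = timedelta(days=1),
             now: Optional[datetime] = None) -> bool:
    """True when the newest block is no older than `max_lag`."""
    return data_lag(latest_block_time, now) <= max_lag
```

With the DataFrame from Test 2, `data_lag(df['block_timestamp'].iloc[0].to_pydatetime())` would give the lag of the most recent block.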

## Summary

If all tests passed (✅), you're ready to start the Bitcoin Whale Intelligence analysis!

### Next Steps:
1. Open `01_data_exploration.ipynb` to start exploring the data
2. Learn about multi-input transactions for entity clustering
3. Begin building the whale detection pipeline

### Troubleshooting:
If any test failed (❌):
- Check that the `.env` file exists and contains the correct paths
- Verify that the credentials file exists at `.credentials/bigquery-credentials.json`
- Make sure you ran `pip install google-cloud-bigquery pandas db-dtypes python-dotenv` (`to_dataframe()` needs `pandas` and `db-dtypes`)
- Restart the kernel and try again
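
To see at a glance which test failed, the outcomes can be collected in one place and summarized. This is a minimal sketch under the assumption that each test cell records its result in a shared dictionary. Both `results` and `summarize` are hypothetical and not part of the notebook, and the values shown are made up for illustration.

```python
def summarize(results: dict) -> str:
    """Build a status report from a {test name: passed?} mapping."""
    lines = [f"{'PASS' if ok else 'FAIL'}  {name}" for name, ok in results.items()]
    failed = [name for name, ok in results.items() if not ok]
    if failed:
        lines.append(f"{len(failed)} of {len(results)} tests failed")
    else:
        lines.append(f"All {len(results)} tests passed")
    return "\n".join(lines)


# Made-up outcomes for illustration only; in the notebook each test cell
# would set its own entry, e.g. results["client"] = True inside its try block.
results = {"client": True, "query": True, "dataset": False}
print(summarize(results))
```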

In [None]:
# Final status check
print("\n" + "="*50)
print("🎉 CONNECTION TEST COMPLETE")
print("="*50)
print("\nYou now have access to:")
print("  🐋 800+ million Bitcoin transactions")
print("  📊 Complete blockchain since 2009")
print("  🔗 All inputs & outputs for clustering")
print("\n🚀 Ready to find the whales!")