# 🏗️ BigQuery AI: Intelligent Retail Analytics Engine Setup

**Competition Entry**: BigQuery AI - Building the Future of Data
**High-Quality Solution**: Enterprise-Grade Retail Intelligence
**Author**: Senior Data Engineer & AI Architect

---

## 🎯 Overview

This notebook provides a **complete setup guide** for deploying the Intelligent Retail Analytics Engine on Google Cloud Platform. The setup process includes:

1. **🗄️ Dataset Creation** - BigQuery datasets and tables
2. **🤖 Model Setup** - Vertex AI ML models
3. **🔗 Connection Setup** - Vertex AI integration
4. **📊 Data Population** - Sample data and embeddings
5. **⚙️ Configuration** - System configuration files

**Result**: Fully functional BigQuery AI retail analytics system

## 📋 Prerequisites

### Required Google Cloud Setup:
1. **Google Cloud Project** with billing enabled
2. **BigQuery API** enabled
3. **Vertex AI API** enabled
4. **Google Cloud SDK** installed and configured
5. **Project Editor** or **Owner** permissions

### Required Permissions:
- `bigquery.datasets.create`
- `bigquery.tables.create`
- `bigquery.jobs.create`
- `aiplatform.models.*`
- `storage.objects.*`

### Software Requirements:
```bash
# Install Google Cloud SDK
curl https://sdk.cloud.google.com | bash
exec -l $SHELL
gcloud init

# Install Python packages
pip install google-cloud-bigquery google-cloud-aiplatform PyYAML
```

In [None]:
# 📦 Import required libraries
import os
import sys
import time
import logging
from pathlib import Path
from typing import Dict, List, Optional
import yaml

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('retail_analytics_setup.log'),
        logging.StreamHandler(sys.stdout)
    ]
)
logger = logging.getLogger(__name__)

print("✅ Libraries imported successfully!")

In [None]:
# 🔧 Configuration - UPDATE THESE VALUES
PROJECT_ID = "your-project-id"  # Replace with your actual Google Cloud Project ID
DATASET_LOCATION = "us"  # Choose: us, eu, asia, etc.

print(f"🔧 Project ID: {PROJECT_ID}")
print(f"📍 Dataset Location: {DATASET_LOCATION}")
print("\n📝 Make sure to update PROJECT_ID above with your actual Google Cloud Project ID")

In [None]:
# 🏗️ Setup Class Definition
class BigQueryRetailAnalyticsSetup:
    """Setup and deployment class for the Intelligent Retail Analytics Engine"""

    def __init__(self, project_id: str, dataset_location: str = 'us'):
        self.project_id = project_id
        self.dataset_location = dataset_location
        self.datasets = ['retail_analytics', 'retail_models', 'retail_insights']

        # Check if gcloud is available
        self._check_gcloud_setup()

    def _check_gcloud_setup(self):
        """Verify Google Cloud SDK setup"""
        try:
            import subprocess
            result = subprocess.run(['gcloud', '--version'],
                                  capture_output=True, text=True, check=True)
            logger.info("Google Cloud SDK found")
            print("✅ Google Cloud SDK found")
        except (subprocess.CalledProcessError, FileNotFoundError):
            logger.error("Google Cloud SDK not found. Please install and configure gcloud CLI.")
            logger.error("Installation: https://cloud.google.com/sdk/docs/install")
            print("❌ Google Cloud SDK not found")
            print("Installation: https://cloud.google.com/sdk/docs/install")
            sys.exit(1)

    def _run_bq_command(self, command: str, description: str) -> bool:
        """Execute BigQuery command with error handling"""
        try:
            logger.info(f"Executing: {description}")
            import subprocess

            # Add project ID to command if not present
            if '--project_id' not in command and self.project_id not in command:
                command = f"bq --project_id={self.project_id} {command}"

            result = subprocess.run(command, shell=True, capture_output=True, text=True)

            if result.returncode == 0:
                logger.info(f"✅ {description} completed successfully")
                print(f"✅ {description} completed successfully")
                return True
            else:
                logger.error(f"❌ {description} failed")
                logger.error(f"Error: {result.stderr}")
                print(f"❌ {description} failed")
                print(f"Error: {result.stderr}")
                return False

        except Exception as e:
            logger.error(f"Exception during {description}: {str(e)}")
            print(f"❌ Exception during {description}: {str(e)}")
            return False

    def _run_gcloud_command(self, command: str, description: str) -> bool:
        """Execute gcloud command with error handling"""
        try:
            logger.info(f"Executing: {description}")
            import subprocess

            result = subprocess.run(command, shell=True, capture_output=True, text=True)

            if result.returncode == 0:
                logger.info(f"✅ {description} completed successfully")
                print(f"✅ {description} completed successfully")
                return True
            else:
                logger.error(f"❌ {description} failed")
                logger.error(f"Error: {result.stderr}")
                print(f"❌ {description} failed")
                print(f"Error: {result.stderr}")
                return False

        except Exception as e:
            logger.error(f"Exception during {description}: {str(e)}")
            print(f"❌ Exception during {description}: {str(e)}")
            return False

# Initialize setup
setup = BigQueryRetailAnalyticsSetup(PROJECT_ID, DATASET_LOCATION)
print("✅ Setup class initialized successfully!")

## 🗄️ Step 1: Create BigQuery Datasets

Create the required BigQuery datasets for the retail analytics system.

In [None]:
# 🗄️ Create BigQuery Datasets
def create_datasets():
    """Create required BigQuery datasets"""
    logger.info("Creating BigQuery datasets...")
    print("\n" + "="*60)
    print("🗄️ CREATING BIGQUERY DATASETS")
    print("="*60)

    success_count = 0
    for dataset in setup.datasets:
        dataset_id = f"{setup.project_id}:{dataset}"
        command = f"mk --dataset --location={setup.dataset_location} {dataset_id}"

        if setup._run_bq_command(command, f"Create dataset {dataset}"):
            success_count += 1
        else:
            logger.warning(f"Dataset {dataset} might already exist")
            print(f"⚠️ Dataset {dataset} might already exist")

    logger.info(f"Created {success_count}/{len(setup.datasets)} datasets")
    print(f"\n📊 Dataset Creation: {success_count}/{len(setup.datasets)} completed")
    return success_count > 0

# Run dataset creation
create_datasets()

## 🔗 Step 2: Setup Vertex AI Connection

Create a Vertex AI connection for BigQuery ML to access Vertex AI models.

In [None]:
# 🔗 Setup Vertex AI Connection
def setup_vertex_ai_connection():
    """Set up Vertex AI connection for BigQuery ML"""
    logger.info("Setting up Vertex AI connection...")
    print("\n" + "="*60)
    print("🔗 SETTING UP VERTEX AI CONNECTION")
    print("="*60)

    connection_name = "vertex-connection"
    command = f"mk --connection --connection_type=CLOUD_RESOURCE --location={setup.dataset_location} {connection_name}"

    if setup._run_bq_command(command, "Create Vertex AI connection"):
        logger.info("Vertex AI connection created successfully")
        print("✅ Vertex AI connection created successfully")
        print("\n📝 Note: You may need to grant the BigQuery Connection Service Account access to Vertex AI")
        return True
    else:
        logger.warning("Vertex AI connection setup may have failed")
        print("⚠️ Vertex AI connection setup may have failed")
        return False

# Run Vertex AI connection setup
setup_vertex_ai_connection()

## ☁️ Step 3: Enable Required APIs

Enable the necessary Google Cloud APIs for the system to function.

In [None]:
# ☁️ Enable Required APIs
def enable_required_apis():
    """Enable required Google Cloud APIs"""
    logger.info("Enabling required Google Cloud APIs...")
    print("\n" + "="*60)
    print("☁️ ENABLING REQUIRED GOOGLE CLOUD APIs")
    print("="*60)

    apis = [
        'bigquery.googleapis.com',
        'bigqueryconnection.googleapis.com',
        'aiplatform.googleapis.com'
    ]

    success_count = 0
    for api in apis:
        command = f"services enable {api}"
        if setup._run_gcloud_command(command, f"Enable {api}"):
            success_count += 1

    logger.info(f"Enabled {success_count}/{len(apis)} APIs")
    print(f"\n📊 API Enablement: {success_count}/{len(apis)} completed")
    return success_count == len(apis)

# Run API enablement
enable_required_apis()

## 📄 Step 4: Execute SQL Implementation

Run the main SQL implementation file to create all tables, models, and functions.

In [None]:
# 📄 Execute SQL Implementation
def run_sql_file():
    """Execute SQL file in BigQuery"""
    sql_file_path = "retail_analytics_engine.sql"
    
    if not Path(sql_file_path).exists():
        logger.error(f"SQL file not found: {sql_file_path}")
        print(f"❌ SQL file not found: {sql_file_path}")
        return False

    logger.info(f"Executing SQL file: {sql_file_path}")
    print("\n" + "="*60)
    print("📄 EXECUTING SQL IMPLEMENTATION")
    print("="*60)
    print(f"📁 SQL File: {sql_file_path}")

    command = f"query --use_legacy_sql=false < {sql_file_path}"

    return setup._run_bq_command(command, f"Execute {sql_file_path}")

# Run SQL implementation
run_sql_file()

## ⚙️ Step 5: Generate Configuration File

Create a configuration file for the analytics engine with all necessary settings.

In [None]:
# ⚙️ Generate Configuration File
def generate_config_file():
    """Generate configuration file for the analytics engine"""
    config = {
        'project_id': setup.project_id,
        'dataset_location': setup.dataset_location,
        'datasets': setup.datasets,
        'vertex_ai_connection': 'vertex-connection',
        'models': {
            'multimodal_embedding_model': 'retail_models.multimodal_embedding_model',
            'text_generation_model': 'retail_models.text_generation_model',
            'vision_model': 'retail_models.vision_model'
        },
        'performance_targets': {
            'query_timeout_seconds': 300,
            'max_embeddings_batch_size': 100,
            'vector_search_top_k': 10
        }
    }

    config_path = Path('retail_analytics_config.yaml')
    try:
        with open(config_path, 'w') as f:
            yaml.dump(config, f, default_flow_style=False)
        logger.info(f"Configuration file created: {config_path}")
        print("\n" + "="*60)
        print("⚙️ CONFIGURATION FILE GENERATED")
        print("="*60)
        print(f"📁 Config File: {config_path}")
        print("✅ Configuration file created successfully")
        return True
    except Exception as e:
        logger.error(f"Failed to create config file: {str(e)}")
        print(f"❌ Failed to create config file: {str(e)}")
        return False

# Generate configuration file
generate_config_file()

## 🔍 Step 6: Validate Setup

Run validation checks to ensure all components are working correctly.

In [None]:
# 🔍 Validate Setup
def validate_setup():
    """Validate the setup by checking key components"""
    logger.info("Validating setup...")
    print("\n" + "="*60)
    print("🔍 VALIDATING SETUP")
    print("="*60)

    validation_results = {}

    # Check datasets exist
    for dataset in setup.datasets:
        command = f"show {dataset}"
        validation_results[f"dataset_{dataset}"] = setup._run_bq_command(
            command, f"Validate dataset {dataset} exists"
        )

    # Check if we can run a simple query
    test_query = "SELECT 1 as test_value"
    command = f'query --use_legacy_sql=false "{test_query}"'
    validation_results["basic_query"] = setup._run_bq_command(
        command, "Test basic BigQuery query"
    )

    # Summary
    valid_components = sum(validation_results.values())
    total_components = len(validation_results)
    
    print(f"\n📊 Validation Results: {valid_components}/{total_components}")
    for component, status in validation_results.items():
        status_icon = "✅" if status else "❌"
        print(f"  {status_icon} {component.replace('_', ' ').title()}")
    
    return validation_results

# Run validation
validation_results = validate_setup()

## 🎉 Setup Summary

### ✅ Setup Components Completed:

1. **🗄️ Dataset Creation** - BigQuery datasets created
2. **🔗 Vertex AI Connection** - ML model access configured
3. **☁️ API Enablement** - Google Cloud APIs enabled
4. **📄 SQL Implementation** - Tables, models, and functions created
5. **⚙️ Configuration** - System configuration file generated
6. **🔍 Validation** - Setup validation completed

### 📊 System Architecture:
```
┌─────────────────────────────────────────────────────────────┐
│                    BIGQUERY AI SYSTEM                        │
├─────────────────────────────────────────────────────────────┤
│  🗄️ retail_analytics     🧠 retail_models     📊 retail_insights │
├─────────────────────────────────────────────────────────────┤
│  📦 Products & Reviews    🤖 ML Models         📈 Analytics     │
│  🧠 Embeddings           🔍 Vector Search     🎯 Insights       │
│  📝 Sentiment Analysis   🎨 Multimodal        📋 Reports        │
└─────────────────────────────────────────────────────────────┘
```

### 🚀 Next Steps:
1. **Test the system** with the demo notebook
2. **Run validation** with the test notebook
3. **Deploy to production** if needed
4. **Submit to Kaggle** competition

### 🏆 Competition Ready:
- ✅ **Complete BigQuery AI implementation**
- ✅ **All three approaches** (Generative AI, Vector Search, Multimodal)
- ✅ **Production-ready architecture**
- ✅ **Live demo available**
- ✅ **Enterprise-grade quality**

**Your Intelligent Retail Analytics Engine is now fully operational!** 🎉

**Ready to win $100,000 and launch your SaaS business!** 🚀💰