# VUG Cross-Domain Recommendation Experiments on Kaggle

This notebook runs VUG (Virtual User Generation) cross-domain recommendation experiments on the Kaggle platform.

## Overview
- **VUG_CMF**: Combines VUG with Collective Matrix Factorization  
- **VUG_CLFM**: Combines VUG with CLFM (Cluster-Level Latent Factor Model)
- **VUG_BiTGCF**: Combines VUG with BiTGCF Graph Convolution
- **Ablation Studies**: Analyze individual components of the VUG model

## Notebook Structure
1. **Environment Setup** - Install dependencies and set up the Kaggle environment
2. **Import VUG Code** - Load and verify the VUG source code from a Kaggle dataset
3. **Run Experiments** - Execute the three VUG combination models
4. **Ablation Studies** - Optionally run VUG ablation variants if time permits
5. **Results Analysis** - Visualize and compare experimental results
6. **Save Results** - Export results for submission or further analysis

---

## 1. Set Up Kaggle Environment

First, we'll set up the Kaggle environment, install dependencies, and configure the workspace.

In [None]:
# Install required packages for VUG experiments
import subprocess
import sys
import os

def install_package(package):
    """Install package with error handling"""
    try:
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])
        print(f"✅ Successfully installed {package}")
    except subprocess.CalledProcessError:
        print(f"❌ Failed to install {package}")

# Essential packages for VUG experiments  
packages = [
    "recbole>=1.1.1",
    "torch>=1.9.0", 
    "scipy>=1.7.0",
    "pandas>=1.3.0",
    "scikit-learn>=1.0.0", 
    "PyYAML>=5.4.0",
    "colorlog>=6.4.0",
    "tqdm>=4.62.0",
    "matplotlib>=3.4.0",
    "seaborn>=0.11.0"
]

print("📦 Installing VUG dependencies...")
for package in packages:
    install_package(package)
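
Before reinstalling, it may help to see which of these requirements are already present in the Kaggle image. The following is a minimal sketch using only the standard library. The helper names `split_requirement` and `installed_version` are hypothetical and not part of the VUG code, and the parsing only handles the simple `name>=version` form used above.

```python
from importlib import metadata


def split_requirement(requirement):
    """Split a 'name>=version' string into (name, minimum_version)."""
    name, _, minimum = requirement.partition(">=")
    return name.strip(), (minimum.strip() or None)


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Report the status of a few of the requirements listed above
for requirement in ["recbole>=1.1.1", "torch>=1.9.0", "PyYAML>=5.4.0"]:
    name, minimum = split_requirement(requirement)
    found = installed_version(name)
    status = f"installed {found}" if found else "not installed"
    print(f"{name} (wants >= {minimum}): {status}")
```

This only reports what is installed; it does not compare version numbers, so a package older than the stated minimum would still show as installed.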

In [None]:
# Setup Kaggle working environment
import shutil
from pathlib import Path
import gc

def setup_kaggle_workspace():
    """Setup workspace for VUG experiments"""
    
    # Create working directory
    work_dir = Path("/kaggle/working/VUG")
    work_dir.mkdir(exist_ok=True)
    os.chdir(work_dir)
    
    # Add to Python path
    if str(work_dir) not in sys.path:
        sys.path.insert(0, str(work_dir))
    
    print(f"📁 Working directory: {work_dir}")
    
    # Check for VUG dataset in input
    kaggle_input = Path("/kaggle/input")
    vug_datasets = list(kaggle_input.glob("*vug*")) + list(kaggle_input.glob("*VUG*"))
    
    if vug_datasets:
        source_path = vug_datasets[0]
        print(f"📋 Found VUG dataset: {source_path}")
        
        # Copy VUG source code
        if (source_path / "recbole_cdr").exists():
            shutil.copytree(source_path / "recbole_cdr", work_dir / "recbole_cdr", dirs_exist_ok=True)
            print("✅ VUG source code copied")
            
        # Copy other files
        for pattern in ["*.py", "*.yaml", "*.txt"]:
            for file in source_path.glob(pattern):
                shutil.copy2(file, work_dir)
                print(f"📄 Copied {file.name}")
    else:
        print("⚠️  No VUG dataset found in /kaggle/input")
        print("💡 Please add VUG source code as a Kaggle dataset")
    
    return work_dir

# Setup workspace
workspace = setup_kaggle_workspace()
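
On a case-sensitive filesystem, the two glob patterns above match only names containing `vug` or `VUG`, so a mixed-case folder such as `Vug_Code` would be missed. As a hedged alternative, a case-insensitive scan could look like the sketch below. `find_dataset_dirs` is a hypothetical helper, and the demo uses a temporary directory in place of `/kaggle/input`.

```python
import tempfile
from pathlib import Path


def find_dataset_dirs(root, keyword="vug"):
    """Return subdirectories of root whose name contains keyword, ignoring case."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_dir() and keyword.lower() in p.name.lower()
    )


# Demo on a throwaway directory standing in for /kaggle/input
with tempfile.TemporaryDirectory() as tmp:
    for name in ["vug-cross-domain", "Vug_Code", "unrelated-data"]:
        (Path(tmp) / name).mkdir()
    matches = [p.name for p in find_dataset_dirs(tmp)]
    print(matches)
```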

In [None]:
# Check GPU availability and optimize for Kaggle
def check_and_optimize_gpu():
    """Check GPU and apply optimizations"""
    try:
        import torch
        
        if torch.cuda.is_available():
            gpu_name = torch.cuda.get_device_name(0)
            gpu_memory = torch.cuda.get_device_properties(0).total_memory / (1024**3)
            print(f"🎮 GPU Available: {gpu_name}")
            print(f"💾 GPU Memory: {gpu_memory:.1f} GB")
            
            # Clear GPU memory
            torch.cuda.empty_cache()
            return True
        else:
            print("⚠️  No GPU available - using CPU")
            return False
            
    except ImportError:
        print("⚠️  PyTorch not available")
        return False

# Environment optimizations
os.environ['CUDA_LAUNCH_BLOCKING'] = '0'
os.environ['TOKENIZERS_PARALLELISM'] = 'false'

# Check GPU
has_gpu = check_and_optimize_gpu()

# Memory cleanup
gc.collect()

print("✅ Environment setup complete!")

## 2. Import and Verify VUG Code

Import the VUG models and verify they're working correctly.

In [None]:
# Import VUG models and verify installation
def verify_vug_installation():
    """Verify that VUG models can be imported"""
    
    try:
        # Import RecBole CDR
        from recbole_cdr.quick_start import run_recbole_cdr
        print("✅ RecBole CDR imported successfully")
        
        # Import VUG models
        from recbole_cdr.model.cross_domain_recommender.vug_cmf import VUG_CMF
        from recbole_cdr.model.cross_domain_recommender.vug_clfm import VUG_CLFM  
        from recbole_cdr.model.cross_domain_recommender.vug_bitgcf import VUG_BiTGCF
        from recbole_cdr.model.cross_domain_recommender.vug import VUG
        
        print("✅ VUG models imported successfully:")
        print("   - VUG_CMF (VUG + Collective Matrix Factorization)")
        print("   - VUG_CLFM (VUG + Cluster-Level Latent Factor Model)")
        print("   - VUG_BiTGCF (VUG + BiTGCF Graph Convolution)")
        print("   - VUG (Base model for ablation studies)")
        
        return True
        
    except ImportError as e:
        print(f"❌ Import failed: {e}")
        print("💡 Make sure VUG source code is uploaded as a Kaggle dataset")
        return False

# Verify installation
installation_ok = verify_vug_installation()

In [None]:
# Check available datasets and configurations
def check_datasets_and_configs():
    """Check available datasets and configuration files"""
    
    # Check datasets
    dataset_dir = Path("recbole_cdr/dataset")
    if dataset_dir.exists():
        datasets = [d.name for d in dataset_dir.iterdir() if d.is_dir()]
        print(f"📊 Available datasets: {datasets}")
    else:
        print("⚠️  No datasets found")
    
    # Check model configs
    config_dir = Path("recbole_cdr/properties/model")
    if config_dir.exists():
        configs = [f.stem for f in config_dir.glob("*.yaml")]
        print(f"⚙️  Available model configs: {configs}")
    else:
        print("⚠️  No model configs found")
    
    # Check dataset configs
    dataset_config_dir = Path("recbole_cdr/properties/dataset") 
    if dataset_config_dir.exists():
        dataset_configs = [f.stem for f in dataset_config_dir.glob("*.yaml")]
        print(f"📋 Available dataset configs: {dataset_configs}")
    else:
        print("⚠️  No dataset configs found")

check_datasets_and_configs()

## 3. Run VUG Model Experiments

Run the three VUG combination models with Kaggle optimizations.

In [None]:
# Kaggle Experiment Runner Class
import json
import time
from datetime import datetime

class KaggleVUGRunner:
    """Optimized VUG experiment runner for Kaggle"""
    
    def __init__(self):
        self.results_dir = Path("/kaggle/working/results")
        self.results_dir.mkdir(exist_ok=True)
        self.start_time = time.time()
        self.max_runtime = 8.5 * 3600  # 8.5 hours
        
    def check_time(self):
        """Check remaining time"""
        elapsed = time.time() - self.start_time
        remaining = (self.max_runtime - elapsed) / 3600
        print(f"⏱️  Time remaining: {remaining:.1f} hours")
        return remaining > 0.5  # At least 30 minutes
    
    def kaggle_config(self):
        """Kaggle-optimized configuration"""
        return {
            'train_epochs': ['BOTH:30', 'TARGET:15'],  # Reduced for Kaggle
            'embedding_size': 32,  # Smaller for faster training
            'train_batch_size': 512,
            'eval_batch_size': 1024,
            'eval_step': 5,
            'stopping_step': 5,
            'learning_rate': 0.001,
            'n_layers': 1,  # For BiTGCF
        }
    
    def run_experiment(self, model_name, dataset='Amazon'):
        """Run single experiment"""
        if not self.check_time():
            return None
            
        print(f"🚀 Running {model_name}...")
        
        try:
            from recbole_cdr.quick_start import run_recbole_cdr
            
            # Configuration files
            dataset_config = f'./recbole_cdr/properties/dataset/{dataset}.yaml'
            model_config = f'./recbole_cdr/properties/model/{model_name}.yaml'
            
            # Run with Kaggle config
            start = time.time()
            result = run_recbole_cdr(
                model=model_name,  # keep the underscore: the classes are named VUG_CMF, VUG_CLFM, VUG_BiTGCF
                config_file_list=[dataset_config, model_config],
                config_dict=self.kaggle_config()
            )
            runtime = time.time() - start
            
            # Extract results
            results = {
                'model': model_name,
                'dataset': dataset, 
                'runtime_minutes': round(runtime/60, 2),
                'timestamp': datetime.now().isoformat(),
                'metrics': {}
            }
            
            if 'test_result' in result and 'rec' in result['test_result']:
                test_metrics = result['test_result']['rec']
                for metric in ['HR@10', 'HR@20', 'NDCG@10', 'NDCG@20']:
                    if metric in test_metrics:
                        results['metrics'][metric] = float(test_metrics[metric])
            
            # Save results
            result_file = self.results_dir / f"{model_name}_{dataset}_results.json"
            with open(result_file, 'w') as f:
                json.dump(results, f, indent=2)
            
            print(f"✅ {model_name} completed in {runtime/60:.1f} minutes")
            return results
            
        except Exception as e:
            print(f"❌ {model_name} failed: {e}")
            return None

# Initialize runner
runner = KaggleVUGRunner()
print("🎯 Kaggle VUG Runner initialized")
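
The time check in `KaggleVUGRunner` can be isolated and tested without waiting for real time to pass. The sketch below is an illustration only: `TimeBudget` and its injectable `clock` argument are hypothetical names, while the 8.5-hour limit and 30-minute reserve mirror the values used above.

```python
import time


class TimeBudget:
    """Track elapsed time against a session limit, keeping a safety reserve."""

    def __init__(self, limit_hours=8.5, reserve_hours=0.5, clock=time.time):
        self.limit = limit_hours * 3600
        self.reserve = reserve_hours * 3600
        self.clock = clock
        self.start = clock()

    def remaining_hours(self):
        """Hours left before the session limit is reached."""
        return (self.limit - (self.clock() - self.start)) / 3600

    def has_time(self):
        """True while more than the reserve is left."""
        return self.remaining_hours() * 3600 > self.reserve


# Simulate a session with a fake clock instead of sleeping
now = [0.0]
budget = TimeBudget(clock=lambda: now[0])
fresh = budget.has_time()   # start of the session
now[0] = 8.25 * 3600        # 8.25 hours later, 0.25 hours left
late = budget.has_time()
print(fresh, late)
```

Passing the clock as an argument keeps the logic identical to the runner's while letting a test advance time instantly.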

In [None]:
# Run VUG combination models
vug_models = ['VUG_CMF', 'VUG_CLFM', 'VUG_BiTGCF']
vug_results = {}

print("🎯 Running VUG Combination Models")
print("="*50)

for model in vug_models:
    if runner.check_time():
        result = runner.run_experiment(model, 'Amazon')
        if result:
            vug_results[model] = result
            
            # Display results
            print(f"\n📊 {model} Results:")
            for metric, value in result['metrics'].items():
                print(f"   {metric}: {value:.4f}")
        
        # Memory cleanup
        gc.collect()
        if 'torch' in sys.modules:
            import torch
            if torch.cuda.is_available():
                torch.cuda.empty_cache()
    else:
        print(f"⏰ Skipping {model} due to time limit")
        break

print(f"\n✅ Completed {len(vug_results)} out of {len(vug_models)} models")
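
The runner expects metrics under `result['test_result']['rec']` with names such as `HR@10`. If the installed code returns a flat dictionary or different capitalization (an assumption worth checking against your copy of the VUG code), a tolerant lookup along these lines may help. `extract_metrics` is a hypothetical helper, and the sample dictionaries are made up for illustration.

```python
def extract_metrics(test_result, wanted=("HR@10", "HR@20", "NDCG@10", "NDCG@20")):
    """Collect the wanted metrics from a flat or 'rec'-nested result dict.

    Metric names are matched case-insensitively; missing metrics are skipped.
    """
    if not isinstance(test_result, dict):
        return {}
    source = test_result.get("rec", test_result)
    if not isinstance(source, dict):
        return {}
    lowered = {str(key).lower(): value for key, value in source.items()}
    return {
        name: float(lowered[name.lower()])
        for name in wanted
        if name.lower() in lowered
    }


# Made-up inputs covering both shapes
nested = {"rec": {"HR@10": 0.12, "NDCG@10": 0.07}}
flat = {"hr@10": 0.12, "ndcg@20": 0.09}
print(extract_metrics(nested))
print(extract_metrics(flat))
```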

## 4. Run Ablation Studies (Optional)

If time permits, run ablation studies to analyze VUG components.

In [None]:
# Run ablation studies if time permits
ablation_configs = {
    'VUG_wo_constrain': {'gen_weight': 0.0},
    'VUG_wo_super': {'gen_weight': 0.0, 'enhance_weight': 0.0},
    'VUG_wo_user_attn': {'user_weight_attn': 0.0}, 
    'VUG_wo_item_attn': {'user_weight_attn': 1.0},
    'VUG_full': {}
}

ablation_results = {}

if runner.check_time() and len(vug_results) > 0:
    print("🧪 Running Ablation Studies")
    print("="*40)
    
    for variant, config_updates in ablation_configs.items():
        if not runner.check_time():
            print("⏰ Time limit reached, stopping ablation studies")
            break
            
        print(f"\n🔬 Running {variant}...")
        
        try:
            from recbole_cdr.quick_start import run_recbole_cdr
            
            # Merge configs
            full_config = runner.kaggle_config()
            full_config.update(config_updates)
            
            # Use base VUG model with modified config
            start = time.time()
            result = run_recbole_cdr(
                model='VUG',
                config_file_list=['./recbole_cdr/properties/dataset/Amazon.yaml',
                                './recbole_cdr/properties/model/VUG.yaml'],
                config_dict=full_config
            )
            runtime = time.time() - start
            
            # Extract results
            variant_result = {
                'variant': variant,
                'runtime_minutes': round(runtime/60, 2),
                'metrics': {}
            }
            
            if 'test_result' in result and 'rec' in result['test_result']:
                test_metrics = result['test_result']['rec']
                for metric in ['HR@10', 'HR@20', 'NDCG@10', 'NDCG@20']:
                    if metric in test_metrics:
                        variant_result['metrics'][metric] = float(test_metrics[metric])
            
            ablation_results[variant] = variant_result
            
            print(f"✅ {variant} completed")
            for metric, value in variant_result['metrics'].items():
                print(f"   {metric}: {value:.4f}")
                
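            # Cleanup (also release cached GPU memory, as in the main model loop)
            gc.collect()
            if 'torch' in sys.modules:
                import torch
                if torch.cuda.is_available():
                    torch.cuda.empty_cache()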
            
        except Exception as e:
            print(f"❌ {variant} failed: {e}")

else:
    print("⚠️  Skipping ablation studies (insufficient time or no baseline results)")
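
Each ablation variant is the shared Kaggle configuration with a few keys overridden. The sketch below shows that merge in isolation, using the non-mutating `{**base, **overrides}` form. `build_variant_configs` is a hypothetical helper, and the base values are a shortened copy of those above.

```python
def build_variant_configs(base, variants):
    """Return {variant_name: merged_config} without modifying base."""
    return {name: {**base, **overrides} for name, overrides in variants.items()}


base_config = {"embedding_size": 32, "learning_rate": 0.001}
variants = {
    "VUG_wo_constrain": {"gen_weight": 0.0},
    "VUG_full": {},
}

merged = build_variant_configs(base_config, variants)
for name, config in merged.items():
    print(name, config)
```

Building all merged configurations up front makes it easy to inspect exactly which keys differ between variants before spending training time on them.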

## 5. Visualize and Analyze Results

Create visualizations and comparisons of the experimental results.

In [None]:
# Create results visualization and comparison tables
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

def create_results_table():
    """Create comprehensive results table"""
    
    all_results = []
    
    # Add VUG model results
    for model, result in vug_results.items():
        row = {'Model': model, 'Type': 'VUG_Combination'}
        row.update(result['metrics'])
        row['Runtime_min'] = result['runtime_minutes']
        all_results.append(row)
    
    # Add ablation results
    for variant, result in ablation_results.items():
        row = {'Model': variant, 'Type': 'Ablation'}
        row.update(result['metrics'])  
        row['Runtime_min'] = result['runtime_minutes']
        all_results.append(row)
    
    if all_results:
        df = pd.DataFrame(all_results)
        return df
    else:
        return None

# Create and display results table
results_df = create_results_table()

if results_df is not None:
    print("📊 Experimental Results Summary")
    print("="*60)
    
    # Display formatted table
    pd.set_option('display.precision', 4)
    print(results_df.to_string(index=False))
    
    # Save to CSV
    csv_path = runner.results_dir / "experiment_results.csv"
    results_df.to_csv(csv_path, index=False)
    print(f"\n💾 Results saved to: {csv_path}")
else:
    print("⚠️  No results to display")

In [None]:
# Create visualizations
if results_df is not None and len(results_df) > 1:
    
    # Set up plotting
    plt.style.use('default')
    fig, axes = plt.subplots(2, 2, figsize=(15, 10))
    fig.suptitle('VUG Experiments Results Comparison', fontsize=16, fontweight='bold')
    
    # Define metrics to plot
    metrics = ['HR@10', 'HR@20', 'NDCG@10', 'NDCG@20']
    
    for i, metric in enumerate(metrics):
        ax = axes[i//2, i%2]
        
        if metric in results_df.columns:
            # Create bar plot
            sns.barplot(data=results_df, x='Model', y=metric, hue='Type', ax=ax)
            ax.set_title(f'{metric} Comparison', fontweight='bold')
            ax.set_xlabel('Model')
            ax.set_ylabel(metric)
            ax.tick_params(axis='x', rotation=45)
            
            # Add value labels on bars
            for container in ax.containers:
                ax.bar_label(container, fmt='%.3f', fontsize=8)
        else:
            ax.text(0.5, 0.5, f'{metric}\nNot Available', 
                   ha='center', va='center', transform=ax.transAxes)
            ax.set_title(f'{metric} - No Data')
    
    plt.tight_layout()
    plt.savefig(runner.results_dir / 'results_comparison.png', dpi=300, bbox_inches='tight')
    plt.show()
    
    print("📈 Visualization created and saved")
else:
    print("📊 Insufficient data for visualization")

## 6. Save and Export Results

Package all results for download or further analysis.

In [None]:
# Create comprehensive results package
import zipfile
from datetime import datetime

def create_results_package():
    """Package all results for download"""
    
    # Create summary report
    report_file = runner.results_dir / "experiment_report.txt"
    
    with open(report_file, 'w') as f:
        f.write("VUG Cross-Domain Recommendation Experiments\n")
        f.write("=" * 50 + "\n")
        f.write(f"Experiment Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")
        f.write("Platform: Kaggle\n")
        f.write(f"Total Runtime: {(time.time() - runner.start_time)/3600:.2f} hours\n\n")
        
        # VUG Models Results
        f.write("VUG Combination Models:\n")
        f.write("-" * 30 + "\n")
        for model, result in vug_results.items():
            f.write(f"\n{model}:\n")
            f.write(f"  Runtime: {result['runtime_minutes']:.1f} minutes\n")
            for metric, value in result['metrics'].items():
                f.write(f"  {metric}: {value:.4f}\n")
        
        # Ablation Results  
        if ablation_results:
            f.write("\n\nAblation Study Results:\n")
            f.write("-" * 30 + "\n")
            for variant, result in ablation_results.items():
                f.write(f"\n{variant}:\n")
                f.write(f"  Runtime: {result['runtime_minutes']:.1f} minutes\n")
                for metric, value in result['metrics'].items():
                    f.write(f"  {metric}: {value:.4f}\n")
        
        # Best Results Summary
        if results_df is not None:
            f.write("\n\nBest Results Summary:\n")
            f.write("-" * 30 + "\n")
            for metric in ['HR@10', 'NDCG@10']:
                if metric in results_df.columns:
                    best_idx = results_df[metric].idxmax()
                    best_model = results_df.loc[best_idx, 'Model']
                    best_value = results_df.loc[best_idx, metric]
                    f.write(f"Best {metric}: {best_model} ({best_value:.4f})\n")
    
    # Create zip package
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    zip_path = Path(f"/kaggle/working/VUG_Results_{timestamp}.zip")
    
    with zipfile.ZipFile(zip_path, 'w') as zipf:
        # Add all result files. The report is written into results_dir,
        # so this loop already includes it; adding it again would create
        # a duplicate entry in the archive.
        for file in runner.results_dir.glob("*"):
            if file.is_file():
                zipf.write(file, file.name)
    
    return zip_path, report_file

# Create results package
if vug_results or ablation_results:
    zip_path, report_path = create_results_package()
    
    print("📦 Results Package Created")
    print("=" * 40)
    print(f"📁 Zip file: {zip_path}")
    print(f"📄 Report: {report_path}")
    print(f"📊 Results directory: {runner.results_dir}")
    
    # Display summary
    print("\n📋 Experiment Summary:")
    print(f"   VUG Models completed: {len(vug_results)}")
    print(f"   Ablation studies completed: {len(ablation_results)}")
    print(f"   Total runtime: {(time.time() - runner.start_time)/3600:.2f} hours")
    
else:
    print("⚠️  No results to package")
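
Once the archive exists, its contents can be checked before downloading. The sketch below builds a small archive in a temporary directory and reports any entry name stored more than once. `duplicate_entries` is a hypothetical helper, and the archive name `results.zip` is a placeholder.

```python
import tempfile
import zipfile
from collections import Counter
from pathlib import Path


def duplicate_entries(zip_path):
    """Return entry names that occur more than once in the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        counts = Counter(zf.namelist())
    return sorted(name for name, n in counts.items() if n > 1)


with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "experiment_report.txt").write_text("report")
    (tmp / "experiment_results.csv").write_text("Model,HR@10\n")

    # Package every file except the archive itself
    archive = tmp / "results.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        for file in sorted(tmp.glob("*")):
            if file.is_file() and file != archive:
                zf.write(file, file.name)

    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
    dupes = duplicate_entries(archive)
    print(names, dupes)
```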

## 🎯 Instructions for Running on Kaggle

### Step-by-step process:

#### 1. **Create Kaggle Dataset**
   - Zip your entire VUG project folder (including `recbole_cdr/`, `*.py`, `*.yaml`)
   - Upload to Kaggle as a new dataset with title "VUG Cross-Domain Recommendation"
   - Make it public or private as needed

#### 2. **Create New Kaggle Notebook**
   - Start a new Kaggle notebook
   - Enable GPU accelerator (recommended)
   - Add your VUG dataset to the notebook inputs

#### 3. **Copy This Notebook**
   - Copy all cells from this notebook to your Kaggle notebook
   - Run cells sequentially from top to bottom

#### 4. **Monitor Progress**
   - Check time remaining and adjust experiments accordingly
   - Results are automatically saved to `/kaggle/working/results/`
   - Download the final zip file with all results

#### 5. **Expected Runtime**
   - VUG_CMF: ~30-45 minutes  
   - VUG_CLFM: ~30-45 minutes
   - VUG_BiTGCF: ~45-60 minutes  
   - Ablation studies: ~20-30 minutes each

### 💡 **Tips for Success:**
- Use GPU acceleration for faster training
- Reduce epochs if running out of time
- Monitor memory usage with large datasets
- Save intermediate results frequently
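
Because each run writes `{model}_{dataset}_results.json`, a restarted session can skip models that already finished, assuming the results directory survives the restart. The following is a minimal sketch of that check. `pending_models` is a hypothetical helper, and the demo writes to a temporary directory rather than `/kaggle/working/results`.

```python
import json
import tempfile
from pathlib import Path


def pending_models(models, results_dir, dataset="Amazon"):
    """Return the models that have no saved result file yet."""
    results_dir = Path(results_dir)
    return [
        model for model in models
        if not (results_dir / f"{model}_{dataset}_results.json").exists()
    ]


with tempfile.TemporaryDirectory() as tmp:
    # Pretend VUG_CMF finished in an earlier session
    done = Path(tmp) / "VUG_CMF_Amazon_results.json"
    done.write_text(json.dumps({"model": "VUG_CMF", "metrics": {}}))

    todo = pending_models(["VUG_CMF", "VUG_CLFM", "VUG_BiTGCF"], tmp)
    print(todo)
```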

---
**Happy experimenting! 🚀**