# Black Ice Integrated Pipeline - Kaggle Training

This notebook runs the integrated pipeline end to end on Kaggle and trains all models, including the temporal LSTM and Transformer models.

## Setup
1. Upload all Python files from your project
2. Upload training_data.csv
3. Run the cells below
4. Download the trained models from Model_output/

In [None]:
# Install required packages
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
!pip install scikit-learn pandas numpy matplotlib seaborn plotly
!pip install lightgbm xgboost catboost
!pip install flask requests
!pip install jupyter ipykernel

In [None]:
# Create project structure
import os
os.makedirs('Model_output', exist_ok=True)
os.makedirs('Model_output/deployment', exist_ok=True)
os.makedirs('Model_output/ensemble', exist_ok=True)

print("Project structure created")

In [None]:
# Upload your files here (use Kaggle's file upload interface)
# Make sure to upload:
# - integrated_advanced_pipeline.py
# - production_ensemble_pipeline.py
# - advanced_temporal_architecture.py
# - enhanced_multitf_pipeline.py
# - feature_engineering_smc_institutional.py
# - model_export.py
# - learning_curve_plotter.py
# - temporal_validation.py
# - recovery_mechanism.py
# - training_data.csv

print("Upload your files to the Kaggle notebook environment")
print("Files should be in the root directory")

In [None]:
# Verify files are uploaded
import os
files_to_check = [
    'integrated_advanced_pipeline.py',
    'production_ensemble_pipeline.py',
    'advanced_temporal_architecture.py',
    'enhanced_multitf_pipeline.py',
    'feature_engineering_smc_institutional.py',
    'model_export.py',
    'learning_curve_plotter.py',
    'temporal_validation.py',
    'recovery_mechanism.py',
    'training_data.csv'
]

missing_files = [f for f in files_to_check if not os.path.exists(f)]
if missing_files:
    print(f"Missing files: {missing_files}")
    print("Please upload the missing files")
else:
    print("All required files are present!")

In [None]:
# Run the integrated pipeline
import warnings

import torch  # required by the torch.cuda.is_available() check in the config below

from integrated_advanced_pipeline import main

warnings.filterwarnings('ignore')

# Configure the pipeline
config = {
    'data_path': 'training_data.csv',
    'save_path': 'Model_output',
    'sequence_length': 20,
    'batch_size': 64,  # Smaller batch size for Kaggle
    'epochs': 50,
    'learning_rate': 1e-3,
    'device': 'cuda' if torch.cuda.is_available() else 'cpu'
}

print(f"Using device: {config['device']}")
print("Starting integrated pipeline training...")

# Run the pipeline
system, results = main(config)

print("\n✅ Training completed successfully!")
print(f"Models saved to: {config['save_path']}")

In [None]:
# Verify models were created
import os

print("Checking created models...")

# Check deployment models
deployment_dir = 'Model_output/deployment'
if os.path.exists(deployment_dir):
    deployment_files = os.listdir(deployment_dir)
    print(f"\nDeployment models ({len(deployment_files)} files):")
    for f in sorted(deployment_files):
        print(f"  - {f}")

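# Check ensemble models
ensemble_dir = 'Model_output/ensemble'
ensemble_files = []  # stays empty if the directory is missing, so the temporal check cannot raise NameError
if os.path.exists(ensemble_dir):
    ensemble_files = os.listdir(ensemble_dir)
    print(f"\nEnsemble models ({len(ensemble_files)} files):")
    for f in sorted(ensemble_files):
        print(f"  - {f}")
else:
    print(f"\nEnsemble directory not found: {ensemble_dir}")

# Check for temporal models specifically (matched by file name)
temporal_models = [f for f in ensemble_files if 'lstm' in f.lower() or 'transformer' in f.lower()]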
if temporal_models:
    print(f"\n✅ Temporal models found: {temporal_models}")
else:
    print("\n❌ No temporal models found!")

In [None]:
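# Test model loading
# Note: this cell imports model_rest_server_proper.py, which is not in the
# upload list above. Upload it as well, or the import below will fail.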
try:
    from model_rest_server_proper import ProperModelServer
    
    print("Testing model server initialization...")
    server = ProperModelServer('Model_output')
    
    print("\n✅ Server initialized successfully!")
    print(f"Loaded models: {len(server.models['pytorch'])} PyTorch, {len(server.models['sklearn'])} sklearn, {len(server.models['temporal'])} temporal")
    
except Exception as e:
    print(f"❌ Server initialization failed: {e}")
    import traceback
    traceback.print_exc()

In [None]:
# Download the trained models
import zipfile
import os
from IPython.display import FileLink

# Create zip file of Model_output
def create_zip(zip_name, directory):
    with zipfile.ZipFile(zip_name, 'w', zipfile.ZIP_DEFLATED) as zipf:
        for root, dirs, files in os.walk(directory):
            for file in files:
                zipf.write(os.path.join(root, file), 
                          os.path.relpath(os.path.join(root, file), directory))
    return zip_name

# Create and download zip
zip_file = create_zip('trained_models.zip', 'Model_output')
print(f"Created zip file: {zip_file}")

# Display download link
FileLink(zip_file)

## Instructions

1. **Upload Files**: Use Kaggle's file upload interface to upload all your Python files and training_data.csv
2. **Run Cells**: Execute the cells in order
3. **Monitor Training**: The pipeline will train all models including temporal LSTM/Transformer
4. **Download Results**: Use the download link in the last cell to get trained_models.zip
5. **Copy to Local**: Extract the zip file into your local Python project
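
The final step, extracting the downloaded archive on your local machine, can be done with the standard library alone. The sketch below is a minimal example and not part of the pipeline: `extract_models` is a hypothetical helper name, and using `Model_output` as the local destination is an assumption you should adjust to your project layout.

```python
import zipfile
from pathlib import Path


def extract_models(zip_path, dest_dir):
    """Extract the archive into dest_dir and return the sorted list of extracted file names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        # testzip() returns the name of the first corrupt member, or None if all CRCs match
        bad = zf.testzip()
        if bad is not None:
            raise ValueError(f"Corrupt file in archive: {bad}")
        zf.extractall(dest)
        # Skip directory entries so only files are reported
        return sorted(name for name in zf.namelist() if not name.endswith('/'))


if __name__ == "__main__":
    # Destination folder is an assumption; change it to match your local project.
    if Path('trained_models.zip').exists():
        for name in extract_models('trained_models.zip', 'Model_output'):
            print(f"  - {name}")
```

Running the integrity check before extraction catches a truncated download early, which is easier to diagnose than a model file that fails to load later.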

## Expected Output
- 4 PyTorch feedforward models (TorchScript .pt files)
- 4 sklearn models (.pkl files)
- 2 temporal models (LSTM and Transformer state_dict .pth files)
- Feature scalers and ensemble configuration
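
The counts above can be checked automatically after training. The following is a rough sketch, not part of the pipeline: `count_by_extension` and `find_shortfalls` are hypothetical helpers, and the counts are treated as lower bounds because the file extensions of the scalers and the ensemble configuration are not stated here (the scalers may, for example, also be saved as .pkl files).

```python
import os
from collections import Counter

# Minimum counts taken from the expected-output list; scalers and configuration
# files are not listed because their extensions are not specified.
EXPECTED_MINIMUM = {'.pt': 4, '.pkl': 4, '.pth': 2}


def count_by_extension(directory):
    """Count files under directory (recursively), keyed by lowercase extension."""
    counts = Counter()
    for _root, _dirs, files in os.walk(directory):
        for name in files:
            counts[os.path.splitext(name)[1].lower()] += 1
    return counts


def find_shortfalls(directory, expected=EXPECTED_MINIMUM):
    """Return {extension: (expected_minimum, found)} for each extension with too few files."""
    counts = count_by_extension(directory)
    return {ext: (want, counts[ext]) for ext, want in expected.items() if counts[ext] < want}


if __name__ == "__main__":
    shortfalls = find_shortfalls('Model_output')
    if shortfalls:
        for ext, (want, found) in sorted(shortfalls.items()):
            print(f"Expected at least {want} {ext} files, found {found}")
    else:
        print("All expected model file types are present")
```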

## Troubleshooting
- If you get a CUDA out-of-memory error, reduce batch_size in config (for example, from 64 to 32)
- If training is too slow, reduce epochs
- Make sure all required files are uploaded
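
The batch-size advice can also be automated. The sketch below is one possible approach, not something the pipeline provides: `train_with_oom_retry` is a hypothetical wrapper, and it assumes the training function accepts the config dict and lets the out-of-memory error propagate instead of catching it internally.

```python
def train_with_oom_retry(train_fn, config, min_batch_size=8):
    """Call train_fn(config), halving batch_size after each CUDA out-of-memory error.

    Returns (result_of_train_fn, batch_size_that_worked).
    """
    cfg = dict(config)  # work on a copy so the caller's config is untouched
    while True:
        try:
            return train_fn(cfg), cfg['batch_size']
        except RuntimeError as err:
            # PyTorch reports CUDA OOM as a RuntimeError (or a subclass of it)
            # whose message contains "out of memory"; anything else is re-raised.
            if 'out of memory' not in str(err).lower():
                raise
            next_size = cfg['batch_size'] // 2
            if next_size < min_batch_size:
                raise  # give up rather than train with a tiny batch
            print(f"Out of memory at batch_size={cfg['batch_size']}, retrying with {next_size}")
            cfg['batch_size'] = next_size
```

In the training cell this might be used as `(system, results), used = train_with_oom_retry(main, config)`. In a real run you would probably also call `torch.cuda.empty_cache()` before each retry so that memory cached by the failed attempt is released.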