# 🏥 Enhanced CVFBJTL-BCD: Breast Cancer Diagnosis on Kaggle

## Computer Vision with Fusion Based Joint Transfer Learning

**Target Performance:** >98.18% accuracy (paper baseline)

**GPU:** Tesla P100 / T4 (enable in Settings → Accelerator → GPU)

---

### 📋 Features:
- ✅ Gabor Filtering for noise reduction
- ✅ DenseNet201 + InceptionV3 + MobileNetV2 fusion
- ✨ **Vision Transformer (NOVELTY)**
- ✅ Stacked Autoencoder (SAE)
- ✅ SMOTE for dataset balancing
- 🔍 Grad-CAM explainability
- 📊 Comprehensive evaluation
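
The SMOTE step listed above balances the classes by synthesizing extra minority-class samples. The notebook's own balancing code lives in the project files and is not shown here, so the following is only a minimal, dependency-free sketch of the interpolation idea. The function name `smote_sketch`, its parameters, and the toy feature vectors are hypothetical; in practice `imblearn.over_sampling.SMOTE` from the `imbalanced-learn` package would normally be used instead.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (illustrative only)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy 2-D feature vectors standing in for extracted image features
minority_features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sketch(minority_features, n_new=6)
print(len(new_points))
```

Because every synthetic point lies on a segment between two real minority samples, oversampling of this kind should be applied to the training split only, so that no interpolated sample leaks information into validation or test data.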

---

### 🚀 Quick Start:
1. Enable GPU (Settings → Accelerator → GPU T4 x2)
2. Add BreaKHis dataset to input
3. Run all cells
4. Download outputs from `/kaggle/working/outputs/`

## 📦 Step 1: Setup Environment

In [None]:
# Check Python version
import sys
print(f"Python Version: {sys.version}")
print(f"Python Executable: {sys.executable}")

In [None]:
# Install additional packages if needed
# Kaggle has most packages pre-installed
!pip install --quiet imbalanced-learn albumentations

In [None]:
# Import all required libraries
import os
import sys
import warnings
warnings.filterwarnings('ignore')

import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
import seaborn as sns

print(f"TensorFlow Version: {tf.__version__}")
print(f"Keras Version: {keras.__version__}")
print(f"NumPy Version: {np.__version__}")

# Check GPU
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    print(f"\n✅ GPU Available: {len(gpus)} GPU(s)")
    for gpu in gpus:
        print(f"   {gpu}")
else:
    print("\n⚠️  No GPU found! Enable GPU: Settings → Accelerator → GPU")

## 📁 Step 2: Verify Dataset

**Important:** You need to upload the BreaKHis dataset to Kaggle:

1. Go to https://www.kaggle.com/datasets
2. Click "New Dataset"
3. Upload your BreaKHis_v1 folder
4. Add the dataset to this notebook (Add Data button)

Expected structure:
```
/kaggle/input/breakhis-dataset/
├── benign/
│   └── SOB/
└── malignant/
    └── SOB/
```
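
A quick way to confirm that layout is to count the `.png` files under each top-level class folder. This is a sketch that assumes the dataset follows the `benign/` and `malignant/` structure shown above; the helper name `count_images_by_class` is hypothetical and not part of the project files.

```python
from pathlib import Path


def count_images_by_class(dataset_root, classes=("benign", "malignant")):
    """Return {class_name: number of .png files found recursively}."""
    root = Path(dataset_root)
    counts = {}
    for name in classes:
        class_dir = root / name
        if class_dir.is_dir():
            counts[name] = sum(
                1 for p in class_dir.rglob("*") if p.suffix.lower() == ".png"
            )
        else:
            counts[name] = 0  # missing folder: report zero rather than fail
    return counts


# Path taken from the expected structure; adjust it if your dataset differs
print(count_images_by_class("/kaggle/input/breakhis-dataset"))
```

A count of zero for either class usually means the dataset was attached under a different name or has an extra wrapping folder.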

In [None]:
# List all available datasets in Kaggle input
print("📂 Available Kaggle Input Datasets:")
print("="*50)

if os.path.exists('/kaggle/input'):
    for item in os.listdir('/kaggle/input'):
        path = os.path.join('/kaggle/input', item)
        if os.path.isdir(path):
            print(f"\n📁 {item}/")
            # Show first level contents
            for sub in os.listdir(path)[:5]:
                print(f"   └── {sub}")
else:
    print("Not running on Kaggle")

In [None]:
# Verify BreaKHis dataset structure
# MODIFY THIS PATH if your dataset is named differently
DATASET_PATH = '/kaggle/input/breakhis-dataset'  # Change this if needed

print(f"Checking dataset at: {DATASET_PATH}")
print("="*50)

if os.path.exists(DATASET_PATH):
    print("✅ Dataset found!")

    # Count images
    total_images = 0
    for root, dirs, files in os.walk(DATASET_PATH):
        total_images += len([f for f in files if f.endswith('.png')])

    print(f"\n📊 Total images: {total_images}")
    print("\n📁 Directory structure:")

    for root, dirs, files in os.walk(DATASET_PATH):
        level = root.replace(DATASET_PATH, '').count(os.sep)
        indent = ' ' * 2 * level
        print(f"{indent}{os.path.basename(root)}/")
        if level < 3:  # Limit depth
            subindent = ' ' * 2 * (level + 1)
            for d in dirs[:3]:
                print(f"{subindent}├── {d}/")
        else:
            dirs[:] = []  # Stop os.walk from descending below the depth limit
else:
    print("❌ Dataset not found!")
    print("\n📥 Please upload BreaKHis dataset:")
    print("   1. Click 'Add Data' button on the right")
    print("   2. Upload your dataset")
    print("   3. Restart kernel")

## 🏗️ Step 3: Upload Project Files

You need to upload these Python files to Kaggle:

**Required files:**
1. `enhanced_cvfbjtl_bcd_model.py`
2. `breakhis_dataloader.py`
3. `advanced_explainability.py` (or `gradcam_explainability.py`)

**How to upload:**
- Method 1: Use "File" → "Upload Notebook or Script"
- Method 2: Copy files into cells below and save as .py files
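
Files attached through "Add Data" are typically mounted read-only under `/kaggle/input/<dataset-name>/` rather than in the working directory, so Python cannot import them until that folder is on `sys.path`. The sketch below searches for a folder containing the required files and adds it to the import path. The helper name `add_module_dir` and the default search root are assumptions for illustration, not part of the project.

```python
import os
import sys


def add_module_dir(required_files, search_root="/kaggle/input"):
    """Find the first directory under search_root that holds every required
    file, put it on sys.path, and return it (None if nothing matches)."""
    for root, _dirs, files in os.walk(search_root):
        if all(name in files for name in required_files):
            if root not in sys.path:
                sys.path.insert(0, root)
            return root
    return None


found = add_module_dir([
    "enhanced_cvfbjtl_bcd_model.py",
    "breakhis_dataloader.py",
])
print(f"Module directory: {found}")
```

If this prints `None`, the files were either not attached or were uploaded under different names.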

In [None]:
# Check if required files are present
required_files = [
    'enhanced_cvfbjtl_bcd_model.py',
    'breakhis_dataloader.py',
    'advanced_explainability.py'
]

print("Checking required files:")
print("="*50)

all_present = True
for file in required_files:
    if os.path.exists(file):
        print(f"✅ {file}")
    else:
        print(f"❌ {file} - NOT FOUND")
        all_present = False

if all_present:
    print("\n✅ All required files present!")
else:
    print("\n⚠️  Some files are missing. Please upload them.")
    print("\n📤 Upload instructions:")
    print("   1. Click 'Add Data' → 'Upload'")
    print("   2. Upload the Python files from your project")
    print("   3. Restart kernel")

## 🚀 Step 4: Run Training

This will:
1. Configure GPU for optimal performance
2. Load and preprocess BreaKHis dataset
3. Build Enhanced CVFBJTL-BCD model
4. Train for 50 epochs (with early stopping)
5. Evaluate on test set
6. Generate plots and Grad-CAM visualizations
7. Save all outputs to `/kaggle/working/outputs/`

**Expected time:** ~45-60 minutes on Tesla P100
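
The early stopping mentioned in item 4 is handled inside the training script, which is not shown in this notebook. The following sketch illustrates the usual patience-based rule on a plain list of validation losses. The `patience` value and the loss numbers are made up for illustration and are not the script's actual settings; in Keras the same behaviour is normally obtained with the `tf.keras.callbacks.EarlyStopping` callback.

```python
def early_stopping_epoch(val_losses, patience=10, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Training stops once `patience` consecutive epochs fail to improve the
    best validation loss by more than `min_delta`.
    """
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch


# Hypothetical validation losses: improvement stalls after epoch 3
losses = [0.90, 0.60, 0.45, 0.47, 0.46, 0.48, 0.50]
stop, best = early_stopping_epoch(losses, patience=3)
print(f"stopped at epoch {stop}, best epoch {best}")
```

With early stopping enabled, the 50-epoch figure is an upper bound, so the actual run time can be shorter than the estimate above.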

In [None]:
# Run the complete training pipeline.
# This requires kaggle_train_cvfbjtl_bcd.py in the working directory,
# alongside the files checked in Step 3.
%run kaggle_train_cvfbjtl_bcd.py

## 📊 Step 5: View Results

In [None]:
# Load and display results
import json

results_path = '/kaggle/working/outputs/results.json'

if os.path.exists(results_path):
    with open(results_path, 'r') as f:
        results = json.load(f)
    
    print("🎯 FINAL RESULTS")
    print("="*70)
    print(f"Test Accuracy:  {results['test_accuracy']*100:.2f}%")
    print(f"Test Precision: {results['test_precision']*100:.2f}%")
    print(f"Test Recall:    {results['test_recall']*100:.2f}%")
    print(f"Test F1-Score:  {results['test_f1']*100:.2f}%")

    print(f"\n📈 COMPARISON WITH PAPER:")
    print(f"Paper:          {results['paper_accuracy']}%")
    print(f"Our Model:      {results['test_accuracy']*100:.2f}%")
    print(f"Improvement:    {results['improvement']:+.2f}%")

    if results['improvement'] > 0:
        print("\n🎉 BETTER THAN PAPER! ✨")
else:
    print("Results not found. Training may not have completed.")

In [None]:
# Display training plots
from IPython.display import Image, display

plots = [
    'training_history.png',
    'confusion_matrix.png',
    'roc_curve.png'
]

for plot in plots:
    plot_path = f'/kaggle/working/outputs/plots/{plot}'
    if os.path.exists(plot_path):
        print(f"\n📊 {plot}:")
        display(Image(filename=plot_path))
    else:
        print(f"⚠️  {plot} not found")

In [None]:
# Display some Grad-CAM examples
gradcam_dir = '/kaggle/working/outputs/plots/gradcam'

if os.path.exists(gradcam_dir):
    gradcam_files = sorted([f for f in os.listdir(gradcam_dir) if f.endswith('.png')])[:5]
    
    print("🔍 GRAD-CAM VISUALIZATIONS (Sample):")
    print("="*70)
    
    for gradcam_file in gradcam_files:
        print(f"\n{gradcam_file}:")
        display(Image(filename=os.path.join(gradcam_dir, gradcam_file)))
else:
    print("No Grad-CAM visualizations found")

## 📥 Step 6: Download Outputs

All outputs are saved in `/kaggle/working/outputs/`

**Files to download:**
- `models/best_model.h5` - Trained model
- `results.json` - Evaluation metrics
- `TRAINING_REPORT.txt` - Complete report
- `plots/*.png` - All visualizations
- `logs/training_log.csv` - Training history

**How to download:**
1. Click "Save Version" to commit outputs
2. Go to "Output" tab
3. Download the entire `outputs` folder
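
Downloading many small files one by one can be tedious, so it may help to bundle the folder into a single archive first. This is a minimal sketch using the standard library's `shutil.make_archive`; the helper name `zip_outputs` and the archive name `cvfbjtl_outputs` are hypothetical. The archive is written next to the `outputs` folder rather than inside it, so it does not end up containing itself.

```python
import os
import shutil


def zip_outputs(output_dir, archive_base):
    """Zip output_dir into <archive_base>.zip and return the archive path,
    or None when the directory does not exist."""
    if not os.path.isdir(output_dir):
        return None
    return shutil.make_archive(archive_base, "zip", root_dir=output_dir)


archive = zip_outputs("/kaggle/working/outputs", "/kaggle/working/cvfbjtl_outputs")
print(archive or "Output directory not found")
```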

In [None]:
# List all generated files
output_dir = '/kaggle/working/outputs'

if os.path.exists(output_dir):
    print("📁 GENERATED FILES:")
    print("="*70)
    
    for root, dirs, files in os.walk(output_dir):
        level = root.replace(output_dir, '').count(os.sep)
        indent = '  ' * level
        print(f"{indent}{os.path.basename(root)}/")
        subindent = '  ' * (level + 1)
        for file in files:
            file_path = os.path.join(root, file)
            file_size = os.path.getsize(file_path) / 1024  # KB
            print(f"{subindent}├── {file} ({file_size:.1f} KB)")

    print("\n✅ All files ready for download!")
else:
    print("Output directory not found")

## 🎓 Step 7: Advanced Analysis (Optional)

In [None]:
# Load the trained model for further analysis
from tensorflow.keras.models import load_model

model_path = '/kaggle/working/outputs/models/best_model.h5'

if os.path.exists(model_path):
    print("Loading trained model...")
    model = load_model(model_path)
    print("✅ Model loaded successfully!")

    # Model summary
    print("\n📋 Model Summary:")
    model.summary()
else:
    print("Model file not found")

In [None]:
# Perform additional analysis on test set
# This cell is optional - for detailed analysis

# Load test data (you'll need to modify this based on your data)
# Example:
# from breakhis_dataloader import BreaKHisDataLoader
# loader = BreaKHisDataLoader(DATASET_PATH)
# X_test, y_test, _ = loader.load_dataset(...)
# predictions = model.predict(X_test)

## 📝 Summary

### Training Complete! 🎉

**Next Steps:**

1. **Download all outputs** from `/kaggle/working/outputs/`
2. **Review the training report** for detailed metrics
3. **Use the trained model** for predictions
4. **Include results** in your research paper

**For Research Paper:**
- Accuracy, Precision, Recall, F1-Score metrics ✓
- Confusion matrix visualization ✓
- ROC curve and AUC ✓
- Training/validation curves ✓
- Grad-CAM explanations ✓
- Comparison with baseline paper ✓
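
The four headline metrics in the list above follow directly from confusion-matrix counts. The sketch below uses made-up counts purely for illustration. `PAPER_ACCURACY` is the 98.18% baseline quoted at the top of this notebook, and computing the improvement as a percentage-point difference is an assumption about what the training script stores in `results.json`.

```python
PAPER_ACCURACY = 98.18  # baseline accuracy (%) quoted in this notebook


def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for illustration only, not results from this model
metrics = binary_metrics(tp=90, fp=5, fn=10, tn=95)
improvement = metrics["accuracy"] * 100 - PAPER_ACCURACY  # percentage points
print(f"Accuracy: {metrics['accuracy'] * 100:.2f}%  ({improvement:+.2f} pts vs paper)")
```

For a diagnostic task, recall on the malignant class is worth reporting alongside accuracy, since a missed malignant case is usually the costlier error.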

**Publication Ready!** 📄

---

### 📧 Support

If you encounter any issues:
1. Check GPU is enabled
2. Verify dataset path is correct
3. Ensure all Python files are uploaded
4. Check the training logs for errors

### 🌟 Citation

If you use this code, please cite the original paper:

```
Iniyan et al. (2024). "Enhanced breast cancer diagnosis through integration 
of computer vision with fusion based joint transfer learning using multi 
modality medical images." Scientific Reports, 14:28376.
```

---

**Good luck with your research! 🚀**