# Fraud Detection ML - Google Colab Integration

This notebook connects your local fraud detection project to Google Colab's GPU resources. It lets you:

1. Clone your GitHub repository (if you have one)
2. Upload your local data files
3. Install required dependencies
4. Run training and evaluation with Colab's GPU
5. Download results back to your local machine

**Note**: Make sure to save this notebook to your Google Drive and open it with Google Colab.

## 1. Check GPU Availability

First, let's check if we have access to a GPU in this Colab session.

In [None]:
# Check if GPU is available
!nvidia-smi

# Check TensorFlow GPU access
import tensorflow as tf
print("TensorFlow version:", tf.__version__)
print("GPU Available: ", tf.config.list_physical_devices('GPU'))
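
If TensorFlow reports an empty GPU list, it helps to know whether the runtime has no GPU at all or whether TensorFlow simply cannot see it. The sketch below is a framework-independent check that uses only the standard library. The helper name `gpu_names` is hypothetical and not part of the project.

```python
import shutil
import subprocess


def gpu_names():
    """Return GPU names reported by nvidia-smi, or [] if none are visible."""
    # nvidia-smi is not on PATH on CPU-only runtimes
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


names = gpu_names()
if names:
    print("GPU(s) detected:", ", ".join(names))
else:
    print("No GPU detected. In Colab, use Runtime > Change runtime type to select a GPU.")
```

If this reports a GPU while TensorFlow lists none, the problem is more likely the TensorFlow installation than the runtime type.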

## 2. Setup Project Repository

You can either clone your project from GitHub (if it's in a repository) or upload your files directly.

### Option 1: Clone from GitHub (if available)

In [None]:
# Uncomment and modify if your project is in a GitHub repository
# !git clone https://github.com/your-username/fraud-detection-ml.git
# %cd fraud-detection-ml

### Option 2: Upload Files Directly

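If your project is not in a GitHub repository, skip Option 1 and upload your files directly. Run the cells in this option only if you did not clone the repository; otherwise a second `fraud-detection-ml` directory is created inside the clone.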

In [None]:
# Create project directory
!mkdir -p fraud-detection-ml
%cd fraud-detection-ml

In [None]:
# Mount Google Drive to access uploaded files
from google.colab import drive
drive.mount('/content/drive')

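You can now upload your project files to Google Drive and access them here under `/content/drive/MyDrive/`. Alternatively, you can use the file upload widget below to upload key files directly to this Colab session. Files uploaded through the widget live only in the session and are lost when the runtime is recycled.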
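
For the Drive route, one way to bring the project into the session is to copy the whole folder into the working directory. This is a sketch under assumptions: the Drive folder name `fraud-detection-ml` under `MyDrive` is a guess at where you placed the project, and `copy_project` is a hypothetical helper.

```python
import shutil
from pathlib import Path


def copy_project(src, dst="."):
    """Copy a project folder into dst, merging with existing directories.

    Returns True if the copy happened, False if src does not exist.
    """
    src = Path(src)
    if not src.is_dir():
        return False
    # dirs_exist_ok (Python 3.8+) lets the copy merge into an existing tree
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return True


# Assumed location; change it to wherever you put the project in Drive.
drive_project = "/content/drive/MyDrive/fraud-detection-ml"
if copy_project(drive_project):
    print("Copied project files from", drive_project)
else:
    print("Not found:", drive_project)
```

Copying into the session rather than working in Drive directly keeps the relative paths used later in this notebook (`src/`, `data/`, `results/`) unchanged.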

In [None]:
# File upload widget for direct uploads
from google.colab import files

# Create necessary directories
!mkdir -p src/models src/utils src/spark_jobs data/raw data/processed results/models

print("Upload your project files using the widget that appears below.")
print("You'll need to upload key files like:")
print("- config.yaml")
print("- src/models/train_model.py")
print("- src/models/evaluate_model.py")
print("- src/utils/*.py")
print("- data files")

uploaded = files.upload()

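# Move uploaded files to appropriate directories.
# files.upload() returns bare filenames with no directory part, so files are
# routed by name. Note that 'models' never appears in 'train_model.py', which
# is why the match is on 'model'. Adjust these rules to your own file names.
import os
import shutil
for filename in uploaded.keys():
    if filename.endswith('.py'):
        if 'model' in filename:
            dest = 'src/models'
        elif 'util' in filename:
            dest = 'src/utils'
        elif 'spark' in filename or filename == 'load_data.py':
            dest = 'src/spark_jobs'
        else:
            dest = 'src'
    elif filename.endswith(('.csv', '.parquet')):
        dest = 'data/raw'
    else:
        # config.yaml, requirements.txt, and anything else stay where they are
        print(f"Keeping {filename} in the current directory")
        continue
    # shutil.move copes with spaces in filenames, unlike an unquoted !mv
    shutil.move(filename, os.path.join(dest, filename))
    print(f"Moved {filename} to {dest}/")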

## 3. Install Dependencies

Let's install the required packages for our project.

In [None]:
# Upload requirements.txt file if not already uploaded
import os
from google.colab import files  # needed here if the upload cell above was skipped

if os.path.exists('requirements.txt'):
    print("requirements.txt already exists")
else:
    print("Please upload requirements.txt file")
    uploaded = files.upload()

In [None]:
# Install dependencies
%pip install -r requirements.txt

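# Install additional dependencies for Colab compatibility.
# These exact pins can conflict with requirements.txt or with packages
# preinstalled in Colab; relax them if pip reports a conflict.
%pip install pyspark==3.3.0 pyarrow==10.0.1 fastparquet==0.8.3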
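
After installation, it can be worth confirming that the packages the later cells rely on are importable, since a failed install is easy to miss in long pip output. The sketch below uses only the standard library. The module list is an assumption based on the packages mentioned in this notebook, and `missing_modules` is a hypothetical helper.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed list; adjust it to match your requirements.txt.
required = ["tensorflow", "pyspark", "pyarrow", "fastparquet", "yaml", "mlflow"]
missing = missing_modules(required)
print("Missing modules:", missing if missing else "none")
```

`find_spec` only locates a module without importing it, so the check is fast and does not initialize TensorFlow or Spark.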

## 4. Data Processing

Now let's process the data using your project's data processing script.

In [None]:
# List available data files
!ls -la data/raw/

In [None]:
# Process data using your load_data.py script
# Modify the paths as needed
!python src/spark_jobs/load_data.py --input data/raw/financial_fraud_detection_dataset.csv --output data/processed/transactions.parquet
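
If the input file is missing or was uploaded under a different name, the command above will not get far. A small pre-flight check, sketched here with the standard library, can catch that before the job starts. The helper name `describe_csv` is hypothetical; the path is the one used in the command above.

```python
import csv
from pathlib import Path


def describe_csv(path):
    """Return (size_in_bytes, header_columns) for a CSV file, or None if absent."""
    path = Path(path)
    if not path.is_file():
        return None
    with path.open(newline="") as handle:
        # Read only the first row; the file itself may be large
        header = next(csv.reader(handle), [])
    return path.stat().st_size, header


info = describe_csv("data/raw/financial_fraud_detection_dataset.csv")
if info is None:
    print("Input file not found; check the listing of data/raw/ and adjust --input.")
else:
    size, columns = info
    print(f"{size} bytes, {len(columns)} columns: {columns}")
```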

## 5. Model Training with GPU

Now we'll train the models using Colab's GPU.

In [None]:
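# Load project settings (MLflow experiment name, model directory, data path)
import yaml

# Load config
try:
    with open('config.yaml', 'r') as file:
        config = yaml.safe_load(file)
    print("Config loaded successfully")
except Exception as e:
    # Fall back to defaults that match the directories created earlier
    print(f"Error loading config: {e}; using default settings")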
    config = {
        'mlflow': {'experiment_name': 'Fraud_Detection_Experiment'},
        'models': {'output_dir': 'results/models'},
        'data': {'processed_path': 'data/processed/transactions.parquet'}
    }
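
Note that the fallback above only applies when loading fails outright. A `config.yaml` that loads but lacks one of these keys would still raise a `KeyError` in the later cells. One way to guard against that, sketched below, is to merge the loaded values over the defaults. `deep_merge` is a hypothetical helper, and the sample `loaded` dict stands in for the result of `yaml.safe_load`.

```python
import copy


def deep_merge(defaults, overrides):
    """Return a new dict: defaults, recursively updated with overrides."""
    merged = copy.deepcopy(defaults)
    for key, value in (overrides or {}).items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


DEFAULTS = {
    "mlflow": {"experiment_name": "Fraud_Detection_Experiment"},
    "models": {"output_dir": "results/models"},
    "data": {"processed_path": "data/processed/transactions.parquet"},
}

# Stand-in for a config.yaml that only sets the experiment name.
loaded = {"mlflow": {"experiment_name": "my_experiment"}}
config = deep_merge(DEFAULTS, loaded)
print(config["mlflow"]["experiment_name"], config["models"]["output_dir"])
```

Because `overrides or {}` treats `None` as empty, an empty `config.yaml` (which `yaml.safe_load` returns as `None`) also yields the full set of defaults.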

In [None]:
# Train classification model
!python src/models/train_model.py \
    --data-path {config['data']['processed_path']} \
    --model-type classification \
    --experiment-name {config['mlflow']['experiment_name']} \
    --model-dir {config['models']['output_dir']}

In [None]:
# Train autoencoder model
!python src/models/train_model.py \
    --data-path {config['data']['processed_path']} \
    --model-type autoencoder \
    --experiment-name {config['mlflow']['experiment_name']} \
    --model-dir {config['models']['output_dir']}
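
The `{...}` interpolation in the cells above pastes values into the shell command unquoted, so a path or experiment name containing spaces would be split into several arguments. An alternative, sketched below, is to build the argument list in Python and run it with `subprocess`. The flags are the ones used above. `build_train_command` is a hypothetical helper, `example_config` is illustrative, and the line that actually launches training is left commented out.

```python
import shlex
import sys


def build_train_command(config, model_type):
    """Build the train_model.py argument list for one model type."""
    return [
        sys.executable, "src/models/train_model.py",
        "--data-path", config["data"]["processed_path"],
        "--model-type", model_type,
        "--experiment-name", config["mlflow"]["experiment_name"],
        "--model-dir", config["models"]["output_dir"],
    ]


# Illustrative settings; note the spaces in the experiment name.
example_config = {
    "mlflow": {"experiment_name": "Fraud Detection Experiment"},
    "models": {"output_dir": "results/models"},
    "data": {"processed_path": "data/processed/transactions.parquet"},
}

for model_type in ("classification", "autoencoder"):
    cmd = build_train_command(example_config, model_type)
    # shlex.join shows the command with the quoting a shell would need
    print(shlex.join(cmd))
    # import subprocess; subprocess.run(cmd, check=True)  # uncomment to train
```

Passing a list to `subprocess.run` bypasses the shell, so each value arrives at the script as a single argument regardless of spaces.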

## 6. Model Evaluation

Let's evaluate the trained models.

In [None]:
# Evaluate classification model
!python src/models/evaluate_model.py \
    --model-path {config['models']['output_dir']}/classification_model.h5 \
    --test-data {config['data']['processed_path']} \
    --model-type classification \
    --output-dir results

In [None]:
# Evaluate autoencoder model
!python src/models/evaluate_model.py \
    --model-path {config['models']['output_dir']}/autoencoder_model.h5 \
    --test-data {config['data']['processed_path']} \
    --model-type autoencoder \
    --output-dir results \
    --percentile 95
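
How `evaluate_model.py` uses `--percentile` is not shown in this notebook. A common convention for autoencoder-based anomaly detection, assumed here, is to flag a transaction when its reconstruction error exceeds the given percentile of all errors. The sketch below illustrates that idea with the standard library only. The `percentile` function and the error values are made up for illustration and are not taken from the project.

```python
def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


# Made-up reconstruction errors, for illustration only.
errors = [0.02, 0.03, 0.01, 0.04, 0.02, 0.90, 0.03, 0.02, 0.05, 0.03]
threshold = percentile(errors, 95)
flagged = [i for i, err in enumerate(errors) if err > threshold]
print("threshold:", threshold, "flagged indices:", flagged)
```

Under this convention, a higher percentile flags fewer transactions, trading recall for precision.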

## 7. Download Results

Finally, let's download the trained models and evaluation results.

In [None]:
# Compress results for download
!zip -r fraud_detection_results.zip results/ {config['models']['output_dir']}/
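
Before downloading, it may be worth confirming that the archive actually contains the model files, since a training run that failed earlier would leave it without them. A sketch using the standard `zipfile` module follows. `list_archive` is a hypothetical helper; the `.h5` suffix matches the model filenames used in the evaluation cells.

```python
import zipfile
from pathlib import Path


def list_archive(path, suffix=""):
    """Return names of files in a zip archive, optionally filtered by suffix."""
    if not Path(path).is_file():
        return []
    with zipfile.ZipFile(path) as archive:
        # Skip directory entries, which end with a slash
        return [name for name in archive.namelist()
                if not name.endswith("/") and name.endswith(suffix)]


models = list_archive("fraud_detection_results.zip", suffix=".h5")
print("Model files in archive:", models if models else "none found")
```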

In [None]:
# Download the results
from google.colab import files
files.download('fraud_detection_results.zip')

## 8. Save Trained Models to Google Drive (Optional)

If you want to save your trained models to Google Drive for future use:

In [None]:
# Mount Google Drive (only prints a message if it is already mounted)
from google.colab import drive
drive.mount('/content/drive')

# Create a directory in Google Drive to save models
!mkdir -p /content/drive/MyDrive/fraud_detection_models

# Copy models and results to Google Drive
!cp -r {config['models']['output_dir']}/* /content/drive/MyDrive/fraud_detection_models/
!cp -r results/* /content/drive/MyDrive/fraud_detection_models/

print("Models and results saved to Google Drive at: /content/drive/MyDrive/fraud_detection_models/")