# ☁️ Jan Sunwai AI: Cloud Workflow (Colab / Kaggle)

Use this notebook to run **Dataset Sorting**, **Custom Classifier Training**, and **Model Evaluation** on Google Colab or Kaggle, where a free GPU speeds up processing.

### 📋 Prerequisite
Before uploading this notebook to Colab or Kaggle, prepare your code:
1. Zip your entire local `backend` folder, so that the archive contains `backend/` at its top level.
2. Name the archive `backend.zip`.
3. Keep it ready to upload in Step 3.
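
The zipping steps above can also be done from Python. This is a minimal sketch, assuming the folder is named `backend` and sits in the current directory; `zip_backend` is a hypothetical helper name, not part of the project.

```python
import shutil
from pathlib import Path


def zip_backend(project_root: str = ".", folder: str = "backend") -> Path:
    """Create <project_root>/<folder>.zip with <folder>/ as the top-level entry."""
    root = Path(project_root).resolve()
    if not (root / folder).is_dir():
        raise FileNotFoundError(f"No '{folder}' folder found in {root}")
    # root_dir + base_dir keep the 'backend/' prefix inside the archive, so
    # unzipping into '.' recreates ./backend/... as the later cells expect.
    archive = shutil.make_archive(
        str(root / folder), "zip", root_dir=str(root), base_dir=folder
    )
    return Path(archive)
```

Calling `zip_backend()` from the directory that contains `backend/` should leave a `backend.zip` next to it.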

In [None]:
# 1. Install Dependencies
!pip install transformers pillow torch tqdm sentencepiece scikit-learn joblib numpy
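
After the install finishes, it can help to confirm that every package is importable before running the longer steps. The sketch below uses only the standard library. The mapping from pip names to import names (`pillow` to `PIL`, `scikit-learn` to `sklearn`) reflects how those packages are imported; `missing_packages` is a hypothetical helper name.

```python
import importlib.util

# pip package name -> module name to import (they differ for some packages)
REQUIRED = {
    "transformers": "transformers",
    "pillow": "PIL",
    "torch": "torch",
    "tqdm": "tqdm",
    "sentencepiece": "sentencepiece",
    "scikit-learn": "sklearn",
    "joblib": "joblib",
    "numpy": "numpy",
}


def missing_packages(required: dict[str, str]) -> list[str]:
    """Return the pip names whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages(REQUIRED)
print("Missing packages:", missing or "none")
```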

In [None]:
# 2. Check GPU Availability
import torch

if torch.cuda.is_available():
    print(f"✅ GPU Detected: {torch.cuda.get_device_name(0)}")
else:
    print("⚠️ No GPU detected. In Colab, go to Runtime > Change runtime type > Hardware accelerator > T4 GPU.")

### 📤 Step 3: Upload Code & Data
Run the cell below, then click "Choose Files" and select your `backend.zip` file.

In [None]:
import os
import sys

# Detect environment
IS_COLAB = 'google.colab' in sys.modules
# 'kaggle_secrets' is only in sys.modules if something has already imported it,
# so also check KAGGLE_KERNEL_RUN_TYPE (assumed to be set inside Kaggle kernels).
IS_KAGGLE = 'kaggle_secrets' in sys.modules or 'KAGGLE_KERNEL_RUN_TYPE' in os.environ

if IS_COLAB:
    from google.colab import files
    print("📤 Please upload 'backend.zip' now...")
    uploaded = files.upload()

    if 'backend.zip' in uploaded:
        print("✅ backend.zip uploaded!")
        !unzip -o backend.zip -d .
        # Confirm the expected folder exists instead of assuming unzip succeeded.
        if os.path.isdir('backend'):
            print("✅ Unzipped successfully.")
        else:
            print("❌ Unzip finished, but no 'backend' folder was found. Check the archive layout.")
    else:
        print("❌ 'backend.zip' not found in upload. Please try again.")

elif IS_KAGGLE:
    # On Kaggle, files are usually added through the 'Data' tab on the right.
    # This assumes 'backend.zip' was added as a dataset.
    print("ℹ️ On Kaggle, add 'backend.zip' as a dataset.")
    print("Once it is added, copy it to the working directory and unzip it:")
    # !cp /kaggle/input/your-dataset-name/backend.zip ./backend.zip
    # !unzip -o backend.zip -d .

else:
    print("💻 Running locally. Skipping upload.")
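
The `unzip` shell command in the cell above reports problems only as console text. As an alternative sketch using Python's standard `zipfile` module, the helper below extracts the archive and then checks that the scripts called in the later steps are present. The name `extract_backend` is hypothetical; the script paths are the ones this notebook runs.

```python
import zipfile
from pathlib import Path

# Scripts that Steps 4-6 of this notebook expect to find after extraction.
EXPECTED_SCRIPTS = [
    "backend/sort_dataset.py",
    "backend/train_custom_classifier.py",
    "backend/evaluate_sorted_dataset.py",
]


def extract_backend(zip_path: str = "backend.zip", dest: str = ".") -> list[str]:
    """Extract zip_path into dest and return the expected scripts that are missing."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    return [s for s in EXPECTED_SCRIPTS if not (Path(dest) / s).is_file()]
```

For example, `missing = extract_backend()` returns an empty list when all three scripts were extracted, and otherwise names the ones to look for in your local `backend` folder.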

### 🔄 Step 4: Run Dataset Sorting
This organizes your unsorted images into the proper Authority Categories.

In [None]:
# Run the sorting script
if os.path.exists('backend/sort_dataset.py'):
    print("🚀 Starting Dataset Sort...")
    !python backend/sort_dataset.py
else:
    print("❌ Could not find backend/sort_dataset.py. Did the unzip work?")

### 🧠 Step 5: Train Custom Classifier (Recommended)
Zero-shot accuracy can be low (roughly 10-40%), so use this step to train a dedicated classifier head on your sorted data. This usually raises accuracy above 85%.

In [None]:
if os.path.exists('backend/train_custom_classifier.py'):
    print("🧠 Training Custom Head...")
    !python backend/train_custom_classifier.py
else:
    print("❌ Training script not found.")

### 📊 Step 6: Run Evaluation (Optional)
This evaluates the model on your newly sorted dataset, using the GPU, to confirm the accuracy improvement.

In [None]:
if os.path.exists('backend/evaluate_sorted_dataset.py'):
    print("🚀 Starting Evaluation...")
    !python backend/evaluate_sorted_dataset.py
else:
    print("❌ Script not found.")
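
A `!python ...` line does not stop the notebook when the script exits with an error, so with "Run all" a later step can start on incomplete output. Below is a minimal sketch of a stricter runner for Steps 4 to 6, assuming the scripts take no arguments (as in the cells above); `run_script` is a hypothetical helper, not part of the project.

```python
import subprocess
import sys
from pathlib import Path


def run_script(script_path: str) -> None:
    """Run a Python script with the current interpreter; raise if it is missing or fails."""
    if not Path(script_path).is_file():
        raise FileNotFoundError(f"{script_path} not found. Did the unzip work?")
    # check=True raises subprocess.CalledProcessError on a non-zero exit code,
    # which halts the cell instead of silently continuing.
    subprocess.run([sys.executable, script_path], check=True)
```

With this helper, a cell such as `run_script('backend/sort_dataset.py')` fails loudly, which makes it easier to see which step needs attention.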

### ⬇️ Step 7: Download Results
Zip the sorted dataset, the evaluation report, and the trained model so you can download them to your local machine.

In [None]:
# 1. Zip the results (includes the trained model)
!zip -r sorted_dataset_output.zip backend/sorted_dataset backend/evaluation_report_v2.csv backend/custom_classifier_head.pkl

# 2. Download
if IS_COLAB:
    files.download('sorted_dataset_output.zip')
    print("⬇️ Download started!")
elif IS_KAGGLE:
    print("✅ Output saved to /kaggle/working/sorted_dataset_output.zip. You can download it from the Output tab.")
else:
    print("💻 Running locally. The archive is at ./sorted_dataset_output.zip.")
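
Because Step 6 is optional, `evaluation_report_v2.csv` may not exist when the results are zipped. The sketch below uses the standard `zipfile` module to pack only the outputs that are present and to report the rest. `bundle_outputs` is a hypothetical helper; the paths are the ones from the `zip` command above.

```python
import zipfile
from pathlib import Path

# Output paths taken from the zip command in this notebook.
OUTPUTS = (
    "backend/sorted_dataset",
    "backend/evaluation_report_v2.csv",
    "backend/custom_classifier_head.pkl",
)


def bundle_outputs(
    archive_name: str = "sorted_dataset_output.zip",
    outputs: tuple[str, ...] = OUTPUTS,
    root: str = ".",
) -> list[str]:
    """Zip the outputs that exist under root; return the ones that were skipped."""
    base = Path(root)
    skipped = []
    with zipfile.ZipFile(base / archive_name, "w", zipfile.ZIP_DEFLATED) as archive:
        for rel in outputs:
            path = base / rel
            if path.is_dir():
                # Add every file in the folder, keeping paths relative to root.
                for file in sorted(path.rglob("*")):
                    if file.is_file():
                        archive.write(file, file.relative_to(base).as_posix())
            elif path.is_file():
                archive.write(path, rel)
            else:
                skipped.append(rel)
    return skipped
```

The returned list names any output that was missing, which is a quick way to notice that training or evaluation did not produce its file.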