# Indic-OCR Training on Google Colab (with GPU)

This notebook walks through training the Indic-OCR model on a Google Colab GPU runtime. It covers:
- Mounting Google Drive
- Uploading and extracting your training data (`raw.zip`)
- Cloning or uploading the Indic-OCR code
- Installing dependencies
- Running the training script
- Saving outputs/models to Drive
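
Since the steps below assume a GPU runtime (in Colab: Runtime > Change runtime type), it may be worth confirming that a GPU is attached before going further. The following is a minimal sketch that relies only on the `nvidia-smi` tool being present on GPU runtimes; the helper name `gpu_available` is hypothetical and not part of Indic-OCR.

```python
import shutil
import subprocess


def gpu_available() -> bool:
    """Return True if `nvidia-smi` exists and lists at least one GPU."""
    # nvidia-smi ships with the NVIDIA driver, so it is missing on CPU-only runtimes.
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"],  # -L prints one line per attached GPU
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


has_gpu = gpu_available()
if has_gpu:
    print("GPU runtime detected")
else:
    print("No GPU detected; switch the runtime type before training")
```

If no GPU is reported, changing the runtime type restarts the session, so it is best done before mounting Drive or uploading data.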

## 1. Mount Google Drive
Mounting Drive makes its files available under `/content/drive`, so you can read your `raw.zip` training data from Drive and write trained models back to it.

In [None]:
from google.colab import drive
# Mount Google Drive
drive.mount('/content/drive')

## 2. Upload Training Data (`raw.zip`)
Upload `raw.zip` directly to the Colab session, or place it in Google Drive and point to it by path. The cell below shows both options.

In [None]:
# Option 1: Upload directly to Colab (choose file manually).
# Comment out these two lines if raw.zip is already in Google Drive.
from google.colab import files
uploaded = files.upload()

# Option 2: If already in Google Drive, use this path as raw_zip_path in step 3
# raw_zip_path = '/content/drive/MyDrive/path_to_your/raw.zip'
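
With two possible locations for the archive, a small helper can pick whichever one actually exists. This is only a sketch: `resolve_zip_path` is a hypothetical name, and the Drive path is the same placeholder used above, which you would replace with your own.

```python
import os


def resolve_zip_path(candidates):
    """Return the first path in `candidates` that is an existing file, else None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


# Placeholder locations: the direct-upload name first, then the Drive path.
candidates = [
    "raw.zip",
    "/content/drive/MyDrive/path_to_your/raw.zip",
]
found_zip = resolve_zip_path(candidates)
if found_zip is None:
    print("raw.zip not found in any candidate location")
else:
    print(f"Found archive at {found_zip}")
```

If a path is printed, that value can be used for `raw_zip_path` in the unzip step.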

## 3. Unzip Training Data
Extract `raw.zip` to `/content/raw_data`, a working directory on the Colab VM's local disk.

In [None]:
import zipfile
import os

# Set the path to your raw.zip (update if using Google Drive)
raw_zip_path = 'raw.zip'  # or '/content/drive/MyDrive/path_to_your/raw.zip'
extract_dir = '/content/raw_data'

os.makedirs(extract_dir, exist_ok=True)
with zipfile.ZipFile(raw_zip_path, 'r') as zip_ref:
    zip_ref.extractall(extract_dir)

print(f"Extracted to {extract_dir}")
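
After extraction, a quick look at what landed in the directory can catch an empty or wrongly nested archive before training starts. The sketch below counts files by extension; `summarize_dir` is a hypothetical helper, and nothing here assumes a particular dataset layout.

```python
import os
from collections import Counter


def summarize_dir(root):
    """Count files under `root` (recursively), grouped by lowercase extension."""
    counts = Counter()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower() or "(no extension)"
            counts[ext] += 1
    return counts


# '/content/raw_data' is the extraction directory used in the cell above.
summary = summarize_dir("/content/raw_data")
for ext, n in summary.most_common():
    print(f"{ext}: {n} files")
if not summary:
    print("No files found; check that the archive extracted correctly")
```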

## 4. Clone or Upload Indic-OCR Code
Clone your Indic-OCR repository from GitHub or upload the code directly to Colab.

In [None]:
# Option 1: Clone from GitHub
!git clone https://github.com/BytesByJay/Indic-OCR.git

# Option 2: Upload code manually (use Colab file upload if not using GitHub)

## 5. Install Dependencies
Install all required Python packages using the `requirements.txt` from the Indic-OCR repo.

In [None]:
# Move into the cloned repository (an absolute path keeps the cell safe to re-run)
%cd /content/Indic-OCR

# Install dependencies
!pip install -r requirements.txt

## 6. Run Training Script
Run the training script (`train.py`). Adjust arguments as needed for your dataset and configuration.

In [None]:
# Example: Adjust the arguments as per your train.py requirements
!python train.py --data_dir /content/raw_data --output_dir /content/drive/MyDrive/Indic-OCR-outputs
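
Because the arguments above are only an example, one way to find out what `train.py` really accepts is to ask it for its help text. The sketch below assumes the repository sits at `/content/Indic-OCR` and that `train.py` uses an argument parser that responds to `--help`; if it does not, the script may simply start running instead. `get_cli_help` is a hypothetical helper.

```python
import os
import subprocess
import sys


def get_cli_help(script_path, cwd=None):
    """Run `python <script_path> --help` and return whatever text it prints."""
    result = subprocess.run(
        [sys.executable, script_path, "--help"],
        capture_output=True,
        text=True,
        cwd=cwd,
    )
    # argparse writes help to stdout; failures (e.g. a missing file) go to stderr.
    return result.stdout or result.stderr


repo_dir = "/content/Indic-OCR"  # assumed clone location
if os.path.isdir(repo_dir):
    print(get_cli_help("train.py", cwd=repo_dir))
else:
    print(f"{repo_dir} not found; clone the repository first")
```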

## 7. Save Outputs/Models to Google Drive
Files under `/content` are deleted when the Colab runtime is disconnected or recycled, so make sure your trained models and other outputs are saved to Google Drive for later use.

In [None]:
# Example: Copy trained model to Google Drive (if not already saved)
# !cp /content/Indic-OCR/outputs/model.pth /content/drive/MyDrive/Indic-OCR-outputs/
# Adjust the path and filename as needed
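
The copy can also be done from Python, which makes it easy to pick up every checkpoint rather than a single file. This is a sketch under assumptions: the source and destination directories are the ones from the commented example above, the `*.pth` pattern is a guess based on that example's filename, and `backup_files` is a hypothetical helper.

```python
import glob
import os
import shutil


def backup_files(src_dir, dst_dir, pattern="*.pth"):
    """Copy files in `src_dir` matching `pattern` into `dst_dir`; return the new paths."""
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    for src in sorted(glob.glob(os.path.join(src_dir, pattern))):
        if os.path.isfile(src):
            # copy2 preserves timestamps and returns the destination path
            copied.append(shutil.copy2(src, dst_dir))
    return copied


src = "/content/Indic-OCR/outputs"                 # assumed output location
dst = "/content/drive/MyDrive/Indic-OCR-outputs"   # Drive folder from step 6
if os.path.isdir(src) and os.path.isdir("/content/drive/MyDrive"):
    for path in backup_files(src, dst):
        print(f"Copied {path}")
else:
    print("Nothing copied: outputs directory missing or Drive not mounted")
```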