# Train Abusive-Language-Detection on Google Colab

This notebook automates setup and training on Colab. It:
- Installs dependencies from `requirements.txt`
- Mounts Google Drive (optional)
- Prepares `.env` and data files
- Downloads the BERT model with `download_model.py`
- Runs `python train.py` to start training

Notes:
- Select a GPU runtime before running any cell: Runtime > Change runtime type > Hardware accelerator: GPU
- You can save outputs/checkpoints to Google Drive by mounting it below.
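
To confirm that the GPU runtime from the first note is active, a quick check like the following can be run in a code cell. This is a minimal sketch: the helper name `gpu_visible` is hypothetical and not part of the repo, and it only looks for the `nvidia-smi` tool rather than asking the training framework.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if `nvidia-smi` is on PATH and lists at least one GPU."""
    if shutil.which("nvidia-smi") is None:
        return False  # no NVIDIA driver tools found, so likely a CPU runtime
    result = subprocess.run(
        ["nvidia-smi", "-L"],  # -L lists the attached GPUs, one per line
        capture_output=True,
        text=True,
    )
    return result.returncode == 0 and "GPU" in result.stdout


if gpu_visible():
    print("GPU runtime detected")
else:
    print("No GPU detected; check Runtime > Change runtime type")
```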

In [None]:
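# Install dependencies
# requirements.txt must be in the current directory; if you clone the repo
# instead of uploading it, run the clone cell below first.
!pip install -q -r requirements.txt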

In [None]:
# (Optional) Mount Google Drive to save outputs and datasets
from google.colab import drive
drive.mount('/content/drive')

# Create output directory in Drive to persist checkpoints (optional)
!mkdir -p /content/drive/MyDrive/ALD_output

In [None]:
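# Ensure the repo files are present in the Colab workspace (they are if you uploaded the repo)
# If running from a GitHub repo, you can clone here instead.
# Use %cd rather than `!cd`: each `!` command runs in its own subshell,
# so a directory change made with `!cd` does not persist.
# !git clone <your-repo-url> repo
# %cd repo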

# Show important files
!ls -la

In [None]:
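# Create a .env file with default paths
# If you mounted Drive, you can set OUTPUT_DIR=/content/drive/MyDrive/ALD_output/
# so that checkpoints persist after the session ends.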
env_content = '''TRAIN_DATASET=data/train.csv
TEST_DATASET=data/test.csv
OUTPUT_DIR=output/
EPOCHS=3
BATCH_SIZE=8
LOGGING_STEPS=10
LEARNING_RATE=2e-5
'''
with open('.env', 'w') as f:
    f.write(env_content)

# Print the file back to confirm what was written
with open('.env') as f:
    print(f.read())
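
Before training, it can help to confirm that the numeric settings in `.env` are well-formed. The sketch below uses a hypothetical `parse_env` helper written for illustration; how `train.py` actually loads `.env` is not shown here, so treat this only as a sanity check on the values.

```python
def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


# Sample text mirroring three of the defaults written above;
# in the notebook you could pass the contents of .env instead.
sample = "EPOCHS=3\nBATCH_SIZE=8\nLEARNING_RATE=2e-5\n"
settings = parse_env(sample)

# Casting raises ValueError early if a value is malformed.
epochs = int(settings["EPOCHS"])
batch_size = int(settings["BATCH_SIZE"])
learning_rate = float(settings["LEARNING_RATE"])
print(epochs, batch_size, learning_rate)
```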

In [None]:
# (Optional) Copy dataset from Drive if you placed it there
# Uncomment and edit paths if using Drive (mkdir ensures data/ exists before copying):
# !mkdir -p data
# !cp /content/drive/MyDrive/datasets/train.csv data/train.csv
# !cp /content/drive/MyDrive/datasets/test.csv data/test.csv

# Verify datasets exist
!ls -la data || true
!head -n 3 data/train.csv || true
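
The shell check above can also be done in Python, which makes a missing or empty file easier to spot. This is a minimal sketch: `preview_csv` is a hypothetical helper, and it assumes the datasets are comma-separated with a header row, which may not match the repo's actual format.

```python
import csv
from pathlib import Path


def preview_csv(path, n_rows=3):
    """Return (header, first n_rows data rows) of a CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:
            raise ValueError(f"{path} is empty")
        # zip stops after n_rows without reading the rest of the file
        rows = [row for _, row in zip(range(n_rows), reader)]
    return header, rows


for name in ("data/train.csv", "data/test.csv"):
    if Path(name).is_file():
        header, rows = preview_csv(name)
        print(name, "columns:", header, "| sample rows:", len(rows))
    else:
        print(name, "is missing; copy or upload it before training")
```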

In [None]:
# Download the BERT model if needed
# The repo provides `download_model.py` which saves the model under models/bert-base-uncased/
!python download_model.py || true
!ls -la models/bert-base-uncased || true

In [None]:
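# Run training
# Adjust epochs and batch size by editing .env and re-running this cell.
# Setting variables inline (e.g. `!EPOCHS=5 python train.py`) only works if
# train.py lets real environment variables override .env; verify that in the repo.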
!python train.py