{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# SmartEmail Assistant - Google Colab Setup with Llama 2\n",
    "\n",
    "This notebook sets up and runs the SmartEmail Assistant project on Google Colab with GPU acceleration, using Llama 2.\n",
    "\n",
    "## Important Setup Steps\n",
    "1. Select a GPU runtime (Runtime > Change runtime type > GPU).\n",
    "2. Make sure your Hugging Face account has been granted access to Llama 2.\n",
    "3. Log in with your Hugging Face access token when prompted in section 3."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Check GPU Availability"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "source": [
    "!nvidia-smi"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Install Required Libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "source": [
    "!pip install -q pandas numpy scikit-learn transformers torch peft datasets tqdm pyyaml wandb python-dotenv accelerate bitsandbytes sentencepiece protobuf huggingface_hub einops"
   ]
  },
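  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To confirm that the install step worked, you can list the installed versions of a few of the packages. This is a minimal sketch rather than part of the project; the package subset chosen here is an arbitrary assumption, and `importlib.metadata` is part of the Python standard library (3.8+).\n",
    "\n",
    "```python\n",
    "from importlib.metadata import PackageNotFoundError, version\n",
    "\n",
    "# A subset of the packages installed above (the choice is illustrative)\n",
    "for pkg in [\"transformers\", \"torch\", \"peft\", \"accelerate\", \"bitsandbytes\"]:\n",
    "    try:\n",
    "        print(pkg, version(pkg))\n",
    "    except PackageNotFoundError:\n",
    "        print(pkg, \"is not installed\")\n",
    "```\n",
    "\n",
    "If a package is reported as missing, re-run the install cell and restart the runtime."
   ]
  },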
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Set up Hugging Face Authentication\n",
    "\n",
    "You need to:\n",
    "1. Create a Hugging Face account at https://huggingface.co/\n",
    "2. Request access to Llama 2 at https://huggingface.co/meta-llama/Llama-2-7b-hf\n",
    "3. Create an access token at https://huggingface.co/settings/tokens"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "source": [
    "from huggingface_hub import login\n",
    "login()  # This will prompt for your token"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Clone the Repository"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "source": [
    "!git clone https://github.com/Kush402/smartemail-assistant.git\n",
    "%cd smartemail-assistant"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Upload Training Data\n",
    "\n",
    "Upload your `raw_emails.csv` file to the `data/raw` directory. You can do this by:\n",
    "1. Click on the folder icon in the left sidebar\n",
    "2. Navigate to `data/raw`\n",
    "3. Click the upload button and select your file"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "source": [
    "# Create necessary directories\n",
    "!mkdir -p data/raw data/processed model/checkpoints model/peft_adapter"
   ]
  },
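  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After uploading, it can help to confirm that the file landed where the pipeline expects it. The following is a minimal sketch, not part of the project: it assumes only the `data/raw/raw_emails.csv` path named above, that the working directory is the cloned repository, and makes no assumption about column names.\n",
    "\n",
    "```python\n",
    "from pathlib import Path\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "# Path taken from the upload instructions above\n",
    "csv_path = Path(\"data/raw/raw_emails.csv\")\n",
    "\n",
    "if csv_path.is_file():\n",
    "    # Read only a few rows so that large files load quickly\n",
    "    preview = pd.read_csv(csv_path, nrows=5)\n",
    "    print(\"Columns:\", list(preview.columns))\n",
    "    print(preview.head())\n",
    "else:\n",
    "    print(f\"{csv_path} not found; upload it before training\")\n",
    "```\n",
    "\n",
    "Compare the printed columns against whatever format the pipeline expects before starting a long training run."
   ]
  },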
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. Run the Training Pipeline\n",
    "\n",
    "Now that everything is set up, you can run the training pipeline. The notebook will:\n",
    "1. Load and preprocess your data\n",
    "2. Set up the Llama 2 model with LoRA\n",
    "3. Train the model\n",
    "4. Save the trained model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "source": [
    "# Import torch explicitly instead of relying on the wildcard import below\n",
    "import torch\n",
    "\n",
    "# Import the training pipeline\n",
    "from notebooks.smartemail_pipeline import *\n",
    "\n",
    "# The notebook will automatically use GPU if available; report what PyTorch can see\n",
    "print(\"CUDA available:\", torch.cuda.is_available())\n",
    "if torch.cuda.is_available():\n",
    "    print(\"GPU count:\", torch.cuda.device_count())\n",
    "    print(\"GPU name:\", torch.cuda.get_device_name(0))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7. Download the Trained Model\n",
    "\n",
    "After training is complete, you can download the model files:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "source": [
    "from google.colab import files\n",
    "\n",
    "# Create a zip file of the model directory\n",
    "!zip -r model.zip model/\n",
    "\n",
    "# Download the zip file\n",
    "files.download('model.zip')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Additional Notes\n",
    "\n",
    "1. Select a GPU runtime in Colab (Runtime > Change runtime type > GPU).\n",
    "2. Training might take 2-3 hours, depending on the dataset size.\n",
    "3. You can monitor training progress in the cell output and on the Weights & Biases dashboard.\n",
    "4. Model checkpoints will be saved in the `model/checkpoints` directory.\n",
    "5. The LoRA adapter will be saved in the `model/peft_adapter` directory.\n",
    "\n",
    "### Troubleshooting\n",
    "\n",
    "If you encounter any issues:\n",
    "1. Confirm that you are using a GPU runtime.\n",
    "2. Check that all dependencies installed correctly.\n",
    "3. Verify that your training data is in the correct format.\n",
    "4. Ensure that you have enough disk space in Colab.\n",
    "5. If you get import errors, restart the runtime after installing packages.\n",
    "6. If you get CUDA out-of-memory errors, reduce the batch size or use gradient accumulation.\n",
    "7. Check the Colab logs for specific error messages."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.0"
  },
  "accelerator": "GPU"
 },
 "nbformat": 4,
 "nbformat_minor": 4
}