# Panza (with Mistral-7B)
In this tutorial we will demonstrate how to create your personal email assistant by efficiently fine-tuning a Mistral-7B model on your own emails.

## Preparation

First things first, clone the Panza repository by running the following cell.

In [None]:
!git clone https://github.com/IST-DASLab/PanzaMail.git
%cd PanzaMail/scripts/

Now run the cell below to install all the required packages (ignore warnings). This may take a while (up to 10 minutes), so please be patient!
**You may get a message saying that some packages used by Colab were updated and that this might cause a crash. This is fine; you can simply dismiss the message!**

In [None]:
!git clone https://github.com/IST-DASLab/spops.git
%cd spops
# Retarget the CUDA build from sm_80 to sm_75 (compute capability 7.5)
!sed -i -e 's/sm_80/sm_75/g' setup.py
!pip install -e .
%cd ..

!pip install langdetect langchain langchain-community sentence-transformers faiss-cpu fire nltk gradio cmake packaging
!pip install git+https://github.com/IST-DASLab/llm-foundry
!pip install git+https://github.com/IST-DASLab/peft-rosa.git@grad_quant
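
The `sed` line in the cell above rebuilds `spops` for `sm_75`, i.e. compute capability 7.5 (for example a Tesla T4). If the build or a later step fails, it can help to check which architecture your runtime actually has. The snippet below is a minimal sketch, not part of Panza: `arch_flag` is a hypothetical helper, and it assumes PyTorch is importable (it is skipped gracefully otherwise).

```python
def arch_flag(major: int, minor: int) -> str:
    """Format a CUDA compute capability as the `sm_XY` string used in build flags."""
    return f"sm_{major}{minor}"


try:
    import torch  # assumed to be available in the runtime; optional for this check
except ImportError:
    torch = None

if torch is not None and torch.cuda.is_available():
    # get_device_capability() returns a (major, minor) tuple for the current GPU
    major, minor = torch.cuda.get_device_capability()
    print("GPU architecture:", arch_flag(major, minor))
else:
    print("No CUDA GPU detected (or PyTorch is not installed).")
```

If the printed value differs from `sm_75`, you may need to adjust the `sed` replacement accordingly before installing `spops`.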

Finally, log in to your Hugging Face account to access the Mistral-7B model, and to your `wandb` account to enable logging.

In [None]:
!huggingface-cli login
!wandb login

## Download your sent emails
**If you want to try Panza on a synthetic dataset, you can skip this step.**

To train your personal email assistant, you need to download your sent emails. Please follow the instructions [here](https://github.com/IST-DASLab/PanzaMail?tab=readme-ov-file#step-0-download-your-sent-emails) and place the final `Sent.mbox` file on your Google Drive in a `panza/` directory. Then run the following cell to mount your drive and copy the `mbox` file over to your local storage on Colab.

In [None]:
from google.colab import drive
drive.mount('/content/drive')
%cp ../../drive/MyDrive/panza/Sent.mbox ../data/Sent.mbox
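
To confirm that the copy worked, you can count the messages in the file with Python's standard `mailbox` module. This is only a sanity-check sketch; `count_messages` is a hypothetical helper and the path simply mirrors the destination used in the cell above.

```python
import mailbox
from pathlib import Path


def count_messages(mbox_path: str) -> int:
    """Return the number of messages stored in an mbox file."""
    # create=False makes a missing file an error instead of creating an empty mailbox
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return len(box)
    finally:
        box.close()


mbox_file = Path("../data/Sent.mbox")  # destination used by the copy command above
if mbox_file.exists():
    print(f"{mbox_file} contains {count_messages(str(mbox_file))} messages")
else:
    print(f"{mbox_file} not found; check the Drive path and the file name")
```

A count of zero, or a missing file, usually means the file name or the `panza/` directory on your drive does not match what the copy command expects.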

## Configuration
Now, from the left panel, open the file `PanzaMail/scripts/config.sh` and configure the parameters according to [this set of instructions](https://github.com/IST-DASLab/PanzaMail?tab=readme-ov-file#step-1-environment-configuration). You will also want to edit your prompt preambles (under `PanzaMail/prompt_preambles`).

**Make sure to set `MODEL_PRECISION=4bit` and `PANZA_GENERATIVE_MODEL="mistralai/Mistral-7B-Instruct-v0.2"`, since this is the only setting that fits on a Colab GPU.**
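
If you want to double-check these two values before moving on, the sketch below reads `config.sh` and reports any mismatch. It is an illustration only: `parse_shell_assignments` and `check_config` are hypothetical helpers, and the parser assumes the file uses plain `KEY=value` or `export KEY=value` lines (anything more elaborate is ignored).

```python
import re
from pathlib import Path

# Matches simple `KEY=value` or `export KEY=value` lines; commented-out lines do not match.
ASSIGNMENT = re.compile(r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=(.*)$")

# The two settings required by this tutorial.
EXPECTED = {
    "MODEL_PRECISION": "4bit",
    "PANZA_GENERATIVE_MODEL": "mistralai/Mistral-7B-Instruct-v0.2",
}


def parse_shell_assignments(text: str) -> dict:
    """Collect simple variable assignments from shell-script text."""
    values = {}
    for line in text.splitlines():
        match = ASSIGNMENT.match(line)
        if match:
            key, raw = match.groups()
            raw = raw.split(" #", 1)[0].strip()  # drop a trailing comment, if any
            values[key] = raw.strip("\"'")       # drop surrounding quotes
    return values


def check_config(text: str) -> dict:
    """Return {name: found_value} for every expected setting that does not match."""
    values = parse_shell_assignments(text)
    return {k: values.get(k) for k, v in EXPECTED.items() if values.get(k) != v}


config_path = Path("config.sh")  # assumes the working directory is PanzaMail/scripts
if config_path.exists():
    problems = check_config(config_path.read_text())
    print("Config looks fine." if not problems else f"Please fix: {problems}")
else:
    print("config.sh not found in the current directory.")
```

A value of `None` in the report means the variable was not found in a form the parser understands, so check the file by hand in that case.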

## Email Extraction
**By default, the cell below simply copies a synthetic set of emails to the location used later to prepare the dataset for fine-tuning. If you want to use your own emails, uncomment the first line and comment out the second line instead.**

Run the following cell to extract emails from the `.mbox` file. Read more [here](https://github.com/IST-DASLab/PanzaMail?tab=readme-ov-file#step-2-extract-emails).


In [None]:
#!./extract_emails.sh
!source config.sh && cp ../data/Don_Quijote_Emails.jsonl ../data/${PANZA_USERNAME}_clean.jsonl
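
Whichever option you chose, the result is a JSON Lines file (one JSON object per line). A quick look at how many records it holds and which fields they carry can catch an empty or malformed file early. The sketch below makes no assumption about the field names; `summarize_jsonl` is a hypothetical helper, and the path points at the synthetic dataset shipped with the repository, so replace it with your own `*_clean.jsonl` file if you extracted your emails.

```python
import json
from pathlib import Path


def summarize_jsonl(path) -> tuple:
    """Return (number of records, sorted list of keys seen) for a JSON Lines file."""
    count = 0
    keys = set()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            count += 1
            if isinstance(record, dict):
                keys.update(record)
    return count, sorted(keys)


emails_file = Path("../data/Don_Quijote_Emails.jsonl")  # synthetic dataset from the cell above
if emails_file.exists():
    n_records, fields = summarize_jsonl(emails_file)
    print(f"{n_records} emails, fields: {fields}")
else:
    print(f"{emails_file} not found.")
```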

## Dataset Preparation
Run the following command to prepare your dataset (explained [here](https://github.com/IST-DASLab/PanzaMail?tab=readme-ov-file#step-3-prepare-dataset)).

In [None]:
!./prepare_dataset.sh LOAD_IN_4BIT=1 RUN_FP32=1

## Fine-tune the model on your data!
Now you are ready to train your model! The following cell will start the training. This may take a while (up to a few hours, depending on your data size), so please be patient!

In [None]:
!./train_rosa.sh CONFIG=../src/panza/finetuning/configs/rosa_panza_colab.yaml

## Your Panza is ready!
You can find your trained Panza model in `PanzaMail/checkpoints/models`. Consider moving the trained model to your Google Drive by running the following cell. Note that you are only storing a [RoSA adapter](https://arxiv.org/abs/2401.04679) on top of the base model, so it will not take up much space.

In [None]:
%cp -r ../checkpoints/models ../../drive/MyDrive/panza/
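
If you would like to see how much Drive space the adapter actually uses, you can add up the file sizes in the checkpoint directory. This is a small sketch with a hypothetical helper, `dir_size_mb`; the path is the one used in the copy command above.

```python
import os


def dir_size_mb(root: str) -> float:
    """Return the total size of all regular files under `root`, in MiB."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):  # skip symlinks to avoid double counting
                total += os.path.getsize(full)
    return total / (1024 * 1024)


models_dir = "../checkpoints/models"  # source directory of the copy command above
if os.path.isdir(models_dir):
    print(f"{models_dir}: {dir_size_mb(models_dir):.1f} MiB")
else:
    print(f"{models_dir} not found; has training finished?")
```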

Now you can run the cell below to start giving instructions to your Panza!
Please find the model path in `PanzaMail/checkpoints/models` and pass it in as the `MODEL` argument.

In [None]:
!./run_panza_cli.sh MODEL=/path/to/your/model/