# Notebook for Gmail Spam Filter Using LLM

In this example, we show how to use an LLM to filter spam in your Gmail inbox with `uniflow`.

### Before running the code

You will need the `uniflow` conda environment to run this notebook. You can set up the environment by following the [installation instructions](https://github.com/CambioML/uniflow/tree/main#installation).

Next, you will need a valid [Google API key](https://ai.google.dev/tutorials/setup). Once you have the key, set it as the environment variable `GOOGLE_API_KEY` in a `.env` file in the root directory of this repository. For more details, see these [instructions](https://github.com/CambioML/uniflow/tree/main#api-keys).

You will also need a valid [OpenAI API key](https://platform.openai.com/api-keys). Once you have the key, set it as the environment variable `OPENAI_API_KEY` in the same `.env` file in the root directory of this repository. For more details, see these [instructions](https://github.com/CambioML/uniflow/tree/main#api-keys).
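A missing key usually surfaces later as an opaque authentication error. The sketch below checks for both variables up front. It assumes the keys are read from the process environment after `load_dotenv()` runs. `missing_api_keys` is a hypothetical helper and is not part of `uniflow`.

```python
import os

# Environment variables read by the Google and OpenAI model backends in this notebook.
REQUIRED_KEYS = ("GOOGLE_API_KEY", "OPENAI_API_KEY")


def missing_api_keys(env=None, required=REQUIRED_KEYS):
    """Return the names of required API keys that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [key for key in required if not env.get(key)]


# Run this after load_dotenv() so values from .env are visible in os.environ.
missing = missing_api_keys()
if missing:
    print(f"Missing API keys: {', '.join(missing)}")
```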

### Update system path

```python
%reload_ext autoreload
%autoreload 2

import sys
import pprint

# Make a local uniflow checkout importable from this notebook's directory.
sys.path.append(".")
sys.path.append("..")
sys.path.append("../..")
```

```python
from dotenv import load_dotenv

from uniflow import Context
from uniflow.flow.client import ExtractClient, TransformClient
from uniflow.flow.config import ExtractGmailConfig, TransformGmailSpamConfig
from uniflow.flow.flow_factory import FlowFactory
from uniflow.op.model.model_config import GoogleModelConfig, OpenAIModelConfig

# Load GOOGLE_API_KEY and OPENAI_API_KEY from the .env file into the environment.
load_dotenv()
```



### Display the different flows

```python
FlowFactory.list()
```

```
{'extract': ['ExtractHTMLFlow',
  'ExtractImageFlow',
  'ExtractIpynbFlow',
  'ExtractMarkdownFlow',
  'ExtractPDFFlow',
  'ExtractTxtFlow',
  'ExtractGmailFlow'],
 'transform': ['TransformAzureOpenAIFlow',
  'TransformCopyFlow',
  'TransformGoogleFlow',
  'TransformGoogleMultiModalModelFlow',
  'TransformHuggingFaceFlow',
  'TransformLMQGFlow',
  'TransformOpenAIFlow'],
 'rater': ['RaterFlow']}
```
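`FlowFactory.list()` returns a dict that maps each category to its flow names, as shown above. The `flow_name` passed to `TransformGmailSpamConfig` should therefore appear under `"transform"`. The hypothetical helper below finds the category for a given name so that a typo can be caught before a client is built. In the notebook you would pass `FlowFactory.list()` directly instead of the hard-coded copy.

```python
def flow_category(flows, flow_name):
    """Return the category ("extract", "transform", ...) that lists `flow_name`, or None."""
    for category, names in flows.items():
        if flow_name in names:
            return category
    return None


# Copy of the FlowFactory.list() output shown above.
flows = {
    "extract": ["ExtractHTMLFlow", "ExtractImageFlow", "ExtractIpynbFlow",
                "ExtractMarkdownFlow", "ExtractPDFFlow", "ExtractTxtFlow",
                "ExtractGmailFlow"],
    "transform": ["TransformAzureOpenAIFlow", "TransformCopyFlow",
                  "TransformGoogleFlow", "TransformGoogleMultiModalModelFlow",
                  "TransformHuggingFaceFlow", "TransformLMQGFlow",
                  "TransformOpenAIFlow"],
    "rater": ["RaterFlow"],
}

# Both model choices used later in this notebook should be transform flows.
print(flow_category(flows, "TransformOpenAIFlow"))
print(flow_category(flows, "TransformGoogleFlow"))
```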

### Initialize an `ExtractClient` with `ExtractGmailConfig`

You will need to set up OAuth credentials and download `credentials.json` by following the Google Workspace [Python quickstart instructions](https://developers.google.com/gmail/api/quickstart/python).

`extract_client` extracts the body and snippet of the 10 most recent unread emails.
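`ExtractGmailConfig` hides the Gmail API calls. The minimal sketch below shows roughly what such an extraction involves when done directly with the Gmail API. It is not uniflow's implementation, and `fetch_unread` and `decode_body` are hypothetical helpers. The sketch assumes `service` is a Gmail API client built with `build("gmail", "v1", credentials=creds)`, as in the labeling cell at the end of this notebook. It returns dicts with the same `email_id`, `snippet` and `body` keys that this notebook reads from `extract_data`.

```python
import base64


def decode_body(payload):
    """Return the first text/plain body found in a Gmail message payload, or ""."""
    data = payload.get("body", {}).get("data")
    if payload.get("mimeType") == "text/plain" and data:
        # Body data is base64url-encoded; add "=" padding in case it was omitted.
        padded = data + "=" * (-len(data) % 4)
        return base64.urlsafe_b64decode(padded).decode("utf-8", errors="replace")
    # Multipart messages nest sub-parts under "parts"; search them in order.
    for part in payload.get("parts", []):
        text = decode_body(part)
        if text:
            return text
    return ""


def fetch_unread(service, max_results=10):
    """Return email_id, snippet and decoded body for up to `max_results` unread messages."""
    listing = (
        service.users()
        .messages()
        .list(userId="me", q="is:unread", maxResults=max_results)
        .execute()
    )
    emails = []
    for ref in listing.get("messages", []):
        msg = (
            service.users()
            .messages()
            .get(userId="me", id=ref["id"], format="full")
            .execute()
        )
        emails.append(
            {
                "email_id": msg["id"],
                "snippet": msg.get("snippet", ""),
                "body": decode_body(msg.get("payload", {})),
            }
        )
    return emails
```

Returning an empty body when no `text/plain` part exists lets the caller fall back to the snippet, which is the same fallback this notebook applies before classification.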

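```python
extract_client = ExtractClient(
    ExtractGmailConfig(
        credentials_path="credentials.json",  # OAuth client file downloaded from Google Cloud
        token_path="token.json",  # cached user authorization, also read by the labeling cell
    )
)
```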

```python
extract_data = extract_client.run([{}])
```



### Initialize a `TransformClient` with `TransformGmailSpamConfig`

`TransformGmailSpamConfig` contains the instructions and few-shot examples for the spam classification prompt.

`transform_client` takes the emails extracted by `extract_client` and, for each one, returns a model response containing the classification label.
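The actual prompt lives inside `TransformGmailSpamConfig` and is not shown in this notebook. To illustrate what "instructions plus few-shot examples" means, here is a hypothetical `build_prompt` helper. The instruction text and both examples are invented for illustration. The prompt asks for a `yes`/`no` answer, which is presumably the format that the labeling cell's `"yes"` check relies on.

```python
# Hypothetical instruction and examples; TransformGmailSpamConfig ships its own.
INSTRUCTION = (
    "Decide whether the email is spam. "
    "Answer 'yes' if it is spam and 'no' if it is not."
)
EXAMPLES = [
    ("Congratulations! You won a free cruise. Click here to claim.", "yes"),
    ("Hi team, the design review moved to 3pm tomorrow.", "no"),
]


def build_prompt(email, instruction=INSTRUCTION, examples=EXAMPLES):
    """Assemble a few-shot prompt: instruction, labeled examples, then the new email."""
    parts = [instruction, ""]
    for text, label in examples:
        parts.append(f"Email: {text}")
        parts.append(f"Spam: {label}")
        parts.append("")
    # The new email ends with an open "Spam:" slot for the model to complete.
    parts.append(f"Email: {email}")
    parts.append("Spam:")
    return "\n".join(parts)


print(build_prompt("Your invoice for March is attached."))
```

Ending the prompt with an unfilled `Spam:` line nudges the model to answer in the same one-word format as the examples.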

```python
# OpenAI is used by default. To try Google's model instead, comment out the two
# OpenAI lines and uncomment the two Google lines.
transform_client = TransformClient(
    TransformGmailSpamConfig(
        flow_name="TransformOpenAIFlow",
        model_config=OpenAIModelConfig(),
        # flow_name="TransformGoogleFlow",
        # model_config=GoogleModelConfig(),
    )
)
```

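```python
# Cap each email at 5,000 characters to keep prompts a manageable size.
MAX_EMAIL_CHARS = 5000

transform_data = []
# extract_data[0]["output"][0] holds the list of extracted emails.
for email in extract_data[0]["output"][0]:
    # Fall back to the snippet when the email has no extracted body.
    text = email["body"] or email["snippet"]
    transform_data.append(Context(email=text[:MAX_EMAIL_CHARS]))
```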

```python
transform_output = transform_client.run(transform_data)
```



### Apply the classification label to each email

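```python
from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build

SPAM_LABEL = "Spam Email (AI Email Filter)"
NON_SPAM_LABEL = "Email (AI Email Filter)"

# gmail.modify allows reading and modifying messages and managing labels.
SCOPES = ["https://www.googleapis.com/auth/gmail.modify"]
creds = Credentials.from_authorized_user_file("token.json", SCOPES)
service = build("gmail", "v1", credentials=creds)


def get_or_create_label_id(service, label_name):
    """Return the ID of the Gmail label `label_name`, creating the label if it is missing."""
    labels = service.users().labels().list(userId="me").execute().get("labels", [])
    for label in labels:
        if label["name"] == label_name:
            return label["id"]
    # Without this, a missing label would yield None and modify() below would fail.
    label = service.users().labels().create(userId="me", body={"name": label_name}).execute()
    return label["id"]


SPAM_LABEL_ID = get_or_create_label_id(service, SPAM_LABEL)
NON_SPAM_LABEL_ID = get_or_create_label_id(service, NON_SPAM_LABEL)

# Assumes transform_output is in the same order as the extracted emails,
# so zip pairs each email with its own classification.
for email, result in zip(extract_data[0]["output"][0], transform_output):
    # Treat the email as spam if the model's response contains "yes" (case-insensitive).
    is_spam = "yes" in result["output"][0]["response"][0].lower()
    email_id = email["email_id"]
    print(f"Email {email_id} is spam: {is_spam}")
    label_id = SPAM_LABEL_ID if is_spam else NON_SPAM_LABEL_ID
    service.users().messages().modify(
        userId="me",
        id=email_id,
        body={"addLabelIds": [label_id], "removeLabelIds": []},
    ).execute()
```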

```
Email 18dfc3488fc902f1 is spam: False
Email 18dfc1ef230f2165 is spam: True
Email 18dfc1153607218b is spam: False
Email 18dfbdae16df6616 is spam: False
Email 18dfb65c017999d8 is spam: False
Email 18dfb383083d31c4 is spam: False
Email 18dfb3609af5acc7 is spam: False
Email 18dfb3282cdd9716 is spam: True
Email 18dfb151d492a69f is spam: False
Email 18dfafdd5ebbc628 is spam: False
```
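The labeling loop uses a substring test, `"yes" in response.lower()`. This test also matches responses such as "No, although it says yes" or any text containing "eyes". Below is a stricter sketch. It assumes the model answers with `yes` or `no` as its first word, which this notebook does not guarantee. `parse_spam_label` is a hypothetical helper that returns `None` for unclear answers so those emails can be skipped rather than mislabeled.

```python
import re


def parse_spam_label(response):
    """Map a model response to True (spam), False (not spam) or None (unclear).

    Only the leading word is inspected, after skipping punctuation such as
    quotes or markdown asterisks.
    """
    match = re.match(r"\W*([a-z]+)", response.lower())
    if not match:
        return None
    word = match.group(1)
    if word == "yes":
        return True
    if word == "no":
        return False
    return None


print(parse_spam_label("Yes, this looks like phishing."))
print(parse_spam_label("No. Although it says yes, it is a receipt."))
```

To use it, pass the same response string the loop reads (`["output"][0]["response"][0]`) and skip the `modify` call whenever the result is `None`.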
