# DocAI Dialogflow CX & Email Ingestion Tool

* Author: docai-incubator@google.com

# Disclaimer
This tool is not supported by the Google engineering team or product team. It is provided and supported on a best-effort basis by the **DocAI Incubator Team**. No guarantees of performance are implied.

# Objective: 

1. A Dialogflow email integration tool that fetches incoming mail, analyzes the body to identify the relevant GCP services, and sends a reply by email with the help of a Dialogflow agent. [DocAI Dialogflow CX]

2. A DocAI email ingestion tool, deployed as a Cloud Function, that ingests emails with their body, subject, and attachments and stores them in a Cloud Storage bucket. [Email Ingestion Tool]


# DocAI Dialogflow CX

## Step 1 : Enable Email Communication with a Dialogflow CX Agent


In this guide, you learn how to configure Google Cloud services for email communication with a Dialogflow CX Agent.

This solution leverages Gmail API Push Notifications to watch a Gmail or Google Workspace inbox that has been created as an email receiver for the Dialogflow CX agent. You deploy a Cloud Function to handle communication between Gmail and Dialogflow, and Pub/Sub is configured to trigger the Cloud Function when new emails arrive. Cloud Firestore in Datastore mode stores information related to messages, conversation threads, and corresponding Dialogflow sessions.

<img src='./images/dfcx_1.png' width=800 height=400>

### Prerequisites :

* **Cloud Pub/Sub** : Triggers the Cloud Function whenever a new email is received.
* **Gmail API** : Reads and sends the emails.
* **Cloud Scheduler** : Schedules the Cloud Function that refreshes the token.
* **Cloud Functions (NodeJs)** : Serves as the backend that connects all the services.
* **Cloud Firestore** : A database storing hyperlinks for GCP products and services, which can be used when sending responses to the user.
* **Dialogflow CX** : Sends the response according to the user query received over email.
* **AutoML Natural Language - Entity Extraction** : A model trained to extract the sender's signature from an email.
* **AutoML Natural Language - Text Classification** : A model trained to determine which GCP products and services are relevant to the questions asked by the users.

## Step 2 : Create a GCP Project

Create a new GCP Project or use an existing project that contains the Dialogflow CX Agent to be enabled for email conversations. For this guide, an agent export is available in this repository that you can upload to Dialogflow CX.
### Enable Google Cloud APIs
Enable the following APIs in the project.
* Cloud Functions API
* Cloud Build API
* Cloud Firestore
* Cloud Pub/Sub
* AutoML Natural Language

## Step 3 : Download the Application Code

Cloud Shell provides command-line access to Google Cloud Platform resources directly from the browser. You can use Cloud Shell to execute the terminal commands required to deploy this solution.

To open Google Cloud Shell, click the Activate Cloud Shell button on the top blue horizontal bar. A new panel appears at the bottom of the screen:
<img src='./images/dfcx_3.png' width=800 height=400>

```bash
git clone https://github.com/GoogleCloudPlatform/dialogflow-email-agent-demo.git
```

## Step 4 : Restore Dialogflow CX Demo Support Agent

First, create a new Dialogflow CX agent by following the instructions here: Create an Agent. Then, follow the instructions to restore an agent using the agent export available in this repository: dialogflow-support-agent.blob. Name the agent something like "GCP Support Agent" and leave the defaults, using us-central1 as the location. Cloud Shell allows you to download files, so you can use this feature to download dialogflow-support-agent.blob and then upload it into Dialogflow CX.

This is what the agent should look like after restore from dialogflow-support-agent.blob.

<img src='./images/dfcx_4.png' width=800 height=400>


## Step 5 : Train ML Models Using AutoML Natural Language

### Email Signature Extraction

The BC3: British Columbia Conversation Corpora is a public dataset of emails containing signatures that you can use to train a basic machine learning model for email signature extraction. This model is for demo purposes only; ideally there would be a larger quantity of training data to improve the signature recognition capability. The Incubator team has already prepared annotated training data in a format that AutoML Natural Language Entity Extraction accepts. Learn more about preparing data for AutoML here.

If you need to create your own json file using the raw .xml file provided by the University of British Columbia, you can follow along with the Colab Notebook included in this repository: Training_Data_for_Signature_Extraction.ipynb. Once you have the .json file uploaded to AutoML, you need to use AutoML to annotate all of the signatures in the dataset or submit a data labeling request.

Otherwise, continue with the instructions to use the annotated data provided in the repository.
1. Create a storage bucket and upload the email signature training data to Cloud Storage using the following commands. 

```bash
export BUCKET_NAME=new-storage-bucket-name
cd dialogflow-email-agent-demo
unzip bc3_annotated_email_data.zip
sed "s/ -storage-bucket/$BUCKET_NAME/g" ./bc3_annotated_email_data/text_extraction_template.csv > ./bc3_annotated_email_data/text_extraction.csv
gsutil mb gs://$BUCKET_NAME
gsutil -m cp -r bc3_annotated_email_data gs://$BUCKET_NAME/bc3_annotated_email_data
```
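
The `sed` step above rewrites a placeholder bucket name in the import CSV so that every row points at your own bucket. A minimal Python sketch of the same substitution follows. The placeholder token `PLACEHOLDER_BUCKET`, the sample rows, and the function name are illustrative assumptions, not the actual contents of the template file.

```python
def fill_bucket_placeholder(template_text: str, placeholder: str, bucket_name: str) -> str:
    """Return the template text with every occurrence of the placeholder replaced."""
    return template_text.replace(placeholder, bucket_name)


# Hypothetical template rows; the real template file may use a different layout.
template = (
    "gs://PLACEHOLDER_BUCKET/bc3_annotated_email_data/part_1.jsonl\n"
    "gs://PLACEHOLDER_BUCKET/bc3_annotated_email_data/part_2.jsonl\n"
)
filled = fill_bucket_placeholder(template, "PLACEHOLDER_BUCKET", "new-storage-bucket-name")
print(filled)
```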

2. Navigate to Natural Language > AutoML Entity Extraction > Get Started using the Cloud Console navigation menu.
<img src='./images/dfcx_5_2.png' width=800 height=400>

3. Create a new dataset with a name like bc3_email_data for AutoML Entity Extraction. Select the bc3_annotated_email_data/text_extraction.csv file in the cloud storage location from the step above to import the training data. The csv file points to the individual json files that were also uploaded to the cloud storage bucket.
<img src='./images/dfcx_5_3i.png' width=800 height=400>  
<img src='./images/dfcx_5_3ii.png' width=800 height=400>  

4. Once the dataset has been created, navigate to the Train tab and train a new model. Leave the box checked to deploy the model after training finishes.
<img src='./images/dfcx_5_4.png' width=800 height=400>  
5. When training completes, you are able to test and deploy the model for integration into the email application using the Test & Deploy tab.

## Step 6 : Email Topic Classification


Next, you train a model in your project to classify emails and determine which GCP products and services are relevant to the questions asked by your users. This is safe to do while the entity extraction model above is training. Training can take up to an hour to complete. Google Cloud provides public datasets that can be used to create training datasets for some problems. Here you can use the public StackOverflow dataset filtered on posts with Google or Dialogflow in the tag. These posts are similar in nature to support emails that you receive for Google Cloud products and services.

If you would like to see how the training data was formatted for AutoML Natural Language, you can follow along with the Colab Notebook included in this repository: StackOverflow_Topic_Classification.ipynb. Labels that have fewer than 100 items need to be removed.
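
The label threshold mentioned above can be applied with a simple frequency count. The sketch below is one possible way to do it for multi-label rows; the `(text, labels)` row layout, the function name, and the synthetic data are assumptions for illustration and are not taken from the notebook.

```python
from collections import Counter

MIN_ITEMS_PER_LABEL = 100  # threshold stated in the text above


def drop_rare_labels(rows, min_items=MIN_ITEMS_PER_LABEL):
    """Remove labels with fewer than min_items examples; drop rows left with no label."""
    counts = Counter(label for _, labels in rows for label in labels)
    cleaned = []
    for text, labels in rows:
        kept = [label for label in labels if counts[label] >= min_items]
        if kept:
            cleaned.append((text, kept))
    return cleaned


# Synthetic data: one frequent label (120 rows) and one rare label (8 rows).
rows = [(f"question {i}", ["google-cloud-storage"]) for i in range(115)]
rows += [(f"mixed question {i}", ["google-cloud-storage", "rare-tag"]) for i in range(5)]
rows += [(f"rare question {i}", ["rare-tag"]) for i in range(3)]
cleaned = drop_rare_labels(rows)
print(len(rows), "rows before,", len(cleaned), "rows after")
```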

Otherwise, continue with the instructions to use the prelabeled dataset that was exported from the demo environment.

1. Create a storage bucket and upload the StackOverflow training data to Cloud Storage using the following commands.
```bash
export BUCKET_NAME=new-storage-bucket-name
cd dialogflow-email-agent-demo
unzip stackoverflow_train_data.zip
sed "s/ -storage-bucket/$BUCKET_NAME/g" ./stackoverflow_train_data/text_classification_template.csv > ./stackoverflow_train_data/text_classification.csv
gsutil mb -l us-central1 gs://$BUCKET_NAME
gsutil -m cp -r stackoverflow_train_data gs://$BUCKET_NAME/stackoverflow_train_data
``` 

2. Navigate to Natural Language > AutoML Text & Document Classification using the Cloud Console navigation menu. Create a new dataset for Multi-label Classification and give it a name like stackoverflow_topic_classifier.  
<img src='./images/dfcx_6_2.png' width=800 height=400>  

3. Import the training data by browsing to the text_classification.csv file in the storage bucket that you created in the prior step.  
<img src='./images/dfcx_6_3.png' width=800 height=400>  

4. Once the data has been imported, navigate to the Train tab and train a new model. Leave the box checked to deploy the model after training finishes.  
<img src='./images/dfcx_6_4.png' width=800 height=400>  

5. When training completes, you are able to test and deploy the model for integration into the email application using the Test & Deploy tab.

## Step 7 : Configure Pub/Sub and Gmail Push Notifications

### Create a PubSub Topic

In this section, you create a Pub/Sub topic to receive the Gmail API Push Notifications.

1. In the GCP Project, use the Navigation menu to locate Pub/Sub and create a new Topic. For this guide, you can use gmail-inbox-watch. Uncheck the box to “Add a default subscription.” You can create a subscription for Cloud Functions in a later step.  
<img src='./images/dfcx_7_1.png' width=800 height=400>  

2. Additionally, you must give the Gmail API permission to send messages to the Pub/Sub topic: click the context menu of the topic you just created (three vertical dots), and choose View permissions.  
<img src='./images/dfcx_7_2.png' width=800 height=400>  

3. Click Add members, specify gmail-api-push@system.gserviceaccount.com as a new member, and give it the role of Pub/Sub > Pub/Sub Publisher; lastly, click Save to apply the changes.  
<img src='./images/dfcx_7_3.png' width=800 height=400>  
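
Once the topic exists, each Gmail push notification arrives as a Pub/Sub message whose data field is a base64url-encoded JSON object carrying the mailbox address and a history ID. The sketch below shows how a Pub/Sub-triggered function might decode it. The function name and the sample values are illustrative assumptions, and the actual integration service in this guide is written in NodeJS.

```python
import base64
import json


def decode_gmail_notification(event: dict) -> dict:
    """Decode the base64url-encoded JSON carried in a Pub/Sub event's data field."""
    data = event["data"]
    # Restore any padding that was stripped from the encoded string.
    padded = data + "=" * (-len(data) % 4)
    return json.loads(base64.urlsafe_b64decode(padded).decode("utf-8"))


# Sample values are made up; real notifications carry your inbox address.
sample_payload = {"emailAddress": "agent@example.com", "historyId": "9876543210"}
sample_event = {
    "data": base64.urlsafe_b64encode(json.dumps(sample_payload).encode("utf-8")).decode("ascii")
}
notification = decode_gmail_notification(sample_event)
print(notification["emailAddress"], notification["historyId"])
```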

## Step 8 : Create or Use Existing Gmail or Google Workspace Account

Create or use an existing Gmail or Google Workspace account to act on behalf of the Dialogflow CX Agent. You can communicate with the agent by sending an email to this address, and the application sends a response using this account and the Gmail API.

In the next section, you see how to configure access for the application to this Gmail or Google Workspace account inbox.


## Step 9 : Enable Gmail API & Create an OAuth 2.0 Client


In this section, you enable the Gmail API in your GCP Project, create an OAuth 2.0 client, and configure Gmail Push Notifications against the receiving inbox for Dialogflow CX integration. This guide leverages some steps outlined in Implementing Server-Side Authorization and Push Notifications.

To get started using the Gmail API, you need to enable the API and create credentials for the application.

1. Select your GCP Project in the first window.  

2. Navigate to APIs & Services > Credentials and select "Create Credentials" and "Help Me Choose" to create OAuth 2.0 credentials. Specify the Gmail API and User data to create an OAuth 2.0 Client. Then, click Next.  
<img src='./images/dfcx_9_2.png' width=800 height=400>  

3. In the next screen, give the App a name and provide a support email address and developer contact address.  
<img src='./images/dfcx_9_3.png' width=800 height=400>  

4. In the next screen, continue without defining Scopes. Scopes are defined in the application code when authorization is requested.  
<img src='./images/dfcx_9_4.png' width=800 height=400>  

Next, select the Desktop app as the Application Type and give it a name like “Desktop Authorization App.”

**NOTE**: In this example, you perform two one-time authorizations locally to create the access and refresh tokens, which are included in the deployments to GCP. One authorization is for a Python scheduled task that renews the Gmail Push Notification and Pub/Sub integration daily. The second authorization is for the NodeJS app that parses and sends emails on behalf of the Dialogflow CX agent.

These one-time authorizations are all that is needed to establish the connection between the Gmail or Google Workspace inbox and the application, unless the access scopes or the OAuth Client ID change.

Next, click Download to save the OAuth client secret for use in a later step. Then click Done.  
<img src='./images/dfcx_9_5.png' width=800 height=400>  


## Step 10: Enable Gmail Push Notifications

In this section, you execute a Python script locally to generate an access token for the Gmail API and enable Gmail Push Notifications to the Pub/Sub Topic created earlier.

1. Locate the OAuth client secrets file downloaded in the prior step and upload the file to Cloud Shell.  
<img src='./images/dfcx_10_1.png' width=800 height=400>  

2. The file is uploaded to the home directory in Cloud Shell. Copy it under a new name into the working directory for this section with the following command.  
```bash
cp ~/client_secret_*.apps.googleusercontent.com.json ~/dialogflow-email-integration/enable_push_notifications/client_credentials.json
```

3. Next, open the Cloud Shell Editor and the file dialogflow-email-integration/enable_push_notifications/config.yaml. Update GMAIL_ID with the email address for the Gmail or Google Workspace inbox that you use for Dialogflow integration. Save the change and return to the Cloud Shell Terminal.  
<img src='./images/dfcx_10_3.png' width=800 height=400>  

4. Run the following commands to authorize the Gmail API and enable Push Notifications to Pub/Sub. Follow the prompts to allow access to the email account specified.
```bash
cd ~/dialogflow-email-agent-demo/gmail_push_notifications
pip3 install -r requirements.txt
export GCP_PROJECT=your-project-id
export PUBSUB_TOPIC=gmail-inbox-watch
export GMAIL_ID=your-dialogflow-inbox@gmail.com
python3 -c 'import main; main.main()'
```

<img src='./images/dfcx_10_4i.png' width=800 height=400>  

A response like the following shows that the command succeeded. Notice that a token.json file was stored in the working directory. This contains the refresh token for the server-side application.

<img src='./images/dfcx_10_4ii.png' width=800 height=400>  
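
The script above registers a Gmail watch that publishes to the Pub/Sub topic. A watch expires after at most seven days, which is why this guide renews it on a daily schedule. The following sketch shows the shape of the request body and how the returned expiration (milliseconds since the epoch) can be interpreted. The helper names and sample values are assumptions for illustration, not code from the repository.

```python
import datetime


def build_watch_request(project_id: str, topic: str, label_ids=("INBOX",)) -> dict:
    """Build the body for a Gmail users.watch call, e.g.
    service.users().watch(userId="me", body=...) in google-api-python-client."""
    return {
        "topicName": f"projects/{project_id}/topics/{topic}",
        "labelIds": list(label_ids),
    }


def watch_expiry(response: dict) -> datetime.datetime:
    """Convert the 'expiration' field (epoch milliseconds) to an aware UTC datetime."""
    return datetime.datetime.fromtimestamp(
        int(response["expiration"]) / 1000, tz=datetime.timezone.utc
    )


body = build_watch_request("your-project-id", "gmail-inbox-watch")
sample_response = {"historyId": "1234567", "expiration": "1431990098200"}  # made-up values
print(body["topicName"])
print(watch_expiry(sample_response).isoformat())
```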

Next, deploy the Python Cloud Function that renews the Gmail push notification watch:

```bash
cd ~/dialogflow-email-agent-demo/gmail_push_notifications
gcloud functions deploy renew-gmail-watch --entry-point main --runtime python39 --trigger-topic renew-gmail-watch --env-vars-file config.yaml --project $GCP_PROJECT
```

Navigate to Cloud Scheduler and create a new job. Choose a location based on your needs and a daily frequency (the screenshot below shows daily at 1am CDT).

Configure the following attributes as shown in the screenshot below.  
<img src='./images/dfcx_10_4iii.png' width=800 height=400>  

Configure advanced settings and deploy.  
<img src='./images/dfcx_10_4iv.png' width=800 height=400>  


## Step 11 : Deploy a NodeJS Email Integration Service

Finally, you need to deploy a NodeJS application using Cloud Functions that handles processing of emails between Dialogflow and Gmail. The source code for the NodeJS integration service can be found in ~/dialogflow-email-agent-demo/df_integration_service.

1. First, run the following command to copy the OAuth client credentials into this directory, which is used to generate a token for the Gmail API.
```bash
cp ~/client_secret_*.apps.googleusercontent.com.json ~/dialogflow-email-agent-demo/df_integration_service/credentials.json
```

2. Update ~/dialogflow-email-agent-demo/df_integration_service/config.yaml with the appropriate values using the Cloud Shell Editor. Here is an explanation of the variables that need to be populated.

* **GMAIL_ID** : The email address of the agent / support gmail inbox.
* **GCP_PROJECT** : The GCP project ID.
* **LOCATION** : The location of the agent, which can be found in the Dialogflow CX console (for example, us-central1).
* **AGENT_ID** : The ID of the agent, which can be found in the Dialogflow CX console.
* **SUBJECT_KEY** : The subject that senders need to use in order for a response to be sent from the application. This prevents unwanted emails from being sent by the application.
* **ENTITY_EXTRACT_MODEL_ID** : The id of the entity extraction model which can be found in the AutoML Natural Language console for the deployed model.
* **TOPIC_CLASSIFY_MODEL_ID** : The id of the topic classification model which can be found in the AutoML Natural Language console for the deployed model.

3. Next, execute the following commands to generate the access token. If the token.json file has been generated, then you are ready to deploy.

```bash
cd ~/dialogflow-email-agent-demo/df_integration_service/
export GMAIL_ID=your-agent-email@gmail.com
export GCP_PROJECT=your-project-id
export LOCATION=us-central1
# Install the dependencies, then run the authorization helper with Node
npm install
node -e "
const SCOPES = ['https://www.googleapis.com/auth/gmail.readonly', 'https://www.googleapis.com/auth/gmail.compose'];
const gmailHelper = require('./gmail_auth_helper.js');
const gmail = gmailHelper.newClient('credentials.json', SCOPES);
"
```

4. Finally, execute the following to deploy the Cloud Function.

```bash
gcloud functions deploy main --runtime=nodejs14 --trigger-topic=gmail-inbox-watch --env-vars-file=config.yaml --project=$GCP_PROJECT
```
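
The GCP_PROJECT, LOCATION, and AGENT_ID values configured in this step identify the agent that the service talks to. As a rough illustration, the sketch below assembles the Dialogflow CX session resource name and the regional API endpoint from those settings. The helper names and sample values are hypothetical, and using a Gmail thread ID as the session ID is an assumption made here for illustration only.

```python
def session_path(project: str, location: str, agent_id: str, session_id: str) -> str:
    """Return the Dialogflow CX session resource name."""
    return (
        f"projects/{project}/locations/{location}"
        f"/agents/{agent_id}/sessions/{session_id}"
    )


def api_endpoint(location: str) -> str:
    """Return the Dialogflow API host; regional agents use a location-prefixed host."""
    if location == "global":
        return "dialogflow.googleapis.com"
    return f"{location}-dialogflow.googleapis.com"


# Placeholder configuration mirroring the keys in config.yaml.
config = {
    "GCP_PROJECT": "your-project-id",
    "LOCATION": "us-central1",
    "AGENT_ID": "00000000-0000-0000-0000-000000000000",
}
path = session_path(
    config["GCP_PROJECT"], config["LOCATION"], config["AGENT_ID"], "thread-abc"
)
print(api_endpoint(config["LOCATION"]))
print(path)
```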

## Step 12 : Upload Knowledgebase of Reference Links

When the AutoML model discovers relevant GCP products within the incoming email, you can look up reference materials for these products using Cloud Firestore in Datastore mode. An exported Datastore entity has been included with the repository for upload to Cloud Datastore to simplify setup of this demo.

1. First, upload the demo knowledge base to Cloud Storage.
```bash
export BUCKET_NAME=new-storage-bucket-name
cd dialogflow-email-agent-demo
unzip datastore-knowledgebase.zip
gsutil mb -l us-central1 gs://$BUCKET_NAME
gsutil -m cp -r datastore-knowledgebase gs://$BUCKET_NAME/datastore-knowledgebase
```

2. Next, navigate to Datastore from the console navigation menu. Use the Import/Export pane to import the knowledgebase data that you uploaded to Cloud Storage in the prior step. Specify knowledgeBase as the Datastore Kind when performing this import. See the screenshot below.
<img src='./images/dfcx_12.png' width=800 height=400>  
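
To make the lookup idea concrete, the sketch below maps predicted topic labels to reference links, using an in-memory dictionary as a stand-in for the knowledgeBase kind in Datastore. The label names, the score threshold, and the function name are assumptions for illustration; the real entity schema in the exported data may differ.

```python
def links_for_topics(predictions, knowledge_base, min_score=0.5):
    """Return (label, link) pairs for predicted labels that clear the score threshold."""
    links = []
    for label, score in predictions:
        if score >= min_score and label in knowledge_base:
            links.append((label, knowledge_base[label]))
    return links


# In-memory stand-in for the knowledge base; keys are hypothetical labels.
knowledge_base = {
    "google-app-engine": "https://cloud.google.com/appengine/docs",
    "google-cloud-storage": "https://cloud.google.com/storage/docs",
}
# Hypothetical classifier output as (label, confidence) pairs.
predictions = [
    ("google-app-engine", 0.91),
    ("google-cloud-storage", 0.74),
    ("google-bigquery", 0.12),
]
for label, link in links_for_topics(predictions, knowledge_base):
    print(label, link)
```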


## Step 13 : Test the Application

1. Send an email to the agent email address with the subject that you specified as the "SUBJECT_KEY" in the section above. Try the following message:

   > Hi,
   >
   > I'm having problems with an App Engine service and Cloud Storage bucket. My application isn't authenticating users correctly. I'd also like to cancel a prior request that I had created.
   >
   > Thanks, Greg

2. The agent should respond with something like…
<img src='./images/dfcx_13_2.png' width=800 height=400>  

3. Provide a reference number for the request to cancel.

   > I'd like to cancel request #123456
   >
   > Best, Greg

4. The final response from the agent should look something like this.
<img src='./images/dfcx_13_4.png' width=800 height=400>  
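
The test above only produces a reply because the email subject matches SUBJECT_KEY. The exact matching rule lives in the integration service; the sketch below assumes a case-insensitive substring match purely for illustration, and the function name and sample key are hypothetical.

```python
def should_reply(subject: str, subject_key: str) -> bool:
    """Return True when the subject contains the configured key (case-insensitive)."""
    if not subject_key:
        return False  # never reply when no key is configured
    return subject_key.lower() in subject.lower()


SUBJECT_KEY = "GCP Support"  # hypothetical value for the config.yaml setting
print(should_reply("Re: GCP support request", SUBJECT_KEY))
print(should_reply("Weekly newsletter", SUBJECT_KEY))
```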


# Email Ingestion Tool

<img src='./images/eit.png' width=800 height=400>  

### Prerequisites :
* Gmail API
* Cloud Scheduler
* Cloud Functions (Python)

### Enable Google Cloud APIs
Enable the following APIs in the project.
* Cloud Functions API
* Cloud Scheduler 
* Gmail API


## Step 1 : Create Service Account


1. Navigate to IAM & Admin > Service Accounts and click Create Service Account.  
<img src='./images/eit_1_1.png' width=800 height=400>  

2. Provide a name for the service account and click Done.  
<img src='./images/eit_1_2.png' width=800 height=400>   

For additional information about creating service accounts, [click here](https://cloud.google.com/iam/docs/creating-managing-service-accounts).

**NOTE**: Grant this service account the Cloud Functions Invoker role (roles/cloudfunctions.invoker) and the Cloud Storage roles Storage Object Creator (roles/storage.objectCreator, to create files) and Storage Object Viewer (roles/storage.objectViewer, to read files).


## Step 2 : Enable Gmail API & Create an OAuth 2.0 Client

In this section, you enable the Gmail API in your GCP Project and create an OAuth 2.0 client for the inbox whose emails are ingested. This guide leverages some steps outlined in Implementing Server-Side Authorization and Push Notifications.

To get started using the Gmail API, you need to enable the API and create credentials for the application.

1. Search for APIs & Services in the search box and click the Enable APIs and Services button shown in the screenshot.  
<img src='./images/eit_2_1.png' width=800 height=400>

2. Search for Gmail API and select it.  
<img src='./images/eit_2_2.png' width=800 height=400>

3. Enable the API if you see the Enable button instead of the Manage button.  
<img src='./images/eit_2_3.png' width=800 height=400>

4. Navigate to APIs & Services > Credentials  
<img src='./images/eit_2_4.png' width=800 height=400>

5. Select "Create Credentials" and "Help Me Choose" to create OAuth 2.0 credentials.  
<img src='./images/eit_2_5.png' width=800 height=400>

6. Specify the Gmail API and User data to create an OAuth 2.0 Client. Then, click Next.  
<img src='./images/eit_2_6.png' width=800 height=400>

7. Click on Save and continue.  
<img src='./images/eit_2_7.png' width=800 height=400>

8. Select Desktop app from the dropdown, enter a name, and click Create.  
<img src='./images/eit_2_8.png' width=800 height=400>

9. Download the JSON file by clicking the download button, then click Done.  
<img src='./images/eit_2_9.png' width=800 height=400>

Rename this file to `client_credentials.json`. This is needed in the next step.

## Step 3 : Create Token

In this section, you create a token for the script for the first time.

1. The file structure for creating the token is shown below. In Cloud Shell, create each file using the file name given above its code block.  
<img src='./images/eit_3_1i.png' width=800 height=400>  

### File Structure for creating token 
#### client_credentials.json  
`Paste the file which you downloaded in the step above`


#### main.py

```python
import json
import os

from google.auth.transport.requests import Request
from google.cloud import storage
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow

TOKEN_FILE = os.environ.get("TOKEN_FILE")
OAUTH_CLIENT_CREDS = os.environ.get("OAUTH_CLIENT_CREDS")
BUCKET_NAME = os.environ.get("GCP_BUCKET")
# If modifying these scopes, delete the token file stored in the bucket.
SCOPES = ["https://www.googleapis.com/auth/gmail.readonly"]


def read_token():
    """Return the stored token as a dict, or None if it cannot be read."""
    try:
        storage_client = storage.Client()
        bucket = storage_client.get_bucket(BUCKET_NAME)
        blob = bucket.blob(TOKEN_FILE)
        # The token is saved as json.dumps(creds.to_json()), i.e. a JSON string
        # that itself contains JSON, so it has to be decoded twice.
        token = json.loads(json.loads(blob.download_as_text()))
    except Exception:
        token = None
    return token


def main(*args, **kwargs):
    creds = None
    # Get the token from the bucket, if one was saved earlier
    token_as_json = read_token()
    if token_as_json:
        creds = Credentials(
            token=token_as_json.get("token"),
            refresh_token=token_as_json.get("refresh_token"),
            token_uri=token_as_json.get("token_uri"),
            client_id=token_as_json.get("client_id"),
            client_secret=token_as_json.get("client_secret"),
        )

    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(OAUTH_CLIENT_CREDS, SCOPES)
            # Note: run_console() was removed in google-auth-oauthlib 1.0,
            # so this call needs an earlier release of that library.
            creds = flow.run_console()
        # Save the credentials for the next run
        storage_client = storage.Client()
        bucket = storage_client.bucket(BUCKET_NAME)
        blob = bucket.blob(TOKEN_FILE)
        blob.upload_from_string(
            data=json.dumps(creds.to_json()), content_type="application/json"
        )
    print("Token Saved.")
    return ""
```

#### requirements.txt 
```python
google-api-python-client
google-auth
google-auth-oauthlib
google-cloud-storage
```

File structure should look like this.  
<img src='./images/eit_3_1ii.png' width=800 height=400>  

2. Run this command in cloud shell
```bash
pip3 install -r requirements.txt
export TOKEN_FILE=token-file-location/token.json
export OAUTH_CLIENT_CREDS=client_credentials.json
export GCP_BUCKET=bucket-name
python3 -c 'import main; main.main()'
```

3. This creates an authentication link as shown in the image. Click the link.  
<img src='./images/eit_3_3.png' width=800 height=400>  

4. Select your account   
<img src='./images/eit_3_4.png' width=800 height=400>  

5. Click on the allow button.  
<img src='./images/eit_3_5.png' width=800 height=400>  

6. Copy the authentication code.  
<img src='./images/eit_3_6.png' width=800 height=400>  

7. Paste the code in the terminal and hit enter.  
<img src='./images/eit_3_7.png' width=800 height=400>  

This creates the token.json file in the storage bucket location that you specified.
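
One detail of the script above is easy to miss: `creds.to_json()` already returns a JSON string, and the script wraps it in `json.dumps` again before uploading, so the stored object must be decoded twice when it is read back. The sketch below reproduces that round trip with a plain dictionary standing in for real credentials; the token values are placeholders.

```python
import json

# Stand-in for creds.to_json(), which returns a JSON-formatted string.
creds_json = json.dumps(
    {"token": "access-token-placeholder", "refresh_token": "refresh-token-placeholder"}
)

# What the script uploads: the JSON string encoded as JSON a second time.
stored = json.dumps(creds_json)

# What read_token() does: decode twice to get back to a dictionary.
decoded_once = json.loads(stored)         # still a str
token = json.loads(decoded_once)          # now a dict
print(type(decoded_once).__name__, type(token).__name__)
```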


## Step 4 : Deploy on cloud function

### File Structure for deployment 
1. Copy the code with the file name mentioned above the respective code block.

#### client_credentials.json 
`Replace this file with the client credentials file downloaded earlier`

#### config.yaml
```yaml
OAUTH_CLIENT_CREDS: "client_credentials.json"
TOKEN_FILE: "path-of-json-file"
GCP_BUCKET: "your-bucket-name"
OUTPUT_PATH: "mail_dataset_v2"
```

**NOTE**: Set `TOKEN_FILE` to the location in the bucket where token.json was saved in Step 3.

#### requirements.txt
```python
google-api-python-client
google-auth
google-auth-oauthlib
google-cloud-storage
Flask
functions-framework
```

#### main.py

```python
# Import the required libraries
import base64
import datetime
import json
import mimetypes
import os

import functions_framework
from google.auth.transport.requests import Request
from google.cloud import storage
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

# Scope only to read mails
SCOPES = ["https://www.googleapis.com/auth/gmail.readonly"]
OAUTH_CLIENT_CREDS = os.environ.get("OAUTH_CLIENT_CREDS")
BUCKET_NAME = os.environ.get("GCP_BUCKET")
TOKEN_FILE = os.environ.get("TOKEN_FILE")
OUTPUT_PATH = os.environ.get("OUTPUT_PATH")


def upload_blob(source_file_name, data, data_type):
    """Upload data to the bucket under the given object name."""
    client = storage.Client()
    bucket = client.get_bucket(BUCKET_NAME)
    blob = bucket.blob(source_file_name)
    blob.upload_from_string(data, content_type=data_type)

    return "uploaded"


def read_token():
    """Return the stored token as a dict, or an empty dict if it cannot be read."""
    try:
        storage_client = storage.Client()
        bucket = storage_client.get_bucket(BUCKET_NAME)
        blob = bucket.blob(TOKEN_FILE)
        # The token is saved as json.dumps(creds.to_json()), so decode it twice.
        token = json.loads(json.loads(blob.download_as_text()))
    except Exception:
        token = {}
    return token


def refresh_token():
    creds = None
    # Get the token from the bucket
    token_as_json = read_token()
    if token_as_json:
        creds = Credentials(
            token=token_as_json.get("token"),
            refresh_token=token_as_json.get("refresh_token"),
            token_uri=token_as_json.get("token_uri"),
            client_id=token_as_json.get("client_id"),
            client_secret=token_as_json.get("client_secret"),
        )

    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(OAUTH_CLIENT_CREDS, SCOPES)
            creds = flow.run_console()
        # Save the credentials for the next run
        upload_blob(TOKEN_FILE, json.dumps(creds.to_json()), "application/json")
    print("Token Refreshed.")
    return creds


def get_attachments(service, msg_id):
    """Upload every attachment of the message to the bucket."""
    try:
        message = service.users().messages().get(userId="me", id=msg_id).execute()

        for part in message["payload"].get("parts", []):
            if part["filename"]:
                if "data" in part["body"]:
                    data = part["body"]["data"]
                else:
                    # Larger attachments have to be fetched separately by ID
                    att_id = part["body"]["attachmentId"]
                    att = (
                        service.users()
                        .messages()
                        .attachments()
                        .get(userId="me", messageId=msg_id, id=att_id)
                        .execute()
                    )
                    data = att["data"]
                file_data = base64.urlsafe_b64decode(data.encode("UTF-8"))
                path = part["filename"]

                # Fall back to text/plain when the type cannot be guessed
                file_type = mimetypes.guess_type(path)[0] or "text/plain"
                print("\t", file_type)
                upload_blob(f"{OUTPUT_PATH}/{msg_id}/{path}", file_data, file_type)

    except Exception as error:
        print("Attachment error", error)


def get_message(service, msg_id):
    """Upload the plain-text body and the subject of the message to the bucket."""
    data = ""
    try:
        message = service.users().messages().get(userId="me", id=msg_id).execute()
        headers = message["payload"]["headers"]
        subject = [i["value"] for i in headers if i["name"] == "Subject"]
        subject = subject[0] if subject else ""
        if message["payload"]["mimeType"] == "multipart/mixed":
            for part in message["payload"].get("parts", []):
                # Attachment parts have no nested parts, so default to an empty list
                for sub_part in part.get("parts", []):
                    if sub_part["mimeType"] == "text/plain":
                        data = sub_part["body"]["data"]
                        break
                if data:
                    break
        else:
            for part in message["payload"].get("parts", []):
                if part["mimeType"] == "text/plain":
                    data = part["body"]["data"]
                    break

        content = base64.urlsafe_b64decode(data.encode("UTF-8"))
        upload_blob(f"{OUTPUT_PATH}/{msg_id}/body.txt", content, "text/plain")
        upload_blob(f"{OUTPUT_PATH}/{msg_id}/subject.txt", subject, "text/plain")
        return content
    except Exception as error:
        print("mail body Error", error)


@functions_framework.http
def email_ingestion(request, *args, **kwargs):
    creds = refresh_token()
    # Connect to the Gmail API
    service = build("gmail", "v1", credentials=creds)
    today = datetime.datetime.today()
    yesterday = today - datetime.timedelta(1)

    # Restrict the search to the last 24 hours, expressed as epoch seconds
    query = "before:{0} after:{1}".format(
        int(today.timestamp()), int(yesterday.timestamp())
    )
    print(query)

    # Request a list of the matching messages.
    # The default page size is 100; maxResults changes it, for example:
    # result = service.users().messages().list(maxResults=5, userId='me').execute()
    result = (
        service.users().messages().list(maxResults=500, userId="me", q=query).execute()
    )

    messages = result.get("messages")
    if not messages:
        print("No email found OR Something went wrong")
        return ""

    # messages is a list of dictionaries where each dictionary contains a message id.
    print("No of emails : ", len(messages))
    for msg in messages:
        print(msg["id"])
        get_message(service, msg["id"])
        get_attachments(service, msg["id"])

    print("Finish")
    return ""
```

File structure should look like this.  
<img src='./images/eit_4_1.png' width=800 height=400>  

2. Deploy the Cloud Function with the following command, providing the service account that you created in Step 1.  
<img src='./images/eit_4_2.png' width=800 height=400>  

```bash
gcloud functions deploy email_ingestion --runtime=python39 --env-vars-file=config.yaml --project=your-project-name --service-account=your-service-account --trigger-http
```
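
The `get_message` function in main.py looks for the plain-text body at a fixed nesting depth, which covers common message layouts. As an alternative sketch, a recursive walk over the payload finds the first `text/plain` part at any depth. The helper names and the sample payload below are illustrative assumptions and are not part of the deployed code.

```python
import base64


def find_plain_text(part: dict) -> str:
    """Return the base64url data of the first text/plain part, searching depth-first."""
    if part.get("mimeType") == "text/plain" and part.get("body", {}).get("data"):
        return part["body"]["data"]
    for child in part.get("parts", []):
        found = find_plain_text(child)
        if found:
            return found
    return ""


def decode_body(data: str) -> str:
    """Decode base64url body data, restoring any stripped padding."""
    padded = data + "=" * (-len(data) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8", errors="replace")


# Hand-built payload shaped like a Gmail message with one attachment.
encoded = base64.urlsafe_b64encode(b"Hello from Gmail").decode("ascii")
sample_payload = {
    "mimeType": "multipart/mixed",
    "parts": [
        {
            "mimeType": "multipart/alternative",
            "parts": [
                {"mimeType": "text/plain", "body": {"data": encoded}},
                {"mimeType": "text/html", "body": {"data": encoded}},
            ],
        },
        {"mimeType": "application/pdf", "filename": "invoice.pdf", "body": {"attachmentId": "att-1"}},
    ],
}
print(decode_body(find_plain_text(sample_payload)))
```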

## Step 5 : Schedule By Cloud Scheduler

1. Search for Cloud Scheduler in the search box and click on the create job.  
<img src='./images/eit_5_1.png' width=800 height=400>  

2. Enter a name for the scheduler job, enter the frequency, select the timezone as shown in the image, and click Continue.  
<img src='./images/eit_5_2i.png' width=800 height=400>  

   Select HTTP from the drop-down.  
<img src='./images/eit_5_2ii.png' width=800 height=400>  

3. Copy the Cloud Function URL with the following steps:
    a. Navigate to Cloud Functions and click the Cloud Function that you created.  
    <img src='./images/eit_5_3i.png' width=800 height=400>  
    
    b. Go to the trigger section and copy the URL.  
    <img src='./images/eit_5_3ii.png' width=800 height=400>  

4. Paste the URL in Cloud Scheduler and select GET as the HTTP method.  
<img src='./images/eit_5_4.png' width=800 height=400>  

5. Select the Add OIDC token option from the dropdown.  
<img src='./images/eit_5_5.png' width=800 height=400>  

6. Select the service account that you created in Step 1 and click Continue.  
<img src='./images/eit_5_6.png' width=800 height=400>  

7. Enter Max retry attempts (2) and enter the Max retry duration (5s). Click on create.  
<img src='./images/eit_5_7.png' width=800 height=400>  

8. Cloud Scheduler is now set up to invoke the Cloud Function every day at 12 AM.  
<img src='./images/eit_5_8.png' width=800 height=400>  
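
Because the job runs once a day, the function's Gmail search covers the preceding 24 hours, expressed as epoch seconds in the `before:` and `after:` operators. The sketch below isolates that query-building step so the window is easy to check; the function name and the fixed sample time are assumptions for illustration.

```python
import datetime


def build_daily_query(now: datetime.datetime) -> str:
    """Build a Gmail search query covering the 24 hours that end at `now`."""
    start = now - datetime.timedelta(days=1)
    return "before:{0} after:{1}".format(int(now.timestamp()), int(start.timestamp()))


# Fixed, timezone-aware sample time so the result does not depend on the local clock.
sample_now = datetime.datetime(2024, 1, 2, tzinfo=datetime.timezone.utc)
query = build_daily_query(sample_now)
print(query)
```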

## Output 


The output from this script is stored in the bucket under the folder named by OUTPUT_PATH in the config.yaml file, with one subfolder per email, named after the message ID.  
<img src='./images/output_sample_1.png' width=800 height=400> 

Inside each folder there is a subject.txt file containing the subject of that email, a body.txt file containing the body of the email, and the attachments under the file names they had in the email.  
<img src='./images/output_sample_2.png' width=800 height=400>  
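
The layout described above follows directly from how object names are built in main.py: `OUTPUT_PATH`, then the message ID, then the file name. The sketch below restates that naming scheme together with the content-type fallback used for attachments. The helper names and the sample message ID are hypothetical.

```python
import mimetypes


def object_name(output_path: str, msg_id: str, filename: str) -> str:
    """Return the bucket object name for a file that belongs to a message."""
    return f"{output_path}/{msg_id}/{filename}"


def content_type_for(filename: str) -> str:
    """Guess the content type from the file name, defaulting to text/plain."""
    return mimetypes.guess_type(filename)[0] or "text/plain"


# "abc123" is a made-up message ID; "mail_dataset_v2" matches the sample config.yaml.
for name in ("subject.txt", "body.txt", "invoice.pdf"):
    print(object_name("mail_dataset_v2", "abc123", name), content_type_for(name))
```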
