# EmailSearchAI

A generative search system for emails that helps organisations find and validate past decisions, strategies, and data across a large corpus of email threads.

Steps:

1. Ingest emails and their metadata into a vector database.
2. Implement a retrieval-augmented generation (RAG) architecture.
3. Use embeddings to find relevant email threads.
4. Generate responses based on retrieved emails.
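The four steps can be illustrated with a deliberately tiny, dependency-free sketch. It is not how this notebook works. A bag-of-words vector stands in for a real embedding model, a Python list stands in for the vector database, and the "generation" step only assembles the prompt an LLM would receive. The function names (`embed`, `retrieve`, `build_prompt`) and the sample emails are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy stand-in for an embedding model: a bag-of-words term-frequency vector
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Step 1: "ingest" hypothetical emails into an in-memory index of (id, text, vector)
emails = {
    "t1": "Lunch meeting on May 5th to review ISDA and CSA Masters",
    "t2": "Weekend golf in Napa, tee times on Saturday",
    "t3": "Master Termination Log updated through January 31",
}
index = [(eid, text, embed(text)) for eid, text in emails.items()]

# Steps 2-3: retrieve the k threads whose vectors are most similar to the query
def retrieve(query, k=2):
    q = embed(query)
    ranked = sorted(index, key=lambda row: cosine(q, row[2]), reverse=True)
    return [(eid, text) for eid, text, _ in ranked[:k]]

# Step 4: build the prompt that a real system would send to an LLM
def build_prompt(query, hits):
    context = "\n".join(f"[{eid}] {text}" for eid, text in hits)
    return f"Answer using only these emails:\n{context}\n\nQuestion: {query}"

hits = retrieve("When is the ISDA lunch meeting?")
print(build_prompt("When is the ISDA lunch meeting?", hits))
```

In the notebook, OpenAI embeddings and ChromaDB collections replace `embed` and `index`, and the prompt goes to a chat model.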


## The Embedding Layer

### About the dataset
- The dataset is provided in CSV and Pickle formats for ease of use.
- Each thread contains multiple emails, allowing for analysis of conversation flow and decision-making processes.
- Human-generated summaries enable quick understanding and validation of thread content.
- Suitable for tasks such as search, summarization, and retrieval-augmented generation in enterprise settings.

**Email Thread Summary Dataset**

**Overview:**  
The Email Thread Summary Dataset consists of two files, `email_thread_details` and `email_thread_summaries`. Together they pair the raw emails in each thread with a human-written summary of that thread.

**Email Thread Details**  
*Description:*  
The `email_thread_details` file contains one row per email. Each row records the thread the email belongs to, plus its subject, timestamp, sender, recipients, and body.

*Columns:*
- `thread_id`: A unique identifier for each email thread.
- `subject`: Subject of the email thread.
- `timestamp`: Timestamp indicating when the message was sent.
- `from`: Sender of the email.
- `to`: List of recipients of the email.
- `body`: Content of the email message.

*Additional Information:*  
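The `to` column holds each email's recipients as a list of strings. The Pickle (`.pkl`) file preserves these as Python lists. The CSV file instead stores the string representation of each list (e.g. `"['Suzanne Adams']"`), so `pd.read_csv` returns strings that must be parsed before they can be used as lists.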

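When working from the CSV files, the serialized recipient lists can be turned back into Python lists with `ast.literal_eval`, which safely evaluates Python literals only. The helper name `parse_recipients` is hypothetical. It is a sketch that also passes real lists through unchanged (as loaded from the Pickle file) and maps missing or malformed values to an empty list.

```python
import ast

def parse_recipients(value):
    """Turn a serialized recipient list such as "['Suzanne Adams']" back into a list.

    Values that are already lists (e.g. loaded from the Pickle file) pass through unchanged;
    missing (NaN) or unparseable values become an empty list.
    """
    if isinstance(value, list):
        return value
    if not isinstance(value, str):
        return []
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return []
    return [str(item) for item in parsed] if isinstance(parsed, (list, tuple)) else []

print(parse_recipients("['Suzanne Adams']"))  # ['Suzanne Adams']
```

To apply it to the loaded frame, use `email_thread_details['to'] = email_thread_details['to'].apply(parse_recipients)`.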
**Email Thread Summaries**  
*Description:*  
The `email_thread_summaries` file contains concise summaries crafted by human annotators for each email thread, offering a high-level overview of the content.

*Columns:*
- `thread_id`: A unique identifier for each email thread.
- `summary`: A concise summary of the email thread.

**Dataset Structure:**  
The dataset is organized into threads and emails, averaging roughly five emails per thread:

- Threads: 4,167
- Emails: 21,684

**Language:**  
- Languages: English (en)

```python
import pandas as pd
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from sentence_transformers import SentenceTransformer
import chromadb
import openai
import tiktoken
from huggingface_hub import hf_hub_download
import matplotlib.pyplot as plt
import seaborn as sns
from bs4 import BeautifulSoup
from dotenv import load_dotenv
import os
import re
```

```python
email_thread_details = pd.read_csv("./dataset/csv/trimmed_email_thread_details.csv")
email_thread_summaries = pd.read_csv("./dataset/csv/trimmed_email_thread_summaries.csv")
```

```python
load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")
```

```python
email_thread_details.head()
```

Unnamed: 0,thread_id,subject,timestamp,from,to,body
0,1,FW: Master Termination Log,2002-01-29 11:23:42,"Gossett, Jeffrey C. JGOSSET","['Giron', 'Darron C. Dgiron', 'Love', 'Phillip M. Plove']","\n\n -----Original Message-----\nFrom: =09Theriot, Kim S. =20\nSent:=09Tuesday, January 29, 2002 1:23 PM\nTo:=09Richardson, Stacey; Anderson, Diane; Gossett, Jeffrey C.; White, Stac=\ney W.; Murphy, Melissa; Hall, D. Todd; Sweeney, Kevin\nCc:=09Aucoin, Evelyn; Baxter, Bryce; Wynne, Rita\nSubject:=09FW: Master Termination Log\n\n\n\n -----Original Message-----\nFrom: =09Panus, Stephanie =20\nSent:=09Tuesday, January 29, 2002 11:39 AM\nTo:=09Adams, Laurel; Alonso, Tom; Aronowitz, Alan; Bailey, Susan; Balfour-F=\nlanagan, Cyndie; Baughman, Edward; Belden, Tim; Bishop, Serena; Brackett, D=\nebbie R.; Bradford, William S.; Browning, Mary Nell; Bruce, James; Bruce, M=\nichelle; Bruce, Robert; Buerkle, Jim; Calger, Christopher F.; Carrington, C=\nlara; Considine, Keith; Cordova, Karen A.; Crandall, Sean; Cutsforth, Diane=\n; Diamond, Russell; Dunton, Heather; Edison, Susan; Elafandi, Mo; Fischer, =\nMark; Flores, Nony; Fondren, Mark; Gorny, Vladimir; Gorte, David; Gresham, =\nWayne; Hagelmann, Bjorn; Hall, Steve C. 
(Legal); Harkness, Cynthia; Hendry,=\n Brent; Johnston, Greg; Keohane, Peter; Lindeman, Cheryl; Little, Kelli; Ma=\nllory, Chris; Mann, Kay; Mcginnis, Stephanie; McGrory, Robert; McMichael Jr=\n., Ed; Miller, Don (Asset Mktg); Moore, Janet H.; Moran, Tom; Murphy, Harla=\nn; Murray, Julia; Nemec, Gerald; Ogden, Mary; Otto, Randy; Page, Jonalan; P=\nostlethwaite, John; Prejean, Frank; Presto, Kevin M.; Puchot, Paul; Rasmuss=\nen, Dale; Richter, Brad; Richter, Jeff; Robison, Michael A.; Rohauer, Tanya=\n; Rosman, Stewart; Runswick, Stacy; Sacks, Edward; Scholtes, Diana; Shackle=\nton, Sara; Simons, Paul; Swinney, John; Thapar, Raj; Theriot, Kim S.; Thoma=\ns, Jake; Thome, Stephen; Tricoli, Carl; Van Hooser, Steve; Wente, Laura; Wi=\nlson, Shona; Winfree, O'Neal D.; Woodland, Andrea; Yoder, Christian\nSubject:=09Master Termination Log\n\nAttached is the Daily Termination List for January 25 as well as the Master=\n Termination Log, which incorporates all terminations received through Janu=\nary 25.\n\n =20\n\nThe following were previously on the Master Termination Log and have now be=\nen marked as ""Y"" for a valid termination:\n\nAtlantic Coast Fibers, Inc.=09=09=09ENA=09=09pulp/paper transactions\nCNC-Containers Corporation=09=09=09EPMI=09=09master power agreement\nPublic Utility District No. 1 of Chelan County=09EPMI=09=09deal no. 757497.=\n01\nConnect Energy Services, Inc.=09=09=09ENA=09=09liquids agreement\nNGL Supply, Inc. (including Premier=09=09ENA/EGLI=09physical & financial tr=\nansactions referenced\nEnergy Partners, a division of NGL Supply, Inc.)\nPlains Marketing, L.P.=09=09=09=09ERAC=09=09deal no. QG4563.1\nPlains Marketing, L.P.=09=09=09=09ERAC=09=09deal no. QG4482.2\n\nStephanie Panus\nEnron Wholesale Services\nph: 713.345.3249\nfax: 713.646.3490"
1,1,FW: Master Termination Log,2002-01-31 12:50:00,"Theriot, Kim S. KTHERIO","['Murphy', 'Melissa Mmurphy', 'Gossett', 'Jeffrey C. Jgosset', 'White', 'Stacey W. Swhite', 'Hall', 'D. Todd Thall', 'Sweeney', 'Kevin Ksweene', 'Anderson', 'Diane Danders2', 'Hunter', 'Larry Joe Jhunte2']","\n\n -----Original Message-----\nFrom: =09Panus, Stephanie =20\nSent:=09Thursday, January 31, 2002 12:08 PM\nTo:=09Adams, Laurel; Albrecht, Kristin; Alonso, Tom; Aronowitz, Alan; Baile=\ny, Susan; Balfour-Flanagan, Cyndie; Baughman, Edward; Belden, Tim; Bishop, =\nSerena; Boyd, Samantha; Brackett, Debbie R.; Bradford, William S.; Browning=\n, Mary Nell; Bruce, James; Bruce, Michelle; Bruce, Robert; Buerkle, Jim; Ca=\nlger, Christopher F.; Carrington, Clara; Considine, Keith; Cordova, Karen A=\n.; Crandall, Sean; Cutsforth, Diane; Diamond, Russell; Dunton, Heather; Edi=\nson, Susan; Elafandi, Mo; Fischer, Mark; Flores, Nony; Fondren, Mark; Gorny=\n, Vladimir; Gorte, David; Gresham, Wayne; Hagelmann, Bjorn; Hall, Steve C. 
=\n(Legal); Harkness, Cynthia; Hendry, Brent; Johnston, Greg; Keohane, Peter; =\nLindeman, Cheryl; Mallory, Chris; Mann, Kay; Mcginnis, Stephanie; McGrory, =\nRobert; McMichael Jr., Ed; Miller, Don (Asset Mktg); Moore, Janet H.; Moran=\n, Tom; Murphy, Harlan; Murray, Julia; Nemec, Gerald; Ogden, Mary; Page, Jon=\nalan; Postlethwaite, John; Prejean, Frank; Presto, Kevin M.; Puchot, Paul; =\nRasmussen, Dale; Richardson, Stacey; Richter, Brad; Richter, Jeff; Robison,=\n Michael A.; Rohauer, Tanya; Rosman, Stewart; Sacks, Edward; Scholtes, Dian=\na; Sevitz, Robert; Shackleton, Sara; Simons, Paul; Swinney, John; Thapar, R=\naj; Theriot, Kim S.; Thomas, Jake; Thome, Stephen; Tricoli, Carl; Van Hoose=\nr, Steve; Wente, Laura; Wilson, Shona; Winfree, O'Neal D.; Woodland, Andrea=\n; Yoder, Christian\nSubject:=09Master Termination Log\n\nAttached are the Daily Lists for January 29 and January 30 as well as the M=\naster Termination Log, which incorporates all terminations received through=\n January 30. Also, prepetition mutual terminations have been added to this=\n list. They are identified under ""Nature of Default"" as ""mutual terminatio=\nn"".\n\n =20\n\nStephanie Panus\nEnron Wholesale Services\nph: 713.345.3249\nfax: 713.646.3490"
2,1,FW: Master Termination Log,2002-02-05 15:03:35,"Theriot, Kim S. KTHERIO","['Murphy', 'Melissa Mmurphy', 'Anderson', 'Diane Danders2', 'White', 'Stacey W. Swhite', 'Gossett', 'Jeffrey C. Jgosset', 'Hall', 'D. Todd Thall', 'Sweeney', 'Kevin Ksweene', 'Aucoin', 'Evelyn Eaucoin', 'Baxter', 'Bryce Bbaxter']","Note to Stephanie Panus....\n\nStephanie...please remove my name as well as Melissa Murphy's from the dist=\nribution list below.\n\nPlease add the following:\n\nTodd D. Hall\nKevin Sweeney\nRita Wynne\nRebecca Grace\nRhonda Robinson\nKerri Thomspon\nKristin Albrecht\nTom Chapman\n\n\nThanks!\n\nKim Theriot\n\n -----Original Message-----\nFrom: =09Panus, Stephanie =20\nSent:=09Tuesday, February 05, 2002 8:18 AM\nTo:=09Adams, Laurel; Albrecht, Kristin; Alonso, Tom; Aronowitz, Alan; Baile=\ny, Susan; Balfour-Flanagan, Cyndie; Baughman, Edward; Belden, Tim; Bishop, =\nSerena; Boyd, Samantha; Brackett, Debbie R.; Bradford, William S.; Browning=\n, Mary Nell; Bruce, James; Bruce, Michelle; Bruce, Robert; Buerkle, Jim; Ca=\nlger, Christopher F.; Carrington, Clara; Chilkina, Elena; Considine, Keith;=\n Cordova, Karen A.; Crandall, Sean; Cutsforth, Diane; Diamond, Russell; Dun=\nton, Heather; Edison, Susan; Elafandi, Mo; Fischer, Mark; Flores, Nony; Fon=\ndren, Mark; Gorny, Vladimir; Gorte, David; Gresham, Wayne; Hagelmann, Bjorn=\n; Hall, Steve C. 
(Legal); Harkness, Cynthia; Hendry, Brent; Johnston, Greg;=\n Keohane, Peter; Lindeman, Cheryl; Mallory, Chris; Mann, Kay; Mcginnis, Ste=\nphanie; McGrory, Robert; McMichael Jr., Ed; Miller, Don (Asset Mktg); Moore=\n, Janet H.; Moran, Tom; Murphy, Harlan; Murray, Julia; Nemec, Gerald; Ogden=\n, Mary; Page, Jonalan; Postlethwaite, John; Prejean, Frank; Presto, Kevin M=\n.; Puchot, Paul; Rasmussen, Dale; Richardson, Stacey; Richter, Brad; Richte=\nr, Jeff; Robison, Michael A.; Rohauer, Tanya; Rosman, Stewart; Sacks, Edwar=\nd; Scholtes, Diana; Sevitz, Robert; Shackleton, Sara; Simons, Paul; Swinney=\n, John; Thapar, Raj; Theriot, Kim S.; Thomas, Jake; Thome, Stephen; Tricoli=\n, Carl; Van Hooser, Steve; Wente, Laura; Wilson, Shona; Winfree, O'Neal D.;=\n Woodland, Andrea; Yoder, Christian\nSubject:=09Master Termination Log\n\nAttached is the Daily List for January 31 as well as the Master Termination=\n Log, which incorporates all terminations received through January 31.\n\n =20\n\nStephanie Panus\nEnron Wholesale Services\nph: 713.345.3249\nfax: 713.646.3490"
3,1,FW: Master Termination Log,2002-02-05 15:06:25,"Theriot, Kim S. KTHERIO","['Hall', 'D. Todd Thall', 'Sweeney', 'Kevin Ksweene', 'Anderson', 'Diane Danders2', 'Gossett', 'Jeffrey C. Jgosset', 'White', 'Stacey W. Swhite', 'Murphy', 'Melissa Mmurphy']","\n\n -----Original Message-----\nFrom: =09Panus, Stephanie =20\nSent:=09Tuesday, February 05, 2002 4:59 PM\nTo:=09Adams, Laurel; Albrecht, Kristin; Alonso, Tom; Aronowitz, Alan; Baile=\ny, Susan; Balfour-Flanagan, Cyndie; Baughman, Edward; Belden, Tim; Bishop, =\nSerena; Boyd, Samantha; Brackett, Debbie R.; Bradford, William S.; Browning=\n, Mary Nell; Bruce, James; Bruce, Michelle; Bruce, Robert; Buerkle, Jim; Ca=\nlger, Christopher F.; Carrington, Clara; Chilkina, Elena; Considine, Keith;=\n Cordova, Karen A.; Crandall, Sean; Cutsforth, Diane; Diamond, Russell; Dun=\nton, Heather; Edison, Susan; Elafandi, Mo; Fischer, Mark; Flores, Nony; Fon=\ndren, Mark; Glover, Sheila; Gorny, Vladimir; Gorte, David; Gresham, Wayne; =\nHagelmann, Bjorn; Hall, Steve C. 
(Legal); Harkness, Cynthia; Hendry, Brent;=\n Johnston, Greg; Keohane, Peter; Lindeman, Cheryl; Mallory, Chris; Mann, Ka=\ny; Mcginnis, Stephanie; McGrory, Robert; McMichael Jr., Ed; Miller, Don (As=\nset Mktg); Moore, Janet H.; Moran, Tom; Murphy, Harlan; Murray, Julia; Neme=\nc, Gerald; Ogden, Mary; Page, Jonalan; Postlethwaite, John; Prejean, Frank;=\n Presto, Kevin M.; Puchot, Paul; Rasmussen, Dale; Richardson, Stacey; Richt=\ner, Brad; Richter, Jeff; Robison, Michael A.; Rohauer, Tanya; Rosman, Stewa=\nrt; Sacks, Edward; Scholtes, Diana; Sevitz, Robert; Shackleton, Sara; Simon=\ns, Paul; Swinney, John; Thapar, Raj; Theriot, Kim S.; Thomas, Jake; Thome, =\nStephen; Tricoli, Carl; Van Hooser, Steve; Wente, Laura; Wilson, Shona; Win=\nfree, O'Neal D.; Woodland, Andrea; Yoder, Christian\nSubject:=09Master Termination Log\n\nAttached is the Daily List for February 4 as well as the Master Termination=\n Log, which incorporates all termination received through February 4 (with =\nthe exception of February 1, which is under legal review and contains all f=\ninancial transactions).\n\n =20\n\nStephanie Panus\nEnron Wholesale Services\nph: 713.345.3249\nfax: 713.646.3490"
4,1,FW: Master Termination Log,2002-05-28 07:20:35,"Kelly, Katherine L. KKELLY","['Germany', 'Chris Cgerman']","\n\n -----Original Message-----\nFrom: =09McMichael Jr., Ed =20\nSent:=09Tuesday, May 28, 2002 8:15 AM\nTo:=09Lagrasta, Fred; Kelly, Katherine L.; Versen, Victoria\nSubject:=09FW: Master Termination Log\n\nPlease look into the CNG LDC (Hope Gas) termination 12/1 and the $66 MM set=\ntlement offer that is listed on the Letter Log below. Let me know what tha=\nt is after you figure it out. If you have any questions, please ask.\nEd =20\n\n -----Original Message-----\nFrom: =09Panus, Stephanie =20\nSent:=09Friday, May 24, 2002 3:49 PM\nTo:=09Adams, Laurel; Alon, Heather; Apollo, Beth; Arnold, Matthew; Aronowit=\nz, Alan; Bailey, Susan; Balfour-Flanagan, Cyndie; Barbe, Robin; Baughman, E=\ndward; Berryman, Kyle; Bolt, Laurel; Botello, Rose; Boudreau, Kara; Brennig=\n, Tammy; Bridges, Michael; Bruck, Sarah; Camarillo, Juan; Coleman, David; C=\nomeaux, Clinton; Concannon, Ruth; Cordova, Karen A.; Couch, Greg; Danaher, =\nPatrick; Darmitzel, Paul; Del vecchio, Peter; Despres, Dan; Dicarlo, Louis;=\n Edison, Susan; Elafandi, Mo; Fay, Ashley; Flores, Nony; Fowler, Kulvinder;=\n Garza, Maria; Germany, Chris; Gonzalez, Victor; Gorte, David; Grace, Rebec=\nca M.; Guillen, Andrea R.; Hagelmann, Bjorn; Haralson, Nancy L; Harkness, C=\nynthia; Herrera, Olga; Heuertz, Kelly; Hoang, Charlie; Johnson, Luchas; Kel=\nler, James E.; Lagrasta, Fred; Leuschen, Sam; Lindeman, Cheryl; Lowry, Donn=\na; Mann, Kay; Matheson, A.k.; Mausser, Gregory A.; McClure, Zakiyyah; McMic=\nhael Jr., Ed; Miller, Don (Asset Mktg); Moore, Janet H.; Moscoso, Michael E=\n.; Muench, Gayle W.; Murphy, Harlan; Murray, Julia; Nelson, Michelle; Polsk=\ny, Phil; Prejean, Frank; Puchot, Paul; Richard, Robert; Richardson, Stacey;=\n Roberson, Weezie ; Robison, Michael A.; Sacchi, Martin; Sayre, Frank; Sevi=\ntz, Robert; Shackleton, Sara; Sharma, Shifali; Shivers, Lynn; Shoup, Cynthi=\na; Smida, Ed; Stai, 
Aaron ; Sweeney, Kevin; Thapar, Raj; Thibaut, Dan; Tric=\noli, Carl; Versen, Victoria; Ward, Charles; Wilson, Shona; Wolgel, Fred\nSubject:=09Master Termination Log\n\nAttached is the Daily List for May 24, 2002 as well as the Master Terminati=\non Log, which incorporates all terminations received through May 24.\n\n =20\n\nStephanie Panus\nEnron Wholesale Services\nph: 713.345.3249\nfax: 713.646.3490"


```python
# Word count of each human-written summary (split() also handles repeated spaces and newlines)
email_thread_summaries['text_length'] = email_thread_summaries['summary'].apply(lambda x: len(x.split()))
```

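Splitting on a single space is only a rough word count. Consecutive spaces produce empty strings that are counted as words, and newlines are not treated as separators. `str.split()` with no argument splits on any run of whitespace. The sample string below is made up to show the difference.

```python
summary = "Attendees are asked to RSVP.  Carol  requests confirmation.\n"

print(len(summary.split(' ')))  # counts the empty strings between repeated spaces
print(len(summary.split()))     # counts words only
```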
```python
# Count number of emails per thread
email_thread_counts = email_thread_details.groupby('thread_id').size()
email_thread_summaries['num_emails'] = email_thread_summaries['thread_id'].map(email_thread_counts)
```

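The count-then-map logic is sketched here without pandas, on hypothetical rows, to make the alignment explicit. Counts are keyed by `thread_id`, so each summary is matched to its thread by ID rather than by row position. A summary whose thread has no emails gets no count; pandas' `Series.map` fills `NaN` in that case, and this sketch returns `None`.

```python
from collections import Counter

# Hypothetical (thread_id, body) rows standing in for email_thread_details
email_rows = [(1, "FW: Master Termination Log"), (1, "Note to Stephanie"),
              (2, "I'll be there..."), (2, "I will attend."), (2, "yes. That's okay.")]
summary_thread_ids = [1, 2, 3]

# Equivalent of groupby('thread_id').size()
email_counts = Counter(thread_id for thread_id, _ in email_rows)

# Equivalent of .map(email_counts): look up each summary's thread by ID
num_emails = [email_counts.get(thread_id) for thread_id in summary_thread_ids]
print(num_emails)  # [2, 3, None]
```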
```python
# Use an LLM to summarize a thread's emails and identify the people involved
import json

def generate_response(related_emails, existing_summary):
    response = openai.chat.completions.create(
        model="gpt-4.1",
        messages=[
            {"role": "system", "content": "You are a helpful assistant that summarizes email threads and identifies key participants. Always respond in JSON format."},
            {"role": "user", "content": f"""
                Summarize the following emails and identify the key people involved in the conversation.
                You will be given the emails as a list of strings.
                You will also be given an existing summary, which you can use as a reference.

                Here are the emails to summarize: \n\n{related_emails}.
                Here is the existing summary to use as a reference: \n\n{existing_summary}

                Provide only the summary and key people in the following format (respond in JSON):
                {{
                    "summary": "The summary of the emails...",
                    "key_people": ["person1@example.com", "person2@example.com"]
                }}
            """}
        ],
        # JSON mode makes the model emit valid JSON, but does not enforce these specific keys
        response_format={"type": "json_object"},
    )

    return json.loads(response.choices[0].message.content)
```

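`json_object` mode makes the model return valid JSON, but nothing forces the reply to contain both keys or to make `key_people` a list. OpenAI's Structured Outputs mode (a JSON schema passed in `response_format`) is the API-side way to enforce a shape. A client-side alternative is a small defensive parser that could sit between `response.choices[0].message.content` and the rest of the pipeline. The function `parse_summary_response` below is a hypothetical sketch, not part of the original notebook.

```python
import json

def parse_summary_response(raw):
    """Parse the model's JSON reply into (summary, key_people), tolerating missing or malformed fields."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return "", []
    if not isinstance(data, dict):
        return "", []

    summary = data.get("summary")
    summary = summary.strip() if isinstance(summary, str) else ""

    people = data.get("key_people")
    if isinstance(people, str):
        people = [people]          # a single name returned as a bare string
    elif not isinstance(people, list):
        people = []

    # Keep non-empty names and drop duplicates while preserving order
    cleaned = []
    for person in people:
        name = str(person).strip()
        if name and name not in cleaned:
            cleaned.append(name)
    return summary, cleaned
```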
```python
email_thread_details[email_thread_details['thread_id'] == 2]
```

Unnamed: 0,thread_id,subject,timestamp,from,to,body
5,2,Credit Group Lunch,2000-01-12 05:26:00,Tana Jones,['Suzanne Adams'],I'll be there...
6,2,Credit Group Lunch,2000-02-15 01:08:00,Tana Jones,['Suzanne Adams'],I will attend.
7,2,Credit Group Lunch,2000-04-18 04:54:00,Carol St Clair,['Suzanne Adams'],"Suzanne:\nHere is the complete list of credit folks. Please send an e-mail to each of \nthem concerning the 5th. Please include the description that I have bolded. \nIn our group, you don't need to include Marie or Shari. Thanks.\n\nCarol\n---------------------- Forwarded by Carol St Clair/HOU/ECT on 04/18/2000 \n11:52 AM ---------------------------\n \n\n\nFrom: John Suttle \n 04/18/2000 11:47 AM\t\n\t\n\t\n\t \n\t\n\nTo: Carol St Clair/HOU/ECT@ECT\ncc: \nSubject: Re: Credit Group Lunch \n\nCarol,\n\nThree more have recently joined our group:\nEd Sacks\nBrad Schneider\nWendy LeBrocq\n\nJS\n\n\n\nCarol St Clair\n04/18/2000 11:43 AM\nTo: John Suttle/HOU/ECT@ECT\ncc: \nSubject: Credit Group Lunch\n\nJohn:\nSara and I would like to hold another lunch with your group on Friday, May \n5th to go through in detail how the ISDA and CSA Masters and Schedules work. \nCould you please take a look at this list and let me know of any additions or \ndeletions? Thanks.\n\nCarol\n\nBill Bradford\nDebbie Brackett\nTanya Rohauer\nRod Nelson\nRussell Diamond\nVeronica Espinoza\nTracy Ngo\nBrant Reves\nKevin Radous\nTom Moran\nChristopher Smith\nLesli Campbell\nCathy Tudon\nNidia Martinez\nMolly Harris\n\nThanks.\n\nCarol\n\n\n\n\n\n"
8,2,Credit Group Lunch,2000-04-18 06:13:00,Carol St Clair,['Suzanne Adams'],Suzanne:\nCould you please check the names of Cathy Tudon and Nidia Martinez? They \nneed to be included on this and I wasn't sure if who we sent it to covered \nthem. Seems like last time we had a problem sending it to them.\nCarol
9,2,Credit Group Lunch,2000-04-18 08:25:00,Mark Taylor,['Suzanne Adams'],"I will not be able to attend.\n\n\n\n\nSuzanne Adams\n04/18/2000 12:05 PM\nTo: Carol St Clair/HOU/ECT@ECT, Mark Taylor/HOU/ECT@ECT, Sara \nShackleton/HOU/ECT@ECT, Tana Jones/HOU/ECT@ECT, Susan Flynn/HOU/ECT@ECT, \nSusan Bailey/HOU/ECT@ECT, Tanya Rohauer/HOU/ECT@ECT, William S \nBradford/HOU/ECT@ECT, Debbie R Brackett/HOU/ECT@ECT, Russell \nDiamond/HOU/ECT@ECT, Veronica Espinoza/Corp/Enron@ENRON, Tracy \nNgo/HOU/ECT@ECT, Brant Reves/HOU/ECT@ECT, Rod Nelson/HOU/ECT@ECT, John \nSuttle/HOU/ECT@ECT, Tom Moran/HOU/ECT@ECT, Christopher Smith/HOU/ECT@ECT, \nLesli Campbell/HOU/ECT@ECT, Mary Tudon/HOU/ECT@ECT, Paul \nRadous/Corp/Enron@ENRON, Molly Harris/HOU/ECT@ECT, Nidia Mendoza/HOU/ECT@ECT, \nEdward Sacks/Corp/Enron@Enron, Brad Schneider/Corp/Enron@Enron, Wendi \nLeBrocq/Corp/Enron@Enron\ncc: \nSubject: Credit Group Lunch\n\nA lunch meeting has been scheduled for Friday, May 5, 2000, from 12:00 p.m. \nuntil 1:30 p.m. in 30C2 to go through in detail how the ISDA and CSA Masters \nand Schedules work.\n\nPlease reply as soon as possible if you are going to attend this lunch \nmeeting (for catering purposes). Thanks.\n\n"
10,2,Credit Group Lunch,2000-04-18 08:29:00,Sara Shackleton,['Kaye Ellis'],"Gosh, I guessed right!!!!\n\n\n\n\nKaye Ellis\n04/18/2000 01:51 PM\nTo: Sara Shackleton/HOU/ECT@ECT\ncc: \nSubject: Re: Credit Group Lunch \n\nJeff Sorenson would like the meeting on May 12 to be from 11:30a to 1p.\n\n"
11,2,Credit Group Lunch,2000-04-18 09:18:00,Carol St Clair,['Suzanne Adams'],yes. That's okay.\nCarol


```python
email_thread_details[email_thread_details['thread_id'] == 2]['body'].tolist()
```

["I'll be there...",
 'I will attend.',
 "Suzanne:\nHere is the complete list of credit folks.  Please send an e-mail to each of \nthem concerning the 5th.  Please include the description that I have bolded.  \nIn our group, you don't need to include Marie or Shari. Thanks.\n\nCarol\n---------------------- Forwarded by Carol St Clair/HOU/ECT on 04/18/2000 \n11:52 AM ---------------------------\n   \n\n\nFrom:  John Suttle                                                            \n 04/18/2000 11:47 AM\t\n\t\n\t\n\t                           \n\t\n\nTo: Carol St Clair/HOU/ECT@ECT\ncc:  \nSubject: Re: Credit Group Lunch  \n\nCarol,\n\nThree more have recently joined our group:\nEd Sacks\nBrad Schneider\nWendy LeBrocq\n\nJS\n\n\n\nCarol St Clair\n04/18/2000 11:43 AM\nTo: John Suttle/HOU/ECT@ECT\ncc:  \nSubject: Credit Group Lunch\n\nJohn:\nSara and I would like to hold another lunch with your group on Friday, May \n5th to go through in detail how the ISDA and CSA Masters and Schedules wo

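```python
# Generate a summary for thread 2, looking up its human-written summary by thread_id rather than row position
thread_2_bodies = email_thread_details.loc[email_thread_details['thread_id'] == 2, 'body'].tolist()
thread_2_summary = email_thread_summaries.loc[email_thread_summaries['thread_id'] == 2, 'summary'].iloc[0]
chat_gpt_response = generate_response(thread_2_bodies, thread_2_summary)
chat_gpt_response['summary']
```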

'Multiple emails coordinate a Credit Group Lunch scheduled for May 5th from 12:00 p.m. to 1:30 p.m., focused on reviewing ISDA and CSA Masters and Schedules. Carol St Clair and Sara Shackleton organize the event, asking Suzanne Adams to confirm that invitations have reached all necessary members, particularly Cathy Tudon and Nidia Martinez. John Suttle updates the invite list with new group members Ed Sacks, Brad Schneider, and Wendy LeBrocq. Attendees are asked to RSVP for catering purposes, and some replies confirm attendance or regrets.'

```python
email_thread_summaries[email_thread_summaries['thread_id'] == 2]['summary']
```

1    A lunch meeting has been scheduled for May 5th from 12:00 p.m. to 1:30 p.m. to discuss the ISDA and CSA Masters and Schedules. Attendees are asked to RSVP for catering purposes. Carol requests confirmation of attendees and adds three new members to the group. John confirms the lunch and suggests two additional names to include. Suzanne is asked to send an email to all credit group members. Carol and Sara express their attendance.
Name: summary, dtype: object

```python
# Generate a summary and key people for every thread and store them as two new columns
def get_summary_and_people(thread_id):
    bodies = email_thread_details.loc[email_thread_details['thread_id'] == thread_id, 'body'].tolist()
    existing_summary = email_thread_summaries.loc[email_thread_summaries['thread_id'] == thread_id, 'summary'].iloc[0]
    if not bodies:
        return "", ""
    result = generate_response(bodies, existing_summary)
    # Guard against the model omitting a key or returning non-string names
    return result.get('summary', ""), ", ".join(map(str, result.get('key_people', [])))

# Note: this makes one OpenAI API call per thread
email_thread_summaries[['generated_summary', 'key_people']] = email_thread_summaries['thread_id'].apply(
    lambda x: pd.Series(get_summary_and_people(x))
)
```

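The cell above makes one OpenAI call per thread. A single rate-limit error or timeout therefore aborts the whole `apply`, and every result generated so far is lost. The OpenAI Python client already retries some transient failures by default. A wrapper with exponential backoff and jitter adds a second layer on top. The helper `with_retries` and its parameters are hypothetical, and this is a sketch rather than a drop-in library.

```python
import random
import time

def with_retries(fn, *args, max_attempts=5, base_delay=1.0, retry_on=(Exception,), sleep=time.sleep, **kwargs):
    """Call fn(*args, **kwargs), retrying with exponential backoff plus jitter on failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(*args, **kwargs)
        except retry_on:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            # Delays of about base_delay * 1, 2, 4, ... seconds, plus up to 10% random jitter
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay * (1 + random.random() * 0.1))
```

In the notebook this could wrap each call, for example `with_retries(get_summary_and_people, x, retry_on=(openai.RateLimitError, openai.APITimeoutError))` inside the `apply` lambda. Narrowing `retry_on` keeps genuine bugs, such as a `KeyError`, from being retried. The `sleep` parameter exists so the backoff can be tested without waiting.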


```python
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_colwidth', None)
email_thread_summaries.head(2)
```


Unnamed: 0,thread_id,summary,text_length,num_emails,generated_summary,key_people
0,1,The email thread discusses the Master Termination Log and the need to investigate a CNG LDC (Hope Gas) termination and a $66 million settlement offer. Stephanie Panus sends out the Daily List and Master Termination Log for various dates. Kim Theriot requests her name and Melissa Murphy's name to be removed from the distribution list and adds several names to it. The thread also includes updates on terminations and valid terminations for various companies.,74,5,"The email thread centers around the ongoing distribution and update of the Master Termination Log, with Stephanie Panus regularly sending out updated lists of terminations and related details. Kim Theriot requests changes to the distribution list by removing herself and Melissa Murphy, and adding several new recipients. There are updates provided for specific terminations, with Ed McMichael Jr. later raising a question regarding a specific termination (CNG LDC/Hope Gas) and a large settlement, asking colleagues to investigate. The conversation includes coordination among many distribution list members and administrative actions regarding the termination process.","Stephanie Panus, Kim Theriot, Ed McMichael Jr., Melissa Murphy, Todd D. Hall, Kevin Sweeney, Rita Wynne, Rebecca Grace, Rhonda Robinson, Kerri Thomspon, Kristin Albrecht, Tom Chapman, Fred Lagrasta, Katherine L. Kelly, Victoria Versen"
1,2,A lunch meeting has been scheduled for May 5th from 12:00 p.m. to 1:30 p.m. to discuss the ISDA and CSA Masters and Schedules. Attendees are asked to RSVP for catering purposes. Carol requests confirmation of attendees and adds three new members to the group. John confirms the lunch and suggests two additional names to include. Suzanne is asked to send an email to all credit group members. Carol and Sara express their attendance.,74,7,"The email thread centers around organizing a Credit Group Lunch scheduled for May 5th to review ISDA and CSA Masters and Schedules. Carol St Clair coordinates the attendee list with input from John Suttle, who adds three new members. Suzanne Adams is tasked with sending invitations and confirming if all key individuals are included, particularly Cathy Tudon and Nidia Martinez, due to previous communication issues. Participants respond with their attendance status. There is also a brief mention of a separate meeting John Sorenson wants to schedule on May 12, which is approved by Carol.","Carol St Clair, John Suttle, Suzanne Adams, Sara Shackleton, Kaye Ellis, Jeff Sorenson"


```python
# Column names, dtypes and non-null counts
email_thread_summaries.info()
```




- We create two collections in ChromaDB: one for `email_thread_summaries` and one for `email_thread_details`.
- First, we match the query against the thread summaries.
- Then we retrieve the top-k emails from the `email_thread_details` collection.

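The bullets leave open how the two stages are connected. One option is to restrict the second query to the threads returned by the first, using Chroma's `where` metadata filter with the `$in` operator on `thread_id`. This is an assumption, not something the notebook does. The helper name `build_thread_filter` is hypothetical, and the test input only mimics the shape of a Chroma query result.

```python
def build_thread_filter(summary_results, top_threads=3):
    """Build a Chroma `where` filter limiting a details query to the top-ranked threads.

    `summary_results` is assumed to have the shape of a Chroma query result for a single
    query text: each field holds one inner list, ordered from most to least similar.
    Returns None (no filter) when the first stage found nothing.
    """
    thread_ids = []
    for meta in summary_results["metadatas"][0][:top_threads]:
        if meta["thread_id"] not in thread_ids:
            thread_ids.append(meta["thread_id"])
    if not thread_ids:
        return None
    return {"thread_id": {"$in": thread_ids}}
```

With the notebook's objects, the second stage would then look like `email_details_collection.query(query_texts=[query], n_results=5, where=build_thread_filter(related_threads))`. This works because `thread_id` is stored as metadata on every email.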
```python
import chromadb
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

# Persist the vector store to disk so embeddings survive kernel restarts
chroma_data_path = './chroma_db'
client = chromadb.PersistentClient(path=chroma_data_path)
```


```python
# Set up the embedding function using the OpenAI embedding model
model = "text-embedding-ada-002"
embedding_function = OpenAIEmbeddingFunction(api_key=openai.api_key, model_name=model)
```

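Chroma collections rank results by squared L2 distance unless configured otherwise, whereas embedding similarity is usually described in terms of cosine. OpenAI states that its embeddings are normalized to length 1. For unit vectors the two measures agree, since `||q - d||^2 = 2 - 2 cos(q, d)`. Sorting by ascending distance therefore gives the same order as sorting by descending cosine similarity. The toy check below uses made-up 3-dimensional vectors, not real embeddings.

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Toy vectors, normalized to unit length like OpenAI embeddings
query_vec = normalize([0.2, 0.9, 0.1])
doc_vecs = {"a": normalize([0.1, 0.8, 0.3]),
            "b": normalize([0.9, 0.1, 0.2]),
            "c": normalize([0.3, 0.7, 0.0])}

# For unit vectors both rankings agree: smallest distance <=> largest cosine
by_cosine = sorted(doc_vecs, key=lambda k: cosine_similarity(query_vec, doc_vecs[k]), reverse=True)
by_l2 = sorted(doc_vecs, key=lambda k: squared_l2(query_vec, doc_vecs[k]))
print(by_cosine, by_l2)
```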
```python
# Create (or reopen) a collection that embeds its documents with the OpenAI embedding function above
email_summaries_collection = client.get_or_create_collection(name='Email_Summaries', embedding_function=embedding_function)
```

```python
# Batch the data to avoid exceeding the token limit per request
batch_size = 100  # You can adjust this value if needed

summaries = email_thread_summaries['generated_summary'].tolist()
metadatas = email_thread_summaries[['thread_id', 'key_people', 'num_emails', 'text_length']].to_dict(orient='records')
ids = [str(i) for i in range(len(email_thread_summaries))]

for start_idx in range(0, len(summaries), batch_size):
    end_idx = start_idx + batch_size
    email_summaries_collection.add(
        documents=summaries[start_idx:end_idx],
        metadatas=metadatas[start_idx:end_idx],
        ids=ids[start_idx:end_idx],
    )
```

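The index arithmetic in the loop above can be factored into a small generator. Zipping documents, metadata, and IDs together before batching guarantees the three lists stay aligned within each batch. On Python 3.12+, `itertools.batched` does the same job, yielding tuples instead of lists. The `batched` function here is a hypothetical helper for older versions, and the rows are made up.

```python
from itertools import islice

def batched(iterable, size):
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while batch := list(islice(iterator, size)):
        yield batch

# Hypothetical rows: (document, metadata, id) kept together so they cannot drift apart
rows = list(zip(["doc a", "doc b", "doc c"],
                [{"thread_id": 1}, {"thread_id": 1}, {"thread_id": 2}],
                ["0", "1", "2"]))
for batch in batched(rows, 2):
    documents, metadatas, ids = map(list, zip(*batch))
    print(ids)
```

Inside the loop, `email_summaries_collection.add(documents=documents, metadatas=metadatas, ids=ids)` would replace the slicing.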
```python
email_details_collection = client.get_or_create_collection(name='Email_Details', embedding_function=embedding_function)
```


```python
# Embed email bodies in batches that stay within the embedding model's token limit
details_batch_size = 100  # Maximum number of documents per request; adjust if needed

# Missing bodies become empty strings (skipped below) so tiktoken never receives NaN
details_documents = email_thread_details['body'].fillna('').tolist()
details_metadatas = email_thread_details[['thread_id', 'subject', 'from', 'to', 'timestamp']].to_dict(orient='records')
details_ids = [str(i) for i in range(len(email_thread_details))]

# Estimate tokens with tiktoken; text-embedding-ada-002 accepts at most 8191 input tokens.
# The same value is reused as a conservative per-request token budget.
encoding = tiktoken.encoding_for_model(model)
max_tokens = 8191

def count_tokens(text):
    # Treat special-token strings in email text as plain text instead of raising an error
    return len(encoding.encode(text, disallowed_special=()))

def add_batch(docs, metas, batch_ids):
    email_details_collection.add(documents=docs, metadatas=metas, ids=batch_ids)

current_batch_docs, current_batch_metas, current_batch_ids = [], [], []
current_tokens = 0
skipped_ids = []

for doc, meta, doc_id in zip(details_documents, details_metadatas, details_ids):
    doc_tokens = count_tokens(doc)
    # Skip empty bodies and bodies too long to embed in a single call
    if doc_tokens == 0 or doc_tokens > max_tokens:
        skipped_ids.append(doc_id)
        continue
    # Flush the batch if this document would exceed the token budget or the batch is full
    if current_batch_docs and (current_tokens + doc_tokens > max_tokens
                               or len(current_batch_docs) >= details_batch_size):
        add_batch(current_batch_docs, current_batch_metas, current_batch_ids)
        current_batch_docs, current_batch_metas, current_batch_ids = [], [], []
        current_tokens = 0
    current_batch_docs.append(doc)
    current_batch_metas.append(meta)
    current_batch_ids.append(doc_id)
    current_tokens += doc_tokens

# Add any remaining documents in the last batch
if current_batch_docs:
    add_batch(current_batch_docs, current_batch_metas, current_batch_ids)

print(f"Skipped {len(skipped_ids)} empty or over-length emails")
```

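The batching rule above is easier to test once it is separated from Chroma and the tokenizer. Documents that exceed the limit on their own are skipped, and a new batch starts whenever the next document would push the running total over the limit. The sketch below expresses that greedy rule as a pure function with a pluggable token counter. The name `token_budget_batches` is hypothetical, and a whitespace word count stands in for tiktoken.

```python
def token_budget_batches(docs, count_tokens, max_tokens, max_items=None):
    """Greedily group docs into batches whose token totals stay within max_tokens.

    Returns (batches, skipped): batches is a list of lists of indices into docs,
    skipped lists the indices of docs that alone exceed max_tokens.
    """
    batches, skipped = [], []
    current, current_tokens = [], 0
    for i, doc in enumerate(docs):
        n = count_tokens(doc)
        if n > max_tokens:
            skipped.append(i)
            continue
        batch_full = max_items is not None and len(current) >= max_items
        if current and (current_tokens + n > max_tokens or batch_full):
            batches.append(current)
            current, current_tokens = [], 0
        current.append(i)
        current_tokens += n
    if current:
        batches.append(current)
    return batches, skipped

# Toy run with word counts 3, 2, 6, 1, 4 and a budget of 5
docs = ["a b c", "d e", "f g h i j k", "l", "m n o p"]
print(token_budget_batches(docs, lambda t: len(t.split()), max_tokens=5))  # ([[0, 1], [3, 4]], [2])
```

Returning indices rather than documents lets the caller slice `details_documents`, `details_metadatas`, and `details_ids` identically, so the three can never fall out of step.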

```python
email_details_collection.peek()
```

{'ids': ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'],
 'embeddings': array([[-4.99777636e-03, -3.85044119e-03, -1.66196488e-02, ...,
         -1.36046447e-02, -1.78969409e-02, -3.53186093e-02],
        [-1.08943125e-02,  9.83235799e-03, -8.81496165e-03, ...,
         -1.80309415e-02, -2.90514994e-02, -3.19031812e-02],
        [-1.91488657e-02,  2.40259315e-03, -2.32604090e-02, ...,
         -1.00129014e-02, -2.97583714e-02, -1.78406462e-02],
        ...,
        [-2.15793010e-02,  2.23011686e-03,  1.24077490e-02, ...,
         -1.60833448e-02, -3.17517016e-03, -2.14956068e-02],
        [-1.28342956e-02,  5.01978444e-03,  4.15857974e-03, ...,
          8.00886843e-03, -2.68883519e-02, -5.22419484e-03],
        [-2.20529865e-02, -1.45485550e-02, -6.64200634e-05, ...,
         -1.13682384e-02, -5.30052837e-03, -2.70466432e-02]],
       shape=(10, 1536)),
 'documents': ['\n\n -----Original Message-----\nFrom: =09Theriot, Kim S. =20\nSent:=09Tuesday, January 29, 2002 1:23 PM\nTo:=09Ri

## The Search Layer

In [179]:
# Example query: "Who proposed the chosen approach for the data migration and when?"
query = input("Enter a search query: ")

In [180]:
# Stage 1: find the thread summaries most similar to the query
related_threads = email_summaries_collection.query(
    query_texts=[query],
    n_results=10
)

In [181]:
related_threads.items()

# Convert related_threads into a DataFrame (one row per matching thread summary)
summary_df = pd.DataFrame([
    {
        'thread_id': meta['thread_id'],
        'key_people': meta['key_people'],
        'num_emails': meta['num_emails'],
        'text_length': meta['text_length'],
        'summary': doc
    }
    for meta, doc in zip(related_threads['metadatas'][0], related_threads['documents'][0])
])
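Both DataFrame conversions in this notebook follow the same pattern: a Chroma `query` result holds one inner list per query text under `ids`, `documents`, `metadatas` and `distances`, and fields that were not included come back as `None` (as `'embeddings': None` shows in the output further down). A small generic helper can flatten the hits for one query into records that `pd.DataFrame` accepts directly. `query_result_to_records` is a hypothetical name, not part of Chroma, and the toy `result` dict only mimics that shape with values abbreviated from the outputs in this notebook.

```python
from typing import Any, Dict, List, Optional


def query_result_to_records(result: Dict[str, Any], query_index: int = 0) -> List[Dict[str, Any]]:
    """Flatten the hits for one query text in a Chroma-style result into row dicts.

    Fields that are None in the result (not requested via `include`) are
    filled with None per row instead of raising.
    """
    ids = result["ids"][query_index]

    def column(key: str) -> List[Optional[Any]]:
        values = result.get(key)
        return list(values[query_index]) if values is not None else [None] * len(ids)

    documents, metadatas, distances = column("documents"), column("metadatas"), column("distances")
    records = []
    for i, doc_id in enumerate(ids):
        record = {"id": doc_id, **(metadatas[i] or {})}  # metadata keys become columns
        record["document"] = documents[i]
        record["distance"] = distances[i]
        records.append(record)
    return records


# Toy result shaped like a Chroma query response (values abbreviated from the outputs)
result = {
    "ids": [["41", "44"]],
    "embeddings": None,
    "documents": [["Might be able to do Sunday morning.", "sounds good to me."]],
    "metadatas": [[{"thread_id": 8, "subject": "RE: Golf Anyone?"}, {"thread_id": 9, "subject": "RE: YO"}]],
    "distances": [[0.491284, 0.525471]],
}
for row in query_result_to_records(result):
    print(row)
```

With this helper, `pd.DataFrame(query_result_to_records(related_threads)).rename(columns={"document": "summary"})` should give roughly the same table as `summary_df`, plus `id` and `distance` columns.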

In [182]:
summary_df.head()

| | thread_id | key_people | num_emails | text_length | summary |
|---|---|---|---|---|---|
| 0 | 8 | Jeff.Dasovich@ENRON.com, Nancy.Sellers@RobertMondavi.com, scottwl@hotmail.com, eldon@direcpc.com, cameron@perfect.com, psellers@pacbell.net | 5 | 71 | The group discusses plans to play golf in Napa over the weekend. Jeff initiates the conversation, suggesting golf on either Saturday or Sunday. Scott and Cameron are interested but have scheduling conflicts, with Scott preferring Saturday afternoon after his mandolin lesson and mentioning other commitments on Sunday. Nancy points out that Eldon has rehearsals both days so their usual course isn't available, but suggests alternative courses like Silverado or Kennedy. Eventually, Scott and Cameron decide not to join due to their busy schedules. The group coordinates on potential tee times, with Eldon checking availability. There is also some off-topic mention of mailing items and a check being received. |
| 1 | 9 | erwollam@hotmail.com, Joe.Parks@bridgeline.net, Ben | 5 | 86 | Erik Wollam asks Joe Parks about the possibility of renting a trailer that night, sharing that it would cost $45 through Aztec and inquiring if Joe knows anyone who could offer a better price. Joe responds apologizing for the delay, explaining he's been busy and unable to make contact about the trailer. The conversation also features casual banter among the group, referencing shared acquaintances and joking about finances. Ben shares news about his personal life, upcoming NYC Marathon plans, travels, and potential job offers, expressing interest in staying connected. |
| 2 | 5 | kay.mann@enron.com, kay.mann@worldnet.att.net, reagan.rorschach@enron.com, edward.sacks@enron.com | 8 | 110 | The email thread centers around the drafting and review of the 'long form confirm/MDEA' and related ILA documents. Kay Mann initiates internal distribution for feedback and notes pending formatting and content issues, recommending that recipients recognize the drafts as works in progress. Edward Sacks provides comments, particularly questioning whether a financial support covenant or an 'agent for' structure is stronger, and raises invoicing procedure points. Reagan Rorschach forwards documents to a wider internal group, soliciting comments for Kay. Reagan also asks Kay if raised issues have been incorporated into the ILA and whether they are material, to which Kay responds that the issues have not been incorporated and are not material to the ILA. There is ongoing discussion about addressing legal and structural questions, with several parties, including those dealing with Mississippi law, needing to weigh in. |
| 3 | 6 | MC, unknown recipient | 4 | 63 | The email thread is casual and playful in tone, featuring four short messages between friends. The emails include questions about the recipient's current location, curiosity about the date of an important upcoming event referred to as a "big day," surprise about the recipient having a girlfriend with a humorous comment about whether she shaves her legs, and a one-word message, 'dirty!!!!!', that seems to be an inside joke or playful tease. The initials 'MC' are signed at the end of the third message, indicating one of the participants. |
| 4 | 2 | Carol St Clair, Sara Shackleton, Suzanne Adams, John Suttle, Kaye Ellis | 7 | 74 | The email thread discusses organizing a Credit Group Lunch scheduled for May 5th from 12:00 p.m. to 1:30 p.m. The purpose is to review the ISDA and CSA Masters and Schedules. Carol and Sara are coordinating, with Carol asking Suzanne to send invitations and ensure all necessary individuals are included, notably checking on Cathy Tudon and Nidia Martinez. John Suttle notifies Carol about three new members to be added. Participants are asked to confirm attendance for catering. There is brief mention of a different meeting date by Kaye Ellis and approval from Carol. |


In [183]:
# Stage 2: search the individual emails, restricted to the threads whose summaries matched above

related_emails = email_details_collection.query(
    query_texts=[query],
    n_results=10,
    where={"thread_id": {"$in": summary_df['thread_id'].tolist()}}
)
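The `where` filter is what ties the two stages together. A small guard around it is worth considering, sketched below under stated assumptions: Chroma's `$in` operator appears to validate that its operand is a non-empty list of same-typed values, so an empty first-stage result should skip the filter (or the query) rather than send `{"$in": []}`, and duplicate thread ids add nothing. `build_thread_filter` is a hypothetical helper, not part of Chroma.

```python
from typing import Any, Dict, Iterable, Optional


def build_thread_filter(thread_ids: Iterable[Any]) -> Optional[Dict[str, Any]]:
    """Build a Chroma-style `where` clause restricting results to the given threads.

    Duplicates are dropped while preserving the first-stage ranking order.
    Returns None when there is nothing to filter on, so the caller can decide
    whether to skip the second-stage query entirely.
    """
    unique_ids = list(dict.fromkeys(thread_ids))
    if not unique_ids:
        return None
    return {"thread_id": {"$in": unique_ids}}


print(build_thread_filter([8, 9, 5, 8]))  # {'thread_id': {'$in': [8, 9, 5]}}
print(build_thread_filter([]))            # None
```

In the notebook this would be called as `build_thread_filter(summary_df['thread_id'].tolist())`; `.tolist()` matters because it converts NumPy integers into plain Python ints before they reach the metadata filter.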


In [184]:
related_emails

{'ids': [['41', '42', '40', '38', '44', '39', '46', '6', '11', '5']],
 'embeddings': None,
 'documents': [['Might be able to do Sunday morning.  Eldon\'s going to check and see if there\'s a tee time.  Will report back as the news breaks.\n\n-----Original Message-----\nFrom: Nancy Sellers [mailto:Nancy.Sellers@RobertMondavi.com]\nSent: Wednesday, October 17, 2001 3:52 PM\nTo: \'Scott Laughlin\'; Dasovich, Jeff; psellers@pacbell.net; Nancy\nSellers; eldon@direcpc.com; cameron@perfect.com\nSubject: RE: Golf Anyone?\n\n\nFYI - Eldon has rehearsal/concerts on both Sat and Sun which means you\nreally can\'t play NVCC - however, there are certainly other places that you\ncould play - Silverado, Kennedy, etc.\n\n-----Original Message-----\nFrom: Scott Laughlin [mailto:scottwl@hotmail.com]\nSent: Wednesday, October 17, 2001 1:36 PM\nTo: Jeff.Dasovich@ENRON.com; psellers@pacbell.net;\nNancy.Sellers@RobertMondavi.com; eldon@direcpc.com; cameron@perfect.com\nSubject: Re: Golf Anyone?\n\n\nI have 
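Each retrieved body still carries the full quoted history of its thread (everything after `-----Original Message-----`, or lines prefixed with `>`), so the same text is embedded and scored several times across hits. One option, not something this notebook does, is to keep only the newly written part of each message before indexing or reranking. The sketch below is a heuristic under that assumption: `strip_quoted_history` and the two marker patterns are illustrative and will not catch every mail client's reply format. It falls back to the full body when nothing precedes the first marker (for example a pure forward, like the document shown by `peek()` earlier).

```python
import re

# Heuristic markers for where quoted reply history begins (assumed, not exhaustive)
ORIGINAL_MESSAGE = re.compile(r"-{2,}\s*Original Message\s*-{2,}", re.IGNORECASE)
QUOTED_LINE = re.compile(r"^>", re.MULTILINE)


def strip_quoted_history(body: str) -> str:
    """Return the text written before the first quoted-history marker.

    If nothing precedes the marker, the whole (stripped) body is returned.
    """
    matches = (ORIGINAL_MESSAGE.search(body), QUOTED_LINE.search(body))
    cut_points = [m.start() for m in matches if m]
    end = min(cut_points) if cut_points else len(body)
    return body[:end].strip() or body.strip()


example = (
    "Might be able to do Sunday morning.\n\n"
    "-----Original Message-----\nFrom: Nancy Sellers\nSubject: RE: Golf Anyone?\n"
)
print(strip_quoted_history(example))               # Might be able to do Sunday morning.
print(strip_quoted_history("sounds good to me."))  # sounds good to me.
```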

In [185]:
# Convert related_emails into a DataFrame (one row per retrieved email)
email_metas = related_emails['metadatas'][0]
related_emails_df = pd.DataFrame({
    'thread_id': [meta['thread_id'] for meta in email_metas],
    'subject': [meta['subject'] for meta in email_metas],
    'from': [meta['from'] for meta in email_metas],
    'to': [meta['to'] for meta in email_metas],
    'timestamp': [meta['timestamp'] for meta in email_metas],
    'body': related_emails['documents'][0],
    'distance': related_emails['distances'][0],
})

In [186]:
related_emails_df.head()

Unnamed: 0,thread_id,subject,from,to,timestamp,body,distance
0,8,RE: Golf Anyone?,"Dasovich, Jeff JDASOVIC","[""'Nancy Sellers' <Nancy.Sellers@RobertMondavi.com"", ""'Scott Laughlin' <scottwl@hotmail.com"", 'psellers@pacbell.net', 'eldon@direcpc.com', 'cameron@perfect.com']",2001-10-17 15:06:59,"Might be able to do Sunday morning. Eldon's going to check and see if there's a tee time. Will report back as the news breaks.\n\n-----Original Message-----\nFrom: Nancy Sellers [mailto:Nancy.Sellers@RobertMondavi.com]\nSent: Wednesday, October 17, 2001 3:52 PM\nTo: 'Scott Laughlin'; Dasovich, Jeff; psellers@pacbell.net; Nancy\nSellers; eldon@direcpc.com; cameron@perfect.com\nSubject: RE: Golf Anyone?\n\n\nFYI - Eldon has rehearsal/concerts on both Sat and Sun which means you\nreally can't play NVCC - however, there are certainly other places that you\ncould play - Silverado, Kennedy, etc.\n\n-----Original Message-----\nFrom: Scott Laughlin [mailto:scottwl@hotmail.com]\nSent: Wednesday, October 17, 2001 1:36 PM\nTo: Jeff.Dasovich@ENRON.com; psellers@pacbell.net;\nNancy.Sellers@RobertMondavi.com; eldon@direcpc.com; cameron@perfect.com\nSubject: Re: Golf Anyone?\n\n\nI have a mandolin lesson at 11am on Saturday, which I can't miss because Tom\n\nis teaching me less these days. That means I can be up in Napa for a 2 pm \ntee-off. Or, we can play on Sunday at 1 or so, after the first set of games?\n\nOr earlier, if you want. How does that sound? Let me know.\n\n\n\n>From: ""Dasovich, Jeff"" <Jeff.Dasovich@ENRON.com>\n>To: <psellers@pacbell.net>, ""Nancy Sellers (E-mail)"" \n><Nancy.Sellers@RobertMondavi.com>, ""eldon sellers (E-mail)"" \n><eldon@direcpc.com>, ""Scott Laughlin (E-mail)"" <scottwl@hotmail.com>, \n><cameron@perfect.com>\n>Subject: Golf Anyone?\n>Date: Wed, 17 Oct 2001 10:26:52 -0500\n>\n>Looks like Prentice and Nancy will be getting together in Napa to do a\n>little bonding this weekend. Therefore, it looks like an opportunity to\n>golf on Saturday or Sunday (screw football). Any takers? 
(Eldon has\n>promised not to be the score Nazi.)\n>\n>Best,\n>Jeff\n>\n>\n>**********************************************************************\n>This e-mail is the property of Enron Corp. and/or its relevant affiliate \n>and may contain confidential and privileged material for the sole use of \n>the intended recipient (s). Any review, use, distribution or disclosure by \n>others is strictly prohibited. If you are not the intended recipient (or \n>authorized to receive for the recipient), please contact the sender or \n>reply to Enron Corp. at enron.messaging.administration@enron.com and delete\n\n>all copies of the message. This e-mail (and any attachments hereto) are not\n\n>intended to be an offer (or an acceptance) and do not create or evidence a \n>binding and enforceable contract between Enron Corp. (or any of its \n>affiliates) and the intended recipient or any other party, and may not be \n>relied on by anyone as the basis of a contract by estoppel or otherwise. \n>Thank you.\n>**********************************************************************\n\n\n_________________________________________________________________\nGet your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp",0.491284
1,8,RE: Golf Anyone?,"Dasovich, Jeff JDASOVIC","[""'Scott Laughlin' <scottwl@hotmail.com""]",2001-10-17 17:03:43,"party pooper. (i understand. always more complicated than necessary.)\n\n-----Original Message-----\nFrom: Scott Laughlin [mailto:scottwl@hotmail.com]\nSent: Wednesday, October 17, 2001 7:02 PM\nTo: Dasovich, Jeff; Nancy.Sellers@RobertMondavi.com;\npsellers@pacbell.net; eldon@direcpc.com; cameron@perfect.com\nSubject: RE: Golf Anyone?\n\n\nWe'd love to play golf, but because of all this, it seems like Cameron and I \nare just going to chill in SF this weekend. We've been out of town every \nweekend for the past six months, it seems.\n\nGood luck!\n\n\n>From: ""Dasovich, Jeff"" <Jeff.Dasovich@ENRON.com>\n>To: ""Nancy Sellers"" <Nancy.Sellers@RobertMondavi.com>, ""Scott Laughlin"" \n><scottwl@hotmail.com>, <psellers@pacbell.net>, <eldon@direcpc.com>, \n><cameron@perfect.com>\n>Subject: RE: Golf Anyone?\n>Date: Wed, 17 Oct 2001 17:06:59 -0500\n>\n>Might be able to do Sunday morning. Eldon's going to check and see if\n>there's a tee time. Will report back as the news breaks.\n>\n>-----Original Message-----\n>From: Nancy Sellers [mailto:Nancy.Sellers@RobertMondavi.com]\n>Sent: Wednesday, October 17, 2001 3:52 PM\n>To: 'Scott Laughlin'; Dasovich, Jeff; psellers@pacbell.net; Nancy\n>Sellers; eldon@direcpc.com; cameron@perfect.com\n>Subject: RE: Golf Anyone?\n>\n>\n>FYI - Eldon has rehearsal/concerts on both Sat and Sun which means you\n>really can't play NVCC - however, there are certainly other places that\n>you\n>could play - Silverado, Kennedy, etc.\n>\n>-----Original Message-----\n>From: Scott Laughlin [mailto:scottwl@hotmail.com]\n>Sent: Wednesday, October 17, 2001 1:36 PM\n>To: Jeff.Dasovich@ENRON.com; psellers@pacbell.net;\n>Nancy.Sellers@RobertMondavi.com; eldon@direcpc.com; cameron@perfect.com\n>Subject: Re: Golf Anyone?\n>\n>\n>I have a mandolin lesson at 11am on Saturday, which I can't miss because\n>Tom\n>\n>is teaching me less these days. 
That means I can be up in Napa for a 2\n>pm\n>tee-off. Or, we can play on Sunday at 1 or so, after the first set of\n>games?\n>\n>Or earlier, if you want. How does that sound? Let me know.\n>\n>\n>\n> >From: ""Dasovich, Jeff"" <Jeff.Dasovich@ENRON.com>\n> >To: <psellers@pacbell.net>, ""Nancy Sellers (E-mail)""\n> ><Nancy.Sellers@RobertMondavi.com>, ""eldon sellers (E-mail)""\n> ><eldon@direcpc.com>, ""Scott Laughlin (E-mail)"" <scottwl@hotmail.com>,\n>\n> ><cameron@perfect.com>\n> >Subject: Golf Anyone?\n> >Date: Wed, 17 Oct 2001 10:26:52 -0500\n> >\n> >Looks like Prentice and Nancy will be getting together in Napa to do a\n> >little bonding this weekend. Therefore, it looks like an opportunity\n>to\n> >golf on Saturday or Sunday (screw football). Any takers? (Eldon has\n> >promised not to be the score Nazi.)\n> >\n> >Best,\n> >Jeff\n> >\n> >\n> >**********************************************************************\n> >This e-mail is the property of Enron Corp. and/or its relevant\n>affiliate\n> >and may contain confidential and privileged material for the sole use\n>of\n> >the intended recipient (s). Any review, use, distribution or disclosure\n>by\n> >others is strictly prohibited. If you are not the intended recipient\n>(or\n> >authorized to receive for the recipient), please contact the sender or\n> >reply to Enron Corp. at enron.messaging.administration@enron.com and\n>delete\n>\n> >all copies of the message. This e-mail (and any attachments hereto) are\n>not\n>\n> >intended to be an offer (or an acceptance) and do not create or\n>evidence a\n> >binding and enforceable contract between Enron Corp. 
(or any of its\n> >affiliates) and the intended recipient or any other party, and may not\n>be\n> >relied on by anyone as the basis of a contract by estoppel or\n>otherwise.\n> >Thank you.\n> >**********************************************************************\n>\n>\n>_________________________________________________________________\n>Get your FREE download of MSN Explorer at\n>http://explorer.msn.com/intl.asp\n\n\n_________________________________________________________________\nGet your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp",0.507695
2,8,RE: Golf Anyone?,"Dasovich, Jeff JDASOVIC","[""'Scott Laughlin' <scottwl@hotmail.com"", 'psellers@pacbell.net', 'Nancy.Sellers@RobertMondavi.com', 'eldon@direcpc.com', 'cameron@perfect.com']",2001-10-17 13:41:49,"I like Saturday afternoon after your lesson, since we've got a wedding at 3 pm on Sunday to go to.\n\n-----Original Message-----\nFrom: Scott Laughlin [mailto:scottwl@hotmail.com]\nSent: Wednesday, October 17, 2001 3:36 PM\nTo: Dasovich, Jeff; psellers@pacbell.net;\nNancy.Sellers@RobertMondavi.com; eldon@direcpc.com; cameron@perfect.com\nSubject: Re: Golf Anyone?\n\n\nI have a mandolin lesson at 11am on Saturday, which I can't miss because Tom \nis teaching me less these days. That means I can be up in Napa for a 2 pm \ntee-off. Or, we can play on Sunday at 1 or so, after the first set of games? \nOr earlier, if you want. How does that sound? Let me know.\n\n\n\n>From: ""Dasovich, Jeff"" <Jeff.Dasovich@ENRON.com>\n>To: <psellers@pacbell.net>, ""Nancy Sellers (E-mail)"" \n><Nancy.Sellers@RobertMondavi.com>, ""eldon sellers (E-mail)"" \n><eldon@direcpc.com>, ""Scott Laughlin (E-mail)"" <scottwl@hotmail.com>, \n><cameron@perfect.com>\n>Subject: Golf Anyone?\n>Date: Wed, 17 Oct 2001 10:26:52 -0500\n>\n>Looks like Prentice and Nancy will be getting together in Napa to do a\n>little bonding this weekend. Therefore, it looks like an opportunity to\n>golf on Saturday or Sunday (screw football). Any takers? (Eldon has\n>promised not to be the score Nazi.)\n>\n>Best,\n>Jeff\n>\n>\n>**********************************************************************\n>This e-mail is the property of Enron Corp. and/or its relevant affiliate \n>and may contain confidential and privileged material for the sole use of \n>the intended recipient (s). Any review, use, distribution or disclosure by \n>others is strictly prohibited. If you are not the intended recipient (or \n>authorized to receive for the recipient), please contact the sender or \n>reply to Enron Corp. 
at enron.messaging.administration@enron.com and delete \n>all copies of the message. This e-mail (and any attachments hereto) are not \n>intended to be an offer (or an acceptance) and do not create or evidence a \n>binding and enforceable contract between Enron Corp. (or any of its \n>affiliates) and the intended recipient or any other party, and may not be \n>relied on by anyone as the basis of a contract by estoppel or otherwise. \n>Thank you.\n>**********************************************************************\n\n\n_________________________________________________________________\nGet your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp",0.518338
3,8,RE: Golf Anyone?,"Dasovich, Jeff JDASOVIC","[""'Nancy Sellers' <Nancy.Sellers@RobertMondavi.com""]",2001-10-17 08:50:02,"what stuff?\n\n-----Original Message-----\nFrom: Nancy Sellers [mailto:Nancy.Sellers@RobertMondavi.com]\nSent: Wednesday, October 17, 2001 10:50 AM\nTo: Dasovich, Jeff; psellers@pacbell.net; Nancy Sellers; eldon sellers\n(E-mail); Scott Laughlin (E-mail); cameron@perfect.com\nSubject: RE: Golf Anyone?\n\n\nShould I not mail this stuff then???\n\n-----Original Message-----\nFrom: Dasovich, Jeff [mailto:Jeff.Dasovich@ENRON.com]\nSent: Wednesday, October 17, 2001 8:27 AM\nTo: psellers@pacbell.net; Nancy Sellers (E-mail); eldon sellers\n(E-mail); Scott Laughlin (E-mail); cameron@perfect.com\nSubject: Golf Anyone?\n\n\nLooks like Prentice and Nancy will be getting together in Napa to do a\nlittle bonding this weekend. Therefore, it looks like an opportunity to\ngolf on Saturday or Sunday (screw football). Any takers? (Eldon has\npromised not to be the score Nazi.) \n\nBest,\nJeff\n\n\n**********************************************************************\nThis e-mail is the property of Enron Corp. and/or its relevant affiliate and\nmay contain confidential and privileged material for the sole use of the\nintended recipient (s). Any review, use, distribution or disclosure by\nothers is strictly prohibited. If you are not the intended recipient (or\nauthorized to receive for the recipient), please contact the sender or reply\nto Enron Corp. at enron.messaging.administration@enron.com and delete all\ncopies of the message. This e-mail (and any attachments hereto) are not\nintended to be an offer (or an acceptance) and do not create or evidence a\nbinding and enforceable contract between Enron Corp. (or any of its\naffiliates) and the intended recipient or any other party, and may not be\nrelied on by anyone as the basis of a contract by estoppel or otherwise.\nThank you. 
\n**********************************************************************",0.523112
4,9,RE: YO,Mark Guzman,['Katie Trullinger <Katie.Trullinger@wfsg.com> @ ENRON'],2000-11-08 07:32:00,sounds good to me.,0.525471
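The `distance` column is easier to read with the metric in mind. Assuming the collection uses Chroma's default `l2` space, which Chroma documents as squared Euclidean distance, and given that OpenAI normalises its embeddings to unit length, squared L2 distance and cosine similarity are tied by `distance = 2 - 2 * cosine_similarity`. Under that assumption the top hit's 0.491 corresponds to a cosine similarity of about 0.754. If the collection was instead created with `hnsw:space` set to `cosine`, the distance is `1 - cosine_similarity`. The check below verifies the identity numerically on random unit vectors; `l2_distance_to_cosine` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)


def unit(v: np.ndarray) -> np.ndarray:
    """Scale a vector to unit length, as OpenAI embeddings already are."""
    return v / np.linalg.norm(v)


a = unit(rng.normal(size=1536))
b = unit(rng.normal(size=1536))

squared_l2 = float(np.sum((a - b) ** 2))  # what an "l2" index reports
cosine_sim = float(a @ b)

# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
assert np.isclose(squared_l2, 2 - 2 * cosine_sim)


def l2_distance_to_cosine(distance: float) -> float:
    """Convert a squared-L2 distance between unit vectors into cosine similarity."""
    return 1 - distance / 2


print(round(l2_distance_to_cosine(0.491284), 3))  # 0.754
```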


In [187]:
from sentence_transformers.cross_encoder import CrossEncoder

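# Note: this rebinds `model`, which earlier held the model name passed to tiktoken.encoding_for_model
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2")
# Sanity check on two placeholder pairs; scores are unnormalised relevance logits (higher = more relevant)
scores = model.predict([["My first", "sentence pair"], ["Second text", "pair"]])
scores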

array([-11.01487 , -10.854008], dtype=float32)

In [188]:
cross_inputs = [[query, response] for response in related_emails_df['body']]
cross_rerank_scores = model.predict(cross_inputs)
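Many of these bodies are long quoted threads, while a BERT-style cross-encoder such as `cross-encoder/ms-marco-MiniLM-L6-v2` only sees roughly the first 512 tokens of each query-document pair; the rest is truncated before scoring. A common workaround, not used in this notebook, is to score overlapping windows of each body and keep the best window score (often called MaxP). The sketch below accepts any `score_pairs` callable so it could wrap `model.predict`; the window and stride sizes are illustrative assumptions measured in whitespace words rather than model tokens, and the toy scorer only stands in for the real model.

```python
from typing import Callable, List, Sequence


def word_windows(text: str, size: int = 200, stride: int = 150) -> List[str]:
    """Split text into overlapping windows of `size` words, advancing by `stride`."""
    words = text.split()
    if len(words) <= size:
        return [" ".join(words)]
    windows = []
    for start in range(0, len(words), stride):
        windows.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # the last window reached the end
            break
    return windows


def max_window_scores(
    query: str,
    bodies: Sequence[str],
    score_pairs: Callable[[List[List[str]]], Sequence[float]],
    size: int = 200,
    stride: int = 150,
) -> List[float]:
    """Score every (query, window) pair in one call and keep each body's best window."""
    pairs, owners = [], []
    for i, body in enumerate(bodies):
        for window in word_windows(body, size, stride):
            pairs.append([query, window])
            owners.append(i)
    scores = score_pairs(pairs)
    best = [float("-inf")] * len(bodies)
    for owner, score in zip(owners, scores):
        best[owner] = max(best[owner], float(score))
    return best


# Toy scorer standing in for model.predict: counts query words present in the window
def toy_scorer(pairs):
    return [sum(w in doc.split() for w in q.split()) for q, doc in pairs]


bodies = ["golf " + "filler " * 10 + "tee time sunday", "sounds good to me."]
print(max_window_scores("tee time", bodies, toy_scorer, size=4, stride=2))  # [2.0, 0.0]
```

With the real model this would look like `max_window_scores(query, related_emails_df['body'].tolist(), model.predict)`; scoring all windows in a single `predict` call lets the library batch them.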

In [189]:
related_emails_df['ranking'] = cross_rerank_scores

In [190]:
related_emails_df.head()

Unnamed: 0,thread_id,subject,from,to,timestamp,body,distance,ranking
0,8,RE: Golf Anyone?,"Dasovich, Jeff JDASOVIC","[""'Nancy Sellers' <Nancy.Sellers@RobertMondavi.com"", ""'Scott Laughlin' <scottwl@hotmail.com"", 'psellers@pacbell.net', 'eldon@direcpc.com', 'cameron@perfect.com']",2001-10-17 15:06:59,"Might be able to do Sunday morning. Eldon's going to check and see if there's a tee time. Will report back as the news breaks.\n\n-----Original Message-----\nFrom: Nancy Sellers [mailto:Nancy.Sellers@RobertMondavi.com]\nSent: Wednesday, October 17, 2001 3:52 PM\nTo: 'Scott Laughlin'; Dasovich, Jeff; psellers@pacbell.net; Nancy\nSellers; eldon@direcpc.com; cameron@perfect.com\nSubject: RE: Golf Anyone?\n\n\nFYI - Eldon has rehearsal/concerts on both Sat and Sun which means you\nreally can't play NVCC - however, there are certainly other places that you\ncould play - Silverado, Kennedy, etc.\n\n-----Original Message-----\nFrom: Scott Laughlin [mailto:scottwl@hotmail.com]\nSent: Wednesday, October 17, 2001 1:36 PM\nTo: Jeff.Dasovich@ENRON.com; psellers@pacbell.net;\nNancy.Sellers@RobertMondavi.com; eldon@direcpc.com; cameron@perfect.com\nSubject: Re: Golf Anyone?\n\n\nI have a mandolin lesson at 11am on Saturday, which I can't miss because Tom\n\nis teaching me less these days. That means I can be up in Napa for a 2 pm \ntee-off. Or, we can play on Sunday at 1 or so, after the first set of games?\n\nOr earlier, if you want. How does that sound? Let me know.\n\n\n\n>From: ""Dasovich, Jeff"" <Jeff.Dasovich@ENRON.com>\n>To: <psellers@pacbell.net>, ""Nancy Sellers (E-mail)"" \n><Nancy.Sellers@RobertMondavi.com>, ""eldon sellers (E-mail)"" \n><eldon@direcpc.com>, ""Scott Laughlin (E-mail)"" <scottwl@hotmail.com>, \n><cameron@perfect.com>\n>Subject: Golf Anyone?\n>Date: Wed, 17 Oct 2001 10:26:52 -0500\n>\n>Looks like Prentice and Nancy will be getting together in Napa to do a\n>little bonding this weekend. Therefore, it looks like an opportunity to\n>golf on Saturday or Sunday (screw football). Any takers? 
(Eldon has\n>promised not to be the score Nazi.)\n>\n>Best,\n>Jeff\n>\n>\n>**********************************************************************\n>This e-mail is the property of Enron Corp. and/or its relevant affiliate \n>and may contain confidential and privileged material for the sole use of \n>the intended recipient (s). Any review, use, distribution or disclosure by \n>others is strictly prohibited. If you are not the intended recipient (or \n>authorized to receive for the recipient), please contact the sender or \n>reply to Enron Corp. at enron.messaging.administration@enron.com and delete\n\n>all copies of the message. This e-mail (and any attachments hereto) are not\n\n>intended to be an offer (or an acceptance) and do not create or evidence a \n>binding and enforceable contract between Enron Corp. (or any of its \n>affiliates) and the intended recipient or any other party, and may not be \n>relied on by anyone as the basis of a contract by estoppel or otherwise. \n>Thank you.\n>**********************************************************************\n\n\n_________________________________________________________________\nGet your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp",0.491284,-6.355474
1,8,RE: Golf Anyone?,"Dasovich, Jeff JDASOVIC","[""'Scott Laughlin' <scottwl@hotmail.com""]",2001-10-17 17:03:43,"party pooper. (i understand. always more complicated than necessary.)\n\n-----Original Message-----\nFrom: Scott Laughlin [mailto:scottwl@hotmail.com]\nSent: Wednesday, October 17, 2001 7:02 PM\nTo: Dasovich, Jeff; Nancy.Sellers@RobertMondavi.com;\npsellers@pacbell.net; eldon@direcpc.com; cameron@perfect.com\nSubject: RE: Golf Anyone?\n\n\nWe'd love to play golf, but because of all this, it seems like Cameron and I \nare just going to chill in SF this weekend. We've been out of town every \nweekend for the past six months, it seems.\n\nGood luck!\n\n\n>From: ""Dasovich, Jeff"" <Jeff.Dasovich@ENRON.com>\n>To: ""Nancy Sellers"" <Nancy.Sellers@RobertMondavi.com>, ""Scott Laughlin"" \n><scottwl@hotmail.com>, <psellers@pacbell.net>, <eldon@direcpc.com>, \n><cameron@perfect.com>\n>Subject: RE: Golf Anyone?\n>Date: Wed, 17 Oct 2001 17:06:59 -0500\n>\n>Might be able to do Sunday morning. Eldon's going to check and see if\n>there's a tee time. Will report back as the news breaks.\n>\n>-----Original Message-----\n>From: Nancy Sellers [mailto:Nancy.Sellers@RobertMondavi.com]\n>Sent: Wednesday, October 17, 2001 3:52 PM\n>To: 'Scott Laughlin'; Dasovich, Jeff; psellers@pacbell.net; Nancy\n>Sellers; eldon@direcpc.com; cameron@perfect.com\n>Subject: RE: Golf Anyone?\n>\n>\n>FYI - Eldon has rehearsal/concerts on both Sat and Sun which means you\n>really can't play NVCC - however, there are certainly other places that\n>you\n>could play - Silverado, Kennedy, etc.\n>\n>-----Original Message-----\n>From: Scott Laughlin [mailto:scottwl@hotmail.com]\n>Sent: Wednesday, October 17, 2001 1:36 PM\n>To: Jeff.Dasovich@ENRON.com; psellers@pacbell.net;\n>Nancy.Sellers@RobertMondavi.com; eldon@direcpc.com; cameron@perfect.com\n>Subject: Re: Golf Anyone?\n>\n>\n>I have a mandolin lesson at 11am on Saturday, which I can't miss because\n>Tom\n>\n>is teaching me less these days. 
That means I can be up in Napa for a 2\n>pm\n>tee-off. Or, we can play on Sunday at 1 or so, after the first set of\n>games?\n>\n>Or earlier, if you want. How does that sound? Let me know.\n>\n>\n>\n> >From: ""Dasovich, Jeff"" <Jeff.Dasovich@ENRON.com>\n> >To: <psellers@pacbell.net>, ""Nancy Sellers (E-mail)""\n> ><Nancy.Sellers@RobertMondavi.com>, ""eldon sellers (E-mail)""\n> ><eldon@direcpc.com>, ""Scott Laughlin (E-mail)"" <scottwl@hotmail.com>,\n>\n> ><cameron@perfect.com>\n> >Subject: Golf Anyone?\n> >Date: Wed, 17 Oct 2001 10:26:52 -0500\n> >\n> >Looks like Prentice and Nancy will be getting together in Napa to do a\n> >little bonding this weekend. Therefore, it looks like an opportunity\n>to\n> >golf on Saturday or Sunday (screw football). Any takers? (Eldon has\n> >promised not to be the score Nazi.)\n> >\n> >Best,\n> >Jeff\n> >\n> >\n> >**********************************************************************\n> >This e-mail is the property of Enron Corp. and/or its relevant\n>affiliate\n> >and may contain confidential and privileged material for the sole use\n>of\n> >the intended recipient (s). Any review, use, distribution or disclosure\n>by\n> >others is strictly prohibited. If you are not the intended recipient\n>(or\n> >authorized to receive for the recipient), please contact the sender or\n> >reply to Enron Corp. at enron.messaging.administration@enron.com and\n>delete\n>\n> >all copies of the message. This e-mail (and any attachments hereto) are\n>not\n>\n> >intended to be an offer (or an acceptance) and do not create or\n>evidence a\n> >binding and enforceable contract between Enron Corp. 
(or any of its\n> >affiliates) and the intended recipient or any other party, and may not\n>be\n> >relied on by anyone as the basis of a contract by estoppel or\n>otherwise.\n> >Thank you.\n> >**********************************************************************\n>\n>\n>_________________________________________________________________\n>Get your FREE download of MSN Explorer at\n>http://explorer.msn.com/intl.asp\n\n\n_________________________________________________________________\nGet your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp",0.507695,-6.146529
2,8,RE: Golf Anyone?,"Dasovich, Jeff JDASOVIC","[""'Scott Laughlin' <scottwl@hotmail.com"", 'psellers@pacbell.net', 'Nancy.Sellers@RobertMondavi.com', 'eldon@direcpc.com', 'cameron@perfect.com']",2001-10-17 13:41:49,"I like Saturday afternoon after your lesson, since we've got a wedding at 3 pm on Sunday to go to.\n\n-----Original Message-----\nFrom: Scott Laughlin [mailto:scottwl@hotmail.com]\nSent: Wednesday, October 17, 2001 3:36 PM\nTo: Dasovich, Jeff; psellers@pacbell.net;\nNancy.Sellers@RobertMondavi.com; eldon@direcpc.com; cameron@perfect.com\nSubject: Re: Golf Anyone?\n\n\nI have a mandolin lesson at 11am on Saturday, which I can't miss because Tom \nis teaching me less these days. That means I can be up in Napa for a 2 pm \ntee-off. Or, we can play on Sunday at 1 or so, after the first set of games? \nOr earlier, if you want. How does that sound? Let me know.\n\n\n\n>From: ""Dasovich, Jeff"" <Jeff.Dasovich@ENRON.com>\n>To: <psellers@pacbell.net>, ""Nancy Sellers (E-mail)"" \n><Nancy.Sellers@RobertMondavi.com>, ""eldon sellers (E-mail)"" \n><eldon@direcpc.com>, ""Scott Laughlin (E-mail)"" <scottwl@hotmail.com>, \n><cameron@perfect.com>\n>Subject: Golf Anyone?\n>Date: Wed, 17 Oct 2001 10:26:52 -0500\n>\n>Looks like Prentice and Nancy will be getting together in Napa to do a\n>little bonding this weekend. Therefore, it looks like an opportunity to\n>golf on Saturday or Sunday (screw football). Any takers? (Eldon has\n>promised not to be the score Nazi.)\n>\n>Best,\n>Jeff\n>\n>\n>**********************************************************************\n>This e-mail is the property of Enron Corp. and/or its relevant affiliate \n>and may contain confidential and privileged material for the sole use of \n>the intended recipient (s). Any review, use, distribution or disclosure by \n>others is strictly prohibited. If you are not the intended recipient (or \n>authorized to receive for the recipient), please contact the sender or \n>reply to Enron Corp. 
at enron.messaging.administration@enron.com and delete \n>all copies of the message. This e-mail (and any attachments hereto) are not \n>intended to be an offer (or an acceptance) and do not create or evidence a \n>binding and enforceable contract between Enron Corp. (or any of its \n>affiliates) and the intended recipient or any other party, and may not be \n>relied on by anyone as the basis of a contract by estoppel or otherwise. \n>Thank you.\n>**********************************************************************\n\n\n_________________________________________________________________\nGet your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp",0.518338,-8.634404
3,8,RE: Golf Anyone?,"Dasovich, Jeff JDASOVIC","[""'Nancy Sellers' <Nancy.Sellers@RobertMondavi.com""]",2001-10-17 08:50:02,"what stuff?\n\n-----Original Message-----\nFrom: Nancy Sellers [mailto:Nancy.Sellers@RobertMondavi.com]\nSent: Wednesday, October 17, 2001 10:50 AM\nTo: Dasovich, Jeff; psellers@pacbell.net; Nancy Sellers; eldon sellers\n(E-mail); Scott Laughlin (E-mail); cameron@perfect.com\nSubject: RE: Golf Anyone?\n\n\nShould I not mail this stuff then???\n\n-----Original Message-----\nFrom: Dasovich, Jeff [mailto:Jeff.Dasovich@ENRON.com]\nSent: Wednesday, October 17, 2001 8:27 AM\nTo: psellers@pacbell.net; Nancy Sellers (E-mail); eldon sellers\n(E-mail); Scott Laughlin (E-mail); cameron@perfect.com\nSubject: Golf Anyone?\n\n\nLooks like Prentice and Nancy will be getting together in Napa to do a\nlittle bonding this weekend. Therefore, it looks like an opportunity to\ngolf on Saturday or Sunday (screw football). Any takers? (Eldon has\npromised not to be the score Nazi.) \n\nBest,\nJeff\n\n\n**********************************************************************\nThis e-mail is the property of Enron Corp. and/or its relevant affiliate and\nmay contain confidential and privileged material for the sole use of the\nintended recipient (s). Any review, use, distribution or disclosure by\nothers is strictly prohibited. If you are not the intended recipient (or\nauthorized to receive for the recipient), please contact the sender or reply\nto Enron Corp. at enron.messaging.administration@enron.com and delete all\ncopies of the message. This e-mail (and any attachments hereto) are not\nintended to be an offer (or an acceptance) and do not create or evidence a\nbinding and enforceable contract between Enron Corp. (or any of its\naffiliates) and the intended recipient or any other party, and may not be\nrelied on by anyone as the basis of a contract by estoppel or otherwise.\nThank you. 
\n**********************************************************************",0.523112,-8.798704
4,9,RE: YO,Mark Guzman,['Katie Trullinger <Katie.Trullinger@wfsg.com> @ ENRON'],2000-11-08 07:32:00,sounds good to me.,0.525471,-10.283324
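Chroma already returns hits in ascending `distance` order, so sorting on that column leaves the order unchanged; the cross-encoder's contribution is the `ranking` column, where higher means more relevant. For this model the scores appear to be raw logits (they are all negative), so if a 0 to 1 value is preferred for display, a sigmoid squashes them without changing the order. The sketch below uses the five `ranking` values from the table above; `rerank_order` is a hypothetical helper name.

```python
import numpy as np

# `ranking` values from the table above (cross-encoder logits, higher = more relevant)
ranking = np.array([-6.355474, -6.146529, -8.634404, -8.798704, -10.283324])


def rerank_order(scores: np.ndarray) -> np.ndarray:
    """Indices of the hits sorted by descending cross-encoder score."""
    return np.argsort(-scores, kind="stable")


def sigmoid(x: np.ndarray) -> np.ndarray:
    """Monotonic map from logits to (0, 1); it does not change the ranking."""
    return 1.0 / (1.0 + np.exp(-x))


order = rerank_order(ranking)
print(order.tolist())  # [1, 0, 2, 3, 4]
assert (np.argsort(-sigmoid(ranking), kind="stable") == order).all()
```

Here the cross-encoder swaps the first two hits. In pandas the same reordering is `related_emails_df.sort_values(by='ranking', ascending=False).head(5)`.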


In [191]:
top_5_emails_by_distance = related_emails_df.sort_values(by='distance').head(5)

top_5_emails_by_distance

Unnamed: 0,thread_id,subject,from,to,timestamp,body,distance,ranking
0,8,RE: Golf Anyone?,"Dasovich, Jeff JDASOVIC","[""'Nancy Sellers' <Nancy.Sellers@RobertMondavi.com"", ""'Scott Laughlin' <scottwl@hotmail.com"", 'psellers@pacbell.net', 'eldon@direcpc.com', 'cameron@perfect.com']",2001-10-17 15:06:59,"Might be able to do Sunday morning. Eldon's going to check and see if there's a tee time. Will report back as the news breaks.\n\n-----Original Message-----\nFrom: Nancy Sellers [mailto:Nancy.Sellers@RobertMondavi.com]\nSent: Wednesday, October 17, 2001 3:52 PM\nTo: 'Scott Laughlin'; Dasovich, Jeff; psellers@pacbell.net; Nancy\nSellers; eldon@direcpc.com; cameron@perfect.com\nSubject: RE: Golf Anyone?\n\n\nFYI - Eldon has rehearsal/concerts on both Sat and Sun which means you\nreally can't play NVCC - however, there are certainly other places that you\ncould play - Silverado, Kennedy, etc.\n\n-----Original Message-----\nFrom: Scott Laughlin [mailto:scottwl@hotmail.com]\nSent: Wednesday, October 17, 2001 1:36 PM\nTo: Jeff.Dasovich@ENRON.com; psellers@pacbell.net;\nNancy.Sellers@RobertMondavi.com; eldon@direcpc.com; cameron@perfect.com\nSubject: Re: Golf Anyone?\n\n\nI have a mandolin lesson at 11am on Saturday, which I can't miss because Tom\n\nis teaching me less these days. That means I can be up in Napa for a 2 pm \ntee-off. Or, we can play on Sunday at 1 or so, after the first set of games?\n\nOr earlier, if you want. How does that sound? Let me know.\n\n\n\n>From: ""Dasovich, Jeff"" <Jeff.Dasovich@ENRON.com>\n>To: <psellers@pacbell.net>, ""Nancy Sellers (E-mail)"" \n><Nancy.Sellers@RobertMondavi.com>, ""eldon sellers (E-mail)"" \n><eldon@direcpc.com>, ""Scott Laughlin (E-mail)"" <scottwl@hotmail.com>, \n><cameron@perfect.com>\n>Subject: Golf Anyone?\n>Date: Wed, 17 Oct 2001 10:26:52 -0500\n>\n>Looks like Prentice and Nancy will be getting together in Napa to do a\n>little bonding this weekend. Therefore, it looks like an opportunity to\n>golf on Saturday or Sunday (screw football). Any takers? 
(Eldon has\n>promised not to be the score Nazi.)\n>\n>Best,\n>Jeff\n>\n>\n>**********************************************************************\n>This e-mail is the property of Enron Corp. and/or its relevant affiliate \n>and may contain confidential and privileged material for the sole use of \n>the intended recipient (s). Any review, use, distribution or disclosure by \n>others is strictly prohibited. If you are not the intended recipient (or \n>authorized to receive for the recipient), please contact the sender or \n>reply to Enron Corp. at enron.messaging.administration@enron.com and delete\n\n>all copies of the message. This e-mail (and any attachments hereto) are not\n\n>intended to be an offer (or an acceptance) and do not create or evidence a \n>binding and enforceable contract between Enron Corp. (or any of its \n>affiliates) and the intended recipient or any other party, and may not be \n>relied on by anyone as the basis of a contract by estoppel or otherwise. \n>Thank you.\n>**********************************************************************\n\n\n_________________________________________________________________\nGet your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp",0.491284,-6.355474
1,8,RE: Golf Anyone?,"Dasovich, Jeff JDASOVIC","[""'Scott Laughlin' <scottwl@hotmail.com""]",2001-10-17 17:03:43,"party pooper. (i understand. always more complicated than necessary.)\n\n-----Original Message-----\nFrom: Scott Laughlin [mailto:scottwl@hotmail.com]\nSent: Wednesday, October 17, 2001 7:02 PM\nTo: Dasovich, Jeff; Nancy.Sellers@RobertMondavi.com;\npsellers@pacbell.net; eldon@direcpc.com; cameron@perfect.com\nSubject: RE: Golf Anyone?\n\n\nWe'd love to play golf, but because of all this, it seems like Cameron and I \nare just going to chill in SF this weekend. We've been out of town every \nweekend for the past six months, it seems.\n\nGood luck!\n\n\n>From: ""Dasovich, Jeff"" <Jeff.Dasovich@ENRON.com>\n>To: ""Nancy Sellers"" <Nancy.Sellers@RobertMondavi.com>, ""Scott Laughlin"" \n><scottwl@hotmail.com>, <psellers@pacbell.net>, <eldon@direcpc.com>, \n><cameron@perfect.com>\n>Subject: RE: Golf Anyone?\n>Date: Wed, 17 Oct 2001 17:06:59 -0500\n>\n>Might be able to do Sunday morning. Eldon's going to check and see if\n>there's a tee time. Will report back as the news breaks.\n>\n>-----Original Message-----\n>From: Nancy Sellers [mailto:Nancy.Sellers@RobertMondavi.com]\n>Sent: Wednesday, October 17, 2001 3:52 PM\n>To: 'Scott Laughlin'; Dasovich, Jeff; psellers@pacbell.net; Nancy\n>Sellers; eldon@direcpc.com; cameron@perfect.com\n>Subject: RE: Golf Anyone?\n>\n>\n>FYI - Eldon has rehearsal/concerts on both Sat and Sun which means you\n>really can't play NVCC - however, there are certainly other places that\n>you\n>could play - Silverado, Kennedy, etc.\n>\n>-----Original Message-----\n>From: Scott Laughlin [mailto:scottwl@hotmail.com]\n>Sent: Wednesday, October 17, 2001 1:36 PM\n>To: Jeff.Dasovich@ENRON.com; psellers@pacbell.net;\n>Nancy.Sellers@RobertMondavi.com; eldon@direcpc.com; cameron@perfect.com\n>Subject: Re: Golf Anyone?\n>\n>\n>I have a mandolin lesson at 11am on Saturday, which I can't miss because\n>Tom\n>\n>is teaching me less these days. 
That means I can be up in Napa for a 2\n>pm\n>tee-off. Or, we can play on Sunday at 1 or so, after the first set of\n>games?\n>\n>Or earlier, if you want. How does that sound? Let me know.\n>\n>\n>\n> >From: ""Dasovich, Jeff"" <Jeff.Dasovich@ENRON.com>\n> >To: <psellers@pacbell.net>, ""Nancy Sellers (E-mail)""\n> ><Nancy.Sellers@RobertMondavi.com>, ""eldon sellers (E-mail)""\n> ><eldon@direcpc.com>, ""Scott Laughlin (E-mail)"" <scottwl@hotmail.com>,\n>\n> ><cameron@perfect.com>\n> >Subject: Golf Anyone?\n> >Date: Wed, 17 Oct 2001 10:26:52 -0500\n> >\n> >Looks like Prentice and Nancy will be getting together in Napa to do a\n> >little bonding this weekend. Therefore, it looks like an opportunity\n>to\n> >golf on Saturday or Sunday (screw football). Any takers? (Eldon has\n> >promised not to be the score Nazi.)\n> >\n> >Best,\n> >Jeff\n> >\n> >\n> >**********************************************************************\n> >This e-mail is the property of Enron Corp. and/or its relevant\n>affiliate\n> >and may contain confidential and privileged material for the sole use\n>of\n> >the intended recipient (s). Any review, use, distribution or disclosure\n>by\n> >others is strictly prohibited. If you are not the intended recipient\n>(or\n> >authorized to receive for the recipient), please contact the sender or\n> >reply to Enron Corp. at enron.messaging.administration@enron.com and\n>delete\n>\n> >all copies of the message. This e-mail (and any attachments hereto) are\n>not\n>\n> >intended to be an offer (or an acceptance) and do not create or\n>evidence a\n> >binding and enforceable contract between Enron Corp. 
(or any of its\n> >affiliates) and the intended recipient or any other party, and may not\n>be\n> >relied on by anyone as the basis of a contract by estoppel or\n>otherwise.\n> >Thank you.\n> >**********************************************************************\n>\n>\n>_________________________________________________________________\n>Get your FREE download of MSN Explorer at\n>http://explorer.msn.com/intl.asp\n\n\n_________________________________________________________________\nGet your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp",0.507695,-6.146529
2,8,RE: Golf Anyone?,"Dasovich, Jeff JDASOVIC","[""'Scott Laughlin' <scottwl@hotmail.com"", 'psellers@pacbell.net', 'Nancy.Sellers@RobertMondavi.com', 'eldon@direcpc.com', 'cameron@perfect.com']",2001-10-17 13:41:49,"I like Saturday afternoon after your lesson, since we've got a wedding at 3 pm on Sunday to go to.\n\n-----Original Message-----\nFrom: Scott Laughlin [mailto:scottwl@hotmail.com]\nSent: Wednesday, October 17, 2001 3:36 PM\nTo: Dasovich, Jeff; psellers@pacbell.net;\nNancy.Sellers@RobertMondavi.com; eldon@direcpc.com; cameron@perfect.com\nSubject: Re: Golf Anyone?\n\n\nI have a mandolin lesson at 11am on Saturday, which I can't miss because Tom \nis teaching me less these days. That means I can be up in Napa for a 2 pm \ntee-off. Or, we can play on Sunday at 1 or so, after the first set of games? \nOr earlier, if you want. How does that sound? Let me know.\n\n\n\n>From: ""Dasovich, Jeff"" <Jeff.Dasovich@ENRON.com>\n>To: <psellers@pacbell.net>, ""Nancy Sellers (E-mail)"" \n><Nancy.Sellers@RobertMondavi.com>, ""eldon sellers (E-mail)"" \n><eldon@direcpc.com>, ""Scott Laughlin (E-mail)"" <scottwl@hotmail.com>, \n><cameron@perfect.com>\n>Subject: Golf Anyone?\n>Date: Wed, 17 Oct 2001 10:26:52 -0500\n>\n>Looks like Prentice and Nancy will be getting together in Napa to do a\n>little bonding this weekend. Therefore, it looks like an opportunity to\n>golf on Saturday or Sunday (screw football). Any takers? (Eldon has\n>promised not to be the score Nazi.)\n>\n>Best,\n>Jeff\n>\n>\n>**********************************************************************\n>This e-mail is the property of Enron Corp. and/or its relevant affiliate \n>and may contain confidential and privileged material for the sole use of \n>the intended recipient (s). Any review, use, distribution or disclosure by \n>others is strictly prohibited. If you are not the intended recipient (or \n>authorized to receive for the recipient), please contact the sender or \n>reply to Enron Corp. 
at enron.messaging.administration@enron.com and delete \n>all copies of the message. This e-mail (and any attachments hereto) are not \n>intended to be an offer (or an acceptance) and do not create or evidence a \n>binding and enforceable contract between Enron Corp. (or any of its \n>affiliates) and the intended recipient or any other party, and may not be \n>relied on by anyone as the basis of a contract by estoppel or otherwise. \n>Thank you.\n>**********************************************************************\n\n\n_________________________________________________________________\nGet your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp",0.518338,-8.634404
3,8,RE: Golf Anyone?,"Dasovich, Jeff JDASOVIC","[""'Nancy Sellers' <Nancy.Sellers@RobertMondavi.com""]",2001-10-17 08:50:02,"what stuff?\n\n-----Original Message-----\nFrom: Nancy Sellers [mailto:Nancy.Sellers@RobertMondavi.com]\nSent: Wednesday, October 17, 2001 10:50 AM\nTo: Dasovich, Jeff; psellers@pacbell.net; Nancy Sellers; eldon sellers\n(E-mail); Scott Laughlin (E-mail); cameron@perfect.com\nSubject: RE: Golf Anyone?\n\n\nShould I not mail this stuff then???\n\n-----Original Message-----\nFrom: Dasovich, Jeff [mailto:Jeff.Dasovich@ENRON.com]\nSent: Wednesday, October 17, 2001 8:27 AM\nTo: psellers@pacbell.net; Nancy Sellers (E-mail); eldon sellers\n(E-mail); Scott Laughlin (E-mail); cameron@perfect.com\nSubject: Golf Anyone?\n\n\nLooks like Prentice and Nancy will be getting together in Napa to do a\nlittle bonding this weekend. Therefore, it looks like an opportunity to\ngolf on Saturday or Sunday (screw football). Any takers? (Eldon has\npromised not to be the score Nazi.) \n\nBest,\nJeff\n\n\n**********************************************************************\nThis e-mail is the property of Enron Corp. and/or its relevant affiliate and\nmay contain confidential and privileged material for the sole use of the\nintended recipient (s). Any review, use, distribution or disclosure by\nothers is strictly prohibited. If you are not the intended recipient (or\nauthorized to receive for the recipient), please contact the sender or reply\nto Enron Corp. at enron.messaging.administration@enron.com and delete all\ncopies of the message. This e-mail (and any attachments hereto) are not\nintended to be an offer (or an acceptance) and do not create or evidence a\nbinding and enforceable contract between Enron Corp. (or any of its\naffiliates) and the intended recipient or any other party, and may not be\nrelied on by anyone as the basis of a contract by estoppel or otherwise.\nThank you. 
\n**********************************************************************",0.523112,-8.798704
4,9,RE: YO,Mark Guzman,['Katie Trullinger <Katie.Trullinger@wfsg.com> @ ENRON'],2000-11-08 07:32:00,sounds good to me.,0.525471,-10.283324


In [192]:
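# Re-order the retrieved emails by the `ranking` score (higher is better) and keep the top 5.
# .copy() makes this an independent DataFrame, so adding a column below does not raise SettingWithCopyWarning.
top_5_emails_by_ranking = related_emails_df.sort_values(by='ranking', ascending=False).head(5).copy()
# Add a display-only column holding the first 100 characters of each body
top_5_emails_by_ranking['truncated_body'] = top_5_emails_by_ranking['body'].apply(lambda x: x[:100] + '...' if len(x) > 100 else x)
top_5_emails_by_ranking[['thread_id', 'subject', 'from', 'to', 'ranking', 'truncated_body']]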
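The same top-k selection and truncation can also be written with pandas' vectorised `.str` accessor instead of a per-row `lambda`. The sketch below is illustrative: `emails` is a small made-up stand-in for `related_emails_df`, and `top_k_with_preview` is a hypothetical helper that keeps the original rule (append `...` only when the body is longer than 100 characters).

```python
import pandas as pd

# Hypothetical stand-in for related_emails_df, with only the columns used here
emails = pd.DataFrame({
    "thread_id": [8, 8, 9],
    "body": ["a" * 150, "short body", "b" * 100],
    "ranking": [-8.6, -6.1, -10.3],
})

def top_k_with_preview(df, k=5, width=100):
    """Return the k highest-ranked rows plus a truncated body preview column."""
    # Sort by score (higher is better), keep k rows, and copy so the input is not modified
    top = df.sort_values(by="ranking", ascending=False).head(k).copy()
    body = top["body"]
    # Keep bodies of at most `width` characters; otherwise cut them and append '...'
    top["truncated_body"] = body.where(body.str.len() <= width, body.str.slice(0, width) + "...")
    return top

preview = top_k_with_preview(emails, k=2)
print(preview[["thread_id", "ranking", "truncated_body"]])
```

Copying inside the helper keeps `related_emails_df` untouched, so the full bodies remain available for the generation step.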

Unnamed: 0,thread_id,subject,from,to,ranking,truncated_body
1,8,RE: Golf Anyone?,"Dasovich, Jeff JDASOVIC","[""'Scott Laughlin' <scottwl@hotmail.com""]",-6.146529,party pooper. (i understand. always more complicated than necessary.)\n\n-----Original Message-----\nFr...
0,8,RE: Golf Anyone?,"Dasovich, Jeff JDASOVIC","[""'Nancy Sellers' <Nancy.Sellers@RobertMondavi.com"", ""'Scott Laughlin' <scottwl@hotmail.com"", 'psellers@pacbell.net', 'eldon@direcpc.com', 'cameron@perfect.com']",-6.355474,Might be able to do Sunday morning. Eldon's going to check and see if there's a tee time. Will rep...
2,8,RE: Golf Anyone?,"Dasovich, Jeff JDASOVIC","[""'Scott Laughlin' <scottwl@hotmail.com"", 'psellers@pacbell.net', 'Nancy.Sellers@RobertMondavi.com', 'eldon@direcpc.com', 'cameron@perfect.com']",-8.634404,"I like Saturday afternoon after your lesson, since we've got a wedding at 3 pm on Sunday to go to.\n\n..."
3,8,RE: Golf Anyone?,"Dasovich, Jeff JDASOVIC","[""'Nancy Sellers' <Nancy.Sellers@RobertMondavi.com""]",-8.798704,what stuff?\n\n-----Original Message-----\nFrom: Nancy Sellers [mailto:Nancy.Sellers@RobertMondavi.com]...
5,8,RE: Golf Anyone?,"Dasovich, Jeff JDASOVIC","[""'Scott Laughlin' <scottwl@hotmail.com""]",-9.066766,"Cool. If $150 don't cut it on the check, just let me know. No sweat. EVERYTHING is confidential.\n..."
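All five top-ranked rows are replies in the same thread (thread 8), and each body repeats earlier messages as quoted text after an `-----Original Message-----` separator or on lines starting with `>`. One optional preprocessing step is to keep only the new text of each message before building the prompt. The sketch below assumes these two markers are the only quoting conventions and that bodies contain real newline characters. `strip_quoted_text` is a hypothetical helper name. This is a trade-off: in the rows above, the mandolin lesson appears only inside quoted text, so stripping it would remove the evidence the answer below relies on.

```python
import re

# Separator that Outlook-style clients insert before the quoted earlier message (assumed convention)
ORIGINAL_MESSAGE = re.compile(r"-{2,}\s*Original Message\s*-{2,}", re.IGNORECASE)

def strip_quoted_text(body: str) -> str:
    """Keep only the new text of an email: cut at the first 'Original Message'
    separator, then drop any remaining lines quoted with '>'."""
    match = ORIGINAL_MESSAGE.search(body)
    if match:
        body = body[:match.start()]
    kept = [line for line in body.splitlines() if not line.lstrip().startswith(">")]
    return "\n".join(kept).strip()

print(strip_quoted_text("what stuff?\n\n-----Original Message-----\nFrom: Nancy Sellers"))
```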



## The Generation Layer

In [194]:
import openai

# Define the function that generates the response. The prompt passes the user query and the top 5
# retrieved emails, together with their subject, sender and timestamp so the model can cite them.

def generate_response(user_query, top_5_results):
    """
    Generate a response with the OpenAI Chat Completions API, grounded in the retrieved emails.
    """

    # Format each retrieved email with the metadata the model is asked to cite.
    # Iterating over the rows (instead of hard-coding iloc[0] to iloc[4]) also works
    # when fewer than 5 emails are retrieved.
    email_context = "\n\n".join(
        f"{i}. Subject: {row['subject']}\n"
        f"   From: {row['from']}\n"
        f"   Time: {row['timestamp']}\n"
        f"   Body: {row['body']}"
        for i, (_, row) in enumerate(top_5_results.iterrows(), start=1)
    )

    messages = [
        { "role": "system", "content": "You are a helpful assistant that provides accurate and concise answers based on the provided email content."},
        { "role": "user", "content": f"""
            User Query: {user_query}

            Top {len(top_5_results)} Relevant Emails:
            {email_context}

            Answer the query directly, addressing the user, and avoid additional information.
            If the query is not relevant to the email content, reply that the query is irrelevant.
            Format the response as well-structured, easily readable text with citations (subject, from, time).
            Give the complete answer first, then list the citations.
            """ }
    ]

    response = openai.chat.completions.create(
        model="gpt-4.1",
        messages=messages
    )

    return response.choices[0].message.content


In [195]:
# Generate the response

response = generate_response(query, top_5_emails_by_ranking)


In [196]:
# Print the response
print("Query:", query)
print("\nResponse:")
print(response)

Query: Does anyone play Mondolin?

Response:
Yes, someone does play mandolin. Scott Laughlin mentions that he has a mandolin lesson at 11am on Saturday.

Citations:
Subject: RE: Golf Anyone?
From: Scott Laughlin
Time: Wednesday, October 17, 2001, 1:36 PM
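Because the prompt asks for citations in a (subject, from, time) layout, part of the answer can be validated automatically by checking that every cited subject belongs to one of the retrieved emails. The sketch below assumes the model keeps the `Subject: ...` line format seen in the response above, which is not guaranteed since the output is free text. `unverified_subjects` is a hypothetical helper. Only the subject is checked: the cited sender (Scott Laughlin) comes from quoted text, not from the row's `from` column, so a sender check would wrongly flag this correct answer.

```python
import re

def unverified_subjects(response_text, retrieved_subjects):
    """Return cited subjects that do not match any retrieved email subject (case-insensitive)."""
    known = {s.strip().lower() for s in retrieved_subjects}
    cited = re.findall(r"^\s*Subject:\s*(.+?)\s*$", response_text, flags=re.MULTILINE)
    return [s for s in cited if s.lower() not in known]

response_text = """Yes, someone does play mandolin. Scott Laughlin mentions that he has a mandolin lesson at 11am on Saturday.

Citations:
Subject: RE: Golf Anyone?
From: Scott Laughlin
Time: Wednesday, October 17, 2001, 1:36 PM"""

print(unverified_subjects(response_text, ["RE: Golf Anyone?", "RE: YO"]))  # -> []
```

In the notebook, the same check would be `unverified_subjects(response, top_5_emails_by_ranking['subject'])`. A non-empty result indicates a citation that does not match any retrieved email.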
