<a href="https://colab.research.google.com/github/Priyal686/Email-Automation/blob/main/EmailSearchAI.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Email Search AI

### Importing required libraries

In [1]:
#import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer


In [2]:
#load data from drive
email_details=pd.read_csv("/content/drive/MyDrive/SemanticSpotter/archive/CSV/email_thread_details.csv")
email_summeries=pd.read_csv("/content/drive/MyDrive/SemanticSpotter/archive/CSV/email_thread_summaries.csv")

### 1. Understand Data and Preprocessing

In [3]:
# preview the first rows of the email details
email_details.head()

Unnamed: 0,thread_id,subject,timestamp,from,to,body
0,1,FW: Master Termination Log,2002-01-29 11:23:42,"Gossett, Jeffrey C. JGOSSET","['Giron', 'Darron C. Dgiron', 'Love', 'Phillip...",\n\n -----Original Message-----\nFrom: =09Ther...
1,1,FW: Master Termination Log,2002-01-31 12:50:00,"Theriot, Kim S. KTHERIO","['Murphy', 'Melissa Mmurphy', 'Gossett', 'Jeff...",\n\n -----Original Message-----\nFrom: =09Panu...
2,1,FW: Master Termination Log,2002-02-05 15:03:35,"Theriot, Kim S. KTHERIO","['Murphy', 'Melissa Mmurphy', 'Anderson', 'Dia...",Note to Stephanie Panus....\n\nStephanie...ple...
3,1,FW: Master Termination Log,2002-02-05 15:06:25,"Theriot, Kim S. KTHERIO","['Hall', 'D. Todd Thall', 'Sweeney', 'Kevin Ks...",\n\n -----Original Message-----\nFrom: =09Panu...
4,1,FW: Master Termination Log,2002-05-28 07:20:35,"Kelly, Katherine L. KKELLY","['Germany', 'Chris Cgerman']",\n\n -----Original Message-----\nFrom: =09McMi...


In [4]:
email_summeries.head()

Unnamed: 0,thread_id,summary
0,1,The email thread discusses the Master Terminat...
1,2,A lunch meeting has been scheduled for May 5th...
2,3,Ben is updating a friend on his progress with ...
3,4,The recipient of the email thread initially ex...
4,5,The email thread discusses the long form confi...




The data is stored in two files, so let's merge them on `thread_id`.

In [5]:
email_data=pd.merge(email_details,email_summeries,on="thread_id")

print(email_data.head())

   thread_id                     subject            timestamp  \
0          1  FW: Master Termination Log  2002-01-29 11:23:42   
1          1  FW: Master Termination Log  2002-01-31 12:50:00   
2          1  FW: Master Termination Log  2002-02-05 15:03:35   
3          1  FW: Master Termination Log  2002-02-05 15:06:25   
4          1  FW: Master Termination Log  2002-05-28 07:20:35   

                          from  \
0  Gossett, Jeffrey C. JGOSSET   
1      Theriot, Kim S. KTHERIO   
2      Theriot, Kim S. KTHERIO   
3      Theriot, Kim S. KTHERIO   
4   Kelly, Katherine L. KKELLY   

                                                  to  \
0  ['Giron', 'Darron C. Dgiron', 'Love', 'Phillip...   
1  ['Murphy', 'Melissa Mmurphy', 'Gossett', 'Jeff...   
2  ['Murphy', 'Melissa Mmurphy', 'Anderson', 'Dia...   
3  ['Hall', 'D. Todd Thall', 'Sweeney', 'Kevin Ks...   
4                       ['Germany', 'Chris Cgerman']   

                                                body  \
0  \n\n ---

In [6]:
email_data.head()

Unnamed: 0,thread_id,subject,timestamp,from,to,body,summary
0,1,FW: Master Termination Log,2002-01-29 11:23:42,"Gossett, Jeffrey C. JGOSSET","['Giron', 'Darron C. Dgiron', 'Love', 'Phillip...",\n\n -----Original Message-----\nFrom: =09Ther...,The email thread discusses the Master Terminat...
1,1,FW: Master Termination Log,2002-01-31 12:50:00,"Theriot, Kim S. KTHERIO","['Murphy', 'Melissa Mmurphy', 'Gossett', 'Jeff...",\n\n -----Original Message-----\nFrom: =09Panu...,The email thread discusses the Master Terminat...
2,1,FW: Master Termination Log,2002-02-05 15:03:35,"Theriot, Kim S. KTHERIO","['Murphy', 'Melissa Mmurphy', 'Anderson', 'Dia...",Note to Stephanie Panus....\n\nStephanie...ple...,The email thread discusses the Master Terminat...
3,1,FW: Master Termination Log,2002-02-05 15:06:25,"Theriot, Kim S. KTHERIO","['Hall', 'D. Todd Thall', 'Sweeney', 'Kevin Ks...",\n\n -----Original Message-----\nFrom: =09Panu...,The email thread discusses the Master Terminat...
4,1,FW: Master Termination Log,2002-05-28 07:20:35,"Kelly, Katherine L. KKELLY","['Germany', 'Chris Cgerman']",\n\n -----Original Message-----\nFrom: =09McMi...,The email thread discusses the Master Terminat...
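The merge above attaches one summary to every email in a thread, which only works as intended if each `thread_id` appears once in the summaries table. The following is a minimal sketch of how that assumption could be checked with the `validate` and `indicator` options of `pd.merge`. The two small frames are invented stand-ins for `email_details` and `email_summeries`, and `how="left"` is an assumption made here so that unmatched emails stay visible (the notebook itself uses the default inner join).

```python
import pandas as pd

# Toy stand-ins for email_details and email_summeries (values are invented).
details = pd.DataFrame({
    "thread_id": [1, 1, 2],
    "body": ["first message", "reply", "lunch invite"],
})
summaries = pd.DataFrame({
    "thread_id": [1, 2],
    "summary": ["thread one summary", "thread two summary"],
})

# validate="many_to_one" raises pandas.errors.MergeError if a thread_id
# appears more than once in the summaries frame.
# indicator=True adds a "_merge" column recording where each row came from.
merged = pd.merge(
    details,
    summaries,
    on="thread_id",
    how="left",  # assumption: keep emails even when no summary exists
    validate="many_to_one",
    indicator=True,
)

# Rows marked "left_only" are emails whose thread has no summary.
missing_summary = merged[merged["_merge"] == "left_only"]
print(len(merged), len(missing_summary))
```

If `missing_summary` is empty, the left join and the notebook's inner join return the same rows.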


Because a thread can contain several emails, the merged table has multiple rows with the same thread_id. We group these rows so that each thread becomes a single record, which lets the search work on whole conversations rather than on isolated messages.

In [11]:
# Group emails by thread_id: join all bodies into one string and keep the first value of every other field
email_data_grouped=email_data.groupby("thread_id").agg({
    'body':' '.join,
    'summary':'first',
    'subject':'first',
    'from':'first',
    'to':'first',
    'timestamp':'first'
}).reset_index()

email_data_grouped.head()

Unnamed: 0,thread_id,body,summary,subject,from,to,timestamp
0,1,\n\n -----Original Message-----\nFrom: =09Ther...,The email thread discusses the Master Terminat...,FW: Master Termination Log,"Gossett, Jeffrey C. JGOSSET","['Giron', 'Darron C. Dgiron', 'Love', 'Phillip...",2002-01-29 11:23:42
1,2,I'll be there... I will attend. Suzanne:\nHere...,A lunch meeting has been scheduled for May 5th...,Credit Group Lunch,Tana Jones,['Suzanne Adams'],2000-01-12 05:26:00
2,3,"Hey there; \n""Do you know who your ""big toe"" i...",Ben is updating a friend on his progress with ...,New Address,Benjamin Rogers,"['""CHOBY', 'C."" <G7PWC3@stennis.navy.mil']",2000-01-09 08:26:00
3,4,thanks for the update.\nPL that is ok. Thanks...,The recipient of the email thread initially ex...,EOL Data,Phillip M Love,['Julie Ferrara'],2001-02-01 09:55:00
4,5,I think you can send it just so he has the for...,The email thread discusses the long form confi...,RE: long form confirm/MDEA,Kay Mann,['Reagan Rorschach'],2001-04-27 09:20:00
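Note that `'first'` and `' '.join` follow the row order within each group, so the result depends on how the rows happen to be sorted. The sketch below uses invented toy data, not the notebook's dataset, to show one way of making that order explicit by sorting on a parsed timestamp before grouping. The `n_messages` column is a hypothetical addition for illustration only.

```python
import pandas as pd

# Toy emails, deliberately out of chronological order (values are invented).
emails = pd.DataFrame({
    "thread_id": [1, 2, 1],
    "timestamp": [
        "2002-01-31 12:50:00",
        "2000-01-12 05:26:00",
        "2002-01-29 11:23:42",
    ],
    "body": ["reply", "lunch invite", "original"],
})

# Parse the timestamps and sort, so that "first" means "earliest" and the
# bodies are concatenated in chronological order.
emails["timestamp"] = pd.to_datetime(emails["timestamp"])
emails = emails.sort_values(["thread_id", "timestamp"])

# groupby keeps the row order inside each group.
grouped = emails.groupby("thread_id").agg(
    body=("body", " ".join),
    timestamp=("timestamp", "first"),
    n_messages=("body", "count"),  # hypothetical extra column
).reset_index()

print(grouped)
```

Here thread 1 ends up with the body `"original reply"` and the earlier of its two timestamps.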


In [12]:
email_data_grouped.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4167 entries, 0 to 4166
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   thread_id  4167 non-null   int64 
 1   body       4167 non-null   object
 2   summary    4167 non-null   object
 3   subject    4167 non-null   object
 4   from       4167 non-null   object
 5   to         4167 non-null   object
 6   timestamp  4167 non-null   object
dtypes: int64(1), object(6)
memory usage: 228.0+ KB
