# Cold Email Agent
In this notebook, we will create an autonomous agent that automates our cold email workflow using LangChain.

## Features
- Classify emails
- Reply in your tone
- Can be easily integrated with smartlead

Let's start by installing the required dependencies

In [None]:
# Install dependencies
%pip install requests html2text langchain openai tiktoken

In [None]:
# Get environment variables
import os
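# Environment variables live in os.environ; the os module itself is not subscriptable
secret = os.environ['SMARTLEAD_API_KEY']
openai_secret = os.environ["OPENAI_KEY"]
apollo_secret = os.environ['APOLLO_API_KEY']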
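Reading a key that is not set raises a `KeyError` with little context, so it can help to check all three variables up front. The following is a minimal sketch; the helper name `missing_env_vars` and the `REQUIRED_KEYS` list are illustrative and not part of the original notebook.

```python
import os

# The three variable names used in this notebook
REQUIRED_KEYS = ["SMARTLEAD_API_KEY", "OPENAI_KEY", "APOLLO_API_KEY"]

def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment mapping."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]

missing = missing_env_vars(REQUIRED_KEYS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Running this before the rest of the notebook lists every missing key at once instead of failing on the first one.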

Now let's import the required dependencies and initialize the models and clients.

In [None]:
# Import dependencies
import requests
import csv
import json
import html2text
import ast
from openai import OpenAI as OpenAIPython
from langchain.llms import OpenAI
from langchain.schema import SystemMessage
from langchain.prompts import MessagesPlaceholder
from langchain.memory import ConversationBufferWindowMemory, ConversationSummaryBufferMemory
from langchain.agents import initialize_agent, AgentType
from langchain.chat_models import ChatOpenAI
OpenAI_LLM = OpenAI(temperature=0.6,api_key=openai_secret)
ChatOpenAI_LLM = ChatOpenAI(temperature=0, model="gpt-4",api_key=openai_secret)
client = OpenAIPython(
    # This is the default and can be omitted
    api_key=openai_secret,
)
from datetime import datetime
current_datetime = datetime.utcnow()

## Preparing our knowledge base
To make our agent reply in our tone, we first have to give it a knowledge base that it can use as a reference when generating replies.

We will provide our past email/reply pairs and some FAQs to the agent. Using vector embeddings, we will find the past emails most similar to the current message and then generate a response based on their replies.

To get the email/reply pairs for a lead, we need that lead's conversation history from smartlead, and to fetch it we first need to specify which campaign the conversation belongs to.

You can get the campaign information using the `list all campaigns` route of the smartlead API.

In [None]:
# Get all the campaigns
response = requests.get(f"https://server.smartlead.ai/api/v1/campaigns?api_key={secret}")
response = response.json()
print(response)

From the returned array of objects, you can get the id of your campaign.

Once you have the campaign id, it’s time to create our knowledge base.

You can get all the leads who have replied to your cold email from the smartlead dashboard by going to your campaign -> inbox. In the inbox, use the sequence status filter to show only the leads who have replied.

We downloaded this lead information from smartlead into a “Lead_Data.csv” file. After formatting each reply, we will store it in a new CSV file called “final.csv” with only 2 columns: `Message` and `Reply`.

In [None]:
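# campaign_id is used in the message-history request below, so it has to be set first
campaign_id = input("Enter campaign id:")
with open('Lead_Data.csv', mode='r') as file: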
    # Create a CSV reader object
    csv_reader = csv.reader(file)
    # Skip the header row
    next(csv_reader)
    with open('final.csv', mode='w', newline='') as final_csv:
      fieldnames = ['Message', 'Reply']
      writer = csv.DictWriter(final_csv, fieldnames=fieldnames)
      writer.writeheader()
      # Iterate over each row in the CSV file
      for row in csv_reader:
          # Get the lead id from smartlead API
          lead_info = requests.get(f"https://server.smartlead.ai/api/v1/leads/?api_key={secret}&email={row[0]}")
          lead_info = lead_info.json()
          # Get message history from smartlead API
          message_history = requests.get(f"https://server.smartlead.ai/api/v1/campaigns/{campaign_id}/leads/{lead_info['id']}/message-history?api_key={secret}")
          message_history = message_history.json()
          # traverse message history array
          for index, message in enumerate(message_history["history"]):
            if(message["type"] == "REPLY" and index != len(message_history["history"]) - 1):
              # The conversation you get from smartlead is in HTML, so we need to convert it into plain text
              plain_text = html2text.html2text(message["email_body"])
              # Prompt for the LLM to extract the latest email from the thread
              prompt = f"""
                  Email Thread:
                  ---
                  {plain_text}
                  ---
                  You are given an email thread as plain text and you have to return the latest email from it

                  You have to follow these steps to do it:
                  Step-1: Get the text which doesn't start with '>' because any text which starts with '>' is an old message in the thread
                  (It will be at the start of the text and older messages will be below this latest message)

                  The email thread format typically looks like this:
                  ---
                  Hey john,
                  I am interested
                  Thanks,
                  shivam
                  > sentence 1 ....
                  > sentence 2 ....
                  > ...
                  >> sentence from older emails ...
                  >> sentence from older emails ....
                  >> ....
                  ---
                  For above example, the email message content will be:
                  Hey john,
                  I am interested
                  Thanks,
                  Shivam

                  Note: Please note that the above message content is just an example and your response must be based on the given plain text

                  Step-2: Once you have the latest email, get the message content from it and remove any extra blank space or unnecessary content from the email
                  After these steps, return the email message content in the response

                  RESPONSE (Don't return anything except email message):
                """
              email_content = client.chat.completions.create(
                  model="gpt-4",
                  messages=[
                      {"role": "user", "content": prompt}
                  ]
              )
              email_content = email_content.choices[0].message.content
              reply = html2text.html2text(message_history["history"][index+1]["email_body"])
              # Add the formatted data into new csv file
              writer.writerow({'Message': email_content, 'Reply': reply})
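The rule described in the prompt (keep the text that is not prefixed with `>`) can also be applied without an LLM call, for example as a cheap pre-filter before sending the thread to gpt-4. Here is a minimal sketch, assuming the quoted part of the thread always starts at the first line beginning with `>`; the function name `extract_latest_message` is hypothetical, and the sketch does not handle attribution lines such as "On Monday, X wrote:".

```python
def extract_latest_message(plain_text):
    """Keep only the lines that come before the first quoted ('>') line."""
    kept = []
    for line in plain_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(">"):
            # Assumption: everything from the first quoted line onward is older mail
            break
        kept.append(stripped)
    # Drop blank lines and join what is left
    return "\n".join(line for line in kept if line)

thread = "Hey john,\nI am interested\nThanks,\nshivam\n> sentence 1\n>> sentence from older emails"
print(extract_latest_message(thread))
```

For real threads the LLM step is more forgiving of unusual formats, so this is best treated as a way to shorten the prompt rather than a full replacement.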

### Creating FAQs
Now let’s create our 2nd knowledge base: FAQs about you and your organization.

To create the FAQs, we will take all the replies from the “final.csv” file we just created and generate questions and answers about you and your organization from them, so that our agent can use this information as a reference. You can also use your own FAQs instead of generating them from emails.

We will use a text splitter from LangChain here because we can’t pass the whole CSV content in a single prompt. We split the content into chunks, generate FAQs for each chunk, and finally combine the FAQs from every chunk into a single JSON array.

Let’s write the code for it.

In [None]:
from langchain.prompts import PromptTemplate
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.summarize import load_summarize_chain
llm = OpenAI(temperature=0,api_key=openai_secret)
def load_csv(file_path):
    # Create a list to hold dictionaries
    data_list = []

    # Open the CSV file and read its content
    with open(file_path, 'r') as csv_file:
        csv_reader = csv.DictReader(csv_file)

        # For each row, append it as a dictionary to the list
        for row in csv_reader:
            data_list.append(row)

    return data_list

def extract_faq(text_data):
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=30,
        length_function = len,
        is_separator_regex=False)

    texts = text_splitter.split_text(text_data)
    docs = text_splitter.create_documents(texts)
    print(docs)
    map_prompt = """
    PAST EMAILS:
    {text}
    ----

    You are a smart AI assistant, above is some past emails from Rohan (Founder and CEO of Ionio),
    your goal is to learn & extract common FAQ about Rohan and Ionio
    (include both question & answer, return results in JSON)

    Response MUST be like this:
    Question: Question 1
    Answer: Answer 1

    Question: Question 2
    Answer: Answer 2
    """
    map_prompt_template = PromptTemplate(template=map_prompt, input_variables=["text"])

    combine_prompt = """
    The following are set of FAQs about Rohan (Founder and CEO of Ionio):
    {text}
    Take every question and answer and combine them into a final array of faq,
    include both question & answer in json format
    Response should be at least 3000 characters long and ask incrementally better questions and give better answers

    Every json object will have these 2 fields:
    Question:
    Answer:

    NOTE: IGNORE THE UNTERMINATED STRINGS AND DON'T ADD THEM IN ARRAY BUT ADD ALL OTHER COMPLETE QUESTIONS IN ARRAY
    Make sure the response array is parsable to json otherwise the code will break

    array of FAQ:
    """
    combine_prompt_template = PromptTemplate(template=combine_prompt, input_variables=["text"])
    # we will use map_reduce chain here
    summary_chain = load_summarize_chain(llm=llm,
                                        chain_type='map_reduce',
                                        map_prompt=map_prompt_template,
                                        combine_prompt=combine_prompt_template,
                                        verbose=True,

                                        )

    output = summary_chain.run(docs)
    print("--------- OUTPUT -------------")
    print(output)
    # Return the raw text so the caller can parse it (for example with json.loads) and save it
    return output
    
# Function to save json object in csv file
def save_json_to_csv(data, file_name):
    with open(file_name, mode='w', newline='', encoding='utf-8') as file:
        # Get the keys (column names) from the first dictionary in the list
        fieldnames = data[0].keys()

        # Create a CSV dict writer object
        writer = csv.DictWriter(file, fieldnames=fieldnames)

        # Write the header row
        writer.writeheader()

        # Write the data rows
        for entry in data:
            writer.writerow(entry)


# Load the email/reply pairs created in the previous step
past_emails = load_csv("final.csv")

# Extracting Rohan's replies
replies = [entry["Reply"] for entry in past_emails]
replies_string = json.dumps(replies)
extract_faq(replies_string)
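The combine prompt asks the model for a JSON array, but model output is not guaranteed to parse. Below is a minimal sketch of a defensive parser, assuming the chain output is available as a string; `parse_faqs` is a hypothetical helper and the sample string is made up for illustration.

```python
import json

def parse_faqs(output):
    """Parse model output into a list of Question/Answer dicts, skipping incomplete entries."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        # The model returned something that is not valid JSON
        return []
    if not isinstance(data, list):
        return []
    faqs = []
    for item in data:
        if isinstance(item, dict) and item.get("Question") and item.get("Answer"):
            faqs.append({"Question": item["Question"], "Answer": item["Answer"]})
    return faqs

# Made-up sample output: one complete entry and one without an answer
sample_output = '[{"Question": "What does Ionio do?", "Answer": "It provides AI solutions."}, {"Question": "Incomplete"}]'
faqs = parse_faqs(sample_output)
print(faqs)
```

A non-empty result could then be written to disk with the `save_json_to_csv` function defined above, since every entry has the same two keys.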

## Creating our agent

Now we have everything we need to create our agent, so let’s start by importing the required dependencies.

### Email Categorization Tool

Once we get the email conversation from smartlead, the agent's first task is to categorize the email based on the conversation and then decide whether to reply to it. We will use gpt-4 as our LLM to categorize these emails.

This tool takes only 1 input parameter:

- **Conversation:** Past email conversation between Rohan and the lead, as an array of JSON objects where each object contains a sender and a message field

We will sort these emails into **8 categories**, but you can customize the categories according to your needs and use case. Let’s take a look at the prompt we are going to pass to our LLM.

In [None]:
from langchain.pydantic_v1 import BaseModel, Field
from typing import Type, List
from langchain.tools import BaseTool
# Input class for tool so that it can follow strict input parameter schema
class CategorizeEmailInput(BaseModel):
    conversation: str = Field(description="Email conversation array")

class CategorizeEmailTool(BaseTool):
    # Provide a proper name and description for your tool
    name = "email_categorizer_tool"
    description = "use this tool when you have an email conversation history and you want to categorize the email"
    args_schema: Type[BaseModel] = CategorizeEmailInput

    def _run(self, conversation: str):
      prompt = f"""
        Email Conversation History:
        ---
        {conversation}
        ---
        You have given an array of conversation between Rohan Sawant and a client
        Your goal is to categorize this email based on the conversation history from the given categories:

        1. Meeting_Ready_Lead: they have shown positive intent and are interested in getting on a call
        2. Power: If they’re interested and we want to push for a call
        3. Question: If they have any question regarding anything
        4. Unsubscribe: They want to unsubscribe themselves from our email list
        5. OOO: They are out of office
        6. No_Longer_Works: They no longer work at the company
        7. Not_Interested: They are not interested
        8. Info: these are emails that don't fit into any of the above categories.

        Note: Your final response MUST BE the category name ONLY

        RESPONSE:
      """
      message = client.chat.completions.create(
          model="gpt-4",
          messages=[
              {"role": "user", "content": prompt}
          ]
      )
      category = message.choices[0].message.content
      return category

    def _arun(self, conversation: str):
        raise NotImplementedError(
            "email_categorizer_tool does not support async")
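The prompt asks for the category name only, but the model can still return extra whitespace, quotes or a trailing period. The following sketch normalizes the raw answer against the eight categories from the prompt; the helper `normalize_category` and the choice of `Info` as the fallback are assumptions made for illustration.

```python
# The eight category names used in the categorization prompt
CATEGORIES = [
    "Meeting_Ready_Lead", "Power", "Question", "Unsubscribe",
    "OOO", "No_Longer_Works", "Not_Interested", "Info",
]

def normalize_category(raw, default="Info"):
    """Map the raw model answer to a known category, or fall back to the default."""
    # Remove surrounding whitespace, quotes and periods, then compare case-insensitively
    cleaned = raw.strip().strip("\"'.").lower()
    for category in CATEGORIES:
        if cleaned == category.lower():
            return category
    return default

print(normalize_category(" power.\n"))
print(normalize_category("Something unexpected"))
```

Falling back to `Info` matches its role in the prompt as the catch-all category, and it means an unexpected answer never triggers a reply.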

### Company Search Tool

To give every email a more personal touch, it is good to research the lead and their organization before booking a meeting with them. Mentioning what you found in your reply can make a good impression, because it shows that the sender is a real person who looked into the organization and is interested in working with it 😁

To look up an organization, we will use the [apollo API](https://www.apollo.io/product/api), which gives us information about the organization from the lead's email.

This tool takes 2 parameters as input:

- **Email:** Email of the lead
- **Category:** Category of the email conversation (we pass this because we want this tool to run only after categorization is done)

So let’s create our tool!

In [None]:
class CompanySearchToolInput(BaseModel):
    email: str = Field(description="Email of sender")
    category: str = Field(description="Category of email")

class CompanySearchTool(BaseTool):
    name = "company_search_tool"
    description = "use this tool when you want to get information about any company"
    args_schema: Type[BaseModel] = CompanySearchToolInput

    def _run(self, email: str, category: str):
        data = {
            "api_key":apollo_secret,
            "email":email
        }
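        response = requests.post("https://api.apollo.io/v1/people/match",data=data)
        response = response.json()
        # A match can come back without a person or organization, so avoid a KeyError
        organization = (response.get("person") or {}).get("organization") or {}
        return organization.get("short_description", "")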

    def _arun(self, email: str, category: str):
        raise NotImplementedError(
            "company_search_tool does not support async")

### Email Writer Tool

Now it’s time to create the core tool of the whole workflow, because the main goal of this agent is to mimic Rohan's tone and reply the way he replies to his cold emails. We have to provide all the required information and a detailed prompt to this tool to get better results.

We could write the code for this entire architecture ourselves: first store our CSV knowledge base in a vector database, then perform **semantic search** over the messages and FAQs based on the user input. Once we have the similar email and reply pairs, we pass them to the LLM, which constructs an email response that mimics Rohan's tone.
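To make the semantic search step concrete, here is a minimal, self-contained sketch of the retrieval idea. It is not how relevance or any vector database implements it: the `embed` function is a toy bag-of-words stand-in for a real embedding model, and the sample knowledge base is made up for illustration.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar_pairs(query, pairs, k=2):
    """Return the k message/reply pairs whose Message is closest to the query."""
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, embed(p["Message"])), p) for p in pairs]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [p for _, p in scored[:k]]

# Made-up rows in the same Message/Reply shape as final.csv
knowledge_base = [
    {"Message": "I am interested can we schedule a call", "Reply": "Sure, here is my calendar link"},
    {"Message": "Please remove me from your list", "Reply": "Done, you will not hear from us again"},
    {"Message": "What is your pricing", "Reply": "It depends on the scope, happy to discuss"},
]
top = most_similar_pairs("interested in a call next week", knowledge_base, k=1)
print(top[0]["Reply"])
```

With a real embedding model the structure stays the same: embed the incoming message, score it against the stored messages, and pass the replies of the best matches to the LLM as tone examples.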

I am going to use the [relevance](https://relevance.com) platform, where we can create our custom AI tools and host them. The main advantage of relevance is that it comes with a pre-built vector database where you can store your data and use it in your tools, so you don’t need to worry about vector embeddings or the semantic search process.

[Read the blog](https://www.ionio.ai/blog/how-to-create-an-ai-agent-to-manage-your-email-inbox-and-reply-to-your-cold-email-code-included) if you want to take a look at how to make this tool.

In [None]:
class EmailWriterToolInput(BaseModel):
    latest_reply: str = Field(description="Latest reply from the prospect")
    conversation_history: str = Field(description="Array of conversation history")
    sender: str = Field(description="Name of sender")
    company_info: str = Field(description="Information about sender's company")

class EmailWriterTool(BaseTool):
    name = "email_writer_tool"
    description = "use this tool when you are given an email and you have to construct a reply for it"
    args_schema: Type[BaseModel] = EmailWriterToolInput

    def _run(self, latest_reply: str, conversation_history: str, sender: str,company_info: str):
        # making api call to relevance tool
        headers = {
            "Content-Type": "application/json"
        }
        data = {
            "params": {
                "client_email": latest_reply,
                "sender":sender,
                "conversation_history":conversation_history,
                "company_description":company_info
            },
            "project": "Your_Project_Id"
        }

        res = requests.post("https://api-xxxxx.tryrelevance.com/latest/studios/xxxxxx-xxxx-xxxx-xxxxxx/trigger_limited",data=json.dumps(data),headers=headers)
        res = res.json()
        return res["output"]["answer"]

    def _arun(self, latest_reply: str, conversation_history: str, sender: str, company_info: str):
        raise NotImplementedError(
            "email_writer_tool does not support async")


### Email Sender Tool

Let’s create our last tool, the email sender tool, which sends emails back to leads using the [smartlead API](https://api.smartlead.ai/reference/reply-to-lead-from-master-inbox-via-api). We could let the agent trigger this function too, but instead of passing so many parameters to the agent, we can simply call it after the agent gives us a reply. It requires the parameters listed on the smartlead API documentation page.

In [None]:
def EmailSenderTool(campaign_id,email_stats_id,email_body,reply_message_id,reply_email_body,email):
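  # Pass the API key as a query parameter, like the other smartlead requests in this notebook
  url = f"https://server.smartlead.ai/api/v1/campaigns/{campaign_id}/reply-email-thread?api_key={secret}"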
  data = {
    "email_stats_id": email_stats_id,
    "email_body": email_body,
    "reply_message_id": reply_message_id,
    "reply_email_time": current_datetime.strftime('%Y-%m-%dT%H:%M:%S.%fZ'),
    "reply_email_body": reply_email_body,
    "cc": email,
  }
  response = requests.post(url,data=data)
  # Only report success if smartlead accepted the request
  response.raise_for_status()
  response = response.json()
  print("Email sent to lead!")

Now let’s initialize our agent by providing a system prompt, memory and list of tools to it!

In [None]:
# Creating agent
system_message = SystemMessage(
    content="""
    You are the email inbox assistant of Rohan Sawant, who is the founder and CEO of Ionio,
    which provides AI-solutions to technical and non-technical organizations.
    Rohan has sent a cold email to some leads and you are provided with the conversation history between Rohan Sawant and the lead

    Follow these steps while generating email reply:
    Step-1: First categorize the email based on given conversation history and get the category of email.
    Step-2: check the sender of the last message and if the sender is not rohan sawant then goto step-3
    If the sender of last message is rohan sawant then you don't need to construct a reply

    Step-3: Once you get the category, follow these conditions while constructing a reply email:
    1. If category is "Meeting_Ready_Lead" or "Power", ONLY THEN search about company using lead's email and then construct the reply email
    2. For all the other categories, DON'T construct a reply

    Your final response MUST BE in json with these keys:
    reply: Constructed email reply for positive email (leave it blank if no reply constructed or the last sender is rohan sawant)
    category: Category of given email based on email conversation history

    RESPONSE(Don't return anything except the json object):
    """
)
agent_kwargs = {
    "system_message": system_message,
}
memory = ConversationBufferWindowMemory(
    memory_key='memory',
    k=1,
    return_messages=True
)

tools = [
    CategorizeEmailTool(),
    CompanySearchTool(),
    EmailWriterTool()
]

agent = initialize_agent(
    tools,
    llm=ChatOpenAI_LLM,
    agent=AgentType.OPENAI_FUNCTIONS,
    verbose=True,
    agent_kwargs=agent_kwargs,
    memory=memory,
    handle_parsing_errors=True
)
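The decision rule in the system prompt (Step-2 and Step-3) is simple enough to express as plain code, which can serve as a sanity check on the agent's output. This is a sketch only; the `should_reply` helper is hypothetical and not part of the original workflow.

```python
# Categories for which the system prompt asks the agent to construct a reply
REPLY_CATEGORIES = {"Meeting_Ready_Lead", "Power"}

def should_reply(category, last_sender, owner="rohan sawant"):
    """Mirror the system prompt: reply only to positive leads whose message is still unanswered."""
    if last_sender.strip().lower() == owner:
        # The last message is ours, so there is nothing to answer yet
        return False
    return category in REPLY_CATEGORIES

print(should_reply("Power", "John"))
print(should_reply("Power", "rohan sawant"))
print(should_reply("OOO", "John"))
```

If the agent returns a non-empty reply in a case where this function says no reply is expected, that is a signal to skip sending and inspect the conversation manually.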


Now we are ready to use this agent. We will take 2 things from the user before running it:

- **Campaign ID:** Id of the campaign for which you want to run this agent
- **Lead CSV:** The CSV file of leads whose replies you want to categorize or answer (to get this file, follow the same procedure we used while building our email/reply knowledge base)

Once we have the CSV file of leads, we will go through the lead conversations one by one, run the agent for each lead, and reply to them if needed.

In [None]:
# Run the agent for your leads
campaign_id = input("Enter campaign id:")
with open('Campaign_Leads.csv', mode='r') as file:
  # Create a CSV reader object
  csv_reader = csv.reader(file)
  for index,row in enumerate(csv_reader):
    if index == 0:
      continue
    # Get lead ID
    lead_info = requests.get(f"https://server.smartlead.ai/api/v1/leads/?api_key={secret}&email={row[1]}")
    lead_info = lead_info.json()
    # Get conversation history of lead
    message_history = requests.get(f"https://server.smartlead.ai/api/v1/campaigns/{campaign_id}/leads/{lead_info['id']}/message-history?api_key={secret}")
    message_history = message_history.json()
    message_history = message_history["history"]
    conversation_history = []
    # Format every message in conversation history
    for message in message_history:
      plain_text = html2text.html2text(message["email_body"])
      prompt = f"""
        Email Thread:
        ---
        {plain_text}
        ---
        You are given an email thread as plain text and you have to return the latest email from it

        You have to follow these steps to do it:
        Step-1: Get the text which doesn't start with '>' because any text which starts with '>' is an old message in the thread
        (It will be at the start of the text and older messages will be below this latest message)

        The email thread format typically looks like this:
        ---
        Hey john,
        I am interested
        Thanks,
        shivam
        > sentence 1 ....
        > sentence 2 ....
        > ...
        >> sentence from older emails ...
        >> sentence from older emails ....
        >> ....
        ---
        For above example, the email message content will be:
        Hey john,
        I am interested
        Thanks,
        Shivam

        Note: Please note that the above message content is just an example and your response must be based on the given plain text

        Step-2: Once you have the latest email, get the message content from it and remove any extra blank space or unnecessary content from the email
        After these steps, return the email message content in the response

        RESPONSE (Don't return anything except email message):
      """
      email_content = client.chat.completions.create(
          model="gpt-4",
          messages=[
              {"role": "user", "content": prompt}
          ]
      )
      email_content = email_content.choices[0].message.content
      convo = {
          "sender": "rohan sawant" if message["type"] == "SENT" else row[0],
          "message": email_content
      }
      conversation_history.append(convo)
    # Prompt for our agent
    prompt = f"""
      Email conversation history:
      ---
      {conversation_history}
      ---
      Lead Name: {row[0]}
      Lead Email: {row[1]}

      Sender of last message: {conversation_history[len(conversation_history) - 1]["sender"]}
    """
    response = agent({"input": prompt})
    response = json.loads(response["output"])
    # If there is a reply to send, use the email sender tool to send it
    # The ids and original body come from the raw smartlead message, not from the formatted conversation
    last_message = message_history[-1]
    if response['reply'] != "":
       EmailSenderTool(campaign_id=campaign_id,email_stats_id=last_message["stats_id"],email_body=response["reply"],reply_message_id=last_message["message_id"],reply_email_body=last_message["email_body"],email=row[1])
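The loop above calls `json.loads` directly on the agent output, which raises an exception if the model wraps the JSON in a code fence or adds extra text. Here is a minimal sketch of a more tolerant parser; `parse_agent_output` is a hypothetical helper, and returning empty strings on failure is an assumption chosen so that no email is sent when the output cannot be read.

```python
import json

# Three backticks, built this way to keep the example easy to embed in markdown
FENCE = "`" * 3

def parse_agent_output(output):
    """Parse the agent's JSON answer into a dict with 'reply' and 'category' keys."""
    empty = {"reply": "", "category": ""}
    text = output.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line and, if present, the closing fence line
        lines = text.splitlines()[1:]
        if lines and lines[-1].strip().startswith(FENCE):
            lines = lines[:-1]
        text = "\n".join(lines)
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return empty
    if not isinstance(parsed, dict):
        return empty
    return {
        "reply": parsed.get("reply") or "",
        "category": parsed.get("category") or "",
    }

print(parse_agent_output('{"reply": "Hi John", "category": "Power"}'))
print(parse_agent_output("Sorry, I could not categorize this email"))
```

An empty `reply` then falls through the existing `if response['reply'] != ""` check, so an unreadable answer is skipped rather than crashing the run for the remaining leads.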