***AI_mail_writer***


This file writes emails using the OpenAI API and LangChain and saves each new email in the "Data/AI_dataset/Spam" folder. When you run it, an input prompt asks how many emails you want to generate, so please key in the number. The program passes LangChain-summarised data to the OpenAI GPT-4 model through the API, which is a paid service. The API key is saved in the "Data/APIKEY/key.txt" file. Please feel free to use it as much as you want for your marking process. To generate the outputs, the author fed the AI model multiple imaginary user profiles and course details. That information is saved in text files in the "Data/users" and "Data/Course_list" folders. All of those names and details are imaginary.

In [17]:
# import libraries
import pandas as pd
import random
import time
import os
import nltk
from langchain.document_loaders import DirectoryLoader, TextLoader, PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.prompts import PromptTemplate
from langchain.document_loaders import UnstructuredURLLoader
from langchain.chains.summarize import load_summarize_chain
from langchain.llms import OpenAI
from langchain.document_loaders import SeleniumURLLoader, PlaywrightURLLoader
from langchain.chat_models import ChatOpenAI

In [18]:
# Text splitter: breaks documents into chunks of at most 800 characters, with no overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=0)

In [20]:
# Please set your Data Directory path
dirpath = "Data/"
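
Before running the later cells, it can help to confirm that the files they read are present under `dirpath`. The sketch below is an illustration only: `missing_data_files` and `EXPECTED_FILES` are hypothetical names, not part of the notebook, and the relative paths listed are the ones the cells in this notebook open.

```python
import os

# Files the notebook's cells read, relative to the data directory.
EXPECTED_FILES = [
    "APIKEY/key.txt",
    "users/target_person.csv",
    "Course_list/course_list.csv",
]

def missing_data_files(data_dir, expected=EXPECTED_FILES):
    """Return the expected files that do not exist under data_dir."""
    return [rel for rel in expected
            if not os.path.isfile(os.path.join(data_dir, rel))]

# Example: report anything missing from the default "Data/" folder.
for rel in missing_data_files("Data/"):
    print(f"Missing: Data/{rel}")
```

The per-course and per-user text files are not checked here because their names depend on the rows of the two CSV files.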

#### Import the API key generated from OpenAI


In [21]:
# Read the API key from the key.txt file in the Data/APIKEY folder
with open(f"{dirpath}APIKEY/key.txt", "r") as key_file:
    # strip() drops any trailing newline or spaces left in the file
    OPENAI_API_KEY = key_file.read().strip()
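
Because the key unlocks a paid service, some setups prefer to supply it without editing the key file. A minimal sketch, assuming an environment variable named `OPENAI_API_KEY` may hold the key, with the key file as the fallback; `load_api_key` is a hypothetical helper, not part of this notebook.

```python
import os

def load_api_key(key_path, env_var="OPENAI_API_KEY"):
    """Return the key from the environment if set, else from key_path."""
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    # Fall back to the key file; strip any trailing newline.
    with open(key_path, "r") as key_file:
        key = key_file.read().strip()
    if not key:
        raise ValueError(f"No API key found in {env_var} or {key_path}")
    return key
```

In this notebook it could be called as `load_api_key(f"{dirpath}APIKEY/key.txt")`.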


### Course details

In [None]:
# Pick a random course and load its details
def get_product_details(filepath):
    # course_list.csv has one row per course, with Course and URL columns
    df_courses = pd.read_csv(f"{filepath}course_list.csv")
    rand = random.randint(0, len(df_courses) - 1)
    course = df_courses.iloc[rand].Course
    courseURL = df_courses.iloc[rand].URL

    # The course description is stored in a text file named after the course
    product_loader = TextLoader(f"{filepath}{course}.txt")
    Product_details = product_loader.load()
    Product_details_text = text_splitter.split_documents(Product_details)
    return (course, courseURL, Product_details_text)


In [23]:
# overview of the company which will be used in the email
company_information = """Step into the world of cybersecurity with XYZ-Edu, a veteran-founded academy. Gain hands-on expertise in penetration testing,
 security training, and compliance services. Our certified instructors, equipped with government clearances, will guide you to become a skilled Practical Network Penetration Tester (PNPT).
 Launch your cyber career with confidence at XYZ-Edu!"""

In [24]:
# LangChain map prompt template: applied to each chunk of the course text
map_prompt = """Below are the details about {course}.
Write a concise summary about {course}.
{text}

% CONCISE SUMMARY:"""
map_prompt_template = PromptTemplate(template=map_prompt, input_variables=["text", "course"])
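
To see roughly what the model receives for each chunk, a template of this shape can be filled in by hand. The sketch below uses plain `str.format` as a stand-in for LangChain's rendering, an assumption made so the example runs without LangChain installed. The template is a shortened illustration, the course name comes from the sample run in this notebook, and the chunk text is a placeholder.

```python
# Shortened illustrative template with the same two placeholders.
template = """Write a concise summary about {course}.
{text}

% CONCISE SUMMARY:"""

rendered = template.format(
    course="Practical Malware Analysis & Triage",
    text="(one chunk of the course description goes here)",
)
print(rendered)
```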

In [25]:
# LangChain combine prompt template: applied once to the joined chunk summaries
combine_prompt = """
*   Your goal is to write a personalised, non-spammy outbound email from {sales_rep}, a sales rep at {company}, to {target_person}. Please note that the email should offer a 20% discount on this course for the first 100 students.
*   Start the email with how {course} can help {target_person} to achieve their interests or improve their expertise.
*   Write an excellent personalised email that shows how important {course} is to {target_person}, according to {target_Person_details_text}.
*   Be sure to use value selling and avoid spammy words. Value selling is a sales methodology that focuses on how your product or service will provide value to the customer instead of focusing on price or solution.
*   The email should be written in British English, start with "Hi" and end with "Thank you". It should be written in a human language style, and it has to pass through the AI detection tools.
*   The email should be around 100-200 words long.
*   Important: Please insert this course URL: {courseURL} into the body of the email.


% INFORMATION ABOUT {target_person}:
{target_Person_details_text}

% INFORMATION ABOUT {course}:

{text}

% INCLUDE THE FOLLOWING DETAILS IN YOUR RESPONSE:
Necessary: Please insert this link "https://xyz-edu.com/newsletter/unsubscribe" into the body of the email.

% YOUR RESPONSE:
"""
combine_prompt_template = PromptTemplate(template=combine_prompt, input_variables=["sales_rep", "company", "course", "target_person", "target_Person_details_text", "text", "courseURL"])


In [26]:
# Define the LLM and the chain
llm = ChatOpenAI(model_name="gpt-4", temperature=0.8, openai_api_key=OPENAI_API_KEY)

chain = load_summarize_chain(llm,
                             chain_type="map_reduce",
                             map_prompt=map_prompt_template,
                             combine_prompt=combine_prompt_template,
                             verbose=True
                            )
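
The `map_reduce` chain type first summarises each chunk separately with the map prompt, then passes the joined summaries to the combine prompt in a single call. The sketch below illustrates that control flow only, under stated assumptions: `fake_llm` is a hypothetical stand-in that records its prompts instead of calling a model, `map_reduce` is an illustrative function rather than LangChain's implementation, and the step LangChain may use to shrink overly long summaries is left out.

```python
calls = []

def fake_llm(prompt):
    """Hypothetical stand-in for a model call: records the prompt, returns a tag."""
    calls.append(prompt)
    return f"<output {len(calls)}>"

def map_reduce(chunks, map_template, combine_template, llm, **variables):
    # Map step: one call per chunk, each producing a partial summary.
    summaries = [llm(map_template.format(text=chunk, **variables))
                 for chunk in chunks]
    # Reduce step: one call over all partial summaries joined together.
    combined = "\n\n".join(summaries)
    return llm(combine_template.format(text=combined, **variables))

email = map_reduce(
    chunks=["Chunk one of the course text.", "Chunk two of the course text."],
    map_template="Summarise {course}:\n{text}",
    combine_template="Write an email about {course} using:\n{text}",
    llm=fake_llm,
    course="Practical Malware Analysis & Triage",
)
print(len(calls), "model calls were made")
```

With two chunks the stand-in is called three times, twice for the map step and once for the combine step, which is why the number of course documents affects how many paid API calls each email costs.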


In [27]:
# Pick a random target person and load their profile and details
def get_target_person(path_to_file):
    df_target_person = pd.read_csv(f"{path_to_file}users/target_person.csv")
    random_person = random.randint(0, len(df_target_person) - 1)
    target_person = df_target_person.iloc[random_person].Name
    print(f"Target person: {target_person}")
    target_person_profile = df_target_person.iloc[random_person].Profile
    print(target_person_profile)
    # Each person's details are stored in a text file named after them
    Target_loader = TextLoader(f"{path_to_file}users/{target_person}.txt")
    target_Person_details = Target_loader.load()
    print(target_Person_details)
    target_Person_details_text = text_splitter.split_documents(target_Person_details)
    return target_person, target_Person_details_text



In [28]:
# Generate one email for a random target person and course, then save it
def writemail(dirpath):
    target_person, target_Person_details_text = get_target_person(dirpath)
    print("\n\n")

    # Course details for the target person
    course, courseURL, Product_details_text = get_product_details(f"{dirpath}Course_list/")
    print(f"Course : {course}")
    print(f"Course URL : {courseURL}")
    print(f"You have {len(Product_details_text)} Course document(s) in your data")

    output = chain({"input_documents": Product_details_text,
                    "company": "XYZ-Edu",
                    "company_information": company_information,
                    "sales_rep": "Greg",
                    "course": course,
                    "target_person": target_person,
                    "target_Person_details_text": target_Person_details_text,
                    "courseURL": courseURL
                    })

    print(output['output_text'])
    # Name the output file after the current timestamp (YYYYMMDD-HHMMSS)
    filename = time.strftime("%Y%m%d-%H%M%S")
    with open(f"{dirpath}AI_dataset/Spam/{filename}.txt", "w") as f:
        f.write(output['output_text'])


count = input("How many emails do you want to write? ")
for i in range(int(count)):
    writemail(dirpath)
    print(f"{i+1} email(s) written")
print("All done")


Target person: Sarah Lee
https://www.linkedin.com/in/sarahlee
[Document(page_content='Name: Sarah Lee\nHeadline: Cyber Security Student | Aspiring Cyber Defense Specialist\n\nSummary:\nMotivated Cyber Security student pursuing a career in cyber defense. Adept at identifying and neutralizing cyber threats to safeguard digital assets and privacy.\n\nEducation:\n\nBachelor of Science in Cyber Security | GHI University | Graduation: August 2024\nCertifications:\n\nCompTIA Security+ | CompTIA | Certified in February 2023\nSkills:\n\nCyber Defense\nSecurity Monitoring\nIntrusion Detection\nThreat Hunting\nSecurity Tools: Snort, Zeek\nProgramming: Python, C\nContact:\n\nEmail: sarah.lee@email.com\nLinkedIn: linkedin.com/in/sarahlee', metadata={'source': 'Data/users/Sarah Lee.txt'})]



Course : Practical Malware Analysis & Triage
Course URL : https://academy.xyz-edu.com/p/practical-malware-analysis-triage
You have 2 Course document(s) in your data


> Entering new MapReduceDocumentsChain