<a href="https://colab.research.google.com/github/Jatingpt/GenAI-Cold-Email-Generator-Project/blob/main/Gen_AI_Cold_Email_Generator_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##**Problem Statement: Enhancing Client Acquisition through Targeted Cold Emails in IT Services**

**In today’s competitive IT landscape, software services companies like TechNova Solutions, InfoSphere Technologies, and Atliq Systems are key enablers for global enterprises by providing scalable and skilled technical support. These companies maintain large pools of software engineers and offer their expertise to clients such as StrideWear (a sportswear brand), FinCore Capital (a financial services firm), and many others.**

**To grow their business, these service companies rely heavily on their sales and business development teams to acquire new projects. One of the most effective techniques used is cold emailing—a strategic outreach method to potential clients who may require technical support but haven’t directly engaged with the service provider yet.**

**For example, a Business Development Executive from TechNova Solutions might visit the careers page of StrideWear and notice multiple job openings for roles such as Data Engineers, AI Specialists, or Full Stack Developers. These openings typically indicate an ongoing project or a future initiative that requires specialized skills.**

**Instead of letting StrideWear go through a lengthy recruitment process via job portals like LinkedIn or Naukri, TechNova can pitch a faster and cost-effective solution. Through a cold email, they propose allocating their already-trained and project-ready engineers to support or even fully deliver the required project. They can also include case studies or links to similar past projects to demonstrate credibility.**

**This approach benefits both parties:**

* **TechNova acquires a new project and increases its revenue.**

* **StrideWear saves time and resources by avoiding the traditional hiring process and gains immediate access to expert talent.**

**This strategy helps IT service providers penetrate new markets, strengthen partnerships, and offer clients flexibility without long-term employment commitments.**

In [None]:
# Install Required Packages
!pip install langchain langchain-community langchain-core langchain-groq chromadb pandas

# Import Libraries
from langchain_groq import ChatGroq
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import JsonOutputParser
import pandas as pd
import uuid
import chromadb





##**Initializing the Language Model (LLM)**
In this step, I’m setting up the LLM (large language model) served by Groq, using the llama3-70b-8192 model. The model generates human-like text and is used here for two tasks: extracting job information from scraped text and writing personalized cold emails.

In [None]:
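# Initialize LLM
# Read the Groq API key from the environment rather than hard-coding a secret in the notebook.
import os

llm = ChatGroq(
    temperature=0,  # keep output as repeatable as possible
    groq_api_key=os.environ["GROQ_API_KEY"],  # assumes GROQ_API_KEY is set before this cell runs
    model_name="llama3-70b-8192"
)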


##**Scraping the Job Description from a Webpage**
Here, I'm using WebBaseLoader to load and scrape the content of a job posting URL (in this case, a posting on Nike's careers site). This gives us the full text of the job description.

In [None]:
# Scrape Job Description
loader = WebBaseLoader("https://jobs.nike.com/job/R-33460")  # If this posting has expired, substitute the URL of another live job posting.
page_data = loader.load().pop().page_content
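
Scraped pages usually contain long runs of blank lines and repeated spaces, which add tokens without adding information. The following is a minimal sketch of a cleanup step that could run on `page_data` before it is sent to the model. The helper name `clean_page_text` and the `max_chars` budget are assumptions for illustration; they are not part of the original notebook or of LangChain.

```python
import re


def clean_page_text(raw_text: str, max_chars: int = 6000) -> str:
    """Collapse whitespace in scraped text and cap its length (hypothetical helper)."""
    # Trim every line and drop the ones that end up empty.
    stripped = [line.strip() for line in raw_text.splitlines()]
    # Collapse runs of spaces and tabs inside the remaining lines.
    non_empty = [re.sub(r"[ \t]+", " ", line) for line in stripped if line]
    cleaned = "\n".join(non_empty)
    # max_chars is an assumed budget, not a limit taken from the notebook.
    return cleaned[:max_chars]


sample = "  Software   Engineer \n\n\n\t Python,  SQL \n"
print(clean_page_text(sample))
```

In the notebook this would be used as `page_data = clean_page_text(page_data)` right after the loader call.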


##**Creating a Prompt to Extract Job Information**
In this step, I’m creating a prompt template that will be sent to the LLM. It tells the model to look at the scraped job text and extract key details like the role, experience required, skills, and a short job description — all formatted as clean JSON.

In [None]:

# Prompt Template for Job Info Extraction
prompt_extract = PromptTemplate.from_template(
    """
    ### SCRAPED TEXT FROM WEBSITE:
    {page_data}
    ### INSTRUCTION:
    The scraped text is from the careers page of a website.
    Your job is to extract the job postings and return them in JSON format containing the
    following keys: `role`, `experience`, `skills` and `description`.
    Only return the valid JSON.
    ### VALID JSON (NO PREAMBLE):
    """
)


##**Extracting Job Information Using LLM**
Here, I'm combining the prompt and the LLM into a chain to extract structured job details from the scraped webpage content. This step sends the scraped text to the model and gets back a response containing the role, experience, skills, and description in JSON format.

In [None]:
#  Extracting Job Info
chain_extract = prompt_extract | llm
res = chain_extract.invoke(input={'page_data': page_data})

##**Parsing the Extracted Job Info**
In this step, I'm using a JSON parser to convert the model’s response into a Python dictionary.

In [None]:
#  Parse Extracted Info
json_parser = JsonOutputParser()
job = json_parser.parse(res.content)
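
The parsed result can be a single dictionary or a list of dictionaries, depending on how many postings the model finds, and a key may be missing if the page does not mention it. Below is a minimal sketch of a normalization step, assuming the four keys requested in the prompt. The helper `normalize_jobs` is hypothetical and not part of the original notebook.

```python
from typing import Any

# The keys requested in the extraction prompt.
EXPECTED_KEYS = ("role", "experience", "skills", "description")


def normalize_jobs(parsed: Any) -> list[dict]:
    """Return a list of job dicts, each containing every expected key."""
    # A single posting may arrive as a dict, several postings as a list.
    candidates = parsed if isinstance(parsed, list) else [parsed]
    jobs = []
    for item in candidates:
        if not isinstance(item, dict):
            continue  # skip anything that is not a JSON object
        # Missing keys are filled with None so later code can rely on them.
        jobs.append({key: item.get(key) for key in EXPECTED_KEYS})
    return jobs


print(normalize_jobs({"role": "Data Engineer", "skills": ["Python"]}))
```

With this in place, `jobs = normalize_jobs(job)` always yields a list, so later cells can loop over postings instead of checking the type each time.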

##**Loading My Portfolio and Setting Up ChromaDB**
First, I’m loading my portfolio CSV file into a DataFrame — this file contains my projects, tech stack, and related links.

Then, I’m creating a vector database using ChromaDB. This will help match job requirements with my most relevant projects by storing and retrieving similar content based on text similarity.

In [None]:
#  Loading the Portfolio CSV
df = pd.read_csv("/content/my_portfolio.csv")  # Must have columns: Techstack, Links

#  Create Vector DB Using ChromaDB
client = chromadb.PersistentClient(path='vectorstore')
collection = client.get_or_create_collection(name="portfolio")

##**Storing Portfolio Projects in ChromaDB**
Here, I’m connecting to a persistent ChromaDB database and creating a collection named "portfolio".

Then, I’m adding my portfolio projects to the database, where each project’s tech stack becomes a searchable document, and its link is stored as metadata.
This is done only once, so the database doesn't get duplicate entries every time the code runs.

In [None]:
#  Creating Vector DB Using ChromaDB
client = chromadb.PersistentClient(path='vectorstore')
collection = client.get_or_create_collection(name="portfolio")

#  Adding to ChromaDB
if collection.count() == 0:
    for _, row in df.iterrows():
        collection.add(
            documents=[row["Techstack"]],
            metadatas=[{"links": row["Links"]}],  # one metadata dict per document
            ids=[str(uuid.uuid4())]
        )
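
The `collection.count() == 0` guard avoids duplicates only while the collection is empty, so rows added to the CSV later would never be stored. One possible alternative, sketched below, is to derive each ID from the row content so the same row always maps to the same ID, and then write with Chroma's `upsert` instead of `add`. The helper `build_records` and the example rows are hypothetical. The sketch also assumes no two CSV rows are identical, since identical rows would produce the same ID within one batch.

```python
import uuid


def build_records(rows):
    """Turn (techstack, link) pairs into keyword arguments for a Chroma collection."""
    documents, metadatas, ids = [], [], []
    for techstack, link in rows:
        documents.append(techstack)
        metadatas.append({"links": link})
        # uuid5 is deterministic: the same techstack and link always give the same ID.
        ids.append(str(uuid.uuid5(uuid.NAMESPACE_URL, f"{techstack}|{link}")))
    return {"documents": documents, "metadatas": metadatas, "ids": ids}


# Hypothetical portfolio rows, for illustration only.
records = build_records([
    ("React, Node.js, MongoDB", "https://example.com/react-portfolio"),
    ("Python, Machine Learning", "https://example.com/ml-portfolio"),
])
print(records["ids"])
```

With the notebook's objects, the call would look like `collection.upsert(**build_records(zip(df["Techstack"], df["Links"])))`, which can be re-run safely after the CSV changes.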


##**Querying Top Matching Portfolio Projects**
In this section, I'm extracting and sanitizing the skills from the job description to search for the most relevant projects in my portfolio.

If no skills are found, I set default keywords like AI, Python, and Machine Learning as a fallback.

If skills are present, I ensure they’re in the proper format and non-empty.

The final check ensures that the list of skills isn’t empty before proceeding with the search.

In [None]:
#  Query Top Matching Portfolios
#  Safely select first job if job is a list
if isinstance(job, list):
    job = job[0]  # pick the first job posting

# Safely extract and sanitize skills
FALLBACK_SKILLS = ["AI", "Python", "Machine Learning"]  # Fallback keywords
raw_skills = job.get('skills', [])
if not raw_skills:
    print("No skills found in job description.")
    skills_query = FALLBACK_SKILLS
elif isinstance(raw_skills, list):
    # Keep only non-empty scalar entries, converted to strings
    skills_query = [str(skill) for skill in raw_skills if isinstance(skill, (str, int, float)) and str(skill).strip()]
else:
    # A single non-list value (e.g. one string) becomes a one-item query
    skills_query = [str(raw_skills)] if str(raw_skills).strip() else FALLBACK_SKILLS

# Final check
if not skills_query:
    raise ValueError("No valid skills extracted. Cannot proceed with empty query.")


 No skills found in job description.
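
The same sanitizing logic can be wrapped in a function so it is easy to test against different shapes of model output. This is a minimal sketch with two deliberate differences from the cell above, both assumptions rather than the notebook's behavior: a comma-separated string is split into separate skills, and an empty result falls back to the default keywords instead of raising. The name `sanitize_skills` is hypothetical.

```python
DEFAULT_SKILLS = ["AI", "Python", "Machine Learning"]


def sanitize_skills(job: dict) -> list[str]:
    """Return a non-empty list of skill strings for the vector query."""
    raw = job.get("skills")
    if isinstance(raw, str):
        # Assumption: a plain string holds comma-separated skills.
        raw = raw.split(",")
    if not isinstance(raw, (list, tuple)):
        return list(DEFAULT_SKILLS)
    # Keep scalar entries only, trimmed, and drop the blank ones.
    skills = [
        str(item).strip()
        for item in raw
        if isinstance(item, (str, int, float)) and str(item).strip()
    ]
    return skills or list(DEFAULT_SKILLS)


print(sanitize_skills({"skills": "SQL, Spark"}))
print(sanitize_skills({}))
```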


##**Extracting Relevant Portfolio Links**
In this part, I’m running a query on the ChromaDB vector database using the skills we extracted earlier. Each skill is sent as a separate query text, and the query returns the top 2 matching portfolio projects for each one based on text similarity.

After getting the results, I flatten the metadata list to extract the portfolio project links safely. If the link exists in the metadata, I store it in matched_links for future use in the cold email.

In [None]:
#  Query ChromaDB for the top 2 matches per skill
results = collection.query(query_texts=skills_query, n_results=2)

#  Flattening the metadata list to extract links safely
metadatas = results.get('metadatas', [])
matched_links = [meta['links'] for sublist in metadatas for meta in sublist if isinstance(meta, dict) and 'links' in meta]
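
Because the query returns one list of metadata per query text, the same project link can appear several times when more than one skill matches it. The sketch below flattens the nested lists and removes duplicates while keeping the original order. The `fake_results` dictionary is a hand-written stand-in for a real query result, and `extract_links` is a hypothetical helper.

```python
def extract_links(results: dict) -> list[str]:
    """Flatten Chroma-style query metadata into a de-duplicated list of links."""
    seen = set()
    links = []
    # 'metadatas' holds one inner list per query text.
    for per_query in results.get("metadatas") or []:
        for meta in per_query or []:
            link = meta.get("links") if isinstance(meta, dict) else None
            if link and link not in seen:
                seen.add(link)
                links.append(link)
    return links


# Hand-written stand-in for the result of collection.query(...).
fake_results = {
    "metadatas": [
        [{"links": "https://example.com/ml"}, {"links": "https://example.com/python"}],
        [{"links": "https://example.com/python"}, None],
    ]
}
print(extract_links(fake_results))
# → ['https://example.com/ml', 'https://example.com/python']
```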


##**Creating a Cold Email Prompt**
In this step, I'm setting up a prompt template that will instruct the LLM to write a cold email. The email will introduce AtliQ, highlight our expertise in AI & software consulting, and offer solutions tailored to the job description from Nike.

The email will also include relevant portfolio links that showcase AtliQ’s capabilities in a similar context. This email will be crafted without any introductory text, keeping it direct and professional.

In [None]:
#  Creating a Prompt for Cold Email
prompt_email = PromptTemplate.from_template(
    """
    ### JOB DESCRIPTION:
    {job_description}

    ### INSTRUCTION:
    You are Jatin, a business development executive at AtliQ. AtliQ is an AI & Software Consulting company dedicated to facilitating
    the seamless integration of business processes through automated tools.
    Over our experience, we have empowered numerous enterprises with tailored solutions, fostering scalability,
    process optimization, cost reduction, and heightened overall efficiency.
    Your job is to write a cold email to the client regarding the job mentioned above describing the capability of AtliQ
    in fulfilling their needs.
    Also add the most relevant ones from the following links to showcase AtliQ's portfolio: {link_list}
    Remember you are Jatin, BDE at AtliQ.
    Do not provide a preamble.
    ### EMAIL (NO PREAMBLE):
    """
)

##**Generating the Cold Email**
In this part, I'm feeding the prompt template created earlier into the LLM to generate the cold email. The LLM uses the extracted job description and the most relevant portfolio links to write a personalized email from the perspective of Jatin, the Business Development Executive at AtliQ.

The result is a well-crafted email ready to be sent to the client.

In [None]:

#  Generating the Email
chain_email = prompt_email | llm
email_response = chain_email.invoke({
    "job_description": str(job),
    "link_list": matched_links
})
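
When a Python list is passed straight into the template, it is rendered in its bracketed form, for example `['https://...', 'https://...']`. A small formatting step, sketched below, turns the links into a plain bulleted block that may be easier for the model to copy into the email. The helper `format_link_list` and its placeholder text for the empty case are assumptions, not part of the notebook.

```python
def format_link_list(links: list[str]) -> str:
    """Render portfolio links as one bullet per line for use in the prompt."""
    if not links:
        # Placeholder wording is an assumption; adjust to taste.
        return "(no portfolio links available)"
    return "\n".join(f"- {link}" for link in links)


print(format_link_list(["https://example.com/ml", "https://example.com/python"]))
```

It would be used as `"link_list": format_link_list(matched_links)` in the `invoke` call.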


##**Displaying the Final Cold Email**
In this step, I'm printing the final cold email generated by the LLM. This allows me to see the content of the email before sending it. The email is printed with a simple header ("------ FINAL COLD EMAIL ------") to clearly separate it from other outputs.

In [None]:
#  The Final Email after completion of all the tasks
print("------ FINAL COLD EMAIL ------\n")
print(email_response.content)

------ FINAL COLD EMAIL ------

Subject: Revolutionize Your Retail Experience with AtliQ's AI-Powered Solutions

Dear Hiring Manager,

I came across the job posting for a Nike Athlete (Sales Associate) and was impressed by the emphasis on providing exceptional customer experiences. As a Business Development Executive at AtliQ, I believe our AI-driven solutions can help elevate your retail operations and enhance customer engagement.

At AtliQ, we specialize in developing tailored solutions that streamline business processes, reduce costs, and boost overall efficiency. Our expertise in AI, machine learning, and automation can help you:

* Optimize inventory management and supply chain logistics
* Implement personalized customer recommendations and loyalty programs
* Enhance in-store experiences with interactive kiosks and AR/VR technologies
* Analyze customer behavior and preferences to inform data-driven decisions

Our portfolio showcases our capabilities in developing innovative soluti

#**Conclusion**

#**Project: Generative AI-Based Cold Email Generator for IT Service Companies**
##**Objective:**
To develop a **Generative AI-powered cold email generator** that enables IT services companies to automate and personalize outreach to potential clients by analyzing job postings and identifying project needs in real-time.

##**Project Overview:**
This project aims to build a smart cold-email generator using cutting-edge GenAI technologies, including **LLMs (via Groq API), ChromaDB** (Vector Database), and web scraping tools. The system is designed to simulate the role of a Business Development Executive in IT services companies by automating the process of lead generation and personalized cold outreach.

##**How It Works:**
1. **Web Scraping Career Pages:**
The system navigates to the careers page of a target company (e.g., a product-based firm such as StrideWear or FinCore) and scrapes its job listings.

2. **Keyword & Intent Extraction using LLMs:**
Using LLMs (hosted on Groq API), it extracts critical information from job postings, such as:

* Role Titles

* Required Skills and Experience

* Project Descriptions or Domain

* Urgency or Hiring Trends

3. **Contextual Understanding and Email Generation:**
Based on extracted information, the LLM formulates a **personalized cold email**, offering the company a plug-and-play solution:

Instead of hiring an individual, they can **contract a pre-vetted team** or expert from the IT services company.

The email also highlights **relevant past project links or case studies** from the service provider’s portfolio.

4. **Vector Search for Relevancy (ChromaDB):**
ChromaDB helps in searching through internal project records to find the most contextually relevant case studies, which are then embedded into the email to build credibility.
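
The four stages above can be tied together in a single function. The sketch below is one way to do it, with each stage passed in as a callable so the flow can be exercised with stubs, without network access or API keys. It follows the order used in the notebook cells, where the portfolio search runs before the email is written. All names here are hypothetical, and the lambdas stand in for the real scraper, LLM chains, and ChromaDB query.

```python
from typing import Callable


def run_pipeline(
    url: str,
    scrape: Callable[[str], str],
    extract_job: Callable[[str], dict],
    find_links: Callable[[list], list],
    write_email: Callable[[dict, list], str],
) -> str:
    """Scrape a posting, extract the job, match portfolio links, write the email."""
    page_text = scrape(url)
    job = extract_job(page_text)
    links = find_links(job.get("skills") or [])
    return write_email(job, links)


# Stub stages, for illustration only.
email = run_pipeline(
    "https://example.com/careers/123",
    scrape=lambda url: "Data Engineer, Python, SQL",
    extract_job=lambda text: {"role": "Data Engineer", "skills": ["Python", "SQL"]},
    find_links=lambda skills: [f"https://example.com/{s.lower()}" for s in skills],
    write_email=lambda job, links: f"Re: {job['role']} ({len(links)} links)",
)
print(email)
# → Re: Data Engineer (2 links)
```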

##**Business Impact:**
**For Service Companies:**
Saves time for sales teams, scales outreach, and increases the chances of acquiring high-value projects.

**For Product Companies:**
Reduces hiring efforts and onboarding time by getting access to skilled professionals on-demand.

##**Tech Stack:**
* **Groq API** -  for high-speed LLM inference and prompt execution

* **ChromaDB** - for semantic search and project portfolio retrieval

* **LangChain** - for chaining logic and query orchestration

* **Python + LangChain `WebBaseLoader`** - for web scraping

* **FastAPI/Flask** - for serving as an API endpoint