<a href="https://colab.research.google.com/github/AnushkaKalra/ai-powered-outreach-personalizer/blob/main/AI_Outreach_Personalizer_Tool.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install groq pandas python-dotenv tqdm

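import json
import os

import pandas as pd
from dotenv import load_dotenv
from google.colab import userdata
from groq import Groq
from tqdm import tqdm

# Load variables from a local .env file if one exists (no-op otherwise).
load_dotenv()

# The Groq API key itself is read from Colab's secrets manager.
client = Groq(api_key=userdata.get('GROQ_API_KEY'))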
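
The cell above only runs inside Colab, because `google.colab` is not importable elsewhere. A minimal sketch of a fallback, assuming the key may also be exported as an environment variable; `resolve_api_key` is a hypothetical helper, not part of the notebook:

```python
import os


def resolve_api_key(name="GROQ_API_KEY", env=None):
    """Return the key from Colab secrets if available, else from the environment."""
    env = os.environ if env is None else env
    try:
        # Only importable inside a Colab runtime.
        from google.colab import userdata
    except ImportError:
        userdata = None

    if userdata is not None:
        try:
            value = userdata.get(name)
            if value:
                return value
        except Exception:
            # Secret missing or notebook access not granted; fall through.
            pass

    value = env.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set in Colab secrets or the environment")
    return value
```

With this in place the client could be created as `Groq(api_key=resolve_api_key())`, and `load_dotenv()` becomes the way to supply the key when running locally.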




In [None]:
candidate_df = pd.read_csv(r"/weekday_candidates_sample.csv")

startup_df = pd.read_csv(r"/startup_data_sample.csv")

print("Candidates loaded: ", len(candidate_df))
print("Startups loaded: ", len(startup_df))

Candidates loaded:  30
Startups loaded:  10
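
The later cells index rows by column name, so a renamed or missing CSV column only surfaces as a `KeyError` deep inside the loop. A small sketch of an up-front check; `missing_columns` is a hypothetical helper, and the column lists are taken from the `head()` outputs in this notebook:

```python
CANDIDATE_COLUMNS = [
    "name", "current_role", "company", "skills",
    "profile_links", "linkedin_summary", "experience", "tags",
]
STARTUP_COLUMNS = [
    "startup_name", "role_title", "role_summary", "tech_stack",
    "why_this_role_is_cool", "what_founder_is_looking_for",
]


def missing_columns(columns, required):
    """Return the required column names that are absent, in their given order."""
    present = set(columns)
    return [name for name in required if name not in present]
```

For example, `missing_columns(candidate_df.columns, CANDIDATE_COLUMNS)` should return an empty list before the pipeline proceeds.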


In [None]:
candidate_df.head()

Unnamed: 0,name,current_role,company,skills,profile_links,linkedin_summary,experience,tags
0,Candidate 1,Backend Engineer,Hasura,"Go, Rust, GraphQL, Kubernetes",github.com/candidate1 ; candidate1.dev ; candi...,Engineer passionate about distributed systems ...,1,"backend, systems, infra"
1,Candidate 2,Frontend Engineer,Razorpay,"React, TypeScript, Next.js, UI/UX",github.com/candidate2,,2,"frontend, react, design"
2,Candidate 3,ML Engineer,Swiggy,"Python, PyTorch, NLP, LLMs",,ML engineer working on deep learning and model...,3,"ml, ai, nlp"
3,Candidate 4,Data Scientist,CRED,"SQL, Python, Analytics, Dashboarding",medium.com/@candidate4,Data scientist working on product analytics an...,4,"data, analytics, bi"
4,Candidate 5,Fullstack Developer,Meesho,"Node.js, React, Postgres",,,5,"fullstack, web, product"


In [None]:
startup_df.head()

Unnamed: 0,startup_name,role_title,role_summary,tech_stack,why_this_role_is_cool,what_founder_is_looking_for
0,Finverse,Backend + Infra Engineer,"Build low-latency financial APIs, work on dist...","Go, Rust, Kubernetes, Postgres",You'll own infra that processes 50M+ API calls...,Deep systems thinker with experience in concur...
1,ByteCraft,Frontend Engineer,Build customer-facing dashboards and UI workfl...,"React, TypeScript, Next.js",You'll design UI used by thousands of customer...,Engineer with strong React skills and eye for ...
2,AstraAI,ML Engineer,Train and optimize NLP/LLM models for enterpri...,"Python, PyTorch, HuggingFace, Transformers",You'll help fine-tune proprietary LLM models f...,Someone who understands ML experimentation and...
3,FlowMetrics,Data Scientist,"Build dashboards, run experiments, and drive i...","SQL, Python, dbt, BigQuery",Your work will directly shape product decisions.,Analytical thinker who loves data storytelling.
4,ZenPay,Fullstack Engineer,"Work on checkout, payments infra, and admin da...","Node.js, React, MongoDB, Redis",You'll build features used by 20M+ users.,Fullstack generalist comfortable with fast ite...


In [None]:
selected_role = startup_df.iloc[0]
startup_context = {
    "startup_name": selected_role["startup_name"],
    "role_title": selected_role["role_title"],
    "role_summary": selected_role["role_summary"],
    "tech_stack": selected_role["tech_stack"],
    "why_cool": selected_role["why_this_role_is_cool"],
    "founder_needs": selected_role["what_founder_is_looking_for"]
}

print("Using role:", startup_context["role_title"], "at", startup_context["startup_name"])


Using role: Backend + Infra Engineer at Finverse


In [None]:
def clean_text(value):
    if pd.isna(value):
        return ""
    return str(value).strip()

def normalize_skills(skills):
    # Lowercase, trim each entry, and drop empty entries (e.g. from a trailing comma).
    skills = clean_text(skills).lower()
    return ", ".join(s.strip() for s in skills.split(",") if s.strip())

def infer_tags_from_skills(skills):
    skills = skills.lower()
    tags = []

    backend_keywords = ["go", "rust", "java", "python", "node", "graphql", "postgres"]
    frontend_keywords = ["react", "javascript", "typescript", "next", "ui", "frontend"]
    ml_keywords = ["pytorch", "nlp", "llm", "machine learning", "transformer"]
    data_keywords = ["sql", "analytics", "tableau", "dbt", "bigquery"]
    devops_keywords = ["aws", "docker", "kubernetes", "terraform", "ci/cd"]

    if any(k in skills for k in backend_keywords):
        tags.append("backend")
    if any(k in skills for k in frontend_keywords):
        tags.append("frontend")
    if any(k in skills for k in ml_keywords):
        tags.append("ml")
    if any(k in skills for k in data_keywords):
        tags.append("data")
    if any(k in skills for k in devops_keywords):
        tags.append("devops")

    return ", ".join(tags) if tags else ""

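def infer_seniority(exp):
    # Missing or non-numeric experience has no seniority. float() accepts NaN
    # without raising, so NaN is checked explicitly; otherwise it would fall
    # through both comparisons and be labelled "senior".
    try:
        exp = float(exp)
    except (TypeError, ValueError):
        return "unknown"
    if pd.isna(exp):
        return "unknown"

    if exp < 2:
        return "junior"
    elif exp < 5:
        return "mid"
    else:
        return "senior"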

def parse_candidate_row(row):
    skills_clean = normalize_skills(row["skills"])
    tags_auto = infer_tags_from_skills(skills_clean)

    return {
        "name": clean_text(row["name"]),
        "current_role": clean_text(row["current_role"]),
        "company": clean_text(row["company"]),
        "skills": skills_clean,
        "profile_links": clean_text(row["profile_links"]),
        "linkedin_summary": clean_text(row["linkedin_summary"]),
        "experience": clean_text(row["experience"]),
        "seniority": infer_seniority(row["experience"]),
        # Prefer the tags supplied in the CSV; fall back to inferred tags when empty.
        "tags": clean_text(row["tags"]) or tags_auto
    }
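
`infer_tags_from_skills` uses plain substring tests, so `"go"` also matches inside `"mongodb"` or `"django"`, and `"java"` matches inside `"javascript"`. A hedged sketch of a stricter match on alphanumeric boundaries; `has_keyword` is a hypothetical helper, and the optional trailing `s` (so `"llm"` still matches `"llms"`) is an assumption about what the keyword lists intend:

```python
import re


def has_keyword(skills, keyword):
    """True if keyword occurs in skills and is not embedded in a longer word."""
    pattern = (
        r"(?<![a-z0-9])"          # not preceded by a letter or digit
        + re.escape(keyword.lower())
        + r"s?"                   # tolerate a simple plural
        + r"(?![a-z0-9])"         # not followed by a letter or digit
    )
    return re.search(pattern, skills.lower()) is not None
```

Punctuation still counts as a boundary, so `"node"` matches `"node.js"` and `"ui"` matches `"ui/ux"`. Swapping `k in skills` for `has_keyword(skills, k)` in each `any(...)` call would apply it.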


In [None]:
def build_prompt(c, s):
    prompt = f"""
You are an expert technical recruiter and founder writing deeply personalized outreach messages
for engineering candidates.

CANDIDATE DATA:
Name: {c['name']}
Role: {c['current_role']}
Company: {c['company']}
Skills: {c['skills']}
Profile Links: {c['profile_links']}
LinkedIn Summary: {c['linkedin_summary']}
Experience: {c['experience']}
Seniority: {c['seniority']}
Tags: {c['tags']}

STARTUP ROLE DATA:
Startup: {s['startup_name']}
Role: {s['role_title']}
Role Summary: {s['role_summary']}
Tech Stack: {s['tech_stack']}
Why This Role Is Cool: {s['why_cool']}
What Founder Is Looking For: {s['founder_needs']}

Analyze the data, then respond with only this JSON object:

{{
  "subject_line": "",
  "variant_1_short_crisp": "",
  "variant_2_warm_personal": "",
  "variant_3_technical": "",
  "personalization_reasoning": ""
}}

Rules:
- No clichés.
- No invented info.
- Use EXACT keywords from skills.
- Tone = founder writing personally.
- Include a clear CTA.
"""
    return prompt
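
Several candidates have no profile links or LinkedIn summary, so the prompt above sends lines such as `LinkedIn Summary:` with nothing after them. One possible variation, sketched here under the assumption that omitting empty fields is preferable to sending blank ones; `format_fields` is a hypothetical helper:

```python
def format_fields(record, labels):
    """Render 'Label: value' lines, skipping fields that are missing or blank."""
    lines = []
    for key, label in labels:
        value = str(record.get(key, "")).strip()
        if value:
            lines.append(f"{label}: {value}")
    return "\n".join(lines)


# Labels mirror the CANDIDATE DATA section of the prompt.
CANDIDATE_LABELS = [
    ("name", "Name"),
    ("current_role", "Role"),
    ("company", "Company"),
    ("skills", "Skills"),
    ("profile_links", "Profile Links"),
    ("linkedin_summary", "LinkedIn Summary"),
    ("experience", "Experience"),
    ("seniority", "Seniority"),
    ("tags", "Tags"),
]
```

The result of `format_fields(c, CANDIDATE_LABELS)` could then be interpolated into the f-string in place of the nine hand-written lines.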


In [None]:
# GROQ API Call function

def generate_outreach(prompt):
    response = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[
            {"role": "system", "content": "You MUST output only valid JSON. No commentary, no analysis, no backticks."},
            {"role": "user", "content": prompt}
        ],
        temperature=0.3,
        response_format={"type": "json_object"}
    )

    # The message content is a JSON string; parse it into a dict.
    raw = response.choices[0].message.content

    return json.loads(raw)
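
`json.loads` only confirms that the reply is valid JSON, not that it contains the five fields the prompt asks for. A minimal validation sketch; `parse_outreach` is a hypothetical helper, and the key names are the ones listed in `build_prompt`:

```python
import json

REQUIRED_KEYS = (
    "subject_line",
    "variant_1_short_crisp",
    "variant_2_warm_personal",
    "variant_3_technical",
    "personalization_reasoning",
)


def parse_outreach(raw, required=REQUIRED_KEYS):
    """Parse the model reply and fail loudly if a required field is absent or blank."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model did not return valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    missing = [k for k in required if not str(data.get(k) or "").strip()]
    if missing:
        raise ValueError(f"missing or empty fields: {', '.join(missing)}")
    return data
```

Returning `parse_outreach(raw)` instead of `json.loads(raw)` would turn silent `None` values in the results table into an explicit error at the point of generation.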


In [None]:
# Generate outreach for every candidate

results = []

for _, row in tqdm(candidate_df.iterrows(), total=len(candidate_df)):
    c = parse_candidate_row(row)
    prompt = build_prompt(c, startup_context)
    output = generate_outreach(prompt)

    results.append({
        "candidate": c["name"],
        "subject_line": output.get("subject_line"),
        "variant_1": output.get("variant_1_short_crisp"),
        "variant_2": output.get("variant_2_warm_personal"),
        "variant_3": output.get("variant_3_technical"),
        "reasoning": output.get("personalization_reasoning")
    })

100%|██████████| 30/30 [00:46<00:00,  1.54s/it]
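
The loop above stops at the first failed API call or unparseable reply, discarding progress on the remaining candidates. A sketch of a retry wrapper with exponential backoff; `with_retries` is a hypothetical helper, and the attempt count and delays are arbitrary assumptions rather than values recommended by Groq:

```python
import time


def with_retries(fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn() until it succeeds, waiting base_delay * 2**n between attempts."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:  # broad on purpose: network, API, and JSON errors
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

Inside the loop this could be used as `output = with_retries(lambda: generate_outreach(prompt))`, optionally wrapped in a `try`/`except` that records the failure for that candidate and continues.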


In [None]:
prompt = build_prompt(parse_candidate_row(candidate_df.iloc[0]), startup_context)
raw = generate_outreach(prompt)
raw


{'subject_line': 'Join Finverse: Build Low-Latency Financial APIs with Go, Rust, and Kubernetes',
 'variant_1_short_crisp': "Hi Candidate 1, I'm the founder of Finverse. We're building low-latency financial APIs with Go, Rust, and Kubernetes. I saw your experience with go, rust, graphql, and kubernetes and thought you'd be a great fit. Let's discuss how you can own infra that processes 50M+ API calls/day. Check out finverse.com and let's schedule a call.",
 'variant_2_warm_personal': "Hi Candidate 1, I came across your profile on github.com/candidate1 and was impressed by your passion for distributed systems and backend performance. As someone who's worked with go, rust, graphql, and kubernetes, I think you'd love our work at Finverse. We're looking for a deep systems thinker to help us build core infra. Would you be open to exploring our Backend + Infra Engineer role?",
 'variant_3_technical': "Hi Candidate 1, as a fellow systems enthusiast, I wanted to reach out about our Backend + I

In [None]:
results

[{'candidate': 'Candidate 1',
  'subject_line': 'Build scalable financial APIs with Finverse',
  'variant_1': "Hi Candidate 1, I'm the founder of Finverse. We're building low-latency financial APIs using Go, Rust, and Kubernetes. I saw your experience with these technologies and thought you'd be a great fit. Let's discuss how you can own infra that processes 50M+ API calls/day. Check out finverse.com and let's schedule a call.",
  'variant_2': "Hi Candidate 1, I came across your work on github.com/candidate1 and was impressed with your passion for distributed systems and backend performance. As someone who's also passionate about these areas, I thought you'd love our work at Finverse. We're looking for a deep systems thinker like yourself to help us build scalable financial APIs. Would love to discuss further and explore how your skills in go, rust, graphql, and kubernetes can contribute to our mission.",
  'variant_3': "Hi Candidate 1, as a fellow engineer, I was excited to see your e

In [None]:
df_out = pd.DataFrame(results)
output_path = "/content/generated_outreach_groq.csv"

df_out.to_csv(output_path, index=False)
output_path


'/content/generated_outreach_groq.csv'
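
The prompt asks for no invented info, but nothing in the pipeline checks the output against the inputs. One cheap heuristic is to flag any domain name that appears in a generated message but in none of the candidate or startup fields. This is a rough sketch: `unsupported_domains` is a hypothetical helper and the list of top-level domains is an assumption chosen for illustration.

```python
import re

# Assumed TLD list; extend as needed. "js" is deliberately absent so that
# "node.js" and "next.js" are not treated as domains.
DOMAIN_RE = re.compile(
    r"\b[a-z0-9-]+(?:\.[a-z0-9-]+)*\.(?:com|dev|io|ai|org|net)\b"
)


def unsupported_domains(message, source_text):
    """Return domains mentioned in message that never occur in source_text."""
    source = source_text.lower()
    found = set(DOMAIN_RE.findall(message.lower()))
    return sorted(d for d in found if d not in source)
```

Here `source_text` would be the concatenation of the candidate and startup values passed to `build_prompt`; a non-empty result marks a message for manual review before sending.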

In [None]:
sample = df_out.iloc[0]

print("Candidate:", sample["candidate"])
print("\n=== SUBJECT ===")
print(sample["subject_line"])
print("\n=== SHORT + CRISP ===")
print(sample["variant_1"])
print("\n=== WARM PERSONAL ===")
print(sample["variant_2"])
print("\n=== TECHNICAL ===")
print(sample["variant_3"])
print("\n=== REASONING ===")
print(sample["reasoning"])


Candidate: Candidate 1

=== SUBJECT ===
Build scalable financial APIs with Finverse

=== SHORT + CRISP ===
Hi Candidate 1, I'm the founder of Finverse. We're building low-latency financial APIs using Go, Rust, and Kubernetes. I saw your experience with these technologies and thought you'd be a great fit. Let's discuss how you can own infra that processes 50M+ API calls/day. Check out finverse.com and let's schedule a call.

=== WARM PERSONAL ===
Hi Candidate 1, I came across your work on github.com/candidate1 and was impressed with your passion for distributed systems and backend performance. As someone who's also passionate about these areas, I thought you'd love our work at Finverse. We're looking for a deep systems thinker like yourself to help us build scalable financial APIs. Would love to discuss further and explore how your skills in go, rust, graphql, and kubernetes can contribute to our mission.

=== TECHNICAL ===
Hi Candidate 1, as a fellow engineer, I was excited to see your