# Cold Email Generator with Llama 3.3 70B, LangChain, and ChromaDB

In [2]:
# ChatGroq gives us fast API inference for Llama models hosted on Groq
from langchain_groq import ChatGroq



## Creating an LLM object with the API key generated through Groq
We are using the Llama 3.3 70B model.

In [None]:
import os
import dotenv
# Load environment variables from .env file
dotenv.load_dotenv()
# Read the Groq API key from the environment
KEY = os.getenv("GROQ_API_KEY")


llm = ChatGroq(
    temperature=0,
    api_key=KEY,
    model="llama-3.3-70b-versatile",
)
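
If `GROQ_API_KEY` is missing from the `.env` file, `os.getenv` returns `None`, and the resulting error may be harder to trace. A minimal sketch of an early check follows; `require_env` is a hypothetical helper name, not part of any library used here.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment"
        )
    return value


# Usage in this notebook (assumes dotenv.load_dotenv() has already run):
# KEY = require_env("GROQ_API_KEY")
```

Failing at this point keeps the error message next to its cause instead of inside the first model call.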

### Testing the LLM by invoking it with a sample prompt

In [6]:
response = llm.invoke("What is your name?")
print(response.content)

I'm an artificial intelligence model known as Llama. Llama stands for "Large Language Model Meta AI."


### Installing ChromaDB, a lightweight and reliable vector database

In [None]:
!pip install chromadb

In [8]:
import chromadb

### Creating a vector DB client and a new collection in it

In [27]:
client = chromadb.Client()
collection = client.create_collection(name="new_collection")

### Adding sample documents into our collection in the Vector DB
Note that ChromaDB internally uses the all-MiniLM-L6-v2 model by default to compute vector embeddings for our documents.

In [28]:
collection.add(
    documents=[
        "This is a sample document about ML",
        "This is a sample document about Love"
    ],
    ids=['id1', 'id2'],
    metadatas=[
        {"url": "https://en.wikipedia.org/wiki/Machine_learning"},
        {"url": "https://en.wikipedia.org/wiki/Love"}
    ],
)

### We can see how the Vector DB has stored our documents in the collection

In [29]:
all_docs = collection.get()
all_docs

{'ids': ['id1', 'id2'],
 'embeddings': None,
 'documents': ['This is a sample document about ML',
  'This is a sample document about Love'],
 'uris': None,
 'included': ['metadatas', 'documents'],
 'data': None,
 'metadatas': [{'url': 'https://en.wikipedia.org/wiki/Machine_learning'},
  {'url': 'https://en.wikipedia.org/wiki/Love'}]}

### Since it's a vector DB, we can query it with sample text to get the n closest documents and their distances

In [30]:
results = collection.query(
    query_texts=['I am in a relationship'],
    n_results=2
)
results

{'ids': [['id2', 'id1']],
 'embeddings': None,
 'documents': [['This is a sample document about Love',
   'This is a sample document about ML']],
 'uris': None,
 'included': ['metadatas', 'documents', 'distances'],
 'data': None,
 'metadatas': [[{'url': 'https://en.wikipedia.org/wiki/Love'},
   {'url': 'https://en.wikipedia.org/wiki/Machine_learning'}]],
 'distances': [[1.3829848766326904, 1.902585744857788]]}
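
The `distances` values are easier to read with the metric in mind. Assuming the collection uses squared L2 distance (Chroma's default, as far as we know), a smaller number means a closer match. A small sketch with made-up 3-dimensional vectors; real embeddings from all-MiniLM-L6-v2 have 384 dimensions.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Toy vectors, purely illustrative (not real embeddings)
query = [1.0, 0.0, 0.0]
doc_close = [0.8, 0.6, 0.0]
doc_far = [0.0, 0.0, 1.0]

print(squared_l2(query, doc_close))  # smaller distance: closer match
print(squared_l2(query, doc_far))    # larger distance: weaker match
```

This is why the "Love" document, with the smaller distance, is listed first for the relationship query.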

In [31]:
results = collection.query(
    query_texts=['I also like data science'],
    n_results=2
)
results

{'ids': [['id1', 'id2']],
 'embeddings': None,
 'documents': [['This is a sample document about ML',
   'This is a sample document about Love']],
 'uris': None,
 'included': ['metadatas', 'documents', 'distances'],
 'data': None,
 'metadatas': [[{'url': 'https://en.wikipedia.org/wiki/Machine_learning'},
   {'url': 'https://en.wikipedia.org/wiki/Love'}]],
 'distances': [[1.6828935146331787, 1.9099119901657104]]}
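
Each field in the query result is a list of lists, with one inner list per query text. A short sketch of flattening the first query's hits into rows; `results` is a hard-coded, trimmed copy of the output above so the snippet runs on its own, and `top_hits` is a hypothetical helper name.

```python
def top_hits(results, query_index=0):
    """Pair up ids, documents, and distances for one query text."""
    return list(zip(
        results["ids"][query_index],
        results["documents"][query_index],
        results["distances"][query_index],
    ))


# Trimmed copy of the query output shown above
results = {
    "ids": [["id1", "id2"]],
    "documents": [["This is a sample document about ML",
                   "This is a sample document about Love"]],
    "distances": [[1.6828935146331787, 1.9099119901657104]],
}

for doc_id, text, distance in top_hits(results):
    print(f"{doc_id}  {distance:.3f}  {text}")
```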

## Installing LangChain

In [None]:
!pip install langchain_community

### Using WebBaseLoader from LangChain to scrape the web page of a given job opening

In [35]:
from langchain_community.document_loaders import WebBaseLoader

USER_AGENT environment variable not set, consider setting it to identify your requests.


In [68]:
# loader = WebBaseLoader('https://www.google.com/about/careers/applications/jobs/results/126881918376387270-cloud-engineer-ai?q=%22Data%20Scientist%22&location=Bengaluru%2C%20India&target_level=EARLY')
loader = WebBaseLoader("https://www.amazon.jobs/en/jobs/2960212/applied-scientist-alexa-sensitive-content-intelligence-asci?cmpid=SPLICX0248M&utm_source=linkedin.com&utm_campaign=cxro&utm_medium=social_media&utm_content=job_posting&ss=paid")

In [69]:
page_data = loader.load().pop().page_content

In [70]:
print(page_data)

Applied Scientist, Alexa Sensitive Content Intelligence (ASCI) - Job ID: 2960212 | Amazon.jobs
Skip to main contentHomeTeamsLocationsJob categoriesMy careerMy applicationsMy profileAccount securitySettingsSign outResourcesDisability accommodationsBenefitsInclusive experiencesInterview tipsLeadership principlesWorking at AmazonFAQ×Applied Scientist, Alexa Sensitive Content Intelligence (ASCI)Job ID: 2960212 | ADCI - BLR 14 SEZApply nowDESCRIPTIONAlexa is the voice activated digital assistant powering devices like Amazon Echo, Echo Dot, Echo Show, and Fire TV, which are at the forefront of this latest technology wave. To preserve our customers’ experience and trust, the Alexa Privacy team creates policies and builds services and tools through Machine Learning techniques to detect and mitigate sensitive content across Alexa. We are looking for an experienced Senior Applied Scientist to build industry-leading technologies in attribute extraction and sensitive content detection across all l
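
The raw page text still contains navigation labels and irregular whitespace. A minimal cleaning sketch using only the standard library; `clean_text` is a hypothetical helper, and the patterns are assumptions that may need tuning for other sites.

```python
import re


def clean_text(text: str) -> str:
    """Normalize scraped text before sending it to the LLM."""
    text = re.sub(r"<[^>]*>", " ", text)       # drop any leftover HTML tags
    text = re.sub(r"https?://\S+", " ", text)  # drop bare URLs
    text = re.sub(r"\s+", " ", text)           # collapse runs of whitespace
    return text.strip()


sample = "Apply now\n\n  <b>DESCRIPTION</b>\tAlexa is the voice   activated assistant"
print(clean_text(sample))
```

Shorter, cleaner input also leaves more of the model's context window for the job description itself.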

### Creating a prompt template using LangChain prompts

In [77]:
from langchain_core.prompts import PromptTemplate

prompt_extract = PromptTemplate.from_template(
        """
            ### SCRAPED TEXT FROM WEBSITE:
            {data}
            ### INSTRUCTION:
            The scraped text is from the careers page of a website.
            Your task is to extract the job postings and return them in JSON format containing the 
            following keys: role, experience, skills and description.
            ### VALID JSON, NO PREAMBLE:
        """
)

#### Creating a pipeline that builds the prompt from the scraped data and passes it to our LLM to get the response

In [78]:
chain_extract = prompt_extract | llm
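
For intuition about what the `|` operator does here, the toy sketch below composes two steps left to right. It is not LangChain's implementation; `Step`, `fill_prompt`, and `fake_llm` are hypothetical stand-ins that run without any API call.

```python
class Step:
    """Toy stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Compose left to right: the output of self feeds into other
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value):
        return self.func(value)


# Hypothetical stand-ins for the prompt template and the model
fill_prompt = Step(lambda inputs: f"### SCRAPED TEXT:\n{inputs['data']}\n### VALID JSON:")
fake_llm = Step(lambda prompt: f"(model reply to {len(prompt)} chars of prompt)")

toy_chain = fill_prompt | fake_llm
print(toy_chain.invoke({"data": "Applied Scientist, Alexa ..."}))
```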

In [79]:
# Invoke the pipeline built in the previous step
response = chain_extract.invoke(input={'data': page_data})
print(response.content)

```json
{
  "role": "Applied Scientist, Alexa Sensitive Content Intelligence (ASCI)",
  "experience": "3+ years of building models for business application experience",
  "skills": [
    "NLP models (e.g. LSTM, transformer based models)",
    "CV models (e.g. CNN, AlexNet, ResNet)",
    "Java, C++, Python or related language",
    "algorithms and data structures, parsing, numerical optimization, data mining, parallel and distributed computing, high-performance computing"
  ],
  "description": "We are looking for an experienced Senior Applied Scientist to build industry-leading technologies in attribute extraction and sensitive content detection across all languages and countries."
}
```


### Converting the output from a JSON string into a Python dict

In [80]:
from langchain_core.output_parsers import JsonOutputParser

json_parser = JsonOutputParser()
json_res = json_parser.parse(response.content)
json_res

{'role': 'Applied Scientist, Alexa Sensitive Content Intelligence (ASCI)',
 'experience': '3+ years of building models for business application experience',
 'skills': ['NLP models (e.g. LSTM, transformer based models)',
  'CV models (e.g. CNN, AlexNet, ResNet)',
  'Java, C++, Python or related language',
  'algorithms and data structures, parsing, numerical optimization, data mining, parallel and distributed computing, high-performance computing'],
 'description': 'We are looking for an experienced Senior Applied Scientist to build industry-leading technologies in attribute extraction and sensitive content detection across all languages and countries.'}

In [81]:
type(json_res)

dict
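
The model wrapped its answer in a Markdown code fence, which `JsonOutputParser` handled for us here. For intuition, a rough standard-library sketch of the same idea; `parse_fenced_json` is a hypothetical helper that covers only the simple case of one optional fence.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable


def parse_fenced_json(text: str):
    """Strip an optional Markdown code fence, then parse the rest as JSON."""
    pattern = re.escape(FENCE) + r"(?:json)?\s*(.*?)\s*" + re.escape(FENCE)
    match = re.search(pattern, text, flags=re.DOTALL)
    payload = match.group(1) if match else text
    return json.loads(payload)


raw = FENCE + 'json\n{"role": "Applied Scientist", "skills": ["Python"]}\n' + FENCE
job = parse_fenced_json(raw)
print(type(job).__name__, job["role"])
```

If the model returns something that is not valid JSON, `json.loads` raises `json.JSONDecodeError`, so a real pipeline would want to catch that case.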

In [None]:
!pip install pandas numpy matplotlib

In [84]:
import pandas as pd

In [85]:
df = pd.read_csv('my_portfolio.csv')
df.head()

Unnamed: 0,Techstack,Links
0,"React, Node.js, MongoDB",https://example.com/react-portfolio
1,"Angular,.NET, SQL Server",https://example.com/angular-portfolio
2,"Vue.js, Ruby on Rails, PostgreSQL",https://example.com/vue-portfolio
3,"Python, Django, MySQL",https://example.com/python-portfolio
4,"Java, Spring Boot, Oracle",https://example.com/java-portfolio


In [88]:
import uuid
import chromadb

In [89]:
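# Persist the collection on disk in the "Vector_Store" directory
client = chromadb.PersistentClient("Vector_Store")
# get_or_create_collection avoids an error when this cell is re-run
portfolio = client.get_or_create_collection(name="portfolios")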

In [90]:
# Populate the collection only if it is still empty
if not portfolio.count():
    for idx, row in df.iterrows():
        portfolio.add(
            documents=row["Techstack"],
            metadatas={"url": row["Links"]},
            ids=[str(uuid.uuid4())]
        )
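
The loop above issues one `add` call per row. Chroma's `add` also accepts lists, so the rows could be collected first and inserted in a single call. A sketch of building those lists; `build_batch` is a hypothetical helper, and the rows are hard-coded stand-ins for `df`.

```python
import uuid


def build_batch(rows):
    """Turn portfolio rows into keyword arguments for a single collection.add call."""
    return {
        "documents": [row["Techstack"] for row in rows],
        "metadatas": [{"url": row["Links"]} for row in rows],
        "ids": [str(uuid.uuid4()) for _ in rows],
    }


# Stand-in for df.to_dict("records")
rows = [
    {"Techstack": "React, Node.js, MongoDB", "Links": "https://example.com/react-portfolio"},
    {"Techstack": "Python, Django, MySQL", "Links": "https://example.com/python-portfolio"},
]

batch = build_batch(rows)
print(len(batch["ids"]), batch["documents"][0])
# In the notebook this would be followed by: portfolio.add(**batch)
```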

In [92]:
portfolio.get()

{'ids': ['a923b080-3151-4211-b6f6-1b85c1c39361',
  '928183b7-7a16-4829-9fcf-bbb26af4cc4c',
  '197677f5-7a3d-4a84-ac4b-cb597e07c49c',
  '632741e0-aaf1-4ea9-83fd-36bed0675aed',
  'fbf50f9f-d981-4cbb-8fc7-679080347d9a',
  '48889f17-f2c7-4bc2-8402-43c1189a1b99',
  '7bc07b0d-8603-43cb-9842-133884bb405b',
  'e8f94b0c-ac0d-4eaa-aedb-71ec7dabe897',
  '6e8cc08f-9c06-41f1-892a-a9886c8ea7f2',
  '37c62e67-1209-4221-86f0-2cc85746cb2f',
  'd350ccdb-17c1-44c6-8dd0-9b9217ef6511',
  'c7abc42b-1968-4698-b1d5-31df8db241a2',
  '1a105a61-9c86-4431-929d-4953b8744bf2',
  '5d9cc418-7b66-40a8-bb6e-69165397cea0',
  '9c7f5944-678b-4a0d-8c19-7dbbdbf64df2',
  '19ad6a18-a24f-40e8-aaae-1a8a79d3bc21',
  'cb64d319-e1d3-465b-995a-4f1365088725',
  '71e097ad-11fd-4ef9-9fd4-0f75b79e2ca4',
  '54c92071-c13e-4f65-aea0-ea6f80c167eb',
  '16bdb87c-6603-492a-a993-55f46d912c5f'],
 'embeddings': None,
 'documents': ['React, Node.js, MongoDB',
  'Angular,.NET, SQL Server',
  'Vue.js, Ruby on Rails, PostgreSQL',
  'Python, Django, M

In [102]:
query_output = portfolio.query(query_texts=response.content, n_results=2)
query_output

{'ids': [['54c92071-c13e-4f65-aea0-ea6f80c167eb',
   '197677f5-7a3d-4a84-ac4b-cb597e07c49c']],
 'embeddings': None,
 'documents': [['Machine Learning, Python, TensorFlow',
   'Vue.js, Ruby on Rails, PostgreSQL']],
 'uris': None,
 'included': ['metadatas', 'documents', 'distances'],
 'data': None,
 'metadatas': [[{'url': 'https://example.com/ml-python-portfolio'},
   {'url': 'https://example.com/vue-portfolio'}]],
 'distances': [[1.361631989479065, 1.7005481719970703]]}

In [104]:
links = query_output['metadatas']
links

[[{'url': 'https://example.com/ml-python-portfolio'},
  {'url': 'https://example.com/vue-portfolio'}]]
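
`links` is still nested, with one inner list per query text. If we prefer to hand the prompt a flat list of URLs, a small sketch follows; `flatten_urls` is a hypothetical helper, run here on a copy of the output above.

```python
def flatten_urls(metadatas):
    """Collect the 'url' values from Chroma's nested metadatas list."""
    return [
        meta["url"]
        for per_query in metadatas
        for meta in per_query
        if "url" in meta
    ]


links = [[{"url": "https://example.com/ml-python-portfolio"},
          {"url": "https://example.com/vue-portfolio"}]]

print(flatten_urls(links))
```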

In [105]:
prompt_email = PromptTemplate.from_template(
            """
                ### JOB DESCRIPTION:
                {job_description}
                
                ### INSTRUCTION:
                You are Mohan, a business development executive at AtliQ. AtliQ is an AI & Software Consulting company dedicated to facilitating
                the seamless integration of business processes through automated tools. 
                Over our experience, we have empowered numerous enterprises with tailored solutions, fostering scalability, 
                process optimization, cost reduction, and heightened overall efficiency. 
                Your job is to write a cold email to the client regarding the job mentioned above describing the capability of AtliQ 
                in fulfilling their needs.
                Also add the most relevant ones from the following links to showcase AtliQ's portfolio: {link_list}
                Remember you are Mohan, BDE at AtliQ. 
                Do not provide a preamble.
                ### EMAIL (NO PREAMBLE):
            """
)

chain = prompt_email | llm

In [110]:
res = chain.invoke(input={'job_description': response.content, 'link_list': links})

In [111]:
print(res.content)

Subject: Expert Solutions for Alexa Sensitive Content Intelligence (ASCI)

Dear Hiring Manager,

I came across the job description for an Applied Scientist, Alexa Sensitive Content Intelligence (ASCI), and I am excited to introduce AtliQ, an AI & Software Consulting company that can help you build industry-leading technologies in attribute extraction and sensitive content detection.

With over 3+ years of experience in building models for business applications, our team of experts possesses the required skills to fulfill your needs. We have hands-on experience with NLP models (e.g., LSTM, transformer-based models) and CV models (e.g., CNN, AlexNet, ResNet). Our proficiency in programming languages such as Java, C++, Python, and related languages, along with expertise in algorithms and data structures, parsing, numerical optimization, data mining, parallel and distributed computing, and high-performance computing, makes us a perfect fit for this role.

At AtliQ, we have empowered numero