[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/integrations/groq/groq-llama-3-rag.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/integrations/groq/groq-llama-3-rag.ipynb)

# RAG with Groq and Llama 3

To begin, we set up our prerequisite libraries.

In [1]:
!pip install -qU \
    groq==0.8.0 \
    "semantic-router[local]==0.0.45" \
    pinecone-client==4.1.0


In [7]:
!pip install datasets



## Data Preparation

We start by loading the dataset that we will encode and store: a local CSV file of job titles and their corresponding job descriptions. Before going further, we check each column for missing values.

In [1]:
file_path = './data/job_title_des.csv'

In [2]:
import pandas as pd

df = pd.read_csv(file_path)

# Check for null values in each column
null_counts = df.isnull().sum()
print(null_counts)

Unnamed: 0         0
Job Title          0
Job Description    0
dtype: int64


In [None]:
df.head()

|   | Unnamed: 0 | Job Title | Job Description |
|---|---|---|---|
| 0 | 0 | Flutter Developer | We are looking for hire experts flutter develo... |
| 1 | 1 | Django Developer | PYTHON/DJANGO (Developer/Lead) - Job Code(PDJ ... |
| 2 | 2 | Machine Learning | Data Scientist (Contractor)\n\nBangalore, IN\n... |
| 3 | 3 | iOS Developer | JOB DESCRIPTION:\n\nStrong framework outside o... |
| 4 | 4 | Full Stack Developer | job responsibility full stack engineer – react... |


In [3]:
df = df.rename(columns={df.columns[0]: 'id'})
df.head()

|   | id | Job Title | Job Description |
|---|---|---|---|
| 0 | 0 | Flutter Developer | We are looking for hire experts flutter develo... |
| 1 | 1 | Django Developer | PYTHON/DJANGO (Developer/Lead) - Job Code(PDJ ... |
| 2 | 2 | Machine Learning | Data Scientist (Contractor)\n\nBangalore, IN\n... |
| 3 | 3 | iOS Developer | JOB DESCRIPTION:\n\nStrong framework outside o... |
| 4 | 4 | Full Stack Developer | job responsibility full stack engineer – react... |


In [4]:
data_list = df.to_dict('records')
data_list[0]

{'id': 0,
 'Job Title': 'Flutter Developer',
 'Job Description': 'We are looking for hire experts flutter developer. So you are eligible this post then apply your resume.\nJob Types: Full-time, Part-time\nSalary: ₹20,000.00 - ₹40,000.00 per month\nBenefits:\nFlexible schedule\nFood allowance\nSchedule:\nDay shift\nSupplemental Pay:\nJoining bonus\nOvertime pay\nExperience:\ntotal work: 1 year (Preferred)\nHousing rent subsidy:\nYes\nIndustry:\nSoftware Development\nWork Remotely:\nTemporarily due to COVID-19'}

In [5]:
from datasets import load_dataset

dataset = load_dataset("csv", data_files={"train": file_path})


In [6]:
data = dataset['train']
print(type(data))

<class 'datasets.arrow_dataset.Dataset'>


In [7]:
data[0]

{'Unnamed: 0': 0,
 'Job Title': 'Flutter Developer',
 'Job Description': 'We are looking for hire experts flutter developer. So you are eligible this post then apply your resume.\nJob Types: Full-time, Part-time\nSalary: ₹20,000.00 - ₹40,000.00 per month\nBenefits:\nFlexible schedule\nFood allowance\nSchedule:\nDay shift\nSupplemental Pay:\nJoining bonus\nOvertime pay\nExperience:\ntotal work: 1 year (Preferred)\nHousing rent subsidy:\nYes\nIndustry:\nSoftware Development\nWork Remotely:\nTemporarily due to COVID-19'}

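Next, we reshape the data into the format we need: each record will contain an `id` and a `metadata` dictionary holding the job title and description. We will later combine those two fields into the text that we embed.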
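To make the target structure concrete, here is a minimal plain-Python sketch of the same reshaping on two hypothetical rows (the real notebook reads them from the CSV and uses `datasets.map`). The helper names `to_record` and `to_text` are illustrative only; `to_text` mirrors the `"Job Title: Job Description"` string that the upsert loop later embeds.

```python
# Hypothetical sample rows that mimic the CSV columns; the real notebook
# reads these from ./data/job_title_des.csv instead.
rows = [
    {"Unnamed: 0": 0, "Job Title": "Flutter Developer",
     "Job Description": "Builds cross-platform mobile apps."},
    {"Unnamed: 0": 1, "Job Title": "Django Developer",
     "Job Description": "Develops REST APIs in Python."},
]


def to_record(row: dict, idx: int) -> dict:
    """Reshape one raw row into the {id, metadata} structure used for upserts."""
    return {
        "id": idx,  # positional index, like with_indices=True in datasets.map
        "metadata": {
            "Job Description": row["Job Description"],
            "Job Title": row["Job Title"],
        },
    }


def to_text(record: dict) -> str:
    """Combine title and description into the single string that gets embedded."""
    meta = record["metadata"]
    return f'{meta["Job Title"]}: {meta["Job Description"]}'


records = [to_record(row, i) for i, row in enumerate(rows)]
texts = [to_text(r) for r in records]
print(texts[0])  # Flutter Developer: Builds cross-platform mobile apps.
```

Keeping the raw fields in `metadata` means the original description can be returned verbatim at query time, while the combined string is only used for embedding.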

In [8]:
data = data.map(lambda x, i: {
    "id": i,  # use the row index as the ID
    "metadata": {
        "Job Description": x["Job Description"],
        "Job Title": x["Job Title"],
    }
}, with_indices=True)

# drop unneeded columns
data = data.remove_columns([
    "Job Title", "Job Description", "Unnamed: 0",
])
data


Dataset({
    features: ['id', 'metadata'],
    num_rows: 2277
})

In [9]:
data[1]

{'id': 1,
 'metadata': {'Job Description': 'PYTHON/DJANGO (Developer/Lead) - Job Code(PDJ - 04)\nStrong Python experience in API development (REST/RPC).\nExperience working with API Frameworks (Django/flask).\nExperience evaluating and improving the efficiency of programs in a Linux environment.\nAbility to effectively handle multiple tasks with a high level of accuracy and attention to detail.\nGood verbal and written communication skills.\nWorking knowledge of SQL.\nJSON experience preferred.\nGood knowledge in automated unit testing using PyUnit.',
  'Job Title': 'Django Developer'}}

To create the embedding vectors used for retrieval, we need an embedding model. We will use `dwzhu/e5-base-4k`, a variant of the `e5-base` model with a longer context length of 4k tokens. Ideally, this should run on a GPU for optimal runtimes.
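As a rough mental model of how a sentence encoder turns per-token outputs into one vector, the sketch below shows mean pooling over non-padding tokens followed by L2 normalization, a recipe commonly used with E5-style models. This is a generic illustration with toy 2-dimensional vectors; we have not confirmed that `HuggingFaceEncoder` pools exactly this way, and the real vectors from this model have 768 dimensions.

```python
import math


def mean_pool(token_embeddings: list[list[float]], attention_mask: list[int]) -> list[float]:
    """Average the token vectors whose mask value is 1, skipping padding."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            for j in range(dim):
                summed[j] += vec[j]
            count += 1
    # guard against an all-padding input
    return [s / max(count, 1) for s in summed]


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


# Three toy 2-d token vectors; the last one is padding (mask 0) and is ignored.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0]
unit = l2_normalize(pooled)
```

A longer context window simply means more token vectors can contribute to the average before the input is truncated.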

In [10]:
from semantic_router.encoders import HuggingFaceEncoder

encoder = HuggingFaceEncoder(name="dwzhu/e5-base-4k")




We can check whether our `encoder` will use `cpu` or a `cuda` GPU (where available).

In [11]:
encoder.device

'cuda'

We can create embeddings now like so:

In [17]:
embeds = encoder(["this is a test"])

We can view the dimensionality of our returned embeddings, which we'll need soon when initializing our vector index:

In [18]:
dims = len(embeds[0])
dims

768

Now we create the vector database that will store our vectors. For this we need a [free Pinecone API key](https://app.pinecone.io), which you can find under the "API Keys" button in the left navbar of the Pinecone dashboard.

In [12]:
import os
import getpass
from pinecone import Pinecone

# initialize connection to pinecone (get API key at app.pinecone.io)
api_key = os.getenv("PINECONE_API_KEY") or getpass.getpass("Enter your Pinecone API key: ")

# configure client
pc = Pinecone(api_key=api_key)

Enter your Pinecone API key: ··········


Next, we set up our index specification, which lets us define the cloud provider and region where we want to deploy our index. You can find a list of all [available providers and regions here](https://docs.pinecone.io/docs/projects).

In [13]:
from pinecone import ServerlessSpec

spec = ServerlessSpec(
    cloud="aws", region="us-east-1"
)

When creating the index, we set `dimension` equal to the dimensionality of our encoder (`768`, the value of `dims` computed above) and use a `metric` compatible with the model, here `cosine`. We also pass our `spec` to the index initialization.
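For reference, the `cosine` metric scores two vectors by the cosine of the angle between them, `dot(a, b) / (|a| * |b|)`, so only direction matters, not length. A minimal plain-Python sketch follows; the dimension check is our own addition, reflecting that every vector sent to the index must have exactly `dimension` entries.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        # vectors must share one dimensionality, just as every vector in
        # the index must match its configured `dimension`
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # ~1.0 (same direction, different length)
```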

In [19]:
import time

index_name = "groq-llama-3-rag"
existing_indexes = [
    index_info["name"] for index_info in pc.list_indexes()
]

# check if index already exists (it shouldn't if this is first time)
if index_name not in existing_indexes:
    # if does not exist, create index
    pc.create_index(
        index_name,
        dimension=dims,
        metric='cosine',
        spec=spec
    )
    # wait for index to be initialized
    while not pc.describe_index(index_name).status['ready']:
        time.sleep(1)

# connect to index
index = pc.Index(index_name)
time.sleep(1)
# view index stats
index.describe_index_stats()

{'dimension': 768,
 'index_fullness': 0.0,
 'namespaces': {},
 'total_vector_count': 0}

We can see the index is currently empty with a `total_vector_count` of `0`. We can begin populating it with our embeddings.
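The upsert loop that follows processes the 2,277 records in batches of 128, which works out to 18 batches with a partial final one. The batch boundaries and the `(id, vector, metadata)` tuples passed to `index.upsert` can be sketched in plain Python; `batch_ranges` and `to_upsert_tuples` are hypothetical helper names, and the toy vectors stand in for real 768-dimensional embeddings.

```python
def batch_ranges(n_items: int, batch_size: int):
    """Yield (start, end) index pairs covering n_items in chunks of batch_size."""
    for start in range(0, n_items, batch_size):
        yield start, min(n_items, start + batch_size)


def to_upsert_tuples(ids, embeds, metadata):
    """Pair each ID (converted to a string) with its vector and metadata dict."""
    return list(zip(map(str, ids), embeds, metadata))


ranges = list(batch_ranges(2277, 128))
print(len(ranges))  # 18
print(ranges[-1])   # (2176, 2277)

# toy batch with 2-d vectors to show the tuple layout
batch = to_upsert_tuples([0, 1], [[0.1, 0.2], [0.3, 0.4]],
                         [{"Job Title": "A"}, {"Job Title": "B"}])
print(batch[0])     # ('0', [0.1, 0.2], {'Job Title': 'A'})
```

IDs are converted to strings because the notebook's loop does the same with `map(str, batch["id"])` before upserting.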

In [20]:
from tqdm.auto import tqdm

batch_size = 128  # how many embeddings we create and insert at once

for i in tqdm(range(0, len(data), batch_size)):
    # find end of batch
    i_end = min(len(data), i+batch_size)
    # create batch
    batch = data[i:i_end]
    # create embeddings
    chunks = [f'{x["Job Title"]}: {x["Job Description"]}' for x in batch["metadata"]]
    embeds = encoder(chunks)
    assert len(embeds) == (i_end-i)
    to_upsert = list(zip(map(str, batch["id"]), embeds, batch["metadata"]))
    # upsert to Pinecone
    index.upsert(vectors=to_upsert)


Now let's test retrieval!
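Conceptually, `index.query` returns the `top_k` stored vectors most similar to the query vector under the index's metric. A brute-force version of that idea, scoring every stored vector with cosine similarity, fits in a few lines of plain Python. Pinecone uses approximate nearest-neighbor search rather than a full scan, so treat this only as a mental model; the toy corpus is made up.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def brute_force_top_k(query_vec, corpus: dict[str, list[float]], top_k: int):
    """Score every stored vector against the query and keep the top_k best."""
    scored = ((cosine(query_vec, vec), doc_id) for doc_id, vec in corpus.items())
    best = heapq.nlargest(top_k, scored)  # highest similarity first
    return [(doc_id, score) for score, doc_id in best]


corpus = {"a": [1.0, 0.0], "b": [0.7, 0.7], "c": [0.0, 1.0]}
results = brute_force_top_k([1.0, 0.1], corpus, top_k=2)
print([doc_id for doc_id, _ in results])  # ['a', 'b']
```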

In [21]:
def get_docs(query: str, top_k: int) -> list[str]:
    # encode query
    xq = encoder([query])
    # search pinecone index
    res = index.query(vector=xq, top_k=top_k, include_metadata=True)
    # get doc text
    docs = [x["metadata"]['Job Description'] for x in res["matches"]]
    return docs

Increase `top_k` to retrieve more documents.

In [22]:
query = "can you give me job description for a software developer ?"
docs = get_docs(query, top_k=1)
print("\n---\n".join(docs))

JOB DESCRIPTION

Job Title: Software Engineer I

SUMMARY

The Software Engineer I is responsible to design, code, and/or configure solutions for moderate complexity Agile stories, as well as writing automated unit and integration-level tests.

ESSENTIAL JOB FUNCTIONS/RESPONSIBILITIES
Designs, codes, and/or configures solutions for moderate complexity Agile stories with some guidance from more a senior software engineer.
Debugs and resolves moderate complexity software bugs or issues, working independently, and finds the real root cause and provides a fix without collateral damage.
Writes automated unit and integration-level tests under own direction.
May create or support the creation of a conceptual design/architecture for small scale software solutions with guidance from an architect or more senior developer.
May provide guidance and mentoring to more junior software engineers.
Follows development standards and effectively demonstrates technical solutions to other software engineers 

Our retrieval component works, so now let's feed the retrieved documents into an LLM hosted by Groq to produce an answer.

In [23]:
from groq import Groq

os.environ["GROQ_API_KEY"] = os.getenv("GROQ_API_KEY") or getpass.getpass("Enter your Groq API key: ")

groq_client = Groq(api_key=os.environ["GROQ_API_KEY"])

Enter your Groq API key: ··········


Now we can generate responses using `gemma2-9b-it`. We'll wrap this logic in a helper function called `generate`:
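The prompt sent to the model is a system message containing the retrieved documents, joined with a `---` divider, followed by the user's query. The sketch below builds that message list without calling Groq; `build_messages` is a hypothetical helper. It also shows a Python pitfall worth knowing when assembling such prompts: adjacent string literals are merged before any method call, so writing `"CONTEXT:\n" "\n---\n".join(docs)` uses the entire prompt as the join separator.

```python
def build_messages(query: str, docs: list[str]) -> list[dict]:
    """Assemble the system and user messages for a chat completion request."""
    context = "\n---\n".join(docs)  # separate retrieved docs with a visible divider
    system_message = (
        "You are a helpful assistant that generates job descriptions "
        "using the context provided below.\n\n"
        "CONTEXT:\n" + context
    )
    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": query},
    ]


messages = build_messages("Write a job description.", ["doc A", "doc B"])
print(messages[0]["content"].endswith("doc A\n---\ndoc B"))  # True

# Pitfall: the three literals below merge into one string first, and .join()
# is then called on that whole string; with a single doc, only the doc survives.
buggy = (
    "Instructions. "
    "CONTEXT:\n"
    "\n---\n".join(["doc A"])
)
print(repr(buggy))  # 'doc A'
```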

In [24]:
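def generate(query: str, docs: list[str]) -> str:
    # the explicit "+" matters: without it, Python merges the adjacent
    # literals and calls .join() on the entire prompt
    system_message = (
        "You are a helpful assistant that generates job descriptions "
        "using the context provided below.\n\n"
        "CONTEXT:\n"
        + "\n---\n".join(docs)
    )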
    messages = [
        {"role": "system", "content": system_message},
        {"role": "user", "content": query}
    ]
    # generate response
    chat_response = groq_client.chat.completions.create(
        model="gemma2-9b-it",
        messages=messages
    )
    return chat_response.choices[0].message.content

In [25]:
query = "  give me job description for a software developer  in the tech industry with formal tone "
docs = get_docs(query, top_k=1)
docs

["requisition id req4196 job title full stack software developer number opening 2 job category professional/technical employment type regular full-time shift first weekend required location tempe az opening team develops support internally facing web application apis cater developer community within state farm team ’ goal create simplified self-service solution managing system operation data business capability deployable ’ high performing product team 's working fast-paced agile environment key focus around automation enabling continuous delivery team additional detail posse understanding technology solution meet business outcome offer range solution business partner ; understand business current aspirational need participates sprint planning ; provides work estimate deliver product story ; owns development story develops solution variety platform according business requirement completes required coding satisfy defined acceptance criterion deliver desired outcome lead solution design 

In [26]:
out = generate(query=query, docs=docs)
print(out)

## Software Developer 

**About the Role**

We are seeking a talented and motivated Software Developer to join our dynamic team. In this role, you will play a critical part in the design, development, and deployment of innovative software solutions that power our technological infrastructure. 

**Responsibilities:**

* **Design and Development:** Participate in the entire software development lifecycle, from requirements gathering and design to implementation, testing, and deployment.
* **Coding:** Write clean, efficient, and maintainable code in accordance with best practices and established coding standards.
* **Technical Proficiency:** Demonstrate strong expertise in multiple programming languages, frameworks, and technologies relevant to the position.  Continuously expand your knowledge base through research and development of new skills.
* **Collaboration:** Work effectively within a collaborative team environment, communicating clearly with peers, product managers, and stakeholde

Let's try one more query. When you're done, don't forget to delete your index to save resources!

In [27]:
query = "  give me job description for a ui/ux designer  in the design industry with formal tone "
docs = get_docs(query, top_k=1)
out = generate(query=query, docs=docs)
print(out)

## UI/UX Designer

**Company:** [Company Name]

**Job Type:** Full-Time

**Location:** [City, State]

**About the Role:**

[Company Name] seeks a talented and passionate UI/UX Designer to join our growing design team. In this role, you will play a crucial part in shaping the user experience of our products, ensuring they are intuitive, delightful, and effective. The ideal candidate is a creative problem-solver with a strong understanding of user-centered design principles and a keen eye for detail.

**Key Responsibilities:**

* **User Research & Analysis:** Conduct user research, interviews, and usability testing to deeply understand user needs, behaviors, and pain points.
* **Conceptualization & Design:** Translate user insights into innovative and compelling user interface (UI) and user experience (UX) designs.
* **Wireframing & Prototyping:** Create low-fidelity wireframes and high-fidelity prototypes to effectively communicate design concepts and test user flows.
* **Visual Design:

In [15]:
pc.delete_index(index_name)

---