## 🧠 The Problem: LLMs are not aware of recent or business-specific events

**LLMs are “stuck” at a particular point in time, but RAG can bring them into the present.**

ChatGPT’s training data “cutoff point” was July 2023. If you ask ChatGPT about something that occurred last month, it will not only fail to answer your question factually; it will likely produce a convincing-sounding but false answer. This behavior is commonly called “hallucination.”
![RAG Workflow](https://blogs.nvidia.com/wp-content/uploads/2023/11/Using-RAG-on-PCs.jpg)

## The Solution: RAG (Retrieval Augmented Generation)

## What is RAG?
RAG is a technique for enhancing generative AI models with external knowledge drawn from a document collection. It improves the quality of responses from Large Language Models (LLMs) by connecting the model to external knowledge bases, which supplements what the model learned during training. Adding RAG to an LLM-powered question-answering system (e.g., GPT, LLaMA 2, Falcon) provides two significant benefits: the model gets access to current, credible information, and users can see the model's sources, so they can check its claims for accuracy and place more trust in the results.

In short, retrieval augmentation supplements an LLM with external, up-to-date information at query time, so that its answers are both well informed and current.
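
The idea can be sketched without any external service. The following is a minimal illustration, not the pipeline built later in this notebook: it assumes a tiny in-memory list of documents and uses word overlap (Jaccard similarity) as a stand-in for embedding search. The names `DOCS`, `retrieve`, and `build_prompt` are hypothetical.

```python
from typing import List, Set

# Hypothetical in-memory "knowledge base"; a real system would use a vector database.
DOCS = [
    "The cafeteria is closed on Fridays starting this month.",
    "Expense reports must be filed within 30 days.",
    "The VPN client was upgraded to version 5 last week.",
]

def _tokens(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of words without basic punctuation."""
    return {word.strip(".,?!").lower() for word in text.split()}

def retrieve(query: str, docs: List[str], top_k: int = 1) -> List[str]:
    """Rank documents by Jaccard word overlap with the query (stand-in for embedding search)."""
    query_tokens = _tokens(query)

    def score(doc: str) -> float:
        doc_tokens = _tokens(doc)
        union = query_tokens | doc_tokens
        return len(query_tokens & doc_tokens) / len(union) if union else 0.0

    return sorted(docs, key=score, reverse=True)[:top_k]

def build_prompt(query: str, context_docs: List[str]) -> str:
    """Augment the user question with the retrieved context."""
    context = "\n---\n".join(context_docs)
    return (
        "Answer the question using the provided context.\n\n"
        f"CONTEXT:\n{context}\n\n"
        f"Question: {query}"
    )

query = "When is the cafeteria closed?"
prompt = build_prompt(query, retrieve(query, DOCS))
print(prompt)
```

The prompt that reaches the model now contains the relevant document, which is the whole point of retrieval augmentation: the model answers from supplied text rather than from memory alone.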

**Advantages of RAG:**

1. **Dynamic Knowledge:** RAG combines the broad knowledge an LLM acquired during training with fresh information from external sources.
2. **Updates Without Retraining:** The knowledge base can be updated without fine-tuning or retraining the model, which makes it easy to keep up with changing information.
3. **Contextual Business Relevance:** With the right sources, RAG can be tailored to provide business-specific context, making LLM outputs more pertinent to specific user needs and business scenarios.
4. **Cost-Effectiveness:** RAG is typically a cost-effective, easy-to-implement, and low-risk path to better performance for GenAI applications.

## How Does RAG Work?

![RAG Workflow](https://community.cisco.com/t5/image/serverpage/image-id/198735iA3AB6CAA59B4D845/image-size/large?v=v2&px=999)



### Step 0: Setup

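1. Install Necessary Libraries: First, install the packages this notebook uses.
2. Set Up Environment Variables: As a best practice, keep API keys and configuration in environment variables. Make sure `GEMINI_API_KEY` and `PINECONE_API_KEY` are set (and `GROQ_API_KEY` if you use the Groq alternative); the code below prompts for any key that is missing. The serverless index used here takes its cloud and region from `ServerlessSpec`, so no separate Pinecone environment variable is read.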
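
Before running the later cells, it can help to check which keys are already present. The sketch below is one possible way to do that; the helper name `missing_env_vars` and the `REQUIRED_VARS` tuple are hypothetical, while the variable names match the ones read by the code in this notebook.

```python
import os
from typing import Iterable, List, Mapping

# Variable names read by the cells in this notebook
REQUIRED_VARS = ("GEMINI_API_KEY", "PINECONE_API_KEY")

def missing_env_vars(names: Iterable[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names that are unset or empty in the given environment mapping."""
    return [name for name in names if not env.get(name)]

missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Passing the environment as a mapping keeps the helper easy to test with a plain dictionary.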

In [6]:
!pip install PyPDF2



In [1]:
!pip install pandas




In [2]:
!pip install pinecone-client



In [3]:
!pip install groq


## Step 1: Data Extraction & Chunking

In [7]:
import PyPDF2
from typing import List, Dict

def load_and_chunk_pdf(pdf_path: str, chunk_size: int = 526) -> List[Dict]:
    """Loads a PDF, extracts its text, and splits it into chunks of `chunk_size` words."""
    with open(pdf_path, 'rb') as pdf_file:
        pdf_reader = PyPDF2.PdfReader(pdf_file)
        # Fall back to "" for pages that yield no text (for example, image-only pages)
        page_texts = [page.extract_text() or "" for page in pdf_reader.pages]

    # Join pages with a newline so the last word of one page
    # does not fuse with the first word of the next
    words = "\n".join(page_texts).split()
    chunks = []
    for i in range(0, len(words), chunk_size):
        chunks.append({
            "id": f"chunk-{i // chunk_size}",  # sequential chunk number
            "content": " ".join(words[i:i + chunk_size])
        })
    return chunks


pdf_path = "/content/drive/MyDrive/Music/proj/sodapdf-converted-merged.pdf"
data = load_and_chunk_pdf(pdf_path)
print(data)
print(len(data))

[{'id': 'chunk-0', 'content': 'Android app vulnerability classes A whirlwind overview of common security and privacy problems in Android apps Introduction How Google Play Protect educates developers to make millions of apps safer to use ● Overview of common Android app vulnerabilities reported through the Google Play Security Rewards Program ● Explicitly not an attempt at creating a complete audit guide ○ Focused only to vulnerabilities in scope for our bug bounty ● For each vulnerability present ○ Overview ○ Auditing tips ○ Remediation tips ○ CWE ID (Common Weakness Enumeration) and other resources Content of the presentation ● You are an app developer: ○ Understand which severe vulnerabilities are common even in top apps by top developers ○ Learn how to find more information about how to identify and fix these vulnerabilities ● You are a security researcher: ○ Understand what common vulnerabilities are worth looking into ○ Learn how to find these vulnerabilities to earn your own bug 
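
The chunker above cuts the text at fixed word boundaries, so a sentence that straddles two chunks is split between them. A common variation is to let consecutive chunks overlap by a few words. The following is a sketch of that variation, not what the notebook ran: the function name `chunk_words` and the default `overlap` of 50 words are assumptions chosen for illustration.

```python
from typing import Dict, List

def chunk_words(text: str, chunk_size: int = 526, overlap: int = 50) -> List[Dict]:
    """Split text into word chunks where each chunk repeats the last `overlap` words of the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for n, start in enumerate(range(0, len(words), step)):
        chunks.append({
            "id": f"chunk-{n}",
            "content": " ".join(words[start:start + chunk_size]),
        })
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks

# Small demonstration with ten words, windows of four, and one word of overlap
sample = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(sample, chunk_size=4, overlap=1):
    print(chunk)
```

Overlap increases the number of chunks (and therefore embedding calls), so the value is a trade-off between context continuity and cost.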

## Step 2: Creating a Pinecone Index

Get your `PINECONE_API_KEY` from the [Pinecone console](https://app.pinecone.io/organizations).

In [8]:
import time
from pinecone import ServerlessSpec
from pinecone import Pinecone
import getpass
import os

pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY") or getpass.getpass("Enter your Pinecone API key: "))

index_name = "rag"  # name of the Pinecone index

existing_indexes = [
    index_info["name"] for index_info in pc.list_indexes()
]

# check if index already exists (it shouldn't if this is first time)
if index_name not in existing_indexes:
    # if does not exist, create index
    pc.create_index(
        index_name,
        dimension=768,  # must match the length of the embedding vectors from Step 3
        metric='cosine',
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )
    # wait for index to be initialized
    while not pc.describe_index(index_name).status['ready']:
        time.sleep(1)

# connect to index
index = pc.Index(index_name)
time.sleep(1)
# view index stats
index.describe_index_stats()

Enter your Pinecone API key: ··········


{'dimension': 768,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 22}},
 'total_vector_count': 22}
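
The index is created with `metric='cosine'`, which means matches are ranked by the cosine of the angle between the query vector and each stored vector. As an illustrative sketch in plain Python (Pinecone computes this on its side; the function name `cosine_similarity` here is just a local helper):

```python
import math
from typing import Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return dot(a, b) / (|a| * |b|), which ranges from -1 to 1."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # same direction
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))  # orthogonal vectors
```

Because the result depends only on direction, not on vector length, two texts with similar meaning score high even if their embeddings differ in magnitude.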

## Step 3: Generate Embeddings for Vector Storage

In [11]:
import os
import getpass
from typing import List

import google.generativeai as genai

GEMINI_API_KEY = os.getenv("GEMINI_API_KEY") or getpass.getpass("Enter your Gemini API key: ")
genai.configure(api_key=GEMINI_API_KEY)

def generate_embeddings(texts: List[str]) -> List[List[float]]:
    """Generates one embedding per text using Gemini."""
    embeddings = []
    for text in texts:
        # One API call per text; each call returns a single embedding vector
        result = genai.embed_content(
            model="models/text-embedding-004",  # returns 768-dimensional vectors, matching the index
            content=text,
            task_type="retrieval_document",  # the chunks are documents to be retrieved later
            title="PDF Chunk Embedding"
        )
        embeddings.append(result['embedding'])
    return embeddings

# Example usage
embeddings = generate_embeddings([chunk["content"] for chunk in data])
print(embeddings[0][:10])  # Print the first 10 dimensions of the first embedding
print(len(embeddings))  # One embedding per chunk

Enter your Gemini API key: ··········
[0.0074260207, -0.041359585, -0.09798843, -0.046290345, -0.0034770304, 0.051419258, 0.054428898, 0.029418876, 0.007991875, 0.042044245]
22


## Step 4: Storing Embeddings into Pinecone Vector Database

In [None]:
from pinecone import Pinecone, ServerlessSpec
import time

api_key = os.getenv("PINECONE_API_KEY") or getpass.getpass("Enter your Pinecone API key: ")
pc = Pinecone(api_key=api_key)

index_name = "rag"  # same index as in Step 2
dims = len(embeddings[0])  # Get the dimensionality from Gemini embeddings

# Same cloud and region as the index created in Step 2
spec = ServerlessSpec(cloud="aws", region="us-east-1")

if index_name not in [info['name'] for info in pc.list_indexes()]:
    pc.create_index(index_name, dimension=dims, metric='cosine', spec=spec)
    while not pc.describe_index(index_name).status['ready']:
        time.sleep(1)

index = pc.Index(index_name)
time.sleep(1)

# Upsert Embeddings to Pinecone
batch_size = 32
for i in range(0, len(data), batch_size):
    i_end = min(len(data), i + batch_size)
    batch_data = data[i:i_end]
    batch_embeddings = embeddings[i:i_end]
    to_upsert = [(chunk["id"], embedding, {"content": chunk["content"]})
                  for chunk, embedding in zip(batch_data, batch_embeddings)]
    index.upsert(vectors=to_upsert)

Enter your Pinecone API key: ··········
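
The upsert loop sends the vectors in batches of 32 rather than in one large request. The batching itself can be isolated into a small helper, sketched below; `iter_batches` is a hypothetical name and the example uses plain integers in place of vectors.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `batch_size` items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])

# 70 items with a batch size of 32 are split into two full batches and a remainder
sizes = [len(batch) for batch in iter_batches(list(range(70)), 32)]
print(sizes)
```

With the 22 chunks in this notebook, a batch size of 32 results in a single upsert call; batching starts to matter once the document collection grows.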


## Step 5: Retrieving Data From Pinecone

In [13]:
def get_docs(query: str, top_k: int = 100) -> List[str]:  # top_k is the maximum number of matches to return
    """Retrieves the most similar chunks from Pinecone."""
    # With only 22 chunks in the index, top_k=100 returns every chunk, ordered by similarity
    query_embedding = generate_embeddings([query])[0]  # Embed the query with the same model as the chunks
    results = index.query(vector=query_embedding, top_k=top_k, include_metadata=True)
    # Each match carries the chunk text that was stored as metadata in Step 4
    return [x["metadata"]["content"] for x in results["matches"]]


user_query = input('Enter Your Query :\n ')  # Read the question from the user
retrieved_docs = get_docs(user_query)
print(retrieved_docs)
print(len(retrieved_docs))


Enter Your Query :
 what is CWE-321 explain in detail
['like Twitter consider embedding the third-party secrets in the code as best practice. ○ For services which don’t provide a per-user authentication service, it’s possible to spin up your own server that handles this. ● For services that recommend embedding third-party secrets in the app, expect the secrets to be extracted and abused by attackers. The goal for such situations is to mitigate the negative effects of the abuse. Embedded cryptography secrets The problem: Applications that use embedded crypto secrets are susceptible to simple data decryption attacks. Example code private static String SECRET_KEY = "MySecretAESKey99"; private static byte[] encrypt(String inputText) throws Exception { Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding"); cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(SECRET_KEY.getBytes(), "AES")); return cipher.doFinal(inputText.getBytes("UTF-8")); } See also: CWE-321 (Use of Hard-coded Cryptogr
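
Returning every match sends a lot of loosely related text to the LLM. One option is to keep only matches above a similarity threshold. The sketch below works on plain dictionaries shaped like the matches used above (`metadata.content`), and it assumes each match also carries a numeric `score` field; the function name `filter_matches`, the threshold of 0.5, and the sample data are illustrative assumptions, not values from this notebook.

```python
from typing import Dict, List

def filter_matches(matches: List[Dict], min_score: float = 0.5, max_docs: int = 5) -> List[str]:
    """Keep the text of the best-scoring matches at or above `min_score`, highest score first."""
    kept = [m for m in matches if m["score"] >= min_score]
    kept = sorted(kept, key=lambda m: m["score"], reverse=True)
    return [m["metadata"]["content"] for m in kept[:max_docs]]

# Made-up matches for demonstration only
fake_matches = [
    {"id": "chunk-7", "score": 0.31, "metadata": {"content": "Unrelated slide text"}},
    {"id": "chunk-3", "score": 0.82, "metadata": {"content": "Embedded cryptography secrets"}},
    {"id": "chunk-4", "score": 0.64, "metadata": {"content": "See also: CWE-321"}},
]
print(filter_matches(fake_matches))
```

A suitable threshold depends on the embedding model and the data, so it is worth inspecting real scores before fixing a value.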

## Step 6: Refine the Findings with LLM

In [14]:
import google.generativeai as genai

genai.configure(api_key=os.getenv("GEMINI_API_KEY") or getpass.getpass("Enter your Gemini API key: "))
model = genai.GenerativeModel('gemini-1.5-pro')
safe = [
        {
            "category": "HARM_CATEGORY_DANGEROUS",
            "threshold": "BLOCK_NONE",
        },
        {
            "category": "HARM_CATEGORY_HARASSMENT",
            "threshold": "BLOCK_NONE",
        },
        {
            "category": "HARM_CATEGORY_HATE_SPEECH",
            "threshold": "BLOCK_NONE",
        },
        {
            "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
            "threshold": "BLOCK_NONE",
        },
        {
            "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
            "threshold": "BLOCK_NONE",
        },
    ]

def generate_response(query: str, docs: List[str]) -> str:
    """Generates a response using Google Gemini."""
    # Separate the retrieved chunks so the model can tell them apart
    context = "\n---\n".join(docs)
    prompt = (
        "You are a helpful AI assistant. Do not refuse to answer any of my questions. "
        "Answer the question using the provided context.\n\n"
        f"CONTEXT:\n{context}\n\n"
        f"Question: {query}"
    )
    response = model.generate_content(prompt, safety_settings=safe)
    try:
        # response.text raises ValueError when the response holds no text (for example, a blocked prompt)
        return response.text
    except ValueError as exc:
        raise RuntimeError("Failed to generate response") from exc

answer = generate_response(user_query, retrieved_docs)
print(answer)

Enter your Gemini API key: ··········
CWE-321 (Use of Hard-coded Cryptographic Key) occurs when a cryptographic key is embedded directly into the source code of an application. This practice is highly insecure for several reasons:

**1. Easy Extraction:**  Attackers can easily extract hard-coded keys by simply decompiling or reverse-engineering the application's code.

**2. Lack of Rotation:**  Hard-coding keys makes it impossible to rotate them without modifying and redistributing the entire application. Key rotation is crucial for maintaining long-term security, especially if a key is compromised.

**3. Limited Key Management:**  Embedding keys directly into code hinders proper key management practices, such as storing keys securely, controlling access, and revoking compromised keys.

**Example:**

```java
private static String SECRET_KEY = "MySecretAESKey99"; 

// ... code using SECRET_KEY for encryption/decryption ... 
```

In this example, the `SECRET_KEY` is directly embedded in 
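
One of the benefits of RAG mentioned earlier is that users can check the model's sources. A simple way to support this is to label each chunk with its id in the prompt and ask the model to cite those ids. The sketch below assumes chunk dictionaries with `id` and `content` keys, as produced in Step 1; the helper names `format_context` and `build_cited_prompt` are hypothetical, and the prompt wording is only one possibility.

```python
from typing import Dict, List

def format_context(chunks: List[Dict]) -> str:
    """Prefix each chunk with its id in brackets so answers can refer back to it."""
    return "\n---\n".join(f"[{chunk['id']}] {chunk['content']}" for chunk in chunks)

def build_cited_prompt(query: str, chunks: List[Dict]) -> str:
    """Build a prompt that asks the model to cite the chunk ids it relied on."""
    return (
        "Answer the question using only the provided context. "
        "Cite the bracketed chunk ids that support each statement.\n\n"
        f"CONTEXT:\n{format_context(chunks)}\n\n"
        f"Question: {query}"
    )

example_chunks = [
    {"id": "chunk-0", "content": "Hard-coded keys can be extracted by decompiling the app."},
    {"id": "chunk-1", "content": "See also: CWE-321."},
]
print(build_cited_prompt("What is CWE-321?", example_chunks))
```

To use this with the retrieval step, `get_docs` would need to return the chunk ids along with the text instead of the text alone.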

## Alternative to the Gemini Model: Groq API

### Pros of the Groq API:
1. High inference speed

2. Support for multiple open-source LLMs

3. Easy implementation

### Cons of the Groq API:
1. Limited context length

2. Knowledge cutoff of the hosted LLMs
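
A limited context length means the retrieved chunks may not all fit in one request. A rough way to handle this is to keep chunks in ranked order until a size budget is reached. The sketch below counts whitespace-separated words as a crude proxy for tokens, which is an assumption; real token counts depend on the model's tokenizer. The name `fit_to_budget` is hypothetical.

```python
from typing import List

def fit_to_budget(docs: List[str], max_words: int) -> List[str]:
    """Keep documents in their given order until adding the next one would exceed `max_words`."""
    kept: List[str] = []
    used = 0
    for doc in docs:
        n_words = len(doc.split())
        if used + n_words > max_words:
            break  # stop at the first document that does not fit
        kept.append(doc)
        used += n_words
    return kept

ranked_docs = ["alpha beta gamma", "delta epsilon", "zeta"]
print(fit_to_budget(ranked_docs, max_words=5))
```

Since the documents arrive sorted by similarity, stopping at the first one that does not fit preserves the most relevant context.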

In [None]:
from groq import Groq

GROQ_API_KEY = os.getenv("GROQ_API_KEY") or getpass.getpass("Enter your Groq API key: ")
groq_client = Groq(api_key=GROQ_API_KEY)

def get_docs(query: str, top_k: int = 11) -> List[str]:  # return the 11 best-matching chunks
    """Retrieves relevant documents from Pinecone."""
    query_embedding = generate_embeddings([query])[0]  # Generate embedding for the query
    results = index.query(vector=query_embedding, top_k=top_k, include_metadata=True)
    return [x["metadata"]["content"] for x in results["matches"]]

def generate_response(query: str, docs: List[str]) -> str:
    """Generates a response using a Llama 3 model served by Groq."""
    # Join the chunks first: calling .join() directly after adjacent string literals
    # would use the whole instruction text as the separator between chunks
    context = "\n---\n".join(docs)
    system_message = (
        "You are a helpful AI assistant. Answer the question using the provided context.\n\n"
        f"CONTEXT:\n{context}"
    )
    messages = [
        {"role": "system", "content": system_message},
        {"role": "user", "content": query}
    ]
    response = groq_client.chat.completions.create(
        model="llama3-70b-8192",  # Llama 3 70B with an 8,192-token context window
        messages=messages
    )
    return response.choices[0].message.content


# Example
user_query = "what is CWE-321 explain in detail?"
retrieved_docs = get_docs(user_query)

answer = generate_response(user_query, retrieved_docs)
print(answer)

Enter your Groq API key: ··········
CWE-321 is a Common Weakness Enumeration (CWE) that refers to the "Use of Hard-Coded Cryptographic Key" vulnerability. Here's a detailed explanation:

**Description:**

CWE-321 occurs when a cryptographic key is hard-coded directly into the source code of an application, making it easily accessible to an attacker. This weakness allows an attacker to extract the cryptographic key and use it to decrypt sensitive data, compromising the confidentiality and integrity of the data.

**Risk:**

The risk associated with CWE-321 is high, as it allows an attacker to:

1. Decrypt sensitive data, such as financial information or personal identifiable information (PII).
2. Use the extracted key to impersonate the application or system, allowing them to access unauthorized resources.
3. Compromise the integrity of the data, leading to unauthorized modifications or tampering.

**Examples:**

Here are a few examples of how CWE-321 can manifest:

1. Hard-coding an enc