In [1]:
!pip install openai==0.28



**1. Preprocessing and Document Indexing (Retriever Techniques):**

In [2]:
!pip install faiss-cpu



In [3]:
import faiss
# Import necessary libraries
from transformers import DPRContextEncoder, DPRContextEncoderTokenizer


# Load the document
with open('Employee_info.txt', 'r') as file:
    documents = file.readlines()

# Initialize the tokenizer and model for document encoding
tokenizer = DPRContextEncoderTokenizer.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
model = DPRContextEncoder.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")

# Tokenize and encode the documents
inputs = tokenizer(documents, return_tensors='pt', padding=True, truncation=True)
embeddings = model(**inputs).pooler_output

# Index the documents using Faiss
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings.detach().numpy())

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'DPRQuestionEncoderTokenizer'. 
The class this function is called from is 'DPRContextEncoderTokenizer'.
Some weights of the model checkpoint at facebook/dpr-ctx_encoder-single-nq-base were not used when initializing DPRContextEncoder: ['ctx_encoder.bert_model.pooler.dense.bias', 'ctx_encoder.bert_model.pooler.dense.weight']
- This IS expected if you are initializing DPRContex

**2. User Query and Document Retrieval**

In [4]:
from transformers import DPRQuestionEncoder, DPRQuestionEncoderTokenizer

# Initialize the tokenizer and model for question encoding
question_tokenizer = DPRQuestionEncoderTokenizer.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
question_model = DPRQuestionEncoder.from_pretrained("facebook/dpr-question_encoder-single-nq-base")

def get_user_query():
    """
    Prompts the user to enter a query in a text box using a loop to handle potential errors.
    """
    while True:
        query = input("Enter your query about employees (or 'quit' to exit): ")
        if query.lower() == 'quit':
            break
        return query

def retrieve_documents(query):
    """
    Encodes the user's query and retrieves relevant documents from the Faiss index.

    Args:
        query (str): The user's query.

    Returns:
        list: A list of retrieved document strings.
    """

    # Encode the query
    question_inputs = question_tokenizer(query, return_tensors='pt')
    question_embedding = question_model(**question_inputs).pooler_output

    # Retrieve the top-k relevant documents
    k = 3
    D, I = index.search(question_embedding.detach().numpy(), k)
    retrieved_docs = [documents[i] for i in I[0]]
    return retrieved_docs


Some weights of the model checkpoint at facebook/dpr-question_encoder-single-nq-base were not used when initializing DPRQuestionEncoder: ['question_encoder.bert_model.pooler.dense.bias', 'question_encoder.bert_model.pooler.dense.weight']
- This IS expected if you are initializing DPRQuestionEncoder from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DPRQuestionEncoder from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


**3. Context Generation and Response with LLM (Integrating LLMs with Retrieved Information):**

In [7]:
!pip install faiss-cpu transformers openai



In [None]:
import openai
def retrieve_and_respond(query):
    """
    Retrieves relevant documents, combines them into context, and generates a response using GPT-3.5-turbo.

    Args:
        query (str): The user's query.

    Returns:
        str: The generated response from GPT-3.5-turbo.
    """

    retrieved_docs = retrieve_documents(query)

    # Combine retrieved documents into a single context
    context = " ".join(retrieved_docs)

    # Use OpenAI API key (replace with your actual key)
    #api_key = ''  # Replace with your OpenAI API key

    # Generate a response using GPT-3.5-turbo
    try:
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": f"Based on the following information: {context}. Please provide a detailed summary."}
            ]
        )
        return response['choices'][0]['message']['content']
    except openai.error.OpenAIError as error:
        print(f"OpenAI API Error: {error}")
        return "An error occurred while generating the response. Please try again later."

# Main loop
while True:
    query = get_user_query()
    if query.lower() == 'quit':
        break

    response = retrieve_and_respond(query)
    print(response)


Enter your query about employees (or 'quit' to exit): How many employees are located in Bangalore?
Based on the information provided:

1. Bangalore has 4 employees.
2. Chennai has the highest number of employees, with a total of 8.
3. The total salary of all employees located in Pune is 4,210,000.

If we assume that each employee in each location earns an equal salary, we can calculate the average salary for employees in each location by dividing the total salary by the number of employees:

1. Bangalore:
Total Employees: 4
Total Salary: Unknown
Average Salary per Employee: Unknown

2. Chennai:
Total Employees: 8
Total Salary: Unknown
Average Salary per Employee: Unknown

3. Pune:
Total Employees: Unknown
Total Salary: 4,210,000
Average Salary per Employee: Unknown

Without precise salary information for each location, we are unable to provide specific salary figures or detailed summaries for each location.
