In [21]:
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.llms import HuggingFacePipeline
from transformers import pipeline

# Step 1: Define the documents to index
documents = [
    "Emily Johnson is a Senior Data Scientist with 7 years of experience specializing in natural language processing and generative AI. She has implemented RAG systems in production, integrating Pinecone and GPT-4 for customer support chatbots. Emily holds a Master’s in Machine Learning from Stanford University and is certified in MLOps practices.",
    "Liam Chen is a DevOps Engineer with expertise in container orchestration and cloud infrastructure. He has deployed large-scale vector databases like FAISS and Milvus on Kubernetes clusters, optimizing query performance for high-traffic applications. Liam also mentors junior engineers and has authored whitepapers on CI/CD pipelines for AI workloads.",
    "Sophia Martinez is a Product Manager with a background in computer science and a focus on AI-powered tools. She has overseen the development of RAG systems for e-commerce platforms, ensuring efficient retrieval of product recommendations. Sophia excels in stakeholder communication and is skilled in agile methodologies.",
    "Oliver Thompson is a Machine Learning Engineer who specializes in vector search and dense embedding models. He has fine-tuned transformers like 'all-MiniLM-L6-v2' for custom domains, achieving state-of-the-art retrieval accuracy in healthcare applications. Oliver is passionate about explainable AI and writes technical blogs on embedding techniques.",
    "Ava Kim is a Research Scientist with expertise in computational linguistics and semantic search. She has developed dense embedding models for multilingual corpora, allowing real-time information retrieval in over 15 languages. Ava holds a Ph.D. in AI and frequently speaks at international conferences on advances in vector databases."
]


# Step 2: Initialize the embedding model
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")  # Small and efficient model for embeddings

# Step 3: Create FAISS vector store
vector_store = FAISS.from_texts(documents, embeddings)

# Step 4: Set up a small, free LLM
llm_pipeline = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-1B",  # Base model; replace with a fine-tuned LLaMA variant if needed
    tokenizer="meta-llama/Llama-3.2-1B",
    max_new_tokens=100,  # Upper bound on the number of generated tokens
    temperature=0.1,     # Low temperature keeps sampling close to greedy decoding
    do_sample=True       # Sampling must be enabled for temperature to take effect
)
llm = HuggingFacePipeline(pipeline=llm_pipeline)

# Step 5: Build the RAG chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vector_store.as_retriever(),
    return_source_documents=True
)

Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
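
The warning above indicates that the pipeline was built without a `device` argument, so the model runs on CPU even though a GPU is available. A minimal sketch of one way to choose the device, assuming PyTorch is the backend; `pick_device` is a hypothetical helper, not part of transformers or LangChain:

```python
def pick_device() -> int:
    """Return 0 (first CUDA GPU) when PyTorch can see one, else -1 (CPU)."""
    try:
        import torch  # assumed backend; not imported in the original cell
    except ImportError:
        return -1
    return 0 if torch.cuda.is_available() else -1


device = pick_device()
print(f"pipeline device index: {device}")

# Hypothetical usage with the cell above (not run here):
# llm_pipeline = pipeline("text-generation", model="meta-llama/Llama-3.2-1B",
#                         device=device, max_new_tokens=100)
```

The transformers `pipeline` function accepts an integer `device`, where -1 means CPU and a non-negative value selects a GPU by index.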

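To make Step 3 and the retriever more concrete: a vector store embeds each text, embeds the query the same way, and returns the stored texts closest to the query. The sketch below illustrates that idea only. It swaps in word-count vectors and a brute-force cosine-similarity search for `all-MiniLM-L6-v2` and FAISS, and the names `embed`, `cosine`, `top_k`, and `toy_docs` are hypothetical:

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a dense model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, texts: list[str], k: int = 2) -> list[str]:
    """Return the k texts most similar to the query, best first."""
    q = embed(query)
    return sorted(texts, key=lambda t: cosine(q, embed(t)), reverse=True)[:k]


# Shortened, hypothetical stand-ins for the documents list above.
toy_docs = [
    "Oliver Thompson is a Machine Learning Engineer who specializes in vector search.",
    "Liam Chen is a DevOps Engineer with expertise in container orchestration.",
    "Sophia Martinez is a Product Manager with a focus on AI-powered tools.",
]
print(top_k("What does Oliver Thompson do?", toy_docs, k=1)[0])
```

A dense model places paraphrases close together even when they share no words, which this word-count stand-in cannot do; the ranking logic is the part that carries over.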

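With a plain `text-generation` pipeline, the string in `result['result']` can contain the full prompt followed by the completion, as the output of the next cell shows. A minimal sketch of a post-processing helper, assuming the prompt ends with the `Helpful Answer:` marker used by LangChain's default RetrievalQA prompt; `extract_answer` is a hypothetical name, and the sample string is made up for illustration:

```python
def extract_answer(generated: str, marker: str = "Helpful Answer:") -> str:
    """Return the text after the first occurrence of marker, stripped.

    If the marker is absent, the input is returned stripped but otherwise unchanged.
    """
    _, found, tail = generated.partition(marker)
    return tail.strip() if found else generated.strip()


# Made-up sample mimicking prompt + completion; not real model output.
sample = (
    "Use the following pieces of context to answer the question at the end.\n\n"
    "Oliver Thompson is a Machine Learning Engineer.\n\n"
    "Question: What does Oliver Thompson do?\n"
    "Helpful Answer: He is a Machine Learning Engineer."
)
print(extract_answer(sample))
```

In the query cell this could be applied as `print(extract_answer(result['result']))`. A base model may keep generating text after the answer, so the extracted string can still need trimming.
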
In [23]:
query = "What does Oliver Thompson do?"
result = qa_chain.invoke({"query": query})

# Output results
print("\n### Answer ###")
print(result['result'])

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.



### Answer ###
Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

Oliver Thompson is a Machine Learning Engineer who specializes in vector search and dense embedding models. He has fine-tuned transformers like 'all-MiniLM-L6-v2' for custom domains, achieving state-of-the-art retrieval accuracy in healthcare applications. Oliver is passionate about explainable AI and writes technical blogs on embedding techniques.

Emily Johnson is a Senior Data Scientist with 7 years of experience specializing in natural language processing and generative AI. She has implemented RAG systems in production, integrating Pinecone and GPT-4 for customer support chatbots. Emily holds a Master’s in Machine Learning from Stanford University and is certified in MLOps practices.

Liam Chen is a DevOps Engineer with expertise in container orchestration and cloud infrastructure. He has deployed large-s