# Generative AI and RAG

In [3]:
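'''
Title: Generative AI and RAG
Name: Irene Gichana
Date: 1 August 2025
'''
# Install dependencies (langchain-text-splitters provides the text splitter)
!pip install langchain langchain-community langchain-text-splitters transformers sentence-transformers faiss-cpu pypdf

# Import libraries
# Loaders, embeddings, and vector stores are imported from langchain_community;
# the older langchain.* import paths for them are deprecated.
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline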



## Upload Document


In [4]:
# Load PDF
loader = PyPDFLoader("CV.pdf")
docs = loader.load()

## Splitting the document into chunks
Used RecursiveCharacterTextSplitter with a chunk size of 500 characters and an overlap of 50 characters, so neighbouring chunks share some context at their boundaries.

In [5]:
# Split Documents into Chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)
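
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size chunking with overlap in plain Python. It is a simplification, not what RecursiveCharacterTextSplitter does internally: the real splitter first tries to break on separators such as paragraphs and newlines, so its chunks are usually shorter than the limit. The function name `chunk_with_overlap` and the sample text are made up for illustration.

```python
def chunk_with_overlap(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size windows; consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reached the end of the text
    return chunks


# Hypothetical 1200-character sample standing in for the text of a PDF page
sample = "".join(str(i % 10) for i in range(1200))
pieces = chunk_with_overlap(sample)
print([len(p) for p in pieces])            # [500, 500, 300]
print(pieces[0][-50:] == pieces[1][:50])   # True: the last 50 characters are repeated
```

With a step of 450 characters, every chunk after the first starts with the final 50 characters of the previous one, which is the shared context the overlap setting is meant to provide.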

## Creating embeddings
Used HuggingFaceEmbeddings (all-MiniLM-L6-v2) to embed each chunk and FAISS to store and search the resulting vectors.

In [6]:
# Create Embeddings and Vector Store
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = FAISS.from_documents(chunks, embeddings)
retriever = vectorstore.as_retriever()
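
Conceptually, the retriever embeds the question and returns the chunks whose vectors lie closest to it. The sketch below shows that idea as a brute-force nearest-neighbour search in plain Python. It rests on several assumptions: the three-dimensional vectors and their labels are invented toy data (all-MiniLM-L6-v2 produces 384-dimensional vectors), Euclidean distance is used because that is, as far as I know, the default for the LangChain FAISS wrapper, and the `top_k` helper is hypothetical rather than part of any library.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec, doc_vecs, k=4):
    """Return the indices of the k vectors closest to query_vec, nearest first."""
    ranked = sorted(range(len(doc_vecs)), key=lambda i: l2_distance(query_vec, doc_vecs[i]))
    return ranked[:k]


# Toy stand-ins for chunk embeddings (labels are hypothetical)
toy_vectors = [
    [0.9, 0.1, 0.0],  # "skills" chunk
    [0.1, 0.9, 0.0],  # "education" chunk
    [0.0, 0.1, 0.9],  # "contact details" chunk
    [0.8, 0.2, 0.1],  # "projects" chunk
]
query = [1.0, 0.0, 0.0]  # toy embedding of a question about skills

print(top_k(query, toy_vectors, k=2))  # [0, 3]
```

A real index avoids recomputing every distance in Python, but the ranking logic is the same. If fewer or more chunks are wanted per query, `as_retriever` accepts `search_kwargs={"k": ...}`.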





## Adding a Large Language Model (LLM)
Integrated google/flan-t5-large via Hugging Face Transformers.

In [7]:
# LLM
model_name = "google/flan-t5-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
flan_pipeline = pipeline("text2text-generation", model=model, tokenizer=tokenizer)


Device set to use cpu


## Querying using RAG
Implemented a retrieval-augmented generation pipeline with context-based prompting.
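
The context-based prompting step can be looked at in isolation: the retrieved passages are joined into one context string and placed ahead of the question. The sketch below uses the same prompt wording as the pipeline in this notebook. The `build_prompt` helper, the `max_context_chars` cap, and the sample passages are illustrative assumptions; the cap is one simple way to keep a long context from crowding out the question, not something the notebook itself does.

```python
def build_prompt(question: str, passages: list[str], max_context_chars: int = 2000) -> str:
    """Assemble a context-grounded prompt from retrieved passages and a question."""
    # Join the passages, then cut the context at a character cap (an assumed safeguard)
    context = "\n".join(passages)[:max_context_chars]
    return (
        "Answer the question using only the context:\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n\n"
        "Answer:"
    )


# Hypothetical passages standing in for retrieved chunks
example_passages = ["First retrieved passage.", "Second retrieved passage."]
print(build_prompt("What job roles have I held?", example_passages))
```

Keeping prompt assembly in its own function makes it easy to print and inspect exactly what the model receives, which helps when an answer looks unrelated to the document.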

In [9]:
# RAG
def query_rag(question):
    # Retrieve the chunks most similar to the question
    relevant_docs = retriever.invoke(question)
    context = "\n".join(doc.page_content for doc in relevant_docs)
    prompt = f"Answer the question using only the context:\n\nContext:\n{context}\n\nQuestion: {question}\n\nAnswer:"
    response = flan_pipeline(
        prompt,
        max_new_tokens=200,
        temperature=0.9,      # Randomness control (lower = closer to deterministic, higher = more diverse)
        top_k=50,             # Sample only from the 50 most likely tokens
        top_p=0.9,            # Nucleus sampling: keep the smallest set of tokens whose cumulative probability reaches top_p
        do_sample=True        # Enables sampling (required for temperature/top-k/top-p to take effect)
    )
    return response[0]['generated_text']

print(query_rag("Summarize my CV in 3 sentences."))
print(query_rag("What are my main skills and competencies?"))
print(query_rag("List the programming languages I know."))
print(query_rag("What job roles have I held?"))
print(query_rag("Generate a professional bio based on my CV."))
print(query_rag("What are my top achievements?"))


Results-driven Data Analyst with 2+ years of experience in data analysis, reporting, and AI-driven solutions. Strong foundation in statistics, Python, R, SQL, and machine learning, complemented by hands- on work with generative AI. Committed to delivering clear, data-driven solutions in fast-paced, data-centric environments.
Programming & Analysis: Python, SQL, Excel, NumPy, Pandas, Scikit-learn Data Visualization / Business Intelligence: Power BI, Tableau Databases & Tools: MySQL, PostgreSQL, Oracle, Git Machine Learning and AI: Regression, Classification, Model Evaluation, LLaMA, Retrieval-Augmented Generation (RAG) Extract, Transform, Load (ETL): Data Cleaning, Feature Engineering Soft Skills: Communication, Collaboration, Attention to Detail
Python, SQL, Excel, NumPy, Pandas, Scikit-learn
Data Analyst
Shujaa IRENE GICHANA is a data analyst with experience in data analysis, reporting, and AI-driven solutions. Strong foundation in statistics, Python, R, SQL, and machine learning, com