# Retrieval-Augmented Generation (RAG) System  
## Case Study: How Apple Is Organized for Innovation

This notebook implements a Retrieval-Augmented Generation (RAG) system using a Harvard Business Review (HBR) article PDF as the knowledge source.

The system:
1. Extracts text from the PDF
2. Splits text into chunks
3. Converts chunks into embeddings
4. Stores embeddings in FAISS
5. Retrieves relevant chunks
6. Generates answers using a language model


In [None]:
!pip install sentence-transformers faiss-cpu transformers pypdf


Collecting faiss-cpu
  Downloading faiss_cpu-1.13.2-cp310-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (7.6 kB)
Collecting pypdf
  Downloading pypdf-6.7.0-py3-none-any.whl.metadata (7.1 kB)
Downloading faiss_cpu-1.13.2-cp310-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (23.8 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m23.8/23.8 MB[0m [31m36.2 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading pypdf-6.7.0-py3-none-any.whl (330 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m330.6/330.6 kB[0m [31m10.2 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: pypdf, faiss-cpu
Successfully installed faiss-cpu-1.13.2 pypdf-6.7.0


In [None]:
import numpy as np
import faiss
from pypdf import PdfReader
from sentence_transformers import SentenceTransformer
from transformers import pipeline


In [5]:
from google.colab import files
uploaded = files.upload()


Saving HBR_How_Apple_Is_Organized_For_Innovation-4.pdf to HBR_How_Apple_Is_Organized_For_Innovation-4.pdf


In [6]:
from pypdf import PdfReader

pdf_path = list(uploaded.keys())[0]

reader = PdfReader(pdf_path)

text = ""
for page in reader.pages:
    text += page.extract_text()

print("Total characters in document:", len(text))


Total characters in document: 36629


## Text Chunking Strategy

- Chunk Size: 500 characters
- Chunk Overlap: 100 characters

Reason:
This preserves context while preventing loss of information across boundaries.


In [7]:
chunk_size = 500
chunk_overlap = 100

chunks = []
start = 0

while start < len(text):
    end = start + chunk_size
    chunks.append(text[start:end])
    start += chunk_size - chunk_overlap

print("Total chunks created:", len(chunks))


Total chunks created: 92


## Embedding Model

Model Used: sentence-transformers/all-MiniLM-L6-v2

Reason:
- Lightweight
- Fast
- Good semantic similarity
- Suitable for CPU in Colab


In [8]:
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")

chunk_embeddings = embedding_model.encode(chunks)
chunk_embeddings = np.array(chunk_embeddings)

print("Embedding shape:", chunk_embeddings.shape)


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]



config_sentence_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

README.md: 0.00B [00:00, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/612 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/90.9M [00:00<?, ?B/s]

Loading weights:   0%|          | 0/103 [00:00<?, ?it/s]

BertModel LOAD REPORT from: sentence-transformers/all-MiniLM-L6-v2
Key                     | Status     |  | 
------------------------+------------+--+-
embeddings.position_ids | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.


tokenizer_config.json:   0%|          | 0.00/350 [00:00<?, ?B/s]

vocab.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

Embedding shape: (92, 384)


## Vector Database

Vector Store Used: FAISS

Reason:
- Fast similarity search
- Efficient for dense vector retrieval
- Works locally without external service


In [9]:
dimension = chunk_embeddings.shape[1]

index = faiss.IndexFlatL2(dimension)
index.add(chunk_embeddings)

print("FAISS index created with", index.ntotal, "vectors")


FAISS index created with 92 vectors


In [11]:
from transformers import pipeline

generator = pipeline(
    task="text-generation",
    model="google/flan-t5-base",
    max_new_tokens=200
)


model.safetensors:   0%|          | 0.00/990M [00:00<?, ?B/s]

Loading weights:   0%|          | 0/282 [00:00<?, ?it/s]



generation_config.json:   0%|          | 0.00/147 [00:00<?, ?B/s]

tokenizer_config.json: 0.00B [00:00, ?B/s]

spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json: 0.00B [00:00, ?B/s]

Passing `generation_config` together with generation-related arguments=({'max_new_tokens'}) is deprecated and will be removed in future versions. Please pass either a `generation_config` object OR all generation parameters explicitly, but not both.
The model 'T5ForConditionalGeneration' is not supported for text-generation. Supported models are ['PeftModelForCausalLM', 'AfmoeForCausalLM', 'ApertusForCausalLM', 'ArceeForCausalLM', 'AriaTextForCausalLM', 'BambaForCausalLM', 'BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BitNetForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'BltForCausalLM', 'CamembertForCausalLM', 'LlamaForCausalLM', 'CodeGenForCausalLM', 'CohereForCausalLM', 'Cohere2ForCausalLM', 'CpmAntForCausalLM', 'CTRLLMHeadModel', 'CwmForCausalLM', 'Data2VecTextForCausalLM', 'DbrxForCausalLM', 'DeepseekV2ForCausalLM', 'DeepseekV3ForCausalLM', 'DiffLlamaF

In [13]:
def rag_query(question, top_k=3):

    question_embedding = embedding_model.encode([question])

    distances, indices = index.search(np.array(question_embedding), top_k)

    retrieved_chunks = [chunks[i] for i in indices[0]]

    context = " ".join(retrieved_chunks)

    prompt = f"""
    Answer the question based only on the context below.

    Context:
    {context}

    Question:
    {question}
    """

    answer = generator(prompt)[0]["generated_text"]

    return answer


## Testing the RAG System

Below are three test queries used to evaluate the system.


In [14]:
rag_query("How is Apple organized for innovation?")


Both `max_new_tokens` (=200) and `max_length`(=20) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


'\n    Answer the question based only on the context below.\n\n    Context:\n    REPRINT R2006F\nPUBLISHED IN HBR\nNOVEMBER–DECEMBER 2020\nARTICLEORGANIZATIONAL CULTURE\nHow Apple Is \nOrganized  \nfor Innovation\nIt’s about experts leading experts. \nby Joel M. Podolny and Morten T. Hansen\nThis article is made available to you with compliments of Apple Inc for your personal use. Further posting, copying or distribution is not permitted.2\nHarvard Business Review\nNovember–December 2020\nThis article is made available to you with compliments of Apple Inc for your personal use. Further  \nThis article is made available to you with compliments of Apple Inc for your personal use. Further posting, copying or distribution is not permitted.PHOTOGRAPHER\u2002MIKAEL JANSSON\nHow Apple Is  Organized  for InnovationIt’s about experts  leading experts.\nORGANIZATIONAL \nCULTURE\nJoel M. \nPodolny\nDean, Apple \nUniversity\nMorten T. \nHansen\nFaculty, Apple \nUniversity\nAUTHORS\nFOR ARTICLE REP

In [15]:
rag_query("What is the role of functional structure at Apple?")


Both `max_new_tokens` (=200) and `max_length`(=20) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


'\n    Answer the question based only on the context below.\n\n    Context:\n    al \napproach is not necessary and that the functional structure \nmay benefit companies facing tremendous technological \nchange and industry upheaval.\nApple’s commitment to a functional organization does \nnot mean that its structure has remained static. As the \nimportance of artificial intelligence and other new areas has \nincreased, that structure has changed. Here we discuss the \ninnovation benefits and leadership challenges of Apple’s \ndistinctive and ever-evolving organizational model, which  LTURE\nCOPYRIGHT © 2020 HARVARD BUSINESS SCHOOL PUBLISHING CORPORATION. ALL RIGHTS RESERVED.\n4\nHarvard Business Review\nNovember–December 2020\nThis article is made available to you with compliments of Apple Inc for your personal use. Further posting, copying or distribution is not permitted.WHY A FUNCTIONAL ORGANIZATION?\nApple’s main purpose is to create products that enrich \npeople’s daily lives. Tha

In [16]:
rag_query("What leadership characteristics does Apple emphasize?")


Both `max_new_tokens` (=200) and `max_length`(=20) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


'\n    Answer the question based only on the context below.\n\n    Context:\n    xpertise and \ndecision rights.\nThus the link between how Apple is organized and \nthe type of innovations it produces is clear. As Chandler \nfamously argued, “structure follows strategy”—even though \nApple doesn’t use the structure that he anticipated large \nmultinationals would adopt.\nNow let’s turn to the leadership model underlying Apple’s \nstructure.\nTHREE LEADERSHIP CHARACTERISTICS\nEver since Steve Jobs implemented the functional organi-\nzation, Apple’s managers at every level, from senior v above: experts leading experts, immersion in the details, \nand collaborative debate. We have codified these adaptions \nin what we call the discretionary leadership model, which \nwe have incorporated into a new educational program for \nApple’s VPs and directors. Its purpose is to address the chal-\nlenge of getting this leadership approach to drive innovation \nin all areas of the company, not just pr

## Future Improvements

- Semantic chunking instead of fixed-size chunking
- Hybrid search (keyword + vector search)
- Reranking using cross-encoders
- Metadata filtering (page numbers)
- Streamlit UI integration
- Use larger LLMs (Llama, GPT models)


## Project Summary

This project demonstrates a complete Retrieval-Augmented Generation (RAG) system using an HBR article as a knowledge source.

Tools Used:
- Python
- Sentence Transformers
- FAISS
- HuggingFace Transformers
- Google Colab

The system retrieves relevant text from the document and generates context-aware answers.
