# THEORY QUESTIONS

Question 1

  - Large Language Models (LLMs) are deep learning models trained on massive text corpora to predict and generate human-like text. They use the transformer architecture, whose self-attention layers learn contextual relationships between tokens. During training they minimize a loss, typically cross-entropy on next-token prediction. At inference they tokenize the prompt, compute attention-weighted representations, and output a probability distribution over the next token; that token is chosen by sampling or by taking the argmax, and the process repeats one token at a time. Key components: tokenization, embedding layers, positional encodings, multi-head self-attention, and feed-forward layers. Large parameter counts and large training sets are what enable few-shot and zero-shot behavior.
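The last decoding step (turning scores into probabilities, then picking a token) can be sketched in a few lines. This is a minimal illustration, not how any particular model is implemented; the vocabulary, the logits, and the helper names `softmax`, `greedy_pick`, and `sample_pick` are all hypothetical.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(vocab, probs):
    """Argmax decoding: always return the most probable token."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return vocab[best]

def sample_pick(vocab, probs, rng):
    """Sampling: draw one token in proportion to its probability."""
    return rng.choices(vocab, weights=probs, k=1)[0]

# Hypothetical scores a model might assign after the prompt "The cat sat on the"
vocab = ["mat", "sofa", "moon"]
logits = [3.0, 2.0, 0.1]

probs = softmax(logits)
greedy = greedy_pick(vocab, probs)                      # always "mat"
sampled = sample_pick(vocab, probs, random.Random(0))   # may be any token in vocab
```

Greedy decoding is repeatable but can be repetitive; sampling trades repeatability for variety, and the temperature controls how far the draw strays from the top choice.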

Question 2

  - LLMs shift some tasks from hand-coded logic to prompt-driven generation and retrieval-augmented approaches. Impacts include:

Rapid prototyping: Generate boilerplate, docs, tests, and refactor suggestions faster.

Augmented dev workflows: Code completion, documentation, and debugging assistants reduce friction.

Design changes: Systems move toward hybrid architectures (LLM + deterministic business logic + retrieval layers).

Testing & verification needs: Validation, guardrails, and monitoring become more important because LLM outputs are probabilistic and can be wrong or inconsistent between runs.

Shift in skillset: Developers need prompt engineering, model evaluation, and data curation skills alongside traditional engineering.

Operational concerns: New requirements for cost control, latency, data privacy, and CI/CD for ML inference.
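One common form of guardrail is to validate model output before deterministic code consumes it. The sketch below assumes the model was asked to return a JSON object; the required keys and the function name `validate_llm_json` are hypothetical choices for illustration.

```python
import json

REQUIRED_KEYS = {"summary", "parties"}  # hypothetical schema for illustration

def validate_llm_json(raw_text):
    """Return (data, errors); data is None when the output fails validation."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        return None, [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return None, ["top-level value must be an object"]
    errors = []
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        errors.append(f"missing keys: {sorted(missing)}")
    if "parties" in data and not isinstance(data["parties"], list):
        errors.append("'parties' must be a list")
    return (data if not errors else None), errors

# A well-formed response and a typical failure (chatty text wrapped around the JSON)
good = '{"summary": "Two-year lease.", "parties": ["Acme", "Bolt"]}'
bad = 'Sure! Here is the JSON: {"summary": "Two-year lease."}'

good_data, good_errors = validate_llm_json(good)
bad_data, bad_errors = validate_llm_json(bad)
```

When validation fails, the calling code can retry the request, fall back to a default, or route the case to a human, rather than passing unchecked text into business logic.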

Question 3

  - Advantages

Natural language understanding and generation at scale.

Fast prototyping of features like summarization, Q&A, translation, and content generation.

Adaptability via fine-tuning, instruction-tuning, or prompt engineering.

Can help non-expert users perform complex tasks (e.g., legal drafting, analysis).

Limitations

Hallucinations: LLMs may produce plausible but incorrect facts.

Cost & latency: Large models can be expensive to run in production.

Data privacy: Sending sensitive data to third-party models raises compliance concerns.

Lack of determinism: Outputs can vary between runs and carry no correctness guarantee, so LLMs are a poor fit where strict correctness is required unless extra verification is added.

Bias & fairness: Trained on web-scale data, so outputs can reflect undesirable biases.

Versioning & reproducibility: Model updates can change behavior unexpectedly.
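For the data privacy limitation, one possible mitigation (an assumption here, not something the list above prescribes) is to redact obvious identifiers before a prompt leaves your systems. The patterns below are deliberately simple and hypothetical; regexes alone are not sufficient for real PII detection.

```python
import re

# Deliberately simple, hypothetical patterns for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact(text):
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text

prompt = "Contact Jane at jane.doe@example.com or +1 415-555-0100 about the claim."
safe_prompt = redact(prompt)
print(safe_prompt)
```

Note that the name "Jane" survives redaction, which shows why pattern matching is only a first line of defense.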

Question 4

Healthcare: Summarization of clinical notes, triage chatbots, drafting patient communication (with clinician oversight).

Legal: Contract summarization, clause extraction, contract drafting assistance, and legal research acceleration.

Finance: Automated reporting, sentiment analysis, summarizing earnings calls, and customer service via chatbots.

Education: Personalized tutoring, automatic grading suggestions, and content creation.

Customer Support / SaaS: Intelligent assistants that handle triage, generate replies, and provide knowledge-base search.

Marketing & Media: Ad copy generation, content ideation, social media posts, and creative drafting.

Question 5

  - LangChain

Purpose: A framework for building applications that connect LLMs to data, tools, and workflows.

Strengths: Orchestration of LLM calls, prompt templates, chains & agents, tool integrations (APIs, search, calculators), memory management, and end-to-end pipelines.

Use case: Building multi-step LLM workflows (e.g., agent that calls external APIs, runs code, and summarizes responses).

LlamaIndex (formerly GPT Index)

Purpose: A data framework focused on ingesting, indexing, and retrieving information from your own documents and data sources, so that relevant context can be supplied to an LLM.

Strengths: Document ingestion, building vector indexes, retrieval strategies, and structured query over documents.

Use case: Retrieval-augmented generation (RAG) where document context is critical (e.g., Q&A over company docs).

Contrast & Complementarity

LangChain is broader for orchestration and tool-using agents; LlamaIndex specializes in document indexing & retrieval. They are frequently combined: LlamaIndex supplies relevant context/documents to a LangChain pipeline or agent which then uses an LLM to generate final responses.
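The division of labor can be illustrated without either library. In this sketch every function is a hypothetical stand-in: `retrieve` plays the role of the retrieval layer (using crude word overlap rather than a real index), `build_prompt` plays the role of a prompt template, and `fake_llm` is a placeholder for a model call.

```python
def retrieve(question, documents, top_k=2):
    """Stand-in for the retrieval layer: rank documents by words shared with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

def build_prompt(question, context_chunks):
    """Stand-in for a prompt template in the orchestration layer."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

def fake_llm(prompt):
    """Placeholder for a real model call; it only reports the prompt size."""
    return f"(model output for a {len(prompt)}-character prompt)"

docs = [
    "The refund policy allows returns within 30 days.",
    "Shipping takes five business days.",
    "Refund requests need the original receipt.",
]
question = "What is the refund policy?"

chunks = retrieve(question, docs)        # retrieval layer picks the context
prompt = build_prompt(question, chunks)  # orchestration layer assembles the prompt
answer = fake_llm(prompt)                # the LLM generates the final response
```

Keeping the three steps separate is what lets each be swapped independently, for example replacing the word-overlap ranking with a vector index.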

Question 6

Q: What are the benefits of using vectors in document search?
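A: Representing documents and queries as embedding vectors enables semantic matching rather than exact keyword matching. Search can find conceptually similar documents, handle paraphrases and synonyms, rank results by graded similarity (for example, cosine similarity), and supply relevant context for retrieval-augmented generation.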
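A minimal sketch of the idea, assuming documents and queries have already been embedded. The three-dimensional vectors here are hand-made and hypothetical (real embeddings come from a model and have hundreds or thousands of dimensions); the ranking uses cosine similarity.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec, doc_vecs, top_k=2):
    """Return the titles of the top_k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, v), title) for title, v in doc_vecs.items()]
    scored.sort(reverse=True)
    return [title for _, title in scored[:top_k]]

# Hypothetical embeddings; the axes loosely mean (pets, finance, cooking)
doc_vecs = {
    "How to care for a puppy": [0.9, 0.1, 0.0],
    "Quarterly earnings report": [0.0, 1.0, 0.1],
    "Feeding your kitten": [0.8, 0.0, 0.3],
}
query_vec = [0.85, 0.05, 0.1]  # stands in for the query "looking after a young dog"

results = search(query_vec, doc_vecs)
print(results)
```

The query text never mentions "puppy", yet the puppy document ranks first because its vector points in nearly the same direction as the query vector.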


Question 7

Weather summary: In Mumbai it's partly cloudy with a temperature around 24°C and humidity near 60%. Expect mild conditions—comfortable but a little humid; carry a light layer if you plan to be out in the evening.



Question 8

Query: What does the document say about the project's timeline?
Answer: The document states that the project will begin in June 2025, reach milestone 1 in August 2025, and reach final delivery in December 2025. (Example answer; the actual output depends on the file contents.)


Question 9

A: The budget section allocates $200k to development, $50k to QA, and $30k to marketing. It recommends a contingency of 10%. (Based on retrieved document sections.)


Question 10

# High-level sketch (not production-ready).
# Assumption: these are legacy import paths (roughly llama_index before 0.10
# and langchain 0.0.x); newer releases have reorganized these modules.
# The OpenAI wrapper reads the API key from the OPENAI_API_KEY environment variable.
from llama_index import SimpleDirectoryReader, GPTVectorStoreIndex, LLMPredictor, ServiceContext
from langchain.llms import OpenAI
from langchain import PromptTemplate, LLMChain

# 1) Index documents with LlamaIndex
llm = OpenAI(temperature=0)  # temperature=0 for more repeatable output
llm_predictor = LLMPredictor(llm=llm)
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)
documents = SimpleDirectoryReader('legal_docs/').load_data()
index = GPTVectorStoreIndex.from_documents(documents, service_context=service_context)
query_engine = index.as_query_engine()

# 2) Summarization chain with LangChain
template = """You are a legal summarizer.
Use the context below (excerpts) to:
- Give a 3-line executive summary
- Extract key parties, dates, and obligations in JSON
Context: {context}
"""
prompt = PromptTemplate(input_variables=["context"], template=template)
chain = LLMChain(llm=llm, prompt=prompt)

# 3) Retrieve excerpts for an example query, then summarize them
question = "Summarize the termination clauses across relevant contracts."
response = query_engine.query(question)
# Pass the retrieved source excerpts rather than str(response), which would be
# the query engine's own synthesized answer and not document text.
context_text = "\n\n".join(n.node.get_text() for n in response.source_nodes)
summary = chain.run(context=context_text)
print(summary)
