<a href="https://colab.research.google.com/github/Aparnamol-KS/CodeCompanion-GroqAI/blob/main/GenAI_Bootcamp_Part_A.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# GenAI Bootcamp - LLMs, LangChain, LlamaIndex
Welcome to Day 1 of the GenAI Bootcamp! This notebook covers foundational concepts and hands-on exercises for building with LangChain and LlamaIndex.

**Topics Covered:**
- Introduction to LLMs
- LangChain basics with OpenAI
- LlamaIndex for document querying
- Image and video generation with DALL·E 3
- Mini project: Resume Q&A bot (LlamaIndex, then Groq + FAISS + Gradio)

## Introduction to LLMs
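Large Language Models (LLMs) are deep learning models, typically based on the Transformer architecture, that are trained on massive text corpora to predict the next token in a sequence.

**Popular LLMs:** GPT-4 (OpenAI), LLaMA (Meta), Mistral (Mistral AI), Claude (Anthropic)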

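**Key Concepts:**
- **Tokenization:** text is split into subword units (tokens) that the model reads and generates.
- **Attention mechanism:** each token weighs how relevant every other token is when building its representation.
- **In-context learning:** the model picks up a task from instructions or examples in the prompt, without any weight updates.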
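To make the attention mechanism concrete, here is a minimal NumPy sketch of scaled dot-product attention, softmax(QKᵀ / √d_k)V, the core operation inside Transformer-based LLMs. The toy shapes and random inputs are assumptions for illustration; real models add learned projections, multiple heads, and masking.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Return (output, weights) for Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity of each query to each key
    weights = softmax(scores, axis=-1)   # each row is a probability distribution
    return weights @ V, weights          # weighted average of the value vectors

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))  # 3 toy "tokens", 4-dimensional
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))

output, weights = scaled_dot_product_attention(Q, K, V)
print(weights.round(3))  # row i: how much token i attends to each token
print(output.shape)
```

Each row of `weights` sums to 1, so every output vector is a blend of the value vectors, weighted by how well that token's query matches each key.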

## Environment Setup

In [None]:
# Install necessary packages
!pip install --quiet langchain llama-index openai tiktoken faiss-cpu PyPDF2
!pip install -q langchain-community langchainhub


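# Set the OpenAI API key (prompted at runtime so it is never stored in the notebook)
import os
from getpass import getpass

os.environ['OPENAI_API_KEY'] = getpass("Enter your OpenAI API key: ")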


In [None]:
from langchain.chat_models import ChatOpenAI

# Any available OpenAI chat model works here, e.g. gpt-3.5-turbo or gpt-4-turbo
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

# invoke() returns an AIMessage; the generated text is in .content
response = llm.invoke("Explain large language models in one sentence")
print(response.content)


### 🔥 Understanding Temperature
In language models like GPT, temperature is a parameter that controls the randomness, or "creativity", of the model's responses.

**Range:** typically **0.0 to 1.0**. Some APIs, such as OpenAI's, accept values up to 2.0, but values above 1.0 are rarely used.


**Low temperature (e.g., 0.0 to 0.3):**
- The model is more deterministic.
- It almost always picks the most likely next token.
- Responses are more focused, predictable, and reliable.
- Ideal for: factual answers, summaries, code generation.

**High temperature (e.g., 0.7 to 1.0):**
- The model becomes more creative and diverse.
- It explores less likely word choices.
- Responses may be more engaging, but also less accurate.
- Ideal for: storytelling, brainstorming, creative writing.

**Example:**

Prompt: "Write a tagline for a space travel company."

- `temperature = 0.2` → "Experience space like never before." (safe, generic)
- `temperature = 0.9` → "Leave gravity behind. Chase the stars with us." (more flair and creativity)
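Under the hood, temperature T rescales the model's raw next-token scores (logits) before the softmax: p_i = exp(z_i / T) / Σ_j exp(z_j / T). The sketch below uses made-up logits for three candidate tokens (an assumption for illustration) to show how a low T concentrates probability on the top token while a high T flattens the distribution.

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Convert logits into next-token probabilities after dividing by the temperature."""
    if temperature <= 0:
        # T = 0 is usually implemented as greedy argmax rather than a division by zero
        raise ValueError("temperature must be > 0")
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()  # numerical stability
    probs = np.exp(scaled)
    return probs / probs.sum()

# Hypothetical logits for three candidate next tokens
logits = [2.0, 1.0, 0.5]
for t in (0.2, 0.7, 1.0, 1.5):
    print(t, softmax_with_temperature(logits, t).round(3))
```

At T = 0.2 almost all of the probability mass lands on the first token, whereas at T = 1.5 the less likely tokens get a real chance of being sampled.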

### When to Use What?

| Use Case            | Suggested Temperature |
| ------------------- | --------------------- |
| Factual QA          | 0.0 – 0.3             |
| Code generation     | 0.1 – 0.3             |
| Creative writing    | 0.7 – 1.0             |
| Brainstorming ideas | 0.6 – 0.9             |
| Business summaries  | 0.3 – 0.5             |


## LangChain - Basic Chatbot with OpenAI

In [None]:
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.7)

# Chat models take a list of messages and return an AIMessage
messages = [HumanMessage(content="What is LangChain and how is it used?")]
response = llm.invoke(messages)
print(response.content)

In [None]:
from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate

# Step 1: Define your template ({placeholders} become input variables)
template = "Explain the concept of {concept} in simple terms, using at least {min} and at most {max} lines."
prompt = ChatPromptTemplate.from_template(template)

# Step 2: Fill in the placeholders to get a list of chat messages
messages = prompt.format_messages(concept="Neuron", min=5, max=10)

# Step 3: Run it through the model
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
response = llm.invoke(messages)

# Step 4: Output the result
print(response.content)
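A prompt template treats each `{name}` in its text as an input variable and fills it in much like Python's `str.format`; `ChatPromptTemplate.from_template` then wraps the result in a human message. The plain-Python approximation below (no LangChain needed; the template text is illustrative) shows the kind of text the model actually receives.

```python
template = (
    "Explain the concept of {concept} in simple terms, "
    "using at least {min} and at most {max} lines."
)

# str.format fills the same {placeholders} that the prompt template detects
filled = template.format(concept="Neuron", min=5, max=10)
print(filled)
```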


In [None]:
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

prompt_template = PromptTemplate(
    input_variables=["concept"],
    template="Explain the concept of {concept}"
)

chain = LLMChain(llm=llm, prompt=prompt_template)

# Pass only the variables declared in input_variables
print(chain.run({"concept": "autoencoder"}))


In [None]:

# Define a second prompt that turns the first chain's explanation into emojis
second_prompt = PromptTemplate(
    input_variables=["ml_concept"],
    template="Take the following concept description and retell it to me using emojis:\n\n{ml_concept}",
)
chain_two = LLMChain(llm=llm, prompt=second_prompt)

In [None]:
from langchain.chains import SimpleSequentialChain
overall_chain = SimpleSequentialChain(chains=[chain, chain_two], verbose=True)

# Run the chain specifying only the input variable for the first chain.
explanation = overall_chain.run("cars")
print(explanation)
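Conceptually, `SimpleSequentialChain` feeds the single string output of each chain into the next chain as its only input. The sketch below mimics that data flow with plain Python functions standing in for the two LLM chains; `run_sequential`, `explain`, and `to_emojis` are hypothetical names, not LangChain APIs, so it runs without an API key.

```python
from typing import Callable, List

def run_sequential(steps: List[Callable[[str], str]], user_input: str) -> str:
    """Pipe one string through each step in order, like SimpleSequentialChain."""
    value = user_input
    for step in steps:
        value = step(value)  # the output of one step becomes the next step's input
    return value

# Hypothetical stand-ins for chain (explanation) and chain_two (emoji rewrite)
def explain(concept: str) -> str:
    return f"{concept} are machines that move people from place to place"

def to_emojis(description: str) -> str:
    return f"🚗 {description} 🚗"

result = run_sequential([explain, to_emojis], "cars")
print(result)
```

This is also why each chain in a `SimpleSequentialChain` must take exactly one input and produce exactly one output.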

In [None]:
# Upload a PDF file (e.g., your resume)
from google.colab import files
uploaded = files.upload()

## LlamaIndex - Index and Query PDFs

In [None]:
# Install everything you need
!pip install -q llama-index langchain openai faiss-cpu tiktoken PyPDF2 langchain-community langchainhub


# LlamaIndex components (v0.10+)
from llama_index.core import VectorStoreIndex
from llama_index.readers.file import PDFReader

# Read the PDF (change this path to the file you uploaded above)
pdf_path = '/content/CV_APARNAMOLKS.pdf'
reader = PDFReader()
documents = reader.load_data(file=pdf_path)

# Build index
index = VectorStoreIndex.from_documents(documents)

# Query the index
query_engine = index.as_query_engine()
response = query_engine.query("Summarize my technical skills and education.")
print(response)


## Mini Project: Resume Q&A Bot
**Goal:** Create a chatbot that answers questions based on your resume PDF.

**Steps:**
1. Upload your resume as a PDF file
2. Use LlamaIndex to index the document
3. Query it using natural language

In [None]:
!pip install -q llama-index langchain openai faiss-cpu tiktoken PyPDF2 langchain-community langchainhub

from google.colab import files
from llama_index.readers.file import PDFReader
from llama_index.core import VectorStoreIndex
uploaded = files.upload()

# Extract the uploaded file name
filename = next(iter(uploaded))

reader = PDFReader()
documents = reader.load_data(file=filename)

# Build vector index
index = VectorStoreIndex.from_documents(documents)

# Query the document
query_engine = index.as_query_engine()
response = query_engine.query("Summarize my key skills and experience.")
print(response)


In [None]:
from langchain.chat_models import ChatOpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

prompt = PromptTemplate(input_variables=["topic"], template="Explain {topic} in simple terms.")
llm = ChatOpenAI()
chain = LLMChain(prompt=prompt, llm=llm)

print(chain.run("quantum computing"))


In [None]:
import openai


response = openai.images.generate(
    model="dall-e-3",
    prompt="A hand making the sign-language gesture for 'I love you'",
    size="1024x1024",
    quality="standard",  # or "hd" for finer detail (higher cost)
    n=1
)

# Extract and print the image URL (hosted URLs are temporary, so download the image to keep it)
image_url = response.data[0].url
print("Generated image URL:", image_url)


In [None]:
# Step 1: Install necessary packages and ffmpeg
!pip install --upgrade openai
!apt-get -y install ffmpeg


# Step 2: Imports
import os
import time

import openai
import requests
from IPython.display import Video, display

openai.api_key = os.getenv("OPENAI_API_KEY")

# Step 3: Define one prompt per video frame
scene_prompts = [
    "A sunrise over a quiet mountain village, soft golden light",
    "A child flying a kite on a green hill under a bright blue sky",
    "A spaceship launching from a futuristic city at sunset",
    "A thunderstorm over the ocean with crashing waves",
    "A calm starry night in the desert with glowing cacti"
]

image_paths = []

# Step 4: Generate images with DALL·E 3 and save them locally
for i, prompt in enumerate(scene_prompts):
    print(f"Generating frame {i+1} for prompt: {prompt}")
    response = openai.images.generate(
        model="dall-e-3",
        prompt=prompt,
        size="1024x1024",
        n=1
    )
    image_url = response.data[0].url

    # Download the image (fail loudly if the request did not succeed)
    img_response = requests.get(image_url, timeout=60)
    img_response.raise_for_status()
    file_name = f"frame_{i:03d}.png"
    with open(file_name, "wb") as f:
        f.write(img_response.content)
    image_paths.append(file_name)

    # Wait a bit to avoid rate limits
    time.sleep(2)

print("All frames generated and saved.")

# Step 5: Create a video from the frames with ffmpeg
# -framerate 1 shows each image for 1 second, -r 30 sets the output frame rate, -y overwrites an existing file
print("Creating video from frames...")
!ffmpeg -y -framerate 1 -i frame_%03d.png -c:v libx264 -r 30 -pix_fmt yuv420p dalle_video.mp4

print("Video creation complete!")

# Step 6: Display the generated video
display(Video("dalle_video.mp4", embed=True))


## Mini Project (Groq + FAISS + Gradio): Resume Q&A Bot


In [None]:
!pip install -q gradio groq faiss-cpu sentence-transformers PyPDF2 numpy

In [None]:
import os
import gradio as gr
import faiss
import numpy as np
from PyPDF2 import PdfReader
from sentence_transformers import SentenceTransformer
from groq import Groq

# 2. Groq API setup: read the key from the environment or prompt for it (never hard-code secrets)
from getpass import getpass

GROQ_API_KEY = os.getenv("GROQ_API_KEY") or getpass("Enter your Groq API key: ")
client = Groq(api_key=GROQ_API_KEY)

# 3. Helper functions
def extract_text_from_pdf(pdf_file):
    reader = PdfReader(pdf_file)
    text = ""
    for page in reader.pages:
        page_text = page.extract_text()  # may be empty for scanned/image-only pages
        if page_text:
            text += page_text + "\n"
    return text

def split_into_chunks(text, chunk_size=500):
    words = text.split()
    return [" ".join(words[i:i+chunk_size]) for i in range(0, len(words), chunk_size)]

embedding_model = SentenceTransformer("all-MiniLM-L6-v2")

# Global variables to hold index and chunks
faiss_index = None
text_chunks = []

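def process_pdf(pdf_file):
    global faiss_index, text_chunks
    # Gradio may pass a file path string or a file-like object with a .name attribute
    pdf_path = pdf_file if isinstance(pdf_file, str) else pdf_file.name
    raw_text = extract_text_from_pdf(pdf_path)
    text_chunks = split_into_chunks(raw_text)
    if not text_chunks:
        faiss_index = None
        return "No extractable text found in this PDF."
    # Embed every chunk and store the vectors in an exact (brute-force) L2 index
    embeddings = np.asarray(embedding_model.encode(text_chunks), dtype="float32")
    dimension = embeddings.shape[1]
    faiss_index = faiss.IndexFlatL2(dimension)
    faiss_index.add(embeddings)
    return "PDF processed and vector index created successfully."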

def query_document(query_text, top_k=3):
    if faiss_index is None or not text_chunks:
        return "Please upload and process a PDF first."

    # Never ask for more neighbours than there are chunks (FAISS pads missing results with -1)
    k = min(top_k, len(text_chunks))
    query_vector = np.asarray(embedding_model.encode([query_text]), dtype="float32")
    distances, indices = faiss_index.search(query_vector, k)
    context = "\n\n".join(text_chunks[i] for i in indices[0] if i != -1)

    response = client.chat.completions.create(
        model="llama-3.1-8b-instant",  # any chat model available on your Groq account works here
        messages=[
            {"role": "system", "content": "You are an assistant that summarizes and analyzes documents. Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query_text}"}
        ]
    )
    return response.choices[0].message.content

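# 4. Gradio UI
def answer_question(file, question):
    """Index the uploaded PDF, then answer the question. Returns (status, answer)."""
    if file is None:
        return "Please upload a PDF first.", ""
    status = process_pdf(file)  # note: the PDF is re-indexed on every submission
    if not question or not question.strip():
        return status, "Please enter a question."
    return status, query_document(question)

iface = gr.Interface(
    fn=answer_question,
    inputs=[
        gr.File(label="Upload your PDF"),
        gr.Textbox(label="Enter your question")
    ],
    outputs=[
        gr.Textbox(label="Status", interactive=False),
        gr.Textbox(label="Answer", interactive=False)
    ],
    title="Resume Q&A Bot",
    description="Upload a PDF and ask questions about its content."
)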

iface.launch()
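`faiss.IndexFlatL2` performs exact (brute-force) nearest-neighbour search: it compares the query against every stored vector using squared Euclidean distance and returns the closest ones. The NumPy equivalent below uses random vectors in place of the sentence-transformer embeddings (an assumption for illustration) and caps `k` at the number of stored vectors, which is why the bot above should not request more chunks than exist.

```python
import numpy as np

def flat_l2_search(database, queries, k):
    """Exact k-nearest-neighbour search by squared L2 distance (like IndexFlatL2)."""
    k = min(k, len(database))  # cannot return more neighbours than stored vectors
    # Pairwise squared distances, shape (n_queries, n_database)
    diffs = queries[:, None, :] - database[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)
    idx = np.argsort(dists, axis=1)[:, :k]  # nearest first
    return np.take_along_axis(dists, idx, axis=1), idx

rng = np.random.default_rng(42)
chunk_vectors = rng.normal(size=(5, 8)).astype("float32")  # 5 toy "chunks"
query = chunk_vectors[2:3] + 0.01                          # a query very close to chunk 2

distances, indices = flat_l2_search(chunk_vectors, query, k=3)
print(indices)    # the first index is the closest chunk
print(distances)
```

For a few hundred resume chunks this exhaustive scan is instant; approximate FAISS indexes only pay off for much larger collections.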
